What is Big Data
Today, companies often have the ability to store any data they generate, but don’t know what to do with it.
more data = more processing
key drivers of Big Data
This trend started when
3 reasons for the exponential data growth
Instrumentation
Interconnection
Intelligence
3 Big Data Characteristics
What is the Blind Zone?
We create more data than we can process.
Blind Zone = what WE don’t know: the data we create but never analyse.

Can data warehouses handle Big Data and why?
How much of the data an organisation creates is cleansed, transformed and loaded into the Data Warehouse?
Only 20% of the data that could be used.
The remaining 80% of data is raw, unstructured or semi-structured.
3 categories of data based on its form in the primary source
Relational databases only work well with structured data.
How to handle unstructured data?
NoSQL databases (“Not only” SQL databases)
NoSQL Databases Attributes
NoSQL database falls into several technology architecture categories:
Relational Databases Attributes
define data velocity
How fast data is generated, flows, is stored, retrieved and analysed.
key characteristics of stream analytics + 2 use cases
Basically, what is required to make Big Data valuable?
Need to be able to process a massive volume of disparate types of data and analyse it to produce insight in a time frame driven by the business need.
Are DW trusted? Need?
Businesses need trust.
Need
Characteristics of Hadoop

HDFS - Hadoop Distributed File System
A distributed, scalable, and portable file system written in Java for the Hadoop framework.
Its data is divided into blocks, and copies of these blocks are stored on multiple servers across the Hadoop cluster.
Think of a file that contains the phone numbers for everyone in the United States; the people with a last name starting with A might be stored on server 1, B on server 2, and so on. Hadoop pieces together the phonebook across its cluster.
The example below shows such replication on both the same rack and other racks (double protection).
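The phonebook example above can be sketched in a few lines of Python. This is an illustration only: the block size, server and rack names, and the replication factor of 3 are assumptions for the example, not real HDFS settings, and the real HDFS placement policy is more involved.

```python
# Toy sketch of HDFS-style block splitting and replication.
BLOCK_SIZE = 4      # bytes per block (tiny, for illustration)
REPLICATION = 3     # copies of each block

# Hypothetical cluster layout: rack -> servers on that rack.
CLUSTER = {
    "rack1": ["server1", "server2"],
    "rack2": ["server3", "server4"],
}

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Divide a file's bytes into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(block_id, cluster, replication=REPLICATION):
    """Place two replicas on one rack and one on another rack --
    the 'same rack and other racks' double protection."""
    racks = list(cluster)
    first_rack = racks[block_id % len(racks)]        # rotate for balance
    other_rack = racks[(block_id + 1) % len(racks)]
    servers = cluster[first_rack][:2] + cluster[other_rack][:1]
    return servers[:replication]

phonebook = b"Adams 555-0001 Baker 555-0002"
blocks = split_into_blocks(phonebook)
for i, block in enumerate(blocks):
    print(f"block {i}: {block!r} -> {place_replicas(i, CLUSTER)}")
```

Hadoop can then reassemble the phonebook by reading any surviving replica of each block, in order.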

3 benefits of Hadoop’s file redundancy
Redundancy achieves availability even as components fail.
Redundancy increases scalability.
Redundancy makes data local.
MapReduce programming model
Components of a HDFS cluster
An HDFS cluster has two types of nodes (i.e. servers) operating in a master-worker pattern: one namenode, as the master node, and a number of worker datanodes.
namenode - manages the filesystem namespace
namenode essential for the filesystem to function
Datanodes hold the filesystem data in the form of blocks.
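The namenode/datanode split can be made concrete with a small Python sketch. All file paths, block IDs, and node names here are invented for illustration: the point is that the namenode holds only metadata (which blocks make up a file, and where each replica lives), while the datanodes hold the actual block bytes.

```python
# Toy model of the HDFS metadata/data split (names are hypothetical).
namenode = {
    # filesystem namespace: filename -> ordered list of block IDs
    "namespace": {"/phonebook.txt": ["blk_1", "blk_2"]},
    # block ID -> datanodes holding a replica of that block
    "block_map": {"blk_1": ["datanode1", "datanode2"],
                  "blk_2": ["datanode2", "datanode3"]},
}

# Datanodes store the block contents themselves.
datanodes = {
    "datanode1": {"blk_1": b"Adams 555-0001 "},
    "datanode2": {"blk_1": b"Adams 555-0001 ", "blk_2": b"Baker 555-0002"},
    "datanode3": {"blk_2": b"Baker 555-0002"},
}

def read_file(path):
    """Client read: ask the namenode for block locations, then fetch
    each block from the first datanode that holds a replica."""
    data = b""
    for blk in namenode["namespace"][path]:
        node = namenode["block_map"][blk][0]
        data += datanodes[node][blk]
    return data

print(read_file("/phonebook.txt"))  # b'Adams 555-0001 Baker 555-0002'
```

This also shows why the namenode is essential: without its namespace and block map, the raw blocks on the datanodes cannot be reassembled into files.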
MapReduce Framework follows master-worker architecture.
The JobTracker handles the runtime scheduling of MapReduce jobs and maintains information on each TaskTracker’s load and available resources.
Each job is broken down into Map tasks, based on the number of data blocks that require processing, and Reduce tasks. The JobTracker assigns tasks to TaskTrackers based on locality and load balancing.
It achieves locality by matching a TaskTracker to Map tasks that process data local to it, preferably on the same node or, failing that, on the same rack.
It load-balances by ensuring that all available TaskTrackers are assigned tasks. TaskTrackers regularly update the JobTracker with their status through heartbeat messages.
master node contains
worker nodes contain
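The JobTracker’s locality-aware assignment described above can be sketched as a simple three-step preference: node-local, then rack-local, then any tracker with a free slot. The tracker names, rack layout, and slot counts below are assumptions for illustration, not Hadoop defaults.

```python
# Toy sketch of locality-aware Map-task scheduling.
TRACKERS = {
    "tt1": {"rack": "rack1", "free_slots": 1},
    "tt2": {"rack": "rack1", "free_slots": 2},
    "tt3": {"rack": "rack2", "free_slots": 2},
}

# Map task -> trackers whose node holds a replica of its input block.
TASKS = {
    "map0": ["tt1", "tt3"],
    "map1": ["tt3"],
    "map2": ["tt1"],
}

def assign_tasks(tasks, trackers):
    assignment = {}
    for task, replica_nodes in tasks.items():
        # 1. Node-local: a tracker that hosts a replica of the block.
        candidates = [t for t in replica_nodes
                      if trackers[t]["free_slots"] > 0]
        if not candidates:
            # 2. Rack-local: a tracker sharing a rack with a replica.
            racks = {trackers[t]["rack"] for t in replica_nodes}
            candidates = [t for t, info in trackers.items()
                          if info["rack"] in racks and info["free_slots"] > 0]
        if not candidates:
            # 3. Anywhere with a free slot (load balancing).
            candidates = [t for t, info in trackers.items()
                          if info["free_slots"] > 0]
        chosen = candidates[0]
        trackers[chosen]["free_slots"] -= 1
        assignment[task] = chosen
    return assignment

assignment = assign_tasks(TASKS, TRACKERS)
print(assignment)  # map0, map1 run node-local; map2 falls back to rack-local
```

Here map2’s data lives on tt1, but tt1’s slot is already taken, so the scheduler falls back to tt2 on the same rack.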

Performing a Hadoop ‘Job’
outline + example
3 phases of MapReduce
MapReduce is a programming model that is used as part of a framework, such as Hadoop, based on key-value pairs.
It forces a file to undergo three stages:
1. Map: the task is distributed among the computers in the cluster and processes the inputs; produce key-value pairs
2. Shuffle: collects and sorts the key-value pairs by key (the keys being chosen by the user) and distributes them to different machines for the reduce phase. Every record for a given key then goes to the same reducer.
3. Reduce: takes the grouped results from the shuffle phase and combines them to produce the desired result.
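The three stages above can be sketched as plain Python for the classic word-count example. Real Hadoop distributes each phase across the cluster; here everything runs in one process purely to show the data flow.

```python
# In-process simulation of the Map -> Shuffle -> Reduce stages.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) key-value pair for every word in the input."""
    pairs = []
    for line in lines:
        for word in line.split():
            pairs.append((word.lower(), 1))
    return pairs

def shuffle_phase(pairs):
    """Shuffle: group values by key, so every record for a given key
    ends up at the same reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into the final result."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # 'the' -> 3, 'fox' -> 2, the rest -> 1
```

The same key-value structure carries over to any MapReduce job: only the map and reduce functions change.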
Programming in Hadoop
PIG
HIVE
Summary of HadoopDB Architecture
An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

Apache Spark
Apache Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD): a read-only multiset of data items distributed over a cluster of machines, maintained in a fault-tolerant way.
It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store the reduction results on disk. Spark’s RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
The availability of RDDs facilitates the implementation of both iterative algorithms, that visit their dataset multiple times in a loop, and interactive/exploratory data analysis, i.e., the repeated database-style querying of data. The latency of such applications may be reduced by several orders of magnitude compared to a MapReduce implementation (as was common in Apache Hadoop stacks). Among the class of iterative algorithms are the training algorithms for machine learning systems, which formed the initial impetus for developing Apache Spark.
Apache Spark requires a cluster manager and a distributed storage system. For cluster management, Spark supports standalone (native Spark cluster), Hadoop YARN, or Apache Mesos. For distributed storage, Spark can interface with a wide variety, including Hadoop Distributed File System (HDFS), MapR File System (MapR-FS), Cassandra, OpenStack Swift, Amazon S3, Kudu, or a custom solution can be implemented. Spark also supports a pseudo-distributed local mode, usually used only for development or testing purposes, where distributed storage is not required and the local file system can be used instead; in such a scenario, Spark is run on a single machine with one executor per CPU core.
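The two RDD ideas above, lazy transformations recorded as lineage and an in-memory working set reusable across iterations, can be illustrated with a toy class. This is a conceptual sketch, not the Spark API: the class name and methods are invented, and a real RDD is partitioned across machines.

```python
# Toy illustration of RDD-style lazy lineage and caching.
class ToyRDD:
    def __init__(self, compute):
        self._compute = compute   # lineage: how to (re)build this dataset
        self._cache = None        # optional in-memory working set

    @classmethod
    def from_list(cls, items):
        data = tuple(items)       # read-only source data
        return cls(lambda: data)

    def map(self, fn):
        # Lazy: record the transformation in the lineage, don't run it yet.
        # If a partition were lost, it could be recomputed from here.
        return ToyRDD(lambda: tuple(fn(x) for x in self._compute()))

    def cache(self):
        self._cache = self._compute()   # materialise once, keep in memory
        return self

    def collect(self):
        return list(self._cache if self._cache is not None else self._compute())

nums = ToyRDD.from_list([1, 2, 3]).map(lambda x: x * x).cache()
print(nums.collect())  # [1, 4, 9]

# An iterative algorithm can now reuse the cached working set on every
# pass instead of re-reading (and re-mapping) from disk each time --
# the gap RDDs were designed to close over plain MapReduce.
for _ in range(3):
    total = sum(nums.collect())
print(total)  # 14
```

In real Spark the equivalent calls would be `sc.parallelize`, `rdd.map`, `rdd.cache`, and `rdd.collect`, with the dataset split into partitions across the cluster.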