Big data Flashcards

Question 1

Q

What is ‘Big Data’?

Answer

A

‘Big Data’ is a catch-all term for data that won’t fit the usual containers, because it is too large, changes too quickly, or is too varied in structure.

Question 2

Q

What are the three defining features of Big Data known as ‘the three Vs’?

Answer

A

Volume
Velocity
Variety

Question 3

Q

What does ‘Volume’ refer to in Big Data?

Answer

A

There is too much data for it all to fit on a conventional hard drive or even a single server.

Question 4

Q

What does ‘Velocity’ refer to in Big Data?

Answer

A

Data on the servers is created and modified rapidly, requiring responses within milliseconds.

Question 5

Q

What does ‘Variety’ refer to in Big Data?

Answer

A

The data consists of many different types, including binary and multimedia files.

Question 6

Q

Why is the unstructured nature of Big Data challenging?

Answer

A

It makes it difficult to analyze the data using conventional databases that require a structured format.

Question 7

Q

What is a primary requirement for processing Big Data stored over multiple servers?

Answer

A

The processing must be distributed across more than one machine.

Question 8

Q

What programming paradigm is particularly suited for distributed processing of Big Data?

Answer

A

Functional programming.

Question 9

Q

What are the characteristics of functional programming that aid in distributed code?

Answer

A

Stateless
Immutable data structures
Higher-order functions
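
As a rough illustration of these three characteristics, here is a minimal Python sketch. The data and the name `partition_total` are made up for the example, and each inner tuple only stands in for a chunk of data that could sit on a different server; nothing here actually runs on more than one machine.

```python
from functools import reduce

# Hypothetical data: each inner tuple stands in for one partition of a larger
# data set. Tuples are immutable, so no step can alter the input by accident.
partitions = (
    (3, 1, 4),
    (1, 5, 9),
    (2, 6, 5),
)

def partition_total(partition):
    """Stateless: the result depends only on the argument, never on shared variables."""
    return sum(partition)

# Higher-order functions: map and reduce both take another function as an argument.
# Because partition_total has no side effects, the partitions could be handled
# in any order, or on different machines, and give the same answer.
partial_totals = tuple(map(partition_total, partitions))
grand_total = reduce(lambda a, b: a + b, partial_totals, 0)

print(partial_totals)
print(grand_total)
```

Since no call shares state with any other, it is easy to see which parts of the work are independent, which is the property that helps when splitting work across machines.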

Question 10

Q

What is the fact-based model for representing data?

Answer

A

A way of storing each piece of information as a fact that is immutable and includes a timestamp.

Question 11

Q

What does it mean that facts in the fact-based model are immutable?

Answer

A

Facts never change once created and cannot be overwritten.

Question 12

Q

How does the fact-based model reduce data loss?

Answer

A

It prevents accidental data loss due to human error by not allowing overwriting of facts.
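
The fact-based model from the last three cards can be sketched in Python as below. This is only an illustration under assumptions: the `Fact` class, its field names, the `record` and `latest` helpers and the sample values are all hypothetical, not part of any particular Big Data system.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen=True makes each fact immutable once created
class Fact:
    entity: str
    attribute: str
    value: str
    timestamp: datetime

facts = []  # hypothetical append-only store: facts are added, never overwritten

def record(entity, attribute, value, timestamp):
    """Store a new fact alongside the old ones instead of replacing them."""
    facts.append(Fact(entity, attribute, value, timestamp))

def latest(entity, attribute):
    """Return the value of the most recent matching fact, chosen by timestamp."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    return max(matching, key=lambda f: f.timestamp).value

record("user42", "city", "Leeds", datetime(2023, 1, 5, tzinfo=timezone.utc))
record("user42", "city", "York", datetime(2024, 3, 9, tzinfo=timezone.utc))

print(latest("user42", "city"))  # the newer fact is used
print(len(facts))                # the older fact is still stored

try:
    facts[0].value = "Hull"      # attempting to overwrite a fact
except FrozenInstanceError:
    print("facts cannot be changed")
```

The older "Leeds" fact is never lost, so a mistaken update can be corrected by recording a new fact rather than by trying to recover overwritten data.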

Question 13

Q

What is a graph schema used for in Big Data?

Answer

A

To graphically represent the structure of a dataset using nodes and edges.

Question 14

Q

In a graph schema, what do nodes represent?

Answer

A

Entities that can contain properties.

Question 15

Q

What do edges represent in a graph schema?

Answer

A

Relationships between entities, labelled with a brief description.
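
To make nodes, properties and labelled edges more concrete, here is one possible way to write a tiny graph down in Python. The entities, property names, edge labels and the `related` helper are all invented for illustration; a graph schema itself is normally drawn as a diagram.

```python
# Nodes: entities, each holding its own properties (all values are hypothetical).
nodes = {
    "alice": {"type": "Person", "properties": {"name": "Alice", "age": 30}},
    "bob": {"type": "Person", "properties": {"name": "Bob", "age": 27}},
    "acme": {"type": "Company", "properties": {"name": "Acme Ltd"}},
}

# Edges: relationships between entities, each labelled with a brief description.
edges = [
    ("alice", "works for", "acme"),
    ("bob", "works for", "acme"),
    ("alice", "friend of", "bob"),
]

def related(node_id, label):
    """Return the nodes reached from node_id by following edges with the given label."""
    return [target for source, edge_label, target in edges
            if source == node_id and edge_label == label]

print(related("alice", "works for"))
print(related("alice", "friend of"))
```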

Question 16

Q

Are timestamps commonly included in graph schema diagrams?

Answer

A

No, timestamps are rarely included; it is assumed that each node contains the most recent information.

Question 17

Q

Fill in the blank: The processing associated with using Big Data must be split across multiple _______.

Answer

A

machines

Question 18

Q

True or False: Conventional databases are well-suited for storing Big Data.

Answer

A

False

Question 19

Q

What is an alternative method to represent properties in a graph schema?

Answer

A

Listing an entity’s properties inside rectangles joined to entities with a dashed line.

Question 20

Q

What are some examples of Big Data volume?

Answer

A

Hundreds of terabytes
Large medical datasets for diagnosis
Gene sequencing
Predicting disease outbreaks
Results of large-scale scientific experiments

Question 21

Q

What are some examples of Big Data variety?

Answer

A

Data that cannot be represented in a table or by a relational database
Email messages
Videos
Images
Web site contents
Facial recognition

Question 22

Q

What are some examples of Big Data velocity?

Answer

A

Thousands of items to process per second
Data must be processed as it is received; it cannot be batched and processed later
Card payment fraud detection
Recommendation systems
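
The idea of processing data as it is received, rather than batching it, can be sketched with a Python generator. The `flag_suspicious` function, the `limit` threshold and the sample payments are assumptions made up for this example; real fraud detection uses far more sophisticated rules.

```python
def flag_suspicious(payments, limit=1000):
    """Check each payment as it arrives and yield it at once if it looks suspicious.

    The single 'amount over limit' rule is a hypothetical stand-in for a real
    fraud check.
    """
    for payment in payments:
        if payment["amount"] > limit:
            yield payment  # handed on immediately, without waiting for the stream to end

# Hypothetical stream: an iterator stands in for payments arriving over time.
stream = iter([
    {"card": "A", "amount": 25},
    {"card": "B", "amount": 4200},
    {"card": "C", "amount": 12},
    {"card": "D", "amount": 1500},
])

flagged = list(flag_suspicious(stream))
print([p["card"] for p in flagged])
```

Because the generator looks at one item at a time, it never needs the whole data set in memory, which matters when thousands of items arrive every second.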

Question 23

Q

Explain some of the challenges that Big Data brings with it and the approaches that can be taken to overcome these, in relation to programming and hardware.

Answer

A

Challenges:
* Data cannot be stored on one server.
* It is not possible to process the data quickly enough with one computer.
* Data cannot be represented in a relational database.
* Some unstructured data are difficult to analyse.
How to overcome them:
* Use distributed database systems, with the data spread across multiple servers.
* Use functional programming, which makes it easier to write distributable code because it is easier to determine which parts of the code can be run independently.
* Functional programming also makes it easier to write correct code (for example, through statelessness and immutable data structures).
* Use servers with multiple CPUs / cores / drives.
* Use languages such as XML or JSON to describe semi-structured data.
* Use the fact-based model, which can manage bigger data sets better than a relational model.
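
As a small illustration of describing semi-structured data with JSON, the Python sketch below parses two records that do not share the same fields, something a fixed relational table would struggle to hold. The records and field names are invented for the example.

```python
import json

# Hypothetical semi-structured data: each record carries only the fields it needs.
raw = """
[
  {"id": 1, "type": "email", "subject": "Hello", "attachments": ["a.pdf"]},
  {"id": 2, "type": "image", "width": 640, "height": 480}
]
"""

records = json.loads(raw)

# Fields may be missing, so read them with a default instead of assuming a fixed schema.
subjects = [record.get("subject", "(none)") for record in records]

print(subjects)
print(sorted(records[1].keys()))
```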

(23 cards)