Big data Flashcards

(23 cards)

1
Q

What is ‘Big Data’?

A

‘Big Data’ is a catch-all term for datasets that are too large, too fast-changing, or too varied to be stored and processed by conventional database systems.

2
Q

What are the three defining features of Big Data known as ‘the three Vs’?

A
  • Volume
  • Velocity
  • Variety
3
Q

What does ‘Volume’ refer to in Big Data?

A

There is too much data to fit on a single conventional hard drive, or even a single server.

4
Q

What does ‘Velocity’ refer to in Big Data?

A

Data on the servers is created and modified rapidly, requiring responses within milliseconds.

5
Q

What does ‘Variety’ refer to in Big Data?

A

The data consists of many different types, including binary and multimedia files.

6
Q

Why is the unstructured nature of Big Data challenging?

A

It makes it difficult to analyze the data using conventional databases that require a structured format.

7
Q

What is a primary requirement for processing Big Data stored over multiple servers?

A

The processing must be distributed across more than one machine.

8
Q

What programming paradigm is particularly suited for distributed processing of Big Data?

A

Functional programming.

9
Q

What are the characteristics of functional programming that aid in distributed code?

A
  • Stateless
  • Immutable data structures
  • Higher-order functions
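The three characteristics above can be illustrated with a minimal Python sketch (not from the source cards; the function names are illustrative). Because each mapping function is pure and stateless, the chunks could be processed on separate machines and the partial results merged afterwards, in any order:

```python
from functools import reduce
from collections import Counter

def count_words(chunk):
    """Pure, stateless: the same input always yields the same Counter."""
    return Counter(chunk.split())

def merge(a, b):
    """Associative combine step, so partial results can be merged in any grouping."""
    return a + b

chunks = ["big data big servers", "data velocity data"]

# map and reduce are higher-order functions: they take other functions as arguments.
partials = map(count_words, chunks)        # each call could run on a different node
total = reduce(merge, partials, Counter())
print(total["data"])  # -> 3
```

Nothing here mutates shared state, which is exactly why this style of code is easy to distribute.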
10
Q

What is the fact-based model for representing data?

A

A way of storing each piece of information as a fact that is immutable and includes a timestamp.

11
Q

What does it mean that facts in the fact-based model are immutable?

A

Facts never change once created and cannot be overwritten.

12
Q

How does the fact-based model reduce data loss?

A

It prevents accidental data loss due to human error by not allowing overwriting of facts.

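A toy append-only fact store sketches how this works in practice (a minimal illustration, not a real system; the `record`/`current` helpers are hypothetical). Facts are immutable tuples with a timestamp; a newer fact supersedes an older one without overwriting it:

```python
from datetime import datetime, timezone

facts = []  # append-only: facts are never modified or deleted

def record(entity, attribute, value):
    """Store an immutable, timestamped fact."""
    facts.append((entity, attribute, value, datetime.now(timezone.utc)))

def current(entity, attribute):
    """The 'current' value is simply the most recently recorded fact."""
    for e, a, value, ts in reversed(facts):
        if e == entity and a == attribute:
            return value
    return None

record("user42", "email", "old@example.com")
record("user42", "email", "new@example.com")   # supersedes, never overwrites
print(current("user42", "email"))  # -> new@example.com
print(len(facts))                  # both facts survive -> 2
```

Because the old fact is still present, an accidental write cannot destroy information: the full history remains available.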
13
Q

What is graph schema used for in Big Data?

A

To graphically represent the structure of a dataset using nodes and edges.

14
Q

In a graph schema, what do nodes represent?

A

Entities that can contain properties.

15
Q

What do edges represent in a graph schema?

A

Relationships between entities, labelled with a brief description.

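A graph schema of this kind can be sketched with plain Python data structures (an illustrative toy, not a graph database; the node names and `related` helper are assumptions): nodes are entities holding properties, and edges are relationships labelled with a brief description.

```python
# Nodes: entities with properties
nodes = {
    "alice": {"type": "Person", "age": 30},
    "acme":  {"type": "Company", "city": "London"},
}

# Edges: labelled relationships between entities, as (from, label, to)
edges = [
    ("alice", "works_for", "acme"),
]

def related(node, label):
    """Follow all edges with the given label from a node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]

print(related("alice", "works_for"))  # -> ['acme']
```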
16
Q

Are timestamps commonly included in graph schema diagrams?

A

No, timestamps are rarely included; it is assumed that each node contains the most recent information.

17
Q

Fill in the blank: The processing associated with using Big Data must be split across multiple _______.

A

servers

18
Q

True or False: Conventional databases are well-suited for storing Big Data.

A

False. Conventional databases require a structured format, which Big Data typically lacks.

19
Q

What is an alternative method to represent properties in graph schema?

A

Listing an entity’s properties inside rectangles joined to entities with a dashed line.

20
Q

Examples of Big Data volume

A
  • Hundreds of terabytes
  • Large medical datasets for diagnosis
  • Gene sequencing
  • Predicting disease outbreaks
  • Results of large-scale scientific experiments

21
Q

Examples of Big Data variety

A
  • Cannot be represented in a table or by a relational database
  • Email messages
  • Videos
  • Images
  • Website contents
  • Facial recognition

22
Q

Examples of Big Data velocity

A
  • Thousands of items to process per second
  • Data must be processed as it is received; it cannot be batched and processed later
  • Card payment fraud detection
  • Recommendation systems

23
Q

Explain some of the challenges that Big Data brings with it and the approaches that can be taken to overcome these, in relation to programming and hardware.

A

Challenges:
  • Data cannot be stored on one server.
  • It is not possible to process the data quickly enough with one computer.
  • Data cannot be represented in a relational database.
  • Some unstructured data are difficult to analyse.

Approaches to overcome them:
  • Distributed database systems, spread across multiple servers.
  • Use of functional programming.
  • Functional programming makes it easier to write distributable code and to determine which parts of the code can be run independently.
  • Functional programming features such as statelessness and immutable data structures make it easier to write correct code.
  • Use of servers with multiple CPUs / cores / drives.
  • Use of languages such as XML or JSON to describe semi-structured data.
  • Use of a fact-based model, which can manage bigger datasets better than a relational model.
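The JSON point can be illustrated with a short sketch (the record fields are invented for illustration): records of different kinds carry different fields, which would not fit into one relational table, yet JSON describes them all uniformly.

```python
import json

# Semi-structured data: each record has its own shape
records = [
    {"id": 1, "type": "email", "subject": "Hello", "attachments": 2},
    {"id": 2, "type": "video", "duration_s": 95, "codec": "h264"},
]

text = json.dumps(records)   # serialise without needing a fixed schema
back = json.loads(text)      # round-trips losslessly
print(back[1]["codec"])  # -> h264
```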