4.11 Big Data Flashcards

(9 cards)

1
Q

What is meant by Big Data?

A

A catch-all term for data that does not fit into the usual containers (for example, a single server or a relational table) and so requires other techniques to process.

2
Q

What are the three attributes of Big Data, and what does each entail?

A
  • Volume: The data is too big to fit on a single server
  • Velocity: The data changes very frequently and must be processed within a short amount of time
  • Variety: The data comes in many different forms
3
Q

What are the problems with Big Data regarding volume?

A
  • Processing has to be carried out across multiple machines
  • Results of processing can vary if multiple machines operate on connected data
4
Q

What are the problems with Big Data regarding velocity?

A
  • Because the data is spread over many locations, the time taken to communicate between them can limit how much processing is possible in the time available
5
Q

What are the problems with Big Data regarding variety?

A
  • It is difficult to determine the structure of the data
  • Traditional storage methods are not appropriate because the data does not fit into a table format
6
Q

What techniques overcome the Big Data problems that relate to volume?

A
  • Programs written in a functional style can be distributed across machines and run in isolation, because functions without side effects cannot interfere with one another
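
As a rough illustration of the point above, the sketch below splits a word count into pure functions. The names count_words and merge_counts and the sample chunks are hypothetical, and the chunks are processed in one process here; on a real cluster each chunk would be handled by a different machine.

```python
from functools import reduce


def count_words(chunk: str) -> dict[str, int]:
    """Pure function: the result depends only on the chunk passed in."""
    counts: dict[str, int] = {}
    for word in chunk.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def merge_counts(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Combine two partial results without modifying either input."""
    merged = dict(a)
    for word, n in b.items():
        merged[word] = merged.get(word, 0) + n
    return merged


# Hypothetical data: each string stands in for the part held by one machine.
chunks = ["big data big", "data is big", "fast data"]

# Each chunk is counted in isolation, then the partial results are combined.
partials = list(map(count_words, chunks))
total = reduce(merge_counts, partials, {})
```

Because neither function reads or writes shared state, the chunks can be counted in any order, or at the same time, and the combined total is the same.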
7
Q

What techniques overcome the Big Data problems that relate to variety?

A

  • Data can be represented using a fact-based model to avoid a structure that is too rigid
  • Links between data items can be captured using a graph schema rather than a rigid structure
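
A minimal sketch of the graph idea, assuming a simple in-memory layout: nodes carry whatever properties they happen to have, and edges are labelled links between them. The node names, edge labels, and helper functions are all made up for illustration.

```python
# Nodes do not need to share the same set of properties.
nodes = {
    "alice": {"type": "Person", "age": 34},
    "bob": {"type": "Person"},
    "acme": {"type": "Company", "sector": "retail"},
}

# Each edge is (source node, relationship label, target node).
edges = [
    ("alice", "friend_of", "bob"),
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
]


def related(node_id: str, relationship: str) -> list[str]:
    """Return the nodes that node_id points to via the given relationship."""
    return [t for s, label, t in edges if s == node_id and label == relationship]


def pointing_to(node_id: str, relationship: str) -> list[str]:
    """Return the nodes that point to node_id via the given relationship."""
    return [s for s, label, t in edges if t == node_id and label == relationship]
```

Here related("alice", "works_for") follows an edge forwards and pointing_to("acme", "works_for") follows edges backwards. A new kind of link is added by appending an edge with a new label, without redesigning any table.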

8
Q

What are the features of a fact-based model?

A
  • Data is stored as individual facts, each with a timestamp
  • Each fact is immutable
  • A fact is never changed in place; it is superseded by appending a new fact with a newer timestamp
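
The features above could be sketched as follows. The Fact class, the record and current_value helpers, and the sample values are assumptions made for illustration, and integer timestamps stand in for real date and time values.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)  # frozen=True makes each fact immutable once created
class Fact:
    entity: str
    attribute: str
    value: Any
    timestamp: int


facts: list[Fact] = []  # append-only store; existing entries are never edited


def record(entity: str, attribute: str, value: Any, timestamp: int) -> None:
    """Add a new fact; older facts about the same attribute are kept."""
    facts.append(Fact(entity, attribute, value, timestamp))


def current_value(entity: str, attribute: str) -> Optional[Any]:
    """Return the value of the most recent matching fact, if there is one."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    if not matching:
        return None
    return max(matching, key=lambda f: f.timestamp).value


# Hypothetical example: a user moves city, so a second fact is appended.
record("user42", "city", "Leeds", 1)
record("user42", "city", "York", 2)
```

After both calls the store holds two facts. The older one is still present, and current_value("user42", "city") picks the one with the later timestamp.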
9
Q

What are the advantages of fact-based modelling?

A
  • Because the data is immutable, the risk of losing data through human error is reduced
  • No indexing is needed, because new data is simply appended to the dataset