(4.11) Big Data Flashcards

(9 cards)

1
Q

Definition of Big Data & its 3 characteristics

A

Big data is data that cannot be stored or analysed using traditional processes

Three characteristics of big data:
1. Variety: the data lacks structure
2. Volume: there is a high volume of data
3. Velocity: the data must be processed at high velocity

2
Q

Big data: Volume (challenges)

A

A high volume of data:
* Doesn’t fit on a single server
* Has to be stored across multiple servers
* Is hard to hold in relational databases, which don’t scale well across multiple machines
* Increases complexity, security issues, latency and data inconsistency once spread over many servers

3
Q

Big data: Velocity (challenges)

A

A high velocity of data:
* Data on servers is created and modified rapidly
* Servers must respond within milliseconds

4
Q

Big data: Variety (challenges)

A

A high variety of data:
* Data held on servers consists of many different types of data
* It is hard to process/analyse data held in different structures
* Relational databases are poorly suited because they require data to be stored in row-and-column format

5
Q

What happens when data sizes are too big to fit on a single server?

A
  • processing must be distributed across more than one machine

… functional programming is a solution to this (because it makes it easier to write correct, efficient, distributed code)
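
A minimal sketch of the idea above, assuming the dataset has already been split into partitions. The partition contents and the names `count_words` and `merge_counts` are illustrative and not from the source; a local `map` stands in for the separate machines.

```python
from collections import Counter
from functools import reduce

def count_words(partition):
    """Pure function: the result depends only on the partition passed in."""
    return Counter(word for line in partition for word in line.split())

def merge_counts(left, right):
    """Combine two partial results into a new Counter; neither input is modified."""
    return left + right

# Hypothetical partitions, standing in for data held on different servers.
partitions = [
    ["big data big", "data"],
    ["big velocity"],
]

# Map step: each partition could be processed by a different machine.
partial_counts = list(map(count_words, partitions))

# Reduce step: partial results are merged into one overall total.
total = reduce(merge_counts, partial_counts, Counter())
```

Because `count_words` shares no state between calls, the partitions can be processed in any order, or at the same time, without changing the totals.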

6
Q

Specific features of functional programming that help solve Big Data challenges

A

Functional programming makes it easier to write correct and efficient distributed code

Specific features of FP:
1. Support for immutable data structures (data structures that cannot be modified once created)
2. Statelessness (no side effects; the output for a given input is always the same)
3. Higher-order functions (functions that take one or more functions as arguments or return a function as output)
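
The three features can be sketched in a few lines of Python. This is only an illustration: the `Reading` type, the sensor values, and the function names are assumptions made for the example, not part of the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)          # 1. immutable: fields cannot be reassigned
class Reading:
    sensor: str
    value: float

def to_fahrenheit(reading):
    """2. Stateless: returns a new Reading and touches nothing outside itself."""
    return Reading(reading.sensor, reading.value * 9 / 5 + 32)

def apply_to_all(func, readings):
    """3. Higher-order: takes another function as an argument."""
    return tuple(map(func, readings))

# Hypothetical Celsius readings held in an immutable tuple.
readings = (Reading("a", 0.0), Reading("b", 100.0))
converted = apply_to_all(to_fahrenheit, readings)
```

The original `readings` tuple is unchanged after the call, so another machine working from the same data would see exactly the same input.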

7
Q

What is the fact-based model for representing data?

A

A way of representing Big Data

Reduces the risk of losing data due to human error, because existing data is never overwritten

No index is used for the data

New data is appended to the dataset as it is created

8
Q

How does the fact-based model work?

A
  • each individual piece of data is stored as a fact
  • facts are immutable
  • each fact is stored with a timestamp

new data is appended to the dataset as it is created
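
A minimal sketch of an append-only fact store following the points above. The `Fact` fields, the helper names `record` and `current_value`, and the sample data are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)          # facts are immutable
class Fact:
    entity: str
    attribute: str
    value: str
    timestamp: datetime          # each fact is stored with a timestamp

def record(facts, entity, attribute, value, timestamp):
    """Return a new tuple with one fact appended; earlier facts are untouched."""
    return facts + (Fact(entity, attribute, value, timestamp),)

def current_value(facts, entity, attribute):
    """The current value is taken from the matching fact with the latest timestamp."""
    matching = [f for f in facts if f.entity == entity and f.attribute == attribute]
    return max(matching, key=lambda f: f.timestamp).value if matching else None

# Hypothetical data: a user moves city, so a second fact is appended.
facts = ()
facts = record(facts, "user42", "city", "Leeds", datetime(2023, 1, 5, tzinfo=timezone.utc))
facts = record(facts, "user42", "city", "York", datetime(2024, 3, 9, tzinfo=timezone.utc))
```

The older fact is kept rather than overwritten, so the full history stays available and a mistaken entry can be corrected by appending a newer fact.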

9
Q

Representing big data using graph schema

A

uses graphs consisting of nodes and edges to graphically represent the structure of a dataset

  • nodes represent entities (can contain properties)
  • edges represent relationships (labelled with a brief description)
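
One way to write such a schema down in code, as a rough sketch. The entities (`Customer`, `Order`, `Product`), their properties, and the edge labels are made up for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str                    # the entity
    properties: tuple = ()       # properties held by the entity

@dataclass(frozen=True)
class Edge:
    source: str
    label: str                   # brief description of the relationship
    target: str

# Hypothetical schema for a small shop dataset.
nodes = (
    Node("Customer", ("name", "email")),
    Node("Order", ("date", "total")),
    Node("Product", ("title", "price")),
)
edges = (
    Edge("Customer", "places", "Order"),
    Edge("Order", "contains", "Product"),
)

def relationships_from(edges, entity):
    """List the (label, target) pairs for edges leaving the given entity."""
    return [(e.label, e.target) for e in edges if e.source == entity]
```

Calling `relationships_from(edges, "Customer")` follows the labelled edges out of one node, which mirrors reading the arrows on a drawn graph schema.
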