Definition of Big Data & 3 types of Big Data
Big data is data that cannot be stored or analysed using traditional processes and tools
Types (the "3 Vs") of big data:
1. Variety: data lacking structure
2. Volume: large quantities of data
3. Velocity: data that must be processed at high speed
Big data: Volume (challenges)
A high volume of data:
* Doesn't fit on a single server
* Has to be stored across multiple servers
* Relational databases don't scale well across multiple machines
* Distribution increases complexity, security risks, latency and the chance of data inconsistency
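As a toy sketch of storing data across multiple servers (the server names and hash-based partitioning scheme here are illustrative, not any particular database's method), each record key can be mapped deterministically to one machine so that no single server holds the full dataset:

```python
# Toy sketch of hash-based sharding: each record key is mapped to one
# of several servers, so no single machine must hold the full dataset.
SERVERS = ["server-0", "server-1", "server-2"]  # hypothetical names

def shard_for(key: str) -> str:
    """Pick a server for a record deterministically from its key."""
    return SERVERS[hash(key) % len(SERVERS)]

# Every lookup for the same key goes to the same server.
assert shard_for("user:42") == shard_for("user:42")
print(shard_for("user:42") in SERVERS)  # True
```

The complexity, latency and consistency costs in the list above come from coordinating reads and writes across shards like these.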
Big data: Velocity (challenges)
A high velocity of data:
* Data on servers is created and modified rapidly
* Servers must respond within milliseconds
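One common way to cope with rapidly arriving data is to process it as a stream rather than store everything first. A minimal sketch (illustrative names, not a real streaming framework) keeps a rolling count of events in the last second:

```python
from collections import deque
import time

# Sketch of high-velocity processing: track only the events that
# arrived within the last second, discarding older ones as we go.
window = deque()          # timestamps of recent events
WINDOW_SECONDS = 1.0

def record_event(now: float) -> int:
    """Add an event and return how many arrived in the last second."""
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window)

t = time.time()
for i in range(5):
    count = record_event(t + i * 0.1)  # events 100 ms apart
print(count)  # 5 - all five fall inside the 1-second window
```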
Big data: Variety (challenges)
A high variety of data:
* Data held on servers consists of many different types of data
* Hard to process/analyse data held in different structures
* Relational databases are a poor fit because they require data to be stored in row-and-column format
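To make the variety problem concrete, here is a small sketch (invented sample data) of the same kind of records arriving in two different formats; each source needs its own parsing logic before the data can be analysed together:

```python
import json
import csv
import io

# The same logical records arriving in two different formats:
# structured CSV and semi-structured JSON (illustrative data only).
csv_text = "name,age\nAlice,30\nBob,25"
json_text = '[{"name": "Carol", "age": 41}]'

records = []
records.extend(csv.DictReader(io.StringIO(csv_text)))  # CSV parser
records.extend(json.loads(json_text))                  # JSON parser

# Only after per-format parsing can the data be processed uniformly.
print([r["name"] for r in records])  # ['Alice', 'Bob', 'Carol']
```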
What happens when data sizes are too big to fit on a single server?
… functional programming is one solution to this (it makes it easier to write correct, efficient, distributed code)
Specific features of functional programming that help solve Big Data challenges
Functional programming makes it easier to write correct and efficient distributed code
Specific features of FP:
1. Immutable data structures (data structures that cannot be modified after creation)
2. Statelessness (no side effects; the output for a given input is always the same)
3. Higher-order functions (functions that take one or more functions as arguments or return a function as an output)
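The three features above can be illustrated in a few lines of Python; the map/reduce pattern shown is the same idea that distributed frameworks scale across many machines:

```python
from functools import reduce

# 1. Immutable data: a tuple cannot be modified in place.
readings = (3, 1, 4, 1, 5, 9)

# 2. Stateless (pure) function: no side effects, and the same input
#    always produces the same output.
def square(x: int) -> int:
    return x * x

# 3. Higher-order functions: map and reduce take functions as arguments.
squared = tuple(map(square, readings))
total = reduce(lambda acc, x: acc + x, squared, 0)

print(total)  # 133, the sum of squares
```

Because `square` has no side effects and `readings` is never mutated, the work can be split across machines and combined in any order without changing the result.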
What is the fact-based model for representing data?
A way of representing Big Data as a collection of immutable facts
Reduces the risk of losing data through human error because existing data is never updated or deleted; no index is used for the data
New data is appended to the dataset as it is created
How does the fact-based model work?
New data is appended to the dataset as it is created; existing facts are never modified or deleted
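A minimal sketch of an append-only fact store (the function names and record layout are illustrative, not from any particular framework): facts are only ever appended with a timestamp, so a "change" is a new fact rather than an update, and the old value survives as history:

```python
import time

# Append-only fact store: facts are never updated or deleted.
facts = []

def record_fact(entity: str, attribute: str, value) -> None:
    facts.append({"entity": entity, "attribute": attribute,
                  "value": value, "ts": time.time()})

record_fact("user:1", "name", "Alice")
record_fact("user:1", "city", "London")
record_fact("user:1", "city", "Paris")   # a new fact, not an update

def current_value(entity: str, attribute: str):
    """The latest fact wins; older facts remain as history."""
    matching = [f for f in facts
                if f["entity"] == entity and f["attribute"] == attribute]
    return matching[-1]["value"] if matching else None

print(current_value("user:1", "city"))  # Paris
```

A mistaken write cannot destroy earlier data, which is why this model reduces the risk of loss through human error.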
Representing big data using a graph schema
Uses graphs consisting of nodes (entities) and edges (relationships between them) to represent the structure of a dataset
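A graph schema can be sketched with plain data structures (the node and edge names below are invented examples): nodes are entities, and each edge links two nodes with a named relationship:

```python
# Minimal graph schema sketch: nodes are entities, edges are
# (source, relationship, destination) triples. Illustrative data only.
nodes = {"alice", "bob", "acme"}
edges = {
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("alice", "knows", "bob"),
}

def neighbours(node: str) -> set:
    """All nodes reachable from `node` along one outgoing edge."""
    return {dst for src, rel, dst in edges if src == node}

print(sorted(neighbours("alice")))  # ['acme', 'bob']
```

Because relationships are explicit edges rather than foreign-key joins, this representation handles loosely structured, varied data more naturally than a row-and-column schema.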