What is Big Data?
The term Big Data describes data that can’t be processed or analysed with traditional tools and processes because it falls into one or more of the following categories:
- too big to fit on a single server
- too heterogeneous (diverse in character or content): structured, semi-structured, or totally unstructured
- produced at very high rates
The three defining features of big data can be remembered as “the three Vs”:
Velocity, Volume and Variety
Big Data Velocity
Define data in motion (velocity)
Define data at rest (velocity) and its batch processing
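The contrast between data in motion (process each record as it arrives) and data at rest (process a complete stored dataset as one batch job) can be sketched in Python. This is a toy illustration under my own naming, not code from any streaming framework:

```python
# Toy sketch contrasting batch processing of data at rest with
# incremental processing of data in motion. All names here are
# illustrative, not from a real library.

def batch_average(stored_readings):
    """Data at rest: the full dataset already exists in storage,
    so we can compute over it in one pass as a batch job."""
    return sum(stored_readings) / len(stored_readings)

class StreamingAverage:
    """Data in motion: records arrive one at a time and must be
    processed on arrival, keeping only a small running summary."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, reading):
        self.count += 1
        self.total += reading
        return self.total / self.count  # current running average

readings = [4.0, 8.0, 6.0, 2.0]
print(batch_average(readings))   # one-shot batch result: 5.0

stream = StreamingAverage()
for r in readings:
    current = stream.update(r)   # answer refreshed on every arrival
print(current)                   # converges to the same value: 5.0
```

Both paths reach the same answer; the difference is that the streaming version never needs the whole dataset at once, which is what high-velocity data demands.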
Big Data Volume
Volume in Big Data refers to the size of the data to be processed. Large volumes of data fall into the Big Data category if that data must be analysed as a single dataset.
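One way to see the volume problem: a file too large for memory can still be analysed as a single logical dataset by streaming it in fixed-size chunks and combining partial results. A minimal sketch, assuming an illustrative file name and chunk size:

```python
# Toy sketch: analyse one logical dataset without ever loading it
# whole. The chunk size and any file names are illustrative.

def count_lines_in_chunks(path, chunk_size=1 << 20):
    """Count newline characters while reading at most chunk_size
    bytes at a time, so memory use stays constant."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty read means end of file
                break
            total += chunk.count(b"\n")
    return total

# e.g. count_lines_in_chunks("huge_log.txt") streams the file
# chunk by chunk; the result is the same as for a one-shot read.
```

Real Big Data systems extend this idea across machines: each server processes its local chunks and the partial results are merged.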
What is a distributed file system?
A distributed file system is one in which the blocks of individual files are spread across more than one server.
e.g. Google’s distributed file system is GFS. Yahoo, Facebook, and Twitter use HDFS, the Hadoop Distributed File System.
Both systems use racks of servers, with network switches interconnecting the servers within a rack and connecting racks to one another.
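The block-placement idea behind systems like GFS and HDFS can be modelled in a few lines: split a file into fixed-size blocks and store each block on several servers, preferably in different racks, so a single server or rack failure loses no data. This is a simplified model under assumed constants, not the actual HDFS or GFS placement algorithm:

```python
# Simplified model (not the real HDFS/GFS algorithm) of block
# placement in a distributed file system. Constants are illustrative.

BLOCK_SIZE = 4   # bytes per block; real systems use 64-128 MB
REPLICAS = 2     # copies kept of each block

# (rack, server) pairs standing in for a two-rack cluster
servers = [("rack1", "s1"), ("rack1", "s2"),
           ("rack2", "s3"), ("rack2", "s4")]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut the file's bytes into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks):
    """Assign each block to REPLICAS servers, striding over the
    server list so replicas land in different racks."""
    placement = {}
    for i, block in enumerate(blocks):
        chosen = [servers[(i + r * 2) % len(servers)] for r in range(REPLICAS)]
        placement[i] = {"data": block, "servers": chosen}
    return placement

layout = place_blocks(split_into_blocks(b"hello distributed world"))
for idx, info in layout.items():
    print(idx, info["servers"])   # each block sits on two racks
```

Reading the file back means fetching each block from any one of its replicas, which is why losing a server (or a whole rack) does not lose the file.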
Big Data Variety
What are some examples of Big Data?
Twitter, continuously monitored banking interactions, data from surveillance systems
How does Machine Learning benefit Big Data?
What are the principles of fact-based modelling?
(Big Data can be stored this way.)
What are the advantages of fact-based modelling?
Graph schema (what does each component represent?)
Functional Programming