Elements that make up Big Data
Velocity
is the speed at which data accumulates. Data is generated extremely fast, in a process that never stops.
Volume
is the scale of the data or the increase in the amount of data stored.
Variety
is the diversity of the data. Data can be structured, semi-structured, or unstructured. Variety also reflects the different sources that data comes from, such as machines, people, and processes.
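To make the three forms concrete, here is a minimal sketch in which the same fact arrives as a structured CSV row, a semi-structured JSON document, and unstructured free text. The sensor reading, field names, and function names are hypothetical and exist only for illustration.

```python
import csv
import io
import json
import re

# Hypothetical sensor reading, invented for this example.
structured = "sensor_id,temperature_c\nA1,21.5\n"  # fixed columns (CSV)
semi_structured = '{"sensor_id": "A1", "reading": {"temperature_c": 21.5}}'  # JSON
unstructured = "Technician note: sensor A1 showed about 21.5 degrees today."


def from_structured(text):
    """Read the value from a known, fixed column."""
    row = next(csv.DictReader(io.StringIO(text)))
    return float(row["temperature_c"])


def from_semi_structured(text):
    """Navigate the nested keys of the JSON document."""
    return json.loads(text)["reading"]["temperature_c"]


def from_unstructured(text):
    """No schema to rely on: fall back to pattern matching, which is fragile."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*degrees", text)
    return float(match.group(1)) if match else None
```

The less structure the data has, the more work the reader has to do: the CSV column is addressed by name, the JSON value by a path of keys, and the free text only by guessing at a pattern.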
Veracity
is the quality and origin of data, and its conformity to facts and accuracy. With the large amount of data obtained and accumulated, data needs to be classified as real or false, and as accurate and reliable or not.
Value
is our ability and need to turn data into value. Value can be profit, or a medical or social benefit.
Big Data Processing Tools
Hadoop
It is a Java-based open-source framework that allows distributed storage and processing of large datasets across clusters of computers.
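Hadoop's classic processing model is MapReduce. The sketch below is not Hadoop's API. It only simulates the map, shuffle, and reduce steps in plain Python, on the assumption that each string in `blocks` stands for a portion of a dataset held on a different machine. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(block):
    """Emit (word, 1) pairs for one block, as a mapper would on one node."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle_phase(mapped_pairs):
    """Group values by key so that each word is handled by a single reducer."""
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(blocks):
    mapped = []
    for block in blocks:  # on a real cluster these would run in parallel
        mapped.extend(map_phase(block))
    return reduce_phase(shuffle_phase(mapped))


# Two "blocks" standing in for data stored on two different machines.
blocks = ["big data needs big storage", "data never stops"]
counts = word_count(blocks)
```

Because each block is mapped independently, the work can be spread over as many machines as there are blocks. Only the shuffle step needs to move data between them.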
Hadoop Benefits
Hadoop Distributed File System (HDFS)
Storage system for big data.
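HDFS stores a large file by cutting it into fixed-size blocks (128 MB by default in recent Hadoop versions) and keeping several copies of each block (3 by default) on different machines. The following is a simplified sketch of that idea. The function names, node names, and round-robin placement are assumptions made for illustration, and real HDFS placement also takes racks and node load into account.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration, not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + i) % len(nodes)] for i in range(replication)
        ]
    return placement


MB = 1024 * 1024
# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], 3)
```

With three copies of every block, losing a single machine never loses data, because every block still exists on at least two other nodes.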
Hadoop Distributed File System (HDFS) capacities
Hive
It is open-source data warehouse software for reading, writing, and managing large datasets that are stored directly in Hadoop or in other data storage systems.
Spark capacities
Spark
A general-purpose data processing engine designed to extract and process large volumes of data for a wide range of applications in real time.