What are the 4 Vs of Big Data?
Volume, velocity, veracity, and variety
What does volume refer to in terms of big data?
The size of the data being processed
For example, YouTube processes 72 hours of video uploads every minute
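A quick back-of-the-envelope calculation shows what that rate adds up to over a day. It takes the 72 hours per minute figure above at face value; the variable names are only illustrative.

```python
# Upload rate quoted above: 72 hours of video per minute of wall-clock time.
HOURS_UPLOADED_PER_MINUTE = 72

# One day has 24 * 60 = 1440 minutes.
MINUTES_PER_DAY = 24 * 60

# Total hours of video uploaded in a single day.
hours_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY

# The same amount expressed as years of continuous playback (365-day year).
years_per_day = hours_per_day / (24 * 365)

print(hours_per_day)            # 103680
print(round(years_per_day, 1))  # 11.8
```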
What are the challenges associated with the volume of big data?
What does variety refer to in terms of big data?
The complexity of the data: it can come in many different forms with varying levels of structure
What are issues associated with the variety of big data?
What does velocity refer to in terms of big data?
The speed at which data is created, stored, and analyzed, i.e. the amount of data per unit of time
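Velocity as "data per unit of time" can be sketched as an events-per-second measurement over a recent time window. This is a minimal illustration only; the function name `throughput` and the sample arrival times are made up for the example.

```python
def throughput(timestamps, window_seconds):
    """Return events per second over the most recent window (hypothetical helper)."""
    if not timestamps:
        return 0.0
    latest = max(timestamps)
    # Keep only events that arrived within the window ending at the latest event.
    recent = [t for t in timestamps if latest - t < window_seconds]
    return len(recent) / window_seconds

# Made-up arrival times in seconds: a slow start followed by a burst.
arrivals = [0.0, 0.5, 1.0, 8.2, 8.9, 9.1, 9.5, 9.9]

print(throughput(arrivals, window_seconds=2.0))  # 2.5
```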
Why is it important that we use real-time processing to keep up with the velocity of big data?
Real-time processing reduces the risk of missing opportunities by acting too slowly. The faster we can act on data insights, the better the results
What does veracity refer to in terms of big data?
The quality of the data, including its validity and volatility, as well as the reliability of the data source
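One simple way to put a number on validity is the share of records that pass basic checks. The sketch below assumes a made-up record layout and an arbitrary plausible age range of 0 to 120; real checks depend on the data set.

```python
# Made-up records; two of them have quality problems.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": -5},    # outside the plausible range
    {"id": 4, "age": 51},
]

def is_valid(record):
    """Hypothetical validity rule: age must be present and between 0 and 120."""
    age = record["age"]
    return age is not None and 0 <= age <= 120

valid = [r for r in records if is_valid(r)]
validity_rate = len(valid) / len(records)

print(validity_rate)  # 0.5
```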
What are the 5 steps of the data science process?
Acquire, prepare, analyze, report, act
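The five steps can be pictured as a chain of functions, each feeding the next. This is a toy sketch with made-up data and a made-up threshold, not a prescribed implementation of any step.

```python
def acquire():
    # Hypothetical raw readings, including one empty entry.
    return ["12", "15", "", "11", "14"]

def prepare(raw):
    # Drop empty entries and convert the rest to numbers.
    return [int(x) for x in raw if x]

def analyze(values):
    # A deliberately simple "model": the mean of the readings.
    return sum(values) / len(values)

def report(mean):
    return f"Average reading: {mean:.1f}"

def act(mean, threshold=12.5):
    # The threshold is an arbitrary value chosen for illustration.
    return "investigate" if mean > threshold else "no action"

mean = analyze(prepare(acquire()))
print(report(mean))  # Average reading: 13.0
print(act(mean))     # investigate
```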
What is included in the acquire data step of the data science process?
What are the 2 parts of the prepare step of the data science process?
A. Explore data
B. Pre-process data
What is included in the explore data step of the data science process?
What is included in the pre-process data step of the data science process?
What is included in the analyze data step of the data science process?
Select analytical techniques, build models, and run the models on the data set to produce output
What are some common analysis techniques?
What is the goal of classification analysis?
Predicting a category based on data. Supervised learning technique
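As a minimal sketch of classification, a 1-nearest-neighbor rule assigns a new point the category of the closest labeled example. The training data and labels here are invented for illustration, and nearest neighbor is just one of many classification techniques.

```python
# Labeled training examples: (height_cm, weight_kg) -> category (made-up data).
training = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((180, 85), "large"),
    ((185, 90), "large"),
]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(point):
    """Return the label of the training example closest to `point`."""
    _, label = min(training, key=lambda example: sq_dist(example[0], point))
    return label

print(classify((152, 52)))  # small
print(classify((182, 88)))  # large
```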
What is the goal of regression analysis?
Predicting a numeric value based on data. Supervised learning technique
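A minimal regression sketch is a least-squares line fit through a few points, then used to predict a new numeric value. The data below is made up and lies exactly on y = 2x + 1 so the result is easy to check.

```python
# Made-up observations that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Ordinary least-squares estimates for a single input variable.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

def predict(x):
    """Predict a numeric value from the fitted line."""
    return slope * x + intercept

print(slope, intercept)  # 2.0 1.0
print(predict(5))        # 11.0
```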
What is the goal of cluster analysis?
Organizing similar items into groups. Unsupervised learning technique
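Grouping without labels can be sketched with a small one-dimensional k-means loop: assign each value to its nearest center, then move each center to the mean of its group. The values and starting centers are made up, and k-means is only one possible clustering technique.

```python
def kmeans_1d(values, centers, iterations=10):
    """Toy 1-D k-means: returns (groups, centers) after a fixed number of passes."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        # Assignment step: put each value in the group of its nearest center.
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Update step: move each center to the mean of its group (keep it if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers

values = [1, 2, 3, 10, 11, 12]
groups, centers = kmeans_1d(values, centers=[0.0, 5.0])

print(groups)   # [[1, 2, 3], [10, 11, 12]]
print(centers)  # [2.0, 11.0]
```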
What is the goal of association analysis?
To find rules to capture associations between items. Unsupervised learning technique
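Association rules are commonly scored with support (how often items appear together) and confidence (how often the rule holds when its left side is present). The sketch below uses made-up shopping baskets and hypothetical helper names.

```python
# Made-up transactions (market baskets).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))                   # 0.5
print(round(confidence({"bread"}, {"butter"}), 2))    # 0.67
```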
What is the goal of graph analytics?
To use graph structures to find connections between entities. Unsupervised learning technique
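Finding connections between entities can be sketched as a breadth-first search over an undirected graph, which collects everyone reachable from a starting node. The names and edges are made up for illustration.

```python
from collections import deque

# Made-up "knows" relationships between people.
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]

# Build an undirected adjacency map.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def connected(start):
    """Return the set of nodes reachable from `start` (including itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

print(sorted(connected("ann")))   # ['ann', 'bob', 'cat']
print("dan" in connected("ann"))  # False
```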
What are the 3 steps of the data analysis process?
What is included in the report step of the data science process?
Communicating results effectively, including anything that was unexpected or went wrong, to increase the chances for learning
What is included in the act step of the data science process?
Apply the results to take action. Determine the next steps, which may include revisiting the model depending on the results
What are the 4 characteristics of patterns and models that are the goal of data mining?