Big Data Analytics Flashcards

(4 cards)

1
Q
  1. What is Big Data Analytics?
A

It is the process of collecting, processing, and analyzing large volumes of raw data to uncover trends, patterns, and correlations. The goal is to transform terabytes of data (generated by customers, sensors, and transactions) into actionable insights that support data-informed decision-making.

2
Q
  1. The 4-Step Workflow. The document outlines a specific lifecycle for operationalizing big data:
A
  1. Collect: Gathering structured and unstructured data from diverse sources like cloud storage, mobile apps, and IoT sensors.
  2. Process: Organizing the data for analysis. The text highlights two specific methods:
    ◦ Batch Processing: Analyzing large blocks of data that have accumulated over time. It has a longer turnaround but is useful for historical analysis.
    ◦ Stream Processing: Analyzing small batches of events as they arrive in real time. It enables quicker decision-making but is more complex and expensive.
  3. Clean: Scrubbing “dirty data” by fixing formatting errors and removing duplicates. This is critical because poor data quality leads to flawed insights.
  4. Analyze: Applying advanced techniques to the prepared data.
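
A minimal Python sketch of the Clean step and of the batch versus stream distinction described above. The records, field names, and function names are hypothetical and chosen only for illustration; real pipelines would use dedicated tools. Cleaning is applied first here simply so that both processing styles operate on the same tidy rows.

```python
from collections import defaultdict

# Hypothetical raw records; names and values are made up for illustration.
RAW = [
    {"customer": " Alice ", "amount": "10.50"},
    {"customer": "alice", "amount": "10.50"},   # duplicate once formatting is fixed
    {"customer": "Bob", "amount": "4"},
    {"customer": "BOB ", "amount": "6.25"},
]

def clean(records):
    """Fix formatting (trim, lowercase, parse numbers) and drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        row = (rec["customer"].strip().lower(), float(rec["amount"]))
        if row not in seen:
            seen.add(row)
            cleaned.append(row)
    return cleaned

def batch_totals(rows):
    """Batch style: process the whole block at once and return the final totals."""
    totals = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)

def stream_totals(rows):
    """Stream style: update running totals per event and yield a snapshot each time."""
    totals = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
        yield dict(totals)

rows = clean(RAW)
final_batch = batch_totals(rows)
snapshots = list(stream_totals(rows))

print(rows)
print(final_batch)
print(snapshots[0])
```

The batch function gives one answer after seeing everything, while the stream function gives a usable (partial) answer after every event; the last snapshot matches the batch result.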
3
Q
  1. Key Analysis Methods
A
  • Data Mining: Sorting through datasets to identify anomalies and clusters.
  • Predictive Analytics: Using historical data to forecast future risks and opportunities.
  • Deep Learning: A subset of machine learning (itself a branch of AI) that stacks layers of algorithms, typically neural networks, to find patterns in abstract, complex data.
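
A small Python sketch of the first two methods, under simplifying assumptions: anomalies are flagged with a z-score threshold (one of many possible data mining techniques), and the forecast is a straight-line least-squares fit (a very basic form of predictive analytics). The function names, the threshold, and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return values whose z-score exceeds the threshold (assumed cutoff of 2)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

def linear_forecast(history, steps_ahead=1):
    """Fit y = intercept + slope * x by least squares and extrapolate forward."""
    n = len(history)
    xs = list(range(n))
    x_bar = mean(xs)
    y_bar = mean(history)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)

# Made-up sample data for illustration only.
readings = [10, 11, 9, 10, 12, 10, 50]
sales = [10, 12, 14, 16]

anomalies = find_anomalies(readings)
next_value = linear_forecast(sales)

print(anomalies)
print(next_value)
```

Both functions use only historical values: the first looks for points that deviate from the rest, the second projects the existing trend one step into the future.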
4
Q
  1. Essential Tools & Technologies. Big Data requires an ecosystem of tools rather than a single solution:
A
  • Hadoop: An open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware.
  • NoSQL Databases: Non-relational systems designed for unstructured and semi-structured data, unlike traditional SQL databases, which expect a fixed schema.
  • Spark: A cluster computing framework capable of handling both batch and stream processing for fast computation.
  • Tableau: A platform for visual analysis and sharing insights across the organization.