F.4. Data Analytics and Business Intelligence Flashcards

Understand how organizations leverage business intelligence, big data, and data mining techniques. (35 cards)

1
Q

What is data analytics?

A

The process of gathering and analyzing data to discover patterns and draw conclusions in a way that produces meaningful information that can be used to aid in decision-making.

2
Q

What is the primary goal of data analytics?

A

To provide information about issues that the analyst or manager either already knows about or knows that he or she does not know (known unknowns).

3
Q

What is data science?

A

A field of study and analysis that uses algorithms and processes to extract hidden knowledge and insights from data.

4
Q

What is the goal of data science?

A

To provide actionable insights into issues where the analyst or manager does not know what he or she does not know (that is, “unknown unknowns”).

5
Q

What is business intelligence?

A

The combination of architectures, analytical and other tools, databases, applications, and methodologies that enable interactive access, sometimes in real time, to data such as sales revenue, costs, income, and product data.

6
Q

What are the four phases of the business intelligence process, and what do they lead to?

A

It involves the transformation of data into information, then to knowledge, and finally to insight.

The insights gained from the use of business intelligence lead to recommendations for the best action to take.

7
Q

What are the four main components of a Business Intelligence system?

A
  • A data warehouse (DW) containing the source data.
  • Business analytics tools to mine, manipulate, and analyze the data.
  • A business performance management (BPM) component.
  • A user interface, usually in the form of a dashboard.
8
Q

Define:

Big Data

A

Vast datasets that are too large to be analyzed using standard software tools and require new processing technologies.

9
Q

What are the four V’s of Big Data?

A
  • Volume - the amount of data
  • Velocity - the speed at which data is generated and changed
  • Variety - the diverse forms of data that organizations create and collect
  • Veracity - the accuracy of data, the extent to which it can be trusted for decision-making
10
Q

What is the difference between structured, unstructured, and semi-structured data?

A
  • Structured data: Organized format, suitable for relational databases.
  • Unstructured data: No defined format, text-heavy.
  • Semi-structured data: Some format, not fully organized in relational databases.
11
Q

What is the purpose of a dashboard in a Business Intelligence system?

A

To organize and display information relevant to a given objective or process, often with interactive elements for data exploration.

12
Q

What is data mining?

A

The use of statistical techniques to search large datasets to discover previously unknown, useful patterns, trends, and relationships.

13
Q

What is the main objective of data science?

A

To extract hidden knowledge and insights from data for forecasting and strategic decision making.

14
Q

What are some general challenges of managing data analytics?

A
  • Data capture
  • Data curation (the organization and integration of disparate data collected from various sources)
  • Data storage
  • Security and privacy protection
  • Data search
  • Data sharing
  • Data transfer
  • Data analysis
  • Data visualization
15
Q

What is the role of a data scientist?

A

A professional with skills in statistics, data analysis, machine learning, math, programming, business, and IT, focusing on extracting insights from data.

16
Q

What is the significance of “veracity” in Big Data?

A

It refers to the accuracy of data: the extent to which data is objective and relevant for decision-making. Attention to veracity keeps biased, ambiguous, irrelevant, inconsistent, incomplete, or even deceptive data out of the analysis, where it would result in poor decisions.

17
Q

What is the iterative process in data mining?

A

The repetition of a process to generate a sequence of outcomes, where each iteration’s outcome is the starting point for the next.

18
Q

What is the difference between data analytics and data science?

A
  • Data analytics focuses on known unknowns.
  • Data science aims to provide insights into unknown unknowns.
19
Q

What is the benefit of combining good data and good data science?

A

They enable data-driven decision-making, leveraging relevant information, transforming data into insights, discovering opportunities, and increasing competitive advantage. They can lead to large productivity gains for a company and the ability to do things it has never done before.

20
Q

What is the primary purpose of data mining?

A

To create predictions and make inferences about relationships using historical data.

21
Q

What are the basic concepts of predictive analytics?

A
  • Classification
  • Prediction
  • Association rules
  • Online recommendation systems
  • Data reduction
  • Clustering
  • Dimension reduction
  • Data exploration
  • Data visualization
22
Q

What is classification in predictive analytics in data mining?

A

Classification involves determining which category data belongs to, such as whether a customer will purchase or not purchase. Data is assigned to predefined classes using algorithms to predict what the classification is or will be.

23
Q

What is the difference between classification and prediction in predictive analytics in data mining?

A
  • Classification involves determining which category data belongs to, such as whether a customer will purchase or not purchase.
  • Prediction involves estimating a continuous numerical value using regression analysis, such as estimating the purchase amount for customers who are predicted to purchase (causal forecasting).

Both classification and prediction answer the question, “What will happen?” but classification answers with categories while prediction answers with numbers.
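
A minimal Python sketch of the distinction, under stated assumptions: the visit and spend figures are made up, the purchase probability passed to classify is assumed to come from some earlier model, and the 0.5 cutoff and all function names are illustrative only. The classifier answers with a category; the least-squares regression answers with a number.

```python
# Toy data (hypothetical): site visits and amount spent by past purchasers.
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 42.0, 48.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def classify(purchase_probability, cutoff=0.5):
    """Classification: answer with a category (cutoff is an assumed value)."""
    return "purchase" if purchase_probability >= cutoff else "no purchase"


def predict_spend(n_visits, intercept, slope):
    """Prediction: answer with a continuous number."""
    return intercept + slope * n_visits


intercept, slope = fit_line(visits, spend)
print(classify(0.8))
print(round(predict_spend(6, intercept, slope), 2))
```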

24
Q

What are association rules in predictive analytics in data mining?

A

They are used to find patterns of association between items in large databases, such as associations among items purchased from a retail store, or “what goes with what.”
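
One way to make "what goes with what" concrete is to compute support and confidence, two standard measures for association rules that the card itself does not name. The Python sketch below assumes a handful of made-up baskets; the item names and helper functions are hypothetical.

```python
# Hypothetical market baskets; each set is one transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the share that also contain consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule under test: customers who buy bread also buy butter.
print(support({"bread", "butter"}, baskets))
print(confidence({"bread"}, {"butter"}, baskets))
```

A rule is usually considered interesting only when both values clear thresholds chosen by the analyst; those thresholds are a judgment call and are not shown here.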

25
What is the role of online recommendation systems in predictive analytics in data mining?
To deliver personalized recommendations to users based on collaborative filtering, which generates rules for “what goes with what” at the individual user level.
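A rough Python sketch of user-based collaborative filtering, assuming made-up purchase histories: it finds the most similar other user by Jaccard overlap and suggests what that user bought. The user names, items, and the choice of Jaccard similarity are illustrative assumptions; production recommenders are considerably more involved.

```python
# Hypothetical purchase histories per user.
purchases = {
    "ana": {"laptop", "mouse", "keyboard"},
    "ben": {"laptop", "mouse", "monitor"},
    "cy": {"novel", "bookmark"},
}


def jaccard(a, b):
    """Overlap between two item sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b)


def recommend(user, histories):
    """Suggest items bought by the most similar other user but not yet by user."""
    own = histories[user]
    others = [name for name in histories if name != user]
    nearest = max(others, key=lambda name: jaccard(own, histories[name]))
    return sorted(histories[nearest] - own)


print(recommend("ana", purchases))
```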
26
What is clustering in predictive analytics in data mining?
Discovering groups in datasets with similar characteristics without using known structures in the data or fixed groups.
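As an illustration, the Python sketch below runs a bare-bones k-means on one variable. k-means is only one of several clustering methods and is chosen here as an assumption; the spend figures and starting centers are made up. No group labels are supplied: the groups emerge from the data.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around the given starting centers; no labels are supplied."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical annual spend per customer.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0.0, 50.0])
print(centers)
print(clusters)
```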
27
What is data reduction in predictive analytics in data mining?
Consolidating a large number of records into a smaller set by grouping them into homogeneous groups.
28
What is dimension reduction in predictive analytics in data mining?
Reducing the number of variables in the data to improve manageability, interpretability, and predictive ability.
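One simple form of dimension reduction is a correlation filter: drop a variable when it is nearly a copy of one already kept. The Python sketch below assumes made-up columns and an arbitrary 0.95 threshold; it is not principal component analysis or any other specific method implied by the card, and it assumes no column is constant (a constant column would cause a division by zero).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.95):
    """Keep a variable only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return list(kept)


# Hypothetical variables; height_in is height_cm expressed in another unit.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "age": [40.0, 22.0, 35.0, 28.0],
}
print(drop_redundant(data))
```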
29
What is data exploration in predictive analytics in data mining?
Examining each variable individually, and the relationships between and among the variables, to understand the data, detect unusual values, and discover patterns.
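A small Python sketch of exploring a single variable, assuming made-up ages: it summarizes the values and flags unusual ones with the 1.5 × IQR rule of thumb. That rule and the helper names are assumptions for illustration; the card does not prescribe a particular test.

```python
import statistics


def summarize(values):
    """Basic per-variable summary used when first exploring data."""
    return {"min": min(values), "median": statistics.median(values), "max": max(values)}


def find_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the quartiles (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical customer ages; 95 is far from the rest.
ages = [21, 22, 23, 24, 25, 26, 27, 95]
print(summarize(ages))
print(find_outliers(ages))
```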
30
What is data visualization in predictive analytics in data mining?
Creating graphics to visualize the distribution of variables and detect outliers (data entries that do not fit into the model because they are extreme observations).
31
What is supervised learning in predictive analytics in data mining?
It is used for classification and prediction. It involves training an algorithm using **labeled data** as the training data. Labeled data contains the outcome value for each record, such as what each customer purchased.

Footnote: After the algorithm has “learned” from the training data, it is tested by applying it to another sample of data for which the outcome is known but is initially hidden (called the **validation data**) to see if it works properly and if it can make predictions that are close to what actually happened. After the algorithm has been thoroughly tested, it can be used to classify data or make predictions from data where the outcome is unknown.
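The train-then-validate cycle can be sketched in Python as follows. The records are made up, and a one-nearest-neighbor classifier is used only because it is short; it is an assumption, not a method named by the card. The validation labels are kept away from the model and used only to score it.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the training record whose feature is closest to x."""
    _, label = min(train, key=lambda rec: abs(rec[0] - x))
    return label


# Hypothetical labeled records: (minutes on site, outcome).
labeled = [
    (1, "no purchase"), (2, "no purchase"), (3, "no purchase"),
    (8, "purchase"), (9, "purchase"), (10, "purchase"),
    (2.5, "no purchase"), (8.5, "purchase"),
]

# Partition: the first six records train the model; the last two are held out.
training, validation = labeled[:6], labeled[6:]

# Validation: the model sees only the feature; the known outcome is used to check it.
correct = sum(nearest_neighbor_predict(training, x) == actual for x, actual in validation)
accuracy = correct / len(validation)
print(accuracy)
```

In practice the partition is usually random rather than a fixed slice; the fixed slice here just keeps the sketch deterministic.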
32
What is unsupervised learning in predictive analytics in data mining?
Unsupervised learning algorithms are trained on data just as supervised learning algorithms are, but the training dataset is **unlabeled**, meaning there are no known outcome variables to predict or classify. Instead, the algorithm discovers patterns, groupings, or structures in the data.

Footnote: Association rules, dimension reduction, and clustering are unsupervised learning methods.
33
What are neural networks in data mining?
They are a machine learning technique used in data mining. Instead of a data scientist needing to manually decide which features matter, a neural network automatically discovers complex patterns and features in the training data.
34
What are the steps in a data mining project?
  • Understand the purpose of the project
  • Select the dataset to use
  • Explore, clean, and preprocess the data
  • Reduce data dimension if needed
  • Determine the data mining task
  • Partition the data
  • Select data mining techniques to use
  • Use algorithms to perform the task
  • Interpret results of the algorithms and choose the best
  • Deploy the model
35
What are some challenges of data mining?
  • Poor data quality
  • Information in multiple locations
  • Biases in data evaluation
  • Correlation vs. causation (correlation does not prove causation)
  • Ethical issues such as data privacy
  • Data security
  • Unstructured data