What is the main learning objective of data analysis?
To explain how data can be analysed to provide business intelligence.
What is Knowledge Discovery in Databases (KDD)?
A process of analysing data to gather knowledge, consisting of five basic stages.
List the five basic stages of Knowledge Discovery in Databases.
Selection, pre-processing, transformation, data mining, and interpretation/evaluation.
What is the purpose of the selection stage in data analysis?
To determine specific questions the business wants to answer.
What is noise in a dataset?
Corrupted or unwanted data within a dataset.
What are outliers in data analysis?
Data points that fall significantly outside the range of other data.
True or False: Outliers should always be removed from a dataset.
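False. Outliers may be genuine data points, so they should be investigated before deciding whether to remove them.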
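One common way to flag candidate outliers for review is the interquartile-range rule (Tukey's fences). The sketch below is illustrative only: the `daily_sales` figures and the function name `iqr_outliers` are hypothetical, and the multiplier `k=1.5` is a convention rather than a rule from this material. Flagged values are candidates for investigation, not automatic deletion.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    # quantiles() sorts the data itself and returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily sales figures; one value is far outside the usual range.
daily_sales = [102, 98, 105, 110, 97, 101, 99, 480]
flagged = iqr_outliers(daily_sales)
print(flagged)
```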
What is the role of finance professionals in the data cleaning process?
What is the transformation stage in data analysis?
The process of preparing the data for analysis.
What is sampling in data analysis?
Choosing a representative sample to analyse instead of the whole population.
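A minimal sketch of simple random sampling, assuming every record has an equal chance of selection. The invoice numbers and the helper name `simple_random_sample` are hypothetical; a fixed seed is used only so the sample can be reproduced.

```python
import random

def simple_random_sample(population, size, seed=0):
    """Draw `size` distinct items at random, without replacement."""
    rng = random.Random(seed)  # seeded generator so the draw is repeatable
    return rng.sample(population, size)

# Hypothetical population: 1,000 invoice numbers.
invoices = list(range(1, 1001))
sample = simple_random_sample(invoices, 50)
print(len(sample))
```

Whether a simple random sample is representative enough depends on the data; where the population has distinct subgroups, a stratified approach may be more suitable.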
What does aggregation in data analysis involve?
Combining several features together to summarise data.
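As an illustration, individual transactions can be aggregated into one total per region. The transaction data and the function name `total_by_region` below are hypothetical and serve only to show the idea.

```python
from collections import defaultdict

# Hypothetical transactions: (region, sale amount).
transactions = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 60.0),
    ("South", 40.0),
    ("East", 200.0),
]

def total_by_region(rows):
    """Summarise individual transactions as one total per region."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)

totals = total_by_region(transactions)
print(totals)
```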
What does ETL stand for in data warehousing?
Extraction, Transformation, Loading.
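The three steps can be sketched end to end with Python's standard library. Everything here is an assumption for illustration: the CSV content, the `sales` table, and the cleaning rules (trim whitespace, drop rows with no customer name) are hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = """customer,amount
alice, 120.50
bob,80
,15
alice,30.25
"""

def extract(text):
    """Extraction: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: tidy names, convert amounts, drop rows with no customer."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip()
        if not name:
            continue
        cleaned.append((name.title(), float(row["amount"])))
    return cleaned

def load(rows, conn):
    """Loading: write the prepared rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(loaded)
```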
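What is the difference between ETL and ELT?
In ETL, data is transformed before it is loaded into the target system; in ELT, raw data is loaded first and transformed inside the target system.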
What is the significance of artificial intelligence (AI) in data analytics?
AI allows for the processing of large volumes of data quickly and can automate decision-making.
List some key capabilities of AI that are transforming data analytics.
What is machine learning?
The ability of AI algorithms to learn and improve their analytical skills over time.
Name the three main types of machine learning used in data analytics.
Supervised learning, unsupervised learning, and reinforcement learning.
What does supervised learning involve?
Training the algorithm on labelled examples so that it learns to recognise key features of the data.
Fill in the blank: The pre-processing stage is designed to _______.
[clean data to ensure it is of good quality]
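A minimal sketch of two typical cleaning steps, assuming that "good quality" here means no missing amounts and no duplicate records. The record layout (`id`, `amount`) and the function name `clean` are hypothetical.

```python
def clean(records):
    """Drop records with a missing amount and remove exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec.get("amount") is None:
            continue  # missing value: cannot be analysed as-is
        key = (rec["id"], rec["amount"])
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append(rec)
    return cleaned

# Hypothetical raw records containing one duplicate and one missing value.
raw = [
    {"id": 1, "amount": 50.0},
    {"id": 2, "amount": None},
    {"id": 1, "amount": 50.0},
    {"id": 3, "amount": 75.0},
]
tidy = clean(raw)
print(tidy)
```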
What is the challenge of using aggregation in data analysis?
It may lead to false or over-generalised conclusions.
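The risk can be seen in a small, hypothetical example: total monthly sales look flat, which might suggest a stable business, while the product-level figures show one product growing and another declining. The figures below are invented for illustration.

```python
# Hypothetical monthly sales for two products over three months.
sales = {
    "Product A": [100, 120, 140],
    "Product B": [100, 80, 60],
}

# Aggregated view: one total per month across all products.
monthly_totals = [sum(month) for month in zip(*sales.values())]

# Disaggregated view: change from the first to the last month, per product.
trend = {name: figures[-1] - figures[0] for name, figures in sales.items()}

print(monthly_totals)
print(trend)
```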
What is the primary requirement for machine learning to learn effectively?
Large volumes of disaggregated and diverse data.
Disaggregated data means that it is broken down as far as possible, and diverse data is gathered from a wide variety of sources.
What are the two important uses of supervised learning in data analytics?
Classification and regression.
What is classification in the context of machine learning?
The process of separating data within a dataset into different categories.
What type of classification uses a yes/no split?
Binary classification.
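A very small sketch of supervised binary classification: the algorithm is given labelled examples and learns a single cut-off that splits them into yes/no. This is a deliberately simple technique (a one-feature threshold rule), not a method prescribed by this material. The training data (days overdue, whether the customer defaulted) and the function names are hypothetical, and the sketch assumes at least two distinct feature values and that higher values mean "yes".

```python
def fit_threshold(examples):
    """Pick the cut-off on one feature that misclassifies the fewest labelled examples."""
    values = sorted({x for x, _ in examples})
    # Candidate cut-offs are the midpoints between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != label for x, label in examples)

    return min(candidates, key=errors)

def predict(threshold, x):
    """Classify a new value: True ("yes") if it lies above the learned cut-off."""
    return x > threshold

# Hypothetical labelled data: (days overdue, customer defaulted?).
training = [(2, False), (5, False), (9, False), (30, True), (45, True), (60, True)]
threshold = fit_threshold(training)
print(threshold, predict(threshold, 50), predict(threshold, 4))
```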