What is descriptive statistical analysis?
It uses numbers to describe the qualities of a data set.
What is inferential statistical analysis?
It draws conclusions about a larger population based on a sample group.
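A rough illustration of drawing a conclusion about a population from a sample: the sketch below estimates a population mean with an approximate 95% confidence interval. The sample values, the function name `mean_confidence_interval`, and the normal-approximation multiplier 1.96 are assumptions made for illustration; a small sample would normally call for a t-distribution instead.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    margin = z * std_err
    return mean, mean - margin, mean + margin

# Hypothetical sample of heights (cm) drawn from a larger population
sample = [168, 172, 165, 180, 175, 170, 169, 174]
mean, low, high = mean_confidence_interval(sample)
print(f"mean={mean:.1f}, 95% CI=({low:.1f}, {high:.1f})")
```

The interval is a statement about the larger population, not just the eight values measured.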
What is associational statistical analysis?
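It examines relationships between variables to support prediction and to look for possible causes; an association on its own does not prove causation.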
What is predictive analysis?
It uses statistical algorithms and machine learning to predict future events and behavior.
What is prescriptive analysis?
It uses data to recommend which actions an organization should take, guiding decision-making.
What is exploratory data analysis?
It identifies patterns and trends in a data set.
What is causal analysis?
It determines causation, or why things happen; it is used in quality assurance and investigations.
What role does metadata play in unstructured data analysis?
It provides information about data for management, storage, and analysis.
What is Natural Language Processing (NLP)?
A machine learning method to analyze the meaning of unstructured text data.
How are images analyzed in unstructured data?
Software interprets the visual content of the image, e.g., to help diagnose medical conditions from X-rays.
What is supervised machine learning?
It requires labeled input and output data for training and is used for classification and prediction.
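A minimal sketch of supervised classification, assuming a 1-nearest-neighbour rule (one simple technique among many). The function name `predict_nearest`, the features, and the labels are hypothetical and chosen only to show labeled inputs and outputs being used for prediction.

```python
import math

def predict_nearest(train_points, train_labels, query):
    """Classify `query` with the label of its closest training point (1-NN)."""
    distances = [math.dist(p, query) for p in train_points]
    nearest_index = distances.index(min(distances))
    return train_labels[nearest_index]

# Hypothetical labeled training data: (height cm, weight kg) -> label
train_points = [(20, 4), (22, 5), (60, 30), (65, 32)]
train_labels = ["cat", "cat", "dog", "dog"]

print(predict_nearest(train_points, train_labels, (21, 4.5)))
print(predict_nearest(train_points, train_labels, (62, 29)))
```

The labels supplied with the training data are what make this supervised: the model can only predict classes it has been shown.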
What is unsupervised machine learning?
It uses raw, unlabeled data to identify patterns and cluster similar data.
What are the main uses of unsupervised machine learning?
Clustering datasets, understanding relationships, and initial data analysis.
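Clustering can be sketched with a basic one-dimensional k-means, shown here as one possible technique rather than the only one. The data, the function name `kmeans_1d`, the two starting centers, and the fixed iteration count are all assumptions for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around the given starting centers (basic k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled data with two natural groups
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied; the two groups emerge only from how close the values are to each other.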
What are key differences between supervised and unsupervised learning?
Supervised learning needs labeled data and is used for classification and prediction; unsupervised learning works on unlabeled data, finds relationships, and is less explainable.
How is data dredging different from data mining?
Data dredging searches for patterns without a prior hypothesis, so the patterns it finds may have arisen by chance.
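A small simulation can show why patterns found without a hypothesis may be due to chance. The sketch below compares a random target with many unrelated random variables and reports the strongest correlation found. The row count, the number of candidates, the seed, and the helper name `pearson` are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(0)  # fixed seed so the run is repeatable
n_rows, n_candidates = 10, 200

# Every variable is pure noise, so no real relationship exists
target = [random.gauss(0, 1) for _ in range(n_rows)]
candidates = [[random.gauss(0, 1) for _ in range(n_rows)]
              for _ in range(n_candidates)]

correlations = [abs(pearson(c, target)) for c in candidates]
strongest = max(correlations)
average = sum(correlations) / len(correlations)
print(f"strongest |r| = {strongest:.2f}, average |r| = {average:.2f}")
```

Testing enough unrelated variables almost always turns up one that looks strongly related, which is the risk of searching without a hypothesis.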
Name 4 features of quantitative data.
Expressed as a numerical value
Analysed using computational techniques and algorithms
Measured objectively
Answers questions like ‘how much’ and ‘how often’
Name 4 features of qualitative data.
Represented as a name or symbol
Organised into themes
Measured subjectively
Answers questions such as ‘why’ or ‘how’
Mean
The arithmetic average: the sum of the numbers divided by how many numbers there are.
Median
The middle value when the numbers are in numerical order (the average of the two middle values when the count is even).
Mode
The most commonly occurring number in the data set.
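The three averages above can be computed with Python's standard `statistics` module. The data set `scores` is a hypothetical example.

```python
import statistics

# Hypothetical data set, already in numerical order for readability
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # sum of values / number of values
median_value = statistics.median(scores)  # middle of the sorted values
mode_value = statistics.mode(scores)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

Here the mean is 5, the median is 4 (the average of the two middle values, 3 and 5), and the mode is 3.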
Skewness
A measure of how asymmetrical a range of numbers is; a perfectly symmetrical distribution has a skewness of zero.
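One common way to put a number on skewness is the population (Fisher-Pearson) coefficient, the mean of the cubed standardized deviations. The sketch below assumes that definition; other definitions exist, and the function name `skewness` and both data sets are hypothetical.

```python
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

symmetric = [1, 2, 3, 4, 5]           # evenly spread around the mean
right_tailed = [1, 1, 2, 2, 3, 10]    # one large value stretches the right tail

print(f"symmetric: {skewness(symmetric):.3f}")
print(f"right-tailed: {skewness(right_tailed):.3f}")
```

A value near zero indicates symmetry, a positive value indicates a longer right tail, and a negative value indicates a longer left tail.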