What are the three purposes of data analysis?
Description, prediction, explanation.
What defines a numeric variable?
Measures quantity (how many/how much).
What defines a categorical variable?
Labels/groups/categories.
In a rectangular dataset, what does each row represent?
An entity.
In a rectangular dataset, what does each column represent?
A variable.
What is classification?
Grouping into predefined categories.
What is cross-classification?
Grouping by combinations of categories.
What is a two-way table?
Counts for each combination of two categorical variables.
Why is it called two-way?
Uses two categorical variables.
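A minimal sketch of cross-classification and a two-way table, using only the standard library. The `records` data and the variable names (smoker, exercises) are made up for illustration.

```python
from collections import Counter

# Hypothetical data: one (smoker, exercises) pair per entity.
records = [
    ("yes", "no"), ("no", "yes"), ("no", "yes"),
    ("yes", "yes"), ("no", "no"), ("no", "yes"),
]

# Cross-classify: count each combination of the two categories.
two_way = Counter(records)

row_levels = sorted({row for row, _ in records})
col_levels = sorted({col for _, col in records})

# Print one table row per level of the first variable.
print("smoker", *col_levels, sep="\t")
for row in row_levels:
    print(row, *(two_way[(row, col)] for col in col_levels), sep="\t")
```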
What does a classification model predict?
A category/label.
What is a proportion?
Fraction of total with an attribute.
How are proportions commonly expressed?
Percentages.
Why use percentages?
Easier comparison.
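As a small illustration of a proportion and its percentage form, assuming a made-up list of labels:

```python
# Hypothetical labels, one per entity.
labels = ["spam", "ham", "ham", "spam", "ham"]

# Proportion: count with the attribute divided by the total count.
proportion = labels.count("spam") / len(labels)

# Percentage: the same proportion scaled to 100.
percentage = 100 * proportion

print(f"proportion={proportion:.2f}, percentage={percentage:.0f}%")
```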
What is a baseline model (classification)?
Predicts most common class.
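One way to sketch a baseline model in code. The function name `baseline_predict` and the training labels are hypothetical, chosen only for this example.

```python
from collections import Counter

def baseline_predict(train_labels, n):
    """Return n predictions, all equal to the most common training class."""
    most_common_class = Counter(train_labels).most_common(1)[0][0]
    return [most_common_class] * n

# Hypothetical training labels: "ham" is the most common class.
train_labels = ["ham", "spam", "ham", "ham", "spam"]
predictions = baseline_predict(train_labels, 3)
print(predictions)
```

A baseline like this ignores every other variable, so it gives a floor that a real classification model should beat.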
What is a confusion matrix?
Table of counts of actual vs predicted classes.
What does PCC measure?
Percentage correctly classified.
When is a prediction correct?
Actual equals predicted.
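A short sketch tying together the confusion matrix and PCC, under the assumption that actual and predicted labels are stored as two parallel lists. The data is invented for illustration.

```python
from collections import Counter

# Hypothetical actual and predicted classes for six entities.
actual    = ["yes", "no", "yes", "yes", "no", "no"]
predicted = ["yes", "no", "no",  "yes", "yes", "no"]

# Confusion matrix: counts of each (actual, predicted) pair.
confusion = Counter(zip(actual, predicted))

# A prediction is correct when actual equals predicted.
correct = sum(a == p for a, p in zip(actual, predicted))

# PCC: percentage correctly classified.
pcc = 100 * correct / len(actual)

print(dict(confusion))
print(f"PCC = {pcc:.1f}%")
```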
What is a conditional proportion?
Proportion within a subgroup.
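A conditional proportion can be sketched as follows. The helper `conditional_proportion` and the (group, outcome) records are hypothetical.

```python
# Hypothetical data: one (group, outcome) pair per entity.
records = [
    ("A", "pass"), ("A", "fail"), ("A", "pass"),
    ("B", "fail"), ("B", "pass"), ("A", "pass"),
]

def conditional_proportion(records, group, outcome):
    """Proportion of entities with `outcome`, restricted to one group."""
    in_group = [o for g, o in records if g == group]
    return in_group.count(outcome) / len(in_group)

# The denominator is the subgroup size, not the whole dataset.
pass_given_a = conditional_proportion(records, "A", "pass")
pass_given_b = conditional_proportion(records, "B", "pass")
print(pass_given_a, pass_given_b)
```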
What is an algorithm?
Rules for making predictions.
What is a decision rule?
An if-then rule that assigns a class using a cutoff.
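A decision rule of this kind might look like the sketch below. The function name `classify`, the cutoff of 50, and the class names are all assumptions made for the example.

```python
def classify(score, cutoff=50):
    """Decision rule: predict "pass" if the score reaches the cutoff."""
    if score >= cutoff:
        return "pass"
    return "fail"

# Hypothetical scores run through the rule.
scores = [35, 50, 72]
predicted = [classify(s) for s in scores]
print(predicted)
```

Moving the cutoff changes which entities land in each class, so the cutoff is usually chosen by checking the resulting predictions against known outcomes.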
What is algorithmic bias?
Unfair outcomes from biased data.
Why is unbalanced data risky?
Model favors larger group.
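To see why unbalanced data is risky, consider a made-up dataset with 95 "no" cases and 5 "yes" cases, and a model that always predicts the larger group:

```python
# Hypothetical unbalanced labels: 95 "no" and 5 "yes".
actual = ["no"] * 95 + ["yes"] * 5

# A model that favors the larger group predicts "no" every time.
predicted = ["no"] * len(actual)

# Overall PCC looks high...
pcc = 100 * sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# ...but no member of the smaller group is ever identified.
yes_found = sum(a == "yes" and p == "yes" for a, p in zip(actual, predicted))

print(f"PCC = {pcc:.0f}%, 'yes' cases found = {yes_found}")
```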
Why is correct labelling important?
Wrong labels teach the model wrong patterns and distort measured accuracy.
What are the key features of a distribution?
Centre, shape, variation.
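A rough sketch of summarising these features numerically with Python's `statistics` module. The values are invented, and comparing mean with median is only a crude hint about shape, not a full description.

```python
import statistics

# Hypothetical numeric variable with one large value.
values = [2, 3, 3, 4, 5, 6, 20]

# Centre: mean and median.
mean = statistics.mean(values)
median = statistics.median(values)

# Variation: sample standard deviation.
spread = statistics.stdev(values)

# Shape (rough hint): a mean well above the median suggests right skew.
right_skew_hint = mean > median

print(f"mean={mean:.2f}, median={median}, stdev={spread:.2f}")
print("right skew suggested:", right_skew_hint)
```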