108 Flashcards

(81 cards)

1
Q

What are the three purposes of data analysis?

A

Description, prediction, explanation.

2
Q

What defines a numeric variable?

A

Measures quantity (how many/how much).

3
Q

What defines a categorical variable?

A

Labels/groups/categories.

4
Q

In a rectangular dataset, what does each row represent?

A

An entity.

5
Q

In a rectangular dataset, what does each column represent?

A

A variable.

6
Q

What is classification?

A

Grouping into predefined categories.

7
Q

What is cross-classification?

A

Grouping by combinations of categories.

8
Q

What is a two-way table?

A

Counts for two categorical variables.

9
Q

Why is it called two-way?

A

Uses two categorical variables.
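Worked example (a minimal sketch, not part of the original deck): cross-classifying records into a two-way table of counts. The records and the variable names `region` and `answer` are invented for illustration.

```python
from collections import Counter

# Hypothetical records: each entity has two categorical variables,
# (region, answer). The values are invented for illustration.
records = [
    ("north", "yes"), ("north", "no"), ("north", "no"),
    ("south", "yes"), ("south", "yes"), ("south", "no"),
]

# Cross-classification: count every combination of the two categories.
table = Counter(records)

rows = sorted({region for region, _ in records})
cols = sorted({answer for _, answer in records})

# Print the two-way table: one row per region, one column per answer.
print(" " * 8 + "".join(f"{c:>6}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "".join(f"{table[(r, c)]:>6}" for c in cols))
```

Each cell holds the count for one combination of the two variables, which is why the table is called two-way.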

10
Q

What does a classification model predict?

A

A category/label.

11
Q

What is a proportion?

A

Fraction of total with an attribute.

12
Q

How are proportions commonly expressed?

A

Percentages.

13
Q

Why use percentages?

A

Easier comparison.

14
Q

What is a baseline model (classification)?

A

Predicts most common class.

15
Q

What is a confusion matrix?

A

Actual vs predicted table.

16
Q

What does PCC measure?

A

Percentage correctly classified.

17
Q

When is a prediction correct?

A

Actual equals predicted.
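Worked example (a sketch under stated assumptions): building a confusion matrix, computing PCC, and comparing against a baseline that always predicts the most common class. The labels and predictions are invented; no particular model is implied.

```python
from collections import Counter

# Hypothetical actual labels and model predictions (invented data).
actual    = ["spam", "ham", "ham",  "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "spam", "ham",  "ham", "ham", "ham", "spam"]

# Confusion matrix: counts of each (actual, predicted) combination.
confusion = Counter(zip(actual, predicted))

# A prediction is correct when actual equals predicted.
correct = sum(a == p for a, p in zip(actual, predicted))
pcc = 100 * correct / len(actual)

# Baseline model: always predict the most common actual class.
most_common_class = Counter(actual).most_common(1)[0][0]
baseline_correct = sum(a == most_common_class for a in actual)
baseline_pcc = 100 * baseline_correct / len(actual)

print(f"Model PCC: {pcc}%")
print(f"Baseline PCC ({most_common_class!r} every time): {baseline_pcc}%")
```

A model is only useful for classification if its PCC beats the baseline PCC.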

18
Q

What is a conditional proportion?

A

Proportion within a subgroup.
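Worked example (a minimal sketch with invented counts): a conditional proportion divides by the subgroup total, not the grand total. The group and outcome names are hypothetical.

```python
# Hypothetical two-way counts: outer keys are subgroups, inner keys are outcomes.
counts = {
    "treatment": {"improved": 30, "not improved": 10},
    "control":   {"improved": 18, "not improved": 22},
}

def conditional_proportion(counts, group, outcome):
    """Proportion with `outcome` among the entities in `group` only."""
    group_total = sum(counts[group].values())
    return counts[group][outcome] / group_total

# Overall (unconditional) proportion, for comparison: divide by the grand total.
grand_total = sum(sum(row.values()) for row in counts.values())
overall = sum(row["improved"] for row in counts.values()) / grand_total

print(conditional_proportion(counts, "treatment", "improved"))
print(conditional_proportion(counts, "control", "improved"))
print(overall)
```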

19
Q

What is an algorithm?

A

Rules for making predictions.

20
Q

What is a decision rule?

A

An if-then rule that uses a cutoff to classify.

21
Q

What is algorithmic bias?

A

Unfair outcomes from biased data.

22
Q

Why is unbalanced data risky?

A

Model favors larger group.

23
Q

Why is correct labelling important?

A

Affects model accuracy.

24
Q

Key features of distributions?

A

Centre, shape, variation.

25
What do mean and median measure?
Centre (typical value).
26
What is symmetric shape?
Balanced distribution.
27
What is positive skew?
Tail to the right.
28
What is negative skew?
Tail to the left.
29
What is unimodal?
One peak.
30
What is bimodal?
Two peaks.
31
What is IQR?
Spread of the middle 50 percent of values (upper quartile minus lower quartile).
32
What does IQR tell you?
At least 50 percent of values lie between the quartiles.
33
What does standard deviation measure?
Spread around mean.
34
Larger standard deviation means?
More variation.
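Worked example (a sketch, assuming Python's `statistics` module and its "inclusive" quartile method; other software may use a slightly different quartile rule and give slightly different values). All data are invented.

```python
from statistics import quantiles, stdev, mean

# Hypothetical numeric data (invented for illustration).
data = [2, 4, 4, 5, 6, 7, 9, 12]

# Quartiles split the sorted data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)

# Two invented datasets with the same mean but different spread.
tight = [9, 10, 10, 11]
wide = [2, 8, 12, 18]
print("means:", mean(tight), mean(wide))

# The dataset with more variation has the larger standard deviation.
print("standard deviations:", stdev(tight), stdev(wide))
```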
35
What does a dot plot show?
Individual values.
36
What does a box plot show?
Five-number summary.
37
What does a scatter plot show?
Relationship between two variables.
38
Scatter plot positive association?
As one variable increases, the other tends to increase.
39
Scatter plot negative association?
As one variable increases, the other tends to decrease.
40
What is no association?
No clear pattern.
41
Goal of prediction models?
Predict numeric values.
42
Baseline prediction model?
Predicts mean.
43
Prediction error formula?
Actual minus predicted.
44
Positive error means?
Under-predicted.
45
Negative error means?
Over-predicted.
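Worked example (a minimal sketch with invented values): prediction errors for a baseline model that predicts the mean for every entity. The helper name `describe` is hypothetical.

```python
from statistics import mean

# Hypothetical actual numeric values (invented for illustration).
actual = [12.0, 15.0, 9.0, 20.0]

# Baseline prediction model: predict the mean for everyone.
baseline_prediction = mean(actual)

# Prediction error = actual minus predicted.
errors = [a - baseline_prediction for a in actual]

def describe(error):
    """Interpret the sign of a prediction error."""
    if error > 0:
        return "under-predicted"   # actual was higher than predicted
    if error < 0:
        return "over-predicted"    # actual was lower than predicted
    return "exact"

for a, e in zip(actual, errors):
    print(a, e, describe(e))
```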
46
What is a prediction interval?
Range of likely values.
47
Accuracy vs precision?
Accuracy = correct percent, Precision = closeness.
48
Why use training/testing split?
Test on new data.
49
What is overfitting?
Too fitted to training data.
50
What makes a good model?
Works on unseen data.
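Worked example (a sketch under stated assumptions): a training/testing split, with a baseline model built from the training data and judged on the testing data. The 75/25 ratio, the seed, and the data are all assumptions chosen for illustration.

```python
import random
from statistics import mean

# Hypothetical numeric responses (invented for illustration).
values = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10, 18, 8]

rng = random.Random(0)           # fixed seed so the split is repeatable
shuffled = values[:]             # copy, so the original order is untouched
rng.shuffle(shuffled)

cut = len(shuffled) * 3 // 4     # assumed ratio: 75 percent for training
train, test = shuffled[:cut], shuffled[cut:]

# Build the baseline from the training data only.
baseline = mean(train)

# Judge it on data the model has not seen.
test_errors = [actual - baseline for actual in test]
mean_abs_error = mean([abs(e) for e in test_errors])

print("train size:", len(train), "test size:", len(test))
print("mean absolute error on test data:", mean_abs_error)
```

A model that looks good on the training data but much worse on the testing data is a sign of overfitting.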
51
What is an explanatory variable?
Predictor variable.
52
What is a response variable?
Outcome variable.
53
What does correlation measure?
Strength and direction of a linear relationship.
54
Correlation of +1 means?
Perfect positive relationship.
55
Correlation of -1 means?
Perfect negative relationship.
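Worked example (a minimal sketch, assuming the usual Pearson correlation coefficient; the deck does not name a specific formula). The data are invented so that the points fall exactly on a line.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # increases exactly in step with xs
falling = [10, 8, 6, 4, 2]    # decreases exactly in step with xs

print(correlation(xs, rising))    # perfect positive relationship
print(correlation(xs, falling))   # perfect negative relationship
```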
56
What is middle 95 percent?
Typical values.
57
What is tail proportion?
Percent beyond a value.
58
Tail less than 2.5 percent means?
Unusual value.
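Worked example (a sketch with invented values): a tail proportion and the 2.5 percent rule for flagging unusual values. Counting values "at or beyond" the observed one is an assumption; the function name `tail_proportion` is hypothetical.

```python
def tail_proportion(values, observed, direction="upper"):
    """Proportion of values at or beyond `observed` in the chosen tail."""
    if direction == "upper":
        count = sum(v >= observed for v in values)
    else:
        count = sum(v <= observed for v in values)
    return count / len(values)

# Hypothetical distribution: the whole numbers 1 to 100.
values = list(range(1, 101))

for observed in (90, 99):
    tail = tail_proportion(values, observed)
    verdict = "unusual" if tail < 0.025 else "typical"
    print(observed, tail, verdict)
```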
59
What is a statistical test?
Compare data to a model.
60
What is the null hypothesis?
Chance explanation.
61
What is chance variation?
Random fluctuation.
62
Compatible with null means?
Within middle 95 percent.
63
What is an experiment?
Controlled study.
64
Why randomise?
Reduce bias.
65
What is random allocation?
Random assignment to groups.
66
What is a randomisation test?
Simulates chance outcomes.
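Worked example (a sketch under stated assumptions): a randomisation test that re-allocates the observed values to two groups at random and records the difference in means each time. The data, the seed, and the 2000 repetitions are invented choices for illustration.

```python
import random
from statistics import mean

# Hypothetical experiment results (invented for illustration).
treatment = [14, 15, 16, 17, 18]
control = [9, 10, 11, 12, 13]
observed_diff = mean(treatment) - mean(control)

rng = random.Random(1)         # fixed seed so the simulation is repeatable
pooled = treatment + control   # under the null, group labels are just chance

chance_diffs = []
for _ in range(2000):
    rng.shuffle(pooled)        # random allocation to two groups
    new_treatment = pooled[:len(treatment)]
    new_control = pooled[len(treatment):]
    chance_diffs.append(mean(new_treatment) - mean(new_control))

# Tail proportion: how often chance alone gives a difference this large.
tail = sum(d >= observed_diff for d in chance_diffs) / len(chance_diffs)
print("observed difference:", observed_diff)
print("tail proportion:", tail)
```

A small tail proportion suggests the observed difference is not compatible with chance variation alone.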
67
What is a population?
Whole group.
68
What is a sample?
Subset.
69
What is a parameter?
True population value.
70
What is an estimate?
Sample-based value.
71
What is sampling variation?
Differences between samples.
72
What is a confidence interval?
Range of plausible values.
73
Why use confidence intervals?
Estimates are uncertain.
74
Narrow confidence interval means?
More precise.
75
Confidence vs prediction interval?
A confidence interval is for a population parameter; a prediction interval is for an individual value.
76
What is bootstrapping?
Simulating sampling variation.
77
95 percent confidence interval comes from?
Middle 95 percent of bootstrap values.
78
Confidence interval format?
Lower, upper.
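Worked example (a sketch under stated assumptions): a bootstrap confidence interval for a mean, taken as the middle 95 percent of the bootstrap values. The sample, the seed, and the 1000 resamples are invented choices for illustration.

```python
import random
from statistics import mean

# Hypothetical sample (invented for illustration).
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]

rng = random.Random(42)   # fixed seed so the result is repeatable
n_boot = 1000

# Bootstrapping: resample from the sample, with replacement, many times.
boot_means = []
for _ in range(n_boot):
    resample = rng.choices(sample, k=len(sample))
    boot_means.append(mean(resample))

# Keep the middle 95 percent: drop the lowest and highest 2.5 percent.
boot_means.sort()
lower = boot_means[n_boot * 25 // 1000]
upper = boot_means[n_boot * 975 // 1000 - 1]

# Report in (lower, upper) format.
print(f"Estimate: {mean(sample)}")
print(f"95 percent confidence interval: ({lower}, {upper})")
```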
79
What is dynamic data?
Continuously updated data.
80
What are pixels?
Small colour squares in images.
81
What makes a good research question?
Includes who and what.