Week 1: Supervised Learning Flashcards

(29 cards)

1
Q

What is the goal of supervised learning?

A

Learn a predictor f: X → Y that maps features to labels using labelled data.

2
Q

What is feature selection?

A

Choosing measurable quantities (features x¹,…,xᵈ) that help predict the output label.

3
Q

What is a feature vector?

A

A vector x ∈ ℝᵈ representing the selected measurable attributes of an input.

4
Q

What is a label in supervised learning?

A

A quantitative or categorical value y representing the target to be predicted.

5
Q

What is a labelled dataset?

A

A set of pairs (x₁,y₁),…,(xₙ,yₙ) where each feature vector has a corresponding label.

6
Q

What is a learning algorithm?

A

A map that takes a dataset in (X × Y)ⁿ and outputs a predictor f: X → Y.
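As a sketch (not from the course), a learning algorithm really is just a function from datasets to predictors; the majority-label learner and the toy data below are invented for illustration:

```python
from collections import Counter

def majority_learner(dataset):
    """A deliberately simple learning algorithm: maps a dataset in
    (X x Y)^n to a predictor f that ignores x and always returns the
    most common training label."""
    labels = [y for _, y in dataset]
    majority = Counter(labels).most_common(1)[0][0]
    return lambda x: majority  # the learned predictor f: X -> Y

# Usage: train on a toy labelled dataset, then predict on a new input.
data = [([1.0], "cat"), ([2.0], "dog"), ([3.0], "cat")]
f = majority_learner(data)
print(f([9.9]))  # -> "cat"
```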

7
Q

What makes a predictor good?

A

It should approximate the ‘true’ predictor f_true, though the true label function rarely exists exactly.

8
Q

Why does the ‘true’ predictor rarely exist?

A

Because labellers may disagree, or the features do not capture all the information needed to determine the label (plant location example).

9
Q

What is the minimum error predictor?

A

A predictor that assigns labels to minimize misclassification error over the entire population.

10
Q

Why can’t we compute the true minimum-error predictor directly?

A

It requires knowing the entire population distribution of (X,Y).

11
Q

What assumption is made in statistical learning to avoid enumerating the whole population?

A

We assume the population is described by a probability distribution ρ ∈ Prob(X × Y).

12
Q

What does sampling from the population correspond to?

A

Drawing samples from ρ, the assumed distribution over (X,Y).

13
Q

What is the Bayes optimal predictor f*?

A

The predictor that minimizes expected loss: f*(x) ∈ argmin_z ∫ l(y,z) dρ(y|x).
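For a discrete distribution the integral in the definition becomes a sum, and f* can be computed directly; the conditional probabilities below are made-up numbers for illustration:

```python
def bayes_predictor(cond_probs, loss, labels):
    """Return argmin_z of E[l(Y, z) | X = x] for one fixed x.
    cond_probs: dict mapping y -> P(Y = y | X = x)."""
    def expected_loss(z):
        return sum(p * loss(y, z) for y, p in cond_probs.items())
    return min(labels, key=expected_loss)

zero_one = lambda y, z: 0 if y == z else 1
# Suppose P(Y=1 | X=x) = 0.7 and P(Y=0 | X=x) = 0.3 (assumed values):
# predicting 1 has expected loss 0.3, predicting 0 has 0.7, so f*(x) = 1.
pred = bayes_predictor({1: 0.7, 0: 0.3}, zero_one, labels=[0, 1])
print(pred)  # -> 1
```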

14
Q

What does indicator notation 1[⋯] mean?

A

It equals 1 if the condition is true, otherwise 0.

15
Q

What does the Bayes classifier do in binary classification?

A

Predicts 1 if P(Y=1|X=x) exceeds a threshold that depends on the loss function (1/2 for 0–1 loss); otherwise predicts 0.
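The threshold follows from comparing the expected losses of the two predictions: predict 1 iff (1−p)·cost_fp < p·cost_fn, i.e. p > cost_fp/(cost_fp + cost_fn). A minimal sketch, with the cost values below chosen for illustration:

```python
def bayes_binary(p1, cost_fp=1.0, cost_fn=1.0):
    """Predict 1 iff p1 = P(Y=1|X=x) exceeds the loss-dependent
    threshold cost_fp / (cost_fp + cost_fn); for 0-1 loss this is 1/2."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return 1 if p1 > threshold else 0

print(bayes_binary(0.4))                        # 0-1 loss: threshold 0.5 -> 0
print(bayes_binary(0.4, cost_fp=1, cost_fn=4))  # costly misses: threshold 0.2 -> 1
```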

16
Q

Why is Bayes risk non-zero?

A

Because the conditional distributions overlap; even the optimal classifier misclassifies some points.

17
Q

What is the disadvantage of k-NN compared to the Bayes classifier?

A

Inference is slow, the full training dataset must be stored and consulted at prediction time, and (for small k) it tends to overfit.
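A minimal k-NN sketch (toy data invented for illustration) makes the drawback concrete: every single prediction scans the entire training set, so the data can never be discarded after training:

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """k-NN keeps the whole training set around: each query computes a
    distance to all n points (O(n*d) per prediction) before voting."""
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], x))[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]

train = [([0.0], 0), ([0.1], 0), ([1.0], 1), ([1.1], 1), ([0.9], 1)]
print(knn_predict(train, [0.05], k=3))  # -> 0
```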

18
Q

What is the goal of decision theory in ML?

A

To compute f* that minimizes expected risk under the true distribution ρ.

19
Q

What is risk R[f]?

A

The expected loss: ∫ l(y,f(x)) dρ(x,y).
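Since ρ is unknown, the risk is in practice estimated by an empirical average over samples; the toy distribution below (X uniform on (0,1), label 1[X > 0.5] flipped with probability 0.1) is assumed purely for illustration:

```python
import random

random.seed(0)

def empirical_risk(f, samples, loss):
    """Monte Carlo estimate of R[f] = E[l(Y, f(X))]: average the loss
    over samples drawn from the (otherwise unknown) distribution rho."""
    return sum(loss(y, f(x)) for x, y in samples) / len(samples)

# Assumed toy distribution: X ~ U(0,1), Y = 1[X > 0.5] flipped w.p. 0.1.
samples = []
for _ in range(10_000):
    x = random.random()
    y = int(x > 0.5)
    if random.random() < 0.1:
        y = 1 - y
    samples.append((x, y))

f = lambda x: int(x > 0.5)  # the Bayes classifier for this toy setup
risk = empirical_risk(f, samples, lambda y, z: int(y != z))
print(risk)  # close to the Bayes risk of 0.1
```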

20
Q

What alternative objective involving TPR and FPR can be used?

A

Maximize TPR subject to constraints on FPR (useful in imbalanced classification).

21
Q

What is a False Positive Rate (FPR)?

A

P(f(X)=1 | Y=0).

22
Q

What is a True Positive Rate (TPR)?

A

P(f(X)=1 | Y=1).
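Both rates are conditional probabilities, so on a labelled sample they are estimated as conditional frequencies; the toy predictions below are invented for illustration:

```python
def rates(preds, labels):
    """Empirical TPR = P(f(X)=1 | Y=1) and FPR = P(f(X)=1 | Y=0),
    estimated as frequencies within the positive and negative classes."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    return tp / labels.count(1), fp / labels.count(0)

labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
tpr, fpr = rates(preds, labels)
print(tpr, fpr)  # -> 0.75 0.25
```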

23
Q

Why do we use probabilistic modelling in supervised learning?

A

Because we cannot observe the entire population, so we model (X,Y) with a probability distribution.

24
Q

How does the Abalone example illustrate minimum error prediction?

A

Among abalones with x = number of rings ≤ 8, the majority are male, so predicting f(x) = male for that group minimizes the empirical misclassification error.
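For a discrete feature, this majority rule per feature value is exactly the empirical minimum-error predictor; the counts below are made-up stand-ins for the abalone data:

```python
from collections import Counter, defaultdict

def min_error_predictor(dataset):
    """Assign each feature value the majority label among the examples
    with that value -- the predictor minimizing empirical error."""
    by_x = defaultdict(list)
    for x, y in dataset:
        by_x[x].append(y)
    return {x: Counter(ys).most_common(1)[0][0] for x, ys in by_x.items()}

# Assumed toy counts: rings <= 8 mostly male, rings > 8 mostly female.
data = ([("<=8", "male")] * 6 + [("<=8", "female")] * 2
        + [(">8", "female")] * 5 + [(">8", "male")] * 3)
print(min_error_predictor(data))  # -> {'<=8': 'male', '>8': 'female'}
```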

25
Q

What is overfitting in supervised learning?

A

When a predictor performs well on training data but poorly on unseen data.

26
Q

What does X × Y represent?

A

The joint space of features and labels.

27
Q

What does (X × Y)ⁿ represent?

A

A dataset of n labelled examples.

28
Q

What is exploratory data analysis used for?

A

Understanding dataset structure before training models.

29
Q

What is the goal when performing supervised learning?

A

Minimize prediction error on unseen data.