What is the goal of supervised learning?
Learn a predictor f: X → Y that maps features to labels using labelled data.
What is feature selection?
Choosing measurable quantities (features x¹,…,xᵈ) that help predict the output label.
What is a feature vector?
A vector x ∈ ℝᵈ representing the selected measurable attributes of an input.
What is a label in supervised learning?
A quantitative or categorical value y representing the target to be predicted.
What is a labelled dataset?
A set of pairs (x₁,y₁),…,(xₙ,yₙ) where each feature vector has a corresponding label.
What is a learning algorithm?
A map taking a dataset in (X × Y)ⁿ and outputting a predictor f: X → Y.
What makes a predictor good?
It should approximate the ‘true’ relationship between features and labels, though an exact true label function rarely exists.
Why does the ‘true’ predictor rarely exist?
Because labelers may disagree or the features do not capture all necessary information (plant location example).
What is the minimum error predictor?
A predictor that assigns labels to minimize misclassification error over the entire population.
Why can’t we compute the true minimum-error predictor directly?
It requires knowing the entire population distribution of (X,Y).
What assumption is made in statistical learning to avoid enumerating the whole population?
We assume the population is described by a probability distribution ρ ∈ Prob(X × Y).
What does sampling from the population correspond to?
Drawing samples from ρ, the assumed distribution over (X,Y).
What is the Bayes optimal predictor f*?
The predictor that minimizes expected loss: f*(x) ∈ argmin_z ∫ l(y,z) dρ(y|x).
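The definition above can be made concrete on a toy discrete problem. The sketch below (with made-up conditional probabilities, purely for illustration) computes f*(x) by minimizing the conditional expected loss over candidate labels z:

```python
import numpy as np

# Hypothetical toy problem: X in {0, 1}, Y in {0, 1}.
# Conditional probabilities P(Y=1 | X=x), chosen arbitrarily for illustration.
p_y1_given_x = {0: 0.2, 1: 0.7}

def bayes_predictor(x, loss):
    """Return argmin_z E[loss(Y, z) | X = x] for binary Y and z in {0, 1}."""
    p1 = p_y1_given_x[x]
    # Conditional expected loss of predicting z, for each candidate z.
    expected = [(1 - p1) * loss(0, z) + p1 * loss(1, z) for z in (0, 1)]
    return int(np.argmin(expected))

zero_one = lambda y, z: float(y != z)  # 0-1 loss

print(bayes_predictor(0, zero_one))  # 0, since P(Y=1|X=0) = 0.2 < 0.5
print(bayes_predictor(1, zero_one))  # 1, since P(Y=1|X=1) = 0.7 > 0.5
```

The integral in the definition collapses to a finite sum because Y is discrete here; the same argmin structure holds in general.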
What does indicator notation 1[⋯] mean?
It equals 1 if the condition is true, otherwise 0.
What does the Bayes classifier do in binary classification?
Predicts 1 if P(Y=1|X=x) exceeds a threshold determined by the loss function (1/2 under 0–1 loss); otherwise predicts 0.
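How the threshold depends on the loss can be shown directly: with (assumed, illustrative) costs `cost_fp` for a false positive and `cost_fn` for a false negative, predicting 1 has lower expected loss exactly when p₁ exceeds cost_fp / (cost_fp + cost_fn):

```python
def bayes_classifier(p1, cost_fp=1.0, cost_fn=1.0):
    """Threshold rule derived from comparing expected losses:
    predict 1 iff (1 - p1) * cost_fp < p1 * cost_fn,
    i.e. iff p1 > cost_fp / (cost_fp + cost_fn)."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return 1 if p1 > threshold else 0

print(bayes_classifier(0.7))                        # 1 (symmetric loss: threshold 0.5)
print(bayes_classifier(0.3, cost_fp=1, cost_fn=4))  # 1 (costly misses lower the threshold to 0.2)
```

With symmetric 0–1 loss the threshold is 1/2, recovering the standard Bayes classifier.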
Why is Bayes risk non-zero?
Because the conditional distributions overlap; even the optimal classifier misclassifies some points.
What is the disadvantage of k-NN compared to the Bayes classifier?
It is slow at prediction time, must store the entire training set, and can overfit (especially for small k).
What is the goal of decision theory in ML?
To compute f* that minimizes expected risk under the true distribution ρ.
What is risk R[f]?
The expected loss: ∫ l(y,f(x)) dρ(x,y).
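Since ρ is unknown, the integral is typically approximated by an average over samples. A minimal sketch of this empirical risk estimate (function names here are illustrative, not from the source):

```python
def zero_one_loss(y, z):
    return float(y != z)

def estimate_risk(f, sample, loss=zero_one_loss):
    """Empirical risk: average loss of f over pairs (x, y) sampled from rho."""
    return sum(loss(y, f(x)) for x, y in sample) / len(sample)

# Toy usage: a sign-based predictor on a small hand-made sample.
f = lambda x: 1 if x > 0 else 0
sample = [(1, 1), (2, 1), (-1, 0), (-2, 1)]
print(estimate_risk(f, sample))  # 0.25 (one mistake in four)
```

By the law of large numbers this average converges to R[f] as the sample grows.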
What alternative objective involving TPR and FPR can be used?
Maximize TPR subject to constraints on FPR (useful in imbalanced classification).
What is a False Positive Rate (FPR)?
P(f(X)=1 | Y=0).
What is a True Positive Rate (TPR)?
P(f(X)=1 | Y=1).
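Both rates are conditional probabilities, so their empirical versions are counts restricted to one class. A small sketch (the helper name is my own) computing them from labels and predictions:

```python
def tpr_fpr(y_true, y_pred):
    """Empirical TPR = TP / (# positives), FPR = FP / (# negatives)."""
    tp = sum(1 for y, z in zip(y_true, y_pred) if y == 1 and z == 1)
    fp = sum(1 for y, z in zip(y_true, y_pred) if y == 0 and z == 1)
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    return tp / pos, fp / neg

print(tpr_fpr([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5)
```

Note the denominators differ: TPR conditions on Y=1, FPR on Y=0, which is why the pair stays informative even when classes are imbalanced.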
Why do we use probabilistic modelling in supervised learning?
Because we cannot observe the entire population, so we model (X,Y) with a probability distribution.
How does the Abalone example illustrate minimum error prediction?
If the number of rings satisfies x ≤ 8, most abalones in the sample are male, so predicting f(x) = male minimizes empirical error.
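The minimum empirical error rule amounts to predicting the majority label among training points sharing the same feature value. A sketch with a made-up mini-dataset in the spirit of the Abalone example (not the real data):

```python
from collections import Counter, defaultdict

def majority_predictor(pairs):
    """Fit f(x) = most common label among training points with feature value x."""
    by_x = defaultdict(Counter)
    for x, y in pairs:
        by_x[x][y] += 1
    return {x: counts.most_common(1)[0][0] for x, counts in by_x.items()}

# Hypothetical data: x = (rings <= 8), y = sex.
data = [(True, "male"), (True, "male"), (True, "female"),
        (False, "female"), (False, "female"), (False, "male")]
f = majority_predictor(data)
print(f[True])   # male
print(f[False])  # female
```

Any other choice of label for a given x would disagree with more training points, hence increase empirical error.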