Week 2 Flashcards

(44 cards)

1
Q

What is the Bayes optimal classifier in binary classification?

A

The risk-minimising predictor f*(x) = 1[ P(Y=1|X=x) ≥ t ], where the threshold t is determined by the losses (t = 1/2 under 0–1 loss).

2
Q

What is the likelihood ratio used in the optimal rule?

A

ℒ(x) = ρ(x|Y=1) / ρ(x|Y=0).

3
Q

What form does the Bayes classifier take using the likelihood ratio?

A

f*(x) = 1[ ℒ(x) ≥ (π₀(l₀₁−l₀₀)) / (π₁(l₁₀−l₁₁)) ].

4
Q

What is a likelihood ratio test (LRT)?

A

A classifier of the form fη(x)=1[ℒ(x)≥η] with threshold η.

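A numeric sketch of cards 1–4 (the N(2,1) vs N(0,1) class-conditionals, equal priors, and 0–1 losses are illustrative assumptions, not from the cards):

```python
import math

def gauss(x, mu, var):
    # Univariate Gaussian density
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_lrt(x, pi0=0.5, pi1=0.5, l00=0.0, l01=1.0, l10=1.0, l11=0.0):
    # Likelihood ratio L(x) = rho(x|Y=1) / rho(x|Y=0), with assumed
    # class-conditionals N(2, 1) for Y=1 and N(0, 1) for Y=0
    L = gauss(x, 2.0, 1.0) / gauss(x, 0.0, 1.0)
    # Threshold from card 3: pi0 (l01 - l00) / (pi1 (l10 - l11))
    eta = (pi0 * (l01 - l00)) / (pi1 * (l10 - l11))
    return 1 if L >= eta else 0
```

With symmetric priors and 0–1 losses, η = 1 and the decision boundary sits halfway between the means, at x = 1.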
5
Q

What does the Neyman–Pearson lemma state?

A

If class-conditional densities are continuous, the classifier that maximises TPR subject to FPR ≤ α is an LRT with threshold chosen so FPR=α.

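An empirical sketch of the Neyman–Pearson recipe: pick the LRT threshold η from negative-class scores so the empirical FPR stays at or below α (the quantile rule and scores are illustrative assumptions):

```python
def np_threshold(neg_scores, alpha):
    # Pick eta so that the empirical FPR, |{i : score_i >= eta}| / n
    # over negative-class likelihood-ratio scores, is at most alpha.
    s = sorted(neg_scores, reverse=True)
    n = len(s)
    k = int(alpha * n)              # how many negatives may exceed eta
    if k == 0:
        return s[0] + 1.0           # eta above every negative score
    return (s[k - 1] + s[k]) / 2 if k < n else s[-1] - 1.0

def fpr(neg_scores, eta):
    # Fraction of negatives classified positive at threshold eta
    return sum(sc >= eta for sc in neg_scores) / len(neg_scores)
```

Raising α lowers η, trading false positives for a higher TPR, which traces out the ROC curve of the LRT family.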
6
Q

What are Type I and Type II errors?

A

Type I: a false positive (predicting 1 when Y=0). Type II: a false negative (predicting 0 when Y=1).

7
Q

What is True Positive Rate (TPR)?

A

P(f(X)=1 | Y=1).

8
Q

What is False Negative Rate (FNR)?

A

P(f(X)=0 | Y=1).

9
Q

What is False Positive Rate (FPR)?

A

P(f(X)=1 | Y=0).

10
Q

What is True Negative Rate (TNR)?

A

P(f(X)=0 | Y=0).

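The four rates from cards 7–10, computed from confusion-matrix counts (the counts below are hypothetical):

```python
def rates(tp, fp, fn, tn):
    # TPR = P(f=1|Y=1), FNR = P(f=0|Y=1), FPR = P(f=1|Y=0), TNR = P(f=0|Y=0)
    return {
        "TPR": tp / (tp + fn),
        "FNR": fn / (tp + fn),
        "FPR": fp / (fp + tn),
        "TNR": tn / (fp + tn),
    }
```

Note that TPR + FNR = 1 and FPR + TNR = 1, since each pair conditions on the same true class.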
11
Q

How can risk be decomposed using FPR and TPR?

A

R[f] = α·FPR − β·TPR + γ for constants α,β≥0.

12
Q

What is an alternative supervised learning goal beyond minimising risk?

A

Maximising TPR subject to constraint FPR ≤ α.

13
Q

What distinguishes discriminative from generative models?

A

Discriminative models learn the predictor f (or ρ(y|x)) directly; generative models model the joint ρ(x,y), typically via ρ(y) and ρ(x|y).

14
Q

What do generative models learn?

A

ρ(x|y), ρ(y) allowing computation of ρ(y|x) via Bayes’ rule.

15
Q

What does Linear Discriminant Analysis assume?

A

Classes have Gaussian class-conditional distributions with shared covariance Σ and different means μᵢ.

16
Q

What is Quadratic Discriminant Analysis?

A

A generative model like LDA but each class has its own covariance Σᵢ.

17
Q

What assumption does Naive Bayes make?

A

Conditional independence of features xⱼ given class y.

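A sketch of the Naive Bayes prediction rule from card 17: under conditional independence, the class-conditional density factorises over features, so the log-posterior score is a sum. The per-class Gaussian feature models and parameters are illustrative assumptions:

```python
import math

def gauss(x, mu, var):
    # Univariate Gaussian density for one feature
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def naive_bayes_predict(x, params, priors):
    # params[c] = list of (mu, var), one per feature; conditional
    # independence lets us sum per-feature log-densities.
    best, best_score = None, -math.inf
    for c, feats in params.items():
        score = math.log(priors[c]) + sum(
            math.log(gauss(xj, mu, var)) for xj, (mu, var) in zip(x, feats)
        )
        if score > best_score:
            best, best_score = c, score
    return best
```

With shared covariance across classes this score becomes linear in x (LDA); with per-class variances it is quadratic (QDA).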
18
Q

What loss corresponds to maximum likelihood estimation?

A

The log-loss l(x,y,θ)=−logρθ(x,y).

19
Q

What is empirical risk for maximum likelihood?

A

R̂(θ)=Σ_j −logρθ(x_j, y_j).
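A minimal illustration of cards 18–19: MLE is empirical risk minimisation under log-loss. The Bernoulli model and sample are illustrative assumptions:

```python
import math

def nll(p, ys):
    # Empirical risk under log-loss for a Bernoulli(p) model:
    # sum over samples of -log rho_p(y_j)
    return sum(-(math.log(p) if y == 1 else math.log(1 - p)) for y in ys)

ys = [1, 1, 1, 0]
# The minimiser of nll over p is the sample mean, i.e. the MLE
p_hat = sum(ys) / len(ys)
```

The empirical risk is lower at the MLE p̂ = 0.75 than at neighbouring parameter values.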

20
Q

Why are generative models hard in practice?

A

It’s difficult to specify a realistic density model ρ(x,y) for complex data.

21
Q

What is the purpose of regularisation?

A

To penalise model complexity and prevent overfitting by modifying the objective.

22
Q

What is explicit regularisation?

A

Adding λΩ(f) to empirical risk: J(f)=R̂(f)+λΩ(f).

23
Q

What does λ (lambda) control in regularisation?

A

The strength of the complexity penalty; it is a hyperparameter chosen with cross-validation.

24
Q

What is L2 regularisation?

A

Ω(θ)=‖θ‖₂² = Σ_j θⱼ² (ridge).

25
Q

What effect does L2 regularisation have?

A

Shrinks parameters toward zero but does not force them to be exactly zero.

26
Q

What is L1 regularisation?

A

Ω(θ)=‖θ‖₁ = Σ_j |θⱼ| (lasso).

27
Q

What effect does L1 regularisation have?

A

Encourages sparsity by driving some parameters exactly to zero.
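The contrast between cards 25 and 27 is visible in the one-dimensional closed forms (this proximal-operator view is an illustration, not from the cards): ridge rescales a coefficient, lasso soft-thresholds it to exactly zero.

```python
def ridge_shrink(theta, lam):
    # 1-D ridge: argmin_t (t - theta)^2 + lam * t^2
    # -> pure rescaling, never exactly zero for theta != 0
    return theta / (1 + lam)

def lasso_shrink(theta, lam):
    # 1-D lasso: argmin_t (1/2)(t - theta)^2 + lam * |t|
    # -> soft-thresholding, exactly zero for |theta| <= lam
    if theta > lam:
        return theta - lam
    if theta < -lam:
        return theta + lam
    return 0.0
```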
28
Q

How does cross-validation help choose λ?

A

Evaluate model performance on validation folds for different λ and choose the λ with the best generalisation.
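A sketch of the λ-selection loop from card 28 (fit, score, and the fold structure are hypothetical placeholders for a real training routine and validation metric):

```python
def choose_lambda(folds, fit, score, lambdas):
    # folds: list of (train, val) splits; fit(train, lam) -> model;
    # score(model, val) -> validation error (lower is better).
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = sum(score(fit(tr, lam), va) for tr, va in folds) / len(folds)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```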
29
Q

What problem does Lasso regression solve?

A

Minimising (1/n) Σⱼ (yⱼ − fθ(xⱼ))² + λ‖θ‖₁.
30
Q

What does a linear separator do?

A

Finds a hyperplane that correctly separates the two classes in binary classification.

31
Q

How is a separating hyperplane written in vector form?

A

wᵀx − b = 0.

32
Q

How does a linear classifier assign labels?

A

Predict +1 if wᵀx − b > 0 and −1 if wᵀx − b < 0.

33
Q

When is a point (x,y) correctly classified?

A

When y(wᵀx − b) > 0.
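Cards 32–33 as code (the weights, offset, and points below are illustrative):

```python
def predict(w, b, x):
    # Label +1 on the positive side of the hyperplane w^T x - b = 0, else -1
    s = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if s > 0 else -1

def correctly_classified(w, b, x, y):
    # (x, y) is correct iff y (w^T x - b) > 0
    return y * (sum(wi * xi for wi, xi in zip(w, x)) - b) > 0
```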
34
Q

What does SVM aim to maximise?

A

The margin: the distance between the decision boundary and the closest data point.

35
Q

What is hinge loss?

A

l(y,f(x)) = max(1 − y(wᵀx − b), 0).

36
Q

What objective does SVM minimise?

A

J(w,b) = (1/n) Σⱼ max(1 − yⱼ(wᵀxⱼ − b), 0) + (λ/2)‖w‖₂².
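Hinge loss and the SVM objective from cards 35–36 (the 1-D data and λ are illustrative):

```python
def hinge(y, w, b, x):
    # max(1 - y (w^T x - b), 0): zero only for points beyond the margin
    return max(1 - y * (sum(wi * xi for wi, xi in zip(w, x)) - b), 0.0)

def svm_objective(w, b, data, lam):
    # (1/n) sum_j hinge_j + (lam / 2) ||w||_2^2
    n = len(data)
    avg_hinge = sum(hinge(y, w, b, x) for x, y in data) / n
    return avg_hinge + (lam / 2) * sum(wi ** 2 for wi in w)
```

A point inside the margin (y(wᵀx − b) < 1) contributes a positive hinge loss even when it is on the correct side of the boundary.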
37
Q

Why include regularisation in SVM?

A

To penalise large weights and improve generalisation.

38
Q

What is a soft-margin SVM?

A

An SVM with finite C (λ > 0), allowing some misclassified or low-margin points.

39
Q

What is a hard-margin SVM?

A

An SVM in the limit of very large C (λ → 0), forcing perfect separation of the training data.

40
Q

How is the perceptron related to SVM?

A

The SVM objective with b = 0 and λ = 0, trained via stochastic gradient descent.
41
Q

What probability does logistic regression model?

A

P(y=1|x) = σ(wᵀx − b).

42
Q

What loss does logistic regression typically use?

A

Cross-entropy loss l(y,z) = −y log z − (1−y) log(1−z).

43
Q

What is the logistic loss for classification?

A

−1[y=1] log σ(wᵀx−b) − 1[y=0] log σ(−(wᵀx−b)).
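The logistic model and loss from cards 41–43 (the weights and inputs are illustrative):

```python
import math

def sigmoid(z):
    # Logistic function mapping scores to (0, 1)
    return 1 / (1 + math.exp(-z))

def logistic_loss(y, w, b, x):
    # y in {0, 1}; cross-entropy on p = sigma(w^T x - b)
    z = sum(wi * xi for wi, xi in zip(w, x)) - b
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))
```

At the decision boundary (z = 0) the model outputs p = 1/2 and the loss is log 2; confident correct predictions drive the loss toward zero.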
44
Q

How can linear models handle non-linear boundaries?

A

By applying feature maps x → φ(x) that add nonlinear transformations.
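A sketch of card 44 with a quadratic feature map: under φ(x) = (x₁, x₂, x₁² + x₂²), a circular boundary in the input space becomes a hyperplane in feature space (the specific map and weights are illustrative assumptions):

```python
def phi(x):
    # Lift 2-D input to 3-D: append the squared radius x1^2 + x2^2
    x1, x2 = x
    return [x1, x2, x1 ** 2 + x2 ** 2]

def predict(w, b, x):
    # Linear classifier in the lifted feature space
    s = sum(wi * fi for wi, fi in zip(w, phi(x))) - b
    return 1 if s > 0 else -1
```

With w = (0, 0, 1) and b = 1, the decision boundary in input space is the unit circle: points outside it are labelled +1, points inside −1.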