What is the Bayes optimal classifier in binary classification?
The predictor f*(x) = 1[ P(Y=1|X=x) ≥ 1/2 ] that minimises expected risk under 0-1 loss; a general loss shifts the threshold away from 1/2.
What is the likelihood ratio used in the optimal rule?
ℒ(x) = ρ(x|Y=1) / ρ(x|Y=0).
What form does the Bayes classifier take using the likelihood ratio?
f*(x) = 1[ ℒ(x) ≥ π₀(l01−l00) / (π₁(l10−l11)) ].
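The likelihood-ratio form of the Bayes classifier can be sketched in code. Everything concrete here is an assumption for illustration: 1-D Gaussian class conditionals N(0,1) and N(2,1), priors 0.7/0.3, and loss values l_yf (loss of predicting f when the true class is y) with an asymmetric false-negative cost.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Assumed example: rho(x|Y=0) = N(0,1), rho(x|Y=1) = N(2,1).
pi0, pi1 = 0.7, 0.3                       # priors P(Y=0), P(Y=1)
l00, l01, l10, l11 = 0.0, 1.0, 5.0, 0.0   # l_yf: loss of predicting f when truth is y

# Bayes-optimal LRT threshold from the formula above.
eta = (pi0 * (l01 - l00)) / (pi1 * (l10 - l11))

def bayes_classifier(x):
    lr = gauss_pdf(x, 2.0, 1.0) / gauss_pdf(x, 0.0, 1.0)  # likelihood ratio L(x)
    return (lr >= eta).astype(int)

print(bayes_classifier(np.array([-1.0, 1.0, 3.0])))  # -> [0 1 1]
```

For these Gaussians the likelihood ratio is exp(2x − 2), so the rule reduces to thresholding x itself; the costly false negatives (l10 = 5) pull the threshold toward predicting 1.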
What is a likelihood ratio test (LRT)?
A classifier of the form fη(x)=1[ℒ(x)≥η] with threshold η.
What does the Neyman–Pearson lemma state?
If class-conditional densities are continuous, the classifier that maximises TPR subject to FPR ≤ α is an LRT with threshold chosen so FPR=α.
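In practice the Neyman–Pearson threshold can be set empirically: pick η as the (1−α)-quantile of the scores on class-0 samples, so the empirical FPR is α. A small sketch under assumed toy data, where Gaussian scores stand in for the likelihood ratio ℒ(x) (thresholding is unchanged by any monotone transform of ℒ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy scores playing the role of L(x) on each class.
neg_scores = rng.normal(0.0, 1.0, size=10_000)   # scores on Y=0 samples
pos_scores = rng.normal(2.0, 1.0, size=10_000)   # scores on Y=1 samples

alpha = 0.05
eta = np.quantile(neg_scores, 1.0 - alpha)  # threshold so empirical FPR ~= alpha

fpr = np.mean(neg_scores >= eta)
tpr = np.mean(pos_scores >= eta)
print(f"FPR={fpr:.3f}, TPR={tpr:.3f}")
```

The resulting classifier maximises TPR among all tests whose FPR does not exceed α, which is the Neyman–Pearson guarantee.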
What are Type I and Type II errors?
Type I: false positive (predict 1 when Y=0). Type II: false negative (predict 0 when Y=1).
What is True Positive Rate (TPR)?
P(f(X)=1 | Y=1).
What is False Negative Rate (FNR)?
P(f(X)=0 | Y=1).
What is False Positive Rate (FPR)?
P(f(X)=1 | Y=0).
What is True Negative Rate (TNR)?
P(f(X)=0 | Y=0).
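The four rates above are conditional frequencies over the true class. A minimal sketch (the function name and toy labels are my own):

```python
import numpy as np

def rates(y_true, y_pred):
    """Empirical TPR, FNR, FPR, TNR as frequencies conditioned on the true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_true == 1, y_true == 0
    tpr = np.mean(y_pred[pos] == 1)   # P(f(X)=1 | Y=1)
    fnr = np.mean(y_pred[pos] == 0)   # P(f(X)=0 | Y=1)
    fpr = np.mean(y_pred[neg] == 1)   # P(f(X)=1 | Y=0)
    tnr = np.mean(y_pred[neg] == 0)   # P(f(X)=0 | Y=0)
    return tpr, fnr, fpr, tnr

print(rates([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # (2/3, 1/3, 1/2, 1/2)
```

Note TPR + FNR = 1 and FPR + TNR = 1, since each pair conditions on the same class.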
How can risk be decomposed using FPR and TPR?
R[f] = α·FPR − β·TPR + γ, where the constants α, β ≥ 0 and γ are determined by the class priors and loss values (e.g. under 0-1 loss, α = π₀, β = π₁, γ = π₁).
What is an alternative supervised learning goal beyond minimising risk?
Maximising TPR subject to constraint FPR ≤ α.
What distinguishes discriminative from generative models?
Discriminative models learn the predictor f (or ρ(y|x)) directly; generative models model the joint density ρ(x,y), typically as ρ(y) and ρ(x|y).
What do generative models learn?
ρ(x|y), ρ(y) allowing computation of ρ(y|x) via Bayes’ rule.
What does Linear Discriminant Analysis assume?
Classes have Gaussian class-conditional distributions with shared covariance Σ and different means μᵢ.
What is Quadratic Discriminant Analysis?
A generative model like LDA but each class has its own covariance Σᵢ.
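Both LDA and QDA fit Gaussian class conditionals by plug-in maximum likelihood and classify via the discriminant log πₖ − ½log|Σₖ| − ½(x−μₖ)ᵀΣₖ⁻¹(x−μₖ); the only difference is whether the covariance is pooled. A sketch on assumed toy 2-D data (function name and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed toy data: class 0 around (0,0), class 1 around (2,2), unit noise.
X0 = rng.normal([0, 0], 1.0, size=(200, 2))
X1 = rng.normal([2, 2], 1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

def fit_gaussian_da(X, y, shared_cov=True):
    """LDA (shared_cov=True) or QDA (shared_cov=False) via plug-in MLE."""
    classes = np.unique(y)
    priors = {k: np.mean(y == k) for k in classes}
    means = {k: X[y == k].mean(axis=0) for k in classes}
    if shared_cov:  # LDA: pool the per-class covariances, weighted by class size
        pooled = sum(np.cov(X[y == k].T, bias=True) * np.sum(y == k)
                     for k in classes) / len(y)
        covs = {k: pooled for k in classes}
    else:           # QDA: each class keeps its own covariance
        covs = {k: np.cov(X[y == k].T, bias=True) for k in classes}

    def predict(Xnew):
        scores = []
        for k in classes:
            inv = np.linalg.inv(covs[k])
            d = Xnew - means[k]
            g = (np.log(priors[k]) - 0.5 * np.log(np.linalg.det(covs[k]))
                 - 0.5 * np.einsum('ij,jk,ik->i', d, inv, d))  # quadratic form per row
            scores.append(g)
        return classes[np.argmax(scores, axis=0)]
    return predict

lda = fit_gaussian_da(X, y, shared_cov=True)
qda = fit_gaussian_da(X, y, shared_cov=False)
print(lda(np.array([[0.0, 0.0], [2.0, 2.0]])))  # -> [0 1]
```

With a shared covariance the quadratic terms cancel and the LDA boundary is linear in x; QDA's boundary is quadratic.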
What assumption does Naive Bayes make?
Conditional independence of features xⱼ given class y.
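Under conditional independence the class-conditional density factorises, ρ(x|y) = Πⱼ ρ(xⱼ|y), so each feature's distribution is estimated separately. A Bernoulli naive Bayes sketch on assumed binary toy features, with Laplace smoothing (all names and data are illustrative):

```python
import numpy as np

# Assumed toy data: rows are examples, columns are binary features.
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])

def fit_bernoulli_nb(X, y, alpha=1.0):
    classes = np.unique(y)
    log_prior = {k: np.log(np.mean(y == k)) for k in classes}
    # theta[k][j] = estimate of P(x_j = 1 | y = k), Laplace-smoothed
    theta = {k: (X[y == k].sum(axis=0) + alpha) / (np.sum(y == k) + 2 * alpha)
             for k in classes}

    def predict(Xnew):
        # log rho(y) + sum_j log rho(x_j | y), by the independence assumption
        scores = [log_prior[k] + Xnew @ np.log(theta[k])
                  + (1 - Xnew) @ np.log(1 - theta[k]) for k in classes]
        return classes[np.argmax(scores, axis=0)]
    return predict

nb = fit_bernoulli_nb(X, y)
print(nb(np.array([[1, 1, 0], [0, 0, 1]])))  # -> [0 1]
```

The factorisation reduces the estimation problem from one joint density over all features to one 1-D distribution per feature per class.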
What loss corresponds to maximum likelihood estimation?
The log-loss l(x,y,θ)=−logρθ(x,y).
What is empirical risk for maximum likelihood?
R̂(θ) = Σⱼ −log ρθ(xⱼ, yⱼ); its minimiser is exactly the maximum-likelihood estimate.
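The MLE-as-ERM view can be checked on a one-parameter example. Assuming a N(θ, 1) model for toy data, the empirical risk under log-loss is a sum of negative log densities, and its minimiser is the sample mean (the MLE):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(3.0, 1.0, size=1000)  # assumed toy sample, true mean 3

def empirical_risk(theta, x):
    """Sum of log-losses -log rho_theta(x_j) for the N(theta, 1) model."""
    return np.sum(0.5 * (x - theta) ** 2 + 0.5 * np.log(2 * np.pi))

# Minimise the empirical risk over a grid; this recovers the sample mean.
grid = np.linspace(0.0, 6.0, 601)
theta_hat = grid[np.argmin([empirical_risk(t, data) for t in grid])]
print(theta_hat, data.mean())
```

The grid minimiser agrees with the sample mean up to the grid resolution, illustrating that minimising log-loss empirical risk and maximising likelihood are the same optimisation.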
Why are generative models hard in practice?
It’s difficult to specify a realistic density model ρ(x,y) for complex data.
What is the purpose of regularisation?
To penalise model complexity and prevent overfitting by modifying the objective.
What is explicit regularisation?
Adding λΩ(f) to empirical risk: J(f)=R̂(f)+λΩ(f).
What does λ (lambda) control in regularisation?
The strength of the complexity penalty; it is a hyperparameter chosen with cross-validation.
What is L2 regularisation?
Ω(θ)=‖θ‖₂² = Σ_j θⱼ² (ridge).
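Ridge regression makes the regularised objective J concrete: minimising ‖Xw − y‖² + λ‖w‖² has the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. A sketch on assumed toy data, showing how λ shrinks the coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)
# Assumed toy regression problem: y = X w* + noise.
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

def ridge(X, y, lam):
    """Minimise ||Xw - y||^2 + lam * ||w||^2 via the normal equations."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_small = ridge(X, y, lam=0.01)    # weak penalty: close to least squares
w_large = ridge(X, y, lam=1000.0)  # strong penalty: coefficients shrink toward 0
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
```

Sweeping λ and scoring each fit on held-out data is the cross-validation procedure mentioned above for choosing the hyperparameter.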