What is the bias–variance trade-off?
Bias = error from oversimplification. Introduced by approximating real-world problems with an overly simplistic model. Wrong assumptions about data.
Variance = error from oversensitivity. Introduced by a model’s sensitivity to small fluctuations (noise) in the training data. Model is too complex and overfits the training data.
Increasing complexity ↓ bias but ↑ variance.
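A minimal sketch of this trade-off, using polynomial degree as the complexity knob on a hypothetical synthetic dataset (the `sin` target and noise level are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)  # noise-free target, proxy for generalization error

def fit_mse(degree):
    # Fit a polynomial of the given degree; return (train MSE, test MSE).
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 1: high bias (underfits). Degree 15: low bias, high variance (overfits noise).
for d in (1, 3, 15):
    tr, te = fit_mse(d)
    print(f"degree={d:2d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

Training error falls monotonically with degree, while test error is U-shaped: it improves from degree 1 to 3, then degrades again at degree 15.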
Why does overfitting occur?
Model learns noise or spurious patterns due to:
High complexity
Too few examples
Poor regularization
Data leakage
What is the difference between training error and generalization error?
Training error = performance on training data.
Generalization error = expected performance on unseen data.
What is cross-validation and why is it used?
Resampling method to estimate out-of-sample performance.
Reduces variance and avoids relying on a single validation split.
Explain k-fold vs stratified k-fold CV.
k-fold splits the data into k folds, training on k−1 and validating on the held-out fold, rotating through all k.
Stratified k-fold additionally preserves class proportions in every fold → better stability in classification tasks, especially under imbalance.
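A small sketch of the difference on an assumed 10%-positive toy dataset (standard scikit-learn splitters):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

y = np.array([0] * 90 + [1] * 10)        # 10% positive class, sorted labels
X = np.arange(len(y)).reshape(-1, 1)

# Positive-class rate in each validation fold:
plain = [y[val].mean() for _, val in KFold(5).split(X, y)]
strat = [y[val].mean() for _, val in StratifiedKFold(5).split(X, y)]
print("KFold           :", [round(r, 2) for r in plain])
print("StratifiedKFold :", [round(r, 2) for r in strat])
```

Plain k-fold on sorted labels leaves most folds with zero positives, while every stratified fold keeps the 10% rate.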
What is regularization? Give examples.
Techniques that reduce overfitting by penalizing or constraining model complexity:
L1 (sparsity)
L2 (weight shrinkage)
Dropout
Data augmentation
Early stopping
How does L1 differ from L2 regularization?
L1: promotes sparse weights; feature selection
L2: distributes weights; smoother optimization; prevents large weights
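A sketch of the sparsity difference, on assumed synthetic data where only 2 of 10 features are informative (Lasso = L1, Ridge = L2 in scikit-learn):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)  # only features 0 and 1 matter

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty
print("L1 zero coefficients:", int(np.sum(lasso.coef_ == 0)))  # most irrelevant ones
print("L2 zero coefficients:", int(np.sum(ridge.coef_ == 0)))  # none, only shrunk
```

L1 drives the irrelevant coefficients exactly to zero (implicit feature selection); L2 shrinks them toward zero but keeps them nonzero.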
What is the purpose of a validation set?
Hyperparameter tuning and model selection without contaminating the test set.
What is data leakage, and how do you prevent it?
Using information during training that would not be available at prediction time — e.g., fitting a scaler or selecting features on the full dataset before splitting.
Prevent it by splitting first and fitting all preprocessing inside a pipeline on the training data only.
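The leakage-free pattern in scikit-learn looks like this (all names are standard scikit-learn API; the dataset is just a convenient built-in):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# The scaler lives inside the pipeline, so cross_val_score refits it
# on each training split -- validation folds never influence its statistics.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print("mean CV accuracy:", scores.mean().round(3))
```

Calling `StandardScaler().fit(X)` on the full dataset before splitting would be the leaky version of the same workflow.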
How do you detect overfitting in practice?
Training loss ↓, validation loss ↑
Gap between training and validation metrics
Performance drop after deployment relative to offline validation
Why is early stopping a form of regularization?
Stops training before the model fits noise; limits effective complexity.
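A concrete sketch: scikit-learn's gradient boosting supports early stopping via a held-out validation fraction (`n_iter_no_change` and `validation_fraction` are real parameters; the dataset is an assumed toy problem):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = GradientBoostingClassifier(
    n_estimators=500,        # upper bound on boosting rounds
    n_iter_no_change=10,     # stop if validation score stalls for 10 rounds
    validation_fraction=0.2,
    random_state=0,
).fit(X, y)
print("rounds actually trained:", clf.n_estimators_)  # well below 500
```

The model stops long before the 500-round budget, capping its effective complexity.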
When would AUC be preferred over accuracy?
Class imbalance
Different decision thresholds
When ranking performance matters
What evaluation metric do you use for class imbalance?
F1-score
Balanced accuracy
ROC-AUC
Precision–Recall AUC
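A sketch of why plain accuracy misleads here, using an assumed 95%-negative dataset and a degenerate majority-class predictor:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # always predict the majority class

print("accuracy         :", accuracy_score(y_true, y_pred))            # 0.95
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))   # 0.5
print("F1               :", f1_score(y_true, y_pred, zero_division=0)) # 0.0
```

Accuracy rewards the useless predictor; balanced accuracy and F1 expose that it never finds a positive.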
Why do ML engineers care about calibration?
Probabilities should reflect true likelihood → critical for risk-sensitive tasks (credit scoring, medical decisions, agentic planning).
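A quick reliability check with scikit-learn's `calibration_curve` (the synthetic probabilities below are an assumption for illustration — labels are drawn so the stated probabilities are correct by construction):

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
probs = rng.uniform(0, 1, 5000)
y = (rng.uniform(0, 1, 5000) < probs).astype(int)  # labels match the stated probs

# A calibrated model has observed positive rate ~= mean predicted prob per bin.
frac_pos, mean_pred = calibration_curve(y, probs, n_bins=5)
for f, m in zip(frac_pos, mean_pred):
    print(f"mean predicted={m:.2f}  observed positive rate={f:.2f}")
```

A miscalibrated model's bins would drift off this diagonal, even if its ranking (AUC) were perfect.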
How do you choose between a simpler and more complex model?
Trade-off accuracy vs interpretability, training cost, robustness, risk of overfitting, and deployment constraints.
What is hyperparameter search, and what are common strategies?
Exploration of config space:
Grid search
Random search
Bayesian optimization
Hyperband / ASHA
Population-based training
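A minimal random-search sketch with scikit-learn's `RandomizedSearchCV` (real API; the search space and toy dataset are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
search = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": [0.01, 0.1, 1, 10, 100], "gamma": ["scale", "auto"]},
    n_iter=6,            # sample 6 of the 10 possible configurations
    cv=3,
    random_state=0,
).fit(X, y)
print("best params:", search.best_params_, "CV score:", round(search.best_score_, 3))
```

Swapping `RandomizedSearchCV` for `GridSearchCV` with the same space gives exhaustive grid search; Bayesian and Hyperband-style strategies live in other libraries.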
Why does dropout reduce overfitting?
Forces networks to not rely on specific neurons; ensemble averaging effect.
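A minimal numpy sketch of inverted dropout (not a framework API — the function name and scaling are the standard textbook formulation):

```python
import numpy as np

def dropout(a, p_drop, rng, training=True):
    if not training or p_drop == 0.0:
        return a  # at inference, inverted dropout is a no-op
    mask = rng.uniform(size=a.shape) >= p_drop
    # Rescale by 1/(1 - p_drop) so the expected activation is unchanged.
    return a * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.ones(10000)
out = dropout(a, p_drop=0.5, rng=rng)
print("kept fraction  ~0.5:", round(float((out > 0).mean()), 2))
print("mean preserved ~1.0:", round(float(out.mean()), 2))
```

Each forward pass samples a different mask, so training effectively averages over an ensemble of thinned subnetworks.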
Why is validation loss often noisier than training loss?
Validation set is smaller; no gradient smoothing; stochasticity in regularization (dropout, augmentation).
How does batch normalization affect model evaluation?
Different behavior in train vs eval mode (uses running averages in eval).
Incorrect mode → inflated or broken metrics.
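A minimal 1-D batch-norm sketch showing why the mode matters (hypothetical numpy implementation, not a framework API):

```python
import numpy as np

class BatchNorm1d:
    def __init__(self, momentum=0.1, eps=1e-5):
        self.running_mean, self.running_var = 0.0, 1.0
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training):
        if training:
            # Train mode: normalize with batch stats, update running averages.
            m, v = x.mean(), x.var()
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * m
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * v
        else:
            # Eval mode: use running averages, ignore the current batch.
            m, v = self.running_mean, self.running_var
        return (x - m) / np.sqrt(v + self.eps)

bn = BatchNorm1d()
batch = np.array([10.0, 12.0, 14.0])
train_out = bn(batch, training=True)    # ~zero-mean output
eval_out = bn(batch, training=False)    # stale running stats -> very different scale
print("train:", train_out.round(2), " eval:", eval_out.round(2))
```

Evaluating a barely-trained model in eval mode (or a trained model accidentally left in train mode, where single-example batch statistics are degenerate) produces exactly the broken metrics the answer describes.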
What is the impact of label noise on model selection?
Increases variance
Causes overfitting
Makes validation metrics unreliable
Need robust losses or cleaning strategies.
What is the double descent phenomenon?
Test error can descend a second time as capacity grows past the interpolation threshold (the point where the model fits the training data exactly), where classical bias–variance analysis predicts poor performance; overparameterized deep models often generalize well in this regime.
Why is the test set used exactly once?
Repeated evaluation leaks information into training → overly optimistic performance.
What is the difference between global and local interpretability?
Global: model-wide behavior (feature importance, coefficients)
Local: individual prediction explanations (SHAP, LIME)
How does SHAP compute feature attributions?
Uses Shapley values from cooperative game theory → average marginal contribution of each feature.
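An illustrative exact computation, enumerating coalitions for a tiny 2-feature linear model (the model, instance, and background dataset are assumptions; the SHAP library approximates this efficiently for real models):

```python
from itertools import permutations
from math import factorial
import numpy as np

background = np.array([[0.0, 0.0], [1.0, 1.0]])   # hypothetical reference dataset
x = np.array([2.0, 3.0])                          # instance to explain
f = lambda X: 4 * X[:, 0] + 1 * X[:, 1]           # linear model: Shapley values are exact

def value(subset):
    # Expected model output when only features in `subset` take x's values.
    Z = background.copy()
    for j in subset:
        Z[:, j] = x[j]
    return f(Z).mean()

n = len(x)
phi = np.zeros(n)
for order in permutations(range(n)):   # average marginal contribution over orderings
    seen = []
    for j in order:
        phi[j] += value(seen + [j]) - value(seen)
        seen.append(j)
phi /= factorial(n)

print("attributions:", phi)   # for a linear model: coef * (x - background mean)
# Efficiency axiom: attributions sum to f(x) minus the expected model output.
print("sum check:", phi.sum(), "==", f(x[None]).item() - f(background).mean())
```

The exact enumeration is exponential in the number of features, which is why SHAP relies on sampling and model-specific shortcuts (e.g., TreeSHAP) in practice.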