What is the general form of a statistical learning model?
Y = f(X) + ε, where f is the unknown function capturing the relationship between predictors X and response Y, and ε is a random error term independent of X with mean zero.
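This model can be simulated directly. A minimal sketch, assuming an illustrative true function f(x) = 2x + 1 and Gaussian noise (all specifics here are made up for the example):

```python
import numpy as np

# Simulate data from Y = f(X) + eps. The true f is "unknown" in practice;
# here we choose f(x) = 2x + 1 (an assumption) so we can generate data.
rng = np.random.default_rng(0)

def f(x):
    return 2 * x + 1  # assumed true regression function

n = 1000
X = rng.uniform(0, 1, size=n)
eps = rng.normal(loc=0.0, scale=0.1, size=n)  # mean-zero error, independent of X
Y = f(X) + eps
```

Because ε has mean zero, the simulated Y values are centered on f(X); a learning method only ever sees the (X, Y) pairs, never f or ε.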
What is the difference between a predictor and a response variable?
Predictors (X1, X2, … Xp) are input or independent variables used to explain or predict outcomes. The response variable (Y) is the output or dependent variable being predicted.
What are the two main reasons to estimate f?
Prediction and inference. Prediction uses f̂ to estimate Y when X is observed but Y is not. Inference aims to understand how Y is associated with each predictor.
What are reducible vs irreducible errors?
Reducible error comes from the inaccuracy of the estimate f̂ and can be reduced by choosing a better model. Irreducible error is the variance of the error term ε and cannot be eliminated, no matter how well f is estimated.
What is training data?
Training data is the dataset {(x1,y1),…,(xn,yn)} used to train a model so it can learn the relationship between predictors and response.
What is the difference between parametric and non-parametric methods?
Parametric methods assume a specific functional form for f and estimate parameters. Non-parametric methods do not assume a specific shape for f and instead learn it directly from data.
What is the Mean Squared Error (MSE) formula and what does it measure?
MSE = (1/n) Σ(yi − f̂(xi))². It measures the average squared difference between predicted and actual values.
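The formula translates directly to code. A minimal sketch with made-up values:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: average of (y_i - f_hat(x_i))^2."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

# Toy example: residuals are 1, -1, 0, so MSE = (1 + 1 + 0) / 3
result = mse([3.0, 5.0, 2.0], [2.0, 6.0, 2.0])
```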
Write the Bias-Variance decomposition of expected test MSE.
E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ε), where the expectation averages over many training sets. The three terms are the model's variance, its squared bias, and the irreducible error.
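The decomposition can be checked by Monte Carlo. A sketch under assumed settings: true f(x) = x², noise sd 0.5, and a deliberately biased linear fit, all chosen for illustration:

```python
import numpy as np

# Repeatedly draw training sets, fit a straight line to quadratic data,
# and record the prediction f-hat(x0) from each fit.
rng = np.random.default_rng(1)
true_f = lambda x: x ** 2      # assumed true function
sigma = 0.5                    # assumed noise standard deviation
x0, n, reps = 0.9, 50, 2000

preds = np.empty(reps)
for r in range(reps):
    x = rng.uniform(-1, 1, n)
    y = true_f(x) + rng.normal(0, sigma, n)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 (linear) fit
    preds[r] = slope * x0 + intercept        # f-hat(x0) for this training set

var_fhat = float(preds.var())                        # Var(f-hat(x0))
bias_sq = float((preds.mean() - true_f(x0)) ** 2)    # Bias(f-hat(x0))^2
decomposed = var_fhat + bias_sq + sigma ** 2         # right-hand side

# Direct estimate of E[(y0 - f-hat(x0))^2] for comparison
y0 = true_f(x0) + rng.normal(0, sigma, reps)
direct = float(np.mean((y0 - preds) ** 2))
```

The direct estimate and the sum of the three terms should agree up to simulation noise; here the squared bias dominates because a linear model cannot capture x².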
What is the KNN classification probability formula?
Pr(Y = j | X = x0) = (1/K) Σ_{i∈N₀} I(yi = j), where N₀ is the set of the K training points nearest x0. It estimates the class-j probability as the fraction of those K neighbors with label j.
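A minimal sketch of this estimate on toy 1-D data (the data and K = 3 are assumptions for illustration):

```python
import numpy as np

def knn_prob(x0, X, y, j, K):
    """Fraction of the K nearest neighbors of x0 with label j."""
    idx = np.argsort(np.abs(X - x0))[:K]   # indices of the K nearest points
    return float(np.mean(y[idx] == j))

X = np.array([0.1, 0.2, 0.3, 0.8, 0.9])
y = np.array([0, 0, 1, 1, 1])

# Neighbors of 0.15 are 0.1, 0.2, 0.3 with labels 0, 0, 1
p = knn_prob(0.15, X, y, j=0, K=3)
```

The KNN classifier then assigns x0 to whichever class j maximizes this estimated probability.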
What is the Bayes error rate formula?
1 − E[max_j Pr(Y=j | X)], where the expectation is over X. It is the lowest expected classification error any classifier can achieve, attained by the Bayes classifier.
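When the conditional probabilities are known, the Bayes error rate can be computed exactly. A toy two-class sketch where X takes three values with assumed probabilities (all numbers invented for the example):

```python
import numpy as np

# Distribution of X over its three possible values (assumed)
p_x = np.array([0.5, 0.3, 0.2])
# Pr(Y = 1 | X = x) for each value of X (assumed)
p1_given_x = np.array([0.9, 0.5, 0.2])

# max_j Pr(Y = j | X = x): the Bayes classifier picks the larger class
p_max = np.maximum(p1_given_x, 1 - p1_given_x)

# Bayes error rate: 1 - E[max_j Pr(Y = j | X)]
bayes_error = 1 - float(np.sum(p_x * p_max))
```

With these numbers p_max = [0.9, 0.5, 0.8], so the Bayes error rate is 1 − 0.76 = 0.24; no classifier can do better on this distribution.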
What are the assumptions of parametric methods?
Parametric methods assume a specific functional form for the relationship between predictors and response, such as linearity.
When does irreducible error exist?
Always. It arises from unmeasured variables, inherent randomness, or noise in the system that cannot be modeled.
What is bias and what is variance in a model?
Bias is error due to incorrect assumptions about the model form (underfitting). Variance is error due to sensitivity to training data fluctuations (overfitting).
What is the shape of the test MSE curve as model flexibility increases?
A U-shaped curve. Initially test MSE decreases as bias falls, then increases as variance dominates.
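The U shape can be reproduced by sweeping polynomial degree on simulated data. A sketch under assumed settings (true f(x) = sin(2πx), noise sd 0.3, 30 training points):

```python
import numpy as np

rng = np.random.default_rng(2)
true_f = lambda x: np.sin(2 * np.pi * x)   # assumed true function

def make_data(n):
    x = rng.uniform(0, 1, n)
    return x, true_f(x) + rng.normal(0, 0.3, n)

x_tr, y_tr = make_data(30)     # small training set
x_te, y_te = make_data(1000)   # large test set

train_mse, test_mse = [], []
for deg in range(1, 11):       # increasing flexibility
    coefs = np.polyfit(x_tr, y_tr, deg)
    train_mse.append(float(np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2)))
    test_mse.append(float(np.mean((y_te - np.polyval(coefs, x_te)) ** 2)))
```

Training MSE keeps falling as degree grows, while test MSE first falls (bias shrinks) and then rises again (variance dominates), tracing the U.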
What happens to bias and variance as K increases in KNN?
As K increases, bias increases and variance decreases. As K decreases, bias decreases and variance increases.
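A small sketch of this tradeoff with KNN regression on assumed 1-D data (true f(x) = sin(3x), noise sd 0.2): at K = n the prediction collapses to the global training mean, the maximally biased constant fit.

```python
import numpy as np

rng = np.random.default_rng(3)
true_f = lambda x: np.sin(3 * x)   # assumed true function
x_tr = rng.uniform(0, 3, 100)
y_tr = true_f(x_tr) + rng.normal(0, 0.2, 100)
x_te = rng.uniform(0, 3, 500)
y_te = true_f(x_te) + rng.normal(0, 0.2, 500)

def knn_predict(x0, K):
    """Average response of the K training points nearest x0."""
    idx = np.argsort(np.abs(x_tr - x0))[:K]
    return y_tr[idx].mean()

def knn_test_mse(K):
    preds = np.array([knn_predict(x, K) for x in x_te])
    return float(np.mean((y_te - preds) ** 2))
```

A moderate K (here around 5) should beat both extremes: K = 1 tracks the noise (high variance), while K = 100 ignores the local structure entirely (high bias).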
Prediction vs Inference — how do they differ?
Prediction focuses on accurate predictions and often uses flexible models. Inference focuses on understanding relationships between predictors and response.
Parametric vs Non-parametric methods — pros and cons?
Parametric methods are simpler and require less data but risk incorrect assumptions. Non-parametric methods are flexible but require more data and may overfit.
Supervised vs Unsupervised learning — what’s the difference?
Supervised learning uses labeled data with both X and Y. Unsupervised learning uses only X and seeks patterns or structure in data.
Regression vs Classification — when do you use each?
Regression is used when the response variable is continuous. Classification is used when the response variable is categorical.
Flexibility vs Interpretability in models?
As model flexibility increases, interpretability generally decreases. Simple linear models are interpretable; complex models like deep learning are less interpretable.
Bayes Classifier vs KNN — how are they related?
The Bayes classifier is the theoretical optimal classifier using true probabilities. KNN approximates these probabilities using nearby observations.
Why is minimizing training MSE not sufficient for model selection?
Because a model may overfit training data. The goal is to minimize test MSE, which reflects performance on unseen data.
When would you prefer a less flexible model?
When interpretability is important, when the true relationship is simple, when data is limited, or when avoiding overfitting.
What is the No Free Lunch theorem in statistics?
No single algorithm performs best for all datasets. Model performance depends on the specific data and problem.