Chapter 2 - Statistical Learning Flashcards

(28 cards)

1
Q

What is the general form of a statistical learning model?

A

Y = f(X) + ε, where f is the unknown function capturing the relationship between predictors X and response Y, and ε is a random error term independent of X with mean zero.
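This generating model is easy to simulate directly; the linear f(x) = 3x + 1 and the noise scale below are invented purely for illustration:

```python
import numpy as np

# Simulate Y = f(X) + eps for an invented f(x) = 3x + 1.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, 1000)
eps = rng.normal(0, 1, 1000)   # mean-zero noise, independent of X
Y = 3 * X + 1 + eps            # systematic part plus irreducible noise

# The sample mean of eps is near zero, as the model assumes.
print(round(float(np.mean(eps)), 1))
```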

2
Q

What is the difference between a predictor and a response variable?

A

Predictors (X1, X2, … Xp) are input or independent variables used to explain or predict outcomes. The response variable (Y) is the output or dependent variable being predicted.

3
Q

What are the two main reasons to estimate f?

A

Prediction and inference. Prediction uses the estimate f̂ to predict Y when only X is known. Inference aims to understand how Y is associated with the predictors.

4
Q

What are reducible vs irreducible errors?

A

Reducible error comes from inaccuracies in the estimated function f̂ and can be reduced with better models. Irreducible error is the variance of the error term ε and cannot be eliminated.

5
Q

What is training data?

A

Training data is the dataset {(x1,y1),…,(xn,yn)} used to train a model so it can learn the relationship between predictors and response.

6
Q

What is the difference between parametric and non-parametric methods?

A

Parametric methods assume a specific functional form for f and estimate parameters. Non-parametric methods do not assume a specific shape for f and instead learn it directly from data.

7
Q

What is the Mean Squared Error (MSE) formula and what does it measure?

A

MSE = (1/n) Σ(yi − f̂(xi))². It measures the average squared difference between predicted and actual values.
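The formula maps one-to-one onto code; `mse` is just an illustrative helper name:

```python
import numpy as np

def mse(y, y_hat):
    """Average squared difference between actual and predicted values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

# Errors of 1 and 3 give MSE = (1 + 9) / 2 = 5
print(mse([2.0, 4.0], [3.0, 1.0]))  # 5.0
```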

8
Q

Write the Bias-Variance decomposition of expected test MSE.

A

E(y0 − f̂(x0))² = Var(f̂(x0)) + Bias(f̂(x0))² + Var(ε). These represent model variance, squared bias, and irreducible error.
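The identity can be checked by Monte Carlo; the numbers below (true f(x0) = 2, f̂ ~ N(2.5, 0.3²), ε ~ N(0, 1)) are invented for the check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
f_true = 2.0                     # invented true value f(x0)
f_hat = rng.normal(2.5, 0.3, n)  # estimates over many training sets (bias 0.5)
eps = rng.normal(0.0, 1.0, n)    # irreducible noise
y0 = f_true + eps

lhs = float(np.mean((y0 - f_hat) ** 2))       # expected test MSE
rhs = float(np.var(f_hat)                     # variance of f_hat
            + (np.mean(f_hat) - f_true) ** 2  # squared bias
            + np.var(eps))                    # irreducible error
print(round(lhs, 2), round(rhs, 2))  # both ≈ 0.09 + 0.25 + 1.00 = 1.34
```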

9
Q

What is the KNN classification probability formula?

A

Pr(Y=j | X=x0) = (1/K) Σ_{i∈N0} I(yi = j), where N0 is the set of the K training points nearest to x0. It estimates the class-j probability as the fraction of those K nearest neighbors belonging to class j.
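A from-scratch sketch of this estimate (the helper name `knn_class_prob` and the toy data are made up):

```python
import numpy as np

def knn_class_prob(X_train, y_train, x0, K, j):
    """Fraction of the K nearest neighbors of x0 whose label equals j."""
    X_train = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x0, dtype=float), axis=1)
    nearest = np.argsort(dists)[:K]
    return float(np.mean(np.asarray(y_train)[nearest] == j))

# Invented 1-D toy data: class 0 clustered near 0, class 1 near 1.
X = [[0.0], [0.1], [0.2], [1.0], [1.1]]
y = [0, 0, 0, 1, 1]
print(knn_class_prob(X, y, [0.15], K=3, j=0))  # 1.0: all 3 nearest are class 0
```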

10
Q

What is the Bayes error rate formula?

A

1 − E[max_j Pr(Y=j | X)]. It represents the lowest possible classification error achievable.
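For a toy discrete-X problem with invented probabilities, the rate can be computed directly:

```python
import numpy as np

# Bayes error for an invented 2-class problem where X takes three values.
p_x = np.array([0.5, 0.3, 0.2])         # Pr(X = x)
p1_given_x = np.array([0.9, 0.4, 0.5])  # Pr(Y = 1 | X = x)

# At each x the Bayes classifier picks the majority class, so the
# conditional error is 1 - max_j Pr(Y = j | X = x); average over X.
cond_err = 1 - np.maximum(p1_given_x, 1 - p1_given_x)
bayes_error = float(np.sum(p_x * cond_err))
print(round(bayes_error, 2))  # 0.5*0.1 + 0.3*0.4 + 0.2*0.5 = 0.27
```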

11
Q

What are the assumptions of parametric methods?

A

Parametric methods assume a specific functional form for the relationship between predictors and response, such as linearity.

12
Q

When does irreducible error exist?

A

Always. It arises from unmeasured variables, inherent randomness, or noise in the system that cannot be modeled.

13
Q

What is bias and what is variance in a model?

A

Bias is error due to incorrect assumptions about the model form (underfitting). Variance is error due to sensitivity to training data fluctuations (overfitting).

14
Q

What is the shape of the test MSE curve as model flexibility increases?

A

A U-shaped curve. Initially test MSE decreases as bias falls, then increases as variance dominates.

15
Q

What happens to bias and variance as K increases in KNN?

A

As K increases, bias increases and variance decreases. As K decreases, bias decreases and variance increases.
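The two extremes show up clearly in training error on a made-up 1-D dataset, where the true label is 1(x > 0.5):

```python
import numpy as np

# Invented 1-D classification data; the true label is 1(x > 0.5).
X = np.linspace(0, 1, 30)
y = (X > 0.5).astype(int)

def knn_predict(x0, K):
    """Majority vote among the K training points nearest to x0."""
    idx = np.argsort(np.abs(X - x0))[:K]
    return int(np.mean(y[idx]) >= 0.5)

def train_error(K):
    return float(np.mean([knn_predict(x, K) != t for x, t in zip(X, y)]))

# K = 1 memorises the training set (low bias, high variance);
# K = 30 votes over everything and misclassifies one whole class (high bias).
print(train_error(1), train_error(30))  # 0.0 0.5
```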

16
Q

Prediction vs Inference — how do they differ?

A

Prediction focuses on accurate predictions and often uses flexible models. Inference focuses on understanding relationships between predictors and response.

17
Q

Parametric vs Non-parametric methods — pros and cons?

A

Parametric methods are simpler and require less data but risk incorrect assumptions. Non-parametric methods are flexible but require more data and may overfit.

18
Q

Supervised vs Unsupervised learning — what’s the difference?

A

Supervised learning uses labeled data with both X and Y. Unsupervised learning uses only X and seeks patterns or structure in data.

19
Q

Regression vs Classification — when do you use each?

A

Regression is used when the response variable is continuous. Classification is used when the response variable is categorical.

20
Q

Flexibility vs Interpretability in models?

A

As model flexibility increases, interpretability generally decreases. Simple linear models are interpretable; complex models like deep learning are less interpretable.

21
Q

Bayes Classifier vs KNN — how are they related?

A

The Bayes classifier is the theoretical optimal classifier using true probabilities. KNN approximates these probabilities using nearby observations.

22
Q

Why is minimizing training MSE not sufficient for model selection?

A

Because a model may overfit training data. The goal is to minimize test MSE, which reflects performance on unseen data.
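A quick sketch of the gap: on invented data from a noisy line, a degree-9 polynomial interpolates the 10 training points (near-zero training MSE) yet usually does much worse on fresh data than a simple linear fit:

```python
import numpy as np

# Invented data: y = 2x + noise, 10 training points, 50 test points.
rng = np.random.default_rng(1)
x_tr = np.linspace(0, 1, 10)
y_tr = 2 * x_tr + rng.normal(0, 0.3, 10)
x_te = np.linspace(0.05, 0.95, 50)
y_te = 2 * x_te + rng.normal(0, 0.3, 50)

def errors(deg):
    """Training and test MSE for a polynomial fit of the given degree."""
    coefs = np.polyfit(x_tr, y_tr, deg)
    tr = float(np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2))
    te = float(np.mean((y_te - np.polyval(coefs, x_te)) ** 2))
    return tr, te

for deg in (1, 9):
    tr, te = errors(deg)
    print(deg, round(tr, 4), round(te, 4))
```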

23
Q

When would you prefer a less flexible model?

A

When interpretability is important, when the true relationship is simple, when data is limited, or when avoiding overfitting.

24
Q

What is the No Free Lunch theorem in statistics?

A

No single algorithm performs best for all datasets. Model performance depends on the specific data and problem.

25

Q

Can a model with very low training error be a bad model? Why?

A

Yes. It may overfit the training data, capturing noise rather than true patterns, resulting in poor performance on new data.

26

Q

Why is the Bayes classifier never used in practice?

A

It requires knowledge of the true conditional probability distribution Pr(Y|X), which is unknown for real-world data.

27

Q

If training data for KNN doubles, what happens to bias and variance?

A

With K fixed, the K nearest neighbors lie closer to x0, so bias decreases; variance stays roughly the same, since each prediction still averages K observations.

28

Q

Why does a very rough thin-plate spline overfit?

A

With low smoothness constraints, the spline fits every training point perfectly, capturing noise instead of the true underlying pattern.