Statistical Learning Flashcards by Daniel Baron

What are predictors (features)?

The input variables x1, x2, …, xp: the (ideally independent) variables used to predict y.

What is the response variable?

The output y. Also called the target or dependent variable. The thing we are trying to predict.

What is the goal of statistical learning?

Estimate a function f(x) that relates predictors to response.

What is the general statistical model?

Y=f(X)+ε where ε is random error.

Why can’t we perfectly predict Y?

Because of irreducible error ε.
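
As a rough illustration of this card, the sketch below simulates Y = f(X) + ε and scores the true f itself. The function `f`, the noise level `sigma`, and the sample size are assumptions chosen only for the example; they do not come from the cards.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def f(x):
    # Hypothetical "true" function, chosen only for illustration
    return 2.0 + 3.0 * x

sigma = 1.0                                   # assumed standard deviation of the noise
x = rng.uniform(0.0, 1.0, size=10_000)
eps = rng.normal(0.0, sigma, size=x.size)     # the random error term
y = f(x) + eps                                # Y = f(X) + eps

# Even predicting with the true f leaves an error of about sigma**2
mse_of_true_f = np.mean((y - f(x)) ** 2)
print(mse_of_true_f)
```

Because the prediction here is the true f, whatever error remains is due to ε alone, which is why no choice of model can remove it.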

What type of response variable gives a regression problem?

Quantitative (numeric).

What type of response variable gives a classification problem?

Qualitative (categorical).

Give 2 examples of regression problems.

House price prediction, life expectancy prediction.

Give 2 examples of classification problems.

Fraud detection, disease diagnosis.

What is a linear regression model with p predictors?

Y = β0 + β1X1 + ⋯ + βpXp + ε

What do the coefficients βj represent?

The average change in Y for a 1-unit increase in Xj (the slope for that feature), holding the other predictors fixed.
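
A minimal sketch of the interpretation above, assuming a least-squares fit with NumPy. The data, the values in `beta_true`, and the helper `predict` are hypothetical and exist only to make the example runnable.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

n = 200
X = rng.normal(size=(n, 2))                       # two predictors, X1 and X2
beta_true = np.array([1.0, 2.0, -0.5])            # hypothetical beta0, beta1, beta2
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0.0, 0.1, size=n)

# Add a column of ones for the intercept, then solve the least-squares problem
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(x1, x2):
    # Prediction from the fitted linear model
    return beta_hat[0] + beta_hat[1] * x1 + beta_hat[2] * x2

# Raise X1 by one unit while holding X2 fixed: the prediction moves by beta_hat[1]
delta = predict(1.0, 0.3) - predict(0.0, 0.3)
print(beta_hat, delta)
```

The difference `delta` equals the fitted coefficient on X1 regardless of the value at which X2 is held.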

What assumption does linear regression make?

The relationship between predictors and response is linear.

What is polynomial regression?

A regression model that includes powers of a predictor up to some degree d (e.g. Y = β0 + β1X + β2X^2 + ⋯ + βdX^d). It is still linear in the coefficients.
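
One way to fit such a model is sketched below with `numpy.polynomial.polynomial.polyfit`, which returns coefficients from the lowest power upward. The cubic degree, the values in `beta_true`, and the simulated data are assumptions made for the illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(2)  # fixed seed for reproducibility

x = rng.uniform(-2.0, 2.0, size=300)
beta_true = np.array([1.0, -2.0, 0.5, 0.3])   # hypothetical beta0..beta3
# polyval evaluates beta0 + beta1*x + beta2*x**2 + beta3*x**3
y = P.polyval(x, beta_true) + rng.normal(0.0, 0.1, size=x.size)

# Least-squares fit of a degree-3 polynomial in the single predictor x
beta_hat = P.polyfit(x, y, deg=3)
print(beta_hat)
```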

What is Mean Squared Error (MSE)?

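The average of the squared errors, i.e. the squared differences between outcomes and predictions. For outcomes u = (u_1, …, u_p) and predictions v = (v_1, …, v_p), where p here is the number of values being compared:

MSE = (1/p) * ∑ (u_i − v_i)^2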
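
The formula can be sketched in a few lines of NumPy. The function name `mse` and the small vectors are illustrative choices, not part of the cards.

```python
import numpy as np

def mse(u, v):
    """Mean squared error between outcomes u and predictions v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if u.shape != v.shape:
        raise ValueError("u and v must have the same shape")
    return float(np.mean((u - v) ** 2))

outcomes = [3.0, 5.0, 2.0]
predictions = [2.0, 5.0, 4.0]
# Errors are 1, 0, -2, so the squared errors are 1, 0, 4 and their mean is 5/3
print(mse(outcomes, predictions))
```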

Why square the errors in MSE?

Squaring penalizes large errors more heavily and makes every term non-negative, so positive and negative errors cannot cancel out.

What is MAE?

Mean Absolute Error. The average of the absolute differences between outcomes and predictions. For outcomes u = (u_1, …, u_p) and predictions v = (v_1, …, v_p):

MAE = (1/p) * ∑ |u_i − v_i|

Which metric penalizes large errors more, MSE or MAE?

MSE, because the errors are squared.
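
A small numeric comparison makes the difference concrete. The helper names `mse` and `mae` and the toy error vectors are assumptions for the sketch.

```python
import numpy as np

def mse(u, v):
    # Mean of squared differences
    return float(np.mean((np.asarray(u) - np.asarray(v)) ** 2))

def mae(u, v):
    # Mean of absolute differences
    return float(np.mean(np.abs(np.asarray(u) - np.asarray(v))))

outcomes = np.zeros(4)
small_errors = np.array([1.0, 1.0, 1.0, 1.0])    # every prediction is off by 1
one_outlier = np.array([1.0, 1.0, 1.0, 10.0])    # one prediction is off by 10

print(mae(outcomes, small_errors), mse(outcomes, small_errors))  # 1.0 1.0
print(mae(outcomes, one_outlier), mse(outcomes, one_outlier))    # 3.25 25.75
```

The single large error raises MAE from 1.0 to 3.25 but raises MSE from 1.0 to 25.75.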

What is training data used for?

To fit (train) the model y = f(x), obtaining a learned model y = f_hat(x).

What is testing data used for?

Evaluating the accuracy of the learned model on data that was not used to fit it.
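
A minimal sketch of separating training data from testing data, assuming a random 80/20 split. The split ratio, the simulated data, and the variable names are illustrative assumptions, not something the cards prescribe.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed for reproducibility

n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.5, size=n)

# Shuffle the row indices, then hold out the first 20% as testing data
indices = rng.permutation(n)
n_test = n // 5
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train, y_train = X[train_idx], y[train_idx]   # used to fit f_hat
X_test, y_test = X[test_idx], y[test_idx]       # used only to evaluate f_hat
print(X_train.shape, X_test.shape)
```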

What is training error used for?

To quantify the difference between the true response and the model's predicted response on the data the model was fit to.

Why is testing error important?

It reflects how the model performs on data it has never seen before.

We always expect training error to be ___ testing error.

Less than

What is overfitting?

When the model matches the training data so closely that it ends up with low training error but high testing error.

Why does overfitting happen?

The model is too flexible and captures noise: it picks up random trends in the training data rather than just the underlying pattern.
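
The effect can be explored with a small experiment: fit polynomials of increasing degree to a few noisy points and compare training and testing MSE. Everything specific here (the sine-shaped `f`, the noise level, the sample sizes, and the degrees 1, 3 and 10) is an assumption chosen for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(5)  # fixed seed for reproducibility

def f(x):
    # Hypothetical true function, for illustration only
    return np.sin(2.0 * np.pi * x)

def sample(n):
    # Draw n observations from Y = f(X) + eps
    x = rng.uniform(0.0, 1.0, size=n)
    return x, f(x) + rng.normal(0.0, 0.3, size=n)

x_train, y_train = sample(15)      # small training set
x_test, y_test = sample(1000)      # large set of unseen data

def mse(u, v):
    return float(np.mean((u - v) ** 2))

errors = {}
for degree in (1, 3, 10):          # rigid -> flexible
    model = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    print(degree, errors[degree])  # (training MSE, testing MSE)
```

Training MSE cannot increase as the degree grows, because each larger polynomial contains the smaller ones as special cases. Testing MSE has no such guarantee, and a large gap between the two for the most flexible fit is the signature of overfitting.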

Flexible models have ___ bias.

Low

Rigid models have ___ bias.

High

Flexible models have ___ variance.

High

Rigid models have ___ variance.

Low

What is the bias-variance tradeoff?

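Obtaining a small testing MSE requires a model f_hat with both low bias and low variance. More flexibility lowers bias but raises variance (and vice versa), so the two must be balanced.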
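
For reference, the standard decomposition of the expected testing MSE at a single point can be written as follows. The symbols x_0 and y_0 (a test input and its response) are notation introduced here and do not appear on the cards.

```latex
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \mathrm{Var}\left(\hat{f}(x_0)\right)
  + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2
  + \mathrm{Var}(\varepsilon)
```

The first two terms together are the reducible error, and Var(ε) is the irreducible error, so the expected testing MSE can never fall below Var(ε).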

What is irreducible error?

Error from the noise ε that cannot be eliminated or reduced, because the noise in the data cannot be predicted by any model.

What is reducible error?

Error that we can, in theory, reduce with a better modelling choice.

What is variance?

A measure of how much f_hat would change if it were fit to a different training data set.

What is bias?

The error introduced by approximating the true f with f_hat: the systematic deviation from the truth.
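
Bias and variance can be estimated by simulation: refit a model on many fresh training sets and look at its predictions at one point. The sketch below assumes a quadratic true function, a fixed grid of inputs, and a comparison between a straight line (rigid) and a quadratic (flexible); all of these choices, and the helper `bias_sq_and_variance`, are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(4)  # fixed seed for reproducibility

def f(x):
    # Hypothetical true function, for illustration only
    return x ** 2

sigma = 0.3                                # assumed noise standard deviation
x_train = np.linspace(-1.0, 1.0, 20)       # same inputs in every training set
x0 = 1.0                                   # point at which predictions are compared
n_repeats = 2000                           # number of simulated training sets

def bias_sq_and_variance(degree):
    # Refit the model on fresh noisy responses and record f_hat(x0) each time
    preds = np.empty(n_repeats)
    for r in range(n_repeats):
        y_train = f(x_train) + rng.normal(0.0, sigma, size=x_train.size)
        coefs = P.polyfit(x_train, y_train, degree)
        preds[r] = P.polyval(x0, coefs)
    bias_sq = (preds.mean() - f(x0)) ** 2  # squared systematic deviation from the truth
    variance = preds.var()                 # spread of f_hat(x0) across training sets
    return bias_sq, variance

bias_sq_rigid, var_rigid = bias_sq_and_variance(1)
bias_sq_flexible, var_flexible = bias_sq_and_variance(2)
print(bias_sq_rigid, var_rigid)
print(bias_sq_flexible, var_flexible)
```

In this setup the straight line cannot represent the curve, so its squared bias is larger, while the quadratic has more coefficients to estimate from the same data, so its variance is larger.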

The sample size n is large and there are few predictors (p small): ___

Flexible model. With many observations and few predictors, it is harder to overfit.

Many predictors (p large), few observations (n small): ___

Inflexible (rigid) model. A flexible model is more likely to overfit when there are so few observations.

When the relationship between predictors and response is highly nonlinear: ___

Flexible model. An inflexible model is less likely to capture the nonlinearity.

What is the ranking of methods from high to low interpretability?

1. Subset selection, lasso
2. Least squares
3. Generalized additive models, trees
4. Bagging, boosting
5. Support vector machines, deep learning

Statistical Learning Flashcards

(37 cards)