Linear Regression Flashcards

(13 cards)

1
Q

What is bias-variance tradeoff?
Linear Regression vs. Deep NN, which has higher bias/variance?

A

Bias: Error due to wrong assumptions → underfitting.

Variance: Error due to sensitivity to data fluctuations → overfitting.

Linear Regression: high bias, low variance
Deep NN: low bias, high variance

2
Q

Linear Regression equation, what optimisation method is used?

A

y ≈ β₀ + β₁x₁ + … + βₙxₙ
Optimised with Ordinary Least Squares (OLS), which minimises the residual sum of squares.

3
Q

OLS closed form answer, Is it convex/concave? Assumptions of OLS?

A

β̂ = (XᵀX)⁻¹ Xᵀy
Convex: the objective is quadratic with positive semi-definite Hessian (2XᵀX), so any stationary point is a global minimum.

  1. Linearity: y = Xβ + ε
  2. Exogeneity: E[ε|X] = 0
  3. Homoscedasticity: Var(ε|X) = σ²
  4. No perfect multicollinearity: XᵀX invertible
  5. Errors are uncorrelated
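The closed form above can be checked numerically; a minimal NumPy sketch on synthetic data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
# Design matrix with an explicit intercept column of ones
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Closed-form OLS: beta_hat = (XᵀX)⁻¹ Xᵀy.
# Solving the normal equations is more stable than forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

In practice `np.linalg.solve` (or a QR/lstsq route) is preferred over computing `(XᵀX)⁻¹` explicitly, which is exactly why assumption 4 (XᵀX invertible) matters.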
4
Q

Evaluation of Linear Regression metrics, their formulas and standard error of β̂.

A

R² = 1 − (RSS/TSS) → fraction of variance explained by the model.
Other metrics: MSE, RMSE.

SE(β̂ⱼ) = √[ σ̂² (XᵀX)⁻¹ ]ⱼⱼ  (the j-th diagonal entry)
Where:
σ̂² = RSS / (n − p) estimates the error variance (p = number of predictors including the intercept). The familiar √(σ²/n) is the special case of estimating a single mean (intercept-only model).
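These metrics and the coefficient standard errors can be computed directly; a minimal sketch on synthetic data (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2  # p counts the intercept
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 3.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

rss = resid @ resid
tss = ((y - y.mean()) ** 2).sum()
r2 = 1 - rss / tss          # fraction of variance explained
mse = rss / n
rmse = np.sqrt(mse)

# SE(beta_hat_j) = sqrt of the j-th diagonal of sigma_hat^2 (XᵀX)⁻¹
sigma2_hat = rss / (n - p)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
```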

5
Q

Properties of R² and Adjusted R² formula.

A

R² never decreases when you add variables (even useless ones), so we use:

Adjusted R²
= 1 − [(1 − R²) * (n − 1) / (n − p − 1)]

which penalises extra predictors (here p = number of predictors excluding the intercept).
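The formula is a one-liner; a sketch (the function name is illustrative, and p excludes the intercept):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R²: penalises R² by the number of predictors p
    (excluding the intercept) relative to sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

For example, with R² = 0.9, n = 100 and p = 5 predictors, the adjusted value drops below 0.9, reflecting the cost of the extra terms.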

6
Q

What does F statistic say and what is p value? When do we reject H₀?

A

It tests whether the overall regression model is significant, i.e., whether at least one predictor has a non-zero coefficient.

The p-value measures the probability of observing the data (or more extreme) assuming the null hypothesis H₀ is true.

Conventionally, we reject H₀ when p < 0.05.

7
Q

Methods of Feature selection in Multiple Linear Regression?

A

1) Forward selection: start with the null model and add predictors one at a time, each time choosing the addition that most improves the fit (by p-value or R²).

2) Backward selection: start with all predictors and remove the least significant one at a time until a simpler/better model remains.

3) Mixed (stepwise) selection: start with the null model and both add and remove predictors as you go, using p-values or adjusted R².
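Forward selection can be sketched as a greedy loop; a minimal version scoring by training R² (a real implementation would score on a validation set or adjusted R² — this and all names are illustrative):

```python
import numpy as np

def forward_select(X, y, max_feats):
    """Greedy forward selection: repeatedly add the column of X
    that most increases R² of an OLS fit (with intercept)."""
    n = len(y)
    tss = ((y - y.mean()) ** 2).sum()
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_feats:
        def r2_with(j):
            cols = np.column_stack([np.ones(n)] + [X[:, k] for k in selected + [j]])
            beta = np.linalg.lstsq(cols, y, rcond=None)[0]
            resid = y - cols @ beta
            return 1 - (resid @ resid) / tss
        best = max(remaining, key=r2_with)
        selected.append(best)
        remaining.remove(best)
    return selected

# Tiny demo: only column 2 carries signal, so it should be picked first.
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))
y = 5.0 * X[:, 2] + 0.1 * rng.normal(size=100)
chosen = forward_select(X, y, max_feats=2)
```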

8
Q

What is the OLS objective function for baseline linear regression, ridge, lasso and elastic net.

A

Regular: min_w ||y − Xw||²

Ridge:
min_w ||y − Xw||² + λ||w||²₂

Lasso: min_w ||y − Xw||² + λ||w||₁

Elastic Net:
min_w ||y − Xw||² + λ₁||w||₁ + λ₂||w||²₂
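All four objectives are available in scikit-learn; a minimal sketch on synthetic data. Note the parameterisation differs slightly from the formulas above: scikit-learn calls λ `alpha`, scales the squared loss in Lasso/ElasticNet by 1/(2n), and expresses Elastic Net's λ₁, λ₂ via an overall `alpha` plus an `l1_ratio` mix.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
# Only features 0 and 3 carry signal; the rest are irrelevant.
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=100)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "enet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
coefs = {name: m.fit(X, y).coef_ for name, m in models.items()}
```

On data like this, Lasso is expected to zero out some of the irrelevant coefficients, while Ridge only shrinks them, which previews the cards below.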

9
Q

What is Ridge Regression?

How does ridge regression shrink coefficients?

Does ridge regression perform feature selection?

A

Linear regression with an L2 penalty on coefficients.

Continuously shrinks them toward zero but never exactly zero.

No.

10
Q

When is Ridge preferred?
Effect of λ in ridge regression?
What happens when λ = 0?
How does ridge handle multicollinearity?

A

When many correlated predictors exist.

Larger λ → stronger shrinkage.

Ridge reduces to OLS.

Distributes coefficient weights across correlated features.
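The weight-sharing behaviour is easy to see with ridge's closed form, (XᵀX + λI)⁻¹Xᵀy, on two nearly identical predictors; a minimal sketch (synthetic data, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
z = rng.normal(size=n)
# Two almost perfectly correlated predictors
X = np.column_stack([z, z + 0.01 * rng.normal(size=n)])
y = 2.0 * z + 0.1 * rng.normal(size=n)

# Ridge closed form: (XᵀX + λI)⁻¹ Xᵀy
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

Ridge gives each correlated column roughly half the total weight (w ≈ [1, 1]), whereas plain OLS on near-collinear columns is numerically unstable and can produce huge offsetting coefficients.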

11
Q

What is Lasso? Key property of lasso? Does Lasso perform feature selection?

A

Linear regression with an L1 penalty on coefficients.

Produces sparse models.

Yes.

12
Q

How does lasso behave with correlated predictors?

When is lasso preferred?

A

Tends to arbitrarily select one and zero out the rest.

When only a few predictors are truly relevant.

13
Q

What is Elastic Net? Why was elastic net introduced? How does elastic net handle correlated predictors?

A

A combination of L1 and L2 regularization.

To combine sparsity of lasso with stability of ridge.

Tends to select groups of correlated features together.
