What is bias-variance tradeoff?
Linear Regression vs. Deep NN, which has higher bias/variance?
Bias: Error due to wrong assumptions → underfitting.
Variance: Error due to sensitivity to data fluctuations → overfitting.
Linear Regression - high bias, low variance.
Deep NN - low bias, high variance.
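A minimal sketch of the tradeoff using synthetic data (numpy only; the degree-9 polynomial stands in here for a high-capacity model like a deep NN):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a nonlinear target: y = sin(2πx) + noise
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 30)

# High-bias model: a straight line cannot capture the sine shape (underfits)
line = np.polyval(np.polyfit(x, y, deg=1), x)

# High-variance model: a degree-9 polynomial can chase the noise (overfits)
flex = np.polyval(np.polyfit(x, y, deg=9), x)

mse_line = np.mean((y - line) ** 2)
mse_flex = np.mean((y - flex) ** 2)
# The flexible model always achieves lower *training* error, but its fit
# would swing wildly on a fresh draw of the data — that swing is variance.
print(mse_line, mse_flex)
```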
Linear Regression equation, what optimisation method is used?
y ≈ β₀ + β₁x₁ + … + βₙxₙ
Ordinary Least Squares (OLS) — typically solved in closed form via the normal equations, or iteratively with gradient descent for large n.
OLS closed form answer, Is it convex/concave? Assumptions of OLS?
β̂ = (XᵀX)⁻¹ Xᵀy
Convex (not concave): the objective is quadratic with positive semi-definite Hessian XᵀX, so a global minimum is guaranteed.
Assumptions: linearity, independent errors, homoscedasticity (constant error variance), normally distributed errors, and no perfect multicollinearity (XᵀX invertible).
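A quick numpy check of the closed form on synthetic data (the data and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column
beta_true = np.array([2.0, 1.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(0, 0.1, n)

# Closed-form OLS: β̂ = (XᵀX)⁻¹ Xᵀy
# (np.linalg.solve is preferred over an explicit inverse for numerical stability)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against the library least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```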
Evaluation of Linear Regression metrics, their formulas and standard error of β̂.
R² = 1 − (RSS/TSS) → fraction of variance explained by the model.
Other metrics: MSE, RMSE.
SE(β̂ⱼ) = √[ σ² (XᵀX)⁻¹ ]ⱼⱼ  (the j-th diagonal entry; it simplifies to √(σ²/n) only in the sample-mean special case)
Where:
σ² = variance of the errors (estimated as σ̂² = RSS / (n − p), with p = number of predictors including intercept).
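These metrics can be computed directly in numpy (synthetic data; a sketch, not a full regression diagnostic):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(0, 0.5, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

rss = np.sum(resid ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss          # fraction of variance explained
mse = rss / n
rmse = np.sqrt(mse)

# σ̂² = RSS / (n − p), with p = number of estimated coefficients (incl. intercept)
p = X.shape[1]
sigma2_hat = rss / (n - p)
# Standard errors: square roots of the diagonal of σ̂² (XᵀX)⁻¹
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
print(r2, rmse, se)
```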
Properties of R² and Adjusted R² formula.
R² never decreases when you add variables (even useless ones), so we use:
Adjusted R²
= 1 − [(1 − R²) * (n − 1) / (n − p − 1)]
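A small demonstration of the property above, adding a pure-noise predictor (illustrative data; the helper `r2_scores` is my own):

```python
import numpy as np

def r2_scores(X, y):
    """Return (R², adjusted R²) for an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    rss = np.sum((y - Xd @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    n, p = Xd.shape[0], X.shape[1]  # p = predictors excluding intercept
    r2 = 1 - rss / tss
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
junk = rng.normal(size=n)                 # pure-noise predictor
y = 3 * x1 + rng.normal(0, 1, n)

r2_small, adj_small = r2_scores(x1[:, None], y)
r2_big, adj_big = r2_scores(np.column_stack([x1, junk]), y)
# R² can only go up when a variable is added; adjusted R²
# applies the (n − 1)/(n − p − 1) penalty and can go down.
print(r2_big - r2_small, adj_big - adj_small)
```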
What does F statistic say and what is p value? When do we reject H₀?
It tests whether the overall regression model is significant, i.e., whether at least one predictor has a non-zero coefficient.
The p-value measures the probability of observing the data (or more extreme) assuming the null hypothesis H₀ is true.
Conventionally, we reject H₀ when p < 0.05.
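The F statistic can be computed from RSS and TSS; a sketch on synthetic data, assuming scipy is available for the F-distribution tail probability:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(4)
n, p = 120, 2                      # p predictors, plus an intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.8, 0.0]) + rng.normal(0, 1, n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - X @ beta) ** 2)
tss = np.sum((y - y.mean()) ** 2)

# F = [(TSS − RSS)/p] / [RSS/(n − p − 1)] tests H₀: all slopes are zero
F = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = f_dist.sf(F, p, n - p - 1)   # upper-tail probability under H₀
print(F, p_value)
# Here one slope is truly non-zero, so p_value comes out well below 0.05.
```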
Methods of Feature selection in Multiple Linear Regression?
1) Forward selection: start with the null model and add variables one at a time based on p-values or R².
2) Backward selection: start with all predictors and remove one at a time toward a simpler/better model.
3) Mixed (stepwise) selection: start with the null model, then both add and remove variables using p-values or adjusted R².
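Forward selection by adjusted R² can be sketched in a few lines (synthetic data with two truly relevant columns; the greedy loop and `adj_r2` helper are my own minimal version, not a library routine):

```python
import numpy as np

def adj_r2(cols, X, y):
    """Adjusted R² of an OLS fit on the chosen columns (with intercept)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    rss = np.sum((y - Xd @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    return 1 - (1 - r2) * (n - 1) / (n - len(cols) - 1)

rng = np.random.default_rng(5)
n = 150
X = rng.normal(size=(n, 5))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 1, n)  # only cols 0 and 3 matter

# Forward selection: greedily add whichever variable most improves adjusted R²
selected, remaining = [], list(range(X.shape[1]))
while remaining:
    current = adj_r2(selected, X, y)
    scores = {j: adj_r2(selected + [j], X, y) for j in remaining}
    best = max(scores, key=scores.get)
    if scores[best] <= current:
        break                      # no candidate improves the model; stop
    selected.append(best)
    remaining.remove(best)

print(sorted(selected))
```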
What is the OLS objective function for baseline linear regression, ridge, lasso and elastic net.
Regular: min_w ||y − Xw||²
Ridge:
min_w ||y − Xw||² + λ||w||²₂
Lasso: min_w ||y − Xw||² + λ||w||₁
Elastic Net:
min_w ||y − Xw||² + λ₁||w||₁ + λ₂||w||²₂
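The ridge objective has a closed-form minimizer, β̂ = (XᵀX + λI)⁻¹ Xᵀy, which makes the effect of λ easy to see (illustrative data; intercept omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 80, 4
X = rng.normal(size=(n, p))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(0, 0.5, n)

def ridge(X, y, lam):
    """Ridge closed form: β̂ = (XᵀX + λI)⁻¹ Xᵀy."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

b_ols = ridge(X, y, 0.0)     # λ = 0 recovers plain OLS
b_ridge = ridge(X, y, 10.0)  # larger λ → stronger shrinkage of ‖β̂‖
print(np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```

(Lasso and elastic net have no closed form because of the ‖w‖₁ term; they are solved iteratively, e.g. by coordinate descent.)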
What is Ridge Regression?
How does ridge regression shrink coefficients?
Does ridge regression perform feature selection?
Linear regression with an L2 penalty on coefficients.
Continuously shrinks them toward zero but never exactly zero.
No.
When is Ridge preferred?
Effect of λ in ridge regression?
What happens when λ = 0?
How does ridge handle multicollinearity?
When many correlated predictors exist.
Larger λ → stronger shrinkage.
Ridge reduces to OLS.
Distributes coefficient weights across correlated features.
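An extreme case of this: with two identical copies of a feature, OLS breaks down (XᵀX is singular) while ridge splits the weight evenly (a sketch with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z, z])          # two perfectly correlated copies
y = 2 * z + rng.normal(0, 0.1, n)

lam = 1.0
# OLS would fail here: XᵀX is singular. The λI term makes it invertible.
beta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(beta)   # by symmetry, the weight is split evenly across the copies
```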
What is Lasso? Key property of lasso? Does Lasso perform feature selection?
Linear regression with an L1 penalty on coefficients.
Produces sparse models.
Yes.
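The exact zeros come from the soft-thresholding step in coordinate descent; a minimal sketch (my own toy solver for ½‖y − Xw‖² + λ‖w‖₁ — note the ½ on the loss term, so λ is scaled accordingly; not a production lasso):

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator — the source of lasso's exact zeros."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Minimal lasso via coordinate descent on ½‖y − Xw‖² + λ‖w‖₁."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution removed
            r = y - X @ w + X[:, j] * w[j]
            w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return w

rng = np.random.default_rng(8)
n, p = 100, 6
X = rng.normal(size=(n, p))
# Only the first two features carry signal
y = X @ np.array([3.0, -2.0, 0, 0, 0, 0]) + rng.normal(0, 0.5, n)

w = lasso_cd(X, y, lam=50.0)
print(w)  # irrelevant coefficients are driven to exactly 0.0
```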
How does lasso behave with correlated predictors?
When is lasso preferred?
Tends to arbitrarily select one and zero out the rest.
When only a few predictors are truly relevant.
What is Elastic Net? Why was elastic net introduced? How does elastic net handle correlated predictors?
A combination of L1 and L2 regularization.
To combine sparsity of lasso with stability of ridge.
Tends to select groups of correlated features together.
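The grouping effect can be seen by adding an L2 term to the lasso coordinate-descent update (my own toy solver for ½‖y − Xw‖² + λ₁‖w‖₁ + ½λ₂‖w‖²₂, with the ½-scaled convention; the data and λ values are illustrative):

```python
import numpy as np

def enet_cd(X, y, lam1, lam2, n_iter=500):
    """Minimal elastic net via coordinate descent on
    ½‖y − Xw‖² + λ₁‖w‖₁ + ½λ₂‖w‖²₂."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]   # partial residual excluding j
            rho = X[:, j] @ r
            # L1 soft threshold in the numerator, L2 shrinkage in the denominator
            w[j] = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (col_sq[j] + lam2)
    return w

rng = np.random.default_rng(9)
n = 200
z = rng.normal(size=n)
x1 = z + rng.normal(0, 0.01, n)   # near-duplicate features
x2 = z + rng.normal(0, 0.01, n)
x3 = rng.normal(size=n)           # independent feature
X = np.column_stack([x1, x2, x3])
y = 2 * z + rng.normal(0, 0.1, n)

w = enet_cd(X, y, lam1=5.0, lam2=50.0)
print(w)  # the L2 term spreads weight across the correlated pair
```

Pure lasso on the same data would tend to put all the weight on just one of x1/x2; the λ₂ term is what keeps the group together.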