ML: Linear Regression Flashcards

(35 cards)

1
Q

What is the geometric interpretation of linear regression?

A

Fitting a hyperplane that minimizes the sum of squared vertical distances (residuals) between observed targets and predictions, not perpendicular distances (that would be total least squares); mathematically, the fitted values are the orthogonal projection of y onto the column space of X.

2
Q

Derive the closed-form OLS solution.

A

Minimize: ‖y − Xβ‖^2

Set the gradient to zero (normal equations): X^T X β = X^T y

Solution (if X^T X is invertible): β̂ = (X^T X)^(-1) X^T y
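A minimal NumPy sketch of this closed form on synthetic data (names and numbers are illustrative; solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse):

```python
import numpy as np

# Synthetic data: y = 2 + 3*x + small noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x])      # design matrix with intercept
beta_true = np.array([2.0, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# Normal equations: X^T X beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both routes should agree; in practice `lstsq` (QR/SVD-based) is numerically preferable to solving the normal equations directly.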

3
Q

Why is OLS unbiased?

A

Under the exogeneity assumption E[ε | X] = 0,

β̂ = β + (X^T X)^(-1) X^T ε, so E[β̂] = β

i.e., the estimator's expectation equals the true population parameter.

4
Q

When does the OLS solution not exist?

A

When X^T X is non-invertible (singular), typically due to perfect multicollinearity or more predictors than observations (p > n).

5
Q

What metrics are commonly used to evaluate linear regression?

A

RMSE

MAE

R^2 and Adjusted R^2

Cross-validated RMSE/MAE

Prediction intervals (when uncertainty matters)
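These metrics are simple enough to compute directly; a NumPy sketch with toy numbers (the helper names are my own, not from any library):

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

def adjusted_r2(y_true, y_pred, n_features):
    n = len(y_true)
    return float(1 - (1 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1))

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.9])
# mae(y_true, y_pred) ≈ 0.175; r2(y_true, y_pred) ≈ 0.9925
```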

6
Q

What does a negative R^2 indicate?

A

The model performs worse than a horizontal line at the mean of the target variable, i.e., worse than simply predicting the average for every observation (SS_res > SS_tot).
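A tiny worked example, using the standard definition R^2 = 1 - SS_res/SS_tot:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([4.0, 3.0, 2.0, 1.0])         # anti-correlated predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # 20.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 5.0
r2 = 1 - ss_res / ss_tot                         # -3.0: far worse than predicting the mean
```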

7
Q

Why is Adjusted R^2 preferred for model comparison?

A

Unlike R^2, it penalizes adding irrelevant predictors, preventing inflated model performance.

8
Q

Where does linear regression appear in agentic AI evaluation?

A

Reward model fitting (e.g., mapping features → scalar reward)

Calibration of LLM confidence scores

Post-hoc explainability (e.g., linear surrogate models for SHAP/LIME)

Retrieval scoring baselines in RAG (TF-IDF or BM25 often mapped linearly to relevance).

9
Q

What are the Gauss–Markov assumptions?

A

Linearity

No perfect multicollinearity

Exogeneity (errors uncorrelated with predictors)

Homoscedasticity

No autocorrelation
Under these, OLS is BLUE (Best Linear Unbiased Estimator).

10
Q

Is normality an assumption for unbiased coefficients?

A

No. Normality is needed only for valid hypothesis tests and confidence intervals, not for estimating coefficients.

11
Q

What steps do you take if residuals show heteroscedasticity?

A

Transform target (log, Box–Cox)

Use Weighted Least Squares

Use robust standard errors (Huber–White)

Switch to models tolerant of heteroscedasticity
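As a sketch of the Weighted Least Squares remedy, assuming the error variance is known up to proportionality (here the simulation's true sd is reused as the weight; a real analysis would have to estimate it):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
sigma = 0.5 * x                                # error sd grows with x: heteroscedastic
y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=n)

# WLS closed form: beta = (X^T W X)^(-1) X^T W y, with weights = 1 / variance
W = np.diag(1.0 / sigma**2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

OLS on the same data would still be unbiased, but WLS down-weights the noisy high-x observations and recovers the coefficients with lower variance.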

12
Q

How do you detect multicollinearity?

A

VIF (Variance Inflation Factor)

Condition number

Correlation matrix

Near-zero singular values of X (equivalently, near-zero eigenvalues of X^T X)
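A from-scratch sketch of VIF (VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing feature j on the others); the data and thresholds are illustrative:

```python
import numpy as np

def vif(X):
    """VIF for each column of X (X should not include an intercept column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2_j = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)               # independent
X = np.column_stack([x1, x2, x3])

vifs = vif(X)                  # vifs[0], vifs[1] large; vifs[2] near 1
cond = np.linalg.cond(X)       # a large condition number also flags collinearity
```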

13
Q

Why does multicollinearity harm inference but not predictions?

A

Coefficients become unstable and highly sensitive to noise, but combined predictions may still be good if features span the same subspace.

14
Q

How does omitted variable bias occur?

A

If an omitted variable is correlated with both the included predictor and the target, coefficients absorb the correlation and become biased.

15
Q

Compare forward selection, backward selection, and stepwise selection.

A

Forward: start empty → add predictors that improve the model

Backward: start full → remove the least useful predictors

Stepwise: a mix of both, with bidirectional testing

16
Q

Why is subset selection often unstable?

A

Small data perturbations may change which variables are selected; sensitive to correlated features.

17
Q

Why are LASSO and Ridge more reliable than subset selection?

A

Because they produce more stable solutions through continuous shrinkage and avoid combinatorial search.

18
Q

What is leverage vs influence?

A

Leverage: unusual predictor values (x-space)

Influence: actual impact on the fitted regression (e.g., via Cook's distance).
High leverage ≠ influential unless the point also shifts the predictions.

19
Q

How do you handle outliers?

A

Robust regression (Huber, RANSAC)

Winsorizing

Transformations

Check data errors

Use median-based metrics (MAE)

20
Q

What is the consequence of heteroscedasticity?

A

OLS remains unbiased but no longer has minimum variance → standard errors become incorrect → hypothesis tests become invalid.

21
Q

How do you test for heteroscedasticity?

A

Breusch–Pagan test

White test

Residual plots (funnel shape)

22
Q

Why doesn't OLS require normally distributed predictors or residuals for unbiasedness?

A

Bias depends on expectation of errors conditional on X; normality only affects inference/testing, not point estimation.

23
Q

How do you check residual normality?

A

QQ-plots

Shapiro–Wilk test

Kolmogorov–Smirnov test

Skewness/kurtosis indicators

24
Q

How does regularization help with multicollinearity?

A

Ridge shrinks coefficient magnitudes, stabilizing them

LASSO performs feature selection

Both reduce variance inflation.
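A sketch of the ridge closed form, β = (X^T X + λI)^(-1) X^T y, on two nearly collinear features (synthetic data; λ = 1.0 is an arbitrary illustrative choice):

```python
import numpy as np

def ridge(X, y, lam):
    # Closed form: (X^T X + lam * I)^(-1) X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 1e-3 * rng.normal(size=100)       # almost perfectly collinear
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=100)    # true combined effect = 2

beta_ols = ridge(X, y, 0.0)    # lam=0 reduces to OLS: coefficients unstable
beta_ridge = ridge(X, y, 1.0)  # shrinkage splits the effect evenly and stably
```

Both fits recover the combined effect (b1 + b2 ≈ 2); the penalty keeps the ridge coefficients near 1 each, while OLS may let them cancel wildly against each other.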

25
Q

What does a very large VIF value indicate and what should you do?

A

VIF > 10 suggests severe multicollinearity. Fix by removing the variable, combining features, or using PCA/Ridge regression.

26
Q

What is a confounder in regression?

A

A variable that influences both the predictor and the outcome, creating spurious associations.

27
Q

How do you adjust for confounding?

A

Include the confounder as a predictor

Use stratification

Use propensity matching

Use instrumental variables if the confounder is unmeasurable

28
Q

What is a GLM?

A

A model where:

y comes from an exponential family distribution

Linear predictor: η = Xβ

Link function: g(E[y]) = η

29
Q

Examples of GLMs and their link functions?

A

Logistic regression → logit link

Poisson regression → log link

Gamma regression → inverse link

Gaussian regression → identity link

30
Q

When would you prefer Poisson regression over linear regression?

A

For count data with non-negative integer values and variance proportional to the mean.

31
Q

When is quasi-Poisson or negative binomial preferred?

A

If data exhibit overdispersion: variance much larger than the mean.

32
Q

Why do LLM agents still rely on linear models internally?

A

Interpretability (linear reward models)

Fast calibration layers

Lightweight routing/classification steps

Debuggable surrogate models for SHAP/LIME explanations

Estimating tool success probabilities

33
Q

How does linear regression help debug RAG systems?

A

Used to model:

Relevance score vs. answer correctness

Hallucination probability vs. retrieval confidence

Retrieval features → hit/miss outcomes

Helps detect poorly calibrated retrievers.

34
Q

Why is linear regression useful for agent reward shaping?

A

Linear models correlate low-dimensional features with agent success signals, producing stable, monotonic reward shaping with interpretable weights.

35
Q

How does multicollinearity appear in LLM feature spaces?

A

High-dimensional embeddings often include strongly correlated directions; linear probing on such embeddings often requires Ridge regularization.