What is the geometric interpretation of linear regression?
Fitting a hyperplane that minimizes the sum of squared vertical distances (residuals) between observed targets and predictions; mathematically, the fitted values are the orthogonal projection of y onto the column space of X.
Derive the closed-form OLS solution.
Minimize: ‖y − Xβ‖²
Set the derivative to zero (normal equations): XᵀXβ = Xᵀy
Solution (if XᵀX is invertible): β̂ = (XᵀX)⁻¹Xᵀy
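As a sanity check, the closed form can be sketched in a few lines of NumPy; the synthetic data and coefficient values are illustrative:

```python
import numpy as np

# Illustrative data: intercept column plus two features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# Normal equations: solve XᵀX β = Xᵀ y (solve is preferred over an explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true
```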
Why is OLS unbiased?
Because under the exogeneity assumption
E[ε | X] = 0,
it follows that E[β̂] = β,
i.e., the estimator's expectation equals the true population parameter.
When does the OLS solution not exist?
When XᵀX is non-invertible (singular or near-singular), typically due to perfect multicollinearity.
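A minimal NumPy sketch of this failure mode, using an exactly duplicated (scaled) column; the data are illustrative:

```python
import numpy as np

# Perfect multicollinearity: the second column is an exact multiple of the
# first, so XᵀX is rank-deficient and has no inverse (illustrative data).
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([x1, 2.0 * x1])
y = x1 + rng.normal(scale=0.1, size=50)

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 1, not 2: singular

# np.linalg.pinv still returns the minimum-norm least-squares solution
beta_pinv = np.linalg.pinv(X) @ y
```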
What metrics are commonly used to evaluate linear regression?
RMSE
MAE
R² and Adjusted R²
Cross-validated RMSE/MAE
Prediction intervals (when uncertainty matters)
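These metrics are all one-liners; a sketch with toy values (p = 1 predictor assumed for Adjusted R²):

```python
import numpy as np

# Toy values; each prediction is off by 0.5
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])
n, p = len(y_true), 1  # n observations, p predictors

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
mae = np.mean(np.abs(y_true - y_pred))
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(rmse, mae, r2, adj_r2)  # 0.5 0.5 0.95 0.925
```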
What does a negative R^2 indicate?
The model performs worse than a horizontal line at the mean of the target variable, i.e., worse than simply predicting the average of the data.
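A tiny worked example, with deliberately anti-correlated predictions (toy numbers):

```python
import numpy as np

# R² = 1 - SS_res/SS_tot goes negative when predictions are worse than
# predicting the target mean.
y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_bad = np.array([4.0, 3.0, 2.0, 1.0])  # anti-correlated predictions

ss_res = np.sum((y - y_pred_bad) ** 2)       # 20
ss_tot = np.sum((y - y.mean()) ** 2)         # 5
r2 = 1 - ss_res / ss_tot
print(r2)  # -3.0: far worse than the mean baseline
```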
Why is Adjusted R^2 preferred for model comparison?
Unlike R^2, it penalizes adding irrelevant predictors, preventing inflated model performance.
Where does linear regression appear in agentic AI evaluation?
Reward model fitting (e.g., mapping features → scalar reward)
Calibration of LLM confidence scores
Post-hoc explainability (e.g., linear surrogate models for SHAP/LIME)
Retrieval scoring baselines in RAG (TF-IDF or BM25 often mapped linearly to relevance).
What are the GaussβMarkov assumptions?
Linearity
No perfect multicollinearity
Exogeneity (errors uncorrelated with predictors)
Homoscedasticity
No autocorrelation
Under these, OLS is BLUE (Best Linear Unbiased Estimator).
Is normality an assumption for unbiased coefficients?
No. Normality is needed only for valid hypothesis tests and confidence intervals, not for estimating coefficients.
What steps do you take if residuals show heteroscedasticity?
Transform target (log, BoxβCox)
Use Weighted Least Squares
Use robust standard errors (HuberβWhite)
Switch to models tolerant to heteroscedasticity
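One of these remedies, Weighted Least Squares, can be sketched directly from its normal equations; the inverse-variance weights below assume the noise scale is known, and the data are illustrative:

```python
import numpy as np

# Heteroscedastic data: error std grows linearly with x
rng = np.random.default_rng(2)
n = 200
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])
sigma = 0.2 * x                      # assumed-known noise scale
y = 2.0 + 3.0 * x + sigma * rng.normal(size=n)

# Inverse-variance weights down-weight the noisy high-x observations
w = 1.0 / sigma**2
W = np.diag(w)

# WLS normal equations: (Xᵀ W X) β = Xᵀ W y
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_wls)  # close to [2, 3]
```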
How do you detect multicollinearity?
VIF (Variance Inflation Factor)
Condition number
Correlation matrix
Singular values of XᵀX
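A from-scratch sketch of the VIF diagnostic: VIF_j = 1/(1 − R_j²), where R_j² comes from regressing feature j on the other features. The data and the "VIF > 10" rule of thumb are illustrative:

```python
import numpy as np

# Illustrative design: x2 is nearly collinear with x1, x3 is independent
rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress it on the remaining columns (with intercept)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    beta = np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    resid = X[:, j] - A @ beta
    r2 = 1 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 1) for j in range(3)])  # x1, x2 heavily inflated; x3 near 1
```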
Why does multicollinearity not harm predictions but harms inference?
Coefficients become unstable and highly sensitive to noise, but combined predictions may still be good if features span the same subspace.
How does omitted variable bias occur?
If an omitted variable is correlated with both the included predictor and the target, coefficients absorb the correlation and become biased.
Compare forward selection, backward selection, and stepwise selection.
Forward: start empty → add predictors that improve the model
Backward: start full → remove the least useful predictors
Stepwise: mix of both, with bidirectional testing
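A minimal sketch of forward selection scored by training R²; in practice AIC/BIC or cross-validation is preferable, and the 0.01 improvement cutoff is an illustrative choice:

```python
import numpy as np

# Illustrative data: only features 0 and 2 actually matter
rng = np.random.default_rng(4)
n, p = 300, 5
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=n)

def r2_of(cols):
    """Training R² of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ beta
    return 1 - resid.var() / y.var()

selected, remaining = [], list(range(p))
best = r2_of([])
while remaining:
    # Greedily add the candidate that most improves R²
    scores = {c: r2_of(selected + [c]) for c in remaining}
    c_best = max(scores, key=scores.get)
    if scores[c_best] - best < 0.01:   # stop when improvement is negligible
        break
    selected.append(c_best)
    remaining.remove(c_best)
    best = scores[c_best]

print(selected)  # picks the informative features first, skips the noise ones
```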
Why is subset selection often unstable?
Small data perturbations may change which variables are selected; sensitive to correlated features.
Why are LASSO and Ridge more reliable than subset selection?
Because they produce more stable solutions through continuous shrinkage and avoid combinatorial search.
What is leverage vs influence?
Leverage: unusual predictor values (x-space)
Influence: actual impact on fitted regression (e.g., via Cookβs distance).
High leverage ≠ influential unless it also changes predictions.
How do you handle outliers?
Robust regression (Huber, RANSAC)
Winsorizing
Transformations
Check data errors
Use median-based metrics (MAE)
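Winsorizing, one of the remedies above, is a one-liner with NumPy; the 5th/95th percentile cutoffs here are an illustrative choice:

```python
import numpy as np

# Illustrative data with a few gross outliers injected
rng = np.random.default_rng(5)
y = rng.normal(loc=10.0, scale=1.0, size=1000)
y[:5] = 1000.0

# Clip both tails to the chosen percentiles
lo, hi = np.percentile(y, [5, 95])
y_wins = np.clip(y, lo, hi)

print(y.mean())       # badly pulled up by the outliers
print(y_wins.mean())  # close to the true mean of 10
```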
What is the consequence of heteroscedasticity?
OLS remains unbiased but no longer has minimum variance → standard errors become incorrect → hypothesis tests invalid.
How do you test for heteroscedasticity?
BreuschβPagan
White test
Residual plots (funnel shape)
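The Breusch–Pagan test can be sketched from scratch: regress the squared OLS residuals on the predictors and form LM = n·R², which is approximately χ² (degrees of freedom = number of non-intercept regressors) under homoscedasticity. The data below are illustrative:

```python
import numpy as np

# Illustrative heteroscedastic data: noise scale grows with x
rng = np.random.default_rng(6)
n = 500
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.normal(size=n)

# OLS fit and residuals
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Auxiliary regression: squared residuals on the same predictors
g = np.linalg.lstsq(X, resid**2, rcond=None)[0]
aux_resid = resid**2 - X @ g
r2_aux = 1 - aux_resid.var() / (resid**2).var()

lm = n * r2_aux                # LM statistic, ~chi2(1) here
print(lm > 3.84)               # True: exceeds the 5% chi2(1) critical value
```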
Why doesnβt OLS require normally distributed predictors or residuals for unbiasedness?
Bias depends on expectation of errors conditional on X; normality only affects inference/testing, not point estimation.
How do you check residual normality?
QQ-plots
ShapiroβWilk test
KolmogorovβSmirnov test
Skewness/kurtosis indicators
How does regularization help with multicollinearity?
Ridge shrinks coefficient magnitudes, stabilizing them
LASSO performs feature selection
Both reduce variance inflation.
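The stabilizing effect can be sketched with ridge's closed form β̂ = (XᵀX + λI)⁻¹Xᵀy on a nearly collinear design; λ = 1 is an illustrative choice:

```python
import numpy as np

# Nearly collinear design: the plain normal equations would be ill-conditioned
rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=n)])
y = x1 + 0.1 * rng.normal(size=n)

# Ridge closed form: adding λI makes the system well-posed
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(beta_ridge)  # the unit coefficient splits stably across the two columns (~0.5 each)
```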