Parameter Estimation Flashcards

(207 cards)

1
Q

Define linear regression model.

A

A model that expresses the mean of a response as a linear function of predictors: y = Xβ + ε.
Intuition: a straight-line/plane relationship plus noise.
Example: Sales = β0 + β1(YouTube) + β2(Facebook) + ε.

2
Q

Define response variable (y).

A

The outcome you want to explain or predict; treated as random before sampling, then observed/fixed in the dataset.
Intuition: what you care about.
Example: sales (thousands of units).

3
Q

Define predictor / covariate (x).

A

An input variable used to explain/predict y; columns of X (besides the intercept).
Intuition: knobs you measure.
Example: YouTube budget.

4
Q

Define parameter (β).

A

Unknown population quantity describing the relationship between predictors and mean response.
Intuition: the ‘true’ intercept/slopes.
Example: β1 = change in mean sales per $1k YouTube.

5
Q

Define error term (ε).

A

Random deviation of an observation from its conditional mean: y = E[y|x] + ε.
Intuition: all unmodeled factors + noise.
Example: random market effects not in ad budgets.

6
Q

Define residual (e_i or ε̂_i).

A

Observed deviation from fitted line/surface: e_i = y_i − ŷ_i.
Intuition: ‘leftover’ after fitting.
Example: actual sales minus predicted sales for company i.

7
Q

Define fitted value (ŷ_i).

A

Model’s predicted mean response at x_i using estimated coefficients: ŷ_i = x_i^T β̂.
Intuition: point on fitted line/surface.
Example: predicted sales for given budgets.

8
Q

Define least squares.

A

Estimation method choosing β̂ to minimize RSS = Σ (y_i − ŷ_i)^2.
Intuition: make residuals small overall.
Example: pick the line with smallest total squared vertical distances.

9
Q

Define residual sum of squares (RSS).

A

RSS = (y − Xβ)^T(y − Xβ) = Σ e_i^2.
Intuition: total squared error of fit.
Example: objective function minimized by OLS.

10
Q

Define design matrix (X).

A

Matrix whose rows are observations and columns are predictors (first column often all 1s for intercept).
Intuition: table of inputs.
Example: [1, YouTube, Facebook, Newspaper].

11
Q

Define intercept.

A

β0; expected response when all predictors equal 0 (if 0 is meaningful).
Intuition: baseline level.
Example: predicted sales when ad budgets are $0k.

12
Q

Define slope / coefficient.

A

βj; change in E[y|x] for a one-unit increase in x_j holding other predictors fixed.
Intuition: partial effect.
Example: change in sales per $1k Facebook, holding YouTube/newspaper fixed.

13
Q

Define simple linear regression (SLR).

A

Regression with one predictor: y_i = β0 + β1 x_i + ε_i.
Intuition: fit a line in 2D.
Example: turtle_rating vs income.

14
Q

Define multiple linear regression (MLR).

A

Regression with multiple predictors: y = Xβ + ε.
Intuition: fit a plane/hyperplane.
Example: sales ~ YouTube + Facebook + newspaper.

15
Q

Define overdetermined system.

A

More equations than unknowns (n observations > p+1 parameters), so y = Xβ typically has no exact solution.
Intuition: can’t hit every point exactly.
Example: 200 companies but only 4 parameters.

16
Q

Define column space of X (Col(X)).

A

Set of all linear combinations of columns of X.
Intuition: all vectors you can represent as Xβ.
Example: all possible fitted value vectors ŷ.

17
Q

Define projection.

A

Mapping a vector to the closest vector in a subspace (in least squares: project y onto Col(X)).
Intuition: ‘shadow’ of y on the model space.
Example: ŷ is the projection of y onto Col(X).

18
Q

Define orthogonality.

A

Two vectors are orthogonal if their dot product is 0.
Intuition: 90° angle; no linear association in that geometry.
Example: residual vector is orthogonal to Col(X) at the OLS solution.

19
Q

Define normal equations.

A

Equations X^T X β̂ = X^T y that characterize the OLS solution when X^T X is invertible.
Intuition: set derivative to zero.
Example: solve for β̂ via linear system.
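
The normal equations can be checked numerically. A minimal NumPy sketch with made-up data (the course code shown elsewhere in this deck is R; the data here are hypothetical, and a linear solver is used rather than an explicit inverse):

```python
import numpy as np

# Hypothetical toy data: intercept column plus one predictor.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Normal equations: X^T X beta_hat = X^T y, solved as a linear system.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

`np.linalg.lstsq(X, y)` returns the same coefficients via a more stable route.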

20
Q

Define hat matrix (H).

A

H = X(X^T X)^{-1}X^T; maps y to fitted values: ŷ = Hy.
Intuition: linear ‘smoother’ putting the ‘hat’ on y.
Example: compute leverage from diag(H).
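
The hat-matrix facts on this card are easy to verify directly. A NumPy sketch with invented numbers (in practice H is never formed explicitly for large n):

```python
import numpy as np

# Hypothetical toy design: intercept plus one predictor, n = 4.
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5],
              [1.0, 4.0]])
y = np.array([1.0, 2.0, 2.5, 4.5])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # n x n, puts the 'hat' on y
y_hat = H @ y                          # fitted values
leverages = np.diag(H)                 # h_ii, one per observation

# H is symmetric and idempotent, and trace(H) = p + 1 (here 2).
```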

21
Q

Define leverage (h_ii).

A

Diagonal element of H; measures how extreme x_i is in predictor space.
Intuition: unusual x gives high leverage.
Example: a company with very large YouTube/Facebook spend.

22
Q

Define identity matrix (I).

A

Square matrix with 1s on diagonal, 0 elsewhere; acts like 1 in matrix multiplication.
Intuition: ‘do nothing’ operator.
Example: (X^T X)^{-1}(X^T X) = I.

23
Q

Define transpose.

A

Operation swapping rows/columns: (A^T)_{ij} = A_{ji}.
Intuition: flip across diagonal.
Example: (X^T X) is symmetric.

24
Q

Define symmetric matrix.

A

A matrix A such that A^T = A.
Intuition: mirror across diagonal.
Example: X^T X is always symmetric.

25
Define quadratic form.
Expression v^T A v producing a scalar. Intuition: generalized 'squared length'. Example: (y − Xβ)^T(y − Xβ).
26
Define 2-norm.
||v||_2 = sqrt(v^T v). Intuition: Euclidean length. Example: minimizing RSS is minimizing ||y − Xβ||_2^2.
27
Define assumption: mean-zero errors.
E[ε_i] = 0 for all i. Intuition: errors centered at zero; no systematic shift. Example: residuals should be centered around 0.
28
Define assumption: linearity of mean.
E[y_i] = x_i^T β (equivalently E[y|x] is linear in predictors). Intuition: straight-line mean trend. Example: expected sales increases linearly with budgets (given model).
29
Define assumption: uncorrelated errors.
Cov(ε_i, ε_j) = 0 for i≠j. Intuition: noise of one observation doesn’t predict another. Example: companies’ unexplained shocks are independent-ish.
30
Define assumption: constant variance (homoskedasticity).
Var(ε_i)=σ^2 constant across i. Intuition: same noise level for all x. Example: sales variability doesn’t grow with ad budget.
31
Define assumption: full rank / invertibility.
X has full column rank so X^T X is invertible. Intuition: no perfect multicollinearity. Example: YouTube not an exact linear combo of Facebook/newspaper.
32
Define Gauss–Markov theorem.
Under mean-zero, linearity, uncorrelated errors with constant variance, and full rank, OLS is BLUE: best (minimum variance) among unbiased linear estimators. Intuition: OLS is optimal in a specific class. Example: among unbiased linear estimators of β, OLS variance is smallest.
33
Define BLUE.
Best Linear Unbiased Estimator. Intuition: within linear unbiased estimators, you can’t beat OLS variance. Example: OLS beats any other linear unbiased estimator.
34
Define bias.
E[estimator] − true parameter. Intuition: systematic error. Example: biased slope consistently overshoots β1.
35
Define variance of estimator.
How much an estimator varies across repeated samples. Intuition: stability. Example: high-variance β̂ jumps around between samples.
36
Define maximum likelihood estimation (MLE).
Chooses parameter values maximizing the likelihood (joint density of data as a function of parameters). Intuition: make observed data most 'probable' under the model. Example: with normal errors, MLE for β equals OLS.
37
Define normality assumption (for MLE equivalence).
Assume ε_i ~ Normal(0, σ^2) i.i.d. Intuition: bell-shaped noise. Example: then maximizing log-likelihood ↔ minimizing RSS.
38
Define log-likelihood.
Log of likelihood; easier to maximize and has same maximizer. Intuition: turns products into sums. Example: normal log-likelihood contains −(1/2σ^2)Σ(y_i−μ_i)^2.
39
Define i.i.d.
Independent and identically distributed. Intuition: same distribution, no dependence. Example: ε_i are independent with same σ^2.
40
Define fitting vs predicting vs explaining.
Fit: estimate β; explain: interpret β; predict: compute ŷ for new x. Intuition: three distinct goals. Example: budgets → sales interpretation vs forecast.
41
Define double dipping.
Using the same data for exploration and formal inference can inflate false positives. Intuition: 'peeking' then testing. Example: exploring correlations then claiming significance on same dataset.
42
Define outlier (IQR rule).
Point beyond Q3+1.5·IQR or below Q1−1.5·IQR. Intuition: unusually large/small value. Example: very high newspaper budget flagged in boxplot.
43
Define correlation.
Measure of linear association between two variables (−1 to 1). Intuition: strength of straight-line relationship. Example: corr(sales, YouTube) ≈ 0.78 in marketing data.
44
Define scatterplot diagnostic (EDA).
Plot pairs to see form/strength/outliers/heteroskedasticity. Intuition: look before modeling. Example: sales vs YouTube shows curvature + increasing spread.
45
What is the vertical distance (residual) in SLR?
In SLR, residual is the vertical difference between a point and the fitted line: e_i = y_i − (β̂0+β̂1 x_i). Intuition: how far up/down the point is from the line. Example: at x=10, actual y=25, predicted y=22 ⇒ residual=3.
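
The card's numbers can be reproduced with hypothetical coefficients consistent with its example (β̂0 = 2, β̂1 = 2 gives the stated prediction of 22 at x = 10):

```python
# Hypothetical coefficients chosen to match the card's example.
b0, b1 = 2.0, 2.0
x, y_obs = 10.0, 25.0
y_hat = b0 + b1 * x      # predicted value on the fitted line: 22.0
e = y_obs - y_hat        # residual = vertical distance: 3.0
```
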
46
Why square residuals in least squares?
Squaring makes deviations positive, penalizes large errors more, and yields a smooth objective with a closed-form solution. Intuition: big mistakes matter a lot. Example: residuals 2 and −2 both contribute 4.
47
What is OLS estimator (definition)?
β̂_OLS is the β that minimizes RSS = (y−Xβ)^T(y−Xβ). Intuition: best-fitting hyperplane in squared-error sense. Example: computed by lm() in R.
48
What is being minimized in OLS (matrix view)?
The squared 2-norm of the residual vector: ||y − Xβ||_2^2. Intuition: shortest distance from y to Col(X). Example: distance from y to its projection ŷ.
49
What is the geometric meaning of ŷ?
ŷ is the orthogonal projection of y onto Col(X). Intuition: closest point in model space. Example: ŷ lies in span of X columns.
50
What is the geometric meaning of the residual vector e?
e = y − ŷ is orthogonal to Col(X): X^T e = 0 at OLS solution. Intuition: leftover is perpendicular. Example: normal equations are orthogonality conditions.
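
The orthogonality condition X^T e = 0 is easy to check on simulated data. A NumPy sketch (coefficients and noise level are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

# Residuals are orthogonal to every column of X (the normal equations).
orth = X.T @ e
```
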
51
What does 'full column rank' mean?
Columns of X are linearly independent; no column is an exact linear combo of others. Intuition: each predictor adds unique information. Example: not having Facebook = 2*YouTube exactly.
52
Why can X^T X be non-invertible?
If predictors are perfectly collinear or p+1 > n, X^T X is singular. Intuition: cannot uniquely identify β. Example: duplicate predictor column.
53
What does lm() return (conceptually)?
Least-squares estimates (β̂), fitted values ŷ, residuals e, and diagnostics based on the OLS fit. Intuition: a complete OLS summary. Example: summary(lm(...)) gives estimates, SEs, t, p.
54
What is the role of QR factorization in lm()?
Numerically stable method to compute OLS without explicitly inverting X^T X. Intuition: avoid unstable matrix inversion. Example: lm() typically uses QR decomposition.
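
The QR route can be sketched in NumPy as a simplified stand-in for what lm() does internally (simulated data; lm()'s actual implementation has more machinery):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = 2 + 3 * X[:, 1] + rng.normal(size=30)

# QR route: X = QR with R upper triangular; solve R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# Textbook closed form -- fine here, but less stable when X is ill-conditioned.
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y
```
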
55
Why treat y as fixed after sampling?
Once observed, y values are realizations; estimation conditions on observed data. Intuition: you can’t change the sample you got. Example: OLS uses fixed y and X to estimate β.
56
What does 'linear' mean in 'linear regression'?
Linear in parameters β (not necessarily linear in x after transformations). Intuition: β enter as a linear combination. Example: y = β0 + β1 log(x) is linear in β.
57
What is the systematic part Xβ?
The mean structure explained by predictors. Intuition: what the model tries to capture. Example: predicted sales from budgets.
58
What is the random part ε?
Unexplained variability not captured by predictors. Intuition: noise + omitted factors. Example: competitor actions, seasonality.
59
What does 'best' mean in Gauss–Markov?
Minimum variance among all linear unbiased estimators. Intuition: most precise unbiased linear β̂. Example: OLS beats other linear unbiased β estimators.
60
Why doesn't the Gauss–Markov theorem require normality?
BLUE result uses only mean/variance/covariance assumptions, not distribution shape. Intuition: geometry + second moments. Example: OLS is BLUE even with non-normal errors.
61
What is normality mainly used for?
Exact finite-sample t/F inference and MLE equivalence; large-sample inference can rely on CLT. Intuition: distributional convenience. Example: normal residuals justify t-tests exactly.
62
What does 'projection' imply about closeness?
ŷ minimizes ||y − v|| over all v in Col(X). Intuition: closest point in that subspace. Example: no other fitted-value vector has smaller RSS.
63
What does the hat matrix do?
Maps observed responses to fitted values: ŷ = Hy; residuals are (I−H)y. Intuition: linear operator for fitting. Example: compute fitted values via matrix multiplication.
64
Why is it called the 'hat' matrix?
Because it puts a 'hat' on y: y → ŷ. Intuition: 'hats' denote estimates. Example: ŷ = Hy.
65
What is a 'tall' matrix?
More rows than columns (n > p+1). Intuition: more data points than parameters. Example: 200×4 design matrix.
66
What does it mean that a system is overdetermined?
Too many equations to satisfy exactly; must approximate. Intuition: can’t pass through all points. Example: pick best plane instead of exact match.
67
Distinguish: residual vs error.
Residual e_i = y_i − ŷ_i is computed from data; error ε_i is unobserved random variable in the true model. Intuition: residual is what you see; error is what generated the data. Example: ε_i unknown even after fitting; e_i is output from lm().
68
Distinguish: fitted values vs predictions.
Fitted values ŷ_i are predictions for observed x_i; prediction for new x_* uses same form but with new inputs. Intuition: in-sample vs out-of-sample. Example: predict(lmfit, newdata=...).
69
Distinguish: linearity in parameters vs linear in x.
Model is linear if it’s a linear combo of β’s; x can be transformed. Intuition: β enter without being multiplied together. Example: y=β0+β1 x+β2 x^2 is linear in β.
70
Distinguish: uncorrelated errors vs independent errors.
Uncorrelated: Cov(ε_i,ε_j)=0; independent: stronger, implies uncorrelated (for finite variance). Intuition: independence is a bigger assumption. Example: Gauss–Markov needs uncorrelated; MLE derivation used independence.
71
Distinguish: constant variance vs normality.
Homoskedasticity is about Var(ε|x); normality is about distribution shape. Intuition: spread vs shape. Example: errors can be non-normal but homoskedastic.
72
Distinguish: unbiased vs efficient.
Unbiased means correct on average; efficient means smallest variance among a class. Intuition: accuracy vs precision. Example: OLS is unbiased and (within linear unbiased) efficient.
73
Distinguish: OLS solution existence vs uniqueness.
Existence: minimizer always exists; uniqueness requires full column rank (invertible X^T X). Intuition: you can minimize, but might be many solutions. Example: perfect collinearity → multiple β give same ŷ.
74
Distinguish: Col(X) vs Row(X).
Col(X) is span of columns (possible ŷ); Row(X) is span of rows. Intuition: fitted values live in Col(X). Example: projection is onto Col(X), not Row(X).
75
Given the model/data, Compute residual e_i.
Steps: 1) Fit model to get β̂. 2) Compute ŷ_i = x_i^T β̂. 3) e_i = y_i − ŷ_i. Formula: e = y − ŷ. When: after fitting to assess model fit. Mistake: confusing e_i with ε_i (unobserved).
76
Given the model/data, Compute fitted value ŷ_i.
Steps: 1) Write x_i row incl. intercept (1, x_i1,...,x_ip). 2) Multiply by β̂. Formula: ŷ_i = x_i^T β̂. When: prediction at observed x. Mistake: forgetting intercept or dummy coding.
77
Given the model/data, Write RSS in matrix form.
RSS(β) = (y − Xβ)^T (y − Xβ) = ||y − Xβ||_2^2. When: deriving OLS. Mistake: treating as scalar without respecting matrix order.
78
Given the model/data, Derive normal equations (result).
Set gradient to 0: ∂/∂β RSS = −2X^T y + 2X^T Xβ = 0 ⇒ X^T Xβ = X^T y. When: computing β̂. Mistake: dropping transposes or mixing order.
79
Given the model/data, Compute OLS β̂ (closed form).
If X^T X invertible: β̂ = (X^T X)^{-1} X^T y. When: theoretical derivations; software uses numerically stable methods. Mistake: explicitly inverting in code unnecessarily.
80
Given the model/data, Compute hat matrix H.
H = X (X^T X)^{-1} X^T. Then ŷ = Hy; e = (I − H)y. When: leverage/influence theory. Mistake: H is n×n, not (p+1)×(p+1).
81
Given the model/data, Compute leverage h_ii.
h_ii = (H)_{ii}. Steps: compute H (or use lm.influence()) and take diagonal. When: identify high-leverage points. Mistake: confusing leverage with residual size.
82
Given the model/data, Show X^T X is symmetric.
(X^T X)^T = X^T (X^T)^T = X^T X. When: matrix algebra in derivations. Mistake: forgetting reverse order when transposing products.
83
Given the model/data, Derivative: if y = Xv, compute ∂y/∂v.
Result: ∂(Xv)/∂v = X. When: gradient derivations. Mistake: mixing scalar and vector derivatives.
84
Given the model/data, Derivative: quadratic form v^T (X^T X) v.
Result: ∂/∂v [v^T (X^T X) v] = 2(X^T X)v. When: derivative of β^T X^T X β. Mistake: missing factor of 2.
85
Given the model/data, MLE log-likelihood for normal errors (up to constants).
ℓ(β) = const − (1/(2σ^2)) Σ (y_i − μ_i)^2 where μ_i = x_i^T β. Maximizing ℓ ↔ minimizing RSS. Mistake: forgetting negative sign or constants.
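
The equivalence can be checked numerically: under normal errors the negative log-likelihood in β is proportional to RSS, so the OLS fit is the likelihood maximizer. A NumPy sketch with simulated data (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = 1 + 2 * X[:, 1] + rng.normal(size=40)

# Up to constants, -l(beta) = RSS(beta) / (2 sigma^2) for fixed sigma^2.
def neg_loglik(beta, sigma2=1.0):
    r = y - X @ beta
    return (r @ r) / (2 * sigma2)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Perturbing beta away from the OLS solution can only increase -l(beta).
perturbed = [neg_loglik(beta_ols + d)
             for d in ([0.1, 0.0], [0.0, -0.1], [0.05, 0.05])]
```
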
86
Given the model/data, R: fit simple linear regression.
Code: fit <- lm(y ~ x, data=df) Extract: coef(fit), fitted(fit), resid(fit), summary(fit). Mistake: forgetting data= or variable names.
87
Given the model/data, R: fit multiple regression.
Code: fit <- lm(y ~ x1 + x2 + x3, data=df) Mistake: using * when you meant + (adds interaction).
88
Given the model/data, R: get coefficient table.
Code: summary(fit)$coefficients Gives: Estimate, Std. Error, t value, Pr(>|t|). Mistake: interpreting p-value as effect size.
89
Given the model/data, R: predicted values for new data.
Code: predict(fit, newdata=newdf) Mistake: newdf must have same predictor names and factor levels.
90
Given the model/data, R: check missing values quickly.
Code: sum(is.na(df)) Mistake: NA vs other missing codes (e.g., 999).
91
Given the model/data, R: boxplot + outlier flag (IQR rule).
Code: boxplot(df$var) Outliers: boxplot.stats(df$var)$out Mistake: deleting outliers automatically without investigation.
92
Given the model/data, R: correlation matrix.
Code: cor(df) Mistake: using correlation to imply causation.
93
Given the model/data, R: pairs plot.
Code: pairs(df) Mistake: concluding linearity without checking curvature/variance patterns.
94
Derivation step: Expand RSS.
RSS = (y−Xβ)^T(y−Xβ) = y^T y − y^T Xβ − β^T X^T y + β^T X^T Xβ. Because middle terms are scalars, y^T Xβ = (β^T X^T y)^T = β^T X^T y, so combine to −2β^T X^T y.
95
Derivation step: Take gradient.
∂/∂β [y^T y] = 0; ∂/∂β [−2β^T X^T y] = −2X^T y; ∂/∂β [β^T X^T Xβ] = 2X^T Xβ.
96
Derivation step: Solve.
Set gradient to 0: −2X^T y + 2X^T Xβ = 0 ⇒ X^T Xβ = X^T y ⇒ β̂ = (X^T X)^{-1}X^T y (if invertible).
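
The derivation in cards 94–96 can be verified numerically: at β̂ the gradient of RSS vanishes. A NumPy sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(25), rng.normal(size=25)])
y = rng.normal(size=25)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient from card 95 evaluated at beta_hat: -2 X^T y + 2 X^T X beta_hat.
grad = -2 * X.T @ y + 2 * (X.T @ X) @ beta_hat
```
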
97
True/False: Residuals are the same thing as the model errors ε_i.
False. Residuals e_i are computed from fitted model: e_i = y_i − ŷ_i. Errors ε_i are unobserved random variables in the true data-generating process. Exam trick: they use ε and ê interchangeably—watch the hat.
98
If X^T X is invertible, should you compute β̂ in R by solve(t(X)%*%X)%*%t(X)%*%y?
Usually no. It's numerically less stable than QR-based methods (what lm() uses). Correct: use lm() (or qr.solve / crossprod with care). Exam trick: asks for 'the formula' vs 'recommended computation'.
99
True/False: Least squares requires predictors to be normally distributed.
False. OLS estimation does not require predictors to be normal. Normality is about errors (and mostly for inference). Exam trick: swaps 'predictors' with 'errors'.
100
True/False: Gauss–Markov theorem assumes normal errors.
False. Gauss–Markov uses mean-zero, linearity, uncorrelated errors with constant variance, and full rank. Exam trick: confuses BLUE result with t-test assumptions.
101
A student says: 'Because MLE = OLS, OLS is always optimal.' What’s wrong?
MLE=OLS only under added assumptions (e.g., normal i.i.d. errors). 'Optimal' depends on criterion/class. With heteroskedasticity or correlation, OLS isn’t BLUE and other estimators may be better. Exam trick: overgeneralizing a conditional result.
102
True/False: In an overdetermined system y=Xβ, there is no exact solution, so OLS finds an approximate β that makes Xβ closest to y.
True (typical case). OLS chooses β minimizing ||y−Xβ||^2, i.e., closest point in Col(X). Exam trick: they might say OLS 'solves' y=Xβ exactly—wrong.
103
A student interprets β0 as 'sales when budgets are zero', but zero budgets may be outside data range. What’s the issue?
Intercept interpretation may be extrapolation or meaningless if x=0 isn’t plausible/observed. Exam trick: intercept interpretation requires context.
104
True/False: High leverage means a point has a large residual.
False. Leverage depends on x-values (geometry), not y. A point can have high leverage but small residual (fits the model well). Exam trick: conflates leverage with outlier in y.
105
True/False: Minimizing Σ|e_i| gives the same solution as least squares.
False. L1 loss (least absolute deviations) yields a different estimator (median-based, not closed form like OLS). Exam trick: assumes all loss functions equivalent.
106
If residuals look non-normal, can OLS β̂ still be unbiased?
Yes—unbiasedness depends on mean-zero and exogeneity-type assumptions, not normality. Non-normality mainly affects exact small-sample inference. Exam trick: treats normality as required for unbiasedness.
107
True/False: Because X^T X is symmetric, it must be invertible.
False. Symmetric does not imply invertible; singular symmetric matrices exist (det=0). Exam trick: 'symmetric' ≠ 'full rank'.
108
True/False: The middle terms in RSS expansion are always equal because matrix multiplication is commutative.
False reasoning. They are equal because each is a scalar and equal to its transpose, not because multiplication commutes. Exam trick: tries to bait you into saying matrices commute.
109
A question says: 'The least squares line minimizes squared *horizontal* distances.' Correct it.
OLS minimizes squared *vertical* distances in y-direction (residuals), not horizontal x distances. Exam trick: swaps axes.
110
True/False: If you add a predictor perfectly correlated with an existing predictor, β̂ is still unique.
False. Perfect collinearity makes X^T X singular; β̂ is not uniquely identified. Exam trick: 'more predictors always better'.
111
A student says: 'Correlation between y and x proves x causes y.' What's the fix?
Correlation shows association, not causation. Causal claims require design/assumptions beyond regression. Exam trick: causal language in a regression course.
112
Pitfall #1: True/False: ŷ always equals y for each observation.
False. Only true if the model fits perfectly (RSS=0) or if p+1=n and X full rank (interpolation), which is not typical. Watch for: confusing fitted values with observed values.
113
Pitfall #2: You see 'X is tall' (n>p+1). Does that guarantee X^T X is invertible?
No. Tall helps, but you also need columns linearly independent (no perfect multicollinearity). Watch for: 'tall' ≠ 'full rank'.
114
Pitfall #3: True/False: If errors are uncorrelated, they must be independent.
False. Independence ⇒ uncorrelated (with finite variance), but uncorrelated does not imply independent. Watch for: reversing implication.
115
Pitfall #4: A claim: 'Because OLS minimizes RSS, it minimizes the variance of residuals.' What’s wrong?
OLS minimizes RSS for given X and y, not the sampling variance of residuals; and residual variance depends on σ^2 and model fit. Watch for: mixing optimization objective with sampling properties.
116
Pitfall #5: True/False: The projection ŷ is the closest vector to y in all of R^n.
False. It’s closest among vectors in Col(X) only. Watch for: forgetting 'within the subspace'.
117
Pitfall #6: A question says: 'H is (p+1)×(p+1)'. Correct it.
H is n×n because it maps an n-vector y to an n-vector ŷ. Watch for: mixing parameter dimension with observation dimension.
118
Pitfall #7: True/False: If you re-scale a predictor (e.g., dollars to thousands), β̂ changes but fitted values ŷ stay the same (if you adjust β accordingly).
True. Coefficient units change; predictions remain invariant to linear re-scaling. Watch for: interpreting β without units.
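
This invariance is quick to demonstrate. A NumPy sketch with hypothetical budgets expressed in dollars vs thousands of dollars:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
dollars = rng.uniform(1000, 50000, size=n)                   # budget in $
y = 5 + 0.0004 * dollars + rng.normal(size=n)

X_dollars = np.column_stack([np.ones(n), dollars])
X_thousands = np.column_stack([np.ones(n), dollars / 1000])  # budget in $k

b_d, *_ = np.linalg.lstsq(X_dollars, y, rcond=None)
b_k, *_ = np.linalg.lstsq(X_thousands, y, rcond=None)

# Slope rescales by the factor 1000; fitted values are unchanged.
```
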
119
Pitfall #8: True/False: ŷ always equals y for each observation.
False. Only true if the model fits perfectly (RSS=0) or if p+1=n and X full rank (interpolation), which is not typical. Watch for: confusing fitted values with observed values.
120
Pitfall #9: You see 'X is tall' (n>p+1). Does that guarantee X^T X is invertible?
No. Tall helps, but you also need columns linearly independent (no perfect multicollinearity). Watch for: 'tall' ≠ 'full rank'.
121
Pitfall #10: True/False: If errors are uncorrelated, they must be independent.
False. Independence ⇒ uncorrelated (with finite variance), but uncorrelated does not imply independent. Watch for: reversing implication.
122
Pitfall #11: A claim: 'Because OLS minimizes RSS, it minimizes the variance of residuals.' What’s wrong?
OLS minimizes RSS for given X and y, not the sampling variance of residuals; and residual variance depends on σ^2 and model fit. Watch for: mixing optimization objective with sampling properties.
123
Pitfall #12: True/False: The projection ŷ is the closest vector to y in all of R^n.
False. It’s closest among vectors in Col(X) only. Watch for: forgetting 'within the subspace'.
124
Pitfall #13: A question says: 'H is (p+1)×(p+1)'. Correct it.
H is n×n because it maps an n-vector y to an n-vector ŷ. Watch for: mixing parameter dimension with observation dimension.
125
Pitfall #14: True/False: If you re-scale a predictor (e.g., dollars to thousands), β̂ changes but fitted values ŷ stay the same (if you adjust β accordingly).
True. Coefficient units change; predictions remain invariant to linear re-scaling. Watch for: interpreting β without units.
126
Pitfall #15: True/False: ŷ always equals y for each observation.
False. Only true if the model fits perfectly (RSS=0) or if p+1=n and X full rank (interpolation), which is not typical. Watch for: confusing fitted values with observed values.
127
Pitfall #16: You see 'X is tall' (n>p+1). Does that guarantee X^T X is invertible?
No. Tall helps, but you also need columns linearly independent (no perfect multicollinearity). Watch for: 'tall' ≠ 'full rank'.
128
Pitfall #17: True/False: If errors are uncorrelated, they must be independent.
False. Independence ⇒ uncorrelated (with finite variance), but uncorrelated does not imply independent. Watch for: reversing implication.
129
Pitfall #18: A claim: 'Because OLS minimizes RSS, it minimizes the variance of residuals.' What’s wrong?
OLS minimizes RSS for given X and y, not the sampling variance of residuals; and residual variance depends on σ^2 and model fit. Watch for: mixing optimization objective with sampling properties.
130
Pitfall #19: True/False: The projection ŷ is the closest vector to y in all of R^n.
False. It’s closest among vectors in Col(X) only. Watch for: forgetting 'within the subspace'.
131
Pitfall #20: A question says: 'H is (p+1)×(p+1)'. Correct it.
H is n×n because it maps an n-vector y to an n-vector ŷ. Watch for: mixing parameter dimension with observation dimension.
132
Pitfall #21: True/False: If you re-scale a predictor (e.g., dollars to thousands), β̂ changes but fitted values ŷ stay the same (if you adjust β accordingly).
True. Coefficient units change; predictions remain invariant to linear re-scaling. Watch for: interpreting β without units.
133
Pitfall #22: True/False: ŷ always equals y for each observation.
False. Only true if the model fits perfectly (RSS=0) or if p+1=n and X full rank (interpolation), which is not typical. Watch for: confusing fitted values with observed values.
134
Pitfall #23: You see 'X is tall' (n>p+1). Does that guarantee X^T X is invertible?
No. Tall helps, but you also need columns linearly independent (no perfect multicollinearity). Watch for: 'tall' ≠ 'full rank'.
135
Pitfall #24: True/False: If errors are uncorrelated, they must be independent.
False. Independence ⇒ uncorrelated (with finite variance), but uncorrelated does not imply independent. Watch for: reversing implication.
136
Pitfall #25: A claim: 'Because OLS minimizes RSS, it minimizes the variance of residuals.' What’s wrong?
OLS minimizes RSS for given X and y, not the sampling variance of residuals; and residual variance depends on σ^2 and model fit. Watch for: mixing optimization objective with sampling properties.
137
Pitfall #26: True/False: The projection ŷ is the closest vector to y in all of R^n.
False. It’s closest among vectors in Col(X) only. Watch for: forgetting 'within the subspace'.
138
Pitfall #27: A question says: 'H is (p+1)×(p+1)'. Correct it.
H is n×n because it maps an n-vector y to an n-vector ŷ. Watch for: mixing parameter dimension with observation dimension.
139
Pitfall #28: True/False: If you re-scale a predictor (e.g., dollars to thousands), β̂ changes but fitted values ŷ stay the same (if you adjust β accordingly).
True. Coefficient units change; predictions remain invariant to linear re-scaling. Watch for: interpreting β without units.
140
Pitfall #29: True/False: ŷ always equals y for each observation.
False. Only true if the model fits perfectly (RSS=0) or if p+1=n and X full rank (interpolation), which is not typical. Watch for: confusing fitted values with observed values.
141
Pitfall #30: You see 'X is tall' (n>p+1). Does that guarantee X^T X is invertible?
No. Tall helps, but you also need columns linearly independent (no perfect multicollinearity). Watch for: 'tall' ≠ 'full rank'.
142
Pitfall #31: True/False: If errors are uncorrelated, they must be independent.
False. Independence ⇒ uncorrelated (with finite variance), but uncorrelated does not imply independent. Watch for: reversing implication.
143
Pitfall #32: A claim: 'Because OLS minimizes RSS, it minimizes the variance of residuals.' What’s wrong?
OLS minimizes RSS for given X and y, not the sampling variance of residuals; and residual variance depends on σ^2 and model fit. Watch for: mixing optimization objective with sampling properties.
187
Define total sum of squares (TSS).
TSS = Σ(y_i − ȳ)^2; total variability of y around its mean. Intuition: variability left with no predictors (intercept-only model). Equivalently: sample variance × (n−1).
188
Define explained sum of squares (ESS).
ESS = Σ(ŷ_i − ȳ)^2; variability explained by the regression. Intuition: gain from using model vs mean. Example: improvement over y-bar model.
189
Define residual sum of squares (RSS).
RSS = Σ(y_i − ŷ_i)^2; unexplained variability. Intuition: leftover noise. Example: vertical deviations from fitted line.
190
Define ANOVA decomposition.
TSS = ESS + RSS (for models with intercept). Intuition: total = explained + unexplained.
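The decomposition can be verified numerically. A short Python sketch (toy data; the identity holds exactly for any OLS fit that includes an intercept):

```python
# Sketch: check TSS = ESS + RSS for a simple least-squares fit with
# an intercept. Illustrative data only.

def ols_fit(x, y):
    """Return (intercept, slope) for simple least squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b1 * xbar, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
b0, b1 = ols_fit(x, y)
yhat = [b0 + b1 * xi for xi in x]
ybar = sum(y) / len(y)

tss = sum((yi - ybar) ** 2 for yi in y)
ess = sum((yh - ybar) ** 2 for yh in yhat)
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
assert abs(tss - (ess + rss)) < 1e-9  # total = explained + unexplained
```

Without an intercept (or with a non-OLS fit), the cross-term Σ(ŷ_i − ȳ)(y_i − ŷ_i) need not vanish and the identity can fail.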
191
Define degrees of freedom (residual).
df_resid = n − (p+1); loss from estimating parameters. Intuition: info left after fitting.
192
Define error variance σ².
Population variance of ε; noise level. Intuition: spread around true regression.
193
Define estimator of σ².
σ̂² = RSS / (n − p − 1); unbiased estimator. Intuition: average squared residual.
194
Define coefficient of determination (R²).
R² = 1 − RSS/TSS = ESS/TSS. Intuition: proportion of variability explained.
195
Define adjusted R².
Adjusted R² = 1 − [RSS/(n−p−1)] / [TSS/(n−1)]; penalizes extra predictors. Intuition: fairer comparison between models with different numbers of predictors.
196
Define non-identifiability.
Parameters cannot be uniquely estimated when XᵀX is singular. Intuition: redundant predictors.
197
Define perfect multicollinearity.
Exact linear dependence among predictors. Intuition: measuring same thing twice.
198
Define near multicollinearity.
Predictors highly correlated but not exact multiples. Intuition: unstable estimates.
199
How do you Compute TSS?
TSS = Σ(y_i − ȳ)^2. Use: total variability. Mistake: forgetting that the decomposition TSS = ESS + RSS requires an intercept in the model.
200
How do you Compute ESS?
ESS = Σ(ŷ_i − ȳ)^2. Use: explained variability.
201
How do you Compute σ̂²?
σ̂² = RSS / (n − p − 1). Use: inference. Mistake: dividing by n.
202
How do you Compute R²?
R² = 1 − RSS/TSS. Use: descriptive fit. Mistake: using for causal claims.
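Tying the last few cards together, here is a Python sketch (illustrative data) computing R² both ways and σ̂² with the correct denominator for a one-predictor fit (p = 1):

```python
# Sketch: R^2 via 1 - RSS/TSS and via ESS/TSS agree for an OLS fit
# with an intercept; sigma^2-hat divides RSS by n - p - 1, not n.

def ols_fit(x, y):
    """Return (intercept, slope) for simple least squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b1 * xbar, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
n, p = len(x), 1
b0, b1 = ols_fit(x, y)
yhat = [b0 + b1 * xi for xi in x]
ybar = sum(y) / n

tss = sum((yi - ybar) ** 2 for yi in y)
ess = sum((yh - ybar) ** 2 for yh in yhat)
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

r2_a = 1 - rss / tss
r2_b = ess / tss
assert abs(r2_a - r2_b) < 1e-9   # both formulas agree (with intercept)

sigma2_hat = rss / (n - p - 1)   # unbiased: divide by n - p - 1, not n
assert sigma2_hat > 0
```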
203
How do you ANOVA in R?
anova(lm_fit) returns sequential sums of squares: the predictor rows sum to ESS and the Residuals row gives RSS. Mistake: confusing which row is which.
204
True/False: High R² implies correct model.
False. Wrong functional form can still yield high R².
205
True/False: R² can decrease when adding predictors.
False. R² never decreases when a predictor is added (RSS cannot increase); use adjusted R² to compare models of different sizes.
206
Seeing NA coefficients in lm() means what?
Exact non-identifiability: that column of X is an exact linear combination of the others, so R drops it; remove the redundant predictor.
207
Opposite-signed coefficients than truth suggest?
Possible (near) multicollinearity: strongly correlated predictors can flip individual coefficient signs even when the joint predictions remain sensible.