What is the difference between the error term (u) and the residual (û)?
u (error term): Unobservable; the true difference between actual y and the population regression line. Contains all omitted factors.
û (residual): Observable; the difference between actual y and our estimated (sample) regression line.
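A small simulation makes the distinction concrete. It assumes a made-up population model y = 1 + 2x + u. Because the data are simulated, the true errors u are known, which is never the case with real data. The residuals come only from the fitted line, and they differ from the errors by X(β̂ − β).

```python
import numpy as np

# Assumed population model for illustration only: y = 1 + 2x + u
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = rng.normal(size=n)          # error term: known here only because we simulate it
y = 1.0 + 2.0 * x + u

# OLS fit of the sample regression line
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat        # residuals: computable from the data alone

print("estimated coefficients:", beta_hat)
print("max |u - u_hat|:", np.max(np.abs(u - u_hat)))
# With an intercept, OLS residuals sum to zero and are orthogonal to x
print("sum of residuals:", u_hat.sum())
print("sum of x * residuals:", (x * u_hat).sum())
```

The last two sums are zero up to rounding. These are the OLS first-order conditions, which hold for the residuals by construction but not for the errors.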
What is the Zero Conditional Mean Assumption (SLR.4) and why is it critical?
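E(u∣x) = 0. It means the expected value of u does not depend on x: the unobserved factors in u are, on average, unrelated to x. Critical because: If it fails (e.g., a factor left out of the model, and therefore in u, is correlated with x), OLS estimates are biased.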
What is the difference between homoskedasticity and heteroskedasticity?
Homoskedasticity: Constant variance of the errors given x, Var(u∣x) = σ².
Heteroskedasticity: The variance of the errors changes with x. This does NOT bias the OLS coefficients, but it makes the usual OLS standard errors (and therefore the usual t and F tests) invalid.
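Here is a minimal Monte Carlo sketch of this point. It assumes a made-up design in which Var(u∣x) = x². The simulation compares three things: the actual spread of the slope estimates, the average usual OLS standard error, and the average White (HC0) heteroskedasticity-robust standard error. HC0 is only one of several robust variants, and it is used here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, beta1 = 200, 2000, 2.0
slopes, usual_se, robust_se = [], [], []

for _ in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n) * np.abs(x)   # assumed design: Var(u | x) = x^2
    y = 1.0 + beta1 * x + u
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    # Usual OLS variance sigma_hat^2 (X'X)^-1: valid only under homoskedasticity
    sigma2_hat = e @ e / (n - 2)
    usual_se.append(np.sqrt(sigma2_hat * XtX_inv[1, 1]))
    # White (HC0) robust variance: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1
    meat = (X * e[:, None] ** 2).T @ X
    robust_se.append(np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1]))
    slopes.append(b[1])

slopes = np.array(slopes)
print("mean slope estimate:", slopes.mean())     # near 2: still unbiased
print("actual SD of slope:", slopes.std())
print("average usual SE:", np.mean(usual_se))     # too small in this design
print("average robust SE:", np.mean(robust_se))   # close to the actual SD
```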
What three things affect the variance of β̂1 (the OLS slope estimator)?
Error variance (σ²): Larger error variance → larger variance
Sample variation in x (SSTx): More variation in x → smaller variance
Sample size (n): Larger n → smaller variance
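In simple regression these three factors combine as Var(β̂1) = σ²/SSTx, where SSTx = Σ(xi − x̄)², and SSTx grows with n. The sketch below assumes a model y = 1 + 2x + u, with u normal and standard deviation sigma. It holds x fixed across replications and compares the simulated variance of β̂1 with this formula.

```python
import numpy as np

def slope_variance(sigma, n, reps=5000, seed=0):
    """Monte Carlo variance of the OLS slope vs the formula sigma^2 / SST_x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                  # x held fixed across replications
    sst_x = np.sum((x - x.mean()) ** 2)
    slopes = np.empty(reps)
    for r in range(reps):
        y = 1.0 + 2.0 * x + sigma * rng.normal(size=n)
        # Simple-regression slope: sum((x - xbar) * y) / SST_x
        slopes[r] = np.sum((x - x.mean()) * y) / sst_x
    return slopes.var(), sigma**2 / sst_x

# Doubling sigma quadruples the variance; a larger n raises SST_x and shrinks it
for sigma, n in [(1.0, 50), (2.0, 50), (1.0, 200)]:
    simulated, formula = slope_variance(sigma, n)
    print(f"sigma={sigma}, n={n}: simulated={simulated:.5f}, formula={formula:.5f}")
```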
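What are the five Gauss-Markov assumptions (MLR.1–MLR.5)?
MLR.1: Linear in parameters
MLR.2: Random sampling
MLR.3: No perfect collinearity
MLR.4: Zero conditional mean, E(u∣x1, …, xk) = 0
MLR.5: Homoskedasticity, Var(u∣x1, …, xk) = σ²
If all hold, OLS is BLUE (Best Linear Unbiased Estimator).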
What is omitted variable bias?
Bias in the OLS estimates that occurs when a relevant variable (one that affects y) is left out of the model and is correlated with an included regressor.
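Consider the case of one omitted variable. Suppose the true model is y = β0 + β1x1 + β2x2 + u and x2 is dropped. The slope from the short regression then satisfies E(β̃1) = β1 + β2δ̃1, where δ̃1 is the slope from regressing x2 on x1. The simulation below uses assumed values: β1 = 1, β2 = 3, and a population slope of x2 on x1 of 0.5. With these values, the short-regression slope should land near 1 + 3 × 0.5 = 2.5.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta1, beta2, delta1 = 1.0, 3.0, 0.5     # assumed values for illustration
x1 = rng.normal(size=n)
x2 = delta1 * x1 + rng.normal(size=n)    # x2 is correlated with x1
y = 2.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

def ols(y, *regressors):
    """OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

long_fit = ols(y, x1, x2)      # includes x2: slope on x1 near 1.0
short_fit = ols(y, x1)         # omits x2: slope on x1 near 2.5
delta_hat = ols(x2, x1)[1]     # slope from regressing x2 on x1

print("long regression slope on x1:", long_fit[1])
print("short regression slope on x1:", short_fit[1])
# Exact in-sample identity: short slope = long slope + (coef on x2) * delta_hat
print("identity check:", long_fit[1] + long_fit[2] * delta_hat)
```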
What is multicollinearity and what are its consequences?
High correlation between two or more independent variables.
Consequences:
Large standard errors (imprecise estimates)
Coefficients may be individually insignificant even if jointly significant
Does not cause bias (if MLR.4 holds)
Does not violate MLR.3 (unless the correlation is perfect)
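The variance formula Var(β̂j) = σ²/[SSTj(1 − Rj²)] shows where the large standard errors come from. Here Rj² is the R² from regressing xj on the other regressors, and 1/(1 − Rj²) is the variance inflation factor (VIF). The sketch below uses simulated data with an assumed correlation rho between x1 and x2. It checks that the usual OLS standard error of β̂1 matches this formula and shows how it grows as rho approaches 1.

```python
import numpy as np

def se_and_vif(rho, n=500, seed=3):
    """Usual OLS SE of the x1 coefficient, the same SE via the formula, and the VIF."""
    rng = np.random.default_rng(seed)
    z1, z2 = rng.normal(size=(2, n))
    x1 = z1
    x2 = rho * z1 + np.sqrt(1 - rho**2) * z2      # corr(x1, x2) is about rho
    y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    sigma2_hat = e @ e / (n - 3)
    se_b1 = np.sqrt(sigma2_hat * XtX_inv[1, 1])
    # R_1^2 from regressing x1 on the other regressors (intercept and x2)
    Z = np.column_stack([np.ones(n), x2])
    fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    sst_1 = np.sum((x1 - x1.mean()) ** 2)
    r2_1 = 1 - np.sum((x1 - fitted) ** 2) / sst_1
    # Formula: Var(b1) = sigma^2 / (SST_1 * (1 - R_1^2))
    se_formula = np.sqrt(sigma2_hat / (sst_1 * (1 - r2_1)))
    return se_b1, se_formula, 1 / (1 - r2_1)

for rho in [0.0, 0.9, 0.99]:
    se, se_f, vif = se_and_vif(rho)
    print(f"rho={rho}: SE(b1)={se:.4f}, formula SE={se_f:.4f}, VIF={vif:.1f}")
```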
What is the difference between R² and adjusted R²?
R²: Never decreases when you add variables (even irrelevant ones).
Adjusted R²: Penalises adding variables. Use it to compare models with different numbers of regressors. Can be negative.
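The sketch below computes both measures using the usual definition adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of slope regressors. It starts from a simulated model and adds purely random (irrelevant) regressors one at a time.

```python
import numpy as np

def r2_and_adjusted(y, X):
    """R^2 and adjusted R^2 for an OLS fit; X excludes the intercept column."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    resid = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(4)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
junk = rng.normal(size=(n, 5))           # irrelevant regressors, unrelated to y

for extra in range(6):
    X = np.column_stack([x, junk[:, :extra]])
    r2, adj = r2_and_adjusted(y, X)
    print(f"{1 + extra} regressor(s): R2={r2:.4f}, adjusted R2={adj:.4f}")
```

R² never falls along the way. Adjusted R² rises only if the added regressor's |t| exceeds 1. Adjusted R² can also be negative: for example, R² = 0.01 with n = 20 and k = 5 gives 1 − 0.99 × 19/14 ≈ −0.34.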
What is the difference between statistical significance and economic significance?
Statistical significance: Coefficient is reliably different from zero (small p-value / large |t|).
Economic significance: Coefficient is large enough to matter in the real world (practical importance). A tiny effect can be statistically significant with a large sample.
How do you interpret a 95% confidence interval for βj?
“If we repeatedly drew random samples and calculated the confidence interval each time, 95% of those intervals would contain the true population parameter βj.”
(Not: “there is a 95% chance βj lies in this interval”.)
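A simulation can check the repeated-sampling interpretation. The sketch assumes a made-up model y = 1 + 2x + u with normal errors and n = 30, and it holds x fixed. It draws many samples and counts how often the interval β̂1 ± t·se(β̂1) contains the true β1 = 2. The critical value 2.048 is the 97.5th percentile of the t distribution with 28 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, beta1 = 30, 5000, 2.0
t_crit = 2.048        # 97.5th percentile of t with n - 2 = 28 df (from a t table)
x = rng.normal(size=n)                          # x held fixed across samples
sst_x = np.sum((x - x.mean()) ** 2)
covered = 0
for _ in range(reps):
    y = 1.0 + beta1 * x + rng.normal(size=n)    # a fresh random sample each time
    b1 = np.sum((x - x.mean()) * y) / sst_x
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    se_b1 = np.sqrt(resid @ resid / (n - 2) / sst_x)
    lower, upper = b1 - t_crit * se_b1, b1 + t_crit * se_b1
    covered += int(lower <= beta1 <= upper)
print("share of intervals containing beta1:", covered / reps)   # close to 0.95
```

Each individual interval either contains β1 or it does not. The 95% describes the procedure across samples.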
What is the relationship between the t-test (single restriction) and the F-test (multiple restrictions)?
For testing a single linear restriction against a two-sided alternative, the F-statistic equals the square of the t-statistic, so both tests give the same conclusion.
For multiple restrictions, use the F-test. Separate t-tests on each coefficient cannot test the restrictions jointly.
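Here is a sketch of the F = t² relationship, using simulated data from an assumed model with two regressors. It tests the single exclusion restriction β2 = 0 in two ways. The first is the F-statistic built from restricted and unrestricted sums of squared residuals, F = [(SSRr − SSRur)/q]/[SSRur/(n − k − 1)]. The second is the usual t-statistic for β̂2.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 120
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.3 * x1 + 0.2 * x2 + rng.normal(size=n)

def ssr_and_fit(y, X):
    """Sum of squared residuals, OLS coefficients, and (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    return e @ e, b, XtX_inv

ones = np.ones(n)
X_ur = np.column_stack([ones, x1, x2])     # unrestricted model
X_r = np.column_stack([ones, x1])          # restricted model: beta2 = 0
ssr_ur, b, XtX_inv = ssr_and_fit(y, X_ur)
ssr_r, _, _ = ssr_and_fit(y, X_r)

df = n - 3                                 # n - k - 1 with k = 2
q = 1                                      # one restriction
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
t = b[2] / np.sqrt(ssr_ur / df * XtX_inv[2, 2])
print("F =", F, " t^2 =", t ** 2)          # equal up to rounding
```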
What is the dummy variable trap and how do you avoid it?
Perfect multicollinearity caused by including all dummy variables for a categorical variable (e.g., both “male” and “female”) plus an intercept.
Avoid by: Including only m−1 dummies for m categories. The omitted category becomes the base group.
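A quick rank check shows why the trap is perfect multicollinearity. With an intercept, the two dummies sum to the intercept column, so the design matrix loses full column rank. The female/male values below are hypothetical.

```python
import numpy as np

# Hypothetical data: a two-category variable coded as 1 = female, 0 = male
female = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
male = 1.0 - female
intercept = np.ones_like(female)

X_trap = np.column_stack([intercept, female, male])  # female + male = intercept
X_ok = np.column_stack([intercept, female])           # male is the base group

print("rank with both dummies:", np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1])  # 2 of 3
print("rank with m - 1 dummies:", np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1])     # 2 of 2
```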
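What happens if you include an irrelevant variable in your regression?
OLS estimators remain unbiased (MLR.1–MLR.4 still hold), but the variances of the other estimators can increase, especially if the irrelevant variable is correlated with the included regressors.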
What happens if you omit a relevant variable that is correlated with included regressors?