What is the multivariate linear regression model?
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui. It studies how multiple explanatory variables jointly affect one dependent variable.
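The model above can be sketched with a small simulation: generate data from a known multivariate linear model and recover the coefficients by OLS. All variable names and parameter values here are illustrative.

```python
import numpy as np

# Simulate Yi = b0 + b1*X1i + b2*X2i + ui with known coefficients,
# then estimate them by OLS (least squares via numpy).
rng = np.random.default_rng(0)
n = 1000
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
Y = 1.0 + 2.0 * X1 - 3.0 * X2 + u

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), X1, X2])
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)  # approximately [1, 2, -3]
```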
What does the zero conditional mean assumption imply in the multivariate case?
E[ui | X1i, X2i, …, Xki] = 0. It ensures the error term is uncorrelated with all regressors, making OLS unbiased and consistent.
What are the four key Least Squares assumptions in the multivariate model?
1) Zero conditional mean: E[ui | X1i, …, Xki] = 0; 2) Data are i.i.d.; 3) Large outliers are unlikely (finite fourth moments); 4) No perfect multicollinearity.
What is the dummy variable trap?
Including all categories’ dummies (plus an intercept), which creates perfect multicollinearity and prevents OLS estimation. Solution: omit one base category.
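A quick sketch of the trap (illustrative data): with an intercept plus a dummy for every category, the dummy columns sum exactly to the intercept column, so the design matrix loses rank.

```python
import numpy as np

# Three categories, one dummy per category.
cats = np.array([0, 1, 2, 0, 1, 2, 0, 1])
D = (cats[:, None] == np.arange(3)).astype(float)

X_trap = np.column_stack([np.ones(len(cats)), D])       # intercept + ALL dummies
X_ok = np.column_stack([np.ones(len(cats)), D[:, 1:]])  # omit base category

rank_trap = np.linalg.matrix_rank(X_trap)  # 3 < 4 columns: perfect multicollinearity
rank_ok = np.linalg.matrix_rank(X_ok)      # 3 == 3 columns: full rank, OLS works
print(rank_trap, rank_ok)
```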
What is imperfect multicollinearity?
Regressors are highly, but not perfectly, correlated, making it difficult to distinguish individual effects.
Why is multicollinearity a problem?
It inflates the variances of estimated coefficients, yielding less precise estimates and smaller t-statistics.
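The variance inflation can be seen in a small Monte Carlo sketch (all parameters illustrative): the sampling standard deviation of the slope on X1 grows as the correlation between X1 and X2 rises.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

def beta1_sd(rho, reps=200):
    """Monte Carlo sd of the OLS slope on X1 when corr(X1, X2) = rho."""
    draws = []
    for _ in range(reps):
        X1 = rng.normal(size=n)
        X2 = rho * X1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        Y = 1 + X1 + X2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), X1, X2])
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        draws.append(b[1])
    return np.std(draws)

sd_low = beta1_sd(0.0)    # uncorrelated regressors
sd_high = beta1_sd(0.95)  # highly correlated regressors
print(sd_low, sd_high)    # sd_high is several times larger
```

The theoretical ratio is the square root of the variance inflation factor, 1/(1 − ρ²).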
What is omitted variable bias (OVB)?
Bias that occurs when a relevant variable correlated with an included regressor is omitted and absorbed in the error term.
Which Least Squares assumption is violated when there is omitted variable bias?
The zero conditional mean assumption: E[ui | X1i, …, Xki] ≠ 0, because the omitted variable is absorbed into ui and is correlated with an included regressor.
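A simulation sketch of OVB (illustrative numbers): when a relevant variable X2 correlated with X1 is omitted, the short regression's slope converges to β1 + β2·δ, where δ is the slope from regressing X2 on X1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(size=n)        # X2 correlated with X1 (delta = 0.8)
Y = 1 + 2 * X1 + 3 * X2 + rng.normal(size=n)  # true beta1 = 2, beta2 = 3

# Short regression of Y on X1 alone: X2 is absorbed into the error term.
X_short = np.column_stack([np.ones(n), X1])
b_short, *_ = np.linalg.lstsq(X_short, Y, rcond=None)
print(b_short[1])  # approximately 2 + 3*0.8 = 4.4, not the true beta1 = 2
```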
Should we include all variables to avoid omitted variable bias? Why (not)?
No. Include relevant, predetermined regressors. Irrelevant variables increase variance, and ‘bad controls’ bias causal interpretation.
What are ‘bad controls’?
Variables that are outcomes of the treatment or regressors of interest (e.g., occupation when estimating the effect of education on earnings).
What is a joint hypothesis?
A hypothesis imposing multiple coefficient restrictions simultaneously (e.g., H0: β3 = β4 = 0).
Which test is used to test a joint hypothesis?
The F-test, which compares the fit of restricted vs. unrestricted models.
When should the F-test be used instead of the t-test?
Use an F-test for two or more simultaneous restrictions; use a t-test for a single restriction on one coefficient.
Why can’t t-tests be used for joint hypotheses?
Because the coefficient estimators can be correlated, and running separate t-tests does not control the overall rejection rate across multiple restrictions; the F-test accounts for their covariance and tests the restrictions jointly.
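The F-test idea can be sketched by computing the homoskedasticity-only F-statistic by hand (illustrative data): compare the sum of squared residuals of the restricted and unrestricted models.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1 + 2 * X1 + 0.5 * X2 + rng.normal(size=n)  # X3 is truly irrelevant

def ssr(X, Y):
    """Sum of squared residuals from an OLS fit of Y on X."""
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b
    return resid @ resid

ones = np.ones(n)
ssr_u = ssr(np.column_stack([ones, X1, X2, X3]), Y)  # unrestricted model
ssr_r = ssr(np.column_stack([ones, X1]), Y)          # H0: beta2 = beta3 = 0 imposed

q, k = 2, 3  # q restrictions, k regressors in the unrestricted model
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
print(F)  # large here, since beta2 = 0.5 in the true model: reject H0
```

Reject H0 when F exceeds the critical value of the F(q, n − k − 1) distribution (about 3.04 at the 5% level for these degrees of freedom).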
What is the conditional mean independence assumption?
With a variable of interest Xi and control variables W1i, …, Wri: E[ui | Xi, W1i, …, Wri] = E[ui | W1i, …, Wri]. Once the controls are included, the conditional mean of the error no longer depends on the variable of interest, so its coefficient can be interpreted causally even if the controls themselves are correlated with the error.
What is the difference between a control variable and a variable of interest?
A variable of interest’s coefficient measures the causal effect of interest; controls isolate this effect by holding other influences constant.
Do coefficients on control variables measure causal effects?
Not necessarily. They often capture associations used to adjust for confounding rather than direct causal effects.
What are the main measures of fit in a multivariate regression model?
R², Adjusted R², and the Standard Error of the Regression (SER).
When interpreting coefficients in log-linear models, what does β1 mean?
In the log-linear model ln(Yi) = β0 + β1Xi + ui, 100 × β1 gives the approximate percentage change in Y for a one-unit increase in X (the approximation is accurate when β1 is small).
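A quick numerical check of the approximation (illustrative value of β1): the exact percentage change is 100 × (exp(β1) − 1), which is close to 100 × β1 when β1 is small.

```python
import numpy as np

beta1 = 0.05
approx_pct = 100 * beta1                # the rule-of-thumb approximation: 5.0
exact_pct = 100 * (np.exp(beta1) - 1)   # exact percentage change: ~5.13
print(approx_pct, exact_pct)
```

For large coefficients the gap widens, so the exact formula should be preferred.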
What does model specification in practice involve?
Choosing a base model grounded in theory, testing alternative specifications, and checking robustness of coefficients to ensure results are not driven by model choice.