What is the average wage gap between blacks and nonblacks in this model?
The UNCONDITIONAL log wage gap is -0.318. In percentage terms this is
$(e^{-0.318} - 1) \cdot 100\% = -27.2\%$,
i.e. blacks earn on average 27.2% less than nonblacks.
What is the average wage gap between blacks and nonblacks in this model?
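The CONDITIONAL log wage gap is -0.178. In percentage terms this is
$(e^{-0.178} - 1) \cdot 100\% = -16.3\%$,
i.e. 16.3% lower wages, holding the other variables in the model fixed.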
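The conversion from a log-point coefficient to a percentage gap used in both answers can be checked with a few lines of Python. This is only a small sketch; the function name `pct_gap` is made up for illustration, and the two coefficients are the ones quoted above.

```python
import math

def pct_gap(log_gap):
    """Convert a coefficient from a log-wage regression into a percentage difference."""
    return (math.exp(log_gap) - 1) * 100

unconditional = pct_gap(-0.318)  # model without controls
conditional = pct_gap(-0.178)    # model with controls

print(f"unconditional gap: {unconditional:.1f}%")
print(f"conditional gap:   {conditional:.1f}%")
```

For small coefficients the log gap itself is a reasonable approximation of the percentage gap, but at -0.318 the difference (31.8 vs. 27.2) is already noticeable.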
Give an economic explanation as to why the estimated gaps differ between the models. What do you conclude concerning underspecification and possible endogeneity problems?

Part of the unconditional difference is driven by the fact that wages also depend on education, experience, and region. The gap is smaller in the second model because blacks on average have less education and experience and are more likely to live in the South.
The first model is clearly underspecified, as important predictors of the wage were omitted. Since these omitted variables are correlated with “black”, “black” is endogenous in model 1 and its estimated effect suffers from omitted variable bias.
Model 3 additionally includes interaction terms between black and experience and experience squared to analyze whether the racial wage gap depends on experience. Calculate the average wage gap between blacks and non-blacks for workers with x years of experience.

Difference calculation (only the terms that are multiplied with “black” matter, as only these differ between the two groups). The log wage gap at exper years of experience is:
$0.15989 - 0.0541 \cdot \text{exper} + 0.00164 \cdot \text{exper}^2$
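The experience-dependent gap can be evaluated for any number of years with a short script. This is a sketch using the three coefficients from the calculation above; the names `log_wage_gap` and `pct_wage_gap` are illustrative, not taken from the course material.

```python
import math

# Coefficients of the terms involving "black" in model 3 (from the answer above)
B_BLACK = 0.15989
B_BLACK_EXPER = -0.0541
B_BLACK_EXPER2 = 0.00164

def log_wage_gap(exper):
    """Log wage gap between blacks and non-blacks at `exper` years of experience."""
    return B_BLACK + B_BLACK_EXPER * exper + B_BLACK_EXPER2 * exper ** 2

def pct_wage_gap(exper):
    """The same gap expressed as a percentage difference in wages."""
    return (math.exp(log_wage_gap(exper)) - 1) * 100

for years in (0, 5, 10, 20):
    print(years, round(log_wage_gap(years), 5), round(pct_wage_gap(years), 1))
```

At 10 years of experience, for example, the log gap is 0.15989 - 0.541 + 0.164 = -0.21711.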
Explain the idea of AIC and the role of the two components of the AIC formula.
What are the advantages of the AIC compared to $R^2$ for the purpose of model selection?
The $R^2$ only rewards fit: it never decreases when regressors are added, and thus always suggests the model with the largest number of parameters. The AIC is better because it weighs bias and variance against each other by penalizing the number of parameters.
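The trade-off can be illustrated with one common form of the AIC for a linear model with Gaussian errors, N·log(RSS/N) + 2k (constants dropped). This form is an assumption; the exact formula used in the course may be scaled differently. The RSS values below are hypothetical numbers chosen only to show the mechanism.

```python
import math

def aic_gaussian(rss, n, k):
    """AIC up to an additive constant: fit term plus penalty on the parameter count."""
    fit_term = n * math.log(rss / n)  # decreases as the fit improves
    penalty = 2 * k                   # increases with every extra parameter
    return fit_term + penalty

n = 100
# Hypothetical example: the larger model fits only marginally better.
aic_small = aic_gaussian(rss=50.0, n=n, k=3)
aic_large = aic_gaussian(rss=49.9, n=n, k=4)

print(aic_small, aic_large)
```

The larger model has the smaller RSS (and hence the higher $R^2$), but its AIC is higher because the tiny gain in fit does not pay for the additional parameter.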
Based on the results, which model do you prefer?
Model 3 performs best (lowest AIC) and is therefore preferable.
Show theoretically that OLS regression residuals are in general not homoskedastic, even if the error terms are homoskedastic.
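One way to sketch the argument, assuming the linear model y = Xβ + u with Var(u | X) = σ²I, the projection matrix P = X(X'X)^{-1}X' with diagonal elements p_ii, and the residual maker M = I - P:

```latex
\begin{aligned}
e &= y - X\hat{\beta} = My = M(X\beta + u) = Mu
   \qquad (\text{since } MX = 0) \\
\operatorname{Var}(e \mid X) &= M \operatorname{Var}(u \mid X)\, M'
   = \sigma^2 M M' = \sigma^2 M
   \qquad (M \text{ symmetric and idempotent}) \\
\operatorname{Var}(e_i \mid X) &= \sigma^2 (1 - p_{ii})
\end{aligned}
```

Because the p_ii generally differ across observations, the residual variances differ as well, even though the errors themselves are homoskedastic.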
Explain how you would use residual analysis to check for heteroskedasticity.
Transform the residuals to standardized or studentized residuals and plot against fitted values.
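A possible way to write the transformation, assuming the usual definitions (terminology varies between textbooks): p_ii is the i-th diagonal element of the projection matrix, \hat{\sigma} is the usual estimate of the error standard deviation, and \hat{\sigma}_{(i)} is the same estimate computed without observation i.

```latex
r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - p_{ii}}}
\quad \text{(standardized)},
\qquad
t_i = \frac{e_i}{\hat{\sigma}_{(i)}\sqrt{1 - p_{ii}}}
\quad \text{(studentized)}
```

Dividing by \sqrt{1 - p_{ii}} removes the variation in residual variance that is caused by the regressors alone, so any remaining pattern in the plot points to heteroskedastic errors.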
How can the OLS regression residuals be transformed in an appropriate way to detect heteroskedasticity graphically?
Explain how the White test for heteroskedasticity is implemented.
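Regress the squared OLS residuals on all regressors of the linear model together with their squares and interaction terms. Under H0 (homoskedasticity), the intercept of this auxiliary regression is the constant error variance and the regressors are jointly insignificant, which can be tested with an F-test or with the LM statistic N·R² of the auxiliary regression.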
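The procedure can be sketched in Python with NumPy for the simplest case of a single regressor x, where the auxiliary regression uses x and x². The function name `white_test_one_regressor` and the simulated data are assumptions for illustration, not part of the exam material. With two auxiliary regressors the LM statistic is asymptotically chi-squared with 2 degrees of freedom under H0, and for 2 degrees of freedom the p-value is simply exp(-LM/2).

```python
import math
import numpy as np

def white_test_one_regressor(x, y):
    """White test for y = a + b*x + u; returns (LM statistic, p-value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Step 1: OLS of y on a constant and x, keep the squared residuals.
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2

    # Step 2: auxiliary regression of e^2 on a constant, x and x^2.
    Z = np.column_stack([np.ones(n), x, x ** 2])
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    ss_res = np.sum((e2 - Z @ gamma) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot

    # Step 3: LM = N * R^2, chi-squared with 2 degrees of freedom under H0.
    lm = n * r2
    p_value = math.exp(-lm / 2)  # survival function of chi2(2)
    return lm, p_value

# Simulated data whose error standard deviation grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(1, 5, 500)
y = 1 + 2 * x + x * rng.standard_normal(500)
lm, p_value = white_test_one_regressor(x, y)
print(lm, p_value)
```

With several regressors the auxiliary regression also contains all cross-products, and the degrees of freedom equal the number of auxiliary regressors excluding the constant.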
The results of the White test for model 3 are given in the output. How do you interpret the result of the test? What do you conclude?
p-value = 0.093
At 5% significance, we fail to reject H0 (homoskedasticity).
At 10% significance, we reject H0 (homoskedasticity).
What is the interpretation of the coefficient on cigpacks?
Each additional pack of cigarettes decreases the birth weight by 11.5 ounces, holding the other variables fixed. The effect is statistically significant at the 1% level.
Under which condition is the OLS estimator of the effect of cigpacks unbiased? Under which condition is it consistent? Do you think these conditions are fulfilled in the given example?
Conditions are likely not fulfilled. There could be some unobserved characteristics of the mother (e.g. health, personality traits etc.) that affect both smoking during pregnancy and the child’s weight.
Which two conditions have to be fulfilled for an instrument to be valid? Can these conditions be tested, and if yes, how?
In the present example, we use the variable cigprice as an instrument for cigpacks. Interpret the results below. Do you think cigprice is a suitable instrument in this context? Can you identify problems in the IV estimation of this example?
The instrument is not statistically significant (p-value 0.317), i.e. there is no evidence that the cigarette price affects cigarette consumption.
–> The “relevance” condition for the instrument is clearly not fulfilled, and the instrument should not be used!
In the present example, we use the variable cigprice as an instrument for cigpacks. Interpret the results below. Do you think cigprice is a suitable instrument in this context? Can you identify problems in the IV estimation of this example?
–> This example shows that weak instruments can produce extreme estimates.
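Why a weak instrument can lead to extreme estimates can be seen in the simplest case with one regressor x, one instrument z and no further controls (a simplification of the model estimated here), y_i = α + β x_i + u_i:

```latex
\hat{\beta}_{IV}
= \frac{\sum_{i=1}^{N} (z_i - \bar{z})(y_i - \bar{y})}
       {\sum_{i=1}^{N} (z_i - \bar{z})(x_i - \bar{x})}
= \beta + \frac{\sum_{i=1}^{N} (z_i - \bar{z})\, u_i}
               {\sum_{i=1}^{N} (z_i - \bar{z})(x_i - \bar{x})}
```

If z is only weakly correlated with x, the denominator is close to zero, so even a small sample correlation between z and u in the numerator is blown up into a large estimation error.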
What is the purpose of the Hausman test in the case of IV estimation? In the present example, the Hausman test gives a p-value of 0.278. Interpret this result. What are the limitations of the Hausman test in the present example?
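The Wu-Hausman test checks whether the suspect regressor (here cigpacks) is actually exogenous. Under H0 it is exogenous, so OLS and IV are both consistent and OLS is preferred because it is more efficient. With a p-value of 0.278 we fail to reject H0, i.e. there is no statistical evidence that cigpacks is endogenous.
LIMITATIONS: The test is only meaningful if the instrument is valid. Since cigprice is a weak instrument here, the IV estimate is very imprecise, the test has little power, and failing to reject H0 is not convincing evidence of exogeneity.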
Assume that the data are generated according to the first model below, but we estimate the coefficient of x on the basis of the second model. What are the necessary steps to show the bias of the OLS estimator ^β?

How is the residual maker M defined?
$M = I - P = I - X(X'X)^{-1}X'$
How can you rewrite the average of y = ȳ as a sum?
$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$
What is |m|?
the number of model parameters = k
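What is the average of $p_{ii}$?
$\operatorname{avg}(p_{ii}) = \frac{1}{N} \sum_{i=1}^{N} p_{ii} = \frac{1}{N} \operatorname{tr}(P) = \frac{1}{N} \operatorname{tr}(I_K) = \frac{K}{N}$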
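The result can be verified numerically for a simple regression with an intercept and one regressor (K = 2), where the diagonal elements of P have the closed form 1/N + (x_i - x̄)² / Σ_j (x_j - x̄)². The data below are made up for illustration.

```python
def leverages(x):
    """Diagonal elements p_ii of the projection matrix for y = a + b*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical regressor values, N = 5
p = leverages(x)
avg_p = sum(p) / len(p)          # should equal K/N = 2/5

print(p)
print(avg_p)
```

The observation with x = 10 lies far from the mean and has the largest p_ii, so its residual has the smallest variance σ²(1 - p_ii).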
Which model would you choose? Discuss first whether it is possible to compare all three models by the given AIC values.
The three models cannot directly be compared:
Of the first two, I’d choose the first, which has the lower AIC.