HW 1 Flashcards by Olivia Kelnreiter

What is the average wage gap between blacks and nonblacks in this model?

The UNCONDITIONAL log wage gap is -0.318. Converting to a percentage:

(e^-0.318 - 1) * 100% ≈ -27.2%

i.e. blacks earn on average about 27.2% lower wages than nonblacks.

What is the average wage gap between blacks and nonblacks in this model?

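The CONDITIONAL log wage gap (holding the other regressors fixed) is -0.178. Converting to a percentage:

(e^-0.178 - 1) * 100% ≈ -16.3%

i.e. blacks earn about 16.3% lower wages than otherwise comparable nonblacks.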
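
A small Python sketch of the conversion used on both wage-gap cards: a dummy coefficient b in a log-wage equation implies an exact percentage gap of (e^b - 1) * 100%. The function name `log_gap_to_percent` is illustrative and not part of the homework.

```python
import math


def log_gap_to_percent(log_gap: float) -> float:
    """Exact percentage gap implied by a dummy coefficient in a log-wage model."""
    return (math.exp(log_gap) - 1.0) * 100.0


# Log wage gaps from the two cards (unconditional and conditional model)
for label, gap in [("unconditional", -0.318), ("conditional", -0.178)]:
    approx = gap * 100.0  # first-order approximation: 100 * b
    exact = log_gap_to_percent(gap)
    print(f"{label}: approx {approx:.1f}%, exact {exact:.1f}%")
```

For small coefficients the shortcut 100 * b is close, but at -0.318 it overstates the gap (-31.8% vs. -27.2%).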

Give an economic explanation as to why the estimated gaps differ between the models. What do you conclude concerning underspecification and possible endogeneity problems?

Part of the unconditional difference is driven by the fact that wages also depend on education, experience, and region. The gap is smaller in the second model because blacks on average have less education and experience and are more likely to live in the South, and these characteristics are now controlled for.

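The first model is clearly underspecified, as important predictors of the wage were omitted. Since these omitted variables are correlated with “black”, they end up in the error term, which is then correlated with “black”: “black” is endogenous in model 1, and its coefficient suffers from omitted variable bias.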
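
The shrinking gap can be illustrated with the in-sample omitted variable formula: the short-regression slope on "black" equals the long-regression slope plus (coefficient of the omitted variable) times (slope of the omitted variable on "black"). The sketch below checks this identity on hypothetical toy data (not the homework data), with education as the only omitted variable; all names are illustrative.

```python
def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def short_slope(y, x):
    """Slope of y on x (with intercept), as in the underspecified model."""
    xc, yc = demean(x), demean(y)
    return dot(xc, yc) / dot(xc, xc)


def long_slopes(y, x, w):
    """Slopes of y on x and w (with intercept), solving the 2x2 normal equations on demeaned data."""
    xc, wc, yc = demean(x), demean(w), demean(y)
    sxx, sww, sxw = dot(xc, xc), dot(wc, wc), dot(xc, wc)
    sxy, swy = dot(xc, yc), dot(wc, yc)
    det = sxx * sww - sxw ** 2
    return (sww * sxy - sxw * swy) / det, (sxx * swy - sxw * sxy) / det


# Hypothetical toy data: dummy "black", years of education, log wage
black = [1, 1, 1, 0, 0, 0, 0, 0]
educ = [10, 12, 11, 14, 12, 16, 13, 15]
logwage = [2.0, 2.3, 2.1, 2.6, 2.4, 2.9, 2.5, 2.8]

b_short = short_slope(logwage, black)
b_black, b_educ = long_slopes(logwage, black, educ)
delta = short_slope(educ, black)  # slope of the omitted variable on "black"

print(f"short: {b_short:.4f}  long: {b_black:.4f}  long + b_educ * delta: {b_black + b_educ * delta:.4f}")
```

In this toy data education is lower in the black group and raises wages, so the short regression attributes part of the education effect to "black" and the gap looks larger than in the long regression.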

Model 3 additionally includes interaction terms between black and experience and experience squared to analyze whether the racial wage gap depends on experience. Calculate the average wage gap between blacks and non-blacks for workers with x years of experience.

Difference calculation (only the terms multiplied by “black” differ between the two groups, so only these enter):

gap(exper) = 0.15989 - 0.0541 * exper + 0.00164 * exper²

In percent: (e^gap(exper) - 1) * 100%
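
A minimal Python sketch that evaluates this difference for several experience levels, assuming the three numbers on the card are the model-3 coefficients on black, black × exper and black × exper². Because the coefficient on exper² is positive, the gap is a parabola whose vertex marks the experience level with the largest disadvantage. Statistical significance is not assessed here.

```python
import math

# Coefficients on black, black*exper and black*exper^2 as written on the card
B_BLACK = 0.15989
B_BLACK_EXPER = -0.0541
B_BLACK_EXPER2 = 0.00164


def log_wage_gap(exper: float) -> float:
    """Black minus nonblack log-wage difference at a given experience level."""
    return B_BLACK + B_BLACK_EXPER * exper + B_BLACK_EXPER2 * exper ** 2


def percent_gap(exper: float) -> float:
    """Same gap expressed as an exact percentage difference."""
    return (math.exp(log_wage_gap(exper)) - 1.0) * 100.0


for exper in (0, 5, 10, 20, 30):
    print(f"exper={exper:2d}: log gap {log_wage_gap(exper):+.4f}, {percent_gap(exper):+.1f}%")

# Vertex of the parabola: experience at which the gap is most negative
exper_star = -B_BLACK_EXPER / (2 * B_BLACK_EXPER2)
print(f"largest gap at about {exper_star:.1f} years of experience")
```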

Explain the idea of AIC and the role of the two components of the AIC formula.

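Idea: bias-variance tradeoff; more parameters reduce bias but increase the variance of the estimates.
AIC(K) = ln(RSS(K)/N) + 2K/N
ln(RSS(K)/N): As K increases, RSS decreases -> less bias. This term rewards the quality of the approximation (goodness of fit).
2K/N: Increases with K, reflecting the higher estimation variance. This term “punishes” simply adding additional parameters.
The preferred model is the one with the lowest AIC.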
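
A minimal Python sketch of the two AIC components, assuming the form AIC(K) = ln(RSS(K)/N) + 2K/N (other texts use N·ln(RSS/N) + 2K or -2 ln L + 2K, which rank Gaussian linear models on the same sample identically). The RSS values are hypothetical.

```python
import math


def aic_components(rss: float, n: int, k: int):
    """Return (fit term, penalty term, AIC) for AIC(K) = ln(RSS(K)/N) + 2K/N."""
    fit = math.log(rss / n)  # falls as K grows (better approximation, less bias)
    penalty = 2 * k / n      # rises with K (more parameters, more estimation variance)
    return fit, penalty, fit + penalty


# Hypothetical nested models on N = 50 observations: RSS shrinks as K grows
n = 50
for k, rss in [(2, 120.0), (3, 100.0), (10, 99.5)]:
    fit, penalty, total = aic_components(rss, n, k)
    print(f"K={k:2d}: fit {fit:.3f} + penalty {penalty:.3f} = AIC {total:.3f}")
```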

What are the advantages of the AIC compared to R2 for the purpose of model selection?

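R² never decreases when a regressor is added (RSS can only fall), so it only rewards fit, i.e. low bias, and among nested models always suggests the one with the largest number of parameters. The AIC is better suited because it weighs bias and variance against each other through the 2K/N penalty.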
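
To make the contrast concrete, here is a small Python sketch that picks a model by R² and by AIC among hypothetical nested models fitted to the same sample, assuming AIC(K) = ln(RSS(K)/N) + 2K/N. The RSS and TSS values are invented for illustration.

```python
import math

# Hypothetical nested models fitted to the same N = 50 observations (TSS = 200)
N, TSS = 50, 200.0
models = {2: 120.0, 3: 100.0, 10: 99.5}  # K -> RSS(K); RSS can only fall as K grows


def r_squared(rss: float) -> float:
    return 1.0 - rss / TSS


def aic(rss: float, k: int) -> float:
    return math.log(rss / N) + 2 * k / N


best_by_r2 = max(models, key=lambda k: r_squared(models[k]))
best_by_aic = min(models, key=lambda k: aic(models[k], k))
print(f"R^2 picks K={best_by_r2}, AIC picks K={best_by_aic}")
```

R² favors the largest model even though its seven extra parameters only lower RSS from 100 to 99.5; the AIC penalty outweighs that tiny gain in fit and selects K = 3.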

Based on the results, which model do you prefer?

Model 3 performs best (lowest AIC) and is therefore preferable.

Show theoretically that OLS regression residuals are in general not homoskedastic, even if the error terms are homoskedastic.

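Write the residuals in matrix form, using the projection matrix $P = X(X'X)^{-1}X'$ and the residual maker $M = I_N - P$:
$$e = Y - X\hat\beta = Y - X(X'X)^{-1}X'Y = (I_N - P)Y = MY$$
Compute their variance (homo- or heteroskedastic?), using $\operatorname{Var}(Y \mid X) = \operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_N$:
$$\operatorname{Var}(e \mid X) = M \operatorname{Var}(Y \mid X) M' = M \sigma^2 I_N M' = \sigma^2 MM' = \sigma^2 (I_N - P)$$
since $M$ is symmetric and idempotent, so that $MM' = M$.
Element by element, $\operatorname{Var}(e_i \mid X) = \sigma^2 (1 - p_{ii})$, where $p_{ii}$ is the $i$-th diagonal element of $P$. Since the $p_{ii}$ generally differ across observations, the residuals are generally not homoskedastic, even though the errors are.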
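
A numeric illustration in Python, assuming the simplest design X = [1, x] (intercept plus one regressor), for which the diagonal of P has the known closed form p_ii = 1/N + (x_i - x̄)² / Σ(x_j - x̄)². The regressor values and σ² = 1 are hypothetical.

```python
def leverages(x):
    """Diagonal elements p_ii of P = X(X'X)^{-1}X' when X = [1, x] (intercept plus one regressor)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


sigma2 = 1.0                     # homoskedastic error variance (hypothetical)
x = [1.0, 2.0, 3.0, 4.0, 10.0]   # hypothetical regressor with one outlying value
p = leverages(x)
var_e = [sigma2 * (1.0 - pii) for pii in p]  # Var(e_i | X) = sigma^2 (1 - p_ii)

for xi, pii, v in zip(x, p, var_e):
    print(f"x={xi:5.1f}  p_ii={pii:.2f}  Var(e_i)={v:.2f}")
print(f"trace(P) = {sum(p):.2f}")  # equals K = 2
```

Even with identical error variances, the residual at the outlying x = 10 has a variance of only 0.08σ², versus 0.80σ² at x = 4, because high-leverage points pull the fitted line toward themselves.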

Explain how you would use residual analysis to check for heteroskedasticity.

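Transform the residuals into standardized or studentized residuals (so that they have equal variance under homoskedasticity) and plot them against the fitted values. A spread that changes systematically with the fitted values (e.g. a funnel shape) indicates heteroskedasticity.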

How can the OLS regression residuals be transformed in an appropriate way to detect heteroskedasticity graphically?

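We know that OLS residuals have $\operatorname{Var}(e_i) = \sigma^2 (1 - p_{ii})$.
Standardized residuals are defined by:
$$\tilde e_i = \frac{e_i}{\sigma \sqrt{1 - p_{ii}}}$$
Therefore the variance of the standardized residuals is:
$$\operatorname{Var}(\tilde e_i) = \frac{\operatorname{Var}(e_i)}{\sigma^2 (1 - p_{ii})} = \frac{\sigma^2 (1 - p_{ii})}{\sigma^2 (1 - p_{ii})} = 1$$
So, unlike the raw residuals, the standardized residuals have constant variance under homoskedastic errors, and a remaining pattern in their spread points to heteroskedasticity. In practice the unknown $\sigma$ is replaced by its estimate $s$.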
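
A minimal Python sketch of this transformation for a simple regression with intercept, replacing the unknown σ by s = sqrt(RSS/(N - K)) (some texts call this version "internally studentized"). The data are hypothetical; the resulting standardized residuals are what one would plot against the fitted values.

```python
import math


def standardized_residuals(x, y):
    """OLS of y on [1, x]; returns fitted values, residuals and e_i / (s * sqrt(1 - p_ii))."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s2 = sum(e * e for e in resid) / (n - 2)               # s^2 = RSS / (N - K), K = 2
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]   # p_ii for this design
    std = [e / math.sqrt(s2 * (1.0 - p)) for e, p in zip(resid, lev)]
    return fitted, resid, std


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]   # hypothetical data
fitted, resid, std = standardized_residuals(x, y)
for f, e, r in zip(fitted, resid, std):
    print(f"fitted={f:5.2f}  e={e:+.3f}  standardized={r:+.3f}")
```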

Explain how the White test for heteroskedasticity is implemented.

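H₀: E[ε_i² | x_i] = σ² = constant
H₁: E[ε_i² | x_i] ≠ constant

Regress the squared OLS residuals e_i² on an intercept, all regressors of the linear model, their squares and their cross-products (interaction terms); the intercept captures the constant variance. Under H₀, the other regressors should be jointly insignificant. The test statistic is N·R² of this auxiliary regression, which under H₀ is asymptotically χ²-distributed with as many degrees of freedom as there are auxiliary regressors (excluding the intercept).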
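
A standard-library Python sketch of the final step of the White test: turning N·R² of the auxiliary regression into an asymptotic χ² p-value. The χ² survival function uses the closed-form recurrence for integer degrees of freedom; the inputs (N = 500, auxiliary R² = 0.03, 9 auxiliary regressors) are hypothetical, not the homework output.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df), integer df >= 1, via the closed-form recurrence."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def white_test(n: int, r2_aux: float, df: int):
    """White statistic N * R^2 of the auxiliary regression and its asymptotic chi2(df) p-value."""
    stat = n * r2_aux
    return stat, chi2_sf(stat, df)


# Hypothetical inputs: N = 500, auxiliary R^2 = 0.03, 9 auxiliary regressors
stat, p_value = white_test(500, 0.03, 9)
print(f"N*R^2 = {stat:.2f}, p-value = {p_value:.3f}")
```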

The results of the White test for model 3 are given in the output. How do you interpret the result of the test? What do you conclude?

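p-value = 0.093

At the 5% significance level, we fail to reject H₀ (homoskedasticity).

At the 10% significance level, we reject H₀ in favor of heteroskedasticity.

Conclusion: there is only weak evidence of heteroskedasticity in model 3.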

What is the interpretation of the coefficient on cigpacks?

Each additional pack of cigarettes decreases birth weight by 11.5 ounces on average, holding the other variables fixed. The effect is statistically significant at the 1% level.

Under which condition is the OLS estimator of the effect of cigpacks unbiased? Under which condition is it consistent? Do you think these conditions are fulfilled in the given example?

Condition for unbiasedness: E(ε|X)=0
Condition for consistency: E(X’ε) = 0 or Cov(ε, X)=0

Conditions are likely not fulfilled. There could be some unobserved characteristics of the mother (e.g. health, personality traits etc.) that affect both smoking during pregnancy and the child’s weight.
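
A stylized Python simulation of this concern, assuming a hypothetical linear data-generating process in which an unobserved "health" variable lowers smoking and raises birth weight. Because health is omitted, it sits in the error term, E(x_i ε_i) ≠ 0, and OLS converges to β + γ·Cov(smoke, health)/Var(smoke) instead of β. All parameter values are assumptions.

```python
import random


def ols_slope(x, y):
    """Slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical DGP parameters: smoking effect, health effect, health -> smoking
BETA, GAMMA, DELTA = -1.0, 2.0, -0.5
rng = random.Random(42)
N = 100_000
health = [rng.gauss(0, 1) for _ in range(N)]                   # unobserved by the econometrician
smoke = [DELTA * h + rng.gauss(0, 1) for h in health]          # healthier mothers smoke less
bwght = [BETA * s + GAMMA * h + rng.gauss(0, 1) for s, h in zip(smoke, health)]

b_ols = ols_slope(smoke, bwght)                 # health omitted: it is part of the error term
plim = BETA + GAMMA * DELTA / (DELTA ** 2 + 1)  # Cov(smoke, health) / Var(smoke) = DELTA / (DELTA^2 + 1)
print(f"true effect {BETA}, OLS estimate {b_ols:.3f}, probability limit {plim:.3f}")
```

Even with 100,000 observations the OLS estimate stays near -1.8 rather than the true -1.0: a larger sample does not remove the bias, which is exactly what inconsistency means.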

Which two conditions have to be fulfilled for an instrument to be valid? Can these conditions be tested, and if yes, how?

RELEVANCE of the instrument: Cov(X, Z) ≠ 0
-> Can be tested: regress X on Z and all other exogenous variables (first stage) and check whether the coefficient on Z is significant.
EXOGENEITY of the instrument: E(z_i ε_i) = 0
-> Cannot be tested (at least not with a single instrument), because the error term ε is unobserved.
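
A minimal Python sketch of the relevance check for a single instrument, simplified to a first stage with no other exogenous regressors (with controls one would use the partial F-test on Z instead). With one instrument, the first-stage F equals the squared t-statistic; a common rule of thumb flags F < 10 as a weak instrument. The data are hypothetical.

```python
import math


def first_stage(z, x):
    """Regress x on [1, z]; return slope, its t-statistic and F = t^2 (single instrument)."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    pi_hat = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / szz
    resid = [xi - mx - pi_hat * (zi - mz) for zi, xi in zip(z, x)]
    s2 = sum(r * r for r in resid) / (n - 2)
    t_stat = pi_hat / math.sqrt(s2 / szz)
    return pi_hat, t_stat, t_stat ** 2


# Hypothetical instrument z and endogenous regressor x
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.1, 1.9, 3.2, 3.9, 5.0]
pi_hat, t_stat, f_stat = first_stage(z, x)
print(f"first-stage slope {pi_hat:.3f}, t = {t_stat:.2f}, F = {f_stat:.1f}")
print("weak instrument (F < 10)?", f_stat < 10)
```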

In the present example, we use the variable cigprice as an instrument for cigpacks. Interpret the results below. Do you think cigprice is a suitable instrument in this context? Can you identify problems in the IV estimation of this example?

The instrument is not statistically significant in the first stage (p-value 0.317), i.e. there is no evidence that the cigarette price affects cigarette consumption.

–> The “relevance” condition for the instrument is not fulfilled (weak instrument), and the instrument should not be used!

One additional pack increases (!) the birth weight by 70.8 ounces, ceteris paribus.
The effect is not statistically significant.
The result is implausible: positive sign and very large magnitude. The IV estimate is highly suspect given the weak instrument in the first stage.

–> This example shows that weak instruments can produce extreme and unreliable estimates.
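
A small Monte Carlo sketch in Python of why weak instruments produce extreme estimates. It uses the simple IV estimator Cov(z, y)/Cov(z, x) under a hypothetical data-generating process in which an unobserved confounder u makes x endogenous, and compares a strong first stage (pi = 1.0) with a weak one (pi = 0.02). All parameter values are assumptions.

```python
import random
import statistics


def iv_slope(z, x, y):
    """Simple IV estimator with one instrument and an intercept: Cov(z, y) / Cov(z, x)."""
    mz, mx, my = statistics.fmean(z), statistics.fmean(x), statistics.fmean(y)
    czy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    czx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    return czy / czx


def simulate_iv(pi, n=200, reps=500, beta=-1.0, seed=0):
    """IV estimates across simulated samples; pi is the first-stage effect of z on x (hypothetical DGP)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        u = [rng.gauss(0, 1) for _ in range(n)]  # confounder: makes x endogenous
        x = [pi * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
        y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
        estimates.append(iv_slope(z, x, y))
    return estimates


def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


strong = simulate_iv(pi=1.0)
weak = simulate_iv(pi=0.02)
print(f"strong instrument: median {statistics.median(strong):+.2f}, IQR {iqr(strong):.2f}")
print(f"weak instrument:   median {statistics.median(weak):+.2f}, IQR {iqr(weak):.2f}")
```

With the strong instrument the estimates cluster tightly around the true value of -1; with the weak one the denominator Cov(z, x) is close to zero, so the estimates scatter widely and individual samples can show implausibly large values of either sign.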

What is the purpose of the Hausman test in the case of IV estimation? In the present example, the Hausman test gives a p-value of 0.278. Interpret this result. What are the limitations of the Hausman test in the present example?

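The Wu-Hausman test checks whether the potentially endogenous regressor is actually exogenous, by comparing the OLS and IV estimates:

H₀: cigpacks is exogenous, H₁: cigpacks is endogenous
p-value of 0.278 –> We fail to reject H₀ of exogeneity, i.e. there is no evidence of an endogeneity problem. Under H₀, OLS is consistent and more efficient than IV, so OLS would be preferred.

LIMITATIONS:

Assumes that the instrument is valid; here cigprice is a weak instrument, so this assumption is not met.
Low power: the test may fail to reject H₀ when the IV standard errors are large, which is exactly what a weak instrument produces.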
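
A Python sketch of the Hausman contrast for a single coefficient, H = (b_IV - b_OLS)² / (Var(b_IV) - Var(b_OLS)), which is χ²(1) under H₀. This is the textbook contrast form; the regression-based Wu version reported by software can give somewhat different numbers. The estimates below are hypothetical and only illustrate the power problem: the same OLS-IV difference becomes insignificant once the IV standard error is large.

```python
import math


def hausman_single(b_iv, se_iv, b_ols, se_ols):
    """Hausman contrast for one coefficient: H = (b_IV - b_OLS)^2 / (Var_IV - Var_OLS), chi2(1) under H0."""
    var_diff = se_iv ** 2 - se_ols ** 2
    if var_diff <= 0:
        raise ValueError("requires Var(b_IV) > Var(b_OLS)")
    h = (b_iv - b_ols) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(h / 2))  # survival function of chi2(1)
    return h, p_value


# Hypothetical numbers: same OLS-IV difference, precise vs. imprecise IV estimate
for se_iv in (0.3, 3.0):
    h, p = hausman_single(b_iv=0.5, se_iv=se_iv, b_ols=0.1, se_ols=0.1)
    print(f"se_IV={se_iv}: H={h:.3f}, p-value={p:.3f}")
```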

Assume that the data are generated according to the first model below, but we estimate the coefficient of x on the basis of the second model. What are the necessary steps to show the bias of the OLS estimator ^β?

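Get rid of the intercept: define the projection matrix P₀ = X(X'X)^-1X' with X = 1_N, i.e. P₀ = 1_N(1_N'1_N)^-1 1_N' = (1/N) 1_N 1_N'
Transform the model by premultiplying each component with the residual maker M₀ = I_N - P₀, which subtracts the sample mean (demeaning) and thereby eliminates the intercept
Calculate the bias: Bias(β̂) = E[β̂] - β₀
- Use that E[β̂] = E[(X'X)^-1X'Y], with X and Y the transformed (demeaned) data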
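
A small Python sketch of the transformation step, on hypothetical data: it builds M₀ = I_N - (1/N)·1_N 1_N' explicitly, shows that premultiplying by M₀ demeans a vector, checks that M₀ is idempotent, and computes the no-intercept OLS slope on the demeaned data, which equals the slope of the model with intercept.

```python
def residual_maker_ones(n):
    """M0 = I_N - 1_N (1_N' 1_N)^{-1} 1_N' = I_N - (1/N) * ones matrix."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(aik * bkj for aik, bkj in zip(row, col)) for col in cols] for row in a]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical data
y = [3.0, 5.0, 4.0, 8.0]
x = [1.0, 2.0, 2.0, 4.0]
M0 = residual_maker_ones(len(y))
y_dm, x_dm = matvec(M0, y), matvec(M0, x)  # premultiplying by M0 subtracts the sample mean

# OLS without intercept on the transformed model equals the slope of the model with intercept
beta_hat = dot(x_dm, y_dm) / dot(x_dm, x_dm)
print("M0 y =", y_dm)
print("beta_hat =", round(beta_hat, 4))
```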