What is panel data?
Panel data consist of observations on N cross-sectional units (e.g., individuals, firms), each observed over T time periods. They combine the cross-sectional and time-series dimensions.
What are the main advantages of using panel data?
1) Controls for omitted variables that are time-invariant; 2) Increases precision through more observations; 3) Allows study of dynamic behavior; 4) Identifies effects not detectable in pure cross-sections or time series.
Write the general static panel data model and explain its components.
Y_it = α + X_itβ + η_i + u_it, where η_i captures unobserved, time-invariant individual effects and u_it is the idiosyncratic error term varying over i and t.
What problem arises if cov(X_it, η_i) ≠ 0 in the pooled OLS model?
The pooled OLS error term is the composite η_i + u_it, so the zero conditional mean assumption E[η_i + u_it | X_it] = 0 is violated. OLS is then biased and inconsistent because η_i acts as an omitted variable.
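A small numeric illustration of this bias, as a sketch in plain Python with a single regressor. The data are made up for the example (β = 2, no idiosyncratic noise, and η_i rising with the level of X_it), and the helper name ols_slope is hypothetical.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Illustrative data: Y_it = 2 * X_it + eta_i, with u_it = 0.
# Individuals with larger X also have larger eta_i, so cov(X_it, eta_i) > 0.
BETA = 2.0
eta = [0.0, 10.0, 20.0]
x_panel = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y_panel = [[BETA * x + e for x in xs] for xs, e in zip(x_panel, eta)]

# Pooled OLS ignores the panel structure and stacks all observations.
x_pooled = [x for xs in x_panel for x in xs]
y_pooled = [y for ys in y_panel for y in ys]
pooled_slope = ols_slope(x_pooled, y_pooled)

print(f"true beta = {BETA}, pooled OLS slope = {pooled_slope:.3f}")
```

The pooled slope lands far above 2 because it attributes the differences in η_i across individuals to X_it.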
What is the identifying assumption in static panel data models?
Strict exogeneity: E[u_it | X_i1, …, X_iT, η_i] = 0. This implies regressors are uncorrelated with past, current, and future idiosyncratic errors.
What happens if the strict exogeneity assumption is violated?
The within (FE) and first-difference estimators become inconsistent; results cannot be given a causal interpretation.
Explain the within (fixed effects) estimator using the within-transformation.
The within transformation subtracts individual means from all variables: Y_it − Ȳ_i = (X_it − X̄_i)β + (u_it − ū_i), so α and η_i drop out. FE estimator: β̂_FE = (X̃'X̃)⁻¹ X̃'Ỹ, where X̃ and Ỹ stack the demeaned observations X_it − X̄_i and Y_it − Ȳ_i.
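The within transformation can be sketched in plain Python for the single-regressor case, where (X̃'X̃)⁻¹ X̃'Ỹ reduces to a ratio of sums. The function name within_estimator and the data are assumptions made for the illustration.

```python
def within_estimator(x_panel, y_panel):
    """FE slope for one regressor: OLS on individual-demeaned data."""
    num = 0.0
    den = 0.0
    for xs, ys in zip(x_panel, y_panel):
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            num += (x - x_bar) * (y - y_bar)  # contribution to X~'Y~
            den += (x - x_bar) ** 2           # contribution to X~'X~
    return num / den


# Illustrative data: Y_it = 1.5 * X_it + eta_i, with u_it = 0.
BETA = 1.5
eta = [5.0, -3.0]
x_panel = [[1.0, 2.0, 4.0], [2.0, 5.0, 6.0]]
y_panel = [[BETA * x + e for x in xs] for xs, e in zip(x_panel, eta)]

beta_fe = within_estimator(x_panel, y_panel)
print(f"beta_FE = {beta_fe:.3f}")
```

Because demeaning removes η_i, the estimate recovers β regardless of how η_i relates to X_it in this noise-free example.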
What is the main source of variation exploited by the fixed effects estimator?
Within-individual variation over time; it uses how changes in X_it within an individual affect Y_it while holding time-invariant factors constant.
Why are time-invariant regressors not identified in a fixed effects model?
Because the within-transformation removes all time-invariant characteristics (their variation is fully captured by η_i).
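A minimal sketch of why this happens: demeaning a regressor that is constant within each individual leaves only zeros, so there is no variation left to estimate a coefficient from. The regressor z and the helper demean are hypothetical.

```python
def demean(values):
    """Subtract the individual-specific mean from each observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical time-invariant regressor (e.g. a group dummy), N = 3, T = 3.
z_panel = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

z_within = [demean(zs) for zs in z_panel]
within_variation = sum(z ** 2 for zs in z_within for z in zs)

print(z_within)          # every entry is 0.0
print(within_variation)  # 0.0, so Z~'Z~ cannot be inverted
```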
What is the formula for the first-difference (FD) estimator?
ΔY_it = ΔX_itβ + Δu_it, where ΔY_it = Y_it − Y_i,t−1; β is estimated by OLS on the differenced data. The FD estimator is consistent if E[(X_it − X_i,t−1)(u_it − u_i,t−1)] = 0, a condition that is implied by strict exogeneity.
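As a sketch, the FD estimator for one regressor is OLS without an intercept on the period-to-period changes, since α and η_i both difference out. The function name and the data below are illustrative assumptions.

```python
def first_difference_estimator(x_panel, y_panel):
    """FD slope for one regressor: OLS (no intercept) of dY on dX."""
    num = 0.0
    den = 0.0
    for xs, ys in zip(x_panel, y_panel):
        for t in range(1, len(xs)):
            dx = xs[t] - xs[t - 1]
            dy = ys[t] - ys[t - 1]
            num += dx * dy
            den += dx * dx
    return num / den


# Illustrative data: Y_it = 0.5 * X_it + eta_i, with u_it = 0.
BETA = 0.5
eta = [2.0, -1.0, 4.0]
x_panel = [[1.0, 4.0, 2.0], [3.0, 3.0, 7.0], [0.0, 2.0, 5.0]]
y_panel = [[BETA * x + e for x in xs] for xs, e in zip(x_panel, eta)]

beta_fd = first_difference_estimator(x_panel, y_panel)
print(f"beta_FD = {beta_fd:.3f}")
```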
What are the main disadvantages of fixed effects estimation?
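1) Coefficients on time-invariant regressors cannot be estimated; 2) Out-of-sample prediction for individuals not in the sample is impossible because their η_i is unknown; 3) Estimating N individual effects costs degrees of freedom; 4) Estimates may be imprecise if most of the variation in X_it is between individuals.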
Write the random effects model and its key additional assumption.
Y_it = α + X_itβ + η_i + u_it, with Cov(X_it, η_i) = 0. The RE model assumes that individual effects are uncorrelated with regressors.
What is the main difference between the FE and RE models?
FE allows correlation between X_it and η_i, while RE assumes they are uncorrelated. RE is therefore more efficient when Cov(X_it, η_i) = 0, but inconsistent otherwise.
What is the Hausman test used for in panel data analysis?
To test whether RE is consistent. H0: E[η_i|X_it]=0 (both FE and RE consistent, RE efficient). H1: E[η_i|X_it]≠0 (only FE consistent).
What is the formula for the Hausman test statistic?
HW = (β̂_FE − β̂_RE)' [Var(β̂_FE) − Var(β̂_RE)]⁻¹ (β̂_FE − β̂_RE) ~ χ²(k) asymptotically under H0, where k is the number of coefficients compared, i.e. the number of time-varying regressors.
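For a single coefficient (k = 1) the quadratic form collapses to a squared difference divided by a variance difference. The sketch below covers only that special case; the estimates and variances passed in are hypothetical numbers, not results from any data set.

```python
def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for a single coefficient (k = 1)."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        # Under H0 RE is efficient, so Var(FE) - Var(RE) should be positive.
        raise ValueError("Var(FE) - Var(RE) must be positive")
    return (b_fe - b_re) ** 2 / var_diff


CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of chi-squared with 1 d.f.

# Hypothetical inputs chosen for illustration only.
stat = hausman_scalar(b_fe=1.20, b_re=0.90, var_fe=0.04, var_re=0.01)
reject_re = stat > CHI2_1_CRIT_5PCT

print(f"Hausman statistic = {stat:.3f}, reject H0 at 5%: {reject_re}")
```

With several regressors the same calculation needs the full covariance matrices and a matrix inverse.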
How can you test for the existence of individual fixed effects?
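Use an F-test: H0: η_1 = η_2 = … = η_N (pooled model). F = ((SSR₀−SSR₁)/(N−1)) / (SSR₁/(NT−N−K)), where SSR₀ is the sum of squared residuals from the pooled (restricted) model and SSR₁ is that from the fixed effects (unrestricted) model. Reject H0 if F exceeds the F(N−1, NT−N−K) critical value.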
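The F statistic can be sketched for one regressor (K = 1) in plain Python. SSR₁ is taken from the within regression, which has the same residuals as LSDV. The helper names and the small data set (N = 3, T = 3) are assumptions for the illustration.

```python
def ssr_pooled(x_panel, y_panel):
    """SSR from pooled OLS of Y on X with a common intercept."""
    x = [v for xs in x_panel for v in xs]
    y = [v for ys in y_panel for v in ys]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    return sum(((b - y_bar) - slope * (a - x_bar)) ** 2 for a, b in zip(x, y))


def ssr_within(x_panel, y_panel):
    """SSR from the fixed effects (within) regression."""
    xd, yd = [], []
    for xs, ys in zip(x_panel, y_panel):
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        xd += [v - x_bar for v in xs]
        yd += [v - y_bar for v in ys]
    slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    return sum((b - slope * a) ** 2 for a, b in zip(xd, yd))


# Illustrative data: Y_it = X_it + eta_i + u_it with clearly different eta_i.
N, T, K = 3, 3, 1
eta = [0.0, 3.0, 6.0]
u = [0.1, -0.2, 0.1]
x_panel = [[1.0, 2.0, 3.0] for _ in range(N)]
y_panel = [[x + e + ut for x, ut in zip(xs, u)] for xs, e in zip(x_panel, eta)]

ssr0 = ssr_pooled(x_panel, y_panel)
ssr1 = ssr_within(x_panel, y_panel)
f_stat = ((ssr0 - ssr1) / (N - 1)) / (ssr1 / (N * T - N - K))

print(f"SSR0 = {ssr0:.2f}, SSR1 = {ssr1:.2f}, F = {f_stat:.1f}")
```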
How can you test the validity of strict exogeneity in a FE model?
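Estimate Y_it = βX_it + γX_i,t+1 + η_i + u_it by fixed effects and test H0: γ = 0. Under strict exogeneity the lead X_i,t+1 should have no explanatory power, so rejecting H0 indicates that strict exogeneity is violated.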
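A sketch of the estimation step behind this test, for one regressor: the within regression is run on X_it and its lead, and the 2×2 normal equations are solved directly. The function name and data are hypothetical, and the sketch stops at the point estimates; a formal test of γ = 0 would also need a standard error for γ̂, ideally cluster-robust.

```python
def fe_with_lead(x_panel, y_panel):
    """Within regression of Y_it on X_it and the lead X_i,t+1.

    Returns (beta_hat, gamma_hat). The last period of each individual
    is dropped because it has no lead.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for xs, ys in zip(x_panel, y_panel):
        cur, lead, y = xs[:-1], xs[1:], ys[:-1]
        m = len(cur)
        cur_bar, lead_bar, y_bar = sum(cur) / m, sum(lead) / m, sum(y) / m
        for c, l, v in zip(cur, lead, y):
            dc, dl, dy = c - cur_bar, l - lead_bar, v - y_bar
            a11 += dc * dc
            a12 += dc * dl
            a22 += dl * dl
            b1 += dc * dy
            b2 += dl * dy
    det = a11 * a22 - a12 * a12
    beta_hat = (b1 * a22 - b2 * a12) / det
    gamma_hat = (a11 * b2 - a12 * b1) / det
    return beta_hat, gamma_hat


# Illustrative data with T = 4 and no idiosyncratic noise.
eta = [1.0, -2.0, 0.5]
x_panel = [[1.0, 3.0, 2.0, 5.0], [2.0, 2.0, 4.0, 1.0], [0.0, 1.0, 3.0, 3.0]]

# Case 1: Y depends on current X only, so gamma should be zero.
y_no_lead = [[x + e for x in xs] for xs, e in zip(x_panel, eta)]

# Case 2: Y also loads on next period's X with coefficient 0.5.
y_lead = [
    [xs[t] + 0.5 * xs[t + 1] + e for t in range(len(xs) - 1)] + [xs[-1] + e]
    for xs, e in zip(x_panel, eta)
]

beta0, gamma0 = fe_with_lead(x_panel, y_no_lead)
beta1, gamma1 = fe_with_lead(x_panel, y_lead)
print(f"no lead effect: gamma = {gamma0:.3f}; lead effect: gamma = {gamma1:.3f}")
```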
Why is the RE estimator more efficient than FE under the null hypothesis of the Hausman test?
Because RE exploits both within- and between-individual variation, whereas FE only uses within variation.
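One way to see this is the standard quasi-demeaning form of the RE (GLS) estimator for a balanced panel, written here in LaTeX. The variance symbols σ_u² and σ_η² (the variances of u_it and η_i) and the weight θ do not appear in the cards above and are introduced only for this sketch.

```latex
\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\,\sigma_\eta^2}},
\qquad
Y_{it} - \theta\bar{Y}_i
  = (1-\theta)\alpha + (X_{it} - \theta\bar{X}_i)\beta
  + \bigl[(1-\theta)\eta_i + (u_{it} - \theta\bar{u}_i)\bigr]
```

With θ = 1 this is the within transformation (FE), and with θ = 0 it is pooled OLS. For 0 < θ < 1 some between-individual variation is retained, which is where the efficiency gain comes from.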
In what cases are the FE, LSDV, and FD estimators identical?
FE and LSDV always yield identical estimates of β. The FD estimator coincides with them when T = 2 (two time periods).
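The T = 2 equivalence of the within and FD estimators can be checked numerically. The sketch below uses one regressor and arbitrary made-up data; with two periods each demeaned value is just plus or minus half the first difference, so the two ratios coincide.

```python
def within_slope(x_panel, y_panel):
    """FE slope: OLS on individual-demeaned data (one regressor)."""
    num = den = 0.0
    for xs, ys in zip(x_panel, y_panel):
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        den += sum((x - x_bar) ** 2 for x in xs)
    return num / den


def fd_slope(x_panel, y_panel):
    """FD slope: OLS without intercept on first differences."""
    num = den = 0.0
    for xs, ys in zip(x_panel, y_panel):
        for t in range(1, len(xs)):
            dx, dy = xs[t] - xs[t - 1], ys[t] - ys[t - 1]
            num += dx * dy
            den += dx * dx
    return num / den


# Arbitrary illustrative data with T = 2 (no particular model imposed).
x_panel = [[1.0, 3.0], [2.0, 2.5], [0.0, 4.0]]
y_panel = [[2.0, 5.0], [1.0, 3.0], [4.0, 3.0]]

beta_fe = within_slope(x_panel, y_panel)
beta_fd = fd_slope(x_panel, y_panel)
print(f"FE = {beta_fe:.6f}, FD = {beta_fd:.6f}")
```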
What type of standard errors should be used in FE estimation and why?
Cluster-robust standard errors, clustered at the individual level, since they are robust to heteroskedasticity and to arbitrary serial correlation within individuals.
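A minimal sketch of a standard error clustered by individual for the single-regressor FE estimator. It uses the sandwich form Σ_i (Σ_t x̃_it û_it)² / (Σ_i Σ_t x̃_it²)² with no small-sample correction, which is an assumption of this sketch; software packages typically apply a finite-sample adjustment. The function name and data are hypothetical.

```python
import math


def fe_cluster_se(x_panel, y_panel):
    """FE slope and its cluster-robust SE (clusters = individuals)."""
    xd_panel, yd_panel = [], []
    for xs, ys in zip(x_panel, y_panel):
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        xd_panel.append([x - x_bar for x in xs])
        yd_panel.append([y - y_bar for y in ys])

    sxx = sum(x * x for xd in xd_panel for x in xd)
    sxy = sum(x * y for xd, yd in zip(xd_panel, yd_panel) for x, y in zip(xd, yd))
    beta = sxy / sxx

    # Sum the score x~ * residual within each individual before squaring,
    # which allows arbitrary correlation of errors over time within i.
    meat = 0.0
    for xd, yd in zip(xd_panel, yd_panel):
        score = sum(x * (y - beta * x) for x, y in zip(xd, yd))
        meat += score ** 2

    return beta, math.sqrt(meat) / sxx


# Illustrative data: N = 3 individuals, T = 2 periods.
x_panel = [[0.0, 2.0], [0.0, 2.0], [1.0, 3.0]]
y_panel = [[0.0, 2.0], [0.0, 4.0], [1.0, 1.0]]

beta_fe, se_cluster = fe_cluster_se(x_panel, y_panel)
print(f"beta_FE = {beta_fe:.3f}, clustered SE = {se_cluster:.4f}")
```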