What is panel data?
Panel (or longitudinal) data combine time-series and cross-sectional data: they contain observations on the same variables, for the same cross-sectional sample, in two or more time periods.
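As a minimal sketch of that definition, a panel can be stored in "long" format, with one row per entity and time period. The entity labels and values below are made up purely for illustration, and the names panel, periods_by_entity and is_panel are hypothetical.

```python
# Hypothetical long-format panel: one row per (entity, period) pair.
# Labels and values are invented for illustration only.
panel = [
    ("firm_A", 2019, 1.2),
    ("firm_A", 2020, 1.5),
    ("firm_B", 2019, 0.9),
    ("firm_B", 2020, 1.1),
]

# Collect the periods observed for each cross-sectional entity.
periods_by_entity = {}
for entity, year, value in panel:
    periods_by_entity.setdefault(entity, []).append(year)

# The data qualify as a panel when every entity is observed
# in two or more different time periods.
is_panel = all(len(set(years)) >= 2 for years in periods_by_entity.values())

print(periods_by_entity)
print(is_panel)
```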
What are the different panel types?
What are the 4 different kinds of variables that we encounter when using panel data?
Why use panel data?
What are the types of panel regressions?
What are the advantages and disadvantages of Pooled OLS?
Advantage: a large sample size, leading to precise estimators and test statistics with more power.
Disadvantage: does not capture different trends that might be present in the data in different time periods. This can be mitigated by including dummy variables for the different years, but then you are effectively running a (time) fixed effects model.
What is important to do when creating a Pooled OLS model?
You need to check for and fix where necessary:
1. (Multi)collinearity
2. Heteroscedasticity of residuals
3. Normality of residuals
4. Autocorrelation of residuals
Advantage of FE models over POLS
Avoids bias due to omitted variables that do not change over time (like geography) or, if you include time fixed effects, that change over time equally for all entities (like the speed limit). It does so by including the dummy variables outlined earlier, which allow each entity’s intercept and each time period’s intercept to vary around the baseline of the omitted category (the case when all the fixed effect dummies equal zero).
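The following sketch illustrates the point with made-up numbers. It assumes a single regressor and uses the "within" transformation (subtracting each entity's mean), which gives the same slope as including one dummy variable per entity. The data, the helper names slope and demean, and the true slope of 2 are all assumptions chosen for illustration.

```python
def slope(xs, ys):
    """OLS slope of ys on xs in a regression with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def demean(values):
    """Subtract the mean of a list from each of its elements."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Invented data: y = alpha_i + 2 * x, where alpha_i is an unobserved,
# time-invariant entity effect that is correlated with x.
x_by_entity = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}
alpha = {"A": 0.0, "B": 10.0}
y_by_entity = {e: [alpha[e] + 2.0 * x for x in xs]
               for e, xs in x_by_entity.items()}

# Pooled OLS ignores alpha_i, so its slope absorbs the omitted effect.
pooled_x = [x for xs in x_by_entity.values() for x in xs]
pooled_y = [y for ys in y_by_entity.values() for y in ys]
pooled_slope = slope(pooled_x, pooled_y)

# Fixed effects (within estimator): demean x and y inside each entity,
# which removes anything that is constant over time for that entity.
within_x = [d for xs in x_by_entity.values() for d in demean(xs)]
within_y = [d for ys in y_by_entity.values() for d in demean(ys)]
within_slope = slope(within_x, within_y)

print(pooled_slope)   # biased away from the true slope of 2
print(within_slope)   # recovers the true slope
```

In this toy data set the within estimator recovers the slope exactly because there is no noise; with real data it would only remove the bias from the time-invariant omitted variable, not the sampling error.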
Disadvantages of FE model
Advantages and disadvantages of the RE model
Advantages: more degrees of freedom than FE; you can estimate coefficients for explanatory variables that are constant over time.
Disadvantage: to avoid omitted variable bias, we must assume that the unobserved impact of the omitted variables is uncorrelated with the independent variables (𝑋𝑠).
How to select the best panel regression model?
By running some tests:
1. Joint significance of differing group means – tests whether pooled OLS is adequate compared with the fixed effects model; if 𝐻_0 is rejected, fixed effects is better.
2. Breusch-Pagan test – tests whether pooled OLS is adequate compared with the random effects model; if 𝐻_0 is rejected, random effects is better.
3. Hausman test – tests whether random effects is adequate compared with fixed effects; if 𝐻_0 is rejected, fixed effects is better.
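One possible way to combine the three tests into a single decision rule is sketched below. The function name choose_panel_model, its arguments and the 5% significance level are assumptions made for illustration; the p-values themselves would come from your econometrics software.

```python
def choose_panel_model(p_group_means, p_breusch_pagan, p_hausman, alpha=0.05):
    """Pick a panel model from the p-values of the three tests.

    p_group_means   -- joint significance of differing group means
    p_breusch_pagan -- Breusch-Pagan test (pooled OLS vs. random effects)
    p_hausman       -- Hausman test (random effects vs. fixed effects)
    alpha           -- significance level (assumed, not from the notes)
    """
    fe_beats_pooled = p_group_means < alpha
    re_beats_pooled = p_breusch_pagan < alpha

    if not fe_beats_pooled and not re_beats_pooled:
        return "pooled OLS"
    if fe_beats_pooled and not re_beats_pooled:
        return "fixed effects"
    if re_beats_pooled and not fe_beats_pooled:
        return "random effects"
    # Both panel models beat pooled OLS: let the Hausman test decide.
    return "fixed effects" if p_hausman < alpha else "random effects"

print(choose_panel_model(0.01, 0.01, 0.30))
```

The rule only applies the Hausman test when both fixed and random effects are preferred to pooled OLS, which is one reasonable reading of the three tests rather than the only possible one.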
Why is stationarity not important for panel data?
However, there is also a lot of evidence that non-stationarity can create serious problems when the time dimension is relatively long, so it is important to test for it and fix it.
How can non-stationarity be fixed in panel data?
Levin-Lin-Chu test – essentially a modification of the ADF test that checks the stationarity of the time series within each cross-sectional entity and aggregates the results for the whole panel.
If non-stationarity is identified, it is fixed the same way as in non-stationary time-series data: by calculating the difference between consecutive observations.
However, you have to be careful here, as differencing removes the first observation of each cross-sectional entity and can therefore heavily shrink the sample, especially when the number of time periods is small.
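A minimal sketch of differencing within each entity, and of the resulting loss of observations, is shown below. The data and the function name first_difference are made up for illustration.

```python
def first_difference(series_by_entity):
    """Difference consecutive observations separately within each entity."""
    return {
        entity: [later - earlier for earlier, later in zip(values, values[1:])]
        for entity, values in series_by_entity.items()
    }

# Invented levels for two entities observed over four periods each.
levels = {
    "A": [10.0, 12.0, 15.0, 19.0],
    "B": [5.0, 5.5, 6.5, 8.0],
}

diffs = first_difference(levels)

# Each entity loses its first observation, so with n entities
# the sample shrinks by n rows.
n_before = sum(len(v) for v in levels.values())
n_after = sum(len(v) for v in diffs.values())

print(diffs)
print(n_before, n_after)
```

Note that the differences are never taken across entities: the last observation of entity A is not subtracted from the first observation of entity B.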
Autocorrelation and heteroscedasticity of residuals
For heteroscedasticity and autocorrelation you essentially use tests similar to those used on time-series and cross-sectional data.
The only noteworthy exception is that for random effects models heteroscedasticity is not a big problem, due to how residuals are used in such models, so you do not need to check for it.
How can autocorrelation and heteroscedasticity in panel regressions be fixed?
Autocorrelation and heteroscedasticity in panel regressions can be fixed with the same approaches that you use for time-series and cross-sectional data, including appropriate robust standard errors.
With panel regressions, you have a choice between several robust standard errors, two of which are included in gretl:
1. Arellano – a standard error that takes into account both heteroscedasticity and autocorrelation when you have a large 𝑛 and a small 𝑇.
2. Panel-corrected standard errors (PCSE) – take into account heteroscedasticity and, to a smaller extent, autocorrelation. Hence, you use them when heteroscedasticity is the only issue in your model.
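To make the idea behind the Arellano standard error more concrete, the sketch below computes it for a fixed effects model with a single regressor, by clustering the residuals on the entity. It is a simplified illustration under stated assumptions: the data are invented, there are only three entities (far fewer than the large 𝑛 the estimator is meant for), and no finite-sample correction is applied, so software such as gretl may report a somewhat different value.

```python
import math

def demean(values):
    """Subtract the mean of a list from each of its elements."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Invented panel: three entities, three periods, one regressor.
x_by_entity = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 3.0], "C": [1.0, 2.0, 3.0]}
y_by_entity = {"A": [2.0, 4.0, 6.0], "B": [5.0, 6.0, 7.0], "C": [0.0, 3.0, 6.0]}

# Within transformation: remove entity means from x and y.
xd = {e: demean(v) for e, v in x_by_entity.items()}
yd = {e: demean(v) for e, v in y_by_entity.items()}

# Fixed effects (within) slope estimate.
sxx = sum(x * x for e in xd for x in xd[e])
sxy = sum(x * y for e in xd for x, y in zip(xd[e], yd[e]))
beta = sxy / sxx

# Cluster-by-entity "sandwich": sum the products x * residual inside each
# entity first, then square. This allows arbitrary heteroscedasticity
# and autocorrelation within an entity.
meat = 0.0
for e in xd:
    score = sum(x * (y - beta * x) for x, y in zip(xd[e], yd[e]))
    meat += score ** 2

arellano_se = math.sqrt(meat) / sxx

print(beta)
print(arellano_se)
```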
How can collinearity be fixed in panel data regressions?
For Pooled OLS you can use the same approach we used before, the Variance Inflation Factor (VIF), to check for collinearity.
However, for fixed and random effects models you need to use the Belsley-Kuh-Welsch collinearity diagnostics instead, which are similar to VIF but have a somewhat more complicated interpretation.
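As a reminder of how the VIF check for Pooled OLS works, the sketch below computes it for the simplest case of two regressors, where the VIF of each regressor equals 1 / (1 - r^2) and r is the correlation between the two. The data and the function names correlation and vif_two_regressors are made up for illustration; with more than two regressors, r^2 is replaced by the R^2 from regressing one regressor on all the others.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_regressors(x1, x2):
    """VIF shared by both regressors in a two-regressor model."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Invented regressors with a correlation of 0.8.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

vif = vif_two_regressors(x1, x2)
print(vif)
```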