What is model specification?
Model specification involves choosing the set of variables to include in a regression and the functional form of the regression equation.
What are 5 key principles analysts can use to minimize the potential for specification errors?
What are 4 things that could cause a regression to deviate from its functional form?
What is model misspecification?
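What’s the difference between unconditional heteroskedasticity and conditional heteroskedasticity, and which one is more problematic?
Conditional heteroskedasticity is more problematic: the error variance is correlated with the values of the independent variables, whereas under unconditional heteroskedasticity it is not.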
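What is the Breusch-Pagan test, and what is its formula?
A test for conditional heteroskedasticity (one-sided, right-tailed test; the statistic is chi-square distributed with k degrees of freedom, k = number of independent variables)
Breusch-Pagan test statistic = n * R^2
n = number of observations
R^2 = R^2 from regressing the squared residuals on the independent variables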
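A minimal sketch of the n * R^2 calculation, assuming a regression with a single independent variable so that plain Python suffices. The helper names (ols_simple, breusch_pagan) and the data are hypothetical and made up for illustration only.

```python
def ols_simple(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum(e ** 2 for e in resid)
    return a, b, resid, 1.0 - sse / sst


def breusch_pagan(x, y):
    """BP statistic = n * R^2, with R^2 taken from the auxiliary regression
    of the squared residuals on the independent variable."""
    _, _, resid, _ = ols_simple(x, y)
    squared_resid = [e ** 2 for e in resid]
    _, _, _, aux_r2 = ols_simple(x, squared_resid)
    return len(x) * aux_r2


# Hypothetical data: the noise around the line y = 2x grows with x.
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi + ((-1) ** i) * 0.3 * xi for i, xi in enumerate(x)]

bp_stat = breusch_pagan(x, y)
# Compare against the chi-square critical value with k = 1 degree of
# freedom (3.841 at the 5% level); reject homoskedasticity if bp_stat is larger.
print(bp_stat)
```

Because the test is right-tailed, only a large statistic counts as evidence of conditional heteroskedasticity.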
What is serial correlation (autocorrelation)? How does it affect the t-statistic and F-statistic?
What is the difference between positive serial correlation and negative serial correlation?
What is the Durbin-Watson (DW) test vs the Breusch-Godfrey (BG) test?
DW test: a method to assess the presence of serial correlation; it can only detect first-order serial correlation, positive or negative (with no serial correlation, the DW statistic should not differ significantly from 2).
BG test: more general; can be used for both positive and negative serial correlation at more than one lag (degrees of freedom for BG = n - k - p - 1, where p = number of lags).
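A short sketch of how the DW statistic is computed from regression residuals: the sum of squared changes in consecutive residuals divided by the sum of squared residuals. The residual series below are hypothetical, chosen only to show values below and above 2.

```python
def durbin_watson(residuals):
    """DW = sum of (e_t - e_{t-1})^2 divided by sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that stay on one side of zero for several periods
# (positive serial correlation): DW falls below 2.
persistent = [1.0, 0.9, 0.8, -0.8, -0.9, -1.0]

# Hypothetical residuals that flip sign every period
# (negative serial correlation): DW rises above 2.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(dw_persistent, dw_alternating)
```

In practice the statistic is compared with tabulated lower and upper critical values rather than with 2 directly.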
What are 3 effects of positive serial correlation vs 3 effects of negative serial correlation?
Positive serial correlation
- coefficient standard errors are underestimated
- t-statistics are inflated
- Type 1 errors increase
Negative serial correlation
- coefficient standard errors are overstated
- F-statistics are understated
- Type 2 errors increase
What’s the difference between Type 1 and Type 2 errors?
Type 1: false positive; a true null hypothesis is rejected (e.g., a test indicates someone has a disease when they don’t)
Type 2: false negative; a false null hypothesis is not rejected (e.g., a test indicates someone doesn’t have a disease when they do)
What is multicollinearity, and what are 3 of its effects?
What is the name and formula of the measure used to detect multicollinearity?
Variance inflation factor (VIF): VIF_j = 1 / (1 - R_j^2)
R_j^2 = R^2 from regressing independent variable j on the other independent variables
VIF = 1: no correlation between variable j and the other independent variables
VIF > 1: correlation among the independent variables (suggested thresholds: above 5, investigate; above 10, serious multicollinearity that will require a change to the model)
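A small sketch of the VIF calculation, assuming just two independent variables so that R_j^2 is simply the squared correlation between them. The function names, the classify labels, and the house data (square footage and number of rooms) are hypothetical illustrations.

```python
def r_squared_between(xj, xk):
    """R^2 from regressing xj on xk; with one regressor this is the
    squared correlation between the two variables."""
    n = len(xj)
    mj = sum(xj) / n
    mk = sum(xk) / n
    sjk = sum((a - mj) * (b - mk) for a, b in zip(xj, xk))
    sjj = sum((a - mj) ** 2 for a in xj)
    skk = sum((b - mk) ** 2 for b in xk)
    return (sjk * sjk) / (sjj * skk)


def vif(r2_j):
    """Variance inflation factor: 1 / (1 - R_j^2)."""
    return 1.0 / (1.0 - r2_j)


def classify(vif_value):
    """Rule-of-thumb thresholds from the notes: above 5 investigate, above 10 serious."""
    if vif_value > 10:
        return "serious"
    if vif_value > 5:
        return "investigate"
    return "ok"


# Hypothetical data: square footage and room count tend to move together.
sqft = [1000.0, 1500.0, 1800.0, 2400.0, 3000.0]
rooms = [3.0, 4.0, 4.0, 6.0, 7.0]

vif_value = vif(r_squared_between(sqft, rooms))
print(vif_value, classify(vif_value))
```

With more than two independent variables, R_j^2 would come from a multiple regression of variable j on all the others.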
How do you correct for serial correlation and heteroskedasticity?
(E.g., imagine trying to measure the height of a bouncy kid. If you only use one snapshot (OLS), you’ll get a bad estimate because the kid keeps jumping. Newey-West smooths out the bounces by averaging past movements and accounting for how wild the jumps are.)
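To make the Newey-West idea concrete, here is a rough sketch for a regression with one independent variable. It assumes Bartlett weights (1 - lag / (lags + 1)) and applies no small-sample correction; the function name, the lag choice, and the data are hypothetical. The coefficient estimate itself is unchanged from OLS; only its standard error is adjusted.

```python
import math


def newey_west_slope_se(x, y, lags):
    """Return (slope, Newey-West standard error of the slope) for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]              # demeaned regressor
    sxx = sum(d * d for d in xd)
    b = sum(d * (yi - y_bar) for d, yi in zip(xd, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # g_t = (x_t - x_bar) * e_t drives the sampling variance of the slope.
    g = [d * e for d, e in zip(xd, resid)]

    # Lag 0 term: handles heteroskedasticity (White-style).
    s = sum(gt * gt for gt in g)

    # Lagged terms: handle serial correlation, with weights that shrink
    # as the lag grows.
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)
        s += 2.0 * weight * sum(g[t] * g[t - lag] for t in range(lag, n))

    return b, math.sqrt(s) / sxx


# Hypothetical data: errors around y = 2x come in runs of the same sign.
x = [float(i) for i in range(1, 13)]
errors = [1.0, 1.2, 0.9, 0.8, -0.7, -1.0, -1.1, -0.9, 0.8, 1.0, 1.1, 0.9]
y = [2.0 * xi + e for xi, e in zip(x, errors)]

slope, se_white = newey_west_slope_se(x, y, lags=0)   # heteroskedasticity only
_, se_nw = newey_west_slope_se(x, y, lags=2)          # plus serial correlation
print(slope, se_white, se_nw)
```

Setting lags to 0 reduces the calculation to a heteroskedasticity-only correction, which is why one function covers both cases.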
What are 3 possible solutions to addressing multicollinearity? What causes multicollinearity?
reduces multicollinearity because the new variable isn’t as strongly correlated with square footage.)