Explain how to test whether a regression is affected by heteroskedasticity
Heteroskedasticity is where the variance of the error term varies systematically with one or more explanatory variables. Common tests regress the squared OLS residuals on the regressors: the Breusch-Pagan test uses the regressors themselves, while the White test also includes their squares and cross-products.
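A minimal from-scratch sketch of the Breusch-Pagan idea in numpy (function name and simulated data are illustrative, not from the notes): regress squared OLS residuals on the regressor and compare LM = n * R^2 to a chi-squared critical value.

```python
import numpy as np

def breusch_pagan_lm(y, x):
    """LM form of the Breusch-Pagan test: regress squared OLS residuals
    on the regressors; under homoskedasticity LM = n * R^2 ~ chi2(k)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])          # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                      # squared residuals
    g, *_ = np.linalg.lstsq(X, u2, rcond=None)    # auxiliary regression
    resid = u2 - X @ g
    r2 = 1 - resid @ resid / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    return n * r2                                 # compare to chi2(1): 3.84 at 5%

rng = np.random.default_rng(0)
x = rng.uniform(1, 5, 500)
y_homo = 2 + 3 * x + rng.normal(0, 1, 500)        # constant error variance
y_hetero = 2 + 3 * x + rng.normal(0, x, 500)      # variance grows with x
lm_homo = breusch_pagan_lm(y_homo, x)
lm_hetero = breusch_pagan_lm(y_hetero, x)
```

With variance growing in x, the LM statistic for the heteroskedastic sample should far exceed the 5% chi-squared critical value.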
Describe approaches to using heteroskedastic data
Options include using heteroskedasticity-robust (White) standard errors, which leave the coefficient estimates unchanged but correct the inference, or weighted/generalised least squares when the form of the error variance is known.
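A from-scratch numpy sketch of one standard remedy, White (HC0) "sandwich" robust standard errors (names and simulated data are illustrative):

```python
import numpy as np

def ols_robust_se(y, x):
    """OLS with White (HC0) heteroskedasticity-robust standard errors:
    Var(b) = (X'X)^-1 X' diag(e^2) X (X'X)^-1  (the 'sandwich')."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    meat = X.T @ (X * (e ** 2)[:, None])          # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
x = rng.uniform(1, 5, 1000)
y = 2 + 3 * x + rng.normal(0, x, 1000)            # heteroskedastic errors
beta, se = ols_robust_se(y, x)
```

Note the point estimates are plain OLS; only the covariance matrix changes, so the slope should still be close to the true value of 3.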
Characterise multicollinearity and its consequences; distinguish between multicollinearity and perfect collinearity
Multicollinearity is where one or more explanatory variables can be substantially (but not perfectly) explained by the others. Coefficient estimates remain unbiased, but their standard errors inflate, so variables that are jointly significant may have very small individual t-stats.
Perfect collinearity is where one of the variables is an exact linear combination of the others; the OLS estimator is then not defined, and one of the offending variables must be dropped.
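A common way to quantify multicollinearity is the variance inflation factor. A numpy sketch (function name and simulated data are illustrative): regress each column on the rest; VIF = 1 / (1 - R^2), with values above roughly 10 usually flagged.

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j: regress x_j on the
    remaining regressors (plus intercept); VIF = 1 / (1 - R^2)."""
    n = X.shape[0]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    target = X[:, j]
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    resid = target - others @ coef
    r2 = 1 - resid.var() / target.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)                   # independent of the others
X = np.column_stack([x1, x2, x3])
```

Here x1 and x2 should show large VIFs while the independent x3 stays near 1.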
Describe the consequences of excluding a relevant explanatory variable from a model and contrast those with the consequences of including an irrelevant regressor
Omitting a relevant variable biases the remaining coefficient estimates (omitted-variable bias), unless the omitted variable is uncorrelated with the included regressors.
Including an irrelevant variable does not bias the estimators but increases their variance, and typically reduces adjusted R^2 due to its penalty for extra regressors.
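The omitted-variable bias can be demonstrated by simulation (all names and parameter values below are illustrative): when x2 is correlated with x1 and is dropped, the slope on x1 absorbs part of x2's effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)     # correlated with x1
y = 1 + 2 * x1 + 1.5 * x2 + rng.normal(size=n)

def slope_of_x1(cols):
    """OLS slope on x1 for a model with the given regressor columns."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

full = slope_of_x1([x1, x2])       # both regressors: unbiased, near 2
omitted = slope_of_x1([x1])        # x2 omitted: biased upward
```

The bias equals the omitted coefficient times the slope of x2 on x1 (1.5 * 0.8 = 1.2 here), so the misspecified slope converges to about 3.2 rather than 2.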
Explain two model selection procedures and how these relate to the bias variance tradeoff
Two common procedures are general-to-specific (start from a large model and drop insignificant regressors) and specific-to-general (start small and add regressors that improve the fit). Both navigate the bias-variance tradeoff: large models have low bias but less precisely estimated parameters, while small models have higher bias but less estimation error.
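Either procedure needs a criterion that penalises model size; adjusted R^2 is one such criterion. A numpy sketch (function name and simulated data are illustrative) comparing a correct model against one carrying an irrelevant regressor:

```python
import numpy as np

def adj_r2(y, cols):
    """Adjusted R^2 penalises extra regressors:
    1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    k = len(cols)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x_noise = rng.normal(size=n)               # irrelevant regressor
y = 1 + 2 * x1 + rng.normal(size=n)
a_small = adj_r2(y, [x1])                  # correct specification
a_large = adj_r2(y, [x1, x_noise])         # carries an irrelevant regressor
```

Unlike plain R^2, which can only rise as regressors are added, adjusted R^2 can fall when the extra regressor contributes too little, which is why it is usable as a selection criterion.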
Describe methods for identifying outliers and their impact
Found using Cook's distance: for each observation j, compare the fitted values of the model with and without observation j; a large change means the point is influential. Rule of thumb: observation j is an influential outlier if Dj > 1.
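A from-scratch numpy sketch (data and planted outlier are illustrative) computing Cook's distance via the equivalent leverage formula rather than literally refitting n times:

```python
import numpy as np

def cooks_distance(y, x):
    """Cook's distance via leverages:
    D_j = (e_j^2 / (p * s^2)) * h_j / (1 - h_j)^2,
    where h_j is the j-th diagonal of the hat matrix."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    p = X.shape[1]                              # number of parameters
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
    h = np.diag(H)                              # leverages
    e = y - H @ y                               # residuals
    s2 = e @ e / (n - p)                        # residual variance
    return (e ** 2 / (p * s2)) * h / (1 - h) ** 2

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 50)
y = 1 + 2 * x + rng.normal(0, 0.1, 50)
x[0], y[0] = 3.0, 0.0                           # plant an influential outlier
D = cooks_distance(y, x)
```

The planted point has both high leverage (x far from the bulk of the data) and a large residual, so its Cook's distance should exceed the Dj > 1 rule of thumb.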