Define Multicollinearity
High correlation between at least two independent variables.
When a (multiple) regression model has a multicollinearity issue, what happens? (3)
The goal of a multiple regression model is to measure the marginal effect of each independent variable on the dependent variable under the ceteris paribus assumption (all other variables held constant).
When 2 independent variables are highly correlated (such as age and experience), this also means that they move together.
This means their effects cannot be disentangled: the individual effect of each variable becomes obscured.
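A minimal numpy sketch of this point (variable names and coefficients are hypothetical): the same OLS regression is run once with independent regressors and once with two highly correlated regressors, and the coefficient standard errors come out much larger in the correlated case, reflecting the obscured individual effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

def ols_se(X, y):
    """OLS coefficient standard errors: sqrt of diag(sigma^2 * (X'X)^-1)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))

x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                    # unrelated to x1
x2_corr = 0.95 * x1 + 0.05 * rng.normal(size=n)  # moves almost one-for-one with x1

# same true model in both cases: y = 1 + 2*x1 + 3*x2 + noise
y_indep = 1 + 2 * x1 + 3 * x2_indep + rng.normal(size=n)
y_corr = 1 + 2 * x1 + 3 * x2_corr + rng.normal(size=n)

se_indep = ols_se(np.column_stack([x1, x2_indep]), y_indep)
se_corr = ols_se(np.column_stack([x1, x2_corr]), y_corr)
# se_corr for x1 and x2 is several times larger than se_indep
```

The point estimates remain unbiased in both cases; it is the precision of the individual coefficients that collapses when the regressors move together.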
How do you detect a multicollinearity problem? (2)
Describe the Variance Inflation Factor (VIF)
Variance inflation factor (VIF):
Measures the linear association between one independent variable (IV) and all the other IVs:
§ Quantifies the severity of multicollinearity
§ VIF takes a value of 1 or above (no upper limit)
§ The VIF value shows by what percentage the variance of each coefficient is inflated
–> i.e. a VIF of 1.7 -> the variance of that coefficient estimate is 70% larger than it would be with no multicollinearity.
How do you detect multicollinearity with a correlation matrix?
○ If a correlation coefficient > 0.7 –> signals multicollinearity (a more cautious threshold: 0.5)
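The threshold rule above can be sketched as a small helper that scans the correlation matrix and flags IV pairs above the cutoff (the variable names and example data are hypothetical):

```python
import numpy as np

def flag_collinear_pairs(X, names, threshold=0.7):
    """Return IV pairs whose absolute pairwise correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    k = corr.shape[0]
    return [(names[i], names[j], float(corr[i, j]))
            for i in range(k) for j in range(i + 1, k)
            if abs(corr[i, j]) > threshold]

rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
experience = 0.95 * age + 0.1 * rng.normal(size=n)  # nearly collinear with age
education = rng.normal(size=n)                      # unrelated

flags = flag_collinear_pairs(np.column_stack([age, experience, education]),
                             ["age", "experience", "education"])
# only the (age, experience) pair is flagged
```

Passing `threshold=0.5` implements the more cautious rule from the card.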
How do you detect multicollinearity with the results from the Variance Inflation Factor (VIF)? (3)
How can you deal with multicollinearity? (3)
Describe what a composite variable is:
OLS assumes homoscedasticity. Define homoscedasticity (2)
Define heteroscedasticity (3)
What are some issues when using OLS with heteroscedasticity? (4)
How can we detect heteroscedasticity? (3)
How can you deal with heteroscedasticity? (3)
What is Reverse Causality?
When pursuing a study, we usually assume that changes in the dependent variable are caused by changes in the
independent variable(s). However, reverse causality occurs when the dependent variable also causes a change in the independent variable.
(This is a form of endogeneity)
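A hedged simulation of this feedback loop (all coefficients and names are hypothetical): y depends on x, but x also depends on y. Solving the two-equation system gives the data we actually observe, and a naive OLS slope of y on x no longer recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
b_true, c_true = 0.5, 0.5  # y = b*x + e1 and, simultaneously, x = c*y + e2

e1 = rng.normal(size=n)
e2 = rng.normal(size=n)
# reduced form of the simultaneous system (substitute one equation into the other)
denom = 1 - b_true * c_true
x = (c_true * e1 + e2) / denom
y = (b_true * e2 + e1) / denom

# naive OLS slope of y on x ignores the feedback from y back to x
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# slope lands well above the true effect of 0.5 (about 0.8 here)
```

Because x partly reflects y through the second equation, x is correlated with the error term e1, which is exactly the endogeneity the card describes.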
How can you account for Reverse Causality? (5)
Define omitted variable bias (3)
This is where a relevant variable that influences both the dependent variable (1) and one (or more) independent variables (2), is left out of the model (3).
This can create misleading results as the omitted variable gets “mixed into” the effects of the included variables, distorting the influence of the variables that are included.
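The "mixed into" effect can be shown in a short simulation (true coefficients and variable names are hypothetical): a confounder z influences both x and y; including z recovers the true effect of x, while omitting it biases the x coefficient by the omitted effect that leaks through the x–z correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)          # confounder, correlated with x
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # true model: effect of x is 2

def ols(X, y):
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

b_full = ols(np.column_stack([x, z]), y)  # z included: x coefficient near 2
b_omit = ols(x.reshape(-1, 1), y)         # z omitted: x coefficient near 4.4
```

The omitted-variable bias here is predictable: the x coefficient absorbs beta_z times the slope of z on x (3.0 * 0.8 = 2.4), landing near 4.4 instead of 2.
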
How can you deal with the Omitted Variable Bias? (2)
What does the value of the ‘mean’ represent in the case of dummy variables?
This represents the % of cases where the dummy variable takes the value 1.
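A one-line illustration (the dummy values are made up): averaging a 0/1 dummy gives the share of observations coded 1.

```python
import numpy as np

# hypothetical dummy: 1 = female, 0 = otherwise
female = np.array([1, 0, 1, 1, 0, 1, 0, 1])
share = female.mean()  # 5 of 8 cases are 1, so the mean is 0.625 (62.5%)
```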
When correcting for heteroscedasticity, what happens to the results? (3)
Define Causal Inference and challenges associated with not having a time-lag
This is the inference that changes in the independent variable cause changes in the dependent variable.
Without a time lag, reverse causality becomes possible: the dependent variable may instead be influencing the independent variable.