Define Multicollinearity
High correlation between at least two independent variables.
When a (multiple) regression model has a multicollinearity issue, what happens? (3)
The goal of a multiple regression model is to measure the marginal effect of each independent variable on the dependent variable under the ceteris paribus assumption (all other variables held constant).
When 2 independent variables are highly correlated (such as age and experience), this also means that they move together.
This means their effects cannot be disentangled: the individual effect of each variable becomes obscured.
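A minimal numpy sketch of this point (variable names and coefficients are hypothetical): the same OLS regression is run once with independent regressors and once with two highly correlated regressors, and the coefficient standard errors come out much larger in the correlated case, reflecting the obscured individual effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

def ols_se(X, y):
    """OLS coefficient standard errors: sqrt of diag(sigma^2 * (X'X)^-1)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))

x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                    # unrelated to x1
x2_corr = 0.95 * x1 + 0.05 * rng.normal(size=n)  # moves almost one-for-one with x1

# same true model in both cases: y = 1 + 2*x1 + 3*x2 + noise
y_indep = 1 + 2 * x1 + 3 * x2_indep + rng.normal(size=n)
y_corr = 1 + 2 * x1 + 3 * x2_corr + rng.normal(size=n)

se_indep = ols_se(np.column_stack([x1, x2_indep]), y_indep)
se_corr = ols_se(np.column_stack([x1, x2_corr]), y_corr)
# se_corr for x1 and x2 is several times larger than se_indep
```

The point estimates remain unbiased in both cases; it is the precision of the individual coefficients that collapses when the regressors move together.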
How do you detect a multicollinearity problem? (2)
Describe the Variance Inflation Factor (VIF)
Variance inflation factor (VIF):
Measures the linear association between one independent variable (IV) and all the other IVs:
§ Quantifies the severity of multicollinearity
§ VIF takes a value of 1 or above (no upper limit)
§ The VIF value shows by what percentage the variance of each coefficient is inflated
–> i.e. a VIF of 1.7 -> the variance of that coefficient estimate is 70% larger than it would be with no multicollinearity.
How do you detect multicollinearity with a correlation matrix?
○ If a correlation coefficient > 0.7 –> signals multicollinearity (a more cautious threshold: 0.5)
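The threshold rule above can be sketched as a small helper that scans the correlation matrix and flags IV pairs above the cutoff (the variable names and example data are hypothetical):

```python
import numpy as np

def flag_collinear_pairs(X, names, threshold=0.7):
    """Return IV pairs whose absolute pairwise correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    k = corr.shape[0]
    return [(names[i], names[j], float(corr[i, j]))
            for i in range(k) for j in range(i + 1, k)
            if abs(corr[i, j]) > threshold]

rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
experience = 0.95 * age + 0.1 * rng.normal(size=n)  # nearly collinear with age
education = rng.normal(size=n)                      # unrelated

flags = flag_collinear_pairs(np.column_stack([age, experience, education]),
                             ["age", "experience", "education"])
# only the (age, experience) pair is flagged
```

Passing `threshold=0.5` implements the more cautious rule from the card.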
How do you detect multicollinearity with the results from the Variance Inflation Factor (VIF)? (3)
How can you deal with multicollinearity? (3)
Describe what a composite variable is:
OLS assumes homoscedasticity. Define homoscedasticity (2)
Define heteroscedasticity (3)
What are some issues when using OLS with heteroscedasticity? (4)
How can we detect heteroscedasticity? (3)
How can you deal with heteroscedasticity? (3)
What is Reverse Causality?
When pursuing a study, we usually assume that changes in the dependent variable are caused by changes in the
independent variable(s). However, reverse causality occurs when the dependent variable also causes a change in the independent variable.
(This is a form of endogeneity)
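A hedged simulation of this feedback loop (all coefficients and names are hypothetical): y depends on x, but x also depends on y. Solving the two-equation system gives the data we actually observe, and a naive OLS slope of y on x no longer recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
b_true, c_true = 0.5, 0.5  # y = b*x + e1 and, simultaneously, x = c*y + e2

e1 = rng.normal(size=n)
e2 = rng.normal(size=n)
# reduced form of the simultaneous system (substitute one equation into the other)
denom = 1 - b_true * c_true
x = (c_true * e1 + e2) / denom
y = (b_true * e2 + e1) / denom

# naive OLS slope of y on x ignores the feedback from y back to x
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# slope lands well above the true effect of 0.5 (about 0.8 here)
```

Because x partly reflects y through the second equation, x is correlated with the error term e1, which is exactly the endogeneity the card describes.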
How can you account for Reverse Causality? (5)
Define omitted variable bias (3)
This is where a relevant variable that influences both the dependent variable (1) and one (or more) independent variables (2), is left out of the model (3).
This can create misleading results as the omitted variable gets “mixed into” the effects of the included variables, distorting the influence of the variables that are included.
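The "mixed into" effect can be shown in a short simulation (true coefficients and variable names are hypothetical): a confounder z influences both x and y; including z recovers the true effect of x, while omitting it biases the x coefficient by the omitted effect that leaks through the x–z correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)          # confounder, correlated with x
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # true model: effect of x is 2

def ols(X, y):
    Xd = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xd, y, rcond=None)[0]

b_full = ols(np.column_stack([x, z]), y)  # z included: x coefficient near 2
b_omit = ols(x.reshape(-1, 1), y)         # z omitted: x coefficient near 4.4
```

The omitted-variable bias here is predictable: the x coefficient absorbs beta_z times the slope of z on x (3.0 * 0.8 = 2.4), landing near 4.4 instead of 2.
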
How can you deal with the Omitted Variable Bias? (2)
What does the value of the ‘mean’ represent in the case of dummy variables?
This represents the % of cases where the dummy variable takes the value 1.
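A one-line illustration (the dummy values are made up): averaging a 0/1 dummy gives the share of observations coded 1.

```python
import numpy as np

# hypothetical dummy: 1 = female, 0 = otherwise
female = np.array([1, 0, 1, 1, 0, 1, 0, 1])
share = female.mean()  # 5 of 8 cases are 1, so the mean is 0.625 (62.5%)
```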
When correcting for heteroscedasticity, what happens to the results? (3)
Define Causal Inference and challenges associated with not having a time-lag
This is the inference that changes in the independent variable cause changes in the dependent variable.
Without a time lag, reverse causality becomes possible: the dependent variable may instead be influencing the independent variable.