Causal relationship
a change in one variable (action) CAUSES change in another variable (result)
Correlation
X and Y move together, but the relationship can partially be explained by other factors; correlation alone does not establish causation
Error Term
Residual
R^2
Goodness of Fit
null hypothesis
The null hypothesis states “no difference” or “no effect”
alternative hypothesis
The alternative hypothesis states there is a difference/effect
T-test
T-test formula
Divide the coefficient by the standard error to get the t-value
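A minimal sketch of the t-value calculation for a one-variable regression, using only the standard library. The function name `simple_ols_t` and the data are made up for illustration; the standard error uses the usual n - 2 degrees of freedom for a model with an intercept and one slope.

```python
import math

def simple_ols_t(x, y):
    """Return (slope, standard error, t-value) for y = b0 + b1*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals; n - 2 degrees of freedom (intercept + slope)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)
    # t-value = coefficient divided by its standard error
    return b1, se_b1, b1 / se_b1

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se, t = simple_ols_t(x, y)
print(round(b1, 3), round(se, 3), round(t, 1))
```

The t-value is then compared with a critical value from a t table (or its p-value with the chosen level of significance).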
F-test
Test a set of regression coefficients for joint significance
F-stat > Critical Value = Reject the Null
(p-value of F lower than the level of significance)
F-test formula
A high F-stat with a low p-value is evidence against the null that the tested coefficients are jointly zero
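One common way to compute the F-stat compares the sum of squared residuals (SSR) of a restricted model (the tested coefficients set to zero) with that of the unrestricted model. The sketch below assumes that form; the function name and all numbers are hypothetical, and the critical value is an approximate 5% table value for F(2, 50).

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F-stat for q joint restrictions.

    n = number of observations,
    k = number of slope coefficients in the unrestricted model.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical values, for illustration only
f = f_statistic(ssr_restricted=150.0, ssr_unrestricted=100.0, q=2, n=53, k=2)

critical_value = 3.18  # approximate 5% critical value for F(2, 50) from an F table
reject_null = f > critical_value
print(f, reject_null)
```

Dropping the tested variables raises the SSR; the F-stat measures whether that increase is large relative to the unexplained variation in the unrestricted model.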
Interpreting Coefficients:
Level-Level
Y = β1 X1
on average a one-unit increase in X is associated with a β1-unit increase in Y, holding all else constant
Interpreting Coefficients:
Log-Level
lnY= β1 X1
on average, a one-unit increase in X is associated with an approximately (100 × β1)% change in Y, holding all else constant
Interpreting Coefficients:
Level-Log
Y= β1 lnX1
on average, a 1% increase in X is associated with an approximately (β1 / 100)-unit change in Y, holding all else constant
Interpreting Coefficients:
Log-Log
lnY= β1 lnX1
on average, a 1% increase in X is associated with a β1% increase in Y, holding all else constant
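The log interpretations are approximations that work well for small changes. The sketch below compares the rule-of-thumb value with the exact implied change for each log form; the function names and coefficient values are hypothetical.

```python
import math

def log_level_pct_change(beta1, delta_x=1.0):
    """ln(Y) = b0 + b1*X: % change in Y for a delta_x-unit change in X."""
    approx = 100 * beta1 * delta_x
    exact = 100 * (math.exp(beta1 * delta_x) - 1)
    return approx, exact

def level_log_unit_change(beta1, pct_x=1.0):
    """Y = b0 + b1*ln(X): unit change in Y for a pct_x% change in X."""
    approx = beta1 * pct_x / 100
    exact = beta1 * math.log(1 + pct_x / 100)
    return approx, exact

def log_log_pct_change(beta1, pct_x=1.0):
    """ln(Y) = b0 + b1*ln(X): % change in Y for a pct_x% change in X."""
    approx = beta1 * pct_x
    exact = 100 * ((1 + pct_x / 100) ** beta1 - 1)
    return approx, exact

# Hypothetical coefficients, for illustration only
print(log_level_pct_change(0.05))    # rule of thumb: about 5%
print(level_log_unit_change(200.0))  # rule of thumb: about 2 units
print(log_log_pct_change(0.8))       # rule of thumb: about 0.8%
```

The gap between the approximate and exact values grows as the coefficient or the size of the change in X gets larger.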
Dummy/binary variable
Only has two possible values – e.g. X = 1 if female; X = 0 if male
Y = B0 + B1female
Ex: On average, being female is associated with a B1 difference in Y compared to male, holding all else constant
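When the dummy is the only regressor, B0 is the mean of Y for the base group (male) and B1 is the difference between the female and male means. A small sketch with made-up data; `ols_simple` is a hypothetical helper, not a library function.

```python
def ols_simple(x, y):
    """Return (b0, b1) for y = b0 + b1*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up data: female = 1 if female, 0 if male
female = [0, 0, 0, 1, 1, 1]
y = [50, 54, 46, 60, 58, 62]

b0, b1 = ols_simple(female, y)
male_mean = sum(y[:3]) / 3
female_mean = sum(y[3:]) / 3
print(b0, b1, male_mean, female_mean - male_mean)
```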
Categorical Variable
A variable like “region” has multiple values (south, west, northeast, midwest) that should be transformed into individual dummy (0 or 1) variables
Y = B0 + B1south + B2west + B3 northeast
Ex: On average, living in the South is associated with a B1 difference in Y compared to living in the Midwest (the omitted base category), holding all else constant.
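A sketch of turning a categorical variable into 0/1 dummies while leaving out a base category, so that each coefficient is read relative to that base. The helper `make_dummies` and the sample values are hypothetical.

```python
def make_dummies(values, base):
    """One 0/1 column per category, excluding the omitted base category."""
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Made-up observations, for illustration only
region = ["south", "west", "midwest", "northeast", "south"]
dummies = make_dummies(region, base="midwest")

for name, column in dummies.items():
    print(name, column)
```

An observation from the base category (here the third one, "midwest") has a 0 in every dummy column, so its predicted Y is just B0.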
Interaction term
An independent variable in a regression equation that is the product of two or more other independent variables. Each interaction term has its own regression coefficient
Does the effect of work experience on salary differ between males and females?
Y = B0 +B1Experience + B2Female + B3(Experience*Female) + e
Ex: On average, the change in Y associated with a one-unit increase in experience differs by B3 for females compared to males, holding all else constant
This allows the effect of experience on income to vary by gender
B3 now measures how much the effect of an additional year of experience differs for females relative to males (male slope = B1; female slope = B1 + B3)
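A short sketch showing how the interaction coefficient changes the slope on experience by gender. The coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predicted_y(experience, female, b0, b1, b2, b3):
    """Y = B0 + B1*Experience + B2*Female + B3*(Experience*Female)."""
    return b0 + b1 * experience + b2 * female + b3 * (experience * female)

# Hypothetical coefficients, for illustration only
coef = dict(b0=30.0, b1=2.0, b2=-1.0, b3=-0.5)

# Effect of one more year of experience (from 10 to 11) for each group
male_slope = predicted_y(11, 0, **coef) - predicted_y(10, 0, **coef)
female_slope = predicted_y(11, 1, **coef) - predicted_y(10, 1, **coef)

print(male_slope)                 # equals B1
print(female_slope)               # equals B1 + B3
print(female_slope - male_slope)  # equals B3
```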
7 Classical Assumptions
Omitted Variable Bias
Y = β0 + β1X1 +e
where the error term absorbs an omitted variable X2; the estimate of β1 is biased if X2 affects Y and is correlated with X1
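A sketch of the bias using noise-free, made-up data. Y is built from both X1 and X2, then regressed on X1 alone. The slope from the short regression equals the true β1 plus β2 times the slope from regressing X2 on X1. The helper name `slope` and all values are assumptions for illustration.

```python
def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Made-up data: x2 is correlated with x1
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
true_b1, true_b2 = 2.0, 3.0
y = [1 + true_b1 * a + true_b2 * b for a, b in zip(x1, x2)]

b1_short = slope(x1, y)   # regression that omits x2
delta = slope(x1, x2)     # how x2 moves with x1
bias = true_b2 * delta    # omitted variable bias in b1_short

print(b1_short, true_b1 + bias)
```

If X2 were uncorrelated with X1 (delta = 0) or had no effect on Y (β2 = 0), the bias term would vanish.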
Variable Inclusion Criteria
Theory: is there sound justification for including the variable?
Bias: do the coefficients for other variables change noticeably when the variable is included?
T-Test: is the variable’s estimated coefficient statistically significant?
R-squared: has the adjusted R-squared improved when the variable is included?
First-order serial correlation
occurs when the value of the error term in one period is a function of its value in the previous period; the current error term is correlated with the previous error term.
DW Test
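compare the DW statistic (d) to the critical values (d_L, d_U): d < d_L indicates positive serial correlation; d > d_U indicates no positive serial correlation; d_L ≤ d ≤ d_U is inconclusive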
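A sketch of the Durbin-Watson statistic, computed from regression residuals, with a one-sided decision rule for positive serial correlation. The residuals and the critical values passed to `dw_decision` are made up; real critical values come from a DW table for the given sample size and number of regressors.

```python
def durbin_watson(resid):
    """d = sum of squared changes in residuals / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

def dw_decision(d, d_l, d_u):
    """One-sided test for positive first-order serial correlation."""
    if d < d_l:
        return "positive serial correlation"
    if d > d_u:
        return "no positive serial correlation"
    return "inconclusive"

# Made-up residuals: long runs of the same sign push d toward 0
d_runs = durbin_watson([1, 1, 1, -1, -1, -1])
# Made-up residuals: alternating signs push d toward 4
d_alt = durbin_watson([1, -1, 1, -1])

print(d_runs, d_alt)
print(dw_decision(d_runs, d_l=1.1, d_u=1.5))  # hypothetical critical values
```

A value of d near 2 is what you would expect when there is no first-order serial correlation.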
Newey-West Standard Errors
-Designed to correct for the consequences of first-order serial correlation; they are technically still biased, but are more accurate than OLS standard errors so they can be used for t-tests and other hypothesis tests
Newey-West SE > OLS SE (typically, when serial correlation is positive)
-Larger standard errors produce lower t-scores, so coefficients won’t be as statistically significant
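A simplified sketch of a Newey-West standard error for the slope in a one-variable regression, using Bartlett weights and no small-sample adjustment. The function name, the data, and the choice of lag length are assumptions for illustration; statistical packages may apply additional corrections, so their output can differ.

```python
import math

def newey_west_slope_se(x, resid, lags):
    """Newey-West SE of b1 in y = b0 + b1*x + e, given OLS residuals."""
    n = len(x)
    x_bar = sum(x) / n
    xd = [xi - x_bar for xi in x]             # demeaning x accounts for the intercept
    g = [xd[t] * resid[t] for t in range(n)]  # x_t * e_t for each period
    s = sum(gt ** 2 for gt in g)              # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1 - lag / (lags + 1)         # Bartlett weight, shrinks with the lag
        s += 2 * weight * sum(g[t] * g[t - lag] for t in range(lag, n))
    sxx = sum(v ** 2 for v in xd)
    return math.sqrt(s) / sxx

# Made-up x values and residuals (the residuals sum to zero and are
# uncorrelated with x, as OLS residuals would be)
x = [1, 2, 3, 4, 5, 6]
resid = [1, 1, -2, -2, 1, 1]

se_lag0 = newey_west_slope_se(x, resid, lags=0)  # no serial-correlation terms
se_lag1 = newey_west_slope_se(x, resid, lags=1)  # adds the first-order term
print(round(se_lag0, 4), round(se_lag1, 4))
```

The coefficient estimate itself is unchanged; only the standard error, and therefore the t-score, is affected.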