EC2C3 Notes Flashcards

(210 cards)

1
Q

Parameter Def

A

Fixed number describing the population

2
Q

Estimate

A

To estimate a parameter means to create an estimate, which is a value used to infer what the parameter is

3
Q

Estimator

A

Method (formula) to create an estimate

e.g. the sample average (sum of y values / n) is an estimator; applying it to data creates an estimate

4
Q

What is OLS, what are the estimates from it?

A

OLS is an estimator

The estimates it produces are β̂0 and β̂1

5
Q

Consider the equation

yi = β0 + β1x1i + β2x2i + ui

Explain each item in it

A

y is the outcome variable

x1i, x2i: regressors, independent variables. We are primarily interested in one specific regressor and its effect on the outcome, known as the regressor of interest

β0: constant, the intercept when all regressors are 0

ui: error term, the effect of all other determinants of y. (Its sample estimate is the residual: the difference between OLS-predicted y and observed y)

6
Q

What is the effect of random assignment

A

There will be no confounders as nothing is correlated with randomised treatment

7
Q

What does it mean for treatment to be 'as good as randomly assigned' rather than directly randomised?

A

As good as randomly assigned is if we do not directly randomise treatment, but the treatment is still uncorrelated with all other determinants of the outcome.

e.g. rainfall is as good as random by nature

8
Q

Variance and standard deviation are what

A

Measures of spread

standard deviation is the square root of the variance

9
Q

What do the variance and standard deviation of y measure?

A

Measure the spread of the values of that variable either in the sample or in population

10
Q

What do the variance and standard deviation of an estimate (e.g. sample average, regression coefficient) measure?

A

Measure the spread of that estimate across repeated samples

sd(β̂1): 'if we repeatedly drew data and created an estimate β̂1 using each sample, how spread out would those estimates be?'

11
Q

standard error meaning

A

estimate of the standard deviation of an estimate

Calculating a standard error requires plugging in sample estimates

12
Q

What is a counterfactual

A

Counterfactual is the outcome that would have been observed under another treatment status that didn’t happen.

This is unobserved e.g. the treatment group in the absence of treatment

13
Q

Define Bias

A

Difference between our estimate of the causal effect and the true causal effect

Average potential outcome in absence of treatment for the treated − average potential outcome in absence of treatment for the control

(ȳ0,D=1 − ȳ0,D=0)

14
Q

What is the ‘Sample Average Treatment Effect on the Treated’

A

Avg. Potential outcome of treatment for treated individuals - Avg. Potential outcome in absence of treatment for treated individuals

(ȳ1,D=1) − (ȳ0,D=1)

15
Q

What is the bias if we randomly assign treatment?

A

Bias in sample is expected to be 0

Thus

E(potential outcome in absence of treatment | treated individuals) = E(potential outcome in absence of treatment | control group)

E(y0i | D=1) = E(y0i | D=0)

16
Q

What is the notation for a potential outcome?

A

y1i = potential outcome if treatment 1

y0i = potential outcome if treatment 0

17
Q

Given no bias, what is the Average Treatment Effect (ATE)

A

ȳ1i − ȳ0i is the average treatment effect for everyone in the sample

E( potential outcome of treatment - potential outcome in absence of treatment)

ATE=ATT

18
Q

How do we justify that treatment is randomly assigned

A

Comparing characteristics of treatment and control individuals

If they look the same on observable characteristics, it is reasonable to claim they would be similar on unobservable characteristics –> thus no bias

19
Q

How do we compare characteristics of treatment and control individuals

A

Through a t-test on the difference in means

e.g. x̄1 − x̄0

to evaluate whether treated and control look the same

20
Q

How would you do a comparison of means of treatment and control?

A

Null hypothesis: the difference in means is 0

The t-test is done by dividing the difference in means by the standard error of the difference in means

21
Q

What does Var(X − Y) equal?

If so, calculate SE(x̄1 − x̄0)

A

Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y)

For independent samples Cov(x̄1, x̄0) = 0, so SE(x̄1 − x̄0) = √(se(x̄1)² + se(x̄0)²)
22
Q

Interpret all the terms in linear regression:

yi = β0 + β1x1i + β2x2i + ui

A

yi: outcome for observation i

β0: expected value of y if x1 = 0 and x2 = 0

β1 is the average change in yi associated with x1i increasing by 1, holding fixed all other x (in this example, just x2).

β2 is the average change in yi associated with x2i increasing by 1, holding fixed all other x (in this example, just x1).

ui is the effect of all factors other than x1 and x2 on y.

23
Q

How do we estimate parameters of our linear regression form?

A

We use OLS, which chooses coefficients that minimise the sum of squared residuals

with a single regressor we have formulas for parameters that solve the minimisation problem

24
Q

What are the formulas that solve the OLS minimisation, when we have a single regressor

A

β̂1 = Cov(x1, y) / Var(x1)

β̂0 = ȳ − β̂1·x̄

These formulas only apply if there is a single regressor.

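A minimal numpy sketch of these single-regressor formulas on simulated data (the true coefficients 2.0 and 3.0 and all data are invented for illustration):

```python
# Sketch: single-regressor OLS estimates via the covariance/variance
# formulas above, on simulated data (true beta0 = 2, beta1 = 3).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 + 3.0 * x + rng.normal(size=500)

b1_hat = np.cov(x, y, ddof=0)[0, 1] / np.var(x)  # beta1-hat = Cov(x, y) / Var(x)
b0_hat = y.mean() - b1_hat * x.mean()            # beta0-hat = ybar - beta1-hat * xbar
```

With 500 draws the estimates should land close to the true 2 and 3, differing only by sampling error.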
25
What does the hat in OLS estimates symbolise?
The hat denotes that the estimate may differ from the true population value. If we have population data, we can write the minimising values as just β0 and β1 because we know there is no sampling error (we often still write the hats to denote that we applied the OLS method). With population data we are estimating the exact "true" values (the best approximation of the CEF).
26
What does OLS best approximate?
The conditional expectation function
27
How do we estimate the error? What is it called?
The residual is the estimate of the error: ûi = yi − β̂0 − β̂1x1i − β̂2x2i, or ûi = yi − ŷi
28
OLS is done by minimising the sum of squared residuals. When this is done, which mechanical properties of the residuals hold?
1. **Expectation of the residual is 0**: E[ûi] = 0
2. Residuals are orthogonal (uncorrelated in sample) to each regressor: E[x1i·ûi] = 0, E[x2i·ûi] = 0
3. **Covariance of residual and regressor is 0**: thus Cov(x1i, ûi) = 0 and Cov(x2i, ûi) = 0
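These mechanical properties can be verified numerically; a sketch using numpy's least-squares solver as the OLS estimator (all data simulated):

```python
# Sketch: OLS residuals are mean-zero and uncorrelated with each regressor,
# regardless of the data (mechanical property of the minimisation).
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS coefficients
u_hat = y - X @ beta_hat                         # residuals

mean_resid = u_hat.mean()                        # should be ~0
cov_x1_u = np.cov(x1, u_hat, ddof=0)[0, 1]       # should be ~0
cov_x2_u = np.cov(x2, u_hat, ddof=0)[0, 1]       # should be ~0
```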
29
How is Covariance defined?
Cov(X,Y) = E(XY) - E(X)E(Y)
30
Do the properties of the residual (û) that follow from OLS still hold if the true (unobserved) error (u) does not satisfy those properties?
Yes, the mechanical properties still hold. An omitted variable will still be part of the true error (ui) and may be correlated with regressors, but OLS will create regression estimates such that the residuals (û) are uncorrelated with the regressors in the model.
31
What is the true unobserved error
It is defined as ui = yi − E(yi | Xi): the gap between the observed value and the population expectation given Xi. Endogeneity comes from ui and the regressors being correlated.
32
What is the issue then with omitted variables if OLS makes the residual uncorrelated with the regressors in the model?
Residuals will not be representative of the true effect of all factors other than x1 and x2 on y, due to the bias caused by the confounder contaminating the regression coefficients and the error.
33
What is the formula for the standard error of β̂1 in a regression with just one regressor, x?
se(β̂1) = √[(1/n) · Var(ûi) / Var(xi)]
34
What is heteroskedasticity?
It means the variance of the error term changes with x
35
What is homoskedasticity?
Variance of the error term does not change with x
36
What is the issue with heteroskedasticity?
The baseline standard error formula is incorrect because the variance of the error is related to x. Hence we use the robust standard error formula:

se(β̂1) = √[(1/n) · Var((xi − E[xi])·ûi) / Var(xi)²]

It is used as the **default**.
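A sketch of the robust formula on simulated heteroskedastic data (the error's spread is made to grow with |x|; all values invented):

```python
# Sketch: heteroskedasticity-robust standard error for beta1-hat in a
# single-regressor model, following the formula above.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
u = rng.normal(size=n) * (1 + np.abs(x))  # error variance changes with x
y = 1.0 + 2.0 * x + u

b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

# Robust SE: sqrt((1/n) * Var((x - E[x]) * u_hat) / Var(x)^2)
se_robust = np.sqrt((1 / n) * np.var((x - x.mean()) * u_hat) / np.var(x) ** 2)
```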
37
How do we set up a hypothesis test for a coefficient in our linear regression?
Consider a null hypothesis of H0: β1 = c and alternative H1: β1 ≠ c. Typically, c = 0.
38
How do we perform the t-test calculation?
Calculate the t-stat: t-stat = (β̂1 − c) / se(β̂1), i.e. (estimate − null-hypothesis value) / standard error. If the absolute value of the t-stat is greater than 1.96, we reject the null hypothesis (for α = 0.05).
39
What is the idea behind a confidence interval
The idea of a 95% confidence interval is: "If we repeatedly gathered data, created estimates, and created confidence intervals, 95% of those confidence intervals would contain the true value of β1."

If the value associated with the null hypothesis, c, is in the interior of the confidence interval, then we fail to reject H0. If the hypothesised value is not in the interior of the confidence interval, we reject H0.

The 95% confidence interval gives us the set of values, d, for which we would fail to reject H0: β1 = d.
40
construct a 95% confidence level for B1 hat
95% CI = [β̂1 − 1.96·se(β̂1), β̂1 + 1.96·se(β̂1)]
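A small worked example of the t-stat and interval arithmetic (the estimate 0.42 and standard error 0.10 are hypothetical numbers):

```python
# Hypothetical estimate and standard error; test H0: beta1 = 0
b1_hat, se_b1, c = 0.42, 0.10, 0.0

t_stat = (b1_hat - c) / se_b1                           # 4.2
ci_95 = (b1_hat - 1.96 * se_b1, b1_hat + 1.96 * se_b1)  # roughly (0.224, 0.616)
reject = abs(t_stat) > 1.96                             # reject H0 at the 5% level
```

Note that 0 lies outside the interval, which agrees with rejecting H0: β1 = 0.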
41
What do P values represent
The probability of observing the estimated value, or something more extreme (further from the null), if the null is true. If the p-value is less than α, we reject the null.
42
How does a long and short regression differ?
The long regression includes at least one more variable, e.g. x3i. We are interested in how this affects the coefficient estimates.
43
How do we construct the *auxiliary regression*?
Regression of **the omitted variable** on the regressors in the short regression e.g. x3i = a0 + a1x1i + a2x2i + vi
44
What is the formula for OVB (Omitted Variable Bias)? Write mathematically and in English the bias that affects the short regression due to the variable omitted from the long regression
β1S = β1 + β3·a1

In English: "the short regression coefficient is equal to the corresponding long coefficient plus the product of the coefficient for the variable of interest in the auxiliary regression multiplied by the coefficient for the omitted variable in the long regression."
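The identity holds exactly in-sample for OLS estimates, which a quick simulation confirms (all data and coefficients invented; x3 plays the omitted variable):

```python
# Sketch: the OVB identity beta1_short = beta1_long + beta3 * a1 holds
# exactly for in-sample OLS estimates.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x3 = 0.8 * x1 + rng.normal(size=n)               # confounder correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x3 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_long = ols(np.column_stack([ones, x1, x3]), y)  # beta0, beta1, beta3
b_short = ols(np.column_stack([ones, x1]), y)     # beta0_s, beta1_s
a = ols(np.column_stack([ones, x1]), x3)          # auxiliary: a0, a1

check = b_long[1] + b_long[2] * a[1]              # beta1 + beta3 * a1
```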
45
Is OVB distinct from bias? How?
OVB is "the mathematical relationship between coefficients for the same variable in any two regressions which differ only in that one regression contains at least one additional regressor." Bias is "the difference between our estimate of the causal effect and the true causal effect."
46
Does adding variables to remove OVB imply a long regression has a causal interpretation
Adding controls can reduce bias if treated and control become more similar by conditioning on the controls, but this does not imply a causal interpretation.
47
Does OVB hold for any paired regressions
Yes
48
What is the point of Regression Anatomy/Frisch Waugh Lovell Theorem
Shows how regression "matches". Rather than matching exactly on x2, regression creates a version of x1 that is uncorrelated with x2, written x̃1i (x-tilde-hat). Regression estimates how y changes when x̃1i changes. The idea is that x̃1i changing is not correlated with x2 changing, so the confounder has been "matched on."
49
What are the steps of the Frisch-Waugh-Lovell theorem / regression anatomy? Start with yi = β0 + β1x1i + β2x2i + ui
1. Run a regression of **x1 on all other regressors**: x1i = δ0 + δ1x2i + x̃1i

2. Calculate the **residuals** of that regression to get x̃1i (x-tilde-hat). It is a property of OLS that Cov(x̃1i, x2i) = 0 (see above). The residuals x̃1i represent **"the portion of x1i that is uncorrelated with x2i."**

3. Analogously, we can run a regression of y on all other regressors except x1: yi = γ0 + γ1x2i + ỹi. The residuals ỹi represent "the portion of yi that is uncorrelated with x2i."

4. Thus the estimate of α1 is the same as the estimate of the original β1:

ỹi = α0 + α1·x̃1i + ei

yi = α0 + α1·x̃1i + vi
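A sketch of the steps on simulated data, checking that the coefficient on the residualised x1 matches the full-regression β1 (all values hypothetical):

```python
# Sketch of Frisch-Waugh-Lovell: regressing y on the residualised x1
# recovers the multivariate beta1 exactly.
import numpy as np

rng = np.random.default_rng(4)
n = 250
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 - 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
beta_full = ols(np.column_stack([ones, x1, x2]), y)  # full regression

d = ols(np.column_stack([ones, x2]), x1)             # step 1: x1 on x2
x1_tilde = x1 - (d[0] + d[1] * x2)                   # step 2: residuals

alpha = ols(np.column_stack([ones, x1_tilde]), y)    # step 4: y on x1-tilde
```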
50
What is the standard error formula when there are multiple regressors?
yi = β0 + Σ(j=1..k) βj·xji + ui

se(β̂ℓ) = √[(1/n) · Var(ûi) / Var(x̃ℓi)]

Above, x̃ℓi are the residuals from a regression of xℓi on all other x.
51
What happens when we do a nonlinear transformation
The coefficient on the transformed variable no longer has the standard interpretation of "the average change in yi associated with xi increasing by 1".
52
How do we interpret β1 after a nonlinear transformation?
1. Write the equation in the form y = f(x)
2. Find ∂y/∂x
3. Use the principle that Δy ≈ (∂y/∂x)·Δx, and plug in ∂y/∂x from step 2

We then need to plug in a value for Δx to solve for the average change in y associated with x changing by that amount: either (1) a value is given to plug in, or (2) we make a judgement about the meaningful value to evaluate at, typically the mean/median.
53
derivation of natural log transformations
For small changes, ln(x + Δx) − ln(x) ≈ Δx/x, so a change in ln(x) approximates the proportional (percent/100) change in x. This is why coefficients on logged variables are read as approximate percent changes.
54
what is the **linear-log** transformation, and interpretation
yi = β0 + β1·ln(xi) + ui

**β1 is approximately the average change in y associated with x increasing by 100%.** (We use β1/100 for a 1% change.)
55
what is the **log-linear** transformation, and interpretation
ln(yi) = β0 + β1·xi + ui

**100·β1 is approximately the average percent change in y associated with x increasing by 1.**
56
what is the **log-log** transformation, and interpretation
ln(yi) = β0 + β1·ln(xi) + ui

**β1 is approximately the average percent change in y associated with x increasing by 1%. (The elasticity of y with respect to x.)**
57
What is the important fact to remember and state with nonlinear transformation method
The principle Δy ≈ (∂y/∂x)·Δx is still an approximation, so the interpretation must always be stated as **approximately the average ...**
58
When are these approximations valid? Evaluate each log transformation
Valid with **small percent changes (< 20%)**

Linear-log (just ln(x)): we consider a **1% change in x (β1/100)** to avoid this concern; we need to consider a change in x of less than 20%

Log-log (both ln(x) and ln(y)): we again consider a 1% change in x (β1/100) to avoid this issue

Log-linear (just ln(y)): if β1 > 0.2, then use the formula **100·(e^β1 − 1)** for the exact average percent change in y associated with x increasing by 1
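A quick check of the log-linear case with a hypothetical coefficient β1 = 0.4 (> 0.2), where the exact formula and the 100·β1 approximation diverge noticeably:

```python
# Log-linear model with a large coefficient: compare the approximate and
# exact percent-change interpretations (beta1 = 0.4 is a made-up value).
import math

beta1 = 0.40
approx_pct = 100 * beta1                 # approximation: 40%
exact_pct = 100 * (math.exp(beta1) - 1)  # exact: about 49.2%
```

The gap of roughly 9 percentage points shows why the exact formula matters once β1 exceeds about 0.2.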
59
𝑦𝑖 = 𝛽0 +𝛽1π‘₯1𝑖 +𝛽2π‘₯2𝑖 +𝑒𝑖 Suppose π‘₯2𝑖 is a continuous, or multivalued, variable (such as years of education). Suppose π‘₯1𝑖 is a binary (dummy) variable representing a qualitative state. Interpret 𝛽1
Holding fixed x2i, β1 is the average change in yi associated with an individual having x1i = 1 rather than x1i = 0.
60
If we only have a single dummy variable, as in yi = β0 + β1·Di + ui, interpret β1
β1 is interpreted as "the average change in yi associated with an individual having Di = 1 rather than Di = 0." In this case with no controls, there are formulas for the estimates of β0 and β1.
61
When there are no controls and a dummy variable, what are the formulas for the estimates β̂0 and β̂1?
β̂0 = ȳi,D=0 (the average of y for the observations with Di = 0)

β̂1 = ȳi,D=1 − ȳi,D=0 (the difference in the average of y between the groups with Di = 1 and Di = 0)
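A sketch verifying the group-mean formulas against a full OLS fit (simulated data; the true effect 2.5 is invented):

```python
# Sketch: with a single dummy regressor, the OLS estimates equal group means.
import numpy as np

rng = np.random.default_rng(5)
n = 400
D = rng.integers(0, 2, size=n)
y = 1.0 + 2.5 * D + rng.normal(size=n)

b0_hat = y[D == 0].mean()                   # average y in the D = 0 group
b1_hat = y[D == 1].mean() - y[D == 0].mean()  # difference in group means

# Matches the OLS solution:
X = np.column_stack([np.ones(n), D])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```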
62
What is the condition of 'no perfect collinearity'?
No regressor can be a sum of multiples of other regressors and a constant. E.g., if x1i = a0 + a1x2i + a2x3i for constant real numbers a0, a1, and a2, there is a violation. In such a case, estimates cannot be created.
63
What happens when there are mutually exclusive and exhaustive categories for something, and **we include separate binary variables for each category and a constant**
This **violates the no-perfect-collinearity condition** and estimates can't be created. This is the **dummy variable trap**, e.g. yi = β0 + β1·Spring_i + β2·Winter_i + β3·Summer_i + β4·Autumn_i + ui
64
How do we solve the dummy variable trap
1. **Omit a dummy variable**
2. **Remove the constant**
65
What are variables that are the product of two variables called
Interaction variables e.g. x3i = x1i * x2i
66
For the equation yi = β0 + β1x1i + β2x2i + β3·x1i·x2i + ui, interpret β2 and β3, as well as β2 + β3, where y is income, x2 is years of education, and x1 is a binary dummy for being a UK citizen
1. Differentiate w.r.t. the variable of interest (x2): ∂yi/∂x2i = β2 + β3·x1i.

Thus, β2 represents "the average change in yi associated with x2i increasing by 1 if x1i = 0." ("The average change in income associated with one more year of education for non-UK citizens.")

β3 represents "the average difference in the association of one more year of education with income for UK compared to non-UK citizens."

Thus, the association of one more year of education with income for UK citizens is β2 + β3.
67
Var(A+B)
Var (A) + Var (B) + 2Cov(A,B)
68
Var(A-B)
Var (A) + Var (B) - 2Cov(A,B)
69
What is the t-stat for H0: β1 − β2 = 0, H1: β1 − β2 ≠ 0?
We need se(β̂1 − β̂2):

Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2Cov(β̂1, β̂2)

Thus: t-stat = (β̂1 − β̂2 − 0) / √[V̂ar(β̂1) + V̂ar(β̂2) − 2Ĉov(β̂1, β̂2)]
70
What is a joint hypothesis?
A hypothesis that requires 2 or more equal signs, e.g. H0: β1 = 0 & β2 = 0
71
Could you do separate t-tests for each?
No, as it would be imprecise: each test ignores half of the hypothesis
72
What test do you do for joint hypothesis then?
We need to do an F-test
73
What is **classical measurement error in a regressor**
When we only observe x1i = x*1i + wi, where Cov(wi, ui) = 0 and Cov(wi, x*1i) = 0. I.e. random noise is added to the observed x measurements that is uncorrelated with the error and with the true regressor.
74
What bias does **classical measurement error in the regressor** cause
*Attenuation bias*: the estimated coefficient is biased towards 0 → |β̂1| < |β1|
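A simulation sketch of attenuation (true β1 = 2; measurement noise with variance 1 is added to x, so the standard attenuation factor Var(x*)/(Var(x*) + Var(w)) is about 0.5 here; all values invented):

```python
# Sketch: classical measurement error in the regressor biases beta1-hat
# toward 0 (true beta1 = 2; noisy estimate should land near 2 * 0.5 = 1).
import numpy as np

rng = np.random.default_rng(6)
n = 5000
x_star = rng.normal(size=n)                 # true regressor
y = 1.0 + 2.0 * x_star + rng.normal(size=n)
x_obs = x_star + rng.normal(size=n)         # observed x = x* + noise w

b1_true_x = np.cov(x_star, y, ddof=0)[0, 1] / np.var(x_star)
b1_noisy_x = np.cov(x_obs, y, ddof=0)[0, 1] / np.var(x_obs)
```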
75
What are the effects of **classical measurement error in the regressor**, other than bias?
* The x variable is stretched; the slope of the line is closer to 0
* Increase in Var(x) and Var(û)
* Ambiguous effect on standard errors, though they normally increase
76
What is **classical measurement error in the outcome**
We only observe yi = y*i + wi, where Cov(wi, ui) = 0 and Cov(wi, x*1i) = 0.
77
Does **classical measurement error in the outcome** cause bias?
This form of measurement error does not result in bias: the measurement error is uncorrelated with x*1i, so it doesn't result in any omitted confounders.
78
What are the effects of **classical measurement error in the outcome**, other than bias?
* The y variable is stretched, but the average value of y for each x does not change, and thus the estimated slope is on average unchanged
* Increases Var(û), and standard errors normally increase
79
What is **non-classical or systematic measurement error?**
Any form of more complicated measurement error. Evaluated on a case-by-case basis, e.g. systematic overestimation of healthy habits and underestimation of unhealthy habits.
80
Does **non-classical (or systematic) measurement error** cause bias?
Yes
81
What are the three forms of missing data?
1. Missing at random
2. Data missing based on a cutoff at the x value: either x or y missing if x is below some threshold
3. Data missing based on a cutoff at the y value: either x or y missing if y is below some threshold
82
Effect of data **missing at random** on bias and other
* No concerns of bias
* Just a smaller sample
* The OLS estimator is still unbiased
83
Effect of data of **missing based on a cutoff of the x value**
* No bias

As the slope of the regression line is the same across the whole domain of x, we just have a smaller domain but still the same slope
84
Effect of data **missing based on a cutoff of the y value**
* Causes bias

The error is represented by the vertical distance between a point and the line. Small x values need a large positive error to meet the threshold; thus as x increases, the error term decreases on average → an omitted variable in ui that changes on average when x changes → a confounder
85
What is the point of control variables
Consider yi = β0 + β1x1i + ui. There is OVB if there is a variable that is correlated with x1i and also correlated with yi. The idea of a control variable is to bring the omitted variable out of the error term and include it directly in the model, e.g. yi = β0 + β1x1i + β2x2i + ui.
86
What are good and bad controls variables
Good control variables: determined prior to treatment, or immutable characteristics of individuals (not an outcome of treatment)

Bad control variables: a control variable that introduces a new confounder; typically happens when the **control is itself an outcome or determined after treatment**
87
What makes a control 'bad'?
Holding fixed the bad control, changes in treatment may be correlated with changes in a confounder. Adding a bad control induces correlation of treatment with confounders.
88
89
What is the issue of using outcomes of treatment as controls?
Outcomes of treatment are affected by variables that are components of the error. If we include an outcome of treatment as a regressor, we induce confounders (because omitted variables are correlated with the regressors and also affect the outcome).
90
What is 'internal validity'?
The estimate can be interpreted as a causal effect for the population used in the study: no issues (no confounders, no attenuation bias, no bias due to y cutoffs, no simultaneity/reverse causality, no bad controls)
91
What is 'external validity'?
The estimate is representative of the effect for another population. Nearly always an assumption; checked by creating estimates in various settings and checking whether the effects are comparable
92
What is R2
R² represents the fraction of the variation in the outcome that is explained by the regression line.
93
R2 Formula
R² = Var(ŷ) / Var(y) = 1 − Var(û) / Var(y)
94
What happens to R^2 when you add additional regressors into a regression model?
R^2 will never decrease
95
Does R2 tell you if the regression is contaminated or not?
No, it only tells you whether the points are close to the line
96
When do we primarily care about R^2?
When the goal is to predict y (as opposed to estimating a treatment effect), we no longer care about causality, just about x explaining a lot of the variation in y. This is the case in the first stage of an instrumental variables regression.
97
What is standardising a variable?
Standardising is a form of normalising where we subtract the mean (μ) and divide by the standard deviation. Useful when units cannot be easily understood.
98
When standardising just x1, what is β1 interpreted as?
β*1 is interpreted as "the average change in y that is associated with x1 increasing by 1 standard deviation."
99
When standardising just y, what is β1 interpreted as?
β*1 is interpreted as "the average number of standard deviations that y changes by that is associated with x1 increasing by 1."
100
When standardising both x1 and y, what is β1 interpreted as?
β€œthe average number of standard deviations that 𝑦 changes by that is associated with π‘₯1 increasing by 1 standard deviation.”
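Numerically, standardising both variables rescales the slope by sd(x)/sd(y), giving the correlation coefficient; a sketch on invented data:

```python
# Sketch: the slope on standardised x and y equals b1 * sd(x) / sd(y).
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=300) * 3.0
y = 1.0 + 0.7 * x + rng.normal(size=300)

b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)      # slope in original units

xs = (x - x.mean()) / x.std()                    # standardised x
ys = (y - y.mean()) / y.std()                    # standardised y
b1_std = np.cov(xs, ys, ddof=0)[0, 1] / np.var(xs)
```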
101
Do we have to subtract the mean and divide by the standard deviation?
No; for the interpretation it is sufficient to divide by the standard deviation
102
What makes a valid instrument?
It must satisfy two assumptions:

(1) **Relevance, Cov(zi, x1i) ≠ 0**: the instrument is **correlated with the variable of interest**

(2) **Exogeneity, Cov(zi, ui) = 0**: the instrument is **uncorrelated with the error** term of the regression → made up of exclusion, and as good as randomly assigned
103
What is the assumption of relevance? Why is it important?
Relevance is that the **instrument is correlated with the variable of interest**: **Cov(zi, x1i) ≠ 0**. Without it, the instrument can't isolate the variation in treatment that is not due to the source(s) of bias.
104
What is the **exclusion assumption**
Exclusion is one of the two assumptions making up exogeneity (Cov(zi, ui) = 0). The exclusion assumption is that **zi does not directly affect yi** (i.e., zi itself is excluded from ui and only affects yi through correlation with x1i, or controls if there are any).
105
What is the **as good as random** assumption
The realised value of **zi is uncorrelated with all unobserved factors in ui that affect yi**. I.e., zi is not itself determined by unobserved factors that affect yi.
106
Evaluate the relevance assumption for this case (the draft lottery as an instrument for military service)
Relevance, Cov(zi, x1i) ≠ 0, is probably true: being drafted made people much more likely to serve in the military.
107
Evaluate the 'as good as randomly assigned' assumption in this case
"As good as randomly assigned" is almost certainly true. The lottery operated by birthdates being randomly chosen, and individuals with chosen birthdates being drafted. Being drafted should not have been correlated with any determinants of income at age 50. In fact, Gmeiner would argue that z was randomly assigned, not just "as good as" randomly assigned.
108
Evaluate exclusion in this context
Exclusion would mean that the only mechanism by which being drafted affected income at age 50 was through military service. This is less likely to be true: some individuals who were drafted chose to pursue a college education (because they knew that by going to college the military would allow them to avoid service). Thus there is a potential secondary channel whereby zi affects yi that is not only through x1i. Exclusion might fail.
109
What is 2SLS?
The most general method of implementing analysis with an instrumental variable is called two-stage least squares (2SLS)
110
What are the steps of 2SLS Method
1. Consider the **first-stage** regression (the variable causing bias is the outcome, and the instrument is a regressor)
2. Estimate the first stage with OLS, creating predicted values (e.g. x̂1i = δ̂0 + δ̂1·zi)
3. Form the **second stage** by using the predicted values in the equation of interest rather than the original x1i: yi = β0 + β1·x̂1i + ui
4. Estimate the second stage with OLS using the predicted values; this gives the 2SLS estimate of β1
111
If the exogeneity and relevance assumptions are true, what will the IV estimate converge to?
It will converge to the true β1, overcoming the bias
112
What is the First Stage Equation
Variable causing bias = δ0 + δ1·instrument + error

We create predicted values with this
113
What is the Second Stage Equation
Equation of interest but with predicted values in place of xi
114
What is the equation of interest?
The equation we ultimately want to estimate, e.g. yi = β0 + β1x1i + ui (where x1i may be correlated with ui)
115
What is the reduced form equation?
An outcome we care about is on the left, and variables that do not cause bias are on the right:

y = φ0 + φ1·zi + ei

where z is the instrument
116
Derive the reduced form from yi = β0 + β1x1i + ui
Substitute the first stage, x1i = δ0 + δ1zi + vi, into the equation of interest:

yi = β0 + β1(δ0 + δ1zi + vi) + ui = (β0 + β1δ0) + β1δ1·zi + (β1vi + ui)

So the reduced-form coefficient on zi is φ1 = β1δ1.
117
What does the reduced-form coefficient estimate? Does z affect y directly?
The reduced-form coefficient gives an estimate of the relationship between zi and yi. Exogeneity means z cannot affect y directly.
118
How do we know the reduced form coefficient operates only through xi
Because the reduced-form coefficient is equivalent to β1δ1: δ1 (from the first stage) represents the association of zi with x1i, and β1 represents the effect of x1i on y.
119
What does the slope coefficient of the reduced form represent?
It is referred to as the **intention-to-treat** effect. Most common if z is a binary variable that offers the treatment, while x is the treatment itself.
120
What is the OLS formula for δ1 (the first-stage coefficient)?
δ̂1 = Cov(zi, x1i) / Var(zi), from the first stage x1i = δ0 + δ1zi + vi
121
What is the OLS formula for φ1 (the reduced-form coefficient)?
φ̂1 = Cov(zi, yi) / Var(zi)
122
What is the β1 2SLS estimator?
β̂1,2SLS = Cov(zi, yi) / Cov(zi, x1i)

Cov(z, y): z and y from the reduced form
Cov(z, x): z and x from the first stage
123
What is β1 2SLS also equivalent to?
β̂1,2SLS = φ̂1 / δ̂1 = Cov(zi, yi) / Cov(zi, x1i)
124
When z is a binary variable, what does the β1 2SLS estimator simplify to, and what is it called?
β̂1,2SLS = Cov(zi, yi) / Cov(zi, x1i) simplifies to

**(ȳi,z=1 − ȳi,z=0) / (x̄i,z=1 − x̄i,z=0)**

This is the Wald estimator. (The notation w̄i,z=c denotes the average of w for the subsample with zi = c.)
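A simulation sketch with a binary instrument: the Wald ratio of mean differences equals Cov(z,y)/Cov(z,x), and both recover the true effect despite a confounder (all data and coefficients invented):

```python
# Sketch: Wald estimator vs the covariance-ratio IV estimator with a
# binary instrument z (true beta1 = 2; c is an unobserved confounder).
import numpy as np

rng = np.random.default_rng(7)
n = 10000
z = rng.integers(0, 2, size=n)
c = rng.normal(size=n)                            # unobserved confounder
x = 0.5 * z + 0.8 * c + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * c + rng.normal(size=n)

wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
iv = np.cov(z, y, ddof=0)[0, 1] / np.cov(z, x, ddof=0)[0, 1]
```

The two expressions are algebraically identical for a binary z, while plain OLS of y on x would be biased upward by c.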
125
Why does the exogeneity assumption work?
If exogeneity is true, we have Cov(x̂1i, ui) = Cov(δ̂0 + δ̂1zi, ui) = δ̂1·Cov(zi, ui) = 0. The final equality is because we assume exogeneity, Cov(zi, ui) = 0.

Essentially, the predicted values from the first stage represent the "portion" of x1i that is uncorrelated with ui (i.e., we isolate the portion of x1i that is uncorrelated with all unobserved determinants of the outcome, and thus there are no confounders). We attain a representative estimate of β1.
126
Why does relevance work?
Relevance means δ̂1 ≠ 0 because Cov(zi, x1i) ≠ 0. With this, Var(x̂1i) ≠ 0, and we can create estimates of β1 in the second stage (i.e., to calculate β̂1 = Cov(x̂1i, yi) / Var(x̂1i), we need Var(x̂1i) to be nonzero).
127
128
How do we attain a small standard error for B1?
We need 'strong relevance': the statistical relationship between zi and x1i is strong (δ1 statistically significant, or a high R² in the first-stage regression).
129
What is the formula for the standard error estimate of B1 in the second stage?
se(β̂1) = √[(1/n) · Var(û) / Var(x̂1i)]

We need a large Var(x̂1i) to attain a low standard error for β̂1 in the second stage. The **residual is defined using the original data**.
130
What is the formula for the standard error of d1 in the first stage? What do we need for a small standard error
The formula for the standard error of δ̂1 in the first stage is √[(1/n) · Var(v̂) / Var(z)]. To attain a small standard error for δ̂1 in the first stage, we need a small Var(v̂).
131
How are Var(x̂1i) and Var(v̂i) related? Show, starting from Var(x1i) = Var(δ̂0 + δ̂1zi + v̂i)
Var(x1i) = Var(δ̂0 + δ̂1zi + v̂i)

Var(x1i) = Var(δ̂0 + δ̂1zi) + Var(v̂i) (predicted values and residuals are uncorrelated)

Var(x1i) = Var(x̂1i) + Var(v̂i)

The key to notice is that if Var(x̂1i) is large, we will have a small Var(v̂). In such a case, δ̂1 has a small standard error (and is more likely to be significant), and β̂1 will also have a smaller standard error.
132
How do we test for the significance of δ̂1? If it is large enough for significance, what does this imply about the other quantities?
To test for the significance of δ̂1 we consider the t-stat, |δ̂1 − 0| / se(δ̂1). If this is large enough for significance, it is because se(δ̂1) is small, which means Var(v̂) is small, and thus Var(x̂1i) is big and se(β̂1) is small.
133
What is the key takeaway for attaining small standard errors in the second stage?
The key takeaway is that we need a statistically significant coefficient for the instrument in the first stage to attain small standard errors in the second stage.
134
How do we generalise the 2SLS method to multiple variables, controls and instruments?
1. Estimate a **separate first stage for each regressor** that might cause bias, with that regressor as the outcome. 2. **Include all instruments and controls that do not cause bias, while omitting any that could cause bias in the second stage.** 3. Create predicted values from each first-stage regression and plug them into the equation of interest for the second stage
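For a single endogenous regressor, the two stages can be sketched in numpy (the simulated model, its coefficients, and the confounder are all made up for illustration; the true β1 is 1.5):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
z = rng.normal(size=n)                      # instrument
c = rng.normal(size=n)                      # unobserved confounder
x1 = 1.0 + 2.0 * z + c + rng.normal(size=n)
y = 3.0 + 1.5 * x1 + 2.0 * c + rng.normal(size=n)   # true beta1 = 1.5

def ols(x, y):
    """OLS with an intercept; returns [b0, b1]."""
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_ols = ols(x1, y)[1]        # biased upward by the confounder (~1.83)
d = ols(z, x1)               # first stage: x1 on the instrument
x1_hat = d[0] + d[1] * z     # predicted values
b_2sls = ols(x1_hat, y)[1]   # second stage: close to the true 1.5
```

The manual second stage gives the right point estimate; in practice a 2SLS routine is used so the standard errors are computed with the original x1 rather than the predicted values.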
135
Do relevance and exogeneity differ for generalised IV?
Analogous: (1) Relevance: each **regressor with a bias concern is correlated with the instruments.** (2) Exogeneity: **Cov(zji, ui) = 0 for all instruments j.**
136
When you have multiple regressors instrumented, is there another IV assumption?
**Rank,** we **need at least as many instruments as regressors that are instrumented**. The intuition is that each instrument can only β€œfix” one regressor, although if we have extra instruments, that is beneficial because it creates more variation in the regressors (see relevance).
137
Why must we have rank? (at least one instrument assigned to each regressor that is instrumented)
If we have two instrumented regressors and two instruments, with one of the instruments correlated with both instrumented regressors, and the other instrument uncorrelated with both instrumented regressors, then estimates will have large standard errors. Essentially** relevance β€œfails”, because the one instrument that is relevant is not able to create enough variation in both regressors.**
138
What is an overidentified instrumental variable estimation
If we have more instruments than regressors that are instrumented
139
What is an estimation that is identified or exactly identified
if there are the same number of instruments as regressors that are instrumented
140
Reverse causality means
Only y has a causal effect on x
141
Simultaneity means
x has a causal effect on y and, also, y has a causal effect on x
142
What is the issue with drawing supply and demand curves?
We only observe price–quantity pairs; we don't observe the supply and demand curves directly
143
If we estimate the demand curve by OLS, what would be the result?
Regression with ln(P) as the outcome and ln(Q) as the regressor results in a single regression line
144
Why doesn't the OLS regression of ln(Pi) = αs + βs·ln(Qi) + ηsi work?
They are simultaneous equations: P and Q are simultaneously determined by each other, and there are two structural relationships between P and Q, which a single regression line cannot separate
145
What is a structural equation
They show the structure (theory) of the system
146
Key Points: - Assume a relationship - Simultaneity issue; explain with a second equation - Data is realised pairs - OLS doesn't isolate a single mechanism; it 'bundles both' - Describe the bias
147
What does the shifts in blue lines represent?
Shifts in the blue lines represent changes in ui, **which causes changes in y1i holding fixed y2i**
148
What do shifts in the red line suggest?
Shifts in the red line represent changes in vi, which causes **changes in y2i holding fixed y1i**
149
How can we solve our issue with simultaneous data
Find an instrument that causes variation in one channel, holding fixed the other channel. I.e., to estimate α1, we need to create variation that holds fixed the relationship defined by α1 (see below).
150
Given two simultaneous equations, create a potential instrument and estimate α1: y1i = β0 + β1·y2i + β2·x2i + ui; y2i = α0 + α1·y1i + α2·w2i + vi
A potential instrument is x2; we must make the exogeneity assumption that Cov(x2i, vi) = 0. The first-stage equation is y1i = γ0 + γ1·w2i + γ2·x2i + ηi, including x2 as the instrument and w2 as a control. We calculate predicted values and use them in the equation of interest, then estimate with OLS: y2i = α0 + α1·ŷ1i + α2·w2i + vi. The key concept is that the instrument creates variation that holds fixed one channel (i.e., vi is assumed to be uncorrelated with x2i, so vi doesn't change on average when x2i changes, which means we are holding fixed the red line).
151
Definition of an **endogenous variable**
**Determined within the system** Any regressor that causes bias
152
Definition of an **exogenous variable**
**Taken as given, not from the system** Variable does not cause bias as a regressor in the OLS
153
Definition of an **identified parameter**
A parameter that can be learned from an infinite amount of data
154
Definition of an **unidentified parameter**
Cannot be learnt, even with an infinite amount of data
155
Definition of a reduced form equation
The principle of a reduced form equation is heuristically β€œan outcome we care about is on the left and variables that do not cause bias are on the right.” More formally we define a reduced form by, β€œan endogenous outcome is on the left and exogenous variables are on the right.”
156
How do we derive the reduced form equation for structural equations?
Derive reduced form equations by plugging one structural equation into the other and simplifying
157
Derive the reduced form equations for: y1i = β0 + β1·y2i + β2·x2i + ui; y2i = α0 + α1·y1i + α2·w2i + vi
Substitute the second equation into the first: y1i = β0 + β1(α0 + α1·y1i + α2·w2i + vi) + β2·x2i + ui, so (1 − β1α1)·y1i = (β0 + β1α0) + β1α2·w2i + β2·x2i + (ui + β1·vi), giving y1i = π10 + π11·w2i + π12·x2i + ε1i with π11 = β1α2/(1 − β1α1) and π12 = β2/(1 − β1α1). Similarly, y2i = π20 + π21·w2i + π22·x2i + ε2i with π21 = α2/(1 − β1α1) and π22 = α1β2/(1 − β1α1). These reduced forms can be estimated by OLS, unlike the structural equations of interest
158
How could we use the structural form equation to solve the parameters of interest in the reduced form coefficients
The reduced-form coefficients on the same instrument, divided across the two equations, can be used to estimate each structural parameter. E.g. the coefficient on x2 in the y2 reduced form divided by the coefficient on x2 in the y1 reduced form equals α1; likewise the ratio of the coefficients on w2 (y1 equation over y2 equation) equals β1. Dividing strips out the scale of the instrument and isolates the causal link
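This "indirect least squares" idea can be sketched numerically for the two-equation system above (all structural coefficients here are made-up illustration values):

```python
import numpy as np

# Structural system: y1 = b1*y2 + b2*x2 + u,  y2 = a1*y1 + a2*w2 + v
rng = np.random.default_rng(2)
n = 100_000
b1, b2, a1, a2 = 0.5, 1.0, -0.4, 1.0
x2, w2 = rng.normal(size=n), rng.normal(size=n)
u, v = rng.normal(size=n), rng.normal(size=n)
det = 1 - b1 * a1                      # solve the system for equilibrium values
y1 = (b1 * a2 * w2 + b2 * x2 + u + b1 * v) / det
y2 = (a2 * w2 + a1 * b2 * x2 + v + a1 * u) / det

X = np.column_stack([np.ones(n), w2, x2])
pi1 = np.linalg.lstsq(X, y1, rcond=None)[0]   # reduced form for y1
pi2 = np.linalg.lstsq(X, y2, rcond=None)[0]   # reduced form for y2
a1_hat = pi2[2] / pi1[2]   # coef on x2 in y2 eq / coef on x2 in y1 eq -> a1
b1_hat = pi1[1] / pi2[1]   # coef on w2 in y1 eq / coef on w2 in y2 eq -> b1
```

Both reduced forms are OLS regressions of an endogenous outcome on exogenous variables only, which is what makes this legitimate.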
159
If we are in an overidentified case, what do we use to create estimates
We use 2SLS
160
If we do not have an instrument for a regressor, could we still use reduced-form coefficients to solve for the parameter of interest?
No, we couldn't; the equation is unidentified
161
What is cross-sectional data?
Data on several individuals at a single point in time
162
What notation did we use for cross sectional data?
yi = β0 + β1·x1i + β2·x2i + vi, for which the subscript i denotes individual i.
163
What is Panel Data
Observe data for several individuals, and observe each individual at several points in time e.g. N individuals for T time periods
164
What notation do we use for panel data
y_it = β0 + β1·x1it + β2·x2it + v_it. The subscript it denotes individual i at time t.
165
How do we control for time-invariant effects with the error term in panel data?
Decompose the error term v_it into **a_i, representing a time-invariant piece, and u_it, representing a time-varying piece.** a_i is essentially the effect of being individual i
166
What is the concern with ai (time-invariant effects)
It is unobserved and possibly correlated with our regressors → a confounder. The concern is always present, but we discuss it with panel data because the data are rich enough to let us overcome it
167
How do we overcome the confounder of time invariant effects?
1. First Differences 2. Fixed Effects
168
How do we do **First Differences** to overcome time-invariant effects?
For any variable, w, define the notation Δw_it := w_it − w_i,t−1. 1. Difference the model: Δy_it = y_it − y_i,t−1 = (β0 + β1·x1it + β2·x2it + a_i + u_it) − (β0 + β1·x1i,t−1 + β2·x2i,t−1 + a_i + u_i,t−1). 2. Thus Δy_it = β1·Δx1it + β2·Δx2it + Δu_it. **a_i is differenced away**, so a_i is no longer a confounder: it does not affect the change across time periods in either treatment or outcome
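A minimal numpy sketch of first differencing (the simulated panel and its coefficients are made up; the true β1 is 2, and the confounder a_i biases pooled OLS upward):

```python
import numpy as np

rng = np.random.default_rng(3)
n_i, n_t = 2_000, 5
a = rng.normal(size=(n_i, 1)) * 2            # time-invariant effect a_i
x = a + rng.normal(size=(n_i, n_t))          # regressor correlated with a_i
y = 1.0 + 2.0 * x + a + rng.normal(size=(n_i, n_t))   # true beta1 = 2

def ols_slope(x, y):
    """Pooled OLS slope with an intercept."""
    X = np.column_stack([np.ones(x.size), x.ravel()])
    return np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

b_pooled = ols_slope(x, y)                   # biased: a_i sits in the error term
dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)   # within-individual differences
b_fd = ols_slope(dx, dy)                     # a_i differenced away: close to 2
```

Note the differenced data has one fewer observation per individual, which is part of the cost of the method.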
169
Does only ai get removed by differencing?
No, any time-invariant effects (the constant, other time-invariant variables) are removed. The cost of removing the bias is that we lose the effects of all these time-invariant variables
170
How do we normally overcome bias due to confounders
We take it out of the error term and include it directly in the model. We can do an analogous operation for time-invariant effects through dummy variables (fixed effects)
171
How do we perform **fixed effects** to overcome time-invariant effects
We include a dummy variable, δi, for every individual except one (we must exclude one dummy variable to avoid perfect collinearity due to the dummy variable trap). The individual without a dummy variable in the model is often called the "omitted group" or "comparison group."
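The dummy-variable implementation can be sketched in numpy (simulated panel with made-up values; the true β1 is 2, and individual 0 is the omitted group):

```python
import numpy as np

rng = np.random.default_rng(4)
n_i, n_t = 200, 4
a = rng.normal(size=(n_i, 1)) * 2                    # time-invariant effect a_i
x = a + rng.normal(size=(n_i, n_t))                  # regressor correlated with a_i
y = 1.0 + 2.0 * x + a + rng.normal(size=(n_i, n_t))  # true beta1 = 2

ids = np.repeat(np.arange(n_i), n_t)                 # individual id per row
# One dummy column per individual 1..n_i-1 (individual 0 is the omitted group)
D = (ids[:, None] == np.arange(1, n_i)[None, :]).astype(float)
X = np.column_stack([np.ones(ids.size), x.ravel(), D])
b = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
b1_fe = b[1]    # close to 2: the a_i are absorbed by the dummies
```

The remaining coefficients b[2:] are the estimated individual effects relative to the omitted group.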
172
Interpret coefficients for the fixed effects formula
β1 and β2 are "the average change in the outcome associated with x1 (or x2) increasing by 1, holding fixed all other x and holding fixed who the individual is." β0 is "the expected outcome for the omitted group when all x are 0." a_i is "the average change in the outcome associated with being individual i compared to the omitted group, holding fixed all x."
173
Can we include time-invariant regressors in a fixed effects regression
No, including the variable **violates no perfect collinearity**. The heuristic explanation is that the dummy variable, δi, and effect a_i, capture the effect of all time-invariant characteristics of person i. If a variable, x2i, does not change over time, we can lump its effect in with a_i, and we cannot separately estimate its effect.
174
What is the cost of removing time-invariant bias with fixed effects
Again, we can't estimate the effect of any time-invariant variable; all time-invariant variables must be excluded
175
How do econometricians generally use the term fixed effects?
They use it to refer to any situation in which dummy variables are included for all possible values of a variable; it is commonly applied to time periods
176
What are two way fixed effects?
When we control for both individual and time fixed effects
177
Is first differences or fixed effects more common
For the purpose of the exam, just know that first differences and fixed effects are two methods of overcoming the bias caused by π‘Žπ‘–. Know the mechanics as described above. In practice, fixed effects is more common because of the β€œsimplicity” of implementation and because of the desirability of directly estimating the π‘Žπ‘–.
178
What is the ideal method to estimate a treatment effect
in a randomised trial
179
Given two groups, one treated in time period 2 and one not, how do we estimate the treatment effect? Let y_it denote the outcome of interest for individual i at time t. We write the model y_it = δ·D_it + γi + λt + u_it, where γi is the time-invariant effect of being individual i, λt is the effect of being time t (the same for all individuals), and δ is the effect of treatment, with D_it denoting a binary variable that takes the value 1 if individual i is treated at time t (D_it = 1 only for group 2 at time 2).
Use DiD: δ̂ = ȳ22 − ȳ21 − (ȳ12 − ȳ11). Treatment effect = (average post treatment − average pre treatment) − (average post control − average pre control)
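A worked numeric example with made-up cell means (group 1 = control, group 2 = treated; time 1 = pre, time 2 = post):

```python
# Hypothetical group-time cell means
y11, y12 = 10.0, 12.0   # control: pre, post
y21, y22 = 11.0, 16.0   # treated: pre, post

# DiD: treated change minus control change
did = (y22 - y21) - (y12 - y11)
print(did)  # 3.0: treated rose by 5, but 2 of that is the common time trend
```

The subtraction of the control-group change is what nets out the shared time trend.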
180
How does the DiD estimator work? consider expectations
E[δ̂] = E[ȳ22 − ȳ21 − (ȳ12 − ȳ11)] = E[δ(1) + γ2 + λ2 + u22] − E[δ(0) + γ2 + λ1 + u21] − (E[δ(0) + γ1 + λ2 + u12] − E[δ(0) + γ1 + λ1 + u11]). The γi and λt terms cancel, leaving: = δ + E[u22 − u21 − (u12 − u11)]. We assume the errors are all mean-zero, so E[δ̂] = δ: the DiD estimator is unbiased
181
Explain the key assumption of parallel trends
The change in the outcome in the control group between time periods is what would have happened in the treatment group in the absence of treatment; i.e., the observed change in the control group equals the counterfactual change in the treatment group absent treatment
182
How is parallel trends shown to be crucial to the DiD mathematically
It is shown by **λt being the same for both treated and control**, which is why the λ2 and λ1 terms cancelled above. Consider if the effect of time differed across groups (λgt): then E[ȳ22 − ȳ21 − (ȳ12 − ȳ11)] = δ + (λ22 − λ21) − (λ12 − λ11), so we **estimate the effect of treatment bundled with the time trend in the treated group minus the time trend in the control group. The estimator of the treatment effect is biased by the difference in the time trends.**
183
How is parallel trends shown graphically. How do we use it tto find the treatment graphically? What does it look like graphically if the assumption fails
184
Express the DiD assumption mathematically. Let y_it(1) denote the potential outcome when i is treated at time t and y_it(0) denote the potential outcome when i is untreated at time t.
E[y22(0) − y21(0)] = E[y12(0) − y11(0)]
185
How else can the difference in difference estimator be implemented?
It can be implemented using a regression with dummy variables, and interactions
186
187
Implement the difference in difference estimator using a regression
y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
188
Interpret the coefficients of the regression of the DiD
y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it. Treated_it is a binary variable that takes the value 1 if an individual is in the treated group and 0 otherwise. Post_it is a binary variable that takes the value 1 for the post-treatment time period(s) and 0 otherwise. Treated_it·Post_it is an interaction: it is binary, taking the value 1 for the treated group post-treatment and 0 otherwise. ∂y_it/∂Treated_it = β1 + β3·Post_it: * β1 is the **pre-treatment average change in the outcome associated with being the treated group compared to the control group**. β3 is the **average change in this association after treatment time** (presumably, β3 is the effect of treatment). ∂y_it/∂Post_it = β2 + β3·Treated_it: * β2 is the **average change in the outcome associated with being in the post-treatment time period compared to the pre-treatment time period for the control group**. β3 is the **average difference in this for the treatment group** (presumably, β3 is the effect of treatment).
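The regression version can be verified on the four group-time cells (hypothetical cell means: control pre/post = 10/12, treated pre/post = 11/16, so the DiD is 3):

```python
import numpy as np

# One observation per cell: (control,pre), (control,post), (treated,pre), (treated,post)
treated = np.array([0.0, 0.0, 1.0, 1.0])
post = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([10.0, 12.0, 11.0, 16.0])

X = np.column_stack([np.ones(4), treated, post, treated * post])
b = np.linalg.solve(X.T @ X, X.T @ y)   # exact fit: the model is saturated
# b = [b0, b1, b2, b3] = [10, 1, 2, 3]; b3 matches the DiD of the means
```

Because the model is saturated (four parameters, four cells), the coefficient on the interaction is exactly the difference-in-differences of the cell means.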
189
What is β0 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
* β0 is the average y in the before period for the control group.
190
What is β0 + β1 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
β0 + β1 is the average y in the before period for the treated group.
191
What is β0 + β2 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
β0 + β2 is the average y in the after period for the control group.
192
What is β0 + β1 + β2 + β3 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
The average y in the after period for the treated group
193
What is β3 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
β3 is the effect of being in the treated group in the post time period (the effect of treatment, previously written as δ).
194
Given the definitions of the coefficients in the DiD regression, write the DiD estimator to prove it works
(β0 + β1 + β2 + β3 − (β0 + β1)) − (β0 + β2 − β0) = β3
195
What kind of method is regression discontinuity
It is, like DiD, an "identification strategy": using non-experimental observational data to estimate causal effects
196
How to set up regression discontinuity
We have a multi-valued or continuous variable, xi, and a binary variable, Di, for which: * Di = 1 if xi ≥ x0 * Di = 0 if xi < x0. xi is called the running or forcing variable; the goal is to estimate the causal effect of Di on yi
197
What is the running or forcing variable?
The multi-valued or continuous variable whose value determines whether the binary variable Di is 0 or 1
198
What is the regression for a regression discontinuity design, given a multi-valued or continuous variable, xi, and a binary variable, Di, for which: * Di = 1 if xi ≥ x0 * Di = 0 if xi < x0?
We estimate a regression of the form: yi = β0 + β1(xi − x0) + β2·Di + β3(xi − x0)·Di + ui
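A simulated sketch of this regression in numpy (the cutoff, coefficients, and noise level are all made-up illustration values; the true jump at the cutoff is 2):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
x0 = 1.0                                  # hypothetical cutoff
x = rng.uniform(x0 - 1, x0 + 1, size=n)   # running variable
D = (x >= x0).astype(float)               # treatment indicator
y = (0.5 + 0.8 * (x - x0) + 2.0 * D + 0.3 * (x - x0) * D
     + rng.normal(scale=0.5, size=n))     # true discontinuity = 2

# Regression: y = b0 + b1*(x - x0) + b2*D + b3*(x - x0)*D + u
X = np.column_stack([np.ones(n), x - x0, D, (x - x0) * D])
b = np.linalg.lstsq(X, y, rcond=None)[0]
jump = b[2]   # close to the true discontinuity of 2 at x = x0
```

Because x is centred on x0, the coefficient on D directly measures the jump at the cutoff.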
199
Interpret the coefficients for the regression of the regression discontinuity design
yi = β0 + β1(xi − x0) + β2·Di + β3(xi − x0)·Di + ui. β1 is "the average change in yi associated with xi increasing by 1 when xi < x0." β1 + β3 is "the average change in yi associated with xi increasing by 1 when xi ≥ x0." β2 is "the average change in yi when D increases from 0 to 1 at xi = x0." (Note that this interpretation relies on centring xi on x0.)
200
Draw the regression discontinuity design
201
What happens if we do not centre on X0
The estimates of the constant and of the coefficient on the binary variable then refer to x = 0 rather than to the cutoff: without centring, the jump at the cutoff is β2 + β3·x0 rather than β2 alone, so the coefficients no longer directly measure the discontinuity and interpretation is more complicated
202
What is the key to implementing a regression discontinuity
The key to implementing a regression discontinuity is that we only use data for which x is within a "window" or "bandwidth" of the threshold, x0. That is, we use data for which xi ∈ [x0 − h, x0 + h].
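The windowing step is just a mask on the running variable; a tiny sketch (the cutoff x0 and bandwidth h are made up):

```python
import numpy as np

x = np.arange(11, dtype=float)        # running variable values 0..10
x0, h = 5.0, 2.0                      # hypothetical cutoff and bandwidth
in_window = np.abs(x - x0) <= h       # keep only x in [x0 - h, x0 + h]
x_window = x[in_window]               # [3, 4, 5, 6, 7]
```

Only observations inside the window enter the RD regression; everything else is discarded before estimation.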
203
What is the critical assumption for regression discontinuity
Individuals who receive treatment are comparable to individuals who did not receive treatment (i.e., we assume that D is as good as randomly assigned and there are no confounders). Part of this assumption is that individuals cannot choose which side of the cutoff they are on, AND the bandwidth is small
204
What do we have to do for the regression discontinuity assumption of comparable treatment and control groups to be believable?
We thus (1) restrict the data to observations with x in a bandwidth around x0, and (2) only assume that, within that bandwidth of x around the threshold, treatment is as good as randomly assigned.
205
Why do we include the terms β1(xi − x0) and β3(xi − x0)·Di in the regression?
They allow for estimating the linear effect of x on the outcome. Excluding them can induce bias: if we only estimated yi = β0 + β2·Di + ui, the estimate of β2 could be contaminated by the association of x with y
206
What does increasing the bandwith do
* more data is used * SEs are smaller when the sample size is larger * it is more likely the assumption is violated
207
What is the trade-off with increasing/decreasing the bandwidth
Increasing the bandwidth increases bias but reduces variance; decreasing the bandwidth decreases bias but increases variance. In practice we use multiple bandwidths and check whether the estimates are sensitive to the choice
208
We include the terms β1(xi − x0) and β3(xi − x0)·Di to capture the linear relationship of x with y. But what if the relationship is nonlinear? (In the referenced figure, the left panel shows the true treatment effect, while the right shows the smaller effect estimated by a linear regression.)
We overcome this by augmenting the regression equation with nonlinear terms, e.g. a quadratic. We need it on both sides of the jump, so it appears twice: yi = β0 + β1(xi − x0) + β2(xi − x0)² + β3·Di + β4(xi − x0)·Di + β5(xi − x0)²·Di + ui
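A simulated sketch of the quadratic specification (the curvature, jump of 1.5, and noise level are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000
x0 = 0.0
x = rng.uniform(-1, 1, size=n)
D = (x >= x0).astype(float)
# Curved relationship on both sides of the cutoff, plus a true jump of 1.5
y = (1.0 + 0.5 * x + 2.0 * x**2 + 1.5 * D - 1.0 * x**2 * D
     + rng.normal(scale=0.3, size=n))

xc = x - x0
# Quadratic terms appear twice: once on their own, once interacted with D
X = np.column_stack([np.ones(n), xc, xc**2, D, xc * D, xc**2 * D])
b = np.linalg.lstsq(X, y, rcond=None)[0]
jump = b[3]   # the coefficient on D estimates the discontinuity (~1.5)
```

The interacted quadratic term lets the curvature differ on each side of the cutoff, so the fitted jump is not distorted by the nonlinearity.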
209
How do we know what degree of polynomial to use?
Just graph the data. If it looks linear, use linear. If it looks quadratic, use quadratic. In a small enough bandwidth, by Taylor’s theorem all polynomials can be approximated by a line, so linear is all that is needed.
210
How do we test the RD assumption
1. **Density**: if there are more individuals just above or just below the threshold, it is likely they are choosing their side, so treatment is not as good as randomly assigned. 2. **Covariate values**: if individuals above and below the threshold have similar average values of observable covariates, then they look similar and the assumption is more believable