EC2C3 Notes Flashcards

(210 cards)

1
Q

Parameter Def

A

Fixed number describing the population

2
Q

Estimate

A

To estimate a parameter means to create an estimate, which is a value used to infer what the parameter is

3
Q

Estimator

A

Method (formula) to create an estimate

e.g. the sample average (sum of y values / n) is an estimator; applying it to data creates an estimate

4
Q

What is OLS, what are the estimates from it?

A

OLS is an estimator

The estimates it produces are β̂0 and β̂1

5
Q

Consider the equation

yi = β0 + β1x1i + β2x2i + ui

Explain each item in it

A

y is the outcome variable

x1i, x2i: regressors, independent variables. We are primarily interested in one specific regressor and its effect on the outcome, known as the regressor of interest

β0: constant, the intercept when all regressors are 0

ui: error term, the effect of all other determinants of y. (Its sample estimate is the residual: the difference between OLS-predicted y and observed y)

6
Q

What is the effect of random assignment

A

There will be no confounders as nothing is correlated with randomised treatment

7
Q

What does it mean for treatment to be 'as good as randomly assigned' rather than directly randomised?

A

As good as randomly assigned is if we do not directly randomise treatment, but the treatment is still uncorrelated with all other determinants of the outcome.

e.g. rainfall is as good as random by nature

8
Q

Variance and standard deviation are what

A

Measures of spread

standard deviation is the square root of the variance

9
Q

What do the variance and standard deviation of y measure?

A

Measure the spread of the values of that variable either in the sample or in population

10
Q

What do the variance and standard deviation of an estimate (e.g. sample average, regression coefficient) measure?

A

Measure the spread of that estimate across repeated samples

sd(β̂1): 'if we repeatedly drew data and created an estimate β̂1 using each sample, how spread out would those estimates be?'

11
Q

standard error meaning

A

estimate of the standard deviation of an estimate

Calculating a standard error requires plugging in sample estimates

12
Q

What is a counterfactual

A

Counterfactual is the outcome that would have been observed under another treatment status that didn’t happen.

This is unobserved e.g. the treatment group in the absence of treatment

13
Q

Define Bias

A

Difference between our estimate of the causal effect and the true causal effect

Average potential outcome in absence of treatment for the treated − average potential outcome in absence of treatment for the control

(ȳ0,D=1 − ȳ0,D=0)

14
Q

What is the ‘Sample Average Treatment Effect on the Treated’

A

Avg. Potential outcome of treatment for treated individuals - Avg. Potential outcome in absence of treatment for treated individuals

(ȳ1,D=1) − (ȳ0,D=1)

15
Q

What is the bias if we randomly assign treatment?

A

Bias in sample is expected to be 0

Thus

E(potential outcome in absence of treatment | treated individuals) = E(potential outcome in absence of treatment | control group)

E(y0i | D=1) = E(y0i | D=0)

16
Q

What is the notation for a potential outcome?

A

y1i = potential outcome if treatment 1

y0i = potential outcome if treatment 0

17
Q

Given no bias, what is the Average Treatment Effect (ATE)

A

ȳ1i − ȳ0i is the average treatment effect for everyone in the sample

E( potential outcome of treatment - potential outcome in absence of treatment)

ATE=ATT

18
Q

How do we justify that treatment is randomly assigned

A

Comparing characteristics of treatment and control individuals

If they look the same on observable characteristics, it is reasonable to claim they would be similar on unobservable characteristics –> thus no bias

19
Q

How do we compare characteristics of treatment and control individuals

A

Through a t-test on the difference in means

e.g. x̄1 − x̄0

to evaluate whether treated and control look the same

20
Q

How would you do a comparison of means of treatment and control?

A

Null hypothesis: the difference in means is 0

The t-test is done by dividing the difference in means by the standard error of the difference in means

21
Q

What does Var(X − Y) equal?

If so, calculate SE(x̄1 − x̄0)

A

Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y)

For independent samples Cov(x̄1, x̄0) = 0, so SE(x̄1 − x̄0) = √(se(x̄1)² + se(x̄0)²)
22
Q

Interpret all the terms in linear regression:

yi = β0 + β1x1i + β2x2i + ui

A

yi: outcome for observation i

β0: expected value of y if x1 = 0 and x2 = 0

β1 is the average change in yi associated with x1i increasing by 1, holding fixed all other x (in this example, just x2).

β2 is the average change in yi associated with x2i increasing by 1, holding fixed all other x (in this example, just x1).

ui is the effect of all factors other than x1 and x2 on y.

23
Q

How do we estimate parameters of our linear regression form?

A

We use OLS, which chooses coefficients that minimise the sum of squared residuals

with a single regressor we have formulas for parameters that solve the minimisation problem

24
Q

What are the formulas that solve the OLS minimisation, when we have a single regressor

A

β̂1 = Cov(x1, y) / Var(x1)

β̂0 = ȳ − β̂1·x̄

These formulas only apply if there is a single regressor.

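A minimal numpy sketch of these single-regressor formulas on simulated data (the true coefficients 2.0 and 3.0 and all data are invented for illustration):

```python
# Sketch: single-regressor OLS estimates via the covariance/variance
# formulas above, on simulated data (true beta0 = 2, beta1 = 3).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 + 3.0 * x + rng.normal(size=500)

b1_hat = np.cov(x, y, ddof=0)[0, 1] / np.var(x)  # beta1-hat = Cov(x, y) / Var(x)
b0_hat = y.mean() - b1_hat * x.mean()            # beta0-hat = ybar - beta1-hat * xbar
```

With 500 draws the estimates should land close to the true 2 and 3, differing only by sampling error.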
25
What does the hat in OLS estimates symbolise?
The hat denotes that the estimate may differ from the true population value. If we have population data, we can write the minimising values as just β0 and β1 because we know there is no sampling error (we often still write the hats to denote that we applied the OLS method). With population data we are estimating the exact "true" values (the best approximation of the CEF).
26
What does OLS best approximate?
The conditional expectation function
27
How do we estimate the error? What is it called?
The residual is the estimate of the error: ûi = yi − β̂0 − β̂1x1i − β̂2x2i, or ûi = yi − ŷi
28
OLS is done by minimising the sum of squared residuals. When this is done, which mechanical properties of the residuals hold?
1. **Expectation of the residual is 0**: E[ûi] = 0
2. Residuals are orthogonal (uncorrelated in sample) to each regressor: E[x1i·ûi] = 0, E[x2i·ûi] = 0
3. **Covariance of residual and regressor is 0**: thus Cov(x1i, ûi) = 0 and Cov(x2i, ûi) = 0
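These mechanical properties can be verified numerically; a sketch using numpy's least-squares solver as the OLS estimator (all data simulated):

```python
# Sketch: OLS residuals are mean-zero and uncorrelated with each regressor,
# regardless of the data (mechanical property of the minimisation).
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS coefficients
u_hat = y - X @ beta_hat                         # residuals

mean_resid = u_hat.mean()                        # should be ~0
cov_x1_u = np.cov(x1, u_hat, ddof=0)[0, 1]       # should be ~0
cov_x2_u = np.cov(x2, u_hat, ddof=0)[0, 1]       # should be ~0
```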
29
How is Covariance defined?
Cov(X,Y) = E(XY) - E(X)E(Y)
30
Do the properties of the residual (û) that follow from OLS still hold if the true (unobserved) error (u) does not satisfy those properties?
Yes, the mechanical properties still hold. An omitted variable will still be part of the true error (ui) and may be correlated with regressors, but OLS will create regression estimates such that the residuals (û) are uncorrelated with the regressors in the model.
31
What is the true unobserved error
It is defined as ui = yi − E(yi | Xi): the gap between the observed value and the population expectation given Xi. Endogeneity comes from ui and the regressors being correlated.
32
What is the issue then with omitted variables if OLS makes the residual uncorrelated with the regressors in the model?
Residuals will not be representative of the true effect of all factors other than x1 and x2 on y, due to the bias caused by the confounder contaminating the regression coefficients and the error.
33
What is the formula for the standard error of β̂1 in a regression with just one regressor, x?
se(β̂1) = √[(1/n) · Var(ûi) / Var(xi)]
34
What is heteroskedasticity?
It means the variance of the error term changes with x
35
What is homoskedasticity?
Variance of the error term does not change with x
36
What is the issue with heteroskedasticity?
The baseline standard error formula is incorrect because the variance of the error is related to x. Hence we use the robust standard error formula:

se(β̂1) = √[(1/n) · Var((xi − E[xi])·ûi) / Var(xi)²]

It is used as the **default**.
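A sketch of the robust formula on simulated heteroskedastic data (the error's spread is made to grow with |x|; all values invented):

```python
# Sketch: heteroskedasticity-robust standard error for beta1-hat in a
# single-regressor model, following the formula above.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
u = rng.normal(size=n) * (1 + np.abs(x))  # error variance changes with x
y = 1.0 + 2.0 * x + u

b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

# Robust SE: sqrt((1/n) * Var((x - E[x]) * u_hat) / Var(x)^2)
se_robust = np.sqrt((1 / n) * np.var((x - x.mean()) * u_hat) / np.var(x) ** 2)
```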
37
How do we set up a hypothesis test for a coefficient in our linear regression?
Consider a null hypothesis of H0: β1 = c and alternative H1: β1 ≠ c. Typically, c = 0.
38
How do we perform the t-test calculation?
Calculate the t-stat: t-stat = (β̂1 − c) / se(β̂1), i.e. (estimate − null-hypothesis value) / standard error. If the absolute value of the t-stat is greater than 1.96, we reject the null hypothesis (for α = 0.05).
39
What is the idea behind a confidence interval
The idea of a 95% confidence interval is: "If we repeatedly gathered data, created estimates, and created confidence intervals, 95% of those confidence intervals would contain the true value of β1."

If the value associated with the null hypothesis, c, is in the interior of the confidence interval, then we fail to reject H0. If the hypothesised value is not in the interior of the confidence interval, we reject H0.

The 95% confidence interval gives us the set of values, d, for which we would fail to reject H0: β1 = d.
40
construct a 95% confidence level for B1 hat
95% CI = [β̂1 − 1.96·se(β̂1), β̂1 + 1.96·se(β̂1)]
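A small worked example of the t-stat and interval arithmetic (the estimate 0.42 and standard error 0.10 are hypothetical numbers):

```python
# Hypothetical estimate and standard error; test H0: beta1 = 0
b1_hat, se_b1, c = 0.42, 0.10, 0.0

t_stat = (b1_hat - c) / se_b1                           # 4.2
ci_95 = (b1_hat - 1.96 * se_b1, b1_hat + 1.96 * se_b1)  # roughly (0.224, 0.616)
reject = abs(t_stat) > 1.96                             # reject H0 at the 5% level
```

Note that 0 lies outside the interval, which agrees with rejecting H0: β1 = 0.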
41
What do P values represent
The probability of observing the estimated value, or something more extreme (further from the null), if the null is true. If the p-value is less than α, we reject the null.
42
How does a long and short regression differ?
The long regression includes at least one more variable, e.g. x3i. We are interested in how this affects the coefficient estimates.
43
How do we construct the *auxiliary regression*?
Regression of **the omitted variable** on the regressors in the short regression e.g. x3i = a0 + a1x1i + a2x2i + vi
44
What is the formula for OVB (Omitted Variable Bias)? Write mathematically and in English the bias that affects the short regression due to the variable omitted from the long regression
β1S = β1 + β3·a1

In English: "the short regression coefficient is equal to the corresponding long coefficient plus the product of the coefficient for the variable of interest in the auxiliary regression multiplied by the coefficient for the omitted variable in the long regression."
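The identity holds exactly in-sample for OLS estimates, which a quick simulation confirms (all data and coefficients invented; x3 plays the omitted variable):

```python
# Sketch: the OVB identity beta1_short = beta1_long + beta3 * a1 holds
# exactly for in-sample OLS estimates.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x3 = 0.8 * x1 + rng.normal(size=n)               # confounder correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x3 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_long = ols(np.column_stack([ones, x1, x3]), y)  # beta0, beta1, beta3
b_short = ols(np.column_stack([ones, x1]), y)     # beta0_s, beta1_s
a = ols(np.column_stack([ones, x1]), x3)          # auxiliary: a0, a1

check = b_long[1] + b_long[2] * a[1]              # beta1 + beta3 * a1
```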
45
Is OVB distinct from bias? How?
OVB is "the mathematical relationship between coefficients for the same variable in any two regressions which differ only in that one regression contains at least one additional regressor." Bias is "the difference between our estimate of the causal effect and the true causal effect."
46
Does adding variables to remove OVB imply a long regression has a causal interpretation
Adding controls can reduce bias if treated and control become more similar by conditioning on the controls, but this does not imply a causal interpretation.
47
Does OVB hold for any paired regressions
Yes
48
What is the point of Regression Anatomy/Frisch Waugh Lovell Theorem
Shows how regression "matches". Rather than matching exactly on x2, regression creates a version of x1 that is uncorrelated with x2, written x̃1i (x-tilde-hat). Regression estimates how y changes when x̃1i changes. The idea is that x̃1i changing is not correlated with x2 changing, so the confounder has been "matched on."
49
What are the steps of the Frisch-Waugh-Lovell theorem / regression anatomy? Start with yi = β0 + β1x1i + β2x2i + ui
1. Run a regression of **x1 on all other regressors**: x1i = δ0 + δ1x2i + x̃1i

2. Calculate the **residuals** of that regression to get x̃1i (x-tilde-hat). It is a property of OLS that Cov(x̃1i, x2i) = 0 (see above). The residuals x̃1i represent **"the portion of x1i that is uncorrelated with x2i."**

3. Analogously, we can run a regression of y on all other regressors except x1: yi = γ0 + γ1x2i + ỹi. The residuals ỹi represent "the portion of yi that is uncorrelated with x2i."

4. Thus the estimate of α1 is the same as the estimate of the original β1:

ỹi = α0 + α1·x̃1i + ei

yi = α0 + α1·x̃1i + vi
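A sketch of the steps on simulated data, checking that the coefficient on the residualised x1 matches the full-regression β1 (all values hypothetical):

```python
# Sketch of Frisch-Waugh-Lovell: regressing y on the residualised x1
# recovers the multivariate beta1 exactly.
import numpy as np

rng = np.random.default_rng(4)
n = 250
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 - 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
beta_full = ols(np.column_stack([ones, x1, x2]), y)  # full regression

d = ols(np.column_stack([ones, x2]), x1)             # step 1: x1 on x2
x1_tilde = x1 - (d[0] + d[1] * x2)                   # step 2: residuals

alpha = ols(np.column_stack([ones, x1_tilde]), y)    # step 4: y on x1-tilde
```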
50
What is the standard error formula when there are multiple regressors?
yi = β0 + Σ(j=1..k) βj·xji + ui

se(β̂ℓ) = √[(1/n) · Var(ûi) / Var(x̃ℓi)]

Above, x̃ℓi are the residuals from a regression of xℓi on all other x.
51
What happens when we do a nonlinear transformation
The coefficient on the transformed variable no longer has the standard interpretation of "the average change in yi associated with xi increasing by 1".
52
How do we interpret β1 after a nonlinear transformation?
1. Write the equation in the form y = f(x)
2. Find ∂y/∂x
3. Use the principle that Δy ≈ (∂y/∂x)·Δx, and plug in ∂y/∂x from step 2

We then need to plug in a value for Δx to solve for the average change in y associated with x changing by that amount: either (1) a value is given to plug in, or (2) we make a judgement about the meaningful value to evaluate at, typically the mean/median.
53
derivation of natural log transformations
For small changes, ln(x + Δx) − ln(x) ≈ Δx/x, so a change in ln(x) approximates the proportional (percent/100) change in x. This is why coefficients on logged variables are read as approximate percent changes.
54
what is the **linear-log** transformation, and interpretation
yi = β0 + β1·ln(xi) + ui

**β1 is approximately the average change in y associated with x increasing by 100%.** (We use β1/100 for a 1% change.)
55
what is the **log-linear** transformation, and interpretation
ln(yi) = β0 + β1·xi + ui

**100·β1 is approximately the average percent change in y associated with x increasing by 1.**
56
what is the **log-log** transformation, and interpretation
ln(yi) = β0 + β1·ln(xi) + ui

**β1 is approximately the average percent change in y associated with x increasing by 1%. (The elasticity of y with respect to x.)**
57
What is the important fact to remember and state with nonlinear transformation method
The principle Δy ≈ (∂y/∂x)·Δx is still an approximation, so the interpretation must always be stated as **approximately the average ...**
58
When are these approximations valid? Evaluate each log transformation
Valid with **small percent changes (< 20%)**

Linear-log (just ln(x)): we consider a **1% change in x (β1/100)** to avoid this concern; we need to consider a change in x of less than 20%

Log-log (both ln(x) and ln(y)): we again consider a 1% change in x (β1/100) to avoid this issue

Log-linear (just ln(y)): if β1 > 0.2, then use the formula **100·(e^β1 − 1)** for the exact average percent change in y associated with x increasing by 1
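A quick check of the log-linear case with a hypothetical coefficient β1 = 0.4 (> 0.2), where the exact formula and the 100·β1 approximation diverge noticeably:

```python
# Log-linear model with a large coefficient: compare the approximate and
# exact percent-change interpretations (beta1 = 0.4 is a made-up value).
import math

beta1 = 0.40
approx_pct = 100 * beta1                 # approximation: 40%
exact_pct = 100 * (math.exp(beta1) - 1)  # exact: about 49.2%
```

The gap of roughly 9 percentage points shows why the exact formula matters once β1 exceeds about 0.2.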
59
𝑦𝑖 = 𝛽0 +𝛽1π‘₯1𝑖 +𝛽2π‘₯2𝑖 +𝑒𝑖 Suppose π‘₯2𝑖 is a continuous, or multivalued, variable (such as years of education). Suppose π‘₯1𝑖 is a binary (dummy) variable representing a qualitative state. Interpret 𝛽1
Holding fixed x2i, β1 is the average change in yi associated with an individual having x1i = 1 rather than x1i = 0.
60
If we only have a single dummy variable, as in yi = β0 + β1·Di + ui, interpret β1
β1 is interpreted as "the average change in yi associated with an individual having Di = 1 rather than Di = 0." In this case with no controls, there are formulas for the estimates of β0 and β1.
61
When there are no controls and a dummy variable, what are the formulas for the estimates β̂0 and β̂1?
β̂0 = ȳi,D=0 (the average of y for the observations with Di = 0)

β̂1 = ȳi,D=1 − ȳi,D=0 (the difference in the average of y between the groups with Di = 1 and Di = 0)
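A sketch verifying the group-mean formulas against a full OLS fit (simulated data; the true effect 2.5 is invented):

```python
# Sketch: with a single dummy regressor, the OLS estimates equal group means.
import numpy as np

rng = np.random.default_rng(5)
n = 400
D = rng.integers(0, 2, size=n)
y = 1.0 + 2.5 * D + rng.normal(size=n)

b0_hat = y[D == 0].mean()                   # average y in the D = 0 group
b1_hat = y[D == 1].mean() - y[D == 0].mean()  # difference in group means

# Matches the OLS solution:
X = np.column_stack([np.ones(n), D])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```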
62
What is the condition of 'no perfect collinearity'?
No regressor can be a sum of multiples of other regressors and a constant. E.g., if x1i = a0 + a1x2i + a2x3i for constant real numbers a0, a1, and a2, there is a violation. In such a case, estimates cannot be created.
63
What happens when there are mutually exclusive and exhaustive categories for something, and **we include separate binary variables for each category and a constant**
This **violates the no-perfect-collinearity condition** and estimates can't be created. This is the **dummy variable trap**, e.g. yi = β0 + β1·Spring_i + β2·Winter_i + β3·Summer_i + β4·Autumn_i + ui
64
How do we solve the dummy variable trap
1. **Omit a dummy variable**
2. **Remove the constant**
65
What are variables that are the product of two variables called
Interaction variables e.g. x3i = x1i * x2i
66
For the equation yi = β0 + β1x1i + β2x2i + β3·x1i·x2i + ui, interpret β2 and β3, as well as β2 + β3, where y is income, x2 is years of education, and x1 is a binary dummy for being a UK citizen
1. Differentiate w.r.t. the variable of interest (x2): ∂yi/∂x2i = β2 + β3·x1i.

Thus, β2 represents "the average change in yi associated with x2i increasing by 1 if x1i = 0." ("The average change in income associated with one more year of education for non-UK citizens.")

β3 represents "the average difference in the association of one more year of education with income for UK compared to non-UK citizens."

Thus, the association of one more year of education with income for UK citizens is β2 + β3.
67
Var(A+B)
Var (A) + Var (B) + 2Cov(A,B)
68
Var(A-B)
Var (A) + Var (B) - 2Cov(A,B)
69
What is the t-stat for H0: β1 − β2 = 0, H1: β1 − β2 ≠ 0?
We need se(β̂1 − β̂2):

Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2Cov(β̂1, β̂2)

Thus: t-stat = (β̂1 − β̂2 − 0) / √[V̂ar(β̂1) + V̂ar(β̂2) − 2Ĉov(β̂1, β̂2)]
70
What is a joint hypothesis?
A hypothesis that requires 2 or more equal signs, e.g. H0: β1 = 0 & β2 = 0
71
Could you do separate t-tests for each?
No, as it would be imprecise: each test ignores half of the hypothesis
72
What test do you do for joint hypothesis then?
We need to do an F-test
73
What is **classical measurement error in a regressor**
When we only observe x1i = x*1i + wi, where Cov(wi, ui) = 0 and Cov(wi, x*1i) = 0. I.e. random noise is added to the observed x measurements that is uncorrelated with the error and with the true regressor.
74
What bias does **classical measurement error in the regressor** cause
*Attenuation bias*: the estimated coefficient is biased towards 0 → |β̂1| < |β1|
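A simulation sketch of attenuation (true β1 = 2; measurement noise with variance 1 is added to x, so the standard attenuation factor Var(x*)/(Var(x*) + Var(w)) is about 0.5 here; all values invented):

```python
# Sketch: classical measurement error in the regressor biases beta1-hat
# toward 0 (true beta1 = 2; noisy estimate should land near 2 * 0.5 = 1).
import numpy as np

rng = np.random.default_rng(6)
n = 5000
x_star = rng.normal(size=n)                 # true regressor
y = 1.0 + 2.0 * x_star + rng.normal(size=n)
x_obs = x_star + rng.normal(size=n)         # observed x = x* + noise w

b1_true_x = np.cov(x_star, y, ddof=0)[0, 1] / np.var(x_star)
b1_noisy_x = np.cov(x_obs, y, ddof=0)[0, 1] / np.var(x_obs)
```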
75
What are the effects of **classical measurement error in the regressor**, other than bias?
* The x variable is stretched; the slope of the line is closer to 0
* Increase in Var(x) and Var(û)
* Ambiguous effect on standard errors, though they normally increase
76
What is **classical measurement error in the outcome**
We only observe yi = y*i + wi, where Cov(wi, ui) = 0 and Cov(wi, x*1i) = 0.
77
Does **classical measurement error in the outcome** cause bias?
This form of measurement error does not result in bias: the measurement error is uncorrelated with x*1i, so it doesn't result in any omitted confounders.
78
What are the effects of **classical measurement error in the outcome**, other than bias?
* The y variable is stretched, but the average value of y for each x does not change, and thus the estimated slope is on average unchanged
* Increases Var(û), and standard errors normally increase
79
What is **non-classical or systematic measurement error?**
Any form of more complicated measurement error. Evaluated on a case-by-case basis, e.g. systematic overestimation of healthy habits and underestimation of unhealthy habits.
80
Does **non-classical (or systematic) measurement error** cause bias?
Yes
81
What are the three forms of missing data?
1. Missing at random
2. Data missing based on a cutoff at the x value: either x or y missing if x is below some threshold
3. Data missing based on a cutoff at the y value: either x or y missing if y is below some threshold
82
Effect of data **missing at random** on bias and other
* No concerns of bias
* Just a smaller sample
* The OLS estimator is still unbiased
83
Effect of data of **missing based on a cutoff of the x value**
* No bias

As the slope of the regression line is the same across the whole domain of x, we just have a smaller domain but still the same slope
84
Effect of data **missing based on a cutoff of the y value**
* Causes bias

The error is represented by the vertical distance between a point and the line. Small x values need a large positive error to meet the threshold; thus as x increases, the error term decreases on average → an omitted variable in ui that changes on average when x changes → a confounder
85
What is the point of control variables
Consider yi = β0 + β1x1i + ui. There is OVB if there is a variable that is correlated with x1i and also correlated with yi. The idea of a control variable is to bring the omitted variable out of the error term and include it directly in the model, e.g. yi = β0 + β1x1i + β2x2i + ui.
86
What are good and bad controls variables
Good control variables: determined prior to treatment, or immutable characteristics of individuals (not an outcome of treatment)

Bad control variables: a control variable that introduces a new confounder; typically happens when the **control is itself an outcome or determined after treatment**
87
What makes a control 'bad'?
Holding fixed the bad control, changes in treatment may be correlated with changes in a confounder. Adding a bad control induces correlation of treatment with confounders.
88
89
What is the issue of using outcomes of treatment as controls?
Outcomes of treatment are affected by variables that are components of the error. If we include an outcome of treatment as a regressor, we induce confounders (because omitted variables are correlated with the regressors and also affect the outcome).
90
What is 'internal validity'?
The estimate can be interpreted as a causal effect for the population used in the study: no issues (no confounders, no attenuation bias, no bias due to y cutoffs, no simultaneity/reverse causality, no bad controls)
91
What is 'external validity'?
The estimate is representative of the effect for another population. Nearly always an assumption; checked by creating estimates in various settings and checking whether the effects are comparable
92
What is R2
R² represents the fraction of the variation in the outcome that is explained by the regression line.
93
R2 Formula
R² = Var(ŷ) / Var(y) = 1 − Var(û) / Var(y)
94
What happens to R^2 when you add additional regressors into a regression model?
R^2 will never decrease
95
Does R2 tell you if the regression is contaminated or not?
No, it only tells you whether the points are close to the line
96
When do we primarily care about R^2?
When the goal is to predict y (as opposed to estimating a treatment effect), we no longer care about causality, just about x explaining a lot of the variation in y. This is the case in the first stage of an instrumental variables regression.
97
What is standardising a variable?
Standardising is a form of normalising where we subtract the mean (μ) and divide by the standard deviation. Useful when units cannot be easily understood.
98
When standardising just x1, what is β1 interpreted as?
β*1 is interpreted as "the average change in y that is associated with x1 increasing by 1 standard deviation."
99
When standardising just y, what is β1 interpreted as?
β*1 is interpreted as "the average number of standard deviations that y changes by that is associated with x1 increasing by 1."
100
When standardising both x1 and y, what is β1 interpreted as?
β€œthe average number of standard deviations that 𝑦 changes by that is associated with π‘₯1 increasing by 1 standard deviation.”
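Numerically, standardising both variables rescales the slope by sd(x)/sd(y), giving the correlation coefficient; a sketch on invented data:

```python
# Sketch: the slope on standardised x and y equals b1 * sd(x) / sd(y).
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=300) * 3.0
y = 1.0 + 0.7 * x + rng.normal(size=300)

b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)      # slope in original units

xs = (x - x.mean()) / x.std()                    # standardised x
ys = (y - y.mean()) / y.std()                    # standardised y
b1_std = np.cov(xs, ys, ddof=0)[0, 1] / np.var(xs)
```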
101
Do we have to subtract the mean and divide by the standard deviation?
No; for the interpretation it is sufficient to divide by the standard deviation
102
What makes a valid instrument?
It must satisfy two assumptions:

(1) **Relevance, Cov(zi, x1i) ≠ 0**: the instrument is **correlated with the variable of interest**

(2) **Exogeneity, Cov(zi, ui) = 0**: the instrument is **uncorrelated with the error** term of the regression → made up of exclusion, and as good as randomly assigned
103
What is the assumption of relevance? Why is it important?
Relevance is that the **instrument is correlated with the variable of interest**: **Cov(zi, x1i) ≠ 0**. Without it, the instrument can't isolate the variation in treatment that is not due to the source(s) of bias.
104
What is the **exclusion assumption**
Exclusion is one of the two assumptions making up exogeneity (Cov(zi, ui) = 0). The exclusion assumption is that **zi does not directly affect yi** (i.e., zi itself is excluded from ui and only affects yi through correlation with x1i, or controls if there are any).
105
What is the **as good as random** assumption
The realised value of **zi is uncorrelated with all unobserved factors in ui that affect yi**. I.e., zi is not itself determined by unobserved factors that affect yi.
106
Evaluate the relevance assumption for this case (the draft lottery as an instrument for military service)
Relevance, Cov(zi, x1i) ≠ 0, is probably true: being drafted made people much more likely to serve in the military.
107
Evaluate the 'as good as randomly assigned' assumption in this case
"As good as randomly assigned" is almost certainly true. The lottery operated by birthdates being randomly chosen, and individuals with chosen birthdates being drafted. Being drafted should not have been correlated with any determinants of income at age 50. In fact, Gmeiner would argue that z was randomly assigned, not just "as good as" randomly assigned.
108
Evaluate exclusion in this context
Exclusion would mean that the only mechanism by which being drafted affected income at age 50 was through military service. This is less likely to be true: some individuals who were drafted chose to pursue a college education (because they knew that by going to college the military would allow them to avoid service). Thus there is a potential secondary channel whereby zi affects yi that is not only through x1i. Exclusion might fail.
109
What is 2SLS?
The most general method of implementing analysis with an instrumental variable is called two-stage least squares (2SLS)
110
What are the steps of 2SLS Method
1. Consider the **first-stage** regression (the variable causing bias is the outcome, and the instrument is a regressor)
2. Estimate the first stage with OLS, creating predicted values (e.g. x̂1i = δ̂0 + δ̂1·zi)
3. Form the **second stage** by using the predicted values in the equation of interest rather than the original x1i: yi = β0 + β1·x̂1i + ui
4. Estimate the second stage with OLS using the predicted values; this gives the 2SLS estimate of β1
111
If the exogeneity and relevance assumptions are true, what will the IV estimate converge to?
It will converge to the true β1, overcoming the bias
112
What is the First Stage Equation
Variable causing bias = δ0 + δ1·instrument + error

We create predicted values with this
113
What is the Second Stage Equation
Equation of interest but with predicted values in place of xi
114
What is the equation of interest?
The equation we ultimately want to estimate, e.g. yi = β0 + β1x1i + ui (where x1i may be correlated with ui)
115
What is the reduced form equation?
An outcome we care about is on the left, and variables that do not cause bias are on the right:

y = φ0 + φ1·zi + ei

where z is the instrument
116
Derive the reduced form from yi = β0 + β1x1i + ui
Substitute the first stage, x1i = δ0 + δ1zi + vi, into the equation of interest:

yi = β0 + β1(δ0 + δ1zi + vi) + ui = (β0 + β1δ0) + β1δ1·zi + (β1vi + ui)

So the reduced-form coefficient on zi is φ1 = β1δ1.
117
What does the reduced-form coefficient estimate? Does z affect y directly?
The reduced-form coefficient gives an estimate of the relationship between zi and yi. Exogeneity means z cannot affect y directly.
118
How do we know the reduced form coefficient operates only through xi
Because the reduced-form coefficient is equivalent to β1δ1: δ1 (from the first stage) represents the association of zi with x1i, and β1 represents the effect of x1i on y.
119
What does the slope coefficient of the reduced form represent?
It is referred to as the **intention-to-treat** effect. Most common if z is a binary variable that offers the treatment, while x is the treatment itself.
120
What is the OLS formula for δ1 (the first-stage coefficient)?
δ̂1 = Cov(zi, x1i) / Var(zi), from the first stage x1i = δ0 + δ1zi + vi
121
What is the OLS formula for φ1 (the reduced-form coefficient)?
φ̂1 = Cov(zi, yi) / Var(zi)
122
What is the β1 2SLS estimator?
β̂1,2SLS = Cov(zi, yi) / Cov(zi, x1i)

Cov(z, y): z and y from the reduced form
Cov(z, x): z and x from the first stage
123
What is β1 2SLS also equivalent to?
β̂1,2SLS = φ̂1 / δ̂1 = Cov(zi, yi) / Cov(zi, x1i)
124
When z is a binary variable, what does the β1 2SLS estimator simplify to, and what is it called?
β̂1,2SLS = Cov(zi, yi) / Cov(zi, x1i) simplifies to

**(ȳi,z=1 − ȳi,z=0) / (x̄i,z=1 − x̄i,z=0)**

This is the Wald estimator. (The notation w̄i,z=c denotes the average of w for the subsample with zi = c.)
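A simulation sketch with a binary instrument: the Wald ratio of mean differences equals Cov(z,y)/Cov(z,x), and both recover the true effect despite a confounder (all data and coefficients invented):

```python
# Sketch: Wald estimator vs the covariance-ratio IV estimator with a
# binary instrument z (true beta1 = 2; c is an unobserved confounder).
import numpy as np

rng = np.random.default_rng(7)
n = 10000
z = rng.integers(0, 2, size=n)
c = rng.normal(size=n)                            # unobserved confounder
x = 0.5 * z + 0.8 * c + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * c + rng.normal(size=n)

wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
iv = np.cov(z, y, ddof=0)[0, 1] / np.cov(z, x, ddof=0)[0, 1]
```

The two expressions are algebraically identical for a binary z, while plain OLS of y on x would be biased upward by c.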
125
Why does the exogeneity assumption work?
If exogeneity is true, we have Cov(x̂1i, ui) = Cov(δ̂0 + δ̂1zi, ui) = δ̂1·Cov(zi, ui) = 0. The final equality is because we assume exogeneity, Cov(zi, ui) = 0.

Essentially, the predicted values from the first stage represent the "portion" of x1i that is uncorrelated with ui (i.e., we isolate the portion of x1i that is uncorrelated with all unobserved determinants of the outcome, and thus there are no confounders). We attain a representative estimate of β1.
126
Why does relevance work?
Relevance means δ̂1 ≠ 0 because Cov(zi, x1i) ≠ 0. With this, Var(x̂1i) ≠ 0, and we can create estimates of β1 in the second stage (i.e., to calculate β̂1 = Cov(x̂1i, yi) / Var(x̂1i), we need Var(x̂1i) to be nonzero).
127
128
How do we attain a small standard error for B1?
We need 'strong relevance': the statistical relationship between zi and x1i is strong (δ1 statistically significant, or a high R² in the first-stage regression).
129
What is the formula for the standard error estimate of B1 in the second stage?
se(β̂1) = √[(1/n) · Var(û) / Var(x̂1i)]

We need a large Var(x̂1i) to attain a low standard error for β̂1 in the second stage. The **residual is defined using the original data**.
130
What is the formula for the standard error of d1 in the first stage? What do we need for a small standard error
The formula for the standard error of δ̂1 in the first stage is √[(1/n) · Var(v̂) / Var(z)]. To attain a small standard error for δ̂1 in the first stage, we need a small Var(v̂).
131
How are Var(x̂1i) and Var(v̂i) related? Show, starting from Var(x1i) = Var(δ̂0 + δ̂1zi + v̂i)
Var(x1i) = Var(δ̂0 + δ̂1zi + v̂i)

Var(x1i) = Var(δ̂0 + δ̂1zi) + Var(v̂i) (predicted values and residuals are uncorrelated)

Var(x1i) = Var(x̂1i) + Var(v̂i)

The key to notice is that if Var(x̂1i) is large, we will have a small Var(v̂). In such a case, δ̂1 has a small standard error (and is more likely to be significant), and β̂1 will also have a smaller standard error.
132
How do we test for the significance of δ̂1? If it is large enough for significance, what does this imply about the other quantities?
To test for the significance of δ̂1 we consider the t-stat, |δ̂1 − 0| / se(δ̂1). If this is large enough for significance, it is because se(δ̂1) is small, which means Var(v̂) is small, and thus Var(x̂1i) is big and se(β̂1) is small.
133
What is the key takeaway for attaining small standard errors in the second stage?
The key takeaway is that we need a statistically significant coefficient for the instrument in the first stage to attain small standard errors in the second stage.
134
How do we generalise the 2SLS method to multiple variables, controls and instruments?
1. Estimate a **separate first stage for each regressor** that might cause bias, with that regressor as the outcome. 2. **Include all instruments and controls that do not cause bias, while omitting any that could cause bias in the second stage.** 3. Create predicted values from each first-stage regression and plug them into the equation of interest for the second stage
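For a single endogenous regressor, the two stages can be sketched in numpy (the simulated model, its coefficients, and the confounder are all made up for illustration; the true β1 is 1.5):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
z = rng.normal(size=n)                      # instrument
c = rng.normal(size=n)                      # unobserved confounder
x1 = 1.0 + 2.0 * z + c + rng.normal(size=n)
y = 3.0 + 1.5 * x1 + 2.0 * c + rng.normal(size=n)   # true beta1 = 1.5

def ols(x, y):
    """OLS with an intercept; returns [b0, b1]."""
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_ols = ols(x1, y)[1]        # biased upward by the confounder (~1.83)
d = ols(z, x1)               # first stage: x1 on the instrument
x1_hat = d[0] + d[1] * z     # predicted values
b_2sls = ols(x1_hat, y)[1]   # second stage: close to the true 1.5
```

The manual second stage gives the right point estimate; in practice a 2SLS routine is used so the standard errors are computed with the original x1 rather than the predicted values.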
135
Do relevance and exogeneity differ for generalised IV?
Analogous: (1) Relevance: each **regressor with a bias concern is correlated with the instruments.** (2) Exogeneity: **Cov(zji, ui) = 0 for all instruments j.**
136
When you have multiple regressors instrumented, is there another IV assumption?
**Rank,** we **need at least as many instruments as regressors that are instrumented**. The intuition is that each instrument can only β€œfix” one regressor, although if we have extra instruments, that is beneficial because it creates more variation in the regressors (see relevance).
137
Why must we have rank? (at least one instrument assigned to each regressor that is instrumented)
If we have two instrumented regressors and two instruments, with one of the instruments correlated with both instrumented regressors, and the other instrument uncorrelated with both instrumented regressors, then estimates will have large standard errors. Essentially** relevance β€œfails”, because the one instrument that is relevant is not able to create enough variation in both regressors.**
138
What is an overidentified instrumental variable estimation
If we have more instruments than regressors that are instrumented
139
What is an estimation that is identified or exactly identified
if there are the same number of instruments as regressors that are instrumented
140
Reverse causality means
Only y has a causal effect on x
141
Simultaneity means
x has a causal effect on y and, also, y has a causal effect on x
142
What is the issue with drawing supply and demand curves?
We only observe price–quantity pairs; we don't observe the supply and demand curves directly
143
If we estimate the demand curve by OLS, what would be the result?
Regression with ln(P) as the outcome and ln(Q) as the regressor results in a single regression line
144
Why doesn't the OLS regression of ln(Pi) = αs + βs·ln(Qi) + ηsi work?
They are simultaneous equations: P and Q are simultaneously determined by each other, and there are two structural relationships between P and Q, which a single regression line cannot separate
145
What is a structural equation
They show the structure (theory) of the system
146
Key Points: - Assume a relationship - Simultaneity issue; explain with a second equation - Data is realised pairs - OLS doesn't isolate a single mechanism; it 'bundles both' - Describe the bias
147
What does the shifts in blue lines represent?
Shifts in the blue lines represent changes in ui, **which causes changes in y1i holding fixed y2i**
148
What do shifts in the red line suggest?
Shifts in the red line represent changes in vi, which causes **changes in y2i holding fixed y1i**
149
How can we solve our issue with simultaneous data
Find an instrument that causes variation in one channel, holding fixed the other channel. I.e., to estimate α1, we need to create variation that holds fixed the relationship defined by α1 (see below).
150
Given two simultaneous equations, create a potential instrument and estimate α1: y1i = β0 + β1·y2i + β2·x2i + ui; y2i = α0 + α1·y1i + α2·w2i + vi
A potential instrument is x2; we must make the exogeneity assumption that Cov(x2i, vi) = 0. The first-stage equation is y1i = γ0 + γ1·w2i + γ2·x2i + ηi, including x2 as the instrument and w2 as a control. We calculate predicted values and use them in the equation of interest, then estimate with OLS: y2i = α0 + α1·ŷ1i + α2·w2i + vi. The key concept is that the instrument creates variation that holds fixed one channel (i.e., vi is assumed to be uncorrelated with x2i, so vi doesn't change on average when x2i changes, which means we are holding fixed the red line).
151
Definition of an **endogenous variable**
**Determined within the system** Any regressor that causes bias
152
Definition of an **exogenous variable**
**Taken as given, not from the system** Variable does not cause bias as a regressor in the OLS
153
Definition of an **identified parameter**
A parameter that can be learned from an infinite amount of data
154
Definition of an **unidentified parameter**
Cannot be learnt, even with an infinite amount of data
155
Definition of a reduced form equation
The principle of a reduced form equation is heuristically β€œan outcome we care about is on the left and variables that do not cause bias are on the right.” More formally we define a reduced form by, β€œan endogenous outcome is on the left and exogenous variables are on the right.”
156
How do we derive the reduced form equation for structural equations?
Derive reduced form equations by plugging one structural equation into the other and simplifying
157
Derive the reduced form equations for: y1i = β0 + β1·y2i + β2·x2i + ui; y2i = α0 + α1·y1i + α2·w2i + vi
Substitute the second equation into the first: y1i = β0 + β1(α0 + α1·y1i + α2·w2i + vi) + β2·x2i + ui, so (1 − β1α1)·y1i = (β0 + β1α0) + β1α2·w2i + β2·x2i + (ui + β1·vi), giving y1i = π10 + π11·w2i + π12·x2i + ε1i with π11 = β1α2/(1 − β1α1) and π12 = β2/(1 − β1α1). Similarly, y2i = π20 + π21·w2i + π22·x2i + ε2i with π21 = α2/(1 − β1α1) and π22 = α1β2/(1 − β1α1). These reduced forms can be estimated by OLS, unlike the structural equations of interest
158
How could we use the structural form equation to solve the parameters of interest in the reduced form coefficients
The reduced-form coefficients on the same instrument, divided across the two equations, can be used to estimate each structural parameter. E.g. the coefficient on x2 in the y2 reduced form divided by the coefficient on x2 in the y1 reduced form equals α1; likewise the ratio of the coefficients on w2 (y1 equation over y2 equation) equals β1. Dividing strips out the scale of the instrument and isolates the causal link
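This "indirect least squares" idea can be sketched numerically for the two-equation system above (all structural coefficients here are made-up illustration values):

```python
import numpy as np

# Structural system: y1 = b1*y2 + b2*x2 + u,  y2 = a1*y1 + a2*w2 + v
rng = np.random.default_rng(2)
n = 100_000
b1, b2, a1, a2 = 0.5, 1.0, -0.4, 1.0
x2, w2 = rng.normal(size=n), rng.normal(size=n)
u, v = rng.normal(size=n), rng.normal(size=n)
det = 1 - b1 * a1                      # solve the system for equilibrium values
y1 = (b1 * a2 * w2 + b2 * x2 + u + b1 * v) / det
y2 = (a2 * w2 + a1 * b2 * x2 + v + a1 * u) / det

X = np.column_stack([np.ones(n), w2, x2])
pi1 = np.linalg.lstsq(X, y1, rcond=None)[0]   # reduced form for y1
pi2 = np.linalg.lstsq(X, y2, rcond=None)[0]   # reduced form for y2
a1_hat = pi2[2] / pi1[2]   # coef on x2 in y2 eq / coef on x2 in y1 eq -> a1
b1_hat = pi1[1] / pi2[1]   # coef on w2 in y1 eq / coef on w2 in y2 eq -> b1
```

Both reduced forms are OLS regressions of an endogenous outcome on exogenous variables only, which is what makes this legitimate.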
159
If we are in an overidentified case, what do we use to create estimates
We use 2SLS
160
If we do not have an instrument for a regressor, could we still use reduced-form coefficients to solve for the parameter of interest?
No, we couldn't; the equation is unidentified
161
What is cross-sectional data?
Data on several individuals at a single point in time
162
What notation did we use for cross sectional data?
yi = β0 + β1·x1i + β2·x2i + vi, for which the subscript i denotes individual i.
163
What is Panel Data
Observe data for several individuals, and observe each individual at several points in time e.g. N individuals for T time periods
164
What notation do we use for panel data
y_it = β0 + β1·x1it + β2·x2it + v_it. The subscript it denotes individual i at time t.
165
How do we control for time-invariant effects with the error term in panel data?
Decompose the error term v_it into **a_i, representing a time-invariant piece, and u_it, representing a time-varying piece.** a_i is essentially the effect of being individual i
166
What is the concern with ai (time-invariant effects)
It is unobserved and possibly correlated with our regressors → a confounder. The concern is always present, but we discuss it with panel data because the data are rich enough to let us overcome it
167
How do we overcome the confounder of time invariant effects?
1. First Differences 2. Fixed Effects
168
How do we do **First Differences** to overcome time-invariant effects?
For any variable, w, define the notation Δw_it := w_it − w_i,t−1. 1. Difference the model: Δy_it = y_it − y_i,t−1 = (β0 + β1·x1it + β2·x2it + a_i + u_it) − (β0 + β1·x1i,t−1 + β2·x2i,t−1 + a_i + u_i,t−1). 2. Thus Δy_it = β1·Δx1it + β2·Δx2it + Δu_it. **a_i is differenced away**, so a_i is no longer a confounder: it does not affect the change across time periods in either treatment or outcome
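A minimal numpy sketch of first differencing (the simulated panel and its coefficients are made up; the true β1 is 2, and the confounder a_i biases pooled OLS upward):

```python
import numpy as np

rng = np.random.default_rng(3)
n_i, n_t = 2_000, 5
a = rng.normal(size=(n_i, 1)) * 2            # time-invariant effect a_i
x = a + rng.normal(size=(n_i, n_t))          # regressor correlated with a_i
y = 1.0 + 2.0 * x + a + rng.normal(size=(n_i, n_t))   # true beta1 = 2

def ols_slope(x, y):
    """Pooled OLS slope with an intercept."""
    X = np.column_stack([np.ones(x.size), x.ravel()])
    return np.linalg.lstsq(X, y.ravel(), rcond=None)[0][1]

b_pooled = ols_slope(x, y)                   # biased: a_i sits in the error term
dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)   # within-individual differences
b_fd = ols_slope(dx, dy)                     # a_i differenced away: close to 2
```

Note the differenced data has one fewer observation per individual, which is part of the cost of the method.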
169
Does only ai get removed by differencing?
No, any time-invariant effects (the constant, other time-invariant variables) are removed. The cost of removing the bias is that we lose the effects of all these time-invariant variables
170
How do we normally overcome bias due to confounders
We take it out of the error term and include it directly in the model. We can do an analogous operation for time-invariant effects through dummy variables (fixed effects)
171
How do we perform **fixed effects** to overcome time-invariant effects
We include a dummy variable, δi, for every individual except one (we must exclude one dummy variable to avoid perfect collinearity due to the dummy variable trap). The individual without a dummy variable in the model is often called the "omitted group" or "comparison group."
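The dummy-variable implementation can be sketched in numpy (simulated panel with made-up values; the true β1 is 2, and individual 0 is the omitted group):

```python
import numpy as np

rng = np.random.default_rng(4)
n_i, n_t = 200, 4
a = rng.normal(size=(n_i, 1)) * 2                    # time-invariant effect a_i
x = a + rng.normal(size=(n_i, n_t))                  # regressor correlated with a_i
y = 1.0 + 2.0 * x + a + rng.normal(size=(n_i, n_t))  # true beta1 = 2

ids = np.repeat(np.arange(n_i), n_t)                 # individual id per row
# One dummy column per individual 1..n_i-1 (individual 0 is the omitted group)
D = (ids[:, None] == np.arange(1, n_i)[None, :]).astype(float)
X = np.column_stack([np.ones(ids.size), x.ravel(), D])
b = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
b1_fe = b[1]    # close to 2: the a_i are absorbed by the dummies
```

The remaining coefficients b[2:] are the estimated individual effects relative to the omitted group.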
172
Interpret coefficients for the fixed effects formula
β1 and β2 are "the average change in the outcome associated with x1 (or x2) increasing by 1, holding fixed all other x and holding fixed who the individual is." β0 is "the expected outcome for the omitted group when all x are 0." a_i is "the average change in the outcome associated with being individual i compared to the omitted group, holding fixed all x."
173
Can we include time-invariant regressors in a fixed effects regression
No, including the variable **violates no perfect collinearity**. The heuristic explanation is that the dummy variable, δi, and effect a_i, capture the effect of all time-invariant characteristics of person i. If a variable, x2i, does not change over time, we can lump its effect in with a_i, and we cannot separately estimate its effect.
174
What is the cost of removing time-invariant bias with fixed effects
Again, we can't estimate the effect of any time-invariant variable; all time-invariant variables must be excluded
175
How do econometricians generally use the term fixed effects?
They use it to refer to any situation in which dummy variables are included for all possible values of a variable; it is commonly applied to time periods
176
What are two way fixed effects?
When we control for both individual and time fixed effects
177
Is first differences or fixed effects more common
For the purpose of the exam, just know that first differences and fixed effects are two methods of overcoming the bias caused by π‘Žπ‘–. Know the mechanics as described above. In practice, fixed effects is more common because of the β€œsimplicity” of implementation and because of the desirability of directly estimating the π‘Žπ‘–.
178
What is the ideal method to estimate a treatment effect
in a randomised trial
179
Given two groups, one treated in time period 2 and one not, how do we estimate the treatment effect? Let y_it denote the outcome of interest for individual i at time t. We write the model y_it = δ·D_it + γi + λt + u_it, where γi is the time-invariant effect of being individual i, λt is the effect of being time t (the same for all individuals), and δ is the effect of treatment, with D_it denoting a binary variable that takes the value 1 if individual i is treated at time t (D_it = 1 only for group 2 at time 2).
Use DiD: δ̂ = ȳ22 − ȳ21 − (ȳ12 − ȳ11). Treatment effect = (average post treatment − average pre treatment) − (average post control − average pre control)
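A worked numeric example with made-up cell means (group 1 = control, group 2 = treated; time 1 = pre, time 2 = post):

```python
# Hypothetical group-time cell means
y11, y12 = 10.0, 12.0   # control: pre, post
y21, y22 = 11.0, 16.0   # treated: pre, post

# DiD: treated change minus control change
did = (y22 - y21) - (y12 - y11)
print(did)  # 3.0: treated rose by 5, but 2 of that is the common time trend
```

The subtraction of the control-group change is what nets out the shared time trend.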
180
How does the DiD estimator work? consider expectations
E[δ̂] = E[ȳ22 − ȳ21 − (ȳ12 − ȳ11)] = E[δ(1) + γ2 + λ2 + u22] − E[δ(0) + γ2 + λ1 + u21] − (E[δ(0) + γ1 + λ2 + u12] − E[δ(0) + γ1 + λ1 + u11]). The γi and λt terms cancel, leaving: = δ + E[u22 − u21 − (u12 − u11)]. We assume the errors are all mean-zero, so E[δ̂] = δ: the DiD estimator is unbiased
181
Explain the key assumption of parallel trends
The change in the outcome in the control group between time periods is what would have happened in the treatment group in the absence of treatment; i.e., the observed change in the control group equals the counterfactual change in the treatment group absent treatment
182
How is parallel trends shown to be crucial to the DiD mathematically
It is shown by **λt being the same for both treated and control**, which is why the λ2 and λ1 terms cancelled above. Consider if the effect of time differed across groups (λgt): then E[ȳ22 − ȳ21 − (ȳ12 − ȳ11)] = δ + (λ22 − λ21) − (λ12 − λ11), so we **estimate the effect of treatment bundled with the time trend in the treated group minus the time trend in the control group. The estimator of the treatment effect is biased by the difference in the time trends.**
183
How is parallel trends shown graphically. How do we use it tto find the treatment graphically? What does it look like graphically if the assumption fails
184
Express the DiD assumption mathematically. Let y_it(1) denote the potential outcome when i is treated at time t and y_it(0) denote the potential outcome when i is untreated at time t.
E[y22(0) − y21(0)] = E[y12(0) − y11(0)]
185
How else can the difference in difference estimator be implemented?
It can be implemented using a regression with dummy variables, and interactions
186
187
Implement the difference in difference estimator using a regression
y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
188
Interpret the coefficients of the regression of the DiD
y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it. Treated_it is a binary variable that takes the value 1 if an individual is in the treated group and 0 otherwise. Post_it is a binary variable that takes the value 1 for the post-treatment time period(s) and 0 otherwise. Treated_it·Post_it is an interaction: it is binary, taking the value 1 for the treated group post-treatment and 0 otherwise. ∂y_it/∂Treated_it = β1 + β3·Post_it: * β1 is the **pre-treatment average change in the outcome associated with being the treated group compared to the control group**. β3 is the **average change in this association after treatment time** (presumably, β3 is the effect of treatment). ∂y_it/∂Post_it = β2 + β3·Treated_it: * β2 is the **average change in the outcome associated with being in the post-treatment time period compared to the pre-treatment time period for the control group**. β3 is the **average difference in this for the treatment group** (presumably, β3 is the effect of treatment).
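The regression version can be verified on the four group-time cells (hypothetical cell means: control pre/post = 10/12, treated pre/post = 11/16, so the DiD is 3):

```python
import numpy as np

# One observation per cell: (control,pre), (control,post), (treated,pre), (treated,post)
treated = np.array([0.0, 0.0, 1.0, 1.0])
post = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([10.0, 12.0, 11.0, 16.0])

X = np.column_stack([np.ones(4), treated, post, treated * post])
b = np.linalg.solve(X.T @ X, X.T @ y)   # exact fit: the model is saturated
# b = [b0, b1, b2, b3] = [10, 1, 2, 3]; b3 matches the DiD of the means
```

Because the model is saturated (four parameters, four cells), the coefficient on the interaction is exactly the difference-in-differences of the cell means.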
189
What is β0 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
* β0 is the average y in the before period for the control group.
190
What is β0 + β1 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
β0 + β1 is the average y in the before period for the treated group.
191
What is β0 + β2 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
β0 + β2 is the average y in the after period for the control group.
192
What is β0 + β1 + β2 + β3 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
The average y in the after period for the treated group
193
What is β3 in this DiD regression? y_it = β0 + β1·Treated_it + β2·Post_it + β3·Treated_it·Post_it + u_it
β3 is the effect of being in the treated group in the post time period (the effect of treatment, previously written as δ).
194
Given the definitions of the coefficients in the DiD regression, write the DiD estimator to prove it works
(β0 + β1 + β2 + β3 − (β0 + β1)) − (β0 + β2 − β0) = β3
195
What kind of method is regression discontinuity
It is, like DiD, an "identification strategy": using non-experimental observational data to estimate causal effects
196
How to set up regression discontinuity
We have a multi-valued or continuous variable, xi, and a binary variable, Di, for which: * Di = 1 if xi ≥ x0 * Di = 0 if xi < x0. xi is called the running or forcing variable; the goal is to estimate the causal effect of Di on yi
197
What is the running or forcing variable?
The multi-valued or continuous variable whose value determines whether the binary variable Di is 0 or 1
198
What is the regression for a regression discontinuity design, given a multi-valued or continuous variable, xi, and a binary variable, Di, for which: * Di = 1 if xi ≥ x0 * Di = 0 if xi < x0?
We estimate a regression of the form: yi = β0 + β1(xi − x0) + β2·Di + β3(xi − x0)·Di + ui
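A simulated sketch of this regression in numpy (the cutoff, coefficients, and noise level are all made-up illustration values; the true jump at the cutoff is 2):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
x0 = 1.0                                  # hypothetical cutoff
x = rng.uniform(x0 - 1, x0 + 1, size=n)   # running variable
D = (x >= x0).astype(float)               # treatment indicator
y = (0.5 + 0.8 * (x - x0) + 2.0 * D + 0.3 * (x - x0) * D
     + rng.normal(scale=0.5, size=n))     # true discontinuity = 2

# Regression: y = b0 + b1*(x - x0) + b2*D + b3*(x - x0)*D + u
X = np.column_stack([np.ones(n), x - x0, D, (x - x0) * D])
b = np.linalg.lstsq(X, y, rcond=None)[0]
jump = b[2]   # close to the true discontinuity of 2 at x = x0
```

Because x is centred on x0, the coefficient on D directly measures the jump at the cutoff.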
199
Interpret the coefficients for the regression of the regression discontinuity design
yi = β0 + β1(xi − x0) + β2·Di + β3(xi − x0)·Di + ui. β1 is "the average change in yi associated with xi increasing by 1 when xi < x0." β1 + β3 is "the average change in yi associated with xi increasing by 1 when xi ≥ x0." β2 is "the average change in yi when D increases from 0 to 1 at xi = x0." (Note that this interpretation relies on centring xi on x0.)
200
Draw the regression discontinuity design
201
What happens if we do not centre on X0
The estimates of the constant and of the coefficient on the binary variable then refer to x = 0 rather than to the cutoff: without centring, the jump at the cutoff is β2 + β3·x0 rather than β2 alone, so the coefficients no longer directly measure the discontinuity and interpretation is more complicated
202
What is the key to implementing a regression discontinuity
The key to implementing a regression discontinuity is that we only use data for which x is within a "window" or "bandwidth" of the threshold, x0. That is, we use data for which xi ∈ [x0 − h, x0 + h].
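The windowing step is just a mask on the running variable; a tiny sketch (the cutoff x0 and bandwidth h are made up):

```python
import numpy as np

x = np.arange(11, dtype=float)        # running variable values 0..10
x0, h = 5.0, 2.0                      # hypothetical cutoff and bandwidth
in_window = np.abs(x - x0) <= h       # keep only x in [x0 - h, x0 + h]
x_window = x[in_window]               # [3, 4, 5, 6, 7]
```

Only observations inside the window enter the RD regression; everything else is discarded before estimation.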
203
What is the critical assumption for regression discontinuity
Individuals who receive treatment are comparable to individuals who did not receive treatment (i.e., we assume that D is as good as randomly assigned and there are no confounders). Part of this assumption is that individuals cannot choose which side of the cutoff they are on, AND the bandwidth is small
204
What do we have to do for the regression discontinuity assumption of comparable treatment and control groups to be believable?
We thus (1) restrict the data to observations with x in a bandwidth around x0, and (2) only assume that, within that bandwidth of x around the threshold, treatment is as good as randomly assigned.
205
Why do we include the terms β1(xi − x0) and β3(xi − x0)·Di in the regression?
They allow for estimating the linear effect of x on the outcome. Excluding them can induce bias: if we only estimated yi = β0 + β2·Di + ui, the estimate of β2 could be contaminated by the association of x with y
206
What does increasing the bandwith do
* more data is used * SEs are smaller when the sample size is larger * it is more likely the assumption is violated
207
What is the trade-off with increasing/decreasing the bandwidth
Increasing the bandwidth increases bias but reduces variance; decreasing the bandwidth decreases bias but increases variance. In practice we use multiple bandwidths and check whether the estimates are sensitive to the choice
208
We include the terms β1(xi − x0) and β3(xi − x0)·Di to capture the linear relationship of x with y. But what if the relationship is nonlinear? (In the referenced figure, the left panel shows the true treatment effect, while the right shows the smaller effect estimated by a linear regression.)
We overcome this by augmenting the regression equation with nonlinear terms, e.g. a quadratic. We need it on both sides of the jump, so it appears twice: yi = β0 + β1(xi − x0) + β2(xi − x0)² + β3·Di + β4(xi − x0)·Di + β5(xi − x0)²·Di + ui
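A simulated sketch of the quadratic specification (the curvature, jump of 1.5, and noise level are made-up illustration values):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000
x0 = 0.0
x = rng.uniform(-1, 1, size=n)
D = (x >= x0).astype(float)
# Curved relationship on both sides of the cutoff, plus a true jump of 1.5
y = (1.0 + 0.5 * x + 2.0 * x**2 + 1.5 * D - 1.0 * x**2 * D
     + rng.normal(scale=0.3, size=n))

xc = x - x0
# Quadratic terms appear twice: once on their own, once interacted with D
X = np.column_stack([np.ones(n), xc, xc**2, D, xc * D, xc**2 * D])
b = np.linalg.lstsq(X, y, rcond=None)[0]
jump = b[3]   # the coefficient on D estimates the discontinuity (~1.5)
```

The interacted quadratic term lets the curvature differ on each side of the cutoff, so the fitted jump is not distorted by the nonlinearity.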
209
How do we know what degree of polynomial to use?
Just graph the data. If it looks linear, use linear. If it looks quadratic, use quadratic. In a small enough bandwidth, by Taylor’s theorem all polynomials can be approximated by a line, so linear is all that is needed.
210
How do we test the RD assumption
1. **Density**: if there are more individuals just above or just below the threshold, it is likely they are choosing their side, so treatment is not as good as randomly assigned. 2. **Covariate values**: if individuals above and below the threshold have similar average values of observable covariates, then they look similar and the assumption is more believable