4 different forms of evaluation:
Why evaluate? Objectives:
The logical framework/model of evaluation:
Different levels of program evaluation:
Theory of change:
A ToC analyses how inputs lead to the intended outcomes/impacts. It identifies the causal steps, which underlying assumptions need to hold, what data we need, etc.
Different types of correlation:
Counterfactual:
We need a group of people that tells us what would have been the case if we had NOT implemented the program. This cannot be done 100%, since we don't have two identical worlds. But we do our best to find a good enough counterfactual so that we can measure the impact (the difference between T and C). This helps us establish causality.
What is the basic formula for measuring impacts?
For a single unit i, the impact is the difference between its outcome with the program and its outcome without it:
Yi(1) - Yi(0)
But we can never observe the same unit in both states at once, so we must take the average impact instead:
E(Yi(1)) - E(Yi(0))
So this is the expected value for the T group minus the expected value for the C group.
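A minimal sketch of the average-impact formula above, estimated as a difference in group means. The function name `estimate_ate` and the outcome numbers are hypothetical, made up purely for illustration.

```python
from statistics import mean

def estimate_ate(treated_outcomes, control_outcomes):
    """Estimate E(Y(1)) - E(Y(0)) as the difference in group means."""
    return mean(treated_outcomes) - mean(control_outcomes)

# Hypothetical outcomes (illustrative numbers only)
treated = [12.0, 15.0, 11.0, 14.0]   # observed Y for the T group
control = [10.0, 11.0, 9.0, 10.0]    # observed Y for the C group

ate_hat = estimate_ate(treated, control)
print(ate_hat)  # 13.0 - 10.0 = 3.0
```

This only equals the true average impact when the C group is a valid counterfactual for the T group.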
What is the bias of the impact measurement?
The bias is:
E(Y(0)|T) - E(Y(0)|C).
So it's the difference between the outcome the treatment group would have had without the treatment and the outcome of the control group, which by definition does not receive the treatment.
If we have a perfect counterfactual, this bias=0.
This bias arises because we cannot observe Y(0) for the treated, so we have to estimate the ATE by comparing the observed outcomes of T and C.
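The bias term E(Y(0)|T) - E(Y(0)|C) can be illustrated with a toy simulation in which, unlike in real data, both potential outcomes are known for every unit. All numbers and function names below are hypothetical; the sketch only shows that the naive T-versus-C comparison equals the effect on the treated plus the bias.

```python
from statistics import mean

# Hypothetical units: (y0, y1, group). In real data only one of y0/y1 is observed.
units = [
    (10.0, 13.0, "T"),
    (12.0, 15.0, "T"),
    (6.0, 9.0, "C"),
    (8.0, 11.0, "C"),
]

def selection_bias(units):
    """E(Y(0)|T) - E(Y(0)|C)."""
    y0_t = mean(y0 for y0, _, g in units if g == "T")
    y0_c = mean(y0 for y0, _, g in units if g == "C")
    return y0_t - y0_c

def effect_on_treated(units):
    """E(Y(1) - Y(0) | T), computable here only because y0 is simulated."""
    return mean(y1 - y0 for y0, y1, g in units if g == "T")

def naive_difference(units):
    """E(Y(1)|T) - E(Y(0)|C): what we can actually compute from observed data."""
    y1_t = mean(y1 for _, y1, g in units if g == "T")
    y0_c = mean(y0 for y0, _, g in units if g == "C")
    return y1_t - y0_c

bias = selection_bias(units)      # 11.0 - 7.0 = 4.0
effect = effect_on_treated(units) # 3.0
naive = naive_difference(units)   # 14.0 - 7.0 = 7.0 = effect + bias
```

Here the treated group would have been better off than the control group even without the program, so the naive comparison overstates the impact.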
3 techniques for impact evaluation:
Random sampling and assignment:
When we randomly select a sample from a population and then randomly assign some of the units in the sample to the T group and the rest to the C group.
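The two randomisation steps can be sketched as follows. The function name `sample_and_assign`, the population of integer IDs, and the group sizes are all assumptions chosen for illustration.

```python
import random

def sample_and_assign(population, sample_size, n_treated, seed=0):
    """Draw a random sample, then randomly split it into T and C groups."""
    rng = random.Random(seed)                     # fixed seed for reproducibility
    sample = rng.sample(population, sample_size)  # step 1: random sampling
    rng.shuffle(sample)                           # step 2: random assignment
    return sample[:n_treated], sample[n_treated:]

# Hypothetical population of 1000 unit IDs
population = list(range(1000))
treatment, control = sample_and_assign(population, sample_size=100, n_treated=50)
```

Because both steps are random, neither the units nor the evaluator can influence who ends up in T or C.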
RCT
Randomised controlled trial. By using random sampling and assignment, we create a relevant comparison group.
There shouldn't be any systematic differences between the groups, i.e. no bias, so T and C would have the same outcome Y in the absence of the program.
Is it ethical to randomise?
Not always. If the program involves large benefits for the treated, then why should my neighbour get those benefits but not me? Just by luck? If we had the chance to prove who needed it the most, maybe it would have been me. But self-selection destroys the properties of a relevant counterfactual.
ATE=
Average treatment effect
Issues with RCT:
PSM
Propensity score matching. Find a group that is similar to the treated in its observable characteristics, and assume that the unobservables are also similar across treated and untreated.
When use PSM?
When an RCT is not possible, e.g. in ex-post situations where the program has already been implemented, or when an RCT is too expensive.
PSM method’s 3 steps:
PSM issues:
- Strict assumption that unobservables also are similar.
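A rough sketch of the matching idea, assuming the propensity scores have already been estimated elsewhere (for example with a logistic regression on the observables). The function `match_on_propensity`, the nearest-neighbour-with-replacement rule, and all numbers are illustrative assumptions, not the method prescribed in these notes.

```python
from statistics import mean

def match_on_propensity(treated, controls):
    """Match each treated unit to the control with the closest propensity score.

    treated, controls: lists of (propensity_score, outcome) pairs.
    Returns the mean outcome difference across matched pairs.
    """
    effects = []
    for score_t, y_t in treated:
        # Nearest neighbour on the score; a control may be reused (with replacement)
        _, y_c = min(controls, key=lambda c: abs(c[0] - score_t))
        effects.append(y_t - y_c)
    return mean(effects)

# Hypothetical (score, outcome) pairs
treated = [(0.8, 20.0), (0.6, 18.0)]
controls = [(0.75, 16.0), (0.55, 15.0), (0.2, 9.0)]

effect_hat = match_on_propensity(treated, controls)
print(effect_hat)  # (20 - 16 + 18 - 15) / 2 = 3.5
```

The control with score 0.2 is never used, since no treated unit resembles it. The estimate is only credible if the unobservables really are similar within matched pairs.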
Difference-in-difference
Used when no random sample is available, e.g. because the program was aimed at helping a certain group.
Diff-in-diff ATE=
[E(Yt1|T) - E(Yt0|T)] - [E(Yc1|C) - E(Yc0|C)]
So simply the difference between the two groups' changes over time.
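The diff-in-diff formula can be computed directly from four groups of observations. This sketch assumes the subscripts 0 and 1 denote before and after the program; the function name `diff_in_diff` and the data are hypothetical.

```python
from statistics import mean

def diff_in_diff(t_before, t_after, c_before, c_after):
    """[E(Y_after|T) - E(Y_before|T)] - [E(Y_after|C) - E(Y_before|C)]."""
    change_t = mean(t_after) - mean(t_before)   # change over time in the T group
    change_c = mean(c_after) - mean(c_before)   # change over time in the C group
    return change_t - change_c

# Hypothetical outcomes (illustrative numbers only)
did = diff_in_diff(
    t_before=[10.0, 12.0], t_after=[16.0, 18.0],   # T changes by 6
    c_before=[8.0, 10.0],  c_after=[10.0, 12.0],   # C changes by 2
)
print(did)  # 6.0 - 2.0 = 4.0
```

Note that the groups start at different levels (11 vs 9); DiD removes that fixed gap and keeps only the difference in changes.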
Key assumption of DiD
Parallel trends: the groups have the same pace of change before the program starts, because then we can assume that if the treated group had NOT had the program, it would have kept changing at the same pace as the untreated group.
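One informal way to probe the parallel-trends assumption is to compare how the two groups changed across the pre-program periods. The helper `pre_trend_gap` and the period means below are hypothetical; a gap near zero is consistent with parallel trends but does not prove that they would have continued after the program started.

```python
from statistics import mean

def pre_trend_gap(t_pre, c_pre):
    """Difference in average period-to-period change before the program.

    t_pre, c_pre: group mean outcomes per pre-program period, in time order.
    """
    t_changes = [later - earlier for earlier, later in zip(t_pre, t_pre[1:])]
    c_changes = [later - earlier for earlier, later in zip(c_pre, c_pre[1:])]
    return mean(t_changes) - mean(c_changes)

# Hypothetical pre-program group means over three periods
gap = pre_trend_gap(t_pre=[10.0, 11.0, 12.0], c_pre=[7.0, 8.0, 9.0])
print(gap)  # both groups rise by 1 per period, so the gap is 0.0
```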
Issues of DiD