What is reliability?
The consistency or reproducibility of test scores
- does my test give me the same, accurate measurement each time?
Test score theory
Every person has a true score that we can measure, but no test is free from error
X = T + e (X = observed score, T = true score, e = error)
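A minimal sketch of the X = T + e model in Python; the true score of 50 and the size of the error term are hypothetical numbers chosen for illustration:

```python
import random

random.seed(0)  # reproducible example

def observed_score(true_score, error_sd=2.0):
    # X = T + e: the observed score is the true score plus random error
    return true_score + random.gauss(0, error_sd)

# One person with a true score of 50, tested five times
scores = [observed_score(50) for _ in range(5)]
# Each administration yields a different observed score, but all of
# them scatter around the unchanging true score of 50
```
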
Classical test theory: 4 assumptions
Classical test theory: the domain sampling model
If we construct a test on something, we can’t ask all possible questions
Formula: reliability = variance of true scores / variance of observed scores
As the sample gets larger, estimate is more accurate
Other things can affect performance…
Types of reliability
Test-retest reliability
Source of error in test-retest reliability
time sampling
Issues with test-retest reliability
Can we use it when measuring things like mood, stress, etc.?
Won’t the person’s score increase the 2nd time because of practice effect?
What if we want to measure changes between 1st and 2nd administration?
Can the actual experience of being tested change the thing being tested?
What if some event happens in between the 1st and 2nd administration to change the thing being tested?
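Despite these caveats, the coefficient itself is just a Pearson correlation between the two administrations; a sketch with made-up scores for eight people:

```python
from statistics import mean

def pearson(x, y):
    # Pearson product-moment correlation between two score lists
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical scores for the same eight people at two testing times
time1 = [12, 15, 9, 20, 17, 11, 14, 18]
time2 = [13, 14, 10, 19, 18, 12, 13, 17]

r_test_retest = pearson(time1, time2)  # high r = stable scores over time
```
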
Parallel forms reliability
Parallel forms reliability- source of error
item sampling
Parallel forms reliability - ways to change the form of the test
Parallel forms reliability: issues
What if we give the different forms to people at two different times?
Do we give the different forms to the same people, or different people?
What if people work out how to answer the one form from doing the other form?
Difficult to generate a big enough item pool
Internal consistency reliability
Do the different items within one test all measure the same thing to the same extent?
I.e., Are items within a single test highly correlated?
Split-half reliability
Coefficient alpha
Internal consistency reliability: source of error
- internal consistency/reliability of one test administered on one occasion
Split-half reliability
A test is split in half
Each half scored separately
Total scores for each half correlated
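The three steps above, sketched with hypothetical 1/0 (right/wrong) responses for six people on an 8-item test; an odd/even split is one common way to form the halves:

```python
from statistics import mean

def pearson(x, y):
    # Pearson correlation between two score lists
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical right/wrong (1/0) responses: 6 people x 8 items
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
]

# 1. Split the test in half (odd- vs even-numbered items)
# 2. Score each half separately
half_a = [sum(person[0::2]) for person in responses]
half_b = [sum(person[1::2]) for person in responses]

# 3. Correlate the total scores of the two halves
r_hh = pearson(half_a, half_b)
```
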
Split-half reliability- advantage
Split-half reliability- disadvantage
-challenging to divide the test into equal halves
Spearman-Brown correction
Solves the problem that each half-test, being shorter, has lower reliability than the full-length test
r(sb) = 2r(hh) / (1 + r(hh))
r(sb) = predicted reliability of the full-length test
r(hh) = reliability of the current half-test (correlation between halves)
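The correction is a one-line function; the half-test correlation of 0.70 below is just an illustrative number:

```python
def spearman_brown(r_hh):
    # r(sb) = 2 * r(hh) / (1 + r(hh)):
    # predicted reliability of the full-length test from the
    # correlation between its two halves
    return 2 * r_hh / (1 + r_hh)

# If the two halves correlate at 0.70, the full-length test is
# predicted to be more reliable than either half alone
r_full = spearman_brown(0.70)  # 1.4 / 1.7, roughly 0.82
```
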
Split-half reliability: issues
Example: we have a test of 20 items, split it in half, and correlate the two halves
This is similar to having 2 tests of 10 items each
The fewer items we have, the lower our reliability
Does it matter how we split? Yes – we will get a different reliability coefficient for each different split
Ideally the halves should be equivalent
Coefficient (Cronbach's) alpha
Takes the average of all possible split-half correlations for a test
alpha = kr / (1 + (k - 1)r)
k = number of items
r = mean inter-item correlation
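This version of the formula (based on the mean inter-item correlation) is easy to compute directly; the 10 items and mean correlation of 0.30 are hypothetical values:

```python
def standardized_alpha(k, r_bar):
    # alpha = k * r / (1 + (k - 1) * r), where r is the mean
    # inter-item correlation and k is the number of items
    return k * r_bar / (1 + (k - 1) * r_bar)

# 10 items whose mean inter-item correlation is 0.30
alpha = standardized_alpha(10, 0.30)  # 3.0 / 3.7, roughly 0.81
```
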
cronbach’s alpha
-number of items
Rapid increase in internal consistency reliability from 2 to 10 items
Steady increase from 11 to 30
Tapers off after about 40 items
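That pattern falls straight out of the alpha formula: holding the mean inter-item correlation fixed (0.25 here, an arbitrary illustrative value) and varying only the number of items k shows the steep early gains and the later flattening.

```python
def standardized_alpha(k, r_bar=0.25):
    # alpha = k * r / (1 + (k - 1) * r)
    return k * r_bar / (1 + (k - 1) * r_bar)

alphas = {k: round(standardized_alpha(k), 2) for k in (2, 10, 30, 40, 60)}
# Gains are large from 2 to 10 items, smaller from 10 to 30,
# and nearly flat beyond about 40 items
```
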
Interpreting Cronbach's alpha