inter-rater reliability
-Applies when judgment must be exercised in scoring responses (e.g., WAIS-IV VCI subtests)
If there are more than two raters, take the mean of the correlations for each pair of raters (A & B, B & C, A & C)
inter-rater reliability: categorical decisions
When there is a finite number of categories to which each person being rated can be assigned
Two methods for assessing: percent agreement and chance-corrected agreement (Cohen's kappa)
inter-rater reliability: percent agreement
how to calculate percent agreement
Two raters independently decide whether an item score should be 0 or 1 for N individuals who complete the item
A = number of times item was scored 0 by both #1 and #2
B = number of times item was scored 0 by #1 and 1 by #2
C = number of times item was scored 1 by #1 and 0 by #2
D = number of times item was scored 1 by both #1 and #2
Percent agreement = percentage of cases for which both raters gave the same score (either both 0 or both 1) = (A+D)/N
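A minimal sketch of the calculation, using the A–D cell counts defined above (the counts in the example are hypothetical):

```python
def percent_agreement(A, B, C, D):
    """Proportion of cases where both raters gave the same score (0 or 1)."""
    N = A + B + C + D            # total number of individuals rated
    return (A + D) / N           # agreements: both scored 0, or both scored 1

# Hypothetical counts for 40 individuals:
# both 0: 18; #1=0, #2=1: 4; #1=1, #2=0: 3; both 1: 15
print(percent_agreement(18, 4, 3, 15))  # (18 + 15) / 40 = 0.825
```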
calculating chance agreement for a score = 0
Total scores of 0 given for Rater #1 = A + B
Total scores of 0 given for Rater #2 = A + C
Proportion of cases given a score of 0 by Rater #1 = (A+B)/N
Proportion of cases given a score of 0 by Rater #2 = (A+C)/N
Chance agreement for a score of 0 =
(A+B)/N times (A+C)/N
calculating chance agreement for score = 1
Total scores of 1 given for Rater #1 = C + D
Total scores of 1 given for Rater #2 = B + D
Proportion of cases given a score of 1 by Rater #1 = (C+D)/N
Proportion of cases given a score of 1 by Rater #2 = (B+D)/N
Chance agreement for a score of 1 =
(C+D)/N times (B+D)/N
calculating total chance agreement
Add the chance agreement for a score of 0 to the chance agreement for a score of 1
(A+B)/N times (A+C)/N
PLUS
(C+D)/N times (B+D)/N
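The chance-agreement steps can be combined in one function, and the same cell counts then give chance-corrected agreement (Cohen's kappa, which these chance-agreement terms feed into). Counts are hypothetical:

```python
def chance_agreement(A, B, C, D):
    """Total chance agreement: P(both 0 by chance) + P(both 1 by chance)."""
    N = A + B + C + D
    p0 = ((A + B) / N) * ((A + C) / N)   # chance agreement for a score of 0
    p1 = ((C + D) / N) * ((B + D) / N)   # chance agreement for a score of 1
    return p0 + p1

def kappa(A, B, C, D):
    """Cohen's kappa: agreement beyond chance, scaled by its maximum."""
    po = (A + D) / (A + B + C + D)       # observed percent agreement
    pc = chance_agreement(A, B, C, D)
    return (po - pc) / (1 - pc)

print(round(chance_agreement(18, 4, 3, 15), 4))  # 0.5025
print(round(kappa(18, 4, 3, 15), 3))             # 0.648
```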
implications of reliability
standard error of measurement
- The SEM permits us to estimate how much error is likely to be present in an individual examinee’s score
SEM in words
Multiply the standard deviation of the test (SD) by the square root of 1 minus the reliability (rtt)
SEM and reliability
The higher the reliability, the smaller the SEM; a perfectly reliable test (rtt = 1) would have an SEM of 0
standard error of measurement according to classical reliability theory
-Error is normally distributed around a mean of 0
-SEM = the standard deviation of the distribution of error scores
Using the probabilities associated with the normal curve
estimating error
We can use the SEM to make probability statements about the amount of error associated with an observed score
NOTE: To do this accurately, we have to use the exact values rather than the “approximate” values we used in Chapter 1 of the Manual
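A sketch of such a probability statement, using the classical formula SEM = SD √(1 − rtt) and the exact normal-curve value 1.96 for 95% (the SD and reliability values here are hypothetical):

```python
import math

def sem(sd, rtt):
    """Standard error of measurement: the SD of the error-score distribution."""
    return sd * math.sqrt(1 - rtt)

# Hypothetical test with SD = 15 and reliability = .90:
s = sem(15, 0.90)                        # ~4.74
# ~95% of the time, measurement error falls within ±1.96 SEM of zero,
# so a ±1.96 SEM band around an observed score of 100 is:
lower, upper = 100 - 1.96 * s, 100 + 1.96 * s
print(round(lower, 1), round(upper, 1))  # 90.7 109.3
```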
confidence intervals for estimated true score
We can also construct confidence intervals around the estimated true score
-We can’t know the actual true score, but we can estimate it.
These confidence intervals tell us the range in which the person’s true score is likely to fall with a specified degree of certainty (probability)
These are the CI’s that are given in the table in the WAIS-IV Manual
Step 1. Calculate the estimated true score
Step 2. Calculate the standard error of estimate
Step 3. Calculate the desired confidence interval
estimating the true score formula in words
Step 1. Subtract the Mean (M) from the observed score (Xo)
Step 2. Multiply Step 1 by the reliability of the test (rtt)
Step 3. Add the Mean to Step 2.
standard error of estimate
Standard Error of Estimate (SEE) = SEM times the square root of the reliability (SEE = SEM √rtt)
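The three steps (estimated true score, SEE, confidence interval) can be sketched together, using the classical SEE = SEM × √rtt. M = 100 and SD = 15 mirror WAIS-IV composite scaling; the observed score and reliability are hypothetical:

```python
import math

def true_score_ci(x_obs, mean, sd, rtt, z=1.96):
    """95% confidence interval around the estimated true score."""
    t_hat = mean + rtt * (x_obs - mean)       # Step 1: estimated true score
    sem = sd * math.sqrt(1 - rtt)             # SD of the error distribution
    see = sem * math.sqrt(rtt)                # Step 2: standard error of estimate
    return t_hat - z * see, t_hat + z * see   # Step 3: CI around the estimate

# Observed score of 120 with M = 100, SD = 15, rtt = .90:
lo, hi = true_score_ci(120, 100, 15, 0.90)
print(round(lo, 2), round(hi, 2))  # 109.18 126.82 -- asymmetrical around 120
```

Note that the interval is centered on the estimated true score (118), not the observed score (120), which is why it sits asymmetrically around the obtained score.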
CI’s around estimated score….
…..will sometimes be asymmetrical around the obtained score
Reason: regression towards the mean
difference between estimated true scores and observed scores
GREATER when
-Reliability is LOWER
-Observed Score is farther from the Mean
LESS when
-Reliability is HIGHER
-Observed Score is closer to the Mean
standard error of difference
how to find SED in words
Square the SEM of each score, add the two squared SEMs, and take the square root: SED = √(SEM1² + SEM2²)
The SED will always be larger than the larger of the two SEMs
using SED
Compare the observed difference between two scores to the SED: the difference is considered significant (p < .05) if it exceeds 1.96 times the SED
example of using SED
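A sketch of the usual significance check (the difference must exceed z × SED; the scores and SEMs here are hypothetical, not WAIS-IV table values):

```python
import math

def sed(sem1, sem2):
    """Standard error of the difference between two scores."""
    return math.sqrt(sem1 ** 2 + sem2 ** 2)

def difference_is_significant(score1, score2, sem1, sem2, z=1.96):
    """True if the observed difference exceeds z times the SED (p < .05)."""
    return abs(score1 - score2) > z * sed(sem1, sem2)

# Hypothetical: two index scores of 112 and 101 with SEMs of 3.0 and 4.0
print(sed(3.0, 4.0))                                  # 5.0 -- larger than either SEM
print(difference_is_significant(112, 101, 3.0, 4.0))  # True: 11 > 1.96 * 5.0 = 9.8
```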
SED and WAIS-IV
For most purposes, the differences given in the WAIS-IV Interpretation Manual are sufficient