What are some cultural considerations in test construction and standardization?
1 - Make sure test norms are appropriate for the targeted test-taker population
2 - Understand the culture and time period of the test taker when interpreting results
3 - Take context into account
What is reliability?
The consistency in measurement
I.e. how much of the observed variance is due to actual variance among the true scores
What is a reliability coefficient?
An index of reliability
Ratio between the true score variance and the total variance
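As a rough illustration (all numbers here are hypothetical), the reliability coefficient can be simulated by generating "true" scores, adding random error to get observed scores, and taking the ratio of the variances:

```python
import random

random.seed(0)

# Simulate "true" scores for 1,000 test-takers (mean 100, SD 15),
# then add random measurement error (SD 5) to get observed scores.
true_scores = [random.gauss(100, 15) for _ in range(1000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability coefficient: true score variance / total observed variance.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.9
```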
What is an observed score comprised of?
The “true” score plus error
What is error?
A component of the observed score that doesn’t have to do with the trait being measured
What is measurement error?
Factors associated with the process of measurement that don’t have to do with the variable being measured (i.e. interference)
What are the types of measurement error?
Random: unpredictable fluctuations or inconsistencies (i.e. noise)
Systematic: a constant or proportional bias that pushes scores in the same direction every time
Where might error be introduced into the assessment process?
1 - Construction of a test
2 - Test administration
3 - Interpretation
4 - Sampling
How might error be introduced when designing a test?
Variance existing within the items on the test
I.e. issues with the phrasing, item sampling, content sampling, examples given, etc.
How might error be introduced during test administration?
Setting, materials, environment
Test-taker: sleep, physical discomfort, personal situations
Test examiner: appearance, demeanor, level of training
What is methodological error?
Issues with training, ambiguous wording, biased framing of questions, etc.
Tends to be more systematic than random
How might error be introduced during interpretation?
Subjectivity by assessors (especially during behavioral assessment)
What is sampling error?
When the sample isn’t actually representative of the population
What is Test-Retest reliability?
Having the same test-takers take the same test under two different administrations (with some time interval between them)
When is it appropriate to use test-retest as an estimate of reliability? When is it NOT appropriate?
Appropriate when the variable we’re testing is supposed to be stable over time (ex: personality)
NOT appropriate when the variable is expected to change over time (ex: mood)
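The test-retest estimate is simply the Pearson correlation between the two administrations; a minimal sketch with made-up scores:

```python
# Hypothetical scores from the same 8 test-takers at time 1 and time 2.
time1 = [12, 15, 9, 20, 17, 14, 11, 18]
time2 = [13, 14, 10, 19, 18, 13, 12, 17]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# The correlation IS the test-retest reliability estimate.
print(round(pearson_r(time1, time2), 3))  # → 0.966
```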
What is the coefficient of stability?
A type of test-retest in which the time interval between administrations is 6 months or more
What is Parallel-Forms or Alternate-Forms reliability?
Having the same group of test-takers take two different versions of a test (typically form A and form B, one right after the other)
What is the purpose of Parallel-Forms and Alternate-Forms?
To find the Coefficient of Equivalence, or the degree to which two versions/forms of a test (meant to measure the same construct) are similar or equivalent to each other
When is Parallel-Forms or Alternate-Forms reliability appropriate to use?
When we have two versions of a test created to measure the same construct
What is a Parallel Form? What does this mean for the coefficient of equivalence?
Each version of the test produces EQUAL means and variances
This results in a higher coefficient of equivalence
What is an Alternate Form? What does this mean for the coefficient of equivalence?
Each version of the test has similar item content and difficulty, but they don’t meet the strict requirements for a parallel form (same mean and variance)
This results in a lower coefficient of equivalence
What is Split-Half Reliability?
Divide a single test administration into two halves (e.g. odd- vs. even-numbered items), score each half, then look at the correlation (Pearson r) between the two sets of half-scores
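A minimal sketch with made-up item scores, using a common odd/even split:

```python
# Hypothetical item scores for 5 test-takers on a 6-item test.
items = [
    [3, 4, 2, 5, 3, 4],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
    [4, 3, 4, 4, 3, 3],
]

# Score each half: odd-numbered items (1, 3, 5) vs. even (2, 4, 6).
odd_half = [sum(row[0::2]) for row in items]
even_half = [sum(row[1::2]) for row in items]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Correlation between the two half-scores = split-half reliability.
print(round(pearson_r(odd_half, even_half), 3))  # → 0.813
```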
What is the Spearman-Brown formula?
Used to adjust the half-test correlation upward to estimate the reliability of the full-length test (a half-length test underestimates the reliability of the whole test; different ways of splitting the test yield different correlations, so multiple splits can be compared)
Allows a test developer to estimate the amount of internal consistency of a test and its items
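The standard Spearman-Brown formula for a test lengthened by a factor n is r_new = n·r / (1 + (n − 1)·r); with n = 2 it projects a half-test correlation up to the full-length test. A small sketch:

```python
def spearman_brown(r, length_factor=2):
    """Estimate reliability after changing test length by length_factor.

    With length_factor=2 this is the classic split-half adjustment:
    r_full = 2 * r_half / (1 + r_half).
    """
    return length_factor * r / (1 + (length_factor - 1) * r)

# A half-test correlation of 0.70 projects to a full-test reliability:
print(round(spearman_brown(0.70), 3))  # → 0.824

# The same formula also estimates the effect of tripling a test's length:
print(round(spearman_brown(0.70, length_factor=3), 3))  # → 0.875
```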
When is Split-Half reliability appropriate to use?
When the items on the test are linear and continuous (dichotomous items require a different formula)