Item difficulty
P = (number of examinees passing the item) / (total number of examinees)
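As a quick sketch, the difficulty index p can be computed directly from dichotomous item responses (the data below are invented for illustration):

```python
# Item difficulty: proportion of examinees who passed the item.
# responses: 1 = pass, 0 = fail (illustrative data)
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

p = sum(responses) / len(responses)
print(p)  # 0.7 -> a moderately easy item
```

Higher p means an easier item; values near .50 tend to maximize discrimination.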
Item discrimination
refers to the extent to which an item differentiates between examinees
who obtain low or high scores on the test or an external criterion
*Symbolized by the letter “D”
*Ranges from -1 to +1
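One common way to compute D is to compare the proportion passing the item in the highest- and lowest-scoring groups (often the top and bottom 27%; this sketch uses halves, and the data are invented):

```python
# Discrimination index: D = p_upper - p_lower, where the groups are the
# examinees with the highest and lowest total test scores.
# Each tuple: (total test score, item response: 1 = pass, 0 = fail)
examinees = [(95, 1), (90, 1), (88, 1), (85, 0),
             (60, 0), (55, 1), (50, 0), (45, 0)]

examinees.sort(key=lambda e: e[0], reverse=True)
n_group = len(examinees) // 2          # top and bottom halves here
upper = examinees[:n_group]
lower = examinees[-n_group:]

p_upper = sum(item for _, item in upper) / n_group
p_lower = sum(item for _, item in lower) / n_group
D = p_upper - p_lower
print(D)  # positive D: high scorers pass the item more often than low scorers
```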
Internal consistency reliability
*Indicates the degree of consistency across different test items
* is useful for estimating the reliability of tests that measure characteristics that fluctuate over time or are susceptible to memory or practice effects
Cronbach’s coefficient alpha
Mean of all possible split-half correlation coefficients
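The card describes alpha conceptually (mean of all possible split-half coefficients); computationally it is usually obtained from item and total-score variances. A minimal sketch with an invented item-score matrix:

```python
# Cronbach's alpha via the variance formula:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
# Illustrative matrix: rows = examinees, columns = items
scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [3, 3, 2, 3],
    [5, 4, 5, 4],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

k = len(scores[0])
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))
```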
Inter-rater reliability
Used for measures that are subjectively scored (e.g., essays and projective tests)
* uses kappa statistic
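Cohen's kappa corrects the raters' raw agreement rate for agreement expected by chance. A sketch with invented ratings from two raters:

```python
# Cohen's kappa for two raters assigning categorical scores:
# kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
# p_e is chance agreement from each rater's marginal proportions.
rater_a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

n = len(rater_a)
categories = set(rater_a) | set(rater_b)

p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 3))  # 0 = chance-level agreement, 1 = perfect agreement
```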
Test-retest reliability
*administering the same test to the same examinees on two occasions and correlating the two sets of scores
*appropriate for determining reliability of tests designed to measure attributes that are relatively stable over time and that are not affected by repeated measurement.
Factors that affect the reliability coefficient
*test length (longer tests are generally more reliable)
*range of scores (greater heterogeneity of examinees yields a higher coefficient)
*guessing (more guessing lowers reliability)
standard error of measurement (SEM)
index of the amount of error that can be expected in obtained scores due to the unreliability of the test.
Confidence interval
helps a test user estimate the range within which an examinee’s true score is likely to fall given his or her obtained score. This range is calculated using the standard error of measurement
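The two cards above combine as follows: SEM = SD × √(1 − r), and a confidence interval is built around the obtained score using SEM. A sketch with illustrative values (SD and reliability are invented):

```python
import math

# SEM = SD * sqrt(1 - r), where SD is the standard deviation of test scores
# and r is the test's reliability coefficient (illustrative values)
sd = 15.0     # e.g., a deviation-IQ-style scale
r = 0.91      # reliability coefficient

sem = sd * math.sqrt(1 - r)

# 95% confidence interval around an obtained score: score +/- 1.96 * SEM
obtained = 110
lower = obtained - 1.96 * sem
upper = obtained + 1.96 * sem
print(round(sem, 2), round(lower, 1), round(upper, 1))
```

The higher the reliability, the smaller the SEM and the narrower the interval.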
Classical test theory
Observed-score variability is a combination of true-score variance and random measurement error (X = T + E)
Kuder-Richardson formula 20 (KR-20)
Does same as Cronbach’s coefficient alpha, but is used as a substitute when test items are scored dichotomously
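For dichotomous items, KR-20 replaces the item variances in the alpha formula with p·q for each item. A sketch with an invented pass/fail matrix:

```python
# KR-20 for dichotomously scored items:
# KR20 = k/(k-1) * (1 - sum(p_i * q_i) / var(total scores))
# Illustrative matrix: rows = examinees, columns = items (1 = pass, 0 = fail)
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

k = len(scores[0])
pq = 0.0
for i in range(k):
    p = sum(row[i] for row in scores) / len(scores)  # item difficulty
    pq += p * (1 - p)                                # p * q for this item

total_var = variance([sum(row) for row in scores])
kr20 = (k / (k - 1)) * (1 - pq / total_var)
print(round(kr20, 3))
```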
Validity
Accuracy in terms of the extent to which the test measures what it was designed to measure
* 3 C’s : content, construct, and criterion-related
Content validity
*assesses how well a test samples a particular content area
* Built into the test
* is of concern when test is designed to measure a content or behavior domain
Construct validity
*assesses the extent to which a test measures the theoretical construct it is intended to measure
Two subtypes:
*convergent validity
*discriminant validity
Criterion validity (rxy)
*assesses how well a test score can be used to predict or estimate criterion outcome
*scores range from -1.0 to +1.0
*Square rxy to get the coefficient of determination: the proportion of variance in the criterion accounted for by the test
Two subtypes:
*concurrent validity
*predictive validity
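The validity coefficient rxy is simply the Pearson correlation between test scores and criterion outcomes, and rxy² is the shared variance. A sketch with invented predictor and criterion data:

```python
import math

# Criterion-related validity coefficient r_xy: Pearson correlation between
# test scores (predictor) and criterion outcomes (illustrative data).
# Squaring r_xy gives the proportion of criterion variance explained.
test_scores = [70, 75, 80, 85, 90, 95]
criterion   = [2.1, 2.4, 2.9, 3.0, 3.4, 3.6]  # e.g., performance ratings

n = len(test_scores)
mx = sum(test_scores) / n
my = sum(criterion) / n

cov = sum((x - mx) * (y - my) for x, y in zip(test_scores, criterion))
sx = math.sqrt(sum((x - mx) ** 2 for x in test_scores))
sy = math.sqrt(sum((y - my) ** 2 for y in criterion))

r_xy = cov / (sx * sy)
print(round(r_xy, 3), round(r_xy ** 2, 3))
```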
Taylor-Russell tables
Related to hiring decisions
*complete set of tables that provide a measure of incremental validity
*indicate how much better an organization's hiring decisions would be if it added a predictor test, given the test's validity coefficient, the base rate, and the selection ratio
Incremental validity
Amount of improvement in success rate that results from using a predictor test
Convergent validity
*High correlations with other measures of the same or related constructs
*Want high convergent validity
Standard scores
Indicate an examinee’s relative standing in a comparison group
Z score distribution
*Mean of 0, standard deviation of 1
T score
*Mean of 50, standard deviation of 10 (T = 10z + 50)
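Converting a raw score to a z score and then to a T score takes one line each (the raw score, mean, and SD below are invented):

```python
# Standard scores: z = (X - mean) / SD puts scores on a mean-0, SD-1 scale;
# T = 10z + 50 puts them on a mean-50, SD-10 scale (illustrative values)
raw = 65
mean = 50
sd = 10

z = (raw - mean) / sd
t = 10 * z + 50
print(z, t)  # 1.5 65.0
```

A score 1.5 SDs above the mean is z = 1.5, which corresponds to T = 65.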