What is the first step in test construction?
Specifying the test’s purpose
What does item analysis evaluate in test development?
The effectiveness of test items
What is the focus of Classical Test Theory?
Test-level information rather than item-level information
How is item difficulty (p-value) calculated?
p = # of Examinees Answering the Item Correctly ÷ Total # of Examinees
What does a high p-value (e.g., 0.85) indicate about an item?
The item is easy
What does a low p-value (e.g., 0.15) indicate about an item?
The item is difficult
What is the interpretation range for item difficulty?
0.00–1.00
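A minimal sketch of the p-value calculation above, using a hypothetical response vector (1 = correct, 0 = incorrect):

```python
# Hypothetical responses of 10 examinees to a single item (1 = correct, 0 = incorrect)
responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

# p = number answering the item correctly / total number of examinees
p_value = sum(responses) / len(responses)

print(p_value)  # 0.8 -> a relatively easy item
```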
What is item discrimination?
The ability of a test item to distinguish between high and low scorers
What formula represents the Discrimination Index (D)?
D = U – L, where U and L are the proportions of the upper- and lower-scoring groups answering the item correctly
What does a high discrimination index indicate?
High scorers are more likely to answer the item correctly than low scorers
What is the range for interpreting discrimination values?
–1.00 to +1.00
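The D = U – L formula can be sketched with hypothetical item responses for the upper- and lower-scoring groups (often taken as the top and bottom 27% of total-score rankings, an assumption here):

```python
# Hypothetical item responses (1 = correct, 0 = incorrect)
upper_group = [1, 1, 1, 1, 0]  # high total-score examinees
lower_group = [1, 0, 0, 0, 0]  # low total-score examinees

U = sum(upper_group) / len(upper_group)  # proportion correct, upper group
L = sum(lower_group) / len(lower_group)  # proportion correct, lower group
D = U - L                                # discrimination index

print(D)  # 0.6 -> item favors high scorers, good discrimination
```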
What is Item Response Theory (IRT)?
A theory that models how individual test items function in relation to a person’s latent trait
What does the Item Characteristic Curve (ICC) depict?
How the probability of a correct response changes with examinee ability
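One common way to model an ICC is the two-parameter logistic (2PL) IRT model, sketched below; the parameter names `a` (discrimination) and `b` (difficulty) follow standard IRT notation, not the source:

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct response under a 2PL IRT model.

    theta: examinee ability (latent trait)
    a: item discrimination
    b: item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When ability equals item difficulty, the probability of success is 0.5
print(icc_2pl(theta=0.0, a=1.0, b=0.0))  # 0.5
```

Plotting `icc_2pl` across a range of `theta` values traces the characteristic S-shaped curve: probability rises with ability.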
What is the advantage of IRT over Classical Test Theory?
Item parameters remain consistent (invariant) across different examinee samples
What does reliability reflect in test scores?
Variability in test scores and consistency of measurement
What does the formula X = T + E represent?
Observed Score (X) = True Score (T) + Error (E)
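The X = T + E relationship can be illustrated with a small simulation: random error (mean zero, hypothetical standard deviation) is added to a fixed true score, and the observed scores average back toward the true score:

```python
import random

random.seed(0)  # reproducible hypothetical data

true_score = 50          # T: the examinee's true score
error_sd = 2.0           # assumed standard deviation of measurement error

# X = T + E for 1000 hypothetical administrations
observed = [true_score + random.gauss(0, error_sd) for _ in range(1000)]

mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 2))  # close to 50: error averages out
```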
What is the purpose of the reliability coefficient?
To estimate how consistently a test measures a construct
What does a reliability coefficient of 0.85 indicate?
85% of the variability in test scores is due to true differences among examinees; the remaining 15% is attributable to measurement error
What is the Test-Retest method for estimating reliability?
The same test is administered twice to the same group, and the two sets of scores are correlated
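A sketch of the test-retest approach with hypothetical scores from two administrations, using a Pearson correlation as the reliability estimate:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for 5 examinees at time 1 and time 2
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 17, 21]

reliability = pearson_r(time1, time2)
print(round(reliability, 2))  # high correlation -> stable scores over time
```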
What is Split-half reliability?
Dividing the test into two halves and correlating the scores on the halves
What does Cronbach’s coefficient alpha measure?
Internal consistency of items measuring a single construct
What is the Kuder-Richardson Formula 20 (KR-20)?
A variation of coefficient alpha for dichotomously scored items
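Coefficient alpha can be sketched directly from its standard formula, α = (k / (k − 1)) × (1 − Σ item variances / total-score variance); on dichotomously scored (0/1) data this computation reproduces KR-20. The data below are hypothetical:

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list of scores per item, aligned by examinee."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-examinee total score
    sum_item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Hypothetical dichotomous responses: 3 items x 5 examinees
items = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 1, 0],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))  # internal-consistency estimate for these items
```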
What is the purpose of inter-rater reliability?
To ensure consistent scoring across different evaluators
What is the Kappa statistic used for?
Measuring inter-rater reliability for nominal or ordinal scales
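Cohen's kappa corrects observed rater agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A sketch with hypothetical nominal ratings from two raters:

```python
def cohen_kappa(ratings1, ratings2):
    """Cohen's kappa for two raters' nominal ratings of the same cases."""
    n = len(ratings1)
    categories = set(ratings1) | set(ratings2)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # Chance agreement from each rater's marginal category proportions
    p_e = sum(
        (ratings1.count(c) / n) * (ratings2.count(c) / n) for c in categories
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of 6 cases by two raters
rater1 = ["yes", "yes", "no", "yes", "no", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes"]

kappa = cohen_kappa(rater1, rater2)
print(round(kappa, 3))  # agreement beyond chance; 1.0 would be perfect
```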