Test Construction Flashcards

(85 cards)

1
What is the first step in test construction?
Specifying the test’s purpose
2
What does item analysis evaluate in test development?
The effectiveness of test items
3
What is the focus of Classical Test Theory?
Test-level information rather than item-level information
4
How is item difficulty (p-value) calculated?
p = # of examinees answering the item correctly ÷ total # of examinees
5
What does a high p-value (e.g., 0.85) indicate about an item?
The item is easy
6
What does a low p-value (e.g., 0.15) indicate about an item?
The item is difficult
7
What is the interpretation range for item difficulty?
0.00–1.00
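The p-value arithmetic from the cards above can be sketched in Python; the item responses here are hypothetical, made up for illustration:

```python
# Item difficulty: proportion of examinees answering the item correctly.
# 1 = correct, 0 = incorrect (hypothetical responses for one item).
responses = [1, 1, 1, 0, 1, 0, 1, 1, 1, 0]

p = sum(responses) / len(responses)  # 7 correct out of 10 examinees

# Interpretation range is 0.00-1.00: a high p (e.g., 0.85) means an easy
# item; a low p (e.g., 0.15) means a difficult item.
assert 0.0 <= p <= 1.0
```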
8
What is item discrimination?
The ability of a test item to distinguish between high and low scorers
9
What formula represents the Discrimination Index (D)?
D = U – L, where U and L are the proportions of the upper- and lower-scoring groups answering the item correctly
10
What does a high discrimination index indicate?
High scorers are more likely to answer the item correctly than low scorers
11
What is the range for interpreting discrimination values?
–1.0 to +1.0
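The discrimination index D = U – L can be sketched as follows; the group responses are hypothetical:

```python
# Discrimination index: D = U - L, where U and L are the proportions of
# the upper- and lower-scoring groups that answered the item correctly.
upper_correct = [1, 1, 1, 1, 0]   # hypothetical upper-scoring group
lower_correct = [1, 0, 0, 0, 0]   # hypothetical lower-scoring group

U = sum(upper_correct) / len(upper_correct)
L = sum(lower_correct) / len(lower_correct)
D = U - L

# D ranges from -1.0 to +1.0; a high positive D means high scorers
# outperform low scorers on the item.
assert -1.0 <= D <= 1.0
```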
12
What is Item Response Theory (IRT)?
A theory that models how individual test items function in relation to a person’s latent trait
13
What does the Item Characteristic Curve (ICC) depict?
How the probability of a correct response changes with ability
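An ICC can be sketched with a two-parameter logistic model; the discrimination (a) and difficulty (b) values below are illustrative assumptions, not from the cards:

```python
import math

def icc(theta, a=1.0, b=0.0):
    """Two-parameter logistic ICC: probability of a correct response as a
    function of ability (theta), item discrimination (a), and item
    difficulty (b). Parameter values are illustrative."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The probability of a correct response rises with ability:
p_low = icc(-2.0)    # examinee well below the item's difficulty
p_high = icc(+2.0)   # examinee well above the item's difficulty
```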
14
What is the advantage of IRT over Classical Test Theory?
Item parameters are sample-invariant: items behave consistently across different examinee populations
15
What does reliability reflect in test scores?
Variability in test scores and consistency of measurement
16
What does the formula X = T + E represent?
Observed Score (X) = True Score (T) + Error (E)
17
What is the purpose of the reliability coefficient?
To estimate how consistently a test measures a construct
18
What does a reliability coefficient of 0.85 indicate?
85% of the variability in test scores is due to true differences
19
What is the Test-Retest method for estimating reliability?
The same test is administered twice and the two sets of scores are correlated
20
What is Split-half reliability?
Dividing the test into two halves and correlating the scores
21
What does Cronbach’s coefficient alpha measure?
Internal consistency of items measuring a single construct
22
What is the Kuder-Richardson Formula 20 (KR-20)?
A variation of coefficient alpha for dichotomously scored items
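The KR-20 computation for dichotomously scored items can be sketched in Python; the response matrix below is hypothetical, and population variance is used for the total scores:

```python
# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / total-score variance),
# for dichotomously scored items. Rows = examinees, columns = items.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
k = len(data[0])                 # number of items
n = len(data)                    # number of examinees

totals = [sum(row) for row in data]
mean_t = sum(totals) / n
var_total = sum((t - mean_t) ** 2 for t in totals) / n  # population variance

sum_pq = 0.0
for j in range(k):
    p = sum(row[j] for row in data) / n   # item difficulty
    sum_pq += p * (1 - p)                 # item variance p*q

kr20 = (k / (k - 1)) * (1 - sum_pq / var_total)
```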
23
What is the purpose of inter-rater reliability?
To ensure consistent scoring across different evaluators
24
What is the Kappa statistic used for?
Measuring inter-rater reliability for nominal or ordinal scales
25
What does Percent Agreement calculate?
The proportion of times raters give the same score
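Percent agreement and Cohen's kappa can be computed side by side; the two raters' judgments below are hypothetical:

```python
# Two raters classifying the same six cases (hypothetical ratings).
rater1 = ["yes", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no",  "no", "no", "yes", "yes"]
n = len(rater1)

# Percent agreement: proportion of cases where raters give the same score.
po = sum(a == b for a, b in zip(rater1, rater2)) / n

# Expected chance agreement, from each rater's marginal proportions.
categories = set(rater1) | set(rater2)
pe = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)

# Kappa corrects percent agreement for agreement expected by chance.
kappa = (po - pe) / (1 - pe)
```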
26
How does test length affect reliability?
Longer tests generally yield larger reliability coefficients
27
What is the effect of guessing on reliability coefficients?
Higher guessing probability lowers the reliability coefficient
28
What does Standard Error of Measurement (SEM) indicate?
Variability caused by measurement error
29
How is SEM calculated?
SEM = SDx x √(1 – rxx)
30
What does a smaller SEM suggest?
The test score is more likely to be close to the true score
31
What is the range of values for a reliability coefficient?
0 to 1
32
What is the interpretation of a reliability coefficient of 0.80 or higher?
Most tests are considered acceptably reliable
33
What is the probability of a true score lying within +/- 1 SEM?
68%
34
What does the Standard Error of the Difference Between Two Scores estimate?
Whether an observed score difference between two individuals reflects a true difference rather than measurement error
35
What is the probability that a true score lies within the range of plus/minus 1 SEM?
68%
36
What is the probability that a true score lies within the range of plus/minus 1.96 SEM?
95%
37
What is the probability that a true score lies within the range of plus/minus 2.58 SEM?
99%
38
If the reliability coefficient is +1.0, what does the standard error of measurement equal?
0
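The SEM arithmetic on the cards above can be sketched as follows; the SD and reliability values are illustrative:

```python
import math

# Standard Error of Measurement: SEM = SDx * sqrt(1 - rxx).
sd = 15.0    # test's standard deviation (illustrative, IQ-style metric)
rxx = 0.89   # reliability coefficient (illustrative)

sem = sd * math.sqrt(1 - rxx)

# Confidence bands around an obtained score of 100 (illustrative):
# ~68% chance the true score lies within +/- 1 SEM,
# ~95% within +/- 1.96 SEM, ~99% within +/- 2.58 SEM.
score = 100.0
band_68 = (score - sem, score + sem)
band_95 = (score - 1.96 * sem, score + 1.96 * sem)

# When rxx = +1.0 (perfect reliability), SEM = 0.
sem_perfect = sd * math.sqrt(1 - 1.0)
```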
39
What does the Standard Error of the Difference Between Two Scores estimate?
How much observed score differences might be due to measurement error.
40
What is the formula for SEdiff when both scores come from the same test?
SEdiff = √(2 x SEM²)
41
What is the formula for SEdiff when scores come from different tests?
SEdiff = √(SEM₁² + SEM₂²)
42
When can you be about 95% confident that the difference between two scores reflects a real change?
If the difference exceeds 2 × SEdiff.
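Both SEdiff formulas, and the 2 × SEdiff decision rule, can be sketched together; the SEM values are illustrative:

```python
import math

sem = 5.0               # SEM of a single test (illustrative)
sem1, sem2 = 5.0, 4.0   # SEMs of two different tests (illustrative)

# Same test administered twice:
sediff_same = math.sqrt(2 * sem ** 2)

# Scores from two different tests:
sediff_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)

def is_real_change(diff, sediff):
    """~95% confidence that a difference is real if it exceeds 2 * SEdiff."""
    return abs(diff) > 2 * sediff
```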
43
What does validity refer to in the context of a test?
A test's accuracy in measuring what it is designed to measure.
44
What does content validity measure?
Whether the test adequately represents the domain of interest.
45
What is the difference between content validity and face validity?
Content validity is a systematic evaluation by experts; face validity refers to whether the test 'looks like' it measures what it is intended to measure.
46
What does construct validity evaluate?
Whether a test has the predicted relationship with other variables.
47
What are the two types of construct validity?
* Convergent Validity
* Discriminant Validity
48
What does convergent validity indicate?
The test correlates with same or similar constructs.
49
What does discriminant validity indicate?
The test does not correlate with measures of unrelated constructs.
50
What is the Multitrait-Multimethod Matrix used for?
To organize data for assessing a test’s convergent and discriminant validity.
51
What kind of validity does concurrent validity measure?
Measures the test and criterion at about the same time.
52
What is predictive validity?
Criterion measured some time after the predictor has been assessed.
53
What does the Standard Error of Estimate (SEE) measure?
The accuracy of predictions made by your test.
54
What is the formula for SEE?
SEE = SDy x √(1 – rxy²)
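The SEE formula can be sketched in one line; the criterion SD and validity coefficient are illustrative:

```python
import math

# Standard Error of Estimate: SEE = SDy * sqrt(1 - rxy^2), where SDy is the
# criterion's SD and rxy the validity coefficient (illustrative values).
sd_y = 10.0
r_xy = 0.6

see = sd_y * math.sqrt(1 - r_xy ** 2)
```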
55
What is incremental validity?
The added predictive value a new measure provides beyond existing tools.
56
What is the formula for calculating incremental validity?
Incremental Validity = Positive Hit Rate – Base Rate
57
What does sensitivity measure?
How accurately the test identifies true positives.
58
What does specificity measure?
How accurately the test identifies true negatives.
59
What is the Positive Predictive Value (PPV)?
The probability that someone does have the disorder when they test positive.
60
What is the Negative Predictive Value (NPV)?
The probability that someone does not have the disorder when they test negative.
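Sensitivity, specificity, PPV, and NPV all come from the same 2×2 classification table; the counts below are hypothetical:

```python
# Hypothetical 2x2 table of test results vs. actual status.
tp, fn = 40, 10    # true positives, false negatives
fp, tn = 20, 130   # false positives, true negatives

sensitivity = tp / (tp + fn)   # proportion of true positives identified
specificity = tn / (tn + fp)   # proportion of true negatives identified
ppv = tp / (tp + fp)           # P(has disorder | positive test)
npv = tn / (tn + fn)           # P(no disorder | negative test)
```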
61
What is the relationship between validity and reliability?
A test’s reliability places a ceiling on its validity.
62
What does the formula rxy ≤ √rxx indicate?
The predictor’s criterion-related validity coefficient cannot exceed the square root of its reliability coefficient.
63
What is the effect of low reliability on validity?
A test with low reliability cannot have a high degree of validity.
64
What is the Correction for Attenuation?
A method to estimate the validity of a test by correcting for reliability issues.
65
What limits the validity of a predictor?
The reliability of the predictor and the criterion. (Validity is capped by the square root of the product of the predictor’s and criterion’s reliabilities.)
66
What is the purpose of Correction for Attenuation?
To estimate a predictor’s validity coefficient if both the predictor and criterion were perfectly reliable. (It adjusts for measurement error to reveal the true strength of the relationship.)
67
What information is needed for the Correction for Attenuation formula?
* The predictor’s current reliability coefficient
* The criterion’s current reliability coefficient
* The criterion-related validity coefficient
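The correction for attenuation, and the reliability ceiling on validity, can be sketched together; the three coefficients below are illustrative:

```python
import math

# Correction for attenuation: estimated validity if both the predictor and
# the criterion were perfectly reliable (illustrative coefficients).
r_xy = 0.40   # observed criterion-related validity coefficient
r_xx = 0.80   # predictor's reliability coefficient
r_yy = 0.90   # criterion's reliability coefficient

r_corrected = r_xy / math.sqrt(r_xx * r_yy)

# Validity is capped by the square root of the product of the
# predictor's and criterion's reliabilities.
ceiling = math.sqrt(r_xx * r_yy)
```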
68
What is Criterion Contamination?
Occurs when a rater’s knowledge of predictor performance influences their rating on the criterion measure. (This tends to artificially inflate the correlation between scores on the predictor and criterion.)
69
What is Cross-Validation?
The process of re-assessing a test’s criterion-related validity on a new sample. (It checks the generalizability of the original validity coefficient.)
70
What does Shrinkage refer to in validity coefficients?
The validity coefficient becomes smaller on cross-validation. (This happens because chance factors in the original sample are not all present in the cross-validation sample.)
71
What are Raw scores?
Scores with limited meaning until tied to criterion-referenced or norm-referenced interpretations.
72
What is Norm-Referenced Interpretation?
Comparing an examinee’s scores to scores obtained by people included in a normative sample. (It identifies individual differences.)
73
What do Percentile Ranks (PR) express?
An examinee’s raw score in terms of the % of examinees in the norm sample with lower scores.
74
What is Nonlinear transformation?
Used to adjust raw scores in ways that change the shape of the score distribution.
75
What are Standard Scores?
Transformed scores that allow meaningful comparison across individuals, tests, or populations.
76
What are Z-Scores?
Standard scores with a mean of 0 and SD of 1.
77
What is the formula for T-Scores?
T = (z x 10) + 50
78
What is the mean and SD for IQ Scores?
Mean=100; SD=15.
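The z-, T-, and IQ-score conversions from the cards above can be sketched together; the raw score, mean, and SD are illustrative:

```python
# Standard score conversions (raw score, mean, and SD are illustrative).
raw, mean, sd = 65.0, 50.0, 10.0

z = (raw - mean) / sd    # z-score: mean 0, SD 1
t = z * 10 + 50          # T-score: mean 50, SD 10
iq = z * 15 + 100        # IQ-style score: mean 100, SD 15
```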
79
What is the purpose of Criterion-Referenced Interpretation?
Interpreting scores in terms of a prespecified standard.
80
What is the Expectancy Table used for?
To estimate the probability that an examinee will attain different scores on a criterion based on their predictor score.
81
What is Factor Analysis?
A multivariate statistical technique used to determine how many factors are needed to account for the intercorrelations among test items.
82
What is Communality in Factor Analysis?
The proportion of variability in scores on an item that is accounted for by the identified factors.
83
What are the two types of rotation in Factor Analysis?
* Orthogonal (uncorrelated)
* Oblique (correlated)
84
What indicates the amount of variability accounted for by each component in Principal Components Analysis?
Eigenvalues.
85
What is a common rule for retaining components in Principal Components Analysis?
Only retain components that have eigenvalues greater than 1.
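The eigenvalue-greater-than-1 retention rule can be sketched as a filter; the eigenvalues below are hypothetical PCA output for six variables:

```python
# Kaiser criterion: retain components with eigenvalues > 1.
# Hypothetical eigenvalues from a PCA of 6 variables (they sum to 6,
# since eigenvalues of a correlation matrix sum to the number of variables).
eigenvalues = [2.8, 1.4, 0.9, 0.5, 0.3, 0.1]

retained = [ev for ev in eigenvalues if ev > 1]

# Each eigenvalue / number of variables = proportion of total variance
# accounted for by that component.
prop_var = [ev / len(eigenvalues) for ev in eigenvalues]
```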