Test Construction Flashcards

(58 cards)

1
Q

What does Cronbach’s α (or KR-20) measure, and when is it most appropriate?

A

Cronbach’s α (or KR-20 for dichotomous items) measures internal consistency reliability: how well the items of a single administration hang together. It is most appropriate when the test is intended to measure a single, homogeneous construct.
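As an illustration (not part of the original cards), internal consistency can be computed directly from an examinee-by-item score matrix; a minimal NumPy sketch with toy data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 5 examinees x 3 dichotomous items (for 0/1 items, alpha = KR-20)
scores = np.array([[1, 1, 1],
                   [1, 1, 0],
                   [1, 0, 0],
                   [0, 0, 0],
                   [1, 1, 1]])
print(round(cronbach_alpha(scores), 2))
```

Items that covary strongly push α toward 1; heterogeneous items pull it down.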

2
Q

What does test–retest reliability assess, and what kind of trait is it best for?

A

Test–retest reliability measures the temporal stability of a test by correlating scores from the same individuals on two administrations separated by time.
Best for: stable traits such as intelligence or personality.
Keyword: Temporal stability.

3
Q

What does alternate (parallel) forms reliability measure, and when is it useful?

A

Assesses content equivalence between two versions of the same test to control for content and time sampling errors.

4
Q

What does the kappa coefficient (κ) measure and why is it superior to percent agreement?

A

Measures rater agreement corrected for chance. κ provides a more accurate estimate of reliability than simple percent agreement.
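Chance correction can be seen in a short sketch (illustrative toy data, not from the cards): observed agreement is compared against the agreement two raters would reach by category base rates alone.

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)
    p_chance = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Two raters classify 8 cases into diagnoses 0/1; 75% raw agreement,
# but 50% agreement is expected by chance alone here
print(cohens_kappa([0, 0, 1, 1, 0, 1, 0, 1],
                   [0, 1, 1, 1, 0, 1, 0, 0]))   # kappa = 0.5
```

Note how 75% percent agreement shrinks to κ = .50 once chance is removed.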

5
Q

What is the difference between internal consistency error and content sampling error?

A

Internal consistency error arises from item heterogeneity; content sampling error reflects variation from different item sets.
Internal consistency is estimated by α or KR-20; content sampling is minimized using parallel forms.

6
Q

What does the Spearman–Brown prophecy formula estimate?

A

Predicts how changing the length of a test (adding or removing items) will affect its reliability.
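The standard Spearman–Brown formula, r_new = k·r / (1 + (k − 1)·r) where k is the factor by which length changes, can be sketched as:

```python
def spearman_brown(r_orig: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * r_orig) / (1 + (length_factor - 1) * r_orig)

# Doubling a test with reliability .60 raises it to .75:
print(spearman_brown(0.60, 2))

# Halving it (as in split-half corrections) lowers it:
print(spearman_brown(0.60, 0.5))
```

Gains diminish as reliability approaches 1.0, which is why lengthening helps only "up to a point."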

7
Q

The coefficient of stability is another name for what?

A

Another term for the test–retest reliability coefficient, indicating score consistency across time.

8
Q

What does a low Cronbach’s α suggest about a test’s dimensionality?

A

A low α indicates the test likely measures multiple constructs, not one coherent dimension.
Keyword: α ↓ → multidimensional.

9
Q

Why is equivalent forms reliability considered the most rigorous method?

A

Because it accounts for both content and time sampling errors, giving the most comprehensive estimate of reliability—though it is the hardest to implement.
Keyword: Max error control.

10
Q

How is the magnitude of Cohen’s κ interpreted in reliability terms?

A

κ can range from −1.0 to +1.0, though in practice values fall between 0 and +1.0; a κ of .90 indicates excellent inter-rater reliability.
Keyword: κ magnitude = reliability strength.

11
Q

What does validity tell us about a test?

A

It tells us whether the test measures what it claims to measure. A test can be reliable without being valid, but not valid without being reliable.

12
Q

What is the key difference between reliability and validity?

A

Reliability = consistency of scores; Validity = accuracy of what the test measures.

13
Q

What does content validity assess?

A

Whether test items adequately represent the full range of the construct or skill being measured.

14
Q

How is criterion-related validity evaluated?

A

By correlating test scores with an external criterion — concurrently (current performance) or predictively (future performance).

15
Q

What is construct validity and how is it demonstrated?

A

It shows that a test measures the theoretical trait it claims to. Demonstrated through convergent and discriminant validity (e.g., multitrait–multimethod matrix).

16
Q

What is the main difference between criterion-referenced and norm-referenced tests?

A

Criterion-referenced tests interpret scores by mastery standards (e.g., pass/fail). Norm-referenced tests compare performance to a reference group.

17
Q

What is incremental validity?

A

The amount of additional predictive value a new test adds beyond existing predictors.

18
Q

What is the purpose of the multitrait–multimethod matrix?

A

To assess construct validity by showing high correlations between similar traits (convergent) and low correlations between different traits (discriminant).

19
Q

What are the three main parameters in Item Response Theory (IRT)?

A

Difficulty (how hard an item is), discrimination (how well it separates high vs. low ability), and guessing (chance of correct response by luck).

20
Q

What does the slope of an Item Characteristic Curve (ICC) represent?

A

Item discrimination — the steeper the slope, the better the item distinguishes between high- and low-ability examinees.

21
Q

What does the Standard Error of Measurement (SEM) represent?

A

It reflects the average amount a test score is expected to vary from a person’s true score due to measurement error.
Formula: SEM = SD × √(1 − rxx).
Used to create confidence intervals around observed scores.
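The SEM formula and the confidence-interval construction can be combined in a short sketch (the IQ-style numbers below are illustrative, not from the cards):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - reliability)

def score_ci(score: float, sd: float, reliability: float, z: float = 1.96):
    """Confidence interval around an observed score: score +/- z * SEM."""
    margin = z * sem(sd, reliability)
    return score - margin, score + margin

# Scale with SD = 15 and reliability .91 -> SEM = 4.5
print(sem(15, 0.91))
print(score_ci(103, 15, 0.91))   # 95% CI around an observed score of 103
```

Higher reliability shrinks the SEM, so the interval around any observed score tightens.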

22
Q

What does the Standard Error of Estimate (SEE) indicate?

A

It estimates the accuracy of predicting a criterion score from a predictor.
Smaller SEE = more accurate prediction.
Formula: SEE = SDy × √(1 − r²).
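The SEE formula behaves as the card describes, shrinking as validity rises; a minimal sketch:

```python
import math

def see(sd_criterion: float, validity_r: float) -> float:
    """Standard error of estimate: SDy * sqrt(1 - r^2)."""
    return sd_criterion * math.sqrt(1 - validity_r ** 2)

# With criterion SDy = 10, stronger predictors shrink the SEE:
print(see(10, 0.0))   # no validity -> SEE equals SDy
print(see(10, 0.6))   # moderate validity -> SEE = 8
print(see(10, 1.0))   # perfect validity -> SEE = 0
```

At r = 0 the best prediction is just the criterion mean, so errors span the full SDy.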

23
Q

How are confidence intervals constructed around a test score?

A

Observed score ± (z × SEM).
Example: 95% CI = score ± (1.96 × SEM).

24
Q

Define sensitivity and specificity in test accuracy.

A

Sensitivity = true positives ÷ (true positives + false negatives) → ability to detect those with the condition.
Specificity = true negatives ÷ (true negatives + false positives) → ability to exclude those without the condition.
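The two ratios map directly onto confusion-matrix counts; a small sketch with hypothetical counts:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives / all who actually have the condition."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negatives / all who actually lack the condition."""
    return tn / (tn + fp)

# Hypothetical screening results
print(sensitivity(tp=90, fn=10))   # 0.9
print(specificity(tn=80, fp=20))   # 0.8
```

Note the denominators: sensitivity conditions on having the disorder, specificity on not having it.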

25
Q

What do positive and negative predictive values represent?

A

PPV = probability a person with a positive test truly has the condition.
NPV = probability a person with a negative test truly does not.
Both depend on the base rate of the condition.

26
Q

What are true/false positives and negatives?

A

True positive = correctly identified case; false positive = incorrectly identified case; true negative = correctly excluded case; false negative = missed case.
Used to evaluate the decision accuracy of a test.

27
Q

What is an expectancy table used for?

A

Shows the likelihood of successful performance (criterion) for each score range on a predictor, aiding cutoff and hiring decisions.

28
Q

How does base rate affect test accuracy?

A

When the base rate of success is very high or very low, even valid tests can yield misleading rates of true vs. false positives.

29
Q

In Classical Test Theory, what is item difficulty (p)?

A

The proportion of examinees who answered correctly (0–1). p = .50 provides maximum discrimination between high and low scorers.

30
Q

What is item discrimination (D)?

A

How well an item distinguishes high scorers from low scorers. Typically computed as (upper group % correct − lower group % correct). Higher D = better item.

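The upper-minus-lower computation can be sketched directly (the 27% grouping fraction is a common convention, assumed here rather than stated in the cards):

```python
import numpy as np

def discrimination_index(item_correct, total_scores, frac=0.27):
    """D = % correct in top-scoring group minus % correct in bottom group."""
    item_correct = np.asarray(item_correct, dtype=float)
    order = np.argsort(total_scores)          # low scorers first
    n = max(1, int(len(order) * frac))
    low, high = order[:n], order[-n:]
    return item_correct[high].mean() - item_correct[low].mean()

# An item answered correctly only by the top half of 10 examinees
totals = np.arange(10)
item = (totals >= 5).astype(int)
print(discrimination_index(item, totals))   # D = 1.0 for this item
```

An item everyone answers identically would give D = 0, matching the "no discrimination" case in card 57.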
31
Q

What is the correction for guessing and how does it affect scores?

A

Adjusts scores to penalize random guessing. Reduces the mean and increases the SD of the distribution.

32
Q

What is communality in factor analysis?

A

The proportion of a variable’s variance explained by all extracted factors.
Communality = Σ(factor loadings²).

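The square-then-sum rule (for orthogonal factors) is a one-liner; the loadings below reproduce the worked example used later in the deck:

```python
def communality(loadings):
    """h^2 under orthogonal factors: sum of squared loadings."""
    return sum(l ** 2 for l in loadings)

# Loadings of .40 and .30 on two orthogonal factors:
print(round(communality([0.40, 0.30]), 2))   # 0.25 -> 25% common variance
```

Adding the raw loadings (.40 + .30 = .70) would be the classic mistake; squaring first gives the variance actually shared with the factors.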
33
Q

What is the purpose of factor rotation in factor analysis?

A

Simplifies interpretation by making factor loadings clearer.
Orthogonal rotation → uncorrelated factors.
Oblique rotation → correlated factors.

34
Q

What does a factor loading represent?

A

The correlation between a test item and an underlying factor. High loading = item strongly measures that factor.

35
Q

What is the difference between orthogonal and oblique rotation?

A

Orthogonal rotation assumes factors are independent (uncorrelated). Oblique rotation allows factors to correlate.

36
Q

What do z-scores represent?

A

Standardized scores showing distance from the mean in SD units.
Formula: z = (X − M)/SD. Mean = 0, SD = 1.

37
Q

What is a T-score and how does it differ from a z-score?

A

T-scores are transformed z-scores scaled to Mean = 50, SD = 10. Used for ease of interpretation and to avoid negative values.

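The two transformations chain together; a sketch using the M = 106, SD = 10 example that appears later in the deck:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Distance from the mean in SD units."""
    return (x - mean) / sd

def t_score(z: float) -> float:
    """T = 50 + 10z: rescales z to mean 50, SD 10 (no negatives in practice)."""
    return 50 + 10 * z

# Raw score 126 on a scale with M = 106, SD = 10:
z = z_score(126, 106, 10)
print(z)           # 2.0
print(t_score(z))  # 70.0
```

Because both are linear transforms of the raw score, converting everything to z first makes scores from different systems directly comparable (see card 56).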
38
Q

What does a percentile rank tell us?

A

The percentage of the norm group that scored below a given score. The distribution of percentile ranks is rectangular (flat), not normal.

39
Q

What are criterion cutoffs and how are they optimized?

A

Cutoffs determine pass/fail or hire/no-hire thresholds. Optimized by balancing false positives and false negatives to maximize correct decisions.

40
Q

What are utility analysis and the selection ratio in applied testing?

A

Utility analysis estimates the financial or practical value of using a selection test.
Selection ratio = number hired ÷ number of applicants; lower ratios improve decision accuracy.

41
Q

What is the Spearman–Brown prophecy formula used to estimate?

A

It estimates the effect of increasing or decreasing test length on a test’s reliability coefficient. Used in test construction and split-half reliability adjustments.
Key principle: longer tests generally yield higher reliability (up to a point).
Not to be confused with:
Correction for attenuation: estimates true validity if measures were perfectly reliable.
Standard error of measurement: estimates the range of true scores.

42
Q

In factor analysis, what does a test’s communality represent, and how is it calculated when factors are orthogonal?

A

Definition: communality (h²) = proportion of a test’s variance explained by the common factors (shared variance).
Formula (for orthogonal factors): h² = Σ(factor loadings)².
Example: if Test A loads .40 on Factor I and .30 on Factor II → h² = .40² + .30² = .16 + .09 = .25.
Interpretation: 25% of Test A’s variance is explained by the factors; 75% is unique or error variance.

43
Q

What does a criterion-related validity coefficient of .70 tell us about the relationship between predictor and criterion scores?

A

The coefficient of determination (r²) shows how much variance in the criterion is explained by the predictor.
r = .70 ⇒ r² = .49.
Therefore, 49% of the variability in criterion scores is explained by the predictor. The remaining 51% reflects other factors or error.

44
Q

Which reliability method is best for assessing a characteristic that fluctuates over time (a “state”)? Why?

A

The coefficient of internal consistency. It uses one administration to assess consistency across items, not across time, so it suits traits that vary (e.g., anxiety, mood).
Not appropriate:
Coefficient of stability (test–retest): assumes the trait is stable.
Coefficient of equivalence: uses alternate forms over time.
Coefficient of determination: measures shared variance, not reliability.

45
Q

A test has a mean of 100 and an SEM of 5. An examinee scores 103. What is the 68% confidence interval, and how is it calculated?

A

Formula: CI = obtained score ± (1 × SEM) for 68% confidence.
Calculation: 103 ± 5 → 98 to 108.
Interpretation: there is a 68% chance the examinee’s true score lies between 98 and 108.
Note: 95% CI = ±1.96 × SEM; 99% CI = ±2.58 × SEM.

46
Q

What does the coefficient of stability measure, and when is it used?

A

Another term for test–retest reliability. Measures the consistency of scores across time by administering the same test twice to the same group.
Appropriate for stable traits (e.g., intelligence); not for fluctuating states (use internal consistency instead).
Key distinction:
Coefficient of equivalence: compares different forms.
Coefficient of stability: compares the same test at different times.

47
Q

What is communality (h²), and how do you compute it when factors are orthogonal? Apply to loadings of .40 on Factor I and .30 on Factor II.

A

Meaning: the proportion of a test’s variance explained by the common factors.
Orthogonal formula: h² = Σ(loading)².
Compute: .40² + .30² = .16 + .09 = .25.
Interpretation: 25% of Test A’s variance is common; the rest is uniqueness, u² = 1 − h² = .75 (specific + error).
If factors were oblique: use the structure and factor intercorrelation (Φ) matrix, h² = l′Φl (not simple squaring).
Pitfall: don’t add raw loadings; always square, then sum, for orthogonal factors.

48
Q

If a test measures a state that fluctuates in intensity over time (e.g., mood, anxiety), what reliability method should be used, and why?

A

Use the coefficient of internal consistency (e.g., Cronbach’s alpha). It requires only one administration and assesses the consistency among items within a single testing session, so it is suitable when the construct changes over time.
Avoid:
Coefficient of stability (test–retest): assumes the trait is stable.
Coefficient of equivalence: requires two forms at different times.
Coefficient of determination: measures shared variance, not reliability.

49
Q

If a predictor’s criterion-related validity coefficient is .70, what percentage of the variability in criterion scores is explained by the predictor, and how is it calculated?

A

Formula: r² = (.70)² = .49.
Interpretation: 49% of the variability in criterion scores is explained by the predictor; the remaining 51% reflects other influences or measurement error.
Concept: r² is the coefficient of determination, representing shared variance between predictor and criterion.

50
Q

What information is needed to construct a 68% confidence interval around an examinee’s obtained test score?

A

The examinee’s score and the Standard Error of Measurement (SEM).
Formula: CI = obtained score ± (1 × SEM) for 68% confidence.
SEM is derived from the test’s standard deviation and reliability coefficient.
Note: the standard error of estimate is used for predicted criterion scores; the test mean and SD alone are insufficient without the SEM.

51
Q

In Item Response Theory (IRT), what does an examinee’s test score represent?

A

It reflects the examinee’s status on a latent trait or ability (θ). IRT models the relationship between item responses and the underlying trait being measured (e.g., intelligence, anxiety), focusing on item characteristics (difficulty, discrimination) rather than the total test score.
Contrast:
Classical Test Theory: interprets scores relative to total performance.
Norm-referenced: compares to others.
Criterion-referenced: compares to preset standards.

52
Q

In a multitrait–multimethod matrix, what type of coefficient supports divergent (discriminant) validity?

A

A small heterotrait–monomethod coefficient: a low correlation between different traits measured by the same method is evidence the test is not measuring unrelated constructs.
Contrast:
Large monotrait–heteromethod: evidence of convergent validity (same trait, different methods).
Large heterotrait–monomethod: signals poor divergent validity (too much overlap between different traits).

53
Q

A normal distribution has M = 106 and SD = 10. What is the percentile rank for a raw score of 126, and why?

A

z = (126 − 106) / 10 = +2.0, i.e., two standard deviations above the mean.
A z-score of +2.0 corresponds to the 98th percentile.
Interpretation: the examinee scored higher than about 98% of the population.
Reference values: +1 SD → 84th percentile; +2 SD → 98th percentile; +3 SD → 99.9th percentile.

54
Q

When a disorder has a very low base rate, even a highly accurate (98%) screening test will produce what type of error pattern? Why?

A

It will produce more false positives than false negatives.
Reason: when few people actually have the disorder, most tested individuals are healthy, so even a small false-positive rate applied to a large healthy group creates many false positives.
Example: with a 1% base rate in 10,000 people → 98 true positives, 2 false negatives, 9,702 true negatives, 198 false positives.
Takeaway: a low base rate means poor positive predictive value, even with high test accuracy.

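The worked example above can be verified numerically; a sketch that derives the expected counts from the base rate, sensitivity, and specificity:

```python
def screening_counts(n, base_rate, sens, spec):
    """Expected confusion-matrix counts and PPV for a screening test."""
    sick = n * base_rate
    healthy = n - sick
    tp, fn = sick * sens, sick * (1 - sens)        # among the sick
    tn, fp = healthy * spec, healthy * (1 - spec)  # among the healthy
    ppv = tp / (tp + fp)                           # positive predictive value
    return tp, fn, tn, fp, ppv

# 98%-accurate test (sens = spec = .98), 1% base rate, 10,000 people:
tp, fn, tn, fp, ppv = screening_counts(10_000, 0.01, 0.98, 0.98)
print(round(tp), round(fn), round(tn), round(fp))  # 98 TP, 2 FN, 9702 TN, 198 FP
print(round(ppv, 2))   # only about 1 in 3 positives is a true case
```

With FP (198) roughly double TP (98), a positive result is more often wrong than right, which is exactly the low-base-rate trap.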
55
Q

In a factor matrix, Test A has a factor loading of .70 on Factor II. What does this indicate?

A

A factor loading is the correlation between a variable (test) and a factor.
To find the shared variance, square the loading: .70² = .49.
Therefore, 49% of Test A’s variance is explained by Factor II; the remaining 51% reflects unique variance and measurement error.

56
Q

Given a z-score of +.75, a percentile rank of 84, and a T-score of 65, how do these rank from lowest to highest within a normal distribution?

A

Order (lowest → highest):
z = +.75 → 0.75 SD above the mean
Percentile rank = 84 → ≈ +1 SD above the mean
T = 65 → +1.5 SD above the mean
Key conversions: T = 50 + (z × 10); a percentile rank of 84 ≈ +1 SD.
Interpretation: converting all values to standard deviation units (z) allows valid comparison across scoring systems.

57
Q

What does an item discrimination index (D) = 0 indicate about an item’s performance on a test?

A

D = 0 means equal proportions of high- and low-achieving examinees answered the item correctly; the item does not discriminate between strong and weak test-takers.
Range: −1.0 to +1.0.
Positive D: high scorers outperform low scorers → good item.
Negative D: low scorers outperform high scorers → flawed item.
Goal in test construction: maximize positive D values (preferably ≥ .30).

58
Q

Which type of test format typically has the lowest reliability, and why?

A

True–false tests have the lowest reliability because of the high probability of guessing correctly (50%). Reliability decreases as chance success increases.
Guessing probabilities: true–false → 50%; 3-option multiple choice → 33%; 7-option multiple choice → 14%; free recall → ≈ 0%.
Principle: more response options → lower chance of random correctness → higher reliability.