Test Construction Flashcards

(103 cards)

1
Q

What does adequate reliability in a test indicate?

A

Test scores can be expected to be consistent

Adequate reliability does not imply that the test measures what it was designed to measure.

2
Q

What is validity traditionally defined as?

A

The degree to which a test accurately measures what it was designed to measure

3
Q

What are the three types of validity?

A
  • Content Validity
  • Construct Validity
  • Criterion-Related Validity
4
Q

How do the AERA, APA, & NCME (2014) Standards define validity?

A

The degree to which evidence and theory support the interpretation of test scores for proposed uses of tests

5
Q

What are the five sources of validity evidence identified?

A
  • Evidence based on test content
  • The response process
  • The internal structure of the test
  • Relationships with other variables
  • The consequences of testing
6
Q

What is content validity?

A

Evidence that a test measures one or more content or behavior domains

7
Q

How is content validity established?

A

By clearly defining the domain to be assessed and including items that are a representative sample of that domain

8
Q

What is face validity?

A

The extent to which test items ‘look valid’ to examinees

9
Q

True or False: Face validity is an actual type of validity.

A

False

Face validity is important but not a formal type of validity.

10
Q

What does construct validity refer to?

A

Evidence that a test measures a hypothetical trait inferred from behavior

11
Q

What are convergent and divergent validity?

A
  • Convergent Validity: High correlations with scores on related constructs
  • Divergent Validity: Low correlations with scores on unrelated constructs
12
Q

What is the multitrait-multimethod matrix?

A

A table of correlation coefficients providing information about a test’s reliability and validity

13
Q

What is the monotrait-monomethod coefficient?

A

A reliability coefficient for the test being validated

14
Q

What does a large monotrait-heteromethod coefficient indicate?

A

Evidence of the test’s convergent validity

15
Q

What does a small heterotrait-monomethod coefficient indicate?

A

Evidence of the test’s divergent validity

16
Q

What is factor analysis used for?

A

Assessing a test’s convergent and divergent validity

17
Q

What are the four basic steps in factor analysis?

A
  • Administer the test and measures of related traits
  • Correlate scores and list in a correlation matrix
  • Derive the initial factor matrix
  • Rotate the factor matrix and interpret
18
Q

What is a factor loading?

A

A correlation coefficient indicating the correlation between each test and each identified factor

19
Q

How is communality calculated?

A

By squaring and adding the factor loadings when factors are orthogonal

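The calculation above can be sketched in Python; the loadings below are hypothetical values for a single test on two orthogonal factors:

```python
# Communality = proportion of a test's variance explained by the identified factors.
# For orthogonal (uncorrelated) factors, it is the sum of the squared factor loadings.
loadings = [0.60, 0.50]  # hypothetical loadings on two orthogonal factors

communality = sum(loading ** 2 for loading in loadings)
print(round(communality, 2))  # 0.61 (= 0.36 + 0.25)
```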
20
Q

What does a high correlation with the factor indicate in factor analysis?

A

Evidence of convergent validity

21
Q

What does a low correlation with the factor indicate in factor analysis?

A

Evidence of divergent validity

22
Q

What is the purpose of rotating the factor matrix?

A

To produce data that is easier to interpret

23
Q

What is the significance of naming factors in factor analysis?

A

It aids in interpreting the relationships between tests and underlying constructs

24
Q

What is criterion-related validity?

A

The degree to which scores on a test predict or estimate scores on another measure.

It is crucial for tests used in contexts like hiring decisions.

25
What are the two types of criterion-related validity?
  • Concurrent Validity
  • Predictive Validity
26
What is concurrent validity?
Evaluated by obtaining scores on the predictor and the criterion at about the same time. Important for estimating current status.
27
What is predictive validity?
Evaluated by obtaining scores on the predictor before obtaining scores on the criterion. Important for estimating future status.
28
What is the range of the criterion-related validity coefficient?
-1 to +1
29
What does a higher criterion-related validity coefficient indicate?
More accurate predictor scores for predicting criterion scores.
30
How can the amount of variability in one measure explained by another be determined?
By squaring the criterion-related validity coefficient.
31
What is shrinkage in the context of cross-validation?
The phenomenon where the correlation coefficient for a new sample is likely to be smaller than the original coefficient.
32
What is the standard error of estimate?
A measure of prediction error in criterion-related validity studies.
33
How is a 95% confidence interval constructed?
By adding and subtracting two standard errors of estimate from the predicted criterion score.
34
What is the formula for calculating the standard error of estimate?
Standard deviation of the criterion measure times the square root of (1 - validity coefficient squared).
35
What happens to the standard error of estimate when the validity coefficient is +1 or -1?
The standard error is 0.
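The formula from the previous card can be sketched in Python; the standard deviation and validity coefficient below are hypothetical values. Note that a validity coefficient of ±1 yields a standard error of 0, as the card states.

```python
import math

def standard_error_of_estimate(sd_criterion, validity_coef):
    """SEE = SD of the criterion times sqrt(1 - validity coefficient squared)."""
    return sd_criterion * math.sqrt(1 - validity_coef ** 2)

# Hypothetical values: criterion SD of 10, validity coefficient of .60.
print(round(standard_error_of_estimate(10, 0.60), 2))  # 8.0
# Perfect validity means no prediction error:
print(standard_error_of_estimate(10, 1.0))             # 0.0
```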
36
What does correction for attenuation address?
The impact of measurement error on the magnitude of the criterion-related validity coefficient.
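A minimal sketch of the standard correction-for-attenuation formula, which estimates the validity coefficient that would be obtained if the predictor and criterion were perfectly reliable; the input values below are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Corrected r = observed r divided by sqrt(predictor reliability * criterion reliability)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical values: observed validity .42, predictor reliability .81,
# criterion reliability .64.
print(round(correct_for_attenuation(0.42, 0.81, 0.64), 3))  # 0.583
```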
37
What is incremental validity?
The increase in prediction accuracy by adding a new predictor to existing methods.
38
What method can be used to estimate incremental validity?
Using the Taylor-Russell tables.
39
What are true positives?
Recently hired employees who obtained high scores on both the predictor and criterion.
40
What are false positives?
Recently hired employees who obtained high scores on the predictor but low scores on the criterion.
41
What are true negatives?
Recently hired employees who obtained low scores on both the predictor and criterion.
42
What are false negatives?
Recently hired employees who obtained low scores on the predictor but high scores on the criterion.
43
How is the base rate calculated?
By dividing the number of employees with high criterion scores by the total number of employees.
44
What is diagnostic efficiency?
The ability of a test to correctly distinguish between people who do and do not have a disorder.
45
What does sensitivity measure?
The proportion of people who have the disorder and are correctly identified as such by the test.
46
What does specificity measure?
The proportion of people who do not have the disorder and are correctly identified as such by the test.
47
What is the hit rate?
The proportion of people correctly categorized by the test.
48
What is the positive predictive value?
The probability that a person who tests positive actually has the disorder.
49
What is the negative predictive value?
The probability that a person who tests negative does not have the disorder.
50
What affects the positive and negative predictive values?
The prevalence of the disorder in each setting.
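The diagnostic-efficiency indices from cards 45 through 49 can be computed from a 2x2 classification table; this sketch uses hypothetical counts.

```python
def diagnostic_efficiency(tp, fp, tn, fn):
    """Standard diagnostic indices from a 2x2 table
    (tp/fp/tn/fn = true/false positives and negatives)."""
    return {
        "sensitivity": tp / (tp + fn),  # proportion of disordered cases the test detects
        "specificity": tn / (tn + fp),  # proportion of non-disordered cases the test clears
        "hit rate": (tp + tn) / (tp + fp + tn + fn),
        "PPV": tp / (tp + fp),          # P(disorder | positive result)
        "NPV": tn / (tn + fn),          # P(no disorder | negative result)
    }

# Hypothetical counts: 40 true positives, 10 false positives,
# 80 true negatives, 20 false negatives.
for name, value in diagnostic_efficiency(40, 10, 80, 20).items():
    print(name, round(value, 2))
```

Changing the counts to reflect a lower prevalence (fewer tp and fn relative to fp and tn) lowers the PPV and raises the NPV, which is the point made in card 50.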
51
How does reliability affect validity?
A predictor's reliability places a ceiling on its validity.
52
What relationship exists between a predictor's validity coefficient and its reliability index?
The validity coefficient can be no greater than its reliability index.
53
What is the reliability index?
The square root of the predictor's reliability coefficient.
54
What are norm-referenced scores?
Scores that indicate how well an examinee did on the test compared to individuals in the standardization sample. They are designed to make distinctions among individuals or groups in terms of the ability or trait assessed by the test.
55
What is the primary objective of using norm-referenced scores?
To make distinctions among individuals or groups in terms of the ability or trait assessed by a test (Urbina, 2014, p. 212).
56
What are percentile ranks?
Indicate the percentage of examinees in the reference group who scored at or below the score obtained by an examinee. For example, if a raw score of 75 has a percentile rank of 82, then 82% of examinees scored 75 or lower.
57
How is the conversion of raw scores to percentile ranks described?
As a nonlinear transformation. This is because the percentile rank distribution is always rectangular (flat), regardless of the shape of the raw score distribution.
58
What are standard scores?
Scores indicating how well an examinee did on a test in terms of standard deviations from the mean score of the reference group. Standard scores include z-scores, T-scores, IQ scores, and stanines.
59
What transformation is used to convert raw scores to standard scores?
Linear transformation. The distribution of standard scores has the same shape as the raw score distribution.
60
What is a z-score?
A standard score with a mean of 0 and standard deviation of 1.0. It expresses an examinee's score in terms of standard deviations from the mean.
61
How is a z-score calculated?
z = (X – M)/SD, where X is the raw score, M is the mean, and SD is the standard deviation.
62
What does a T-score of 40 indicate?
That the raw score is one standard deviation below the mean. T-scores have a mean of 50 and standard deviation of 10.
63
What is the mean and standard deviation for full-scale IQ scores on the SB-5 and Wechsler tests?
Mean of 100 and standard deviation of 15. An IQ score of 85 indicates a score one standard deviation below the mean.
64
What are stanines?
Standard scores with a mean of 5 and standard deviation of 2, ranging from 1 to 9. Each stanine represents one-half of a standard deviation, except for the extremes.
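The standard-score conversions from cards 60 through 64 can be sketched as simple linear transformations of the z-score; the raw score mean and SD below are hypothetical:

```python
def to_z(raw, mean, sd):
    """z-score: raw score expressed in SD units from the reference-group mean."""
    return (raw - mean) / sd

def z_to_t(z):
    return 50 + 10 * z    # T-scores: mean 50, SD 10

def z_to_iq(z):
    return 100 + 15 * z   # SB-5/Wechsler full-scale IQ: mean 100, SD 15

def z_to_stanine(z):
    # Stanines: mean 5, SD 2, truncated to the 1-9 range at the extremes.
    return min(9, max(1, round(5 + 2 * z)))

# Hypothetical raw score one SD below the mean (reference-group mean 60, SD 12):
z = to_z(48, 60, 12)
print(z, z_to_t(z), z_to_iq(z), z_to_stanine(z))  # -1.0 40.0 85.0 3
```

A T-score of 40 and an IQ of 85 both correspond to z = -1, matching cards 62 and 63.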
65
What is the primary objective of using criterion-referenced scores?
To evaluate a person’s or group’s degree of competence or mastery in terms of a preestablished standard (Urbina, 2014, p. 121).
66
What do percentage scores indicate?
The percentage of test items answered correctly. For instance, answering 75 of 150 items correctly results in a percentage score of 50%.
67
What are expectancy tables used for?
To provide information on an examinee’s expected score on another measure based on their obtained test score. Expectancy tables predict future criterion scores from predictor scores.
68
What is a cutoff score?
A predetermined score that distinguishes between mastery and non-mastery. For example, a cutoff of 80% correct identifies examinees who have achieved mastery.
69
How does ranking work in selection decisions?
Candidates are ranked from highest to lowest based on their test scores and are selected from the top down until the desired number is chosen.
70
What is banding in the context of test scores?
Grouping test scores into bands determined by the test’s standard error of measurement. Scores within each band are considered equivalent.
71
Why do advocates support banding?
It helps reduce adverse impact on members of minority groups who tend to receive lower test scores. Candidates within a band are selected on the basis of experience or skills rather than test scores alone.
72
What is Classical Test Theory (CTT)?
A theory of measurement used for developing and evaluating tests, also known as true score test theory.
73
What is the formula for obtained test scores in Classical Test Theory?
X = T + E, where X is the obtained score, T is the true score, and E is the measurement error.
74
What does true score variability represent in CTT?
Actual differences among examinees regarding what the test is measuring.
75
What is measurement error in the context of CTT?
Random factors that affect test performance in unpredictable ways.
76
Define test reliability.
The extent to which a test provides consistent information.
77
What does a reliability coefficient indicate?
The proportion of variability in obtained test scores that is due to true score variability.
78
What is the range of reliability coefficients?
0 to 1.0.
79
What reliability coefficient is considered minimally acceptable for many tests?
.70 or higher.
80
List the four main methods for assessing a test's reliability.
  • Test-Retest Reliability
  • Alternate Forms Reliability
  • Internal Consistency Reliability
  • Inter-Rater Reliability
81
What does test-retest reliability measure?
The consistency of scores over time.
82
What is alternate forms reliability?
The consistency of scores over different forms of the test.
83
What is internal consistency reliability?
The consistency of scores over different test items.
84
What is coefficient alpha, also known as Cronbach's alpha?
A method for evaluating internal consistency reliability.
85
What does split-half reliability involve?
Splitting the test in half and correlating the scores on the two halves.
86
What is inter-rater reliability?
The consistency of scores or ratings assigned by different raters.
87
What is Cohen's kappa coefficient?
An inter-rater reliability coefficient corrected for chance agreement.
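Kappa's correction for chance agreement can be sketched from a square agreement table; the counts below are hypothetical ratings of 50 cases by two raters.

```python
def cohens_kappa(table):
    """Kappa from a square agreement table: table[i][j] = number of cases
    rater A placed in category i and rater B placed in category j."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n          # raw agreement rate
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the raters' marginal category rates:
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical two-category ratings by two raters:
print(round(cohens_kappa([[20, 5], [10, 15]]), 2))  # 0.4
```

Here raw agreement is .70, but half of it is expected by chance, so kappa is only .40.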
88
What is consensual observer drift?
When raters communicate while assigning ratings, leading to increased consistency but decreased accuracy.
89
What factor affects the reliability coefficient related to content?
Content Homogeneity.
90
How does the range of scores affect reliability coefficients?
Reliability coefficients are larger when test scores are unrestricted in range.
91
What effect does guessing have on reliability coefficients?
The easier it is to guess correctly (e.g., on true/false items), the lower the reliability coefficient.
92
What is the difference between reliability index and reliability coefficient?
The reliability coefficient is the proportion of observed score variance due to true score variance, while the reliability index is the theoretical correlation between observed and true test scores (the square root of the reliability coefficient).
93
How is item difficulty for dichotomously scored items calculated?
p = Number of correct answers / Total number of examinees.
94
What is an acceptable item discrimination index (D) value?
A D value of .30 or higher.
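The item statistics from the last two cards can be sketched as follows; the counts and group proportions are hypothetical.

```python
def item_difficulty(n_correct, n_examinees):
    """p = proportion of examinees answering the item correctly
    (0 = hardest, 1 = easiest)."""
    return n_correct / n_examinees

def discrimination_index(p_upper, p_lower):
    """D = item difficulty in the upper-scoring group minus
    item difficulty in the lower-scoring group."""
    return p_upper - p_lower

print(item_difficulty(30, 60))                       # 0.5
print(round(discrimination_index(0.80, 0.40), 2))    # 0.4 (>= .30 is acceptable)
```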
95
What does the standard error of measurement indicate?
An estimate of measurement error used to construct a confidence interval indicating the range within which an examinee's true score is likely to fall, given their obtained score.
96
How do you calculate the standard error of measurement?
Multiply the test's standard deviation by the square root of 1 minus the reliability coefficient.
97
What are the confidence intervals for test scores based on the standard error of measurement?
68%: ±1 SEM, 95%: ±2 SEM, 99%: ±3 SEM.
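The SEM formula and the confidence intervals built from it can be sketched in Python; the SD and reliability values below are hypothetical.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD times sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(obtained, sd, reliability, n_sem=2):
    """Approximate CI around an obtained score
    (±1 SEM ~ 68%, ±2 SEM ~ 95%, ±3 SEM ~ 99%)."""
    error = n_sem * sem(sd, reliability)
    return (obtained - error, obtained + error)

# Hypothetical test with SD 15 and reliability .91:
print(round(sem(15, 0.91), 2))  # 4.5
low, high = confidence_interval(100, 15, 0.91)  # 95% interval
print(round(low), round(high))  # 91 109
```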
98
What is Item Response Theory (IRT)?
An alternative to CTT that focuses on individual test items rather than total test scores.
99
What does IRT allow for in test development?
Determining the probability of a specific examinee correctly answering any test item.
100
What are the three item parameters depicted in an item characteristic curve (ICC)?
  • Difficulty parameter
  • Discrimination parameter
  • Probability of guessing correctly
101
What does the difficulty parameter in IRT indicate?
The level of the trait required for a 50% probability of answering the item correctly.
102
How is the discrimination parameter indicated in IRT?
By the slope of the ICC.
103
What does the point where the ICC crosses the y-axis represent?
The probability of guessing correctly.
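The three parameters can be illustrated with the three-parameter logistic (3PL) model, one common form of the ICC (the cards do not name a specific model, so this is an assumption, and the parameter values below are hypothetical). Note that with a nonzero guessing parameter c, the probability at theta = b is halfway between c and 1; it equals exactly 50% only when c = 0.

```python
import math

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic ICC: probability of a correct response
    at trait level theta. a = discrimination (slope of the ICC),
    b = difficulty, c = guessing (lower asymptote)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# At theta equal to the difficulty parameter b, the probability is
# halfway between the guessing level c and 1:
print(round(icc_3pl(theta=1.0, a=1.2, b=1.0, c=0.2), 2))   # 0.6
# At very low trait levels the probability approaches c:
print(round(icc_3pl(theta=-6.0, a=1.2, b=1.0, c=0.2), 2))  # 0.2
```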