An examiner administers and scores the same test numerous times without deviating from the procedure in order to reduce the possibility of measurement error. This exemplifies what?
Standardization
The scores of a representative population sample on a test, against which an examiner compares an individual's scores, are referred to as \_\_\_\_\_\_\_\_; while they allow for comparisons of a person's performance on different tests, they do not provide the ultimate standard of performance.
Norms
A psychological test that is regarded as \_\_\_\_\_\_\_\_ is administered, scored, and interpreted independent of the subjective judgment of the examiner.
Objective
The SAT and GRE are examples of \_\_\_\_\_\_\_\_ tests, as they provide information about a person's best possible performance, while the MMPI-2 and PAI are \_\_\_\_\_\_\_\_ tests, providing information about a person's usual experience.
Maximum performance; typical performance
________ tests assess the difficulty level an examinee can attain (e.g., Information from WAIS), ________ tests assess the person’s response rate (e.g., Digit Symbol from WAIS), and ________ tests help determine whether an individual can attain a certain level of acceptable performance (e.g., test of reading skills).
Power; speed; mastery
A \_\_\_\_\_\_\_\_ occurs when an instrument cannot take on a value higher than some limit due to the measure not including enough difficult items, resulting in all high-achieving examinees getting similar scores (test is too easy); conversely, a \_\_\_\_\_\_\_\_ occurs when an instrument cannot take on a lower value and thus all low-achieving examinees get similar scores (test is too hard).
Ceiling effect; floor effect
In contrast to normative measures, these types of measures require individuals to use their own frame of reference to compare 2 or more desirable options and choose the one that is most preferred.
Ipsative measures
\_\_\_\_\_\_\_\_ is the consistency of a test, or the degree to which a test provides the same results under the same conditions; \_\_\_\_\_\_\_\_ refers to the degree that a test measures what it claims to be measuring.
Reliability; validity
A perfectly reliable test would yield every examinee’s ________ every time it was administered, as this would indicate the examinee’s actual ability on whatever the test is measuring; however, a test is never perfectly reliable due to ________, which is random and can be caused by environmental noise, the examinee’s mood on testing day, and any number of other factors.
True score; measurement error
The most commonly used methods of estimating the reliability of a test use a correlation coefficient, referred to as the ________, ranging in value from 0.0 to +1.0, where coefficients closer to 0.0 indicate less reliability and values closer to +1.0 indicate greater reliability. Unlike other correlation coefficients, it is not squared to determine the proportion of variability; rather, it is interpreted directly.
Reliability coefficient
A researcher administers the same instrument to the same group of college students on 2 separate occasions; following the second administration, the researcher correlates the scores from the first and second administrations. What type of reliability is the researcher attempting to obtain?
Test-retest reliability (or “coefficient of stability”)
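As a rough illustration of the test-retest idea, the sketch below correlates two sets of scores from the same examinees. The scores and the helper name `pearson_r` are made up for this example; they are not from any particular instrument.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for five students on two occasions
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 13, 14, 19, 21]

# The correlation between administrations serves as the test-retest estimate
test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 3))
```

A value near +1.0 would suggest the scores are stable across the two administrations.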
TRUE or FALSE: It is not recommended to use the test-retest coefficient when attempting to obtain reliability for a test that measures attributes that are unstable (e.g., mood).
TRUE: Low coefficients, in such cases, would likely be more a reflection of the attribute's instability than of the test's unreliability
A researcher administers one form of a test on one day, then administers an equivalent form to the same group of people at a later date/time. What type of reliability is being sought in this example?
Alternate forms reliability (or “coefficient of equivalence”; parallel-forms reliability)
When correlations are obtained among individual test items, ________ reliability is being assessed; the 3 methods for obtaining this reliability include ________ (involves dividing the test into 2 parts, then correlating responses from the 2 parts), ________ (used when test items are dichotomously scored, e.g., “true/false”), and ________ (used for tests with multiple-scored items, e.g., “never/rarely/sometimes/always”).
Internal consistency (or "coefficient of internal consistency"); split-half; Kuder-Richardson Formula 20; Cronbach's coefficient alpha
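The two item-level coefficients named on this card can be sketched as follows. This is a minimal illustration that assumes population variances throughout (a common textbook convention); the function names and the response data are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of item scores per examinee."""
    k = len(rows[0])  # number of items
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def kr20(rows):
    """KR-20 for dichotomous items scored 0 or 1."""
    k = len(rows[0])
    n = len(rows)
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in rows) / n  # proportion scoring 1 on item i
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical true/false responses: 5 examinees, 4 items
true_false = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(true_false), 3))
print(round(cronbach_alpha(true_false), 3))
```

Under the population-variance assumption, alpha computed on 0/1 items reduces to KR-20, which is why the two printed values agree for this data.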
While the split-half method usually lowers the reliability coefficient artificially (each half is shorter than the full test), the \_\_\_\_\_\_\_\_ can be used to correct for the effects of shortening the measure.
Spearman-Brown prediction formula
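For reference, the Spearman-Brown correction can be sketched in a few lines. The function name and the example half-test correlation of 0.60 are assumptions made for illustration.

```python
def spearman_brown(r, length_factor=2):
    """Predicted reliability when a test is lengthened by length_factor.

    r is the reliability of the shorter form; for split-half reliability,
    r is the correlation between the two halves and length_factor is 2.
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# Hypothetical correlation between the two halves of a test
half_test_r = 0.60
full_test_r = spearman_brown(half_test_r)
print(round(full_test_r, 2))
```

With a half-test correlation of 0.60, the corrected full-length estimate is 2(0.60) / (1 + 0.60) = 0.75.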
Measures of internal consistency are not good at assessing reliability for \_\_\_\_\_\_\_\_ tests.
Speed tests, as the correlation would be spuriously inflated
Instruments that rely on rater judgments should have high \_\_\_\_\_\_\_\_ reliability, which is increased when scoring categories are \_\_\_\_\_\_\_\_ and \_\_\_\_\_\_\_\_.
Inter-rater (interscorer); mutually exclusive (a particular behavior belongs to a single category); exhaustive (categories cover all possible responses/behaviors)
The \_\_\_\_\_\_\_\_ estimates the amount of error to be expected in an individual test score and is used to determine a range, referred to as a/an \_\_\_\_\_\_\_\_, within which an examinee's true score will likely fall.
Standard Error of Measurement; confidence interval
What is the formula for the standard error of measurement?
SEM = SDx × √(1 − rxx) (SDx = standard deviation of test scores; rxx = reliability coefficient)
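A quick worked example of the standard error of measurement formula, assuming a hypothetical test with a standard deviation of 15 and a reliability coefficient of 0.91 (the function and parameter names are illustrative only):

```python
from math import sqrt

def standard_error_of_measurement(sd_x, r_xx):
    """SEM = SDx * sqrt(1 - rxx)."""
    return sd_x * sqrt(1 - r_xx)

# Hypothetical values: SDx = 15, rxx = 0.91
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))

# A perfectly reliable test (rxx = 1.0) has no measurement error
print(standard_error_of_measurement(15, 1.0))
```

Here the result is 15 × √0.09 = 15 × 0.3 = 4.5, and the SEM shrinks to 0.0 as the reliability coefficient approaches +1.0.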
What is the probability that a person's true score lies within a range of plus or minus 1 standard error of measurement (SEM) of their obtained score? How about plus or minus 1.96 (2) SEM? And finally, plus or minus 2.58 (2.5) SEM?
68% of the time; 95% of the time; 99% of the time
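The ranges on this card can be computed directly from an obtained score and an SEM. The sketch below assumes a hypothetical obtained score of 100 and an SEM of 4.5; the multipliers are the ones given in the question (1, 1.96, and 2.58).

```python
# SEM multipliers for each confidence level, as given on the card
SEM_MULTIPLIERS = {68: 1.0, 95: 1.96, 99: 2.58}

def confidence_interval(obtained_score, sem, level=95):
    """Range within which the true score is likely to fall."""
    margin = SEM_MULTIPLIERS[level] * sem
    return obtained_score - margin, obtained_score + margin

# Hypothetical examinee: obtained score of 100 on a test with SEM = 4.5
for level in (68, 95, 99):
    low, high = confidence_interval(100, 4.5, level)
    print(f"{level}%: {low:.2f} to {high:.2f}")
```

For these assumed values, the 68% interval is 95.5 to 104.5 and the 95% interval is roughly 91.2 to 108.8.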
TRUE or FALSE: Hypothetically, a test with a reliability coefficient of +1.0 would have a standard error of measurement of 0.0.
TRUE: A test with perfect reliability will have no error
The standard error of measurement is \_\_\_\_\_\_\_\_ related to the reliability coefficient (rxx) and \_\_\_\_\_\_\_\_ related to the standard deviation of test scores (SDx).
Inversely; positively
What reliability coefficient, when practical, is the best to use?
Alternate-forms
Classical test theory states that an observed score reflects \_\_\_\_\_\_\_\_ plus \_\_\_\_\_\_\_\_.
True score variance; random error variance