Assessment Flashcards

(571 cards)

1
Q

Clinical interviewing can be…

A

Structured, semi-structured, and unstructured

2
Q

Informal assessment

A

Observation of behavior, rating scales, classification techniques, records, and personal documents

3
Q

Personality assessment

A

Standardized tests (e.g., MMPI), projective tests (e.g., TAT), and interest inventories (e.g., Strong Interest Inventory)

4
Q

Examples of ability assessment

A

Achievement tests (e.g., WRAT), aptitude tests (e.g., SAT), and intelligence tests (e.g., WISC)

5
Q

According to the history of assessment, what civilization developed the first widely used tests around 2300 B.C.E.?

A

The ancient Chinese, who used physical fitness and endurance tests to screen candidates for government civil service positions.

6
Q

What was the primary purpose of the ancient Chinese assessment system (circa 2300 B.C.E.)?

A

To select qualified individuals for government service using physically demanding and often brutal examinations.

7
Q

Why are the ancient Chinese civil service exams significant in assessment history?

A

They represent the first large-scale, systematic use of testing to make decisions about selection and placement.

8
Q

How did the 19th century influence the development of modern assessment practices?

A

The 19th century introduced scientific measurement, standardization, and early psychometric principles that shaped modern testing.

9
Q

What role did early pioneers of assessment play during the 19th century?

A

They laid the foundation for intelligence, aptitude, personality, and interest testing, influencing current assessment practices.

10
Q

What major focus dominated assessment development during the early 20th century?

A

The scientific measurement of intelligence, driven by a growing interest in objectively quantifying human abilities.

11
Q

What major limitation of early intelligence tests became apparent in the 20th century?

A

They failed to account for the diversity of human intelligence, often reflecting cultural and contextual bias.

12
Q

What types of assessments emerged in response to the limitations of intelligence testing?

A

Tests measuring aptitude, personality, and interests, allowing for a broader understanding of individual differences.

13
Q

Why did aptitude testing become important in modern assessment?

A

Aptitude tests assess potential for learning or future performance, rather than general intelligence alone.

14
Q

Why were personality and interest assessments developed in the 20th century?

A

To better understand behavioral tendencies, preferences, and vocational fit, which intelligence tests could not capture.

15
Q

How did 20th-century developments shape present-day assessment practices?

A

They established the use of multiple assessment domains (intelligence, aptitude, personality, interests) to form a comprehensive evaluation.

16
Q

What is the overall historical trend in the evolution of assessment?

A

A shift from single-trait, physically demanding tests to multidimensional, scientifically grounded psychological assessments.

20
Q

Jean Esquirol

A

Jean Esquirol (1772–1840) used language development to identify varying levels of intelligence. His work is considered a forerunner of verbal IQ. He is credited with recognizing that intellectual disability (at the time called mental retardation) was related to developmental deficiencies rather than mental illness.

21
Q

Edouard Seguin

A

Edouard Seguin (1812–1880) developed the form board, which improved the motor skills of individuals with intellectual disability. The form board is considered a predecessor to performance IQ testing.

22
Q

Sir Francis Galton

A

Sir Francis Galton (1822–1911) was a biologist credited with launching the testing movement and developing the first test of intelligence. He pioneered the use of rating-scale and questionnaire methods and developed the correlation coefficient through his work in examining the relationship between reaction time, grip strength, and intelligence.

23
Q

Wilhelm Wundt

A

Wilhelm Wundt (1832–1920) founded one of the first psychological laboratories to conduct experimental research.

24
Q

James Cattell

A

James Cattell (1860–1944) was one of the first to apply statistical concepts to psychological assessment. Cattell popularized the term mental test.

25
Hermann Ebbinghaus
Hermann Ebbinghaus (1850–1909) studied human memory and is well known for his work on the forgetting curve. He administered mental tests to school-age children and was able to show that his sentence completion test was related to scholastic achievement.
26
Alfred Binet
(1857–1911) developed the first modern intelligence test, the Binet-Simon scale, with Théodore Simon.
27
Lewis Terman
(1877–1956) revised the Binet-Simon scale, naming the enhanced version the Stanford-Binet Intelligence Test.
28
Stanford-Binet Intelligence Test
The first intelligence test to incorporate the intelligence quotient (ratio IQ), which is mental age divided by chronological age, multiplied by 100.
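As a quick numeric sketch, the ratio IQ (mental age over chronological age, times 100) can be computed like this (the function name and the sample ages are hypothetical illustrations, not from the source):

```python
# Ratio IQ from the early Stanford-Binet:
# mental age divided by chronological age, multiplied by 100.
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    return mental_age / chronological_age * 100

# A 10-year-old performing at the level of a typical 12-year-old:
print(ratio_iq(12, 10))  # 120.0
```

A child performing exactly at age level scores 100, which is why 100 is the conventional midpoint of IQ scales.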
29
Arthur Otis
(1886–1964) devised the first scientifically reliable measure for testing the intelligence of individuals in groups. The assessment was called the Otis Group Intelligence Scale.
30
Robert Yerkes
(1876–1956) used Otis’s group intelligence instrument to develop the Army Alpha and Army Beta group intelligence tests.
31
Army Alpha
Designed to screen the cognitive ability of military recruits. The intelligence measure was eventually revised for civilian use.
32
Army Beta
The language-free version of the test designed for recruits who could not read or were foreign-born.
33
Charles Spearman
(1863–1945) and L. L. Thurstone (1887–1955) developed a statistical test known as factor analysis, which led to the development of multiple aptitude testing.
34
James Bryant Conant
(1893–1978), in conjunction with the Educational Testing Service (ETS), developed the Scholastic Aptitude Test (SAT).
35
Edward Thorndike
(1874–1949) developed the first achievement test battery, the Stanford Achievement Test (also abbreviated SAT, not to be confused with the Scholastic Aptitude Test), which provided an objective measure of academic performance and could be administered to large groups of students.
36
Robert Sessions Woodworth
(1869–1962) developed Woodworth’s Personal Data Sheet, an emotional-stability screening test for World War I military recruits. It was the first standardized personality inventory.
37
Starke Hathaway
(1903–1984) and J. Charnley McKinley (1891–1950) developed the Minnesota Multiphasic Personality Inventory (MMPI), an objective measure of personality structure.
38
MMPI-2
The second version of the Minnesota Multiphasic Personality Inventory, now the personality test most widely used to identify and diagnose psychopathology.
39
Carl Jung
(1875–1961), Hermann Rorschach (1884–1922), and Henry Murray (1893–1988) developed projective techniques (Jung’s word associations, Rorschach’s inkblots, and Murray’s Thematic Apperception Test, respectively) to assess personality.
40
Frank Parsons
(1854–1908) was the father of vocational guidance and counseling. His work gave birth to the development of vocational and interest inventories.
41
Edward Strong
(1884–1963) devised the Strong Vocational Interest Blank, which is known today as the Strong Interest Inventory.
42
Strong Interest Inventory
It remains among the most widely used and researched vocational measures in career counseling.
43
Definition of Measurement
Measurement is the process of defining and estimating the magnitude of human attributes and behavioral expressions using standardized instruments.
44
Assumption 1 of Measurement
Human attributes and behaviors are distinct enough to be objectively defined and quantified.
45
Assumption 2 of Measurement
All human attributes and behavioral expressions exist in all people.
46
Assumption 3 of Measurement
The presence or absence of attributes or behaviors in certain situations indicates normalcy or deficiency.
47
Instruments Used in Measurement
Measurement instruments such as tests, surveys, and inventories.
48
Definition of Assessment
Assessment is a broad, systematic process of gathering and documenting client information.
49
Difference Between a Test and an Assessment
• A test is a subset of assessment, providing data from responses to test items. • Assessment encompasses the entire process of collecting and integrating information.
50
Definition of a Test
A test is a standardized instrument used to yield data about an examinee’s responses to specific items.
51
Definition of Interpretation
Interpretation is the process by which the counselor assigns meaning to test or assessment data using norms, criteria, or professional judgment.
52
What are the three different bases for interpretation?
1. Comparing to a peer group (norm-referenced) 2. Using predetermined criteria (criterion-referenced) 3. Applying professional judgment
53
Definition of Evaluation
Evaluation is the process of determining worth, significance, or progress based on measurement results.
54
Example of Evaluation
Examining a client’s monthly Beck Depression Inventory scores to determine progress over time.
55
Purpose of Evaluation
To assess client progress and determine the effectiveness of interventions, programs, or services.
56
Purpose of Limiting Perfect Scores
To differentiate between individuals by highlighting differences in ability, performance, or characteristics.
57
Power Tests
A power test limits perfect scores by including very difficult items and measures how well a test-taker performs regardless of time limits.
58
What Power Tests Measure
The level of ability or knowledge a test-taker has when given sufficient time.
59
Speed Tests
A speed test limits perfect scores by imposing strict time limits, not item difficulty.
60
What Speed Tests Measure
How quickly a test-taker can understand questions and respond accurately.
61
Power vs. Speed Tests
• Power tests → difficulty limits scores, time is not emphasized • Speed tests → time limits scores, items are usually easy
62
Maximal Performance Tests
A test designed to measure a client’s best possible or highest attainable performance.
63
Examples of Maximal Performance Tests
Achievement tests and aptitude tests.
64
Typical Performance Tests
A test that measures characteristic or usual behavior, not one’s best effort.
65
Example of Typical Performance Testing
Personality tests, which assess normal patterns of behavior, thoughts, and emotions.
66
Standardized Tests
A test with uniform administration, scoring, and interpretation procedures.
67
Key Features of Standardized Tests
• Predetermined instructions • Objective scoring • Established reliability and validity • Comparison to a norm group
68
Examples of Standardized Tests
The SAT and GRE.
69
Nonstandardized Tests
A test that allows flexibility in administration, scoring, and interpretation.
70
Limitation of Nonstandardized Tests
Scores cannot be compared to a norm group, requiring reliance on professional judgment.
71
Examples of Nonstandardized Tests
Projective tests such as the Rorschach Inkblot Test and the Thematic Apperception Test (TAT).
72
Individual Tests
A test administered to one examinee at a time.
73
Advantages of Individual Tests
• Builds rapport • Allows close observation • Counselor can monitor fatigue, anxiety, and motivation
74
Disadvantages of Individual Tests
They are time-consuming and more costly.
75
Group Tests
A test administered to two or more examinees at the same time.
76
Advantages of Group Tests
• Economical • Efficient administration • Objective scoring • Established norms
77
Disadvantages of Group Tests
• Limited flexibility • Restricted responses • Less opportunity for individual observation
78
Objective Tests
A test with clear correct answers and consistent scoring, minimizing examiner bias.
79
Examples of Objective Tests
Multiple-choice, true/false, and matching questions.
80
Subjective Tests
A test involving open-ended responses that are influenced by examiner and examinee interpretation.
81
Example of Subjective Testing
Essay questions or open-ended responses.
84
What is the primary purpose of assessment in counseling?
To gather systematic information that supports diagnosis, treatment planning, placement, selection, monitoring progress, and evaluating outcomes.
85
How does assessment support diagnosis and treatment planning in counseling?
Assessment helps counselors identify symptoms, evaluate their severity, and determine their impact on functioning, which guides diagnosis and treatment decisions.
86
How is the Beck Depression Inventory–II (BDI-II) used in diagnosis and treatment planning?
It helps determine the severity of depressive symptoms, supports diagnosis of mood disorders, and informs treatment recommendations.
87
Why are diagnostic systems important in counseling assessment?
They provide standardized terminology that allows mental health professionals to communicate clearly about diagnosis and treatment.
88
Which diagnostic system is most widely used in counseling assessment?
The Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, published by the American Psychiatric Association.
89
Why is assessment increasingly important in managed care settings?
Managed care organizations often require formal diagnoses and documentation to authorize and reimburse treatment.
90
How is assessment used for placement services in counseling?
Counselors use assessment data to determine the most appropriate program, service, or setting for a client.
91
What assessments might be used to determine classroom placement for a child?
Behavioral records, observations, and individualized achievement tests.
92
How is assessment used for admission purposes?
Assessments help determine eligibility for educational programs or institutions.
93
What is an example of an assessment used for admission decisions?
The GRE, which is often required for postgraduate program admission.
94
How is assessment used for selection purposes?
Assessments are used to select candidates for specific programs, positions, or jobs.
95
What type of assessment might be used to select an auto mechanic?
A battery of mechanical aptitude tests to evaluate suitability for the position.
96
Why is assessment important for monitoring client progress?
It allows counselors to determine whether clients are moving toward counseling goals over time.
97
How can the BDI-II be used to monitor client progress?
It can be administered periodically (e.g., intake, session 3, 5, 7) to track changes in depression severity.
98
What is an example of informal assessment used to monitor progress?
Asking clients to rate symptoms (e.g., depression or anxiety) on a 1–10 scale at each session.
99
Why is client self-report useful in progress monitoring?
It provides ongoing, real-time insight into the client’s subjective experience and symptom changes.
100
What does it mean to evaluate counseling outcomes?
Determining whether counseling interventions are effective overall, not just whether an individual client improves.
101
Why are counselors increasingly required to evaluate outcomes?
To demonstrate accountability and provide evidence that counseling leads to positive client change, especially for managed care.
102
Is assessment limited to the initial counseling session?
No. Assessment is an ongoing process used throughout counseling for diagnosis, monitoring, and evaluation.
103
What are the six primary purposes of assessment in counseling?
1. Diagnosis & treatment planning 2. Placement services 3. Admission decisions 4. Selection decisions 5. Monitoring client progress 6. Evaluating counseling outcomes
104
What does outcome research in counseling evaluate?
Outcome research evaluates the effectiveness of counseling by examining (a) the degree of client change and (b) the factors that contribute to client change.
105
Why is outcome research important for professional counselors?
It helps counselors determine whether counseling works, guides program improvement, and supports accountability to stakeholders and managed care.
106
Which theorist proposed a five-step process for evaluating counseling outcomes?
Whiston (2016).
107
What are examples of evaluation study focuses in outcome research?
A specific counseling service, a particular intervention, or an entire counseling program.
108
What is the most common quantitative design used in outcome research?
A pretest–intervention–posttest design, comparing scores before and after counseling.
109
How does a qualitative evaluation design assess counseling outcomes?
By interviewing participants and analyzing their experiences and narratives about counseling.
110
What are common ways to select participants in outcome research?
• All clients • A random sample • A specific subgroup (e.g., adolescents, women, Latino clients)
111
Why should counselors involve as many participants as feasible in outcome research?
To increase variation in perspectives and improve experimental validity.
112
What types of assessments are commonly used in outcome research?
Established instruments with strong validity and reliability, often symptom-based, measuring change over time.
113
Can counselors create their own assessments for outcome research?
Yes, counselors may develop study-specific surveys when appropriate.
114
How is quantitative data analyzed in outcome research?
By determining whether changes are statistically significant, indicating the intervention was effective.
115
How is qualitative data analyzed in counseling outcome research?
Through coding narratives and transcripts and identifying themes related to client change.
116
What two key questions does outcome research answer for counselors?
1. Do clients change? 2. What factors contribute to that change?
117
How does outcome research support ethical counseling practice?
It promotes evidence-based practice, responsible decision-making, and continuous improvement of services.
118
On exams, how can you recognize an outcome research question?
Look for language about effectiveness, client change over time, pretest–posttest, or program evaluation.
119
According to Whiston (2016), what are the five steps of outcome research in counseling?
1. Define the evaluation study focus (what service, intervention, or program is being evaluated) 2. Determine the evaluation design (e.g., pretest–posttest or qualitative interviews) 3. Select participants (all clients, random sample, or subgroup) 4. Select assessments (valid and reliable instruments or study-specific tools) 5. Analyze data (statistical analysis for quantitative data or thematic analysis for qualitative data)
120
What is the overall purpose of following Whiston’s five-step outcome research process?
To determine whether counseling is effective, assess client change over time, and identify factors that contribute to change, supporting evidence-based practice and accountability.
121
What is the most comprehensive source for commercially available English-language assessments?
The Mental Measurements Yearbook (MMY), published by the Buros Institute of Mental Measurements, is the most comprehensive source for commercially available English-language assessments. It provides: • Test purpose and population • Administration and scoring procedures • Reliability and validity data • Norming information • Pricing and forms • Expert critical reviews (key distinction)
122
Tests in Print (TIP)
Tests in Print (TIP), also published by the Buros Institute of Mental Measurements, provides: • Titles of all published tests • Intended population • Author and publisher • Publication date and acronym ❌ Does NOT include: • Reliability or validity data • Norms • Expert critiques
123
Tests (PRO-ED)
Tests is a quick-reference directory of thousands of assessments that includes: • Purpose and major features • Target population • Administration time • Scoring method • Cost and availability ❌ Does NOT include: • Reliability • Validity • Norms • Expert critiques Used mainly for initial screening and selection of assessments.
124
Test Critiques
Test Critiques, published by PRO-ED, provides: • Detailed descriptions of assessments • Administration and interpretation guidance • Reliability and validity information • In-depth expert reviews (≈8 pages) It is user-friendly, written for both professionals and non-experts, and is updated annually.
125
Validity
Validity refers to how accurately a test measures the construct it claims to measure. It answers the question: 👉 “Does this test measure what it is supposed to measure?”
126
What Validity Is Concerned With
Validity addresses: 1. What the instrument measures 2. How well it measures the construct 3. Whether meaningful inferences can be made from the test scores
127
Validity Depends on
Validity depends on: • Purpose of testing • Population being tested A test may produce valid scores for one group or purpose and invalid scores for another.
128
Population and Purpose Matter
Validity varies depending on the test-taker and the intended use. Example: • An anxiety measure may show high validity for anxious adults • The same test may show low validity for disruptive children Therefore, validity must always be reported relative to the target population and purpose.
129
Exam Caution Card
❌ Saying “This test is valid.” ✅ Correct phrasing: “Scores from this test demonstrate validity for this purpose with this population.”
130
Content Validity
Content validity is the extent to which a test’s items adequately represent all important areas of the construct’s domain. It is established by: • Clearly defining the domain • Ensuring all major content areas are included • Weighting items so more important areas have more items 📌 Judgment-based, not statistical
131
Content Validity Example
A depression test must include items representing: • Physical symptoms (e.g., sleep, appetite) • Psychological symptoms (e.g., sadness, loss of interest) • Cognitive symptoms (e.g., guilt, worthlessness, suicidal thoughts) If psychological symptoms are more central to depression, more items must measure those symptoms.
132
Criterion Validity (General)
Criterion validity is the extent to which test scores are related to an external criterion of performance. It answers: 👉 “Does this test relate to a meaningful real-world outcome?” There are two types: • Concurrent validity • Predictive validity
133
Concurrent Validity
Concurrent validity measures the relationship between: • Test scores now • A criterion measured at the same time 📌 Both collected simultaneously
134
Concurrent Validity Example
Administer a depression test to adults while simultaneously collecting: • Hospital admission data for suicidal ideation If higher depression scores correlate with more admissions, concurrent validity is supported.
135
Predictive Validity
Predictive validity examines how well test scores predict a future outcome. • Test administered now • Criterion measured later 📌 Time delay is the key
136
Predictive Validity Example
Depression scores collected today are correlated with: • Number of hospitalizations for suicidal ideation two years later If scores predict future hospitalizations → predictive validity is supported.
137
Construct Validity
Construct validity is the extent to which a test measures a theoretical construct (abstract concept). It is especially important for constructs like: • Personality • Intelligence • Depression • Anxiety
138
Ways to Establish Construct Validity
Construct validity is supported through: • Experimental design • Factor analysis • Convergent validity • Discriminant (divergent) validity
139
Experimental Design Validity
If a test truly measures a construct: • Scores should change in expected directions after treatment Example: • Depression scores decrease after effective counseling • If not → problem may be the test, design, or treatment
140
Factor Analysis
Factor analysis is a statistical technique that identifies latent (hidden) factors underlying test items. For construct validity: • Subscales must relate to the overall construct • Subscales must be related but not redundant
141
Convergent Validity
Convergent validity exists when a test correlates strongly with other measures of the same construct. 📌 “Measures that should be related are related.”
142
Convergent Validity Example
A new depression test shows a strong positive correlation with: • Beck Depression Inventory–II (BDI-II) This supports convergent validity.
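As a rough numeric sketch of convergent validity, a Pearson correlation can be computed between two sets of scores (all data below are invented for illustration, not real BDI-II values):

```python
# Pearson correlation between a hypothetical new depression test
# and BDI-II scores for the same five clients (made-up data).
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

new_test = [10, 14, 18, 22, 30]
bdi_ii   = [12, 15, 20, 24, 28]
print(round(pearson_r(new_test, bdi_ii), 2))  # close to +1.0 → convergent evidence
```

A coefficient near +1.0 between two measures of the same construct is exactly the pattern convergent validity requires.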
143
Discriminant (divergent) validity
Discriminant validity is established when a test does NOT correlate with measures of theoretically unrelated constructs.
144
How is discriminant validity demonstrated?
By showing little to no correlation between the test and measures of unrelated traits or constructs.
145
Give an example of discriminant validity using a depression test.
A depression inventory shows no relationship with an achievement test, supporting discriminant validity.
146
What is face validity?
Face validity refers to whether a test appears to measure what it claims to measure.
147
Why is face validity NOT considered true validity?
Because it is: • Superficial • Based on appearance only • Lacks empirical support
148
When does a depression test have face validity?
When the test items look like they measure depression (e.g., sadness, sleep problems, hopelessness).
149
What provides the strongest evidence of validity for an assessment?
Establishing multiple types of validity, including content, criterion, and construct validity.
150
How does convergent validity differ from discriminant validity?
• Convergent validity → related constructs are correlated • Discriminant validity → unrelated constructs are not correlated
151
True or False: Face validity alone is sufficient to establish test validity.
False. Face validity does not provide empirical evidence of accuracy.
152
Which type of validity is most likely tested on the NCE when the question states, 'the test does NOT correlate with unrelated constructs'?
Discriminant validity
153
How is validity typically reported in test manuals and reports?
Validity is reported as a correlation coefficient between test scores and a criterion.
154
What is a validity coefficient?
A validity coefficient is the correlation between a test score and a criterion measure. It indicates how well the test predicts or relates to a specific outcome (criterion), and thus its usefulness. Values range from –1.0 to +1.0; the closer the value is to 1.0, the stronger the validity (e.g., a test score predicting job performance).
155
What does a higher validity coefficient indicate?
A stronger relationship between test scores and the criterion, meaning better predictive accuracy.
156
Besides correlation coefficients, how else can validity be reported?
Validity can be reported using a regression equation.
157
What is the purpose of a regression equation in validity reporting?
To predict a future criterion score from a current test score. A regression equation models the relationship between variables, allowing prediction of a dependent variable (ŷ) from one or more independent variables (x) by finding the best-fit line through the data points: ŷ = b₀ + b₁x, where b₀ is the intercept (where the line crosses the y-axis) and b₁ is the slope (how much ŷ changes per unit change in x).
158
Give an example of using a regression equation in practice.
Predicting a college GPA from a student’s SAT score.
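That GPA prediction can be sketched with a simple regression line ŷ = b₀ + b₁x (the coefficients below are invented placeholders, not from any real SAT/GPA study):

```python
# Hypothetical regression coefficients for illustration only.
B0 = 0.5    # intercept
B1 = 0.002  # slope: predicted GPA points per SAT point

def predict_gpa(sat_score: float) -> float:
    """Predict a criterion score (GPA) from a test score (SAT)."""
    return B0 + B1 * sat_score

print(round(predict_gpa(1200), 2))  # 2.9
```

Because validity is never perfect, any such prediction must be reported with its standard error of estimate.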
159
Why are predictions based on regression equations never 100% accurate?
Because measurement error and imperfect validity are always present.
160
What statistic must be reported along with regression-based predictions?
The standard error of estimate.
161
What is the standard error of estimate?
A statistic that indicates the expected margin of error in a predicted criterion score.
162
What causes the standard error of estimate?
The imperfect validity of the test being used for prediction.
163
What does a smaller standard error of estimate mean?
Predictions are more accurate and closer to actual criterion scores.
164
Conceptually, how is the standard error of estimate calculated?
By examining the squared differences between actual scores and predicted scores, averaged across cases.
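That conceptual calculation can be sketched as follows (the GPA values are made up; note that textbooks often divide by n − 2 rather than n, but this sketch follows the card's "averaged across cases" description):

```python
import math

def standard_error_of_estimate(actual, predicted):
    # Root of the mean squared difference between actual and
    # predicted criterion scores.
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))

actual_gpa    = [3.0, 2.5, 3.6, 2.9]
predicted_gpa = [2.9, 2.7, 3.4, 3.0]
print(round(standard_error_of_estimate(actual_gpa, predicted_gpa), 3))  # 0.158
```

The smaller this value, the closer predicted criterion scores fall to the actual ones.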
165
On the NCE, if a question mentions 'margin of prediction error,' what is the answer?
Standard error of estimate
166
True or False: A test with perfect validity would have a standard error of estimate of zero.
True (theoretically—but this never occurs in practice).
167
Which two statistics are most commonly associated with predictive validity reports?
• Validity coefficient • Standard error of estimate
168
What is decision accuracy?
The degree to which a test correctly supports counselor decisions about diagnosis, treatment, or placement.
169
Why is decision accuracy important for counselors?
Because counselors make real-world decisions (diagnosis, treatment, placement) that impact client outcomes.
170
What does sensitivity measure?
An instrument’s ability to correctly identify the presence of a condition.
171
Example of sensitivity?
A depression inventory correctly identifies a depressed client as depressed.
172
What does specificity measure?
An instrument’s ability to correctly identify the absence of a condition.
173
Example of specificity?
A depression inventory correctly identifies a non-depressed client as not depressed.
174
What is a false positive error?
When a test incorrectly identifies the presence of a condition.
175
Example of a false positive?
A depression inventory says a non-depressed client is depressed.
176
What is a false negative error?
When a test incorrectly identifies the absence of a condition.
177
Example of a false negative?
A depression inventory says a depressed client is not depressed.
178
Which error is usually more dangerous in mental health screening?
False negatives (missing a real problem).
179
What is efficiency in decision accuracy?
The ratio of total correct decisions to the total number of decisions.
180
In simple terms, what does efficiency tell you?
How accurate the test is overall.
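The sensitivity, specificity, and efficiency cards above can be sketched with a small Python example. The counts below are hypothetical screening results, not data from any real instrument.

```python
# Hypothetical depression-screening results: test decisions vs. actual status.
true_positives = 40   # test says depressed, client is depressed
false_negatives = 10  # test says not depressed, client is depressed
true_negatives = 35   # test says not depressed, client is not depressed
false_positives = 15  # test says depressed, client is not depressed

# Sensitivity: proportion of clients WITH the condition correctly identified.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: proportion of clients WITHOUT the condition correctly identified.
specificity = true_negatives / (true_negatives + false_positives)

# Efficiency: total correct decisions over total decisions made.
total = true_positives + false_negatives + true_negatives + false_positives
efficiency = (true_positives + true_negatives) / total

print(sensitivity)  # 0.8
print(specificity)  # 0.7
print(efficiency)   # 0.75
```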
181
What is incremental validity?
The extent to which a test adds predictive power beyond existing information or assessments.
182
Example of incremental validity?
A new aptitude test improves prediction of college GPA beyond SAT scores alone.
183
If a test does not improve prediction beyond existing data, what is its incremental validity?
Low or none.
184
On exams, sensitivity is MOST associated with which phrase?
“Correctly identifies those WITH the condition”.
185
On exams, specificity is MOST associated with which phrase?
“Correctly identifies those WITHOUT the condition”.
186
Which decision accuracy term answers: “Does this test help me make better decisions than before?”
Incremental validity.
187
One-sentence NCE summary of decision accuracy?
Decision accuracy evaluates how well a test correctly identifies, excludes, and improves decisions about client conditions.
188
What is reliability?
The consistency of scores obtained by the same person across repeated test administrations.
189
What question does reliability answer?
“Does the test give consistent results?”
190
What is the difference between a true score and an observed score?
Observed score = true score + error
191
Formula for observed score?
X = T + e (X = observed score, T = true score, e = error)
192
What causes measurement error?
• Instrument problems • Test-taker factors (anxiety, fatigue) • Testing environment (noise, distractions)
193
What is reliability MOST concerned with?
The amount of error in test scores.
194
What is test–retest reliability?
The consistency of scores across time using the same test.
195
Another name for test–retest reliability?
Temporal stability
196
Best use of test–retest reliability?
For stable traits (e.g., intelligence).
197
Major problems with test–retest reliability?
• Memory effects • Practice effects • Longer time intervals lower the correlation
198
What is alternative (parallel) form reliability?
Comparing scores from two equivalent versions of the same test.
199
Advantage of alternative form reliability?
Eliminates memory and practice effects.
200
Disadvantage of alternative form reliability?
True equivalence between forms is hard to achieve.
201
What is internal consistency?
How consistently test items measure the same construct within one administration.
202
What is split-half reliability?
Correlation between two halves of the same test.
203
Main limitation of split-half reliability?
Each half is only half the full test length, which lowers the reliability estimate.
204
Which formula corrects split-half reliability?
The Spearman–Brown Prophecy Formula
205
What does the Spearman–Brown formula do?
Estimates reliability for a full-length test from split-half data.
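The Spearman–Brown correction can be sketched in Python. The split-half value of .60 is hypothetical.

```python
def spearman_brown(part_r, n=2):
    """Estimate reliability of a lengthened test from a part-test correlation.

    n is the factor by which the test is lengthened; n=2 gives the classic
    split-half correction for a full-length test.
    """
    return (n * part_r) / (1 + (n - 1) * part_r)

# A split-half correlation of .60 projects to about .75 for the full test.
print(spearman_brown(0.60))
```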
206
What is inter-item consistency?
Correlation among all test items and the total score.
207
Which reliability formula is used for dichotomous items?
Kuder–Richardson Formula 20 (KR-20)
208
Which reliability formula is used for Likert-type scales?
Cronbach’s alpha
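Cronbach's alpha can be computed by hand from item variances and total-score variance. This is a minimal sketch using a tiny, made-up 3-item Likert-type data set.

```python
# Rows are respondents, columns are item scores on a 3-item scale (made-up data).
scores = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 3],
]

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

k = len(scores[0])                     # number of items
items = list(zip(*scores))             # per-item score lists
totals = [sum(row) for row in scores]  # per-person total scores

# alpha = (k / (k-1)) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))
print(round(alpha, 3))
```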
209
What is inter-scorer (inter-rater) reliability?
Consistency of scores between two or more raters.
210
When is inter-scorer reliability especially important?
When scoring involves judgment or subjectivity.
211
Example of inter-scorer reliability?
Multiple clinicians independently scoring open-ended responses.
212
How is reliability reported?
As a reliability coefficient (correlation).
213
What does a reliability coefficient close to 1.00 indicate?
High reliability (low error).
214
What does a reliability coefficient below 1.00 indicate?
Presence of measurement error.
215
Typical acceptable reliability range?
.80 – .95
216
Reliability expectations for aptitude/achievement tests?
Usually >.90
217
Reliability expectations for personality inventories?
Can be below .90 and still acceptable.
218
One-sentence NCE summary of reliability?
Reliability reflects the consistency of test scores and freedom from measurement error.
219
Standard Error of Measurement (SEM)
A statistic that estimates how an individual’s repeated test scores are distributed around their true score.
220
Why is the standard error of measurement (SEM) needed?
Because a person’s true score is unknown and all test scores contain measurement error.
221
What does the SEM represent in simple terms?
The standard deviation of an individual’s repeated scores on the same test.
222
Formula for the SEM?
SEM = SD × √(1 − r), where SD = standard deviation and r = reliability coefficient
223
What is the relationship between reliability and the SEM?
An inverse relationship: higher reliability means a smaller SEM.
224
What happens to the SEM if reliability equals 1.00?
SEM equals 0 (no measurement error).
225
How is the SEM commonly reported?
As a confidence interval around the observed score.
226
What percentage of scores fall within ±1 SEM?
68%
227
What percentage of scores fall within ±2 SEM?
95%
228
If a person’s observed score is 93 and the SEM is 2, what is the 68% confidence interval?
91–95
229
If the SEM is 2, what is the 95% confidence interval?
±4 points from the observed score (±2 SEM)
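The SEM formula and its confidence intervals can be sketched in Python. The SD and reliability values are hypothetical, chosen so the SEM comes out near the 2-point example used in the cards.

```python
import math

sd = 10.0        # test standard deviation (assumed)
r = 0.96         # reliability coefficient (assumed)
observed = 93.0  # observed score from the cards' example

sem = sd * math.sqrt(1 - r)                       # SEM = SD * sqrt(1 - r), about 2
ci_68 = (observed - sem, observed + sem)          # +/- 1 SEM covers about 68%
ci_95 = (observed - 2 * sem, observed + 2 * sem)  # +/- 2 SEM covers about 95%

print(round(sem, 2), ci_68, ci_95)
```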
230
How does test length affect reliability?
Longer tests are generally more reliable than shorter tests.
231
How does homogeneity of test items affect reliability?
More homogeneous (similar-content) items increase reliability.
232
What is range restriction, and how does it affect reliability?
A limited spread of scores in the sample; it lowers reliability estimates.
233
How does heterogeneity of the test group affect reliability?
More diverse test-takers increase reliability estimates.
234
Why do speed tests often show artificially high reliability?
Because test-takers answer nearly all attempted items correctly, which inflates consistency estimates.
235
Can a test be reliable but not valid?
Yes.
236
Can a test be valid but not reliable?
No.
237
One-sentence rule for validity and reliability?
Valid test scores are always reliable, but reliable scores are not always valid.
238
What is item analysis?
A statistical process examining individual test items to evaluate test quality.
239
Why is item analysis used?
To remove confusing, too easy, or too difficult items from future tests.
240
What is item difficulty?
The percentage of test-takers who answer an item correctly.
241
How is item difficulty calculated?
Number correct ÷ total test-takers = p value
242
What does an item difficulty p value of .90 indicate?
The item is very easy.
243
What item difficulty p value yields the most score variability?
.50
244
What is item discrimination?
A measure of how well a test item separates high performers from low performers.
245
How is item discrimination calculated?
Performance of top 25% minus performance of bottom 25%.
246
What does positive item discrimination indicate?
More high scorers answer correctly than low scorers.
247
What do zero or negative item discrimination values indicate?
Poor test items that should be revised or removed.
248
NCE one-liner for item discrimination?
Good items separate people who have the trait from those who do not.
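Item difficulty and item discrimination can be sketched in Python. The response data below are made up (1 = correct, 0 = incorrect) for a single hypothetical item.

```python
all_responses = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]  # every test-taker's result on the item
upper_group = [1, 1, 1, 1]  # item results for the top-scoring group
lower_group = [1, 0, 0, 0]  # item results for the bottom-scoring group

# Item difficulty (p value): proportion of all test-takers answering correctly.
p = sum(all_responses) / len(all_responses)

# Item discrimination (D): upper-group difficulty minus lower-group difficulty.
d = sum(upper_group) / len(upper_group) - sum(lower_group) / len(lower_group)

print(p)  # 0.6
print(d)  # 0.75 -> positive: high scorers pass the item more often than low scorers
```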
249
What is test theory?
A framework holding that psychological constructs must be measurable in quality and quantity in order to be studied empirically.
250
What is the primary goal of test theory?
To reduce test error and improve reliability and validity of scores.
251
What must professional counselors know about test theory?
The major models used to develop, evaluate, and interpret assessment instruments.
252
What is Classical Test Theory (CTT)?
A psychometric theory stating that an observed score equals a true score plus error.
253
Core equation of Classical Test Theory (CTT)?
Observed score = True score + Error
254
Central aim of Classical Test Theory (CTT)?
To increase the reliability of test scores.
255
What aspect of testing does Classical Test Theory (CTT) primarily focus on?
Total test scores rather than individual items.
256
What is Item Response Theory (IRT)?
A modern test theory that uses mathematical models to evaluate individual test items and test performance.
257
Another name for Item Response Theory (IRT)?
Modern test theory
258
What is the primary focus of Item Response Theory (IRT)?
How individual test items function across different levels of ability.
259
Name three uses of Item Response Theory (IRT).
• Detecting item bias • Equating scores across different tests • Tailoring test items to individual test-takers
260
How does Item Response Theory (IRT) detect item bias?
By examining whether items function differently for different groups (e.g., males vs. females).
261
How does Item Response Theory (IRT) differ from Classical Test Theory?
It focuses on individual items rather than total test scores.
262
Who proposed the construct-based validity model?
Samuel Messick (1995)
263
What is the core idea of the construct-based validity model?
Validity is a single, holistic construct—not separate types.
264
How does the construct-based validity model differ from Classical Test Theory?
It rejects separating validity into content, criterion, and construct components.
265
What two major aspects does Messick emphasize in validity?
• Internal structural aspects • External aspects of validity
266
According to Messick, how should validity be understood?
As an integrated evaluation of score meaning and score use.
267
What is a scale?
A group of items combined to produce a composite score on a single variable.
268
What does a scale measure?
A specific construct or variable.
269
What types of variables can scales measure?
Discrete variables and continuous variables.
270
What is the difference between discrete and continuous variables?
• Discrete: distinct, separate categories • Continuous: measured along a range
271
What is quantitative data?
Data represented numerically.
272
What is qualitative data?
Data represented in nonnumeric forms (e.g., Yes/No responses).
273
How are scales typically scored?
By summing or averaging responses across items.
274
Why are scales important in assessment?
They improve measurement reliability and allow constructs to be quantified.
275
NCE one-liner for test theory?
Test theory provides the scientific foundation for reducing error and improving measurement quality.
276
NCE one-liner for Item Response Theory (IRT)?
IRT evaluates how individual test items function across levels of ability.
277
NCE one-liner for Classical Test Theory (CTT)?
CTT focuses on total test scores and increasing reliability by reducing error.
278
NCE one-liner for scales?
Scales combine multiple items to measure a single construct.
279
What are scales of measurement?
Systems used to classify or measure characteristics of data.
280
What are the four scales of measurement?
1. Nominal 2. Ordinal 3. Interval 4. Ratio
281
What is a nominal scale?
A scale that names or categorizes data without order or equal intervals.
282
What does a nominal scale NOT provide?
• Rank order • Equal intervals • Meaningful magnitude
283
Example of a nominal scale variable?
Gender
284
Can numbers be used in a nominal scale?
Yes, but only as labels (e.g., male = 0, female = 1).
285
NCE clue for nominal scale?
“Name only, no order.”
286
What is an ordinal scale?
A scale that ranks data in order but does not assume equal intervals.
287
What does an ordinal scale provide?
Rank order
288
What does an ordinal scale NOT provide?
Equal spacing between values.
289
Common example of an ordinal scale?
Likert-type scale
290
Ordinal example interpretation?
A rating of 4 indicates more satisfaction than 3, but not twice as much.
291
NCE clue for ordinal scale?
“Ranked, unequal spacing.”
292
What is an interval scale?
A scale with rank order and equal intervals but no true zero.
293
Key feature of an interval scale?
Equal distances between points.
294
What is missing from an interval scale?
An absolute zero point.
295
Classic example of an interval scale?
Temperature in Fahrenheit
296
Why can’t ratios be used with interval scales?
Zero does not represent absence of the construct.
297
Common data type for counseling assessments?
Interval scale
298
NCE clue for interval scale?
“Equal intervals, no true zero.”
299
What is a ratio scale?
A scale with rank order, equal intervals, and a true zero.
300
What makes the ratio scale the most advanced?
It includes all properties of nominal, ordinal, and interval scales.
301
What does a true zero mean on a ratio scale?
Complete absence of the measured variable.
302
Example of a ratio scale?
Height
303
Why are ratios meaningful on a ratio scale?
Because zero is absolute (e.g., 6 feet is twice 3 feet).
304
Common fields using ratio scales?
Natural sciences (e.g., weight, time, length).
305
NCE clue for ratio scale?
“Equal intervals + true zero.”
306
Which scale allows meaningful ranking but not equal intervals?
Ordinal scale
307
Which scale allows equal spacing but not true ratios?
Interval scale
308
Which scale allows multiplication and division comparisons?
Ratio scale
309
One-sentence NCE summary of scales of measurement?
Nominal names, ordinal ranks, interval measures equal spacing, and ratio adds a true zero.
310
Likert Scale (Likert-type scale)
A scale commonly used to measure attitudes or opinions using graded response options.
311
Typical format of a Likert scale (Likert-type scale) item
A statement followed by response options ranging from Strongly Disagree to Strongly Agree.
312
What construct is most often measured by a Likert scale (Likert-type scale)
Attitudes or opinions.
313
Example response options for a Likert scale (Likert-type scale)
Strongly Disagree – Disagree – Neutral – Agree – Strongly Agree
314
Measurement level typically associated with a Likert scale (Likert-type scale)
Ordinal scale (often treated as interval in practice).
315
NCE clue for a Likert scale (Likert-type scale)
“Strongly agree to strongly disagree.”
316
Semantic Differential Scale (Self-Anchored scale)
A scale that measures attitudes by asking respondents to rate a concept between two opposite adjectives.
317
What assumption underlies the semantic differential scale (self-anchored scale)
People think dichotomously (in opposites).
318
Typical format of a semantic differential scale (self-anchored scale)
A line or continuum anchored by two opposing adjectives (e.g., Bad — Good).
319
Example of a semantic differential scale (self-anchored scale) item
“How do you feel about your NCE scores?” Bad __________ Good
320
What does the respondent do on a semantic differential scale (self-anchored scale)
Places a mark along the continuum between two adjectives.
321
NCE clue for a semantic differential scale (self-anchored scale)
“Opposite adjectives with a line between them.”
322
Thurstone Scale (Equal-appearing interval scale)
A scale that measures multiple dimensions of an attitude using agree/disagree responses.
323
Key feature of a Thurstone scale (equal-appearing interval scale)
Items are scaled to represent equal-appearing intervals of attitude strength.
324
What type of responses are used in a Thurstone scale (equal-appearing interval scale)
Agree / Disagree
325
What method is associated with a Thurstone scale (equal-appearing interval scale)
Paired comparison method.
326
What does a Thurstone scale (equal-appearing interval scale) attempt to measure
Attitudes across multiple dimensions.
327
NCE clue for a Thurstone scale (equal-appearing interval scale)
“Agree/disagree statements with equal-appearing intervals.”
328
Guttman Scale (Cumulative scale)
A scale designed to measure the intensity or extremity of a variable.
329
How are items arranged in a Guttman scale (cumulative scale)
From least extreme to most extreme.
330
Key principle of a Guttman scale (cumulative scale)
Agreement with an extreme item implies agreement with all previous items.
331
What does the Guttman scale (cumulative scale) measure best
Intensity or strength of an attitude.
332
Example context for a Guttman scale (cumulative scale)
Increasing levels of tolerance or acceptance.
333
NCE clue for a Guttman scale (cumulative scale)
“If you agree with the last item, you agree with all before it.”
334
Which scale uses graded agreement levels
Likert scale (Likert-type scale)
335
Which scale uses opposing adjectives on a continuum
Semantic differential scale (self-anchored scale)
336
Which scale uses agree/disagree with equal-appearing intervals
Thurstone scale (equal-appearing interval scale)
337
Which scale measures intensity through cumulative agreement
Guttman scale (cumulative scale)
338
One-sentence NCE summary of types of scales
Likert scales rate agreement, semantic differential scales rate between opposites, Thurstone scales measure attitudes with equal intervals, and Guttman scales measure intensity cumulatively.
345
What shape does a normal distribution form when graphed?
A bell-shaped curve.
346
What is another name for the bell-shaped curve?
The normal curve (bell curve).
347
What does it mean that a normal curve is symmetrical?
The left and right sides of the curve are mirror images.
348
Where is the highest point of a normal curve located?
At the center of the distribution.
349
Where are the lowest points of a normal curve located?
At the extreme ends (tails) of the distribution.
350
What does it mean that a normal curve is asymptotic?
The tails approach the horizontal axis but never touch it.
351
What does “asymptotic” imply about extreme scores?
Extremely high or low scores are possible but very rare.
352
What statistical concepts characterize a normal distribution?
• Measures of central tendency • Measures of variability
353
Which measures of central tendency apply to a normal distribution?
Mean, median, and mode.
354
Relationship among mean, median, and mode in a normal distribution?
They are equal and located at the center.
355
Why are normal distributions important in assessment?
They provide the mathematical foundation for score interpretation.
356
What is the relationship between normal distributions and derived scores?
Derived scores are based on the mathematical properties of normal distributions.
357
Why are normal distributions essential for comparing test scores?
They allow meaningful comparisons across individuals and tests.
358
What kinds of comparisons do normal distributions allow?
• Comparing different clients on the same test • Comparing one client across multiple tests
359
What type of assessments rely on normal distributions?
Norm-referenced assessments.
360
Which derived scores originate from normal distributions?
• Percentile ranks • Normal curve equivalents • Stanines (standard nines) • z-scores (standard scores expressed in standard deviation units) • T scores (standard scores with a mean of 50 and a standard deviation of 10)
361
Why can derived scores exist only because of normal distributions?
Because normal distributions provide predictable mathematical relationships.
362
NCE cue for normal distributions and test scores?
“Bell curve makes score comparison possible.”
363
One-sentence NCE summary of the normal distribution?
The normal distribution’s symmetry and mathematical properties allow raw scores to be converted into meaningful derived scores.
364
What are norms?
Norms are typical scores or performances used as a comparison standard for evaluating test scores.
365
What is a norm-referenced assessment?
A norm-referenced assessment compares an individual’s score to the average score (mean) of a norm group.
366
What question does a norm-referenced assessment answer?
“How did this person perform compared to others?”
367
What is the reference point in a norm-referenced assessment?
The norm group’s average score (mean).
368
Why are derived scores important in norm-referenced assessment?
They indicate an individual’s relative position within the norm group.
369
What information does knowing a person’s relative position provide?
How well the individual performed compared to peers.
370
Why is a raw score alone insufficient in norm-referenced assessment?
Raw scores lack meaning without comparison to a norm group.
371
Example of interpreting Ivan’s score using norms?
Ivan’s score (67) is above the group mean (63), indicating above-average performance.
372
Examples of norm-referenced college admissions exams?
• GRE (Graduate Record Examination) • SAT (Scholastic Assessment Test) • ACT (American College Testing) • MCAT (Medical College Admission Test) • GMAT (Graduate Management Admission Test)
373
Examples of norm-referenced intelligence tests?
• Stanford–Binet Intelligence Scales • Wechsler intelligence tests
374
Examples of norm-referenced personality inventories?
• MBTI (Myers–Briggs Type Indicator) • CPI (California Psychological Inventory)
375
What is a criterion-referenced assessment?
An assessment that measures performance against specific, predetermined standards or skills (criteria) rather than against other test-takers, showing whether the content has been mastered (e.g., a driver’s test or a state exam).
376
What question does a criterion-referenced assessment answer?
“Did the person meet the standard?”
377
Example of a criterion-referenced assessment?
• Driver’s licensing exams • Professional licensure exams such as the NCE (National Counselor Examination) • High school graduation exams • CPCE (Counselor Preparation Comprehensive Examination)
378
What is an ipsative assessment?
An assessment that compares an individual’s current score to their own previous scores.
379
What type of reference frame does an ipsative assessment use?
An internal (self-referenced) frame of reference.
380
How do norm-referenced and criterion-referenced assessments differ from ipsative assessments?
Norm- and criterion-referenced assessments use external standards, while ipsative assessments use self-comparison.
381
Common settings where ipsative assessments are used?
• Physical education classes • Computer games • Fitness tracking
382
One-sentence NCE summary of assessment reference frames?
Norm-referenced assessments interpret scores by comparing individuals to a norm group, unlike criterion-referenced assessments (standard-based) and ipsative assessments (self-based).
383
What is a percentage score?
A percentage score is the raw score divided by the total number of test items.
384
What does a percentage score tell you?
The number or proportion of test items answered correctly.
385
Ivan answered 67 out of 100 questions correctly. What is his percentage score?
67 percent (67%).
386
Why does a percentage score lack interpretive meaning by itself?
Because it must be compared to a criterion or a norm group to be meaningful.
387
What is a percentile rank (also called a percentile)?
A percentile rank is the percentage of scores that fall at or below a given score in a norm group.
388
What question does a percentile rank answer?
“What percentage of people scored the same as or lower than this person?”
389
How is a percentile rank different from a percentage score?
• A percentage score = percent of items answered correctly • A percentile rank = percent of people scoring at or below a given score
390
What is the possible range of percentile ranks?
Less than 1 to greater than 99.
391
Why can percentile ranks never be 0 or 100?
Because percentile ranks represent the percentage of scores below a given score, and the normal curve is asymptotic.
392
What is the mean (average) percentile rank?
50.
393
Are percentile ranks equal units of measurement?
No, percentile ranks are not equal units of measurement.
394
How do percentile ranks behave near the mean of a normal distribution?
They exaggerate small differences in raw scores near the mean.
395
How do percentile ranks behave at the tails of the distribution?
They minimize differences in raw scores at the extremes.
396
Why are percentile ranks considered ordinal, not interval, data?
Because the distances between percentile ranks are not equal across the scale.
397
What statistical information is needed to calculate a percentile rank using the normal distribution?
• Mean • Standard deviation (SD) • Individual raw score
398
What percentile rank corresponds to +1 standard deviation on a normal curve?
The 84th percentile.
399
How can the 84th percentile be calculated without a table?
• 50% of scores fall below the mean • 34% fall between the mean and +1 standard deviation • 50% + 34% = 84%
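The mean-plus-34% reasoning above generalizes to any z-score via the standard normal cumulative distribution function, which can be sketched in Python without a lookup table.

```python
import math

def percentile_from_z(z):
    """Percentage of the normal curve at or below z (standard normal CDF x 100)."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(percentile_from_z(1.0)))   # 84 -> +1 SD lands near the 84th percentile
print(round(percentile_from_z(0.0)))   # 50 -> the mean is the 50th percentile
print(round(percentile_from_z(-1.0)))  # 16 -> -1 SD lands near the 16th percentile
```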
400
One-sentence NCE summary of percentiles?
Percentile ranks indicate the percentage of scores at or below a given score, are norm-referenced, and are not equal-interval measurements.
402
What does standardization mean in assessment?
Standardization is the process of converting raw scores into standard scores using a norm group.
403
What is the purpose of standardization?
To create a typical (average) score that serves as a reference point for interpreting future test results.
404
What is a standard group (also called a norm group)?
A group of test-takers whose scores are used to establish norms for comparison.
405
Why must the standard group be representative of future test-takers?
Because scores are only meaningful when compared to a relevant and similar population.
406
Example of improper norm comparison?
Comparing a third-grade student to a fifth-grade norm group.
407
What are standardized scores?
Converted raw scores that show how an individual performed relative to a norm group.
408
In what type of assessment are standardized scores used?
Norm-referenced assessments.
409
What do standardized scores indicate statistically?
The number of standard deviations a score is above or below the mean.
410
Why are standardized scores more useful than raw scores?
They allow comparison across different tests and test administrations.
411
What key statistical concept underlies standardized scores?
The standard deviation (SD).
412
What is a z-score (standard score)?
A standardized score indicating how many standard deviations a raw score is from the mean.
413
What is the mean and standard deviation of z-scores?
Mean = 0, standard deviation = 1.
414
What is a T score (standard score)?
A standardized score derived from a z-score with a mean of 50 and a standard deviation of 10.
415
Why are T scores often preferred over z-scores?
They eliminate negative numbers and decimals.
416
What is deviation IQ (deviation intelligence quotient)?
An intelligence score standardized with a mean of 100 and a standard deviation of 15.
417
What does deviation IQ replace historically?
The ratio intelligence quotient (ratio IQ).
418
What is a stanine score (standard nine)?
A standardized score that divides the normal distribution into nine categories.
419
What are the mean and standard deviation of stanine scores?
Mean = 5, standard deviation = 2.
420
What is a normal curve equivalent (NCE) score?
A standardized score with a mean of 50 and a standard deviation of approximately 21.06.
421
What is a key advantage of normal curve equivalent scores?
They have equal-interval properties, unlike percentile ranks.
422
How do standardized scores differ from percentile ranks?
Standardized scores are equal-interval measures; percentile ranks are not.
423
What is the most fundamental standardized score from which others are derived?
The z-score (standard score).
424
One-sentence NCE / CPCE summary of standardized scores?
Standardized scores convert raw scores into equal-interval measures that reflect distance from the mean in standard deviation units.
425
What is a z-score (standard score)?
The most basic type of standardized score that expresses how many standard deviations a raw score is from the mean.
426
What are the mean and standard deviation of a z-score distribution?
Mean = 0; standard deviation = 1.
427
What does a z-score represent conceptually?
The number of standard deviation units a score is above or below the mean.
428
What does a positive z-score indicate?
The raw score is above the mean.
429
What does a negative z-score indicate?
The raw score is below the mean.
430
What does a z-score of 0 indicate?
The raw score is exactly at the mean.
431
Formula for calculating a z-score (standard score)?
z = (X − M) / SD (X = raw score, M = mean, SD = standard deviation)
432
What information is required to calculate a z-score?
The raw score, the mean, and the standard deviation.
433
What does X represent in the z-score formula?
X represents the individual’s raw score.
434
What does M represent in the z-score formula?
M represents the sample mean.
435
What does SD represent in the z-score formula?
SD represents the sample standard deviation.
436
How can z-scores be used with the normal curve (normal distribution)?
They show where a score falls relative to the mean in standard deviation units.
437
What percentile rank corresponds to a z-score of +1?
Approximately the 84th percentile rank.
438
What percentile rank corresponds to a z-score of 0?
The 50th percentile rank.
439
What percentile rank corresponds to a z-score of −1?
Approximately the 16th percentile rank.
440
If a student’s z-score is +1.00, how did they perform relative to peers?
They scored one standard deviation above the mean and above approximately 84% of peers.
441
If a student’s z-score is 0, what does this indicate about performance?
The student scored exactly at the mean and at the 50th percentile rank.
442
Why are z-scores considered the foundation of other derived scores?
Because most other standardized scores are calculated by transforming z-scores.
443
Name standardized scores commonly derived from z-scores.
T scores (standard scores), deviation IQs, stanine scores (standard nines), and normal curve equivalent (NCE) scores.
444
One-sentence NCE / CPCE summary of z-scores?
A z-score expresses a raw score as the number of standard deviations it lies above or below the mean.
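The z-score formula from the cards can be sketched in Python. The raw score, mean, and SD values are hypothetical.

```python
# z = (X - M) / SD, the foundation of the other derived scores.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

# Assumed values: raw score 67 on a test with mean 63 and SD 4.
print(z_score(67, 63, 4))  # 1.0 -> one standard deviation above the mean
print(z_score(63, 63, 4))  # 0.0 -> exactly at the mean
print(z_score(55, 63, 4))  # -2.0 -> two standard deviations below the mean
```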
445
What is a T score (T score, standard score)?
A standardized score with a mean of 50 and a standard deviation of 10.
446
What types of assessments commonly use T scores (T scores, standard scores)?
Personality, interest, and aptitude measures.
447
What are the mean and standard deviation of a T score (T score, standard score)?
Mean = 50; standard deviation = 10.
448
How are T scores (T scores, standard scores) derived?
By transforming a z-score (z-score, standard score).
449
Formula for calculating a T score (T score, standard score)?
T = 10(z) + 50 (z = z-score, standard score)
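The T score formula above as a one-line Python function, with the z-score values used in the surrounding cards:

```python
def t_score(z):
    """Convert a z-score to a T score (mean 50, SD 10)."""
    return 10 * z + 50

print(t_score(1))   # 60 -> one SD above the mean
print(t_score(-2))  # 30 -> two SDs below the mean
```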
450
What does a T score above 50 indicate?
The raw score is above the mean.
451
What does a T score below 50 indicate?
The raw score is below the mean.
452
How many standard deviations above the mean is a T score of 60?
One standard deviation above the mean.
453
What T score corresponds to the mean?
A T score of 50.
454
If a person has a z-score (z-score, standard score) of −2, what is the T score?
T = 10(−2) + 50 = 30.
455
What does a T score of 30 indicate?
The score is two standard deviations below the mean.
456
What approximate percentile rank corresponds to a T score of 30?
Approximately the 2nd percentile rank.
457
One-sentence NCE summary of T scores?
T scores express standardized performance with a mean of 50 and standard deviation of 10 and are commonly used in personality and interest testing.
458
What is a deviation IQ (deviation intelligence quotient)?
A standardized score used primarily in intelligence testing with a mean of 100 and a standard deviation of 15.
459
Why are deviation IQs often called standard scores (standard scores, SS)?
Because they use the same standard-score metric (mean 100, SD 15) commonly used to report achievement and aptitude test results.
460
What are the mean and standard deviation of deviation IQ scores (deviation intelligence quotient scores)?
Mean = 100; standard deviation = 15.
461
Formula for calculating a deviation IQ or standard score (standard score, SS)?
SS = 15(z) + 100 (z = z-score, standard score)
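The deviation IQ formula above follows the same pattern as the T score transformation, just with a mean of 100 and SD of 15:

```python
def deviation_iq(z):
    """Convert a z-score to a deviation IQ / standard score (mean 100, SD 15)."""
    return 15 * z + 100

print(deviation_iq(1))   # 115 -> one SD above the mean
print(deviation_iq(-2))  # 70  -> two SDs below the mean
```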
462
What does a deviation IQ score above 100 indicate?
The raw score is above the mean.
463
What does a deviation IQ score below 100 indicate?
The raw score is below the mean.
464
If a person has a z-score (z-score, standard score) of +1, what is the deviation IQ score?
SS = 15(1) + 100 = 115.
465
How many standard deviations above the mean is a deviation IQ of 115?
One standard deviation above the mean.
466
How are deviation IQ scores interpreted relative to z-scores and T scores?
In the same way—by how many standard deviations they fall above or below the mean.
467
One-sentence NCE summary of deviation IQ scores?
Deviation IQ scores are standardized scores with a mean of 100 and a standard deviation of 15 used primarily in intelligence and achievement testing.
468
What are developmental scores?
Scores that place an individual’s raw score along a developmental continuum to derive meaning.
469
How do developmental scores differ from standard scores?
Developmental scores describe location on a developmental continuum, whereas standard scores transform raw scores onto a scale with a fixed mean and standard deviation.
470
What do developmental scores compare?
An individual’s performance relative to others of the same age or grade level.
471
For which populations are developmental scores most commonly used?
Children and young adolescents.
472
What is an age-equivalent score?
A developmental score that compares an individual’s performance to the average performance of individuals of the same age.
473
How are age-equivalent scores reported?
In chronological years and months.
474
How should an age-equivalent score be interpreted?
As the age at which the average individual earns the same score.
475
Example: What does an age-equivalent score of 8 years 2 months mean for a 7-year-5-month-old child?
The child is performing at the average level of children aged 8 years 2 months.
476
Do age-equivalent scores indicate readiness for advanced placement?
No.
477
What is a grade-equivalent score?
A developmental score that compares an individual’s performance to the average performance of students at a given grade level.
478
How are grade-equivalent scores reported?
As a decimal representing grade level and months completed in that grade.
479
What does a grade-equivalent score of 5.6 mean?
Performance equivalent to the average student who has completed 6 months of fifth grade.
480
Example: A first-grader who has completed 2 months scores a grade equivalent of 1.2. What does this mean?
The student is performing at the mean for her grade level.
481
Are grade-equivalent scores useful for measuring growth over time?
Yes, they can show individual growth from year to year.
482
Do grade-equivalent scores indicate mastery of higher-grade material?
No.
483
Can grade-equivalent scores be used to justify grade skipping or retention?
No.
484
Why is it incorrect to move a student to a higher grade based solely on a high grade-equivalent score?
Because the student was not compared to students in the higher grade and grade-equivalent scores do not analyze specific skills.
485
What does a grade-equivalent score actually tell us?
Where an individual’s score falls relative to peers at the same grade level.
486
Example: What can we correctly conclude about a seventh-grader with a grade-equivalent score of 10.2 in math?
The student is performing higher than most seventh-grade peers in math.
487
What can we NOT conclude about that seventh-grader?
That the student is ready for tenth-grade math.
488
Key limitation of developmental scores on exams?
They are often misinterpreted as indicators of readiness or ability.
489
One-sentence NCE/CPCE summary of developmental scores?
Developmental scores describe where a person falls on an age or grade continuum but do not measure skill mastery or placement readiness.
490
What are survey batteries?
A collection of tests that measure across broad content areas rather than one subject.
491
Primary purpose of survey batteries?
To assess general academic progress.
492
Typical setting for survey batteries?
School settings.
493
Key limitation of survey batteries?
They do not assess any single subject in depth.
494
Stanford Achievement Test, Tenth Edition (SAT-10): purpose?
Measures academic knowledge across multiple subject areas.
495
Iowa Test of Basic Skills (ITBS)
A series of nationally standardized achievement tests for students in kindergarten through eighth grade (and formerly high school), measuring core subjects such as reading, math, science, and language arts against national norms.
496
Iowa Test of Educational Development (ITED) is designed for whom?
High school students.
497
Metropolitan Achievement Test, Eighth Edition (MAT-8): scope?
A broad series of standardized achievement tests spanning kindergarten through twelfth grade, measuring knowledge and skills in core subjects such as reading, math, and language arts, and giving educators and parents data to track progress, identify strengths and weaknesses, and inform instruction.
498
TerraNova, Third Edition: key feature?
Broad-based achievement testing with multiple versions, including Common Core-aligned and Spanish-language editions. Spanish supports (e.g., text-to-speech for math sections) help English Language Learners access content; it is a tool for Spanish speakers within the general assessment framework, not a Spanish language proficiency test.
499
What are diagnostic tests?
Tests designed to identify learning disabilities and specific academic skill deficits.
500
How do diagnostic tests differ from survey batteries?
Diagnostic tests provide in-depth analysis of specific strengths and weaknesses.
501
Wide Range Achievement Test, Fourth Edition (WRAT-4): purpose?
A brief, reliable assessment for ages 5 to 94 that measures foundational academic skills (word decoding, sentence comprehension, spelling, and math computation), used by educators and clinicians to screen for learning disabilities, track progress, and guide interventions; it offers a quick snapshot of basic academic functioning.
502
Key Math Diagnostic Test, Third Edition (KeyMath-3): focus?
Comprehensive assessment of math-related learning disabilities.
503
Woodcock–Johnson IV Tests of Achievement (WJ IV ACH): strength?
Detailed assessment of reading, writing, and math aligned with Individuals with Disabilities Education Act (IDEA) categories.
504
Peabody Individual Achievement Test–Revised (PIAT-R): main use?
Screening for learning disabilities in reading, math, and spelling. A standardized, individually administered test of academic skills (reading, math, spelling, and general knowledge) for students in grades K-12 (or up to age 22 with the Normative Update, PIAT-R/NU); its low-pressure, conversational, multiple-choice format identifies academic strengths and weaknesses to help educators and parents understand overall achievement and specific learning needs.
505
Test of Adult Basic Education (TABE): target population?
Adults aged 16 years and older seeking to improve basic skills.
506
What are readiness tests?
Criterion-referenced achievement tests indicating minimum skills needed to advance to the next level.
507
Common criticism of readiness tests?
Cultural and language bias affecting students from lower socioeconomic status and non-English-speaking homes.
508
Cognitive Abilities Test, Form 6 (CogAT): measures what?
Verbal, quantitative, and nonverbal reasoning abilities.
509
Otis–Lennon School Ability Test, Eighth Edition (OLSAT-8): focus?
Abstract thinking and reasoning abilities.
510
ACT Assessment: purpose?
Predicts readiness for college-level academic work.
511
Scholastic Assessment Test (SAT) Reasoning Test: assesses what?
Critical reading, mathematical reasoning, and writing skills.
512
Graduate Record Examination (GRE) Revised General Test: predicts what?
Graduate school success.
513
Miller Analogies Test (MAT): primary method?
Analogy-based assessment of analytical reasoning.
514
Law School Admission Test (LSAT): assesses which skills?
Reading comprehension, analytical reasoning, and logical reasoning.
515
Medical College Admission Test (MCAT): focus?
Scientific knowledge, problem solving, and critical thinking.
516
What is vocational aptitude testing?
Predictive testing measuring potential for occupational success.
517
Two categories of vocational aptitude tests?
Multiple aptitude tests and special aptitude tests.
518
Armed Services Vocational Aptitude Battery (ASVAB): key feature?
Measures multiple abilities for military and civilian job placement.
519
Differential Aptitude Test, Fifth Edition (DAT): intended population?
Students in grades seven through twelve.
520
Special aptitude tests measure what?
One specific, homogeneous area of aptitude.
521
Stanford–Binet Intelligence Scales, Fifth Edition (SB-5): age range?
Ages 2 through 90 years.
522
Stanford–Binet Intelligence Scales scoring system?
Mean of 100 and standard deviation of 15.
523
Wechsler scales: defining feature?
Most widely used intelligence tests with age-specific versions.
524
Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV): age range?
Ages 16 through 89 years.
525
Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V): age range?
Ages 6 through 16 years.
526
Wechsler Preschool and Primary Scale of Intelligence, Fourth Edition (WPPSI-IV): age range?
Ages 2 years 6 months through 7 years 3 months.
527
Kaufman Assessment Battery for Children, Second Edition (KABC-II): special population focus?
Minority children and children with learning disabilities.
528
Kaufman Assessment Battery for Children theoretical models used?
Luria neuropsychological model and Cattell–Horn–Carroll (CHC) theory.
529
One-sentence NCE/CPCE summary of Section 7.4?
Assessment batteries differ by purpose—survey for breadth, diagnostic for depth, readiness for minimum competency, aptitude for prediction, and intelligence for cognitive functioning.
530
No Child Left Behind Act (NCLB) testing
Test results were used to evaluate school progress and accountability.
531
Advanced Placement (AP) exams
Scores can determine college credit and placement.
532
high school exit exams
Passing is required to receive a diploma.
533
driver’s license tests
Passing determines legal permission to drive.
534
professional licensure and certification exams
Passing determines entry into a profession.
535
Key feature #1 of high-stakes testing
A single defined assessment determines the outcome.
536
Key feature #2 of high-stakes testing
A clear pass–fail cutoff score.
537
Key feature #3 of high-stakes testing
Test results have direct, real-world consequences.
538
NCE / CPCE one-sentence summary of high-stakes testing
High-stakes testing uses criterion-referenced standardized tests as the sole basis for major educational or professional decisions with significant consequences.
539
What is clinical assessment?
A “whole person” assessment that evaluates clients using multiple methods such as testing, observation, interviewing, and performance.
540
What is the primary goal of clinical assessment?
To increase client self-awareness and assist counselors with case conceptualization and treatment planning.
541
What domains are typically included in clinical assessment?
• Personality • Behavior • Affect • Cognition • Functioning • Risk (e.g., suicide)
542
Why is clinical assessment considered a “whole person” approach?
Because it integrates multiple data sources rather than relying on a single test.
543
What do personality tests assess?
The affective realm, including stable traits such as temperament and behavior patterns.
544
What aspects of personality are considered stable?
Traits and patterns that remain consistent through adulthood.
545
What are the two major categories of personality tests?
• Objective personality tests • Projective personality tests
546
What are objective personality tests?
Standardized self-report instruments using structured response formats such as multiple-choice or true/false.
547
What are the main purposes of objective personality tests?
• Identify personality traits, types, and states • Assess self-concept • Detect psychopathology • Assist with treatment planning
548
Key characteristic that distinguishes objective personality tests?
They have standardized administration, scoring, and interpretation.
549
What is the Minnesota Multiphasic Personality Inventory–2 (MMPI-2 — Minnesota Multiphasic Personality Inventory–Second Edition)?
A test used to identify adult psychopathology and assist in diagnosis.
550
Key features of the Minnesota Multiphasic Personality Inventory–2 (MMPI-2 — Minnesota Multiphasic Personality Inventory–Second Edition)?
• 567 true/false items • Adult population • 10 clinical scales • Multiple validity scales
551
What do the validity scales on the Minnesota Multiphasic Personality Inventory–2 (MMPI-2 — Minnesota Multiphasic Personality Inventory–Second Edition) measure?
Response distortion such as lying, defensiveness, exaggeration, or inconsistency.
552
What is the Millon Clinical Multiaxial Inventory–Fourth Edition (MCMI-IV — Millon Clinical Multiaxial Inventory–Fourth Edition)?
A test that assesses Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) personality disorders and clinical syndromes in adults.
553
What is the Myers-Briggs Type Indicator (MBTI — Myers-Briggs Type Indicator)?
A personality inventory based on Carl Jung’s psychological types, often used for self-awareness and career counseling.
554
What are the four dimensions measured by the Myers-Briggs Type Indicator (MBTI — Myers-Briggs Type Indicator)?
• Extraversion vs. Introversion • Sensing vs. Intuition • Thinking vs. Feeling • Judging vs. Perceiving
555
What is the California Psychological Inventory–Form 434 (CPI 434 — California Psychological Inventory)?
A measure of normal, nonpathological personality traits.
556
What population is best suited for the California Psychological Inventory–Form 434 (CPI 434 — California Psychological Inventory)?
Well-adjusted individuals; often used for vocational prediction.
557
What is the Sixteen Personality Factors Questionnaire (16PF — Sixteen Personality Factors Questionnaire)?
A test measuring 16 basic personality traits in normal populations, based on Raymond Cattell’s theory.
558
What is the NEO Personality Inventory–Third Edition (NEO PI-3 — NEO Personality Inventory–Third Edition)?
A measure of normal personality based on the Big Five personality traits.
559
What are the Big Five personality traits measured by the NEO Personality Inventory–Third Edition (NEO PI-3 — NEO Personality Inventory–Third Edition)?
• Neuroticism • Extraversion • Openness to experience • Agreeableness • Conscientiousness
560
What is the Coopersmith Self-Esteem Inventory (SEI — Self-Esteem Inventory)?
A test designed to measure self-esteem in children and adolescents.
561
What are projective personality tests?
Assessments that interpret responses to ambiguous stimuli to reveal unconscious thoughts and motivations.
562
What theoretical orientation underlies projective personality tests?
Psychoanalytic theory.
563
Why are projective tests considered indirect measures?
Clients are unaware of what is being assessed, reducing conscious response distortion.
564
What is the Rorschach Inkblot Test?
A projective test using 10 inkblot cards to assess personality and thought processes.
565
What are the three scoring components of the Rorschach Inkblot Test?
• Location • Determinants • Content
566
What is the Thematic Apperception Test (TAT — Thematic Apperception Test)?
A projective test where clients create stories about ambiguous pictures.
567
Major limitation of the Thematic Apperception Test (TAT — Thematic Apperception Test)?
Lack of a standardized scoring system.
568
What is the House-Tree-Person (HTP — House-Tree-Person) test?
A projective drawing technique used to interpret personality characteristics.
569
What are sentence completion tests?
Projective tests requiring clients to complete unfinished statements.
570
Do sentence completion tests have objective scoring systems?
No, interpretation is subjective.
571
One-sentence NCE / CPCE summary of projective personality tests?
Projective tests use ambiguous stimuli to uncover unconscious processes but lack standardization and reliability.