Test Development Flashcards

(95 cards)

1
Q

TEST DEVELOPMENT | TRUE OR FALSE?
All tests are created equal.

A

FALSE. Not all tests are created equal.

2
Q

TEST DEVELOPMENT | TRUE OR FALSE?
The creation of a good test is not a matter of chance; it is the product of the thoughtful and sound application of established principles of test development.

A

TRUE.

3
Q

IDENTIFICATION
In this context, this phrase is an umbrella term for all that goes into the process of creating a test.

A

Test Development

4
Q

FIVE STAGES OF TEST DEVELOPMENT | ENUMERATION
The process of developing a test occurs in five stages:

A

1) Test Conceptualization
2) Test Construction
3) Test Tryout
4) Item Analysis
5) Test Revision

5
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
The conception of an idea for a test.

A

Test Conceptualization

6
Q

FIVE STAGES OF TEST DEVELOPMENT | FILL IN THE BLANK
Once the idea for a test is conceived (test conceptualization), ________ ____________________ begins.

A

Test Construction

7
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
A stage in the process of test development that entails writing test items (or rewriting or revising existing items), as well as formatting items, setting scoring rules, and otherwise designing and building a test.

A

Test Construction

8
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
Once a preliminary form of the test has been developed, it is administered to a representative sample of testtakers under conditions that simulate those under which the final version of the test will be administered.

A

Test Tryout

9
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
The data will be collected and testtakers’ performance on the test as a whole and on each item will be analyzed.

A

Test Tryout

10
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
Statistical procedures are employed to assist in making judgments about which items are good as they are, which items need to be revised, and which items should be discarded.

A

Item Analysis

11
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
May include analyses of item reliability, item validity, and item discrimination.

A

Item Analysis

12
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
Depending on the type of test, difficulty level may be analyzed as well.

A

Item Analysis

13
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
Refers to action taken to modify a test’s content or format for the purpose of improving the test’s effectiveness as a tool of measurement.

A

Test Revision

14
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
This action is usually based on item analyses, as well as related information derived from the test tryout.

A

Test Revision

15
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
The revised version of the test will then be tried out on a new sample of testtakers. After the results are analyzed, the test will be further revised if necessary.

A

Test Revision

16
Q

TEST CONCEPTUALIZATION | FILL IN THE BLANK
The beginnings of any published test can probably be traced to thoughts—_____-______, in behavioral terms.

A

Self-talk

17
Q

TEST CONCEPTUALIZATION | TRUE OR FALSE?
The test developer says to themselves something like, “There ought to be a test designed to measure [item/s to be measured] in [such and such] way.” The stimulus for such a thought could be almost anything.

A

TRUE.

18
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
For these tests, items are developed to differentiate among testtakers, and the goal is to spread scores out and rank individuals relative to a group.

A

Norm-referenced Tests (NRTs)

19
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
The goal is to spread scores out and rank individuals relative to a group.

A

NRTs

20
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Items are developed to measure mastery of specific skills or knowledge.

A

Criterion-referenced tests (CRTs)

21
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Item difficulty is chosen so not everyone gets the same score—some easy, some hard, to maximize score variability.

A

NRTs

22
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Each item is directly tied to learning objectives.

A

CRTs

23
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Items must clearly represent the content domain and performance standards, with less concern about spreading scores.

A

CRTs

24
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Items focus on discrimination between high and low performers.

A

NRTs

25
**NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES** Items focus on alignment with criteria and ensuring that mastery is accurately assessed.
CRTs
26
**NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES** A college entrance exam (like the SAT): items are written with a range of difficulty so students can be ranked from highest to lowest. The goal is to see who performs better relative to others.
NRTs
27
**NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES** A driver's license exam — items are written directly from the rules of driving. The goal is to see whether the person meets the set standard (safe to drive), not how they compare with other test-takers.
CRTs
28
**IDENTIFICATION** Preliminary research for creating a prototype test; its purpose is to decide which test items should be kept, revised, or discarded.
Pilot Work
29
**IDENTIFICATION** In developing a structured interview to measure introversion/extraversion, for example, this process may involve open-ended interviews with research subjects believed for some reason (perhaps on the basis of an existing test) to be introverted or extraverted.
Pilot Research
30
**IDENTIFICATION** Additionally, interviews with parents, teachers, friends, and others who know the subject might also be arranged.
Pilot Work
31
**IDENTIFICATION** Another type might involve physiological monitoring of the subjects (such as monitoring of heart rate) as a function of exposure to different types of stimuli.
Pilot Study
32
**TEST CONSTRUCTION | IDENTIFICATION** The process by which a measuring device is designed and calibrated and by which numbers (or other indices)—scale values—are assigned to different amounts of the trait, attribute, or characteristic being measured.
Scaling
33
**TEST CONSTRUCTION | IDENTIFICATION** Historically, this prolific psychologist is credited with being at the forefront of efforts to develop methodologically sound scaling methods.
L. L. Thurstone
34
**TEST CONSTRUCTION | IDENTIFICATION** In common parlance, these are instruments used to measure something, such as weight. In psychometrics, they may also be conceived of as instruments used to measure. Here, however, that *something* being measured is likely to be a trait, a state, or an ability.
Scales
35
**TEST CONSTRUCTION | TYPES OF SCALES** Nominal, Ordinal, Interval, and Ratio
Scales of Measurement
36
**TEST CONSTRUCTION | TYPES OF SCALES** Used when performance as a function of age is of critical interest.
Age-based scale
37
**TEST CONSTRUCTION | TYPES OF SCALES** Used when performance as a function of grade is of critical interest.
Grade-based scale
38
**TEST CONSTRUCTION | TYPES OF SCALES** Scores that can range from 1 to 9
Stanine scale
39
**TEST CONSTRUCTION | TYPES OF SCALES** Unidimensional vs. _______________
Multidimensional
40
**TEST CONSTRUCTION | TYPES OF SCALES** ________________ vs. Multidimensional
Unidimensional
41
**TEST CONSTRUCTION | TYPES OF SCALES** Comparative vs. _________________
Categorical
42
**TEST CONSTRUCTION | TYPES OF SCALES** _______________ vs. Categorical
Comparative
43
**TEST CONSTRUCTION | SCALING METHODS** Can be defined as a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker.
Rating Scale
44
**TEST CONSTRUCTION | SCALING METHODS** Indicate level, frequency, or quality.
Rating Scale
45
**TEST CONSTRUCTION | SCALING METHODS** Can be used to record judgments of oneself, others, experiences, or objects, and they can take several forms.
Rating Scale
46
**TEST CONSTRUCTION | SCALING METHODS** Because the final test score is obtained by summing the ratings across all the items, it is termed a _______________ _________.
Summative Scale
47
**TEST CONSTRUCTION | SCALING METHODS** One type of summative rating scale that is used extensively in psychology, usually to scale attitudes.
Likert Scale
48
**TEST CONSTRUCTION | SCALING METHODS** Relatively easy to construct. Each item presents the testtaker with five alternative responses (sometimes seven), usually on an agree-disagree or approve-disapprove continuum.
Likert Scale
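The summative scoring that the cards above describe can be sketched in a few lines of Python. This is a minimal illustration, not from the deck; the ratings, the 5-point scale, and the reverse-keyed item are hypothetical assumptions.

```python
# Minimal sketch of summative (Likert-style) scoring: the final score
# is the sum of ratings across all items, with negatively worded items
# reverse-scored first. All data below are hypothetical examples.

def likert_score(responses, reverse_keyed=(), n_points=5):
    """Sum ratings across items, reverse-scoring negatively worded ones."""
    total = 0
    for i, r in enumerate(responses):
        if i in reverse_keyed:
            r = (n_points + 1) - r   # on a 5-point scale, 1 <-> 5, 2 <-> 4
        total += r
    return total

# A respondent's ratings on four items; item 2 is negatively worded.
ratings = [4, 5, 2, 3]
print(likert_score(ratings, reverse_keyed={2}))  # 4 + 5 + (6-2) + 3 = 16
```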
49
**TEST CONSTRUCTION | SCALING METHODS** Another scaling method that yields ordinal-level measures.
Guttman Scale
50
**TEST CONSTRUCTION | SCALING METHODS** Items on it range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.
Guttman Scale
51
**TEST CONSTRUCTION | SCALING METHODS** A feature is that all respondents who agree with the stronger statements of the attitude will also agree with milder statements.
Guttman Scale
52
**TEST CONSTRUCTION | SCALING METHODS** Are developed through the administration of a number of items to a target group.
Guttman Scale
53
**TEST CONSTRUCTION | SCALING METHODS** The resulting data of the Guttman scale are then analyzed by means of _________________ ____________, an item-analysis procedure and approach to test development that involves a graphic mapping of a testtaker's responses.
Scalogram Analysis
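The cumulative property the Guttman-scale cards describe can be checked programmatically: with items ordered from mildest to strongest, a response pattern fits the scale only when every endorsed item is preceded by endorsements of all milder items. A sketch (the 0/1 response patterns are hypothetical):

```python
# Core check behind a scalogram analysis: on a Guttman scale ordered
# from mildest to strongest item, a scalable response pattern is a run
# of endorsements (1s) followed by a run of rejections (0s).

def is_guttman_pattern(responses):
    """True if no endorsed (1) item follows a rejected (0) item."""
    seen_zero = False
    for r in responses:
        if r == 0:
            seen_zero = True
        elif seen_zero:      # a 1 after a 0 breaks the cumulative order
            return False
    return True

print(is_guttman_pattern([1, 1, 1, 0]))  # True: endorses milder items only
print(is_guttman_pattern([1, 0, 1, 0]))  # False: endorses a stronger item
                                         # while rejecting a milder one
```

A full scalogram analysis applies this check across all testtakers' mapped response patterns to judge how well the items form a cumulative scale.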
54
**TEST CONSTRUCTION | WRITING ITEMS** When devising a standardized test using a multiple-choice format, it is usually advisable that the first draft contain approximately ________ the number of items that the final version of the test will contain.
Twice
55
**TEST CONSTRUCTION | WRITING ITEMS** The reservoir or well from which items will or will not be drawn for the final version of the test.
Item Pool
56
**TEST CONSTRUCTION | ITEM FORMAT** Require testtakers to select a response from a set of alternative responses.
Selected-Response Format
57
**TEST CONSTRUCTION | ITEM FORMAT** Its elements are a stem, a correct alternative or option, and several incorrect alternatives or options (distractors or foils).
Multiple Choice
58
**TEST CONSTRUCTION | ITEM FORMAT** The testtaker is presented with two columns: *premises* on the left and *responses* on the right.
Matching
59
**TEST CONSTRUCTION | ITEM FORMAT** True or False.
Selected-Response Format
60
**TEST CONSTRUCTION | ITEM FORMAT** Require testtakers to supply or create the correct answer, not merely to select it (e.g., completion item, short answer, essay).
Constructed-Response Format
61
**TEST TRYOUT | TRUE OR FALSE?** The test should be tried out on people who are similar in critical respects to the people for whom the test was designed.
**TRUE.**
62
**TEST TRYOUT | TRUE OR FALSE?** Equally important are questions about the number of people on whom the test should be tried out. An informal rule of thumb is that there should be no fewer than 5 subjects and preferably as many as 10 for each item on the test. In general, the *fewer* subjects in the tryout, the better.
**FALSE.** The *more* subjects in the tryout, the better.
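The rule of thumb above implies a quick sample-size calculation; the 50-item draft used here is a hypothetical example, not from the deck:

```python
# Tryout rule of thumb: no fewer than 5, and preferably as many as 10,
# subjects per test item. The item count is a hypothetical example.
n_items = 50
minimum = 5 * n_items     # 250 subjects
preferred = 10 * n_items  # 500 subjects
print(f"50-item draft: try out on {minimum}-{preferred} subjects")
```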
63
**TEST TRYOUT | FILL IN THE BLANK** A definite risk in using too few subjects during test tryout comes during factor analysis of the findings, when what we might call ______________ ___________—that actually are just artifacts of the small sample size—may emerge.
Phantom Factors
64
**ITEM ANALYSIS | IDENTIFICATION** Shows how easy or hard an item is, based on the proportion of students who answered correctly.
An Index of the Item's Difficulty
65
**ITEM ANALYSIS | IDENTIFICATION** Indicates how consistently an item measures, contributing to the overall test reliability.
An Index of the Item's Reliability
66
**ITEM ANALYSIS | IDENTIFICATION** Reflects how well an item measures what it is supposed to measure.
An Index of the Item's Validity
67
**ITEM ANALYSIS | IDENTIFICATION** Shows how well an item distinguishes between high-performing and low-performing students.
An Index of Item Discrimination
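The difficulty and discrimination indices above have simple classical forms; this is a minimal Python sketch on hypothetical 0/1 (incorrect/correct) item scores. The data, the top/bottom split of size 2, and the function names are illustrative assumptions.

```python
# Sketch of two classical item-analysis indices on dichotomous (0/1)
# item scores. Data are hypothetical; rows = testtakers, columns = items.
from statistics import mean

scores = [  # 6 testtakers x 3 items
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

def difficulty(item):
    """Item-difficulty index p: proportion of testtakers answering correctly."""
    return mean(row[item] for row in scores)

def discrimination(item, k=2):
    """Item-discrimination index d: proportion correct among the top k
    total-scorers minus the proportion correct among the bottom k."""
    ranked = sorted(scores, key=sum, reverse=True)
    upper = mean(row[item] for row in ranked[:k])
    lower = mean(row[item] for row in ranked[-k:])
    return upper - lower

print(difficulty(0))     # 4/6: a moderately easy item
print(discrimination(0)) # top scorers pass it, bottom scorers fail it
```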
68
**TEST REVISION | TRUE OR FALSE?** Happens before a newly developed test has gone through pilot work and initial administrations.
**FALSE.** Test revision of a new test occurs *after* pilot work and initial administrations.
69
**TEST REVISION | TRUE OR FALSE?** Focus is on improving items, instructions, scoring, and format based on item analysis and feedback. The goal is to produce the finalized first edition of the test.
**TRUE.**
70
**TEST REVISION | IDENTIFICATION** Involves updating and modifying a test that is already published and in use.
Test Revision of an Existing Test
71
**TEST REVISION | IDENTIFICATION** May include adding/removing items, updating norms, revising outdated content, or improving reliability/validity. The goal is to release an improved new edition that reflects current research, language, and test-taker needs.
Test Revision of an Existing Test