Test Development Flashcards

(95 cards)

1
Q

TEST DEVELOPMENT | TRUE OR FALSE?
All tests are created equal.

A

FALSE. Not all tests are created equal.

2
Q

TEST DEVELOPMENT | TRUE OR FALSE?
The creation of a good test is not a matter of chance; it is the product of the thoughtful and sound application of established principles of test development.

A

TRUE.

3
Q

IDENTIFICATION
In this context, this phrase is an umbrella term for all that goes into the process of creating a test.

A

Test Development

4
Q

FIVE STAGES OF TEST DEVELOPMENT | ENUMERATION
The process of developing a test occurs in five stages:

A

1) Test Conceptualization
2) Test Construction
3) Test Tryout
4) Item Analysis
5) Test Revision

5
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
The conception of an idea for a test.

A

Test Conceptualization

6
Q

FIVE STAGES OF TEST DEVELOPMENT | FILL IN THE BLANK
Once the idea for a test is conceived (test conceptualization), ________ ____________________ begins.

A

Test Construction

7
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
A stage in the process of test development that entails writing test items (or rewriting or revising existing items), as well as formatting items, setting scoring rules, and otherwise designing and building a test.

A

Test Construction

8
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
Once a preliminary form of the test has been developed, it is administered to a representative sample of testtakers under conditions that simulate those under which the final version of the test will be administered.

A

Test Tryout

9
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
The data will be collected and testtakers’ performance on the test as a whole and on each item will be analyzed.

A

Test Tryout

10
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
Statistical procedures are employed to assist in making judgments about which items are good as they are, which items need to be revised, and which items should be discarded.

A

Item Analysis

11
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
May include analyses of item reliability, item validity, and item discrimination.

A

Item Analysis

12
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
Depending on the type of test, difficulty level may be analyzed as well.

A

Item Analysis

13
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
Refers to action taken to modify a test’s content or format for the purpose of improving the test’s effectiveness as a tool of measurement.

A

Test Revision

14
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
This action is usually based on item analyses, as well as related information derived from the test tryout.

A

Test Revision

15
Q

FIVE STAGES OF TEST DEVELOPMENT | IDENTIFICATION
The revised version of the test will then be tried out on a new sample of testtakers. After the results are analyzed, the test will be further revised if necessary.

A

Test Revision

16
Q

TEST CONCEPTUALIZATION | FILL IN THE BLANK
The beginnings of any published test can probably be traced to thoughts—_____-______, in behavioral terms.

A

Self-talk

17
Q

TEST CONCEPTUALIZATION | TRUE OR FALSE?
The test developer says to themselves something like, “There ought to be a test designed to measure [item/s to be measured] in [such and such] way.” The stimulus for such a thought could be almost anything.

A

TRUE.

18
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
For these tests, items are developed to differentiate among testtakers, and the goal is to spread scores out and rank individuals relative to a group.

A

Norm-referenced Tests (NRTs)

19
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
The goal is to spread scores out and rank individuals relative to a group.

A

NRTs

20
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Items are developed to measure mastery of specific skills or knowledge.

A

Criterion-referenced tests (CRTs)

21
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Item difficulty is chosen so not everyone gets the same score—some easy, some hard, to maximize score variability.

A

NRTs

22
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Each item is directly tied to learning objectives.

A

CRTs

23
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Items must clearly represent the content domain and performance standards, with less concern about spreading scores.

A

CRTs

24
Q

NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES
Items focus on discrimination between high and low performers.

A

NRTs

25
**NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES** Items focus on alignment with criteria and ensuring that mastery is accurately assessed.
CRTs
26
**NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES** A college entrance exam (like the SAT): items are written with a range of difficulty so students can be ranked from highest to lowest. The goal is to see who performs better relative to others.
NRTs
27
**NORM-REFERENCED VS. CRITERION-REFERENCED TESTS: ITEM DEVELOPMENT ISSUES** A driver's license exam — items are written directly from the rules of driving. The goal is to see whether the person meets the set standard (safe to drive), not how they compare with other test-takers.
CRTs
28
**IDENTIFICATION** Preliminary research for creating a prototype test; its purpose is to decide which test items should be kept, revised, or discarded.
Pilot Work
29
**IDENTIFICATION** In developing a structured interview to measure introversion/extraversion, for example, this process may involve open-ended interviews with research subjects believed for some reason (perhaps on the basis of an existing test) to be introverted or extraverted.
Pilot Research
30
**IDENTIFICATION** Additionally, interviews with parents, teachers, friends, and others who know the subject might also be arranged.
Pilot Work
31
**IDENTIFICATION** Another type might involve physiological monitoring of the subjects (such as monitoring of heart rate) as a function of exposure to different types of stimuli.
Pilot Study
32
**TEST CONSTRUCTION | IDENTIFICATION** The process by which a measuring device is designed and calibrated and by which numbers (or other indices)—scale values—are assigned to different amounts of the trait, attribute, or characteristic being measured.
Scaling
33
**TEST CONSTRUCTION | IDENTIFICATION** Historically, this prolific psychologist is credited with being at the forefront of efforts to develop methodologically sound scaling methods.
L. L. Thurstone
34
**TEST CONSTRUCTION | IDENTIFICATION** In common parlance, these are instruments used to measure something, such as weight. In psychometrics, they may also be conceived of as instruments used to measure. Here, however, that *something* being measured is likely to be a trait, a state, or an ability.
Scales
35
**TEST CONSTRUCTION | TYPES OF SCALES** Nominal, Ordinal, Interval, and Ratio
Scales of Measurement
36
**TEST CONSTRUCTION | TYPES OF SCALES** Used when performance as a function of age is of critical interest.
Age-based scale
37
**TEST CONSTRUCTION | TYPES OF SCALES** Used when performance as a function of grade is of critical interest.
Grade-based scale
38
**TEST CONSTRUCTION | TYPES OF SCALES** Scores that can range from 1 to 9
Stanine scale
39
**TEST CONSTRUCTION | TYPES OF SCALES** Unidimensional vs. _______________
Multidimensional
40
**TEST CONSTRUCTION | TYPES OF SCALES** ________________ vs. Multidimensional
Unidimensional
41
**TEST CONSTRUCTION | TYPES OF SCALES** Comparative vs. _________________
Categorical
42
**TEST CONSTRUCTION | TYPES OF SCALES** _______________ vs. Categorical
Comparative
43
**TEST CONSTRUCTION | SCALING METHODS** Can be defined as a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker.
Rating Scale
44
**TEST CONSTRUCTION | SCALING METHODS** Indicate level, frequency, or quality.
Rating Scale
45
**TEST CONSTRUCTION | SCALING METHODS** Can be used to record judgments of oneself, others, experiences, or objects, and they can take several forms.
Rating Scale
46
**TEST CONSTRUCTION | SCALING METHODS** Because the final test score is obtained by summing the ratings across all the items, it is termed a _______________ _________.
Summative Scale
47
**TEST CONSTRUCTION | SCALING METHODS** One type of summative rating scale that is used extensively in psychology, usually to scale attitudes.
Likert Scale
48
**TEST CONSTRUCTION | SCALING METHODS** Relatively easy to construct. Each item presents the testtaker with five alternative responses (sometimes seven), usually on an agree-disagree or approve-disapprove continuum.
Likert Scale
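The summative scoring that the cards above describe can be sketched in a few lines of Python. This is a minimal illustration, not from the deck; the ratings, the 5-point scale, and the reverse-keyed item are hypothetical assumptions.

```python
# Minimal sketch of summative (Likert-style) scoring: the final score
# is the sum of ratings across all items, with negatively worded items
# reverse-scored first. All data below are hypothetical examples.

def likert_score(responses, reverse_keyed=(), n_points=5):
    """Sum ratings across items, reverse-scoring negatively worded ones."""
    total = 0
    for i, r in enumerate(responses):
        if i in reverse_keyed:
            r = (n_points + 1) - r   # on a 5-point scale, 1 <-> 5, 2 <-> 4
        total += r
    return total

# A respondent's ratings on four items; item 2 is negatively worded.
ratings = [4, 5, 2, 3]
print(likert_score(ratings, reverse_keyed={2}))  # 4 + 5 + (6-2) + 3 = 16
```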
49
**TEST CONSTRUCTION | SCALING METHODS** Another scaling method that yields ordinal-level measures.
Guttman Scale
50
**TEST CONSTRUCTION | SCALING METHODS** Items on it range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.
Guttman Scale
51
**TEST CONSTRUCTION | SCALING METHODS** A feature is that all respondents who agree with the stronger statements of the attitude will also agree with milder statements.
Guttman Scale
52
**TEST CONSTRUCTION | SCALING METHODS** Are developed through the administration of a number of items to a target group.
Guttman Scale
53
**TEST CONSTRUCTION | SCALING METHODS** The resulting data of the Guttman scale are then analyzed by means of _________________ ____________, an item-analysis procedure and approach to test development that involves a graphic mapping of a testtaker's responses.
Scalogram Analysis
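The cumulative property the Guttman-scale cards describe can be checked programmatically: with items ordered from mildest to strongest, a response pattern fits the scale only when every endorsed item is preceded by endorsements of all milder items. A sketch (the 0/1 response patterns are hypothetical):

```python
# Core check behind a scalogram analysis: on a Guttman scale ordered
# from mildest to strongest item, a scalable response pattern is a run
# of endorsements (1s) followed by a run of rejections (0s).

def is_guttman_pattern(responses):
    """True if no endorsed (1) item follows a rejected (0) item."""
    seen_zero = False
    for r in responses:
        if r == 0:
            seen_zero = True
        elif seen_zero:      # a 1 after a 0 breaks the cumulative order
            return False
    return True

print(is_guttman_pattern([1, 1, 1, 0]))  # True: endorses milder items only
print(is_guttman_pattern([1, 0, 1, 0]))  # False: endorses a stronger item
                                         # while rejecting a milder one
```

A full scalogram analysis applies this check across all testtakers' mapped response patterns to judge how well the items form a cumulative scale.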
54
**TEST CONSTRUCTION | WRITING ITEMS** When devising a standardized test using a multiple-choice format, it is usually advisable that the first draft contain approximately ________ the number of items that the final version of the test will contain.
Twice
55
**TEST CONSTRUCTION | WRITING ITEMS** The reservoir or well from which items will or will not be drawn for the final version of the test.
Item Pool
56
**TEST CONSTRUCTION | ITEM FORMAT** Require testtakers to select a response from a set of alternative responses.
Selected-Response Format
57
**TEST CONSTRUCTION | ITEM FORMAT** Its elements are a stem, a correct alternative or option, and several incorrect alternatives or options (distractors or foils).
Multiple Choice
58
**TEST CONSTRUCTION | ITEM FORMAT** The testtaker is presented with two columns: *premises* on the left and *responses* on the right.
Matching
59
**TEST CONSTRUCTION | ITEM FORMAT** True or False.
Selected-Response Format
60
**TEST CONSTRUCTION | ITEM FORMAT** Require testtakers to supply or create the correct answer, not merely to select it (e.g., completion item, short answer, essay).
Constructed-Response Format
61
**TEST TRYOUT | TRUE OR FALSE?** The test should be tried out on people who are similar in critical respects to the people for whom the test was designed.
**TRUE.**
62
**TEST TRYOUT | TRUE OR FALSE?** Equally important are questions about the number of people on whom the test should be tried out. An informal rule of thumb is that there should be no fewer than 5 subjects and preferably as many as 10 for each item on the test. In general, the *fewer* subjects in the tryout, the better.
**FALSE.** The *more* subjects in the tryout, the better.
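The rule of thumb above implies a quick sample-size calculation; the 50-item draft used here is a hypothetical example, not from the deck:

```python
# Tryout rule of thumb: no fewer than 5, and preferably as many as 10,
# subjects per test item. The item count is a hypothetical example.
n_items = 50
minimum = 5 * n_items     # 250 subjects
preferred = 10 * n_items  # 500 subjects
print(f"50-item draft: try out on {minimum}-{preferred} subjects")
```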
63
**TEST TRYOUT | FILL IN THE BLANK** A definite risk in using too few subjects during test tryout comes during factor analysis of the findings, when what we might call ______________ ___________—that actually are just artifacts of the small sample size—may emerge.
Phantom Factors
64
**ITEM ANALYSIS | IDENTIFICATION** Shows how easy or hard an item is, based on the proportion of students who answered correctly.
An Index of the Item's Difficulty
65
**ITEM ANALYSIS | IDENTIFICATION** Indicates how consistently an item measures, contributing to the overall test reliability.
An Index of the Item's Reliability
66
**ITEM ANALYSIS | IDENTIFICATION** Reflects how well an item measures what it is supposed to measure.
An Index of the Item's Validity
67
**ITEM ANALYSIS | IDENTIFICATION** Shows how well an item distinguishes between high-performing and low-performing students.
An Index of Item Discrimination
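The difficulty and discrimination indices above have simple classical forms; this is a minimal Python sketch on hypothetical 0/1 (incorrect/correct) item scores. The data, the top/bottom split of size 2, and the function names are illustrative assumptions.

```python
# Sketch of two classical item-analysis indices on dichotomous (0/1)
# item scores. Data are hypothetical; rows = testtakers, columns = items.
from statistics import mean

scores = [  # 6 testtakers x 3 items
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

def difficulty(item):
    """Item-difficulty index p: proportion of testtakers answering correctly."""
    return mean(row[item] for row in scores)

def discrimination(item, k=2):
    """Item-discrimination index d: proportion correct among the top k
    total-scorers minus the proportion correct among the bottom k."""
    ranked = sorted(scores, key=sum, reverse=True)
    upper = mean(row[item] for row in ranked[:k])
    lower = mean(row[item] for row in ranked[-k:])
    return upper - lower

print(difficulty(0))     # 4/6: a moderately easy item
print(discrimination(0)) # top scorers pass it, bottom scorers fail it
```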
68
**TEST REVISION | TRUE OR FALSE?** Happens before a newly developed test has gone through pilot work and initial administrations.
**FALSE.** Test revision of a new test occurs *after* pilot work and initial administrations.
69
**TEST REVISION | TRUE OR FALSE?** Focus is on improving items, instructions, scoring, and format based on item analysis and feedback. The goal is to produce the finalized first edition of the test.
**TRUE.**
70
**TEST REVISION | IDENTIFICATION** Involves updating and modifying a test that is already published and in use.
Test Revision of an Existing Test
71
**TEST REVISION | IDENTIFICATION** May include adding/removing items, updating norms, revising outdated content, or improving reliability/validity. The goal is to release an improved new edition that reflects current research, language, and test-taker needs.
Test Revision of an Existing Test