Chapter 8 - Test Development Flashcards by Amanda Johnson

What are the five stages of test development?

Conceptualization, Construction, Tryout, Analysis, Revision

What are the relations among the five stages of making a test?

Sequential process (each stage builds on the one before), but not necessarily one-and-done: revision often requires going back to earlier stages

What types of questions need to be considered during the conceptualization phase?

What is motivating us to create something new/revise something old (what’s the purpose)?
What will the test cover, and who will use it?
How will the test be administered and used?

What processes are part of test construction?

Pilot work, determining scaling methods, writing items, deciding how the items will be scored

What is pilot work?

Creating a prototype and receiving feedback on it using expert panels or focus groups

What are the different rating scales?

Likert, Guttman, Comparative, Categorical

What is comparative scaling vs categorical scaling?

Comparative = judgment of a stimulus in comparison with another
Categorical = physical stimuli are placed into two or more alternative (pre-determined) categories

What is a Guttman scale?

Items range sequentially from weaker to stronger expressions of an attitude or belief
So, if an individual responds “yes” to a later question, they should’ve also responded “yes” to previous ones
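
The "yes to a later item implies yes to the earlier ones" rule can be checked mechanically. A minimal sketch, assuming yes/no responses stored as booleans and ordered from the weakest item to the strongest; the function name `is_guttman_consistent` is made up for illustration:

```python
def is_guttman_consistent(responses):
    """Return True if a yes/no pattern fits a perfect Guttman scale.

    `responses` is a list of booleans ordered from the weakest item to the
    strongest. In a perfect pattern, no "yes" appears after a "no".
    """
    seen_no = False
    for answer in responses:
        if not answer:
            seen_no = True
        elif seen_no:
            # A "yes" to a stronger item after a "no" to a weaker one.
            return False
    return True


# Agrees with the two mildest statements only: consistent.
print(is_guttman_consistent([True, True, False, False]))
# Rejects item 2 but endorses the stronger item 3: inconsistent.
print(is_guttman_consistent([True, False, True]))
```

Real response data rarely fit the pattern perfectly, so a check like this flags deviations rather than proving a scale is Guttman-type.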

What is an item pool?

A set of items from which the final version of the test will be derived

What is the difference between selected-response format and constructed-response format?

Selected = items require the test taker to select a response from a set of pre-determined ones (ex: multiple choice; true/false)

Constructed = items require test takers to supply or create the “correct” answer (ex: short response questions or essays)

What is computerized adaptive testing?

Interactive, computer-administered test in which the items presented to the test taker are influenced by the test taker’s responses to previous questions
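
To make "influenced by the test taker's responses to previous questions" concrete, here is a toy sketch. It is not how operational adaptive tests work (those typically rely on IRT-based ability estimation); it assumes each item has a known difficulty value and simply nudges a running estimate up after a correct answer and down after an incorrect one. The names `run_adaptive_test` and `answer_fn` are hypothetical:

```python
def run_adaptive_test(item_difficulties, answer_fn, max_items=5):
    """Toy adaptive test: returns (administered item indices, final estimate).

    `answer_fn(i)` is a stand-in for the test taker; it returns True when
    item i is answered correctly.
    """
    estimate = 0.0   # assumed starting point in the middle of the scale
    step = 1.0       # how far the estimate moves after each answer
    remaining = set(range(len(item_difficulties)))
    administered = []
    while remaining and len(administered) < max_items:
        # Next item: the unused one whose difficulty is closest to the estimate.
        nxt = min(remaining, key=lambda i: (abs(item_difficulties[i] - estimate), i))
        remaining.remove(nxt)
        administered.append(nxt)
        if answer_fn(nxt):
            estimate += step
        else:
            estimate -= step
        step /= 2    # smaller adjustments as more items are answered
    return administered, estimate


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Hypothetical test taker who gets every item of difficulty <= 1 right.
items, estimate = run_adaptive_test(difficulties, lambda i: difficulties[i] <= 1.0, max_items=3)
print(items, estimate)
```

In this run the easy items at the low end are never presented, which is the efficiency the format is known for.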

What are the benefits of computerized adaptive testing?

Efficiency in testing time and number of items presented
Reduces floor and ceiling effects (not having enough items to test either the low or high end)

What is a floor effect? A ceiling effect?

Floor = diminished ability to distinguish test takers at the low end of the scale
Ceiling = diminished ability to distinguish test takers at the high end of the scale

What are the different methods for scoring items? How might our motivations for test conceptualization impact the type we use?

Cumulative scoring, class scoring, ipsative scoring

The purpose behind creating a test can serve as a guide for the kind of scoring you wish to use. For example, if we’re trying to diagnose somebody, we may use class scoring.

What is cumulative scoring?

Based on the assumption that the higher the score on a test, the higher the test taker is on that ability or trait

What is class scoring?

Responses earn credit toward placement in a particular category with other test takers who have a similar pattern of responses
Ex: diagnostic criteria

What is Ipsative scoring?

Comparing a test taker’s score on one scale within a test to another scale within that same test (idiographic interpretation)

What qualities separate a good item from a bad item?

Reliable and valid
Discriminates among test takers

What are some guidelines for test tryout?

Should be tried out on people similar to the population the test was designed for
Rule of thumb: 5 to 10 respondents per item
Should be administered under the same conditions, with the same instructions, as the final product

What is the purpose of the different types of item-analysis indices (difficulty, reliability, validity, discrimination)?

To determine which items are best for which purpose (how reliable and valid are they? how well do they discriminate test takers?)

What is the difference between the item-difficulty index and the item-endorsement index?

Difficulty = proportion of respondents answering an item correctly
Endorsement = percentage of agreement (as opposed to percentage “correct”, b/c there may not be a “correct” response)
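
Both indices come from the same arithmetic. A minimal sketch, assuming responses are coded 1 (correct, or agree) and 0 (incorrect, or disagree); the function name `item_difficulty` is made up for illustration:

```python
def item_difficulty(item_scores):
    """Proportion of respondents scored 1 on the item.

    With 1 = correct this is the item-difficulty index; with 1 = agree it is
    the item-endorsement index expressed as a proportion.
    """
    if not item_scores:
        raise ValueError("need at least one response")
    return sum(item_scores) / len(item_scores)


# Five respondents, three answered correctly.
print(item_difficulty([1, 1, 0, 1, 0]))
```

Note the direction: a higher value means more people got the item right, so the item is easier.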

What is the d-value on the item-discrimination index?

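The difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly

This value ranges from -1 to +1; a negative d is a red flag, because it means low scorers answered the item correctly more often than high scorers
We generally want it to be higher, but there’s no particular threshold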
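
A minimal sketch of the calculation, computing d as the difference between the two proportions. It assumes the high and low groups are the top and bottom 27% of test takers by total score (a common convention, not stated on this card) and that item responses are coded 1 = correct, 0 = incorrect. The function name `discrimination_index` is made up for illustration:

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """d = (U - L) / n for one item.

    U and L are the numbers of upper-group and lower-group test takers who
    answered the item correctly; n is the size of each group.
    """
    if not total_scores or len(total_scores) != len(item_correct):
        raise ValueError("need matching, non-empty score lists")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the groups do not overlap")
    n = max(1, round(len(total_scores) * fraction))
    # Indices of test takers ordered from lowest to highest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    u = sum(item_correct[i] for i in upper)
    l = sum(item_correct[i] for i in lower)
    return (u - l) / n


totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
# Top 3 scorers: 3 correct; bottom 3 scorers: 1 correct; d = (3 - 1) / 3.
print(round(discrimination_index(totals, item), 3))
```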

What are the a and b parameters on an item characteristic curve?

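a parameter = the relatedness of the item to the latent construct (the slope of the curve at its steepest point); a well-functioning item has a positive value, and a negative value signals an item that runs opposite to the construct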

b parameter = the point of the latent construct where the probability of endorsing the item equals 0.50
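
One way to see how the two parameters shape the curve is to write it out. A minimal sketch, assuming a two-parameter logistic (2PL) model; the card does not say which IRT model is meant, and the function name `icc_2pl` is made up for illustration:

```python
import math


def icc_2pl(theta, a, b):
    """Probability of endorsing an item at trait level `theta` (2PL model).

    a: discrimination (slope); b: location, the trait level where the
    probability equals 0.50.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# At theta == b the probability is 0.50, whatever the slope.
print(icc_2pl(1.0, a=1.5, b=1.0))
# A larger a makes the probability change faster around b.
print(icc_2pl(2.0, a=0.5, b=1.0), icc_2pl(2.0, a=2.0, b=1.0))
```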

How do qualitative methods differ from more quantitative methods?

Data generation and analysis rely primarily on verbal rather than mathematical procedures

What are some examples of qualitative methods of item analysis?

Think-aloud test administration (verbalize thoughts)
Expert panels
Sensitivity review (examines fairness by checking for offensive language or stereotypes)

When/why would we want to revise a test?

- outdated or offensive words
- norms no longer represent the population
- psychometric properties need improvement
- underlying theory has changed

What is Cross-Validation in test revision?

The revalidation of a test on a new sample of test takers, other than those on whom performance was originally found to be valid
Can have the same or new inclusion criteria

What is Co-Validation in test revision? What are the benefits of it?

Two or more tests are validated using the same sample of test takers
- economical for test developers
- minimizes sampling error

What are the 3 applications of IRT in test building/revising?

- Evaluating existing tests for the purpose of mapping test revisions
- Determining measurement equivalence across test-taker populations
- Developing item banks

(29 cards)