Chapter 8 - Test Development Flashcards

(29 cards)

1
Q

What are the five stages of test development?

A

Conceptualization, Construction, Tryout, Analysis, Revision

2
Q

What are the relations among the five stages of making a test?

A

A sequential process (the stages build on each other), but not necessarily one-and-done: the revision stage often requires going back to earlier stages

3
Q

What types of questions need to be considered during the conceptualization phase?

A

What is motivating us to create something new/revise something old (what’s the purpose)?
What will the test cover, and who will use it?
How will the test be administered and used?

4
Q

What processes are part of test construction?

A

Pilot work, determining scaling methods, writing items, deciding how the items will be scored

5
Q

What is pilot work?

A

Creating a prototype of the test and gathering feedback on it from expert panels or focus groups

6
Q

What are the different rating scales?

A

Likert, Guttman, Comparative, Categorical

7
Q

What is comparative scaling vs categorical scaling?

A

Comparative = judgment of a stimulus in comparison with another
Categorical = physical stimuli are placed into two or more alternative (pre-determined) categories

8
Q

What is a Guttman scale?

A

Items range sequentially from weaker to stronger expressions of an attitude or belief
So, if an individual responds “yes” to a later (stronger) item, they should also have responded “yes” to all earlier (weaker) ones
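
This pattern is easy to check mechanically. A minimal Python sketch (the function name and example data are hypothetical), assuming items are ordered from weakest to strongest:

```python
def is_guttman_consistent(responses):
    """Check a 0/1 response pattern against the Guttman property.

    `responses` lists answers to items ordered from the weakest to the
    strongest expression of the attitude. A "yes" (1) to a stronger item
    requires a "yes" to every weaker one, so consistent patterns look
    like 1, 1, ..., 1, 0, 0, ..., 0.
    """
    seen_no = False
    for r in responses:
        if r == 1 and seen_no:
            return False   # endorsed a stronger item after refusing a weaker one
        if r == 0:
            seen_no = True
    return True

print(is_guttman_consistent([1, 1, 1, 0, 0]))  # True
print(is_guttman_consistent([1, 0, 1, 0, 0]))  # False: skipped a weaker item
```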

9
Q

What is an item pool?

A

A set of items from which the final version of the test will be derived

10
Q

What is the difference between selected-response format and constructed-response format?

A

Selected = items require the test taker to select a response from a set of pre-determined ones (ex: multiple choice; true/false)

Constructed = items require test takers to supply or create the “correct” answer (ex: short response questions or essays)

11
Q

What is computerized adaptive testing?

A

Interactive, computer-administered test in which the items presented to the test taker are influenced by the test taker’s responses to previous questions
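
A toy sketch of the adaptive loop (hypothetical item bank; the fixed-step ability update is a deliberate simplification of the IRT-based estimation real CAT systems use):

```python
# Hypothetical item bank: item id -> difficulty (b) on the latent scale.
item_bank = {"i1": -2.0, "i2": -1.0, "i3": 0.0, "i4": 1.0, "i5": 2.0}

def run_cat(answer, n_items=3, step=0.5):
    """Administer n_items adaptively: after each response, nudge the
    ability estimate and present the unused item closest to it."""
    theta = 0.0                    # start at an average ability estimate
    administered = []
    for _ in range(n_items):
        item = min((i for i in item_bank if i not in administered),
                   key=lambda i: abs(item_bank[i] - theta))
        administered.append(item)
        correct = answer(item)     # callback standing in for the test taker
        theta += step if correct else -step
    return theta, administered

# Simulated test taker who gets items with difficulty below 0.5 correct.
theta, order = run_cat(lambda item: item_bank[item] < 0.5)
print(order, theta)  # ['i3', 'i4', 'i2'] 0.5
```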

12
Q

What are the benefits of computerized adaptive testing?

A
  • Efficiency in testing time and number of items presented
  • Reduces floor and ceiling effects (not having enough items to test either the low or high end)
13
Q

What is a floor effect? A ceiling effect?

A

Floor = diminished ability to distinguish test takers at the low end of the scale
Ceiling = diminished ability to distinguish test takers at the high end of the scale

14
Q

What are the different methods for scoring items? How might our motivations for test conceptualization impact the type we use?

A

Cumulative scoring, class scoring, ipsative scoring

The purpose behind creating a test can serve as a guide for the kind of scoring you wish to use. For example, if we’re trying to diagnose somebody, we may use class scoring

15
Q

What is cumulative scoring?

A

Based on the assumption that the higher the score on a test, the higher the test taker is on that ability or trait
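
A minimal sketch with hypothetical 1-5 Likert items; the reverse-keying step for negatively worded items is a common companion convention, assumed here:

```python
def cumulative_score(responses, reverse_keyed=(), lo=1, hi=5):
    """Sum 1-5 Likert responses; reverse-keyed (negatively worded) items
    are flipped first so a higher total always means more of the trait."""
    total = 0
    for item, value in responses.items():
        if item in reverse_keyed:
            value = hi + lo - value   # 1<->5, 2<->4, 3 stays 3
        total += value
    return total

# "q3" is negatively worded, so its response of 2 counts as 4.
print(cumulative_score({"q1": 5, "q2": 4, "q3": 2}, reverse_keyed={"q3"}))  # 13
```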

16
Q

What is class scoring?

A

Responses earn credit toward placement in a particular category with other test takers who have a similar pattern of responses
Ex: diagnostic criteria

17
Q

What is Ipsative scoring?

A

Comparing a test taker’s score on one scale within a test to another scale within that same test (idiographic interpretation)
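
A minimal sketch with hypothetical scale scores, comparing each scale to the test taker's own average rather than to other people:

```python
# Hypothetical scale scores for one test taker.
scores = {"verbal": 62, "quantitative": 48, "spatial": 55}

# Ipsative view: "stronger on verbal than quantitative *for this person*".
own_mean = sum(scores.values()) / len(scores)
relative = {scale: s - own_mean for scale, s in scores.items()}
print(relative)  # {'verbal': 7.0, 'quantitative': -7.0, 'spatial': 0.0}
```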

18
Q

What qualities separate a good item from a bad item?

A

Reliable and valid
Discriminates among test takers (separates high scorers from low scorers)

19
Q

What are some guidelines for test tryout?

A
  • Administer to a sample from the same population the test is designed for
  • Use 5 to 10 respondents per item
  • Administer under the same standardized conditions, and with the same instructions, as the planned final version
20
Q

What is the purpose of the different types of item analyses indices - difficulty, reliability, validity, discrimination?

A

To determine which items are best for which purpose (how reliable and valid are they? how well do they discriminate test takers?)

21
Q

What is the difference between the item-difficulty index and the item-endorsement index?

A

Difficulty = the proportion of respondents answering an item correctly
Endorsement = the proportion of respondents agreeing with an item (used instead of “percentage correct” when there is no “correct” response)
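
Both indices are the same computation, a proportion of respondents; a sketch with a hypothetical 0/1 response matrix:

```python
# Rows: respondents; columns: items. 1 = correct (or, for the
# endorsement index, 1 = agreed).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
n = len(responses)

# Item-difficulty index p = number answering correctly / total respondents.
p_values = [sum(col) / n for col in zip(*responses)]
print(p_values)  # [1.0, 0.75, 0.25] -- item 1 easiest, item 3 hardest
```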

22
Q

What is the d-value on the item-discrimination index?

A

The proportion of high scorers answering an item correctly minus the proportion of low scorers answering it correctly

This value CAN be negative, and a negative d flags a problem item (low scorers outperform high scorers, as with a miskeyed item)
We generally want d to be high, but there is no particular threshold
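
The computation is a difference of two proportions; a sketch assuming the common convention of contrasting the highest- and lowest-scoring groups (often the top and bottom 27% on total score):

```python
def d_value(upper_correct, upper_n, lower_correct, lower_n):
    """Item-discrimination index: proportion correct among high scorers
    minus proportion correct among low scorers."""
    return upper_correct / upper_n - lower_correct / lower_n

print(d_value(24, 27, 9, 27))   # ~0.56: discriminates well
print(d_value(10, 27, 15, 27))  # ~-0.19: negative d, likely a flawed item
```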

23
Q

What are the a and b parameters on an item characteristic curve?

A

a parameter = the discrimination of the item, i.e., how strongly it relates to the latent construct (the slope of the curve); items are conventionally keyed so that a is positive, and a negative slope signals a flawed item

b parameter = the difficulty (location) of the item: the point on the latent construct where the probability of endorsing the item equals 0.50
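
These two parameters define the two-parameter logistic (2PL) item characteristic curve; a sketch:

```python
import math

def icc(theta, a, b):
    """2PL item characteristic curve: probability of endorsing the item
    at trait level theta. `a` is the slope (discrimination), `b` the
    location where the probability crosses 0.50."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

print(icc(theta=0.0, a=1.2, b=0.0))  # 0.5 exactly at theta == b
print(icc(theta=2.0, a=1.2, b=0.0))  # ~0.92: rises with theta when a > 0
```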

24
Q

How do qualitative methods differ from more quantitative methods?

A

Data generation and analysis rely primarily on verbal rather than mathematical procedures

25
Q

What are some examples of qualitative methods of item analysis?

A

Think-aloud test administration (verbalize thoughts)
Expert panels
Sensitivity review (examines fairness by checking for offensive language or stereotypes)
26
Q

When/why would we want to revise a test?

A

- Outdated or offensive words
- Norms no longer represent the population
- Psychometric properties need improvement
- Underlying theory has changed
27
Q

What is Cross-Validation in test revision?

A

The revalidation of a test on a new sample of test takers, other than those on whom performance was originally found to be valid
Can have the same or new inclusion criteria
28
Q

What is Co-Validation in test revision? What are the benefits of it?

A

Two or more tests are validated using the same sample of test takers
- Economical for test developers
- Minimizes sampling error
29
Q

What are the 3 applications of IRT in test building/revising?

A

- Evaluating existing tests for the purpose of mapping test revisions
- Determining measurement equivalence across test-taker populations
- Developing item banks