What are the five stages of test development?
Conceptualization, Construction, Tryout, Analysis, Revision
What are the relations among the five stages of making a test?
Sequential process (each stage builds on the one before), but it isn’t necessarily one-and-done (revision can require going back to earlier stages)
What types of questions need to be considered during the conceptualization phase?
What is motivating us to create something new/revise something old (what’s the purpose)?
What will the test cover, and who will use it?
How will the test be administered and used?
What processes are part of test construction?
Pilot work, determining scaling methods, writing items, deciding how the items will be scored
What is pilot work?
Creating a prototype and receiving feedback on it using expert panels or focus groups
What are the different rating scales?
Likert, Guttman, Comparative, Categorical
What is comparative scaling vs categorical scaling?
Comparative = judgment of a stimulus in comparison with another
Categorical = stimuli are placed into one of two or more alternative (pre-determined) categories
What is a Guttman scale?
Items range sequentially from weaker to stronger expressions of an attitude or belief
So, if an individual responds “yes” to a later question, they should’ve also responded “yes” to previous ones
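The Guttman pattern described above can be checked mechanically. This is a minimal sketch, assuming responses are recorded as True/False and already ordered from the weakest item to the strongest; the function name is hypothetical.

```python
def is_guttman_consistent(responses):
    """Return True if no "yes" follows an earlier "no".

    responses: list of booleans ordered from the weakest to the
    strongest expression of the attitude (assumed ordering).
    """
    seen_no = False
    for endorsed in responses:
        if endorsed and seen_no:
            # A "yes" to a stronger item after a "no" to a weaker one
            # breaks the cumulative pattern.
            return False
        if not endorsed:
            seen_no = True
    return True
```

For example, yes/yes/no fits the pattern, while yes/no/yes does not.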
What is an item pool?
A set of items from which the final version of the test will be derived
What is the difference between selected-response format and constructed-response format?
Selected = items require the test taker to select a response from a set of pre-determined ones (ex: multiple choice; true/false)
Constructed = items require test takers to supply or create the “correct” answer (ex: short response questions or essays)
What is computerized adaptive testing?
Interactive, computer-administered test in which the items presented to the test taker are influenced by the test taker’s responses to previous questions
What are the benefits of computerized adaptive testing?
What is a floor effect? A ceiling effect?
Floor = diminished ability to distinguish test takers at the low end of the scale
Ceiling = diminished ability to distinguish test takers at the high end of the scale
What are the different methods for scoring items? How might our motivations for test conceptualization impact the type we use?
Cumulative scoring, Class scoring, Ipsative scoring
The purpose behind creating a test can serve as a guide for the kind of scoring you wish to use. For example, if we’re trying to diagnose somebody, we may use class scoring
What is cumulative scoring?
Based on the assumption that the higher the score on a test, the higher the test taker is on that ability or trait
What is class scoring?
Responses earn credit toward placement in a particular category with other test takers who have a similar pattern of responses
Ex: diagnostic criteria
What is Ipsative scoring?
Comparing a test taker’s score on one scale within a test to their score on another scale within that same test (idiographic interpretation)
What qualities separate a good item from a bad item?
Reliable and Valid
Discriminates among test takers
What are some guidelines for test tryout?
What is the purpose of the different types of item-analysis indices (difficulty, reliability, validity, discrimination)?
To determine which items are best for which purpose (how reliable and valid are they? how well do they discriminate test takers?)
What is the difference between the item-difficulty index and the item-endorsement index?
Difficulty = proportion of respondents answering an item correctly
Endorsement = percentage of agreement (as opposed to percentage “correct”, b/c there may not be a “correct” response)
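Both indices are the same calculation applied to different kinds of items. A minimal sketch, assuming each response is coded 1 (correct or endorsed) or 0; the function name is hypothetical.

```python
def item_proportion(item_scores):
    """Proportion of respondents scoring 1 on a single item.

    For ability items (1 = correct) this is the item-difficulty index;
    for attitude items (1 = agree) it is the item-endorsement index.
    """
    if not item_scores:
        raise ValueError("need at least one response")
    return sum(item_scores) / len(item_scores)
```

With responses 1, 1, 0, 1 the index is 0.75, meaning three of four respondents answered correctly (or agreed).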
What is the d-value on the item-discrimination index?
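The difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering it correctly
This value ranges from -1 to +1; a negative value means more low scorers than high scorers answered the item correctly, which flags the item for revision or removal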
We generally want it to be higher, but there’s no particular threshold
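The d-value can be computed directly from total scores and responses to one item. This is a minimal sketch, not a definitive procedure: the function name is hypothetical, and the default of using the top and bottom 27% of total scorers is a common convention assumed here, not something these notes specify.

```python
def item_discrimination(total_scores, item_scores, fraction=0.27):
    """d = (proportion correct in high group) - (proportion correct in low group).

    total_scores: each test taker's total test score.
    item_scores:  the same test takers' 0/1 scores on one item.
    fraction:     share of test takers in each extreme group (assumed 27%).
    """
    n = len(total_scores)
    if n == 0 or n != len(item_scores):
        raise ValueError("inputs must be non-empty and the same length")
    k = max(1, round(n * fraction))
    # Rank test takers from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low_group, high_group = order[:k], order[-k:]
    p_high = sum(item_scores[i] for i in high_group) / k
    p_low = sum(item_scores[i] for i in low_group) / k
    return p_high - p_low
```

With ten test takers and the default fraction, each group holds three people, so d is the gap between the top three and bottom three scorers on that item.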
What are the a and b parameters on an item characteristic curve?
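a parameter = the relatedness of the item to the latent construct (the slope of the curve at its steepest point); it is normally positive, and a negative value would mean endorsement becomes less likely as the trait increases, which signals a problem item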
b parameter = the point of the latent construct where the probability of endorsing the item equals 0.50
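One way to see how a and b shape the curve is to plug them into a logistic function. This is a minimal sketch assuming a two-parameter logistic (2PL) model, which these notes do not name explicitly; the function name and the symbol theta (the test taker's level on the latent construct) are assumptions for illustration.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of endorsing an item under an assumed 2PL model.

    theta: test taker's level on the latent construct.
    a:     discrimination (how steeply the curve rises around b).
    b:     location where the probability equals 0.50.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When theta equals b the exponent is zero, so the probability is 0.50 regardless of a. A larger a makes the probability change faster as theta moves away from b.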
How do qualitative methods differ from more quantitative methods?
Data generation and analysis rely primarily on verbal rather than mathematical procedures