Psychometrics can only be done on what types of tests?
Standardized, norm-referenced tests
Standardized (norm-referenced) tests must be:
Standardized
Refers to the method: the test tells you exactly what you’re supposed to say, which standardizes administration and responses.
Norm-referenced
Normed against (given to) a large group, in our case children, to find the range of scores that counts as normal. That allows for a meaningful comparison among children.
A good standardized, norm-referenced test should have what 3 things?
4 types of validity
What is validity?
the extent to which the test accurately measures what it says it’s measuring
Construct Validity
The idea that the items we choose actually go with the theoretical construct. For example, all the steps you take to get at happiness.
Ex: if testing receptive language, ask a series of questions that people would agree fit the construct; it doesn’t have to be a questionnaire. You cannot measure a construct directly, so you have to get at it in different ways.
*A lot of what we do involves constructs because we are measuring behavior.*
Content Validity
The extent to which the test measures the entire body of content (the whole behavioral domain). It is driven by experts in the field or by statistics.
-2 questions within content validity: to what degree does the test include a representative sample of all important parts of that behavioral domain, and to what extent is the test free from the influence of irrelevant variables?
2 questions within content validity
Face validity
Not necessarily judged by experts in the field: do you look at the test and think, “Yeah, that’s what it measures”?
-Very close to construct validity, but face validity is much broader and lighter (more superficial).
Criterion-related validity
Whether the test is related to some other gold standard. One way to check is concurrent validity (do children score similarly on the other test?).
-The question for criterion-related validity: are 2 tests that are supposed to measure the same thing giving you the same answer?
Construct
Things like happiness, anger, and motivation. We try to get at these constructs by asking certain questions, on the assumption that they drive human behavior.
predictive validity
How well the test predicts future performance on related tests
Reliability-3 types
Reliability
Is the test measuring language consistently? A reliable test gives similar results across raters, over time, and across its items.
inter-rater reliability
2 judges score the same responses to see whether they get the same results; you want them to be identical or close. This is where you use statistics and look at how correlated the judges’ scores are.
Want inter-rater reliability to be 90% or greater.
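One simple way to read the 90% target is as percent agreement between the two judges. The sketch below assumes item-by-item scores of 1 (correct) or 0 (incorrect); the function name and the data are made up for illustration, and published tests may report a correlation or another statistic instead.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Hypothetical scores from two judges on the same 10 responses.
judge_1 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
judge_2 = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]

agreement = percent_agreement(judge_1, judge_2)  # judges disagree on one item
meets_target = agreement >= 90.0                 # the 90% guideline from the notes
print(f"agreement = {agreement:.0f}%, meets target: {meets_target}")
```

Percent agreement does not correct for agreement that happens by chance, so it is best treated as a rough first check.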
Test-retest reliability
Whether the test is stable over time: the same child tested twice should get similar scores.
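Stability over time is commonly summarized with a correlation between the first and second testing. A minimal sketch, assuming a Pearson correlation and five hypothetical children tested twice (the scores are invented):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical standard scores for the same 5 children, tested twice.
first_testing = [85, 92, 100, 108, 115]
second_testing = [88, 90, 102, 105, 117]

r = pearson_r(first_testing, second_testing)
print(f"test-retest r = {r:.2f}")  # closer to 1.0 means more stable over time
```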
Internal consistency reliability
Looks at whether the individual items in a test are consistent with one another
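The notes do not name a statistic for internal consistency. One widely used option is Cronbach's alpha, sketched here as an assumption rather than as the method any particular test uses. The data are hypothetical: one row per child, one column per item, scored 1 or 0.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with one row per child and one column per item."""
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # scores grouped by item
    sum_item_vars = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical results: 5 children, 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")  # higher values mean the items hang together more
```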
Normative sample & derived scores
Normative sample
The group the test was normed on. It should match who you are assessing (e.g., SES, range).
Raw scores
Uninterpretable!!! The same raw score means different things at different ages, which is why raw scores get converted to standard scores.
Standard scores (z-scores, T-scores, scaled scores)
Developed through assessing your sample: model the test and find the mean and standard deviation (on average, how far from the mean the group is). If the standard deviation is big (scores are far from the mean), you have a flat curve.
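The conversion from a raw score can be sketched as follows. The normative mean and standard deviation here are invented, and the target scales (T-scores with mean 50 and SD 10, scaled scores with mean 10 and SD 3) are common conventions; a specific test's manual may use different ones.

```python
def z_score(raw, norm_mean, norm_sd):
    """How many standard deviations a raw score sits from the normative mean."""
    return (raw - norm_mean) / norm_sd

def rescale(z, mean, sd):
    """Put a z-score onto another scale with the given mean and SD."""
    return mean + sd * z

# Hypothetical norms for one age group: mean raw score 50, SD 8.
z = z_score(42, norm_mean=50, norm_sd=8)   # one SD below the mean
t_score = rescale(z, mean=50, sd=10)       # common T-score convention
scaled = rescale(z, mean=10, sd=3)         # common scaled-score convention

print(z, t_score, scaled)
```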
Percentile rank
The percentage of scores in the normative sample that you performed at or better than. It is not the percentage of items you got correct on a test. If your score is average, your percentile rank is 50.
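To see why an average score lands at the 50th percentile, here is a small sketch that assumes scores are normally distributed on a scale with mean 100 and SD 15. That scale is a common convention, not something the notes specify, and real tests usually supply a lookup table in the manual.

```python
from statistics import NormalDist

def percentile_rank(standard_score, mean=100, sd=15):
    """Percentage of a normal distribution falling at or below the score."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(standard_score)

average = percentile_rank(100)     # score exactly at the mean
one_sd_below = percentile_rank(85) # score one SD below the mean

print(f"{average:.0f}th percentile, {one_sd_below:.0f}th percentile")
```

A child at the 16th percentile did not get 16% of the items right; the child scored at or above about 16% of the comparison group.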