Give an example of a norm-referenced test.
An IQ test – as the average is dependent upon all participants.
Give an example of a criterion-referenced test.
A quiz where the answers are compared to a set of correct answers.
What are the advantages and disadvantages of using a norm-referenced or criterion-referenced test?
The advantages of a norm-referenced test are that scores are protected from variation in test difficulty (if an examiner happens to write a harder test one year, a lower proportion of students does not automatically fail, because grading depends on the cohort rather than on raw scores), and that it yields a good spread of scores, which allows discrimination between stronger and weaker performers. The disadvantage of norm-referenced tests is that the norm can change – and scores change with it (if the norm group is unusually able, a student is less likely to get the grade they deserve). The advantages of criterion-referenced tests are that scores do not change if the norm changes, and that absolute standards can be set based on what people can actually do. The disadvantages of criterion-referenced tests are that scores are affected by test difficulty, and that there is a greater risk of floor or ceiling effects.
In a criterion-referenced course, the proportion of higher grades is found to increase from a previous semester. What three reasons could account for this?
The students got smarter, the assessment was easier than previous, or the teaching of the course was better.
What is classical test theory?
Classical test theory is the traditional conceptual basis of psychometrics – it is the idea that every measurement we take can be decomposed into two parts: true score (underlying thing trying to be measured) and measurement error.
What is true score theory?
Another name for classical test theory
What is reliability in terms of the relationship between true and total test score variability?
Reliability is the ratio between the true variability and the total variability.
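In symbols, this is the standard classical-test-theory identity (the notation below is conventional, not taken from the text above): observed-score variance decomposes into true-score variance plus error variance, and reliability is the true-score share.

```latex
\text{reliability} \;=\; \frac{\sigma^2_{T}}{\sigma^2_{X}} \;=\; \frac{\sigma^2_{T}}{\sigma^2_{T} + \sigma^2_{E}}
```

where \(\sigma^2_{T}\) is true-score variance, \(\sigma^2_{E}\) is measurement-error variance, and \(\sigma^2_{X}\) is the total observed variance.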
List the various sources of measurement error.
Test construction, test administration, test scoring and then other influences.
What is item sampling?
Item sampling is where a small selection of questions is asked out of a great range of possibilities. It is a part of test construction, and so can be a source of measurement error.
What is content sampling?
An alternative name for item sampling – a part of test construction that can be a source of measurement error.
What is domain sampling?
Another alternative name for item sampling – a part of test construction that can be a source of measurement error.
Why can we only estimate the reliability of a test and not measure it directly?
Reliability is based on the hypothetical construct of a true score, which in turn means that we are never able to say for certain what the reliability is – we can only estimate it.
Describe four methods available to us to help estimate the reliability of a test.
Estimating the internal consistency, test-retest reliability, parallel-forms reliability and inter-rater reliability.
What is internal consistency?
How much the item scores in a test correlate with one another on average (the average correlation between the items on a scale).
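A minimal sketch of that average inter-item correlation, assuming the item scores sit in a respondents × items numpy array (the data values here are invented for illustration):

```python
import numpy as np

# Hypothetical data: 5 respondents x 3 items on the same scale
scores = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
], dtype=float)

# Correlation matrix between the items (columns)
r = np.corrcoef(scores, rowvar=False)

# Internal consistency: the mean of the off-diagonal correlations
n_items = r.shape[0]
off_diagonal = ~np.eye(n_items, dtype=bool)
avg_inter_item_r = r[off_diagonal].mean()
print(round(avg_inter_item_r, 3))
```

A value near 1 means the items largely agree with one another; a value near 0 means they are measuring unrelated things.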
If someone describes a psychological scale as having “high internal coherence”, what are they talking about?
They are saying that the psychological scale has high internal consistency – there is a high average correlation between the items on the scale.
What is inter-item consistency?
An alternative label for internal consistency.
How do you calculate Cronbach’s alpha using JAMOVI?
Select FACTOR then RELIABILITY ANALYSIS. Then select all the items in the scale and move them to the 'Items' box.
Describe the steps involved in calculating Cronbach’s alpha by hand.
1. Split the questionnaire in half.
2. Calculate the total score for each half.
3. Work out the correlation between the total scores for the two halves.
4. Repeat steps 1–3 for every possible two-way split of the questionnaire.
5. Calculate the average of all the split-half correlations.
6. Apply the Spearman-Brown formula as a correction, to account for the fact that each half is a shortened test.
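The hand procedure above can be sketched in code. This is a minimal illustration, not a library routine: it assumes `scores` is a respondents × items numpy array, enumerates every two-way split, averages the split-half correlations, and then applies the Spearman-Brown step-up.

```python
from itertools import combinations
import numpy as np

def splithalf_alpha(scores):
    """Average every possible split-half correlation, then apply
    the Spearman-Brown correction (the hand procedure above)."""
    n_items = scores.shape[1]
    half = n_items // 2
    rs = []
    seen = set()
    for first in combinations(range(n_items), half):
        rest = tuple(i for i in range(n_items) if i not in first)
        # each two-way split is generated twice; keep one copy
        key = frozenset((first, rest))
        if key in seen:
            continue
        seen.add(key)
        a = scores[:, list(first)].sum(axis=1)  # total score, first half
        b = scores[:, list(rest)].sum(axis=1)   # total score, second half
        rs.append(np.corrcoef(a, b)[0, 1])
    r_bar = np.mean(rs)
    return 2 * r_bar / (1 + r_bar)  # Spearman-Brown step-up
```

Note that averaging split-half correlations before correcting, as described above, gives a close approximation to Cronbach's alpha rather than the exact variance-based formula.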
What is test-retest reliability?
The correlation between scores on the same test by the same people done at two different times.
What statistic can we use to evaluate test-retest reliability?
Pearson's r – the correlation coefficient.
Test-retest reliability involves giving the same test twice. Give an example of a situation where giving the same test with the same items twice might be a problem.
In situations where practice or learning effects are likely – particularly in tests of competency, where performance can improve simply through repeated exposure to the items.
What is parallel forms reliability?
The correlation between scores on two versions of the same test by the same people done at the same time.
What statistic can we use to evaluate parallel forms reliability?
Pearson's r
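As a minimal sketch (the scores below are invented for illustration), Pearson's r between the two forms can be read off the correlation matrix:

```python
import numpy as np

# Hypothetical scores for the same five people on two versions of a test
form_a = np.array([12, 18, 9, 15, 20], dtype=float)
form_b = np.array([11, 17, 10, 14, 19], dtype=float)

# Pearson's r is the parallel-forms reliability estimate
r = np.corrcoef(form_a, form_b)[0, 1]
print(round(r, 3))
```

The same computation applies to test-retest reliability, with the second column being the same test given at a later time.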
Imagine you create an ability test which involves an examiner making a number of ratings of an individual’s thumb-rolling skill. You test the inter-rater reliability using two examiners and obtain a correlation of 0.87. What does this mean?
The correlation between the two examiners' ratings is high, which suggests good inter-rater reliability; however, the means and SDs of the two examiners would also need to be checked for similarity, since a high correlation alone does not rule out one examiner being systematically harsher.