what does construct validity mean?
Does a test measure the construct that it claims to measure
Unscientific to presume validity - face validity can deceive
Cannot rely on authority - must prove case using standard rigorous steps
what is face validity?
when you assume a test is valid because it looks valid
may hide underlying invalid constructs
desirable, not enough, not even necessary
what are the preconditions that must be met before you statistically measure construct validity?
Necessary preconditions must be present - discrimination, reliability, structure
Validity itself then involves patterns of links to other constructs
what are the steps taken to statistically measure construct validity?
what are the two sources of invalidity?
Systematic error - bias in a particular direction - caused by a particular thing you can identify
Random error - error in no particular direction - caused by many different things you cannot identify
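The difference between the two error sources can be illustrated with a minimal simulation (a numpy sketch with hypothetical true scores; the constants are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_scores = np.full(1000, 3.0)          # hypothetical true construct level

# Systematic error: a constant bias in one direction (identifiable cause)
biased = true_scores + 0.5

# Random error: noise with no particular direction (many unidentifiable causes)
noisy = true_scores + rng.normal(0, 0.5, size=1000)

print(round(biased.mean() - true_scores.mean(), 2))  # bias shifts the mean
print(round(noisy.mean() - true_scores.mean(), 2))   # noise roughly cancels out on average
print(round(noisy.std(), 2))                         # but noise inflates variability
```

Systematic error moves every score the same way; random error leaves the mean roughly intact but widens the spread.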
what are the 5 standard steps to achieve construct validity in questionnaires?
item design: what type of items are included in a questionnaire?
Close-ended - quantitative information, more common, more convenient, efficient, more top-down
Open-ended - qualitatively rich, generate own thoughts in response, must be coded to turn into numbers, labour-intensive, more bottom-up
item design: how to write good items?
item design: bad examples of items
Oxford Capacity Analysis, used by the Church of Scientology
item design: what is scaling?
applying a particular type of number to a response on a questionnaire
used for close-ended items
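Scaling in this sense can be sketched as a mapping from response labels to numbers (a hypothetical 5-point Likert scale; the labels are assumptions for illustration):

```python
# Hypothetical 5-point Likert scaling: each close-ended response label
# is assigned a number so responses can be analysed quantitatively
LIKERT = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

responses = ["Agree", "Strongly agree", "Disagree"]
scores = [LIKERT[r] for r in responses]
print(scores)  # [4, 5, 2]
```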
item design: how is an item scaled?
item design: what are the response options?
Decide how many or how few options to offer
More options capture more information - studies show that validity approaches its maximum at around 5-7 response options
item design: what is the impact of labelling on responses to questionnaires?
Label every response option to reduce ambiguity about what the options mean - standardising interpretation reduces unwanted variation
item design: what is the impact of neutral/uncertain responses on responses to questionnaires?
can increase information capture, and therefore accuracy and validity
OR
can invite lazy responding, and therefore decrease information capture, accuracy, and validity
SO
no overall benefit either way
item design: what is the impact of forward-scored and reverse-scored items on responses to questionnaires?
forward- and reverse-scored items should be roughly equal in number
reduces acquiescence bias
permits a check that respondents completed the scale seriously, by screening out inconsistent response patterns
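Reverse scoring can be sketched in a few lines (the item wordings are hypothetical; the formula assumes a 5-point scale unless told otherwise):

```python
def reverse_score(score: int, n_points: int = 5) -> int:
    """Map a reverse-keyed item back onto the forward-keyed scale."""
    return (n_points + 1) - score

# A respondent who answers "5" to everything regardless of item direction
# looks inconsistent once the reverse-keyed item is rescored
answers = {"I feel calm": 5, "I feel anxious": 5}  # second item is reverse-keyed
rescored = {"I feel calm": 5, "I feel anxious": reverse_score(5)}
print(rescored)  # {'I feel calm': 5, 'I feel anxious': 1}
```

An acquiescent respondent's rescored answers end up at opposite ends of the scale, which is what makes the screening check possible.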
item analysis: discrimination in questionnaires and items
item analysis: how is variation measured statistically?
Desirable statistical features:
○ More dispersion –> higher SD - SD of scores on a scale is a direct measurement of dispersion of variability or variation in scores
○ Central average –> middling M - item scores tend to clump into a bundle around the middle
○ Symmetric distribution –> lower SKEW - skew is a direct measure of the asymmetry or imbalance in a distribution of scores
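The three statistics above can be computed directly (a numpy-only sketch; the item scores are hypothetical, and the skewness formula is the standard moment-based one):

```python
import numpy as np

scores = np.array([1, 2, 2, 3, 3, 3, 3, 4, 4, 5])  # hypothetical 5-point item

m = scores.mean()                              # central average -> middling M
sd = scores.std(ddof=1)                        # dispersion -> SD
dev = scores - m
g1 = (dev**3).mean() / (dev**2).mean()**1.5    # skewness (0 = symmetric)

print(m, round(sd, 2), g1)  # 3.0 1.15 0.0
```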
item analysis: what are the levels of discrimination?
○ Good distribution - broader spread of scores, mean is close to the scale midpoint, distribution is balanced
○ Bad distribution - narrower spread of scores, mean is lower than scale midpoint, distribution is positively skewed
item analysis: what is the ideal criteria when using a 5-point scale?
M between 2 and 4
SD greater than 1
what is internal consistency?
consistency between items
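Internal consistency is commonly summarised with Cronbach's alpha; a minimal sketch from its standard formula (numpy only, hypothetical respondent-by-item data):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items matrix of scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical data: three items that rise and fall together -> high alpha
data = np.array([[1, 2, 1],
                 [2, 2, 3],
                 [3, 4, 4],
                 [5, 4, 5]])
print(round(cronbach_alpha(data), 2))  # 0.94
```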
what is test-retest reliability?
consistency over time
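Test-retest reliability is typically estimated as the correlation between two administrations of the same scale (a numpy sketch with hypothetical scores):

```python
import numpy as np

# Hypothetical scores from the same respondents at time 1 and time 2
time1 = np.array([10, 14, 18, 22, 30])
time2 = np.array([11, 13, 19, 21, 29])

r = np.corrcoef(time1, time2)[0, 1]  # test-retest reliability coefficient
print(round(r, 2))  # 0.99
```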
what is inter-rater reliability?
consistency between raters or scorers
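One common index is Cohen's kappa, which corrects raw agreement for chance; a hand-rolled sketch (the two raters' judgements are hypothetical):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

a = ["yes", "yes", "no", "yes", "no", "no"]
b = ["yes", "yes", "no", "no", "no", "no"]
print(round(cohens_kappa(a, b), 2))  # 0.67
```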
what is scale reliability?
how consistently a scale produces similar results when measuring the same construct multiple times
reliability analysis: why does scale reliability matter?
Scale reliability matters because it is a precondition for validity
- A scale must be reliable and consistent to be valid, but a reliable scale can still be invalid due to systematic error