Validity
Definition: How accurate a measure is. Does the test measure what it is supposed to measure?
*Standardized assessment tools must be valid; that is how they become evidence-based.*
Example: Standardized IQ tests (e.g., the Wechsler Adult Intelligence Scale) are VALID because they are intended to measure intelligence and do.
Construct Validity
Definition: Does the test measure the correct construct (characteristic) it is supposed to measure?
If I am measuring the ability to use a coping mechanism, does my measure actually assess coping mechanisms rather than something else?
Example: You may be determining whether a mindfulness education program increases emotional maturity in elementary students. Construct validity asks whether your research is actually measuring emotional maturity.
Content Validity
Definition: Is the test fully representative of what it aims to measure? Does it represent all content?
Example: A Spanish teacher develops an end-of-year test for her students. The test should cover all content that was taught throughout the year.
This would have high content validity. Similarly, if she includes questions unrelated to Spanish or tests content that was not covered, the results are no longer a valid measure of Spanish knowledge (low content validity).
Face Validity
Definition: Does the content of the test appear to be suitable to its aims?
Does it appear, at face value, to align with what we are trying to measure?
Example: You created a survey to measure anger in students. You ask questions like:
How often are you angry?
How many fights have you been involved in recently?
On the surface, the survey seems like a good representation of what you want to test, so you consider it to have high face validity.
Reliability
Definition: How consistent a measure is.
Example: If you take the ACT 5 times and get roughly the same score each time, the ACT is a RELIABLE measure.
Memory Trick: Reliability = Consistency
Inter-rater Reliability
Definition: Refers to the level of agreement between raters or judges. If everyone agrees, inter-rater reliability is 1 (100%); if everyone disagrees, it is 0 (0%).
*Researchers should obtain similar results or scores.*
People are subjective, so different observers’ perceptions of situations and phenomena naturally differ. Reliable research aims to minimize subjectivity as much as possible so that a different researcher can replicate the same results.
Example: We would expect inter-rater reliability to be high when points are awarded by judges.
In the Olympics, scores for subjective events (e.g., gymnastics and diving) should be similar across judges. Outlier scores are generally not included.
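The 0-to-1 scale described above can be illustrated with a simple percent-agreement calculation. This is a hypothetical sketch with made-up ratings; real studies often use more robust statistics such as Cohen's kappa.

```python
# Hypothetical sketch: percent agreement between two raters.
# Ratings are made up for illustration.
rater_a = ["pass", "pass", "fail", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail"]

# Count the cases where both raters gave the same rating.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
inter_rater_reliability = agreements / len(rater_a)
print(inter_rater_reliability)  # 0.8 -> raters agreed on 4 of 5 cases
```

A value of 1.0 would mean perfect agreement; 0.0 would mean the raters never agreed.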
Test-retest Reliability
Definition: Test-retest reliability is the closeness of the agreement between the results of successive measurements of the same measure carried out under the same conditions of measurement.
*That is why there are trials: a measure has to be retested again and again.*
Will the test produce the same results when it is given again?
*It must be administered under the same conditions and yield the same outcome time and time again.*
Example: You administer a questionnaire to measure the IQ of a group of students. You administer it twice, six months apart, but the results are significantly different.
This would be an example of low test-retest reliability, because IQ should be relatively consistent over time.
Reliable but not Valid
A test may be reliable but not valid.
Example: If your scale is not calibrated correctly (e.g., reads 5 lbs over or under), you will get reliable readings of your weight, but they will not be valid.
However, a test cannot be valid unless it is reliable. (You consistently get the same reading, but it is not your true weight.)
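The miscalibrated-scale example can be sketched numerically: the readings barely vary (reliable), but their average is consistently off from the true weight (not valid). The numbers below are made up for illustration.

```python
# Hypothetical sketch: a miscalibrated scale is reliable but not valid.
true_weight = 150                       # actual weight in lbs
readings = [155, 155, 154, 155, 156]    # scale reads about 5 lbs over

# Small spread across readings -> the measure is reliable (consistent).
spread = max(readings) - min(readings)

# Large average offset from the true value -> the measure is not valid.
bias = sum(readings) / len(readings) - true_weight

print(spread, bias)  # 2 5.0
```

Reliability concerns the spread; validity concerns the bias. A measure needs both a small spread and a small bias to be reliable *and* valid.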