Reliability, Validity & Error Flashcards

(13 cards)

1
Q

What is reliability in psychological measurement?

A

The consistency, stability, and precision of a measurement tool

Indicates how free a test is from random error

Test score = True score + Random error + Systematic error

High reliability means dependable and consistent results across time, raters, or items
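
A minimal simulation sketch (Python, entirely hypothetical numbers) of this decomposition: observed scores are generated as true scores plus random noise, and reliability can be read as the share of observed-score variance that comes from true scores.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical trait scores for 1,000 test-takers (mean 100, SD 15)
    true_scores = rng.normal(loc=100, scale=15, size=1_000)

    # Random error: unsystematic noise that varies across administrations
    random_error = rng.normal(loc=0, scale=5, size=1_000)

    observed = true_scores + random_error

    # Classical test theory: reliability = true-score variance / observed variance
    reliability = true_scores.var() / observed.var()
    print(round(reliability, 2))  # ≈ 0.90, i.e., 15² / (15² + 5²)

A constant systematic error would shift every score by the same amount without changing this variance ratio, which is why reliability reflects freedom from random error specifically.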

2
Q

What is test-retest reliability, and how is it assessed?

A

Measures stability of a test over time

Same test given to same participants at two different times

Scores compared using correlation coefficients (e.g., Pearson r)

Example: Administering a personality questionnaire twice, several weeks apart; high correlation = stable measurement
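
A minimal sketch (Python, made-up scores) of how test-retest reliability might be estimated with SciPy's pearsonr:

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical questionnaire totals for the same 8 participants,
    # measured several weeks apart
    time_1 = np.array([24, 31, 18, 27, 35, 22, 29, 30])
    time_2 = np.array([26, 30, 20, 25, 36, 21, 31, 28])

    r, p = pearsonr(time_1, time_2)
    print(f"test-retest r = {r:.2f}")  # a high r indicates stable measurement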

3
Q

What is internal consistency reliability, and how is it assessed?

A

Measures how consistently items within a test assess the same construct

Methods: Cronbach’s alpha, split-half reliability, item-total correlations

High alpha (>0.70–0.80) indicates good consistency

Example: Depression scale items (sadness, loss of interest) correlate highly, showing internal consistency
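
A minimal sketch (Python, hypothetical Likert responses) computing Cronbach's alpha directly from its definition:

    import numpy as np

    # Hypothetical responses: 6 respondents × 4 depression-scale items (1–5)
    items = np.array([
        [4, 5, 4, 5],
        [2, 2, 3, 2],
        [5, 4, 5, 4],
        [1, 2, 1, 2],
        [3, 3, 4, 3],
        [4, 4, 4, 5],
    ])

    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score

    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    print(f"Cronbach's alpha = {alpha:.2f}")   # above ~0.70–0.80 = good consistency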

4
Q

What is inter-rater reliability, and how is it assessed?

A

Measures agreement between different raters assessing the same phenomenon

Methods: kappa statistics (e.g., Cohen's kappa), intraclass correlation coefficient (ICC), percent agreement

Example: Multiple healthcare professionals independently rate patient pain; high agreement = high inter-rater reliability
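
A minimal sketch (Python, hypothetical ratings) of chance-corrected agreement between two raters using scikit-learn's cohen_kappa_score:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical pain ratings (0 = none, 1 = mild, 2 = severe)
    # from two clinicians rating the same 10 patients
    rater_a = [2, 1, 0, 2, 1, 1, 0, 2, 1, 0]
    rater_b = [2, 1, 0, 2, 1, 2, 0, 2, 1, 0]

    kappa = cohen_kappa_score(rater_a, rater_b)
    print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement beyond chance

Unlike raw percent agreement, kappa discounts the agreement two raters would reach by chance alone.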

5
Q

What is measurement error in psychological assessment?

A
  • The difference between the observed score and the true score
  • Represents variability or inconsistency in scores not related to the actual trait
  • Measurement error reduces accuracy and reliability
6
Q

How are reliability and measurement error related?

A

Inversely related:

  • High reliability → low measurement error (stable, consistent scores)
  • Low reliability → high measurement error (scores fluctuate due to unrelated factors)
7
Q

What are sources of measurement error? (8)

A

Random error: Caused by chance factors such as the test-taker’s mood, distractions, or physical condition during testing. These errors fluctuate unpredictably and reduce reliability.

Systematic error (bias): Consistent deviations from the true score caused by factors such as test bias, rater bias, or poorly designed items. Unlike random error, systematic error can skew results in one direction.

Test characteristics: Short tests or poorly worded items introduce more error, lowering reliability.

Administration and scoring procedures: Variations in instructions, scoring criteria, or test environment can introduce error.

Response format: Likert scales, multiple-choice items, and open-ended questions each introduce measurement error in different ways.

Participant factors: Fatigue, motivation, anxiety, or misunderstanding items can raise error levels.

Instrumentation error: Calibration or design issues with measurement tools.

Rater variability: Especially important in subjective assessments, where differences between raters affect score consistency.

8
Q

How can reliability estimates help calculate the margin of error around an individual’s score?

A

Reliability indicates score consistency across repeated measurements

Used to compute the Standard Error of Measurement (SEM), which reflects expected score variability due to measurement error

SEM helps create confidence intervals around an observed score to estimate the range of the “true” score

SEM = SD × √(1 − Reliability)

(Note: SE = SD / √N is the standard error of the mean, used when estimating a group average; it is a different statistic from the SEM of an individual score.)
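
Worked example (hypothetical values): for an IQ-style test with SD = 15 and reliability = 0.91, SEM = 15 × √(1 − 0.91) = 15 × 0.30 = 4.5 points.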

9
Q

What is the formula for SEM and what does it represent?

A

SEM = SD × √(1 − Reliability)

SD = standard deviation of test scores in the sample

Reliability = reliability coefficient of the test (e.g., Cronbach’s alpha)

SEM represents the typical measurement error expected in an individual’s score

10
Q

How is a confidence interval for an individual score calculated using SEM?

A

Observed Score ± (Critical Value × SEM)

Example: 95% CI → Observed Score ± (1.96 × SEM)

Provides a range in which the individual’s true score likely falls with a specified level of confidence
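
A minimal sketch (Python, hypothetical test statistics) chaining the SEM and confidence-interval formulas above for one observed score:

    import math

    sd = 15             # SD of scores in the norm sample (hypothetical)
    reliability = 0.91  # e.g., a Cronbach's alpha of 0.91 (hypothetical)
    observed = 108      # one individual's observed score

    sem = sd * math.sqrt(1 - reliability)  # SEM = SD × √(1 − reliability)
    lower = observed - 1.96 * sem          # 95% CI lower bound
    upper = observed + 1.96 * sem          # 95% CI upper bound

    print(f"SEM = {sem:.1f}; 95% CI = [{lower:.1f}, {upper:.1f}]")
    # SEM = 4.5; 95% CI = [99.2, 116.8]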

11
Q

When is calculating a margin of error for individual scores most useful?

A

In practice, confidence intervals around individual scores are not commonly calculated for most psychological assessments, as they are more relevant for estimating population parameters in survey research. However, they may be useful in specific situations where precise estimates of an individual’s true score are important, such as high-stakes testing or clinical assessments.

12
Q

What procedures are used to develop items for a scale measuring individual differences? (sorry, long answer)

A
  • Define the Construct:
    Begin by clearly defining the psychological or behavioral trait to be measured. Understanding the construct’s nature and components is fundamental for guiding item development.
  • Generate a Pool of Items:
    Brainstorm widely to create a large pool of potential items that cover all aspects of the construct. Having more items initially allows for refinement and item reduction later.
  • Use Clear, Concise Language:
    Items should be phrased in simple, straightforward language understandable by the target population. Avoid jargon and complex wording.
  • Avoid Double-Barreled Items:
    Each item should measure only one aspect of the construct to prevent confusion and improve response accuracy.
  • Vary Item Wording:
    Include both positively and negatively worded items to control for response biases such as acquiescence (tendency to agree with statements).
  • Select Appropriate Scale Types:
    Choose a measurement scale that fits the construct and research aims:
    Likert Scales: Commonly used for agreement or frequency.
    Thurstone Scales: Rank items based on perceived relevance or intensity.
    Guttman Scales: Hierarchical, often used with formative constructs.
  • Expert Review for Content Validity:
    Engage qualified subject-matter experts to evaluate items for relevance, clarity, and representativeness.
    Use feedback iteratively to refine items.
    Experts assess whether items adequately cover the domain of the construct, ensuring content validity—the extent to which the scale covers all facets of the construct comprehensively.
  • Pilot Testing:
    Administer the draft scale to a representative sample to assess item performance, clarity, and reliability.
  • Scale Scoring and Item Weighting:
    Depending on scale type, scoring may involve sum or average scores, weighted scores using Confirmatory Factor Analysis (CFA), expert-assigned weights, or more advanced Item Response Theory (IRT) techniques (see the scoring sketch after this list).
  • Norming and Benchmarking:
    Establish norms using representative samples, enabling interpretation of individual scores relative to the population.
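
A minimal sketch (Python, hypothetical pilot data) of two of the steps above: reverse-scoring a negatively worded item, then computing sum scores and corrected item-total correlations during pilot testing:

    import numpy as np

    # Hypothetical pilot responses: 6 respondents × 4 items on a 1–5 Likert scale;
    # item 3 (column index 2) is negatively worded and must be reverse-scored
    responses = np.array([
        [4, 5, 2, 5],
        [2, 2, 4, 2],
        [5, 4, 1, 4],
        [1, 2, 5, 2],
        [3, 3, 2, 3],
        [4, 4, 1, 5],
    ], dtype=float)

    responses[:, 2] = 6 - responses[:, 2]  # reverse-score on a 1–5 scale (1 becomes 5)

    totals = responses.sum(axis=1)         # simple sum scores
    for i in range(responses.shape[1]):
        rest = totals - responses[:, i]    # corrected: exclude the item itself
        r = np.corrcoef(responses[:, i], rest)[0, 1]
        print(f"item {i + 1}: corrected item-total r = {r:.2f}")

Items with low corrected item-total correlations are candidates for revision or removal before norming.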
13
Q

How do these procedures relate to content validity?

A

Content validity is achieved by ensuring that the scale items fully and appropriately represent the construct domain.

Proper item development, expert review, and comprehensive coverage of the construct’s facets are essential to content validity.

Without extensive and systematic item development and content validation, the scale risks omitting important aspects or including irrelevant items, compromising the measure’s accuracy and usefulness.
