Construct
Hypothetical factor that cannot be observed directly; its existence is inferred from certain behaviors and assumed to follow from certain circumstances
Operational Definition
A way of attaching a system of measurement that can be replicated and that serves as a faithful PROXY of the construct.
We use different kinds of measures (e.g. questionnaires, tests) to act as faithful proxies of what we really want to measure
Reliability
Reproducibility of a measurement; the extent to which measures of the same phenomenon are consistent and repeatable; measures that are high in reliability contain minimal measurement error.
Validity
The extent to which a measure of a construct truly measures that construct and not something else.
Classical Test Theory
AKA “Classical Reliability Theory”
1. True score is constant
2. Error is random
3. Correlation between the true scores and error is 0
4. Correlation between errors on different measurement occasions is also zero (because error is assumed to be random)
X(observed score) = TRUE SCORE + ERROR
Random Error and True Score
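A minimal sketch in R (simulated data; all variable names and values are assumed for illustration) of the X = TRUE SCORE + ERROR decomposition and of the assumptions that error is random and uncorrelated with the true scores:
set.seed(1)
n_people   <- 200
true_score <- rnorm(n_people, mean = 70, sd = 10)   # each person's constant true score
error      <- rnorm(n_people, mean = 0, sd = 5)     # random error, unrelated to the true scores
observed   <- true_score + error                    # X (observed score) = TRUE SCORE + ERROR
mean(error)              # near 0: error is random
cor(true_score, error)   # near 0: assumption 3
var(observed)            # roughly var(true_score) + var(error), since the errors are random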
Generalizability Theory
Error is separated into pieces, each of which can be estimated (if we collect the data properly)
1. To better understand which aspect of the measurement is producing the most error, so we can understand it better.
2. Quickly identify where the error is coming from and determine how good the measure is
3. Explicitly connects measurement operations to the purpose of measurement
Random Error, True Score, Test-Retest Error, Rater Error, and Other identifiable sources of Error.
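A minimal sketch in R (simulated data; names and variance values are assumed) of the core idea that, unlike in classical test theory, the error is split into separately estimable pieces such as rater error and test-retest error:
set.seed(2)
n <- 500
true      <- rnorm(n, mean = 50, sd = 10)   # stable true scores
rater_err <- rnorm(n, sd = 3)               # error attributable to raters
time_err  <- rnorm(n, sd = 2)               # error attributable to test-retest occasions
other_err <- rnorm(n, sd = 4)               # other identifiable / leftover random error
observed  <- true + rater_err + time_err + other_err
# Total error variance decomposes (approximately) into its separate pieces
var(observed - true)
var(rater_err) + var(time_err) + var(other_err)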
Temporal Reliability
Reproducibility of values of a variable when you measure the same subjects two or more times. A form of reliability in which a test is administered on two separate occasions and the correlation between the scores is calculated.
1. Test-Retest
2. Parallel Test
Test-Retest
Administer the same test two or more different times and calculate the correlation between the scores. This helps us understand how stable the test is over time.
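A minimal sketch in R (simulated scores; names and values assumed): the same test given twice to the same people, with the correlation between the two administrations serving as the test-retest reliability coefficient:
set.seed(3)
time1 <- rnorm(50, mean = 100, sd = 15)        # scores at the first administration
time2 <- time1 + rnorm(50, mean = 0, sd = 5)   # the same people retested, plus some error
cor(time1, time2)                              # test-retest reliability coefficient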
Parallel Test
Giving two different versions of a test that are supposed to assess the same construct (or constructs), then finding the correlation between scores on the two versions.
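The analogous sketch in R for parallel forms (again simulated; names and values assumed): two versions of a test built to assess the same construct, given to the same people and correlated:
set.seed(4)
ability <- rnorm(50)                               # the construct both forms are meant to assess
form_a  <- 100 + 15 * ability + rnorm(50, sd = 5)  # version A of the test
form_b  <- 100 + 15 * ability + rnorm(50, sd = 5)  # version B of the test
cor(form_a, form_b)                                # parallel-forms reliability coefficient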
Internal Consistency Reliability
Measures the extent to which the items within a measure yield consistent scores each time it is administered, all else being equal.
(The reliability of a 10-item test would be higher than that of 5 similar items.)
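The parenthetical above can be made concrete with the Spearman-Brown prophecy formula, a standard result predicting how reliability changes when a test is lengthened (the starting reliability of 0.70 below is an assumed value):
# Spearman-Brown: reliability of a test lengthened by a factor k
spearman_brown <- function(r, k) (k * r) / (1 + (k - 1) * r)
r_5items <- 0.70                  # assumed reliability of a 5-item test
spearman_brown(r_5items, k = 2)   # predicted reliability of 10 similar items (about 0.82)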
Split-half reliability
A form of reliability in which one half of the items on a test are correlated with the remaining items.
For example, items drawn from one already established tool can be combined with items from another established tool, and the reliability of the merged set determined in this way.
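A minimal sketch in R (simulated item responses; names assumed): the odd-numbered items are correlated with the even-numbered items, and the Spearman-Brown correction is applied because each half is only half the length of the full test:
set.seed(5)
n_people <- 100; n_items <- 10
ability <- rnorm(n_people)
items   <- sapply(1:n_items, function(i) ability + rnorm(n_people, sd = 1))  # item scores
odd_half  <- rowSums(items[, seq(1, n_items, by = 2)])   # score on odd items
even_half <- rowSums(items[, seq(2, n_items, by = 2)])   # score on even items
r_half <- cor(odd_half, even_half)                       # split-half correlation
(2 * r_half) / (1 + r_half)                              # Spearman-Brown corrected full-test reliability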
Cronbach’s Alpha
Measures the internal reliability within a test by comparing each item to all the other items.
Each item is compared with every other item to ensure that the rating scale is consistent.
Used with Likert-scale-type items (e.g., ratings of severity ranging from 1-10).
See formula in notes (can be calculated in R)
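A minimal sketch of that calculation in R (simulated Likert-type items scored 1-10; all names assumed), using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the psych package's alpha() function computes the same statistic on real data:
set.seed(6)
n_people <- 200; k <- 10
ability <- rnorm(n_people)
# simulated Likert-type items, clamped to the 1-10 range
items <- sapply(1:k, function(i) pmax(1, pmin(10, round(5.5 + 2 * ability + rnorm(n_people, sd = 1.5)))))
item_vars <- apply(items, 2, var)   # variance of each item
total_var <- var(rowSums(items))    # variance of the total scores
alpha <- (k / (k - 1)) * (1 - sum(item_vars) / total_var)
alpha
# psych::alpha(items)               # the psych package computes the same statistic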
Kuder-Richardson coefficient (KR-20)
Variables that are dichotomous in nature (yes/no or true/false)
Use the KR-20 equation when the items are dichotomous rather than continuous in nature
See notes for formula
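A minimal sketch of KR-20 in R (simulated true/false items; names assumed), using the standard formula KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores), where p is the proportion answering each item yes/true and q = 1 - p:
set.seed(7)
n_people <- 200; k <- 20
ability <- rnorm(n_people)
items <- sapply(1:k, function(i) as.integer(ability + rnorm(n_people) > 0))  # dichotomous 0/1 items
p <- colMeans(items)               # proportion endorsing each item
q <- 1 - p
total_var <- var(rowSums(items))   # variance of the total scores
kr20 <- (k / (k - 1)) * (1 - sum(p * q) / total_var)
kr20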
Interrater Reliability
The reliability of a test’s results across multiple test administrators (that is, the stability of test results for the same test takers but across different test administrators or scorers)
Can be determined by:
(number of agreements) / (number of agreements + disagreements)
Intraobserver or Interrater Reliability
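A minimal sketch in R (assumed ratings from two hypothetical raters) of the agreement calculation above:
rater1 <- c("yes", "no", "yes", "yes", "no", "yes", "no",  "yes")
rater2 <- c("yes", "no", "no",  "yes", "no", "yes", "yes", "yes")
agreements    <- sum(rater1 == rater2)
disagreements <- sum(rater1 != rater2)
agreements / (agreements + disagreements)   # proportion of agreement (here 6/8 = 0.75)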
Cohen’s Kappa
Used, for example, to determine agreement between diagnoses made by two doctors
K = (Po - Pe) / (1 - Pe)
WHERE
Po = proportion of observed agreements between the raters
Pe = probability of agreement occurring by chance
How do you calculate the probability of agreement by chance?
Probability that the raters would agree simply by random chance:
Calculated by considering the marginal totals (row and column totals) in a contingency table: for each category, multiply the two raters’ marginal proportions (marginal total divided by the total N) and sum these products across categories
Cohen’s Kappa is used for nominal or other categorical data
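A minimal sketch in R (hypothetical diagnoses from two doctors) putting the pieces together: Po from the diagonal of the contingency table, Pe from the products of the marginal totals, and K = (Po - Pe) / (1 - Pe):
doctor1 <- c("depressed", "depressed", "healthy", "healthy", "depressed", "healthy", "healthy",   "depressed")
doctor2 <- c("depressed", "healthy",   "healthy", "healthy", "depressed", "healthy", "depressed", "depressed")
tab <- table(doctor1, doctor2)                        # contingency table of the two raters
po  <- sum(diag(tab)) / sum(tab)                      # Po: observed agreement (the diagonal)
pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # Pe: chance agreement from the marginal totals
(po - pe) / (1 - pe)                                  # Cohen's Kappa (here 0.5)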
Standards of Reliability
Factors that affect reliability
Item Response Theory (IRT)
A framework for evaluating test takers’ responses to specific test items and, through those responses, understanding their underlying abilities or traits. LATENT TRAIT
Examples:
Adaptive Testing: the computer-administered SAT, for example, is adaptive in nature, with questions becoming progressively harder or easier based on responses to the previous questions
Clinical Assessment: Often focused on finding solid diagnostic items which could include the following questions from PHQ-9:
-PHQ-9, Item5, Poor appetite or overeating
-PHQ-9, Item9, Thoughts you would be better off dead, or of hurting yourself
Item Response Theory parameters on which items are characterized include: item discrimination (a), item difficulty (b), and guessing (c).
With all three of these parameters, IRT can reliably estimate ability at low, middle, and high levels of ability.
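A minimal sketch in R (assumed parameter values) of the three-parameter logistic model behind these ideas: the probability of answering or endorsing an item given ability theta, discrimination a, difficulty b, and guessing c:
# 3PL item response function: P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
p_3pl <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))
theta <- seq(-3, 3, by = 1)              # test takers of low, middle, and high ability
p_3pl(theta, a = 1.5, b = 0.0, c = 0.2)  # an item of moderate difficulty
p_3pl(theta, a = 1.5, b = 1.5, c = 0.2)  # a harder item: only high-ability takers are likely to pass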