Unit 4 Key Ideas Flashcards

(24 cards)

1
Q

What does sampling mean?

A

The act of selecting observational units (the things we’re interested in studying) to collect data from

2
Q

What does sample mean?

A

The observational units (things we’re interested in) chosen to collect data from

3
Q

What is the population of interest?

A

A specific group of individuals or things that researchers or data scientists are interested in for a survey or study (i.e., the observational units we are interested in studying). This group is defined by specific characteristics (inclusion criteria).

4
Q

What are the inclusion criteria?

A

Specifications used by data scientists to define who or what is in the population of interest

5
Q

What are the three main sampling strategies?

A

Volunteer sampling, Quota sampling, Probability sampling

6
Q

Define volunteer sampling

A

Making the study/survey available to any respondent who meets the inclusion criteria

7
Q

Define quota sampling

A

(Similar to volunteer sampling, with one exception.) Making the study/survey available to any respondent who meets the inclusion criteria, but with SPECIFIC QUOTAS for specific types of respondents

(e.g. Researchers at the Cleveland Clinic are interested in performing a study on a new heart disease treatment. Between January and December of 2022, they ask all patients with heart disease if they are interested in participating in this study until they’ve recruited 50 adults over the age of 65 and 50 adults between the ages of 40 and 65. The blood pressure of all patients is recorded multiple times a day by researchers. Participants who agree to participate in the study are compensated with a gift card to a retail store of their choosing.)

8
Q

Define probability sampling (i.e., random sampling)

A

Assigning all eligible observational units a non-zero probability of being included in the sample (i.e., every observational unit has a chance of being selected), and then randomly selecting some observational units.
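
A minimal sketch of the simplest case, in which every eligible unit gets the same non-zero chance of selection (a simple random sample). The patient IDs, the sample size, and the function name `simple_random_sample` are made up for illustration; they are not part of the course material.

```python
import random


def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; each unit has the same non-zero chance."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(units, n)


# Hypothetical list of 100 eligible observational units.
eligible_units = [f"patient_{i}" for i in range(1, 101)]

sample = simple_random_sample(eligible_units, n=10, seed=42)
print(sample)
```

Other probability designs can give units unequal chances, as long as every chance is non-zero and known.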

9
Q

What is the social exchange theory?

A

Claims that, in exchange for giving data, respondents must be given something in return that is equivalent to the effort they spend providing that data

10
Q

Define survey incentive

A

Social exchange theory put into practice: the act of offering something in return for a respondent providing their data (e.g., gift cards, coupons, entry in a raffle)

11
Q

Define non-response

A

When a respondent selected for the sample chooses not to respond to all or part of the survey

12
Q

What rate of non-response becomes problematic? In what particular type of sampling?

A

A non-response rate of 30% or higher becomes problematic, especially in probability samples
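
A small worked example of checking a survey against that 30% threshold. It assumes the simplest definition of the rate (selected units who did not respond at all, divided by all selected units) and uses made-up counts; partial non-response would need its own rule.

```python
def non_response_rate(n_selected, n_responded):
    """Share of selected units that did not respond."""
    if n_selected <= 0:
        raise ValueError("n_selected must be positive")
    return (n_selected - n_responded) / n_selected


# Hypothetical survey: 200 people selected, 130 responded.
rate = non_response_rate(n_selected=200, n_responded=130)
is_problematic = rate >= 0.30  # threshold taken from the card above

print(f"non-response rate: {rate:.0%}, problematic: {is_problematic}")
```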

13
Q

What is data validation?

A

The act of ensuring that every collected value, for each observation and each attribute, is valid

14
Q

What is data cleaning? What should you not do when data cleaning?

A

Data cleaning is the removal of invalid values from a data set. You should never remove an entire row just because some of its values are missing; doing so throws away the row’s valid values and may lead to gaps in knowledge.
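
A minimal sketch of that idea: an invalid value is replaced with a missing marker (`None`) and the row itself is kept. The `height_cm` attribute, the valid range of 50 to 250, and the function name `clean_heights` are assumptions chosen for illustration.

```python
def clean_heights(rows, low=50.0, high=250.0):
    """Replace invalid height values with None instead of dropping the row."""
    cleaned = []
    for row in rows:
        value = row.get("height_cm")
        # Validation step: the value must be a number inside the allowed range.
        valid = isinstance(value, (int, float)) and low <= value <= high
        cleaned.append({**row, "height_cm": value if valid else None})
    return cleaned


# Hypothetical data: row 2 has an impossible height, row 3 is already missing.
rows = [
    {"id": 1, "height_cm": 172.0},
    {"id": 2, "height_cm": -5.0},
    {"id": 3, "height_cm": None},
]

cleaned = clean_heights(rows)
print(cleaned)
```

All three rows survive, so any other attributes they carry remain available for analysis.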

15
Q

What is the external validity of a sample?

A

The extent to which the observational units in the sample are like the ones in the population of interest

16
Q

What is internal validity? What defines it in a study design?

A

The extent to which a study’s design supports making a causal inference. It depends on how similar the observational units in the control group are to the observational units in the treatment group (OG group).

17
Q

What does a study with high internal validity include?

A

1) Thorough causal graph
2) Matching/blocking based on key factors identified in causal graphs (moderators, confounders, mediators)
3) Incorporation of random assignment
4) A thoughtful evaluation of the study’s internal validity
5) Computation of average treatment effect
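
For item 5, one common way to estimate the average treatment effect is the difference between the mean outcome of the treatment group and the mean outcome of the control group. The sketch below assumes that estimator and uses made-up blood pressure readings; the course may compute it differently.

```python
from statistics import mean


def average_treatment_effect(treatment_outcomes, control_outcomes):
    """Difference in mean outcomes: treatment group minus control group."""
    return mean(treatment_outcomes) - mean(control_outcomes)


# Hypothetical systolic blood pressure readings after the study period.
treatment = [120, 118, 125, 117]  # mean is 120
control = [130, 128, 133, 129]    # mean is 130

ate = average_treatment_effect(treatment, control)
print(ate)  # -10
```

The difference only supports a causal reading when the two groups are otherwise alike, which is what random assignment and matching/blocking are for.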

18
Q

Representativeness

A

The extent to which the observational units in the sample are similar to the observational units in the population of interest.

Think of “Who’s not here?”

Does my survey on Americans’ favorite foods include everyone or just some?

19
Q

What is the study of how and why individuals choose to respond to surveys?

A

Survey response

20
Q

Define parameters

A

Characteristics of an entire population. For example, if you were studying the average height of all adults in a city, the true average height would be a parameter. Since it’s impractical to measure every single person in a whole city, the parameter is often estimated with a sample statistic, which is computed from a sample taken from the population of interest.
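
A short simulation of the height example, to show the difference between the two numbers. The population is simulated (10,000 made-up heights with a mean near 170 cm), so here the parameter can be computed directly, which is rarely possible in practice.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: heights in cm of every adult in a city.
population_heights = [rng.gauss(170, 10) for _ in range(10_000)]
parameter = mean(population_heights)  # true average; normally unknown

# A simple random sample of 100 adults, and the statistic computed from it.
sample_heights = rng.sample(population_heights, 100)
estimate = mean(sample_heights)

print(f"parameter: {parameter:.1f} cm, sample statistic: {estimate:.1f} cm")
```

The two values are usually close but not identical; the gap is the price of measuring 100 people instead of 10,000.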

21
Q

Define generalization

A

The act of using a sample statistic to estimate the value of a parameter.

22
Q

Define estimate

A

The value we use as a stand-in for our parameter.

23
Q

How can the sample mean (the estimate of the parameter) have the same value as the actual parameter?

A

The observational units are thoughtfully selected using a reliable sampling strategy (e.g., volunteer sampling is often biased and unreliable, whereas probability and quota sampling are more reliable). When everyone is represented in the sample, observing the sample is equivalent to observing all individuals.

(Think representativeness and “Who’s not here?”)

24
Q

What does a sample with high external validity include?

A

1) Detailed list of specific inclusion criteria
2) Incorporation of random sampling
3) A thoughtful evaluation of the sample’s external validity