Section 7: sampling, random error, statistical inference, selection bias & different sampling methods Flashcards

(99 cards)

1
Q

What is a sample?

A
  • selected subset of a source pop
  • should be representative of the source pop
2
Q

What is the purpose of taking a sample?

A

to study smth that we can’t study in the whole pop coz of practical restrictions (finance, time)

3
Q

Can research be conducted in whole populations?

A

Yes but,
* almost always conducted in samples
* in rare cases research is conducted in the whole pop, but usually only when the pop is very small

4
Q

What is sampling?

A

process of selecting a # of individuals from all individuals found in a source pop

5
Q

What is a sampling frame?

A

list (or database) containing all individuals in pop & is used for sampling

6
Q

What are sampling units?

A
  • Individuals to be potentially selected
  • most of the time individuals, but can also have larger sampling units (e.g. families, streets, hospitals, schools, etc)
7
Q

What is the source population?

A

group of all individuals in which we’re interested to assess parameters

8
Q

What can the source population be?

A
  • vague (e.g. total pop of a country or city)
  • specific (e.g. smokers of a country, all patients with heart disease, etc)
9
Q

Who is the source pop in descriptive research?

A

should be restricted to country or region where sample was taken

10
Q

Who is the source pop for analytical research?

A

can be more general regarding source pop but depends on RQ

11
Q

Who is the source population if a study investigated the prevalence of obesity in Cyprus, by recruiting a random sample of adults?

A
12
Q

Who is the source population if a study investigated the association between smoking and oesophageal cancer among a sample of 35-65 year olds in Canada?

A
13
Q

Who is the source pop if a study investigated the association between educational attainment and stroke among a sample of elderly individuals in Sweden?

A
14
Q

What is a population parameter?

A

measurement of a quantity (or association) in a population that we’re interested in. e.g.:

  • mean age
  • prev of obesity
  • mean diff in BP between men & women
  • OR for association between smoking & cancer
15
Q

What is an estimate?

A

measurement of a quantity (or association) in a sample that’s supposed to represent the true quantity or association in the source population (parameter)

16
Q

What is sampling variation?

A

difference between different sample estimates derived from the same source population
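
A minimal Python sketch of sampling variation, assuming a hypothetical source population of 10,000 simulated ages. The population, the sample size of 50 and the seed are all invented for illustration. Each repeated sample gives a slightly different estimate of the same parameter.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical source population: 10,000 simulated ages (not real data).
population = [rng.gauss(45, 12) for _ in range(10_000)]
population_mean = statistics.mean(population)  # the parameter

# Draw 5 independent random samples of 50 and estimate the mean from each.
estimates = [statistics.mean(rng.sample(population, 50)) for _ in range(5)]

print(f"parameter (population mean): {population_mean:.2f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i} estimate: {estimate:.2f}")
```

The spread between the five printed estimates is the sampling variation.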

17
Q

What is sampling error?

A

statistical error that occurs when a sample doesn’t represent entire population.

18
Q

What is sampling error also referred to as and why?

A

Because sampling error is a result of chance, it is usually referred to as random error

19
Q

How does sample size influence the magnitude of the random error?

A

a larger sample size reduces the magnitude of the random error

20
Q

Why does research often rely on samples instead of whole populations?

A

Because of cost & time constraints

21
Q

What is standard error of the mean (SEM)?

A

variability of means that might be calculated from repeated samples from the same population

22
Q

What is the relationship between sample size & SEM?

A

Larger sample sizes decrease the SEM
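
A minimal sketch of why this holds, assuming the usual formula SEM = sample standard deviation / √n (the formula is standard but is not stated on the card). The BP readings are hypothetical values invented for illustration.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SEM = sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical systolic BP readings (mmHg), invented for illustration.
small_sample = [118, 125, 131, 122, 140, 115, 128, 135]
# Same values repeated four times: similar spread, four times the sample size.
large_sample = small_sample * 4

sem_small = standard_error_of_mean(small_sample)
sem_large = standard_error_of_mean(large_sample)
print(f"n={len(small_sample)}: SEM={sem_small:.2f}")
print(f"n={len(large_sample)}: SEM={sem_large:.2f}")
```

Because n sits under a square root, quadrupling the sample size roughly halves the SEM.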

23
Q

What can the standard error be used to calculate?

A

degree of uncertainty around an estimate: the 95% confidence interval

24
Q

What do confidence intervals indicate?

A

range of values, derived from sample data, that is likely to contain the true population parameter (like the mean or proportion) with a specified level of confidence (often 95%).

25
How are confidence intervals calculated?
using standard error of the mean
26
How is the lower confidence interval calculated?
* sample mean - (z × standard error)
* if 95% CIs = sample mean - (1.96 × standard error)
27
How is the upper confidence interval calculated?
* sample mean + (z × standard error)
* if 95% CIs = sample mean + (1.96 × standard error)
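
The lower and upper limits can be sketched in Python as below. The function name `ci_bounds` and the example mean and standard error are hypothetical, chosen only to show the arithmetic.

```python
def ci_bounds(sample_mean, standard_error, z=1.96):
    """Return (lower, upper) limits: sample mean -/+ z * standard error."""
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical example: mean BP of 120.0 mmHg with a standard error of 2.0.
lower, upper = ci_bounds(120.0, 2.0)
print(f"95% CI: {lower:.2f}; {upper:.2f}")
```

With z = 1.96 the margin is 1.96 × 2.0 = 3.92, so the interval runs from 116.08 to 123.92.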
28
What does the z-value depend on?
depends on significance level (alpha) that is chosen
29
What significance level (alpha) is most commonly used in biomed research?
Alpha = 0.05 (5% significance level)
30
What is the z-value for a 5% significance level?
1.96
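
A short sketch of where 1.96 comes from, assuming a two-sided interval based on the standard normal distribution (an assumption the cards imply but do not spell out). It uses `statistics.NormalDist` from the Python standard library; the helper name `z_value` is hypothetical.

```python
from statistics import NormalDist

def z_value(alpha):
    """Two-sided critical value: alpha is split equally between both tails."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_value(0.05), 2))  # 1.96
print(round(z_value(0.01), 2))  # 2.58
```

A stricter significance level (smaller alpha) gives a larger z-value and therefore a wider confidence interval.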
31
What is the confidence interval for a significance level of 5%?
For significance level of 5%, get 95% confidence intervals
32
For the association between smoking & BP, if the mean difference is 12.3 (95% CI: 10.8; 13.8), how would the CIs be interpreted?
We're 95% certain that the true pop mean difference lies between 10.8 & 13.8
33
For the association between age & BP, interpret the CIs if: the regression coefficient 3.6 (95% CI: 0.5; 6.7)
95% certain that the true pop regression coefficient lies between 0.5 & 6.7
34
For the association between obesity & hypertension, if the OR is 2.10 (95% CI: 1.80; 2.40), how would this be interpreted?
95% certain that true pop OR lies between 1.80 & 2.40
35
What does the Confidence interval indicate?
level of uncertainty around the sample estimate
36
What does a 95% CI indicate?
a range within which we can be 95% certain that the true pop measure lies.
37
What is the relationship between 95% CI and sample size?
larger the sample size, the narrower the 95% CI
38
What are 2 possibilities for any given association between 2 factors (variables)?
* association doesn't exist (2 variables aren't linked)
* association does exist (2 are linked)
39
What is the Null hypothesis H0?
always states that there is no association between the 2 variables
40
What is the alternative hypothesis (H1)?
always states that there is an association between the 2 variables
41
What significance level is often used?
level of 5% = p-value of <0.05 to conclude statistical significance
42
What is the 95% CI & P-value affected by?
* sample size
* magnitude of the association
43
What does it mean if we have an estimate with a p-value of <0.05?
statistically significant, so the finding is unlikely to be due to chance alone (chance is made very unlikely, not strictly excluded)
44
If we assume the presence of an association in the population....
* **large sample sizes** will give **smaller** p-values
* estimates of **large magnitude** will also give **smaller** p-values
45
What can the 95% CI also be used for?
accepting or rejecting the null hypothesis
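
A minimal sketch of that check, assuming the usual null values: 0 for a difference or regression coefficient and 1 for a ratio such as the OR. These null values are standard but are not stated on the cards. The first two intervals are taken from the cards above; the third is hypothetical.

```python
def excludes_null(lower, upper, null_value):
    """True if the null value lies outside the interval [lower, upper]."""
    return not (lower <= null_value <= upper)

# Mean difference 12.3 (95% CI: 10.8; 13.8): null value is 0.
print(excludes_null(10.8, 13.8, null_value=0))   # True
# OR 2.10 (95% CI: 1.80; 2.40): null value is 1.
print(excludes_null(1.80, 2.40, null_value=1))   # True
# Hypothetical OR with 95% CI 0.85; 1.60, which contains 1.
print(excludes_null(0.85, 1.60, null_value=1))   # False
```

When the 95% CI excludes the null value, the null hypothesis is rejected at the 5% level; when the interval contains it, the null hypothesis is not rejected.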
46
What is random error?
Error introduced solely by **chance** and is inherent in the sampling process
47
What is systematic error?
(AKA bias) introduced via **man-made** actions relating to the conduct of a study
48
What does selection bias lead to?
A biased sample which almost always = biased estimates
49
What are the types of sampling methods?
* Probability (random) sampling
* systematic sampling
* non-probability sampling
50
What is probability (random) sampling?
* sample selected by probabilistic methods
* involves random selection, allowing strong statistical conclusions to be made about the whole group
51
What is systematic sampling?
Sample selected according to simple, systematic rule
52
What is non-probability sampling?
* Sample selected by easily employed (convenient) methods
* involves non-random selection based on convenience or other criteria that allow easy data collection
53
What are the types of probability sampling methods?
* simple random sampling
* stratified RS
* cluster sampling
* multi-stage sampling
54
What are the types of systematic sampling?
* simple systematic sampling
* proportional quota sampling
55
What is simple random sampling?
all individuals in sampling frame have **same probability** of being selected independently of all others
56
When is simple random sampling mostly used?
in quantitative research
57
What does simple random sampling ensure given a large sample size?
chosen individuals are **representative** of source pop in terms of:
* demography (age, sex, ethnicity)
* other factors (history, disease status, lifestyle factors)
58
What is the procedure for simple random sampling?
* use tools like random # generators or other techniques that are based entirely on chance
* e.g. random sample of 100 cancer patients from registry of 1,000: assign a # to each patient in database & use generator to select 100
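
The registry example can be sketched with Python's standard `random` module. The patient identifiers, the function name and the seed are hypothetical, used only so the run is repeatable.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size individuals without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)

# Hypothetical registry: 1,000 patients, each assigned a number.
registry = [f"patient_{i:04d}" for i in range(1, 1001)]
selected = simple_random_sample(registry, 100, seed=42)

print(len(selected))  # 100
print(selected[:5])
```

`random.Random.sample` selects without replacement, so no patient can be picked twice.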
59
What are the advantages of simple random sampling?
* ensures **representative sample** from source pop (provided sample is large enough)
* **cheaper & less time** consuming than other sampling methods
* ideal for **quantitative** studies & **testing hypotheses**
60
What are the disadvantages of simple random sampling?
* if **sampling frame is too large/pop is geographically diverse**, can be impractical to perform (difficult to access lists of full pop)
* **if large sample required**, simple random sampling can be time consuming & costly
61
What is stratified random sampling?
* same principles as simple random sampling but within **strata (subgroups)** of pop
* strata are defined in terms of key demographic characteristics
62
What is an example of stratified random sampling?
* company has 800 female employees & 200 male employees
* need sample of 100
* sort pop into 2 strata based on gender
* want to ensure sample reflects gender balance of company, so you use random sampling within each group, selecting 80 women & 20 men = representative sample of 100
63
What is the procedure of stratified random sampling?
* identify source pop
* set up sampling frame
* decide on sample size
* decide on pre-defined pop strata
* based on overall proportions of pop, calculate how many ppl should be sampled from each stratum
* **randomly** select individuals to fill strata
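
The procedure can be sketched in Python using the company example (800 women, 200 men, sample of 100). The employee identifiers, function name and seed are hypothetical, and the sketch assumes proportional allocation with simple rounding.

```python
import random

def stratified_random_sample(strata, sample_size, seed=None):
    """strata: dict mapping stratum name -> list of individuals."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding may need manual adjustment
        # when the shares do not come out as whole numbers.
        quota = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, quota)  # random within stratum
    return sample

# Hypothetical sampling frame, already sorted into 2 strata.
company = {
    "women": [f"w_{i}" for i in range(800)],
    "men": [f"m_{i}" for i in range(200)],
}
sample = stratified_random_sample(company, 100, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```

The printed counts are 80 women and 20 men, matching the 80:20 balance of the company.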
64
What are the disadvantages of stratified random sampling?
* more time consuming than simple RS
* can't be used when researchers can't confidently classify every member of pop into a subgroup
* higher complexity can give rise to errors (e.g. stratification not done properly)
64
What are the advantages of stratified random sampling?
* allows more precise conclusions to be drawn by ensuring every subgroup is properly represented in sample
* enables comparison of pop sub-groups
65
What is cluster sampling?
* Based on hierarchical structure of natural clusters (groups) of individuals within the pop
* natural clusters can be hospitals, schools, streets, city districts, etc
66
What does cluster sampling involve?
taking a random sample of these natural clusters & selecting all individuals in selected clusters
67
What is the sampling frame in cluster sampling?
list of all clusters
68
What is the procedure for cluster sampling?
1. identify source pop
2. set up sampling frame (comprised of clusters)
3. decide on sample size (# of clusters & individuals)
4. randomly select clusters from sampling frame
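
A minimal Python sketch of these steps, assuming a hypothetical sampling frame of 10 schools with 30 pupils each. School names, pupil identifiers and the seed are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then take every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    individuals = [person for name in chosen for person in clusters[name]]
    return chosen, individuals

# Hypothetical sampling frame: the list of clusters (schools).
schools = {
    f"school_{s:02d}": [f"pupil_{s:02d}_{p:02d}" for p in range(30)]
    for s in range(10)
}
chosen, pupils = cluster_sample(schools, 3, seed=3)
print(chosen)
print(len(pupils))  # 90
```

Randomness enters only when choosing clusters; within a chosen cluster everyone is included.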
69
What are the advantages of cluster sampling?
can **reduce cost & time** of data collection esp when pop is **spread over large area**
70
What are the disadvantages of cluster sampling?
* considerable differences between clusters can cause errors
* difficult to guarantee that sampled clusters really are representative of whole pop
71
When can the representativeness be compromised in cluster sampling?
Representativeness can be compromised if:
* too few clusters selected
* clusters too specific
* clusters contain too few individuals
72
What is multi-stage sampling?
* Uses hierarchical structure of natural clusters (groups) of individuals within the pop
* similar to cluster sampling
72
What are the advantages of multi-stage sampling?
* can improve sampling representativeness (compared to simple RS), esp if pop is geographically diverse or sample is too small
* cheaper & less time consuming (depending on # of stages though)
73
What are the disadvantages of multistage sampling?
representativeness of sample compromised if:
* Too few clusters are selected and/or
* Clusters are too specific and/or
* Clusters contain too few individuals
74
What is (simple) systematic sampling?
* sample selected according to simple, **systematic** rule but **not** randomly
75
What are examples of simple systematic sampling?
Selecting ppl from sampling frame:
* whose name starts with certain letter
* who were born in selected month
* selecting every 2nd/5th/10th person from sampling frame
76
What is the procedure of (simple) systematic sampling?
* identify source pop
* set up sampling frame
* decide on sample size
* systematically select individuals from sampling frame
* example of systematic rule: recruit every 3rd individual
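
The "every 3rd individual" rule can be sketched with a list slice. The sampling frame of 30 numbered people is hypothetical.

```python
def systematic_sample(sampling_frame, step):
    """Select every step-th individual, starting with the step-th one."""
    return sampling_frame[step - 1::step]

# Hypothetical sampling frame of 30 people, in list order.
frame = [f"person_{i}" for i in range(1, 31)]
selected = systematic_sample(frame, 3)
print(selected)
```

This picks person_3, person_6, and so on up to person_30 (10 people). No chance is involved, so any pattern in how the frame is ordered carries over into the sample.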
77
What are the advantages of (simple) systematic sampling?
* acceptable, more convenient, alternative approach if random sampling isn't possible
* faster & possibly cheaper
78
What are the disadvantages of (simple) systematic sampling?
**representativeness** of sample can be **compromised** if system of choice selects individuals in **non-random** way
79
What is the procedure of proportional quota sampling?
* identify source pop
* set up sampling frame
* decide on sample size
* decide on pre-defined pop strata
* select individuals to fill strata non-randomly (i.e. using systematic sampling)
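
A minimal sketch of these steps, assuming the simplest possible systematic rule: walk the sampling frame in order and accept people until each stratum's quota is full. The frame (60 women, 40 men), the function name and the rule itself are illustrative assumptions, not part of the cards.

```python
def proportional_quota_sample(sampling_frame, stratum_of, sample_size):
    """Fill proportional quotas per stratum, taking people in frame order."""
    sizes = {}
    for person in sampling_frame:
        stratum = stratum_of(person)
        sizes[stratum] = sizes.get(stratum, 0) + 1
    total = len(sampling_frame)
    quotas = {s: round(sample_size * n / total) for s, n in sizes.items()}

    sample = {s: [] for s in quotas}
    for person in sampling_frame:  # non-random: first come, first selected
        stratum = stratum_of(person)
        if len(sample[stratum]) < quotas[stratum]:
            sample[stratum].append(person)
    return sample

# Hypothetical frame of (id, sex) records: 60 "F" and 40 "M".
frame = [(i, "F" if i % 5 < 3 else "M") for i in range(100)]
sample = proportional_quota_sample(frame, lambda person: person[1], 10)
print({sex: [pid for pid, _ in people] for sex, people in sample.items()})
```

The quotas (6 women, 4 men) preserve the 60:40 structure, but everyone selected comes from the start of the frame, which is exactly how representativeness can be compromised.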
80
What are the advantages of proportional quota sampling?
* acceptable, more convenient, alternative approach if stratified random sampling isn't possible
* Compared to simple systematic sampling, could ensure original pop structure as it uses predefined population strata
81
What are the disadvantages of proportional quota sampling?
* representativeness of sample may be compromised as individuals are selected in a non-random fashion
82
What is convenience sampling?
* most frequent example of non-probability sampling
* Individuals are selected in non-random fashion, solely based on convenience (i.e. they are easy to access)
83
What's an example of convenience sampling?
* Researching anxiety of university students, so after a class of an elective course, you ask your fellow students to complete a survey on the topic.
* This is a convenient way to gather data, but as you only surveyed students in the same year and elective course as you, the sample is not representative of all the students of your class nor the university.
84
What is the procedure of convenience sampling?
* identify source pop
* set up sampling frame
* decide on sample size
* conveniently select individuals
85
What are the advantages of convenience sampling?
* cheap
* fast
* convenient
86
What are the disadvantages of convenience sampling?
representativeness of sample definitely **compromised** because individuals are selected in a **non-random fashion**
87
What are the types of non-random sampling methods?
* Convenience sampling
* Purposive sampling
* Voluntary response sampling
* Snowball sampling
88
What is purposive sampling?
select individuals with the characteristics the researcher wants in the study
89
What is voluntary response sampling?
ppl volunteer themselves for the study
90
What is snowball sampling?
participants recruit other participants
91
What does the choice of sampling method in order to minimize selection bias depend on?
* aim of study
* nature of source pop
* sample size
* practical issues (financial resources, time, etc)
92
Which sampling method to choose in order to minimize selection bias when there are **no financial & time constraints?**
* always advised to use **probability (random) sampling** techniques to minimize selection bias
* **stratified RS** is ideal method if **sample is small**
93
What can we assume when non-random sampling techniques have been used?
* representativeness of sample is always questionable
* assume that **selection bias** is operating to some extent
94
What should always be avoided whatever the study design?
Convenience sampling!
95
What do we do with confounding?
We adjust for it (e.g., stratify, regress)
96
What do we do for effect modification?
report it, can't adjust for it
97
What is stratification?
Dividing (stratifying) your study population into subgroups (strata) based on levels of a confounder & then analyzing exposure–outcome relationship within each stratum. E.g.
* Stratum 1: non-smokers
* Stratum 2: smokers
You then estimate coffee–heart disease association separately within each group.
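
A minimal sketch of the coffee example, using the odds ratio as the measure of association. All counts are hypothetical and were chosen so that smoking confounds the crude association; they are not real data.

```python
def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """OR = (a * d) / (b * c) for a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases)

# Hypothetical counts per stratum: (coffee & heart disease, coffee & no
# heart disease, no coffee & heart disease, no coffee & no heart disease).
strata = {
    "non-smokers": (10, 90, 20, 180),
    "smokers": (60, 140, 15, 35),
}

# Association within each stratum of the confounder.
stratum_ors = {name: odds_ratio(*counts) for name, counts in strata.items()}

# Crude association: ignore smoking and add the strata together.
crude_counts = tuple(sum(column) for column in zip(*strata.values()))
crude_or = odds_ratio(*crude_counts)

print(stratum_ors)         # {'non-smokers': 1.0, 'smokers': 1.0}
print(round(crude_or, 2))  # 1.87
```

In this invented data the crude OR suggests that coffee is linked to heart disease, but within each smoking stratum the OR is 1.0, so the crude association is explained by smoking.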