Advanced Statistics Flashcards

(229 cards)

1
Q

What is a variable in research?

A

Anything measured, observed, or manipulated in a study.

2
Q

What are data?

A

Measurements obtained from variables.

3
Q

What is qualitative data?

A

Data where individuals fall into non-numerically related categories.

4
Q

What is nominal data?

A

Named categories without intrinsic order.

5
Q

What is binary data?

A

Nominal data with only 2 categories.

6
Q

What is ordinal data?

A

Ordered categories where differences between levels are not equal.

7
Q

What is quantitative data?

A

Numerical data obtained by counting or measuring.

8
Q

What is discrete data?

A

Quantitative data restricted to distinct, countable values (typically whole numbers).

9
Q

What is continuous data?

A

Quantitative data that can take any value within a range.

10
Q

Why is the interval-ratio distinction rarely needed in medical research?

A

Most interval data can be treated similarly to ratio data.

11
Q

Can quantitative data be converted into categories?

A

Yes, by defining class intervals.

12
Q

What is an independent variable?

A

A variable controlled by the researcher that is presumed to cause change.

13
Q

What is a dependent variable?

A

A variable whose value depends on the independent variable.

14
Q

What is a mediator variable?

A

A variable that explains how or why an independent variable affects a dependent variable.

15
Q

What distinguishes complete and partial mediators?

A

When a complete mediator is controlled for, the effect of the independent variable on the dependent variable disappears entirely; controlling for a partial mediator reduces but does not eliminate the effect.

16
Q

Can a mediator cause the outcome independently?

A

No.

17
Q

Which tests are used for nominal and ordinal data?

A

Nonparametric tests.

18
Q

Which tests are used for most measured variables?

A

Parametric tests.

19
Q

What is descriptive statistics?

A

Methods used to summarise and communicate data without making inferences.

20
Q

What is the mean?

A

The arithmetic average, sensitive to extreme values.

21
Q

When is the mean inappropriate?

A

For skewed data.

22
Q

What is the median?

A

The value dividing data into 2 equal halves.

23
Q

Why is the median more stable than the mean?

A

It is less influenced by extreme values.

24
Q

What is the mode?

A

The most frequently occurring value.

25
What is skewness?
Asymmetry in a data distribution.
26
How do mean, median, and mode relate in normal distribution?
They are equal.
27
What characterises positive skew?
Mean greater than median with a long right tail.
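The effect of skew on the measures of central tendency can be demonstrated with a short sketch using Python's standard library (the data values are hypothetical):

```python
import statistics

# Hypothetical right-skewed sample (e.g. hospital lengths of stay in days)
stays = [2, 2, 3, 3, 3, 4, 5, 6, 30]

mean = statistics.mean(stays)      # pulled upward by the extreme value (30)
median = statistics.median(stays)  # resistant to the extreme value
mode = statistics.mode(stays)      # most frequently occurring value

# Positive skew: mean > median (long right tail)
print(mean, median, mode)
```

Removing the outlier barely moves the median, but pulls the mean down sharply.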
28
What characterises negative skew?
Mean less than median with a long left tail.
29
What is dispersion?
The spread of data values.
30
What is range?
Difference between highest and lowest values.
31
What is interquartile range?
Difference between the 75th and 25th percentiles.
32
What is variance?
Average squared deviation from the mean.
33
What is standard deviation (SD)?
Square root of variance, expressed in original units.
34
What is standard error (SE)?
SE = SD / √ n
35
What does SE represent?
Precision of the sample mean as an estimate of the population mean.
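The chain from variance to SD to SE can be sketched in a few lines (hypothetical blood-pressure readings, Python standard library):

```python
import statistics
import math

# Hypothetical sample of systolic BP readings (mmHg)
bp = [118, 122, 125, 130, 135, 140, 121, 128, 133, 138]

n = len(bp)
var = statistics.variance(bp)  # sample variance (n - 1 denominator)
sd = statistics.stdev(bp)      # square root of variance, in original units (mmHg)
se = sd / math.sqrt(n)         # SE = SD / sqrt(n): precision of the sample mean
```

Quadrupling the sample size halves the standard error, since SE shrinks with the square root of n.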
36
Which graphs are used for categorical data?
Bar charts and pie charts.
37
What graph is used for continuous data?
Histogram.
38
What does a box-whisker plot display?
Quartiles, median, range, and outliers.
39
Why is normal distribution important?
Many statistical tests assume normality.
40
What proportions lie within 1, 2, and 3 SDs in a normal distribution?
Approximately 68%, 95%, and 99.7% respectively.
41
What is the central limit theorem?
The distribution of sample means approaches a normal distribution as sample size increases, regardless of the shape of the underlying population distribution.
42
What is a standard normal distribution?
A normal distribution with mean 0 and SD 1.
43
What is a z-score?
Standardised value calculated as z = (x − mean) / SD.
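Both the 68/95/99.7 rule and the z-score can be checked with `statistics.NormalDist` from the Python standard library (the observation and population parameters below are hypothetical):

```python
from statistics import NormalDist

std = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, SD 1

# Proportion lying within 1, 2, and 3 SDs of the mean
within = [std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)]
# approximately 0.683, 0.954, 0.997

# z-score: how many SDs an observation lies from the population mean
x, mean, sd = 135, 120, 10  # hypothetical BP value and population parameters
z = (x - mean) / sd
```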
44
What is probability?
Relative frequency of an event, ranging from 0 to 1.
45
What are odds?
Ratio of an event's occurrence to its non-occurrence.
46
How are odds and probability related?
Odds = p / (1 − p); Probability = odds / (1 + odds)
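The two conversion formulas are inverses of each other, which a minimal sketch makes concrete (function names are illustrative):

```python
def prob_to_odds(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

# A probability of 0.2 corresponds to odds of 0.25 ("1 to 4"),
# and converting back recovers the original probability
odds = prob_to_odds(0.2)
p = odds_to_prob(odds)
```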
47
What is the target population?
The population to which study results are intended to apply.
48
What is the source population?
The accessible sampling frame.
49
What is the eligible population?
Individuals meeting study criteria.
50
Who are study entrants?
Eligible individuals who consent to participate.
51
Who are study completers?
Participants who complete all study requirements.
52
What is random sampling?
Every individual has equal selection probability.
53
What is stratified sampling?
Random sampling within predefined strata.
54
What is cluster sampling?
Sampling of pre-existing groups rather than individuals.
55
What is non-random sampling?
Sampling where selection probability is unknown.
56
What is purposive sampling?
Deliberate selection based on study objectives.
57
What is inferential statistics?
Methods used to draw conclusions about populations from samples.
58
What is point estimation?
Estimating a single value as the best estimate of the true population parameter.
59
What is a hypothesis?
A conjectural statement linking variables.
60
What is the null hypothesis (H₀)?
Statement of no difference or effect.
61
What is the alternative hypothesis (H₁)?
Statement opposing H₀ and reflecting the research belief.
62
What is a one-tailed hypothesis?
Hypothesis specifying direction of effect.
63
What is a two-tailed hypothesis?
Hypothesis specifying difference without direction.
64
What is a p-value?
The probability of obtaining a result at least as extreme as the one observed, assuming H₀ is true.
65
What p-value is commonly considered to be statistically significant?
p < 0.05.
66
What is a Type I error?
False positive; H₀ is incorrectly rejected in favour of H₁ when H₀ is in fact true.
67
How can a Type I error occur in practice?
When a researcher rejects H₀ and upholds H₁, despite H₀ being true.
68
What is the probability of committing a Type I error called?
Alpha (α).
69
What is the conventional threshold for α?
Less than 0.05, commonly accepted as the level of statistical significance (p-value).
70
How can repeated testing increase Type I error?
Multiple hypothesis testing, subgroup analyses, or secondary analyses increase the chance that at least one test will be falsely significant.
71
What is a Type II error?
False negative; H₀ is incorrectly accepted even though a true difference exists.
72
What is the probability of committing a Type II error called?
Beta (β).
73
What are common causes of Type II error?
Small sample size and large variance.
74
What does 1-β represent?
The power of a study.
75
How are Type I and Type II errors related?
Reducing Type I error generally increases Type II error, and vice versa.
76
What are the traditional values for α and β?
α = 5%, β = 20%.
77
What is the corresponding conventional study power?
80%.
78
What is statistical power?
The ability of a study to detect a true difference between groups when it exists.
79
What are the typical values chosen for power and beta?
β = 0.2 and power = 80% (0.8).
80
What factors influence study power?
Sample size, effect size, variability of observations, and chosen level of statistical significance (p-value).
81
Why is 'power calculation' considered a misnomer?
It is essentially a sample size calculation based on predefined values of α, power, effect size, and variance.
82
How should variance ideally be estimated?
From pilot studies or previously published literature.
83
What is the standardised difference?
Effect size expressed as the target difference in means divided by the standard deviation.
84
What is Altman's nomogram used for?
Estimating sample size using standardised difference and power.
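Altman's nomogram is a graphical tool, but the arithmetic behind it can be approximated with the standard normal-approximation sample size formula. A sketch under that assumption (function name and example values are illustrative, Python standard library):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(std_diff, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of
    2 means: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2, where d is
    the standardised difference."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / std_diff ** 2)

# Medium standardised difference (0.5), alpha 5%, power 80%
n = n_per_group(0.5)
```

Note how halving the standardised difference roughly quadruples the required sample size.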
85
How does increasing the significance level affect power?
It increases power but also increases the risk of Type I error.
86
How does increasing sample size affect power?
It increases power by reducing the standard error, but also increases cost.
87
How does effect size influence power?
Larger target effect sizes yield greater power, but setting the target too large risks missing smaller, clinically meaningful effects.
88
How can variability be reduced to increase power?
Through more precise measurements or matching subjects.
89
Why is a one-sided test more powerful than a two-sided test?
Because it concentrates statistical power in one direction, assuming strong prior justification.
90
How does test selection influence power?
Parametric tests are generally more powerful when assumptions are met.
91
Why are samples used in research?
Samples are used to draw inferences about the population from which they are drawn.
92
When can study results be generalised to the population?
When the sample is reasonably representative of the population.
93
Why can representativeness never be perfect?
Sample data are only approximations of true population values.
94
What is a confidence interval (CI)?
A range of values within which the true population parameter is likely to lie.
95
What does a 95% confidence interval indicate?
That 95 out of 100 similarly drawn samples would contain the true population value.
96
What does a narrower confidence interval indicate?
Greater precision and better representativeness of the sample.
97
For which measures can confidence intervals be calculated?
Means, effect sizes, relative risks, odds ratios, and number needed to treat (NNT).
98
What does the degree of confidence represent?
The complement of the significance level α (e.g. 95% confidence corresponds to α = 0.05).
99
How does increasing confidence level affect confidence interval width?
Higher confidence levels result in wider confidence intervals.
100
What does confidence interval width reflect?
Precision of the estimate, influenced by standard error and sample size.
101
How do confidence interval limits affect interpretation?
If the limits include the value of no difference, there is no evidence of a statistically significant difference.
102
What is the value of no difference?
0 for mean difference, 1 for ratio measures, and infinity for NNT.
103
What information do confidence intervals provide?
Degree of confidence, precision, clinical significance, and statistical significance.
104
How can confidence interval width be reduced?
Lower confidence level, reduced SD, or increased sample size.
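A 95% CI for a mean, using the normal approximation (z = 1.96), can be sketched as follows; the data are hypothetical, and for a sample this small a t critical value would strictly be more appropriate:

```python
import statistics
import math

# Hypothetical sample of fasting glucose values (mmol/L)
glucose = [4.8, 5.1, 5.3, 5.5, 4.9, 5.2, 5.6, 5.0, 5.4, 5.2]

n = len(glucose)
mean = statistics.mean(glucose)
se = statistics.stdev(glucose) / math.sqrt(n)

# 95% CI: mean +/- 1.96 * SE (normal approximation)
lower, upper = mean - 1.96 * se, mean + 1.96 * se
```

The width (upper − lower) shrinks as n grows or SD falls, and widens if a higher confidence level (larger z) is chosen.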
105
What is effect size?
Difference in outcomes between intervention and control divided by SD.
106
Why are effect sizes important?
They quantify magnitude of effect and are independent of sample size.
107
Why are effect sizes useful in meta-analyses?
They allow comparison across studies using different measurement scales.
108
What is Cohen's d?
A standardised difference between 2 means.
109
How are effect sizes conventionally graded?
Small (0.2), medium (0.5), and large (0.8).
110
How can effect size be interpreted clinically?
Interpreted as a z-score, the effect size gives the proportion of control-group scores that fall below the average experimental-group score.
111
What is the Common Language Effect Size (CLES)?
Probability that a randomly selected experimental score exceeds a control score.
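Cohen's d and the CLES can both be computed in a short sketch; the outcome scores are hypothetical, and the CLES formula assumes normally distributed scores with equal variances:

```python
import math
from statistics import NormalDist, mean, stdev

def cohens_d(group1, group2):
    """Standardised difference between 2 means, using a pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

def cles(d):
    """Probability that a randomly selected experimental score exceeds a
    randomly selected control score (normality assumed)."""
    return NormalDist().cdf(d / math.sqrt(2))

# Hypothetical outcome scores
treated = [14, 15, 16, 17, 18]
control = [11, 12, 13, 14, 15]
d = cohens_d(treated, control)  # large by Cohen's convention (> 0.8)
```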
112
Why is repeated statistical testing problematic?
It increases the likelihood of Type I errors.
113
What is the Bonferroni correction?
Adjusting significance level by dividing it by the number of tests performed.
114
What is a limitation of Bonferroni correction?
It is overly conservative and may increase false negatives.
115
What is family-wise error (FWE)?
Probability of at least one Type I error across multiple comparisons.
116
What is the False Discovery Rate (FDR)?
The expected proportion of false positives among significant findings.
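The Bonferroni adjustment and the Benjamini-Hochberg FDR procedure can be contrasted in a short sketch (the p-values below are hypothetical):

```python
def bonferroni_threshold(alpha, n_tests):
    """Bonferroni: divide the significance level by the number of tests."""
    return alpha / n_tests

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at q.
    Returns the p-values declared significant."""
    m = len(pvals)
    cutoff = 0.0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i / m * q:   # compare each ranked p to its stepped threshold
            cutoff = p
    return [p for p in pvals if p <= cutoff]

pvals = [0.001, 0.012, 0.025, 0.041, 0.20]
bonf = bonferroni_threshold(0.05, len(pvals))  # 0.01 per test
sig = benjamini_hochberg(pvals)
```

Here Bonferroni (threshold 0.01) keeps only one result, while BH keeps three, illustrating why Bonferroni is considered overly conservative.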
117
What is inferential statistics?
Methods used to generalise findings from a sample to a population.
118
What are 2 main forms of statistical inference?
Estimation and hypothesis testing.
119
What is point estimation?
Estimation of a single population parameter value.
120
What is interval estimation?
Estimation of a parameter range with a defined confidence level.
121
What is reliability?
The consistency and replicability of a measurement instrument.
122
Does high reliability guarantee validity?
No, it guarantees consistency but not truth.
123
How is test-retest reliability assessed?
By administering the same test twice to the same population.
124
What is Cronbach's alpha?
A measure of internal consistency, with 0.70 commonly used as a cut-off.
125
What is interrater reliability?
Agreement between multiple raters using the same instrument.
126
What is intraclass correlation coefficient (ICC)?
Proportion of variance attributable to true differences between subjects.
127
What is validity?
The extent to which an instrument measures what it intends to measure.
128
What is face validity?
Subjective judgment of whether an instrument appears to measure the intended construct.
129
What is construct validity?
Whether an instrument measures the theoretical construct of interest.
130
What is content validity?
Degree to which test items represent all relevant domains.
131
What is criterion validity?
Performance of a test against an external criterion.
132
What is concurrent validity?
Validity based on current correlations.
133
What is predictive validity?
Validity based on future outcomes.
134
What is convergent validity?
Agreement between instruments measuring the same construct.
135
What is discriminant validity?
Low correlation between instruments measuring different constructs.
136
What is experimental validity?
Sensitivity of an instrument to detect change after intervention.
137
What is precision?
Degree of variability in repeated measurements.
138
What is accuracy?
Closeness of a measurement to the true population value.
139
What compromises accuracy?
Bias.
140
Why is percent agreement misleading?
It overestimates agreement by ignoring chance agreement.
141
What does kappa measure?
Agreement beyond chance for categorical variables.
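For two raters making a binary judgment, kappa can be computed from the four cell counts of their agreement table (hypothetical counts below):

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for 2 raters on a binary judgment.
    a = both 'yes', d = both 'no', b and c = disagreements."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the raters' marginal proportions
    p_yes = ((a + b) / n) * ((a + c) / n)
    p_no = ((c + d) / n) * ((b + d) / n)
    expected = p_yes + p_no
    return (observed - expected) / (1 - expected)

# Hypothetical ratings: 40 both-yes, 10 + 10 disagreements, 40 both-no
kappa = cohens_kappa(40, 10, 10, 40)
```

Percent agreement here is 80%, but kappa is only 0.6 once chance agreement is discounted, which is exactly why percent agreement overstates reliability.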
142
When is weighted kappa used?
For ordinal data.
143
What is Bland-Altman analysis used for?
Assessing agreement for continuous variables.
144
What is the most important question when choosing a statistical test?
What is being tested: comparison of point estimates (Category 1) or demonstration of an association/relationship (Category 2)?
145
What defines Category 1 statistical questions?
Comparing samples for point estimates such as means, medians, or proportions.
146
What defines Category 2 statistical questions?
Demonstrating associations or relationships between variables.
147
What category do causal association studies usually fall into?
Category 2.
148
What is the first question to ask in Category 1 testing?
What is the nature of the point estimate: mean or proportion?
149
What is the second key question in Category 1 testing?
How many groups are being compared?
150
What is the third key question in Category 1 testing?
Are observations paired or unpaired (independent)?
151
What is the fourth key question in Category 1 testing?
Can a parametric distribution be assumed?
152
When can parametric tests be used?
When the outcome variable is quantitative and approximately normally distributed.
153
What assumption is made in exam settings if distribution is not stated?
Biological variables are assumed to be normally distributed.
154
Which parametric tests are commonly used to compare means?
t-test and ANOVA.
155
When are non-parametric tests used?
When variables are qualitative or when quantitative data are not normally distributed.
156
What do non-parametric tests compare?
Ranks rather than means or medians alone.
157
What is the non-parametric equivalent of the one-sample t-test?
Sign test.
158
What is the non-parametric equivalent of the paired t-test?
Wilcoxon signed-rank test.
159
What test is used for 2 independent groups in non-parametric testing?
Mann-Whitney U test.
160
What is the non-parametric equivalent of one-way ANOVA?
Kruskal-Wallis test.
161
Why are data transformations used?
To allow the use of more robust parametric tests.
162
What transformation is commonly used for right-skewed data?
Log transformation.
163
What transformation is used for Poisson distributions?
Square root transformation.
164
What transformation is often used for survival rates?
Reciprocal transformation.
165
What transformation is commonly used for proportions?
Logit transformation.
166
What is a limitation of transformed data?
Confidence intervals may be difficult to interpret.
167
What is the chi-square test primarily used for?
Comparing frequency counts or proportions.
168
What is a contingency table?
A table displaying frequency data for categorical variables.
169
What is required of rows and columns in a contingency table?
They must be mutually exclusive.
170
What are observed frequencies?
Actual outcomes observed in a study.
171
What are expected frequencies?
Outcomes predicted if the null hypothesis were true.
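Expected frequencies follow directly from the table's margins, and the chi-square statistic from the observed-expected discrepancies; a sketch with hypothetical counts:

```python
# Observed 2x2 contingency table (hypothetical counts):
# rows = exposure (yes/no), columns = outcome (yes/no)
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency per cell under H0: (row total x column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
```

Note the counts are raw frequencies, not percentages, and every expected cell comfortably exceeds 5, so the chi-square test is appropriate here.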
172
When should Fisher's exact test replace chi-square?
When expected frequencies are <5 in more than 20% of cells.
173
When is Yates' correction used?
When total sample size is <100 or any cell value is <10.
174
Why should chi-square not be applied to percentages?
Because chi-square requires raw frequency counts; converting to percentages alters cell values and yields incorrect results.
175
What is McNemar's test?
A chi-square test for paired categorical data.
176
What is the Mantel-Haenszel test?
A chi-square method assessing the association between 2 dichotomous variables while controlling for (stratifying by) a confounding variable.
177
What is log-linear analysis?
A chi-square-based method using log frequencies for multiple variables.
178
What is a one-sample t-test used for?
Comparing a sample mean with a known population mean.
179
What assumptions are required for a one-sample t-test?
Normal distribution and adequate sample size.
180
What is the purpose of a two-sample (Student's) t-test?
Comparing means of 2 samples.
181
When is an unpaired t-test used?
For independent samples measured once.
182
When is a paired t-test used?
For repeated measurements in the same subjects.
183
What assumption regarding variance applies to t-tests?
Equal variance between groups.
184
How can equality of variance be tested?
Using Levene's test.
185
What is ANOVA used for?
Comparing means across multiple groups.
186
What does ANOVA compare?
Variance between groups relative to variance within groups.
187
What is a one-way ANOVA?
Comparison of one independent variable across multiple groups.
188
What is a two-way ANOVA?
Analysis involving 2 independent variables.
189
When is repeated-measures ANOVA used?
When the same subjects are measured multiple times.
190
What statistic is used in ANOVA?
F-statistic.
191
What is a limitation of ANOVA?
It identifies that differences exist but not where they occur.
192
Why are post-hoc tests needed after ANOVA?
To identify specific group differences.
193
What assumptions underlie ANOVA?
Normality, equal variance, and independent observations.
194
What are degrees of freedom?
The number of values free to vary when estimating a statistic.
195
Why is n-1 used when estimating population SD from a sample?
One degree of freedom is lost due to estimation of the mean.
196
How are degrees of freedom lost in regression or ANOVA?
One df lost for each parameter estimated.
197
What is the df formula for chi-square tests?
df = (Rows - 1) x (Columns - 1)
198
What is the df for a two-sample t-test?
df = (n₁ + n₂ − 2)
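The two df formulas above amount to simple arithmetic, shown here with illustrative table dimensions and group sizes:

```python
def chi_square_df(rows, cols):
    """df = (rows - 1) x (columns - 1)."""
    return (rows - 1) * (cols - 1)

def two_sample_t_df(n1, n2):
    """df = n1 + n2 - 2: one df lost per estimated group mean."""
    return n1 + n2 - 2

df_chi = chi_square_df(2, 3)    # 2x3 contingency table
df_t = two_sample_t_df(20, 25)  # 2 groups of 20 and 25 subjects
```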
199
What does correlation measure?
Degree of association between 2 quantitative variables.
200
What visual tool illustrates correlation?
Scatterplot.
201
What correlation coefficient is used for parametric data?
Pearson's correlation coefficient (r).
202
What is the range of r?
-1 to +1.
203
What does the sign of r indicate?
Direction of the relationship.
204
What does the magnitude of r indicate?
Strength and linearity of association.
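Pearson's r can be computed from first principles as the covariance scaled by the two variables' spreads; the paired measurements below are hypothetical:

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between 2 quantitative variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired measurements (e.g. dose vs response)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(x, y)  # close to +1: strong positive linear association
```

Negating y flips the sign of r without changing its magnitude: the sign carries direction, the magnitude carries strength.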
205
Does correlation imply causation?
No.
206
When is Spearman's rho preferred?
For ordinal data, non-normal distributions, non-linearity, or small samples.
207
When is Kendall's tau preferred?
When ordinal ranks are not equidistant.
208
Why is regression needed beyond correlation?
To predict values of one variable from another.
209
What is the equation for simple linear regression?
y = a + bx
210
What does the regression coefficient (b) represent?
The change in y per unit change in x.
211
What method is used to fit the regression line?
Least squares method.
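The least squares fit has a closed form: b = cov(x, y) / var(x) and a = mean(y) − b · mean(x). A sketch with hypothetical data chosen to lie exactly on a line:

```python
def least_squares(xs, ys):
    """Fit y = a + bx by minimising the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    # Intercept: the fitted line passes through (mean x, mean y)
    a = my - b * mx
    return a, b

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]        # exactly y = 1 + 2x
a, b = least_squares(x, y)  # recovers intercept 1 and slope 2
```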
212
What is multiple linear regression?
Prediction of one dependent variable using multiple independent variables.
213
What is collinearity?
High correlation between independent variables.
214
What does R² represent?
Proportion of variance in the dependent variable explained by predictors.
215
When is logistic regression used?
When the dependent variable is binary.
216
What is the '1 in 10' rule?
Number of predictors ≤10% of sample size or number of events.
217
What is multivariable analysis?
One dependent variable with multiple independent variables.
218
What is multivariate analysis?
Multiple dependent and independent variables.
219
When is ANCOVA used?
When both categorical and continuous independent variables are present.
220
What is factor analysis used for?
Identifying latent variables underlying observed correlations.
221
What are 2 main types of factor analysis?
Exploratory and confirmatory.
222
What is exploratory factor analysis used for?
Data reduction and identification of latent constructs.
223
What is confirmatory factor analysis used for?
Testing predefined factor structures and construct validity.
224
What eigenvalue criterion is commonly used?
Retain factors with eigenvalues >1 (Kaiser rule).
225
What factor loading is considered significant?
≥0.40.
226
What is stratification used for?
Controlling known confounders.
227
What is direct standardisation?
Applying study rates to a standard population.
228
What is indirect standardisation?
Applying standard population rates to the study sample.
229
Draw the 2x2 table to know for Sn, Sp, PPV and NPV.

|        | Disease + | Disease − |
|--------|-----------|-----------|
| Test + | TP        | FP        |
| Test − | FN        | TN        |

Sn = TP/(TP+FN)
Sp = TN/(TN+FP)
PPV = TP/(TP+FP)
NPV = TN/(TN+FN)

## Footnote
Start clockwise from top left: Sn and Sp run down the columns ↓; PPV and NPV run across the rows →. T's always in the numerator!
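The four metrics follow mechanically from the table's cells; a sketch with hypothetical screening counts:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table.
    The 'T' cells (TP, TN) are always in the numerator."""
    return {
        "sensitivity": tp / (tp + fn),  # down the disease-positive column
        "specificity": tn / (tn + fp),  # down the disease-negative column
        "ppv": tp / (tp + fp),          # across the test-positive row
        "npv": tn / (tn + fn),          # across the test-negative row
    }

# Hypothetical screening results: 100 diseased, 900 healthy
m = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
```

With these counts the test is 90% sensitive, yet the PPV is only 75%, illustrating how predictive values depend on how common the disease is in the tested population.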