alpha and p value
α = “What level of false-positive risk am I willing to accept beforehand?”
p-value = “Given my data, how likely is it I’d see this result if the null were true?”
Type I and II error
A Type I error is a false positive: incorrectly rejecting a true null hypothesis. A Type II error is a false negative: failing to reject a false null hypothesis.
Type 1 FP
Type 2 FN
Confidence interval and power
If a question stem says a trial had wide confidence intervals overlapping the null, the correct interpretation is:
Study was underpowered (too few participants or too much variability).
For ratio measures (RR, OR, HR) the null value is 1
e.g., a CI of 0.52–1.01 crosses 1, so the result is not statistically significant
SD empirical rule
68% within 1 SD
95% within 2 SD
99.7% within 3 SD
If a value is 2 SD below the mean, only about 2.5% of values fall below it (95% lie within ±2 SD, leaving 5% split between the two tails).
Worked example: patient weight 166 lb, mean 171.2 lb, SD 2.6 lb. 166 is 5.2 lb below 171.2, which is 2 SD below the mean. 95% of values fall within ±2 SD, so 5% fall outside, split evenly: 2.5% below and 2.5% above.
95% + 2.5% = 97.5% of patients weigh more than this patient.
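The worked example above can be sketched in a few lines (same hypothetical patient data as the notes):

```python
# Empirical-rule worked example: where does a 166-lb patient fall?
mean, sd = 171.2, 2.6
weight = 166.0

z = (weight - mean) / sd   # z-score: number of SDs from the mean → -2.0

# Empirical rule: 95% of values lie within ±2 SD, so 5% lie outside,
# split evenly between the tails: 2.5% below -2 SD, 2.5% above +2 SD.
pct_below = 2.5            # % of patients lighter than this patient
pct_above = 95 + 2.5       # % of patients heavier → 97.5
print(z, pct_above)
```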
Statistical test methods
One-way ANOVA → compares means of a continuous variable across ≥2 groups (categorical IV). Not appropriate here.
Two-way ANOVA → compares means with two categorical independent variables. Not appropriate here.
Pearson correlation → measures linear correlation between two continuous, normally distributed variables. ✅
Spearman correlation → measures monotonic correlation when data are not normally distributed or ordinal.
Chi square test
Chi-square test → compares proportions/associations between categorical variables → ✅ correct.
Comparing means
Comparing Means (Continuous Outcomes)
2 groups
→ t-test (independent if 2 separate groups, paired if before/after in same group)
≥3 groups
→ ANOVA (one-way if 1 factor, two-way if 2 factors)
Comparing proportions
2 categorical variables (Yes/No, Hospital A vs B, Male vs Female, etc.)
→ Chi-square test (large sample)
→ Fisher’s exact test (small sample size, expected counts <5)
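The chi-square statistic can be computed by hand for a small table; a minimal sketch with hypothetical counts (in practice a library routine such as `scipy.stats.chi2_contingency` would be used):

```python
# Chi-square statistic for a contingency table of observed counts.
def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under the null of no association
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat

# e.g., outcome Yes/No in Hospital A vs Hospital B (hypothetical counts)
print(chi_square([[20, 30], [30, 20]]))  # → 4.0
```

If any expected count were below 5, Fisher's exact test would be the choice instead, per the rule above.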
Correlation
Continuous + Continuous
Normal data → Pearson correlation
Non-normal / ordinal data → Spearman correlation
Regression
Regression (Prediction Models)
Continuous dependent variable
→ Linear regression
Binary dependent variable (e.g., disease yes/no)
→ Logistic regression
Time-to-event outcome (e.g., survival analysis)
→ Cox proportional hazards model
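For intuition on the continuous-outcome branch, simple linear regression is just least squares; a stdlib-only sketch with made-up data:

```python
# Simple linear regression (continuous dependent variable) by least squares.
def linear_fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data lying on y = 2x + 1
print(linear_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # → (2.0, 1.0)
```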
Tests
Means → t-test / ANOVA
Proportions → Chi-square / Fisher
Relationship → Correlation
Prediction → Regression
KM vs Cox
Kaplan-Meier Survival Analysis:
Non-parametric method.
Estimates survival functions.
Compares survival between categorical groups (e.g., male vs. female).
Cannot handle continuous variables directly.
Cox Proportional Hazards Regression:
Semi-parametric method.
Models the hazard function and estimates hazard ratios.
Can handle both categorical and continuous predictors.
Allows adjustment for multiple covariates.
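The Kaplan-Meier product-limit estimate is simple enough to sketch directly (hypothetical follow-up data; real analyses use a library such as lifelines):

```python
# Kaplan-Meier estimator: S(t) = product over event times of (1 - d_i / n_i),
# where d_i = events at time t_i and n_i = number still at risk just before t_i.
def kaplan_meier(times, events):
    """times: follow-up times; events: 1 = event occurred, 0 = censored."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv, curve = 1.0, []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = sum(e for tt, e in pairs if tt == t)
        removed = sum(1 for tt, _ in pairs if tt == t)  # events + censored at t
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
        i += removed
    return curve

print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

With these toy data, survival steps down to roughly 0.8, 0.6, and 0.3 at event times 2, 3, and 5; censored subjects leave the risk set without dropping the curve.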
Sensitivity
TP / (TP + FN). SnNout: a highly Sensitive test with a Negative result rules OUT disease.
Specificity
TN / (TN + FP). SpPin: a highly Specific test with a Positive result rules IN disease.
PPV
chance of having x condition when test is positive (abnormal)
NPV
chance of NOT having condition when test is negative (normal)
Among people who have glaucoma, 95% were identified by the test (i.e., tested positive).
Sensitivity
Among all people who tested positive, how many actually have the disease?
PPV
PPV
TP / (TP + FP)
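All four metrics from one hypothetical 2×2 confusion matrix (counts invented for illustration):

```python
# Hypothetical screening-test results on 1000 people.
TP, FP, FN, TN = 90, 30, 10, 870

sensitivity = TP / (TP + FN)   # 90/100  = 0.90  (of those WITH disease, % positive)
specificity = TN / (TN + FP)   # 870/900 ≈ 0.967 (of those WITHOUT disease, % negative)
ppv         = TP / (TP + FP)   # 90/120  = 0.75  (of positives, % truly diseased)
npv         = TN / (TN + FN)   # 870/880 ≈ 0.989 (of negatives, % truly disease-free)
```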
likelihood ratio
probability of a given test result in patients WITH the condition ÷
probability of that test result in patients WITHOUT the condition
LR = 1 → the result does not change the probability of the condition (the test adds no information); LR > 1 argues for the condition, LR < 1 argues against it.
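Likelihood ratios follow directly from sensitivity and specificity; a sketch with hypothetical values:

```python
# LR+ = P(positive | disease) / P(positive | no disease) = sens / (1 - spec)
# LR- = P(negative | disease) / P(negative | no disease) = (1 - sens) / spec
sens, spec = 0.90, 0.80

lr_pos = sens / (1 - spec)        # ≈ 4.5: a positive result raises suspicion
lr_neg = (1 - sens) / spec        # ≈ 0.125: a negative result lowers it
```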
Prevalence and PPV/NPV
PPV and NPV are heavily influenced by prevalence, unlike sensitivity and specificity, which are inherent test characteristics.
PPV increases with higher prevalence
NPV decreases with higher prevalence
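The prevalence effect on PPV falls out of Bayes' theorem; a sketch holding sensitivity and specificity fixed at hypothetical values:

```python
# PPV via Bayes: among all positives, what fraction are true positives?
def ppv(sens, spec, prev):
    true_pos = sens * prev                 # P(test+ and diseased)
    false_pos = (1 - spec) * (1 - prev)    # P(test+ and healthy)
    return true_pos / (true_pos + false_pos)

# Same test (sens = spec = 0.90), rising prevalence → rising PPV
for prev in (0.01, 0.10, 0.50):
    print(prev, round(ppv(0.90, 0.90, prev), 3))
```

With prevalence 1%, 10%, and 50%, PPV climbs from roughly 8% to 50% to 90%, even though the test itself never changed.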
Likelihood ratio