Stats Flashcards

(103 cards)

1
Q

Which of the following can be considered a strength of a study?

A

✅ Use of an interdisciplinary approach to validate emerging themes

🔹 Concept: Triangulation in qualitative research
• Using multiple coders or an interdisciplinary team (psychiatrists, psychologists, social workers, etc.) to discuss and refine themes = triangulation.
• Triangulation = validation → increases credibility and trustworthiness.
• Hence, a major strength in qualitative analysis.

2
Q

🧠 Qualitative-Research: Strengths vs Weaknesses

A

STRENGTHS – what increases credibility / validity
&
WEAKNESSES – what reduces reliability / generalisability

  1. Triangulation:
    🔹 Multiple coders or an interdisciplinary team validate emerging themes → ↑ credibility
    🔹 Data-source triangulation (interviews + focus groups + records)
    ❌ Reliance on a single researcher’s interpretation (no cross-checking)

✨ Triangulation = gold standard for qualitative rigour

  2. Sampling:
    🔹 Purposive / theoretical sampling ensures data from those with direct, relevant experience
    🔹 Data saturation achieved (no new themes emerging)
    ❌ Convenience / non-random sampling → selection bias
    ❌ Small / unbalanced sample → limits transferability

✨ Qualitative research aims for depth, not representativeness

  3. Interviews and Data Collection:
    🔹 Semi-structured / open-ended interviews capture lived experience
    🔹 Reflexivity logs show awareness of interviewer influence
    ❌ Single interviewer → interviewer bias
    ❌ Overly structured questions → constrain data richness

✨ Reflexivity = acknowledging subjectivity strengthens trustworthiness

  4. Analysis:
    🔹 Systematic coding with theme verification by ≥2 analysts
    🔹 Member checking (participants review summaries)
    ❌ Lack of coding transparency
    ❌ Themes not linked back to raw data (quotes)

✨ MRCPsych loves “thematic analysis verified by multiple coders”

  5. Context / Data depth:
    🔹 Thick description – rich contextual quotes and detailed setting
    ❌ Superficial summaries without context

✨ “Rich data” is a buzzword for qualitative strength

  6. Reflexivity & Audit Trail:
    🔹 Researcher positionality statement – reflects on biases
    🔹 Audit trail – clear documentation of analytic steps
    ❌ No reflexive statement → reader can’t assess bias

✨ Shows methodological transparency

  7. Ethics / Validity:
    🔹 Participant consent & confidentiality addressed
    🔹 Peer debriefing to discuss analytic decisions
    ❌ Potential coercion, lack of anonymisation

✨ Demonstrates procedural rigour

  8. Transferability (vs generalisability):
    🔹 Findings described clearly so readers can judge applicability to other settings
    ❌ Overclaiming generalisability

✨ In qualitative work, say “transferable,” not “generalisable.”

3
Q

Which of the following can be considered a weakness of a study?

A

✅ Using a single interviewer for all interviews

🔹 Concept: Inter-rater variability / Reflexivity bias
• A single interviewer limits variation in data collection and may bias participant responses based on interviewer tone, style, or expectations.
• Using multiple interviewers with calibration can reduce researcher bias and improve reliability.

4
Q

Biases in Qualitative Research

A

🧩 1️⃣ Selection Bias
• Definition: Participants are not representative of the population being studied.
• Example: Only interviewing patients who agreed to take part in a psychotherapy evaluation — those with negative experiences may decline.
• Why it matters: Limits transferability of findings.
• SPMM clue: “Carefully selected participants” or “voluntary participation” → selection bias.

🧩 2️⃣ Recall Bias
• Definition: Participants may not accurately remember past experiences.
• Example: Asking discharged inpatients to recall how safe they felt during admission.
• Prevention: Collect contemporaneous accounts or triangulate data (e.g. use notes, staff interviews).
• SPMM clue: “Interviews conducted after discharge / after treatment” → recall bias.

🧩 3️⃣ Moderator / Interviewer Bias
• Definition: The interviewer’s tone, phrasing, facial expression, gender, or preconceptions shape participants’ responses.
• Example: Interviewer nods approvingly when a participant says staff were kind → participant gives more positive answers.
• Prevention: Use multiple interviewers, standard topic guides, or reflexive journals.
• SPMM clue: “Single interviewer” → moderator bias.

🧩 4️⃣ Social Desirability Bias
• Definition: Participants modify responses to appear favourable or avoid judgement.
• Example: Patients report medication adherence more positively than reality.
• Prevention: Assure confidentiality and neutrality.
• SPMM clue: “Sensitive topic” + “face-to-face interview” → social desirability bias.

🧩 5️⃣ Confirmation Bias
• Definition: Researcher subconsciously looks for data supporting pre-existing beliefs.
• Example: A researcher who believes hospital care is coercive interprets neutral statements as negative.
• Prevention: Reflexivity, triangulation (multiple coders, peer review).
• SPMM clue: “Researcher with prior experience / strong views” → confirmation bias.

🧩 6️⃣ Attrition Bias
• Definition: Some participants drop out before completion; their views may differ from those who remain.
• Example: In longitudinal interviews, distressed participants withdraw.
• Prevention: Ensure follow-up and analyse dropouts.
• SPMM clue: “Loss of participants over time” → attrition bias.

🧩 7️⃣ Interpretive Bias
• Definition: Researcher’s personal lens affects interpretation of themes.
• Example: Coding themes based on what the researcher expected to find.
• Prevention: Independent coding by multiple researchers (inter-rater validation).
• SPMM clue: “Themes validated by interdisciplinary team” → mitigates interpretive bias.

🧩 8️⃣ Publication Bias (less common in qual.)
• Definition: Positive or novel findings more likely to be published.
• Example: Studies finding dissatisfaction with services are more likely to be accepted for publication.
• Prevention: Pre-register studies or publish all findings.

🧠 Mitigation Strategies
  • Triangulation
    Using multiple researchers, data sources, or methods → ↑ validity
  • Reflexivity
    Acknowledging & documenting the researcher’s influence
  • Member checking
    Participants review accuracy of interpretations
  • Audit trail
    Transparent documentation of how coding & themes developed
  • Thick description
    Provides context → aids transferability

💡 SPMM Exam Tip

When you see:
• “Single interviewer” → Moderator bias
• “Retrospective interview” → Recall bias
• “Participants selected by convenience or staff recommendation” → Selection bias
• “Researcher previously worked in same setting” → Confirmation bias / reflexivity issue

5
Q

🧩 Types of Qualitative Interview Procedures

A

🟢 1. Unstructured Interviews
• Definition: Completely open conversation; no fixed questions or time limit.
• Purpose: Explore a topic in depth when little prior knowledge exists.
• Example: “Tell me about your experience of living with schizophrenia.”
• Advantages: Very rich data; spontaneous discoveries.
• Disadvantages: Hard to replicate; data vary greatly between interviews; analysis is complex.
• SPMM clue: “No topic guide,” “free discussion,” “participant led.”

🟡 2. Semi-Structured Interviews ✅ Most common in psychiatry
• Definition: Guided by a topic guide or list of key questions but flexible follow-up probes allowed.
• Purpose: Balance consistency (between interviews) with depth.
• Example: “How do you feel about your current medication?” → follow-ups depending on response.
• Advantages: Allows comparison and exploration; most compatible with thematic analysis, IPA, grounded theory.
• Disadvantages: Still subject to interviewer bias; requires skilled moderation.
• SPMM clue: Mentions “topic guide,” “time-limited interviews,” “thematic analysis.”

🔵 3. Structured Interviews
• Definition: Pre-set, standardised questions in fixed order and wording.
• Purpose: Ensure uniformity across participants.
• Used in: Quantitative or mixed-method studies (e.g., SCID, MINI).
• Advantages: Reduces interviewer bias; easy to replicate.
• Disadvantages: Limited depth; may miss new insights.
• SPMM clue: “Closed questions,” “checklist,” “same questions to all.”

🟣 4. Focus Groups
• Definition: Group interview (6–10 participants) guided by a facilitator.
• Purpose: Explore shared or differing views; observe social dynamics.
• Advantages: Interaction stimulates new ideas; efficient data collection.
• Disadvantages: Dominant voices can bias discussion; confidentiality harder to control.
• SPMM clue: “Group discussion moderated by researcher.”

🟤 5. Key-Informant Interviews
• Definition: Conducted with people who have special knowledge or experience relevant to the topic.
• Example: Interviewing senior nurses about ward culture.
• Used for: Policy evaluation, service design, or community studies.

💡 Exam Tip
“Researchers used a topic guide and conducted 2-hour interviews that were audio-taped and transcribed. Which type of interview is this?”
✅ Answer: Semi-structured.

6
Q

A screening test for dementia has a sensitivity of 90% and specificity of 70%.
If 100 people with dementia are tested, how many will be correctly identified?

A

✅ Answer: D (90)
Explanation: Sensitivity = TP / (TP + FN).
→ 90 % of people with dementia (the diseased group) will test positive.
High-yield: Sensitivity = true positive rate.

⚙️ Step-by-step reasoning
1️⃣ The question says “100 people with dementia.”
That means all 100 in the sample have the disease.
So the outcome we’re looking for is the number of true positives among the diseased group.

2️⃣ The property that tells you how often a test correctly identifies people who have the disease is sensitivity, not specificity.

Sensitivity = TP / (TP + FN)

3️⃣ If sensitivity = 90 %, then the test correctly identifies 90 % of those who are actually diseased.

100 × 0.90 = 90

✅ Answer = 90 patients correctly identified.

❌ Why it’s not specificity
• Specificity measures how well the test correctly identifies people without the disease (true negatives).
• Here, no one in the question is disease-free — the sample is entirely “people with dementia.”
• Therefore specificity is irrelevant to this calculation.

✨Sensitivity
True positive rate (probability that the test correctly identifies disease).
Sensitivity = TP / (TP + FN)

✨Specificity
True negative rate (probability that the test correctly identifies non-disease).
Specificity = TN / (TN + FP)
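The two formulas can be checked numerically. An illustrative Python sketch using the card’s figures (sensitivity 90 %, specificity 70 %); the confusion counts are made up for the example:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# 90 of the 100 diseased people test positive, 10 are missed:
print(sensitivity(tp=90, fn=10))  # 0.9
# In a hypothetical group of 100 healthy people, 70 test negative:
print(specificity(tn=70, fp=30))  # 0.7
```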

7
Q

Which of the following parameters changes with disease prevalence?

A. Sensitivity
B. Specificity
C. Positive Predictive Value (PPV)
D. Negative Predictive Value (NPV)
E. Both C and D

A

✅ Answer: E (PPV and NPV)
Explanation:
• Sensitivity & specificity are intrinsic to the test → unchanged by prevalence.
• PPV ↑ as prevalence ↑ ; NPV ↓ as prevalence ↑.
SPMM tip: “Predictive values = Population-dependent.”

✨Positive predictive value (PPV)
Probability that a person with a positive test actually has the disease.
= TP / (TP + FP)
✨Negative predictive value (NPV)
Probability that a person with a negative test truly doesn’t have the disease.
= TN / (TN + FN)
✨Prevalence effect
PPV and NPV change with prevalence, but sensitivity/specificity do not.
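The prevalence effect follows directly from Bayes’ theorem. A minimal Python sketch (the test characteristics and prevalences below are illustrative, not from this card):

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """P(disease | positive test), via Bayes' theorem."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

def npv(sens: float, spec: float, prev: float) -> float:
    """P(no disease | negative test)."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)

# Same test (sens 90%, spec 95%) at low vs high prevalence:
print(round(ppv(0.90, 0.95, 0.05), 2))  # 0.49 (low prevalence, low PPV)
print(round(ppv(0.90, 0.95, 0.50), 2))  # 0.95 (high prevalence, high PPV)
```

Running the same comparison with `npv` shows the reverse pattern: NPV is higher at low prevalence.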

8
Q

A test has sensitivity = 80 %. How many patients must be screened to detect 200 true positives?

A. 160  B. 200  C. 240  D. 250  E. 320

A

✅ Answer: D (250)
Explanation:

Sensitivity = TP / (TP + FN), so the number of diseased patients needed = TP / sensitivity.
200 / 0.80 = 250 patients.

SPMM tip: “If the question gives sensitivity + number of true positives → divide.”
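The “divide” rule is just the sensitivity formula rearranged; a one-function Python sketch:

```python
# Rearranging Sensitivity = TP / diseased gives diseased = TP / sensitivity.
def patients_needed(true_positives: int, sens: float) -> float:
    return true_positives / sens

print(patients_needed(200, 0.80))  # 250.0
```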

9
Q

Which of the following statements is true about specificity?

A. It measures the proportion of true positives correctly identified.
B. It is influenced by disease prevalence.
C. It is high in a good screening test.
D. It measures the proportion of true negatives correctly identified.
E. It increases false-positive rate.

A

✅ Answer: D – proportion of true negatives correctly identified.
Mnemonic: SNOUT = Sensitive test rules OUT; SPIN = Specific test rules IN.
SPMM tip: Screening → high sensitivity; Confirmation → high specificity.

✨Specificity
True negative rate (probability that the test correctly identifies non-disease).
Specificity = TN / (TN + FP)

10
Q

If the prevalence of dementia increases in a population, what happens to predictive values?

A. PPV ↑ NPV ↓
B. PPV ↓ NPV ↑
C. Both ↑
D. Both ↓
E. No change

A

Answer: A
Explanation: Higher prevalence = more true positives → higher PPV; fewer true negatives → lower NPV.
SPMM tip: “PPV parallels prevalence.”

11
Q

Which term describes the probability that a patient with a negative test truly does not have the disease?

A. Specificity
B. NPV
C. PPV
D. Sensitivity
E. Accuracy

A

Answer: B (Negative Predictive Value)
SPMM tip: “NPV = True negative / All negatives.”

12
Q

Which term describes the overall ability of a test to correctly classify individuals as diseased or non-diseased?

A. Accuracy
B. Sensitivity
C. Specificity
D. Predictive Value
E. Reliability

A

A (Accuracy)
Formula: (TP + TN) / (Total Population).
SPMM tip: Don’t confuse accuracy with reliability — reliability = repeatability.

✨Accuracy
- How often the test gives the correct result (both true positives and true negatives) out of all tested individuals
- (TP + TN) / (TP + TN + FP + FN)
- Freedom from systematic error (bias)
- Intrinsic to the test (not dependent on prevalence if sensitivity/specificity remain constant)
- 🎯 Hitting the bull’s-eye
- “How overall correct the test is.”

✨Reliability (Precision)
- How consistent or repeatable a test result is under the same conditions
- Freedom from random error (noise)
- 🎯 Arrows hitting the same spot, even if not the bull’s-eye
- “Consistency”

✨If the stem says “overall ability,” “overall proportion,” or “both true positives and true negatives,”
→ Answer = Accuracy.

✨If the stem says “probability that a positive test reflects true disease,”
→ Answer = Predictive Value.

13
Q

A new Alzheimer’s screening tool has sensitivity 95 %, specificity 60 %. Which of the following will happen if it’s used in a low-prevalence population?

A. PPV will decrease
B. NPV will decrease
C. PPV will increase
D. Sensitivity will decrease

A

Answer: A (PPV ↓)
Explanation: When disease is rare, most positives are false positives → low PPV.
SPMM tip: “Low prevalence → many false positives → PPV ↓ (while NPV ↑).”

14
Q

A test gives 80 true positives, 10 false negatives, 90 true negatives, and 20 false positives.
What is its accuracy?

A. 70 %  
B. 80 %  
C. 85 %  
D. 90 %  
E. 95 %

A

Answer: C (85 %)

Accuracy = how often the test is correct overall, counting only the correct results (true positives + true negatives).

Formula:

Accuracy = (TP + TN) / Total

Here: (80 + 90) / (80 + 10 + 90 + 20) = 170 / 200 = 85 %.
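The worked example can be sketched in Python (figures taken from this card):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / total tested."""
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=80, tn=90, fp=20, fn=10))  # 0.85
```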

15
Q

A new cognitive screening test for dementia produces highly consistent results when repeated by the same assessor, but tends to overestimate impairment compared to gold-standard neuropsychological testing.
Which statement best describes this test?

A. Reliable but not accurate
B. Accurate but not reliable
C. Both reliable and accurate
D. Neither reliable nor valid
E. Valid but not reliable

A

Reliable but not accurate

Reason:
Results are consistent (reliable) but systematically wrong (not accurate).

16
Q

A new mood rating scale gives very similar scores when repeated by the same rater on different days, but the scores are consistently higher than patients’ clinician-rated Hamilton Depression scores.
What best describes this test?

A. Reliable but not valid
B. Valid but not reliable
C. Both reliable and valid
D. Neither reliable nor valid
E. Accurate and valid

A

Answer: A – Reliable but not valid
• Repeatedly consistent → reliable.
• Consistently wrong (systematic bias) → not valid/accurate.
📘 Ref: Kaplan & Sadock, Research Methods; SPMM Topic 12.1

17
Q

In a study of cognitive testing, two psychologists independently administer the same test and obtain highly correlated scores.
This indicates good …

A. Internal validity
B. Test–retest reliability
C. Inter-rater reliability
D. Construct validity
E. External validity

A

✅ Answer: C – Inter-rater reliability
Measures consistency between different raters.
📘 Ref: Streiner & Norman, Health Measurement Scales.

18
Q

When a depression scale correlates strongly with another established depression inventory but not with an anxiety scale, this demonstrates …

A. Face validity
B. Construct validity
C. Discriminant validity
D. Content validity
E. Internal reliability

A

✅ Answer: B – Construct validity

The stem gives two clues:
• Strong correlation with a similar measure (another depression inventory) → convergent validity
• Weak correlation with a different construct (anxiety scale) → discriminant validity

Convergent + discriminant evidence together demonstrate construct validity, so that is the single best answer. Discriminant validity alone covers only the weak correlation with the anxiety scale.

Why examiners like this question:

Strong correlation with a similar measure → convergent validity
Weak correlation with a different measure → discriminant validity

Together → construct validity

19
Q

A screening test for dementia correctly classifies both diseased and non-diseased individuals 88 % of the time.
What property does this describe?

A. Sensitivity
B. Specificity
C. Accuracy
D. Reliability
E. Validity

A

C – Accuracy
Overall proportion of true positives + true negatives.
📘 Ref: Altman Practical Statistics for Medical Research.

20
Q

A clinician rates depression severity using the same scale on two occasions a week apart. Scores are strongly correlated (r = 0.91).
What does this indicate?

A. Good internal consistency
B. High test–retest reliability
C. Excellent concurrent validity
D. High face validity
E. Good external validity

A

✅ Answer: B – High test–retest reliability
Shows stability of results over time.
📘 Ref: SPMM QBank Section 12; Kaplan Research Methods.

21
Q

🧠 Types of Reliability

A

✨Test–Retest Reliability
Same test → same person → two time points → consistent scores
A patient scores 20/30 on MMSE today and 21/30 next week
“Stability over time”

✨Inter-Rater Reliability
Two or more observers give similar ratings
Two psychiatrists rating PANSS obtain similar scores
“Agreement between raters”

✨Intra-Rater Reliability
Same rater scores consistently across occasions
One psychologist re-scoring the same session
“Consistency by one rater”

✨Internal Consistency
Items within the scale measure the same construct
Cronbach’s α ≥ 0.7 → good consistency
“Homogeneity of items”

✨Split-Half Reliability
Correlation between halves of one test
Odd vs even items compared
“Half-half correlation”
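Test–retest and inter-rater reliability are commonly quantified with a correlation coefficient. An illustrative Python sketch with made-up MMSE scores (the function name and data are hypothetical, for demonstration only):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical MMSE scores from the same five patients a week apart:
time1 = [20, 25, 18, 28, 22]
time2 = [21, 24, 19, 29, 21]
print(round(pearson_r(time1, time2), 2))  # high r -> stable scores over time
```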

22
Q

🧠 Types of Validity

A

✨Face Validity
Appears to measure what it claims (superficial)
Beck Depression Inventory looks like it measures depression
“Looks right”

✨Content Validity
Covers all relevant aspects of the construct
Exam Qs sampling all syllabus topics
“Covers full domain”

✨Construct Validity
Correlates with related theoretical concepts (convergent + discriminant)
New anxiety scale correlates with existing anxiety scales (convergent) but not with unrelated scales (discriminant)
“Theoretical soundness”

✨Criterion Validity
Correlates with an external criterion or gold standard
Mini-Mental Score correlates with neuropsych assessment
“Compared with gold standard”

- Concurrent Validity
Type of criterion validity; criterion measured at the same time
PHQ-9 vs clinician diagnosis today
“Same-time correlation”

- Predictive Validity
Type of criterion validity; criterion measured in the future
IQ test predicting school performance
“Forecasts future”

✨Ecological Validity
Results generalise to real-world settings
Lab test reflecting actual ward performance
“Real-world applicability”

✨External Validity
Results generalise to other populations/settings
Trial findings apply to the general clinical population
“Generalisation”

✨Internal Validity
Extent study minimises bias/confounding
RCT with randomisation/blinding
“Control of bias”

23
Q

Cronbach’s α = 0.90 for a new anxiety inventory.
What does this signify?

A. High test–retest reliability
B. Good internal consistency
C. High inter-rater agreement
D. Good construct validity
E. High ecological validity

A

✅ Answer: B – Good internal consistency

Cronbach’s α ≥ 0.7 = items within the scale measure the same underlying construct.
📘 Ref: Streiner & Norman, Health Measurement Scales.
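Cronbach’s α can be computed from per-item score variances and the variance of the total score. A minimal Python sketch with made-up rating data (three items, five participants):

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of scores per scale item, same participants in order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-participant total score
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Three items rated by five participants (made-up data):
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 3],
]
print(round(cronbach_alpha(items), 2))  # above 0.7 -> good internal consistency
```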

24
Q

A new cognitive test correlates highly with MMSE scores taken on the same day.
A. Construct validity
B. Predictive validity
C. Concurrent validity
D. Internal consistency
E. External validity

A

✅ Answer: C – Concurrent validity

Criterion validity subtype; compared with gold standard at same time.

25
Q

A personality test correlates with other personality tests but not with unrelated anxiety scales.

A. Convergent validity
B. Discriminant validity
C. Content validity
D. Ecological validity
E. Internal consistency

A

✅ Answer: B – Discriminant validity
Distinguishes the construct from unrelated ones. (Convergent = correlates with similar constructs.)

26
Q

A questionnaire includes items on sleep, appetite, mood, and concentration to cover all depression domains.

A. Face validity
B. Content validity
C. Construct validity
D. Criterion validity
E. Ecological validity

A

✅ Answer: B – Content validity
Ensures all facets of the construct are represented.

27
Q

An experimental cognitive task performs well in the lab but poorly reflects real-life behaviour. Which validity is low?

A. Internal validity
B. Ecological validity
C. Construct validity
D. Criterion validity
E. External validity

A

✅ Answer: B – Ecological validity
Real-world generalisability of results.

28
Q

A study uses randomisation and blinding to reduce confounding and bias. Which type of validity does this improve?

A. Internal validity
B. External validity
C. Construct validity
D. Criterion validity
E. Predictive validity

A

✅ Answer: A – Internal validity
Accuracy of causal inference within the study.

29
Q

A depression rating scale shows high correlation with another validated depression scale and with clinical severity ratings.

A. Construct validity
B. Content validity
C. Predictive validity
D. Reliability
E. Internal validity

A

✅ Answer: A – Construct validity
Demonstrates the scale truly measures the intended psychological construct.
30
Q

LR⁺

A

LR⁺ = Sensitivity / (1 – Specificity)
Likelihood of a positive result in diseased vs healthy; higher = better test.

💡 High-Yield Facts
• LR⁺ > 10 = Strong evidence for disease
• LR⁺ = 1 = No diagnostic value
• LR⁺ < 2 = Weak
• LR⁻ < 0.1 = Strong evidence against disease
• LR⁺ combines both sensitivity & specificity → reflects the discriminatory power of the test

📘 Ref: Altman 1991; BMJ “Statistics Notes – Likelihood Ratios”

31
Q

LR⁻

A

LR⁻ = (1 – Sensitivity) / Specificity
Likelihood of a negative result in diseased vs healthy; lower = better test.

32
Q

FPR

A

FPR = 1 – Specificity
False positives among the healthy (“labelled positive but healthy”).
• Specificity = True Negative Rate = proportion of non-diseased correctly identified
• 1 – Specificity = False Positive Rate = proportion of non-diseased incorrectly labelled positive
• Often asked as: “What proportion of healthy individuals are incorrectly identified as diseased?” → 1 – Specificity

33
Q

FNR

A

FNR = 1 – Sensitivity
Missed true cases (“missed diagnosis”).
• “What proportion of diseased individuals go undetected?” → 1 – Sensitivity (False Negative Rate)
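The likelihood-ratio formulas above can be sketched in Python, using the sensitivity 75 % / specificity 90 % example from card 35:

```python
def lr_positive(sens: float, spec: float) -> float:
    """LR+ = Sensitivity / (1 - Specificity)."""
    return sens / (1 - spec)

def lr_negative(sens: float, spec: float) -> float:
    """LR- = (1 - Sensitivity) / Specificity."""
    return (1 - sens) / spec

# Sensitivity 75%, specificity 90%:
print(round(lr_positive(0.75, 0.90), 2))  # 7.5
print(round(lr_negative(0.75, 0.90), 2))  # 0.28
```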
34
Q

A screening test for dementia has:
• Sensitivity = 90%
• Specificity = 80%
What proportion of healthy individuals will be incorrectly labelled as positive?

A. 0.10
B. 0.20
C. 0.80
D. 0.90
E. 0.02

A

✅ Answer: B – 0.20
Explanation: False-positive rate = 1 – specificity = 1 – 0.8 = 0.2 (20%).

35
Q

A new cognitive test has Sensitivity = 75%, Specificity = 90%. Calculate the Likelihood Ratio Positive (LR⁺).

A. 0.08
B. 0.75
C. 7.5
D. 9.0
E. 15

A

✅ Answer: C – 7.5
Explanation: LR⁺ = Sensitivity / (1 – Specificity) = 0.75 / 0.10 = 7.5.
LR⁺ > 10 = very strong diagnostic evidence; 5–10 = moderate.

36
Q

Which statistic combines both sensitivity and specificity to summarise a test’s overall discriminatory power?

A. Predictive value
B. Accuracy
C. Reliability
D. Likelihood ratio
E. Cronbach’s α

A

✅ Answer: D – Likelihood ratio
Explanation: LR integrates true-positive and false-positive rates; predictive values depend on prevalence.

37
Q

In a population where dementia prevalence = 5%, a screening test has Sensitivity = 90% and Specificity = 95%. What happens to Positive Predictive Value (PPV) if the test is used in the general population instead of a memory clinic?

A. PPV increases
B. PPV decreases
C. No change
D. Depends only on specificity
E. Becomes equal to NPV

A

✅ Answer: B – PPV decreases
Explanation: Lower prevalence → more false positives → lower PPV. PPV rises with prevalence; NPV falls as prevalence rises.

38
Q

Which of the following statements about diagnostic test indices is true?

A. PPV is independent of prevalence.
B. Specificity is the proportion of patients with disease correctly identified.
C. Sensitivity + Specificity = 1.
D. LR⁺ uses both sensitivity and specificity.
E. False negative rate = 1 – specificity.

A

✅ Answer: D – LR⁺ uses both sensitivity and specificity.
Explanation: LR⁺ = Sensitivity / (1 – Specificity); LR⁻ = (1 – Sensitivity) / Specificity.
39
Q

Which of the following correctly describes the direction of inquiry in a case–control study?

A. Exposure → Outcome
B. Outcome → Exposure
C. Exposure ↔ Outcome (simultaneous)
D. Randomised exposure → Outcome

A

✅ Answer: B – Outcome → Exposure
Explanation: Investigators start by identifying those with and without the outcome (“cases” vs “controls”) and then look back to compare past exposures.

40
Q

How do cohort studies differ from case–control studies?

A

Cohort studies can calculate incidence and relative risk.
Explanation: Because subjects are followed from exposure to outcome, new cases can be counted over time; case–control studies only estimate odds ratios.

41
Q

Within a diabetes cohort of 10 000, 200 patients who developed retinopathy and 200 without were compared for past HbA1c levels. What design is this?

A

Nested Case–Control Study
Explanation: A subset of a pre-existing cohort is used. It’s economical and reduces recall bias because exposure data were collected prospectively before disease onset.

42
Q

A hospital compares medication-error rates for 6 months before and after introducing electronic prescribing.

A

Before-and-After Study
Explanation: A quasi-experimental design; participants act as their own controls. It assesses the effect of an intervention but lacks randomisation, so temporal and confounding biases can occur.

43
Q

In a study examining the relationship between parental smoking and child asthma, researchers find that parental socioeconomic status (SES) is associated both with smoking habits and child health. SES therefore acts as:

A. Mediator
B. Confounder
C. Effect modifier
D. Independent variable
E. Random error

A

✅ Answer: B – Confounder
Explanation: SES affects both the exposure (smoking) and the outcome (asthma) but is not part of the causal pathway — the classic definition of a confounder.

44
Q

Which of the following study designs best controls for confounding during the design phase?

A. Stratified analysis
B. Regression modelling
C. Randomisation
D. Matching
E. Standardisation

A

✅ Answer: C – Randomisation
Explanation: Randomisation distributes both known and unknown confounders equally between groups, reducing confounding before data collection. Matching and stratification control confounding after sampling.

45
Q

In an observational study, researchers match each smoker with a non-smoker of the same age and sex. What bias-control technique is this?

A. Restriction
B. Randomisation
C. Matching
D. Blinding
E. Stratification

A

✅ Answer: C – Matching
Explanation: Matching ensures comparability between exposure groups on potential confounders (e.g., age, sex). Commonly used in case–control designs.

46
Q

A study finds that the association between obesity and depression is stronger in women than in men. What does this illustrate?

A. Confounding
B. Interaction (effect modification)
C. Information bias
D. Selection bias
E. Recall bias

A

✅ Answer: B – Interaction (effect modification)
Explanation: When the strength or direction of an association changes according to a third variable (sex, age, etc.), that variable is an effect modifier, not a confounder.

47
Q

Which of the following analytical methods is most appropriate to adjust for multiple confounders simultaneously?

A. Sensitivity analysis
B. Multivariable regression
C. Stratified analysis
D. Propensity matching
E. Randomisation

A

✅ Answer: B – Multivariable regression
Explanation: Regression models (e.g., logistic or linear regression) statistically adjust for several confounders at once — essential in observational studies where randomisation isn’t possible.

48
Q

Confounding vs Interaction

A

• Confounding = a nuisance that hides or distorts the truth → you remove it.
• Interaction = a real phenomenon that reveals subgroup differences → you highlight it.

🧪 Example 1: Smoking, Alcohol, and Cancer
👉 If smokers drink more and smoking itself causes throat cancer, smoking is a confounder. It distorts the true alcohol–cancer relationship. Once you adjust for smoking, the apparent strong link between alcohol and cancer weakens.

💡 Example 2: Sex, Smoking, and Heart Disease
👉 If smoking increases heart-disease risk much more in men than in women, sex is an effect modifier (interaction). The association truly varies between subgroups — it’s not bias; it’s biology.
49
Q

Researchers studied whether bedtime regularity predicts behavioural difficulties in 7-year-olds, while adjusting for age, gender, family income, maternal mental health, parenting style, etc. They reported: “After adjusting for multiple confounders, β = 0.53, p < 0.001 …” What statistical test was used?

A

✅ Answer: Multiple regression

50
Q

If the dependent variable is binary (e.g., presence/absence of depression), which regression model is used?

A. Linear regression
B. Logistic regression
C. Cox regression
D. ANOVA
E. Poisson regression

A

✅ Answer: B – Logistic regression
Explanation: Binary outcomes (0/1) need logistic regression, which models the log odds of the event.
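“Models the log odds” can be made concrete with a small Python sketch; the coefficients and predictor value below are illustrative, not fitted to any real data:

```python
from math import exp, log

def predicted_probability(b0: float, b1: float, x: float) -> float:
    """Logistic model: log-odds = b0 + b1*x, so p = 1 / (1 + e^-(log-odds))."""
    return 1 / (1 + exp(-(b0 + b1 * x)))

def log_odds(p: float) -> float:
    """Inverse mapping: probability back to log odds."""
    return log(p / (1 - p))

# With illustrative coefficients b0 = -2.0, b1 = 0.5, a predictor value of 4
# gives log-odds of exactly 0, i.e. a predicted probability of 0.5:
print(predicted_probability(-2.0, 0.5, 4.0))  # 0.5
print(log_odds(0.5))                          # 0.0
```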
51
Which of the following regression analyses allows for multiple predictors, both continuous and categorical? A) Univariate regression B) Bivariate correlation C) Multivariable regression D) Chi-square E) ANOVA
Multivariable regression Explanation: “Multivariable” means >1 predictor. It can adjust for confounders and test independent effects of each variable.
52
Which test estimates the time until an event (e.g., relapse) occurs? A) Cox proportional hazards regression B) Logistic regression C) Multiple linear regression D) ANOVA E) t-test
Cox proportional hazards regression Explanation: Used for survival analysis — models hazard ratios, not odds ratios.
53
In a multiple regression, the variable “β = 2.5, p < 0.01” means: A) Each 1-unit increase in the predictor raises the outcome by 2.5 units. B) The groups differ by 2.5 %. C) There is no significant effect. D) The predictor and outcome are unrelated.
Each 1-unit increase in the predictor raises the outcome by 2.5 units. Explanation: β shows the slope — how much the dependent variable changes for every unit change in the independent one.
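🧪 The slope interpretation can be checked with a tiny least-squares sketch (plain Python, hypothetical numbers — not from the question):

```python
# Hypothetical data: each 1-unit rise in the predictor adds 2.5 to the outcome.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.5, 6.0, 8.5, 11.0]  # lies exactly on y = 1 + 2.5x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Ordinary least-squares slope: covariance(x, y) / variance(x)
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
       sum((x - mean_x) ** 2 for x in xs)

print(beta)  # 2.5 -> a 1-unit increase in x raises y by 2.5 units
```

👉 β is exactly this slope: the change in the outcome per 1-unit change in the predictor, holding the other predictors constant.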
54
Linear vs Logistic Regression
🧠 1️⃣ Continuous vs Categorical Variables
🔹 Continuous – can take any numeric value within a range; measured, not counted. Examples: height (cm), weight (kg), age, blood pressure, IQ score, behavioural score, MMSE score.
🔹 Categorical (discrete) – represents distinct groups or categories, not a continuum. Examples: gender (M/F), diagnosis (depression/psychosis), medication (yes/no), marital status.
⸻ 📊 Where This Fits in Regression
🔹 Linear regression — Dependent variable (outcome): continuous. Independent variables (predictors): continuous or categorical (can be both). “How does age or sex predict MMSE score?”
🔹 Logistic regression — Dependent variable (outcome): categorical (binary: yes/no). Independent variables (predictors): continuous or categorical (can be both). “How does BMI or sex predict presence of diabetes (yes/no)?”
⸻ ✅ So —
• Linear regression is for continuous outcomes.
• Logistic regression is for categorical (usually binary) outcomes.
55
Comparison of the risk of developing schizophrenia in two groups of 3000 people – one group comprises people who have used cannabis, the other people who have never used cannabis.
✅ Answer: Chi-squared test ⸻ 🧠 Why: • Outcome = categorical (schizophrenia: yes/no) • Comparing two independent groups • Comparing proportions (risk) 👉 That = Chi-square ⸻ ❌ Traps: • t-test → ❌ for means, not proportions • Logistic regression → ❌ only if multiple predictors ⸻ 🔥 High-yield rule: 👉 “Proportions between groups → Chi-square”
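🧪 A minimal sketch of the 2×2 chi-squared calculation. Only the 3000-per-group totals come from the card; the case counts are invented for illustration:

```python
# Hypothetical 2x2 table (illustrative counts, not given in the card):
#                    schizophrenia+  schizophrenia-
a, b = 60, 2940   # cannabis users   (n = 3000)
c, d = 20, 2980   # never-users      (n = 3000)

n = a + b + c + d
# Chi-squared statistic for a 2x2 table: n(ad - bc)^2 / product of the margins
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi2, 2))  # well above the 3.84 cut-off, so p < 0.05 at 1 df
```

👉 The test compares observed vs expected proportions; no means, no regression coefficients.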
56
“Investigation of the association between several risk factors and the risk of developing schizophrenia.”
✅ Answer: Logistic regression ⸻ 🧠 Why: • Outcome = binary (yes/no schizophrenia) • Multiple predictors = multivariate model 👉 That = logistic regression ⸻ ❌ Traps: • Chi-square → ❌ only single variable • Linear regression → ❌ requires continuous outcome ⸻ 🔥 High-yield rule: 👉 “Binary outcome + multiple predictors = LOGISTIC regression”
57
Comparing baseline and endpoint weight in the same individuals in a phase II study of an antipsychotic. The variable is normally distributed. (Choose TWO)”
✅ Answers: Paired t-test AND repeated measures ANOVA ⸻ 🧠 Why: • Same individuals → paired data • Continuous variable → weight • Normal distribution → parametric tests 👉 Options: • Paired t-test → simplest • Repeated measures ANOVA → if multiple timepoints ⸻ ❌ Traps: • Independent t-test → ❌ wrong (not independent groups) • Chi-square → ❌ categorical only ⸻ 🔥 High-yield rule: 👉 “Same people before/after → PAIRED t-test”
58
“The spread of the raw data (Choose TWO)”
✅ Answers: Standard deviation + Variance ⸻ 🧠 Why: Both measure dispersion (spread) • Variance = average squared deviation • SD = square root of variance (more clinically meaningful) ⸻ ❌ Traps: • Range → less robust • IQR → not typically first answer in exams ⸻ 🔥 High-yield: 👉 “Spread = SD ± variance (default exam answer)”
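🧪 Quick sketch of the SD–variance relationship (hypothetical scores, Python’s stdlib `statistics` module):

```python
import statistics

scores = [4, 8, 6, 5, 3, 7]   # hypothetical raw data

var = statistics.variance(scores)   # sample variance (n - 1 denominator)
sd = statistics.stdev(scores)       # SD = square root of that variance

print(var, sd)
```

👉 SD is in the same units as the raw data, which is why it is the more clinically meaningful of the two.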
59
The range in which 19/20 identical experiments would be expected to find the same result.”
✅ Answer: 95% confidence interval ⸻ 🧠 Why: • 19/20 = 95% • CI = range where true population value lies ⸻ ❌ Trap: • p-value → NOT a range • SD → spread, not estimate ⸻ 🔥 High-yield: 👉 “95% CI = 19/20 rule”
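🧪 A large-sample sketch of a 95% CI for a mean (hypothetical measurements; the z approximation 1.96 is used for simplicity):

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 12, 15]   # hypothetical measurements
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

# Large-sample 95% CI: mean +/- 1.96 x SE
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(round(lower, 2), round(upper, 2))
```

👉 Roughly 19 out of 20 such intervals, over repeated identical experiments, would contain the true population mean.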
60
“Synonymous with type I error rate.”
✅ Answer: Significance level (α) ⸻ 🧠 Why: • Type I error = false positive • Alpha (α) = the pre-set probability of making this error — usually 0.05 • The p-value is the probability observed from the data, which is compared against α ⸻ ❌ Trap: • Power → relates to Type II error • CI → estimation, not error ⸻ 🔥 High-yield: 👉 “Type I error rate = α (significance level) = false-positive rate” Type I error = False positive Meaning: You conclude there IS an effect, but in reality there is NO effect Example: • You say cannabis causes schizophrenia • But in truth… it doesn’t ➡️ That’s a Type I error ⸻ 🧠 Now: What is the p-value? 👉 The p-value = “If there was actually NO real effect, what is the probability of getting results this extreme?” So: • Small p-value (below α) → unlikely due to chance → “probably real” • Large p-value → likely due to chance → “probably not real”
61
An improvement system used to develop new processes or products at superior performance levels. (Choose ONE)”
✅ Answer: DMADV model ⸻ 🧠 Explanation: DMADV = 👉 Define – Measure – Analyse – Design – Validate • Used when: • You are creating something new • Not just improving existing processes ⸻ ❌ Trap: • PDSA → ❌ for improving existing processes • Model for Improvement → ❌ iterative improvement, not new design ⸻ 🔥 High-yield: 👉 “New system = DMADV” 👉 “Improve existing = PDSA”
62
“An extension of PDSA model describing events and questions that must precede the initiation of a PDSA cycle (Choose TWO)”
✅ Answers: FOCUS approach + Model for Improvement (MFI) ⸻ 🧠 Explanation: 1️⃣ FOCUS • Find process • Organise team • Clarify knowledge • Understand variation • Select improvement 👉 Basically: prep before PDSA ⸻ 2️⃣ Model for Improvement (MFI) Asks: • What are we trying to accomplish? • How will we know change is improvement? • What changes can we make? 👉 Then → PDSA cycles ⸻ ❌ Trap: • PDSA itself → ❌ not “before”, it IS the cycle ⸻ 🔥 High-yield: 👉 “FOCUS + MFI = BEFORE PDSA”
63
A visualisation tool to capture variations that occur within a system (Choose ONE)”
✅ Answer: Statistical Process Control (SPC) chart ⸻ 🧠 Explanation: SPC charts: • Track performance over time • Show: • Normal variation (common cause) • Abnormal variation (special cause) Examples: • Run charts • Control charts ⸻ ❌ Trap: • Bar chart → ❌ static data • Pie chart → ❌ proportions only ⸻ 🔥 High-yield: 👉 “Variation over time = SPC chart”
64
“Which of the following coefficient values depict a perfect inverse correlation between 2 continuous variables?” Options: • 0 • 2 • 1 • -1 • 0.5 ⸻
✅ Correct Answer: → -1 ⸻ 🧠 Explanation (EXAM STYLE) Correlation coefficient (r) ranges from: 👉 -1 → +1 • +1 = perfect positive correlation • 0 = no correlation • -1 = perfect inverse (negative) correlation ⸻ 💡 What does “perfect inverse” mean? As one variable increases → the other decreases in a perfectly straight line Example: • Exercise ↑ → Weight ↓ (perfectly linear hypothetical)
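🧪 Pearson’s r computed by hand on a perfectly inverse toy dataset (hypothetical numbers):

```python
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]   # y falls by exactly 2 for each unit rise in x

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = sum((a - mx) ** 2 for a in x) ** 0.5
sy = sum((b - my) ** 2 for b in y) ** 0.5
r = cov / (sx * sy)

print(round(r, 10))  # -1.0 (perfect inverse correlation, to floating-point precision)
```

👉 Any perfectly straight downward line gives r = −1, whatever the slope’s magnitude.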
65
“What type of graph is shown?” Options: • Histogram • Box whisker plot • Funnel plot • Bar chart • Kaplan-Meier graph
✅ Correct Answer: → Box-and-whisker plot ⸻ 🧠 Why this is a box plot (step-by-step visual decoding) Look at the key features in your image: 1️⃣ Rectangular boxes • Each category (Psychosis, Bipolar, etc.) has a box 👉 This = interquartile range (IQR) ⸻ 2️⃣ Line inside the box • That line = median ⸻ 3️⃣ “Whiskers” extending up/down • These show range (min–max or 1.5×IQR) ⸻ 4️⃣ Dots and stars • These are: • Outliers • Extreme values 👉 THIS combo = box plot signature ⸻ ❌ Why others are wrong (EXAM TRAPS) Histogram ❌ • Continuous data • Bars touch • Shows frequency distribution 👉 NO medians / whiskers ⸻ Bar chart ❌ • Categorical data • Shows means/counts 👉 NO spread or quartiles ⸻ Funnel plot ❌ • Used in meta-analysis • Looks like an inverted triangle ⸻ Kaplan-Meier ❌ • Survival curve • Step-like graph over time ⸻ 🔥 High-Yield Paper B Pearls 1️⃣ Box plot = distribution summary 👉 Shows: • Median • IQR • Spread • Outliers ⸻ 2️⃣ When do they test it? 👉 When comparing distribution across groups (e.g. diagnoses vs satisfaction scores — EXACTLY your question)
66
The upper and lower horizontal edges of the shaded area refer to what?” ⸻
✅ Correct Answer: → Interquartile range (IQR) (more specifically: Q1 and Q3) ⸻ 🧠 Break it down visually (using YOUR graph) Inside each box plot: 📦 The box itself = IQR • Lower edge of box → Q1 (25th percentile) • Upper edge of box → Q3 (75th percentile) 👉 So the shaded box = middle 50% of data ⸻ ➕ The line inside the box → Median (Q2) ⸻ 📏 The whiskers → Extend to: • Min/max OR • 1.5 × IQR (depending on convention) ⸻ ⭐ Dots / stars → Outliers ⸻ 🔥 High-Yield Paper B Pearls 1️⃣ Definition of IQR 👉 IQR = Q3 – Q1 ⸻ 2️⃣ What does IQR represent? 👉 Spread of the middle 50% of the data ⸻ 3️⃣ Why examiners test this Because: • Mean is affected by outliers • IQR is NOT affected by outliers 👉 So it’s a robust measure of spread
67
In the given graph, the dark line across the box is…” ⸻
✅ Correct Answer: → Median (Q2) ⸻ 🧠 How to read it instantly (no thinking needed in exam) In any box plot: 📦 Box edges → Q1 & Q3 📏 Whiskers → range ⭐ Dots → outliers ➡️ The line INSIDE the box = MEDIAN ⸻ 🧠 What is the median? 👉 The middle value of the dataset 👉 50% above, 50% below ⸻ 🔥 High-Yield Paper B Pearls 1️⃣ Median vs Mean (VERY IMPORTANT) • Median = robust (not affected by outliers) ✅ • Mean = affected by outliers ❌ 👉 Box plots show median, NOT mean
69
In the graph, the width of the shaded box represents…
The width of boxes does not mean anything in these plots. The correct answer is: None of the above
70
Which of the following is a correct interpretation of the given information and the graph? Options: A. None of the patients with personality disorder are dissatisfied B. Lowest level of satisfaction is seen in anxiety group C. Some patients with psychosis show extremes of variation in level of satisfaction D. Patients with ‘other’ diagnoses show highest level of satisfaction E. Significant satisfaction is observed in psychosis and depression groups ⸻
✅ Correct Answer: → C 👉 Some patients with psychosis show extremes of variation in level of satisfaction ⸻ 🧠 Why this is correct (EXAM LOGIC) Look at the psychosis group: • There are multiple outliers (dots/asterisks) • These represent extreme values outside the IQR 👉 So: • Not just spread • But extreme variation (key exam phrase) ⸻ 🔥 High-Yield Rule 👉 Outliers = extremes of variation If you see: • many dots / asterisks → think: ✅ extreme values ✅ skew / variability ❌ NOT mean differences ❌ NOT statistical significance ⸻ ❌ Why the others are WRONG (this is where exams catch you) ⸻ ❌ A. None of the patients with personality disorder are dissatisfied 👉 WRONG because: • Box plots don’t show “none” or absolute absence • You cannot conclude all vs none ⸻ ❌ B. Lowest level of satisfaction is seen in anxiety group 👉 WRONG because: • You must look at median (line in box) • “Other” group actually looks lower ⸻ ❌ D. Patients with ‘other’ diagnoses show highest level of satisfaction 👉 WRONG: • Big box ≠ high satisfaction • Median is actually lower, not higher ⸻ ❌ E. Significant satisfaction is observed in psychosis and depression groups 👉 VERY IMPORTANT TRAP ⚠️ 👉 Box plots = descriptive only ❌ They do NOT show: • statistical significance • p-values • comparisons
71
The results from a 24-week RCT of memantine in patients with moderate-to-severe Alzheimer’s dementia were reported in 2015. The investigators recruited 126 subjects for the memantine arm and 126 for the placebo arm, of whom 100 in the memantine group and 100 in the placebo group completed the study. Using a categorical measure of treatment response, it was shown that 40% in the memantine group responded while only 20% in the placebo group showed a response. Calculate the relative risk reduction of using memantine. Options: • 1 • 20 • 10 • 5 • 2 ⸻
✅ Correct Answer: → 1 ⸻ 🧠 Step-by-step (EXAM METHOD) Step 1: Identify values • EER (treatment) = 40% = 0.4 • CER (control) = 20% = 0.2 ⸻ Step 2: Calculate Relative Risk (RR) RR = EER / CER = 0.4 / 0.2 = 2 ⸻ Step 3: Calculate Relative Risk Reduction (RRR) RRR = |1 − RR| = |CER − EER| / CER = 0.2 / 0.2 = 1 ⸻ 🔥 High-yield interpretation ⚠️ This is a TRAP question • The treatment actually increases a good outcome (response) • So strictly this is a Relative Risk Increase (RRI) of 100% • But the exam still labels it RRR → take the absolute value ⸻ 🧠 ULTRA HIGH-YIELD RULE 👉 If the outcome is GOOD (response, survival): • RR > 1 → GOOD • “RRR” comes out negative → take the absolute value 👉 If the outcome is BAD (death, relapse): • RR < 1 → GOOD • RRR = 1 − RR (as normal) ⸻ 💡 One-line memory hack 👉 “RRR = 1 − RR… but ignore the sign in exams”
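🧪 The card’s arithmetic as a two-line sketch (event rates taken from the question):

```python
# Event rates from the card: 40% response on memantine, 20% on placebo.
eer, cer = 0.40, 0.20

rr = eer / cer               # relative risk
rrr = abs(cer - eer) / cer   # relative change vs control, as a magnitude

print(rr, rrr)  # 2.0 and 1.0 -> RR = 2, "RRR" = 1 (i.e. 100%)
```

👉 Because the treatment doubles a good outcome, the “reduction” is really a 100% relative increase — hence the absolute value.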
72
Qualitative methods include all except: Options: • focus groups • participant observation • ethnography • case-control study • semi-structured interviews ⸻
✅ Correct Answer: → case-control study ⸻ 🧠 Explanation (VERY HIGH-YIELD) ✅ Qualitative methods = explore experiences, meanings • Focus groups ✔️ • Participant observation ✔️ • Ethnography ✔️ • Semi-structured interviews ✔️ 👉 These are: • subjective • descriptive • non-numerical ⸻ ❌ Case-control study = QUANTITATIVE • Compares cases vs controls • Uses numbers, odds ratios • Analytical epidemiological design
73
A clinical trainee reviews studies reporting the diagnostic accuracy of MRI scans in identifying schizophrenia. She identified 9 studies whose result could be statistically synthesized further in a meta-analysis. She produced the following graph. What is this graph called?” Options: • Gabrielle plot • Correlation plot • Funnel plot • Nixon plot • Galbraith plot ⸻
✅ Correct Answer: → Galbraith plot ⸻ 🧠 Why this is a Galbraith plot This graph shows: • multiple study points from a meta-analysis • a central straight regression line • parallel dashed lines around it • points scattered around that line to assess heterogeneity / outliers 👉 That pattern is classic for a Galbraith plot (also called a radial plot) ⸻ 🔥 What a Galbraith plot is used for A Galbraith plot helps to: • assess heterogeneity in meta-analysis • identify outlier studies • visually show which studies deviate from the overall effect If studies fall far away from the central line or outside the dashed limits, they may be contributing to heterogeneity. 💎 High-yield Paper B pearls 1. Funnel plot = publication bias If they ask: “Which plot is used to detect publication bias?” 👉 Funnel plot 2. Galbraith plot = heterogeneity If they ask: “Which plot helps detect heterogeneity or outlier studies in meta-analysis?” 👉 Galbraith plot 3. Forest plot Another big one: • each study shown with CI bars • pooled effect at bottom • used to display results of meta-analysis
74
Which of the following points in this graph refers to a regression analysis?
✅ Correct Answer: → A ⸻ 🧠 Why A = regression analysis In a Galbraith plot: 👉 The central straight line = regression line • It represents the overall effect estimate • It’s derived from regression of effect size vs precision 👉 Point A is sitting on that central line ➡️ So it corresponds to the regression line / regression estimate ⸻ 🔥 High-yield concept (VERY examable) 👉 Galbraith plot = regression-based plot • X-axis → precision (1/SE) • Y-axis → standardized effect size 👉 The line is literally: a regression line through the studies
75
Which of the following points can be used to compute the standard error value of a given study in the meta-analysis?
“Which of the following points can be used to compute the standard error value of a given study in the meta-analysis?” ⸻ 🧠 First — translate the question into simple language They are asking: 👉 “Where on this graph can I find information about how reliable each study is?” Because remember: 👉 Standard error = how reliable / precise the study result is ⸻ 🧠 Now connect it to the graph (Galbraith plot) From earlier: 👉 X-axis = precision = 1 / standard error ⸻ 🔑 So: If you know: \text{Precision} = \frac{1}{SE} Then: SE = \frac{1}{\text{X-axis value}} ⸻ ✅ Final Answer (in exam terms) 👉 The horizontal position of the point (X-axis value) ⸻ 🧠 Why this makes sense (intuitively) Think of each dot: • Moving RIGHT → more precise → smaller SE → more reliable • Moving LEFT → less precise → bigger SE → less reliable 👉 So the X-axis is literally showing you: “How trustworthy is this study?”
76
Which of the following points refers to a z score?
In a Galbraith plot: 👉 Y-axis = Z score (standardised effect size) 👉 X-axis = precision (1/SE) ⸻ 🧠 So what does that mean visually? Each dot: • Horizontal position → how reliable the study is (SE) • Vertical position → the Z score ⸻ ✅ Correct Answer 👉 The vertical position (Y-axis value) of the point ⸻ 🧠 Why this makes sense (intuitive version) Remember: 👉 Z score = “how big the effect is compared to its uncertainty” So: • Higher up → stronger effect relative to noise • Lower down → weaker effect 👉 That’s exactly what the Y-axis shows
77
Which of the following points refers to an outlier?
Point D falls outside the 2 standard deviation margins. It is an outlier. The correct answer is: Point D
78
“The scale on x-axis refers to” Options: • Precision • Standardised effect • Sample size • Significance value • Quality of the studies ⸻
✅ Correct Answer: → Precision ⸻ 🧠 Let’s explain this like you’re seeing it for the first time We said earlier: 👉 This is a Galbraith plot ⸻ 🧠 What each axis means (VERY IMPORTANT) 👉 X-axis = Precision = 1 / Standard Error (1/SE) 👉 Y-axis = Z score (standardised effect) ⸻ 🧠 Now in plain English 👉 X-axis tells you: “How reliable is this study?” ⸻ 🟢 Intuitive picture • Points on the right side → very reliable studies (low SE) • Points on the left side → less reliable studies (high SE) ⸻ 🔥 Why we call it “precision” Because: 👉 Precision = how tight / consistent / reliable a result is • High precision → small variability → trustworthy • Low precision → big variability → less trustworthy
79
The scale on y-axis refers to?” Options: • Standardised effect • Significance value • Sample size • Precision • Quality of the studies ⸻
✅ Correct Answer: → Standardised effect ⸻ 🧠 Let’s explain this from absolute basics We already said: 👉 This is a Galbraith plot ⸻ 🔑 What each axis means (anchor this firmly) 👉 X-axis = precision (1/SE) 👉 Y-axis = Z score = standardised effect ⸻ 🧠 What is “standardised effect” (simple explanation) 👉 It means: “How big is the result compared to its uncertainty?” Mathematically: Z = \frac{\text{effect size}}{\text{standard error}} ⸻ 🟢 Intuitive version Imagine: • A study says “treatment works” • But how confident are we? 👉 Standardised effect answers: “Is this effect big enough relative to the noise?” ⸻ 💡 What the Y-axis is showing 👉 Higher up: • stronger effect relative to error 👉 Lower down: • weaker or uncertain effect
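🧪 Both Galbraith-plot axes can be reproduced from a single hypothetical study (the effect size and SE below are invented for illustration):

```python
# One hypothetical study in a meta-analysis:
effect = 0.8   # e.g. a log odds ratio
se = 0.2       # its standard error

z = effect / se        # y-axis: standardised effect (Z score)
precision = 1 / se     # x-axis: precision = 1/SE

print(z, precision)
```

👉 A big Z with high precision (upper right) means a strong, reliable result; the same effect with a large SE would sit low and to the left.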
80
A doctor notices three cases of new onset myoclonic epilepsy in his inpatient unit. Accidentally, he discovers that all three patients had been weaned off the same depot medication 2 weeks ago. In order to alert the wider medical community about a possible discontinuation effect of this drug, the most preferred method of publishing this observation is:” Options: • Meta analysis • Case series • Systematic review • RCT • N of 1 trial ⸻
✅ Correct Answer: → Case series ⸻ 🧠 Think like the exam 👉 What do we have? • Only 3 patients • Observational finding • Possible new adverse effect ⸻ 💡 What is a case series (super simple) 👉 It means: “I saw a few similar patients and I’m reporting what I noticed” ⸻ 🔥 Why this is PERFECT here • You don’t have a trial • You don’t have controls • You just want to: 👉 alert others quickly 👉 New side effect = case report/series FIRST
81
A doctor wants to compare the prevalence of HIV encephalitis using a neuroimaging study along with HIV serology in an urban and a rural region, using the same test battery. Which of the following can be expected from the serology results?” Options: • Lower positive predictive value in the rural region • Lower sensitivity in the rural region • Lower positive predictive value in the urban region • Lower specificity in the rural region • Lower sensitivity in the urban region ⸻
✅ Correct Answer: → Lower positive predictive value in the rural region ⸻ 🧠 Now let’s break this down VERY simply Step 1: What is changing? 👉 Prevalence changes • Urban → HIGH HIV prevalence • Rural → LOW prevalence ⸻ Step 2: What stays the same? 👉 Same test → • Sensitivity = same • Specificity = same ⸻ Step 3: What changes with prevalence? 👉 ONLY: • PPV • NPV ⸻ 💡 Key concept (THIS IS EXAM GOLD) 👉 Higher prevalence → higher PPV 👉 Lower prevalence → lower PPV ⸻ 🧠 So here: • Rural → LOW prevalence ➡️ PPV ↓ ⸻ 🧠 What is PPV (beginner version) 👉 “If test is positive → how likely is it TRUE?” ⸻ 💡 Why PPV drops in rural Imagine: • Few people actually have HIV • Even if test is good 👉 many positives will be false ⸻ ❌ Why other options wrong ❌ Sensitivity / specificity 👉 DO NOT change with prevalence 🚨 EXAM TRAP 🚨 ⸻ 🔥 High-yield summary (must memorise) 👉 Sensitivity & specificity = test property 👉 PPV & NPV = population dependent
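🧪 The prevalence effect on PPV, sketched with fixed (illustrative) sensitivity and specificity — only the prevalence differs between regions:

```python
def ppv(sens, spec, prev):
    """Positive predictive value via Bayes: TP / (TP + FP)."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

# Same test (hypothetical sens 90%, spec 95%), different prevalence:
urban = ppv(0.90, 0.95, 0.10)   # higher-prevalence urban region
rural = ppv(0.90, 0.95, 0.01)   # lower-prevalence rural region

print(round(urban, 2), round(rural, 2))  # rural PPV is far lower
```

👉 Sensitivity and specificity are held constant, yet the PPV collapses in the low-prevalence region — exactly the exam point.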
82
A group of men were examined during a routine screening for elevated blood pressure. Those men with the highest blood pressure (diastolic blood pressure higher than the 80th percentile for the group) were re-examined at a follow-up examination 2 weeks later. It was found that the mean for the re-examined men had decreased by 10mmHg at the follow-up examination. The most likely explanation is:” Options: • Repeated testing / learning effect • Measurement error • Regression towards the mean • Increased awareness leading to treatment seeking • The observers were better trained for the second examination ⸻
✅ Correct Answer: → Regression towards the mean ⸻ 🧠 Explain like you’re seeing this for the first time Step 1: What did they do? They picked: 👉 People with VERY HIGH blood pressure ⸻ Step 2: What happened later? 👉 Their average BP went DOWN ⸻ ❗ Your brain says: “Did something change?? Treatment?? Learning??” 🚫 NO. ⸻ 💡 What actually happened 👉 When you select extreme values (very high or very low)… ➡️ Next time you measure them ➡️ They naturally move closer to average ⸻ 🎯 Simple analogy Imagine: • You measure height of “tallest kids” in class • Next time → they’re still tall, but slightly less extreme 👉 Why? Because first measurement had random variation ⸻ 🔥 Key idea 👉 Extreme values = partly real + partly random fluctuation 👉 On repeat: ➡️ random part disappears ➡️ value moves closer to mean
83
A is strongly associated with B. It is investigated in a study whether A causes B. Which one of the following weakens the claim for a causal association between A and B?” Options: • Dose response relationship between A and B • A always precedes B • A and B are biologically related phenomena • Consistency of association between A and B • C, D and E are well established causes of B ⸻
✅ Correct Answer: → C, D and E are well established causes of B ⸻ 🧠 Now let’s simplify this BIG concept This is about: 👉 Causation vs association ⸻ 💡 Think like this You found: 👉 A is linked to B But question is: 👉 “Does A actually CAUSE B?” ⸻ 🔥 What weakens causation? 👉 If MANY OTHER things already cause B ⸻ 🎯 Simple example Let’s say: • A = drinking coffee • B = heart attack But: • Smoking • Diabetes • Hypertension ALREADY cause heart attacks 👉 Then coffee is LESS convincing as a cause ⸻ 💡 This is called 👉 Lack of specificity (Bradford Hill criteria) ⸻ 🧠 Why this weakens causality If: 👉 Many causes exist ➡️ A is less likely to be THE cause ⸻ ❌ Why others are WRONG (they actually SUPPORT causation) ✔ Dose-response 👉 More A → more B ➡️ STRONG evidence ⸻ ✔ A precedes B 👉 Cause must come BEFORE effect ⸻ ✔ Biological plausibility 👉 Makes sense scientifically ⸻ ✔ Consistency 👉 Seen in multiple studies ⸻ 🚨 Exam trick 👉 Most options = “support causation” 👉 One option = “weakens causation” ⸻ 🧠 High-yield summary 👉 Bradford Hill criteria (remember this list!): • Temporality (A before B) • Dose-response • Consistency • Biological plausibility • Specificity
84
“A multi centre double-blind pragmatic RCT reported remission rates for depression of 65% for fluoxetine and 60% for dosulepin. Number of patients that must receive fluoxetine for one patient to achieve the demonstrated beneficial effect is” Options: • 5 • 10 • 60 • 20 • 15 ⸻
✅ Correct Answer: → 20 ⸻ 🧠 Step-by-step (VERY beginner) Step 1: Identify what they’re asking 👉 “Number needed to treat (NNT)” ⸻ Step 2: What is NNT? 👉 “How many patients do I treat to help ONE extra person?” ⸻ Step 3: Calculate difference (this is KEY) Fluoxetine = 65% Dosulepin = 60% 👉 Difference = 5% ⸻ Step 4: Convert to decimal 5% = 0.05 ⸻ Step 5: Apply formula 👉 NNT = 1 / ARR ARR = absolute risk reduction = 0.05 👉 NNT = 1 / 0.05 = 20 ⸻ 🎯 Super simple way to think 👉 If 5 extra people improve per 100 patients ➡️ You need 20 patients to get 1 extra success ⸻ 🔥 High-yield rule 👉 Small difference = BIG NNT 👉 Large difference = small NNT ⸻
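🧪 The NNT calculation from the card, as a sketch:

```python
# Remission rates from the card: fluoxetine 65%, dosulepin 60%.
arr = 0.65 - 0.60   # absolute difference in response (absolute benefit)
nnt = 1 / arr       # number needed to treat

print(round(nnt))   # 20
```

👉 Note the inverse relationship: halve the absolute difference and the NNT doubles.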
85
A multicentre trial to assess effectiveness of CBT-based treatment in early intervention for psychosis finds a significant effect in favour of the intervention. But secondary analysis reveals significant trial centre-effect on the outcome. Which of the following could possibly have resulted from such an effect?” Options: • Increase in magnitude of effect • Sampling error • Reduced precision of the outcome • No change in observed effect • Decrease in magnitude of effect ⸻
✅ Correct Answer: → Increase in magnitude of effect ⸻ 🧠 Now let’s make this EASY Step 1: What is “centre effect”? 👉 Different hospitals/centres give different results ⸻ Step 2: Why is this a problem? 👉 Patients in same centre are similar ➡️ Not independent ⸻ Step 3: What happens if we IGNORE this? 👉 We treat all patients as independent ➡️ This falsely: • makes data look stronger • exaggerates results ⸻ 💡 Result 👉 Effect size looks bigger than it actually is ➡️ Increase in magnitude of effect ⸻ 🎯 Simple analogy Imagine: • One hospital is AMAZING → all patients improve • Others average If you ignore clustering: 👉 It looks like treatment is AMAZING everywhere ⸻ 🔥 What error is this? 👉 Type I error (false positive) 👉 Overestimating effect ⸻ ❌ Why others wrong ❌ Sampling error 👉 Too vague ❌ Reduced precision 👉 Actually often looks MORE precise falsely ❌ No change 👉 There is an effect ❌ Decrease 👉 Opposite of what happens ⸻ 🧠 High-yield rule 👉 Cluster/centre effects → exaggerate treatment effect if ignored
86
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly. It does not wrongly diagnose anyone in a sample of 100 controls. 👉 How specific is this test? ⸻
✅ Correct answer: 100% ⸻ 💡 Step-by-step explanation (VERY SIMPLE) 1️⃣ First: What is specificity? 👉 Specificity = ability to correctly identify NON-diseased people Formula: 👉 Specificity = True Negatives / Total Non-diseased ⸻ 2️⃣ Translate the question into a table We have: Diseased (schizophrenia patients = 100) • 60 detected correctly → True Positives (TP) = 60 • 40 missed → False Negatives (FN) = 40 ⸻ Non-diseased (controls = 100) • “Does NOT wrongly diagnose anyone” 👉 Means: • False Positives (FP) = 0 • So all are correctly negative → True Negatives (TN) = 100 ⸻ 3️⃣ Now calculate specificity 👉 Specificity = TN / (TN + FP) 👉 = 100 / (100 + 0) 👉 = 100 / 100 = 100% ⸻ 🔥 Key insight (THIS is the trick) 👉 The moment you see: “does NOT wrongly diagnose anyone” 💥 That means: ➡️ False positives = 0 ➡️ Specificity = 100% ⸻ ❌ Common trap People focus on: 👉 “60 out of 100” That is sensitivity, NOT specificity. ⸻ 🚨 High-yield facts (Paper B GOLD) 1. Sensitivity = TP / all diseased 2. Specificity = TN / all non-diseased 3. No false positives → specificity = 100% 4. No false negatives → sensitivity = 100% ⸻ 🧠 Exam takeaway 👉 Specificity cares ONLY about controls (non-diseased) ⸻ 💥 One-line memory trick 👉 “No false positives = perfect specificity”
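🧪 The card’s 2×2 counts as a sketch — specificity looks only at the controls column:

```python
# Counts from the card: 100 patients (60 detected), 100 controls (0 false positives).
tp, fn = 60, 40   # patients: caught / missed
tn, fp = 100, 0   # controls: correctly cleared / wrongly flagged

sensitivity = tp / (tp + fn)   # 0.6 -> 60%
specificity = tn / (tn + fp)   # 1.0 -> 100% (no false positives)

print(sensitivity, specificity)
```

👉 Zero false positives forces specificity to 100%, regardless of how many cases the test misses.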
87
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly. It does not wrongly diagnose anyone in a sample of 100 controls. 👉 What is the Negative Predictive Value (NPV)? ⸻
💡 First — what is NPV? 👉 NPV = If the test is NEGATIVE → how likely the person is truly healthy Formula: 👉 NPV = True Negatives / All Negative Results ⸻ 🧠 Step 1: Build the table (same as before) Diseased (100 patients) • 60 correctly detected → TP = 60 • 40 missed → FN = 40 ⸻ Controls (100 people) • “No one wrongly diagnosed” 👉 FP = 0 👉 TN = 100 ⸻ 🧠 Step 2: Focus ONLY on NEGATIVE tests 👉 Who tested negative? • FN = 40 (they have disease but test says negative ❌) • TN = 100 (healthy and test negative ✅) 👉 Total negatives = 40 + 100 = 140 ⸻ 🧠 Step 3: Apply formula 👉 NPV = TN / (TN + FN) 👉 = 100 / (100 + 40) 👉 = 100 / 140 👉 = 0.714 = 71.4% ≈ 70% ⸻ ✅ Correct answer: 70% ⸻ 🔥 Why this confuses people You think: 👉 “No false positives → must be 100%” ❌ WRONG — that’s specificity, not NPV ⸻ 🚨 KEY DIFFERENCE (VERY HIGH-YIELD) 🔴 Specificity 👉 Looks ONLY at controls 👉 = TN / (TN + FP) ⸻ 🟢 NPV 👉 Looks at ALL NEGATIVE RESULTS 👉 Includes: • TN ✅ • FN ❌
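🧪 And the NPV from the same counts — note the denominator mixes in the false negatives, which is why it is not 100%:

```python
# Counts from the card: TP=60, FN=40, TN=100, FP=0.
tp, fn, tn, fp = 60, 40, 100, 0

npv = tn / (tn + fn)   # denominator = ALL negative results, not just controls
print(round(npv, 3))   # 0.714, i.e. ~70%
```

👉 Specificity (TN / controls) = 100% here, yet NPV = 71% — the two are different questions about different denominators.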
88
A new diagnostic test for Neuroleptic Malignant Syndrome is compared with gold standard 'clinical examination with lab test battery'. Out of 30 suspected cases, only 20 are gold standard positive. In addition, out of 20 cases ruled out as not to have NMS by the new test, 5 were diagnosed with NMS using gold standard test. What is the sensitivity of the new test?
💡 Step 1: What is sensitivity? 👉 Sensitivity = True Positives / All Diseased ➡️ Translation: “Out of all people who ACTUALLY have the disease, how many did the test catch?” ⸻ 🧠 Step 2: Extract the numbers CAREFULLY 🟢 Gold standard = TRUTH ⸻ From the question: 1️⃣ “Out of 30 suspected cases, only 20 are gold standard positive” 👉 So: • Total tested positive = 30 • True positives (TP) = 20 • Therefore: 👉 False positives (FP) = 10 (because 30 − 20) ⸻ 2️⃣ “Out of 20 ruled out by new test (test negative), 5 actually had NMS” 👉 This is KEY: • These are people: • test says negative ❌ • but actually have disease ✅ 👉 So: ➡️ False negatives (FN) = 5 ⸻ 🧠 Step 3: Total diseased 👉 Total diseased = TP + FN 👉 = 20 + 5 = 25 ⸻ 🧠 Step 4: Apply formula 👉 Sensitivity = TP / (TP + FN) 👉 = 20 / 25 ⸻ ✅ Correct answer: 20/25 = 80%
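🧪 The same extraction-then-formula steps as a sketch:

```python
# From the card: 30 test-positives, of whom 20 are gold-standard positive;
# 5 of the 20 test-negatives actually have NMS on the gold standard.
tp = 20
fp = 30 - 20   # test-positive but gold-standard negative
fn = 5         # test-negative but gold-standard positive

sensitivity = tp / (tp + fn)
print(sensitivity)   # 0.8 -> 80%
```

👉 The false positives (10) never enter the sensitivity formula — they would matter only for specificity or PPV.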
89
A new drug with activity in an animal model of depression is tested in patients with moderate depression. Out of 200 individuals, 100 are chosen using a computer-generated list to receive the drug. The drug is administered in a cup of chocolate drink, while the others receive pure chocolate drink. Both clinicians and patients are unaware of who received the medication. At the end of the study, a neutral observer unconnected to the trial measures depressive symptoms in all subjects. This study can be best described as: A. Randomised control trial B. Cross sectional study C. Case control study D. Cohort study E. Ecological study ⸻
✅ Correct answer A. Randomised control trial ⸻ 3️⃣ Clear, exam-focused explanation • The key clue is: • computer-generated list → randomisation • There is also: • intervention given to one group • control/placebo comparison • double blinding: • patients unaware • clinicians unaware • Neutral observer measuring outcome strengthens objectivity So this is a: • randomised • controlled • double-blind trial ⸻ 4️⃣ Why the other options are wrong B. Cross sectional study ❌ • Cross-sectional = snapshot at one point in time • No intervention, no randomisation C. Case control study ❌ • Starts with outcome, then looks back for exposure • Usually retrospective • No random allocation D. Cohort study ❌ • Follows exposed vs unexposed groups over time • Observational, not randomised intervention E. Ecological study ❌ • Unit of analysis is populations/groups, not individual patients ⸻ 5️⃣ ⭐ High-yield facts • Randomisation = strongest clue for RCT • Blinding reduces bias • Control group allows comparison • RCT = gold standard for testing treatment efficacy ⸻ 6️⃣ 🎯 One-line exam answer If patients are randomly allocated to treatment vs control, think randomised controlled trial.
90
A test is being evaluated to predict treatment response in geriatric depression using neuroimaging techniques. The overall results of the test are very close to those observed on longitudinal follow-up after treatment (gold standard), but individuals vary widely in the magnitude of results produced. Which of the following correctly describes the properties of this test? A. Neither precise nor accurate B. Not precise but accurate C. Precise but not accurate D. Precise and accurate E. Accurate and sensitive ⸻
✅ Correct answer B. Not precise but accurate ⸻ This question is testing the difference between accuracy and precision. Accuracy • Means how close results are to the true value / gold standard • Here: • “overall results are very close to gold standard” • so the test is accurate Precision • Means how consistent / tightly clustered the results are • Here: • “individuals vary widely” • so the test is not precise So: • accurate • but not precise ⭐ High-yield facts • Accuracy = closeness to truth • Precision = reproducibility / consistency • Widely scattered results = poor precision • Average close to gold standard = good accuracy
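💻 A toy illustration with invented readings: the mean lands exactly on the gold-standard value (accurate), but the readings scatter widely (not precise):

```python
from statistics import mean, stdev

true_value = 50                       # hypothetical gold-standard result
readings = [20, 35, 50, 65, 80]       # hypothetical test results: wide scatter

print(mean(readings))                 # 50.0 -> mean equals truth: ACCURATE
print(stdev(readings))                # large spread: NOT PRECISE
```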
91
A plot of the normal standard deviate against the reciprocal of the standard error can be used to study both heterogeneity and publication bias. This plot is known as: A. Scatter plot B. Survival plot C. Kaplan-Meier plot D. ROC curve E. Galbraith plot ⸻
✅ Correct answer: E. Galbraith plot ⸻ 💡 Explanation (EXAM LOGIC) 👉 This question is PURE pattern recognition. Key phrases: • “normal standard deviate” • “reciprocal of standard error” 💥 That combination = Galbraith plot ⸻ 🧠 What is a Galbraith plot? • A meta-analysis tool • Used to: detect heterogeneity explore publication bias
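💻 The plot’s coordinates are easy to compute: x = 1/SE (precision) and y = effect/SE (the “normal standard deviate”). A sketch with invented study effects and standard errors:

```python
# Hypothetical meta-analysis: (effect estimate, standard error) for each study
studies = [(0.50, 0.10), (0.30, 0.15), (0.45, 0.08)]

# Galbraith coordinates: x = 1/SE (precision), y = effect/SE (standard normal deviate)
points = [(1 / se, effect / se) for effect, se in studies]
for x, z in points:
    print(f"x = 1/SE = {x:.1f},  y = z = {z:.2f}")
```

More precise studies sit further right; points straying far from the regression line through the origin suggest heterogeneity.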
92
A primary care depression screening tool is evaluated by testing whether those diagnosed as “depressed” using this scale are also found to be depressed using: • Beck’s Depression Inventory • Hamilton Depression Scale • Clinician assessment What is being measured? A. Internal consistency B. Divergent validity C. Convergent validity D. Predictive validity E. Inter-rater reliability ⸻
✅ Correct answer: C. Convergent validity ⸻ 💡 Explanation (EXAM LOGIC) 👉 Key idea: You are comparing your new test with OTHER established tests measuring the SAME thing ⸻ 🧠 What is convergent validity? 👉 Does your test agree with other tests measuring the same construct? Here: • All tools measure depression • If they agree → good convergent validity ⸻ ⚠️ Why not others? ❌ Internal consistency • Do items within the SAME test agree? ⸻ ❌ Divergent validity • Should NOT correlate with unrelated constructs ⸻ ❌ Predictive validity • Predicts FUTURE outcomes ⸻ ❌ Inter-rater reliability • Agreement between different raters
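💻 Convergent validity in practice = correlating the new scale against an established one. A sketch with hypothetical paired scores (the Pearson helper is written out in plain Python):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired scores: new screening scale vs Beck's Depression Inventory
new_scale = [4, 9, 12, 15, 20, 25]
bdi       = [10, 13, 22, 24, 35, 38]

r = pearson_r(new_scale, bdi)
print(f"r = {r:.2f}")   # high r -> the two depression measures agree -> convergent validity
```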
93
A study finds urban living increases risk of schizophrenia. However, cannabis use was not measured. 👉 What is cannabis use in this context? Options: A. Additive factor B. Contaminating factor C. Causal factor D. Confounding factor E. Placebo factor ⸻
✅ Correct answer: D. Confounding factor ⸻ 🧠 Explanation (EXAM LOGIC) 👉 A confounder is: ✔ Associated with the exposure (urban living) ✔ Associated with the outcome (schizophrenia) ❌ NOT on the direct causal pathway ⸻ Apply it here: • Urban living → ↑ cannabis use ✅ • Cannabis use → ↑ schizophrenia risk ✅ • But cannabis ≠ caused by urban living directly in pathway ❌ 👉 So cannabis distorts the relationship ➡️ That’s a confounder ⸻ 🚨 High-yield trap Many people pick: 👉 “Causal factor” ❌ Because cannabis does increase schizophrenia risk BUT: 👉 The question is about study design, not biology ⸻ 💥 Exam pearl 👉 If a variable is: • linked to BOTH exposure & outcome • not measured → ALWAYS think confounding
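💻 A toy simulation (all probabilities invented): here urban living has NO direct effect on psychosis, yet the crude comparison still shows excess urban risk, purely because cannabis use is commoner in cities:

```python
import random

random.seed(0)                 # fixed seed -> reproducible toy simulation
N = 50_000                     # simulated people per setting

def one_person(urban):
    """Cannabis use depends on setting; psychosis depends ONLY on cannabis."""
    cannabis = random.random() < (0.50 if urban else 0.10)      # urban -> more cannabis
    psychosis = random.random() < (0.10 if cannabis else 0.02)  # NO direct urban effect
    return psychosis

urban_risk = sum(one_person(True) for _ in range(N)) / N    # expected ~0.060
rural_risk = sum(one_person(False) for _ in range(N)) / N   # expected ~0.028

print(f"crude risk ratio = {urban_risk / rural_risk:.2f}")  # well above 1 -> spurious 'urban effect'
```

Measuring and adjusting for cannabis (stratification or regression) would make the urban effect disappear: that is exactly what an unmeasured confounder hides.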
94
RCT comparing two treatments for alcohol use disorder Outcome = time to first drink (relapse) 👉 Which statistical test compares the two groups? Options: A. Logarithmic transformation B. Log-linear analysis C. Log-rank test D. Logistic regression E. Log-based t test ⸻
✅ Correct answer: C. Log-rank test ⸻ 🧠 Explanation (EXAM LOGIC) 👉 This is: ✔ Time-to-event data ✔ Event = relapse ✔ Comparing two groups over time ⸻ So we use: 1️⃣ Kaplan–Meier → draw survival curves 2️⃣ Log-rank test → compare curves ⸻ 🔥 KEY ASSOCIATION (MUST MEMORISE) 👉 “Time to event” = ➡️ Kaplan-Meier + Log-rank test If you see: 👉 “time to relapse / survival / duration” → Log-rank test ❌ A. Logarithmic transformation What it actually is: 👉 A data transformation technique Used when: • Data is skewed (e.g. income, biomarkers) • You want to make it more normally distributed ⸻ ❌ Why it’s wrong here: • It does NOT compare groups • It does NOT analyse time-to-event data • It’s just pre-processing, not a test 👉 Think: “Log transformation = fixing data shape, not testing hypotheses” ⸻ ❌ B. Log-linear analysis What it actually is: 👉 Used for categorical data relationships 👉 Especially in contingency tables Example: • Relationship between gender × diagnosis × treatment ⸻ ❌ Why it’s wrong here: • No categorical interaction being tested • Outcome is time, not categories 👉 Think: “Log-linear = categories talking to each other, not time passing” ⸻ ❌ D. Logistic regression What it actually is: 👉 Used when outcome is binary Examples: • Dead vs alive • Relapsed vs not relapsed ⸻ ❌ Why it’s wrong here: • Here we care about WHEN relapse happens, not just IF • Logistic regression would ignore timing completely ⸻ 🔥 Key distinction: • Logistic regression → “Did relapse happen?” • Survival analysis → “When did relapse happen?” 👉 This question = WHEN → survival analysis → log-rank ⸻ ❌ E. Log-based t test What it even is: 👉 Not a standard exam-relevant test (essentially a distractor) Closest idea: • t-test compares means of continuous variables ⸻ ❌ Why it’s wrong: • Time-to-event data is not analysed with t-tests • Doesn’t handle: censoring (VERY important in survival data) time dimension
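💻 A minimal Kaplan–Meier estimator in plain Python (relapse times invented). The log-rank test then compares curves like this between the two treatment arms:

```python
def kaplan_meier(times, events):
    """Minimal Kaplan-Meier survival estimate.
    times  : time to relapse (or to censoring)
    events : 1 = relapse observed, 0 = censored (lost / still abstinent)
    Returns [(event_time, survival_probability)]."""
    pairs = list(zip(times, events))
    curve, surv = [], 1.0
    for t in sorted({t for t, e in pairs if e == 1}):       # distinct relapse times
        n_at_risk = sum(1 for tt, _ in pairs if tt >= t)    # still 'surviving' at t
        d = sum(1 for tt, e in pairs if tt == t and e == 1) # relapses exactly at t
        surv *= 1 - d / n_at_risk
        curve.append((t, surv))
    return curve

# Hypothetical arm: relapses at weeks 2 and 5; one person censored at week 3,
# one still abstinent when the study ended at week 8
curve = kaplan_meier([2, 3, 5, 8], [1, 0, 1, 0])
print(curve)   # [(2, 0.75), (5, 0.375)]
```

Note how the censored person at week 3 still counts in the week-2 risk set but drops out afterwards: handling censoring is exactly what a t-test cannot do.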
95
Uncertainty in cost-effectiveness estimates is best summarised using which method? Options: A. Cost-effectiveness plane B. ROC curve C. Willingness curve D. Cost-utility curve E. Cost-effectiveness acceptability curve ⸻
✅ Correct answer: E. Cost-effectiveness acceptability curve (CEAC) ⸻ 🧠 Why this is correct (core idea) 👉 CEAC shows: ➡️ Probability that an intervention is cost-effective ➡️ Across different willingness-to-pay thresholds 👉 It directly answers: “Given uncertainty, how confident are we this treatment is worth it?” ⸻ ❌ Why the others are WRONG ❌ A. Cost-effectiveness plane 👉 What it does: • Plots incremental cost vs incremental effect ⸻ 👉 Why wrong: • Shows distribution of results • BUT does NOT summarise uncertainty as probability 💡 Think: Plane = picture CEAC = decision-making ⸻ ❌ B. ROC curve 👉 Used for: • Diagnostic test performance • Sensitivity vs specificity ⸻ 👉 Why wrong: • NOTHING to do with economics 🚨 Easy elimination ⸻ ❌ C. Willingness curve 👉 Sounds tempting ❗ (exam trap) 👉 Reality: • Not a standard/statistical tool in health economics exams ⸻ 👉 Why wrong: • CEAC already incorporates willingness-to-pay 💡 Trap = wording similarity ⸻ ❌ D. Cost-utility curve 👉 What it is: • Cost per QALY framework (type of analysis) ⸻ 👉 Why wrong: • It’s a method of evaluation, not a way to show uncertainty
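💻 Each CEAC point is just: out of the simulated (Δcost, Δeffect) pairs, what fraction has positive net monetary benefit λ·ΔE − ΔC at willingness-to-pay λ? A sketch with invented bootstrap draws:

```python
# Hypothetical bootstrap draws of (incremental cost in £, incremental QALYs)
draws = [(500, 0.05), (800, 0.02), (300, 0.04), (1200, 0.01),
         (600, 0.03), (400, 0.06), (900, 0.02), (700, 0.05)]

def ceac_point(wtp):
    """Probability the intervention is cost-effective at willingness-to-pay `wtp` per QALY."""
    nmb_positive = sum(1 for cost, effect in draws if wtp * effect - cost > 0)
    return nmb_positive / len(draws)

for wtp in (10_000, 20_000, 30_000):
    print(f"λ = £{wtp:,}: P(cost-effective) = {ceac_point(wtp):.3f}")
# The probability rises with willingness-to-pay - plotting it over λ gives the CEAC
```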
96
Which study design approach reduces confounding? Options: A. Increasing sample size B. Consecutive sampling C. Volunteer selection D. Randomization E. Pragmatic trial approach ⸻
✅ Correct answer: D. Randomization ⸻ 🧠 Why this is correct 👉 Randomization: ✔ Distributes confounders equally between groups ✔ Includes known AND unknown confounders 👉 This is the ONLY method that truly balances unknown confounders ⸻ ❌ Why the others are WRONG ⸻ ❌ A. Increasing sample size 👉 What it does: • Improves precision • Reduces random error ⸻ 👉 Why wrong: • Does NOT fix systematic bias (confounding) 💡 Big study ≠ unbiased study ⸻ ❌ B. Consecutive sampling 👉 What it does: • Takes patients in order ⸻ 👉 Why wrong: • Helps reduce selection bias • Does NOTHING for confounders ⸻ ❌ C. Volunteer selection 👉 What it does: • People choose to participate ⸻ 👉 Why wrong: • Actually INTRODUCES bias (volunteer bias) ⸻ ❌ E. Pragmatic trial 👉 What it does: • Real-world applicability ⸻ 👉 Why wrong: • Improves external validity • Does NOT control confounding
97
An RCT shows time to HIV infection over months for: • Placebo • TDF • TDF–FTC 👉 Graph shows cumulative probability over time What type of graph is this? Options: • Scatter plot • L’Abbé plot • Galbraith plot • Survival plot • Forest plot
✅ Correct answer: Survival plot (Kaplan–Meier curve) ⸻ 🧠 WHY this is correct Look at the key features 👇 👉 X-axis = time (months) 👉 Y-axis = probability (cumulative incidence) 👉 Step-like curves 👉 Multiple groups compared ⸻ 💡 This = survival analysis Even though it says: “cumulative probability of HIV” 👉 That’s just the inverse of survival • Survival plot = probability of NOT having event • This graph = probability of HAVING event 👉 Same method → Kaplan–Meier ⸻ 🔥 KEY EXAM TRICK 👉 If you see: • Time on X-axis • Step-like curves • Multiple treatment groups ➡️ ALWAYS = Kaplan–Meier / survival plot
98
A research study was designed to have a power of 80% to detect a 15% difference in the mean outcomes of two study groups. How can this statement be interpreted? Select one: • The investigator had an 80% chance of rejecting the null hypothesis if it is actually true. • The probability of rejecting a false null hypothesis is 20% • The investigator had an 80% chance of retaining the null hypothesis if it is actually false • This vague statement cannot be interpreted meaningfully • The investigator had an 80% chance of detecting a difference of 15% or more if it was actually present
✅ Correct answer: E ⸻ 🧠 CORE CONCEPT (THIS IS EVERYTHING) 👉 Power = probability of detecting a true effect 👉 Formula: Power = 1 − β (Type II error) ⸻ 💡 So: If power = 80% 👉 There is: • 80% chance of detecting a true difference • 20% chance of missing it (Type II error) ⸻ What is the question REALLY asking? Study has 80% power to detect a 15% difference 👉 Translation: • If a real difference exists (≥15%) • The study will successfully find it 80% of the time ⸻ 🎯 Put it in human language Imagine: 👉 There REALLY IS a 15% difference between groups You repeat the study many times… ➡️ In 80 out of 100 times, the study will: ✔ detect it ✔ give a significant result ➡️ In 20 out of 100 times, it will: ❌ miss it (β = Type II error = 20%) ⸻ ✅ Now look at the correct answer “The investigator had an 80% chance of detecting a difference of 15% or more if it was actually present” 👉 This is EXACTLY what we just said ✔ real difference exists ✔ study detects it 80% of the time ⸻ ❌ Why the other options are wrong (this is the key) ⸻ ❌ “80% chance of rejecting null if it is actually true” 👉 This is describing: ➡️ Type I error (alpha) 💡 Rejecting a TRUE null = false positive 👉 Power has NOTHING to do with this ⸻ ❌ “Probability of rejecting a false null is 20%” 👉 Rejecting a false null = detecting a real effect 👉 That should be 80% (power) ❌ Not 20% ⸻ ❌ “80% chance of retaining null if it is false” 👉 Retaining a false null = missing a real effect 👉 That is: ➡️ Type II error (β) = 20% ❌ Not 80% ⸻ ❌ “This cannot be interpreted” 👉 It absolutely can — very standard statement
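💻 Power can be approximated with the normal formula power = Φ(δ/SE − z₁₋α/₂). A sketch with hypothetical numbers (SD 30, 64 patients per arm) chosen to land near 80%:

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sample z-test to detect a true mean difference `delta`."""
    z = NormalDist()
    se = sigma * sqrt(2 / n_per_group)    # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)     # 1.96 for alpha = 0.05
    return z.cdf(delta / se - z_crit)     # P(study detects the true difference)

p = power_two_means(delta=15, sigma=30, n_per_group=64)
print(f"power ≈ {p:.2f}")   # ~0.8, i.e. ~20% chance of a Type II error
```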
99
A research team sets out to review the evidence base for combining antidepressants to treat resistant depression. For the literature search to be complete, which of the following strategies should be avoided? Select one: • Using multiple databases to search • Discarding conference posters as data might not be peer reviewed • Writing to experts for any missing studies • Retrieving multiple language papers • Cross checking reference list from identified studies
✅ Correct answer: 👉 B. Discarding conference posters as data might not be peer reviewed ⸻ 💡 Explanation (exam-focused) 👉 The question is testing: ➡️ How to avoid publication bias in systematic reviews ⸻ 🔑 Key principle: 👉 A complete literature search must include: • Published studies ✔️ • Unpublished data ✔️ • Grey literature ✔️ ⸻ 🧠 What are conference posters? 👉 They are: • Grey literature • Often unpublished studies • Sometimes include negative or non-significant results ⸻ ❗ Why you should NOT discard them: 👉 If you remove them: ➡️ You introduce publication bias Because: • Positive studies → published • Negative studies → often only in posters ⸻ ❌ Why the other options are correct (and should NOT be avoided) ⸻ ✅ A. Using multiple databases • Increases coverage • Reduces missing studies ⸻ ✅ C. Writing to experts • Helps find unpublished or ongoing studies ⸻ ✅ D. Retrieving multiple language papers • Avoids language bias ⸻ ✅ E. Cross-checking references • Finds studies missed in database search
100
A researcher compares two diagnostic tests for alcohol dependence, one of which is considered to be a ‘gold standard’. What is the single most important issue to consider when assessing the validity of such a study? Select one: A. The sensitivity and specificity of the test. B. The inter-rater reliability. C. Whether the test and ‘gold standard’ were applied independently D. The impact factor of the journal. E. The test-retest reliability of the ‘gold standard’. ⸻
✅ Correct answer: 👉 C. Whether the test and ‘gold standard’ were applied independently ⸻ 🔍 Explanation (exam-focused) • Key issue = avoidance of bias • Specifically → work-up bias (verification bias) 👉 If: • Only test-positive patients get the gold standard → results become distorted ⸻ ❗ Why independence matters: • Both tests must be applied to ALL participants • And blinded from each other 👉 Otherwise: • Sensitivity ↑ falsely • Specificity ↓ falsely ⸻ ❌ Why others are wrong: • A. Sensitivity & specificity • These are outcomes, not the main validity threat • B. Inter-rater reliability Measures agreement, not diagnostic validity • D. Impact factor Irrelevant (classic distractor) • E. Test-retest reliability About consistency, not bias in validation ⸻ 🎯 Exam takeaway 👉 Gold standard must be applied independently to ALL patients → prevents verification/work-up bias 🧠 First: what are we worried about? In diagnostic studies: 👉 We compare a new test vs a gold standard BUT… 👉 If we don’t apply both tests properly → bias creeps in ⸻ 🚨 1. Work-up bias (Verification bias) 👉 These are basically the same thing (different names) ⸻ 💡 Definition (simple): Not everyone gets the gold standard test ⸻ 🎯 What usually happens: • Patient does screening test first • Only positives go on to get the gold standard 👉 Negatives are ignored ⸻ 🔴 Example: New test for depression: • Positive → seen by psychiatrist (gold standard) • Negative → sent home ⸻ ❗ What’s the problem? 👉 You never confirm if negatives were truly negative ➡️ You miss false negatives ⸻ 📉 Effect on results: • Sensitivity → looks higher than it really is • Specificity → also distorted ⸻ 💥 One-line: 👉 Work-up bias = only some people get the gold standard
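💻 A quick arithmetic sketch (invented counts) of how work-up bias inflates sensitivity when only test-positives get the gold standard:

```python
# Hypothetical truth: 100 diseased people; the new test catches 80 (TP), misses 20 (FN)
tp, fn = 80, 20
true_sensitivity = tp / (tp + fn)                # 0.8

# Work-up bias: only test-POSITIVES are sent for gold-standard confirmation,
# so the 20 false negatives are never identified or counted
verified_fn = 0
apparent_sensitivity = tp / (tp + verified_fn)   # looks falsely perfect

print(true_sensitivity, apparent_sensitivity)    # 0.8 1.0
```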
101
A researcher finds that consumed cannabis dose (CD) correlates with Positive and Negative Syndrome Scale (PANSS) score [Pearson’s correlation p = 0.04]. He also finds that consumed nicotine dose (ND) correlates with PANSS score with p = 0.03. Which of the following is an incorrect conclusion? Select one: A. Cannabis dose is significantly correlated with PANSS B. Parametric statistics has been used in this study C. Relationship between cannabis and PANSS is stronger than relationship between nicotine and PANSS D. PANSS score has been possibly treated as a continuous variable E. Nicotine dose is significantly correlated with PANSS ⸻
✅ Correct answer: 👉 C. Relationship between cannabis and PANSS is stronger than relationship between nicotine and PANSS ⸻ 🔍 Explanation (exam-focused) • Both p-values: • 0.04 → significant • 0.03 → significant 👉 BUT: • p-value ≠ strength of relationship ⸻ 🧠 What is missing? 👉 Correlation coefficient (r) • That tells strength • p-value only tells statistical significance ⸻ ❌ Why option C is wrong: • You cannot compare strength using p-values • Smaller p ≠ stronger association ⸻ ✅ Why the others are correct: • A & E: Both p < 0.05 → significant ✔️ • B: Pearson correlation → parametric ✔️ 🔑 Step 1: What did the question say? 👉 It said: Pearson’s correlation (p = 0.04) ⸻ 🧠 Step 2: What does Pearson automatically imply? 👉 Pearson = parametric test So the moment you see: ➡️ Pearson correlation 💥 You should instantly think: ➡️ Parametric statistics is being used ⸻ 🧠 Step 3: Why is Pearson “parametric”? Because it assumes: • Data is continuous • Data is normally distributed • Relationship is linear 👉 These are parametric assumptions ⸻ 🔥 So option B is basically asking: 👉 “Since Pearson was used… does that mean parametric statistics was used?” ✔️ YES → correct • D: PANSS treated as continuous ✔️ 🧠 What is a “continuous variable”? 👉 A number that can take MANY values Examples: • Height • Weight • Blood pressure • PANSS score ⸻ ❗ PANSS specifically: • It’s a score • Made of multiple items • Final result = number (e.g. 65, 72, 83) 👉 That makes it continuous ⸻ 🔥 Why does correlation matter here? 👉 Correlation (Pearson) requires: • TWO continuous variables ⸻ 🧠 In this question: They correlated: • Cannabis dose (number) • PANSS score (number) 👉 So PANSS must be treated as continuous ✔️ Therefore option D = TRUE ⸻ 🎯 Exam takeaway (VERY HIGH-YIELD) 👉 Never compare strength of relationships using p-values 👉 Always need r (correlation coefficient)
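💻 Why p can’t rank strength: the p-value for Pearson’s r comes from t = r·√(n−2)/√(1−r²), so it depends on n as well as r. With invented numbers, a weak correlation in a big sample beats a strong correlation in a tiny one:

```python
from math import sqrt

def t_stat(r, n):
    """t statistic behind the p-value for Pearson's r (df = n - 2): bigger t -> smaller p."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

strong_small = t_stat(0.9, 5)      # strong correlation, tiny sample
weak_large   = t_stat(0.3, 200)    # weak correlation, large sample

print(f"t(strong, n=5) = {strong_small:.2f}, t(weak, n=200) = {weak_large:.2f}")
# The WEAKER correlation produces the larger t (hence the SMALLER p),
# which is exactly why p-values cannot be used to rank strength of association.
```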
102
Correlation coefficient: Pearson’s (r) vs Spearman’s (ρ, rho)
🧠 1. What is “correlation” in the first place? 👉 Correlation = “Do two variables move together?” Examples: • Dose ↑ → symptoms ↓ • Stress ↑ → anxiety ↑ ⸻ 📊 Correlation coefficient (r) • Ranges from -1 → +1 🔹 2. Pearson vs Spearman (THE big exam distinction) 🔵 Pearson’s correlation (r) 👉 Measures: Linear relationship between two continuous variables ⸻ ✅ Requirements: • Both variables = continuous • Data ≈ normally distributed • Relationship = linear (straight line) ⸻ 🧠 Example: • PANSS score vs cannabis dose • Height vs weight ⸻ 📈 What it looks like: Points form a straight-ish line ⸻ 🟢 Spearman’s correlation (rho, ρ) 👉 Measures: Monotonic relationship (not necessarily linear) ⸻ ✅ Requirements: • Data can be: • Ordinal OR continuous • NO need for normal distribution ⸻ 🧠 Example: • Rank in class vs exam score • Severity scale (mild/mod/severe) vs outcome ⸻ 📈 What it looks like: • Can be curved • As long as it goes consistently ↑ or ↓ 🧠 One-line memory hack 👉 Pearson = precise numbers 👉 Spearman = ranks / rough order ⸻ 🚨 VERY COMMON EXAM TRAPS ❌ Trap 1: Using Pearson when data is skewed → WRONG 👉 Use Spearman ⸻ ❌ Trap 2: Thinking both measure “any relationship” 👉 Pearson = linear only 👉 Spearman = any consistent trend ⸻ 🔹 3. What else do you NEED to know about correlation coefficient? 
This is where examiners get sneaky 👇 ⸻ 🧠 (1) Correlation ≠ causation 👉 Just because two things move together ❌ does NOT mean one causes the other Example: • Ice cream sales ↑ • Drowning ↑ 👉 Confounder = summer ☀️ ⸻ 🧠 (2) Strength vs significance • r value → strength of relationship • p value → whether it’s statistically significant 🔥 Example: • r = 0.8, p = 0.2 → strong but NOT significant • r = 0.2, p = 0.01 → weak but significant ⸻ 🧠 (3) Outliers can RUIN Pearson 👉 One extreme value → distorts line ➡️ Spearman is more robust ⸻ 🧠 (4) Correlation only detects LINEAR (Pearson) 👉 If relationship is curved: • Pearson → may say no correlation • Spearman → still detects it ⸻ 🧠 (5) Units don’t matter 👉 Correlation is unit-free • kg vs cm → still same r ⸻ 🔑 How do you decide WHICH one? 🔵 Use Pearson if: • Data is continuous (numbers like PANSS, weight, BP) • Data is normally distributed • Relationship is linear 👉 Example: • PANSS score vs cannabis dose ✔️ → Pearson ⸻ 🟢 Use Spearman if: • Data is ordinal (rank, Likert scale, mild/mod/severe) • OR data is not normally distributed • OR relationship is not linear 👉 Example: • Rank in class vs performance ✔️ → Spearman ⸻ 🔥 Exam shortcut (VERY important) 👉 If you see: • Pearson mentioned → parametric → continuous • Spearman mentioned → non-parametric → ordinal / skewed
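💻 A toy demonstration (y = x³, so the trend is perfectly monotonic but curved): Spearman, which is just Pearson computed on the ranks, scores 1.0 while Pearson does not:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))

def ranks(v):
    """Rank-transform (assumes no ties, which holds for this toy data)."""
    order = sorted(v)
    return [order.index(a) + 1 for a in v]

x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]        # 1, 8, 27, 64, 125 -> monotonic but strongly curved

print(round(pearson(x, y), 3))                # 0.943 -> Pearson penalised by the curve
print(round(pearson(ranks(x), ranks(y)), 3))  # 1.0 -> Spearman sees a perfect monotonic trend
```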
103
A research team sets out to find an association between domestic abuse and mid-life depression. They conclude that a significant association is present between domestic abuse and depression. All of the following can explain the study result except Select one: A. An unknown factor mediates the relationship between the studied factors, albeit not in the causal pathway B. The researcher has committed a type 1 error C. The sample examined has a chance association D. The researcher has committed a type 2 error E. The researcher has committed a systematic error leading to this result ⸻
✅ Correct answer: 👉 D. The researcher has committed a type 2 error ⸻ 🔍 Explanation (exam-focused) 👉 They found a significant association So: ➡️ They rejected the null hypothesis ⸻ 🧠 What is Type 2 error? 👉 False negative • There IS a real effect • But you fail to detect it ⸻ ❗ Why this cannot explain the result: 👉 Here they DID find a significant association ➡️ So they did NOT miss an effect ❌ Therefore → NOT Type 2 error ⸻ ❌ Why the others CAN explain the result: ⸻ ✅ A. Unknown factor (confounding) • Third variable explains the association ⸻ ✅ B. Type 1 error • False positive (very important) • You detect association when none exists ⸻ ✅ C. Chance association • Random variation → false positive ⸻ ✅ E. Systematic error (bias) • Study design flaw → distorted result ⸻ 🎯 Exam takeaway 👉 If result is significant → think: • Type 1 error ✔️ possible • Type 2 error ❌ NOT possible