Stats Flashcards

(103 cards)

1
Q

Which of the following can be considered a strength of a study?

A

✅ Use of an interdisciplinary approach to validate emerging themes

🔹 Concept: Triangulation in qualitative research
• Using multiple coders or an interdisciplinary team (psychiatrists, psychologists, social workers, etc.) to discuss and refine themes = triangulation.
• Triangulation = validation → increases credibility and trustworthiness.
• Hence, a major strength in qualitative analysis.

2
Q

🧠 Qualitative-Research: Strengths vs Weaknesses

A

STRENGTHS – what increases credibility / validity
&
WEAKNESSES – what reduces reliability / generalisability

  1. Triangulation:
    🔹 Multiple coders or an interdisciplinary team validate emerging themes → ↑ credibility
    🔹 Data-source triangulation (interviews + focus groups + records)
    ❌ Reliance on a single researcher’s interpretation (no cross-checking)

✨ Triangulation = gold standard for qualitative rigour

  2. Sampling:
    🔹 Purposive / theoretical sampling ensures data from those with direct, relevant experience
    🔹 Data saturation achieved (no new themes emerging)
    ❌ Convenience / non-random sampling → selection bias
    ❌ Small / unbalanced sample → limits transferability

✨ Qualitative research aims for depth, not representativeness

  3. Interviews and Data Collection:
    🔹 Semi-structured / open-ended interviews capture lived experience
    🔹 Reflexivity logs show awareness of interviewer influence
    ❌ Single interviewer → interviewer bias
    ❌ Overly structured questions → constrain data richness

✨ Reflexivity = acknowledging subjectivity strengthens trustworthiness

  4. Analysis:
    🔹 Systematic coding with theme verification by ≥2 analysts
    🔹 Member checking (participants review summaries)
    ❌ Lack of coding transparency
    ❌ Themes not linked back to raw data (quotes)

✨ MRCPsych loves “thematic analysis verified by multiple coders”

  5. Context / Data depth:
    🔹 Thick description – rich contextual quotes and detailed setting
    ❌ Superficial summaries without context

✨ “Rich data” is a buzzword for qualitative strength

  6. Reflexivity & Audit Trail:
    🔹 Researcher positionality statement – reflects on biases
    🔹 Audit trail – clear documentation of analytic steps
    ❌ No reflexive statement → reader can’t assess bias

✨ Shows methodological transparency

  7. Ethics / Validity:
    🔹 Participant consent & confidentiality addressed
    🔹 Peer debriefing to discuss analytic decisions
    ❌ Potential coercion, lack of anonymisation

✨ Demonstrates procedural rigour

  8. Transferability (vs generalisability):
    🔹 Findings described clearly so readers can judge applicability to other settings
    ❌ Overclaiming generalisability

✨ In qualitative work, say “transferable,” not “generalisable.”

3
Q

Which of the following can be considered a weakness of a study?

A

✅ Using a single interviewer for all interviews

🔹 Concept: Inter-rater variability / Reflexivity bias
• A single interviewer limits variation in data collection and may bias participant responses based on interviewer tone, style, or expectations.
• Using multiple interviewers with calibration can reduce researcher bias and improve reliability.

4
Q

Biases in Qualitative Research

A

🧩 1️⃣ Selection Bias
• Definition: Participants are not representative of the population being studied.
• Example: Only interviewing patients who agreed to take part in a psychotherapy evaluation — those with negative experiences may decline.
• Why it matters: Limits transferability of findings.
• SPMM clue: “Carefully selected participants” or “voluntary participation” → selection bias.

🧩 2️⃣ Recall Bias
• Definition: Participants may not accurately remember past experiences.
• Example: Asking discharged inpatients to recall how safe they felt during admission.
• Prevention: Collect contemporaneous accounts or triangulate data (e.g. use notes, staff interviews).
• SPMM clue: “Interviews conducted after discharge / after treatment” → recall bias.

🧩 3️⃣ Moderator / Interviewer Bias
• Definition: The interviewer’s tone, phrasing, facial expression, gender, or preconceptions shape participants’ responses.
• Example: Interviewer nods approvingly when a participant says staff were kind → participant gives more positive answers.
• Prevention: Use multiple interviewers, standard topic guides, or reflexive journals.
• SPMM clue: “Single interviewer” → moderator bias.

🧩 4️⃣ Social Desirability Bias
• Definition: Participants modify responses to appear favourable or avoid judgement.
• Example: Patients report medication adherence more positively than reality.
• Prevention: Assure confidentiality and neutrality.
• SPMM clue: “Sensitive topic” + “face-to-face interview” → social desirability bias.

🧩 5️⃣ Confirmation Bias
• Definition: Researcher subconsciously looks for data supporting pre-existing beliefs.
• Example: A researcher who believes hospital care is coercive interprets neutral statements as negative.
• Prevention: Reflexivity, triangulation (multiple coders, peer review).
• SPMM clue: “Researcher with prior experience / strong views” → confirmation bias.

🧩 6️⃣ Attrition Bias
• Definition: Some participants drop out before completion; their views may differ from those who remain.
• Example: In longitudinal interviews, distressed participants withdraw.
• Prevention: Ensure follow-up and analyse dropouts.
• SPMM clue: “Loss of participants over time” → attrition bias.

🧩 7️⃣ Interpretive Bias
• Definition: Researcher’s personal lens affects interpretation of themes.
• Example: Coding themes based on what the researcher expected to find.
• Prevention: Independent coding by multiple researchers (inter-rater validation).
• SPMM clue: “Themes validated by interdisciplinary team” → mitigates interpretive bias.

🧩 8️⃣ Publication Bias (less common in qual.)
• Definition: Positive or novel findings more likely to be published.
• Example: Studies finding dissatisfaction with services are more likely to be accepted for publication.
• Prevention: Pre-register studies or publish all findings.

🧠 Mitigation Strategies
  • Triangulation
    Using multiple researchers, data sources, or methods → ↑ validity
  • Reflexivity
    Acknowledging & documenting the researcher’s influence
  • Member checking
    Participants review accuracy of interpretations
  • Audit trail
    Transparent documentation of how coding & themes developed
  • Thick description
    Provides context → aids transferability

💡 SPMM Exam Tip

When you see:
• “Single interviewer” → Moderator bias
• “Retrospective interview” → Recall bias
• “Participants selected by convenience or staff recommendation” → Selection bias
• “Researcher previously worked in same setting” → Confirmation bias / reflexivity issue

5
Q

🧩 Types of Qualitative Interview Procedures

A

🟢 1. Unstructured Interviews
• Definition: Completely open conversation; no fixed questions or time limit.
• Purpose: Explore a topic in depth when little prior knowledge exists.
• Example: “Tell me about your experience of living with schizophrenia.”
• Advantages: Very rich data; spontaneous discoveries.
• Disadvantages: Hard to replicate; data vary greatly between interviews; analysis is complex.
• SPMM clue: “No topic guide,” “free discussion,” “participant led.”

🟡 2. Semi-Structured Interviews ✅ Most common in psychiatry
• Definition: Guided by a topic guide or list of key questions but flexible follow-up probes allowed.
• Purpose: Balance consistency (between interviews) with depth.
• Example: “How do you feel about your current medication?” → follow-ups depending on response.
• Advantages: Allows comparison and exploration; most compatible with thematic analysis, IPA, grounded theory.
• Disadvantages: Still subject to interviewer bias; requires skilled moderation.
• SPMM clue: Mentions “topic guide,” “time-limited interviews,” “thematic analysis.”

🔵 3. Structured Interviews
• Definition: Pre-set, standardised questions in fixed order and wording.
• Purpose: Ensure uniformity across participants.
• Used in: Quantitative or mixed-method studies (e.g., SCID, MINI).
• Advantages: Reduces interviewer bias; easy to replicate.
• Disadvantages: Limited depth; may miss new insights.
• SPMM clue: “Closed questions,” “checklist,” “same questions to all.”

🟣 4. Focus Groups
• Definition: Group interview (6–10 participants) guided by a facilitator.
• Purpose: Explore shared or differing views; observe social dynamics.
• Advantages: Interaction stimulates new ideas; efficient data collection.
• Disadvantages: Dominant voices can bias discussion; confidentiality harder to control.
• SPMM clue: “Group discussion moderated by researcher.”

🟤 5. Key-Informant Interviews
• Definition: Conducted with people who have special knowledge or experience relevant to the topic.
• Example: Interviewing senior nurses about ward culture.
• Used for: Policy evaluation, service design, or community studies.

💡 Exam Tip
“Researchers used a topic guide and conducted 2-hour interviews that were audio-taped and transcribed. Which type of interview is this?”
✅ Answer: Semi-structured.

6
Q

A screening test for dementia has a sensitivity of 90% and specificity of 70%.
If 100 people with dementia are tested, how many will be correctly identified?

A

✅ Answer: D (90)
Explanation: Sensitivity = TP / (TP + FN).
→ 90 % of people with dementia (the diseased group) will test positive.
High-yield: Sensitivity = true positive rate.

⚙️ Step-by-step reasoning
1️⃣ The question says “100 people with dementia.”
That means all 100 in the sample have the disease.
So the outcome we’re looking for is the number of true positives among the diseased group.

2️⃣ The property that tells you how often a test correctly identifies people who have the disease is sensitivity, not specificity.

Sensitivity = TP / (TP + FN)

3️⃣ If sensitivity = 90 %, then the test correctly identifies 90 % of those who are actually diseased.

100 × 0.90 = 90

✅ Answer = 90 patients correctly identified.

❌ Why it’s not specificity
• Specificity measures how well the test correctly identifies people without the disease (true negatives).
• Here, no one in the question is disease-free — the sample is entirely “people with dementia.”
• Therefore specificity is irrelevant to this calculation.

✨Sensitivity
True positive rate (probability that the test correctly identifies disease).
Sensitivity = TP / (TP + FN)

✨Specificity
True negative rate (probability that the test correctly identifies non-disease).
Specificity = TN / (TN + FP)
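The two formulas can be checked numerically. An illustrative Python sketch using the card’s figures (sensitivity 90 %, specificity 70 %); the confusion counts are made up for the example:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# 90 of the 100 diseased people test positive, 10 are missed:
print(sensitivity(tp=90, fn=10))  # 0.9
# In a hypothetical group of 100 healthy people, 70 test negative:
print(specificity(tn=70, fp=30))  # 0.7
```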

7
Q

Which of the following parameters changes with disease prevalence?

A. Sensitivity
B. Specificity
C. Positive Predictive Value (PPV)
D. Negative Predictive Value (NPV)
E. Both C and D

A

✅ Answer: E (PPV and NPV)
Explanation:
• Sensitivity & specificity are intrinsic to the test → unchanged by prevalence.
• PPV ↑ as prevalence ↑ ; NPV ↓ as prevalence ↑.
SPMM tip: “Predictive values = Population-dependent.”

✨Positive predictive value (PPV)
Probability that a person with a positive test actually has the disease.
= TP / (TP + FP)
✨Negative predictive value (NPV)
Probability that a person with a negative test truly doesn’t have the disease.
= TN / (TN + FN)
✨Prevalence effect
PPV and NPV change with prevalence, but sensitivity/specificity do not.
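The prevalence effect follows directly from Bayes’ theorem. A minimal Python sketch (the test characteristics and prevalences below are illustrative, not from this card):

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """P(disease | positive test), via Bayes' theorem."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

def npv(sens: float, spec: float, prev: float) -> float:
    """P(no disease | negative test)."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)

# Same test (sens 90%, spec 95%) at low vs high prevalence:
print(round(ppv(0.90, 0.95, 0.05), 2))  # 0.49 (low prevalence, low PPV)
print(round(ppv(0.90, 0.95, 0.50), 2))  # 0.95 (high prevalence, high PPV)
```

Running the same comparison with `npv` shows the reverse pattern: NPV is higher at low prevalence.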

8
Q

A test has sensitivity = 80 %. How many patients must be screened to detect 200 true positives?

A. 160  B. 200  C. 240  D. 250  E. 320

A

✅ Answer: D (250)
Explanation:

Sensitivity = TP / (TP + FN), so the number of diseased patients needed = TP / sensitivity.
200 / 0.80 = 250 patients.

SPMM tip: “If the question gives sensitivity + number of true positives → divide.”
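The “divide” rule is just the sensitivity formula rearranged; a one-function Python sketch:

```python
# Rearranging Sensitivity = TP / diseased gives diseased = TP / sensitivity.
def patients_needed(true_positives: int, sens: float) -> float:
    return true_positives / sens

print(patients_needed(200, 0.80))  # 250.0
```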

9
Q

Which of the following statements is true about specificity?

A. It measures the proportion of true positives correctly identified.
B. It is influenced by disease prevalence.
C. It is high in a good screening test.
D. It measures the proportion of true negatives correctly identified.
E. It increases false-positive rate.

A

✅ Answer: D – proportion of true negatives correctly identified.
Mnemonic: SNOUT = Sensitive test rules OUT; SPIN = Specific test rules IN.
SPMM tip: Screening → high sensitivity; Confirmation → high specificity.

✨Specificity
True negative rate (probability that the test correctly identifies non-disease).
Specificity = TN / (TN + FP)

10
Q

If the prevalence of dementia increases in a population, what happens to predictive values?

A. PPV ↑ NPV ↓
B. PPV ↓ NPV ↑
C. Both ↑
D. Both ↓
E. No change

A

Answer: A
Explanation: Higher prevalence = more true positives → higher PPV; fewer true negatives → lower NPV.
SPMM tip: “PPV parallels prevalence.”

11
Q

Which term describes the probability that a patient with a negative test truly does not have the disease?

A. Specificity
B. NPV
C. PPV
D. Sensitivity
E. Accuracy

A

Answer: B (Negative Predictive Value)
SPMM tip: “NPV = True negative / All negatives.”

12
Q

Which term describes the overall ability of a test to correctly classify individuals as diseased or non-diseased?

A. Accuracy
B. Sensitivity
C. Specificity
D. Predictive Value
E. Reliability

A

A (Accuracy)
Formula: (TP + TN) / (Total Population).
SPMM tip: Don’t confuse accuracy with reliability — reliability = repeatability.

✨Accuracy
- How often the test gives the correct result (both true positives and true negatives) out of all tested individuals
- (TP + TN) / (TP + TN + FP + FN)
- Freedom from systematic error (bias)
- Intrinsic to the test (not dependent on prevalence if sensitivity/specificity remain constant)
- 🎯 Hitting the bull’s-eye
- “How overall correct the test is.”

✨Reliability (Precision)
- How consistent or repeatable a test result is under the same conditions
- Freedom from random error (noise)
- 🎯 Arrows hitting the same spot, even if not the bull’s-eye
- “Consistency”

✨If the stem says “overall ability,” “overall proportion,” or “both true positives and true negatives,”
→ Answer = Accuracy.

✨If the stem says “probability that a positive test reflects true disease,”
→ Answer = Predictive Value.

13
Q

A new Alzheimer’s screening tool has sensitivity 95 %, specificity 60 %. Which of the following will happen if it’s used in a low-prevalence population?

A. PPV will decrease
B. NPV will decrease
C. PPV will increase
D. Sensitivity will decrease

A

Answer: A (PPV ↓)
Explanation: When disease is rare, most positives are false positives → low PPV.
SPMM tip: “Low prevalence → many false positives → PPV ↓ (while NPV ↑).”

14
Q

A test gives 80 true positives, 10 false negatives, 90 true negatives, and 20 false positives.
What is its accuracy?

A. 70 %  
B. 80 %  
C. 85 %  
D. 90 %  
E. 95 %

A

Answer: C (85 %)

Accuracy = how often the test is correct overall, counting only the correct results (true positives + true negatives).

Formula:

Accuracy = (TP + TN) / Total

Here: (80 + 90) / (80 + 10 + 90 + 20) = 170 / 200 = 85 %.
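The worked example can be sketched in Python (figures taken from this card):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / total tested."""
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=80, tn=90, fp=20, fn=10))  # 0.85
```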

15
Q

A new cognitive screening test for dementia produces highly consistent results when repeated by the same assessor, but tends to overestimate impairment compared to gold-standard neuropsychological testing.
Which statement best describes this test?

A. Reliable but not accurate
B. Accurate but not reliable
C. Both reliable and accurate
D. Neither reliable nor valid
E. Valid but not reliable

A

Reliable but not accurate

Reason:
Results are consistent (reliable) but systematically wrong (not accurate).

16
Q

A new mood rating scale gives very similar scores when repeated by the same rater on different days, but the scores are consistently higher than patients’ clinician-rated Hamilton Depression scores.
What best describes this test?

A. Reliable but not valid
B. Valid but not reliable
C. Both reliable and valid
D. Neither reliable nor valid
E. Accurate and valid

A

Answer: A – Reliable but not valid
• Repeatedly consistent → reliable.
• Consistently wrong (systematic bias) → not valid/accurate.
📘 Ref: Kaplan & Sadock, Research Methods; SPMM Topic 12.1

17
Q

In a study of cognitive testing, two psychologists independently administer the same test and obtain highly correlated scores.
This indicates good …

A. Internal validity
B. Test–retest reliability
C. Inter-rater reliability
D. Construct validity
E. External validity

A

✅ Answer: C – Inter-rater reliability
Measures consistency between different raters.
📘 Ref: Streiner & Norman, Health Measurement Scales.

18
Q

When a depression scale correlates strongly with another established depression inventory but not with an anxiety scale, this demonstrates …

A. Face validity
B. Construct validity
C. Discriminant validity
D. Content validity
E. Internal reliability

A

✅ Answer: B – Construct validity

The stem gives two clues:
• Strong correlation with a similar measure (another depression inventory) → convergent validity
• Weak correlation with a different construct (anxiety scale) → discriminant validity

Convergent + discriminant evidence together demonstrate construct validity, so that is the single best answer. Discriminant validity alone covers only the weak correlation with the anxiety scale.

Why examiners like this question:

Strong correlation with a similar measure → convergent validity
Weak correlation with a different measure → discriminant validity

Together → construct validity

19
Q

A screening test for dementia correctly classifies both diseased and non-diseased individuals 88 % of the time.
What property does this describe?

A. Sensitivity
B. Specificity
C. Accuracy
D. Reliability
E. Validity

A

C – Accuracy
Overall proportion of true positives + true negatives.
📘 Ref: Altman Practical Statistics for Medical Research.

20
Q

A clinician rates depression severity using the same scale on two occasions a week apart. Scores are strongly correlated (r = 0.91).
What does this indicate?

A. Good internal consistency
B. High test–retest reliability
C. Excellent concurrent validity
D. High face validity
E. Good external validity

A

✅ Answer: B – High test–retest reliability
Shows stability of results over time.
📘 Ref: SPMM QBank Section 12; Kaplan Research Methods.

21
Q

🧠 Types of Reliability

A

✨Test–Retest Reliability
Same test → same person → two time points → consistent scores
A patient scores 20/30 on MMSE today and 21/30 next week
“Stability over time”

✨Inter-Rater Reliability
Two or more observers give similar ratings
Two psychiatrists rating PANSS obtain similar scores
“Agreement between raters”

✨Intra-Rater Reliability
Same rater scores consistently across occasions
One psychologist re-scoring the same session
“Consistency by one rater”

✨Internal Consistency
Items within the scale measure the same construct
Cronbach’s α ≥ 0.7 → good consistency
“Homogeneity of items”

✨Split-Half Reliability
Correlation between halves of one test
Odd vs even items compared
“Half-half correlation”
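Test–retest and inter-rater reliability are commonly quantified with a correlation coefficient. An illustrative Python sketch with made-up MMSE scores (the function name and data are hypothetical, for demonstration only):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical MMSE scores from the same five patients a week apart:
time1 = [20, 25, 18, 28, 22]
time2 = [21, 24, 19, 29, 21]
print(round(pearson_r(time1, time2), 2))  # high r -> stable scores over time
```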

22
Q

🧠 Types of Validity

A

✨Face Validity
Appears to measure what it claims (superficial)
Beck Depression Inventory looks like it measures depression
“Looks right”

✨Content Validity
Covers all relevant aspects of the construct
Exam Qs sampling all syllabus topics
“Covers full domain”

✨Construct Validity
Correlates with related theoretical concepts (convergent + discriminant)
New anxiety scale correlates with existing anxiety scales (convergent) but not with unrelated scales (discriminant)
“Theoretical soundness”

✨Criterion Validity
Correlates with an external criterion or gold standard
Mini-Mental Score correlates with neuropsych assessment
“Compared with gold standard”

- Concurrent Validity
Type of criterion validity; criterion measured at the same time
PHQ-9 vs clinician diagnosis today
“Same-time correlation”

- Predictive Validity
Type of criterion validity; criterion measured in the future
IQ test predicting school performance
“Forecasts future”

✨Ecological Validity
Results generalise to real-world settings
Lab test reflecting actual ward performance
“Real-world applicability”

✨External Validity
Results generalise to other populations/settings
Trial findings apply to the general clinical population
“Generalisation”

✨Internal Validity
Extent study minimises bias/confounding
RCT with randomisation/blinding
“Control of bias”

23
Q

Cronbach’s α = 0.90 for a new anxiety inventory.
What does this signify?

A. High test–retest reliability
B. Good internal consistency
C. High inter-rater agreement
D. Good construct validity
E. High ecological validity

A

✅ Answer: B – Good internal consistency

Cronbach’s α ≥ 0.7 = items within the scale measure the same underlying construct.
📘 Ref: Streiner & Norman, Health Measurement Scales.
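Cronbach’s α can be computed from per-item score variances and the variance of the total score. A minimal Python sketch with made-up rating data (three items, five participants):

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of scores per scale item, same participants in order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-participant total score
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Three items rated by five participants (made-up data):
items = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 3],
]
print(round(cronbach_alpha(items), 2))  # above 0.7 -> good internal consistency
```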

24
Q

A new cognitive test correlates highly with MMSE scores taken on the same day.
A. Construct validity
B. Predictive validity
C. Concurrent validity
D. Internal consistency
E. External validity

A

✅ Answer: C – Concurrent validity

Criterion validity subtype; compared with gold standard at same time.

25
Q

A personality test correlates with other personality tests but not with unrelated anxiety scales.

A. Convergent validity
B. Discriminant validity
C. Content validity
D. Ecological validity
E. Internal consistency

A

✅ Answer: B – Discriminant validity
Distinguishes the construct from unrelated ones. (Convergent = correlates with similar constructs.)

26
Q

A questionnaire includes items on sleep, appetite, mood, and concentration to cover all depression domains.

A. Face validity
B. Content validity
C. Construct validity
D. Criterion validity
E. Ecological validity

A

✅ Answer: B – Content validity
Ensures all facets of the construct are represented.

27
Q

An experimental cognitive task performs well in the lab but poorly reflects real-life behaviour. Which validity is low?

A. Internal validity
B. Ecological validity
C. Construct validity
D. Criterion validity
E. External validity

A

✅ Answer: B – Ecological validity
Real-world generalisability of results.

28
Q

A study uses randomisation and blinding to reduce confounding and bias. Which type of validity does this improve?

A. Internal validity
B. External validity
C. Construct validity
D. Criterion validity
E. Predictive validity

A

✅ Answer: A – Internal validity
Accuracy of causal inference within the study.

29
Q

A depression rating scale shows high correlation with another validated depression scale and with clinical severity ratings.

A. Construct validity
B. Content validity
C. Predictive validity
D. Reliability
E. Internal validity

A

✅ Answer: A – Construct validity
Demonstrates the scale truly measures the intended psychological construct.
30
Q

LR⁺

A

LR⁺ = Sensitivity / (1 – Specificity)
Likelihood of a positive result in diseased vs healthy; higher = better test.

💡 High-Yield Facts
• LR⁺ > 10 = Strong evidence for disease
• LR⁺ = 1 = No diagnostic value
• LR⁺ < 2 = Weak
• LR⁻ < 0.1 = Strong evidence against disease
• LR⁺ combines both sensitivity & specificity → reflects the discriminatory power of the test

📘 Ref: Altman 1991; BMJ “Statistics Notes – Likelihood Ratios”

31
Q

LR⁻

A

LR⁻ = (1 – Sensitivity) / Specificity
Likelihood of a negative result in diseased vs healthy; lower = better test.

32
Q

FPR

A

FPR = 1 – Specificity
False positives among the healthy (“labelled positive but healthy”).
• Specificity = True Negative Rate = proportion of non-diseased correctly identified
• 1 – Specificity = False Positive Rate = proportion of non-diseased incorrectly labelled positive
• Often asked as: “What proportion of healthy individuals are incorrectly identified as diseased?” → 1 – Specificity

33
Q

FNR

A

FNR = 1 – Sensitivity
Missed true cases (“missed diagnosis”).
• “What proportion of diseased individuals go undetected?” → 1 – Sensitivity (False Negative Rate)
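The likelihood-ratio formulas above can be sketched in Python, using the sensitivity 75 % / specificity 90 % example from card 35:

```python
def lr_positive(sens: float, spec: float) -> float:
    """LR+ = Sensitivity / (1 - Specificity)."""
    return sens / (1 - spec)

def lr_negative(sens: float, spec: float) -> float:
    """LR- = (1 - Sensitivity) / Specificity."""
    return (1 - sens) / spec

# Sensitivity 75%, specificity 90%:
print(round(lr_positive(0.75, 0.90), 2))  # 7.5
print(round(lr_negative(0.75, 0.90), 2))  # 0.28
```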
34
Q

A screening test for dementia has:
• Sensitivity = 90%
• Specificity = 80%
What proportion of healthy individuals will be incorrectly labelled as positive?

A. 0.10
B. 0.20
C. 0.80
D. 0.90
E. 0.02

A

✅ Answer: B – 0.20
Explanation: False-positive rate = 1 – specificity = 1 – 0.8 = 0.2 (20%).

35
Q

A new cognitive test has Sensitivity = 75%, Specificity = 90%. Calculate the Likelihood Ratio Positive (LR⁺).

A. 0.08
B. 0.75
C. 7.5
D. 9.0
E. 15

A

✅ Answer: C – 7.5
Explanation: LR⁺ = Sensitivity / (1 – Specificity) = 0.75 / 0.10 = 7.5.
LR⁺ > 10 = very strong diagnostic evidence; 5–10 = moderate.

36
Q

Which statistic combines both sensitivity and specificity to summarise a test’s overall discriminatory power?

A. Predictive value
B. Accuracy
C. Reliability
D. Likelihood ratio
E. Cronbach’s α

A

✅ Answer: D – Likelihood ratio
Explanation: LR integrates true-positive and false-positive rates; predictive values depend on prevalence.

37
Q

In a population where dementia prevalence = 5%, a screening test has Sensitivity = 90% and Specificity = 95%. What happens to Positive Predictive Value (PPV) if the test is used in the general population instead of a memory clinic?

A. PPV increases
B. PPV decreases
C. No change
D. Depends only on specificity
E. Becomes equal to NPV

A

✅ Answer: B – PPV decreases
Explanation: Lower prevalence → more false positives → lower PPV. PPV rises with prevalence; NPV falls as prevalence rises.

38
Q

Which of the following statements about diagnostic test indices is true?

A. PPV is independent of prevalence.
B. Specificity is the proportion of patients with disease correctly identified.
C. Sensitivity + Specificity = 1.
D. LR⁺ uses both sensitivity and specificity.
E. False negative rate = 1 – specificity.

A

✅ Answer: D – LR⁺ uses both sensitivity and specificity.
Explanation: LR⁺ = Sensitivity / (1 – Specificity); LR⁻ = (1 – Sensitivity) / Specificity.
39
Q

Which of the following correctly describes the direction of inquiry in a case–control study?

A. Exposure → Outcome
B. Outcome → Exposure
C. Exposure ↔ Outcome (simultaneous)
D. Randomised exposure → Outcome

A

✅ Answer: B – Outcome → Exposure
Explanation: Investigators start by identifying those with and without the outcome (“cases” vs “controls”) and then look back to compare past exposures.

40
Q

How do cohort studies differ from case–control studies?

A

Cohort studies can calculate incidence and relative risk.
Explanation: Because subjects are followed from exposure to outcome, new cases can be counted over time; case–control studies only estimate odds ratios.

41
Q

Within a diabetes cohort of 10 000, 200 patients who developed retinopathy and 200 without were compared for past HbA1c levels. What design is this?

A

Nested Case–Control Study
Explanation: A subset of a pre-existing cohort is used. It’s economical and reduces recall bias because exposure data were collected prospectively before disease onset.

42
Q

A hospital compares medication-error rates for 6 months before and after introducing electronic prescribing.

A

Before-and-After Study
Explanation: A quasi-experimental design; participants act as their own controls. It assesses the effect of an intervention but lacks randomisation, so temporal and confounding biases can occur.

43
Q

In a study examining the relationship between parental smoking and child asthma, researchers find that parental socioeconomic status (SES) is associated both with smoking habits and child health. SES therefore acts as:

A. Mediator
B. Confounder
C. Effect modifier
D. Independent variable
E. Random error

A

✅ Answer: B – Confounder
Explanation: SES affects both the exposure (smoking) and the outcome (asthma) but is not part of the causal pathway — the classic definition of a confounder.

44
Q

Which of the following study designs best controls for confounding during the design phase?

A. Stratified analysis
B. Regression modelling
C. Randomisation
D. Matching
E. Standardisation

A

✅ Answer: C – Randomisation
Explanation: Randomisation distributes both known and unknown confounders equally between groups, reducing confounding before data collection. Matching and stratification control confounding after sampling.

45
Q

In an observational study, researchers match each smoker with a non-smoker of the same age and sex. What bias-control technique is this?

A. Restriction
B. Randomisation
C. Matching
D. Blinding
E. Stratification

A

✅ Answer: C – Matching
Explanation: Matching ensures comparability between exposure groups on potential confounders (e.g., age, sex). Commonly used in case–control designs.

46
Q

A study finds that the association between obesity and depression is stronger in women than in men. What does this illustrate?

A. Confounding
B. Interaction (effect modification)
C. Information bias
D. Selection bias
E. Recall bias

A

✅ Answer: B – Interaction (effect modification)
Explanation: When the strength or direction of an association changes according to a third variable (sex, age, etc.), that variable is an effect modifier, not a confounder.

47
Q

Which of the following analytical methods is most appropriate to adjust for multiple confounders simultaneously?

A. Sensitivity analysis
B. Multivariable regression
C. Stratified analysis
D. Propensity matching
E. Randomisation

A

✅ Answer: B – Multivariable regression
Explanation: Regression models (e.g., logistic or linear regression) statistically adjust for several confounders at once — essential in observational studies where randomisation isn’t possible.

48
Q

Confounding vs Interaction

A

• Confounding = a nuisance that hides or distorts the truth → you remove it.
• Interaction = a real phenomenon that reveals subgroup differences → you highlight it.

🧪 Example 1: Smoking, Alcohol, and Cancer
👉 If smokers drink more and smoking itself causes throat cancer, smoking is a confounder. It distorts the true alcohol–cancer relationship. Once you adjust for smoking, the apparent strong link between alcohol and cancer weakens.

💡 Example 2: Sex, Smoking, and Heart Disease
👉 If smoking increases heart-disease risk much more in men than in women, sex is an effect modifier (interaction). The association truly varies between subgroups — it’s not bias; it’s biology.
49
Q

Researchers studied whether bedtime regularity predicts behavioural difficulties in 7-year-olds, while adjusting for age, gender, family income, maternal mental health, parenting style, etc. They reported: “After adjusting for multiple confounders, β = 0.53, p < 0.001 …” What statistical test was used?

A

✅ Answer: Multiple regression

50
Q

If the dependent variable is binary (e.g., presence/absence of depression), which regression model is used?

A. Linear regression
B. Logistic regression
C. Cox regression
D. ANOVA
E. Poisson regression

A

✅ Answer: B – Logistic regression
Explanation: Binary outcomes (0/1) need logistic regression, which models the log odds of the event.
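“Models the log odds” can be made concrete with a small Python sketch; the coefficients and predictor value below are illustrative, not fitted to any real data:

```python
from math import exp, log

def predicted_probability(b0: float, b1: float, x: float) -> float:
    """Logistic model: log-odds = b0 + b1*x, so p = 1 / (1 + e^-(log-odds))."""
    return 1 / (1 + exp(-(b0 + b1 * x)))

def log_odds(p: float) -> float:
    """Inverse mapping: probability back to log odds."""
    return log(p / (1 - p))

# With illustrative coefficients b0 = -2.0, b1 = 0.5, a predictor value of 4
# gives log-odds of exactly 0, i.e. a predicted probability of 0.5:
print(predicted_probability(-2.0, 0.5, 4.0))  # 0.5
print(log_odds(0.5))                          # 0.0
```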
51
Which of the following regression analyses allows for multiple predictors, both continuous and categorical? A) Univariate regression B) Bivariate correlation C) Multivariable regression D) Chi-square E) ANOVA
Multivariable regression Explanation: “Multivariable” means >1 predictor. It can adjust for confounders and test independent effects of each variable.
52
Which test estimates the time until an event (e.g., relapse) occurs? A) Cox proportional hazards regression B) Logistic regression C) Multiple linear regression D) ANOVA E) t-test
Cox proportional hazards regression Explanation: Used for survival analysis — models hazard ratios, not odds ratios.
53
In a multiple regression, the variable “β = 2.5, p < 0.01” means: A) Each 1-unit increase in the predictor raises the outcome by 2.5 units. B) The groups differ by 2.5 %. C) There is no significant effect. D) The predictor and outcome are unrelated.
Each 1-unit increase in the predictor raises the outcome by 2.5 units. Explanation: β shows the slope — how much the dependent variable changes for every unit change in the independent one.
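🧪 The slope interpretation can be checked with a tiny least-squares sketch (plain Python, hypothetical numbers — not from the question):

```python
# Hypothetical data: each 1-unit rise in the predictor adds 2.5 to the outcome.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.5, 6.0, 8.5, 11.0]  # lies exactly on y = 1 + 2.5x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Ordinary least-squares slope: covariance(x, y) / variance(x)
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
       sum((x - mean_x) ** 2 for x in xs)

print(beta)  # 2.5 -> a 1-unit increase in x raises y by 2.5 units
```

👉 β is exactly this slope: the change in the outcome per 1-unit change in the predictor, holding the other predictors constant.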
54
Linear vs Logistic Regression
🧠 1️⃣ Continuous vs Categorical Variables
🔹 Continuous – can take any numeric value within a range; measured, not counted. Examples: height (cm), weight (kg), age, blood pressure, IQ score, behavioural score, MMSE score.
🔹 Categorical (discrete) – represents distinct groups or categories, not a continuum. Examples: gender (M/F), diagnosis (depression/psychosis), medication (yes/no), marital status.
⸻ 📊 Where This Fits in Regression
🔹 Linear regression — Dependent variable (outcome): continuous. Independent variables (predictors): continuous or categorical (can be both). “How does age or sex predict MMSE score?”
🔹 Logistic regression — Dependent variable (outcome): categorical (binary: yes/no). Independent variables (predictors): continuous or categorical (can be both). “How does BMI or sex predict presence of diabetes (yes/no)?”
⸻ ✅ So —
• Linear regression is for continuous outcomes.
• Logistic regression is for categorical (usually binary) outcomes.
55
Comparison of the risk of developing schizophrenia in two groups of 3000 people – one group comprises people who have used cannabis, the other people who have never used cannabis.
✅ Answer: Chi-squared test ⸻ 🧠 Why: • Outcome = categorical (schizophrenia: yes/no) • Comparing two independent groups • Comparing proportions (risk) 👉 That = Chi-square ⸻ ❌ Traps: • t-test → ❌ for means, not proportions • Logistic regression → ❌ only if multiple predictors ⸻ 🔥 High-yield rule: 👉 “Proportions between groups → Chi-square”
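🧪 A minimal sketch of the 2×2 chi-squared calculation. Only the 3000-per-group totals come from the card; the case counts are invented for illustration:

```python
# Hypothetical 2x2 table (illustrative counts, not given in the card):
#                    schizophrenia+  schizophrenia-
a, b = 60, 2940   # cannabis users   (n = 3000)
c, d = 20, 2980   # never-users      (n = 3000)

n = a + b + c + d
# Chi-squared statistic for a 2x2 table: n(ad - bc)^2 / product of the margins
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi2, 2))  # well above the 3.84 cut-off, so p < 0.05 at 1 df
```

👉 The test compares observed vs expected proportions; no means, no regression coefficients.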
56
“Investigation of the association between several risk factors and the risk of developing schizophrenia.”
✅ Answer: Logistic regression ⸻ 🧠 Why: • Outcome = binary (yes/no schizophrenia) • Multiple predictors = multivariate model 👉 That = logistic regression ⸻ ❌ Traps: • Chi-square → ❌ only single variable • Linear regression → ❌ requires continuous outcome ⸻ 🔥 High-yield rule: 👉 “Binary outcome + multiple predictors = LOGISTIC regression”
57
Comparing baseline and endpoint weight in the same individuals in a phase II study of an antipsychotic. The variable is normally distributed. (Choose TWO)”
✅ Answers: Paired t-test AND repeated measures ANOVA ⸻ 🧠 Why: • Same individuals → paired data • Continuous variable → weight • Normal distribution → parametric tests 👉 Options: • Paired t-test → simplest • Repeated measures ANOVA → if multiple timepoints ⸻ ❌ Traps: • Independent t-test → ❌ wrong (not independent groups) • Chi-square → ❌ categorical only ⸻ 🔥 High-yield rule: 👉 “Same people before/after → PAIRED t-test”
58
“The spread of the raw data (Choose TWO)”
✅ Answers: Standard deviation + Variance ⸻ 🧠 Why: Both measure dispersion (spread) • Variance = average squared deviation • SD = square root of variance (more clinically meaningful) ⸻ ❌ Traps: • Range → less robust • IQR → not typically first answer in exams ⸻ 🔥 High-yield: 👉 “Spread = SD ± variance (default exam answer)”
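🧪 Quick sketch of the SD–variance relationship (hypothetical scores, Python’s stdlib `statistics` module):

```python
import statistics

scores = [4, 8, 6, 5, 3, 7]   # hypothetical raw data

var = statistics.variance(scores)   # sample variance (n - 1 denominator)
sd = statistics.stdev(scores)       # SD = square root of that variance

print(var, sd)
```

👉 SD is in the same units as the raw data, which is why it is the more clinically meaningful of the two.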
59
The range in which 19/20 identical experiments would be expected to find the same result.”
✅ Answer: 95% confidence interval ⸻ 🧠 Why: • 19/20 = 95% • CI = range where true population value lies ⸻ ❌ Trap: • p-value → NOT a range • SD → spread, not estimate ⸻ 🔥 High-yield: 👉 “95% CI = 19/20 rule”
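🧪 A large-sample sketch of a 95% CI for a mean (hypothetical measurements; the z approximation 1.96 is used for simplicity):

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 12, 15]   # hypothetical measurements
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

# Large-sample 95% CI: mean +/- 1.96 x SE
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(round(lower, 2), round(upper, 2))
```

👉 Roughly 19 out of 20 such intervals, over repeated identical experiments, would contain the true population mean.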
60
“Synonymous with type I error rate.”
✅ Answer: Significance level (α) ⸻ 🧠 Why: • Type I error = false positive • Alpha (α) = the pre-set probability of making this error — usually 0.05 • The p-value is the probability observed from the data, which is compared against α ⸻ ❌ Trap: • Power → relates to Type II error • CI → estimation, not error ⸻ 🔥 High-yield: 👉 “Type I error rate = α (significance level) = false-positive rate” Type I error = False positive Meaning: You conclude there IS an effect, but in reality there is NO effect Example: • You say cannabis causes schizophrenia • But in truth… it doesn’t ➡️ That’s a Type I error ⸻ 🧠 Now: What is the p-value? 👉 The p-value = “If there was actually NO real effect, what is the probability of getting results this extreme?” So: • Small p-value (below α) → unlikely due to chance → “probably real” • Large p-value → likely due to chance → “probably not real”
61
An improvement system used to develop new processes or products at superior performance levels. (Choose ONE)”
✅ Answer: DMADV model ⸻ 🧠 Explanation: DMADV = 👉 Define – Measure – Analyse – Design – Validate • Used when: • You are creating something new • Not just improving existing processes ⸻ ❌ Trap: • PDSA → ❌ for improving existing processes • Model for Improvement → ❌ iterative improvement, not new design ⸻ 🔥 High-yield: 👉 “New system = DMADV” 👉 “Improve existing = PDSA”
62
“An extension of PDSA model describing events and questions that must precede the initiation of a PDSA cycle (Choose TWO)”
✅ Answers: FOCUS approach + Model for Improvement (MFI) ⸻ 🧠 Explanation: 1️⃣ FOCUS • Find process • Organise team • Clarify knowledge • Understand variation • Select improvement 👉 Basically: prep before PDSA ⸻ 2️⃣ Model for Improvement (MFI) Asks: • What are we trying to accomplish? • How will we know change is improvement? • What changes can we make? 👉 Then → PDSA cycles ⸻ ❌ Trap: • PDSA itself → ❌ not “before”, it IS the cycle ⸻ 🔥 High-yield: 👉 “FOCUS + MFI = BEFORE PDSA”
63
A visualisation tool to capture variations that occur within a system (Choose ONE)”
✅ Answer: Statistical Process Control (SPC) chart ⸻ 🧠 Explanation: SPC charts: • Track performance over time • Show: • Normal variation (common cause) • Abnormal variation (special cause) Examples: • Run charts • Control charts ⸻ ❌ Trap: • Bar chart → ❌ static data • Pie chart → ❌ proportions only ⸻ 🔥 High-yield: 👉 “Variation over time = SPC chart”
64
“Which of the following coefficient values depict a perfect inverse correlation between 2 continuous variables?” Options: • 0 • 2 • 1 • -1 • 0.5 ⸻
✅ Correct Answer: → -1 ⸻ 🧠 Explanation (EXAM STYLE) Correlation coefficient (r) ranges from: 👉 -1 → +1 • +1 = perfect positive correlation • 0 = no correlation • -1 = perfect inverse (negative) correlation ⸻ 💡 What does “perfect inverse” mean? As one variable increases → the other decreases in a perfectly straight line Example: • Exercise ↑ → Weight ↓ (perfectly linear hypothetical)
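🧪 Pearson’s r computed by hand on a perfectly inverse toy dataset (hypothetical numbers):

```python
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]   # y falls by exactly 2 for each unit rise in x

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = sum((a - mx) ** 2 for a in x) ** 0.5
sy = sum((b - my) ** 2 for b in y) ** 0.5
r = cov / (sx * sy)

print(round(r, 10))  # -1.0 (perfect inverse correlation, to floating-point precision)
```

👉 Any perfectly straight downward line gives r = −1, whatever the slope’s magnitude.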
65
“What type of graph is shown?” Options: • Histogram • Box whisker plot • Funnel plot • Bar chart • Kaplan-Meier graph
✅ Correct Answer: → Box-and-whisker plot ⸻ 🧠 Why this is a box plot (step-by-step visual decoding) Look at the key features in your image: 1️⃣ Rectangular boxes • Each category (Psychosis, Bipolar, etc.) has a box 👉 This = interquartile range (IQR) ⸻ 2️⃣ Line inside the box • That line = median ⸻ 3️⃣ “Whiskers” extending up/down • These show range (min–max or 1.5×IQR) ⸻ 4️⃣ Dots and stars • These are: • Outliers • Extreme values 👉 THIS combo = box plot signature ⸻ ❌ Why others are wrong (EXAM TRAPS) Histogram ❌ • Continuous data • Bars touch • Shows frequency distribution 👉 NO medians / whiskers ⸻ Bar chart ❌ • Categorical data • Shows means/counts 👉 NO spread or quartiles ⸻ Funnel plot ❌ • Used in meta-analysis • Looks like an inverted triangle ⸻ Kaplan-Meier ❌ • Survival curve • Step-like graph over time ⸻ 🔥 High-Yield Paper B Pearls 1️⃣ Box plot = distribution summary 👉 Shows: • Median • IQR • Spread • Outliers ⸻ 2️⃣ When do they test it? 👉 When comparing distribution across groups (e.g. diagnoses vs satisfaction scores — EXACTLY your question)
66
The upper and lower horizontal edges of the shaded area refer to what?” ⸻
✅ Correct Answer: → Interquartile range (IQR) (more specifically: Q1 and Q3) ⸻ 🧠 Break it down visually (using YOUR graph) Inside each box plot: 📦 The box itself = IQR • Lower edge of box → Q1 (25th percentile) • Upper edge of box → Q3 (75th percentile) 👉 So the shaded box = middle 50% of data ⸻ ➕ The line inside the box → Median (Q2) ⸻ 📏 The whiskers → Extend to: • Min/max OR • 1.5 × IQR (depending on convention) ⸻ ⭐ Dots / stars → Outliers ⸻ 🔥 High-Yield Paper B Pearls 1️⃣ Definition of IQR 👉 IQR = Q3 – Q1 ⸻ 2️⃣ What does IQR represent? 👉 Spread of the middle 50% of the data ⸻ 3️⃣ Why examiners test this Because: • Mean is affected by outliers • IQR is NOT affected by outliers 👉 So it’s a robust measure of spread
67
In the given graph, the dark line across the box is…” ⸻
✅ Correct Answer: → Median (Q2) ⸻ 🧠 How to read it instantly (no thinking needed in exam) In any box plot: 📦 Box edges → Q1 & Q3 📏 Whiskers → range ⭐ Dots → outliers ➡️ The line INSIDE the box = MEDIAN ⸻ 🧠 What is the median? 👉 The middle value of the dataset 👉 50% above, 50% below ⸻ 🔥 High-Yield Paper B Pearls 1️⃣ Median vs Mean (VERY IMPORTANT) • Median = robust (not affected by outliers) ✅ • Mean = affected by outliers ❌ 👉 Box plots show median, NOT mean
69
In the graph, the width of the shaded box represents…
The width of boxes does not mean anything in these plots. The correct answer is: None of the above
70
Which of the following is a correct interpretation of the given information and the graph? Options: A. None of the patients with personality disorder are dissatisfied B. Lowest level of satisfaction is seen in anxiety group C. Some patients with psychosis show extremes of variation in level of satisfaction D. Patients with ‘other’ diagnoses show highest level of satisfaction E. Significant satisfaction is observed in psychosis and depression groups ⸻
✅ Correct Answer: → C 👉 Some patients with psychosis show extremes of variation in level of satisfaction ⸻ 🧠 Why this is correct (EXAM LOGIC) Look at the psychosis group: • There are multiple outliers (dots/asterisks) • These represent extreme values outside the IQR 👉 So: • Not just spread • But extreme variation (key exam phrase) ⸻ 🔥 High-Yield Rule 👉 Outliers = extremes of variation If you see: • many dots / asterisks → think: ✅ extreme values ✅ skew / variability ❌ NOT mean differences ❌ NOT statistical significance ⸻ ❌ Why the others are WRONG (this is where exams catch you) ⸻ ❌ A. None of the patients with personality disorder are dissatisfied 👉 WRONG because: • Box plots don’t show “none” or absolute absence • You cannot conclude all vs none ⸻ ❌ B. Lowest level of satisfaction is seen in anxiety group 👉 WRONG because: • You must look at median (line in box) • “Other” group actually looks lower ⸻ ❌ D. Patients with ‘other’ diagnoses show highest level of satisfaction 👉 WRONG: • Big box ≠ high satisfaction • Median is actually lower, not higher ⸻ ❌ E. Significant satisfaction is observed in psychosis and depression groups 👉 VERY IMPORTANT TRAP ⚠️ 👉 Box plots = descriptive only ❌ They do NOT show: • statistical significance • p-values • comparisons
71
The results from a 24-week RCT of memantine in patients with moderate-to-severe Alzheimer’s dementia were reported in 2015. The investigators recruited 126 subjects for the memantine arm and 126 for the placebo arm, of whom 100 in the memantine group and 100 in the placebo group completed the study. Using a categorical measure of treatment response, it was shown that 40% in the memantine group responded while only 20% in the placebo group showed a response. Calculate the relative risk reduction of using memantine. Options: • 1 • 20 • 10 • 5 • 2 ⸻
✅ Correct Answer: → 1 ⸻ 🧠 Step-by-step (EXAM METHOD) Step 1: Identify values • EER (treatment) = 40% = 0.4 • CER (control) = 20% = 0.2 ⸻ Step 2: Calculate Relative Risk (RR) RR = EER / CER = 0.4 / 0.2 = 2 ⸻ Step 3: Calculate Relative Risk Reduction (RRR) RRR = |1 − RR| = |CER − EER| / CER = 0.2 / 0.2 = 1 ⸻ 🔥 High-yield interpretation ⚠️ This is a TRAP question • The treatment actually increases a good outcome (response) • So strictly this is a Relative Risk Increase (RRI) of 100% • But the exam still labels it RRR → take the absolute value ⸻ 🧠 ULTRA HIGH-YIELD RULE 👉 If the outcome is GOOD (response, survival): • RR > 1 → GOOD • “RRR” comes out negative → take the absolute value 👉 If the outcome is BAD (death, relapse): • RR < 1 → GOOD • RRR = 1 − RR (as normal) ⸻ 💡 One-line memory hack 👉 “RRR = 1 − RR… but ignore the sign in exams”
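🧪 The card’s arithmetic as a two-line sketch (event rates taken from the question):

```python
# Event rates from the card: 40% response on memantine, 20% on placebo.
eer, cer = 0.40, 0.20

rr = eer / cer               # relative risk
rrr = abs(cer - eer) / cer   # relative change vs control, as a magnitude

print(rr, rrr)  # 2.0 and 1.0 -> RR = 2, "RRR" = 1 (i.e. 100%)
```

👉 Because the treatment doubles a good outcome, the “reduction” is really a 100% relative increase — hence the absolute value.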
72
Qualitative methods include all except: Options: • focus groups • participant observation • ethnography • case-control study • semi-structured interviews ⸻
✅ Correct Answer: → case-control study ⸻ 🧠 Explanation (VERY HIGH-YIELD) ✅ Qualitative methods = explore experiences, meanings • Focus groups ✔️ • Participant observation ✔️ • Ethnography ✔️ • Semi-structured interviews ✔️ 👉 These are: • subjective • descriptive • non-numerical ⸻ ❌ Case-control study = QUANTITATIVE • Compares cases vs controls • Uses numbers, odds ratios • Analytical epidemiological design
73
A clinical trainee reviews studies reporting the diagnostic accuracy of MRI scans in identifying schizophrenia. She identified 9 studies whose result could be statistically synthesized further in a meta-analysis. She produced the following graph. What is this graph called?” Options: • Gabrielle plot • Correlation plot • Funnel plot • Nixon plot • Galbraith plot ⸻
✅ Correct Answer: → Galbraith plot ⸻ 🧠 Why this is a Galbraith plot This graph shows: • multiple study points from a meta-analysis • a central straight regression line • parallel dashed lines around it • points scattered around that line to assess heterogeneity / outliers 👉 That pattern is classic for a Galbraith plot (also called a radial plot) ⸻ 🔥 What a Galbraith plot is used for A Galbraith plot helps to: • assess heterogeneity in meta-analysis • identify outlier studies • visually show which studies deviate from the overall effect If studies fall far away from the central line or outside the dashed limits, they may be contributing to heterogeneity. 💎 High-yield Paper B pearls 1. Funnel plot = publication bias If they ask: “Which plot is used to detect publication bias?” 👉 Funnel plot 2. Galbraith plot = heterogeneity If they ask: “Which plot helps detect heterogeneity or outlier studies in meta-analysis?” 👉 Galbraith plot 3. Forest plot Another big one: • each study shown with CI bars • pooled effect at bottom • used to display results of meta-analysis
74
Which of the following points in this graph refers to a regression analysis?
✅ Correct Answer: → A ⸻ 🧠 Why A = regression analysis In a Galbraith plot: 👉 The central straight line = regression line • It represents the overall effect estimate • It’s derived from regression of effect size vs precision 👉 Point A is sitting on that central line ➡️ So it corresponds to the regression line / regression estimate ⸻ 🔥 High-yield concept (VERY examable) 👉 Galbraith plot = regression-based plot • X-axis → precision (1/SE) • Y-axis → standardized effect size 👉 The line is literally: a regression line through the studies
75
Which of the following points can be used to compute the standard error value of a given study in the meta-analysis?
“Which of the following points can be used to compute the standard error value of a given study in the meta-analysis?” ⸻ 🧠 First — translate the question into simple language They are asking: 👉 “Where on this graph can I find information about how reliable each study is?” Because remember: 👉 Standard error = how reliable / precise the study result is ⸻ 🧠 Now connect it to the graph (Galbraith plot) From earlier: 👉 X-axis = precision = 1 / standard error ⸻ 🔑 So: If you know: \text{Precision} = \frac{1}{SE} Then: SE = \frac{1}{\text{X-axis value}} ⸻ ✅ Final Answer (in exam terms) 👉 The horizontal position of the point (X-axis value) ⸻ 🧠 Why this makes sense (intuitively) Think of each dot: • Moving RIGHT → more precise → smaller SE → more reliable • Moving LEFT → less precise → bigger SE → less reliable 👉 So the X-axis is literally showing you: “How trustworthy is this study?”
76
Which of the following points refers to a z score?
In a Galbraith plot: 👉 Y-axis = Z score (standardised effect size) 👉 X-axis = precision (1/SE) ⸻ 🧠 So what does that mean visually? Each dot: • Horizontal position → how reliable the study is (SE) • Vertical position → the Z score ⸻ ✅ Correct Answer 👉 The vertical position (Y-axis value) of the point ⸻ 🧠 Why this makes sense (intuitive version) Remember: 👉 Z score = “how big the effect is compared to its uncertainty” So: • Higher up → stronger effect relative to noise • Lower down → weaker effect 👉 That’s exactly what the Y-axis shows
77
Which of the following points refers to an outlier?
Point D falls outside the 2 standard deviation margins. It is an outlier. The correct answer is: Point D
78
“The scale on x-axis refers to” Options: • Precision • Standardised effect • Sample size • Significance value • Quality of the studies ⸻
✅ Correct Answer: → Precision ⸻ 🧠 Let’s explain this like you’re seeing it for the first time We said earlier: 👉 This is a Galbraith plot ⸻ 🧠 What each axis means (VERY IMPORTANT) 👉 X-axis = Precision = 1 / Standard Error (1/SE) 👉 Y-axis = Z score (standardised effect) ⸻ 🧠 Now in plain English 👉 X-axis tells you: “How reliable is this study?” ⸻ 🟢 Intuitive picture • Points on the right side → very reliable studies (low SE) • Points on the left side → less reliable studies (high SE) ⸻ 🔥 Why we call it “precision” Because: 👉 Precision = how tight / consistent / reliable a result is • High precision → small variability → trustworthy • Low precision → big variability → less trustworthy
79
The scale on y-axis refers to?” Options: • Standardised effect • Significance value • Sample size • Precision • Quality of the studies ⸻
✅ Correct Answer: → Standardised effect ⸻ 🧠 Let’s explain this from absolute basics We already said: 👉 This is a Galbraith plot ⸻ 🔑 What each axis means (anchor this firmly) 👉 X-axis = precision (1/SE) 👉 Y-axis = Z score = standardised effect ⸻ 🧠 What is “standardised effect” (simple explanation) 👉 It means: “How big is the result compared to its uncertainty?” Mathematically: Z = \frac{\text{effect size}}{\text{standard error}} ⸻ 🟢 Intuitive version Imagine: • A study says “treatment works” • But how confident are we? 👉 Standardised effect answers: “Is this effect big enough relative to the noise?” ⸻ 💡 What the Y-axis is showing 👉 Higher up: • stronger effect relative to error 👉 Lower down: • weaker or uncertain effect
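🧪 Both Galbraith-plot axes can be reproduced from a single hypothetical study (the effect size and SE below are invented for illustration):

```python
# One hypothetical study in a meta-analysis:
effect = 0.8   # e.g. a log odds ratio
se = 0.2       # its standard error

z = effect / se        # y-axis: standardised effect (Z score)
precision = 1 / se     # x-axis: precision = 1/SE

print(z, precision)
```

👉 A big Z with high precision (upper right) means a strong, reliable result; the same effect with a large SE would sit low and to the left.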
80
A doctor notices three cases of new onset myoclonic epilepsy in his inpatient unit. Accidentally, he discovers that all three patients had been weaned off the same depot medication 2 weeks ago. In order to alert the wider medical community about a possible discontinuation effect of this drug, the most preferred method of publishing this observation is:” Options: • Meta analysis • Case series • Systematic review • RCT • N of 1 trial ⸻
✅ Correct Answer: → Case series ⸻ 🧠 Think like the exam 👉 What do we have? • Only 3 patients • Observational finding • Possible new adverse effect ⸻ 💡 What is a case series (super simple) 👉 It means: “I saw a few similar patients and I’m reporting what I noticed” ⸻ 🔥 Why this is PERFECT here • You don’t have a trial • You don’t have controls • You just want to: 👉 alert others quickly 👉 New side effect = case report/series FIRST
81
A doctor wants to compare the prevalence of HIV encephalitis using a neuroimaging study along with HIV serology in an urban and a rural region, using the same test battery. Which of the following can be expected from the serology results?” Options: • Lower positive predictive value in the rural region • Lower sensitivity in the rural region • Lower positive predictive value in the urban region • Lower specificity in the rural region • Lower sensitivity in the urban region ⸻
✅ Correct Answer: → Lower positive predictive value in the rural region ⸻ 🧠 Now let’s break this down VERY simply Step 1: What is changing? 👉 Prevalence changes • Urban → HIGH HIV prevalence • Rural → LOW prevalence ⸻ Step 2: What stays the same? 👉 Same test → • Sensitivity = same • Specificity = same ⸻ Step 3: What changes with prevalence? 👉 ONLY: • PPV • NPV ⸻ 💡 Key concept (THIS IS EXAM GOLD) 👉 Higher prevalence → higher PPV 👉 Lower prevalence → lower PPV ⸻ 🧠 So here: • Rural → LOW prevalence ➡️ PPV ↓ ⸻ 🧠 What is PPV (beginner version) 👉 “If test is positive → how likely is it TRUE?” ⸻ 💡 Why PPV drops in rural Imagine: • Few people actually have HIV • Even if test is good 👉 many positives will be false ⸻ ❌ Why other options wrong ❌ Sensitivity / specificity 👉 DO NOT change with prevalence 🚨 EXAM TRAP 🚨 ⸻ 🔥 High-yield summary (must memorise) 👉 Sensitivity & specificity = test property 👉 PPV & NPV = population dependent
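🧪 The prevalence effect on PPV, sketched with fixed (illustrative) sensitivity and specificity — only the prevalence differs between regions:

```python
def ppv(sens, spec, prev):
    """Positive predictive value via Bayes: TP / (TP + FP)."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

# Same test (hypothetical sens 90%, spec 95%), different prevalence:
urban = ppv(0.90, 0.95, 0.10)   # higher-prevalence urban region
rural = ppv(0.90, 0.95, 0.01)   # lower-prevalence rural region

print(round(urban, 2), round(rural, 2))  # rural PPV is far lower
```

👉 Sensitivity and specificity are held constant, yet the PPV collapses in the low-prevalence region — exactly the exam point.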
82
A group of men were examined during a routine screening for elevated blood pressure. Those men with the highest blood pressure (diastolic blood pressure higher than the 80th percentile for the group) were re-examined at a follow-up examination 2 weeks later. It was found that the mean for the re-examined men had decreased by 10mmHg at the follow-up examination. The most likely explanation is:” Options: • Repeated testing / learning effect • Measurement error • Regression towards the mean • Increased awareness leading to treatment seeking • The observers were better trained for the second examination ⸻
✅ Correct Answer: → Regression towards the mean ⸻ 🧠 Explain like you’re seeing this for the first time Step 1: What did they do? They picked: 👉 People with VERY HIGH blood pressure ⸻ Step 2: What happened later? 👉 Their average BP went DOWN ⸻ ❗ Your brain says: “Did something change?? Treatment?? Learning??” 🚫 NO. ⸻ 💡 What actually happened 👉 When you select extreme values (very high or very low)… ➡️ Next time you measure them ➡️ They naturally move closer to average ⸻ 🎯 Simple analogy Imagine: • You measure height of “tallest kids” in class • Next time → they’re still tall, but slightly less extreme 👉 Why? Because first measurement had random variation ⸻ 🔥 Key idea 👉 Extreme values = partly real + partly random fluctuation 👉 On repeat: ➡️ random part disappears ➡️ value moves closer to mean
83
A is strongly associated with B. It is investigated in a study whether A causes B. Which one of the following weakens the claim for a causal association between A and B?” Options: • Dose response relationship between A and B • A always precedes B • A and B are biologically related phenomena • Consistency of association between A and B • C, D and E are well established causes of B ⸻
✅ Correct Answer: → C, D and E are well established causes of B ⸻ 🧠 Now let’s simplify this BIG concept This is about: 👉 Causation vs association ⸻ 💡 Think like this You found: 👉 A is linked to B But question is: 👉 “Does A actually CAUSE B?” ⸻ 🔥 What weakens causation? 👉 If MANY OTHER things already cause B ⸻ 🎯 Simple example Let’s say: • A = drinking coffee • B = heart attack But: • Smoking • Diabetes • Hypertension ALREADY cause heart attacks 👉 Then coffee is LESS convincing as a cause ⸻ 💡 This is called 👉 Lack of specificity (Bradford Hill criteria) ⸻ 🧠 Why this weakens causality If: 👉 Many causes exist ➡️ A is less likely to be THE cause ⸻ ❌ Why others are WRONG (they actually SUPPORT causation) ✔ Dose-response 👉 More A → more B ➡️ STRONG evidence ⸻ ✔ A precedes B 👉 Cause must come BEFORE effect ⸻ ✔ Biological plausibility 👉 Makes sense scientifically ⸻ ✔ Consistency 👉 Seen in multiple studies ⸻ 🚨 Exam trick 👉 Most options = “support causation” 👉 One option = “weakens causation” ⸻ 🧠 High-yield summary 👉 Bradford Hill criteria (remember this list!): • Temporality (A before B) • Dose-response • Consistency • Biological plausibility • Specificity
84
“A multi centre double-blind pragmatic RCT reported remission rates for depression of 65% for fluoxetine and 60% for dosulepin. Number of patients that must receive fluoxetine for one patient to achieve the demonstrated beneficial effect is” Options: • 5 • 10 • 60 • 20 • 15 ⸻
✅ Correct Answer: → 20 ⸻ 🧠 Step-by-step (VERY beginner) Step 1: Identify what they’re asking 👉 “Number needed to treat (NNT)” ⸻ Step 2: What is NNT? 👉 “How many patients do I treat to help ONE extra person?” ⸻ Step 3: Calculate difference (this is KEY) Fluoxetine = 65% Dosulepin = 60% 👉 Difference = 5% ⸻ Step 4: Convert to decimal 5% = 0.05 ⸻ Step 5: Apply formula 👉 NNT = 1 / ARR ARR = absolute risk reduction = 0.05 👉 NNT = 1 / 0.05 = 20 ⸻ 🎯 Super simple way to think 👉 If 5 extra people improve per 100 patients ➡️ You need 20 patients to get 1 extra success ⸻ 🔥 High-yield rule 👉 Small difference = BIG NNT 👉 Large difference = small NNT ⸻
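🧪 The NNT calculation from the card, as a sketch:

```python
# Remission rates from the card: fluoxetine 65%, dosulepin 60%.
arr = 0.65 - 0.60   # absolute difference in response (absolute benefit)
nnt = 1 / arr       # number needed to treat

print(round(nnt))   # 20
```

👉 Note the inverse relationship: halve the absolute difference and the NNT doubles.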
85
A multicentre trial to assess effectiveness of CBT-based treatment in early intervention for psychosis finds a significant effect in favour of the intervention. But secondary analysis reveals significant trial centre-effect on the outcome. Which of the following could possibly have resulted from such an effect?” Options: • Increase in magnitude of effect • Sampling error • Reduced precision of the outcome • No change in observed effect • Decrease in magnitude of effect ⸻
✅ Correct Answer: → Increase in magnitude of effect ⸻ 🧠 Now let’s make this EASY Step 1: What is “centre effect”? 👉 Different hospitals/centres give different results ⸻ Step 2: Why is this a problem? 👉 Patients in same centre are similar ➡️ Not independent ⸻ Step 3: What happens if we IGNORE this? 👉 We treat all patients as independent ➡️ This falsely: • makes data look stronger • exaggerates results ⸻ 💡 Result 👉 Effect size looks bigger than it actually is ➡️ Increase in magnitude of effect ⸻ 🎯 Simple analogy Imagine: • One hospital is AMAZING → all patients improve • Others average If you ignore clustering: 👉 It looks like treatment is AMAZING everywhere ⸻ 🔥 What error is this? 👉 Type I error (false positive) 👉 Overestimating effect ⸻ ❌ Why others wrong ❌ Sampling error 👉 Too vague ❌ Reduced precision 👉 Actually often looks MORE precise falsely ❌ No change 👉 There is an effect ❌ Decrease 👉 Opposite of what happens ⸻ 🧠 High-yield rule 👉 Cluster/centre effects → exaggerate treatment effect if ignored
86
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly. It does not wrongly diagnose anyone in a sample of 100 controls. 👉 How specific is this test? ⸻
✅ Correct answer: 100% ⸻ 💡 Step-by-step explanation (VERY SIMPLE) 1️⃣ First: What is specificity? 👉 Specificity = ability to correctly identify NON-diseased people Formula: 👉 Specificity = True Negatives / Total Non-diseased ⸻ 2️⃣ Translate the question into a table We have: Diseased (schizophrenia patients = 100) • 60 detected correctly → True Positives (TP) = 60 • 40 missed → False Negatives (FN) = 40 ⸻ Non-diseased (controls = 100) • “Does NOT wrongly diagnose anyone” 👉 Means: • False Positives (FP) = 0 • So all are correctly negative → True Negatives (TN) = 100 ⸻ 3️⃣ Now calculate specificity 👉 Specificity = TN / (TN + FP) 👉 = 100 / (100 + 0) 👉 = 100 / 100 = 100% ⸻ 🔥 Key insight (THIS is the trick) 👉 The moment you see: “does NOT wrongly diagnose anyone” 💥 That means: ➡️ False positives = 0 ➡️ Specificity = 100% ⸻ ❌ Common trap People focus on: 👉 “60 out of 100” That is sensitivity, NOT specificity. ⸻ 🚨 High-yield facts (Paper B GOLD) 1. Sensitivity = TP / all diseased 2. Specificity = TN / all non-diseased 3. No false positives → specificity = 100% 4. No false negatives → sensitivity = 100% ⸻ 🧠 Exam takeaway 👉 Specificity cares ONLY about controls (non-diseased) ⸻ 💥 One-line memory trick 👉 “No false positives = perfect specificity”
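🧪 The card’s 2×2 counts as a sketch — specificity looks only at the controls column:

```python
# Counts from the card: 100 patients (60 detected), 100 controls (0 false positives).
tp, fn = 60, 40   # patients: caught / missed
tn, fp = 100, 0   # controls: correctly cleared / wrongly flagged

sensitivity = tp / (tp + fn)   # 0.6 -> 60%
specificity = tn / (tn + fp)   # 1.0 -> 100% (no false positives)

print(sensitivity, specificity)
```

👉 Zero false positives forces specificity to 100%, regardless of how many cases the test misses.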
87
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly. It does not wrongly diagnose anyone in a sample of 100 controls. 👉 What is the Negative Predictive Value (NPV)? ⸻
💡 First — what is NPV? 👉 NPV = If the test is NEGATIVE → how likely the person is truly healthy Formula: 👉 NPV = True Negatives / All Negative Results ⸻ 🧠 Step 1: Build the table (same as before) Diseased (100 patients) • 60 correctly detected → TP = 60 • 40 missed → FN = 40 ⸻ Controls (100 people) • “No one wrongly diagnosed” 👉 FP = 0 👉 TN = 100 ⸻ 🧠 Step 2: Focus ONLY on NEGATIVE tests 👉 Who tested negative? • FN = 40 (they have disease but test says negative ❌) • TN = 100 (healthy and test negative ✅) 👉 Total negatives = 40 + 100 = 140 ⸻ 🧠 Step 3: Apply formula 👉 NPV = TN / (TN + FN) 👉 = 100 / (100 + 40) 👉 = 100 / 140 👉 = 0.714 = 71.4% ≈ 70% ⸻ ✅ Correct answer: 70% ⸻ 🔥 Why this confuses people You think: 👉 “No false positives → must be 100%” ❌ WRONG — that’s specificity, not NPV ⸻ 🚨 KEY DIFFERENCE (VERY HIGH-YIELD) 🔴 Specificity 👉 Looks ONLY at controls 👉 = TN / (TN + FP) ⸻ 🟢 NPV 👉 Looks at ALL NEGATIVE RESULTS 👉 Includes: • TN ✅ • FN ❌
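🧪 And the NPV from the same counts — note the denominator mixes in the false negatives, which is why it is not 100%:

```python
# Counts from the card: TP=60, FN=40, TN=100, FP=0.
tp, fn, tn, fp = 60, 40, 100, 0

npv = tn / (tn + fn)   # denominator = ALL negative results, not just controls
print(round(npv, 3))   # 0.714, i.e. ~70%
```

👉 Specificity (TN / controls) = 100% here, yet NPV = 71% — the two are different questions about different denominators.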
88
A new diagnostic test for Neuroleptic Malignant Syndrome is compared with gold standard 'clinical examination with lab test battery'. Out of 30 suspected cases, only 20 are gold standard positive. In addition, out of 20 cases ruled out as not to have NMS by the new test, 5 were diagnosed with NMS using gold standard test. What is the sensitivity of the new test?
💡 Step 1: What is sensitivity? 👉 Sensitivity = True Positives / All Diseased ➡️ Translation: “Out of all people who ACTUALLY have the disease, how many did the test catch?” ⸻ 🧠 Step 2: Extract the numbers CAREFULLY 🟢 Gold standard = TRUTH ⸻ From the question: 1️⃣ “Out of 30 suspected cases, only 20 are gold standard positive” 👉 So: • Total tested positive = 30 • True positives (TP) = 20 • Therefore: 👉 False positives (FP) = 10 (because 30 − 20) ⸻ 2️⃣ “Out of 20 ruled out by new test (test negative), 5 actually had NMS” 👉 This is KEY: • These are people: • test says negative ❌ • but actually have disease ✅ 👉 So: ➡️ False negatives (FN) = 5 ⸻ 🧠 Step 3: Total diseased 👉 Total diseased = TP + FN 👉 = 20 + 5 = 25 ⸻ 🧠 Step 4: Apply formula 👉 Sensitivity = TP / (TP + FN) 👉 = 20 / 25 ⸻ ✅ Correct answer: 20/25 = 80%
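🧪 The same extraction-then-formula steps as a sketch:

```python
# From the card: 30 test-positives, of whom 20 are gold-standard positive;
# 5 of the 20 test-negatives actually have NMS on the gold standard.
tp = 20
fp = 30 - 20   # test-positive but gold-standard negative
fn = 5         # test-negative but gold-standard positive

sensitivity = tp / (tp + fn)
print(sensitivity)   # 0.8 -> 80%
```

👉 The false positives (10) never enter the sensitivity formula — they would matter only for specificity or PPV.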
89
A new drug with activity in an animal model of depression is tested in patients with moderate depression. Out of 200 individuals, 100 are chosen using a computer-generated list to receive the drug. The drug is administered in a cup of chocolate drink, while the others receive pure chocolate drink. Both clinicians and patients are unaware of who received the medication. At the end of the study, a neutral observer unconnected to the trial measures depressive symptoms in all subjects. This study can be best described as: A. Randomised control trial B. Cross sectional study C. Case control study D. Cohort study E. Ecological study ⸻
✅ Correct answer A. Randomised control trial ⸻ 3️⃣ Clear, exam-focused explanation • The key clue is: • computer-generated list → randomisation • There is also: • intervention given to one group • control/placebo comparison • double blinding: • patients unaware • clinicians unaware • Neutral observer measuring outcome strengthens objectivity So this is a: • randomised • controlled • double-blind trial ⸻ 4️⃣ Why the other options are wrong B. Cross sectional study ❌ • Cross-sectional = snapshot at one point in time • No intervention, no randomisation C. Case control study ❌ • Starts with outcome, then looks back for exposure • Usually retrospective • No random allocation D. Cohort study ❌ • Follows exposed vs unexposed groups over time • Observational, not randomised intervention E. Ecological study ❌ • Unit of analysis is populations/groups, not individual patients ⸻ 5️⃣ ⭐ High-yield facts • Randomisation = strongest clue for RCT • Blinding reduces bias • Control group allows comparison • RCT = gold standard for testing treatment efficacy ⸻ 6️⃣ 🎯 One-line exam answer If patients are randomly allocated to treatment vs control, think randomised controlled trial.
90
A test is being evaluated to predict treatment response in geriatric depression using neuroimaging techniques. The overall results of the test are very close to those observed on longitudinal follow-up after treatment (gold standard), but individuals vary widely in the magnitude of results produced. Which of the following correctly describes the properties of this test? A. Neither precise nor accurate B. Not precise but accurate C. Precise but not accurate D. Precise and accurate E. Accurate and sensitive ⸻
✅ Correct answer B. Not precise but accurate ⸻ This question is testing the difference between accuracy and precision. Accuracy • Means how close results are to the true value / gold standard • Here: • “overall results are very close to gold standard” • so the test is accurate Precision • Means how consistent / tightly clustered the results are • Here: • “individuals vary widely” • so the test is not precise So: • accurate • but not precise ⭐ High-yield facts • Accuracy = closeness to truth • Precision = reproducibility / consistency • Widely scattered results = poor precision • Average close to gold standard = good accuracy
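💻 A toy illustration with invented readings: the mean lands exactly on the gold-standard value (accurate), but the readings scatter widely (not precise):

```python
from statistics import mean, stdev

true_value = 50                       # hypothetical gold-standard result
readings = [20, 35, 50, 65, 80]       # hypothetical test results: wide scatter

print(mean(readings))                 # 50.0 -> mean equals truth: ACCURATE
print(stdev(readings))                # large spread: NOT PRECISE
```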
91
A plot of the normal standard deviate against the reciprocal of the standard error can be used to study both heterogeneity and publication bias. This plot is known as: A. Scatter plot B. Survival plot C. Kaplan-Meier plot D. ROC curve E. Galbraith plot ⸻
✅ Correct answer: E. Galbraith plot ⸻ 💡 Explanation (EXAM LOGIC) 👉 This question is PURE pattern recognition. Key phrases: • “normal standard deviate” • “reciprocal of standard error” 💥 That combination = Galbraith plot ⸻ 🧠 What is a Galbraith plot? • A meta-analysis tool • Used to: detect heterogeneity explore publication bias
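💻 The plot’s coordinates are easy to compute: x = 1/SE (precision) and y = effect/SE (the “normal standard deviate”). A sketch with invented study effects and standard errors:

```python
# Hypothetical meta-analysis: (effect estimate, standard error) for each study
studies = [(0.50, 0.10), (0.30, 0.15), (0.45, 0.08)]

# Galbraith coordinates: x = 1/SE (precision), y = effect/SE (standard normal deviate)
points = [(1 / se, effect / se) for effect, se in studies]
for x, z in points:
    print(f"x = 1/SE = {x:.1f},  y = z = {z:.2f}")
```

More precise studies sit further right; points straying far from the regression line through the origin suggest heterogeneity.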
92
A primary care depression screening tool is evaluated by testing whether those diagnosed as “depressed” using this scale are also found to be depressed using: • Beck’s Depression Inventory • Hamilton Depression Scale • Clinician assessment What is being measured? A. Internal consistency B. Divergent validity C. Convergent validity D. Predictive validity E. Inter-rater reliability ⸻
✅ Correct answer: C. Convergent validity ⸻ 💡 Explanation (EXAM LOGIC) 👉 Key idea: You are comparing your new test with OTHER established tests measuring the SAME thing ⸻ 🧠 What is convergent validity? 👉 Does your test agree with other tests measuring the same construct? Here: • All tools measure depression • If they agree → good convergent validity ⸻ ⚠️ Why not others? ❌ Internal consistency • Do items within the SAME test agree? ⸻ ❌ Divergent validity • Should NOT correlate with unrelated constructs ⸻ ❌ Predictive validity • Predicts FUTURE outcomes ⸻ ❌ Inter-rater reliability • Agreement between different raters
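💻 Convergent validity in practice = correlating the new scale against an established one. A sketch with hypothetical paired scores (the Pearson helper is written out in plain Python):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired scores: new screening scale vs Beck's Depression Inventory
new_scale = [4, 9, 12, 15, 20, 25]
bdi       = [10, 13, 22, 24, 35, 38]

r = pearson_r(new_scale, bdi)
print(f"r = {r:.2f}")   # high r -> the two depression measures agree -> convergent validity
```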
93
A study finds urban living increases risk of schizophrenia. However, cannabis use was not measured. 👉 What is cannabis use in this context? Options: A. Additive factor B. Contaminating factor C. Causal factor D. Confounding factor E. Placebo factor ⸻
✅ Correct answer: D. Confounding factor ⸻ 🧠 Explanation (EXAM LOGIC) 👉 A confounder is: ✔ Associated with the exposure (urban living) ✔ Associated with the outcome (schizophrenia) ❌ NOT on the direct causal pathway ⸻ Apply it here: • Urban living → ↑ cannabis use ✅ • Cannabis use → ↑ schizophrenia risk ✅ • But cannabis ≠ caused by urban living directly in pathway ❌ 👉 So cannabis distorts the relationship ➡️ That’s a confounder ⸻ 🚨 High-yield trap Many people pick: 👉 “Causal factor” ❌ Because cannabis does increase schizophrenia risk BUT: 👉 The question is about study design, not biology ⸻ 💥 Exam pearl 👉 If a variable is: • linked to BOTH exposure & outcome • not measured → ALWAYS think confounding
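💻 A toy simulation (all probabilities invented): here urban living has NO direct effect on psychosis, yet the crude comparison still shows excess urban risk, purely because cannabis use is commoner in cities:

```python
import random

random.seed(0)                 # fixed seed -> reproducible toy simulation
N = 50_000                     # simulated people per setting

def one_person(urban):
    """Cannabis use depends on setting; psychosis depends ONLY on cannabis."""
    cannabis = random.random() < (0.50 if urban else 0.10)      # urban -> more cannabis
    psychosis = random.random() < (0.10 if cannabis else 0.02)  # NO direct urban effect
    return psychosis

urban_risk = sum(one_person(True) for _ in range(N)) / N    # expected ~0.060
rural_risk = sum(one_person(False) for _ in range(N)) / N   # expected ~0.028

print(f"crude risk ratio = {urban_risk / rural_risk:.2f}")  # well above 1 -> spurious 'urban effect'
```

Measuring and adjusting for cannabis (stratification or regression) would make the urban effect disappear: that is exactly what an unmeasured confounder hides.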
94
RCT comparing two treatments for alcohol use disorder Outcome = time to first drink (relapse) 👉 Which statistical test compares the two groups? Options: A. Logarithmic transformation B. Log-linear analysis C. Log-rank test D. Logistic regression E. Log-based t test ⸻
✅ Correct answer: C. Log-rank test ⸻ 🧠 Explanation (EXAM LOGIC) 👉 This is: ✔ Time-to-event data ✔ Event = relapse ✔ Comparing two groups over time ⸻ So we use: 1️⃣ Kaplan–Meier → draw survival curves 2️⃣ Log-rank test → compare curves ⸻ 🔥 KEY ASSOCIATION (MUST MEMORISE) 👉 “Time to event” = ➡️ Kaplan-Meier + Log-rank test If you see: 👉 “time to relapse / survival / duration” → Log-rank test ❌ A. Logarithmic transformation What it actually is: 👉 A data transformation technique Used when: • Data is skewed (e.g. income, biomarkers) • You want to make it more normally distributed ⸻ ❌ Why it’s wrong here: • It does NOT compare groups • It does NOT analyse time-to-event data • It’s just pre-processing, not a test 👉 Think: “Log transformation = fixing data shape, not testing hypotheses” ⸻ ❌ B. Log-linear analysis What it actually is: 👉 Used for categorical data relationships 👉 Especially in contingency tables Example: • Relationship between gender × diagnosis × treatment ⸻ ❌ Why it’s wrong here: • No categorical interaction being tested • Outcome is time, not categories 👉 Think: “Log-linear = categories talking to each other, not time passing” ⸻ ❌ D. Logistic regression What it actually is: 👉 Used when outcome is binary Examples: • Dead vs alive • Relapsed vs not relapsed ⸻ ❌ Why it’s wrong here: • Here we care about WHEN relapse happens, not just IF • Logistic regression would ignore timing completely ⸻ 🔥 Key distinction: • Logistic regression → “Did relapse happen?” • Survival analysis → “When did relapse happen?” 👉 This question = WHEN → survival analysis → log-rank ⸻ ❌ E. Log-based t test What it even is: 👉 Not a standard exam-relevant test (essentially a distractor) Closest idea: • t-test compares means of continuous variables ⸻ ❌ Why it’s wrong: • Time-to-event data is not analysed with t-tests • Doesn’t handle: censoring (VERY important in survival data) time dimension
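💻 A minimal Kaplan–Meier estimator in plain Python (relapse times invented). The log-rank test then compares curves like this between the two treatment arms:

```python
def kaplan_meier(times, events):
    """Minimal Kaplan-Meier survival estimate.
    times  : time to relapse (or to censoring)
    events : 1 = relapse observed, 0 = censored (lost / still abstinent)
    Returns [(event_time, survival_probability)]."""
    pairs = list(zip(times, events))
    curve, surv = [], 1.0
    for t in sorted({t for t, e in pairs if e == 1}):       # distinct relapse times
        n_at_risk = sum(1 for tt, _ in pairs if tt >= t)    # still 'surviving' at t
        d = sum(1 for tt, e in pairs if tt == t and e == 1) # relapses exactly at t
        surv *= 1 - d / n_at_risk
        curve.append((t, surv))
    return curve

# Hypothetical arm: relapses at weeks 2 and 5; one person censored at week 3,
# one still abstinent when the study ended at week 8
curve = kaplan_meier([2, 3, 5, 8], [1, 0, 1, 0])
print(curve)   # [(2, 0.75), (5, 0.375)]
```

Note how the censored person at week 3 still counts in the week-2 risk set but drops out afterwards: handling censoring is exactly what a t-test cannot do.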
95
Uncertainty in cost-effectiveness estimates is best summarised using which method? Options: A. Cost-effectiveness plane B. ROC curve C. Willingness curve D. Cost-utility curve E. Cost-effectiveness acceptability curve ⸻
✅ Correct answer: E. Cost-effectiveness acceptability curve (CEAC) ⸻ 🧠 Why this is correct (core idea) 👉 CEAC shows: ➡️ Probability that an intervention is cost-effective ➡️ Across different willingness-to-pay thresholds 👉 It directly answers: “Given uncertainty, how confident are we this treatment is worth it?” ⸻ ❌ Why the others are WRONG ❌ A. Cost-effectiveness plane 👉 What it does: • Plots incremental cost vs incremental effect ⸻ 👉 Why wrong: • Shows distribution of results • BUT does NOT summarise uncertainty as probability 💡 Think: Plane = picture CEAC = decision-making ⸻ ❌ B. ROC curve 👉 Used for: • Diagnostic test performance • Sensitivity vs specificity ⸻ 👉 Why wrong: • NOTHING to do with economics 🚨 Easy elimination ⸻ ❌ C. Willingness curve 👉 Sounds tempting ❗ (exam trap) 👉 Reality: • Not a standard/statistical tool in health economics exams ⸻ 👉 Why wrong: • CEAC already incorporates willingness-to-pay 💡 Trap = wording similarity ⸻ ❌ D. Cost-utility curve 👉 What it is: • Cost per QALY framework (type of analysis) ⸻ 👉 Why wrong: • It’s a method of evaluation, not a way to show uncertainty
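💻 Each CEAC point is just: out of the simulated (Δcost, Δeffect) pairs, what fraction has positive net monetary benefit λ·ΔE − ΔC at willingness-to-pay λ? A sketch with invented bootstrap draws:

```python
# Hypothetical bootstrap draws of (incremental cost in £, incremental QALYs)
draws = [(500, 0.05), (800, 0.02), (300, 0.04), (1200, 0.01),
         (600, 0.03), (400, 0.06), (900, 0.02), (700, 0.05)]

def ceac_point(wtp):
    """Probability the intervention is cost-effective at willingness-to-pay `wtp` per QALY."""
    nmb_positive = sum(1 for cost, effect in draws if wtp * effect - cost > 0)
    return nmb_positive / len(draws)

for wtp in (10_000, 20_000, 30_000):
    print(f"λ = £{wtp:,}: P(cost-effective) = {ceac_point(wtp):.3f}")
# The probability rises with willingness-to-pay - plotting it over λ gives the CEAC
```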
96
Which study design approach reduces confounding? Options: A. Increasing sample size B. Consecutive sampling C. Volunteer selection D. Randomization E. Pragmatic trial approach ⸻
✅ Correct answer: D. Randomization ⸻ 🧠 Why this is correct 👉 Randomization: ✔ Distributes confounders equally between groups ✔ Includes known AND unknown confounders 👉 This is the ONLY method that truly balances unknown confounders ⸻ ❌ Why the others are WRONG ⸻ ❌ A. Increasing sample size 👉 What it does: • Improves precision • Reduces random error ⸻ 👉 Why wrong: • Does NOT fix systematic bias (confounding) 💡 Big study ≠ unbiased study ⸻ ❌ B. Consecutive sampling 👉 What it does: • Takes patients in order ⸻ 👉 Why wrong: • Helps reduce selection bias • Does NOTHING for confounders ⸻ ❌ C. Volunteer selection 👉 What it does: • People choose to participate ⸻ 👉 Why wrong: • Actually INTRODUCES bias (volunteer bias) ⸻ ❌ E. Pragmatic trial 👉 What it does: • Real-world applicability ⸻ 👉 Why wrong: • Improves external validity • Does NOT control confounding
97
An RCT shows time to HIV infection over months for: • Placebo • TDF • TDF–FTC 👉 Graph shows cumulative probability over time What type of graph is this? Options: • Scatter plot • L’Abbé plot • Galbraith plot • Survival plot • Forest plot
✅ Correct answer: Survival plot (Kaplan–Meier curve) ⸻ 🧠 WHY this is correct Look at the key features 👇 👉 X-axis = time (months) 👉 Y-axis = probability (cumulative incidence) 👉 Step-like curves 👉 Multiple groups compared ⸻ 💡 This = survival analysis Even though it says: “cumulative probability of HIV” 👉 That’s just the inverse of survival • Survival plot = probability of NOT having event • This graph = probability of HAVING event 👉 Same method → Kaplan–Meier ⸻ 🔥 KEY EXAM TRICK 👉 If you see: • Time on X-axis • Step-like curves • Multiple treatment groups ➡️ ALWAYS = Kaplan–Meier / survival plot
98
A research study was designed to have a power of 80% to detect a 15% difference in the mean outcomes of two study groups. How can this statement be interpreted? Select one: • The investigator had an 80% chance of rejecting the null hypothesis if it is actually true. • The probability of rejecting a false null hypothesis is 20% • The investigator had an 80% chance of retaining the null hypothesis if it is actually false • This vague statement cannot be interpreted meaningfully • The investigator had an 80% chance of detecting a difference of 15% or more if it was actually present
✅ Correct answer: E ⸻ 🧠 CORE CONCEPT (THIS IS EVERYTHING) 👉 Power = probability of detecting a true effect 👉 Formula: Power = 1 − β (Type II error) ⸻ 💡 So: If power = 80% 👉 There is: • 80% chance of detecting a true difference • 20% chance of missing it (Type II error) ⸻ What is the question REALLY asking? Study has 80% power to detect a 15% difference 👉 Translation: • If a real difference exists (≥15%) • The study will successfully find it 80% of the time ⸻ 🎯 Put it in human language Imagine: 👉 There REALLY IS a 15% difference between groups You repeat the study many times… ➡️ In 80 out of 100 times, the study will: ✔ detect it ✔ give a significant result ➡️ In 20 out of 100 times, it will: ❌ miss it (β = Type II error = 20%) ⸻ ✅ Now look at the correct answer “The investigator had an 80% chance of detecting a difference of 15% or more if it was actually present” 👉 This is EXACTLY what we just said ✔ real difference exists ✔ study detects it 80% of the time ⸻ ❌ Why the other options are wrong (this is the key) ⸻ ❌ “80% chance of rejecting null if it is actually true” 👉 This is describing: ➡️ Type I error (alpha) 💡 Rejecting a TRUE null = false positive 👉 Power has NOTHING to do with this ⸻ ❌ “Probability of rejecting a false null is 20%” 👉 Rejecting a false null = detecting a real effect 👉 That should be 80% (power) ❌ Not 20% ⸻ ❌ “80% chance of retaining null if it is false” 👉 Retaining a false null = missing a real effect 👉 That is: ➡️ Type II error (β) = 20% ❌ Not 80% ⸻ ❌ “This cannot be interpreted” 👉 It absolutely can — very standard statement
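💻 Power can be approximated with the normal formula power = Φ(δ/SE − z₁₋α/₂). A sketch with hypothetical numbers (SD 30, 64 patients per arm) chosen to land near 80%:

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sample z-test to detect a true mean difference `delta`."""
    z = NormalDist()
    se = sigma * sqrt(2 / n_per_group)    # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)     # 1.96 for alpha = 0.05
    return z.cdf(delta / se - z_crit)     # P(study detects the true difference)

p = power_two_means(delta=15, sigma=30, n_per_group=64)
print(f"power ≈ {p:.2f}")   # ~0.8, i.e. ~20% chance of a Type II error
```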
99
A research team sets out to review the evidence base for combining antidepressants to treat resistant depression. For the literature search to be complete, which of the following strategies should be avoided? Select one: • Using multiple databases to search • Discarding conference posters as data might not be peer reviewed • Writing to experts for any missing studies • Retrieving multiple language papers • Cross checking reference list from identified studies
✅ Correct answer: 👉 B. Discarding conference posters as data might not be peer reviewed ⸻ 💡 Explanation (exam-focused) 👉 The question is testing: ➡️ How to avoid publication bias in systematic reviews ⸻ 🔑 Key principle: 👉 A complete literature search must include: • Published studies ✔️ • Unpublished data ✔️ • Grey literature ✔️ ⸻ 🧠 What are conference posters? 👉 They are: • Grey literature • Often unpublished studies • Sometimes include negative or non-significant results ⸻ ❗ Why you should NOT discard them: 👉 If you remove them: ➡️ You introduce publication bias Because: • Positive studies → published • Negative studies → often only in posters ⸻ ❌ Why the other options are correct (and should NOT be avoided) ⸻ ✅ A. Using multiple databases • Increases coverage • Reduces missing studies ⸻ ✅ C. Writing to experts • Helps find unpublished or ongoing studies ⸻ ✅ D. Retrieving multiple language papers • Avoids language bias ⸻ ✅ E. Cross-checking references • Finds studies missed in database search
100
A researcher compares two diagnostic tests for alcohol dependence, one of which is considered to be a ‘gold standard’. What is the single most important issue to consider when assessing the validity of such a study? Select one: A. The sensitivity and specificity of the test. B. The inter-rater reliability. C. Whether the test and ‘gold standard’ were applied independently D. The impact factor of the journal. E. The test-retest reliability of the ‘gold standard’. ⸻
✅ Correct answer: 👉 C. Whether the test and ‘gold standard’ were applied independently ⸻ 🔍 Explanation (exam-focused) • Key issue = avoidance of bias • Specifically → work-up bias (verification bias) 👉 If: • Only test-positive patients get the gold standard → results become distorted ⸻ ❗ Why independence matters: • Both tests must be applied to ALL participants • And blinded from each other 👉 Otherwise: • Sensitivity ↑ falsely • Specificity ↓ falsely ⸻ ❌ Why others are wrong: • A. Sensitivity & specificity • These are outcomes, not the main validity threat • B. Inter-rater reliability Measures agreement, not diagnostic validity • D. Impact factor Irrelevant (classic distractor) • E. Test-retest reliability About consistency, not bias in validation ⸻ 🎯 Exam takeaway 👉 Gold standard must be applied independently to ALL patients → prevents verification/work-up bias 🧠 First: what are we worried about? In diagnostic studies: 👉 We compare a new test vs a gold standard BUT… 👉 If we don’t apply both tests properly → bias creeps in ⸻ 🚨 1. Work-up bias (Verification bias) 👉 These are basically the same thing (different names) ⸻ 💡 Definition (simple): Not everyone gets the gold standard test ⸻ 🎯 What usually happens: • Patient does screening test first • Only positives go on to get the gold standard 👉 Negatives are ignored ⸻ 🔴 Example: New test for depression: • Positive → seen by psychiatrist (gold standard) • Negative → sent home ⸻ ❗ What’s the problem? 👉 You never confirm if negatives were truly negative ➡️ You miss false negatives ⸻ 📉 Effect on results: • Sensitivity → looks higher than it really is • Specificity → also distorted ⸻ 💥 One-line: 👉 Work-up bias = only some people get the gold standard
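💻 A quick arithmetic sketch (invented counts) of how work-up bias inflates sensitivity when only test-positives get the gold standard:

```python
# Hypothetical truth: 100 diseased people; the new test catches 80 (TP), misses 20 (FN)
tp, fn = 80, 20
true_sensitivity = tp / (tp + fn)                # 0.8

# Work-up bias: only test-POSITIVES are sent for gold-standard confirmation,
# so the 20 false negatives are never identified or counted
verified_fn = 0
apparent_sensitivity = tp / (tp + verified_fn)   # looks falsely perfect

print(true_sensitivity, apparent_sensitivity)    # 0.8 1.0
```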
101
A researcher finds that consumed cannabis dose (CD) correlates with Positive and Negative Syndrome Scale (PANSS) score [Pearson’s correlation p = 0.04]. He also finds that consumed nicotine dose (ND) correlates with PANSS score with p = 0.03. Which of the following is an incorrect conclusion? Select one: A. Cannabis dose is significantly correlated with PANSS B. Parametric statistics has been used in this study C. Relationship between cannabis and PANSS is stronger than relationship between nicotine and PANSS D. PANSS score has been possibly treated as a continuous variable E. Nicotine dose is significantly correlated with PANSS ⸻
✅ Correct answer: 👉 C. Relationship between cannabis and PANSS is stronger than relationship between nicotine and PANSS ⸻ 🔍 Explanation (exam-focused) • Both p-values: • 0.04 → significant • 0.03 → significant 👉 BUT: • p-value ≠ strength of relationship ⸻ 🧠 What is missing? 👉 Correlation coefficient (r) • That tells strength • p-value only tells statistical significance ⸻ ❌ Why option C is wrong: • You cannot compare strength using p-values • Smaller p ≠ stronger association ⸻ ✅ Why the others are correct: • A & E: Both p < 0.05 → significant ✔️ • B: Pearson correlation → parametric ✔️ 🔑 Step 1: What did the question say? 👉 It said: Pearson’s correlation (p = 0.04) ⸻ 🧠 Step 2: What does Pearson automatically imply? 👉 Pearson = parametric test So the moment you see: ➡️ Pearson correlation 💥 You should instantly think: ➡️ Parametric statistics is being used ⸻ 🧠 Step 3: Why is Pearson “parametric”? Because it assumes: • Data is continuous • Data is normally distributed • Relationship is linear 👉 These are parametric assumptions ⸻ 🔥 So option B is basically asking: 👉 “Since Pearson was used… does that mean parametric statistics was used?” ✔️ YES → correct • D: PANSS treated as continuous ✔️ 🧠 What is a “continuous variable”? 👉 A number that can take MANY values Examples: • Height • Weight • Blood pressure • PANSS score ⸻ ❗ PANSS specifically: • It’s a score • Made of multiple items • Final result = number (e.g. 65, 72, 83) 👉 That makes it continuous ⸻ 🔥 Why does correlation matter here? 👉 Correlation (Pearson) requires: • TWO continuous variables ⸻ 🧠 In this question: They correlated: • Cannabis dose (number) • PANSS score (number) 👉 So PANSS must be treated as continuous ✔️ Therefore option D = TRUE ⸻ 🎯 Exam takeaway (VERY HIGH-YIELD) 👉 Never compare strength of relationships using p-values 👉 Always need r (correlation coefficient)
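💻 Why p can’t rank strength: the p-value for Pearson’s r comes from t = r·√(n−2)/√(1−r²), so it depends on n as well as r. With invented numbers, a weak correlation in a big sample beats a strong correlation in a tiny one:

```python
from math import sqrt

def t_stat(r, n):
    """t statistic behind the p-value for Pearson's r (df = n - 2): bigger t -> smaller p."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

strong_small = t_stat(0.9, 5)      # strong correlation, tiny sample
weak_large   = t_stat(0.3, 200)    # weak correlation, large sample

print(f"t(strong, n=5) = {strong_small:.2f}, t(weak, n=200) = {weak_large:.2f}")
# The WEAKER correlation produces the larger t (hence the SMALLER p),
# which is exactly why p-values cannot be used to rank strength of association.
```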
102
Correlation coefficient: Pearson’s (r) vs Spearman’s (ρ, rho)
🧠 1. What is “correlation” in the first place? 👉 Correlation = “Do two variables move together?” Examples: • Dose ↑ → symptoms ↓ • Stress ↑ → anxiety ↑ ⸻ 📊 Correlation coefficient (r) • Ranges from -1 → +1 🔹 2. Pearson vs Spearman (THE big exam distinction) 🔵 Pearson’s correlation (r) 👉 Measures: Linear relationship between two continuous variables ⸻ ✅ Requirements: • Both variables = continuous • Data ≈ normally distributed • Relationship = linear (straight line) ⸻ 🧠 Example: • PANSS score vs cannabis dose • Height vs weight ⸻ 📈 What it looks like: Points form a straight-ish line ⸻ 🟢 Spearman’s correlation (rho, ρ) 👉 Measures: Monotonic relationship (not necessarily linear) ⸻ ✅ Requirements: • Data can be: • Ordinal OR continuous • NO need for normal distribution ⸻ 🧠 Example: • Rank in class vs exam score • Severity scale (mild/mod/severe) vs outcome ⸻ 📈 What it looks like: • Can be curved • As long as it goes consistently ↑ or ↓ 🧠 One-line memory hack 👉 Pearson = precise numbers 👉 Spearman = ranks / rough order ⸻ 🚨 VERY COMMON EXAM TRAPS ❌ Trap 1: Using Pearson when data is skewed → WRONG 👉 Use Spearman ⸻ ❌ Trap 2: Thinking both measure “any relationship” 👉 Pearson = linear only 👉 Spearman = any consistent trend ⸻ 🔹 3. What else do you NEED to know about correlation coefficient? 
This is where examiners get sneaky 👇 ⸻ 🧠 (1) Correlation ≠ causation 👉 Just because two things move together ❌ does NOT mean one causes the other Example: • Ice cream sales ↑ • Drowning ↑ 👉 Confounder = summer ☀️ ⸻ 🧠 (2) Strength vs significance • r value → strength of relationship • p value → whether it’s statistically significant 🔥 Example: • r = 0.8, p = 0.2 → strong but NOT significant • r = 0.2, p = 0.01 → weak but significant ⸻ 🧠 (3) Outliers can RUIN Pearson 👉 One extreme value → distorts line ➡️ Spearman is more robust ⸻ 🧠 (4) Correlation only detects LINEAR (Pearson) 👉 If relationship is curved: • Pearson → may say no correlation • Spearman → still detects it ⸻ 🧠 (5) Units don’t matter 👉 Correlation is unit-free • kg vs cm → still same r ⸻ 🔑 How do you decide WHICH one? 🔵 Use Pearson if: • Data is continuous (numbers like PANSS, weight, BP) • Data is normally distributed • Relationship is linear 👉 Example: • PANSS score vs cannabis dose ✔️ → Pearson ⸻ 🟢 Use Spearman if: • Data is ordinal (rank, Likert scale, mild/mod/severe) • OR data is not normally distributed • OR relationship is not linear 👉 Example: • Rank in class vs performance ✔️ → Spearman ⸻ 🔥 Exam shortcut (VERY important) 👉 If you see: • Pearson mentioned → parametric → continuous • Spearman mentioned → non-parametric → ordinal / skewed
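💻 A toy demonstration (y = x³, so the trend is perfectly monotonic but curved): Spearman, which is just Pearson computed on the ranks, scores 1.0 while Pearson does not:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))

def ranks(v):
    """Rank-transform (assumes no ties, which holds for this toy data)."""
    order = sorted(v)
    return [order.index(a) + 1 for a in v]

x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]        # 1, 8, 27, 64, 125 -> monotonic but strongly curved

print(round(pearson(x, y), 3))                # 0.943 -> Pearson penalised by the curve
print(round(pearson(ranks(x), ranks(y)), 3))  # 1.0 -> Spearman sees a perfect monotonic trend
```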
103
A research team sets out to find an association between domestic abuse and mid-life depression. They conclude that a significant association is present between domestic abuse and depression. All of the following can explain the study result except Select one: A. An unknown factor mediates the relationship between the studied factors, albeit not in the causal pathway B. The researcher has committed a type 1 error C. The sample examined has a chance association D. The researcher has committed a type 2 error E. The researcher has committed a systematic error leading to this result ⸻
✅ Correct answer: 👉 D. The researcher has committed a type 2 error ⸻ 🔍 Explanation (exam-focused) 👉 They found a significant association So: ➡️ They rejected the null hypothesis ⸻ 🧠 What is Type 2 error? 👉 False negative • There IS a real effect • But you fail to detect it ⸻ ❗ Why this cannot explain the result: 👉 Here they DID find a significant association ➡️ So they did NOT miss an effect ❌ Therefore → NOT Type 2 error ⸻ ❌ Why the others CAN explain the result: ⸻ ✅ A. Unknown factor (confounding) • Third variable explains the association ⸻ ✅ B. Type 1 error • False positive (very important) • You detect association when none exists ⸻ ✅ C. Chance association • Random variation → false positive ⸻ ✅ E. Systematic error (bias) • Study design flaw → distorted result ⸻ 🎯 Exam takeaway 👉 If result is significant → think: • Type 1 error ✔️ possible • Type 2 error ❌ NOT possible