What is an unpaired t-test? What is its non-parametric equivalent?
📊 **Unpaired (Independent) t-Test**
An unpaired t-test compares the means of two independent groups to see if they are significantly different.
It answers:
“Are the averages in these two unrelated groups different?”
🧠 **When Do You Use It?**
Two independent groups
Continuous data
Approximately normally distributed
Similar variance
🏥 **Clinical Examples**
1️⃣ Antidepressant Study
Compare mean depression scores between:
Drug group
Placebo group
2️⃣ Blood Pressure Comparison
Compare average systolic BP between:
Smokers
Non-smokers
🧠 **Memory Hook (Unpaired t-test)**
Unpaired = Unrelated groups.
Or:
“Two groups. Two means. Test the difference.”
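A minimal sketch of the blood-pressure example with scipy (assuming scipy is available; the BP values below are invented for illustration):

```python
from scipy import stats

# Hypothetical systolic BP readings (mmHg) for two independent groups
smokers = [138, 142, 135, 150, 145, 139, 148, 141]
non_smokers = [128, 131, 125, 134, 130, 127, 133, 129]

# Unpaired (independent) t-test; equal_var=False gives Welch's t-test,
# which is safer when the two groups' variances may differ
t_stat, p_value = stats.ttest_ind(smokers, non_smokers, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With a large mean difference relative to the spread, the t statistic is large and p is small.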
🔄 Mann–Whitney U Test (Non-Parametric Version)
The Mann–Whitney U test is the non-parametric alternative to the unpaired t-test, used when the data are not normally distributed or are ordinal. Instead of comparing means, it compares the ranked positions of values between two independent groups.
It answers:
“Are the values in one group generally higher or lower than the other?”
🔁 How It Relates to the t-Test
Unpaired t-test → compares means (assumes normal distribution)
Mann–Whitney U → compares ranks (no normality assumption)
Same situation (two independent groups),
different method when assumptions fail.
🧠 **Better Memory Hook (Logical + Funny)**
“When the data get weird, call Mann–Whitney.”
Or even better:
t-test tests the Mean.
Mann–Whitney minds the Mess.
Because when the data are messy (skewed, non-normal), Mann–Whitney steps in.
🔥 **Ultra-Exam Hook**
Unpaired t-test = Mean vs Mean
Mann–Whitney = Rank vs Rank
Provide a clinical example of data for which you would use a Mann–Whitney U test.
Here’s a clear, exam-style clinical example of when and how a Mann–Whitney U test is used.
🔹 Clinical Scenario
A psychiatry registrar wants to compare depression severity scores between:
A CBT group
A supportive therapy group
Outcome measure: PHQ-9 score at 12 weeks
However, the scores are skewed rather than normally distributed.
Because the data are non-normal (and PHQ-9 totals are ordinal):
➡ The appropriate test is a Mann–Whitney U test (non-parametric alternative to the unpaired t-test).
🔹 Hypothetical Data
CBT group PHQ-9 scores:
4, 6, 5, 3, 8, 7, 6, 5, 4, 3, 9, 6
Supportive therapy scores:
8, 10, 7, 9, 11, 12, 6, 8, 10, 9, 13, 7
🔹 Clinical Question
> Is there a statistically significant difference in depression severity between CBT and supportive therapy groups?
🔹 Why Not a t-test?
A t-test assumes approximately normally distributed data with similar variance in each group.
Here, the PHQ-9 scores are skewed and the scale is ordinal, so those assumptions fail.
So we use Mann–Whitney U, which compares ranks rather than means and needs no normality assumption.
🔹 Interpretation (Hypothetical Result)
If the analysis shows a significant result (e.g. U = 20, p = 0.01):
Interpretation:
> There is a statistically significant difference in PHQ-9 scores between the CBT and supportive therapy groups. Patients receiving CBT had significantly lower depression scores at 12 weeks.
🔹 How You’d Phrase It in a Paper
> A Mann–Whitney U test showed that PHQ-9 scores were significantly lower in the CBT group (median = 5.5) compared with the supportive therapy group (median = 9.0), U = 20, p = 0.01.
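The hypothetical PHQ-9 data above can be run through scipy (assuming scipy is available; the computed U and p will not necessarily match the illustrative U = 20, p = 0.01 quoted above):

```python
from scipy import stats

cbt        = [4, 6, 5, 3, 8, 7, 6, 5, 4, 3, 9, 6]    # PHQ-9 at 12 weeks
supportive = [8, 10, 7, 9, 11, 12, 6, 8, 10, 9, 13, 7]

# Two-sided Mann-Whitney U test on the ranked scores
u_stat, p_value = stats.mannwhitneyu(cbt, supportive, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Because the CBT scores sit mostly below the supportive-therapy scores in the combined ranking, U is well below its null expectation (n₁n₂/2 = 72) and p is small.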
🔹 High-Yield Exam Pearl
Use Mann–Whitney U when: you have two independent groups but the data are ordinal, skewed, or otherwise non-normal.
What is a one sample t-test?
For parametric data.
One-Sample t-Test and Paired t-Test
(They are related but answer different questions.)
=====
🔹 1️⃣ One-Sample t-Test
✅ What It Is
A one-sample t-test compares:
> The mean of ONE sample
to
A known or hypothesised population mean
It answers:
> “Is my sample different from a known value?”
It assumes continuous, approximately normally distributed data.
====
🩺 Clinical Example 1 – Lithium Level Monitoring
Therapeutic lithium level is 0.6 mmol/L.
A registrar measures lithium levels in 15 patients at 6-month review.
Mean lithium level in the clinic sample = 0.72 mmol/L
Clinical question:
> Is our clinic’s mean lithium level significantly different from the recommended therapeutic target (0.6 mmol/L)?
A one-sample t-test compares: the clinic sample mean (0.72 mmol/L) with the target value (0.6 mmol/L).
If p < 0.05 → clinic prescribing pattern significantly differs from target.
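A sketch of the lithium audit with scipy (assuming scipy is available; the 15 levels below are invented, with a mean near the 0.72 mmol/L quoted above):

```python
from scipy import stats

target = 0.6  # recommended therapeutic lithium level (mmol/L)

# Hypothetical lithium levels from 15 patients at 6-month review
levels = [0.72, 0.65, 0.80, 0.70, 0.68, 0.75, 0.66, 0.78,
          0.71, 0.69, 0.74, 0.67, 0.76, 0.73, 0.70]

# One-sample t-test: is the clinic mean different from the target?
t_stat, p_value = stats.ttest_1samp(levels, popmean=target)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```
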
====
🩺 Clinical Example 2 – Length of Stay Audit
National average psychiatric inpatient stay = 21 days.
Your ward’s last 20 discharges:
Mean LOS = 25 days.
Question:
> Is our ward’s mean length of stay significantly different from the national average?
→ Use a one-sample t-test.
====
🔑 Exam Memory Hook
One sample → one group vs one known number
====
What is the non-parametric version of a one-sample (or paired) t-test?
Wilcoxon matched pairs test.
💍 2️⃣ Wilcoxon Matched Pairs = Non-Parametric Paired t-test
💒 Memory Hook:
“The Wedding Without the Bell Curve.”
A wedding = pairs 👰🤵
The paired t-test is the classy wedding that requires a perfect bell-shaped cake 🎂 (normal distribution).
The Wilcoxon wedding?
Still paired…
But the cake collapsed.
No bell curve required.
Logical link:
Same people measured twice
Paired t-test → compares mean difference
Wilcoxon → ranks the differences
Same structure, just no normality assumption
🔑 Trigger phrase:
“Same people twice? If the bell curve dies — invite Wilcoxon.”
What is the difference between an unpaired t-test and a paired t-test?
🧠 Core Difference in One Line
Unpaired = two different groups compared once; Paired = the same group measured twice.
1️⃣ Unpaired (Independent) t-test
📌 What it compares
Two separate, unrelated groups
📌 Research question
> “Are the means of these two independent groups different?”
📌 Clinical examples
Drug group vs placebo group; smokers vs non-smokers.
📌 Key assumption
The two groups are independent (no overlap, no pairing)
🧠 Memory Hook
Think:
👬 UNpaired = UNrelated groups
Two strangers in a bar. No relationship.
2️⃣ Paired t-test (Dependent t-test)
📌 What it compares
Two measurements from the same participants
📌 Research question
> “Did the mean change within this group?”
📌 Clinical examples
Depression scores before vs after therapy; weight before vs after clozapine.
Each person acts as their own control.
🧠 Memory Hook
👫 Paired = Partners
Same person, two time points.
Like checking your weight before and after Christmas 🎄🍰
🔬 Why This Matters Statistically
Analysing within-person differences strips out between-person variability, shrinking the error term.
🔥 Clinical Pearl
Paired t-tests are more powerful because they remove individual baseline differences.
Example:
If someone naturally has high anxiety, that baseline trait doesn’t distort the result — you only measure their change.
🚨 Common Exam Trap
If the same participants are measured twice and you choose unpaired → ❌ Wrong.
If two separate groups and you choose paired → ❌ Wrong.
Always ask:
> Are these the same people?
⚡ Ultra-Short Summary
| Feature | Unpaired | Paired |
| --- | --- | --- |
| Groups | Different people | Same people |
| Variability | Includes between-person variation | Removes between-person variation |
| Power | Lower | Higher (less noise) |
What is a paired t-test?
🔹 2️⃣ Paired t-Test (Dependent t-Test)
✅ What It Is
A paired t-test compares:
> Two measurements taken from the SAME individuals
It answers:
> “Did this group change?”
It assumes the differences between the paired measurements are approximately normally distributed.
====
🩺 Clinical Example 1 – Antidepressant Trial
PHQ-9 scores measured:
Before treatment and again after a course of the antidepressant.
Each patient has a “before” score and an “after” score.
We compare:
> Mean difference in PHQ-9 within the same patients.
This is NOT independent groups — it’s the same people.
So we use a paired t-test.
====
🩺 Clinical Example 2 – Clozapine and Weight Gain
Patients weighed before starting clozapine and again after a period of treatment.
Question:
> Has weight significantly increased?
→ Paired t-test compares weight before and after in the same individuals.
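A sketch of the clozapine example with scipy (assuming scipy is available; the weights below are invented, paired by position):

```python
from scipy import stats

# Hypothetical weights (kg) for the same 8 patients before and
# after a period on clozapine -- note the pairing by position
before = [70.0, 82.5, 65.0, 90.0, 75.5, 68.0, 88.0, 72.0]
after  = [74.5, 85.0, 69.5, 96.0, 78.0, 71.5, 93.0, 75.0]

# Paired (dependent) t-test on the within-patient differences
t_stat, p_value = stats.ttest_rel(after, before)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Because every patient gained weight, the within-patient differences are consistently positive and the paired test detects this easily.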
====
🔬 Why Not Independent t-Test?
Because measurements are not independent.
Using an independent t-test would ignore the pairing and discard statistical power.
====
🔹 The Key Concept Difference
(See attached)
====
🔑 Ultra-High-Yield Summary
Paired t-test = same people, two time points, compares the mean within-person difference.
====
🧠 Exam Trap Alert
If you see the same participants measured at two time points → paired t-test, not unpaired.
What is a Wilcoxon matched pairs test?
📊 Wilcoxon Matched-Pairs Test (Wilcoxon Signed-Rank Test)
🧠 What Is It?
The Wilcoxon matched-pairs test (also called the Wilcoxon signed-rank test) is the non-parametric alternative to the paired t-test.
👉 It compares two related measurements from the same participants, but
👉 it does NOT assume normal distribution.
Instead of comparing means, it compares the ranks of the differences.
🩺 Clinical Examples
1️⃣ Depression scores before & after therapy
2️⃣ Pain scores pre- and post-surgery
Pain scales (0–10) are ordinal, not truly continuous
→ Wilcoxon is more appropriate
3️⃣ CRP levels before and after antibiotics
CRP is often highly skewed
→ Non-normal → Wilcoxon
🔬 What It Actually Tests
It looks at the direction and ranked size of each within-pair difference.
If most differences go in one direction → statistically significant.
🧪 Exam-Ready Comparison: Wilcoxon vs Paired t-test
(See comparison table)
🎯 High-Yield Exam Logic
If the question says: skewed, ordinal, or non-normal paired data → Wilcoxon.
If it says: normally distributed paired data → paired t-test.
⚡ Quick Decision Algorithm
Same people twice?
→ YES
Are data normally distributed?
→ YES → Paired t-test
→ NO → Wilcoxon
→ NO
Use independent tests instead
🧠 Memory Hook
💍 Wilcoxon = “Wickedly non-normal wedding”
Wedding = paired
Wicked = weird distribution
No normality allowed 🎭
🔥 Clinical Pearl
Most real-world psychiatric scales (e.g. Likert-based symptom ratings) are technically ordinal and often skewed — which makes Wilcoxon statistically more correct, even though many studies still use paired t-tests for convenience.
🧾 Ultra-Short Summary (Exam Version)
The Wilcoxon matched-pairs test is the non-parametric alternative to the paired t-test used for two related samples when data are ordinal or not normally distributed. It compares ranked differences rather than mean differences.
🎒 The Sticker Test Example
Imagine you have 6 kids.
You give them a new “Super Study Juice” 🧃 and test how many spelling words they get right:
Because it’s the same kids twice, this is a paired situation.
📊 The Scores
🤔 What Are We Trying to See?
Did the juice actually help?
But here’s the twist:
Maybe the scores are kind of messy and not nicely spread out.
So instead of looking at averages, we do something simpler.
🏅 What Wilcoxon Does (Kid Version)
Instead of asking:
> “What’s the average improvement?”
It asks:
> “Did most kids get better?”
Step 1️⃣
Look at how much each kid improved.
Amy: +3
Ben: +1
Cara: +5
Dan: 0
Eva: +3
Finn: +1
Step 2️⃣
Rank the improvements (smallest to biggest)
We don’t care about exact numbers — just who improved a little vs a lot.
Step 3️⃣
Check direction
If most improvements are positive, the juice probably helped.
If half went up and half went down → probably no effect.
🧠 The Big Idea
Wilcoxon is like saying:
👉 “I don’t care about fancy averages.
👉 I just want to know if most people improved.”
🆚 Compared to Paired t-test (Kid Version)
| Paired t-test | Wilcoxon |
| --- | --- |
| Looks at average change | Looks at ranked change |
| Needs nice, normal numbers | Works even if numbers are messy |
| More mathy | More “fair and simple” |
🧁 Even Simpler Analogy
Paired t-test =
📏 “Let’s measure exactly how much taller everyone grew.”
Wilcoxon =
🏆 “Let’s line them up and see who grew more than who.”
💡 One-Sentence Memory Trick
Wilcoxon = “Were most kids better?”
Paired t-test = “How much better on average?”
The scores from the Sticker Test example:

| Kid | Before | After |
| --- | --- | --- |
| Amy | 5 | 8 |
| Ben | 6 | 7 |
| Cara | 4 | 9 |
| Dan | 7 | 7 |
| Eva | 3 | 6 |
| Finn | 8 | 9 |
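The kid-sized example can be run through scipy's Wilcoxon signed-rank test (assuming scipy is available). Note that with only six kids the test has very little power, so even though every improvement is positive the two-sided p-value may not reach 0.05:

```python
from scipy import stats

# "Super Study Juice" spelling scores (same six kids twice)
before = [5, 6, 4, 7, 3, 8]   # Amy, Ben, Cara, Dan, Eva, Finn
after  = [8, 7, 9, 7, 6, 9]

# Wilcoxon signed-rank test ranks the within-kid differences;
# Dan's zero difference is dropped by the default zero_method
w_stat, p_value = stats.wilcoxon(after, before)
print(f"W = {w_stat}, p = {p_value:.4f}")
```

The statistic is the smaller rank sum: here every remaining difference is positive, so the negative rank sum (and hence W) is 0.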
What is a one way analysis of variance F test using total sum of squares? (One-way ANOVA)
📊 One-Way Analysis of Variance (ANOVA) — The F Test
(using Total Sum of Squares)
🧠 What Is It? (Exam-Ready)
A one-way ANOVA tests whether the means of 3 or more independent groups are different.
It uses an F-test, which compares:
> 🔹 Variability between groups
to
🔹 Variability within groups
If between-group variation is much bigger than within-group variation → the means are likely different.
🧮 The Key Idea: Total Sum of Squares (SST)
Think of total variability in your data as a pie 🥧
Total variation = SST (Total Sum of Squares)
This gets split into two parts:
1️⃣ Between-group variation (SSB)
→ How far group means are from the overall mean
2️⃣ Within-group variation (SSW)
→ How spread out people are inside each group
The Formula Logic
(See attached image of formula)
If groups are truly different → SSB will be large → F will be large.
🩺 Clinical Examples
1️⃣ Comparing 3 antidepressants
You measure depression scores in patients on three different antidepressants (Drug A, Drug B, Drug C).
Question:
> Is at least one drug’s mean score different?
Use one-way ANOVA.
2️⃣ Comparing therapy types
You compare three different types of therapy.
Measure mean anxiety reduction.
Again → one-way ANOVA.
🔥 Exam Pearl
If you compare more than two independent groups, use ANOVA, not multiple t-tests.
Multiple t-tests inflate Type I error.
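A sketch of the three-antidepressant example with scipy (assuming scipy is available; the scores are invented). The second half reproduces the F statistic by partitioning the total sum of squares exactly as described above:

```python
from scipy import stats

# Hypothetical depression scores on three antidepressants
drug_a = [10, 12, 9, 11, 13]
drug_b = [14, 16, 15, 13, 17]
drug_c = [9, 8, 10, 11, 9]

f_stat, p_value = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# The same F emerges from partitioning the total sum of squares:
groups = [drug_a, drug_b, drug_c]
all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
# Between-group SS: how far each group mean sits from the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group SS: spread of individuals around their own group mean
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
f_manual = (ssb / (len(groups) - 1)) / (ssw / (len(all_scores) - len(groups)))
```
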
👶 Explanation for a 10-Year-Old
🍦 Ice Cream School Example
Three classes each eat a different ice cream flavour.
Afterward, you measure how happy each class is.
Now you ask:
> Are the classes happy because of the flavour,
or are kids just randomly different in happiness?
Step 1: Look at Total Mess (Total Variation)
All kids have different happiness levels.
That’s your total mess (Total Sum of Squares).
Step 2: Split the Mess
Some mess happens because the classes ate different flavours (between-group variation).
Some mess happens because kids in the same class just differ from each other (within-group variation).
Step 3: Compare Them
If the between-group mess is much bigger than the within-group mess:
Then flavour probably matters.
That comparison = the F test
🧠 What Does the F Mean?
F is basically asking:
> “Is the difference between groups bigger than the random noise inside groups?”
If YES → big F → significant result.
📊 Clean Exam Comparison
🧠 Memory Hook
ANOVA = “Another Number Of Various Averages”
Or better:
> ANOVA asks:
“Are the groups different, or is it just noise?”
And remember: running multiple t-tests instead inflates Type I error.
⚡ Ultra-Short Summary (Exam Version)
One-way ANOVA tests whether three or more independent group means differ by partitioning total variance into between-group and within-group components. The F statistic compares these two sources of variance.
What is the non-parametric alternative to a one-way ANOVA?
A Kruskal–Wallis test.
🥊 1️⃣ Kruskal–Wallis = Non-Parametric One-Way ANOVA
🎤 Memory Hook:
“ANOVA Brings Averages. Kruskal Brings a Ranking Contest.”
Imagine:
👔 ANOVA walks into a conference room with a calculator saying:
“Let’s compare the averages of these 3 groups.”
Then 🎖️ Kruskal–Wallis bursts in and says:
“Forget averages. Line them ALL up. I’m ranking everybody.”
Why this works logically:
One-way ANOVA → compares means across 3+ independent groups
Kruskal–Wallis → compares ranks across 3+ independent groups
Same structure (3+ independent groups)
Just swaps means → ranks
🔑 Trigger phrase:
“Three groups? If messy data — rank the crowd.”
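A sketch of the ranking contest with scipy (assuming scipy is available; the anxiety-reduction scores are invented and deliberately skewed by outliers):

```python
from scipy import stats

# Hypothetical, skewed anxiety-reduction scores for three groups
cbt      = [5, 6, 7, 5, 30, 6]    # note the outlier
dynamic  = [3, 4, 3, 5, 4, 25]    # another outlier
waitlist = [1, 2, 1, 3, 2, 2]

# Kruskal-Wallis compares the groups' ranks, so the outliers
# carry no more weight than "highest rank"
h_stat, p_value = stats.kruskal(cbt, dynamic, waitlist)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```
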
What’s the non-parametric equivalent of an unpaired T-test?
Mann-Whitney U test - non-parametric unpaired t-test.
🥊 3️⃣ Mann–Whitney U = Non-Parametric Unpaired t-test
⚔️ Memory Hook:
“Two Teams, No Averages — Just a Tug-of-War.”
Picture two teams pulling a rope.
👨🔬 Unpaired t-test says:
“Let’s compare the average strength of each team.”
🏋️ Mann–Whitney says:
“Nope. Let’s mix everyone together, rank them, and see which team keeps winning the top spots.”
Logical link:
Two independent groups
Unpaired t-test → compares means
Mann–Whitney → compares ranks
Same setup, just no normality required
🔑 Trigger phrase:
“Two separate teams? If data are ugly — rank the fight.”
Compare PURPOSE/STUDY GROUP of each test, provide its non-parametric twin, and provide a clinical example.
1) Paired t-test
2) Unpaired t-test
3) One-way ANOVA
4) Two-way ANOVA
See attached image.
What is a two-way analysis of variance? (Two way ANOVA)
🧠 What Is Two-Way ANOVA?
Two-way ANOVA tests:
1️⃣ The effect of Factor A
2️⃣ The effect of Factor B
3️⃣ The interaction between them
🩺 **Clinical Example Explained**
Suppose you study:
Therapy type: CBT vs Psychodynamic
Sex: Male vs Female
You’re asking:
1) Does therapy type matter?
2) Does sex matter?
3) Does therapy work differently depending on sex? (interaction)
That last question is the big one.
🎯 Why There’s No Simple Non-Parametric Twin
For:
2 groups → Mann–Whitney
3+ groups → Kruskal–Wallis
But for two independent variables, things get more complicated.
There is no clean, exam-friendly non-parametric twin like the others.
Advanced options exist (e.g., aligned rank transform), but they’re rarely tested clinically.
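Since scipy has no built-in two-way ANOVA, here is a pure-Python sketch of the sum-of-squares bookkeeping for a balanced 2×2 design (therapy × sex). The scores are invented so that the interaction (CBT helping males more than females) dominates:

```python
# Hypothetical outcome scores, 4 patients per cell of a 2x2 design
data = {
    ("CBT", "M"): [8, 9, 7, 8],
    ("CBT", "F"): [5, 4, 6, 5],
    ("Psychodynamic", "M"): [6, 7, 6, 5],
    ("Psychodynamic", "F"): [6, 7, 7, 6],
}

def mean(xs):
    return sum(xs) / len(xs)

all_vals = [x for cell in data.values() for x in cell]
g = mean(all_vals)              # grand mean
r = 4                           # replicates per cell
therapies = ["CBT", "Psychodynamic"]
sexes = ["M", "F"]

# Main effects: how far each factor's level means sit from the grand mean
ss_therapy = sum(r * len(sexes) *
                 (mean([x for s in sexes for x in data[(t, s)]]) - g) ** 2
                 for t in therapies)
ss_sex = sum(r * len(therapies) *
             (mean([x for t in therapies for x in data[(t, s)]]) - g) ** 2
             for s in sexes)
# Interaction: cell-mean variation not explained by the two main effects
ss_cells = sum(r * (mean(cell) - g) ** 2 for cell in data.values())
ss_interaction = ss_cells - ss_therapy - ss_sex
ss_within = sum((x - mean(cell)) ** 2 for cell in data.values() for x in cell)

# F for the interaction: df_interaction = 1, df_within = N - 4 cells
f_interaction = (ss_interaction / 1) / (ss_within / (len(all_vals) - 4))
print(f"SS therapy={ss_therapy:.2f}, sex={ss_sex:.2f}, "
      f"interaction={ss_interaction:.2f}, F_int={f_interaction:.2f}")
```

The sums of squares add up exactly to the total (the "pie" partition from the one-way section), and here the interaction term is the biggest slice.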
What is a chi squared test?
AssoCHIation between discrete variables
📊 Chi-Squared (χ²) Test — Distribution-Free Test
🧠 What Is It?
A chi-squared (χ²) test is a non-parametric (distribution-free) test used to compare categorical (discontinuous) variables between two or more independent groups.
It tests whether:
> The distribution of a categorical variable is the same across groups.
🎯 The Null Hypothesis (This Is High-Yield)
The null hypothesis (H₀) says:
> There is no association between the variables.
The distribution of the categorical variable is the same in all groups.
In simpler words:
> Any differences in proportions happened by chance.
🔬 Even Simpler Version (Cleaned-Up Statement)
A chi-squared test examines whether the distribution of a categorical (discontinuous) variable is the same across two or more independent samples.
It is called “distribution-free” because it does not assume normality.
🧮 What Does It Actually Compare?
It compares: the observed frequencies (what you counted) vs the expected frequencies (what you would count if H₀ were true).
If observed ≠ expected by a lot → χ² is large → reject H₀.
🩺 Clinical Examples
1️⃣ Smoking & Schizophrenia
Question:
> Is smoking status associated with schizophrenia?
Variables: smoking status (smoker/non-smoker) and schizophrenia diagnosis (yes/no).
Null hypothesis:
> Smoking rates are the same in people with and without schizophrenia.
2️⃣ Drug Response (Responder vs Non-responder)
Question:
> Is response rate different between SSRI and SNRI?
Null hypothesis:
> Proportion of responders is the same for both drugs.
3️⃣ Side Effects by Sex
Question:
> Is weight gain associated with sex?
Null hypothesis:
> The proportion with weight gain is the same in males and females.
👶 10-Year-Old Explanation
Imagine two classrooms choose snacks:
You think boys and girls choose snacks equally.
The null hypothesis says:
> “There is no real difference. Any weird pattern happened by chance.”
You compare what you saw with what you expected.
If they’re very different → probably not random.
That’s chi-squared.
🧠 When Do You Use It?
Use χ² when: both variables are categorical, the groups are independent, and expected cell counts are adequate.
⚠️ Exam Pearl
If expected cell counts are very small (<5) → use Fisher’s Exact Test instead.
🧠 Memory Hook
χ² = “Compare the Counts”
If you’re counting people in boxes (Yes/No, Male/Female, Drug A/Drug B) → think chi-squared.
⚡ Ultra-Short Exam Summary
The chi-squared test is a non-parametric, distribution-free test that assesses whether the distribution of a categorical variable differs across two or more independent groups by comparing observed and expected frequencies.
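A sketch with scipy (assuming scipy is available; the smoking-and-schizophrenia counts below are invented for illustration). `chi2_contingency` also returns the expected counts, which is how you check the "<5" rule from the exam pearl:

```python
from scipy import stats

# Hypothetical 2x2 table: smoking status vs schizophrenia diagnosis
table = [[120, 180],   # smoker:     schizophrenia / no schizophrenia
         [60, 140]]    # non-smoker: schizophrenia / no schizophrenia

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, dof = {dof}")
print("expected counts:", expected)   # all >= 5, so chi-squared is OK
```
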
Yes, but why is it called a Chi-Squared Test?
🎲 Why Is It Called “Chi-Squared”?
First:
Chi (χ) is just a Greek letter.
It’s pronounced “kai” (like “sky”).
So “chi-squared” just means:
> “The chi number, squared.”
🧮 What Gets Squared?
When we do a chi-squared test, we:
1️⃣ Look at what we expected to happen
2️⃣ Look at what we actually saw
3️⃣ Find the difference
Then we:
👉 Square the difference
Why square it?
Because: squaring removes minus signs (so up-and-down differences don’t cancel out) and makes big differences stand out even more.
So if something is way off from expected, squaring makes it stand out.
🧁 10-Year-Old Example
Imagine you think:
> Half the class likes chocolate.
Half likes vanilla.
You expect: 15 chocolate fans and 15 vanilla fans (out of 30 kids).
But you actually see: 23 chocolate and 7 vanilla.
That’s a big difference!
You calculate:
Observed − Expected
Then square it
Big difference → big squared number → big chi-squared value.
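A tiny worked version of the "squared surprises", using the made-up chocolate/vanilla counts:

```python
# 30 kids: we expected a 15/15 chocolate-vanilla split,
# but actually observed 23 chocolate and 7 vanilla
observed = [23, 7]
expected = [15, 15]

# chi-squared = sum of (observed - expected)^2 / expected
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_squared)  # 8.53..., well above the 3.84 cutoff for p < 0.05 at 1 df
```
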
🎯 Why the “Chi” Part?
When mathematicians worked out the math for these squared differences,
they discovered the final number follows a special curved shape.
They named that curve:
The chi-squared distribution
So the test is called:
> The test that uses the chi-squared distribution.
🧠 Super Simple Summary
It’s called chi-squared because:
🧠 Even Simpler Version
Chi-squared =
> “How big are the squared surprises?”
Big surprises → big chi-squared → something interesting is happening.
Compare a chi squared test and a Fisher’s exact test?
Both tests examine associations between categorical variables, but they differ in how they calculate probability and when they should be used.
📊 What Do They Have in Common?
Both: test for an association between two categorical variables and share the same null hypothesis of no association.
Example structure: a 2×2 table of exposure (rows) vs outcome (columns).
🧠 The Big Difference
(See comparison table)
🎯 When Do You Use Each?
✅ Use Chi-Squared when: the sample is large and all expected cell counts are ≥5.
✅ Use Fisher’s Exact Test when: the sample is small, the event is rare, or any expected cell count is <5.
🩺 Clinical Examples
1️⃣ Large Sample Example (Chi-Squared)
Study of 500 patients:
Is smoking associated with schizophrenia?
| | Schizophrenia | No Schizophrenia |
| --- | --- | --- |
| Smoker | 120 | 180 |
| Non-smoker | 80 | 120 |
All cell counts are large → use chi-squared.
2️⃣ Small Sample Example (Fisher’s Exact)
Rare adverse event in a small RCT (n=20):
| | Seizure | No Seizure |
| --- | --- | --- |
| Drug | 3 | 7 |
| Placebo | 0 | 10 |
One cell has 0 and expected counts are small → use Fisher’s exact test.
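The small-trial table above runs through scipy's exact test (assuming scipy is available). Note that with a zero cell the odds ratio comes out infinite, and the exact p-value here stays above 0.05, a nice illustration of how little power tiny samples have:

```python
from scipy import stats

# The small RCT (n=20): seizures on drug vs placebo
table = [[3, 7],    # drug:    seizure / no seizure
         [0, 10]]   # placebo: seizure / no seizure

# Fisher's exact test enumerates every table with the same margins
odds_ratio, p_value = stats.fisher_exact(table)
print(f"p = {p_value:.3f}")
```
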
👶 10-Year-Old Explanation
Imagine you’re checking if boys and girls prefer chocolate.
If you ask 200 kids:
You can estimate the answer pretty well → chi-squared is fine.
But if you only ask 8 kids:
Your estimate might be shaky.
So Fisher’s test counts all possible ways the results could happen and calculates the exact chance.
Chi-squared = good estimate
Fisher = exact calculation
🧠 Why Is Fisher “Exact”?
Chi-squared uses a shortcut based on a curve.
Fisher: counts every possible table with the same totals and adds up the exact probabilities.
That’s why it’s better for small numbers.
🔥 Exam Pearl
If the stem says:
> “Small sample size”
“Rare event”
“Expected cell count less than 5”
Choose Fisher’s exact test.
🧠 Memory Hook
Chi-squared = “Big crowd? Estimate is fine.”
Fisher = “Tiny crowd? Count every possibility.”
⚡ Ultra-Short Exam Summary
Chi-squared is an approximate test of association for categorical variables used in larger samples, while Fisher’s exact test calculates the exact probability of association and is preferred when expected cell counts are small.
| | Outcome Yes | Outcome No |
| --- | --- | --- |
| Exposed | | |
| Not Exposed | | |
What is a Fisher’s exact test?
🧪 Fisher’s Exact Test
🧠 What Is It?
Fisher’s exact test is a statistical test used to examine whether there is an association between two categorical variables, usually in a 2×2 table, when the sample size is small.
It tests the null hypothesis that:
> There is no association between the two variables
(i.e., the proportions are the same in both groups).
It is called “exact” because it calculates the exact probability of observing the data (or something more extreme), rather than using an approximation like the chi-squared test.
🎯 The Null Hypothesis (Very High-Yield)
The null hypothesis (H₀) states:
There is no association between the two categorical variables.
The proportions are the same in both groups.
In simple terms:
Any difference in proportions happened purely by chance.
Fisher’s exact test calculates the exact probability of observing the data (or something more extreme) assuming the null hypothesis is true.
If that probability (p-value) is very small → we reject the null hypothesis.
📊 When Do You Use It?
Use Fisher’s exact test when: the sample is small, the event is rare, or any expected cell count is <5.
🩺 Clinical Examples
1️⃣ Rare Adverse Event in a Small Trial
A small RCT (n=20) studies whether a new antidepressant causes seizures.
Because the sample is small and seizures are rare (some cells may contain 0):
→ Use Fisher’s exact test.
2️⃣ Rare Genetic Mutation & Disease
You examine whether a rare mutation is associated with a rare neurological disorder.
| | Mutation | No Mutation |
| --- | --- | --- |
| Disease | 2 | 8 |
| No Disease | 1 | 19 |
Small numbers → Fisher’s exact test.
3️⃣ Case-Control Study with Small Sample
Investigating whether cannabis use is associated with psychosis in a small rural sample (n=30).
If some cells have small counts → use Fisher’s exact test.
🔬 How It Differs from Chi-Squared
(See comparison table)
👶 10-Year-Old Explanation
Imagine you flip a coin 200 times.
If something strange happens, you can estimate whether it’s weird.
But if you flip a coin only 4 times:
You can’t really estimate — you have to count all possible ways the flips could happen.
Fisher’s test does that counting.
It checks every possible way the table could look and calculates the exact chance.
🧠 Why It’s Important Clinically
In medicine, small samples, rare diseases, and rare adverse events are common.
Chi-squared can give misleading results with small numbers.
Fisher’s exact test is more accurate in those cases.
⚡ Ultra-Short Exam Summary
Fisher’s exact test is a non-parametric test used to determine whether two categorical variables are associated in small samples by calculating the exact probability of the observed distribution.
🧠 **Memory Hook**
🎣 Fisher “catches the exact probability” when the sample is small.
If the crowd is tiny and the counts are small:
Don’t estimate — let Fisher count every fish in the pond.
Small sample
Exact probability
Tests association
Under the null of no difference
| | Seizure | No Seizure |
| --- | --- | --- |
| Drug | 3 | 7 |
| Placebo | 0 | 10 |
What is the product moment correlation coefficient (aka ‘Pearson’s r’)?
(Pearson is like the parametric version of Spearman)
(Spearman is the non-parametric version of Pearson)
📈 Product–Moment Correlation Coefficient (Pearson’s r)
🧠 What Is It?
The product–moment correlation coefficient, better known as Pearson’s r, measures:
> The strength and direction of a linear relationship between two continuous variables.
It ranges from −1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive).
🎯 What Is the Null Hypothesis?
The null hypothesis (H₀) states:
> There is no linear correlation between the two variables in the population
(r = 0).
If the p-value is small → reject H₀ → there is evidence of a linear relationship.
📊 What Does “Product–Moment” Mean?
It sounds scary, but it just means: the average product of the two variables’ deviations (“moments”) from their means.
You don’t need the formula for most clinical exams — just remember:
> Pearson’s r measures how closely two continuous variables move together in a straight-line pattern.
🩺 Clinical Examples
1️⃣ Depression Severity & Sleep Hours
Do higher HAM-D scores correlate with fewer hours of sleep?
If worse depression is linked to less sleep → r would be negative.
2️⃣ Lithium Level & Serum Creatinine
Is higher lithium concentration associated with higher creatinine?
Both continuous variables.
If higher lithium correlates with higher creatinine → r positive.
3️⃣ Age & Blood Pressure
Is increasing age associated with higher systolic BP?
Likely a positive correlation.
📊 Interpreting r
⚠️ Important Assumptions (Exam-Relevant)
Pearson’s r assumes: two continuous variables, approximate normality, a linear relationship, and no extreme outliers.
If data are not normal or relationship isn’t linear → use Spearman’s rank correlation instead.
👶 10-Year-Old Explanation
Imagine a class of kids.
You measure each kid’s height and weight.
Usually:
Taller kids weigh more.
If they go up together → positive correlation.
If one goes up and the other goes down → negative correlation.
If there’s no pattern → correlation near zero.
Pearson’s r is just a number that tells you:
> “How strongly do these two things move together in a straight line?”
🧠 Memory Hook
📏 Pearson = “Pair of continuous variables moving together.”
If you have two continuous variables with a roughly straight-line relationship:
Think: Pearson’s r
⚡ Ultra-Short Exam Summary
Pearson’s r is a parametric measure of the strength and direction of the linear relationship between two continuous variables, testing the null hypothesis that there is no correlation (r = 0).
NB: CORRELATION DOES NOT MEAN CAUSATION
| r value | Meaning |
| --- | --- |
| +0.8 | Strong positive relationship |
| +0.4 | Moderate positive |
| 0 | No linear relationship |
| –0.4 | Moderate negative |
| –0.8 | Strong negative |
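A sketch of the age-and-blood-pressure example with scipy (assuming scipy is available; the values are invented to be strongly linear):

```python
from scipy import stats

# Hypothetical data: age (years) and systolic BP (mmHg)
age = [30, 35, 40, 45, 50, 55, 60, 65, 70]
sbp = [118, 121, 125, 128, 134, 136, 141, 144, 150]

# Pearson's r: strength and direction of the linear relationship
r, p_value = stats.pearsonr(age, sbp)
print(f"r = {r:.2f}, p = {p_value:.4f}")
```
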
Compare Pearson’s product-moment correlation coefficient and Spearman’s rank correlation coefficient.
📊 Pearson vs Spearman — Clear Comparison
(see attached table)
🧠 What’s the Real Difference?
Pearson asks:
> “Do these two continuous variables move together in a straight line?”
Spearman asks:
> “If one goes up, does the other generally go up too?”
(Even if not perfectly linear)
🩺 Clinical Examples
🧪 Example 1: Depression Score & Sleep Hours
If the relationship looks roughly straight-line → use Pearson.
If the relationship is curved or skewed → use Spearman.
🧪 Example 2: Pain Severity (Likert 1–5) & Disability Score
Pain scale is ordinal.
→ Use Spearman.
🧪 Example 3: Lithium Level & Creatinine
Both continuous and roughly normal → Pearson.
If extreme outliers present → Spearman safer.
🎯 Exam Logic Shortcut
Two continuous variables + normal + linear → Pearson
Ordinal OR skewed OR outliers → Spearman
👶 10-Year-Old Explanation
📏 Pearson
Imagine drawing a straight line through dots on a graph.
Pearson asks:
> “How straight is this line?”
🏆 Spearman
Imagine lining kids up from shortest to tallest.
Then lining them up by weight.
Spearman asks:
> “Do the same kids stay near the top in both lines?”
It doesn’t care about exact numbers — just order.
🧠 Memory Hooks (Very Different + Logical)
📏 Pearson Memory Hook
“Pear-son = Perfectly Straight Person”
Pearson only likes: continuous data, normal distributions, and perfectly straight lines.
If the graph isn’t straight, Pearson gets upset.
Trigger phrase:
> “Straight line? Call Pearson.”
🏆 Spearman Memory Hook
“Spear-man Throws Away the Numbers and Keeps the Ranks”
Imagine a warrior named Spearman 🛡️
He throws away messy numbers
Keeps only the rankings
He doesn’t care if the line curves —
As long as higher ranks go with higher ranks.
Trigger phrase:
> “Messy data? Rank it with Spearman.”
🔥 Key Concept: Linear vs Monotonic
Linear → straight line
Monotonic → always going up or always going down (even if curved)
Pearson needs linear.
Spearman just needs monotonic.
🧠 Ultra-Clean Exam Summary
Pearson’s r is a parametric measure of linear correlation between two continuous variables, while Spearman’s rho is a non-parametric rank-based measure of monotonic association used when data are ordinal, skewed, or non-linear.
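The linear-vs-monotonic distinction shows up clearly in code (assuming scipy is available). Here y = x² is perfectly monotonic but not linear, so Spearman gives a perfect 1.0 while Pearson falls short:

```python
from scipy import stats

# A monotonic but curved relationship: y grows as x squared
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 4, 9, 16, 25, 36, 49, 64]

r_pearson, _ = stats.pearsonr(x, y)        # punishes the curve
rho_spearman, _ = stats.spearmanr(x, y)    # only sees the ranks
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {rho_spearman:.3f}")
```
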
🍐📏 Pearson vs 🛡️🏹 Spearman — The Ultimate Mental Picture
🎉 Scene 1: Pearson the Pear Pansy
You walk into a garden party.
Standing there is Pearson.
He is:
He says in a dramatic voice:
> “If it’s not perfectly straight, I refuse to measure it.”
He only likes: straight lines, tidy normal data, and no outliers.
If one datapoint sticks way out?
He faints into a bowl of pear custard.
🍐 What Pearson Stands For
📏 Pear-son = Pear + Straight ruler
If it’s not straight, he cannot cope.
🛡️🏹 Scene 2: Spearman the Rugged Warrior
Now imagine a mountain path.
Enter Spearman.
He is: rugged, practical, and unbothered by mess.
The path is curved.
It twists.
It’s not straight at all.
Spearman says:
> “Doesn’t matter. Are we generally going uphill? Good enough.”
He lines everyone up by rank.
He doesn’t care about exact values, only the order.
Outlier off in the bushes?
He shrugs:
> “Still last in rank.”
🛡️ What Spearman Stands For
🏹 Spear-man = Rugged, flexible, rank-focused
🧠 Lock It In
Picture this:
🍐 Pearson in a pear costume with a ruler crying because the graph is curved.
🏹 Spearman hiking a mountain saying,
“As long as it goes up overall, we’re fine.”
⚡ Ultra-Fast Recall Rule
Straight, clean, continuous → 🍐 Pearson
Curved, messy, ranked → 🛡️ Spearman
What is regression by the least squares method?
📈 Regression by the Least Squares Method
🧠 What Is It?
Regression by the least squares method is a statistical technique used to find the best-fitting straight line that predicts one continuous variable from another.
It is most commonly called:
> Simple linear regression
The “least squares” part means:
> We choose the line that makes the sum of the squared errors as small as possible.
📏 What Does That Mean?
Imagine plotting points on a graph.
Some dots are above the line.
Some are below.
Each dot has an error (distance from the line).
We:
1️⃣ Measure each vertical distance
2️⃣ Square it (so negatives don’t cancel positives)
3️⃣ Add them all up
The best line is the one with the smallest total squared error.
That’s why it’s called least squares.
🧪 Is It Parametric?
✅ Yes — linear regression is considered a parametric method.
Why?
Because it assumes: a linear relationship, normally distributed residuals with constant variance, and independent observations.
It estimates parameters: an intercept (β₀) and a slope (β₁).
That’s what makes it parametric.
📊 What Does It Actually Do?
Regression answers:
> “How much does Y change when X changes?”
Equation:
Y = β₀ + β₁X
🩺 Clinical Examples
1️⃣ Age Predicting Blood Pressure
Question:
> How much does systolic BP increase per year of age?
Age = predictor (X)
Blood pressure = outcome (Y)
Regression estimates the slope.
2️⃣ Lithium Level Predicting Creatinine
Question:
> Does higher lithium level predict higher creatinine?
Lithium = X
Creatinine = Y
Regression tells you how much creatinine increases per unit lithium.
3️⃣ Depression Severity Predicting Functioning
HAM-D score predicting work functioning score.
Regression gives a predictive relationship.
🔥 Correlation vs Regression
Correlation (Pearson’s r):
Tells you the strength and direction of the association — a unitless number from –1 to +1.
Regression:
Tells you how much Y changes per unit change in X, and gives a predictive equation.
👶 10-Year-Old Explanation
Imagine you’re throwing darts at a board 🎯
Each dart lands somewhere.
Now you draw a straight line that goes through the middle of all the darts.
Some darts are above.
Some are below.
You measure how far each dart is from the line.
Then you:
Square each distance
Add them all up
You move the line until those squared distances are as small as possible.
That’s the “least squares” line.
🧠 Even Simpler
It’s just:
> “The straight line that misses the points by the smallest amount overall.”
🎯 What Does the Null Hypothesis Say?
The null hypothesis:
> The slope (β₁) = 0
There is no relationship between X and Y.
If slope ≠ 0 significantly → evidence of association.
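This hypothesis test is easy to run in practice with `scipy.stats.linregress`, which reports the slope together with the p-value for H₀: slope = 0. The age and blood-pressure values below are invented purely for illustration.

```python
import numpy as np
from scipy.stats import linregress

# Invented age / systolic BP pairs for illustration only
age = np.array([30, 35, 40, 45, 50, 55, 60, 65])
sbp = np.array([118, 121, 125, 124, 130, 133, 135, 140])

result = linregress(age, sbp)
print(f"slope = {result.slope:.2f} mmHg per year, p = {result.pvalue:.4f}")
# A small p-value is evidence against H0: slope = 0
```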
🧠 Memory Hook
📏 Least Squares = “Smallest Total Squared Mistakes”
Regression = “Predicting with the best straight line.”
If you’re drawing a line through data and minimizing squared errors → think regression.
⚡ Ultra-Short Exam Summary
Regression by the least squares method is a parametric technique that fits the best linear prediction line between a continuous predictor and outcome by minimizing the sum of squared residuals, testing the null hypothesis that the slope equals zero.
Compare regression vs. correlation.
Excellent — this is a high-yield conceptual distinction 🔥
Many people confuse these, but they answer different questions.
📊 Correlation vs Regression — Clear Comparison
🧠 Core Difference in One Line
Correlation tells you whether two variables move together; regression tells you how much one changes per unit change in the other — and lets you predict.
🩺 Clinical Examples
1️⃣ Depression & Sleep
You measure:
HAM-D depression score
Hours of sleep per night
Correlation:
> Are worse depression scores associated with fewer hours of sleep?
You get r = –0.6 → strong negative relationship.
Regression:
> For every 1-point increase in HAM-D, how many fewer hours of sleep do we expect?
You get:
Sleep = 8 − 0.2 × (HAM-D)
That gives you a predictive formula.
2️⃣ Lithium & Creatinine
Correlation:
Are higher lithium levels associated with higher creatinine?
Regression:
How much does creatinine increase per mmol/L increase in lithium?
Regression gives you a clinically meaningful slope.
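The contrast can be shown on one made-up dataset: `np.corrcoef` gives the unitless r, while `np.polyfit` gives a slope that carries clinical units.

```python
import numpy as np

# Invented lithium (mmol/L) and creatinine (umol/L) values
lithium = np.array([0.4, 0.6, 0.8, 1.0, 1.2])
creatinine = np.array([70.0, 78.0, 85.0, 95.0, 102.0])

r = np.corrcoef(lithium, creatinine)[0, 1]             # unitless strength
slope, intercept = np.polyfit(lithium, creatinine, 1)  # change per mmol/L

print(f"r = {r:.3f} (no units)")
print(f"slope = {slope:.1f} umol/L creatinine per mmol/L lithium")
```

Same data, two different answers: r says "strongly associated"; the slope says "by how much".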
🔬 Conceptual Difference
Correlation:
Symmetrical — correlating X with Y gives the same r as Y with X
Unitless, ranges from –1 to +1
Describes association only; no prediction
Regression:
Directional — X predicts Y
Slope has real units (e.g., mmHg per year)
Provides a predictive equation
👶 10-Year-Old Explanation
Imagine height and weight.
Correlation says:
> “Do tall kids usually weigh more?”
Regression says:
> “If a kid grows 10 cm taller, how much heavier will they probably be?”
Correlation = Are they related?
Regression = How much change happens?
🧠 Memory Hook
💞 Correlation = “Are they dancing together?”
📏 Regression = “How much does one pull the other?”
Or even simpler:
Correlation = friendship
Regression = prediction
⚡ Ultra-Short Exam Summary
Correlation measures the strength and direction of association between two variables, whereas regression estimates the magnitude of change in an outcome variable per unit change in a predictor variable and provides a predictive equation.
| Feature | Correlation (Pearson’s r) | Regression (Least Squares) |
|---|---|---|
| Purpose | Measures strength & direction of relationship | Predicts one variable from another |
| Directional? | ❌ No (symmetrical) | ✅ Yes (X predicts Y) |
| Output | r (–1 to +1) | Equation: Y = β₀ + β₁X |
| Units | No units (standardised) | Has units (slope meaningful) |
| Null hypothesis | r = 0 | β₁ = 0 (slope = 0) |
| Swapping X & Y | Same result | Different equation |
What is non-parametric regression?
Great — now we’re stepping into slightly more advanced territory 👌
Let’s make it clear, structured, and clinically grounded.
📈 What Is Non-Parametric Regression?
🧠 Simple Definition
Non-parametric regression is a way of modelling the relationship between variables without assuming a specific functional form (like a straight line) and without strict distributional assumptions (e.g., normality).
Unlike linear regression, it does not assume:
A straight-line (or any fixed) relationship
Normally distributed residuals
Constant variance
Instead, it lets the data “speak for themselves.”
🔬 How Is It Different from Linear Regression?
So:
Linear regression = “Fit the best straight line.”
Non-parametric regression = “Fit the best curve without assuming its shape.”
🩺 Clinical Example 1: Age & Cognitive Decline
Suppose cognitive score declines:
Slowly in early old age
Rapidly after a threshold age
That relationship is curved, not straight.
A linear regression would misrepresent it.
Non-parametric regression can model the curve naturally.
🩺 Clinical Example 2: Drug Dose & Side Effects
At low doses: few side effects
Mid doses: side effects plateau
High doses: side effects rise steeply
This is not linear.
Non-parametric regression can model that shape without forcing a straight line.
🩺 Clinical Example 3: Stress & Performance
The relationship might be:
Low stress → poor performance
Moderate stress → peak performance
High stress → poor performance
That’s an inverted U-shape.
Linear regression cannot capture that well.
Non-parametric regression can.
📊 Common Non-Parametric Regression Methods
These are not always tested at basic clinical level, but good to know:
1️⃣ LOESS (Locally Weighted Scatterplot Smoothing) — stitches many small local fits into one smooth curve
2️⃣ Kernel Regression — predicts each point as a weighted average of nearby observations
3️⃣ Spline Regression (when flexible knots used) — joins piecewise polynomials at “knots” so the curve can bend
4️⃣ Rank-Based Regression (less common clinically) — regresses on ranks rather than raw values
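One of these, kernel regression, is simple enough to sketch directly. The Nadaraya–Watson estimator below predicts each point as a Gaussian-weighted average of nearby outcomes; the inverted-U data are invented (think stress vs performance), and the bandwidth of 1.0 is an arbitrary choice.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, bandwidth=1.0):
    """Nadaraya-Watson: Gaussian-weighted average of nearby outcomes."""
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return (weights @ y_train) / weights.sum(axis=1)

# Inverted-U data (e.g., stress vs performance), made up for illustration
x = np.linspace(0, 10, 50)
y = -(x - 5) ** 2 + 25                      # peaks at x = 5

y_hat = kernel_regression(x, y, np.array([0.0, 5.0, 10.0]))
print(y_hat)  # the middle estimate (at the peak) should be the largest
```

No straight line was assumed anywhere — the fitted curve follows the inverted U because the local averages do.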
⚠️ Important Clarification
Do not confuse:
Non-parametric regression (flexible modelling of a relationship)
With:
Non-parametric group-comparison tests (e.g., Mann–Whitney U, Kruskal–Wallis)
They answer different questions.
Regression = prediction/modelling
Correlation = association strength
Group tests = compare distributions
👶 10-Year-Old Explanation
Imagine you’re drawing a road through a bunch of dots.
Linear regression says:
> “The road must be straight.”
Non-parametric regression says:
> “The road can curve and bend however it needs to follow the dots.”
It doesn’t assume the road’s shape ahead of time.
🧠 When Would You Use It?
Use non-parametric regression when:
The relationship is curved or its shape is unknown
Residuals are not normally distributed
Outliers or skew distort a straight-line fit
🎯 Exam-Ready Summary
Non-parametric regression refers to flexible modelling methods that estimate relationships between variables without assuming a specific parametric form (e.g., linearity or normal residuals), allowing the data to determine the shape of the association.
🧠 Memory Hook
Linear regression = “Ruler.” 📏
Non-parametric regression = “Flexible garden hose.” 🪴
If the data bend, the hose bends with them.
| Feature | Linear Regression | Non-Parametric Regression |
|---|---|---|
| Assumes straight line? | ✅ Yes | ❌ No |
| Assumes normal errors? | ✅ Yes | ❌ Not required |
| Fits fixed equation? | Y = β₀ + β₁X | Flexible curve |
| Parametric? | Yes | No |
What is the difference between simple and multiple regression?
Great question — this is foundational and very testable 🔥
Let’s make it clean, structured, and intuitive.
📊 Simple vs Multiple Regression
🧠 Core Difference in One Line
Simple regression uses one predictor; multiple regression uses two or more predictors for the same outcome.
Both usually refer to linear regression unless otherwise specified.
📈 1️⃣ Simple Linear Regression
Structure
Y = β₀ + β₁X
What It Asks
> “How much does Y change when X changes?”
🩺 Clinical Example
Does age predict systolic blood pressure?
You estimate how much BP increases per year of age.
📈 2️⃣ Multiple Linear Regression
Structure
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃
What It Asks
> “How much does each predictor affect Y, while holding the others constant?”
This is the key difference.
🩺 Clinical Example
Predicting depression severity using:
Sleep quality
Trauma history
Substance use
Multiple regression lets you estimate:
The independent effect of each predictor, holding the others constant
This helps control for confounding.
🔬 Why Multiple Regression Is Powerful
It allows:
Adjustment for confounders
Estimation of each predictor’s independent effect
Modelling several influences on one outcome at once
Most clinical research uses multiple regression.
🎯 Key Concept: “Holding Other Variables Constant”
Simple regression cannot separate overlapping effects.
Multiple regression can answer:
> “Is smoking still associated with lung cancer after adjusting for age?”
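The "adjusting" idea can be demonstrated with ordinary least squares in numpy. The data below are simulated with a built-in confounder, so the crude and adjusted estimates for the exposure differ; the true adjusted effect is set to 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)                 # confounder (e.g., age)
x1 = 0.8 * x2 + rng.normal(size=n)      # exposure, correlated with the confounder
y = 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)  # true x1 effect = 0.5

ones = np.ones(n)
# Crude model: Y ~ X1 (ignores the confounder)
b_crude, *_ = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)
# Adjusted model: Y ~ X1 + X2 (holds the confounder constant)
b_adj, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)

print(f"crude X1 effect    = {b_crude[1]:.2f}")  # inflated by confounding
print(f"adjusted X1 effect = {b_adj[1]:.2f}")    # close to the true 0.5
```

The crude coefficient absorbs part of the confounder's effect; adding X2 to the model recovers the independent effect of X1.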
👶 10-Year-Old Explanation
Simple Regression
You ask:
> “Does studying more increase test scores?”
One cause. One result.
Multiple Regression
Now you ask:
> “Do studying, sleep, and eating breakfast affect test scores?”
Now you check each one separately
while keeping the others the same.
🧠 Memory Hook
Simple regression = “One arrow.” ➡️
Multiple regression = “Control panel with many knobs.” 🎛️
You adjust one knob at a time to see its independent effect.
⚡ Ultra-Short Exam Summary
Simple regression examines the relationship between one predictor and one outcome, whereas multiple regression estimates the independent effects of two or more predictors on a single outcome while controlling for other variables.
| Feature | Simple Regression | Multiple Regression |
|---|---|---|
| Number of predictors | 1 | 2 or more |
| Controls for confounding? | No | Yes |
| More realistic clinically? | Less | More |
| Equation | Y = β₀ + β₁X | Y = β₀ + β₁X₁ + β₂X₂ + … |
What is multiple regression by least squares method?
Excellent — now we’re combining concepts properly 🔥
You already know:
What the least squares method does (minimises squared errors)
What multiple regression does (several predictors, one outcome)
Now let’s put them together clearly.
📈 Multiple Regression by the Least Squares Method
🧠 What Is It?
Multiple regression by the least squares method is a parametric statistical technique that:
> Models the relationship between one continuous outcome variable and two or more predictor variables
by choosing the regression equation that minimizes the sum of squared residuals (errors).
📐 The Equation
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + …
Where:
Y = the outcome variable
β₀ = the intercept
β₁, β₂, β₃ … = slopes (the independent effect of each predictor)
X₁, X₂, X₃ … = the predictor variables
The least squares method finds the β values that make the total squared error as small as possible.
🎯 What Does It Test?
For each predictor:
Null hypothesis:
βᵢ = 0
Meaning:
> That predictor has no independent effect on the outcome.
If β ≠ 0 significantly → independent association exists.
🔬 Why “Least Squares”?
For every observation:
1️⃣ Calculate predicted Y
2️⃣ Calculate error (Observed − Predicted)
3️⃣ Square it
4️⃣ Add all squared errors
The best model = smallest total squared error.
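The four steps can be followed literally in code. The data below are simulated, `np.linalg.lstsq` supplies the least-squares coefficients, and nudging one coefficient confirms that no other set gives a smaller total squared error.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100), rng.normal(size=100)])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=100)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients

def sse(beta):
    predicted = X @ beta                 # 1) calculate predicted Y
    errors = y - predicted               # 2) error = observed - predicted
    squared = errors ** 2                # 3) square each error
    return float(squared.sum())          # 4) add them all up

best = sse(beta_hat)
perturbed = sse(beta_hat + np.array([0.1, 0.0, 0.0]))  # nudge one coefficient
print(f"SSE at least-squares fit: {best:.2f} < perturbed fit: {perturbed:.2f}")
```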
🧪 Is It Parametric?
✅ Yes.
It assumes:
Linear relationships between predictors and outcome
Normally distributed residuals
Constant variance (homoscedasticity)
Independent observations
🩺 Clinical Examples
1️⃣ Predicting Blood Pressure
Outcome: Systolic BP
Predictors:
Age
BMI
Smoking status
Question:
> How much does BP increase with age, controlling for BMI and smoking?
Each β tells you the independent contribution.
2️⃣ Predicting Depression Severity
Outcome:
Depression severity (e.g., HAM-D score)
Predictors:
Trauma history
Sleep quality
Substance use
You can estimate the independent effect of trauma while controlling for sleep and substance use.
3️⃣ Predicting Length of Hospital Stay
Outcome:
Length of hospital stay (days)
Predictors (for example):
Age
Diagnosis
Number of comorbidities
This models multiple contributing factors at once.
🧠 Why It’s Important Clinically
Multiple regression:
Adjusts for confounding
Quantifies each factor’s independent contribution
Reflects the multifactorial reality of clinical outcomes
Most modern medical research uses this.
👶 10-Year-Old Explanation
Imagine you’re trying to guess how tall someone will be.
You look at:
How tall their mum is
How tall their dad is
How well they eat
Instead of using just one thing,
you use all three at the same time.
You draw a “best fit” rule that makes your mistakes as small as possible.
That’s multiple regression using least squares.
🧠 Simple Mental Picture
Simple regression = one lever controlling a machine
Multiple regression = many levers controlling one machine
Least squares = adjust the levers until total mistake is smallest
⚡ Ultra-Short Exam Summary
Multiple regression by the least squares method is a parametric technique that estimates the independent effects of two or more predictors on a continuous outcome by minimizing the sum of squared residuals, testing whether each regression coefficient differs from zero.
What is non-parametric multiple regression? What methods are used for it? (Advanced)
Excellent — this is more advanced, so I’ll make it clear and structured 👌
📈 What Is Non-Parametric Multiple Regression?
🧠 Simple Definition
Non-parametric multiple regression models the relationship between:
One outcome variable, and
Two or more predictor variables
Without assuming:
A linear (or any fixed) functional form
Normally distributed residuals
Instead of fitting a flat plane (as linear regression does), it allows the relationship to be flexible and data-driven.
🔬 How It Differs from Parametric Multiple Regression
Parametric = “Fit the best flat surface.”
Non-parametric = “Let the surface bend naturally.”
📊 Common Non-Parametric Multiple Regression Methods
These are modelling techniques rather than single “tests.”
1️⃣ LOESS / LOWESS (Locally Weighted Regression) — stitches many small local fits into one smooth surface
2️⃣ Kernel Regression — weighted average of nearby observations
3️⃣ Spline Regression (when highly flexible) — piecewise polynomials joined at knots
4️⃣ Generalised Additive Models (GAMs) — sum of smooth functions of each predictor
5️⃣ Decision Tree Regression / Random Forest Regression — split the data into regions and average within them
🩺 Clinical Examples
1️⃣ Age, BMI & Cognitive Decline
Outcome:
Cognitive test score
Predictors:
Age
BMI
Suppose:
Cognitive decline accelerates after a certain age
The effect of BMI is U-shaped
A non-parametric model can capture those curves without forcing linearity.
2️⃣ Drug Dose, Renal Function & Side Effects
Outcome:
Side-effect severity
Predictors:
Drug dose
Renal function
The relationship may:
Stay flat at low doses, then rise steeply at higher doses
Depend on renal function in a non-linear way
Non-parametric regression models that curve.
3️⃣ ICU Mortality Prediction
Outcome:
ICU mortality risk
Predictors:
Physiological variables (e.g., age, lactate, blood pressure, GCS)
Relationships are rarely linear — tree-based or GAM models often perform better.
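As a minimal sketch of the flexible, data-driven idea (not any specific package's method), k-nearest-neighbour regression predicts an outcome from two predictors by averaging nearby cases. The predictors and the curved, interacting outcome surface below are simulated.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=10):
    """Average the outcomes of the k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(y_train[nearest].mean())

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 2))  # e.g., dose and renal function (simulated)
y = np.sin(X[:, 0]) * X[:, 1]          # a curved, interacting outcome surface

pred_high = knn_predict(X, y, np.array([np.pi / 2, 8.0]))  # region where y is large
pred_low = knn_predict(X, y, np.array([np.pi / 2, 1.0]))   # region where y is small
print(pred_high, pred_low)
```

No equation or coefficients were estimated — the prediction surface bends wherever the local data bend, which is the defining feature of non-parametric multiple regression.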
👶 10-Year-Old Explanation
Imagine you’re drawing a road through dots again.
Parametric regression says:
> “The road must be flat and straight.”
Non-parametric multiple regression says:
> “The road can twist, turn, and curve to follow the dots — even in 3D.”
It doesn’t decide the shape first.
It figures it out from the data.
🧠 When Would You Use It?
Use non-parametric multiple regression when:
Relationships are curved or of unknown shape
Residuals are clearly non-normal
Predictors interact in complex ways
Often used in:
Machine learning and prognostic modelling
Large epidemiological datasets
🎯 Important Clarification
This is different from:
Non-parametric group tests (e.g., Mann–Whitney U, Kruskal–Wallis)
Those compare groups or ranks.
Non-parametric regression is about prediction/modelling.
⚡ Ultra-Short Exam Summary
Non-parametric multiple regression refers to flexible modelling techniques that estimate relationships between one outcome and multiple predictors without assuming a fixed linear form or normal residuals, allowing complex, data-driven associations.
🧠 Memory Hook
Parametric regression = 📏 rigid table
Non-parametric regression = 🧵 stretchy fabric that molds around the data
If the real world bends, the model should bend too.
| Feature | Parametric Multiple Regression | Non-Parametric Multiple Regression |
|---|---|---|
| Assumes linear relationship? | ✅ Yes | ❌ No |
| Assumes normal residuals? | ✅ Yes | ❌ Not required |
| Model form fixed? | Yes (Y = β₀ + β₁X₁ + β₂X₂…) | No fixed form |
| Estimates β coefficients? | Yes | Not necessarily in simple form |
| Flexibility | Low–moderate | High |