Lecture 12: Statistical Hypotheses Testing: Flashcards by im dying Unknown

What is evidence?

Evidence refers to information, facts, or data that support or challenge a claim, prediction, assumption, or hypothesis

The probability of observing a result at least as extreme as 18 heads out of 20 flips (18 or more heads, or 18 or more tails) purely by chance, if the coin were fair, is 0.0004025. What do we refer to this as?

In statistics, we refer to this probability as the value predicted by the model that assumes the coin is fair.
This is a long-run prediction: nobody actually flipped the coin billions of times, so the value is the relative frequency expected for such a result over a very large number of repeated 20-flip experiments.
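
The figure quoted in this card can be checked with a few lines of Python. This is a minimal sketch, assuming a binomial model for the 20 flips and assuming that "extreme" counts both tails (the card does not say which calculation was used). The helper name `binom_pmf` is illustrative, not from the lecture.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_flips = 20
heads = 18

# Probability of exactly 18 heads under the fair-coin model.
p_exact = binom_pmf(heads, n_flips)

# Probability of 18 or more heads (upper tail only).
p_upper = sum(binom_pmf(k, n_flips) for k in range(heads, n_flips + 1))

# A fair coin is symmetric, so 2 or fewer heads is just as extreme.
p_two_sided = 2 * p_upper

print(f"exactly 18 heads:           {p_exact:.7f}")
print(f"18 or more heads:           {p_upper:.7f}")
print(f"as extreme, both directions: {p_two_sided:.7f}")
```

Under these assumptions it is the two-sided tail, not the probability of exactly 18 heads, that reproduces 0.0004025.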

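What happens to the probability under the assumption of a fair coin as the sample size increases?

For the same observed proportion of heads, the probability under the fair-coin assumption becomes smaller as the sample size increases, so the evidence against the assumption becomes stronger for large samples.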
This illustrates a key principle of statistical inference: sample size strongly influences how unusual an observed result appears under a given assumption (model).
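
The effect of sample size can be sketched by holding the observed proportion fixed at 90% heads and varying the number of flips. The sample sizes (10, 20, 100) and the function name `two_sided_tail` are assumptions chosen for illustration.

```python
from math import comb

def two_sided_tail(k: int, n: int) -> float:
    """P(count at least as far from n/2 as k) under a fair coin (p = 0.5)."""
    distance = abs(k - n / 2)
    favourable = sum(comb(n, i) for i in range(n + 1)
                     if abs(i - n / 2) >= distance)
    return favourable / 2**n

# Same observed proportion (90% heads), increasing sample size.
results = {}
for n in (10, 20, 100):
    k = 9 * n // 10  # number of heads corresponding to 90%
    results[n] = two_sided_tail(k, n)
    print(f"n = {n:3d}, heads = {k:2d}, probability = {results[n]:.3g}")
```

The probability shrinks as n grows even though the proportion of heads is identical in all three cases.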

We refer to this probability as the value predicted by the model that assumes the coin is fair. What is this model called?

This model is called the null hypothesis (H0) because it makes explicit that a specific mechanism must be assumed (e.g., a fair coin) in order to calculate probabilities.

When is STRONG statistical evidence generated?

STRONG statistical evidence is generated when the observed data would be very unlikely under the assumption being tested (model/H0), indicating that the assumption is inconsistent with what was observed.

When is WEAK statistical evidence generated?

WEAK statistical evidence is generated when the observed data are somewhat unlikely under the assumption being tested (model/H0), suggesting limited inconsistency with what was expected under that assumption.

When is NO statistical evidence against the assumption generated?

NO statistical evidence against the assumption is generated when the observed data are reasonably likely under the assumption being tested (model/H0), indicating that the data are consistent with what was expected.

Humans are predominantly right-handed. Do other animals exhibit handedness as well? Bisazza et al. (1996) tested this possibility on the common toad.
- They randomly sampled 18 wild toads, placed a balloon over each one’s head, and recorded which forelimb the toads used to remove it to determine their preferred limb.
- RESULTS: 14 toads were right-handed and four were left-handed. Do these results provide sufficient evidence to demonstrate handedness in toads?

Weak evidence: toads and humans may be the only species that exhibit handedness. Many species have not been tested, so these results alone cannot support the broader hypothesis that animals in general exhibit handedness as humans do; other species would need to be tested.
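
For the narrower statistical question (are these 18 toads compatible with "no handedness"?), the data can be compared with a model in which each toad is right-handed with probability 0.5. The sketch below assumes an exact two-sided binomial calculation; the card does not state which test the lecture used, and the variable names are illustrative.

```python
from math import comb

n_toads = 18
observed_right = 14
expected = n_toads / 2  # 9 right-handed toads expected under H0

# Outcomes at least as lopsided as 14 vs 4, in either direction.
observed_distance = abs(observed_right - expected)
extreme_counts = [k for k in range(n_toads + 1)
                  if abs(k - expected) >= observed_distance]

# Under H0 every sequence of 18 toads is equally likely (2**18 sequences).
p_value = sum(comb(n_toads, k) for k in extreme_counts) / 2**n_toads
print(f"two-sided p-value: {p_value:.3f}")
```

Under these assumptions the result is about 0.031, the same value examined in the later card on interpreting a p-value of 0.031.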

What is a research hypothesis?

A hypothesis is a supposition or proposed explanation made based on limited evidence as a starting point for further investigation (Oxford dictionary);
- e.g., “animals, other than humans, also have a preferred limb (handedness)”.

What can’t hypotheses be described as?

Hypotheses cannot be definitively proven true or false based on a single dataset. They can only be described as supported or not supported by the data at hand, and they always remain open to revision or refutation in light of future evidence.

What is inference based on and what does this mean about the conclusion?

Inference is based on limited sample data rather than the entire population.
–> As such, conclusions are always conditional on the evidence observed and therefore cannot establish absolute truth.

What is the statistical hypothesis framework?

The statistical hypothesis framework (most often involving statistical testing) is a quantitative method of statistical inference that allows us to generate evidence for or against a hypothesis.

In the frequentist framework, what is the statistical question stated as?

In the frequentist framework, the statistical question is stated as two mutually exclusive hypotheses: the null hypothesis (H0) and the alternative hypothesis (HA).

What occurs in the frequentist framework of inference?

In the frequentist framework of inference, we typically compute a probability value (p-value) that quantifies how compatible the observed data are with a specified null hypothesis, thereby providing a measure of evidence against that hypothesis (e.g., testing for handedness in toads).

What is the null hypothesis?

The null hypothesis (H0) represents a specific assumption about the population parameter (often reflecting no effect or no difference).

What is the alternative hypothesis?

The alternative hypothesis (HA) represents a competing claim: that the population parameter differs from the value assumed under the null hypothesis.

Why is it called the “null” hypothesis?

It is called “null” because it represents a baseline or default assumption; typically, that there is no effect, no difference, or no deviation from some reference value.
It assumes that any observed differences or patterns arise purely from random variation rather than from a real underlying effect.

Why is resampling or sampling with replacement important?

Resampling (sampling with replacement) is important because it ensures that each selection of an observational unit (e.g., a piece of paper) is independent of the others.
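
One way to picture this is a simulation of drawing pieces of paper from a hat and returning each one before the next draw. The sketch below is an illustration under stated assumptions: a hypothetical hat holding one "R" and one "L" paper, 18 draws per simulated sample (matching the toad study), 10,000 simulated samples, and a fixed seed. None of these details come from the lecture.

```python
import random

def simulate_null(n_units: int, n_sims: int, seed: int = 1) -> list[int]:
    """Simulate counts of 'R' when drawing n_units papers with replacement."""
    rng = random.Random(seed)
    papers = ["R", "L"]  # hypothetical hat: one right, one left
    counts = []
    for _ in range(n_sims):
        # rng.choice leaves the hat unchanged, so every draw is independent
        # of the previous ones (sampling with replacement).
        draws = [rng.choice(papers) for _ in range(n_units)]
        counts.append(draws.count("R"))
    return counts

null_counts = simulate_null(n_units=18, n_sims=10_000)

# Share of simulated samples at least as lopsided as 14 right vs 4 left.
observed = 14
centre = 18 / 2
as_extreme = sum(1 for c in null_counts
                 if abs(c - centre) >= abs(observed - centre))
p_sim = as_extreme / len(null_counts)
print(f"simulated p-value: {p_sim:.4f}")
```

Because every draw has the same two options available, the simulated counts approximate the distribution of outcomes expected under H₀, and the simulated p-value should land near the exact binomial value.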

Which phrasing is correct in frequentist hypothesis testing, and why?
“Under H₀” vs “for H₀”

“Under H₀” means assuming the null hypothesis is true and describing what the distribution of data (samples) would look like in that hypothetical world. “Under” is the correct phrasing in frequentist hypothesis testing.
“For H₀” sounds like we are arguing in favour of or supporting the null hypothesis, which frequentist tests do not do.

What is the p-value?

The p-value is the probability, calculated under the assumed null hypothesis (H₀), of observing a value of the test statistic (θ) as extreme as, or more extreme than, the one actually observed.

What is the Frequentist Hypothesis-Testing Framework?

Statistical hypothesis testing is a quantitative inference framework.
It evaluates how compatible the data are with an assumed model.
That model is the null hypothesis (H₀).
Core idea: we evaluate how surprising the observed data would be if the null hypothesis (H₀) were true.
–> P-value is a “metric of fit” with the model

What do we NOT do in frequentist inference?

In frequentist inference, we do not prove that a model is correct; we evaluate how incompatible the observed data are with the model (null hypothesis).

When do we reject the null hypothesis?

We test the null hypothesis (or null model) directly. If the observed data are highly incompatible with it, we reject the null and regard the alternative hypothesis as MORE PLAUSIBLE in light of the evidence.

Even if the null hypothesis is rejected, what does this mean about the alternative hypothesis?

Rejecting the null hypothesis does not mean the alternative hypothesis has been proven true; it simply means the null model does not adequately explain the data.

What does the p-value quantify?

The p-value quantifies how consistent the observed data are with H₀: small values indicate greater inconsistency with H₀, whereas large values indicate that the observed data are reasonably consistent with H₀ (though they do not prove it is true).

Statistical tests (via their p-values) measure how surprising the observed data are under the null hypothesis (i.e., they detect inconsistency). What do high surprise and low surprise indicate?

1) High surprise (small p-value) → evidence against H₀
2) Low surprise (large p-value) → no evidence against H₀

What does a p-value of 0.031 indicate?

High surprise under H₀: it indicates that a result at least as extreme as the one observed would occur about 3.1% of the time if the null model were true.
- That is relatively uncommon, and under conventional thresholds (like 0.05), it is considered sufficiently incompatible with the null to justify rejecting it.
- While this provides evidence against H₀, the p-value is not extremely small, so the evidence can be considered moderate rather than overwhelming.

What is the difference between statistical and research hypotheses?

Statistical hypotheses are tools; research hypotheses are the goal.

How does one decide on when to reject the null hypothesis?

The significance level, denoted by 𝛼 (alpha), is the threshold we set before analyzing the data to decide how much incompatibility with the null model we are willing to tolerate before rejecting it.

What is 𝛼 (alpha) in hypothesis testing?

The significance level: It is a chosen threshold that reflects how cautious we want to be about claiming evidence against the null model.

What is the typical threshold level (alpha) in biology?

Biology usually uses 0.05 or 0.01, largely for reasons of tradition and consistency.
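
How much the choice of threshold matters can be seen by applying both conventional values to the same p-value. This is a minimal sketch: the function name `decide` and the convention of rejecting when p ≤ α are assumptions for illustration, and 0.031 is the p-value discussed in an earlier card.

```python
def decide(p_value: float, alpha: float) -> str:
    """Compare a p-value with a significance level chosen in advance."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = 0.031
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

The same data lead to rejection at α = 0.05 but not at α = 0.01, which is one reason the threshold must be fixed before the data are analyzed.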

What do p-values quantify?

p-values quantify surprise under an assumption.

What does the p-value NOT represent?

CRITICAL: the p-value does not represent the probability that the null hypothesis (H₀) is true.
- Instead, it is a quantitative measure indicating the strength of evidence against H₀.

What does a small p-value suggest?

A smaller p-value suggests stronger evidence against the null hypothesis.

What are the 4 don’ts about P values and statistical hypothesis testing (Wasserstein et al. 2019)?

1) P-values can indicate how incompatible the observed data are with a specified statistical model (e.g., the one assumed under H0).
2) P-values do not measure the probability that the studied research hypothesis is true.
3) Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold (alpha).
4) A p-value, or statistical significance, does not measure the biological importance of a result.

What is recommended because of the limitations of the p-value?

Despite the limitations of p-values, we are not recommending that the calculation and use of p-values be discontinued. Where p-values are used, they should be reported as continuous quantities (e.g., p = 0.08) and not reduced to a yes/no decision about rejecting the null hypothesis.

(36 cards)