Population
The collection of units to which we want to generalize a set of findings or a statistical model.
(i.e. people, plankton, plants, cities, suicidal authors, etc.)
Sample
A smaller (but hopefully representative) collection of units from a population used to determine truths about that population.
The mean is a model of
what happens in the real world: the typical score.
It is not a perfect representation of the data.
A deviation is…
the difference between the mean and an actual data point.
Sum of Squared Errors
Variance
SS / (N-1)
The variance has one problem:
It is measured in units squared. So difficult to interpret.
The standard deviation
Since the variance is measured in units squared. We take the square root to make it a meaningful metric.
The sum of squares, variance, and standard deviation represent the same thing:
Central Limit Theorem
The distribution of the sample means will be approximately normally distributed
Central Limit Theorem
How can we measure the accuracy of this average?
—> approximation = standard error
Test Statistics
Type I error
Type II error
Examples type 1/2 error
Type 1: we believe pregnancy is there, but it’s actually not there
Type 2: we believe pregnancy is not there, but there’s actually a pregnancy present
Type 1: covid test, you think you have it, but you don’t
Type 2: covid test, you think you don’t have it, but you do
What Does Statistical Significance Tell Us?
The importance of an effect?
No, significance depends on sample size. When doing tests, you should always aim at interpreting the effect size as well.
What Does Statistical Significance Tell Us?
That the null hypothesis is false?
No, it is always false.
What Does Statistical Significance Tell Us?
That the null hypothesis is true?
No, it is never true.
Assessing Normality
We don’t have access to the sampling distribution so we usually test the observed data
Shapiro-Wilk Test
Shapiro-Wilk test for exam and numeracy for whole sample