Define distributions of data
The manner in which data for a particular variable is spread over its range. Commonly visualised by a histogram.
What is the problem with skewed distributions?
The mean is pulled towards the long tail, so it no longer represents a typical value
What is the problem with bimodal distributions?
The mean is not representative, since the two peaks indicate two distinct populations
What properties specify the shape of a normal distribution?
Mean (centre peak) and standard deviation (spread)
What is the formula to calculate a z-score?
z = (x - population mean) / population SD
What do z-scores tell us?
How many SDs a datapoint is from the mean
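The z-score calculation can be sketched in a few lines of Python. The function name `z_score` and the example numbers (a score of 130, population mean 100, population SD 15) are illustrative assumptions, not values from these notes.

```python
def z_score(x, population_mean, population_sd):
    """Return how many population SDs the datapoint x lies from the mean."""
    if population_sd <= 0:
        raise ValueError("population_sd must be positive")
    # Subtract the mean first, then divide the difference by the SD.
    return (x - population_mean) / population_sd


# Hypothetical example values, chosen only for illustration.
z = z_score(130, 100, 15)
print(z)  # 2.0
```

A positive result means the datapoint is above the mean, a negative result means it is below.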
How do you calculate probability of selecting someone above/below a specific datapoint in a normal distribution?
1) Calculate z-score for datapoint
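2) Look up the z-score in a z-score table. A standard cumulative table gives the area under the curve to the left of z, i.e. the probability of falling below the datapoint; subtract it from 1 for the probability of falling above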
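The two steps above can be sketched in Python, with the standard library's `statistics.NormalDist` standing in for the printed z-score table. The helper names `prob_below` and `prob_above` and the example numbers are assumptions made for illustration.

```python
from statistics import NormalDist


def prob_below(x, population_mean, population_sd):
    """Probability of selecting a value below x in a normal distribution."""
    # Step 1: convert the datapoint to a z-score.
    z = (x - population_mean) / population_sd
    # Step 2: the standard normal CDF plays the role of the z-score table,
    # returning the area under the curve to the left of z.
    return NormalDist().cdf(z)


def prob_above(x, population_mean, population_sd):
    """Probability of selecting a value above x (the remaining area)."""
    return 1.0 - prob_below(x, population_mean, population_sd)


# Hypothetical example: datapoint 130, population mean 100, population SD 15.
print(round(prob_below(130, 100, 15), 4))
print(round(prob_above(130, 100, 15), 4))
```

The two probabilities always sum to 1, because the total area under the curve is 1.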
How do you calculate the standard error (SD of the sampling distribution of the mean)?
SD of parent population/SQRT of sample size
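As a minimal sketch of the standard error formula, assuming the population SD is known: the function name `standard_error` and the example values (SD 15, sample size 25) are illustrative only.

```python
import math


def standard_error(population_sd, sample_size):
    """SD of the sampling distribution of the mean."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return population_sd / math.sqrt(sample_size)


# Hypothetical example: population SD of 15 and samples of 25 people.
print(standard_error(15, 25))  # 3.0
```

Because the sample size sits under a square root, quadrupling the sample size only halves the standard error.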
What does the central limit theorem state?
Given a population with a mean and SD, the sampling distribution of the mean approaches a normal distribution with the same mean and an SD equal to the population SD/SQRT of sample size (the standard error) as the sample size increases.
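A small simulation can make the theorem concrete. This is a sketch under stated assumptions, not a proof: the population is assumed to be exponential (strongly skewed, with mean 1 and SD 1), and the function name `simulate_sample_means`, the sample size of 40, the 2000 repetitions, and the seed are all arbitrary choices for illustration.

```python
import math
import random
import statistics


def simulate_sample_means(sample_size, num_samples, seed=0):
    """Draw repeated samples from a skewed population and return their means."""
    rng = random.Random(seed)
    # expovariate(1.0) draws from an exponential population: mean 1, SD 1.
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


sample_size = 40
means = simulate_sample_means(sample_size, num_samples=2000)

# The mean of the sample means should sit close to the population mean (1),
# and their SD should sit close to population SD / sqrt(sample size).
print("mean of sample means:", statistics.fmean(means))
print("SD of sample means:", statistics.stdev(means))
print("predicted standard error:", 1.0 / math.sqrt(sample_size))
```

Plotting a histogram of `means` should show a roughly bell-shaped curve even though the underlying population is skewed.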
How do you calculate a z-score for a sample mean?
z = (sample mean - population mean) / standard error
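The same idea can be sketched for a sample mean, where the standard error replaces the population SD in the denominator. The function name `z_for_sample_mean` and the example values (sample mean 106, population mean 100, population SD 15, sample size 25) are assumptions chosen for illustration.

```python
import math


def z_for_sample_mean(sample_mean, population_mean, population_sd, sample_size):
    """Return how many standard errors the sample mean lies from the population mean."""
    # Standard error: SD of the sampling distribution of the mean.
    se = population_sd / math.sqrt(sample_size)
    return (sample_mean - population_mean) / se


# Hypothetical example: a sample of 25 people with a mean of 106.
print(z_for_sample_mean(106, 100, 15, 25))  # 2.0
```

A single datapoint of 106 would be only 0.4 SDs above the mean, but a sample mean of 106 from 25 people is 2 standard errors above it.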