Mean
The “average” number; found by adding all data points and dividing by the total number of data points
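A minimal sketch of the mean definition above, in Python. The function name `mean` and the sample values are hypothetical and used only for illustration.

```python
def mean(data):
    """Arithmetic mean: the sum of the data points divided by how many there are."""
    if not data:
        raise ValueError("mean requires at least one data point")
    return sum(data) / len(data)

print(mean([2, 4, 4, 5, 7]))  # 22 / 5 = 4.4
```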
Median
The middle number; found by ordering all data points and picking out the one in the middle
If there is an even number of data points, take the mean of the two middle numbers
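A short sketch of the median procedure, including the even-count case. The helper name `median` and the example lists are hypothetical.

```python
def median(data):
    """Sort the data, then take the middle value (or the mean of the two middle values)."""
    s = sorted(data)
    n = len(s)
    if n == 0:
        raise ValueError("median requires at least one data point")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd count: a single middle value
    return (s[mid - 1] + s[mid]) / 2    # even count: mean of the two middle values

print(median([7, 1, 5]))     # sorted [1, 5, 7] -> 5
print(median([7, 1, 5, 3]))  # sorted [1, 3, 5, 7] -> (3 + 5) / 2 = 4.0
```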
Mode
The number that occurs most frequently
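A sketch of finding the mode with `collections.Counter`. Returning a list of every tied value, rather than a single number, is an assumption made here so that datasets with more than one mode are handled; `modes` is a hypothetical name.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(data)
    if not counts:
        raise ValueError("mode requires at least one data point")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([3, 1, 3, 2, 1, 3]))  # [3]
print(modes([1, 1, 2, 2, 5]))     # [1, 2] (a tie, so two modes)
```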
What happens when an outlier is removed from the data set?
If a high outlier is removed, the mean will decrease
If a low outlier is removed, the mean will increase
The median will change little, if at all, because it depends only on the middle position(s) of the ordered data
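A small illustration using Python's `statistics` module on a hypothetical dataset where 50 is a high outlier. Removing it pulls the mean down sharply, while the median shifts only by half a unit.

```python
from statistics import mean, median

data = [4, 5, 6, 7, 50]   # hypothetical data; 50 is a high outlier
trimmed = [4, 5, 6, 7]    # the same data with the outlier removed

print(mean(data), median(data))        # 14.4 6
print(mean(trimmed), median(trimmed))  # 5.5 5.5
```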
Standard deviation
A statistical measure of how spread out (dispersed) data points are around the mean
A low standard deviation indicates that data points are close to the mean, so the dataset is consistent
A high standard deviation indicates that data points are more spread out from the mean, suggesting greater variability
How to calculate standard deviation?
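One common way to calculate it: find the mean, square each data point's distance from the mean, average those squared distances (divide by N for a population, or by N - 1 for a sample), then take the square root. A minimal sketch of those steps follows; the function name `std_dev`, its `sample` flag, and the dataset are hypothetical.

```python
import math

def std_dev(data, sample=False):
    """Standard deviation: square root of the average squared deviation from the mean.

    sample=False divides by n (population SD); sample=True divides by n - 1 (sample SD).
    """
    n = len(data)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mu = sum(data) / n                          # step 1: the mean
    squared_devs = [(x - mu) ** 2 for x in data]  # step 2: squared deviations
    divisor = n - 1 if sample else n            # step 3: population vs sample divisor
    return math.sqrt(sum(squared_devs) / divisor)  # step 4: square root

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))                           # 2.0 (population)
print(round(std_dev(data, sample=True), 3))    # 2.138 (sample)
```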
The median and IQR are _________ because extreme data points have little effect on their values
Robust estimates
In a symmetric distribution, the mean is _____ the median
Equal to
In a positively skewed (right-skewed) distribution, the mean is _____ the median
Greater than
In a negatively skewed (left-skewed) distribution, the mean is _____ the median
Less than
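A quick check of the three mean/median relationships on hypothetical datasets: a right-skewed list with a long tail toward high values, its mirror image (left-skewed), and a symmetric list.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 15]  # long tail toward high values
left_skewed = [-x for x in right_skewed]       # mirror image: long tail toward low values
symmetric = [1, 2, 3, 4, 5]

print(round(mean(right_skewed), 2), median(right_skewed))  # 4.78 3 (mean > median)
print(round(mean(left_skewed), 2), median(left_skewed))    # -4.78 -3 (mean < median)
print(mean(symmetric), median(symmetric))                   # 3 3 (mean = median)
```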
What are measures of central tendency?
Mean, median, mode
When might the mean be a better measure of central tendency than the median?
When the data contain no extreme values and it is desirable that every observation contributes to the calculation
Why is the median a better measure for central tendency than the mean?
Because it is less likely to be affected when a dataset contains extreme values
A distribution on the histogram with one prominent peak (representing the most frequent value)
Unimodal
A distribution on the histogram with two prominent peaks (each representing a locally most frequent value)
Bimodal
A distribution on the histogram with more than two prominent peaks (each representing a locally most frequent value)
Multimodal
What are shown on a boxplot?
Median, first quartile (Q1), third quartile (Q3), whiskers (extending to the most extreme data points that lie between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; the whiskers must end at actual data points), and extreme values (outliers) plotted individually beyond the whiskers
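A sketch of computing the boxplot pieces for a hypothetical dataset. It uses `statistics.quantiles` with its default "exclusive" method; for some datasets that method gives slightly different quartiles than the median-of-halves procedure, so treat it as one possible convention. The name `boxplot_summary` is hypothetical.

```python
from statistics import quantiles

def boxplot_summary(data):
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR fences."""
    s = sorted(data)
    q1, q2, q3 = quantiles(s, n=4, method="exclusive")  # needs at least 2 points
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in s if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],    # whiskers end at actual data points
        "whisker_high": inside[-1],
        "outliers": [x for x in s if x < low_fence or x > high_fence],
    }

print(boxplot_summary([1, 3, 4, 5, 6, 7, 8, 9, 30]))  # 30 is flagged as an outlier
```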
Interquartile Range (IQR)
A measure of the spread of the middle 50% of a dataset
How to find the IQR:
1. Order the data and find the median (Q2)
2. Find the first quartile (Q1): the median of the data points below Q2
3. Find the third quartile (Q3): the median of the data points above Q2
4. IQR = Q3 - Q1
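The four steps above can be sketched directly in Python. This assumes the convention that, when the count is odd, the middle value (Q2) is excluded from both halves; other textbooks and software use slightly different quartile conventions. The function names are hypothetical.

```python
def median_sorted(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def quartiles_and_iqr(data):
    """Return (Q1, Q2, Q3, IQR) using the median-of-halves method."""
    s = sorted(data)                  # step 1: order the data
    n = len(s)
    if n < 2:
        raise ValueError("need at least two data points")
    q2 = median_sorted(s)             # step 1: the median (Q2)
    half = n // 2
    lower = s[:half]                  # data points below Q2
    upper = s[half + (n % 2):]        # data points above Q2 (skips the middle value if n is odd)
    q1 = median_sorted(lower)         # step 2
    q3 = median_sorted(upper)         # step 3
    return q1, q2, q3, q3 - q1        # step 4: IQR = Q3 - Q1

print(quartiles_and_iqr([1, 3, 4, 5, 6, 7, 8, 9, 30]))  # (3.5, 6, 8.5, 5.0)
print(quartiles_and_iqr([2, 4, 6, 8, 10, 12]))          # (4, 7.0, 10, 6)
```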
If a distribution is approximately normal (symmetric and bell-shaped), we can apply the _____ rule
Empirical rule (68-95-99.7 rule)
Empirical rule
In an approximately normal (bell-shaped) distribution:
About 68% of data falls within one SD of the mean (mean ± 1 SD)
About 95% of data falls within two SDs of the mean (mean ± 2 SD)
About 99.7% of data falls within three SDs of the mean (mean ± 3 SD)
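The rule's percentages are rounded values from the normal distribution: the fraction of a normal distribution within k standard deviations of the mean equals erf(k / √2). A small check with Python's `math.erf`; the helper name `within_k_sd` is hypothetical.

```python
import math

def within_k_sd(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```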