Lesson 2: Data Distributions Flashcards

(20 cards)

1
Q

Mean

A

The “average” number; found by adding all data points and dividing by the total number of data points

2
Q

Median

A

The middle number; found by ordering all data points and picking out the one in the middle

If there is an even number of data points, pick out two middle numbers and take the mean of those two numbers

3
Q

Mode

A

The number that occurs most frequently
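As a quick illustration of the three definitions so far, here is a minimal sketch using Python's standard `statistics` module. The list `scores` is made-up data chosen only for this example.

```python
import statistics

# Made-up data for illustration (already sorted, even number of points).
scores = [2, 3, 3, 5, 7, 10]

# Mean: add all data points and divide by how many there are (30 / 6).
mean = statistics.mean(scores)

# Median: with an even count, the mean of the two middle values (3 and 5).
median = statistics.median(scores)

# Mode: the value that occurs most often (3 appears twice).
mode = statistics.mode(scores)

print(mean, median, mode)
```

Here the mean is 5, the median is 4 and the mode is 3, so the three measures need not agree even on a small dataset.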

4
Q

What happens when an outlier is removed from the data set?

A

If a high outlier is removed, the mean will decrease

If a low outlier is removed, the mean will increase

The median will stay the same or change only slightly
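A small sketch of this effect, assuming a made-up dataset in which 50 is the high outlier. It compares the mean and median before and after dropping that value.

```python
import statistics

# Made-up data; 50 is far above the rest and plays the high outlier.
data = [4, 5, 6, 7, 8, 50]
without_outlier = data[:-1]  # drop the last value (the outlier)

mean_before = statistics.mean(data)               # 80 / 6, about 13.33
mean_after = statistics.mean(without_outlier)     # 30 / 5 = 6

median_before = statistics.median(data)             # (6 + 7) / 2 = 6.5
median_after = statistics.median(without_outlier)   # middle value = 6

print(mean_before, mean_after)
print(median_before, median_after)
```

The mean drops by more than 7, while the median moves by only 0.5, which is why the median is treated as barely affected by outliers.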

5
Q

Standard deviation

A

A statistical measure of data’s spread/dispersion from its mean

A low standard deviation indicates that the data points are close to the average, so the dataset is consistent

A high value indicates that the data points are more spread out from the average, suggesting greater variability

6
Q

How to calculate standard deviation?

A
  1. Find the mean
  2. Calculate the deviations: for each data point, subtract the mean from the value
  3. Square the deviations: square each of the differences found in the previous step
  4. Find the variance: sum the squared deviations and divide by n - 1 (this gives the sample variance)
  5. Take the square root of the result from step 4
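
The steps above can be sketched in Python as follows. The dataset is made up for illustration, and the result is compared against `statistics.stdev`, which also uses the n - 1 (sample) divisor.

```python
import math
import statistics

# Made-up sample data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: find the mean (40 / 8 = 5.0).
mean = sum(data) / len(data)

# Step 2: subtract the mean from each data point.
deviations = [x - mean for x in data]

# Step 3: square each deviation.
squared = [d ** 2 for d in deviations]

# Step 4: sample variance = sum of squared deviations / (n - 1) = 32 / 7.
variance = sum(squared) / (len(data) - 1)

# Step 5: the standard deviation is the square root of the variance.
sd = math.sqrt(variance)

print(round(sd, 3))
print(math.isclose(sd, statistics.stdev(data)))
```

Dividing by n instead of n - 1 would give the population standard deviation (`statistics.pstdev`), which is slightly smaller for the same data.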
7
Q

The median and IQR are _________ because extreme data points have little effect on their values

A

Robust estimates

8
Q

In a symmetric distribution, the mean is _____ the median

A

Equal to

9
Q

In a positively skewed (right skewed) distribution, the mean is _____ the median

A

Greater than

10
Q

In a negatively skewed (left skewed) distribution, the mean is _____ the median

A

Less than
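
To make the mean-versus-median relationships concrete, here is a small sketch with three made-up datasets: one symmetric, one with a long right tail (positive skew), and its mirror image with a long left tail (negative skew). The helper `mean_and_median` is a name introduced only for this example.

```python
import statistics

# Made-up datasets for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail to the right
left_skewed = [21 - x for x in right_skewed]   # mirror image: tail to the left

def mean_and_median(values):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(values), statistics.median(values)

sym_mean, sym_median = mean_and_median(symmetric)        # 3 and 3
right_mean, right_median = mean_and_median(right_skewed) # 4.75 and 3.0
left_mean, left_median = mean_and_median(left_skewed)    # 16.25 and 18.0

print(sym_mean == sym_median)      # symmetric: mean equals median
print(right_mean > right_median)   # positive skew: mean above median
print(left_mean < left_median)     # negative skew: mean below median
```

In each skewed case the mean is pulled toward the long tail, while the median stays near the bulk of the data.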

11
Q

What are measures of central tendency?

A

Mean, median, mode

12
Q

When might the mean be a better measure of central tendency than the median?

A

When it is desirable that all observations are taken into account through inclusion in the calculation

13
Q

Why can the median be a better measure of central tendency than the mean?

A

Because it is less likely to be affected when a dataset contains extreme values

14
Q

A distribution on the histogram with one prominent peak (which represents the most frequent data point)

A

Unimodal

15
Q

A distribution on the histogram with two prominent peaks (which represent the most frequent data values)

A

Bimodal

16
Q

A distribution on the histogram with multiple prominent peaks (which represent the most frequent data values)

A

Multimodal

17
Q

What is shown on a boxplot?

A

Median, first quartile (Q1), third quartile (Q3), whiskers (covering the data that fall between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; the whiskers must end at actual data points), and extreme values (outliers) beyond the whiskers
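
The pieces of a boxplot can be computed by hand, as in the sketch below. The data are made up, and the quartiles come from `statistics.quantiles` with its default settings, which is only one of several quartile conventions; other software may give slightly different Q1 and Q3.

```python
import statistics

# Made-up data for illustration; 30 is far from the rest.
data = sorted([1, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles (default method of statistics.quantiles; an assumed convention).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Fences: limits beyond which points count as extreme values.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme actual data points inside the fences.
inside = [x for x in data if low_fence <= x <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything outside the fences is drawn as an individual extreme value.
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q2, q3, iqr)
print(whisker_low, whisker_high, outliers)
```

For this data the upper fence is 16, but the upper whisker stops at 9 because that is the largest actual data point below the fence; 30 is plotted separately.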

18
Q

Interquartile Range (IQR)

A

A measure of the spread of the middle 50% of a dataset

How to find the IQR:
1. Find the median (Q2)
2. Find the first quartile (Q1): the median of the data points below Q2
3. Find the third quartile (Q3): the median of the data points above Q2
4. IQR = Q3 - Q1
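
The four steps can be followed directly in code. This is a minimal sketch on made-up data with an even number of points, so the lower and upper halves split cleanly; with an odd count, the median itself would be left out of both halves.

```python
import statistics

# Made-up data for illustration (sorted, 8 points).
data = sorted([1, 2, 4, 5, 7, 8, 10, 12])
half = len(data) // 2

# Step 1: the median (Q2) is the mean of the two middle values, 5 and 7.
q2 = statistics.median(data)

# Step 2: Q1 is the median of the points below Q2.
lower_half = data[:half]           # [1, 2, 4, 5]
q1 = statistics.median(lower_half)

# Step 3: Q3 is the median of the points above Q2.
upper_half = data[half:]           # [7, 8, 10, 12]
q3 = statistics.median(upper_half)

# Step 4: the IQR is the distance between the quartiles.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Here Q1 = 3, Q2 = 6 and Q3 = 9, so the middle 50% of the data spans an IQR of 6.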

19
Q

If a distribution is symmetric and bell-shaped (approximately normal), we can apply the _____ rule

A

Empirical rule (68-95-99.7 rule)

20
Q

Empirical rule

A

In a symmetric, bell-shaped (approximately normal) distribution:

About 68% of the data fall within one SD of the mean (mean ± 1 SD)

About 95% of the data fall within two SDs of the mean (mean ± 2 SD)

About 99.7% of the data fall within three SDs of the mean (mean ± 3 SD)
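
One way to see the rule in action is to simulate bell-shaped data and count how much of it lands within 1, 2 and 3 SDs of the mean. The sketch below assumes normally distributed values with an arbitrary mean of 100 and SD of 15; the sample size, the seed and the helper `share_within` are choices made only for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated, approximately normal data (assumed mean 100, SD 15).
sample = [random.gauss(100, 15) for _ in range(50_000)]

mean = statistics.fmean(sample)
sd = statistics.stdev(sample)

def share_within(k):
    """Fraction of the sample lying within k SDs of the sample mean."""
    return sum(abs(x - mean) <= k * sd for x in sample) / len(sample)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.3f}")
```

The printed shares should come out close to 0.68, 0.95 and 0.997, but not exactly, because the rule is an approximation and the sample is random.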