Lesson 2: Data Distributions Flashcards

Question 1

Q

Mean

Answer

A

The “average” number; found by adding all data points and dividing by the total number of data points
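
As a minimal sketch of that definition, the snippet below adds the data points and divides by their count. The values in `data` are hypothetical, chosen only for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 5, 10]

# Add all data points, then divide by the number of data points.
mean = sum(data) / len(data)
print(mean)
```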

Question 2

Q

Median

Answer

A

The middle number; found by ordering all data points and picking out the one in the middle

If there is an even number of data points, pick out two middle numbers and take the mean of those two numbers
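
The odd and even cases can be sketched as follows. The helper name `median` and the sample lists are assumptions made for illustration; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Return the middle value, averaging the two middle values if the count is even."""
    ordered = sorted(values)          # order the data points first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    # even count: take the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median([7, 1, 5])        # ordered: 1, 5, 7
even_result = median([7, 1, 5, 3])    # ordered: 1, 3, 5, 7
print(odd_result, even_result)
```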

Question 3

Q

Mode

Answer

A

The number that occurs most frequently
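
A small sketch of finding the mode by counting occurrences. The helper name `modes` and the sample lists are hypothetical; the function returns a list because a dataset can have more than one value tied for most frequent.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

single = modes([1, 2, 2, 3])      # 2 occurs most often
tied = modes([1, 1, 2, 2, 3])     # 1 and 2 are tied
print(single, tied)
```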

Question 4

Q

What happens when an outlier is removed from the data set?

Answer

A

If a high outlier is removed, the mean will decrease

If a low outlier is removed, the mean will increase

The median will change little, if at all, because it depends on the order of the values rather than on how extreme they are
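
The effect can be checked on a small example. The scores below are hypothetical, with 90 playing the role of a high outlier; the mean drops sharply once it is removed, while the median barely moves.

```python
import statistics

# Hypothetical scores; 90 is a high outlier.
scores = [10, 12, 13, 14, 15, 90]
trimmed = scores[:-1]                        # same data with the outlier removed

mean_before = statistics.mean(scores)        # pulled upward by the outlier
mean_after = statistics.mean(trimmed)        # noticeably lower
median_before = statistics.median(scores)    # middle of the ordered values
median_after = statistics.median(trimmed)    # nearly unchanged

print(mean_before, mean_after)
print(median_before, median_after)
```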

Question 5

Q

Standard deviation

Answer

A

A statistical measure of data’s spread/dispersion from its mean

A low standard deviation indicates that data points are close to the average, so the dataset is consistent

A high value indicates that the data points are more spread out from the average, suggesting greater variability
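
To make the contrast concrete, the sketch below compares two hypothetical datasets that share the same mean but differ in spread. It uses `statistics.stdev`, which computes the sample standard deviation.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread.
consistent = [48, 49, 50, 51, 52]
variable = [10, 30, 50, 70, 90]

sd_consistent = statistics.stdev(consistent)   # small: points sit close to the mean
sd_variable = statistics.stdev(variable)       # large: points are widely spread

print(statistics.mean(consistent), sd_consistent)
print(statistics.mean(variable), sd_variable)
```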

Question 6

Q

How to calculate standard deviation?

Answer

A

1. Find the mean
2. Calculate the deviations: subtract the mean from each data point
3. Square the deviations: square each of the differences found in step 2
4. Find the variance: sum the squared deviations and divide by n - 1 (the sample variance)
5. Take the square root of the variance from step 4
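
The steps above can be sketched directly in code. The data are hypothetical; dividing by n - 1 gives the sample standard deviation, which is also what `statistics.stdev` computes, so it is used here as a cross-check.

```python
import math
import statistics

# Hypothetical sample data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                          # find the mean
deviations = [x - mean for x in data]         # subtract the mean from each value
squared = [d ** 2 for d in deviations]        # square each deviation
variance = sum(squared) / (n - 1)             # sum the squares, divide by n - 1
sd = math.sqrt(variance)                      # square root of the variance

print(sd)
print(statistics.stdev(data))                 # library cross-check
```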

Question 7

Q

The median and IQR are _________ because extreme data points have little effect on their values

Answer

A

Robust estimates

Question 8

Q

In a symmetric distribution, the mean is _____ the median

Answer

A

Equal to (or approximately equal to)

Question 9

Q

In a positively skewed (right skewed) distribution, the mean is _____ the median

Answer

A

Greater than

Question 10

Q

In a negatively skewed (left skewed) distribution, the mean is _____ the median

Answer

A

Less than
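
A quick check of both skew cases, using hypothetical data. The first list has a long tail of high values (positive skew) and the second has a long tail of low values (negative skew); the tail pulls the mean toward it while the median stays near the bulk of the data.

```python
import statistics

# Hypothetical data with a long tail of high values (positive skew).
positive_skew = [1, 2, 2, 3, 3, 3, 4, 20]
# Hypothetical data with a long tail of low values (negative skew).
negative_skew = [1, 17, 18, 18, 18, 19, 19, 20]

pos_mean = statistics.mean(positive_skew)
pos_median = statistics.median(positive_skew)
neg_mean = statistics.mean(negative_skew)
neg_median = statistics.median(negative_skew)

print(pos_mean, pos_median)   # mean is greater than the median
print(neg_mean, neg_median)   # mean is less than the median
```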

Question 11

Q

What are measures of central tendency?

Answer

A

Mean, median, mode

Question 12

Q

When might the mean be a better measure of central tendency than the median?

Answer

A

When it is desirable that all observations are taken into account through inclusion in the calculation

Question 13

Q

Why is the median sometimes a better measure of central tendency than the mean?

Answer

A

Because it is less likely to be affected when a dataset contains extreme values

Question 14

Q

A distribution whose histogram has one prominent peak (the peak marks the most frequent values)

Answer

A

Unimodal

Question 15

Q

A distribution whose histogram has two prominent peaks (each peak marks a cluster of frequent values)

Answer

A

Bimodal

Question 16

Q

A distribution whose histogram has multiple prominent peaks (each peak marks a cluster of frequent values)

Answer

A

Multimodal

Question 17

Q

What is shown on a boxplot?

Answer

A

Median, first quartile (Q1), third quartile (Q3), whiskers (spanning the data that fall between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; the whiskers must end at actual data points), and extreme values (outliers that fall beyond the whiskers)
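
The whisker rule can be sketched as below. The data are hypothetical, and `statistics.quantiles` is used for the quartiles as a convenience; it interpolates between data points, so its Q1 and Q3 can differ slightly from quartiles found by other conventions.

```python
import statistics

# Hypothetical data with one extreme value (30).
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, q2, q3 = statistics.quantiles(data, n=4)   # Q1, median, Q3
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr                     # Q1 - 1.5 x IQR
high_fence = q3 + 1.5 * iqr                    # Q3 + 1.5 x IQR

# Whiskers end at the most extreme actual data points inside the fences.
inside = [x for x in data if low_fence <= x <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the fences is drawn as an individual extreme value.
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q2, q3)
print(whisker_low, whisker_high, outliers)
```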

Question 18

Q

Interquartile Range (IQR)

Answer

A

A measure of the spread of the middle 50% of a dataset

How to find an IQR:
1. Find the median (Q2)
2. Find the first quartile (Q1): the median of the data points below Q2
3. Find the third quartile (Q3): the median of the data points above Q2
4. IQR = Q3 - Q1
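
The steps above can be sketched as follows. The helper name `iqr` and the sample lists are hypothetical. The sketch assumes the data are split by position into a lower and an upper half, leaving out the middle value when the count is odd; other quartile conventions exist and can give slightly different results.

```python
import statistics

def iqr(values):
    """IQR using the median-of-halves method (assumes at least two values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]            # data points below Q2
    upper = ordered[-half:]           # data points above Q2
    q1 = statistics.median(lower)     # first quartile
    q3 = statistics.median(upper)     # third quartile
    return q3 - q1

odd_iqr = iqr([1, 3, 5, 7, 9, 11, 13])     # Q1 = 3, Q3 = 11
even_iqr = iqr([1, 2, 3, 4, 5, 6, 7, 8])   # Q1 = 2.5, Q3 = 6.5
print(odd_iqr, even_iqr)
```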

Question 19

Q

If a distribution is symmetric and bell-shaped (approximately normal), we can apply the _____ rule

Answer

A

Empirical rule (68-95-99.7 rule)

Question 20

Q

Empirical rule

Answer

A

In a symmetrical, bell-shaped (approximately normal) distribution:

About 68% of data falls within one SD of the mean (mean ± 1 SD)

About 95% of data falls within two SDs of the mean (mean ± 2 SD)

About 99.7% of data falls within three SDs of the mean (mean ± 3 SD)
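
These percentages can be checked against the normal distribution, which is the bell-shaped case the rule describes. The sketch below assumes a standard normal distribution (mean 0, SD 1) and uses `statistics.NormalDist` from Python's standard library; the proportions are the same for any mean and SD.

```python
from statistics import NormalDist

# Assumption: a standard normal distribution (mean 0, SD 1).
standard = NormalDist(mu=0, sigma=1)

# Proportion of the distribution within k SDs of the mean, for k = 1, 2, 3.
coverage = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.1%}")
```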

(20 cards)