Lesson 5 Flashcards

(21 cards)

1
Q

Measures of ___________ ____________ are used to do what?

A

Measures of central tendency are used to describe data by talking about its average (mean), its most common value (mode), and its middle value (median). Each provides some insight into the distribution of the data.

2
Q

How can you describe data without numbers?

A

Talk about it in relation to the context of the data, or make a pie chart.

3
Q

What is statistics?

A

A set of rules and procedures for reducing large masses of data to manageable proportions in order to draw conclusions from the data. This includes understanding measures of central tendency.

4
Q

What are the two types of statistics?

A

Descriptive and inferential statistics

5
Q

What is descriptive statistics and what are some examples?

A

This is using numbers to summarize a set of data (e.g., the mean, median, mode, and range).
Ex) 45 is the lottery’s most frequently drawn number
The age range for the Olympic Games was 54 years
Bo’s batting average was 0.311

These all help us to picture the data without being given large sets of numbers.
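As a minimal sketch, the descriptive statistics named in this card can be computed with Python's standard `statistics` module. The ages below are made-up illustration data (not real Olympic figures), chosen so that the range happens to come out at 54 years; `describe` is a hypothetical helper name.

```python
import statistics

# Made-up ages of 10 competitors, for illustration only
ages = [19, 22, 22, 24, 25, 27, 28, 31, 35, 73]

def describe(values):
    """Summarize a list of numbers with a few descriptive statistics."""
    return {
        "mean": statistics.mean(values),      # sum / count -> 30.6 here
        "median": statistics.median(values),  # average of 25 and 27 -> 26 here
        "mode": statistics.mode(values),      # most frequent value -> 22 here
        "range": max(values) - min(values),   # highest - lowest -> 54 here
    }

summary = describe(ages)
print(summary)
```

Four numbers stand in for the whole list, which is the point of descriptive statistics.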

6
Q

What is a measure of central tendency? What are the three types?

A

A measure of the typical value in a collection of numbers or data.

  1. Mean (average value, may not represent an actual value in the data set especially if discrete).
  2. Mode (most common value so will represent a value in the data set).
  3. Median (middle value in the ordered data set; with an odd number of values it is an actual value from the data set. It is less affected by outliers because it stays the middle value no matter how extreme a value at either end is).
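A small sketch of the point about discrete data, using made-up sibling counts: the mean can land on a value nobody actually has, while the mode (and, with an odd number of values, the median) is a real value from the data set. With an even number of values the median is the average of the two middle values, so it can also fall between actual values.

```python
import statistics

# Made-up discrete data: number of siblings reported by 7 people
siblings = [0, 1, 1, 2, 2, 2, 3]

mean_siblings = statistics.mean(siblings)      # 11 / 7, about 1.57 siblings
median_siblings = statistics.median(siblings)  # 4th of 7 ordered values -> 2
mode_siblings = statistics.mode(siblings)      # appears three times -> 2

# Nobody can have 1.57 siblings, so the mean is not a value in the data set
print(mean_siblings in siblings)                                # False
print(median_siblings in siblings, mode_siblings in siblings)   # True True
```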
7
Q

What is mean? What are two types of means we look at and how do we calculate them?

What is the main issue with the mean?

A

Mean is the average value in the data.
Population mean: The average value for everyone in the population you are studying.
Sample mean: The average value for just those in the study, which is supposed to represent what would happen if you tested the whole population (which is often impossible). It may not always represent the broader population, though, if the sample is not large enough or participants’ traits are not evenly distributed.

Population mean: Divide the sum of all the values by the total number of people in the population (N).
Sample mean: Divide the sum of all the values by the number of people in the sample (n).

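Always include values of 0 when calculating the mean! A 0 still counts as one of the values (it adds to N or n), even though it adds nothing to the sum.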
The main issue with the mean is that it can be greatly impacted by outliers in the data set so that it doesn’t actually represent most of the data.
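The population and sample means use the same arithmetic (in standard notation, μ = ΣX / N and x̄ = ΣX / n; the symbols are not from the card). The sketch below uses made-up exercise data to show why zeros must stay in and how a single extreme score pulls the mean away from most of the data.

```python
def mean(values):
    """Sum of all values divided by how many values there are (N or n)."""
    return sum(values) / len(values)

# Made-up data: hours of exercise per week; two people genuinely did 0 hours
hours = [0, 0, 3, 4, 5, 6]

correct = mean(hours)                               # 18 / 6 = 3.0
zeros_dropped = mean([h for h in hours if h != 0])  # 18 / 4 = 4.5 (overstates)

# One extreme score pulls the mean far from most of the data
with_outlier = mean(hours + [40])                   # 58 / 7, about 8.29
print(correct, zeros_dropped, round(with_outlier, 2))
```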

8
Q

When reporting means in APA format, how many decimal places do you report?

A

Report one more decimal place than the raw data have.
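A minimal sketch of the one-extra-decimal rule from this card. `apa_mean` and `decimal_places` are hypothetical helpers, not an official APA tool; the raw scores are passed as text so that trailing zeros such as `3.0` still count as a decimal place.

```python
import statistics

def decimal_places(score_text):
    """Count the decimal places in a score written as text, e.g. '3.50' -> 2."""
    return len(score_text.split(".")[1]) if "." in score_text else 0

def apa_mean(raw_scores):
    """Hypothetical helper: report the mean to one more decimal than the raw data."""
    places = max(decimal_places(s) for s in raw_scores) + 1
    m = statistics.mean(float(s) for s in raw_scores)
    return f"M = {m:.{places}f}"

print(apa_mean(["4", "5", "7", "8"]))    # whole numbers -> M = 6.0
print(apa_mean(["2.5", "3.0", "4.5"]))   # one decimal   -> M = 3.33
```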

9
Q

What is the median, and how do you calculate it?

A

The median is a value that lies in the middle of the data when the data set is ordered.

To calculate:
1. Rank the data (order it from lowest to highest).
2. Find the position of the median: (number of entries + 1) / 2. If you get a decimal answer (e.g., 3.5), your data set has an even number of entries and therefore two middle values. Take those two middle values and average them to get the median. The median's position lies between them (position 3.5 sits between the 3rd and 4th values).
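The two steps above can be sketched directly in Python; `median_by_position` is a hypothetical name, and it follows the card's (n + 1) / 2 position rule rather than any library routine.

```python
def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule."""
    if not values:
        raise ValueError("median of empty data")
    ranked = sorted(values)                  # step 1: rank the data
    position = (len(ranked) + 1) / 2         # step 2: 1-based position of the median
    if position.is_integer():                # odd count: one middle value
        return ranked[int(position) - 1]
    lower = ranked[int(position) - 1]        # even count, e.g. position 3.5:
    upper = ranked[int(position)]            # average the 3rd and 4th values
    return (lower + upper) / 2

print(median_by_position([7, 1, 5, 3, 9]))        # position 3 -> 5
print(median_by_position([8, 2, 6, 4, 10, 12]))   # position 3.5 -> (6 + 8) / 2 = 7.0
```

Python lists are 0-indexed, which is why each 1-based position has 1 subtracted before indexing.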

10
Q

What is mode and how do you calculate it? When could you have no mode? When could you have multiple modes, or could you?

A

Mode is the most frequent value in the data set (the one that appears most often). If no values are repeated, there is no mode. If multiple values are repeated the same (highest) number of times, there are multiple modes and the data are multimodal (bimodal if there are two).
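A sketch of the card's rules for no mode and multiple modes, built on `collections.Counter`. `modes` is a hypothetical helper; it is written by hand because the standard `statistics.multimode` returns every value when nothing repeats, whereas this card treats that case as having no mode.

```python
from collections import Counter

def modes(values):
    """Return every mode; an empty list means there is no mode (nothing repeats)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                      # every value appears once: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([1, 2, 2, 3]))             # [2]      unimodal
print(modes([1, 1, 2, 2, 3]))          # [1, 2]   bimodal
print(modes([1, 2, 3]))                # []       no mode
```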

11
Q

Which of the three measures can be used for non-quantitative data?

A

Mode can be used because it is simply the most common value: even if the categories cannot be ordered, one of them will still be the most common. For example, which political party received the most votes? This is nominal data, but the mode can still be used to describe it.
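A short sketch of the voting example with made-up ballots (the party labels are placeholders). Counting categories is all the mode needs; there is no meaningful way to add or average the labels.

```python
from collections import Counter

# Made-up votes: nominal categories with no natural order or numeric value
votes = ["Party A", "Party B", "Party C", "Party B", "Party D", "Party B", "Party A"]

party, count = Counter(votes).most_common(1)[0]
print(party, count)  # Party B 3

# A mean of these labels is undefined: sum(votes) would raise a TypeError
```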

12
Q

What are the advantages of using the mean?

A

It is the most commonly used statistic, so most people know what it means. It takes every entry in the data set into account, so every data point contributes to it and changing even one entry changes its value.

13
Q

Disadvantages of using the mean?

A

It is not appropriate for all data types, such as categorical/qualitative data. It might not equal any value that is actually in the data set, especially when taking the mean of discrete data. You lose information about individual cases and outliers. It is also the measure MOST STRONGLY AFFECTED BY EXTREME SCORES. So although every data point contributes, this can be a drawback: extreme scores can skew the mean so that it no longer represents the majority of the data points.

14
Q

If you have an outlier, what measure would be better to represent the data rather than the mean?

A

The median would be best because it represents the middle value in the data. If the data are fairly similar except for an outlier, that middle value is not affected by how extreme the outlier is, so it more accurately reflects the data.
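A minimal comparison using made-up reaction times: one very slow response roughly doubles the mean but barely moves the median.

```python
import statistics

# Made-up reaction times in milliseconds
typical = [310, 320, 330, 340]
with_outlier = typical + [2000]        # one extremely slow response

print(statistics.mean(typical), statistics.median(typical))
# mean 1300 / 4 = 325, median (320 + 330) / 2 = 325
print(statistics.mean(with_outlier), statistics.median(with_outlier))
# mean 3300 / 5 = 660, median is the 3rd ordered value = 330
```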

15
Q

What helps to counteract outliers in the data?

A

A larger sample size helps to counteract outliers because the ups and downs balance out, so any single extreme score has less impact on the mean. However, when the sample size is small, you might want to use the median to reflect your data more accurately.
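A deliberately simplified sketch, assuming every typical score is identical (real data vary): one outlier shifts the mean by (outlier - typical) / (n + 1), so the same outlier matters far less in a bigger sample. `outlier_shift` is a hypothetical helper.

```python
import statistics

def outlier_shift(n_typical, typical=50, outlier=200):
    """How far one outlier moves the mean of n identical typical scores (made-up setup)."""
    scores = [typical] * n_typical
    return statistics.mean(scores + [outlier]) - statistics.mean(scores)

print(outlier_shift(4))    # small sample: mean moves from 50 to 80, a shift of 30
print(outlier_shift(49))   # larger sample: mean moves from 50 to 53, a shift of 3
```

The outlier is still present in the larger sample; its influence is only diluted.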

16
Q

What are advantages of using the median as a measure?

A

It is not influenced much by extreme scores, because an extreme score is still just one value at the end of the ordered data. The actual size of that value does not affect which value is in the middle of the sequence unless it changes its position in the sequence. It is a reasonable estimate of the centre of the distribution. Once the extreme scores’ places in the order are decided, their magnitude has no influence on the median at all. However, the median is influenced by the number of outliers.
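A small sketch with made-up scores showing both halves of this answer: making the largest value more extreme leaves the median unchanged, while adding more outliers changes which value sits in the middle.

```python
import statistics

base = [10, 12, 13, 14, 15]

# Making the largest value more extreme does not move the median...
print(statistics.median(base))                      # 13
print(statistics.median([10, 12, 13, 14, 100]))     # 13
print(statistics.median([10, 12, 13, 14, 10_000]))  # 13

# ...but adding more outliers shifts which value sits in the middle
print(statistics.median(base + [100, 100]))         # 7 values, 4th is 14
```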

17
Q

What are disadvantages of using the median as a measure of central tendency?

A

It might not be good to ignore extreme values in some cases, especially if there are a lot of extreme values. It also may not be appropriate for all data types, especially data that cannot be ordered (for nominal data, what would the middle value be?).

18
Q

What are advantages of using the mode as a measure of central tendency?

A

The mode is the most frequently obtained score, which is worth noting because it represents many of the people in the data set. It is not influenced by extreme scores at all. It can also be used for all data types, although it may not be the most useful in some situations, such as when there is no mode.

19
Q

What are some disadvantages of using mode and in what cases would it not be used?

A

Using the mode can be a drawback because it may not represent a large proportion of the scores, especially for continuous values (where few values repeat exactly), and it ignores extreme values. Therefore it is often not used for continuous quantitative data.
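A sketch with made-up continuous measurements: because values rarely repeat exactly, the "mode" here covers only 2 of 8 scores and happens to be the smallest value, so it says little about the centre of the data.

```python
from collections import Counter

# Made-up continuous data: reaction times in seconds, recorded to 3 decimals
times = [0.412, 0.398, 0.455, 0.431, 0.402, 0.447, 0.419, 0.398]

value, freq = Counter(times).most_common(1)[0]
share = freq / len(times)
print(value, freq, share)      # 0.398 2 0.25
print(value == min(times))     # True: the "mode" sits at the very bottom
```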

20
Q

What is something you should be aware of when looking at the outcomes of statistics?

A

Be wary of studies that rely on the everyday understanding of "average" as meaning typical or common. The mean can be influenced by outliers and may not even correspond to a value in the data set, so it cannot always be interpreted this way.

21
Q

What can the mode be used for that the median and mean cannot?

A

The mode can be used for nominal data because it is looking at the most common choice and those choices do not need to be ordered.