Unit 5 Key Ideas Flashcards

(49 cards)

1
Q

Univariate analysis

A

The analysis of a single attribute or variable at a time

2
Q

Standard deviation (SD)

A

A measure of how spread out observations are around the mean (and therefore from one another).
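
As a rough illustration of this card, the sketch below computes a standard deviation with Python's standard `statistics` module. The height values are made up for the example, and whether the course uses the sample (n - 1) or population (n) version is an assumption left open here, so both are shown.

```python
import statistics

# Hypothetical heights in inches (made-up data for illustration only).
heights = [60, 62, 65, 65, 68, 70, 72]

mean = statistics.mean(heights)
# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(heights)
# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(heights)

print(f"mean = {mean:.2f}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```

A larger standard deviation means the observations sit further from the mean on average.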

3
Q

Define five number summary

A

A summary made up of five numbers that statisticians and data scientists use to understand the range of values that observations take for an attribute

4
Q

What are the five numbers in the five number summary?

A

Minimum, first quartile, median, third quartile, maximum
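
The five numbers can be computed with a short sketch like the one below. It uses `statistics.quantiles` from the Python standard library; the function name `five_number_summary`, the sample scores, and the choice of the "inclusive" quartile method are assumptions made for illustration (different tools use slightly different quartile conventions).

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    # method="inclusive" is one of several quartile conventions (an assumption here).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical scores, made up for the example.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(scores)
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```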

5
Q

Define the minimum

A

The smallest value that any observation has for the attribute

6
Q

Define the first quartile

A

It is the 25th percentile value.
25% of observations have a value below the first quartile (i.e. this is the point that is a quarter through all your data)

7
Q

Define the median (i.e. the second quartile)

A

It is the second quartile (aka the 50th percentile value). It is the point where half of the data is below and half is above.

8
Q

Define the third quartile

A

This point is three quarters of the way into your data. It is the 75th percentile value. 75% of all observations have a value below the third quartile.

9
Q

Define the maximum

A

This is the largest value that any observation has for the attribute

10
Q

Define frequencies

A

The total number of observations whose response is equal to a particular value
(e.g. if nine of the respondents say they are 7 ft, then the frequency for seven is nine)

11
Q

Relative frequencies (i.e. percentages)

A

The percentage of all observations without missing values whose value is equal to a particular response
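
Frequencies and relative frequencies can be tallied as in the sketch below. The survey responses are hypothetical, and using `None` to mark a missing value is an assumption of this example.

```python
from collections import Counter

# Hypothetical survey responses; None marks a missing value (an assumption).
responses = ["yes", "no", "yes", None, "maybe", "yes", "no", None]

# Drop missing values first, since relative frequencies are based only on
# observations without missing values.
observed = [r for r in responses if r is not None]

frequencies = Counter(observed)  # count of each distinct response
total = len(observed)
relative = {value: 100 * count / total for value, count in frequencies.items()}

for value, count in frequencies.most_common():
    print(f"{value}: frequency = {count}, relative frequency = {relative[value]:.1f}%")
```

The relative frequencies add up to 100% because the missing values were excluded from the total.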

12
Q

What is a dot plot?

A

A graph where each observation is displayed as one dot. The height of a stack of dots tells you the frequency for that particular value.

13
Q

What is a density plot (i.e. density curve)?

A

It is like a dot plot but with a smooth line drawn over it in place of the individual stacks of dots. It shows where observations are more concentrated and where they are less concentrated.

14
Q

What is a word cloud?

A

It is meant to graphically depict text data. It shows the words recorded across the responses from all observations. The size of each word represents its frequency (i.e. how many times someone said it)

15
Q

What is a bar graph?

A

It is based on a frequency table and has one bar for every response option (typically in order). The bar height is equivalent to each response option’s frequency.

16
Q

Define aggregate characteristics

A

They are the characteristics of a group of observational units

17
Q

What aggregate characteristics are recorded for text, rating scale, and categorical data?

A

Frequencies

18
Q

How do you think of the spread of a distribution?

A

Think of the “middle 95%” and the standard deviation

19
Q

How do you think about the location of a distribution?

A

Think about the peak/mode and the mean

20
Q

How do you think about the shape of a distribution?

A

Think about the number of peaks, and the symmetry (asymmetrical? left or right skewed?), and compare it to one of the common distributions (i.e. bell curve, multimodal, uniform distribution, etc.)

21
Q

What aggregate characteristics are recorded for quantitative data?

A

The shape, location, and spread of the data

22
Q

Define distribution

A

The pattern that the responses from all the observational units make

23
Q

What are the different common shapes that we often see in quantitative distributions?

A

U-Shaped, Uniform distribution, Multimodal, Unimodal, Bell-curve (Normal distribution), Skewed distributions, symmetric

24
Q

Define U-Shaped distribution

A

Most of the observational units either have very low values or very high values with hardly any values in-between

25
Define uniform distribution
It is essentially a flat shape. There are approximately the same number of observational units for each of the possible values.
26
Define multimodal
It has multiple peaks. It is common when there are actually group differences in the attribute.
27
Define unimodal
It has only one peak
28
Define a bell-curve (i.e. normal distribution)
Most of the observational units have a value near the average. Approximately 95% of the values are within two standard deviations of the mean (aka average)
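
The "about 95% within two standard deviations" rule can be checked on simulated data, as in the sketch below. The mean of 100, the SD of 15, the sample size, and the seed are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers
# Simulated bell-curve data; mean 100 and SD 15 are arbitrary illustrative values.
values = [random.gauss(100, 15) for _ in range(10_000)]

mean = statistics.mean(values)
sd = statistics.stdev(values)

# Count the observations that fall within two standard deviations of the mean.
within_two_sd = sum(1 for v in values if abs(v - mean) <= 2 * sd)
share = within_two_sd / len(values)

print(f"share within two SDs of the mean: {share:.3f}")
```

For data that is not bell-shaped, the share within two standard deviations can differ noticeably from 95%.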
29
What is another word for the mean?
Average
30
What is another word for the median?
The second quartile
31
What is a skewed distribution?
A distribution that looks like one of its sides has been stretched out
32
Define right skew
Distributions that look like the right side of a normal distribution has been stretched out (indicating that some units have very large values "stretching it" out)
33
Define left skew
Distributions that look like the left side of a normal distribution has been stretched out (indicating that some units have very small values)
34
Why does a left skew mean that some units have very small values while a right skew means some units have a very large value?
Think of a left skew (also called negatively skewed) distribution as having a tail that drags to the left. This means that while most of your data points are bunched up on the higher end, there are a few smaller values pulling that tail out to the left. Conversely, a right skew (positively skewed) distribution has a tail that extends to the right, meaning most of your data points are on the lower end, but there are a few larger values dragging that tail out to the right.
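
One way to see the tail "dragging" is to compare the mean with the median, since a few extreme values pull the mean toward the tail more than they pull the median. The sketch below uses made-up data; the mean-versus-median comparison is a common rule of thumb, not something the cards themselves state.

```python
import statistics

# Hypothetical right-skewed data: most values are low, a few are very large.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20, 40]
# Mirror the values to get left-skewed data: most are high, a few are very small.
left_skewed = [41 - v for v in right_skewed]

right_mean, right_median = statistics.mean(right_skewed), statistics.median(right_skewed)
left_mean, left_median = statistics.mean(left_skewed), statistics.median(left_skewed)

# In the right-skewed data the large values pull the mean above the median.
print(f"right skew: mean = {right_mean:.1f}, median = {right_median:.1f}")
# In the left-skewed data the small values pull the mean below the median.
print(f"left skew:  mean = {left_mean:.1f}, median = {left_median:.1f}")
```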
35
What is a key to thinking statistically?
Focusing on how each observational unit varies
36
What is a two-way table?
It is a two-dimensional version of a frequency table. One attribute's response options are presented as the rows of the table, a second attribute's response options are presented as the columns, and each cell holds the frequency for that combination.
37
Define conditional relative frequencies
They are relative frequencies based only on the total from a single row or single column
38
Statisticians use what when each column represents a separate group?
Column percentages
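
The last three cards (two-way table, conditional relative frequencies, column percentages) can be tied together with a small sketch. The group labels, responses, and counts below are hypothetical, and placing groups in columns follows the card above.

```python
from collections import Counter

# Hypothetical observations as (group, response) pairs; all values are made up.
observations = [
    ("A", "yes"), ("A", "yes"), ("A", "no"), ("A", "yes"),
    ("B", "no"), ("B", "no"), ("B", "yes"), ("B", "no"),
]

counts = Counter(observations)                    # two-way table cell frequencies
groups = sorted({g for g, _ in observations})     # columns of the table
answers = sorted({r for _, r in observations})    # rows of the table
column_totals = {g: sum(counts[(g, r)] for r in answers) for g in groups}

# Column percentages: each cell divided by the total of its own column.
column_pct = {
    (g, r): 100 * counts[(g, r)] / column_totals[g]
    for g in groups
    for r in answers
}

print("response  " + "  ".join(f"{g:>6}" for g in groups))
for r in answers:
    print(f"{r:<8}  " + "  ".join(f"{column_pct[(g, r)]:5.1f}%" for g in groups))
```

Because each column is divided by its own total, the percentages in every column add up to 100%, which makes groups of different sizes comparable.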
39
Define Cramer's V
It is the statistic that quantifies the association between two attributes. Having an association typically means there is a perceived relationship between the two attributes. It is on a scale of 0-1
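
The cards do not give a formula, so the sketch below uses the standard chi-square-based definition of Cramer's V without any bias correction; course software may compute it slightly differently. The function name `cramers_v` and the example tables are made up, and the code assumes every row and column total is greater than zero.

```python
import math

def cramers_v(table):
    """Cramer's V for a two-way table given as a list of rows of counts.

    Assumes at least two rows and two columns, with no all-zero row or column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Chi-square statistic: compare each observed count with the count
    # expected if the two attributes were unrelated.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))  # smaller of (number of rows, number of columns)
    return math.sqrt(chi2 / (n * (k - 1)))

no_association = cramers_v([[10, 10], [10, 10]])
perfect_association = cramers_v([[20, 0], [0, 20]])
print(no_association, perfect_association)  # 0.0 1.0
```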
40
Define a side-by-side bar graph
It is a bar graph in which a separate set of bars is drawn for each of the response options of the second attribute being considered in the two-way table
41
Define side-by-side density plots
A plot that has one density curve of a quantitative attribute for each group, all drawn on the same plot
42
The ratio of standard deviations is equal to what?
The largest standard deviation among the groups divided by the smallest standard deviation (largest standard deviation / smallest standard deviation)
43
What is the effect size equal to?
The difference in the means (averages) divided by the larger standard deviation (difference in means / larger standard deviation)
44
What effect sizes indicate a large difference in the averages between the groups?
0.75 or more
45
What effect sizes indicate a small difference in the averages between groups?
0.25 or less
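
The ratio of standard deviations, the effect size, and the 0.25 / 0.75 cutoffs from the cards above can be put together in one sketch. The function name `compare_groups`, the sample data, the use of the absolute difference in means, the use of the sample standard deviation, and the "in between" label for values between the two cutoffs are all assumptions of this example.

```python
import statistics

def compare_groups(group_a, group_b):
    """Return (ratio of SDs, effect size, label) for two groups of numbers."""
    sd_a = statistics.stdev(group_a)
    sd_b = statistics.stdev(group_b)
    larger_sd, smaller_sd = max(sd_a, sd_b), min(sd_a, sd_b)

    # Ratio of standard deviations: largest SD divided by smallest SD.
    sd_ratio = larger_sd / smaller_sd

    # Effect size as defined on the card: difference in means over the larger SD.
    # Taking the absolute value is an assumption so the result is never negative.
    difference = abs(statistics.mean(group_a) - statistics.mean(group_b))
    effect_size = difference / larger_sd

    if effect_size >= 0.75:
        label = "large"
    elif effect_size <= 0.25:
        label = "small"
    else:
        label = "in between"  # hypothetical label; the cards do not name this range
    return sd_ratio, effect_size, label

# Hypothetical groups, made up for the example.
sd_ratio, effect_size, label = compare_groups([4, 5, 6, 7, 8], [6, 8, 10, 12, 14])
print(f"SD ratio = {sd_ratio:.2f}, effect size = {effect_size:.2f} ({label})")
```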
46
What is a scatterplot?
A plot in which each observational unit is placed as a point on a graph according to their value for each quantitative attribute
47
What is a smoothed trend line?
It is a line through the average value of the vertical axis attribute across all values of the horizontal axis attribute. It shows the general trend or direction of the data without getting caught up in the ups and downs. It's the calm, steady flow in a river of data points.
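
A very crude stand-in for a smoothed trend line is to average the vertical values within bins of the horizontal attribute, as sketched below. Real plotting tools typically use more refined smoothers, so this only illustrates the "average of y across values of x" idea; the function name `binned_trend`, the bin width, and the points are hypothetical.

```python
from collections import defaultdict
import statistics

def binned_trend(points, bin_width):
    """Average y within each x bin; return sorted (bin_center, mean_y) pairs."""
    bins = defaultdict(list)
    for x, y in points:
        # Points whose x values fall in the same bin are averaged together.
        bins[int(x // bin_width)].append(y)
    return [
        ((index + 0.5) * bin_width, statistics.mean(ys))
        for index, ys in sorted(bins.items())
    ]

# Hypothetical scatterplot points (x, y), made up for the example.
points = [(1, 2), (2, 4), (3, 3), (4, 7), (5, 6), (6, 9), (7, 8), (8, 12)]
trend = binned_trend(points, bin_width=2)

for center, mean_y in trend:
    print(f"x near {center}: average y = {mean_y}")
```

The individual points zigzag, but the binned averages rise steadily, which is the general direction a trend line is meant to show.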
48
Define Kendall's Tau
It is a correlation measure that can be used even when the trend line is not straight
49
What do the values of Kendall's Tau mean?
If values are close to 0, it means there is no association between the attributes. When values are close to -1 (the trend is going down), it means there is a strong negative association between attributes. When values are close to +1, it means that there is a strong positive association between the attributes (the trend is going up).
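
The sketch below computes the simplest version of Kendall's Tau (often called tau-a) by comparing every pair of observations. Statistical software frequently reports a tie-adjusted version instead, so results may differ when there are ties; the function name `kendall_tau_a` and the data are made up for illustration.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs."""
    concordant = discordant = 0
    pairs = list(combinations(zip(xs, ys), 2))
    for (x1, y1), (x2, y2) in pairs:
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both attributes move in the same direction
        elif product < 0:
            discordant += 1   # the attributes move in opposite directions
        # product == 0 is a tie and counts as neither
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4, 5]
tau_up = kendall_tau_a(x, [1, 4, 9, 16, 25])    # curved, but always going up
tau_down = kendall_tau_a(x, [25, 16, 9, 4, 1])  # always going down
print(tau_up, tau_down)  # 1.0 -1.0
```

The first example gives +1 even though the trend is curved, which is why this measure works when the trend line is not straight.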