Univariate analysis
The analysis of a single attribute or variable at a time
Standard deviation (SD)
A measure that tells us how spread out the observations are, based on how far they typically fall from the mean.
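As a quick illustration, the sketch below computes a standard deviation with Python's built-in statistics module. The heights are made-up values, and because the card does not say whether the sample or the population formula is meant, both are shown.

```python
import statistics

# Hypothetical heights in inches (made-up data for illustration only).
heights = [60, 62, 64, 66, 68]

mean_height = statistics.mean(heights)

# Sample SD divides the summed squared deviations from the mean by n - 1.
sample_sd = statistics.stdev(heights)

# Population SD divides by n instead.
population_sd = statistics.pstdev(heights)

print(f"mean = {mean_height}")
print(f"sample SD = {sample_sd:.3f}")
print(f"population SD = {population_sd:.3f}")
```

A larger SD would mean the heights sit farther from the mean on average; identical heights would give an SD of zero.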
Define five number summary
A set of five numbers that statisticians and data scientists use to understand the range of values that observations take for an attribute
What are the five numbers in the five number summary?
Minimum, first quartile, median, third quartile, maximum
Define the minimum
The smallest value that any observation has for the attribute
Define the first quartile
It is the 25th percentile value.
25% of observations have a value below the first quartile (i.e. this is the point that is a quarter of the way through all your data)
Define the median (i.e. the second quartile)
It is the 50th percentile value: the point where half of the values fall below it and half fall above it.
Define the third quartile
This point is three quarters of the way through your data. It is the 75th percentile value. 75% of all observations have a value below the third quartile.
Define the maximum
This is the largest value that any observation has for the attribute
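The five numbers defined above can be computed in a few lines. This is a minimal sketch using Python's statistics module; the function name five_number_summary and the ages are hypothetical, and the "inclusive" quartile method is an assumption, since textbooks and software use slightly different quartile conventions.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 splits the data into quarters and returns the three cut points.
    # method="inclusive" is one convention; others can give slightly
    # different values for Q1 and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Hypothetical ages of nine children (made-up data).
ages = [7, 1, 9, 3, 5, 2, 8, 4, 6]
summary = five_number_summary(ages)
print(summary)
```

The data does not need to be sorted first, because the quantile calculation sorts a copy internally.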
Define frequencies
The total number of observations whose response is equal to a particular value
(e.g. if nine of the respondents say they are 7 ft tall, then the frequency for 7 ft is nine)
Relative frequencies (i.e. percentages)
The percentage of all observations without missing values whose value is equal to a particular response
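Frequencies and relative frequencies can be sketched with collections.Counter. The survey responses are made up, and using None to mark a missing value is an assumption made for this example.

```python
from collections import Counter

# Hypothetical survey responses; None marks a missing value.
responses = ["yes", "no", "yes", None, "yes", "no"]

# Drop missing values so percentages are out of answered responses only.
answered = [r for r in responses if r is not None]

# Frequency: how many observations gave each response.
frequencies = Counter(answered)

# Relative frequency: each count as a percentage of non-missing responses.
relative_frequencies = {
    value: 100 * count / len(answered)
    for value, count in frequencies.items()
}

print(frequencies)
print(relative_frequencies)
```

Note that the percentages are out of five answered responses, not six, because the missing value is excluded from the denominator.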
What is a dot plot?
A graph where each observation is displayed as one dot, with dots for the same value stacked on top of one another. The height of each stack tells you the frequency for that value.
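To make the idea concrete, here is a rough text-only dot plot. The function name text_dot_plot and the data are hypothetical, and the column alignment assumes single-character values.

```python
from collections import Counter

def text_dot_plot(values):
    """Return a dot plot as text, with one stack of dots per distinct value."""
    counts = Counter(values)
    ordered = sorted(counts)
    tallest = max(counts.values())
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(tallest, 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in ordered)
        rows.append(row.rstrip())
    # The bottom row labels each stack with its value.
    rows.append(" ".join(str(v) for v in ordered))
    return "\n".join(rows)

# Hypothetical number of siblings reported by seven people (made-up data).
siblings = [1, 2, 2, 3, 3, 3, 4]
plot = text_dot_plot(siblings)
print(plot)
```

Each star is one observation, so the stack above 3 is three dots tall because three people reported that value.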
What is a density plot (i.e. density curve)?
It is like a dot plot smoothed into a continuous curve. It doesn’t show individual stacks of dots. Instead, the height of the curve shows where observations are more concentrated and where they are sparser.
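One common way to draw such a curve is kernel density estimation, which is an assumption here since the card does not name a method. The sketch below places a small bell curve on each observation and averages them; the function name gaussian_kde, the commute times, and the bandwidth of 2.0 are all hypothetical choices.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at any point x."""
    n = len(values)

    def density(x):
        total = 0.0
        for v in values:
            # Standard normal curve centred on each observation.
            z = (x - v) / bandwidth
            total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        # Average the curves and rescale so the total area is 1.
        return total / (n * bandwidth)

    return density

# Hypothetical commute times in minutes (made-up data).
commute_minutes = [18, 20, 21, 22, 22, 23, 25, 40]
density = gaussian_kde(commute_minutes, bandwidth=2.0)

for x in (22, 32, 40):
    print(x, round(density(x), 4))
```

The curve is highest near 22, where most observations cluster, and close to zero around 32, where there are none. A larger bandwidth would give a smoother, flatter curve.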
What is a word cloud?
It graphically depicts text data by showing the words recorded across the responses from all observations. The size of each word represents its frequency (i.e. how many times it was said).
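The counting step behind a word cloud can be sketched as follows. The responses are made up, and the font-size rule (10 points plus 6 per occurrence) is purely a hypothetical choice for illustration; real word cloud tools use their own scaling.

```python
from collections import Counter
import re

# Hypothetical free-text survey responses (made-up data).
responses = [
    "Great service and friendly staff",
    "Friendly staff, slow service",
    "Great food",
]

# Lowercase each response and split it into words.
words = []
for response in responses:
    words.extend(re.findall(r"[a-z']+", response.lower()))

# Frequency of each word across all responses.
word_counts = Counter(words)

# Hypothetical sizing rule: more frequent words get a larger font.
font_sizes = {word: 10 + 6 * count for word, count in word_counts.items()}

for word, count in word_counts.most_common():
    print(word, count, font_sizes[word])
```

Words such as "great" and "staff" appear twice, so they would be drawn larger than "food", which appears once.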
What is a bar graph?
It is based on a frequency table and has one bar for every response option (typically in order). The height of each bar equals that response option’s frequency.
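A rough text version of a bar graph, drawn sideways for simplicity, shows how the frequency table drives the bars. The rating options and responses are made up for this sketch.

```python
from collections import Counter

# Hypothetical rating-scale options, listed in their natural order.
options = ["Poor", "Fair", "Good", "Excellent"]

# Hypothetical responses (made-up data).
ratings = ["Good", "Fair", "Good", "Excellent", "Good", "Fair"]

# Frequency table: Counter reports 0 for options nobody chose.
counts = Counter(ratings)

# One bar per response option, with length equal to its frequency.
lines = [
    f"{option:<9} {'#' * counts[option]} {counts[option]}"
    for option in options
]
print("\n".join(lines))
```

Iterating over the full list of options, rather than only the responses that occurred, keeps a bar (of length zero) for "Poor" and preserves the scale's order.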
Define aggregate characteristics
They are the characteristics of a group of observational units
What aggregate characteristics are recorded for text, rating scale, and categorical data?
Frequencies
How do you think of the spread of a distribution?
Think of the “middle 95%” and the standard deviation
How do you think about the location of a distribution?
Think about the peak/mode and the mean
How do you think about the shape of a distribution?
Think about the number of peaks and the symmetry (is it symmetric, or left- or right-skewed?), and compare it to one of the common distributions (e.g. bell curve, multimodal, uniform distribution)
What aggregate characteristics are recorded for quantitative data?
The shape, location, and spread of the data
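Location, spread, and shape can each be summarized numerically. The sketch below uses made-up data; comparing the mean with the median is only a rough rule of thumb for skew, not a formal test, and the "inclusive" percentile method is an assumption.

```python
import statistics

# Hypothetical number of books read last year (made-up data).
books = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

# Location: the peak (mode) and the mean.
mode = statistics.mode(books)
mean = statistics.mean(books)

# Spread: the standard deviation and the interval holding the middle 95%.
sd = statistics.stdev(books)
cuts = statistics.quantiles(books, n=40, method="inclusive")
low, high = cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles

# Shape: a mean pulled above the median hints at a right skew.
median = statistics.median(books)
if mean > median:
    shape_hint = "right-skewed"
elif mean < median:
    shape_hint = "left-skewed"
else:
    shape_hint = "roughly symmetric"

print(f"mode = {mode}, mean = {mean}")
print(f"SD = {sd:.2f}, middle 95% = {low} to {high}")
print(f"shape hint: {shape_hint}")
```

Here the single large value of 12 pulls the mean above the median, which is why the hint comes out as right-skewed.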
Define distribution
The pattern that the responses from all the observational units make
What are the different common shapes that we often see in quantitative distributions?
U-shaped, uniform, multimodal, unimodal, bell curve (normal distribution), skewed, symmetric
Define U-shaped distribution
Most of the observational units have either very low values or very high values, with hardly any values in between
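A rough way to spot this pattern is to count how many observations fall in the low, middle, and high parts of the scale. The ratings and the cut-offs (3 and 8) are hypothetical, and this is an informal check rather than a formal test.

```python
# Hypothetical 1-to-10 ratings (made-up data) where opinions are polarized.
ratings = [1, 1, 1, 2, 2, 5, 9, 9, 10, 10, 10]

# Count observations in the low, middle, and high parts of the scale.
low = sum(1 for r in ratings if r <= 3)
middle = sum(1 for r in ratings if 4 <= r <= 7)
high = sum(1 for r in ratings if r >= 8)

# U-shaped: both ends are more crowded than the middle.
looks_u_shaped = low > middle and high > middle

print(f"low = {low}, middle = {middle}, high = {high}")
print(f"looks U-shaped: {looks_u_shaped}")
```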