Define Descriptive / Summary Statistics (5)
What to pay attention to in summary statistics? (4)
What are the measures of central tendency? (3)
Mean, median, mode
How do extreme values affect the mean, median, and mode?
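A minimal sketch of the effect, using a made-up sample (the numbers and the helper name `summarize` are assumptions for illustration only). In this example, adding one extreme value pulls the mean strongly toward it, shifts the median only slightly, and leaves the mode unchanged.

```python
from statistics import mean, median, mode

# Hypothetical sample with no extreme observations.
values = [2, 3, 3, 4, 8]
# The same sample with one extreme value appended.
with_outlier = values + [100]

def summarize(data):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(data), median(data), mode(data)

before = summarize(values)        # mean 4, median 3, mode 3
after = summarize(with_outlier)   # mean 20, median 3.5, mode 3

print("without outlier:", before)
print("with outlier:   ", after)
```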
Which measure of central tendency do you use for the Nominal Variable?
Mode:
- The numbers in nominal variables only refer to the category
- Calculating the mean would be pointless
Which measure of central tendency do you use for the Ordinal Variable?
Median:
- Splitting at the median creates a dichotomy (two groups); further splits create additional categories
Dichotomy
A division of 2 things that are being represented as different or opposed.
Using the quartiles (Q1, the median, and Q3) of an ordinal variable would split the data into 4 categories.
Example:
x < Q1 = Small, Q1 < x < Median (Q2) = Small-Medium, Median (Q2) < x < Q3 = Medium-Large, x > Q3 = Large
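The quartile split in the example can be sketched as below. The scores and the function name `quartile_label` are hypothetical. The card does not say where values exactly equal to a quartile go, so this sketch assumes they fall into the upper category. `statistics.quantiles` uses its default "exclusive" method here; other quartile conventions can give slightly different cut points.

```python
from statistics import quantiles

def quartile_label(x, q1, q2, q3):
    """Assign one of four size categories based on the quartile cut points.

    Assumption: a value equal to a cut point is placed in the upper category.
    """
    if x < q1:
        return "Small"
    if x < q2:
        return "Small-Medium"
    if x < q3:
        return "Medium-Large"
    return "Large"

# Hypothetical ordinal scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(scores, n=4)

labels = [quartile_label(x, q1, q2, q3) for x in scores]
for x, label in zip(scores, labels):
    print(x, label)
```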
Which measure of central tendency do you use for the Interval (scale) or Ratio Variables?
Mean or Median:
The skewness of the distribution indicates which measure of central tendency to use.
Not skewed –> Mean
Skewed –> Median
Define Skewness
What is the name for when the skewness values go outside the -1 to +1 range?
Substantially skewed
What kind of skew is a distribution with a longer right tail?
Positively skewed
What is a negatively skewed distribution?
A distribution which has a longer tail to the left
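As an illustration of the skewness cards above, here is a minimal sketch that computes a moment-based skewness value and describes it using the -1 to +1 rule. The function names and the data are hypothetical, and some statistics packages apply a sample-size correction, so their values can differ slightly from this uncorrected version.

```python
from statistics import fmean

def skewness(data):
    """Uncorrected moment-based skewness: m3 / m2**1.5."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

def describe_skew(g):
    """Label the direction, and flag values outside the -1 to +1 range."""
    if g > 0:
        direction = "positively skewed"
    elif g < 0:
        direction = "negatively skewed"
    else:
        return "symmetric"
    return "substantially " + direction if abs(g) > 1 else direction

# Hypothetical samples.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]     # longer tail to the right
left_tailed = [-x for x in right_tailed]     # mirror image: longer tail to the left
symmetric = [1, 2, 3, 4, 5]

for sample in (right_tailed, left_tailed, symmetric):
    g = skewness(sample)
    print(round(g, 3), describe_skew(g))
```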
Kurtosis
Leptokurtic (kurtosis)
Platykurtic (kurtosis)
Mesokurtic (kurtosis)
This is a normal distribution.
–> Kurtosis = 3 (excess kurtosis = 0)
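A minimal sketch of how a kurtosis value can be computed and compared with the normal-distribution reference of 3. By the usual convention, values above 3 are called leptokurtic and values below 3 platykurtic. The function name and both samples are hypothetical, and this is the uncorrected moment-based formula; packages that report "excess" kurtosis subtract 3 from it.

```python
from statistics import fmean

def kurtosis(data):
    """Uncorrected moment-based kurtosis: m4 / m2**2 (normal distribution = 3)."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

# Hypothetical flat sample: evenly spread values, light tails.
flat = list(range(1, 10))
# Hypothetical peaked sample: most values at the centre, two far in the tails.
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]

print(kurtosis(flat))    # below 3, so platykurtic
print(kurtosis(peaked))  # above 3, so leptokurtic
```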
What is the interquartile range?
This is a measure of variability: the spread of the middle 50% of the data, IQR = Q3 - Q1.
Q1: Lower Quartile (P25, 25%)
Q2: Median (P50, 50%)
Q3: Upper Quartile (P75, 75%)
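The quartiles and the interquartile range can be sketched as follows, using a hypothetical data set. `statistics.quantiles` is called with its default "exclusive" method; other quartile conventions can give slightly different values for the same data.

```python
from statistics import quantiles

# Hypothetical sorted data set.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

# n=4 asks for the three cut points that divide the data into quarters.
q1, q2, q3 = quantiles(data, n=4)

# The interquartile range is the distance between the upper and lower quartiles.
iqr = q3 - q1

print("Q1 =", q1, "Q2 (median) =", q2, "Q3 =", q3)
print("IQR =", iqr)
```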
What are the measures of variability?
Variance: The average of the squared differences between each data point and the mean.
Standard Deviation: The square root of variance
High standard deviation –> High dispersion of data points
Low standard deviation –> Low dispersion of data points
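A short sketch of the two definitions above, with hypothetical data and a hypothetical helper name. It follows the card's wording literally (dividing by the number of data points, i.e. the population variance); the sample variance divides by n - 1 instead.

```python
from math import sqrt
from statistics import pvariance

def population_variance(data):
    """Average of the squared differences between each data point and the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = population_variance(data)
sd = sqrt(variance)  # standard deviation = square root of the variance

print("variance =", variance)
print("standard deviation =", sd)

# Cross-check against the standard library's population variance.
print("statistics.pvariance =", pvariance(data))
```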
Correlation significance
Whether the correlation in the population is significantly different from zero.
Directionality problem
A and B can be correlated because A causes B or because B causes A; the correlation alone does not show which.
Third variable problem
A and B can be correlated not because A causes B or B causes A, but because some unmeasured third variable, C, causes both A and B.
What is the correlation coefficient?
This is a value between -1 and +1 that measures the strength and direction of the association between two variables, but NOT causation.
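A minimal sketch of one common correlation coefficient, Pearson's r, computed by hand. The card does not name a specific coefficient, so choosing Pearson's r is an assumption, as are the function name and the data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]     # rises perfectly with hours
score_down = [10, 8, 6, 4, 2]   # falls perfectly with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)

print(r_up)    # at the upper end of the range
print(r_down)  # at the lower end of the range
```

Even a coefficient at the edge of the range only describes association; it says nothing about which variable, if either, causes the other.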
$y_i = \beta_0 + \beta_1 x_i + E_i$
Define these values
y = Dependent variable
x = Independent variable
$\beta_0$ = Intercept / Constant
$\beta_1$ = Slope or regression coefficient for the variable ‘x’.
E = Error term –> Everything that the model does not take into account
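To make the intercept, slope, and error term concrete, here is a minimal sketch that estimates them by ordinary least squares. The card does not specify an estimation method, so least squares is an assumption, as are the function name and the data. The residuals are the sample counterpart of the error term: the part of each observation the fitted line does not account for.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data lying close to the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.1, 10.9]

intercept, slope = fit_line(xs, ys)

# Residual = observed y minus the value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print("intercept =", round(intercept, 3))
print("slope =", round(slope, 3))
print("residuals =", [round(e, 3) for e in residuals])
```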
What does it mean to measure the coefficient of a specific variable in a multiple regression equation? (3)
The coefficient of each independent variable:
- Indicates change in dependent variable
- When the given independent variable changes
- But keeping all other independent variables constant (important assumption)
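The three points above can be sketched with an already-fitted model. The coefficients, variable names, and function `predict` below are all hypothetical, chosen only to show the "holding the others constant" reading of a coefficient.

```python
# Hypothetical fitted multiple regression with two independent variables:
#   predicted y = B0 + B1 * x1 + B2 * x2
B0 = 10.0   # intercept
B1 = 2.0    # coefficient of x1
B2 = -3.0   # coefficient of x2

def predict(x1, x2):
    """Predicted value of the dependent variable for given x1 and x2."""
    return B0 + B1 * x1 + B2 * x2

# Increase x1 by one unit while keeping x2 fixed at 5.
change_from_x1 = predict(3, 5) - predict(2, 5)

# Increase x2 by one unit while keeping x1 fixed at 2.
change_from_x2 = predict(2, 6) - predict(2, 5)

print(change_from_x1)  # equals B1
print(change_from_x2)  # equals B2
```

In each case the change in the predicted dependent variable equals the coefficient of the variable that moved, which is exactly what the coefficient is meant to express.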