What is a contingency table?
A table of data frequencies or proportions within the different levels of a categorical variable.
What are one way and two way contingency tables?
What are marginal distributions?
How do you find marginal distributions for rows vs. columns?
Rows: sum the frequencies across all columns for each row.
Columns: sum the frequencies across all rows for each column.
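As a rough illustration of the row and column sums described above, here is a minimal Python sketch. The table, its variable names (habitat, species), and its counts are made up for the example and do not come from these notes.

```python
# Hypothetical two-way contingency table (invented counts):
# rows are habitats, columns are species.
table = {
    "forest": {"sparrow": 12, "finch": 8},
    "meadow": {"sparrow": 5, "finch": 15},
}

# Row marginals: sum the frequencies across all columns for each row.
row_marginals = {row: sum(cols.values()) for row, cols in table.items()}

# Column marginals: sum the frequencies across all rows for each column.
column_names = ["sparrow", "finch"]
column_marginals = {
    col: sum(table[row][col] for row in table) for col in column_names
}

print(row_marginals)
print(column_marginals)
```

The row marginals describe the habitat variable on its own, and the column marginals describe the species variable on its own.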
What are conditional distributions?
Two-way tables that show the proportion of sampling units for one variable within each level of the second variable. They show the interaction between the categorical variables and are presented as a separate table.
How do you create a conditional distribution?
Select one of the categorical variables to be the primary variable and the other to be the secondary (conditional) variable.
How are conditional distributions calculated?
Each frequency in the contingency table is divided by the marginal distribution (total) of the corresponding level of the primary variable.
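The calculation can be sketched in Python as follows. This is only an illustration under assumptions: the table and its counts are invented, and habitat is arbitrarily chosen as the primary variable with species as the secondary (conditional) variable.

```python
# Hypothetical two-way contingency table (invented counts).
# Primary variable: habitat (rows). Secondary variable: species (columns).
table = {
    "forest": {"sparrow": 12, "finch": 8},
    "meadow": {"sparrow": 5, "finch": 15},
}

conditional = {}
for habitat, counts in table.items():
    # Marginal total for this level of the primary variable.
    marginal = sum(counts.values())
    # Each cell frequency divided by that marginal total.
    conditional[habitat] = {
        species: n / marginal for species, n in counts.items()
    }

for habitat, proportions in conditional.items():
    print(habitat, proportions)
```

Within each level of the primary variable the proportions sum to 1, which is a quick way to check the result.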
What do conditional distributions show us about the variables?
They let us see how the secondary variable changes across the levels of the primary variable.
What is a bar graph?
A graph used to visualize categorical data.
It can have a vertical or horizontal orientation.
What are two variable bar graphs?
What type of variable is good for a grouping variable?
Ordinal categorical variables.
What are grouped bar charts?
What is a stacked bar chart?
What are histograms?
What are the three steps for creating a histogram?
What are the advantages of histograms?
What is a disadvantage of histograms?
Histograms become complicated to display when the dataset has many levels of a categorical variable.
What is a bin?
A small range of the numerical variable. The numerical variable is divided into a number of bins of equal size, which form the base of the figure.
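To make the idea of equal-size bins concrete, here is a small Python sketch. The data values and the choice of three bins are assumptions made for the example. Placing the maximum value in the last bin is one common convention, not the only one.

```python
# Hypothetical numerical data (invented values).
values = [1, 2, 3, 5, 5, 7, 8, 10]
n_bins = 3  # assumed number of bins for this example

low, high = min(values), max(values)
width = (high - low) / n_bins  # every bin covers the same range

counts = [0] * n_bins
for v in values:
    # Index of the bin that v falls into; the maximum value is
    # placed in the last bin rather than in a bin of its own.
    idx = min(int((v - low) / width), n_bins - 1)
    counts[idx] += 1

for i, count in enumerate(counts):
    start = low + i * width
    print(f"bin {i}: [{start}, {start + width}) count = {count}")
```

The bin counts are the bar heights of the histogram, and the bin ranges form its base.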
What are box plots?
What do box plots show?
What happens with grouped box plots for categorical groups?
When should we use histograms vs. box plots?
If you have numerical data for a small number of categorical groups and want to showcase the shape of the data distribution, use histograms.
If you have many categorical groups, or are not interested in showcasing the shape of the data distribution, use box plots.
What is a scatter plot?
What is the independent vs dependent variable?
Independent: the experimental treatment that is manipulated.
Dependent: the response measured under those treatments.