ANOVA, analysis of variance
compares the means of two or more independent groups by examining the variance between the group means and comparing it to the variance within each group.
why not run multiple t-tests instead of ANOVA?
each extra test adds another chance of a false positive, so the overall (familywise) Type I error rate rises above the nominal alpha
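To make the inflation concrete, here is a minimal sketch. It assumes the pairwise tests are independent, which is only an approximation for t-tests that share groups, and the function names are made up for illustration.

```python
# Familywise Type I error rate for m tests, each run at level alpha.
# Assumption: the tests are independent (only approximately true for
# pairwise t-tests that reuse the same groups).

def n_pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k groups (k choose 2)."""
    return k * (k - 1) // 2


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (3, 4, 5):
    m = n_pairwise_comparisons(k)
    print(f"k={k}: {m} t-tests, familywise error ~ "
          f"{familywise_error_rate(0.05, m):.3f}")
```

With alpha = 0.05, three groups already need 3 pairwise tests, and the chance of at least one false positive is about 0.14 rather than 0.05.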
what is SSt?
a measure of the total amount of variation in the entire dataset
what is SSg or SSb (between-groups sum of squares)?
summary measure of how much variation in the data is attributable to differences in group means
what is SSe or SSw (error, or within-groups, sum of squares)?
summary measure of how much variation in the data is attributable to random variation among individuals within groups
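The three sums of squares can be computed directly from their definitions. This is a small sketch on made-up data (the group labels and values are hypothetical), showing that SSt = SSg + SSe.

```python
# Hypothetical data: three groups of three observations each.
groups = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}


def mean(xs):
    return sum(xs) / len(xs)


all_obs = [x for g in groups.values() for x in g]
grand_mean = mean(all_obs)

# SSt: squared deviations of every observation from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

# SSg: squared deviations of each group mean from the grand mean,
# weighted by the group size.
ss_group = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# SSe: squared deviations of each observation from its own group mean.
ss_error = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

print(ss_total, ss_group, ss_error)  # 60.0 54.0 6.0
```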
what is MS(mean square)?
variance, sum of squares divided by its degrees of freedom
MSg
group mean square: variance among the means of the different groups
df = k - 1, where k is the number of groups
MSe
error mean square: variance among individual observations within the groups
df = N - k, where N is the total number of individuals and k is the number of groups
F ratio
test statistic for ANOVA, F = MSg / MSe. Under Ho, MSg and MSe estimate the same variance, so F will be close to 1. If Ho is false, MSg >> MSe and F >> 1: there is more variation between the groups than within the groups
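The mean squares and the F ratio follow from the sums of squares and their degrees of freedom. A minimal sketch, using a hypothetical helper named one_way_anova and made-up data:

```python
def mean(xs):
    return sum(xs) / len(xs)


def one_way_anova(groups):
    """Return df, mean squares and F for a list of groups (lists of numbers)."""
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = mean(all_obs)

    ss_group = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_group = k - 1          # degrees of freedom for MSg
    df_error = n_total - k    # degrees of freedom for MSe
    ms_group = ss_group / df_group
    ms_error = ss_error / df_error

    return {
        "df_group": df_group,
        "df_error": df_error,
        "ms_group": ms_group,
        "ms_error": ms_error,
        "F": ms_group / ms_error,
    }


# Hypothetical data with well-separated group means.
result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)
```

Here MSg = 27 and MSe = 1, so F = 27, far above 1. Turning F into a p-value needs the F distribution with (k - 1, N - k) degrees of freedom, which this sketch leaves out.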
how much of the variation is explained by the explanatory variable (group differences)?
proportion of total variation explained by group differences.
R^2, variation explained
summarizes the contribution of group differences to the total variation: R^2 = SSg / SSt
R^2 close to zero?
most of the variation is within groups
R^2 close to one ?
most of the variation is explained by the explanatory variable
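A short sketch of R^2 as the ratio SSg / SSt, on two made-up datasets: one where the groups are well separated and one where they overlap heavily. The function name r_squared and the data are illustrative only.

```python
def mean(xs):
    return sum(xs) / len(xs)


def r_squared(groups):
    """Proportion of total variation explained by group differences."""
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    ss_group = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_group / ss_total


# Hypothetical data: group means far apart, little spread within groups.
separated = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Hypothetical data: group means close together, wide spread within groups.
overlapping = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]

print(round(r_squared(separated), 3))    # 0.9
print(round(r_squared(overlapping), 3))  # 0.059
```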
fixed effects?
categories of the explanatory variable are predetermined
Random effects?
groups are randomly sampled from a larger pool of groups; the particular groups are not of interest in themselves
what are ANOVA assumptions?
1- normality of residuals
2- homoscedasticity (equal variances among groups)
3- independence of error terms
how to check for normality of residuals?
normal probability plot(Q-Q)
Shapiro-Wilk test (Ho: residuals are normally distributed; p greater than 0.05 means no evidence against normality). ANOVA is robust to departures from normality as long as the sample size is large
what should I do with outliers?
repeat the analysis with and without the suspicious observation and see if it changes the outcome; check for data errors and find out why the outlier occurred
Homoscedasticity?
homogeneity of variance: all groups have the same variance. ANOVA is robust to violations of this assumption too, as long as the number of observations per group is about equal.
how to check for homoscedasticity?
visually check the residuals vs. fitted values plot; Bartlett's test
Bartlett's test?
Ho: the k population variances are all the same (p greater than 0.05 means no evidence against homoscedasticity)
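Both formal checks (Shapiro-Wilk on the residuals, Bartlett's test across groups) can be sketched with SciPy, assuming it is installed. The data are made up; scipy.stats.shapiro and scipy.stats.bartlett each return a test statistic and a p-value.

```python
from scipy import stats

# Hypothetical measurements for three groups (illustration only).
groups = [
    [4.1, 5.0, 5.9, 4.6, 5.4],
    [6.2, 7.1, 6.8, 7.5, 6.4],
    [8.9, 9.6, 10.2, 9.1, 9.7],
]

# In one-way ANOVA, a residual is an observation minus its own group mean.
residuals = []
for g in groups:
    g_mean = sum(g) / len(g)
    residuals.extend(x - g_mean for x in g)

# Shapiro-Wilk: Ho is that the residuals are normally distributed.
w_stat, p_normal = stats.shapiro(residuals)

# Bartlett: Ho is that all k population variances are equal.
b_stat, p_equal_var = stats.bartlett(*groups)

print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {p_normal:.3f}")
print(f"Bartlett:     T = {b_stat:.3f}, p = {p_equal_var:.3f}")
```

A p-value above 0.05 in either test means Ho is not rejected. It does not prove that the assumption holds, so the visual checks are still worth doing.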
Tukey HSD test?
tests which treatments/groups differ from which others
what type of distribution is used for the Tukey test?
the q (studentized range) distribution instead of the t distribution, but the formula is similar to the two-sample t-test
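To show how close the formula is to a two-sample t statistic, here is a sketch that computes the q statistic for every pair of groups. It uses the Tukey-Kramer form of the standard error, which reduces to sqrt(MSe / n) when all groups have the same size n. The data and variable names are hypothetical, and the comparison against a critical value of the studentized range distribution is left out.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data: three groups of three observations each.
groups = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}


def mean(xs):
    return sum(xs) / len(xs)


n_total = sum(len(g) for g in groups.values())
k = len(groups)

# MSe from the ANOVA table: pooled within-group variance, df = N - k.
ss_error = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
ms_error = ss_error / (n_total - k)

# q = |difference in group means| / standard error, for every pair.
q_stats = {}
for a, b in combinations(groups, 2):
    n_a, n_b = len(groups[a]), len(groups[b])
    se = sqrt((ms_error / 2) * (1 / n_a + 1 / n_b))
    q_stats[(a, b)] = abs(mean(groups[a]) - mean(groups[b])) / se

for pair, q in q_stats.items():
    print(pair, round(q, 3))
```

Each q is then compared with the critical value of the q distribution for k groups and N - k error degrees of freedom. Pairs whose q exceeds that value are declared different.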