Experiment example
A researcher was interested in whether animals could be trained to dance. He took 200 cats and tried to train them to dance by giving them either food or affection as a reward for dance-like behaviour. At the end of the week he counted how many animals could dance and how many could not. There are two categorical variables here: training (the animal was trained using either food or affection, not both) and dance (the animal either learned to dance or it did not). By combining categories, we end up with four different categories. All we then need to do is to count how many cats fall into each category.
Look at picture 1 to see the contingency table of this data
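The counting step can be sketched in Python. The per-cat counts below are made up for illustration (the real table is in picture 1); the point is just cross-tabulating two categorical variables:

```python
from collections import Counter

# Hypothetical per-cat records: (training, dance) pairs - illustrative counts only.
cats = ([("food", "yes")] * 28 + [("food", "no")] * 10 +
        [("affection", "yes")] * 48 + [("affection", "no")] * 114)

# Cross-tabulate: one count per combination of the two categorical variables.
table = Counter(cats)
print(table[("food", "yes")], table[("affection", "no")])
```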
What is the χ2 test?
A chi-squared test is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true
- often used as shorthand for Pearson’s chi-squared test
What does χ2 test measure?
The association between two categorical variables
What is the central idea of Pearson’s χ2 test?
It is based on comparing the frequencies we observe in certain categories to the frequencies we would expect to get in those categories by chance
What is the formula we use to calculate χ2? Explain each part of the formula
Picture 2
- We divide by the model (expected) scores - analogous to dividing by the degrees of freedom to get the mean squares (it standardizes the deviation of each observation)
- i - rows in the contingency table; j - columns
- Observed data - the frequencies
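The formula from picture 2 can be sketched directly: for every cell (i, j), square the deviation of the observed frequency from the model frequency and divide by the model frequency, then sum. Both sets of numbers here are illustrative:

```python
# Observed and model (expected) frequencies per cell; row i = training type,
# column j = dance outcome. Values are illustrative, not the study's real data.
observed = [[28, 10], [48, 114]]
model = [[14.44, 23.56], [61.56, 100.44]]

# chi2 = sum over all cells of (observed - model)^2 / model
chi2 = sum((observed[i][j] - model[i][j]) ** 2 / model[i][j]
           for i in range(2) for j in range(2))
print(chi2)
```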
What is the model in the formula of χ2?
We calculate the expected frequencies for each cell in the table using the column and row totals for that cell
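That calculation is just (row total × column total) / grand total for each cell. A minimal sketch, again with illustrative observed counts:

```python
# Observed 2x2 table (illustrative counts): rows = training, cols = dance.
observed = [[28, 10], [48, 114]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(observed[i][j] for i in range(2)) for j in range(2)]

# Expected (model) frequency for each cell = row total * column total / n
expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
            for i in range(2)]
print(expected)
```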
What do we use to display data and calculate χ2?
Contingency tables
Picture 5
What is the χ2 distribution?
It describes the test statistic χ2 under the assumption of the null hypothesis and is used to obtain the p-value corresponding to the value of the χ2-statistic
How is its shape determined and how do we obtain a p-value from it?
Its shape depends only on the degrees of freedom, df = (r - 1)(c - 1) for a table with r rows and c columns; the p-value is the area under the distribution to the right of the observed χ2 value
What happens to the χ2 statistic’s approximation as the sample size increases? How is that different with small samples?
The chi-square statistic has a sampling distribution that approximates a chi-square distribution, and this approximation improves as the sample size increases
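For a 2x2 table (df = 1) the upper-tail p-value of the χ2 distribution has a closed form via the complementary error function, so a sketch needs only the standard library (for general df you would use something like scipy.stats.chi2.sf, which is an assumption about your environment):

```python
import math

def chi2_pvalue_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-squared distribution with df = 1.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid only for df = 1.
    """
    return math.erfc(math.sqrt(chi2 / 2))

# Illustrative statistic from a 2x2 table:
p = chi2_pvalue_df1(25.36)
print(p)  # far below .05, so we would reject independence
```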
What happens if the expected frequencies in a χ2 test are too low (small sample)?
The sampling distribution of the test statistic deviates too much from the chi-square distribution, making the test inaccurate
What is Fisher’s exact test?
It computes an exact p-value for small samples (when at least one cell’s expected frequency is less than 5), rather than relying on the χ2 approximation
- normally used with 2x2 contingency tables (i.e., two categorical variables, each with two categories)
Can Fisher’s test be used for larger samples or tables?
Yes, but it’s unnecessary and can be computationally intensive
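The idea behind the exact test can be sketched from scratch for a 2x2 table: enumerate every table with the same margins and sum the hypergeometric probabilities of those at least as unlikely as the observed one. This is a bare-bones illustration; in practice scipy.stats.fisher_exact would be the usual choice:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row/column totals whose probability is <= that of the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # probability of the table whose top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    return sum(p for x in range(lo, hi + 1)
               if (p := prob(x)) <= p_obs * (1 + 1e-9))

# Example table with small expected counts:
print(fisher_exact_2x2(1, 9, 11, 3))
```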
What is an alternative to Pearson’s χ2?
Likelihood ratio statistic which is based on maximum likelihood theory
How do we compute the likelihood ratio statistic?
Johnny had this in his slides but skipped it. The book talked very little about it
Formula is in picture 8, i and j are the rows and columns of the contingency table and ln is the natural logarithm
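That formula, Lχ2 = 2 Σ observed · ln(observed / model), can be sketched with the same illustrative observed and expected frequencies as before:

```python
import math

observed = [[28, 10], [48, 114]]           # illustrative counts
model = [[14.44, 23.56], [61.56, 100.44]]  # expected frequencies

# L-chi2 = 2 * sum over cells of observed * ln(observed / model)
lr = 2 * sum(observed[i][j] * math.log(observed[i][j] / model[i][j])
             for i in range(2) for j in range(2))
print(lr)
```

Note that this only works when every observed count is above zero (ln(0) is undefined).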
What is the Yates’s correction?
For 2 x 2 contingency tables, Yates’s continuity correction is used to prevent overestimation of statistical significance with small data (when at least one cell of the table has an expected count smaller than 5)
What is the problem with Yates’s correction?
It tends to overcorrect: it lowers the value of the χ2 statistic, making the test more conservative (less significant) than it should be
The book said rather ignore it, Johnny was a bit wary about it as well but didn’t make such a definite statement
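Yates’s adjustment subtracts 0.5 from each absolute deviation before squaring; a sketch with the same illustrative counts as before shows how it shrinks the statistic:

```python
observed = [[28, 10], [48, 114]]           # illustrative counts
model = [[14.44, 23.56], [61.56, 100.44]]  # expected frequencies

cells = [(observed[i][j], model[i][j]) for i in range(2) for j in range(2)]

plain = sum((o - m) ** 2 / m for o, m in cells)
# Yates: subtract 0.5 from each |observed - model| before squaring.
yates = sum((abs(o - m) - 0.5) ** 2 / m for o, m in cells)

print(plain, yates)  # the corrected statistic is always smaller
```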
What are three measures of the strength of the association between two categorical variables?
Phi (φ), the contingency coefficient, and Cramér’s V. These measures modify the chi-square statistic to account for sample size and degrees of freedom, aiming to restrict the range of the test statistic from 0 to 1, similar to a correlation coefficient
Johnny didn’t mention this in the lecture
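The usual chi-square-based measures are phi, the contingency coefficient, and Cramér’s V; a sketch of their standard formulas (the input values are illustrative):

```python
import math

def association_measures(chi2, n, rows, cols):
    """Chi-square-based measures of association, each scaled toward [0, 1]."""
    phi = math.sqrt(chi2 / n)                    # mainly for 2x2 tables
    contingency = math.sqrt(chi2 / (chi2 + n))   # never quite reaches 1
    cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    return phi, contingency, cramers_v

# Illustrative values: chi2 = 25.36 from a 2x2 table of 200 observations.
print(association_measures(25.36, 200, 2, 2))
```

For a 2x2 table, phi and Cramér’s V coincide, since min(r - 1, c - 1) = 1.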
How can we represent the χ2 in a linear model? Apply it to the experiment example
It’s the same as with factorial design since we have two predictors and an interaction between them (training x dance)
picture 10.1
picture 10.2 - the outcome variable is categorical so the assumption of linearity is broken → the outcome variable gets transformed to log values (which also affects the error term)
picture 10.3 - the predicted values of the outcome (error = 0)
We replace the training and dance variables with 0 or 1 depending on the category whose frequency we are trying to calculate
How do we calculate each variable in the observed linear model?
How do we calculate the expected frequencies model?
The χ2 test looks at whether 2 variables are independent (interaction = 0)
Remove the interaction term and we can get two scenarios:
1. The model is still a good fit to the data; the interaction effect isn’t contributing to the fit → the variables are independent
2. The model is a poor fit to the data; the interaction term is contributing a lot to the model, which implies that the variables are dependent
Picture 10.9 - formula for the predicted number of cats in each category (we took the b-variable for interaction effect out)
- Now, we use the expected values that we already computed from picture 4 and calculate the b-values: picture 10.10, 10.11 (main effect of the type of training), 10.12 (the main effect of whether the cat danced), 10.13 (just to double check whether it fits)
- Putting all this together we get the predicted values from the model - picture 10.14
- We can rearrange the formula of the model (picture 10.15) to get the residuals (picture 10.16) and eventually the χ2
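The no-interaction model can be sketched numerically. Assuming dummy coding with (affection, no dance) as the baseline cell (an arbitrary choice for illustration, and illustrative expected counts): b0 is the log expected count of the baseline cell, b1 and b2 are log differences for training and dance, and because expected frequencies are built by multiplying margins, the fourth cell is reproduced exactly from b0 + b1 + b2 - the "double check" step:

```python
import math

# Expected frequencies (illustrative), computed as row_total * col_total / n.
E = {("affection", "no"): 100.44, ("affection", "yes"): 61.56,
     ("food", "no"): 23.56, ("food", "yes"): 14.44}

# ln(model) = b0 + b1*training + b2*dance, baseline cell = (affection, no)
b0 = math.log(E[("affection", "no")])
b1 = math.log(E[("food", "no")]) - b0        # main effect of training
b2 = math.log(E[("affection", "yes")]) - b0  # main effect of dance

# Double check: without an interaction term the model must reproduce the
# remaining cell exactly, because expected counts are multiplicative.
predicted = math.exp(b0 + b1 + b2)
print(predicted, E[("food", "yes")])
```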
What are the assumptions when analysing categorical data?
1. Independence of observations - each entity contributes to only one cell of the contingency table
2. Expected frequencies should not be too low - in 2x2 tables no expected count below 5; in larger tables no more than 20% of cells below 5 and none below 1
How do we calculate effect size of the χ2 test?
Using odds ratio based on the observed values
- not useful for larger tables than 2x2
Formula on how to calculate it in picture 11
Example in picture 12
What does the odds ratio represent?
It takes into account both levels of both of our variables. So we are not just saying how many more odd numbers than even numbers are perceived as female; we are making it relative to the male responses as well.
= How many times more likely is it to be female/male, relative to the other variable (odd/even)
- It’s generally easier to talk about odds ratios that exceed 1, so we can flip the variables if we need to (1/x = z; x = 1/z)
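Applied to the cat experiment, the odds ratio is the odds of dancing after food divided by the odds of dancing after affection (the counts below are illustrative):

```python
# Illustrative 2x2 counts: (danced, did not dance) per training type.
food_yes, food_no = 28, 10
affection_yes, affection_no = 48, 114

odds_food = food_yes / food_no                  # odds of dancing after food
odds_affection = affection_yes / affection_no   # odds of dancing after affection

odds_ratio = odds_food / odds_affection
print(odds_ratio)      # how many times more likely dancing is after food

# Flip it (1/x) if a ratio above 1 is easier to talk about the other way.
print(1 / odds_ratio)
```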