How do you perform a hypothesis test for a population proportion (general method)?
How do you perform a hypothesis test for a mean?
Where x̄ is the sample mean, and μ is the population mean
σ² is the variance of the population, n is the size of the sample
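The formula these symbols belong to is not shown on this card. Assuming it is the standard z-statistic for a mean with known population variance (an assumption, not taken from the notes), and writing σ² for the population variance, it would read:

```latex
z = \frac{\bar{x} - \mu}{\sqrt{\sigma^{2} / n}}
```

Under H_0 this value is compared with the standard normal distribution.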
How do you perform a hypothesis test for correlation coefficients (either Spearman's or Pearson's)?
Be careful about whether the test is one-tailed or two-tailed
The hypotheses and the conclusion should both be stated in the context of the question
How do you perform a single-sample sign test?
If the test is one-tailed, only look at the values in the direction of H_1
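A minimal sketch of a sign test, assuming the usual approach: count the values above and below the hypothesised median and use B(n, 0.5) under H_0. The function name `sign_test`, its `alternative` argument and the data are illustrative, not from these notes. Dropping values equal to the hypothesised median is a common convention assumed here.

```python
from math import comb

def sign_test(data, median0, alternative="two-sided"):
    """Sign test of H_0: median = median0. Returns (plus, minus, p_value)."""
    plus = sum(1 for x in data if x > median0)    # values above the median
    minus = sum(1 for x in data if x < median0)   # values below the median
    n = plus + minus                              # ties with median0 are dropped

    def upper_tail(k):
        # P(X >= k) for X ~ B(n, 0.5)
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    if alternative == "greater":      # H_1: median > median0, look at the + signs
        p_value = upper_tail(plus)
    elif alternative == "less":       # H_1: median < median0, look at the - signs
        p_value = upper_tail(minus)
    else:                             # two-tailed: double the more extreme tail
        p_value = min(1.0, 2 * upper_tail(max(plus, minus)))
    return plus, minus, p_value

# Made-up data, testing H_0: median = 10
data = [12, 15, 9, 14, 16, 13, 17, 18, 11, 19]
plus, minus, p_value = sign_test(data, median0=10)
print(plus, minus, round(p_value, 4))
```

Reject H_0 if the p-value is below the significance level.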
How do you perform a single-sample Wilcoxon signed-rank test?
Step 1: Calculating the test statistic
1. Find the differences from the median
2. Rank the modulus of the differences (the smallest |d| gets rank 1)
3. W+ is the sum of the ranks of the positive differences and W- the sum of the ranks of the negative differences
4. T is the smaller value of W+ and W-
Step 2: Hypothesis test
1. State the null and alternative hypotheses
H_0 : median is A
H_1 : median is not A
2. State the test statistic (found in part 1)
3. State the value of n, the significance level, and therefore the critical value (from the correct table in the data booklet)
4. Compare T to the critical value; reject H_0 if T is less than or equal to the CV
5. Conclude (in context!)
Be careful finding the CV: is the test two-tailed or one-tailed?
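Step 1 above can be sketched in a few lines. The helper names and the data are made up for illustration. Two conventions are assumed that the notes do not state: zero differences are dropped, and tied |d| values share the average of their ranks.

```python
def average_ranks(values):
    """Ranks with 1 = smallest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def signed_rank_statistic(data, median0):
    """Return (W_plus, W_minus, T) for H_0: median = median0."""
    diffs = [x - median0 for x in data if x != median0]   # differences from the median
    ranks = average_ranks([abs(d) for d in diffs])        # rank the moduli
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)

# Made-up data, testing H_0: median = 25
data = [24, 31, 18, 27, 35, 22, 29, 40]
w_plus, w_minus, T = signed_rank_statistic(data, median0=25)
print(w_plus, w_minus, T)   # W+ and W- should add up to n(n+1)/2
```

A useful check is that W+ + W- = n(n+1)/2, here 36 for n = 8. T is then compared with the critical value from the data booklet.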
How would you perform a paired-sample test?
Step 1: Calculating the test statistic
1. Format a table with each data point in each pair, the difference, |d|, and rank
2. Fill in the table (duh)
3. Find W+ (sum of the ranks of the positive differences) and W- (sum of the ranks of the negative differences)
4. T is the smaller of W+ and W-
Step 2: The hypothesis test
1. State the null and alternative hypotheses
H_0 : the distribution is the same
H_1 : the distribution is not the same
2. State the value of n, the significance level, and therefore the critical value (from the correct table in the data booklet).
3. Compare T to the CV
4. In this test we reject the null hypothesis if T is less than or equal to the CV
5. Write a conclusion (don’t be lazy)
Which difference you take must be stated (e.g. stating d = a - b)
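The table described in Step 1 can be built as below. The `before`/`after` data and the function names are invented for illustration, and d = a - b is the stated choice of difference. Dropping zero differences and averaging tied ranks are assumed conventions.

```python
def average_ranks(values):
    """Ranks with 1 = smallest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def paired_table(a, b):
    """Rows of (a_i, b_i, d, |d|, rank) with d = a - b; zero differences dropped."""
    rows = [(x, y, x - y) for x, y in zip(a, b) if x != y]
    ranks = average_ranks([abs(d) for _, _, d in rows])
    return [(x, y, d, abs(d), r) for (x, y, d), r in zip(rows, ranks)]

# Made-up paired data
before = [72, 65, 80, 58, 77, 69]
after = [68, 66, 73, 60, 71, 64]

table = paired_table(before, after)
for row in table:
    print(row)

w_plus = sum(rank for _, _, d, _, rank in table if d > 0)
w_minus = sum(rank for _, _, d, _, rank in table if d < 0)
T = min(w_plus, w_minus)
print(w_plus, w_minus, T)
```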
How would you perform a Wilcoxon Rank Sum test?
Step 1: Calculate the test statistic, T
1. Rank all the values in both samples in order of increasing size (the lowest being 1)
2. R_m is the sum of the ranks in the sample of size m and R_n is the sum of the ranks in the sample of size n (m is the size of the smaller sample)
3. T is the smaller of R_m and [ m(m + n + 1) - R_m ]
Step 2: The hypothesis test
1. State the null and alternative hypotheses
H_0 : The medians of the two groups are the same, i.e. median_1 = median_2
H_1 : The medians of the two groups are not the same
2. State the value of T you found in part 1
3. State the values of m, n, and the significance level. Find and state the relevant critical value (from the correct table in the data booklet)
4. Compare T to the CV
5. Conclude; T ≤ CV leads to a rejection of the null hypothesis
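A short sketch of Step 1 for the rank sum test. The function names and the two samples are hypothetical. The code assumes that m refers to the smaller sample and that tied values share the average of their ranks.

```python
def average_ranks(values):
    """Ranks with 1 = smallest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def rank_sum_statistic(sample_a, sample_b):
    """Return (m, n, R_m, T), where m is the size of the smaller sample."""
    small, large = sorted([sample_a, sample_b], key=len)
    m, n = len(small), len(large)
    ranks = average_ranks(list(small) + list(large))   # rank both samples together
    r_m = sum(ranks[:m])                               # ranks belonging to the smaller sample
    return m, n, r_m, min(r_m, m * (m + n + 1) - r_m)

# Made-up samples
group_1 = [12, 15, 19, 22]
group_2 = [14, 18, 21, 25, 27, 30]
m, n, r_m, T = rank_sum_statistic(group_1, group_2)
print(m, n, r_m, T)
```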
When would you use a Wilcoxon rank sum / unpaired sample test?
When approximating using the normal distribution, what do you need to do?
Normal approximations can be used for the Wilcoxon signed-rank and rank-sum tests
Note: the approximations are different in each case
Add the continuity correction:
If T is less than the mean of the normal distribution, add 0.5
If T is more than the mean of the normal distribution, subtract 0.5
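A sketch of both approximations with the continuity correction applied. The means and variances used are the standard textbook results, not taken from these notes: signed-rank T ~ N(n(n+1)/4, n(n+1)(2n+1)/24) and rank-sum T ~ N(m(m+n+1)/2, mn(m+n+1)/12). The function names and the example values of T are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def corrected_z(T, mean, variance):
    """Apply the continuity correction to T, then standardise."""
    if T < mean:
        T += 0.5
    elif T > mean:
        T -= 0.5
    return (T - mean) / sqrt(variance)

def signed_rank_z(T, n):
    # Wilcoxon signed-rank: mean n(n+1)/4, variance n(n+1)(2n+1)/24
    return corrected_z(T, n * (n + 1) / 4, n * (n + 1) * (2 * n + 1) / 24)

def rank_sum_z(T, m, n):
    # Wilcoxon rank-sum: mean m(m+n+1)/2, variance mn(m+n+1)/12
    return corrected_z(T, m * (m + n + 1) / 2, m * n * (m + n + 1) / 12)

z_signed = signed_rank_z(T=52, n=20)
z_rank = rank_sum_z(T=85, m=10, n=12)
p_signed = NormalDist().cdf(z_signed)   # lower-tail probability for the signed-rank case
print(round(z_signed, 3), round(z_rank, 3), round(p_signed, 4))
```

Because T is the smaller statistic it lies at or below the mean, so the lower tail is the relevant one. Double the probability for a two-tailed test.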
How do you perform a hypothesis test for the mean of a large sample?
This is where we use the CLT!
CLT = Central Limit Theorem
Reject H_0 if [ p-value < significance level ]
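A minimal sketch of a large-sample test. It assumes that, by the CLT, the sample mean is approximately normal and that the unbiased sample variance can stand in for the unknown population variance. The function name, the two-tailed choice and the data are illustrative, not from these notes.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def large_sample_mean_test(data, mu0, alpha=0.05):
    """Two-tailed test of H_0: population mean = mu0. Returns (z, p_value, reject)."""
    n = len(data)
    x_bar = mean(data)
    s2 = variance(data)                        # unbiased sample variance
    z = (x_bar - mu0) / sqrt(s2 / n)           # approximately N(0, 1) under H_0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha         # reject H_0 if p-value < significance level

# Made-up sample of 49 values
data = [50 + (i % 7) - 3 for i in range(49)]
z, p_value, reject = large_sample_mean_test(data, mu0=49)
print(round(z, 3), round(p_value, 5), reject)
```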
How do you perform a Chi^2 Test?
Step 1: Calculation of the Chi^2 Statistic
1. Build a contingency table
2. In the observed table, add an extra column and row, and total each row and column
3. Build the expected frequency contingency table:
* Calculate the probability of each row and each column, and multiply them to get each box’s probability.
* Multiply each box’s probability by the grand total of the observed table (bottom right of the extended observed table).
4. Merge any columns or rows if there is an expected frequency **less than 5**
5. Calculate Chi^2 which is the sum of [(O-E)^2/E] for each box in the table
Step 2: Testing using Chi^2 Statistic
1. State the null and alternative hypotheses, as well as the significance level
H_0 : The two variables are independent
H_1 : The two variables are dependent
2. Calculate the degrees of freedom: v = (n-1)(m-1), where n and m are the numbers of rows and columns (after any merging)
3. Find the critical value according to the stated significance level
4. Compare the Chi^2 statistic with the CV
5. Chi^2 > CV ==> reject H_0
6. Conclusion (go on, add some context)
Large values of Chi^2 indicate a large difference between O and E
Expected frequencies do not have to be integers
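The calculation in Step 1 can be sketched as follows. The function name and the observed table are invented. The sketch does not merge rows or columns, so it assumes every expected frequency is already at least 5.

```python
def chi_squared_independence(observed):
    """Return (expected, chi2, dof) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # E = (row total / total) * (column total / total) * total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, chi2, dof

# Made-up 2x3 table
observed = [[20, 30, 50],
            [30, 30, 40]]
expected, chi2, dof = chi_squared_independence(observed)
print(expected)
print(round(chi2, 3), dof)
```

With 2 degrees of freedom the 5% critical value is 5.991, so this made-up table would not lead to a rejection of H_0.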
What change do you need to make when calculating Chi^2 for a 2x2 table?
The Yates Correction:
Chi^2 = Sum of [(|O-E|-0.5)^2/E]
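A small sketch of the corrected statistic for a 2x2 table. Note that 0.5 is subtracted from the modulus |O-E| before squaring. The function name and the table are made up for illustration.

```python
def chi_squared_yates(observed):
    """Chi^2 with the Yates correction for a 2x2 table [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total   # expected frequency
            chi2 += (abs(o - e) - 0.5) ** 2 / e         # subtract 0.5 from |O - E|
    return chi2

# Made-up 2x2 table
observed = [[12, 8],
            [5, 15]]
chi2 = chi_squared_yates(observed)
print(round(chi2, 3))
```

A 2x2 table has (2-1)(2-1) = 1 degree of freedom, and the corrected value is always smaller than the uncorrected one whenever |O-E| is above 0.5.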
How do you perform a goodness-of-fit test?
Step 1: Calculating the Chi^2 statistic
1. Determine the distribution for the expected frequencies
2. Calculate the expected frequencies using the probability calculated from the distribution and multiply by the total
3. Perform the Chi^2 calculation as normal with these expected frequencies
Step 2: Perform the hypothesis test
1. State null and alternative hypotheses
H_0 : the data does follow this distribution
H_1 : the data does not follow this distribution
2. Calculate the Chi^2 statistic (see part 1)
3. Calculate the degrees of freedom [ v = number of bins - 1 - number of constraints ]
4. State the critical value found from the table in the data book
5. A larger Chi^2 than the CV would lead to a rejection of the null hypothesis ==> the distribution is not a good fit
be careful with the number of constraints
How do you find the degrees of freedom in a goodness-of-fit test?
v = number of bins (cells, after any merging) - 1 - number of constraints/assumptions
If you have to calculate a mean, standard deviation, or proportion from the data, and use that value in predicting the expected values, these all add to the number of constraints.
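A sketch of the goodness-of-fit calculation and the degrees-of-freedom rule. The function name, the `estimated_parameters` argument (the count of values such as a mean or proportion estimated from the data) and the die-roll data are illustrative assumptions. Merging bins with small expected frequencies is left out.

```python
def goodness_of_fit(observed, probabilities, estimated_parameters=0):
    """Return (expected, chi2, dof) for observed counts against model probabilities."""
    total = sum(observed)
    expected = [p * total for p in probabilities]       # probability * total
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - estimated_parameters      # bins - 1 - constraints
    return expected, chi2, dof

# Made-up data: 60 rolls of a die, H_0: the die is fair (nothing estimated from the data)
observed = [8, 12, 9, 11, 6, 14]
probabilities = [1 / 6] * 6
expected, chi2, dof = goodness_of_fit(observed, probabilities)
print(round(chi2, 3), dof)
```

If, for example, a binomial proportion had been estimated from the data first, `estimated_parameters=1` would reduce the degrees of freedom by one more.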