Chapter 3.2: Big Data - Multiple testing Flashcards by Dylan Ottey

How do you generate 101 variables, each with a sample size of 100, whose entries are independent standard normal draws?
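
A minimal sketch of one way to do this, written in Python with only the standard library. The function name `generate_data`, its parameters, and the fixed seed are assumptions made for illustration.

```python
import random

def generate_data(n_samples=100, n_vars=101, seed=0):
    """Return n_samples rows, each holding n_vars independent N(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            for _ in range(n_samples)]

data = generate_data()
print(len(data), len(data[0]))  # 100 101
```

Each row is one observation and each column is one variable, so one column can later play the role of the response and the remaining 100 the role of predictors.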

How do I analyse this data in R?

How would I plot the linear regression model?

What is voodoo correlation, and why is it a cautionary tale for multiple testing?

If you set the significance level at 5%, then out of 100 independent Uniform(0, 1) p-values you would expect about 5 to fall below 0.05 –> this happens simply by chance, not necessarily because of any real correlation.
So you have to change the significance level or define significance in another way.
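
The "5 out of 100 by chance" claim can be checked with a small simulation. This is a sketch under the assumption that every null hypothesis is true, so each p-value is an independent Uniform(0, 1) draw. The function name `count_false_positives` and the number of repetitions are illustrative choices.

```python
import random

def count_false_positives(n_tests=100, alpha=0.05, seed=1):
    """Count how many of n_tests Uniform(0, 1) p-values fall below alpha."""
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values)

# Average over many repetitions and compare with the expected n_tests * alpha.
n_reps = 2000
counts = [count_false_positives(seed=s) for s in range(n_reps)]
mean_count = sum(counts) / n_reps
print(f"expected: {100 * 0.05:.1f}, simulated mean: {mean_count:.2f}")
```

Any single run can give more or fewer than 5 "significant" results. Only the long-run average settles near 5.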

Given the null hypothesis "the defendant is guilty", what are Type I and Type II errors in this case?

Which error is worse, and which one do we try to control?

How do these relate to false positives and false negatives?

Building on Type I and Type II errors, what is the notion of the family-wise error rate (FWER)?

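We want to make as few false rejections as possible.
For a single test of a true null at level α-tilde, the probability of NOT rejecting is (1 − α-tilde); for m independent tests this is raised to the power of m –> (1 − α-tilde)^m is the probability of NOT rejecting any of them. The FWER, the probability of at least one false rejection, is the complement: 1 − (1 − α-tilde)^m.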
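
To see how quickly this grows with m, here is a small sketch that evaluates 1 − (1 − α-tilde)^m. It assumes all m nulls are true and the tests are independent. The function name `fwer_independent` is a label chosen for illustration.

```python
def fwer_independent(alpha_tilde, m):
    """P(at least one false rejection) for m independent tests of true nulls,
    each carried out at level alpha_tilde."""
    return 1.0 - (1.0 - alpha_tilde) ** m

for m in (1, 10, 100):
    print(m, round(fwer_independent(0.05, m), 3))
```

With α-tilde = 0.05 the FWER is about 0.40 for m = 10 and about 0.99 for m = 100, so at 100 tests at least one false rejection is almost certain.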

What can we do to reduce the FWER?
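
One standard answer is the Bonferroni correction: test each of the m hypotheses at level α/m instead of α. A minimal sketch follows, in which the function name `bonferroni_reject` and the example p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True where the p-value is below alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-test level under the Bonferroni correction
    return [p < threshold for p in p_values]

p_values = [0.001, 0.012, 0.03, 0.2, 0.6]  # made-up p-values for illustration
print(bonferroni_reject(p_values))  # [True, False, False, False, False]
```

Here m = 5, so the per-test threshold is 0.01. Three of the p-values are below 0.05, but only the first survives the correction.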

What is the proof of this?

Take the union over all H_0,i that are true –> we use the union because if any of these events happens you get at least one rejection of a true null.
Union bound argument –> a fundamental tool that follows from the countable subadditivity of a probability measure: the probability of a union is at most the sum of the individual probabilities.
Convert into p-values –> reject the null if its p-value is ≤ α/m.
When the null hypothesis is true, the associated p-value follows a Uniform(0, 1) distribution –> hence the probability that P_i is less than or equal to α/m is simply α/m.
Summing over all m₀ true nulls gives m₀·α/m, which is at most α because m₀ ≤ m –> the FWER is bounded by α.
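
The steps above can be written as a single chain of inequalities. The index set I_0 of true nulls, with |I_0| = m_0, is notation introduced here for convenience. P_i denotes the p-value of test i.

```latex
\begin{align*}
\mathrm{FWER}
  &= P\Big(\bigcup_{i \in I_0} \{P_i \le \alpha/m\}\Big)
     && \text{at least one true null rejected} \\
  &\le \sum_{i \in I_0} P\big(P_i \le \alpha/m\big)
     && \text{union bound} \\
  &= \sum_{i \in I_0} \frac{\alpha}{m}
     && P_i \sim \mathrm{Uniform}(0, 1) \text{ under } H_{0,i} \\
  &= \frac{m_0\,\alpha}{m} \;\le\; \alpha
     && \text{since } m_0 \le m .
\end{align*}
```

The argument never uses independence between the tests. The union bound holds for any dependence structure among the p-values.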