Chapter 3.2: Big Data - Multiple testing Flashcards

(9 cards)

1
Q

R:

How do you generate 101 independent variables, each with a sample size of 100 standard normal entries?

A
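The sketch below is a minimal Python version of the same idea. It assumes the standard library's `random.gauss` stands in for R's `rnorm`. In R, something like `as.data.frame(matrix(rnorm(100 * 101), nrow = 100))` gives a data frame whose columns are named V1 to V101. The function name `simulate_noise` and the seed are illustrative choices, not part of the course material.

```python
import random


def simulate_noise(n_rows=100, n_vars=101, seed=1):
    """Return an n_rows x n_vars table of independent N(0, 1) draws.

    Row i holds the i-th observation of each variable V1..V{n_vars}.
    """
    rng = random.Random(seed)  # fixed seed so the table is reproducible
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_rows)]


data = simulate_noise()
print(len(data), len(data[0]))  # 100 101
```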
2
Q

How do I analyse this data in R?

How would I plot the linear regression model?

A
  • Note: V6 also looks significant based on the F-value, even though all the variables are independent noise
3
Q

What is voodoo correlation and why is it a caution of multiple testing?

A
  • If you set the significance level at 5%, then out of 100 independent tests whose p-values are uniform (all nulls true), you would expect about 5 p-values to fall below 0.05 purely by chance, not because of any real correlation
  • So you have to adjust the significance level or define significance in another way
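Here is a quick simulation of the point above. It assumes all 100 nulls are true, so each p-value is Uniform(0, 1). The function name and the number of repetitions are illustrative.

```python
import random


def average_false_hits(m=100, alpha=0.05, reps=2000, seed=0):
    """Average number of p-values below alpha when all m nulls are true."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        # under a true null, a p-value is Uniform(0, 1)
        total += sum(rng.random() < alpha for _ in range(m))
    return total / reps


print(average_false_hits())  # close to m * alpha = 5
```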
4
Q

Given the null hypothesis "the defendant is innocent", what are the Type I and Type II errors in this case?

Which error is worse and is the one we try to control?

How do these relate to false positives and false negatives?

A
5
Q

Building on Type I and Type II errors, what is the notion of the family-wise error rate (FWER)?

A
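  • We want to make as few false rejections as possible: the FWER is the probability of making at least one false rejection (Type I error) among all the tests
  • If each test is run at level α̃ and its null is true, the probability of NOT rejecting it is (1 - α̃); for m independent tests with true nulls this is raised to the power m, so (1 - α̃)^m is the probability of NOT rejecting any of them, and FWER = 1 - (1 - α̃)^m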
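The probability of no false rejection is (1 - α̃)^m, so the FWER for m independent tests whose nulls are all true is 1 - (1 - α̃)^m. The small Python helper below evaluates this formula. It shows how fast the FWER grows with m at α̃ = 0.05. The name `fwer_independent` is hypothetical.

```python
def fwer_independent(alpha_tilde, m):
    """FWER for m independent tests, each at level alpha_tilde, all nulls true."""
    prob_no_rejection = (1.0 - alpha_tilde) ** m
    return 1.0 - prob_no_rejection


for m in (1, 10, 100):
    print(m, round(fwer_independent(0.05, m), 3))
# 1 0.05
# 10 0.401
# 100 0.994
```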


6
Q

What can we do to reduce the FWER?

A
7
Q

What is the proof that testing each hypothesis at level α/m controls the FWER at level α?

A
  1. Take the union, over all true nulls H0,i, of the events "H0,i is rejected"; we use the union because if any one of these events happens, we get at least one rejection of a true null.
  2. Apply the union bound: by countable subadditivity of a probability measure, the probability of the union is at most the sum of the individual probabilities.
  3. Convert into p-values: the null H0,i is rejected if its p-value P_i ≤ α/m.
  4. When the null hypothesis is true, the associated p-value follows a Uniform(0, 1) distribution, hence P(P_i ≤ α/m) = α/m.
  5. Summing over all m0 true nulls gives m0 · α/m ≤ α, since m0 ≤ m.
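The bound can be checked with a Monte Carlo simulation. The sketch assumes m = 20 independent tests whose nulls are all true, so m0 = m and every p-value is Uniform(0, 1). The exact FWER of the rule "reject when P_i ≤ α/m" is then 1 - (1 - α/m)^m. That is slightly below α, so the estimate should land close to α. The function name, m and the seed are illustrative.

```python
import random


def bonferroni_fwer(m=20, alpha=0.05, reps=20000, seed=0):
    """Estimate P(at least one false rejection) under the alpha/m rule."""
    rng = random.Random(seed)
    threshold = alpha / m
    errors = 0
    for _ in range(reps):
        pvalues = [rng.random() for _ in range(m)]  # all nulls true
        if any(p <= threshold for p in pvalues):
            errors += 1  # at least one true null was rejected
    return errors / reps


print(bonferroni_fwer())  # roughly 0.05 (exact value 1 - (1 - 0.05/20)**20 ≈ 0.049)
```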
8
Q

What is the false discovery rate (FDR)? What is the false discovery proportion (FDP)?

A
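  • FDP = number of false positives divided by the total number of rejections (taken as 0 when nothing is rejected); it is a random quantity
  • FDR = the expected value of the FDP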

Controlling the FDR normally gives more rejections than controlling the family-wise error rate, especially when p (the number of tests) is large.
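Here is a minimal sketch of the FDP for a single experiment, using hypothetical names. `rejected` marks which nulls were rejected. `is_true_null` marks which nulls are actually true, which is known only in simulations. Averaging this quantity over many simulated experiments estimates the FDR.

```python
def false_discovery_proportion(rejected, is_true_null):
    """V / R: false rejections over all rejections, defined as 0 when R = 0."""
    num_rejections = sum(rejected)
    num_false = sum(1 for rej, null in zip(rejected, is_true_null) if rej and null)
    return num_false / max(num_rejections, 1)


# 3 rejections, of which 1 is a true null, so FDP = 1/3
print(false_discovery_proportion([True, True, False, True], [True, False, False, False]))
```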

9
Q

What approach do we employ to control the FDR?

A
  • Rank (sort) all the p-values from smallest to largest: P_(1) ≤ P_(2) ≤ ... ≤ P_(m)
  • Find the last (largest) i for which the sorted p-value P_(i) ≤ α · i/m
  • Reject the nulls for all p-values less than or equal to this cutoff P_(i)
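These steps are the Benjamini-Hochberg step-up procedure. The Python sketch below is a minimal version. It assumes independent p-values and returns the 0-based indices of the rejected hypotheses. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the BH procedure at FDR level alpha."""
    m = len(pvalues)
    # rank the hypotheses by p-value, smallest first
    order = sorted(range(m), key=lambda i: pvalues[i])
    # find the last (largest) rank k with P_(k) <= alpha * k / m
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k = rank
    # reject every hypothesis whose p-value is at or below P_(k)
    return sorted(order[:k])


pvals = [0.02, 0.021, 0.03, 0.9]
print(benjamini_hochberg(pvals))                          # [0, 1, 2]
print([i for i, p in enumerate(pvals) if p <= 0.05 / 4])  # alpha/m rule: []
```

The p-value 0.02 fails its own threshold 0.05 · 1/4 = 0.0125. It is still rejected, because the cutoff is set by the last sorted p-value that passes. This is why FDR control typically rejects more hypotheses than the α/m rule.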