Probability Flashcards

(99 cards)

1
Q

What are the advantages and disadvantages of a census?

A

Advantages - collecting data on every member of the population so it’s a very accurate representation and unbiased
Disadvantages - takes a long time, lots of effort and money to gather data on a large population, hard to ensure everyone’s surveyed (if some are missed results may have bias)

2
Q

Define population

A

The whole group

3
Q

Define sample

A

A selected group from the population (subset)

4
Q

What is the equation for P(A∪B) (the union in a Venn diagram)?

A

P(A∪B) = P(A) + P(B) - P(A∩B)
You must subtract P(A∩B) because otherwise the intersection is counted twice!
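
As a quick check of the addition rule, here is a minimal sketch using a hypothetical example that is not from the cards: one roll of a fair die, with A = "even" and B = "greater than 3". It counts the union directly and compares it with P(A) + P(B) - P(A∩B).

```python
from fractions import Fraction

# Hypothetical example (not from the cards): one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even number
B = {4, 5, 6}   # greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

lhs = prob(A | B)                       # P(A∪B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)   # addition rule
print(lhs, rhs)                         # both sides are 2/3
```

Without the subtraction, the outcomes 4 and 6 (which sit in both A and B) would be counted twice.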

5
Q

Use De Morgan’s law to make P(P’∩L’) easier to interpret

A

P(P’∩L’) is equivalent to P((P∪L)’), which equals 1 - P(P∪L)
Remember to change the sign (from intersect to union) and put the complement symbol outside the bracket!

6
Q

Use De Morgan’s law to make P(A’∪B’∪C’) easier to interpret

A

P((A∩B∩C)’), which equals 1 - P(A∩B∩C)

7
Q

What does independence mean? Give the supporting equations

A

Two events are independent when the occurrence of one doesn’t affect the probability of the other, so the conditional probability equals the unconditional one: P(A|B) = P(A) and, vice versa, P(B|A) = P(B). Since P(A|B) = P(A∩B) / P(B), this gives P(A) = P(A∩B) / P(B)

8
Q

What equation do you use to test for independence?

A

P(A) x P(B) = P(A∩B)

9
Q

What does mutually exclusive mean?

A

Two events are mutually exclusive if they can’t occur at the same time so P(A∩B) = 0 (meaning there’s no overlap (intersection) in the Venn diagram)

10
Q

What equation is used to test for mutual exclusivity?

A

P(A∪B) = P(A) + P(B). This holds exactly when P(A∩B) = 0, so the intersection must be 0 for the events to be mutually exclusive

11
Q

What does ∩ mean?

A

Intersect (“and”)

12
Q

What does ∪ mean?

A

Union (“or”)

13
Q

What does ‘ mean?

A

Complement (“not”)

14
Q

What does | mean?

A

“Given” e.g. B|A means given A (start with A)

15
Q

Give the equation for conditional probability (it can be rearranged to find the intersection)

A

P(A|B) = P(A∩B) / P(B) or
P(B|A) = P(A∩B) / P(A)

16
Q

What is simple random sampling?

A

Where every person or item in the population has an equal chance of being in the sample, and each selection is independent of the others

17
Q

What are the advantages and disadvantages of simple random sampling?

A

Advantages: every member of the population has an equal chance of being selected, so it’s completely unbiased
Disadvantages: can be inconvenient if the population is spread over a large area - it may be difficult to track down identified members

18
Q

State the method for simple random sampling

A

1) enumerate the population from 1 to n (where n is the population size)
2) use a random number generator to draw a random integer from 1 to n
3) continue drawing until k different numbers have been identified (k is the sample size), then select the corresponding members
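
The three steps above can be sketched in code. This is only an illustration: the function name `simple_random_sample`, the `seed` parameter and the 20-member population are hypothetical, chosen for the example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members using the number-and-draw method."""
    n = len(population)                  # step 1: members are numbered 1..n
    if k > n:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < k:               # step 3: repeat until k different numbers
        chosen.add(rng.randint(1, n))    # step 2: random integer from 1 to n inclusive
    # Map the 1-based numbers back to the corresponding members
    return [population[i - 1] for i in sorted(chosen)]

members = [f"member_{i}" for i in range(1, 21)]   # hypothetical population of 20
sample = simple_random_sample(members, k=5, seed=1)
print(sample)
```

Repeated numbers are simply ignored because `chosen` is a set. Python's built-in `random.sample(population, k)` also draws k distinct members in a single call.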

19
Q

Events A, B and C have probabilities: 0.1, 0.2 and 0.3 respectively.
A and B are independent events.
B and C are mutually exclusive.
A and C are also mutually exclusive.
Draw a Venn diagram showing the events.

A

1) A and B overlap, the centre is 0.02
2) A is 0.08
3) B is 0.18
4) C does not overlap A or B (it’s completely separate) and has a value of 0.3
5) outside the Venn diagram is 0.42 (because the probabilities all add up to 1 total!)
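
The region values can be checked with a short calculation. This sketch uses exact fractions, and the variable names are illustrative only.

```python
from fractions import Fraction

p_a, p_b, p_c = Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)

a_and_b = p_a * p_b      # A and B independent: P(A∩B) = P(A) x P(B)
only_a = p_a - a_and_b   # part of A outside the overlap
only_b = p_b - a_and_b   # part of B outside the overlap
# C is mutually exclusive with both A and B, so it sits on its own
outside = 1 - (only_a + only_b + a_and_b + p_c)

print(float(a_and_b), float(only_a), float(only_b), float(outside))
```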

20
Q

A Venn diagram shows students who study art and students who study history, there is an intersection between the 2. State 2 events that are mutually exclusive

A

A student studies history and a student doesn’t study history are mutually exclusive events (and the same for a student studies art and a student doesn’t study art)

21
Q

How do you work out P(A’∩B)

A

Find the values in B then remove the values that overlap with A (can be said as “B and not A”)

22
Q

If you have a set of 40 numbered cards and 3 are even, what is the probability you pick 2 odd numbers (without replacing the first card)?

A

37/40 x 36/39 = 111/130
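
A small sketch of the same calculation with exact fractions, assuming the two cards are drawn without replacement (which is what the 36/39 factor implies):

```python
from fractions import Fraction

total, odd = 40, 37    # 3 of the 40 cards are even, so 37 are odd

first = Fraction(odd, total)            # first card is odd
second = Fraction(odd - 1, total - 1)   # second is odd, with one odd card already removed
p_two_odd = first * second

print(p_two_odd)   # 111/130
```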

23
Q

What is opportunity/ convenience sampling?

A

Where the sample is chosen from a section of the population that’s most convenient for the sampler

24
Q

What are the advantages and disadvantages of opportunity/ convenience sampling?

A

Advantages: data can be gathered quickly and easily
Disadvantages: the sample isn’t chosen randomly so can be biased, there’s no attempt to make the sample representative either

25
What is quota sampling?
The population is divided into categories and each category is given a quota (number of members you want to sample), data is collected until the quotas are met (**without** using random sampling)
26
What are the advantages and disadvantages of quota sampling?
Advantages: easy for the sampler as they don’t need access to the whole population or a list of every member Disadvantages: can be biased if the selection process isn’t random (some of the population may be excluded)
27
What is a discrete random variable?
A random variable is a number generated by a random experiment (e.g. rolling a die). The variable is discrete if the possible values form a countable set
28
Find the nth term for this sequence of numbers: 26/80, 22/80, 18/80, 14/80
nth term of the numerator is -4n+30, which rearranges to 30-4n. So the whole nth term is (30-4n)/80
29
P(X=2) = P(X=3). Find ‘a’ and ‘b’ given this probability table: x: 1, 2, 3, 4, 5 and P(X=x): a-b, 3a, 5b, 6a+5b, 4a+b
3a=5b (from P(X=2) = P(X=3)) and (a-b) + 3a + 5b + (6a+5b) + (4a+b) = 1, which simplifies to 14a+10b=1 (probabilities in a table always sum to 1!). Now you have 2 equations, so you can solve them simultaneously: a=0.05 and b=0.03
30
P(A)=0.2, P(B)=0.35, P(C)=0.45 and each selection is independent. For any 4 selections, find the probability at least one B is chosen. Then find the probability that A is chosen twice and C is chosen twice
1) 1-P(no B) = 1-(0.65)^4 = 0.8215 2) (0.2x0.2) x (0.45x0.45) **x 6** = 0.0486 **Multiply by 6 as there are 4C2 = 6 orders in which 2 As and 2 Cs can occur in 4 selections**
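
A short sketch of both parts, assuming part 2 means two As and two Cs in any order (the reading under which the factor of 6 applies):

```python
from math import comb

p_a, p_b, p_c = 0.2, 0.35, 0.45

# 1) At least one B in 4 independent selections: complement of "no B at all"
p_at_least_one_b = 1 - (1 - p_b) ** 4

# 2) Two As and two Cs in any order: comb(4, 2) = 6 ways to place the two As
p_two_a_two_c = comb(4, 2) * p_a**2 * p_c**2

print(round(p_at_least_one_b, 4), round(p_two_a_two_c, 4))
```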
31
If you roll 10 fair dice, what is the probability of getting 4 sixes?
(This is binomial distribution.) P(x=4) = (10 choose 4) x (1/6)^4 x (5/6)^6 210x(1/6)^4x(5/6)^6=0.0543
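
The same binomial probability can be computed directly from the formula. This is a minimal sketch, and `binom_pmf` is a helper name chosen for the example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_four_sixes = binom_pmf(4, 10, 1 / 6)
print(round(p_four_sixes, 4))   # 0.0543
```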
32
How do you work out binomial distribution on a calculator?
1) go to home 2) click on ‘inference’ then ‘probability’ 3) select ‘binomial’ 4) type in n,P then choose ‘next’ 5) select the type of graph on the left (little graph icon) to get x=/≥/≤ 6) type in a value for x and press ‘exe’
33
What is self-selection / volunteer sampling?
This is where people choose to be part of the sample. (Advertise/appeal to the whole population, those who respond are included in this sample).
34
What are the advantages and disadvantages of self-selection / volunteer sampling?
Advantages: requires little time/ effort to find sample members, people who have volunteered are more likely to respond Disadvantages: there can be trends within the respondents that lead to bias, people may not want to volunteer for various reasons
35
A test paper consists of 10 multiple choice questions, each with 4 possible answers. A candidate guesses the answers to each question. Let X be the number of correct answers the student achieves. Write down the distribution of X.
X ~ B (10,0.25)
36
In an experiment a biased coin is thrown 10 times. If the probability of obtaining a head is 0.4, find the probability of obtaining exactly 5 heads. If the experiment is performed 7 times what’s the probability exactly 5 heads are obtained on exactly 2 occasions?
1) P(x=5) = 0.2007 2) Y ~ B (7,0.2007) when P(Y=2) (use calculator) gives you 0.2759
37
When seeds are planted, on average 90% germinate. A gardener plants 10 seeds in a tray and waits to see how many germinate. He then plants 20 trays of seeds, each with 10 seeds. Find the probability there are at least 19 trays in each of which at least 8 seeds germinate.
X ~ B (10,0.9) when P(X≥8) gives you 0.9298. Let Y be the number of trays in which at least 8 seeds germinate. Y ~ B (20,0.9298) when P(Y≥19) gives you **0.5855**
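
The two-stage calculation can be sketched as one binomial feeding another. The helper names are illustrative, and the code keeps the unrounded tray probability, so the last digit may differ slightly from a calculator that uses the rounded 0.9298.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_at_least(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Stage 1: a single tray of 10 seeds has at least 8 germinate
p_tray = binom_at_least(8, 10, 0.9)

# Stage 2: at least 19 of the 20 trays are "successful" trays
p_trays = binom_at_least(19, 20, p_tray)

print(round(p_tray, 4), round(p_trays, 4))
```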
38
The random variable W has a binomial distribution with parameters n and p. If p=0.27, find the smallest value of n such that P(W≥1)>0.95
W ~ B (n,0.27). P(W≥1)>0.95 is the same as 1-P(W=0)>0.95, thus P(W=0)<0.05, which means we can put it in the formula: (nC0) x (0.27)^0 x (0.73)^n < 0.05, so 0.73^n < 0.05. Take logs of both sides: n log(0.73) < log(0.05), so n > log(0.05)/log(0.73) **(remember to flip < to > because you are dividing by log(0.73), which is a negative number!!)** n>9.519 so **n=10 as it must be a positive whole number (natural)**
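
A sketch that finds the smallest n two ways: with logarithms, and with a direct search as a cross-check. The variable names are illustrative.

```python
import math

p = 0.27
target = 0.95

# Log method: 0.73^n < 0.05  =>  n > log(0.05) / log(0.73)
bound = math.log(1 - target) / math.log(1 - p)
n_from_logs = math.ceil(bound)   # smallest whole number above the bound (bound is not an integer here)

# Direct search: increase n until P(W >= 1) = 1 - 0.73^n exceeds 0.95
n_search = 1
while 1 - (1 - p) ** n_search <= target:
    n_search += 1

print(round(bound, 3), n_from_logs, n_search)
```

Both routes agree, and trying n=9 and n=10 directly is a quick way to confirm the inequality was flipped correctly.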
39
Which sampling methods appear the most frequently in exam papers (are important to learn!)?
1) self-select 2) simple random 3) opportunity
40
What is the equation for binomial distribution?
X ~ B (n,p) where X is a discrete random variable, ~ means ‘is distributed as’, B represents the binomial distribution, n is the number of trials and p is the probability of success on each trial
41
What values can n take in the formula X ~ B (n,p) ?
n is a natural number: it counts the trials, so in practice it is a positive whole number (n=0 would mean no trials at all)
42
How do you calculate the mean and variance for binomial distribution?
1) mean: np (trials x probability) 2) variance: np(1-p)
43
What is another way you can be asked to calculate the mean for binomial distribution?
Calculate the **‘expected number’**
44
How do you represent these inequalities: 1) at most 10 2) at least 2 3) at least 3, but at most 17
1) x≤10 2) x≥2 3) 3≤x≤17
45
X ~ B (40,0.55). Find P(x<20)
P(x<20) is the same as P(x≤19), typed into calculator gives 0.2130
46
X ~ B (50,0.86). Find P(x>42)
P(x>42) is the same as P(x≥43), typed into calculator gives 0.5990
47
When X ~ B (12,0.34) find P(2
The interval must use only ≤ symbol so type this into the calculator: 3≤x≤6. This gives an answer of 0.7579
48
What are the characteristics of binomial distributions?
1) discrete data 2) each selection is independent 3) the probability is fixed/constant 4) there are only 2 possible outcomes e.g. success/failure or heads versus tails **In a question you must relate these factors to the context/ scenario**
49
In binomial hypothesis testing what are the 2 types of hypothesis?
1) **H0 (H naught)** which is known as a **null hypothesis** 2) **H1** known as the **alternate hypothesis**
50
What is a null hypothesis, what statement do we use in conjunction with it?
The default assumption that nothing has changed (e.g. p takes its stated value). Where findings are **statistically insignificant**, we use the phrase ‘**fail to reject H0**’
51
What is an alternate hypothesis, what statement do we use in conjunction with it?
The claim being tested against H0 (e.g. p has increased, decreased or changed). Where findings are **statistically significant and have a direction**, we use the phrase ‘**reject H0**’ (the statement is always with reference to H0, NEVER say ‘accept’)
52
What are the possible directions of an alternate hypothesis?
Positive, negative or you can state that ‘there’s a difference’
53
What is a type 1 error? What is a type 2 error?
Type 1: falsely rejecting the null hypothesis when it’s true Type 2: failing to reject the null hypothesis when it’s incorrect (opposite of type 1)
54
What is the significance level and what values can it take?
The cut-off point for either rejecting or failing to reject the null hypothesis. The value can be 10% (0.1), 5% (0.05) or 1% (0.01).
55
What significance level is the industry standard?
5% as 10% is too liberal and 1% is too strict
56
What is the p-value?
The probability, assuming H0 is true, of getting a result at least as extreme as the one observed (i.e. the probability the result is due to chance). E.g. for X ~ B (12,0.5), P(x≥11) = 0.003174 is the p-value
57
What does it mean if the p-value > significance level?
Probability due to chance is too high so you must **fail to reject the null hypothesis**
58
What does it mean if the p-value < significance level?
Probability due to chance is low enough that you can **reject the null hypothesis**
59
Andy wins if a coin comes up heads, Beth wins if it comes up tails. The coin is flipped 7 times and heads comes up once. Andy complains the coin must be biased against heads. Test at the 5% level to determine whether Andy’s complaint is justified.
You must give **all** of these points, if you miss any you lose the subsequent marks!!
1) let p be the probability of getting heads
2) H0: p=0.5 (equal chance of heads or tails)
3) H1: p<0.5 (heads is less likely to appear)
4) assuming H0 is true, X ~ B (7,0.5) and P(x≤1) = 0.0625 (this is the p-value). Use P(x≤**1**) because 1 of the 7 flips is heads
5) 0.0625>0.05 so the result is **not significant** and we **fail to reject the null hypothesis**
6) there is **insufficient** evidence to **suggest** that the coin is biased against heads
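
The p-value step of this test can be reproduced with a short sketch. `binom_at_most` is a helper name chosen for the example, standing in for the calculator's cumulative binomial function.

```python
from math import comb

def binom_at_most(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n_flips, heads_seen = 7, 1
significance = 0.05

# One-tailed test (H1: p < 0.5), so the p-value is the lower tail P(X <= 1)
p_value = binom_at_most(heads_seen, n_flips, 0.5)
reject_h0 = p_value < significance

print(p_value, reject_h0)   # 0.0625 False
```

Because 0.0625 is above 0.05, the code reaches the same conclusion as the written answer: fail to reject H0.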
60
What is a critical region?
Range of values which would lead you to reject H0 (opposite of acceptance region)
61
What are 2 things you must do for 2 tailed tests?
1) **1/2 the significance level** 2) write the critical region in interval notation
62
When the p-value > significance level do we reject or fail to reject H0?
Fail to reject H0
63
Find the coefficient of x^4 in the expansion of (8+x^2)^9
9C2 x 8^7 x (x^2)^2 36 x 2097152 x x^4 = 75497472x^4 so the coefficient is 75497472
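
A one-line check of the coefficient, using the general term nCk x a^(n-k) x (x^2)^k with k=2 so that the power of x is 4:

```python
from math import comb

n, a, k = 9, 8, 2                        # (8 + x^2)^9, and (x^2)^2 gives x^4
coefficient = comb(n, k) * a ** (n - k)
print(coefficient)                       # 75497472
```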
64
The first 3 terms, in ascending powers of x, of the expansion of (3-(x/2))^8 are 6561-8748x+5103x^2. Use the first 3 terms to estimate the value of 2.995^8
(3-(x/2))=2.995, -x/2=-0.005 so **x=0.01** 6561-8748(0.01)+5103(0.01)^2 =**6,474.0303**
65
In **decreasing** powers of x, find the first 4 terms in the expansion of (1+(1/x))^7
7C7 x (1/x)^0 x 1^7 = 1 7C6 x (1/x)^1 x 1^6 = 7x^-1 7C5 x (1/x)^2 x 1^5 = 21x^-2 7C4 x (1/x)^3 x 1^4 = 35x^-3 So 1+7x^-1 + 21x^-2 + 35x^-3 (**the only difference between ascending and descending powers of x is that you start with, in this case for descending powers, 7C7 instead of 7C0!**)
66
How can you tell if a question is about a 2 tailed test?
It uses the words: difference/ biased / change rather than increase/ decrease (which would indicate one tailed - regular - tests)
67
A die is rolled 36 times to see if it is biased. A 6 is rolled once. Test at the 5% level whether or not there’s evidence the die is biased.
The vague language shows it’s a 2 tailed test: Let p be the probability of rolling a 6. H0: p=1/6 H1: p**≠**1/6 Assume H0 is true: X ~ B (36,1/6). P(x≤1) (**use ≤ because 1 six in 36 rolls is fewer than the 6 sixes expected, so any bias would be against sixes**) = 0.0116 < 0.025 (**significance level halved!!**) The result is statistically significant so reject H0. There’s sufficient evidence to suggest the die is biased
68
What value is never part of a critical region?
The mean
69
If H0: p=0.6 H1: p≠0.6 X ~ B (32,0.6) Test at the 10% significance level to find the critical region
≠ means it’s a 2 tailed test so your answer will be a union of 2 critical regions. 10% needs to be halved so you use 0.05 (5%)
1) find the mean to give you a starting point (nxp = 32x0.6=19.2)
2) to find the lower critical region, test each number consecutively working down from the mean until you reach one that gives a p-value below 0.05. This is true for 14 and below so x∈[0,14]; because it’s the lower bound the **zero is automatically known**. P(x≤14) = 0.0463<0.05 P(x≤15)=0.0920>0.05
3) to find the upper critical region, test each number consecutively working up from the mean until you reach one that gives a p-value below 0.05. This is true for 25 and above so x∈[25,32]; because it’s the upper bound the **top number is always ‘n’, which in this case is 32**. P(x≥24) = 0.0575>0.05 P(x≥25)=0.0248<0.05
4) answer: x∈[0,14] U [25,32]
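
The search for each tail can be sketched as a loop that accumulates probabilities from the end of the range inward, rather than trying values one at a time from the mean. The function names are hypothetical, and the strict "below the level" comparison mirrors the cards.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lower_critical_value(n, p, alpha):
    """Largest c with P(X <= c) < alpha, or None if there is no lower region."""
    cumulative, best = 0.0, None
    for c in range(n + 1):
        cumulative += binom_pmf(c, n, p)
        if cumulative >= alpha:
            break
        best = c
    return best

def upper_critical_value(n, p, alpha):
    """Smallest c with P(X >= c) < alpha, or None if there is no upper region."""
    tail, best = 0.0, None
    for c in range(n, -1, -1):
        tail += binom_pmf(c, n, p)
        if tail >= alpha:
            break
        best = c
    return best

n, p, level = 32, 0.6, 0.10
lower = lower_critical_value(n, p, level / 2)   # two-tailed: halve the level
upper = upper_critical_value(n, p, level / 2)
print(f"critical region: [0, {lower}] U [{upper}, {n}]")
```

For a one-tailed test only one of the two functions is needed, called with the full significance level.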
70
Find the critical region given that: H0: p=0.8 H1: p<0.8 X ~ B (70,0.8) 5% significance level
1) mean (np) = 70x0.8=56 2) because the **sign is < in the question (for H1) use P(x≤…)** for each value you try 3) test numbers between the mean and zero because of the < symbol 4) the first number with a p-value below the significance level (0.05) is 49. 5) write out both 49 and 50 to show that 50 doesn’t satisfy being below 0.05 and thus that 49 is the first acceptable value: P(x≤49) = 0.0303<0.05 P(x≤50)=0.0545>0.05 6) **zero is always the lower bound for < symbol** so the critical region is: x∈[0,49]
71
Find the critical region given that: H0: p=0.25 H1: p>0.25 X ~ B (85,0.25) 10% significance level
1) mean (np) = 85x0.25=21.25 2) because the **sign is > in the question (for H1) use P(x≥…)** for each value you try 3) test numbers between the mean and ‘n’ (which in this case is 85) because of the > symbol 4) the first number with a p-value below the significance level (0.1) is 27. 5) write out both **26** and 27 to show that 26 doesn’t satisfy being below 0.1 and thus that 27 is the first acceptable value: P(x≥26) = 0.1439>0.1 P(x≥27)=0.09639<0.1 6) **n is always the upper bound for > symbol** so the critical region is: x∈[27,85]
72
What type of data is used for the normal distribution?
Continuous data that forms a bell-shaped curve
73
What is the equation for the normal distribution?
X ~ N (μ, σ^2) Where μ is the mean of the data and σ^2 is the variance
74
What is always true about the area under a normal distribution curve?
The **area represents (/ is equal to) the probability** and the **total area under the graph sums to 1**
75
What is true of the horizontal axis of a normal distribution curve?
It’s an asymptote
76
For continuous data is this statement true: P(x>24) = P(x≥24) ?
Yes, because for continuous data the probability of any single exact value is zero, so P(x=24) = 0 and including or excluding 24 doesn’t change the probability
77
What equation represents the standard normal distribution?
Z ~ N (0,1^2) where the values in the brackets are always zero and 1!
78
What is the equation to calculate Z for the standard normal distribution?
Z = (x - μ) /σ This **isn’t** in your formula booklet so you must learn it!!
79
Given X ~ N (10, 16), find P(x<12)
1) work out the standard normal distribution: z=(x-μ)/σ so z=(12-10)/4 =0.5 2) the area of the standardised graph will equal the original graph’s area so P(z<0.5)=P(x<12) = 0.6915
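
To see that standardising leaves the area unchanged, the sketch below evaluates the normal cumulative probability with `math.erf` in place of a calculator. `normal_cdf` is a helper name chosen for the example.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 10, sqrt(16)     # X ~ N(10, 16): the variance is 16, so sigma = 4
z = (12 - mu) / sigma        # standardised value, 0.5

p_original = normal_cdf(12, mu, sigma)   # area under the original curve
p_standard = normal_cdf(z, 0, 1)         # same area under Z ~ N(0, 1)

print(z, round(p_original, 4), round(p_standard, 4))
```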
80
How do you use a calculator to answer normal distribution questions?
1) Select inference, then probability, then normal 2) Ensure the mean and standard deviation are set to 0 and 1 respectively (when using z) 3) use the toggle on the left to select the graph type (/ whether the symbol is greater than or equal to etcetera) 4) type in your z value 5) **write your answer to 4dp**
81
Given X ~ N (8, 9), find P(x>9)
1) Work out the standard normal distribution: z=(x-μ)/σ so z=(9-8)/3 =1/3 2) P(x>9) = P(x≥9) so on the calculator select the graph icon that gives you the ≥ symbol and type in 1/3. 3) The answer is 0.3694
82
What is the P(x=56) for a normal distribution curve?
**Zero** because the data is continuous, so you can have an infinite number of values, thus the probability of choosing any one number is infinitely small
83
What is the P(x≠57) for a normal distribution curve?
**1** because P(x=57) = 0 for continuous data (57 is one value out of infinitely many), so P(x≠57) = 1 - 0 = 1
84
Given X ~ N (56,10^2), convert to the standard normal distribution to find P(56
1) work out 2 separate z-values: z=(56-56)/10=0 and z=(65-56)/10=0.9 2) select the graph icon showing P(_≤x≤_), then insert your z-values 3) the answer is 0.3159
85
Find the critical region for H0: p=0.2, H1: p<0.2 n=10 at the 5% significance level
1) mean=10x0.2=2 so start testing at 2 2) P(x≤0) = 0.1073 > 0.05, **no critical region** because you never reach a value below 0.05
86
Find the critical region for H0: p=0.4, H1: p≠0.4 n=30 at the 1% significance level
1) ≠ represents a 2 tailed test so you must halve the significance level! (Now 0.005) 2) mean=30x0.4=12 so start testing above and below 12 3) x∈[0,4] U x∈[20,30]
87
Define the acceptance region
The range of values which would lead you to fail to reject H0 (opposite of the critical region)
88
Define the significance level
The **probability** of rejecting H0 when it’s actually true
89
Given that X ~ B (n,0.4) and P(X≤0) =0.0778, what is n?
P(X≤0) =0.0778 means **P(X=0)**=0.0778, so (nC0) x 0.4^0 x 0.6^n = 0.0778, which gives 0.6^n = 0.0778. n=log0.6(0.0778) =4.999 so **n=5**
90
It is claimed that a coin is fair. To test this claim, it is flipped 18 times. If X is the number of heads in the 18 tosses, the acceptance region for the hypothesis test conducted at the 10% significance level is given by a≤X≤b. Find the values of ‘a’ and ‘b’
a=6 b=12
91
What is the trapezium rule used for?
Used for approximating the area under a graph
92
What shape gives a smaller error bound when estimating area under a graph? A rectangle or a trapezium?
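A rectangle, provided its height is taken at the midpoint of the strip (the midpoint rule): its standard error bound is half that of the trapezium rule. Rectangles drawn from the left or right end of each strip are generally less accurate than trapeziums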
93
What are the ordinates and what are the strips?
Ordinates: y-values/ outputs/ heights Strips: trapeziums
94
What value is always the same for trapeziums in the trapezium rule?
h (the width of the trapeziums)
95
Explain how the trapezium rule might be used to give a better estimate to an integral
**Increasing the number of strips** improves the accuracy of the estimated area
96
If n is the number of strips, how many ordinates are there?
Always n+1 ordinates
97
What is the equation for the trapezium rule?
Area ≈ 1/2 h (first + last + 2(rest))
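
A minimal sketch of the rule as stated. The integrand x^2 on [0, 1] is a hypothetical example (not from the cards), picked because its exact area is 1/3.

```python
def trapezium_rule(f, a, b, n):
    """Estimate the area under f between a and b using n strips."""
    h = (b - a) / n                                    # equal strip width
    ordinates = [f(a + i * h) for i in range(n + 1)]   # n strips give n + 1 ordinates
    first, last = ordinates[0], ordinates[-1]
    rest = ordinates[1:-1]
    return 0.5 * h * (first + last + 2 * sum(rest))

def square(x):
    return x * x

estimate_4 = trapezium_rule(square, 0, 1, 4)
estimate_8 = trapezium_rule(square, 0, 1, 8)
print(estimate_4, estimate_8)   # 0.34375 0.3359375
```

Both estimates sit above 1/3 because x^2 is convex, and doubling the number of strips moves the estimate closer to the exact area.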
98
For concave graphs is the area approximated using the trapezium rule an over or underestimate? What about for a graph that is concave then convex? **Why?**
Concave: **underestimate** because the tops of the trapeziums will always be **below** the curve Convex: **overestimate** because the tops of the trapeziums will always be **above** the curve Mixture of the two: we can’t say
99
Write down 3 probability conditions such that A and B are independent events
1) P(A∩B) = P(A) x P(B) 2) P(A|B) = P(A) 3) P(B|A) = P(B) All three come from the same equation: equation 1 rearranges to P(A∩B)/P(B) = P(A) and to P(A∩B)/P(A) = P(B), which are equations 2 and 3 by the conditional probability formula
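
All three conditions can be checked on a hypothetical example that is not from the cards: one roll of a fair die, with A = "even" and B = "at most 4".

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = set(range(1, 7))
A = {2, 4, 6}       # even
B = {1, 2, 3, 4}    # at most 4

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)

condition_1 = p_ab == p_a * p_b    # P(A∩B) = P(A) x P(B)
condition_2 = p_ab / p_b == p_a    # P(A|B) = P(A)
condition_3 = p_ab / p_a == p_b    # P(B|A) = P(B)

print(condition_1, condition_2, condition_3)   # True True True
```

Since the three conditions are rearrangements of one equation, they are either all true or all false (as long as P(A) and P(B) are non-zero).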