probability theory
the mathematical study of uncertainty.
experiment
A probability experiment is any experiment whose outcomes rely purely on chance (e.g., rolling a die).
simple event
A simple event is an outcome of a single repetition of the experiment (e.g., “get a six”).
event
A collection of several simple events is called an event (e.g., “get an even number”). We usually denote this by “E”.
Sample space
The collection of all simple events is also called the sample space. We denote this collection by “S”.
probability distribution
When conducting an experiment, the probability distribution is the set of all possible outcomes, together with their corresponding probabilities;
the probability of an outcome is a numerical measure of its likelihood of occurrence.
Simple Probability
If all possible outcomes in the sample space are equally likely (e.g., the die is “fair”), then we can compute the probability of any event E, written P(E), by
P(E) = (number of outcomes in E) / (number of outcomes in S)
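As a minimal sketch, the equally-likely formula can be computed with Python sets, using a fair six-sided die as the sample space (the die example is taken from the text; the helper name `p` is our own):

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
S = {1, 2, 3, 4, 5, 6}

def p(event):
    # P(E) = (number of outcomes in E) / (number of outcomes in S)
    return Fraction(len(event), len(S))

even = {2, 4, 6}
print(p(even))   # 1/2
print(p({6}))    # 1/6
```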
Properties of Simple Probabilities:
1. 0 ≤ P(E) ≤ 1, for every event E.
2. P(E) is the sum of probabilities of all simple events comprising E.
3. P(S) = 1
In plain English we could say:
1. No event can ever have a probability greater than 1 or less than 0. P(E) = 1 means E is guaranteed to occur; P(E) = 0 means E is guaranteed not to occur.
2. If an event consists of a collection of simple outcomes, then the probability of the event is equal to the sum of the probabilities of the outcomes of which the event is comprised.
3. The particular event consisting of all possible outcomes has probability of 1. That is, if an experiment is conducted, we are guaranteed to observe one of the possible outcomes.
Empirical Probability
If we perform an experiment many times, we can estimate the probability of an event by simply counting the favourable outcomes:
P(E) ≈ (number of favourable outcomes) / (number of repetitions)
= f / n
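A quick simulation illustrates the empirical estimate f/n. This is a sketch, assuming a fair die simulated with Python's `random` module and a fixed seed for reproducibility; the true probability of an even roll is 1/2:

```python
import random

random.seed(0)  # fixed seed so the estimate is reproducible

n = 10_000                                # number of repetitions
f = sum(random.randint(1, 6) % 2 == 0     # favourable: an even roll
        for _ in range(n))

estimate = f / n  # empirical P(even); should be close to 0.5
```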
We can create new events based on other events:
i. Complement (not E): The event E does not occur. The complement of E is itself a new event, usually denoted by E^c, but sometimes E̅ or E′.
ii. Intersection (E1 and E2): The events E1 and E2 both occur. The intersection of E1 and E2 is itself a new event, denoted by E1 ∩ E2, sometimes written as E1 & E2.
iii. Union (E1 or E2): At least one of the events E1 or E2 occurs. The union of E1 and E2 is itself a new event, denoted by E1 ∪ E2, sometimes written as E1 OR E2.
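The three constructions map directly onto Python's set operators. A small sketch, again using the die-roll sample space (the specific events E1 and E2 are our own illustration):

```python
# Sample space for one die roll, with two events as Python sets
S = {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}   # "get an even number"
E2 = {4, 5, 6}   # "get more than 3"

not_E1 = S - E1      # complement: {1, 3, 5}
both = E1 & E2       # intersection: {4, 6}
either = E1 | E2     # union: {2, 4, 5, 6}
```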
Since events may contain more than just one simple event, two different events, E1 and E2, can be related in several ways:
i. E1 is a subset of E2 (all of E1 is contained in E2) or vice versa.
ii. E1 and E2 have some elements in common.
iii. E1 and E2 have no common elements.
Venn diagram
depicts the sample space as a rectangle, and events as disks contained within the rectangle.
mutually exclusive (or disjoint)
Two or more events are said to be mutually exclusive (or disjoint) if no two of them have outcomes in common.
Basic Rules of Probability:
i. The Special Addition Rule: If events A and B are mutually exclusive, then
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
Moreover, this same property applies to more than 2 mutually exclusive events. That is, if 𝐸1, 𝐸2, ⋯ , 𝐸𝑛 are mutually exclusive, then
𝑃(𝐸1 ∪ 𝐸2 ∪ ⋯ ∪ 𝐸𝑛) = 𝑃(𝐸1) + 𝑃(𝐸2) + ⋯ + 𝑃(𝐸𝑛)
ii. The Complementation Rule: For any event E,
𝑃(𝐸) = 1 − 𝑃(𝐸𝐶)
iii. The General Addition Rule: If A and B are any two events, then
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
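The general addition rule can be verified numerically for equally likely outcomes. A sketch using the die-roll sample space (the events A and B are our own illustration; both sides of the rule come out to 2/3 here):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}            # "even"
B = {4, 5, 6}            # "more than 3"

def p(event):
    return Fraction(len(event), len(S))

# P(A ∪ B) = P(A) + P(B) − P(A ∩ B): subtracting the intersection
# corrects for the outcomes {4, 6} being counted twice
lhs = p(A | B)
rhs = p(A) + p(B) - p(A & B)
```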
Rules (i) and (ii) are simply special cases of rule (iii). (i) follows from (iii) because mutually exclusive events have empty intersection (so 0 probability of both occurring).
(ii) follows from (iii) since a probability space requires P(S) = 1, and any event E in S must satisfy E ∪ E^C = S. Thus,
𝑃(𝐸) + 𝑃(𝐸𝐶) = 𝑃(𝐸 ∪ 𝐸𝐶) = 𝑃(𝑆) = 1,
which implies
𝑃(𝐸) = 1 − 𝑃(𝐸𝐶)
One can visualize these results, using the Venn diagrams, and noting the following:
- the total area of the rectangle is equal to 1, and this area represents the probability of anything from S, i.e. P(S)=1;
- the area common to two disks represents the probability of their intersection, i.e. P(A∩B)
contingency table/two-way
displays the frequency distribution of a bivariate dataset by placing one variable on the rows and the other on the columns, so that each cell contains the number of observations for that particular combination of levels/classes of the two variables.
Furthermore, the total frequencies for each row and column are given in the margins.
joint frequency distribution.
The bivariate data presented in a contingency table describes a joint frequency distribution. If we replace the frequencies with their corresponding probabilities (relative frequencies), then we have a joint probability distribution. The interior cells are the joint probabilities, and the marginal cells are the marginal probabilities.
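A small sketch of turning a contingency table of counts into joint and marginal probabilities. The 2×2 table and its row/column labels are invented purely for illustration:

```python
from fractions import Fraction

# Hypothetical 2x2 contingency table of counts
# (rows: smoker status, columns: exercise status)
counts = {
    ("smoker", "exercises"): 10,
    ("smoker", "sedentary"): 20,
    ("non-smoker", "exercises"): 40,
    ("non-smoker", "sedentary"): 30,
}
n = sum(counts.values())  # grand total = 100

# Joint probabilities: divide each cell by the grand total
joint = {cell: Fraction(c, n) for cell, c in counts.items()}

# Marginal probability of a row = sum of its joint probabilities
p_smoker = joint[("smoker", "exercises")] + joint[("smoker", "sedentary")]
```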
conditional probability
The probability that event B occurs, given that event A occurs, is called the conditional probability of B given A.
Notation: The conditional probability of B given A is denoted P(B|A). The vertical line is read as “given”. “The probability of B given A”
𝑃(𝐵|𝐴) = “size” of B, restricted by fact that A occurs / “size” of S, restricted by fact that A occurs
The conditional probability rule: If A and B are any two events with P(A)>0, then
𝑃(𝐵|𝐴) = P(A ∩ B ) / P(A)
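As a sketch of the conditional probability rule with equally likely outcomes, again using the die-roll sample space (the events are our own illustration): conditioning on "even" restricts the sample space to {2, 4, 6}, and two of those three outcomes are greater than 3.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # "even"
B = {4, 5, 6}        # "more than 3"

def p(event):
    return Fraction(len(event), len(S))

# P(B|A) = P(A ∩ B) / P(A) = (2/6) / (3/6) = 2/3
p_b_given_a = p(A & B) / p(A)
```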
general multiplication rule
A consequence of the conditional probability rule is the general multiplication rule: For events A and B,
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵|𝐴)
and
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵)𝑃(𝐴|𝐵)
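A classic use of the general multiplication rule is drawing without replacement. A sketch for a standard 52-card deck (the deck example is our own choice, not from the text): after one ace is removed, 3 of the remaining 51 cards are aces.

```python
from fractions import Fraction

# P(both cards are aces) when drawing two cards without replacement
p_first_ace = Fraction(4, 52)           # P(A)
p_second_given_first = Fraction(3, 51)  # P(B|A): one ace already gone

# General multiplication rule: P(A ∩ B) = P(A) * P(B|A)
p_both = p_first_ace * p_second_given_first
print(p_both)  # 1/221
```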
Tree diagram
When using the general multiplication rule, it is sometimes helpful to draw a tree diagram, which displays all possible events and their conditional probabilities as splitting branches.
dependent events
two events wherein the probability of one depends on whether the other occurs.
independent events
sometimes the probabilities associated with two events do not depend on one another. Such events are said to be independent. More formally, events A and B are said
to be independent if
𝑃(𝐴|𝐵) = 𝑃(𝐴),
or
𝑃(𝐵|𝐴) = 𝑃(𝐵),
or
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵).
This definition leads to the special multiplication rule: For independent events E1, E2,
E3,…
𝑃(𝐸1 ∩ 𝐸2 ∩ 𝐸3 ∩ ⋯ ) = 𝑃(𝐸1)𝑃(𝐸2)𝑃(𝐸3) ⋯
That is, the probability of the intersection of independent events is equal to the product of the probabilities of each event.
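A sketch of the special multiplication rule, assuming three flips of a fair coin (our own example): the flips are independent, so the probabilities simply multiply.

```python
from fractions import Fraction

# P(heads) on one flip of a fair coin
p_heads = Fraction(1, 2)

# Independent events: P(E1 ∩ E2 ∩ E3) = P(E1) * P(E2) * P(E3)
p_three_heads = p_heads * p_heads * p_heads
print(p_three_heads)  # 1/8
```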
basic counting rules
enable us to count the number of possibilities in various situations using simple combinatorial techniques
Full Arrangements, Factorials
A particular variant of the basic counting rule is the factorial, which applies to the situation where there are n possibilities for the first action, n−1 possibilities for the second action, … , 2 possibilities for the (n−1)st action, and 1 possibility for the nth action. More formally, the product of the first n positive integers is called n factorial:
𝑛! = 𝑛(𝑛 − 1) ⋯ (2)(1)
Note: n! gives the number of ways to arrange n unique objects. For this reason, we define 0!=1, i.e. there is exactly 1 way to arrange 0 objects: do nothing!
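Python's standard library can both compute n! and enumerate the arrangements it counts; a short check using `math.factorial` and `itertools.permutations`:

```python
import math
from itertools import permutations

# 4! = 4*3*2*1 = 24, and by convention 0! = 1
four_factorial = math.factorial(4)

# n! counts the arrangements of n distinct objects:
# listing every ordering of "abcd" gives exactly 24 of them
n_arrangements = len(list(permutations("abcd")))
```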
Partial Arrangements, Permutations
Instead of rearranging all n distinct objects, we could select only “k” of the objects and then arrange those k objects. This is called a permutation. There are
𝑛𝑃𝑘 = 𝑛!/ (𝑛−𝑘)!
ways to choose k objects and then arrange them. This is usually pronounced “n permute k”
nPk = n(n − 1) ⋯ (n − k + 2)(n − k + 1)
= n(n − 1) ⋯ (n − k + 2)(n − k + 1) (1)(1) ⋯ (1)(1)
= n(n − 1) ⋯ (n − k + 2)(n − k + 1) [(n − k)/(n − k)] [(n − k − 1)/(n − k − 1)] ⋯ [1/1]
= n!/(n − k)!
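The formula nPk = n!/(n − k)! is easy to compute directly; a sketch (the helper name `n_permute_k` is our own, and `math.perm` is the built-in equivalent available in Python 3.8+):

```python
import math

def n_permute_k(n, k):
    # nPk = n!/(n - k)!: choose k of n distinct objects, then arrange them
    return math.factorial(n) // math.factorial(n - k)

# 5P2 = 5*4 = 20: twenty ways to pick and order 2 of 5 objects
print(n_permute_k(5, 2))         # 20
print(math.perm(5, 2))           # same quantity via the stdlib
```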