What does ‘probability’ mean? Contrast frequentist and Bayesian interpretations.
Frequentist: P(A) is the long-run relative frequency of event A in repeated identical trials.
Bayesian: P(A) is a degree of belief (a rational measure of uncertainty) given information.
Both use the same probability calculus (axioms + rules); they differ in interpretation and how parameters are treated (fixed vs random).
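The frequentist reading can be illustrated by simulation: the relative frequency of an event settles near its probability as trials accumulate. A minimal sketch with a simulated fair coin (the seed and trial count are arbitrary choices):

```python
import random

# Long-run relative frequency: flip a simulated fair coin many times
# and watch the fraction of heads settle near the true P(heads) = 0.5.
random.seed(0)  # arbitrary seed, for reproducibility
n_trials = 100_000
heads = sum(random.random() < 0.5 for _ in range(n_trials))
freq = heads / n_trials  # relative-frequency estimate of P(heads)
```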
Define sample space Ω, event A, and outcome ω. Give an example.
Sample space Ω: set of all possible outcomes ω of an experiment.
Event A: subset of Ω.
Example: roll a die. Ω={1,2,3,4,5,6}. Event A='even'={2,4,6}. Outcome ω might be 5.
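The die example translates directly into finite sets, with the classical (equally likely) rule P(A)=|A|/|Ω|:

```python
# Sample space, event, and outcome for a single die roll.
omega = {1, 2, 3, 4, 5, 6}       # sample space Ω
even = {2, 4, 6}                 # event A = 'even', a subset of Ω
outcome = 5                      # one particular outcome ω
p_even = len(even) / len(omega)  # equally likely outcomes: |A| / |Ω|
```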
State the 3 Kolmogorov axioms of probability.
1) Non-negativity: for any event A, P(A) ≥ 0.
2) Normalization: P(Ω) = 1.
3) Additivity: for disjoint events A∩B=∅, P(A∪B)=P(A)+P(B). (Countable additivity in full generality.)
Derive the complement rule: P(Aᶜ)=1−P(A).
Because A and Aᶜ are disjoint and A∪Aᶜ=Ω.
So 1=P(Ω)=P(A∪Aᶜ)=P(A)+P(Aᶜ) ⇒ P(Aᶜ)=1−P(A).
Write P(A∪B) in terms of P(A), P(B), and P(A∩B).
Inclusion–exclusion for 2 events:
P(A∪B)=P(A)+P(B)−P(A∩B).
Special case: if A,B disjoint then P(A∩B)=0 and you recover additivity.
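Inclusion–exclusion can be checked exactly on the die sample space, using rational arithmetic so no rounding intrudes (the events chosen here are arbitrary examples):

```python
from fractions import Fraction

# Verify P(A∪B) = P(A) + P(B) − P(A∩B) for equally likely outcomes.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    # classical probability: |event| / |Ω|, exact via Fraction
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # even
B = {4, 5, 6}   # at least 4
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
```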
State inclusion–exclusion for P(A∪B∪C).
P(A∪B∪C)=P(A)+P(B)+P(C)
−P(A∩B)−P(A∩C)−P(B∩C)
+P(A∩B∩C).
Define conditional probability P(A|B). When is it defined?
If P(B)>0, then P(A|B) = P(A∩B)/P(B).
Interpretation: restrict the sample space to B and renormalize.
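The "restrict and renormalize" reading can be verified directly: counting A∩B inside B gives the same answer as the ratio definition (die events chosen as an arbitrary example):

```python
from fractions import Fraction

# P(A|B) = P(A∩B)/P(B), checked against counting within B directly.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {4, 5, 6}   # at least 4

def prob(event):
    return Fraction(len(event), len(omega))

p_A_given_B = prob(A & B) / prob(B)          # ratio definition
renormalized = Fraction(len(A & B), len(B))  # restrict to B, recount
```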
Express P(A∩B) using P(A|B) and P(B) (and the symmetric form).
P(A∩B)=P(A|B)P(B)=P(B|A)P(A), assuming the conditioning event has positive probability.
State the law of total probability for a partition {B_i}.
If {B_i} are disjoint, cover Ω, and P(B_i)>0, then for any event A:
P(A)=Σ_i P(A|B_i)P(B_i).
Think: break A into pieces inside each B_i.
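A small worked example (the two-urn setup and its numbers are hypothetical): pick urn 1 or urn 2 with probability 1/2 each, where urn 1 holds 3 red and 1 blue ball and urn 2 holds 1 red and 3 blue.

```python
from fractions import Fraction

# Law of total probability: P(red) = Σ_i P(red | urn_i) P(urn_i).
p_B = [Fraction(1, 2), Fraction(1, 2)]          # P(B_i): which urn is picked
p_A_given_B = [Fraction(3, 4), Fraction(1, 4)]  # P(red | urn i)

p_red = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
```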
State Bayes’ theorem and interpret each term.
Bayes: P(A|B)= P(B|A)P(A) / P(B), with P(B)>0.
P(A)=prior, P(B|A)=likelihood, P(B)=evidence/normalizer, P(A|B)=posterior.
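Each term appears concretely in the classic diagnostic-test calculation (all rates below are hypothetical round numbers):

```python
from fractions import Fraction

# Bayes' theorem: posterior = likelihood × prior / evidence.
prior = Fraction(1, 100)     # P(disease)
likelihood = Fraction(9, 10) # P(+ | disease), the test's sensitivity
false_pos = Fraction(1, 10)  # P(+ | no disease)

# evidence P(+) via the law of total probability
evidence = likelihood * prior + false_pos * (1 - prior)
posterior = likelihood * prior / evidence  # P(disease | +)
```

Despite a 90%-sensitive test, the low prior keeps the posterior small — the standard base-rate lesson.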
Write Bayes’ theorem in odds form for hypotheses H1 vs H0.
Posterior odds = prior odds × Bayes factor.
P(H1|D)/P(H0|D) = [P(D|H1)/P(D|H0)] × [P(H1)/P(H0)].
The Bayes factor measures evidence from data D.
What is n! and when does it appear in counting?
n! = n·(n−1)·…·2·1 (with 0!=1).
Counts the number of ways to order n distinct objects (permutations of length n).
How many ways to arrange r objects chosen from n distinct objects (no repetition)? Why does order matter?
Permutations: P(n,r)= n·(n−1)·…·(n−r+1)= n!/(n−r)!.
Order matters because different sequences correspond to different outcomes (e.g., gold/silver/bronze).
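A quick check of the formula against Python's built-in, using the podium example (8 runners is an arbitrary choice):

```python
import math

# P(n, r) = n!/(n−r)!: ordered selections without repetition.
# Example: gold/silver/bronze assignments among 8 runners.
n, r = 8, 3
podiums = math.perm(n, r)                        # 8·7·6
by_formula = math.factorial(n) // math.factorial(n - r)
```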
How many ways to choose r objects from n distinct objects (no repetition) when order does not matter? Why not?
Combinations: C(n,r)= n choose r = n!/[r!(n−r)!].
Order doesn’t matter because selections are sets: {a,b}={b,a}.
You can derive it by dividing permutations by r! (all r! orders represent the same set).
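The divide-by-r! derivation can be checked numerically (n and r are arbitrary small values):

```python
import math

# C(n, r) = P(n, r) / r!: each r-element set corresponds to r! orderings.
n, r = 8, 3
subsets = math.comb(n, r)
from_perms = math.perm(n, r) // math.factorial(r)
```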
How many distinct permutations of n items with counts n1,n2,…,nk (sum=n)?
Multiset permutations: n!/(n1! n2! … nk!).
Reason: start with n! orders, but swapping identical items doesn’t create a new arrangement; divide by each group’s factorial.
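The formula can be verified by brute-force enumeration on a small multiset (the word below is an arbitrary example with counts 2, 2, 1):

```python
import math
from itertools import permutations

# Multiset permutations: n!/(n1! n2! … nk!) vs. brute-force count.
word = "AABBC"  # n = 5 with counts A×2, B×2, C×1
n = len(word)
formula = math.factorial(n) // (
    math.factorial(2) * math.factorial(2) * math.factorial(1)
)
brute = len(set(permutations(word)))  # distinct orderings, deduplicated
```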
How many ways to choose r items from n types with repetition allowed (order irrelevant)?
Stars and bars: number of nonnegative integer solutions to x1+…+xn=r is C(n+r−1, r).
Encode each selection as r stars (chosen items) and n−1 bars (separators between the n types); choosing which r of the n+r−1 symbol positions are stars gives C(n+r−1, r).
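The count can be cross-checked by enumerating multisets with `itertools` (small n and r chosen arbitrarily):

```python
import math
from itertools import combinations_with_replacement

# Stars and bars: multisets of size r from n types number C(n+r−1, r).
n_types, r = 4, 3
multisets = len(list(combinations_with_replacement(range(n_types), r)))
formula = math.comb(n_types + r - 1, r)
```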
State the binomial theorem and connect it to combinations.
(a+b)^n = Σ_{k=0}^n C(n,k) a^{n−k} b^k.
C(n,k) counts ways to choose which k of the n factors contribute a ‘b’ term (order of factors irrelevant).
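With integer a, b the expansion can be checked exactly against direct exponentiation (the values of a, b, n are arbitrary):

```python
import math

# Binomial theorem: (a+b)^n = Σ_k C(n,k) a^(n−k) b^k.
a, b, n = 3, 5, 6
expansion = sum(math.comb(n, k) * a**(n - k) * b**k for k in range(n + 1))
direct = (a + b) ** n
```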
Use inclusion–exclusion to count |A∪B| and explain why subtraction is needed.
|A∪B|=|A|+|B|−|A∩B|.
Adding |A| and |B| double-counts elements in both sets; subtract once to correct.
What is a random variable (RV)? Discrete vs continuous?
An RV X is a function from outcomes ω∈Ω to real numbers: X(ω)∈ℝ.
Discrete: takes countable values with PMF p(x)=P(X=x).
Continuous: takes values on intervals; probabilities come from integrals of a PDF f(x), with P(X=x)=0.
Define the CDF F_X(x). List key properties.
F_X(x)=P(X≤x).
Properties: non-decreasing; right-continuous; limits: F(−∞)=0, F(∞)=1.
For any a<b: P(a< X ≤ b)=F(b)−F(a).
Explain PMF vs PDF and how probabilities are computed in each case.
Discrete: PMF p(x)=P(X=x). For set S: P(X∈S)=Σ_{x∈S} p(x).
Continuous: PDF f(x)≥0 with ∫ f(x) dx =1 and P(a≤X≤b)=∫_a^b f(x) dx.
PDF is not a probability at a point; it is a density.
How are PDF and CDF related for a continuous RV?
F(x)=∫_{−∞}^x f(t)dt.
If F is differentiable, f(x)=F'(x).
Probabilities come from area: P(a<X≤b)=F(b)−F(a)=∫_a^b f(t)dt.
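The area relation can be checked numerically for a concrete distribution — here an Exponential(λ), with F(x)=1−e^(−λx) and f(x)=λe^(−λx); the midpoint-rule integration is illustrative, not production numerics, and λ, a, b are arbitrary:

```python
import math

# Check P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(t) dt for Exponential(λ).
lam = 1.5
a, b = 0.5, 2.0

def F(x):
    return 1 - math.exp(-lam * x)  # exponential CDF

def f(x):
    return lam * math.exp(-lam * x)  # exponential PDF

# simple midpoint-rule approximation of the integral of f over (a, b]
steps = 100_000
h = (b - a) / steps
integral = sum(f(a + (i + 0.5) * h) for i in range(steps)) * h
exact = F(b) - F(a)
```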
How does the CDF look for a discrete RV? What do the jumps mean?
A discrete CDF is a step function.
Jump size at x equals P(X=x).
Formally: P(X=x)=F(x)−lim_{t→x^-}F(t).
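For a fair die the step function and its jumps are easy to compute exactly; with integer support, the left limit at x is just the CDF one step below:

```python
from fractions import Fraction

# Discrete CDF as a step function; the jump at x equals P(X = x).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die

def cdf(x):
    return sum(p for k, p in pmf.items() if k <= x)

jump_at_4 = cdf(4) - cdf(3)  # integer support, so cdf(3) is the left limit
```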
Define survival function S(x) and hazard h(x). When are they used?
Survival: S(x)=P(X>x)=1−F(x).
Hazard (continuous): h(x)=f(x)/S(x).
Used in time-to-event / reliability / survival analysis; exponential has constant hazard.
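The constant-hazard claim for the exponential follows from f(x)=λe^(−λx) and S(x)=e^(−λx), so h(x)=f(x)/S(x)=λ at every x; a quick numerical check (λ and the sample points are arbitrary):

```python
import math

# Exponential(λ): hazard h(x) = f(x)/S(x) is the constant λ.
lam = 2.0

def pdf(x):
    return lam * math.exp(-lam * x)   # f(x)

def survival(x):
    return math.exp(-lam * x)         # S(x) = 1 − F(x)

hazards = [pdf(x) / survival(x) for x in (0.1, 1.0, 5.0)]
```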