Bayes’ Theorem
Elementary Version
P(A|B) = P(A∩B)/P(B)
= P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)]
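A quick numerical sketch of the elementary version, using made-up numbers (a rare condition A with a fairly accurate test B): the denominator is the law of total probability.

```python
# Illustrative numbers (assumptions, not from the notes):
# P(A) = 0.01, P(B|A) = 0.95, P(B|A^c) = 0.05
p_a = 0.01
p_b_given_a = 0.95
p_b_given_ac = 0.05

# Denominator: P(B) by the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_ac * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # roughly 0.16: most positives are false positives
```

Note that even with a 95% accurate test, the posterior P(A|B) stays low because the prior P(A) is small.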
Bayes’ Theorem
Events as Discrete Random Variables
P(X=x|Y=y) = P(Y=y|X=x)P(X=x) / [Σ_t P(Y=y|X=t)P(X=t)]
Bayes’ Theorem
Events as Continuous Random Variables
f(x|y) = f(y|x)f(x) / [∫f(y|t)f(t)dt]
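The continuous version can be checked numerically by approximating the integral in the denominator with a Riemann sum. A minimal sketch, assuming a standard normal prior f(x) and a normal likelihood f(y|x) with unit variance (all numbers illustrative); the exact posterior is then N(y/2, 1/2).

```python
import math

def normal_pdf(z, mu, sd):
    # Density of N(mu, sd^2) at z.
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

y = 1.0
step = 0.001
grid = [i * step - 10 for i in range(20001)]  # t in [-10, 10]

# Denominator: ∫ f(y|t) f(t) dt, approximated by a Riemann sum.
norm = sum(normal_pdf(y, t, 1.0) * normal_pdf(t, 0.0, 1.0) for t in grid) * step

# Posterior density f(x|y) at x = 0.5 via the formula above.
post = normal_pdf(y, 0.5, 1.0) * normal_pdf(0.5, 0.0, 1.0) / norm
```

With this prior and likelihood, the posterior is N(0.5, 0.5), so `post` should match that density at its mode, 1/√π.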
Frequentist vs. Bayesian Approach
Statistical Inference
Parameter Definition
Frequentist vs. Bayesian
- a frequentist defines a parameter as a fixed, unknown constant
- a Bayesian defines a parameter as a random variable with its own distribution
Model
Frequentist vs. Bayesian
- frequentist: f(x; θ), with θ a fixed constant
- Bayesian: f(x|θ) or p(x|θ), conditioning on the random θ
Bayesian Models
Influence of the Prior
How do you determine the posterior distribution?
π(θ|x) = f(x|θ)π(θ) / [∫f(x|t)π(t)dt]
∝ f(x|θ)π(θ)
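The proportionality π(θ|x) ∝ f(x|θ)π(θ) is easiest to see in a conjugate case. A sketch with illustrative numbers (a Beta(2, 2) prior and x = 7 successes in n = 10 binomial trials), where conjugacy gives the posterior in closed form:

```python
# Beta prior: π(θ) ∝ θ^(α-1) (1-θ)^(β-1)
# Binomial likelihood: f(x|θ) ∝ θ^x (1-θ)^(n-x)
# Multiplying gives a Beta(α + x, β + n - x) posterior.
alpha, beta = 2.0, 2.0   # prior hyperparameters (illustrative)
n, x = 10, 7             # data (illustrative)

post_alpha = alpha + x
post_beta = beta + n - x
post_mean = post_alpha / (post_alpha + post_beta)
print(post_alpha, post_beta, post_mean)
```

No integration is needed: matching exponents of θ and (1-θ) identifies the posterior as another Beta distribution.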
What can you do with a posterior distribution?
- test hypotheses
- compute point estimates (e.g. the posterior mean)
- construct credible intervals
Decision Theory
- any time you make a decision, you can lose something
- risk = expected loss
- the goal is to make decisions that minimise risk
d = d(x) ∈ D
- where d(x) is a decision based on the data and D is the decision space
Decision Space
Loss Function
L = L(d(x), θ) ≥ 0
-when X and θ are random, L is a real-valued random variable
Expected Loss
E(L) = E(E(L|X))
= ∫ [∫ L(d(x), θ) dπ(θ|x)] dP(x)
Bayes’ Decision
Prior Distribution
Beta Distribution
π(θ) = Γ(α+β)/[Γ(α)Γ(β)] * θ^(α-1) * (1-θ)^(β-1)
- for 0 < θ < 1, α > 0, β > 0
Properties of the Beta Distribution
- defined on [0,1]
- E(θ) = α/(α+β)
- Var(θ) = αβ / [(α+β)²(α+β+1)]
- for α = β = 1, the distribution is uniform
- can assume a variety of shapes depending on α and β
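The mean and variance formulas can be sanity-checked by Monte Carlo with the standard library, assuming illustrative values α = 2, β = 5:

```python
import random

random.seed(0)
a, b = 2.0, 5.0  # illustrative hyperparameters
samples = [random.betavariate(a, b) for _ in range(100_000)]

# Monte Carlo estimates of the first two moments.
mc_mean = sum(samples) / len(samples)
mc_var = sum((s - mc_mean) ** 2 for s in samples) / len(samples)

# Closed-form values from the properties above.
exact_mean = a / (a + b)                          # 2/7
exact_var = a * b / ((a + b) ** 2 * (a + b + 1))  # 10/392
```

The Monte Carlo estimates should agree with the closed forms to a few decimal places at this sample size.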
Conjugate Priors
- a prior is conjugate to a likelihood if the resulting posterior belongs to the same family as the prior
- e.g. the Beta prior is conjugate to the binomial likelihood: Beta prior → Beta posterior
Loss Function
Squared Error Loss
- many different functions can be taken for the loss function, as long as they satisfy the property that a more wrong decision incurs a greater loss
-e.g. the squared error:
L(d,θ) = k (d-θ)²
- we can drop the proportionality constant k, since it does not affect the minimiser
Minimise Expected Loss
- let μ = E(θ|X=x)
- then:
E(L(d,θ) | X=x) = E((d-θ)² | X=x)
= (d-μ)² + E((θ-μ)² | X=x)    (the cross term vanishes since E(θ-μ | X=x) = 0)
= (d-μ)² + Var(θ|X=x)
- this is minimised when d = μ = E(θ|X=x)
-i.e. Bayes’ estimate under squared error loss is the posterior mean
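The result can be demonstrated numerically: estimate E((d-θ)² | X=x) over a grid of candidate decisions d and check that the minimiser sits at the posterior mean. A sketch using a Beta(9, 5) posterior as an illustrative stand-in (any posterior would do), sampled with the standard library:

```python
import random

random.seed(1)
# Draws from the (illustrative) posterior π(θ|x) = Beta(9, 5).
theta = [random.betavariate(9.0, 5.0) for _ in range(5_000)]
post_mean = sum(theta) / len(theta)

def expected_loss(d):
    # Monte Carlo estimate of E((d - θ)² | X=x).
    return sum((d - t) ** 2 for t in theta) / len(theta)

# Scan candidate decisions on a grid over [0, 1]; since the expected
# squared error loss is quadratic in d, the best grid point should be
# the one nearest the posterior mean.
best = min((i * 0.005 for i in range(201)), key=expected_loss)
```

Here `best` lands on the grid point closest to `post_mean`, which in turn is close to the exact posterior mean 9/14 ≈ 0.643.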