What are the two problems we often want to solve with sampling using Monte Carlo methods?
1) Generating samples from a target distribution p(x).
2) Estimating expectations E[f(x)] under that distribution.
How can we approximate the expectation of f(x) using Monte Carlo?
Let x_s ~ p(x) be samples, then:
E[f(x)] = ∫ f(x) p(x) dx ≈ (1/S) Σ_s f(x_s)
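A minimal sketch of this estimator (assuming NumPy; the choices p = N(0, 1) and f(x) = x², whose true expectation is 1, are illustrative, not from the card):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 100_000
# Draw x_s ~ p(x) = N(0, 1) and average f(x_s); here f(x) = x^2,
# whose true expectation under N(0, 1) is Var(x) = 1.
x = rng.standard_normal(S)
estimate = np.mean(x**2)
```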
How can we approximate predictions using Monte Carlo sampling?
Let theta_s ~ p(theta) be samples, then:
p(y|x) = ∫ p(y|x, theta) p(theta) d theta ≈ (1/S) Σ_s p(y|x, theta_s)
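As a sketch of averaging predictions over parameter samples (a hypothetical coin-flip model: Beta(2, 2) samples stand in for draws of theta, and for a Bernoulli likelihood p(y=1|x, theta_s) is just theta_s):

```python
import numpy as np

rng = np.random.default_rng(1)
S = 10_000
# Hypothetical parameter samples theta_s ~ Beta(2, 2) for a coin's bias.
theta = rng.beta(2.0, 2.0, size=S)
# For a Bernoulli model, p(y=1 | x, theta_s) = theta_s, so the Monte Carlo
# predictive p(y=1 | x) is simply the average of the sampled biases.
p_heads = np.mean(theta)
```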
Is the Monte Carlo estimator for the expectation biased?
No, it is unbiased: E[(1/S) Σ_s f(x_s)] = E[f(x)].
What happens to the variance of the Monte Carlo estimator for the expectation when the number of samples, S, increases?
It decreases proportionally to 1/S: Var[estimator] = Var[f(x)]/S.
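This 1/S scaling can be checked empirically (a sketch assuming NumPy; estimating E[x] for x ~ N(0, 1) is an illustrative choice): quadrupling S should shrink the estimator's variance by roughly 4x.

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_estimate(S):
    # Monte Carlo estimate of E[x] for x ~ N(0, 1) using S samples.
    return rng.standard_normal(S).mean()

reps = 2000
var_small = np.var([mc_estimate(100) for _ in range(reps)])
var_large = np.var([mc_estimate(400) for _ in range(reps)])
ratio = var_small / var_large   # should be close to 4, the factor by which S grew
```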
How can we use Monte Carlo to sample discrete values?
Sample u ~ U(0,1), partition [0,1] into intervals whose lengths equal the outcome probabilities, and return the outcome whose interval contains u.
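The interval trick above is just inversion of the cumulative distribution; a sketch assuming NumPy (the three-outcome distribution is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
probs = np.array([0.2, 0.5, 0.3])      # p(x = 0), p(x = 1), p(x = 2)
cdf = np.cumsum(probs)                 # interval boundaries: [0.2, 0.7, 1.0]
u = rng.uniform(size=100_000)
samples = np.searchsorted(cdf, u)      # index of the interval each u falls into
freqs = np.bincount(samples, minlength=3) / len(samples)
```

The empirical frequencies `freqs` should match `probs` up to Monte Carlo noise.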
What is the rejection sampling algorithm?
1) Choose a proposal q(x) and a constant M such that M*q(x) >= p_hat(x) for all x.
2) Sample x ~ q(x) and u ~ U(0, M*q(x)).
3) Accept x if u < p_hat(x); otherwise reject it and repeat.
What are the problems of rejection sampling?
1. It can be hard to find a proposal q and constant M with M*q(x) >= p_hat(x) everywhere.
2. We might reject most samples, especially in high dimensions.
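A sketch of rejection sampling (assuming NumPy; the unnormalized target p_hat(x) = x(1-x) on [0,1], i.e. Beta(2,2) up to a constant, and the uniform proposal are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def p_tilde(x):
    # Unnormalized target: Beta(2, 2) up to its normalizer Z = 1/6.
    return x * (1 - x)

# p_tilde peaks at 0.25, so M * q(x) = 0.25 envelopes it when q = U(0, 1).
M = 0.25
N = 200_000
x = rng.uniform(size=N)               # proposals x ~ q
u = rng.uniform(size=N) * M           # heights u ~ U(0, M * q(x))
accepted = x[u < p_tilde(x)]          # keep proposals landing under p_tilde
acceptance_rate = len(accepted) / N   # equals Z / M = (1/6) / 0.25 = 2/3 here
```

The accepted points are exact draws from Beta(2, 2), whose mean is 0.5; note how much work is thrown away even in this easy 1-D case.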
What is the idea behind importance sampling?
Instead of throwing away samples, we weight them by w_s = p(x_s)/q(x_s), where q is the proposal. Then:
E_p[f(x)] = E_q[f(x) w(x)] ≈ (1/S) Σ_s f(x_s) w_s
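A sketch of the weighted estimator (assuming NumPy; target p = N(0, 1), proposal q = N(0, 2²), and f(x) = x² with true expectation 1 are illustrative choices, with weights computed in log space for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(5)
S = 200_000
# Proposal q = N(0, 2^2); target p = N(0, 1). Estimate E_p[x^2] = 1.
x = rng.normal(0.0, 2.0, size=S)
log_p = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)                    # log N(x | 0, 1)
log_q = -0.5 * (x / 2.0)**2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)  # log N(x | 0, 4)
w = np.exp(log_p - log_q)             # importance weights w_s = p(x_s) / q(x_s)
estimate = np.mean(x**2 * w)
```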
Is importance sampling unbiased?
Yes, as long as q(x) > 0 wherever p(x) > 0.
What are the disadvantages of importance sampling?
The variance of the estimator can be very large (even infinite) when q matches p poorly: a few samples with huge weights can dominate the estimate, and the mismatch grows quickly in high dimensions.
What is the idea behind MCMC (Markov Chain Monte Carlo)?
Instead of generating independent samples, use a proposal that depends on the previous sample:
p(x_{t+1}|x_t) = T(x_{t+1}|x_t), where T is called a transition operator. Example:
T(x_{t+1}|x_t) = N(x_{t+1} | x_t, sigma^2 * I)
Name a necessary property for the MCMC transition operator.
The Markov property:
p(x_{t+1} | x_t, x_{t-1}, ..., x_0) = p(x_{t+1} | x_t)
What are the properties of the samples from MCMC?
They are not independent, but the estimator is still unbiased once the chain has reached its equilibrium distribution, so we can take the average.
What are the four different ways a markov chain might behave?
1) Diverge
2) Converge to a single point
3) Converge to a limit cycle
4) Converge to an equilibrium distribution
What kind of behaviour do we want from our MCMC markov chain?
Converge to an equilibrium distribution
How can we reduce the dependency between our samples?
Thinning: discard most of the samples, keeping only every M'th one.
What are the two conditions for converging to an equilibrium distribution?
1) The target distribution p* is invariant (stationary) under the transition operator T.
2) Ergodicity: the chain can reach any state from any starting point.
These conditions ensure that we have exactly one equilibrium distribution.
Name a sufficient condition for p* being invariant.
Detailed balance: p*(x) T(x'|x) = p*(x') T(x|x')
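Detailed balance can be checked directly on a small discrete chain (a sketch assuming NumPy; the 3-state target and uniform symmetric proposal are illustrative). With Metropolis acceptance min(1, p*(j)/p*(i)), the flow p*(i) T(j|i) is symmetric, and invariance p* T = p* follows:

```python
import numpy as np

# Target p* on 3 states and a symmetric uniform proposal.
p = np.array([0.2, 0.3, 0.5])
q = np.full((3, 3), 1 / 3)
A = np.minimum(1.0, p[None, :] / p[:, None])   # acceptance probability i -> j
T = q * A                                       # off-diagonal transition probs
np.fill_diagonal(T, 0.0)
np.fill_diagonal(T, 1.0 - T.sum(axis=1))        # rejected moves stay put
balance = p[:, None] * T                        # entry (i, j) = p*(i) T(j|i)
```

`balance` equals its own transpose (detailed balance), and `p @ T == p` (invariance).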
What is the Metropolis-Hastings algorithm?
Assume the unnormalized density p_hat(x) = Z p(x) can be evaluated easily. Then:
1) Propose x' ~ q(x'|x_t).
2) Accept with probability A = min(1, [p_hat(x') q(x_t|x')] / [p_hat(x_t) q(x'|x_t)]).
3) If accepted, set x_{t+1} = x'; otherwise set x_{t+1} = x_t.
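A sketch of Metropolis-Hastings with a Gaussian random walk (assuming NumPy; the unnormalized target exp(-x²/2), i.e. a standard normal, is an illustrative choice so the output can be sanity-checked; with a symmetric proposal the Hastings ratio reduces to p_hat(x')/p_hat(x_t)):

```python
import numpy as np

rng = np.random.default_rng(6)

def log_p_hat(x):
    # Unnormalized log-density of N(0, 1); Z is never needed.
    return -0.5 * x**2

sigma = 1.0                                  # random-walk step size
x = 0.0
samples = []
for _ in range(50_000):
    x_prop = x + sigma * rng.standard_normal()
    # Symmetric proposal: accept with prob min(1, p_hat(x')/p_hat(x)).
    if np.log(rng.uniform()) < log_p_hat(x_prop) - log_p_hat(x):
        x = x_prop
    samples.append(x)                        # rejected moves repeat x_t
samples = np.asarray(samples[5_000:])        # discard burn-in
```

The retained chain should have mean ≈ 0 and variance ≈ 1, matching the target.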
Let q(x'|x_t) = N(x'|x_t, sigma^2) and let sigma be the step size. What happens with
1) Large step sizes?
2) Small step sizes?
1. Most proposals land in low-probability regions and get rejected.
2. We might not explore enough: the chain moves slowly and successive samples are highly correlated.
What distributions do we need for Gibbs sampling?
We need the full conditional distribution of each parameter given all the others, p(x_i | x_{-i}).
What is the requirement for ergodicity (any state can be reached from any other) in Gibbs sampling?
All conditionals are greater than 0.
What are the steps for finding the conditional distribution?
1) Write down the joint distribution.
2) Drop all factors that do not involve the parameter of interest.
3) Recognize the remaining expression as proportional to a known distribution, or normalize it directly.
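A sketch of Gibbs sampling where the conditionals are known in closed form (assuming NumPy; the bivariate normal with correlation rho = 0.8 is an illustrative choice, since then x1 | x2 ~ N(rho*x2, 1 - rho^2) and symmetrically for x2):

```python
import numpy as np

rng = np.random.default_rng(7)
rho = 0.8
cond_sd = np.sqrt(1 - rho**2)       # sd of each full conditional
x1, x2 = 0.0, 0.0
samples = []
for _ in range(50_000):
    # Alternately resample each coordinate from its full conditional.
    x1 = rng.normal(rho * x2, cond_sd)
    x2 = rng.normal(rho * x1, cond_sd)
    samples.append((x1, x2))
samples = np.asarray(samples[5_000:])          # discard burn-in
emp_rho = np.corrcoef(samples.T)[0, 1]         # should approach rho
```

Both conditionals are positive everywhere, so the ergodicity requirement above is satisfied and the chain converges to the target bivariate normal.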