ELBO (evidence lower bound; marginal likelihood lower bound)
F(v) := E_q[log( p(x,z) / q(z|v) )]
F(v) = E_q[log( p(x|z) )] - KL(q(z|v) || p(z))
Score Function
score function (log derivative trick)
\nabla_v log(q(z|v)) = \nabla_v q(z|v) / q(z|v)
\Leftrightarrow \nabla_v q(z|v) = \nabla_v log(q(z|v)) * q(z|v)
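The identity above can be checked numerically. A minimal sketch, assuming q(z|v) = N(z; v, 1) with v the mean, comparing a finite-difference gradient of q against grad_v log q * q:

```python
import math

def q(z, v):
    # Unit-variance Gaussian density q(z|v) = N(z; v, 1)
    return math.exp(-0.5 * (z - v) ** 2) / math.sqrt(2 * math.pi)

def grad_log_q(z, v):
    # d/dv log q(z|v) = (z - v) for a unit-variance Gaussian
    return z - v

z, v, eps = 0.3, -0.5, 1e-6
# Central finite-difference approximation of grad_v q(z|v)
fd = (q(z, v + eps) - q(z, v - eps)) / (2 * eps)
# Log-derivative identity: grad_v q = grad_v log q * q
assert abs(fd - grad_log_q(z, v) * q(z, v)) < 1e-8
```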
Natural Gradient
\tilde{\nabla}_v F(v) = \mathcal{F}(v)^{-1} \nabla_v F(v)
\mathcal{F}(v) = Fisher information matrix of q(z|v) (distinct from the ELBO F)
Noisy Updates of Variational Parameters
v_{t+1} = v_t + \rho_t \hat{\nabla}_v F(v_t)
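A toy sketch of this update rule, assuming q(z|v) = N(v, 1), target p(z) = N(2, 1), a noisy one-sample pathwise gradient \hat{g}_t = -(v_t + \varepsilon - 2), and Robbins-Monro step sizes \rho_t = 1/t (all choices illustrative):

```python
import random

random.seed(0)
target = 2.0          # p(z) = N(2, 1); the optimal variational mean is v = 2
v = -1.0              # initial variational parameter
for t in range(1, 2001):
    rho = 1.0 / t     # Robbins-Monro step size: sum rho_t = inf, sum rho_t^2 < inf
    eps = random.gauss(0.0, 1.0)
    grad_hat = -(v + eps - target)   # noisy one-sample gradient estimate
    v += rho * grad_hat              # v_{t+1} = v_t + rho_t * grad_hat
assert abs(v - target) < 0.2         # converges to the target mean
```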
\nabla_v ELBO using the score function
\nabla_v F(v) = E_q[\nabla_v log(q(z|v)) * ( log(p(x,z)) - log(q(z|v)) )]
Estimate this expectation with Monte Carlo: draw samples z_s ~ q(z|v) and average the bracketed term.
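A minimal sketch of the Monte Carlo score-function estimator, assuming q(z|v) = N(v, 1) and using f(z) = z^2 as an illustrative stand-in for log p(x,z) - log q(z|v); the true gradient of E_q[z^2] = v^2 + 1 is 2v:

```python
import random

random.seed(0)
v = 0.7                      # variational parameter: mean of q(z|v) = N(v, 1)
f = lambda z: z ** 2         # illustrative stand-in for log p(x,z) - log q(z|v)
S = 200_000                  # number of Monte Carlo samples

# Score-function estimator: (1/S) * sum_s grad_v log q(z_s|v) * f(z_s)
est = 0.0
for _ in range(S):
    z = random.gauss(v, 1.0)
    est += (z - v) * f(z)    # grad_v log N(z; v, 1) = z - v
est /= S

# True gradient is 2v = 1.4; note how many samples the estimator needs
assert abs(est - 2 * v) < 0.1
```

The large sample count hints at the high-variance drawback noted below.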
Change of variables
|q(z|v) dz| = |p(\varepsilon) d\varepsilon|
Reparametrisation trick
Base distribution p(\varepsilon) [normal or uniform] and a deterministic transformation z = t(\varepsilon, v) s.t. z~q(z|v). Then:
\nabla_v E_{q(z|v)}[f(z)] = E_{p(\varepsilon)}[\nabla_v f(t(\varepsilon, v))]
Note, we take the expectation w.r.t. base distribution now.
Reparametrisation ELBO gradient
\nabla_v F(v) = E_{p(\varepsilon)}[\nabla_v ( log(p(x,t(\varepsilon, v))) - log(q(t(\varepsilon, v)|v)) )]
\nabla_v F(v) = E_{p(\varepsilon)}[\nabla_z ( log(p(x,z)) - log(q(z|v)) ) * \nabla_v t(\varepsilon, v)]
where z = t(\varepsilon, v)
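A minimal sketch of the pathwise estimator on the same illustrative problem as above (q(z|v) = N(v, 1), f(z) = z^2, true gradient 2v), with the transformation t(\varepsilon, v) = v + \varepsilon:

```python
import random

random.seed(0)
v = 0.7                       # variational parameter: q(z|v) = N(v, 1)
S = 10_000                    # far fewer samples than the score-function version

# Pathwise (reparametrisation) estimator of grad_v E_q[z^2]:
# z = t(eps, v) = v + eps with eps ~ N(0, 1), so by the chain rule
# grad_v f(t(eps, v)) = f'(z) * dt/dv = 2z * 1
est = sum(2 * (v + random.gauss(0.0, 1.0)) for _ in range(S)) / S

# True gradient is 2v = 1.4
assert abs(est - 2 * v) < 0.1
```

The lower variance is why the pathwise estimator typically needs far fewer samples.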
What are the score function ELBO gradient properties
+ Works for all models (continuous and discrete latent variables)
+ Works for a large class of variational approximations
- Variance can be high, thus slow convergence
What are the pathwise gradient estimator ELBO gradient properties
+ Typically much lower variance than the score-function estimator
- Requires a differentiable model (continuous latent variables only)
- Requires a variational approximation q(z|v) that admits a reparametrisation z = t(\varepsilon, v)
Amortised variational inference in hierarchical Bayesian models
F(v) = E_q[log(p(x,\beta,z_{1:N})) - log(q(\beta, z_{1:N} | \lambda, \phi_{1:N}))],
where v = {\lambda, \phi_{1:N}}
F(v) = E_q[log(p(x,\beta,z_{1:N}))]
- E_q[log(q(\beta|\lambda)) + \sum_n log(q(z_n | f(x_n, \theta)))],
where \phi_n = f(x_n, \theta), f is a deep neural network
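A structural sketch of amortisation: instead of free per-datapoint parameters \phi_n, a single shared function f(x_n, \theta) outputs them. Here a toy linear map stands in for the deep network; all names and values are illustrative:

```python
def f(x_n, theta):
    # Shared inference function: maps a datapoint x_n to its variational
    # parameter phi_n (a toy linear map standing in for a deep network)
    w, b = theta
    return w * x_n + b

theta = (0.5, 1.0)            # shared parameters, learned jointly with lambda
data = [0.0, 2.0, 4.0]
# One phi_n per datapoint, but only len(theta) parameters are learned
phis = [f(x_n, theta) for x_n in data]
assert phis == [1.0, 2.0, 3.0]
```

The gain: the number of learned parameters no longer grows with N, and f generalises to unseen datapoints.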
Amortised SVI (Algorithm)
BBVI (Algorithm) [black box variational inference]
SVI (Algorithm) [stochastic variational inference]
Mean-Field Approximation (Algorithm) [CW3]