In a latent variable model, we assume that the observed variables are caused by, or generated by, some _____________ ___________ factors, which represent the "true" state of the world. These models are harder to ____ than models with no latent variables.
underlying latent; fit
What are the advantages of latent variable models?
1) LVMs often have fewer parameters than models that directly represent correlation in the visible space.
2) The hidden variables in an LVM can serve as a bottleneck, which computes a compressed representation of the data → basis of unsupervised learning.
An LVM is any probabilistic model in which some variables are always latent or hidden. Give an example of an LVM.
Mixture model
What's an LVM?
It is any probabilistic model in which some variables are always latent or hidden.
Interpret an image in terms of an underlying 3D scene, represented by objects and surfaces. The forward mapping from hidden state to visible state is often ______________ (different latent values may give rise to the same observation). The inverse mapping is ___________.
many-to-one; ill-posed
Mixing weights can also be called ___________ _____________.
mixture coefficients
If a bottle belongs to juice type A or B, its sugar concentration is assumed to be generated from a __________ distribution specific to that _______.
Gaussian; type
The most widely used mixture model is the mixture of ____________.
Gaussians
By using a sufficient number of ___________ and by adjusting their means and ______________ as well as the coefficients in the __________ combination, a GMM can be used to approximate any __________ defined on R^D.
We can use ______________ ______________ to set the values of the parameters that define the GMM distribution.
Gaussians; covariances; linear; density; maximum likelihood
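Filled in, the GMM density these cards refer to is the linear combination of Gaussian components:

```latex
p(\mathbf{x}) \;=\; \sum_{k=1}^{K} \pi_k \,
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
```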
In the context of mixture models, the likelihood function is given by a ____________ of the probabilities of each ____________ when given the set of ______________.
product; datapoint; parameters
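As a small illustrative sketch (assuming NumPy and, for brevity, univariate Gaussian components, as in the sugar-concentration example above): the log-likelihood is a sum over datapoints of the log of the mixture density, which is where the "summation over k inside the logarithm" in the next card comes from.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    # Univariate Gaussian density N(x | mu, var).
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmm_log_likelihood(x, weights, means, variances):
    # The likelihood is a product over datapoints of the mixture density,
    # so the log-likelihood is a sum of log(sum over components).
    mixture = sum(w * gaussian_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances))
    return np.sum(np.log(mixture))
```

For example, `gmm_log_likelihood(np.array([0.0, 1.0]), [0.5, 0.5], [0.0, 1.0], [1.0, 1.0])` evaluates the data under a two-component mixture.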
The maximum likelihood solution for the parameters no longer has a ________-form ____________ solution due to the presence of the summation over k inside the ____________.
closed; analytical; logarithm
To maximize likelihood, we can employ a powerful framework called ____________-_____________ (EM).
Expectation; Maximization
The mean µ_{k} for the k-th Gaussian component is obtained by taking a ___________ _________ of all the points in the dataset, in which the ____________ factor for data point x_{n} is given by the posterior probability r_{nk} that component k is responsible for _____________ x_{n}.
weighted mean; weighting; generating
Similarly to the mean, the covariance Σ_{k} for the k-th Gaussian component is proportional to the weighted _____________ ___________ _____________, i.e., each data point is weighted by the corresponding posterior probability.
empirical scatter matrix
The mixing coefficient for the k-th component is given by the average _______________ that component takes for explaining the _____________.
responsibility; datapoints
The estimation of the mixing coefficients makes use of a ______________ multiplier.
Lagrange
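The three results above (mean, covariance, mixing coefficient) can be summarised, with $N_k = \sum_n r_{nk}$ the effective number of points assigned to component $k$:

```latex
r_{nk} = \frac{\pi_k \,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
              {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)},
\qquad N_k = \sum_{n=1}^{N} r_{nk},
```

```latex
\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk}\, \mathbf{x}_n, \qquad
\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk}\,
(\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\top}, \qquad
\pi_k = \frac{N_k}{N}.
```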
The previous results for mean, covariances, and mixing coefficients don’t constitute a _________-form solution for the parameters of the mixture model.
The responsibilities (or posterior probabilities) r_{nk}, which are the conditional probabilities of ____ given ____, depend on those parameters in a complex way.
These results suggest a simple iterative scheme for finding a solution to the _____________ _____________ problem. It turns out to be an instance of the _______ algorithm for the particular case of the Gaussian mixture model.
closed; z; x; maximum likelihood; EM
Explain the EM algorithm for GMMs.
1) Initialize the means, covariances and mixing coefficients, and evaluate the initial value of the log likelihood.
2) E-step: use the current values for the parameters to evaluate the posterior probabilities.
3) M-step: use these probabilities to re-estimate the means, covariances, and mixing coefficients^a.
4) Evaluate the log-likelihood (eq. 1) and check for convergence of either the parameters or the log-likelihood. If the convergence criterion is not satisfied, return to step 2^b.
^a We first evaluate the new means and then use these new values to find the covariances.
^b Each update to the parameters resulting from an E-step followed by an M-step is guaranteed to increase the log-likelihood function.
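The four steps above can be sketched for a univariate GMM (assuming NumPy; the quantile-based initialization and the function name are illustrative choices, not part of the algorithm's definition):

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=100, tol=1e-6):
    # Step 1: initialize means, covariances (variances), and mixing coefficients.
    means = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means over the data
    variances = np.full(K, np.var(x))
    weights = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Step 2 (E-step): posterior responsibilities r[n, k] from current parameters.
        dens = (np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
                / np.sqrt(2 * np.pi * variances))            # shape (N, K)
        weighted = weights * dens
        totals = weighted.sum(axis=1, keepdims=True)         # mixture density per point
        r = weighted / totals
        # Step 3 (M-step): re-estimate parameters; the new means are used
        # when re-estimating the variances (footnote a).
        Nk = r.sum(axis=0)
        means = (r * x[:, None]).sum(axis=0) / Nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / Nk
        weights = Nk / len(x)
        # Step 4: evaluate the log-likelihood and check convergence.
        ll = np.log(totals).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances, ll
```

Each pass through the loop performs one E-step/M-step pair, so the log-likelihood never decreases between iterations (footnote b).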
What is the goal of the EM algorithm for GMMs?
Given a GMM, the goal is to maximize the likelihood function with respect to the parameters.
What’s a full covariance matrix?
It means the components may independently adopt any position and shape.
What’s a tied covariance matrix?
It means they have the same shape, but the shape may be anything.
What’s a diagonal covariance matrix?
It means the contour axes are oriented along the coordinate axes, but otherwise, the eccentricities may vary between components.
What’s a spherical covariance matrix?
It is a "diagonal" situation with circular contours (spherical in higher dimensions).
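A small sketch of the four covariance structures in 2-D (assuming NumPy; the matrix values are arbitrary illustrations):

```python
import numpy as np

D = 2
A = np.array([[2.0, 0.6], [0.4, 1.0]])  # arbitrary matrix used to build a valid covariance

full = A @ A.T                     # full: any symmetric positive-definite matrix
tied = np.stack([full, full])      # tied: every component shares the same matrix (same shape)
diagonal = np.diag([2.0, 0.5])     # diagonal: axis-aligned contours, per-axis variances
spherical = 1.5 * np.eye(D)        # spherical: a single variance, circular contours
```

Fewer free parameters per component (full > diagonal > spherical) means less flexible contours but easier, better-conditioned estimation.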
K-means is a special case of EM. True or False?
True
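A brief sketch of why (assuming NumPy; the data and means are illustrative): if every component shares one spherical covariance eps, then as eps shrinks the E-step responsibilities harden into the one-hot cluster assignments of K-means, and the M-step mean update becomes the centroid update.

```python
import numpy as np

x = np.array([0.0, 0.2, 5.0, 5.3])
means = np.array([0.0, 5.0])

def responsibilities(eps):
    # E-step with a shared spherical (here scalar) variance eps for all components;
    # equal mixing weights cancel out of the posterior.
    logw = -0.5 * (x[:, None] - means) ** 2 / eps
    w = np.exp(logw - logw.max(axis=1, keepdims=True))  # shift for numerical stability
    return w / w.sum(axis=1, keepdims=True)
```

With a tiny `eps`, each point's responsibility concentrates entirely on the nearest mean, which is exactly the K-means assignment step.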