In practice, the probabilistic models that we use are often quite complex, and simple algorithms for exact inference, like ______________ _____________, may be too slow for them. In fact, many interesting classes of models may not admit ________ _____________-________ solutions at all.
For this reason, a significant amount of research effort in machine learning is devoted to developing algorithms that yield approximate solutions to the inference problem.
variable elimination; exact; polynomial; time
What are the 2 main families of approximate inference algorithms?
1) Variational methods, which formulate inference as an optimization problem.
2) Sampling methods, which produce answers by repeatedly generating random numbers from a distribution of interest.
_____________ methods have historically been the main way of performing approximate inference, although over the past 15 years, _____________ methods have emerged as viable (and often superior) alternatives.
Sampling; variational
For most probabilistic models of practical interest, _________ inference is intractable, and so we have to resort to some form of approximation. We now consider approximate inference methods based on ____________ ___________, also known as Monte Carlo techniques: a stochastic approach to solving numerical integration problems.
exact; numerical sampling
For some applications, the posterior distribution over _____________ variables will be of direct interest in itself. For most situations, the posterior distribution is required primarily for the purpose of evaluating _______________, for example, to make _____________.
unobserved; expectations; predictions
The fundamental problem that we wish to address involves finding the _____________ of some function f(z) with respect to a probability distribution p(z).
The components of z might comprise ____________ or _____________ variables or some combination of the two.
expectation; discrete; continuous
We want to compute the ___________ value of some function. We shall suppose that such expectations are too ___________ to be evaluated exactly using _____________ techniques.
expected; complex; analytical
The general idea behind sampling methods is to obtain a set of samples drawn _______________ from the distribution p(z). This allows the expectation to be approximated by a __________ ________. This is called __________ _________ integration. The accuracy of the estimator does not depend on the dimensionality of z; it depends only on the ___________ _____ ____________.
independently; finite sum; Monte Carlo; number of samples
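The finite-sum approximation above can be sketched in a few lines. This is an illustrative example, not from the source: the function f(z) = z² and the standard-normal p(z) are arbitrary choices, so E[f(z)] = Var(z) = 1.

```python
import random

def mc_expectation(f, sampler, n_samples):
    """Approximate E[f(z)] by the average of f over i.i.d. samples from p(z)."""
    return sum(f(sampler()) for _ in range(n_samples)) / n_samples

random.seed(0)
# f(z) = z^2 under p(z) = N(0, 1); the true expectation is 1.0.
estimate = mc_expectation(lambda z: z * z,
                          lambda: random.gauss(0.0, 1.0),
                          100_000)
```

The estimator's accuracy is governed by the number of samples, not by the dimensionality of z, which is exactly the property the card states.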
In the case of a ___________ graph with no observed variables, it is straightforward to sample from the joint distribution (assuming that it is possible to sample from the conditional distributions at each node) using the ____________ _____________.
directed; ancestral sampling
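Ancestral sampling can be sketched on a toy directed chain z1 → z2 → z3. The linear-Gaussian conditionals here are hypothetical, chosen so the marginals are easy to check: we visit nodes in topological order and sample each from p(zi | parents(zi)).

```python
import random

def ancestral_sample():
    z1 = random.gauss(0.0, 1.0)   # root node: sample from p(z1)
    z2 = random.gauss(z1, 1.0)    # sample from p(z2 | z1)
    z3 = random.gauss(z2, 1.0)    # sample from p(z3 | z2)
    return z1, z2, z3

random.seed(1)
samples = [ancestral_sample() for _ in range(50_000)]
# Marginally z3 ~ N(0, 3), since the three unit variances add up.
```

One pass through the graph yields one exact sample from the joint distribution, with each parent instantiated before its children.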
In the case of probability distributions defined by an _____________ graph, there is no ______-_______ sampling strategy that will sample from the prior distribution with no observed variables. Instead, computationally more expensive techniques must be employed, such as ________ sampling.
undirected; one; pass; Gibbs
What is forward sampling?
To obtain a sample from the joint distribution, we make one pass through the set of variables in the order z1, …, zM, sampling from the conditional distributions p(zi | pai). At each step, all of the parent values will have been instantiated. After one pass through the graph, we will have obtained a sample from the joint distribution.
In a Bayesian network over M variables, forward sampling allows us to sample from the joint distribution x ∼ p(x) in linear time, O(M), by taking exactly one multinomial sample from each CPD.
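For a discrete Bayesian network, "one multinomial sample per CPD" looks like the following sketch over a hypothetical two-node network A → B (the CPD numbers are made up for illustration):

```python
import random

P_A = 0.3                        # p(A = 1)
P_B_GIVEN_A = {1: 0.9, 0: 0.2}   # p(B = 1 | A)

def forward_sample():
    # One Bernoulli (binary multinomial) draw per CPD, in topological order.
    a = 1 if random.random() < P_A else 0
    b = 1 if random.random() < P_B_GIVEN_A[a] else 0
    return a, b

random.seed(2)
samples = [forward_sample() for _ in range(100_000)]
# Marginalizing: p(B=1) = 0.3*0.9 + 0.7*0.2 = 0.41.
```

The cost is O(M) per joint sample: exactly one draw per variable, regardless of how the network is otherwise structured.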
What are the limitations of forward sampling?
Forward sampling generates samples following the joint distribution p(X) and assumes that no variables are fixed (no observed evidence). When we want to compute p(Y | E = e), we cannot directly enforce the evidence; we can only hope that some samples happen to match it. If the evidence is rare, almost no samples will match, making the method very inefficient.
What’s the idea behind rejection sampling?
Generate samples as in forward sampling, but discard those that don’t match the evidence. The remaining (accepted) samples are approximately drawn from p(X|E = e). Simple and intuitive, but wasteful if the evidence is unlikely.
Rejection sampling is simple and intuitive but wasteful if evidence is ____________.
unlikely
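Rejecting samples that contradict the evidence can be sketched on the same hypothetical A → B network used above (CPD numbers chosen for illustration): estimate p(A = 1 | B = 1) by forward sampling and keeping only samples with B = 1.

```python
import random

P_A = 0.3
P_B_GIVEN_A = {1: 0.9, 0: 0.2}

random.seed(3)
accepted = []
for _ in range(100_000):
    a = 1 if random.random() < P_A else 0
    b = 1 if random.random() < P_B_GIVEN_A[a] else 0
    if b == 1:                      # keep only samples matching the evidence
        accepted.append(a)

posterior = sum(accepted) / len(accepted)
# Exact answer by Bayes' rule: 0.3*0.9 / (0.3*0.9 + 0.7*0.2) ~ 0.659.
```

Here roughly 59% of the work is thrown away; with rarer evidence, nearly every sample would be discarded, which is exactly the wastefulness the card describes.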
Explain rejection sampling.
We need some simpler distribution q(z), called proposal distribution, from which we can readily draw samples. The proposal distribution needs to satisfy kq(z) ≥ p(z) for all values of z and some constant k. The function kq(z) provides an upper envelope for p. Each step of the rejection sampler involves generating two random numbers:
sample z0 ∼ q(z), which corresponds to picking a random z location; and sample u0 ∼ Unif(0, kq(z0)), which corresponds to picking a random height (y location) under the envelope. If u0 > p(z0), the sample is rejected; otherwise z0 is retained.
Here p(z) denotes the unnormalized version of the target distribution.
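The two-random-number scheme can be sketched for an example unnormalized target. The target p~(z) = exp(-z^4) and the standard-normal proposal are illustrative choices; the constant k was picked by hand so that kq(z) ≥ p~(z) everywhere (the ratio p~/q peaks near |z| = 1/2 at about 2.67).

```python
import math
import random

def p_tilde(z):
    # Unnormalized target distribution.
    return math.exp(-z ** 4)

def q(z):
    # Standard-normal proposal density.
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

K = 2.7  # chosen so that K * q(z) >= p_tilde(z) for all z

def rejection_sample():
    while True:
        z0 = random.gauss(0.0, 1.0)           # random z location, drawn from q
        u0 = random.uniform(0.0, K * q(z0))   # random height under the envelope
        if u0 <= p_tilde(z0):                 # falls under the target curve
            return z0                         # accept z0

random.seed(4)
draws = [rejection_sample() for _ in range(20_000)]
# The target is symmetric about 0, so the sample mean is near 0.
```

Note that only p~(z) (unnormalized) is ever evaluated; the normalizing constant of the target is never needed.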
How efficient is rejection sampling?
The fraction of points that are rejected by this method depends on the ratio of the area under the unnormalized distribution to the area under the curve kq(z).
If p is a normalized target distribution, the acceptance probability is 1/k. The constant k should be as small as possible subject to the limitation that kq(z) must be nowhere less than p(z).
For rejection sampling to be of practical value, we require that the comparison function kq(z) be close to the target distribution so that the rate of rejection is kept to a minimum. For practical examples, where the target distribution may be multimodal and sharply peaked, it will be extremely difficult to find a good proposal distribution and comparison function.
Rejection can be a useful technique in one or two dimensions; it is unsuited to problems of high dimensionality.
The acceptance rate diminishes exponentially with dimensionality.
____________ sampling can be a useful technique in one or two dimensions. It is, however, unsuited to problems of high dimensionality. The ___________ rate diminishes exponentially with dimensionality.
Rejection; acceptance
What are the limitations of rejection sampling?
It is practical only in one or two dimensions: the acceptance rate diminishes exponentially with dimensionality, and for multimodal, sharply peaked targets it is very hard to find a proposal q(z) and constant k that keep the rejection rate low.
What’s the idea behind importance sampling?
Draw samples from a proposal distribution q(z) and, instead of discarding any of them, weight each sample by the importance weight w = p(z)/q(z). The weighted average of f over the samples then approximates the expectation of f under p.
What are the limitations of importance sampling?
It works well only when the proposal q(x) is similar to the target p(x). If p(x) is high-dimensional or strongly correlated, it is almost impossible to design a good proposal; this leads to high variance in the weights, meaning that a few samples dominate the estimate.
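Weighting instead of rejecting can be sketched as follows. The densities are illustrative choices, not from the source: target p = N(0, 1), proposal q = N(0, 2²), and f(z) = z², whose true expectation under p is 1.

```python
import math
import random

def normal_pdf(z, mu, sigma):
    return math.exp(-((z - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

random.seed(5)
n = 100_000
total = 0.0
for _ in range(n):
    z = random.gauss(0.0, 2.0)                              # sample from q
    w = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, 0.0, 2.0)   # importance weight p(z)/q(z)
    total += w * z * z                                      # weighted contribution of f(z)
estimate = total / n
# No sample is discarded; each contributes in proportion to its weight.
```

With a proposal badly matched to the target, the same estimator remains unbiased but the weight variance explodes, which is the limitation described above.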
What is the idea behind Gibbs sampling?
For low-dimensional problems, we can use methods such as ____________ sampling. However, for high-dimensional problems, it is more common to use Markov chain Monte Carlo (MCMC). _________ sampling is a simple and widely applicable MCMC algorithm. It reduces the problem of ______________ sampling to the problem of univariate sampling.
importance; Gibbs; multivariate
In the context of Gibbs sampling, z^(k+1) differs from z^(k) in what?
In one single component!
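The one-component-at-a-time update can be sketched on an example target: a standard bivariate Gaussian with correlation ρ = 0.8 (an illustrative choice), whose exact univariate conditionals are zi | zj ∼ N(ρ zj, 1 − ρ²).

```python
import random

RHO = 0.8
COND_STD = (1.0 - RHO ** 2) ** 0.5   # std of z_i | z_j under the bivariate Gaussian

random.seed(6)
z1, z2 = 0.0, 0.0
samples = []
for _ in range(50_000):
    # Each resampling step changes exactly one component,
    # drawing it from its univariate conditional given the rest.
    z1 = random.gauss(RHO * z2, COND_STD)   # resample z1 | z2
    z2 = random.gauss(RHO * z1, COND_STD)   # resample z2 | z1
    samples.append((z1, z2))

burned = samples[1000:]   # discard early samples before the chain mixes
# The retained samples have empirical correlation close to 0.8.
```

This is the sense in which Gibbs sampling reduces multivariate sampling to a sequence of univariate sampling problems.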
Exact inference becomes computationally infeasible in large or dense networks. We replace exact computation with sampling-based approximations that trade precision for scalability. True or False?
True