What is the core idea of Bayesian inference?
To treat unknown parameters as random variables and update a prior belief about them using observed data to obtain a posterior belief.
In Bayesian terms, what is a prior distribution?
A probability distribution that represents our beliefs about a parameter before seeing the current data.
What is a likelihood in Bayesian inference?
The probability of the observed data as a function of the parameter, reflecting how plausible the data are under different parameter values.
What is a posterior distribution?
The updated distribution of the parameter after combining the prior and the likelihood using Bayes’ rule.
What is Bayes’ rule in the parameter-data form?
Posterior ∝ Likelihood × Prior, or p(θ|data) ∝ p(data|θ)p(θ).
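The proportionality can be made concrete with a simple grid approximation; this is a minimal sketch using a coin-flip example with assumed data (7 heads in 10 tosses) and a flat prior:

```python
import numpy as np

# Grid approximation of Bayes' rule for a coin's heads probability theta.
theta = np.linspace(0.001, 0.999, 999)   # candidate parameter values
prior = np.ones_like(theta)              # flat prior, p(theta) = const
heads, tosses = 7, 10                    # assumed observed data
likelihood = theta**heads * (1 - theta)**(tosses - heads)
unnorm = likelihood * prior              # posterior ∝ likelihood × prior
posterior = unnorm / unnorm.sum()        # normalize so it sums to 1
print(theta[np.argmax(posterior)])       # posterior mode, near 7/10 = 0.7
```

The final division is the normalization step from the next card: it rescales the unnormalized product so the posterior sums to 1 over the grid.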
What does the normalization constant in Bayes’ rule ensure?
That the posterior distribution integrates (or sums) to 1 over all possible parameter values.
What is a point estimate in the Bayesian framework?
A single summary of the posterior such as the posterior mean, median, or maximum a posteriori (MAP) estimate.
What is the MAP (maximum a posteriori) estimate?
The parameter value that maximizes the posterior density p(θ|data); it is the mode of the posterior distribution.
How is MAP estimation related to regularized maximum likelihood?
MAP is equivalent to maximizing log-likelihood plus log-prior, which often looks like a regularized loss function.
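A one-dimensional sketch of this equivalence, with assumed data: for a Gaussian likelihood and a Gaussian prior on the slope w, minimizing the negative log posterior is exactly ridge regression with penalty λ = σ²/τ².

```python
import numpy as np

# Assumed setup: 1-D linear model y = w*x + noise, Normal prior on w.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)
sigma2, tau2 = 0.25, 1.0                 # noise variance, prior variance

# Negative log posterior (up to constants):
#   sum (y - w x)^2 / (2 sigma2) + w^2 / (2 tau2)
# Minimizing it is ridge regression with lambda = sigma2 / tau2.
lam = sigma2 / tau2
w_map = (x @ y) / (x @ x + lam)          # closed-form ridge solution
w_mle = (x @ y) / (x @ x)                # unpenalized maximum likelihood
print(w_map, w_mle)                      # MAP is shrunk toward 0
```

The same derivation with a Laplace prior produces an L1 penalty instead, matching the next two cards.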
What type of prior corresponds to L2 regularization in linear models?
A Normal (Gaussian) prior on coefficients, leading to ridge-like penalties.
What type of prior corresponds to L1 regularization in linear models?
A Laplace (double-exponential) prior on coefficients, leading to lasso-like penalties.
Why is it useful to view regularization as a Bayesian prior?
It provides an interpretation of regularization as encoding prior beliefs about parameter magnitude and supports probabilistic reasoning.
What is a conjugate prior?
A prior distribution chosen so that the posterior is in the same family as the prior when combined with a given likelihood.
Why are conjugate priors convenient?
They allow closed-form posterior updates, simplifying analytic calculations and reducing computational cost.
What is a conjugate prior for a Bernoulli or Binomial likelihood?
The Beta distribution, which is conjugate for the success-probability parameter of Bernoulli and Binomial likelihoods.
What is a conjugate prior for a Poisson likelihood?
The Gamma distribution is conjugate for the rate parameter of a Poisson likelihood.
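The Gamma-Poisson update is equally simple: with a Gamma(α, β) prior (shape-rate parameterization) and observed counts x₁..xₙ, the posterior is Gamma(α + Σxᵢ, β + n). A sketch with assumed counts:

```python
# Gamma-Poisson conjugate update; prior and counts are illustrative.
alpha, beta = 3.0, 1.0          # Gamma shape and rate (assumed prior)
counts = [2, 4, 3, 5]           # assumed observed Poisson counts
alpha_post = alpha + sum(counts)         # shape gains the total count
beta_post = beta + len(counts)           # rate gains the number of observations
print(alpha_post / beta_post)   # posterior mean of the rate: 17/5 = 3.4
```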
What is a conjugate prior for a Normal likelihood with known variance and unknown mean?
A Normal prior on the mean is conjugate, yielding a Normal posterior for the mean.
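The Normal-Normal update combines prior and data as a precision-weighted average; this sketch uses assumed numbers for the prior N(μ₀, τ₀²), the known noise variance σ², and the sample mean of n observations:

```python
# Normal-Normal conjugate update with known variance; numbers are illustrative.
mu0, tau0_sq = 0.0, 4.0          # prior mean and prior variance on the mean
sigma2, n, ybar = 1.0, 25, 3.0   # known noise variance, sample size, sample mean

prec = 1 / tau0_sq + n / sigma2                 # posterior precision adds up
mu_post = (mu0 / tau0_sq + n * ybar / sigma2) / prec
var_post = 1 / prec
print(mu_post, var_post)         # posterior mean sits between mu0 and ybar
```

With n = 25 informative observations, the posterior mean lands much closer to the sample mean than to the prior mean, and the posterior variance is far smaller than either source alone.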
What is a Bayesian credible interval?
An interval [a,b] such that the posterior probability that the parameter lies in [a,b] equals a chosen level (e.g., 95%).
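In practice an equal-tailed credible interval is often read off posterior samples as the 2.5th and 97.5th percentiles; a sketch drawing from an assumed Beta(9, 3) posterior:

```python
import numpy as np

# Equal-tailed 95% credible interval from posterior samples.
rng = np.random.default_rng(1)
samples = rng.beta(9, 3, size=100_000)        # assumed posterior: Beta(9, 3)
lo, hi = np.quantile(samples, [0.025, 0.975]) # 2.5th and 97.5th percentiles
print(lo, hi)   # ~95% posterior probability that the parameter lies in [lo, hi]
```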
How does a credible interval differ conceptually from a frequentist confidence interval?
A credible interval directly expresses probability about the parameter given the data; a confidence interval concerns the long-run coverage of the procedure under repeated sampling.
What is the posterior predictive distribution?
The distribution of a future observation, obtained by averaging the likelihood of new data over the posterior distribution of the parameters.
Why is the posterior predictive distribution useful in ML?
It captures both parameter uncertainty and data noise, providing more realistic uncertainty about future predictions.
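A Monte Carlo sketch of the posterior predictive: draw parameters from the posterior, then draw new data given each draw, so the result mixes parameter uncertainty with data noise. The Beta(9, 3) posterior below is an assumed example:

```python
import numpy as np

# Posterior predictive by Monte Carlo for one future Bernoulli trial.
rng = np.random.default_rng(2)
theta_draws = rng.beta(9, 3, size=100_000)   # draws from an assumed posterior
y_new = rng.binomial(1, theta_draws)         # new observation given each theta
print(y_new.mean())   # predictive P(y=1), close to the posterior mean 9/12
```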
What is hierarchical (multilevel) modeling in the Bayesian context?
Modeling parameters themselves as drawn from higher-level distributions, allowing sharing of information across groups or entities.
Why are hierarchical models powerful?
They enable partial pooling across groups, improving estimates for small-sample groups and capturing structure in multi-level data.
What is partial pooling?
An approach where group-specific estimates are shrunk towards a global mean based on data and prior, balancing between no pooling and complete pooling.
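Partial pooling can be sketched with the shrinkage weights from a Normal hierarchical model; the variances, group means, and group sizes below are all assumed for illustration:

```python
import numpy as np

# Partial-pooling sketch: shrink each group mean toward the global mean,
# with more shrinkage for smaller groups (assumed variances sigma2, tau2).
sigma2, tau2 = 1.0, 0.5                    # within-group / between-group variance
group_means = np.array([2.0, -1.0, 0.5])   # raw per-group averages (assumed)
group_sizes = np.array([50, 5, 2])         # small groups get pulled harder
global_mean = group_means.mean()

# Weight on each group's own data: n/sigma2 vs. the prior precision 1/tau2.
w = (group_sizes / sigma2) / (group_sizes / sigma2 + 1 / tau2)
pooled = w * group_means + (1 - w) * global_mean
print(pooled)   # each estimate lies between its raw mean and the global mean
```

With w = 1 this reduces to no pooling (each group on its own), and with w = 0 to complete pooling (one shared mean); partial pooling interpolates based on group size and the variance ratio.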