Markov chain Monte Carlo sampling

Taken from coursework for ECE 7751: Graphical Models in Machine Learning, taught by Faramarz Fekri at Georgia Tech, Spring 2023

Inverse CDF sampling

A simple sampling method adopted by many of the standard math libraries is the inverse probability transform: draw $u \sim \mathrm{Unif}(0,1)$, then set $x = F^{-1}(u)$, where $F^{-1}$ is the inverse of the CDF $F$. Show that $x$ is distributed according to $F$. What is the drawback of this method?

Solution

We let $U$ represent the uniform random variable and start with the fact that $p(U \le u) = u$ for $u \in [0, 1]$.

Showing that $x$ is distributed according to $F$ can be done by showing that $F(x) = p(F^{-1}(U) \le x)$.

So,

$$p(F^{-1}(U) \le x) = p(U \le F(x)) = F(x) \qquad \text{because } p(U \le u) = u$$

where the first equality holds because $F$ is non-decreasing, so $F^{-1}(U) \le x$ exactly when $U \le F(x)$.

And we’re done.

The drawback is that we do not always have the inverse CDF in an analytically tractable form; hence, the need for methods like Markov chain Monte Carlo (MCMC).
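
As a quick illustration (my own addition, not part of the coursework), here is a minimal Python sketch of inverse CDF sampling for the exponential distribution, one of the cases where the inverse CDF is available in closed form: $F(x) = 1 - e^{-\lambda x}$, so $F^{-1}(u) = -\ln(1-u)/\lambda$. The rate and sample count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(rate, size):
    """Sample Exponential(rate) via the inverse probability transform."""
    u = rng.uniform(0.0, 1.0, size)  # u ~ Unif(0, 1)
    return -np.log(1.0 - u) / rate   # x = F^{-1}(u)

samples = sample_exponential(rate=2.0, size=100_000)
print(samples.mean())  # should be close to 1 / rate = 0.5
```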

Combining multiple stationary distributions

Show that if both transition kernels $K_1$ and $K_2$ have $p(\cdot)$ as their stationary density, then so do $K_1 \circ K_2$ and $\lambda K_1 + (1 - \lambda) K_2$. In practice, the former corresponds to sampling from $K_1$ and $K_2$ cyclically, while the latter draws from $K_1$ with probability $\lambda$ and from $K_2$ otherwise. Although it is not required here, the extension to more than two kernels is straightforward.

In the continuous case, the cyclic kernel can be defined as a composition of functions:

$$(K_1 \circ K_2)(z \mid x) = \int K_2(y \mid x)\, K_1(z \mid y)\, dy$$

Solution

Cyclic sampling

Discrete

We let $\pi$ be the vector of probabilities of the stationary distribution and $K$ a matrix where $K_{ji}$ represents the transition probability from $x_i$ to $x_j$.

$$\pi_j = \sum_i \pi_i K_{ji} \quad\Longleftrightarrow\quad \pi = K\pi$$

$$\pi = K_1 \pi = K_2 \pi \implies (K_1 K_2)\pi = K_1 (K_2 \pi) = K_1 \pi = \pi$$
Continuous
$$p(\cdot) = \int_x p(x)\, K_1(\cdot \mid x)\, dx = \int_x p(x)\, K_2(\cdot \mid x)\, dx$$

$$\begin{aligned}
p(z) &= \int_y p(y)\, K_1(z \mid y)\, dy \\
&= \int_y \left[\int_x p(x)\, K_2(y \mid x)\, dx\right] K_1(z \mid y)\, dy \\
&= \int_x p(x) \int_y K_2(y \mid x)\, K_1(z \mid y)\, dy\, dx && \text{rearranging} \\
&= \int_x p(x)\, (K_1 \circ K_2)(z \mid x)\, dx
\end{aligned}$$

Sampling with a mixture of kernels

Discrete
$$\begin{aligned}
\lambda \pi &= \lambda K_1 \pi \\
(1 - \lambda)\pi &= (1 - \lambda) K_2 \pi \\
\pi &= \lambda K_1 \pi + (1 - \lambda) K_2 \pi && \text{add first two lines together} \\
\pi &= \underbrace{(\lambda K_1 + (1 - \lambda) K_2)}_{\text{combined kernel}}\, \pi
\end{aligned}$$
Continuous
$$\begin{aligned}
\lambda p(y) &= \lambda \int_x p(x)\, K_1(y \mid x)\, dx \\
(1 - \lambda) p(y) &= (1 - \lambda) \int_x p(x)\, K_2(y \mid x)\, dx \\
p(y) &= \lambda \int_x p(x)\, K_1(y \mid x)\, dx + (1 - \lambda) \int_x p(x)\, K_2(y \mid x)\, dx && \text{add first two lines together} \\
p(y) &= \int_x p(x)\, \underbrace{\left(\lambda K_1(y \mid x) + (1 - \lambda) K_2(y \mid x)\right)}_{\text{combined kernel}}\, dx
\end{aligned}$$
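
To make both results concrete, here is a small numerical check in Python (my own addition, not from the coursework). It builds two reversible kernels for the same nonuniform $\pi$ via the Metropolis rule, then verifies that the cyclic kernel $K_1 K_2$ and the mixture $\lambda K_1 + (1 - \lambda) K_2$ both leave $\pi$ invariant. The specific $\pi$, proposal matrices, and $\lambda$ are arbitrary choices:

```python
import numpy as np

def metropolis_kernel(pi, Q):
    """Column-stochastic kernel K with K @ pi == pi, built by applying
    the Metropolis rule to a symmetric proposal matrix Q.
    K[j, i] is the transition probability from state i to state j,
    matching the K_ji convention above."""
    n = len(pi)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if j != i:
                K[j, i] = Q[j, i] * min(1.0, pi[j] / pi[i])
        K[i, i] = 1.0 - K[:, i].sum()  # leftover probability of staying put
    return K

pi = np.array([0.5, 0.3, 0.2])
Q1 = np.full((3, 3), 1 / 3)            # uniform proposal
Q2 = np.array([[0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5],
               [0.5, 0.5, 0.0]])       # proposal that never stays put

K1 = metropolis_kernel(pi, Q1)
K2 = metropolis_kernel(pi, Q2)
lam = 0.3

print(np.allclose(K1 @ K2 @ pi, pi))                      # cyclic kernel: True
print(np.allclose((lam * K1 + (1 - lam) * K2) @ pi, pi))  # mixture: True
```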

Metropolis-Hastings sampling produces a stationary distribution equal to the target

Recall MH sampling for target distribution $p(x)$ using proposal $q(x \mid y)$: at state $s$, first draw $t \sim q(t \mid s)$, then accept $t$ with probability

$$A(t \mid s) = \min\left(1,\ \frac{\tilde{p}(t)\, q(s \mid t)}{\tilde{p}(s)\, q(t \mid s)}\right)$$

where $\tilde{p}(x)$ is the unnormalized target distribution. Show that $p(x)$ is the stationary distribution of the Markov chain defined by this procedure.

Consider both continuous and discrete cases.

Solution

We will show that the detailed balance equation $p(t)\, T(s \mid t) = p(s)\, T(t \mid s)$ is satisfied, which implies that $p(x)$ is the stationary distribution of the Markov chain (see Koller & Friedman, Probabilistic Graphical Models, Proposition 12.3).

This holds for both continuous ($p(t) = \int_s p(s)\, T(t \mid s)\, ds$) and discrete ($p(t) = \sum_s p(s)\, T(t \mid s)$) state spaces.

When $s \neq t$

$$\begin{aligned}
p(s)\, T(t \mid s) &= p(s)\, q(t \mid s)\, A(t \mid s) \\
&= p(s)\, q(t \mid s) \min\left(1,\ \frac{p(t)\, q(s \mid t)}{p(s)\, q(t \mid s)}\right) && \tfrac{\tilde{p}(t)}{\tilde{p}(s)} = \tfrac{p(t)}{p(s)} \\
&= \min\left(p(s)\, q(t \mid s),\ p(t)\, q(s \mid t)\right) \\
&= p(t)\, q(s \mid t) \min\left(\frac{p(s)\, q(t \mid s)}{p(t)\, q(s \mid t)},\ 1\right) \\
&= p(t)\, q(s \mid t)\, A(s \mid t) \\
&= p(t)\, T(s \mid t)
\end{aligned}$$

When $s = t$

Though $T(s \mid t)$ has a different form when $s = t$, it does not matter. The proof is trivial; we can simply exchange $s$ and $t$ on one side to get the other:

$$p(s)\, T(t \mid s) = p(s)\, T(s \mid s) = p(t)\, T(s \mid t)$$
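
For concreteness, here is a minimal random-walk Metropolis sketch in Python (my own illustration, not from the coursework); the bimodal unnormalized target, step size, and sample count are arbitrary choices. With a symmetric Gaussian proposal, $q(s \mid t) = q(t \mid s)$, so the acceptance probability reduces to $\min(1, \tilde{p}(t)/\tilde{p}(s))$:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # Unnormalized target: a mixture of two Gaussian bumps.
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)

def metropolis_hastings(n_samples, step=1.0):
    s = 0.0  # initial state
    samples = np.empty(n_samples)
    for i in range(n_samples):
        t = s + step * rng.standard_normal()  # symmetric Gaussian proposal
        # Accept t with probability A(t|s); q cancels by symmetry.
        if rng.uniform() < min(1.0, p_tilde(t) / p_tilde(s)):
            s = t
        samples[i] = s  # rejected proposals repeat the current state
    return samples

samples = metropolis_hastings(50_000)
```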

Gibbs sampling produces a stationary distribution equal to the target

Recall Gibbs sampling for target distribution $p(x) = p(x_1, \ldots, x_d)$: for each $j \in 1, \ldots, d$, draw $t \sim p(x_j \mid \text{rest})$ and set $x_j = t$. Show that $p(x)$ is the stationary distribution of the Markov chain defined by this procedure.

Solution

Again, we prove that the detailed balance equation holds; in the Gibbs case, it is

$$p(x_j, x_{-j})\, T(x_j', x_{-j} \mid x_j, x_{-j}) = p(x_j', x_{-j})\, T(x_j, x_{-j} \mid x_j', x_{-j})$$

where $T(x_j', x_{-j} \mid x_j, x_{-j}) = p(x_j' \mid x_{-j})$ and $x_{-j} = \{x_i : i \neq j\}$.

We start with the left side and show it is equal to the right:

$$\begin{aligned}
p(x_j, x_{-j})\, T(x_j', x_{-j} \mid x_j, x_{-j}) &= p(x_j, x_{-j})\, p(x_j' \mid x_{-j}) \\
&= p(x_{-j})\, p(x_j \mid x_{-j})\, p(x_j' \mid x_{-j}) \\
&= p(x_j', x_{-j})\, p(x_j \mid x_{-j}) \\
&= p(x_j', x_{-j})\, T(x_j, x_{-j} \mid x_j', x_{-j})
\end{aligned}$$
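
As an illustration (my own addition, not from the coursework), here is a minimal Gibbs sampler in Python for a standard bivariate normal with correlation $\rho$, a case where both full conditionals are available in closed form: $x_1 \mid x_2 \sim \mathcal{N}(\rho x_2,\, 1 - \rho^2)$ and symmetrically for $x_2 \mid x_1$. The value of $\rho$ and the sample count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_bivariate_normal(n_samples, rho=0.8):
    """Gibbs sampling from a standard bivariate normal with correlation rho,
    cycling through the closed-form full conditionals."""
    x1, x2 = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    sd = np.sqrt(1 - rho ** 2)
    for i in range(n_samples):
        x1 = rng.normal(rho * x2, sd)  # draw x1 ~ p(x1 | x2)
        x2 = rng.normal(rho * x1, sd)  # draw x2 ~ p(x2 | x1)
        samples[i] = x1, x2
    return samples

samples = gibbs_bivariate_normal(50_000)
print(np.corrcoef(samples.T)[0, 1])  # should be close to rho = 0.8
```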