Sampling: Two Motivating Examples

Drawing samples from a specified target distribution is a longstanding problem that arises in many different fields. Here are two “first examples” to illustrate what the difficulties are.

Hard Disks in a Box

The figure below shows non-overlapping disks randomly packed into a box. When the disk density is low, the disks appear randomly placed. When the disk density becomes higher than $0.71$, a transition is observed and the arrangement becomes roughly hexagonal. (This is called the Kirkwood transition.) Proving this rigorously is still open. We’re going to see if we can verify it numerically here.

Disk arrangements at packing densities 0.50 (left) and 0.72 (right).

Question 1. How do you randomly sample an arrangement of $N$ non-overlapping disks of radius $\epsilon$ in a box “uniformly” (i.e. so that every valid arrangement is equally likely)?

It is this exact question that led to the famous Metropolis-Hastings algorithm. An implementation of it, and a discussion of how not to do it, can be found in a notebook whose output you can view here, or download here.
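To make the idea concrete, here is a minimal sketch of such a Metropolis chain for hard disks in the unit square. This is an illustration only, not the notebook's implementation; the function name and parameters are my own choices.

```python
import numpy as np

def sample_hard_disks(n_disks, radius, n_steps, step_size=0.05, rng=None):
    """Metropolis chain for non-overlapping disks in the unit square.

    Starts from a regular grid (assumed valid for the given radius) and
    repeatedly proposes moving one randomly chosen disk by a small random
    displacement.  The move is accepted only if the disk stays inside the box
    and does not overlap any other disk; otherwise the old position is kept.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Initial configuration: centers on a regular grid inside [radius, 1 - radius]^2.
    m = int(np.ceil(np.sqrt(n_disks)))
    xs = np.linspace(radius, 1.0 - radius, m)
    centers = np.array([(x, y) for x in xs for y in xs])[:n_disks]

    for _ in range(n_steps):
        i = rng.integers(n_disks)
        proposal = centers[i] + step_size * rng.uniform(-1.0, 1.0, size=2)

        in_box = np.all((proposal >= radius) & (proposal <= 1.0 - radius))
        others = np.delete(centers, i, axis=0)
        no_overlap = np.all(np.sum((others - proposal) ** 2, axis=1) >= (2 * radius) ** 2)

        if in_box and no_overlap:
            centers[i] = proposal  # accept; rejected moves keep the old arrangement

    return centers
```

Because the proposal is symmetric and rejected moves keep the current arrangement, the chain leaves the uniform distribution on valid arrangements invariant.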

The Ising Model

In the previous example, we were only interested in sampling from the uniform distribution on all valid configurations. When the target distribution is not uniform, the sampling problem is even harder.

One example of this is the Ising model, which arises as a model of ferromagnetism. This considers a lattice of particles, each with a spin of $\pm 1$ representing the alignment of the magnetic field of the particles. Let $\Lambda \subseteq \R^d$ be a finite set (representing the lattice of particles), and $\sigma \colon \Lambda \to \set{\pm 1}$ be the spins. The energy of the configuration $\sigma$ is given by \begin{equation} H(\sigma) \defeq - J \sum_{\substack{i, j \in \Lambda\\i \sim j}} \sigma_i \sigma_j - B \sum_{i \in \Lambda} \sigma_i \,, \end{equation} where $i \sim j$ means $i$ and $j$ are nearest neighbors in the lattice $\Lambda$. Here $J \neq 0$ is the interaction strength. For ferromagnetic materials particles favor alignment and we have $J > 0$, and for anti-ferromagnetic materials particles do not favor alignment and we have $J < 0$. The quantity $B$ represents the strength of the external magnetic field. Having spins align with the external magnetic field (i.e. $\sign(\sigma_i) = \sign(B)$) reduces the total energy.
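As a concrete illustration, here is one way $H(\sigma)$ might be computed; the square $L \times L$ lattice and free boundary conditions are my own assumptions for this sketch, not fixed by the discussion above.

```python
import numpy as np

def ising_energy(sigma, J=1.0, B=0.0):
    """Energy H(sigma) of an L x L array of +1/-1 spins.

    Each nearest-neighbor pair is counted once by pairing every site with its
    right and bottom neighbors (free boundary conditions).
    """
    interaction = np.sum(sigma[:, :-1] * sigma[:, 1:]) + np.sum(sigma[:-1, :] * sigma[1:, :])
    return -J * interaction - B * np.sum(sigma)

# Example: the all-aligned configuration has the lowest possible energy when J > 0, B = 0.
aligned = np.ones((4, 4), dtype=int)
print(ising_energy(aligned))  # -24 for a 4 x 4 lattice (24 nearest-neighbor pairs)
```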

The Ising model dictates that we expect to see the configuration $\sigma$ with probability proportional to \begin{equation} \pi_u(\sigma) \defeq e^{-\beta H(\sigma)}\,, \quad\text{where } \beta = \frac{1}{k_B T}\,. \end{equation} Here, $T$ is the temperature, and $k_B$ is the Boltzmann constant.

To obtain a probability from $\pi_u$, you have to normalize it by computing the partition function \begin{equation} Z_\beta \defeq \sum_\sigma \pi_u(\sigma) \,, \end{equation} and then setting \begin{equation} \pi(\sigma) = \frac{\pi_u(\sigma)}{Z_\beta} \,. \end{equation} The distribution $\pi$ is called the Gibbs distribution. If there are $100$ points in the lattice, then the above sum has $2^{100}$ terms, which is not computationally tractable.
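To see the blowup directly, the following brute-force sketch (same assumed square lattice with free boundary conditions as above) computes $Z_\beta$ by summing over all $2^{L^2}$ configurations, which is only feasible for very small $L$.

```python
import itertools
import numpy as np

def partition_function(L, beta, J=1.0, B=0.0):
    """Brute-force Z_beta for an L x L lattice: a sum with 2**(L * L) terms."""
    Z = 0.0
    for spins in itertools.product([-1, 1], repeat=L * L):
        sigma = np.array(spins).reshape(L, L)
        interaction = np.sum(sigma[:, :-1] * sigma[:, 1:]) + np.sum(sigma[:-1, :] * sigma[1:, :])
        H = -J * interaction - B * np.sum(sigma)
        Z += np.exp(-beta * H)
    return Z

# A 3 x 3 lattice has only 512 configurations; a 10 x 10 lattice already has 2^100.
print(partition_function(L=3, beta=1.0))
```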

If $\beta \approx 0$ then $\pi$ is roughly uniform, and there’s no alignment of spins. When $\beta$ is large, however, non-alignment of spins becomes energetically expensive. For instance, with $B = 0$ and $J = 1$, consider two states: $\sigma = 1$ (everything aligned) and $\tau$, in which the spins oscillate a lot. We compute \begin{equation} \frac{\pi_u(\sigma) }{\pi_u(\tau)} \approx \frac{e^{\beta \abs{\Lambda}}}{e^{\beta O(1)}} \,, \end{equation} and so the aligned state is roughly $e^{\beta \abs{\Lambda}}$ times more likely to occur than a typical state. In fact, it is known that when $\beta = O(1)$, there are $O(2^{\abs{\Lambda}/2})$ low energy configurations that the system is typically in, and all other configurations occur with negligible probability. If you blindly did uniform random sampling, you would find these typical low energy configurations with probability roughly $2^{-\abs{\Lambda}/2}$, which is minuscule! However, a simple MCMC algorithm finds this set very quickly, as shown below.

The evolution of $\log \pi_u$ of 1000 spin configurations during an MCMC simulation. The spins were initially chosen to be uniformly random.

To illustrate the evolution, here is a video of an MCMC method sampling from $\pi$ when $\beta = 1$. It finds a typical low energy configuration quickly (within a few thousand iterations).

Metropolis-Hastings algorithm sampling in the Ising model with a $10 \times 10$ lattice and $\beta = 1.0$.
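For reference, a minimal single-spin-flip Metropolis sampler might look like the sketch below; the free boundary conditions and default parameters are illustrative choices, not necessarily those used for the figure and video above.

```python
import numpy as np

def ising_metropolis(L=10, beta=1.0, n_steps=50_000, J=1.0, B=0.0, rng=None):
    """Single-spin-flip Metropolis sampler for the Ising model on an L x L lattice.

    Picks a random site, computes the energy change dH from flipping its spin,
    and accepts the flip with probability min(1, exp(-beta * dH)).  Free
    boundary conditions; returns the final spin configuration.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.choice([-1, 1], size=(L, L))  # uniformly random initial spins

    for _ in range(n_steps):
        i, j = rng.integers(L), rng.integers(L)
        # Sum of the nearest-neighbor spins of site (i, j).
        nbrs = 0
        if i > 0:
            nbrs += sigma[i - 1, j]
        if i < L - 1:
            nbrs += sigma[i + 1, j]
        if j > 0:
            nbrs += sigma[i, j - 1]
        if j < L - 1:
            nbrs += sigma[i, j + 1]
        dH = 2 * sigma[i, j] * (J * nbrs + B)  # change in H if sigma[i, j] is flipped
        if dH <= 0 or rng.random() < np.exp(-beta * dH):
            sigma[i, j] *= -1

    return sigma
```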

Phase transitions

One question that is hard to analyze theoretically, but can be studied quickly via MCMC, is the presence of phase transitions. Define the magnetization of a spin configuration to be \begin{equation} M(\sigma) = \frac{1}{\abs{\Lambda}} \sum_{i \in \Lambda} \sigma(i) \,. \end{equation} When $B = 0$, flipping every spin leaves $\pi_u$ unchanged but changes the sign of $M$, so by symmetry we clearly have \begin{equation} \E M = \sum_{\sigma} M(\sigma) \pi(\sigma) = \frac{1}{Z_\beta} \sum_{\sigma} M(\sigma) \pi_u(\sigma) = 0 \,. \end{equation}
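This cancellation is easy to check by brute force on a tiny lattice; the sketch below assumes a $2 \times 2$ lattice with $B = 0$, $J = 1$, and free boundary conditions.

```python
import itertools
import numpy as np

# Pair each configuration sigma with -sigma: both have the same weight pi_u,
# but opposite magnetization, so the terms in E[M] cancel exactly.
L, beta, J = 2, 1.0, 1.0
numerator, Z = 0.0, 0.0
for spins in itertools.product([-1, 1], repeat=L * L):
    sigma = np.array(spins).reshape(L, L)
    H = -J * (np.sum(sigma[:, :-1] * sigma[:, 1:]) + np.sum(sigma[:-1, :] * sigma[1:, :]))
    weight = np.exp(-beta * H)
    numerator += np.mean(sigma) * weight
    Z += weight
print(numerator / Z)  # 0.0 up to floating-point error
```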

When $\Lambda$ is an $N \times N$ lattice, it is possible to show that there exist two inverse temperatures $0 < \beta_1 < \beta_2$ such that:

  1. For $\beta > \beta_2$, there exists $\delta(\beta) > 0$ such that \begin{equation} \E \abs{M} > \delta(\beta)\,. \end{equation}

  2. If $\beta < \beta_1$, then for every $\delta > 0$, \begin{equation} \lim_{N \to \infty} \P( \abs{M} \leq \delta ) = 1 \,. \end{equation}

This was originally proved by Peierls. While this is hard to prove theoretically, one can see it numerically quite easily. We simulate $m$ random spin configurations using an MCMC algorithm, and then numerically compute \begin{equation} \E \abs{M} \approx \frac{1}{m} \sum_{i = 1}^m \abs{M(\sigma_i)} \,, \end{equation} and see whether it stays away from $0$ when $\beta$ is large, and whether it is close to $0$ when $\beta$ is small. Some numerical simulations are shown here.
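A sketch of such an estimate, reusing the `ising_metropolis` sampler sketched earlier (the lattice size, number of chains, and chain length are arbitrary illustrative choices, not those used for the figure below):

```python
import numpy as np

def mean_abs_magnetization(beta, L=10, n_samples=100, n_steps=50_000, rng=None):
    """Monte Carlo estimate of E|M| at inverse temperature beta.

    Runs n_samples independent chains (ising_metropolis from the earlier sketch
    must be in scope) and averages |M(sigma)| over the final configurations.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = [ising_metropolis(L=L, beta=beta, n_steps=n_steps, rng=rng) for _ in range(n_samples)]
    return float(np.mean([abs(np.mean(s)) for s in samples]))

# E|M| should stay close to 0 for small beta and be bounded away from 0 for large beta.
for beta in [0.1, 0.3, 0.5, 0.7, 1.0]:
    print(beta, mean_abs_magnetization(beta))
```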

Expected magnetization in the Ising model with 1000 realizations