Coupling
Bounding the distance to stationarity
Coupling is a powerful technique that has many uses. Here we limit our scope to using coupling to estimate mixing times. As we saw previously, the mixing time bounds for the random walk on the cycle and on the binary hypercube were suboptimal by a logarithmic factor. In both cases coupling can be used to obtain the sharp bound.
As usual, suppose $(X_n)$ is a Markov chain with state space $\mathcal X$, an aperiodic, irreducible transition matrix $P$, and stationary distribution $\pi$.
Definition 1. A Markovian coupling of $P$ is a Markov process $(X_n, Y_n)$ on $\mathcal X \times \mathcal X$ such that for every $x, x', y, y' \in \mathcal X$ we have \begin{equation}\label{e:cpl}\tag{C} \P( X_{n+1} = x' \given X_n = x, Y_n = y ) = P(x, x') \qquad \P( Y_{n+1} = y' \given X_n = x, Y_n = y ) = P(y, y') \end{equation}
Remark 2. Let $(X_n, Y_n)$ be a Markovian coupling of $P$. By definition $(X_n, Y_n)$ is a Markov process; denote its transition kernel by \begin{equation} P_\cpl(x,y; x', y') = \P( X_{n+1} = x', Y_{n+1} = y' \given X_n = x, Y_n = y ) \,. \end{equation} The coupling condition \eqref{e:cpl} requires that for every $x, y, x'$ we have \begin{equation} \sum_{y' \in \mathcal X} P_\cpl(x, y; x', y' ) = P(x, x') \,, \end{equation} and for every $x, y, y'$ we have \begin{equation} \sum_{x' \in \mathcal X} P_\cpl(x, y; x', y' ) = P(y, y') \,. \end{equation}
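In code, these two marginal conditions are straightforward to test. Below is a minimal sketch (function names are our own) that checks a candidate kernel $P_\cpl$ against them, using the always-available independent coupling $P_\cpl = P \otimes P$ as a sanity check.

```python
import numpy as np

def is_markovian_coupling(P, P_cpl, tol=1e-12):
    """Check the coupling condition (C): summing P_cpl over the second
    (resp. first) target coordinate must recover P applied to the first
    (resp. second) source coordinate.

    P     : (S, S) transition matrix.
    P_cpl : (S, S, S, S) array, P_cpl[x, y, x2, y2].
    """
    S = P.shape[0]
    for x in range(S):
        for y in range(S):
            # sum over y' must give P(x, .); sum over x' must give P(y, .)
            if not np.allclose(P_cpl[x, y].sum(axis=1), P[x], atol=tol):
                return False
            if not np.allclose(P_cpl[x, y].sum(axis=0), P[y], atol=tol):
                return False
    return True

# Sanity check: the independent coupling always satisfies (C).
P = np.array([[0.5, 0.5], [0.25, 0.75]])
P_ind = np.einsum('ab,cd->acbd', P, P)  # P_ind[x, y, x2, y2] = P(x,x2) P(y,y2)
print(is_markovian_coupling(P, P_ind))  # True
```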
There are several choices for the transition kernel $P_\cpl$ that satisfy these properties, and it’s useful to enumerate some of them for a simple example (e.g. lazy random walk on a cycle).
Example 3. Consider the lazy random walk on the cycle $\set{0, \dots, N-1}$. Let $(\xi_n, \zeta_n)$ be a sequence of random variables so that:
- For every $n$, we have $\P( \xi_n = 0 ) = \P( \zeta_n = 0) = 1/2$ and $\P(\xi_n = \pm 1) = \P(\zeta_n = \pm 1) = 1/4$.
- For every $n$, $(\xi_{n+1}, \zeta_{n+1})$ is independent of $(\xi_1, \zeta_1)$, $(\xi_2, \zeta_2)$, …, $(\xi_n, \zeta_n)$. However, we do not require $\xi_n$ and $\zeta_n$ to be independent.
Then a coupling of the lazy random walk is the Markov chain $(X_n, Y_n)$ \begin{equation} X_{n+1} = X_n + \xi_n \pmod{N} \qquad Y_{n+1} = \begin{cases} X_{n+1} & \text{if } X_n = Y_n\,, \\ Y_n + \zeta_n \pmod{N} & \text{if } X_n \neq Y_n \,. \end{cases} \end{equation}
Notice there are many choices for the joint distribution of $(\zeta_n, \xi_n)$. We could choose $\zeta_n = \xi_n$, or $\zeta_n = -\xi_n$, or $\zeta_n$ to be independent of $\xi_n$, or some combination.
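To see how this choice affects coalescence, here is a quick Monte Carlo sketch (a construction of our own, with arbitrary parameters) comparing $\zeta_n = -\xi_n$ with independent $\zeta_n$. Note that the choice $\zeta_n = \xi_n$ keeps the difference $Y_n - X_n$ constant, so that coupling never coalesces from distinct starts, even though it is a perfectly valid Markovian coupling.

```python
import random

def sample_increments(choice):
    """One step of (xi, zeta): each is 0 w.p. 1/2 and +1 or -1 w.p. 1/4."""
    xi = random.choice([0, 0, 1, -1])
    if choice == "same":
        zeta = xi
    elif choice == "mirror":
        zeta = -xi
    else:  # independent
        zeta = random.choice([0, 0, 1, -1])
    return xi, zeta

def coupling_time(N, x0, y0, choice, horizon=10**6):
    x, y = x0, y0
    for n in range(horizon):
        if x == y:
            return n
        xi, zeta = sample_increments(choice)
        x, y = (x + xi) % N, (y + zeta) % N
    return horizon  # treat as "did not couple"

N, trials = 8, 2000
for choice in ["mirror", "independent"]:
    avg = sum(coupling_time(N, 0, N // 2, choice) for _ in range(trials)) / trials
    print(choice, avg)
# With zeta = xi the difference X_n - Y_n never changes, so the chains
# never meet from distinct starting points.
```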
Remark 4. For all couplings we consider, we will always ensure $Y_{n+1} = X_{n+1}$ whenever $Y_n = X_n$; that is, once the chains meet, they move together.
Theorem 5. Let $(X_n, Y_n)$ be any Markovian coupling of $P$, and define \begin{equation} \tau_\cpl = \min \set{n \st X_n = Y_n } \,. \end{equation} Then \begin{equation} d(n) \leq \max_{x, y \in \mathcal X} \P^{x, y}( \tau_\cpl > n ) \quad\text{and hence}\quad \tmix = \tmix(\tfrac{1}{4}) \leq 4 \max_{x, y} \E^{x, y} \tau_\cpl \,. \end{equation}
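As a sanity check, one can verify the first inequality of Theorem 5 numerically for a small chain. The sketch below (our own construction) uses the lazy walk on a small cycle with the "independent until they meet, then together" coupling, computes $d(n)$ exactly by matrix powers, and compares it with the exact tail $\max_{x,y} \P^{x,y}(\tau_\cpl > n)$.

```python
import numpy as np

def lazy_cycle(N):
    """Transition matrix of the lazy simple random walk on the N-cycle."""
    P = np.zeros((N, N))
    for x in range(N):
        P[x, x] += 0.5
        P[x, (x + 1) % N] += 0.25
        P[x, (x - 1) % N] += 0.25
    return P

def stick_coupling(P):
    """'Independent until they meet, then together' coupling kernel,
    as an (S^2, S^2) matrix with pair (x, y) encoded as x*S + y."""
    S = P.shape[0]
    Q = np.zeros((S * S, S * S))
    for x in range(S):
        for y in range(S):
            if x == y:
                for x2 in range(S):
                    Q[x * S + y, x2 * S + x2] = P[x, x2]
            else:
                for x2 in range(S):
                    for y2 in range(S):
                        Q[x * S + y, x2 * S + y2] = P[x, x2] * P[y, y2]
    return Q

N = 6
P = lazy_cycle(N)
Q = stick_coupling(P)
pi = np.full(N, 1.0 / N)  # uniform is stationary for the lazy cycle
Pn, Qn = P.copy(), Q.copy()
for n in range(1, 31):
    d_n = max(0.5 * np.abs(Pn[x] - pi).sum() for x in range(N))
    # P^{x,y}(tau_cpl > n) is the off-diagonal mass of the coupled chain
    tail = max(Qn[x * N + y].reshape(N, N)[~np.eye(N, dtype=bool)].sum()
               for x in range(N) for y in range(N))
    assert d_n <= tail + 1e-12, (n, d_n, tail)
    Pn, Qn = Pn @ P, Qn @ Q
print("d(n) <= max_{x,y} P(tau_cpl > n) verified for n = 1..30")
```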
Remark 6. One can of course combine the above with Markov's inequality to obtain \begin{equation} \tmix(\epsilon) \leq \frac{1}{\epsilon} \max_{x, y} \E^{x, y} \tau_\cpl \,, \end{equation} but that is suboptimal. A better way is to run the coupling in blocks of length $n = 4 \max_{x, y} \E^{x, y} \tau_\cpl$: by the Markov property, $\max_{x, y} \P^{x, y}( \tau_\cpl > kn ) \leq \paren[\big]{ \max_{x, y} \P^{x, y}( \tau_\cpl > n ) }^k \leq 4^{-k}$, which gives \begin{equation} \tmix(\epsilon) \leq 4 |\log_4 \epsilon| \max_{x, y} \E^{x, y} \tau_\cpl \,. \end{equation}
Examples
Lazy random walk on the $N$-cycle
Consider the lazy biased $(p, q)$ random walk on the $N$-cycle. We construct a coupling as follows: Choose $\xi, \zeta \in \set{\pm 1}$ independently so that $\P( \xi = \pm1) = 1/2$, $\P(\zeta = 1) = p$, and $\P(\zeta = -1) = q = 1 - p$. Suppose $X_n \neq Y_n$. If $\xi = 1$, set $X_{n+1} = X_n + \zeta \pmod{N}$, and $Y_{n+1} = Y_n$. If $\xi = -1$, set $X_{n+1} = X_n$, and $Y_{n+1} = Y_n + \zeta \pmod{N}$. On the other hand, if $X_n = Y_n$, then set $X_{n+1} = Y_{n+1} = X_n + \one_{\set{\xi = 1}} \zeta \pmod{N}$.
Let $D_n = Y_n - X_n \pmod{N}$. In general a function of a Markov chain need not be a Markov chain. However, in this case we can directly check that $D$ is exactly the symmetric random walk on $\set{0, \dots, N}$ that gets absorbed at both $0$ and $N$, as we previously studied in the gambler’s ruin. So if $x, y \in \mathcal X$ with $y - x = k \pmod{N}$, \begin{equation} \E^{x, y} \tau_\cpl = \E^k \tau_{\mathit{ruin}} = k (N - k) \end{equation} and hence \begin{equation} \tmix \leq 4 \max_{0 \leq k \leq N} k (N - k) \leq N^2 \,. \end{equation}
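A short simulation (with parameters of our own choosing) confirms the identity $\E^{x, y} \tau_\cpl = k(N - k)$ for this coupling.

```python
import random

def coupled_step(x, y, N, p):
    """One step of the coupling: a fair coin xi picks which chain moves,
    and the mover steps +1 w.p. p, -1 w.p. q = 1 - p."""
    zeta = 1 if random.random() < p else -1
    if x == y:
        if random.random() < 0.5:        # xi = 1: both move together
            x = y = (x + zeta) % N
        return x, y
    if random.random() < 0.5:            # xi = 1: X moves, Y stays
        x = (x + zeta) % N
    else:                                # xi = -1: Y moves, X stays
        y = (y + zeta) % N
    return x, y

def mean_coupling_time(N, k, p, trials=5000):
    total = 0
    for _ in range(trials):
        x, y, t = 0, k, 0
        while x != y:
            x, y = coupled_step(x, y, N, p)
            t += 1
        total += t
    return total / trials

N, k, p = 10, 3, 0.7
print(mean_coupling_time(N, k, p), k * (N - k))  # both close to 21
```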
Remark 7. In this case one can also show that \begin{equation} \tmix \geq \frac{N^2}{32} \,, \end{equation} showing our bound is sharp (modulo a small multiplicative constant).
Lazy random walk on the torus
One can also use coupling to get a good bound on the mixing time of the lazy random walk on the $d$-dimensional torus. Define $\Z_N^d = \set{0, \dots, N-1}^d$, and consider the lazy random walk that doesn’t move with probability half, and moves to each of its $2d$ nearest neighbors with probability $1/(4d)$. (Here $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$ are neighbors if there exists $i$ such that for all $j \neq i$ we have $x_j = y_j$, and for $j = i$ we have $x_i = y_i \pm 1 \pmod{N}$.)
Proposition 8. For the lazy random walk on $\Z_N^d$, we have \begin{equation} \tmix(\epsilon) \leq d N^2 \log_4 \paren[\Big]{\frac{d}{\epsilon}} \,. \end{equation}
Proof sketch: Construct a coupling as follows: Pick a coordinate $i \in \set{1, \dots, d}$ uniformly at random. If the $i$-th coordinates of $X$ and $Y$ agree, then move them together. If not, pick either $X$ or $Y$ at random, move the $i$-th coordinate of the one you picked, and hold the other fixed. Along each coordinate this is the one dimensional coupling we saw above, slowed down by a factor of $d$. Estimate the coupling time of each coordinate, and use a union bound over the $d$ coordinates to conclude.
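Here is a simulation sketch of this coordinate-wise coupling (our own implementation), which also checks empirically that the coupling time rarely exceeds the bound of Proposition 8 with $\epsilon = 1/4$.

```python
import random, math

def torus_coupling_time(N, d, x0, y0):
    """Coordinate-wise coupling for the lazy walk on Z_N^d: pick a
    coordinate i uniformly; if the i-th coordinates agree, move them
    together (lazily), otherwise move coordinate i of only one chain."""
    x, y, t = list(x0), list(y0), 0
    while x != y:
        i = random.randrange(d)
        step = random.choice([1, -1])
        if x[i] == y[i]:
            if random.random() < 0.5:     # lazy step: both move or neither
                x[i] = y[i] = (x[i] + step) % N
        elif random.random() < 0.5:
            x[i] = (x[i] + step) % N      # move X, hold Y
        else:
            y[i] = (y[i] + step) % N      # move Y, hold X
        t += 1
    return t

N, d, trials = 8, 3, 1000
times = [torus_coupling_time(N, d, [0] * d, [N // 2] * d) for _ in range(trials)]
bound = d * N**2 * math.log(4 * d) / math.log(4)  # d N^2 log_4(d/eps), eps = 1/4
print(sum(t > bound for t in times) / trials)     # should be well below 1/4
```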
Lazy random walk on the binary hypercube
Let $\mathcal X = \set{0, 1}^d$ be the binary hypercube, and declare $x, y \in \mathcal X$ to be neighbors if they disagree in exactly one coordinate.
Proposition 9. For the lazy random walk on the binary hypercube, \begin{equation} \tmix(\epsilon) \leq d \log\paren[\Big]{\frac{d}{\epsilon}} \end{equation}
Proof sketch: Here is a coupling construction: Pick a coordinate $i$ uniformly at random. Refresh the $i$-th coordinate of both $X$ and $Y$ in exactly the same manner. The coupling time is now at most the first time at which every coordinate has been chosen at least once.
This is the standard coupon collector problem, and one can show that for every $c > 0$ we have \begin{equation} \P^{x, y}( \tau_\cpl > d \ln d + c d ) \leq e^{-c} \end{equation} from which the desired bound follows.
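The following sketch (our own code) simulates the refresh coupling from antipodal starting points and compares the empirical tail of $\tau_\cpl$ against the coupon collector bound $e^{-c}$.

```python
import random, math

def hypercube_coupling_time(d):
    """Refresh coupling on {0,1}^d: pick a coordinate uniformly and set
    it to the same fresh fair bit in both chains.  The chains agree once
    every coordinate has been refreshed at least once, so the coupling
    time is dominated by the coupon collector time."""
    x = [random.randint(0, 1) for _ in range(d)]
    y = [1 - b for b in x]                  # start from the antipodal point
    t = 0
    while x != y:
        i = random.randrange(d)
        x[i] = y[i] = random.randint(0, 1)  # refresh coordinate i in both
        t += 1
    return t

d, c, trials = 20, 2.0, 5000
threshold = d * math.log(d) + c * d
tail = sum(hypercube_coupling_time(d) > threshold for _ in range(trials)) / trials
print(tail, math.exp(-c))  # empirical tail vs the coupon collector bound
```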
Proofs
Definition 10. Let $\mu, \nu$ be two probability distributions. We say $(X, Y)$ is a coupling of $(\mu, \nu)$ if $X, Y$ are two random variables (defined on the same probability space) and $\dist(X) = \mu$ and $\dist(Y) = \nu$.
Example 11. If $X \sim \mu$ and $Y \sim \nu$ are independent, then $(X, Y)$ is a coupling of $(\mu, \nu)$.
Example 12. If $\mu = \nu$, one can still construct a coupling by choosing $X \sim \mu$ and $Y \sim \nu$ independently. However, one can also construct a coupling by choosing $X \sim \mu$ and $Y = X$.
Proposition 13. If $(X, Y)$ is any coupling of $(\mu, \nu)$ then $\norm{\mu - \nu}_\TV \leq \P( X \neq Y )$.
Proof: Suppose $(X, Y)$ is a coupling of $(\mu, \nu)$. For any $A \subseteq \mathcal X$, note \begin{equation} \mu(A) - \nu(A) = \P(X \in A) - \P(Y \in A) \leq \P(X \in A, Y \not\in A) \leq \P(X \neq Y) \,. \end{equation} Now choose $A = \set{x \in \mathcal X \st \mu(x) > \nu(x)}$ and note \begin{equation} \norm{\mu - \nu}_\TV = \frac{1}{2} \paren[\big]{ \mu(A) - \nu(A) + \nu(A^c) - \mu(A^c) } = \mu(A) - \nu(A) \end{equation} to conclude the proof.
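As a concrete instance of Proposition 13: for the independent coupling, $\P(X = Y) = \sum_x \mu(x) \nu(x)$, and the inequality is typically strict. A small numeric example (distributions of our own choosing):

```python
import numpy as np

# TV distance vs P(X != Y) for the *independent* coupling.
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
tv = 0.5 * np.abs(mu - nu).sum()
p_neq_indep = 1 - np.sum(mu * nu)  # P(X = Y) = sum_x mu(x) nu(x)
print(tv, p_neq_indep)             # 0.3 <= 0.71: the inequality is strict here
```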
Proposition 14. There exists a coupling of $(\mu, \nu)$ such that $\norm{\mu - \nu}_\TV = \P( X \neq Y)$.
Proof sketch: Let $\delta = \norm{\mu - \nu}_\TV$. If $\delta \in (0, 1)$, let $A = \set{\mu > \nu}$ be as above and define \begin{equation} p(x) = \frac{1}{\delta} (\mu(x) - \nu(x)) \one_A(x)\,, \quad q(x) = \frac{1}{\delta} (\nu(x) - \mu(x)) \one_{A^c}(x)\,, \quad \gamma(x) = \frac{1}{1 - \delta} (\mu(x) \wedge \nu(x))\,, \end{equation} and check that $p, q$ and $\gamma$ are all probability distributions. Now flip a biased coin that lands heads with probability $\delta$. If the coin lands heads, we choose $X \sim p$ and $Y \sim q$ independently. If the coin lands tails, we choose $X = Y \sim \gamma$. Now check that $(X, Y)$ is a coupling of $(\mu, \nu)$ and $\P(X \neq Y) = \delta = \norm{\mu - \nu}_\TV$.
Finally, if $\delta = 0$ or $\delta = 1$ find an appropriate coupling to conclude the proof.
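Here is a sketch (our own code) of the sampler described in the proof of Proposition 14, together with a Monte Carlo check that $\P(X \neq Y)$ matches the total variation distance.

```python
import numpy as np

def sample_optimal_coupling(mu, nu, rng):
    """Sample (X, Y) from the coupling of Proposition 14: with probability
    delta = ||mu - nu||_TV draw X ~ p and Y ~ q independently (these have
    disjoint supports, so X != Y); otherwise draw X = Y from gamma, the
    normalized overlap of mu and nu."""
    delta = 0.5 * np.abs(mu - nu).sum()
    if delta == 0:                          # mu = nu: couple them exactly
        x = rng.choice(len(mu), p=mu)
        return x, x
    p = np.maximum(mu - nu, 0) / delta      # (mu - nu) restricted to A
    q = np.maximum(nu - mu, 0) / delta      # (nu - mu) restricted to A^c
    if rng.random() < delta:                # coin lands heads
        return rng.choice(len(mu), p=p), rng.choice(len(nu), p=q)
    gamma = np.minimum(mu, nu) / (1 - delta)
    x = rng.choice(len(mu), p=gamma)
    return x, x

rng = np.random.default_rng(0)
mu, nu = np.array([0.5, 0.3, 0.2]), np.array([0.2, 0.3, 0.5])
samples = [sample_optimal_coupling(mu, nu, rng) for _ in range(20000)]
print(sum(x != y for x, y in samples) / len(samples))  # ~ 0.3 = TV distance
```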
Proof of Theorem 5: For any $x, y \in \mathcal X$, applying Proposition 13 to the coupling $(X_n, Y_n)$ started at $(x, y)$ gives \begin{equation} \norm{\delta_x P^n - \delta_y P^n}_\TV \leq \P^{x, y}(X_n \neq Y_n) = \P^{x, y}( \tau_\cpl > n )\,, \end{equation} where the last equality uses Remark 4: once the chains meet they move together, so $X_n \neq Y_n$ exactly when $\tau_\cpl > n$. Hence for any $x \in \mathcal X$, \begin{equation} \norm{\delta_x P^n - \pi}_\TV = \norm[\Big]{ \delta_x P^n - \sum_y \pi(y) \delta_y P^n }_\TV = \norm[\Big]{ \sum_y \pi(y) \paren[\big]{ \delta_x P^n - \delta_y P^n } }_\TV \leq \max_y \P^{x, y}( \tau_\cpl > n ) \,. \end{equation} Taking the max over $x$ proves the first inequality. The bound on $\tmix$ now follows from Markov's inequality, since $\P^{x, y}( \tau_\cpl > n ) \leq \E^{x, y} \tau_\cpl / n \leq 1/4$ whenever $n \geq 4 \max_{x, y} \E^{x, y} \tau_\cpl$.