Convergence of Reversible Chains

The convergence of reversible chains can be completely determined by the spectrum of the transition matrix. This is good because the resulting upper and lower bounds are sharp. This is bad because the spectrum (or, more precisely, the spectral gap) is not easy to estimate.

Eigenfunction decomposition

Suppose $P$ is a reversible irreducible transition matrix with stationary distribution $\pi$.

Definition 1. For $f, g \colon \mathcal X \to \R$ define the inner product \begin{equation} \ip{f, g}_\pi = \sum_{x \in \mathcal X} f(x) g(x) \pi(x) \,, \end{equation} and the norm \begin{equation} \norm{f}_{\ell^2(\pi)} \defeq \sqrt{ \ip{f, f}_\pi } = \paren[\Big]{ \sum_{x \in \mathcal X} f(x)^2 \pi(x) }^{1/2} \,. \end{equation}
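These quantities are easy to compute numerically. Here is a minimal sketch in Python with NumPy; the three-state distribution and the helper names `ip_pi`, `norm_pi` are illustrative choices, not anything fixed by the notes.

```python
import numpy as np

# Stationary distribution of a hypothetical 3-state chain (illustrative only).
pi = np.array([0.25, 0.50, 0.25])

def ip_pi(f, g, pi):
    """Inner product <f, g>_pi = sum_x f(x) g(x) pi(x)."""
    return float(np.sum(f * g * pi))

def norm_pi(f, pi):
    """Norm ||f||_{l^2(pi)} = sqrt(<f, f>_pi)."""
    return float(np.sqrt(ip_pi(f, f, pi)))

f = np.array([1.0, -1.0, 2.0])
g = np.array([0.0, 2.0, 1.0])
print(ip_pi(f, g, pi))   # 0.25*0 + 0.5*(-2) + 0.25*2 = -0.5
print(norm_pi(f, pi))    # sqrt(0.25 + 0.5 + 1.0) ~ 1.3229
```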

Proposition 2. If $P$ is reversible then $P$ is self-adjoint with respect to $\ip{\cdot, \cdot}_\pi$. That is, for every $f, g \colon \mathcal X \to \R$ we have \begin{equation} \ip{f, Pg}_\pi = \ip{Pf, g}_\pi \,. \end{equation}
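Proposition 2 can be checked numerically; below is a sketch assuming NumPy. The chain is a small made-up birth-death example, whose reversibility with respect to $\pi = (1/4, 1/2, 1/4)$ follows from detailed balance.

```python
import numpy as np

# A lazy birth-death chain; detailed balance pi(x) P(x,y) = pi(y) P(y,x)
# holds with pi = (1/4, 1/2, 1/4), so P is reversible.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])

rng = np.random.default_rng(0)
f = rng.standard_normal(3)
g = rng.standard_normal(3)

# (Pg)(x) = sum_y P(x,y) g(y) is the matrix-vector product P @ g.
lhs = np.sum(f * (P @ g) * pi)   # <f, Pg>_pi
rhs = np.sum((P @ f) * g * pi)   # <Pf, g>_pi
print(np.isclose(lhs, rhs))      # True: P is self-adjoint on l^2(pi)
```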

The spectral theorem now guarantees that there exists an orthonormal basis of (real) eigenfunctions. That is, there exist $\lambda_1$, …, $\lambda_{\abs{\mathcal X}} \in \R$ and corresponding eigenfunctions $\varphi_1$, …, $\varphi_{\abs{\mathcal X}}$ such that \begin{equation} P \varphi_k = \lambda_k \varphi_k \quad\text{and}\quad \ip{\varphi_j, \varphi_k}_\pi = \begin{cases} 0 & j \neq k\,, \\ 1 & j = k \,. \end{cases} \end{equation} Since $P$ is stochastic, we have seen before that one eigenvalue of $P$ is $1$ (corresponding to the constant eigenfunction), and that every eigenvalue satisfies $\abs{\lambda_k} \leq 1$. Without loss of generality, we assume \begin{equation} \lambda_1 = 1 \,, \quad \varphi_1(x) = 1 \,, \text{ for all } x \in \mathcal X\,, \quad\text{and}\quad \abs{\lambda_k} \leq \abs{\lambda_2}\,, \text{ for all } k \geq 2 \,. \end{equation}
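Numerically, a convenient way to produce this basis is to symmetrize: when $P$ is reversible, $A = D_\pi^{1/2} P D_\pi^{-1/2}$ is a symmetric matrix, and its orthonormal eigenvectors $\psi_k$ give eigenfunctions $\varphi_k = \psi_k / \sqrt{\pi}$ that are orthonormal in $\ell^2(\pi)$. A sketch, using the same illustrative chain as above:

```python
import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])

# A = D^{1/2} P D^{-1/2} is symmetric precisely because P is reversible,
# so np.linalg.eigh applies and all eigenvalues are real.
d = np.sqrt(pi)
A = d[:, None] * P / d[None, :]
lam, psi = np.linalg.eigh(A)         # eigenvalues in ascending order

phi = psi / d[:, None]               # eigenfunctions of P itself
print(lam)                           # [0.  0.5 1. ] for this chain

# Orthonormality in l^2(pi): the Gram matrix should be the identity.
gram = phi.T @ (pi[:, None] * phi)
print(np.allclose(gram, np.eye(3)))              # True
# And P phi_k = lambda_k phi_k:
print(np.allclose(P @ phi, phi * lam[None, :]))  # True
```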

Spectral gap and $\ell^2$ convergence

Definition 3. The absolute spectral gap of a reversible chain is defined to be $\gamma_* = 1- \abs{\lambda_2}$.
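Once the eigenvalues are in hand, computing $\gamma_*$ is immediate. A sketch via the symmetrization trick above, on the same toy chain:

```python
import numpy as np

def absolute_spectral_gap(P, pi):
    """gamma_* = 1 - |lambda_2|, with eigenvalues ordered by modulus."""
    d = np.sqrt(pi)
    lam = np.linalg.eigvalsh(d[:, None] * P / d[None, :])  # real spectrum
    lam = np.sort(np.abs(lam))[::-1]   # |lam_1| = 1 >= |lam_2| >= ...
    return 1.0 - lam[1]

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
print(absolute_spectral_gap(P, pi))    # 0.5 for this chain
```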

Theorem 4 ($\ell^2$ convergence).

Let $(X_n)$ be a Markov chain with an (irreducible, reversible) transition matrix $P$, and let $\mu_n = \dist(X_n)$. Then \begin{equation} \norm[\Big]{ \frac{\mu_n}{\pi} - 1}_{\ell^2(\pi)} \leq (1 - \gamma_*)^n \norm[\Big]{ \frac{\mu_0}{\pi} - 1}_{\ell^2(\pi)} \,. \end{equation}

Remark 5. Note $\norm{\mu/\pi - 1}_{\ell^2(\pi)}$ is another way to measure how far apart $\mu$ and $\pi$ are. Clearly $\norm{\mu/\pi - 1}_{\ell^2(\pi)} \geq 0$ and $\norm{\mu/\pi - 1}_{\ell^2(\pi)} = 0$ if and only if $\mu = \pi$.

Remark 6. For reversible chains the convergence rate in Theorem 4 is sharp: one cannot replace $(1 - \gamma_*)^n$ on the right hand side by anything smaller and still have the bound hold for all initial distributions.
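As a sanity check of Theorem 4, one can iterate $\mu_n = \mu_{n-1} P$ and compare $\norm{\mu_n/\pi - 1}_{\ell^2(\pi)}$ against the bound. A sketch on the same toy chain (which has $\gamma_* = 1/2$):

```python
import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
gamma = 0.5                              # absolute spectral gap of this chain

def l2_pi(mu, pi):
    """||mu/pi - 1||_{l^2(pi)}"""
    return np.sqrt(np.sum((mu / pi - 1.0) ** 2 * pi))

mu = np.array([1.0, 0.0, 0.0])           # start deterministically at state 0
rhs0 = l2_pi(mu, pi)
for n in range(1, 6):
    mu = mu @ P                          # distributions are row vectors
    print(n, l2_pi(mu, pi), (1 - gamma) ** n * rhs0)
    assert l2_pi(mu, pi) <= (1 - gamma) ** n * rhs0 + 1e-12
```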

Lemma 7. For any function $f$, \begin{equation} \norm{P^n f - \E_\pi f}_{\ell^2(\pi)} \leq (1 - \gamma_*)^n \norm{f - \E_\pi f}_{\ell^2(\pi)} = (1 - \gamma_*)^n \std_\pi(f) \,, \end{equation} where \begin{equation} \E_\pi f = \pi f = \sum_{x \in \mathcal X} f(x) \pi(x)\,, \quad\text{and}\quad \std_\pi(f) = \paren[\big]{ \E_\pi( f - \E_\pi f )^2 }^{1/2} = \norm{ f - \E_\pi f }_{\ell^2(\pi)} \,. \end{equation}

Proof sketch: Write \begin{equation} f = \sum_{k=1}^{\abs{\mathcal X}} \ip{f, \varphi_k}_\pi \varphi_k = \pi f + \sum_{k=2}^{\abs{\mathcal X}} \ip{f - \pi f, \varphi_k}_\pi \varphi_k \,, \end{equation} apply $P^n$ term by term to get $P^n f = \pi f + \sum_{k=2}^{\abs{\mathcal X}} \lambda_k^n \ip{f - \pi f, \varphi_k}_\pi \varphi_k$, and so by orthonormality \begin{equation} \norm{P^n f - \pi f}_{\ell^2(\pi)}^2 = \sum_{k=2}^{\abs{\mathcal X}} \lambda_k^{2n} \ip{f - \pi f, \varphi_k}_\pi^2 \leq (1 - \gamma_*)^{2n} \norm{f - \pi f}_{\ell^2(\pi)}^2 \,. \end{equation}
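The same kind of numerical check works for Lemma 7, now iterating on functions ($P^n f$ acts on column vectors). A sketch with the toy chain from above:

```python
import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
gamma = 0.5

f = np.array([2.0, -1.0, 3.0])
mean = pi @ f                            # E_pi f
std = np.sqrt(pi @ (f - mean) ** 2)      # std_pi(f)

g = f
for n in range(1, 6):
    g = P @ g                            # g = P^n f
    err = np.sqrt(pi @ (g - mean) ** 2)  # ||P^n f - E_pi f||_{l^2(pi)}
    assert err <= (1 - gamma) ** n * std + 1e-12
print("Lemma 7 bound verified for n = 1..5")
```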

Proof sketch of Theorem 4: For any function $f$, observe \begin{equation} \ip[\Big]{ \frac{\mu_n}{\pi} - 1, f }_\pi = \sum_{x \in \mathcal X} \mu_0(x) (P^n f(x) - \pi f) = \sum_{x \in \mathcal X} (\mu_0(x) - \pi(x)) (P^n f(x) - \pi f) = \ip[\Big]{ \frac{\mu_0}{\pi} - 1, P^n f - \pi f }_\pi \,, \end{equation} where the middle equality holds because $\pi P^n f = \pi f$, so subtracting $\pi(x)$ changes nothing. Using Lemma 7 and the Cauchy–Schwarz inequality we get \begin{equation} \ip[\Big]{ \frac{\mu_n}{\pi} - 1, f }_\pi \leq (1 - \gamma_*)^n \norm[\Big]{ \frac{\mu_0}{\pi} - 1 }_{\ell^2(\pi)} \norm[\Big]{ f - \pi f }_{\ell^2(\pi)} \,. \end{equation} Choosing $f = \mu_n / \pi - 1$ finishes the proof.

Total variation convergence

Theorem 8.

Let $(X_n)$ be a Markov chain with an (irreducible, reversible) transition matrix $P$, and let $\mu_n = \dist(X_n)$. Then \begin{equation} \norm{\mu_n - \pi}_\TV \leq \frac{(1 - \gamma_*)^n}{\sqrt{2 \pi_\min}} \,, \end{equation} where $\pi_\min = \min_{x \in \mathcal X} \pi(x)$.

Remark 9. The proof of Theorem 12.4 in LPW obtains the slightly better bound $(1 - \gamma_*)^n / (2 \sqrt{\pi_\min})$ on the right hand side instead.

Proof sketch: This follows from Theorem 4, the fact that \begin{equation} \norm{\mu_n - \pi}_{\TV} \leq \frac{1}{2} \norm[\Big]{ \frac{\mu_n}{\pi} - 1 }_{\ell^2(\pi)} \,, \end{equation} (which is Cauchy–Schwarz applied to $\norm{\mu_n - \pi}_{\TV} = \frac{1}{2} \sum_{x \in \mathcal X} \abs[\big]{ \frac{\mu_n(x)}{\pi(x)} - 1 } \pi(x)$), and the bound $\norm{\mu_0/\pi - 1}_{\ell^2(\pi)}^2 \leq 1/\pi_\min$, valid for any initial distribution $\mu_0$.
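A quick numerical check of Theorem 8, again on the toy chain used earlier (where $\gamma_* = 1/2$ and $\pi_\min = 1/4$):

```python
import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
gamma, pi_min = 0.5, pi.min()

mu = np.array([1.0, 0.0, 0.0])
for n in range(1, 6):
    mu = mu @ P
    tv = 0.5 * np.abs(mu - pi).sum()                # total variation distance
    bound = (1 - gamma) ** n / np.sqrt(2 * pi_min)  # Theorem 8
    print(n, tv, bound)
    assert tv <= bound + 1e-12
```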

Definition 10. The relaxation time of a reversible Markov chain is defined to be \begin{equation} \trel \defeq \frac{1}{\gamma_*} \,. \end{equation}

Remark 11. For any $\epsilon > 0$, to ensure $\norm{\mu_n - \pi}_\TV < \epsilon$, it suffices (using $1 - \gamma_* \leq e^{-\gamma_*}$ in Theorem 8) to choose \begin{equation} n > \trel \ln \paren[\Big]{ \frac{1}{\epsilon \sqrt{2 \pi_\min}} } \,. \end{equation}
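In code, the sufficient number of steps from Remark 11 is a one-liner; a sketch with illustrative values of $\gamma_*$, $\pi_\min$, and $\epsilon$:

```python
import math

gamma, pi_min, eps = 0.5, 0.25, 1e-3     # illustrative values only
t_rel = 1.0 / gamma                      # relaxation time

# Since (1 - gamma)^n <= exp(-gamma * n), the TV bound drops below eps once
# n > t_rel * ln(1 / (eps * sqrt(2 * pi_min))).
n = math.ceil(t_rel * math.log(1.0 / (eps * math.sqrt(2.0 * pi_min))))
print(n)                                 # 15 for these values
```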

Example 12. One example where the absolute spectral gap can be computed is the riffle shuffle. How often do you need to shuffle a deck of cards for it to be “truly random”? This can be phrased as a question about the convergence rate of a Markov chain on the permutation group $S_{52}$ (whose steps are riffle shuffles). One can show that $\gamma_* = 1/2$, so riffling the deck mixes extremely fast. A stronger statement was proved by Bayer and Diaconis, who showed that $7$ riffle shuffles suffice.
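Enumerating the chain on $S_{52}$ is out of reach, but the Gilbert–Shannon–Reeds model of a riffle (a binomial cut followed by a uniformly random interleaving) is easy to simulate on a tiny deck and already exhibits the fast mixing. A sketch; the deck size, trial count, and function name are arbitrary choices:

```python
import itertools, math, random
from collections import Counter

def gsr_riffle(deck, rng):
    """One Gilbert-Shannon-Reeds riffle: Binomial(n, 1/2) cut, then a
    uniformly random interleaving of the two packets."""
    n = len(deck)
    k = sum(rng.random() < 0.5 for _ in range(n))   # cut position
    left, right = deck[:k], deck[k:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        # Drop the next card from a packet with probability proportional
        # to its remaining size; this samples interleavings uniformly.
        if rng.random() < (len(left) - i) / (len(left) - i + len(right) - j):
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out

rng = random.Random(0)
n_cards, n_shuffles, trials = 4, 3, 200_000
counts = Counter()
for _ in range(trials):
    deck = list(range(n_cards))
    for _ in range(n_shuffles):
        deck = gsr_riffle(deck, rng)
    counts[tuple(deck)] += 1

# Empirical TV distance to the uniform distribution on S_{n_cards}.
uniform = 1 / math.factorial(n_cards)
tv = 0.5 * sum(abs(counts[p] / trials - uniform)
               for p in itertools.permutations(range(n_cards)))
print(f"TV distance after {n_shuffles} riffles of {n_cards} cards: {tv:.3f}")
```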