Convergence of Reversible Chains

The convergence of reversible chains is completely determined by the spectrum of the transition matrix. This is good because we obtain sharp upper and lower bounds. This is bad because the spectrum (or, more precisely, the spectral gap) is not easy to estimate.

Eigenfunction decomposition

Suppose $P$ is a reversible irreducible transition matrix with stationary distribution $\pi$.

Definition 1. For $f, g \colon \mathcal X \to \R$ define the inner product \begin{equation} \ip{f, g}_\pi = \sum_{x \in \mathcal X} f(x) g(x) \pi(x) \,, \end{equation} and the norm \begin{equation} \norm{f}_{\ell_2(\pi)} \defeq \sqrt{ \ip{f, f}_\pi } = \paren[\Big]{ \sum_{x \in \mathcal X} f(x)^2 \pi(x) }^{1/2} \,. \end{equation}

Proposition 2. If $P$ is reversible then $P$ is self-adjoint with respect to $\ip{\cdot, \cdot}_\pi$. That is, for every $f, g \colon \mathcal X \to \R$ we have \begin{equation} \ip{f, Pg}_\pi = \ip{Pf, g}_\pi \,. \end{equation}
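Proposition 2 is easy to check numerically. The sketch below uses a small lazy random walk on a three-point path as a stand-in example (the chain and its stationary distribution are illustrative assumptions, not from these notes) and verifies $\ip{f, Pg}_\pi = \ip{Pf, g}_\pi$ for random $f$ and $g$.

```python
import numpy as np

# Illustrative reversible chain (an assumption): lazy simple random walk
# on the path {0, 1, 2}, with stationary distribution pi.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])

def ip(f, g):
    # <f, g>_pi = sum_x f(x) g(x) pi(x), as in Definition 1
    return float(np.sum(f * g * pi))

# Pg is the function x -> sum_y P(x, y) g(y), i.e. a matrix-vector product.
rng = np.random.default_rng(0)
f, g = rng.standard_normal(3), rng.standard_normal(3)
lhs, rhs = ip(f, P @ g), ip(P @ f, g)
```

Detailed balance ($\pi(x) P(x,y) = \pi(y) P(y,x)$) is exactly the statement that the matrix $\pi(x) P(x,y)$ is symmetric, which is what makes the two sides agree.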

The spectral theorem now guarantees there exists an orthonormal basis of (real) eigenfunctions. That is, there exist $\lambda_1$, …, $\lambda_{\abs{\mathcal X}} \in \R$ and corresponding eigenfunctions $\varphi_1$, …, $\varphi_{\abs{\mathcal X}}$ such that \begin{equation} P \varphi_k = \lambda_k \varphi_k \quad\text{and}\quad \ip{\varphi_j, \varphi_k}_\pi = \begin{cases} 0 & j \neq k\,, \\ 1 & j = k \,. \end{cases} \end{equation}
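One way to compute such a basis in practice: when $P$ is reversible, the matrix $A = D^{1/2} P D^{-1/2}$ with $D = \mathrm{diag}(\pi)$ is symmetric, so a symmetric eigensolver yields real eigenvalues and orthonormal eigenvectors, which pull back to eigenfunctions of $P$ orthonormal in $\ell^2(\pi)$. A sketch, again on a toy three-state chain (an illustrative assumption):

```python
import numpy as np

# Toy reversible chain (illustrative assumption): lazy walk on a 3-path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])

s = np.sqrt(pi)
A = (s[:, None] * P) / s[None, :]   # A = D^{1/2} P D^{-1/2}; symmetric iff P reversible
lam, psi = np.linalg.eigh(A)        # real eigenvalues, orthonormal columns psi_k
phi = psi / s[:, None]              # phi_k = D^{-1/2} psi_k: eigenfunctions of P

# Now P phi_k = lam_k phi_k, and <phi_j, phi_k>_pi = delta_{jk}.
```

Since $\psi_j^\top \psi_k = \delta_{jk}$ and $\ip{\varphi_j, \varphi_k}_\pi = \psi_j^\top \psi_k$, the columns of `phi` are orthonormal in $\ell^2(\pi)$ as required.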

Proposition 3. Exactly one eigenvalue of $P$ is $1$, corresponding to the constant eigenfunction. Moreover, all eigenvalues of $P$ have absolute value at most $1$.

Proof: We saw both of these in HW 4. A quick reminder of the proof is as follows.

  1. Since $P$ is stochastic, clearly the constant function $1$ is an eigenfunction with eigenvalue $1$. Moreover if $P \varphi = \varphi$, then let $x_\max$ be the point where $\varphi$ attains its maximum. Then \begin{equation} \varphi(x_\max) = \sum_y P(x_\max, y) \varphi(y) \leq \varphi(x_\max) \,. \end{equation} This forces $\varphi(y) = \varphi(x_\max)$ for every $y$ for which $P(x_\max, y) > 0$. Repeating the argument at each such $y$, irreducibility implies $\varphi$ is a constant function, showing only one eigenvalue of $P$ is $1$.

  2. Let $\norm{f}_{\ell^\infty} = \max_x \abs{f(x)}$. If $P\varphi = \lambda \varphi$, we note $\abs{\lambda} \norm{\varphi}_{\ell^\infty} = \norm{P \varphi}_{\ell^\infty} \leq \norm{\varphi}_{\ell^\infty}$, where the inequality holds because each row of $P$ is a probability vector. Since $\varphi \neq 0$, this gives $\abs{\lambda} \leq 1$.
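Both claims in Proposition 3 can be confirmed numerically on a small example (the toy three-state lazy walk below is an illustrative assumption):

```python
import numpy as np

# Toy reversible chain (illustrative assumption): lazy walk on a 3-path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

lam, V = np.linalg.eig(P)
# All eigenvalues lie in [-1, 1], and 1 is a simple eigenvalue ...
top = np.isclose(lam, 1.0)
phi1 = V[:, top].ravel()
# ... whose eigenfunction is constant (each entry of phi1 is equal).
```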

Without loss of generality, we now assume \begin{equation} \lambda_1 = 1 \,, \quad \varphi_1(x) = 1 \,, \text{ for all } x \in \mathcal X\,, \quad\text{and}\quad \abs{\lambda_k} \leq \abs{\lambda_2}\,, \text{ for all } k \geq 2 \,. \end{equation}

Spectral gap and $\ell^2$ convergence

Definition 4. The absolute spectral gap of a reversible chain is defined to be $\gamma_* = 1- \abs{\lambda_2}$.
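With the eigenvalues in hand, $\gamma_*$ is the gap between $1$ and the second-largest eigenvalue in absolute value. For the toy three-state lazy walk used above (an illustrative assumption) the eigenvalues are $1$, $1/2$, $0$, so $\gamma_* = 1/2$:

```python
import numpy as np

# Toy reversible chain (illustrative assumption): lazy walk on a 3-path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Sort |eigenvalues| in decreasing order: 1 = |lam_1| >= |lam_2| >= ...
abs_lam = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
gamma_star = 1.0 - abs_lam[1]        # absolute spectral gap, gamma_* = 1 - |lam_2|
```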

Theorem 5 ($\ell^2$ convergence).

Let $(X_n)$ be a Markov chain with an (irreducible, reversible) transition matrix $P$, and let $\mu_n = \dist(X_n)$. Then \begin{equation} \norm[\Big]{ \frac{\mu_n}{\pi} - 1}_{\ell^2(\pi)} \leq (1 - \gamma_*)^n \norm[\Big]{ \frac{\mu_0}{\pi} - 1}_{\ell^2(\pi)} \,. \end{equation}
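Theorem 5 can be sanity-checked numerically: start the toy three-state lazy walk from before (an illustrative assumption) at a point mass and compare $\norm{\mu_n/\pi - 1}_{\ell^2(\pi)}$ against the right-hand side at each step.

```python
import numpy as np

# Toy reversible chain (illustrative assumption): lazy walk on a 3-path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
gamma_star = 0.5                      # computed from the spectrum of P

def l2pi(v):
    # norm in l^2(pi)
    return float(np.sqrt(np.sum(v**2 * pi)))

mu = np.array([1.0, 0.0, 0.0])        # mu_0 = point mass at state 0
d0 = l2pi(mu / pi - 1)
lhs, rhs = [], []
for n in range(1, 15):
    mu = mu @ P                       # mu_n = mu_{n-1} P
    lhs.append(l2pi(mu / pi - 1))
    rhs.append((1 - gamma_star)**n * d0)
```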

Remark 6. Note $\norm{\mu/\pi - 1}_{\ell^2(\pi)}$ is another way to measure how far apart $\mu$ and $\pi$ are. Clearly $\norm{\mu/\pi - 1}_{\ell^2(\pi)} \geq 0$ and $\norm{\mu/\pi - 1}_{\ell^2(\pi)} = 0$ if and only if $\mu = \pi$.

Remark 7. For reversible chains the convergence rate in Theorem 5 is sharp: no initial distribution can converge faster than the right-hand side.

In order to use Theorem 5, one needs to guarantee $\gamma_* > 0$. Here are two sufficient conditions.

Proposition 8. If $P$ is reversible, irreducible and aperiodic, then the absolute spectral gap $\gamma_* > 0$.

Proposition 9. Suppose $P$ is reversible and irreducible. Let $Q = (I + P)/2$ be the transition matrix of the lazy chain. The absolute spectral gap of $Q$ is strictly positive.

Proof: $\lambda$ is an eigenvalue of $P$ if and only if $(1 + \lambda)/2$ is an eigenvalue of $Q$. Since every eigenvalue of $P$ lies in $[-1, 1]$, every eigenvalue of $Q$ lies in $[0, 1]$. Moreover $(1 + \lambda)/2 = 1$ only when $\lambda = 1$, which by Proposition 3 is a simple eigenvalue. Hence $\abs{\lambda_2(Q)} < 1$, and the absolute spectral gap of $Q$ is strictly positive.
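The eigenvalue correspondence follows since $Q \varphi = \frac12(\varphi + P\varphi)$, and can be checked numerically on the same illustrative toy chain:

```python
import numpy as np

# Toy reversible chain (illustrative assumption): lazy walk on a 3-path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
Q = (np.eye(3) + P) / 2              # transition matrix of the lazy chain

lam_P = np.sort(np.linalg.eigvals(P).real)
lam_Q = np.sort(np.linalg.eigvals(Q).real)
# Eigenvalues of Q are (1 + lambda)/2, hence they lie in [0, 1].
```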

Lemma 10. For any function $f$, \begin{equation} \norm{P^n f - \E_\pi f}_{\ell_2(\pi)} \leq (1 - \gamma_*)^n \norm{f - \E_\pi f}_{\ell_2(\pi)} = (1 - \gamma_*)^n \std_\pi(f) \,, \end{equation} where \begin{equation} \E_\pi f = \pi f = \sum_{x \in \mathcal X} f(x) \pi(x)\,, \quad\text{and}\quad \std_\pi(f) = \paren[\big]{ \E_\pi( f - \E_\pi f )^2 }^{1/2} = \norm{ f - \E_\pi f }_{\ell_2(\pi)} \,. \end{equation}

Proof sketch: Write \begin{equation} f = \sum_{k=1}^{\abs{\mathcal X}} \ip{f, \varphi_k}_{\ell^2(\pi)} \varphi_k = \pi f + \sum_{k=2}^{\abs{\mathcal X}} \ip{f - \pi f, \varphi_k}_{\ell^2(\pi)} \varphi_k \,, \end{equation} and so, since $\abs{\lambda_k} \leq 1 - \gamma_*$ for all $k \geq 2$, \begin{equation} \norm{P^n f - \pi f}_{\ell^2(\pi)}^2 = \sum_{k=2}^{\abs{\mathcal X}} \lambda_k^{2n} \ip{f - \pi f, \varphi_k}_{\ell^2(\pi)}^2 \leq (1 - \gamma_*)^{2n} \norm{f - \pi f}_{\ell^2(\pi)}^2 \,. \end{equation}
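A quick numerical check of Lemma 10 on the same toy chain (an illustrative assumption), for a randomly chosen $f$:

```python
import numpy as np

# Toy reversible chain (illustrative assumption): lazy walk on a 3-path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
gamma_star = 0.5

def l2pi(v):
    return float(np.sqrt(np.sum(v**2 * pi)))

rng = np.random.default_rng(1)
f = rng.standard_normal(3)
mean = float(pi @ f)                  # E_pi f
ok = True
Pnf = f.copy()
for n in range(1, 12):
    Pnf = P @ Pnf                     # P^n f
    ok = ok and l2pi(Pnf - mean) <= (1 - gamma_star)**n * l2pi(f - mean) + 1e-12
```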

Proof sketch of Theorem 5: For any function $f$, observe \begin{equation} \ip[\Big]{ \frac{\mu_n}{\pi} - 1, f }_\pi = \sum_{x \in \mathcal X} \mu_0(x) (P^n f(x) - \pi f) = \sum_{x \in \mathcal X} (\mu_0(x) - \pi(x)) (P^n f(x) - \pi f) = \ip[\Big]{ \frac{\mu_0}{\pi} - 1, P^n f - \pi f }_\pi \,. \end{equation} Using Lemma 10 and the Cauchy–Schwarz inequality we get \begin{equation} \ip[\Big]{ \frac{\mu_n}{\pi} - 1, f }_\pi \leq (1 - \gamma_*)^n \norm[\Big]{ \frac{\mu_0}{\pi} - 1 }_{\ell^2(\pi)} \norm[\Big]{ f - \pi f }_{\ell^2(\pi)} \,. \end{equation} Choosing $f = \mu_n / \pi - 1$ (for which $\pi f = 0$, so $\norm{f - \pi f}_{\ell^2(\pi)} = \norm{f}_{\ell^2(\pi)}$) and dividing both sides by $\norm{f}_{\ell^2(\pi)}$ finishes the proof.

Total variation convergence

Theorem 11.

Let $(X_n)$ be a Markov chain with an (irreducible, reversible) transition matrix $P$, and let $\mu_n = \dist(X_n)$. Then \begin{equation} \norm{\mu_n - \pi}_\TV \leq \frac{(1 - \gamma_*)^n}{\sqrt{2 \pi_\min}} \,, \end{equation} where $\pi_\min = \min_{x \in \mathcal X} \pi(x)$.

Remark 12. Using the eigenfunction representation of $P(x, y)$ you can improve this bound by an unimportant factor of $1/\sqrt{2}$.

Proof sketch: This follows from Theorem 5 and the fact that \begin{equation} \norm{\mu_n - \pi}_{\TV} \leq \frac{1}{2} \norm[\Big]{ \frac{\mu_n}{\pi} - 1 }_{\ell^2(\pi)} \,, \end{equation} together with the bound $\norm{\mu_0/\pi - 1}_{\ell^2(\pi)}^2 = \sum_{x \in \mathcal X} \mu_0(x)^2 / \pi(x) - 1 \leq 1/\pi_\min$.
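Both inequalities can be checked at once numerically on the toy chain from before (an illustrative assumption), computing the total variation norm as $\frac12 \sum_x \abs{\mu_n(x) - \pi(x)}$:

```python
import numpy as np

# Toy reversible chain (illustrative assumption): lazy walk on a 3-path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
gamma_star, pi_min = 0.5, pi.min()

mu = np.array([1.0, 0.0, 0.0])        # start from a point mass
tv, l2, bound = [], [], []
for n in range(1, 12):
    mu = mu @ P
    tv.append(0.5 * float(np.abs(mu - pi).sum()))            # TV distance
    l2.append(float(np.sqrt(np.sum((mu / pi - 1)**2 * pi))))  # l^2(pi) distance
    bound.append((1 - gamma_star)**n / np.sqrt(2 * pi_min))   # Theorem 11 bound
```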

Definition 13. The relaxation time of a reversible Markov chain is defined to be \begin{equation} \trel \defeq \frac{1}{\gamma_*} \,. \end{equation}

Remark 14. For any $\epsilon > 0$, to ensure $\norm{\mu_n - \pi}_\TV < \epsilon$, it suffices to choose \begin{equation} n > \trel \ln \paren[\Big]{ \frac{1}{\epsilon \sqrt{2 \pi_\min}} } \,. \end{equation} Indeed, since $1 - \gamma_* \leq e^{-\gamma_*}$, Theorem 11 gives $\norm{\mu_n - \pi}_\TV \leq e^{-n \gamma_*} / \sqrt{2 \pi_\min} < \epsilon$ for such $n$.
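For the toy chain used throughout (an illustrative assumption) we have $\trel = 2$; the sketch below computes the $n$ from Remark 14 for an assumed target accuracy $\epsilon = 0.01$ and confirms that the chain is within $\epsilon$ of $\pi$ in total variation after $n$ steps.

```python
import numpy as np

# Toy reversible chain (illustrative assumption): lazy walk on a 3-path.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
gamma_star, pi_min = 0.5, pi.min()
t_rel = 1 / gamma_star                # relaxation time, Definition 13

eps = 0.01                            # assumed target accuracy
n = int(np.ceil(t_rel * np.log(1 / (eps * np.sqrt(2 * pi_min)))))

mu = np.array([1.0, 0.0, 0.0])        # worst case: start from a point mass
mu_n = mu @ np.linalg.matrix_power(P, n)
tv = 0.5 * float(np.abs(mu_n - pi).sum())   # should be below eps
```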

Example 15. One example where the absolute spectral gap can be computed is for riffle shuffles. How often do you need to shuffle a deck of cards for it to be “truly random”? This can be phrased as a question about the convergence rate of a Markov chain on the permutation group $S_{52}$ (whose steps represent card shuffles). One can show that $\gamma_* = 1/2$, so riffling the deck mixes extremely fast. A stronger statement was proved by Bayer and Diaconis, showing that $7$ riffle shuffles are enough.