Markov Chains: Introduction

A Ranking Problem

Consider a tournament with $N$ teams. Due to time constraints, not all teams play each other, and each team may play a different number of games based on geographical constraints. Given the results of the tournament, how do you rank the teams?

This ranking problem shows up in many different contexts; one you are almost certainly familiar with is the PageRank algorithm, where, given a huge number of interconnected web pages, we need to rank their relative importance in order to deliver more relevant search results.

One way to solve the ranking problem is as follows:

  1. If team $x$ and team $y$ play, let $S(x, y)$ be the difference between the scores of $x$ and $y$ in the game. If $S(x, y)$ is very positive, then $x$ beat $y$ by a lot and is more likely to beat $y$ if they play again. If $S(x, y)$ is very negative, then $x$ lost by a lot and is more likely to lose to $y$ if they play again.

  2. Suppose spectators of this sport are very fickle, and switch the teams they support as a result of games. Let $P(x, y)$ be the probability that a fan of team $x$ decides to switch allegiances and become a fan of team $y$ instead. A reasonable assumption would be to choose \begin{equation} P(x, y) \propto e^{-\beta S(x, y)} \end{equation} for some constant $\beta > 0$.

  3. View the tournament through the eyes of one fickle spectator. They start as a fan of one team (say $X_0$), and spectate one of the games played by $X_0$ (chosen at random). Based on the outcome they become a fan of some other team $X_1$ (either the same team, the opponent in the game, or with some small probability, some entirely different team). Then repeat this process, retaining no memory of previous games or allegiances.

  4. Rank teams by the percentage of time a fickle spectator spends as a fan of the team.

Code to experiment with:

Here are some numerical simulations ranking teams using this algorithm. The source code is here if you want to run it and experiment.
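If you'd rather tinker with a self-contained version first, here is a minimal sketch of the algorithm in Python. The schedule, the score model, and the exact normalization of the switching probabilities are all assumptions made for illustration; only the rule $P(x, y) \propto e^{-\beta S(x, y)}$ and the time-average ranking come from the steps above.

```python
# A minimal sketch of the fickle-spectator ranking algorithm.
# The schedule and scores below are made up; only the switching rule
# P(x, y) ∝ exp(-beta * S(x, y)) comes from the notes.
import math
import random
from collections import Counter

random.seed(0)

N_TEAMS = 8
BETA = 0.1
N_STEPS = 200_000

# Hypothetical schedule: a ring of games so every team plays someone,
# plus some random extra matchups.
schedule = [(x, (x + 1) % N_TEAMS) for x in range(N_TEAMS)]
schedule += [tuple(random.sample(range(N_TEAMS), 2)) for _ in range(2 * N_TEAMS)]

# games[x] lists (opponent, S(x, opponent)) pairs, where S(x, y) is the
# score difference from x's point of view.
games = {x: [] for x in range(N_TEAMS)}
for x, y in schedule:
    s = random.randint(-20, 20)          # made-up score difference
    games[x].append((y, s))
    games[y].append((x, -s))

def step(x):
    """One move of the fickle spectator currently supporting team x."""
    y, s = random.choice(games[x])       # watch a random game x played
    # Switch to the opponent with weight exp(-beta * S(x, y)), staying
    # with weight 1 (this normalization is an arbitrary choice).
    w = math.exp(-BETA * s)
    return y if random.random() < w / (1.0 + w) else x

# Run the chain and rank teams by the fraction of time spent on each.
visits = Counter()
x = random.randrange(N_TEAMS)
for _ in range(N_STEPS):
    visits[x] += 1
    x = step(x)

for team, count in visits.most_common():
    print(f"team {team}: {count / N_STEPS:.3f}")
```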

Ranking algorithms based on some variant of this idea have been around since at least 1939. Joe Keller used one in '85 when he was unsatisfied with how baseball teams were ranked, and suggested also using it to rank authors via the citation graph. If you're interested, Keener has a nice account of several issues related to ranking in such tournaments. This is the precursor to the PageRank algorithm that was initially used by Google to rank search results.

A few questions that come to mind are:

  1. Is the final ranking unique? In other words, given the switching probabilities $P(x, y)$, will our final ranking always be the same? If you change $\beta$, you change $P(x, y)$, and we certainly expect the final rankings to change. But if you fix $\beta$, do the final rankings depend on, say, which team you started spectating? Or on which sequence of games you chose to observe? If they did, this would not be a good ranking algorithm!

  2. The percentage of time spent as a spectator of a given team is really a long time limit: \begin{equation} \pi(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \one_{\set{X_n = x}} \,. \end{equation} Are we guaranteed it exists? If it didn't, this algorithm wouldn't give any ranking at all!

We will see that the above questions translate to existence and uniqueness of the stationary distribution of a Markov Chain. We will spend the first part of this course studying Markov Chains, and the convergence to the stationary distribution.

Basic notions

Definition 1. A Markov chain on $\mathcal X$ is a family of random variables $X_0$, $X_1$, … such that for all $n \in \N$ and $x_0, \dots, x_{n+1} \in \mathcal X$ we have \begin{equation} \P(X_{n+1} = x_{n+1} \given \set{ X_k = x_k \st 0 \leq k \leq n} ) = \P(X_{n+1} = x_{n+1} \given X_n = x_n ) \,. \end{equation}

Definition 2. A time homogeneous Markov chain is a Markov chain where \begin{equation} \P( X_{n+1} = y \given X_n = x ) = \P( X_1 = y \given X_0 = x )\,, \end{equation} for all $x, y \in \mathcal X$ and $n \in \N$.

Example 3 (Simple random walk). Let $\mathcal X = \Z$, $\xi_n$ be i.i.d. $\pm 1$ valued random variables (coin flips) and set $X_{n+1} = X_n + \xi_{n+1}$.
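For concreteness, here is a simulation of a sample path (Python, purely illustrative):

```python
# Simulate a sample path of the simple random walk of Example 3.
import random

random.seed(1)

x = 0                              # X_0 = 0
path = [x]
for n in range(10):
    xi = random.choice([-1, 1])    # xi_{n+1}: an i.i.d. coin flip
    x += xi                        # X_{n+1} = X_n + xi_{n+1}
    path.append(x)

print(path)                        # one sample path X_0, ..., X_10
```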

Example 4 (Random surfer). Suppose $G = (V, E)$ is a directed graph, for instance the network of web pages. Consider the following Markov chain: from any node $x \in V$, jump to a direct successor chosen uniformly at random.

In practice, the random surfer chain is not good: it gets stuck at dead ends, and it cannot escape loops. A better version is the PageRank chain.

Example 5 (PageRank chain). Suppose $G = (V, E)$ is a directed graph, let $\alpha \in (0, 1)$, and consider the following Markov chain: flip an (independent) coin that lands heads with probability $\alpha$. If the coin lands heads and the current vertex has direct successors, jump to a direct successor chosen uniformly at random. Otherwise, jump to a vertex in $V$ chosen uniformly at random.
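Here is a sketch of one step of this chain in Python; the graph and the value of $\alpha$ below are made up for illustration. Note that the dead end is no longer a trap.

```python
# One step of the PageRank chain of Example 5 on a small made-up graph.
import random

random.seed(2)

ALPHA = 0.85
# successors[v] lists the direct successors of vertex v; 3 is a dead end.
successors = {0: [1, 2], 1: [2], 2: [0], 3: []}
V = list(successors)

def pagerank_step(x):
    """Heads (prob. ALPHA) and x has successors: follow a random outgoing
    edge. Otherwise: jump to a uniformly random vertex of V."""
    if random.random() < ALPHA and successors[x]:
        return random.choice(successors[x])
    return random.choice(V)

# Even started at the dead end 3, the chain keeps moving.
x, traj = 3, []
for _ in range(10):
    x = pagerank_step(x)
    traj.append(x)
print(traj)
```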

Definition 6. The transition matrix of a (time homogeneous) Markov chain is \begin{equation} P(x, y) = \P( X_1 = y \given X_0 = x )\,. \end{equation}

Remark 7. Notice $P(x, y) \geq 0$ and $\sum_y P(x, y) = 1$. Such matrices are called stochastic matrices.

Lemma 8. For any $x_0$, …, $x_n \in \mathcal X$, \begin{equation} \P(X_n = x_n, \dots, X_1 = x_1 \given X_0 = x_0) = P(x_0, x_1) P(x_1, x_2) \cdots P(x_{n-1}, x_n) \,. \end{equation}

Proposition 9. For any $n \in \N$, and any $x, y \in \mathcal X$ we have \begin{equation} \P( X_n = y \given X_0 = x ) = P^n(x, y)\,, \end{equation} where $P^n$ means the $n^\text{th}$ power of the matrix $P$.

Proof. Done in class.

Proposition 10. If $\dist(X_0) = \mu_0$ (i.e. $\P(X_0 = x) = \mu_0(x)$ for all $x \in \mathcal X$), then $\dist(X_n) = \mu_0 P^n$ (as a matrix product).

Proof. Done in class.
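A quick numerical illustration of Propositions 9 and 10 (a sketch, not part of the notes): we assemble the transition matrix of the PageRank chain of Example 5 on the small graph used earlier, check that it is stochastic, and compute $\dist(X_{10}) = \mu_0 P^{10}$.

```python
# dist(X_n) = mu_0 P^n, illustrated on the PageRank chain of Example 5.
import numpy as np

alpha = 0.85
succ = {0: [1, 2], 1: [2], 2: [0], 3: []}     # 3 is a dead end
n = len(succ)

P = np.zeros((n, n))
for x in range(n):
    P[x, :] += (1 - alpha) / n                # uniform jump w.p. 1 - alpha
    if succ[x]:
        for y in succ[x]:
            P[x, y] += alpha / len(succ[x])   # follow a random edge
    else:
        P[x, :] += alpha / n                  # dead end: jump uniformly

assert np.allclose(P.sum(axis=1), 1)          # P is a stochastic matrix

mu0 = np.array([1.0, 0.0, 0.0, 0.0])          # start deterministically at 0
print(mu0 @ np.linalg.matrix_power(P, 10))    # dist(X_10) = mu_0 P^10
```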

Hitting times and Gambler's Ruin

Suppose a gambler plays a game at a casino. He bets a dollar, and with probability $1/2$ he gets 2 dollars (so wins $1$ dollar total), and with probability $1/2$ he gets nothing and loses his bet. When he runs out of money, he is ruined and leaves.

The game is clearly fair. What is the probability he gets ruined? It turns out that he gets ruined with probability $1$, even though the game is fair.

We study this in a little more generality below.

Definition 11. Let $G = (V, E)$ be a graph, and let $(X_n)$ be a Markov chain on $G$ with transition kernel $P$. Let $S \subseteq V$ be some subset of vertices. We define the hitting time of $S$, denoted by $\tau_S$, by \begin{equation} \tau_S = \min \set{n \in \N \st X_n \in S} \,. \end{equation}

Remark 12. Notice $\tau_S$ is random. Moreover, $\tau_S$ has the property that for every $n \in \N$, the event $\set{\tau_S = n}$ can be described using only $X_0$, …, $X_n$, without relying on the values of $X_{n+1}$, $X_{n+2}$, etc. For this reason $\tau_S$ is called a stopping time.
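Since $\tau_S$ is a random variable, the most concrete way to get a feel for it is to sample it. Here is a small illustrative simulation (the chain and all parameters are arbitrary choices): the nearest neighbor walk on $\set{0, \dots, N}$ with $S = \set{0, N}$, which is exactly the setting of the gambler's ruin examples below.

```python
# Sample the hitting time tau_S of S = {0, N} for the simple random walk.
import random

random.seed(3)

def hitting_time(k, N):
    """Steps until the walk started at k first hits {0, N}."""
    x, n = k, 0
    while x not in (0, N):
        x += random.choice([-1, 1])
        n += 1
    return n

samples = [hitting_time(k=3, N=10) for _ in range(10_000)]
print(sum(samples) / len(samples))   # compare with k(N - k) = 21, Example 20
```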

The (discrete) Dirichlet Problem

Proposition 13. Suppose $\P( \tau_S < \infty) = 1$. Given a function $f \colon S \to \R$, define \begin{equation} u(x) \defeq \E^x f(X_{\tau_S}) = \E \paren{ f(X_{\tau_S}) \given X_0 = x } \,. \end{equation} Then \begin{align} (I - P) u &= 0 &&\text{in } V - S\,, \\ u &= f && \text{in } S \,. \end{align}

Proof: Done in class.

Remark 14. Here $Pg$ is defined by \begin{equation} Pg(x) = \sum_{y \in \mathcal X} P(x, y) g(y) \end{equation} and $I$ is the identity operator. Since $\sum_y P(x, y) = 1$, notice \begin{equation} (I - P) g(x) = g(x) - \sum_{y \in \mathcal X} P(x, y) g(y) = \sum_{y \in \mathcal X} P(x, y) (g(x) - g(y)) \,. \end{equation} When $P$ is the transition kernel of the nearest neighbor random walk, then $I - P = D^{-1} L$, where $L$ is the graph Laplacian and $D$ is the degree matrix.

Example 15 (Gambler's Ruin). Let $p, q \in (0, 1)$ with $p + q = 1$. A gambler plays a game where they win \$1 with probability $p$, and lose \$1 with probability $q$. They start with a fortune of \$$k$. If their fortune reaches \$$N$ or they lose all their money, they stop playing. The probability they go bankrupt is \begin{equation} \begin{cases} \displaystyle \frac{\paren{\frac{q}{p}}^k - \paren{\frac{q}{p}}^N} {1 - \paren{\frac{q}{p}}^N} &\text{if } p \neq q\,, \\ \displaystyle 1 - \frac{k}{N} & \text{if } p = q = \frac{1}{2} \,. \end{cases} \end{equation}

Proof: Consider the graph on $\set{0, \dots, N}$ which connects nearest neighbors, set $S = \set{0, N}$, and define $f \colon S \to \R$ by $f(x) = 1$ if $x = 0$ and $0$ otherwise. The probability of going bankrupt is given by \begin{equation} u(k) = \E^k f(X_{\tau_S}) \,, \end{equation} and so by Proposition 13 we see \begin{equation} u(0) = 1\,,\quad u(N) = 0\,, \end{equation} and for all $x \in \set{1, \dots, N - 1}$, \begin{equation} u(x) = p u(x + 1) + q u(x - 1) \,. \end{equation} Solving this recurrence relation gives the desired result.
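As a sanity check (not part of the notes), one can also solve the discrete Dirichlet problem of Proposition 13 numerically and compare with the closed form; the values of $N$, $p$ and $k$ below are arbitrary.

```python
# Solve (I - P) u = 0 in {1, ..., N-1} with u(0) = 1, u(N) = 0, and
# compare with the closed-form ruin probability of Example 15.
import numpy as np

N, p = 10, 0.6
q = 1 - p

# Interior unknowns u(1), ..., u(N-1); the boundary values move to the
# right-hand side of the linear system.
n = N - 1
A = np.eye(n)
b = np.zeros(n)
for i in range(n):
    x = i + 1
    if x + 1 <= N - 1:
        A[i, i + 1] -= p          # -P(x, x+1) u(x+1)
    # else: the boundary term p * u(N) = 0 contributes nothing
    if x - 1 >= 1:
        A[i, i - 1] -= q          # -P(x, x-1) u(x-1)
    else:
        b[i] += q                 # boundary term q * u(0) = q

u = np.linalg.solve(A, b)

k = 3
exact = ((q / p) ** k - (q / p) ** N) / (1 - (q / p) ** N)
print(u[k - 1], exact)            # the two numbers should agree
```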

Remark 16. If we fix $k$ and send $N \to \infty$, we see the probability of going bankrupt is $1$ if $p \leq q$, and $(q/p)^k$ if $p > q$.

The (discrete) Poisson problem

Suppose now we play a game, but instead of collecting a reward when we stop playing, we collect a reward (or pay a cost) every time we play. As before, consider a Markov chain on a graph $G = (V, E)$, let $S \subseteq V$ and let $g \colon V - S \to \R$ be a given reward function. Let $\tau_S$ be the hitting time of $S$. The reward (or cost) we collect when playing this game is exactly \begin{equation} \sum_{n = 0}^{\tau_S - 1} g(X_n) \,. \end{equation} Notice that the sum has a random number of terms, as $\tau_S$ is a random variable.

It turns out that the expected reward satisfies a nice equation, similar to Proposition 13.

Proposition 17. Define \begin{equation} v(x) \defeq \E^x \paren[\Big]{ \sum_{n = 0}^{\tau_S - 1} g(X_n) } = \E\paren[\Big]{ \sum_{n = 0}^{\tau_S - 1} g(X_n) \given[\Big] X_0 = x } \,. \end{equation} Then \begin{align} (I - P) v &= g &&\text{in } V - S\,, \\ v &= 0 && \text{in } S \,. \end{align}

Remark 18. Note that for $x \in S$ we have $\tau_S = 0$, and so $v(x) = \sum_{n = 0}^{-1} g(X_n)$ is the empty sum, which by definition is $0$.

Proof: The proof is similar to the proof of Proposition 13. The main difference is to use time homogeneity and the Markov property to show that for any $x \not\in S$, we have the identity $$ \E \paren[\Big]{ \sum_{n = 1}^{\tau_S - 1} g(X_n) ~\Big|~ X_1 = y, X_0 = x } = \E \paren[\Big]{ \sum_{n = 0}^{\tau_S - 1} g(X_n) ~\Big|~ X_0 = y } \,. $$ Now reason as in the proof of Proposition 13 to conclude.

Corollary 19. The function $v(x) = \E^x \tau_S$ satisfies \begin{align} (I - P) v &= 1 &&\text{in } V - S\,, \\ v &= 0 && \text{in } S \,. \end{align}

Proof: Put $g = 1$ in Proposition 17.

Example 20 (Ruin time). Consider the gambler's ruin example, and let $\tau$ be the time at which the gambler stops playing. If $p = q = 1/2$, then \begin{equation} \E^k \tau = k ( N - k) \,. \end{equation} If $p \neq q$, then \begin{equation} \E^k \tau = \frac{k}{q - p} - \frac{N}{q-p} \paren[\Big]{ \frac{1 - (\frac{q}{p})^k}{1 - (\frac{q}{p})^N} } \,. \end{equation}

Remark 21. Here $\E^k \tau = \E( \tau \given X_0 = k) = \E \tau(k)$.

Remark 22. If $p < q$, then as $N \to \infty$ we see $\E^k \tau \to \frac{k}{q-p}$. On the other hand, if $p \geq q$, then as $N \to \infty$ we see $\E^k \tau \to \infty$. Thus, if you're playing a fair game ($p = q = 1/2$), you will almost surely get ruined; however, on average it will take infinitely long for you to get ruined, no matter how large or small your initial fortune was.

Proof of Example 20: When $p \neq q$, the function $f(k) = \frac{k}{q-p}$ solves $(I - P) f = 1$, and the functions $g_1(k) = (\frac{q}{p})^k$ and $g_2(k) = 1$ solve $(I - P) g_1 = (I - P)g_2 = 0$. So write \begin{equation} v(k) = \E^k \tau = f(k) + a g_1(k) + b g_2(k) \end{equation} and solve for $a, b$ so that the boundary conditions $v(0) = v(N) = 0$ are satisfied.

When $p = q$ the same strategy works, except we use the functions $f(k) = -k^2$, $g_1(k) = k$ and $g_2(k) = 1$ instead.
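As a final sanity check (again a sketch with arbitrary parameters), one can solve $(I - P) v = 1$ from Corollary 19 numerically and compare with $\E^k \tau = k(N - k)$; the Monte Carlo estimate from the hitting time simulation earlier should also be close.

```python
# Solve (I - P) v = 1 in {1, ..., N-1} with v(0) = v(N) = 0, and compare
# with the fair-game ruin time k (N - k) of Example 20.
import numpy as np

N = 10
p = q = 0.5

n = N - 1                          # interior unknowns v(1), ..., v(N-1)
A = np.eye(n)
for i in range(n):
    if i + 1 < n:
        A[i, i + 1] -= p           # -P(x, x+1); the term v(N) = 0 drops out
    if i - 1 >= 0:
        A[i, i - 1] -= q           # -P(x, x-1); the term v(0) = 0 drops out

v = np.linalg.solve(A, np.ones(n))

for k in (1, 3, 5):
    print(v[k - 1], k * (N - k))   # the two columns should match
```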