Assignment 6
Assigned 2025-10-08, due 2025-10-15
Question 1
Let $\pi$ be a probability distribution on $\mathcal X$, and let $A \subseteq \mathcal X$ be an event whose probability we want to estimate. Let $X_1$, …, $X_N$ be i.i.d. samples from $\pi$, and define \begin{equation} \hat \P_N(A) = \frac{1}{N} \sum_{n=1}^N \one_A(X_n) \quad\text{and}\quad \mathcal E_N = \frac{1}{\P(A)}\sqrt{\E (\hat \P_N(A) - \P(A))^2} \,. \end{equation} Here $\hat \P_N(A)$ is an estimator for $\P(A)$, and $\mathcal E_N$ is its relative error. Given $\epsilon > 0$, find $N$ (in terms of $\P(A)$ and $\epsilon$) so that we necessarily have $\mathcal E_N < \epsilon$.
Note: Here $\P(A) = \sum_{a \in A} \pi(a)$. People often use the Monte Carlo method to estimate probabilities of events. When the events are rare, using a vanilla version of this method is a bad idea, and this question quantifies why. One technique that's often used to improve results is importance sampling. Rare event sampling is a huge subject, and there are many other sophisticated methods.
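For intuition (not part of the assignment), here is a minimal sketch contrasting the two approaches on the Gaussian tail event $A = \set{X > 5}$; the shifted proposal $\mathcal N(5, 1)$ is just one illustrative choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 10**6
p_true = stats.norm.sf(5.0)  # P(X > 5) for X ~ N(0,1), about 2.87e-7

# Vanilla Monte Carlo: with N = 10^6 we expect only ~0.3 samples in A,
# so the estimate is usually 0 and the relative error is terrible.
x = rng.standard_normal(N)
p_mc = np.mean(x > 5.0)

# Importance sampling: draw from the shifted proposal N(5, 1) and
# reweight each sample by the likelihood ratio pi(y)/q(y).
y = rng.normal(5.0, 1.0, N)
w = stats.norm.pdf(y) / stats.norm.pdf(y, loc=5.0)
p_is = np.mean((y > 5.0) * w)

for name, p in [("vanilla", p_mc), ("importance", p_is)]:
    print(f"{name}: {p:.3e}, rel err {abs(p - p_true) / p_true:.2%}")
```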
Question 2
Let $B(0, 1) = \set{x \in \R^d \st \abs{x} < 1}$, and let \begin{equation} V_d = \abs{B(0,1)} = \int_{B(0, 1)} 1 \, dx \end{equation} denote the volume. (Note this is a $d$-dimensional integral).
- Write code to approximate $V_d$ for $d = 10$ and $N = 10^6$ using a Monte Carlo method with a relative error of at most $5\%$.

  Hint: I suggest using an algorithm based on this theorem, taking $p$ to be the uniform distribution on $[-1, 1]^d$. Note that in this case $V_d$ is known exactly, so this question gives you a test case where you can easily check whether your code (and algorithm) is correct. In Python the exact value of $V_d$ can be obtained using `pi**(d/2) / sp.special.gamma(d/2 + 1)`; use this to compute the relative error (see the sketch after this question).
- Write code to approximate $V_d$ for $d = 50$ and $N = 10^6$ using a Monte Carlo method with a relative error of at most $1\%$.

  Hint: The approach suggested in the previous part will fail miserably here because of the curse of dimensionality (it's worth trying it to see this for yourself). I suggest using importance sampling with an appropriately chosen distribution that the standard library can already sample from (again, see the sketch after this question).
Turn in a screenshot of the relevant portion of your code, and the output.
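For concreteness, here is a minimal sketch of one way both parts could go. It assumes only numpy and scipy; for the $d = 50$ part it uses a spherical Gaussian proposal whose scale $1/\sqrt{d}$ (chosen so samples concentrate near the unit sphere) is one reasonable choice among many.

```python
import numpy as np
from scipy import special

rng = np.random.default_rng(0)
N = 10**6

def exact_volume(d):
    # exact value, used only to check the relative error
    return np.pi**(d / 2) / special.gamma(d / 2 + 1)

# Part 1 (d = 10): vanilla Monte Carlo with p uniform on [-1, 1]^d,
# using V_d = 2^d * P(|X| < 1) for X uniform on [-1, 1]^d.
d = 10
x = rng.uniform(-1.0, 1.0, size=(N, d))
v_hat = 2.0**d * np.mean((x**2).sum(axis=1) < 1.0)
print(f"d={d}: {v_hat:.4f}  rel err {abs(v_hat - exact_volume(d)) / exact_volume(d):.2%}")

# Part 2 (d = 50): importance sampling with q = N(0, s^2 I), so that
# V_d = E_q[ 1_{|Y| < 1} / q(Y) ].  Weights are handled in log space
# to avoid overflow/underflow.  (Generate in batches if memory is tight.)
d = 50
s = 1.0 / np.sqrt(d)
y = rng.normal(0.0, s, size=(N, d))
r2 = (y**2).sum(axis=1)
log_q = -r2 / (2 * s**2) - (d / 2) * np.log(2 * np.pi * s**2)
inside = r2 < 1.0
v_hat = np.exp(-log_q[inside]).sum() / N
print(f"d={d}: {v_hat:.3e}  rel err {abs(v_hat - exact_volume(d)) / exact_volume(d):.2%}")
```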
Question 3
- Let $\pi$ be a probability distribution on a finite state space $\mathcal X$ such that $\pi(x) > 0$ for all $x \in \mathcal X$. Let $Q$ be an irreducible transition matrix on $\mathcal X$, and let $(X_n)$ be the Metropolis–Hastings chain with proposal mechanism $Q$ and stationary distribution $\pi$. True or false: the Markov chain $(X_n)$ is irreducible. Prove it, or find a counterexample.
- In the previous part, suppose additionally that $Q$ is symmetric and $\pi$ is not the uniform distribution. True or false: the Markov chain $(X_n)$ is aperiodic. Prove it, or find a counterexample.
- Consider the Metropolis–Hastings chain for the Ising model described here. Is it irreducible and aperiodic? Prove or disprove.
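For reference, with the standard acceptance rule the Metropolis–Hastings chain proposes a move from $x$ to $y$ with probability $Q(x, y)$ and accepts it with probability $\min\set{1, \pi(y) Q(y, x) / (\pi(x) Q(x, y))}$, so its off-diagonal transition probabilities are \begin{equation} P(x, y) = Q(x, y) \min\Bigl(1, \frac{\pi(y) Q(y, x)}{\pi(x) Q(x, y)}\Bigr) \quad \text{for } x \neq y \,, \end{equation} with $P(x, x)$ defined so that each row sums to $1$.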
Question 4
Consider the Ising model at inverse temperature $\beta$ on a connected graph $\Lambda = (V, E)$. The Glauber dynamics is the following Markov chain on the space $\mathcal X$ of all spin configurations, which we recall is the set of all functions $\sigma \colon V \to \set{\pm 1}$. Given $X_n = \sigma \in \mathcal X$, we choose a vertex $v \in V$ uniformly at random. Define the new spin configuration $\sigma_v \colon V \to \set{\pm 1}$ by $\sigma_v(w) = \sigma(w)$ if $w \neq v$, and $\sigma_v(v) = -\sigma(v)$. Now we define $X_{n+1}$ by \begin{equation} X_{n+1} = \begin{cases} \sigma_v & \displaystyle \quad \text{with probability } \frac{\pi(\sigma_v)}{\pi(\sigma) + \pi(\sigma_v)} \\ \sigma & \displaystyle \quad \text{with probability } \frac{\pi(\sigma)}{\pi(\sigma) + \pi(\sigma_v)} \end{cases} \end{equation}
- What is the stationary distribution of $(X_n)$? Prove it.
- Is $(X_n)$ irreducible and aperiodic?
- Define $S(\sigma, v) = \sigma(v) \sum_{w \in \mathcal N(v)} \sigma(w)$, where $\mathcal N(v)$ denotes the set of neighbors of $v$. Compute $P(\sigma, \sigma_v)$ in terms of $S(\sigma, v)$.
Note: This is useful in practice because $\pi(\sigma) / (\pi(\sigma) + \pi(\sigma_v))$ costs $O(\abs{\Lambda})$ to compute. Moreover, computing it naively can produce disastrous floating point errors, since the unnormalized probabilities are exponentials that easily overflow or underflow. The formula you compute here costs only $O(\deg(\Lambda))$ and avoids these floating point errors. It can be shown that the Glauber dynamics described here converges to the stationary distribution faster than the Metropolis–Hastings chain.
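For illustration, here is a minimal sketch of the Glauber dynamics on a periodic $L \times L$ grid, assuming the common convention $\pi(\sigma) \propto e^{-\beta H(\sigma)}$ with $H(\sigma) = -\sum_{(v, w) \in E} \sigma(v) \sigma(w)$; under this convention the flip probability reduces to an expression in $S(\sigma, v)$ alone, which is essentially what the last part asks you to derive. The grid, seed, and parameter values are illustrative, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

def glauber_step(sigma, beta):
    """One Glauber update for the Ising model on a periodic L x L grid."""
    L = sigma.shape[0]
    i, j = rng.integers(L, size=2)  # choose a vertex uniformly at random
    # S(sigma, v): only the O(deg v) neighboring spins are needed
    nbr_sum = (sigma[(i + 1) % L, j] + sigma[(i - 1) % L, j]
               + sigma[i, (j + 1) % L] + sigma[i, (j - 1) % L])
    S = sigma[i, j] * nbr_sum
    # flip with probability pi(sigma_v) / (pi(sigma) + pi(sigma_v));
    # the exponent is bounded by 2*beta*deg(v), so there is no overflow
    if rng.random() < 1.0 / (1.0 + np.exp(2.0 * beta * S)):
        sigma[i, j] *= -1
    return sigma

# usage: random initial spins, then several sweeps of updates
L, beta = 32, 0.4
sigma = rng.choice([-1, 1], size=(L, L))
for _ in range(10 * L * L):
    sigma = glauber_step(sigma, beta)
```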