Gerrymandering mathematics

This page gives an overview of some of our work ("Assessing significance…") on gerrymandering. This work is on mathematics relevant to gerrymandering. In particular, we are proving mathematical theorems which are intended to be useful for the purpose of evaluating and addressing the gerrymandering problem in practice.

Rigorously detecting gerrymandering

One key task is to be able to rigorously identify gerrymandering. It is not hard to tell that a districting favors one political party or another; for example, in 2012, the new Congressional districting of Pennsylvania resulted in Democrats winning only 5 out of the 18 seats up for election, despite winning a majority of the total Congressional vote in the state. Metrics such as the efficiency gap capture this advantage that one party may enjoy from the particular characteristics of a districting.

It has been pointed out, however, that this kind of advantage can arise naturally, without bias in how districtings are drawn, simply because of the differences in how Democrats and Republicans are distributed in various states. For example, when Democrats are more clustered in cities, it makes it more likely that "blue" districts in a state will be more heavily Democratic than "red" districts are Republican; in this scenario, Democrat voters are concentrated in fewer districts, and elect fewer representatives than would be proportional to their number.

The task then is to evaluate whether a districting not only favors a given political party, but does so to a degree that is atypical for the particular political geography of the state. For example, if we drew new district lines for Pennsylvania from scratch, would we be likely to see a districting with the same characteristics as the current one? Or is the current districting really an outlier in the space of possible districtings of Pennsylvania? Moreover, even when circumstantial evidence makes such a judgment obvious, we wish to give objective mathematical tools for the problem, which can be embraced as part of a rigorous legal framework for adjudicating gerrymandering questions.

A space of valid districtings for comparison

To make a rigorous analysis of the outlier status of a districting of a state, we need a notion of what constitutes a valid districting of a state. This allows us, in principle, to compare the current districting to the space of a valid districtings. Our analysis uses on simple criteria for validity of a districtings of a state into a prescribed number of districts, for example:

  1. Each district should be contiguous,
  2. The districts should have nearly equal population, and
  3. The districts should have reasonable shapes.

In practice, there are many possible ways of formalizing this last constraint; one simple way is to require that the total perimeter of all districts is not too large. It is also possible to add more constraints on districtings: for example that they avoid splitting certain counties, or include specified majority-minority districts. Given a notion of what constitutes a valid districting, our task is to compare hypothetical election outcomes from the present districting of a state with the those which would result from the other valid districtings of the same state. However, the space of possible valid districtings of a state is enormous, and far too large to be enumerable by a computer, even if districts are restricted to being made up of discrete units like municipal wards or census blocks. Thus the way to compare the districting of a state with the set of possible valid districtings is to make a statistical comparison. This requires being able to draw valid districtings randomly; here, we do not mean randomly as in "haphazardly", but really randomly, in the sense that we can randomly choose valid districtings in a way such that each valid districting is equally likely to be chosen.

Randomly generating districtings

Markov Chains are well-studied probabilistic processes which, among other things, can be used to generate random samples from complicated sets. The basic idea of a Markov Chain is simple: suppose I have some space of objects A I would like to draw a random sample from, but I don't have random access to the elements of the set. Instead, I begin with an arbitrary element X0 of the set (which may not be a typical random element) and make small iterative changes to the initial element X0, to produce a random sequence X0, X1, X2, X3, ... of elements of X, each slightly different, in a random way, from the one before it. This process is a Markov Chain on the space X, and it is known, under simple assumptions (for example, that each element of X can in principle be reached from every other one by a sequence of changes) that once the sequence is long enough, a term Xn in the sequence is essentially a random element of X.

Markov chains can thus be used to generate random valid districtings of a state quite easily. In this application, A is the set of all districtings of the state into the prescribed number of districts. Even though we have no direct way of accessing random elements of A, we do know at least one member of A; namely, the current districting of the state, and we can use this as X0. Small random changes can be made to districtings by swapping small areas on the boundary of a district to the neighboring district, so long as the change preserves the requirements on valid districtings. Thus by making many such swaps, we get a long sequence X0, X1, X2, X3, ..., Xn of districtings, and if n is large enough, we know that Xn is a random districting of A. Generating many such random valid districtings of the state would allow us to make a rigorous statistical statement about the status of the current districting of the state. For example, if we generate 1000 random valid districtings, and the present districting is more biased towards Republicans than all the generated districtings, than we have observed an event which should have happened with probability less than 1/1000, if the present districting was really a typical member of the space of valid districtings.

The problem with this method is that while mathematical theorems guarantee in principle that Xn is random if n is sufficiently large, they tell us nothing about how large n must be to have truly random samples, and in general, it is quite possible that n would have to be so large as to be completely impractical to generate even a single random sample. Indeed, for the districting Markov Chain, it is not even clear that it is possible to make small changes to turn any valid districting into any other, and if this is false, then one cannot generate random samples using the chain no matter how large n is.

Demonstrating bias without randomly generating districtings

Our work gives a way around this problem, by allowing one to use Markov Chains to demonstrate that the element X0 is an outlier in the set A, without having to run the Markov Chain for long enough to generate random samples of A. The statistical test we prove the correctness of is simple; one simply generates the sequence X0, X1, X2, X3, ..., Xk for as large a k as is desired (larger k will tend to allow better detection of bias). Now, if X0 is among the worst ε fraction of districtings observed in the Markov sequence, and ε is very small, than this would seem to indicate that X0 is somehow a very carefully chosen member of A. And indeed, Theorem 1.1 in our paper implies that this observation would have a probability of at most √ 2ε . For example, if the current districting of a state was typical of the space of valid districtings of the state, then we should have at most a 1% chance of observing that it is in the worst 1/20,000 fraction of the districtings in our Markov sequence of districtings, since 1/100=√ 2/20,000; . Such an observation would thus be statistically significant at a threshold of p=.01.

Results for Pennsylvania and Wisconsin

We have applied our test to the current Congressional districting of Pennsylvania, and to the current state assembly districting of Wisconsin. In each case, we have tried various notions how to formalize the requirement that districts have reasonable shapes. The way each of these conditions works is discussed in detail to the supplement of our paper. Roughly speaking, the perimeter constraint cares about the total perimeter of all districtings, the L1 constraint cares about the how much larger the perimeter of each district is relative to what you would expect just from its area, and the L2 constraint cares about the same thing, but heavily penalizes districtings which have even a few bad districts in them. In the tables below, each entry corresponds to a Markov sequence with 240 (roughly 1 trillion) districtings, generated with the indicated parameters. The ε value, as above, is the fraction of observed districtings which were as bad as the initial districting. This is generally tiny. In fact, for some parameters, no districting in the sequence of 1 trillion districtings was as bad as the initial districting! The p value in the table is the statistical significance of the outlier status from this run, computed simply as √ 2ε .

Shape constraint (Steps) metric ε p=
Perimeter 240 ωvar 3.0974 · 10−8 2.4889 · 10−4
Perimeter 240 ωMM 5.7448 · 10−10 3.3896 · 10−5
L1 240 ωvar 5.0123 · 10−11 1.0012 · 10−5
L1 240 ωMM 5.6936 · 10−10 3.3745 · 10−5
L2 240 ωvar 8.2249 · 10−11 1.2826 · 10−5
L2 240 ωMM 6.8038 · 10−10 3.6888 · 10−5
L 240 ωvar 3.3188 · 10−13 8.1472 · 10−7
L 240 ωMM 6.9485 · 10−8 3.7279 · 10−4

The metric column gives the technical parameter used to compare the level of electoral bias between districtings; these precise choices are discussed in the supplement of our paper. Since our method is based on relative bias of districtings, however, the precise metric we use is not so important (as demonstrated by the consistency achieved from two very different metrics in the case of Pennsylvania). For our analysis of Wisconsin, the results in the following table use the efficiency gap as a metric, because of the attention given to that metric in the Gill v. Whitford case:

Shape constraint (Steps) Preserve Counties? Preserve MM districts? ε p=
Perimeter 240 yes yes 2.7317 · 10−8 2.3374 · 10−4
L1 240 yes yes 1.6378 · 10−8 1.8099 · 10−4
L2 240 yes yes 1.0051 · 10−8 1.4178 · 10−4
L1 240 no no 3.5840 · 10−7 8.4664 · 10−4

For Wisconsin, we took the additional step of designing a Markov Chain which left unaltered any parts of the state belonging to counties lying entirely in one of the current districts of the Wisconsin assembly map, or any lying in one of the current Majority-Minority districts. These constraints are designed to show that the method is robust even when respecting a wide range of valid considerations map designers may have followed when designing the current districting. Our table also includes a run of the chain without these constraints.

For both states, the p values obtained are thus quite small; in principle we expect even smaller p values could be obtained with longer runs of the Markov Chain, but our relatively short runs (1 trillion steps, taking a week or two) have the advantage that they can be easily verified by third parties.