Computational Cartography & Gerrymandering

My brief recent dabbling with computational cartography (see Part 1 & Part 2) was inspired by Neil Freeman. His project was conceived as a (speculative) reform of the Electoral College.


Incidentally, this is a great idea. Every four years, when whichever party loses the White House starts complaining about how silly the Electoral College is, I feel compelled to point out that it's actually an excellent way to maximize the expected impact of any one voter's vote. The catch is that this requires states to have equally-weighted populations, which obviously we don't.

I just ran a quick Monte Carlo to check the numbers on this. Let's say you voted in 10 million elections in your life, and in each one there were 100 million voters. (This is obviously way too many elections, but I wanted something that didn't require me to report the results in scientific notation.) Let's further stipulate that every one of those races was close: a dead heat, with each outcome normally distributed around a 50/50 split with a standard deviation of 1% of the electorate. In this idealized case you could expect to be the tie-breaking vote — thus having your vote "matter" — in fewer than four of them. Four in ten million.

cnt = 10000000;   % elections you vote in over a lifetime
pop = 100000000;  % voters per election
ties = [];        % tie counts, one per trial
for i = 1:1000
  % votes for one side in each race: Normal(pop/2, 1% of pop), rounded
  results = round(random('normal', pop/2, pop*.01, [cnt 1]));
  ties = [ties; sum(results == pop/2)];  % exact ties, i.e. your vote decides
end
mean(ties)   % => ans = 3.9160


If, on the other hand, there were 51 states of 1,960,784 voters each, you would be the tie-breaker for your state 203 times. That would only matter if the other 50 states were split 25/25, which you can expect to happen 11.2% of the time. So 203 × 11.2% ≈ 23: your chance of being the deciding vote goes from 4-in-10,000,000 to 23-in-10,000,000, meaning your vote "matters" more than five times as much in the (modified) Electoral College system as it does in the winner-take-all system.
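Those two numbers are easy to double-check, assuming the other 50 states behave like independent 50/50 coin flips:

p_split = nchoosek(50, 25) / 2^50   % => 0.1123, odds of a 25/25 split
203 * p_split                       % => ~22.8 deciding votes per 10 million elections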

All of this is far too numerical to explain to someone in casual conversation, so my standard defense of the Electoral College is to agree that we should abolish it as soon as the winner of the World Series is decided by whichever team scores the most runs in 63 innings rather than by the team which wins four of seven games. The reasons to prefer both the Electoral College and best-of-seven champions are the same, but I'm already way too far off track to cover that now.


Okay, back to the point. Equally-sized states would be cool. But the chances of us redrawing state boundaries are zero.

Despite this, I think an automated system to draw boundaries for equally populated regions is far more than just an exercise. Why? Because while our state boundaries are effectively carved in stone by tradition — or in the case of my home of Maryland, literally carved in stone — there are political borders that are re-drawn with regularity: congressional districts.

That's how we end up with absurdity like this:

Maryland's (totally reasonable, not at all farcical) 3rd Congressional District

Gerrymandering was one of the chief complaints that Glenn Reynolds made in his recent EconTalk appearance:

I'm also pretty much agnostic on term limits. I used to be opposed to them. And as I see how effective gerrymandering is, which I think is one of the reasons people hate Congress but love their Congressmen, or at least tolerate their Congressmen, because of gerrymandering, I think that's something that is difficult to deal with and probably wouldn't be helped that much by term limits. If you've got a gerrymandered district, especially with modern gerrymandering technology, the new guy who gets elected in it is likely to look an awful lot like the old guy because the district is so one-dimensional.

Taking "especially with modern gerrymandering technology" as a given, how can we use technology to pre-empt this trend? It's usually best to address technical problems with technical solutions rather than social ones.

One option is to draw the boundaries the same way we do now, but reject them if their score on some measure falls below a certain acceptable level. For example, IDV Solutions has ranked districts by perimeter/sqrt(area). This is, by their own admission, a crude way to do things. To name just one problem, it favors districts that fall on state borders like the Mason-Dixon Line over those that fall on borders like the Potomac River.
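For concreteness, here's roughly what that metric looks like in the same MATLAB I used above (my reading of the metric applied to a toy unit square, not IDV Solutions' actual code):

x = [0; 1; 1; 0];                     % polygon vertices of a district
y = [0; 0; 1; 1];                     % (a unit square, for illustration)
dx = diff([x; x(1)]);                 % close the polygon
dy = diff([y; y(1)]);
perim = sum(hypot(dx, dy));
perim / sqrt(polyarea(x, y))          % => 4 for a square; higher = less compact

A circle scores the theoretical minimum of 2*sqrt(pi), about 3.54, and everything wigglier scores higher.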

Other measures are possible, obviously. One that I think would be useful is based on nearest neighbors. In a non-gerrymandered map the vast majority of addresses will have all of their k nearest neighbors in the same district. As districts become more complex, more people will have large proportions of their neighbors in other districts. If you limit the neighbors to those within the state, which is reasonable, you minimize the Mason-Dixon/Potomac problem.
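Here's a sketch of that measure, assuming you have address coordinates and a district label for each (both stand-ins below; real data would come from a geocoded voter file). It needs the Statistics Toolbox for knnsearch.

coords   = rand(100000, 2);                  % stand-in address locations
district = randi(8, 100000, 1);              % stand-in district labels
k = 10;
idx = knnsearch(coords, coords, 'K', k+1);   % nearest neighbor is the point itself,
idx = idx(:, 2:end);                         % so drop that self-match
same = district(idx) == repmat(district, 1, k);
mean(same(:))                                % fraction of neighbors in-district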

Another measure, which I've thought out less fully, would be to measure the Kolmogorov complexity of the map. The descriptive length of the Maryland 3rd is... well, "long" doesn't cut it. Compare that to the former definition of the Maryland 1st: "the Eastern Shore up to the Susquehanna River." Kolmogorov complexity itself is uncomputable, but I think we have very good proxies in this case. Whoever draws the maps must be able to present in writing, with no graphics, an unambiguous definition of the districts in fewer than k words. Or, alternatively, must be able to present them orally to a judge, spoken and without visual aids, in c minutes.
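One mechanical proxy along these lines (my suggestion, nothing official) is the compressed size of the written definition; a description that compresses smaller is a simpler district:

tmp = [tempname '.txt'];
fid = fopen(tmp, 'w');
fprintf(fid, 'the Eastern Shore up to the Susquehanna River');  % old MD 1st
fclose(fid);
gz = gzip(tmp);     % compress; returns a cell array with the .gz filename
info = dir(gz{1});
info.bytes          % bytes after compression: smaller = simpler description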

There's a general principle here: If you can't explain it succinctly, it's a bad rule.

Another approach might be to measure the curvature or the fractal dimension of boundaries between districts. I've thought even less about this though.
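For what it's worth, a crude box-counting estimate of a boundary's fractal dimension would look something like this; the boundary raster B here is a stand-in straight line, not real district data:

B = false(512); B(256, :) = true;  % stand-in boundary: one straight line
sizes = 2.^(1:6);                  % box edge lengths: 2 to 64 pixels
counts = zeros(size(sizes));
for j = 1:numel(sizes)
  s = sizes(j);
  M = floor(size(B,1)/s)*s; N = floor(size(B,2)/s)*s;
  boxes = reshape(B(1:M,1:N), s, M/s, s, N/s);   % tile into s-by-s boxes
  counts(j) = nnz(squeeze(sum(sum(boxes,1),3))); % boxes the boundary touches
end
p = polyfit(log(1./sizes), log(counts), 1);
p(1)   % slope ~ fractal dimension: 1 for this line, higher for wigglier borders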

The approach of capping some metric has the virtue of being simple, objective, and easily implementable. Whichever partisan committee now draws the borders presents its map to an adjudicator, who passes it through an established, preferably open-source, program which calculates the metrics. If they're above some pre-established threshold, the map is rejected. This is already an improvement over the subjective way we do things now.

A second option, suggested by David Friedman, is to have the partisan committee submit not a finished map, but a program to generate a map. The inputs to the program could contain population data, municipal boundaries, etc. but no information about party affiliation or past voting records, or any proxies for them. There would also be a cap on program length to keep anyone from submitting what is essentially just a precomputed lookup table.
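As a toy illustration (mine, not Friedman's), here's about the shortest such program I can imagine: it sees only address coordinates and produces compact districts, though a real submission would also need an equal-population balancing step, which this sketch omits. It needs the Statistics Toolbox.

coords = rand(100000, 2);                        % stand-in address locations
district = kmeans(coords, 8, 'Replicates', 5);   % 8 compact, population-blind districts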

Given the low numeracy I see in most legislators, I think it would be a hard sell to get them to produce such a program, but the idea has tremendous appeal to me. The whole point is to remove subjectivity and make the decision only on the basis of explicitly accepted criteria, which is exactly what computers do: they're entirely dispassionate, and they only know what you tell them.

We could take this one step further and simply codify the program to be used, rather than having partisan committees write their own after every census. I find a technocratic Bureau of Districting Algorithms much easier to imagine than local politicians wrestling with GIS algorithms.
