In the previous post I talked a bit about how one might build a system which would algorithmically divide the US into 50 states of approximately equal populations.
I've messed around with the problem a bit and developed a very rough approximation. Actually, I don't even feel comfortable calling this an approximation. But it is at least the scaffolding you could build an eventual solution around.
It imports the relevant data from the Census (after a lot of PitA data cleaning) and throws it up on the screen, connecting neighboring counties. Then it picks 50 random counties to be the centers of the states and runs a k-means update on them (weighted by the county populations) to iterate to a better solution. The only wrinkle to a basic clustering is that it uses the population of a state to set a z-coordinate value for each centroid when determining the distance to each point. States with more people in them are represented higher up, making them effectively smaller.
The image above is what you end up with; here's where the system started:
The ratio of the most populous state to the least in America now is 66:1. As simple as this system is, some of the runs I tested on got that ratio down to 11:1.
There's a thousand things to do to this to make it an actual solution to the problem of automatically re-drawing state boundaries. I need to do things from the small (use spherical geometry to accurately calculate the distance between pairs of lat/long coordinates), to the medium (link AK/HI/PR to the mainland better), to the major (balance state populations more explicitly). Nevertheless, I think this is a useful framework in to build some intelligent decision making on top of.