Here is a question to think about. If religions help to create social capital by allowing people to signal conscientiousness, conformity, and trustworthiness [as Norenzayan claims], how does this relate to Bryan Caplan’s view that obtaining a college degree performs that function?
That might explain why the credentialist societies of Han China were relatively irreligious. Kling likes to use the Vickies/Thetes metaphor from Neal Stephenson's Diamond Age, and I think this dichotomy could play well with that. Wouldn't the tests required by the Reformed Distributed Republic fill this role, for instance?
This is by far the best, simplest explanation of Coase's insights that I have read. Having read plenty of Landsburg, that should not — indeed does not — surprise me.
His final 'graph is a digression, but a good point:
Coase’s Nobel Prize winning paper is surely one of the landmark papers of 20th century economics. It’s also entirely non-technical (which is fine), and (in my opinion) ridiculously verbose (which is annoying). It’s littered with numerical examples intended to illustrate several different but related points, but the points and the examples are so jumbled together that it’s often difficult to tell what point is being illustrated... Pioneering work is rarely presented cleanly, and Coase was a true pioneer.
And this is why I put little stock in "primary sources" when it comes to STEM. The intersection between people/publications who originate profound ideas and people/publications which explain profound ideas well is a narrow one. If what you want is the latter, don't automatically mistake it for the former. The best researchers are not the best teachers, and this is true as much for papers as it is for people.
Start a font by tweaking all glyphs at once. With more than twenty parameters, design custom classical or experimental shapes. Once prototyping of the font is done, each point and curve of a glyph can be easily modified. Explore, modify, compare, export with infinite variations.
(Okay, so technically this may not belong on a "reading list.") Duncan previously created The History of Rome podcast, which is one of my favorites. Revolutions is his new project, and it just launched. Get on board now.
This would be a great starting place for high-school or freshmen STEM curricula. As a bonus, it has this nice epigraph from Richard Hamming:
"In science, if you know what you are doing, you should not be doing it. In engineering, if you do not know what you are doing, you should not be doing it. Of course, you seldom, if ever, see either pure state."
I'm at the tail end of a doctoral program and going on the job market. This is good advice. What's disappointing is that this would have been equally good and applicable advice for people going on the job market back when I started grad school. The fact that we're five years (!!) down the road and we still have need of these sorts of "surviving in horrid job markets" pieces is bleak.
All programming blogs need at least one post unofficially titled “Indisputable Proof That I Am Awesome.” These are usually my favorite kind of read, as the protagonist starts out with a head full of hubris, becomes mired in self-doubt, struggles on when others would have quit, and then ultimately triumphs over evil (that is to say, slow or buggy computer code), often at the expense of personal hygiene and/or sanity.
I'm a fan of the debugging narrative, and this is a fine example of the genre. I've been wrestling with code for mapping projections recently, so I feel Miller's pain specifically. In my opinion the Winkel Tripel is mathematically gross, but aesthetically unsurpassed. Hopefully I'll find some time in the next week or so to put up a post about my mapping project.
Social cohesion can be thought of as a manifestation of how "iterated" people feel their interactions are, how likely they are to interact with the same people again and again and have to deal with long term consequences of locally optimal choices, or whether they feel they can "opt out" of consequences of interacting with some set of people in a poor way.
Munger links to some very good analysis but it occurs to me that what is really needed is the variance of grades over time and not just the mean. (Obviously these two things are related since the distribution is bounded by [0, 4]. A mean which has gone from 2.25 to 3.44 will almost certainly result in less variance here.)
I don't much care where the distribution is centered. I care how wide the distribution is — that's what lets observers distinguish one student from another. Rankings need inequality. Without it they convey no information.
I share Graur's and Tabarrok's wariness over "high impact false positives" in science. This is a big problem with no clear solutions.
The Graur et al. paper that Tabarrok discusses is entertaining in its incivility. Sometimes civility is not the correct response to falsehoods. It's refreshing to see scientists being so brutally honest with their opinions. Some might say they are too brutal, but at least they've got the honest part.
McCaffrey is completely right. But good luck to him reasoning people out of an opinion they were never reasoned into in the first place.
I do like the neologism "sustainable pricing" that he introduces. Bravo for that.
I would add a sixth reason to his list: accusations of "price gouging" are one rhetorical prong in an inescapable triple bind. A seller has three MECE choices: price goods higher than is common, the same as is common, or lower than is common. These choices will result in accusations of price gouging, collusion, and anti-competitive pricing, respectively. Since there is no way to win when dealing with people who level accusations of gouging, the only sensible thing to do is ignore them.
"Betting Therapy" should be a thing. You go to a betting therapist and describe your fears — everything you're afraid will happen if you do X — and then the therapist offers to bet money on whether it actually happens to you or not. After you lose enough money, you stop being afraid.
This is the second machine learning competition hosted at Kaggle that I've gotten serious about entering and sunk time into only to be derailed by a paper deadline. I'm pretty frustrated. Since I didn't get a chance to submit anything for the contest itself, I'm going to outline the approach I was trying here.
First a bit of background on this particular contest. The data is very high dimensional (1875 features) and multicategorical (9 classes). You get 1000 labeled training points, which isn't nearly enough to learn a good classifier on this data. In addition you get ~130000 unlabeled points. The goal is to leverage all the unlabeled data to be able to build a decent classifier out of the labeled data. To top it off you have no idea what the data represents, so it's impossible to use any domain knowledge.
I saw this contest a couple of weeks ago shortly after hearing a colleague's PhD proposal. His topic is the building networks of Kohonen Self-Organizing Maps for time series data, so SOMs are where my mind went first. SOMs are a good fit for this task: they can learn on labeled or unlabeled data, and they're excellent at dimensionality reduction.
My approach was to use the unlabeled training data to learn a SOM, since they lend themselves well to unsupervised learning. Then I passed the labeled data to the SOM. The maximally active node (i.e. the node whose weight vector best matches the input vector, aka the "best matching unit" or BMU) got tagged with the class of that training sample. Then I could repeat with the test data, and read out the class(es) tagged to the BMU for each data point.
So far that's simple enough, but there is far too much data to learn a SOM on efficiently,1 so I turned to my old ensemble methods.
 SOM bagging. The most obvious approach in many ways. Train each network on only a random subset of the data. The problem here is that any reasonable fraction of the data is still too big to get into memory. (IIRC Breiman's original Bagging paper used full boostraps, i.e. resamples the same size as the original set and even tested using resamples larger than the original data. That's not an option for me.) I could only manage 4096 data points (a paltry 3% of the data set) in each sample without page faulting. (Keep in mind again that a big chunk of this machine's memory was being used on my actual work.)
 SOM random dendrites. Like random forests, use the whole data set but only select a subset of the features for each SOM to learn from. I could use 64 of 1985 features at a time. This is also about 3%; the standard is IIRC more like 20%.
In order to add a bit more diversity to ensemble members I trained each for a random number of epochs between 100 and 200. There are a lot of other parameters that could have been adjusted to add diversity: smoothing, distance function and size of neighborhoods, size of network, network topology, ...
This is all pretty basic. There tricky part is combining the individual SOM predictions. For starters, how should you make a prediction with a single SOM? The BMU often had several different classes associated with it. You can pick whichever class has a plurality, and give that network's vote to that class. You can assign fractions of its vote in proportion to the class ratio of the BMU. You can take into account the distance between the sample of the BMU, and incorporate the BMU's neighbors. You can use a softmax or other probabilistic process. You can weight nodes individually or weight the votes of each SOM. This weighting can be done the traditional way (e.g. based on accuracy on a validation set) or in a way that is unique to the SOM's competitive learning process (e.g. how many times was this node the BMU? what is the distance in weight-space between this node and its neighbors? how much has this node moved in the final training epochs?).
At some point I'm going to come back to this. I have no idea if Kaggle keeps the infrastructure set up to allow post-deadline submissions, but I hope they do. I'd like to get my score on this just to satisfy my own curiosity.
This blackbox prediction concept kept cropping up in my mind while reading Nate Silver's The Signal and the Noise. We've got all these Big Questions where we're theoretically using scientific methods to reach conclusions, and yet new evidence rarely seems to change anyone's mind.
Does Medicaid improve health outcomes? Does the minimum wage increase unemployment? Did the ARRA stimulus spending work? In theory the Baicker et al. Oregon study, Card & Krueger, and the OMB's modeling ought to cause people to update beliefs but they rarely do. Let's not even get started on the IPCC, Mann's hockey stick, etc.
So here's what I'd like to do for each of these supposedly-evidence-based-and-numerical-but-not-really issues. Assemble an expert group of econometricians, modelers, quants and so on. Give them a bunch of unlabeled data. They won't know what problem they're working on or what any of the features are. Ask them to come up with the best predictors they can.
If they determine minimum wages drive unemployment without knowing they're looking at economic data then that's good evidence the two are linked. If their solution uses Stanley Cup winners but not atmospheric CO2 levels to predict tornado frequency then that's good evidence CO2 isn't a driver of tornadoes.
I don't expect this to settle any of these questions once-and-for-all — I don't expect anything at all will do that. There are too many problems (who decides what goes in the data set or how it's cleaned or scaled or lagged?). But I think doing things double-blind like this would create a lot more confidence in econometric-style results. In a way it even lessens the data-trawling problem by stepping into the issue head-on: no more doubting how much the researchers just went fishing for any correlation they could find, because we know that's exactly what they did, so we can be fully skeptical of their results.
I also ran up against computational constraints here. I'm using almost every CPU cycle (and most of the RAM) I can get my hands on to run some last-minute analysis for the aforementioned paper submission, so I didn't have a lot of resources left over to throw at this. To top it off there's a bunch of end-of-semester server maintenance going on which both took processors out of the rotation and prevented me from parallelizing this the way I wanted. [↩]
As a CS guy who's tip-toed into psychology here and there I would offer Morrison & Murphy-Hill this advice: tread very, very lightly when making claims regarding the words "knowledge" and especially "intelligence."
I'm glad I didn't know about this in the winter of 2003, when I engaged in intense bouts of Tetris as a weird form of post-modern zazen. I still remember the guy who used to sit in front of me in Linear Algebra wore a tattersall shirt every single class, and I would see tetrominos cascading down his back.
This is why I want legislators & regulators who have played some strategy games. I want people making rules who have the habit of thinking, "If I do this, what is the other guy going to do? Surely he won't simply keep doing the things he was doing before I changed the environment. And surely not the exact the thing that I hope he does. What if he responds by...?"
We started with a weird pseudo-equation, manipulated it as if it were meaningful, transformed it into a series of statements that were either meaningless or clearly false, and out popped something that happened to be true. What Blass essentially proved (and Fiore and Leinster generalized) is, in effect, is that this is no coincidence. More specifically, they’ve proved in a very broad context that if you manipulate this kind of equation, pretending that sets are numbers and not letting yourself get ruffled by the illegitimacy of everything you’re doing, the end result is sure to be either a) obviously false or b) true.
My (very briefly stated) problem with p-values is that they combine size-of-effect and effort-in-experiment into one scalar. (This has been in the news a lot lately with the Oregon Medicaid study. Was the effect of Medicaid on blood pressure, glucose levels, etc. insignificant because Medicaid doesn't help much or because the sample size was too small? Unsurprising peoples' answers to this question are perfectly correlated with all of their prior political beliefs.)
One of the pitfalls of computational modeling is that it allows researchers to just keeping churning out simulation runs until their results are "significant." Processor cycles get shoveled into the model's maw until you have enough results to make even a tiny observed effect fit in under that magical p=0.05 limit. In theory everyone knows this isn't kosher, but "in theory" only takes us so far.
I am irked by people claiming that a non-medical doctorate is somehow "not real" though. "Doctor," like most words, has several meanings. What kind of semiotic/linguistic authority are they to declare which one is "real" and which isn't? Thanks, but they can leave their self-serving grammatical prescriptivism out of this.
Suppose that... it eventually becomes clear that quantum annealing can be made to work on thousands of qubits, but that it’s a dead end as far as getting a quantum speedup is concerned. Suppose the evidence piles up that simulated annealing on a conventional computer will continue to beat quantum annealing, if even the slightest effort is put into optimizing the classical annealing code. If that happens, then I predict that the very same people now hyping D-Wave will turn around and—without the slightest acknowledgment of error on their part—declare that the entire field of quantum computing has now been unmasked as a mirage, a scam, and a chimera. The same pointy-haired bosses who now flock toward quantum computing, will flock away from it just as quickly and as uncomprehendingly. Academic QC programs will be decimated, despite the slow but genuine progress that they’d been making the entire time in a “parallel universe” from D-Wave.
I think Aaronson is right to worry about that possibility. That's essentially what caused the "AI Winter." I'd hate to see that happen to QC.
The problem is not a lack of interest, but the lack of cheap, programmable hardware for teenagers to cut their teeth on. For typical youngsters, computers have become too complicated, too difficult to open (laptops especially) and alter their settings, and way too expensive to tinker with and risk voiding their warranty by frying their innards.
I don't see the connection between learning to code and having super-cheap hardware. Back when I was a kid learning to program I actually had to pay real money for a compiler. (Anyone else remember Borland Turbo C++?) Now you're tripping over free languages and environments to use, including many that run entirely through your browser so there's zero risk to your machine.
Honestly how many teens are going to go full-David Lightman and be doing serious enough hacking that their hardware is at risk? Is the goal to make sure teens have the opportunity to start learning to code before college, or to give them hardware to tinker with? Those are both fine goals. Being a software guy I'd put more weight on the former, but the important thing is that the way to accomplish either are completely different.
For starters, [your correspondant] plans to turn his existing Raspberry Pi into a media centre. By all accounts, Raspbmc—a Linux-based operating system derived from the XBox game-player’s media centre—is a pretty powerful media player. The first task, then, is to rig the Raspberry Pi up so it can pluck video off the internet, via a nearby WiFi router, and stream it direct to a TV in the living room. Finding out not whether, but just how well, this minimalist little computer manages such a feat will be all part of the fun.
I did this exact project about a month ago, and couldn't be more pleased with either the results or the fun of getting it to work. I still have to tinker with some things: the Vimeo plugin won't log into my account, and I need to build a case. Other than that, I wish I had done this a long time ago.
My brief recent dabbling with computational cartography (see Part 1 & Part 2) was inspired by Neil Freeman. His project was initiated as a (speculative) reform to the Electoral College.
Incidentally, this is a great idea. Every four years when whichever party looses the White House starts complaining about how silly the Electoral College is I feel compelled to point out that it's actually an excellent way to maximize the expected impact of any one voter's vote. The catch is that this requires states to have equally-weighted populations, which obviously we don't.
I just ran a quick Monte Carlo to check the numbers on this. Let's say you voted in 10 million elections in your life, and in each one there were 100 million voters. (This is obviously way too many elections, but I wanted something that didn't require me to report the results in scientific notation.) Let's further stipulate that every one of those races was close: a dead heat with a standard deviation of 1%. In this idealized case you could expect to be the tie-breaking vote — thus having your vote "matter" — in less than four of them. Four in ten million.
cnt = 10000000;
pop = 100000000;
results = round(random('normal',n/2,pop*.01,[cnt 1]));
mean(ties) => ans = 3.9160
If, on the other hand, there were 51 states of 1,960,784 voters each, you would be the tie-breaker for your state 203 times. This would only matter if the other 50 states were tied 25/25. You can expect this to happen 11.2% of the time. Which means your chance of being the deciding vote goes from 4-in-10,000,000 to 23-in-10,000,000 which means your vote "matters" more than five times as much in the (modified) Electoral College system than it does in the winner-take-all system.
All of this is far too numerical to explain to someone in casual conversation, so my standard defense of the Electoral College is to agree that we should abolish it as soon as the winner of the World Series is decided by whichever teams scores the most runs in 63 innings rather than the team which wins four of seven games. The reasons to prefer both the Electoral College and best-of-seven champions are the same, but I'm already way too far off track to cover that now.
Okay, back to the point. Equally-sized states would be cool. But the chances of us redrawing state boundaries is zero.
Despite this, I think an automated system to draw boundaries for equally populated regions is far more than just an exercise. Why? Because while our state boundaries are effectively carved in stone by tradition — or in the case of my home of Maryland, literally carved in stone — there are political borders that are re-drawn with regularity: congressional districts.
I'm also pretty much agnostic on term limits. I used to be opposed to them. And as I see how effective gerrymandering is, which I think is one of the reasons people hate Congress but love their Congressmen, or at least tolerate their Congressmen, because of gerrymandering, I think that's something that is difficult to deal with and probably wouldn't be helped that much by term limits. If you've got a gerrymandered district, especially with modern gerrymandering technology, the new guy who gets elected in it is likely to look an awful lot like the old guy because the district is so one-dimensional.
Taking the "especially with modern gerrymandering technology" as a given, how can we use technology to pre-empt this trend? It's usually best to address technical problems with technical solutions rather than social ones.
One option is to draw the boundaries the same way we do now, but reject them if their score on some measure falls below a certain acceptable level. For example, IDV Solutions has ranked districts byperimeter/sqrt(area). This is, by their own admission, a crude way to do things. To name just one problem, it favors districts that fall on state borders like the Mason-Dixon Line over those that fall on borders like the Potomac River.
Other measures are possible, obviously. One that I think would be useful is based on nearest neighbors. In a non-gerrymandered map the vast majority of addresses will have all of their k nearest neighbors in the same district. As districts become more complex, more people will be have large proportions of their neighbors in other districts. If you limit the neighbors to those within the state, which is reasonable, you minimize the Mason-Dixon/Potomac problem.
Another measure, which I've thought out less fully, would be to measure the Kolmogorov Complexity of the map. The descriptive length of the Maryland 3rd is... well "long" doesn't cut it. Compare that to the former definition of the Maryland 1st: "the Eastern Shore up to the Susquehanna River." The Kolmogorov Complexity itself is incomputable, but I think we have very good proxies in this case. Whoever draws the maps must be able to present in writing, with no graphics, an unambiguous definition of the districts in less than k words. Or alternately, must be able to present them orally to a judge, spoken and without visual aids, in c minutes.
There's a general principle here: If you can't explain it succinctly, it's a bad rule.
Another approach might be to measure the curvature or the fractal dimension of boundaries between districts. I've thought even less about this though.
The approach of capping some metric has the option of being simple, objective, and easily implementable. Whatever partisan committee which now draws the borders presents their map to an adjudicator, who passes it through an established, preferably open-source, program which calculates the metrics. If they're above some pre-established threshold the map is rejected. This is already an improvement over the subjective way we do things now.
A second option, suggested by David Friedman, is to have the partisan committee submit not a finished map, but a program to generate a map. The inputs to the program could contain population data, municipal boundaries, etc. but no information about party affiliation or past voting records, or any proxies for them. There would also be a cap on program length to keep anyone from submitting what is essentially just a precomputed lookup table.
Given the low numeracy I see in most legislators, I think it would be a hard sell to get them produce such a program, but the idea has tremendous appeal to me. The whole problem is to remove subjectivity and make the decision on the basis only of explicitly acceptable criteria, which is exactly what computers do: they're entirely dispassionate and the only know what you tell them.
We could take this one step further and simply codify the program to be used rather than having partisan committees writing their own after every census. I find a technocratic Bureau of Districting Algorithms much easier to imagine than local politicians wrestling with GIS algorithms.
Don Boudreaux discussing Armen Alchian's preference for clear prose over "mathematical pyrotechnics" reminded me of a few neural networks researchers I know. I won't name names, because it wasn't a favorable comparison. There's far too much equation-based whizz-bangery going on in some papers.
I use to think the problem was insufficient sophistication in my own math background, but I've recently heard independently from two very smart people in our Applied Math/Scientific Computing program that they also find the math in a lot of these papers to be more of an obfuscating smoke screen than a clarifying explication. If they find it hard to follow I've got good reason to believe the problem isn't just me.