I don't usually write up my technical work here, mostly because I spend enough hours as is doing technical writing. But a co-author, Jon Barker, recently wrote a post on the NVIDIA Parallel For All blog about one of our papers on neural networks for detecting malware, so I thought I'd link to it here. (You can read the paper itself, "Malware Detection by Eating a Whole EXE" here.) Plus it was on the front page of Hacker News earlier this week, which is not something I thought would ever happen to my work.
Rather than rehashing everything in Jon's Parallel for All post about our work, I want to highlight some of the lessons we learned from doing this about ML/neural nets/deep learning.
By way of background, I'll lift a few paragraphs from Jon's introduction:
The paper introduces an artificial neural network trained to differentiate between benign and malicious Windows executable files with only the raw byte sequence of the executable as input. This approach has several practical advantages:
No hand-crafted features or knowledge of the compiler used are required. This means the trained model is generalizable and robust to natural variations in malware.
The computational complexity is linearly dependent on the sequence length (binary size), which means inference is fast and scalable to very large files.
Important sub-regions of the binary can be identified for forensic analysis.
This approach is also adaptable to new file formats, compilers and instruction set architectures—all we need is training data.
We also hope this paper demonstrates that malware detection from raw byte sequences has unique and challenging properties that make it a fruitful research area for the larger machine learning community.
One of the big issues we were confronting with our approach, MalConv, is that executables are often millions of bytes in length. That's orders of magnitude more time steps than most sequence processing networks deal with. Big data usually refers to lots and lots of small data points, but for us each individual sample was big. Saying this was a non-trivial problem is a serious understatement.
Here are three lessons we learned, not about malware or cybersecurity, but about the process of building neural networks on such unusual data.
1. Deep learning != image processing
The large majority of the work in deep learning has been done in the image domain. Of the remainder, the large majority has been in either text or speech. Many of the lessons, best practices, rules of thumb, etc., that we think apply to deep learning may actually be specific to these domains.
For instance, the community has settled around narrow convolutional filters, stacked with a lot of depth as being generally the best way to go. And for images, narrow-and-deep absolutely seems to be the correct choice. But in order to get a network that processes two million time steps to fit in memory at all (on beefy 16GB cards no less) we were forced to go wide-and-shallow.
With images, a pixel value is always a pixel value. 0x20 in a grayscale image is always darkish gray, no matter what. In an executable, byte values are ridiculously polysemous: 0x20 may be part of an instruction, a string, a bit array, a compressed or encrypted value, an address, etc. You can't interpolate between values at all, so you can't resize or crop the way you would with images to make your data set smaller or introduce data augmentation. Binaries also play havoc with locality, since you can re-arrange functions in any order, among other things. You can't rely on any Tobler's Law[1] relationship the way you can in images, text, or speech.
2. BatchNorm isn't pixie dust
Batch Normalization has this bippity-boppity-boo magic quality. Just sprinkle it on top of your network architecture, and things that didn't converge before now do, and things that did converge now converge faster. It's worked like that every time I've tried it — on images. When we tried it on binaries it actually had the opposite effect: networks that converged slowly now didn't at all, no matter what variety of architecture we tried. It's also had no effect at all on some other esoteric data sets that I've worked on.
We discuss this at more length in the paper (§5.3), but here's the relevant figure:
This is showing the pre-BN activations from MalConv (blue) and from ResNet (red & orange) and Inception-v4 (green). The purpose of BatchNorm is to output values that follow a standard normal distribution, and it implicitly expects inputs that are relatively close to that. What we suspect is happening is that the input values from other networks aren't Gaussian, but they're close-ish.[2] The input values for MalConv display huge asperity, and aren't even unimodal. If BatchNorm is being wonky for you, I'd suggest plotting the pre-BN activations and checking to see that they're relatively smooth and unimodal.
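As a rough illustration of that last check (this is not the tooling used for the paper), here is a minimal Ruby sketch. It assumes you have already exported the pre-BN activations as a flat array of floats; the function names `histogram`, `count_peaks`, and `render` are hypothetical, and peak counting on a coarse histogram is a crude heuristic, not a statistical test of unimodality.

```ruby
# Crude shape check for a non-empty, flat array of pre-BN activation values.
# All names here are hypothetical; this is a heuristic, not a statistical test.

# Count how many values fall into each of `bins` equal-width buckets.
def histogram(values, bins: 20)
  lo, hi = values.minmax
  width = (hi - lo) / bins.to_f
  counts = Array.new(bins, 0)
  values.each do |v|
    idx = width.zero? ? 0 : ((v - lo) / width).floor
    idx = bins - 1 if idx >= bins # the maximum value belongs in the last bucket
    counts[idx] += 1
  end
  counts
end

# Count buckets that stand above their neighbours; bumps smaller than
# `min_fraction` of the tallest bucket are ignored as noise.
def count_peaks(counts, min_fraction: 0.1)
  floor = counts.max * min_fraction
  padded = [0] + counts + [0]
  (1..counts.size).count do |i|
    padded[i] > padded[i - 1] && padded[i] >= padded[i + 1] && padded[i] >= floor
  end
end

# Render one row of '#' characters per bucket for eyeballing the shape.
def render(counts, width: 40)
  tallest = counts.max
  counts.map { |c| '#' * (tallest.zero? ? 0 : c * width / tallest) }
end

activations = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.9, 2.1] # made-up values
counts = histogram(activations, bins: 5)
puts render(counts)
puts "peaks: #{count_peaks(counts)}"
```

More than one peak, or a very ragged outline, is a hint to look harder before trusting BatchNorm on that layer. The bin count and noise threshold are arbitrary choices that you would need to tune to your own data.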
3. The Lump of Regularization Fallacy
If you're overfitting, you probably need more regularization. Simple advice, and easily executed. Every time I see this brought up, though, people treat regularization as if it's this monolithic thing. Implicitly, people are talking as if you have some pile of regularization, and if you need to fight overfitting then you just shovel more regularization on top. It doesn't matter what kind, just add more.
We ran into overfitting problems and tried every method we could think of: weight decay, dropout, regional dropout, gradient noise, activation noise, and on and on. The only one that had any impact was DeCov, which penalizes activations in the penultimate layer that are highly correlated with each other. I have no idea what will work on your data (especially if it's not images/speech/text), so try different types. Don't just treat regularization as a single knob that you crank up or down.
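To make the DeCov idea concrete, here is a minimal sketch of the penalty as I understand it from the DeCov paper: half the sum of squared off-diagonal entries of the covariance matrix of a layer's activations over a batch. This is plain Ruby on nested arrays for illustration only, not the code we trained with; the name `decov_penalty` and the toy batches are assumptions. In a real network you would compute this inside your framework so that gradients flow through it.

```ruby
# Sketch of a DeCov-style penalty (hypothetical helper, illustration only).
# `batch` is an array of rows, one row of hidden-unit activations per example.
def decov_penalty(batch)
  n = batch.size.to_f
  dims = batch.first.size

  # Mean activation of each hidden unit across the batch.
  means = (0...dims).map { |j| batch.sum { |row| row[j] } / n }
  centered = batch.map { |row| row.each_with_index.map { |v, j| v - means[j] } }

  # Sum the squared covariance of every pair of distinct units.
  penalty = 0.0
  (0...dims).each do |i|
    (0...dims).each do |j|
      next if i == j # the diagonal (per-unit variance) is not penalized
      cov = centered.sum { |row| row[i] * row[j] } / n
      penalty += cov**2
    end
  end
  0.5 * penalty
end

redundant   = [[1.0, 1.0], [-1.0, -1.0]]                          # units move together
independent = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]] # units unrelated
puts decov_penalty(redundant)
puts decov_penalty(independent)
```

The redundant batch is penalized and the independent one is not, which is the sense in which this regularizer differs from simply shrinking weights.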
I hope some of these lessons are helpful to you if you're into cybersecurity, or pushing machine learning into new domains in general. We'll be presenting the paper this is all based on at the Artificial Intelligence for Cyber Security (AICS) workshop at AAAI in February, so if you're at AAAI then stop by and talk.
[1] Everything is related, but near things are more related than far things.
[2] I'd love to be able to quantify that closeness, but every test for normality I'm aware of doesn't apply when you have this many samples. If anyone knows of a more robust test please let me know.
The MIT Technology Review has a recent article by James Somers about error backpropagation, "Is AI Riding a One-Trick Pony?" Overall, I agree with the message in the article. We need to keep thinking of new paradigms because the SotA right now is very useful, but not correct in any rigorous way. However, as much as I agree with the thesis, I think Somers oversells it, especially in the beginning of the piece. For instance, the introductory segment concludes:
When you boil it down, AI today is deep learning, and deep learning is backprop — which is amazing, considering that backprop is more than 30 years old. It’s worth understanding how that happened—how a technique could lie in wait for so long and then cause such an explosion — because once you understand the story of backprop, you’ll start to understand the current moment in AI, and in particular the fact that maybe we’re not actually at the beginning of a revolution. Maybe we’re at the end of one.
That's a bit like saying "When you boil it down, flight is airfoils, and airfoils are Bernoulli's principle — which is amazing, considering that Bernoulli's principle is almost 300 years old." I totally endorse the idea that we ought to understand backprop; I've spent a lot of effort in the last couple of months organizing training for some of my firm's senior leadership on neural networks, and EBP/gradient descent is the heart of my presentation. But I would be very, very careful about concluding that backprop is the entire show.
Backprop was also not "lying in wait." People had been working on it ever since it was introduced in 1986. The problem was that '86 was the height of the 2nd AI winter, which lasted another decade. Just like people should understand backprop to understand contemporary AI, they should learn about the history of AI to understand contemporary AI. Just because no one outside of CS (and precious few people in CS, for that matter) paid any attention to neural networks before 2015 doesn't mean they were completely dormant, only to spring up fully formed in some sort of intellectual Athenian birth.
I really don't want to be in the position of defending backprop. I took the trouble to write a dissertation about non-backprop neural nets for a reason, after all.[1] But I also don't want to be in the position of letting sloppy arguments against neural nets go unremarked. That road leads to people mischaracterizing Minsky and Papert, abandoning neural nets for generations, and putting us epochs behind where we might have been.[2]
PS This is also worth a rejoinder:
Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.
That's not what thought is; that's how thought can be represented. Planets are not vectors, but their orbits can be profitably described that way, because "it behooves us to place the foundations of knowledge in mathematics." I'm sorry if that seems pedantic, but the distinction between a thing and its representation (besides giving semioticians something to talk about) underpins much of our interpretation of AI systems and cognitive science as well. Indeed, a huge chunk of data science work is figuring out the right representations. If you can get that, your problem is often largely solved.[3]
PPS This, on the other hand, I agree with entirely:
Deep learning in some ways mimics what goes on in the human brain, but only in a shallow way. … What we know about intelligence is nothing against the vastness of what we still don’t know.
What I fear is that people read that and conclude that artificial neural networks are built on a shallow foundation, so we should give up on them as being unreliable. A much better conclusion would be that we need to keep working and build better, deeper foundations.
[1] That reason being, roughly put, that we're pretty sure the brain is not using backprop, and it seems ill-advised to ignore the mechanisms employed by the most intelligent thing we are aware of.
[2] Plus sloppy arguments should be eschewed on the basis of the sloppiness alone, irrespective of their consequences.
[3] IIRC both Knuth and Torvalds have aphorisms to the effect that once you have chosen the correct data structures, the correct algorithms will naturally follow. I think AI and neuroscience are dealing with a lot of friction because we haven't been able to figure out the right representations/data structures. When we do, the right learning algorithms will follow much more easily.
I'm 100% on board with the US having a strategy, but I want to offer one caveat: "comprehensive national strategies" are susceptible to becoming top-down, centralized plans, which I think is dangerous.
I'm generally disinclined to centralized planning, for both efficiency and philosophical reasons. I'm not going to take the time now to explain why; I doubt anything I could scratch out here would shift people very much along any kind of Keynes-Hayek spectrum.
So why am I bothering to bring this up? Mostly because I think it would be especially ill-conceived to adopt central planning when it comes to AI. The recent progress in AI has been largely a result of abandoning top-down techniques in favor of bottom-up ones. We've abandoned hand-coded visual feature detectors for convolutional neural networks. We've abandoned human-engineered grammar models for statistical machine translation. In one discipline after another emergent behavior has outpaced decades' worth of expert-designed techniques. To layer top-down policy-making on a field built on bottom-up science would be a waste, and an ironic one at that.
PS Having spoken to two of the three authors of this piece, I don't mean to imply that they support centralized planning of the AI industry. This is just something I would be on guard against.
As an AI researcher, I think I am required to have an opinion about this. Here's what I have to say to the various tribes.
AI-pessimists: please remember that the Luddites have been wrong about technology causing economic cataclysm every time so far. We're talking about several consecutive centuries of wrongness.[1] Please revise your confidence estimates downwards.
AI-optimists: please remember that just because the pessimists have always been wrong in the past does not mean that they must always be wrong in the future. It is not a natural law that the optimists must be right. That labor markets have adapted in the long term does not mean that they must adapt, to say nothing of short-term dislocations. Please revise your confidence estimates downwards.
Everyone: many forms of technology are substitutes for labor. Many forms of technology are complements to labor. Often a single form of technology is both simultaneously. It is impossible to determine a priori which effect will dominate.[2] This is true of everything from the mouldboard plough to a convolutional neural network. Don't casually assert AI/ML/robots are qualitatively different. (For example, why does Bill Gates think we need a special tax on robots that is distinct from a tax on any other capital equipment?)
As always, please exercise cognitive and epistemic humility.
[1] I am aware of the work of Gregory Clark and others related to Industrial Revolution era wage and consumption stagnation. If a disaster requires complicated statistical models to provide evidence it exists, I say its scale cannot have been that disastrous.
[2] Who correctly predicted that the introduction of ATMs would coincide with an increase in employment of bank tellers? Anyone? Anyone? Bueller?
I know plenty about algorithms, and enough about marketing.[1] And despite that, I'm not sure what this headline actually means. It's eye-catching, to be sure, but what would marketing to an algorithm look like?
When you get down to it, marketing is applied psychology. Algorithms don't have psyches. Whatever "marketing to algorithms" means, I don't think it's going to be recognizable as marketing.
Would you call what spammers do to slip past your filters "marketing"? (That's not rhetorical.) Does that count as marketing? Because that's pretty much what Gunton seems to be describing.
Setting aside the intriguing possibility of falling in love with an artificial intelligence, the film [Spike Jonze's Her] raises a potentially terrifying possibility for the marketing industry.
It suggests a world where an automated guardian manages our lives, taking away the awkward detail; the boring tasks of daily existence, leaving us with the bits we enjoy, or where we make a contribution. In this world our virtual assistants would quite naturally act as barriers between us and some brands and services.
Great swathes of brand relationships could become automated. Your energy bills and contracts, water, gas, car insurance, home insurance, bank, pension, life assurance, supermarket, home maintenance, transport solutions, IT and entertainment packages; all of these relationships could be managed by your beautiful personal OS.
If you're an electric company whose customers all interact with you via software daemons, do you even have a brand identity any more? Aren't we discussing a world in which more things will be commoditized? And isn't that a good thing for most of the categories listed?
What do we really care about: getting goods and services, or expressing ourselves through the brands we identify with? Both, to an extent. But if we can no longer do that through our supermarkets or banking, won't we simply shift that focus to other sectors: clothes, music, etc.?
2. Consider that legislation may be an inferior form of law not just recently, or occasionally, but usually. Instead, consider the ideas of Bruno Leoni, which suggest that common law that emerges from individual cases represents a spontaneous order, while legislation represents an attempt at top-down control that works less well.
Both of these stories remind me of a couple of scenes in Greg Egan's excellent Permutation City. Egan describes a situation where people have daemons answering their video phones that have learned (bottom-up) to mimic their owners' reactions well enough to tell personal calls from automated messages. In turn, marketers have software that learns to recognize whether it is talking to a real person or to one of these filtering systems. The two have entered an evolutionary race to the point that people's filters are almost full-scale neurocognitive models of their personalities.
[1] Enough to draw a paycheck from a department of marketing for a few years, at least.
I thought I would post some of the bite-sized coding pieces I've done recently. To lead off, here's a Ruby function to find the distance between two points given their latitude and longitude.
Latitude is given in degrees north of the equator (use negatives for the Southern Hemisphere) and longitude is given in degrees east of the Prime Meridian (optionally use negatives for the Western Hemisphere).
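include Math  # brings PI, sin, cos and acos into scope

DEG2RAD = PI/180.0

# Great-circle distance in miles between two points given in degrees.
def lldist(lat1, lon1, lat2, lon2)
  rho = 3960.0                 # radius of the Earth in miles
  theta1 = lon1*DEG2RAD
  phi1 = (90.0-lat1)*DEG2RAD   # colatitude: angle down from the North Pole
  theta2 = lon2*DEG2RAD
  phi2 = (90.0-lat2)*DEG2RAD
  val = sin(phi1)*sin(phi2)*cos(theta1-theta2)+cos(phi1)*cos(phi2)
  # Clamp to [-1, 1] so floating-point error can't push acos() out of range
  val = [-1.0, val].max
  val = [ val, 1.0].min
  psi = acos(val)              # central angle in radians
  rho*psi
end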
A couple of notes:
Everything with val at the bottom is to deal with an edge case that can crop up when you try to get the distance between a point and itself. In that case val should be equal to 1.0, but on my systems some floating-point errors creep in and I get 1.0000000000000002, which is out of range for the acos() function.
This returns the distance in miles. If you want some other unit, redefine rho with the appropriate value for the radius of the earth in your desired unit (6371 km, 1137 leagues, 4304730 passus, or what have you).
This assumes the Earth is spherical, which is a decent first approximation, but is still just that: a first approximation.[1]
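If you would rather not edit the function to change units, one option is to pass the radius in. The sketch below is a hypothetical variant, not the function from this post: `great_circle_distance`, `EARTH_RADIUS`, and the `radius:` keyword are names I made up for illustration. It uses the same spherical formula and relies on `clamp` (available since Ruby 2.4) in place of the max/min pair.

```ruby
# Hypothetical variant of the distance function with the radius as a keyword
# argument, so the unit is chosen per call. Same spherical-Earth assumption.
EARTH_RADIUS = { miles: 3960.0, km: 6371.0 }.freeze

def great_circle_distance(lat1, lon1, lat2, lon2, radius: EARTH_RADIUS[:miles])
  deg2rad = Math::PI / 180.0
  theta1 = lon1 * deg2rad
  phi1 = (90.0 - lat1) * deg2rad
  theta2 = lon2 * deg2rad
  phi2 = (90.0 - lat2) * deg2rad
  val = Math.sin(phi1) * Math.sin(phi2) * Math.cos(theta1 - theta2) +
        Math.cos(phi1) * Math.cos(phi2)
  # clamp guards acos() against values like 1.0000000000000002
  radius * Math.acos(val.clamp(-1.0, 1.0))
end

# A quarter of the way around the equator, in miles and then in kilometres.
puts great_circle_distance(0.0, 0.0, 0.0, 90.0)
puts great_circle_distance(0.0, 0.0, 0.0, 90.0, radius: EARTH_RADIUS[:km])
# A point and itself: the edge case the clamp exists for.
puts great_circle_distance(38.9, -77.0, 38.9, -77.0)
```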
I am currently writing a second version to account for the difference between geographic and geocentric latitude which should do a good job of accounting for the Earth's eccentricity. The math is not hard, but finding ground truth to validate my results against is, since the online calculators I've tried to check against do not make their assumptions clear. I did find a promising suite of tools for pilots, and I'd hope if you're doing something as fraught with consequences as flying that you've accounted for these sorts of things.
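For reference, the latitude conversion itself is short. The sketch below is not that second version, just the standard relation between geographic (geodetic) and geocentric latitude for a point on the surface of an ellipsoid, tan(geocentric) = (1 - e²) tan(geographic). I'm assuming the WGS84 flattening here, and the constant and function names are hypothetical. Note that feeding converted latitudes into a spherical formula is still not a full ellipsoidal distance calculation.

```ruby
# Geographic (geodetic) to geocentric latitude on the ellipsoid's surface.
# Assumes the WGS84 ellipsoid; names are hypothetical, for illustration only.
WGS84_FLATTENING = 1.0 / 298.257223563
WGS84_E2 = WGS84_FLATTENING * (2.0 - WGS84_FLATTENING) # eccentricity squared

def geocentric_latitude(geographic_deg)
  phi = geographic_deg * Math::PI / 180.0
  # tan(geocentric) = (1 - e^2) * tan(geographic)
  Math.atan((1.0 - WGS84_E2) * Math.tan(phi)) * 180.0 / Math::PI
end

[0.0, 30.0, 45.0, 60.0, 90.0].each do |lat|
  puts format('%5.1f -> %9.5f', lat, geocentric_latitude(lat))
end
```

The two latitudes agree at the equator and the poles and differ most near 45 degrees, by a bit under a fifth of a degree.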
[1] As far as I'm concerned, this is my canonical example of the difference between a first and second approximation. The Earth isn't really an oblate spheroid either, but that makes a very good second approximation — about 100 m. (See John Cook here and here.)
However, the sub-rational side of me is loving this. Not for any partisan reasons — I'm an "a plague a' both your houses" sort of guy — but rather because it is so satisfying to this geek to see the President,[2] his cabinet secretaries, senators, and all the other high and mighty mandarins and viziers of the Beltway brought low before the intransigent reality of Code.
All these powerful people are learning (one hopes) the painful lesson that so many powerful people before them have learned when confronting technical problems. It does not matter how many laws you can create with the stroke of your pen, nor how many regiments you can order about, nor how many sheriffs or tax collectors or wardens you direct: you can't give orders to Computers. It is nice to see such mighty people forced to acknowledge — as thousands of hapless executives and others have in the past — that things are not as simple as commanding geeky worker bees to make it so. No number of fiats, from however august an authority, can summon software into being: It must be made.
As my father — a former legislative assistant on the Hill — said, "passing a law requiring the exchanges to be open is like passing a law forbidding people from being sick, and just as effective."
Compilers don't care about oratory or rhetoric. Political capital can't find bugs. Segfaults aren't fixed at whistle-stops or town-halls or photo-ops. No quantity of arm-bending or tongue-wagging or log-rolling or back-scratching can plug memory leaks. You can't hand-shake or baby-kiss your way into working code.
I tend to see two different mistaken attitudes among non-geeks when it comes to how software is actually made. Some people think it's complete magic, which is flattering but utterly wrong. Others see it as "just pressing buttons," which is wrong but utterly arrogant.
Programmers sit at computers, stare at monitors, and type. Which is exactly what J. Random Whitecollar does, so how hard can it be? It is, after all, "just typing" — although in the same way that surgery is just cutting and stitching.[3]
I have become accustomed — as every CS grad student becomes — to getting emails from founders seeking technical expertise for their start-ups. The majority of these are complete rubbish, written by two troglodytes who imagine that coming up with an idea plus a clever name for a website constitutes the bulk of the work. These emails typically include a line about "just needing someone to create the site/app/program for us." This is a dead giveaway that these people will make terrible partners. Just create it? Just? You might as well tell a writer that you have an idea for a novel, and could he please just write the book for you?
This is the same attitude I see from the White House. Not only did they start off the process with the general suits-vs-geeks attitude, they continued at every turn to give precedence to political desires over engineering realities: failing to set realistic deadlines from the start, leaving all details up to the numerous "the secretary shall determine" clauses in the legislation, delaying the date that states must decide if they would run their own exchanges, delaying finalizing what the rules on the back end would be for insurers, HHS insisting on doing the general contracting itself,[4] the head-in-the-sand "brisk management" they engaged in when it became clear the deadline would be slipped, etc., etc. Over and over again the political establishment prioritized their own wants over the engineering needs.
Suits imagine they have the hard job, because that's the only job they know how to do. Yes, the political wrangling is difficult. But we geeks have sat in those frustrating meetings, attempting to get disparate parties on the same page. We've drafted those memos, and written those reports, and had those conference calls. We have to do all that too. When's the last time the suits tried our job? When's the last time they wrestled with a memory allocation bug in ad hoc dynamic data structures nested four deep? When have they puzzled out a floating point underflow error? When have they deciphered an undocumented API?
The psychologically easiest response, when confronted with something you have no clue how to do, is to assert that it's simple, and you would easily do it if only you had the inclination and time denied you by having to deal with more rigorous matters.
I don't want to fall into the opposite trap here of assuming the other guy's job, i.e. the political, non-engineering one, is easy. But let me ask you some questions. How many people in the executive branch have the jobs they do because they donated to a campaign or ran a solid get-out-the-vote drive in a swing state, or did something else politically advantageous to the current occupant of the Oval Office but otherwise entirely unrelated to the department/bureau/administration they now give orders to? And how many political appointees are where they are because they've mastered their craft over tens of thousands of hours of practice?
Now answer those same questions, but substitute "software engineering firm" for "executive branch." What's the ratio of people who get ahead by who-they-know to those who are promoted for what-they-know there? Silicon Valley isn't exactly known for sinecures and benefices. On the other hand OPM has entire explicit classes of senior-level officials who are where they are for no other reason than POTUS's say-so. And this isn't some kind of sub-rosa, wink-wink-nudge-nudge thing: this is exactly how the administration is supposed to function.
Let's shift gears and take a look at Charette's (soon-to-be-) classic article, "Why Software Fails." I don't expect the politicians and bureaucrats in charge of this thing to have read K&R or SICP backwards and forwards or have a whole menagerie of O'Reilly books on their shelf. But they at least ought to be familiar with this sort of thing before embarking on a complete demolition and remodel of a sixth of the US economy that was critically dependent on a website.
Here's Charette's list of common factors:
1. Unrealistic or unarticulated project goals
2. Inaccurate estimates of needed resources
3. Badly defined system requirements
4. Poor reporting of the project's status
5. Unmanaged risks
6. Poor communication among customers, developers, and users
7. Use of immature technology
8. Inability to handle the project's complexity
9. Sloppy development practices
10. Poor project management
11. Stakeholder politics
12. Commercial pressures
Let's assume the final one doesn't apply (although I'm sure there were still budget constraints, since I remember multiple proposals all summer and autumn to fix this by throwing more money at it). Other than that, I could find a news story to back up the ObamaCare site making every one of these mistakes other than #7. You're looking at 10 out of 12 failure indicators. Even granting a very generous interpretation of events there's no way the Exchanges weren't dealing with at the very least #1, 3, 4, and 6.
I've seen plenty of people on the Right gleefully jeer that this is what happens when you don't have market incentives to guide you. They're right.
I've also seen plenty of people on the Left retort that history is littered with private enterprises that have wasted billions on poorly-executed ambitious IT projects. They're right too.
Of course, they're both wrong as well.
The people on the Right are engaging in a huge amount of survivorship bias. All those companies that screwed up an IT rollout like this aren't around for us to notice anymore. Maybe they aren't bankrupt, but they're not as salient as their successful competitors either. Failures are obscure, successes are obvious.
The people on the Left are misunderstanding how distributed, complex systems like a market work. Yes, individual agents will fail. That's part of the plan, just like it is in evolution. You can't have survival-of-the-fittest without also having the contrapositive.[5] We don't have the freedom to develop healthcare.gov via the distributed market-driven exploration process. All of our eggs are in this one basket.
I don't think the people making either claim really grok how a market is supposed to operate. It wouldn't help to give the healthcare.gov developers an equity stake or pay them lavish bonuses. You might get more effort from them or a better group of programmers, but you've still only got a single attempt at getting this right. And the people pointing out all the wasted private-sector IT spending are also missing the point. Yes, there are failures, but the entire system relies on failures to find the successes. The ACA does the opposite of that by forcing everyone to adopt the same approach and continuing to disallow purchasing across state lines. That's a recipe for catastrophic loss of diversity and dampening of feedback signals.
I've seen people on all sides suggest that what we really needed to do was go to Silicon Valley and hire some hotshot programmers and give them big paychecks, and they could build this for us lickety-split. Instead, we're stuck paying people mediocre GS salaries (or the equivalent via contractors), so we get mediocre programmers who deliver a mediocre product. I don't think this reasoning holds up. Another common observation, which I also think is flawed, is almost the opposite: there was never a way to make this work since good coders in the Valley expect to get equity stakes when they create big, ambitious software products, and no such compensation is possible for a federal contract.
At the margin more money will obviously help. Ceteris paribus, you will get more talented people. But that's not the whole story, by a long shot.
1. There are several orders of magnitude between the best programmers and the median programmers. You can't even quantify the difference in quality between the best and the worst, because the worst have negative productivity: they introduce more bugs into the code than they fix. Paying marginally more may get you marginally better coders, but there's a qualitative difference between the marginally-above-average and the All Stars.
2. The way you get the best is not often by offering more money. It helps, but it's only a piece of the whole story. The way you get the best is by giving them interesting problems to work on.
3. The exchange is not an interesting problem. In fact it's quite the opposite. It's almost entirely what Eric S Raymond calls "glue" — it pastes a bunch of other systems together, but doesn't do anything very interesting on its own. ESR cautions programmers (quite rightly!) to use as little glue as possible. Glue is where errors — and madness — insinuate themselves into a project.[6]
This is related to all of the discussion I've heard about how Obama got geeks to volunteer to help him create various tools for his campaigns. If hotshot programmers would do that, the thinking goes, why wouldn't they pitch in to build an awesome exchange?
Simple: because the exchange is boring. It's as bureaucratic as it comes. Working on it would require a massive amount of interfacing with non-technical managers in order to comply with non-trivial, difficult-to-interpret legislative/regulatory rules. Do you know how many lawyers a coder would have to talk to in order to manage a project like this?! Coders are almost as allergic to lawyers as the Nac Mac Feegle are.
All that managerial overhead is no fun at all, especially compared to the warm-and-fuzzies some people feel when they get to participate in the tribal activity of a big election.[7]
Not only is building the exchanges not fun compared to building a campaign website, but it comes with all sorts of deadlines and responsibilities too. If you think up some little GPS app to point people toward their polling place, but it doesn't work the way you want, or handle a large enough load... no sweat. It was a hobby thing anyway. If it works then you feel good about helping to get your guy elected. If it doesn't then you just move on to the next hobby that strikes your fancy.
Was there a way for the Obama administration to harness some of that energy from the tech community? Yeah. Could they have used open source development to make some of the load lighter? Yeah. But it's no cure-all. At the end of the day there was a lot of fiddly, boring, thankless, unsexy government work to be done.
Let's take a slight detour and discuss Twitter. A professor I know claims that several bad months of performance by healthcare.gov is no big deal. After all, Twitter used to be plagued by the Fail Whale but it's a very successful enterprise now.
First of all, they're the exception. People remember the Fail Whale specifically because Twitter is the opposite of a failure now.
Secondly, tweeting is entertaining. Buying insurance isn't. People will put up with more hurdles being put between them and free fun than between them and expensive drudgery.
Thirdly, Twitter never had to worry about its delicate actuarial calculus being thrown off by a non-random sample of users pushing their way through a clogged system.
Fourthly, if Twitter screwed something up all its users were free to walk away — either until things were fixed, or forever. We don't have that option w/r/t healthcare.gov.
The administration's responses in the last few weeks to the ongoing troubles have been characterized as "legislation by press release." Let's put aside the constitutional/philosophical issue of whether the President is merely tweaking the way a law is executed, as is his wont, or is re-writing the law of the land by presidential motu proprio.
I want to point out that this is another area where comparison to Amazon, Netflix, etc. falls short. If Twitter finds out that some part of their design is unimplementable they have complete prerogative to change the design of their service in any way. They can re-write their ToS or feature list or pricing structure however they want, whenever they want. The State utterly lacks such range of motion and nimbleness. There is thus even less point in people on either the Red Team or Blue Team saying "well the private sector builds massive IT projects all the time." They aren't playing the same game.
Jay Carney et al. have been insisting all along that everything is working (or will be working, or should have been working, or whatever the line is today), and the only problem is it's a bit slow, as if this is a trivial matter. I don't think people realize how relentlessly commerce websites are engineered to remove all the slowness. And I mean all the slowness. Every millisecond of delay costs you sales. Every slowdown lowers your conversion rate. Tens of milliseconds are a big deal.8 Having delays measured in minutes is unspeakably bad. Delays in the hours are no longer "delays" — they mean the system doesn't work.
(If you don't believe me then you can do a little experiment. If you're using Chrome, open up a new tab, then go to View > Developer > Developer Tools and click on the Network tab. [I know other browsers have a similar function, but I don't remember what they call it off the top of my head.] Once you see that, go back to the tab you just opened and load www.amazon.com. You'll see all the various files needed to display their page listed in the timeline. Note that the "latency" column is measured in milliseconds. If delays of several minutes were just part of doing business, this isn't how developers would want something reported.)
(Update: here's a look at what the exchange sales funnel actually looks like. Not good, especially for the unsubsidized consumers. And considering this is a product we're required to buy. [How well would Amazon do if they had the IRS requiring you to buy books every year in the name of increased national literacy?] Oh, and considering we don't know who will actually end up paying their bills. And is anyone else a little suspicious at how hard it is to get these numbers? What happened to all the promises of freely shared government data from "the most transparent administration ever"? How does that mesh with not releasing how many people have actually purchased a plan?)
These lengthy delays are actually worse than the exchanges not working at all. We'd be better off if they never opened. The healthy kid who's buying a policy because he's told he has to is going to be put off by these delays, but the sick old-timer with diabetes and a bad hip isn't. So rather than not getting any customers, you're getting just the expensive ones. (I feel like we need a sound effect or musical theme to play when the Death Spiral is about to come on stage. Maybe something from "Mars, Bringer of War"?)
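To make that arithmetic concrete, here is a toy sketch in Python. Every number in it is made up for illustration: the group sizes, the expected annual costs, and the sign-up completion rates are all assumptions of mine, not figures from the exchanges. The only point is to show how a delay that discourages healthy people more than sick people pushes up the average cost of whoever does make it through.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population: int          # people who would like to enroll (hypothetical)
    annual_cost: float       # expected claims per enrollee, in dollars (hypothetical)
    completion_rate: float   # fraction who finish sign-up despite the delays (hypothetical)


def average_cost(groups):
    """Average expected claims cost per person who actually enrolls."""
    enrolled = sum(g.population * g.completion_rate for g in groups)
    total_cost = sum(g.population * g.completion_rate * g.annual_cost for g in groups)
    return total_cost / enrolled


# A site that works: everyone who wants a plan gets through.
fast_site = [
    Group("healthy", 700, 1_500.0, 1.0),
    Group("sick", 300, 12_000.0, 1.0),
]

# A site with long delays: the healthy mostly give up, the sick mostly persist.
slow_site = [
    Group("healthy", 700, 1_500.0, 0.2),
    Group("sick", 300, 12_000.0, 0.9),
]

print(f"working site: ${average_cost(fast_site):,.0f} per enrollee")
print(f"slow site:    ${average_cost(slow_site):,.0f} per enrollee")
```

With these invented inputs the slow site enrolls fewer than half as many people, and the average enrollee costs nearly twice as much. Premiums priced for the first pool won't cover the second, which is the first turn of the spiral.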
This all leads us to Brisk Management and Failing on Time. These are very important engineering management concepts. This post is already dragging on much too long, so I'll summarize in one sentence: it is a huge mistake to take on extra risks just to hit an arbitrary calendar deadline. Or if you'd prefer a sentence with more imagery: it's better that a building take longer to finish than have it done on time but collapse later. The health exchanges look like a textbook case of Failing on Time. Obama was reassuring everyone that signing up was going to be just like shopping on Amazon or Kayak a week — one week! — before the missed launch date. The first missed launch date, that is.
Many of the problems of the exchange implementation were apparent even on paper, in the planning stages. For instance: just how is healthcare.gov supposed to calculate subsidies? That will require a real-time verification of your income. From whence will this information come? The IRS knows a scary amount about us, but it doesn't know until deep into next year how much you made this month. They don't have some server with an API standing by to answer queries like getCurrentIncome(<SSN>).9 So it was pretty inevitable that this feature would be abandoned in favor of the honor system. Which is unfortunate, because I remember ObamaCare supporters swearing up and down that it was completely absurd for their opponents to raise concerns about people hustling the system for subsidies they didn't qualify for. (Not to mention the equivalent assumptions the CBO was forced to make.)
This post is already orders of magnitude longer than I expected, so I'm going to toss in a handful of links to a couple of other people's posts without comment. There were many, many more I could put here, but keeping track of all the ink spilled on this is impossible.
The last four are by Megan McArdle, who is not only one of the most cogent econ-bloggers out there, she also worked as an IT consultant, so she has had a lot of valuable perspective to contribute.
I'll close with this, from Ellen Ullman's excellent memoir Close to the Machine. Ullman was (is?) a card-carrying communist. I mention that so you know she's no anti-government right-wing Tea Party ideologue. This passage describes her experience in the early 90s as the lead developer on a San Francisco project building a computer system to unify all the city's AIDS-related efforts. She started the project overjoyed to be working for "the good guys" instead of some profit-maximizing robber barons, but very quickly it turned into this:
Next came the budget and scheduling wrangles. Could the second phase be done in December? At first I tried what may be the oldest joke known to programming managers—"Sure you can have it in December! Of what year?"—but my client was in deadly earnest. "There is a political deadline," they said, "and we can't change it." It did no good to explain that writing software was not a political process. The deed was done. They had gone around mentioning various dates—dates chosen almost at random, imagined times, wishes—and the mentioned dates soon took on an air of reality. To all the world, to city departments and planning bureaus, to task forces and advisory boards, the dates had become expectations, commitments. Now there was no way back. The date existed and the software would be "late." Of course, this is the way all software projects become "late"—in relation to someone's fantasy that is somehow adopted as real—but I didn't expect it so soon at the AIDS project, place of "helping people," province of "good."
I asked, "What part of the system would you like me not to do?"
"You tell us," they said.
"This one. This piece here can't be done on time."
"But we must ace that one! It's a political requirement."
Round and round: the same as every software project, any software project. The same place.
(Ellen Ullman, Close to the Machine, pp. 82–83.)
After all, you, dear readers, are strangers to me, and I find it slightly uncivilized to discuss politics, religion or sex with strangers. [↩]
See Eric S Raymond, "The Art of Unix Programming," 2003. This may be a little advanced for a legislator or administrator to read, but is it that much to ask that the people governing these critical systems learn a little bit about how they work? [↩]
Is it Robin Hanson who has the theory about political engagement being another form of team sport and spectation? [↩]
And for context, conscious thought is best described on a scale of hundreds of milliseconds, so delays that are nearly too brief to perceive lower your chance of completing a transaction by a noticeable amount. [↩]
If you don't believe me then you should have been around when I was trying to convince Sallie Mae and the Department of Education of what my family's correct income was so they could calculate our loan repayments. It took about nine months to convince them that my wife, a teacher, is paid 10 months a year and as a result you can't just multiply her biweekly wages by 26 to get annual income. There are four million teachers in the US, so it's not exactly like this was some rare exception they had to cope with. I wish there was some IRS system for quickly verifying income, because it would have saved me most of a year of mailing in pay stubs and 1040s and W-2s and offer letters and triplicate forms, by which point, of course, the information was out of date and we had to start over.
Sorry to get off on a tangent here, but the federal government is so bad at technology I just can't let this go. And actually, it's not much of a tangent when you consider it was the PPACA that spearheaded the semi-nationalization of the student loan industry. (Drat. I need a footnote for this footnote. A couple of the very important concepts you can learn from "The Art of Unix Programming" (note 7 supra) are the principles of Compactness and Orthogonality. Both of these, and particularly the latter, should be rules for legislation as well. Folding student loan reform into the PPACA in order to game the CBO scoring is a pretty clear violation of both of these principles.)
Compared to health insurance, a student loan is a pretty simple thing. Have you had to deal with studentloans.gov? It's atrocious. Recently they changed the repayment plan that my wife was on without notifying us. That's bad enough. The ugly part is that when they do that, they don't change the displayed label on your account that tells you which plan you're in, so even if you proactively check for changes you won't find out. And the truly hideous thing is that they don't change the label on the info screens that their own representatives can see either, so if you call to verify you still won't find out! It's true that dealing with the banks before was a complete mess, but I chalk that up to the absence of a right-of-exit for consumers. That was bad enough then, but post-nationalization I'm really over a barrel.
Getting people signed up is only the first skirmish for healthcare.gov. All these sorts of ongoing problems, such as the ones I've experienced with student loans, will constitute the bulk of the IT battle, and they have not even begun to show up yet. [↩]