AI's "one-trick pony" has a hell of a trick

MIT Technology Review has a recent article by James Somers about error backpropagation, "Is AI Riding a One-Trick Pony?" Overall, I agree with its message: we need to keep thinking up new paradigms, because the current state of the art is very useful but not correct in any rigorous way. But as much as I agree with the thesis, I think Somers oversells it, especially early in the piece. For instance, the introductory section concludes:

When you boil it down, AI today is deep learning, and deep learning is backprop — which is amazing, considering that backprop is more than 30 years old. It’s worth understanding how that happened—how a technique could lie in wait for so long and then cause such an explosion — because once you understand the story of backprop, you’ll start to understand the current moment in AI, and in particular the fact that maybe we’re not actually at the beginning of a revolution. Maybe we’re at the end of one.

That's a bit like saying, "When you boil it down, flight is airfoils, and airfoils are Bernoulli's principle, which is amazing, considering that Bernoulli's principle is almost 300 years old." I fully endorse the idea that we ought to understand backprop; over the last couple of months I've put a lot of effort into organizing neural network training for some of my firm's senior leadership, and error backpropagation (EBP) and gradient descent are the heart of my presentation. But I would be very, very careful about concluding that backprop is the entire show.
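For anyone who hasn't seen backprop up close, here is a minimal sketch of what it boils down to: the chain rule applied backward through a network, followed by a gradient descent step. This is a hypothetical toy of my own for illustration, not anything from Somers' article: one input, a single sigmoid hidden unit, a linear output, squared-error loss, and made-up training pairs. The names `forward`, `backprop`, `numerical_grad`, and `train` are just illustrative. The finite-difference check is there to show that the backward pass computes the same derivatives you would get by brute-force perturbation of each weight.

```python
import math

# Hypothetical toy network: x -> h = sigmoid(w1*x + b1) -> y = w2*h + b2,
# trained on squared error L = 0.5 * (y - t)**2. params = [w1, b1, w2, b2].

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass; keeps the intermediate values the backward pass needs."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1
    h = sigmoid(z)
    y = w2 * h + b2
    return z, h, y

def loss(params, x, t):
    _, _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backprop(params, x, t):
    """Exact gradient of the loss w.r.t. [w1, b1, w2, b2] via the chain rule,
    propagating the output error backward one layer at a time."""
    _, _, w2, _ = params
    _, h, y = forward(params, x)
    dy = y - t               # dL/dy
    dw2 = dy * h             # dL/dw2
    db2 = dy                 # dL/db2
    dh = dy * w2             # error passed back to the hidden unit
    dz = dh * h * (1.0 - h)  # sigmoid'(z) = h * (1 - h)
    dw1 = dz * x             # dL/dw1
    db1 = dz                 # dL/db1
    return [dw1, db1, dw2, db2]

def numerical_grad(params, x, t, eps=1e-6):
    """Central finite differences: a slow, independent check on backprop."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, t) - loss(down, x, t)) / (2 * eps))
    return grads

def train(params, data, lr=0.1, steps=2000):
    """Plain stochastic gradient descent using the backprop gradients."""
    params = list(params)
    for _ in range(steps):
        for x, t in data:
            grads = backprop(params, x, t)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

def total_loss(params, data):
    return sum(loss(params, x, t) for x, t in data)

if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 1.0)]   # made-up training pairs
    start = [0.5, 0.0, 0.5, 0.0]
    print("backprop:   ", backprop(start, 1.0, 1.0))
    print("finite diff:", numerical_grad(start, 1.0, 1.0))
    trained = train(start, data)
    print("loss before:", total_loss(start, data))
    print("loss after: ", total_loss(trained, data))
```

The finite-difference version needs two extra forward passes per parameter, while the backward pass gets every gradient for roughly the cost of one more pass. That efficiency, not any deep insight into cognition, is a big part of why backprop scales to networks with millions of weights.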

Backprop also wasn't "lying in wait." People had been working on it ever since it was popularized in 1986. The problem was that '86 was on the eve of the second AI winter, which lasted roughly another decade. Just as people should understand backprop to understand contemporary AI, they should learn the history of AI to understand it too. Just because no one outside of CS (and precious few people in CS, for that matter) paid any attention to neural networks before 2015 doesn't mean they were completely dormant, only to spring up fully formed like Athena from the head of Zeus.

I really don't want to be in the position of defending backprop. I took the trouble to write a dissertation about non-backprop neural nets for a reason, after all. ((That reason being, roughly put, that we're pretty sure the brain is not using backprop, and it seems ill-advised to ignore the mechanisms employed by the most intelligent thing we are aware of.)) But I also don't want to be in the position of letting sloppy arguments against neural nets go unremarked. That road leads to people mischaracterizing Minsky and Papert, abandoning neural nets for generations, and putting us epochs behind where we might have been. ((Plus sloppy arguments should be eschewed on the basis of the sloppiness alone, irrespective of their consequences.))


PS This is also worth a rejoinder:

Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.

That's not what thought is; that's how thought can be represented. Planets are not vectors, but their orbits can be profitably described that way, because "it behooves us to place the foundations of knowledge in mathematics." I'm sorry if that seems pedantic, but the distinction between a thing and its representation (besides giving semioticians something to talk about) underpins much of how we interpret AI systems, and cognitive science as well. Indeed, a huge chunk of data science work is figuring out the right representations. If you get those right, your problem is often largely solved. ((IIRC both Rob Pike and Linus Torvalds have aphorisms to the effect that once you have chosen the correct data structures, the correct algorithms will naturally follow. I think AI and neuroscience are running into a lot of friction because we haven't figured out the right representations/data structures. When we do, the right learning algorithms will follow much more easily.))

PPS This, on the other hand, I agree with entirely:

Deep learning in some ways mimics what goes on in the human brain, but only in a shallow way. … What we know about intelligence is nothing against the vastness of what we still don’t know.

What I fear is that people will read that, conclude that artificial neural networks are built on a shallow foundation, and give up on them as unreliable. A much better conclusion would be that we need to keep working and build better, deeper foundations.

This entry was posted in CS / Science / Tech / Coding.