cogsci » J· Sylvest· R· F· Cos· Nvllvm· Fecit

AI's "one trick pony" has a hell of a trick

The MIT Technology Review has a recent article by James Somers about error backpropagation, "Is AI Riding a One-Trick Pony?" Overall, I agree with the message in the article. We need to keep thinking of new paradigms because the SotA right now is very useful, but not correct in any rigorous way. However, as much as I agree with the thesis, I think Somers oversells it, especially in the beginning of the piece. For instance, the introductory segment concludes:

When you boil it down, AI today is deep learning, and deep learning is backprop — which is amazing, considering that backprop is more than 30 years old. It’s worth understanding how that happened—how a technique could lie in wait for so long and then cause such an explosion — because once you understand the story of backprop, you’ll start to understand the current moment in AI, and in particular the fact that maybe we’re not actually at the beginning of a revolution. Maybe we’re at the end of one.

That's a bit like saying "When you boil it down, flight is airfoils, and airfoils are Bernoulli's principle — which is amazing, considering that Bernoulli's principle is almost 300 years old." I totally endorse the idea that we ought to understand backprop; I've spent a lot of effort in the last couple of months organizing training for some of my firm's senior leadership on neural networks, and EBP/gradient descent is the heart of my presentation. But I would be very, very careful about concluding that backprop is the entire show.

Backprop was also not "lying in wait." People had been working on it ever since it was introduced in 1986. The problem was that '86 was the height of the 2nd AI winter, which lasted another decade. Just as people should understand backprop to understand contemporary AI, they should learn the history of AI to understand contemporary AI. Just because no one outside of CS (and precious few people in CS, for that matter) paid any attention to neural networks before 2015 doesn't mean they were completely dormant, only to spring up fully formed in some sort of intellectual Athenian birth.

I really don't want to be in the position of defending backprop. I took the trouble to write a dissertation about non-backprop neural nets for a reason, after all. ((That reason being, roughly put, that we're pretty sure the brain is not using backprop, and it seems ill-advised to ignore the mechanisms employed by the most intelligent thing we are aware of.)) But I also don't want to be in the position of letting sloppy arguments against neural nets go unremarked. That road leads to people mischaracterizing Minsky and Papert, abandoning neural nets for generations, and putting us epochs behind where we might have been. ((Plus sloppy arguments should be eschewed on the basis of the sloppiness alone, irrespective of their consequences.))

PS This is also worth a rejoinder:

Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.

That's not what thought is, that's how thought can be represented. Planets are not vectors, but their orbits can be profitably described that way, because "it behooves us to place the foundations of knowledge in mathematics." I'm sorry if that seems pedantic, but the distinction between a thing and its representation—besides giving semioticians something to talk about—underpins much of our interpretation of AI systems and cognitive science as well. Indeed, a huge chunk of data science work is figuring out the right representations. If you can get that, your problem is often largely solved. ((IIRC both Knuth and Torvalds have aphorisms to the effect that once you have chosen the correct data structures, the correct algorithms will naturally follow. I think AI and neuroscience are dealing with a lot of friction because we haven't been able to figure out the right representations/data structures. When we do, the right learning algorithms will follow much more easily.))

PPS This, on the other hand, I agree with entirely:

Deep learning in some ways mimics what goes on in the human brain, but only in a shallow way. … What we know about intelligence is nothing against the vastness of what we still don’t know.

What I fear is that people read that and conclude that artificial neural networks are built on a shallow foundation, so we should give up on them as being unreliable. A much better conclusion would be that we need to keep working and build better, deeper foundations.

by jsylvest Posted on 10 November 2017

Posted in CS / Science / Tech / Coding | Tagged AI, cogsci, computer science, deep learning, machine learning, ML, neural nets, technology

Friston

Two of my favorite blogs — Slate Star Codex (topics: psychiatry, social commentary) and Marginal Revolution (topics: economics, everything else) — have both linked to Karl Friston papers in the last 24 hours. Since one of my bosses is a Friston enthusiast, and he's the only Friston devotee I've ever met, and neither of these blogs has anything to do with what I work on, this gave me a Worlds-Are-Colliding feeling.

A George divided against itself can not stand.

I haven't read either paper yet ("An aberrant precision account of autism" and "Predicting green: really radical (plant) predictive processing") but I do want to respond to SSC's commentary. Here's what he had to say:

A while ago I quoted a paper by Lawson, Rees & Friston about predictive-processing-based hypotheses of autism. They said:

This provides a simple explanation for the pronounced social-communication difficulties in autism; given that other agents are arguably the most difficult things to predict. In the complex world of social interactions, the many-to-one mappings between causes and sensory input are dramatically increased and difficult to learn; especially if one cannot contextualize the prediction errors that drive that learning.

And I was really struck by the phrase “arguably the most difficult thing to predict”. Really? People are harder to predict than, I don’t know, the weather? Weird little flying bugs? Political trends? M. Night Shyamalan movies? And of all the things about people that should be hard to predict, ordinary conversations?

I totally endorse the rest of his post, but here I need to disagree. Other people being the hardest thing to predict seems perfectly reasonable to me. The weather isn't that hard to predict decently well: just guess that the weather tomorrow will be like it is today and you'll be pretty damn accurate. Add in some basic seasonal trends — it's early summer, so tomorrow will be like today but a little warmer — and you'll get closer yet. This is obviously not perfect, but it's also not that much worse than what you can do with sophisticated meteorological modeling. Importantly, the space between the naive approach and the sophisticated approach doesn't leave a lot of room to evolve or learn better predictive ability.
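To make the "tomorrow will be like today" rule concrete, here is a minimal sketch of how that persistence baseline could be computed and scored. Everything in it is an assumption for illustration: the temperatures are synthetic (a seasonal sine wave plus drifting noise, not real weather data), and the function names are made up. It compares persistence only against an even dumber constant guess, so it says nothing about how close persistence gets to real meteorological models.

```python
import math
import random


def synthetic_temperatures(days=365, seed=0):
    """Made-up daily highs in deg C: seasonal sine wave plus drifting noise."""
    rng = random.Random(seed)
    temps = []
    anomaly = 0.0
    for day in range(days):
        seasonal = 15.0 + 10.0 * math.sin(2 * math.pi * (day - 100) / 365)
        # Today's departure from the seasonal norm mostly carries over to tomorrow.
        anomaly = 0.7 * anomaly + rng.gauss(0.0, 2.0)
        temps.append(seasonal + anomaly)
    return temps


def mean_abs_error(forecasts, actuals):
    """Average absolute gap between forecast and what actually happened."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


temps = synthetic_temperatures()
actual = temps[1:]  # the days we are trying to predict

# Persistence forecast: tomorrow's temperature equals today's.
persistence = temps[:-1]

# Constant forecast: always guess the average over the whole series.
series_mean = sum(temps) / len(temps)
constant = [series_mean] * len(actual)

persistence_mae = mean_abs_error(persistence, actual)
constant_mae = mean_abs_error(constant, actual)

print(f"persistence MAE: {persistence_mae:.2f} deg C")
print(f"constant MAE:    {constant_mae:.2f} deg C")
```

On this toy series the persistence error should come out well below the constant-guess error, which is the sense in which the naive rule already captures most of the easily available structure.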

Weird flying bugs aren't that hard to predict either; even dumb frogs manage to catch them enough to stay alive. I'm not trying to be mean to amphibians here, but on any scale of inter-species intelligence they're pretty stupid. The space between how well a frog can predict the flight of a mosquito and how well some advanced avionics system could do so is potentially large, but there's very little to be gained by closing that predictive gap.

Political trends are hard to predict, but only because you're predicting other human agents aggregated on a much larger scale. A scale that was completely unnecessary for us to predict, I might add, until the evolutionary eye-blink of ten thousand years or so ago.

Predicting movies is easier than predicting other human agents, because dramatic entertainments — produced by humans, depicting humans — are just a subset of interacting with other human agents. If you have a good model of how other people will behave, then you also have a good model of how other people will behave when they are acting as story tellers, or when they are characters. (If characters don't conform to the audience's model of human agents at least roughly, they aren't good characters.)

Maybe a better restatement of Friston et al. would be "people are arguably the most difficult things to predict from the domain of things we have needed to predict precisely and have any hope of predicting precisely."

by jsylvest Posted on 29 June 2017

Posted in Uncategorized | Tagged cogsci
