Tag Archives: AI

BaRT: Barrage of Random Transforms for Adversarially Robust Defense

This week I'm at CVPR — the IEEE Conference on Computer Vision and Pattern Recognition — which is a huge AI event. I'm currently rehearsing the timing of my talk one last time, but I wanted to take a minute between run-throughs to link to my co-author Steven Forsyth's wonderful post on the NVIDIA research blog about our paper.

Steven does a fantastic job of describing our work, so head over there to see what he has to say. I couldn't resist putting up a post of my own, though, because (a) I love this video we created...

...and (b), Steven left out what I think was the most convincing result we had, which shows that BaRT achieves a Top-1 accuracy on ImageNet that is higher than the Top-5 accuracy of the previous state-of-the-art defense, Adversarial Training.

Accuracy of BaRT under attack by PGD at varying adversarial distances, compared to the previous state-of-the-art defense (a result from our paper).

Also, (c) I am very proud of this work. It's an idea I've been batting around for almost three years now, and I finally got approval from my client to pursue it last year. It turns out it works exactly how I expected, and I can honestly say this is the first — and probably only — time that has ever happened in my scientific career.

If you want a copy of the paper, complete with some code in the appendices, ((Our hands are somewhat tied releasing the full code due to the nature of our client relationship with the wonderful Laboratory for Physical Sciences, who funded this work.)) our poster, and the slides for our oral presentation, you can find them all on the BaRT page I slapped together on my website.
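
If you just want the gist of the defense, it is easy to sketch: draw a random subset of image transforms from a large pool, apply them in a random order with random parameters, and only then hand the image to the classifier. Here is a minimal illustration; the transform pool, parameters, and the optional averaging over several draws are stand-ins of my own, not the actual set from the paper (that lives in the appendices):

    # Minimal sketch of the BaRT idea; the real defense uses a much larger
    # pool of transform groups with randomized parameters.
    import random
    import torch
    from torchvision import transforms

    # Illustrative stand-in pool; the real pool and parameter ranges differ.
    TRANSFORM_POOL = [
        transforms.ColorJitter(brightness=0.3, contrast=0.3),
        transforms.GaussianBlur(kernel_size=5),
        transforms.RandomAffine(degrees=10, translate=(0.05, 0.05)),
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    ]

    def bart_predict(model, image, k=3, n_draws=5):
        """Classify `image` after a random barrage of transforms.

        Each draw applies k randomly chosen transforms in a random order;
        averaging over a few draws smooths out the prediction.
        """
        probs = []
        for _ in range(n_draws):
            chosen = random.sample(TRANSFORM_POOL, k)
            random.shuffle(chosen)          # random order, not just a random subset
            x = image
            for t in chosen:
                x = t(x)
            with torch.no_grad():
                probs.append(torch.softmax(model(x.unsqueeze(0)), dim=1))
        return torch.stack(probs).mean(dim=0)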


Book List: 2019Q1

I think I did less reading this quarter than at any point since I beat dyslexia. Certainly less than at any point since I started keeping track in 2011, and that includes the period when I finished my dissertation and had two kids. I'm teaching a course at a local college this semester, and lesson prep and grading have not left a lot of time for reading. But enough complaining...


slide:ology: The Art and Science of Creating Great Presentations, Nancy Duarte

Despite giving a fairly large number of presentations, I'm definitely not the audience for this. It's not really about presentations, but about sales presentations. If, like me, you have mostly factual & technical information to impart, I'm not sure how much this will help. There's a decent amount of advice in here if you're a complete graphic design novice, but there are probably better places to get that knowledge.


Cover of "The Relaxed Mind" by Dza Kilung
"The Relaxed Mind," Dza Kilung

The Relaxed Mind, Dza Kilung Rinpoche

There is perhaps a bit too much "woo" in the later chapters of this meditation manual, but it is still a good book for practice. If nothing else, I like having some meditation-related book on my nightstand/iPod: even if that book itself is not the best, it serves as an encouragement to keep practicing. The first two or three of the seven practices described here seem concretely useful. Maybe the later practices will have more appeal to me as I become a "better" meditator?


Cover of "The Most Human Human," by Brian Christian
"The Most Human Human," Brian Christian

The Most Human Human: What Talking with Computers Teaches Us About What It Means to Be Alive, Brian Christian

I loved this. Christian has a degree in computer science and an MFA in poetry. I can't think of a better background for writing about what the Turing Test tells us about talking with computers and about being human. There's good history of AI, exploration of psychology and epistemology, and tips on what makes a conversation interesting.

I'm recommending this as a great book for other technologists to learn something about "soft skills" and for non-technologists to learn about AI. I can't think of another book that comes close to providing both benefits.


Lies Sleeping, Ben Aaronovitch

This is the latest in Aaronovitch's "Rivers of London" series, which I still love. I should really write these recaps as soon as I finish reading, because it's been long enough now that I don't have anything specific to say about it. But this is the ninth volume in the series, so if you don't already have an opinion about the prior eight, there's really no need for you to have one about this.

My wife, who reads mysteries almost exclusively, has recently started this series after hearing me talk about it since 2014. It's one of the few book series we both equally enjoy.


The Labyrinth Index, Charles Stross

(1) Copy-and-paste what I said above about not waiting to write these comments. (2) Copy-and-paste what I said about already having opinions about the series since it's long running, but replace "ninth" with "twelfth."


Cover of "Gravity's Rainbow" by Thomas Pynchon
"Gravity's Rainbow," Thomas Pynchon

Gravity's Rainbow, Thomas Pynchon

I'll be honest: I did not understand this book. I enjoyed it a great deal, but I did not understand it.

I like Pynchon as a stylist even when the narrative has me completely befuddled. As a result, even the confusing passages make for very good audiobook listening because I can let the language just wash over me.


Cover of "Zen Mind, Beginner's Mind" by Shunryu Suzuki.
"Zen Mind, Beginner's Mind," Shunryu Suzuki.

Zen Mind, Beginner's Mind, Shunryu Suzuki

I also didn't fully understand this book, but I feel like I wasn't really meant to. ((Actually, now that I think about it, maybe Pynchon didn't really want people to understand him either.)) I'm not sure "understanding" is even a thing you're supposed to be able to do to Zen. I think I got a lot out of it regardless. It's definitely something I'm going to revisit in the future.


Cover of "House of Suns" by Alastair Reynolds
"House of Suns," Alastair Reynolds

House of Suns, Alastair Reynolds

This is another winner. I haven't had this much fun reading a sci-fi book in years. It has that wide-screen baroque space opera feel that I used to get from Iain Banks books. I can't think of another story that engages so well with the sheer scope — in time and distance — of the galaxy. Before I was halfway through, I was already putting all of the library's other Reynolds books on my list.


The Sky-Blue Wolves, S. M. Stirling

I keep saying I'm going to stop reading this series, but then a new volume comes out just when I want a junk-food book and I read it anyway. Then I feel about as satisfied as I do after eating actual junk food. This is a fun world to mentally play around in, but Stirling is really phoning it in at this point. The Big Bad Guy that was supposed to require a world war to defeat just got knocked off in about a chapter of Dreamtime Ninja Shenanigans, and meanwhile two of our Intrepid Heroes (who happen to both be rightful heirs to continent-spanning empires) decided to have a love-child. Nice neat bow; everyone rides into the sunset.


Why we worry about the Ethics of Machine Intelligence

This essay was co-authored by myself and Steve Mills.

We worry about the ethics of Machine Intelligence (MI) and we fear our community is completely unprepared for the power we now wield. Let us tell you why.

To be clear, we’re big believers in the far-reaching good MI can do. Every week there are new advances that will dramatically improve the world. In the past month we have seen research that could improve the way we control prosthetic devices, detect pneumonia, understand long-term patient trajectories, and monitor ocean health. That’s in the last 30 days. By the time you read this, there will be even more examples. We really do believe MI will transform the world around us for the better, which is why we are actively involved in researching and deploying new MI capabilities and products.

There is, however, a darker side. MI also has the potential to be used for evil. One illustrative example is a recent study by Stanford University researchers who developed an algorithm to predict sexual orientation from facial images. When you consider recent news of the detainment and torture of more than 100 gay men in the Russian republic of Chechnya, you quickly see the cause for concern. This software and a few cameras positioned on busy street corners would allow the targeting of homosexuals at industrial scale – hundreds quickly become thousands. The potential for this isn't so far-fetched. China is already using CCTV and facial recognition software to catch jaywalkers. The researchers pointed out that their findings "expose[d] a threat to the privacy and safety of gay men and women." That warning does little to prevent outside groups from implementing the technology for mass targeting and persecution.

Many technologies have the potential to be applied for nefarious purposes. This is not new. What is new about MI is the scale and magnitude of impact it can achieve. This scope is what will allow it to do so much good, but also so much bad. It is like no other technology that has come before, with the notable exception of atomic weapons, a comparison others have already drawn. We hesitate to draw such a comparison for fear of perpetuating a sensationalistic narrative that distracts from this conversation about ethics. That said, it's the closest parallel we can think of in terms of the scale (potential to impact tens of millions of people) and magnitude (potential to do physical harm).

None of this is why we worry so much about the ethics of MI. We worry because MI is unique in so many ways that we are left completely unprepared to have this discussion.

Ethics is not [yet] a core commitment in the MI field. Compare this with medicine, where a commitment to ethics has existed for centuries in the form of the Hippocratic Oath. Members of the physics community now pledge their intent to do no harm with their science. In other fields ethics is part of the very ethos. Not so with MI. Compared to other disciplines, the field is so young that we haven't had time to mature and learn lessons from the past. We must look to these other fields and their hard-earned lessons to guide our own behavior.

Computer scientists and mathematicians have never before wielded this kind of power. The atomic bomb is one exception; cyber weapons may be another. Both of these, however, represent intentional applications of technology. While the public was unaware of the Manhattan Project, the scientists involved knew the goal and made an informed decision to take part. The Stanford study described earlier has clear nefarious applications; many other research efforts in MI may not. Researchers run the risk of unwittingly conducting studies that have applications they never envisioned and do not condone. Furthermore, atomic weapons could only be built by a small number of nation-states with access to the proper materials and expertise. Contrast that with MI, where a reasonably talented coder who has taken some open source machine learning classes can easily implement and effectively 'weaponize' published techniques. Within our field, we have never had to worry about this degree of power to do harm. We must reset our thinking and approach our work with a new degree of rigor, humility, and caution.

Ethical oversight bodies from other scientific fields seem ill-prepared for MI. Looking to existing ethical oversight bodies is a logical approach. Even we suggested that MI is a "grand experiment on all of humanity" and should follow principles borrowed from human subject research. The fact that Stanford's Institutional Review Board (IRB), a respected body within the research community, reviewed and approved research with questionable applications should give us all pause. Researchers have long raised questions about the broken IRB system. An IRB system designed to protect the interests of study participants may be unsuited for situations in which potential harm accrues not to the subjects but to society at large. It's clear that the standards that have served other scientific fields for decades or even centuries may not be prepared for MI's unique data and technology issues. These challenges are compounded even further by the general lack of MI expertise, or sometimes even technology expertise, among the members of these boards. We should continue to work with existing oversight bodies, but we must also take an active role in educating them and evolving their thinking towards MI.

MI ethical concerns are often not obvious. This differs dramatically from other scientific fields where ethical dilemmas are self-evident. That’s not to say they are easy to navigate. A recent story about an unconscious emergency room patient with a “Do Not Resuscitate” tattoo is a perfect example. Medical staff had to decide whether they should administer life-saving treatment despite the presence of the tattoo. They were faced with a very complex, but very obvious, ethical dilemma. The same is rarely true in MI where unintended consequences may not be immediately apparent and issues like bias can be hidden in complex algorithms. We have a responsibility to ourselves and our peers to be on the lookout for ethical issues and raise concerns as soon as they emerge.  

MI technology is moving faster than our approach to ethics. Other scientific fields have had hundreds of years for their approach to ethics to evolve alongside the science. MI is still nascent, yet we are already moving technology from the 'lab' to full deployment. The speed at which that transition is happening has led to notable ethical issues, including potential racism in criminal sentencing and discrimination in job hiring. The ethics of MI needs to be studied as much as the core technology if we ever hope to catch up and avoid these issues in the future. We need to catalyze an ongoing conversation around ethics, much as we see in other fields like medicine, where there is active research and discussion within the community.

The issue that looms behind all of this, however, is the fact that we can’t ‘put the genie back in the bottle’ once it has been released. We can’t undo the Stanford research now that it’s been published. As a community, we will forever be accountable for the technology that we create.

In the age of MI, corporate and personal values take on entirely new importance. We have to decide what we stand for and use that as a measure to evaluate our decisions. We can’t wait for issues to present themselves. We must be proactive and think in hypotheticals to anticipate the situations we will inevitably face.

Be assured that every organization will be faced with hard choices related to MI. Choices that could hurt the bottom line or, worse, harm the well-being of people now or in the future. We will need to decide, for example, if and how we want to be involved in Government efforts to vet immigrants or create technology that could ultimately help hackers. If we fail to accept that these choices inevitably exist, we run the risk of compromising our values. We need to stand strong in our beliefs and live the values we espouse for ourselves, our organizations, and our field of study. Ethics, like many things, is a slippery slope. Compromising once almost always leads to compromising again.

We must also recognize that the values of others may not mirror our own. We should approach those situations without prejudice. Instead of anger or defensiveness we should use them as an opportunity to have a meaningful dialog around ethics and values. When others raise concerns about our own actions, we must approach those conversations with humility and civility. Only then can we move forward as a community.

Machines are neither moral nor immoral. We must work together to ensure they behave in a way that benefits, not harms, humanity. We don't purport to have the answers to these complex issues. We simply request that you keep asking questions and take part in the discussion.


This has been crossposted to Medium and to the Booz Allen website as well.

We’re not the only ones discussing these issues. Check out this Medium post by the NSF-funded group Pervasive Data Ethics for Computational Research, Kate Crawford’s amazing NIPS keynote, Mustafa Suleyman’s recent essay in Wired UK, and Bryor Snefjella’s recent piece in BuzzFeed.


AIES 2018

Last week I attended the first annual conference on AI, Ethics & Society where I presented some work on a Decision Tree/Random Forest algorithm that makes decisions that are less biased or discriminatory. ((In the colloquial rather than technical sense)) You can read all the juicy details in our paper. This isn't a summary of our paper, although that blog post is coming soon. Instead I want to use this space to post some reaction to the conference itself. I was going to put this on a twitter thread, but it quickly grew out of control. So, in no particular order, here goes nothing:

Many of the talks people gave were applicable to GOFAI but don't fit with contemporary approaches. Approaches to improving/limiting/regulating/policing rule-based or expert systems won't work well (if at all) with emergent systems.

Many, many people are making the mistake of thinking that all machine learning is black box. Decision trees are ML but also some of the most transparent models possible. Everyone involved in this AI ethics discussion should learn a rudimentary taxonomy of AI systems. It would avoid mistakes and conflations like this, and it would take maybe an hour of time.

Now that I think of it, it would be great if next year's program included some tutorials. A crash course in AI taxonomy would be useful, as would a walk-through of what an AI programmer does day-to-day. (I think it would help people to understand what kinds of control we can have over AI behavior if they knew a little more about what goes into getting any sort of behavior at all.) I'd be interested in some lessons on liability law and engineering, or on how standards organizations operate.

Lots of people are letting the perfect be the enemy of the good. I heard plenty of complaints about solutions that alleviate problems but don't eliminate them completely, or work in a majority of situations but don't cover every possible sub-case.

Some of that was the standard posturing that happens at academic conferences ("well, sure, but have you ever thought of this??!") but that's a poor excuse for this kind of gotcha-ism.

Any academic conference has people who ask questions to show off how intelligent they are. This one had the added scourge of people asking questions to show off how intelligent and righteous they are. If ever there was a time to enforce concise Q&A rules, this is it.

We’re starting from near scratch here and working on a big problem. Adding any new tool to the toolbox should be welcome. Taking any small step towards the goal should be welcome.

People were in that room because they care about these problems. I heard too much grumbly backbiting about presenters who care about ethics, but don't care about it in exactly the right way.

We can solve problems, or we can enforce orthodoxy, but I doubt we can do both.

It didn't occur to me at the time, but in retrospect I'm surprised how circumscribed the ethical scenarios being discussed were. There was very little talk of privacy, for instance, and not much about social networks/filter bubbles/"fake news"/etc. that has been such a part of the zeitgeist.

Speaking of zeitgeist, I didn't have to hear the word "blockchain" even one single time, for which I am thankful.

If I had to give a rough breakdown of topics, it would be 30% AV/trolley problems, 20% discrimination, 45% meta-discussion, and 5% everything else.

One questioner brought up Jonathan Haidt's Moral Foundations Theory at the very end of the last day. I think he slightly misinterpreted Haidt (but I'm not sure since the questioner was laudably concise), but I was waiting all weekend for someone to bring him up at all.

If any audience would recognize the difference between “bias” in the colloquial sense and “bias” in the technical, ML/stats sense, I would have hoped it was here. No such luck. This wasn't a huge problem in practice, but it’s still annoying.
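
For anyone who hasn't seen the distinction spelled out: in the statistics/ML sense, the bias of an estimator \hat{\theta} of a parameter \theta is just the systematic offset

    \operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta

which is a property of an estimation procedure, not a moral judgment (it's the thing being traded off in the bias-variance decomposition). The colloquial sense of unfairness toward a group is a separate concept, and a model can exhibit either one without the other.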

There’s a ton of hand-waving about how many of the policies being proposed for ethical AI will actually work at the implementation level. “Hand-waving” is even too generous of a term. It’s one thing to propose rules, but how do you make that work when fingers are hitting keyboards?

I’ll give people some slack here because most talks were very short, but “we’ll figure out what we want, and then tell the engineers to go make it happen somehow” is not really a plan. The plan needs to be grounded in what's possible starting at its conception, not left as an implementation detail for the technicians to figure out later.

"We'll figure out what to do, and then tell the geeks to do it" is not an effective plan. One of the ways it can fail is because it is tinged with elitism. (I don't think participants intended to be elitist, but that's how some of these talks could be read.) I fully endorse working with experts in ethics, sociology, law, psychology, etc. But if the technicians involved interpret what those experts say — accurately or not — as "we, the appointed high priesthood of ethics, will tell you, the dirty code morlocks, what right and wrong is, and you will make our vision reality" then the technicians will not be well inclined to listen to those experts.

Everyone wants to 'Do The Right Thing'. Let's work together to help each other do that and refrain as much as possible from finger pointing at people who are 'Doing It Wrong.' Berating people who have fallen short of your ethical standards — even those who have fallen way, way short — feels immensely satisfying and is a reliable way to solidify your in-group, but it's not productive in the long run. That doesn't mean we need to equivocate or let people off the hook for substandard behavior, but it does mean that the response should be to lead people away from their errors as much as possible rather than punishing for the sake of punishing.

I wish the policy & philosophy people here knew more about how AI is actually created.

(I’m sure the non-tech people wish I knew more about how moral philosophy, law, etc. works.)

Nonetheless, engineers are going to keep building AI systems whether or not philosophers etc. get on board. If the latter want to help drive development, there is some onus on them to better learn the lay of the land. That's not just, but they have the weaker bargaining position, so I think that's how things will have to be.

Of course I'm an engineer, so this is admittedly a self-serving opinion. I still think it's accurate though.

Even if every corporation, university, and government lab stopped working on AI because of ethical concerns, the research would slow but not stop. I can not emphasize enough how low the barriers to entry in this space are. Anyone with access to arXiv, github, and a $2000 gaming computer or some AWS credits can get in the game.

I was always happy to hear participants recognize that while AI decision making can be unethical/amoral, human decision making is also often terrible. It’s not enough to say the machine is bad if you don’t ask “bad compared to what alternative?”. Analyze on the right margin! Okay, the AI recidivism model has non-zero bias. How biased is the parole board? Don't compare real machines to ideal humans.

Similarly, don't compare real-world AI systems with ideal regulations or standards. Consider how regulations will end up in the real world. Say what you will about the Public Choice folks, but their central axiom is hard to dispute: actors in the public sector aren't angels either.

One poster explicitly mentioned Hume and the Induction Problem, which I would love to see taught in all Data Science classes.

Several commenters brought up the very important point that datasets are not reality. This map-is-not-the-territory point also deserves to be repeated in every Data Science classroom far more often.

That said, I still put more trust in quantitative analysis over qualitative. But let's be humble. A data set is not the world, it is a lens with which we view the world, and with it we see but through a glass darkly.

I'm afraid that overall this post makes me seem much more negative on AIES than I really am. Complaining is easier than complimenting. Sorry. I think this has been a good conference full of good people trying to do a good job. It was also a very friendly crowd, so as someone with a not-insignificant amount of social anxiety, thank you to all the attendees.


MalConv: Lessons learned from Deep Learning on executables

I don't usually write up my technical work here, mostly because I spend enough hours as is doing technical writing. But a co-author, Jon Barker, recently wrote a post on the NVIDIA Parallel For All blog about one of our papers on neural networks for detecting malware, so I thought I'd link to it here. (You can read the paper itself, "Malware Detection by Eating a Whole EXE" here.) Plus it was on the front page of Hacker News earlier this week, which is not something I thought would ever happen to my work.

Rather than rehashing everything in Jon's Parallel for All post about our work, I want to highlight some of the lessons we learned from doing this about ML/neural nets/deep learning.

By way of background, I'll lift a few paragraphs from Jon's introduction:

The paper introduces an artificial neural network trained to differentiate between benign and malicious Windows executable files with only the raw byte sequence of the executable as input. This approach has several practical advantages:

  • No hand-crafted features or knowledge of the compiler used are required. This means the trained model is generalizable and robust to natural variations in malware.
  • The computational complexity is linearly dependent on the sequence length (binary size), which means inference is fast and scalable to very large files.
  • Important sub-regions of the binary can be identified for forensic analysis.
  • This approach is also adaptable to new file formats, compilers and instruction set architectures—all we need is training data.

We also hope this paper demonstrates that malware detection from raw byte sequences has unique and challenging properties that make it a fruitful research area for the larger machine learning community.

One of the big issues we were confronting with our approach, MalConv, is that executables are often millions of bytes in length. That's orders of magnitude more time steps than most sequence processing networks deal with. Big data usually refers to lots and lots of small data points, but for us each individual sample was big. Saying this was a non-trivial problem is a serious understatement.

Architecture of the MalConv malware detection network. (Image copyright NVIDIA.)

Here are three lessons we learned, not about malware or cybersecurity, but about the process of building neural networks on such unusual data.

1. Deep learning != image processing

The large majority of the work in deep learning has been done in the image domain. Of the remainder, the large majority has been in either text or speech. Many of the lessons, best practices, rules of thumb, etc., that we think apply to deep learning may actually be specific to these domains.

For instance, the community has settled on narrow convolutional filters stacked with a lot of depth as generally the best way to go. And for images, narrow-and-deep absolutely seems to be the correct choice. But in order to get a network that processes two million time steps to fit in memory at all (on beefy 16GB cards, no less) we were forced to go wide-and-shallow.

With images, a pixel value is always a pixel value. 0x20 in a grayscale image is always darkish gray, no matter what. In an executable, byte values are ridiculously polysemous: 0x20 may be part of an instruction, a string, a bit array, a compressed or encrypted value, an address, etc. You can't interpolate between values at all, so you can't resize or crop the way you would with images to make your data set smaller or introduce data augmentation. Binaries also play havoc with locality, since you can re-arrange functions in any order, among other things. You can't rely on any Tobler's Law ((Everything is related, but near things are more related than far things.)) relationship the way you can in images, text, or speech.
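
To make "wide-and-shallow over raw bytes" concrete, here is a rough PyTorch sketch of that style of network. The layer sizes are illustrative stand-ins; the real hyperparameters (and the exact gating arrangement) are in the paper.

    # Rough sketch of a wide, shallow, gated convolution over raw bytes.
    import torch
    import torch.nn as nn

    class ByteConvNet(nn.Module):
        def __init__(self, embed_dim=8, n_filters=128, width=500, stride=500):
            super().__init__()
            # 256 byte values plus a padding token, embedded into a small space.
            self.embed = nn.Embedding(257, embed_dim, padding_idx=0)
            # One very wide, strided convolution instead of a deep stack of
            # narrow ones; this is what lets ~2M-step inputs fit in memory.
            self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=width, stride=stride)
            self.gate = nn.Conv1d(embed_dim, n_filters, kernel_size=width, stride=stride)
            self.fc = nn.Linear(n_filters, 1)

        def forward(self, x):                       # x: (batch, seq_len) byte ids
            z = self.embed(x).transpose(1, 2)       # (batch, embed_dim, seq_len)
            h = self.conv(z) * torch.sigmoid(self.gate(z))   # gated convolution
            h = torch.max(h, dim=2).values          # global max pool over time
            return self.fc(h)                       # malware logit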

2. BatchNorm isn't pixie dust

Batch Normalization has this bippity-boppity-boo magic quality. Just sprinkle it on top of your network architecture, and things that didn't converge before now do, and things that did converge now converge faster. It's worked like that every time I've tried it — on images. When we tried it on binaries it actually had the opposite effect: networks that converged slowly now didn't at all, no matter what variety of architecture we tried. It's also had no effect at all on some other esoteric data sets that I've worked on.

We discuss this at more length in the paper (§5.3), but here's the relevant figure:

KDE plots of the convolution response (pre-ReLU) for multiple architectures. Red and orange: two layers of ResNet; green: Inception-v4; blue: our network; black dashed: a true Gaussian distribution for reference.

This is showing the pre-BN activations from MalConv (blue), ResNet (red and orange), and Inception-v4 (green). The purpose of BatchNorm is to output values that follow a standard normal distribution, and it implicitly expects inputs that are already relatively close to that. What we suspect is happening is that the input values from other networks aren't Gaussian, but they're close-ish. ((I'd love to be able to quantify that closeness, but every test for normality I'm aware of doesn't apply when you have this many samples. If anyone knows of a more robust test please let me know.)) The input values for MalConv display huge asperity, and aren't even unimodal. If BatchNorm is being wonky for you, I'd suggest plotting the pre-BN activations and checking to see that they're relatively smooth and unimodal.
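
That check is only a few lines with a forward pre-hook. In this sketch, model, data_loader, and the layer name bn1 are stand-ins for whatever you are actually debugging:

    # Grab the inputs feeding a BatchNorm layer and eyeball their distribution.
    import torch
    import seaborn as sns
    import matplotlib.pyplot as plt

    captured = []

    def grab_input(module, inputs):
        # `inputs` is a tuple; stash a flattened copy of the pre-BN activations.
        captured.append(inputs[0].detach().flatten().cpu())

    handle = model.bn1.register_forward_pre_hook(grab_input)
    with torch.no_grad():
        for batch, _ in data_loader:        # a handful of batches is plenty
            model(batch)
            if len(captured) >= 10:
                break
    handle.remove()

    sns.kdeplot(torch.cat(captured).numpy())   # want: smooth-ish and unimodal
    plt.xlabel("pre-BatchNorm activation")
    plt.show()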

3. The Lump of Regularization Fallacy

If you're overfitting, you probably need more regularization. Simple advice, and easily executed. Every time I see this brought up, though, people treat regularization as if it's this monolithic thing. Implicitly, people are talking as if you have some pile of regularization, and if you need to fight overfitting then you just shovel more regularization on top. It doesn't matter what kind, just add more.

We ran into overfitting problems and tried every method we could think of: weight decay, dropout, regional dropout, gradient noise, activation noise, and on and on. The only one that had any impact was DeCov, which penalizes activations in the penultimate layer that are highly correlated with each other. I have no idea what will work on your data — especially if it's not images/speech/text — so try different types. Don't just treat regularization as a single knob that you crank up or down.
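
For reference, the DeCov penalty itself is tiny to implement: it is just the squared off-diagonal entries of the batch covariance of the activations. A sketch (the weight lam is a hypothetical hyperparameter you would tune):

    # DeCov penalty on a batch of penultimate-layer activations h: (batch, features).
    import torch

    def decov_loss(h):
        h_centered = h - h.mean(dim=0, keepdim=True)
        cov = h_centered.t() @ h_centered / h.shape[0]    # (features, features)
        frob_sq = (cov ** 2).sum()                        # all entries, squared
        diag_sq = (torch.diagonal(cov) ** 2).sum()        # minus the diagonal
        return 0.5 * (frob_sq - diag_sq)                  # off-diagonal covariance only

    # total_loss = task_loss + lam * decov_loss(penultimate_activations)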

I hope some of these lessons are helpful to you if you're into cybersecurity, or pushing machine learning into new domains in general. We'll be presenting the paper this is all based on at the Artificial Intelligence for Cyber Security (AICS) workshop at AAAI in February, so if you're at AAAI then stop by and talk.


AI's "one trick pony" has a hell of a trick

The MIT Technology Review has a recent article by James Somers about error backpropagation, "Is AI Riding a One-Trick Pony?" Overall, I agree with the message in the article. We need to keep thinking of new paradigms because the SotA right now is very useful, but not correct in any rigorous way. However, as much as I agree with the thesis, I think Somers oversells it, especially in the beginning of the piece. For instance, the introductory segment concludes:

When you boil it down, AI today is deep learning, and deep learning is backprop — which is amazing, considering that backprop is more than 30 years old. It’s worth understanding how that happened—how a technique could lie in wait for so long and then cause such an explosion — because once you understand the story of backprop, you’ll start to understand the current moment in AI, and in particular the fact that maybe we’re not actually at the beginning of a revolution. Maybe we’re at the end of one.

That's a bit like saying "When you boil it down, flight is airfoils, and airfoils are Bernoulli's principle — which is amazing, considering that Bernoulli's principle is almost 300 years old." I totally endorse the idea that we ought to understand backprop; I've spent a lot of effort in the last couple of months organizing training for some of my firm's senior leadership on neural networks, and EBP/gradient descent is the heart of my presentation. But I would be very, very careful about concluding that backprop is the entire show.
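
Since backprop really is just the chain rule plus gradient descent, the entire "trick" fits in a dozen lines of numpy. Here's a toy XOR example of my own (not anything from the article) for readers who have only ever heard the name:

    # Error backpropagation on a tiny two-layer network learning XOR.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros((1, 8))
    W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros((1, 1))
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for step in range(5000):
        # forward pass
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # backward pass: push the squared error back through each layer (chain rule)
        d_out = (p - y) * p * (1 - p)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # gradient descent update
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(np.round(p, 2))   # should head toward [0, 1, 1, 0]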

Backprop was also not "lying in wait." People have been working on it ever since it was introduced in 1986. The problem was that '86 was the height of the 2nd AI winter, which lasted another decade. Just as people should understand backprop to understand contemporary AI, they should also learn about the history of AI. Just because no one outside of CS (and precious few people in CS, for that matter) paid any attention to neural networks before 2015 doesn't mean they were completely dormant, only to spring up fully formed in some sort of intellectual Athenian birth.

I really don't want to be in the position of defending backprop. I took the trouble to write a dissertation about non-backprop neural nets for a reason, after all. ((That reason being, roughly put, that we're pretty sure the brain is not using backprop, and it seems ill-advised to ignore the mechanisms employed by the most intelligent thing we are aware of.)) But I also don't want to be in the position of letting sloppy arguments against neural nets go unremarked. That road leads to people mischaracterizing Minsky and Papert, abandoning neural nets for generations, and putting us epochs behind where we might have been. ((Plus sloppy arguments should be eschewed on the basis of the sloppiness alone, irrespective of their consequences.))


PS This is also worth a rejoinder:

Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.

That's not what thought is, that's how thought can be represented. Planets are not vectors, but their orbits can be profitably described that way, because "it behooves us to place the foundations of knowledge in mathematics." I'm sorry if that seems pedantic, but the distinction between a thing and its representation—besides giving semioticians something to talk about—underpins much of our interpretation of AI systems and cognitive science as well. Indeed, a huge chunk of data science work is figuring out the right representations. If you can get that, your problem is often largely solved. ((IIRC both Knuth and Torvalds have aphorisms to the effect that once you have chosen the correct data structures, the correct algorithms will naturally follow. I think AI and neuroscience are dealing with a lot of friction because we haven't been able to figure out the right representations/data structures. When we do, the right learning algorithms will follow much more easily.))

PPS This, on the other hand, I agree with entirely:

Deep learning in some ways mimics what goes on in the human brain, but only in a shallow way. … What we know about intelligence is nothing against the vastness of what we still don’t know.

What I fear is that people read that and conclude that artificial neural networks are built on a shallow foundation, so we should give up on them as being unreliable. A much better conclusion would be that we need to keep working and build better, deeper foundations.


National AI Strategy

Some of my co-workers published a sponsored piece in the Atlantic calling for a national AI strategy, which was tied in to some discussions at the Washington Ideas event.

I'm 100% on board with the US having a strategy, but I want to offer one caveat: "comprehensive national strategies" are susceptible to becoming top-down, centralized plans, which I think is dangerous.

I'm generally disinclined to centralized planning, for both efficiency and philosophical reasons. I'm not going to take the time now to explain why; I doubt anything I could scratch out here would shift people very much along any kind of Keynes-Hayek spectrum.

So why am I bothering to bring this up? Mostly because I think it would be especially ill-conceived to adopt central planning when it comes to AI. The recent progress in AI has been largely a result of abandoning top-down techniques in favor of bottom-up ones. We've abandoned hand-coded visual feature detectors for convolutional neural networks. We've abandoned human-engineered grammar models for statistical machine translation. In one discipline after another emergent behavior has outpaced decades worth of expert-designed techniques. To layer top-down policy-making on a field built of bottom-up science would be a waste, and an ironic one at that.


PS Having spoken to two of the three authors of this piece, I don't mean to imply that they support centralized planning of the AI industry. This is just something I would be on guard against.


Will AI steal our jobs?

As an AI researcher, I think I am required to have an opinion about this. Here's what I have to say to the various tribes.

AI-pessimists: please remember that the Luddites have been wrong about technology causing economic cataclysm every time so far. We're talking about several consecutive centuries of wrongness. ((I am aware of the work of Gregory Clark and others related to Industrial Revolution era wage and consumption stagnation. If a disaster requires complicated statistical models to provide evidence that it exists, I say its scale cannot have been that disastrous.)) Please revise your confidence estimates downwards.

AI-optimists: please remember that just because the pessimists have always been wrong in the past does not mean that they must always be wrong in the future. It is not a natural law that the optimists must be right. That labor markets have adapted in the long term does not mean that they must adapt, to say nothing of short-term dislocations. Please revise your confidence estimates downwards.

Everyone: many forms of technology are substitutes for labor. Many forms of technology are complements to labor. Often a single form of technology is both simultaneously. It is impossible to determine a priori which effect will dominate. ((Who correctly predicted that the introduction of ATMs would coincide with an increase in the employment of bank tellers? Anyone? Anyone? Bueller?)) This is true of everything from the mouldboard plough to a convolutional neural network. Don't casually assert that AI/ML/robots are qualitatively different. (For example, why does Bill Gates think we need a special tax on robots that is distinct from a tax on any other capital equipment?)

As always, please exercise cognitive and epistemic humility.
