There are lots of people who are convinced that we’re a few short years away from economic apocalypse as robots take over all of our jobs. Here’s an example from Business Insider:
Top computer scientists in the US warned over the weekend that the rise of artificial intelligence (AI) and robots in the workplace could cause mass unemployment and dislocated economies
These fears are based on hype. The capabilities of real-world AI systems fall far, far short of what non-experts expect of them, and the gap between where we are and where people expect us to be is vast and–in the short term, at least–mostly insurmountable.
Let’s take a look at where the hype comes from, why it’s wrong, and what to expect instead. For starters, we’ll take all those voice-controlled devices (Alexa, Siri, Google Assistant) and put them in their proper context.
Voice Controls Are a Misleading Gimmick
Prologue: Voice Controls are Frustrating
A little while back I was changing my daughter’s diaper and thought, hey: my hands are occupied but I’d like to listen to my audiobook. I said, “Alexa, resume playing Ghost Rider on Audible.” Sure enough: Alexa not only started playing my audiobook, but the track picked up exactly where I’d left off on my iPhone a few hours previously. Neat!
There was one problem: I listen to my audiobooks at double speed, and Alexa was playing it at normal speed. So I said, “Alexa, double playback speed.” Uh-oh. Not only did Alexa not increase the playback speed, but it did that annoying thing where it starts prattling on endlessly about irrelevant search results that have nothing to do with your request. I tried five or six different varieties of the command and none of them worked, so I finally said, “Alexa, shut up.”
This is my most common command to Alexa. And also Siri. And also the Google Assistant. I hate them all.
They’re supposed to make life easier but, as a general rule, they do the exact opposite. When we got our new TV I connected it to Alexa because: why not? It was kind of neat to turn it on with a voice command, but it wasn’t that useful: voice commands didn’t work for things like switching video inputs, so you still had to find the remote anyway, and the command to turn the TV off never worked, even when the volume was pretty low.
Then one day the TV stopped working with Alexa. Why? Who knows. I have half-heartedly tried to fix it six or seven times over the last year to no avail. I spent more time setting up and unsuccessfully debugging the connection than I ever saved.
This isn’t a one-off exception; it’s the rule. Same thing happened with a security camera I use as a baby monitor. For a few weeks it worked with Alexa until it didn’t. I got that one working again, but then it broke again and I gave up. Watching on the Alexa screen wasn’t ever really more useful than watching on my phone anyway.
So what’s up? Why is all this nifty voice-activated stuff so disappointing?
If you’re like me, you were probably really excited by all this voice-activation stuff when it first started to come out because it reminded you of Star Trek: The Next Generation. And if you’re like me, you also got really annoyed and jaded after actually trying to use some of this stuff when you realized it’s all basically an inconvenient, expensive, privacy-smashing gimmick.
Before we get into that, let me give y’all one absolutely vital caveat. The one true and good application of voice control technology is accessibility. For folks who are blind or can’t use keyboards or mice or other standard input devices, this technology is not a gimmick at all. It’s potentially life-transforming. I don’t want any of my cynicism to take away from that really, really important exception.
But that’s not how this stuff is being packaged and marketed to the broader audience, and it’s that–the explicit and implicit promises, and all the predictions people make based on them–that I want to address.
CLI vs. GUI
To put voice command gimmicks in their proper context, you have to go back to the beginning of popular user interfaces, and the first of those was the CLI: Command Line Interface. A CLI is a screen, a keyboard, and a system that allows you to type commands and see feedback. If you’re tech savvy then you’ve used the command line (AKA terminal) on Mac or Unix machines. If you’re not, then you’ve probably still seen the Windows command prompt at some point. All of these are different kinds of CLI.
In the early days of the PC (note: I’m not going back to the ancient days of punch cards, etc.) the CLI was all you had. Eventually this changed with the advent of the GUI: graphical user interface.
The GUI required new technology (the mouse), better hardware (to handle the graphics) and also a whole new way of thinking about the user interaction with the computer. Instead of thinking about commands, the GUI emphasizes objects. In particular, the GUI has used a kind of visual metaphor from the very beginning. The most common of these are icons, but it goes deeper than that. Buttons to click, a “desktop” as a flat surface to organize things, etc.
Even though you can actually do a lot of the same things in either a CLI or a GUI (like moving or renaming files), the whole interaction paradigm is different. You have concepts like clicking, double-clicking, right-clicking, dragging-and-dropping in the GUI that just don’t have any analog in the CLI.
It’s easy to think of the GUI as superior to the CLI since it came later and is what most people use most of the time, but that’s not really the case. Some things are much better suited to a GUI, including some really obvious ones like photo and video editing. But there are still plenty of tasks that make more sense in a CLI, especially related to installing and maintaining computer systems.
The biggest difference between a GUI and a CLI is feedback. When you interact with a GUI you get constant, immediate feedback to all of your actions. This in turn aids in discoverability. What this means is that you really don’t need much training to use a GUI. By moving the mouse around on the screen, you can fairly easily see what commands are available, for example. This means you don’t need to memorize how to execute tasks in a GUI. You can memorize the shortcuts for copy and paste, but you can also click on “Edit” and find them there. (And if you forget they’re under the edit menu, you can click File, View, etc. until you find them.)
The feedback and discoverability of the GUI is what has made it the dominant interaction paradigm. It’s much easier to get started and much more forgiving of memory lapses.
Enter the VUI
When you see commercials of attractive, well-dressed people interacting with voice assistants, the most impressive thing is that they use normal-sounding commands. The interactions sound conversational. This is what sets the (false) expectation that interacting with Siri is going to be like interacting with the computer on board the Enterprise (NCC-1701-D). That way lies frustration and madness, however. A better way to think of voice control is as a third user interface paradigm: the VUI, or voice user interface.
There is one really cool aspect of a VUI, and that’s the ability of the computer to transcribe spoken words to written text. That’s the magic.
However, once you account for that you realize that the rest of the VUI experience is basically a CLI… without a screen. Which means: without feedback and discoverability.
Those two traits that make the GUI so successful for everyday life are conspicuously absent from a VUI. Just like when interacting with a CLI, using a VUI successfully means that you have to memorize a bunch of commands and then invoke them just so. There is a little more leeway with a VUI than a CLI, but not much. And that leeway is at least partially offset by the fact that when you type in a command at the terminal, you can pause and re-read it to see if you got it all right before you hit enter and commit. You can’t even do that with a VUI. Once you open your mouth and start talking, your commands are being executed (or, more often than not: failing to execute) on the fly.
This is all bad enough, but in addition to basically being 1970s tech (except for the transcription part), the VUI faces the additional hurdle of being held up against an unrealistic expectation because it sounds like natural speech.
No one sits down in front of a terminal window and expects to be able to type in a sentence or two of plain English and get the computer to do their bidding. Here I am, asking Bash what time it is. It doesn’t go well:
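A minimal sketch of that exchange, driving bash from Python so the results are captured (exit code 127 is bash’s code for “command not found”):

```python
import subprocess

# Hand bash a plain-English question, exactly as you'd say it to Siri.
result = subprocess.run(
    ["bash", "-c", "What time is it?"],
    capture_output=True, text=True,
)
print(result.returncode)       # 127: bash's "command not found"
print(result.stderr.strip())   # bash complains about the word "What"

# The memorized incantation bash actually understands:
result = subprocess.run(
    ["bash", "-c", "date +%H:%M"],
    capture_output=True, text=True,
)
print(result.returncode)       # 0: success
```

Bash doesn’t try to interpret your intent at all; it just looks for a program named `What`, fails to find one, and gives up.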
Even non-technical folks understand that you have to have a whole skillset to be able to interact with a computer using the CLI. That’s why the command line is so intimidating for so many folks.
But the thing is, if you ask Siri (or whatever), “What time is it?” you’ll get an answer. This gives the impression that–unlike a CLI–interacting with a VUI won’t require any special training. Which is to say: that a VUI is intelligent enough to understand you.
It’s not, and it doesn’t.
A VUI is much closer to a CLI than a GUI, and our expectations for it should be set at the 1970s level instead of, like with a GUI, more around the 1990s. Aside from the transcription side of things, and with a few exceptions for special cases, a VUI is a big step backwards in usability.
AI vs. Machine Learning
Machine Learning Algorithms are Glorified Excel Trendlines
When we zoom out to get a larger view of the tech landscape, we find basically the same thing: mismatched expectations and gimmicks that can fool people into thinking our technology is much more advanced than it really is.
As one example of this, consider the field of machine learning, which is yet another giant buzzword. Ostensibly, machine learning is a subset of artificial intelligence (the Grand High Tech Buzzword). Specifically, it’s the part related to learning.
This is another misleading concept, though. The word “learning” carries an awful lot of hidden baggage. A better way to think of machine learning is just: statistics.
If you’ve worked with Excel at all, you probably know that you can insert trendlines into charts. Without going into too much detail, an Excel trendline is an application of the simplest and most commonly used form of statistical analysis: ordinary least-squares regression. There are tons of guides out there to explain the concept to you; my point is just that nobody thinks the ability to click “show trendline” on an Excel chart means the computer is “learning” anything. There’s no “artificial intelligence” at play here, just a fairly simple set of steps to solve a minimization problem.
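To see how un-magical this is, here’s the entire trendline computation as a sketch, using made-up data and the closed-form least-squares solution–no learning loop, just arithmetic:

```python
# Ordinary least-squares fit: the same math behind Excel's "show trendline".
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up data, roughly y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form solution to the minimization problem: the slope and intercept
# that minimize the sum of squared vertical distances to the line.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(f"y = {slope:.2f}x + {intercept:.2f}")   # y = 1.99x + 0.05
```

That’s it. No training, no intelligence: two averages, two sums, and a division.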
Although the bundle of algorithms available to data scientists doing machine learning is much broader and more interesting, they’re the same kind of thing. Random forests, support vector machines, naive Bayesian classifiers: they’re all optimization problems, fundamentally the same as OLS regression (or other, slightly fancier statistical techniques like logistic regression).
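As an illustration of “it’s all just minimizing a loss,” here is logistic regression–one of the fancier techniques just mentioned–as a toy sketch with invented one-dimensional data and plain gradient descent. All the “learning” is the loop that nudges two numbers downhill:

```python
import math

# Toy 1-D data: label is 1 when x is above ~2.5, else 0.
xs = [0.5, 1.0, 2.0, 3.0, 4.0, 4.5]
ys = [0,   0,   0,   1,   1,   1]

w, b = 0.0, 0.0   # model parameters
lr = 0.5          # learning rate

for _ in range(2000):   # "learning" = iteratively reducing the loss
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(w * x + b)))   # sigmoid prediction
        grad_w += (p - y) * x                  # gradient of the log-loss
        grad_b += (p - y)
    w -= lr * grad_w / len(xs)
    b -= lr * grad_b / len(xs)

# The fitted model separates the two classes around x = 2.5.
predict = lambda x: 1 / (1 + math.exp(-(w * x + b)))
print(predict(1.0) < 0.5, predict(4.0) > 0.5)   # True True
```

Swap in a different loss function and a cleverer way of searching for the minimum and you have most of the machine learning toolbox. It’s optimization all the way down.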
As with voice-controlled devices, you’ll understand the underlying tech a lot better if you replace the cool, fancy mental model (the Enterprise’s computer) with a much more realistic one (a command prompt). Same thing here. Don’t believe the machine learning hype. We’re talking about adding trendlines to Excel charts. Yes, it’s fancier than that, but that example will give you the right intuition about the kind of activity that’s going on.
Last thing: don’t take all this as me knocking machine learning. I love me some machine learning. No, really, I do. As statistical tools the algorithms are great, and certainly much more capable than an Excel trendline. This is just about getting your intuition a little more in line with what they are in a philosophical sense.
Are Robots Coming to Take Your Job?
So we’ve laid some groundwork by explaining how voice control services and machine learning aren’t as cool as the hype would lead you to believe. Now it’s time to get to the main event and address the questions I started this post with: are we on the cusp of real AI that can replace you and take your job?
You could definitely be forgiven for thinking the answer is an obvious “yes”. After all, it was a really big deal when Deep Blue beat Garry Kasparov in 1997, and since then there’s been a litany of John Henry moments. So-called AI has won at Go and Jeopardy, for example. Impressive, right? Not really.
First, let me ask you this. If someone said that a computer beat the reigning world champion of competitive memorization… would you care? Like, at all?
Because yes, competitive memorization (aka memory sport) is a thing. Players compete to see how fast they can memorize the sequence of a randomly shuffled deck of cards, for example. Thirteen seconds is a really good time. If someone bothered to build a computer to beat that (something any tinkerer could do in a long weekend with no more specialized equipment than a smartphone) we wouldn’t be impressed. We’d yawn.
Memorizing the order of a deck of cards is a few bytes of data. Not really impressive for computers that store data by the terabyte and measure read and write speeds in gigabytes per second. Even the visual recognition part–while certainly tougher–is basically a solved problem.
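To put a number on “a few bytes”: a shuffled deck is just the numbers 0 through 51 in some order, so the whole “memory feat” is a 52-byte copy. A minimal sketch:

```python
import random
import time

# A deck is just the numbers 0-51 in some order; 52 bytes covers it.
deck = list(range(52))
random.shuffle(deck)

start = time.perf_counter()
memorized = bytes(deck)   # the entire "memorization": one 52-byte copy
elapsed = time.perf_counter() - start

print(len(memorized))            # 52
print(list(memorized) == deck)   # True: perfect recall, every time
print(f"memorized in {elapsed * 1e6:.1f} microseconds")
```

Thirteen seconds is a world-class human time; the computer does it in a rounding error.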
With a game like chess–where the rules are perfectly deterministic and the playspace is limited–it’s just not surprising or interesting for computers to beat humans. In one important sense of the word, chess is just a grandiose version of Tic-Tac-Toe. What I mean is that there are only a finite number of moves to make in either Tic-Tac-Toe or chess. The number of moves in Tic-Tac-Toe is very small, and so it is an easily solved game. That’s the basic plot of WarGames, and the reason nobody enjoys playing Tic-Tac-Toe after they learn the optimal strategy when they’re like seven years old.

Chess is not solved yet, but only because the number of possible games is vastly larger; in principle it’s the same kind of brute-force problem. Given all this, it’s not surprising that computers do well at chess: it’s the kind of thing computers are good at. Just like memorization is the kind of thing computers are good at.
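Tic-Tac-Toe is small enough that a complete brute-force search fits in a screenful of code. This sketch of a minimax solver checks every line of play from the empty board and confirms the famous result: with perfect play on both sides, the game is always a draw (score 0):

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, indexed 0-8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def solve(board, player):
    """Score with perfect play: +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0   # board full: draw
    scores = []
    for i, cell in enumerate(board):
        if cell == ".":
            nxt = board[:i] + player + board[i + 1:]
            scores.append(solve(nxt, "O" if player == "X" else "X"))
    return max(scores) if player == "X" else min(scores)

# Brute-force every possible game from the empty board:
print(solve("." * 9, "X"))   # 0: perfect play is always a draw
```

That’s the entire game, exhausted by a laptop in well under a second. Chess has the same structure, just with an astronomically bigger tree.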
Now, the success of computers at playing Go is much more impressive. This is a case where the one aspect of artificial intelligence with any genuine promise–machine learning–really comes to the fore. Machine learning is overhyped, but it’s not just hyped.
On top of successfully learning to play Go better than a human, machine learning was also used to dramatically increase the power of automated language translation. So there’s some exciting stuff happening here, but Go is still a nice, clean system with orderly rules that is amenable to automation in ways that real life–or even other games, like Starcraft–are not.
So let’s talk about Starcraft for a moment. I recently read an article that does a great job of providing a real-life example. It’s a PC Magazine article about the controversy over an AI that managed to defeat top-ranked human players in Starcraft II. Basically, a team created an AI (AlphaStar) to beat world-class Starcraft II players. Since Starcraft is a much more complex game (dozens of unit types, real-time interaction, etc.) this sounds really impressive. The problem is: they cheated.
When a human plays Starcraft, part of what they’re doing is looking at the screen and interpreting what they see. This is hard. So AlphaStar skipped it. Instead of building a system that could point a camera at the screen and use visual recognition to identify the units and terrain, the team (1) built AlphaStar to play on only one map, over and over again, so the terrain never changed, and (2) tapped directly into Starcraft’s game data to get the exact location of every unit. Not only does this bypass the tricky visual-interpretation problem, it also meant that AlphaStar always knew where every single unit was at every single point in time (while human players can only see what’s on the screen and have to scroll around the map).
You could argue that Deep Blue didn’t use visual recognition either: the moves were fed into the computer directly. The difference is that this gave Deep Blue no information a human player lacks–in chess, the whole board is visible to both sides at all times–so the playing field was even. Not so with AlphaStar.
That’s why the “victory” of AlphaStar over world-class Starcraft players was so controversial. The deck was stacked. The AI could see the entire map at once (something the game itself forbids, not just a limit of human capacity), and it won only by playing on one map over and over again. If you had moved AlphaStar to a different map, world-class players could have beaten it easily. Practically anyone could have.
So here’s the common theme between voice commands and AlphaStar: as soon as you take one step off the beaten path, they break. Just like a CLI, a VUI (like Alexa or Siri) breaks as soon as you enter a command it doesn’t perfectly expect. And AlphaStar goes from world-class pro to bumbling child if you swap from a map it’s been trained on to one it hasn’t.
The thing to realize is that this limitation isn’t just about how these programs perform today. It’s about the fundamental expectations we should have for them going forward.
Easy Problems and Hard Problems
This leads me to the underlying reason for all the hype around AI. It’s very, very difficult for non-experts to tell the difference between problems that are trivial and problems that are basically impossible.
For a good overview of the concept, check out Range by David Epstein. He breaks the world into “kind problems” and “wicked problems”. Kind problems are problems like chess or playing Starcraft again and again on the same level with direct access to unit location. Wicked problems are problems like winning a live debate or playing Starcraft on a level you’ve never seen before, maybe with some new units added in for good measure.
If your job involves kind problems–if it’s repeatable, with simple rules for success and failure–then a robot might steal your job. But if your job involves wicked problems–if you have to figure out a new approach to a novel situation on a regular basis–then your job is safe now and for the foreseeable future.
This doesn’t mean nobody should be worried. The story of technological progress has largely been one of automation. We used to need 95% or more of the human population to grow food just so we’d have enough to eat. Thanks to automation and labor-augmentation, that proportion is down to the single digits. Every job other than subsistence farming exists because advances in farming technology (and other labor-saving advances) freed people up to do something else. In the long run: that’s great!
In the short run, it can be traumatic both individually and collectively. If you’ve invested decades of your life getting good at one of the tasks that robots can do, then it’s devastating to suddenly be told your skills–all that effort and expertise–are obsolete. And when this happens to large numbers of people, the result is societal instability.
So it’s not that the problem doesn’t exist. It’s more that it’s not a new problem, and it’s one we should manage as opposed to “solve”. The reason is that the only way to solve the problem would be to halt forward progress. And unless you think going back to subsistence farming or hunting and gathering sounds like a good idea (and nobody really believes that, no matter what they say), we should look forward with optimism to the future developments that will free up more and more of our time and energy for work that isn’t automatable.
But we do need to manage that progress to mitigate the personal and social costs of modernization. Because there are costs, and even if they are ultimately outweighed by the benefits, that doesn’t mean they just disappear.