The Next Step for Artificial Intelligence Is Machines that Get Smarter on Their Own

Deep learning enables computers to do a better job than humans at mastering skills and making decisions

1 June 2016

Have you ever used a voice-activated service such as Apple’s Siri only to find it completely missed what you were saying? Or played a game against a computer and felt it didn’t even put up a fight? That’s about to change with advances in deep learning, which improves computers’ ability to process information and make decisions—like people do, and oftentimes even better.

Deep-learning techniques allow a computer system to connect the dots from many different areas of knowledge, similar to how the human brain works, to make the best decision possible. Facebook, Google, Microsoft, and other tech companies are in a race to apply deep learning to make machines intelligent without much help from the programmers.

Take that voice-activation service, for example. Say a user with an accent is dictating a message amid background noise. A deep-learning machine could detect and process those factors to interpret what the person is saying. And this is just a starting point. Applications for deep learning are endless.

“Deep learning can be applied horizontally across many fields and applications,” says IEEE Member Rajan Goyal, a distinguished engineer at Cavium, in San Jose, Calif., who works on next-generation accelerators. He is exploring silicon designs for deep learning. “The power of deep learning is that it can solve many problems that have been impossible to solve to date.”


Deep learning is a relatively new form of artificial intelligence that gives an old technology—a neural network—a twist made possible by big data, supercomputing, and advanced algorithms. Each neuron in the network passes data along weighted connections to its neighbors, so the neurons communicate with one another.

It would be impossible to write code for an unlimited number of situations. And without correct code, a machine would not know what to do. With deep learning, however, the system is able to figure things out on its own. The technique lets the network form neural relationships most relevant to each new situation. First, however, the machine needs to learn how to learn.
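To make the idea of communicating neurons concrete, here is a minimal sketch of a forward pass through a tiny two-layer network. The weights, layer sizes, and function names are invented for illustration; real deep-learning systems use far larger networks and learned weights.

```python
import math

def sigmoid(z):
    # Squashing function each neuron applies to its weighted sum
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, output_weights):
    # Each hidden neuron computes a weighted sum of the inputs
    # and squashes it into the range (0, 1)...
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in hidden_weights]
    # ...and the output neuron combines the hidden activations the same way.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Made-up weights for a 2-input, 2-hidden-neuron, 1-output network
output = forward([1.0, 0.5],
                 hidden_weights=[[0.4, -0.6], [0.3, 0.8]],
                 output_weights=[1.2, -0.7])
print(output)  # a value between 0 and 1
```

In a trained system, those weight values are not hand-picked; they are adjusted automatically as the network learns, which is what the next paragraphs describe.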

One of the first deep-learning exercises was carried out by Yann LeCun, director of AI research at Facebook. LeCun taught a computer system how to recognize the differences between images of dogs and cats. When the system chose incorrectly, he would correct it until the program figured out why it was wrong, such as failing to consider the shape of a nose or an ear. Eventually, the machine started to distinguish a cat from a dog nearly every time.

“Just like human beings grow up learning about the environment around them and reacting to it, machines can also learn complex tasks this way,” Goyal says. Basically, the machines learn from their own mistakes.
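The "learn from mistakes" loop can be sketched with a classic perceptron update rule: the weights change only when the machine guesses wrong. The features, labels, and numbers below are invented stand-ins for the cat/dog example; LeCun's actual system was a convolutional neural network, not this toy classifier.

```python
# Each example: (ear_pointiness, nose_length), label 1 = cat, 0 = dog.
# Feature values are made up for illustration.
data = [((0.9, 0.2), 1), ((0.8, 0.3), 1),
        ((0.2, 0.9), 0), ((0.1, 0.8), 0)]

w = [0.0, 0.0]  # connection weights, initially knowing nothing
b = 0.0         # bias term

for _ in range(20):  # a few passes over the examples
    for (x1, x2), label in data:
        guess = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        error = label - guess  # nonzero only when the guess was wrong
        # The "teacher" step: nudge the weights only on mistakes
        w[0] += error * x1
        w[1] += error * x2
        b += error
```

After training, the classifier labels all four toy examples correctly; the point is that every correction came from its own errors, not from hand-written rules.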

Now add millions of inputs beyond those for cats and dogs, and you start to have a highly intelligent system. Because tomorrow’s intelligent systems will have so much knowledge, they aren’t likely to make the mistake of reading a word incorrectly just because the handwriting is poor or a letter is missing. Instead, the machine would, for example, consider the context of the word, such as how it is written, and on what medium—on a billboard, say, or in a newspaper—as well as what words or images surround it.

“Instead of teaching the machine everything, let it teach itself,” Goyal says. “By creating a system that can learn on its own, the time to develop it is drastically reduced.”

The more an intelligent machine knows, the faster it can pick up new information. Eventually, humans might not be able to teach it much at all. One of the latest examples is the AlphaGo program, which defeated Go champion Lee Sedol. Go, an ancient board game renowned for its intuitive strategy, has more possible board configurations than there are atoms in the observable universe.
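That "more configurations than atoms" claim can be checked with a quick back-of-envelope computation: each of the 361 intersections on a 19-by-19 Go board can be empty, black, or white, giving 3^361 configurations (an upper bound, since not all are legal positions), versus a commonly cited estimate of roughly 10^80 atoms in the observable universe.

```python
board_points = 19 * 19           # 361 intersections on a Go board
configs = 3 ** board_points      # empty, black, or white at each point (upper bound)
atoms = 10 ** 80                 # common estimate for the observable universe

print(configs > atoms)           # True
print(len(str(configs)))         # 173 digits, i.e. roughly 10**172
```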

In response to the victory, Demis Hassabis, cofounder of DeepMind, the company that developed AlphaGo and has since been acquired by Google, said it demonstrates that AI could be used to solve problems that confound humans.

And then there’s Google Brain. For each image of a location collected for Google Maps, a team of employees had the tedious task of clicking yes or no for whether a photo was of an actual address and not, for example, of an empty stretch of street or unoccupied woods. Then the company’s engineers trained its computer system to handle the task. Using deep-learning image recognition, Google’s machines were able to match street addresses to all the homes and buildings in France in less than an hour.

IEEE Fellow Li Deng, chief scientist of AI at Microsoft Applications and Services Group and research manager at Microsoft Research, pioneered research and applications in deep learning speech recognition. With colleagues at Microsoft Research, he explored multimodal intelligence involving images and natural language for computers to communicate like humans. Deng received the IEEE Signal Processing Society’s 2015 Technical Achievement Award for contributions to deep learning and to automatic speech recognition.

When a deep-learning system views an image or video, it can describe what it sees. The system identifies visual cues—such as woman, camera, flowers, and purple—then uses natural-language models to generate many possible sentences describing the scene. The system can then quickly determine what it understands to be the most sensible description: A woman is taking photos of purple flowers.
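The two-stage pipeline described above—detect visual cues, then pick the most sensible sentence—can be sketched as a toy. Everything here is invented for illustration: the real Microsoft system uses learned vision and language models, not a hand-written word-overlap score.

```python
# Cues a vision model might emit for the photo in the example
detected = {"woman", "camera", "flowers", "purple"}

# Candidate descriptions a language model might propose (hand-written here)
candidates = [
    "A woman is taking photos of purple flowers.",
    "A man is riding a bicycle.",
    "Flowers in a vase on a table.",
]

def score(sentence):
    # Crude stand-in for a learned ranker: count how many
    # detected cues the sentence mentions
    words = set(sentence.lower().strip(".").split())
    return len(words & detected)

best = max(candidates, key=score)
print(best)  # -> "A woman is taking photos of purple flowers."
```

The first candidate mentions three of the four detected cues, so it wins; a production system would instead score candidates with a trained natural-language model.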

The process works in reverse, too. If you type in a descriptive sentence, the machine can bring up the most relevant media it finds on the Web.

That ability to understand an image’s content and express it in natural language, Deng says, will be useful for a wide variety of applications. At a Microsoft conference on 30 March, a blind software developer showcased how he was able to “see” using a deep learning–enabled headset. The user tapped a button on the headset to take a snapshot of the scene in front of him, and the system explained what was in the photo. The system could even describe facial expressions and detect if a person in the image looked happy or confused, for example.


Intelligent machines will be more precise than humans at picking up subtle cues, such as telling fake smiles from real ones, a distinction people often find difficult to make, Goyal says. Deep learning also will enable machines to predict a person’s needs, Deng adds. For example, in the future when you text a friend that you plan to see a movie, Uber or a similar car service could automatically be dispatched to pick you up if you do not have your own car. Intelligent systems, he says, will be able to make decisions far more accurately and faster than humans can, and we’re already starting to see that happen.

This article appears in the June 2016 print issue as “Mastering Deep Learning.”

This article is part of our June 2016 special issue on artificial intelligence.
