Video game controllers, as well as the consoles they’re used with, have come a long way since the basic two-button controller of the Nintendo Entertainment System in the 1980s. As the years passed, more and more buttons were added to accommodate increasingly complex games. And in 2006, Nintendo added a revolutionary feature—motion control—to the controller for its new console, the Wii. The technology lets users move on-screen characters, called avatars, by pointing the remote control at the screen and moving it in any direction.
Last November, Microsoft took controllers to a totally new level by rethinking what—or who—a controller is. With its new Kinect, a Web cam–style add-on for the Xbox 360 console, the player becomes the controller. Relying on a variety of technologies, including a 3-D camera, depth sensors, and real-time motion tracking, Kinect lets users control their on-screen avatars simply by moving their bodies.
Kinect wouldn’t have been possible without the help of IEEE Fellow Andrew Blake and his team at Microsoft Research Cambridge, in the United Kingdom, Microsoft’s flagship research lab in Europe. Blake is the managing director there. The lab came up with one of the breakthroughs that lets Kinect track a person without the person having to wear sensors—something researchers in machine learning had been working on for two decades. Machine learning is a branch of artificial intelligence that focuses on algorithms that allow computers to mimic human behavior based on empirical data.
Microsoft’s Xbox group in Redmond, Wash., which had been developing Kinect for several years, enlisted Blake’s help in 2008. Redmond had come up with a rough prototype for tracking a player in real time, but the system had encountered problems. It relied on a computer graphics program to construct an avatar, which the program would then constantly adjust to match images of the player taken by Kinect’s 3-D Web camera as he or she moved.
But the method had flaws. The system would lose track of a person after a short time, could generally track only someone of about the same size and shape as the avatar, and couldn’t process rapid movements. Blake and his researchers were asked to help fix it.
Help came in the form of machine learning. “The Xbox team had used a computer graphics approach, but they didn’t know about machine learning,” notes Blake. “The researchers at our lab are expert in this area.”
Blake himself had been working for years on real-time motion tracking techniques using machine learning. He and Microsoft researcher Kentaro Toyama published a paper in 2001, “Probabilistic Tracking in a Metric Space,” describing a novel approach that assigned a probabilistic likelihood that each type of movement would lead to another specific type of movement. This data was fed into a computer program that automatically calculated the most likely next move. This was a breakthrough for machine learning, but on its own it wasn’t enough to fix Kinect.
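The core idea of assigning likelihoods to movement transitions can be sketched as a toy Markov-style model. This is only an illustration of the concept; the movement states and probabilities below are invented, not taken from the 2001 paper:

```python
# Toy model of movement transitions: each state is a coarse movement
# type, and the numbers encode how likely one movement is to follow
# another. (States and probabilities are invented for illustration.)
transitions = {
    "still": {"still": 0.7, "walk": 0.2, "reach": 0.1},
    "walk":  {"still": 0.3, "walk": 0.6, "reach": 0.1},
    "reach": {"still": 0.5, "walk": 0.1, "reach": 0.4},
}

def most_likely_next(state):
    """Return the movement the model predicts is most likely to follow."""
    return max(transitions[state], key=transitions[state].get)

print(most_likely_next("walk"))   # the model favors continuing to walk
print(most_likely_next("reach"))  # after a reach, returning to rest is likeliest
```

A real tracker would combine such transition priors with image evidence at every frame rather than simply taking the single most likely next move.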
“We still had a long way to go in developing something that would work reliably,” Blake says.
MACHINE LEARNING BREAKTHROUGH
Jamie Shotton, a Cambridge researcher, proposed a solution. He suggested teaching the machine learning system to distinguish one part of a person’s body from another.
“I have to say that when he first proposed it, I never thought it would work,” Blake says. But Shotton had worked extensively on computer recognition. For his Ph.D. thesis, he had trained a computer to distinguish cows, grass, and other elements in a countryside photo by studying it pixel by pixel. So Shotton went ahead and tried the recognition approach on the tracking problem.
It took Shotton only about three months to demonstrate that his method would work. Over the next few months, Blake’s team collaborated with the Xbox group’s engineers on an extensive machine learning project. The machine learning system was “taught” to recognize people of all shapes and sizes and in many different poses. To do this, the researchers uploaded more than a million images of different people in different positions. To teach the system how to recognize body parts, the team used a computer graphics algorithm to render color-coded images representing the different body parts. They eventually created a machine learning algorithm that could analyze each pixel in an image and determine which body part it belonged to.
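The spirit of per-pixel labeling can be sketched in a few lines: each pixel of a depth image is assigned a body-part label from simple depth comparisons with nearby pixels. Everything here—the features, thresholds, and labels—is invented for illustration; the real system learned a far richer classifier from over a million training images rather than using hand-written rules:

```python
# Toy per-pixel body-part classifier. Depth values are in meters;
# out-of-bounds pixels are treated as infinitely far away (background).
def depth(image, x, y):
    """Depth at (x, y), with out-of-bounds pixels counted as background."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return float("inf")

def classify_pixel(image, x, y):
    """Label one pixel from depth-difference features, a hand-written
    stand-in for one path through a learned classifier."""
    d = depth(image, x, y)
    if d > 3.0:                           # farther than 3 m: background
        return "background"
    above = depth(image, x, y - 1) - d    # depth-difference probes at
    below = depth(image, x, y + 1) - d    # offsets from the pixel
    if above > 1.0:
        return "head"                     # only background above
    if below > 1.0:
        return "foot"                     # only background below
    return "torso"

# A one-pixel-wide 'person' standing 2 m from the camera:
image = [[2.0], [2.0], [2.0]]
labels = [classify_pixel(image, 0, y) for y in range(3)]
print(labels)  # → ['head', 'torso', 'foot']
```

In the learned version, the offsets and thresholds in `classify_pixel` are exactly what training discovers automatically from the labeled images, instead of being written by hand.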
Kinect’s 3-D camera, developed by the Xbox team, was crucial to the solution, according to Blake. In his group’s previous work on real-time motion capture, an ordinary Web cam was used. “It was a heavy load to put on a computer because it could not easily separate an object or person from the background,” Blake says. “The 3-D camera relieved that load.”
“It made the machine learning system capable of recognizing depth,” he continues. This meant that it could better recognize body parts, as well as track people both close to and far from the camera.
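The advantage of depth over an ordinary webcam can be shown in miniature: with a depth map, separating the player from the background reduces to a single threshold, a cue an RGB image simply doesn’t carry. The values and the 3-meter cutoff below are invented for illustration:

```python
# Toy depth map in meters; the two 'player' pixels are nearer than
# the background wall. (All numbers are illustrative.)
depth_map = [
    [4.0, 4.0, 4.0],
    [4.0, 1.8, 4.0],
    [4.0, 1.9, 4.0],
]
PLAYER_MAX_DEPTH = 3.0  # assumed play-space limit

# Foreground mask: True where a pixel is near enough to be the player.
mask = [[d < PLAYER_MAX_DEPTH for d in row] for row in depth_map]
print(mask)
```

Doing the same separation from color alone requires modeling the background’s appearance, which is the “heavy load” Blake describes.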
When a player steps in front of the 3-D camera, Kinect automatically begins to classify the image of the person—pixel by pixel—based on rules learned from the million or so images it has seen. Kinect classifies a player’s limbs independently on each frame from the camera so that it doesn’t lose track, and it doesn’t matter how rapidly the player moves. Kinect is also capable, using further techniques developed by the Xbox team, of recognizing more than one person, making two-player games possible.
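Why per-frame classification can’t “lose track” is worth spelling out: since no state is carried between frames, an error in one frame cannot corrupt the next, and arbitrarily fast motion between frames is harmless. A minimal sketch, with `classify_frame` as an invented stand-in for the per-pixel labeling step:

```python
# Stateless per-frame labeling: each depth frame (values in meters)
# is classified from scratch, with no tracking state between frames.
def classify_frame(depth_frame):
    """Invented stand-in for per-pixel labeling: 'player' if nearer than 3 m."""
    return [["player" if d < 3.0 else "background" for d in row]
            for row in depth_frame]

# Three frames in which the player 'teleports' across the image --
# an abrupt jump that would break a frame-to-frame tracker:
frames = [
    [[2.0, 9.9, 9.9]],
    [[9.9, 9.9, 2.0]],
    [[9.9, 2.0, 9.9]],
]
labels_per_frame = [classify_frame(f) for f in frames]
for lf in labels_per_frame:
    print(lf)
```

A tracker that adjusted the previous frame’s estimate, like the early Xbox prototype, would have to search for the player after such a jump; the stateless classifier simply relabels every pixel.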
“The result is really quite revolutionary,” Blake says. “It took a lot of collaboration between our group and the Xbox engineers to make this happen. Even three or four years ago you could only dream about something like Kinect.”
Games that have been developed for Kinect include a variety of dance, exercise, and sports games, such as bowling, volleyball, and running. And with more than 2.5 million Kinects sold since launch, Blake is excited about the future. “I think we are pushing the boundaries of gaming and broadening its appeal,” he says. “Even people who may not consider themselves gamers can have a good time playing Kinect.”