Computer vision, a field concerned with processing, understanding, and analyzing data from still images and video, gives artificial-intelligence systems a better means of understanding and interacting with the physical world.
“Any time we can squeeze a camera on, say, an unmanned aerial vehicle, computer vision has the potential to provide intelligence and automation,” IEEE Fellow Fatih Porikli says.
Porikli, a professor of engineering at the Australian National University, in Canberra, is the computer vision group leader at NICTA, Australia’s Information and Communications Technology Research Centre of Excellence, also in Canberra. NICTA’s role is to pursue ICT-related research to benefit the Australian economy.
Porikli has made significant contributions to object and motion detection, object tracking, and video analytics. His pioneering region covariance descriptor, which represents an image region by the covariance matrix of simple per-pixel features, has become a standard tool in computer-vision algorithms.
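The idea behind the region covariance descriptor can be sketched in a few lines. In this minimal illustration (the particular feature set and function name are assumptions for demonstration, not Porikli's reference implementation), each pixel in a patch contributes a feature vector of its coordinates, intensity, and gradient magnitudes, and the descriptor is the covariance matrix of those vectors:

```python
import numpy as np

def region_covariance(patch):
    """Covariance descriptor of a grayscale image patch (2-D array).

    Each pixel contributes a feature vector [x, y, I, |dI/dx|, |dI/dy|]
    (one common choice of features; others are possible). The descriptor
    is the 5x5 covariance matrix of those vectors over the patch.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]                # per-pixel coordinates
    gy, gx = np.gradient(patch.astype(float))  # per-axis intensity gradients
    feats = np.stack([xs.ravel(), ys.ravel(), patch.astype(float).ravel(),
                      np.abs(gx).ravel(), np.abs(gy).ravel()])
    return np.cov(feats)  # 5x5, symmetric positive semi-definite
```

Because the descriptor is a fixed-size matrix regardless of patch size, regions of different shapes can be compared directly, which is part of what made it so widely adopted.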
He moved to NICTA in 2013, after 13 years at the Mitsubishi Electric Research Laboratories, in Cambridge, Mass. Before that he worked at HRL Laboratories, in Malibu, Calif., and AT&T Research Laboratories, in Middletown, N.J.
In 2014, he helped launch the annual IEEE DeepVision workshop, which brainstorms theories and processes for deep-learning architectures, a relatively new form of machine learning in which machines teach themselves. The Institute interviewed Porikli about the future of computer vision and the role of deep learning in AI.
What interested you in computer vision?
Human eyesight has always fascinated me. We sense, perceive, and understand the world with our eyes. These small, magical organs are so precious. But how do we see and process visual information so effortlessly? That question has always reverberated in my mind.
What are the best examples in use today?
Many are found in our daily lives. These include the fingerprint-recognition and face-detection features in our smartphones, and driver-assistance and collision-avoidance systems in cars. These systems save lives. With depth-sensing cameras, which capture objects in 3-D, computer vision also enables marvelous gaming experiences. Industrial vision systems used in farming can now literally separate grain from chaff.
How could deep learning boost computer vision? What does it do that other machine-learning systems can’t?
Two recent developments made deep-learning applications possible for computer vision. One is the availability of very large annotated image data sets. Such sets contain millions of images, with labels that describe their features. The other development is the ever-increasing computational capability of machines.
Before deep learning, humans had to make many assumptions about what was important in the data and what could be ignored.
Deep-learning networks, on the other hand, learn for themselves what is important about the data. They discover the discriminative and informative parts of, say, a human face, such as the eyes and nose. The networks themselves thus eliminate any dependency on user-defined models, which are often suboptimal.
Another significant advantage of such networks is their capacity to incorporate complex models in their architectures. This is critical in complicated classification tasks where the data cannot be split simply into different classes. Deep-learning networks also map well onto modern parallel computing architectures, because their core operations can be run as parallel tasks, which makes them very fast.
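Much of that hardware fit comes from recasting a network's core operation, convolution, as one large matrix multiplication, which parallel hardware executes very efficiently. A minimal NumPy sketch of this common "im2col" trick (an illustration under assumed shapes, not any particular library's implementation; requires NumPy 1.20+ for `sliding_window_view`):

```python
import numpy as np

def conv2d_im2col(images, filters):
    """Valid-mode 2-D convolution (cross-correlation, as in deep learning)
    via im2col. images: (N, H, W); filters: (K, kh, kw).
    Returns feature maps of shape (N, K, H-kh+1, W-kw+1)."""
    n, h, w = images.shape
    k, kh, kw = filters.shape
    oh, ow = h - kh + 1, w - kw + 1
    # Gather every kh x kw sliding window, then flatten each into a row.
    cols = np.lib.stride_tricks.sliding_window_view(images, (kh, kw), axis=(1, 2))
    cols = cols.reshape(n * oh * ow, kh * kw)
    # All windows against all filters in a single big matrix multiply.
    out = cols @ filters.reshape(k, kh * kw).T
    return out.reshape(n, oh, ow, k).transpose(0, 3, 1, 2)
```

The same reformulation is what lets GPU libraries spread the work of a convolutional layer across thousands of cores.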
You’ve organized two IEEE DeepVision workshops. What were the takeaways?
The workshops crystallized the notion that deep-learning networks provide a significant boost in performance to a variety of applications compared with previous machine-learning methods. And by combining different types of processing layers, functions, and topologies, it’s possible to create a vast number of deep-learning systems for different tasks, demonstrating nearly endless possibilities. The workshops also confirmed our intuition that deep learning is not a momentary trend but a permanent revolution reshaping computer-vision research on all fronts.
What is the future of computer vision? What are the biggest challenges still to be overcome?
The past several years have seen significant progress, particularly in object recognition. However, our journey is far from complete.
How will computer vision facilitate augmented reality and telepresence applications, providing 360-degree images? What will be possible with intelligent cameras connected to the Internet of Things? We have amazingly capable robots, yet painfully cumbersome ways of programming them. How can computer vision change the way we program, interact with, and teach robots what they must do? Imagine robots that learn simply by observing humans.
More than 1,000 hours of video are uploaded to social media platforms every minute. How will computer vision handle this enormous amount of visual data so that we don’t drown in it?
Soon, I believe, we will see something like a computer vision app store. This will allow any app developer to access and incorporate advanced computer-vision tools into various applications, opening opportunities for many more people to contribute to and benefit from the technology. I think the biggest challenge is making computer-vision components easy to integrate into high-level applications.
Any words of wisdom for young engineers who want to work in the field?
Most outstanding computer-vision achievements come from collaborative, interdisciplinary environments. Image-based product search developed for retail marketing, for example, can also be applied to security screening for contraband. Keep this in mind and think outside the box.
To get a snapshot of the latest developments in the field, consider attending a computer-vision conference. An internship, in particular for young engineers and data scientists, would be another avenue for entering the field. Pursuing a postgraduate degree in computer vision is a great way to dive in even deeper. It’s not unusual to be self-taught through courses and other online resources, however, or to learn from a mentor.
This article appears in the June 2016 print issue as “Fatih Porikli Sets the Bar for Computer Vision.”
This article is part of our June 2016 special issue on artificial intelligence.