As data continues to amass, grow more complex, and become nearly impossible to lay out on a neatly organized chart or graph, those involved in technical fields must consider how to handle all the bits and bytes in order to learn from them. At the first IEEE Big Data Initiative Workshop, technology leaders gathered to discuss such challenges as well as how best to work together to advance the field.
Some 50 participants gathered on 1 and 2 October at the Stevens Institute of Technology, in Hoboken, N.J., to explore such themes as how to bring structure to otherwise “unstructured” data, gain wisdom from the mounds of data available, and determine what role IEEE should play. IEEE Senior Member Roberto Saracco, chair of the IEEE Future Directions Committee, and Kathy Grise, the director of the IEEE Big Data Initiative, kicked off the event.
“Data is very diverse and broad with so many IEEE societies, councils, and other groups using it in some aspect of their work,” said Grise. “That is why it is important for IEEE to be a leader in this arena.”
A GREAT BIG WORLD
IEEE Life Fellow Jose Moura, professor of electrical and computer engineering at Carnegie Mellon University, in Pittsburgh, talked about the importance of asking the right questions when analyzing data. But this can be a challenge, he said, because sometimes it is not known what those questions should be until after the data comes in.
Such is the case in New York City, where 190 million records of taxi rides have been collected. The records include timestamps, pick-up and drop-off locations, duration of trips, fares paid, tip amounts, number of passengers, and more. The city also has 500 cameras set up to look at traffic in real time. Only after the data was collected could one truly understand what could be gleaned from it, Moura said. For example, after reviewing the data, it became clear that there are more taxis than the city needs and that idle time for cab drivers could be reduced by at least 60 percent by solving some of the inefficiencies of the current system.
Showing the audience a pyramid in which “big data” is at the bottom and “big wisdom” at the top, Moura emphasized that the data alone is just a start. At the base, big data requires aggregation, access, storage, security, and resilience. Moving up the pyramid, one starts to gain insight into what the data means, which involves tools such as machine learning, analytics and visualization, and data science. At the very top is the action component: What decisions can one reach once meaning has been made from the data?
IEEE Member Mark Davis, the leader of the IEEE Cloud Computing Initiative’s big data track, brought up the need for simplification of data in his talk. One way is the data warehouse approach, in which data is made to fit a preexisting hypothesis, or a question for which you want an answer. The better way, he said, is to keep all the data in its original format using inexpensive storage platforms such as Hadoop until the time comes when it is needed. He concluded, however, that there is an opportunity for more tools to give structure to otherwise unstructured data, or data that comes from a multitude of sources and is incomplete.
Davis also spoke about using data to solve problems for businesses, but some participants wanted to know at what cost. In other words, what is the return on investment? The reason most companies aren’t using big data is not technology but hesitation among their leaders, Davis said. Companies need people with the right mind-set and skills to know what to do with the data for the return to be worth the investment. For example, Moura showed how some telecommunications companies use information analyzed from data to predict which of their customers will switch to another provider, and then offer them incentives to stay before that happens.
IEEE Fellow Andrew Laine, president-elect of the IEEE Engineering in Medicine and Biology Society, talked about the importance of using data to reduce medical costs. He noted that just 5 percent of the U.S. population is responsible for 60 percent of health-care costs. He said data could radically transform health care, moving it away from a model of caring for the sick to one of managing the overall health of the population. Moreover, with all the data available from wearables and fitness apps, patients will be more informed and engaged in taking care of their own health. IT jobs in health care alone are expected to grow by 20 percent in the next few years. These jobs include using data to find ways to prevent accidental deaths.
“Data has traditionally been siloed,” added IEEE Senior Member Dave Belanger, senior research fellow at the Howe School of Technology Management, at Stevens. He gave a talk on what’s next for big data. “The transparency that we’re seeing now is what’s going to lead to new information we couldn’t get before.”
He reinforced Moura’s point about asking the right questions. “If you ask a question, you will always get an answer,” he said. “It’s important to frame those questions properly.”
IEEE Fellow Xin Yao, professor of computer science at the University of Birmingham, in England, posed the question of whether there is a unique set of core knowledge that defines and underpins big data. “Is big data just a term, or a science?” he asked. “And if it’s the latter, what are the scientific questions that are unique to big data, and how can we capture such core knowledge?” Yao is also the president of the IEEE Computational Intelligence Society.
IEEE Senior Member Christian Hansen, head of the school of computing and engineering sciences at Eastern Washington University, in Cheney, said that because of the high demand for big data skills, undergraduate students in the field have an opportunity to get jobs before even entering a graduate program. But one challenge for those in big data, he noted, is that “We’re gathering a lot of hay and not enough needles. We need to be looking for the needles.” Hansen is the president of the IEEE Reliability Society.
On the second day, attendees heard from those representing the IEEE Sensors Council and IEEE Computer Society to gain their perspective on big data. IEEE Senior Member Ling Liu, professor at the Distributed Data Intensive Systems Lab in the School of Computer Science at Georgia Institute of Technology, in Atlanta, summarized the group’s perspectives in a session that included a lively discussion about big-data analytics versus its privacy risks. The attendees then convened for a discussion on how to move forward with the initiative.