Cutting Big Data Down to Size

IEEE Green ICT Initiative seeks to reduce the technology’s impact on the environment

12 January 2018

Given the mind-boggling scale of big data across all sectors, we need to remember that transporting, storing, and processing this data comes with limits and costs, and that the current path is unsustainable.

That is the impetus of green ICT: making this technology more energy efficient and thus reducing its carbon footprint. This principle and a second one, using ICT to “green” other areas, were the central ideas of the IEEE Greening Through ICT Summit, held in October in Paris.

One aspect of green ICT will be to cut big data down to a manageable size. After all, we do not seek to simply collect data, per se—we seek the actionable insights and knowledge that data makes possible.

FIVE THINGS TO CONSIDER

To manage big data, organizations must consider the types of sensors they use to collect this data, how and where the data will be processed, and what kinds of insights they seek. They must take into account what ICT experts refer to as the five V’s: volume, velocity, variety, veracity, and value.

Consider volume in, say, monitoring a patient’s heart rate. The heart-rate waveform itself is of little interest because that pattern has become known over time. Transmitting a constant stream of that data makes no sense—it does not confer useful knowledge. Ideally, a monitor would register only anomalies that threaten the patient’s health and, perhaps, trigger emergency services.
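As a rough illustration of registering only anomalies, the following Python sketch filters a stream of heart-rate readings at the monitor and keeps just the out-of-range values for transmission. The thresholds, the function name, and the sample readings are assumptions made up for this example, not clinical values or part of any real device.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical thresholds for illustration only; not clinical guidance.
LOW_BPM = 40
HIGH_BPM = 140


def anomalies(readings: Iterable[Tuple[int, int]],
              low: int = LOW_BPM,
              high: int = HIGH_BPM) -> Iterator[Tuple[int, int]]:
    """Yield only the (timestamp, bpm) readings outside the [low, high] band."""
    for timestamp, bpm in readings:
        if bpm < low or bpm > high:
            yield timestamp, bpm


# Six made-up readings; only two fall outside the normal band.
stream = [(0, 72), (1, 75), (2, 180), (3, 74), (4, 35), (5, 70)]
to_transmit = list(anomalies(stream))
print(to_transmit)  # [(2, 180), (4, 35)]
```

In this toy case, two of six readings leave the device, so the volume sent upstream drops by two thirds.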

Velocity means the data itself is fast-changing and its usefulness has a time limit. Using the heart-monitoring example, if that data has to be sent upstream for centralized processing and decision-making, the output might be too late to save an at-risk patient.

In terms of variety, think of social media: Different platforms may all convey similar data, but from different angles or with different data qualities. The challenge there might be to reduce redundancies while correlating the different varieties of data in the most efficient way to gain insights.

Another factor to consider is veracity. Big data typically contains errors or unusable points, which requires “cleaning it up,” or preprocessing it. How can we accomplish this in a place and manner that does not consume a lot of power or increase latency? One way is to do this locally.
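A minimal sketch of cleaning data at the source might look like the following. It assumes a sensor whose valid range is known in advance; the function name, the range limits, and the sample values are hypothetical.

```python
import math
from typing import List, Optional


def clean_locally(samples: List[Optional[float]],
                  valid_min: float,
                  valid_max: float) -> List[float]:
    """Drop missing, non-finite, and out-of-range samples before upload."""
    cleaned = []
    for value in samples:
        if value is None:
            continue  # sensor produced no reading
        if not math.isfinite(value):
            continue  # NaN or infinity from a faulty conversion
        if valid_min <= value <= valid_max:
            cleaned.append(value)
    return cleaned


# Made-up temperature samples, including a gap, a NaN, and a sentinel value.
raw = [21.5, None, float("nan"), 22.0, -999.0, 21.8]
print(clean_locally(raw, valid_min=-40.0, valid_max=60.0))  # [21.5, 22.0, 21.8]
```

Only the usable points cross the network, so no bandwidth or upstream processing is spent on values that would be discarded anyway.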

In terms of value, it is common practice to prioritize certain types of data in a network—which often translates as quality of service, particularly when it comes to telecommunications. Network providers must determine which data takes precedence and which data is not time-sensitive. Signals from heart monitors and autonomous vehicles, for example, have greater urgency than, say, soil monitors in an agricultural setting.
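One simple way to picture this kind of prioritization is a priority queue at a network node. The sketch below uses Python's standard heapq module; the priority classes, the class name PriorityLink, and the messages are assumptions for illustration and do not describe how any particular network provider implements quality of service.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical priority classes; a lower number means more urgent.
PRIORITY = {"heart_monitor": 0, "autonomous_vehicle": 0, "soil_monitor": 2}


class PriorityLink:
    """Toy outgoing link that always sends the most urgent message first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        # The counter keeps arrival order among messages of equal priority.
        self._counter = itertools.count()

    def enqueue(self, source: str, payload: str) -> None:
        priority = PRIORITY.get(source, 1)  # unknown sources get a middle class
        heapq.heappush(self._heap, (priority, next(self._counter), source, payload))

    def dequeue(self) -> Tuple[str, str]:
        _, _, source, payload = heapq.heappop(self._heap)
        return source, payload

    def __len__(self) -> int:
        return len(self._heap)


link = PriorityLink()
link.enqueue("soil_monitor", "moisture=31%")
link.enqueue("heart_monitor", "arrhythmia detected")
link.enqueue("soil_monitor", "moisture=30%")
link.enqueue("autonomous_vehicle", "obstacle ahead")

order = [link.dequeue() for _ in range(len(link))]
for source, payload in order:
    print(source, payload)
```

The heart-monitor and vehicle messages are sent before either soil reading, even though a soil reading arrived first.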

A MIDDLE GROUND

When it comes to data processing, we must remember our options are not limited to simply local or remote. We can also structure a network to provide progressive processing, in which processing takes place at successive nodes and each node refines the data a bit further. This would describe a linear approach.
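Progressive processing can be sketched as a chain of functions, each standing in for one node along the path. The three stages below (range filtering at the sensor, pairwise averaging at a gateway, a single summary at a regional node) are assumptions chosen for illustration; a real deployment would pick its own refinements.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[List[float]], List[float]]


def sensor_node(data: List[float]) -> List[float]:
    """Drop readings outside a plausible range (hypothetical limits)."""
    return [x for x in data if 0.0 <= x <= 100.0]


def gateway_node(data: List[float]) -> List[float]:
    """Average consecutive pairs, halving the number of points."""
    return [mean(data[i:i + 2]) for i in range(0, len(data), 2)]


def regional_node(data: List[float]) -> List[float]:
    """Reduce everything to one summary value."""
    return [mean(data)] if data else []


def run_pipeline(data: List[float],
                 stages: Sequence[Stage]) -> Tuple[List[float], List[int]]:
    """Pass data through each stage in order, recording its size after each hop."""
    sizes = [len(data)]
    for stage in stages:
        data = stage(data)
        sizes.append(len(data))
    return data, sizes


raw = [20.0, 22.0, -1.0, 24.0, 26.0, 500.0, 28.0, 30.0]
result, sizes = run_pipeline(raw, [sensor_node, gateway_node, regional_node])
print(result)  # [25.0]
print(sizes)   # [8, 6, 3, 1]
```

Each hop forwards less data than it received, so the most expensive links near the cloud carry the least traffic.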

Another option is fog computing, a decentralized infrastructure in which data is processed and stored in the most logical, efficient place between the data source and the cloud.

We are already taking similar approaches in applications such as local caching of content on the Internet. Instead of pulling content from a central data center, we can make copies locally, reduce traffic in the network, and reduce power consumption. We’ll need to apply similar strategies in a more pervasive and strategic manner that takes all five V’s into account.
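The caching idea can be illustrated with a small sketch that counts how many requests actually reach the central data center. The EdgeCache class and the origin function are hypothetical stand-ins, not a real content-delivery API.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy local cache that goes to the origin only on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # traffic that crosses the network

    def get(self, key: str) -> bytes:
        if key not in self._store:
            self.origin_requests += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


def origin(key: str) -> bytes:
    """Stand-in for a request to a central data center."""
    return f"content for {key}".encode()


cache = EdgeCache(origin)
for key in ["video-1", "video-1", "video-2", "video-1"]:
    cache.get(key)

print(cache.origin_requests)  # 2
```

Four local requests produce only two trips to the origin; the repeated requests are served from the local copy.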

There are practical engineering challenges as well, such as designing and building sensors, developing appropriately located network nodes for processing data, and determining which data takes priority in a congested environment.

From a business perspective, we’ll need to seek value propositions that can convey to companies the urgency of greening ICT and managing big data. Certainly, improved energy efficiency and data efficacy provide rewards worth pursuing by society and enterprises. Raising the broadest possible awareness of the need to make ICT, big data, and their pervasive applications as green and sustainable as possible is a helpful first step.

For more information, see “Energy Efficient Big Data Networks: Impact of Volume and Variety,” published in IEEE Transactions on Network and Service Management and available in the IEEE Xplore Digital Library.

IEEE Senior Member Jaafar Elmirghani is the cochair of the IEEE Green ICT Initiative.
