Big data needs you. Recent searches for big-data job openings on several major career sites revealed thousands of job postings, and that number is only expected to grow. A recent study by SAS Institute, a business analytics company, predicted that the number of employees needed to handle big-data tasks will grow by more than 240 percent by 2017. McKinsey, a consulting firm, predicted a shortfall of hundreds of thousands of big-data employees in the United States alone.
“Every field has to redefine itself in this new era, where you can collect so much more data and use it to improve your competitive advantage,” says IEEE Fellow Manish Parashar, founding director of the Rutgers Discovery Informatics Institute, in Piscataway, N.J., which focuses on solving data-intensive challenges in engineering, science, medicine, and other disciplines.
IEEE Fellow Francine Berman, a professor of computer science at Rensselaer Polytechnic Institute, in Troy, N.Y., sees the application of big data creating “new industries and new ways of doing things. Becoming literate about data, getting interested in data, and knowing how to handle data will be prerequisites for just about everything.”
Berman is also chair of the Research Data Alliance/US (RDA/US). The organization is developing the global infrastructure needed for data sharing and exchange among diverse research areas. That infrastructure includes tools, code, institutional policy, and best practices, which will provide the foundation for new data-driven insights and discoveries. RDA/US receives support from the U.S. National Science Foundation, the European Commission, and the Australian Commonwealth government.
A TRIO OF SKILLS
Three important skills are needed if you’re to be effective in handling big data, points out Dennis Shasha, associate director of NYU Wireless at New York University and a researcher in pattern recognition and database tuning. He’s also a fellow of the Association for Computing Machinery.
First is an understanding of databases and how they manage large amounts of data. Next is knowledge about machine learning and data mining, which allows inferences to be made from the data. Last comes statistics, so you can estimate the reliability of your conclusions.
It also helps to have an inquisitive personality, a cross between that of a detective and a journalist, says Shasha. “The more questions you ask, the more you’ll learn from the data.”
Add to that the ability to take questions about data sets and translate them into insight and knowledge.
“It’s important to understand the field in which the data is going to be used,” he continues. “This allows you to ask the right questions and design the right experiments to produce additional data.”
Working with big data also requires data literacy.
“You need to know when the data does or does not make sense, whether the data is pertinent to the point, when the data supports the conclusions, and when that data is likely to be faulty,” Berman explains.
She also advises people not to be afraid of the mathematics they’ll have to use.
“You don’t have to be a professional mathematician to navigate in a data-driven world, but understanding and having an affinity for how things work quantitatively is really important,” she says.
THINGS TO DO
Employees will also be needed to deal with cybersecurity, formulate policy and regulations, and research issues involving long-term storage and who can access the data.
“A lot of research needs to be done into how to manage the large data volumes and rates, and how to process it in an efficient and scalable manner,” says Parashar, “as well as how to provide enough bandwidth, throughput, computing capability, and storage capacity for handling it all.”
Issues regarding the stewardship and preservation of data both now and in the future must be worked out, Berman notes. This is especially true in the scientific realm, where large data sets like the Worldwide Protein Data Bank, a collection of 3-D structural data of proteins and nucleic acids used by researchers around the globe, will be important to the field for decades to come.
Finally, there is the ability to act on the information gathered. “You have to incorporate big data into your business plan,” says Parashar. “It’s going to change the way you do things.”
Many doors can lead to a career in big data, according to Berman. That’s because every industry is generating its own data, and data-driven professions require multiple kinds of expertise. “It’s a really broad space,” she says. “You enter through your own interests.”
Big data has applications in every field and every industry. Parashar notes that people already working in electrical engineering, computer science, or any other high-tech field could move their career in that direction by adding data-science skills to their knowledge base.
If you’re interested in a big-data career, there are numerous online resources to consult for information, including a series of big-data videos from the IEEE Computer Society. So far, these cover the ethics of big data, the ways sensor data is being used, and the challenges posed by the vast amount of data in electronic medical records.
Opportunities also exist for more training. Many universities, including Rutgers, offer certificate programs that cover analytics, data science, and informatics. There are also the tutorials and workshops given at IEEE conferences.
Only a relatively small number of people will work at specialized data companies. Instead, most should consider an industry they’re already familiar with and look for open positions there, according to Shasha. His own work at NYU has been as varied as studying which genes might govern certain behavior of plants, predicting housing prices in Los Angeles, figuring out whether a bank’s systems could prevent fraudulent transactions, and determining the best way to deploy wireless base stations for mobile devices.
“The skill set for big data is generic,” Shasha says. “I’ve had students who started in a field like biology, then went off to work in the financial industry.”