How much data counts as big data? It’s relative, says ASU researcher Frank Timmes.
Timmes is an astrophysicist and professor in the School of Earth and Space Exploration at Arizona State University. In his research exploring the origins of our universe, Timmes runs simulations that consume and produce terabytes upon terabytes of data.
Timmes and other ASU researchers, in disciplines ranging from health to business to the humanities, often work with data sets so large they are known simply as big data. Big data is defined by four characteristics: variety, velocity, veracity and volume.
"Variety" describes the many forms of data. Every organization, from hospitals to supermarkets to airports to schools, generates different types of data. As individuals we are also generating data in the form of social-media updates, web surfing and location data. And new forms of data are always being created.
"Velocity" is how fast data is being generated. As technology advances, we are generating data at an accelerating pace.
"Veracity" describes the accuracy and completeness of the data.
"Volume" is the amount of data, measured in bytes. One byte is the amount of data used to encode a single letter of text. When personal computers debuted in the 1970s they boasted 48 kilobytes (48,000 bytes) of memory. In 2008, Google was estimated to generate 20 petabytes of data each day. That’s 20 quadrillion (20,000,000,000,000,000) bytes, the equivalent of 400 million filing cabinets' worth of text — and that was seven years ago.
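The unit arithmetic behind those volume figures can be sketched in a few lines of Python. This is an illustrative back-of-the-envelope sketch, not anything from the article itself, and it assumes decimal (SI) multiples of 1,000 bytes per step; computer memory is sometimes quoted in powers of 1,024 instead.

```python
# Decimal (SI) byte multiples: each step is a factor of 1,000.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}

def to_bytes(amount, unit):
    """Convert a quantity like (20, 'PB') to a raw byte count."""
    return amount * UNITS[unit]

# The figures quoted above:
early_pc_memory = to_bytes(48, "KB")    # 48,000 bytes of 1970s PC memory
google_daily_2008 = to_bytes(20, "PB")  # 20,000,000,000,000,000 bytes per day

# How many early-PC memories fit into one day of 2008-era Google data?
ratio = google_daily_2008 // early_pc_memory
```

Run against the article's numbers, the ratio works out to hundreds of billions of early personal computers' worth of memory generated per day.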
How much data counts as big data? It’s relative. The easiest way to recognize big data, Timmes said, is if it’s too big for your current machine.
“Normal data today would have been considered big data 20 years ago; 20 years from now our big data will seem minuscule,” he said.
Let them eat (big data) cake
As we grow increasingly cozy with technology, big data has crept into our lives and lexicon. Devices and computers track our clicks, location, social-media activity, health, purchases and more (so much more) and along the way generate bits (and bytes) of data.
To make use of so much data, researchers rely on high-performance computing centers such as ASU Research Computing. The facility offers 100-gigabit Internet2 access and multi-petabyte storage capacity, including large-scale in-memory analytics, as well as the staff to help researchers use it.
Scientists and researchers in all fields now have the ability to create, analyze and access data in new ways and at new scales. In some cases big data used in research is so massive that it isn’t practical or cost-effective to store the data for future use. It might be kept for a few years and then deleted.
“Is every piece of data that you can put in digital format worth storing? Oftentimes not,” Timmes said.
Instead of saving petabytes upon petabytes of data, researchers make their work repeatable by others by passing on the process of data collection and analysis.
“Don’t give me the cake. Give me the recipe and let me make the cake,” Timmes said.
The power of personalization
Researchers aren't the only ones using big data for innovation; retailers are, too.
Michael Goul is professor and associate dean for research in ASU’s W. P. Carey School of Business. He studies the application of big data in predictive analytics, such as when eBay and Amazon casually suggest additional products based on your activity and purchase history.
“People come into eBay, and they don’t realize that pretty much everyone is in an experiment,” Goul said. “They test their ideas out on people live, in the system. They’re using big data for innovation.”
Goul sees exciting potential for big data to shape the experience of personalization. He said a product recommendation is just the tip of the iceberg. A future shopping experience might allow you to virtually visit a designer showroom in Paris, for example, and try on items from the latest clothing line.
Another innovation Goul is tracking applies predictive analytics to health care, offering suggestions for additional testing or services based on a patient's health history and other information.
“We can gain so much if we can leverage technology in ways that can personalize it,” Goul said.
An interdisciplinary defense
Tailored online interactions are enabled by vast quantities of personal data. But just because you share data with one retailer doesn’t mean you wish to share it with everyone. Recent breaches at companies ranging from Target to Ashley Madison and even our federal government illustrate how difficult it can be to keep sensitive data safe.
Jamie Winterton, who guides cybersecurity strategy at ASU’s Global Security Initiative (GSI), says that our online actions are often tracked without our knowledge by third parties.
“We shed so much data as we go through life, whether through personal devices or interactions. The more little pieces that are lying around, the easier it is for someone to piece together a complete picture of you,” Winterton said.
GSI recently launched the Center for Cybersecurity and Digital Forensics (CDF), which draws on ASU’s interdisciplinary leadership and GSI’s position as a university-wide entity to advance new understandings of security. CDF also develops partnerships with both private industry and government to improve their collective security.
Protecting data is an endless game of leapfrog, with each new attack inviting a more sophisticated defense, which hackers quickly work to break down. Creative data defenses aren’t based in computer science alone, Winterton said. They must be interdisciplinary.
No page left undigitized
Michael Simeone also works across disciplines. He is an assistant research professor at ASU’s Institute for Humanities Research and director of the Nexus Lab for Digital Humanities. Simeone is involved in projects that range from delving into 18th-century cartography to modeling changes in economic thought leadership over the past 40 years.
Technology and big data capabilities are allowing new insights into humanities questions. In addition, the humanities bring a critical point of view to the table as society grapples with our increasingly digital identity.
“Everyone assumes that just because people are under a particular age they're tech savvy. It's just not true. Learning a particular piece of software and having an important set of critical mechanisms in your mind about how to encounter data and statistics as they relate to your everyday life, social situation, culture and history is a really important skill set to have, especially as the data scales up. The digital humanities is at a nice place to intervene in this mix,” said Simeone.
As with other disciplines, the onset of a data deluge in the humanities is calling researchers back to the drawing board to rethink traditional research methodologies.
“There are some growing pains right now. Just as the data is getting bigger, the methodologies have to be thought through responsibly. If you’re used to studying 20 books and suddenly you can study 1 million books, it posits some very clear methodological challenges,” Simeone said.
The most powerful tool we have
As more of our lives are mirrored in data sets, there is enormous potential for improving quality of life. But Goul reflects that as we generate ever more data at increasing speeds we are also increasing the speed at which we make decisions based on that data.
“Sometimes it’s good to have a little soaking time and to think, ‘Is this something I really want to do?’” he said.
We will inevitably continue to live in a world where technology is the norm and not the exception. At the end of the day, however, we are not status updates, purchase histories and steps taken; we are human beings, undigitized and in the flesh. That is why Timmes asserts that, despite our advanced technology and bytes upon bytes of data, the most powerful tool anyone has is “your brain driving your fingertips on the keyboard.”
Written by Kelsey Wharton, Office of Knowledge Enterprise Development.