Big data: Distilling meaning from data

Frankel, Felice; Reid, Rosalind

doi:10.1038/455030a

Books & Arts
Published: 03 September 2008

Nature volume 455, page 30 (2008)

Buried in vast streams of data are clues to new science. But we may need to craft new lenses to see them, explain Felice Frankel and Rosalind Reid.

It is a breathtaking time in science as masses of data pour in, promising new insights. But how can we find meaning in these terabytes? To search successfully for new science in large datasets, we must find unexpected patterns and interpret evidence in ways that frame new questions and suggest further explorations. Old habits of representing data can fail to meet these challenges, preventing us from reaching beyond the familiar questions and answers.

To extract new meaning from the sea of data, scientists have begun to embrace the tools of visualization. Yet few appreciate that visual representation is also a form of communication. A rich body of communication expertise holds the potential to greatly improve these tools. We propose that graphic artists, communicators and visualization scientists should be brought into conversation with theorists and experimenters before all the data have been gathered. If we design experiments in ways that offer varied opportunities for representing and communicating data, techniques for extracting new understanding can be made available.

Visual representation is familiar in data-intensive fields. Years before a detector is built for a facility such as the Large Hadron Collider near Geneva, for example, physicists will have pored over simulations. They examine how important events will 'look' in the displays that reveal and communicate what is going on inside the machine. Such discussions tend to take place within the visual conventions of a field. But perhaps conversations might be broadened to consider alternative representations of the same data. These might suggest other approaches to collecting, organizing and querying data that will maximize the transparency of experimental results and thus aid intuition, discovery and communication.

Discussing visual communication before designing experiments may reveal new science. Credit: D. ARMENDARIZ

Unfortunately, visualization experts and communicators are often consulted only after data are organized and stored, in the hope that they will create effective computer displays, slides and figures for publication. Meanwhile, they may be developing their tools in isolation, kept at arm's length by scientists who are busy getting their experiments done. Opportunities for useful dialogue are thus squandered.

When scientists, graphic artists, writers, animators and other designers come together to discuss problems in the visual representation of science, such as at the Image and Meaning workshops run by Harvard University (http://www.imageandmeaning.org), it becomes clear that representations repeatedly fail to communicate understanding or address obvious questions about the underlying data. A three-dimensional volume rendering may give no hint of important uncertainties or data gaps; solid surfaces or sharp edges may suggest data where they do not exist. A graphic artist might propose ways to reveal gaps or deviations from expectation early in an experiment, guiding subsequent data collection or highlighting new avenues of enquiry. When we asked Harvard University chemist George Whitesides to change the geometry of a self-assembled monolayer with clearly delineated hydrophobic and hydrophilic areas to create an image for submission to a journal, he found himself redesigning the experiment, and unexpected science emerged.

Student workshops and exercises, such as those run by the US National Science Foundation's Picturing to Learn project (http://www.picturingtolearn.org), teach us that attempting to visually communicate scientific data and concepts opens a path to understanding. When science and design students collaborate, their drive to understand one another's ideas pushes them to create new ways of seeing science. Investment in visual communication training for young scientists will pay off handsomely for any data-intensive discipline.

The ingrained habits of highly trained scientists make them rarely as adventurous as these young minds. We think we are on the path to insight when shading reveals contours in 3D renderings, or when bursts of red appear on heat maps, for example. But the algorithms used to produce the graphics may create illusions or embed assumptions. The human visual system creates in the brain an apparent understanding of what a picture represents, not necessarily a picture of the underlying science. Unless we know all the steps from hypothesis to understanding — by conversing with theorists, experimentalists, instrument and software developers, visualization scientists, graphic artists and cognitive psychologists — we cannot be sure whether a display is accurate or misleading.

The greatest opportunity and risk lie in that last step in the path: understanding. Whether verbal or visual, any language that is garbled and inconsistent fails to do its job. Let's talk. Let's all talk.

Author information

Authors and Affiliations

Felice Frankel is senior research fellow in the faculty of arts and sciences at Harvard University, Cambridge, Massachusetts 02138, USA. With G. M. Whitesides, she is co-author of On the Surface of Things: Images of the Extraordinary in Science. felice_frankel@harvard.edu
Rosalind Reid is executive director of the Initiative in Innovative Computing at Harvard University and former Editor of American Scientist.

Additional information

See Editorial, page 1.

About this article

Cite this article

Frankel, F., Reid, R. Big data: Distilling meaning from data. Nature 455, 30 (2008). https://doi.org/10.1038/455030a

Issue Date: 04 September 2008
