TOOLBOX
30 January 2018
Correction 21 February 2018

Data visualization tools drive interactivity and reproducibility in online publishing

New tools for building interactive figures and software make scientific data more accessible, and reproducible.

Jeffrey M. Perkel

As Benjamin Delory began writing a paper documenting a new analysis pipeline to quantify plant morphology, he realized that one of its figures could pose a problem.

The paper proposes a framework to compute ‘persistence barcodes’ that describe the branching structure of plant root systems¹. The challenge was how to illustrate it.

The barcode’s underlying algorithm “is continuous and dynamic”, says Delory, a postdoctoral researcher at Leuphana University of Lüneburg in Germany. “And the best solution to show something dynamic is to animate it.”
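The following minimal Python sketch makes the idea of a persistence barcode concrete. It assumes the root system is stored as a tree (a hypothetical `parent`/`length` dictionary format) and uses geodesic distance from the base of the root system as the filtration. That is one common choice for branching structures, not necessarily the exact pipeline in the paper. Sweeping inward from the tips, each tip starts a bar at its distance from the base. When two branches meet at a branch point, the branch that reaches farther survives and the other bar ends at that point's distance. All function and variable names are illustrative.

```python
from collections import defaultdict


def persistence_barcode(parent, length, root):
    """Return (birth, death) bars for a geodesic-distance filtration of a tree.

    parent: dict mapping each non-root node to its parent (hypothetical format)
    length: dict mapping each non-root node to the length of the edge to its parent
    root:   the base of the root system, at distance 0
    """
    children = defaultdict(list)
    for node, par in parent.items():
        children[par].append(node)

    # Geodesic distance of every node from the base, computed top-down.
    dist = {root: 0.0}
    order = []  # nodes listed so that every parent precedes its children
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for child in children[node]:
            dist[child] = dist[node] + length[child]
            stack.append(child)

    bars = []
    survivor = {}  # node -> birth value of the bar still alive in its subtree
    for node in reversed(order):  # children are handled before their parents
        births = sorted((survivor[c] for c in children[node]), reverse=True)
        if not births:  # a tip: a new bar is born at its distance
            survivor[node] = dist[node]
            continue
        survivor[node] = births[0]  # elder rule: the farthest-reaching branch survives
        for b in births[1:]:  # every other branch dies at this branch point
            bars.append((b, dist[node]))
    bars.append((survivor[root], 0.0))  # the longest branch dies at the base
    return sorted(bars, reverse=True)


# Toy root system with made-up edge lengths: base "b", branch points "a" and "c".
parent = {"a": "b", "t1": "a", "c": "a", "t2": "c", "t3": "c"}
length = {"a": 2, "t1": 3, "c": 1, "t2": 1, "t3": 4}
print(persistence_barcode(parent, length, "b"))
# [(7.0, 0.0), (5.0, 2.0), (4.0, 3.0)]
```

In this toy tree, each bar's length (birth minus death) equals the length of one branch, measured from the point where it joins a longer branch. This is why a barcode summarizes branching structure compactly.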

Scientific figures are typically rendered as static images. But these are divorced from the underlying data, which prevents readers from exploring them in more detail by, for instance, zooming in on features of interest. For genomicists needing to cram millions of data points into dense visuals just a few centimetres across, this can be particularly problematic.

The same is true for researchers working with computational algorithms. Scientists often post software on public code repositories such as GitHub, but getting the code to run properly is easier said than done: reviewers and other interested readers often have to install extra software and configure their computers before the algorithms will work.

Some journals now bridge that gap by supporting interactive figures and code. One of those is F1000Research, which last year partnered with the computing firm Plotly in Montreal, Canada, and the Code Ocean platform in New York City. These capabilities, as well as F1000Research’s open-access ethos, led Delory and his collaborators to submit their paper there. It was published in January.

The interactive publication

Interactive graphics that allow readers to delve into a story’s underlying data are frequent features on websites such as those of the New York Times and fivethirtyeight.com, but are less common in scientific publishing.

F1000Research’s ‘living figures’, interactive charts introduced in 2014 that could be continually updated with new data, were laborious to produce and did not scale, says senior publishing editor Thomas Ingraham. Plotly, by contrast, lets users build and share visualizations ranging from scatter plots and line graphs to contour plots and maps. The resulting graphs let readers zoom in on data, pan across images and hover the mouse over points to see the plotted values. Student subscriptions start at US$59 per year, and open-source libraries allow researchers to create Plotly graphics free of charge from R, MATLAB, Python and Julia code.

Code Ocean is free for academics for 10 hours of computation time per month and 50 gigabytes of storage; paid tiers start at $19 per month. It brings together code, data, results and the computing environment used to execute them in a self-contained ‘compute capsule’ that replicates the author’s computational configuration. Other users can download, modify and run that code either from codeocean.com or through a widget in the paper.

F1000Research has now published six papers with live Plotly graphs and five with a Code Ocean widget. And this year, it plans to add support for interactive protein–protein interaction maps, which are produced using the network-mapping tool Cytoscape.

Researchers need not be put off by the perceived complexity. According to computational biologist Xijin Ge at South Dakota State University in Brookings, who has included interactive Plotly graphs in one of his papers², creating those figures requires just one extra line of code per figure. Tom DeCarlo, a coral researcher at the Oceans Institute and School of Earth Sciences at the University of Western Australia in Crawley, has created six Code Ocean projects for journals including Biogeosciences and the journal Paleoceanography and Paleoclimatology. “I thought it was really important for scientific communication and reproducibility,” he says.

Open-source solutions

For those seeking open-source computational alternatives, a tool called Binder can convert any public GitHub repository containing Jupyter notebooks (documents that interleave text, code and data) or R code into an environment that users can run in their browser. Users simply paste the repository address into the form at mybinder.org, and the service builds a shareable interactive workspace. “It really lends itself to reproducibility and ease of use,” says Carol Willing, a Binder project team member at California Polytechnic State University (Cal Poly) in San Luis Obispo.

Such tools also simplify peer review, says Tim Head, a member of the Binder project team in Zürich, Switzerland. Head was frustrated that he couldn’t make the software work when asked to review a journal article. “Had they sent me a Binder link, we’d be done by now,” he says.
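A Binder link is simply a URL that encodes where the repository lives. The sketch below assumes the `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>` pattern that the mybinder.org form produces for GitHub repositories, along with the `badge_logo.svg` launch badge that projects often place in a README. The owner and repository names are hypothetical placeholders. Binder installs a repository's dependencies from standard files such as `requirements.txt` or `environment.yml`, so these files need to be committed alongside the notebooks.

```python
def binder_url(owner: str, repo: str, ref: str = "HEAD") -> str:
    """Return a mybinder.org launch link for a public GitHub repository."""
    return f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"


def binder_badge(owner: str, repo: str, ref: str = "HEAD") -> str:
    """Return a Markdown 'launch binder' badge, e.g. for a README or a reply to reviewers."""
    url = binder_url(owner, repo, ref)
    return f"[![Binder](https://mybinder.org/badge_logo.svg)]({url})"


# Hypothetical repository, for illustration only.
print(binder_url("example-lab", "root-barcodes"))
# https://mybinder.org/v2/gh/example-lab/root-barcodes/HEAD
print(binder_badge("example-lab", "root-barcodes"))
```

Pinning `ref` to a tag or commit, rather than `HEAD`, ties the link to the exact code a reviewer was asked to assess.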

Open-source options also exist for creating interactive images, including Bokeh, htmlwidgets, pygal and ipywidgets. Most are used programmatically, generally from within R or Python, two languages that are widely used in science. Coders can, for example, use ipywidgets to drop interactive 3D plots, maps and molecular visualizations into Jupyter notebooks. Another option, Vega-Lite, is built on JavaScript. Because that language is less popular in science, Brian Granger at Cal Poly and Jake VanderPlas at the University of Washington in Seattle developed a Python interface called Altair to make it more accessible.

Whereas most of these tools tend to provide functions for specific graph types, Vega-Lite and Altair are flexible ‘grammars’ that describe, for instance, how variables map to different visual features, such as colour or shape. They also allow graphs to be linked, such that when users select a region of one plot, the displays of its neighbours update accordingly. “It lets us actually explore relationships in a multidimensional way,” says Jeffrey Heer, a computer scientist at the University of Washington whose lab developed Vega-Lite.
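Because Vega-Lite charts are declared as JSON, the grammar can be illustrated without any plotting library. The sketch below assumes the Vega-Lite v5 schema and builds a specification for two side-by-side scatter plots. An interval selection (a ‘brush’) in the left plot drives the colour encoding of both plots, so that dragging over points in one plot greys out the unselected points in its neighbour. The field names (`root_length`, `root_depth`, `branch_count`, `species`) and the sample rows are made up for illustration. Altair generates specifications of this kind from Python code (for example, through `Chart.to_json()`), but the dictionary here is written by hand.

```python
import json


def linked_scatter_spec(rows):
    """Build a Vega-Lite spec: brushing the left plot highlights points in both plots."""
    brush = {"name": "brush", "select": "interval"}  # drag to select a rectangle
    highlight = {
        # Points inside the brushed region are coloured by species; others turn grey.
        "condition": {"param": "brush", "field": "species", "type": "nominal"},
        "value": "lightgray",
    }
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},  # top-level data is shared by both views
        "hconcat": [
            {
                "params": [brush],
                "mark": "point",
                "encoding": {
                    "x": {"field": "root_length", "type": "quantitative"},
                    "y": {"field": "root_depth", "type": "quantitative"},
                    "color": highlight,
                },
            },
            {
                "mark": "point",
                "encoding": {
                    "x": {"field": "root_length", "type": "quantitative"},
                    "y": {"field": "branch_count", "type": "quantitative"},
                    "color": highlight,
                },
            },
        ],
    }


# Made-up values, for illustration only.
rows = [
    {"species": "A", "root_length": 12.0, "root_depth": 8.0, "branch_count": 14},
    {"species": "B", "root_length": 20.0, "root_depth": 15.0, "branch_count": 9},
]
spec = linked_scatter_spec(rows)
print(json.dumps(spec, indent=2))
```

Pasting the printed JSON into the online Vega Editor should render the linked plots. The mapping from variables to visual features lives entirely in the `encoding` blocks, and the link between the plots lives in the `condition` on colour.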

Two other products let researchers create interactive apps that make use of widgets such as drop-down menus and slider controls to blend data, graphics and code: Shiny, made by RStudio in Boston, Massachusetts, for R, and Plotly’s Dash for Python. They work by transmitting the user’s widget actions to a remote server, which runs the underlying code and updates the page.

The resulting apps can make data and tools accessible to researchers who are uncomfortable with programming. For instance, graduate student Tal Galili worked with colleagues at Tel Aviv University to develop a Plotly-based toolbox to build interactive heat maps from uploaded data sets, as well as a Shiny interface that runs the code behind the scenes. Mine Çetinkaya-Rundel, a statistician at Duke University in Durham, North Carolina, has built Shiny resources for her undergraduate statistics courses to help her to illustrate difficult concepts during lectures.

“It’s nice to just pull that up and say, ‘okay, now that we’ve introduced this thing, what happens when we move around the widgets?’” she says.
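The round trip described above for Shiny and Dash, in which a widget action is sent to a server and a recomputed output comes back, can be sketched with the Python standard library alone. The `ToyApp` class, its `callback` decorator and the JSON message format are hypothetical stand-ins written for this illustration. They are not the API of Shiny or Dash, which also handle the web server, the browser-side widgets and the page rendering. The example callback loosely mimics a teaching app in which a slider sets a sample size.

```python
import json
import random
import statistics


class ToyApp:
    """Hypothetical stand-in for a Shiny/Dash-style server (not either library's API)."""

    def __init__(self):
        self.callbacks = {}  # widget id -> (output id, function to run)

    def callback(self, widget_id, output_id):
        """Register a function to run whenever the given widget changes."""
        def register(func):
            self.callbacks[widget_id] = (output_id, func)
            return func
        return register

    def handle_event(self, message):
        """Receive a widget action as JSON, run the code and return a page update."""
        event = json.loads(message)
        output_id, func = self.callbacks[event["widget"]]
        return json.dumps({"target": output_id, "content": func(event["value"])})


app = ToyApp()


@app.callback("sample_size_slider", "summary_text")
def summarize(n):
    rng = random.Random(42)  # fixed seed, so the same slider value gives the same result
    sample = [rng.gauss(0, 1) for _ in range(n)]
    return f"n = {n}, mean = {statistics.mean(sample):.3f}"


# A browser would send this message when the slider moves; here it is called directly.
print(app.handle_event('{"widget": "sample_size_slider", "value": 30}'))
```

Keeping the computation on the server is what lets readers who never see the code still explore it: the page only ever exchanges small messages like the one above.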

Publishing such integrations on journal web pages involves making changes to authoring tools, editorial workflows and infrastructure. It might also involve entrusting scientific data to third parties, who cannot always guarantee their permanence.

To help address this, open-access publisher eLife’s Reproducible Document Stack project aims to create an end-to-end tool set for authoring, submitting and publishing documents that are computationally reproducible, says Giuliano Maciocci, who leads product development at eLife. The plan is to encapsulate many of a paper’s core scientific ‘artefacts’ — its text, figures, code, data and computational environment — in a single downloadable object, he says. To encourage adoption, the journal is making the stack open source.
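One way to picture a ‘single downloadable object’ is an archive that holds a paper’s artefacts alongside a manifest recording checksums and the computing environment. The sketch below is only an illustration under that assumption. The directory layout and manifest fields are hypothetical and do not represent the format used by eLife’s Reproducible Document Stack or by Code Ocean capsules. A real environment record would also need package versions or a container image, not just the Python version and platform.

```python
import hashlib
import io
import json
import platform
import zipfile


def bundle_artefacts(files):
    """Pack named artefacts plus a manifest into one in-memory zip archive.

    files: dict mapping archive paths (e.g. 'code/analysis.py') to bytes.
    """
    manifest = {
        # A minimal record of the computational environment.
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
        # Checksums let a reader verify that no artefact changed after publication.
        "files": {path: hashlib.sha256(data).hexdigest() for path, data in files.items()},
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, data in files.items():
            archive.writestr(path, data)
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


# Placeholder artefacts, for illustration only.
archive_bytes = bundle_artefacts({
    "text/paper.md": b"Placeholder manuscript text\n",
    "code/analysis.py": b"print('hello')\n",
    "data/measurements.csv": b"plant,root_length\nA,12.5\n",
})
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(sorted(archive.namelist()))
# ['code/analysis.py', 'data/measurements.csv', 'manifest.json', 'text/paper.md']
```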

Making headway

Several other journals and publishers now support Code Ocean integration, including GigaScience, IEEE, SPIE, Cambridge University Press and Taylor & Francis. The Journal of Cell Biology’s JCB DataViewer, based on open-source OMERO software, lets readers explore raw microscopy images rather than the processed, compressed files they typically see. A related tool, the Image Data Resource, offers similar functionality for papers published in any journal. Nature, too, has published interactive figures, for instance in a paper describing the Encyclopedia of DNA Elements project³. A spokesperson says that the journal is investigating several other options for interactive code and figures. In the meantime, researchers often link to external visualizations from their articles.

As more journals embrace interactivity, the online presentation of scientific information could fundamentally change, representing a win for reproducibility, says Erez Lieberman Aiden of the Baylor College of Medicine in Houston, Texas, who published interactive chromatin interaction maps in a recent Cell paper⁴. Static figures are just one perspective on the data. “Informed readers need the ability to draw their own conclusions,” he says. “The act of reading a paper in 1974 and the act of reading a paper in 2017 shouldn’t be the same act.”

Nature 554, 133-134 (2018)

doi: https://doi.org/10.1038/d41586-018-01322-9

Updates & Corrections

Correction 21 February 2018: An earlier version of this article implied that Benjamin Delory developed the persistence barcode method.

References

1. Delory, B. M. et al. F1000Research 7, 22 (2018).
2. Jung, D. & Ge, X. F1000Research 6, 1969 (2017).
3. The ENCODE Project Consortium. Nature 489, 57–74 (2012).
4. Rao, S. S. P. et al. Cell 171, 305–320 (2017).
