Data sharing: An open mind on open data

Journal: Nature, volume 529, pages 117–119
DOI: 10.1038/nj7584-117a

The move to make scientific findings transparent can be a major boon to research, but it can be tricky to embrace the change.

It is a movement building steady momentum: a call to make research data, software code and experimental methods publicly available and transparent. A spirit of openness is gaining traction in the science community, and is the only way, say advocates, to address a 'crisis' in science whereby too few findings are successfully reproduced. Furthermore, they say, it is the best way for researchers to gather the range of observations that are necessary to speed up discoveries or to identify large-scale trends.

The open-data shift poses a conundrum for junior researchers, who are carving out their niche. On the one hand, the drive to share is gathering official steam. Since 2013, global scientific bodies — including the European Commission, the US Office of Science and Technology Policy and the Global Research Council — have begun to back policies that support increased public access to research.


On the other hand, scientists disagree about how much and when they should share data, and they debate whether sharing it is more likely to accelerate science and make it more robust, or to introduce vulnerabilities and problems.

As more journals and funders adopt data-sharing requirements, and as a growing number of enthusiasts call for more openness, junior researchers must find their place between adopters and those who continue to hold out, even as they strive to launch their own careers.

One key challenge facing young scientists is how to be open without becoming scientifically vulnerable. They must determine the risk of jeopardizing a job offer or a collaboration proposal from those who are wary of — or unfamiliar with — open science. And they must learn how to capitalize on the movement's benefits, such as opportunities for more citations and a way to build a reputation without the need for conventional metrics, such as publication in high-impact journals.

The nascent era of openness is best embodied by the Transparency and Openness Promotion (TOP) guidelines for journals, first published in Science (ref. 1) by researchers at the Center for Open Science in Charlottesville, Virginia. Adoption of the guidelines by a journal or organization signifies to the research community that it supports transparency, openness and reproducibility (whether an experiment can be replicated by the original researcher or by someone else).

Those tenets apply to all aspects of science, including experimental design, data sharing and the publication of null findings and replication studies. As Nature went to press, 538 publishers and journals — including Elsevier and Springer Nature — had signed up to the TOP guidelines, along with 57 organizations, among them the American Association for the Advancement of Science, which publishes Science.

A drive to reproduce

Some fields have embraced open data more than others. Researchers in psychology, a field rocked by findings of irreproducibility in the past few years, have been especially vocal proponents of the drive for more-open science. In one of the latest examples of irreproducibility issues, investigators tried to replicate results from 100 psychological studies but succeeded in fewer than half of them (ref. 2).

A few psychology journals have created incentives to increase interest in reproducible science — for example, by affixing an 'open-data' badge to articles that clearly state where data are available. According to social psychologist Brian Nosek, executive director of the Center for Open Science, the average data-sharing rate for the journal Psychological Science, which uses the badges, increased tenfold to 38% from 2013 to 2015.

Funders, too, are increasingly adopting an open-data policy. Several strongly encourage, and some require, a data-management plan that makes data available. The US National Science Foundation is among these. “There used to be no enforcement, but that's changing,” says Karthik Ram, a data scientist at the Berkeley Institute for Data Science in California and co-founder of rOpenSci, which develops open-source software programmes. Some philanthropic funders, including the Bill & Melinda Gates Foundation in Seattle, Washington, and the Wellcome Trust in London, also mandate open data from their grant recipients.


Others, such as the Gordon and Betty Moore Foundation in Palo Alto, California, encourage sharing but do not require it. Still, the trend is clear, says Carly Strasser, who oversees the foundation's Data-Driven Discovery Initiative. “Open science, data sharing, software sharing is the future of science,” she says. “It's only going to get more difficult to engage in science without being open.”

But many young researchers, especially those who have not been mentored in open science, are uncertain about whether to share or to stay private. Graduate students and postdocs, who often are working on their lab head's grant, may have no choice if their supervisor or another senior colleague opposes sharing.

Some fear that the potential repercussions of sharing are too high, especially at the early stages of a career. “Everybody has a scary story about someone getting scooped,” says New York University astronomer David Hogg. Those fears may be a factor in a lingering hesitation to share data even when publishing in journals that mandate it (see Nature 515, 478; 2014).

Researchers at small labs or at institutions focused on teaching arguably have the most to lose when sharing hard-won data. “With my institution and teaching load, I don't have postdocs and grad students,” says Terry McGlynn, a tropical biologist at California State University, Dominguez Hills. “The stakes are higher for me to share data because it's a bigger fraction of what's happening in my lab.”

Researchers also point to the time sink that is involved in preparing data for others to view. Once the data and associated materials appear in a repository, answering questions and handling complaints can take many hours.

The time investment can present other problems. In some cases, Ram says, it may be difficult for junior researchers to embrace openness when senior colleagues — many of whom head tenure and promotion committees — might scoff at what they may view as misplaced energies. “I've heard this recently — that embracing the idea of open data and code makes traditional academics uncomfortable,” says Ram. “The concern seems to be that open advocates don't spend their time being as productive as possible.”

An open-science stance can also add complexity to a collaboration. Kate Ratliff, who studies social attitudes at the University of Florida in Gainesville, says that it can seem as if there are two camps in a field — those who care about open science and those who don't. “There's a new area to navigate — 'Are you cool with the fact that I'll want to make the data open?' — when talking with somebody about an interesting research idea,” she says.

Glass half full

Despite complications and concerns, the upsides of sharing can be significant. For example, when information is uploaded to a repository, a digital object identifier (DOI) is assigned. Scientists can use a DOI to publish each step of the research life cycle, not just the final paper. In so doing, they can potentially get three citations — one each for the data and software, in addition to the paper itself. And although some say that citations for software or data have little currency in academia, they can have other benefits.

Many advocates think that transparent data procedures with a date and time stamp will protect scientists from being scooped. “This is the sweet spot between sharing and getting credit for it, while dissuading plagiarism,” says Ivo Grigorov, a project coordinator at the National Institute of Aquatic Resources Research Secretariat in Charlottenlund, Denmark. Hogg says that scooping is less of a problem than many think. “The two cases I'm familiar with didn't involve open data or code,” he says.

Open science also offers junior researchers the chance to level the playing field by gaining better access to crucial data. Ross Mounce, a postdoc studying evolutionary biology at the University of Cambridge, UK, is a vocal champion of open science, partly because his fossil-based phylogenetic research depends on access to others' data. He says that more openness in science could help to dissuade what some perceive as a common practice of shutting out early-career scientists' requests for data.

There is some evidence to support that statement. A study in 2014 sought data from 217 studies published between 2000 and 2013. But the team could secure only 40% of what they requested, and responses varied according to the requester's seniority (ref. 3).

McGlynn says that many of the obstacles — whether real or perceived — to open science can be sidestepped. He is on the editorial board for the journal Biotropica, which encourages — but does not require — authors to contact the original researcher when they use someone else's archived data, which can be embargoed for up to three years. “Not only will you get their valuable insights, but it's inclusive and fair,” he says.

Communication also helps for those who worry about jeopardizing a collaboration, he says. Concerns about open science should be discussed at the outset of a study. “Whenever you start a project with someone, you have to establish a clear understanding of expectations for who owns the data, at what point they go public and who can do what with them,” he says.

It isn't hugely difficult to share data (see 'Open-data pro tips'). Online repositories such as FigShare or Zenodo make it increasingly easy to deposit scientific content for widespread consumption. More than 400 virtual communities have formed to share data, software and documented workflows so that a user can deploy them straight away, says Tim Smith, who oversees collaboration and information services at Zenodo. The repository launched in May 2013 at CERN, Europe's particle-physics laboratory near Geneva, Switzerland.

Box 1: Open-data pro tips

Scientists who are cautious about open science can start small by sharing data for a project that they have already completed. Specialists in the field offer this advice:

Document a data-deposition plan while working on publications, so that the data and the paper will be ready for publication at the same time. It is not necessary, however, to release data alongside a paper, unless a funder mandates it.

Craft a very explicit statement about data reuse — including who can use the data, how to use them and how to attribute them.

Machine-readable data will be most easily combined with other data sets. Avoid proprietary data formats, such as Microsoft spreadsheets, or colour-coded cells that are readable only by humans.

Permanently archive data in reputable repositories such as FigShare or Zenodo, not on a personal website.

If you choose to share data from a new project, make sure to generate the relevant metadata as you go. It is very hard to reconstruct important details after the fact. Tools such as those on Zenodo enable researchers to document such details throughout a project, so that all you have to do is flip a switch when you are ready to share.
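As a rough illustration of the tips in this box, the sketch below writes a small data set to a plain CSV file and stores the reuse statement and column descriptions in a JSON file beside it. It is one possible approach and not a procedure prescribed by any repository. The function name `write_deposit`, the file names, the column names, the sample values and the CC-BY-4.0 licence are all hypothetical choices made for this example. An explicit `quality_flag` column stands in for the kind of information that colour-coded spreadsheet cells would otherwise carry.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_deposit(out_dir, rows, columns, metadata):
    """Write rows to a plain CSV file plus a JSON metadata sidecar.

    All names here are illustrative; adapt them to your own project.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Plain-text CSV is machine-readable and not tied to one vendor's software.
    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # Metadata written alongside the data, so it is not reconstructed later.
    meta_path = out / "metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return data_path, meta_path


# Hypothetical example data: a flag column replaces colour-coded cells.
EXAMPLE_COLUMNS = ["site_id", "date", "ant_count", "quality_flag"]
EXAMPLE_ROWS = [
    {"site_id": "A1", "date": "2015-06-01", "ant_count": 42, "quality_flag": "ok"},
    {"site_id": "A2", "date": "2015-06-01", "ant_count": 17, "quality_flag": "suspect"},
]
EXAMPLE_METADATA = {
    "description": "Illustrative field counts (made-up values).",
    "columns": {
        "site_id": "Sampling site identifier",
        "date": "Sampling date, ISO 8601 (YYYY-MM-DD)",
        "ant_count": "Number of individuals counted",
        "quality_flag": "'ok' or 'suspect'; replaces cell colouring",
    },
    # Explicit reuse statement: who may use the data, how, and how to attribute.
    "reuse_statement": "Free to reuse with attribution; please cite the data set.",
    "license": "CC-BY-4.0",  # assumed licence, chosen only for illustration
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file, meta_file = write_deposit(
            tmp, EXAMPLE_ROWS, EXAMPLE_COLUMNS, EXAMPLE_METADATA
        )
        print(data_file.name, meta_file.name)
```

Keeping the metadata in a separate file that is updated throughout a project means both files can be uploaded together once the researcher decides to share.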

And although there is a time cost associated with uploading and organizing raw data, subsequent queries can often be averted by adding reader-friendly instructions at the start. Hogg recommends that researchers simultaneously upload tutorials and examples of how to use the content.

In the end, sharing data, software and materials with colleagues can help an early-career researcher to garner recognition — a crucial component of success. “The thing you are searching for is reputation,” says Titus Brown, a genomics researcher at the University of California, Davis. “To get grants and jobs, you have to be relevant and achieve some level of public recognition. Anything you do that advances your presence — especially in a larger sphere, outside the communities you know — is a net win.”

References

  1. Nosek, B. et al. Science 348, 1422–1425 (2015).
  2. Open Science Collaboration Science 349, 6251 (2015).
  3. Magee, A. F. et al. PLoS ONE 9, e110268 (2014).


Author information

Virginia Gewin is a freelance writer in Portland, Oregon.
