The mantra of the nascent open-data movement — that scientists should share online all data underlying their findings — sounds simple. But it can be tough to achieve in practice. An informal audit of one of the movement’s biggest proponents, the Public Library of Science (PLOS), shows that not everyone is complying with the publisher’s pioneering open-data mandate, and hints at the challenges that journals can face in enforcing open-data goals.
The idea that the progress of research will be accelerated if others can easily and freely build on data sets is gaining currency. Last week the Bill & Melinda Gates Foundation in Seattle, Washington, announced that it would demand open data of the researchers it funds.
But whereas some research communities, such as geneticists and crystallographers, have long-established norms of open data, most funders and publishers (including Nature), mindful of researcher autonomy, merely exhort scientists to make their data open. Many surveys have found that scientists are worried about being scooped on future projects, or argue that they have signed agreements not to share their data.
So it was a step-change when in March PLOS made it a requirement that authors who publish in its journals share online all the data necessary to reproduce their studies. It was not the first publisher to convert encouragement into a mandate, but it was the largest.
The new policy piqued the curiosity of Tim Vines, managing editor of Molecular Ecology, one of the few journals apart from the PLOS family with an open-data mandate. Vines and his colleagues published a survey on the effectiveness of different open-data mandates in early 2013 (T. H. Vines et al. FASEB J. 27, 1304–1308; 2013), focusing on a subset of evolutionary biology papers that all used a free software package called STRUCTURE to map the genetic structure of populations on the basis of DNA profiling of individuals.
That study included 51 PLoS ONE papers, and found that just 6 of them had shared the data that went into the STRUCTURE study. In a new analysis, Vines found 20 papers that mentioned STRUCTURE and had been published since March 2014, including one that tracked different varieties of cotton plants in the Caribbean, and another that compared different populations of a particular sparrow across the southern United States. Eight of the new studies (40%) had shared the genotype data, meaning that a reader would be able to repeat their analysis. The remaining 60% of papers had not made their data available, even though each stated that “all data underlying the findings are fully available without restriction”, in accordance with PLOS’s policy (see ‘Free the data’).
Because the open-data mandate is new, and PLoS ONE publishes so many papers, some manuscripts do get published without all their data being made available, says the journal’s editorial director Damian Pattinson. PLoS ONE does perform internal checks for some data types, but in this case it would have been the job of external peer reviewers to check whether appropriate underlying data were available, he says. And although PLoS ONE was grateful for the reviewers’ help, “there is a learning curve here for all involved to understand what the data-sharing standards are for all disciplines and data types”. He adds that once someone complains, the journal has a system for investigating papers that do not comply with its open-data requirement.
The research teams in question, nine of whom responded to the Nature news team’s request for an explanation, provide an insight into why data sharing does not always happen in the first place. Some had forgotten to upload their data and promptly rectified the fault. Four teams said that the journal’s editors and referees had never asked them to share the underlying genetics data, suggesting confusion over what the policy means.
Others aired wider — and common — objections to online data sharing. Even though they had chosen to publish with PLOS, some authors said that they wanted to hold back their data for future studies, or did not want to share the raw data unless they knew future users’ intent.
Steve Simpson at the University of Exeter, UK, who reported work on Omani clownfish, said he was happy to share raw data privately with potential collaborators, but not to upload results from 400 individual fish that had taken great effort to collect. “The study is described so that it could be replicated by another expeditionary team who were willing to dive across Oman collecting rare fish under several hard-earned licences,” he wrote.
“There is a lot of inconsistency among fields as to what data are shared,” said another researcher, Sabrina Taylor at Louisiana State University in Baton Rouge, who conducted the sparrow study. She had not uploaded her genetics data because, she said, she was not aware of a public data repository for it.
Vines is optimistic about the prospects for open data. The PLOS mandate means the situation is “already better than it was”, he says. “At its core, the problem is author education”, he adds. Even at Molecular Ecology, which has been enforcing an open-data policy since 2011, “we still have to ask for additional data sets for about half of papers at the acceptance stage”.
It will take time to make PLOS’s policy clear and easy to comply with across all scientific fields, says Pattinson. “A complete culture shift will be further down the line.”