Every mountaineer knows the sinking feeling of reaching a peak after a hard climb, only to see the true summit still above. Scientists who take on the tough terrain of open access may have a similar experience. After they reach the notable goal of sharing their research papers, they discover that a higher summit awaits: open data.
In many fields, making research data available online for all is a step beyond making research papers open-access. This might puzzle communities that have already agreed to share. Biologists routinely upload DNA sequences to the public repository GenBank, for example, creating a scientific commons for everyone’s benefit. There are now more than 600 subject-specific repositories, with community-specific standards.
Yet even some of the most strident open-access supporters baulk at the concept of fully open data, judging by the reaction to a strengthened data-sharing policy instituted by the Public Library of Science (PLOS) this month. PLOS now requires researchers to make their papers’ underlying data open online on publication, apart from data that they have a duty to keep private, such as that on human study participants (go.nature.com/rd27aa). Journals such as Molecular Ecology have mandated the same thing for years. But the PLOS move has provoked heated discussion and highlighted some important, yet unsettled, aspects of the practice and ethics of online data-sharing.
A few years ago, a survey found that scientists cited a lack of time and money, as well as technical barriers, to explain why they did not post data online (C. Tenopir et al. PLoS ONE 6, e21101; 2011). It still takes time to prepare data, but increasingly, other excuses do not fly. General-purpose storage sites such as Dryad and figshare are cheap (or free) and suitable for all kinds of data sets; data journals provide publication venues appealing to the traditionally minded; and standards are emerging for citing other people’s data sets (see Nature 500, 243–245; 2013).
Harder to surmount are the feeling of data ownership and the fear of being ‘scooped’. Years of toil to collect a data set that might support a decade of career-making publications could be rendered moot when another researcher jumps on the information online. This is a particular problem for early-career researchers, and for those working with unique data sets in small ecology or environmental-science laboratories.
Behind this fear is the worry that other scientists will not give credit for the data they use. Research administrators place such importance on paper authorship that a mere citation is probably not enough when a study leans significantly on another researcher’s hard-won data set, particularly if the reuse deprives that researcher of a publication.
Communities need to debate the ethics of data-sharing and agree on etiquette. When a researcher relies on another’s data, for example, it should be standard practice to invite the data-providers to be co-authors. Ecologists Clifford Duke and John Porter have suggested guidelines for deciding whether to extend such an invitation (C. S. Duke and J. H. Porter BioScience 63, 483–489; 2013); these include noting whether the data are integral to the new analysis, whether the data are unique or particularly novel, and whether the data-provider can fully participate in manuscript-writing by approving draft and final versions. Another ecologist, Dominique Roche, has urged disclosure of data reuse, and better communication between data generators and reusers (D. G. Roche et al. PLoS Biol. 12, e1001779; 2014).
It is not clear whether widespread online data-sharing will increase uncredited scooping. For now, Nature mandates uploading data when structured community data repositories exist, and encourages it otherwise. Before you can climb the highest mountains, you need proper safeguards and a decent map.