Everyone wants better ways to make research data available and to give more credit to the researchers who create and share data. But for a data set to be widely reusable, scientists need to know how the data were produced and what quality-control experiments were performed. They need access to detailed descriptions of the data outputs, file formats, sample identifiers and replication structure. This is hard work that is often poorly rewarded. As a result, potentially valuable data sets go unpublished, are not fully released to the public, or are not described in sufficient detail to permit reuse.
To address this need, Nature Publishing Group will next spring launch Scientific Data, an open-access, online-only journal for detailed descriptions of data sets (http://nature.com/scientificdata). This week, Scientific Data announced its first call for submissions (see go.nature.com/1gnd1j). The doors are now open for scientists to submit ‘Data Descriptor’ manuscripts — a new article type that is designed to describe scientifically valuable data sets in a way that will promote data sharing and reuse.
Data Descriptor articles are fully fledged, peer-reviewed scientific publications, and will be listed in major indexing services, thereby giving authors the credit they deserve for sharing their data and making them usable by others. All Data Descriptors will be released under a Creative Commons licence that allows researchers to reuse, redistribute and remix the articles’ content.
The format of the Data Descriptor includes ‘Technical Validation’ and ‘Usage Notes’ sections. These will allow authors to characterize the quality of the data and to provide advice on their reuse — valuable information that does not always fit into traditional research articles. And, as is the case in other Nature journals, the Methods section will have no length limit, giving authors space to provide detailed, reproducible descriptions of their experiments.
Data Descriptors will link to both related journal articles and data files stored at data repositories, helping readers to navigate easily between research, data descriptions and the actual data. And each Data Descriptor publication will be supported by machine-readable experimental metadata to help advanced users mine and search Scientific Data’s content. Metadata records will be curated by in-house staff to ensure consistent and useful annotation, and will be released in the ISA-Tab format (see S.-A. Sansone et al. Nature Genet. 44, 121–126; 2012).
Peer reviewers of Data Descriptors will focus on the technical rigour of the data-collection procedures, the completeness of the data and alignment with existing community standards. They will check that the data are indeed worth sharing, but will specifically be asked not to base their evaluations on the perceived impact or novelty of the findings associated with the data sets. Scientific Data’s editors have already conducted peer review of a small set of prototype Data Descriptor manuscripts, and have found that scientists adapt quickly to this different peer-review perspective.
What Scientific Data will not be is a new data repository. Rather, it will promote and cooperate with existing community-based repositories, and will combat data fragmentation by ensuring that data sets are deposited in an appropriate repository. Scientific Data is also working with figshare and Dryad, two repositories that accept a wide range of research data types. Integrated data upload is already available with figshare — authors may deposit their data as they submit their Data Descriptor manuscript. Editors and referees will be given secure, confidential access to the data files through the figshare website, and the data will be made public when the Data Descriptor is published.
Scientific Data will not be a place to publish new conclusions or hypothesis-driven analyses, and editors will ask authors to remove material that is beyond the journal’s scope. This will help to ensure that Data Descriptor publications can exist alongside and complement primary research articles. Authors may publish stand-alone Data Descriptors about data sets that have not been used in other publications, or Data Descriptors about data sets published elsewhere but for which a more in-depth description is merited.
Editors of Nature journals have agreed that prior publication of a Data Descriptor will not jeopardize publication of research articles, as long as those articles go beyond a descriptive analysis of the data and report major scientific findings. Scientific Data will initially focus on the life, biomedical and environmental sciences, but may in due course be open to a broader range of scientific disciplines.