Data Descriptors, Scientific Data's primary article type, describe scientifically valuable datasets. These datasets must be made available to editors and referees at the time of submission, and must be shared with the scientific community as a condition of publication. Here, we provide information on the types of data that should be archived, guidance for authors on selecting a suitable repository for their data, and how to archive sensitive data.
Scientific Data’s data policies are compatible with the standardised research data policies set out by Springer Nature.
Please read on for our data deposition policies, and please contact us if you would like additional advice on how best to meet these requirements for your own data.
- Selecting a repository
- At initial submission
- At publication
- Data derived from experiments on animals or human subjects
- Challenging data-types
- Data preservation
- Dataset updates
- Data repository criteria
- Suggesting additional repositories
- Data citation
Data deposition policy
It is Scientific Data's policy that all datasets central to a Data Descriptor manuscript, including computational or curated data as well as data produced by an experimental or observational procedure, should be submitted to an appropriate external repository. We believe this is the best means of making these data discoverable, reproducible and reusable, and we work with our authors to identify the most appropriate location(s) for their data.
Authors should provide their data in the 'rawest' form that will permit substantial reuse. It may be advantageous to release some types of data at multiple levels to enable their broadest reuse – for example, proteomics data may best be released as ‘raw’ spectra as well as more processed peptide- or protein-level data. Authors may also submit supplementary information files – including code (also see our code availability policy), models, workflows and summary tables – but primary data should not be submitted as supplementary information.
Scientific Data's data availability policies are compatible with the standardised research data policies set out by Springer Nature.
Selecting a repository
Although Scientific Data mandates the release of the datasets that accompany our manuscripts, we do not ourselves host data. Instead, we encourage submission of data to community-recognized data repositories where possible, and recommend deposition to a generalist repository if no community resource is available. If the most appropriate community repository does not support confidential peer-review, Scientific Data can help authors host their data temporarily in one of our integrated generalist data repositories.
Authors often present more than one kind of data in a Data Descriptor. As our articles link directly to each data record, publishing with us can facilitate the linking together of disparate types of data, and in such cases we recommend that each dataset be archived in the resource to which it is best suited. For example, raw and derived gene-expression data might best be deposited in ArrayExpress or GEO; however, the results of a principal component analysis, control qPCRs or blots, or associated phenotypic assays could be stored in a generalist repository.
Please note that archiving data on personal or laboratory websites is usually not sufficient for submission of a Data Descriptor.
At initial submission
Authors must deposit their data in an approved data repository as part of the manuscript submission process; manuscripts will not otherwise be sent to review. If data have not already been uploaded to a repository, authors may upload files to figshare or the Dryad Digital Repository during submission of their Data Descriptor manuscript through Scientific Data’s integrated submission system.
If datasets are not open to the public at the time of submission, authors must provide secure links and/or passcodes so that referees may access and evaluate the data in a confidential manner. Authors should not provide their own personal login details. If the most appropriate repository for their data does not support confidential peer-review, authors should upload datasets to figshare or the Dryad Digital Repository for the duration of the peer-review process. Should the manuscript be accepted, the data should be made available from the more suitable resource immediately prior to final publication.
At publication
It is a condition of publication that authors deposit their data in an appropriate repository, and agree to make the data publicly available without restriction, excepting reasonable controls related to human privacy or biosafety.
During the peer-review process, Editors, Editorial Board Members and referees are asked to evaluate whether the data repository or repositories selected by the authors are appropriate, and may deem it necessary for authors to archive their data in additional repositories prior to publication.
Data derived from experiments on animals or human subjects
Scientific Data asks authors to report experiments on living organisms according to the policies laid out by the Nature-titled journals.
Authors are expected to describe in detail any controls or limitations on access to or usage of human data in the Usage Notes section of the Data Descriptor manuscript. The process by which researchers may apply for access to the data, and the conditions under which such access may be granted, should similarly be described. Referees with expertise in human-subject research and human privacy will be asked to assess the feasibility of these controls. In general, co-authorship or research collaboration should not be prerequisites for data access.
Authors are required to provide referees with access to the data in a confidential manner, even if the data cannot be fully released to the public. Please contact us at firstname.lastname@example.org before submitting if special controls or authorization would be required to allow referees to access your data.
Challenging data-types
Certain datasets, particularly very large ones, pose inherent technological challenges to sharing over the internet. For data exceeding a few gigabytes, we encourage you to correspond with individual repositories prior to submitting a Data Descriptor to ensure your data can be uploaded and shared effectively.
Data preservation
Scientific Data believes that researchers, institutions, journals and data repositories have a shared responsibility to ensure long-term data preservation, and we encourage authors to select data repositories with this goal in mind.
Authors must commit to preserving their datasets, on their own laboratory or institutional servers, for at least five years after publishing a Data Descriptor. If, during that time, the repository to which the data were originally submitted disappears or experiences data loss, we may ask the authors to upload the data to another repository and publish a correction or update to the original Data Descriptor.
If authors remove their data from the original public repository or change access criteria in a manner that is inconsistent with the published Data Descriptor, we may ask authors to publish a correction or, in extreme circumstances, to retract the Data Descriptor publication.
Dataset updates
Scientific Data understands that important datasets often grow and evolve, and we are glad to work with authors to ensure that datasets can be updated while also maintaining a stable version of the data as published, in line with our data preservation requirements. One way to achieve this is to deposit a static version of the data in an appropriate repository, while hosting in parallel a dynamic version in a project-specific resource, allowing users to find the latest data. Both versions of the dataset should be described in the Data Records section of the manuscript. Some of the repositories we work with have well-developed versioning or update systems, which may satisfy our data preservation requirements without the need for a separate static version of the dataset. We encourage authors to discuss the available versioning mechanisms with the maintainers of the repository they have chosen to host their data.
Scientific Data is also glad to consider manuscripts that provide updates on important datasets previously published in Scientific Data or other journals, especially if a new project milestone is reached or if there are important changes in the data collection methods that merit a new publication. New Data Descriptors can be formally linked to previous related publications, and can refer to common datasets through their Data Citations sections.
Data repository criteria
Scientific Data bases its repository recommendations on the following criteria. Trusted data repositories should:
- Be broadly supported and recognized within their scientific community
- Ensure long-term persistence and preservation of datasets in their published form
- Provide expert curation
- Implement relevant, community-endorsed reporting requirements
- Allow anonymous referees to access data before public release
- Provide stable identifiers for submitted datasets
- Allow public access to data without unnecessary restrictions
These are general criteria, and not absolute standards: recommended repositories need not meet every criterion.
When relevant, we encourage authors to review and follow appropriate reporting standards for their field or data-type. Authors may browse a list of registered reporting standards for the life sciences at the BioSharing website, or may refer to our repositories page for more information.
Suggesting additional repositories
If you would like to suggest an unlisted data repository for inclusion on this list, please ask the repository managers to complete our repository questionnaire and email this to email@example.com. The aim of this evaluation process is to identify repositories which are able to serve the wider community, and so are appropriate for Scientific Data to recommend to authors. However, we are glad to consider Data Descriptor manuscripts on data stored in project-specific or community-specific repositories on a case-by-case basis, even if the host repository is not selected for listing on our recommended repository list. Therefore please be aware that a decision not to list a repository as recommended should not be interpreted as a comment on the quality and utility of a repository for the immediate community it serves.
We also encourage life-science and biomedical repositories and communities to register their services and reporting standards at BioSharing. Physics, astrophysics, astronomy and geoscience databases should be registered with re3data.org. Please see our related blog post for more information on how your community or data service can become involved with Scientific Data.
Data citation
Data Descriptors include formal data citations, which help track data reuse and credit scientists for sharing their data. Data citations should be used to reference any datasets stored in external repositories that are mentioned within a manuscript, including the main datasets that are the focus of a Data Descriptor, and any other datasets that have been used in the study. For previously published datasets, we ask authors to cite both the related research articles and the actual datasets. For more information on how to provide data citations in submitted manuscripts, please see the manuscript templates in our Submission guidelines.
In publications at other journals, we ask researchers to cite Data Descriptors using traditional literature references, and to additionally cite any datasets used, where the journal supports data citations.