Data Descriptors, Scientific Data's primary article type, describe datasets. These datasets must be made available to editors and referees at the time of submission and shared with the scientific community at final publication. Here, we provide recommendations for selecting a suitable repository and for archiving sensitive data.
Scientific Data’s data policies are compatible with the standardised research data policies set out by Springer Nature, and the requirements of the Data Policy Standardisation and Implementation Interest Group of the Research Data Alliance.
Please read on for our data deposition policies, and please contact us if you would like additional advice on how best to meet these requirements for your own data.
- Selecting a repository
- At initial submission
- At publication
- Human Data
- Challenging data-types
- Data preservation
- Dataset updates
- Data repository criteria
- Data citation
- Data management plans
Data deposition policy
It is Scientific Data's policy that all datasets central to a Data Descriptor manuscript, including computational or curated data as well as data produced by an experimental or observational procedure, should be submitted to an appropriate external repository. We believe this is the best means of making these data discoverable, reproducible and reusable, and we work with our authors to identify the most appropriate location(s) for their data.
Authors should provide their data at a level of 'rawness' that allows it to be re-used, in alignment with accepted norms within their community. It may be advantageous to release some types of data at multiple levels to enable wider reuse. For example, proteomics data may best be released as ‘raw’ spectra as well as more processed peptide- or protein-level data. However, this is not mandated as long as the level of 'rawness' is sufficient for some potential use. Authors may submit supplementary information files, including code (also see our code availability policy), models, workflows and summary tables. However, we strongly encourage deposition in repositories as a first preference, especially for primary data, which should not be submitted as supplementary information.
Selecting a repository
Although Scientific Data mandates the release of the datasets that accompany our manuscripts, we do not ourselves host data. Instead, we encourage submission of data to community-recognized data repositories where possible, and recommend deposition to a generalist repository if no community resource is available. If the most appropriate community repository does not support confidential peer-review, Scientific Data can help authors host their data temporarily in one of our integrated generalist data repositories.
Authors often present more than one kind of data in a Data Descriptor. As our articles link directly to each data record, publishing with us can facilitate the linking together of disparate types of data, and in such cases we recommend that each dataset be archived in the resource to which it is best suited. For example, raw and derived gene-expression data might best be deposited in ArrayExpress or GEO; however, the results of a principal component analysis, control qPCRs or blots, or associated phenotypic assays could be stored in a generalist repository.
Please note that archiving data on personal or laboratory websites is not sufficient for submission of a Data Descriptor.
At initial submission
Authors must deposit their data in an approved data repository as part of the manuscript submission process; manuscripts will not otherwise be sent to review. If data have not already been uploaded to a repository, authors may upload files to figshare or the Dryad Digital Repository during submission of their Data Descriptor manuscript through Scientific Data’s integrated submission system.
If datasets are not open to the public at the time of submission, authors must provide secure links and/or passcodes so that referees may access and evaluate the data in a confidential manner. Authors should not provide their own personal login details. If the most appropriate repository for their data does not support confidential peer-review, authors should upload datasets to figshare or the Dryad Digital Repository for the duration of the peer-review process. Should the manuscript be accepted, the data should be made available from the more suitable resource immediately prior to final publication.
At publication
It is a condition of publication that authors deposit their data in an appropriate repository, and agree to make the data publicly available without restriction, excepting reasonable controls related to human privacy or biosafety.
During the peer-review process, Editors, Editorial Board Members and referees are asked to evaluate whether the data repository or repositories selected by the authors are appropriate, and may deem it necessary for authors to archive their data in additional repositories prior to publication.
Human Data
Scientific Data is primarily focused on publishing descriptions of datasets that can be openly shared without restriction. We understand that data on humans often cannot be shared openly due to the need to protect privacy and ensure ethical use. Scientific Data can only consider submissions on sensitive datasets of this kind under specific conditions. The data must be hosted in a suitable repository and there must be a clear way for our peer-reviewers to evaluate the data anonymously.
Authors who are submitting a human-derived dataset must complete our Human Data Checklist and provide it as a supplementary file during the submission process. The Usage Notes section of the Data Descriptor must include information on how researchers may apply for data access and the conditions under which it will be granted, explicitly noting any limitations on access or reuse. Co-authorship and/or research collaboration should not be prerequisites for data access.
Any submissions that include data on humans, living or deceased, including consumer data or data from donated tissues, must state whether the individuals provided consent for data collection and sharing, and must describe the consent or opt-in/out process in their methods section. Authors who obtained data or tissues from third-parties are responsible for obtaining this information from the provider. If data are collected from or about minors, consent must be provided by a parent or legal guardian. Authors must ensure that their study and data release plan comply with all relevant ethical and legal requirements.
Datasets that include direct identifiers (e.g. names), or three or more indirect identifiers, will not be considered by the journal unless participants provided informed consent for their identities to be shared. Exceptions to this policy may only be made if the data release has been approved by an appropriate institutional review board and the data are shared through a suitable controlled-access data repository. For more information on what may constitute indirect identifiers, please see here.
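The identifier rule above can be read as a small decision procedure. The following Python sketch is illustrative only and is not the journal's screening process: the class, field and function names are hypothetical, and the result indicates only whether the stated preconditions are met, since the final decision rests with the editors.

```python
from dataclasses import dataclass


@dataclass
class HumanDataset:
    """Hypothetical summary of a human-derived dataset (illustrative fields only)."""
    direct_identifiers: int             # count of direct identifiers, e.g. names
    indirect_identifiers: int           # count of indirect identifiers
    consent_to_share_identity: bool     # informed consent for identities to be shared
    irb_approved_release: bool          # release approved by an institutional review board
    controlled_access_repository: bool  # shared through a controlled-access repository


def is_identifiable(ds: HumanDataset) -> bool:
    """Any direct identifier, or three or more indirect identifiers."""
    return ds.direct_identifiers > 0 or ds.indirect_identifiers >= 3


def meets_preconditions(ds: HumanDataset) -> bool:
    """True if the stated preconditions for consideration are satisfied.

    This is a necessary-conditions check, not a guarantee of consideration.
    """
    if not is_identifiable(ds):
        return True
    if ds.consent_to_share_identity:
        return True
    # Possible exception: IRB approval together with controlled access.
    return ds.irb_approved_release and ds.controlled_access_repository
```

For instance, under these assumptions a dataset with three indirect identifiers and no consent to share identities would meet the preconditions only if both the review-board approval and the controlled-access repository flags are set.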
Challenging data-types
Certain datasets, particularly very large ones, pose inherent technological challenges to sharing over the internet. For data exceeding a few gigabytes, we encourage you to correspond with individual repositories prior to submitting a Data Descriptor to ensure your data can be uploaded and shared effectively.
Data preservation
Scientific Data believes that researchers, institutions, journals and data repositories have a shared responsibility to ensure long-term data preservation, and we encourage authors to select data repositories with this goal in mind.
Authors must commit to preserving their datasets, on their own laboratory or institutional servers, for at least five years after publishing a Data Descriptor. If, during that time, the repository to which the data were originally submitted disappears or experiences data loss, we may ask the authors to upload the data to another repository and publish a correction or update to the original Data Descriptor.
If authors remove their data from the original public repository or change access criteria in a manner that is inconsistent with the published Data Descriptor, we may ask authors to publish a correction or, in extreme circumstances, to retract the Data Descriptor publication.
Dataset updates
Scientific Data understands that important datasets often grow and evolve, and we are glad to work with authors to ensure that datasets can be updated while also maintaining a stable version of the data as published, in line with our data preservation requirements. One way to achieve this is to deposit a static version of the data to an appropriate repository, while hosting in parallel a dynamic version in a project-specific resource, allowing users to find the latest data. Both versions of the dataset should be described in the Data Records section of the manuscript. Some of the repositories we work with have well-developed versioning or update systems, which may satisfy our data preservation requirements without the need for a separate static version of the dataset. We encourage authors to discuss the versioning mechanisms available with the maintainers of the repository they have chosen to host their data.
Scientific Data is also glad to consider manuscripts that provide updates on important datasets previously published at Scientific Data or other journals – especially if a new project milestone is reached or if there are important changes in the data collection methods that merit a new publication. New Data Descriptors should refer to common datasets via formal data citation in their reference sections.
Data repository criteria
Authors should take the following criteria into account when selecting a repository. Suitable repositories should:
- Ensure long-term persistence and preservation of datasets in their published form
- Provide stable identifiers for submitted datasets (DOIs in most cases)
- Allow public access to data without barriers, such as logins or paywalls
- Support open licences (CC0 and CC-BY, or their equivalents, are required in most cases)
- Provide for confidential review of submitted datasets without the requirement for reviewers to provide identifying information
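The criteria listed above can be treated as a simple checklist when comparing candidate repositories. The Python sketch below is one possible way to record such a check; the dictionary keys and the `unmet_criteria` helper are hypothetical names chosen for illustration, and the example profile is invented rather than a description of any real repository.

```python
# Criteria paraphrased from the list above; the short keys are illustrative.
CRITERIA = {
    "persistence": "Long-term persistence and preservation of datasets in their published form",
    "stable_identifiers": "Stable identifiers for submitted datasets (DOIs in most cases)",
    "public_access": "Public access to data without barriers, such as logins or paywalls",
    "open_licences": "Support for open licences (CC0 and CC-BY, or their equivalents)",
    "confidential_review": "Confidential review without reviewers providing identifying information",
}


def unmet_criteria(profile):
    """Return the keys of criteria that the profile does not satisfy.

    `profile` maps criterion keys to booleans; missing keys count as unmet.
    """
    return [key for key in CRITERIA if not profile.get(key, False)]


# Invented example: a repository that lacks confidential review.
example = {
    "persistence": True,
    "stable_identifiers": True,
    "public_access": True,
    "open_licences": True,
}
print(unmet_criteria(example))  # ['confidential_review']
```

An empty result would suggest that a repository meets every listed criterion, although editors and referees still evaluate repository choice during peer review.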
When relevant, we encourage authors to review and follow appropriate reporting standards for their field or data-type, and to consult our repositories page for more information.
Data citation
Authors are required to formally cite any datasets stored in external repositories that are mentioned within their manuscript, including the main datasets that are the focus of the submission, as well as any other datasets that have been used in the work. For previously published datasets, we ask authors to cite both the related research articles and the datasets themselves. Appropriate citation of data is checked and enforced by Scientific Data staff prior to publication. For more information on how to cite datasets in submitted manuscripts, please see our Submission Guidelines.
In publications at other journals, we ask researchers to cite Data Descriptors using traditional literature references, and to additionally cite any datasets used, where the journal supports data citations.
All Springer Nature journals, including Scientific Data, are participants in the Initiative for Open Citations. As such, data citations are included in full in the formal reference list, exported to Crossref and are openly available.
Further information on the journal’s policy and approach to data citation can be found in the following editorials: