Data Repository Guidance

Scientific Data mandates the release of datasets accompanying our Data Descriptors, but we do not ourselves host data. Instead, we ask authors to submit datasets to an appropriate public data repository. Data should be submitted to discipline-specific, community-recognized repositories where possible. Where a suitable discipline-specific resource does not exist, data should be submitted to a generalist repository.

Authors must deposit their data to a data repository as part of the manuscript submission process; manuscripts will not otherwise be sent for review. If data have not been deposited to a repository prior to manuscript submission, authors can upload their data to figshare or the Dryad Digital Repository during the submission process. Data may also be deposited to these resources temporarily, if the main host repository does not support confidential peer review.

Repositories must meet our requirements for data access (including anonymous access during peer review), preservation, resource stability, and suitability for use by all researchers with the appropriate types of data.

Data repositories should meet all of the following requirements:

  • Ensure long-term persistence and preservation of datasets (minimum of 5 years after publication)
  • Be supported by a research community or research institution
  • Provide deposited datasets with stable and persistent identifiers, e.g. DataCite, ISTIC or JaLC Digital Object Identifiers (DOIs)
  • Allow access to data without unnecessary restrictions. Scientific Data does not allow commercial reuse restrictions (learn more). Exceptions will only be permitted for human-derived data and should be discussed with the editorial team prior to manuscript submission.
  • Provide clear terms of data use and data access (or licence) on each dataset landing page
  • Facilitate anonymous reviewer access for embargoed data
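
As an informal illustration, the requirements above can be written as a small self-assessment checklist in Python. This is only a sketch: the `RepositoryProfile` class, its field names and the `unmet_requirements` function are hypothetical and are not part of any Scientific Data tool or submission system.

```python
from dataclasses import dataclass
from typing import List

MIN_PRESERVATION_YEARS = 5  # minimum stated in the requirements list above


@dataclass
class RepositoryProfile:
    """Hypothetical self-assessment record; field names are illustrative only."""
    name: str
    preservation_years: int
    community_or_institution_supported: bool
    issues_persistent_identifiers: bool
    unrestricted_access: bool
    terms_on_landing_page: bool
    anonymous_reviewer_access: bool


def unmet_requirements(repo: RepositoryProfile) -> List[str]:
    """Return the requirements the profile fails; an empty list means all are met."""
    checks = [
        (repo.preservation_years >= MIN_PRESERVATION_YEARS,
         "long-term preservation (at least 5 years)"),
        (repo.community_or_institution_supported,
         "support from a research community or institution"),
        (repo.issues_persistent_identifiers,
         "stable and persistent identifiers"),
        (repo.unrestricted_access,
         "access without unnecessary restrictions"),
        (repo.terms_on_landing_page,
         "clear terms of use on each landing page"),
        (repo.anonymous_reviewer_access,
         "anonymous reviewer access for embargoed data"),
    ]
    return [label for passed, label in checks if not passed]


# Example: an invented repository that lacks anonymous reviewer access.
candidate = RepositoryProfile(
    name="Example Institutional Repository",
    preservation_years=10,
    community_or_institution_supported=True,
    issues_persistent_identifiers=True,
    unrestricted_access=True,
    terms_on_landing_page=True,
    anonymous_reviewer_access=False,
)
print(unmet_requirements(candidate))
```

A repository is treated as suitable only when the returned list is empty, mirroring the statement that all of the requirements must be met.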

 
The list below is intended as a guide for those who are unsure where to deposit their data, and provides examples of repositories from a number of disciplines. The repositories on this list met all of the journal’s data hosting requirements at the time of listing. As of 2021, this list is no longer being expanded; authors may therefore use an alternative data repository, provided it meets the criteria stated above.

Please be aware, however, that some repositories may charge for hosting data.

Authors may also wish to use external resources such as DataCite’s Repository Finder and the FAIRsharing registry to find an appropriate repository for their data.


Biological sciences 

Nucleic acid sequence 

Novel DNA sequence, novel RNA sequence, and novel genome assembly data must be deposited to repositories that are part of the International Nucleotide Sequence Database Collaboration (INSDC) or to those which are working towards INSDC inclusion (as listed below), unless there are privacy or ethics restrictions that prevent open sharing of such data. These data may in addition be deposited to regional and national repositories as required. For human data that require special controls, please see our recommended health sciences repositories.

| Data types | Repository options | Data and metadata standards |
|---|---|---|
| Raw sequencing data (reads or traces); genome assemblies; annotated sequences; sample metadata | INSDC repositories; Genome Sequence Archive (GSA) | Browse data and metadata standards endorsed by the Genomic Standards Consortium |
| Genetic variation data | dbSNP (human variations less than 50 bp); dbVar (human variations greater than 50 bp); ClinVar (human genotype & phenotype); European Variation Archive (EVA) (all species); Genome Sequence Archive for Human (GSA-Human) | |
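
The size and species split among the genetic variation repositories can be sketched as a simple routing function. This is an illustration only: the function name and its return format are hypothetical, ClinVar and GSA-Human are left out because they are not distinguished by variant size, and the handling of variants of exactly 50 bp is an assumption, since the guidance above does not say which archive takes them.

```python
from typing import List

SIZE_THRESHOLD_BP = 50  # boundary between dbSNP and dbVar given in the guidance


def suggest_variation_archives(species: str, variant_length_bp: int) -> List[str]:
    """Suggest candidate archives for a variant, based only on species and size."""
    if variant_length_bp < 1:
        raise ValueError("variant_length_bp must be a positive integer")

    # EVA is listed as accepting all species; here it is suggested for non-human data.
    if species.strip().lower() not in {"human", "homo sapiens"}:
        return ["European Variation Archive (EVA)"]

    if variant_length_bp < SIZE_THRESHOLD_BP:
        return ["dbSNP"]
    if variant_length_bp > SIZE_THRESHOLD_BP:
        return ["dbVar"]
    # Exactly 50 bp is not covered by the guidance: return both (assumption)
    # so that the author can confirm with the repositories.
    return ["dbSNP", "dbVar"]


print(suggest_variation_archives("human", 12))
print(suggest_variation_archives("Mus musculus", 300))
```

Authors should still confirm the choice with the repository itself, as acceptance criteria may change.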

 

Protein sequence 

Molecular & supramolecular structure 

These repositories accept structural data for small molecules (COD); peptides and proteins (all); and larger assemblies (EMDB).

Small molecule crystallographic data should be uploaded to Dryad or figshare before manuscript submission, and should include a .cif file, a structural figure with probability ellipsoids, and structure factors for each structure. Both the structure factors and the structural output must have been checked using the IUCr's checkCIF routine, and a copy of the output must be included at submission, together with a justification for any alerts reported.

Neuroscience 

These data repositories all accept human-derived data (NeuroMorpho.org and G-Node also accept data from other organisms). Please note that human-subject data submitted to OpenfMRI must be de-identified.

Omics 

Functional genomics

Functional genomics is a broad experimental category, and Scientific Data's recommendations in this discipline likewise bridge disparate research disciplines. Data should be deposited following the relevant community requirements where possible.

Please refer to the MIAME standard for microarray data. Molecular interaction data should be deposited with a member of the International Molecular Exchange Consortium (IMEx), following the MIMIx recommendations.

For data linking genotyping and phenotyping information in human subjects, we strongly recommend submission to dbGaP, EGA or JGA, which have mechanisms in place to handle sensitive data.

Metabolomics & Proteomics

Metabolomics data should be submitted following the MSI guidelines.

We ask authors to submit proteomics data to members of the ProteomeXchange consortium (listed below), following the MIAPE recommendations.

Taxonomy & species diversity 

Mathematical & modelling resources 

Cytometry and Immunology 

Imaging 

Organism-focused resources 

These resources provide information specific to a particular organism or disease pathogen. They may accept phenotype information, sequences, genome annotations and gene expression patterns, among other types of data. Incorporating data into these resources can be very valuable for promoting reuse within these specific communities; however, where applicable, we ask that data records be submitted both to a community repository and to one suitable for the type of data (e.g. transcriptome profiling; please see above).

Health sciences 

Some of the repositories in this section are suitable for datasets requiring restricted data access, which may be required for the preservation of study participant anonymity in clinical datasets. We suggest contacting repositories directly to determine those with data access controls best suited to the specific requirements of your study.

Chemistry and Chemical biology 

Earth, Environmental and Space sciences 

Broad scope Earth & environmental sciences 

Astronomy & planetary sciences 

Biogeochemistry and Geochemistry 

Climate sciences 

Ecology 

Geomagnetism & Palaeomagnetism 

Ocean sciences 

Solid Earth sciences 

Physics 

Materials science 

Social sciences 

Generalist repositories 

Scientific Data encourages authors to archive data to one of the above data-type specific repositories where possible. Where a data-type specific repository is not available, the following generalist repositories might be suitable. Generalist repositories may also be appropriate for archiving associated analyses, or experimental-control data, supplementing the primary data in a discipline-specific repository.

The generalist repositories listed below are able to accept data from all researchers, regardless of location or funding source. If your institution has its own generalist data repository, it can be used to host your data as long as the repository is able to mint DataCite DOIs and allows data to be shared under open terms of use (for example, the CC0 waiver). Please note that if your chosen repository is unable to support confidential peer review, you will be asked to temporarily deposit a copy of the dataset to one of our integrated generalist repositories to facilitate review of your article. Upon completion of peer review, the temporary copy will be erased. To use a repository which does not appear in the manuscript submission system, select 'DataCite DOI' as the repository name during the submission process.

| Repository name | Information on fees/costs | Size limits | Integrated with Scientific Data's manuscript submission system | re3data / FAIRsharing entry |
|---|---|---|---|---|
| Dryad Digital Repository | $120 USD for first 20 GB, and $50 USD for each additional 10 GB | None stated | Yes ✔ | view FAIRsharing entry |
| figshare | 100 GB free per Scientific Data manuscript. Additional fees apply for larger datasets | 1 TB per dataset | Yes ✔ - To qualify for the 100 GB of free storage, data must be uploaded to figshare via our submission system. Download instructions. | view FAIRsharing entry |
| Harvard Dataverse | Contact repository for datasets over 1 TB | 2.5 GB per file, 10 GB per dataset | No | view re3data entry |
| Open Science Framework | Free of charge | 5 GB per file, multiple files can be uploaded | No | view FAIRsharing entry |
| Zenodo | Donations towards sustainability encouraged | 50 GB per dataset | No | view re3data entry |
| Science Data Bank | Free of charge | 8 GB per file, no limit to dataset size | No | view FAIRsharing entry |
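
To show how the Dryad fee listed above adds up for a given dataset size, the following Python sketch estimates the charge. It assumes that any partial 10 GB increment beyond the first 20 GB is billed as a full increment, which the table does not state; the function name is hypothetical, and the figures are those listed here rather than a current price quote from Dryad.

```python
import math

BASE_FEE_USD = 120       # covers the first 20 GB, as listed above
BASE_SIZE_GB = 20
EXTRA_FEE_USD = 50       # per additional 10 GB, as listed above
EXTRA_BLOCK_GB = 10


def estimate_dryad_fee_usd(size_gb: float) -> int:
    """Estimate the Dryad fee in USD for a dataset of the given size in GB.

    Assumption: a partial additional 10 GB block is charged as a whole block.
    """
    if size_gb < 0:
        raise ValueError("size_gb must not be negative")
    if size_gb <= BASE_SIZE_GB:
        return BASE_FEE_USD
    extra_blocks = math.ceil((size_gb - BASE_SIZE_GB) / EXTRA_BLOCK_GB)
    return BASE_FEE_USD + EXTRA_FEE_USD * extra_blocks


for size in (5, 20, 21, 45):
    print(size, "GB:", estimate_dryad_fee_usd(size), "USD")
```

Under these assumptions a 45 GB dataset needs three additional blocks, giving 120 + 3 × 50 = 270 USD. Authors should confirm actual charges with the repository before depositing.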

 

* Curated resource which may not accept direct submission of data. Contact the database directly for further information.