Data Repository Guidance
Scientific Data requires the release of the datasets accompanying its Data Descriptors, but we do not host data ourselves. Instead, we ask authors to submit datasets to an appropriate public data repository. Data should be submitted to discipline-specific, community-recognized repositories where possible; where no suitable discipline-specific resource exists, data should be submitted to a generalist repository.
Authors must deposit their data in a data repository as part of the manuscript submission process; otherwise, manuscripts will not be sent for review. If data have not been deposited in a repository before manuscript submission, authors can deposit them at figshare or Dryad during submission through our article submission platform. Data may also be deposited in these resources temporarily if the main host repository does not support confidential peer review (see below).
Repositories must meet our requirements for anonymous peer review, data access, preservation, resource stability and licensing, and must be suitable for use by all researchers with the appropriate types of data:
- Use open licences (CC0 and CC-BY, or their equivalents, are required in most cases). Exceptions are permitted only for sensitive human-derived data (e.g. where there is a risk of participant identification, or where there are controls on specific uses), for which we suggest sharing under Data Usage Agreements (DUAs). We do not typically support more restrictive CC licences, containing SA, NC or ND clauses, for either sensitive or non-sensitive datasets, except where they apply to re-used third-party data whose original licence must be retained.
- Allow public access to data without barriers such as formal application processes, unless the data are sensitive human datasets that require controlled access and Data Usage Agreements. For non-sensitive datasets, basic logins that capture data for analytics purposes only are accepted, provided that access is granted immediately to the holder of the email address without manual checks. In most cases, however, we encourage login-free HTTPS access without registration.
- All data must be available for peer review. Where logins or other barriers are required or temporarily applied, routes for confidential peer review of submitted datasets must be provided that do not reveal the reviewer's identity to the data owner or to the authors of the associated article. Please consult the repository to arrange this, or provide the data in a temporary location for peer review.
- Ensure long-term persistence and preservation of datasets in their published form. Every Data Descriptor must remain associated with live data; without long-term preservation, a correction or other action may later be needed to protect the integrity of the paper.
- Provide stable persistent identifiers for submitted datasets. DOIs are the default for most non-omics datasets described in the journal.
- Subject-specific repositories that are supported and recognized within their scientific community are strongly encouraged. Generalist repositories should be used where no suitable subject-specific repository is available, or where such a repository does not meet the requirements above.
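To make the licence requirement concrete, the sketch below triages a licence identifier against it. It is a hypothetical helper written for illustration, not a tool provided by the journal: the function name `triage_licence`, its return labels, and the use of SPDX-style identifiers such as `CC-BY-4.0` or `CC0-1.0` are all assumptions. Licences that might count as "equivalent" to CC0 or CC-BY are deliberately left for human review rather than guessed.

```python
# Hypothetical triage helper (not an official journal tool). It paraphrases
# the licence requirement above and assumes SPDX-style identifiers.
RESTRICTIVE_CLAUSES = {"SA", "NC", "ND"}


def triage_licence(licence: str, *, sensitive_human_data: bool = False,
                   retained_third_party_licence: bool = False) -> str:
    """Return 'accepted', 'usually not accepted' or 'review manually'."""
    name = licence.strip().upper()
    tokens = set(name.split("-"))

    # Data Usage Agreements are the suggested route only for sensitive human data.
    if name == "DUA":
        return "accepted" if sensitive_human_data else "usually not accepted"

    # SA/NC/ND clauses are tolerated only when inherited from re-used third-party data.
    if tokens & RESTRICTIVE_CLAUSES:
        return "accepted" if retained_third_party_licence else "usually not accepted"

    # CC0 and CC-BY (any version) are the default open licences.
    if "CC0" in tokens or {"CC", "BY"} <= tokens:
        return "accepted"

    # Anything else may be an equivalent open licence; that needs a human judgement.
    return "review manually"


for lic in ("CC0-1.0", "CC-BY-4.0", "CC-BY-NC-4.0", "ODC-By-1.0"):
    print(lic, "->", triage_licence(lic))
```

The final answer always rests with the journal's editors and the licence bullet above; the helper only flags the obvious cases.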
The list below is intended as a guide for those who are unsure where to deposit their data, and gives examples of repositories from a number of disciplines. It is not a formal or exclusive list of repositories accepted by the journal; many more repositories meet our criteria than we are able to track. The list has not been updated since 2021 but is retained as a set of useful suggestions.
Authors may also wish to use external resources such as DataCite’s Repository Finder and the FAIRsharing registry to find an appropriate repository for their data.
Please note that certain data types (e.g. most omics and crystallographic data) are subject to mandates on which repository should be used. Please see our policy on mandated data types for further information.
- Biological sciences: Nucleic acid sequence; Protein sequence; Molecular & supramolecular structure; Neuroscience; Omics; Taxonomy & species diversity; Mathematical & modelling resources; Cytometry and Immunology; Imaging; Organism-focused resources
- Health sciences
- Chemistry and Chemical biology
- Earth, Environmental and Space sciences: Broad scope Earth & environmental sciences; Astronomy & planetary sciences; Biogeochemistry and Geochemistry; Climate sciences; Ecology; Geomagnetism & Palaeomagnetism; Ocean sciences; Solid Earth sciences
- Materials science
- Social sciences
- Generalist repositories
Biological sciences
Nucleic acid sequence
Novel DNA sequence, novel RNA sequence and novel genome assembly data must be deposited in repositories that are part of the International Nucleotide Sequence Database Collaboration (INSDC), or in those working towards INSDC inclusion (as listed below), unless privacy or ethics restrictions prevent open sharing of such data. These data may also be deposited in regional and national repositories as required. For human data that require special controls, please see our recommended health sciences repositories.
|Data types||Repository options||Data and metadata standards|
Raw sequencing data (reads or traces)
|Browse data and metadata standards endorsed by the Genome Standards Consortium|
|Genetic variation data||
dbSNP (human variations less than 50bp)
Protein sequence
Molecular & supramolecular structure
These repositories accept structural data for small molecules, for peptides and proteins (all of these repositories), and for larger assemblies (EMDB).
Small-molecule crystallographic data should be uploaded to Dryad or figshare before manuscript submission and should include a .cif file and structure factors for each structure. Both the structure factors and the structural output must have been checked using the IUCr's checkCIF routine, and a copy of the output must be included at submission, together with a justification for any alerts reported.
Neuroscience
These data repositories all accept human-derived data (NeuroMorpho.org and G-Node also accept data from other organisms). Please note that human-subject data submitted to OpenNeuro (formerly OpenfMRI) must be de-identified.
|NeuroMorpho.org||view FAIRsharing entry|
|OpenNeuro (formerly OpenfMRI)||view FAIRsharing entry|
|G-Node||view FAIRsharing entry|
|Neuroimaging Informatics Tools and Resources Collaboratory (NITRC)||view FAIRsharing entry|
|EBRAINS||view FAIRsharing entry|
Omics
Functional genomics is a broad experimental category, and Scientific Data's recommendations in this area likewise bridge disparate research disciplines. Data should be deposited following the relevant community requirements where possible.
Please refer to the MIAME standard for microarray data. Molecular interaction data should be deposited with a member of the International Molecular Exchange Consortium (IMEx), following the MIMIx recommendations.
For data linking genotype and phenotype information in human subjects, we strongly recommend submission to dbGaP, EGA or JGA, which have mechanisms in place to handle sensitive data.
Metabolomics & Proteomics
Metabolomics data should be submitted following the Metabolomics Standards Initiative (MSI) guidelines.
|MassIVE||view FAIRsharing entry|
|MetaboLights||view FAIRsharing entry|
|PeptideAtlas||view FAIRsharing entry|
|PRIDE||view FAIRsharing entry|
|Panorama Public||view FAIRsharing entry|
Taxonomy & species diversity
Mathematical & modelling resources
|BioModels Database||view FAIRsharing entry|
|Kinetic Models of Biological Systems (KiMoSys)||view FAIRsharing entry|
|The Network Data Exchange (NDEx)||view FAIRsharing entry|
Cytometry and Immunology
Organism-focused resources
These resources provide information specific to a particular organism or disease pathogen. They may accept phenotype information, sequences, genome annotations and gene expression patterns, among other types of data. Incorporating data into these resources can be very valuable for promoting reuse within these specific communities; however, where applicable, we ask that data records be submitted both to a community repository and to one suitable for the type of data (e.g. transcriptome profiling; please see above).
Health sciences
Some of the repositories in this section are suitable for datasets requiring restricted data access, which may be required for the preservation of study participant anonymity in clinical datasets. We suggest contacting repositories directly to determine those with data access controls best suited to the specific requirements of your study.
Chemistry and Chemical biology
|ioChem-BD Computational Chemistry Datasets||view re3data entry|
|NCBI PubChem BioAssay||view FAIRsharing entry|
|NCBI PubChem Substance||view FAIRsharing entry|
|Beilstein-Institut, STRENDA||view FAIRsharing entry|
Earth, Environmental and Space sciences
Broad scope Earth & environmental sciences
Astronomy & planetary sciences
Biogeochemistry and Geochemistry
|EarthChem||view re3data entry|
|Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)||view re3data entry|
Climate sciences
Ecology
|TERN Data Discovery Portal||view FAIRsharing entry|
|Environmental Data Initiative (formerly LTER Network Information System Data Portal)||view re3data entry|
|Global Biodiversity Information Facility (GBIF)||view FAIRsharing entry|
|KNB: The Knowledge Network for Biocomplexity||view FAIRsharing entry|
Geomagnetism & Palaeomagnetism
Ocean sciences
|Australian Antarctic Data Centre (AADC)||view re3data entry|
|Australian Ocean Data Network (DOIs assigned to deposited data only on request)||view re3data entry|
|Marine Data Archive|
|Marine Geosciences Data System||view re3data entry|
|SEANOE||view FAIRsharing entry|
Solid Earth sciences
Materials science
|NOMAD Repository||view FAIRsharing entry|
|Materials Cloud||view FAIRsharing entry|
|MPContribs||view re3data entry|
Social sciences
|Archaeology Data Service||view re3data entry|
|Harvard Dataverse||view re3data entry|
|ICPSR||view re3data entry|
|Open Science Framework||view FAIRsharing entry|
|Qualitative Data Repository||view FAIRsharing entry|
|UK Data Service||view re3data entry|
Generalist repositories
Scientific Data encourages authors to archive data in one of the data-type-specific repositories above where possible. Where no data-type-specific repository is available, the following generalist repositories may be suitable. Generalist repositories may also be appropriate for archiving associated analyses or experimental-control data that supplement the primary data held in a discipline-specific repository.
|Repository name||Information on fees/costs||Size limits||Integrated with Scientific Data's manuscript submission system||re3data / FAIRsharing entry|
|Dryad Digital Repository||$120 USD for the first 20 GB, and $50 USD for each additional 10 GB||None stated||Yes ✔||view FAIRsharing entry|
|figshare||100 GB free per Scientific Data manuscript||1 TB per dataset||Yes ✔ (to qualify for the 100 GB of free storage, data must be uploaded to figshare via our submission system)||view FAIRsharing entry|
|Harvard Dataverse||Contact repository for datasets over 1 TB||2.5 GB per file, 10 GB per dataset||No||view re3data entry|
|Open Science Framework||Free of charge||5 GB per file, multiple files can be uploaded||No||view FAIRsharing entry|
|Zenodo||Donations towards sustainability encouraged||50 GB per dataset||No||view re3data entry|
|Science Data Bank||Free of charge||8 GB per file, no limit to dataset size||No||view FAIRsharing entry|
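The Dryad fee in the table above is simple tiered arithmetic, which can be estimated as follows. This is a minimal Python sketch under two assumptions of our own, not Dryad's: sizes are in decimal gigabytes, and any partial 10 GB block beyond the first 20 GB is billed as a full block. The function name `estimate_dryad_fee` is hypothetical, and because this list is no longer updated, Dryad's current pricing should be checked before budgeting.

```python
import math

# Figures copied from the Dryad row of the table above; they may be out of date.
BASE_FEE_USD = 120       # covers the first 20 GB
BASE_ALLOWANCE_GB = 20
EXTRA_FEE_USD = 50       # charged per additional block
EXTRA_BLOCK_GB = 10


def estimate_dryad_fee(dataset_gb: float) -> int:
    """Estimate the Dryad charge in USD for a dataset of the given size.

    Assumption (not stated by Dryad): a partial 10 GB block above the
    first 20 GB is billed as a full block, hence the rounding up.
    """
    if dataset_gb < 0:
        raise ValueError("dataset size must be non-negative")
    excess_gb = max(0.0, dataset_gb - BASE_ALLOWANCE_GB)
    extra_blocks = math.ceil(excess_gb / EXTRA_BLOCK_GB)
    return BASE_FEE_USD + EXTRA_FEE_USD * extra_blocks


for size_gb in (5, 20, 25, 45):
    print(f"{size_gb} GB -> ${estimate_dryad_fee(size_gb)}")
```

Under these assumptions, a 25 GB dataset would cost $120 for the first 20 GB plus $50 for one additional block, i.e. $170.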