Recommended Data Repositories
Scientific Data mandates the release of datasets accompanying our Data Descriptors, but we do not ourselves host data. Instead, we ask authors to submit datasets to an appropriate public data repository. Data should be submitted to discipline-specific, community-recognized repositories where possible, or to generalist repositories if no suitable community resource is available.
Repositories included on this page have been evaluated to ensure that they meet our requirements for data access, preservation and stability. Please be aware, however, that some repositories on this page may only accept data from those funded by specific sources, or may charge for hosting data. Please ensure you are aware of any deposition policies for your chosen repository. If your repository of choice is not listed please see our guidelines for suggesting additional repositories.
Authors must deposit their data to a recommended data repository as part of the manuscript submission process; manuscripts will not otherwise be sent for review. If data have not been deposited to a repository prior to manuscript submission, authors can upload their data to figshare or the Dryad Digital Repository during the submission process. Data may also be deposited to these resources temporarily, if the main host repository does not support confidential peer review.
We provide a date-stamped archive of our recommended repository list, which is available for use under the CC-BY licence. Recommended repositories and standards that are indexed by FAIRsharing can also be viewed and filtered via the Scientific Data FAIRsharing collection.
View data repositories
- Biological sciences: Nucleic acid sequence; Protein sequence; Molecular & supramolecular structure; Neuroscience; Omics; Taxonomy & species diversity; Mathematical & modelling resources; Cytometry and Immunology; Imaging; Organism-focused resources
- Health sciences
- Chemistry and Chemical biology
- Earth, Environmental and Space sciences: Broad scope Earth & environmental sciences; Astronomy & planetary sciences; Biogeochemistry and Geochemistry; Climate sciences; Ecology; Geomagnetism & Palaeomagnetism; Ocean sciences; Solid Earth sciences
- Materials science
- Social sciences
- Generalist repositories
- Other repositories
Biological sciences ⤴
Nucleic acid sequence ⤴
Sequence information should be deposited following the MIxS guidelines.
Simple genetic polymorphisms or structural variations should be submitted to dbSNP or dbVar (please note that these repositories cannot accept sensitive data derived from human subjects); the NCBI Trace Archive may be used for capillary electrophoresis data, while SRA accepts NGS data only.
Protein sequence ⤴
Molecular & supramolecular structure ⤴
These repositories accept structural data for small molecules (COD); peptides and proteins (all); and larger assemblies (EMDB).
Small molecule crystallographic data should be uploaded to Dryad or figshare before manuscript submission, and should include a .cif file, a structural figure with probability ellipsoids, and structure factors for each structure. Both the structure factors and the structural output must have been checked using the IUCr's checkCIF routine, and a copy of the output must be included at submission, together with a justification for any alerts reported.
Neuroscience ⤴
These data repositories all accept human-derived data (NeuroMorpho.org and G-Node also accept data from other organisms). Please note that human-subject data submitted to OpenNeuro (formerly OpenfMRI) must be de-identified, while FCP/INDI can handle sensitive patient data.
|NeuroMorpho.org||view FAIRsharing entry|
|Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI)||view FAIRsharing entry|
|OpenNeuro (formerly OpenfMRI)||view FAIRsharing entry|
|G-Node||view FAIRsharing entry|
Functional genomics is a broad experimental category, and Scientific Data's recommendations in this discipline likewise bridge disparate research disciplines. Data should be deposited following the relevant community requirements where possible.
Please refer to the MIAME standard for microarray data. Molecular interaction data should be deposited with a member of the International Molecular Exchange Consortium (IMEx), following the MIMIx recommendations.
For data linking genotyping and phenotyping information in human subjects, we strongly recommend submission to dbGaP, EGA or JGA, which have mechanisms in place to handle sensitive data.
Metabolomics & Proteomics
Metabolomics data should be submitted following the MSI guidelines.
|MassIVE||view FAIRsharing entry|
|MetaboLights||view FAIRsharing entry|
|PeptideAtlas||view FAIRsharing entry|
|PRIDE||view FAIRsharing entry|
Taxonomy & species diversity ⤴
Mathematical & modelling resources ⤴
|BioModels Database||view FAIRsharing entry|
|Kinetic Models of Biological Systems (KiMoSys)||view FAIRsharing entry|
|The Network Data Exchange (NDEx)||view FAIRsharing entry|
Cytometry and Immunology ⤴
Imaging ⤴
|Image Data Resource||view FAIRsharing entry|
|The Cancer Imaging Archive||view FAIRsharing entry|
|SICAS Medical Image Repository||view FAIRsharing entry|
|Coherent X-ray Imaging Data Bank (CXIDB)||view FAIRsharing entry|
Organism-focused resources ⤴
These resources provide information specific to a particular organism or disease pathogen. They may accept phenotype information, sequences, genome annotations and gene expression patterns, among other types of data. Incorporating data into these resources can be very valuable for promoting reuse within these specific communities; however, where applicable, we ask that data records be submitted both to a community repository and to one suitable for the type of data (e.g. transcriptome profiling; please see above).
Health sciences ⤴
Some of the repositories in this section are suitable for datasets requiring restricted data access, which may be required for the preservation of study participant anonymity in clinical datasets. We suggest contacting repositories directly to determine those with data access controls best suited to the specific requirements of your study.
Chemistry and Chemical biology ⤴
|caNanoLab *||view FAIRsharing entry|
|ChEMBL *||view FAIRsharing entry|
|NCBI PubChem BioAssay||view FAIRsharing entry|
|NCBI PubChem Substance||view FAIRsharing entry|
|Beilstein-Institut, STRENDA||view FAIRsharing entry|
Earth, Environmental and Space sciences ⤴
Broad scope Earth & environmental sciences ⤴
|NASA Goddard Earth Sciences Data and Information Services Center||view re3data entry|
|NERC Data Centres||view re3data entry|
|PANGAEA||view re3data entry|
Astronomy & planetary sciences ⤴
Biogeochemistry and Geochemistry ⤴
|EarthChem||view re3data entry|
|Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)||view re3data entry|
Climate sciences ⤴
Ecology ⤴
|AEKOS - TERN Ecoinformatics||view FAIRsharing entry|
|Environmental Data Initiative (formerly LTER Network Information System Data Portal)||view re3data entry|
|Global Biodiversity Information Facility (GBIF)||view FAIRsharing entry|
|KNB: The Knowledge Network for Biocomplexity||view FAIRsharing entry|
Geomagnetism & Palaeomagnetism ⤴
Ocean sciences ⤴
|Australian Antarctic Data Centre (AADC)||view re3data entry|
|Australian Ocean Data Network||view re3data entry|
|Marine Geosciences Data System||view re3data entry|
|SEANOE||view FAIRsharing entry|
Solid Earth sciences ⤴
|British Geological Survey||view re3data entry|
|EarthChem||view re3data entry|
|Magnetics Information Consortium (MagIC)||view re3data entry|
|Marine Geosciences Data System||view re3data entry|
|UNAVCO, Inc.||view re3data entry|
Materials science ⤴
Social sciences ⤴
|Archaeology Data Service||view re3data entry|
|Harvard Dataverse||view re3data entry|
|openICPSR||view re3data entry|
|Open Science Framework||view FAIRsharing entry|
|Qualitative Data Repository||view FAIRsharing entry|
|UK Data Service||view re3data entry|
Generalist repositories ⤴
Scientific Data encourages authors to archive data to one of the above data-type specific repositories where possible. Where a data-type specific repository is not available, we recommend the following generalist repositories, which can handle a wide variety of data. Generalist repositories may also be appropriate for archiving associated analyses, or experimental-control data, supplementing the primary data in a data-type specific repository.
|Repository Name||Information on fees/costs||Size limits||Integrated with Scientific Data's manuscript submission system||re3data / FAIRsharing entry|
|Dryad Digital Repository||$120 USD for first 20 GB, and $50 USD for each additional 10 GB||None stated||Yes ✔||view FAIRsharing entry|
|figshare||100 GB free per Scientific Data manuscript. Additional fees apply for larger datasets||1 TB per dataset||Yes ✔ - To qualify for the 100 GB of free storage, data must be uploaded to figshare via our submission system. Download instructions.||view FAIRsharing entry|
|Harvard Dataverse||Contact repository for datasets over 1 TB||2.5 GB per file, 10 GB per dataset||No||view re3data entry|
|Open Science Framework||Free of charge||5 GB per file, multiple files can be uploaded||No||view FAIRsharing entry|
|Zenodo||Donations towards sustainability encouraged||50 GB per dataset||No||view re3data entry|
|Mendeley Data||Contact repository for datasets over 10 GB||10 GB per dataset||No||view FAIRsharing entry|
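As a rough planning aid, the fee and size figures in the table above can be turned into a small calculation. The sketch below is illustrative only: the function names are hypothetical, it assumes that a partial 10 GB increment at Dryad is billed as a full increment, and it treats 1 TB as 1000 GB. Repositories with no stated per-dataset limit (Dryad, Open Science Framework) are left out of the limit check. Please confirm current fees and limits with the repository before depositing.

```python
import math

# Per-dataset size limits in GB, copied from the table above.
# Assumption: 1 TB is treated as 1000 GB.
PER_DATASET_LIMIT_GB = {
    "figshare": 1000,
    "Harvard Dataverse": 10,
    "Zenodo": 50,
    "Mendeley Data": 10,
}


def dryad_fee_usd(size_gb: float) -> int:
    """Estimate the Dryad fee: $120 for the first 20 GB, $50 per additional 10 GB.

    Assumption: any partial 10 GB increment is charged as a full increment.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if size_gb <= 20:
        return 120
    extra_increments = math.ceil((size_gb - 20) / 10)
    return 120 + 50 * extra_increments


def repositories_within_limit(size_gb: float) -> list[str]:
    """List repositories whose stated per-dataset limit covers size_gb."""
    return [
        name
        for name, limit_gb in PER_DATASET_LIMIT_GB.items()
        if size_gb <= limit_gb
    ]


if __name__ == "__main__":
    size = 45  # example dataset size in GB
    print(f"Estimated Dryad fee for {size} GB: ${dryad_fee_usd(size)}")
    print(f"Repositories with a sufficient stated limit: {repositories_within_limit(size)}")
```

Per-file limits (for example, 2.5 GB per file at Harvard Dataverse or 5 GB per file at Open Science Framework) are not modelled here and should be checked separately.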
Other repositories ⤴
Researchers in the Earth and space sciences may wish to use the repository finder tool developed by DataCite and re3data.org, as part of the AGU’s Enabling FAIR Data Project.
To use one of these other repositories, select 'DataCite DOI' as the repository name during manuscript submission. Please note that if your chosen repository is unable to support confidential peer review, you will be asked to temporarily deposit a copy of the dataset to one of our integrated generalist repositories to facilitate review of your article. Upon completion of peer review, the temporary copy will be erased.
* Curated resource which may not accept direct submission of data. Contact the database directly for further information.