This page contains detailed information to help authors prepare, format and submit a manuscript. Please see our guide to authors for additional information and policies relevant to authors.
Choose a content-type ⤴
|Content-type|Data Descriptor|Analysis|Article|Comment|
|Scope|Detailed descriptions of research datasets, which focus on helping others reuse data, rather than testing hypotheses or presenting new interpretations|A new analysis or meta-analysis of existing data, which highlights innovative examples of data reuse or presents compelling new findings|Original reports on systems or techniques that clearly advance data sharing and reuse to support reproducible research|Flexible format used to publish brief opinions, commentaries and announcements of interest to a broad section of the journal’s readership|
Read more about our Aims & Scope and content-types in our guide to authors. To publish in Scientific Data, authors are required to pay an article-processing charge (APC), regardless of the selected content-type.
Select a repository for your data ⤴
When submitting a Data Descriptor, authors must deposit all relevant datasets in an appropriate public repository prior to submission, and the completeness of these datasets will be considered during editorial evaluation and peer review. Datasets must be made publicly available without restriction in the event that the Data Descriptor is accepted for publication (excepting reasonable controls related to human privacy issues or public safety).
Browse our list of recommended data repositories, and read our full data deposition policies. Authors may also upload their data to figshare or to Dryad during manuscript submission (find out more here).
Download a template ⤴
Authors are encouraged to use our manuscript template when preparing a Data Descriptor.
A LaTeX template is provided by Overleaf. Authors may download this template and use it locally, or draft and submit their manuscript through the Overleaf online collaborative writing system. See our additional instructions for LaTeX users.
Draft your manuscript ⤴
All submissions should be clearly written, and understandable by scientists from diverse backgrounds, not just specialists. Technical jargon should be avoided as far as possible and clearly explained where its use is necessary. Titles and abstracts, in particular, should be written in language that will be readily intelligible to any scientist. We strongly recommend that authors ask a colleague with different expertise to review the manuscript before submission, in order to identify concepts and terminology that may present difficulties for non-specialist readers. Abbreviations, particularly those that are non-standard, should also be kept to a minimum and, where unavoidable, should be defined in the text or legends at their first occurrence.
Manuscripts published in Scientific Data are not subject to in-depth copy editing. Authors are responsible for procuring copy editing or language editing services for their manuscripts, either before submission, or at the revision stage, should they feel it would benefit their manuscript. Such services include those provided by our affiliates Nature Research Editing Service and American Journal Experts. Please note that the use of such a service is at the author's own expense and in no way implies that the article will be selected for peer review or accepted for publication.
The information below outlines the main sections of our main content-type, the Data Descriptor. Please see our Data Descriptor template for more information on the kinds of content that should be included in these sections. Articles and Analyses follow a more traditional research article format, and therefore do not include all of these sections. See the table above for the sections required for each content-type.
Titles & Abstracts
Titles may not exceed 110 characters, including spaces. Where possible, they should avoid acronyms, abbreviations and unnecessary punctuation. Colons and parentheses are not permitted.
The Abstract may not exceed 170 words and must not include references. It should succinctly describe the study, the assay(s) performed, the resulting data and their reuse potential; it should not make any claims regarding new scientific findings.
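The title and abstract limits above lend themselves to a quick automated check before submission. The following is a minimal sketch only: the function names are hypothetical, and counting words by splitting on whitespace is an assumption, since the guidance does not say how words are counted.

```python
import re
from typing import List

MAX_TITLE_CHARS = 110     # title limit stated above, spaces included
MAX_ABSTRACT_WORDS = 170  # abstract limit stated above


def check_title(title: str) -> List[str]:
    """Return a list of problems found in a title (empty if none)."""
    problems = []
    if len(title) > MAX_TITLE_CHARS:
        problems.append(
            f"title has {len(title)} characters (limit {MAX_TITLE_CHARS})"
        )
    # Colons and parentheses are not permitted in titles.
    if re.search(r"[:()]", title):
        problems.append("title contains a colon or parenthesis")
    return problems


def check_abstract(abstract: str) -> List[str]:
    """Return a list of problems found in an abstract (empty if none)."""
    problems = []
    # Assumption: a "word" is any whitespace-separated token.
    n_words = len(abstract.split())
    if n_words > MAX_ABSTRACT_WORDS:
        problems.append(
            f"abstract has {n_words} words (limit {MAX_ABSTRACT_WORDS})"
        )
    return problems
```

An empty list from both functions means the length and punctuation rules are met; it says nothing about the other requirements, such as avoiding claims of new findings.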
Authors & Affiliations
Author affiliations should provide enough detail for the author to be reached, including the department, institution, address, postal code and country wherever possible. They will be cited in numerical order within the author list, starting with the affiliations of the first author. Authors may acknowledge up to six equally contributing authors and up to six joint supervisors within the affiliations list using the standard footnotes "These authors contributed equally to this work" and "These authors jointly supervised this work". All other contributions should be described in the author contributions statement.
Background & Summary
This section should provide an overview of the study that generated the data, as well as outlining the potential reuse value of the data. Any previous publications that used these data, in whole or in part, should be cited and briefly summarized.
The Methods section in Data Descriptors should describe any steps or procedures used in producing the data, including full descriptions of the experimental design, data acquisition assays, and any computational processing (e.g. normalization, image feature extraction). Specific data outputs should be explicitly referenced via our data citation format. See our detailed guidance for providing reproducible methods descriptions in Step 5.
The Data Records section should be used to explain each data record associated with this work, including the repository where this information is stored, and to provide an overview of the data files and their formats. Each external data record should be cited using our data citation format.
The Technical Validation section should present any experiments or analyses that are needed to support the technical quality of the dataset. It may be supported by figures and tables, as needed.
'Usage Notes' is an optional section that can be used to provide information that may assist other researchers who reuse your data.
For all studies using custom code, a statement must be included under the subheading "Code Availability" indicating whether and how the code can be accessed, including any restrictions to access. This section should also include information on the versions of any software used, if relevant, and any specific variables or parameters used to generate, test, or process the current dataset. Please see our policy on code availability for more information. The code availability statement should be placed at the end of the manuscript, immediately before the references.
Acknowledgements, Author Contributions & Competing Interests
Data Descriptors, Articles and Analyses must include Acknowledgements, Author contributions and Competing interests statements immediately before the References. Comments do not require an Author contributions statement.
The 'Acknowledgements' statement should contain text acknowledging non-author contributors. Acknowledgements should be brief, and should not include thanks to editors or effusive comments. Grant or contribution numbers may be acknowledged.
The 'Author contributions' statement should briefly describe each author's contribution to the work. Please see also the Nature journals' authorship policies.
A 'Competing interests' statement is required for all papers accepted by and published in Scientific Data. If there is no conflict of interest, a statement declaring this must still be included in the manuscript (e.g. "The author(s) declare no competing interests"). Please see our policies for more information on what may constitute a competing interest.
All references should be numbered sequentially, first throughout the text, then in tables, followed by figures and, finally, boxes; that is, references that only appear in tables, figures or boxes should be last in the reference list. Only one publication is given for each number. Only papers that have been published or accepted by a named publication or recognized preprint server should be in the numbered list; preprints of accepted papers in the reference list should be submitted with the manuscript. Grant details and acknowledgments are not permitted as numbered references. Footnotes are not used.
BibTeX bibliography files cannot be accepted. LaTeX submissions must contain all references within the manuscript .tex file itself. See our instructions for LaTeX users for more details.
The correct abbreviation for Scientific Data is 'Sci. Data'.
Scientific Data uses standard Nature referencing style. All authors should be included in reference lists unless there are six or more, in which case only the first author should be given, followed by ‘et al.’. Authors should be listed last name first, followed by a comma and initials (followed by full stops, '.') of given names. Article titles should be in Roman text; only the first word of the title should have an initial capital and the title should be written exactly as it appears in the work cited, ending with a full stop. Book titles should be given in italics and all words in the title should have initial capitals. Journal names are italicized and abbreviated (with full stops) according to common usage. Volume numbers and the subsequent comma appear in bold. The full page range should be given where appropriate. Published conference abstracts, numbered patents, and archived code with an assigned DOI may be included in the reference list. See the examples below for a journal article (1), book (2), book chapter (3), preprint (4), computer code (5), online material (6–8) and government report (9).
1. Schott, D. H., Collins, R. N. & Bretscher, A. Secretory vesicle transport velocity in living cells depends on the myosin V lever arm length. J. Cell Biol. 156, 35–39 (2002).
2. Hogan, B. Manipulating The Mouse Embryo: A Laboratory Manual 2nd edn (Cold Spring Harbor Laboratory Press, 1994).
3. Haines, N. & Cotter, R. in Studies in Manic Depression Vol. 1 (ed. Boase, N.) Ch. 2 (Oxford Univ. Press, 1982).
4. Babichev, S. A., Ries, J. & Lvovsky, A. I. Quantum scissors: teleportation of single-mode optical states by means of nonlocal single photon. Preprint at https://arxiv.org/abs/quant-ph/0208066 (2002).
5. Gallotti, R. & Barthélemy, M. Source code for: The multilayer temporal network of public transport in Great Britain. Figshare https://doi.org/10.6084/m9.figshare.1249862.v1 (2014).
6. Manaster, J. Sloth squeak. Scientific American Blog Network http://blogs.scientificamerican.com/psi-vid/2014/04/09/sloth-squeak (2014).
7. QGIS Development Team. QGIS Geographic Information System, version 2.18.10. Open Source Geospatial Foundation Project https://qgis.org/en/site/ (2016).
8. Hijmans, R. J., Phillips, S. J., Leathwich, J. & Elith, J. dismo: Species Distribution Modelling https://CRAN.R-project.org/package=dismo (2018).
9. Akutsu, T. Total Heart Replacement Device. Report No. NIH-NHLI-69 2185-4 (National Institutes of Health, 1974).
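The author-list and journal-article rules above can be sketched as a small formatter. This is an illustration under stated assumptions, not the journal's own tooling: the function names are hypothetical, authors are passed as (last name, initials) pairs, and the output is plain text, so the italic journal name and bold volume number still have to be styled by hand.

```python
from typing import List, Tuple

ET_AL_THRESHOLD = 6  # "six or more" authors are shortened to "et al."


def format_authors(authors: List[Tuple[str, str]]) -> str:
    """Format (last name, initials) pairs, e.g. ("Schott", "D. H.")."""
    if not authors:
        raise ValueError("at least one author is required")
    names = [f"{last}, {initials}" for last, initials in authors]
    if len(names) >= ET_AL_THRESHOLD:
        # Only the first author is given, followed by "et al."
        return f"{names[0]} et al."
    if len(names) == 1:
        return names[0]
    # All authors listed; the last one is joined with an ampersand.
    return ", ".join(names[:-1]) + " & " + names[-1]


def format_journal_reference(
    authors: List[Tuple[str, str]],
    title: str,
    journal: str,
    volume: int,
    pages: str,
    year: int,
) -> str:
    """Build a plain-text journal reference; the title ends with a full stop."""
    return (
        f"{format_authors(authors)} {title.rstrip('.')}. "
        f"{journal} {volume}, {pages} ({year})."
    )
```

Passing the three authors, title, journal, volume, page range and year of the journal-article example should reproduce that entry, apart from the typographic styling.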
In line with emerging industry-wide standards for data citation, references to all datasets described or used in the manuscript should be cited in the text with a superscript number and listed in the ‘References’ section in the same manner as a conventional literature reference.
An author list (formatted as above) and title for the dataset should be included in the data citation, and should reflect the author(s) and dataset title recorded at the repository. If author or title is not recorded by the repository, these should not be included in the data citation. The name of the data-hosting repository, URL to the dataset and year the data were made available are required for all data citations. For DOI-based (e.g. figshare or Dryad) repositories the DOI URL should be used. For repositories using accessions (e.g. SRA or GEO) an identifiers.org URL should be used where available. For first submissions, authors may choose to include just the accession number. Scientific Data staff will provide further guidance after peer-review. Please refer to the following examples of data citation for guidance:
- Zhang, Q-L., Chen, J-Y., Lin, L-B., Wang, F., Guo, J. & Deng, X-Y. Characterization of ladybird Henosepilachna vigintioctopunctata transcriptomes across various life stages. figshare https://doi.org/10.6084/m9.figshare.c.4064768.v3 (2018).
- NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP121625 (2017).
- Barbosa, P., Usie, A. & Ramos, A. M. Quercus suber isolate HL8, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:PKMF00000000 (2018).
- DNA Data Bank of Japan https://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=DRA004814 (2016).
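The data citation pattern shown in these examples can be assembled programmatically. The sketch below is a hedged illustration: the function names are hypothetical, the author list is assumed to be already formatted, and the identifiers.org prefix is whatever the repository uses (for instance `ncbi/insdc.sra`, taken from the example above).

```python
from typing import Optional


def doi_url(doi: str) -> str:
    """URL form used for DOI-based repositories such as figshare or Dryad."""
    return f"https://doi.org/{doi}"


def identifiers_url(prefix: str, accession: str) -> str:
    """identifiers.org URL for accession-based repositories."""
    return f"https://identifiers.org/{prefix}:{accession}"


def format_data_citation(
    repository: str,
    url: str,
    year: int,
    authors: Optional[str] = None,
    title: Optional[str] = None,
) -> str:
    """Repository, URL and year are required; authors and title are
    included only when the repository records them."""
    parts = []
    if authors:
        parts.append(authors)  # assumed already formatted, ending in "."
    if title:
        parts.append(title.rstrip(".") + ".")
    parts.append(f"{repository} {url} ({year}).")
    return " ".join(parts)
```

Calling `format_data_citation("NCBI Sequence Read Archive", identifiers_url("ncbi/insdc.sra", "SRP121625"), 2017)` yields the second example above, which has no author or title because none is supplied.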
Figures & Tables
Manuscripts may reference figures (e.g. Figure 1), tables (e.g. Table 1), online-only tables (e.g. Online-only Table 1) and Supplementary Information (e.g. Supplementary Table 1, Supplementary File 2, etc.). Please see the additional guidance below for submitting figures, tables and supplementary information.
All Data Descriptors should include one or more tables detailing the inputs (e.g. tissue samples, field sites, literature sources) and outputs (e.g. data files) that comprise the presented study. See the section below on "Submitting experimental metadata" for more information.
Check methods for transparency and reproducibility ⤴
Methods should be described in enough detail to allow other researchers to interpret and repeat, if required, the full study. Authors should cite previous descriptions of the methods under use, but ideally the method descriptions should be complete enough for others to understand and reproduce the methods and processing steps without referring to associated publications. There is no limit to the length of the Methods sections.
For Data Descriptors, the Methods section should describe any steps or procedures used in producing the data, including full descriptions of the experimental design, data acquisition assays, and any computational processing (e.g. normalization, image feature extraction). Specific data outputs should be explicitly referenced via data citation (see Data Records and Citing Data).
Authors should review the transparent methods checklist below, and ensure that their manuscript complies with any relevant points. Authors are also encouraged to search FAIRsharing.org for community reporting standards that may be relevant to their specific data-type.
Transparent Methods Checklist
Write a cover letter ⤴
Authors should provide a cover letter that includes the affiliation and contact information for the corresponding author, and briefly explains why the work should be considered appropriate for publication in Scientific Data. Authors are asked to suggest the names and contact information for scientific referees, and may include suggestions for Editorial Board Members, as well as requesting the exclusion of certain referees. Authors should indicate whether they have had any prior discussions with a Scientific Data Editorial Board Member about the work described in the manuscript.
We also ask that authors discuss any related works under consideration or in press at other journals in their cover letter. If this related work is cited in their Scientific Data submission, authors must provide a copy to facilitate peer review.
Submit your manuscript and related files via our online system.
For first submissions (i.e. not revised manuscripts), authors may submit a single PDF with integrated figures and tables – the figures may be inserted within the text at the appropriate positions, or grouped at the end.
Authors should note that only the following file types should be uploaded:
- For article text: DOC, DOCX, TEX
- For figures: PDF, EPS, TIFF, JPG
- For tables: XLS, XLSX, DOC, DOCX
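The permitted file types listed above can be checked before upload. This is a minimal sketch: the role names and the function are hypothetical, and it accepts only the extensions exactly as listed, so whether variants such as `.tif` or `.jpeg` are also accepted is not something it assumes.

```python
from pathlib import Path

# Extensions copied from the list above, keyed by a hypothetical role name.
ALLOWED_EXTENSIONS = {
    "article text": {".doc", ".docx", ".tex"},
    "figure": {".pdf", ".eps", ".tiff", ".jpg"},
    "table": {".xls", ".xlsx", ".doc", ".docx"},
}


def is_allowed(filename: str, role: str) -> bool:
    """Return True if the file extension is listed for the given role."""
    # Comparison is case-insensitive, so "Figure1.EPS" is treated as ".eps".
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS[role]
```

An unknown role raises `KeyError`, which is deliberate here: a misspelled role should not silently pass.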
Supplementary Information files may also be uploaded: see further guidance here.
Additional Guidance ⤴
Data Descriptors, Analyses and Articles should usually not have more than three figures, but additional figures can be allowed on a case-by-case basis. In addition, a limited number of uncaptioned molecular structure graphics and numbered mathematical equations may be included if necessary.
Scientific Data requires authors to present digital images in accord with the policies employed by the Nature-titled journals.
Authors are responsible for obtaining permission to publish any figures or illustrations that are protected by copyright, including figures published elsewhere and pictures taken by professional photographers. The journal cannot publish images downloaded from the Internet without appropriate permission.
Figures should be numbered separately with Arabic numerals in the order of occurrence in the text of the manuscript. One- or two-column format figures are required. When appropriate, figures should include error bars. A description of the statistical treatment of error analysis should be included in the figure legend.
Figure lettering should be in a clear, sans-serif typeface (for example, Helvetica); the same typeface in the same font size should be used for all figures in a paper. Use Symbol font for Greek letters. All display items should be on a white background, and should avoid excessive boxing, unnecessary colour, spurious decorative effects (such as three-dimensional ‘skyscraper’ histograms) and highly pixelated computer drawings. The vertical axis of histograms should not be truncated to exaggerate small differences. Labelling must be of sufficient size and contrast to be readable, even after appropriate reduction. The thinnest lines in the final figure should be no smaller than one point wide. Authors will see a PDF proof that will include figures.
Figures divided into parts should be labelled with a lower-case bold a, b, and so on, in the same type-size as used elsewhere in the figure. Lettering in figures should be in lower-case type, with only the first letter of each label capitalized. Units should have a single space between the number and the unit, and follow SI nomenclature (for example, ms rather than msec) or the nomenclature common to a particular field. Thousands should be separated by commas (1,000). Unusual units or abbreviations should be spelled out in full or defined in the legend. Scale bars should be used rather than magnification factors, with the length of the bar defined on the bar itself rather than in the legend. In legends, please use visual cues rather than verbal explanations such as ‘open red triangles’.
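When figure labels are generated by a script, the number formatting rules above (comma thousands separators, a single space before the unit) are easy to apply consistently. A minimal sketch, with a hypothetical function name:

```python
def format_quantity(value: float, unit: str) -> str:
    """Format a number with comma thousands separators, followed by
    a single space and the unit symbol."""
    return f"{value:,} {unit}"
```

For example, `format_quantity(1000, "ms")` returns `"1,000 ms"`. The unit symbol itself must still follow SI nomenclature or the convention of the field; the function does not check it.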
Unnecessary figures should be avoided: data presented in small tables or histograms, for instance, can generally be described briefly in the text instead. Figures should not contain more than one panel unless the parts are logically connected; each panel of a multipart figure should be sized so that the whole figure can be reduced by the same amount and reproduced at the smallest size at which essential details are visible.
Figures for peer-review
At the initial submission stage authors may choose to upload separate figure files or to incorporate figures into the main article file, ensuring that any inserted figures are of sufficient quality to be clearly legible. When submitting a revised manuscript all figures must be uploaded as separate figure files ensuring that the image quality and formatting conforms to the specifications below.
Figures for publication
When creating and submitting final figure files, please follow the guidelines below. Failure to do so can significantly delay publication of your work.
Each complete figure must be supplied as a separate file upload. Multi-part/panel figures must be prepared and arranged as a single image file (including all sub-parts; a, b, c, etc.). Please do not upload each panel individually.
We encourage authors to prepare their figures in a quality vector graphics software package, such as Adobe Illustrator or Inkscape. Figures should then be saved directly in the EPS format. When importing graphs or schematics from other programs, authors are encouraged to remake any text labels in a vector graphics program to ensure consistent quality.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Line art, graphs, charts and schematics
For optimal results, all line art, graphs, charts and schematics should be supplied in vector format, such as EPS, PDF or AI, and should be saved or exported as such directly from the application in which they were made. Please ensure that data points and axis labels are clearly legible.
2. Photographic and bitmapped images
All photographic and bitmap images should be supplied in a bitmap image format such as TIFF, JPG or PSD. If saving TIFF files, please ensure that the compression option is selected to avoid very large file sizes.
Please do not supply Word or PowerPoint files with placed images. Images can be supplied as RGB or CMYK (note: we will not convert image colour modes).
Figures that do not meet these standards will not reproduce well and may delay publication until we receive high-resolution images.
3. Chemical structures
Chemical structures should be produced using ChemDraw or a similar program. All chemical compounds must be assigned a bold Arabic numeral in the order in which the compounds are presented in the manuscript text. Structures should then be exported into a 300 dpi RGB TIFF file before being submitted.
4. Stereo images
Stereo diagrams should be presented for divergent ‘wall-eyed’ viewing, with the two panels separated by 5.5 cm. In the final accepted version of the manuscript, the stereo images should be submitted at their final page size.
Figure legends begin with a brief title sentence summarizing the purpose of the figure as a whole and continue with a short description of what is shown in each panel and an explanation of any symbols used. Legends must total no more than 350 words and may contain literature references.
Each figure legend should contain, for each panel where they are relevant:
- the exact sample size (n) for each experimental group/condition, given as a number, not a range;
- a description of the sample collection allowing the reader to understand the independence of samples, clearly identifying any ‘technical replicates’ – i.e., repeated measurements on the same sample;
- a statement of how many times the experiment shown was replicated in the laboratory;
- definitions of statistical methods and measures: very common tests, such as t-tests, simple χ² tests, Wilcoxon and Mann-Whitney tests can be unambiguously identified by name only, but more complex techniques should be described in the Methods section;
- definition of ‘centre values’ as median or average;
- definition of error bars as s.d. or s.e.m.
Any descriptions too long for the figure legend should be included in the Methods section. Please also refer to our statistical guidelines.
Authors may provide tables within the Word document or as separate files. Legends, where needed, should be included in the Word document. Generally, Data Descriptors, Analyses and Articles should have fewer than ten tables, but more may be allowed when needed. Tables may be of any size, but only tables that fit onto a single printed page will be included in the PDF version of the article (up to a maximum of three, see below).
All Data Descriptors should include one or more tables detailing the inputs (e.g. tissue samples, field sites, literature sources) and outputs (e.g. data files) for the presented study. See the section below on "Submitting experimental metadata" for more information.
Due to typesetting constraints, tables that do not fit onto a single A4 page cannot be included in the PDF version of the article and will be made available in the online version only. Any such tables must be labelled in the text as ‘Online-only’ tables and numbered separately from the main table list e.g. ‘Table 1, Table 2, Online-only Table 1’ etc. Any bibliographic references included in online-only tables should be cited last in the ordering.
Submitting experimental metadata
Every Data Descriptor published by Scientific Data includes a machine-accessible metadata file. This metadata record provides a structured description of the dataset, including key characteristics about the generated data. Metadata is captured and distributed in a JSON-based format, which is designed to enable semantic reasoning and data discovery across multiple disciplines and repositories.
Authors are encouraged to include human-readable tables in their manuscript that account for all the inputs and outputs of their study in a detailed manner. Example tables are available in our Data Descriptor manuscript template.
At revision, authors will be asked to provide details about their datasets via our Metadata Creator tool. The metadata files will then be finalised by our in-house curation team, when the Data Descriptor is accepted for publication. Professional curation helps to ensure the use of consistent and standardized annotation using a core set of community ontologies, to facilitate machine accessibility.
Where human data are involved, we recognize that privacy controls may preclude highly detailed descriptions of patients or participants within metadata records. Please make sure that any privacy-related limitations on data-sharing are discussed in the cover letter of your submission.
The metadata files are published under the CC0 licence, allowing other users and resources to ingest and mine this information without restriction. Please note that the metadata records are a value-added product and are not considered part of the 'version of record' of published articles. Therefore the metadata files may be updated from time to time; for instance, to reflect changes in metadata formats or community ontologies.
Equations and mathematical expressions should be provided in the main text of the paper. Equations that are referred to in the text are identified by parenthetical numbers, such as (1), and are referred to in the manuscript as ‘equation (1)’.
Scientific Data discourages authors from supplying text, figures or tables as supplementary files. As much as possible, these types of content should be included in the main manuscript. The main sections of the Data Descriptor manuscript, and particularly the Methods section, have no length limits. Data Descriptors are designed to be focused publications: if extensive supplementary text or figures are required, authors should consider whether the manuscript might best be subdivided into multiple Data Descriptors. Similarly, any primary data files should be deposited in an appropriate public repository, rather than included as Supplementary Information. Scientific Data does not allow statements of ‘data not shown’. Please see our data deposition policies.
With these restrictions in mind, authors may use Supplementary Information for any additional content needed to support the Data Descriptor, such as media (e.g. audio or video), or machine-readable versions of mathematical models. Authors may supply code and computational models as Supplementary Information, particularly for initial submissions. However, upon acceptance of a manuscript, we encourage the public archiving of code (through a DOI-issuing repository) and of computational models (in field-specific computational model repositories). See our code availability policy for more information.
The guidelines below detail the creation, citation and submission of Supplementary Information. Publication may be delayed if these are not followed correctly. Please note that modification of Supplementary Information after the paper is published requires a formal correction, so authors are encouraged to check their Supplementary Information carefully before submitting the final version.
- Designate each item as a Supplementary File and number accordingly: for example, ‘Supplementary File 1’. This numbering should be separate from that used in tables and figures appearing in the main article.
- Refer to each piece of supplementary material at the appropriate point(s) in the Data Descriptor. Be sure to include the word ‘Supplementary’ each time one is mentioned. Every piece of Supplementary Information must be mentioned at least once in the main article.
- Remember to include a brief title and legend (incorporated into the file to appear near the image) as part of every figure submitted, and a title as part of every table.
- File sizes should be as small as possible, with a maximum size of 10 MB, so that they can be downloaded quickly.
- When supplying multiple supplementary figures, they should be merged into a single PDF file, with figure legends immediately below each figure. A table of contents should be included on the first page, listing the page number of each supplementary figure.
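The 10 MB ceiling in the list above can be verified before upload. The sketch below is illustrative only: the function names are hypothetical, and it assumes decimal megabytes (10,000,000 bytes), the stricter reading, because the guidance does not say whether binary or decimal units are meant.

```python
import os
from typing import Iterable, List

# Assumption: 1 MB = 1,000,000 bytes (the stricter interpretation).
MAX_SUPPLEMENTARY_BYTES = 10 * 1000 * 1000


def within_size_limit(size_bytes: int) -> bool:
    """Return True if a file of this size fits within the limit."""
    return size_bytes <= MAX_SUPPLEMENTARY_BYTES


def oversized_files(paths: Iterable[str]) -> List[str]:
    """Return the paths whose size on disk exceeds the limit."""
    return [p for p in paths if not within_size_limit(os.path.getsize(p))]
```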
Every submission that contains statistical analyses or data-processing steps must explain the statistical methods in detail either in the Methods or the relevant figure legend. Any special statistical code or software needed for scientists to reuse or reanalyse datasets should be discussed in the Usage Notes section of Data Descriptors. We encourage authors to make openly available any code or scripts that would help readers reproduce any data-processing steps (see our code availability policy). In addition, authors must ensure that the version of the data described and analysed in the Data Descriptor is permanently available so that others can reproduce any statistical analyses.
Authors are encouraged to summarize their datasets with descriptive statistics in the Technical Validation section, which should include the n value for each dataset; a clearly labelled measure of centre (such as the mean or the median); and a clearly labelled measure of variability (such as standard deviation or range). Ranges are more appropriate than standard deviations or standard errors for small datasets. Graphs should include clearly labelled error bars. Authors must state whether a number that follows the ± sign is a standard error (s.e.m.) or a standard deviation (s.d.).
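The descriptive statistics named above can be computed with the Python standard library. This is a minimal sketch with a hypothetical function name; it assumes the sample standard deviation (n − 1 denominator) and computes s.e.m. as s.d. divided by the square root of n, both of which should be stated explicitly in the manuscript.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Summarize a dataset: n, measures of centre and measures of variability."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate variability")
    sd = statistics.stdev(values)  # sample s.d. (n - 1 denominator)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "s.d.": sd,
        "s.e.m.": sd / math.sqrt(n),
        "min": min(values),  # min and max together give the range
        "max": max(values),
    }
```

Keeping s.d. and s.e.m. under separate, labelled keys makes it harder to report a value after the ± sign without saying which of the two it is.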
Authors must clearly explain the independence of any replicate measurements, and ‘technical replicates’ – repeated measurements on the same sample – should be clearly identified.
Data Descriptors should not test new hypotheses or provide extensive interpretive analysis, and therefore should not usually contain statistical significance testing. When hypothesis-based tests must be employed, authors should state the name of the statistical test; the n value for each statistical analysis; the comparisons of interest; a justification for the use of that test (including, for example, a discussion of the normality of the data when the test is appropriate only for normal data); the alpha level for all tests; whether the tests were one-tailed or two-tailed; and the actual p-value for each test (not merely ‘significant’ or ‘p < 0.05’). It should be clear what statistical test was used to generate every p-value. Use of the word ‘significant’ should always be accompanied by a p-value; otherwise, use ‘substantial’, ‘considerable’, etc. Multiple test correction must be used when appropriate and described in detail in the manuscript.
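As an illustration of the reporting items above, the sketch below applies a Bonferroni adjustment and builds a one-line report containing the test name, tails, n, alpha level and exact p-value. The guidance does not prescribe a correction method; Bonferroni is used here only because it is simple and widely known, and the function names and report layout are hypothetical.

```python
from typing import List, Sequence


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: each p is multiplied by the number
    of tests and capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def report_test(test_name: str, n: int, p: float, alpha: float, tails: str) -> str:
    """One-line report with the exact p-value rather than a threshold.

    `tails` is "one" or "two".
    """
    return f"{test_name}, {tails}-tailed, n = {n}, alpha = {alpha}, p = {p:.3g}"
```

For instance, `report_test("Mann-Whitney test", 12, 0.0312, 0.05, "two")` gives `"Mann-Whitney test, two-tailed, n = 12, alpha = 0.05, p = 0.0312"`. Whichever correction is chosen should be named and described in the manuscript.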
Please also see our specific recommendations for figure legends.
Genetic & chemical nomenclature
Molecular structures are identified by bold Arabic numerals assigned in order of presentation in the text. Once identified in the main text or a figure, compounds may be referred to by their name, by a defined abbreviation or by the bold Arabic numeral (as long as the compound is referred to consistently as one of these three).
When possible, authors should refer to chemical compounds and biomolecules using systematic nomenclature, preferably using IUPAC. Standard chemical and biological abbreviations should be used. Unconventional or specialist abbreviations should be defined at their first occurrence in the text.
Authors should use approved nomenclature for gene symbols, and use symbols rather than italicized full names (for example Ttn, not titin). Please consult the appropriate nomenclature databases for correct gene names and symbols. A useful resource is NCBI Gene.
Approved human gene symbols are provided by the HUGO Gene Nomenclature Committee (HGNC); see www.genenames.org. Approved mouse symbols are provided by The Jackson Laboratory; see www.informatics.jax.org/mgihome/nomen.
For proposed gene names that are not already approved, please submit the gene symbols to the appropriate nomenclature committees as soon as possible, as these must be deposited and approved before publication of an article.
Avoid listing multiple names of genes (or proteins) separated by a slash, as in ‘Oct4/Pou5f1’, as this is ambiguous (it could mean a ratio, a complex, alternative names or different subunits). Use one name throughout and include the other at first mention: ‘Oct4 (also known as Pou5f1)’.
Instructions for LaTeX users
To assist with formatting, we encourage authors to use the LaTeX Data Descriptor template provided by Overleaf. Authors submitting LaTeX files may use any of the standard class files such as article.cls, revtex.cls or amsart.cls. Non-standard fonts should be avoided; please use the default Computer Modern fonts. For the inclusion of graphics, we recommend graphicx.sty. Please use numerical references only for citations. There is no need to spend time visually formatting the manuscript: the Scientific Data style will be imposed when the paper is prepared for publication. References should be included within the manuscript file itself as our system cannot accept BibTeX bibliography files; authors who wish to use BibTeX to prepare their references should therefore copy the reference list from the .bbl file that BibTeX generates and paste it into the main manuscript .tex file (and delete the associated \bibliography and \bibliographystyle commands). As a final precaution, authors should ensure that the complete .tex file compiles successfully on their own system with no errors or warnings before submission.
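The BibTeX step described above (pasting the generated .bbl into the .tex file and deleting the two bibliography commands) can be scripted. The following is a minimal sketch with a hypothetical function name; it assumes one `\bibliography{...}` and one `\bibliographystyle{...}` command in the main file and does not handle commands that appear inside comments or in included files.

```python
import re


def inline_bibliography(tex_source: str, bbl_source: str) -> str:
    r"""Replace \bibliography{...} with the .bbl contents and remove
    \bibliographystyle{...} from a LaTeX source string."""
    # Drop the \bibliographystyle{...} command and the rest of its line.
    without_style = re.sub(
        r"\\bibliographystyle\{[^}]*\}[ \t]*\n?", "", tex_source
    )
    # A function is used as the replacement so that backslashes in the
    # .bbl text are inserted literally rather than parsed as escapes.
    return re.sub(
        r"\\bibliography\{[^}]*\}",
        lambda _match: bbl_source.strip(),
        without_style,
    )
```

A typical use would read `manuscript.tex` and `manuscript.bbl` (hypothetical file names), call the function, write the result to a new .tex file and then recompile it to confirm that it builds without errors or warnings.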
If a consortium is included in the main author list, all members of the consortium are considered bona fide authors, and must be listed together with their affiliations at the end of the Author Contributions statement. The authors and affiliations for the consortium members are an extension of the main author list. Therefore any affiliations already included in the main author list should not be repeated in the Author Contributions statement and the numbering of the affiliations in the consortium should continue in numerical order from those in the main author list; they should not start again from 1. If a member of the consortium already appears as an individual name in the main author list, then his/her affiliations should be identical in the consortium author list. The consortium itself should be acknowledged with the footnote "A full list of members appears in the Author Contributions". If you need to give credit to a consortium, a project or a group of people who do not meet authorship criteria, you can add a mention in the Acknowledgements section or elsewhere (in which case, a full list of members can be provided as a Supplementary Note in the Supplementary Information, if desired).