Submission Guidelines

This page contains detailed information to help authors prepare, format and submit a manuscript. Please see our guide to authors for additional information and policies relevant to authors.

Contents

Choose a content-type

Select a repository for your data

Download a template

Draft your manuscript

    Titles & Abstracts
    Authors & Affiliations
    Background & Summary
    Methods
    Data Records
    Technical Validation
    Usage Notes
    Acknowledgements, Author Contributions & Competing Interests
    References
    Data Citations
    Figures & Tables

Check methods for transparency and reproducibility

Write a cover letter

Submit

Additional Guidance

    Figures
    Tables
    Submitting experimental metadata
    Equations
    Supplementary information
    Statistical guidelines
    Genetic & chemical nomenclature
    Instructions for LaTeX users
    Consortia authorships
 

Choose a content-type 

Data Descriptor
  • Scope: Detailed descriptions of research datasets, which focus on helping others reuse data, rather than testing hypotheses or presenting new interpretations
  • Peer-reviewed? Yes
  • Format outline:
    • Abstract
    • Background & Summary
    • Methods
    • Data Records
    • Technical Validation
    • Usage Notes (optional)
    • References
    • Data Citations

Analysis
  • Scope: A new analysis or meta-analysis of existing data, which highlights innovative examples of data reuse or presents compelling new findings
  • Peer-reviewed? Yes
  • Format outline:
    • Abstract
    • Introduction
    • Results
    • Discussion
    • Methods
    • References
    • Data Citations (optional)

Article
  • Scope: Original reports on systems or techniques that clearly advance data sharing and reuse to support reproducible research
  • Peer-reviewed? Yes
  • Format outline:
    • Abstract
    • Introduction
    • Results
    • Discussion
    • Methods
    • References
    • Data Citations (optional)

Comment
  • Scope: Flexible format used to publish brief opinions, commentaries and announcements of interest to a broad section of the journal’s readership
  • Peer-reviewed? Editors' discretion
  • Format outline:
    • Abstract
    • Comment
    • References (max. 25)

Read more about our Aims & Scope and content-types in our guide to authors. To publish in Scientific Data authors are required to pay an article-processing charge (APC), regardless of the selected content-type.

 

Select a repository for your data 

When submitting a Data Descriptor, authors must deposit all relevant datasets in an appropriate public repository prior to submission, and the completeness of these datasets will be considered during editorial evaluation and peer review. Datasets must be made publicly available without restriction in the event that the Data Descriptor is accepted for publication (excepting reasonable controls related to human privacy issues or public safety).

Browse our list of recommended data repositories, and read our full data deposition policies. Authors may also upload their data to figshare or to Dryad during manuscript submission (find out more here).

 

Download a template 

Authors are encouraged to use our templates when preparing a Data Descriptor manuscript. We provide manuscript templates in Word (doc | docx), and Excel templates to help authors provide detailed information about their samples, methods and data outputs (xls | xlsx). The Word and Excel templates can also be downloaded in a single zip package.

A LaTeX template is provided by Overleaf. Authors may download this template and use it locally, or draft and submit their manuscript through the Overleaf online collaborative writing system. See our additional instructions for LaTeX users.

 

Draft your manuscript 

All submissions should be clearly written, and understandable by scientists from diverse backgrounds, not just specialists. Technical jargon should be avoided as far as possible and clearly explained where its use is necessary. Titles and abstracts, in particular, should be written in language that will be readily intelligible to any scientist. We strongly recommend that authors ask a colleague with different expertise to review the manuscript before submission, in order to identify concepts and terminology that may present difficulties for non-specialist readers. Abbreviations, particularly those that are non-standard, should also be kept to a minimum and, where unavoidable, should be defined in the text or legends at their first occurrence.

Manuscripts published in Scientific Data are not subject to in-depth copy editing. Authors are responsible for procuring copy editing or language editing services for their manuscripts, either before submission, or at the revision stage, should they feel it would benefit their manuscript. Such services include those provided by our affiliates Nature Research Editing Service and American Journal Experts. Please note that the use of such a service is at the author's own expense and in no way implies that the article will be selected for peer review or accepted for publication.

The information below outlines the sections of our primary content-type, the Data Descriptor. Please see our Data Descriptor template for more information on the kinds of content that should be included in these sections. Articles and Analyses follow a more traditional research article format, and therefore do not include all of these sections. See the table above for the sections required for each content-type.

Titles & Abstracts

Titles may not exceed 110 characters, including spaces. Where possible, they should avoid acronyms, abbreviations and unnecessary punctuation. Colons and parentheses are not permitted.

The 'Abstract' may not exceed 170 words and should not include references. It should succinctly describe the study, the assay(s) performed, the resulting data and their reuse potential; it should not make any claims regarding new scientific findings.
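As an informal aid, the limits above can be checked with a short script before submission. The sketch below is illustrative only and is not a tool provided by the journal: the function names `check_title` and `check_abstract` are hypothetical, and words are counted by splitting on whitespace, which may differ slightly from the count applied during editorial checks.

```python
import re

TITLE_MAX_CHARS = 110      # title limit stated above, spaces included
ABSTRACT_MAX_WORDS = 170   # abstract limit stated above


def check_title(title: str) -> list[str]:
    """Return a list of problems found in a proposed title (empty if none)."""
    problems = []
    if len(title) > TITLE_MAX_CHARS:
        problems.append(
            f"title has {len(title)} characters (limit {TITLE_MAX_CHARS})"
        )
    if re.search(r"[:()]", title):
        problems.append("title contains a colon or parenthesis")
    return problems


def check_abstract(abstract: str) -> list[str]:
    """Return a list of problems found in an abstract (empty if none)."""
    problems = []
    # Assumption: a "word" is any whitespace-separated token.
    n_words = len(abstract.split())
    if n_words > ABSTRACT_MAX_WORDS:
        problems.append(
            f"abstract has {n_words} words (limit {ABSTRACT_MAX_WORDS})"
        )
    return problems


if __name__ == "__main__":
    print(check_title("A global dataset of soil moisture observations"))
    print(check_title("Soil data: a survey (2014)"))
```

The script does not attempt to detect acronyms or embedded references; those still need to be checked by eye.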

Authors & Affiliations

Author affiliations should provide enough detail for the author to be reached, including the department, institution, address, postal code and country wherever possible. They will be cited in numerical order within the author list, starting with the affiliations of the first author. Authors may acknowledge up to six equally contributing authors and up to six joint supervisors within the affiliations list using the standard footnotes "These authors contributed equally to this work" and "These authors jointly supervised this work". All other contributions should be described in the author contributions statement.

Background & Summary

This section should provide an overview of the study that generated the data, as well as outlining the potential reuse value of the data. Any previous publications that used these data, in whole or in part, should be cited and briefly summarized.

Methods

The Methods section in Data Descriptors should describe any steps or procedures used in producing the data, including full descriptions of the experimental design, data acquisition assays, and any computational processing (e.g. normalization, image feature extraction). Specific data outputs should be explicitly referenced via our data citation format. See our detailed guidance for providing reproducible methods descriptions in Step 5.

Data Records

This section should be used to explain each data record associated with this work, including the repository where this information is stored, and to provide an overview of the data files and their formats. Each external data record should be cited using our data citation format, e.g. "Data resulting from method X can be found in xxxxx.txt (Data Citation 1)".

Technical Validation

This section should present any experiments or analyses that are needed to support the technical quality of the dataset. This section may be supported by figures and tables, as needed.

Usage Notes

'Usage Notes' is an optional section that can be used to provide information that may assist other researchers who reuse your data.

Acknowledgements, Author Contributions & Competing Interests

Data Descriptors, Articles and Analyses must include Acknowledgements, Author Contributions and Competing Interests statements immediately before the References. Comments do not require an Author Contributions statement.

The 'Acknowledgements' statement should contain text acknowledging non-author contributors. Acknowledgements should be brief, and should not include thanks to anonymous referees and editors or effusive comments. Grant or contribution numbers may be acknowledged.

The 'Author contributions' statement should briefly describe each author's contribution to the work. Please see also the Nature journals' authorship policies.

A 'Competing interests' statement is required for all papers accepted by and published in Scientific Data. If there is no conflict of interest, a statement declaring this must still be included in the manuscript (e.g. "The author(s) declare no competing financial interests"). Please see our policies for more information on what may constitute a competing interest.

References

References should be numbered sequentially, first throughout the text, then in tables, followed by figures and, finally, boxes; that is, references that only appear in tables, figures or boxes should be last in the reference list. Only one publication is given for each number. Only papers that have been published or accepted by a named publication or recognized preprint server should be in the numbered list; preprints of accepted papers in the reference list should be submitted with the manuscript. Published conference abstracts, numbered patents, and archived code with an assigned DOI may be included in the reference list. Grant details and acknowledgments are not permitted as numbered references. Footnotes are not used.

BibTeX bibliography files cannot be accepted. LaTeX submissions must contain all references within the manuscript .tex file itself. See our instructions for LaTeX users for more details.

The correct abbreviation for Scientific Data is 'Sci. Data'.

Scientific Data uses standard Nature referencing style. All authors should be included in reference lists unless there are six or more, in which case only the first author should be given, followed by ‘et al.’. Authors should be listed last name first, followed by a comma and initials (followed by full stops, '.') of given names. Article titles should be in Roman text; only the first word of the title should have an initial capital and the title should be written exactly as it appears in the work cited, ending with a full stop. Book titles should be given in italics and all words in the title should have initial capitals. Journal names are italicized and abbreviated (with full stops) according to common usage. Volume numbers and the subsequent comma appear in bold. The full page range should be given where appropriate. See the examples below for a journal article (1), book (2), book chapter (3), preprint (4), computer code (5), online material (6) and government report (7).

  1. Schott, D. H., Collins, R. N. & Bretscher, A. Secretory vesicle transport velocity in living cells depends on the myosin V lever arm length. J. Cell Biol. 156, 35–39 (2002).
  2. Hogan, B. Manipulating The Mouse Embryo: A Laboratory Manual 2nd edn (Cold Spring Harbor Laboratory Press, 1994).
  3. Haines, N. & Cotter, R. in Studies in Manic Depression Vol. 1 (ed. Boase, N.) Ch. 2 (Oxford Univ. Press, 1982).
  4. Babichev, S. A., Ries, J. & Lvovsky, A. I. Quantum scissors: teleportation of single-mode optical states by means of nonlocal single photon. Preprint at https://arxiv.org/abs/quant-ph/0208066 (2002).
  5. Gallotti, R. & Barthélemy, M. Source code for: The multilayer temporal network of public transport in Great Britain. Figshare https://doi.org/10.6084/m9.figshare.1249862.v1 (2014).
  6. Manaster, J. Sloth squeak. Scientific American Blog Network http://blogs.scientificamerican.com/psi-vid/2014/04/09/sloth-squeak (2014).
  7. Akutsu, T. Total Heart Replacement Device. Report No. NIH-NHLI-69 2185-4 (National Institutes of Health, 1974).
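For authors who assemble reference lists programmatically, the author-list rule and the journal-article pattern described above might be sketched as follows. This is a minimal illustration rather than an official formatter: the function names are hypothetical, names are assumed to be supplied already in 'Surname, I. N.' form, and plain text cannot express the italic journal name or bold volume number.

```python
def format_authors(authors: list[str]) -> str:
    """Join names already written as 'Surname, I. N.' into one author string.

    Six or more authors collapse to the first author followed by 'et al.',
    as described in the referencing rules above.
    """
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) >= 6:
        return f"{authors[0]} et al."
    if len(authors) == 1:
        return authors[0]
    # The last author is joined with '&', without a preceding comma.
    return ", ".join(authors[:-1]) + " & " + authors[-1]


def format_journal_reference(
    authors: list[str],
    title: str,
    journal: str,
    volume: int,
    pages: str,
    year: int,
) -> str:
    """Build a plain-text journal reference (no italics or bold)."""
    title = title.rstrip(".") + "."  # titles end with a single full stop
    return f"{format_authors(authors)} {title} {journal} {volume}, {pages} ({year})."


if __name__ == "__main__":
    print(
        format_journal_reference(
            ["Schott, D. H.", "Collins, R. N.", "Bretscher, A."],
            "Secretory vesicle transport velocity in living cells depends "
            "on the myosin V lever arm length",
            "J. Cell Biol.",
            156,
            "35–39",
            2002,
        )
    )
```

Books, chapters, preprints and reports follow different patterns and are not covered by this sketch.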

Data Citations

Data citations provide bibliographic information for any data records described or used in the manuscript. Comments do not have a separate data citations section, but may cite datasets with unique identifiers in the main reference list. Data citations should include a list of the authors, using the format described above for literature references. This should be followed by the repository name (in italics), the dataset accession number or DOI, and the year the data record was released, in parentheses. Author names should not be included unless they are formally recorded by the repository on the dataset landing page. Dataset titles are not allowed. See the examples below:

  1. Perkins, A. D., Lee, M. & Tanentzapf, G. Figshare https://doi.org/10.6084/m9.figshare.806269 (2014).
  2. GenBank PRJNA244495 (2014).
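The data citation pattern (optional authors, repository name, accession number or DOI, release year) can be expressed as a small helper. The sketch below is illustrative: `format_data_citation` is a hypothetical name, the author string is assumed to be formatted already, and the italic repository name cannot be represented in plain text.

```python
from typing import Optional


def format_data_citation(
    repository: str,
    identifier: str,
    year: int,
    authors: Optional[str] = None,
) -> str:
    """Build a plain-text data citation.

    `authors` is included only when the repository formally records
    author names on the dataset landing page; dataset titles are omitted.
    """
    parts = []
    if authors:
        parts.append(authors)
    parts.extend([repository, identifier, f"({year})."])
    return " ".join(parts)


if __name__ == "__main__":
    print(format_data_citation("GenBank", "PRJNA244495", 2014))
    print(
        format_data_citation(
            "Figshare",
            "https://doi.org/10.6084/m9.figshare.806269",
            2014,
            authors="Perkins, A. D., Lee, M. & Tanentzapf, G.",
        )
    )
```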

Figures & Tables

Manuscripts may reference figures (e.g. Figure 1), tables (e.g. Table 1) and Supplementary Information (e.g. Supplementary Table 1, Supplementary File 2, etc.). Please see the additional guidance below for submitting figures, tables and supplementary information.

All Data Descriptors should include one or more tables detailing the inputs (e.g. tissue samples, field sites, literature sources) and outputs (e.g. data files) that comprise the presented study. See the section below on "Submitting experimental metadata" for more information.

 

Check methods for transparency and reproducibility 

Methods should be described in enough detail to allow other researchers to interpret and repeat, if required, the full study. Authors should cite previous descriptions of the methods under use, but ideally the method descriptions should be complete enough for others to understand and reproduce the methods and processing steps without referring to associated publications. There is no limit to the length of the Methods sections.

For Data Descriptors, the Methods section should describe any steps or procedures used in producing the data, including full descriptions of the experimental design, data acquisition assays, and any computational processing (e.g. normalization, image feature extraction). Specific data outputs should be explicitly referenced via data citation (see Data Records and Data Citations).

Authors should review the transparent methods checklist below, and ensure that their manuscript complies with any relevant points. Authors are also encouraged to search FAIRsharing.org for community reporting standards that may be relevant to their specific data-type.

Transparent Methods Checklist

  1. Materials & reagents:
    • Identify commercial suppliers of reagents, instrumentation or kits, when the source is critical to the outcome of the experiments.
    • Declare any restrictions on the availability of unique materials (more information here).
    • Provide catalogue or clone numbers for all antibodies (if available). For primary antibodies, provide proof of validation for the relevant species and applications.
  2. Exclusion criteria: If any data or samples were excluded, explain the exclusion criteria and state in the methods whether the criteria were established before the study was conducted.
  3. Randomization & blinding: For any studies that involve assigning samples, animals or participants into different groups:
    • State clearly whether randomization methods were used. If randomization was not employed, this should be clearly stated.
    • State clearly whether blinding was employed during data collection. If blinding was not employed, this should be clearly stated.
  4. Animal & human studies (full journal policies here):
    • Experiments involving human subjects must identify the committee approving the experiments, and include a statement confirming that informed consent was obtained from all subjects.
    • Studies employing nonhuman animals should ensure that methods descriptions comply with the ARRIVE checklist.
  5. Cell lines:
    • For each eukaryotic cell line used, state the source and whether the cell line has been authenticated or otherwise tested for integrity.
    • If any commonly misidentified cell lines were used (see ICLAC or NCBI Biosample), justify their use.
    • Report whether the cell lines were tested for mycoplasma contamination.
  6. Code availability: For all studies using custom code a statement must be included in the Methods section, under the subheading "Code availability", indicating whether and how the code can be accessed, including any restrictions to access. This section should also include information on the versions of any software used, if relevant, and any specific variables or parameters used to generate, test, or process the current dataset. Please see our policy on code availability for more information.
  7. Chemistry & materials science: Manuscripts describing chemical syntheses, or characterizing new chemicals or materials should refer to the guidance at Nature Chemistry.

     

    Write a cover letter 

    Authors should provide a cover letter that includes the affiliation and contact information for the corresponding author, and briefly explains why the work should be considered appropriate for Scientific Data. Authors are asked to suggest the names and contact information for scientific referees, and may include suggestions for Editorial Board Members, as well as requesting the exclusion of certain referees. Authors should indicate whether they have had any prior discussions with a Scientific Data Editorial Board Member about the work described in the manuscript.

    We also ask that authors discuss any related works under consideration or in press at other journals in their cover letter. If this related work is cited in their Scientific Data submission, authors must provide a copy to facilitate peer review.

     

    Submit 

    Submit your manuscript and related files via our online system.

    For first submissions (i.e. not revised manuscripts), authors may submit a single PDF with integrated figures and tables – the figures may be inserted within the text at the appropriate positions, or grouped at the end.

    Authors should note that only the following file types should be uploaded:

    • For article text: DOC, DOCX, TEX
    • For figures: PDF, EPS, TIFF, JPG
    • For tables: tab- or comma-delimited text, XLS, XLSX, DOC, DOCX
    Supplementary Information files may also be uploaded: see further guidance here.
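Before uploading, the accepted file types listed above can be checked against a folder of submission files. The sketch below is an informal illustration, not part of the submission system: `ALLOWED_EXTENSIONS` and `is_allowed` are hypothetical names, and the extensions chosen for "tab- or comma-delimited text" (`.txt`, `.tsv`, `.csv`) and the spelling variants `.tif` and `.jpeg` are assumptions.

```python
from pathlib import Path

# Extensions taken from the list above. The delimited-text extensions and
# the .tif/.jpeg variants are assumptions, not stated in the guidelines.
ALLOWED_EXTENSIONS = {
    "article text": {".doc", ".docx", ".tex"},
    "figure": {".pdf", ".eps", ".tiff", ".tif", ".jpg", ".jpeg"},
    "table": {".txt", ".tsv", ".csv", ".xls", ".xlsx", ".doc", ".docx"},
}


def is_allowed(filename: str, category: str) -> bool:
    """Return True if the file extension is accepted for the given category."""
    if category not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unknown category: {category!r}")
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS[category]


if __name__ == "__main__":
    for name, category in [
        ("manuscript.docx", "article text"),
        ("Figure1.PNG", "figure"),
        ("table1.csv", "table"),
    ]:
        status = "ok" if is_allowed(name, category) else "not accepted"
        print(f"{name} ({category}): {status}")
```

The check deliberately ignores the single-PDF option available for first submissions, which is handled separately as described above.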

    Additional Guidance 

    Figures

    Data Descriptors, Analyses and Articles should usually not have more than three figures, but additional figures can be allowed on a case-by-case basis. In addition, a limited number of uncaptioned molecular structure graphics and numbered mathematical equations may be included if necessary.

    Scientific Data requires authors to present digital images in accord with the policies employed by the Nature-titled journals.

    Authors are responsible for obtaining permission to publish any figures or illustrations that are protected by copyright, including figures published elsewhere and pictures taken by professional photographers. The journal cannot publish images downloaded from the Internet without appropriate permission.

    Figures should be numbered separately with Arabic numerals in the order of occurrence in the text of the manuscript. One- or two-column format figures are required. When appropriate, figures should include error bars. A description of the statistical treatment of error analysis should be included in the figure legend.

    Figure lettering should be in a clear, sans-serif typeface (for example, Helvetica); the same typeface in the same font size should be used for all figures in a paper. Use Symbol font for Greek letters. All display items should be on a white background, and should avoid excessive boxing, unnecessary colour, spurious decorative effects (such as three-dimensional ‘skyscraper’ histograms) and highly pixelated computer drawings. The vertical axis of histograms should not be truncated to exaggerate small differences. Labelling must be of sufficient size and contrast to be readable, even after appropriate reduction. The thinnest lines in the final figure should be no smaller than one point wide. Authors will see a PDF proof that will include figures.

    Figures divided into parts should be labelled with a lower-case bold a, b, and so on, in the same type-size as used elsewhere in the figure. Lettering in figures should be in lower-case type, with only the first letter of each label capitalized. Units should have a single space between the number and the unit, and follow SI nomenclature (for example, ms rather than msec) or the nomenclature common to a particular field. Thousands should be separated by commas (1,000). Unusual units or abbreviations should be spelled out in full or defined in the legend. Scale bars should be used rather than magnification factors, with the length of the bar defined on the bar itself rather than in the legend. In legends, please use visual cues rather than verbal explanations such as ‘open red triangles’.

    Unnecessary figures should be avoided: data presented in small tables or histograms, for instance, can generally be described briefly in the text instead. Figures should not contain more than one panel unless the parts are logically connected; each panel of a multipart figure should be sized so that the whole figure can be reduced by the same amount and reproduced at the smallest size at which essential details are visible.

    Figures for peer-review

    At the initial submission stage authors may choose to upload separate figure files or to incorporate figures into the main article file, ensuring that any inserted figures are of sufficient quality to be clearly legible. When submitting a revised manuscript all figures must be uploaded as separate figure files ensuring that the image quality and formatting conforms to the specifications below.

    Figures for publication

    When creating and submitting final figure files, please follow the guidelines below. Failure to do so can significantly delay publication of your work.

    Each complete figure must be supplied as a separate file upload. Multi-part/panel figures must be prepared and arranged as a single image file (including all sub-parts; a, b, c, etc.). Please do not upload each panel individually.

    We encourage authors to prepare their figures in a quality vector graphics software package, such as Adobe Illustrator or Inkscape. Figures should then be saved directly in the EPS format. When importing graphs or schematics from other programs, authors are encouraged to remake any text labels in a vector graphics program to ensure consistent quality.

    Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    1. Line art, graphs, charts and schematics
    For optimal results, all line art, graphs, charts and schematics should be supplied in vector format, such as EPS, PDF or AI, and should be saved or exported as such directly from the application in which they were made. Please ensure that data points and axis labels are clearly legible.

    2. Photographic and bitmapped images
    All photographic and bitmap images should be supplied in a bitmap image format such as TIFF, JPG or PSD. If saving TIFF files, please ensure that the compression option is selected to avoid very large file sizes.

    Please do not supply Word or PowerPoint files with placed images. Images can be supplied as RGB or CMYK (note: we will not convert image colour modes).

    Figures that do not meet these standards will not reproduce well and may delay publication until we receive high-resolution images.

    3. Chemical structures
    Chemical structures should be produced using ChemDraw or a similar program. All chemical compounds must be assigned a bold Arabic numeral in the order in which the compounds are presented in the manuscript text. Structures should then be exported into a 300 dpi RGB TIFF file before being submitted.

    4. Stereo images
    Stereo diagrams should be presented for divergent ‘wall-eyed’ viewing, with the two panels separated by 5.5 cm. In the final accepted version of the manuscript, the stereo images should be submitted at their final page size.

    Figure legends

    Figure legends begin with a brief title sentence summarizing the purpose of the figure as a whole and continue with a short description of what is shown in each panel and an explanation of any symbols used. Legends must total no more than 350 words and may contain literature references.

    Each figure legend should contain, for each panel where they are relevant:

    • the exact sample size (n) for each experimental group/condition, given as a number, not a range;
    • a description of the sample collection allowing the reader to understand the independence of samples, clearly identifying any ‘technical replicates’ – i.e., repeated measurements on the same sample;
    • a statement of how many times the experiment shown was replicated in the laboratory;
    • definitions of statistical methods and measures: very common tests, such as t-tests, simple χ² tests, Wilcoxon and Mann-Whitney tests can be unambiguously identified by name only, but more complex techniques should be described in the Methods section;
    • definition of ‘centre values’ as median or average;
    • definition of error bars as s.d. or s.e.m.
    Any descriptions too long for the figure legend should be included in the Methods section. Please also refer to our statistical guidelines.

    Tables

    Authors may provide tables within the Word document or as separate files (tab-delimited text or Excel files). Legends, where needed, should be included in the Word document. Generally, Data Descriptors, Analyses and Articles should have fewer than ten tables, but more may be allowed when needed. Tables may be of any size, but only tables that fit onto a single printed page will be included in the PDF version of the article (up to a maximum of three).

    All Data Descriptors should include one or more tables detailing the inputs (e.g. tissue samples, field sites, literature sources) and outputs (e.g. data files) for the presented study. See the section below on "Submitting experimental metadata" for more information.

    Submitting experimental metadata

    Every Data Descriptor published by Scientific Data includes a machine-accessible metadata file. This metadata record provides a structured description of the dataset, including key features of the experimental samples and the techniques used to generate the data. Metadata is captured and distributed in ISA-Tab format, which is designed to capture descriptions of research data across disciplines.

    Using the ISA format, we aim to capture the following five key attributes about each published dataset in a structured and machine-accessible way, to maximise data discoverability:

    1. Source(s) ‐ What were your starting materials or inputs? These may be physical objects (e.g. mice or chemicals), or digital objects (e.g. published articles or data sources).
    2. Sample ‐ What part of each Source was used in the study?
    3. Characteristics ‐ What would future users of your data need to know about your sources and samples? Clearly list the differences between distinct samples.
    4. Protocol ‐ How did the samples become data? The protocols listed in the metadata record should match the sub-headings in the Methods section of the manuscript.
    5. Data ‐ Where is the data? This is a machine-accessible representation of the Data Citation section, and should include the repository name and each distinct dataset ID, clearly related to the sample from which it was derived.

    The ISA-Tab metadata files will be finalised with the help of our in-house Data Editor when the Data Descriptor is accepted for publication. Professional curation helps to ensure the use of consistent and standardized annotation using a core set of community ontologies, to facilitate machine accessibility. The metadata files form the basis of the search and discovery features that are incorporated into Scientific Data's publication platform.

    Authors who are familiar with the ISA-Tab format, or who wish to draft these files using third-party applications (e.g. ISA tools), are welcome to submit these directly as part of the manuscript submission process. Please also refer to our detailed ISA-Tab metadata specification.

    Authors who are not providing an ISA-Tab file with their manuscript submission are strongly encouraged to submit tables detailing the attributes above, to facilitate metadata creation by our in-house curator. These tables should be included in the main manuscript (e.g. Table 1). The tables should be provided as one or more tab-delimited text or Excel tables at initial submission, and be referenced in the main manuscript. We provide generic examples of metadata tables for experimental, observational and aggregate study types in our Word article templates (doc | docx).
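A tab-delimited table covering the five attributes above can be written with a few lines of code. The sketch below is one possible layout, not the journal's template or the ISA-Tab format itself: the column names simply reuse the five attribute names, and the example rows, repository name and identifiers are entirely hypothetical.

```python
import csv
import io

# Column names reuse the five attributes listed above (an assumption about
# layout; the journal's Word templates show the recommended table designs).
COLUMNS = ["Source", "Sample", "Characteristics", "Protocol", "Data"]

# Hypothetical example rows for illustration only.
EXAMPLE_ROWS = [
    {
        "Source": "Mouse 1",
        "Sample": "Liver",
        "Characteristics": "wild type, 8 weeks",
        "Protocol": "RNA sequencing",
        "Data": "ExampleRepository ABC0001",
    },
    {
        "Source": "Mouse 2",
        "Sample": "Liver",
        "Characteristics": "knockout, 8 weeks",
        "Protocol": "RNA sequencing",
        "Data": "ExampleRepository ABC0002",
    },
]


def write_metadata_table(rows, handle) -> None:
    """Write rows as tab-delimited text with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_metadata_table(EXAMPLE_ROWS, buffer)
    print(buffer.getvalue())
```

Passing an open file handle instead of the in-memory buffer produces a file that can be uploaded alongside the manuscript.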

    Where human data are involved, we recognize that privacy controls may preclude highly detailed descriptions of patients or participants within metadata records. Please make sure that any privacy-related limitations on data-sharing are discussed in the cover letter of your submission.

    The ISA-Tab metadata records are published under the CC0 licence, allowing other users and resources to ingest and mine this information without restriction. Please note that the metadata records are a value-added product and are not considered part of the 'version of record' of published articles. Therefore the ISA-Tab metadata files may be updated from time to time; for instance, to reflect changes in metadata formats or community ontologies.

    Equations

    Equations and mathematical expressions should be provided in the main text of the paper. Equations that are referred to in the text are identified by parenthetical numbers, such as (1), and are referred to in the manuscript as ‘equation (1)’.
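For LaTeX users, one way to obtain this numbering and citation style is sketched below. The formula (a sample mean) and the label `eq:mean` are placeholders chosen for illustration, and the use of the `amsmath` package is an assumption rather than a stated requirement of the journal template.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \eqref, which prints the number in parentheses

\begin{document}

The sample mean is defined in equation~\eqref{eq:mean}.

\begin{equation}
  \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
  \label{eq:mean}
\end{equation}

\end{document}
```

Here `\eqref{eq:mean}` renders as (1), so the sentence reads "equation (1)" in the compiled manuscript.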

    Supplementary information

    Scientific Data discourages authors from supplying text, figures or tables as supplementary files. As much as possible, these types of content should be included in the main manuscript. The main sections of the Data Descriptor manuscript, and particularly the Methods section, have no length limits. Data Descriptors are designed to be focused publications: if extensive supplementary text or figures are required, authors should consider whether the manuscript might best be subdivided into multiple Data Descriptors. Similarly, any primary data files should be deposited in an appropriate public repository, rather than included as Supplementary Information. Scientific Data does not allow statements of ‘data not shown’. Please see our data deposition policies.

    With these restrictions in mind, authors may use Supplementary Information for any additional content needed to support the Data Descriptor, such as media (e.g. audio or video), or machine-readable versions of mathematical models. Authors may supply code and computational models as Supplementary Information, particularly for initial submissions. However, upon acceptance of a manuscript, we encourage the public archiving of code (through a DOI-issuing repository) and of computational models (in field-specific computational model repositories). See our code availability policy for more information.

    The guidelines below detail the creation, citation and submission of Supplementary Information. Publication may be delayed if these are not followed correctly. Please note that modification of Supplementary Information after the paper is published requires a formal correction, so authors are encouraged to check their Supplementary Information carefully before submitting the final version.

    1. Designate each item as a Supplementary File and number accordingly: for example, ‘Supplementary File 1’. This numbering should be separate from that used in tables and figures appearing in the main article.
    2. Refer to each piece of supplementary material at the appropriate point(s) in the Data Descriptor. Be sure to include the word ‘Supplementary’ each time one is mentioned. Every piece of Supplementary Information must be mentioned at least once in the main article.
    3. Remember to include a brief title and legend (incorporated into the file to appear near the image) as part of every figure submitted, and a title as part of every table.
    4. File sizes should be as small as possible, with a maximum size of 10 MB, so that they can be downloaded quickly.
    5. Multiple supplementary figures should be merged into a single PDF file, with each figure legend immediately below its figure. A table of contents should be included on the first page, listing the page number of each supplementary figure.
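
    The size limit in item 4 can be checked before submission with a short script. The following is a minimal sketch, not a journal tool: the function name oversized_files is hypothetical, and it assumes that 10 MB means 10,000,000 bytes, which the guidelines do not specify.

```python
from pathlib import Path

# Assumption: "10 MB" is read as decimal megabytes (10,000,000 bytes).
MAX_BYTES = 10 * 1000 * 1000


def oversized_files(directory, max_bytes=MAX_BYTES):
    """Return (file name, size in bytes) for each file larger than max_bytes."""
    return sorted(
        (path.name, path.stat().st_size)
        for path in Path(directory).iterdir()
        if path.is_file() and path.stat().st_size > max_bytes
    )
```

    Calling oversized_files on the folder that holds the Supplementary Files returns an empty list when every file is within the limit.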

    Statistical guidelines

    Every submission that contains statistical analyses or data-processing steps must explain the statistical methods in detail either in the Methods or the relevant figure legend. Any special statistical code or software needed for scientists to reuse or reanalyse datasets should be discussed in the Usage Notes section of Data Descriptors. We encourage authors to make openly available any code or scripts that would help readers reproduce any data-processing steps (see our code availability policy). In addition, authors must ensure that the version of the data described and analysed in the Data Descriptor is permanently available so that others can reproduce any statistical analyses.

    Authors are encouraged to summarize their datasets with descriptive statistics in the Technical Validation section, which should include the n value for each dataset; a clearly labelled measure of centre (such as the mean or the median); and a clearly labelled measure of variability (such as standard deviation or range). Ranges are more appropriate than standard deviations or standard errors for small datasets. Graphs should include clearly labelled error bars. Authors must state whether a number that follows the ± sign is a standard error of the mean (s.e.m.) or a standard deviation (s.d.).
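
    As an illustration of the summary described above, the following sketch computes the n value, two measures of centre and three measures of variability using only the Python standard library. The function name describe is hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator) is the one intended; authors should report whichever definition they actually use.

```python
import math
import statistics


def describe(values):
    """Summarize a dataset with clearly labelled descriptive statistics."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a standard deviation")
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "s.d.": sd,
        "s.e.m.": sd / math.sqrt(n),  # standard error of the mean
        "range": (min(values), max(values)),
    }
```

    Labelling each value explicitly, as in the dictionary keys here, makes it unambiguous whether a reported spread is an s.d., an s.e.m. or a range.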

    Authors must clearly explain the independence of any replicate measurements, and ‘technical replicates’ – repeated measurements on the same sample – should be clearly identified.

    Data Descriptors should not test new hypotheses or provide extensive interpretive analysis, and therefore should not usually contain statistical significance testing. When hypothesis-based tests must be employed, authors should state the name of the statistical test; the n value for each statistical analysis; the comparisons of interest; a justification for the use of that test (including, for example, a discussion of the normality of the data when the test is appropriate only for normal data); the alpha level for all tests; whether the tests were one-tailed or two-tailed; and the actual p-value for each test (not merely ‘significant’ or ‘p < 0.05’). It should be clear what statistical test was used to generate every p-value. Use of the word ‘significant’ should always be accompanied by a p-value; otherwise, use ‘substantial’, ‘considerable’, etc. Multiple test correction must be used when appropriate and described in detail in the manuscript.
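
    The guidelines do not prescribe a particular correction method. As one possible example, the sketch below applies the Holm-Bonferroni step-down procedure to a list of p-values; the function name holm_adjust is hypothetical, and the choice of Holm-Bonferroni is an assumption made for illustration only.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

    Whichever method is used, the manuscript should name it and report the actual p-values, stating whether they are raw or adjusted.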

    Please also see our specific recommendations for figure legends.

    Genetic & chemical nomenclature

    Molecular structures are identified by bold Arabic numerals assigned in order of presentation in the text. Once identified in the main text or a figure, compounds may be referred to by their name, by a defined abbreviation or by the bold Arabic numeral (as long as the compound is referred to consistently as one of these three).

    When possible, authors should refer to chemical compounds and biomolecules using systematic nomenclature, preferably using IUPAC. Standard chemical and biological abbreviations should be used. Unconventional or specialist abbreviations should be defined at their first occurrence in the text.

    Authors should use approved nomenclature for gene symbols, and use symbols rather than italicized full names (for example Ttn, not titin). Please consult the appropriate nomenclature databases for correct gene names and symbols. A useful resource is NCBI Gene.

    Approved human gene symbols are provided by the HUGO Gene Nomenclature Committee (HGNC; e-mail: hgnc@genenames.org); see also www.genenames.org. Approved mouse symbols are provided by The Jackson Laboratory (e-mail: nomen@informatics.jax.org); see also www.informatics.jax.org/mgihome/nomen.

    For proposed gene names that are not already approved, please submit the gene symbols to the appropriate nomenclature committees as soon as possible, as these must be deposited and approved before publication of an article.

    Avoid listing multiple names of genes (or proteins) separated by a slash, as in ‘Oct4/Pou5f1’, as this is ambiguous (it could mean a ratio, a complex, alternative names or different subunits). Use one name throughout and include the other at first mention: ‘Oct4 (also known as Pou5f1)’.

    Instructions for LaTeX users

    To assist with formatting, we encourage authors to use the LaTeX Data Descriptor template provided by Overleaf. Authors submitting LaTeX files may use any of the standard class files such as article.cls, revtex.cls or amsart.cls. Non-standard fonts should be avoided; please use the default Computer Modern fonts. For the inclusion of graphics, we recommend graphicx.sty. Please use numerical references only for citations. There is no need to spend time visually formatting the manuscript: the Scientific Data style will be imposed when the paper is prepared for publication. References should be included within the manuscript file itself as our system cannot accept BibTeX bibliography files; authors who wish to use BibTeX to prepare their references should therefore copy the reference list from the .bbl file that BibTeX generates and paste it into the main manuscript .tex file (and delete the associated \bibliography and \bibliographystyle commands). As a final precaution, authors should ensure that the complete .tex file compiles successfully on their own system with no errors or warnings before submission.
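
    The copy-and-paste step for BibTeX users can also be scripted. The following is a minimal sketch rather than an official tool: the function name inline_bbl is hypothetical, and it assumes the manuscript contains a single, uncommented \bibliography command.

```python
import re


def inline_bbl(tex_source, bbl_source):
    """Return the .tex source with the BibTeX commands replaced by the .bbl text."""
    # Remove \bibliographystyle{...}; it is only needed when BibTeX is run.
    without_style = re.sub(
        r"\\bibliographystyle\{[^}]*\}[ \t]*\n?", "", tex_source
    )
    pattern = re.compile(r"\\bibliography\{[^}]*\}")
    if not pattern.search(without_style):
        raise ValueError("no \\bibliography{...} command found")
    # A function is used as the replacement so that backslashes in the
    # reference list are inserted literally rather than read as escapes.
    return pattern.sub(lambda _match: bbl_source.strip(), without_style, count=1)
```

    The inputs would be the text of the main .tex file and of the .bbl file that BibTeX generated; the returned string should then be saved and compiled once more to confirm that it builds without errors or warnings.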

    Consortia authorships

    If a consortium is included in the main author list, all members of the consortium are considered bona fide authors, and must be listed together with their affiliations at the end of the Author Contributions statement. The authors and affiliations for the consortium members are an extension of the main author list. Therefore, any affiliations already included in the main author list should not be repeated in the Author Contributions statement and the numbering of the affiliations in the consortium should continue in numerical order from those in the main author list – they should not start again from 1. If a member of the consortium already appears as an individual name in the main author list, then their affiliations should be identical in the consortium author list. The consortium itself should be acknowledged with the footnote "A full list of members appears in the Author Contributions". If you need to give credit to a consortium, a project or a group of people who do not meet authorship criteria, you can add a mention in the Acknowledgements section or elsewhere (in which case, a full list of members can be provided as a Supplementary Note in the Supplementary Information, if desired).