This page contains detailed information to help authors prepare, format and submit a Data Descriptor manuscript. Please see our guide to authors for additional information and policies relevant to authors.

Submission Guidelines

Scope guidelines

Data Descriptors submitted to Scientific Data should provide detailed descriptions of valuable research datasets, including the methods used to collect the data and technical analyses supporting the quality of the measurements. Data Descriptors focus on helping others reuse data, rather than testing hypotheses, or presenting new interpretations, methods or in-depth analyses. Relevant datasets must be deposited in an appropriate public repository prior to Data Descriptor submission, and the completeness of these datasets will be considered during editorial evaluation and peer-review. Datasets must be made publicly available without restriction in the event that the Data Descriptor is accepted for publication (excepting reasonable controls related to human privacy issues or public safety).

Data Descriptors are designed to be focused publications, but may describe multiple datasets when those datasets are closely related and derived from common samples or subjects. For example, a single Data Descriptor might describe genomic, transcriptomic and proteomic measurements from a set of related biological samples. The Editors reserve the right to ask authors to subdivide manuscripts if they exceed our scope definitions for a single coherent study, or would be too long to reasonably peer-review as a single document.

Scientific Data also welcomes compelling reports on research that advances the sharing and reuse of scientific data. Learn more about our other content types.

Preparing and submitting a Data Descriptor manuscript

Please follow the steps below when preparing initial submissions to Scientific Data:

  1. Deposit your data in an appropriate repository. Browse our list of recommended data repositories, and read our full data deposition policies. Authors may also upload their data to figshare or to Dryad during manuscript submission (find out more here).
  2. Draft the main Data Descriptor manuscript. See our manuscript templates and format requirements. We encourage authors to draft tables describing their samples, assays and data outputs (see our metadata guidelines).
  3. Submit your manuscript and related files via our online system.

For first submissions (i.e. not revised manuscripts), authors may submit a single PDF with integrated figures and tables – the figures may be inserted within the text at the appropriate positions, or grouped at the end.

Authors should note that only the following file types can be uploaded:

  • For article text: DOC, DOCX, TEX
  • For figures: PDF, EPS, TIFF, JPG
  • For tables: tab- or comma-delimited text, XLS, XLSX.
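
As an illustration only, the file types listed above could be checked locally before upload. This is a minimal sketch: the function name and category keys are hypothetical and are not part of the submission system, and mapping "tab- or comma-delimited text" to the .txt, .tsv and .csv extensions is an assumption.

```python
from pathlib import Path

# Allowed extensions per file category, taken from the list above.
# Treating delimited text as .txt/.tsv/.csv is an assumption of this sketch.
ALLOWED = {
    "text": {".doc", ".docx", ".tex"},
    "figure": {".pdf", ".eps", ".tiff", ".jpg"},
    "table": {".txt", ".tsv", ".csv", ".xls", ".xlsx"},
}


def is_allowed(filename: str, category: str) -> bool:
    """Return True if the file extension is listed for the given category."""
    return Path(filename).suffix.lower() in ALLOWED[category]
```

For example, `is_allowed("Figure1.TIFF", "figure")` is true because the comparison ignores case, whereas a PNG figure would be flagged.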

Supplementary Information files may also be uploaded: see allowed file types here.

Data Descriptors must be clearly written, and should be understandable by scientists from diverse backgrounds, not just specialists. Technical jargon should be avoided as far as possible and clearly explained where its use is necessary. Titles and abstracts in particular should be written in language that will be readily intelligible to any scientist. We strongly recommend that authors ask a colleague with different expertise to review the manuscript before submission, in order to identify concepts and terminology that may present difficulties for non-specialist readers. Abbreviations, particularly those that are non-standard, should also be kept to a minimum and, where unavoidable, should be defined in the text or legends at their first occurrence.

Manuscripts published in Scientific Data are not subject to in-depth copy editing. Authors are responsible for procuring copy editing or language editing services for their manuscripts, either before submission, or at the revision stage, should they feel it would benefit their manuscript. Such services include those provided by our affiliates Nature Research Editing Service and American Journal Experts. Please note that the use of such a service is at the author's own expense and in no way implies that the article will be selected for peer review or accepted for publication.


To assist with formatting, we encourage authors to use the LaTeX Data Descriptor template provided by Overleaf.

Authors submitting LaTeX files may use any of the standard class files such as article.cls, revtex.cls or amsart.cls. Non-standard fonts should be avoided; please use the default Computer Modern fonts. For the inclusion of graphics, we recommend graphicx.sty. Please use numerical references only for citations. There is no need to spend time visually formatting the manuscript: the Scientific Data style will be imposed when the paper is prepared for publication. References should be included within the manuscript file itself as our system cannot accept BibTeX bibliography files; authors who wish to use BibTeX to prepare their references should therefore copy the reference list from the .bbl file that BibTeX generates and paste it into the main manuscript .tex file (and delete the associated \bibliography and \bibliographystyle commands). As a final precaution, authors should ensure that the complete .tex file compiles successfully on their own system with no errors or warnings before submission.
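
The copy-and-paste step for BibTeX users can also be scripted. The sketch below is one possible approach, not a tool provided by the journal: the function name is hypothetical, and it assumes the manuscript contains a single \bibliography command and that the .bbl text was produced by a prior BibTeX run.

```python
import re


def inline_bibliography(tex_source: str, bbl_source: str) -> str:
    r"""Paste the .bbl reference list in place of \bibliography{...}.

    The \bibliographystyle{...} command is removed, as the guidelines request.
    """
    # Drop the \bibliographystyle{...} command and the rest of its line.
    without_style = re.sub(
        r"\\bibliographystyle\{[^}]*\}[ \t]*\n?", "", tex_source
    )
    # Replace the first \bibliography{...} with the generated reference list.
    # A function is used as the replacement so that backslashes in the
    # .bbl text are inserted literally rather than parsed as escapes.
    return re.sub(
        r"\\bibliography\{[^}]*\}",
        lambda _match: bbl_source.strip(),
        without_style,
        count=1,
    )
```

In practice the two arguments would be read from the manuscript .tex file and the matching .bbl file, and the result written to a new .tex file that is then compiled once more to confirm that it builds cleanly.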

Authors are encouraged to use our templates when preparing a Data Descriptor manuscript. We provide manuscript templates in Word (doc | docx), and Excel templates to help authors provide detailed information about their samples, methods and data outputs (xls | xlsx). The Word and Excel templates can also be downloaded in a single zip package.

A LaTeX template is provided by Overleaf. Authors may download this template and use it locally, or draft and submit their manuscript through the Overleaf online collaborative writing system. See our additional instructions for LaTeX users.

Cover letter

Authors should provide a cover letter that includes the affiliation and contact information for the corresponding author, and that briefly explains why the work should be considered appropriate for Scientific Data. Authors are asked to suggest the names and contact information for scientific referees, and may include suggestions for Editorial Board Members, as well as requesting the exclusion of certain referees. Authors should indicate whether they have had any prior discussions with a Scientific Data Editorial Board Member about the work described in the manuscript.

We also ask that authors discuss any related works under consideration or in press at other journals in their cover letter. If this related work is cited in their Scientific Data submission, authors should provide a copy to facilitate peer review.

Format of Data Descriptor manuscripts

The main elements of a Data Descriptor manuscript are:
Title | Authors & Affiliations | Abstract | Background & Summary | Methods | Data Records | Technical Validation | Usage Notes | Acknowledgements | Author contributions | Competing interests | Figures & Tables | References | Data Citations


Title

110 characters maximum, including whitespace

Titles should avoid the use of acronyms and abbreviations where possible. Colons and parentheses are not permitted.
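
The mechanical parts of these title rules are easy to check automatically. The following is a minimal sketch with a hypothetical function name; it covers only the length limit and the ban on colons and parentheses, and it does not attempt to detect acronyms or abbreviations.

```python
def title_problems(title: str, max_chars: int = 110) -> list[str]:
    """Return a list of guideline violations found in a candidate title."""
    problems = []
    # The limit counts every character, including spaces.
    if len(title) > max_chars:
        problems.append(
            f"title has {len(title)} characters (maximum {max_chars})"
        )
    if ":" in title:
        problems.append("title contains a colon")
    if "(" in title or ")" in title:
        problems.append("title contains parentheses")
    return problems
```

An empty list means that none of the three checks failed; it does not mean the title is otherwise suitable.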

Authors & Affiliations

Author affiliations should provide enough detail for the author to be reached, including the department, institution, address, postal code and country wherever possible. They will be cited in numerical order within the author list, starting with the affiliations of the first author. Authors may acknowledge up to six equally contributing authors and up to six joint supervisors within the affiliations list using the standard footnotes "These authors contributed equally to this work" and "These authors jointly supervised this work". All other contributions should be described in the author contributions statement.

If a consortium is included in the main author list, all members of the consortium are considered bona fide authors, and must be listed together with their affiliations at the end of the Author Contributions statement. The authors and affiliations for the consortium members are an extension of the main author list. Therefore any affiliations already included in the main author list should not be repeated in the Author Contributions statement, and the numbering of the affiliations in the consortium should continue in numerical order from those in the main author list; they should not start again from 1. If a member of the consortium already appears as an individual name in the main author list, then their affiliations should be identical in the consortium author list. The consortium itself should be acknowledged with the footnote "A full list of members appears in the Author Contributions". If you need to give credit to a consortium, a project or a group of people who do not meet authorship criteria, you can add a mention in the Acknowledgements section or elsewhere (in which case, a full list of members can be provided as a Supplementary Note in the Supplementary Information, if desired). For guidelines on authorship and consortia, please visit:


Abstract

170 words maximum, no references

The Abstract should succinctly describe the study, the assay(s) performed, the resulting data and their reuse potential; it should not make any claims regarding new scientific findings. No references are allowed in this section.

Background & Summary

700 words maximum

The Background & Summary should provide an overview of the study design, the assay(s) performed and the data generated, including any background information needed to relate the work to previous studies or the literature, and should reference literature as needed. This section should also briefly outline both the broader goals that motivated collection of the data and their potential reuse value. We encourage authors to include a figure that provides a schematic overview of the study and assay design.
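
The word limits for the Abstract (170 words) and the Background & Summary (700 words) can be checked with a short script. This is a rough sketch only: the section keys and function names are hypothetical, and counting whitespace-separated tokens is an assumption that may differ slightly from the count used by a word processor or by the journal.

```python
# Word limits stated in this guide; the keys are hypothetical labels.
WORD_LIMITS = {"abstract": 170, "background_and_summary": 700}


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def exceeds_limit(section: str, text: str) -> bool:
    """Return True if the text is longer than the limit for the section."""
    return word_count(text) > WORD_LIMITS[section]
```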

Methods

The Methods should include detailed text describing any steps or procedures used in producing the data, including full descriptions of the experimental design, data acquisition assays, and any computational processing (e.g. normalization, image feature extraction). Related methods should be grouped under corresponding subheadings where possible, and methods should be described in enough detail to allow other researchers to interpret and repeat, if required, the full study. Specific data outputs should be explicitly referenced via data citation (see Data Records and Data Citations).

Authors should cite previous descriptions of the methods under use, but ideally the method descriptions should be complete enough for others to understand and reproduce the methods and processing steps without referring to associated publications. There is no limit to the length of the Methods section.

For all studies using custom code in the generation or processing of datasets, a statement must be included in the Methods section, under the subheading "Code availability", indicating whether and how the code can be accessed, including any restrictions to access. This section should also include information on the versions of any software used, if relevant, and any specific variables or parameters used to generate, test, or process the current dataset. Please see our policy on code availability for more information.

Commercial suppliers of reagents or instrumentation should be identified when the source is critical to the outcome of the experiments. Sources for kits should be identified.

If the experiments involve synthesis of a new compound, a detailed synthesis protocol should be included in the Methods. The systematic name of the compound and its bold Arabic numeral should be used as the subheading for the synthesis protocol. Thereafter, the compound should be referred to by its assigned bold numeral. Standard abbreviations for reagents and solvents are encouraged. Safety hazards posed by reagents or protocols should be identified clearly. Isolated mass and percent yields should be reported at the end of each protocol.

Data Records

The Data Records section should be used to explain each data record associated with this work, including the repository where this information is stored, and to provide an overview of the data files and their formats. Each external data record should be cited using the data citation format presented at the end of this template (e.g. "Data resulting from Method X can be found in xxxxx.txt [Data Citation 1]"). A data citation should also be placed in the subsection of the Methods containing the data-collection or analytical procedure(s) used to derive the corresponding record.

Tables should be used to support the data records, and should clearly indicate the samples and subjects, their provenance, and the experimental manipulations performed on each (please see Tables and Submitting Experimental Metadata, below). They should also specify the data output resulting from each data-collection or analytical step, should these form part of the archived record.

Technical Validation

The Technical Validation section should present any experiments or analyses that are needed to support the technical quality of the dataset. This section may be supported by figures and tables, as needed.

Possible content may include:

  • experiments that support or validate the data-collection procedure(s) (e.g. negative controls, or an analysis of standards to confirm measurement linearity);
  • statistical analyses of experimental error and variation;
  • phenotypic or genotypic assessments of biological samples (e.g. confirming disease status, cell-line identity or the success of perturbations);
  • general discussions of any procedures used to ensure reliable and unbiased data production, such as blinding and randomization, sample tracking systems, etc.;
  • any other information needed for assessment of technical rigour by the referees.

Generally, this should not include:

  • follow-up experiments aimed at testing or supporting an interpretation of the data;
  • statistical hypothesis-testing (e.g. tests of statistical significance, identifying differentially expressed genes, trend analysis, etc.);
  • exploratory computational analyses like clustering and annotation enrichment (e.g. GO analysis).

Usage Notes

This section is optional

The Usage Notes should contain brief instructions to assist other researchers with reuse of the data. This may include discussion of software packages that are suitable for analysing the assay data files, suggested downstream processing steps (e.g. normalization, etc.), or tips for integrating or comparing the data records with other datasets. Authors are encouraged to provide code, programs or data-processing workflows if they may help others understand or use the data. Please see our code availability policy for advice on supplying custom code alongside Data Descriptor manuscripts.

Acknowledgements

The Acknowledgements should contain text acknowledging non-author contributors. Acknowledgements should be brief, and should not include thanks to anonymous referees and editors or effusive comments. Grant or contribution numbers may be acknowledged.

Author Contributions

Each author's contribution to the work should be described briefly, on a separate line, in the Author Contributions section: please see also the Nature journals' authorship policies.

Competing interests

A competing financial interests statement is required for all papers accepted by and published in Scientific Data. If there is no conflict of interest, a statement declaring this must still be included in the manuscript (e.g. "The author(s) declare no competing financial interests"). Please see our policies for more information on what may constitute a competing interest.

Figures & Tables

The Data Descriptor document may reference figures (e.g. Figure 1), tables (e.g. Table 1) and Supplementary Information (e.g. Supplementary Table 1, Supplementary File 2, etc.). For ISA-Tab users, please reference the metadata documents as "see associated Metadata Record". In most cases, when information from metadata documents is central to the Data Descriptor manuscript, it should also be included in the main manuscript as Tables, and formatted in a way that suits human readability.


Figures

Figure images should be provided as separate files and should be referred to using a consistent numbering scheme through the entire Data Descriptor. In most cases, a Data Descriptor should not contain more than three figures, but more may be allowed when needed. We discourage the inclusion of figures in the Supplementary Information – all key figures should be included here in the main Figure section.

For initial submissions, authors may choose to supply a single PDF with embedded figures.

Authors are encouraged to consider creating a figure that outlines the experimental workflow(s) used to generate and analyse the data output(s).

Figure Legends

Figure legends begin with a brief title sentence summarizing the purpose of the figure as a whole and continue with a short description of what is shown in each panel and an explanation of any symbols used. Legends must total no more than 350 words and may contain literature references.

Each figure legend should contain, for each panel where they are relevant:

  • the exact sample size (n) for each experimental group/condition, given as a number, not a range;
  • a description of the sample collection allowing the reader to understand the independence of samples, clearly identifying any ‘technical replicates’ – i.e., repeated measurements on the same sample;
  • a statement of how many times the experiment shown was replicated in the laboratory;
  • definitions of statistical methods and measures: very common tests, such as t-tests, simple χ² tests, Wilcoxon and Mann-Whitney tests can be unambiguously identified by name only, but more complex techniques should be described in the Methods section;
  • definition of ‘centre values’ as median or average;
  • definition of error bars as s.d. or s.e.m.

Any descriptions too long for the figure legend should be included in the Methods section. Please also refer to our statistical guidelines.


Tables

Authors are encouraged to provide one or more tables that provide basic information on the main ‘inputs’ to the study (e.g. samples, participants, or information sources) and the main data outputs of the study; see also Submitting experimental metadata. Tables in the manuscript should generally not be used to present primary data (i.e. measurements). Tables containing primary data should be submitted to an appropriate data repository.

Authors may provide tables within the Word document or as separate files (tab-delimited text or Excel files). Legends, where needed, should be included in the Word document. Generally, a Data Descriptor should have fewer than ten tables, but more may be allowed when needed. Tables may be of any size, but only tables that fit onto a single printed page will be included in the PDF version of the article (up to a maximum of three).

References

References should be numbered sequentially, first throughout the text, then in tables, followed by figures and, finally, boxes; that is, references that only appear in tables, figures or boxes should be last in the reference list. Only one publication is given for each number. Only papers that have been published or accepted by a named publication or recognized preprint server should be in the numbered list; preprints of accepted papers in the reference list should be submitted with the manuscript. Published conference abstracts, numbered patents, and archived code with an assigned DOI may be included in the reference list. Grant details and acknowledgments are not permitted as numbered references. Footnotes are not used.

BibTeX bibliography files cannot be accepted. LaTeX submissions must contain all references within the manuscript .tex file itself.

The correct abbreviation for Scientific Data is 'Sci. Data'.

Scientific Data uses standard Nature referencing style. All authors should be included in reference lists unless there are six or more, in which case only the first author should be given, followed by ‘et al.’. Authors should be listed last name first, followed by a comma and initials (followed by full stops, '.') of given names. Article titles should be in Roman text; only the first word of the title should have an initial capital and the title should be written exactly as it appears in the work cited, ending with a full stop. Book titles should be given in italics and all words in the title should have initial capitals. Journal names are italicized and abbreviated (with full stops) according to common usage. Volume numbers and the subsequent comma appear in bold. The full page range should be given where appropriate. See the examples below:

  • Journal Article:
    1. Schott, D. H., Collins, R. N. & Bretscher, A. Secretory vesicle transport velocity in living cells depends on the myosin V lever arm length. J. Cell Biol. 156, 35–39 (2002).

  • Book – Book titles should be given in italics and all words in the title should have initial capitals:
    2. Hogan, B. Manipulating The Mouse Embryo: A Laboratory Manual 2nd edn (Cold Spring Harbor Laboratory Press, 1994).

  • Book chapter:
    3. Haines, N. & Cotter, R. in Studies in Manic Depression Vol. 1 (ed. Boase, N.) Ch. 2 (Oxford Univ. Press, 1982).

  • Publicly available preprint:
    4. Babichev, S. A., Ries, J. & Lvovsky, A. I. Quantum scissors: teleportation of single-mode optical states by means of nonlocal single photon. Preprint at (2002).

  • Code:
    5. Gallotti, R. & Barthélemy, M. Source code for: The multilayer temporal network of public transport in Great Britain. Figshare (2014).

  • Online material – Stable documents hosted on the web may be cited in the main reference list, using the format below. Websites or dynamic web resources should be cited by embedding the URL in the main article text:
    6. Manaster, J. Sloth squeak. Scientific American Blog Network (2014).
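
The author-list rule described above (all authors listed last name first, or only the first author followed by ‘et al.’ when there are six or more) can be sketched as a small helper. The function name and the (last name, initials) input format are assumptions made for illustration; the sketch handles only the author string, not the rest of the reference.

```python
def format_authors(authors: list[tuple[str, str]]) -> str:
    """Format (last_name, initials) pairs as described in the text above."""
    if not authors:
        raise ValueError("at least one author is required")
    names = [f"{last}, {initials}" for last, initials in authors]
    if len(names) >= 6:
        # Six or more authors: first author only, followed by 'et al.'
        return f"{names[0]} et al."
    if len(names) == 1:
        return names[0]
    # Otherwise join with commas and place an ampersand before the last name.
    return ", ".join(names[:-1]) + " & " + names[-1]
```

Applied to the three authors of the journal article example, the helper reproduces the author string shown in that entry.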

Data Citations

Data citations provide bibliographic information for any data records described or used in the manuscript.

For data with a digital object identifier (DOI) this should be in the format: Lastname1, Initial1A. Initial1B., Lastname2, Initial2A. Initial2B., … & LastnameN, InitialNA. InitialNB. Repository_name DOI (YYYY).

Example citation for data with a DOI:
Perkins, A. D., Lee, M. & Tanentzapf, G. Figshare (2014).

For data identified by accession ID, repositories may not provide the data creator (author). In these cases the data citation format should be Repository_name Accession_ID (YYYY).

Example citation for data with an accession identifier:
GenBank PRJNA244495 (2014).
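
Both data citation formats can be produced by a single helper. This is a hedged sketch with a hypothetical function name: it assumes the author string has already been formatted, and the DOI used in the usage note below is a placeholder rather than a real identifier.

```python
def data_citation(
    repository: str, identifier: str, year: int, authors: str = ""
) -> str:
    """Build a data citation from a DOI or an accession identifier.

    When the repository does not provide a data creator, leave `authors`
    empty and the citation starts with the repository name.
    """
    parts = [authors] if authors else []
    parts += [repository, identifier, f"({year})."]
    return " ".join(parts)
```

Calling `data_citation("GenBank", "PRJNA244495", 2014)` gives the accession example shown above. A call such as `data_citation("Figshare", "10.0000/placeholder", 2014, authors="Lastname1, A. & Lastname2, B.")` illustrates the DOI form with invented values.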

Submitting experimental metadata

Every Data Descriptor published by Scientific Data includes a machine-accessible metadata file. This metadata record provides a structured description of the dataset, including key features of the experimental samples and the techniques used to generate the data. Metadata is captured and distributed in ISA-Tab format, which is designed to capture descriptions of research data across disciplines.

Using the ISA format, we aim to capture the following five key attributes about each published dataset in a structured and machine-accessible way, to maximise data discoverability:

  1. Source(s) – What were your starting materials? These may be physical objects (e.g. mice or chemicals), or digital objects (e.g. published articles).
  2. Sample – What part of each Source was used in the study?
  3. Characteristics – What would future users of your data need to know about your sources and samples? Clearly list the differences between distinct samples.
  4. Protocol – How did the samples become data? The protocols listed in the metadata record should match the sub-headings in the Methods section of the manuscript.
  5. Data – Where is the data? This is a machine-accessible representation of the Data Citation section, and should include the repository name and each distinct dataset ID, clearly related to the sample from which it was derived.

The ISA-Tab metadata files will be finalised with the help of our in-house Data Editor when the Data Descriptor is accepted for publication. Professional curation helps to ensure the use of consistent and standardized annotation using a core set of community ontologies, to facilitate machine accessibility. The metadata files form the basis of the search and discovery features that are incorporated into Scientific Data's publication platform.

Authors who are familiar with the ISA-Tab format, or who wish to draft these files using third-party applications (e.g. ISA tools), are welcome to submit these directly as part of the manuscript submission process. Please also refer to our detailed ISA-Tab metadata specification.

Authors who are not providing an ISA-Tab file with their manuscript submission are strongly encouraged to submit tables detailing the attributes above, to facilitate metadata creation by our in-house curator. These tables should be included in the main manuscript (e.g. Table 1). The tables should be provided as one or more tab-delimited text or Excel tables at initial submission, and be referenced in the main manuscript. We provide generic examples of metadata tables for experimental, observational and aggregate study types in our Word article templates.
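
A tab-delimited table covering the five attributes listed earlier in this section can be written with a few lines of code. The sketch below is illustrative only: it does not produce ISA-Tab, the one-column-per-attribute layout and the function name are assumptions, and the row values are invented placeholders.

```python
import csv
import io

# One column per attribute named in this section (layout is an assumption).
COLUMNS = ["Source", "Sample", "Characteristics", "Protocol", "Data"]


def metadata_table(rows: list[dict[str, str]]) -> str:
    """Serialise rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row; real tables would hold one row per sample.
example = metadata_table([
    {
        "Source": "Mouse 1",
        "Sample": "liver",
        "Characteristics": "wild type",
        "Protocol": "RNA sequencing",
        "Data": "Repository_name Accession_ID",
    },
])
```

The resulting text can be saved with a .txt or .tsv extension and opened in any spreadsheet application.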

Where human data are involved, we recognize that privacy controls may preclude highly detailed descriptions of patients or participants within metadata records. Please make sure that any privacy-related limitations on data-sharing are discussed in the cover letter of your submission.

The ISA-Tab metadata records are published under the CC0 licence, allowing other users and resources to ingest and mine this information without restriction. Please note that the metadata records are a value-added product and are not considered part of the 'version of record' of published articles. Therefore the ISA-Tab metadata files may be updated from time to time; for instance, to reflect changes in metadata formats or community ontologies.

Statistical guidelines

Every Data Descriptor that contains statistical analyses or data-processing steps must explain the statistical methods in detail either in the Methods or the relevant figure legend. Any special statistical code or software needed for scientists to reuse or reanalyse datasets should be discussed in the Usage Notes section of the Data Descriptor. We encourage authors to make openly available any code or scripts that would help readers reproduce any data-processing steps (see our code availability policy). In addition, authors must ensure that the version of the data described and analysed in the Data Descriptor is permanently available so that others can reproduce any statistical analyses.

Authors are encouraged to summarize their datasets with descriptive statistics in the Technical Validation section, which should include the n value for each dataset; a clearly labelled measure of centre (such as the mean or the median); and a clearly labelled measure of variability (such as standard deviation or range). Ranges are more appropriate than standard deviations or standard errors for small datasets. Graphs should include clearly labelled error bars. Authors must state whether a number that follows the ± sign is a standard error (s.e.m.) or a standard deviation (s.d.).
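
The descriptive statistics named above can be computed with the Python standard library. This is a minimal sketch with a hypothetical function name; it uses the sample standard deviation (n - 1 denominator), which is an assumption about the intended definition of s.d., and it derives the s.e.m. as s.d. divided by the square root of n. At least two values are required.

```python
import math
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summarise a dataset with n, centre, variability and range."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
        "min": min(values),
        "max": max(values),
    }
```

Reporting both "sd" and "sem" under explicit labels makes it unambiguous which quantity follows a ± sign in the text or in error bars.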

Authors must clearly explain the independence of any replicate measurements, and ‘technical replicates’ – repeated measurements on the same sample – should be clearly identified.

Data Descriptors should not test new hypotheses or provide extensive interpretive analysis, and therefore should not usually contain statistical significance testing. When hypothesis-based tests must be employed, authors should state the name of the statistical test; the n value for each statistical analysis; the comparisons of interest; a justification for the use of that test (including, for example, a discussion of the normality of the data when the test is appropriate only for normal data); the alpha level for all tests; whether the tests were one-tailed or two-tailed; and the actual p-value for each test (not merely ‘significant’ or ‘p < 0.05’). It should be clear what statistical test was used to generate every p-value. Use of the word ‘significant’ should always be accompanied by a p-value; otherwise, use ‘substantial’, ‘considerable’, etc. Multiple test correction must be used when appropriate and described in detail in the manuscript.
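
The guidelines do not prescribe a particular multiple test correction. As one common and simple choice, the Bonferroni adjustment multiplies each p-value by the number of tests and caps the result at 1. The sketch below uses a hypothetical function name and is offered only as an example of what "described in detail" might refer to, not as the required method.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Return Bonferroni-adjusted p-values, capped at 1.0."""
    m = len(p_values)  # number of tests performed
    return [min(1.0, p * m) for p in p_values]
```

Whichever method is used, the manuscript should name it and report the actual adjusted p-values alongside the unadjusted ones.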

Please also see our specific recommendations for figure legends.

Sampling inclusion criteria and blinding

  • Data Descriptor manuscripts must describe any inclusion/exclusion criteria if samples or animals were excluded from the dataset, including an explicit statement of whether the criteria were pre-established.
  • If a method of randomization was used to determine how samples/animals were allocated to experimental groups and processed, the method should be clearly described. For animal studies, authors are asked to include a statement about randomization even if no randomization was used.
  • If the investigator was blinded to the group allocation during the experiment and/or when assessing the outcome, state the extent of blinding. Again, for animal studies, authors are asked to include a statement about blinding even if no blinding was done.

Chemical and biological nomenclature and abbreviations

Molecular structures are identified by bold Arabic numerals assigned in order of presentation in the text. Once identified in the main text or a figure, compounds may be referred to by their name, by a defined abbreviation or by the bold Arabic numeral (as long as the compound is referred to consistently as one of these three).

When possible, authors should refer to chemical compounds and biomolecules using systematic nomenclature, preferably using IUPAC. Standard chemical and biological abbreviations should be used. Unconventional or specialist abbreviations should be defined at their first occurrence in the text.

Gene nomenclature

Authors should use approved nomenclature for gene symbols, and use symbols rather than italicized full names (for example Ttn, not titin). Please consult the appropriate nomenclature databases for correct gene names and symbols. A useful resource is NCBI Gene.

Approved human gene symbols are provided by the HUGO Gene Nomenclature Committee (HGNC). Approved mouse symbols are provided by The Jackson Laboratory.

For proposed gene names that are not already approved, please submit the gene symbols to the appropriate nomenclature committees as soon as possible, as these must be deposited and approved before publication of an article.

Avoid listing multiple names of genes (or proteins) separated by a slash, as in ‘Oct4/Pou5f1’, as this is ambiguous (it could mean a ratio, a complex, alternative names or different subunits). Use one name throughout and include the other at first mention: ‘Oct4 (also known as Pou5f1)’.

Characterization of chemical and biomolecular materials

Manuscripts submitted to Scientific Data will be held to rigorous standards with respect to experimental methods and characterization of new compounds. Authors must provide adequate data to support their assignment of identity and purity for each new compound described in the manuscript. Authors should provide a statement confirming the source, identity and purity of known compounds that are central to the scientific study, even if they are purchased or resynthesized using published methods.

Relevant data should be deposited in an appropriate repository according to our data deposition policies.

1. Chemical identity

Chemical identity for organic and organometallic compounds should be established through spectroscopic analysis. Standard peak listings (see formatting guidelines below) for 1H NMR and proton-decoupled 13C NMR should be provided for all new compounds. Other NMR data should be reported (31P NMR, 19F NMR, etc.) when appropriate. For new materials, authors should also provide mass spectral data to support molecular weight identity. High-resolution mass spectral (HRMS) data are preferred. UV or IR spectral data may be reported for the identification of characteristic functional groups when appropriate. Melting-point ranges should be provided for crystalline materials. Specific rotations may be reported for chiral compounds. Authors should provide references, rather than detailed procedures, for known compounds, unless their synthesis protocols represent a departure from or improvement on published methods.

2. Combinatorial compound libraries

Authors describing the preparation of combinatorial libraries should include standard characterization data for a diverse panel of library components.

3. Biomolecular identity

For new biopolymeric materials (oligosaccharides, peptides, nucleic acids, etc.), direct structural analysis by NMR spectroscopic methods may not be possible. In these cases, authors must provide evidence of identity based on sequence (when appropriate) and mass spectral characterization.

4. Biological constructs

Authors should provide sequencing or functional data that validate the identity of their biological constructs (plasmids, fusion proteins, site-directed mutants, etc.) either in the manuscript text or the Methods section, as appropriate.

5. Sample purity

Evidence of sample purity is requested for each new compound. Methods for purity analysis depend on the compound class. For most organic and organometallic compounds, purity may be demonstrated by high-field 1H NMR or 13C NMR data, although elemental analysis (±0.4%) is encouraged for small molecules. Quantitative analytical methods including chromatographic (GC, HPLC, etc.) or electrophoretic analyses may be used to demonstrate purity for small molecules and polymeric materials.
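As an illustration of the elemental analysis check, the following minimal Python sketch computes calculated mass percentages from a molecular formula and compares them with found values. It assumes that ±0.4% means an absolute difference of 0.4 percentage points, uses rounded standard atomic weights for a handful of elements, and handles only simple formulas without brackets. The function names are hypothetical and are not part of any journal tooling.

```python
import re

# Rounded standard atomic weights; extend this table for other elements.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}


def parse_formula(formula):
    """Parse a simple formula such as 'C20H15Cl2NO5' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def calculated_percentages(formula):
    """Return the calculated mass percentage of each element in the formula."""
    counts = parse_formula(formula)
    total = sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items())
    return {el: 100.0 * ATOMIC_WEIGHTS[el] * n / total for el, n in counts.items()}


def within_tolerance(formula, found, tolerance=0.4):
    """Map each reported element to True if |calcd - found| <= tolerance.

    Assumption: the tolerance is in absolute percentage points.
    """
    calcd = calculated_percentages(formula)
    return {el: abs(calcd[el] - value) <= tolerance for el, value in found.items()}
```

For instance, within_tolerance("C20H15Cl2NO5", {"C": 57.22, "H": 3.61}) reports both elements as within tolerance under these assumptions.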

6. Spectral data

Detailed spectral data for new compounds should be provided in list form (see below) in the Methods section. Figures containing spectra generally will not be published in the manuscript unless the data are directly relevant to the central conclusions of the paper. When spectral data are a key component of the Data Descriptor, authors are required to deposit high-quality spectral images, for all key compounds, in an appropriate data repository (see our data deposition policies). Specific NMR assignments should be listed after integration values only if they were unambiguously determined by multidimensional NMR or decoupling experiments. Authors should provide information about how assignments were made in a general Methods section.

Example format for compound characterization data. mp: 100-102 °C (lit.ref 99-101 °C); TLC (CHCl3:MeOH, 98:2 v/v): Rf = 0.23; [α]D = -21.5 (0.1 M in n-hexane); 1H NMR (400 MHz, CDCl3): δ 9.30 (s, 1H), 7.55-7.41 (m, 6H), 5.61 (d, J = 5.5 Hz, 1H), 5.40 (d, J = 5.5 Hz, 1H), 4.93 (m, 1H), 4.20 (q, J = 8.5 Hz, 2H), 2.11 (s, 3H), 1.25 (t, J = 8.5 Hz, 3H); 13C NMR (125 MHz, CDCl3): δ 165.4, 165.0, 140.5, 138.7, 131.5, 129.2, 118.6, 84.2, 75.8, 66.7, 37.9, 20.1; IR (Nujol): 1765 cm-1; UV/Vis: λmax 267 nm; HRMS (m/z): [M]+ calcd. for C20H15Cl2NO5, 420.0406; found, 420.0412; analysis (calcd., found for C20H15Cl2NO5): C (57.16, 57.22), H (3.60, 3.61), Cl (16.87, 16.88), N (3.33, 3.33), O (19.04, 19.09).
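The agreement between the calculated and found HRMS values in the example above can be expressed as a mass error in parts per million. The sketch below is illustrative only: the function name is hypothetical, and these guidelines do not themselves state a ppm threshold.

```python
def mass_error_ppm(calculated_mz, found_mz):
    """Return the signed error of a found m/z relative to the calculated m/z, in ppm."""
    return (found_mz - calculated_mz) / calculated_mz * 1e6


# Values taken from the example characterization data above.
error = mass_error_ppm(420.0406, 420.0412)
print(f"mass error: {error:.2f} ppm")
```

For the example values this gives an error of about 1.4 ppm.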

7. Crystallographic data for small molecules

Manuscripts reporting new three-dimensional structures of small molecules from crystallographic analysis should deposit the relevant structures in an appropriate repository and describe the resulting structural data in the Data Records section of the Data Descriptor (see our data deposition policies).

8. Macromolecular structural data

Manuscripts reporting new structures should contain a table summarizing structural and refinement statistics. Templates are available for such tables describing NMR and X-ray crystallography data. To facilitate assessment of the quality of the structural data, a stereo image of a portion of the electron-density map (for crystallography papers) or of the superimposed lowest-energy structures (≳10; for NMR papers) should be provided with the submitted manuscript. If the reported structure represents a novel overall fold, a stereo image of the entire structure (as a backbone trace) should also be provided. Structural data must be deposited in an appropriate public repository.

Equations and mathematical expressions should be provided in the main text of the paper. Equations that are referred to in the text are identified by parenthetical numbers, such as (1), and are referred to in the manuscript as ‘equation (1)’.

General figure guidelines

Data Descriptors may have up to three figures and up to ten tables. In addition, a limited number of uncaptioned molecular structure graphics and numbered mathematical equations may be included if necessary.

Scientific Data requires authors to present digital images in accord with the policies employed by the Nature-titled journals.

Authors are responsible for obtaining permission to publish any figures or illustrations that are protected by copyright, including figures published elsewhere and pictures taken by professional photographers. The journal cannot publish images downloaded from the Internet without appropriate permission.

Figures should be numbered separately with Arabic numerals in the order of occurrence in the text of the manuscript. One- or two-column format figures are required. When appropriate, figures should include error bars. A description of the statistical treatment of error analysis should be included in the figure legend. Please note that schemes are not used; sequences of chemical reactions or experimental procedures should be submitted as figures, with appropriate captions. A limited number of uncaptioned graphics depicting chemical structures, each labelled with its name, a defined abbreviation or its bold Arabic numeral, may be included in a manuscript.

Figure lettering should be in a clear, sans-serif typeface (for example, Helvetica); the same typeface in the same font size should be used for all figures in a paper. Use Symbol font for Greek letters. All display items should be on a white background, and should avoid excessive boxing, unnecessary colour, spurious decorative effects (such as three-dimensional ‘skyscraper’ histograms) and highly pixelated computer drawings. The vertical axis of histograms should not be truncated to exaggerate small differences. Labelling must be of sufficient size and contrast to be readable, even after appropriate reduction. The thinnest lines in the final figure should be no smaller than one point wide. Authors will see a PDF proof that will include figures.

Figures divided into parts should be labelled with a lower-case bold a, b, and so on, in the same type-size as used elsewhere in the figure. Lettering in figures should be in lower-case type, with only the first letter of each label capitalized. Units should have a single space between the number and the unit, and follow SI nomenclature (for example, ms rather than msec) or the nomenclature common to a particular field. Thousands should be separated by commas (1,000). Unusual units or abbreviations should be spelled out in full or defined in the legend. Scale bars should be used rather than magnification factors, with the length of the bar defined on the bar itself rather than in the legend. In legends, please use visual cues rather than verbal explanations such as ‘open red triangles’.

Unnecessary figures should be avoided: data presented in small tables or histograms, for instance, can generally be described briefly in the text instead. Figures should not contain more than one panel unless the parts are logically connected; each panel of a multipart figure should be sized so that the whole figure can be reduced by the same amount and reproduced at the smallest size at which essential details are visible.

Figures for peer-review

At the initial submission stage authors may choose to upload separate figure files or to incorporate figures into the main article file, ensuring that any inserted figures are of sufficient quality to be clearly legible.

When submitting a revised manuscript all figures must be uploaded as separate figure files ensuring that the image quality and formatting conforms to the specifications below.

Figures for publication

Each complete figure must be supplied as a separate file upload. Multi-part/panel figures must be prepared and arranged as a single image file (including all sub-parts; a, b, c, etc.). Please do not upload each panel individually.

When possible, we prefer to use original digital figures to ensure the highest-quality reproduction in the journal. For optimal results, prepare figures to fit either one column (88 mm wide) or two columns (180 mm wide). When creating and submitting digital files, please follow the guidelines below; failure to do so can significantly delay publication of your work.
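For bitmap figures, the column widths above can be converted to pixel widths once a resolution is chosen. The following Python sketch shows the arithmetic. The default of 300 dpi is an assumption for illustration (these guidelines specify 300 dpi only for chemical structure TIFF files), and the function name is hypothetical.

```python
MM_PER_INCH = 25.4

# Column widths stated in the guidelines, in millimetres.
COLUMN_WIDTHS_MM = {"one-column": 88, "two-column": 180}


def width_in_pixels(width_mm, dpi=300):
    """Convert a print width in millimetres to a pixel width at the given dpi.

    Assumption: dpi=300 is an illustrative default, not a stated requirement.
    """
    return round(width_mm / MM_PER_INCH * dpi)


if __name__ == "__main__":
    for name, width_mm in COLUMN_WIDTHS_MM.items():
        print(f"{name}: {width_mm} mm = {width_in_pixels(width_mm)} px at 300 dpi")
```

At 300 dpi this gives 1039 pixels for a one-column figure and 2126 pixels for a two-column figure.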

We encourage authors to prepare their figures in a quality vector graphics software package, such as Adobe Illustrator or Inkscape. Figures should then be saved directly in the EPS format. When importing graphs or schematics from other programs, authors are encouraged to remake any text labels in a vector graphics program to ensure consistent quality.

1. Line art, graphs, charts and schematics

For optimal results, all line art, graphs, charts and schematics should be supplied in vector format, such as EPS or AI, and should be saved or exported as such directly from the application in which they were made. Please ensure that data points and axis labels are clearly legible.

2. Photographic and bitmapped images

All photographic and bitmap images should be supplied in a bitmap image format such as TIFF, JPG or PSD. If saving TIFF files, please ensure that the compression option is selected to avoid very large file sizes.

Please do not supply Word or PowerPoint files with placed images. Images can be supplied as RGB or CMYK (note: we will not convert image colour modes).

Figures that do not meet these standards will not reproduce well and may delay publication until we receive high-resolution images.

3. Chemical structures

Chemical structures should be produced using ChemDraw or a similar program. All chemical compounds must be assigned a bold Arabic numeral in the order in which the compounds are presented in the manuscript text. Structures should then be exported into a 300 dpi RGB TIFF file before being submitted.

4. Stereo images

Stereo diagrams should be presented for divergent ‘wall-eyed’ viewing, with the two panels separated by 5.5 cm. In the final accepted version of the manuscript, the stereo images should be submitted at their final page size.

Supplementary information

Scientific Data discourages authors from supplying text, figures or tables as supplementary files. As much as possible, these types of content should be included in the main manuscript. The main sections of the Data Descriptor manuscript, and particularly the Methods section, have no length limits. Data Descriptors are designed to be focused publications: if extensive supplementary text or figures are required, authors should consider whether the manuscript might best be subdivided into multiple Data Descriptors. Similarly, any primary data files should be deposited in an appropriate public repository, rather than included as Supplementary Information. Scientific Data does not allow statements of ‘data not shown’. Please see our data deposition policies.

With these restrictions in mind, authors may use Supplementary Information for any additional content needed to support the Data Descriptor, such as media (e.g. audio or video), or machine-readable versions of mathematical models. Authors may supply code and computational models as Supplementary Information, particularly for initial submissions. However, upon acceptance of a manuscript, we encourage the public archiving of code (through a DOI-issuing repository) and computational models (in field-specific computational model repositories). See our code availability policy for more information.

The guidelines below detail the creation, citation and submission of Supplementary Information. Publication may be delayed if these are not followed correctly. Please note that modification of Supplementary Information after the paper is published requires a formal correction, so authors are encouraged to check their Supplementary Information carefully before submitting the final version.

  1. Designate each item as a Supplementary File and number accordingly: for example, ‘Supplementary File 1’. This numbering should be separate from that used in tables and figures appearing in the main article.
  2. Refer to each piece of supplementary material at the appropriate point(s) in the Data Descriptor. Be sure to include the word ‘Supplementary’ each time one is mentioned. Every piece of Supplementary Information must be mentioned at least once in the main article.
  3. Remember to include a brief title and legend (incorporated into the file to appear near the image) as part of every figure submitted, and a title as part of every table.
  4. File sizes should be as small as possible, with a maximum size of 10 MB, so that they can be downloaded quickly.
  5. When supplying multiple supplementary figures, they should be merged into a single PDF file, with figure legends immediately below each figure. A table of contents should be included on the first page, listing the page number of each supplementary figure.
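Authors who wish to check the file size limit above before uploading could use a short script along these lines. This is a minimal sketch: it assumes that 10 MB is read as 10 × 1024 × 1024 bytes (the guidelines do not say which convention applies), and the function name is hypothetical.

```python
from pathlib import Path

# Assumption: the 10 MB limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def oversized_files(paths, max_bytes=MAX_BYTES):
    """Return (path, size_in_bytes) pairs for files larger than max_bytes."""
    too_large = []
    for path in paths:
        size = Path(path).stat().st_size
        if size > max_bytes:
            too_large.append((path, size))
    return too_large
```

Called with a list of supplementary file paths, the function returns an empty list when every file is within the limit.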

Further queries about submission and preparation of Supplementary Information should be directed to

