Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications

Journal name: Nature Biotechnology
Volume: 29
Pages: 415–420
DOI: 10.1038/nbt.1823

Abstract

Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.

Main

Without specific guidelines, most genomic, metagenomic and marker gene sequences in databases are sparsely annotated with the information required to guide data integration, comparative studies and knowledge generation. Even with complex keyword searches, it is currently impossible to reliably retrieve sequences that have originated from certain environments or particular locations on Earth—for example, all sequences from 'soil' or 'freshwater lakes' in a certain region of the world. Because public databases of the International Nucleotide Sequence Database Collaboration (INSDC; comprising DNA Data Bank of Japan (DDBJ), the European Nucleotide Archive (EBI-ENA) and GenBank (http://www.insdc.org/)) depend on author-submitted information to enrich the value of sequence data sets, we argue that the only way to change the current practice is to establish a standard of reporting that requires contextual data to be deposited at the time of sequence submission. The adoption of such a standard would elevate the quality, accessibility and utility of information that can be collected from INSDC or any other data repository.

The GSC has previously proposed standards for describing genomic sequences, the “minimum information about a genome sequence” (MIGS), and metagenomic sequences, the “minimum information about a metagenome sequence” (MIMS)1. Here we introduce an extension of these standards for capturing information about marker genes. Additionally, we introduce 'environmental packages' that standardize sets of measurements and observations describing particular habitats and that are applicable across all GSC checklists and beyond2. We define 'environment' as any location in which a sample or organism is found, e.g., soil, air, water, human-associated, plant-associated or laboratory. The original MIGS/MIMS checklists included contextual data about the location from which a sample was isolated and how the sequence data were produced. However, standard descriptions for a more comprehensive range of environmental parameters, which would help to better contextualize a sample, were not included. The environmental packages presented here are relevant to any genome sequence of known origin and are designed to be used in combination with MIGS, MIMS and MIMARKS checklists.

To create a single entry point to all minimum information checklists from the GSC and to the environmental packages, we propose an overarching framework, the MIxS standard (http://gensc.org/gc_wiki/index.php/MIxS). MIxS includes the technology-specific checklists from the previous MIGS and MIMS standards, provides a way of introducing additional checklists such as MIMARKS, and also allows annotation of sample data using environmental packages. A schematic overview of MIxS along with the MIxS environmental packages is shown in Figure 1.

Figure 1: Schematic overview of the GSC MIxS standard (brown), including combination with specific environmental packages (blue).

Shared descriptors apply to all MIxS checklists; however, each checklist has its own specific descriptors as well. Environmental packages can be applied to any of the checklists. EU, eukarya; BA, bacteria/archaea; PL, plasmid; VI, virus; ORG, organelle.

Development of MIMARKS and the environmental packages

Over the past three decades, the 16S rRNA, 18S rRNA and internal transcribed spacer gene sequences (ITS) from Bacteria, Archaea and microbial Eukaryotes have provided deep insights into the topology of the tree of life3, 4 and the composition of communities of organisms that live in diverse environments, ranging from deep sea hydrothermal vents to ice sheets in the Arctic5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16. Numerous other phylogenetic marker genes have proven useful, including RNA polymerase subunits (rpoB), DNA gyrases (gyrB), DNA recombination and repair proteins (recA) and heat shock proteins (HSP70)3. Marker genes can also reveal key metabolic functions rather than phylogeny; examples include nitrogen cycling (amoA, nifH, ntcA)17, 18, sulfate reduction (dsrAB)19 or phosphorus metabolism (phnA, phnI, phnJ)20, 21. In this paper we define all phylogenetic and functional genes (or gene fragments) used to profile natural genetic diversity as 'marker genes'. MIMARKS (Table 1) complements the MIGS/MIMS checklists for genomes and metagenomes by adding two new checklists, a MIMARKS survey, for uncultured diversity marker gene surveys, and a MIMARKS specimen, for marker gene sequences obtained from any material identifiable by means of specimens. The MIMARKS extension adopts and incorporates the standards being developed by the Consortium for the Barcode of Life (CBOL)22. Therefore, the checklist can be universally applied to any marker gene, from small subunit rRNA to cytochrome oxidase I (COI), to all taxa, and to studies ranging from single individuals to complex communities.

Table 1: The core items of the MIMARKS checklists, along with the value types, descriptions and requirement status

Both MIMARKS and the environmental packages were developed by collating information from several sources and evaluating it in the framework of the existing MIGS/MIMS checklists. These include four independent community-led surveys, examination of the parameters reported in published studies and examination of compliance with optional features in INSDC documents. The overall goal of these activities was to design the backbone of the MIMARKS checklist, which describes the most important aspects of marker gene contextual data.

Results of community-led surveys

Four online surveys were conducted to determine which core descriptors researchers prefer for marker genes. The Department of Energy Joint Genome Institute and SILVA23 surveys focused on general contextual data descriptors for a marker gene, whereas the Ribosomal Database Project (RDP)24 survey focused on prevalent habitats for rRNA gene surveys, and the Terragenome Consortium25 survey focused on contextual data for soil metagenome projects (Supplementary Results 1). The resulting recommendations were combined with an extensive set of contextual data items suggested by an International Census of Marine Microbes (ICoMM) working group that met in 2005. These collective resources provided valuable insights into community requests for contextual data items to be included in the MIMARKS checklist and into the main habitats constituting the environmental packages.

Survey of published parameters

We reviewed published rRNA gene studies, retrieved from SILVA and the ICoMM database MICROBIS (The Microbial Oceanic Biogeographic Information System, http://icomm.mbl.edu/microbis/), to further supplement the contextual data items included in the respective environmental packages. In total, 39 publications from SILVA and >40 ICoMM projects were scanned for contextual data items to constitute the core of the environmental package subtables (Supplementary Results 1).

In a final analysis step, we surveyed usage statistics of INSDC source feature key qualifier values of rRNA gene sequences contained in SILVA (Supplementary Results 1). Notably, <10% of the 1.2 million 16S rRNA gene sequences (SILVA release 100) were associated with even basic information such as latitude and longitude, collection date or PCR primers.
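The kind of usage statistic described above can be illustrated with a minimal sketch. It assumes sequence records are available as plain dictionaries; the field names (`lat_lon`, `collection_date`, `pcr_primers`) and the sample values are hypothetical placeholders, not actual INSDC qualifier names or SILVA data.

```python
def field_coverage(records, fields):
    """Return, for each field, the fraction of records with a non-empty value."""
    records = list(records)
    total = len(records)
    coverage = {}
    for field in fields:
        # A value counts as present only if it is not None and not blank.
        filled = sum(1 for r in records if str(r.get(field) or "").strip())
        coverage[field] = filled / total if total else 0.0
    return coverage


# Hypothetical records; real input would come from a sequence database export.
records = [
    {"lat_lon": "54.18 7.90", "collection_date": "2008-03-12", "pcr_primers": ""},
    {"lat_lon": "", "collection_date": "", "pcr_primers": ""},
    {"lat_lon": None, "collection_date": "2009-01-05"},
    {},
]
coverage = field_coverage(records, ["lat_lon", "collection_date", "pcr_primers"])
```

A field that is absent, `None` or blank is treated as unannotated, which matches the intent of asking how many records carry even basic contextual information.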

The MIMARKS checklist

The MIMARKS checklist provides users with an 'electronic laboratory notebook' containing core contextual data items required for consistent reporting of marker gene investigations. MIMARKS uses the MIGS/MIMS checklists with respect to the nucleic acid sequence source and sequencing contextual data, but extends them with further experimental contextual data such as PCR primers and conditions, or target gene name.

For clarity and ease of use, all items within the MIMARKS checklist are presented with a value syntax description, as well as a clear definition of the item. Whenever terms from a specific ontology are required as the value of an item, these terms can be readily found in the respective ontology browsers linked by URLs in the item definition. Although this version of the MIMARKS checklist does not contain unit specifications, we recommend that all units follow the International System of Units (SI). In addition, we strongly urge the community to provide feedback regarding the best unit recommendations for given parameters. Unit standardization across data sets will be vital to facilitate comparative studies in the future. An Excel version of the MIMARKS checklist is provided on the GSC web site (http://gensc.org/gc_wiki/index.php/MIMARKS).
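As an illustration of how a value syntax and a requirement status can drive automatic checking, the following is a minimal sketch, not the GSC's implementation. The item names, the regular expressions standing in for value syntax and the required flags are all assumptions made for this example; the authoritative definitions are those in the checklist itself.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    name: str       # hypothetical item name
    pattern: str    # assumed value syntax, expressed as a regular expression
    required: bool  # assumed requirement status


# Illustrative items only; not copied from the MIMARKS checklist.
CHECKLIST = [
    ChecklistItem("collection_date", r"\d{4}(-\d{2}(-\d{2})?)?", True),
    ChecklistItem("lat_lon", r"-?\d+(\.\d+)? -?\d+(\.\d+)?", True),
    ChecklistItem("target_gene", r"\S.*", True),
    ChecklistItem("pcr_primers", r"FWD:[A-Z]+;REV:[A-Z]+", False),
]


def validate(record, checklist=CHECKLIST):
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    for item in checklist:
        value = record.get(item.name)
        if value is None or value == "":
            if item.required:
                problems.append(f"{item.name}: missing required item")
            continue
        if not re.fullmatch(item.pattern, value):
            problems.append(f"{item.name}: value {value!r} does not match expected syntax")
    return problems


ok_record = {"collection_date": "2008-03-12", "lat_lon": "54.18 7.90", "target_gene": "16S rRNA"}
bad_record = {"collection_date": "12/03/2008", "target_gene": "16S rRNA",
              "pcr_primers": "FWD:ACGT;REV:TTGA"}
```

Here `validate(ok_record)` returns an empty list, whereas `validate(bad_record)` reports the malformed date and the missing coordinates. Optional items are checked only when a value is supplied.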

The MIxS environmental packages

Fourteen environmental packages provide a wealth of environmental and epidemiological contextual data fields for a complete description of sampling environments. The environmental packages can be combined with any of the GSC checklists (Fig. 1 and Supplementary Results 2). Researchers within The Human Microbiome Project26 contributed the host-associated and all human packages. The Terragenome Consortium contributed sediment and soil packages. Finally, ICoMM, Microbial Inventory Research Across Diverse Aquatic Long Term Ecological Research Sites and the Max Planck Institute for Marine Microbiology contributed the water package. The MIMARKS working group developed the remaining packages (air, microbial mat/biofilm, miscellaneous natural or artificial environment, plant-associated and wastewater/sludge). The package names describe high-level habitat terms in order to be exhaustive. The miscellaneous natural or artificial environment package contains a generic set of parameters, and is included for any other habitat that does not fall into the other thirteen categories. Whenever needed, multiple packages may be used for the description of the environment.
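The combination of shared descriptors, checklist-specific descriptors and one or more environmental packages can be pictured as a union of field sets. The sketch below is only a schematic under stated assumptions: the checklist and package names follow the text, but every field name is a hypothetical placeholder and the sets are far smaller than the real ones.

```python
# Hypothetical field sets; the real items are defined in the MIxS checklists.
SHARED = {"project_name", "collection_date", "lat_lon"}

CHECKLISTS = {
    "MIMS": {"seq_meth"},
    "MIMARKS-survey": {"target_gene", "pcr_primers"},
    "MIMARKS-specimen": {"target_gene", "specimen_id"},
}

PACKAGES = {
    "water": {"depth", "salinity"},
    "sediment": {"depth", "porosity"},
    "soil": {"depth", "ph"},
}


def expected_fields(checklist, packages=()):
    """Fields expected for one checklist combined with any number of packages."""
    fields = set(SHARED) | CHECKLISTS[checklist]
    for package in packages:
        # Multiple packages may describe one environment; overlaps merge naturally.
        fields |= PACKAGES[package]
    return fields


fields = expected_fields("MIMARKS-survey", ["water", "sediment"])
```

Using sets means that a parameter shared by two packages (here the placeholder `depth`) appears only once in the combined description.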

Examples of MIMARKS-compliant data sets

Several MIMARKS-compliant reports are included in Supplementary Results 3. These include a 16S rRNA gene survey from samples obtained in the North Atlantic, an 18S pyrosequencing tag study of anaerobic protists in a permanently anoxic basin of the North Sea, a pmoA survey from Negev Desert soils, a dsrAB survey of Gulf of Mexico sediments and a 16S pyrosequencing tag study of bacterial diversity in the western English Channel (SRA accession no. SRP001108).

Adoption by major database and informatics resources

Support for adoption of MIMARKS and the MIxS standard has spread rapidly. Authors of this paper include representatives from genome sequencing centers, maintainers of major resources, principal investigators of large- and small-scale sequencing projects, and individual investigators who have provided compliant data sets, showing the breadth of support for the standard within the community.

In the past, the INSDC has issued a reserved 'barcode' keyword for the CBOL7. Following this model, the INSDC has recently recognized the GSC as an authority for the MIxS standard and issued the standard with official keywords within INSDC nucleotide sequence records27. This greatly facilitates automatic validation of the submitted contextual data and provides support for data sets compliant with previous versions by including the checklist version as a keyword.

GenBank accepts MIxS metadata in tabular format using the Sequin and tbl2asn submission tools, validates MIxS compliance and reports the fields in the structured comment block. The EBI-ENA Webin submission system provides prepared web forms for the submission of MIxS-compliant data; it presents all of the appropriate fields with descriptions, explanations and examples, and validates the data entered. One tool that can aid the submission of contextual data is MetaBar28, a spreadsheet- and web-based software tool designed to assist users in the consistent acquisition, electronic storage and submission of contextual data associated with their samples in compliance with the MIxS standard. The online tool CDinFusion (http://www.megx.net/CDinFusion) was created to facilitate the combination of contextual data with sequence data and the generation of submission-ready files.

The next-generation Sequence Read Archive (SRA) collects and displays MIxS-compliant metadata in sample and experiment objects. Several tools are already available or under development to assist users with SRA submissions. The myRDP SRA PrepKit allows users to prepare and edit their submissions of reads generated from ultra-high-throughput sequencing technologies. A set of suggested attributes in the data forms assists researchers in providing metadata conforming to checklists such as MIMARKS. The Quantitative Insights Into Microbial Ecology (QIIME) web application (http://www.microbio.me/qiime) allows users to generate and validate MIMARKS-compliant templates. These templates can be viewed and completed in the user's spreadsheet editor of choice (e.g., Microsoft Excel). The QIIME web platform also offers an ontology lookup and geo-referencing tool to aid users when completing the MIMARKS templates. The Investigation/Study/Assay (ISA) software suite assists in the curation, reporting and local management of experimental metadata from studies using one or a combination of technologies, including high-throughput sequencing29. Specific ISA configurations (http://isa-tools.org/tools.html) have been developed to ensure MIxS compliance by providing templates and validation capability. Another tool, ISAconverter, produces SRA.xml documents, facilitating submission to the SRA repository. MIxS checklists are also registered with the BioSharing catalog of standards (http://biosharing.org/), which is set to progressively link minimal information specifications to the respective exchange formats, ontologies and compliant tools.

Further detailed guidance for submission processes can be found under the respective wiki pages (http://gensc.org/gc_wiki/index.php/MIxS) of the standard.

Maintenance of the MIxS standard

To allow further development, extension and enhancement of MIxS, we set up a public issue tracking system to track changes and handle feature requests (http://mixs.gensc.org/). New versions will be released annually. Technically, the MIxS standard, including MIMARKS and the environmental packages, is maintained in a relational database system at the Max Planck Institute for Marine Microbiology in Bremen on behalf of the GSC. This provides a secure and stable mechanism for updating and versioning the checklist suite. In the future, we plan to develop programmatic access to this database to allow automatic retrieval of the latest version of each checklist for INSDC databases and for GSC community resources. Moreover, the Genomic Contextual Data Markup Language, a reference implementation of the GSC checklists by the GSC, now implements the full range of MIxS standards. It is based on XML Schema technology and thus serves as an interoperable data exchange format for infrastructures based on web services30.

Conclusions and call for action

The GSC is an international body with a stated mission of working towards richer descriptions of the complete collection of genomes and metagenomes through the MIxS standard. The present report extends the scope of GSC guidelines to marker gene sequences and environmental packages and establishes a single portal where experimentalists can gain access to and learn how to use GSC guidelines. The GSC is an open initiative that welcomes the participation of the wider community. This includes an open call to contribute to refinements of the MIxS standards and their implementations.

The adoption of the GSC standards by major data providers and organizations, as well as the INSDC, supports efforts to contextually enrich sequence data and complements recent efforts to enrich other (meta) 'omics data. The MIxS standard, including MIMARKS, has been developed to the point that it is ready for use in the publication of sequences. A defined procedure for requesting new features and stable release cycles will facilitate implementation of the standard across the community. Compliance among authors, adoption by journals and use by informatics resources will vastly improve our collective ability to mine and integrate invaluable sequence data collections for knowledge- and application-driven research. In particular, the ability to combine microbial community samples collected from any source, using the universal tree of life as a measure to compare even the most diverse communities, should provide new insights into the dynamic spatiotemporal distribution of microbial life on our planet and on the human body.

References

  1. Field, D. et al. The minimum information about a genome sequence (MIGS) specification. Nat. Biotechnol. 26, 541–547 (2008).
  2. Taylor, C.F. et al. Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat. Biotechnol. 26, 889–896 (2008).
  3. Ludwig, W. & Schleifer, K.H. in Microbial Phylogeny and Evolution, Concepts and Controversies. (ed. Sapp, J.) 70–98 (Oxford University Press, New York, 2005).
  4. Ludwig, W. et al. Bacterial phylogeny based on comparative sequence analysis. Electrophoresis 19, 554–568 (1998).
  5. Giovannoni, S.J., Britschgi, T.B., Moyer, C.L. & Field, K.G. Genetic diversity in Sargasso Sea bacterioplankton. Nature 345, 60–63 (1990).
  6. Stahl, D.A. Analysis of hydrothermal vent associated symbionts by ribosomal RNA sequences. Science 224, 409–411 (1984).
  7. Ward, D.M., Weller, R. & Bateson, M.M. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature 345, 63–65 (1990).
  8. DeLong, E.F. Archaea in coastal marine environments. Proc. Nat. Acad. Sci. USA 89, 5685–5689 (1992).
  9. Diez, B., Pedros-Alio, C. & Massana, R. Study of genetic diversity of eukaryotic picoplankton in different oceanic regions by small-subunit rRNA gene cloning and sequencing. Appl. Environ. Microbiol. 67, 2932–2941 (2001).
  10. Fuhrman, J.A., McCallum, K. & Davis, A.A. Novel major archaebacterial group from marine plankton. Nature 356, 148–149 (1992).
  11. Hewson, I. & Fuhrman, J.A. Richness and diversity of bacterioplankton species along an estuarine gradient in Moreton Bay, Australia. Appl. Environ. Microbiol. 70, 3425–3433 (2004).
  12. Huber, J.A., Butterfield, D.A. & Baross, J.A. Temporal changes in archaeal diversity and chemistry in a mid-ocean ridge subseafloor habitat. Appl. Environ. Microbiol. 68, 1585–1594 (2002).
  13. Lopez-Garcia, P., Rodriguez-Valera, F., Pedros-Alio, C. & Moreira, D. Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton. Nature 409, 603–607 (2001).
  14. Moon-van der Staay, S.Y., De Wachter, R. & Vaulot, D. Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity. Nature 409, 607–610 (2001).
  15. Pace, N.R. A molecular view of microbial diversity and the biosphere. Science 276, 734–740 (1997).
  16. Rappe, M.S. & Giovannoni, S.J. The uncultured microbial majority. Annu. Rev. Microbiol. 57, 369–394 (2003).
  17. Francis, C.A., Beman, J.M. & Kuypers, M.M.M. New processes and players in the nitrogen cycle: the microbial ecology of anaerobic and archaeal ammonia oxidation. ISME J. 1, 19–27 (2007).
  18. Zehr, J.P., Mellon, M.T. & Zani, S. New nitrogen-fixing microorganisms detected in oligotrophic oceans by amplification of nitrogenase (nifH) genes. Appl. Environ. Microbiol. 64, 3444–3450 (1998).
  19. Minz, D. et al. Diversity of sulfate-reducing bacteria in oxic and anoxic regions of a microbial mat characterized by comparative analysis of dissimilatory sulfite reductase genes. Appl. Environ. Microbiol. 65, 4666–4671 (1999).
  20. Gilbert, J.A. et al. The seasonal structure of microbial communities in the Western English Channel. Environ. Microbiol. 11, 3132–3139 (2009).
  21. Martinez, A.W., Tyson, G. & DeLong, E.F. Widespread known and novel phosphonate utilization pathways in marine bacteria revealed by functional screening and metagenomic analyses. Environ. Microbiol. 12, 222–238 (2009).
  22. Hanner, R. Data Standards for BARCODE Records in INSDC (BRIs) (Database Working Group, Consortium for the Barcode of Life, 2009). <http://www.barcodeoflife.org/sites/default/files/legacy/pdf/DWG_data_standards-Final.pdf>.
  23. Pruesse, E. et al. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35, 7188–7196 (2007).
  24. Cole, J.R. et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 37, D141–D145 (2009).
  25. Vogel, T.M. et al. TerraGenome: a consortium for the sequencing of a soil metagenome. Nat. Rev. Microbiol. 7, 252 (2009).
  26. Turnbaugh, P.J. et al. The Human Microbiome Project. Nature 449, 804–810 (2007).
  27. Benson, D.A. et al. GenBank. Nucleic Acids Res. 36, D25–D30 (2008).
  28. Hankeln, W. et al. MetaBar - a tool for consistent contextual data acquisition and standards compliant submission. BMC Bioinformatics 11, 358 (2010).
  29. Rocca-Serra, P. et al. ISA infrastructure: supporting standards-compliant experimental reporting and enabling curation at the community level. Bioinformatics 26, 2354–2356 (2010).
  30. Kottmann, R. et al. A standard MIGS/MIMS compliant XML schema: toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS 12, 115–121 (2008).

Acknowledgments

Funding sources are listed in the Supplementary Note.

Author information

Affiliations

  1. Microbial Genomics and Bioinformatics Group, Max Planck Institute for Marine Microbiology, Bremen, Germany.

    • Pelin Yilmaz,
    • Renzo Kottmann,
    • Pier Luigi Buttigieg,
    • Wolfgang Hankeln,
    • Elmar Pruesse,
    • Christian Quast &
    • Frank Oliver Glöckner
  2. Jacobs University Bremen gGmbH, Bremen, Germany.

    • Pelin Yilmaz,
    • Pier Luigi Buttigieg,
    • Wolfgang Hankeln,
    • Elmar Pruesse &
    • Frank Oliver Glöckner
  3. Natural Environment Research Council Environmental Bioinformatics Centre, Wallingford CEH, Oxford, UK.

    • Dawn Field,
    • Norman Morrison,
    • Peter Sterk,
    • Mark Bailey,
    • Tim Booth,
    • Lindsay K Newbold,
    • Anna E Oliver &
    • Andrew Whiteley
  4. Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado, USA.

    • Rob Knight,
    • Elizabeth K Costello,
    • Jerry Kennedy,
    • Robert Larsen,
    • Catherine A Lozupone,
    • Sara Nakielny,
    • Jesse Stombaugh &
    • Doug Wendel
  5. Howard Hughes Medical Institute, San Francisco, California, USA.

    • Rob Knight
  6. Ribosomal Database Project, Michigan State University, East Lansing, Michigan, USA.

    • James R Cole
  7. Center for Microbial Ecology, Michigan State University, East Lansing, Michigan, USA.

    • James R Cole,
    • Patrick S G Chain &
    • James M Tiedje
  8. The Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts, USA.

    • Linda Amaral-Zettler
  9. Plymouth Marine Laboratory, Plymouth, UK.

    • Jack A Gilbert
  10. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA.

    • Jack A Gilbert,
    • Folker Meyer &
    • Andreas Wilke
  11. Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA.

    • Jack A Gilbert
  12. National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.

    • Ilene Karsch-Mizrachi &
    • Anjanette Johnston
  13. European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

    • Guy Cochrane,
    • Robert Vaughan &
    • Christopher Hunter
  14. WCU Center for Green Metagenomics, School of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea.

    • Joonhong Park
  15. School of Computer Science, University of Manchester, Manchester, UK.

    • Norman Morrison
  16. Oxford e-Research Centre, University of Oxford, Oxford, UK.

    • Philippe Rocca-Serra,
    • Eamonn Maguire &
    • Susanna-Assunta Sansone
  17. Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.

    • Manimozhiyan Arumugam &
    • Peer Bork
  18. Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado, USA.

    • Laura Baumgartner,
    • Justin Kuczynski &
    • Norman R Pace
  19. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts, USA.

    • Bruce W Birren,
    • Dirk Gevers &
    • Doyle V Ward
  20. Department of Medicine and the Department of Microbiology, New York University Langone Medical Center, New York, New York, USA.

    • Martin J Blaser
  21. National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA.

    • Vivien Bonazzi &
    • Lita Proctor
  22. Department of Microbiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA.

    • Frederic D Bushman,
    • Emily Charlson &
    • Rohini Sinha
  23. DOE Joint Genome Institute, Walnut Creek, California, USA.

    • Patrick S G Chain,
    • Janet Jansson &
    • Nikos Kyrpides
  24. Los Alamos National Laboratory, Bioscience Division, Los Alamos, New Mexico, USA.

    • Patrick S G Chain
  25. Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA.

    • Heather Huot-Creasy,
    • Jacques Ravel,
    • Lynn Schriml,
    • Owen White &
    • Jennifer R Wortman
  26. Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium.

    • Peter Dawyndt
  27. Center for Environmental Biotechnology, Lawrence Berkeley National Laboratory, Berkeley, California, USA.

    • Todd DeSantis
  28. Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, USA.

    • Noah Fierer,
    • Robert Guralnick &
    • Teresa Legg
  29. Department of Biological Sciences, University of Southern California, Los Angeles, California, USA.

    • Jed A Fuhrman
  30. National Ecological Observatory Network, Boulder, Colorado, USA.

    • Rachel E Gallery
  31. Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA.

    • Richard A Gibbs,
    • Sarah Highlander &
    • Joseph Petrosino
  32. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.

    • Richard A Gibbs
  33. Department of Biology, University of New Mexico, LTER Network Office, Albuquerque, New Mexico, USA.

    • Inigo San Gil
  34. Department of Computer Science, University of Colorado, Boulder, Colorado, USA.

    • Antonio Gonzalez &
    • Dan Knights
  35. Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Jeffrey I Gordon,
    • Andrew L Kau,
    • Brian Muegge,
    • Michelle I Smith &
    • Tanya Yatsunenko
  36. University of Colorado Museum of Natural History, University of Colorado, Boulder, Colorado, USA.

    • Robert Guralnick
  37. Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas, USA.

    • Sarah Highlander &
    • Joseph Petrosino
  38. Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia.

    • Philip Hugenholtz
  39. Earth Science Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.

    • Janet Jansson
  40. Department of Biology, San Diego State University, San Diego, California, USA.

    • Scott T Kelley
  41. Department of Microbiology, Cornell University, Ithaca, New York, USA.

    • Omry Koren,
    • Ruth E Ley &
    • Aymé Spor
  42. Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado, USA.

    • Christian L Lauber &
    • Donna Lyons
  43. Lehrstuhl für Mikrobiologie, Technische Universität München, Freising, Germany.

    • Wolfgang Ludwig
  44. J. Craig Venter Institute, Rockville, Maryland, USA.

    • Barbara A Methé &
    • Karen E Nelson
  45. Department of Environmental Sciences, University of Colorado, Boulder, Colorado, USA.

    • Diana Nemergut
  46. Department of Biology, University of Waterloo, Ontario, Canada.

    • Josh D Neufeld
  47. Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.

    • Giriprakash Palanisamy
  48. Ribocon GmbH, Bremen, Germany.

    • Jörg Peplies
  49. VIB - Vrije Universiteit Brussel, Brussels, Belgium.

    • Jeroen Raes
  50. Canadian Centre for DNA Barcoding, Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada.

    • Sujeevan Ratnasingham
  51. Departments of Microbiology and Immunology and Department of Medicine, Stanford University School of Medicine, Stanford, California, USA.

    • David A Relman
  52. Veterans Affairs Palo Alto Health Care System, Palo Alto, California, USA.

    • David A Relman
  53. Department of Microbiology and Immunology, Ann Arbor, Michigan, USA.

    • Patrick D Schloss
  54. The Genome Center, Department of Genetics, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA.

    • Erica Sodergren &
    • George M Weinstock

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Results 1 (803 KB)

    Community led surveys

Excel files

  1. Supplementary Results 2 (184 KB)

    MIMARKS checklist

  2. Supplementary Results 3 (115 KB)

    MIMARKS compliant datasets

  3. Supplementary Note (29 KB)

    Funding sources
