A framework for human microbiome research

doi:10.1038/nature11209

Article
Open access
Published: 13 June 2012

The Human Microbiome Project Consortium

Nature volume 486, pages 215–221 (2012)

Abstract

A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies.

Main

Advances in sequencing technologies coupled with new bioinformatic developments have allowed the scientific community to begin to investigate the microbes that inhabit our oceans, soils, the human body and elsewhere¹. Microbes associated with the human body include eukaryotes, archaea, bacteria and viruses, with bacteria alone estimated to outnumber human cells within an individual by an order of magnitude. Our knowledge of these communities and their gene content, referred to collectively as the human microbiome, has until now been limited by a lack of population-scale data detailing their composition and function.

The US NIH-funded Human Microbiome Project Consortium (HMP) brought together a broad collection of scientific experts to explore these microbial communities and their relationships with their human hosts. As such, the HMP² has focused on producing reference genomes (viral, bacterial and eukaryotic), which provide a critical framework for subsequent metagenomic annotation and analysis, and on generating a baseline of microbial community structure and function from an adult cohort defined by a carefully delineated set of clinical inclusion and exclusion criteria that we term ‘healthy’ in this study (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?id=phd002854.2). Investigations of the microbiome from this cohort incorporated several complementary analyses including: 16S ribosomal RNA (rRNA) gene sequence (16S) and taxonomic profiles, whole-genome shotgun (WGS) or metagenomic sequencing of whole community DNA, and alignment of the assembled sequences to the reference microbial genomes from the human body³,⁴. Thus, the HMP complements other large-scale sequence-based human microbiome projects such as the MetaHIT project⁵, which focused on examination of the gut microbiome using WGS data including samples from cohorts exhibiting a wide range of health statuses and physiological characteristics.

Additional projects supported by the HMP are investigating the association of specific components and dynamics of the microbiome with a variety of disease conditions, developing tools and technology including isolating and sequencing uncultured organisms, and studying the ethical, legal and social implications of human microbiome research (http://commonfund.nih.gov/hmp/fundedresearch.aspx). A comprehensive list of current publications from HMP projects is available at http://commonfund.nih.gov/hmp/publications.aspx.

Here we detail the resources created so far by the HMP initiative including: clinical specimens (samples), reference genomes, sequencing and annotation protocols, methods and analyses. We describe the thousands of samples obtained from 15 or 18 distinct body sites from 242 donors over multiple time points that were processed at two clinical centres (Baylor College of Medicine (BCM) and Washington University School of Medicine). We also describe the laboratory and computational protocols developed for reliably generating and interpreting the human microbiome data. HMP resources include both protocols for, and the subsequent data generated from, 16S and metagenomic sequencing of human microbiome samples. During this study, these protocols were rigorously standardized and quality controlled for simultaneous use across four sequencing centres (BCM Human Genome Sequencing Center, The Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, the J. Craig Venter Institute and The Genome Institute at Washington University School of Medicine). In particular, we focus on the production of the first phase of metagenomic data sets (phase I) used for subsequent in-depth analyses, and we summarize standards and recommendations based on our experiences generating and analysing these data. An additional set of publications (many included in the references and in those of ref. 4) describe in further detail the microbial ecology and microbiological implications of these data. Collectively these resources and analyses represent an important framework for human microbiome research.

HMP resource organization

Supplementary Fig. 1 summarizes the organization of the HMP, including the data processing and analytical steps, and the scientific entities gathered to conduct the project. An overview of available HMP data sets and additional resources is provided in Supplementary Tables 1–3. Donors were recruited and enrolled into the HMP through the two clinical centres. Over 240 adults were carefully screened and phenotyped before sampling one to three times at 15 (male) or 18 (female) body sites using a common sampling protocol (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?id=phd003190.2). All included subjects were between the ages of 18 and 40 years and had passed a screening for systemic health based on oral, cutaneous and body mass exclusion criteria (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?id=phd002854.2) (K. Aagaard et al., manuscript submitted).

A Data Analysis and Coordination Center (DACC) was created to serve as the central repository for all HMP WGS, 16S and reference genome sequence information generated by the four sequencing centres. The DACC supports access to analysis software, biological samples, clinical protocols, news, publication announcements and project statistics, and performed centralized analysis of HMP reference genome and WGS annotation in cooperation with the sequencing centres. All unprocessed 16S, WGS and reference genome sequence data are deposited at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/bioproject/43021). Unless otherwise noted, all data sets and protocols described here are available to the scientific community at the DACC (http://hmpdacc.org). Specific data sets referred to in this work and available at the DACC are indicated in parentheses with the preface ‘RES’.

Phase I 16S and WGS sequencing overview

A set of 5,298 samples were collected from 242 adults (K. Aagaard et al., manuscript submitted; Table 1 and Supplementary Table 4), from which 16S and WGS data were generated for a total of 5,177 taxonomically characterized communities (16S) and 681 WGS samples describing the microbial communities from habitats within the human airways, skin, oral cavity, gut and vagina. For a subset of 560 samples, both data types were generated (Table 1). These efforts constitute our initial primary metagenomic data sets (phase I) described in more detail later. Additional efforts are ongoing to sequence and analyse the remaining samples from the complete HMP collection (11,174 primary specimens in total from 300 individuals sampled up to three times over 22 months) (K. Aagaard et al., manuscript submitted).
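
The counts quoted above fit together by simple inclusion-exclusion. The minimal Python sketch below checks this, under the assumption (ours, not stated explicitly in the text) that each of the 5,298 collected samples yielded at least one of the two data types; the variable names are illustrative only.

```python
# Counts quoted in the text. Treating the 5,298 samples as the union of the
# 16S and WGS sample sets is an assumption made for this check.
n_16s = 5177   # samples with 16S taxonomic profiles
n_wgs = 681    # samples with WGS data
n_both = 560   # samples with both data types

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_any = n_16s + n_wgs - n_both
only_16s = n_16s - n_both
only_wgs = n_wgs - n_both

print(n_any, only_16s, only_wgs)
```

Under that assumption, the sum reproduces the 5,298 samples reported for phase I.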

Table 1 HMP donor samples examined by 16S and WGS

16S standards development and sequencing

The goals of the HMP required that 16S sequences and profiles from data produced at the four participating sequencing centres be comparable in a variety of downstream analyses; however, no suitable methodology was available at the commencement of the project. While establishing 16S protocols, we determined that many components of data production and processing can contribute errors and artefacts. We investigated methods that avoid these errors and their subsequent effects on taxonomic classification and operational taxonomic unit (OTU)-based community structure. The results are discussed in detail in Supplementary Information and ref. 6. Thus, multiple evaluations of 16S protocols were undertaken before adopting a single standardized protocol that ensured consistency in the high-throughput production.

To maximize accuracy and consistency, protocols were evaluated primarily using a synthetic mock community of 21 known organisms⁶ (Supplementary Table 5). Additional testing of the protocol was carried out on a subset of HMP samples (Supplementary Table 1). Collectively, these efforts resulted in adoption of a protocol to amplify and sequence samples using the Roche-454 FLX Titanium platform⁶ (http://www.hmpdacc.org/doc/HMP_MDG_454_16S_Protocol.pdf). The HMP created both cell mixtures and genomic DNA extracts of the mock community (Supplementary Tables 2 and 5). A large body of metagenomic data (both 16S and WGS) (RES:HMMC) from these and other calibration experiments are available to the community to facilitate further benchmarking of new molecular and analytical approaches (Supplementary Table 3).

The majority of the sample collection was targeted for 16S sequencing using the 454 FLX Titanium based strategy⁶. The nucleotide sequence of the 16S rRNA gene consists of regions of highly conserved sequence, which alternate with nine regions or windows of variable nucleotide sequence that constitute the most informative portions of the gene sequence for use in taxonomic classification. A window covering variable regions three (V3) to five (V5), designated V35, of the 16S rRNA gene was chosen as the target for 4,879 samples. Sequence of a V1 to V3 (V13) window was also included for a subset of 2,971 samples to provide a complementary view of taxonomic profiles⁶ (RES:HMR16S) (Table 1, Supplementary Figs 2, 3 and Supplementary Information).

After adoption of the 16S protocol, including removal of multiple sources of potential artefacts or bias generated by 16S sequencing using pyrosequencing⁷,⁸, a variety of approaches for accurate diversity estimation were developed and compared⁹. A 16S data processing pipeline was established using the mothur software package¹⁰ (Supplementary Information), which includes two optional approaches of low and high stringency. The former provides an output favouring longer read lengths tailored towards taxonomic classification, the latter an output with more aggressive sequence error reduction tailored towards OTU construction (RES:HMMCP). A third complementary pipeline was also developed using the QIIME software package¹¹ (Supplementary Information), which processes these data using an OTU-binning strategy to which taxonomic classification is added (RES:HMQCP). All pipelines result in highly comparable views of the human microbiome.
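
To illustrate what OTU binning means in general terms, the following Python sketch groups pre-aligned sequences by greedy, seed-based clustering. This is a toy illustration and not the algorithm used by mothur or QIIME; the function names, the equal-length requirement and the identity threshold are all assumptions made for the example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Assign each sequence to the first OTU whose seed it matches at >= threshold.

    The first member of each OTU acts as its seed; a sequence that matches no
    existing seed starts a new OTU. The default threshold is hypothetical.
    """
    otus: list[list[str]] = []
    for s in seqs:
        for otu in otus:
            if identity(s, otu[0]) >= threshold:
                otu.append(s)
                break
        else:
            otus.append([s])
    return otus


# Toy reads (hypothetical); a loose threshold is used because they are short.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
clusters = greedy_otus(reads, threshold=0.9)
print(len(clusters))
```

Greedy clustering depends on input order, which is one reason production pipelines take more care over error reduction and clustering strategy than this sketch does.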

Metagenomic assembly and gene cataloguing

Approximately 749 samples representing targeted body sites were chosen for WGS sequencing using the Illumina GAIIx platform with 101-base-pair paired-end reads. From a high-quality set of 681 samples an average depth of 13 Gb (± 4.3) was achieved per sample, collectively producing a total of 8.8 Tb (RES:HMIWGS) (Table 1). Theoretically, these per sample data are sufficient to cover a 3 Mb bacterial genome present at only 0.8% abundance with a probability of 90% (M. C. Wendl et al., manuscript submitted). In addition, 12 stool samples were simultaneously sequenced using the 454 FLX Titanium platform (RES:HM4WGS). Comparisons between the centres demonstrated high consistency of target sequencing depth and success rates⁴. After development of a protocol for removing reads resulting from human DNA contamination (Supplementary Information), 49% of the reads were targeted for removal as human (for information on authorized access to these reads, see Supplementary Information). Samples collected from soft tissue tended to have higher human contamination (for example, mid-vagina (96%), anterior nares (82%) and throat (75%)). Preparations from saliva were also high in human DNA sequence (80%), whereas stool contained a relatively low abundance of human reads (up to 1%) (Supplementary Fig. 4).
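
The yield figures above can be checked with a few lines of arithmetic. The Python sketch below also computes a naive expected fold coverage for the 3 Mb genome example; it assumes that 0.8% abundance means 0.8% of sequenced bases and ignores human read removal. It is not the model of Wendl et al. and does not attempt to reproduce the 90% probability figure. The function names are hypothetical.

```python
def total_yield_tb(n_samples: int, mean_depth_gb: float) -> float:
    """Total sequence yield in terabases from a per-sample mean depth in gigabases."""
    return n_samples * mean_depth_gb / 1000.0


def expected_fold_coverage(depth_bp: float, abundance: float, genome_bp: float) -> float:
    """Naive expectation: bases drawn from the genome divided by genome length.

    Assumes 'abundance' is the fraction of sequenced bases that come from the
    genome, which is a simplification made for this sketch.
    """
    return depth_bp * abundance / genome_bp


total_tb = total_yield_tb(681, 13.0)             # 681 samples at 13 Gb each
fold = expected_fold_coverage(13e9, 0.008, 3e6)  # 3 Mb genome at 0.8% abundance

print(f"total yield ~ {total_tb:.2f} Tb")
print(f"naive expected coverage ~ {fold:.1f}x")
```

The product of 681 samples and the rounded 13 Gb mean comes to about 8.85 Tb, in line with the reported total of 8.8 Tb.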

After application of a quality control protocol that includes human sequence removal, quality filtering and trimming of reads (Supplementary Information), the remaining 3.5 Tb from 681 samples were subjected to a three-tiered complementary analysis strategy (Supplementary Information) of reference genome mapping (which was able to use ∼57% of the data), assembly and gene prediction (∼50% of the data), and metabolic reconstruction (∼36% of the data). This combined strategy facilitated the extraction of maximal organismal and functional information.
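
As a rough guide to scale, the sketch below converts the percentages in this paragraph into approximate data volumes. It uses only the figures quoted in the text (8.8 Tb raw, 3.5 Tb retained, and the three approximate tier fractions); the dictionary keys are illustrative labels. The tiers are complementary analyses of the same data, so their fractions are not expected to sum to 100%.

```python
raw_tb = 8.8        # total Illumina yield quoted earlier in the text
retained_tb = 3.5   # data remaining after quality control

# Approximate fractions of the retained data usable by each analysis tier.
tier_fraction = {
    "reference genome mapping": 0.57,
    "assembly and gene prediction": 0.50,
    "metabolic reconstruction": 0.36,
}

retained_share = retained_tb / raw_tb
tier_tb = {name: frac * retained_tb for name, frac in tier_fraction.items()}

print(f"retained after QC: {retained_share:.0%}")
for name, tb in tier_tb.items():
    print(f"{name}: ~{tb:.1f} Tb")
```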

Metagenomic assemblies were generated for all available samples using an optimized SOAPdenovo protocol with parameters designed to produce substrates for downstream analyses such as gene and function prediction, resulting in a total of 41 million contigs (RES:HMASM) (Supplementary Information). Reads that remained unassembled were pooled across individual body sites and re-assembled using the same approach, resulting in an additional 4,200,672 contigs (RES:HMBSA). These body-site-specific assemblies are aimed at reconstructing organisms that represent too small a fraction in any individual sample to assemble but are found among many individuals. For 12 stool samples both Illumina and 454 FLX Titanium data (RES:HM4WGS) were generated, allowing a hybrid assembly approach using Newbler (Supplementary Information) (RES:HMHASM). Overall, the assembly statistics recovered varied substantially depending on body site and community complexity (Supplementary Fig. 5). However, our results indicate that, for the assembly strategy we used, metagenomic assembly quality plateaus at approximately 6 Gb of microbial sequence coverage for a sample possessing a microbial community structure similar to that of stool samples (Supplementary Fig. 6).

A WGS-based perspective of community membership was obtained by aligning the reads to a set of 1,742 finished bacterial, 131 archaeal, 3,683 viral and 326 microeukaryotic reference genomes¹² (RES:HMREFG) (Supplementary Information) representing a broad taxonomic range from each of these four domains. A total of 57.6% of the high-quality microbial reads could be associated with a known genome (ranging from 33% for anterior nares to 77% for posterior fornix) (RES:HMSCP). The overwhelming majority of mapped sequences originated from bacteria (99.7%), while the remaining reads mapped to microeukaryotes (0.3%) or archaea (<0.01%) (Supplementary Information).

Two complementary approaches were used to summarize overall function and metabolism of the human microbiome, producing two primary data sets of annotations (RES:HMMRC and RES:HMGI) (Supplementary Information) and additional secondary analyses (RES:HMGS, HMHGI, HMGC and HMGOI) (Supplementary Information) available to the community for further interrogation. The first primary data set of annotations was produced by mapping individual shotgun reads to characterized protein families¹³ (RES:HMMRC). The second was produced from functionally annotated gene predictions generated from the metagenomic assemblies (RES:HMGI), which were subsequently grouped according to high-level biological processes and to selected additional processes specific to metabolism and regulation¹⁴ (RES:HMGS) (Supplementary Tables 6, 7 and Supplementary Fig. 7).

HMP data generation and analysis lessons

A key manner in which the HMP resources will serve to guide future studies of the microbiome is by enabling informed decisions regarding sampling protocols and genomic DNA preparation (K. Aagaard et al., manuscript submitted), sequencing depth (M. C. Wendl et al., manuscript submitted), statistical power (P. S. La Rosa et al., manuscript submitted) and metagenomic data type. As indicated in Table 1, the consortium successfully amplified 16S sequences to our target depth at all 18 body sites, with the fewest sequences recovered consistently from the antecubital fossae. The amount of host human DNA recovered and the finest level of OTU resolution varied for 16S sequences among body sites⁶ (Supplementary Figs 3 and 4).

From our WGS investigations, a series of protocols (http://hmpdacc.org/tools_protocols/tools_protocols.php) have been established to process large volumes of short-read WGS data and to annotate and examine these data through both a multi-tiered assembly approach and as single reads¹⁵. An investigator’s choice of metagenomic technologies can thus be guided not only by a 16S versus WGS dichotomy, but also by the expected fraction of host sequence and the appropriate 16S region targeting the dominant taxa at each body site (Supplementary Figs 2–6 and 8).

Together, these data sets represent comprehensive and complementary views of the human microbiome, as shown by comparing organismal (Fig. 1a) and gene (Fig. 1b) catalogues, and the ratio of genes contributed per OTU (Fig. 1c). New gene clusters (as determined by annotation of assembled WGS data) are in general discovered more slowly than new organisms (as determined by OTU data) owing to the fragmentary nature of these community reads and assemblies despite high sequence depth (Fig. 1a, b and Supplementary Fig. 9), and the number of genes contributed per OTU varies by body site (Fig. 1c and Supplementary Information). However, in general, these results highlight an important point for consideration of further microbiome investigations using these data sets, as they suggest that the majority of the common taxa and genes present in this reference population have been detected.
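
A discovery rate of this kind is commonly summarized as an accumulation (collector's) curve: the number of distinct features seen as samples are added. The Python sketch below averages such a curve over random sample orderings. It is a minimal illustration with toy data and hypothetical names, not the consortium's procedure for Fig. 1.

```python
import random


def accumulation_curve(samples: list[set[str]], n_perm: int = 100, seed: int = 0) -> list[float]:
    """Mean number of distinct features seen after 1..N samples, over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_perm):
        order = samples[:]          # copy so the caller's list is not reordered
        rng.shuffle(order)
        seen: set[str] = set()
        for i, s in enumerate(order):
            seen |= s               # add this sample's features to the running set
            totals[i] += len(seen)
    return [t / n_perm for t in totals]


# Toy data: each set holds the OTU (or gene cluster) labels found in one sample.
toy = [{"otu1", "otu2"}, {"otu2", "otu3"}, {"otu1", "otu3"}, {"otu4"}]
curve = accumulation_curve(toy)
print(curve)
```

A curve that flattens as samples are added suggests that most common features have been detected, which is the interpretation drawn in the text for this reference population.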

Figure 1: Rates of gene and OTU discovery from HMP taxonomic and metagenomic data.

We additionally compared the gut community gene catalogue sampled by the HMP with that of MetaHIT in terms of total detected gene counts. The HMP recovered more total non-redundant gene counts (5,140,472) than reported by MetaHIT (3,299,822)⁵, probably reflecting a combination of the increased sequence depth obtained by the HMP (11.7 Gb HMP, 4.5 Gb MetaHIT on average) and differences in data generation and processing⁵.

The two non-redundant sets of gene sequences were subsequently combined and compared by matches to a database of orthologous groups¹⁶ of functionally annotated genes. Approximately 57% of the orthologous groups recovered by this method overlapped between the data sets, while an additional 34% versus 10% were unique to the HMP and MetaHIT, respectively (Supplementary Fig. 10, Supplementary Table 8 and Supplementary Information). After removal of genes that received any orthologous group assignment, the remaining novel genes were subsequently clustered¹⁷. Approximately 79% of the HMP-derived novel gene clusters were orthologous to one or more clusters in MetaHIT, while an additional 16% were unique to this study versus 5% for MetaHIT-derived data⁵ (Supplementary Fig. 11, Supplementary Table 8 and Supplementary Information). These results suggest that, for this body habitat, relatively similar gene catalogues were recovered despite differences in experimental design and protocols. However, a greater proportion of both annotated and unique novel genes were detected in the HMP data set, emphasizing the utility of sequencing depth in recovering gene function and, in particular, deriving rare function. These results further underscore the importance of large-scale sequence-based studies of the microbiome to better characterize its gene content and diversity.
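
The shared and unique percentages reported here correspond to a three-way partition of the combined catalogue. A minimal Python sketch of that partition follows, using toy orthologous-group labels; the function name and data are hypothetical, and the sketch says nothing about how matches to the orthologous group database were actually computed.

```python
def overlap_summary(a: set[str], b: set[str]) -> dict[str, float]:
    """Percentages of the union that are shared, unique to a, and unique to b."""
    union = a | b
    if not union:
        raise ValueError("both catalogues are empty")
    n = len(union)
    return {
        "shared": 100 * len(a & b) / n,
        "only_a": 100 * len(a - b) / n,
        "only_b": 100 * len(b - a) / n,
    }


# Toy catalogues of orthologous-group identifiers (hypothetical labels).
hmp_groups = {"OG1", "OG2", "OG3", "OG4", "OG5"}
metahit_groups = {"OG3", "OG4", "OG5", "OG6"}
summary = overlap_summary(hmp_groups, metahit_groups)
print(summary)
```

Because the three parts partition the union, they sum to 100%; the quoted figures of 57%, 34% and 10% sum to 101%, presumably as a result of rounding.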

Human microbiome reference genomes

The current goal for the reference genome component of the HMP is to sequence at least 3,000 reference bacterial genomes, and additional viral and microeukaryotic genomes, associated with the human body. Thus far, more than 800 genomes have been sequenced and are available from the NCBI and the DACC (http://hmpdacc.org/HMRGD). From an alignment of WGS reads to reference genomes (RES:HMREFG), approximately 26% from the total read set (46% of all reads that could be aligned) were matched to a subset of 223 HMP reference genomes (Supplementary Information and Supplementary Data).
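
The two percentages quoted above imply an overall alignment rate, which can be compared with the 57.6% of reads associated with a known genome reported in the mapping analysis. The Python sketch below performs that check, assuming both percentages refer to the same set of reads; the variable names are illustrative.

```python
# Shares quoted in the text for reads matched to the 223 HMP reference genomes.
frac_of_all_reads = 0.26      # as a share of the total read set
frac_of_aligned_reads = 0.46  # as a share of all reads that could be aligned

# Implied share of reads that aligned to any reference genome.
implied_aligned = frac_of_all_reads / frac_of_aligned_reads
print(f"implied aligned fraction: {implied_aligned:.1%}")
```

The implied value is close to the reported 57.6%, with the small difference attributable to rounding of the two input percentages.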

We continue to solicit community feedback for strains that will best benefit our attempts at understanding the breadth of human microbiome diversity. For example, a prioritized list of the ‘most wanted’ HMP taxa is being maintained (http://hmpdacc.org/most_wanted/) with the goal of targeting these difficult to obtain organisms using both culture-based and single-cell approaches.

A catalogue of all HMP reference genomes along with custom filtering, viewing, graphing and download options can be found at the DACC Project Catalogue (http://www.hmpdacc-resources.org/hmp_catalog/main.cgi). In addition, comparative analyses of reference genomes are provided by the data warehouse and analytical systems, Integrated Microbial Genomes/HMP (http://www.hmpdacc-resources.org/cgi-bin/imgm_hmp/main.cgi). Cultures of all HMP reference strains are required to be made publicly available through the Biodefense and Emerging Infections Research Resources Repository (BEI). Information on strain acquisition can be found at the DACC (http://hmpdacc.org/reference_genomes/reference_genomes.php) and BEI (http://www.beiresources.org/tabid/1901/stabid/1901/CollectionLinkID/4/Default.aspx).

Conclusion

An overarching goal of this multi-year, multi-centre project is the generation of a community resource to advance research efforts related to the microbiome. The result is a collection of 11,174 primary biological specimens representing the human microbiome, as well as corresponding blood samples from the human donors, which are being reserved for sequencing at a future date and from which cell lines will be developed. A variety of new protocols were developed to enable a project of this scope; these include methods for donor recruitment, laboratory and sequence processing, and analysis of 16S and WGS sequence and profiles. These resources serve as models to guide the design of similar projects. Studies with a primary focus on disease can use this reference for comparative purposes, including detecting shifts in microbial taxonomic and functional profiles, or identification of new species not present in healthy cohorts that appear under disease conditions. The catalogue described in this study is, to our knowledge, the largest and most comprehensive reference set of human microbiome data associated with healthy adult individuals. Collectively the data represent a treasure trove that can be mined to identify new organisms, gene functions, and metabolic and regulatory networks, as well as correlations between microbial community structure and health and disease⁴. Among other future benefits, this resource may promote the development of novel prophylactic strategies such as the application of prebiotics and probiotics to foster human health.

Methods Summary

As part of a multi-institutional collaboration, the HMP human subjects study was reviewed by the Institutional Review Boards (IRBs) at each sampling site: the BCM (IRB protocols H-22895 (IRB no. 00001021) and H-22035 (IRB no. 00002649)); Washington University School of Medicine (IRB protocol HMP-07-001 (IRB no. 201105198)); and St Louis University (IRB no. 15778). The study was also reviewed by the J. Craig Venter Institute under IRB protocol 2008-084 (IRB no. 00003721), and at the Broad Institute of MIT and Harvard the study was determined to be exempt from IRB review. All study participants gave their written informed consent before sampling and the study was conducted using the Human Microbiome Project Core Sampling Protocol A. Each IRB has a federal-wide assurance and follows the regulations established in 45 CFR Part 46. The study was conducted in accordance with the ethical principles expressed in the Declaration of Helsinki and the requirements of applicable federal regulations.

All further details are in Supplementary Information.

Accession codes

Data deposits

Accession numbers for all primary sequencing data are given in Supplementary Information.

References

1. Gilbert, J. A. & Dupont, C. L. Microbial metagenomics: beyond the genome. Annu. Rev. Mar. Sci. 3, 347–371 (2011)
2. NIH HMP Working Group et al. The NIH Human Microbiome Project. Genome Res. 19, 2317–2323 (2009)
3. Human Microbiome Jumpstart Reference Strains Consortium. A catalog of reference genomes from the human microbiome. Science 328, 994–999 (2010)
4. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature http://dx.doi.org/10.1038/nature11234 (this issue)
5. Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010)
6. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE http://dx.plos.org/10.1371/journal.pone.0039315 (14 June 2012)
7. Kunin, V., Engelbrektson, A., Ochman, H. & Hugenholtz, P. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ. Microbiol. 12, 118–123 (2010)
8. Huse, S. M., Huber, J. A., Morrison, H. G., Sogin, M. L. & Welch, D. M. Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol. 8, R143 (2007)
9. Schloss, P. D., Gevers, D. & Westcott, S. L. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE 6, e27310 (2011)
10. Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009)
11. Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7, 335–336 (2010)
12. Martin, J. S. et al. Optimizing read mapping to reference genomes to determine composition and species prevalence in microbial communities. PLoS ONE http://dx.doi.org/10.1371/journal.pone.0036427 (14 June 2012)
13. Abubucker, S. et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput. Biol. http://dx.doi.org/10.1371/journal.pcbi.1002358 (14 June 2012)
14. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature Genet. 25, 25–29 (2000)
15. Goll, J. et al. A case study for large-scale human microbiome analysis using JCVI’s Metagenomics Reports (METAREP). PLoS ONE http://dx.doi.org/10.1371/journal.pone.002904 (14 June 2012)
16. Muller, J. et al. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 38, D190–D195 (2010)
17. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010)

Acknowledgements

The consortium would like to thank our external scientific advisory board: R. Blumberg, J. Davies, R. Holt, P. Ossorio, F. Ouellette, G. Schoolnik and A. Williamson. We would also like to thank our collaborators throughout the International Human Microbiome Consortium, particularly the investigators of the MetaHIT project, for advancing human microbiome research. Data repository management was provided by the NCBI and the Intramural Research Program of the NIH National Library of Medicine. We especially appreciate the generous participation of the individuals from the St Louis, Missouri, and Houston, Texas areas who made this study possible. This research was supported in part by NIH grants U54HG004969 to B.W.B.; U54HG003273 to R.A.G.; U54HG004973 to R.A.G., S.K.H. and J.F.P.; U54HG003067 to E. S. Lander; U54AI084844 to K.E.N.; N01AI30071 to R. L. Strausberg; U54HG004968 to G.M.W.; U01HG004866 to O.W.; U54HG003079 to R.K.W.; R01HG005969 to C.H.; R01HG004872 to R.K.; R01HG004885 to M.P.; R01HG005975 to P.D.S.; R01HG004908 to Y.Y.; R01HG004900 to M. K. Cho and P. Sankar; R01HG005171 to D.E.H.; R01HG004853 to A.L.M.; R01HG004856 to R.R.; R01HG004877 to R.R.S. and R.M.F.; R01HG005172 to P. Spicer; R01HG004857 to M.P.; R01HG004906 to T.M.S.; R21HG005811 to E.A.-V.; G.A.B. was supported by UH2AI083263 and UH3AI083263 (G.A.B., C. N. Cornelissen, L. K. Eaves and J. F. Strauss); M.J.B. was supported by UH2AR057506; S.M.H. was supported by UH3DK083993 (V. B. Young, E. B. Chang, F. Meyer, T.M.S., M. L. Sogin, J. M. Tiedje); K.P.R. was supported by UH2DK083990 (J.V.); J.A.S. and H.H.K. were supported by UH2AR057504 and UH3AR057504 (J.A.S.); DP2OD001500 to K.M.A.; N01HG62088 to the Coriell Institute for Medical Research; U01DE016937 to F.E.D.; S.K.-H. was supported by RC1DE020298 and R01DE021574 (S.K.-H. and H. Li); J.I. was supported by R21CA139193 (J.I. and D. S. Michaud); K.P.L. was supported by P30DE020751 (D. J. Smith); Army Research Office grant W911NF-11-1-0473 to C.H.; National Science Foundation grants NSF DBI-1053486 to C.H. and NSF IIS-0812111 to M.P.; The Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231 for P.S.C.; LANL Laboratory-Directed Research and Development grant 20100034DR and the US Defense Threat Reduction Agency grants B104153I and B084531I to P.S.C.; Research Foundation - Flanders (FWO) grant to K.F. and J. Raes; R.K. is a Howard Hughes Medical Institute (HHMI) Early Career Scientist; Gordon & Betty Moore Foundation funding and institutional funding from the J. David Gladstone Institutes to K.S.P.; A.M.S. was supported by fellowships provided by the Rackham Graduate School and the NIH Molecular Mechanisms in Microbial Pathogenesis Training Grant T32AI007528; a Crohn’s and Colitis Foundation of Canada Grant in Aid of Research to E.A.-V.; 2010 IBM Faculty Award to K.C.W. Analysis of the HMP data was performed using National Energy Research Scientific Computing resources and the BluBioU Computational Resource at Rice University.

Author information

Susan Kinder-Haake: Deceased.

Authors and Affiliations

J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, 20850, Maryland, USA
Barbara A. Methé, Karen E. Nelson, Ramana Madupu, Monika Bihan, Dana A. Busam, A. Scott Durkin, Leslie Foster, Johannes Goll, Kelvin Li, Jamison M. McCorrison, Jason R. Miller, Yu-Hui Rogers, Ravi K. Sanka, Indresh Singh, Granger G. Sutton, Mathangi Thiagarajan & Manolito Torralba
Center for Bioinformatics and Computational Biology and Department of Computer Science, University of Maryland, Biomolecular Sciences Building Rm. 3120F, College Park, 20742, Maryland, USA
Mihai Pop, Sergey Koren, Bo Liu & Daniel D. Sommer
University of Maryland School of Medicine, Institute for Genome Sciences, 801 W. Baltimore Street, Baltimore, 21201, Maryland, USA
Heather H. Creasy, Michelle G. Giglio, Olukemi O. Abolude, Cesar A. Arze, Brandi L. Cantarel, Jonathan Crabtree, Noam J. Davidovics, Victor M. Felix, Catherine Jordan, Anup A. Mahurkar, Joshua Orvis, Jacques Ravel, Lynn Schriml, James R. White & Owen White
Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, USA
Curtis Huttenhower, J. Fah Sathirapongsasuti & Nicola Segata
The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, 02142, Massachusetts, USA
Curtis Huttenhower, Dirk Gevers, Ashlee M. Earl, Michael G. FitzGerald, Jennifer R. Wortman, Sarah K. Young, Qiandong Zeng, Eric J. Alm, Lucia Alvarado, Scott Anderson, Harindra M. Arachchi, Toby Bloom, Dawn M. Ciulla, Rachel L. Erlich, Michael Feldgarden, Sheila Fisher, Dennis C. Friedrich, Georgia Giannoukos, Jonathan M. Goldberg, Allison Griggs, Sharvari Gujja, Brian J. Haas, Theresa A. Hepburn, Clinton Howarth, Katherine H. Huang, Cristyn Kells, Teena Mehta, Chad Nusbaum, Matthew Pearson, Margaret E. Priest, Carsten Russ, Narmada Shenoy, Sean M. Sykes, Diana G. Tabbaa, Zhengyuan Wang, Doyle V. Ward, Chandri Yandava, Jeremy D. Zucker & Bruce W. Birren
Baylor College of Medicine Human Genome Sequencing Center, One Baylor Plaza, Houston, 77030, Texas, USA
Joseph F. Petrosino, Donna M. Muzny, Kim C. Worley, Christian J. Buhay, Yan Ding, Shannon P. Dugan, Michael E. Holder, Huaiyang Jiang, Vandita Joshi, Christie L. Kovar, Sandra L. Lee, Niall Lennon, Lora Lewis, Yue Liu, Irene Newsham, Xiang Qin, Jeffrey G. Reid, Katarzyna Wilczek-Boney, Yuan Qing Wu, Lan Zhang, Yiming Zhu, Richard A. Gibbs & Sarah K. Highlander
Washington University School of Medicine, The Genome Institute, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA
Sahar Abubucker, Asif T. Chinwalla, Robert S. Fulton, Kymberlie Hallsworth-Pepin, Elizabeth A. Lobos, Vincent Magrini, John C. Martin, Makedonka Mitreva, Erica J. Sodergren, Aye M. Wollam, Elizabeth Appelbaum, Veena Bhonagiri, Lei Chen, Sandra W. Clifton, Kimberley D. Delehaunty, Elena Deych, David J. Dooling, Candace N. Farmer, Catrina C. Fronick, Lucinda L. Fulton, Brandi Herter, Karthik C. Kota, Elaine R. Mardis, Kathie A. Mihindukulasuriya, Patrick J. Minx, Michelle O’Laughlin, Craig Pohl, Chad M. Tomlinson, Jason Walker, Wesley Warren, Kristine M. Wylie, Todd Wylie, Liang Ye, Yanjiao Zhou, George M. Weinstock & Richard K. Wilson
Department of Pathology & Immunology, Baylor College of Medicine, One Baylor Plaza, Houston, 77030, Texas, USA
James Versalovic & Hongyu Gao
Texas Children’s Hospital Department of Pathology, 6621 Fannin Street, Houston, Texas 77030, USA
James Versalovic
Department of Obstetrics & Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine, One Baylor Plaza, Houston, 77030, Texas, USA
Kjersti M. Aagaard
University of Guelph Department of Molecular and Cellular Biology, 50 Stone Road East, Guelph, Ontario N1G 2W1, Canada
Emma Allen-Vercoe
Department of Civil & Environmental Engineering, Massachusetts Institute of Technology, Parsons Laboratory, Room 48-317, 15 Vassar Street, Cambridge, Massachusetts 02139, USA
Eric J. Alm
Lawrence Berkeley National Laboratory, Center for Environmental Biotechnology, 1 Cyclotron Road, Berkeley, California 94720, USA
Gary L. Andersen
University of California, San Francisco, School of Dentistry, 707 Parnassus Avenue, San Francisco, California 94143, USA
Gary Armitage
Baylor College of Medicine, Molecular Virology and Microbiology, One Baylor Plaza, Houston, 77030, Texas, USA
Joseph F. Petrosino, Tulin Ayvaz, Wendy A. Keitel, Matthew C. Ross, Bonnie P. Youmans & Sarah K. Highlander
National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), 6701 Democracy Boulevard, MSC 4872, Bethesda, Maryland 20892, USA
Carl C. Baker
National Institutes of Health, Office of Research on Women’s Health (ORWH), 6707 Democracy Boulevard, MSC 5484, Bethesda, Maryland 20892, USA
Lisa Begg
National Institutes of Health, National Institute of Allergy and Infectious Diseases (NIAID), 6610 Rockledge Drive, MSC 6603, Bethesda, Maryland 20892, USA
Tsegahiwot Belachew, Joseph L. Campbell, Carolyn Deal, Valentina Di Francesco, Christina Giblin & Maria Y. Giovanni
Department of Medicine, New York University Langone Medical Center, 550 First Avenue, OBV A-606, New York, New York 10016, USA
Martin J. Blaser
National Institutes of Health, National Human Genome Research Institute (NHGRI), 5635 Fishers Lane, MSC 9305, Bethesda, Maryland 20892, USA
Vivien R. Bonazzi, Joseph L. Campbell, Shaila Chhibba, Jean McEwen, Jane Peterson, Lita M. Proctor, Jeffery A. Schloss, Lu Wang, Christopher Wellington & Kris A. Wetterstrand
Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, PO Box 843083, Richmond, Virginia 23284, USA
Paul Brooks
Virginia Commonwealth University, Center for the Study of Biological Complexity, 1000 West Cary Street, Richmond, Virginia 23284, USA
Paul Brooks, Gregory A. Buck, Maria C. Rivera & Nihar U. Sheth
Department of Biology, Virginia Commonwealth University, 1000 West Cary Street, Richmond, Virginia 23284, USA
Gregory A. Buck & Maria C. Rivera
Lawrence Berkeley National Laboratory, Technology Integration Group, National Energy Research Scientific Computing Center, 1 Cyclotron Road, Berkeley, California 94720, USA
Shane R. Canon
Bioscience Division, Los Alamos National Laboratory Genome Science Group, HRL, MS-888, LANL, Los Alamos, New Mexico 87545, USA
Patrick S. Chain, Chien-Chi Lo & Matthew Scholz
Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, California 94598, USA
Patrick S. Chain, Nikos C. Kyrpides, Konstantinos Liolios, Victor M. Markowitz, Konstantinos Mavrommatis & Ioanna Pagani
Computational Research Division, Lawrence Berkeley National Laboratory, Biological Data Management and Technology Center, 1 Cyclotron Road, Berkeley, California 94720, USA
I-Min A. Chen, Ken Chu, Victor M. Markowitz & Krishna Palaniappan
Department of Chemistry and Biochemistry, University of Colorado, Campus Box 215, Boulder, 80309-0215, Colorado, USA
Jose C. Clemente, Rob Knight, Catherine A. Lozupone & Daniel McDonald
National Institutes of Health, National Institute of Dental and Craniofacial Research (NIDCR), 6701 Democracy Boulevard, MSC 4878, Bethesda, Maryland 20892, USA
Mary A. Cutting, Holli A. Hamilton, Emily L. Harris, R. Dwayne Lunsford & Pamela McInnes
The Procter & Gamble Company, FemCare Product Safety and Regulatory Affairs, 6110 Center Hill Avenue, Cincinnati, Ohio 45224, USA
Catherine C. Davis
Bioinformatics Department, Second Genome, Inc., 1150 Bayhill Drive, Suite 215, San Bruno, California 94066, USA
Todd Z. DeSantis
Department of Molecular Genetics, Forsyth Institute, 245 First Street, Cambridge, Massachusetts 02142, USA
Floyd E. Dewhirst, Jacques Izard & Katherine P. Lemon
Department of Oral Medicine, Harvard School of Dental Medicine, Infection and Immunity, 188 Longwood Avenue, Boston, Massachusetts 02115, USA
Floyd E. Dewhirst
Department of Pathology & Immunology, Washington University School of Medicine, 660 South Euclid Avenue, Box 8118, St Louis, Missouri 63110, USA
W. Michael Dunne Jr & Mark A. Watson
bioMerieux, Inc., 100 Rodolphe Street, Durham, North Carolina 27712, USA
W. Michael Dunne Jr
drive5.com, Tiburon, 94920, California, USA
Robert C. Edgar
Cleveland Clinic, Center for Bioethics, Humanities and Spiritual Care, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA
Ruth M. Farrell & Richard R. Sharp
Department of Structural Biology, VIB, Belgium, Pleinlaan 2, 1050 Brussels, Belgium
Karoline Faust & Jeroen Raes
Department of Applied Biological Sciences (DBIT), Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
Karoline Faust & Jeroen Raes
Department of Bioinformatics and Genomics, University of North Carolina Charlotte, 9201 University City Blvd, Charlotte, 28223-0001, North Carolina, USA
Anthony A. Fodor
Department of Biological Sciences, University of Idaho, Life Sciences South Room 441A, PO Box 443051, Moscow, Idaho 83844, USA
Larry Forney
Massachusetts Institute of Technology, Computational and Systems Biology, Parsons Laboratory, Room 48-317, 15 Vassar Street, Cambridge, Massachusetts 02139, USA
Jonathan Friedman & Chris S. Smillie
Saint Louis University, Center for Advanced Dental Education, 3320 Rutger Street, St Louis, Missouri 63104, USA
Nathalia Garcia
Department of Computer Science, University of Colorado, Boulder, 80309, Colorado, USA
Antonio Gonzalez & Dan Knights
University of Maryland Francis King Carey School of Law, 500 W. Baltimore Street, Baltimore, Maryland 21201, USA
Diane E. Hoffmann
Marine Biological Laboratory, Josephine Bay Paul Center, 7 MBL Street, Woods Hole, Massachusetts 02543-1015, USA
Susan M. Huse
Department of Oral Medicine, Harvard School of Dental Medicine, Infection and Immunity, 188 Longwood Avenue, Boston, Massachusetts 02115, USA
Jacques Izard
Ecology Department, Earth Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, USA
Janet K. Jansson
Department of Periodontics, University of Texas Health Science Center School of Dentistry, 6516 MD Anderson Blvd, Houston, Texas 77030, USA
James A. Katancik
Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
Scott T. Kelley & Beltran Rodriguez-Mueller
Division of Associated Clinical Specialties and Dental Research Institute, UCLA School of Dentistry, 10833 Le Conte Avenue, Los Angeles, California 90095-1668, USA
Susan Kinder-Haake
McGill University, Faculty of Medicine, Peel 3647 Montreal, Quebec H3A 1X1, Canada
Nicholas B. King
Howard Hughes Medical Institute, Campus Box 215, Boulder, Colorado 80309-0215, USA
Rob Knight
National Institutes of Health, National Cancer Institute (NCI), Dermatology Branch, CCR, MSC 1908, 10 Center Drive, Bethesda, Maryland 20892, USA
Heidi H. Kong
Department of Microbiology, Cornell University, 467 Biotechnology Building, Ithaca, 14853, New York, USA
Omry Koren & Ruth E. Ley
Department of Medicine, Division of General Medical Science, Washington University School of Medicine, 660 South Euclid Avenue, Box 8005, St Louis, Missouri 63110, USA
Patricio S. La Rosa & William D. Shannon
Division of Infectious Diseases, Children’s Hospital Boston, Harvard Medical School, 300 Longwood Avenue, Boston, Massachusetts 02115, USA
Katherine P. Lemon
Department of Anthropology, University of Oklahoma, 455 West Lindsey, Dale Hall Tower 521, Norman, Oklahoma 73019, USA
Cecil M. Lewis & Paul Spicer
Department of Obstetrics and Gynecology, Washington University School of Medicine, 4533 Clayton Avenue, Box 8219, St Louis, Missouri 63110, USA
Tessa Madden
Division of Gastroenterology and Hepatology, University of Alabama at Birmingham, 1530 3rd Avenue South, Birmingham, Alabama 35294-1150, USA
Peter J. Mannon
Baylor College of Medicine, Center for Medical Ethics and Health Policy, One Baylor Plaza, Houston, 77030, Texas, USA
Amy L. McGuire
Baylor College of Medicine, Medicine-Infectious Disease, One Baylor Plaza, Houston, 77030, Texas, USA
Shital M. Patel
Biosciences Division, Oak Ridge National Laboratory, PO Box 2008 MS 6038, Oak Ridge, Tennessee 37831-6038, USA
Mircea Podar & Tatiana A. Vishnivetskaya
University of California, San Francisco, Gladstone Institutes, 1650 Owens Street, San Francisco, California 94158, USA
Katherine S. Pollard, Thomas J. Sharpton & Rebecca M. Truty
University of California, San Francisco, Institute for Human Genetics, 1650 Owens Street, San Francisco, California 94158, USA
Katherine S. Pollard
Division of Biostatistics, University of California, San Francisco, 1650 Owens Street, San Francisco, California 94158, USA
Katherine S. Pollard
Department of Microbiology and Immunology, University of Maryland School of Medicine, BioPark II - Room 611, 801 W. Baltimore Street, Baltimore, Maryland 21201, USA
Jacques Ravel
Indiana University, School of Informatics and Computing, 150 S. Woodlawn Avenue, Bloomington, Indiana 47405, USA
Mina Rho & Yuzhen Ye
Mount Sinai School of Medicine, Annenberg Building Floor 5th, Room 5-208, 1468 Madison Avenue, New York, New York 10029, USA
Rosamond Rhodes
Baylor College of Medicine Molecular & Human Genetics, One Baylor Plaza, Houston, 77030, Texas, USA
Kevin P. Riehle
Center for Bioethics and Department of Medical Ethics, University of Pennsylvania, 3401 Market Street, Suite 320, Philadelphia, Pennsylvania 19104, USA
Pamela Sankar
Department of Microbiology & Immunology, University of Michigan, 5713 Medical Science Bldg. II, 1150 West Medical Center Dr., Ann Arbor, 48109-5620, Michigan, USA
Patrick D. Schloss & Alyxandria M. Schubert
Department of Microbiology and Molecular Genetics, Michigan State University, 6180 Biomedical Physical Sciences, East Lansing, 48824, Michigan, USA
Thomas M. Schmidt
The EMMES Corporation, 401 N. Washington St., Suite 700, Rockville, 20850, Maryland, USA
Gina A. Simone
Wayne State University School of Medicine, Detroit, Michigan, Harper University Hospital, 3990 John R Street, Detroit, Michigan 48201, USA
Jack D. Sobel
Johns Hopkins University School of Medicine, McKusick-Nathans Institute of Genetic Medicine, Bloomberg School of Public Health, E3138, 615 N Wolfe St, Baltimore, Maryland 21205, USA
Todd J. Treangen
J. Craig Venter Institute, 10355 Science Center Drive, San Diego, 92121, California, USA
Jonathan H. Badger & Shibu Yooseph
Northwestern University, Feinberg School of Medicine, 420 East Superior Street, Chicago, Illinois 60611, USA
Laurie Zoloth
Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine, One Baylor Plaza, Houston, 77030, Texas, USA
Joseph F. Petrosino
National Institutes of Health, National Human Genome Research Institute (NHGRI), Genetics and Molecular Biology Branch, MSC 4442, Bethesda, 20892, Maryland, USA
Sean Conlan & Julia A. Segre

Consortia

The Human Microbiome Project Consortium

Barbara A. Methé, Karen E. Nelson, Mihai Pop, Heather H. Creasy, Michelle G. Giglio, Curtis Huttenhower, Dirk Gevers, Joseph F. Petrosino, Sahar Abubucker, Jonathan H. Badger, Asif T. Chinwalla, Ashlee M. Earl, Michael G. FitzGerald, Robert S. Fulton, Kymberlie Hallsworth-Pepin, Elizabeth A. Lobos, Ramana Madupu, Vincent Magrini, John C. Martin, Makedonka Mitreva, Donna M. Muzny, Erica J. Sodergren, James Versalovic, Aye M. Wollam, Kim C. Worley, Jennifer R. Wortman, Sarah K. Young, Qiandong Zeng, Kjersti M. Aagaard, Olukemi O. Abolude, Emma Allen-Vercoe, Eric J. Alm, Lucia Alvarado, Gary L. Andersen, Scott Anderson, Elizabeth Appelbaum, Harindra M. Arachchi, Gary Armitage, Cesar A. Arze, Tulin Ayvaz, Carl C. Baker, Lisa Begg, Tsegahiwot Belachew, Veena Bhonagiri, Monika Bihan, Martin J. Blaser, Toby Bloom, Vivien R. Bonazzi, Paul Brooks, Gregory A. Buck, Christian J. Buhay, Dana A. Busam, Joseph L. Campbell, Shane R. Canon, Brandi L. Cantarel, Patrick S. Chain, I-Min A. Chen, Lei Chen, Shaila Chhibba, Ken Chu, Dawn M. Ciulla, Jose C. Clemente, Sandra W. Clifton, Sean Conlan, Jonathan Crabtree, Mary A. Cutting, Noam J. Davidovics, Catherine C. Davis, Todd Z. DeSantis, Carolyn Deal, Kimberley D. Delehaunty, Floyd E. Dewhirst, Elena Deych, Yan Ding, David J. Dooling, Shannon P. Dugan, W. Michael Dunne Jr, A. Scott Durkin, Robert C. Edgar, Rachel L. Erlich, Candace N. Farmer, Ruth M. Farrell, Karoline Faust, Michael Feldgarden, Victor M. Felix, Sheila Fisher, Anthony A. Fodor, Larry Forney, Leslie Foster, Valentina Di Francesco, Jonathan Friedman, Dennis C. Friedrich, Catrina C. Fronick, Lucinda L. Fulton, Hongyu Gao, Nathalia Garcia, Georgia Giannoukos, Christina Giblin, Maria Y. Giovanni, Jonathan M. Goldberg, Johannes Goll, Antonio Gonzalez, Allison Griggs, Sharvari Gujja, Brian J. Haas, Holli A. Hamilton, Emily L. Harris, Theresa A. Hepburn, Brandi Herter, Diane E. Hoffmann, Michael E. Holder, Clinton Howarth, Katherine H. Huang, Susan M. Huse, Jacques Izard, Janet K. Jansson, Huaiyang Jiang, Catherine Jordan, Vandita Joshi, James A. Katancik, Wendy A. Keitel, Scott T. Kelley, Cristyn Kells, Susan Kinder-Haake, Nicholas B. King, Rob Knight, Dan Knights, Heidi H. Kong, Omry Koren, Sergey Koren, Karthik C. Kota, Christie L. Kovar, Nikos C. Kyrpides, Patricio S. La Rosa, Sandra L. Lee, Katherine P. Lemon, Niall Lennon, Cecil M. Lewis, Lora Lewis, Ruth E. Ley, Kelvin Li, Konstantinos Liolios, Bo Liu, Yue Liu, Chien-Chi Lo, Catherine A. Lozupone, R. Dwayne Lunsford, Tessa Madden, Anup A. Mahurkar, Peter J. Mannon, Elaine R. Mardis, Victor M. Markowitz, Konstantinos Mavrommatis, Jamison M. McCorrison, Daniel McDonald, Jean McEwen, Amy L. McGuire, Pamela McInnes, Teena Mehta, Kathie A. Mihindukulasuriya, Jason R. Miller, Patrick J. Minx, Irene Newsham, Chad Nusbaum, Michelle O’Laughlin, Joshua Orvis, Ioanna Pagani, Krishna Palaniappan, Shital M. Patel, Matthew Pearson, Jane Peterson, Mircea Podar, Craig Pohl, Katherine S. Pollard, Margaret E. Priest, Lita M. Proctor, Xiang Qin, Jeroen Raes, Jacques Ravel, Jeffrey G. Reid, Mina Rho, Rosamond Rhodes, Kevin P. Riehle, Maria C. Rivera, Beltran Rodriguez-Mueller, Yu-Hui Rogers, Matthew C. Ross, Carsten Russ, Ravi K. Sanka, Pamela Sankar, J. Fah Sathirapongsasuti, Jeffery A. Schloss, Patrick D. Schloss, Thomas M. Schmidt, Matthew Scholz, Lynn Schriml, Alyxandria M. Schubert, Nicola Segata, Julia A. Segre, William D. Shannon, Richard R. Sharp, Thomas J. Sharpton, Narmada Shenoy, Nihar U. Sheth, Gina A. Simone, Indresh Singh, Chris S. Smillie, Jack D. Sobel, Daniel D. Sommer, Paul Spicer, Granger G. Sutton, Sean M. Sykes, Diana G. Tabbaa, Mathangi Thiagarajan, Chad M. Tomlinson, Manolito Torralba, Todd J. Treangen, Rebecca M. Truty, Tatiana A. Vishnivetskaya, Jason Walker, Lu Wang, Zhengyuan Wang, Doyle V. Ward, Wesley Warren, Mark A. Watson, Christopher Wellington, Kris A. Wetterstrand, James R. White, Katarzyna Wilczek-Boney, Yuan Qing Wu, Kristine M. Wylie, Todd Wylie, Chandri Yandava, Liang Ye, Yuzhen Ye, Shibu Yooseph, Bonnie P. Youmans, Lan Zhang, Yanjiao Zhou, Yiming Zhu, Laurie Zoloth, Jeremy D. Zucker, Bruce W. Birren, Richard A. Gibbs, Sarah K. Highlander, George M. Weinstock, Richard K. Wilson & Owen White

Contributions

Principal investigators: B.W.B., R.A.G., S.K.H., B.A.M., K.E.N., J.F.P., G.M.W., O.W., R.K.W. Manuscript preparation: B.A.M., K.E.N., M.P., H.H.C., M.G.G., D.G., C.H., J.F.P. Funding agency management: C.C.B., T.B., V.R.B., J.L.C., S.C., C.D., V.D.F., C.G., M.Y.G., R.D.L., J.M., P.M., J.P., L.M.P., J.A.S., L.W., C.W., K.A.W. Project leadership: S.A., J.H.B., B.W.B., A.T.C., H.H.C., A.M.E., M.G.F., R.S.F., D.G., M.G.G., K.H., S.K.H., C.H., E.A.L., R.M., V.M., J.C.M., B.A.M., M.M., D.M.M., K.E.N., J.F.P., E.J.S., J.V., G.M.W., O.W., A.M.W., K.C.W., J.R.W., S.K.Y., Q.Z. Analysis preparation for manuscript: M.B., B.L.C., D.G., M.G.G., M.E.H., C.H., K.L., B.A.M., X.Q., J.R.W., M.T. Data release: L.A., T.B., I.A.C., K.C., H.H.C., N.J.D., D.J.D., A.M.E., V.M.F., L.F., J.M.G., S.G., S.K.H., M.E.H., C.J., V.J., C.K., A.A.M., V.M.M., T.M., M.M., D.M.M., J.O., K.P., J.F.P., C.P., X.Q., R.K.S., N.S., I.S., E.J.S., D.V.W., O.W., K.W., K.C.W., C.Y., B.P.Y., Q.Z. Methods and research development: S.A., H.M.A., M.B., D.M.C., A.M.E., R.L.E., M.F., S.F., M.G.F., D.C.F., D.G., G.G., B.J.H., S.K.H., M.E.H., W.A.K., N.L., K.L., V.M., E.R.M., B.A.M., M.M., D.M.M., C.N., J.F.P., M.E.P., X.Q., M.C.R., C.R., E.J.S., S.M.S., D.G.T., D.V.W., G.M.W., Y.W., K.M.W., S.Y., B.P.Y., S.K.Y., Q.Z. DNA sequence production: S.A., E.A., T.A., T.B., C.J.B., D.A.B., K.D.D., S.P.D., A.M.E., R.L.E., C.N.F., S.F., C.C.F., L.L.F., R.S.F., B.H., S.K.H., M.E.H., V.J., C.L.K., S.L.L., N.L., L.L., D.M.M., I.N., C.N., M.O., J.F.P., X.Q., J.G.R., Y.R., M.C.R., D.V.W., Y.W., B.P.Y., Y.Z. Clinical sample collection: K.M.A., M.A.C., W.M.D., L.L.F., N.G., H.A.H., E.L.H., J.A.K., W.A.K., T.M., A.L.M., P.M., S.M.P., J.F.P., G.A.S., J.V., M.A.W., G.M.W. Body site experts: K.M.A., E.A.V., G.A., L.B., M.J.B., C.C.D., F.E.D., L.F., J.I., J.A.K., S.K.H., H.H.K., K.P.L., P.J.M., J. Ravel, T.M.S., J.A.S., J.D.S., J.V. Ethical, legal and social implications: R.M.F., D.E.H., W.A.K., N.B.K., C.M.L., A.L.M., R.R., P. Sankar, P. Spicer, R.R.S., L.Z. Strain management: E.A.V., J.H.B., I.A.C., K.C., S.W.C., H.H.C., T.Z.D., A.S.D., A.M.E., M.G.F., M.G.G., S.K.H., V.J., N.C.K., S.L.L., L.L., K.L., E.A.L., V.M.M., B.A.M., D.M.M., K.E.N., I.N., I.P., L.S., E.J.S., C.M.T., M.T., D.V.W., G.M.W., A.M.W., Y.W., K.M.W., B.P.Y., L.Z., Y.Z. 16S data analysis: K.M.A., E.J.A., G.L.A., C.A.A., M.B., B.W.B., J.P.B., G.A.B., S.R.C., S.C., J.C., T.Z.D., F.E.D., E.D., A.M.E., R.C.E., M.F., A.A.F., J.F., K.F., H.G., D.G., B.J.H., T.A.H., S.M.H., C.H., J.I., J.K.J., S.T.K., S.K.H., R.K., H.H.K., O.K., P.S.L., R.E.L., K.L., C.A.L., D.M., B.A.M., K.A.M., M.M., M.P., J.F.P., M.P., K.S.P., X.Q., J. Raes, K.P.R., M.C.R., B.R., J.F.S., P.D.S., T.M.S., N.S., J.A.S., W.D.S., T.J.S., C.S.S., E.J.S., R.M.T., J.V., T.A.V., Z.W., D.V.W., G.M.W., J.R.W., K.M.W., Y.Y., S.Y., Y.Z. Shotgun data processing and alignments: C.J.B., J.C.C., E.D., D.G., A.G., M.E.H., H.J., D.K., K.C.K., C.L.K., Y.L., J.C.M., B.A.M., M.M., D.M.M., J.O., J.F.P., X.Q., J.G.R., R.K.S., N.U.S., I.S., E.J.S., G.G.S., S.M.S., J.W., Z.W., G.M.W., O.W., K.C.W., T.W., S.K.Y., L.Z. Assembly: H.M.A., C.J.B., P.S.C., L.C., Y.D., S.P.D., M.G.F., M.E.H., H.J., S.K., B.L., Y.L., C.L., J.C.M., J.M.M., J.R.M., P.J.M., M.M., J.F.P., M.P., M.E.P., X.Q., M.R., R.K.S., M.S., D.D.S., G.G.S., S.M.S., C.M.T., T.J.T., W.W., G.M.W., K.C.W., L.Y., Y.Y., S.K.Y., L.Z. Annotation: O.O.A., V.B., C.J.B., I.A.C., A.T.C., K.C., H.H.C., A.S.D., M.G.G., J.M.G., J.G., A.G., S.G., B.J.H., K.H., S.K.H., C.H., H.J., N.C.K., R.M., V.M.M., K.M., T.M., M.M., J.O., K.P., M.P., X.Q., N.S., E.J.S., G.G.S., S.M.S., M.T., G.M.W., K.C.W., J.R.W., C.Y., S.K.Y., Q.Z., L.Z. WGS Metabolic Reconstruction: S.A., B.L.C., J.G., C.H., J.I., B.A.M., M.M., B.R., A.M.S., N.S., M.T., G.M.W., S.Y., Q.Z., J.D.Z.

Corresponding author

Correspondence to Barbara A. Methé.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

This file contains Supplementary Notes and Methods, Supplementary References, Supplementary Tables 1-8 and Supplementary Figures 1-11. (PDF 1285 kb)

Supplementary Data

This file contains the summary of HMP WGS reads from 754 samples that were mapped to 223 HMP Reference Genomes (as 25,758 contigs). The process used to obtain these data is described in the Supplementary Text under the heading "Comparison of HMP WGS reads to HMP reference genomes." (XLS 64 kb)

PowerPoint slides

PowerPoint slide for Fig. 1

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution-Non-Commercial-Share Alike licence (http://creativecommons.org/licenses/by-nc-sa/3.0/).

About this article

Cite this article

The Human Microbiome Project Consortium. A framework for human microbiome research. Nature 486, 215–221 (2012). https://doi.org/10.1038/nature11209

Received: 02 November 2011
Accepted: 10 May 2012
Published: 13 June 2012
Issue Date: 14 June 2012
DOI: https://doi.org/10.1038/nature11209
