Abstract

Many analyses of the human gut microbiome depend on a catalog of reference genes. Existing catalogs for the human gut microbiome are based on samples from single cohorts or on reference genomes or protein sequences, which limits coverage of global microbiome diversity. Here we combined 249 newly sequenced samples of the Metagenomics of the Human Intestinal Tract (MetaHIT) project with 1,018 previously sequenced samples to create a cohort from three continents that is at least threefold larger than cohorts used for previous gene catalogs. From this we established the integrated gene catalog (IGC) comprising 9,879,896 genes. The catalog includes close-to-complete sets of genes for most gut microbes, which are also of considerably higher quality than in previous catalogs. Analyses of a group of samples from Chinese and Danish individuals using the catalog revealed country-specific gut microbial signatures. This expanded catalog should facilitate quantitative characterization of metagenomic, metatranscriptomic and metaproteomic data from the gut microbiome to understand its variation across populations in human health and disease.

Accessions

Gene Expression Omnibus

Acknowledgements

This research was supported by the European Commission FP7 grants HEALTH-F4-2007-201052 and HEALTH-F4-2010-261376, Natural Science Foundation of China (30890032, 30725008, 30811130531 and 31161130357), the Shenzhen Municipal Government of China (ZYC200903240080A, BGI20100001, CXB201108250096A and CXB201108250098A), European Research Council CancerBiome grant (project reference 268985), METACARDIS project (FP7-HEALTH-2012-INNOVATION-I-305312), the Danish Strategic Research Council grant (2106-07-0021), the Ole Rømer grant from Danish Natural Science Research Council and the Solexa project (272-07-0196). Additional funding came from the Lundbeck Foundation Centre for Applied Medical Genomics in Personalized Disease Prediction, Prevention and Care (http://www.lucamp.org/), the Novo Nordisk Foundation Center for Basic Metabolic Research (an independent research center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation; http://www.metabol.ku.dk) and the Metagenopolis grant ANR-11-DPBS-0001. We are indebted to many additional faculty and staff of BGI-Shenzhen who contributed to this work.

Author information

Author notes

    Junhua Li, Huijue Jia, Xianghang Cai, Huanzi Zhong & Qiang Feng

    These authors contributed equally to this work.

Affiliations

  1. BGI-Shenzhen, Shenzhen, China.

    Junhua Li, Huijue Jia, Xianghang Cai, Huanzi Zhong, Qiang Feng, Manimozhiyan Arumugam, Bing Chen, Wenwei Zhang, Juan Wang, Xun Xu, Liang Xiao, Suisha Liang, Dongya Zhang, Zhaoxi Zhang, Weineng Chen, Hailong Zhao, Huanming Yang, Jian Wang & Jun Wang

  2. BGI Hong Kong Research Institute, Hong Kong, China.

    Junhua Li

  3. School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, China.

    Junhua Li

  4. Department of Biology, University of Copenhagen, Copenhagen, Denmark.

    Qiang Feng, Karsten Kristiansen & Jun Wang

  5. European Molecular Biology Laboratory, Heidelberg, Germany.

    Shinichi Sunagawa, Manimozhiyan Arumugam, Jens Roat Kultima, Takuji Yamada, Julien Tap, Daniel R Mende & Peer Bork

  6. The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

    Manimozhiyan Arumugam, Trine Nielsen, Torben Hansen, Oluf Pedersen & Jun Wang

  7. INRA, Institut National de la Recherche Agronomique, Metagenopolis, Jouy en Josas, France.

    Edi Prifti, Florence Levenez, Joel Doré, S Dusko Ehrlich, Nicolas Pons, Emmanuelle Le Chatelier, Jean-Michel Batto, Sean Kennedy, Florence Haimet, Yohanan Winogradski & Julien Tap

  8. Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark.

    Agnieszka Sierakowska Juncker, Henrik Bjørn Nielsen & Søren Brunak

  9. Digestive System Research Unit, University Hospital Vall d'Hebron, Ciberehd, Barcelona, Spain.

    Chaysavanh Manichanh, Francisco Guarner, Maria Antolin, Francesc Casellas, Natalia Borruel, Encarna Varela & Antonio Torrejon

  10. Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University (KAU), Jeddah, Saudi Arabia.

    Jumana Yousuf Al-Aama

  11. Princess Al-Jawhara AlBrahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD), Faculty of Medicine, KAU, Jeddah, Saudi Arabia.

    Jumana Yousuf Al-Aama, Sherif Edris, Huanming Yang & Jun Wang

  12. Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), Jeddah, Saudi Arabia.

    Sherif Edris

  13. James D. Watson Institute of Genome Science, Hangzhou, China.

    Huanming Yang & Jian Wang

  14. INRA, Institut National de la Recherche Agronomique, Unité mixte de Recherche 14121 Microbiologie de l'Alimentation au Service de la Santé, Jouy en Josas, France.

    Joel Doré, Antonella Cultrone, Marion Leclerc, Catherine Juste, Eric Guedon, Christine Delorme, Séverine Layec, Ghalia Khaci, Maarten van de Guchte, Gaetana Vandemeulebrouck, Alexandre Jamet, Rozenn Dervyn, Nicolas Sanchez, Hervé Blottière, Emmanuelle Maguin & Pierre Renault

  15. Centre for Host-Microbiome Interactions, Dental Institute Central Office, King's College London, Guy's Hospital, London Bridge, UK.

    S Dusko Ehrlich

  16. Max Delbrück Centre for Molecular Medicine, Berlin, Germany.

    Peer Bork

  17. Macau University of Science and Technology, Macau, China.

    Jun Wang

  18. Commissariat à l'Energie Atomique, Genoscope, France.

    Eric Pelletier, Denis LePaslier, François Artiguenave, Thomas Bruls & Jean Weissenbach

  19. Centre National de la Recherche Scientifique, UMR 8030, Evry, France.

    Eric Pelletier, Denis LePaslier, François Artiguenave, Thomas Bruls & Jean Weissenbach

  20. Université d'Evry Val d'Essonne, Evry, France.

    Eric Pelletier, Denis LePaslier, François Artiguenave, Thomas Bruls & Jean Weissenbach

  21. The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.

    Keith Turner & Julian Parkhill

  22. Danone Research, Palaiseau, France.

    Gérard Denariaz, Muriel Derrien, Johan E T van Hylckama Vlieg & Patrick Viega

  23. Gut Biology & Microbiology, Danone Research, Centre for specialized nutrition, Wageningen, the Netherlands.

    Raish Oozeer & Jan Knoll

  24. Istituto Europeo di Oncologia, Milan, Italy.

    Maria Rescigno

  25. Institut Mérieux, Lyon, France.

    Christian Brechot, Christine M'Rini & Alexandre Mérieux

  26. Laboratory of Microbiology, Wageningen University, Utrecht, the Netherlands.

    Sebastian Tims, Erwin G Zoetendal, Michiel Kleerebezem & Willem M de Vos

  27. Department of Bacteriology and Immunology, University of Helsinki, Helsinki, Finland.

    Willem M de Vos

Consortia

  1. MetaHIT Consortium

    A full list of additional members and affiliations appears at the end of the paper.

Authors

  1. Junhua Li
  2. Huijue Jia
  3. Xianghang Cai
  4. Huanzi Zhong
  5. Qiang Feng
  6. Shinichi Sunagawa
  7. Manimozhiyan Arumugam
  8. Jens Roat Kultima
  9. Edi Prifti
  10. Trine Nielsen
  11. Agnieszka Sierakowska Juncker
  12. Chaysavanh Manichanh
  13. Bing Chen
  14. Wenwei Zhang
  15. Florence Levenez
  16. Juan Wang
  17. Xun Xu
  18. Liang Xiao
  19. Suisha Liang
  20. Dongya Zhang
  21. Zhaoxi Zhang
  22. Weineng Chen
  23. Hailong Zhao
  24. Jumana Yousuf Al-Aama
  25. Sherif Edris
  26. Huanming Yang
  27. Jian Wang
  28. Torben Hansen
  29. Henrik Bjørn Nielsen
  30. Søren Brunak
  31. Karsten Kristiansen
  32. Francisco Guarner
  33. Oluf Pedersen
  34. Joel Doré
  35. S Dusko Ehrlich
  36. Peer Bork
  37. Jun Wang

Contributions

J.L., Q.F., S.D.E., P.B. and Jun W. managed the project. T.N., T.H., F.G. and O.P. performed clinical sampling. C.M., W.Z., F.L. and Jua.W. performed DNA extraction. J.L., M.A., K.K., P.B. and Jun W. designed the analyses. J.L., H.J., X.C., H. Zhong, Q.F., E.P., A.S.J., B.C., L.X., S.L., D.Z., Z.Z., W.C., H. Zhao, S.E. and H.B.N. performed the data analyses. J.L., X.C., S.S., J.R.K., Z.Z. and W.C. constructed the integrated gene catalog and performed the functional and taxonomic annotation analyses. J.L., X.C., H. Zhong, B.C. and S.L. performed the country-specific signature analyses. J.L., H.J. and H. Zhong wrote the paper. S.S., M.A., X.X., J.Y.A.-A., H.Y., Ji.W., S.B., K.K., O.P., J.D., S.D.E., P.B. and Jun W. revised the paper. The MetaHIT Consortium members contributed to design and execution of the study.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Peer Bork or Jun Wang.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Notes, Supplementary Figures 1–9 and Supplementary Tables 9, 10, 14, 16–20, 22, 24 and 26

Excel files

  1. Supplementary Table 1

    Statistics for sequencing data of the 1,267 samples.

  2. Supplementary Table 2

    Selection for 511 human gut-related sequenced prokaryotic genomes.

  3. Supplementary Table 3

    Detailed statistics for the 3,449 sequenced genomes used for taxonomic classification.

  4. Supplementary Table 4

    Improved genome coverage by IGC genes.

  5. Supplementary Table 5

    Breakdown of IGC genes by occurrence frequency and phylogenetic classification.

  6. Supplementary Table 6

    List of gut-related prokaryotic genera.

  7. Supplementary Table 7

    List of specific KOs in MetaHIT 2010 and IGC.

  8. Supplementary Table 8

    Final pool of healthy Chinese and Danish adults used for analysis.

  9. Supplementary Table 11

    Detailed information of population-associated genus markers.

  10. Supplementary Table 12

    Detailed information of population-associated KO markers.

  11. Supplementary Table 13

    Differential enrichment of enzymes in carbohydrate metabolism.

  12. Supplementary Table 15

    Sporulation- and germination-related KOs in the Danish gut microbiome.

  13. Supplementary Table 21

    Overrepresentation of multidrug- or penicillin-resistant proteins in Chinese and Danes.

  14. Supplementary Table 23

    Elevated metabolic potential for carcinogenic xenobiotics in Chinese adults.

  15. Supplementary Table 25

    Enrichment of nitrogen metabolism in the Chinese gut microbiota.

  16. Supplementary Table 27

    Distribution of functional categories for genes of different occurrence frequencies.

  17. Supplementary Table 28

    Functions overrepresented in individual-specific genes.

About this article

DOI

https://doi.org/10.1038/nbt.2942