Many analyses of the human gut microbiome depend on a catalog of reference genes. Existing catalogs for the human gut microbiome are based on samples from single cohorts or on reference genomes or protein sequences, which limits coverage of global microbiome diversity. Here we combined 249 newly sequenced samples of the Metagenomics of the Human Intestinal Tract (MetaHIT) project with 1,018 previously sequenced samples to create a cohort from three continents that is at least threefold larger than cohorts used for previous gene catalogs. From this we established the integrated gene catalog (IGC) comprising 9,879,896 genes. The catalog includes close-to-complete sets of genes for most gut microbes, which are also of considerably higher quality than in previous catalogs. Analyses of a group of samples from Chinese and Danish individuals using the catalog revealed country-specific gut microbial signatures. This expanded catalog should facilitate quantitative characterization of metagenomic, metatranscriptomic and metaproteomic data from the gut microbiome to understand its variation across populations in human health and disease.
European Nucleotide Archive
Gene Expression Omnibus
Sequence Read Archive
This research was supported by the European Commission FP7 grants HEALTH-F4-2007-201052 and HEALTH-F4-2010-261376, the Natural Science Foundation of China (30890032, 30725008, 30811130531 and 31161130357), the Shenzhen Municipal Government of China (ZYC200903240080A, BGI20100001, CXB201108250096A and CXB201108250098A), the European Research Council CancerBiome grant (project reference 268985), the METACARDIS project (FP7-HEALTH-2012-INNOVATION-I-305312), the Danish Strategic Research Council grant (2106-07-0021), the Ole Rømer grant from the Danish Natural Science Research Council and the Solexa project (272-07-0196). Additional funding came from the Lundbeck Foundation Centre for Applied Medical Genomics in Personalized Disease Prediction, Prevention and Care (http://www.lucamp.org/), the Novo Nordisk Foundation Center for Basic Metabolic Research (an independent research center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation; http://www.metabol.ku.dk) and the Metagenopolis grant ANR-11-DPBS-0001. We are indebted to many additional faculty and staff of BGI-Shenzhen who contributed to this work.
Statistics for sequencing data of the 1,267 samples.
Selection for 511 human gut-related sequenced prokaryotic genomes.
Detailed statistics for the 3,449 sequenced genomes used for taxonomic classification.
Improved genome coverage by IGC genes.
Breakdown of IGC genes by occurrence frequency and phylogenetic classification.
List of gut-related prokaryotic genera.
List of specific KOs in MetaHIT 2010 and IGC.
Final pool of healthy Chinese and Danish adults used for analysis.
Detailed information of population-associated genus markers.
Detailed information of population-associated KO markers.
Differential enrichment of enzymes in carbohydrate metabolism.
Sporulation- and germination-related KOs in the Danish gut microbiome.
Overrepresentation of multidrug- or penicillin-resistant proteins in Chinese and Danes.
Elevated metabolic potential for carcinogenic xenobiotics in Chinese adults.
Enrichment of nitrogen metabolism in the Chinese gut microbiota.
Distribution of functional categories for genes of different occurrence frequencies.
Functions overrepresented in individual-specific genes.