Fast and accurate metagenotyping of the human gut microbiome with GT-Pro

Abstract

Single nucleotide polymorphisms (SNPs) in metagenomics are used to quantify population structure, track strains and identify genetic determinants of microbial phenotypes. However, existing alignment-based approaches for metagenomic SNP detection require high-performance computing and enough read coverage to distinguish SNPs from sequencing errors. To address these issues, we developed the GenoTyper for Prokaryotes (GT-Pro), a suite of methods to catalog SNPs from genomes and use unique k-mers to rapidly genotype these SNPs from metagenomes. Compared to methods that use read alignment, GT-Pro is more accurate and two orders of magnitude faster. Using high-quality genomes, we constructed a catalog of 104 million SNPs in 909 human gut species and used unique k-mers targeting this catalog to characterize the global population structure of gut microbes from 7,459 samples. GT-Pro enables fast and memory-efficient metagenotyping of millions of SNPs on a personal computer.
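The idea of genotyping known SNPs with unique k-mers can be illustrated with a minimal sketch. This is not GT-Pro's implementation. It is a toy version that assumes a single reference sequence, biallelic SNPs, forward-strand reads only, and exact k-mer matching. The function names (`kmers`, `build_index`, `genotype`) and the example sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return every k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(ref, snps, k):
    """Map unique SNP-covering k-mers to (snp_id, allele).

    snps is a hypothetical dict: snp_id -> (0-based position, alternate base).
    A reference-allele k-mer is kept only if it occurs exactly once in ref;
    an alternate-allele k-mer is kept only if it never occurs in ref.
    k-mers claimed by more than one SNP or allele are dropped as ambiguous.
    """
    background = Counter(kmers(ref, k))
    claims = Counter()
    owner = {}
    for snp_id, (pos, alt) in snps.items():
        alt_seq = ref[:pos] + alt + ref[pos + 1:]
        # All window starts whose k-mer overlaps the SNP position.
        starts = range(max(0, pos - k + 1), min(pos, len(ref) - k) + 1)
        for s in starts:
            for allele, seq, allowed in (("ref", ref, 1), ("alt", alt_seq, 0)):
                kmer = seq[s:s + k]
                if background[kmer] == allowed:
                    claims[kmer] += 1
                    owner[kmer] = (snp_id, allele)
    return {kmer: owner[kmer] for kmer, n in claims.items() if n == 1}


def genotype(reads, index, k):
    """Count exact k-mer hits per (snp_id, allele) across all reads."""
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            hit = index.get(kmer)
            if hit is not None:
                counts[hit] += 1
    return counts


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTACCGATGA"      # toy reference, not real data
    snps = {"snp1": (10, "T")}          # G -> T at 0-based position 10
    index = build_index(ref, snps, k=5)
    reads = ["TGCAAGGCTT", "CAAGTCTTAC", "ACCGATGA"]
    for (snp_id, allele), n in sorted(genotype(reads, index, k=5).items()):
        print(snp_id, allele, n)
```

Because lookups are exact dictionary hits rather than alignments, the per-read cost in this sketch is one hash lookup per k-mer. A realistic tool would also need to handle reverse complements, sequencing errors and k-mers shared across many genomes and species, all of which this example ignores.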

Fig. 1: In silico metagenotyping framework.
Fig. 2: Genetic landscape of 909 human gut species.
Fig. 3: Computational performance evaluation of GT-Pro.
Fig. 4: Metagenotyping accuracy evaluation of GT-Pro using simulations.
Fig. 5: Metagenotyping and gene imputation from gut metagenomes.
Fig. 6: Global genetic structure in 7,459 human gut metagenomes.

Data availability

All described datasets are publicly available through the corresponding repositories. The genome assemblies used to build GT-Pro in this study were downloaded from the UHGG database and are available at MGnify (http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes). The 1,171 C. difficile genomes are available at NCBI RefSeq (https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Clostridioides_difficile/), and the accession numbers of the 114 high-quality nonredundant C. difficile genomes are listed in Supplementary Table 15. All metagenomic samples are available at NCBI SRA (https://www.ncbi.nlm.nih.gov/sra), with accession numbers in the supplementary tables: 25,133 human microbiome samples (Supplementary Table 8), Tanzania (Supplementary Table 9), North America (Supplementary Table 10), Madagascar (Supplementary Table 11), the North American IBD cohort (Supplementary Table 12) and global biogeography samples (Supplementary Table 13). The GT-Pro SNP databases and the genotype profiles of the 25,133 human microbiome samples generated in this study are available on a publicly accessible cloud server (https://fileshare.czbiohub.org/s/waXQzQ9PRZPwTdk) and can also be accessed through GitHub (https://github.com/zjshi/gt-pro).

Code availability

The implementation and documentation of GT-Pro are available on GitHub (https://github.com/zjshi/gt-pro). GT-Pro is written in C++ with Python scripts and is released as open-source software under the MIT license.

Acknowledgements

This study was funded by NSF (grant #1563159), Chan Zuckerberg Biohub, Chan Zuckerberg Initiative, and Gladstone Institutes.

Author information

Contributions

K.S.P. and S.N. conceived the project. K.S.P., S.N. and Z.J.S. designed experiments and drafted the manuscript. Z.J.S. conducted experiments, analyzed data, made figures and wrote software. B.D. wrote software and contributed to analysis of software performance. C.Z. contributed to analysis of structural variation imputation and tested software. K.S.P. supervised the project, provided computational resources and funding. K.S.P. and S.N. provided feedback. All authors read, edited and reviewed the paper.

Corresponding authors

Correspondence to Stephen Nayfach or Katherine S. Pollard.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks Yun William Yu, Falk Hildebrand and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–42.

Reporting Summary.

Supplementary Tables

Supplementary Tables 1–15.

About this article

Cite this article

Shi, Z.J., Dimitrov, B., Zhao, C. et al. Fast and accurate metagenotyping of the human gut microbiome with GT-Pro. Nat Biotechnol 40, 507–516 (2022). https://doi.org/10.1038/s41587-021-01102-3

  • DOI: https://doi.org/10.1038/s41587-021-01102-3
