Abstract
Due to technical limitations, most gut microbiome studies have focused on prokaryotes, overlooking viruses. Phanta, a virome-inclusive gut microbiome profiling tool, overcomes the limitations of assembly-based viral profiling methods by using customized k-mer-based classification tools and incorporating recently published catalogs of gut viral genomes. Phanta’s optimizations consider the small genome size of viruses, sequence homology with prokaryotes and interactions with other gut microbes. Extensive testing of Phanta on simulated data demonstrates that it quickly and accurately quantifies prokaryotes and viruses. When applied to 245 fecal metagenomes from healthy adults, Phanta identifies ~200 viral species per sample, ~5× more than standard assembly-based methods. We observe a ~2:1 ratio between DNA viruses and bacteria, with higher interindividual variability of the gut virome compared to the gut bacteriome. In another cohort, we observe that Phanta performs equally well on bulk versus virus-enriched metagenomes, making it possible to study prokaryotes and viruses in a single experiment, with a single analysis.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$209.00 per year
only $17.42 per issue
Rent or buy this article
Prices vary by article type
from$1.95
to$39.95
Prices may be subject to local taxes which are calculated during checkout





Data availability
Accession numbers of all publicly available metagenomes used for analysis are provided in Supplementary Table 11. Source data for individual figures are provided with this manuscript. Phanta’s databases are available from the links specified at https://github.com/bhattlab/phanta (ref. 45). There are no restrictions on data availability. Source data are provided with this paper.
Code availability
Workflows were used for the preprocessing and assembly steps described in Methods and the workflows are available at https://github.com/bhattlab/bhattlab_workflows. Phanta and its postprocessing scripts are publicly available at https://github.com/bhattlab/phanta (ref. 45) with a detailed tutorial describing installation and usage.
References
Shreiner, A. B., Kao, J. Y. & Young, V. B. The gut microbiome in health and in disease. Curr. Opin. Gastroenterol. 31, 69–75 (2015).
Pflughoeft, K. J. & Versalovic, J. Human microbiome in health and disease. Annu. Rev. Pathol. 7, 99–122 (2012).
Cryan, J. F. et al. The microbiota-gut-brain axis. Physiol. Rev. 99, 1877–2013 (2019).
Kau, A. L., Ahern, P. P., Griffin, N. W., Goodman, A. L. & Gordon, J. I. Human nutrition, the gut microbiome and the immune system. Nature 474, 327–336 (2011).
Tringe, S. G. & Hugenholtz, P. A renaissance for the pioneering 16S rRNA gene. Curr. Opin. Microbiol. 11, 442–446 (2008).
Drewes, J. L. et al. High-resolution bacterial 16S rRNA gene profile meta-analysis and biofilm status reveal common colorectal cancer consortia. NPJ Biofilms Microbiomes 3, 34 (2017).
Romano, S. et al. Meta-analysis of the Parkinson’s disease gut microbiome suggests alterations linked to intestinal inflammation. NPJ Parkinsons Dis. 7, 27 (2021).
Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222–227 (2012).
Davis-Richardson, A. G. et al. Bacteroides dorei dominates gut microbiome prior to autoimmunity in Finnish children at high risk for type 1 diabetes. Front. Microbiol. 5, 678 (2014).
Xu, W. et al. Characterization of shallow whole-metagenome shotgun sequencing as a high-accuracy and low-cost method by complicated mock microbiomes. Front. Microbiol. 12, 678319 (2021).
Sharpton, T. J. An introduction to the analysis of shotgun metagenomic data. Front. Plant Sci. 5, 209 (2014).
Quince, C., Walker, A. W., Simpson, J. T., Loman, N. J. & Segata, N. Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 35, 833–844 (2017).
Kunin, V., Copeland, A., Lapidus, A., Mavromatis, K. & Hugenholtz, P. A bioinformatician’s guide to metagenomics. Microbiol. Mol. Biol. Rev. 72, 557–578 (2008).
Gregory, A. C. et al. MetaPop: a pipeline for macro- and microdiversity analyses and visualization of microbial and viral metagenome-derived populations. Microbiome 10, 49 (2022).
Pandolfo, M., Telatin, A., Lazzari, G., Adriaenssens, E. M. & Vitulo, N. MetaPhage: an automated pipeline for analyzing, annotating, and classifying bacteriophages in metagenomics sequencing data. mSystems. 7, e0074122 (2022).
Shen, W. et al. KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping. Bioinformatics 39, btac845 (2023).
Lopera-Maya, E. A. et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. Nat. Genet. 54, 143–151 (2022).
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Hiseni, P., Rudi, K., Wilson, R. C., Hegge, F. T. & Snipen, L. HumGut: a comprehensive human gut prokaryotic genomes collection filtered by metagenome data. Microbiome 9, 165 (2021).
Zhernakova, A. et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352, 565–569 (2016).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9, 811–814 (2012).
Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
Kieft, K., Zhou, Z. & Anantharaman, K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 8, 90 (2020).
Roux, S., Enault, F., Hurwitz, B. L. & Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985 (2015).
Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 37 (2021).
Ren, J. et al. Identifying viruses from metagenomic data using deep learning. Quant. Biol. 8, 64–77 (2020).
Ren, J., Ahlgren, N. A., Lu, Y. Y., Fuhrman, J. A. & Sun, F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 5, 69 (2017).
Amgarten, D., Braga, L. P. P., da Silva, A. M. & Setubal, J. C. MARVEL, a tool for prediction of Bacteriophage sequences in metagenomic bins. Front. Genet. 9, 304 (2018).
Fang, Z. et al. PPR-Meta: a tool for identifying phages and plasmids from metagenomic fragments using deep learning. GigaScience 8, giz066 (2019).
Sutton, T. D. S., Clooney, A. G., Ryan, F. J., Ross, R. P. & Hill, C. Choice of assembly software has a critical impact on virome characterisation. Microbiome 7, 12 (2019).
Nayfach, S. et al. Metagenomic compendium of 189,680 DNA viruses from the human gut microbiome. Nat. Microbiol. 6, 960–970 (2021).
Gregory, A. C. et al. The gut virome database reveals age-dependent patterns of virome diversity in the human gut. Cell Host Microbe 28, 724–740 (2020).
Soto-Perez, P. et al. CRISPR-Cas system of a prevalent human gut bacterium reveals hyper-targeting against phages in a human virome catalog. Cell Host Microbe 26, 325–335 (2019).
Paez-Espino, D. et al. IMG/VR v.2.0: an integrated data management and analysis system for cultivated and environmental viral genomes. Nucleic Acids Res. 47, D678–D686 (2019).
Tisza, M. J. & Buck, C. B. A catalog of tens of thousands of viruses from human metagenomes reveals hidden associations with chronic diseases. Proc. Natl Acad. Sci. USA 118, e2023202118 (2021).
Camarillo-Guerrero, L. F., Almeida, A., Rangel-Pineros, G., Finn, R. D. & Lawley, T. D. Massive expansion of human gut bacteriophage diversity. Cell 184, 1098–1109 (2021).
Benler, S. et al. Thousands of previously unknown phages discovered in whole-community human gut metagenomes. Microbiome 9, 78 (2021).
Van Espen, L. et al. A previously undescribed highly prevalent Phage identified in a Danish enteric virome catalog. mSystems 6, e0038221 (2021).
Bharti, R. & Grimm, D. G. Current challenges and best-practice protocols for microbiome analysis. Brief. Bioinform. 22, 178–193 (2021).
Ghurye, J. S., Cepeda-Espinoza, V. & Pop, M. Metagenomic assembly: overview, challenges and applications. Yale J. Biol. Med. 89, 353–362 (2016).
Rose, R., Constantinides, B., Tapinos, A., Robertson, D. L. & Prosperi, M. Challenges in the analysis of viral metagenomes. Virus Evol. 2, vew022 (2016).
Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).
Pinto, Y., Chakraborty, M., Jain, N. & Bhatt, A. S. bhattlab/phanta. GitHub https://github.com/bhattlab/phanta (2023).
Wright, R. J., Comeau, A. M. & Langille, M. G. I. From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools. Microb. Genom. 9, mgen000949 (2023).
Breitwieser, F. P., Baker, D. N. & Salzberg, S. L. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 19, 198 (2018).
Sun, Z. et al. Challenges in benchmarking metagenomic profilers. Nat. Methods 18, 618–626 (2021).
Yachida, S. et al. Metagenomic and metabolomic analyses reveal distinct stage-specific phenotypes of the gut microbiota in colorectal cancer. Nat. Med. 25, 968–976 (2019).
BenLangmead. Index zone. GitHub https://benlangmead.github.io/aws-indexes/k2 (2023).
Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011).
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. 39, 578–585 (2021).
Danovaro, R. & Serresi, M. Viral density and virus-to-bacterium ratio in deep-sea sediments of the Eastern Mediterranean. Appl. Environ. Microbiol. 66, 1857–1861 (2000).
Weitz, J. S., Beckett, S. J., Brum, J. R., Cael, B. B. & Dushoff, J. Lysis, lysogeny and virus–microbe ratios. Nature 549, E1–E3 (2017).
Marbouty, M., Thierry, A., Millot, G. A. & Koszul, R. MetaHiC phage-bacteria infection network reveals active cycling phages of the healthy human gut. eLife 10, e60608 (2021).
Liang, G. et al. The stepwise assembly of the neonatal virome is modulated by breastfeeding. Nature 581, 470–474 (2020).
Shkoporov, A. N. & Hill, C. Bacteriophages of the human gut: the ‘known unknown’ of the microbiome. Cell Host Microbe 25, 195–209 (2019).
Moreno-Gallego, J. L. et al. Virome diversity correlates with intestinal microbiome diversity in adult monozygotic twins. Cell Host Microbe 25, 261–272 (2019).
Chen, W. et al. Vast human gut virus diversity uncovered by combined short- and long-read sequencing. Preprint at bioRxiv https://doi.org/10.1101/2022.07.03.498593 (2022).
Shkoporov, A. N. et al. The human gut virome is highly diverse, stable, and individual specific. Cell Host Microbe 26, 527–541 (2019).
Stachler, E. & Bibby, K. Metagenomic evaluation of the highly abundant human gut Bacteriophage CrAssphage for source tracking of human fecal pollution. Environ. Sci. Technol. Lett. 1, 405–409 (2014).
Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014).
Benler, S. et al. A diversity-generating retroelement encoded by a globally ubiquitous Bacteroides phage. Microbiome 6, 191 (2018).
Guerin, E. & Hill, C. Shining light on human gut Bacteriophages. Front. Cell. Infect. Microbiol. 10, 481 (2020).
Kleiner, M., Hooper, L. V. & Duerkop, B. A. Evaluation of methods to purify virus-like particles for metagenomic sequencing of intestinal viromes. BMC Genomics 16, 7 (2015).
Khan Mirzaei, M. et al. Challenges of studying the human virome - relevant emerging technologies. Trends Microbiol. 29, 171–181 (2021).
Krishnamurthy, S. R. & Wang, D. Origins and challenges of viral dark matter. Virus Res. 239, 136–142 (2017).
Roux, S., Hallam, S. J., Woyke, T. & Sullivan, M. B. Viral dark matter and virus–host interactions resolved from publicly available microbial genomes. eLife 4, e08490 (2015).
Roux, S. et al. iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria. PLoS Biol. 21, e3002083 (2023).
Jain, C., Rodriguez, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, A. High throughput ANI analysis of 90 K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 841–842 (2010).
Hockenberry, A. J. & Wilke, C. O. BACPHLIP: predicting bacteriophage lifestyle from conserved protein domains. PeerJ 9, e11396 (2021).
Watts, S. C., Ritchie, S. C., Inouye, M. & Holt, K. E. FastSpar: rapid and scalable correlation estimation for compositional data. Bioinformatics 35, 1064–1066 (2019).
Friedman, J. & Alm, E. J. Inferring correlation networks from genomic survey data. PLoS Comput. Biol. 8, e1002687 (2012).
Fritz, A. et al. CAMISIM: simulating metagenomes and microbial communities. Microbiome 7, 17 (2019).
Li, H. et al. lh3/seqtk. GitHub https://github.com/lh3/seqtk (2018).
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
Acknowledgements
We thank D. Maghini and B. Doyle for thoughtful comments on the manuscript; S. Nayfach, P. Hiseni, S. Salzberg and J. Lu for helpful conversations; B. Doyle and J. Wirbel for testing Phanta; and B. Siranosian, C. Nicolau, K. Bettinger, A. Behr and the Stanford Research Computing Center for computational support. Computing costs were supported, in part, by an NIH S10 Shared Instrumentation under grant 1S10OD02014101. Figure 1 was created using BioRender.com. This study was supported in part by NIH R01AI148623 and R01AI143757, a Stand Up 2 Cancer Grant, the Chan Zuckerberg Initiative, a Sloan Foundation Fellowship and the Allen Distinguished Investigator Award (to A.S.B.). Y.P. is supported by the School of Medicine Dean’s Postdoctoral Fellowship. M.C. was supported by an NIH-funded predoctoral fellowship (5T32HG000044-25) and is supported by the National Defense Science and Engineering Graduate Fellowship (starting September 2022).
Author information
Authors and Affiliations
Contributions
Y.P., M.C. and A.S.B. conceived the study and wrote the manuscript. Y.P. and M.C. developed Phanta. Y.P., M.C., N.J. and A.S.B. analyzed the data. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks Guanxiang Liang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–12.
Supplementary Tables
Supplementary Tables 1–11.
Source data
Source Data Fig. 2
Statistical source data.
Source Data Figs. 3 and 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pinto, Y., Chakraborty, M., Jain, N. et al. Phage-inclusive profiling of human gut microbiomes with Phanta. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01799-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41587-023-01799-4