Abstract
Metagenomic sequence analysis is rapidly becoming the primary source of virus discovery (refs. 1–3). A substantial majority of the currently available virus genomes come from metagenomics, and some of these represent extremely abundant viruses, even though they have never been grown in the laboratory. A particularly striking case of a virus discovered via metagenomics is crAssphage, which is by far the most abundant human-associated virus known, comprising up to 90% of sequences in the gut virome (ref. 4). Over 80% of the predicted proteins encoded in the approximately 100 kilobase crAssphage genome showed no significant similarity to available protein sequences, precluding classification of this virus and hampering further study. Here we combine a comprehensive search of genomic and metagenomic databases with sensitive methods for protein sequence analysis to identify an expansive, diverse group of bacteriophages related to crAssphage and predict the functions of the majority of phage proteins, in particular those that comprise the structural, replication and expression modules. Most, if not all, of the crAss-like phages appear to be associated with diverse bacteria from the phylum Bacteroidetes, which includes some of the most abundant bacteria in the human gut microbiome and is also common in various other habitats. These findings provide a basis for experimental characterization of the most abundant but poorly understood members of the human-associated virome.
References
Rohwer, F. Global phage diversity. Cell 113, 141 (2003).
Suttle, C. A. Marine viruses—major players in the global ecosystem. Nat. Rev. Microbiol. 5, 801–812 (2007).
Simmonds, P. et al. Consensus statement: virus taxonomy in the age of metagenomics. Nat. Rev. Microbiol. 15, 161–168 (2017).
Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014).
Dutilh, B. E. Metagenomic ventures into outer sequence space. Bacteriophage 4, e979664 (2014).
Ogilvie, L. A. & Jones, B. V. The human gut virome: a multifaceted majority. Front. Microbiol. 6, 918 (2015).
Hurwitz, B. L., U’Ren, J. M. & Youens-Clark, K. Computational prospecting the great viral unknown. FEMS Microbiol. Lett. 363, fnw077 (2016).
Yarygin, K. et al. Abundance profiling of specific gene groups using precomputed gut metagenomes yields novel biological hypotheses. PLoS ONE 12, e0176154 (2017).
Manrique, P. et al. Healthy human gut phageome. Proc. Natl Acad. Sci. USA 113, 10400–10405 (2016).
Ahlgren, N. A., Ren, J., Lu, Y. Y., Fuhrman, J. A. & Sun, F. Alignment-free d2* oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic Acids Res. 45, 39–53 (2017).
Wexler, A. G. & Goodman, A. L. An insider’s perspective: Bacteroides as a window into the microbiome. Nat. Microbiol. 2, 17026 (2017).
Whitaker, W. R., Shepherd, E. S. & Sonnenburg, J. L. Tunable expression tools enable single-cell strain distinction in the gut microbiome. Cell 169, 538–546 (2017).
Pramono, A. K. et al. Discovery and complete genome sequence of a bacteriophage from an obligate intracellular symbiont of a cellulolytic protist in the termite gut. Microbes Environ. 32, 112–117 (2017).
Holmfeldt, K. et al. Twelve previously unknown phage genera are ubiquitous in global oceans. Proc. Natl Acad. Sci. USA 110, 12798–12803 (2013).
Oude Munnink, B. B. et al. Unexplained diarrhoea in HIV-1 infected individuals. BMC Infect. Dis. 14, 22 (2014).
Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).
Burroughs, A. M., Kaur, G., Zhang, D. & Aravind, L. Novel clades of the HU/IHF superfamily point to unexpected roles in the eukaryotic centrosome, chromosome partitioning, and biologic conflicts. Cell Cycle 16, 1093–1103 (2017).
Lander, G. C. et al. The P22 tail machine at subnanometer resolution reveals the architecture of an infection conduit. Structure 17, 789–799 (2009).
Casjens, S. R. & Molineux, I. J. Short noncontractile tail machines: adsorption and DNA delivery by podoviruses. Adv. Exp. Med. Biol. 726, 143–179 (2012).
Bhardwaj, A., Molineux, I. J., Casjens, S. R. & Cingolani, G. Atomic structure of bacteriophage Sf6 tail needle knob. J. Biol. Chem. 286, 30867–30877 (2011).
Xiang, Y. et al. Crystal and cryoEM structural studies of a cell wall degrading enzyme in the bacteriophage phi29 tail. Proc. Natl Acad. Sci. USA 105, 9552–9557 (2008).
Casjens, S. R. & Thuman-Commike, P. A. Evolution of mosaically related tailed bacteriophage genomes seen through the lens of phage P22 virion assembly. Virology 411, 393–415 (2011).
Lane, W. J. & Darst, S. A. Molecular evolution of multisubunit RNA polymerases: sequence analysis. J. Mol. Biol. 395, 671–685 (2010).
Iyer, L. M., Burroughs, A. M., Anand, S., de Souza, R. F. & Aravind, L. Polyvalent proteins, a pervasive theme in the intergenomic biological conflicts of bacteriophages and conjugative elements. J. Bacteriol. 199, e00245–17 (2017).
Berdygulova, Z. et al. Temporal regulation of gene expression of the Thermus thermophilus bacteriophage P23-45. J. Mol. Biol. 405, 125–142 (2011).
Iyer, L. M. & Aravind, L. Insights from the architecture of the bacterial transcription apparatus. J. Struct. Biol. 179, 299–319 (2012).
Yakunina, M. et al. A non-canonical multisubunit RNA polymerase encoded by a giant bacteriophage. Nucleic Acids Res. 43, 10411–10420 (2015).
Lavysh, D. et al. The genome of AR9, a giant transducing Bacillus phage encoding two multisubunit RNA polymerases. Virology 495, 185–196 (2016).
Ruprich-Robert, G. & Thuriaux, P. Non-canonical DNA transcription enzymes and the conservation of two-barrel RNA polymerases. Nucleic Acids Res. 38, 4559–4569 (2010).
Krupovic, M. & Koonin, E. V. Multiple origins of viral capsid proteins from cellular ancestors. Proc. Natl Acad. Sci. USA 114, E2401–E2410 (2017).
Barr, J. J. et al. Subdiffusive motion of bacteriophage in mucosal surfaces increases the frequency of bacterial encounters. Proc. Natl Acad. Sci. USA 112, 13675–13680 (2015).
Krupovic, M. et al. Taxonomy of prokaryotic viruses: update from the ICTV bacterial and archaeal viruses subcommittee. Arch. Virol. 161, 1095–1099 (2016).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Söding, J. Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951–960 (2005).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Besemer, J. & Borodovsky, M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res. 33, W451–454 (2005).
Kelley, D. R., Liu, B., Delcher, A. L., Pop, M. & Salzberg, S. L. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res. 40, e9 (2012).
Yutin, N., Makarova, K. S., Mekhedov, S. L., Wolf, Y. I. & Koonin, E. V. The deep archaeal roots of eukaryotes. Mol. Biol. Evol. 25, 1619–1630 (2008).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Bailey, T. L., Williams, N., Misleh, C. & Li, W. W. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 34, W369–373 (2006).
Thompson, W. A., Newberg, L. A., Conlan, S., McCue, L. A. & Lawrence, C. E. The Gibbs Centroid Sampler. Nucleic Acids Res. 35, W232–237 (2007).
Acknowledgements
The authors thank Y.I. Wolf and S. Shmakov for technical help and Koonin group members for discussion. N.Y., K.S.M., A.B.G. and E.V.K. are supported by intramural funds of the US Department of Health and Human Services (to the National Library of Medicine).
Author information
Contributions
E.V.K. conceived of the study. N.Y., K.S.M. and M.K. performed research. N.Y., K.S.M., A.B.G., M.K., A.S., R.A.E. and E.V.K. analysed the data. E.V.K. wrote the manuscript, which was read, edited and approved by all authors.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Figures 1–3 and Supplementary Notes 1 and 2.
Supplementary Dataset 1
The selected representative set of crAss-like family members; conserved genes in the crAss-like phage family (an extended version of Table 1); BLAST scores of conserved crAss-like family proteins.
Supplementary Dataset 2
Annotation of the crAssphage and IAS phage genes and comparison of the MetaGeneMark and the current RefSeq (Glimmer) crAssphage annotations.
About this article
Cite this article
Yutin, N., Makarova, K.S., Gussow, A.B. et al. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat Microbiol 3, 38–46 (2018). https://doi.org/10.1038/s41564-017-0053-y
DOI: https://doi.org/10.1038/s41564-017-0053-y