Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut


Metagenomic sequence analysis is rapidly becoming the primary source of virus discovery [1–3]. A substantial majority of the currently available virus genomes come from metagenomics, and some of these represent extremely abundant viruses, even if they have never been grown in the laboratory. A particularly striking case of a virus discovered via metagenomics is crAssphage, which is by far the most abundant human-associated virus known, comprising up to 90% of sequences in the gut virome [4]. Over 80% of the predicted proteins encoded in the approximately 100-kilobase crAssphage genome showed no significant similarity to available protein sequences, precluding classification of this virus and hampering further study. Here we combine a comprehensive search of genomic and metagenomic databases with sensitive methods for protein sequence analysis to identify an expansive, diverse group of bacteriophages related to crAssphage and to predict the functions of the majority of phage proteins, in particular those that comprise the structural, replication and expression modules. Most, if not all, of the crAss-like phages appear to be associated with diverse bacteria from the phylum Bacteroidetes, which includes some of the most abundant bacteria in the human gut microbiome and is also common in various other habitats. These findings provide for experimental characterization of the most abundant but poorly understood members of the human-associated virome.


Fig. 1: Architecture and evolution of the capsid gene module of the crAss-like phage family.
Fig. 2: Whole-genome maps of crAssphage and IAS virus, the two members of the crAss-like family that are abundant in the human gut virome.
Fig. 3: Replicative gene module of the crAss-like phage family.
Fig. 4: Genome expression gene module of the crAss-like phage family.


  1. Rohwer, F. Global phage diversity. Cell 113, 141 (2003).
  2. Suttle, C. A. Marine viruses—major players in the global ecosystem. Nat. Rev. Microbiol. 5, 801–812 (2007).
  3. Simmonds, P. et al. Consensus statement: virus taxonomy in the age of metagenomics. Nat. Rev. Microbiol. 15, 161–168 (2017).
  4. Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014).
  5. Dutilh, B. E. Metagenomic ventures into outer sequence space. Bacteriophage 4, e979664 (2014).
  6. Ogilvie, L. A. & Jones, B. V. The human gut virome: a multifaceted majority. Front. Microbiol. 6, 918 (2015).
  7. Hurwitz, B. L., U’Ren, J. M. & Youens-Clark, K. Computational prospecting the great viral unknown. FEMS Microbiol. Lett. 363, fnw077 (2016).
  8. Yarygin, K. et al. Abundance profiling of specific gene groups using precomputed gut metagenomes yields novel biological hypotheses. PLoS ONE 12, e0176154 (2017).
  9. Manrique, P. et al. Healthy human gut phageome. Proc. Natl Acad. Sci. USA 113, 10400–10405 (2016).
  10. Ahlgren, N. A., Ren, J., Lu, Y. Y., Fuhrman, J. A. & Sun, F. Alignment-free d2* oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic Acids Res. 45, 39–53 (2017).
  11. Wexler, A. G. & Goodman, A. L. An insider’s perspective: Bacteroides as a window into the microbiome. Nat. Microbiol. 2, 17026 (2017).
  12. Whitaker, W. R., Shepherd, E. S. & Sonnenburg, J. L. Tunable expression tools enable single-cell strain distinction in the gut microbiome. Cell 169, 538–546 (2017).
  13. Pramono, A. K. et al. Discovery and complete genome sequence of a bacteriophage from an obligate intracellular symbiont of a cellulolytic protist in the termite gut. Microbes Environ. 32, 112–117 (2017).
  14. Holmfeldt, K. et al. Twelve previously unknown phage genera are ubiquitous in global oceans. Proc. Natl Acad. Sci. USA 110, 12798–12803 (2013).
  15. Oude Munnink, B. B. et al. Unexplained diarrhoea in HIV-1 infected individuals. BMC Infect. Dis. 14, 22 (2014).
  16. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).
  17. Burroughs, A. M., Kaur, G., Zhang, D. & Aravind, L. Novel clades of the HU/IHF superfamily point to unexpected roles in the eukaryotic centrosome, chromosome partitioning, and biologic conflicts. Cell Cycle 16, 1093–1103 (2017).
  18. Lander, G. C. et al. The P22 tail machine at subnanometer resolution reveals the architecture of an infection conduit. Structure 17, 789–799 (2009).
  19. Casjens, S. R. & Molineux, I. J. Short noncontractile tail machines: adsorption and DNA delivery by podoviruses. Adv. Exp. Med. Biol. 726, 143–179 (2012).
  20. Bhardwaj, A., Molineux, I. J., Casjens, S. R. & Cingolani, G. Atomic structure of bacteriophage Sf6 tail needle knob. J. Biol. Chem. 286, 30867–30877 (2011).
  21. Xiang, Y. et al. Crystal and cryoEM structural studies of a cell wall degrading enzyme in the bacteriophage phi29 tail. Proc. Natl Acad. Sci. USA 105, 9552–9557 (2008).
  22. Casjens, S. R. & Thuman-Commike, P. A. Evolution of mosaically related tailed bacteriophage genomes seen through the lens of phage P22 virion assembly. Virology 411, 393–415 (2011).
  23. Lane, W. J. & Darst, S. A. Molecular evolution of multisubunit RNA polymerases: sequence analysis. J. Mol. Biol. 395, 671–685 (2010).
  24. Iyer, L. M., Burroughs, A. M., Anand, S., de Souza, R. F. & Aravind, L. Polyvalent proteins, a pervasive theme in the intergenomic biological conflicts of bacteriophages and conjugative elements. J. Bacteriol. 199, e00245–17 (2017).
  25. Berdygulova, Z. et al. Temporal regulation of gene expression of the Thermus thermophilus bacteriophage P23-45. J. Mol. Biol. 405, 125–142 (2011).
  26. Iyer, L. M. & Aravind, L. Insights from the architecture of the bacterial transcription apparatus. J. Struct. Biol. 179, 299–319 (2012).
  27. Yakunina, M. et al. A non-canonical multisubunit RNA polymerase encoded by a giant bacteriophage. Nucleic Acids Res. 43, 10411–10420 (2015).
  28. Lavysh, D. et al. The genome of AR9, a giant transducing Bacillus phage encoding two multisubunit RNA polymerases. Virology 495, 185–196 (2016).
  29. Ruprich-Robert, G. & Thuriaux, P. Non-canonical DNA transcription enzymes and the conservation of two-barrel RNA polymerases. Nucleic Acids Res. 38, 4559–4569 (2010).
  30. Krupovic, M. & Koonin, E. V. Multiple origins of viral capsid proteins from cellular ancestors. Proc. Natl Acad. Sci. USA 114, E2401–E2410 (2017).
  31. Barr, J. J. et al. Subdiffusive motion of bacteriophage in mucosal surfaces increases the frequency of bacterial encounters. Proc. Natl Acad. Sci. USA 112, 13675–13680 (2015).
  32. Krupovic, M. et al. Taxonomy of prokaryotic viruses: update from the ICTV bacterial and archaeal viruses subcommittee. Arch. Virol. 161, 1095–1099 (2016).
  33. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
  34. Söding, J. Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951–960 (2005).
  35. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
  36. Besemer, J. & Borodovsky, M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res. 33, W451–454 (2005).
  37. Kelley, D. R., Liu, B., Delcher, A. L., Pop, M. & Salzberg, S. L. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res. 40, e9 (2012).
  38. Yutin, N., Makarova, K. S., Mekhedov, S. L., Wolf, Y. I. & Koonin, E. V. The deep archaeal roots of eukaryotes. Mol. Biol. Evol. 25, 1619–1630 (2008).
  39. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
  40. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
  41. Bailey, T. L., Williams, N., Misleh, C. & Li, W. W. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 34, W369–373 (2006).
  42. Thompson, W. A., Newberg, L. A., Conlan, S., McCue, L. A. & Lawrence, C. E. The Gibbs Centroid Sampler. Nucleic Acids Res. 35, W232–237 (2007).



The authors thank Y.I. Wolf and S. Shmakov for technical help and Koonin group members for discussion. N.Y., K.S.M., A.B.G. and E.V.K. are supported by intramural funds of the US Department of Health and Human Services (to the National Library of Medicine).

Author information




E.V.K. conceived of the study. N.Y., K.S.M. and M.K. performed research. N.Y., K.S.M., A.B.G., M.K., A.S., R.A.E. and E.V.K. analysed the data. E.V.K. wrote the manuscript, which was read, edited and approved by all authors.

Corresponding author

Correspondence to Eugene V. Koonin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Supplementary Figures 1–3 and Supplementary Notes 1 and 2.

Life Sciences Reporting Summary

Supplementary Dataset 1

The selected representative set of crAss-like family members; conserved genes in the crAss-like phage family (an extended version of Table 1); BLAST scores of conserved crAss-like family proteins.

Supplementary Dataset 2

Annotation of the crAssphage and IAS phage genes and comparison of the MetaGeneMark and the current RefSeq (Glimmer) crAssphage annotations.


Cite this article

Yutin, N., Makarova, K.S., Gussow, A.B. et al. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat Microbiol 3, 38–46 (2018).
