Article

Global phylogeography and ancient evolution of the widespread human gut virus crAssphage


Microbiomes are vast communities of microorganisms and viruses that populate all natural ecosystems. Viruses have been considered to be the most variable component of microbiomes, as supported by virome surveys and examples of high genomic mosaicism. However, recent evidence suggests that the human gut virome is remarkably stable compared with that of other environments. Here, we investigate the origin, evolution and epidemiology of crAssphage, a widespread human gut virus. Through a global collaboration, we obtained DNA sequences of crAssphage from more than one-third of the world’s countries and showed that the phylogeography of crAssphage is locally clustered within countries, cities and individuals. We also found fully colinear crAssphage-like genomes in both Old-World and New-World primates, suggesting that the association of crAssphage with primates may be millions of years old. Finally, by exploiting a large cohort of more than 1,000 individuals, we tested whether crAssphage is associated with bacterial taxonomic groups of the gut microbiome, diverse human health parameters and a wide range of dietary factors. We identified strong correlations with different clades of bacteria that are related to Bacteroidetes and weak associations with several diet categories, but no significant association with health or disease. We conclude that crAssphage is a benign cosmopolitan virus that may have coevolved with the human lineage and is an integral part of the normal human gut virome.


Data availability

Sequence data that support the findings of this study have been deposited in GenBank under BioProject accession PRJNA510571 and at […]. Each of the samples has a unique BioSample accession number (SAMN10656826–SAMN10658627, SAMN10658653 and SAMN10659294). The SRA runs used in this analysis are listed in Supplementary File 5. The data that support the findings of this study are also available from the corresponding authors on reasonable request.

Code availability

The code used to generate the data can be accessed at […]. The current release (ref. 81) is v.2.0.


References

1. Sender, R., Fuchs, S. & Milo, R. Are we really vastly outnumbered? Revisiting the ratio of bacterial to host cells in humans. Cell 164, 337–340 (2016).
2. Nguyen, S. et al. Bacteriophage transcytosis provides a mechanism to cross epithelial cell layers. mBio 8, e01874–17 (2017).
3. Reyes, A. et al. Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466, 334–338 (2010).
4. Minot, S. et al. The human gut virome: inter-individual variation and dynamic response to diet. Genome Res. 21, 1616–1625 (2011).
5. Reyes, A., Semenkovich, N. P., Whiteson, K., Rohwer, F. & Gordon, J. I. Going viral: next-generation sequencing applied to phage populations in the human gut. Nat. Rev. Microbiol. 10, 607–617 (2012).
6. Paterson, S. et al. Antagonistic coevolution accelerates molecular evolution. Nature 464, 275–278 (2010).
7. Pedulla, M. L. et al. Origins of highly mosaic mycobacteriophage genomes. Cell 113, 171–182 (2003).
8. Heldal, M. & Bratbak, G. Production and decay of viruses in aquatic environments. Mar. Ecol. Prog. Ser. 72, 205–212 (1991).
9. Breitbart, M., Wegley, L., Leeds, S., Schoenfeld, T. & Rohwer, F. Phage community dynamics in hot springs. Appl. Environ. Microbiol. 70, 1633–1640 (2004).
10. Steward, G. F., Smith, D. C. & Azam, F. Abundance and production of bacteria and viruses in the Bering and Chukchi Seas. Mar. Ecol. Prog. Ser. 131, 287–300 (1996).
11. Minot, S. et al. Rapid evolution of the human gut virome. Proc. Natl Acad. Sci. USA 110, 12450–12455 (2013).
12. Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014).
13. Yutin, N. et al. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat. Microbiol. 3, 38–46 (2018).
14. Shkoporov, A. et al. ΦCrAss001, a member of the most abundant bacteriophage family in the human gut, infects Bacteroides. Preprint at (2018).
15. Barylski, J. et al. Analysis of spounaviruses as a case study for the overdue reclassification of tailed bacteriophages. Preprint at (2018).
16. Adriaenssens, E. & Brister, J. R. How to name and classify your phage: an informal guide. Viruses 9, 70 (2017).
17. Callahan, B. J., McMurdie, P. J. & Holmes, S. P. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 11, 2639–2643 (2017).
18. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 44, D7–D19 (2016).
19. Nicholls, S. M. et al. Probabilistic recovery of cryptic haplotypes from metagenomic data. Preprint at (2017).
20. Lim, E. S. et al. Early life dynamics of the human gut virome and bacterial microbiome in infants. Nat. Med. 21, 1228–1234 (2015).
21. Liang, Y. Y., Zhang, W., Tong, Y. G. & Chen, S. P. crAssphage is not associated with diarrhoea and has high genetic diversity. Epidemiol. Infect. 144, 3549–3553 (2016).
22. Piper, H. G. et al. Severe gut microbiota dysbiosis is associated with poor growth in patients with short bowel syndrome. JPEN J. Parenter. Enter. Nutr. 41, 1202–1212 (2017).
23. Vatanen, T. et al. Variation in microbiome LPS immunogenicity contributes to autoimmunity in humans. Cell 165, 842–853 (2016).
24. Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).
25. Stachler, E. et al. Quantitative crAssphage PCR assays for human fecal pollution measurement. Environ. Sci. Technol. 51, 9146–9154 (2017).
26. Stachler, E. & Bibby, K. Metagenomic evaluation of the highly abundant human gut bacteriophage crAssphage for source tracking of human fecal pollution. Environ. Sci. Technol. Lett. 1, 405–409 (2014).
27. García-Aljaro, C., Ballesté, E., Muniesa, M. & Jofre, J. Determination of crAssphage in water samples and applicability for tracking human faecal pollution. Microb. Biotechnol. 10, 1775–1780 (2017).
28. Ahmed, W. et al. Evaluation of the novel crAssphage marker for sewage pollution tracking in storm drain outfalls in Tampa, Florida. Water Res. 131, 142–150 (2017).
29. Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222–227 (2012).
30. Santiago-Rodriguez, T. M. et al. Natural mummification of the human gut preserves bacteriophage DNA. FEMS Microbiol. Lett. 363, fnv219 (2016).
31. Maixner, F. et al. The 5300-year-old Helicobacter pylori genome of the Iceman. Science 351, 162–165 (2016).
32. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
33. Guerin, E. et al. Biology and taxonomy of crAss-like bacteriophages, the most abundant virus in the human gut. Cell Host Microbe 24, 653–664 (2018).
34. Raymond, F. et al. The initial state of the human gut microbiome determines its reshaping by antibiotics. ISME J. 10, 707–720 (2016).
35. Moeller, A. H. et al. Rapid changes in the gut microbiome during human evolution. Proc. Natl Acad. Sci. USA 111, 16431–16435 (2014).
36. Moeller, A. H. et al. Cospeciation of gut microbiota with hominids. Science 353, 380–382 (2016).
37. Zhernakova, A. et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352, 565–569 (2016).
38. Tigchelaar, E. F. et al. Cohort profile: LifeLines DEEP, a prospective, general population cohort study in the northern Netherlands: study design and baseline characteristics. BMJ Open 5, e006772 (2015).
39. David, L. A. et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature 505, 559–563 (2014).
40. Turnbaugh, P. J. et al. The effect of diet on the human gut microbiome: a metagenomic analysis in humanized gnotobiotic mice. Sci. Transl. Med. 1, 6ra14 (2009).
41. Singh, R. K. et al. Influence of diet on the gut microbiome and implications for human health. J. Transl. Med. 15, 73 (2017).
42. De Filippo, C. et al. Impact of diet in shaping gut microbiota revealed by a comparative study in children from Europe and rural Africa. Proc. Natl Acad. Sci. USA 107, 14691–14696 (2010).
43. Kovatcheva-Datchary, P. et al. Dietary fiber-induced improvement in glucose metabolism is associated with increased abundance of Prevotella. Cell Metab. 22, 971–982 (2015).
44. Edwards, R. A., McNair, K., Faust, K., Raes, J. & Dutilh, B. E. Computational approaches to predict bacteriophage-host relationships. FEMS Microbiol. Rev. 40, 258–272 (2016).
45. Manrique, P. et al. Healthy human gut phageome. Proc. Natl Acad. Sci. USA 113, 10400–10405 (2016).
46. Kupczok, A. et al. Rates of mutation and recombination in Siphoviridae phage genome evolution over three decades. Mol. Biol. Evol. 35, 1147–1159 (2018).
47. Schrago, C. G. & Russo, C. A. M. Timing the origin of New World monkeys. Mol. Biol. Evol. 20, 1620–1625 (2003).
48. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
49. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010).
50. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
51. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
52. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
53. Zhou, X., Shen, X., Hittinger, C. T. & Rokas, A. Evaluating fast maximum likelihood-based phylogenetic programs using empirical phylogenomic data sets. Mol. Biol. Evol. 35, 486–503 (2017).
54. Dutilh, B. E. et al. Assessment of phylogenomic and orthology approaches for phylogenetic inference. Bioinformatics 23, 815–824 (2007).
55. Cinek, O. et al. Quantitative crAssphage real-time PCR assay derived from data of multiple geographically distant populations. J. Med. Virol. 90, 767–771 (2018).
56. Liang, Y., Jin, X., Huang, Y. & Chen, S. Development and application of a real-time polymerase chain reaction assay for detection of a novel gut bacteriophage (crAssphage). J. Med. Virol. 90, 464–468 (2018).
57. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
58. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
59. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
60. Ewing, B., Hillier, L., Wendl, M. C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
61. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
62. Knudsen, B. E., Bergmark, L. & Pamp, S. J. SOP—DNA isolation QIAamp Fast DNA Stool modified. Figshare (2016).
63. National Center for Biotechnology Information SRA Handbook (National Center for Biotechnology Information, 2009).
64. Stewart, C. A. et al. Jetstream: a self-provisioned, scalable science and engineering cloud environment. In Proc. 2015 XSEDE Conference Scientific Advancements Enabled by Enhanced Cyberinfrastructure 29 (ACM, 2015).
65. Towns, J. et al. XSEDE: accelerating scientific discovery. Comput. Sci. Eng. 16, 62–74 (2014).
66. Edwards, R. SearchSRA (2017);
67. Torres, P. J., Edwards, R. A. & McNair, K. PARTIE: a partition engine to separate metagenomics and amplicon projects in the Sequence Read Archive. Bioinformatics 33, 2389–2391 (2017).
68. Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864 (2011).
69. Cantu, V. A., Sadural, J. & Edwards, R. PRINSEQ++, a multi-threaded tool for fast and efficient quality control and preprocessing of sequencing datasets. Preprint at (2019).
70. Levi, K., Rynge, M., Eroma, A. & Edwards, R. A. Searching the Sequence Read Archive using Jetstream and Wrangler. In Proc. Practice and Experience on Advanced Research Computing (2018).
71. Stallman, R. M., McGrath, R. & Smith, P. D. GNU Make: A Program for Directing Recompilation, for version 3.81 (Free Software Foundation, 2004).
72. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
73. Letunic, I. & Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44, W242–W245 (2016).
74. Berke, L. & Snel, B. The histone modification H3K27me3 is retained after gene duplication and correlates with conserved noncoding sequences in Arabidopsis. Genome Biol. Evol. 6, 572–579 (2014).
75. Meyer, F. et al. The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinform. 9, 386 (2008).
76. Zhao, Y., Tang, H. & Ye, Y. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics 28, 125–126 (2012).
77. Vlčková, K. et al. Impact of stress on the gut microbiome of free-ranging western lowland gorillas. Microbiology 164, 40–44 (2018).
78. McNair, K., Zhou, C., Dinsdale, E. A., Souza, B. & Edwards, R. A. PHANOTATE: a novel approach to gene identification in phage genomes. Bioinformatics (2019).
79. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
80. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).
81. Dutilh, B. E. & Edwards, R. A. crAssphage Data Repository on GitHub (GitHub, 2018);

Acknowledgements

We thank R. Matthews, M. Wright, J. Alexander, S. Arredondo, N. Branch, D. Campbell, R. Chea, D. McDougle, J. Parks and V. Vipatapat for providing access to wastewater treatment samples; the members of the Mountain Gorilla Veterinary Project and the staff of Maryland Zoo for collecting the gorilla faecal samples in Rwanda; G. Britton for collecting the baboon faecal samples in Ethiopia; staff of the CSWCT, the UWA and the UNCST for collecting the chimpanzee faecal samples in Uganda; J. Manor at Central Virology Laboratory, Chaim Sheba Medical Center, Tel-Hashomer Hospital and G. Steward, Department of Oceanography, University of Hawai’i at Manoa for help with sample collection; and the COMPARE and LifeLines-DEEP projects for sharing data. O.D.N. thanks G. Steward, University of Hawai’i at Manoa, for support. P.C.F. thanks C. Taylor for support with the PCR. Primate samples were provided by the PMC at the University of Illinois Urbana-Champaign. D.T.M. thanks the Australian Research Council’s Linkage Project LP160100408, Melbourne Water and EPA Victoria for funding the collection of samples in Melbourne. Gorilla samples were originally obtained by M.K. and the Mountain Gorilla Veterinary Project in Rwanda. G.R. and N.J.D. provided the wild baboon samples from Ethiopia. Howler samples were provided by M.K. and lemur samples by R.E.J. and M.T.I.; R.M.S. and L.M. provided the chimpanzee samples with permission from the CSWCT, the UWA and the UNCST. The primate microbiome project was supported by NSF BCS 0935347 to S.L., R. Stumpf, B.W. and K. Nelson. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. This work used the XSEDE Jetstream resources at Indiana University and Texas Advanced Computing Center through allocation MCB170036 to R.A.E., which is supported by National Science Foundation grant number ACI-1548562.
Some of this work was supported by San Diego State University Grants Programs to R.A.E., including the Summer Undergraduate Research Program. This work was supported by National Science Foundation grant numbers MCB-1441985 to R.A.E. and DUE-1323809 to E.A.D., and by the Department of Energy Lawrence Livermore National Laboratory grant B618146 to R.A.E. P.A.d.J. and B.E.D. were supported by the NWO Vidi grant 864.14.004; F.L.N. by the NWO Veni grant 016.Veni.181.092; S.J.J.B. by the European Research Council Starting Grant (638707) and the Vidi grant 864.11.005; O.C. and K. Mazankova by the Ministry of Health of the Czech Republic grant numbers 15-31426A and 15-29078A; P.C.F. by a Rutherford Discovery Fellowship from the Royal Society of New Zealand; J.J.B. by the ARC Discovery Early Career Researcher Award (DE170100525); S.L.D.M. by an NIH Pathway to Independence Fellowship (1K99AI119401-01A1); K.B. by award number 1510925 from the United States National Science Foundation; M.T.I. by the National Geographic Society (CRE) and NSERC; and C.D. by the Agence Nationale de la Recherche JCJC grant ANR-13-JSV6-0004 and Investissements d’Avenir Méditerranée Infection 10-IAHU-03. The LifeLines-DEEP sample collection and analysis were funded by the Netherlands Heart Foundation (IN-CONTROL CVON grant 2012-03) to A.Z. and J.F.; by the Top Institute Food and Nutrition, Wageningen, the Netherlands (TiFN GH001) to C.W.; by NWO Vidi grants 864.13.013 to J.F. and 016.178.056 to A.Z.; by NWO Spinoza Prize SPI 92-266 to C.W.; and by the ERC FP7/2007-2013/ERC Advanced Grant agreement 2012-322698 to C.W. and ERC Starting Grant 715772 to A.Z. A.Z. also holds a Rosalind Franklin Fellowship from the University of Groningen. The COMPARE data collection was funded by the Novo Nordisk Foundation (NNF16OC0021856).

Author information

B.E.D. and R.A.E. conceived the study, performed the experiments and bioinformatics, and wrote the paper with input from all authors. A.A.V. performed the volunteer experiments and sampled San Diego wastewater treatment plants. F.L.N., H.M.N., M.O. and P.A.d.J. performed human volunteer experiments. A.M.E., A.R., A.T., D.A.C., J.M.H., K.L., K.McNair, T.C. and V.A.C. performed bioinformatic analyses. A.A.R.R., A.Alassaf, A.C., A.M., A.O., A.R.M., A.S.N., A.W., B.M.-G., B.M.E., C.D., C.F., C.H., D.C., D.K., D.T.M., E.A.D., E.B., E.N.I., E.N.S., E.S.L., G.A., G.C.-A., G.-S.C., G.T., H.H., H.N., J.A., J.J.B., J.J.T., J.M.C., J.M.M., J.W., K.B., K.L.W., K.Mazankova, L.C.S., L.D., M.A.U.I., M.K.M., M.L., M.M.Z., M.Morris, M.Muniesa, M.P., M.P.D., N.T., N.V., O.C., O.D.N., P.C., P.C.F., P.D., P.R., P.V., R.d.l.I., R.K.A., R.L., R.O., R.R., R.Santos, R.Strain, S.J.J.B., S.L.D.M., S.M., S.M.-M., S.W., T.C., T.J., U.Q. and Z.-X.Q. performed sampling, PCR and sequencing. A.K., A.Z., C.W. and J.F. performed the LifeLines-DEEP analysis. F.M.A., H.Z. and R.S.H. provided and analysed COMPARE project data. A.Asangba, B.W., G.A.O.R., N.J.D., N.-p.N., R.Stumpf and S.L. analysed and provided the non-human primate sequences. M.C. collected gorilla samples. A.T., E.G. and K.M.G. performed the NYC sewage sampling and data analysis. A.J.P., J.S., L.C.M., P.J.T., S.R.H. and S.T.K. examined crAssphage transfer among infants. M.T.I. and R.E.J. collected lemur samples. M.K. collected howler monkey samples. D.L. and K.R. created the world map figure. L.M. collected chimpanzee samples.

Correspondence to Robert A. Edwards or Bas E. Dutilh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–9, Supplementary Tables 1–6 and Supplementary References.

Reporting Summary

Supplementary File 1

Global sampling of crAssphage: the metadata and sequence data for each of the amplicon regions.

Supplementary File 2

Gretel strains: number of strains identified from all of the different samples in the SRA.

Supplementary File 3

Lifelines phenotype correlations: correlation, P value and adjusted P value between each of 207 exogenous and intrinsic human variables and the presence of crAssphage in stool.

Supplementary File 4

Lifelines microbial correlations: correlation, P value and adjusted P value between the presence of each of 491 bacterial isolates and the presence of crAssphage in stool.

Supplementary File 5

SRA Runs: the identities of all runs in the SRA with matches to crAssphage, including the number of sequences that match, the total bases aligned and the average coverage.
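Supplementary Files 3 and 4 report adjusted P values next to the raw ones. This copy of the article does not state which adjustment was applied; assuming it is the Benjamini–Hochberg false discovery rate procedure cited as ref. 80, the minimal Python sketch below shows how such adjusted values can be derived from a list of raw P values. The function name `bh_adjust` is hypothetical and is not taken from the authors' code.

```python
from typing import List


def bh_adjust(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original input order.

    Illustrative sketch only: it assumes the adjustment used for the
    Supplementary Files is Benjamini-Hochberg, which this text does not confirm.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the raw P values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    # Start the running minimum at 1.0 so no adjusted value exceeds 1.
    running_min = 1.0
    # Walk from the largest P value (rank m) down to the smallest (rank 1),
    # so that q_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]
    print([round(q, 4) for q in bh_adjust(raw)])  # [0.04, 0.0533, 0.0533, 0.2]
```

The two middle values come out tied because the procedure takes a running minimum from the largest P value downwards, which guarantees that adjusted values never decrease as the raw P value increases.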

Fig. 1: crAssphage presence or absence status over time in the human gut.
Fig. 2: Diversity of crAssphage strains in metagenomic samples.
Fig. 3: Global locations of 2,424 crAssphage strains for amplicon A.
Fig. 4: Maximum likelihood phylogeny and dot plot showing full genomic colinearity between crAssphage and ten long contigs that were assembled from faecal metagenomes of different non-human primates.