Transmission and persistence of crAssphage, a ubiquitous human-associated bacteriophage

The recently discovered crAssphage is by far the most abundant and ubiquitous known human gut bacteriophage. It appears to be highly specific to the human gastrointestinal tract; however, the patterns of transmission and persistence of this bacteriophage are unknown. Here, we identify modes of transmission and describe long-term persistence of crAssphage in several human populations. We find that most humans harbor a single, dominant strain of crAssphage in their microbiome. This is in contrast to the bacterial microbiota, where individuals can harbor a variety of closely- or distantly-related strains of the same bacterial species. We show that crAssphage can be vertically transmitted from mother to infant, acquired through fecal microbiota transplantation, and transmitted in immunocompromised hosts in a hospital setting. We also observe that once a crAssphage strain is acquired, it persists stably within an individual over a timescale of months. These results enhance our understanding of the dynamics of crAssphage, which has emerged as one of the most successful human-associated microbes, and provide a foundation for future studies of the role of this phage in the biology of the human microbiome.


Main text
In addition to trillions of bacteria, the human gastrointestinal tract is densely populated with bacteriophages. Bacteriophages can drive bacterial community composition and mediate horizontal gene transfer [1], and alterations in the human gut virome have been associated with disease [2,3]. Yet, our knowledge of the contributions of specific bacteriophages to human biology is limited, in part due to the paucity of viral sequences represented in reference databases. High-throughput sequencing and advanced genomic tools have facilitated the in silico discovery and characterization of previously unknown bacteriophages. The preeminent example of such a discovery is crAssphage (cross Assembly phage), initially identified from human virome sequencing data [4]. CrAssphage is a bacteriophage with an ~97 kilobase circular, double-stranded DNA genome. Interestingly, crAssphage sequences are found almost exclusively in human fecal metagenomes, and can be highly abundant. Initial estimates indicate that crAssphage is present in 73-77% of humans [4,5], challenging the notion that the gut virome is highly individual-specific. Subsequently, it has been shown that a wide range of crAss-like phages exists in nature [5,6]. However, whether or how crAssphage influences host biology or is involved in disease is unknown. To answer higher-order questions about the role of crAssphage in human biology, it is necessary to establish basic principles of crAssphage acquisition, persistence, and distribution. To this end, we analyzed crAssphage sequences from both published and novel datasets, finding that crAssphage typically exhibits monoclonal dominance in a given individual and can be transmitted vertically from mothers to infants as well as horizontally in adults with compromised or simplified microbiomes.

To determine whether individuals have one or many crAssphage strains in the gut metagenome, we identified sites with more than one variant present in the crAssphage genome (multi-allelic sites) from stool metagenomic sequencing data from individuals from four cohorts (Table S1) [7-9]. We aligned metagenomic sequencing reads to the crAssphage reference genome [4] and identified single-nucleotide variants (SNVs) that are present at intermediate frequency (between 10% and 90% frequency). We limited our analysis to metagenomes with 30X coverage or greater of the crAssphage genome. We find that 77 of 106 metagenomes (73%) have fewer than 50 multi-allelic sites (Figure 1), suggesting that most individuals are likely colonized by a single crAssphage or near-identical crAssphages. Twenty-eight of 106 metagenomes (26%) have between 50 and 999 multi-allelic sites, corresponding to polymorphism in roughly 0.05% to 1% of the genome. One of 106 metagenomes (1%) has 1000 or more multi-allelic sites, corresponding to greater than 1% of the genome. These results suggest an exclusion principle which favors colonization of a particular strain within the gut of an individual, though notably, a minority of individuals may be simultaneously colonized by more than one crAssphage strain. The number of multi-allelic sites stays relatively stable within individuals over time (Figure S1). We observe that SNVs are relatively evenly distributed throughout the genome (Figure S2 [...] where crAssphage is acquired. Given the apparent specificity of crAssphage to the human gut as opposed to other mammals or the environment, we hypothesized that crAssphage is likely acquired through human-to-human contact. It is well-documented that infants acquire many of their first microbes, such as Bacteroides species, from their mother during and after delivery [7,8,12,13]. However, it has been shown that adult twins and their mothers have unique gut viromes [14].
Given that Bacteroides species are believed to be the bacterial host of crAssphage [4,15], we postulated that crAssphage is vertically transmitted from mother to infant, similar to what is observed for many bacterial taxa and in contrast to what is reported for other members of the human virome. To test the hypothesis that crAssphage is vertically transmitted, we examined publicly available shotgun metagenomic data from two stool microbiome datasets [7,8] from mothers and their infants (n=142 mother-infant pairs). We evaluated crAssphage presence and relatedness using StrainSifter [16], a tool that performs variant calling and phylogenetic analysis of microbial genomes. We considered metagenomes to contain the phage if there were reads mapping to the crAssphage reference genome with at least 5X coverage. We detected crAssphage strains in 27 of the 142 mothers studied (19%) and 16 of 142 infants (11%) (Figure 2). Of the 27 mother-infant pairs where crAssphage strains were detected in at least one maternal sample, we find that 6 pairs (22%) share an identical or highly related strain of crAssphage between the mother and infant, indicating that crAssphage can be vertically transmitted from mother to infant. Ten of 142 infants (7%) harbor a strain of crAssphage that is not detected in the mother's stool. This could be a result of sampling during a low-crAssphage state in the mother; alternatively, these infants may have acquired crAssphage from another individual in their household.

It has previously been reported that birth mode does not influence crAssphage relative abundance in the gut virome of Irish infants [11]. In the two mother-infant cohorts analyzed here, we only detect crAssphage in the gut microbiome of vaginally born infants.
Zero of 22 Cesarean-born infants have crAssphage in their stool samples at 5X coverage or greater; this difference is not statistically significant within this sample collection (Fisher's exact test; p=0.1332). However, these samples were obtained from highly heterogeneous populations in diverse global regions, and studies of larger cohorts are necessary to definitively determine the relationship between birth mode and crAssphage colonization. Notably, crAssphage is not sufficiently abundant to be detected in meconium or shortly after birth in our analysis and is only found in infant stool sampled at least one month after birth. It is important to note, however, that these datasets comprise total metagenomic shotgun sequencing as opposed to enriched viral particles. It is possible that crAssphage particles are transmitted to the infant during vaginal birth and persist in the infant gut, but that they are not detected until host Bacteroides strains achieve higher relative abundance later in development [7,17,18]. Alternatively, it is possible that crAssphage is not sufficiently abundant to be detected in total shotgun metagenomic data in some samples.

[...] Table S2. [...] hospital. We observe that three patients from each clade occupied the same room at different times (Figure 4b), raising the possibility that individuals acquired crAssphage from a shared source within that room, or that one individual's crAssphage was left behind, persisted in the hospital room, and was acquired by subsequent occupants. These data suggest that crAssphage may be acquired through environmental contact, and that crAssphage may be 'viable' outside of its likely obligately anaerobic bacterial host or the human body. Furthermore, a substantial inoculum of crAssphage in stool may not be required to transmit a new strain of crAssphage to an individual.
This level of exposure is more consistent with the amount that infants experience at birth, a much less dramatic and less direct exposure than FMT.
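As a rough check on the birth-mode comparison reported above, the two-sided Fisher's exact test can be recomputed from the counts given in the text. The sketch below assumes that birth mode is known for all 142 infants, so that the 16 crAssphage-positive infants all fall among the 120 vaginally born infants. This 2x2 table is inferred from the reported numbers and is not taken from the authors' analysis; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the sum of hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Assumed table (rows: Cesarean, vaginal; columns: detected, not detected).
# 0 of 22 Cesarean-born and 16 of 120 vaginally born infants are positive.
p_value = fisher_exact_two_sided(0, 22, 16, 104)
print(f"two-sided p = {p_value:.4f}")
```

Under this assumed table the computed value is approximately 0.133, in line with the reported p=0.1332, which suggests that the reported test was two-sided and computed over all 142 infants.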

The results herein demonstrate various modes of acquisition and transmission of crAssphage. We show that crAssphage can be vertically transmitted, and that it is usually present in the gut as early as a few months after birth. We show that crAssphage persists through sampling on the scale of months, and perhaps even longer, indicating that the host immune system tolerates this phage and does not mount an immune response. This may provide an important clue toward the evolutionary history of host and microbe, as the human immune system may have evolved tolerance to this virus. Future work remains to determine precisely whether, and how, crAssphage influences the gut ecosystem. To do so, it will be necessary to determine the host(s) of these crAssphages through isolation and culture experiments. With this knowledge, it will be possible to design and test hypotheses that will help to elucidate the role of crAssphage in the biology of the human gut.

[...] Metagenomic reads were aligned to the crAssphage reference genome (NCBI RefSeq NC_024711.1) and variants were identified using snippy with default parameters [26]. For multi-allelic site analysis, the raw VCF output from snippy was filtered with bcftools [27] to include only SNVs with frequency between 0.1 and 0.9.
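The intermediate-frequency filter and the site-count bins used in the multi-allelic site analysis can be illustrated with a minimal sketch. It is not the snippy/bcftools workflow itself: the `Snv` record, the function names, the inclusive treatment of the 0.1 and 0.9 bounds, and the rounded genome length of 97,000 bp (from the ~97 kilobase figure in the text) are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple

# Assumption: rounded genome length based on the "~97 kilobase" figure.
GENOME_LENGTH = 97_000


class Snv(NamedTuple):
    """Hypothetical per-site record: position, reads supporting the
    alternate allele, and total read depth at that site."""
    position: int
    alt_reads: int
    depth: int


def multiallelic_sites(
    snvs: Iterable[Snv], low: float = 0.1, high: float = 0.9
) -> List[int]:
    """Return positions whose alternate-allele frequency lies in [low, high].

    Whether the bounds are inclusive is an assumption of this sketch.
    """
    sites = []
    for snv in snvs:
        if snv.depth == 0:
            continue  # no coverage, frequency undefined
        frequency = snv.alt_reads / snv.depth
        if low <= frequency <= high:
            sites.append(snv.position)
    return sites


def classify(n_sites: int) -> str:
    """Assign a metagenome to one of the three bins used in the text."""
    if n_sites < 50:
        return "fewer than 50"
    if n_sites < 1000:
        return "50-999"
    return "1000 or more"


def percent_of_genome(n_sites: int, genome_length: int = GENOME_LENGTH) -> float:
    """Express a count of multi-allelic sites as a percentage of the genome."""
    return 100.0 * n_sites / genome_length


# Toy example with made-up read counts.
calls = [Snv(100, 10, 50), Snv(200, 49, 50), Snv(300, 25, 50), Snv(400, 1, 50)]
kept = multiallelic_sites(calls)
print(kept, classify(len(kept)), f"{percent_of_genome(50):.3f}%")
```

With the assumed genome length, 50 sites correspond to about 0.05% of the genome and 1000 sites to slightly more than 1%, matching the ranges quoted in the main text.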

Mother-baby and HCT phylogeny
Phylogenetic trees were constructed using the StrainSifter pipeline as previously described [16]. Briefly, reads are aligned to the crAssphage reference genome using the Burrows-Wheeler Aligner v0.7.10 [28], filtered using SAMtools view [29] to include only high-confidence alignments with mapping quality of 60, and further filtered using BamTools filter (v2.4.0) [29,30] to include only reads with 5 or fewer mismatches. Samples in which reads cover at least 50% of the genome at a depth of 5X were included. Pileup files are created from BAM files using SAMtools mpileup, and SNVs with at least 0.8 frequency are identified and concatenated into a fasta file, from which a multiple sequence alignment is created using MUSCLE [29-31] and a phylogenetic tree is computed [...]

[...] Table S1. Sequence data unique to this manuscript will be deposited in the NCBI SRA at the time of publication.

Acknowledgements

We thank Tessa Andermann, Joyce Kang, Ekaterina Tkachenko, and Ryan Brewster for collecting and sequencing HCT patient stool samples. We also thank the patients and nurses on the Blood and Marrow Transplantation service for their enthusiastic participation in collecting HCT samples for this project. We thank Dylan Maghini for helpful comments on the manuscript. We thank Ryan deGive for his assistance in accessing patient location information. This work