CRISPR-resolved virus-host interactions in a municipal landfill include non-specific viruses, hyper-targeted viral populations, and interviral conflicts

Viruses are the most abundant microbial guild on the planet, impacting microbial community structure and ecosystem services. Viruses are particularly understudied in engineered environments, including examinations of their host interactions. We examined host-virus interactions via host CRISPR spacer to viral protospacer mapping in a municipal landfill across two years. Viruses comprised ~4% of both the unassembled reads and assembled basepairs. A total of 458 unique virus-host connections captured hyper-targeted viral populations and host CRISPR array adaptation over time. Four viruses were predicted to infect across multiple phyla, suggesting that some viruses are far less host-specific than is currently understood. We detected 161 viral elements that encode CRISPR arrays, including one with 187 spacers, the longest virally-encoded CRISPR array described to date. Virally-encoded CRISPR arrays targeted other viral elements in interviral conflicts. CRISPR-encoding proviruses integrated into host chromosomes were latent examples of CRISPR-immunity-based superinfection exclusion. The bulk of the observed virus-host interactions fit the one-virus-one-host paradigm, but with limited geographic specificity. Our networks highlight rare and previously undescribed complex interactions influencing the ecology of this dynamic engineered system. Our observations indicate that landfills, as heterogeneous contaminated sites with unique selective pressures, are key locations for atypical virus-host dynamics.

In the 2016 network, six of 59 viral elements were predicted to infect two or more distinct hosts.
In the 2017 network, eleven of 161 viral elements were predicted to infect two or more distinct hosts. In the 2016-host-to-2017-virus network, one of 79 viral elements was predicted to infect two or more distinct hosts. In the 2017-host-to-2016-virus network, thirteen of 125 viral elements were predicted to infect two or more distinct hosts.
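For comparison across networks of different sizes, the counts above can be expressed as percentages. The following is a minimal sketch; the dictionary keys and function name are illustrative labels, and only the counts come from the text.

```python
# (multi-host viral elements, total viral elements) as reported in the text.
# The dictionary keys are illustrative labels, not identifiers from the study.
networks = {
    "2016": (6, 59),
    "2017": (11, 161),
    "2016 hosts to 2017 viruses": (1, 79),
    "2017 hosts to 2016 viruses": (13, 125),
}


def multi_host_percent(multi_host, total):
    """Percentage of viral elements predicted to infect two or more hosts."""
    return round(100 * multi_host / total, 1)


percentages = {
    name: multi_host_percent(multi, total)
    for name, (multi, total) in networks.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}% of viral elements have two or more predicted hosts")
```

In every network, the multi-host viral elements make up a small minority of all viral elements.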

Curation of cross-phylum and cross-domain virus-host interactions
Fourteen viral elements were predicted to infect multiple hosts from different phyla (8) or domains (6), in a total of 10 interactions (Supplemental Table 1). Curation took the following requirements into consideration: 1) Host genome quality: following our baseline thresholds of >70% completion and <10% contamination, all 10 interactions pass. 2) Taxonomic consistency: CRISPR-encoding scaffolds' taxonomic annotations from JGI scaffold annotations 1 and/or the Contig Annotation Tool 2 were required to be consistent with GTDB-tk-based MAG taxonomy 3; six of the interaction sets had erratic scaffold annotations, leaving four. 3) Scaffold length: CRISPR-encoding scaffolds were required to be at least 4,000 bp long; no additional interactions were eliminated. Four interactions pass quality curation.
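The three curation requirements can be read as a sequential filter over the hosts in each interaction. The sketch below is only an illustration of that logic: the `HostEvidence` record, its field names, and the example values are hypothetical, and the thresholds are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class HostEvidence:
    """Hypothetical record for one host MAG in a multi-host interaction."""
    completeness: float                 # percent
    contamination: float                # percent
    scaffold_taxonomy_consistent: bool  # scaffold annotation agrees with MAG taxonomy
    scaffold_length_bp: int             # length of the CRISPR-encoding scaffold


def passes_curation(host, min_completeness=70.0, max_contamination=10.0,
                    min_scaffold_bp=4000):
    """Apply the three requirements from the text to a single host."""
    return (
        host.completeness > min_completeness
        and host.contamination < max_contamination
        and host.scaffold_taxonomy_consistent
        and host.scaffold_length_bp >= min_scaffold_bp
    )


def interaction_passes(hosts):
    """An interaction is retained only if every host in it passes."""
    return all(passes_curation(h) for h in hosts)


# Illustrative values only; these are not measurements from the study.
host_a = HostEvidence(92.0, 2.5, True, 15000)
host_b = HostEvidence(81.0, 4.0, True, 4100)
host_c = HostEvidence(88.0, 3.0, False, 3600)

print(interaction_passes([host_a, host_b]))  # both hosts meet all requirements
print(interaction_passes([host_a, host_c]))  # host_c fails taxonomy and length
```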
Only in the case of GW1_158 (class Paceibacteria) and LW4_67 (genus Methylobacter) did two MAGs share any spacers in the CRISPR arrays responsible for targeting their mutually shared virus. Notably, GW1_158's array contained 57 spacers and LW4_67's contained 78, and 56 of these spacers were shared with 100% nucleotide identity. The scaffold of LW4_67 relevant to this analysis is the longer of the two (15,46 includes an intact Type I CRISPR-Cas system 4 , and numerous coding sequences with predicted species-level taxonomy that is in accordance with the GTDB-tk annotated taxonomy of this MAG. While the scaffold of GW1_158 relevant to this analysis is only 4,111 bp long, it also has predicted species-level taxonomy, according to the Contig Annotation Tool, that is in accordance with the GTDB-tk annotated taxonomy of this MAG. We thus predict that the shared spacers are the result of a possible horizontal gene transfer of a CRISPR array via a mobile genetic element 5 , although they could also be due to a misbin, a possibility that cannot be refuted for such a short fragment. In all other cases, no spacers were shared, reducing the likelihood that the predicted shared viral targeting is underlain by lateral gene transfer or mis-binning (Supplemental Table 1).
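Counting spacers shared between two arrays at 100% nucleotide identity amounts to a set intersection over spacer sequences. The sketch below illustrates this under stated assumptions: the function names and toy sequences are hypothetical, and treating a spacer and its reverse complement as the same sequence is an assumption of this example rather than a documented step of the original workflow.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq):
    """Orientation-independent form of a spacer (assumption of this sketch)."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def shared_spacers(array_a, array_b):
    """Spacers present in both arrays with 100% nucleotide identity."""
    return {canonical(s) for s in array_a} & {canonical(s) for s in array_b}


# Toy spacers, far shorter than real ones; not sequences from the study.
array_a = ["AAACCC", "ACGTAC", "TTGACA"]
array_b = ["GGGTTT", "ACGTAC", "CCCCGA"]

shared = shared_spacers(array_a, array_b)
print(len(shared), "of", len(array_a), "and", len(array_b), "spacers are shared")
```

Applied to the arrays described above, such a comparison would report 56 shared spacers out of 57 and 78.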
Three of the refuted cross-domain interactions include the same Halobacterota bin (CLC_33_Methanoculleus_2017) and three different bacterial partners: two Cloacimonadota and one Spirochaetota (CLC_82_Cloacimonadaceae_2017, LW3_15_Cloacimonadaceae_2017, and CLC_92_Sphaerochaeata_2017). CLC_33_Methanoculleus_2017 contains a CRISPR array on a relatively short scaffold (3,667 bp) that targets two different viral scaffolds, each of which is targeted by a distinct Cloacimonadota MAG. This same Methanoculleus MAG encodes a second CRISPR array on another short scaffold (3,710 bp) that targets a different viral scaffold that is also targeted by a Sphaerochaeata MAG. This may be a legitimate signal, or the result of multiple orphaned CRISPR arrays or mobile elements binning into this genome. We note the observation as worth monitoring in other systems and in future samples from this system.
We anticipate that, as computational methods for associating hosts and viral elements improve, there will be more instances of viruses that show potential to infect across relatively high taxonomic levels. We hope that rather than defaulting to these interactions being workflow artifacts, or considering them as ground truths, they are examined critically on a case-by-case basis.

Hyper-targeting of viral elements
In the 2016 network, a member of the family Acholeplasmataceae_A (LW2_3) showed the strongest targeting, with 48 CRISPR spacer matches to a single viral element (v111_2016) (Figure 1B Table 1]), we now also observed 2017 hosts with the ability to hyper-target viral elements (Figure 1B). It is notable that these four viral elements are not only hyper-targeted, but are also predicted to infect members of different phyla (phylum Bacteroidota and phylum Firmicutes). Members of these two phyla (LW2_63_Bacteroidales and LW2_3_Acholeplasmataceae_A) also shared a viral element in the 2016 network (v80_2016, Supplemental Figure 3), but only LW2_3_Acholeplasmataceae_A hyper-targeted the viral element in that case.
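Hyper-targeting can be surfaced from a list of spacer-to-protospacer hits by counting matches per host-virus pair. The following is a minimal sketch: the hit tuple layout, the second host-virus pair, and the `min_hits` cutoff are hypothetical choices for illustration, since this text does not state a numerical threshold for hyper-targeting.

```python
from collections import Counter


def count_spacer_hits(hits):
    """Count spacer matches per (host, virus) pair.

    Each hit is a (host_id, virus_id, spacer_id) tuple; this layout is an
    assumption of the sketch.
    """
    return Counter((host, virus) for host, virus, _spacer_id in hits)


def hyper_targeted(hits, min_hits):
    """Return host-virus pairs with at least `min_hits` spacer matches."""
    counts = count_spacer_hits(hits)
    return {pair: n for pair, n in counts.items() if n >= min_hits}


# Toy input: 48 matches for the pair described in the text, plus a single
# match for a made-up pair.
hits = [("LW2_3", "v111_2016", i) for i in range(48)]
hits.append(("host_x", "virus_y", 0))

# min_hits=10 is an arbitrary illustrative cutoff.
print(hyper_targeted(hits, min_hits=10))
```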
We posit that the shared hyper-targeting across multiple phyla is a result of lateral gene transfer (LGT) or misbinning of CRISPR-encoding scaffolds. CLC_107_Bacteroidales_2017 scaffold Ga0265294_10000314 (length 63,905 bp, 147 spacers encoded) has a six-mismatch, 4,167 bp alignment with LW2_3_Acholeplasmataceae_A_2016 scaffold Ga0172382_10051677 (length 4,167 bp, 63 spacers encoded), and thus mutual hyper-targeting between distinct phyla is likely due to a misbin of scaffold Ga0172382_10051677. Furthermore, LW2_114_Bacteroidales_2017 scaffold Ga0265293_10000924 (length 67,619 bp, 182 spacers encoded) has a single-mismatch, 3,322 bp alignment with LW2_22_Acholeplasmataceae_2016 scaffold Ga0172382_10071763 (length 3,322 bp, 34 spacers encoded), and thus mutual hyper-targeting between distinct phyla is likely due to a misbin of scaffold Ga0172382_10071763. LW2_22_Acholeplasmataceae_2016 and LW2_114_Bacteroidales_2017 also each encode two other CRISPR arrays that share five and two spacers, respectively, between MAGs.
Despite the striking differences between their viral element targeting profiles, CLC_107_Bacteroidales_2017 and LW2_114_Bacteroidales_2017 are predicted to have an Average Nucleotide Identity (ANI) of >98.5% according to dRep 6 .

Virally-encoded CRISPR arrays convolute virus-host networks
While linking hosts to their putative viruses using CRISPR spacer matches to protospacers, we also came across instances of viral elements that encoded CRISPR arrays and CRISPR-Cas systems. Virus-encoded CRISPR arrays and CRISPR-Cas systems have been identified before 5,7-to play roles in regulating host transcription and translation 11 , CRISPR-Cas system inhibition 5 and interviral conflicts 5,8 . In the case of interviral conflicts, a CRISPR-encoding virus, integrated as a provirus, can provide its host with immunity against other viruses, akin to a superinfection exclusion system 5 . For this work, we used the full set of viral predictions prior to clustering with CD-HIT 15 in order to preserve CRISPR array variation between closely-related viral elements. scaffold. This could suggest that the prophage introduced CRISPR-Cas systems into its host's genome upon integration or that the CRISPR-Cas systems of this host were co-opted by its virus after excision from the host chromosome. The CRISPR array being outside the prophage boundary predicted by Prophage Hunter is likely due to partial degradation of the prophage over time.
LW2_146_Pigmentiphaga has three connections to v36_2016 in the 2017-host-to-2016-virus network, all of which are predicted to be due to prophage-mediated immunity.
We next repeated the workflow described above to examine CRISPR arrays in the viral

Supplemental Tables and Figures
Supplemental figure residue (gene annotation labels): Phage portal protein; Gene 49 protein; Phage terminase large subunit; Endonuclease; Prophage-derived uncharacterized protein YopX; ParB-like nuclease domain protein; Beta-glucosyl-HMC-alpha-glucosyl-transferase; Phage major capsid protein; Ta