Natural phage communities crosslink different species within the genus Staphylococcus


The importance of the bacteriophage host range builds on its role as an innate barrier, which defines the phages’ impact on bacterial communities and genome diversity. Yet, little is known about natural host range patterns. We characterize 94 novel staphylococcal phages from wastewater and establish their host range on a diversified panel of 117 staphylococci from 29 species. Using this high-resolution phage-bacteria interaction matrix, we unveil a multi-species host range as a dominant trait of the isolated staphylococcal phages. Phage genome sequencing shows this pattern to prevail irrespective of taxonomy. Network analysis between phage-infected bacteria revealed that hosts from multiple species, ecosystems, and drug-resistance phenotypes share numerous phages. This could promote genetic mobilization facilitated by many transfer routes. Lastly, we demonstrate that phages throughout this network package foreign genetic material at various frequencies. Our findings defy a strong host specialism of phages and highlight great possibilities for horizontal gene transfer.


Introduction
Bacteriophages (phages) are the most abundant biological entities on Earth, and yet the understanding of the association between bacteria and their infecting phages remains limited. Host range interactions have profound implications for how phages influence bacterial community composition and ecology 1,2, or facilitate horizontal gene transfer 3-5. Thus, the host range is a central trait to understand in phage biology (reviewed in 6), and this knowledge has important applications in industry and human health 7. The global health crisis of drug-resistant pathogens could be ameliorated by phage therapy, a promising treatment strategy especially for antibiotic-resistant bacteria 8. Paradoxically, phages may also have adverse effects, as they could be suitable vectors for bacterial adaptation traits, such as virulence and antimicrobial resistance determinants 9. However, their exact role in the exchange of genetic material remains unclear, as transduction frequencies and the phages' range of influence are still unsettled. By definition, the phage host range refers to the taxonomic breadth of bacteria a phage can successfully infect (reviewed in 10). Labor-intensive infection assays showed that host ranges vary from narrow to broad 11. While "broad" and "narrow" are partially conditioned by the genetic diversity of challenged hosts, a narrow host range is commonly reported for phages that replicate on few hosts. In contrast, broad host range phages complete their lifecycle in numerous strains from distinct or the same species 6. To date, most isolated phages (> 85 %) belong to the order Caudovirales 12 and are reported as specialists with narrow host ranges. On average, they infect two strains from a single species 2.
However, a few network studies from a compilation of published data or marine viruses find that phages can infect a multitude of hosts and that different phage types predate each bacterial species 13-15. These data challenge a strong phage-host specialization and the reported frequency of narrow host range phages.
Bacteria of the genus Staphylococcus are part of the natural skin microbiota of mammals and can be life-threatening pathogens due to their increasing virulence and antibiotic resistance. Therefore, they are considered targets for a phage therapy approach. Adversely, phage transduction, in particular generalized transduction, is increasingly perceived as the primary route for mobilizing antibiotic resistance within this genus (reviewed in 16). Based on their ability to produce coagulase, staphylococci are divided into the traditionally more pathogenic coagulase-positive staphylococci (CoPS), with S. aureus as the major species, and coagulase-negative staphylococci (CoNS) such as S. epidermidis. Today, CoNS are recognized as major nosocomial pathogens with limited treatment options due to a large proportion of antibiotic-resistant strains 17. They are regarded as important reservoirs of antimicrobial resistance determinants that could spread to the clinically most critical species. Dissociated from clinical manifestations and based on multilocus data, a refined phylogeny for staphylococcal species into 15 species clusters and six species groups was suggested recently 18.
Ultimately, all isolated phages productively infected at least one drug-resistant strain. Our results evidence that infection of antibiotic-resistant strains by phages from anthropogenic environments is common.
Bacteriophages could be suitable vectors for genetic exchange due to their vast abundance, stability in the environment, and their ability to bridge the spatial separation of donor and recipient bacteria 9.
The infection of hosts from diverse ecosystems is thereby a prerequisite. To assess the potential phage-induced transfer of genetic material, we tested the phages' ability to connect hosts from the environmental, veterinary, or human biome. Only two phages [...] Table 14).
In conclusion, we show that diverse staphylococcal phages connect naturally occurring hosts from different ecosystems and drug resistance phenotypes, suggesting this feature to be a general competence.

Natural phage communities crosslink species within the genus Staphylococcus
Next, we analyzed the established biadjacency matrix focusing on the interplay between infected bacteria rather than individual phage host ranges. To do so, we reduced the matrix to the 60 phage-permissive strains, which represented 27 different staphylococcal species. The network was collapsed into a bipartite projection in which hosts are represented as nodes and phages as edges. An edge between two bacterial nodes indicates the presence of at least one phage infecting both hosts, and the edges are weighted according to the number of phages that do so. The projection showed an interconnected network with 1,030 host interactions through 93 different staphylococcal phages (connectance = 0.58) (Table 3, Figure 4). We sought to establish parameters that best describe how this natural phage community crosslinks members of the genus Staphylococcus. On the one hand, we consider the number of shared phages between two hosts as an important marker, as it indicates transfer routes and opportunities for genetic exchange. Thus, the higher the number of shared phages between two hosts, the higher the chance of genetic displacement, as multiple phages could mediate a transfer. On the other hand, we consider the number of direct neighbors, which is the count of nodes connected by an edge to the specified node. Neighbors are a measure of centrality and indicate the host's potential impact.
The bipartite projection revealed that staphylococcal strains from different species groups 18 share on average 3.8 ± 7.3 (n = 1293) phages. Moreover, individual staphylococcal strains and species were connected by 4.3 ± 7.9 (n = 1770) (Supplementary Figure 4) and 4.1 ± 7.5 (n = 1666) different phages, respectively.
However, staphylococcal strains from the same species were significantly better connected (two-sided Wilcoxon rank sum test with continuity correction: W = 237492, p-value = 1.93e-15), as on average they were linked by 8.3 ± 11.4 (n = 104) different phages. Surprisingly, the two best-connected hosts throughout this network belong to different species, species groups 18, and coagulase types: S. lugdunensis I0507 (Epidermidis-Aureus, CoNS) and S. schleiferi I3823 (Hyicus-Intermedius, CoPS), which share sensitivity to 58 different staphylococcal phages.
In addition to the numerous transfer opportunities, we found a bacterium to be connected on average to 34.3 ± 13.6 (n = 60) strains from 17.4 ± 5.2 (n = 60) staphylococcal species through phages. The strain with the highest number of neighbors is most likely to receive and donate genetic material. We found S. vitulinus C5817 to be the most central host, which could interact with 56 of 59 available hosts. Furthermore, phage infections connected strains of the species S. [...]
Lastly, we appraised the connectivity between ecosystems by phages, as there is a rising fear of genetic mobilization between the human, environmental and animal biome. To do so, we assessed the number of shared phages between hosts of different origins. Surprisingly, we found no significant difference in the average number of shared phages between hosts from the same (4.7 ± 8.4, n = 565) or different biomes (4.1 ± 7.6, n = 1205) (two-sided Wilcoxon rank sum test with continuity correction, W = 1315504, p = 0.08981). Hosts of environmental and veterinary origin, however, were exceptionally well connected, as they share 5.7 ± 8.8 (n = 476) phages on average (Supplementary Figure 6b).
Furthermore, of the 34 neighbors found on average for a host in this network, only 11.2 ± 7.0 (n = 60) share the same isolation biome, whereas 23.1 ± 9.9 (n = 60) hosts derived from different ecosystems (Supplementary Figure 7). The interconnection of spatially separated staphylococcal strains becomes critical when addressing the dissemination of drug resistance determinants. Here, we demonstrate that each drug-resistant host is connected on average to 16.0 ± 6.7 (n = 33) drug-susceptible neighbors through 4.3 ± 8.0 (n = 891) different phages. Our findings evidence the existence of multiple routes and opportunities for genetic material to be mobilized by phages between hosts of different species, sources, and clinical relevance.
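The host projection described in this section can be illustrated with a minimal sketch. It is not the authors' code (the Methods name the R package igraph); it is a plain-Python toy in which all phage and host names are hypothetical. Each pair of hosts receives an edge weighted by the number of phages infecting both, and the number of direct neighbors is counted per host.

```python
from itertools import combinations

# Toy infection data (hypothetical names): phage -> set of hosts it infects.
infections = {
    "phage_1": {"host_A", "host_B", "host_C"},
    "phage_2": {"host_A", "host_B"},
    "phage_3": {"host_C", "host_D"},
}


def project_hosts(infections):
    """Return {frozenset({host_i, host_j}): number of shared phages}."""
    weights = {}
    for hosts in infections.values():
        # Every pair of hosts infected by the same phage gains one shared phage.
        for pair in combinations(sorted(hosts), 2):
            key = frozenset(pair)
            weights[key] = weights.get(key, 0) + 1
    return weights


def neighbor_counts(weights):
    """Count the direct neighbors (degree) of each host in the projection."""
    neighbors = {}
    for pair in weights:
        a, b = tuple(pair)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {host: len(others) for host, others in neighbors.items()}


weights = project_hosts(infections)
degrees = neighbor_counts(weights)
```

In this toy network, host_A and host_B share two phages, while host_C is the most central host because it is linked to all three other hosts.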

WWTPs are reservoirs for diverse CoNS phages
We sequenced the genomes of 40 CoNS viruses of our natural phage community (Table 2) and assessed their morphology by electron microscopy (Figure 5). Among the 40 sequenced phages, 29 were isolated from the WWTP inlet, seven from the outlet, and four were bacterial lysogens. Overall, we identified 29 myoviral and 11 siphoviral morphologies. Phages isolated from the raw wastewater proved to be mainly myoviruses (with two siphoviruses), whereas siphoviruses dominated in the treated water. All induced prophages were siphoviruses (Table 2). As anticipated, the genome sizes of the sequenced myoviruses ranged from 128.3 to 145.1 kb, while the siphoviruses separated into two groups of 42.2 to 44.5 kb and 85.8 to 92.2 kb 19. Interestingly, all siphoviruses with a larger genome were isolated as free viral particles and displayed a distinct morphology with tails > 300 nm, while the smaller ones were solely isolated after induction (Figure 5). Lysogeny modules were only found in the genomes of the latter. This is consistent with the literature, as smaller staphylococcal siphoviruses are predicted to be temperate, whereas larger siphoviruses are presumably virulent 19. To date, only three representatives of the latter are reported. With the characterization of seven novel large siphoviruses, we significantly extend the currently available sequencing landscape of this phage fraction. Next, we assigned the closest phage relative for each of our novel phages based on average nucleotide identity (ANI). Interestingly, 29 CoNS viruses shared a relatively high genome identity (> 88 % ANI) with known staphylococcal phages, while the other 11 appeared to be distantly related (< 70 % ANI). We detected a total of 34 tRNAs among 18 phage genomes. All tRNA-encoding phages corresponded to strictly lytic myoviruses or large siphoviruses.
These results are compatible with the hypothesis that tRNAs are more prevalent among virulent phages. Such phages are less well adapted to their replication hosts and hence differ from them in codon or amino acid usage 33. Lastly, we computed a phylogenomic analysis using the phage genomes described herein along with 292 staphylococcal phages deposited on NCBI (Figure 6). As a unique ecosystem, water from a WWTP proved to contain diverse staphylococcal phages from different families and genera. The phylogenomic tree showed good agreement between phage morphology, genome length, and taxonomy. However, the extent of the phage host range seemed rather independent of these traits, although members of the Herelleviridae infected the highest number of strains and species, followed by the ~90 kb and, lastly, the ~40 kb Siphoviridae (Figure 6). It is feasible that phages with larger genomes have an extended host range, as they have more space to encode arrays of genes that could counteract host defenses. However, one should consider that temperate phages may have a broader host range than observed by productive infection assays. The detection of hosts in which these phages pursue a lysogenic infection cycle will expand the host-range breadth unveiled here. In conclusion, by sequencing 40 CoNS staphylococcal phages from the same environmental niche, we greatly extend the spectrum of genome diversity. We demonstrate that phages from diverse taxonomic groups infect bacteria from numerous species, ecosystems, and drug-resistant phenotypes within the genus Staphylococcus.

Phages from diverse taxonomic groups encapsidate foreign genetic material
In this study, we showed the existence of an expansive network among bacteria of different species mediated by phages. Ultimately, we appraised those phages' potential to incorporate foreign genetic material. For this, we transformed a natural S. sciuri plasmid, pUR2865 34 (3.83 kb), conferring chloramphenicol resistance, into S. epidermidis S414. This strain was chosen as donor, as it was infected by most sequenced phages (26) and by members of the Sipho- and Herelleviridae. We propagated those phages on S. epidermidis S414/pUR2865 and quantified the encapsidated plasmid pUR2865 by qPCR. In addition, the generalized transducing staphylococcal phage 80α and the myovirus phage K were propagated on S. aureus RN4220/pUR2865. The removal of contaminating non-encapsidated DNA was verified using controls as established in 35. Plasmid numbers ranged from 1.3 × 10¹ to 1.6 × 10⁶ copies/ng phage DNA, with high variation between phage samples. Using the detected copy numbers, we estimated the frequency of transducing particle formation. We assumed that transducing particles consist of plasmid multimers only 36, and that as many base pairs of plasmid DNA are incorporated as the respective phage genome length. Figure 7a summarizes the differences in frequencies of phage transducing particles monitored per phage sample. The frequencies of transducing particles harbouring the plasmid indicate that one out of 1.5 × 10² to at most 2 × 10⁷ phages packages foreign genetic material. We expected high plasmid incorporation rates for phage 80α and for the small siphoviruses, as transduction ability for those phages is generally accepted 37. To our knowledge, there is only one report of a generalized transducing staphylococcal myovirus 26.
Strikingly, with our model, phage 80α showed frequencies of transducing particles (5 × 10⁻⁶ to 7 × 10⁻⁸) comparable to those of the myoviruses characterized here. In contrast, the small siphoviruses isolated from bacterial lysogens, and one large siphovirus (PG-2021_46), showed particularly high frequencies between 6.6 × 10⁻³ and 1.6 × 10⁻⁵. These values suggest a more targeted packaging of foreign genetic material. Thus, we assessed the phage genome termini, which reflect the DNA packaging mechanism (Table 2, Figure 7b). Interestingly, in several cases, predicted packaging mechanisms did not correlate with phage morphology, and we found high encapsidation frequencies for phages with packaging mechanisms other than those of the previously described transducing pac 38,39 or cos 40,41 phages. Yet, a pac mechanism is likely for the four induced small siphoviruses with high encapsidation rates, as PhageTerm predicted terminally redundant and circularly permuted genome ends. However, due to a low statistical signal, a definitive confirmation was not obtained.
Our results confirm that plasmid-borne genetic material can be used by phages for mobilization. Furthermore, we demonstrate that multiple phages from diverse taxonomic groups package foreign genetic material, albeit at various frequencies.
These data imply great potential for phage-mediated genetic transfer among bacteria, supported by the fact that phages are involved in far more numerous microbial connections than previously assumed.
While the infection of a broad spectrum of hosts is desirable for phages in therapy, it simultaneously implies opportunities to transfer genetic material. Indeed, phage-mediated horizontal gene transfer is considered to be one of the primary driving forces for the spread of antimicrobial resistance in staphylococci 16.
However, it is thought to occur rarely, and primarily within species due to estimated narrow host ranges 51. Using a bipartite network, we demonstrate that multiple phages are shared between antimicrobial-resistant and susceptible hosts, and that each drug-resistant host in this network is, on average, connected to 16 drug-susceptible neighbors. The many connections and routes confirm the potential role of phages in the mobilization and dispersal of genetic material. Nevertheless, transduction ability has, so far, only been attributed to some staphylococcal phages 37. On these grounds, we quantified bacterial DNA encapsidation rates for 19 myoviruses and 9 siphoviruses from this network. We detected packaged plasmid DNA in all assessed phages, confirming this competence as widespread among staphylococcal phages 26,37,39. Our data indicate that one phage particle out of every hundred to at most 10⁷ phage particles is transducing. However, those numbers do not necessarily reflect the frequency of generalized transduction, for the following reasons. We propose that phage transduction involves two main bottlenecks. The first is the capability of phages to incorporate foreign DNA and the frequency at which transducing particles are formed. This depends on individual phage characteristics, and on the type and location of the bacterial cargo DNA within the host. The second is the delivery and expression of the cargo DNA in the recipient bacteria. This can differ greatly between strains, as it depends mostly on the bacterial "immune system", such as restriction modification systems and CRISPR-Cas 10. In simplified models, studies propose that the transduction efficiency, that is, the successful delivery and expression of cargo DNA in a recipient bacterium, is approximately 3 % 38,39.
In this regard, upcoming studies will determine the ability of the transducing particles detected here to spread the drug resistance element across this unique network.
In conclusion, this study reveals an expansive interspecies communication network and place
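The estimate of transducing particle frequency from qPCR copy numbers can be sketched as follows. This is an illustration under stated assumptions, not the authors' calculation. It takes the two assumptions given in the Results (transducing particles contain only plasmid multimers, filling one phage genome length) and adds two of its own: an average mass of 650 g/mol per DNA base pair, and that essentially all measured DNA consists of phage genomes. The 43 kb genome length is a hypothetical value; under these assumptions it cancels out of the result.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one dsDNA base pair


def transducing_frequency(plasmid_copies_per_ng, plasmid_bp, genome_bp):
    """Estimate the fraction of particles that carry plasmid DNA.

    Assumes each transducing particle holds plasmid multimers whose total
    length equals the phage genome length.
    """
    # Plasmid copies packed into a single transducing particle.
    copies_per_particle = genome_bp / plasmid_bp
    transducing_particles_per_ng = plasmid_copies_per_ng / copies_per_particle
    # Phage genome equivalents contained in 1 ng of DNA.
    genomes_per_ng = 1e-9 * AVOGADRO / (genome_bp * BP_MASS_G_PER_MOL)
    return transducing_particles_per_ng / genomes_per_ng


# Copy-number extremes and plasmid size from the text; genome length is hypothetical.
high = transducing_frequency(1.6e6, plasmid_bp=3830, genome_bp=43_000)
low = transducing_frequency(1.3e1, plasmid_bp=3830, genome_bp=43_000)
```

With these inputs the sketch yields roughly 6.6 × 10⁻³ for the highest copy number and roughly one transducing particle in 1.9 × 10⁷ for the lowest, which is in line with the range reported in the text.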

Enrichment Cocktail Constitution
Five cocktails were generated to enrich staphylococcal phages from wastewater. Staphylococcal strains for each cocktail were selected to produce a diverse community and combined either randomly (cocktail A), or according to their origin (cocktail B: animal-related strains; cocktails C and D: environmentally isolated strains; cocktail E: lab strains). To assure compatible growth of all bacteria within a cocktail, strains with cross-infective prophages or bacteriocin producers were excluded. For this, all selected strains were induced using Mitomycin C and UV irradiation (protocol adapted from 86). Briefly, 50 µL of a fresh overnight culture was inoculated in 5 mL TSB and incubated on a shaker for 2 hours at 37 °C. The initial absorbance was measured at 600 nm (OD600). Mitomycin C was added to a final concentration of 0.5 µg/mL, and bacterial suspensions were shaken at 37 °C. For UV irradiation, cells were centrifuged at 6,000 x g for 10 minutes at room temperature. The pellet was resuspended in 5 mL 0.1 M MgSO4 and irradiated with UV light (2400 µJ/cm²). After irradiation, cells were transferred to double-strength TSB, protected from light, and incubated on a shaker at 37 °C. The absorbance of both UV- and Mitomycin C-induced strains was then measured every hour for 6 hours or until a decrease of the OD600 was observed. The bacterial cultures were then centrifuged at 3,000 x g for 12 minutes at 4 °C, and the supernatant was sterile-filtered (0.22 µm) and stored at 4 °C. For all induction experiments, S. aureus Newman served as a positive control, as it contains three inducible prophages that lyse S. aureus RN4220 87. Spot assays were performed to assess the presence of cross-reactive phages that interfere with the growth of strains within a cocktail.
The radial streak method was applied to determine whether cocktail members restrain the [...]
For prophage induction and isolation, bacterial pellets frozen from wastewater were thawed and resuspended in 20 mL double-strength TSB supplemented with 6.5 % NaCl for staphylococcal enrichment. After overnight incubation, 10 mL of each enrichment was added to 490 mL TSB, and the initial absorbance (OD600) was measured. Cells were grown to an OD600 of 0.5, and the sample was split for induction with Mitomycin C or UV irradiation. For Mitomycin C induction, Mitomycin C was added to a final concentration of 1 µg/mL, and the suspension was incubated at 37 °C for 6 hours. For UV irradiation, cells were centrifuged at 6,000 x g for 10 minutes and the pellet was resuspended in 125 mL 0.1 M MgSO4. This suspension was irradiated (4400 µJ/cm²), transferred to 125 mL double-strength TSB, protected from light, and incubated for 6 hours at 37 °C. Finally, induced samples were centrifuged at 10,000 x g for 15 minutes at 4 °C, and the supernatants were sterile-filtered (0.22 µm PES) and stored at 4 °C. For temperate phage detection, serially diluted phage suspensions were dropped on all hosts selected for host range determination (hosts in Supplementary Table 10). If either a zone of lysis or individual plaques were visible after overnight incubation, phages were picked and purified as described above.

Phage Host Range Determination
Phage host ranges were assessed on 123 strains (32 species) that originated from human [...]
Phages with equal host ranges on all 123 hosts were clustered, and further characterizations were continued with one selected phage per cluster.
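Grouping phages with identical host ranges can be sketched in a few lines. This is an illustrative toy, not the authors' procedure: the phage and host names are hypothetical, and taking the alphabetically first member as the representative is an arbitrary choice made only for the example.

```python
from collections import defaultdict

# Toy host range data (hypothetical names): phage -> set of hosts it infects.
host_range = {
    "phage_1": {"host_A", "host_B"},
    "phage_2": {"host_B", "host_A"},
    "phage_3": {"host_C"},
}


def cluster_by_host_range(host_range):
    """Group phages whose sets of infected hosts are exactly equal."""
    clusters = defaultdict(list)
    for phage, hosts in host_range.items():
        # frozenset makes the host set hashable and order-independent.
        clusters[frozenset(hosts)].append(phage)
    return [sorted(members) for members in clusters.values()]


clusters = cluster_by_host_range(host_range)
# One representative per cluster (here simply the alphabetically first phage).
representatives = [members[0] for members in clusters]
```

Here phage_1 and phage_2 fall into one cluster because they infect the same two hosts, so only one of them would be carried forward.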

Phage Propagation
Phages were produced using the double-agar-layer method and washed off 20 to 80 semi-confluent lysis plates using SM buffer (200 mM sodium chloride, 10 mM MgSO4, 50 mM Tris, and 0.01 % gelatin, pH 7.4) with agitation for 4 hours (20 rpm). The phage lysates were collected, and cellular debris or agar remnants were removed by centrifugation at 5,000 x g for 10 minutes at 4 °C. The supernatant was sterile-filtered (0.22 µm). Phage particles were precipitated with 7 % PEG8000 supplemented with 1 M NaCl in ice water for two days. The precipitated phages were collected by centrifugation at 10,000 x g for 20 minutes at 4 °C, and pellets were dissolved in 8 mL SM buffer. Phages were purified by CsCl ultracentrifugation. Briefly, the density of each phage suspension was adjusted to 1.15 g/mL using CsCl, and the suspension was added on top of a three-layer (1.7, 1.5, and 1.35) CsCl density gradient. The gradient was centrifuged at 82,000 x g for 2 hours at 10 °C, and the phages were collected between the 1.35 and 1.5 density layers. All purified phages were dialyzed overnight at 4 °C in 4 L SM buffer (50 kDa cut-off) under gentle magnetic stirring.

Phage DNA Extraction
Phage DNA was extracted using the phenol/chloroform DNA extraction method. In short, 640 µL of propagated phage lysate (> 10¹⁰ pfu/mL) was treated with 10 U DNase I for 1 hour at 37 °C, and the enzyme was heat-inactivated for 10 minutes at 65 °C in the presence of 20 mM EDTA. Proteinase K was added to a final concentration of 100 µg/mL, and the sample was vortexed and incubated for 1 hour at 50 °C, 300 rpm. Next, one volume of phenol:chloroform:isoamyl alcohol (25:24:1) was added, the sample was centrifuged at 13,000 x g for 15 minutes, and the aqueous layer was extracted. This step was repeated with one volume of chloroform:isoamyl alcohol (24:1). DNA was precipitated by adding 50 µL 5 M NaCl and 0.7 volumes of isopropanol. The next day, the DNA was pelleted at 13,000 x g for 20 minutes at 4 °C, and the pellet was washed twice with ice-cold 70 % EtOH. DNA was resuspended in 50 µL 10 mM Tris (pH 8.0), and the concentration was measured using Qubit.

Genome Library Preparation, Sequencing and Bioinformatics
Forty phages were selected for whole genome sequencing. Precedence was given to phages [...] Sequel. De-multiplexed reads were assembled using the Hierarchical Genome Assembly Process 91 (HGAP4, SMRT Link v8.0.0). When needed, Sanger sequencing was used to close gaps in the assembled genomes. Open reading frames (ORFs) were predicted with PHANOTATE 92 and annotated using multiPhATE 93 with blastn against the NCBI virus database, blastp against pVOGs 94, PhAnToMe, and NCBI virus, and jackhmmer against the pVOGs database. Potential tRNAs in phage genomes were predicted using tRNAscan-SE v2.0.5 95. Phage termini were predicted using PhageTerm 96.

Phylogenetic Analysis
The Biopython 32 package was used within a conda environment to retrieve fully sequenced staphylococcal phage genomes deposited at GenBank as of June 2020 (n = 292) 19. Unverified cRNA or partial phage genomes were excluded from the analysis. The closest relative on NCBI was determined by average nucleotide identity (ANI) values as in 97,98. Distances between genomic sequences for phylogenomic analysis were calculated as described in 99,100, and the tree was visualized in iTOL 101.

Network Analysis
The network analysis was based on the host range matrix consisting of 123 bacterial hosts and 94 phages isolated from wastewater. A binary incidence matrix was generated from the data, in which infections are indicated as one and no interaction is marked as zero. Phage-resistant hosts (n = 63/123) were removed, and a bipartite network was generated using the R package igraph 102. In this network, phage-permissive bacteria (60) and respective phages [...]
All test statistics were calculated with R, using the base package stats. For data manipulation and plotting, dplyr 106 [...]
Connectance (C = I/M) 0.58
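The connectance reported for the host projection can be checked with a short calculation. The sketch below assumes that I is the number of realized host-host links and M the number of possible host pairs in the projection. This reading is an assumption, since the definitions are not given in the surviving text, but it agrees with the 1,770 strain pairs reported in the Results for 60 hosts.

```python
from math import comb


def connectance(realized_links, n_hosts):
    """Connectance C = I / M for an undirected host-host projection.

    I: realized links; M: all possible unordered host pairs (assumed reading).
    """
    possible_links = comb(n_hosts, 2)
    return realized_links / possible_links


# 1,030 host interactions among 60 phage-permissive hosts (values from the text).
c = connectance(1030, 60)
```

Under this assumption, 1,030 realized links out of 1,770 possible pairs give a connectance of about 0.58, matching the reported value.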