Genomic structure and insertion sites of Helicobacter pylori prophages from various geographical origins

Helicobacter pylori genetic diversity is known to be influenced by mobile genomic elements. Here we focused on prophages, the least characterized mobile elements of H. pylori. We present the full genomic sequences, insertion sites and phylogenetic analysis of 28 prophages found in H. pylori isolates from patients of distinct disease types, ranging from gastritis to gastric cancer, and geographic origins, covering most continents. The genome sizes of these prophages range from 22.6–33.0 Kbp, consisting of 27–39 open reading frames. A 36.6% GC was found in prophages in contrast to 39% in H. pylori genome. Remarkably a conserved integration site was found in over 50% of the cases. Nearly 40% of the prophages harbored insertion sequences (IS) previously described in H. pylori. Tandem repeats were frequently found in the intergenic region between the prophage at the 3′ end and the bacterial gene. Furthermore, prophage genomes present a robust phylogeographic pattern, revealing four distinct clusters: one African, one Asian and two European prophage populations. Evidence of recombination was detected within the genome of some prophages, resulting in genome mosaics composed by different populations, which may yield additional H. pylori phenotypes.

Prophage genome characteristics. We were able to close the physical gaps between contigs in over 90% of the prophage genomes using PCR and Sanger sequencing. In most cases the prophage contigs were separated at insertion sequences, repetition zones and/or sequences showing homology with other bacterial genes. A prophage was considered intact if the size was larger than 20 Kb. According to this criterion, prophages were found to be intact in 23 of the 28 genomes (82%) ( Table 1). The other five genomes showed remnant prophages (Table S2, Supplementary Information) between 11.6 Kb and 19.8 Kb. Intact prophages were initially divided in several contigs (min 1-max 7) and have an average of 34 predicted genes (min 24, max 39), 28.7 Kb (min 22.6, max 33.0), and 36.7% GC, which is in line with other H. pylori prophages described 11,28 . The bacterial average GC percentage was 39.0%, suggesting horizontal gene transfer of the prophage region.
The gene content of intact prophages was similar to phage KHP30, a known complete phage with lytic cycle 30 . The intact prophage genomes had a rather similar sequence (Figures 1 and S1, Supplementary Information) with a reasonably conserved gene order (Tables S3 and S4, Supplementary Information) and in clear contrast with the host H. pylori, where the occurrence of genome rearrangement is well known 36 . Genome annotation of prophage genes produced with either RAST 37 or PHAST 38 revealed that most of the open reading frames (ORF) corresponded to hypothetical proteins, disclosing the diversity of prophage genes and the consequent difficulty in the annotation process. The annotation with Phages 1.0 (http://www.phantome.org/PhageSeed/Phage.cgi?page= phast) did not add more information and was not further considered.
The similarity of prophage genomes was also quantified as a heat-map ( Figure S2, Supplementary Information). This similarity matrix confirmed the percentages of bases which were identical. Only one prophage genome, strain Pt-4481-G, harbored a rearrangement ( Figure S3, Supplementary Information) (Table S4, Supplementary Information), different scenarios were observed: (i) one phage (Sw-C388-G) has lost the putative DNA primase and helicase, among other proteins of unknown function placed in the first half of the prophage genome; (ii) two phages (Sw-C520-G and Pt-259-G) most likely lost the second half of the prophage sequence; and (iii) another two phages (Is-3180-G and Pt-5303-G) most likely lost specific ORFs. Among the later, only Is-3180-G could be assembled, yielding less than 20 kb. Insertion Sequences. Insertion sequences (IS), comprised of two ORFs inserted into prophage genomes were found in 39.1% (9/23) of complete prophages (Table S5, Supplementary Information) classified (according to PST typing) as hpNEurope (n = 2), hpAfrica1 (n = 5) and hpEastAsia (n = 2), and in 50% (3/6) of remnant prophages classified as hpNEurope (n = 1), hpAfrica1 (n = 1) and hpSWEurope (n = 1). The complete prophages Uk-EN31-U, Uk-EN32-U, Pt-B92-G and Fr-GC43-A had IS605 inserted once in the first three cases and twice in the last case. Interestingly, in Fr-GC43-A one copy of IS605 was inverted in relation to the other copy ( Figure S4, Supplementary Information). IS605 was inverted in Uk-EN31-U, Uk-EN32-U and in one of the IS of Fr-GC43-A. The prophages Pt-228_99-G, Fr-ANT170-U and Fr-MEG235-U had two copies of ISHp608. The IS was inserted in a reverse order in relation to the other copy in Fr-ANT170-U, and twice with the same orientation in Pt-228_99-G. The third IS found was IS607 in genomes Pt-1293-U and Fr-B58-M.
Concerning remnant prophages, Sw-C388-G has the IS606 inserted at its 3′ end and the second ORF is again truncated in two. Finally, Is-3180-G carries ISHp608. The remnant prophage Pt-5303-G could not be completely assembly but ISHp608 was also found in a separate contig. Despite all of our efforts, we were not able to determine if this IS was inserted into the prophage genome or not.
IS were not always found at the same position in the prophage genomes, but prophages from strains of the same country of origin tended to present the same IS at same genome context (Table S5, Supplementary Information). Nevertheless, IS were present in most cases (9/13, 69%) immediately before DNA helicase (2/9), either before or after DNA primase (4/9), after structural protein (2/9), or after holin gene (1/9), which therefore could be considered as hotspots for IS in prophages.
The transposase genes from IS605 were inserted near the lysis cassette, as described for Mu-like phages 39 , DNA helicase and DNA primase. IS607 was located adjacent to DNA primase or a structural protein and ISHp608 near DNA primase, portal protein or structural protein. In a few cases IS were inserted into a coding sequence of a structural protein (Pt-1293-U and Pt-228_99-G) or a hypothetical protein (Pt-B92-G, Fr-ANT170-U and Fr-MEG235-U), which may impinge on transcription, and the prophage genes may be non-functional. Accordingly, IS do not appear to be randomly inserted into prophage genome. Our hypothesis is that the presence of IS within the prophage genome may inactivate the lytic cycle, benefiting the host.

Prophage insertion site.
Knowledge of the insertion site of prophages provides clues about ancient acquisition and vertical heritage. Accordingly, prophages at similar loci in different genomes can derive from a single ancestral prophage 20 . Furthermore, H. pylori prophage insertion sites have not been extensively studied before.
Prophage insertion site was mostly conserved among H. pylori PST populations. Interestingly, about 50% of the prophages enrolled in the present study and especially for the populations hpAfrica1 and hpNEurope are inserted between the same two genes, S-adenosylmethionine synthetase (synthesizes S-adenosylmethionine (AdoMet)), and UDP-3-O- [3-hydroxymyristoyl] glucosamine N-acyltransferase (metabolic pathway of lipid A). These two genes are usually contiguous in the H. pylori genome. Prophages classified as belonging to the hpEastAsia population, although represented in a very small number, appear to be inserted between genes coding for a competence protein ComGF and a putative outer membrane protein. Phages from hpSWEurope appear to be inserted at random locations (Table 1  The presence of tandem repeats at the 3′ end of the prophage insertion site was often verified for prophages integrated between S-adenosylmethionine synthetase and UDP-3-O- [3-hydroxymyristoyl] glucosamine N-acyltransferase (Table S6, Supplementary Information).

Prophage phylogenetic relationships.
To get insight into the genetic backbone of the identified prophages and to infer their phylogenetic relationships in the frame of the well-known H. pylori geographic distribution, all 23 intact genomic sequences (Table 1) as well as the publicly available complete genomes of six Helicobacter phages (India7, Cuz20, 1961 P, KHP30, KHP40 and phiHP33) and the outgroup H. acinonychis prophage, were selected for increasing genetic diversity and were analyzed. Figure 2 shows the phylogenetic inferences found for the complete prophage genome and the concatenated integrase and holin prophage genes (PST). We observed that the majority of the prophages gather by phylogeographic group, clustering accordingly to their population assigned by STRUCTURE [40][41][42] , in a similar fashion to what we described previously for the concatenated integrase and holin genes only 33 . However, evident exceptions were noted for some prophages, namely Pt-4472-G, Fr-G12-G, Fr-CG43-A, Pt-B92-G, Pt-21299R-U and Cuz20, which displayed a discrepant phylogeographic segregation from their PST classification, suggesting the existence of putative recombination events. For instance, Pt-4472-G prophage which, according to STRUCTURE analysis, belongs to hpSWEurope, appears to be a genomic mosaic composed of both hpSWEurope and hpAfrica1 populations. This is clearly evident in Figure 3A, where Pt-4472-G is > 90% similar to the latter in the genome central region, whereas the similarity to the hpSWEurope population reached values < 50%. Curiously, the regions where the opposite is observed (i.e., > 90% similarity to hpSWEurope) encompass both the integrase and holin genes that are used for PST classification. Another clear example of prophage recombination is exhibited by Pt-B92-G, which was PST-classified as hpAfrica1. Although most of its genome appears to be inherited from a hpAfrica1 or hpNEurope population, it displays a small middle region where similarities to the hpSWEurope population reached > 95% while is strikingly different from the remainder ( Figure 3B). Although less evident, we would also like to highlight two other interesting cases involving mosaicism between hpSWEurope and hpAfrica1 populations, namely Fr-G12-G and Pt-21299R-U. Despite the fact that the former was PST-classified as hpEastAsia, most of its genome was clearly inherited from a hpSWEurope population with the exception of a small 3′ -end region which is highly similar to an hpAfrica1 population (data not shown). To the contrary, most of the Pt-21299R-U genome is similar to hpAf-rica1, except for its 3′ -end which is highly similar (> 95%) to an hpSWEurope population (similarity to hpAfrica1 is as low as 40%). Interestingly, the holin gene is absent in this prophage and, in the integrase-involved region, both hpAfrica1 and hpSWEurope populations are almost equally represented (data not shown). Considering the huge genomic diversity observed among all prophage genomes, a precise identification of the location of the breakpoint regions for all of the described recombination events was not possible.

Discussion
Most phages identified in the present study, showed a remarkable genetic synteny among themselves ( Figure 1, Table S1, Supplementary Information). However, in comparison with phage KHP30, the synteny was punctuated by deletions of certain genes which were replaced by additional IS throughout the prophage genome. When prophages are present, the tendency in H. pylori is to have just one prophage per genome, which is in accordance with the small genome size of H. pylori, which is expected to have less neutral targets for prophage integration. Furthermore, H. pylori has slow bacterial growth, and a population at low density provides few resources for the production of virions, favoring lysogeny 21 .
Prophage ORFs were typically found in the same direction, which was opposite to that of the bacterial flanking genes. Concerning annotation most ORFs have an unknown function, as described for other species phages 43 . Although no known virulence gene was found in prophage genomes, the role of prophages in the virulence of H. pylori should not be immediately discarded. Frequently phages do not code for toxin genes, as they are not able of directly convert their host into a toxin producer 44 , but they can, however, indirectly modulate toxin production, such as TcdA and TcdB in Clostridium difficille 45 .
Considering the bacterium's ecological niche, H. pylori's persistence might be associated with both its broad genetic variability 46 and its capability of biofilm developing 47,48 . In both cases the presence of extracellular DNA (eDNA) is important, either as a source of DNA taken up by the naturally competent H. pylori, promoting recombination or contributing to biofilm development 48 . Apart from outer membrane vesicle shedding, cell lysis via spontaneous prophage induction might be a source of eDNA release, contributing to survival and to the wide genomic variability of H. pylori.
The IS found in the present study were previously described in H. pylori but outside a prophage context 49 . IS were described to be present in about one-third of a set of 238 independent isolates of H. pylori 50 . Bacterial IS of IS200/IS605 and IS607 family often encode a transposase (TnpA) and a protein of unknown function, TnpB, which were hypothesized to act as a methyltransferase 51 ; furthermore, orfB is also related to the Salmonella virulence gene gipA, a Salmonella prophage gene which enhances bacterial growth in Peyer's patches 52 .
As IS found within prophage sequences showed robust homology with those found in the H. pylori genome, it can be hypothesized that prophages mediate the transfer of IS, further contributing to the genome plasticity of H. pylori. In contrast, we cannot exclude that the transfer of IS otherwise from bacteria to prophages may also be feasible. Remarkably, IS have been described in other prophages, including cyanophage Ma-LMM01, specifically-infecting Microcystis aeruginosa and mediating the transfer of IS607 to the bacterial genome 53 . Besides prophages, IS605 is also associated with the cag pathogenicity island, dividing this island into two parts called cagI and cagII by insertion of one or two copies of IS605, providing intermediate phenotypes 54 . Prophage inactivation should be under selection because lytic cycle induction may kill the cell. Correspondingly, we find five remnant prophages that might result from these evolutionary dynamics, even though defective prophages can still provide an adaptive function to bacteria 20 . Recombination with incoming phages can also imprint a signal for purifying selection. In addition, IS present in prophages have been postulated to play a role in the inactivation and immobilization of incoming phages 55 .
We showed that the prophage insertion sites can be diverse in H. pylori genomes although with some common traits among H. pylori populations, as discussed below. All prophages from hpNEurope from the present study and from H. pylori Cuz20 and India7 genomes (available at the NCBI), as well as most prophages from hpAfrica1 populations, have the same genomic context, presenting the bacterial genes S-adenosylmethionine synthetase and UDP-3-O- [3-hydroxymyristoyl] glucosamine N-acyltransferase at the 5′ end and 3′ end, respectively. Interestingly, the prophages genomes integrated between these two loci usually present tandem repeats at the 3′ end, between the last prophage gene and the first bacterial gene after the prophage (Table S6, Supplementary Information), most often in noncoding regions. DNA tandem repeats or satellite DNA, are interor intragenic nucleotide sequences repeated two or more times in a head-to-tail manner. Because these repeat tracts are prone to strand-slippage replication and recombination events causing their copy number to increase or decrease, loci containing tandem repeats are hypermutable 56 . Tandem repeats may reversibly shut down or modulate the function of specific genes, allowing them to adapt to changing environments on short evolutionary time scales without an increased overall mutation rate. The environmental adaptability in H. pylori depends primarily on tandem repeat variations, which may cause gene phase switching. DNA tandem repeats may modulate gene expression affecting transcription initiation by modifying binding affinity of regulatory proteins (upstream of − 35 site) or altering the distance to promoter elements (between − 35 and − 10 sites), modifying the affinity of regulatory proteins or mRNA stability (between the transcriptional start and an ORF). The most frequent bacterial gene at the 5′ end of prophage codes for S-adenosylmethionine synthetase, which catalyzes the synthesis of AdoMet. AdoMet is an essential metabolic intermediate involved in many biochemical processes, such as a donor of methyl groups that allows DNA methylation (reviewed in ref. 57). Once DNA is methylated it may switch genes 6 .
All hpEastAsian prophages either described in the present study or found in the genomes of H. pylori YN4-84, UM038, FD430 and UM114 Asian strains (available at the NCBI) were inserted in the same genomic region, including the competence protein ComGF, which plays a role in transformation and DNA binding, at the 5′ end and a putative outer membrane protein at the 3′ end. The gene at the 5′ end is important for genetic variability of H. pylori 58 , while H. pylori outer membrane proteins are known to mediate adherence to gastric epithelium, and ultimately are associated with clinical outcome of the infection 59 . All things considered, the prophage insertion site may not be neutral for H. pylori gene expression and further studies are needed to evaluate the impact of prophage insertion on gene expression.
In general, the phylogenetic analysis of intact prophages presents clusters according to prophage population structure (exceptions are discussed below), confirming our previous results obtained by prophage sequence typing 33 . The prophage genomes cluster in four groups corresponding to the hpSWEurope, hpNEurope, hpAfrica1 and hpEastAsia phage populations. The strong phylogeographic signal of prophage genomes is in agreement with a model of co-evolution between the virus and its bacterial host. Indeed, prophages and bacteria are linked by a long history of co-evolution, but the genetic dimension of this co-evolution cannot be defined at present 14 . The phylogeographic clustering was in agreement with integration sites of prophages (discussed above). As suggested by others 60 , this could be explained by a vertical transmission of the phage rather than by random insertions which are common to prophages. Phage evolution is driven by a horizontal exchange of functional modules between more or less related phages, achieved by DNA recombination, explaining the genomic mosaicism among phages 61 . Recombination is a factor of rapid variability in H. pylori, which is among the most recombinogenic known pathogens 12 . In parallel, in the present study, phage genomes were shown to be prone to recombination events. Indeed several prophage genome mosaics were detected, involving, for the vast majority of the cases, both hpAfrica1 and hpSWEurope populations. This is not surprising considering that both populations were detected in the same geographic area. Nevertheless, most phage ORFs are of unknown function, so no assumptions can be performed regarding a putative impact of these recombination events on pathogenicity. These mosaic structures also highlight the need for a prudent use of the PST-based classification. In fact, although an agreement is observed for most of the cases, for the studied mosaic structures, for some of the studied mosaic structures only the integrase and/or the holin genes appeared to support the PST-based classification.
The remnant prophages encountered in the present study as well as in other H. pylori strains 32,62 and in non-pylori Helicobacter species 63 highlight an evolutionary scenario consistent with a prophage decay process during the complex interaction between H. pylori and the prophage. However, a model in which H. pylori strains from different geographical regions may have been infected by distinct phage lineages after the geographic separation of the bacterial host is also feasible 11 , but less likely due to the high genetic synteny between prophages from different geographic areas. Altogether, the integration at the same locus and a gene repertoire relatedness points to a vertical transmission, suggesting the so called pervasive domestication of prophages by the bacterial host which may drive bacterial adaptation 20 . Remarkably, the most divergent H. pylori prophage population (hpSWEurope), presented neither conserved loci for integration site nor IS.
This work not only provides a compendium of novel sequences, but also sets the stage for future studies aimed at better understanding the virus-host relationship. Results of the present study showed that prophages are more common in H. pylori than initially expected and that, in most cases, prophages appear to be intact, with a sequence size of over 20 Kb. Remarkably, we show for the first time that for phages classified as hpNEurope, hpAfrica and hpEastAsia, the insertion site appears to be preserved (Table 1). Furthermore, the phylogenetic analysis for a vast majority of phage genomes is similar to the phylogenetic analysis previously presented by our team 33 using two phage genes (integrase and holin), confirming our previous findings and reinforcing the hypothesis of co-evolution between prophages and H. pylori. Some recombinant phages were found, suggesting additional genetic diversity that hypothetically may provide H. pylori with advantageous phenotypes. Major challenges at present are to identify the function of prophage genes, to understand if the insertion site is neutral for the host and whether prophage presence plays a role in the adaptation of H. pylori to its host, or if prophage genes belonging to the lysis cassette are useful for biomedical applications, namely phage therapy.

Material and Methods
Bacteria and cell growth conditions. A total of 28 H. pylori strains carrying prophages were analyzed (Table S1, Supplementary Information). These included 15 strains isolated from patients with gastritis, nine from peptic ulcer patients, three from MALT patients and one from gastric cancer patient. The present study included strains from Portugal (n = 14), France (n = 6), Sweden (n = 4), UK (n = 2), Germany (n = 1) and Israel (n = 1). Prior to each assay, bacteria were grown in H. pylori selective medium (Biogerm, Portugal) at 37 °C For genomes sequenced in Portugal and Sweden, the yield and integrity of the purified DNA were then assessed through a Qubit assay (Quanti-it dsDNA Assay Kit, Broad Range; Lifetechnologies, Paisley, CA, USA) and agarose gel electrophoresis (0.7% gel), respectively. High-quality DNA samples were then applied to prepare Nextera XT Illumina paired-end libraries. These were subsequently subjected to cluster generation and paired-end sequencing (2 × 250 bp, 2 × 150 bp and 2 × 100 bp) by using the Illumina MiSeq (Portugal) and HiSeq 2500 (Sweden) platforms (Illumina Inc., San Diego, CA, USA), according to the manufacturer's instructions.
The number of passing filter reads obtained per sample ranged from 0.6-2.7 million reads. The FastQC (http:// www.bioinformatics.babraham.ac.uk/projects/fastqc/) and FASTX (http://hannonlab.cshl.edu/fastx_toolkit/) tools were applied to evaluate and improve the quality of the raw sequence data, respectively. Subsequently, high-quality reads were de novo assembled using Velvet (version 1.2.10) 64 (several assemblies using different k-mer sizes were run), where the best assembly was assumed as the one with the best cumulative ranks for N50, number of contigs/scaffolds, and length of the largest contig/scaffold. The obtained mean depth of coverage ranged from 135-to 195-fold. The final contigs/scaffolds were visually inspected (using Tablet 1.14.04.10) 65 and corrected.
For genomes sequenced in UK, quantification of DNA was assessed after DNA extraction with a Nanodrop spectrophotometer, as well as the Quant-iT DNA Assay Kit (Life Technologies) prior to sequencing. High-throughput genome sequencing was performed using a HiSeq 2500 machine (Illumina Inc.), and the 100 bp short read paired-end data was de novo assembled using Velvet (version 1.2.08) 64 . The VelvetOptimiser script (version 2.2.4) was run for all odd k-mer values from 21 to 99 (several assemblies using different k-mer sizes were run), with all program settings unchanged apart from a minimum output contig size set to 200 bp and the scaffolding option switched off.
All genomes were annotated using the RAST server (http://rast.nmpdr.org/) 37 , the NCBI Prokaryotic Genomes Annotation Pipeline version 2.3. and PHAST web server 38 . The respective trimmed reads were submitted to the Sequence Read Archive (SRA).
Assembly of prophage genomes. For prophage identification two strategies were taken. First, the PHAST web server 38 was used to identify putative prophages within contigs of each H. pylori genome. Second, MEGABLAST 66 was used to align the genome of H. pylori phage KHP30 or phiHP33 with the contigs of each sequenced H. pylori genome. PHAST analyses (http://phast.wishartlab.com/) applied over contigs allowed us to check homology, and to identify, annotate and graphically display prophage sequences, providing information on prophage completeness, categorized as either intact, incomplete, questionable or not detected. MEGABLAST was run using KHP30 or phiHP33 as reference since these prophages genomes were the most commonly found to be similar with the prophages detected by PHAST.
The MEGABLAST analysis results were particularly useful to determine which contigs were from phage origin and the order in which they probably appear. Based on this predicted contig order, primers flanking the contigs were designed, using primer3 v. 0.4.0 67 , to bridge gaps in the assembly in order to close the gaps (the gaps were of few bases to about five hundred bases). The PCR mix included Promega (Madison, WI, USA) buffer (1X), dNTPs (0.2 μ M), primers (0.5 μ M each), GoTaq polymerase (1.5 U), water to complete 25 μ l and DNA sample (25 to 50 ng). The PCR cycle was composed of a first cycle at 95 °C for 4 min, 35 cycles at 95 °C for 30 sec, 59 °C for 30 sec and 72 °C for 1 or 2 min. A last cycle at 72 °C for 7 min was applied. The PCR products were purified using MicroSpin S-400 or S-300HR columns (GE Healthcare, Velizy-Villacoublay, France) and directly sequenced on both strands using an external sequencing service provider (Eurofins Genomics, Regensburg, Germany, and Stabvida, Lisbon, Portugal). A multiple sequence alignment 68 was carried out using flanking parts of the contigs and the PCR sequenced product after assembly of the forward and reverse sequences.
The insertion sequences of the prophages were identified whenever the prophage 5′ and 3′ ends were contiguously flanked by bacterial genes in a contig. The last bacterial gene before the prophage sequence and the first bacterial gene after the prophage were identified as well as the homologous locus_tag for the reference genome H. pylori J99 36 . The presence of repeated sequences at prophage insertion sites was verified using Tandem Repeat Finder 69 (available at https://tandem.bu.edu/trf/trf.basic.submit.html).
Comparative genomic analyses of prophages. The assembled prophages were analyzed using PHAST to provide a first annotation. The annotation of prophage genomes was carried out further using Phages v. 1.0 (http://www.phantome.org/PhageSeed/Phage.cgi?page= phast), and RAST 37 . The annotation of coding sequences (CDS) found by the three different methods were compared.
The annotated prophages were aligned using the progressive Mauve algorithm software (version 2.3.1) 70 , to check the order of the CDS in the prophage genomes and the existence of a consensus sequence. In order to infer phylogenetic relationships among prophages, the intact genomes of the 23 prophages identified in the present study, were aligned using MAFFT version 7 71 together with other six phage Helicobacter genomes available at public databases (1961P, KHP30, KHP40, phiHP33, H. pylori India7, and H. pylori Cuz20) as well as with the H. acinonychis (accession number NC_008229.1) prophage used as an outgroup. A nucleotide Neighbour-joining phylogenomic tree was constructed using the MEGA (Molecular Evolutionary Genetics Analysis) 6.0 software 72 , with distances estimated using the Kimura two-parameter model 73 . Considering the huge genomic diversity observed among all prophage genomes as well as their different lengths, both complete and pairwise deletion options were used. While the former removes all sites containing missing data or alignment gaps before the distance estimations begin, in the pairwise-deletion, option sites are only removed during the analysis as the need arises. Branching significance was estimated using bootstrap confidence levels by randomly resampling the data 1,000 times with the referred evolutionary distance model.
To determine the population structure of prophages, we use prophage sequence typing (PST), as previously described 33 . Briefly, the multi-fasta file with the alignment of integrase and holin gene sequences was converted to the STRUCTURE 2.3.4 [40][41][42] program input file using xmfa2structure by X. Didelot and D. Falush (http://www. xavierdidelot.xtreemhost.com/clonalframe.htm). STRUCTURE was used to study the number of K populations using the admixture, performing runs in duplicate. In each run, a Markov Chain Monte Carlo (MCMC) of 10,000 iterations and a burn-in period of 10,000 iterations were chosen. The highest mean value of ln likelihood was compared for multiple runs of 2 ≤ K ≤ 6.
The existence of putative recombination phenomena within prophage genomes was first evaluated using the Recombination Detection Program version 4 (RDP4) 74 with default settings. RDP4 simultaneously applies different methods for detecting and characterizing individual recombination events that are evident within a sequence alignment without any need for predefined sets of non-recombinant reference sequences. SimPlot software (http://sray.med.som.jhmi.edu/SCRoftware/simplot/) was also used for characterizing with higher detail the genomic mosaicism of the identified recombinant prophages, as previously described for bacterial pathogens 75 . The similarity estimations were performed by using the Kimura two-parameter model with sliding window and step sizes that varied according to each recombinant genome. Data Availability. The genomes of the prophages are available with the accession numbers KX119174 to KX119206. The trimmed reads were submitted to the Sequence Read Archive (SRA), with the accession numbers SRP064706 to SRP064710, SRP071062, SRP071067, SRP071271, SRP071274, SRP071276 to SRP071280, SRP071282, SRP071284, SRP071289 to SRP071296, and SRP072438 to SRP072441.