New insights into the biodiversity of coliphages in the intestine of poultry

Despite phages’ ubiquitous presence and great importance in shaping microbial communities, little is known about the diversity of specific phages in different ecological niches. Here, we isolated, sequenced, and characterized 38 Escherichia coli-infecting phages (coliphages) from poultry faeces to gain a better understanding of the coliphage diversity in the poultry intestine. All phages belonged to either the Siphoviridae or Myoviridae family and their genomes ranged between 44,324 and 173,384 bp, with a G+C content between 35.5 and 46.4%. Phylogenetic analysis was performed based on single “marker” genes; the terminase large subunit, portal protein, and exonucleases, as well as the full draft genomes. Single gene analysis resulted in six distinct clusters. Only minor differences were observed between the different phylogenetic analyses, including branch lengths and additional duplicate or triplicate subclustering. Cluster formation was according to genome size, G+C content and phage subfamily. Phylogenetic analysis based on the full genomes supported these clusters. Moreover, several of our Siphoviridae phages might represent a novel unclassified phage genus. This study allowed for identification of several novel coliphages and provides new insights to the coliphage diversity in the intestine of poultry. Great diversity was observed amongst the phages, while they were isolated from an otherwise similar ecosystem.

Phage morphological analysis. Based on a sequencing cut-off value of ≤ 95% nucleotide similarity, 18 coliphages were selected and subjected to TEM to determine phage morphology and confirm phage classification. Based on the morphological features, the phages were classified into the Caudovirales order and either the Siphoviridae family or the Myoviridae family. Analysis of the isolated Siphoviridae phages showed a long flexible non-contractile tail with a length varying between ~ 100 and ~ 200 nm and icosahedral heads with widths ranging from ~ 52 to ~ 77 nm ( Fig. 1a-h). Among the isolated Myoviridae phages a long straight contractile tail was observed with a tailed length varied between ~ 100 and ~ 120 nm, head widths ranging from ~ 65 to ~ 84 nm, and head lengths from ~ 60 to ~ 110 nm ( Fig. 1i-r). Taxonomic classification of each of the coliphages is shown in Table 1.
Phage genome sequence analysis and annotation. All 38 coliphages isolated in this study were characterized based on WGS data. An overview of the genomic characteristics and properties are listed in Table 1. According to FastQC parameters, good quality of the raw sequence data for all phages was confirmed. The phage genomes ranged in size between 44,324 and 173,384 bp, with a G+C content between 35.5 and 46.4%. Genomes smaller than 90,000 bp had a G+C content between 38.9 and 46.4%, whereas the larger genomes had a G+C content of 35.5-38%. For each coliphage, 72-275 putative CDSs were identified using both automatic and manual annotation. CDSs encoding the phage terminase small subunit, the phage terminase large subunit, the phage portal protein, and phage capsid and scaffold proteins were identified within all 38 coliphage genome sequences. They presented the same conserved genome structure with a general gene order: the terminase small subunit upstream from the terminase large subunit, the phage portal protein and two genes encoding phage capsid and scaffold proteins. In general, one phage terminase small subunit, one phage portal protein, and up to four phage capsid and scaffold proteins were found within each of the phage genomes. Besides, phage exonucleases were identified in all phage genomes. For each phage, one to three CDSs for exonucleases were found. No gene encoding for an integrase was found, indicating that these phages are strictly virulent/lytic phages. No known acquired resistance or virulence genes were detected in any of the 38 phage genomes.
Phage phylogeny and taxonomy. Taxonomic classification of the 38 isolated coliphages was performed through multiple WGS genome comparisons. These coliphages included 27 (71%) Siphoviridae coliphages and 11 (29%) Myoviridae coliphages. The Siphoviridae phages were compared with 146 published phages from this family. The Myoviridae phages were compared with 171 published Myoviridae phages. According to ICTV guidelines, phage family, subfamily and genus were predicted based on genome similarity. Results are shown in Table 1. All Siphoviridae phages belonged to the Tunavirinae subfamily, except for Phage 61. This phage was predicted to belong to the Tequintavirus genus, which do not have any ICTV subfamily. The ten phages, Phage 8,53,54, Fig. S1). The resulting phylogenetic analysis placed phages isolated in this study in the same five clusters, cluster A1-3, B and C, with a cut-off height of 0.81 ( Supplementary Fig. S2). One additional reference phage was found in cluster B and C, including the T1-like reference phage CEB_EC3a and the T5-like reference phage EPS7, respectively. For the Myoviridae phage analysis, 9,420 gene groups were included ( Supplementary Fig. S3). The resulting phylogeny placed phages isolated in this study in the same three www.nature.com/scientificreports/ clusters, cluster D, E, and F, with a cut-off height of 0.58 ( Supplementary Fig. S4). For cluster D, additionally 13 Felix01-like reference phages were found. In contrast to the WGS-based analysis, at a cut-off height of 0.39, all cluster F phages isolated in this study, were found in one subcluster with 10 T4-like reference phages. The degree of topological and branch length agreement between the different phylogenetic methods were compared (Supplementary Table S2). The coliphage diversity was further assessed based on three phage marker genes: the terminase large subunit and phage portal protein, and the phage exonuclease. Selected gene sequences from known phages were included for reference. Results are summarized in Table 1. For all three marker genes, cluster formation was in accordance with resulting clusters of the pan genome-and WGS-based phylogeny, cluster A-F, only with minor differences. Results based on the terminase large subunit analysis are shown below (Fig. 4).
For cluster A, all coliphages isolated in this study were found within same subclusters as for the WGS-based phylogeny except for Phage 63, which was found in the A2 subcluster instead of A1. Analysis based on the phage  Fig. S5). Analysis based on the exonuclease resulted in multiple clusters of cluster C and F, as phages from these clusters encoded 2 or 2-3 exonuclease genes, respectively ( Supplementary Fig. S6). Comparison of the cluster construction of the three single genes analysis showed only minor topological and branch length differences (Supplementary Table S2). Moreover, cluster construction was in accordance with phage subfamily defined based on the whole genome. Siphoviridae phages from cluster   Phage comparative genomics. Pan genome analysis of Siphoviridae and Myoviridae phages isolated in this study revealed that neither of the two groups had any core genes. Analysis of coliphage genomes from each of the six clusters, A-F, identified core genes (core and softcore) and accessory genes (shell and cloud). As cluster A phages had only five core genes (2% of the total genome), analysis of subclusters, A1, A2 and A3, were performed additionally. Results are summarized in Table 2. The pan genome included between 81 and 333 genes, and core genes constituted between 22 and 73% of the pan genome. The level of synteny and genomic rearrangement within each cluster or subcluster of related phages was assessed by genome comparison. Results are summarized in Table 2. Eight comparisons were performed, corresponding to the eight (sub)clusters, A1, A2, A3, B, C, D, E, and F resulting from the phage diversity analysis above ( Supplementary Fig. S7-S14). Genome comparison of the phages resulted in identification of local collinear blocks (LCBs), indicating homologues DNA regions shared by two or more genomes without sequence rearrangements. The LCBs comprised different modules of genes with different functions, including modules for DNA packaging, structural proteins, head and tail morphogenesis, and host cell lysis. Several modules comprised only hypothetical proteins with unknown function. The average level of conservation varied between the different type of genes.
Genes encoding the terminase large and small subunit, the major capsid protein, DNA primase, singlestranded protein, portal protein, recombinase, specific tail protein and holin were the most conserved genes between all phages, whereas genes with the lowest level of conservation included, specific tail fiber proteins, tail  between the phages within the clusters but also highlighted that re-arrangement and/or gain/loss of LCBs must have occurred at some point during the evolution of the phages. The region encoding the terminase large subunit and portal protein were present in a conserved region all genomes in all eight comparisons.

Discussion
In this study, 38 coliphages were isolated from poultry faecal material, sequenced and characterized. The high number of coliphage genomes included in the analysis allowed for a better understanding of both coliphage diversity in poultry and the global coliphage diversity within the Siphoviridae and Myoviridae families. However, one should be aware of the possible biases as the coliphages were isolated using two E. coli K12-derived host strains only and, as such, cannot be seen as the complete coliphage diversity. Both host strains are mutated for the FhuA (previously called TonA) gene, which is used as receptor for some phages, including Siphoviridae coliphages T1 and T5 17 . However, phage adsorption is not always restricted to one receptor. Accordingly, we were able to isolate a T5-like phage using the selected host strains. Also, as none of the host strains has the F pili, phages utilising a F pili encoded receptor for absorption, such as Inoviridae phages, will most likely not be isolated 17 .
The reasons why we could not isolate coliphages from the Podoviridae family remains obscure, however, this is in accordance with the finding of Korf et al. 15 , who also did not isolate any Podoviridae coliphages from poultry while they could isolate them from sewage and surface water. On the other hand, other studies have successfully isolated Podoviridae phages from poultry faeces [18][19][20][21] . Several factors might play a role in the type of phage being isolated, including culture and isolation method, host strain, and isolation source. In Podoviridae studies mentioned above, phages were isolated using a single or multiple E. coli host-strains and the DLA method was similar to this study. However, a notable difference is the type of host-strain used. In our study, laboratory strains were used whereas the Podoviridae study host-strains only included E. coli strains isolated from poultry. As our 11 Myoviridae phages were isolated from samples from 11 different farms, no correlation between phage isolated and geographical location (poultry farm) was found (data not shown). In accordance to the findings of Olsen 16 , the most prevalent genus of the Myoviridae and Siphoviridae phages isolated in our study was the Felixounavirus (45.5%) and the Hanrivervirus (33.3%), respectively. Both studies used E. coli K-12 derived laboratory strains as host strain for phage isolation. While in a collection of 50 coliphages isolated from surface water, manure, sewage, or animal faeces, 29 different E. coli host strains were used 15 . No Felixounavirus phages could be isolated and most Myoviridae phages belonged to the Tequatrovirus genus, followed by the Mosigvirus genus. Those two genera were also found in our study. www.nature.com/scientificreports/ Currently, the polyphasic approach is the most commonly used for bacterial classification 22,23 , and a similar approach combining multiple methods is recommended when working with phages 12 . However, studying phage taxonomy has proven challenging since no universal conserved marker gene, as the 16S rRNA gene used for bacteria, exists throughout all phage families. Several semi-conserved family-specific marker genes have been proposed as candidates to support taxonomical classification of tailed phages, including DNA packaging and head assembly genes 1,24,25 . Accordingly, in this study we used the terminase large subunit, portal protein and exonucleases as markers, and were able to determine the family and subfamily to all coliphages isolated in this study in accordance with the TEM-based and whole genome-based classification, respectively. Furthermore, the analysis based on the terminase large subunit and portal protein resulted in a clear distinction between the two Tevenvirinae clusters, indicating different genera (Mosigvirus and Tequatrovirus). Thus, single gene-based analysis provides good initial indication in which taxonomic cluster the phages belong to. However, a single gene does not provide a global view of the structural organization of the phage nor accounts for genomic rearrangements, mutations, and mosaicism. Moreover, in accordance with finding from previous studies the selected gene was not always detected in the phage genomes 24 , thus excluding these phages form classification. When using single genes, one should be aware of the possibility of multiple distinct variants of the same gene within one genome. Several of our phages encoded up to three different exonuclease genes. Depending on which gene variant used for analysis, the risk of "false" cluster formation and distance, and hereby a faulty classification at subfamily or genus level was present. Thus, for more comprehensive phage taxonomy, including genus classification, the single gene analysis should be accompanied by whole genome-based analysis as well as functional gene studies. When investigating the evolutionary relationship between phages studies have shown the advantage of combining different proteomic and comparative genomic approaches, including WGS data and well-characterized reference dataset, which take into account the effect of horizontal gene transfer (HGT) and recombination events on the phage genome evolution 7,26 .
In this study, genome-based phylogenetic and taxonomic analysis were performed in combination with traditional morphological examination of the phage using TEM. Through the genome-based analysis we identified a potential new Siphoviridae genus. The three unclassified A3 subcluster from this study clustered together with the Jahat_MG145 reference phage, which was a singleton 16 . Thus, we propose that this group of phages represents a new unclassified genus with currently four phages, including Phage 52, Phage 56_2 Phage 69, and Jahat_MG145.
We aimed to expand our knowledge on the coliphage diversity, and observed great diversity among these phages, while they were isolated from a similar ecosystem. The diversity was characterised by a great span in genome size (44.3-173.1 kb) and G+C content range (35.5-46.4%), as well as cluster-specific characteristics of the six phage clusters, A-F. Cluster B phages had the smallest genomes and lowest number of CDSs followed by phages from cluster A, D, C, and E/F. Similar to findings from other studies 16,27 , lower genome size seemed to be correlated with an increase in G+C content. The largest variation in genome size and number of CDSs were observed for the group of Myoviridae phages, whereas the largest variation in G+C content (7.2%) was overserved for the group of Siphoviridae phages. Notably, the Tequintavirus Phage 61 showed to be more similar to the group of Myoviridae compared to the other Siphoviridae phages based on the above-mentioned characteristics. When omitting Phage 61, G+C content variation for the Siphoviridae phages was only 2.9%. Phage G+C content has been shown to be correlated with the G+C content of the phage host 27,28 . Accordingly, differences in G+C content observed for the coliphages might reflect phage-host interactions and co-evolution with past and current host(s). Gene content, including number of core genes appeared to be associated with the cluster. Moreover, number of exonucleases encoded by the phage appeared to be cluster associated as well, as only phages from cluster C and cluster F encoded multiple exonucleases. Encoding multiple exonucleases could be a result of adaptive evolution conferring fitness advantage over other phages. However, it could just as well reflect some of the challenges to accurate phage genome annotation, including false negatives (undetected genes) and incorrect functional annotation 29,30 . Gene content variation has been shown to be related to recombination events resulting in acquisition or loss of gene(s) 31 . Through our comparative genomics analysis of related phages, LCBs with modules with varying level of gene conservation were identified and highlighted the different levels of heterogeneity between different phage clusters. Repeat regions were observed in several of the phage genomes and resulted in variation of LCBs. The presence of these regions should be considered when assessing the gene content variation, as these regions are shown to be prone to genome assembly mistakes, and as such, might represent false level of variation 32 . LCBs modules are also called for mosaic section and the two terms have been used interchangeably throughout history. One definition is that mosaic sections refer to patches of low nucleotide similarity when two similar phage genomes are compared, and that modules refer to exchangeable sections between two or more phages in the population 13 . The genome comparison showed the mosaic nature of the phage genomes, with modules with high level of conservation interspersed with mosaic sections. The mosaic sections could be acquired though HGT from other phages, which is thought to happen when phages are found in the same host. Most often this happens through phage co-infection or single-phage infection of a host that carries one or more prophages 33,34 . Moreover, phages have been shown to acquire genes from their host 35 . The comparative genomics approach hereby underlines the continuous evolution of phage genomes as well as the great phage diversity.
In conclusion, this study has identified a potential new coliphage genus and several new species and provides insight not only to the coliphage diversity of the intestine of poultry but the global coliphage diversity as well. Moreover, classification of phages isolated in this study brings us one step closer to a more refined taxonomic understanding of coliphages. Our comparative genomic analysis showed different levels of heterogeneity between different phage clusters and highlighted the mosaic nature of the phage genomes as well as the continuous evolution of phages in a single environment source. However, to fully understand the complexity and underlying mechanisms of the phage diversity further studies are needed.  36,37 . Briefly, the samples (5 g) were emulsified in Luria Bertani (LB) broth (Sigma-Aldrich, Saint Louis, MO, USA). The decanted supernatant obtained from each emulsion was enriched by the addition of two early-log phase host bacteria, E. coli K-12 derived laboratory strains C600 38 or K514 39 , a nonrestricting, modifying (r K −, m K +) derivative of strain C600. Suspensions were incubated overnight at 37 °C, with shaking (120 rpm) and were then centrifuged at 4,000 rpm for 30 min to pellet the cellular debris. The supernatant containing the phage(s) was re-centrifuged and filtered using a 0.45 µm membrane filter followed by a 0.2 µm Minisart Filter (Fisher Scientific, Waltham, MA, USA). The enriched phage suspensions were enumerated and tested for lytic activity on the host bacteria using the double-layer agar (DLA) technique 36,40,41 . Briefly, phage suspensions were serial diluted and spotted on an overlay of the respective host bacteria on solid LB medium supplemented with 0.8% agar and 0.5 mM CaCl 2 . A clear zone in the plate, a plaque, resulting from the lysis of host bacterial cells, indicated the presence of virulent coliphage(s). Samples with lytic activity against the indicator strain were further processed for single phage plaque isolation, including three rounds of plaque purification, and propagation. All phage lysates were stored at 4 °C until required.
Phage morphological analysis. The morphology of unique coliphages (≤ 95% nucleotide similarity) isolated in this study was investigated using transmission electron microscopy (TEM). Phage suspension was applied to the surface of Formvar carbon-coated grids, the phages were fixed using paraformaldehyde (PFA) (4% w/v), washed, and negatively stained with UrAC (1% w/v). After drying, grids were examined using a JEM-1400 Plus transmission electron microscope (JEOL, Benelux).
Genomic DNA extraction and sequencing. DNA  Phage genome sequence analysis and annotation. FastQC software (https ://www.bioin forma tics.babra ham.ac.uk/proje cts/fastq c/), version 0.11.3, was used for quality control validation of the raw reads sequence data. Low-quality sequences were excluded from further analysis. The raw reads were trimmed for quality, adaptor sequences were removed using default parameters. The sequence reads were de novo assembled using QIAGEN Bioinformatics CLC Genomic Workbench, version 11.0.1, using default settings, with minimum contig length changed to 250 bp. An overview of assembly statistics is provided in Supplementary Table S3. The assembled phage sequences were compared with phage homologues from the National Center for Biotechnology Information (NCBI) nucleotide database (https ://blast .ncbi.nlm.nih.gov/Blast .cgi) using the Basic Local Alignment Search Tool (BLAST) software 44 , and from the custom PHAge Search Tool Enhanced Release (PHASTER) phage database 45 . Newly assembled phage sequences were compared using both BLAST and PHASTER to identify unique and identical (> 95% nucleotide sequence similarity) phages. Assembled contigs were submitted to the ResFinder database, version 3.2 46 and the VirulenceFinder database, version 2.0 47 to identify any acquired antimicrobial resistance and virulence associated genes, respectively. By default, selected threshold for %ID was 90% and 60% for minimum length. All 15 databases for antimicrobial resistance genes were selected. The taxonomic group of E. coli bacteria was selected for VirulenceFinder. PHASTER and The Rapid Annotation using Subsystem Technology (RAST) server and the SEED viewer, version 2.0, were used for identification of CDSs and initial annotation of the phage genomes, including identification of the phage terminase large subunit 48 . Gene function of genes defined as "hypothetical protein" was predicted by comparison to homologue genes with defined functions in other related phage genomes. The G+C content of the phages was calculated using the SEED viewer.
Phage phylogeny and taxonomy. Multiple genome alignment of the WGS sequences was performed using Applied Maths BioNumerics software, version 7.6. According to the ICTV taxonomy guidelines, the 38 coliphages were classified into phage family, subfamily and genera based on nucleotide similarity to known Siphoviridae and Myoviridae coliphages. Known reference coliphages included were limited to isolates with complete genomes found in the ICTV database and the NCBI database (data of November 2019).
phage diversity. Phylogenetic trees based on the phage whole genome sequences were constructed using R, version 3.5.1 49 , for comparison of the coliphages with published Siphoviridae or Myoviridae reference phage genomes (accessed from the NCBI database). The trees were constructed using unweighted pair group method with arithmetic mean (UPGMA) from a distance matrix of binary distances calculated from either, gene presence/absence within the full genomes of the phages determined using Roary version 3.12.0 50 , Prokka, version 1.13.7 51 and prodigal, version 2.6.3 52 , or Kmer presence/absence (using 10 and 21mer) based on de novo assembled contigs, as calculated using a python script (Supplementary Script kmer.py).
Scientific RepoRtS | (2020) 10:15220 | https://doi.org/10.1038/s41598-020-72177-2 www.nature.com/scientificreports/ For phylogenetic analysis based on single marker gene, phage gene sequences were aligned using Clustal X, version 2.1 53 . A maximum likelihood phylogenetic trees (unrooted) was constructed and supported by bootstrap analysis (inferred from 1,000 replicates) with default substitution model (Tamura-Nei model) to assess the diversity of the coliphages using the phylogenetic and molecular evolutionary analyses (MEGA) software, version X 54 . The phylogenetic trees were based on the nucleotide sequences of the CDSs of the following genes: phage terminase large subunit, phage portal protein, or phage exonucleases. The reference genomes included, represented the best matching published sequences to the phages in this study (selected based on the BLAST max score) and core reference genomes for comparison.
The degree of topological and branch length agreement between the different phylogenetic methods and between the three marker genes was investigated using the R packages Analysis of Phylogenetics and Evolution (ape) 55 and phangorn 56 . Phage comparative genomics. A more detailed analysis of the most closely related coliphage genomes was carried out. Genomes were re-annotated using Prokka and pan genome analysis was carried out with Roary using script "roary-e-n-s-p 20-i90*.gff ", including identification of core genome, including core and softcore genes, and accessory genome, including shell and cloud genes. In order to investigate the level of synteny and genomic rearrangement, whole genome alignment and comparison of coliphages and related reference phages from each cluster or subcluster were performed with the Mauve software using progressiveMauve 57 with default parameters. No more than 19 of the most related phages were included in each comparison for simplification. Relatedness of the phages were based on percentage of nucleotide similarity and number of shared core genes. Reference genomes were included for annotation references.