Highly diverse flavobacterial phages isolated from North Sea spring blooms

It is generally recognized that phages are a mortality factor for their bacterial hosts. This could be particularly true in spring phytoplankton blooms, which are known to be closely followed by a highly specialized bacterial community. We hypothesized that phages modulate these dense heterotrophic bacteria successions following phytoplankton blooms. In this study, we focused on Flavobacteriia, because they are main responders during these blooms and have an important role in the degradation of polysaccharides. A cultivation-based approach was used, obtaining 44 lytic flavobacterial phages (flavophages), representing twelve new species from two viral realms. Taxonomic analysis allowed us to delineate ten new phage genera and ten new families, from which nine and four, respectively, had no previously cultivated representatives. Genomic analysis predicted various life styles and genomic replication strategies. A likely eukaryote-associated host habitat was reflected in the gene content of some of the flavophages. Detection in cellular metagenomes and by direct-plating showed that part of these phages were actively replicating in the environment during the 2018 spring bloom. Furthermore, CRISPR/Cas spacers and re-isolation during two consecutive years suggested that, at least part of the new flavophages are stable components of the microbial community in the North Sea. Together, our results indicate that these diverse flavophages have the potential to modulate their respective host populations.


INTRODUCTION
Marine bacteriophages outnumber their hosts by one order of magnitude in surface seawater and infect 10-45% of the bacterial cells at any given time [1][2][3]. They have a major impact on bacterioplankton dynamics. This impact can be density dependent [4] and take many forms. By lysing infected cells, viruses decrease the abundance of their host population, shifting the dominant bacterial population, and recycling the intracellular nutrients inside the same trophic level [5]. By expressing auxiliary metabolic genes, phages likely enhance the metabolic capabilities of the virocells [6,7]. By transferring pieces of host DNA, they can drive bacterial evolution [8]. By blocking superinfections with other phages [9], they can protect from immediate lysis. Potentially phages even influence carbon export to the deep ocean due to aggregation of cell debris resulted from cell lysis, called the viral shuttle [10,11]. Marine phages modulate not only their hosts, but also the diversity and function of whole ecosystems. This global impact is reflected in a high phage abundance [12] and diversity [13,14].
Phages are also well known for modulating bacterial communities in temperate coastal oceans. Here, the increase in temperature and solar radiation in spring induces the formation of phytoplankton blooms, which are often dominated by diatoms [15], and are globally important components of the marine carbon cycle. These ephemeral events release high amounts of organic matter, which fuels subsequent blooms of heterotrophic bacteria. Flavobacteriia belong to the main responders [16,17] and their increase is linked to the release of phytoplankton derived polysaccharides [18,19]. These polysaccharides are produced by microalgae as storage compounds, cell wall building blocks, and exudates [20][21][22]. This highly complex organic matter is likely converted by the Flavobacteriia to low molecular weight compounds and thus they are important for the carbon turnover during phytoplankton blooms [18,[23][24][25]. Recurrent genera like Polaribacter, Maribacter, and Tenacibaculum succeed each other in a highly dynamic fashion [19]. This bacterial succession is likely triggered by the availability and quality of substrates such as polysaccharides [18,19], yet it cannot be fully understood without considering mortality factors such as grazing by protists, and viral lysis. Grazing by protists is mostly size dependent [26], whereas viral lysis is highly host specific [27].
This includes several Cellulophaga phage isolates from the Baltic Sea, covering all phage morphotypes in the realm Duplodnaviria and also two different phage groups in the realm Monodnaviria [27,37]. Phages were also isolated for members of the genera Polaribacter [38], Flavobacterium [39,40], Croceibacter [41], or Nonlabens [42] and included eight tailed phages, one having a myoviral and the rest having a siphoviral morphology. However, the coverage of the class Flavobacteriia and the diversity of marine flavophages remains low. With the exception of the Cellulophaga phages, most of the other flavophages have only been briefly characterized in genome announcements.
In the context of a large project investigating bacterioplankton successions during North Sea spring bloom season, we isolated and characterized new flavophages, with the purpose of assessing their ecological impact and diversity. In total, more than 100 phage isolates were obtained, sequenced, annotated, and classified. This diverse collection is here presented in the context of virus and bacterioplankton abundances. Metagenomes obtained for Helgoland waters of different size fractions were mapped to all newly isolated flavophage genomes, testing the environmental relevance of the flavophage isolates. This study indicates that flavophages are indeed a mortality factor during spring blooms in temperate coastal seas. Furthermore it provides twelve novel phage-host systems of six genera of Flavobacteriia, doubling the number of known hosts.

MATERIAL AND METHODS Sampling campaigns
Surface water samples were taken off the island Helgoland at the long term ecological research station Kabeltonne (54°11.3′ N, 7°54.0′ E). The water depth was fluctuating from 7 to 10 m over the tidal cycle. In 2017, a weekly sampling was conducted over five weeks starting on March 14 (Julian days  and covered the beginning of a spring phytoplankton bloom. In 2018, a weekly sampling was conducted over eight weeks starting on March 29 and ending on May 24 (Julian days . It covered the full phytoplankton bloom. Additional measurements were performed to determine the chlorophyll concentration, total bacterial cell counts, absolute Bacteroidetes cell numbers and total virus abundances (SI file 1 text). Viruses were counted both by epifluorescence microscopy of SYBR Gold stained samples and by transmission electron microscopy (TEM) of uranyl acetate stained samples (SI file 1 text). The Bacteroidetes cell numbers were determined by 16S rRNA targeted fluorescence in situ hybridization (FISH) with specific probes (SI file 1 text).

Phage isolation
Phage isolates were obtained either after an intermediate liquid enrichment step or by direct plating on host lawns (SI file 1 Table 1). In both cases, seawater serially filtered through 10, 3, 0.2 µm polycarbonate pore size filters served as phage source. To ensure purity, three subsequent isolation rounds were performed by picking single phage plaques and using them as inoculum for new plaque assays. Then, a phage stock was prepared. For more details, see SI file 1 text. These phage stocks were then used for assessing phage morphology, host range and genome size, and for DNA extraction (SI file 1 text).

Determination of phage genomes
Phage DNA was extracted using the Wizard resin kit (Promega, Madison, USA) and eluted in TE buffer (after [43]). The DNA was sequenced on an Illumina HiSeq3000 applying the paired-end 2 × 150 bp read mode. For most Cellulophaga phages (except Ingeline) and Maribacter phages a ChIPseq (chromatin immunoprecipitation DNA-sequencing) library was prepared. This was done to generate a NGS library for ssDNA phages and due to library preparation issues with the Maribacter phages for Illumina and PacBio. For the other phages a DNA FS library was prepared. The raw reads were quality trimmed and checked, then assembled using SPAdes (v3.13.0, [44]) and Tadpole (v35.14, sourceforge.net/projects/bbmap/). Assembly quality was checked with Bandage [32]. The genome ends were predicted using PhageTerm [33], but not experimentally verified. For more details about all these procedures, see SI file 1 text.

Retrieval of related phage genomes and taxonomic assignment
Several publicly available datasets of cultivated and environmental phage genomes were queried for sequences related with the flavophages isolated in this study, in a multistep procedure (SI file 1 text). The datasets included GenBank Viral (ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/, downloaded on 17.03.2021), the GOV2 dataset, IMG/VR2, as well as further environmental datasets [13,[45][46][47][48][49][50]. To determine the relationship of the new flavophages and their relatives with taxa recognized by the International Committee of Taxonomy of Viruses (ICTV), we have added these phages to larger datasets, including ICTV recognized phages (for details, see SI file 1 text).
To determine the family-level classification of the flavophages, we used VirClust ( [51], www.virclust.icbm.de), ViPTree [52] and VICTOR [53] to calculate protein-based hierarchical clustering trees. For dsDNA flavophages, a VirClust hierarchical tree was first calculated for the isolates, their relatives, and the ICTV dataset. Based on this, a reduced dataset was compiled, from family level clades containing our flavophages. The reduced dsDNA flavophages dataset and the complete ssDNA flavophage dataset were further analyzed with VirClust, VipTree and VICTOR. The parameters for VirClust were: (i) protein clustering based on "evalue", after reciprocal BLASTP hits were removed if e-value >0.0001 and bitscore <50; (ii) hierarchical clustering based on protein clusters, agglomeration method "complete", 1000 bootstraps, tree cut at a distance of 0.9. The parameters for VICTOR were "amino acid" data type and the "d6" intergenomic distance formula. In addition to phylogenetic trees, VICTOR used the following predetermined distance thresholds to suggest taxon boundaries at subfamily (0.888940) and family (0.985225) level [53]. Furthermore, the web service of GRAViTy (http://gravity.cvr.gla.ac.uk, [54]) was used to determine the similarity of ssDNA phages and their relatives with other ssDNA viruses in the Baltimore Group II, Papillomaviridae and Polyomaviridae (VMRv34).
To determine the intra-familial relationships, smaller phage genome datasets corresponding to each family were analyzed using (i) nucleic acidbased intergenomic similarities calculated with VIRIDIC [55] and (ii) core protein phylogeny. The thresholds used for species and genus definition were 95% and 70% intergenomic similarity, respectively. The core protein analysis was conducted as follows: (i) core genes were calculated with the VirClust web tool [51], based on protein clusters calculated with the above parameters; (ii) duplicated proteins were removed; (iii) a multiple alignment of all concatenated core proteins was constructed with MUSCLE (v3.8.425, [56]) and (iv) used for the calculation of IQ-Trees with SH-aLRT [57] and ultrafast bootstrap values [58] using ModelFinder [59]. All phylogenetic trees were visualized using FigTree v1.4.4. [60], available at http://tree.bio.ed.ac.uk/software/figtree/).

Phage genome annotation
All phage genomes analyzed in this study were annotated using a custom bioinformatics pipeline described elsewhere [28], with modifications (SI file 1 text). The final protein annotations were evaluated manually.

Host assignment for environmental phage genomes
To determine potential hosts for the environmental phage genomes, several methods were used. First we did, a BLASTN [61] search (standard parameters) against the nucleotide collection (nr/nt, taxid:2, bacteria), for all phages and environmental contigs belonging to the newly defined viral families. The hit with the highest bitscore and annotated genes was chosen to indicate the host. Second, with the same genomes a BLASTN against the CRISPR/cas bacterial spacers from the metagenomic and isolate spacer database was run with standard settings using the IMG/VR website (https://img.jgi.doe.gov/cgi-bin/vr/main.cgi). Third, WIsH [62] was used to predict hosts (standard parameters) with the GEM metagenomic contig database [63] as host database.

Detection of flavophages and their hosts in Helgoland metagenomes by read mapping
The presence of flavophages, flavobacterial hosts and environmental phages in unassembled metagenomes from the North Sea and their relative abundances were determined by read mapping, using a custom bioinformatics pipeline [28]. Two datasets were analyzed: (i) cellular metagenomes (0.2-3 µm fraction) from the spring 2016 algal bloom (SI file 1 Table 2) and (ii) cellular metagenomes (0.2-3 µm, 3-10 µm and >10 µm fractions) from the spring 2018 algal bloom (SI file 1 Table 2).

N. Bartlau et al.
A bacterium was considered present when >60% of the genome was covered by reads with at least 95% identity. A phage was present when >75% of its genome was covered by reads with at least 90% identity. Relatives of a phage were present when >60% of its genome was covered by reads with at least 70% identity. The relative abundance of a phage/ host genome in a sample was calculated by the following formula: "number of bases at ≥ x% identity aligning to the genome/genome size in bases/library size in gigabases (Gb)".

Host 16S rRNA analysis
The genomes of bacteria from which phages were isolated were sequenced with Sequel I technology (Pacific Biosciences, Menlo Park, USA) (SI file 1 text). After genome assembly the quality was checked and 16S rRNA operons were retrieved using the MiGA online platform [64]. Additionally, the 16S rRNA gene from all the other bacterial strains was amplified and sequenced using the Sanger technology (see SI file 1 text).
For phylogenetic analysis, a neighbor joining tree with Jukes-Cantor correction and a RAxML tree (version 8, [65]) were calculated using ARB [66]. The reference data set Ref132 was used, with the termini filter and Capnocytophaga as outgroup [67]. Afterwards, a consensus tree was calculated.

CRISPR spacer search
CRISPR spacers and cas systems were identified in the host genomes by CRISPRCasFinder [68]. Extracted spacers were mapped with the Geneious Assembler to the flavophage genomes in highest sensitivity mode without trimming. Gaps were allowed up to 20% of the spacer and with a maximum size of 5, word length was 10, and a maximum of 50% mismatches per spacer was allowed. Gaps were counted as mismatches and only results up to 1 mismatch were considered for the phage assignment to the hosts used in this study.
The IMG/VR [48] web service was used to search for spacers targeting the flavophage isolates and the related environmental genomes. A BLASTN against the viral spacer database and the metagenome spacer database were run with standard parameters (e-value of 1e-5). Only hits with less than two mismatches were taken into account.

RESULTS
Spring phytoplankton blooms were monitored by chlorophyll a measurements (Fig. 1, SI file 1 Fig. 1). In 2018, the bloom had two chlorophyll a peaks, and it was more prominent than in 2017. Diatoms and green algae dominated the 2018 bloom (SI file 1 Fig. 2). During both blooms, bacterial cell numbers almost tripled, from~6.5 × 10 5 cells ml −1 to~2 × 10 6 cells ml −1 . The Bacteroidetes population showed a similar trend, as revealed by 16S rRNA FISH data (Fig. 1).  Most of the isolations were done by enrichment, and some by direct plating (asterisks). Phage detection in metagenomes was performed by read mapping, with a 90% read identity threshold for phages in the same species (full red circles) and 70% read identity threshold for related phages (dashed red circles). Alternating gray shading of isolates indicates phages belonging to the same family. by TEM were almost two orders of magnitude lower than those determined by SYBR Gold (Fig. 1), a typical phenomenon when comparing these two methods [69]. However, both methods showed a strong increase of viral particle numbers over the course of the bloom. All viruses counted by TEM were tailed, a strong indication that they were infecting bacteria or archaea, but not algae [70] (SI file 1 Fig. 3). The capsid size ranged between 54 and 61 nm, without any significant differences between the three time points (SI file 1 Fig. 4). The virus to bacteria ratio increased throughout the bloom, almost doubling ( Fig. 1, SI file 1 Table 4).

Flavophage isolation and classification
For phage enrichment, 23 bacterial strains previously isolated from algal blooms in the North Sea were used as potential hosts (SI file 1 Table 1). In 2017, we implemented a method for enriching flavophages on six host bacteria. A much larger and more diverse collection of 21 mostly recently isolated Flavobacteriia was used in 2018. A total of 108 phage isolates were obtained for 10 of the bacterial strains, either by direct plating or by enrichment (see Table 1) These were affiliated with the bacterial genera Polaribacter, Cellulophaga, Olleya, Tenacibaculum, Winogradskyella, and Maribacter (Fig. 2, SI file 1 Table 5).
Intergenomic similarities at the nucleic acid level allowed the grouping of the 108 flavophages into 44 strains (100% similarity threshold) and 12 species (95% similarity threshold) (SI file 2). A summary of the new phage species and their exemplar isolate phage, including their binomial name and isolation data, is found in Table 1 Virion morphology as determined by TEM showed that 11 of the exemplar phages were tailed (Fig. 3). According to the new megataxonomy of viruses, tailed phages belong to the realm Duplodnaviria, kingdom Heunggongvirae, phylum Uroviricota, class Caudoviricetes, order Caudovirales [71]. The only non-tailed, icosahedral phage was Omtje (Fig. 4). Further digestion experiments with different nucleases showed that Omtje has a ssDNA genome (SI file 1 Fig. 5).
Hierarchical clustering with VirClust placed the new dsDNA, tailed flavophages into 9 clades of similar rank with the current eleven Caudovirales families (Fig. 3, SI file 1 Figs. 6 and 7, SI file 3).
Similar clades were obtained with VICTOR and ViPTree (SI file 1 Figs. 8 and 9). These clades formed individual clusters when the VirClust tree was cut at a 0.9 distance threshold (Fig. 3), which was shown to delineate the majority of Caudovirales families [51]. In agreement, these clades were assigned to different subfamilies by the VICTOR analysis (SI file 1 Fig. 8), which, in the light of the current changes in the ICTV phage classification, represent different families [72]. Therefore, we are proposing that the 9 clades correspond to 9 new families, which we tentatively named "Helgolandviridae", "Duneviridae", "Winoviridae", "Molycolviridae", "Pachyviridae", "Pervagoviridae" "Assiduviridae", "Forsetiviridae" and "Aggregaviridae" (Fig. 3). The core proteins for each family consisted mainly of structural proteins ( Table 2). With the exception of few proteins from "Pachyviridae" and "Pervagoviridae", the core proteins were not shared between the families (they belonged to separate protein clusters, see Fig. 3 and Table 2). Four of the new families ("Aggregaviridae", "Forsetiviridae", "Molycolviridae" and "Helgolandviridae") were formed only from new flavophages and from environmental phage genomes (Fig. 3, SI file 1 Fig. 8). The remaining five families also contained previously cultivated phages, infecting bacteria from the genus Cellulophaga ("Pervagoviridae", "Pachyviridae" and "Assiduviridae") and Flavobacterium ("Duneviridae", "Winoviridae"). Part of the cultivated flavobacterial phages in the "Duneviridae" and "Winoviridae" are currently classified by ICTV in three genera in the families Siphoviridae and Myoviridae. However, because the Siphoviridae and Myoviridae families are based on phage morphologies, they are being slowly dissolved and split into new families, based on sequence data [73].
Hierarchical clustering using VirClust (Fig. 4, SI file 1 Figs. 16 and 17), ViPTree (SI file 1 Fig. 18), and VICTOR (SI file 1 Fig. 19) of a dataset including Omtje and all related and reference ssDNA phages showed that Omtje is clustering with previously isolated ssDNA phages infecting Cellulophaga, separately from other ssDNA phage families, the Microviridae, Inoviridae, and Plectroviridae. This was supported also by GRAViTy (SI file 5). Only one protein cluster was shared outside this cluster, with Flavobacterium phage FliP (Fig. 4), even when forming protein-superclusters based on HMM similarities (SI file 1 Fig. 20). We propose here that this cluster represents a new family, tentatively called here "Obscuriviridae". The placement of this family into higher taxonomic ranks, including the realm, remains to be determined in the future. Intergenomic similarity calculations (SI file 6), supported by core gene phylogenetic analysis (SI file 1 Fig. 21), indicate that Omtje forms a genus by itself, tentatively named here "Omtjevirus".
The genome size ranged between~38 kbp and~49 kbp. The G + C content was very low (28.9-29.7%). For Leef, PhageTerm predicted genome ends of type Cos 3′ (Table 3). Three types of structural proteins were identified in all phages, a major capsid protein, a tail tape measure protein, and a portal protein. Several genes for DNA replication, modification and nucleotide metabolism genes were found ( Fig. 5 and SI file 7). An N-acetylmuramidase, potentially functioning as an endolysin, was detected in Danklef and Leef, surrounded by transmembrane domain (TMD) containing proteins. All three phages encoded an integrase, and thus have the potential to undergo a temperate lifestyle. Leef had also a LuxR protein, which is a quorum-sensing dependent transcriptional activator, and a pectin lyase.
The genome size ranged from~6.5 kbp (Omtje) to~73 kbp (Calle). The G + C content varied between 31.2 and 38.1% ( Table 3). The few structural genes recognized were in agreement with the virion morphology (Fig. 5). For example, Ingeline and Fig. 3 VirClust hierarchical clustering of the new dsDNA flavophages and their relatives (reduced dataset), based on intergenomic distances calculated using the protein cluster content. For an extended tree, including the larger ICTV dataset, see SI file 3. 1. Hierarchical clustering tree. Two support values, selective inference (si, [99]) and approximately unbiased (au, [100]), are indicated at branching points (si/ au) only for the major clades (see SI file 1 Figs. 6 and 7 for all support values). The tree was cut into smaller viral genome clusters (VGCs) using a 0.9 distance threshold. Each VGC containing our flavophages was proposed here as a new family. Exceptions were made for the "Aggregaviridae" and "Forsetiviridae", for which only part of the VGC were included in the families, to exclude some genomes with lower support values. Each VGC is framed in a rectangle in 2 and 3. 2. Silhouette width, measures how related is a virus with other viruses in the same VGCs. Similarity to other VGCs is indicated by values closer to -1 (red). Similarity to viruses in the same VGC is indicated by values closer to 1 (green). 3. Distribution of the protein clusters (PCs) in the viral genomes. 4. Genome length (bps). 5. Fraction of proteins shared with other viruses (dark gray), based on protein assignment to PCs. 6. Virus names, with flavophages isolated in this study marked in red. 7. Genus assignment. Gray bars indicate already defined genera by the ICTV with the following names from top to bottom: Silviavirus, Labanvirus, Unahavirus, Pippivirus, Lillamyvirus, Muminvirus, Lillamyvirus, Helsingorvirus, and Incheonvirus. 8. Habitat. 9. Host association, as determined by: 9a) BlastN; 9b) CRISPR spacers; 9c) WIsH with host database GEM. 10. Life style genes. 11. TEM images of the new flavophages, uranyl acetate negative staining. Scale bar in each TEM image has 50 nm 12. Newly proposed families. Full name of environmental phages in "Winoviridae" and unclassified: AP013511.1_Uncultured_Mediterranean_phage_uvMED,_group_G21,_isolate__uvMED-CGR-C117A-MedDCM-OCT-S32-C49 and, AP013358.1_Uncultured_Mediterranean_phage_uvMED,_group_G1,_isolate__uvMED-CGR-U-MedDCM-OCT-S27-C45, respectively.
Nekkels had a tape measure protein, and a portal protein was found in all three-tailed phages. From the replication genes, we detected in Omtje a replication initiation protein specific for ssDNA phages, and in Calle a DNA polymerase A (Fig. 5).
Potential endolysins found were the mannosyl-glycoprotein endo-beta-N-acetylglucosaminidase in Omtje and the N-acetylmuramoyl-L-alanine-amidase in Calle. The latter had in its vicinity two proteins with a TMD. In Ingeline we annotated a unimolecular spanin. Nekkels encoded a glucoside-hydrolase of the GH19 family, with a peptidoglycan-binding domain. In its vicinity there were three proteins with three to six TMDs and one potential unimolecular spanin (Table 3, SI file 7).
Pectin and chondroitin/alginate lyase domains were found in Ingeline and Nekkels, as part of bigger proteins, also carrying signaling peptides. Nekkels also encoded a Yersinia outer protein X (YopX), a potential virulence factor against eukaryotes. Ingeline encoded an integrase and a LuxR gene, pointing toward a potential temperate life style. Calle had 20 tRNAs and one tmRNA gene (Fig. 5, SI file 7).
In these phages we recognized several structural, DNA replication and modification, and nucleotide metabolism genes (Fig. 5). With respect to replication, Molly and Colly had a DNA polymerase I gene, plus a helicase and a primase. In Gundel and Harreka we only found a DNA replication protein. In Peternella we found a MuA transposase, several structural genes similar to the Mu phage [74], and hybrid phage/host genome reads, indicating that Peternella is likely a transducing phage.
Potential endolysins were a N-acetylmuramoyl-L-alanineamidase in Molly and Peternella and a L-alanine-D-glutaminepeptidase in Gundel. Harreka had a glycoside-hydrolase of the GH19 family. In the genomic vicinity of the potential lysins, several proteins having 1-4 TMDs were found, including a holin in Peternella (Table 3, SI file 7).
Additional features of these phages were: (i) ten tRNA genes in Gundel; (ii) a relatively short (199 aa) zinc-dependent metallopeptidase, formed from a lipoprotein domain and the peptidase domain in Molly, and (iii) a YopX protein in Harreka (Fig. 5).

Environmental phage genomes
Six of the nine proposed new families ("Forsetiviridae", "Pachyviridae", "Pervagoviridae", "Winoviridae", "Helgolandviridae" and "Duneviridae") include members for whose genomes were assembled from environmental metagenomes (Fig. 3). We have briefly investigated which bacterial groups are potential hosts for these phages. Most of them gave BLASTN hits with a length between 74 and 5844 bases with bacterial genomes from the Bacteroidetes phylum (Fig. 3, SI file 1 Tables 6 and 7), likely due to the presence of prophages and horizontal gene transfer events. Some of the environmental viral genomes gave Bacteroidetes associated hits against the metagenome CRISPR spacer database (SI file 1 Table 8). The host prediction using WIsH supported the results by BLASTN and CRISPR spacer matching, and predicted members of Bacteroidetes as hosts for most of the environmental viral genomes. Some hosts were identified down to the family level, as Flavobacteriaceae (SI file 1 Table 9). Only phages in "Winoviridae", "Pervagoviridae" and "Helgolandviridae" had other families than Flavobacteriaceae as hosts, but all in the Bacteroidetes phylum (Fig. 3). In addition, several marine contigs from the "Helgolandviridae" contained Bacteroidetes Associated Carbohydrate-binding Often N-terminal (BACON) domains (SI file 1 Table 10). Together, these results suggest that most of the environmental viral genomes in the new families are infecting members of the phylum Bacteroidetes.
Integrase encoding genes were found on all environmental phages from the "Forsetiviridae", and some of the genomes from "Winoviridae", "Pachyviridae", "Helgolandviridae" and "Duneviridae". LuxR encoding genes were found in many genomes from "Helgolandviridae", "Duneviridae" and "Forsetiviridae", and one had also a transporter for the auto-inducer 2 (Fig. 3, SI file 1 Table 10). In the "Winoviridae", all phages encoded Mu-like structural proteins, including the MuD terminase, and two environmental viral genomes also encoded a MuA transposase.  Table 11). This shows that Freya, Danklef and Leef, or their relatives, have infected Polaribacter strains in the Helgoland sampling site before 2016, when the host Polaribacter strains were isolated. Spacers matching Nekkels were found in a metagenome of a Rhodophyta associated bacterial community (SI file 1 Table 7), showing the presence of Nekkels or its relatives in this habitat.
Read mapping for phages and hosts show presence in North Sea waters. To assess the presence and dynamics of flavophages in the North Sea, we mined by read mapping cellular metagenomes (>10 µm, 3-10 µm, 0.2-3 µm) from the 2016 and 2018 spring blooms. We found five of the new flavophages in the cellular metagenomes from the 2018 spring phytoplankton bloom, at three different time points (Table 4). The complete genomes of Freya, Harreka and Ingeline were covered by reads with 100% identity, signifying that these exact phage isolates were present in the environment. About 85% from Danklef's genome was covered with reads having 100% identity, indicating that close relatives of this phage (e.g., same species) were present. The genome of Leef was covered only 62% with reads of >70% identity, suggesting that more distant relatives (e.g., genus level) were detected. All phages and their relatives were exclusively found in the >3 µm and >10 µm metagenomes. The most abundant flavophages were Freya and Danklef, reaching 53.8 and 10.4 normalized genome coverage, respectively.
Further, we searched for the presence of the five flavophage hosts in the 2018 spring bloom (Table 4). Polaribacter sp. was found in the >10 µm and 0.2-3 µm fractions, at different time points Fig. 4 VirClust hierarchical clustering of the new ssDNA flavophages and their relatives, based on intergenomic distances calculated using the protein cluster content. 1. Hierarchical clustering tree. Two support values, selective inference (si [99]) and approximately unbiased (au [100]) are indicated at branching points (si/au) only for the major clades (see SI file 1 Figs. 16 and 17 for all support values). The tree was cut into smaller viral genome clusters (VGCs) using a 0.9 distance threshold. Each VGC is framed in a rectangle in 2 and 3. 2. Silhouette width, measures how related is a virus with other viruses in the same VGCs. Similarity to other VGCs is indicated by values closer to -1 (red). Similarity to viruses in the same VGC is indicated by values closer to 1 (green). 3. Distribution of the protein clusters (PCs) in the viral genomes. 4. Genome length (bps). 5. Fraction of proteins shared with other viruses (dark gray), based on protein assignment to PCs. 6. Virus names, with flavophages isolated in this study marked in red. 7. TEM image of the new flavophage, uranyl acetate negative staining. Scale bar in TEM image has 50 nm. 8. Family (ICTV). 9. Kingdom (ICTV). 10. Realm (ICTV). Lighter colors in columns 8-10 represent phages not recognized by the ICTV, but by publications. Table 2. Core genes of the newly defined families.

Family
Total core genes Core gene annotation Number of core genes used for phylogeny Core genes used for phylogeny "Forsetiviridae" 8 Tape measure protein, major capsid protein, portal protein, structural proteins (5) 7 Tape measure protein, major capsid protein, portal protein, structural proteins (4) "Winoviridae"
b Two structural proteins each are shared (belong to the same protein cluster) between Pachyviridae and Pevagoviridae.
c One hypothetical protein is shared (belongs to the same protein cluster) between Pachyviridae and Pevagoviridae.
Number of core genes with the same annotation are indicated in parentheses. Table 3. Genome ends were predicted with PhageTerm. Due to sequencing method and ambiguous results not for all phage genomes ends were determined (n.d.). For Omtje and Gundel, having no tail and a very pointy tail, respectively, the tail length and tail width measurements were not applicable (n.a.).
during the bloom, including the same two samples in which its phages (Freya, Danklef, and Leef) and their relatives were detected. At the beginning of the bloom, in April, significantly more phages than hosts were present in the >10 µm fraction. The phage/host genome ratio was 384 for Freya, and 74 for Danklef. One and a half months later, only relatives of Freya, Danklef, and Leef were found, the phage/host ratio being lower than 0.1. Discrimination between the different Polaribacter strains was not possible, due to their high similarity (>98% ANI). Olleya sp. was found in the 0.2-3 µm and >10 µm fractions, at low abundances and dates preceding the detection of its phage, Harreka. Cellulophaga sp., the host for Ingeline, was not found (SI file 1 Tables 12-16).

DISCUSSION
We have performed a cultivation-based assessment of the diversity of flavophages potentially modulating Flavobacteriia, a key group of heterotrophic bacteria, in the ecological context of spring bloom events in two consecutive years. In contrast to highthroughput viromics-based studies, our approach enabled us to link phages to their host species and even to strains. Highly diverse flavophages, belonging to two distinct viral realms were isolated. As a point of reference, a viral realm is the equivalent of a domain in the cellular world [71]. Furthermore, four of the ten new families, and nine of the ten new genera had no previously cultivated representatives. This study not only uncovers a novel flavophage diversity, but also, structures a substantial part of the known marine flavophage diversity into families. These novel flavophage families are relevant not only for the marine environment. Besides cultivated flavophages, six of the families also include environmental phages from marine, freshwater, wastewater, and soil samples, which most likely infect Bacteroidetes.
During the phylogenetic analysis we have worked closely with ICTV members, to ensure a good quality of the phage taxonomic affiliations. Two taxonomic proposals for the new defined taxa are being submitted, one for flavophages in Duplodnaviria and one for "Obscuriviridae".
Genomic analysis indicates that the new flavophages have various life styles and diverse replication strategy characteristics. Some families are dominated by potentially temperate phages, and others by potentially strictly lytic phages, as indicated by the presence/absence of integrases. Genome replication can take place (i) through long concatemers [75] (Gundel and Leef), (ii) replicative transposition [76] (Peternella), and (iii) the rolling circle mechanism [77] (Omtje).
The lysis mechanism in the new dsDNA flavophages likely follows the canonical holin/endolysins model, as suggested by the lack of membrane binding domains in the potential endolysins. Harreka and Nekkels do not encode easily recognizable lysis enzymes. Instead, they encode each a GH19. Usually, this hydrolase family is known for chitin degradation yet peptidoglycan may also be degraded [78]. A phage GH19 expressed in Escherichia coli caused cellular lysis [79]. Furthermore, in Harreka and Nekkels, the vicinity with potential holins, antiholins, and spanin, and the peptidoglycan-binding domain in Nekkels, suggest that the GH19 proteins of these two phages likely function as endolysins and degrade bacterial peptidoglycan. It cannot be excluded though that these enzymes have a dual function. Once released extracellularly due to lysis, the endolysins could degrade chitin, an abundant polysaccharide in the marine environment, produced for example by green-algae or copepods [80].
The presence of lyases in the genomes of Leef (pectin lyase), Ingeline (pectin lyase) and Nekkels (pectin and alginate lyases) suggests that their bacterial hosts, Polaribacter sp. and Cellulophaga sp., are surrounded by polysaccharides. In the marine JAB -do ma in-c ont ain ing pro tein zin c-fi nge r-co nta inin g dom ain AA A fam ily AT Pas e ant irep res sor pro tein L-A la-D -Gl u-p ept ida se like tRN As tRN As tRN A Yop X pro tein cal cin eur in-l ike pho sph oes era se sup erfa mil y dom ain acy l car ier pro tein BA CO N dom ain -co nta inin g pro tein VH S10 69 pro tein BA CO N dom ain -co nta inin g pro tein KilA -N dom ain RN A-b ind ing pro tein Yce l fam ily pro tein ferr ic upt ake reg ula tor bet a-la cta ma se sup erfa mil y dom ain Pcf K-li ke pro tein Pcf J-li ke pro tein ferr ic upt ake reg ula tor bet a-la cta ma se sup erfa mil y dom ain Pcf K-li ke pro tein Pcf J-li ke pro tein  Bacteria are not only able to degrade these polysaccharides, but also to produce them and form capsules or an extracellular matrix of biofilms [87][88][89]. In phages, such polysaccharide degrading enzymes are usually located on the tails, as part of proteins with multiple domains to reach the bacterial cell membrane. Another enzyme potentially degrading proteins in the extracellular matrix was the zinc-dependent metallopeptidase of Molly, a "Mollyvirus", which infected a Maribacter strain. By depolymerizing the extracellular matrix surrounding the cells, lyases and peptidases help the phage to reach the bacterial membranes for infection or allow the new progeny to escape the cell debris and the extracellular matrix [90,91]. It remains to be proven if phages carrying these enzymes contribute significantly to the degradation of algal excreted polysaccharides, as a byproduct of their quest to infect new bacterial cells.
Previous studies indicate that flavobacteriia can exhibit a surface-associated life style [92]. Our results paint a similar picture. For example, we detected phages for Polaribacter, Cellulophaga and Olleya, as well as the Polaribacter and Olleya genomes themselves in the particulate fraction of the cellular metagenomes. Therefore, it is likely that these bacteria are associated with particles, protists, phytoplankton or zooplankton. Spacers in metagenomes matching Nekkels, a Cellulophaga phage, suggest an association with red macro-algae (SI file 1 Table 7). An association with eukaryotes is also supported by the presence of YopX proteins in Harreka, infecting Olleya, and in all three "Assiduviridae" phages infecting Cellulophaga. The ability for an attached lifestyle is indicated by scanning electron microscopy images of Cellulophaga sp. HaHaR_3_176 cells showing high amounts of extracellular material (SI file 1 Fig. 23).
The presence of integrases and LuxR genes in phages from the "Helgolandviridae" and "Duneviridae" (Fig. 3 and SI file 1 Table 10) suggests that they likely respond to changes in the host cell densities, for example by switching from the temperate to the lytic cycle, as recently observed in Vibrio cholerae (pro)-phages [93,94]. This type of quorum-sensing response would make sense in a habitat in which the host cells can achieve high densities, as for example in association with planktonic organisms or particles.
All of our flavophages, except Molly and Colly, replicated successfully at different times throughout the 2018 phytoplankton bloom. The isolation by enrichment or direct-plating indicated their presence in the viral fraction, and the retrieval by read-mapping indicated a likely presence inside the host or association with surfaces. Calle and Nekkels, infecting Cellulophaga, although not detected in the cellular metagenomes, were present in high abundance (at least 100 plaque-forming units ml −1 ) in the viral fraction of our samples, as revealed by successful isolation by direct-plating without previous enrichment. This apparent gap between the presences in either the viral or cellular fraction, could be either explained by phage lysis prior the metagenome collection, or by the fact that their host habitat was not sampled for metagenomics. A potential explanation is that members of the genus Cellulophaga are known to grow predominantly on macro-algae [95]. The detection of Omtje, Peternella, and Gundel in enrichment cultures, but not by direct-plating or in the cellular metagenomes, indicate that their presence in the environment is low. Further investigations of the specific habitat of both phage and host are necessary to confirm these findings. For flavophages with temperate potential, the ratio between phage and host normalized read abundance can be used to predict their lytic or temperate state in the environmental samples. Cellulophaga, the host of Ingeline, was not detected throughout the spring bloom. However, because Ingeline was detected presumably in the particle fraction and thus might be inside its host, we can hypothesize that either its host was in a low abundance, or its genome was degraded under the phage influence. Either way, it points toward Ingeline being in a lytic cycle at the time of detection. For Freya, Danklef, and their relatives, the high phage to host genome ratios (as high as 454×) from April suggest that these phages were actively replicating and lysing their cells. In contrast, when they reappeared in May, these phages were not only three orders of magnitude less abundant than in April, but also approximately ten times less abundant than their host. This indicates that in May, only a small proportion of the Polaribacter cells were infected with relatives of Freya and Danklef, either in a lytic or a temperate state.
A temporal dynamics in phage populations is also reflected in the appearance of Freya and Gundel. Moreover, it is reminiscent of the boom and bust behavior previously observed with T4-like marine phages [96]. In addition, the read mapping results indicated a change in dominant phage genotypes, from the same species and even strain with Danklef and Freya in April, towards more distant relatives at the genus or even family level in May. This suggests a selection of resistant Polaribacter strains after the April infection and differs from the genotypic changes previously observed for marine phages, which were at the species level [97]. Persistence over longer times in the environment was shown for Polaribacter phages by the CRISPR/Cas spacers, and for Omtje and Ingeline, by their isolation in two consecutive years.
The North Sea flavophages isolated in this study display a very high taxonomic, genomic and life style diversity. They are a stable and active part of the microbial community in Helgoland waters. With their influence on the dynamics of their host populations by either lysis or with regard to specific bacterial population by substrate facilitation, they are modulating the carbon cycling in coastal shelf seas. The increase in bacterial numbers, reflected in the phage numbers and the ratio of phage to bacterial cells, indicate that phages actively replicate through the 2018 phytoplankton bloom, matching previous observations of the North Sea microbial community [98]. Our read mapping data indicate complex dynamics, which can now be further investigated with a large number of valuable novel phage-host systems obtained in this study.