Introduction

Marine cyanobacteria from the genus Synechococcus are major primary producers, contributing significantly to the global carbon cycle (Scanlan, 2003). Synechococcus strains are divided into distinct clades based on their phylogeny and physiology (Rocap et al., 2002; Fuller et al., 2003; Scanlan et al., 2009) and have different distribution patterns that correlate with distinctive environmental conditions (Zwirglmaier et al., 2008). Their genomes range in size from 2.2 to 2.9 Mbp with 50–67% of their genes (~1570) being common to all strains (Dufresne et al., 2008; Scanlan et al., 2009). A substantial number of additional genes are found in only a subset of the strains and are generally localized in hypervariable genomic islands (Palenik et al., 2003; Dufresne et al., 2008). Island-residing genes are often differentially expressed in response to environmental stressors (Coleman et al., 2006; Thompson et al., 2011a; Stuart et al., 2013; Tetu et al., 2013), as well as in response to viral infection (Lindell et al., 2007).

The oceans are teeming with viruses, with global estimates reaching 10^30 virus-like particles (Suttle, 2005). They are thought to have a major impact on host mortality (Fuhrman, 1999; Suttle, 2005) and evolution (Avrani et al., 2011; Marston et al., 2012), and on the biogeochemical cycling of matter (Suttle, 2005; Breitbart, 2012). Both broad and narrow host-range phages are known to exist in the oceans (Miller et al., 2003a; Sullivan et al., 2003; Suttle, 2005; Comeau et al., 2006). Narrow host-range (specialist) phages are assumed to infect highly abundant hosts, whereas broad host-range (generalist) phages are expected to be prevalent when abundances of different host types are low or variable (Woolhouse et al., 2001; Sullivan et al., 2003; Elena et al., 2009; Dekel-Bird et al., 2015). Thus, broad host-range phages infect genetically diverse hosts (Sullivan et al., 2003; Elena et al., 2009; Dekel-Bird et al., 2015) and can respond rapidly to increases in population abundance of a wide range of host types (Suttle, 2005).

Representatives of all three families of tailed double-stranded DNA viruses, the Myoviridae, the Podoviridae and the Siphoviridae, are known to infect marine cyanobacteria. Of these the T4-like cyanomyoviruses are well-known to have a broad host-range (Waterbury and Valois, 1993; Suttle and Chan, 1993; Sullivan et al., 2003; Millard and Mann, 2006; Wang and Chen, 2008; Dekel-Bird et al., 2015). Furthermore, they are considered to be one of the most abundant cyanophages in the oceans based on isolation studies and metagenomics surveys (Marston and Sallee, 2003; Sullivan et al., 2003; Suttle, 2005; Millard and Mann, 2006; Angly et al., 2006; Bench et al., 2007). They are named T4-like cyanophages owing to their morphological and genomic similarity to the enteric Escherichia coli T4 phage (Sullivan et al., 2005; Mann et al., 2005; Weigele et al., 2007; Millard et al., 2009; Sullivan et al., 2010). Their genomes are 174–253 kb in size with 215–334 genes, of which 40–50 are homologous to T4 replication and virion structural genes. In addition, they carry a number of auxiliary metabolic genes that are of (cyano)bacterial origin, such as photosynthesis, carbon metabolism and phosphorus utilization genes (Mann et al., 2005; Sullivan et al., 2005; Weigele et al., 2007; Millard et al., 2009; Sullivan et al., 2010) that are likely to be highly relevant for phage fitness in the environment.

The temporal transcriptional events during phage infection of a single host are known to be highly reproducible (Pène and Uzan, 2000; Uzan, 2009). However, it is currently not known whether the transcriptional program of broad host-range phages is differentially tailored for infection of distinct hosts or whether the same program is maintained irrespective of the host infected. In addition, little is known about the transcriptional response of different hosts to infection by the same phage. This is of particular interest in light of the expectation that generalism comes with tradeoffs and thus differential fitness during infection of diverse hosts (Woolhouse et al., 2001; Elena et al., 2009). Furthermore, there is a growing understanding that different bacteria employ distinct defense strategies in response to phage infection (reviewed in Labrie et al., 2010; Stern and Sorek, 2011) and that these defense mechanisms are often encoded by strain-specific genes (Jabbar and Snyder, 1984; Riede and Eschbach, 1986; Mosig et al., 1997; Rifat et al., 2008).

The transcriptional program of the ecologically important T4-like cyanophages has yet to be investigated, although studies limited to a few genes have been carried out (Clokie and Mann, 2006; Thompson et al., 2011b). Phage transcriptional programs are generally orchestrated in three expression classes of early, middle and late genes, corresponding to host take-over, replication and virion morphogenesis, respectively (Guttman et al., 2005; Staley et al., 2007). In T4, this is regulated by the sequential modification of the host RNA polymerase and associated sigma factor, leading to the consecutive recognition of different promoter sequences: early T4 promoters resemble the major E. coli promoters recognized by the primary host sigma-70 (σ70) factor; middle promoters have a distinctive motif that is recognized with the aid of two phage proteins, AsiA and MotA; and expression from late promoters requires a phage-encoded sigma factor, gp55 (reviewed in Mosig and Eiserling, 2006). The lack of T4-like middle promoters and the motA and/or asiA genes led to the suggestion that a variety of T4-like phages, including marine cyanophages, have a simpler expression program consisting of only two modules (Desplats et al., 2002; Miller et al., 2003a; Mann et al., 2005; Clokie et al., 2006). However, this hypothesis has not yet been tested.

In this study we investigated the infection process and transcriptional program of Syn9, a broad host-range T4-like cyanophage (Waterbury and Valois, 1993), during infection of three different Synechococcus hosts and assessed their transcriptional response to infection using both RNA-seq and microarray analyses. The three Synechococcus hosts, WH7803, WH8102 and WH8109, are phylogenetically distinct and occupy different oceanic environments (Zwirglmaier et al., 2008; Scanlan et al., 2009). We found that the transcriptional program of the phage is nearly identical irrespective of the host infected and revealed a regulatory program that significantly deviates from the current paradigm for T4-like phages. This transcriptional and regulatory program appears to be common among T4-like cyanophages as a similar infection process was observed for an additional phage, P-TIM40, during infection of Prochlorococcus NATL2A, and similar promoter motifs were identified in numerous sequenced cyanophage genomes and in metagenomes. Our results indicate that the well-known mode of regulation in T4 cannot be taken as the rule among the broader family of T4-like phages. In contrast to the near-identical transcriptional program of the Syn9 phage, transcriptional responses of the three Synechococcus hosts showed considerable heterogeneity, with a large number of responsive genes located in hypervariable genomic islands. This likely reflects different strategies in the hosts’ attempts to cope with infection.

Materials and methods

Culture growth and experimental design

Synechococcus strains were grown in an artificial seawater-based medium (Wyman et al., 1985) with modifications as described in Lindell et al., 1998, at 22 °C under cool white light with a 14 : 10 h light:dark cycle at an intensity of 30 μmol photon m^−2 s^−1 during the light period. Generation times were: 2.1±0.1 days for WH7803 and WH8102 and 1.4±0.1 days for WH8109. The Syn9 phage was isolated from the Atlantic Ocean (Waterbury and Valois, 1993) and has 232 genes in its genome, of which ~20% are homologs of T4 genes (Weigele et al., 2007). Syn9 was propagated on Synechococcus WH8102. Cultures (800 ml) were infected at a multiplicity of infection of 2–3 ~2 h after ‘lights on’ in the morning. The infection cycle was complete 4 h before darkness. Samples were collected at different time points during the lytic cycle (5 min and 0.5, 1, 1.5, 2, 3, 4, 5, 6, 8, 10 h after phage addition), with samples from the appropriate time points used to assess host and phage RNA transcription, host DNA degradation, phage DNA replication and release of infective phage. Three independent infection experiments were carried out on each of the Synechococcus hosts for all analyses. Where relevant, results were compared with uninfected control cultures.

Prochlorococcus NATL2A was grown in the seawater-based medium Pro99 (Moore et al., 2007) with Mediterranean seawater under the same conditions described above. Generation time under these conditions was 1.5±0.1 days. The P-TIM40 phage was isolated from the Pacific Ocean and its genome was sequenced as part of this study (see Supplementary Text). Sequencing revealed a 188.6 kb genome and a GC content of 40.7%. Its genome encodes 235 predicted open-reading frames, 141 of which are homologous to Syn9, and one tRNA gene. P-TIM40 contains all 38 T4-like core genes as well as the 25 additional T4-like cyanophage core genes as defined by Sullivan et al., 2010. Auxiliary metabolic genes found in this phage include the psbA, hli, PTOX, phoH, talC, CP12, mazG, hsp20 and tryptophan halogenase genes. Prochlorococcus NATL2A (600 ml) was infected at a multiplicity of infection of two in triplicate experiments and the transcriptional program of the P-TIM40 phage was investigated during the first 3 h of the 10 h latent period, with samples for RNA analysis taken at 0.5, 1, 2 and 3 h after phage addition.

Quantification of infective phages and phage and host genomic DNA

Infective phages in the extracellular medium were quantified using the plaque assay. Samples were filtered over 0.2 μm sterile syringe filters (Tuffryn HT), and serial dilutions of the phage-containing filtrate were plated on lawns of the host in pour-plates of 0.28% ultra-pure low melting point agarose (Invitrogen, Carlsbad, CA, USA) in artificial seawater-based medium, as described in Moore et al., 2007. Extracellular phage DNA was quantified using qPCR for the g20 gene (see Supplementary Text) on samples diluted 100-fold in 10 mM Tris (pH 8).
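As an illustration of how a plaque count from a serial dilution is converted to a titer of infective phage, a minimal sketch follows. The plaque count, dilution and plated volume are invented for the example, and the function name is hypothetical; the study itself does not report these intermediate values.

```python
def titer_pfu_per_ml(plaque_count, dilution, volume_plated_ml):
    """Convert a plaque count to plaque-forming units (PFU) per ml.

    dilution is the fraction of the original sample present in the plated
    dilution (for example 1e-6 for a million-fold dilution).
    """
    if plaque_count < 0 or dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("count must be >= 0; dilution and volume must be > 0")
    return plaque_count / (dilution * volume_plated_ml)


# Hypothetical plate: 42 plaques from 0.1 ml of a 1e-6 dilution.
example_titer = titer_pfu_per_ml(42, 1e-6, 0.1)
print(f"{example_titer:.2e} PFU/ml")
```

In practice only plates with a countable number of well-separated plaques would be used for this calculation.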

Intracellular phage and Synechococcus DNA was quantified after collection of cells on 0.2 μm pore-sized polycarbonate filters (GE, Fairfield, CT, USA) by filtration (7–10 inch Hg). Cells on the filters were washed three times with sterile seawater and once with 3-ml preservation solution (10 mM Tris, 100 mM EDTA, 0.5 M NaCl; pH 8) and frozen at −80 °C. A heat lysis method was used to extract DNA from Synechococcus cells (Zinser et al., 2006). In brief, the polycarbonate filter with Synechococcus cells was immersed in 10 mM Tris pH 8 and agitated in a mini-bead beater for 2 min at 5000 rpm without beads. The sample was removed from the filter shards and heated at 95 °C for 15 min and used in triplicate qPCR reactions. The portal protein gene (g20) was amplified for Syn9 DNA and the rbcL gene was amplified for Synechococcus DNA using the TaqMan qPCR procedure. See Supplementary Text for details of the PCR procedures. Standard curves were produced using linear plasmid DNA containing the phage gene and genomic DNA for the host gene. See Supplementary Table S1 for all qPCR primer sequences and Universal Probe Library (Roche Diagnostics, Basel, Switzerland) probe identities used in this study.
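Quantification against a standard curve can be sketched as follows. This is an illustrative outline only: the dilution series, threshold cycle (Ct) values and function names are assumptions made for the example, and the actual PCR procedures are those given in the Supplementary Text.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies for a Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series of a linearized plasmid standard.
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [31.6, 28.3, 25.0, 21.7, 18.4]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

# Hypothetical triplicate reactions from one sample, averaged after conversion.
sample_ct = [25.1, 24.9, 25.0]
sample_copies = [copies_from_ct(ct, slope, intercept) for ct in sample_ct]
mean_copies = sum(sample_copies) / len(sample_copies)
print(slope, intercept, mean_copies)
```

Expressing each time point as a percentage of the maximal value for that gene, as in Figure 1c, is then a simple rescaling of these estimates.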

RNA extraction

Cultures for RNA extraction (50–75 ml of each infected or control culture) were collected on 0.4-μm pore-sized polycarbonate filters (GE) by filtration (10 inch Hg) and snap frozen in liquid nitrogen in 2 ml PGTX buffer and stored at −80 °C. Total RNA was extracted using the PGTX 95 method (Pinto et al., 2009) with minor modifications. Total nucleic acids were quantified based on absorption at 260 nm and RNA integrity was verified by gel electrophoresis. DNA was removed by DNase I digestion using the Turbo DNA-free kit (Ambion, Foster City, CA, USA). See Supplementary Text for further details of these procedures.

Microarray experimentation

An Affymetrix high-density custom-made tiling array was utilized for determining gene expression of the three Synechococcus hosts during infection by Syn9 and of the P-TIM40 phage during infection of Prochlorococcus. This array, SYN-PHG, contains perfect match probes on both strands for the three Synechococcus hosts at a density of every 35 nt and for the Syn9 and P-TIM40 phages at a density of every 17 nt (see Supplementary Text for more details on the microarray). Three biological replicates were analyzed for each time point after phage addition. Time points analyzed for Syn9 infection of the three Synechococcus strains were 2–5 min and 0.5, 1, 1.5, 2, 3, 4 h after phage addition, and were 0.5, 1, 2, 3 h after phage addition for P-TIM40 infection of Prochlorococcus NATL2A. Synthesis of cDNA, labeling, hybridization, staining and scanning were carried out according to Affymetrix protocols for Prokaryotic target RNA (GeneChip Expression Analysis Technical Manual, Prokaryotic; 702232, Rev.3) using 2 μg of total RNA, and were carried out at the Center for Genomic Technologies, Hebrew University of Jerusalem, Israel.

Data were analyzed using the probe signal intensities from the CEL files and the probe sets defined in the CDF library file using the R statistical language with Bioconductor add-on packages. The CEL file contains the signal intensities of all probes on the array for a particular sample, whereas the CDF library file defines the sets of probes that are positioned within each open-reading frame and is used for calculating expression summaries for each gene. Following previous analyses (Lindell et al., 2007), background signal correction, normalization and summarization were performed by Robust Multiarray Averaging from the affy package. Differential expression between infected and uninfected controls was statistically evaluated by a linear model comparing the time-resolved signal intensities from the arrays with uninfected hosts to those from arrays with infected hosts (Limma package; Smyth, 2005). P-values were adjusted for multiple testing and converted to false-discovery rates (q-values) by the Benjamini–Hochberg approach. A detailed description of the microarray design, experimental execution and analysis is provided in the Supplementary Text.
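The Benjamini–Hochberg step of this analysis can be illustrated in isolation. The sketch below is written in Python for illustration and is not the R/Bioconductor code used in the study; the P-values are invented and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (q-values).

    Each P-value is scaled by m / rank, where rank is its position in the
    ascending ordering, and monotonicity is enforced by taking a running
    minimum from the largest P-value downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical P-values for four genes.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

The q-values are returned in the same order as the input P-values, so they can be matched back to genes directly.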

RNA-seq experimentation

RNA-seq analysis was used for the investigation of the Syn9 transcriptome during infection of the three Synechococcus hosts. The cDNA libraries were prepared using Illumina’s TruSeq RNA Sample Preparation Kit without the polyA isolation step. Whole-transcriptome libraries were prepared for two replicates of each infected host at four time points (5 min, 0.5, 1, 2 h after phage addition), and for the pooled triplicates of each uninfected host (control) at three time points (0.5, 1, 2 h after the beginning of the experiment). Complementary DNA libraries for bisulfite strand-specific sequencing were prepared as previously described (Edelheit et al., 2013). Libraries were prepared for pooled replicates of the infected Synechococcus sp. WH8109 host at three time points (0.5, 1, 2 h) and for uninfected Synechococcus sp. WH8109 at the 0.5 h time point. The 5′-end sequencing libraries were prepared as previously described (Wurtzel et al., 2012a, 2012b), with some modifications, and are briefly described in the Supplementary Text. RNA-seq libraries were sequenced using the Illumina HiSeq platform at the Weizmann Institute.

A detailed description of the analyses of the RNA-sequencing results is provided in the Supplementary Text. In brief, the RNA-seq reads were aligned separately for each sample to the reference genomes, and analyzed according to the library type. Reads from the whole-transcriptome library were counted for each gene, normalized to host rnpB transcript levels and to gene length. The normalized data were used for clustering the phage genes by their expression profile and are shown in all figures unless otherwise stated. The RPKM normalization was not used for phage gene expression because when most of the genes change their expression at a given condition, the RPKM—which calculates the expression of each gene relative to the sum of all mapped reads—fails to represent the actual changes in expression; therefore, an external reference point was needed. Normalization with a house-keeping gene has been shown to perform well for RNA-seq read counts (Bullard et al., 2010), and the rnpB gene was used for this purpose as the transcript levels of this gene were stable over the course of infection (Supplementary Figure S1), as has been shown previously during phage infection of Prochlorococcus (Lindell et al., 2007). Transcript levels of this gene are stable across a variety of physiological conditions in cyanobacteria (for example see Mitschke et al., 2011), and hence this gene is often used as an internal control for cyanobacterial expression studies (for examples see references Holtzendorff et al., 2001; Muro-Pastor et al., 2001; Stork et al., 2005; Puerta-Fernández and Vioque, 2011). Reads from the bisulfite strand-specific libraries were used to estimate antisense transcripts (asRNA) length according to the coverage they produced for the phage genome. 
Reads from the 5′-end libraries were used for determining transcriptional start sites across the phage genome, and were analyzed as previously described (Wurtzel et al., 2012a) with some minor modifications (Supplementary Text). Regions upstream of these transcriptional start sites were scanned for promoter motifs.
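The reasoning for preferring an external reference over RPKM can be made concrete with a toy example. The read counts and gene lengths below are invented, the gene names are placeholders, and the exact form of the reference normalization (length-normalized counts divided by rnpB counts) is an assumption for illustration; the formula actually used is described in the Supplementary Text.

```python
def rpkm(count, length_bp, total_mapped):
    """Reads per kilobase of gene per million mapped reads."""
    return count / (length_bp / 1e3) / (total_mapped / 1e6)


def reference_normalized(count, length_bp, reference_count):
    """Length-normalized count relative to a reference transcript (assumed form)."""
    return count / (length_bp / 1e3) / reference_count


# Hypothetical libraries at two time points. Relative to rnpB, both phage
# genes increase four-fold, but phage reads dominate both libraries.
libraries = {
    "t1": {"rnpB": 200, "geneA": 1000, "geneB": 3000},
    "t2": {"rnpB": 50, "geneA": 1000, "geneB": 3000},
}
gene_lengths = {"geneA": 1000, "geneB": 1500}


def fold_change(gene, method):
    """Fold change of a gene between t1 and t2 under the chosen normalization."""
    values = []
    for time_point in ("t1", "t2"):
        counts = libraries[time_point]
        if method == "rpkm":
            total = sum(counts.values())
            values.append(rpkm(counts[gene], gene_lengths[gene], total))
        else:
            values.append(
                reference_normalized(counts[gene], gene_lengths[gene], counts["rnpB"])
            )
    return values[1] / values[0]


fc_rpkm = fold_change("geneA", "rpkm")
fc_reference = fold_change("geneA", "reference")
print(fc_rpkm, fc_reference)
```

In this toy case RPKM reports almost no change for geneA because the total read count is itself dominated by the genes that changed, whereas normalization to the stable reference recovers the four-fold increase.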

A summary of the RNA-sequencing data (number of reads, number of mapped reads and so on) is presented in Supplementary Tables S2–S4. The RNA-seq data are available at http://www.weizmann.ac.il/molgen/Sorek/syn9_browser/ under the ‘Data files’ tab. The transcriptome coverage data were normalized according to library size for each lane separately, to allow visual comparison between hosts at each time point.

Comparison of microarray and RNA-seq data

A comparison of the RNA-seq and microarray results, together with verification of gene profiles by qRT-PCR, was carried out for the Syn9 infection experiments. This showed that the microarrays quickly became saturated for phage transcript levels, whereas the RNA-seq captured the high dynamic range of the phage transcriptome more accurately and without saturation (see Supplementary Text, Supplementary Figure S2). However, the microarrays better represented the host’s transcriptional response, as the high proportion of phage-derived sequencing reads in the RNA-seq data set largely masked the readout of the host gene expression (see Supplementary Text, Supplementary Figure S2, Supplementary Tables S2–S4). We therefore used the RNA-seq platform to study the dynamics of the phage transcriptome, whereas the microarrays were used to infer relative host transcript levels after infection.

Clustering of phage gene expression

Cluster analyses for Syn9 were performed on normalized transcript levels of phage genes derived from the RNA-seq data. Input data consisted of transcript levels measured at four time points after infection from two independent replicates from each of the three hosts (resulting in 24 expression values per gene). To standardize across hosts prior to clustering, the transcript levels for each gene were adjusted to have a maximum expression equal to 1 and a minimum of 0 in each host independently. Hierarchical clustering was performed using Euclidean distance and average linkage as implemented in the R package stats. See Supplementary Text for Jaccard coefficient analysis of clustering. The dendrogram of Syn9 expression clusters was plotted using the dendrogram function (R package stats), with the gene names at the dendrogram tips colored using the labels_colors function (R package dendextend). The branches were colored using the dendrapply function with attribute ‘edgePar’ (R package stats). The ovals adjacent to the dendrogram tips were added manually.
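The per-host scaling and average-linkage clustering described above can be sketched on a toy data set. This is a pure-Python illustration rather than the R code used in the study: the gene names and expression values are invented, only two hosts with four values each are shown instead of 24 values per gene, and stopping at a fixed number of clusters is a simplification of cutting a full dendrogram.

```python
import math
from itertools import combinations


def scale_per_host(profiles):
    """Rescale each gene to the range 0..1 within each host independently.

    profiles maps gene -> host -> list of transcript levels; the result maps
    gene -> one flat vector with the hosts concatenated in sorted order.
    """
    scaled = {}
    for gene, by_host in profiles.items():
        vector = []
        for host in sorted(by_host):
            values = by_host[host]
            low, high = min(values), max(values)
            span = high - low
            vector.extend((v - low) / span if span else 0.0 for v in values)
        scaled[gene] = vector
    return scaled


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering; cluster distance is the mean pairwise distance."""
    clusters = [[gene] for gene in sorted(vectors)]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            distances = [
                euclidean(vectors[a], vectors[b])
                for a in clusters[i]
                for b in clusters[j]
            ]
            mean_distance = sum(distances) / len(distances)
            if best is None or mean_distance < best[0]:
                best = (mean_distance, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(cluster) for cluster in clusters]


# Hypothetical transcript levels over four time points in two hosts.
toy_profiles = {
    "early_1": {"WH7803": [10, 8, 2, 1], "WH8102": [100, 90, 20, 5]},
    "early_2": {"WH7803": [50, 45, 5, 0], "WH8102": [7, 6, 2, 1]},
    "late_1": {"WH7803": [0, 1, 5, 10], "WH8102": [1, 10, 60, 100]},
    "late_2": {"WH7803": [2, 3, 20, 40], "WH8102": [0, 2, 9, 10]},
}
toy_scaled = scale_per_host(toy_profiles)
toy_clusters = average_linkage(toy_scaled, n_clusters=2)
print(toy_clusters)
```

Scaling within each host separately means that genes are grouped by the shape of their temporal profile, not by differences in absolute transcript level between hosts.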

Mass spectrometry identification of virion proteins

Phage virion particle proteins were identified following Sabehi et al., 2012. See Supplementary Text for details.

Data deposition in public databases

The genomes of Synechococcus sp. WH8109 and cyanophage P-TIM40 were sequenced as part of this study (see Supplementary Text) and were deposited in GenBank under the accession numbers CP006882 and KP211958, respectively.

The mRNA expression data were deposited in GEO under the SuperSeries accession number GSE74922: The microarray and RNA-sequencing data appear under the SubSeries accession numbers GSE63690 and GSE74921, respectively.

Results

Syn9 phage infection and transcriptome dynamics were nearly identical in the three hosts

We began by investigating the infection process and transcriptome dynamics of the Syn9 phage during infection of three distinct Synechococcus hosts: WH7803, WH8102 and WH8109. These strains belong to different clades and occupy different niches in the oceans (Table 1). Furthermore, 17–24% of their genomes contain strain-specific genes with only about two-thirds of their genes being common among all three hosts (Figure 1a). These strain-specific non-core genes are generally localized to genomic islands (Dufresne et al., 2008) (Supplementary Figure S3).

Table 1 Host strains used in this study
Figure 1

Infection dynamics of the Syn9 phage in three Synechococcus hosts. (a) Core and accessory genes in the genomes of the three Synechococcus hosts. (b) The length of the lytic cycle as a function of the number of infective phage released to the extracellular medium, as determined from the plaque assay. Average and s.d. of three biological replicates for WH7803 and WH8102, and two biological replicates for WH8109. (c) The timing of intracellular phage gDNA replication (closed symbols) and host gDNA degradation (open symbols) as determined by qPCR targeting the phage g20 portal protein gene and the host rbcL Rubisco gene, respectively. Data are presented as percent of maximal gDNA for each gene. Average and s.d. of three biological replicates. (d) Ratios of phage and host mRNA with time after infection as determined from RNA-seq reads that mapped to the phage and host genomes, respectively. Average and range of two biological replicates.

Despite the extensive genomic differences mentioned above, the progression of the Syn9 infection cycle was similar in all three Synechococcus hosts. The lytic cycle was 6–8 h long with a latent period of ~4 h prior to release of the first virions from the infected cells (Figure 1b). Furthermore, a similar yield of infective phage was produced in two of the three hosts, but fewer infective phage were released during infection of WH8109 (Figure 1b), presumably due to a lower ratio of infective to total phage production on this host relative to the other hosts (Supplementary Figure S4). Host genomic DNA was rapidly degraded in all three hosts with levels dropping to 20% of the initial (maximal) level by 1 h after infection (Figure 1c). By this time phage genome replication had begun in all three hosts, although the timing of maximal phage genome replication occurred slightly later during infection of WH8102 (Figure 1c). Furthermore, transcription inside the cell was largely phage derived by 30 min after infection in all three hosts. An almost complete switch to phage transcription had occurred by 2 h after infection, with phage mRNA constituting ~98% of the total cellular mRNA (Figure 1d, Supplementary Tables S2–S4).

The temporal progression of gene expression was practically identical during infection of all three hosts, with largely the same gene cluster patterns found in three temporal expression classes (see below). A very small set of phage genes showed some variation in transcription patterns between hosts; however, none of these observed variations was statistically significant (Supplementary Figure S5). Strikingly, the correlation between transcript levels of individual phage genes when infecting each of the three hosts was extremely high (Figure 2). In fact, nearly identical expression patterns were observed for almost all phage genes (for example, Figure 2a and b). This is even though expression levels spanned three orders of magnitude between the most highly expressed and the most lowly expressed phage genes (Figure 2c and d). Thus, the Syn9 phage followed a clearly defined and predetermined gene expression program in three temporal classes irrespective of the host infected.

Figure 2

High correlation in phage gene expression across hosts. (a, b) Gene expression in representative regions of the phage genome showing early (a) and late (b) phage genes at three different time points post infection (p.i.). Phage gene expression during infection of the Synechococcus WH7803, WH8102 and WH8109 hosts is shown with red, orange and brown lines, respectively. X axis, position on the phage genome; phage gene products are represented by red block arrows below the X axis. The RNA-seq reads (Y axis) were normalized to library size. (c, d) Correlation of phage gene expression for representative hosts and time points: at 0.5 h after infection of Synechococcus WH8102 and WH8109 (c) and at 2 h after infection of Synechococcus WH8109 and WH7803 (d). Expression of phage genes averaged over two biological replicates is shown in black and biological replicates are shown in red and blue (normalized to host rnpB transcript levels). Pearson correlation coefficients (r) using linear values are shown in the respective panels with P-values <2.2e-16 for both correlations. Coefficients of determination (R^2) are also shown in each panel. Pearson correlation coefficients using logged values were 0.99 and 0.96 for (c, d), respectively. See Supplementary Material for correlations between all hosts and time points (Supplementary Figure S13) as well as for reproducibility between biological replicates (Supplementary Figure S14).
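Because expression levels span several orders of magnitude, a correlation on linear values is driven mainly by the most highly expressed genes, which is why correlations on logged values are reported alongside it. A minimal sketch of both calculations follows; the expression values are invented and do not correspond to any panel of Figure 2.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical normalized transcript levels of four phage genes in two hosts.
host_1 = [1.0, 10.0, 100.0, 1000.0]
host_2 = [2.0, 8.0, 120.0, 900.0]

r_linear = pearson(host_1, host_2)
r_log = pearson([math.log10(v) for v in host_1], [math.log10(v) for v in host_2])
r_squared = r_linear ** 2  # coefficient of determination for a linear fit
print(r_linear, r_log, r_squared)
```

Logged values require strictly positive inputs, so genes with zero reads would need to be excluded or offset before this calculation.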

The phage temporal expression program

Three temporal classes of early, middle and late genes were evident for Syn9 when clustering the phage genes according to their expression patterns (Figure 3, Supplementary Figure S6, Supplementary Table S5, Supplementary Text). Each class can be further divided into subclasses, displaying a variety of expression patterns (Figure 3). For example, the expression of some early genes declined 1 h following infection, whereas others remained highly expressed past 1 h (Figure 3, Supplementary Figure S7).

Figure 3

The temporal expression program of the Syn9 phage. Clustering of phage genes by their expression patterns is presented as a dendrogram with early, middle and late expression clusters shown in green, blue and red lines, respectively. Gene names at the dendrogram tips are colored according to the promoter class driving their expression (see legend, top left). The colors of the ovals adjacent to the genes denote the major classes of gene functions (see legend, bottom left). Graphs at the right of the subclusters show expression profiles of the individual genes in that subcluster as a function of time after infection. Subcluster designation at the top left corner of each graph is as in Supplementary Table S5. Relative transcript levels (Y axes of expression profile graphs) were normalized to minimum (0) and maximum (1) values for each gene and were averaged across all three hosts. A third early cluster with only four genes is presented in the Supplementary Material (Supplementary Results and Discussion; Supplementary Figure S7). The clustering and the construction of the dendrogram are explained in the Materials and methods under ‘Clustering of phage gene expression’.

The early genes in Syn9 constitute 15% of the genome. The majority (29 out of 35) were already expressed 5 min after infection (Figure 4a). Most are concentrated in a single 8 kb window on the phage genome and are transcribed from seven operons spanning 30 open-reading frames (Figure 4b, Supplementary Figure S8). None of the early genes are homologous to early genes of T4. Most, however, are homologous to hypothetical proteins found in only a subset of other cyanophages (Weigele et al., 2007; Sullivan et al., 2010). Only three are homologous to genes in the databases that code for protein domains of known function. These are a helicase (g75); a gene with AAA and cbbQ type domains (g153, previously annotated as cobS or a putative porphyrin biosynthesis gene); and a gene with a nucleotidyl transferase domain (g184). Interestingly, g184 is found in all T4-like cyanophage genomes, and remains highly expressed for at least 2 h after infection. We found that its protein is present in the virion particle (Supplementary Table S6) and is probably carried in the capsid and may thus function throughout the infection process.

Figure 4

Syn9 early, middle and late promoters. (a) Expression of phage genes at 5 min after infection of the Synechococcus WH8102 host. (b) An expanded view of the region on the phage genome showing expression of the early-phage gene cluster at 5 min after infection. X and Y axes are as in Figure 2a. Transcription patterns suggest that the genes are organized in several consecutive operons. Green vertical lines represent the transcription start sites (TSS) of these operons, experimentally determined from genome-wide 5′ end RNA-seq. (c–e) Promoter logos of the sequences upstream of the TSSs of the phage early (c), middle (d) and late (e) genes. The numbers below the motifs represent the position upstream of the TSS. For (c) and (d), sequences were aligned by the prominent motifs, separately for each region of the motif. For (e), sequences were aligned relative to the TSS. Representative promoter sequences from each class are shown below each consensus motif. Promoter logos were generated from 9 early promoters (c), 70 middle promoters (d) and 59 late promoters (e).

The middle genes consist of 106 genes, and include the DNA replication, nucleotide metabolism and recombination/repair genes identified by Weigele et al., 2007 as homologs of T4 genes. These include the DNA polymerase, primase, helicase, ribonucleotide reductase and thymidylate synthase genes (Supplementary Table S5). The T4 gp55 sigma factor homolog required for gene expression from late promoters in T4 is also transcribed as a middle gene. The six Syn9 tRNA genes are transcribed as middle genes and would therefore be available for the translation of both middle and late proteins as previously hypothesized (Weigele et al., 2007; Limor-Waisberg et al., 2011; Enav et al., 2012) (see Supplementary Text). In addition, a number of auxiliary metabolic genes coded in the Syn9 genome belong to the middle gene cluster (see below) as do five genes encoding virion particle proteins.

The late-gene cluster consists of 91 genes (Supplementary Table S6) of which at least 32 are structural genes based on mass spectrometry we performed to identify the proteins making up the virion particle (Supplementary Table S6 and see Weigele et al., 2007). These include the capsid, tail and putative tail fiber proteins (Weigele et al., 2007) as well as numerous proteins of unknown function. Other late genes include the DNA-packaging terminase genes, as well as the regA translational repressor gene, a few DNA replication and metabolism genes and some auxiliary metabolic genes (see below).

Phage asRNAs were identified from high-throughput transcriptional start site mapping. Thirty-one asRNAs were reproducibly expressed during infection of all three hosts (Supplementary Table S7). Many were transcribed from middle promoters, whereas a few were transcribed from late promoters. Strand-specific bisulfite sequencing revealed that some asRNAs overlap the 5′-untranslated region of phage genes, whereas others span multiple genes (Supplementary Figure S9). The lack of a predicted ORF suggests that many of the asRNA transcripts have a regulatory function (see Supplementary Text). An asRNA that may be involved in regulating gene expression was first reported in the T4-like cyanophage, S-PM2 (Millard et al., 2010).

The (cyano)bacterial-like auxiliary metabolic genes coded by Syn9 were transcribed in the middle- and late-temporal expression classes. Most of the cyanobacterial-like photosynthesis genes were transcribed in the middle cluster with the DNA replication and metabolism genes (Supplementary Table S5; Figure 3). These include the photosystem II core reaction center genes, psbA and psbD, two photosynthesis-related hli stress-response genes, and the phycoerythrin biosynthesis gene, cpeT. Two of these genes (psbA and hli) are known to be transcribed with DNA replication and metabolism genes in other cyanophages (Lindell et al., 2005; Clokie et al., 2006; Lindell et al., 2007) and are thought to be important for continued photosynthetic activity and the production of ATP and reducing equivalents needed for phage DNA replication and nucleotide biosynthesis (Mann et al., 2003; Lindell et al., 2005; Clokie et al., 2006; Lindell et al., 2007). In addition, two photosynthetic electron transfer genes (petE and PTOX) clustered with the late genes, with maximal transcript levels detected at 2 h after infection (see Supplementary Text).

A single bacterial-like carbon metabolism gene, the CP12 Calvin cycle inhibitor gene, had maximal transcription 1 h after infection and clustered with the middle genes. This gene is thought to direct carbon flux to the pentose phosphate pathway (PPP) during infection (Thompson et al., 2011b), which, in turn, has been hypothesized to enhance the production of ribose substrates and reducing equivalents for nucleotide biosynthesis during infection (Lindell et al., 2007; Thompson et al., 2011b). In contrast to CP12, the three PPP genes (gnd, zwf and talC) clustered with the late genes, which differs from the findings of Thompson et al. (2011b) (see Supplementary Text). Additional bacterial-like genes, such as the phoH-family stress-response gene, the mazG potential regulatory gene and the tryptophan halogenase genes, were transcribed with middle genes. However, their function in cyanophages remains unclear.

Novel regulation of the transcriptional program

To understand how the phage expression program is regulated, we determined the promoters of all phage transcripts after mapping transcriptional start sites at single-base resolution. A complete temporal promoter map for all Syn9 phage transcripts can be visualized using the interactive ‘Syn9 transcriptome browser’ we devised (http://www.weizmann.ac.il/molgen/Sorek/syn9_browser/). Grouping of promoters according to the time of appearance of their associated transcriptional start sites revealed distinct motif signatures that are likely to be responsible for directing the expression of the early, middle and late genes (Figure 4c–e). A very prominent element was found upstream of the early transcripts, composed of two palindromic 7-bp motifs separated by 10 bp, which are found 29–30 bp upstream of a −10 Pribnow box (with a consensus sequence of TGTGACA-N10-TGTCACA-N30-TATACT) (Figure 4c). These results stand in contrast with current knowledge on phage early transcription, because the expression from early-phage promoters is usually regulated by the host core transcriptional machinery, and hence phage early promoters are expected to resemble host σ70 promoters (Miller et al., 2003b; Nolan et al., 2006) (see Discussion). Instead, it was the promoters of the Syn9 middle genes, rather than the early genes, that were characterized by the signature of the canonical σ70 recognition site, suggesting that it is these genes that are transcribed using the core host transcriptional machinery (Figure 4d). Finally, late promoters show a distinct motif very similar to the T4 late motif (Figure 4e; Miller et al., 2003b; Nolan et al., 2006). Overall, we defined nine early promoters, 70 middle promoters and 59 late promoters upstream of phage genes. Approximately 25% of the promoters do not contain any of these three signals. In these promoters, the motif might have diverged too far from the canonical signal to be identified, or a few of these promoters may be false positives inherent to our method.
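As a rough illustration of how the early-promoter consensus quoted above could be located in a sequence, the following minimal Python sketch scans for exact matches to TGTGACA-N10-TGTCACA followed 29–30 bp later by TATACT. It is not the motif-discovery procedure used in this study: the function names and the toy sequence are hypothetical, only one strand is scanned, and exact matching is an assumption made for brevity, since real promoters diverge from the consensus and would normally be scored with a position weight matrix. The sketch also shows that the two 7-bp half-sites are reverse complements of each other.

```python
import re

# Exact-consensus pattern (an assumption for illustration only; diverged
# promoters will be missed by exact matching).
EARLY_MOTIF = re.compile(r"TGTGACA[ACGT]{10}TGTCACA[ACGT]{29,30}TATACT")

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_early_motif(seq):
    """Return 0-based start positions of exact consensus matches on the given strand."""
    return [m.start() for m in EARLY_MOTIF.finditer(seq.upper())]


# Hypothetical toy sequence built to contain one consensus match.
toy = "GG" + "TGTGACA" + "A" * 10 + "TGTCACA" + "C" * 30 + "TATACT" + "GG"

print(find_early_motif(toy))          # [2]
print(reverse_complement("TGTGACA"))  # TGTCACA
```

To scan both strands, the same function can be applied to `reverse_complement(seq)`, with the reported positions converted back to forward-strand coordinates.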

Some phage genes presented complex regulation patterns, which can explain the variety of expression profiles observed (Figure 3). Many of these genes were activated by multiple alternative promoters of different temporal classes (Figure 3, Figure 5 and Supplementary Figure S10). For example, some of the early genes have both an early and a middle promoter, which explains how they retain their high expression at 1 h, dropping only at 2 h after infection (Figure 3). Other genes, such as g107, were transcribed from both middle and late promoters (Figure 5a). Alternative promoters also led to modular operon structures (Figure 5b) and, when located within open reading frames, to multiple protein variants (Figure 5c) (see Supplementary Text).

Figure 5

Alternative Syn9 promoters generate complex temporal transcription patterns. Phage gene expression and promoter positions at 0.5 h (green) and 2 h (red) after infection of the Synechococcus WH8109 host. X and Y axes are as in Figure 2a. The letters E, M and L on the TSS vertical lines indicate whether the associated promoter is an early, middle or late promoter, respectively. (a) Example of an alternative promoter: a TSS driven by a middle promoter upstream of gene g107 is replaced by an alternative TSS driven by a late promoter at 2 h after infection. Normalized expression profiles of the genes are shown at the right. (b) An example of modular operon structure: g214 and g215 are transcribed from a middle promoter at 0.5 h after infection, and from an upstream late promoter at 2 h after infection as a part of the g213 operon. (c) An example of a modular gene: a middle promoter is inside the coding region of g125, potentially giving rise to a shorter protein. The blue line indicates the matching internal start codon. A TSS driven by a late promoter appears at 2 h after infection, upstream of the full length g125 ORF.

Other T4-like cyanophages display a program similar to Syn9

The expression of the Syn9 genome in three transcriptional clusters differs from expectations based on previous hypotheses for cyanophages (Mann et al., 2005; Clokie et al., 2006), and the regulatory elements that are likely to drive them are unlike those known for T4. Therefore, we wondered whether our findings are specific to Syn9 or are also found among other T4-like cyanophages. To begin addressing this question we analyzed the transcriptional program of P-TIM40, another T4-like cyanophage, during infection of Prochlorococcus NATL2A (see Materials and methods and the Supplementary Text for information on this phage). We found that, similar to Syn9, the P-TIM40 genome is transcribed in three expression clusters (Supplementary Figure S11). Also similar to Syn9, the early-gene cluster was made up of short genes of unknown function that were primarily clustered in one region of the genome (Figure 6a). Furthermore, the same distinctive early promoter motif identified in Syn9 was observed upstream of these early P-TIM40 genes (Figure 6a, Supplementary Figure S12). The middle- and late-gene expression clusters consisted of classical replication and morphogenesis genes, respectively (Supplementary Table S8). In addition, the psbA, hli, PTOX, talC, phoH and mazG auxiliary metabolic genes were transcribed with middle genes, whereas the CP12 and tryptophan halogenase genes were transcribed with late genes. The putative promoter sequences upstream of representative replication and morphogenesis genes revealed the host-like σ70 motif upstream of middle genes and the classical late motif upstream of late genes, as found for Syn9 (Figure 6c and d).

Figure 6

Conservation of the three classes of Syn9 promoter motifs across T4-like cyanophages and environmental samples. (a) Examples of regions from sequenced cyanophage genomes showing the Syn9 early promoter motif. The P-TIM40 region consists of P-TIM40 early genes based on expression data analysis (see Supplementary Table S8). Phage genes are represented by block arrows; the genes that are homologous to Syn9 early genes are marked in grey. Locus numbers appear below gene names, as do positions on the phage genome. Phage names are shown on the left. The early promoter motif of each phage is displayed in Supplementary Figure S12. (b) Examples of contigs from the GOS project (Rusch et al., 2007) where the early promoter motif was detected (white flag). Genes with a significant BLAST hit to a viral gene are in grey (see Methods). ORFs (block arrows) were predicted using Glimmer (Salzberg et al., 1998). (c) Promoter sequences of a classical middle gene, the RNA polymerase sigma factor gene (T4 gp55-like), show the host-like σ70 promoter motif in Syn9 and in other representative T4-like cyanophages. This motif was found upstream of additional classical middle genes in the T4-like cyanophages, including the DNA polymerase (T4 gp43-like), DNA primase subunit (T4 gp61-like) and ssDNA-binding protein (T4 gp32-like) genes. (d) Promoter sequences of a classical late gene, the portal vertex protein gene (T4 gp20-like), show the late promoter motif in Syn9 and in other representative T4-like cyanophages. This motif was found upstream of additional classical late genes in the T4-like cyanophages, including the tail sheath protein (T4 gp18-like), baseplate hub subunit (T4 gp26-like) and head completion protein (T4 gp4-like) genes. For (c, d), promoter motifs (bold black type towards the left and middle of the sequences) and the TSS of the Syn9 genes (arrow) are shown. Sequences were aligned relative to the start codon of the gene (bold black type at the right end of the sequences).

Next, we assessed how common these early-, middle- and late-promoter motifs are in the genomes of 15 other cyanophages that infect Prochlorococcus and Synechococcus and represent the diversity of known T4-like marine cyanophages (Sullivan et al., 2010). Multiple copies of the early motif were found in 14 out of the 15 phages (Supplementary Figure S12). In the vast majority of cases (90%) this promoter motif was located upstream of operon-like clusters of short uncharacterized genes that are localized to one main region of the genome (Figure 6a). In the S-PM2 phage, however, the operons of these putative early genes and their associated early promoter motifs were more dispersed in the genome relative to the other T4-like cyanophages investigated. Many of these operons contained genes homologous to Syn9 early genes (Figure 6a). These 15 cyanophage genomes also have the host-like σ70 motif upstream of classical replication genes (Figure 6c) and the late promoter motif upstream of morphogenesis genes (Figure 6d).

We then wanted to determine whether the information-rich early promoter motif could be detected in natural populations of T4-like cyanophages. Indeed, this motif was found on 3035 contigs from the Global Ocean Sampling (GOS) expedition (Rusch et al., 2007). Of these, 924 (30%) contained at least one identifiable phage gene (Figure 6b), a 5.85-fold enrichment (P<2.2e−16, Fisher’s exact test) as compared with a random set of similarly sized contigs from the GOS data set. This suggests that these are bona fide phage contigs and that the genes downstream are early-phage genes. It should be noted that no traces of this early motif were detected in cyanobacterial genomes, nor in any other bacterial genome found in GenBank. Thus, the prevalence of this early motif in other T4-like cyanophage genomes and in the GOS data set implies that many T4-like cyanophages in the oceans undergo transcriptional and regulatory programs similar to those found for Syn9 and P-TIM40.
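The enrichment statistic quoted above can be made concrete with a small 2×2 contingency-table calculation. The minimal sketch below uses only the Python standard library. The counts 924 and 3035 are taken from the text, whereas the background counts (158 phage-gene-containing contigs in a random set of 3035) are hypothetical placeholders chosen for illustration, as the size and composition of the random set are not stated here. A one-sided test is also an assumption; the study may have used a different alternative hypothesis.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fold_enrichment(a, b, c, d):
    """Hit fraction a/(a+b) in the test set divided by c/(c+d) in the background."""
    return (a / (a + b)) / (c / (c + d))


def fisher_log10_p(a, b, c, d):
    """log10 of the one-sided Fisher's exact P-value, P(X >= a), for [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    log_denom = log_comb(total, row1)
    # Hypergeometric log-probabilities for every table at least as extreme as observed.
    log_terms = [
        log_comb(col1, x) + log_comb(total - col1, row1 - x) - log_denom
        for x in range(a, min(row1, col1) + 1)
    ]
    peak = max(log_terms)  # log-sum-exp keeps very small P-values from underflowing
    return (peak + log(sum(exp(t - peak) for t in log_terms))) / log(10)


# Motif-containing contigs: 924 with a phage gene, 3035 - 924 without (from the text).
# Background contigs: 158 with, 3035 - 158 without (HYPOTHETICAL placeholder counts).
table = (924, 3035 - 924, 158, 3035 - 158)

print(round(fold_enrichment(*table), 2))  # 5.85
print(fisher_log10_p(*table))             # a large negative number, i.e. a very small P
```

Working in log space is a deliberate choice: with thousands of contigs the individual hypergeometric terms are far too small to sum reliably as ordinary floating-point probabilities.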

Host-specific transcriptional responses preferentially localize to genomic islands

Although phage gene expression was nearly identical irrespective of the host infected, the transcriptional response of the three hosts showed marked differences (see below). Nonetheless, similar general transcription patterns were observed. An immediate response was observed within 5 min of infection in all hosts, apparent from a transient increase in transcript levels of a small number of genes (Figure 7, Table 2, Supplementary Table S9) and a concomitant decline in transcript levels of another small group of genes (Table 2, Supplementary Table S10). Subsequently, within 0.5–1 h of phage infection, a marked decline in transcript levels of the vast majority of host genes (>90%) was observed. This massive decline was accompanied by a subsequent increase in transcript levels of some genes (Figure 7, Table 2, Supplementary Table S11), whereas transcript levels of other genes remained unchanged for most of the latent period (Table 2, Supplementary Table S12). Here we define host-response genes as those whose transcript levels underwent an immediate change as well as those that increased or remained unchanged subsequent to this immediate response. These latter two groups of genes seem to have been shielded from the rapid mRNA degradation or transcriptional downregulation, which was the fate of the vast majority of host genes.

Figure 7

Temporal transcriptional profiles of host genes with an increase in transcript levels following Syn9 infection. (a) Synechococcus WH8102, (b) Synechococcus WH7803, (c) Synechococcus WH8109. Transcript levels, determined from microarrays, are presented as log2 fold-change (FC) in infected cultures (inf) relative to uninfected cultures (ctrl) over the 4 h latent period of infection. Only genes whose expression levels were significant at a false-discovery rate of q<0.05 are shown. Immediately and subsequently increased transcript levels are shown with blue and red lines, respectively (see Supplementary Tables S9 and S11). Genes with unchanged and immediately declined transcript levels, as well as the majority of genes that subsequently declined, are not shown. Dashed lines indicate the change in transcript levels in the uninfected control cultures prior to infection relative to the infected cultures at 5 min after infection. Average of three biological replicates.
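For readers unfamiliar with the log2 fold-change scale used in Figure 7, the minimal Python sketch below shows how such a value could be computed for one gene at one time point. The signal values are hypothetical, and averaging the per-replicate log ratios is an assumption about the calculation; the microarray normalization used in the study is not reproduced here.

```python
from math import log2
from statistics import mean


def mean_log2_fc(infected, control):
    """Average log2(infected / control) across paired biological replicates."""
    return mean(log2(i / c) for i, c in zip(infected, control))


# Hypothetical signal intensities for one gene in three biological replicates.
infected_signal = [400.0, 800.0, 1600.0]
control_signal = [100.0, 200.0, 400.0]

# Each replicate shows a 4-fold increase, so the mean log2 FC is 2.
print(mean_log2_fc(infected_signal, control_signal))  # 2.0
```

On this scale a value of 1 corresponds to a doubling of transcript level relative to the uninfected control, 0 to no change, and −1 to a halving.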

Table 2 Summary of the number of protein-coding host-response genes and their genomic location

Large disparities in the identities of host-response genes were found among the hosts. Most strikingly, of the 270 response genes combined across the three hosts, only two of them were common to all three hosts. These were the carboxysome carbon-fixation gene, ccmK2 (csoS1), which functions in concentrating CO2 (Supplementary Table S9); and a small heat shock-like chaperone (Supplementary Table S12). The majority of other response genes were host-specific. For example, transcript levels of three additional carbon-fixation genes and two high-light inducible (hli) stress-response genes increased only in Synechococcus WH8102. Nonetheless, the host-specific response genes often belonged to the same general functional groups of cell envelope, DNA repair, carbon fixation, respiration, translation (tRNA genes) and nutrient utilization.

An example of genes belonging to a similar functional group yet responding differentially to Syn9 infection is found for nitrogen transport and assimilation genes. In Synechococcus WH8102, for which the most marked immediate response was observed (Figure 7, Table 2), transcript levels of 33 nitrogen-related genes declined immediately after infection (Supplementary Table S10). Many belong to the NtcA regulon (Su et al., 2006), including ntcA itself, which is the major nitrogen transcriptional regulator in cyanobacteria (Herrero et al., 2001). In contrast to WH8102, transcript levels of a few nitrogen-related genes (nirA, narB, amt1) increased significantly in WH7803 immediately after infection, two of which are homologs of genes that declined in WH8102.

Some potential defense-related genes were also among the Synechococcus WH8102-response genes. Two of these genes, SYNW0071 and SYNW1946, encode a single-stranded RNA nuclease PIN domain often found in the toxic components of toxin–antitoxin operons in prokaryotes (Anantharaman and Aravind, 2003). The latter gene also encodes a PhoH domain, which is common in marine phages, including cyanophages such as Syn9 (Weigele et al., 2007; Sullivan et al., 2010; Goldsmith et al., 2011). A third gene, SYNW1659, encodes a conserved hypothetical protein containing a domain of unknown function (DUF3387, PF11867) that is associated with type I and type III restriction enzymes (see PF04851 and PF04313).

A number of other genes of unknown function with conserved domains were among the most highly expressed response genes. These included four genes whose transcript levels increased immediately after infection in WH8102 that have a common domain (DUF4278) limited to cyanobacteria and cyanophages. In addition, transcript levels of four homologous genes with the DUF1651 domain (three in WH8102 and one in WH8109) increased subsequently or remained unchanged in response to infection by Syn9. These too appear to be limited to unicellular marine cyanobacteria. Although the function of these two sets of genes remains unknown, they may represent a unique type of cyanobacterial-specific response to phage infection.

Analysis of the genomic localization of the response genes in the three hosts revealed that a large proportion of them (35–51%) are located in hypervariable genomic islands (Table 2). This is a significant overrepresentation, with a 2–2.5-fold enrichment for each host (P<0.0001, Fisher’s exact test). This was most pronounced for genes with transcript levels that were subsequently increased or remained unchanged during the latent period, especially in Synechococcus WH8102 (Table 2). These results imply an important role for Synechococcus genomic islands in mediating the host response to phage infection (see Discussion).

Discussion

This study is the first to explore the transcriptional program of a broad host-range phage during infection of different hosts and the response of more than one host to the same phage. Our findings indicate that the ability of a single T4-like cyanophage to infect multiple, genetically diverse host types is not due to specialization of the infection process for the different hosts. Rather, if capable of infecting a particular host, the phage follows a strict, predetermined expression program and is indifferent to the considerable differences in host genome content. Conversely, each of the hosts mounted a gene-specific response to infection even though they were infected by the same phage. These findings suggest that successful infection of multiple hosts by broad host-range phages, at least within the T4-like phages, is more likely to be dependent on the effectiveness of host defense mechanisms in preventing infection than on the differential tailoring of the infection process by the phage for individual hosts.

Previous work suggested that T4-like cyanophages employ a two-class temporal transcriptional program, including early and late, but not middle, gene expression. This was initially based on bioinformatic analyses reporting the absence of T4-like middle promoter motifs from the S-PM2 and Syn9 genomes, as well as the lack of motA and asiA genes that are required for middle transcription in T4 (Mann et al., 2005; Weigele et al., 2007). Furthermore, host-like promoters, used for early expression in T4, were identified upstream of classical middle genes, leading to the conclusion that only early- and late-transcription occur in these cyanophages and that, different to T4, DNA replication genes are transcribed as early genes (Mann et al., 2005; Clokie et al., 2006). Subsequent studies that examined transcription of a small subset of S-PM2 and Syn9 genes also suggested that only two temporal expression classes were present in cyanophages (Clokie et al., 2006; Thompson et al., 2011b). However, neither of these studies examined the transcriptional profiles of any of the early genes identified in this present study.

In contrast to the conclusions reached in the above studies, our genome-wide transcriptional investigations of Syn9 and P-TIM40 clearly indicate that three temporal transcriptional classes are used by T4-like cyanophages and that they have three distinct promoter types. They further indicate that, similar to T4, classical DNA replication genes belong to the class of middle gene expression. However, these expression classes are regulated from promoter types different from those known for T4. This appears to be a common phenomenon across T4-like cyanophages, as seen from the presence of Syn9 promoter motifs upstream of similar gene types in diverse T4-like cyanophages: the newly identified prominent early motif is present upstream of clusters of small genes of unknown function; the host-like σ70 middle motif is upstream of DNA replication genes; and the T4-like late motif is upstream of phage morphogenesis (structural) genes. Thus, our study provides the first detailed description of a cyanomyovirus transcriptional and regulatory program, and reveals a previously unidentified regulatory strategy that deviates from the T4 paradigm. Furthermore, this regulatory program appears to be ancestral to the T4-like marine cyanophage lineage. Moreover, the mechanism of regulation known so well for T4 may actually be the exception rather than the rule among the broad group of T4-like phages (see Discussion in Supplementary Text).

Although we have identified the promoter motifs likely to be directing the three transcriptional classes in T4-like cyanophages, the mechanisms underlying their temporal mode of regulation remain unknown. Particularly intriguing is the finding of the novel early promoter that controls early-gene expression in T4-like cyanophages. This early promoter motif has no resemblance to sequences in cyanobacterial genomes. Therefore, expression from this promoter element may well be enabled by a phage factor packaged in the capsid, especially as expression from this promoter is already observed 5 min after infection. Such factors are known to exist in other phages, for example, the Alt protein of phage T4 (Horvitz, 1974a, 1974b; Koch et al., 1995). What then prevents transcription from the host-like middle promoters during the early phase of infection? One possibility is that phage proteins entering the cell together with phage DNA modify the affinity of the host RNA polymerase for the different sequence motifs; increasing affinity for the early motif while reducing affinity for the middle host-like σ70 motif. Alternatively, access of the host RNA polymerase to phage middle promoters may be blocked by different phage proteins bound to the incoming DNA. Such proteins entering into the bacterial cell with the phage DNA are currently unknown for cyanophages, although some candidate proteins have been identified from our mass spectrometry analyses (Supplementary Table S6).

Despite the near-identical transcriptional program employed by the Syn9 phage, each host displayed a largely strain-specific response with genes that are preferentially localized to genomic islands. These included genes from the functional groups of cell envelope, DNA repair, carbon fixation, respiration and nutrient utilization. This may constitute a general response to stress, and these functional groups may be indicative of the kind of stresses experienced during infection. It is also possible that some of these host responses reflect an attempt at defense against infection, as inferred from the identity of some of the response genes. Alternatively, these genes may be induced by the phage to aid in the replication process, with the immediate response induced by phage proteins entering into the cell together with phage DNA. In this scenario, the factors involved would need to be different to those having a role in early-phage gene expression as early-phage promoter motifs are not found upstream of these (or any) cyanobacterial genes. The subsequent response may also be induced by a phage factor, but this would need to be one that is expressed relatively early in the transcriptional program. Rather than transcriptional induction, it is also feasible that the subsequent host-response is due to differential transcript stabilization, mediated by host or phage proteins, as drastic and rapid degradation of host DNA occurred within an hour of infection (Figure 1c). Indeed, both the E. coli T4 and T7 phages are known to induce changes in host transcript stability (Marchand et al., 2001; Ueno and Yonesaki, 2004) and duplex RNA has been proposed to stabilize host transcripts during infection of the marine cyanobacterium Prochlorococcus MED4 by the podovirus P-SSP7 (Stazic et al., 2011). Rapid degradation of host DNA is also known to occur during T4 infection (Mosig and Eiserling, 2006).

Until about a decade ago, it was generally thought that phage infection led to a complete shut-down of host transcription (Koerner and Snustad, 1979; Roucourt and Lavigne, 2009). However, the employment of whole-genome transcriptional analyses to assess host responses to infection has shown this not to be the case, with a range of responses in different host-phage systems (Poranen et al., 2006; Lindell et al., 2007; Ravantti et al., 2008; Lavigne et al., 2013, this study). First of all, common across systems is an immediate transient response followed by a subsequent response to infection. Yet the degree to which transcript levels increase, both with respect to the number of genes and their fold change, differs among systems. Furthermore, some but not all bacteria experience a massive decline in host gene expression upon infection. However, our findings can be generalized across marine Synechococcus (this study) and Prochlorococcus (Lindell et al., 2007), with both transient and subsequent increases in transcript levels accompanying an overall decline in host transcript abundance. Furthermore, not only did these same patterns occur in the two different host genera, but they also occurred in response to infection by phages belonging to two very different phage families, the T4-like (this study) and T7-like (Lindell et al., 2007) cyanophages. Whether this will also be the situation in response to infection by TIM5-like myoviruses and the siphoviruses in these cyanobacteria remains to be seen.

A most striking commonality in the response of the three Synechococcus hosts (this study) as well as that of Prochlorococcus (Lindell et al., 2007), was the preferential localization of the response genes in genomic islands. Indeed, responses of marine cyanobacteria to stressful conditions such as nutrient deprivation, high light, oxidative stress and metal toxicity have been documented to emanate from genes located in genomic islands (Coleman et al., 2006; Thompson et al., 2011a; Stuart et al., 2013). Furthermore, as found here, some of the most highly expressed genes in response to these environmental stressors are genes of unknown function located in genomic islands (Martiny et al., 2006; Tolonen et al., 2006; Thompson et al., 2011a). Thus, despite the apparent transient nature of genes in genomic islands (on evolutionary time scales) and the unknown function of many of these genes, they appear to be of great importance for the acclimation and adaptation of cyanobacteria to environmental change.

Numerous lines of evidence indicate that genomic islands are part of the co-evolutionary process between marine cyanobacteria and their phages. First, phages are considered one of the major means by which genes are transferred into genomic islands (Palenik et al., 2003, 2006; Coleman et al., 2006). Second, a number of host-responsive island genes have homologs in phage genomes, including hli and phoH-family genes, DNA repair and heat shock genes as well as genes of unknown function like the conserved DUF4278 domain-carrying genes (Lindell et al., 2007; this study). Thus, expression of island genes during infection may have led to their utilization by phages over evolutionary time and their stable incorporation into phage genomes (Lindell et al., 2007). Third, phages serve as an important selective pressure influencing diversity and content of cell-surface genes in genomic islands (Avrani et al., 2011; Avrani and Lindell, 2015), the products of which are utilized by phage for attachment and entry into the host. Combined, these findings highlight the central role of hypervariable genomic islands in cyanobacterial host-phage interactions, serving as a central axis for host response and defense against phage infection, as well as a hotspot for dynamic gene exchange that continually reshapes the genomes of both host and phage.