Podoviruses that infect marine picocyanobacteria are abundant and, because of their specific phage-host relationships, could play a significant role in regulating host populations. Genome sequencing has revealed that many marine cyanophages encode photosynthesis genes such as psbA, but psbA appears to be present only in certain groups of cyanopodovirus isolates. To better understand the prevalence of psbA in cyanopodoviruses, we searched four marine metagenomic databases (GOS, BATS, HOT and MarineVirome). We found that 89% of the cyanopodovirus scaffolds recruited from the GOS database contained the psbA gene, supporting the ecological relevance of this photosynthesis gene for cyanophages in the surface ocean. The separation of Clade A and Clade B is consistent with the recent finding of two major groups of cyanopodoviruses. The data also show that Clade B cyanopodoviruses dominate surface ocean waters, whereas Clade A cyanopodoviruses become more important in coastal and estuarine environments.
Viruses are abundant in the ocean and can influence the population dynamics and genetic diversity of their hosts1,2,3. Cyanophages are viruses that infect cyanobacteria, mainly Prochlorococcus and Synechococcus. Many cyanophages have been isolated, and all known marine cyanophages belong to three phage families: Myoviridae, Siphoviridae and Podoviridae4,5,6,7,8,9,10. Recent studies showed that cyanopodoviruses might make up 50% of the cyanophage community in the sea11,12, suggesting that cyanopodoviruses interact actively with cyanobacteria in the marine environment.
Currently, nearly 40 cyanophage genomes have been sequenced, and half of them are cyanomyoviruses. Cyanomyoviruses have relatively large genomes and have acquired many accessory metabolic genes via horizontal gene transfer (HGT), which together constitute a large reservoir of genetic diversity13,14,15,16,17,18,19. Five cyanosiphovirus genome sequences have been reported, with genome sizes ranging from 30,332 to 105,532 bp20,21. Compared with cyanomyoviruses and cyanosiphoviruses, cyanopodoviruses have a relatively conserved genome size, ranging from 42,257 to 47,872 bp11,22,23,24,25.
Genome sequencing has shown that many marine cyanophages encode photosynthesis genes. All isolated cyanomyoviruses and more than half of the isolated cyanopodoviruses carry the key photosystem II reaction centre gene psbA in their genomes11,13,17,18,19,23,26,27,28,29, whereas no psbA gene has been found among the known cyanosiphoviruses20,30. Two recent studies showed that 24 of 39 marine cyanopodovirus isolates contained psbA12 and 8 of 12 sequenced cyanopodovirus genomes encoded psbA13. In both studies, the frequency of psbA-containing podoviruses was estimated from isolated cyanophages, which could be biased by the host used for isolation. Is it possible to quantify the presence of psbA in cyanopodoviruses in the ocean using a culture-independent approach? Metagenomic databases are a useful tool for this purpose; however, the datasets in the public domain are limited and may not represent the true community composition.
In this study, we estimated the relative abundance and distribution of psbA-containing podoviruses from metagenomic data. Our approach builds on the conserved genomic structure of cyanopodoviruses. The cyanopodovirus genome can be divided into three parts: structural genes, genes related to nucleotide metabolism, and regions of hypothetical genes (Fig. 1)11,18,22,23,24. Both the composition and the arrangement of the structural genes are conserved. One gene cluster, “portal-capsid-tail/fiber”, exists in all cyanopodoviruses as well as in other T7 phages31. Interestingly, the psbA gene is commonly located at a fixed position within the conserved gene cluster “portal-psbA-capsid”11. Based on this conserved gene cluster, we searched the GOS scaffold database by BLAST with the portal, capsid assembly, psbA and major capsid protein (MCP) genes, and retrieved 79 cyanopodoviral scaffolds.
Among the 79 cyanopodovirus scaffolds, 70 contain psbA and 9 do not. All MCP sequences longer than 200 aa were used to construct the phylogenetic tree. The MCP-based phylogeny separated cyanopodoviruses into two major clades (Clade A and Clade B) (Fig. 2), which is consistent with the phylogenetic relationship based on the DNA polymerase gene10,12,21,32. Nearly all cyanopodoviruses in Clade B carry the psbA gene, whereas none of those in Clade A do (Fig. 2). A recent study also reported this pattern of psbA distribution in cyanopodoviruses12.
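The 89% figure quoted for the GOS scaffolds follows directly from the counts above. A minimal sketch of the arithmetic (the variable names are ours, chosen for illustration only):

```python
# Counts reported in the text for the cyanopodovirus scaffolds recruited from GOS.
with_psba = 70
without_psba = 9

total = with_psba + without_psba
psba_fraction = with_psba / total

# 70 / 79 is about 0.886, which rounds to the 89% quoted in the text.
print(f"{with_psba}/{total} scaffolds carry psbA ({psba_fraction:.1%})")
```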
In the Bermuda (BATS) database, 58 Clade B MCP homologs were recruited, but no Clade A MCP homolog was found (Fig. 3A). Likewise, 17 Clade B homologs but no Clade A homologs were recruited from the North Pacific (HOT) database (Fig. 3A). In the GOS database, 729 Clade B MCP homologs and 18 Clade A MCP homologs were found (Fig. 3A). Interestingly, 17 of the 18 Clade A reads were recruited from coastal water. It is likely that most of the Clade A-like sequences come from podoviruses infecting marine Synechococcus10,33,34. In the MarineVirome database, 271 Clade B-like MCP sequences and 4 Clade A-like MCP sequences were detected (Fig. 3A).
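As a rough illustration, the recruited MCP counts above can be turned into a Clade A share per database. This sketch pools all sites within each database, so its values are not the habitat-specific (open ocean versus coastal) percentages shown in Fig. 3B; the dictionary layout and function name are assumptions made for the example.

```python
# MCP homolog counts per database, as reported in the text (Fig. 3A).
mcp_counts = {
    "BATS": {"A": 0, "B": 58},
    "HOT": {"A": 0, "B": 17},
    "GOS": {"A": 18, "B": 729},
    "MarineVirome": {"A": 4, "B": 271},
}

def clade_a_share(counts):
    """Fraction of recruited MCP homologs assigned to Clade A."""
    total = counts["A"] + counts["B"]
    return counts["A"] / total if total else float("nan")

# Pooled Clade A percentage for each database (all sites combined).
clade_a_percent = {db: 100 * clade_a_share(c) for db, c in mcp_counts.items()}

for db, pct in clade_a_percent.items():
    print(f"{db}: Clade A = {pct:.2f}% of recruited MCP homologs")
```

Because read numbers were not normalized across sites, these pooled shares should be read only as a consistency check on the counts, not as abundance estimates.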
Podoviruses in Clade A could be a transitional group between Clade B and other T7-like non-cyanobacterial podoviruses (Fig. 2). Four scaffolds in Clade B do not contain psbA; the gene may have been lost from these scaffolds during evolution. Interestingly, scaffold JCVI_SCAF_109662694693 (in Clade B) contains a high light-induced gene (hli) but no psbA.
Our analysis suggests that Clade A podoviruses make up only a very small proportion of cyanopodoviruses in the surface ocean. In the open ocean, Clade A podoviruses account for only 0.27% and 1.12% of all cyanopodoviruses in the GOS and MarineVirome databases, respectively. In coastal surface water, Clade A podoviruses make up 8.02% and 14.29% of total cyanopodoviruses in the GOS and MarineVirome databases, respectively (Fig. 3B). Clade A podoviruses were not detected at the two open ocean stations, BATS and HOT. Clade A consists mainly of psbA-lacking podoviruses that infect marine Synechococcus10,11,12. Our study suggests that carrying the psbA gene may be less important for cyanophages in coastal or estuarine environments than for cyanophages in the open ocean. Sullivan and colleagues also suggested that a shorter latent period could explain the lack of psbA: with a shorter infection, the phage would have no need for the help of psbA23.
Metagenomic recruitment based on the unique portal-capsid structure provides a culture-independent survey of the frequency and distribution of psbA-carrying cyanopodoviruses. However, all of the datasets analyzed were derived mainly from the surface ocean. Our analysis suggests that: 1) psbA-carrying cyanopodoviruses are the dominant cyanopodoviruses in the surface ocean; 2) Synechococcus podoviruses become relatively more abundant in coastal water; 3) psbA is more important for oceanic cyanopodoviruses than for their coastal counterparts.
Four metagenomic databases were searched for homologs in this study: three from the bacterial fraction, namely the Global Ocean Survey database (GOS)35, the Bermuda database (BATS)36 and the Hawaii Ocean Time-Series database (HOT)37,38, and one from the viral fraction, the MarineVirome39. All databases were obtained from the CAMERA website (http://camera.calit2.net/index.shtm).
Based on the conserved cyanopodovirus gene cluster “portal-psbA-capsid”, we searched the GOS scaffold database by BLAST with the portal, capsid assembly, psbA and major capsid protein (MCP) genes, using a reciprocal best-hit strategy with no e-value cutoff (Fig. 1)40. The structural genes (portal, MCP or capsid assembly gene) allowed cyanopodoviruses to be identified by searching against the NCBI non-redundant protein database.
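The reciprocal best-hit idea can be sketched generically as follows. This is an illustration of the general technique under our own assumptions, not the exact pipeline used here: the functions, the input format (query, subject, bit score) and the sequence names are all hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)

# Hypothetical example: phage genes searched against scaffolds, and back.
forward = [("portal", "scaffold_1", 250.0), ("portal", "scaffold_2", 90.0),
           ("MCP", "scaffold_3", 310.0)]
reverse = [("scaffold_1", "portal", 245.0), ("scaffold_3", "tail_fiber", 120.0)]

pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```

In this toy input only the portal/scaffold_1 pair survives, because the best reverse hit of scaffold_3 is a different gene.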
To analyze the frequency and geographic pattern of cyanopodoviruses in the ocean, we recruited reads from the BATS, GOS, HOT and MarineVirome datasets using all MCP sequences from sequenced cyanopodoviral genomes, as published in Labrie's paper11,13. Our approach is similar to the methods described by Zhao et al.40,41. Briefly, all homologous reads were recruited by binning with an e-value cutoff to avoid potential bias, and each putative hit was then extracted and used as a query to search against the NCBI non-redundant protein database42. The best hit returned for each metagenomic sequence was used to confirm its classification, and all identified reads are listed in Table S1. The number of recruited reads was not normalized, because the sampling method differs among sites and does not target viruses. However, none of the sampling methods should be biased toward cyanopodoviruses with or without the psbA gene.
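The two-step recruitment described above (an e-value filter followed by confirmation against a reference database) might be sketched as below. The sketch assumes BLAST tabular output with the standard 12 columns; the e-value threshold, the read identifiers and the taxon labels are hypothetical placeholders, since the actual cutoff is not stated in the text.

```python
import csv
import io

# Standard 12-column BLAST tabular layout (-outfmt 6).
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def recruit_reads(tabular_text, max_evalue):
    """Return ids of subject reads with at least one hit at or below max_evalue."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=BLAST_COLUMNS, delimiter="\t")
    return {row["sseqid"] for row in reader if float(row["evalue"]) <= max_evalue}

def confirm_reads(recruited, best_reference_hit, accepted_labels):
    """Keep recruited reads whose best reference-database hit has an accepted label."""
    return sorted(r for r in recruited
                  if best_reference_hit.get(r) in accepted_labels)

# Hypothetical hits of an MCP query against metagenomic reads.
sample = "\n".join([
    "MCP_query\tread_1\t62.5\t210\t70\t2\t5\t214\t1\t630\t2e-30\t120",
    "MCP_query\tread_2\t30.1\t80\t50\t3\t10\t89\t1\t240\t0.5\t28.1",
    "MCP_query\tread_3\t55.0\t200\t80\t4\t8\t207\t1\t600\t1e-20\t95.0",
])
recruited = recruit_reads(sample, max_evalue=1e-5)  # threshold is an assumption

# Hypothetical best hits of each recruited read in a reference database.
best_reference_hit = {"read_1": "cyanopodovirus", "read_3": "other phage"}
confirmed = confirm_reads(recruited, best_reference_hit, {"cyanopodovirus"})
print(confirmed)
```

Separating recruitment from confirmation keeps the permissive first pass from deciding classification on its own; only reads whose best reference hit is a cyanopodovirus are counted.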
All MCP sequences longer than 200 aa were used to construct the phylogenetic tree. Sequences were aligned using Clustal X, and phylogenetic trees were constructed using the neighbour-joining, minimum-evolution and maximum-parsimony algorithms of the MEGA 3.0 software42. Support for the tree topologies was assessed by bootstrap resampling with 1000 replicates.
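For readers unfamiliar with the bootstrap test, the resampling step can be illustrated as follows. MEGA performs this internally, so the sketch is purely didactic: the toy alignment, the use of p-distance and the function names are assumptions, and no tree is built here.

```python
import random

def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Toy protein alignment (hypothetical sequences, all of equal length).
toy = {"seqA": "MKTAYIAK", "seqB": "MKTAYLAK", "seqC": "MRTSYLGK"}

rng = random.Random(0)  # fixed seed so the example is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]

# One distance per replicate; a tree would be rebuilt from each replicate
# and the frequency of each branch reported as its bootstrap support.
distances = [p_distance(rep["seqA"], rep["seqB"]) for rep in replicates]
print(len(distances), p_distance(toy["seqA"], toy["seqB"]))
```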
This work was supported by the 973 Program (2013CB955700, 2011CB808800), the 863 Program (2012AA092003) and the NSFC (41376132). Both QZ and RZ were supported by the Fundamental Research Funds for the Central Universities (2013121051 and 2012121052 respectively). FC was supported by the Xiamen University 111 Program and Hanse-Wissenschaftskolleg Fellowship. Professor John Hodgkiss of The University of Hong Kong is thanked for polishing the English in this manuscript.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/