Viral tagging reveals discrete populations in Synechococcus viral genome sequence space

This article has been updated


Microbes and their viruses drive myriad processes across ecosystems ranging from oceans and soils to bioreactors and humans (refs 1–4). Despite this importance, microbial diversity is only now being mapped at scales relevant to nature (ref. 5), while the viral diversity associated with any particular host remains little researched. Here we quantify host-associated viral diversity using viral-tagged metagenomics, which links viruses to specific host cells for high-throughput screening and sequencing. In a single experiment, we screened 10⁷ Pacific Ocean viruses against a single strain of Synechococcus and found that naturally occurring cyanophage genome sequence space is statistically clustered into discrete populations. These population-based, host-linked viral ecological data suggest that, for this single host and seawater sample alone, there are at least 26 double-stranded DNA viral populations with estimated relative abundances ranging from 0.06 to 18.2%. These populations include previously cultivated cyanophage and new viral types missed by decades of isolate-based studies. Nucleotide identities of homologous genes mostly varied by less than 1% within populations, even in hypervariable genome regions, and by 42–71% between populations, which provides benchmarks for viral metagenomics and genome-based viral species definitions. Together these findings showcase a new approach to viral ecology that quantitatively links objectively defined environmental viral populations, and their genomes, to their hosts.

Figure 1: Population genome landscape plot showing the genetic relationship of cultivated and viral-tagged T4-like phages of Synechococcus WH7803 from a single seawater sample and all available marine cyanophage genomes.
Figure 2: The 15 dominant T4-like Candidatus genomes assembled from the viral-tagged metagenome.

Accession codes

Data deposits

Data for viral genomes have been deposited in GenBank under accession numbers JN371768 and KF156338-40; metagenomic data have been deposited in CAMERA under accession numbers CAM_P_0001068 and CAM_P_0000915; raw data including gp23 sequences and informatic pipelines, assemblies and data for figures are available at

Change history

  • 10 September 2014

    Minor changes were made to Extended Data Fig. 2.


  1. Bergh, O., Børsheim, K. Y., Bratbak, G. & Heldal, M. High abundance of viruses found in aquatic environments. Nature 340, 467–468 (1989)
  2. Breitbart, M. Marine viruses: truth or dare. Annu. Rev. Mar. Sci. 4, 425–448 (2012)
  3. Suttle, C. A. Marine viruses—major players in the global ecosystem. Nature Rev. Microbiol. 5, 801–812 (2007)
  4. Hurwitz, B. L., Hallam, S. J. & Sullivan, M. B. Metabolic reprogramming by viruses in the sunlit and dark ocean. Genome Biol. 14, R123 (2013)
  5. Flombaum, P. et al. Present and future global distributions of the marine Cyanobacteria Prochlorococcus and Synechococcus. Proc. Natl Acad. Sci. USA 110, 9824–9829 (2013)
  6. Holmfeldt, K., Odić, D., Sullivan, M. B., Middelboe, M. & Riemann, L. Cultivated single-stranded DNA phages that infect marine Bacteroidetes prove difficult to detect with DNA-binding stains. Appl. Environ. Microbiol. 78, 892–894 (2012)
  7. Hatfull, G. F., Jacobs-Sera, D. & Lawrence, J. G. Comparative genomic analysis of 60 mycobacteriophage genomes: genome clustering, gene acquisition, and gene size. J. Mol. Biol. 397, 119–143 (2010)
  8. Lavigne, R., Seto, D., Mahadevan, P., Ackermann, H.-W. & Kropinski, A. M. Unifying classical and molecular taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res. Microbiol. 159, 406–414 (2008)
  9. Sullivan, M. B. et al. Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ. Microbiol. 12, 3035–3056 (2010)
  10. Mann, N. H. et al. The genome of S-PM2, a ‘photosynthetic’ T4-type bacteriophage that infects marine Synechococcus strains. J. Bacteriol. 187, 3188–3200 (2005)
  11. Deng, L., Gregory, A., Yilmaz, S., Poulos, B. T., Hugenholtz, P. & Sullivan, M. B. Contrasting life strategies of viruses that infect photo- and heterotrophic bacteria, as revealed by viral tagging. MBio 3, e00373–12 (2012)
  12. Duhaime, M. B., Deng, L., Poulos, B. T. & Sullivan, M. B. Towards quantitative metagenomics of wild viruses and other ultra-low concentration DNA samples: a rigorous assessment and optimization of the linker amplification method. Environ. Microbiol. 14, 2526–2537 (2012)
  13. Weitz, J. S. et al. Phage–bacteria infection networks. Trends Microbiol. 21, 82–91 (2013)
  14. Sharon, I. et al. Comparative metagenomics of microbial traits within oceanic viral communities. ISME J. 5, 1178–1190 (2011)
  15. Sharon, I. et al. Photosystem I gene cassettes are present in marine virus genomes. Nature 461, 258–262 (2009)
  16. Sullivan, M. B., Coleman, M. L., Weigele, P., Rohwer, F. & Chisholm, S. W. Three Prochlorococcus cyanophage genomes: signature features and ecological interpretations. PLoS Biol. 3, e144 (2005)
  17. Millard, A. D., Zwirglmaier, K., Downey, M. J., Mann, N. H. & Scanlan, D. J. Comparative genomics of marine cyanomyoviruses reveals the widespread occurrence of Synechococcus host genes localized to a hyperplastic region: implications for mechanisms of cyanophage evolution. Environ. Microbiol. 11, 2370–2387 (2009)
  18. Angly, F. et al. Genomic analysis of multiple Roseophage SIO1 strains. Environ. Microbiol. 11, 2863–2873 (2009)
  19. Konstantinidis, K. T. Genomic insights that advance the species definition for prokaryotes. Proc. Natl Acad. Sci. USA 102, 2567–2572 (2005)
  20. Lawrence, J. G., Hatfull, G. F. & Hendrix, R. W. Imbroglios of viral taxonomy: genetic exchange and failings of phenetic approaches. J. Bacteriol. 184, 4891–4905 (2002)
  21. Hendrix, R. W., Lawrence, J. G., Hatfull, G. F. & Casjens, S. The origins and ongoing evolution of viruses. Trends Microbiol. 8, 504–508 (2000)
  22. Hatfull, G. F. The secret lives of mycobacteriophages. Adv. Virus Res. 82, 179–288 (2012)
  23. Hendrix, R. W., Smith, M. C., Burns, R. N., Ford, M. E. & Hatfull, G. F. Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc. Natl Acad. Sci. USA 96, 2192–2197 (1999)
  24. Marston, M. F. & Amrich, C. G. Recombination and microdiversity in coastal marine cyanophages. Environ. Microbiol. 11, 2893–2903 (2009)
  25. Ignacio-Espinoza, J. C. & Sullivan, M. B. Phylogenomics of T4 cyanophages: lateral gene transfer in the ‘core’ and origins of host genes. Environ. Microbiol. 14, 2113–2126 (2012)
  26. Labonté, J. M. & Suttle, C. A. Previously unknown and highly divergent ssDNA viruses populate the oceans. ISME J. 7, 2169–2177 (2013)
  27. Polz, M. F., Alm, E. J. & Hanage, W. P. Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends Genet. 29, 170–175 (2013)
  28. Whittaker, R. H. Dominance and diversity in land plant communities: numerical relations of species express the importance of competition in community function and evolution. Science 147, 250–260 (1965)
  29. Ceyssens, P. J. et al. Phenotypic and genotypic variations within a single bacteriophage species. Virol. J. 8, 134 (2011)
  30. Shapiro, B. J. et al. Population genomics of early events in the ecological differentiation of bacteria. Science 336, 48–51 (2012)
  31. Waterbury, J. B., Watson, S. W., Valois, F. W. & Franks, D. G. Biological and ecological characterization of the marine unicellular cyanobacterium Synechococcus. Can. Bull. Fish. Aquat. Sci. 214, 71–120 (1986)
  32. John, S. G. et al. A simple and efficient method for concentration of ocean viruses by chemical flocculation. Environ. Microbiol. Rep. 3, 195–202 (2011)
  33. Hurwitz, B. L., Deng, L., Poulos, B. P. & Sullivan, M. B. Evaluation of methods to concentrate and purify ocean virus communities through comparative, replicated metagenomics. Environ. Microbiol. 15, 1428–1440 (2013)
  34. Deng, L. & Hayes, P. K. Evidence for cyanophages active against bloom-forming freshwater cyanobacteria. Freshw. Biol. 53, 1240–1252 (2008)
  35. Suttle, C. A. & Chan, A. M. Marine cyanophages infecting oceanic and coastal strains of Synechococcus: abundance, morphology, cross-infectivity and growth characteristics. Mar. Ecol. Prog. Ser. 92, 99–109 (1993)
  36. Waterbury, J. B. & Valois, F. W. Resistance to co-occurring phages enables marine Synechococcus communities to coexist with cyanophage abundant in seawater. Appl. Environ. Microbiol. 59, 3393–3399 (1993)
  37. Wilson, W. H., Joint, I. R., Carr, N. G. & Mann, N. H. Isolation and molecular characterization of five marine cyanophages propagated on Synechococcus sp. strain WH 7803. Appl. Environ. Microbiol. 59, 3736–3743 (1993)
  38. Fuller, N. J., Wilson, W. H., Joint, I. R. & Mann, N. H. Occurrence of a sequence in marine cyanophages similar to that of T4 g20 and its application to PCR-based detection and quantification techniques. Appl. Environ. Microbiol. 64, 2051–2060 (1998)
  39. Lu, J., Chen, F. & Hodson, R. E. Distribution, isolation, host specificity, and diversity of cyanophages infecting marine Synechococcus spp. in river estuaries. Appl. Environ. Microbiol. 67, 3285–3290 (2001)
  40. Chen, F. & Lu, J. Genomic sequence and evolution of marine cyanophage P60: a new insight on lytic and lysogenic phages. Appl. Environ. Microbiol. 68, 2589–2594 (2002)
  41. Marston, M. F. & Sallee, J. L. Genetic diversity and temporal variation in the cyanophage community infecting marine Synechococcus species in Rhode Island’s coastal waters. Appl. Environ. Microbiol. 69, 4639–4647 (2003)
  42. Sullivan, M. B., Waterbury, J. B. & Chisholm, S. W. Cyanophages infecting the oceanic cyanobacterium Prochlorococcus. Nature 424, 1047–1051 (2003)
  43. Wang, K. & Chen, F. Prevalence of highly host-specific cyanophages in the estuarine environment. Environ. Microbiol. 10, 300–312 (2008)
  44. Kuznetsov, Y. G., Chang, S.-C., Credaroli, A., Martiny, J. & McPherson, A. An atomic force microscopy investigation of cyanophage structure. Micron 43, 1336–1342 (2012)
  45. Solonenko, S. A. & Sullivan, M. B. Preparation of metagenomic libraries from naturally occurring marine viruses. Methods Enzymol. 531, 143–165 (2013)
  46. Holmfeldt, K. et al. Twelve previously unknown phage genera are ubiquitous in the global oceans. Proc. Natl Acad. Sci. USA 110, 12798–12803 (2013)
  47. Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–467 (2011)
  48. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005)
  49. Hyatt, D., Chen, G. L., Locascio, P. F., Land, M. L., Larimer, F. W. & Hauser, L. J. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010)
  50. Niu, B., Fu, L., Sun, S. & Li, W. Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 11, 187 (2010)
  51. Yooseph, S. et al. The Sorcerer II global ocean sampling expedition: expanding the universe of protein families. PLoS Biol. 5, e16 (2007)
  52. Chao, A. & Lee, S. M. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87, 210–217 (1992)
  53. Simpson, E. H. Measurement of diversity. Nature 163, 688 (1949)
  54. Akhter, S., Aziz, R. K. & Edwards, R. A. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combine similarity- and composition-based strategies. Nucleic Acids Res. 40, e126 (2012)
  55. Kettler, G. C. et al. Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 3, e231 (2007)
  56. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004)
  57. Dunn, J. C. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Cybern. Syst. 3, 32–57 (1973)
  58. Coleman, M. L. et al. Genomic islands and the ecology and evolution of Prochlorococcus. Science 311, 1768–1770 (2006)


Funding was provided by the US Department of Energy (DOE) Joint Genome Institute (JGI) Community Sequencing Program, Biosphere 2, BIO5, US National Science Foundation (NSF) OCE0940390, and Gordon and Betty Moore Foundation grants to M.B.S., as well as NSF OCE1233760 and Burroughs Wellcome Fund grants to J.S.W. We thank J. Fuhrman for suggesting stable-isotope-labelled host DNA; A. Z. Worden and the CANON Initiative for the cruise opportunity; Worden laboratory members; the captain and crew of the R/V Western Flyer for operational/sampling support; J. B. Waterbury, S. W. Chisholm and A. Wichels for strains; the Tucson Marine Phage laboratory; Institute of Groundwater Ecology of Helmholtz Munich; and N. Pace, M. Young, S. W. Chisholm and S. Yilmaz for technical/analytical support and manuscript comments. We acknowledge the University of Arizona Genetics Core for viral-tagging metagenomic sequencing; iCyt and the Arizona Cancer Center and Arizona Research Laboratories (ARL) Division of Biotechnology Cytometry Core Facility for cytometry support; the University Information Technology Services Research Computing Group and the ARL Biotechnology Computing for high-performance computing clusters (HPCC) access and support. Community metagenomic sequencing was provided by the DOE JGI Community Sequencing Program under the Office of Science of the US DOE contract no. DE-AC02-05CH11231.

Author information




L.D., P.H. and M.B.S. designed the experiments. L.D. collected samples. L.D., A.C.G. and B.T.P. performed the experiments. L.D., J.C.I.-E., J.S.W., P.H. and M.B.S. analysed data, interpreted results and wrote the paper.

Corresponding author

Correspondence to Matthew B. Sullivan.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Agarose gel of PCR products used for screening the 97 cyanophage isolates derived from this study.

The primers used are well established in the literature and amplify a 400 bp region of the gene encoding the portal protein (gp20) of T4-like phages. ΦC refers to phages S-MbC.

Extended Data Figure 2 The viral-tagging metagenome is less complex than the whole viral community metagenome.

a, Diversity of the viral-tagging (VT) metagenome is five- to tenfold lower than that of the community metagenome according to different metrics applied to protein clusters; only 17.5% (1,360 of 7,762) of the viral-tagging protein clusters occur in the community metagenome (Venn diagram). b, The viral taxonomic profiles of each metagenome, assigned by BLASTx search (e-value <0.001) against all phage genomes present in NCBI (1,218 genomes, December 2013) and compared with the designations from cultured isolates (n indicates the number of reads on top of metagenome bars and the number of phage isolates on top of isolate bars; the percentage on metagenome bars represents the percentage of reads used).
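As a minimal sketch of the kind of cluster-level comparison described in panel a, the Python below computes the share of one metagenome's protein clusters found in another, plus Simpson's index (ref. 53) as one possible diversity metric. The caption does not name the metrics used, so the choice of Simpson's index, the function names and the toy inputs are assumptions for illustration only.

```python
def shared_fraction(query_clusters, reference_clusters):
    """Fraction of query protein clusters that also occur in the reference set."""
    query = set(query_clusters)
    return len(query & set(reference_clusters)) / len(query)


def simpson_index(counts):
    """Simpson's index: probability that two members drawn without
    replacement belong to the same cluster (lower means more diverse)."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Counts quoted in the caption: 1,360 of 7,762 viral-tagging clusters
# also occur in the community metagenome.
percent_shared = 100 * 1360 / 7762
```

With the caption's counts, `percent_shared` rounds to the reported 17.5%.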

Extended Data Figure 3 Candidatus genomes assembled from the dominant viral-tagging metagenome populations that were not T4-like myoviruses.

a, Black boxes represent the predicted ORFs; blue box plots reflect the intrapopulation locus-to-locus variation as described in Fig. 2. A cumulative distribution plot of the genome-wide locus-by-locus percentage nucleotide identity is represented to the right of each genome. Colours denote the taxonomic assignment for each gene, based on blastp best hits to nr; for detailed annotation, refer to Supplementary Data 1. CGs, Candidatus genomes. b, Normalized (corrected for contig length) coverage of all Candidatus genomes including the rare non-T4-like ones (in dark grey). c, Quantification of the relative abundances of the non-T4-like Candidatus genomes.
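Length-normalized coverage, as in panel b, can be illustrated with a short Python sketch. It assumes that coverage is the total number of recruited bases divided by contig length; the exact normalization used for the figure is not stated in the caption, and the function and variable names are hypothetical.

```python
def normalized_coverage(recruited_read_lengths, contig_lengths):
    """Mean per-base coverage of each contig: recruited bases / contig length.

    recruited_read_lengths maps contig name -> list of recruited read lengths (bp);
    contig_lengths maps contig name -> contig length (bp).
    """
    return {
        name: sum(recruited_read_lengths.get(name, [])) / length
        for name, length in contig_lengths.items()
    }


# Hypothetical example: fifty 100 bp reads on a 1 kb contig give 5x coverage.
example = normalized_coverage({"contig_A": [100] * 50},
                              {"contig_A": 1000, "contig_B": 2000})
```

Dividing by length keeps long contigs from appearing more abundant simply because they recruit more reads.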

Extended Data Figure 4 Genome sizing and location in silico experiments.

a, Variation along the genome was investigated in silico by breaking the genome into 30 kb fragments using a 5 kb sliding window and comparing the similarity profile (ANI of the genome or fragment versus the reference genomes) of each fragment to that derived from the whole genome (only a subset is shown). Where Pearson’s r is high (>0.95, the right side of the genomes in blue) the fragment profile parallels that of a genome-wide profile. b, These similarity profiles were converted to a correlation distance (1 − Pearson’s r) and then clustered using hierarchical clustering (linkage = ‘complete’ or furthest distance). Comparison of the clustering patterns showed that 30 kb fragments from within the ‘blue’ region of the genome are more closely related to those derived from their own genomes than other genomes, except for the co-isolated phages P-HM1 and P-HM2, which were the most similar genomes in the data set.
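The windowing, correlation distance and complete-linkage steps named in this caption can be sketched in plain Python as follows. This is an illustrative toy rather than the pipeline used for the figure: the ANI profiles are invented numbers, the function names are hypothetical, and the naive clustering loop is only suitable for small inputs.

```python
import math


def sliding_windows(genome_length, window=30_000, step=5_000):
    """Return (start, end) coordinates of fixed-size fragments along a genome."""
    return [(s, s + window) for s in range(0, genome_length - window + 1, step)]


def pearson_r(x, y):
    """Pearson correlation between two equally long similarity profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(x, y):
    """Distance used in the caption: 1 - Pearson's r."""
    return 1.0 - pearson_r(x, y)


def complete_linkage(profiles):
    """Naive agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the furthest member pair.
                d = max(correlation_distance(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical ANI profiles (percentage identity to three reference genomes).
profiles = {"whole": [90, 70, 50], "frag1": [88, 71, 52], "frag2": [50, 70, 90]}
merges = complete_linkage(profiles)
```

In this toy input the fragment whose profile tracks the whole genome merges with it first, which mirrors the caption's point that fragments with high Pearson's r behave like the genome they came from.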

Extended Data Figure 5 Sensitivity analyses for recruitment parameters.

Recruitment, as described in the main text, required 95% nucleotide identity over 95% of the length of the reads, whereas here we examine lower stringency recruitment including 90% nucleotide identity over 90% of the length of the read, and 80% nucleotide identity over 90% of the length of the read. These results were consistent with those described in the main text (Fig. 2)—that is, most of the recruited reads are in the top 2% (see histogram on the right). Only four representative Candidatus genomes (CGs) are shown here.
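The three stringency settings compared here can be expressed as a simple filter. The Python below is a sketch under the assumption that each alignment hit reports a percentage identity, an aligned length and the read length; the hit tuples are made-up values, and the function name is hypothetical.

```python
def recruits(identity_pct, aligned_length, read_length,
             min_identity=95.0, min_read_fraction=0.95):
    """True if a hit meets both the identity and the read-coverage threshold."""
    return (identity_pct >= min_identity
            and aligned_length / read_length >= min_read_fraction)


# Stringency settings named in the caption: (identity %, fraction of read length).
SETTINGS = [(95.0, 0.95), (90.0, 0.90), (80.0, 0.90)]

# Hypothetical hits: (identity %, aligned length in bp, read length in bp).
hits = [(99.0, 400, 410), (92.0, 380, 400), (85.0, 370, 400), (97.0, 200, 400)]

recruited_counts = [
    sum(recruits(i, a, r, min_identity=ident, min_read_fraction=frac)
        for i, a, r in hits)
    for ident, frac in SETTINGS
]
```

Relaxing the thresholds can only add reads, so the counts are non-decreasing across the three settings; the last hypothetical read is never recruited because only half of it aligns.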

Extended Data Figure 6 Comparison of neighbour-joining trees derived from viral-tagging and phylogenomic analyses.

The left panel represents Euclidean distances of the three-dimensional space reconstructed with the first three principal components in Fig. 1. The right panel represents the currently accepted cyano-T4 phage core phylogeny derived from analysis of 57 concatenated proteins totalling 20,638 amino acids.
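The distance step behind the left panel can be sketched as below, assuming each genome is represented by its coordinates on the first three principal components. The coordinates and names are hypothetical, and the neighbour-joining step itself is not implemented here; the resulting distance table would be passed to a separate tree-building tool.

```python
import math
from itertools import combinations


def pairwise_euclidean(coords):
    """Euclidean distance between every pair of points in PC1-PC3 space."""
    return {(a, b): math.dist(coords[a], coords[b])
            for a, b in combinations(coords, 2)}


# Hypothetical principal-component coordinates for three genomes.
pc_coords = {"A": (0.0, 0.0, 0.0), "B": (3.0, 4.0, 0.0), "C": (3.0, 4.0, 12.0)}
distances = pairwise_euclidean(pc_coords)
```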

Extended Data Figure 7 Exploring viral-tagging metagenomic population sequence space.

a, Whole genome comparisons of isolates S-MbCM25 and S-MbCM6 show that they are part of the same population as CG-05, while isolate S-MbCM7 appears to be part of the CG-11 population (lines connecting reciprocal blast hits >95% identity). Note in Fig. 1 how the variation (‘cloud’) associated with CG-05 and CG-11 overlap with their representative isolates. b, Alignment, based on homologous sequences within each contig, of all assembled T4-like viral-tagging contigs (>1.5 kb) against CG-01 as a reference genome. At the deepest point (around the 205 kb mark, orange bar) there are a total of 14 to 17 overlapping contigs. c, Rank abundance curve for the 26 most abundant Candidatus genomes (CGs) in the viral-tagging source waters. Values are derived from mean contig coverage values. The blue line quantifies the cumulative use of reads as more genomes are added.
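A rank abundance curve with a cumulative line, as in panel c, can be built from mean coverage values roughly as follows. This sketch assumes relative abundance is each genome's mean coverage divided by the summed coverage of the genomes listed; the coverage numbers and genome labels are placeholders, not data from the study.

```python
def rank_abundance(mean_coverage):
    """Rank genomes by mean coverage.

    Returns a list of (rank, name, relative_abundance, cumulative_fraction),
    where abundances are relative to the genomes supplied.
    """
    total = sum(mean_coverage.values())
    ranked = sorted(mean_coverage.items(), key=lambda kv: kv[1], reverse=True)
    rows, cumulative = [], 0.0
    for rank, (name, coverage) in enumerate(ranked, start=1):
        relative = coverage / total
        cumulative += relative
        rows.append((rank, name, relative, cumulative))
    return rows


# Placeholder mean coverage values for three hypothetical genomes.
curve = rank_abundance({"genome_x": 30.0, "genome_y": 60.0, "genome_z": 10.0})
```

The cumulative column plays the role of the blue line: it shows how much of the total is accounted for as each additional genome is included.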

Extended Data Figure 8 Flow diagram describing the bioinformatics processing steps.

a, We recruited reads to each Candidatus genome requiring at least 95% identity and a coverage of 95% of the entire length of the read. Each read was non-redundantly assigned and aligned to a Candidatus genome using default parameters in MUSCLE. b, For each Candidatus genome population, we generated 100 random Candidatus genome sequences by probabilistically resampling (using the observed occurrences) the metagenomic data that went into generating their consensus sequences.
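One way to picture the resampling in panel b is to draw each genome position from the bases observed at that position, weighted by how often each was seen. The Python below is a sketch under that assumption; the per-position count format, the function names and the toy counts are hypothetical, and the actual procedure may differ in detail.

```python
import random


def consensus(base_counts):
    """Most frequently observed base at each aligned position."""
    return "".join(max(column, key=column.get) for column in base_counts)


def resample_genomes(base_counts, n=100, seed=0):
    """Draw n sequences, sampling each position in proportion to observed base counts."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    genomes = []
    for _ in range(n):
        bases = [
            rng.choices(list(column), weights=list(column.values()))[0]
            for column in base_counts
        ]
        genomes.append("".join(bases))
    return genomes


# Toy per-position counts from reads aligned to a four-base stretch.
counts = [{"A": 10}, {"C": 9, "T": 1}, {"G": 10}, {"T": 5, "A": 5}]
samples = resample_genomes(counts)
```

Positions where every read agrees stay fixed across all resampled sequences, while variable positions differ in proportion to the observed base frequencies.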

Extended Data Table 1 Phage isolates used in the study of phage infectivity as indicated by viral tagging and plaque assay
Extended Data Table 2 Pairwise nucleotide and amino acid identity calculated between all shared genes for each pair of Candidatus genomes

Supplementary information

Supplementary Table 1

Morphological taxonomic assignment of published viral isolates on Synechococcus WH7803. (PDF 106 kb)

Supplementary Data 1

Annotation of open reading frames (ORFs) on 26 CGs (worksheet 1) and all small contigs larger than 1.5 kb (worksheet 2). (XLS 566 kb)

Supplementary Data 2

This file contains source data for Table 1 (in the main paper), Extended Data Figure 3 and Extended Data Figure 7. (XLSX 37 kb)

Cite this article

Deng, L., Ignacio-Espinoza, J., Gregory, A. et al. Viral tagging reveals discrete populations in Synechococcus viral genome sequence space. Nature 513, 242–245 (2014).
