Microbes and their viruses drive myriad processes across ecosystems ranging from oceans and soils to bioreactors and humans1,2,3,4. Despite this importance, microbial diversity is only now being mapped at scales relevant to nature5, while the viral diversity associated with any particular host remains little researched. Here we quantify host-associated viral diversity using viral-tagged metagenomics, which links viruses to specific host cells for high-throughput screening and sequencing. In a single experiment, we screened 10^7 Pacific Ocean viruses against a single strain of Synechococcus and found that naturally occurring cyanophage genome sequence space is statistically clustered into discrete populations. These population-based, host-linked viral ecological data suggest that, for this single host and seawater sample alone, there are at least 26 double-stranded DNA viral populations with estimated relative abundances ranging from 0.06 to 18.2%. These populations include previously cultivated cyanophage and new viral types missed by decades of isolate-based studies. Nucleotide identities of homologous genes mostly varied by less than 1% within populations, even in hypervariable genome regions, and by 42–71% between populations, which provides benchmarks for viral metagenomics and genome-based viral species definitions. Together these findings showcase a new approach to viral ecology that quantitatively links objectively defined environmental viral populations, and their genomes, to their hosts.
Data for viral genomes have been deposited in GenBank under accession numbers JN371768 and KF156338-40; metagenomic data have been deposited in CAMERA under accession numbers CAM_P_0001068 and CAM_P_0000915; raw data including gp23 sequences and informatic pipelines, assemblies and data for figures are available at http://datadryad.org/resource/doi:10.5061/dryad.gr3ks.
Bergh, O., Børsheim, K. Y., Bratbak, G. & Heldal, M. High abundance of viruses found in aquatic environments. Nature 340, 467–468 (1989)
Breitbart, M. Marine viruses: truth or dare. Annu. Rev. Mar. Sci. 4, 425–448 (2012)
Suttle, C. A. Marine viruses—major players in the global ecosystem. Nature Rev. Microbiol. 5, 801–812 (2007)
Hurwitz, B. L., Hallam, S. J. & Sullivan, M. B. Metabolic reprogramming by viruses in the sunlit and dark ocean. Genome Biol. 14, R123 (2013)
Flombaum, P. et al. Present and future global distributions of the marine Cyanobacteria Prochlorococcus and Synechococcus. Proc. Natl Acad. Sci. USA 110, 9824–9829 (2013)
Holmfeldt, K., Odić, D., Sullivan, M. B., Middelboe, M. & Riemann, L. Cultivated single-stranded DNA phages that infect marine Bacteroidetes prove difficult to detect with DNA-binding stains. Appl. Environ. Microbiol. 78, 892–894 (2012)
Hatfull, G. F., Jacobs-Sera, D. & Lawrence, J. G. Comparative genomic analysis of 60 mycobacteriophage genomes: genome clustering, gene acquisition, and gene size. J. Mol. Biol. 397, 119–143 (2010)
Lavigne, R., Seto, D., Mahadevan, P., Ackermann, H.-W. & Kropinski, A. M. Unifying classical and molecular taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res. Microbiol. 159, 406–414 (2008)
Sullivan, M. B. et al. Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ. Microbiol. 12, 3035–3056 (2010)
Mann, N. H. et al. The genome of S-PM2, a ‘photosynthetic’ T4-type bacteriophage that infects marine Synechococcus strains. J. Bacteriol. 187, 3188–3200 (2005)
Deng, L., Gregory, A., Yilmaz, S., Poulos, B. T., Hugenholtz, P. & Sullivan, M. B. Contrasting life strategies of viruses that infect photo- and heterotrophic bacteria, as revealed by viral tagging. MBio 3, e00373–12 (2012)
Duhaime, M. B., Deng, L., Poulos, B. T. & Sullivan, M. B. Towards quantitative metagenomics of wild viruses and other ultra-low concentration DNA samples: a rigorous assessment and optimization of the linker amplification method. Environ. Microbiol. 14, 2526–2537 (2012)
Weitz, J. S. et al. Phage–bacteria infection networks. Trends Microbiol. 21, 82–91 (2013)
Sharon, I. et al. Comparative metagenomics of microbial traits within oceanic viral communities. ISME J. 5, 1178–1190 (2011)
Sharon, I. et al. Photosystem I gene cassettes are present in marine virus genomes. Nature 461, 258–262 (2009)
Sullivan, M. B., Coleman, M. L., Weigele, P., Rohwer, F. & Chisholm, S. W. Three Prochlorococcus cyanophage genomes: signature features and ecological interpretations. PLoS Biol. 3, e144 (2005)
Millard, A. D., Zwirglmaier, K., Downey, M. J., Mann, N. H. & Scanlan, D. J. Comparative genomics of marine cyanomyoviruses reveals the widespread occurrence of Synechococcus host genes localized to a hyperplastic region: implications for mechanisms of cyanophage evolution. Environ. Microbiol. 11, 2370–2387 (2009)
Angly, F. et al. Genomic analysis of multiple Roseophage SIO1 strains. Environ. Microbiol. 11, 2863–2873 (2009)
Konstantinidis, K. T. Genomic insights that advance the species definition for prokaryotes. Proc. Natl Acad. Sci. USA 102, 2567–2572 (2005)
Lawrence, J. G., Hatfull, G. F. & Hendrix, R. W. Imbroglios of viral taxonomy: genetic exchange and failings of phenetic approaches. J. Bacteriol. 184, 4891–4905 (2002)
Hendrix, R. W., Lawrence, J. G., Hatfull, G. F. & Casjens, S. The origins and ongoing evolution of viruses. Trends Microbiol. 8, 504–508 (2000)
Hatfull, G. F. The secret lives of mycobacteriophages. Adv. Virus Res. 82, 179–288 (2012)
Hendrix, R. W., Smith, M. C., Burns, R. N., Ford, M. E. & Hatfull, G. F. Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc. Natl Acad. Sci. USA 96, 2192–2197 (1999)
Marston, M. F. & Amrich, C. G. Recombination and microdiversity in coastal marine cyanophages. Environ. Microbiol. 11, 2893–2903 (2009)
Ignacio-Espinoza, J. C. & Sullivan, M. B. Phylogenomics of T4 cyanophages: lateral gene transfer in the ‘core’ and origins of host genes. Environ. Microbiol. 14, 2113–2126 (2012)
Labonté, J. M. & Suttle, C. A. Previously unknown and highly divergent ssDNA viruses populate the oceans. ISME J. 7, 2169–2177 (2013)
Polz, M. F., Alm, E. J. & Hanage, W. P. Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends Genet. 29, 170–175 (2013)
Whittaker, R. H. Dominance and diversity in land plant communities: numerical relations of species express the importance of competition in community function and evolution. Science 147, 250–260 (1965)
Ceyssens, P. J. et al. Phenotypic and genotypic variations within a single bacteriophage species. Virol. J. 8, 134 (2011)
Shapiro, B. J. et al. Population genomics of early events in the ecological differentiation of bacteria. Science 336, 48–51 (2012)
Waterbury, J. B., Watson, S. W., Valois, F. W. & Franks, D. G. Biological and ecological characterization of the marine unicellular cyanobacterium Synechococcus. Can. Bull. Fish. Aquat. Sci. 214, 71–120 (1986)
John, S. G. et al. A simple and efficient method for concentration of ocean viruses by chemical flocculation. Environ. Microbiol. Rep. 3, 195–202 (2011)
Hurwitz, B. L., Deng, L., Poulos, B. T. & Sullivan, M. B. Evaluation of methods to concentrate and purify ocean virus communities through comparative, replicated metagenomics. Environ. Microbiol. 15, 1428–1440 (2013)
Deng, L. & Hayes, P. K. Evidence for cyanophages active against bloom-forming freshwater cyanobacteria. Freshw. Biol. 53, 1240–1252 (2008)
Suttle, C. A. & Chan, A. M. Marine cyanophages infecting oceanic and coastal strains of Synechococcus: abundance, morphology, cross-infectivity and growth characteristics. Mar. Ecol. Prog. Ser. 92, 99–109 (1993)
Waterbury, J. B. & Valois, F. W. Resistance to co-occurring phages enables marine Synechococcus communities to coexist with cyanophage abundant in seawater. Appl. Environ. Microbiol. 59, 3393–3399 (1993)
Wilson, W. H., Joint, I. R., Carr, N. G. & Mann, N. H. Isolation and molecular characterization of five marine cyanophages propagated on Synechococcus sp. strain WH 7803. Appl. Environ. Microbiol. 59, 3736–3743 (1993)
Fuller, N. J., Wilson, W. H., Joint, I. R. & Mann, N. H. Occurrence of a sequence in marine cyanophages similar to that of T4 g20 and its application to PCR-based detection and quantification techniques. Appl. Environ. Microbiol. 64, 2051–2060 (1998)
Lu, J., Chen, F. & Hodson, R. E. Distribution, isolation, host specificity, and diversity of cyanophages infecting marine Synechococcus spp. in river estuaries. Appl. Environ. Microbiol. 67, 3285–3290 (2001)
Chen, F. & Lu, J. Genomic sequence and evolution of marine cyanophage P60: a new insight on lytic and lysogenic phages. Appl. Environ. Microbiol. 68, 2589–2594 (2002)
Marston, M. F. & Sallee, J. L. Genetic diversity and temporal variation in the cyanophage community infecting marine Synechococcus species in Rhode Island’s coastal waters. Appl. Environ. Microbiol. 69, 4639–4647 (2003)
Sullivan, M. B., Waterbury, J. B. & Chisholm, S. W. Cyanophages infecting the oceanic cyanobacterium Prochlorococcus. Nature 424, 1047–1051 (2003)
Wang, K. & Chen, F. Prevalence of highly host-specific cyanophages in the estuarine environment. Environ. Microbiol. 10, 300–312 (2008)
Kuznetsov, Y. G., Chang, S.-C., Credaroli, A., Martiny, J. & McPherson, A. An atomic force microscopy investigation of cyanophage structure. Micron 43, 1336–1342 (2012)
Solonenko, S. A. & Sullivan, M. B. Preparation of metagenomic libraries from naturally occurring marine viruses. Methods Enzymol. 531, 143–165 (2013)
Holmfeldt, K. et al. Twelve previously unknown phage genera are ubiquitous in the global oceans. Proc. Natl Acad. Sci. USA 110, 12798–12803 (2013)
Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–467 (2011)
Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005)
Hyatt, D., Chen, G. L., Locascio, P. F., Land, M. L., Larimer, F. W. & Hauser, L. J. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010)
Niu, B., Fu, L., Sun, S. & Li, W. Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 11, 187 (2010)
Yooseph, S. et al. The Sorcerer II global ocean sampling expedition: expanding the universe of protein families. PLoS Biol. 5, e16 (2007)
Chao, A. & Lee, S. M. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87, 210–217 (1992)
Simpson, E. H. Measurement of diversity. Nature 163, 688 (1949)
Akhter, S., Aziz, R. K. & Edwards, R. A. PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combine similarity- and composition-based strategies. Nucleic Acids Res. 40, e126 (2012)
Kettler, G. C. et al. Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 3, e231 (2007)
Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004)
Dunn, J. C. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Cybern. Syst. 3, 32–57 (1973)
Coleman, M. L. et al. Genomic islands and the ecology and evolution of Prochlorococcus. Science 311, 1768–1770 (2006)
Funding was provided by the US Department of Energy (DOE) Joint Genome Institute (JGI) Community Sequencing Program, Biosphere 2, BIO5, US National Science Foundation (NSF) OCE0940390, and Gordon and Betty Moore Foundation grants to M.B.S., as well as NSF OCE1233760 and Burroughs Wellcome Fund grants to J.S.W. We thank J. Fuhrman for suggesting stable-isotope-labelled host DNA; A. Z. Worden and the CANON Initiative for the cruise opportunity; Worden laboratory members; the captain and crew of the R/V Western Flyer for operational/sampling support; J. B. Waterbury, S. W. Chisholm and A. Wichels for strains; the Tucson Marine Phage laboratory; Institute of Groundwater Ecology of Helmholtz Munich; and N. Pace, M. Young, S. W. Chisholm and S. Yilmaz for technical/analytical support and manuscript comments. We acknowledge the University of Arizona Genetics Core for viral-tagging metagenomic sequencing; iCyt and the Arizona Cancer Center and Arizona Research Laboratories (ARL) Division of Biotechnology Cytometry Core Facility for cytometry support; the University Information Technology Services Research Computing Group and the ARL Biotechnology Computing for high-performance computing clusters (HPCC) access and support. Community metagenomic sequencing was provided by the DOE JGI Community Sequencing Program under the Office of Science of the US DOE contract no. DE-AC02-05CH11231.
The authors declare no competing financial interests.
Extended data figures and tables
Extended Data Figure 1 Agarose gel of PCR products used for screening the 97 cyanophage isolates derived from this study.
The primers are well established in the literature and amplify a ∼400 bp region of the gene encoding the portal protein (gp20) of T4-like phages. ΦC refers to phages S-MbC.
Extended Data Figure 2 The viral-tagging metagenome is less complex than the whole viral community metagenome.
a, Diversity of the viral-tagging (VT) metagenome shows a five-to-tenfold reduction when compared to the community metagenome by different metrics applied to protein clusters, with only 17.5% (1,360 of 7,762) of the viral-tagging protein clusters occurring in the community metagenome (Venn diagram). b, The viral taxonomic profiles from each metagenome were assigned by BLASTx search (e-value <0.001) against all phage genomes present in NCBI (1,218 genomes, December 2013) and compared against the designations from cultured isolates (n indicates number of reads on top of metagenome bars and number of phage isolates on top of isolates bars; percentages on metagenome bars represent the percentage of reads used).
Extended Data Figure 3 Candidatus genomes assembled from the dominant viral-tagging metagenome populations that were not T4-like myoviruses.
a, Black boxes represent the predicted ORFs; blue box plots reflect the intrapopulation locus-to-locus variation as described in Fig. 2. A cumulative distribution plot of the genome-wide locus-by-locus percentage nucleotide identity is shown to the right of each genome. Colours denote the taxonomic assignment for each gene, based on blastp best hits to nr; for detailed annotation, refer to Supplementary Data 1. CGs, Candidatus genomes. b, Normalized (corrected for contig length) coverage of all Candidatus genomes including the rare non-T4-like ones (in dark grey). c, Quantification of the relative abundances of the non-T4-like Candidatus genomes.
Extended Data Figure 4
a, Variation along the genome was investigated by breaking the genome in silico into 30 kb fragments using a 5 kb sliding window and comparing the similarity profile (ANI of the genome or fragment versus the reference genomes) of each fragment to that derived from the whole genome (only a subset is shown). Where Pearson’s r is high (>0.95, the right side of the genomes in blue), the fragment profile parallels the genome-wide profile. b, These similarity profiles were converted to a correlation distance (1 − Pearson’s r) and then clustered using hierarchical clustering (linkage = ‘complete’, or furthest distance). Comparison of the clustering patterns showed that 30 kb fragments from within the ‘blue’ region of the genome are more closely related to those derived from their own genomes than to those from other genomes, except for the co-isolated phages P-HM1 and P-HM2, which were the most similar genomes in the data set.
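The fragment-versus-genome comparison in this caption can be illustrated with a minimal sketch. It assumes that the 5 kb "sliding window" is the step between successive 30 kb fragments, and that each similarity profile is simply a list of ANI values against the same ordered set of reference genomes. The function names and the example profiles are hypothetical and are not taken from the authors' pipeline; the hierarchical clustering step is not reproduced here.

```python
import math


def sliding_windows(genome_length, window=30_000, step=5_000):
    """Return (start, end) coordinates of fixed-size fragments along a genome.

    Assumption: fragments are `window` bp long and start every `step` bp.
    A trailing region shorter than one step is not covered by an extra fragment.
    """
    if genome_length <= window:
        return [(0, genome_length)]
    return [(start, start + window)
            for start in range(0, genome_length - window + 1, step)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length similarity profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def correlation_distance(x, y):
    """Correlation distance as defined in the caption: 1 - Pearson's r."""
    return 1.0 - pearson_r(x, y)


# Hypothetical ANI profiles (percent identity versus four reference genomes).
whole_genome_profile = [98.2, 84.5, 71.0, 66.3]
fragment_profile = [97.9, 85.1, 69.4, 67.0]
print(len(sliding_windows(180_000)), "fragments")
print("correlation distance:", correlation_distance(whole_genome_profile, fragment_profile))
```

A matrix of such pairwise distances could then be passed to any complete-linkage hierarchical clustering routine.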
Extended Data Figure 5
Recruitment, as described in the main text, required 95% nucleotide identity over 95% of the read length. Here we examine lower-stringency recruitment: 90% nucleotide identity over 90% of the read length, and 80% nucleotide identity over 90% of the read length. These results were consistent with those described in the main text (Fig. 2); that is, most of the recruited reads are in the top 2% (see histogram on the right). Only four representative Candidatus genomes (CGs) are shown here.
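The three recruitment stringencies described in this caption amount to a two-part threshold on each read alignment. The sketch below is illustrative only: the `Hit` record, its field names and the example values are hypothetical, and it assumes that identity and aligned length have already been computed by whatever aligner is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One read-to-genome alignment (hypothetical record layout)."""
    read_id: str
    read_length: int       # full length of the read, in bp
    aligned_length: int    # number of read bases covered by the alignment
    percent_identity: float


# (minimum % identity, minimum fraction of the read aligned), from the caption.
STRINGENCIES = {
    "main text": (95.0, 0.95),
    "relaxed": (90.0, 0.90),
    "lowest": (80.0, 0.90),
}


def is_recruited(hit, min_identity=95.0, min_read_coverage=0.95):
    """Return True if the hit meets both the identity and coverage thresholds."""
    coverage = hit.aligned_length / hit.read_length
    return hit.percent_identity >= min_identity and coverage >= min_read_coverage


def recruit(hits, stringency="main text"):
    """Keep the hits that pass the named stringency level."""
    min_identity, min_coverage = STRINGENCIES[stringency]
    return [h for h in hits if is_recruited(h, min_identity, min_coverage)]


# Hypothetical hits, for illustration only.
hits = [
    Hit("read_1", 400, 390, 96.0),
    Hit("read_2", 400, 370, 92.0),
    Hit("read_3", 400, 365, 85.0),
]
for level in STRINGENCIES:
    print(level, [h.read_id for h in recruit(hits, level)])
```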
Extended Data Figure 6 Comparison of neighbour-joining trees derived from viral-tagging and phylogenomic analyses.
The left panel represents Euclidean distances of the three-dimensional space reconstructed with the first three principal components in Fig. 1. The right panel represents the currently accepted cyano-T4 phage core phylogeny derived from analysis of 57 concatenated proteins totalling 20,638 amino acids.
Extended Data Figure 7
a, Whole genome comparisons of isolates S-MbCM25 and S-MbCM6 show that they are part of the same population as CG-05, while isolate S-MbCM7 appears to be part of the CG-11 population (lines connecting reciprocal blast hits >95% identity). Note in Fig. 1 how the variation (‘cloud’) associated with CG-05 and CG-11 overlap with their representative isolates. b, Alignment, based on homologous sequences within each contig, of all assembled T4-like viral-tagging contigs (>1.5 kb) against CG-01 as a reference genome. At the deepest point (around the 205 kb mark, orange bar) there are a total of 14 to 17 overlapping contigs. c, Rank abundance curve for the 26 most abundant Candidatus genomes (CGs) in the viral-tagging source waters. Values are derived from mean contig coverage values. The blue line quantifies the cumulative use of reads as more genomes are added.
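A rank abundance curve of the kind described in panel c can be derived from per-genome mean coverage values roughly as follows. This is a sketch under stated assumptions: relative abundance is taken as each genome's share of the summed mean coverage, and the cumulative column tracks that share rather than the actual number of reads used. The genome names and coverage values are hypothetical, not data from the study.

```python
def rank_abundance(mean_coverage):
    """Rank genomes by mean coverage and report relative and cumulative abundance.

    mean_coverage: dict mapping genome name -> mean contig coverage.
    Returns a list of (rank, name, relative_percent, cumulative_percent).
    """
    total = sum(mean_coverage.values())
    if total <= 0:
        raise ValueError("total coverage must be positive")
    # Sort from most to least abundant.
    ranked = sorted(mean_coverage.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0.0
    for rank, (name, coverage) in enumerate(ranked, start=1):
        relative = 100.0 * coverage / total
        cumulative += relative
        rows.append((rank, name, relative, cumulative))
    return rows


# Hypothetical mean coverage values for three genomes.
example = {"genome_a": 30.0, "genome_b": 60.0, "genome_c": 10.0}
for rank, name, relative, cumulative in rank_abundance(example):
    print(f"{rank}\t{name}\t{relative:.1f}%\t{cumulative:.1f}%")
```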
Extended Data Figure 8
a, We recruited reads to each Candidatus genome requiring at least 95% identity and a coverage of 95% of the entire length of the read. Each read was non-redundantly assigned and aligned to a Candidatus genome using default parameters in MUSCLE. b, For each Candidatus genome population, we generated 100 random Candidatus genome sequences by probabilistically resampling (using the observed occurrences) the metagenomic data that went into generating their consensus sequences.
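One way to read the resampling in panel b is as a per-position draw from the observed base counts in the read alignment. The sketch below assumes exactly that: each alignment column is summarized as a dictionary of base counts, and each random genome samples one base per column with probability proportional to those counts. The caption does not state the unit of resampling, so this per-column scheme, the function name and the example counts are all assumptions.

```python
import random


def resample_genomes(column_counts, n_replicates=100, seed=0):
    """Generate random genome sequences from per-column base counts.

    column_counts: list of dicts, one per alignment column, mapping a base
                   to the number of recruited reads showing that base.
    Each replicate draws one base per column, weighted by observed counts.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    replicates = []
    for _ in range(n_replicates):
        bases = []
        for counts in column_counts:
            choices = list(counts.keys())
            weights = list(counts.values())
            bases.append(rng.choices(choices, weights=weights, k=1)[0])
        replicates.append("".join(bases))
    return replicates


# Hypothetical counts for a three-column alignment.
columns = [{"A": 10}, {"A": 7, "G": 3}, {"C": 10}]
genomes = resample_genomes(columns)
print(len(genomes), "replicates; first five:", genomes[:5])
```

Pairwise identities among such replicates would then give a within-population reference distribution.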
Morphological taxonomic assignation of published viral isolates on Synechococcus WH7803. (PDF 106 kb)
Annotation of open reading frames (ORFs) on 26 CGs (worksheet 1) and all small contigs larger than 1.5 kb (worksheet 2). (XLS 566 kb)
This file contains source data for Table 1 (in the main paper), Extended Data Figure 3 and Extended Data Figure 7. (XLSX 37 kb)
Cite this article
Deng, L., Ignacio-Espinoza, J., Gregory, A. et al. Viral tagging reveals discrete populations in Synechococcus viral genome sequence space. Nature 513, 242–245 (2014). https://doi.org/10.1038/nature13459