Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses

Roux, Simon; Brum, Jennifer R.; Dutilh, Bas E.; Sunagawa, Shinichi; Duhaime, Melissa B.; Loy, Alexander; Poulos, Bonnie T.; Solonenko, Natalie; Lara, Elena; Poulain, Julie; Pesant, Stéphane; Kandels-Lewis, Stefanie; Dimier, Céline; Picheral, Marc; Searson, Sarah; Cruaud, Corinne; Alberti, Adriana; Duarte, Carlos M.; Gasol, Josep M.; Vaqué, Dolors; Bork, Peer; Acinas, Silvia G.; Wincker, Patrick; Sullivan, Matthew B.

doi:10.1038/nature19366

Letter
Published: 21 September 2016

Simon Roux¹,
Jennifer R. Brum¹,
Bas E. Dutilh^2,3,4,
Shinichi Sunagawa^5,27,
Melissa B. Duhaime⁶,
Alexander Loy^7,8,
Bonnie T. Poulos⁹,
Natalie Solonenko¹,
Elena Lara^10,11,
Julie Poulain¹²,
Stéphane Pesant^13,14,
Stefanie Kandels-Lewis^5,15,
Céline Dimier^16,17,18,
Marc Picheral^19,20,
Sarah Searson^19,20,
Corinne Cruaud¹²,
Adriana Alberti¹²,
Carlos M. Duarte^21,22,
Josep M. Gasol¹⁰,
Dolors Vaqué¹⁰,
Tara Oceans Coordinators,
Peer Bork^5,23,
Silvia G. Acinas¹⁰,
Patrick Wincker^12,24,25 &
Matthew B. Sullivan^1,26

Nature volume 537, pages 689–693 (2016)

Abstract

Ocean microbes drive biogeochemical cycling on a global scale¹. However, this cycling is constrained by viruses that affect community composition, metabolic activity, and evolutionary trajectories^2,3. Owing to challenges with the sampling and cultivation of viruses, genome-level viral diversity remains poorly described and grossly understudied, with less than 1% of observed surface-ocean viruses known⁴. Here we assemble complete genomes and large genomic fragments from both surface- and deep-ocean viruses sampled during the Tara Oceans and Malaspina research expeditions^5,6, and analyse the resulting ‘global ocean virome’ dataset to present a global map of abundant, double-stranded DNA viruses complete with genomic and ecological contexts. A total of 15,222 epipelagic and mesopelagic viral populations were identified, comprising 867 viral clusters (defined as approximately genus-level groups^7,8). This roughly triples the number of known ocean viral populations⁴ and doubles the number of candidate bacterial and archaeal virus genera⁸, providing a near-complete sampling of epipelagic communities at both the population and viral-cluster level. We found that 38 of the 867 viral clusters were locally or globally abundant, together accounting for nearly half of the viral populations in any global ocean virome sample. While two-thirds of these clusters represent newly described viruses lacking any cultivated representative, most could be computationally linked to dominant, ecologically relevant microbial hosts. Moreover, we identified 243 viral-encoded auxiliary metabolic genes, of which only 95 were previously known. Deeper analyses of four of these auxiliary metabolic genes (dsrC, soxYZ, P-II (also known as glnB) and amoC) revealed that abundant viruses may directly manipulate sulfur and nitrogen cycling throughout the epipelagic ocean. 
This viral catalog and functional analyses provide a necessary foundation for the meaningful integration of viruses into ecosystem models where they act as key players in nutrient cycling and trophic networks.

**Figure 1: Composition of the Global Ocean Viromes (GOV) dataset.**

**Figure 2: Characterization of the dominant oceanic viral clusters.**

**Figure 3: Characterization and distribution of viral AMGs involved in sulfur and nitrogen cycles.**

References

Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive Earth’s biogeochemical cycles. Science 320, 1034–1039 (2008)
Rohwer, F. & Thurber, R. V. Viruses manipulate the marine environment. Nature 459, 207–212 (2009)
Brum, J. R. & Sullivan, M. B. Rising to the challenge: accelerated pace of discovery transforms marine virology. Nat. Rev. Microbiol. 13, 147–159 (2015)
Brum, J. et al. Patterns and ecological drivers of ocean viral communities. Science 348, 1261498 (2015)
Karsenti, E. et al. A holistic approach to marine eco-systems biology. PLoS Biol. 9, e1001177 (2011)
Duarte, C. M. Seafaring in the 21st century: the Malaspina 2010 circumnavigation expedition. Limnol. Oceanogr. 24, 11–14 (2015)
Lima-Mendez, G., Van Helden, J., Toussaint, A. & Leplae, R. Reticulate representation of evolutionary and functional relationships between phage genomes. Mol. Biol. Evol. 25, 762–777 (2008)
Roux, S., Hallam, S. J., Woyke, T. & Sullivan, M. B. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. eLife 4, 1–20 (2015)
Mizuno, C. M., Rodriguez-Valera, F., Kimes, N. E. & Ghai, R. Expanding the marine virosphere using metagenomics. PLoS Genet. 9, e1003987 (2013)
Chow, C.-E. T., Winget, D. M., White, R. A., III, Hallam, S. J. & Suttle, C. A. Combining genomic sequencing methods to explore viral diversity and reveal potential virus-host interactions. Front. Microbiol. 6, 265 (2015)
Roux, S. et al. Ecology and evolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell- and meta-genomics. eLife 3, e03125 (2014)
Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014)
Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013)
Sullivan, M. B. et al. Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ. Microbiol. 12, 3035–3056 (2010)
Zhao, Y. et al. Abundant SAR11 viruses in the ocean. Nature 494, 357–360 (2013)
Labrie, S. J. et al. Genomes of marine cyanopodoviruses reveal multiple origins of diversity. Environ. Microbiol. 15, 1356–1376 (2013)
Andersson, A. F. & Banfield, J. F. Virus population dynamics and acquired virus resistance in natural microbial communities. Science 320, 1047–1050 (2008)
Sunagawa, S. et al. Ocean plankton. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015)
Flores, C. O., Valverde, S. & Weitz, J. S. Multi-scale structure and geographic drivers of cross-infection within marine bacteria and phages. ISME J. 7, 520–532 (2013)
Hurwitz, B. L., Brum, J. R. & Sullivan, M. B. Depth-stratified functional and taxonomic niche specialization in the ‘core’ and ‘flexible’ Pacific Ocean Virome. ISME J. 9, 472–484 (2015)
Anantharaman, K. et al. Sulfur oxidation genes in diverse deep-sea viruses. Science 344, 757–760 (2014)
Friedrich, C. G., Bardischewsky, F., Rother, D., Quentmeier, A. & Fischer, J. Prokaryotic sulfur oxidation. Curr. Opin. Microbiol. 8, 253–259 (2005)
Santos, A. A. et al. A protein trisulfide couples dissimilatory sulfate reduction to energy conservation. Science 350, 1541–1545 (2015)
Venceslau, S. S., Stockdreher, Y., Dahl, C. & Pereira, I. A. C. The “bacterial heterodisulfide” DsrC is a key protein in dissimilatory sulfur metabolism. Biochim. Biophys. Acta 1837, 1148–1164 (2014)
Dahl, C., Franz, B., Hensen, D., Kesselheim, A. & Zigann, R. Sulfite oxidation in the purple sulfur bacterium Allochromatium vinosum: identification of SoeABC as a major player and relevance of SoxYZ in the process. Microbiology 159, 2626–2638 (2013)
Huergo, L. F., Chandra, G. & Merrick, M. P(II) signal transduction proteins: nitrogen regulation and beyond. FEMS Microbiol. Rev. 37, 251–283 (2013)
Stahl, D. A. & de la Torre, J. R. Physiology and diversity of ammonia-oxidizing archaea. Annu. Rev. Microbiol. 66, 83–101 (2012)
Loy, A. et al. Reverse dissimilatory sulfite reductase as phylogenetic marker for a subgroup of sulfur-oxidizing prokaryotes. Environ. Microbiol. 11, 289–299 (2009)
Pester, M., Schleper, C. & Wagner, M. The Thaumarchaeota: an emerging view of their phylogeny and ecophysiology. Curr. Opin. Microbiol. 14, 300–306 (2011)
Weitz, J. S. et al. A multitrophic model to quantify the effects of marine viruses on microbial food webs and ecosystem processes. ISME J. 9, 1352–1364 (2015)
Arcondéguy, T., Jack, R. & Merrick, M. P(II) signal transduction proteins, pivotal players in microbial nitrogen control. Microbiol. Mol. Biol. Rev. 65, 80–105 (2001)
Pesant, S. et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2, 150023 (2015)
John, S. G. et al. A simple and efficient method for concentration of ocean viruses by chemical flocculation. Environ. Microbiol. Rep. 3, 195–202 (2011)
Hurwitz, B. L., Deng, L., Poulos, B. T. & Sullivan, M. B. Evaluation of methods to concentrate and purify ocean virus communities through comparative, replicated metagenomics. Environ. Microbiol. 15, 1428–1440 (2013)
Aminot, A., Kérouel, R. & Coverly, S. in Practical Guidelines for the Analysis of Seawater (ed. O. Wurl) 143–176 (CRC Press, 2009)
Tara Oceans Consortium & Tara Oceans Expedition. Registry of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.842197 (2015)
Tara Oceans Consortium & Tara Oceans Expedition. Environmental context of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.853810 (2015)
Tara Oceans Consortium & Tara Oceans Expedition. Biodiversity context of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.853809 (2015)
Salazar, G. et al. Global diversity and biogeography of deep-sea pelagic prokaryotes. ISME J. 10, 596–608 (2016). doi:10.1038/ismej.2015.137
Kultima, J. R. et al. MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS One 7, e47656 (2012)
Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012)
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012)
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006)
Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015)
Mavromatis, K. et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat. Methods 4, 495–500 (2007)
Roux, S., Krupovic, M., Debroas, D., Forterre, P. & Enault, F. Assessment of viral community functional potential from viral metagenomes may be hampered by contamination with cellular sequences. Open Biol. 3, 130160 (2013)
Roux, S., Enault, F., Hurwitz, B. L. & Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985 (2015)
Pope, W. H. et al. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity. eLife 4, e06416 (2015)
Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014)
Eddy, S. R. Accelerated Profile HMM Searches. PLOS Comput. Biol. 7, e1002195 (2011)
Brum, J. R. et al. Illuminating structural proteins in viral “dark matter” with metaproteomics. Proc. Natl Acad. Sci. USA 113, 2436–2441 (2016)
Holmfeldt, K. et al. Twelve previously unknown phage genera are ubiquitous in global oceans. Proc. Natl Acad. Sci. USA 110, 12798–12803 (2013)
Kang, I., Jang, H. & Cho, J.-C. Complete genome sequences of two Persicivirga bacteriophages, P12024S and P12024L. J. Virol. 86, 8907–8908 (2012)
Kang, I., Oh, H.-M., Kang, D. & Cho, J.-C. Genome of a SAR116 bacteriophage shows the prevalence of this phage type in the oceans. Proc. Natl Acad. Sci. USA 110, 12343–12348 (2013)
Hjorleifsdottir, S., Aevarsson, A., Hreggvidsson, G. O., Fridjonsson, O. H. & Kristjansson, J. K. Isolation, growth and genome of the Rhodothermus RM378 thermophilic bacteriophage. Extremophiles 18, 261–270 (2014)
Marks, T. J. & Hamilton, P. T. Characterization of a thermophilic bacteriophage of Geobacillus kaustophilus. Arch. Virol. 159, 2771–2775 (2014)
Halmillawewa, A. P., Restrepo-Córdoba, M., Yost, C. K. & Hynes, M. F. Genomic and phenotypic characterization of Rhizobium gallicum phage vB_RglS_P106B. Microbiology 161, 611–620 (2015)
Rohwer, F. & Edwards, R. The Phage Proteomic Tree: a genome-based taxonomy for phage. J. Bacteriol. 184, 4529–4535 (2002)
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127–128 (2007)
Letunic, I. & Bork, P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475–8 (2011)
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)
Edwards, R. A., McNair, K., Faust, K., Raes, J. & Dutilh, B. E. Computational approaches to predict bacteriophage-host relationships. FEMS Microbiol. Rev. 40, 258–272 (2016)
Bland, C. et al. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8, 209 (2007)
Rho, M., Wu, Y.-W., Tang, H., Doak, T. G. & Ye, Y. Diverse CRISPRs evolving in human microbiomes. PLoS Genet. 8, e1002441 (2012)
Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000)
Ogilvie, L. A. et al. Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences. Nat. Commun. 4, 2420 (2013)
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011)
Oksanen, J. et al. The vegan package version 2.4-0; https://cran.r-project.org/web/packages/vegan/index.html (2016)
Sharon, I. et al. Comparative metagenomics of microbial traits within oceanic viral communities. ISME J. 5, 1178–1190 (2011)
Thompson, L. R. et al. Phage auxiliary metabolic genes and the redirection of cyanobacterial host carbon metabolism. Proc. Natl Acad. Sci. USA 108, E757–E764 (2011)
Dammeyer, T., Bagby, S. C., Sullivan, M. B., Chisholm, S. W. & Frankenberg-Dinkel, N. Efficient phage-mediated pigment biosynthesis in oceanic cyanobacteria. Curr. Biol. 18, 442–448 (2008)
Lindell, D., Jaffe, J. D., Johnson, Z. I., Church, G. M. & Chisholm, S. W. Photosynthesis genes in marine viruses yield proteins during host infection. Nature 438, 86–89 (2005)
Lindell, D. et al. Genome-wide expression dynamics of a marine virus and host reveal features of co-evolution. Nature 449, 83–86 (2007)
Sullivan, M. B. et al. Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts. PLoS Biol. 4, e234 (2006)
Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004)
Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009)
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010)
Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001)
Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011)
Sullivan, M. J., Petty, N. K. & Beatson, S. A. Easyfig: a genome comparison visualizer. Bioinformatics 27, 1009–1010 (2011)
Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protocols 5, 725–738 (2010)
Wiederstein, M. & Sippl, M. J. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 35, W407–10 (2007)
Schloissnig, S. et al. Genomic variation landscape of the human gut microbiome. Nature 493, 45–50 (2013)
Alberti, A. et al. Comparison of library preparation methods reveals their impact on interpretation of metatranscriptomic data. BMC Genomics 15, 912 (2014)

Acknowledgements

We thank J. Weitz for advice on statistics, C. Pelikan for help with the DsrAB phylogenetic tree, C. Dahl for discussion regarding DsrC function, and members of the Sullivan and the V. Rich laboratories for suggestions and comments on this manuscript. We acknowledge support from UA high-performance computing and the Ohio Supercomputer Center. Sponsors and support for Tara Oceans and Malaspina expeditions are listed in the Supplementary Information. This viral research was funded by a National Science Foundation grant (1536989) and Gordon and Betty Moore Foundation grants (3790, 2631) to M.B.S., and the French Ministry of Research and Government through the ‘Investissements d’Avenir’ program OCEANOMICS (ANR-11-BTBR-0008) and France Genomique (ANR-10-INBS-09-08). Virus researchers were partially supported by the Water, Environmental and Energy Solutions Initiative and the Ecosystem Genomics Institute (S.R.), the Netherlands Organization for Scientific Research Vidi grant 864.14.004 and CAPES/BRASIL (B.E.D.), and the Austrian Science Fund (project P25111-B22, A.L.). Sequencing was provided by Genoscope (Tara Oceans) and DOE JGI (Malaspina). All authors approved the final manuscript. This article is contribution number 43 of the Tara Oceans expedition.

Author information

Shinichi Sunagawa
Present address: Department of Biology, Institute of Microbiology, ETH Zurich, 8093 Zurich, Switzerland.

Authors and Affiliations

Department of Microbiology, The Ohio State University, Columbus, 43210, Ohio, USA
Simon Roux, Jennifer R. Brum, Natalie Solonenko & Matthew B. Sullivan
Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, 3584 CH, The Netherlands
Bas E. Dutilh
Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, 6525 GA, The Netherlands
Bas E. Dutilh
Department of Marine Biology, Federal University of Rio de Janeiro, Rio de Janeiro, 21941-902, CEP, Brazil
Bas E. Dutilh
Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, 69117, Germany
Shinichi Sunagawa, Stefanie Kandels-Lewis & Peer Bork
Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, 48109, Michigan, USA
Melissa B. Duhaime
Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Research Network Chemistry Meets Microbiology, University of Vienna, Vienna, A-1090, Austria
Alexander Loy
Austrian Polar Research Institute, Vienna, A-1090, Austria
Alexander Loy
Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, 85721, Arizona, USA
Bonnie T. Poulos
Department of Marine Biology and Oceanography, Institut de Ciències del Mar (ICM), CSIC, Barcelona, E0800, Spain
Elena Lara, Josep M. Gasol, Dolors Vaqué & Silvia G. Acinas
Institute of Marine Sciences (CNR-ISMAR), National Research Council, Venezia, 30122, Italy
Elena Lara
CEA - Institut de Génomique, GENOSCOPE, Evry, 91057, France
Julie Poulain, Corinne Cruaud, Adriana Alberti & Patrick Wincker
PANGAEA, Data Publisher for Earth and Environmental Science, University of Bremen, Bremen, 28359, Germany
Stéphane Pesant
MARUM, Bremen University, Bremen, 28359, Germany
Stéphane Pesant
Directors’ Research, European Molecular Biology Laboratory, Heidelberg, 69117, Germany
Stefanie Kandels-Lewis
CNRS, UMR 7144, EPEP, Station Biologique de Roscoff, Roscoff, 29680, France
Céline Dimier
Sorbonne Universités, UPMC Université Paris 06, UMR 7144, Station Biologique de Roscoff, Roscoff, 29680, France
Céline Dimier
Institut de Biologie de l’École Normale Supérieure, École Normale Supérieure, Paris Sciences et Lettres Research University, CNRS UMR 8197, F-75005 Paris, INSERM U1024, France
Céline Dimier
CNRS, UMR 7093, Laboratoire d'océanographie de Villefranche, Observatoire Océanologique, Villefranche-sur-mer, 06230, France
Marc Picheral & Sarah Searson
Sorbonne Universités, UPMC Université Paris 06, UMR 7093, Observatoire Océanologique, Villefranche-sur-mer, 06230, France
Marc Picheral & Sarah Searson
Mediterranean Institute of Advanced Studies, CSIC-UiB, Esporles, 21-07190, Mallorca, Spain
Carlos M. Duarte
King Abdullah University of Science and Technology, Red Sea Research Center, Thuwal, 23955-6900, Saudi Arabia
Carlos M. Duarte
Max-Delbrück-Centre for Molecular Medicine, Berlin, 13092, Germany
Peer Bork
CNRS, UMR 8030, Evry, 91057, France
Patrick Wincker
Université d’Evry, UMR 8030, Evry, 91057, France
Patrick Wincker
Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, 43210, Ohio, USA
Matthew B. Sullivan

Consortia

Tara Oceans Coordinators

Contributions

S.R. and M.B.S. designed the study. C.D., M.P. and S.Se. contributed extensively to sample collection. S.K.-L. managed the logistics of the Tara Oceans project. B.T.P., N.S. and E.L. performed the viral-specific processing of the samples. J.P., C.C., A.A. and P.W. led the sequencing of viral samples. S.R., S.Su. and B.E.D. led the assembly of raw data. S.R., S.Su., M.B.D. and M.B.S. analysed the genomic diversity data. S.R., A.L., J.R.B. and M.B.S. analysed the AMG data. S.R., J.R.B., B.E.D., S.Su., M.B.D., A.L., S.P., P.B., S.G.A., C.D., J.M.G., D.V. and M.B.S. provided constructive comments, revised and edited the manuscript. Tara Oceans Coordinators provided constructive criticism throughout the study. All authors discussed the results and commented on the manuscript.

Corresponding author

Correspondence to Matthew B. Sullivan.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

All data are fully and freely available from the date of publication, with no restrictions, at EBI, PANGAEA, and iVirus. All of the samples, analyses, publications, and ownership of data are free from legal entanglement or restriction of any sort by the nations in whose waters Tara Oceans expedition sampled.

A list of participants and their affiliations appears in the Supplementary Information.

Extended data figures and tables

Extended Data Figure 1 Accumulation curves of populations and viral clusters and identification of abundant viral clusters in GOV samples.

a, b, Accumulation curves for viral populations (a) and viral clusters (b) were computed from 50 randomly shuffled samples (blue dots) for all samples, epipelagic, mesopelagic, or bathypelagic subsets. For each curve, the average of 50 iterations is displayed with red dots. c, Schematic of the selection process of abundant viral clusters. For each sample, viral clusters accounting for (up to) 80% of the sample diversity (as assessed by their Simpson index) were considered abundant. On the left is an example for sample 125_MIX. Viral clusters detected as abundant in at least two different stations were included in the 38 viral clusters described in Fig. 2 and Extended Data Fig. 3.
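
The accumulation curves of panels a and b can be illustrated with a short sketch. This is not the authors' code: the function name `accumulation_curve`, the toy samples, and the fixed random seed are assumptions made for the example. Each iteration shuffles the sample order, tracks the cumulative number of distinct populations (or clusters) seen, and the 50 iterations are then averaged.

```python
import random

def accumulation_curve(samples, n_shuffles=50, seed=0):
    """Mean cumulative richness after 1..N samples.

    samples: list of sets, each holding the IDs (populations or
    viral clusters) detected in one sample. Hypothetical input format.
    """
    rng = random.Random(seed)          # fixed seed: an assumption, for repeatability
    totals = [0.0] * len(samples)
    for _ in range(n_shuffles):
        order = samples[:]             # copy so the caller's list is untouched
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen |= sample             # add IDs first observed in this sample
            totals[i] += len(seen)
    return [t / n_shuffles for t in totals]

# Toy data: three samples sharing some population IDs.
toy_samples = [{"p1", "p2"}, {"p2", "p3"}, {"p1", "p3", "p4"}]
curve = accumulation_curve(toy_samples)
print(curve)
```

A curve that flattens as samples are added suggests that further sampling would recover few new populations, which is the sense in which the abstract describes epipelagic sampling as near-complete.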

Extended Data Figure 2 Comparison of viral clusters with other classification methods (phage proteomic tree and percentage of shared genes).

The phage proteomic tree includes the 756 GOV complete and near-complete genomes from epipelagic and mesopelagic samples and the closest reference genomes from RefSeq and environmental phages (d < 0.5 to a GOV sequence or found in the same viral cluster as a GOV sequence). Branches of monophyletic clades that include more than 3 GOV and/or uncultivated marine sequences with no isolate reference are highlighted in blue. All viral clusters with more than 8 representatives in the tree or part of the 38 abundant viral clusters are indicated by the colours of the outer ring. The name and affiliation (if available) of the 38 abundant viral clusters are indicated next to the viral cluster on the coloured ring. Viral clusters in which members were gathered in single monophyletic clades are indicated with a solid black outline, while viral clusters for which all-but-one member were gathered in a single monophyletic clade are highlighted with a dashed black outline. Distribution of the percentage of shared genes, estimated from the number of shared protein clusters, for viral genome/contig pairs either between different viral clusters or within viral clusters (bottom right). On average, 73% and 39% of sequences within a viral cluster shared more than 20% and 40% of their genes, respectively, which represent the thresholds currently accepted for sub-family and genus designations. Similarly, 83% of sequences within a viral cluster were consistently affiliated in the phage proteomic tree as they formed a monophyletic group that included only members of the particular viral cluster. Thus, all three classification methods are largely consistent for the GOV dataset (see Supplementary Information).
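
The shared-gene comparison can be sketched as follows. The caption does not state which denominator was used, so this example assumes the percentage is taken relative to the smaller of the two genomes; the function names and the toy protein-cluster (PC) assignments are also hypothetical.

```python
from itertools import combinations

def shared_gene_pct(pcs_a, pcs_b):
    """Percentage of genes shared by two genomes, counted as shared
    protein clusters over the smaller genome (an assumed denominator)."""
    a, b = set(pcs_a), set(pcs_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))

def fraction_above(genomes, threshold):
    """Fraction of genome pairs sharing more than `threshold` percent of genes."""
    pairs = list(combinations(genomes.values(), 2))
    if not pairs:
        return 0.0
    hits = sum(shared_gene_pct(x, y) > threshold for x, y in pairs)
    return hits / len(pairs)

# Toy viral cluster: three genomes described by their protein clusters.
toy_cluster = {
    "g1": {"PC1", "PC2", "PC3", "PC4"},
    "g2": {"PC1", "PC2", "PC5", "PC6"},
    "g3": {"PC1", "PC7", "PC8", "PC9", "PC10"},
}
print(fraction_above(toy_cluster, 20))  # pairs above the sub-family threshold
print(fraction_above(toy_cluster, 40))  # pairs above the genus threshold
```

In this toy cluster every pair clears the 20% threshold, while only the g1/g2 pair clears 40%.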

Extended Data Figure 3 Summary of 34 of the 38 abundant viral clusters.

Summaries are given for the 34 abundant viral clusters not summarized in Fig. 2. Predicted genome size is based on the set of isolates and circular contigs in the viral cluster. NA (not applicable) corresponds to viral clusters either without any circular contigs, or for which the relative standard deviation of estimated genome size across the different isolate(s) and/or circular contigs is greater than 15%. Host association values are based on the number of cluster members associated with each host group. Statistical significance of this number of predictions was evaluated by comparison with an expected number of associations calculated using a Poisson distribution. Host associations based on known isolates are indicated with a star (for associations based on cultivated isolates) or a dot (for associations based on the detection of a cluster member in a microbial genome from the VirSorter Curated Dataset). The abundant epipelagic microbial groups (representing >1% of the microbial OTUs in epipelagic samples) are highlighted in bold. Distribution and relative abundance of viral clusters are based on the cumulative coverage of viral cluster members among sample viral populations. The main oceanic basins are indicated for each set of samples.
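
Two of the rules in this caption lend themselves to a small sketch: the 15% relative-standard-deviation cut-off for reporting a genome size, and the Poisson comparison for host associations. The code below is illustrative only. It assumes a population standard deviation and an upper-tail Poisson probability, neither of which is specified in the caption, and the function names and input values are hypothetical.

```python
import math
from statistics import mean, pstdev

def predicted_genome_size(sizes, max_rsd=0.15):
    """Mean genome size of isolates/circular contigs, or None (NA) when
    there are no sizes or the relative standard deviation exceeds max_rsd."""
    if not sizes:
        return None
    m = mean(sizes)
    rsd = pstdev(sizes) / m            # population SD: an assumption
    return m if rsd <= max_rsd else None

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    cdf = sum(math.exp(-expected) * expected ** i / math.factorial(i)
              for i in range(observed))
    return 1.0 - cdf

print(predicted_genome_size([40000, 42000, 41000]))  # consistent sizes
print(predicted_genome_size([30000, 60000]))          # too variable, reported as NA
print(poisson_upper_tail(5, 1.2))                     # 5 host links where 1.2 were expected
```

A small tail probability indicates that a cluster is linked to a host group more often than the expected number of associations alone would explain.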

Extended Data Figure 4 Association between abundant viral clusters and abundance and diversity of host groups.

a, Abundance and diversity of bacterial and archaeal host groups associated with the 38 abundant viral clusters (see Fig. 2a). For each host group (at the phylum level, except for Proteobacteria where the class level is used), the different panels display, from top to bottom: (i) the number of viral clusters associated with this host group; (ii) the global relative abundance of this group estimated from the microbial metagenomic OTU counts; (iii) the global diversity of this group based on a Chao index computation including all Tara Oceans microbial metagenome samples (that is, including both alpha and beta diversity); (iv) the distribution of Chao indexes by sample for this group (the alpha diversity); and (v) the average Sorensen index between pairs of samples that include at least one OTU of this group (the beta diversity). OTU counts were derived from the 109 epipelagic microbial metagenomes described previously¹⁸. b, Pearson correlations between host-group relative abundance or diversity indices (global Chao index, average Chao index across samples and average Sorensen index across samples) and the number of viral clusters.
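The two diversity measures named in this legend can be sketched as follows. The legend does not specify which Chao estimator or which form of the Sorensen index was used, so the bias-corrected Chao1 estimator and the Sorensen similarity coefficient shown here are assumptions, and the function names are hypothetical.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU counts.

    Chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are
    the numbers of OTUs observed exactly once and exactly twice.
    """
    observed = [c for c in otu_counts if c > 0]
    s_obs = len(observed)
    singletons = sum(1 for c in observed if c == 1)
    doubletons = sum(1 for c in observed if c == 2)
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))


def sorensen_similarity(otus_a, otus_b):
    """Sorensen similarity between two samples given their OTU sets.

    Returns 2 * |A and B| / (|A| + |B|); the corresponding dissimilarity
    (often used as a beta-diversity measure) is 1 minus this value.
    """
    a, b = set(otus_a), set(otus_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy data: OTU counts in one sample, and OTU membership in two samples.
print(chao1([5, 1, 1, 2, 0, 3]))
print(sorensen_similarity({"otu1", "otu2", "otu3"}, {"otu2", "otu3", "otu4"}))
```

Computing Chao1 on counts pooled across all samples corresponds to the global index in panel (iii), while applying it per sample corresponds to panel (iv).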

Extended Data Figure 5 Diversity, distribution, and genome context of dsrC genes in GOV contigs.

a, Maximum-likelihood tree (from an amino-acid alignment) including the 11 viral DsrC and microbial sequences from microbial metagenomes and the NCBI nr database. The presence of conserved cysteine residues (termed CysA and CysB, as in ref. 24) is indicated with coloured circles next to each sequence or clade. The corresponding type of DsrC-like protein is indicated by the colouring of the branch or clade. The microbial metagenomic contigs affiliated to uncultivated, marine sulfur-oxidizing Gammaproteobacteria (as confirmed by complementary phylogenetic analysis of DsrAB; Supplementary Fig. 7) are indicated by stars. Viral AMG sequences are highlighted in blue; internal nodes and SH-like supports are represented by proportional circles (all nodes with support <0.40 were collapsed). Each dsrC AMG is associated with an abundance profile (right) that displays the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of dsrC-containing contig maps. A T4-like marker gene (T4 baseplate) is indicated on the maps, alongside putative AMGs (Fe–S biosyn, iron–sulfur cluster biosynthesis; Amt, ammonia transporter).
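The normalized coverage used in the abundance profiles (contig coverage per Gb of metagenome) can be illustrated with a short sketch. Coverage is taken here as mapped bases divided by contig length, which is an assumption; the function and parameter names are hypothetical and the numbers are invented for the example.

```python
def normalized_coverage(mapped_bases, contig_length, metagenome_bases):
    """Contig coverage per Gb of metagenome.

    mapped_bases:     total bases of reads mapped to the contig
    contig_length:    contig length in bp
    metagenome_bases: total sequenced bases in the sample
    """
    coverage = mapped_bases / contig_length      # mean depth (x)
    metagenome_gb = metagenome_bases / 1e9       # sequencing effort in Gb
    return coverage / metagenome_gb


# Invented example: 300 kb of reads map to a 30 kb contig in a 2 Gb virome.
print(normalized_coverage(300_000, 30_000, 2_000_000_000))
```

Dividing by sequencing effort makes coverage values comparable between samples that were sequenced to different depths.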

Extended Data Figure 6 Diversity, distribution, and genome context of soxYZ genes in GOV contigs.

a, Bayesian tree from an amino-acid alignment, including the four viral soxYZ and microbial sequences from microbial metagenomes and the NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. Viral AMG sequences are highlighted in blue; posterior probabilities are represented by proportional circles (all nodes with posterior probability <0.40 were collapsed). Clades including sulfur-oxidizing proteobacteria are indicated on the tree. Each soxYZ AMG is associated with an abundance profile (on the right) displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of soxYZ-containing contig maps. For contig GOV_bin_4310_contig-100_0, the second largest contig from the same bin (GOV_bin_4310_contig-100_1) is displayed. T4-like marker genes (gp23 and the gene encoding T4 baseplate) are indicated on the maps alongside putative AMGs.

Extended Data Figure 7 Diversity, distribution, and genome context of P-II genes in GOV contigs.

a, Maximum-likelihood tree from an amino-acid alignment that includes the 10 viral P-II and microbial sequences from microbial metagenomes and the NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. Sequences lacking the conserved uridylation site of P-II (Supplementary Fig. 5) are highlighted with a star next to the sequence name or clade. Viral AMG sequences are highlighted in blue; internal nodes and SH-like supports are represented by proportional circles (all nodes with support <0.40 were collapsed). Each P-II AMG is associated with an abundance profile (right) displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of P-II-containing contig maps. Ammonia transporter genes linked to P-II are indicated on the map (dark red). When available, the viral-cluster affiliation of each contig is indicated next to the contig name. Contig GOV_bin_5834_contig-100_7 is too short to be clustered based on a shared protein cluster network; however, the seed contig of its population was clustered (in VC_12, Siphoviridae P12024virus), so the affiliation of this seed contig is indicated instead.

Extended Data Figure 8 Diversity, distribution, and genome context of amoC gene in GOV contigs.

a, Maximum-likelihood tree (from an amino-acid alignment) including the GOV amoC AMG and microbial sequences from microbial metagenomes and the NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. The viral AMG sequence is highlighted in blue; internal nodes and SH-like supports are represented by proportional circles (all nodes with support <0.40 were collapsed). b, Abundance profile displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). c, Map of the amoC-containing contig.

Extended Data Figure 9

Normalized coverage of contigs harbouring AMGs as a function of the temperature and nutrient concentrations of the corresponding samples. AMGs are grouped by clade based on their phylogeny (see Extended Data Figs 5, 6, 7) and their coverages are cumulated if multiple contigs are included in a clade. Plots display the cumulated normalized coverage of a clade (y axis) as a function of the temperature or nutrient concentration (x axis) across all epipelagic samples for geographically unrestricted clades (that is, clades found in >5 samples, see Fig. 3c). Mesopelagic samples were excluded from the analysis since the AMG signal was detected in epipelagic samples. Samples are colour-coded according to ocean and sea regions (Supplementary Table 1). The calculated preferential range of temperature or nutrient concentration is displayed below each plot for epipelagic AMGs (P-II-4 distribution could not be linked to specific environmental conditions, but this AMG is the only one consistently retrieved in mesopelagic samples).

Extended Data Table 1 Summary of genes and contigs characteristics for new viral dsrC, soxYZ, P-II, and amoC AMGs

Related audio

Noah Baker learns about the viruses in our oceans

Supplementary information

Supplementary Information

This file includes Supplementary Text and Data, Supplementary Figures 1-8, legends for Supplementary Tables 1-6 (see separate Excel files) and additional references. The text includes additional information and literature context that help document details about the generation of the GOV dataset (assembly, identification of viral contigs, read mapping to viral contigs), viral cluster definition and affiliation (including comparison to other genome classification methods), host prediction (methods evaluation and results), discussions about AMG affiliation and host prediction for associated contigs, and a list of supporters and sponsors of the Tara Oceans and Malaspina expeditions (including the list and affiliations of the Tara Oceans coordinators). (PDF 8379 kb)

Supplementary Table 1

This file contains the list of viromes in the GOV dataset. Station number, depth, Longhurst province, biome, and sequencing effort are indicated for each virome sample. (XLS 63 kb)

Supplementary Table 2

This file contains the GOV viral population summary. The number of contigs and length of each population are presented, alongside their normalized coverage across the 104 GOV viromes. (XLS 13582 kb)

Supplementary Table 3

This file contains a summary of GOV Viral Clusters (VCs). For each VC, the composition (number and origin of VC members), affiliation, and coverage across GOV viromes are indicated. (XLS 865 kb)

Supplementary Table 4

This file contains the benchmarks of in silico host prediction methods. It reports the results of the evaluations of host prediction methods performed using the NCBI RefSeq Virus database and the VirSorter Curated Dataset. (XLS 7 kb)

Supplementary Table 5

This file contains the host prediction for GOV viral contigs that are associated with a population. Predictions are reported for each population with the type of signal (blastn, CRISPR, tetranucleotide composition), the host sequence used, and the strength of the prediction. (XLS 699 kb)

Supplementary Table 6

This file contains the PFAM domains detected in GOV viral contigs (≥1.5 kb). For each PFAM domain, the number of genes detected in the GOV dataset is indicated, alongside the functional category of the domain. (XLS 369 kb)

About this article

Cite this article

Roux, S., Brum, J., Dutilh, B. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537, 689–693 (2016). https://doi.org/10.1038/nature19366

Received: 08 February 2016
Accepted: 12 August 2016
Published: 21 September 2016
Issue Date: 29 September 2016
DOI: https://doi.org/10.1038/nature19366

This article is cited by

Exploring virus-host-environment interactions in a chemotrophic-based underground estuary
- Timothy M. Ghaly
- Amaranta Focardi
- Ian T. Paulsen
Environmental Microbiome (2024)

Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes
- Ling-Yi Wu
- Yasas Wijesekara
- Bas E. Dutilh
Genome Biology (2024)

COBRA improves the completeness and contiguity of viral genomes assembled from metagenomes
- LinXing Chen
- Jillian F. Banfield
Nature Microbiology (2024)

Diversity and potential host-interactions of viruses inhabiting deep-sea seamount sediments
- Meishun Yu
- Menghui Zhang
- Min Jin
Nature Communications (2024)

Metavirome mining from fjord sediments of Svalbard Archipelago
- Bhavya Kachiprath
- Jayanath Gopi
- Rosamma Philip
Journal of Soils and Sediments (2024)
