Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses

Abstract

Ocean microbes drive biogeochemical cycling on a global scale1. However, this cycling is constrained by viruses that affect community composition, metabolic activity, and evolutionary trajectories2,3. Owing to challenges with the sampling and cultivation of viruses, genome-level viral diversity remains poorly described and grossly understudied, with less than 1% of observed surface-ocean viruses known4. Here we assemble complete genomes and large genomic fragments from both surface- and deep-ocean viruses sampled during the Tara Oceans and Malaspina research expeditions5,6, and analyse the resulting ‘global ocean virome’ dataset to present a global map of abundant, double-stranded DNA viruses complete with genomic and ecological contexts. A total of 15,222 epipelagic and mesopelagic viral populations were identified, comprising 867 viral clusters (defined as approximately genus-level groups7,8). This roughly triples the number of known ocean viral populations4 and doubles the number of candidate bacterial and archaeal virus genera8, providing a near-complete sampling of epipelagic communities at both the population and viral-cluster level. We found that 38 of the 867 viral clusters were locally or globally abundant, together accounting for nearly half of the viral populations in any global ocean virome sample. While two-thirds of these clusters represent newly described viruses lacking any cultivated representative, most could be computationally linked to dominant, ecologically relevant microbial hosts. Moreover, we identified 243 viral-encoded auxiliary metabolic genes, of which only 95 were previously known. Deeper analyses of four of these auxiliary metabolic genes (dsrC, soxYZ, P-II (also known as glnB) and amoC) revealed that abundant viruses may directly manipulate sulfur and nitrogen cycling throughout the epipelagic ocean. 
This viral catalog and functional analyses provide a necessary foundation for the meaningful integration of viruses into ecosystem models where they act as key players in nutrient cycling and trophic networks.

Figure 1: Composition of the Global Ocean Viromes (GOV) dataset.
Figure 2: Characterization of the dominant oceanic viral clusters.
Figure 3: Characterization and distribution of viral AMGs involved in sulfur and nitrogen cycles.

References

  1. Falkowski, P. G., Fenchel, T. & Delong, E. F. The microbial engines that drive Earth’s biogeochemical cycles. Science 320, 1034–1039 (2008)

  2. Rohwer, F. & Thurber, R. V. Viruses manipulate the marine environment. Nature 459, 207–212 (2009)

  3. Brum, J. R. & Sullivan, M. B. Rising to the challenge: accelerated pace of discovery transforms marine virology. Nat. Rev. Microbiol. 13, 147–159 (2015)

  4. Brum, J. et al. Patterns and ecological drivers of ocean viral communities. Science 348, 1261498 (2015)

  5. Karsenti, E. et al. A holistic approach to marine eco-systems biology. PLoS Biol. 9, e1001177 (2011)

  6. Duarte, C. M. Seafaring in the 21st century: the Malaspina 2010 circumnavigation expedition. Limnol. Oceanogr. 24, 11–14 (2015)

  7. Lima-Mendez, G., Van Helden, J., Toussaint, A. & Leplae, R. Reticulate representation of evolutionary and functional relationships between phage genomes. Mol. Biol. Evol. 25, 762–777 (2008)

  8. Roux, S., Hallam, S. J., Woyke, T. & Sullivan, M. B. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. eLife 4, 1–20 (2015)

  9. Mizuno, C. M., Rodriguez-Valera, F., Kimes, N. E. & Ghai, R. Expanding the marine virosphere using metagenomics. PLoS Genet. 9, e1003987 (2013)

  10. Chow, C.-E. T., Winget, D. M., White, R. A., III, Hallam, S. J. & Suttle, C. A. Combining genomic sequencing methods to explore viral diversity and reveal potential virus-host interactions. Front. Microbiol. 6, 265 (2015)

  11. Roux, S. et al. Ecology and evolution of viruses infecting uncultivated SUP05 bacteria as revealed by single-cell- and meta-genomics. eLife 3, e03125 (2014)

  12. Dutilh, B. E. et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 5, 4498 (2014)

  13. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31, 533–538 (2013)

  14. Sullivan, M. B. et al. Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ. Microbiol. 12, 3035–3056 (2010)

  15. Zhao, Y. et al. Abundant SAR11 viruses in the ocean. Nature 494, 357–360 (2013)

  16. Labrie, S. J. et al. Genomes of marine cyanopodoviruses reveal multiple origins of diversity. Environ. Microbiol. 15, 1356–1376 (2013)

  17. Andersson, A. F. & Banfield, J. F. Virus population dynamics and acquired virus resistance in natural microbial communities. Science 320, 1047–1050 (2008)

  18. Sunagawa, S. et al. Ocean plankton. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015)

  19. Flores, C. O., Valverde, S. & Weitz, J. S. Multi-scale structure and geographic drivers of cross-infection within marine bacteria and phages. ISME J. 7, 520–532 (2013)

  20. Hurwitz, B. L., Brum, J. R. & Sullivan, M. B. Depth-stratified functional and taxonomic niche specialization in the ‘core’ and ‘flexible’ Pacific Ocean Virome. ISME J. 9, 472–484 (2015)

  21. Anantharaman, K. et al. Sulfur oxidation genes in diverse deep-sea viruses. Science 344, 757–760 (2014)

  22. Friedrich, C. G., Bardischewsky, F., Rother, D., Quentmeier, A. & Fischer, J. Prokaryotic sulfur oxidation. Curr. Opin. Microbiol. 8, 253–259 (2005)

  23. Santos, A. A. et al. A protein trisulfide couples dissimilatory sulfate reduction to energy conservation. Science 350, 1541–1545 (2015)

  24. Venceslau, S. S., Stockdreher, Y., Dahl, C. & Pereira, I. A. C. The “bacterial heterodisulfide” DsrC is a key protein in dissimilatory sulfur metabolism. Biochim. Biophys. Acta 1837, 1148–1164 (2014)

  25. Dahl, C., Franz, B., Hensen, D., Kesselheim, A. & Zigann, R. Sulfite oxidation in the purple sulfur bacterium Allochromatium vinosum: identification of SoeABC as a major player and relevance of SoxYZ in the process. Microbiology 159, 2626–2638 (2013)

  26. Huergo, L. F., Chandra, G. & Merrick, M. P(II) signal transduction proteins: nitrogen regulation and beyond. FEMS Microbiol. Rev. 37, 251–283 (2013)

  27. Stahl, D. A. & de la Torre, J. R. Physiology and diversity of ammonia-oxidizing archaea. Annu. Rev. Microbiol. 66, 83–101 (2012)

  28. Loy, A. et al. Reverse dissimilatory sulfite reductase as phylogenetic marker for a subgroup of sulfur-oxidizing prokaryotes. Environ. Microbiol. 11, 289–299 (2009)

  29. Pester, M., Schleper, C. & Wagner, M. The Thaumarchaeota: an emerging view of their phylogeny and ecophysiology. Curr. Opin. Microbiol. 14, 300–306 (2011)

  30. Weitz, J. S. et al. A multitrophic model to quantify the effects of marine viruses on microbial food webs and ecosystem processes. ISME J. 9, 1352–1364 (2015)

  31. Arcondéguy, T., Jack, R. & Merrick, M. P(II) signal transduction proteins, pivotal players in microbial nitrogen control. Microbiol. Mol. Biol. Rev. 65, 80–105 (2001)

  32. Pesant, S. et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2, 150023 (2015)

  33. John, S. G. et al. A simple and efficient method for concentration of ocean viruses by chemical flocculation. Environ. Microbiol. Rep. 3, 195–202 (2011)

  34. Hurwitz, B. L., Deng, L., Poulos, B. T. & Sullivan, M. B. Evaluation of methods to concentrate and purify ocean virus communities through comparative, replicated metagenomics. Environ. Microbiol. 15, 1428–1440 (2013)

  35. Aminot, A., Kérouel, R. & Coverly, S. in Practical Guidelines for the Analysis of Seawater (ed. O. Wurl) 143–176 (CRC Press, 2009)

  36. Tara Oceans Consortium & Tara Oceans Expedition. Registry of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.842197 (2015)

  37. Tara Oceans Consortium & Tara Oceans Expedition. Environmental context of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.853810 (2015)

  38. Tara Oceans Consortium & Tara Oceans Expedition. Biodiversity context of all samples from the Tara Oceans Expedition (2009–2013). http://dx.doi.org/10.1594/PANGAEA.853809 (2015)

  39. Salazar, G. et al. Global diversity and biogeography of deep-sea pelagic prokaryotes. ISME J. 10, 596–608 (2016); doi:10.1038/ismej.2015.137

  40. Kultima, J. R. et al. MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS One 7, e47656 (2012)

  41. Peng, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012)

  42. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012)

  43. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006)

  44. Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015)

  45. Mavromatis, K. et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat. Methods 4, 495–500 (2007)

  46. Roux, S., Krupovic, M., Debroas, D., Forterre, P. & Enault, F. Assessment of viral community functional potential from viral metagenomes may be hampered by contamination with cellular sequences. Open Biol. 3, 130160 (2013)

  47. Roux, S., Enault, F., Hurwitz, B. L. & Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985 (2015)

  48. Pope, W. H. et al. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity. eLife 4, e06416 (2015)

  49. Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)

  50. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014)

  51. Eddy, S. R. Accelerated Profile HMM Searches. PLOS Comput. Biol. 7, e1002195 (2011)

  52. Brum, J. R. et al. Illuminating structural proteins in viral “dark matter” with metaproteomics. Proc. Natl Acad. Sci. USA 113, 2436–2441 (2016)

  53. Holmfeldt, K. et al. Twelve previously unknown phage genera are ubiquitous in global oceans. Proc. Natl Acad. Sci. USA 110, 12798–12803 (2013)

  54. Kang, I., Jang, H. & Cho, J.-C. Complete genome sequences of two Persicivirga bacteriophages, P12024S and P12024L. J. Virol. 86, 8907–8908 (2012)

  55. Kang, I., Oh, H.-M., Kang, D. & Cho, J.-C. Genome of a SAR116 bacteriophage shows the prevalence of this phage type in the oceans. Proc. Natl Acad. Sci. USA 110, 12343–12348 (2013)

  56. Hjorleifsdottir, S., Aevarsson, A., Hreggvidsson, G. O., Fridjonsson, O. H. & Kristjansson, J. K. Isolation, growth and genome of the Rhodothermus RM378 thermophilic bacteriophage. Extremophiles 18, 261–270 (2014)

  57. Marks, T. J. & Hamilton, P. T. Characterization of a thermophilic bacteriophage of Geobacillus kaustophilus. Arch. Virol. 159, 2771–2775 (2014)

  58. Halmillawewa, A. P., Restrepo-Córdoba, M., Yost, C. K. & Hynes, M. F. Genomic and phenotypic characterization of Rhizobium gallicum phage vB_RglS_P106B. Microbiology 161, 611–620 (2015)

  59. Rohwer, F. & Edwards, R. The Phage Proteomic Tree: a genome-based taxonomy for phage. J. Bacteriol. 184, 4529–4535 (2002)

  60. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127–128 (2007)

  61. Letunic, I. & Bork, P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475–8 (2011)

  62. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)

  63. Edwards, R. A., McNair, K., Faust, K., Raes, J. & Dutilh, B. E. Computational approaches to predict bacteriophage-host relationships. FEMS Microbiol. Rev. 40, 258–272 (2016)

  64. Bland, C. et al. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8, 209 (2007)

  65. Rho, M., Wu, Y.-W., Tang, H., Doak, T. G. & Ye, Y. Diverse CRISPRs evolving in human microbiomes. PLoS Genet. 8, e1002441 (2012)

  66. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000)

  67. Ogilvie, L. A. et al. Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences. Nat. Commun. 4, 2420 (2013)

  68. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011)

  69. Oksanen, J. et al. The vegan package version 2.4-0; https://cran.r-project.org/web/packages/vegan/index.html (2016)

  70. Sharon, I. et al. Comparative metagenomics of microbial traits within oceanic viral communities. ISME J. 5, 1178–1190 (2011)

  71. Thompson, L. R. et al. Phage auxiliary metabolic genes and the redirection of cyanobacterial host carbon metabolism. Proc. Natl Acad. Sci. USA 108, E757–E764 (2011)

  72. Dammeyer, T., Bagby, S. C., Sullivan, M. B., Chisholm, S. W. & Frankenberg-Dinkel, N. Efficient phage-mediated pigment biosynthesis in oceanic cyanobacteria. Curr. Biol. 18, 442–448 (2008)

  73. Lindell, D., Jaffe, J. D., Johnson, Z. I., Church, G. M. & Chisholm, S. W. Photosynthesis genes in marine viruses yield proteins during host infection. Nature 438, 86–89 (2005)

  74. Lindell, D. et al. Genome-wide expression dynamics of a marine virus and host reveal features of co-evolution. Nature 449, 83–86 (2007)

  75. Sullivan, M. B. et al. Prevalence and evolution of core photosystem II genes in marine cyanobacterial viruses and their hosts. PLoS Biol. 4, e234 (2006)

  76. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004)

  77. Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009)

  78. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One 5, e9490 (2010)

  79. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001)

  80. Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011)

  81. Sullivan, M. J., Petty, N. K. & Beatson, S. A. Easyfig: a genome comparison visualizer. Bioinformatics 27, 1009–1010 (2011)

  82. Roy, A., Kucukural, A. & Zhang, Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protocols 5, 725–738 (2010)

  83. Wiederstein, M. & Sippl, M. J. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 35, W407–10 (2007)

  84. Schloissnig, S. et al. Genomic variation landscape of the human gut microbiome. Nature 493, 45–50 (2013)

  85. Alberti, A. et al. Comparison of library preparation methods reveals their impact on interpretation of metatranscriptomic data. BMC Genomics 15, 912 (2014)

Acknowledgements

We thank J. Weitz for advice on statistics, C. Pelikan for help with the DsrAB phylogenetic tree, C. Dahl for discussion regarding DsrC function, and members of the Sullivan and the V. Rich laboratories for suggestions and comments on this manuscript. We acknowledge support from UA high-performance computing and the Ohio Supercomputer Center. Sponsors and support for Tara Oceans and Malaspina expeditions are listed in the Supplementary Information. This viral research was funded by a National Science Foundation grant (1536989) and Gordon and Betty Moore Foundation grants (3790, 2631) to M.B.S., and the French Ministry of Research and Government through the ‘Investissements d’Avenir’ program OCEANOMICS (ANR-11-BTBR-0008) and France Genomique (ANR-10-INBS-09-08). Virus researchers were partially supported by the Water, Environmental and Energy Solutions Initiative and the Ecosystem Genomics Institute (S.R.), the Netherlands Organization for Scientific Research Vidi grant 864.14.004 and CAPES/BRASIL (B.E.D.), and the Austrian Science Fund (project P25111-B22, A.L.). Sequencing was provided by Genoscope (Tara Oceans) and DOE JGI (Malaspina). All authors approved the final manuscript. This article is contribution number 43 of the Tara Oceans expedition.

Author information

Contributions

S.R. and M.B.S. designed the study. C.D., M.P. and S.Se. contributed extensively to sample collection. S.K.-L. managed the logistics of the Tara Oceans project. B.T.P., N.S. and E.L. performed the viral-specific processing of the samples. J.P., C.C., A.A. and P.W. led the sequencing of viral samples. S.R., S.Su. and B.E.D. led the assembly of raw data. S.R., S.Su., M.B.D. and M.B.S. analysed the genomic diversity data. S.R., A.L., J.R.B. and M.B.S. analysed the AMG data. S.R., J.R.B., B.E.D., S.Su., M.B.D., A.L., S.P., P.B., S.G.A., C.D., J.M.G., D.V. and M.B.S. provided constructive comments, and revised and edited the manuscript. Tara Oceans Coordinators provided constructive criticism throughout the study. All authors discussed the results and commented on the manuscript.

Corresponding author

Correspondence to Matthew B. Sullivan.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

All data are fully and freely available from the date of publication, with no restrictions, at EBI, PANGAEA, and iVirus. All of the samples, analyses, publications, and ownership of data are free from legal entanglement or restriction of any sort by the nations in whose waters the Tara Oceans expedition sampled.

A list of participants and their affiliations appears in the Supplementary Information.

Extended data figures and tables

Extended Data Figure 1 Accumulation curves of populations and viral clusters and identification of abundant viral clusters in GOV samples.

a, b, Accumulation curves for viral populations (a) and viral clusters (b) were computed from 50 random shufflings of the samples (blue dots) for all samples and for the epipelagic, mesopelagic, or bathypelagic subsets. For each curve, the average of the 50 iterations is displayed with red dots. c, Schematic of the selection process for abundant viral clusters. For each sample, the viral clusters accounting for (up to) 80% of the sample diversity (as assessed by the Simpson index) were considered abundant. An example for sample 125_MIX is shown on the left. Viral clusters detected as abundant in at least two different stations were included in the 38 viral clusters described in Fig. 2 and Extended Data Fig. 3.
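
The two procedures in this legend can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, each sample is assumed to be a set of population (or cluster) identifiers, and the "80% of the Simpson index" rule is interpreted as adding clusters from most to least abundant until their summed squared relative abundances reach 80% of the sample total. The legend does not specify these details.

```python
import random


def accumulation_curve(samples, n_shuffles=50, seed=0):
    """Mean number of distinct IDs seen after 1, 2, ... samples.

    samples: list of sets of population (or cluster) identifiers.
    The sample order is shuffled n_shuffles times and the cumulative
    counts are averaged across shuffles.
    """
    rng = random.Random(seed)
    totals = [0] * len(samples)
    for _ in range(n_shuffles):
        order = list(samples)
        rng.shuffle(order)
        seen = set()
        for i, members in enumerate(order):
            seen |= members
            totals[i] += len(seen)
    return [t / n_shuffles for t in totals]


def abundant_clusters(abundance, fraction=0.8):
    """Clusters that together account for `fraction` of the Simpson index.

    abundance: dict mapping cluster ID -> coverage in one sample.
    Assumption: Simpson index D = sum(p_i ** 2); clusters are added in
    decreasing order of contribution until the running sum reaches
    fraction * D.
    """
    total = sum(abundance.values())
    contrib = {c: (v / total) ** 2 for c, v in abundance.items()}
    simpson = sum(contrib.values())
    selected, running = [], 0.0
    for cluster in sorted(contrib, key=contrib.get, reverse=True):
        if running >= fraction * simpson:
            break
        selected.append(cluster)
        running += contrib[cluster]
    return selected
```

A cluster would then be retained in the final list only if it is returned by `abundant_clusters` for at least two different stations, as the legend states.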

Extended Data Figure 2 Comparison of viral clusters with other classification methods (phage proteomic tree and percentage of shared genes).

The phage proteomic tree includes the 756 GOV complete and near-complete genomes from epipelagic and mesopelagic samples and the closest reference genomes from RefSeq and environmental phages (d < 0.5 to a GOV sequence or found in the same viral cluster as a GOV sequence). Branches of monophyletic clades that include more than 3 GOV and/or uncultivated marine sequences with no isolate reference are highlighted in blue. All viral clusters with more than 8 representatives in the tree or part of the 38 abundant viral clusters are indicated by the colours of the outer ring. The name and affiliation (if available) of the 38 abundant viral clusters are indicated next to the viral cluster on the coloured ring. Viral clusters in which members were gathered in single monophyletic clades are indicated with a solid black outline, while viral clusters for which all-but-one member were gathered in a single monophyletic clade are highlighted with a dashed black outline. The distribution of the percentage of shared genes, estimated from the number of shared protein clusters, is shown for pairs of viral genomes or contigs either between different viral clusters or within the same viral cluster (bottom right). On average, 73% and 39% of sequences within a viral cluster shared more than 20% and 40% of their genes, respectively, which are the thresholds currently accepted for sub-family and genus designations. Similarly, 83% of sequences within a viral cluster were consistently affiliated in the phage proteomic tree as they formed a monophyletic group that included only members of the particular viral cluster. Thus all three classification methods are largely consistent for the GOV dataset (see Supplementary Information).
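
As a rough illustration of the shared-gene comparison, the sketch below computes the percentage of shared genes for one pair of sequences and the fraction of pairs above a threshold. The function names are hypothetical, and two choices are assumptions not stated in the legend: each gene is represented by its protein-cluster identifier (so paralogues collapse), and the percentage is taken relative to the smaller of the two sequences.

```python
def shared_gene_percentage(pcs_a, pcs_b):
    """Percentage of genes shared between two genomes or contigs.

    pcs_a, pcs_b: iterables of protein-cluster IDs, one per gene.
    Assumption: the denominator is the smaller of the two gene sets.
    """
    a, b = set(pcs_a), set(pcs_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def fraction_above(percentages, threshold):
    """Fraction of pairwise percentages strictly above a threshold."""
    if not percentages:
        return 0.0
    return sum(p > threshold for p in percentages) / len(percentages)
```

With the pairwise percentages for one viral cluster in hand, `fraction_above(values, 20)` and `fraction_above(values, 40)` correspond to the sub-family and genus thresholds mentioned above.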

Extended Data Figure 3 Summary of 34 of the 38 abundant viral clusters.

Summaries are given for the 34 abundant viral clusters not summarized in Fig. 2. Predicted genome size is based on the set of isolates and circular contigs in the viral cluster. NA (not applicable) corresponds to viral clusters either without any circular contigs, or for which the relative standard deviation of estimated genome size across the different isolate(s) and/or circular contigs is greater than 15%. Host association values are based on the number of cluster members associated with each host group. Statistical significance of this number of predictions was evaluated by comparison with an expected number of associations calculated using a Poisson distribution. Host associations based on known isolates are indicated with a star (for associations based on cultivated isolates) or a dot (for associations based on the detection of a cluster member in a microbial genome from the VirSorter Curated Dataset). The abundant epipelagic microbial groups (representing >1% of the microbial OTUs in epipelagic samples) are highlighted in bold. Distribution and relative abundance of viral clusters are based on the cumulated coverage of viral cluster members among sample viral populations. The main oceanic basins are indicated for each set of samples.
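
Two of the calculations mentioned in this legend can be sketched with the Python standard library. This is an illustrative sketch only: the function names are hypothetical, the Poisson test is assumed to be a one-sided upper-tail probability of observing at least the given number of associations, and the relative standard deviation is assumed to use the sample standard deviation. The legend does not specify either choice.

```python
import math
import statistics


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with k a non-negative integer."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def predicted_genome_size(sizes, max_rsd=0.15):
    """Mean genome size, or None (NA) when the estimate is unreliable.

    sizes: genome lengths (bp) of isolates and circular contigs.
    Returns None when there are no sizes or when the relative standard
    deviation exceeds max_rsd (15% in the legend).
    """
    if not sizes:
        return None
    mean = statistics.mean(sizes)
    if len(sizes) > 1 and statistics.stdev(sizes) / mean > max_rsd:
        return None
    return mean
```

For example, `poisson_sf(observed, expected)` gives the probability of seeing at least `observed` host associations when `expected` are anticipated by chance.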

Extended Data Figure 4 Association between abundant viral clusters and abundance and diversity of host groups.

a, Abundance and diversity of bacterial and archaeal host groups associated with the 38 abundant viral clusters (see Fig. 2a). For each host group (at the phylum level, except for Proteobacteria where the class level is used), the different panels display, from top to bottom: (i) the number of viral clusters associated with this host group; (ii) the global relative abundance of this group estimated from the microbial metagenomic OTU counts; (iii) the global diversity of this group based on a Chao index computation including all Tara Oceans microbial metagenome samples (that is, including both alpha and beta diversity); (iv) the distribution of Chao indexes by sample for this group (the alpha diversity); and (v) the average Sorensen index between pairs of samples that include at least one OTU of this group (the beta diversity). OTU counts were derived from the 109 epipelagic microbial metagenomes described previously18. b, Pearson correlations between host-group relative abundance or diversity indices (global Chao index, average Chao index across samples and average Sorensen index across samples) and the number of viral clusters.
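
The indices named in this legend can be sketched as follows. This is a minimal illustration with hypothetical function names. The legend does not say which Chao estimator was used, so the bias-corrected Chao1 form is assumed here, and the Sorensen index is returned as a similarity (the corresponding dissimilarity is 1 minus this value).

```python
import math


def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from per-OTU counts."""
    counts = [c for c in otu_counts if c > 0]
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return len(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


def sorensen(otus_a, otus_b):
    """Sorensen similarity between the OTU sets of two samples."""
    a, b = set(otus_a), set(otus_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

In panel b, `pearson` would be applied to the per-host-group values, for instance the number of associated viral clusters against the global Chao index of each group.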

Extended Data Figure 5 Diversity, distribution, and genome context of dsrC genes in GOV contigs.

a, Maximum-likelihood tree (from an amino-acid alignment) including the 11 viral DsrC sequences and microbial sequences from microbial metagenomes and the NCBI nr database. The presence of conserved cysteine residues (termed CysA and CysB, as in ref. 24) is indicated with coloured circles next to each sequence or clade. The corresponding type of DsrC-like protein is indicated by the colouring of the branch or clade. The microbial metagenomic contigs affiliated to uncultivated, marine sulfur-oxidizing Gammaproteobacteria (as confirmed by complementary phylogenetic analysis of DsrAB; Supplementary Fig. 7) are indicated by stars. Viral AMG sequences are highlighted in blue, and SH-like supports for internal nodes are represented by proportional circles (all nodes with support <0.40 were collapsed). Each dsrC AMG is associated with an abundance profile (right) that displays the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of dsrC-containing contig maps. A T4-like marker gene (T4 baseplate) is indicated on the maps, alongside putative AMGs (Fe–S biosyn, iron–sulfur cluster biosynthesis; Amt, ammonia transporter).
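
The normalization used for the abundance profiles (contig coverage per Gb of metagenome) can be written out as a short sketch. The function name and arguments are hypothetical, and contig coverage is assumed here to be the mean read depth (mapped bases divided by contig length); the legend only states that coverage is expressed per Gb of metagenome.

```python
def normalized_coverage(mapped_bases, contig_length, metagenome_bases):
    """Contig coverage per Gb of metagenome.

    mapped_bases: total bases of reads mapped to the contig.
    contig_length: contig length in bp.
    metagenome_bases: total sequenced bases in the sample's metagenome.
    """
    depth = mapped_bases / contig_length   # mean coverage (x-fold)
    gigabases = metagenome_bases / 1e9     # metagenome size in Gb
    return depth / gigabases
```

For instance, a 50-kb contig with 500 kb of mapped reads in a 5-Gb metagenome has a mean depth of 10 and a normalized coverage of 2 per Gb, which makes samples sequenced to different depths comparable.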

Extended Data Figure 6 Diversity, distribution, and genome context of soxYZ genes in GOV contigs.

a, Bayesian tree from an amino-acid alignment, including the four viral SoxYZ sequences and microbial sequences from microbial metagenomes and the NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. Viral AMG sequences are highlighted in blue, and posterior probabilities are represented by proportional circles (all nodes with posterior probability <0.40 were collapsed). Clades including sulfur-oxidizing proteobacteria are indicated on the tree. Each soxYZ AMG is associated with an abundance profile (on the right) displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of soxYZ-containing contig maps. For contig GOV_bin_4310_contig-100_0, the second largest contig from the same bin (GOV_bin_4310_contig-100_1) is displayed. T4-like marker genes (gp23 and the gene encoding T4 baseplate) are indicated on the maps alongside putative AMGs.

Extended Data Figure 7 Diversity, distribution, and genome context of P-II genes in GOV contigs.

a, Maximum-likelihood tree from an amino-acid alignment that includes the 10 viral P-II sequences and microbial sequences from microbial metagenomes and the NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. Sequences lacking the conserved uridylation site of P-II (Supplementary Fig. 5) are highlighted with a star next to the sequence name or clade. Viral AMG sequences are highlighted in blue, and SH-like supports for internal nodes are represented by proportional circles (all nodes with support <0.40 were collapsed). Each P-II AMG is associated with an abundance profile (right) displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). b, Comparison of P-II-containing contig maps. Ammonia transporter genes linked to P-II are indicated on the map (dark red). When available, the viral-cluster affiliation of each contig is indicated next to the contig name. Contig GOV_bin_5834_contig-100_7 is too short to be clustered based on a shared protein cluster network; however, the seed contig of its population was clustered (in VC_12, Siphoviridae P12024virus), so the affiliation of this seed contig is indicated instead.

Extended Data Figure 8 Diversity, distribution, and genome context of amoC gene in GOV contigs.

a, Maximum-likelihood tree (from an amino-acid alignment) including the GOV amoC AMG and microbial sequences from microbial metagenomes and the NCBI nr database. The affiliation of microbial clades (either from the NCBI reference or from the LCA affiliation of metagenomic contigs) is indicated by the colouring of the grouped clades or by a coloured square next to the sequence. The viral AMG sequence is highlighted in blue, and SH-like supports of internal nodes are represented by proportional circles (all nodes with support <0.40 were collapsed). b, Abundance profile displaying the relative abundance of the contig across the 91 epipelagic and mesopelagic samples (based on normalized coverage; that is, contig coverage per Gb of metagenome). c, Map of the amoC-containing contig.

Extended Data Figure 9

Normalized coverage of contigs harbouring AMGs as a function of the temperature and nutrient concentrations of the corresponding samples. AMGs are grouped by clade based on their phylogeny (see Extended Data Figs 5, 6, 7), and their coverages are cumulated if multiple contigs are included in a clade. Plots display the cumulated normalized coverage of a clade (y axis) as a function of the temperature or nutrient concentration (x axis) across all epipelagic samples for geographically unrestricted clades (that is, clades found in >5 samples, see Fig. 3c). Mesopelagic samples were excluded from the analysis since the AMG signal was detected in epipelagic samples. Samples are colour-coded according to ocean and sea regions (Supplementary Table 1). The calculated preferential range of temperature or nutrient concentration is displayed below each plot for epipelagic AMGs (the P-II-4 distribution could not be linked to specific environmental conditions, but this AMG is the only one consistently retrieved in mesopelagic samples).
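The quantities named in this legend (normalized coverage as contig coverage per Gb of metagenome, coverage cumulated per clade, and the >5-sample threshold for geographically unrestricted clades) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the dictionary layout, and the choice to count a clade as "found" in a sample when its cumulated coverage is above zero are all assumptions made for illustration.

```python
from collections import defaultdict


def normalized_coverage(contig_coverage, metagenome_gb):
    """Contig coverage per Gb of metagenome (definition from the legend)."""
    if metagenome_gb <= 0:
        raise ValueError("metagenome size must be positive")
    return contig_coverage / metagenome_gb


def cumulated_clade_coverage(contig_coverage, contig_clade, sample_size_gb):
    """Sum the normalized coverage of all contigs in a clade, per sample.

    contig_coverage: {contig: {sample: raw coverage}}  (hypothetical layout)
    contig_clade:    {contig: clade name}
    sample_size_gb:  {sample: metagenome size in Gb}
    """
    totals = defaultdict(lambda: defaultdict(float))
    for contig, per_sample in contig_coverage.items():
        clade = contig_clade[contig]
        for sample, cov in per_sample.items():
            totals[clade][sample] += normalized_coverage(
                cov, sample_size_gb[sample]
            )
    return {clade: dict(samples) for clade, samples in totals.items()}


def geographically_unrestricted(clade_totals, min_samples=5):
    """Keep clades found in more than min_samples samples.

    Assumption: a clade is 'found' in a sample if its coverage is > 0.
    """
    return {
        clade: samples
        for clade, samples in clade_totals.items()
        if sum(1 for v in samples.values() if v > 0) > min_samples
    }
```

Each retained clade would then contribute one plot, with the cumulated normalized coverage on the y axis and the sample temperature or nutrient concentration on the x axis.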

Extended Data Table 1 Summary of gene and contig characteristics for new viral dsrC, soxYZ, P-II, and amoC AMGs

Supplementary information

Supplementary Information

This file includes Supplementary Text and Data, Supplementary Figures 1-8, legends for Supplementary Tables 1-6 (see separate Excel files) and additional references. The text includes additional information and literature context that help document details about the generation of the GOV dataset (assembly, identification of viral contigs, read mapping to viral contigs), viral cluster definition and affiliation (including comparison to other genome classification methods), host prediction (methods evaluation and results), discussions about AMG affiliation and host prediction for associated contigs, and a list of supports and sponsors of the Tara Oceans and Malaspina expeditions (including the list and affiliation of Tara Oceans coordinators). (PDF 8379 kb)

Supplementary Table 1

This file contains the list of viromes in the GOV dataset. Station number, depth, Longhurst province, biome, and sequencing effort are indicated for each virome sample. (XLS 63 kb)

Supplementary Table 2

This file contains the GOV viral population summary. The number of contigs and the length of each population are presented, alongside their normalized coverage across the 104 GOV viromes. (XLS 13582 kb)

Supplementary Table 3

This file contains a summary of GOV Viral Clusters (VCs). For each VC, the composition (number and origin of VC members), affiliation, and coverage across GOV viromes are indicated. (XLS 865 kb)

Supplementary Table 4

This file contains the benchmarks of in silico host prediction methods. Results are reported for evaluations of host prediction methods performed using the NCBI RefSeq Virus database and the VirSorter Curated Dataset. (XLS 7 kb)

Supplementary Table 5

This file contains the host prediction for GOV viral contigs that are associated with a population. Predictions are reported for each population with the type of signal (blastn, CRISPR, tetranucleotide composition), the host sequence used, and the strength of the prediction. (XLS 699 kb)

Supplementary Table 6

This file contains the PFAM domains detected in GOV viral contigs (≥1.5 kb). For each PFAM domain, the number of genes detected in the GOV dataset is indicated, alongside the functional category of the domain. (XLS 369 kb)

Cite this article

Roux, S., Brum, J., Dutilh, B. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537, 689–693 (2016). https://doi.org/10.1038/nature19366
