Abstract

Viruses are the most abundant biological entities on Earth, but challenges in detecting, isolating, and classifying unknown viruses have prevented exhaustive surveys of the global virome. Here we analysed over 5 Tb of metagenomic sequence data from 3,042 geographically diverse samples to assess the global distribution, phylogenetic diversity, and host specificity of viruses. We discovered over 125,000 partial DNA viral genomes, including the largest phage yet identified, and increased the number of known viral genes by 16-fold. Half of the predicted partial viral genomes were clustered into genetically distinct groups, most of which included genes unrelated to those in known viruses. Using CRISPR spacers and transfer RNA matches to link viral groups to microbial host(s), we doubled the number of microbial phyla known to be infected by viruses, and identified viruses that can infect organisms from different phyla. Analysis of viral distribution across diverse ecosystems revealed strong habitat-type specificity for the vast majority of viruses, but also identified some cosmopolitan groups. Our results highlight an extensive global viral diversity and provide detailed insight into viral habitat distribution and host–virus interactions.

Acknowledgements

We thank A. Visel and H. Maughan for critical reading and feedback, A. Pati for help in earlier versions, and the IMG and GOLD teams for their support. This work was conducted by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, under contract number DE-AC02-05CH11231 and used resources of the National Energy Research Scientific Computing Center, supported by the Office of Science of the US Department of Energy.

Author information

Affiliations

  1. Department of Energy, Joint Genome Institute, Walnut Creek, California 94598, USA

    • David Paez-Espino
    • Emiley A. Eloe-Fadrosh
    • Georgios A. Pavlopoulos
    • Alex D. Thomas
    • Marcel Huntemann
    • Natalia Mikhailova
    • Edward Rubin
    • Natalia N. Ivanova
    • Nikos C. Kyrpides
  2. Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA

    • Edward Rubin
  3. Metabiota Inc., San Francisco, California 94104, USA

    • Edward Rubin

Contributions

D.P.E., N.N.I., and N.C.K. conceived and led the study. All authors participated in the analysis and interpretation of data. D.P.E., E.E.F., E.R., N.N.I., and N.C.K. wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Nikos C. Kyrpides.

Reviewer information

Nature thanks C. A. Suttle and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Supplementary information

PDF files

  1.

    Supplementary Information

    This file contains Supplementary Results, Supplementary References and full legends for Supplementary Tables 1-28.

Excel files

  1.

    Supplementary Data

    This file contains Supplementary Tables 1-28 – see the Supplementary Information document for full table legends.

About this article

DOI

https://doi.org/10.1038/nature19094
