Biogeographic patterns and drivers of soil viromes

Abstract

Viruses are crucial in shaping soil microbial functions and ecosystems. However, studies on soil viromes have been limited in both spatial scale and biome coverage. Here we present a comprehensive synthesis of soil virome biogeographic patterns using the Global Soil Virome dataset (GSV) wherein we analysed 1,824 soil metagenomes worldwide, uncovering 80,750 partial genomes of DNA viruses, 96.7% of which are taxonomically unassigned. The biogeography of soil viral diversity and community structure varies across different biomes. Interestingly, the diversity of viruses does not align with microbial diversity and contrasts with it by showing low diversity in forest and shrubland soils. Soil texture and moisture conditions are further corroborated as key factors affecting diversity by our predicted soil viral diversity atlas, revealing higher diversity in humid and subhumid regions. In addition, the binomial degree distribution pattern suggests a random co-occurrence pattern of soil viruses. These findings are essential for elucidating soil viral ecology and for the comprehensive incorporation of viruses into soil ecosystem models.

Fig. 1: Overview of the global soil virome database.
Fig. 2: Viral community properties across biomes and geography.
Fig. 3: Drivers of viral community assembly.
Fig. 4: Random co-occurrence pattern.
Fig. 5: Map of viral α-diversity indices (Shannon index) at 0.01° resolution as modelled using the random-forest model.

Data availability

All GSV sequences, GSV database viral information and map TIFF files can be downloaded from Zenodo at https://zenodo.org/records/10463783. The interactive GSV map is available at https://bmalab.shinyapps.io/global_soil_viromes.

Code availability

Scripts used in this manuscript are available on microbma GitHub under project ‘global soil viromes’ (https://microbma.github.io/project/gsv.html).

Acknowledgements

We thank C. Kelly, C. Averill, D. Buckley, D. Goodheart, D. Duncan, D. Myrold, E. Eloe-Fadrosh, E. Brodie, E. Högfors-Rönnholm, H. Cadillo-Quiroz, J. Tiedje, J. Jansson, J. Norton, J. Blanchard, J. Schweitzer, J. Banfield, J. Gladden, J. Raff, K. Peay, K. Gravuer, K. M. DeAngelis, L. Meredith, M. Kalyuzhnaya, M. Waldrop, N. Fierer, P. Dijkstra, P. Baldrian, S. Theroux, S. Tringe, T. Woyke, T. Whitman, W. Mohn and San Diego State University for permission to use their metagenome data. We also thank Amazon Web Services for providing computing resources. This work was supported by the National Natural Science Foundation of China (grants 41721001, 42090060, 42277283 and 41991334), the Key R&D Program of Zhejiang Province (2023C02004, 2023C02015) and the Fundamental Research Funds for the Central Universities (226-2022-00139).

Author information

Contributions

B.M. and J.X. created the study design. Y.W., K.Z., X.T., H.D. and R.X. collected all datasets. B.M., Y.W., K.Z., C.T., C.W. and B.D. performed the data analysis and visualization. J.X., B.M., Y.W., E.S., K.Z., X.L., R.X., X.T., R.A.D., Y.-G.Z., Y.Y., L.H. and H.C. contributed to scientific discussion and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jianming Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks Kyle Meyer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Flow diagram of sample identification.

The arrow delineates sequential steps. There are three main stages: identification, screening and inclusion. The number in each box represents the total number of samples involved in the step.

Extended Data Fig. 2 Bioinformatic Workflow.

The red background highlights the software used along with version specifics. The blue background outlines information on data volumes. Arrows illustrate the order of computational procedures, encompassing (A) prediction of viral contigs from metagenome-assembled contigs, (B) creation of OTU tables and conducting biogeography analyses, (C) clustering of genomes for database comparison (a) and detailing phylogenetic levels (b), (D) assignment of viral taxonomy, (E) identification of temperate phages and (F) determination of host assignment.

Extended Data Fig. 3 Viral information.

Virus validation. (a) Density plot of the number of BUSCO hits divided by the total number of genes (BUSCO ratio) for all viruses in the GSV dataset. (b) Histogram of the number of GSV vOTUs with different numbers of viral protein family (VPF) hits. Histograms of the number of (c) vOTUs, (d) viral genus-level vOTUs and (e) viral family-level vOTUs present in different percentages of GSV samples. (f) The proportion of genome populations that are putative prophages for this study (GSV), IMG/VR v3 ‘soil only’ metagenomes (IMGsoil), Phages and Integrated Genomes Encapsidated Or Not database (PIGEON), Global Oceans Viromes 2.0 database (GOV2), Gut Virome Database (GVD), Gut Phage Database (GPD) and Viral Refseq v201 (Refseq). (g) Distribution of sequence quality determined by CheckV. (h) Viral contigs sorted by relative abundance and contig length, with those identified at family level shown in blue.

Extended Data Fig. 4 Host-virus linkages.

Host-virus network wherein nodes indicate species (hosts; blue) or vOTUs (viruses; bronze); edges indicate a host-virus relationship. A small number of viral nodes were responsible for a large number of host-viral relationships in the virus-host network. Microbial interaction networks often follow a scale-free format in which the majority of connections belong to a small number of nodes. As such, keystone (or hub) nodes enact substantial leverage over the community as a whole.
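The hub pattern described in this caption can be illustrated with a minimal sketch. The edge list, the vOTU and host names, and the helper names `viral_degrees` and `hub_share` are hypothetical and are not taken from the GSV data or the authors' scripts; the code only shows one way to count host links per viral node and to measure how much of the network the best-connected nodes account for.

```python
from collections import Counter


def viral_degrees(edges):
    """Count distinct host links per viral node from (virus, host) pairs."""
    return Counter(virus for virus, _host in set(edges))


def hub_share(edges, top_n=1):
    """Fraction of all distinct virus-host links held by the top_n most connected viruses."""
    degrees = viral_degrees(edges)
    total = sum(degrees.values())
    if total == 0:
        return 0.0
    top = sum(count for _virus, count in degrees.most_common(top_n))
    return top / total


# Hypothetical toy network: one viral node carries most of the links.
example_edges = [
    ("vOTU_1", "host_A"), ("vOTU_1", "host_B"), ("vOTU_1", "host_C"),
    ("vOTU_1", "host_D"), ("vOTU_2", "host_A"), ("vOTU_3", "host_E"),
]
degrees = viral_degrees(example_edges)
share = hub_share(example_edges, top_n=1)
```

In this toy network a single viral node holds four of the six links, which is the kind of concentration the caption attributes to hub nodes.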

Extended Data Fig. 5 Assessing the Impact of Sequencing Depth on Diversity Results.

(a & b) Correlations between the Shannon index obtained from subsampled reads and that obtained from all reads. Each dot represents a soil metagenome sample and is colored by biome type. The lines denote the predicted values based on the linear mixed model and the shaded areas flanking the lines indicate the upper and lower 95% confidence intervals. The numbers in the lower right corner are the Spearman correlation results. (c) Viral Shannon index across varying sequencing depths, with second-order fit for total samples (left upper corner) and for subsamples separated by biomes (upper) and continents (bottom). The lines in the graph represent the predicted values as calculated by the linear mixed model. Surrounding these lines, the shaded regions illustrate the upper and lower bounds of the 95% confidence intervals. (d) Correlation between microbial diversity and viral Shannon index normalized by sample read number (Shannon per Read Count); each dot represents a soil metagenome sample and is colored by biome type. (e) Median and interquartile ranges for Shannon per Read Count, with whiskers extending to ≤1.5× interquartile range. Significant differences were assessed using one-way ANOVA with LSD test; biomes with different lowercase letters are significantly different at α = 0.05 (n = 620 (Agricultural Land), n = 42 (Artificial Surfaces), n = 40 (Bare Land), n = 310 (Wetland), n = 293 (Grassland), n = 56 (Tundra), n = 417 (Forest), n = 21 (Shrubland)). (f) Correlation between microbial diversity and viral Shannon index for samples with sequencing depths ≥100 million reads. (g) Median and interquartile ranges for viral Shannon index at species level for samples with sequencing depths ≥100 million reads, with whiskers extending to ≤1.5× interquartile range. Significance was assessed using one-way ANOVA and LSD tests, with different lowercase letters marking significant differences at α = 0.05 (n as in (e)).
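As a rough illustration of the quantities named in this caption, the sketch below computes a Shannon index from abundance counts and a read-normalized version of it. The caption does not state how the normalization was done; dividing the index by the sample read count is an assumption made here, and the function names are hypothetical rather than taken from the authors' scripts.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def shannon_per_read_count(abundances, read_count):
    """Shannon index divided by the sample read count (assumed normalization)."""
    if read_count <= 0:
        raise ValueError("read_count must be positive")
    return shannon_index(abundances) / read_count


# Hypothetical vOTU abundances for a single sample.
even_sample = [10, 10, 10, 10]
h_even = shannon_index(even_sample)
```

For a perfectly even community of four vOTUs the index equals ln 4, which is a convenient sanity check for any implementation.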

Extended Data Fig. 6 Expanded viral diversity across biomes (including paddy soil and coastal soil).

Median and interquartile ranges for viral Shannon index at species level, with whiskers extending to ≤1.5× interquartile range. Significant differences were assessed using one-way ANOVA with LSD test; biomes with different lowercase letters are significantly different at α = 0.05. The numbers in the figure represent sample sizes (n).

Extended Data Fig. 7 Model validation, accuracy assessment and extent of interpolation across all terrestrial pixels for the 10 environmental covariate layers.

(a) Clustering tree of covariates (main effects circled with a red box). (b) Leave-One-Out cross validation result of the models forecasting viral alpha diversity (Shannon index). Linear regression was used to analyze the relationship between observed and predicted Shannon indices, assuming a two-sided test. (c) Percentage of pixels falling within the convex hulls of the first 5 principal component spaces (covering >80% of the sample space variation collectively). Prediction outliers occurred at latitudinal extremes. The limited sample footprint in equatorial sites, Sahara Desert area, middle Asia and Australia resulted in lower forecast confidence for these regions. (d) Bootstrapped (100 iterations) coefficient of variation (standard deviation divided by the mean predicted value) results represent prediction accuracy of Shannon index. Sampling was stratified by biome. The Shannon predictions had low certainty in Sahara Desert area, middle Asia and areas between the Tropic of Capricorn and the Equator.
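The bootstrapped coefficient of variation in panel (d) can be sketched as follows. This is not the authors' pipeline: the stand-in predictor (a simple mean) replaces the random-forest model, the toy values are invented, the choice of the population standard deviation is an assumption, and every function name is hypothetical. The sketch only shows biome-stratified resampling with replacement followed by standard deviation divided by the mean prediction.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD is an assumption)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean of values must be non-zero")
    return statistics.pstdev(values) / mean


def stratified_bootstrap(samples_by_biome, rng):
    """Resample with replacement within each biome, keeping each biome's size."""
    resample = []
    for values in samples_by_biome.values():
        resample.extend(rng.choices(values, k=len(values)))
    return resample


def bootstrap_cv(samples_by_biome, predict, iterations=100, seed=0):
    """Coefficient of variation of a prediction across bootstrap iterations."""
    rng = random.Random(seed)
    predictions = [
        predict(stratified_bootstrap(samples_by_biome, rng))
        for _ in range(iterations)
    ]
    return coefficient_of_variation(predictions)


# Hypothetical Shannon values grouped by biome; the mean is a stand-in model.
toy = {"Forest": [2.1, 2.4, 1.9], "Grassland": [3.0, 3.3, 2.8]}
cv = bootstrap_cv(toy, predict=statistics.fmean)
```

A low value means the prediction is stable under resampling; in the caption this is computed per map pixel, whereas the sketch reduces it to a single number.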

Extended Data Fig. 8 Accumulation curves.

Accumulation curves for total samples (left upper corner) and for subsamples separated by biomes (upper) and continents (bottom). The curves depict mean values, and the shaded regions around these curves represent the standard deviation (SD).

Supplementary information

Reporting Summary

Peer Review File

Supplementary Tables 1–7

Supplementary Table 1. Metagenomes used in the GSV dataset, with sample ID (NCBI number, JGI ID), longitude, latitude, biome, sequencing size, continent, data contributor, library strategy and other metadata. Table 2. Host–virus linkage information. Table 3. The 84 associated environmental factors used to analyse viral biogeography. Table 4. A total of 84 global covariate layers used in model establishment. The 7 Nadir Reflectance Band layers (that is, MCD43A4.005 BRDF-Adjusted Reflectance 16-Day Global 500 m) are grouped as one item in the table. Table 5. Effect size results of 84 global covariate layers, biome, latitude and longitude on α-diversity and β-diversity. Table 6. Overview of metagenomic data size from previous studies (Sheet 1). Analysis of read count thresholds and their implications on viral diversity (Sheet 2). Table 7. Spearman’s correlation results after spatial regression for environmental factors and for viral and microbial community diversity.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, B., Wang, Y., Zhao, K. et al. Biogeographic patterns and drivers of soil viromes. Nat Ecol Evol 8, 717–728 (2024). https://doi.org/10.1038/s41559-024-02347-2

  • DOI: https://doi.org/10.1038/s41559-024-02347-2
