Soil microbiome predictability increases with spatial and taxonomic scale


Soil microorganisms shape ecosystem function, yet it remains an open question whether we can predict the composition of the soil microbiome in places before observing it. Furthermore, it is unclear whether the predictability of microbial life exhibits taxonomic- and spatial-scale dependence, as it does for macrobiological communities. Here, we leverage multiple large-scale soil microbiome surveys to develop predictive models of bacterial and fungal community composition in soil, then test these models against independent soil microbial community surveys from across the continental United States. We find remarkable scale dependence in community predictability. The predictability of bacterial and fungal communities increases with the spatial scale of observation, and fungal predictability increases with taxonomic scale. These patterns suggest that there is an increasing importance of deterministic versus stochastic processes with scale, consistent with findings in plant and animal communities, suggesting a general scaling relationship across biology. Biogeochemical functional groups and high-level taxonomic groups of microorganisms were equally predictable, indicating that traits and taxonomy are both powerful lenses for understanding soil communities. By focusing on out-of-sample prediction, these findings suggest an emerging generality in our understanding of the soil microbiome, and that this understanding is fundamentally scale dependent.
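A minimal sketch of how predictions might be scored at different spatial scales, assuming cores nest within plots and plots within sites. The records, field names, and helper functions below are hypothetical, and the data are synthetic, with errors built to cancel when averaged. The sketch shows only the mechanics of aggregating before scoring and does not reproduce the study's results.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, key):
    """Average observed and predicted values over records sharing a key."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    labels = sorted(groups)
    obs = [mean(r["observed"] for r in groups[g]) for g in labels]
    pred = [mean(r["predicted"] for r in groups[g]) for g in labels]
    return obs, pred


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, with residuals taken as observed minus predicted."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic relative abundances: a site-level signal plus plot-level and
# core-level deviations that cancel within each site and plot (an assumption
# made purely for illustration).
records = []
for site, signal in (("A", 0.2), ("B", 0.6)):
    for plot, plot_dev in ((1, 0.05), (2, -0.05)):
        for core, core_dev in ((1, 0.10), (2, -0.10)):
            records.append({
                "site": site, "plot": plot, "core": core,
                "observed": signal + plot_dev + core_dev,
                "predicted": signal,  # the model only captures the site signal
            })

scales = {
    "core": lambda r: (r["site"], r["plot"], r["core"]),
    "plot": lambda r: (r["site"], r["plot"]),
    "site": lambda r: r["site"],
}
scores = {name: r_squared(*aggregate(records, key)) for name, key in scales.items()}
```

In this constructed case the score rises from core to plot to site because unexplained small-scale deviations average out. Whether that happens in real data is an empirical question, which is what the study tests.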


Fig. 1: How can predictability vary as a function of taxonomic scale?
Fig. 2: How can predictability vary as a function of spatial scale?
Fig. 3: Predictability and spatial scale.
Fig. 4: Predictability and taxonomic scale.
Fig. 5: Average predictability of soil fungi and bacteria in the continental United States.

Data availability

All data used to train statistical models are either publicly available in associated studies or were provided by the original study authors on request. All data used to validate models are publicly available through the National Ecological Observatory Network data portal. We will provide raw and processed data on request for purposes of replicating the findings of this study.

Code availability

All code needed to process raw data and to replicate these analyses is available on GitHub.




The National Ecological Observatory Network is a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute. C.A., Z.R.W., M.C.D. and J.M.B. were supported by NSF Macrosystems Biology (no. 1638577). C.A. was supported by an Ambizione Grant (no. PZ00P3_179900) from the Swiss National Science Foundation. K.F.A. was supported by the Boston University BRITE Bioinformatics REU program. D. Maynard gave feedback on an earlier version of this manuscript. L. Stanish helped to access and interpret microbial data from the NEON Network. J. Luecke designed and illustrated Figs. 1 and 2.

Author information

Authors and Affiliations



C.A., J.M.B. and M.C.D. conceived the study. C.A., Z.R.W. and K.F.A. performed all analysis and computation. All of the authors wrote the manuscript collaboratively.

Corresponding author

Correspondence to Colin Averill.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Ecology & Evolution thanks Xiaofeng Xu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cross-validation within the NEON dataset.

Mean cross-validated R² relative to the 1:1 prediction across functional and taxonomic groups for (a) bacteria and (b) fungi. All models were trained on 70% of NEON core- or plot-level data and then validated using the remaining 30% of the data.
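The 70/30 split and the R² relative to the 1:1 line can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the R² definition used here (1 − SS_res/SS_tot, with residuals measured from the 1:1 line rather than from a refit regression line) is a common convention that is assumed to match the caption.

```python
import random
from statistics import mean


def split_70_30(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def r2_one_to_one(observed, predicted):
    """R² against the 1:1 line: 1 - SS_res / SS_tot.

    Residuals are observed minus predicted; no regression line is refit,
    so systematic bias in the predictions lowers the score.
    """
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Ten hypothetical sample identifiers, split 70/30.
train, validation = split_70_30(range(10))

# Predictions offset by a constant bias correlate perfectly with the
# observations, yet score below zero against the 1:1 line.
observed = [1.0, 2.0, 3.0]
biased = [2.0, 3.0, 4.0]
score = r2_one_to_one(observed, biased)
```

Scoring against the 1:1 line is stricter than a squared correlation: a model is penalized for being consistently too high or too low, not only for scatter.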

Extended Data Fig. 2 Coefficient of variation across taxonomic and functional groupings.

Coefficient of variation of model predictions vs. observations across functional and taxonomic groups, both in and out of sample for (a) bacteria and (b) fungi.

Extended Data Fig. 3 Principal component analysis of microbial environmental sensitivities.

Principal component analysis of phylogenetic and functional group parameter values in the global calibration dataset for (a) fungi and (d) bacteria. Factor importance in principal component space is indicated by the direction and length of factor vectors. We visualize the strongest correlation between an individual factor effect size and predictability in the calibration dataset (b,e), as well as the correlations for all factors (c,f). Factors include net primary productivity (NPP), whether or not conifers are present (conifer), whether or not a site is a forest (forest), mean annual temperature (MAT), mean annual precipitation (MAP), soil pH (pH), soil percent carbon (%C), soil carbon to nitrogen ratio (C:N), and the relative abundance of ectomycorrhizal trees (relEM).

Extended Data Fig. 4 Qualitatively similar but quantitatively different relationships between Acidobacteria and soil pH.

Relative abundance of the bacterial phylum Acidobacteria plotted as a function of soil pH, highlighting differences in trends between independent sources. a, Values from the combined calibration dataset and the validation dataset, with points and loess curves colored by dataset. The relationship between Acidobacteria and pH within the validation data, sourced from the National Ecological Observatory Network, appears to have a strong systematic bias; however, due to the compositional nature of amplicon sequencing data, it is difficult to determine the source of biases for any given taxon. b, Values from a subset of 5 independent datasets used in calibration, with points and loess curves colored by dataset.
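The point about compositional data can be illustrated with a small sketch. The taxon counts below are invented for illustration, and `relative_abundance` is a hypothetical helper. Because relative abundances must sum to one, a change in one taxon alters the relative abundance of every other taxon, even if their absolute abundances stay the same.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions that sum to one."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute abundances in two samples. Only Proteobacteria
# differs between them; Acidobacteria is unchanged in absolute terms.
sample_1 = {"Acidobacteria": 200, "Proteobacteria": 300, "Other": 500}
sample_2 = dict(sample_1, Proteobacteria=800)

rel_1 = relative_abundance(sample_1)
rel_2 = relative_abundance(sample_2)

# The relative abundance of Acidobacteria drops although its count did not.
acido_shift = rel_2["Acidobacteria"] - rel_1["Acidobacteria"]
```

An apparent bias in one taxon's relative abundance can therefore originate in a different taxon, which is why the source of bias is hard to attribute.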

Extended Data Fig. 5 Variance decomposition.

Density plot of variance decomposition for all (a) bacterial and (b) fungal groups modeled at the site level.

Extended Data Fig. 6 Distribution of samples used in this analysis.

Distribution of sampling sites used in this analysis. Sites used for fungal model calibration are in pink, sites used for bacterial model calibration are in blue, and NEON sites used for validation are in yellow.

Supplementary information

Supplementary Information

Supplementary Figs. 1–6 and the caption for Supplementary Data 1.

Reporting Summary

Peer Review File

Supplementary Data 1

Out-of-sample R² and R²(1:1) values for all bacterial and fungal groups modeled. Values are reported at core, plot and site scales.


About this article


Cite this article

Averill, C., Werbin, Z.R., Atherton, K.F. et al. Soil microbiome predictability increases with spatial and taxonomic scale. Nat Ecol Evol 5, 747–756 (2021).
