Detecting macroecological patterns in bacterial communities across independent studies of global soils

The emergence of high-throughput DNA sequencing methods provides unprecedented opportunities to further unravel bacterial biodiversity and its role worldwide, from human health to ecosystem functioning. However, despite the abundance of sequencing studies, combining data from multiple individual studies to address macroecological questions of bacterial diversity remains methodologically challenging and prone to bias. Here, using a machine-learning approach that accounts for differences among studies and complex interactions among taxa, we merge 30 independent bacterial data sets comprising 1,998 soil samples from 21 countries. Whereas previous meta-analysis efforts have focused on bacterial diversity measures or abundances of major taxa, we show that disparate amplicon sequence data can be combined at the taxonomy-based level to assess bacterial community structure. We find that rarer taxa are more important for structuring soil communities than abundant taxa, and that these rarer taxa are better predictors of community structure than environmental factors, which are often confounded across studies. We conclude that combining data from independent studies can be used to explore bacterial community dynamics, identify potential ‘indicator’ taxa with an important role in structuring communities, and propose hypotheses about previously overlooked factors that shape bacterial biogeography.
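As a rough illustration of what combining data "at the taxonomy-based level" can involve, the sketch below merges per-study count tables by taxon name after truncating each lineage to a common rank, then converts counts to relative abundances. It is not the authors' pipeline: the function names (`to_rank`, `merge_studies`), the `depth` parameter, the semicolon-delimited lineage format and the toy counts are all assumptions made for this example, and the machine-learning step is not shown.

```python
from collections import defaultdict


def to_rank(lineage, depth):
    """Truncate a semicolon-delimited lineage string to its first `depth` ranks."""
    parts = [p.strip() for p in lineage.split(";") if p.strip()]
    return ";".join(parts[:depth])


def merge_studies(studies, depth=2):
    """Merge per-study count tables into one relative-abundance table.

    studies: {study_id: {sample_id: {lineage: read_count}}}
    Returns (taxa, rows), where rows maps (study_id, sample_id) to a list of
    relative abundances aligned with the sorted `taxa` list.
    """
    collapsed = {}
    for study_id, samples in studies.items():
        for sample_id, counts in samples.items():
            # Sum reads of all lineages that share the same name at this rank.
            summed = defaultdict(int)
            for lineage, n in counts.items():
                summed[to_rank(lineage, depth)] += n
            collapsed[(study_id, sample_id)] = summed

    # Union of taxon names across studies; taxa absent from a sample get 0.
    taxa = sorted({t for summed in collapsed.values() for t in summed})
    rows = {}
    for key, summed in collapsed.items():
        total = sum(summed.values())
        rows[key] = [summed.get(t, 0) / total if total else 0.0 for t in taxa]
    return taxa, rows


# Hypothetical toy input: two studies with differently formatted lineages.
studies = {
    "study_A": {"s1": {"Bacteria;Proteobacteria;Alphaproteobacteria": 30,
                       "Bacteria;Proteobacteria;Betaproteobacteria": 10,
                       "Bacteria;Acidobacteria": 60}},
    "study_B": {"s1": {"Bacteria; Actinobacteria": 25,
                       "Bacteria; Proteobacteria": 75}},
}
taxa, rows = merge_studies(studies, depth=2)
```

Keying rows by (study, sample) keeps the study of origin attached to every sample, so a downstream model can account for differences among studies rather than ignoring them.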

Fig. 1: Merging of data from 30 independent studies.
Fig. 2: Regardless of technical differences between studies, many bacterial taxa are still informative about bacterial community structure.
Fig. 3: Rarer taxa are more important for structuring communities than abundant taxa.
Fig. 4: The importance of bacterial taxa classified at different taxonomic ranks.
Fig. 5: Importance of bacterial taxa in community structure related to their occurrence in different studies.




Acknowledgements

We thank all the people who contributed data and input to this study. This study was conducted at a workshop (May 2015, Manchester, UK) funded by the British Ecological Society’s special interest group Plants-Soils-Ecosystems and organized by F.T.d.V. and K.S.R. This study and participants were funded in part by ERC Advanced Grant 26055290 (K.S.R. and W.H.v.d.P.); BBSRC David Phillips Fellowship (BB/L02456X/1) (F.T.d.V.); ERC Grant Agreements 242658 (BIOCOM) and 647038 (BIODESERT) (F.T.M.); the European Regional Development Fund (Centre of Excellence EcolChange) (J.D.); Yorkshire Agricultural Society, Nafferton Ecological Farming Group, and the Northumbria University Research Development Fund (C.H.O.); BBSRC Training Grant (BB/K501943/1) (C.H.); Wallenberg Academy Fellowship (KAW 2012.0152), Formas (214-2011-788) and Vetenskapsrådet (612-2011-5444) (E.D.); the Glastir Monitoring & Evaluation Programme (contract reference: C147/2010/11) and the full support of the GMEP team on the Glastir project (D.L.J., S.C., and D.A.R.). Computing was facilitated by the University of Manchester Condor pool and the CLIMB infrastructure (

Author information

The idea for this study was conceived by F.T.d.V. and K.S.R. The data sets were compiled by C.G.K., R.G., J.D., A.H., B.C., G.F., A.L.S., and J.R. Metadata were compiled by J.D. and J.R. Raw sequence analysis was conducted by M.d.H. Primer bias analysis was conducted by A.C. Random Forest analyses were conducted and figures were produced by C.G.K. The manuscript was written by K.S.R., C.G.K., and F.T.d.V., with contributions from all co-authors.

Correspondence to Kelly S. Ramirez.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Supplementary Tables 2 and 3 and Supplementary Figures 1–10.

Life Sciences Reporting Summary

Figure Generation Data

Supplementary Table 4: Data used to generate figures.

Figure Generation Code

R code used to generate figures.

Supplementary Table 1

Summary of all datasets used.

Supplementary Table 5

Name-matched data.

Supplementary Table 6

Sequence-matched data.
