## Introduction

Soil hosts unparalleled bacterial diversity, ranking highest among all compartments of the biosphere1,2,3. The number of bacterial phylotypes ranges between 10² and 10⁶ per gram of soil1,2,4, with high values similar to the diversity in all of Earth’s environments3. This immense richness is often attributed to soil’s intrinsically heterogeneous physical and chemical micro-environments5,6,7,8,9. The complex structure of soil pores offers numerous refugia for hosting diverse bacterial species9. This wide range of microhabitats is particularly important for maintaining the rare components of the soil microbiome. Low-abundance bacterial species play important roles in key biogeochemical processes10,11 and serve as a “seed bank” for species richness12. Microbial diversity is manifested both at the scale of soil grains8 and at very large scales across climatic regions and terrestrial biomes2,13,14. These observations often include variations in microbial biomass that responds to resource availability and affects bacterial diversity at all scales15,16,17. For example, well-established observations of microbial abundance variations with soil depth18 could confound inferences of bacterial richness by promoting the detection of low-abundance species in resource-rich environments.

Quantifying the roles of soil factors, such as soil texture, porosity and hydration conditions in relation to climate and vegetation cover, is an important step towards disentangling bacterial diversity and abundance as suggested by recent empirical evidence17. Soil chemical properties such as pH2,14,17,19 and organic carbon content15,16,17 together with climatic attributes, such as aridity index15, precipitation2,17 and temperature13, have been identified as important explanatory variables. Yet, the rapid expansion of soil bacterial diversity datasets has not been matched by a similar development of predictive models for interpretation of the observed spatial patterns20. Improved predictability of soil bacterial diversity could be essential for understanding soil bacterial functioning, from contributions to soil respiration11,21 to the resistance of bacterial communities to invasion by pathogens22.

Such endeavors invariably require development of mechanistic frameworks for systematic incorporation of the various factors that affect soil bacterial diversity. In this study, we capitalize on recent empirical2,8,13,15,17,23 and theoretical developments7,24,25 to generalize the role of soil aqueous microhabitat fragmentation and its nearly universal role in mediating bacterial diversity across soil types and climatic conditions. To characterize the average conditions in soils and facilitate long-term predictions, we define a soil climatic water content that combines rainfall patterns and volumetric soil water holding capacity into a well-defined attribute. This measure considers the average duration between soil wetting events important for diversity maintenance (see Methods). Under a wide range of climatic conditions, soils remain unsaturated with the bacterial aqueous habitats fragmented to varying degrees based on soil type and rainfall dynamics (amount and frequency). A critical hypothesis is that the microscale arrangement of water retained in soil pores defines the size distribution and connectedness of aqueous bacterial habitats that, in turn, affect diffusion rates of substrates, the rates and spatial extents of cell motility25,26 and opportunities for cell-to-cell interactions27. The objective of this study was to formalize the influence of these abiotic factors in a heuristic framework that enables quantitative representation of soil bacterial abundance and diversity at scales ranging from grains to watersheds and beyond.

The core of the model is the quantification of numbers and sizes of aqueous bacterial habitats considering climatic water contents and soil types. We use concepts of percolation theory to describe the size distribution of aqueous patches24 that could support bacterial cells. Soil organic carbon input flux, derived from the net primary productivity (NPP), and mean annual temperature (MAT) are used to estimate a soil-carrying capacity that defines limits for the abundance of bacterial cells (Fig. 1). For simplicity, we first assume that each isolated aqueous patch is inhabited by a single bacterial phylotype (hereafter referred to as “species”). This heuristically enables estimation of bacterial diversity based on the species abundance distribution (SAD) deduced from the size and number distribution of microscale aqueous habitats. The framework expresses soil bacterial diversity at two interlinked spatial scales: at the single aqueous habitat scale and at the soil sample scale that can contain many isolated aqueous habitats.

Modeled trends of soil bacterial carrying capacity and diversity are compared to empirical observations1,4,18 across terrestrial biomes and suggest a peak in bacterial diversity at intermediate climatic water contents. To evaluate predictions by this aqueous-phase fragmentation-based heuristic model (HM), we employ a detailed, spatially explicit individual-based model (SIM) that mechanistically simulates bacterial communities growing on hydrated soil surfaces7,25. The SIM enables systematic variations of hydration conditions and tracks the growth and life history of multiple species interacting on soil grain surfaces (see Methods).

The simple HM does not differentiate between the roles of legacy and environmental conditions in shaping soil bacterial diversity. As evidenced from the choice of climatic averaging and the implicit representation of species with no taxonomic attribution, the focus lies on the role of aqueous habitats and their average connectivity. Other factors at play such as soil chemistry and the presence of larger organisms are not modeled. We refer to “microbes” for aspects that apply to all microbial life in soil (bacteria, fungi, protists and viruses), and specifically to bacteria for modeling and quantification of diversity and abundance. In summary, we propose a hydration-centered modeling framework that considers the interplay of climatic water content, carbon input flux and temperature in shaping soil microhabitats and thus bacterial diversity.

## Results

### Estimation of soil bacterial carrying capacity

We evaluated theoretical estimates of soil bacterial carrying capacity using previously published measurements of soil microbial carbon18. The HM assumes that a certain proportion of the annual NPP-derived organic carbon input is allocated to bacteria (24% of NPP for bacterial respiration28,29). We found that varying the range of expected values (14–30% of NPP28) had little impact on estimates of carrying capacity. A constant value of this respiratory fraction was therefore considered based on mechanistic model simulations28. We employ a basic estimate of bacterial cell maintenance rate of 1.5 gC gCcell⁻¹ y⁻¹ (≈10⁻⁴ gC gCcell⁻¹ h⁻¹) and adjust it according to the local mean annual temperature (MAT)30 to account for different climatic regions. Combining local annual NPP and adjusted cell maintenance rate, we derive estimates of soil bacterial carrying capacity as upper bounds for soil bacterial cell density (Fig. 2a). Despite the many simplifying assumptions, we obtain reasonable estimates of potential soil bacterial carrying capacity that are comparable with observations of realized bacterial cell density across a range of environmental conditions. Model estimates of soil-carrying capacity for three values of MAT are depicted in Fig. 2a (representing the median of three groups: ≤0 °C, 0−15 °C and >15 °C with −2, 9 and 19 °C, respectively). Observed cell densities tend to be higher for colder regions as considered by the HM. We note that soil bacterial cell density is expected to vary with soil depth due to the distribution of organic carbon flux from the soil surface and distribution by plant roots18. Soil bacterial carrying capacity decreases steeply with depth and was represented parametrically by a lognormal distribution (μ = 0.18, σ = 1.00) (Fig. 2b). The lognormal distribution provided a better global representation of the average topsoil carrying capacity (upper 10 cm, Supplementary Fig. 1) over the previously reported exponential model18.
It is important to keep in mind that the estimated soil-carrying capacity was defined independently from bacterial diversity and values were calculated globally based on NPP, MAT and soil depth.

### Modeling bacterial diversity considering climate and soil

The simple HM was developed in two conceptual steps. We first assumed only a single species per aqueous habitat. This approach, although useful as a heuristic, exhibited some limitations for large aqueous habitats under wet conditions (see comparison of species abundance distributions below). We thus adapted the model to allow multiple species in large habitats by assigning the number of species Nsp proportional to the length scale of a habitat of size s (Nsp ~ s^(1/d), where d = 2 or 3 is the dimensionality). Hence, the HM links species richness to the soil aqueous-phase fragmentation via percolation theory and accommodates the possibility of multiple species per habitat. For most unsaturated conditions the refined formulation does not alter the prediction since small habitats are likely to host only a single species. In the following we refer to the multispecies HM if not stated otherwise. We have used median values of global soil-carrying capacity to describe trends in soil bacterial diversity across soil types and climatic regions. Comparisons of model estimates with empirical observations of bacterial richness obtained from the studies of Thompson et al. (EMP)1 and Delgado-Baquerizo et al. (DEL)4 are depicted in Fig. 3 along with the mechanistic predictions by the SIM. We have expressed mean soil hydration status via the climatic water content that is a proxy for average soil wetness and habitat connectivity. Soil and climatic variables were compiled from different sources (Supplementary Table 1) with matched geographical coordinates and soil depths for the samples. We present soil bacterial richness (total number of types) and note that taxonomic assignment was absent for the phylotypes detected in EMP. Bacterial richness was binned by water contents because some hydration conditions were overrepresented (bin width: 0.05). Since richness in the EMP data was measured at different soil depths, the samples were also grouped into topsoil and subsoil (<25 cm and ≥25 cm).
The exact numbers of samples per group are reported in Supplementary Table 2. The EMP data display a tendency of lower values of richness in the subsoil (Fig. 3a). In the DEL dataset, measurements were taken at the same soil depth, and soil pH is reported instead (Fig. 3b). We observe a strong tendency of lower soil pH in climatically wetter soils. The results depict an average decrease in bacterial richness where the soil becomes saturated as also predicted by the HM for median soil-carrying capacity (Fig. 3a, b). The modeled sensitivity to soil-carrying capacity is shown for a scenario of reduced cell densities (e.g. less carbon input to deeper soil layers; Fig. 3a dashed line). We emphasize that parameters were not fitted to observed diversity data, but rather are based on mean values for soil properties (porosity θs = 0.49 and 0.47; sample length L = 5 and 6 mm; textural length δ = 0.07 and 0.1 mm; for EMP and DEL, respectively). Additionally, we used a fixed value for the critical water content (θc ≈ 0.15) and a threshold for the number of cells Ncell needed to model occupancy of potential habitats (Ncell > 4000). Lastly, we compared the aqueous-phase fragmentation-based HM to numerical simulations of the SIM. We simulated the spatially explicit growth and movement of individual cells in a diverse bacterial community on heterogeneous soil pore surfaces. Qualitatively, both HM and SIM predict similar trends of variations in bacterial richness with soil hydration conditions as estimated from the EMP and DEL datasets (Fig. 3a, b). In addition to removing single cells (singletons) from the simulated communities, the modeled species counts were rarefied to 5000 and 1000 for comparison with EMP and DEL, respectively. To compare with the DEL dataset, simulated bacterial richness is reported only for the 512 most abundant species and describes the observed invariance of richness towards low climatic water contents (Fig. 3b).
The discrepancy in water contents where richness peaks (between HM and SIM) is attributed to the dimensionality of the models (three for HM, two for SIM) and is well captured by the percolation-based HM in two dimensions (Supplementary Fig. 2).
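The post-processing of simulated communities described above (rarefaction to a fixed depth and removal of singletons) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy community, the fixed seed, and the order of operations (rarefy first, then drop singletons) are all assumptions made for the example.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Draw `depth` individuals without replacement from a {species: count} table."""
    # Expand the table into one label per individual cell (hypothetical layout).
    pool = [species for species, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the number of individuals")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))

def richness(counts, drop_singletons=True):
    """Count species, optionally ignoring species observed exactly once."""
    minimum = 2 if drop_singletons else 1
    return sum(1 for n in counts.values() if n >= minimum)

# Toy community (made-up numbers), rarefied to 5000 as for the EMP comparison.
community = {"sp_a": 6000, "sp_b": 3000, "sp_c": 900, "sp_d": 1}
sample = rarefy(community, depth=5000)
print(sum(sample.values()), richness(sample))
```

Because rare species are the first to be lost when subsampling, richness obtained this way depends on both the rarefaction depth and the shape of the underlying abundance distribution.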

### Species abundance distribution varies with hydration status

We quantified variations in bacterial species abundance distribution (SAD) with soil attributes and climatic water contents in comparison with empirical estimates from the EMP and DEL datasets (Supplementary Fig. 3). Here we used soil properties and carrying capacity specific for each geographical location and soil depth. The results show good alignment of the single-species model predictions with observed relative SADs and resulted in Pearson correlation values of 0.84 (n = 230) and 0.76 (n = 218) for the EMP and DEL datasets, respectively (Supplementary Fig. 3a, b). Nevertheless, the single-species HM erroneously predicts a higher proportion of the most abundant species than observed. We attribute this systematic overestimation to the stringent assumption of one single species per aqueous (micro-) habitat. This discrepancy suggests that the single species per aqueous habitat assumption may not hold for very large aqueous habitats in wet soil that could host multiple species. To rectify this limitation, we considered a scenario where the number of species Nsp is assumed proportional to the length scale of an aqueous habitat of size s (Nsp ~ s^(1/3)). This relaxed occupancy assumption improved Pearson correlations to values of 0.88 (n = 230) and 0.84 (n = 218) for the EMP and DEL datasets, respectively (Supplementary Fig. 3c, d). Predictions by the HM for ranked SADs compare qualitatively with observations that were grouped by average hydration conditions (Supplementary Fig. 4). An increase in dominance of the most abundant bacterial species is visible in the ranked SADs of both datasets under sufficiently wet conditions (Supplementary Fig. 4b, c).
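To make the relaxed occupancy rule concrete, the sketch below assigns a species count to a habitat of size s using the scaling with exponent 1/d. The unit prefactor, the rounding rule and the function name are assumptions chosen for illustration only; the text specifies the scaling, not these details.

```python
import math

def n_species(size, d=3, prefactor=1.0):
    """Species count for a habitat of `size` cells, scaling as size**(1/d).

    `prefactor` and the floor-based rounding are hypothetical choices.
    """
    if size < 1:
        return 0
    # Small tolerance guards against floating-point results such as 9.999999999999998.
    scaled = prefactor * size ** (1.0 / d) + 1e-9
    # Every occupied habitat hosts at least one species.
    return max(1, math.floor(scaled))

for s in (5, 100, 1000, 10**6):
    print(s, n_species(s, d=3), n_species(s, d=2))
```

Under these assumptions small habitats keep a single species, so the rule only changes predictions for the large, well-connected habitats found in wet soil.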

### Global patterns of soil bacterial habitat diversity

Motivated by the general agreement with observations of bacterial richness and the SADs produced by the HM, we used highly resolved global datasets for soil properties, NPP and precipitation as inputs to estimate global patterns of soil bacterial habitat richness (Fig. 4a). Recall that a central element of the model is the link between the number of distinct aqueous habitats per soil volume and soil bacterial richness. Additionally, we considered the sizes of aqueous habitats to yield spatially resolved distributions of the Shannon index of bacterial diversity patterns (Fig. 4b). We note that the modeled soil bacterial diversity follows constraints imposed by local soil-carrying capacity where high bacterial cell numbers are associated with locally high NPP and low cell maintenance requirements. Both diversity indices exhibit spatial patterns with distinct regions of increased diversity associated with climatic transition zones (e.g., the Sahel). This pattern is more pronounced when considering the Shannon index and suggests that soil bacterial community evenness, indicative of how equally habitats are partitioned, is sensitive to soil wetness. Such an association is also observed empirically where evenness decreases with increasing climatic water contents (Pearson r = −0.17 and −0.43 for EMP and DEL, respectively; Supplementary Fig. 5a).
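The two diversity measures discussed here can be computed from a list of species abundances as sketched below. The Shannon index is standard; evenness is taken here as Pielou's index (Shannon index divided by the logarithm of richness), which is an assumption, since the text does not state which evenness definition was used. The abundance lists are made-up examples.

```python
import math

def shannon_index(abundances):
    """Shannon index H = -sum(p * ln p) over species with non-zero abundance."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

def pielou_evenness(abundances):
    """Pielou evenness H / ln(S); returns 0.0 when fewer than two species are present."""
    richness = sum(1 for a in abundances if a > 0)
    if richness < 2:
        return 0.0
    return shannon_index(abundances) / math.log(richness)

# Fragmented aqueous phase: many small habitats of equal size (hypothetical).
fragmented = [10] * 100
# Connected aqueous phase: one dominant population plus a few small ones (hypothetical).
connected = [910] + [10] * 9

print(pielou_evenness(fragmented), pielou_evenness(connected))
```

In this toy comparison the fragmented case is perfectly even, whereas the dominated case has much lower evenness, mirroring the direction of the trend reported for wetter soils.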

### Disentangling soil bacterial abundance and diversity

To address the challenge of disentangling bacterial abundance and diversity, we compared bacterial community evenness with climatic water content and carrying capacity (Fig. 5). Evenness decreases gradually with climatic water content and with increasing soil-carrying capacity (Fig. 5, Supplementary Fig. 5b). The results are consistent with the tendency of wetter conditions being associated with an increase in cell densities and were confirmed (with no prior assumptions) using detailed mechanistic modeling (SIM) for small spatial and short temporal scales (Supplementary Fig. 6). In the aqueous-phase fragmentation-based HM, predicted bacterial cell densities are independent of climatic water contents. This could result in unrealistic values relative to empirical observations. We therefore used pairs of values for carrying capacity and climatic water contents to constrain the HM for evenness prediction (Fig. 5). Considering the relation between climatic water content and soil-carrying capacity highlights the sensitivity of HM predictions to bacterial cell density as also observed in the mechanistic simulation results of the SIM. The dependency of cell density on climatic water content in the SIM results in a persistent decrease of evenness with increasing water content (Supplementary Fig. 7). When considering paired values of water content and cell densities obtained from the SIM, the simpler HM captures the simulated trends reasonably well (Supplementary Fig. 7). Although beyond the scope of this study, we observed that pre-processing measurements of relative species abundance may affect diversity metrics such as richness and evenness, which alters the apparent tendencies (Supplementary Fig. 8).

## Discussion

The heuristic nature of the aqueous-phase fragmentation-based model (HM) precludes comparison of bacterial richness and abundance on a per sample basis, as climatic assumptions and associated large-scale variables are not likely to apply at a particular sampling location and time. Nonetheless, the proposed HM captures the salient features of global trends in bacterial richness related to climate, biome and soil type. Our estimate of soil bacterial cell density represents an upper bound on soil bacterial abundance (carrying capacity) and shows general agreement with measurements of soil bacterial biomass carbon18. It tracks the temperature dependency of reaction rates30 and provides an independent measure of maximal cell density that is sensitive to climate and organic carbon input by vegetation. Bacterial diversity increases towards lower values of climatic water contents (i.e., with increased aridity15), as long as soil bacterial life is not limited by low organic carbon input. Assuming a constant soil bacterial carrying capacity, we can attribute much of the variations in bacterial richness to the microscale behavior of soil hydration conditions (Fig. 3). Surprisingly, the trends of bacterial richness for both surveys EMP1 and DEL4 were very similar despite their different objectives and processing protocols of the genetic information, namely the use of amplicon sequence variants in EMP and operational taxonomic units in DEL (Fig. 3a, b). We note that the values of bacterial richness in the DEL dataset saturate towards lower values of climatic soil hydration (Fig. 3b). This is likely due to the truncation of species richness used in that study, which focuses on the most abundant soil bacteria4. These highly abundant species might be the last to disappear under reduced carrying capacity and therefore do not show a decline towards dry conditions.
The data available at low climatic water contents are sparse and do not provide support for the predicted steep decline of bacterial diversity as soil becomes dry that was previously reported with increased aridity at large scales15. However, a significant decrease in bacterial richness was also observed in a recent statistical meta-analysis for climatic scales31 and could be confirmed using the SIM (Fig. 3b). Additionally, it has been reported that bacterial diversity declines sharply with moisture in dry soils of Antarctica23 and decreases with soil relative humidity along transects of the Atacama desert32. Microcosm experiments revealed an increase in richness with moisture that peaks at intermediate water contents that promote rare bacterial species33. Similarly, bacterial richness was highest at intermediate climatic water contents where isolated aqueous habitats are numerous and sufficiently well supplied by diffusion to realize the soil-carrying capacity (Fig. 3). This observation is supported by the mechanistic simulation results of the SIM, which explicitly considers the dynamics and spatial structure of the bacterial community (Fig. 3). The generality of the aqueous-phase fragmentation-based approach permits comparison of systems with different dimensionality and can account for the shift of maximal richness towards higher water contents when comparing the HM with the two-dimensional simulation of bacterial life on hydrated surfaces by the SIM (Supplementary Fig. 2).

Increasing the organic carbon input and thus soil bacterial abundance seems to support higher diversity of soil microorganisms15. This is in line with the observation of decreasing bacterial richness with soil depth (Fig. 3a) that is often attributed to diminishing carbon inputs with depth (Fig. 2b). However, considering the various interacting factors at play, the general picture might be more complicated. An increase in soil-carrying capacity may not necessarily translate to increased bacterial diversity as evidenced by declining community evenness (Fig. 5, Supplementary Fig. 5). This could be due to dominance of a few species that may cluster near nutrient hotspots34, or loss of oligotrophic species that would be outcompeted in well-connected and dense communities9. We observe sensitivity of bacterial evenness to climatic water contents (Fig. 5), also in relation to the soil-carrying capacity (Supplementary Fig. 5). However, care should be taken regarding the interpretation of bacterial richness and evenness, since biases introduced by data processing and sampling could depend on the shape of the underlying SAD (Supplementary Fig. 8). Mechanistic models, such as HM and SIM, are valuable tools to investigate such dependencies as illustrated by considering only the most abundant species (Fig. 3b) or increasing sampling effort and removing species present at low abundance (Supplementary Fig. 8a, b, respectively). Nonetheless, an inherent tradeoff between availability of nutrients and protection by spatial isolation appears to play an important role in the establishment and maintenance of high soil bacterial diversity17,31,34. In other words, the relation between bacterial abundance and diversity is only positive when the aqueous phase is fragmented and spatial isolation suppresses the dominance of few species. 
As aqueous microhabitats become connected following soil rewetting by rainfall or irrigation, competition and other trophic interactions between bacterial cells are likely to reduce soil bacterial diversity (Fig. 3a, b) by reducing community evenness (Fig. 5). Many other factors such as pH1,2,14,19, nutrient composition5, carbon sources distribution6,34, stoichiometric constraints14,18 and metabolic dependencies35 shape soil bacterial abundance and diversity and could contribute to the discrepancy between our HM and empirical observations. Our study suggests that some of those factors might be associated with climatic hydration conditions. Interestingly, we find that soil samples exhibiting high bacterial diversity at intermediate climatic water contents coincide with near neutral pH values. In contrast, samples at low and high climatic water contents show high (basic) and low (acidic) pH tendencies, respectively (Fig. 3b). This is supported by studies that relate soil pH with differences in soil water balance at climatological timescales36. We consequently expect soil pH to result from differences between precipitation and evapotranspiration as described by climatic water contents (Supplementary Fig. 9). Teasing apart such confounding associations requires detailed statistical analysis and experimental validation, which are best conducted in dedicated studies.

Using a single parameter set, largely based on standard percolation theory combined with data on soil properties, our HM predicts SADs that closely resemble empirical observations (Supplementary Figs. 3, 4). Nevertheless, the increased aqueous-phase connectedness in climatically wet soils may also promote interactions that are suppressed under spatial isolation of dry conditions23. Processes that support bacterial species coexistence across small distances are not captured by the present model and would result in persistent underestimation of bacterial diversity (unless provisions are introduced as done for very large aqueous habitats—see Supplementary Figs. 2, 3). Another inherent limitation of the analyses presented here is the focus on soil bacteria ignoring the interplay with other soil microorganisms that comprise Earth’s microbiome20. For example, fungi could play an important role in modifying soil bacterial habitats2 and are currently only considered in the partitioning of microbial carbon.

The framework presented in this study captures the salient spatial trends in soil bacterial diversity at climatic timescales and provides insights into effects of habitat fragmentation on the prevalence of bacterial interactions in natural soil. This is particularly important for the interpretation of species co-occurrence and interspecific interactions35. Such interactions between different species become possible only for conditions supported by the soil aqueous-phase connectedness23. This promotes diversity by enabling macroscopic coexistence5,7,24 in soil bacterial communities competing for space and a common resource.

A unique aspect of the HM is the ability to bridge scales from soil pores to biomes where information at both scales is preserved. Further investigations are required to test some of the model implications at different scales. For example, elucidating the dependency of cell microscale distribution on soil type and hydration conditions could provide insights into the processes shaping bacterial interactions in soil. Additionally, taking into account factors affecting the partitioning of carbon at the ecosystem scale could refine model estimates of bacterial abundance beyond potential carrying capacity. Nonetheless, modeling climate and soil-specific bacterial diversity offers a useful reference for comparing effects of climatic shifts (e.g. in temperature, precipitation) or land use change (e.g. in intensity of agricultural management or restoration to natural ecosystems) on soil bacterial communities that could guide future exploration of the soil bacterial micro- and macrogeography.

## Methods

In the following, we provide a detailed overview of the methods used in the study and list key assumptions. Although the HM uses a yearly timescale for climatic averaging, the framework could be applied to finer and more resolved datasets. The global predictions of soil bacterial diversity were based on a 0.1° × 0.1° grid to harmonize raster layers. For a description of data sources, see Supplementary Table 1. Variables added to the datasets of point measurements are taken at the native, highest spatial resolution of the respective property. Where necessary and not explicitly stated, missing values were imputed using the mean value of the corresponding variable.

### Soil bacterial carrying capacity derived from NPP

The flux of carbon into the soil is taken from the MODIS NPP dataset37. We have used mean annual values (2000–2015). Missing values (e.g. desert) were imputed with values obtained from the Miami model38 using parameters fitted to the nonmissing values of MODIS NPP. Only an average fraction (ϵ = 0.24) of the total NPP entering the soil column is available for bacterial respiration28,29. The vertical distribution of microbial carbon in the soil column follows the distribution of plant roots18. This allowed us to impose the depth z at which most of the carbon is released by integrating over the sampled interval dz and calculating the fraction of NPP available for bacteria at a particular depth ($$\mathrm{NPP}_{\mathrm{b},z} = \epsilon \frac{\mathrm{NPP}}{d_{\mathrm{soil}}} F_z = \epsilon \frac{\mathrm{NPP}}{d_{\mathrm{soil}}} \int f(z)\,\mathrm{d}z$$). The factor Fz denotes the fraction of carbon available at a particular depth and is described by f(z) for the entire depth of the soil profile considered (dsoil = 1 m). Assuming no net growth of the bacterial community so that only energy requirements for maintenance metabolism are satisfied permits computation of maximal bacterial cell density ρcell (m⁻³). This soil-carrying capacity supported by the input flux of carbon is calculated using Eq. (1).

$$\rho_{\mathrm{cell}}(z,T) = \frac{\mathrm{NPP}_{\mathrm{b},z}}{f_T\, m\, M_{\mathrm{c}}} \qquad (1)$$

Using a constant mass of carbon per cell Mc and by fitting maintenance rate m, we calculated the bacterial cell density ρcell. Temperature dependency was implemented as a factor fT based on the Schoolfield model30 using mean annual temperature (MAT) from the WorldClim dataset39.
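As a rough numerical illustration of Eq. (1), the sketch below evaluates the carrying capacity from an NPP value. The constants ϵ, dsoil, m and Mc are the values quoted in the text; the NPP unit (gC m⁻² y⁻¹), the example NPP of 500, and the treatment of the temperature factor fT and depth fraction Fz as plain numbers supplied by the caller are assumptions. The Schoolfield temperature model itself is not implemented here.

```python
EPSILON = 0.24       # fraction of NPP available for bacterial respiration
D_SOIL = 1.0         # m, depth of the soil profile considered
MAINTENANCE = 1.5    # gC per gC of cell per year, fitted maintenance rate m
M_C = 8.6e-14        # gC per cell

def carrying_capacity(npp, f_z, f_t=1.0):
    """Maximal cell density (cells per m^3) following Eq. (1).

    npp : net primary productivity, assumed to be in gC m^-2 y^-1
    f_z : fraction of the carbon input reaching the sampled depth interval
    f_t : temperature factor scaling the maintenance rate (1.0 = reference)
    """
    npp_bz = EPSILON * npp / D_SOIL * f_z          # carbon flux available to bacteria
    return npp_bz / (f_t * MAINTENANCE * M_C)      # cells sustained at maintenance

# Hypothetical site: NPP of 500 gC m^-2 y^-1, all carbon assigned to the layer.
rho = carrying_capacity(npp=500.0, f_z=1.0)
print(f"{rho:.3e} cells per m^3")
```

In this form a larger fT (higher maintenance demand) lowers the carrying capacity in direct proportion, which is how the temperature dependence enters the estimate.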

### Soil bacterial abundance dataset

Xu et al. (XU)18 compiled a dataset for the abundance of soil carbon associated with microbial biomass. This was used here as a reference for bacterial abundance for a range of geographical locations. We considered the relation between the soil’s carbon-to-nitrogen (C:N) ratio and the proportion of bacterial biomass to total microbial biomass40. Total microbial biomass carbon contains mainly fungal and bacterial carbon (Cmic ≈ CF + CB). A piecewise linear function was used to describe the ratio of fungal to bacterial carbon (RFB = CF/CB) with varying C:N ratio of the soil organic matter. This ratio was taken as a constant below C:N = 18.4 (RFB = 5, see ref. 28) and increases with a slope of 0.5 above said value40. From RFB the relative proportion of bacterial biomass fB was calculated (fB = 1 / (RFB + 1)). A carbon content per cell41 of Mc = 8.6 × 10⁻¹⁴ gC was used in all conversions of soil bacterial biomass and for the estimation of soil-carrying capacity. To determine the decay of carbon input in the soil profile, f(z), we first averaged the bacterial biomass per soil depth. Averaging was necessary to avoid putting more weight on more frequently sampled depths. Values were integrated from the soil surface to the maximum depth of 2 m. This cumulated bacterial biomass was normalized by its total sum to obtain the cumulative fraction of biomass with soil depth. For parameter estimation, we fit the cumulative lognormal distribution to the cumulative fraction of bacterial biomass yielding μ = 0.18 and σ = 1.00 for parametrization of Fz. We chose a lognormal distribution as it gave a better fit to the vertical distribution of measured bacterial biomass than the previously used exponential model (Supplementary Fig. 1). The global maintenance rate was subsequently estimated by fitting Eq. (1) for the soil-carrying capacity to measurements of soil bacterial biomass carbon18 using inputs of local NPPb,z and MAT.
The optimization yielded a maintenance rate of m = 1.5 gC gCcell⁻¹ y⁻¹.
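The two conversions described in this subsection can be sketched as follows: the bacterial share of microbial biomass from the C:N ratio, and the cumulative lognormal depth profile. Several details are assumptions made for illustration: the piecewise function is taken to be continuous at the breakpoint, the depth unit expected by the fitted parameters is not stated in the text (so `z` must be supplied in whatever unit the fit used), and the function names are hypothetical.

```python
import math

def fungal_to_bacterial_ratio(cn, cn_break=18.4, r_low=5.0, slope=0.5):
    """Piecewise-linear R_FB: constant below the breakpoint, rising linearly above it."""
    if cn <= cn_break:
        return r_low
    return r_low + slope * (cn - cn_break)   # continuity at cn_break is assumed

def bacterial_fraction(cn):
    """f_B = 1 / (R_FB + 1), the bacterial share of microbial biomass carbon."""
    return 1.0 / (fungal_to_bacterial_ratio(cn) + 1.0)

def lognormal_cdf(z, mu=0.18, sigma=1.00):
    """Cumulative lognormal distribution with the fitted parameters from the text."""
    if z <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(z) - mu) / (sigma * math.sqrt(2.0))))

def depth_fraction(z_top, z_bottom):
    """Share of the profile total falling between two depths (same unit as the fit)."""
    return lognormal_cdf(z_bottom) - lognormal_cdf(z_top)

print(bacterial_fraction(10.0), bacterial_fraction(30.0))
print(depth_fraction(0.0, 1.0))
```

With these definitions, soils with C:N ratios above the breakpoint are assigned a progressively smaller bacterial share of the measured microbial carbon.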

### Soil bacterial diversity datasets

Two datasets of bacterial species/phylotype abundances based on 16S rRNA sequencing were employed in this study. Data from the Earth Microbiome Project as published by Thompson et al. (EMP)1 and data collected by Delgado-Baquerizo et al. (DEL)4 were used to estimate bacterial diversity. Diversity was calculated on the data “as provided” using the procedures outlined below, except that some samples in the EMP dataset had to be removed due to misclassification or unsuitable conditions. The following procedure was applied to filter the EMP data based on metadata: Samples labeled as “Soil(non-saline)” were selected if the environmental material was either “soil” or “bulk soil”. We then removed samples containing the features “oil contaminated soil” or “extreme high temperature habitat”. Tables of sampled abundances of phylotypes were then used as published (90 bp QC-filtered and rarefied to 5000 for EMP (n = 2871) and the top 511 phylotypes after taxonomic assignment for DEL (n = 237)). Variables relevant to soil and climate were added according to reported geographical coordinates and soil depth, resulting in 484 and 218 sites for EMP and DEL, respectively. The mass of soil is taken from the extraction protocol used in the studies: 0.25 g of soil for DEL and an average of 0.175 g for EMP.

### Estimating soil-specific “climatic” water content

A metric for the average hydration conditions relies on the estimation of a representative value of water content based on rainfall patterns. We use a simplified approach in which the periods during which soil drains or dries following a rain event are calculated. We apply a threshold to the precipitation time series to remove small wetting events that immediately evaporate and estimate the time between rain events. The average duration between events is the characteristic dry-down period for a given geographical location. During this time, water mass is lost at a constant rate determined by (mean daily) potential evapotranspiration (PET), resulting in an exponential reduction of the average water content within the considered soil profile (dsoil = 1 m). We assume for simplicity that a daily temporal resolution is compatible with the cessation of internal drainage of most soils. Hence, climatic soil water content does not exceed field capacity (a stable water content after internal drainage becomes negligible). For simplicity, we define the volumetric field capacity θFC (Vwater/Vsoil in m3 m−3) as half of the porosity θs (Vvoid/Vsoil in m3 m−3). The latter is obtained using an empirical (pedo-transfer) function42 that relates commonly measured soil properties (sand, silt and clay contents and bulk density43) to soil porosity. The MSWEP44 precipitation records of 37 years (1979–2016) are used to derive average rainfall quantities per wetting−drying cycle. The spatial resolution of the precipitation data is roughly 11 km at the equator and the temporal resolution is sub-daily (3-hourly). The data are downsampled to daily resolution, as the dynamics of soil wetting and drying relevant for the bacterial habitat are expected to be within this timescale. Further, the precipitation time series is subjected to a threshold taken from estimates of PET45 based on temperature and radiation39 to identify wetting events. The run lengths between wetting events are measured and averaged across wetting cycles. The key result of the analysis is the mean time interval between rainfall events τ (an ensemble average) for every location. This quantity, combined with daily PET (m d−1), was used to deduce the climatic water content θτ (Vwater/Vsoil in m3 m−3) according to Eq. (2).

$$\theta _\tau = \theta _{\mathrm{FC}}\,{\mathrm{e}}^{ - \alpha \langle \tau \rangle } \quad {\mathrm{with}} \quad \alpha = \frac{{\mathrm{PET}}}{{d_{\mathrm{soil}}\,\theta _{\mathrm{FC}}}}.$$
(2)

The significance of θτ is that it combines rainfall patterns, PET, and soil properties over climatic timescales and provides a measure of the average hydration conditions experienced by soil bacteria in a particular geographical location (Supplementary Fig. 9).
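
A minimal Python sketch of this calculation is given below. It assumes a daily precipitation series and a single wetting threshold in the same units, and it counts only dry runs that are bounded by wetting events on both sides; these choices and the function names are illustrative assumptions, not the original implementation.

```python
import math


def mean_dry_interval(precip, threshold):
    """Mean run length (days) between wetting events.

    A day is a wetting event if its precipitation exceeds `threshold`.
    Assumption: dry runs before the first or after the last event are ignored.
    """
    runs, current, seen_wet = [], 0, False
    for p in precip:
        if p > threshold:
            if seen_wet and current > 0:
                runs.append(current)
            seen_wet, current = True, 0
        elif seen_wet:
            current += 1
    return sum(runs) / len(runs) if runs else 0.0


def climatic_water_content(porosity, tau_days, pet, d_soil=1.0):
    """Eq. (2): theta_tau = theta_FC * exp(-alpha * tau), alpha = PET / (d_soil * theta_FC).

    theta_FC is taken as half of the porosity; pet in m per day, d_soil in m.
    """
    theta_fc = 0.5 * porosity
    alpha = pet / (d_soil * theta_fc)
    return theta_fc * math.exp(-alpha * tau_days)


if __name__ == "__main__":
    daily_precip_mm = [5, 0, 0, 0, 6, 0, 7, 0, 0]  # invented example series
    tau = mean_dry_interval(daily_precip_mm, threshold=1.0)
    print(tau, climatic_water_content(porosity=0.5, tau_days=tau, pet=0.003))
```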

### Estimation of aqueous habitat size distribution

We estimated the size distribution of distinct aqueous habitats based on soil properties and hydration conditions (e.g., climatic water content). Soil water content was treated as the aqueous-phase occupancy probability p (the probability of finding a water-filled pore or roughness element), which, in turn, enabled the application of standard percolation theory to represent the characteristics of aqueous bacterial habitats (sizes and numbers). We considered the soil as a three-dimensional lattice (two-dimensional (2D) for comparison with the SIM) with a critical occupancy probability and universal exponents that determine the number of (aqueous) patches and their sizes46. The critical percolation threshold pc was multiplied by the soil void fraction (or saturated water content θs) to account for soil porosity47. The critical water content is thus defined by Eq. (3) and can be expressed as a critical saturation Sc (Eq. (4)) to remove the dependency on θs.

$$\theta _c = \theta _{\mathrm{s}}p_{\mathrm{c}},$$
(3)
$$S_{\mathrm{c}} = \frac{{\theta _{\mathrm{c}}}}{{\theta _{\mathrm{s}}}} = p_{\mathrm{c}}.$$
(4)

The size distribution of aqueous patches ns(p) was assumed to follow the general proportionalities of percolation theory (Eqs. (5)–(7))46:

$$n_{\mathrm{s}}\left( p \right)\sim s^{ - \tau }e^{ - \frac{s}{{s_\xi }}},$$
(5)
$$s_\xi \sim \left| {p_{\mathrm{c}} - p} \right|^{ - \frac{1}{\sigma }},$$
(6)
$$P^\infty \sim \left( {p - p_{\mathrm{c}}} \right)^\beta.$$
(7)

Here, s is the patch size (number of sites/pores) for s ≫ 1, with Fisher exponent τ ≈ 2.18 (2D: τ = 187/91), cutoff exponent σ ≈ 0.45 (2D: σ = 36/91) and cutoff size sξ46. P∞ is the fraction of the domain occupied by a spanning (algebraically infinite) patch with exponent β ≈ 0.41 (2D: β = 5/36). The patch sizes follow a power law distribution at p = pc. Away from this critical point, patches of size s > sξ become exponentially scarce: patches shrink with decreasing water content (p < pc) or merge and grow when approaching saturation (p > pc). Although the prediction is strictly valid only for p close to pc, we assume such relations to hold for the range of conditions considered. The occupancy probability p was thus substituted with the climatic water content θτ, and pc with a critical water content θc ≈ 0.15 that corresponds to a simple cubic lattice with porosity θs ≈ 0.5 (triangular lattice in 2D; θc ≈ 0.25).
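
The scaling relations of Eqs. (5) and (6) can be evaluated numerically as in the following sketch. Because the relations are proportionalities, all pre-factors are set to one here, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math

# Universal exponents quoted in the text for three and two dimensions
EXPONENTS = {
    3: {"tau": 2.18, "sigma": 0.45},
    2: {"tau": 187 / 91, "sigma": 36 / 91},
}


def cutoff_size(theta, theta_c, sigma):
    """Eq. (6) with unit pre-factor: s_xi = |theta_c - theta|^(-1/sigma)."""
    gap = abs(theta_c - theta)
    return math.inf if gap == 0 else gap ** (-1.0 / sigma)


def patch_size_density(s, theta, theta_c=0.15, d=3):
    """Eq. (5) with unit pre-factor: n_s = s^(-tau) * exp(-s / s_xi)."""
    exps = EXPONENTS[d]
    s_xi = cutoff_size(theta, theta_c, exps["sigma"])
    return s ** (-exps["tau"]) * math.exp(-s / s_xi)


if __name__ == "__main__":
    # Pure power law at the critical water content, truncated away from it
    for theta in (0.05, 0.15, 0.30):
        print(theta, patch_size_density(100, theta))
```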

To account for different soil types, a characteristic length scale δ was estimated based on the geometric mean diameter of soil particles48. This length scale was used for normalization of the aqueous patch size distribution in the range of water contents and patch sizes relevant for bacterial life. In addition to the soil-type length scale δ, the system size L (soil domain or sample size) was considered; here, L was obtained from the mass of soil sampled msoil and the bulk density ρsoil specific to the soil type (Eq. (8)). The total number of candidate sites N0 in the sampled soil was then determined from simple geometry considering the dimensionality d = 2 or 3 (Eq. (9)).

$$L = \left( {\frac{{m_{{\mathrm{soil}}}}}{{\rho _{{\mathrm{soil}}}}}} \right)^{\frac{1}{d}},$$
(8)
$$N_0 = \frac{{L^d}}{{\delta ^d}}$$
(9)

We approximated the behavior of the percolation transition using a bounded logistic curve that provides a smooth function $$\hat P^\infty$$ of the water content θ:

$$\hat P^\infty = \frac{\theta }{{1 + {\mathrm{e}}^{ - k\left( {\theta - \theta _{\mathrm{c}}} \right)}}},$$
(10)

where k describes the “sharpness” of the transition (k = 16 for all calculations). The total size of aqueous clusters or potential habitats Ns was normalized as follows:

$$N_s^0(\theta ,N_0) = \frac{{\theta - \hat P^\infty }}{{\sum\nolimits_{s = 1}^{N_0} s\,n_s\left( \theta \right)}},$$
(11)
$$N_s(\theta ,s) = N_s^0\left( {\theta ,N_0} \right)s\,n_s\left( \theta \right).$$
(12)

The pre-factor Ns0 thus ensures that the total volume of aqueous patches conserves the volume of soil water at a given state of hydration. For practical reasons, subsequent calculations of aqueous patches proceeded by removing the largest patch after normalization (this large patch biases the counting of habitats in a sample).
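
The normalization of Eqs. (8)–(12) may be sketched as follows. The patch size distribution is passed in as a generic callable `n_s` (a hypothetical interface), and the example distribution with a fixed cutoff is an arbitrary placeholder rather than a fitted quantity.

```python
import math


def candidate_sites(m_soil, rho_soil, delta, d=3):
    """Eqs. (8)-(9): N0 = L^d / delta^d with L = (m_soil / rho_soil)^(1/d)."""
    length = (m_soil / rho_soil) ** (1.0 / d)
    return round((length / delta) ** d)


def spanning_fraction(theta, theta_c=0.15, k=16.0):
    """Eq. (10): bounded logistic approximation of the spanning patch fraction."""
    return theta / (1.0 + math.exp(-k * (theta - theta_c)))


def habitat_sizes(theta, n0, n_s, theta_c=0.15, k=16.0):
    """Eqs. (11)-(12): normalized N_s for patch sizes s = 1..n0.

    The returned values sum to theta minus the spanning fraction, so that
    the water held in finite patches is conserved.
    """
    weights = [s * n_s(s) for s in range(1, n0 + 1)]
    prefactor = (theta - spanning_fraction(theta, theta_c, k)) / sum(weights)
    return [prefactor * w for w in weights]


if __name__ == "__main__":
    # Placeholder distribution: power law with an arbitrary cutoff of 50 sites
    def n_s(s):
        return s ** -2.18 * math.exp(-s / 50.0)

    sizes = habitat_sizes(0.10, 1000, n_s)
    print(sum(sizes), 0.10 - spanning_fraction(0.10))
```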

### Calculation of bacterial species diversity

The distribution of aqueous patches derived from percolation theory and their properties defined the degree of spatial isolation and restricted the number of potential habitats. Both aspects were expected to alter the bacterial diversity patterns observed in natural soils. The estimated aqueous patch sizes and their prevalence defined the distribution of bacterial habitats. Together with the cell density at carrying capacity ρcell, this allowed us to estimate the number of cells within a single (habitat) size class s (Eq. (13)).

$$N_{{\mathrm{cell}},s} = \rho _{{\mathrm{cell}}}\,s\,\delta ^d$$
(13)

Aqueous patches with cell counts below a prescribed threshold (or limit of detection, Ncell < 4000 for comparisons with empirical data) were removed from the total number of potential habitats Ns. Conceptually, this can be interpreted as reflecting the discrete nature of bacterial cells, which limits counts to integers greater than one. Empirically, there exists a lower limit of detection and a minimal number of cells from a single species (1) is needed to contribute to the measurement of bacterial richness. Initially, we assumed that only a single species occupies a patch by outcompeting possible coinhabitants. Thus, the modeled species abundance distribution (SAD) follows the distribution of aqueous habitats, with abundances bounded by the carrying capacity within a defined volume of soil. Subsequently, we introduced the possibility of multiple species occupying large aqueous patches (in proportion to their size and dimension; Nsp ∼ s^(1/d), d = 2 or 3) to correct for the model bias of overpredicting the dominant species. The exponent (1/d) suggests that the number of species per habitat grows with the average distance between any two points selected randomly within a single habitat of size s. The limit of detection was not used for the comparison of SADs, as the total number of habitats was truncated to the number of observed species.
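
As an illustration of Eq. (13) and the detection limit, the sketch below computes cell numbers per patch size and discards patches below the threshold. The unit pre-factor and the rounding in `species_in_patch` are assumptions (the text gives only the proportionality), and all function names are hypothetical.

```python
def cells_in_patch(s, rho_cell, delta, d=3):
    """Eq. (13): N_cell,s = rho_cell * s * delta^d."""
    return rho_cell * s * delta ** d


def detectable_sizes(sizes, rho_cell, delta, d=3, min_cells=4000):
    """Keep patch sizes whose cell count reaches the limit of detection."""
    return [s for s in sizes if cells_in_patch(s, rho_cell, delta, d) >= min_cells]


def species_in_patch(s, d=3):
    """Number of species per patch, N_sp ~ s^(1/d).

    Assumption: unit pre-factor, rounded to an integer and at least one.
    """
    return max(1, round(s ** (1.0 / d)))


if __name__ == "__main__":
    # Invented values: 1e15 cells per m^3 and a 10 micrometre length scale
    sizes = [10, 3000, 5000, 10000]
    print(detectable_sizes(sizes, rho_cell=1e15, delta=1e-5))
    print([species_in_patch(s) for s in sizes])
```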

Bacterial diversity was calculated in the general form49 for all SADs (modeled and data):

$$^qD = \left( {\mathop {\sum}\limits_{i = 1}^{{\mathrm{SR}}} {p_i^q} } \right)^{1/(1 - q)}$$
(14)

Here, pi is the relative abundance of species i and SR is the species richness. For q = 0, the equation corresponds to the reciprocal of the weighted harmonic mean of the pi and equals the actual number of types (SR). The equation is not defined for q = 1; its limiting form is the exponential of the well-known Shannon index H (Eq. (15)). Evenness E1,0 was calculated as defined by Eq. (16)49.

$${\mathrm{lim}}_{q \to 1} {\, }^{q}D = {\, }^{1}D = \exp \left( H \right) = {\mathrm{exp}}\left( { - \mathop {\sum}\limits_{i = 1}^{{\mathrm{SR}}} {p_i{\mathrm{ln}}(p_i)} } \right)$$
(15)
$$E_{1,0} = \frac{{\;}^{1}D}{{\;}^{0}D}.$$
(16)
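
Equations (14)–(16) translate directly into a few lines of Python; the sketch below is a minimal illustration using invented abundance vectors, with hypothetical function names.

```python
import math


def hill_number(abundances, q):
    """Diversity of order q, Eq. (14); the limit q -> 1 uses Eq. (15)."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        shannon = -sum(pi * math.log(pi) for pi in p)
        return math.exp(shannon)
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


def evenness(abundances):
    """Eq. (16): E_{1,0} = 1D / 0D."""
    return hill_number(abundances, 1) / hill_number(abundances, 0)


if __name__ == "__main__":
    even = [10, 10, 10, 10]   # perfectly even community
    skewed = [97, 1, 1, 1]    # one dominant species
    print(hill_number(even, 0), evenness(even))
    print(hill_number(skewed, 0), evenness(skewed))
```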

### Spatially explicit individual-based model (SIM)

An individual-based approach was previously developed to model the growth of diverse bacterial species on heterogeneous soil surfaces7,25 and was adopted for the current study. The spatial domain was represented by a hexagonal grid with periodic boundary conditions (length L = 1 mm; area of a grid cell Ahex = 100 μm2; and porosity θs = 0.49). Grid cells consisted of water-holding elements with volumes drawn from a random uniform distribution (unif) with a maximal size equal to the spacing of the grid (dx = 1.1 × 10−5 m). The modeled domain thereby represents a slab of the soil pore space with a defined volume (Vsoil = L^2 dx). The bulk water content was prescribed to the domain as a control parameter and spatially distributed relative to the sizes of grid elements while conserving the total volume of water (Vwater = ∑ Vwater,x,y). Based on the local volume of water, an average water film thickness h was calculated (hwater,x,y = Vwater,x,y/Ahex). The heterogeneity of the water film thickness modified the mass transfer between grid cells by changing the cross-sectional area that contributed to the diffusive flux. Diffusion was solved using an implicit finite-difference method with bacterial consumption represented as a sink term. The diffusivity was taken for a small molecule that is readily available for bacterial consumption (e.g. glucose) and did not vary spatially (D = 6.7 × 10−10 m2 s−1). The simulation period corresponded to 8 days at a 1-min time step. The initial concentration of nutrients was constant in space and was randomly replenished to the initial concentration over time to mimic a fluctuating environment. The arrival of nutrient pulses was modeled as a Poisson process with an average rate of one arrival every 4 h. The initial nutrient concentration was set to provide enough carbon to sustain a fixed cell density (10^17 m−3, corresponding to a high carrying capacity) and was distributed evenly among nutrient pulses.
The mass of nutrients locally available for bacterial consumption depended on the volume of water in a grid cell. All simulated bacteria were represented as elongating cylindrical capsules that consume a common carbon source dissolved in the aqueous phase. Multiple species i were prescribed in the model by varying the Monod parameters (maximum growth rate μmax,i and half-saturation constant Ki; in addition, the maintenance rate was set to mi = 0.01 μmax,i). Species-specific parameters were randomly selected from uniform distributions of the Monod parameters (μmax ~ unif(10−4 h−1, 1.14 h−1), K ~ unif(6.8 g m−3, 680 g m−3)). All other parameters were held constant (mass of the cell mcell = 9.5 × 10−13 g, mass at division mdiv = 2 mcell, yield Y = 0.5, cell radius rcell = 0.5 μm). A single cell of each species was inoculated randomly on the domain at the beginning of the simulation (species richness SR at t = 0, SRt0 = 4096). Individual cells grew and divided along their axis with a slight asymmetry in mass to avoid complete synchrony (fm ~ unif(0, 0.05), mcell,1 = fm mdiv and mcell,2 = (1 − fm) mdiv). All bacterial cells were subject to active and passive motion and could move continuously in the domain. Growth-induced shoving represents the passive motion and was implemented by displacing cells relative to their nearest neighbors (only considering the capsule geometry as n-spheres; no forces, e.g. capillary, friction, elastic, electrostatic, etc.). Shoving was not resolved to full relaxation due to the size of the domain, the number of cells and the scale of interest (a compromise between reduced computational demand and precision of the resulting spatial distributions).
However, we implemented a simple rule to prevent local crowding: if the projected area of bacterial cells in a grid cell exceeded the area of the grid cell (Ahex), bacterial cells were randomly picked and moved to form a second layer (piling cells in the z-direction) from which they could “drop” down again once space became available. Bacterial swimming motility was permitted where the aqueous phase was connected and the water film thickness exceeded the cell diameter26. Cells aligned their motility trajectories along gradients of the nutrient field, whereas their velocity was modified by the water film thickness26 and nutrient concentration50. Additionally, each velocity component (vx, vy) was independently multiplied by a random factor to allow for individual trajectories (fv ~ unif(0, 2)). Integrating along the projected trajectory of each cell enabled consideration of varying water film thickness and prevented cells with high instantaneous velocity from “jumping” across grid cells. At the end of the simulation, the total number of cells and the number of cells per species were measured. To enable comparison of richness estimates from varying sample sizes (e.g. with observed species richness or simulations with different cell densities), total cell numbers were rarefied to 5000 and 1000 counts, to compare with EMP and DEL, respectively. For comparison with the DEL dataset, only the top 512 most abundant species were considered. Singletons, i.e. species represented by a single cell after rarefying, were removed from the counts. The rarefaction procedure was averaged across 15 trials to increase the robustness of the diversity estimates. Community evenness alone was additionally estimated without rarefaction and removal of singletons, as these steps affected the apparent community structure (Supplementary Fig. 8).
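
The rarefaction step can be illustrated with the following sketch, which subsamples cells without replacement to a fixed depth, removes singletons and averages the resulting richness over repeated trials. Sampling without replacement, the fixed random seed and the function name are assumptions made for illustration; the text specifies only the depths, the singleton removal and the 15 trials.

```python
import random
from collections import Counter


def rarefied_richness(counts, depth, trials=15, seed=0):
    """Mean species richness after rarefying to `depth` cells.

    `counts` lists the number of cells per species. Species sampled only
    once in a trial (singletons) are not counted.
    """
    rng = random.Random(seed)
    # One entry per cell, labeled by its species index
    pool = [species for species, n in enumerate(counts) for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the total number of cells")
    richness = []
    for _ in range(trials):
        sample = Counter(rng.sample(pool, depth))
        richness.append(sum(1 for n in sample.values() if n > 1))
    return sum(richness) / trials


if __name__ == "__main__":
    # Invented community: a few dominant species and many rare ones
    community = [5000, 2000, 500] + [5] * 200
    print(rarefied_richness(community, depth=1000))
```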

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.