Soil bacterial diversity mediated by microscale aqueous-phase processes across biomes

Soil bacterial diversity varies across biomes with potential impacts on soil ecological functioning. Here, we incorporate key factors that affect soil bacterial abundance and diversity across spatial scales into a mechanistic modeling framework considering soil type, carbon inputs and climate towards predicting soil bacterial diversity. The soil aqueous-phase content and connectivity exert strong influence on bacterial diversity for each soil type and rainfall pattern. Biome-specific carbon inputs deduced from net primary productivity provide constraints on soil bacterial abundance independent from diversity. The proposed heuristic model captures observed global trends of bacterial diversity in good agreement with predictions by an individual-based mechanistic model. Bacterial diversity is highest at intermediate water contents where the aqueous phase forms numerous disconnected habitats and soil carrying capacity determines level of occupancy. The framework delineates global soil bacterial diversity hotspots; located mainly in climatic transition zones that are sensitive to potential climate and land use changes.

In Fig. 1 the biodiversity is not influenced by the water content, which is in contradiction to what the authors claim. Fig. 2: I feel it is very hard to tell from that figure if the model is in line with the data. I feel many functions would do the job and there is no discussion of how well the model fits, like correlation or test of alternative models. Line 78: What happens if you use a high value for carbon per cell? Why did you choose to use a low value?

Methods:
Formula [1] to what is m fitted?
You treat the bacterial density as a function of depth? Where do the authors get fz from? In the supplement the authors state that the source of organic carbon is uniformly distributed. How does this fit together?
Line 252: Shouldn't PET be something like volume per time? It's given as m/d? What is the unit of the field capacity?
Supplement line 624: By selecting appropriate maintenance rates, we may obtain an upper bound for cell density that vary among geographic locations (integrating NPP and temperature). I assume that's not an integration in a mathematical sense but means that both are taken into account?
Supplement 627: To relax this assumption, we have considered a prescribed fraction (25%) of the NPP dedicated to soil bacteria. Isn't the ratio of bacteria and fungi strongly varying, e.g. with the pH of the soil?
Supplement line 634: Additionally, we cannot ignore the dependency of NPP on precipitation and do not attempt to treat derived carrying capacity as an independent variable but rather as a location specific property. I don't understand that sentence.
Supplement 654: Soil microbial abundance declines rapidly with depth due to a commensurate decrease in nutrient availability as determined by plant litter and root exudates. Didn't you say earlier that you assume homogeneous input of carbon across the depth of 1 m?
Figure S1: SAD is a curve and D a number, so how can D be proportional to SAD?
How does NPP influence diversity? I see how it influences total abundance but not necessarily diversity. Is it because at low NPP carrying capacity is low and thus some micro-habitats are left empty?
Extended figure 1: What are units of axes? What were parameters of models? Was there any fitting done? For example what is factor for proportionality between species abundance and size of microhabitat? How was it obtained?
Reviewer #2: Remarks to the Author: Biogeography studies on soil microbial community are developing fast and providing many exciting results. Yet, they remain somehow disappointing because regarding soils characteristics, only chemical characteristics are considered in those studies, such as pH, available nutrients, C, C/N... It is however well established from local scale studies that the abundance, diversity and activity of soil microbial communities are also largely influenced by physical characteristics of their habitat, which are spatially very heterogeneous at the microscale. Indeed, as written by the authors, it was demonstrated that the hydration states of soil aqueous microhabitats, their size distribution and connectedness shape bacterial abundance and diversity. In this context, the work presented in this manuscript is highly welcome and relevant. Stating that there is only anecdotal evidence of a link between soil carbon and soil biodiversity is exaggerated and not necessary to justify the work developed here (soil bacterial diversity does increase with the organic matter content, as shown by biogeography studies from the local (e.g. Siciliano et al. 2015), the regional (e, g. Maestre  The authors propose to explain the biogeography of soil microorganisms by trophic conditions and by microscale aqueous habitats. For this they combine in each point of continents an estimate of the trophic carrying capacity for microbes (defined as a certain proportion of the NPP), with a model describing local microbial habitat conditions (number of aqueous microhabitats and their size distribution), to predict the abundance and diversity of soil bacterial and compare these predictions with published data on abundance and diversity of soil bacteria. As a conclusion, most of the variations in soil bacteria diversity are ascribed to the microscale hydration conditions. The approach is very attractive. The work is with no doubt novel and very creative. 
It is extremely exciting to try to bridge microscale and macro scales microbial biogeography. Yet the manuscript does not convince me, because it goes too far too fast. As a result, it is also very frustrating for the reader. Describing the trophic conditions and microscale physical conditions that bacteria experience in soils of the whole planet with available data worldwide at the targeted resolution (0.1x0.1°) requires a number of simplifications. Among them it is stated that the trophic resources for bacteria are 25% of NPP whatever the ecosystem, that field capacity is half of soil total porosity in all soils of the world, considered that only one bacterial "species" stand in a given wet microhabitat, etc. Why not after all, if it is to develop the proof of concept… However, it becomes very difficult to follow the results with the many successive assumptions and no discussion of their limits and consequences step by step and not presentation of intermediate results. The reader cannot appreciate the consequences of the many assumptions. It is not satisfying that the approach is developed at such a large scale that no validation is possible, while it should be possible developing it locally, considering e.g. 3-5 sites with contrasted soils, well documented for all the variables needed here, the variables being measured, not modelled, before going global. Then for example it would be possible to test the hypotheses considered to be verified on the global dataset (e.g. "a compensating effect of enhanced carrying capacity that allows higher richness despite an increase in habitat connectedness", l. 168). In addition, only complex final variables are presented in figures and there are no "intermediate" results presented e.g. on the number of aqueous microhabitats and their size distribution depending on soil type and moisture conditions for contrasted soil and climate conditions. 
I am not a soil physicist and cannot evaluate the model presented here, but would have liked intermediate results to be presented and discussed.
I found it difficult along the manuscript to differentiate results, from discussion and to follow precisely the discussion (e.g. section lines 160-183), especially as the literature is treated in a very general way, not accounting if the results cited were obtained from a microcosm model study, or from a plot or region scale. In conclusion, I find that this work is novel and important. It is important enough to strengthen, validate, discuss it stepwise, e.g. on a number of case studies at the plot scale before going global, and I would suggest to do so in a newly submitted version.
Reviewer #3: Remarks to the Author: The manuscript by Bickel and Or describes an interesting theoretical, modeling-based study of soil bacterial biodiversity patterns and provides an interesting perspective in which micro-scale habitat connectivity and carrying capacity are combined. The work is laudable in that it attempts to link micro-scale patterns to bacterial diversity at the biome level. The authors develop models of bacterial abundance and diversity patterns and examine the accuracy of their models using two large-scale datasets of bacterial microbiome diversity.
In general, I think the work has potential value, but also suffers from some important drawbacks in my opinion. Perhaps most importantly, I found it difficult to decipher out the clear take-home messages that the authors wish to put forth. This is for instance an issue in the abstract: from the abstract it is difficult to determine what was actually done in the study and which conclusions are most important. Also, there are some parts of the text that remain somewhat vague (I try to point out a few below). In addition, the study is somewhat limited in its appreciation for other factors driving bacterial diversity patterns; while this is mentioned, it would be helpful to have the current study put more into the context of these different drivers.
In total, I very much appreciate the authors' quantitative approach to integrating micro-scale patterns of habitat connectivity and bacterial abundance with larger scale patterns of microbial diversity, but I think the manuscript would have to be improved substantially to get its message across effectively to a broad audience.
Below, I have listed a number of issues for the authors' consideration (not in order of importance).
1.) L26 and throughout: it may seem a bit nit-picky, but please be careful with the use of the term population. This should refer to a species, not a group of species.
2.) L27-29: I'm not exactly sure what you mean here - do you refer to the shape of the species-area curve (steep at first, then flat, then rising again)?
3.) L32: You refer here to key soil factors, but only consider a couple - I suggest being clearer and referring to the specific factors you examined.
4.) L33: This is a bit vague - it would first have to be determined that bacterial diversity and abundance are "entangled" (in what way?) before we would have a need to disentangle them. I think it would be easier and more straightforward to simply state the question as "what determines the relative diversity of soil bacterial communities"?
5.) L39: Again - what exactly is the "challenge of ecosystem functional diversity"? Also, you claim that it is important to be able to predict soil-borne microbial diversity - can you make a stronger case for the need to be able to do this?
6.) L56: It might be handy to state what you start with this simplification.
7.) L63-65: It might be useful to explain here (and more in the final section) that you do not examine other properties like pH, disturbance, etc., i.e. it is not that you seek to ignore other factors, but that you seek to examine specifically habitat connectivity and density.
8.) L90-L92: could this also have to do with the loss of many aerobic populations?
9.) L103-105 (and in general): Is it possible to tease apart these confounding associations?
10.) L137: This seems like a strange expression given the fact that the Shannon index is calculated from richness and evenness.
11.) L145: Why does this refer to "bacterial" biomes? As far as I can tell, this refers to biomes in general.
12.) L153: It seems to me that land use would be of great importance - how does that fit into your scheme?
13.) L187-189: Might however be worth mentioning that volatile compounds can also be important here, and these would only be effective in less saturated soils.
14.) L193-195: Again, how does the current study relate to the importance of these factors in driving patterns of soil-borne bacterial diversity?
15.) L226: This assumption is obviously not constant - how would this affect the model?
16.) Fig. 3b: Why does this relationship break down at low climatic water content for the DEL database? This deserves some discussion.
Herewith also a few very minor things:
1.) L13: hyphenate "Biome-specific"
2.) L44: delete "a"
3.) L51: Delete "In"
4.) L62: hyphenate "long-term"
5.) L67: insert "our" before "model"

"In their manuscript "Soil bacterial diversity shaped by microscale processes across biomes" the authors present a model that - as they claim - allows to calculate the biodiversity in soil from the properties of the local soil and climate. This results in a simulated map of microbial diversity of the (nearly) whole planet. I think the overall approach is very interesting. I have never seen a similar approach before (although I am also not exactly working in the same field)."
We thank the reviewer for the general appreciation of our approach. To the best of our knowledge, it is the first process-based model that attempts to link soil and climatic properties to soil bacterial diversity across scales. The simplified approach allows estimation of bacterial diversity (at the global scale) for locations where soil and climate properties are available.
"However, I had unfortunately quite a hard time with the paper. Especially it was unclear to me what in the end the model is and how it works. Moreover, it is hard to understand what is really the predictive power of the model, which makes the whole approach very speculative for me. Also the writing of the paper is often hard to follow. I think the manuscript would strongly profit from clearer more direct language with less jargon, a clearer description of what was actually done and assessment of the quality of the model." In the revised manuscript we attempted to clarify the points of concerns raised by the reviewer We attempted to streamline the presentation by simplifying sentences and minimizing the use of jargon where possible. Supplementary text was removed and necessary information was incorporated into the main text for clarity of the modeling approach (introduction and methods). Additionally, the model introduction (L52-74) was revised and a new illustrative figure was included ( Fig. 1). At the core of the model framework is the estimation of numbers and sizes of bacterial habitats that depend on soil type and climate. Together with an estimate of carrying capacity, we can model soil bacterial diversity that is sensitive to hydration conditions. We supplemented this aqueous phase "Major points: I remains unclear to me how the calculation of the micro-habitat distribution and carrying capacity come together to generate the outcomes that are shown. In several cases the parameters remain unclear and contradicting statements about the model are made (more detailed points see below)." We clarified these points in the revised manuscript. The calculations of micro-habitat occupancy were done using eq. 12 that describes the size and numbers of aqueous patches (serving as potential unique habitats) as a function of water content. We use eq. 13 to estimate the number of bacterial cells per habitat size class depending on local cell density (soil carrying capacity). 
Combining this information with the assumption on how many species share a single micro-habitat (in the simplest case, we allow for one species to dominate the habitat) we obtain estimates of species abundance distribution (SAD). The resulting SAD which is sensitive to water contents and carrying capacity,  The reviewer is correct in the observation, yet, to the best of our knowledge, there are no similar models that include soil and climatic properties and permit direct comparison with our HM.
Most existing ecological models that describe SADs rely on empirical fitting parameters that are difficult to interpret with respect to the physical structure of habitats (e.g. neutral models 1 ) or require many implicit assumptions on species properties (e.g. trait-based, idiosyncratic models 2 ).
Furthermore, results of either approach would be indistinguishable with respect to the SAD 2 . The novelty in this study is to use the HM as a mechanistic framework that provides a basis for explaining changes in the SAD based on carrying capacity and climatic water content. Nevertheless, to address this and other comments we have used a mechanistic model that simulates microbial life on hydrated soil surfaces [3][4][5][6] to evaluate the HM predictions independently (at least for the small scale).
The resulting diversity and abundance trends by the HM were confirmed using the more detailed and computationally demanding SIM results (albeit with a focus on shorter temporal and smaller spatial scales). While we cannot rule out additional factors, our findings suggest that the physical aqueous phase configuration plays an important role in shaping bacterial cell-cell interactions, ranges of migration, spatial confinement and more that ultimately determine soil bacterial diversity.
We could, in principle, provide goodness-of-fit metrics to compare our HM with other (statistical) models. However, since we are not fitting our model to the diversity data, it is not meaningful to do so. Additionally, we use inferred and remotely sensed soil and climatic properties that are subject to large uncertainty, and we do not expect to reproduce local estimates of diversity exactly.
"The maintenance rate m is fitted locally to the data as far as I understood. Later on this local m is used to predict species diversity at a certain location. It feels for me the authors use data to generate prediction about exactly this data and thus move in a circle. I think the way to test the predictive power of a model is to make predictions about data that was not used for fitting parameters." We think that this is a misunderstanding; the maintenance rate m is estimated globally using a dataset of microbial biomass carbon 7 that is independent of the bacterial 16S diversity datasets 8,9 . While m is estimated globally, local information on mean annual temperature (MAT) is used to adjust the rate m and account for different geographic locations. Apart from m, two additional parameters that affect variations in carrying capacity with soil depth were fitted to the dataset of soil microbial biomass carbon 7 . All other parameters were held constant. Additionally, we note that local carbon inputs and soil type are also important variables in determining the resulting soil bacterial abundance.
"As far as I understand the authors assume that there is one species in each micro-habitat. Therefore if the carrying capacity is changed the total population density of each species changes but the diversity should stay the same, because the relative abundance stays the same. So why is for the model incorporation of NPP , MAT and their effect on carrying capacity necessary when the carrying capacity has not influence on relative abundance?" As we discussed in the original version, the assumption of a single species dominating a microhabitat was made as a base case to evaluated the HM and determine potential SAD. Later in the study, we relaxed this assumption based on observations that deviate from model predictions especially under wet conditions (SI Fig. S3). The soil carrying capacity in the HM is used to constrain the number and sizes of potential habitats (i.e. isolated aqueous patches that can contain a certain number of cells).
It is correct that in the process of scaling the relative SAD by cell density should not change diversity.
Measurements of bacterial diversity, however, are subject to a limit of detection (cell counts of a single species). By applying a cutoff in cell density, small habitats do not contribute to measures of diversity. This relation with carrying capacity is sensitive to the shape of the SAD and causes abundance and diversity to be "entangled". In our HM we use a cutoff in number of cells to remove habitats that are unlikely to be occupied or that contain too few cells to be detected. The SAD is thereby sensitive to carrying capacity and thus to NPP and MAT.

Nature Communications
Response to reviewers' comments
"I think species abundance often follows a power low distribution and also percolation theory delivers power law distributions. Therefore it seems not surprising to me that SAD can show correlation with habitat size distribution (Extended Fig.1), but it does not mean the one causes the other. For example also a scale free interaction networks may lead to a power law shaped SAD." For clarification, we note that percolation theory delivers a power law at the critical point only. Away from this the SAD is represented by an exponential cutoff sensitive to climatic water contents and with magnitudes in richness that are sensitive to soil type (total potential number of habitats).
Models of SAD that generate similar power laws with an exponential cutoff include neutral and idiosyncratic models 2 . Both assume the existence of a meta-population (prior distribution) from which species can immigrate. In soils, we argue that the aqueous phase is fragmented and thus disconnected. To establish a power law distribution within a single aqueous patch it would be necessary to fulfill the requirements of meta-population based models (i.e. im-/migration between patches is possible). While other mechanisms could lead to a power law-like distribution, the process and kernel density estimates). We are not aware of any comparable process-based models that would allow estimation of carrying capacity (upper bound on cell density) and could be included.
Regarding the distribution of bacterial biomass with soil depth, we have used the log-normal model as it provided a better fit than the previously reported exponential model (SI fig. S1). We further compared the dependency of carrying capacity on water contents with simulations using the SIM (SI fig. S6). Note that in both models (HM, SIM) the relation between water contents and cell density is not explicitly prescribed. and the methods section. The lower diversity in deeper soil layers is expected due to the reduction in carrying capacity and is captured by the HM (Fig. 3a dashed line). The simple HM provides mean trends of bacterial diversity comparable with two independent datasets of bacterial diversity by using a single parameter set and without attempting to fit the data. We are not aware of comparable models and thus report simulation results using the SIM (Fig. 3, blue squares). The SIM provides qualitatively similar results with a discrepancy in water contents corresponding to maximal richness that could be ascribed to the differences in dimensionality between the HM and SIM (SI Fig. S2). The colors indicate estimates of soil pH 10 , which has been shown to be affected by climate 11 . For comparison with the DEL dataset, only the top 512 species were considered in the SIM. This goes along the expectation that the HM model could describe soil bacterial diversity values for specific soil samples; such specific prediction is beyond the capability of any available model at present. While NPP and MAT are local properties, they are also derived from large scale observations and might not necessarily reflect conditions at small scales. Based on the heuristic assumptions that underlie such a simple model, we do not expect cell densities to be at carrying capacity at every location in space and time where bacterial diversity was sampled.
Soil carrying capacity is used to take into account the carbon input and temperature of biomes by scaling bacterial diversity relative to a lower cutoff (limit of detection). Our focus lay on the role of hydration conditions in affecting bacterial diversity, and we found the median carrying capacity to be representative of average trends (Fig. 3).
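The role of the detection cutoff described in the responses above can be illustrated with a minimal sketch. It assumes a fixed relative species abundance distribution (here an arbitrary power law over species rank) and a detection limit of one cell; the function names, the exponent and the cell totals are hypothetical and are not taken from the manuscript.

```python
# Illustrative sketch only: names and numbers are hypothetical, not from the manuscript.

def relative_sad(n_species, exponent=1.5):
    """Fixed relative abundances, here an assumed power law over species rank."""
    weights = [rank ** -exponent for rank in range(1, n_species + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def detected_richness(sad, total_cells, detection_limit=1.0):
    """Count species whose absolute cell number reaches the detection limit."""
    return sum(1 for p in sad if p * total_cells >= detection_limit)


sad = relative_sad(1000)
richness_low = detected_richness(sad, total_cells=1e3)   # low carrying capacity
richness_high = detected_richness(sad, total_cells=1e6)  # high carrying capacity
print(richness_low, richness_high)
```

The relative abundances are identical in both cases, yet fewer species pass the cutoff when the total cell number is low, which is one way to read the statement that abundance and diversity become "entangled" through the limit of detection.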

"Fig.4 I don't understand what is plotted here. How is habitat richness defined? How are microhabitats distinguished (how much to they have to differ in size to be different)? In b) there is once written it shows the diversity of the habitats then the diversity of the bacteria?"
In the revised manuscript we clarify the assumed equivalency between habitat richness (numbers and sizes of unique aqueous habitats) and bacterial richness (see results section L143-144). This is a core assumption that links the number of habitats and the number of bacterial species in a soil volume. For the simplest case where we assume one species dominating each habitat, we could substitute habitat richness with bacterial richness (otherwise certain provisions need to be made to link these two with a modified calculation). The size of a habitat depends on the soil type (textural characteristic length δ, L356-357) and on the mean water content in the soil. Both define the smallest unit length for distinguishing habitats of varying sizes. The ambiguity in Fig. 4b  .

Methods: "Formula [1] to what is m fitted?"
The procedure of obtaining m is described in L262-268 and L286-289 in the methods section. We use local MAT and soil depth specific NPP to fit m to measured estimates of bacterial biomass carbon.
"You treat the bacterial density as a function of depth? Where do the authors get fz from? In the supplement the authors state that the source of organic carbon is uniformly distributed. How does this fit together?" We thank the reviewer for the opportunity to clarify this important point. The "uniform distribution of carbon" referred to the very small scales (size of a single aqueous habitat) and not to the distribution along the soil profile. The distribution of bacterial biomass carbon with soil depth is parametrized using a log-normal distribution (providing fz). We have re-written the methods section and removed the supplementary text. against gravity, is expressed as volumetric water contents (Vwater/Vsoil in m -3 /m -3 ).
"Formula 2: As far as I understand this formula is derived by assuming a exponential evaporation of water from the ground between the rain (tau). However, shouldn't formula 2 then be an integral over an exponential decay which should have the form 1/alpha*(1-exp(-alpha tau)) ? Shouldn't the formula the authors provide just give the water content after the time tau?" Yes, the formula gives the water content after the characteristic time τ (an ensemble average). We adapted the methods to state this more explicitly (L325-332).
"Line 241: For simplicity we define the field capacity θ as half of the porosity θ obtained using a pedotransfer function Maybe you could shortly explain what that means?" A sentence to explain how porosity was obtained was added (L314-316): "The latter [porosity] is obtained using an empirical (pedo-transfer) function that relates commonly measured soil properties (sand-, silt-, clay-contents and bulk density) to soil porosity." We assume climatic water contents cannot exceed field capacity as the time scale considered would allow the soil to drain internally (L315-317).

"line 271: Why a log-normal model? I feel Fig2b could be fit with many functions?"
In principle, the reviewer is correct; however, using the previously reported exponential model 7,15 resulted in a poor fit compared to the log-normal (SI Fig. S1). Often top soils are dry and contain fewer roots than deeper in the soil (hence affecting soil carbon distribution). The misfit of an exponential model was particularly evident for the top soil (10 cm) where most of the samples (abundance and diversity) were taken. We agree that other bounded functions with "heavy" tails could be appropriate. Nonetheless, we opted for the log-normal as it is the most parsimonious (multiplicative random process) and provides tractable central tendencies (mean of the log normal corresponds to the median of the untransformed data). The language was sloppy and we meant to explain the partitioning of NPP and the other parameters.
We have revised the entire section to clarify the procedure and assumptions in simpler language (see Materials and methods in the revised manuscript).
We thank the reviewer for pointing out an ambiguity in our representation of the partitioning of carbon input (net primary productivity, NPP) to bacterial respiration and the attribution of soil microbial biomass carbon to fungal and bacterial biomass carbon. We revised the manuscript to

"Supplement line 634: Additionally, we cannot ignore the dependency of NPP on precipitation and do not attempt to treat derived carrying capacity as an independent variable but rather as a location specific property I don't understand that sentence"
Net primary productivity (NPP) is an ecosystem level property that considers regional vegetation patterns and is determined by climatic conditions (among other factors). We used precipitation data to estimate climatic water contents and encountered an association with carrying capacity

There is a misunderstanding of the scales where we assumed homogeneous carbon input. The statement of homogeneous input applies at the microscale; however, a microhabitat at a depth of 0.2 m would have more carbon than a microhabitat at a depth of 1 m. At the scale of microhabitats carbon inputs are homogeneous, whereas at the soil profile scale carbon inputs decline rapidly with soil depth and were parametrized using a log-normal distribution.
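The statistical property invoked in the responses above for choosing the log-normal (the mean of the log-transformed values corresponds to the median of the untransformed data) can be checked numerically. The sketch below draws synthetic log-normal values; the parameters `mu` and `sigma` and the sample size are arbitrary assumptions for illustration and do not represent the fitted depth profile.

```python
import math
import random
import statistics

# Illustrative sketch only: synthetic data with assumed parameters, not the fitted profile.
random.seed(0)
mu, sigma = math.log(0.1), 0.8  # hypothetical log-scale mean and spread
values = [random.lognormvariate(mu, sigma) for _ in range(20000)]

# Exponentiating the mean of the logs recovers the median of the raw values.
median_from_logs = math.exp(statistics.fmean(math.log(v) for v in values))
sample_median = statistics.median(values)

# The arithmetic mean is pulled upward by the heavy right tail.
arithmetic_mean = statistics.fmean(values)

print(median_from_logs, sample_median, arithmetic_mean)
```

In this synthetic case the back-transformed log mean and the sample median nearly coincide, while the arithmetic mean is larger, which is why the median is the more tractable central tendency for such skewed data.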
"Figure S1: SAD is a curve and D a number, so how can D be proportional to SAD?"
We thank the reviewer for finding this mistake. It should read: q D ~ f(SAD). However, we have removed the supplementary text and illustrative figure S1.

We thank the reviewer for the encouraging comment. The motivation of our work was to develop the ability to bridge scales from soil grains to biomes while considering the soil's heterogeneity at small scales.

"Stating that there is only anecdotal evidence of a link between soil carbon and soil biodiversity is exaggerated and not necessary to justify the work developed here (soil bacterial diversity does increase with the organic matter content, as shown by biogeography studies from the local (e.g. Siciliano et al. 2015), the regional (e, g. Maestre et al.2015, Pasternak et al. 2013, Liu et al. 2014) and the global (Delgado-Baquerizo et al. 2016) scales)."
We thank the reviewer for pointing out literature references and agree that the wording was inadequate. We adapted the paragraphs in question (L26-30).

"The authors propose to explain the biogeography of soil microorganisms by trophic conditions and by microscale aqueous habitats. For this they combine in each point of continents an estimate of the trophic carrying capacity for microbes (defined as a certain proportion of the NPP), with a model describing local microbial habitat conditions (number of aqueous microhabitats and their size distribution), to predict the abundance and diversity of soil bacterial and compare these predictions with published data on abundance and diversity of soil bacteria. As a conclusion, most of the variations in soil bacteria diversity are ascribed to the microscale hydration conditions. The approach is very attractive. The work is with no doubt novel and very creative. It is extremely exciting to try to bridge microscale and macro scales microbial biogeography. Yet the manuscript does not convince me, because it goes too far too fast. As a result, it is also very frustrating for the reader."
While we are delighted by the reviewer's recognition of our work's novelty, we disagree that the study goes "too far too fast" in the context of spatial scales. The goal of this study is to establish general trends at large spatial scales across a wide range of environments. Many properties cannot be separated (e.g. carrying capacity and hydration conditions, SI fig. S6) and lead to the necessity of incorporating additional formulations (e.g. parametrizing the decay of biomass with soil depth, SI fig. S1) that incorporate the environmental context. We thus emphasized that while we have used data from a wide range of conditions (global datasets), the model captures trends that are also predicted by a spatially-explicit individual-based model (SIM). In summary, the HM is aimed at presenting general regional trends based on mechanistic understanding of microscale conditions, and is not intended to provide specific predictions for a single sample (this would be beyond the capability of most mechanistic and statistical soil bacterial life models available at present; L173-175). We focus on a few aspects that are, in our opinion, important (hydration conditions and biome-specific carrying capacity), that are sufficient for understanding the main conclusions (aqueous micro-habitat fragmentation mediates soil bacterial diversity) and that capture general trends of soil bacterial abundance and diversity.

"Describing the trophic conditions and microscale physical conditions that bacteria experience in soils of the whole planet with available data worldwide at the targeted resolution (0.1x0.1°) requires a number of simplifications. Among them it is stated that the trophic resources for bacteria are 25% of NPP whatever the ecosystem, that field capacity is half of soil total porosity in all soils of the world, considered that only one bacterial "species" stand in a given wet microhabitat, etc. Why not after all, if it is to develop the proof of concept… However, it becomes very difficult to follow the results with the many successive assumptions and no discussion of their limits and consequences step by step and not presentation of intermediate results. The reader cannot appreciate the consequences of the many assumptions."
The reviewer touches upon an important point: the simplifications and assumptions necessary for building the simplest model that captures the phenomenon under study (i.e. linking soil bacterial diversity and abundance to mechanistic processes and variables). The ability to link soil bacterial diversity to a "universal variable" that reflects climate, soil type and hydration is the core novelty of the HM proposed here. We have revised the explanations and justifications of the many assumptions and their context in terms of scale. We make use of certain observations, such as that the water content at field capacity across many soil types is about half the saturated water content (porosity). We also invoke, for simplicity, the fact that in any aqueous patch single or multiple successful species would emerge as dominating that connected landscape (based on local conditions and physiological traits). All of these are "heuristic" assumptions that enable us to construct a general heuristic model. We revised the main text to state assumptions more explicitly and to provide a brief discussion of their consequences where applicable. We thank the reviewer for complimenting the core of our study and acknowledging the efforts in linking micro-scale patterns of soil aqueous habitats to biome characteristics. In this revised version we went one step further by not only evaluating the HM predictions against empirical observations but also providing mechanistic simulations using the SIM.

"In general, I think the work has potential value, but also suffers from some important drawbacks in my opinion. Perhaps most importantly, I found it difficult to decipher out the clear take-home messages that the authors wish to put forth. This is for instance an issue in the abstract: from the abstract it is difficult to determine what was actually done in the study and which conclusions are most important."
The abstract was rewritten entirely with a focus on emphasizing the main message. Additionally, we adapted the discussion to highlight findings and main conclusions more clearly.

"Also, there are some parts of the text that remain somewhat vague (I try to point out a few below). In addition, the study is somewhat limited in its appreciation for other factors driving bacterial diversity patterns; while this is mentioned, it would be helpful to have the current study put more into the context of these different drivers."
We emphasized the role of other factors in the revised introduction (L33-34) and discuss them in the context of our process-based understanding (L197-204, L214-219).

"In total, I very much appreciate the authors' quantitative approach to integrating micro-scale patterns of habitat connectivity and bacterial abundance with larger scale patterns of microbial diversity, but I think the manuscript would have to be improved substantially to get is message across effectively to a broad audience."
Motivated by the reviewer's appreciation, we revised the manuscript text substantially and extended the model validation by comparing HM predictions to simulations of the SIM.

"Below, I have listed a number of issue for the authors' consideration (not in order of importance). 1.) L26 and throughout: it may seem a bit nit-picky, but please be careful with the use of the term population. This should refer to a species, not a group of species"
We thank the reviewer for pointing out this inaccuracy in terminology and have adapted the wording accordingly where applicable. The referenced section has been removed from the revised manuscript text. The key soil factors under consideration have been explicitly stated in the introduction (L30-32) and throughout the methods section.

"4.) L33: This is a bit vague -it would first have to be determined that bacterial diversity and abundance are "entangled" (in what way?) before we would have a need to disentangle them. I think it would easier a more straightforward to simply state the question as "what determines the relative diversity of soil bacterial communities"?"
We explain how bacterial diversity and abundance are possibly "entangled" in the introduction (L27-30) and provide a dedicated section in the results (L157-171). Furthermore, we found that species abundance and diversity (specifically richness and evenness) are not independent (Fig. 5). This affects measures of richness by making abundant species more detectable, as confirmed using the SIM (SI fig. S8). Thus, the processing of diversity data and the measurements themselves are sensitive to the shape of the species abundance distribution (SAD).

"5.) L39: Again -what exactly is the "challenge of ecosystem functional diversity"? Also, you claim that it is important to be able to predict soil-borne microbial diversity -can you make a stronger case for the need to be able to do this?"
We revised the questionable phrase and provide a stronger case for the need of studying soil bacterial diversity in the context of ecosystem processes (L36-38).

"6.) L56: It might be handy to state what you start with this simplification."
We added a statement on the consequences of this simplification in the introduction (L56-61). A more explicit statement on the factors considered in this study is provided (L66-70) and we discuss potential limitations (L214-217 and L226-233).

"8.) L90-L92: could this also have to do with the loss of many aerobic populations?"
We cannot rule out a loss of aerobic populations. However, the soils considered are likely to be aerated, since samples are mostly taken from top soils (upper 10 cm) and the soils were not fully saturated. Furthermore, we could expect that facultative anaerobes and many aerobes would experience a competitive advantage at higher water contents. This might affect richness in unpredictable ways. Additionally, even if richness dropped due to the loss of aerobic species, we would still observe a decrease in evenness, which suggests that the dominance of species is enhanced in wetter environments (SI fig. S5). Lastly, if aerobic populations were to be outcompeted, it would implicitly require that they share their habitat with better adapted, anaerobic populations. This is in line with the notion of reduced soil bacterial diversity with increased habitat connectedness and would more likely occur in large habitats under wet conditions.

"9.) L103-105 (and in general): Is it possible to tease apart these confounding associations?"
It is difficult, and in some cases impossible, to tease apart such confounding associations, especially without process-based models. Nonetheless, it should be possible to identify hierarchies among the variables that affect soil bacterial diversity (e.g. soil pH is a proxy of the soil's buffering capacity, which results from the water balance at climatological timescales 11) and to use mechanistic modelling and experimental manipulation to disentangle and validate confounding associations.

"10.) L137: This seems like a strange expression given the fact that the Shannon index is calculated from richness and evenness."
The definition of evenness is given as the diversity of order q=1 (exponential of Shannon index) divided by the diversity of order q=0 (richness) as described in the Methods section (eq. 16).
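The relation between evenness, the Shannon index and richness stated above can be illustrated with a minimal sketch. It assumes abundances are given as plain counts; the function names `hill_number` and `evenness` are hypothetical and are not taken from the manuscript or its Methods (eq. 16), which remain the authoritative definition.

```python
import math

def hill_number(abundances, q):
    """Hill number (effective number of species) of order q."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 0:
        # Order 0: every detected species counts equally (richness).
        return float(len(p))
    if q == 1:
        # Order 1: exponential of the Shannon index.
        shannon = -sum(pi * math.log(pi) for pi in p)
        return math.exp(shannon)
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

def evenness(abundances):
    """Evenness as the ratio of order-1 to order-0 diversity."""
    return hill_number(abundances, 1) / hill_number(abundances, 0)

# Hypothetical communities with the same richness but different dominance.
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]
print(evenness(even_community))
print(evenness(skewed_community))
```

Both example communities have a richness of four, so any difference in the printed values comes from how individuals are distributed among species. This is why evenness defined in this way carries information beyond richness even though the Shannon index depends on both.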
"11.) L145: Why does this refer to "bacterial" biomes? As far as I can tell, this refers to biomes in general." In general, there is substantial overlap between the traditional definitions (based on temperature and precipitation); but for soils with different water holding capacities our classification would differ.
However, the section on bacterial biomes was removed from the manuscript as it did not contribute substantially to the main message.
Land use could be readily incorporated in the current model if it changes the effect of climatic variables (e.g. modified hydration conditions due to irrigation) and soil properties (e.g. bulk density due to compaction). Further, many agricultural practices change soil structure (e.g. tillage) and vegetation properties (e.g. crop rotations), which also affect the input of carbon into the soil profile. We discuss a few of these aspects in the final section of the revised manuscript.
Considering transport limitations as a function of soil hydration would indeed be an interesting way to expand the current model, and we thank the reviewer for mentioning the subject. However, we are unaware of potential implications for bacterial diversity that would not be mediated by habitat connectedness. We focus on the role of aqueous microhabitat fragmentation and did not discuss implications for gaseous transport in unsaturated soils, as it is beyond the scope of this study.

"14.) L193-195: Again, how does the current study relate to the importance of these factors in driving patterns of soil-borne bacterial diversity?"
The roles of other factors are discussed in context of soil bacterial diversity and the findings of our study in the revised manuscript (L214-224).

"15.) L226: This assumption is obviously not constant -how would this affect the model?"
It is not obvious that this assumption is not constant, at least on average, at the scales considered. The value of carbon content per cell is used to convert total bacterial biomass carbon to cell counts. The value is unlikely to change the order of magnitude of the estimated carrying capacity. Using a lower value would shift the total number of cells to higher values, but it would not alter the shape of the relation with NPP, MAT or climatic water contents. Thus, it does not affect the central conclusion that aqueous micro-habitat fragmentation affects soil bacterial diversity.
We thank the reviewer for this interesting question. As we understand it, the data used in the DEL study consider only the most abundant species. Those are less sensitive to reduced carbon input (and hydration conditions). Using our SIM, we emulated the data processing by truncating the ranked SAD to the 512 most abundant species and could confirm the invariance of bacterial richness at low to intermediate water contents (Fig. 3b). This is further discussed in the context of disentangling abundance and diversity (L200-206).
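The effect of truncating a ranked SAD on apparent richness can be sketched as follows. This is an illustration only: the Zipf-like abundance distribution, the helper names `zipf_like_sad` and `observed_richness`, and the detection limit of one cell are assumptions made for the example and do not reproduce the SIM or the processing of the DEL data.

```python
def zipf_like_sad(n_species, largest=100000):
    """Hypothetical ranked abundances: rank k holds about largest / k cells."""
    return [max(1, largest // rank) for rank in range(1, n_species + 1)]

def observed_richness(abundances, top_n=512, detection_limit=1):
    """Richness after keeping only the top_n most abundant species."""
    ranked = sorted(abundances, reverse=True)
    kept = ranked[:top_n]
    return sum(1 for count in kept if count >= detection_limit)

# Three hypothetical communities that differ strongly in true richness.
for true_richness in (2000, 800, 300):
    sad = zipf_like_sad(true_richness)
    print(true_richness, observed_richness(sad))
```

Whenever the true richness exceeds the truncation threshold, the apparent richness saturates at the threshold, so differences among rich communities are no longer visible. Only communities with fewer species than the threshold show a change.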

Reviewers' Comments:
Reviewer #1: Remarks to the Author: Overall the manuscript improved a lot, the language is much clearer, the modeling process can be better understood, simulation and data can be better compared and the new SIM model adds valuable information.
I have few minor comments left: Two times when the model fails a new (usually fitted) parameter is introduced to save it. This makes it difficult to compare the results throughout the paper, since the model basically changes.
Especially the question arises if Fig. 3 would look different allowing more than one species per habitat. Also in Fig. 5 it remains unclear how using the empirical input parameters changes the outcome and how this model change would affect the other results of the paper.
Reviewer #3: Remarks to the Author: I found the manuscript to be much improved and much more readable and accessible than the original. I think the authors do a good job showing that this general approach is relevant to predicting patterns of microbial diversity. They also do a better job of discussing the limitations and assumptions of their approach.
I think especially the abstract could still do a better job of zooming in on the question at hand, so as to capture the attention of the reader and clearly guide him/her to what is going to be addressed in the manuscript. For instance, the first sentence is so general as to not be very useful. This could be stated much more sharply. Perhaps something like "microbial diversity has been shown to vary across terrestrial habitats, with presumptive links to function." Also, it would be helpful to clearly state the question that is being addressed.
Herewith some specific comments as I went back through the manuscript:
1.) Introduction: the structure of the introduction would be improved by the use of paragraphs.
2.) L21: some studies suggest even higher numbers - might be worth including.
3.) L24: I think it would be better to refer to "the rare components of the soil microbiome"
4.) L62 (and elsewhere) - I think it is important to make it clear that you refer to terrestrial biomes - also, I think this sentence should be rearranged: thus… "Modeled trends of soil bacterial carrying capacity and diversity were compared to empirical observations across different terrestrial biomes"
5.) L80-82: I suggest switching the order around of this sentence - we found that varying the range of expected values had little impact on carrying capacity estimates. We therefore used a constant value in our model.
[…] later you say that this assumption does not hold, so I think this has to be toned down here. Also, I think that you can make a better distinction between potential carrying capacity and realized carrying capacity.
10.) Figure 3: Empirical data is extremely sparse at the dry end of the spectrum (zero and one point in panels a and b, respectively) - thus the "real" data do not show strong support for the sharp diversity decline under the driest conditions here. I think this deserves mention.

R1.1: Overall the manuscript improved a lot, the language is much clearer, the modeling process can be better understood, simulation and data can be better compared and the new SIM model adds valuable information.
We thank the reviewer for the encouraging comment and we fully agree that the additional mechanistic modeling results helped support the heuristic and simple model and also clarified certain important aspects in the revised manuscript.
I have few minor comments left:

R1.2:
Two times when the model fails a new (usually fitted) parameter is introduced to save it. This makes it difficult to compare the results throughout the paper, since the model basically changes.
We generally agree with the reviewer. Yet, keeping in mind the minimalistic nature of the heuristic model and the broad range of conditions explained by this relatively simple approach, the performance of the HM is quite remarkable. We wish to clarify that we did not fit an additional parameter to address model limitations; instead, we explored alternative assumptions regarding species occupancy. The original and simplest "single species per aqueous habitat" assumption holds well for most unsaturated conditions, where the soil aqueous phase is fragmented into many small habitats. However, as soil water content increases and the aqueous phase becomes reconnected, habitats may grow substantially and are able to accommodate multiple bacterial species. This is in essence the phenomenological correction (important primarily near saturation) that we have introduced to the HM ($N_{sp} \sim s^{1/d}$, where $s$ is the habitat size and $d$ = 2 or 3 is the dimensionality). It postulates the existence of a cluster size or length scale at which individual populations would not interact within large aqueous habitats (e.g. separated by "diffusive spheres"). The exponent ($1/d$) suggests that the number of species per habitat grows with the average distance between any two points selected randomly within a single habitat of size $s$. Based on the reviewer's comment, we have decided to use only one version of the HM, which allows multiple species per habitat, throughout the manuscript. We discuss the difference between the two assumptions and compare the outcomes in the supplementary materials (see Supplementary Figure 3 below).

R1.3: Especially the question arises if Fig. 3 would look different allowing more than one species per habitat.
We have tested and confirmed that changing the number of species per habitat would not alter the shape of the richness-water content relation considerably. It would stretch the curve towards larger values of richness, as shown for two dimensions (surfaces) in comparison with simulation (SIM) results depicted in Supplementary Figure 2. Because we need to set a detection limit for comparing model predictions with observations, a higher level of richness due to the increased number of species per habitat can be partially compensated for by selecting higher detection limits. The primary effect of including multiple species per habitat is manifested in the modeled species abundance distributions (Supplementary Figure 3), which affect evenness in Fig. 5 (see R1.4).

R1.4: Also in Fig. 5 it remains unclear how using the empirical input parameters changes the outcome and how this model change would affect the other results of the paper.
We thank the reviewer for this insightful suggestion. The consideration of multiple species per habitat results in higher evenness with high water contents where only a few but large habitats emerge. As stated in the response above (R1.2), we now report only the multi species heuristic model (HM) and the agreement with the observations' central tendency improved considerably (median ± IQR, Fig. 5). With the added flexibility, the modified HM (multiple species per aqueous habitat as a function of habitat size) is expected to perform better when comparing diversity metrics that consider species relative abundance directly (Supplementary Figure 3, see R1.3). Additionally, the evaluation of the HM for every sampled location circumvents the need to rely on empirical correlation of model inputs. Instead, we use the independent estimates of soil carrying capacity and climatic water contents for each sampled location and report the tendency of the HM predictions as a smoothed trend line (instead of using binned values; see Fig. 5 -solid line). Other results of the manuscript would not be affected (for example the predicted maps in Fig. 4 use independently estimated carrying capacity and climatic water contents while predictions for richness in Fig. 3 are based on median carrying capacity).
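The habitat-size scaling introduced in the response to R1.2 (number of species per habitat growing as the habitat size raised to the power 1/d) can be sketched numerically. The prefactor of one, the lower bound of one species per occupied habitat, and the function name `species_per_habitat` are assumptions made for illustration; the manuscript specifies only the proportionality.

```python
def species_per_habitat(size, d=2, prefactor=1.0):
    """Number of species in a habitat of the given size, scaling as size**(1/d).

    The prefactor and the floor of one species are illustrative assumptions.
    """
    return max(1.0, prefactor * size ** (1.0 / d))

# Hypothetical habitat sizes (arbitrary units) for surfaces (d=2) and volumes (d=3).
for size in (1.0, 100.0, 10000.0):
    print(size, species_per_habitat(size, d=2), species_per_habitat(size, d=3))
```

Under this scaling, small habitats stay close to the single-species assumption, while a hundredfold increase in habitat size adds only a tenfold increase in species for d = 2 and less for d = 3. This is consistent with the statement that the correction matters primarily near saturation, where few but large habitats emerge.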

Fig. 5.
Bacterial community evenness decreases with carrying capacity and climatic water contents. Evenness from two independent studies is shown together with estimated cell density (carrying capacity). Samples were aggregated by latitude, longitude and soil depth (EMP 1 , n = 484 and DEL 4 , n = 218). The median and interquartile ranges (colored symbols and bars) are displayed for groups of water contents (bin width: 0.05; for the number of samples see Supplementary Table 2). Individual data points are shown for bins containing fewer than ten samples (small circles), and samples with cell density lower than $10^{12}$ m$^{-3}$ were removed. Evenness predicted by the heuristic model (HM) is calculated using paired values of climatic water content and carrying capacity (evaluated for every sample). Using the joint data of water content and cell density as model input, the HM reproduces the observed tendency of evenness. A locally weighted scatterplot smooth (LOWESS) of modeled evenness is shown for the HM predictions (solid line).

Indeed, the observed decrease in evenness is relatively small, as evidenced by the low, yet negative, Pearson correlation coefficients relating evenness to climatic water contents for samples from the two datasets (-0.17 and -0.41 for EMP and DEL, respectively; Supplementary Figure 5a). Mechanistically, we expect an overall decrease in soil bacterial evenness as the soil becomes wet. This is predicted by both the HM and the SIM independently. We note, however, that pre-processing of relative abundance information (e.g. removal of singletons) can significantly alter the apparent relation, as demonstrated with the SIM (Supplementary Figure 8b). Hence, we do not necessarily expect to observe a monotonic decrease. Nonetheless, we fit a linear model of the form $E \sim \alpha + \beta\,\theta + \gamma \log(\rho)$, with evenness ($E$), climatic water content ($\theta$) and cell density ($\rho$).
Although the goodness of fit is low ($R^2 = 0.14$), both slopes are negative and the magnitude of the intercept appears reasonable ($\alpha = 1.06 \pm 0.09$; $\beta = -0.31 \pm 0.08$; $\gamma = -0.05 \pm 0.01$). The negative slopes ($\beta$, $\gamma$) suggest that evenness is jointly reduced by increasing climatic water contents and cell density. Additionally, the residuals indicate no persistent model bias (Supplementary Figure 5b). Evenness is also shown for bins of water contents (median ± IQR) to highlight the central tendency.
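A linear model of the kind described above (evenness regressed on climatic water content and the logarithm of cell density) can be fitted by ordinary least squares, as in the following minimal sketch. Everything here is an assumption made for illustration: the data are synthetic and generated from the reported coefficients plus noise, the base-10 logarithm is a guess (the response does not state the base), and the function name `fit_linear_model` is hypothetical. It does not reproduce the authors' analysis or their data.

```python
import math
import random

def fit_linear_model(theta, rho, evenness):
    """Least-squares fit of E = alpha + beta * theta + gamma * log10(rho)."""
    rows = [(1.0, t, math.log10(r)) for t, r in zip(theta, rho)]
    # Build the normal equations (X^T X) b = X^T y for the three coefficients.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * e for r, e in zip(rows, evenness)) for i in range(3)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 3):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][k] * coef[k] for k in range(i + 1, 3))
        coef[i] = (b[i] - tail) / a[i][i]
    return tuple(coef)

# Synthetic samples (NOT the EMP or DEL data), seeded for repeatability.
rng = random.Random(0)
theta = [rng.uniform(0.05, 0.5) for _ in range(200)]
rho = [10 ** rng.uniform(12, 15) for _ in range(200)]
evenness = [1.06 - 0.31 * t - 0.05 * math.log10(r) + rng.gauss(0, 0.05)
            for t, r in zip(theta, rho)]

alpha, beta, gamma = fit_linear_model(theta, rho, evenness)
print(alpha, beta, gamma)
```

With both slopes negative, the fitted plane declines in both the water-content and the cell-density direction, which is the pattern the response refers to when stating that evenness is jointly reduced by the two variables.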

R1.7: Maybe use the word 'model' or similar in the title to manage expectations a little
We included the term 'modeled' to label figure axes where applicable. Species in the right panel (wet and connected soil) have been modified to appear more similar.

R1.9: Fig2: maybe color datapoints according to temperature range
We tried coloring data points by temperature range but decided to not keep the color as many points overlap (as evidenced by the distributions of cell density values) and little additional information could be displayed. Nonetheless, we adapted the presentation of the figure to show more clearly the cell densities to expect under different ranges of temperature (see R1.5).

R1.10: Fig3: Line 74 estimates of soil pH: how was it estimated? Measured?
We mistakenly cited SoilGrids 4 (global digital soil maps) as the source of soil pH estimates, which was the case in an earlier version of the manuscript. The current values of soil pH were reported by Delgado-Baquerizo et al. and originate from sample-scale measurements 3.

Nature Communications
Response to reviewers' comments (Rev. 2)

R1.11: Fig5: maybe label SIM and HM symbols according to cell density
The calculations of the HM in figure 5 have been modified. We evaluated the HM with paired climatic water content and cell density estimates for each sample and display a smoothed trend line of the resulting evenness (see R1.4). It is therefore not possible to label the symbols by cell density. However, cell densities of the SIM are reported in Supplementary Figure 7.

Reviewer #3 (Remarks to the Author):
R3.1: I found the manuscript to be much improved and much more readable and accessible than the original. I think the authors do a good job showing that this general approach is relevant to predicting patterns of microbial diversity. They also do a better job of discussing the limitations and assumptions of their approach.
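We thank the reviewer for the encouraging comment and are glad that the improved discussion of limitations and assumptions was noticed.

R3.2: I think especially the abstract could still do a better job of zooming in on the question at hand, so as to capture the attention of the reader and clearly guide him/her to what is going to be addressed in the manuscript. For instance, the first sentence is so general as to not be very useful. This could be stated much more sharply. Perhaps something like "microbial diversity has been shown to vary across terrestrial habitats, with presumptive links to function." Also, it would be helpful to clearly state the question that is being addressed.
We agree that the abstract could be formulated more concisely. The suggested phrase was adopted and we included an explicit statement of what is addressed in the manuscript ("Here we […]"). The abstract has been revised to better reflect the content of the manuscript.

Herewith some specific comments as I went back through the manuscript: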

R3.3: 1.) Introduction: the structure of the introduction would be improved by the use of paragraphs.
We followed the suggestion and used paragraphs where applicable.

R3.4: 2.) L21: some studies suggest even higher numbers - might be worth including.
A study 5 reporting soil bacterial diversity on the order of $10^6$ was included. We also compared to a recent study 6 on the expected number of phylotypes on Earth to provide an upper bound on what could be expected in terrestrial environments.
l26: "The number of bacterial phylotypes ranges between $10^2$ and $10^6$ per gram of soil 2,3,7 , with high values similar to the richness in all of Earth's environments 6 ."

R3.5: 3.) L24: I think it would be better to refer to "the rare components of the soil microbiome"
We thank the reviewer for this suggestion. The phrase was adopted and improved the flow of the text.
l30: "This wide range of microhabitats is particularly important for maintaining the rare components of the soil microbiome."

R3.6: 4.) L62 (and elsewhere) - I think it is important to make it clear that you refer to terrestrial biomes - also, I think this sentence should be rearranged: thus… "Modeled trends of soil bacterial carrying capacity and diversity were compared to empirical observations across different terrestrial biomes"
The sentence was rearranged as recommended, and we specified (where applicable) that the biomes considered are terrestrial.
l70: "Modeled trends of soil bacterial carrying capacity and diversity are compared to empirical observations 1-3 across terrestrial biomes."

R3.7: 5.) L80-82: I suggest switching the order around of this sentence - we found that varying the range of expected values had little impact on carrying capacity estimates. We therefore used a constant value in our model.
We followed the suggestion, which greatly improves the logical structure of the sentence.
l88: "We found that varying the range of expected values (14-30% of NPP 8 ) had little impact on estimates of carrying capacity. A constant value of this respiratory fraction was therefore considered based on mechanistic model simulations 8 ."

R3.8: 6.) L202: I think you should define what you mean by hotspots -hot spots for what? Activity? Interaction? Diversity?
The ambiguous usage of the term "hot-spots" was avoided. We now explicitly refer to "nutrient hotspots" and with that implicitly to potential aspects for bacterial activity, interactions and diversity.
l228: "This could be due to dominance of a few species that may cluster around nutrient hot-spots 9 , or loss of oligotrophic species that would be outcompeted in well-connected and dense communities."

R3.9: 7.) L223-224: This could of course also be approached experimentally.
We agree with the reviewer that this could be approached experimentally. We would further suggest that experimental validation would be essential in disentangling the effects of soil carbon and water on bacterial habitats and included a corresponding statement.
l250: "Teasing apart such confounding associations requires detailed statistical analysis and experimental validation, which are best conducted in dedicated studies"

R3.10: 8.) L247: On the more positive side, you might also mention restoration efforts (as opposed to only aspects that cause soil degradation).
Following the suggestion, we included restoration efforts as part of changes in land use.

R3.11: […] later you say that this assumption does not hold, so I think this has to be toned down here. Also, I think that you can make a better distinction between potential carrying capacity and realized carrying capacity.
We thank the reviewer for these helpful suggestions. The soil matrix was included in the illustration and the white background was removed (Fig 1). We toned down the wording in the figure caption to indicate the possibility of having multiple species in a single aqueous habitat (l589). Additionally, a sentence was added to clarify the distinction between potential and realized carrying capacity (l591).
l589: "When the soil becomes sufficiently dry almost all aqueous habitats are physically isolated and might contain only a few species."
l591: "The specific carrying capacity in a biome is based on carbon input flux and temperature that establish an upper bound on bacterial cell density (rarely realized in any particular location due to other limiting factors)."

R3.12: 10.) Figure 3: Empirical data is extremely sparse at the dry end of the spectrum (zero and one point in panels a and b, respectively) - thus the "real" data do not show strong support for the sharp diversity decline under the driest conditions here. I think this deserves mention.
We thank the reviewer for pointing out the lack of discussion regarding the sample coverage at the dry end. This also led us to discover a mistake in the presentation of the data, which had removed two data points. Additionally, we adapted the binning of data to comply with the Nature Communications guideline of displaying average values only if the number of samples is greater than ten. Wherever there are fewer samples, the individual data points are now shown. To further improve the representation, we newly grouped samples of the EMP dataset by top- and sub-soil (<25 cm and ≥25 cm). We now also address the sparsity of data at the dry end explicitly in the discussion (l209). We recently published a global statistical meta-analysis 10 with an increased number of sampled locations in dry regions that exhibit a drop in diversity under low climatic water contents at larger scales, as previously reported for increased aridity 11. Furthermore, a sharp decline in diversity under very dry conditions was also reported for dry valleys of Antarctica 12, and a drop in richness (Faith's PD) and Shannon index with soil relative humidity was observed in the Atacama desert 13. A distinct drop in richness towards dry and wet conditions was also observed in soil microcosm experiments when rare species were emphasized 14. The studies mentioned were included in the discussion section. However, we can only speculate about other factors that might cause the lack of clear patterns, particularly the absence of residual soil moisture in the HM (which could make water contents appear too low) and the possible influence of dew (which could enhance bacterial growth) in some dry regions.
l209: "The data available at low climatic water contents are sparse and do not provide support for the predicted steep decline of bacterial diversity as soil becomes dry that was previously reported with increased aridity at large scales 14 . However, a significant decrease in bacterial richness was also observed in a recent statistical meta-analysis for climatic scales 30 and could be confirmed using the SIM (Fig. 3b). Additionally, it has been reported that bacterial diversity declines sharply with moisture in dry soils of Antarctica 22 and decreases with soil relative humidity along transects of the Atacama desert 31 . Microcosm experiments revealed an increase in richness with moisture that peaks at intermediate water contents that promote rare bacterial species 32 ."
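The display rule described in the response to R3.12 (summary statistics for well-populated bins of water content, individual points otherwise) can be sketched as follows. The threshold of ten samples and the bin width of 0.05 follow the text above; the function name `summarize_bins`, the returned dictionary layout, and the example values are assumptions made for illustration and are not the authors' plotting code.

```python
import statistics
from collections import defaultdict

def summarize_bins(water_content, values, bin_width=0.05, min_samples=10):
    """Group values by water-content bin and summarize each bin.

    Bins with at least min_samples entries report the median and the
    interquartile range; sparser bins keep their individual points.
    """
    bins = defaultdict(list)
    for w, v in zip(water_content, values):
        bins[int(w / bin_width)].append(v)
    summary = {}
    for index in sorted(bins):
        members = bins[index]
        if len(members) >= min_samples:
            q1, _, q3 = statistics.quantiles(members, n=4)
            summary[index] = {"median": statistics.median(members),
                              "iqr": (q1, q3)}
        else:
            summary[index] = {"points": list(members)}
    return summary

# Hypothetical samples: twelve in one bin, three in a sparsely sampled dry bin.
water = [0.12] * 12 + [0.02] * 3
diversity = [0.50 + 0.01 * i for i in range(12)] + [0.20, 0.25, 0.30]
print(summarize_bins(water, diversity))
```

Keeping the individual points for sparse bins makes the limited sample coverage at the dry end visible to the reader instead of hiding it behind a summary statistic.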