Introduction

Soil is the largest carbon (C) reservoir in terrestrial ecosystems. Small changes in the C budget can cause large fluctuations in atmospheric carbon dioxide concentrations, which, in turn, can have profound impacts on the structure and function of terrestrial ecosystems1,2,3. As native inhabitants of soil, microorganisms can regulate soil C dynamics through catabolism and anabolism4,5,6,7, and the role of microbial anabolism in transforming soil organic matter into more stable forms is increasingly highlighted6. Plant residues are obtained by microorganisms and part of them are used to construct microbial biomass6,8. After the death of microorganisms, microbial-derived C (in vivo turnover products, including dead microbial residues and some metabolites)6 is stabilized in the soil by association with minerals or occlusion in aggregates6,9, which can resist disturbance from external factors over longer residence times. This process results in the preservation of large amounts of underground microbial-derived C. Some studies have shown that microbial residual C accounts for a large proportion (more than 50%, up to 80%) of the soil organic carbon (SOC) pool10,11,12,13,14,15. Therefore, although microbial biomass C only accounts for 2–4% of SOC11,16, the contribution of microbial residues to the SOC cannot be ignored. The relatively stable storage of microbial residues contributes to the persistence of organic C in soil6,11. Consequently, it is essential to study the accumulation patterns of microbial residues to enhance our understanding of C sequestration in terrestrial ecosystems6,11,17,18.

Microbial communities are highly sensitive to environmental change19,20. External factors, including soil properties and climate change, can affect the transfer of microbial metabolites to the soil and their stabilization by influencing microbial physiological characteristics, such as microbial growth rate and growth efficiency, as well as biochemical characteristics21,22,23. For example, under limited soil moisture conditions, both microorganism growth and activity are weakened24, and the transformation of plant residues into microbial biomass and then into microbial residues is decreased25,26. Under increased soil moisture conditions, the C utilization efficiency (CUE) of microorganisms and the microbial residue accumulation efficiency should improve. However, excessive soil moisture would lead to oxygen limitation, which would decrease microbial substrate utilization efficiency, and consequently reduce microbial necromass accumulation27.

Multiple studies have shown that low CUE limits the synthesis of microbial biomass28,29. The higher the microbial growth rate and growth efficiency, the higher the rate of accumulation of microbial residues30. However, with increased microbial activity, the capacity of microorganisms to decompose and utilize SOC (plant-derived or microbial-derived C31, etc.) also increases29. Consequently, there may not be a simple linear relationship between microbial physiological characteristics and their mediated C sequestration23. Considering the influence of environmental change on the internal characteristics of microbial communities19,20, there is potentially an optimal range of environmental conditions for maximizing the accumulation of microbial residues. Determining the optimal environmental conditions for microbial growth with the greatest microbial residue accumulation and the least SOC decomposition could facilitate the management of C sequestration in soil. However, the optimal environmental conditions for microbial C accumulation remain poorly understood.

The remains of dead microorganisms (microbial residues) can be traced using amino sugars32,33, given that only a small proportion of amino sugars in the soil is associated with microbial biomass, and plants do not contain amino sugars34,35. In addition, different amino sugars have been linked to specific microbial populations35,36. Therefore, using amino sugars as a biomarker to characterize microbial residues facilitates the characterization of SOC sources and the estimation of their stabilization potentials18,32,37,38. In the majority of studies, only four types of amino sugars (glucosamine, galactosamine, mannosamine, and muramic acid) have been quantified32, among which glucosamine, galactosamine, and muramic acid are the most abundant38. Fungi produce most of the glucosamine, whereas bacteria produce muramic acid and only a small proportion of the glucosamine32,38.

Amino sugar biomarkers are increasingly being used to study the mechanisms underlying storage of microbial residues26,39,40. This enables extensive quantification of the global heterogeneity of soil amino sugar content and its predictors. Forest and grassland ecosystems account for approximately 30%41 and 26%42 of the earth’s land surface area, respectively, and approximately 47% and 20% of the earth’s land SOC content, respectively42,43. In the present study, we collected amino sugar data from forest and grassland ecosystems for meta-analysis. We found that most of the studies on soil microbial residues are carried out in the 0–10 or 0–20 cm mineral soil layers in these ecosystems. We synthesized 268 data points of microbial residues in the 0–20 cm soil layer of forest and grassland ecosystems (Fig. 1).

Fig. 1: Datasets used in the study.

a Study sites. b Kruskal–Wallis test between different ecosystems and climatic zones; letters in the figure indicate significant differences such that conditions with the same letters are not significantly different. c Frequency distribution histogram of original data for amino sugars; numbers in the graph show basic statistical information. b Central lines in the boxes represent medians, left edges of the boxes represent first quartiles, right edges of the boxes represent third quartiles, whiskers indicate 1.5-fold inter-quartile ranges, and dots beyond whiskers represent outliers.

The aim of the present study was to investigate the relative importance of climate, geographical location, and soil physicochemical properties in predicting microbial residue accumulation, as well as to explore the variation in microbial residues with changes in environmental variables. We hypothesized that climate, geographical location, and soil physicochemical properties have different effects on microbial residue accumulation. We further hypothesized that their relative importance is in the order geographical location > climate > soil physicochemical properties, since differences in geographical location would lead to distinct climates, and different geographical location and climate conditions would have varying effects on soil physicochemical properties. Moreover, we speculated that the response patterns of microbial residues to different environmental variables are not always gradual. Hence, we hypothesized the occurrence of an optimal range or thresholds of environmental conditions for the accumulation of microbial residues.

Results

Geographical patterns of microbial residues

The concentrations of topsoil amino sugars ranged from 0.04 to 11.21 mg g−1 soil, with a mean of 2.25 ± 0.13 and a median of 1.68 mg g−1 soil (Fig. 1c). The Scheirer–Ray–Hare test, using ecosystem type and climate zone as fixed factors, revealed no significant difference in amino sugar concentrations between forests and grasslands (2.21 ± 0.22 and 2.29 ± 0.17 mg g−1 soil, respectively) (p = 0.111, Table 1). However, significant differences were detected between climatic zones (p = 0.019, Table 1). Among them, significantly higher amino sugar concentrations were detected in the temperate zone than in the subtropical zone (p = 0.031), while no significant difference in amino sugar concentrations was observed among other climatic zones. No significant interaction on amino sugar concentrations was found between vegetation types and climatic zones (p = 0.094, Table 1). Kruskal–Wallis tests across the five categories (subtropical grasslands, temperate grasslands, tropical forests, subtropical forests, and temperate forests) revealed significant differences in amino sugar concentrations between subtropical forests and temperate forests (p = 0.009) and between subtropical forests and temperate grasslands (p = 0.047, Fig. 1b). Globally, amino sugar concentrations were significantly correlated with SOC concentrations (Spearman’s correlation R2 = 0.490, p < 0.001, Fig. 2b).
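
For readers unfamiliar with the rank-based comparison used above, the following is a minimal sketch of how a Kruskal–Wallis H statistic can be computed. It is not the analysis code of this study: the function names are hypothetical, the tie correction is omitted for brevity, and the toy values are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic (no tie correction) for a dict {label: [observations]}."""
    labels, values = [], []
    for label, observations in groups.items():
        labels.extend([label] * len(observations))
        values.extend(observations)
    n_total = len(values)
    rank_sums = defaultdict(float)
    for label, rank in zip(labels, average_ranks(values)):
        rank_sums[label] += rank
    spread = sum(rank_sums[g] ** 2 / len(obs) for g, obs in groups.items())
    return 12.0 / (n_total * (n_total + 1)) * spread - 3 * (n_total + 1)


# Hypothetical toy values (mg g-1 soil), NOT data from this study.
toy_groups = {
    "subtropical forest": [1.0, 2.0, 3.0],
    "temperate forest": [4.0, 5.0, 6.0],
    "temperate grassland": [7.0, 8.0, 9.0],
}
h_stat = kruskal_wallis_h(toy_groups)
```

Under the null hypothesis, H is compared against a chi-squared distribution with (number of groups − 1) degrees of freedom; pairwise p values such as those reported above additionally require a post hoc procedure, which this sketch does not cover.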

Table 1 Scheirer–Ray–Hare test of the effects of ecosystem type, climate zone, and their interactions on soil amino sugar content (characterization of microbial residues).
Fig. 2: Models of relationships between environmental variables and amino sugars on a global scale.

a Structural equation model (SEM). b Correlations between environmental variables. c Standard total effect of SEM. d Random Forest model, which shows average predictive importance (mean square error (MSE) increase percentage) for each environmental factor for absolute content of soil amino sugars. SEM shows the causality and correlation between the absolute content of amino sugars in soil and the environmental variables. Red lines indicate positive effects; blue lines indicate negative effects. The thickness and color of the lines are directly proportional to the standardized path coefficient on the single arrow. SOC soil organic carbon. R2 represents the variance of biomarkers explained by the model. Asterisks indicate the significance of each predictor, with one and two asterisks indicating p < 0.05 and p < 0.01, respectively.

Predictors of microbial residues on a global scale

The Random Forest model showed that all the environmental variables considered in this study were important for predicting the concentrations of amino sugars (overall model: R2 = 0.768, p < 0.001; each environmental variable: p < 0.05, Fig. 2d). Our structural equation model (SEM) explained 65.4% of the variance in amino sugars (Fig. 2a). After accounting for the causal relationships between variables, SOC, soil carbon-to-nitrogen (C:N) ratio, and aridity index had the largest standard total effects (STEs) (0.929, −0.289, and −0.226, respectively; Fig. 2c).
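
As a reminder of how a total effect is usually assembled in a recursive path model, the standard definition can be written as follows. The notation is introduced here for illustration only and is an assumption, not taken from the original analysis: \(\beta_{ba}\) is the standardized coefficient of the path from \(a\) to \(b\), and \(\mathcal{P}_{X \to Y}\) is the set of indirect paths from predictor \(X\) to response \(Y\).

```latex
\mathrm{STE}(X \to Y)
  \;=\; \underbrace{\beta_{YX}}_{\text{direct effect}}
  \;+\; \underbrace{\sum_{p \,\in\, \mathcal{P}_{X \to Y}} \;\prod_{(a \to b) \,\in\, p} \beta_{ba}}_{\text{indirect effects}}
```

A predictor with a small direct path can therefore still have a large STE if it acts strongly through intermediate variables such as SOC.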

Nonlinear responses of microbial residues to drought, soil carbon-to-nitrogen ratio, and soil organic carbon

Three important environmental variables (SOC, soil C:N ratio, and aridity index) suggested by the SEM were selected for further analysis (Fig. 2c). Linear models, quadratic models, and generalized additive models (GAMs) were fitted to determine the relationships between amino sugar concentrations and environmental variables. Comparisons of Akaike information criterion (AIC) values among the models showed that the nonlinear relationships fitted by the GAMs best described the responses of amino sugars to aridity index, soil C:N ratio, and SOC concentration (Table 2 and Supplementary Fig. 1). We identified thresholds for the accumulation of amino sugars in the aridity index (0.768; 0.57 after ln (x + 1) conversion) and in the soil C:N ratio (9.583; 2.26 after ln (x) conversion) (Fig. 3a, b). Above these thresholds, the concentrations of amino sugars decreased significantly (Fig. 3c, d). According to the generalized climate classification scheme of aridity index values formulated by the United Nations Environment Program in 1997, the aridity threshold lies within the humid climate class (aridity index >0.65; Supplementary Table 1). The threshold of soil C:N ratio was between the lower quartile and the lower decile, and was less than the mean C:N ratio (14.16 ± 0.35) observed in our study (Supplementary Fig. 2). For SOC, the curvature at the threshold did not affect the original trend, and there was a linear increase in accumulation of amino sugars with SOC (Fig. 4).
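
The relationship between the thresholds on the original and log-transformed scales can be checked with a few lines of arithmetic. The sketch below uses only the values reported in the paragraph above; the constant and variable names are hypothetical.

```python
import math

ARIDITY_THRESHOLD = 0.768   # aridity index threshold reported in the text
CN_THRESHOLD = 9.583        # soil C:N ratio threshold reported in the text
HUMID_LOWER_BOUND = 0.65    # lower bound of the humid class cited in the text

# Aridity index was transformed as ln(x + 1); soil C:N ratio as ln(x).
aridity_ln = math.log1p(ARIDITY_THRESHOLD)
cn_ln = math.log(CN_THRESHOLD)

# Back-transformation recovers the thresholds on their original scales.
aridity_back = math.expm1(aridity_ln)
cn_back = math.exp(cn_ln)

# The aridity threshold falls in the humid class (aridity index > 0.65).
threshold_is_humid = ARIDITY_THRESHOLD > HUMID_LOWER_BOUND

print(f"ln(aridity + 1) = {aridity_ln:.2f}, ln(C:N) = {cn_ln:.2f}")
```

Rounded to two decimals, the transformed values agree with the 0.57 and 2.26 quoted for Fig. 3a, b.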

Table 2 The best model for amino sugar response for each variable.
Fig. 3: Models of amino sugar responses to aridity index and soil C:N ratio.

a, b Nonlinear responses of amino sugars to aridity index and soil C:N ratio. c, d Differences of predicted values of variables under aridity and soil C:N ratio thresholds. Data for each variable (a, b) are log-transformed. a, b Black, red, and blue solid lines represent the smoothed trendline fitted by the generalized additive model (GAM), and the linear fits on the left and right sides of each threshold, respectively. Black numbers and vertical dashed lines mark the identified thresholds. Violin graphs (c, d) show the bootstrap predictions for each variable on each side of each threshold (red plots: regression before the threshold; blue plots: regression after the threshold). Three asterisks indicate a significant difference according to the Mann–Whitney U-test before and after the threshold at p < 0.001.
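
The idea of separate linear fits on each side of a threshold can be illustrated with a simple breakpoint search. This is a swapped-in, simplified technique (an exhaustive two-segment least-squares search), not the GAM-based procedure used in this study; the function names are hypothetical and the data are synthetic.

```python
def ols_line(xs, ys):
    """Least-squares slope, intercept, and residual sum of squares.

    Assumes at least two distinct x values in the segment.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def best_breakpoint(xs, ys, min_points=3):
    """Return (threshold, left_slope, right_slope) minimizing the total RSS.

    The threshold is the largest x value assigned to the left segment.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for cut in range(min_points, len(pairs) - min_points + 1):
        left, right = pairs[:cut], pairs[cut:]
        if left[-1][0] == right[0][0]:
            continue  # do not split observations that share the same x
        left_slope, _, left_rss = ols_line(*zip(*left))
        right_slope, _, right_rss = ols_line(*zip(*right))
        total_rss = left_rss + right_rss
        if best is None or total_rss < best[0]:
            best = (total_rss, left[-1][0], left_slope, right_slope)
    return best[1], best[2], best[3]


# Synthetic response that rises up to x = 10 and declines afterwards.
xs = list(range(21))
ys = [x if x <= 10 else 21 - x for x in xs]
threshold, left_slope, right_slope = best_breakpoint(xs, ys)
```

A positive slope before the threshold and a negative slope after it correspond to the red and blue fits described in the legend; with noisy field data, the uncertainty of the breakpoint would additionally need to be assessed, for example by bootstrapping.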

Fig. 4: Models of amino sugars responding to soil organic carbon (SOC).

a Nonlinear responses of amino sugars to SOC. b Differences of predicted values of variables under SOC threshold. The data are logarithmically transformed; details are as described in the Fig. 3 legend.

Predictors of microbial residues in different ecosystem types and climatic zones

Since the majority of forest and grassland study sites (87.5% and 99.3%, respectively; Fig. 1) are located in subtropical and temperate zones, we conducted independent analyses for subtropical forests, temperate forests, subtropical grasslands, and temperate grasslands. Our Random Forest models suggested that SOC and aridity index were the most important predictors of amino sugars in subtropical grasslands, temperate grasslands, subtropical forests, and temperate forests (environment variable: p < 0.05, Fig. 5). Soil pH and C:N ratio also significantly predicted amino sugars in grasslands (p < 0.05, Fig. 5a, b), while the absolute latitude predicted amino sugars in forests (p < 0.05, Fig. 5c, d). In contrast to other categories, soil clay content was the most influential factor determining amino sugars in temperate forests (p < 0.05), while the effect of soil pH was not significant (Fig. 5).

Fig. 5: Random Forest models of different ecosystems and climatic zones.

a Subtropical grassland. b Temperate grassland. c Subtropical forest. d Temperate forest. Random Forest models showing the average predictive importance (mean square error (MSE) increase percentage) of each environmental factor in the absolute content of soil amino sugars (characterization of microbial residues). Asterisks indicate the significance of each predictor, with one and two asterisks indicating p < 0.05 and p < 0.01, respectively.

Our SEMs explained 88.2, 74.4, 74.6, and 62.4% of the variance in amino sugars in subtropical grasslands, temperate grasslands, subtropical forests, and temperate forests, respectively (Fig. 6). From the STEs of SEMs of all categories, we identified SOC as the main positive regulatory factor for amino sugars (Fig. 7). In grasslands, aridity index and soil pH were the next most important predictors (Fig. 7a, b), while the absolute latitude strongly predicted amino sugars in temperate grasslands (STE: 0.280, Fig. 7b). In subtropical forests, the effect of soil pH on amino sugars was second only to that of SOC, followed by soil C:N ratio and soil clay content (Fig. 7c). The soil C:N ratio in temperate forests showed a strong negative effect on amino sugars (STE: −0.272, Fig. 7d).

Fig. 6: Structural equation models (SEMs) for ecosystems and climatic zones.

a Subtropical grassland. b Temperate grassland. c Subtropical forest. d Temperate forest. The SEMs show the causality and correlation between the absolute content of soil amino sugars (characterization of microbial residues) and environmental variables. Red lines indicate positive effects, while blue lines indicate negative effects. The thickness and color of the line are directly proportional to the standardized path coefficient on the single arrow. SOC soil organic carbon. R2 represents the variance of biomarkers explained by the model. The asterisk (*) indicates the significance of the path; one, two, and three asterisks indicate p < 0.05, p < 0.01, and p < 0.001, respectively.

Fig. 7: Standard total effects of environmental variables on the absolute content of soil amino sugars.

Standard total effects (direct and indirect effects) in structural equation models of: a subtropical grassland, b temperate grassland, c subtropical forest, and d temperate forest.

Discussion

In terrestrial ecosystems, latitude significantly affects light intensity and thus photosynthesis in surface vegetation, resulting in the inhibition of soil C input and changes in soil C fluxes44,45,46. Globally, although the amino sugars closely related to SOC had a significant correlation with latitude (p < 0.05, Fig. 2b), in the present study, the correlation coefficient between them was markedly low (R2 = 0.044, Fig. 2b), and the SEM revealed low STE for latitude (0.030, Fig. 2c). This suggests a weak effect of latitude on the accumulation of amino sugars. However, we found that SOC, soil C:N ratio, and aridity index were the most robust predictors of amino sugars (Fig. 2c). Several studies have reported strong positive correlations between amino sugars and SOC47,48, with higher soil organic matter content associated with higher concentrations of soil microbial residues. Our results are consistent with these findings; although microbial necromass C accounts for a large proportion of SOC10, it does not completely determine the amount of SOC. However, SOC can be used as a predictor of microbial residue concentration.

Aridity is a key driver of biological and geochemical processes49,50. Increasingly arid climate conditions have altered the elemental (C, N, etc.) balance in ecosystems51, resulting in decreased available soil C and N, poor soil nutrient supply, and increased N limitation on the growth of plants and soil microorganisms51,52. This could cause a decline in microbial function, thereby affecting ecosystem sustainability53. Additionally, drought conditions weaken the evaporative cooling effect of plant leaves, reinforcing the effects of high temperatures54. Such relationships between soil water and temperature may lead to the production of heat waves55, further threatening microorganism survival. Although SOC increased significantly with decreased aridity (increase in aridity index; Supplementary Fig. 3a), the amino sugar concentrations did not increase continuously (STE: −0.226, Fig. 2c). Therefore, no significant linear relationship was observed between aridity index and microbial residues (Supplementary Fig. 3b). It is known that an increase in aridity results in limitation of water availability26,56, reduction in soil hydraulic conductivity57, formation of disconnected resource islands58, and reduction in metabolic activity and substrate utilization efficiency of soil microbial communities24,59. Moreover, aridity can reduce the capacity of microorganisms as decomposers to utilize SOC and plant litter for growth and reproduction, thus reducing the efficiency of accumulation of microbial residues25,60. Drought conditions can also reduce the production of plant litter and root biomass, reducing plant C input61. The consequent lack of available substrates can in turn delay the production and accumulation of microbial residues.

If the global climate situation improves and aridity stress decreases, soil water content could increase, and ecosystem processes could gradually recover. In such a context, the content of plant-derived C transformed into microbial-derived C by microbial utilization gradually increases, and the efficiency of transfer and stabilization of microbial residues to soil increases6,26. However, extremely humid climates and active microorganisms are not conducive to the accumulation of SOC (plant-derived or microbial-derived C, etc.). The movement of oxygen in the soil is limited by extremely high soil moisture content58. Under such conditions, the growth efficiency of microorganisms is reduced, resulting in lower microbial CUE, which in turn affects the production, accumulation, and stabilization of microbial residues in the soil.

Such environments are more disadvantageous to fungal communities. Because fungi are aerobic, whereas many bacteria can survive under anaerobic conditions62, microbial community structure shifts toward bacterial dominance under humid climates61. However, bacterial residues are more easily decomposed than fungal residues, which explains the gradual decrease in microbial residues27. Additionally, under wet conditions, the fast leaching channels are reused first61, and the leaching loss of available N entails additional N requirement by surface soil microorganisms to meet their growth needs61. The reduction in microbial growth efficiency and CUE reduces the efficiency of microbial-derived C stabilization. Moreover, soil microbial residues may move to deeper layers with leaching, sharply reducing microbial residues in surface soil.

Aridity indirectly affects the accumulation of microbial residues through changing the soil C:N ratio61. Changes in microbial residues with changes in this ratio occur after an irreversible critical point, which may also be considered a threshold. C and N are closely related to the growth and development of microorganisms in the biogeochemical cycle63. In an ecosystem with limited nutrients and resources, microorganisms put more energy into catabolism through the synthesis of extracellular enzymes, decomposing complex organic matter and releasing compounds suitable for plants and microorganisms64, rather than on anabolism (microbial biomass synthesis)65. The efficiency of microbial anabolism is weakened, thereby reducing microbial CUE66, the production efficiency of microbial-derived C, and the efficiency of long-term slow-cycling microbial-derived C accumulation in soil67, as well as ultimately reducing the relative contributions of microbial residues to SOC68. The contributions of microbial-derived C to the SOC pool could be enhanced by application of fertilizers with high N content to alleviate environmental pressure69,70. Excluding the case of intensive traditional farming activities, environments with sufficient nutrient supply can be considered to accelerate the accumulation of soil microbial residues68,71. Hence, the proportion of microbial residues in SOC is higher in soils with low C:N ratios37,72 due to relatively high soil nutrient content enabling greater anabolism in microorganisms. Moreover, unstable substrates (organic matter with low C:N ratio) can improve the CUE of microorganisms, which is conducive to the formation and accumulation of microbial necromass in mineral soils52,73.

We found that soil nutrient supply did not continue to promote the accumulation of microbial residues above a certain level (Fig. 3b). This may be explained by the effect of the soil C:N ratio on the bacteria:fungi ratio74. The distribution of fungi is strongly limited by the availability of resources, with low nutrition and acidic environments more conducive to bacterial than fungal growth74. However, fungal necromass constitutes the majority of microbial residues (Supplementary Fig. 4 and Supplementary Data 1). Hence, microbial residues decrease when conditions are not optimal for the accumulation of fungal residues. Alternatively, we speculate that this could be the result of interactions among drought, plants, and microorganisms.

Drought reduces plant N absorption, increasing ammonium N and nitrate N in the soil75. Furthermore, the weakening of microbial activity reduces the mineralization rate of N76, leading to a lower C:N ratio in the soil; this also explains the lower content of microbial residues in high-nutrient soil. To clearly identify the changes in microbial residues caused by drought, and the coupling between plants and microorganisms, we fitted the values of amino sugars using a GAM, which revealed a clear threshold for the reduction in microbial residues (Supplementary Fig. 5). Under increased drought conditions in terrestrial ecosystems, soil nutrients seem to become sufficient (p < 0.001, Supplementary Fig. 3c). In such a context, the microbial community structure, enzyme secretion characteristics, and microbial biomass formation reach their optima. The microbial carbon pump process, which is driven by microorganism turnover following growth and death in microbial communities6, then has the greatest effect, with microbial residues gradually reaching a peak. Once microbial communities reach the state in which stable C produced by anabolism is transferred to the soil with maximum efficiency, the accumulation of microbial residues in soil begins to slow down and then decreases sharply.

The dominant factors predicting microbial residue accumulation varied, but SOC and aridity index were key factors in all cases (Figs. 5 and 7). Although Random Forest analysis showed that soil C:N ratio was not important in the forest (Fig. 5c, d), it still had large STEs (Fig. 7c, d). Among the other environmental variables, soil pH had a considerable effect on microbial residues in all datasets, excluding temperate forests. Previous studies have shown that soil pH can control soil microbial community structure, affect the adsorption of soil minerals on soil organic matter, and even affect the recycling efficiency of microbial residues by microbial communities74,77. Therefore, dynamic changes in soil pH can affect microbially mediated C sequestration7. Moreover, we noticed that the effect of pH in subtropical forests was similar to that of SOC, which was significantly stronger than the effect of these factors in other regions. This may be because subtropical forest pH is relatively low, and the accumulation of microbial residues is more sensitive to changes in pH. Similarly, clay mineral content also plays an important role in stabilizing microbial residues26,78. The stable storage of SOC is related to both low decomposition as well as the protective mechanism of soil minerals79. Microbial residues are stabilized in fine-grained soil minerals by adsorption and binding, and physical protection prevents microbial residues from being reused by microbial communities69,79,80. Therefore, soil with higher clay mineral content generally has higher organic C content81. Moreover, clay can maintain the moisture and nutrient elements of the surface soil, resulting in greater surface microbial activity82,83. Hence, the effect of clay on the accumulation of microbial residues is not the same in different data sets due to these two factors.

Our research has some limitations. First, our criteria for data compilation were not perfect. For example, we calculated the mean for soil amino sugars for the 0–10 and 10–20 cm layers in some studies assuming equal weights, because we reasoned that in most studies, the same mass of soil was used for the determination of amino sugar content; however, this may not be the actual situation. In addition, although we expected the topsoil data to cover 0–20 cm, a small number of studies did not reach this soil depth, and we nevertheless included them in our synthesis. As the amino sugar concentrations decreased sharply with soil depth, these data may not be consistent with the amino sugar concentrations of soil at the depth of 0–20 cm.

Second, most of our data are from the northern hemisphere, especially from temperate and subtropical regions, with little to no data from boreal and tropical regions (Fig. 1). Although our selection criteria for reducing publication bias excluded some data, we were able to obtain heterogeneous data. This lack of data presents significant limitations on global estimates of microbial residue accumulation. Improvements in data collection and accurate estimation of such residues will be the focus of our future study.

Third, we used data from the global high-resolution (250 m) gridded soil properties database (http://data.isric.org) because some data were not provided in original studies. Such grid-prediction data may not match data from the actual research studies. Finally, there remains uncertainty about soil amino sugars as a biomarker for microbial necromass. Because soil must be hydrolyzed or digested before amino sugars can be detected33, the detected amino sugars include, to some degree, those present in intact cell biomass10; the measured content of soil amino sugars is therefore affected by microbial biomass to a certain extent. Future studies should explore the relationship between amino sugars and biomass to determine the accurate amino sugar content for the characterization of microbial residues.

In conclusion, our meta-analysis identified SOC and drought as the main factors influencing global and regional microbial residues. In the wake of global climate change, microbial communities could experience an optimal anabolic phase. In this phase, the soil C capture process mediated by the soil microbial carbon pump will exhibit the best response state. At the global scale, the optimum appears when the aridity index is ~0.768 or the soil C:N ratio is ~9.583, at which point the accumulation of microbial residues peaks. Above these levels, microbial residues decrease significantly. We suggest that this may be due to changes in the dominant microbial species in communities. Collectively, our results provide insights into the optimal conditions for microbial residue accumulation. Our findings provide a useful reference for monitoring and management of terrestrial C storage.

Methods

Data sources

We extensively searched the Web of Science (http://apps.webofknowledge.com) and the China Knowledge Resource Integrated Database (http://www.cnki.net) for scientific articles on microbial residues. Data were collected from field trials published before December 2020. Search terms were “amino sugars” or “microbial necromass” or “microbial residues,” combined with “grassland” or “forest” or “woodland”. To reduce the influence of publication bias, only studies meeting the following seven criteria were selected:

  1.

    As previous studies have shown that amino sugars, the biomarkers of microbial residues, exhibit maximum release after hydrolysis with hydrochloric acid for 6–8 h at 105 °C33, only data from articles using hydrolysis with 6 mol/L hydrochloric acid at 105 °C for 6–8 h and determination by gas or liquid chromatography were used. In the datasets we collected, only eight did not use the method of Zhang and Amelung33 (Supplementary Data 2); therefore, we ignored small differences between the methods for determining the concentration of amino sugars in different studies, such as derivatization, so that we could effectively compare and contrast data from different studies.

  2.

    Sampling depth was clearly defined. Our aim was to investigate the mineral topsoil, excluding the litter layer and O horizon soil. Since most of the studies were carried out at mineral soil depths less than 20 cm, we defined topsoil for this study as 0–20 cm. We selected all the studies that measured amino sugars at 0–20 cm depths. In some studies, one sample included both 0–10 cm and 10–20 cm layers. Since the mass of soil samples used to determine the concentrations of amino sugars in most studies was similar, we calculated the arithmetic mean of the two concentrations at the same point to represent the concentration of amino sugars at 0–20 cm depth. Similarly, with concentrations determined at 0–5 cm, 5–10 cm, and 10–15 cm depths in some studies, we used the arithmetic mean of the three concentrations.

  3.

    Where total amino sugar concentrations were not reported directly, studies that provided the contents of glucosamine, galactosamine, muramic acid, and mannosamine were used, as these four types of amino sugars are most commonly quantified32. Since the proportion of mannosamine is usually less than 4%32, its concentrations were not provided in some studies. Therefore, for these studies, we used measurements of the other three types of amino sugars, which are readily quantified, to calculate the total amino sugar concentrations32. Some studies only reported the concentrations of fungal necromass C and bacterial necromass C. We used two conversion factors to convert these to glucosamine and muramic acid concentrations10, and then summed them to calculate total amino sugar concentrations. Other studies only reported concentrations of glucosamine C, galactosamine C, muramic acid C, and mannosamine C. We converted these concentrations to the corresponding amino sugar concentrations using their relative molecular masses. See below for the detailed calculation method.

  4.

    All amino sugar concentration data were obtained from in situ measurements; laboratory incubation and model simulation data were discarded.

  5.

    Data from soil fractions with different aggregate sizes were not used; only those from bulk soil were used. Data from living microbial cells were not used.

  6.

    In manipulation experiments (e.g., warming, elevated CO2, and nitrogen addition), only the amino sugar concentrations from the control treatment were used.

  7.

    Data averaged over sampling points spread across large areas, for which accurate location information was not available, were not used.
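The depth-averaging rule in criterion 2 can be illustrated with a minimal Python sketch. The function name and the example concentrations are hypothetical and are not taken from the compiled dataset; the sketch assumes sub-layers of equal thickness, as in the cases described above.

```python
from statistics import mean


def topsoil_concentration(layer_values):
    """Arithmetic mean of amino sugar concentrations (mg/g soil)
    measured in equal-thickness sub-layers of the topsoil."""
    if not layer_values:
        raise ValueError("at least one layer concentration is required")
    return mean(layer_values)


# Hypothetical values for the 0-10 cm and 10-20 cm layers of one sampling point
two_layer_point = topsoil_concentration([1.8, 1.2])

# Hypothetical values for the 0-5 cm, 5-10 cm, and 10-15 cm layers
three_layer_point = topsoil_concentration([2.1, 1.5, 0.9])
```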

To study the factors affecting amino sugars in different communities, grassland and forest ecosystems were divided into tropical, subtropical, and temperate zones. If data from the same study were published in different journals, only the data from one of them were used to avoid pseudoreplication. Within a single study, amino sugar contents from different altitudes, grassland types, or forest types were treated as independent replicates in the global analysis.

We obtained 268 data points (Supplementary Fig. 6 and Supplementary Table 2) from 64 articles that met the criteria for global analysis, and three unpublished grassland data points (near Longnan City, Gansu Province, China [Supplementary Data 2, Ref. ID: 65])84. This included 1, 52, 95, 15, 40, and 65 data points from six categories: tropical grasslands, subtropical grasslands, temperate grasslands, tropical forests, subtropical forests, and temperate forests, respectively (Supplementary Data 2). At each study site, we also recorded other information from the original publications, including geographic (latitude, longitude, and altitude) and climate (mean annual temperature, mean annual precipitation, and aridity index) variables. Each dataset is indicated in the graph (Fig. 1a).

Other climate and soil attribute data sources

Since most published articles do not include complete climate and soil data, a global climate database (WorldClim, version 2.0) was used to obtain climate information (mean annual temperature and mean annual precipitation). Similarly, the global high-resolution (250 m) gridded soil properties database (http://data.isric.org) was used to obtain soil physical and chemical properties (i.e., clay content, pH, SOC content, and N content). Aridity index data were obtained from the Global Aridity and PET Database (http://www.cgiar-csi.org/data/global-aridity-and-pet-database). The Global Land Cover Characteristics Database v2.0 was used to obtain altitude data (https://lta.cr.usgs.gov/GLCC). To keep the predictors consistent with the amino sugar measurements, we ensured that the soil depth of the values taken from the database matched the depth of the sample points. These data were extracted using ESRI ArcMap (Environmental Systems Research Institute, Redlands, CA, USA).

Calculation of amino sugar concentrations

We calculated the absolute concentration of amino sugars (mg g−1 soil) in the surface soil layer (0–20 cm). The total amino sugar content equals the sum of the glucosamine, galactosamine, muramic acid, and mannosamine contents.

Some studies reported only the contents of fungal necromass C and bacterial necromass C in SOC, which are derived from the concentrations of glucosamine and muramic acid using two conversion factors. For these studies, the absolute concentrations of muramic acid (mg g−1 soil) and glucosamine (mg g−1 soil) were calculated as follows39:

$$\mathrm{Muramic\ acid}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)=\frac{\mathrm{Bacterial\ necromass\ C}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{SOC}\right)\times \mathrm{C}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)}{1000\times 45}$$
(1)
$$\mathrm{Glucosamine}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)=\left(\frac{\mathrm{Fungal\ necromass\ C}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{SOC}\right)\times \mathrm{C}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)}{1000\times 9\times 179.17}+\frac{2\times \mathrm{Muramic\ acid}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)}{251.23}\right)\times 179.17$$
(2)

In these equations, 45 is the conversion factor from muramic acid to bacterial necromass C, and 9 is the conversion factor from glucosamine to fungal necromass C. The molar ratio of muramic acid to glucosamine in bacterial cells is assumed to be 1:2 (refs. 39,85).
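As a worked illustration of Eqs. 1 and 2, the following Python sketch back-calculates muramic acid and glucosamine from necromass C. It is a minimal sketch rather than the code used in this study: the function and variable names are hypothetical, the input values are invented, and the term C (mg g−1 soil) is assumed to denote the soil organic C content of the sample.

```python
MW_GLUCOSAMINE = 179.17   # molecular mass of glucosamine as used in Eq. 2
MW_MURAMIC_ACID = 251.23  # molecular mass of muramic acid as used in Eq. 2
BACTERIAL_FACTOR = 45.0   # muramic acid -> bacterial necromass C
FUNGAL_FACTOR = 9.0       # glucosamine -> fungal necromass C


def muramic_acid_mg_per_g_soil(bacterial_c_mg_per_g_soc, soc_mg_per_g_soil):
    """Eq. 1: muramic acid (mg/g soil) from bacterial necromass C (mg/g SOC)."""
    return bacterial_c_mg_per_g_soc * soc_mg_per_g_soil / (1000.0 * BACTERIAL_FACTOR)


def glucosamine_mg_per_g_soil(fungal_c_mg_per_g_soc, soc_mg_per_g_soil, muramic_acid):
    """Eq. 2: glucosamine (mg/g soil) from fungal necromass C (mg/g SOC).

    The first term is fungal glucosamine and the second is bacterial
    glucosamine (two moles per mole of muramic acid), both in mmol/g soil.
    """
    fungal_mmol = (fungal_c_mg_per_g_soc * soc_mg_per_g_soil
                   / (1000.0 * FUNGAL_FACTOR * MW_GLUCOSAMINE))
    bacterial_mmol = 2.0 * muramic_acid / MW_MURAMIC_ACID
    return (fungal_mmol + bacterial_mmol) * MW_GLUCOSAMINE


# Invented example: 90 and 180 mg necromass C per g SOC, 50 mg SOC per g soil
mur = muramic_acid_mg_per_g_soil(90.0, 50.0)
glc = glucosamine_mg_per_g_soil(180.0, 50.0, mur)
total_amino_sugars = mur + glc  # only these two sugars can be recovered here
```

Dividing by 1000 converts mg SOC per g soil to g SOC per g soil, so that the product with necromass C (mg per g SOC) is expressed per gram of soil.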

In some studies, only glucosamine C, galactosamine C, muramic acid C, and mannosamine C were reported. Since muramic acid has nine C atoms (9 × 12 = 108 g C per mole), while the other amino sugars have six C atoms (6 × 12 = 72 g C per mole), the absolute concentration of each amino sugar (mg g−1 soil) was calculated as10:

$$\mathrm{Muramic\ acid}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)=\frac{\mathrm{Muramic\ acid\ C}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)\times 251.23}{108}$$
(3)
$$\mathrm{Glucosamine}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)=\frac{\mathrm{Glucosamine\ C}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)\times 179.12}{72}$$
(4)
$$\mathrm{Galactosamine}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)=\frac{\mathrm{Galactosamine\ C}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)\times 179.12}{72}$$
(5)
$$\mathrm{Mannosamine}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)=\frac{\mathrm{Mannosamine\ C}\left(\frac{\mathrm{mg}}{\mathrm{g}}\,\mathrm{soil}\right)\times 179.12}{72}$$
(6)
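Eqs. 3–6 share one form: amino sugar C is multiplied by the ratio of the molecular mass to the mass of C per mole. A minimal Python sketch under that reading is given below. The dictionary keys, function names, and example inputs are hypothetical, and the molecular masses are copied from the equations as printed.

```python
# Molecular masses as printed in Eqs. 3-6
MOLECULAR_MASS = {
    "muramic_acid": 251.23,
    "glucosamine": 179.12,
    "galactosamine": 179.12,
    "mannosamine": 179.12,
}

# Mass of C per mole: nine C atoms in muramic acid, six in the other amino sugars
CARBON_MASS = {
    "muramic_acid": 9 * 12.0,
    "glucosamine": 6 * 12.0,
    "galactosamine": 6 * 12.0,
    "mannosamine": 6 * 12.0,
}


def amino_sugar_from_carbon(name, carbon_mg_per_g_soil):
    """Convert amino sugar C (mg/g soil) to amino sugar mass (mg/g soil)."""
    return carbon_mg_per_g_soil * MOLECULAR_MASS[name] / CARBON_MASS[name]


def total_amino_sugars(carbon_by_sugar):
    """Sum the converted concentrations of all reported amino sugars."""
    return sum(amino_sugar_from_carbon(name, c) for name, c in carbon_by_sugar.items())


# Invented example values of amino sugar C (mg/g soil)
example = {"muramic_acid": 1.08, "glucosamine": 0.72}
example_total = total_amino_sugars(example)
```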

Statistical analysis

Microbial residues

Unless otherwise specified, all statistical analyses were conducted using R 4.0.3. Before conducting statistical analyses, we tested the normality of all data. The Scheirer–Ray–Hare test was used to assess differences in amino sugars among vegetation types and climate zones, and to test their interaction. Additionally, since there was only one data point for tropical grassland, differences in amino sugars among the remaining five categories (subtropical grasslands, temperate grasslands, tropical forests, subtropical forests, and temperate forests) were analysed using the Kruskal–Wallis test. Two-tailed Spearman’s rank correlation was used to explore global correlations between amino sugars and environmental variables. Statistical significance was assessed at p < 0.05.

Evaluating the importance of environmental variables

We used all the amino sugar data for model analysis. Random Forest is a machine learning algorithm for regression and classification that combines the predictions of many decision trees. We used Random Forest models to identify the most important environmental variables (absolute latitude, aridity index26, soil clay content, SOC, soil C:N ratio, and soil pH)86,87. Variable importance was evaluated across the ensemble of decision trees86. Since our purpose was only to determine the importance of predictors and not to make predictions, we used the whole dataset for analysis, without dividing it into training and test sets. These analyses were performed using the randomForest88 package in R 4.0.3 (http://cran.R-project.org/). The significance of the model and the cross-validation R2 were evaluated using the A3 package. Similarly, the rfPermute package was used to assess the significance of each predictor’s importance to amino sugars.

We used SEM to test whether the relationships between amino sugars and environmental factors remained unchanged when causal relationships among multiple environmental factors were considered simultaneously. Because SOC and N were strongly correlated (Spearman’s R = 0.91), only SOC was selected as the organic component of the SEM. Before performing the SEM, we log-transformed non-normal variables and standardized each variable using the Z-score transformation to improve the comparability of the data89. We built an a priori model (Supplementary Fig. 7) based on existing knowledge, and determined the final SEM through maximum likelihood estimation based on overall goodness-of-fit, including the chi-square (χ2) statistic, whole-model p value, goodness-of-fit index, and root-mean-square error of approximation90. Since some variables were non-normal, the Bollen–Stine bootstrap test was used to recalculate the overall fit of the model90. When the bootstrap p value was greater than 0.1, the model was considered to have a good fit90. To summarize the SEM results, we calculated the standardized total effect (STE) of each environmental variable. Since most studies on forests and grasslands (87.5% and 99.3%, respectively, Fig. 1) were located in subtropical and temperate zones, we only conducted independent Random Forest and SEM analyses on temperate and subtropical forests and grasslands. SEM analysis was performed using Amos 26.0 (Amos IBM, USA).
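The two preprocessing steps applied before the SEM, logarithmic transformation and Z-score standardization, can be sketched in Python as follows. The text does not state the base of the logarithm or whether the sample or population standard deviation was used, so the natural logarithm and the sample standard deviation are assumptions here, and the function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def log_transform(values):
    """Natural-log transform (assumed base); requires strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


def z_score(values):
    """Centre on the mean and scale by the sample standard deviation (assumed)."""
    centre = mean(values)
    spread = stdev(values)
    return [(v - centre) / spread for v in values]


# Invented, right-skewed SOC values (mg/g soil)
soc = [5.0, 8.0, 12.0, 20.0, 60.0]
soc_standardized = z_score(log_transform(soc))
```

After standardization each variable has a mean of 0 and a standard deviation of 1, so path coefficients estimated from different variables are expressed on a common scale.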

Linear and nonlinear responses of environmental variables

For the global data, we fitted linear and nonlinear (e.g., GAM91) regressions to the relationships between amino sugars and the variables with large effects in the SEM. The linear model assumes that amino sugars respond gradually to environmental variables92. The GAM allows the response along the environmental gradient to be nonlinear but continuous92. We selected the GAM to describe the complexity of nonlinear trends through smoothing parameters91. We then used the Akaike information criterion (AIC) to determine the best-fit model for each environmental variable92. In general, a difference in AIC values of more than 2 indicates that the models differ substantially, with the most likely model being the one with the lowest AIC value92.
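The AIC-based selection rule can be sketched in Python as follows. This is a minimal sketch, not the R code used in this study: the function names and the example AIC values are hypothetical, and the least-squares form of the AIC shown here omits an additive constant, so its values are comparable between models fitted to the same data but may not match those reported by other software.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    rss: residual sum of squares; n: number of observations;
    k: number of estimated parameters (effective degrees of freedom for a GAM).
    """
    return n * math.log(rss / n) + 2 * k


def select_by_aic(aic_by_model, min_delta=2.0):
    """Return the model with the lowest AIC and whether it beats the
    runner-up by more than min_delta (the difference of 2 used in the text)."""
    ranked = sorted(aic_by_model.items(), key=lambda item: item[1])
    (best_name, best_aic), (_, second_aic) = ranked[0], ranked[1]
    return best_name, (second_aic - best_aic) > min_delta


# Invented AIC values for one environmental variable
best_model, clearly_better = select_by_aic({"linear": 120.0, "gam": 112.5})
```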

Threshold detection

The existence of thresholds can be explored, and nonlinear trends characterized, only when a nonlinear model is suitable92. As described by Goffman et al.93 and Miguel et al.92, we fitted segmented regressions to search for continuous thresholds, at which the slope changes abruptly94, and step + segmented (stegmented) regressions to search for discontinuous thresholds, or breakpoints, at which both the intercept and the slope change94. In addition, when segmented or stegmented regressions are fitted to a relationship described by a GAM, they can reveal the point of maximum curvature of the fit92. This point can be considered a threshold because it marks the extreme of the amino sugar response to the environmental variable, even if the segmented or stegmented regression fits worse than the GAM92.

Therefore, for environmental variables for which GAMs fitted better than linear models, we fitted segmented and stegmented regressions. Each of these models provides a threshold value of the predictor at which the functional relationship changes (the slope for segmented regressions, or the slope and intercept for stegmented regressions94). We considered this to be the threshold of the GAM regression. We used the AIC to select the most suitable threshold model for the data. Segmented/stegmented and GAM regressions were fitted with the chngpt94 and gam packages in R 4.0.3, respectively.
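To make the idea of a continuous threshold concrete, the sketch below fits a segmented (hinge) regression by grid search in Python. It is a simplified illustration and not the algorithm implemented in the chngpt package: the model form y = b0 + b1·x + b2·max(x − c, 0), the candidate grid, the `trim` argument, and all function names are assumptions, and the data are synthetic.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_hinge(xs, ys, threshold):
    """Least-squares fit of y = b0 + b1*x + b2*max(x - threshold, 0).

    Returns the coefficients [b0, b1, b2] and the residual sum of squares.
    """
    rows = [[1.0, x, max(x - threshold, 0.0)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve_linear_system(xtx, xty)
    rss = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, ys))
    return coef, rss


def find_threshold(xs, ys, trim=2):
    """Grid search over observed x values, skipping `trim` values at each end,
    for the threshold that minimizes the residual sum of squares."""
    candidates = sorted(set(xs))[trim:-trim]
    best = min(candidates, key=lambda c: fit_hinge(xs, ys, c)[1])
    return best, fit_hinge(xs, ys, best)[0]


# Synthetic data: slope 0.5 below x = 5 and slope -1.0 above it
xs = [float(i) for i in range(11)]
ys = [1.0 + 0.5 * x - 1.5 * max(x - 5.0, 0.0) for x in xs]
threshold, coefficients = find_threshold(xs, ys)
```

A stegmented model would add a step term, for example an indicator of x > c, so that the intercept as well as the slope can change at the threshold.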

Verifying the importance of the determined threshold

To test whether the identified threshold significantly affected the intercepts of stegmented regressions, we fitted linear regressions on both sides of the threshold92. We then used the boot package in R to draw 1000 bootstrap samples of the intercepts before and after the threshold of each environmental predictor, and tested the difference using the Mann–Whitney U-test. The global map, fitted curves, and histograms of sample distribution in this study were all plotted using R 4.0.3.
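The bootstrap comparison of intercepts can be sketched in Python as follows. This is an illustration under stated assumptions rather than the R workflow used in this study: the resampling scheme (case resampling of x–y pairs), the normal approximation for the p value (without tie correction), the function names, and the seed are all assumptions.

```python
import math
import random


def ols_intercept(xs, ys):
    """Intercept of a simple least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx


def bootstrap_intercepts(xs, ys, n_boot=1000, seed=0):
    """Intercepts from n_boot case-resampled regressions."""
    rng = random.Random(seed)
    indices = range(len(xs))
    intercepts = []
    while len(intercepts) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        bx = [xs[i] for i in sample]
        if len(set(bx)) < 2:  # slope undefined for a single x value; redraw
            continue
        intercepts.append(ols_intercept(bx, [ys[i] for i in sample]))
    return intercepts


def mann_whitney_u(a, b):
    """U statistic for sample a and a two-sided p value from the
    normal approximation (no tie or continuity correction)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the tied ranks i+1 .. j
        rank_sum_a += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u, math.erfc(abs(z) / math.sqrt(2.0))
```

In use, `bootstrap_intercepts` would be called once for the observations below the threshold and once for those above it, and the two sets of intercepts passed to `mann_whitney_u`.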