Introduction

Historical drying trends have been demonstrated to occur over the land surface, mostly in the subtropics and midlatitudes, through diverse indicators such as aridity and drought indices, the extent of drylands, and the frequency and severity of drought events1,2,3,4,5,6,7,8,9,10,11. Such drying trends are also widely projected to continue during the twenty-first century, especially under the high greenhouse gas emission pathways1,2,3,4,5,6,8,9. However, the detailed magnitudes, statistical significance, and spatial patterns of the drying trends depend considerably on the choice of specific drought or aridity indices9,12,13, and on the sources of historical data (e.g., site measurements, satellite products, model outputs, or observational proxy data)14,15,16. Meteorological drought or aridity indices based on precipitation and potential evapotranspiration are widely used2,4,6,8,17, but their long-term trends are disconnected from the trends in vegetation growth or individual hydrological variables (e.g., leaf area index [LAI] and runoff)9,12,18. Also, the vast majority of past studies focused on Earth system model (ESM) simulations1,3,5,6,10, or used one or a few data sets (e.g., satellite observations, reanalysis, offline model simulations)2,7,8,9,11, resulting in high potential biases or uncertainty in drying or wetting except in a few hot-spot regions (e.g., Southern Europe, southern/southwestern North America, southern Africa)15,19.

The causes of terrestrial drying can be understood in terms of the effects of natural internal variability (e.g., teleconnections), natural solar variability and volcanic eruptions, and human-induced greenhouse gases and aerosol emissions. Previous studies16,17,20 have effectively separated these effects by using formal detection and attribution (D&A) methods21 on ESM predictions that have different combinations of these natural and anthropogenic factors turned on and off. However, these studies focused on either meteorological drought17,20 or agricultural drought over part of the globe16, and they did not reveal seasonal variations or vertical differences across soil layers. The D&A of seasonal multilayer soil moisture changes will add value to existing understanding of terrestrial drying because of the direct relevance of soil moisture to terrestrial biophysical and biogeochemical processes9,22,23,24 and land-atmosphere feedbacks25.

To date, the analysis of trends and the D&A analysis of soil moisture have been hampered by the limited availability of continuous, long-term, broadscale observations (e.g., multilayer soil moisture, soil moisture before the satellite era) and the low signal-to-noise (S/N) ratio of most water cycle changes11,26,27,28,29. We recently developed a set of long-term soil moisture data sets that were merged from a comprehensive list of observations, reanalysis, and offline model simulations and showed better performance than the source data sets30. These merged data sets can effectively reduce the potential biases and uncertainty caused by the limited sampling of source data sets in past studies2,7,8,9,11. In this study, we converted the merged data sets to a drought index (the 3-month Standardized Soil Moisture Index [SSI]; “Methods” and Supplementary Methods Sect. 1) and conducted a formal pattern-based D&A analysis16,17,21 using, on one hand, the average value of the merged SSI as pseudo-observation and, on the other hand, the SSI of the latest Coupled Model Intercomparison Project Phase 6 (CMIP6) historical and single-forcing experiments31,32. We further developed a generalized additive model (GAM)-based emergent constraint method11,33 to constrain the future S/N ratios of the D&A analysis16,17,21 and the trends in the SSI under a high-emission scenario (Shared Socioeconomic Pathway [SSP] 5-8.5). In the D&A analysis, we followed the practice of previous D&A studies on hydrological variables17,20,34,35,36 to enhance the S/N ratio by aggregating the pseudo-observed and simulated SSI to zonal averages. 
The zonal averaging reduces the influences from natural internal variability, which increases in importance at smaller spatial scales37, and the less-understood local-scale forcings such as land use and land cover change38, but retains the influences from large-scale circulations (e.g., the expansion of the Hadley cell39,40, the shifting of the Intertropical Convergence Zone17). We conducted the analysis for each month and both the surface (0–10 cm) and root-zone (0–100 cm) layers to reveal the seasonal and vertical patterns of possible anthropogenic influences on the SSI changes.

Results

SSI trends in the pseudo-observation and CMIP6 simulations

The global average 3-month SSI time series showed consistent drying signals from 1971 to 2016 in the pseudo-observation and the CMIP6 ALL simulations (forced by all anthropogenic and natural forcing agents) during April–September and October–March, for the surface and root-zone soil layers (Fig. 1a, b, e, f). The differences in SSI between the average values of the brightening decades (1987–2016) and the end of the dimming decades (1971–1986)41 (∆) ranged between −3.57 × 10⁻² and −1.54 × 10⁻², with larger ∆ occurring in the surface soil layer than in the root-zone layer during April–September in both the pseudo-observation and the ALL simulations (Fig. 1a, b, e, f). In the ALL simulations, the global historical drying trends continued into the future and were greater in the surface soil layer than in the root zone (Fig. 1a, b, e, f).

Fig. 1: Historical and future evolution of the 3-month Standardized Soil Moisture Index (SSI) in the 0–10 cm and 0–100 cm soil layers.

a, b, e, f The global mean time series of the October–March (ONDJFM) and April–September (AMJJAS) average SSI. Black lines represent the pseudo-observation (Mean NonCMIP). Blue lines and blue shading represent the average and 95% confidence intervals of the ALL simulations (forced by all anthropogenic and natural forcing agents). ∆ represents the difference between the mean SSI of 1987–2016 and 1971–1986. c, g The month-by-latitude SSI trends over 1971–2016 of the pseudo-observation. Vertical hatching indicates where the trends had the same signs as the average trends of the ALL simulations. Horizontal hatching indicates where the trends had the opposite signs to the average and were outside the 95% confidence intervals of the trends of the ALL simulations. The month abbreviations are, from left to right (J, F, M, A, M, J, J, A, S, O, N, D), in the order of January–December. Gray shading indicates where the trends were significantly different from zero at the 95% confidence level. d, h The month-by-latitude SSI trends over 1971–2016 of the ALL simulations, averaged over all the models. Vertical hatching indicates at least 90% of the ALL simulations agreed on the signs of the trends, and horizontal hatching 80%. Gray shading indicates where more than 50% of the ALL trends were significantly different from zero at the 95% confidence level. The month abbreviations are, from left to right (J, F, M, A, M, J, J, A, S, O, N, D), in the order of January–December.

Zonal patterns showed that the global average drying was mainly driven by the Northern Hemisphere between 20°N and 40°N and the Southern Hemisphere, where the pseudo-observation and the ALL simulations showed consistent drying trends in both the surface and the root-zone layers (Fig. 1c, d, g, h). The pseudo-observation showed more wetting and less drying than the ALL simulations above 40°N, but these differences were mostly within the uncertainty caused by natural internal variability and structural differences across the CMIP6 ESMs (i.e., within the 95% confidence interval [CI] of the ALL simulations; Fig. 1c, d, g, h). The major differences (i.e., pseudo-observation outside the 95% CI of the ALL simulations) around 60°N in the Northern Hemisphere spring were caused by differences in eastern Canada and western Europe (Supplementary Fig. 2), which were likely related to the overestimation of the increasing trends in air temperature42 and potential evapotranspiration in the ALL simulations (Supplementary Fig. 3). The pseudo-observation also had major differences from the ALL simulations in a few months near 30°N, and in February–July in 0°N–20°N (Fig. 1c, d, g, h). The former difference may be caused by the underestimated rate of the expansion of the Hadley cell by the ALL simulations39,40 or model biases in the monsoon-related precipitation in northern India (Supplementary Fig. 3; Singh et al.43). The latter may be caused by biases in the merged soil moisture data sets for the pseudo-observation in the Sahel region. The merged soil moisture data sets partially depended on the CERA20C, ERA20C, ERA-Interim, and ERA5 reanalyses30, whose precipitation drivers all display negative biases in the temporal trends over the Sahel region compared to gridded rain gauge observations44 (Supplementary Fig. 4).

D&A analysis of the zonally averaged 3-month SSI

In this study, we used a pattern-based D&A method to investigate the influence of external forcings on the pseudo-observed SSI changes. In the first step, for each soil layer and each month of the year, we calculated the ALL fingerprint, which represents the mode-based spatial signature of SSI changes in response to the combination of anthropogenic and natural forcings (“Methods” and Supplementary Methods Sect. 2). The ALL fingerprint was defined as the leading empirical orthogonal function (EOF) of the multi-model average zonal-mean SSI anomalies over the 1971–2100 period, derived from the concatenated CMIP6 historical and future climate simulations forced by changes in all the anthropogenic and natural forcings (Supplementary Table 2). The combined month-latitude fingerprint patterns (Fig. 2a, e) resemble, for both soil layers, the month-latitude patterns of the ALL trends (Fig. 1d, h), indicating that the fingerprints captured the main spatiotemporal characteristics of the zonally averaged 3-month SSI changes. The greatest drying responses in the ALL fingerprints occurred in the summer season (i.e., June–August in the Northern Hemisphere and December–February in the Southern Hemisphere) (Fig. 2a, e), which were slightly shifted compared with the months with greatest drying in the ALL trends (Fig. 1d, h). To verify that the seasonal shifts were not caused by differences in the time periods used to calculate the fingerprints, we calculated the fingerprints over 1971–2020 (ALL-2 fingerprints; Fig. 2b, f) instead of 1971–2100 and found similar seasonal shifts. In comparison with the ALL fingerprints (derived from 84 concatenated simulations), the GHG fingerprints (derived from 30 historical simulations forced by anthropogenic greenhouse gases only) showed more widespread drying trends, and the AER fingerprints (derived from 28 historical simulations forced by anthropogenic aerosols only) showed more wetting trends (Fig. 2c, g, d, h; Supplementary Table 2).
These differences suggest that both the GHG and AER forcings influenced the ALL fingerprints. We note that during January–June, the AER fingerprints were quite different from the AER trends and displayed discontinuity compared to July–December (Fig. 2d, h; Supplementary Fig. 5). These were caused by the lack of clear directional changes in the AER-forced changes in SSI, as indicated by the mostly insignificant trends in the principal components associated with the AER fingerprints in these months (Supplementary Fig. 6).
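
The fingerprint definition above (the leading EOF of zonal-mean anomalies) can be illustrated with a minimal, dependency-free sketch. It is not the code used in this study: the function name `leading_eof`, the year-by-latitude input layout, and the use of power iteration instead of a full eigen-decomposition are all assumptions made for illustration.

```python
import math
import random

def leading_eof(anomalies, n_iter=500, seed=0):
    """Leading EOF (unit-length latitude pattern) of a year-by-latitude
    anomaly matrix, found by power iteration on the covariance matrix.
    Hypothetical helper; the sign of an EOF is arbitrary."""
    n_time, n_lat = len(anomalies), len(anomalies[0])
    # Remove the time mean of each latitude band.
    means = [sum(row[j] for row in anomalies) / n_time for j in range(n_lat)]
    centered = [[row[j] - means[j] for j in range(n_lat)] for row in anomalies]
    # Latitude-by-latitude sample covariance matrix.
    cov = [[sum(r[i] * r[j] for r in centered) / (n_time - 1)
            for j in range(n_lat)] for i in range(n_lat)]
    # Start from a reproducible pseudo-random vector.
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(n_lat)]
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(n_lat))
               for i in range(n_lat)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:  # no variance left to iterate on
            break
        vec = [x / norm for x in new]
    return vec
```

Power iteration converges to the eigenvector with the largest eigenvalue of the covariance matrix, which is the leading EOF. An SVD-based routine would be the more common choice for large matrices.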

Fig. 2: Month-by-latitude fingerprints of the CMIP6 simulations under different forcings for the 3-month Standardized Soil Moisture Index (SSI) in the 0–10 cm and 0–100 cm soil layers.

a–h The forcing abbreviations are: ALL and ALL-2, forced by all anthropogenic and natural forcing agents; GHG, forced by anthropogenic greenhouse gases only; AER, forced by anthropogenic aerosols only. The ALL fingerprints were calculated on zonally averaged 3-month SSI over the 1971–2100 period, and the ALL-2, GHG, and AER fingerprints over the 1971–2020 period. The month abbreviations are, from left to right (J, F, M, A, M, J, J, A, S, O, N, D), in the order of January–December.

The second step of the D&A method determines whether the ALL fingerprints were statistically detectable in the pseudo-observed SSI changes (“Methods” and Supplementary Methods Sect. 2). To achieve this goal, for each soil layer and each month, we projected the pseudo-observed zonal SSI during 1971 to 2016 onto the ALL fingerprint, resulting in a 46-year time series that reflected the spatial agreement between the ALL fingerprint and the pseudo-observed SSI through time. The trend in this time series constituted the 46-year pseudo-observed signal. We also projected the zonal SSI of the concatenated control-run simulations, which were only influenced by natural internal variability, onto the ALL fingerprints. Note that the SSI patterns from the control simulations should not resemble the ALL fingerprint, except by chance. Using all the overlapping 46-year segments in the projected control time series, we calculated a probability distribution of unforced 46-year trends. We considered a pseudo-observed signal to be detectable at a 95% confidence level when the signal lay outside the two-sided 95% CI of the unforced trends, which indicates that the signal is very unlikely to result from internal variability alone. In the last step of the D&A, for each soil layer and each month, we projected the SSI from historical simulations under 6 different sets of forcing agents onto the ALL fingerprint. We then calculated the 1971 to 2016 trends and obtained the 6 probability distributions of simulated forced signals. We considered a detected signal to be attributable to a specific set of forcing agents if the pseudo-observed signal lay within the 95% CI of the distribution of the correspondingly forced simulated signals. The 6 sets of forcing agents we considered were: ALL, GHG, AER, ANT (anthropogenic forcings only), GHGAER (anthropogenic greenhouse gases and aerosols only), and NAT (natural solar and volcanic forcings only) (Supplementary Table 2).
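
The projection, detection, and attribution steps described above can be sketched as follows. This is an illustrative outline rather than the study's implementation: all function names are hypothetical, the trend is an ordinary least-squares slope, and both confidence intervals are taken from Gaussian fits, which is an assumption here.

```python
from statistics import NormalDist, mean, stdev

def ols_slope(y):
    """Least-squares linear trend of y per time step."""
    t_mean = (len(y) - 1) / 2
    y_mean = mean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(len(y)))
    return num / den

def project(fields, fingerprint):
    """Project each year's zonal-mean SSI (one value per latitude band)
    onto the fingerprint, giving one number per year."""
    return [sum(f * w for f, w in zip(field, fingerprint)) for field in fields]

def unforced_trends(control_projection, window=46):
    """Trends of all overlapping `window`-year segments of the projected
    control-run series."""
    return [ols_slope(control_projection[i:i + window])
            for i in range(len(control_projection) - window + 1)]

def is_detectable(signal, noise_trends, level=0.95):
    """True if the signal lies outside the two-sided CI of the unforced
    trends (assumption: Gaussian-distributed unforced trends)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return abs(signal - mean(noise_trends)) > z * stdev(noise_trends)

def is_attributable(signal, forced_signals, level=0.95):
    """True if the signal lies within the two-sided CI of a Gaussian
    fitted to the simulated forced signals."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return abs(signal - mean(forced_signals)) <= z * stdev(forced_signals)
```

A typical call sequence would be `signal = ols_slope(project(obs_fields, fingerprint))`, then `is_detectable(signal, unforced_trends(project(control_fields, fingerprint)))`, and finally one `is_attributable` check per set of forcing agents. The variable names in these calls are placeholders.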

We describe here the results of this investigation. For the surface soil layer, the 1971–2016 pseudo-observed signals were detectable from August to November. During those months, the pseudo-observed signals were within the 95% CIs of the distributions of the ALL, ANT, GHGAER, and GHG-forced signals and thus attributable to those sets of forcings (Fig. 3h–k). Those detected pseudo-observed signals were also often in better agreement with (i.e., closer to the mean of) the ALL and ANT distributions than with the GHGAER and GHG distributions (Fig. 3i–k). The surface pseudo-observed signals could not be attributed to the NAT or AER forcings (Fig. 3a–k). For the root-zone soil layer, the pseudo-observed signals were detectable and attributable to the ALL, ANT, GHGAER, and GHG forcings in September–April (Fig. 3m–p, u–x). The pseudo-observed signals could not be attributed to the NAT forcings during those months, nor to the AER forcings in October–December and March–April (Fig. 3m–p, u–x). In September, January, and February, the pseudo-observed signal was located within the 95% CI of the distribution of the AER-forced signals, but its position near the upper tail of the distribution did not warrant formal attribution. In summary, the soil moisture signals were detectable and attributable to the ALL forcing (but also to the ANT, GHGAER, and GHG forcings) for a 4-month period (from August to November) for the surface layer, and a longer 8-month period (September to April) for the root-zone layer.

Fig. 3: Monthly detection and attribution of the pseudo-observed signals over 1971–2016 in the 0–10 cm and 0–100 cm soil layers.

a–x Vertical black lines represent the pseudo-observed signals. Gray shaded vertical regions represent the 95% confidence intervals of the unforced trends. Bell-shaped lines represent the fitted Gaussian distributions on the simulated signals of the various forced CMIP6 simulations: ALL (black solid, forced by all anthropogenic and natural forcing agents), ANT (brown dashed, forced by anthropogenic forcings only), GHGAER (purple dashed, forced by anthropogenic greenhouse gases and aerosols only), GHG (magenta solid, forced by anthropogenic greenhouse gases only), AER (blue solid, forced by anthropogenic aerosols only), NAT (green dotted, forced by natural solar and volcanic forcings only). Shading beneath the bell-shaped lines represents the two-sided 95% confidence intervals of the distributions. All the signals and unforced trends were on the ALL fingerprints and were for the same window length (46 years).

To quantify the relative strengths of the detected pseudo-observed signals in each month, we summarized the temporal evolution of the pseudo-observed signals using the concept of detection time45 (“Methods”). The earliest detection time occurred in October for the surface layer and in December for the root-zone layer (Table 1). The latest detection time occurred in August for the surface layer and in January and March for the root-zone layer (Table 1). These results suggested that the detected anthropogenic influences had the strongest presence in the pseudo-observation in autumn, and weaker presence in summer and spring.

Table 1 Detection times at which the pseudo-observed signals on the ALL fingerprints became significant at the 95% confidence level

Using the ALL fingerprint, we further conducted a systematic sensitivity analysis on the D&A by altering the timescale, distributional fit, and soil moisture data sets used in calculating the SSI, and the set of model ensemble members for calculating the fingerprints, signals, and unforced trends (Supplementary Methods Sect. 3). Although the months in which the pseudo-observed signals were detected and the detection times of those signals varied slightly, the results generally support our conclusions (Fig. 3, Table 1; Supplementary Tables 4–7).

Evolution of the anthropogenic signals in the twenty-first century

Following the D&A on historical signals, we investigated the future presence of anthropogenic forcings using the future S/N ratios of the ALL simulations under the Shared Socioeconomic Pathway 5-8.5 (SSP5-8.5). In those S/N ratios, the signals (S) refer to the simulated ALL-forced signals, obtained as trends in 46-year moving windows in the ALL simulations during 1971–2100. The noise (N) refers to the standard deviation of all the 46-year unforced trends obtained from the projected control run time series. We considered the future S/N ratios to mean significant presence of anthropogenic forcings when the absolute values of those S/N ratios were 1.96 or greater, which was consistent with the detection criterion for the pseudo-observed signal, if one assumes that the unforced trends followed a Gaussian distribution (Supplementary Methods Sects. 2.5–2.6).
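
As a rough illustration of the moving-window S/N calculation described above, the sketch below computes one S/N ratio per 46-year window and locates the first window that meets the 1.96 threshold. The function names and the input layout (one projected value per year) are assumptions, not the study's code.

```python
from statistics import mean, stdev

def ols_slope(y):
    """Least-squares linear trend of y per time step."""
    t_mean = (len(y) - 1) / 2
    y_mean = mean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(len(y)))
    return num / den

def moving_window_sn(projected_all, unforced_trends, window=46):
    """S/N ratio for every overlapping window of the projected ALL series.
    S is the window trend; N is the standard deviation of unforced trends."""
    noise = stdev(unforced_trends)
    return [ols_slope(projected_all[s:s + window]) / noise
            for s in range(len(projected_all) - window + 1)]

def first_significant_window(ratios, threshold=1.96):
    """Index of the first window with |S/N| >= threshold, or None."""
    for start, ratio in enumerate(ratios):
        if abs(ratio) >= threshold:
            return start
    return None
```

With a 130-year series (1971–2100) and a 46-year window, this yields 85 ratios, one for each period from 1971–2016 to 2055–2100.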

The raw future ALL-forced S/N ratios were likely too high, considering that the average historical ALL-forced signals were often too high compared with the pseudo-observed signals (Fig. 3), and that the ESM-based noise estimates and the ALL-forced signals have a common source of error, i.e., the same model physics. To avoid drawing overly strong conclusions about the future presence of anthropogenic forcings, we constrained the raw future S/N ratios using a GAM-based emergent constraint method (“Methods”). Emergent constraint is a well-accepted approach to reduce the uncertainty in future climate projections by bringing in information from historical observations while making use of the modeled historical-future relationships11,46,47. Here, the historical observations were the pseudo-observed S/N ratios, and the modeled historical-future relationships were between the historical and future S/N ratios of the ALL simulations. The detailed formula and physical justifications of the specific implementation of emergent constraint here are in the “Methods” and Supplementary Methods Sect. 4.
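
For orientation, the sketch below shows the conventional linear form of an emergent constraint for a single future period. It is not the GAM-based method developed in this study. It fits a straight line between the historical and future S/N ratios across models and evaluates that line at the pseudo-observed historical value. The function name and the example numbers are hypothetical.

```python
from statistics import mean

def linear_emergent_constraint(hist_sn, future_sn, observed_sn):
    """Conventional linear emergent constraint for one future period.
    hist_sn, future_sn: one S/N ratio per model (same model order).
    observed_sn: the pseudo-observed historical S/N ratio.
    Returns the constrained future S/N ratio."""
    h_mean, f_mean = mean(hist_sn), mean(future_sn)
    # Ordinary least-squares slope and intercept across models.
    slope = (sum((h - h_mean) * (f - f_mean)
                 for h, f in zip(hist_sn, future_sn))
             / sum((h - h_mean) ** 2 for h in hist_sn))
    intercept = f_mean - slope * h_mean
    return intercept + slope * observed_sn

# Made-up example: models that dry strongly in the past also dry strongly
# in the future, and the observation sits at the weak end of the models.
constrained = linear_emergent_constraint(
    hist_sn=[-4.0, -3.0, -2.0, -1.0],
    future_sn=[-9.0, -7.0, -5.0, -3.0],
    observed_sn=-1.5,
)
```

In this made-up case the raw multi-model mean of the future S/N ratios is −6, while the constrained value is weaker in magnitude because the observed historical ratio is weaker than most models.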

The constrained simulated S/N ratios of the surface soil layer did not become significant at the 95% confidence level (i.e., did not exceed ±1.96) in December–July until the 2000s or later, but were generally significant in August–November (Fig. 4). The constrained S/N ratios of the root-zone layer were mostly significant in September–May but were not significant in June–August until 2020 or later (Fig. 4). These seasonality patterns were consistent with the historical seasonality in the detection times of the pseudo-observed signals (Table 1). The constrained S/N ratios of the surface soil layer increased over the future period in all the months, suggesting increased aridity, whereas the constrained S/N ratio of the root-zone soil layer only showed increases in May–August, and showed fluctuations in the other months (Fig. 4).

Fig. 4: Evolution of the raw and constrained signal-to-noise (S/N) ratios over the future periods 1971–2016, 1972–2017, …, 2055–2100.

a–l The x-axis indicates the starting years of the future periods. Dark blue (purple) solid lines represent the constrained S/N ratios of 0–10 cm (0–100 cm). Shaded areas bounded by dark blue (purple) solid lines represent the two-sided 95% confidence intervals of the constrained S/N ratios of 0–10 cm (0–100 cm), which were calculated using regression results, not the Gaussian assumption. Light blue (orange) dashed lines represent the average of the raw S/N ratios of 0–10 cm (0–100 cm). Shaded areas bounded by light blue (orange) dashed lines represent the two-sided 95% confidence intervals of the raw S/N ratios of 0–10 cm (0–100 cm). Horizontal solid lines indicate the 1.96 threshold, above which the S/N ratios mean significant presence of anthropogenic forcings at the 95% confidence level. The signals in the S/N ratios were ALL-forced (i.e., forced by all anthropogenic and natural forcing agents), on the ALL fingerprints, and for 46-year window lengths. The noises in the S/N ratios were defined as the standard deviation of the unforced trends on the ALL fingerprints for the 46-year window length.

We further analyzed the implications of constraining the future S/N ratios on the zonal trends in SSI using mathematical relationships between the S/N ratios and the trends (Supplementary Methods Sect. 5). For the surface soil layer, the adjusted future SSI trends based on the constrained S/N ratios had smaller drying magnitudes, especially during January–June, than the unadjusted average SSI trends of the ESMs (compare the last row to the first row of Supplementary Fig. 10). However, even after emergent constraint, the surface 3-month SSI showed accelerating drying trends over the course of the twenty-first century for nearly all the latitudes and months of the year (last row of Supplementary Fig. 10). For the root-zone SSI, the adjusted future SSI trends based on the constrained S/N ratios generally had smaller drying trends than the unadjusted mean SSI trends and showed a reversal from drying to wetting trends in June–July in the Southern Hemisphere (compare the last row to the first row of Supplementary Fig. 11). The accelerating wetting around 50°N during January–March and accelerating drying during October–April in the northern high latitudes and southern mid-latitudes in the root-zone SSI were consistent between the adjusted and unadjusted future SSI trends (Supplementary Fig. 11).

Discussion

Using multi-source merged soil moisture data sets48 and the CMIP6 simulations31,32, we verified the previously reported historical and future drying of the global land surface1,2,3,4,5,6,7,8,13,14,15,49,50 and further demonstrated clear vertical, zonal, and seasonal patterns. The changes in the SSI in the two soil layers were less widespread than previously reported changes in potential evapotranspiration51, which is consistent with past findings about the overestimation of drought by potential evapotranspiration changes12,13. The surface SSI dried more rapidly than the root-zone SSI, as can be seen in the historical differences between the brightening and dimming decades during April–September (the Δ values in Fig. 1a, b, e, f), the future time series under the SSP5-8.5 scenario (Fig. 1a, b, e, f), the future evolutions of S/N ratios (Fig. 4), and the future zonal trends especially around 50°N in the Northern Hemisphere spring (Supplementary Figs. 10–11). One potential cause of these vertical divergences is that the surface soil responds faster than the root-zone to meteorological conditions because of the concentration of plant roots in the surface soil and the slow speed of capillary rise27,52. In the Northern Hemisphere spring, the impacts of the increasing atmospheric evaporative demand can be exacerbated on the surface SSI by the decreasing snow cover but mitigated on the root-zone SSI by the relatively low spring vegetation activity. This contrasting effect of snow cover was supported by the fact that the correlations between snow water equivalent and SSI had opposite signs between the surface and the root-zone soil layers in the Northern Hemisphere spring (Supplementary Fig. 12).
Also, reduced stomatal conductance in response to drought and the future increase in atmospheric carbon dioxide concentration53 mitigates the impact of rising temperatures on transpiration, which affects both the surface and root-zone soil moisture; but no such mitigation effect exists for soil evaporation, which mainly affects the surface layer. This mechanism is supported by the higher correlations between air temperature and the surface SSI than the root-zone SSI (Supplementary Fig. 12). Under severe drought, vegetation die-off may also exacerbate the drying of the surface SSI due to less shading.

Although the historical and future SSI trends and the fingerprints of the ALL simulations indicated mostly drying of the two soil layers, the underlying mechanisms varied by month and latitude. In the northern mid- to high latitudes (20°N and above), the seasonal drying pattern was mainly driven by strong increases in air temperature, and secondarily by increasing leaf area index in summer and early autumn and decreasing snow water equivalent in spring (Supplementary Fig. 13). In the northern subtropics (0°N–20°N), increases in precipitation were the main contributor to the summer wetting (Supplementary Fig. 13). In the Southern Hemisphere, increases in temperature were the main contributor to the drying, followed by increasing leaf area index and decreasing snow water equivalent below 40°S (Supplementary Fig. 13).

The detection of significant differences of the pseudo-observed signals from natural internal variability and the attribution of the detected signals to greenhouse gases-dominated anthropogenic forcings (ALL, ANT, GHGAER, and GHG) in the surface 3-month SSI in August–November and in the root-zone 3-month SSI in September–April (Fig. 3) expanded the conclusions of previous D&A research on drought11,16,17,20,28,29. Whereas the previous studies demonstrated significant anthropogenic impacts on the annual or summer (either Northern Hemisphere summer or locally defined) average drying, we demonstrated seasonally varying impacts that were the greatest in the transition months (August–November) but also occurred in the root-zone in the boreal winter and spring months (Fig. 3). These seasonal patterns suggest that the traditional focus on annual or summer average drying may lead to underestimation of the drought risks in late summer and autumn, which can still affect ecosystem functions and reservoir operations54,55, and neglect of the increase in flood risks in the northern high-latitude spring (Fig. 2). Some previous D&A studies showed that the AER forcing impacted annual and summer drought before the 1980s16,17. This study could not reliably attribute the detected signals to the AER forcing because the pseudo-observed signals were very near the upper edge of the 95% CI of the AER-forced signals (Fig. 3). This finding may be because much of the studied historical period (1971–2016) was post-1980. Considering the regionally non-uniform changes in aerosol emissions56, the complexity of aerosol effects, and current model inadequacies57, future studies are needed to better detect, attribute, and understand the mechanisms of AER-forced soil moisture changes.

In recognition of the biases in the S/N ratios of the ALL simulations compared to the pseudo-observation (Fig. 3), we developed a GAM-based emergent constraint approach to better estimate the simulated future S/N ratios than the average of the ESMs (“Methods”). We further quantified the implications of constraining the future S/N ratios on the future SSI trends using mathematical relationships between the S/N ratios and the SSI trends (Supplementary Methods Sect. 5). Bias-correcting the future projection based on historical D&A results is a common practice in optimal fingerprinting-based D&A58,59, but the practice cannot be applied to pattern-based D&A16,17,21 because of different mathematical formulas. The developed GAM-based emergent constraint approach is similar to conventional emergent constraint in that it estimates linear relationships between modeled historical and future variables11,46,47. But the traditional linear regression framework of emergent constraint11,46,47 requires a separate linear regression to be fitted per future period, while a single GAM can be fitted on all the future periods and ensures that the constrained future S/N ratios form a smoothly varying time series (“Methods”). A limitation of GAM is that it can only test the overall significance of the fitted term60 and cannot test whether the linear relationship between historical and future S/N ratios becomes less significant over time (e.g., reported by Winkler et al.47 because of nonlinear physical relationships). Therefore, future studies should explore more advanced statistical testing methods to find potential temporal changes in the significance of the emergent relationship. Despite this limitation, the GAM reconciled the difference between the modeled and the pseudo-observed S/N ratios, was parsimonious, and achieved nearly always better fit on the future S/N ratios than fitting a separate linear regression per future period (Supplementary Fig. 8). 
Therefore, the developed approach provides a reasonable framework for pattern-based D&A studies to account for the model biases in future S/N ratios.

Apart from the GAM framework, additional limitations exist in the current study. We only used the leading fingerprint in the D&A and the emergent constraint, but some anthropogenic signals may exist in other fingerprints17. The pseudo-observation contains some bias in the Sahel region (Supplementary Fig. 4) because of the limitations of the source data sets. The CMIP6 ESMs do not fully or consistently simulate interactive atmospheric chemistry31, dynamic vegetation61, and land use and land cover change62. These limitations should be addressed with future methodological and data advancements.

In summary, we identified significant human contributions to global SSI-based drying of the surface soil in August–November and the root-zone soil in September–April over 1971–2016. The drying mainly occurred in the northern and southern mid-latitudes, and in the summer and autumn seasons in the northern high-latitudes; counteracting wetting occurred in the northern subtropics and in spring in the northern high-latitudes. The anthropogenic impacts were mainly contributed by greenhouse gas emissions. Pseudo-observation constrained future S/N ratios and SSI trends under the SSP5-8.5 scenario suggested accelerating drying in the surface soil and in the root-zone soil, except in the spring in the northern high latitudes, where the root-zone SSI showed accelerating wetting. These heterogeneous SSI changes point to greater risks of drought and floods in the future, suggesting the need for latitude- and seasonally dependent mitigation and adaptation measures. By revealing detailed spatiotemporal patterns of human forcings’ impacts on long-term SSI changes, this study advanced current understanding of the changes and causes of terrestrial aridity, and generated results and methodological developments that will be of interest to the scientific community and the broader public.

Methods

We used the SSI for the D&A rather than the raw soil moisture because the magnitudes of soil moisture in ESMs are highly dependent on model-specific assumptions about soil properties, evapotranspiration, runoff, and drainage, whereas the temporal variabilities are more robust63. For the D&A analysis in the main text, we calculated the surface and root-zone SSI at the timescale of 3 months using a distributional fit procedure that involved the Gaussian mixture distribution (Supplementary Methods Sect. 1). The calculated 3-month SSI of each month reflects the average soil moisture conditions of the current month and the previous two months (e.g., the 3-month SSI in February reflects the average soil moisture over the previous year’s December and the current year’s January and February). The pseudo-observed SSI, abbreviated as Mean NonCMIP and spanning 1971–2016, was the average SSI derived from three merged soil moisture products30 that are independent of the CMIP5 or CMIP6 ESMs (Supplementary Methods Sect. 1). We calculated the modeled SSI under the influences of all the anthropogenic and natural forcings (ALL), anthropogenic greenhouse gases only (GHG), anthropogenic aerosols only (AER), combined influences of anthropogenic greenhouse gases and aerosols (GHGAER), anthropogenic forcings only (ANT), natural solar and volcanic forcings only (NAT), and internal natural variability only (piControl) using all the available appropriate CMIP6 ensemble members (Supplementary Table 2). For sensitivity analysis on the D&A results, we also calculated monthly SSI at the 1- and 6-month timescales, using alternative pseudo-observations, statistical distribution, and CMIP6 ensemble members (Supplementary Methods Sect. 3). We aggregated all the pseudo-observed and modeled SSI to 5° zonal averages for the D&A.
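The 3-month windowing and standardization described above can be illustrated with a minimal sketch. It is not the procedure of Supplementary Methods Sect. 1: in place of the Gaussian mixture fit, it assumes a simple empirical (Gringorten plotting position) estimate of the cumulative probability, and the function names and the synthetic soil moisture series are hypothetical.

```python
from statistics import NormalDist

def rolling_mean_3(series):
    """Mean of the current and previous two months; the first two entries are None."""
    return [None, None] + [sum(series[i - 2:i + 1]) / 3.0 for i in range(2, len(series))]

def standardized_index(values):
    """Map values to standard-normal quantiles through their empirical ranks.

    Assumption: an empirical plotting position replaces the fitted
    Gaussian mixture distribution used in the paper. Ties are not handled.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    index = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p = (rank - 0.44) / (n + 0.12)  # Gringorten plotting position
        index[i] = NormalDist().inv_cdf(p)
    return index

def ssi_3month(monthly_sm, calendar_month):
    """3-month SSI for one calendar month (1-12); the series must start in January."""
    smoothed = rolling_mean_3(monthly_sm)
    picks = [i for i in range(len(monthly_sm))
             if i % 12 == calendar_month - 1 and smoothed[i] is not None]
    return standardized_index([smoothed[i] for i in picks])

# Synthetic 10-year monthly soil moisture with a slow drying trend (illustrative only).
monthly_sm = [0.30 - 0.005 * (i // 12) + 0.02 * ((i % 12) / 11) for i in range(120)]
feb_ssi = ssi_3month(monthly_sm, 2)  # the first February lacks a preceding December
```

Standardizing each calendar month separately removes the seasonal cycle, so a value of the index is comparable across months and latitudes.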

We conducted the D&A analysis separately for the SSI in each month of the year and each soil layer using a pattern-based method16,21,35,45,64 (Supplementary Methods Sect. 2). We calculated the ALL fingerprint as the first empirical orthogonal function (EOF) of the multi-model average of zonal-mean SSI anomalies derived from the concatenated CMIP6 historical and SSP5-8.5 simulations. We then projected the zonal-mean SSI from the pseudo-observation and from the CMIP6 simulations forced by different sets of forcing agents (ALL, GHG, AER, GHGAER, ANT, NAT) onto the ALL fingerprint. We treated the 1971–2016 trend in the projected pseudo-observed time series as the pseudo-observed signal (referred to as S). We treated the 1971–2016 trends in the projected time series under a specific set of forcing agents (ALL, GHG, AER, GHGAER, ANT, or NAT) as the simulated forced signals. The probability distribution of those simulated forced signals was fitted on the model ensemble members forced by the same set of agents (Supplementary Table 2). We also projected the piControl simulations onto the ALL fingerprint and calculated the unforced trends of all the overlapping 46-year segments in the projected time series. We calculated the noise (referred to as N) as the standard deviation of those unforced trends. When a pseudo-observed signal was outside the two-tailed 95% confidence interval (CI) of the fitted probability distribution of the unforced trends, we considered the pseudo-observed signal detectable at the 95% confidence level. This corresponded to an absolute value of the pseudo-observed S/N ratio of 1.96 or greater, under the assumption of a Gaussian distribution of the unforced trends. If, in addition, a detected pseudo-observed signal lay within the 95% CI of the distributions of the ALL, GHG, AER, GHGAER, ANT, or NAT-forced signals, we considered the detected signal to be attributable to the indicated set of external forcing agents.
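The fingerprint, projection, signal, and noise steps can be sketched as follows. This is a simplified illustration on synthetic data, not the implementation of Supplementary Methods Sect. 2: the function names, the synthetic fields, the omission of any latitude weighting, and the use of the sample standard deviation are all assumptions.

```python
import numpy as np

def leading_eof(anomalies):
    """First EOF (unit-length latitude pattern) of a (time, latitude) matrix."""
    centered = anomalies - anomalies.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def linear_trend(series):
    """Least-squares slope per time step."""
    return np.polyfit(np.arange(series.size), series, 1)[0]

def noise_from_control(projected_control, length):
    """Standard deviation of trends over all overlapping segments of `length` steps."""
    n_segments = projected_control.size - length + 1
    trends = [linear_trend(projected_control[i:i + length]) for i in range(n_segments)]
    return np.std(trends, ddof=1)

# Synthetic stand-ins (illustrative only): 36 five-degree latitude bands.
rng = np.random.default_rng(0)
n_lat = 36
pattern = np.cos(np.linspace(0.0, np.pi, n_lat))
model_mean = 0.02 * np.arange(130)[:, None] * pattern + 0.1 * rng.standard_normal((130, n_lat))
pseudo_obs = 0.02 * np.arange(46)[:, None] * pattern + 0.3 * rng.standard_normal((46, n_lat))
control = 0.3 * rng.standard_normal((500, n_lat))  # unforced variability only

fingerprint = leading_eof(model_mean)                   # ALL fingerprint
signal = linear_trend(pseudo_obs @ fingerprint)         # S: trend of the projection
noise = noise_from_control(control @ fingerprint, 46)   # N: spread of unforced trends
sn_ratio = signal / noise
detectable = abs(sn_ratio) >= 1.96                      # Gaussian two-tailed 95% level
```

Because the sign of an EOF is arbitrary, only the magnitude of the S/N ratio is meaningful for detection in this sketch; the attribution step would additionally compare `signal` with the distribution of forced signals projected onto the same fingerprint.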

We used the concept of detection time45 to quantify when the pseudo-observed signal first became detectable. For each month and soil layer, instead of calculating the pseudo-observed and ALL-forced signals as trends over a fixed time period (1971–2016), we calculated time-varying signals, with the starting year of the trends fixed at 1971 and the ending year varying between 1981 and 2016. For each time period, we also calculated the corresponding noise as the standard deviation of the unforced trends over an equal length of time. That is, the corresponding noise of the signals over 1971–1981 would be based on 11-year segments of the projected piControl series, over 1971–1982 based on 12-year segments, …, over 1971–2016 based on 46-year segments. The detection time was the ending year at which the time-varying pseudo-observed signal first became and afterward remained significant at the 95% confidence level45. To ensure that the detected pseudo-observed signals were attributable to the external forcings represented by the ALL fingerprint, we also set a consistency condition: if a detected signal was outside the 95% CI of the distribution of the ALL-forced signals over the same time period, then the signal was treated as insignificant.
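The "first became and afterward remained significant" rule, together with the consistency condition, can be expressed compactly. The sketch below assumes that the time-varying S/N ratios and the consistency flags have already been computed; the function name and the example values are hypothetical.

```python
def detection_time(end_years, sn_ratios, consistent=None, threshold=1.96):
    """Earliest ending year from which the signal stays significant through the last year.

    `consistent[i]` is False when the signal lies outside the 95% CI of the
    ALL-forced signals for that period, in which case it counts as insignificant.
    Returns None if the signal is not significant in the final period.
    """
    if consistent is None:
        consistent = [True] * len(end_years)
    significant = [abs(r) >= threshold and ok for r, ok in zip(sn_ratios, consistent)]
    detected = None
    # Walk backward from the last ending year while the signal stays significant.
    for year, is_sig in zip(reversed(end_years), reversed(significant)):
        if not is_sig:
            break
        detected = year
    return detected

# Illustrative values: significant in 1990-1994, a lapse in 1995, then persistent.
end_years = list(range(1981, 2017))
sn_ratios = [1.0] * 9 + [2.5] * 5 + [1.5] + [2.2] * 21
year_detected = detection_time(end_years, sn_ratios)
```

In this example the temporary significance during 1990–1994 does not count, because the signal lapses in 1995; the detection time is the start of the final uninterrupted run.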

We constrained the biased future S/N ratios of the CMIP6 ALL simulations using a generalized additive model (GAM)-based emergent constraint method. The standard emergent constraint method estimates linear regression relationships between modeled historical and future variables, treating the historical-future pair of each individual ESM as an x–y pair in the regression11,46,47,65. If the slope of the linear regression is statistically significant, the observed historical variable is plugged into the regression equation to generate a constrained future value, which is presumably better than the average of the ESMs, and an uncertainty interval for the constrained future value11,46,47,65. In this study, the modeled historical variable was the historical ALL-forced S/N ratios over 1971–2016, the modeled future variable was the ALL-forced S/N ratios over various future time periods (1972–2017, 1973–2018, …, 2055–2100), and the historical observation was the pseudo-observed S/N ratio over 1971–2016. The linear regression approach only allows one future value to be generated with one regression. Therefore, if we used the linear regression approach, we would perform a separate linear regression between each pair of future and historical ALL-forced S/N ratios. Since the regression coefficients of any two adjacent future periods would be estimated separately, the constrained future S/N ratios of these two periods could differ considerably. Such discontinuity would violate the temporal smoothness of the S/N ratios as trends, for the trends over two substantially overlapping time periods (e.g., the 46-year time periods 1972–2017 and 1973–2018 have 45 overlapping years) should only differ slightly, unless the non-overlapping years were substantial outliers. To ensure smooth transition between the constrained S/N ratios of adjacent future periods, we used GAM66, instead of linear regression, to estimate the historical-future relationships. 
The GAM takes the form \(y={\beta }_{1}+{\rm{s}}\left(x,\, t\right)+\varepsilon\), where \({\beta }_{1}\) is the intercept, \(\varepsilon\) is the fitting residual, \(y\) is the future modeled S/N ratio of an ESM, \(x\) is the historical modeled S/N ratio of the same ESM, \(t\) is the year, and \({\rm{s}}\left(\bullet \right)\) is a tensor product smooth over \(x\) and \(t\), which, intuitively, is the sum of products between all the pairs of the marginal spline basis of \(x\) and the marginal spline basis of \(t\)60. We set the marginal spline of \(x\) to consist of one intercept term and one linear term of \(x\). This setup ensured that the relationship between \(x\) and \(y\) was always linear, following the convention of past emergent constraint studies11,46,47,65. We set the marginal spline of \(t\) to have cubic order and determined the number of splines by minimizing the Akaike Information Criterion67, because preliminary analysis showed that the linear regression coefficients between the historical and future S/N periods varied nonlinearly over time. We fitted a separate GAM for each month of the year using the PyGAM package67. We also averaged the ALL-forced S/N ratios of each ESM within its ensemble members before putting the values into the regression to prevent the ESMs with more ensemble members from exerting more influence on the regression. Supplementary Fig. 7 shows an example of such a fitted tensor product \({\rm{s}}\left(x,\, t\right)\). The GAM nearly always achieved a better fit on the future ALL-forced S/N ratios than fitting a separate linear regression for each future period (Supplementary Fig. 8). All the fitted GAMs were significant at the 95% confidence level. In addition to statistical significance, the emergent constraint approach requires the modeled historical-future relationship to be physically justified11,46,47,65. Supplementary Methods Sect. 4 discusses the physical justification for this emergent constraint between the historical and future ALL-forced S/N ratios.
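To illustrate why the tensor product keeps the relationship linear in \(x\) while letting its intercept and slope vary smoothly with \(t\), the following sketch builds the design matrix explicitly. It is a deliberately simplified stand-in rather than the fitted model: it assumes a plain cubic polynomial basis in \(t\) in place of penalized cubic splines, ordinary least squares in place of PyGAM, no AIC-based selection, \(t\) rescaled to [0, 1], and synthetic ESM values. All function names are hypothetical.

```python
import numpy as np

def tensor_design(x, t, t_degree=3):
    """Row-wise tensor product of the basis [1, x] with a polynomial basis in t."""
    basis_x = np.column_stack([np.ones_like(x), x])           # intercept + linear term
    basis_t = np.vander(t, t_degree + 1, increasing=True)     # 1, t, t^2, t^3
    return np.einsum("ni,nj->nij", basis_x, basis_t).reshape(len(x), -1)

def fit_constraint(x, t, y, t_degree=3):
    """Least-squares coefficients of the tensor-product regression."""
    coef, *_ = np.linalg.lstsq(tensor_design(x, t, t_degree), y, rcond=None)
    return coef

def constrained(coef, x_obs, t, t_degree=3):
    """Constrained future values at the observed historical value x_obs."""
    x = np.full_like(t, x_obs, dtype=float)
    return tensor_design(x, t, t_degree) @ coef

# Synthetic example: 10 hypothetical ESMs, 84 future periods (1972-2017 ... 2055-2100).
rng = np.random.default_rng(1)
x_esm = rng.normal(2.0, 1.0, size=10)          # historical S/N ratio of each ESM
t_grid = np.linspace(0.0, 1.0, 84)             # rescaled period index
x = np.repeat(x_esm, t_grid.size)
t = np.tile(t_grid, x_esm.size)
y = (1.0 + 2.0 * t) * x + 0.5 * t**2 + 0.05 * rng.standard_normal(x.size)

coef = fit_constraint(x, t, y)
constrained_sn = constrained(coef, 2.5, t_grid)  # one smooth curve across all periods
```

Because one set of coefficients is shared by all future periods, the constrained values for adjacent periods come from the same smooth functions of \(t\) and cannot jump the way separately fitted regressions can.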

To propagate the effect of emergent constraint on the S/N ratios to ALL-forced zonal trends in SSI, we decomposed the zonal trends into the sum of two terms (Supplementary Methods Sect. 5). Briefly, the first term is proportional to the future S/N ratio, and the second term is interpreted as a remainder term that is dependent on the non-leading empirical orthogonal functions that were obtained as part of the D&A process. To adjust the SSI, we replaced the future S/N ratio in the first term with the constrained S/N ratio and kept the other terms in the equation the same.
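One way to picture this adjustment is sketched below. The exact decomposition is given in Supplementary Methods Sect. 5 and may differ; the sketch assumes that the fingerprint has unit length, that the first term equals the S/N ratio times the noise times the fingerprint, and that the remainder is whatever part of the zonal trend is orthogonal to the fingerprint. The function name and numbers are hypothetical.

```python
import numpy as np

def adjust_zonal_trend(zonal_trend, fingerprint, noise, constrained_sn):
    """Replace the fingerprint-aligned part of a zonal trend with the constrained one.

    Assumes `fingerprint` has unit length, so that the projection of the zonal
    trends onto it equals the signal S = (S/N) * N.
    """
    signal = zonal_trend @ fingerprint
    remainder = zonal_trend - signal * fingerprint   # carried by non-leading patterns
    return constrained_sn * noise * fingerprint + remainder

# Illustrative four-band example.
fingerprint = np.ones(4) / 2.0                       # unit-length pattern
zonal_trend = np.array([0.1, -0.2, 0.3, 0.0])
noise = 0.05
original_sn = (zonal_trend @ fingerprint) / noise
adjusted = adjust_zonal_trend(zonal_trend, fingerprint, noise, constrained_sn=3.0)
```

Under these assumptions, passing the unconstrained S/N ratio back in returns the original trends unchanged, which is a useful sanity check on any implementation of the decomposition.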

Throughout this paper, all mentions of average values, the percentages of models that agreed in sign with the average value, and the percentages of models that were significant at the 95% confidence level should be understood as weighted. That is, the values were first averaged over the ensemble members of each ESM and then averaged over the ESMs, to prevent the ESMs with more ensemble members from dominating the results. Because agreement in sign and significance at 95% confidence level were Boolean values, “true” was treated as 1, and “false” as 0, in the weighted averaging. Standard deviations were not weighted and were calculated directly on the ensemble members of all the ESMs. Unless specified otherwise, all the 95% CI in this paper were calculated using the Gaussian assumption, i.e., equal to the weighted average ± 1.96 × standard deviation. Also, unless specified otherwise, all the trends in this paper were calculated using linear least squares.
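The weighting convention can be made concrete with a short sketch. The ESM names and values are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev

def weighted_summary(members_by_esm):
    """Weighted mean and Gaussian 95% CI across ESMs.

    Members are averaged within each ESM first and then across ESMs; the
    standard deviation is taken directly over all members, unweighted.
    """
    weighted_mean = mean(mean(values) for values in members_by_esm.values())
    all_members = [v for values in members_by_esm.values() for v in values]
    spread = stdev(all_members)  # assumption: sample standard deviation
    return weighted_mean, (weighted_mean - 1.96 * spread, weighted_mean + 1.96 * spread)

def weighted_fraction(flags_by_esm):
    """Weighted share of True flags (e.g., sign agreement), with True = 1 and False = 0."""
    return mean(mean(1.0 if f else 0.0 for f in flags) for flags in flags_by_esm.values())

# Hypothetical trends: one ESM with two members, another with one.
trends = {"ESM-A": [1.0, 3.0], "ESM-B": [4.0]}
agrees = {"ESM-A": [True, False], "ESM-B": [True]}
avg, ci = weighted_summary(trends)
share = weighted_fraction(agrees)
```

Here the weighted mean is 3.0 rather than the plain member mean of about 2.67, because the single member of the second ESM carries as much weight as the two members of the first combined.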

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.