Quantification of human contribution to soil moisture-based terrestrial aridity

Current knowledge of the spatiotemporal patterns of changes in soil moisture-based terrestrial aridity has considerable uncertainty. Using Standardized Soil Moisture Index (SSI) calculated from multi-source merged data sets, we find widespread drying in the global midlatitudes, and wetting in the northern subtropics and in spring between 45°N–65°N, during 1971–2016. Formal detection and attribution analysis shows that human forcings, especially greenhouse gases, contribute significantly to the changes in 0–10 cm SSI during August–November, and 0–100 cm during September–April. We further develop and apply an emergent constraint method on the future SSI’s signal-to-noise (S/N) ratios and trends under the Shared Socioeconomic Pathway 5-8.5. The results show continued significant presence of human forcings and more rapid drying in 0–10 cm than 0–100 cm. Our findings highlight the predominant human contributions to spatiotemporally heterogenous terrestrial aridification, providing a basis for drought and flood risk management.

3) Methods: The pattern-based detection and attribution method is the same (but a bit simplified) as that used by Bonfils et al. (2020). But there are indications that the method may not have been implemented properly. For example, while Bonfils et al. (2020) detected human influence at the 5% level, this paper would state to have detected at the 95% confidence interval. Fig. S3 of this paper shows fingerprint for different forcings but the fingerprints for individual forcings have never been used in the detection as the signal is the projection of observations to the fingerprint of all forcing. The detection and attribution analysis is conducted on latitudinal average of soil moisture index without giving clear justification. Figure S10 seems to contradict the wisdom/justification of making latitudinal average: trends of ALL forcing simulations are of opposing signs over large continuous areas in the same latitude bands. The authors claimed they had invented a new emergent constraint based on S/N without giving a clear physical reasoning or proof, but only to walk back later that they didn't know if that emergent constraint actually worked (see discussion on page 18, in particular lines 438-440).

4) Presentation:
The main text is relatively easy to understand, though there are so conceptual errors in describing results (e.g. detection at the 95% confidence interval etc.). But the Supplementary Information is impossible to follow. I read for more than 5 times and I am still not sure I fully understand how the authors did the analyses (I am sorry if I have misunderstood your work in my review).
Reviewer #3 (Remarks to the Author): Dear Authors, From my first reading of the manuscript, I feel that this manuscript is very interesting but currently not suited for publication in Nature Communications. The manuscript is well written and I see the importance but the current form of the manuscript needs more than major revisions in it's current form. The topic of D&A of soil moisture trends is very relevant and interesting, certainly with the CMIP6 models included. I feel the analysis is relevant but need further improvement. Please find some suggestions below.
I find the manuscript hard to read, it is difficult to follow the different steps, approaches, and experiments. The manuscript would benefit from a smaller set of experiments or a better description of the different experiments in a concise manner. The current manuscript balances on a lot of different analysis, different angles and it is very hard to detangle the main message. I find it very tough to get through the current form of the manuscript. I know and see that there is a lot of interesting results in the manuscript, but the current content load of the manuscript is way too high. My recommendation would be to focus on 2/3 main message to better show the reader the added value.
Are these trends significant, please provide uncertainty bounds? If I read the caption of Figure 1, they are not outside of the 95% CI. Please also clarify left and right orientated hatching, better to use hatching and some horizontal lines. I left-orientated from left to right or right to left?
Line 147-148 is not very satisfying, if the authors indicate potential model deficiencies it would be better to show that this is the case instead of leaving the reader without any proof.
Line 151-153, another speculation that is not really support by any evidence. I would prefer Mention how stomate closure response affect potential evaporation rates in the future I feel that 22 extra figures in the appendix and 12 more table do not improve the readability of the manuscript. I feel that the authors should be able to provide the main message of the manuscript with a significant lower number of figures in the appendix. 1 Reviewer #1 (Remarks to the Author): Although the impacts of anthropogenic climate change on drought have been extensively studied, most were focused on meteorological drought indices (e.g., PDSI and aridity index), and few were on soil moisture, especially the long-term multi-layer soil moisture changes. The general methodology of this study appears to be robust, with the use of multiple observation-constrained soil moisture datasets, latest CMIP6 sensitivity simulations, and formal detection and attribution (D&A) analyses. Moreover, its demonstration of seasonally and vertically varying anthropogenic effects is very interesting and has implications for improving drought predictability and management. Considering the importance of soil moisture D&A to land surface processes and land-atmosphere feedbacks, the findings of the study are a significant contribution to both the D&A and drought research fields. I thus recommend accepting the manuscript after addressing the following comments and concerns.
1) SSI trends in the pseudo-observation and CMIP6 simulations. Observations showed that a drying tendency existed in the northern subtropics. However, the authors suggested that wetting occurred in the northern subtropics, which seems contrary to the results shown in Fig.1. Besides the anthropogenic warming effects that increase potential evaporation, this observed drying trend may be partially associated with the poleward extension of Hadley circulation. However, the multi-model means showed little drying tendency in the northern subtropics and the core drying region shifted further north. This makes me wonder whether current CMIP6 models have enough skill to reproduce the observed changes of the northern branch of the Hadley circulation.
Response: Thank the reviewer for the encouragement and comments. Regarding the northern subtropics issue, we performed a literature search and found that the current CMIP6 models underestimated the rate of expansion of the Hadley cell compared with reanalysis, although the CMIP6 models had a higher rate than the CMIP5 models (Hu et al., 2018;Xia et al., 2020). Such model biases may have caused underestimation of the drying tendency in 20°N-40°N, and the difference from pseudo-observation at this latitude. Based the global map of SSI (Fig. S1), we also suggest that the difference between the pseudo-observation and the ALL simulations was caused by the Indian Monsoon region. Therefore, we added the following explanation in the main text lines 133-138 (tracked change version lines 214-234): "The pseudo-observation also had major differences from the ALL simulations in a few months near 30°N, and in February-July in 0- 20°N (Fig. 1c,d,g,h). The former difference may be caused by the under-estimated rate of the expansion of the Hadley cell by the ALL simulations 39,40 or model biases in the Monsoon-related precipitation in northern India (Fig. S1;ref. 43 )." The overestimation of drying in the ALL simulations in 40°N-60°N was mainly contributed by the eastern Canada and western Europe regions (Fig. S1), and is likely related to the overestimation of the warming trends by the CMIP6 models in these regions (see Fig. 3 of (Fan et al., 2020)). Our additional analysis in the revised manuscript suggests that the potential evapotranspiration is overestimated in these regions in the ALL simulations (Fig. S2). Therefore, we added the following explanation in the main text lines 129-133 (tracked change version lines 210-214): "The major differences (i.e., pseudo-observation outside the 95% confidence interval [CI] of the ALL simulations) around 60°N in Northern Hemisphere spring were caused by differences in eastern Canada and western Europe (Fig. S1), which was likely related to the over-2 estimation of the increasing trends in air temperature 42 and potential evapotranspiration in the ALL simulations (Fig. S2)." 2) Line 106: The authors stated that the temporal variability of model-based soil moisture tended to be more robust than its magnitude. However, they did not evaluate the models' performance in reproducing the observed spectrum structure of soil moisture temporal variability, such as the these at interdecadal, interannual and intraseasonal scales.
Response: Thank you for this suggestion. In the new version of our dataset paper , we performed a comparison between the power spectral densities of the merged and the source data sets using regional averages. Here, we reproduced the graph, but only showing the same ALL simulations of CMIP6 ESMs as used in the current D&A paper ( Figure A). It can be clearly seen that the three products used to create the pseudo-observation in the main text, Mean ORS, OLC ORS, and EC ORS, had power spectral densities that were generally within the envelopes of the ALL simulations. This indicates that the selected CMIP6 soil moisture simulations did not systematically over-or underestimate the variabilities across different temporal scales. The auxiliary data sets, EC CMIP5, EC CMIP6, EC CMIP5+6, and EC ALL, which were used for sensitivity analysis in the Supplementary Material, also had power spectral densities that were mostly within the envelopes of the ALL simulations. Figure A. Power spectral densities of the regional mean soil moisture of the merged data sets (Mean ORS,OLC ORS,EC ORS,EC CMIP5,EC CMIP6,EC CMIP5+6,EC ALL), and the ALL simulations (Source data sets). The power spectral densities were calculated on each individual ALL simulation, and the envelope shows the maximum and the minimum values of the calculated densities at each frequency. The regions are the IPCC SREX regions (Field et al., 2012). Nnorth, W -west, C -central, E -east, S -south. 3) Detection and attribution. The authors used a complex D&A method to quantify the impacts of various external forcings on the spatiotemporal patterns of SSI changes. However, they did not add much explaining possible physical/biophysical mechanisms through which these external forcings pose influences on the surface and root-zone SSI. At least, the mechanisms behind GHG-induced SSI drying would be different from those of aerosols and volcanic activities.
Response: Thank you for raising this issue. We added mechanisms related discussion in the revised manuscript in lines 255-286 (tracked change version lines 485-654), and created some supporting figures (Figs. S5 and S9).

4)
Suppl. Lines 231-233: I do not think that is a good answer for the contradict signals of volcanic aerosol emissions in ALL and NAT PCs. The authors may need to explore further this 'significant' contradiction.
Response: Thank you for pointing this out. We deleted the ANT, GHGAER, and NAT PCs from the paper because they were not used in the D&A in any way; also, the other two reviewers commented that the manuscript was cramped with information. Regarding the "significant" contradiction, we examined the original figures (reproduced in Figure B and Figure C below) more closely, and found that the contradiction mainly occurred in the months July to November. In these months, the NAT fingerprints were almost the opposite of the ALL fingerprints ( Figure  D). Therefore, the projection of the same volcanic aerosol-induced SSI patterns in these months on the NAT fingerprints would likely have opposite signs compared with the projection on the ALL fingerprints. Response: Thank you for raising this issue. For the same reason as mentioned in comment #4, the ANT, GHGAER, and NAT fingerprints and PCs were removed from the new manuscript, and any uncertainty therein does not affect the D&A analysis. Therefore, we did not investigate into the differences.
Reviewer #2 (Remarks to the Author): This paper conducts a detection and attribution analysis on merged soil moisture data products with an aim to quantify human contribution to soil moisture-based droughts. Here, the authors first converted soil moisture data products that are related to observations and that are simulated by CMIP6 models under various external forcing into standardized soil moisture index (SSI), and then compared them using pattern-based detection and attribution method. They claim that human influences affected soil moisture of both top 10cm and top 100cm soil layers in various 3month seasons. The author developed, what they considered to be the first, observationconstrained future SSI projections. While it is interesting to compare observed soil moisture with those simulated by models to gain understanding of possible human influence on soil moisture, thereby attributing changes in drought (as the title of the paper may imply), there are a number of papers already on the same or similar subject. It is unclear to me if this paper brings sufficient novelty in methodology or our general understanding of soil moisture response to external forcing that would warrant it to publish in Nature Communications. Compared with existing 8 literature, Bonfils et al. (2020) in particular, I am not sure we learnt a lot new here. In addition to this general observation, I also have a number of concerns regarding data, methods, presentation and interpretation.
Response: Thank the reviewer for your comments. As cited in the Introduction of the main text, there are indeed many papers focused on the subject of drought, a few of which are already related to the D&A of drought. However, many of these past papers were mainly centered on the use of meteorological drought indices (Schlaepfer et al., 2017;Dai et al., 2018;Cook et al., 2014;Feng and Fu, 2013;Bonfils et al., 2020). As shown in Figure 1 of Yang et al. (2019), the meteorological drought indices derived from precipitation and potential evapotranspiration generally overestimated global drying tendencies; only the Palmer Drought Severity Index (PDSI), which has a more complicated formula of calculation, showed considerably similarity to surface soil moisture; but the PDSI still could not reveal detailed drought structures across soil layers. Compared with the meteorological droughts, soil moisture more directly affects the availability of water to plants. For example, past studies have demonstrated that soil moisture is a greater control on plant activity than vapor pressure deficit , and that multi-layer soil moisture can better explain vegetation growth than single layer . Soil moisture also directly modulates the primary productivity, phenology, and land-atmosphere feedbacks Lian et al., 2021;Gevaert et al., 2018).
Also, the majority of the past papers were either only focused on the analysis of ESM simulations (Xu et al., 2019;Lu et al., 2019;Cook et al., 2014;Greve and Seneviratne, 2015) or the use of limited observations, reanalysis, or offline model simulations (Schlaepfer et al., 2017;Deng et al., 2020;Feng and Fu, 2013;Lian et al., 2021;. Because soil moisture can be considerably different across data sets Xu et al., 2021;Cheng et al., 2017), having different sources of products merged using strict statistical methods can overcome the biases/errors contained in individual data sets. Such data set-related improvement and novelty has been demonstrated in this manuscript and our related paper .
Additionally, the seasonal and layer-wise differences, identified by the present study but omitted in past studies, have significant practical implications. For example, because the drying was generally greater in summer than winter, the annual average drying examined by previous studies would exclude or underestimate the impacts of soil drying on seasonal vegetation dynamics (e.g., phenology). Also, the focus on annual averages by past studies could preclude the identification of spring wetting in the northern high-latitudes, which can significantly affect flood risks, reservoir operations, and vegetation productivity.
Therefore, although previous studies and this research address global aridification, our work adds more robustness and evidence, showing that human signals significantly existed in long-term changes of soil moisture-based drought and not just in meteorological drought indices (e.g., Bonfils et al., 2020); moreover, this research systematically clarifies evident seasonal and layerwise variations in these human signals. We thus revised the entire Introduction (lines 55-107; tracked change version 58-178) to better address these advancements over previous studies.
Regarding Bonfils et al. (2020) specifically, they employed a meteorological drought index, Climate Moisture Index (CMI), which is based on precipitation and potential evapotranspiration. Therefore, it may likely share similar abovementioned deficiencies as to the meteorological drought indices. Moreover, Bonfils et al. (2020) applied their D&A methods on the joint changes in temperature, rainfall, and aridity; consequently, the signals and signal-to-noise ratios shown in their Figures 1, 2, and 3 reflected joint changes in temperature, rainfall, and aridity, containing some information from all three variables. Using only the SSI, our results are thus focused on the clean information directly from soil moisture, and have different physical meanings from those of Bonfils et al. (2020). For example, comparing their joint CMI fingerprints 1 and 2 (Extended Data Figs. 1c and 1f of Bonfils et al. [2020]) with the ALL fingerprint shown in our Figure 2, one can see that our fingerprint is different from any of their fingerprints, although a weighted combination of the two could probably yield more similar results to ours. Drying above 60°N only existed in fingerprint 1 of Bonfils et al. (2020) and was smaller than the drying in the southern hemisphere; but in our fingerprints, the >60°N drying is greater than that in the southern hemisphere. No wetting above 30°N existed in the fingerprints of Bonfils et al. (2020), but our fingerprints show wetting between 40°N and 60°N in spring. Between 0°N and 20°N, fingerprint 1 of Bonfils et al. (2020) showed drying, and fingerprint 2 showed wetting; but our fingerprints show that the wetting and drying depend on the season, especially in the 0-10 cm layer.
Because another reviewer also pointed out that our study lacked detailed discussion of physical mechanisms, we added new analysis and discussion about the influences of precipitation, temperature, LAI, and snow water equivalent on the SSI in the revised manuscript (see Discussion lines 255-286 [tracked change version lines 485-654], and Figs. S5 and S9). However, such additional analysis is unlikely to significantly improve the understanding of meteorological drought changes because LAI and snowmelt are not explicitly in their calculation formulas, although some influences may exist through land-atmosphere feedbacks. Thus, these analysis can be used to further distinguish our findings from Bonfils et al. (2020), which focused on the investigation of large-scale circulation mechanisms (e.g., shift in the ITCZ) underlying hydroclimate changes.
1) The title: The title is misleading. The authors did the analysis on the standardized soil moisture index (SSI). While SSI is used in drought monitoring, only the low tail of SSI has something to do with drought. This paper focuses on trend of SSI, or long-term changes in mean of SSI; it does not deal with changes in extreme negative values of SSI (that indicate drought).

Response:
Thank you for point out the issue. We changed the title from "drought" to "terrestrial aridity".
2) Data: The merged soil moisture product is perhaps the most comprehensive soil moisture data with regard to spatial and temporal coverage. But it is unclear if this data product is suitable for the analysis of long-term changes and thus if it fits for the purpose of this study in particular. The paper describing the development of this datasets is still under review according to ESSD website. I did read the discussion paper and realized that while the data developer argued add values of the data products in terms of "improvement and harmonized spatial, temporal, and vertical coverages", they did not conduct careful check of data homogeneity. Thus the suitability of the data product for the analysis of long-term change is not known. Given that the data are merged from different sources, one would assume inhomogeneity in the data product.
Response: Thank you for the comment. In the process of revising the data set paper , we performed statistical test of the temporal homogeneity of the merged data sets on a grid-by-grid basis using a previous procedure (Su et al., 2016). The new results are displayed in the Figure 4 and Figures S6 and S7 of the revised data set paper. The new analysis shows that in terms of mean soil moisture values, little inhomogeneity could be detected; in terms of the variance of the soil moisture values, inhomogeneity could be detected in 10%-30% of the grid cells, depending on which merged data set was evaluated against which reference data set. The highest percentage of inhomogeneity in variance was actually detected in a merged data set that was averaged from seven TRENDY v7 land surface models (see Table S1-S3 of the data set paper, column "Used time period," which shows that only these seven models were used for the 1970-2016 period). This merged data set should not contain any inhomogeneity because offline land surface models and the simple averaging process do not contain inhomogeneity. We concluded that the merged data sets were homogeneous, and the detected homogeneity may have arisen from inhomogeneity and disturbance representation issues in the reference data sets as described in the data set paper . Therefore, we suggest that the merged data sets are suitable for D&A.
3) Methods: The pattern-based detection and attribution method is the same (but a bit simplified) as that used by Bonfils et al. (2020). But there are indications that the method may not have been implemented properly. For example, while Bonfils et al. (2020) detected human influence at the 5% level, this paper would state to have detected at the 95% confidence interval. Fig. S3 of this paper shows fingerprint for different forcings but the fingerprints for individual forcings have never been used in the detection as the signal is the projection of observations to the fingerprint of all forcing. The detection and attribution analysis is conducted on latitudinal average of soil moisture index without giving clear justification. Figure S10 seems to contradict the wisdom/justification of making latitudinal average: trends of ALL forcing simulations are of opposing signs over large continuous areas in the same latitude bands. The authors claimed they had invented a new emergent constraint based on S/N without giving a clear physical reasoning or proof, but only to walk back later that they didn't know if that emergent constraint actually worked (see discussion on page 18, in particular lines 438-440).
Response: Thank you for the pointing out these issues. During the process of creating this manuscript, we had a lot of conversations with one of our coauthors, Dr. Bonfils, who was also leading Bonfils et al. (2020), ensuring that the procedure was correctly implemented. The 5% significance level and 95% confidence level/interval are different terminologies that refer to the same thing-when a test statistic is within the 95% CI, there is no statistical significance; when outside, there is statistical significance at a 5% level (total probability is always 100%, hence 100% − 95% = 5%). We changed the terminologies to p = 0.05 and p = 0.01 to remove the confusion (main text lines 167-168, 177-178, 194, 197, 220, 223, 591, 599, 618-619, 628, 631 [tracked change version lines 300-302, 312-313, 347-349, 444-447, 1130, 1138, 1157-1158, 1169, 1172]; Supplementary Material lines 177). In the original Fig. S3, some of the fingerprints were indeed not used in the detection. We deleted the entire figure to avoid confusing the readers in the updated version because the necessary fingerprints (ALL, GHG, AER) are already presented in the main text.
Latitudinal averaging is a common practice in D&A on hydrological variables enhancing the S/N ratio, such as precipitation (Sarojini et al., 2016;Marvel and Bonfils, 2013;Zhang et al., 2007), evapotranspiration (Douville et al., 2013), and joint changes in precipitation, temperature, and drought index (Bonfils et al., 2020). Latitudinal averaging has also been performed in other analyses on global temperature and soil moisture to highlight large-scale model performance or pattern changes (e.g., Santer et al., 2018;Berg et al., 2017). All of these variables have longitudinal heterogeneity. In general, the relative importance of natural internal variability to external forcings becomes larger at smaller scales (Bindoff et al., 2013). Also, external forcings other than greenhouse gases and aerosols, such as land use and land cover change, can be more important at smaller scales (Lawrence et al., 2016); however, the simulated system responses to these forcings remain quite uncertain because of inconsistent or incomplete representations among different ESMs. Latitudinal averaging would remove the influences from local factors and longitudinal factors, such as Monsoon circulations, but not from latitudinal factors, such as the expansion of the Hadley cell, the shifting of ITCZ, and the pole-to-equator gradient in warming. Therefore, we considered the latitudinal averaging not to invalidate the D&A process.
In recognition of the existence of longitudinal heterogeneity, we performed an analysis that shows historical S/N ratios at the 5° grid level ( Figure E, Figure F). The S/N ratios were above 1 in most of the global grids, indicating that the pseudo-observation is distinguishable from the natural internal variability. Also, in some parts of the world (e.g., Canada and northern Asia, South Africa, eastern South America, eastern Australia), the S/N ratios were significant at p = 0.05 (i.e., greater than 1.96). These results support the original conclusions that anthropogenic forcings significantly impacted SSI. We did not add this analysis into the manuscript because Reviewer #3 suggested that the original manuscript already had too much information.  Regarding physical justification of the emergent constraint, we claim physical justifications as follows, partially based on our new analysis on the contribution of precipitation, temperature, LAI, and snowmelt to the SSI trends: (1) There is demonstrated emergent constraint between the historical and future trends in surface air temperature (Tokarska et al., 2020) and between the historical trends in leaf area index (LAI) and future changes in gross primary productivity, which is linearly correlated with LAI until the latter reaches saturation . Temperature is the most important contributor to the SSI changes found in our study (see Fig. S9 of the revised manuscript). No emergent constraint relationship is known between historical and future trends in global precipitation or snow water equivalent. However, precipitation changes in convergence regions are closely related to temperature changes through hydrological sensitivity (Su et al., 2017), and the Northern Hemisphere spring snow cover is linearly correlated with global surface air temperature (Mudryk et al., 2020). (2) The spatial patterns in the correlation between the drivers of SSI and the SSI were consistent between historical and future time periods, although the magnitudes change somewhat (see Fig. S5 of the revised manuscript). Therefore, the historical spatial patterns of SSI trends should be correlated with the future spatial patterns of SSI trends through the relationship between the trends in the historical and future drivers. The magnitudes change is not an issue for emergent constraint because it can be accounted for by varying the regression coefficient over the future periods. Regarding the statistical significance issue, in our original results, multiple emergent constraint regressions were performed, with one individual regression for one individual month and future time period (e.g., 2001-2046, 2002-2047, …, 2055-2100). Some of these regressions were significant and some of them not. There may be several causes of this difference in significance. First, the relative importance of the drivers to SSI trends were different from month to month (see Figs. S5 and S9 of the revised manuscript). Because the degree to which emergent constraint exists in the different drivers also differs (point 1 above), the drivers' difference will propagate to the degree to which emergent constraint exists in the SSI. Second, the uncertainty in climate projections generally increase further into the future. This is consistent with and may explain the fact that the original emergent constraint regressions were generally less significant further into the future. A recent study found that the existence of emergent constraint can vary with the degree to which the climate has changed because linearity assumptions underlying the emergent constraint regression can break down at high levels of climate change . This phenomenon may be applicable to the case of our original results. Overall, the significance of the emergent constraint in the original paper should be interpreted as being seasonally and temporally dependent.
Given these discussion and caveats, we acknowledge that the reported results of the original emergent constraint can be confusing to readers. The significance of the emergent constraint was likely underestimated in the previous analysis because we did not leverage the fact that the S/N ratios were trends (i.e., the S/N ratios of two adjacent time periods were derived from a large amount of overlapping data points). As a result of this overlap, fitting one linear regression per future time period would have resulted in more parameters than necessary and reduced the statistical significance of the parameters. Therefore, in the revised paper, we reduced the number of parameters in the emergent constraint by applying a generalized additive model (GAM) (Wood, 2017), which takes the form = + , + , where is the intercept, is the fitting residual, is the future modeled S/N ratio, is the historical modeled S/N ratio, is the year, and • is a tensor product smooth over and twhich, intuitively, is a sum of the products between all pairs of the spline basis of and the spline basis of (Wood, 2006). This form of regression allows the relationship between and to change smoothly over time as a function of . In our implementation, we used a simple linear function as the marginal smooth of . Thus, the relationship between and is linear (i.e., in the same form as the linear regressions performed by past emergent constraint studies). We used a cubic regression spline as the marginal smooth of . The new GAMs for all the months and soil layers had statistically significant term • and achieved better fit than the original rolling regressions, as indicated by the smaller standard deviations of the fitting residuals ( Figure G). Therefore, we deemed the new GAM-based emergent constraint procedure to be reasonable and has the benefit of eliminating the sometimes significant, sometimes insignificant confusion of the original rolling regressions. The constrained future S/N ratios were slightly different from the original linear constrained results, but the overall trends were similar. The results based on the new GAM emergent constraint are reported in Section 2.

4) Presentation:
The main text is relatively easy to understand, though there are so conceptual errors in describing results (e.g. detection at the 95% confidence interval etc.). But the Supplementary Information is impossible to follow. I read for more than 5 times and I am still not sure I fully understand how the authors did the analyses (I am sorry if I have misunderstood your work in my review).

Response:
We apologize for the confusion. As the reviewer can see from the Track Changes version, we have cleared up the writing of both the main text and the Supplementary Material.

Reviewer #3 (Remarks to the Author):
From my first reading of the manuscript, I feel that this manuscript is very interesting but currently not suited for publication in Nature Communications. The manuscript is well written and I see the importance but the current form of the manuscript needs more than major revisions in it's current form. The topic of D&A of soil moisture trends is very relevant and interesting, certainly with the CMIP6 models included. I feel the analysis is relevant but need further improvement. Please find some suggestions below.
I find the manuscript hard to read, it is difficult to follow the different steps, approaches, and experiments. The manuscript would benefit from a smaller set of experiments or a better 16 description of the different experiments in a concise manner. The current manuscript balances on a lot of different analysis, different angles and it is very hard to detangle the main message. I find it very tough to get through the current form of the manuscript. I know and see that there is a lot of interesting results in the manuscript, but the current content load of the manuscript is way too high. My recommendation would be to focus on 2/3 main message to better show the reader the added value.

Response:
We apologize for the confusion. As the reviewer can see from the Track Changes version, we have cleared up the writing of both the main text and the Supplementary Material. We also have made substantial changes based on the suggestions/comments from the other two reviewers in the revised version.
Are these trends significant, please provide uncertainty bounds? If I read the caption of Figure 1, they are not outside of the 95% CI. Please also clarify left and right orientated hatching, better to use hatching and some horizontal lines. I left-orientated from left to right or right to left?
Response: Thank you for these suggestions. We have denoted statistical significance of the displayed trends of the pseudo-observation (panels c, f) in Fig. 1 by applying a white mask on the insignificant areas and using contours to delineate boundary of the white-masked areas. For the trends of the ALL simulations (panels d, h) in Fig. 1, we denoted where less than 50% of the simulations had significant trends, also by white masks. We also changed the hatching to use vertical and horizontal lines.
Line 147-148 is not very satisfying, if the authors indicate potential model deficiencies it would be better to show that this is the case instead of leaving the reader without any proof.
Response: Thank you for this comment. We clarified the causes of model deficiencies by using the ratio of precipitation to potential evapotranspiration. We also showed that part of the disagreement may be caused by deficiencies in some of the reanalysis data that went into the merged soil moisture products used in this study. These revisions are reflected in lines 129−141 (tracked change version lines 210−238) and the Supplementary Material Figs. S2 and S3.
Line 151-153, another speculation that is not really support by any evidence. I would prefer Response: Thank you for this comment. This issue is addressed in the same lines and Supplementary Material figures as the comment above.
Mention how stomate closure response affect potential evaporation rates in the future Response: Thank you for this comment. This is implied in Discussion lines 263−267 (tracked change version lines 493−497) "Also, reduced stomatal conductance in response to drought and the future increase in atmospheric carbon dioxide concentration 53 mitigates the impact of rising temperatures on transpiration, which affects both the surface and root-zone soil moisture, but no such mitigation effect exists for evaporation, which mainly affects the surface layer." Zhang, X., Zwiers, F. W., Hegerl, G. C., Lambert, F. H., Gillett, N. P., Solomon, S., Stott, P. A., and Nozawa, T.: Detection of Human Influence on Twentieth-Century Precipitation Trends, 448, 461-465, https://doi.org /10.1038/nature06025, 2007. After reading the revised draft and the authors' responses, I believe the revised draft has sufficiently addressed all of my previous concerns and therefore recommend acceptance of the article. I appreciate the authors' efforts in adding graphs to support their discussions on model-data disagreements and the forcing mechanisms. The readability of the draft was much better than the last one. The choice of GAM in the revised application of emergent constraint is different from traditional linear regression methods, but appeared to be justifiable based on the improved residuals. I think future studies that want to apply emergent constraint on rolling future periods and future detection and attribution studies can benefit from knowledge of such a new procedure.
Reviewer #2 (Remarks to the Author): I appreciate significant efforts the authors made in addressing my comments. I am convinced there are added values to look into different soil layers and soil moistures. I am also convinced the merged soil moisture data is potentially useful for the purpose of this paper if it's caveat is carefully considered when interpreting the analyses. But I am not convinced the detection and attribution method is implemented properly and in particular the interpretation of the relevant analyses is correct. I also don't think what the authors consider to be innovative emergent constraint adds any value.

1) Detection and attribution:
a. Overall, there may be some useful information that ALL could be detected in SSI during certain months but the description and interpretation are confusion and may contain significant errors.
b. Method: I am not sure if the authors fully understand the statistical concepts used in the analysis. Their response does not give the impression that they do. Stating that the 5% significance level and the 95% confidence interval are the same thing of different terminologies is a clear example. The lack of proper understanding in the statistical concepts makes it difficult for the authors to properly describe and interpret what the analyses tell them. Their statement (lines 183-187) "also, the signals of the pseudo-observation … partially attributable to the AER forcing". Every panel in Fig 3 shows very clearly that AER distribution is almost the same as that of noise. Thus any claim that AER can be detected in the observation, and more importantly the observed changes can be attributed to (even in part), cannot be correct, it does not matter if the observed s/n is within or outside the 95% range of AER. The distributions of ALL, ANT, and GHG overlap with that of noise but they are also quite away from the central of the noise distribution. One may be able to argue that if the observed S/N is above of the 95th percentile of the noise distribution and if that is also within certain range of ALL/ANT/GHG, one could make a detection claim. But to do so, one needs to clearly define what the threshold is to make such a claim to avoid misinterpretation.
c. Interpretation: in addition to problems mentioned in a), there are also issues in the meaning of the fingerprints, in particular the lack of physical understanding/interpretation of fingerprints. As described in page … and shown in fig 2, the fingerprints for AER and GHG are very similar for many months and latitudes. But what are the physics for AER and GHG to behave similarly (wet or dry for the same latitude and during the same months)? What is the implication of this in the interpretation of attribution claim?
2) Emergent constraint: a. Overall, the emergent constraint is confusion and does not seem to add any value.
b. The description of the method is confusion and does not seem to be consistent with the description of the results. Lines 604-608 indicate the regression was done between modelled future periods and modelled historical period, but lines 211-213 suggest to relate to the (observed) historical ratio to correct model bias. If the regression is meant to correct model biases, one would expect the future ratio to regress on to observed rather than the modelled ratio.
c. While one may use the observed ratio to correct modelled ratio for the same historical period, one cannot simply extrapolate that adjustment factor to the future period without making explicit assumption such as same warming rate or soil moisture response rate because the s/n ratio is computed for the same fixed 46 year period. But regressing modelled future period's ratio to the modelled historical ratio does not make sense at al. If the signal is sufficiently strong and if the warming rate is consistent along time (assuming soil moisture respond proportionally to warming rate) one would expect the same s/n ratio from the same model, any difference is simply noise/uncertainty. On the other hand, if future warming rates differ from the past (which is unlikely the case under the SSP8.5), the difference may also be affected by future warming rate which shall not be corrected.

Reviewer #1 (Remarks to the Author):
After reading the revised draft and the authors' responses, I believe the revised draft has sufficiently addressed all of my previous concerns and therefore recommend acceptance of the article. I appreciate the authors' efforts in adding graphs to support their discussions on modeldata disagreements and the forcing mechanisms. The readability of the draft was much better than the last one. The choice of GAM in the revised application of emergent constraint is different from traditional linear regression methods, but appeared to be justifiable based on the improved residuals. I think future studies that want to apply emergent constraint on rolling future periods and future detection and attribution studies can benefit from knowledge of such a new procedure.
Reviewer #2 (Remarks to the Author): I appreciate significant efforts the authors made in addressing my comments. I am convinced there are added values to look into different soil layers and soil moistures. I am also convinced the merged soil moisture data is potentially useful for the purpose of this paper if it's caveat is carefully considered when interpreting the analyses. But I am not convinced the detection and attribution method is implemented properly and in particular the interpretation of the relevant analyses is correct. I also don't think what the authors consider to be innovative emergent constraint adds any value.

1) Detection and attribution:
a. Overall, there may be some useful information that ALL could be detected in SSI during certain months but the description and interpretation are confusion and may contain significant errors.
b. Method: I am not sure if the authors fully understand the statistical concepts used in the analysis. Their response does not give the impression that they do. Stating that the 5% significance level and the 95% confidence interval are the same thing of different terminologies is a clear example. The lack of proper understanding in the statistical concepts makes it difficult for the authors to properly describe and interpret what the analyses tell them. Their statement (lines 183-187) "also, the signals of the pseudo-observation … partially attributable to the AER forcing". Every panel in Fig 3 shows very clearly that AER distribution is almost the same as that of noise. Thus any claim that AER can be detected in the observation, and more importantly the observed changes can be attributed to (even in part), cannot be correct, it does not matter if the observed s/n is within or outside the 95% range of AER. The distributions of ALL, ANT, and GHG overlap with that of noise but they are also quite away from the central of the noise distribution. One may be able to argue that if the observed S/N is above of the 95th percentile of the noise distribution and if that is also within certain range of ALL/ANT/GHG, one could make a detection claim. But to do so, one needs to clearly define what the threshold is to make such a claim to avoid misinterpretation.
Thank you for pointing out the potential errors in the methodology. Point-by-point responses are below: i.
Regarding the 5% significance level and 95% confidence interval terminology In the previous revision, we actually replaced "95% confidence level" with "p = 0.05". In the previous round of comments, the reviewer said that "this paper would state to have detected at the 95% confidence interval". Since we only ever stated detectability at "95% confidence level", we believed "confidence level" was what the reviewer meant, but used the "95% confidence level/interval" wording in the response letter in order to echo the reviewer's wording. We are sorry that it has caused further confusion.
We did sate in the response letter that "when a test statistic is within the 95% CI, there is no statistical significance; when outside, there is statistical significance at a 5% level (total probability is always 100%, hence 100% − 95% = 5%)". This we believe to be correct. For example, to quote a 2016 paper on statistical concepts 1 : "The effect sizes whose test produced P > 0.05 will typically define a range of sizes … that would be considered more compatible with the data … This range corresponds to a 1 -0.05 = 0.95 or 95% confidence interval." ii.
Regarding the similarity between the AER and the noise distributions Thank you for pointing this out. This is indeed a problem that we failed to recognize. In the revised manuscript, we added the following to lines XXX-XXX of the main text: "However, the 95% CIs of the AER signals were very similar to the noises, suggesting that the effects of AER on SSI changes are too small and uncertain to warrant attribution." However, we think the potential existence of AER contributions to the signals cannot be completely ignored. In Fig. 3, the significant overlap between the 95% CI of the AER forcing and the noise suggested that the AER forcing has no effect on SSI during our period of interest. But the AER signals and the noises in Fig. 3 were calculated by projecting the simulations on the ALL fingerprints, which mainly reflects the GHG effects. When the AER signals and the noises were projected onto the AER fingerprints, the separation between the AER signals and the noises was improved, especially when the pseudo-observation signals were detected ( Figure  A1i-k, u-w). We added lines 209-211 (lines 213-215 in the tracked version) to the main text to reflect this difference. Furthermore, at least two previous studies that used different detection and attribution (D&A) methods suggested potential contributions by the AER forcing to drought or soil moisture 2,3 . Current D&A of AER effects on hydrological changes is still impaired by the spatiotemporal complexity of aerosol forcings and model inadequacies in aerosol indirect effects 4 . Therefore, we think it is beneficial to report the potential AER contributions along with the caveats in order to promote future studies in this area. Figure A1. Signals of the pseudo-observation (Mean NonCMIP), the shapes and 95% CIs of the Gaussian distributions fitted to the signals of the AER simulations, and the 95% and 99% CIs of natural internal variability (noise). All the signals were for 1971-2016 and the noises were for the window length of 46 years. All the signals and noises were projected on the AER fingerprints.
c. Interpretation: in addition to problems mentioned in a), there are also issues in the meaning of the fingerprints, in particular the lack of physical understanding/interpretation of fingerprints. As described in page … and shown in fig 2, the fingerprints for AER and GHG are very similar for many months and latitudes. But what are the physics for AER and GHG to behave similarly (wet or dry for the same latitude and during the same months)? What is the implication of this in the interpretation of attribution claim?
Thank you for these questions. The fingerprints in our D&A method can be interpreted as the large-scale spatial signatures caused by the forcings 5 . We added the following to lines 143-146 (the same lines in the tracked version) to the main text: "In pattern-based D&A analysis, the fingerprint represents the large-scale spatial signature caused by the forcings, and higher S/N ratios means greater tendency of the spatiotemporal field of interest (here the SSI) to become similar to the fingerprint over time ." The SSI fingerprints were usually similar to the SSI trends, and we intended the discussion on the mechanisms of the SSI trends to be also applicable to the SSI fingerprints. We edited lines 154-169 (154-171 in the tracked version) and line 298-317 (lines 307-333) of the main text to better reflect this intention.
The AER and GHG fingerprints show similarities in Jan-May in the northern mid-latitudes, and Jun-Dec in the northern high-latitudes and southern low-and mid-latitudes. The Jan-Jun AER fingerprints of the surface soil layer, and Apr-Jun AER fingerprints of the root-zone soil layer are not associated with statistically significant trends in the principal components (manuscript SI Fig.  S4). Therefore, the AER fingerprints of these months and soil layers do not reflect directional changes in the SSI pattern like the GHG fingerprints. We added lines 161-163 (lines 162-165 in the tracked version) to the main text to note this. The Jul-Dec AER fingerprints of the surface layer and the Jul-Mar fingerprints of the root-zone layer are associated with significant trends in the principal components (manuscript SI Fig. S4). Still, their similarities to the GHG fingerprints should be accidental and caused by different mechanisms (see next paragraph). Also, any similarity between the GHG and AER fingerprints is not a problem for the D&A, because the D&A involves using the entire latitudinal pattern, which is always different between GHG and AER. Figure A3 and Figure A4 show that the latitudinal trends in the drivers of soil moisture (precipitation, temperature, leaf area index, snow water equivalent) are very different between the GHG and AER simulations. In the GHG simulations, the precipitation trends are generally positive in the Northern Hemisphere, which would induce SSI wetting trends ( Figure A3). Therefore, the widespread SSI drying trends must be driven by the increasing evapotranspiration caused by increasing temperature and leaf area index, and decreasing snow water equivalent that exposes more soil to evaporation ( Figure A3). In the Southern Hemisphere, increasing temperature and leaf area index, and decreasing precipitation both contributed to drying SSI in the GHG simulations ( Figure A3). In contrast, in the AER simulations, temperature is only increasing above 30 o N and decreasing elsewhere, and the AER-forced trends in the drivers of SSI are mostly weaker than in the GHG simulations ( Figure  A4). The increasing temperatures above 30 o N is centered in Europe ( Figure A4), which is opposite to the effects of aerosol loading in Europe 4 and consistent with decreasing sulfate emissions in Europe after 1970 6 . The small patch of drying in the AER fingerprint between 0 o and 20 o N (Fig. 2) corresponds to a band of decreasing precipitation ( Figure A3) that is centered in East Asia ( Figure A4). This is consistent with the increasing aerosol emissions in Asia after 1970 and the weakening effect of aerosols on monsoon circulation 4,6,7 . The drying in the AER fingerprint in the Southern Hemisphere (Fig. 2) corresponds to decreasing precipitation in the Amazon and Brazilian Cerrado regions and the Southeast Asian islands ( Figure A4). The former two are consistent with previously simulated effects of aerosols reductions in the US and Europe 8 . The last region is subject to the influence of monsoon circulation like East Asia. We edited lines 308-317 (lines 318-333 in the tracked version) in the main text to briefly reflect these mechanisms. Figure A2. Trends in latitudinally averaged precipitation (pr), air temperature (tas), leaf area index (lai), and snow water equivalent (snw) over 1971-2016 in the GHG simulations. The trends were calculated using least squares over each simulation, averaged over the ensemble members of each model, and then averaged over the models. Vertical stippling indicates where more than 90% of the models agreed with the displayed sign, and horizontal stippling indicates more than 80%. Figure A3. Trends in precipitation (pr), air temperature (tas), leaf area index (lai), and snow water equivalent (snw) over 1971-2016 in the AER simulations. The trends were calculated using least squares over each simulation, averaged over the ensemble members of each model, and then averaged over the models. Vertical stippling indicates where more than 90% of the models agreed with the displayed sign, and horizontal stippling indicates more than 80%. Figure A4. Trends in precipitation (pr), air temperature (tas), leaf area index (lai), and snow water equivalent (snw) over 1971-2016 in the AER simulations. The trends were calculated using least squares over each simulation, averaged over the ensemble members of each model, and then averaged over the models. Vertical stippling indicates where more than 90% of the models agreed with the displayed sign, and horizontal stippling indicates more than 80%.
2) Emergent constraint: a. Overall, the emergent constraint is confusion and does not seem to add any value.
The value of emergent constraint in this study lies in giving better estimate of the future projected S/N ratios than the simple median or unweighted average over the models. Without applying the emergent constraint, the median modeled S/N ratios often have considerable positive biases compared to the observed S/N ratios (compare the median of the ALL distributions to the observation in Fig. 3, and the dashed versus solid lines in Fig. 4). The modeled S/N ratios are probably too high, because the modeled signals and noises have a common source of error, i.e., the shared model physics. The observed S/N ratios use observation-based signals but modeled noises, and thus do not have this common error. By constraining the future modeled S/N ratios, we avoid drawing too strong conclusions about the strength of future S/N ratios. We edited lines 220-227 (225-232 in the traced version) in the main text to better reflect this rationale.
b. The description of the method is confusion and does not seem to be consistent with the description of the results. Lines 604-608 indicate the regression was done between modelled future periods and modelled historical period, but lines 211-213 suggest to relate to the (observed) historical ratio to correct model bias. If the regression is meant to correct model biases, one would expect the future ratio to regress on to observed rather than the modelled ratio.
We indeed performed the regression between historical and future modeled S/N ratios by treating the historical-future pair of each individual model as an x-y pair in the regression. Then, we plugged the observed historical ratio as an x-value into the same regression to generate point estimates of y-values ("constrained future S/N ratios"), as well as corresponding uncertainty intervals. This is standard practice in emergent constraint, which is a wellestablished method to reduce the uncertainty in future climate projections [9][10][11] . We edited lines 227-232 (lines 232-238 in the tracked version) in the main text to better describe the emergent constraint method, which is also described in the Methods section.
The emergent constraint method is therefore different from traditional bias-correction, which extrapolates the historical biases of models to correct the future projections of the same models.
As noted by the reviewer in point c, this extrapolation makes unrealistic assumptions. Emergent constraint makes different assumptions, which we detail in the response to point c. To avoid confusing the readers with bias-correction, we paraphrased all the bias-correction wordings using "constrained" or "emergent constraint" in the revised text.
c. While one may use the observed ratio to correct modelled ratio for the same historical period, one cannot simply extrapolate that adjustment factor to the future period without making explicit assumption such as same warming rate or soil moisture response rate because the s/n ratio is computed for the same fixed 46 year period. But regressing modelled future period's ratio to the modelled historical ratio does not make sense at al. If the signal is sufficiently strong and if the warming rate is consistent along time (assuming soil moisture respond proportionally to warming rate) one would expect the same s/n ratio from the same model, any difference is simply noise/uncertainty. On the other hand, if future warming rates differ from the past (which is unlikely the case under the SSP8.5), the difference may also be affected by future warming rate which shall not be corrected.
As mentioned above, the emergent constraint method does not extrapolate any historical model-to-observed adjustment factor to the future period. Instead, emergent constraint extrapolates the historical-future relationship from the model world to the observational world.
This extrapolation requires the models to be sufficiently realistic and the relationship between the historical and future variables of interest (in our case the S/N ratios) to be physically valid. Sufficient realism of the models is required by the entire D&A process, and the issue is discussed in Section 2.1 and Section 2.2 lines 170-175 (172-177 in the tracked version) of our manuscript. Physical justification of the historical-future relationship in the S/N ratios is in lines 232-244 (lines 238-251 in the tracked version) of the main text. We edited these lines slightly to reflect the warming rate and soil moisture sensitivity discussion below.
Regarding the two situations that the reviewer describes, if the warming rate and the soil moisture's sensitivity to warming are the same for any given model between the historical and future periods, the situation works well for emergent constraint. If all the modeled S/N ratios are constant and the models are realistic, then it seems reasonable to believe that the best estimate of the yet-to-be-observed future S/N ratio is the historical observed S/N ratio. The emergent constraint will give a 1:1 regression equation between the modeled historical and future S/N ratios, and this regression equation will indeed give a constrained future S/N ratio that is equal to the historical observed S/N ratio.
If the warming rates and the soil moisture's sensitivities differ between the historical and future periods, there can still be an emergent constraint between modeled historical and future S/N ratios, if the models with higher (lower) historical warming rates and historical sensitivity continue to have higher (lower) future warming rates and future sensitivity. In this situation, it seems reasonable to believe that higher (lower) historical observed S/N ratios implies higher (lower) future to-be-observed S/N ratios, which is what emergent constraint will give. A past study already demonstrated historical-future relationship in modeled warming rates 10 . The historical-future relationships in the sensitivity of S/N ratios to warming rates are shown for 2025-2070 in Figure A5; other future periods have similar results. Although the linear relationships of Dec-Apr for the 0-100 cm soil layer do not have positive slopes, the constrained S/N ratios of these months are very close to, and therefore no worse than, the simple multimodel median S/N ratios (manuscript SI Fig. 4). In May-Aug when the future S/N ratios considerably exceed the historical ranges (manuscript SI Fig. 4), the positive linear relationships are always clear. These results support the validity of the use of emergent constraint in this study. We edited lines 232-244 (lines 238-251 in the tracked version) of the main text to reflect the new information in Figure A5.