## Introduction

Climate extremes pose a global threat to agricultural production and future food, feed, and fiber security. The economic cost of crop yield losses inflicted by extreme climatic events between 2003 and 2014 in developing countries around the world was about $80 billion1. In the U.S., the number of extreme weather incidents costing more than$1 billion has steadily increased in recent decades2. Projections of climate change suggest that with rising average global temperature and the impact of warming weather on the water cycle, the frequency of dry spells will increase in subtropical and lower mid-latitude terrestrial regions. While the average precipitation is expected to decline in most regions, light and moderate rainfall will shift to less frequent higher-intensity events3. This is expected to increase the precipitation contrast between dry and wet regions and seasons, leading to further increases and decreases in storm erosivity in wet and dry regions, respectively4,5. It is thus likely that the adverse impact of large seasonal and annual weather variability on natural resources and field crop production will be more pronounced, even in the near term6.

The U.S. southeastern and lower southern plains—a region known as the cotton belt—comprise almost 17% of global upland cotton production, representing over \$6 billion in total revenue and approximately 38% of the world’s exports of raw cotton fiber7. The cotton belt also outranks the rest of the country in billion-dollar extreme weather disasters2. Rising summer temperatures (especially during the night) are also evident in the vast majority of the cotton belt8. Although cotton is considered a drought-tolerant plant, warm summer nights have a documented impact on yield losses by stimulating respiration more than photosynthesis9. Altered rainfall distribution has also led to more frequent soil moisture deficits during the growing seasons3, while higher off-season heavy rainfalls have raised concerns about potential erosion and nonpoint source pollution.

Far-reaching agroecological and socioeconomic consequences of climate extremes demand a systems approach to climate-resilient agroecosystems10. As an integral component of the complex multidimensional ecological resilience concept increased agroecosystem resilience to climate perturbations requires greater emphasis on improving the system’s self-regulatory capacity11. Soil, as a central linkage of interconnected atmospheric and terrestrial domains, is a primary element of agroecosystem resilience to adverse climate conditions. The soil system does climate-buffering through regulating the hydrologic cycle, energy exchange, and temperature12,13, thereby alleviating stresses on plants and biota. The climate adaptability of the soil system is strongly related to intrinsic attributes while careful land use and judicious soil management have a documented impact on soil structural stability, yielding greater permeability during heavy rainfall, enhanced water storage in drought, and improved gas exchange for biological respiration and heat acclimation14,15. Such management practices offer a great potential to enhance the soil system’s capacity to resist and recover from climate perturbations, thereby maintaining vital soil functions and services.

Among the numerous benefits of conservation agricultural practices such as minimized soil disturbance, cover cropping, and crop diversification,16 has suggested that improved soil organic carbon (SOC), soil aggregation, nutrient cycling, and biodiversity are the most important factors enhancing soil resilience and recovery potential, either directly or by invoking other processes. However, the extent to which those qualities can uphold soil productivity under various types, severities, duration, and timing of climate extremes is poorly understood. The historical and prospective impact of increasing climate extremes on agroecosystem productivity and environmental services have been extensively reported in the literature17,18,19. There is still a lack of advanced knowledge, however, about the potential of agricultural practices to enhance soil system adaptability to climate extremes6.

Soil resilience comprises a key component of both “soil quality” (or “soil health”) and “soil degradability”, although certain discrepancies exist. For example, the common soil quality assessment frameworks barely consider the prolonged consequences of soil degradative processes (e.g., soil erosion) and changes in short- and long-term soil productive, environmental, and capital functions when exposed to extreme climate events6. In these models, therefore, anthropogenic soil disturbances such as tillage, machinery traffic, or biomass removal are the primary stressors that agroecosystems and their managers must deal with to maintain “soil quality”10,20. Likewise, indicators of soil degradability rarely address the soil’s capacity to recover from environmental stresses. Therefore, under average climate conditions with a low proportion of extreme to near-normal years, soil quality indicators may well represent the sustainability of soil management systems. However, with the manifest increase in extreme weather conditions, integrating the soil degradation process into soil quality can provide operational insight into whole-system attributes of soil resilience and sustainability.

Several methods have been recommended for assessing soil resilience, the majority of which are based on soil “malleability”, defined as changes from the initial state in individual soil quality indicators after exposure to stresses16,21,22. This approach can be useful when a static snapshot of a soil-specific situation at a given time and space is desired. However, it neglects many spatiotemporally dynamic characteristics of the soil system and stressors, as well as the external factors influencing soil resilience. Among those factors is the interaction between multiple soil properties (soil function), recovery rate (elasticity), terrain characteristics, stressor attributes (e.g., severity, magnitude, timing, duration, and target), and soil surface cover. A quite different approach to characterizing soil resilience is evaluating soil capacity to sustain its productivity23, environmental24, and capital25 functions under adverse climate conditions. The soil productivity function (e.g., biomass) is a surrogate for the multi-faceted interactions among soil–landscape–climate properties and processes that determine the capacity of soil to uphold production stability in unfavorable climates. In addition, soil productivity functions can represent the effect of physical damage due to climate extremes on yield. Finally, external inputs such as fertilizers, lime, and irrigation can enhance production stability in the face of climate stresses without considerably contributing to the soil self-regulatory capacity. As such, a measure of soil resilience based on productivity functions should be informed by detailed knowledge of the type and level of external inputs.

Climate extremes are complex natural phenomena with a wide range of definitions based on the exposed resource and spatiotemporal extent. The literature displays large inconsistencies regarding the conceptual and operational definition of drought, making it difficult to compare and synthesize results26. Defining drought based on the local divergence of climate parameters from long-term normals can considerably reduce the applicability of study outcomes across scales. The use of standardized drought indices has been suggested by26 as an effective approach to resolve this issue. Drought indices assimilate data on precipitation, evapotranspiration, streamflow, and other weather inputs into a single numerical value, allowing for multi-scalar evaluation of weather anomalies. These indices are currently used by programs such as the U.S. Drought Monitor and the European Drought Observatory for drought monitoring and early warning at regional to global scales. However, these climate-based early warning systems may not be well-suited to predict the risk of crop failure across plants and management systems. Plants vary largely in drought/wetness tolerance and heat and moisture demand across phenological stages27. The USDM outcomes are utilized by the United States Department of Agriculture (USDA), the Farm Service Agency (FSA), and the Internal Revenue Service (IRS) to support decisions on eligibility for low-interest loans, livestock forage programs, and tax deferrals28. Assessment and prediction of production vulnerability, and of responsive environmental, financial, and social programs require the drought assessment systems to be calibrated for crop- and management-specific climate adaptation potentials. However, it should be acknowledged that developing such systems requires information input from multiple rather than a single long-term experiment as presented here.

Developing climate-resilient agricultural practices is the anchor of many national and global climate adaptation frameworks29,30, but the recommended practices are often not based on experimental evidence and do not provide operational details, so remain generic and impractical6. The major limitation in developing evidence-based strategies is the lack of long-term soil and crop data under diverse management systems across a representative range of different climate conditions. Long-term field experiments with supervised management systems and controlled biotic stresses—e.g., pest, diseases, weed pressure—are essential in understanding the dynamic effect of management systems on soil adaptive capacity under favorable and unfavorable environmental conditions.

In this study we address the long-term impacts of various crop-management systems on the stability of cotton yields in the face of climate extremes. We do this by utilizing twenty-nine years of cotton yield data across thirty-two management systems, along with shorter-term measured soil biophysical properties, and a suite of the most common drought indices to (1) identify the most relevant climatic variables explaining long-term fluctuation in cotton yield; (2) identify the management systems resulting in the greatest yield stability across extreme climate conditions; and (3) better understand underlying soil mechanisms controlling soil resilience and consequently upholding yield stability in extreme climate conditions. Here we show that among climate indices, cumulative standardized precipitation index (SPI) and standardized precipitation evapotranspiration index (SPEI) during the growing season best explains the long-term mean cotton yield variabilities. Among the raw climate variables, mean soil temperature (MST) during the entire growing season, and especially low values of MST in August was the most critical yield-limiting factor. Here we show that throughout the May–October growing seasons, drought severity in August has the most detrimental effect on cotton yield. Our results suggest that long-term no-tillage (NT) is a practice that maintains yield stability in unfavorable climate conditions while enhancing crop yield in favorable years. Across N rates, legume cover crops—particularly hairy vetch (HV)—maintained higher yield than cereal and no cover crop treatments in near-normal and relatively dry (RD)/wet years. However, they were not effective in maintaining yield stability in dry and wet years. High SOC concentration is a common characteristic of high-yielding practices in near-normal and RD/wet years. However, yield stability in dry and wet years is more strongly linked to higher wet aggregate stability (WAS) and N content.

## Results

### Relationship between climatic factors and long-term yield variability

The relationship between the time series of overall cotton yields and climate variables varied largely across drought indices/climate indicators and cotton developmental stages (Fig. 1). Drought indices with a varying degree of association showed a quadratic relationship with the long-term mean cotton yield. In general, the coefficient of determination was higher for management systems with higher yields. Among all indices, cumulative SPI ($${R}^{2}$$ = 0.68) and SPEI ($${R}^{2}$$ = 0.63) during the growing season exhibited the strongest relationship with the mean cotton yield across management systems (Fig. 2). By accounting for the effect of potential evapotranspiration (PET) in water balance, SPEI led to one unit greater drought severity compared with SPI, resulting in the x-axis difference in Fig. 2. Throughout the May–October growing seasons, drought severity (represented by SPI, SPEI, and Palmer Z-index) in August had the most detrimental effect on cotton yield (Fig. 1a). Long-term cotton yield showed an identical responses to PET and degree days (DD60’s). The relationship between long-term cotton yield and raw climate variables including monthly air temperature and cumulative precipitation was similar to those of drought indices, but with lower $${R}^{2}$$ values (Fig. 1.b). However, yield responded differently to air and soil temperatures. MST during the entire growing season ($${R}^{2}$$ = 0.45), and especially low MST in August ($${R}^{2}$$ = 0.38) was the most critical yield-limiting factor among the raw climate variables. Air temperature in either individual months or during the entire growing season was poorly related to the cotton yield. Among the possible 37 input variables, the MARS procedure selected SPI-GS (SPI in the growing season), SPEI-GS, and MST-GS to explain 78% of the variability in long-term cotton yield. Diagnostic plots and model equations are given in Fig. S1. The selected variables were utilized for the hierarchical clustering of the experimental years into different climate conditions. The resulting dendrogram grouped experimental years into six clusters, including dry (D; cluster A; 2 years), RD (cluster B; 3 years), near-normal (NN; cluster C; 9 years), high-yielding near-normal (NNhy; cluster D; 7 years), relatively wet (RW; cluster E; 5 years), and wet (W; cluster F; 3 years) years (Fig. S2).

### The impact of management system on yield stability across the climate conditions

Long-term mean cotton yield in the current experiment showed a strong correlation with the published mean yield across Tennessee (r = 0.80) and to a smaller extent with the mean yield across the U.S. Cotton Belt (r = 0.60)31 (Fig. S3). Considering the yield outcome in low- and high-yielding years and disregarding the source of environmental stress, management systems with NT showed the most consistent effect on yield stability in poor environmental conditions (Fig. 3e, f, i, j). NT also increased yield potential for management systems with legume cover crops (Fig. 3a, b). Legume cover crops, in turn, decreased the yield gap among the management systems with a range of N application rates (Fig. 3g, h). Across drought severities, NT maintained a higher cotton yield than CT. The cover crop effect was significant in RD, near normal (NN), and near-normal high yield (NNhy) years, but not in dry (D) and wet (W) years. The N rate and N rate by cover crop interaction were significant in all climates except for the dry years (Table 1 and Table 2). In dry years, where the drought severities exceeded −3.4 (SPI) and −5.1 (SPEI) during the growing seasons, cotton yields under all management systems fell below 50% of the global mean yield (1 Mg ha−1) (Fig. 4). Within the lower 10 percentile of global mean yield (up to 0.57 Mg ha−1), NT significantly lowered the risk of yield losses compared with CT, whereas the effect of cover crop, N rate, and their interaction was not significant (Table 1). In general, the lowest probability of crop failure was observed within RD, NN, and NNhy climates with drought/wetness severities ranging between SPI scores of −1.7 to 0.46 and SPEI scores of −2.2 to 0.78 (Fig. 4). Within these climates, higher N rates and legume cover crops predominantly controlled higher yield outcomes. However, in extreme climates, both in dry (D) and wet (W) years, NT was the management that maintained higher cotton yield, as shown by the F-values in Table 2. The most significant contribution of NT to higher yield stability occurred in wet climates, where NT systems yielded on average 22% more cotton than CT systems. In W (SPI averaging 2.92 and SPEI averaging 3.61) and RW (SPI averaging 1.27 and SPEI averaging 1.90) years, the cover crop effect was not significant, but a minimum N rate of 34 kg ha−1 remained critical to maintaining yield stability. Figure 5 demonstrates the yield benefits of tillage, cover crop, and N rate compared with the corresponding control treatments in different climates. In all climates except for wet years, HV planted on either CT or NT resulted in greater yield benefits than NC (control no cover crop treatment). CC, the other legume cover crop, outperformed NC in D, RD, NNhy, NN, and RW climates, but only when it was planted in an NT system. However, management systems with both HV and CC cover crops exhibited lower yields than those with NC in wet years.

### Relationship between soil resilience indicators and yield stability across drought and wetness severities

Among soil resilience metrics, SOC provided the strongest explanation (R2 = 0.87) for yield stability under a range of drought and wetness severities (Fig. 6). ANOVA tests showed that intercepts, linear and quadratic slopes, and their interactions with climates differed (P < 0.001) among climate conditions (Table 2). The linear slope indicated 17 and 16% decrease in the probability of yield losses per unit increase in SOC from 7 up to 10 g kg−1 in RD and NNhy climates, respectively. In RW and NN climates a unit increase in SOC lowered the probability of yield losses by 12 and 10%, respectively. In W climate the corresponding decrease in the risk of yield losses was 5%, while SOC had a negligible effect on increasing yield stability in D climate. Total bound N (TNb) explained 78% of the variability in cotton yield stability across climate conditions (Fig. 6), with the intercept, common linear slope, and linear by climate interaction differing (P < 0.05) among climate conditions. Although the common quadratic slope was significant, it did not considerably relate to yield stability in the observed range of TNb. The effect of TNb on yield stability in different climates revealed an almost similar effect to that seen in SOC. During NNhy and RD years, greater TNb levels effectively reduced the risk of yield losses by 6.0 and 5.9% per unit increase in TNb, respectively. In contrast to SOC, increases in TNb concentration up to 200 mg kg−1 resulted in 5.0 and 5.3% reductions in the probability of yield losses in D and W years, respectively. Small differences in linear slopes among climate categories indicated that of the soil resilience metrics, WAS was the most critical factor in decreasing the risk of crop failure in the most extreme (W and D) climates. WAS explained 71% of the variability in cotton yields stability across climate conditions. Within the WAS values from 25 to 39%, a unit increase in WAS lowered the probability of yield loss by 2.4% in NNhy and RD climates. The corresponding values for RW, NN, W, and D climates were 2.2, 2.2, 2.1, and 2.0%, respectively (Table S1). The POxC and MB exhibited similar relationships with the yield stability; with increasing POxC and MB values a decreasing common linear yield loss slope was evident (P < 0.05), but the slopes did not differ among different climates. DOC did not significantly relate to yield stability.

## Discussion

The rainfed water supply and controlled biotic stresses in this long-term experiment provided a unique opportunity to observe the direct impact of true climate perturbations on yield. In general, soil temperature anomalies in August (optimum MST $$\approx$$ 27–31 °C) and rainfall deficiency in August (optimum $$\approx$$ 100–140 mm) and to some extent in June (optimum $$\approx$$ 50–100 mm) were the most critical yield-limiting climate factors. Considering the cotton growth life cycle in the study region, August is the period of peak bloom to open boll formation, which coincides with peak water consumption by cotton plants due to high transpiration and evaporative water demand. Along with the sufficient water supply, adequate accumulation of heat units upon the appearance of first bloom is essential for achieving boll maturity and fiber quality32. Likewise, the first square to first bloom development occurs mostly within June and is a critical time for avoiding water deficit stress. Cotton vegetative growth is rapid within the three weeks following the emergence of the first square, requiring rapid uptake of phosphorous (P) and potassium (K) through the aqueous medium33. Water deficiency within this period can adversely affect the biomass production and the number of fruiting sites as well as the earliness, which is particularly important in short-season environments. The effect of sub-seasonal and seasonal climate fluctuations on the variability of cotton yield was well-addressed by SPI, SPEI, and MST indices/indicators. Management systems with higher global yields generally showed a stronger relationship with climate indices. This indicates that high-yielding management systems diminish the adverse effect of other environmental stresses on yield and boost greater yield outcomes with improving climate conditions. SPEI index includes the PET, but this did not improve the predictability of yield variability over that from SPI. This can perhaps be attributed to the indeterminate relationship between PET and DD60’s. Though rising air temperature during critical cotton growth stages may enhance the evaporative loss of water, sufficient heat unit accumulation (DD60’s $$\approx$$ 2300) is essential to supporting many physiological processes that eventually determine cotton yield. As shown in Fig. 1, PET, DD60’s, and Tmean across cotton growth stages demonstrated weak but very similar relationships with yield variability. The quadratic relationship between PET and yield variability implies that, for example, in August, there is an optimum range of PET that maintains a high yield, while lower and higher values lead to yield losses. This is unlikely however that low PET in an RD month like August in such a rainfed system causes yield decline. This inconsistency is likely since low PET means low degree days due to the similar effect of air temperature on PET and DD60’s, but their contrasting effects on yield outcome. Alternately, both seasonal and sub-seasonal variations in MST were more strongly related to the yield variability (Fig. 1). In addition to the incidental solar radiation, the soil temperature is strongly affected by soil biophysical properties such as texture, compaction, organic matter content, and surface cover. Management can modify the majority of these factors, thereby altering the soil water regime, so soil temperature variations may not be well represented by air temperature variations34. Maximum yield outcome was obtained when MST within the time from first bloom to open boll development varied between 27 and 31 °C. Soil temperatures outside this range caused significant yield reductions. Residue cover left by conservation tillage systems such as NT can effectively buffer extreme summer heat by insulating the soil surface and enhancing albedo35, though this effect may gradually diminish throughout the growing season36,37. Accelerated residue retention by cover crops can further strengthen the heat buffering capacity of NT soil. Moreover, reduced soil manipulation stimulates organic matter accumulation in the topsoil and increases micropore volume, which collectively contributes to a higher soil water retention capacity38,39. Except for the two extremes of near-saturated and near-dry conditions, within a wide range of soil moisture contents water and vapor movement occurs simultaneously with temperature gradient and vice versa. A wetter soil by virtue of its greater thermal conductivity and heat capacity can decrease the diurnal temperature cycle amplitude. Therefore, soil temperature may relatively well represent the management effect on heat regulation. However, further long-term temperature monitoring at different soil depths is needed to fully comprehend the role of conservation practices on soil heat regulations and their impact on yield variability.

NT was the primary management practice boosting yield in favorable climates and reducing yield losses in unfavorable climate conditions. Even though the yield benefit of NT compared with CT was smaller in generally favorable climate conditions than the effect of higher N rates (67 and 101 kg ha1) or legume vs non-legume cover crops, NT exhibited a consistent yield benefit in all climate conditions (Table 1 and Fig. 5). In addition, NT increased the general effectiveness of cover crops—in particular legumes—in enhancing the yield outcome across climate conditions40 (Fig. 5). Moreover, NT considerably increased N use efficiency compared with CT. For example, management systems with 34 kg N ha1 on NT delivered on average greater yield benefits than systems with 67 kg N ha−1 on CT (Fig. 5). Reducing N fertilization by 50% while maintaining a high yield in all climate conditions (except for RD) has profound environmental and economic implications. For instance, it has been observed that net agronomic nitrous oxide emissions were governed by N fertilizer rate41. However, in the present study, the major N benefit of legume cover crops was associated with RD and near-normal years. In wet (W) and relatively wet (RW) years WW with the maximum N rate (101 kg ha−1) was associated with higher yields (Table 1). This is likely due to higher N loss through leaching, denitrification, and runoff in wet conditions. All wet (W) years in this study were characterized by heavy early growing season rainfall, particularly in May and June. Excessive early season rainfall—especially under cover crops with high residue mulch—is likely to adversely affect crop stand and reduce yield (Fig. 5). It is tied with low soil temperature due to lower heat exchange and high moisture content, which may hinder timely seed germination or plant establishment. Also, among wet years, late summer and fall storms in 2018 starting in September delayed harvest and caused major boll weathering and yield decline. Generally, the increasing global atmospheric temperature has been assumed to benefit crop production by extending growing seasons in some regions42. However, increasing surface temperatures evaporate more water into the atmosphere, which may cause yield losses by spawning more intense late summer and fall storms43,44,45. Combining WW cover crop either with NT or CT did not prompt significant yield benefits greater than NC in any climate condition. In general, the yield on CT management in most climate conditions remained highly dependent on enhanced N fertilization. The only exception was dry years, in which enhancing N rate beyond 67 kg ha−1 decreased yield in both CT and NT systems. A strong relationship between long-term mean cotton yield variability in the current rainfed experiment and the broader regional yield data indicates that yield at a regional scale is still dictated by climate conditions, especially in wet and dry years (Fig. S3). Although this may not be fully addressed through a single long-term experiment, this relationship signifies the importance of climate early warning systems that account for the crop- and management-specific vulnerabilities to climate extremes across the phenological stages. Thus, the precise auxiliary practices (e.g., irrigation, drainage, fertilization, cover crop planting and termination, and harvest scheduling) could be implemented on time to avoid considerable economic losses.

Our results suggest that greater SOC concentration is key for improved yield stability in NN and RD and wet climate conditions. The minimum threshold of 10–15 g kg−1 SOC suggested by46 to maintain soil resilience, restore soil biodiversity, and improve structural stability is in perfect accord with our observations (Fig. 6). However, there was a minimal effect of higher SOC on yield stability in dry and wet years (Table S1). Improved soil structural stability as represented by WAS proved to be the primary soil attribute linked to maintained yield stability under extreme climates, particularly in wet years. Soil aggregates physically protect terrestrial carbon pool and the level of aggregation and their stability determine the severity of fluvial, aeolian, and gaseous soil, carbon, and nutrient losses47. Climate forces such as kinetic rainfall energy or aeolian abrasion are known to disrupt soil aggregates. The resultant particles are prone to transport, carrying nutrients and C, and forming surface crusts. This process can promptly and chronically decrease productivity by reducing available water and nutrients, depleting SOC, and impeding air and water circulation. The positive relationship between SOC and WAS in this study (r = 0.65) corroborates previous findings on the linkage between SOC and WAS38,47,48. Soil total bound nitrogen, TNb was strongly related to yield stability in extreme climate conditions (Fig. 6 and Table S1). Besides being an essential nutrient for plant growth, nitrogen along with phosphorus and sulfur are important elements in the process of biomass humification49, which in turn contributes to the longevity of soil structural stability and resilience. Retention and supply of N in acute climate conditions may increase soil resilience through multiple processes, such as the regulation of water and nitrate influx and efflux rate, maintaining air–water balance for continued N mineralization and nitrification, and heat regulation in dry periods to avoid N volatilization. The maintenance of the N balance is thus an important indicator of climate-resilient agroecosystems.

Previous studies have demonstrated the regulatory influence of conservation practices on key soil functions—e.g., water, heat, and nutrient fluxes, structural hierarchy and stability, carbon sequestration, and biological vitality and diversity15,37,38,50. Such a soil system is expected to provide improved resilience to extreme climate events such as torrent rainfalls, heat waves, and droughts, thereby reducing the climate stresses on plants and stabilizing crop yield. The effect of improved overall soil health on crop yield has been shown to become more evident over time51. However, there is a lack of sufficient long-term crop yield data from different conservation practices exposed to a representative range of climate conditions. Consequently, there is little evidence-based information regarding the role of conservation practices in the climate adaptability of agroecosystems. Our study provides insight into the long-term interaction between climate extremes and yield stability under 32 tillage, cover crop, and nitrogen rate practices. We demonstrate how management-induced changes in soil condition, control soil resilience which in turn supports yield stability during unfavorable climate conditions. Our findings indicate that long-term NT provides a consistent basis for greater yield stability across climate conditions. In addition, NT improves the effectiveness of cover crops, particularly legumes, in drought mitigation and crop yields maintenance. Moreover, NT provides those benefits with relatively lower dependence on nitrogen fertilizer which has important environmental and economic implications. We defined the drought and wetness severities based on standardized drought indices to enable cross-scale evaluation, comparison, and synthesis of results. Demonstrated crop-and management-specific vulnerability of crop yield to climate extremes provides insight into climate-resilient agroecosystem management strategies. However, it should be acknowledged that these results represent a limited geographic scale and soil resilience has been quantified based on a limited number of variables. Therefore, further long-term studies are required for improved quantification of the soil resilience concept and its linkage with crop yield stability under climate extremes.

## Methods

### Long-term experiment

This study was conducted using data from a long-term rain-fed plot-scale cotton experiment at the University of Tennessee’s West Tennessee AgResearch and Education Center in Jackson, TN, USA (35°37′N: 88°51′W, altitude 113 m). The humid subtropical (Köppen classification) climate of the study region is characterized by a mean annual temperature of 15.5 °C and mean annual precipitation of 1375 mm. The experiment was originally initiated in 1981 on a 2-hectare area. The complete set of current treatments has been in place since 1984. The experiment is located on Lexington silt loam (fine-silty, mixed, thermic Ultic Hapludalfs) with a field slope ranging from 0 to 2%. The experiment layout comprises a cross-classified arrangement of four nitrogen (N) fertilizer rates: 0, 34, 67, and 101 kg Na ha−1; four cover crop treatments: winter wheat (WW, Triticum aestivum L.), HV (Vicia villosa), crimson clover (CC, Trifolium incarnatum L.), and no cover crop (NC, “winter weeds”); and two tillage systems: conventional tillage (CT) and NT; thus totaling 32 treatment combination with four replications established in 128 plots in a randomized complete block design with split–split plot. The experiment was initiated in 1981, and the complete set of current treatments have been in place since 1984. Cover crops are terminated in late April to early May with 3.51 L ha−1 of Gramoxone® SL 2.0 (N, N-dimethyl-4,4-bipyridinium dichloride) (Syngenta Global). Within 10 days, plots are broadcast with ammonium nitrate (NH4NO3) at the assigned rates. The CT treatments are double disked to a depth of 10 cm and harrow leveled to prepare the seedbed. Cotton is seeded uniformly in all plots. Cotton cultivars planted through the course of the experiment included Deltapine 50, Stoneville 825, Stoneville 474, Deltapine 425, Deltapine 451, and Phytogen 375. In October of each year the cotton was harvested, ginned, and lint yield was recorded.

### Characterization of climate extremes

Historical daily precipitation and air temperature data were recorded by a meteorological station at the experimental site. Weather anomalies were characterized using both raw climatic data and drought indices. To calculate the drought indices, we used version 1.0.4 of the “climate-indices” software package for the Python language52. The drought indices used, included SPI, SPEI, palmer drought severity index (PDSI), self-calibrated PDSI (sc-PDSI), palmer hydrological drought index (PHDI), palmer moisture anomaly index (Z-index), palmer modified drought index (PMDI), and percentage of normal precipitation (PNP) (fig. S5). The SPI is the number of standard deviations of the cumulative precipitation from the long-term median. To calculate SPI, a three-parameter gamma probability density function is fitted to the time series of precipitation data. The associated probability density distribution is then transformed to a normal distribution53. The SPI can be calculated at time scales from ¼- to 48-months to reflect the availability of different water resources. We calculated SPI for ¼- and 1-month scales since plants respond relatively quickly to precipitation anomalies. The SPEI follows the same computational procedure as SPI but additionally accounts for the effect of temperature variability in drought assessment. The PET required by SPEI for the assessment of water balance was calculated using the Thornthwaite54 model. The PDSI is another well-known index estimating the duration and severity of drought based on monthly precipitation and temperature data as well as the soil’s plant available water content (PAWC). In this study, the mean PAWC value (0.18 cm3cm−3) across management systems was obtained from15. The sc-PDSI, PHDI, and Z-index55,56 are variants of the PDSI index with an improved analytical capacity to capture rapidly emerging drought and moisture anomalies, long-term drought effect on water storage and streamflow, and extreme weather events in heterogeneous environments including multiple micro-climatic sub-regions. Soil temperature measured continuously at 10 cm depth was also considered as a variable not associated with any specific index. Finally, cotton growth stages are often defined as the number of days after planting—but achieving proper growth at each milestone also strongly depends on the number of days with optimum air temperature. Degree days (DD60’s) is a measure of heat units that plants have accumulated before they undergo defoliation and is often used as an indicator of plant maturity. Therefore, DD60’s was also calculated as the cumulative average daily air temperature minus 15.6 °C (minimum air temperature in which cotton growth occurs) over monthly periods.

### Quantification of soil resilience

The multi-disciplinary nature of resilience in ecosystem studies has led to a broad spectrum of definitions and assessment metrics, but there is a lack of proper quantification of agroecosystem resilience, particularity from the soil perspective. We used a suite of soil physical, chemical, and biological properties commonly used for soil health assessment to foster insight into the underlying mechanisms governing soil resilience and consequently yield stability in extreme climatic conditions. The soil properties included SOC, permanganate oxidizable carbon (POxC), dissolved organic carbon (DOC), WAS, totally bound nitrogen (TNb), and microbial biomass (MB). During the growing season of 2019, a composite soil sample including three random sub-samples was collected from each plot (n = 128) at 0–10 cm soil depth. Sampling was conducted by a hand auger except for the aggregate stability analysis, for which clods were collected by hand and fractured gently into pieces to obtain the natural soil aggregates. The SOC was measured by dry combustion method at 950 °C using a CN analyzer (Elementar vario SOC cube in solid mode, Hanau, Germany). Following the method of57, POxC was measured by adding 20 mL of 0.02 M potassium permanganate (KMnO4) to 2.5 g of soil sample. The solution was shaken at 120 rpm for 2 min and allowed to settle for 8 min. The supernatant was extracted and diluted to 1/100. Finally, the absorbency rate of the dilute solution was measured at 550 nm by a microplate reader (Biotek Instruments, Inc., Vermont, USA). To measure DOC, 10 g soil sample was mixed with 40 mL de-ionized water. The solution was shaken for 30 min, then centrifuged at 3500 rpm for 20 min and passed through a 0.45 µm membrane. The DOC concentration was measured by a SOC analyzer following the method of58 (Shimadzu Instruments, Inc., North America). The WAS was measured using a wet sieving apparatus (08.13, Eijkelkamp Agrisearch Equipment, Giesbeek, the Netherlands). Natural air-dried soil aggregates sized between 1 and 2 mm were placed on 0.26 mm sieves and wetted through capillary water motion from a wet cloth placed on the sieve bottom. Samples were raised and lowered in distilled water for 3 min with a frequency of 35 times per min59. Thereafter, the unstable fraction was oven-dried and weighed and stable fractions were corrected for the sand content using 2 g L−1 of sodium hydroxide as a dispersant. The fraction of water-stable aggregates was calculated as

$${{{{{\rm{WAS}}}}}}=\frac{\left({{{{{\rm{on}}}}}}\,{{{{{\rm{sieve}}}}}}-{{{{{\rm{sand}}}}}}\right)}{\left({{{{{\rm{total}}}}}}\,{{{{{\rm{sample}}}}}}-{{{{{\rm{sand}}}}}}\right)}\times 100$$
(1)

For MB, soil sampling was conducted in a relatively wet soil condition. Samples were not sifted unless they displayed rootlets or debris, in which case these were removed by sifting through a 3 mm soil sieve. Samples were analyzed in triplicate and the average CV was less than 10%. The assay was by the microBIOMETER© (Prolific Earth Sciences, Montgomery, NY, USA) kit protocol using kit reagents and supplies. 0.5 mL of soil was packed into the microBIOMETER soil sampler and added to the 10 mL of the high salt extraction solution which loosens the attachment of the microbes to soil particles. Prior to whisking for 30 seconds, the clay plug of soil was broken up by mechanically stabbing the base of the tube with a stainless-steel spatula. Samples were allowed to settle for 20 min, during which time the soil precipitates, and the fluid above the soil is colored with the microbes which contain soil pigments. Six drops from the top 13 mm of the extract in the tube were applied to the test card containing a membrane that traps the microbes on the surface while allowing the fluid to wick away. The intensity of the color on the membrane correlates with MB. MB was estimated using the microBIOMETER app on a cell phone. Briefly, the phone takes a picture, and the MB is estimated from the intensity of the color of the test card. At the same time as the microBIOMETER test was performed, each extraction (three samples) was analyzed by digital microscopy by making three slides and taking ten pictures of each slide, and analyzing by particle size. Every soil sample was thus analyzed by microscope and test card a total of nine times. The microscopic data showed an excellent correlation with the test card data.

### Statistical analyses

#### Yield stability under extreme climate conditions

The yield data for the first 5 years (1984–1988) of the experiment was removed to only consider the more stable condition of management systems following establishment. So, we considered yield data from 1989 to 2018 for n = 371251. The only exception was 1993, for which yield information is not available (Supplementary Data 1). To assess seasonal and sub-seasonal drought influences on linearly de-trended cotton yield, all drought indices, and soil- and plant-specific drought indicators were calculated at a monthly time scale and also accumulated over the growing season (May 1 to September 30) to obtain the “drought severity”60. The resulting 53 variables were subjected to principal component analysis to reduce multicollinearity. Due to the non-linear relationships between yield and the drought indicators/indices, we used multivariate adaptive regression splines (MARS) models61 to identify the most relevant variables explaining yield fluctuations. The MARS model is a non-parametric approach to capture the nonlinearity aspect of singular and interactive features in multiple regression by assessing cut points (knots). We used the “earth” function in R62 to apply a threefold generalized cross-validation procedure for “forward” pruning of features to avoid overfitting. The variable selection procedure was continued until the nature of non-linearity and an optimum number of knots were identified. The model selected four variables to build a model with five “basis functions” (including the intercept) up to the second degree of interactions (Fig. S1). We then applied a hierarchical clustering with the “Ward” partitioning algorithm to cluster the experimental years based on the drought variables selected by the MARS procedure. The optimum number of clusters was determined using the “elbow” method, which restrains the number of clusters as a function of the total within-cluster sum of squares (WSS), such that adding more clusters does not improve WSS. The resulting six clusters were marked as two clusters of near-normal years, and one cluster each of RD, dry, relatively wet, and wet years. The yield stability was assessed separately within each of these climate clusters. We defined yield stability as the probability that yield of the management system i falls below the specified level of $$\lambda$$ (10, 50, or 90th percentile of the global mean yield) in a given environment j . This probability was calculated using the equation

$${\Pr }\left({y}_{{ij}}\le \lambda \right)=\int_{-{{{{{\rm{\infty }}}}}}}^{\lambda }1/{\sigma }_{{ij}}\sqrt{2\pi \,}{\exp }\left[\left({-\left(\lambda -{\mu }_{i}\right)}^{2}\right)/{2\sigma }_{{ij}}^{2}\right]d\lambda$$
(2)

where $${\mu }_{i}$$ is the mean yield and $${\sigma }_{{ij}}$$ is the variance of the ith system. The normality of yield distribution for all management systems was satisfied based on the Shapiro–Wilk test (W > 0.9).

### Soil resilience control on yield stability across climate conditions

Using indicator variable regression in SAS63, the effect of a suite of soil quality indicators (Supplementary Data 2) on the probability of yield losses below the fiftieth percentile of global mean yield was determined under each climate condition. Indicator variable regression combines analysis of variance and regression. In this process (also known as dummy regression), a regression line for each treatment level is modeled, allowing slopes and s to be compared. Here, the climate conditions (e.g., dry and wet) were considered as categorical variables. A full model with linear, quadratic, and all interactions with treatments was executed, giving

$$\left({{{{{\rm{Pr }}}}}}{y}_{{ij}}\le \lambda \right)= \, \mu +C{L}_{i}+{\beta }_{1}\times S{P}_{{jj}}+{\beta }_{2}\times S{P}_{{jj}}\times C{L}_{i}+{\gamma }_{1}\times S{P}_{{ij}}\times S{P}_{{ij}}+{\gamma }_{2}\times C{L}_{i}\,\times S{P}_{{ij}}\times S{P}_{{ij}}+{\epsilon }_{{ij}}$$
(3)

where the probability of yield losses was predicted at ith climate condition (CL) in jth soil attribute (SP), $$\mu$$ is the common intercept, $$\beta$$ and $$\gamma$$ are slope parameters, and $$\epsilon$$ is the error term. Using a stepwise procedure, non-significant terms (P < 0.05) were dropped. To obtain the parameter estimates, significant terms in the final models were reduced to only one term per component (e.g., intercept, and linear and quadratic slope) and common intercept was removed using “point” option embedded in the “DANDA.SAS” macro64.