Response of stratospheric water vapour to warming constrained by satellite observations

Future increases in stratospheric water vapour risk amplifying climate change and slowing down the recovery of the ozone layer. However, state-of-the-art climate models strongly disagree on the magnitude of these increases under global warming. Uncertainty primarily arises from the complex processes leading to dehydration of air during its tropical ascent into the stratosphere. Here we derive an observational constraint on this longstanding uncertainty. We use a statistical-learning approach to infer historical co-variations between the atmospheric temperature structure and tropical lower stratospheric water vapour concentrations. For climate models, we demonstrate that these historically constrained relationships are highly predictive of the water vapour response to increased atmospheric carbon dioxide. We obtain an observationally constrained range for stratospheric water vapour changes per degree of global warming of 0.31 ± 0.39 ppmv K−1. Across 61 climate models, we find that a large fraction of future model projections are inconsistent with observational evidence. In particular, frequently projected strong increases (>1 ppmv K−1) are highly unlikely. Our constraint represents a 50% decrease in the 95th percentile of the climate model uncertainty distribution, which has implications for surface warming, ozone recovery and the tropospheric circulation response under climate change.

Future increases in stratospheric water vapour risk amplifying climate change and slowing down the recovery of the ozone layer. However, state-of-the-art climate models strongly disagree on the magnitude of these increases under global warming. Uncertainty primarily arises from the complex processes leading to dehydration of air during its tropical ascent into the stratosphere. Here we derive an observational constraint on this longstanding uncertainty. We use a statistical-learning approach to infer historical co-variations between the atmospheric temperature structure and tropical lower stratospheric water vapour concentrations. For climate models, we demonstrate that these historically constrained relationships are highly predictive of the water vapour response to increased atmospheric carbon dioxide. We obtain an observationally constrained range for stratospheric water vapour changes per degree of global warming o f 0 .3 1 ± 0 .3 9 ppmv K −1 . Across 61 climate models, we find that a large fraction of future model projections are inconsistent with observational evidence. In particular, frequently projected strong increases (>1 ppmv K −1 ) are highly unlikely. Our constraint represents a 50% decrease in the 95th percentile of the climate model uncertainty distribution, which has implications for surface warming, ozone recovery and the tropospheric circulation response under climate change.
The stratosphere is extremely dry. This was first realized by Alan Brewer in his pioneering analysis of balloon measurements in the 1940s, where he reported that the atmospheric water content is found to fall very rapidly just above the tropopause 1 . It is now well established that average stratospheric specific humidity is around 3-5 parts per million volume (ppmv) globally, with substantial daily to decadal variations driven by volcanic eruptions 2,3 , convective overshooting 4 , monsoonal circulations 5 and climate modes such as the El Niño-Southern Oscillation Article https://doi.org/10.1038/s41561-023-01183-6 atmospheric circulation [22][23][24][25][26] , but also because of its key role in shaping atmospheric chemistry and stratospheric ozone recovery [27][28][29] . Changes in the thickness of the ozone layer, in turn, affect the tropospheric photochemical environment, air quality, human health and ecology 30,31 .
Historically, tropical lower SWV observations show-if anything-a slight decrease over the last three decades (Supplementary Fig. 1; refs. 15,32), at least until the recent Hunga-Tonga eruption 33 . In contrast, the majority of climate models show long-term increases in historical simulations (Supplementary Figs. 1 and 2;refs. 17,21). Given substantial model biases in background concentrations and seasonal cycle representations of SWV and UTLS temperatures 17,21 , one might therefore ask if these models can reliably project SWV for future scenarios.
(ENSO) 6,7 and the Quasi-Biennial Oscillation (QBO) 8,9 . Brewer also already suggested that the dryness of the stratosphere can be explained by a large-scale stratospheric overturning circulation nowadays referred to as the Brewer-Dobson circulation (BDC) 10 , where air is freeze dried to very low concentrations as it enters the stratosphere through the cold tropical upper troposphere and lower stratosphere (UTLS).
The freeze-drying process is now better understood than ever, for example, using air parcel trajectory models of the tropical UTLS 5,[11][12][13][14][15][16] . However, despite this qualitative understanding, there is still substantial model uncertainty in projections of future changes in stratospheric water vapour (SWV) [17][18][19][20][21] . The uncertainty is a pressing concern, not only because SWV is a greenhouse gas affecting surface temperature and the Year Year  reanalysis temperature (T) data from the European Centre for Medium-Range Weather Forecasts (ECMWF) for July 1994 at all five pressure levels used to predict tropical lower SWV (q strat ). b, Example of a temperature time series for the grid location highlighted in yellow on the 100 hPa map in a; the yellow dot indicates the July 1994 value. All 5° × 5° grid points within 60° N-60° S serve as predictors for q strat (t), defining a temperature matrix T(t). c, The observational q strat (t) time series. Using ridge regression, we learn predictive relationships between T covering all five pressure levels and q strat , considering simultaneous and lagged temperature data (τ max = 2 months). This process is applied to 150 combinations of temperature reanalyses and versions of q strat observations from 1990 to 2020 (Methods). The result is an ensemble of predictive functions that are consistent with the observational record. We then learn equivalent functions from data produced by 27 climate models from the CMIP5/6 archives, that is, using data of the same spatial and temporal coverage for T and q strat . d, For these CMIP models, the predictive skill of the functions can be evaluated under strong climate change scenarios. As illustrated here for four CMIP models, this is achieved by comparing the annually averaged predictions (red) with actual abrupt-4 × CO 2 simulation results (grey). From these comparisons, a framework-related uncertainty is estimated characterizing imperfections in the statistical-learning predictions under 4 × CO 2 , which needs to be smaller than the model uncertainty to be constrained. To account for different levels of global warming simulated by each model, the q strat response (in ppmv; here shown relative to average 1990-2020 levels) is normalized by the change in global mean surface temperature (T g ). Colour-coded inset values for m represent the linear regression slopes (ppmv K −1 ) and r the correlation coefficients.
Article https://doi.org/10.1038/s41561-023-01183-6 In addition, it is unclear if models that better match observations 17,18 can be trusted more in their projections because past biases do not always translate into future projections 21,34 .
Here we introduce a statistical-learning framework to derive an observational constraint on these uncertain model projections. We estimate high-dimensional regression functions to predict tropical lower SWV from the UTLS temperature structure (Fig. 1), given the aforementioned link between UTLS temperatures and tropical dehydration. UTLS temperatures integrate the effects of a large number of processes affecting air dehydration, either directly or indirectly 8,15,18,35 . Our primary interest therefore is to quantify known relationships between the UTLS temperature structure and tropical lower SWV 9,36 but in a novel way that allows these relationships to hold under strong climate change scenarios. This, in turn, will open up new pathways for estimating observational constraints on future projections. Our analysis will not consider SWV production from changes in stratospheric methane concentrations, which gains importance in the middle to upper stratosphere 37,38 . To minimize such influences on our results, we focus on the tropical lower stratosphere at 70 hPa, that is, water vapour just above the tropical cold-trap region where air dehydration takes place 39 .

Learning predictive relationships from historical data
We aim to learn predictive functions f, ultimately characterized by their coefficients Θ which predict 30° N-30° S average, monthly and zonal mean SWV (specific humidity, in ppmv) at 70 hPa, mimicking frequently used indices characterizing water vapour entry rates through the tropical cold-trap region 17,18,35 . Hereafter, we will refer to this quantity as q strat . dT ijk is the standard-scaled monthly mean temperature (that is, zero meaned and scaled by its own σ over the training period) at 5° × 5° latitude-longitude grid points indexed by (i,j) within one of p = 5 atmospheric levels (250, 200, 150, 100, 70 hPa) indexed by k, covering a latitudinal range of 60° N-60° S (Fig. 1a).
f predicts q strat at time t with high skill and we cross-validated its performance for various regression specifications (Extended Data Fig. 1). These tests included the choice of pressure levels, latitude range and number of time lags for T. Unsurprisingly, we found that predictive performance improves if we also consider temperature data from the two preceding months (that is, τ max = 2 months), being reflective of the slow vertical ascent of air through the tropical tropopause layer 12,35 . We additionally apply logarithmic transformations to the q strat data as to approximately account for nonlinearity in the T-q strat relationships. A central concern when learning such high-dimensional regression functions from a relatively small number of (observed) monthly samples is to avoid overfitting. To manage this issue, we here use ridge regression 40 , similar to an approach recently applied successfully to constrain the global cloud feedback on climate change 41 .
We then learn different functions f from sets of T and q strat time series from both observations and climate models (Fig. 1b,c). As proxies for observations, we use the Stratospheric Water and OzOne Satellite Homogenized (SWOOSH) 42 q strat dataset and three reanalysis products for temperature and remove months when q strat observations are missing or unreliable (for example, related to the Mount Pinatubo eruption; Methods). The use of multiple reanalysis products and of SWOOSH uncertainty estimates allows us to incorporate the effects of measurement uncertainty in our constraints. For climate model data, we use simulations covering the same historical period or slightly shifted (depending on data availability) from the Coupled Model Intercomparison Project phases 5 and 6 (CMIP5/CMIP6; Methods). We treat each CMIP dataset in the same way as SWOOSH by masking equivalent months. Many models do not achieve realistic amplitudes of q strat variability 17,21 , leading to insufficiently clear signals for ridge regression to learn from. We therefore sub-select 27 models that at least approximate the SWOOSH variance (Methods). This selection is still meant to sample the model uncertainty in the T ijk and q strat responses across CMIP models so that our observational constraint will be based on better estimates for the learned parameters Θ. We obtained high predictive skill for each f on historical time slices not used during training and cross-validation (typically r 2 scores > 0.8). However, this is unsurprising given the central role temperature plays in setting present-day q strat . More challenging is the aim to use the functions trained on historical data to predict q strat responses under increased greenhouse gas forcing, that is, that the past relationships also hold in significantly warmer climates. Such climate-invariant functions open up new pathways to observationally constrain the q strat response to climate change. We note that previous studies developed climate index-based multiple linear regression (MLR) methods to analyse SWV variability and trends in observations and models 9,15,18,35,43,44 . On the basis of small sets of indices, these valuable tools can explain large fractions of variance and have been used to infer drivers of SWV changes under climate forcing 18 . However, these MLR methods do not achieve the predictive performance of ridge regression under extrapolation ( Supplementary Fig. 3), underlining the value of statistical learning for deriving our observational constraint. We further see advantages in exploiting only well-observed UTLS temperatures as predictors, whereas index-based methods typically require BDC metrics that are not a widely available CMIP output, allowing for evaluation of a greater number of models.

The observational constraint
We evaluate the extrapolation idea in a perfect-model setting: the 27 functions trained on historical CMIP data are used to predict q strat under abrupt-4 × CO 2 forcing, using the T 4 × CO 2 from the corresponding CMIP simulations as predictors. In Comparing the predictions to the actual CMIP results, we find excellent agreement across the multi-model ensemble (r = 0.90; Fig. 2a), indicating that the historical T-q strat relationships also hold well under strong greenhouse gas forcing and opening up a path to an observational constraint in three steps: (1) Given three reanalysis datasets and n = 50 iterations of SWOOSH q strat time series with varying added noise patterns (Methods; to estimate sampling and measurement uncertainty), we arrive at 3 × 50 = 150 statistical-learning functions f obs consistent with the observational record, each with its own set of coefficients Θ obs . (2) Using these f obs , we combine the uncertainty contributions introduced due to spread in the Θ obs and in the CMIP temperature responses T 4 × CO 2 (Methods), leading to a probability distribution for the observational prediction (shown along the x axis of Fig. 2a; solid red curve). (3) Finally, this distribution is convolved (Methods) with the framework-intrinsic prediction error evident from the scatter around the one-to-one line in Fig. 2a. The result is the probability distribution characterizing the observational constraint (shown along the y axis). Figure 2a demonstrates the large spread across the CMIP q strat responses (0.08 to 1.41 ppmv K −1 ), as compared to 0.31 ± 0.39 ppmv K −1 (90% confidence interval) for our observational constraint. The large model uncertainty is illustrated even more clearly in Fig. 2b for 27 CMIP5 models (red), 34 CMIP6 models (blue), the combined CMIP5/6 ensemble (black) and for the selected 27 CMIP5/6 models (grey). Ten CMIP models simulate trends of approximately 1.0 ppmv K −1 or larger, that is, well outside the observationally plausible range (orange). While the median across all 61 CMIP models (0.52 ppmv K −1 ) is still within typical uncertainty bounds, a substantial number of models are highly likely to overestimate the q strat feedback under global warming. Considering typical confidence intervals for our constraint with 0.08 to 0.54 ppmv K −1 (17% to 83%), −0.09 to 0.69 ppmv K −1 (5% to 95%) and −0.17 to 0.77 ppmv K −1 (2.5% to 97.5%), we find that about one-fifth (13) of the models exceed even the 97.5th percentile of the constraint and almost half of the models (27) exceed the 83rd percentile (upper end of the thick orange bar in Fig. 2b).
Notably, the observational constraint includes small negative q strat responses not seen in the models. It is probably unsurprising that negative feedbacks cannot be entirely ruled out given historically (rather) negative trends under global warming, limited sample size and possible external interferences not removed by data pre-processing (for example, remaining effects of volcanic eruptions). A negative SWV trend could also be driven by a strong BDC response, which would act to cool the tropical UTLS under CO 2 forcing 15,18,35 . The fact that we find a robustly positive 50th percentile for the constrained response underlines our hypothesis that the framework does not merely reproduce historical trends but can indeed learn approximately climate-invariant T-q strat relationships from internal variability (for example, related to QBO or ENSO) instead of being negatively impacted by it.

Emulation of the historical record and inference
We now ask if CMIP models are, in principle, able to reproduce observed q strat variability. We use the 27 functions trained on CMIP data to emulate the historical record of q strat anomalies, given temperature data from the reanalyses as predictors ( captured relatively well. The implication is that if provided with realistic UTLS temperature fields, most CMIP models would display the correct q strat tendencies. However, we also highlight that the year-to-year variability of the statistical-learning predictions typically takes on amplitudes substantially larger than those observed, in agreement with our key result of overly sensitive T-q strat relationships already seen in their abrupt-4xCO 2 responses. To detect the origin of these overestimated T-q strat sensitivities, we highlight the use of the statistical-learning functions for understanding model-observation discrepancies, for example, by visualizing the parameters Θ (Fig. 4 and Supplementary Figs. 9-11). This is possible because in ridge regression the absolute magnitude of each coefficient is proportional to its estimated prediction importance, that is, larger size coefficients imply greater importance. While interpreting these coefficient maps is non-trivial, we point out a few emerging patterns. For Θ obs , independent of the reanalysis dataset used, we, for example, find at 100 hPa (Fig. 4a) and below features suggestive of influences by tropical circulation anomalies, possibly ENSO 7,9,18 , in particular around the Maritime Continent and above the Eastern Pacific. The largest coefficients occur in a narrow band across the inner tropics (10° N-10° S) from 100 hPa upwards in agreement with the well-understood slow vertical ascent of tropical air masses through this cold-trap region (Fig. 4a,b). Zonal mean latitude-height cross sections of Θ show a clear upward progression of predictive information over time (Fig. 4c,d), supporting the view that the functions correctly identify the main characteristics of the underlying coupling between the large-scale circulation and tropical UTLS dehydration. Crucially, the CMIP multi-model mean Θ (Fig. 4b,d) strongly overestimates the inner tropical relationships between T and q strat , underlining the over-sensitivity of CMIP models on average. For CMIP models, the peak positive inner tropical Θ (probably representing the tape-recorder signal 36 ), additionally maximizes at 100 hPa without further growth with altitude, contrary to the Θ obs . We speculate that this discrepancy could be caused by the low vertical resolution of many CMIP models around the tropical tropopause. The Θ maps also uncover a few other intriguing discrepancies, including a pattern of large negative Θ obs at 100 hPa across the North Atlantic, which is part of a general strong positive to negative, tropics to extratropics gradient in Θ obs not reproduced in the CMIP mean. However, similar, or even clearer, patterns do occur in many individual models ( Supplementary Figs. 12-15). A reason might be the modulating role of the subtropical jet streams and their induced mixing barriers on tropics-extratropics SWV exchange whose strength will also depend on UTLS temperature gradients. To test such hypotheses, and to distinguish significant patterns in the coefficients from noise, we below recommend future modelling experiments to design systematically perturbed datasets to train ridge regressions on.

Constraint on the radiative feedback and implications
In conclusion, we have derived an observational constraint for changes in tropical lower SWV per degree global warming of 0.31 ± 0.39 ppmv K −1 (90% confidence interval). This constraint on current modelling uncertainty has important implications for the stratospheric feedback onto climate change 19,20,26 and for the recovery of the stratospheric ozone layer [27][28][29] . Indeed, our framework opens up new routes to the process-oriented evaluation of, and observational constraints on, state-of-the-art climate model projections. As such, we recommend its use as a complement to easily interpretable, but only analytically applicable, climate index-based regressions 9,15,18,35,43 . Our observational constraint is possible only through a highly effective statistical-learning approach to estimate climate-invariant relationships between UTLS temperatures and SWV from the still very limited record of SWV observations. In a perfect-model setting, we have confirmed that these relationships also seem to hold under large CO 2 forcing and are robust to the presence (or absence) of, for example, historical changes in aerosol and methane-related interferences. Our results reveal a widespread over-sensitivity in CMIP models of tropical lower SWV to changes in UTLS temperatures. In particular, our constraint implies that frequently modelled large increases per degree global warming > 1 ppmv K −1 are highly unlikely. Strikingly, around a quarter of CMIP models exceed even the upper 95th percentile of our constraint. Given the 90% range of model responses (0.18-1.41 ppmv K −1 ), our constraint represents a 50% decrease in the 95th percentile of the climate model uncertainty distribution and a narrowing of 37% of the overall 90% range (which includes small negative responses not found in CMIP).
Whereas the 27 models exhibit a wide range of total radiative SWV feedbacks of 0.091-0.256 W m −2 K −1 (90% confidence interval, with a median of 0.18 W m −2 K −1 ; Methods and Supplementary Table 1), we can apply our constraint on tropical lower SWV to also constrain this uncertainty by 30% to 0.086-0.201 W m −2 K −1 (median = 0.14 W m −2 K −1 ; Extended Data Fig. 3). This estimate represents an uncertainty reduction of 0.05 W m −2 K −1 , which is comparable to the effects of changes in biogenic volatile organic compounds or ozone (table 6.8 in ref. 46) and thus of relevance to policymakers. Our work further opens up new pathways for constraining the effects of changes in SWV on catalytic ozone-depletion cycles 28,29 , Arctic amplification, the North Atlantic Oscillation, the stratospheric circulation and the tropospheric jet streams 24,26 .
Finally, we highlight the urgent need to identify and address model-dependent root causes of the over-sensitivity, such as UTLS temperature biases 7,17,21 , atmospheric chemical feedbacks 19 , QBO influences on UTLS temperature variability 8 or unrealistic diffusivity of water vapour across the tropical tropopause 47 . Such efforts might benefit from analyses of how a variety of stratospheric (de-)hydration mechanisms 4,39 affect results within our novel observational constraint framework, for example, by learning from data produced in perturbed-physics ensembles or following targeted climate model tuning 48,49 . In particular, processes that might not (yet) be included in climate models pose potential blind spots to our perfect-model validation approach (Fig. 2a). We also highlight that our framework could be extended to address other key uncertainty factors in stratospheric climatology and atmospheric chemistry. A non-exhaustive list includes extratropical SWV trends, especially those in the radiatively important lowermost stratosphere 2,20,50 , and trends in lower stratospheric ozone 19 .

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41561-023-01183-6.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Water vapour observations and their uncertainty estimates
For SWV observations, we use the global Stratospheric Water and OzOne Satellite Homogenized (SWOOSH) 42 dataset, which includes vertically resolved water vapour data from a subset of the limb-profiling satellite instruments operating since the 1980s. SWOOSH is designed to accurately reproduce monthly average variability present in the underlying data. We select the variable combinedanomfillh2oq at 68 hPa, which is an anomaly-filled zonal mean specific humidity field in parts per million volume (ppmv) at 10° latitude resolution. We spatially weight (cosine-weighting for latitudes) and average (latitude-longitude) the field for points within 30° N to 30° S to obtain a representation of tropical lower SWV.
For our analysis, we consider SWOOSH v2.7 data covering the period from January 1984 to including December 2020. The version incorporates recent improvements in Earth Observing System Aura Microwave Limb Sounder (MLS) data 51 . However, before the availability of Aura MLS satellite data (September 2004-present), the SWOOSH dataset has a high number of missing data in the tropics. For the anomaly-filled version of SWOOSH, missing data were filled using a procedure that made 2D latitude-time linear interpolations for each month on deseasonalized anomalies using information from adjacent grid cells for which data existed; the seasonal cycle was added back on after filling in the latitude-time plane at each pressure level 42 . However, the interpolation was not evaluated for potential biases introduced by the procedure so that the use of the filled product for our statisticallearning approach introduces an additional uncertainty factor.
Here we make a first-order estimate of additional biases for the pre-MLS period by using the MLS period where sampling is high and effectively unbiased by latitude and time and masking these data as for the pre-MLS period. For each month for which at least one sample exists in the 30° S and 30° N latitude band in the pre-MLS period, we identify the month of that year in all Aura MLS data, mask the data as for the month of interest and estimate the bias introduced by integrating, weighting by latitude over 30° S to 30° N where data exist, and by comparing with the 'true' unbiased MLS/SWOOSH value. From this, we estimate a mean and standard deviation of the bias assuming Gaussianity.
For our final uncertainty calculations, we dropped all months from the SWOOSH dataset for which there was not at least one sample measured within 30° S to 30° N, which reduces the total number of samples (months) considered from 444 to 315. In effect, this also removes all SWOOSH data before January 1990. For the remaining 315 months, we estimate the uncertainty introduced by the biases of the anomaly-filling method by sampling (in addition to the uncertainty provided by SWOOSH and assuming a normal distribution) from the standard deviation in the MLS bias estimates outlined above and adding this to a random normal sample from the standard error of the SWOOSH data itself, that is, σ/√N, latitude-weighted by the sum of the squared errors.
Here N is the overall measured number of samples per monthly data point considered 42 . Adding these randomly drawn estimates for each month to the original filled SWOOSH time series yields sets of 'sampled time series'. Here we use n = 50 such randomly drawn time series to estimate the effects of the sampling biases on our overall uncertainty estimates. We find that the effect of sampling biases are small to negligible for the overall uncertainty estimation but we still include these error estimates in our uncertainty analysis for completeness.

Temperature data
To approximate observations for UTLS temperatures, we use three different reanalysis datasets for temperature at 250, 200, 150, 100 and 70 hPa over the same time period: ERA5 (ref. 52), MERRA-2 (ref. 53) and JRA-55 (ref. 54). For MERRA-2, we do not include the year 2020 as the corresponding data could not be found in the archive used at the time of writing (Data Availability). For learning the observational constraint functions, we combine each of the three reanalyses once with each of the n = 50 SWOOSH randomly drawn time series, resulting in 150 functions overall. From these 150 functions, we derive a first observational uncertainty estimate on predictions under 4 × CO 2 forcing by providing each function once with the modelled (and standard-scaled) monthly mean temperature profiles found under 4 × CO 2 for the 27 selected CMIP models (steps also described in the main text).

CMIP data
We consider climate model data from both the CMIP5 (ref. 55) and CMIP6 (ref. 56) archives. In total, this amounted to 61 models for which we found data for all required scenarios: An overview of all CMIP models, including individual model references and T g -normalized q strat feedback values, is provided in Supplementary Table 2. Equivalent results for normalization by 20° N -20° S temperature at 100 hPa are tabulated in Supplementary Table 3. For each model, we used variable output for 30° S-30° N average zonal mean specific humidity (hus) at 70 hPa and air temperature (ta) at 250, 200, 150, 100 and 70 hPa. To train the ridge regressions, we combined atmosphere-ocean-coupled historical simulations from 1 January 1984 onwards with Representative Concentration Pathway 4.5 (RCP4.5)/ Shared Socioeconomic Pathway 3-7.0 (SSP3-7.0) scenarios. The future RCP scenarios were selected as to maximize the number of models for which we could match the observed period within either CMIP archive given that scenario differences across the period 2005 (end of historical simulations for CMIP5) to 2020 (end of observed period used here) are negligible for our calculations. The same two variables plus surface air temperature (tas) were extracted for the same set of models for the abrupt-4 × CO 2 simulations. In all cases, we use only the first available ensemble member for each model.
It is well known that tropical UTLS water vapour variability is not represented well in many atmospheric models, both in terms of the timing and amplitude of the seasonal cycle and/or variations relative to it 17,21 (Supplementary Figs. 1 and 2). A concern of particular importance for the statistical-learning process employed here are cases where variability is substantially underestimated, because this will reduce the ability of ridge regression to learn meaningful T-q strat relationships, especially if the goal is to extrapolate potentially very large abrupt-4 × CO 2 responses 41,57,58 . We therefore include only CMIP models that represent at least 95% of the observed variance found for SWOOSH across the 315 potential training samples in our calculations (Fig. 2a) in the main text. These 27 models are: Analyses equivalent to the one shown in Fig. 2 but for other choices of percentage of observed variance thresholds (0%, 50%, 80%, 90%) are provided in Supplementary Fig. 16.
For a few of the selected models, the relevant SWOOSH period ( January 1990 to December 2020) could not be matched with a consistent set of simulations. Instead, we sampled equivalent months from their historical simulations only. For CMIP5, this concerns MPI-ESM-P for which we considered the period 1968-2004, amounting to the same number of samples (note that, for example, the period 1984 to 1990 is excluded according to the data mask derived from SWOOSH).

Statistical-learning framework
For each CMIP model and SWOOSH/reanalysis pair of specific humidity and temperature data, we train a predictive function f (see equation (1)). The exclusion of lags or the addition of time lags longer than τ max = 2 months do not further improve the performance (Extended Data Fig. 1). To quasi-linearize the T-q strat relationships, we apply the natural logarithm to the specific humidity data, which also improves the overall predictive performance of the learned functions, in particular, under extrapolation (Extended Data Fig. 1c).
Here we use temperatures within 60° N-60° S at each of the five atmospheric pressure levels as predictors. Our set-up is constrained by our empirical results that extending the area of predictors to the polar regions did neither improve the predictive performance on historical test data nor on the abrupt-4 × CO 2 simulations (Extended Data Fig. 1d,e). However, in particular for observations, we received the best cross-validation results on historical data when using 60° N-60° S instead of only tropical (30° N-30° S) temperatures. In a classic statistical-learning set-up of training, cross-validation and separate testing, we therefore chose the best performing configuration for the historical cross-validation data also for the abrupt-4 × CO 2 'test' scenario. We also explored the sensitivity of the extrapolation results to a longer training period (Extended Data Fig. 1f) and to the number of pressure levels at which temperature is considered as predictor (seven/ three/one in Extended Data Fig. 1g,h,i). As another simplification and to speed up the learning process, we interpolated the temperature data for each CMIP model and reanalysis dataset to a common 5° × 5° (latitude × longitude) grid. This coarser spatial resolution also allows us to homogenize the predictor resolution for all temperature datasets, which is necessary to later combine different sets of temperature predictors and ridge coefficients Θ for the observational constraint.
To estimate the coefficients Θ, we use ridge regression 40 , which here minimizes the cost function over 315 monthly mean samples indexed by t. The total number M of temperature predictors is 25,920 (5 levels × 24 latitudes × 72 longitudes × 3 months for maximum lag τ max = 2). This large number of predictors, especially given the limited length of the observational record, would lead to overfitting using multiple linear regression (MLR). Next, to avoiding overfitting, ridge regression is also known for its good performance in managing ill-posed problems with many collinear predictors 41,57 . Note that the first term in equation (2) is the MLR least squares error, which, as discussed, tends to overfit the data given large M. Ridge regression addresses overfitting through the second l 2 -norm regularization term, which penalizes large absolute values for Θ, modulated by the choice for the regularization parameter α. To approximate optimal α, we use fivefold cross-validation searching over α ∈ [0.0001, 0.0003, 0.1, …, 1 × 10 9 ] and evaluate according to the r 2 scores (coefficients of determination; ref. 58 provides a detailed explanation) as defined by Python's scikit-learn package 59 across the historical validation sets. This general search range for α was determined incrementally following tests showing that larger and smaller values for α would never be selected during cross validation. As mentioned above, we standardize temperature time series at each grid point to zero mean and unit standard deviation (over the historical period) to ensure that they are considered equally and so that the absolute magnitudes of the resulting sensitivities are reflective of their relative physical importance 57 . When combining Θ derived from SWOOSH/reanalysis pairs with CMIP temperature responses under 4 × CO 2 , we therefore re-scale the temperature fields according to the grid point σ values of the reanalysis dataset to represent the relative amplitude of the CMIP modelled temperature anomalies consistently. Due to the standard scaling of temperatures and our focus on the SWV response per degree warming, we do not carry over baseline model biases in mean values of temperature and humidity into our observational constraint.

Calculation of framework-related uncertainty
We follow a similar approach to Ceppi and Nowack 41 , in which the uncertainty in the constraint is calculated in several steps. First, we obtain a probability distribution of the observational prediction (x axis of Fig. 2a; solid red curve) by combining the uncertainties in Θ obs , denoted σ Θ , with those due to the different CMIP 4 × CO 2 temperature responses, σ T . For this, we first linearly combine all of the 150 estimates of Θ obs with each of the 27 CMIP T 4 × CO 2 fields, leading to 4,050 observationally constrained T g -normalized q strat predictions. To obtain σ Θ , we first take the multi-model mean over all predictions made using the same set of observed coefficients and subsequently calculate the standard deviation of these 150 samples. We follow the same procedure for σ T but now averaging estimates involving the same UTLS temperature response, calculating the standard deviation of the resulting 27 estimates. These uncertainties are then combined in quadrature, σ p = √σ Θ 2 + σ T 2 , to yield the uncertainty for the observational prediction q strat,p .
Next, this observational prediction uncertainty is convolved with the prediction error, calculated via standard least squares regression formulas 60 , whose 5-95% interval are represented by dashed red curves in Fig. 2a. This yields a probability distribution for the actual normalized q strat response q strat,a on the y axis of Fig. 2a: P(q strat,a ) = ∫ +∞ −∞ P(q strat,a |q strat,p )P(q strat,p ) dq strat,p (3) where the conditional probability P(q strat,a |q strat,p ) represents the prediction error. P(q strat,a ) is calculated numerically by Monte Carlo sampling, with a sample size of 10 7 , and we apply a Gaussian kernel smoother to the result with a standard deviation of 0.01 ppmv K −1 to obtain the final probability distribution.

Stratospheric water vapour feedback calculation
We approximate the implications of our tropical lower SWV constraint for also constraining the overall SWV radiative climate-feedback

Nature Geoscience
Article https://doi.org/10.1038/s41561-023-01183-6 parameter 20,46 . This is justified by our empirical finding that the q strat metric is highly correlated with SWV feedback parameters estimated from radiative transfer calculations (Pearson's r = 0.85; Extended Data Fig. 3). For this purpose, we combined the feedback parameters for the six selected CMIP5 models calculated by Banerjee et al. 20 and ran additional calculations for the 21 selected CMIP6 models. We then followed the same regression approach as taken for the observational constraint in Fig. 2a but replacing the variable along the y axis by the SWV feedback parameters. With this procedure, we obtain an observationally constrained 90% confidence interval for the feedback parameter of 0.086-0.201 W m −2 K −1 , equalling an uncertainty reduction of 0.05 W m −2 K −1 over the 90% confidence interval (0.091-0.256 W m −2 K −1 ) for the 27 CMIP models. We computed the SWV feedback for the CMIP6 models using the Parallel Offline Radiative Transfer programme 61 and following the procedure outlined in Banerjee et al. Briefly, for each model we computed the SWV change between the abrupt-4 × CO 2 and pre-industrial control simulations, based on the last 50 years of each simulation. We input these water vapour fields into the Parallel Offline Radiative Transfer programme to compute the stratospherically adjusted net tropopause radiative flux change for each model individually using the fixed dynamical heating approximation. Then, the SWV feedback (in W m −2 K −1 ) is computed by dividing the tropopause radiative flux change by the global mean surface temperature change (again, averaged over the last 50 years of each simulation).