Introduction

Most environmental impacts result from combinations of multiple weather and climate drivers that are referred to as compound events1,2,3. Given the complexity of the climate system, such events can occur in multiple ways, for example involving extreme conditions in different variables happening simultaneously or across space and time4. For instance, in 2014/2015, Rio de Janeiro (Brazil) was hit by concurrent extremely hot and dry summer conditions, which led to wide-ranging socio-economic repercussions that exceeded the impacts that would have been caused by heat and dryness in isolation5. The hot-dry weather favoured widespread wildfires as well as water scarcity that impacted coffee production5. Furthermore, excess fatalities due to a severe dengue fever outbreak were linked to the increase in water storage tanks installed by the population to mitigate the drought5. In a similar fashion, impacts from hurricanes typically arise from concurrent hazards, such as precipitation-driven flooding and wind-driven hazards including storm surges, as happened when Hurricane Ida led to destruction in Louisiana (United States) in 2021, causing about 30 fatalities6. Extreme impacts can also arise from sequences of weather events that increase system vulnerability or lead to disproportionate damages, as exemplified by ecosystem7 and agricultural8 impacts from the two European drought years in 2018 and 2019, and by the critically low water storage levels in South Africa due to a sequence of three dry winters in 2015–20179,10. Furthermore, combinations of extremes occurring simultaneously (or almost simultaneously) across multiple regions can cause extreme impacts to connected global systems11. For example, simultaneous soil moisture droughts across multiple breadbaskets can reduce global food production and threaten food security12,13,14,15.

Compound events are thus characterized by complex multivariate relationships. Recognizing that a univariate perspective on hazards may severely underestimate risk16 led to a rapidly growing body of literature on different types of compound events3,4,17, including literature on interconnected, interacting, and cascading risks18. However, the short temporal length of observations and of routinely used model simulations often limits the possibility of identifying and studying the sparse multidimensional constellations of climate variables constituting compound events, an issue related to the curse of dimensionality1,19,20,21. Combining routinely used single-member simulations of multiple climate models, such as the standard multi-model ensembles from the Coupled Model Intercomparison Project (CMIP)22, offers larger sample sizes but leads to confounding irreducible uncertainties arising from internal climate variability (the inherently chaotic nature of the climate system) with potentially reducible uncertainties arising from structural differences across models23,24,25,26. Robustly distinguishing between these uncertainties is important, however, as it can ultimately help reduce uncertainties from structural model differences25,26.

In this perspective, we argue that using output from climate Single Model Initial-condition Large Ensembles24 (SMILEs) should become standard for studying and projecting compound events. A SMILE consists of many simulations (i.e. ensemble members) from a single climate model based on the same model physics and under the same external forcings, but each starting from slightly different initial states. Thus, each realization in the ensemble evolves differently solely due to internal climate variability. Combining the multiple members of a SMILE therefore ensures much better sampling of the data-sparse regions associated with compound events than traditional model simulations27. Using a combination of different SMILEs then allows for identifying model differences.

Here we propose six important topics in compound event research and illustrate associated challenges. Each of the following six sections provides information on the importance of one of the six topics and demonstrates, based on our analyses and the literature, both the challenge of addressing the topic and how using SMILEs can be of help in this context. Specifically, we demonstrate that SMILEs are important to (i) robustly assess compound event frequencies and associated uncertainties; (ii) quantify the extent to which anthropogenic climate change has contributed to observed compound events; (iii) provide robust quantification of historical and future changes in compound events; (iv) quantify and explore uncertainties in future projections, including distinguishing between uncertainties arising from internal climate variability and structural model differences; (v) identify the most dangerous low-likelihood compound events, advance their physical understanding, and explore associated risks; and (vi) evaluate the skill of advanced statistical models developed to improve the assessment of compound events. Via analyses related to four different compound event types, we illustrate how these topics can be addressed using SMILEs. The considered compound event types are compound hot-dry warm-season events, concurrent precipitation and wind extremes, multi-year soil moisture droughts, and concurrent soil moisture droughts across global soybean breadbaskets. See Table 1 for a comprehensive list of the six research topics and the associated analyses related to different compound events.

Table 1 The six research topics and the associated analyses related to different types of compound events

Results

Estimating compound event likelihoods and associated uncertainties

An important challenge of climate studies is assessing the likelihood of extreme events26. This information ultimately helps decision-makers, international development agencies, and insurers to develop preparedness for climate impacts3. Employing the approach of Bevacqua et al.26 and focussing on the historical (1950–1980) frequencies of compound hot-dry warm seasons (fHD), concurrent daily precipitation and wind extremes (fPW), and multi-year (i.e., consecutive annual) droughts (fMYD), we illustrate that using a large sample size is essential for robustly estimating compound event probabilities. Using 3100 years of data from 31-year periods of the 100 ensemble members of the MPI-GE climate model (40 members of the CESM1-CAM5 model for fPW) allows for robustly identifying hotspot areas where compound event frequencies are higher than expected under the assumption that the compound drivers are statistically independent (see unstippled regions in Fig. 1a–c).

Fig. 1: Historical (1950–1980) frequency of compound events and associated uncertainties.
figure 1

a Ensemble mean frequency of compound hot-dry events (fHD; during the warm season) based on the MPI-GE model. Stippling indicates areas where the compound event ensemble-mean frequency is smaller than expected in a reference case that assumes independence between the compound drivers (here, independence between average temperature and precipitation during the warm season). d, g fHD of the ensemble members corresponding, amongst all members, to the 5th and 95th percentiles, respectively, of fHD averaged over the region in the green box. b, e, h As in panels a, d, g, but for concurrent daily precipitation and wind extremes (fPW) and based on the CESM1-CAM5 model. c, f, i As in panels a, d, g, but for the frequency of three consecutive annual soil moisture droughts (f3YD). Univariate extremes were defined via percentile-based thresholds, i.e. hot and dry seasons occurring every 10 years on average, precipitation and wind events occurring twice a year, and soil moisture droughts every 5 years (see Methods).

Importantly, if such frequencies were estimated based on much smaller sample sizes contained in typically available model simulations and observations, they would be highly uncertain and could ultimately lead to misleading risk estimates. For example, based on 31 years of data from a single ensemble member, someone interested in compound hot-dry event risk over Southern Asia could obtain regionally-averaged fHD ranging from 2% to 6% (centred 90% range across the ensemble members), i.e. differing by a factor of 3 (Fig. 1d, g; note that frequencies at the local scale can even range from about 0% to 10%). That is, depending on the data sample selected, a wide range of possible risks may be estimated. Similarly, fPW estimates could differ by a factor of 2 on average over Portugal, while fMYD estimates could differ by a factor of 12 over Central North America. Hence, based on a short sample of data, in principle, a low compound event risk may be estimated over regions that are, instead, at high risk. This sampling uncertainty stems from the fact that 31-year periods are, in this case, highly insufficient to sample internal climate variability, resulting in a wide range of possible frequency estimates across single ensemble members26. Similarly, this sampling uncertainty is also inherent in estimates derived from short observational records. SMILEs allow users to obtain a robust model-based estimate of the compound event probability (i.e., the frequency computed from many years of data obtained by pooling multiple ensemble members) and allow for quantifying how uncertain compound event frequency estimates are for a given sample size. The latter can be estimated as the range of frequencies across single ensemble members. The differences in compound event frequencies across different ensemble members are likely to partially arise from fluctuations in climate modes of variability that influence regional weather over decadal scales. For example, Atlantic Multidecadal Variability modulates hot and dry conditions in Southern Asia28 and droughts in Central North America29. SMILE-based analyses can help to assess the influence of such modes of variability on compound events12.

It is important to note that the uncertainties in compound event estimates are substantially larger than those for univariate extremes, regardless of the sample size considered, highlighting that employing a large sample size is much more important for compound than for univariate events. This is exemplified by the relative uncertainty in the compound frequency, which is defined here as the ratio of twice the standard deviation to the mean of the frequency across ensemble members26. Based on 31 years of data, this relative uncertainty is 2.4 and 1.1 on average over land for compound hot-dry events and univariate hot events, respectively (see continuous lines in Fig. 2a). To reach a relative uncertainty of 1.1 for compound events, 132 years of data would be required (see arrow in Fig. 2a). The results further highlight that the relative uncertainty in the compound frequency is higher at locations where the coupling between the compound drivers is weaker (compare dashed and dotted lines in Fig. 2a), that is, when the statistical dependencies make the compound event rarer and therefore more difficult to sample.
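
To make the relative-uncertainty definition concrete, the following minimal Python sketch computes per-member frequencies of univariate hot and compound hot-dry warm seasons and their relative uncertainty. The synthetic single-grid-point data and all names are illustrative stand-ins, not the analysis code behind Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for SMILE output at one grid point:
# warm-season mean temperature and precipitation, shape (members, years).
n_members, n_years = 100, 31
temp = rng.normal(size=(n_members, n_years))
prec = rng.normal(size=(n_members, n_years))

# Percentile thresholds as in the paper: hot above the 90th, dry below the 10th
# percentile of the pooled historical distribution.
t_hot = np.percentile(temp, 90)
p_dry = np.percentile(prec, 10)

# Frequency of univariate hot and compound hot-dry seasons per ensemble member.
f_hot = np.mean(temp > t_hot, axis=1)
f_hd = np.mean((temp > t_hot) & (prec < p_dry), axis=1)

def relative_uncertainty(f):
    """Relative uncertainty due to internal variability: 2*std/mean across members."""
    return 2 * f.std(ddof=1) / f.mean()

print("robust (pooled) compound frequency:", np.mean((temp > t_hot) & (prec < p_dry)))
print("relative uncertainty, hot only:   ", relative_uncertainty(f_hot))
print("relative uncertainty, hot-dry:    ", relative_uncertainty(f_hd))
```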

Fig. 2: Dependence of the uncertainty due to internal climate variability on the sample size.
figure 2

a Relative uncertainty due to internal climate variability in the frequency of univariate hot events fH (orange) and compound hot-dry events fHD (solid green). Dashed and dotted green lines show the relative uncertainty in fHD for grid points with the 35% strongest and weakest coupling, respectively (see Methods). Relative uncertainty is defined as the ratio of twice the standard deviation to the mean of the frequency f across ensemble members averaged over global landmasses (see Methods). b As in a, but for the frequency of wind extremes (fW, orange) and concurrent precipitation and wind extremes (fPW, green). c As in a, but for the frequency of a single-year soil moisture drought (f1YD, orange), two (f2YD, green) and three (f3YD, purple) consecutive annual soil moisture droughts, respectively. The climate models and the definitions of the extremes are the same as in Fig. 1.

The larger uncertainties for compound than for univariate events hold for very different event types, including compound precipitation and wind extremes (Fig. 2b), despite the fact that the daily scale of precipitation-wind events yields a much larger sample size (compared to compound hot-dry warm-season events). In line with the curse of dimensionality, when we consider multi-year droughts, the relative uncertainties increase with the dimensionality of the considered compound event (Fig. 2c). Hence, employing large sample sizes is even more important for estimating the occurrence of higher-dimensional compound events such as multi-year floods, co-occurring fire drivers30,31, concurrent droughts in multiple regions13, and spatially extended floods32. While our examples are arguably extreme in focusing on a sample size as small as 31 years, the issues remain for larger sample sizes (Fig. 2). Furthermore, we note that it is not uncommon in the literature for probabilities of compound events to be estimated from such small sample sizes33,34,35,36,37,38, including probabilities of high-dimensional events13,20,30,39. Periods shorter than 31 years are often used for projections at fixed global warming levels, e.g. in reports of the Intergovernmental Panel on Climate Change3.

Compound event attribution

In the aftermath of a high-impact weather event, the extent to which anthropogenic climate change has contributed to the event is regularly questioned40. Addressing this attribution question is important to gain a better understanding of how climate change has affected and might affect extreme events and their impacts41. One way to approach the question is to frame it in terms of a probability ratio PR, defined as the ratio of the probability of an extreme event occurring under current conditions (factual world) to the probability that the event occurs in a world without anthropogenic climate change (counterfactual world)42. Such probabilities can be estimated using climate model simulations, for example by comparing runs simulating both worlds43, or by using approaches based on non-stationary extreme value theory, where the occurrence of extreme events is fitted against a covariate such as global mean temperature44. Values of PR > 1 imply that climate change contributed to a certain event occurrence; for instance, PR = 2 implies that climate change made a certain event class (typically defined as all events exceeding a critical threshold) two times more likely. So far, attribution studies have mainly treated events as univariate; however, because many climate-related impacts are caused by a combination of multiple drivers, studies on compound event attribution are emerging43,44,45,46.
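
As a minimal illustration of the probability-ratio framing (not the methodology of any specific attribution study), the sketch below estimates PR empirically from synthetic counterfactual and factual samples of two drivers; the shifted mean and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical counterfactual (pre-industrial) and factual (present-day) samples
# of two drivers X and Y; the factual world has a forced shift in the means.
n = 2000
x0, y0 = rng.normal(size=n), rng.normal(size=n)            # counterfactual world
x1, y1 = rng.normal(0.5, 1, n), rng.normal(0.5, 1, n)      # factual world

# Event class: both drivers exceed their counterfactual 90th percentiles.
xt, yt = np.percentile(x0, 90), np.percentile(y0, 90)
p0 = np.mean((x0 > xt) & (y0 > yt))    # probability in the counterfactual world
p1 = np.mean((x1 > xt) & (y1 > yt))    # probability in the factual world

pr = p1 / p0
print(f"PR = {pr:.2f} (PR > 1: climate change made the compound event more likely)")
```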

As attribution studies typically focus on very extreme events, they are often based on large ensemble climate model simulations. For example, the weather@home experiment, a large ensemble of simulations of the regional atmosphere-only climate model HadRM3P47, has been widely used by the attribution community. Other studies have relied on the multi-model CMIP5 and CMIP6 ensembles to gather enough data to isolate any climate change forced signal from internal climate variability noise, though the ability to disentangle internal variability from model differences ultimately depends on the number of runs available for each model24,48. The required ensemble size to estimate internal variability, which is needed to provide robust attribution assessments, depends on several factors, such as the spatio-temporal scale of the event, the variable, and the amplitude of the forced trend49.

Here, we demonstrate that for compound event attribution, larger sample sizes are required than for univariate event attribution, in particular if the drivers are not strongly correlated and have trends of similar magnitude. We build on the synthetic data approach developed by Zscheischler and Lehner (2022)43. For the sake of simplicity, we focus on events that increase in probability with climate change (Fig. 3a). That is, we calculate the required sample size for the attribution of concurrent high extremes of climate variables that show positive trends due to anthropogenic climate change such as coastal flooding followed by a heatwave50. We define the required sample size as the smallest sample size that guarantees correctly identifying PR > 1 in the presence of a forced trend, assuming no trend in the dependence between drivers43 (see Methods). In general, very large sample sizes are needed to disentangle small climate change driven trends from internal climate variability (Fig. 3b). We find that when trends in the drivers are comparable, attributing compound events requires a larger sample size than that needed for attributing any of the contributing univariate events (see stippling in Fig. 3b). In particular, a substantially larger sample size for compound than for univariate events is needed for smaller forced trends (Fig. 3c). For example, the sample size needed under identical trends of 0.4 σi in the drivers differs by about 330 (i.e. by a factor of 2) between the bivariate (under zero correlation) and the univariate cases (Fig. 3c). Furthermore, larger sample sizes are required when the dependence between the compound event drivers is small or negative (Fig. 3c), i.e.—in general—when the dependence makes the compound event rare. For example, the sample size needed under identical trends of 0.4 σi in the drivers differs by about 1200 (a factor of 3.5) between the bivariate case under −0.5 and +0.5 correlation (Fig. 3c). When the trends in the compound variables differ strongly in magnitude, attributing the compound event requires a sample size in between what is needed for attributing the univariate events in isolation (missing stippling in Fig. 3b). That is, for example, attributing a compound hot-dry event to anthropogenic warming trends requires a larger sample size than that needed for attributing the heat event in isolation, though smaller than for attributing the drought event alone given that trends in droughts are much weaker than trends in heatwaves in most regions3.

Fig. 3: Required sample size for attribution of compound and univariate events.
figure 3

a Idealized bivariate distribution of two compound drivers (X, Y) under historical conditions (green), and under present conditions (red) obtained via shifting the mean of the historical drivers by forced trends Trendi equal to one standard deviation (σi) of the historical distribution. Shading in the top-right depicts compound events, i.e. concurrent high extremes. b Minimum sample size required for attribution of concurrent high extremes (>90th percentile) of synthetic data simulated from a bivariate standard Gaussian distribution (with constant zero correlation) in the presence of positive forced trends TrendX and TrendY in the two drivers (expressed in units of standard deviations). Grey in the bottom-left indicates a required sample size larger than the values in the palette. The large orange dot indicates the combination of trends displayed in a. Stippling indicates combinations of TrendY and TrendX for which attributing the compound event requires a larger sample size than that needed for attributing any of the underlying univariate events (elsewhere compound attribution requires a sample size in between that needed for the two univariate events in isolation). c Sample size required for attributing any of the two univariate events (orange), and the compound event (green) assuming equal forced trends for both drivers and different constant correlation (ρ) between the drivers (see legend).

Detecting changes in compound event characteristics

Internal climate variability arising from the chaotic nature of the climate system can obscure anthropogenic climate change signals within short time periods51. Hence, especially for low-probability events, robust sampling of internal climate variability under different global warming states is crucial for reliably assessing anthropogenically-forced changes51. For compound events, identifying anthropogenically-forced changes requires quantifying not only changes in the individual compound event drivers but also changes in their coupling, i.e. their statistical dependence32,52,53,54. However, quantifying any potential anthropogenic signals in the dependence, and therefore understanding the processes driving such signals, has proven to be particularly challenging so far due to the limited sample size of considered datasets33,34,38,52,54. For example, for compound coastal flooding—which is often driven by concurrent wind and precipitation extremes54—Wahl et al.52 identified an increase in the dependencies between the drivers in the historical period over the United States’ coast, but it was not possible to attribute this increase to anthropogenic climate change. Bevacqua et al.54 suggested that internal climate variability may dominate such dependency changes even under high greenhouse gas emissions and that SMILEs may therefore be necessary for identifying any potential forced changes. The SMILE experimental design allows for a quantification of the forced response (including changes arising from the mean and variability55 of individual compound drivers, as well as from their dependence) by averaging changes across ensemble members, hence filtering out spurious trends due to internal climate variability. Once such changes have been disentangled using SMILE simulations, further analysis can be carried out to identify the physical mechanisms driving them.

Focussing on compound hot-dry events, previous studies based on single-member simulations highlighted disagreement among model runs on the sign of the changes in the statistical dependence between temperature and precipitation53, ultimately causing large uncertainties in impact assessments. In particular, changes in this dependence are expected to modulate the heat sensitivity of crops as temperatures rise, thereby affecting crop production56. We illustrate that such changes in the dependence can be robustly estimated using the large ensemble simulations of the MPI-GE model as an example (Fig. 4a). Highly misleading results regarding the strength and sign of the changes may be obtained from short time periods (31 years) of a single model simulation (note that this sample size is similar to that used in previous studies33,34,35,38). This is due to the large uncertainty in the dependence change arising from internal climate variability (Fig. 4b, compare with Fig. 4a). Thus, using output from only one model simulation does not even guarantee identifying the correct sign of the projected change. For example, over Central Europe, one may identify either a positive or a negative change (Fig. 4c, d) despite the clear underlying negative forced change (Fig. 4a).
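
A schematic Python sketch of the underlying calculation is given below: each member's change in Spearman correlation is computed after within-period detrending (as in the Methods), and the forced response is the ensemble mean of these changes. The synthetic data, the prescribed strengthening of the coupling, and all names are illustrative assumptions, not the MPI-GE analysis itself.

```python
import numpy as np
from scipy.signal import detrend
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

def corr_change(temp_hist, prec_hist, temp_fut, prec_fut):
    """Change in Spearman correlation between two periods for one ensemble member,
    after removing linear trends within each period."""
    rho_hist = spearmanr(detrend(temp_hist), detrend(prec_hist))[0]
    rho_fut = spearmanr(detrend(temp_fut), detrend(prec_fut))[0]
    return rho_fut - rho_hist

# Hypothetical summertime series for 100 members and two 31-year periods.
n_members, n_years = 100, 31
d_rho = np.empty(n_members)
for m in range(n_members):
    temp_hist = rng.normal(size=n_years)
    prec_hist = -0.4 * temp_hist + rng.normal(size=n_years)   # negative T-P coupling
    temp_fut = rng.normal(size=n_years)
    prec_fut = -0.6 * temp_fut + rng.normal(size=n_years)     # stronger (forced) coupling
    d_rho[m] = corr_change(temp_hist, prec_hist, temp_fut, prec_fut)

print("range of single-member estimates:", d_rho.min(), d_rho.max())
print("forced response (ensemble mean):  ", d_rho.mean())
print("2*U_IV:                           ", 2 * d_rho.std(ddof=1))
```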

Fig. 4: Internal climate variability can obscure forced changes in intervariable dependence.
figure 4

a Ensemble mean (MPI-GE model) change in Spearman correlation (ΔρS) between summertime mean temperature and mean precipitation in a 2 °C warmer world relative to preindustrial conditions. b Uncertainty in ΔρS due to internal climate variability (twice the standard deviation of ΔρS across ensemble members, i.e. 2 UIV; see Methods). c, d ΔρS of the ensemble members corresponding, amongst all members, to the 5th and 95th percentiles, respectively, of ΔρS averaged over Central Europe.

Similar considerations apply to the overall changes in compound event occurrence (not just to changes in the statistical dependence of the drivers), as exemplified for trends in spatially compounding events. A previous study, by comparing two recent adjacent 20-year observational periods, indicated an increase in the probability of multiple breadbasket failures for wheat, maize, and soybean, and a decrease for rice13. However, model simulations (MPI-GE model) indicate a large variability in the frequency of a similarly defined 5-dimensional compound event, i.e. concurrent annual soil moisture droughts across the five main soybean breadbaskets worldwide (Fig. 5a), when estimated from just 31 years of data (derived from a single ensemble member) (see blue bars in Fig. 5b). As a result, even two consecutive 31-year periods under the same historical conditions could exhibit a large increase or decrease in this frequency solely due to internal variability (grey bars in Fig. 5c). This indicates the challenge of attributing detected changes in spatially compounding events to anthropogenic climate change without employing large sample sizes. Notably, the potential for detecting both positive and negative changes persists even when considering projections in a world 2 °C warmer than pre-industrial conditions (brown bars in Fig. 5c, and Fig. 5b), in which case forced changes are expected to be much larger than in the historical period. Here, large ensemble simulations allow for robustly identifying a future forced response. For instance, by merging one hundred 31-year ensemble members, the change in the probability of N concurrent regions under drought (lines overlaid on the brown bars in Fig. 5c) is statistically significant (95% confidence level) up to N = 4.
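
A minimal sketch of this type of counting is shown below: it estimates the probability of at least N regions being simultaneously in drought, both from pooled members and from single members, using synthetic stand-in data (the independence between regions and all names are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical annual soil moisture for 100 members, 31 years, 5 breadbasket regions.
n_members, n_years, n_regions = 100, 31, 5
sm = rng.normal(size=(n_members, n_years, n_regions))

# Regional drought: soil moisture below the regional 25th percentile of the pooled data.
thresholds = np.percentile(sm.reshape(-1, n_regions), 25, axis=0)
drought = sm < thresholds                       # broadcast over (members, years, regions)
n_concurrent = drought.sum(axis=2)              # regions in drought per member-year

def prob_at_least(n_conc, n):
    """Probability of at least n regions simultaneously in drought."""
    return np.mean(n_conc >= n)

for n in range(1, n_regions + 1):
    pooled = prob_at_least(n_concurrent, n)                      # all members pooled
    single = [prob_at_least(n_concurrent[m], n) for m in range(n_members)]
    lo, hi = np.percentile(single, [2.5, 97.5])                  # spread across members
    print(f"N={n}: pooled={pooled:.3f}, single-member 95% range=({lo:.3f}, {hi:.3f})")
```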

Fig. 5: Internal climate variability can obscure forced changes in compound events.
figure 5

a The five soybean regions. b Historical (blue) and future (orange) frequency of annual soil moisture drought (defined as events occurring every 4 years on average) occurring in N simultaneous (x-axis) soybean regions (based on MPI-GE model). Coloured bars indicate the centred 95% range and the mean amongst the ensemble members (historical probabilities based on individual members are shown as background lines). c Projected change in probabilities of simultaneous droughts in a 2 °C warmer world relative to preindustrial conditions (brown) and variations between two historical periods due to internal climate variability (grey; 95% range of changes obtained from randomly paired historical ensemble members).

Uncertainty sources in projections and plausible climate storylines for impact modelling

Quantifying and understanding uncertainties in climate projections and identifying the potentially wide range of plausible climate futures is highly relevant for policymakers25,57,58. It is important to distinguish between uncertainty arising from internal climate variability, which is irreducible because of the inherently chaotic nature of the climate system, and uncertainty from structural differences amongst climate models, which could be reduced through, for instance, model improvements or emergent constraints59. Uncertainty partitioning based on multi-model ensembles with a single member per model23 is expected to be biased, especially at regional scales and for variables with large variability60, such as compound event frequencies. Here, we illustrate how multiple SMILEs can be used to partition uncertainties in compound event projections following recently proposed approaches25,26 (see Methods).

For compound hot-dry events, we demonstrate that contributions from internal climate variability and structural model differences to uncertainties in compound event projections depend on the warming level, with the two sources broadly contributing comparably in a 2 °C warmer world (Fig. 6a) and model differences dominating in a 3 °C warmer world (see purple dominating in Fig. 6b). Uncertainty in climate projections (i.e. in the climate over a period of, for instance, 30 years) can be communicated via climate storylines, which are distinct plausible future climates61 that can be derived from large ensemble climate model simulations26,62,63. For example, one storyline could be derived from a model simulation that exhibits an increase in precipitation over a region of interest, while another from a simulation that exhibits a precipitation decrease. Note that climate storylines may also be used as input to impact models to allow for an efficient exploration of the full range of plausible future climates and associated impacts (see final discussion).

Fig. 6: Model differences dominate internal climate variability at higher warming levels.
figure 6

a, b Uncertainty in the frequency of compound hot-dry events (fHD) due to model-to-model differences (UMD) relative to the sum of UMD and the uncertainty due to internal climate variability (UIV) in a world 2 °C (a) and 3 °C (b) warmer than pre-industrial conditions (expressed as a percentage; uncertainty is dominated by model-to-model differences for values above 50% and by internal climate variability otherwise). c, f Average of the future fHD over Southern Africa (green box in d) according to individual ensemble members of different climate models in a 2 °C and 3 °C warmer world, respectively (larger symbols indicate ensemble mean of individual models). d, g In a 2 °C warmer world, climate storylines of compound hot-dry events resulting from internal climate variability, i.e. future fHD from the ensemble members associated with the lowest and highest values averaged over Southern Africa amongst all the members of the MPI-GE model (shown in the box in c). e, h In a 3 °C warmer world, storylines from the lowest- and highest-risk models, i.e. the ensemble mean of the future fHD for the two models out of seven (shown in the boxes in f) with the lowest and highest values averaged over Southern Africa. Compound hot-dry events were defined as in Fig. 1.

Distinct climate storylines derived from different ensemble members of a single SMILE26,62 allow for communicating the irreducible uncertainty stemming from internal climate variability. In fact, the robust sampling of internal variability offered by the large number of ensemble members of a SMILE allows for identifying very extreme, albeit plausible, future scenarios that may lead to the largest impacts26,61,62. An illustration of climate storylines with different compound hot-dry event frequencies in a 2 °C warmer world over Southern Africa is provided in Fig. 6d, g. When uncertainties from model differences dominate, as happens in a 3 °C warmer world in this example, climate storylines derived from different models can be used to communicate such uncertainties transparently (Fig. 6e, h). In principle, such uncertainty from model differences could be reduced by improving the representation of the processes driving compound event changes. For example, in Southern Africa, improving the representation of processes dominating precipitation uncertainties, in particular related to shifts in the convective region and the structure of sea surface temperature in the surrounding oceans64, is essential to reduce uncertainties in future compound risk26.

Event-based storylines

The largest socioeconomic impacts are often caused by individual very rare events65,66. Estimating the occurrence probabilities of such rare events is very challenging, but a better understanding of the events’ dynamics is critical for risk analyses. This can be done using an event-based storyline67, which represents a physically self-consistent unfolding of a plausible individual event. The concept of an event-based storyline differs from that of a climate storyline introduced above, as the former refers to a plausible individual event (e.g., a very intense storm or crop failure and its consequences), and the latter to a plausible future climate over a long period (e.g. 30 years). The concept of event-based storylines has been employed to study several types of rare high-impact events over recent years68,69,70,71,72. Event-based storylines can be used to improve our understanding of the physical drivers of very extreme events and, through collaborations between climate scientists and impact modellers, of the event’s socio-economic consequences66. However, rare high-impact events typically consist of combinations of multiple climatic drivers and are therefore difficult to identify from short samples of data73,74. For example, based on datasets with limited sample size, identifying and studying the potential for combinations of physical drivers that lead to record-shattering heat events can be difficult73, and therefore the threats of such rare but plausible extreme events could be neglected.

In this context, the large sample of atmospheric conditions obtained by pooling multiple ensemble members of a SMILE can be used to identify very rare and potentially high-impact events that are plausible but have not yet occurred in the observational record (so-called unseen events73,75,76), which is particularly useful for compound events as we illustrate with two analyses. To quantify rarity in a multi-dimensional space we rely on a copula-based approach77,78 by selecting events that maximise the copula describing the distribution of the compound event drivers (see Methods).

The first analysis concerns compound precipitation and wind extremes over Portugal. Using all available simulations from the multiple ensemble members of the model CESM1-CAM5, we find that much more extreme events are possible compared to the most extreme event identified in a single simulation of that model (Fig. 7a). Specifically, based on the first ensemble member, we identify as the most extreme event an extratropical cyclone with a central pressure below 990 hPa, which causes wind extremes along the Portuguese coast, but virtually no heavy precipitation (Fig. 7b). Based on all simulations, the most extreme event is a deep cyclone with a core pressure below 970 hPa, which causes extreme wind speeds and daily precipitation above 100 mm in Portugal (Fig. 7c). Such an event may cause a combination of inland flooding and landslides2, coastal storm surges54, uprooted trees, and downed power lines79. In a second analysis, we search for concurrent droughts in five key soybean production regions using data from the model MPI-GE. The most extreme event based on all pooled ensemble members is associated with droughts that are characterized by (sometimes substantially) lower soil moisture over all regions than based on the first ensemble member alone (Fig. 7d). Here, in line with a recent SMILE-based analysis12 that revealed the contribution of oceanic modes of variability to concurrent droughts across multiple breadbasket regions, further analyses could help identify plausible oceanic precursors of worst-case spatially compounding droughts. In general, the simulated worst-case events should be considered plausible only if the driving processes are in line with basic physical principles80.

Fig. 7: Identifying extreme event-based storylines.
figure 7

a Pairs of wintertime (December-February, 1950–1980) daily-mean precipitation and wind values averaged over Portugal (box in b, c) based on data of the first ensemble member (violet) and based on all pooled ensemble members (orange) of the CESM1-CAM5 model. Large dots indicate the most extreme compound events in the two datasets, i.e. the selected event-based storylines (Methods). b, c Precipitation (shading) and sea level pressure (isolines) associated with the selected extreme events based on one (b) and all ensemble members (c). Stippling indicates locations exhibiting extreme precipitation (blue), extreme wind (green), and concurrent extremes (magenta), where extremes are defined as values exceeding the local 10-year return levels—based on December–February data. d Anomalies (%, with respect to the mean; 1950–1980) of annual regionally averaged soil moisture over the five soybean breadbasket regions according to the most extreme compound event storyline based on data from the first ensemble member (violet) and based on all pooled ensemble members (orange) of the MPI-GE model.

Developing and evaluating statistical tools for compound events analysis

Properties of compound events are often studied via statistical modelling2,16,20,81,82. For example, statistical weather generators, which are used to simulate realistic weather conditions, can—in principle—allow sampling unobserved extreme events and estimating the likelihood of very rare events, reducing issues related to the limited sample size of observations77. To meet the challenge of studying different aspects of compound events, new statistical methodologies of different levels of complexity are being developed2,20,83,84,85,86. However, novel statistical tools need to be tested and evaluated, and the limited sample size of observational records is a constraining factor in this regard.

Building on a perfect-model approach (introduced below), SMILEs offer a powerful testbed for new and existing methodologies24,75. This can be particularly useful for assessing the skill of novel methods in compound event research, which can be very complex given the need to model multiple inter-variable relationships and multivariate extremes87,88. In a perfect-model approach, the climate represented by the large ensemble simulations of one climate model is used as a testbed. The statistical model is calibrated to the data of a single ensemble member, which represents pseudo-observations of limited sample size. Then, the skill of the statistical model in representing the statistics of interest in the underlying climate can be evaluated against the climate derived from all other ensemble members.

For example, one could be interested in evaluating a multivariate statistical model that is fitted to 31 years of soil moisture observations across the five soybean breadbasket regions with the goal of estimating the probability of simultaneous droughts in the five regions. Using the perfect-model approach (Methods), we illustrate that a statistical model based on pair-copula constructions, a statistical tool widely used for high-dimensional compound events13,20,89, underestimates the probability of two regions under drought while possibly overestimating that of four regions under drought (Fig. 8; the correct probability, shown by the black line in the middle of the blue bar, lies outside the grey sampling uncertainty of the statistical model). The statistical model also does not reproduce the statistics of the data used for calibration (the red lines do not always fall inside the grey range), which in principle may be due to structural biases in the chosen statistical model; this indicates the challenge of developing multivariate statistical models for compound events from limited sample sizes and the advantage offered by the large sample size of SMILEs. While it seems unwise to estimate high-dimensional probabilities from limited sample sizes, this is not uncommon in the literature13. Note also that the uncertainty in the frequency of concurrent droughts arising from internal climate variability (blue bars; the range of frequencies that can be measured in a given 31-year period) is conceptually different from the sampling uncertainty of the statistical model (grey bars; an uncertainty around the frequency averaged across multiple 31-year periods); hence—based on theory—a match between these uncertainties should not be expected. Finally, a user may employ a perfect-model approach to test different types of statistical models and ultimately choose the most appropriate one.
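
The analysis behind Fig. 8 uses pair-copula constructions fitted in R (see Methods). Purely to illustrate the perfect-model workflow itself, the sketch below uses a much simpler stand-in statistical model (a Gaussian copula with empirical margins), fits it to one "pseudo-observed" member, and compares its concurrent-drought probabilities with the pooled-ensemble "truth"; the synthetic data, the choice of copula, and all names are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Synthetic stand-in for SMILE soil moisture in five regions: 100 members x 31 years.
n_members, n_years, n_regions = 100, 31, 5
cov = 0.3 + 0.7 * np.eye(n_regions)                 # modest inter-regional correlation
sm = rng.multivariate_normal(np.zeros(n_regions), cov, size=(n_members, n_years))

# Regional drought thresholds: 25th percentile of the pooled data (as in the Methods).
thresholds = np.percentile(sm.reshape(-1, n_regions), 25, axis=0)

def prob_n_droughts(sample, n):
    """Probability that at least n regions are simultaneously below their thresholds."""
    return np.mean((sample < thresholds).sum(axis=1) >= n)

pseudo_obs = sm[0]                                   # "observations": first member only
truth = sm.reshape(-1, n_regions)                    # "underlying climate": all members

# Stand-in statistical model: Gaussian copula (correlation of normal scores) with
# empirical margins, fitted to the 31-year pseudo-observations.
z = stats.norm.ppf(stats.rankdata(pseudo_obs, axis=0) / (n_years + 1))
corr = np.corrcoef(z, rowvar=False)
u_sim = stats.norm.cdf(rng.multivariate_normal(np.zeros(n_regions), corr, size=100_000))
sim = np.column_stack([np.quantile(pseudo_obs[:, j], u_sim[:, j]) for j in range(n_regions)])

for n in range(1, n_regions + 1):
    print(f"N={n}: pooled truth {prob_n_droughts(truth, n):.3f} | "
          f"statistical model {prob_n_droughts(sim, n):.3f}")
```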

Fig. 8: Evaluating statistical modelling of compound events using large ensemble simulations.
figure 8

Blue bars as in Fig. 5b. Red line, the historical probability of annual soil moisture drought in N simultaneous regions based on data of the first ensemble member. Grey bar, sampling uncertainty (centred 95% confidence interval) in the historical probability estimated through a statistical model fitted to data of the first ensemble member (the mean prediction of the statistical model is shown by the black line).

Discussion

While large ensemble simulations are known to be important for studying very extreme univariate events24,73,90,91, we demonstrated that such simulations are even more relevant for compound event analysis. In principle, when studying mild extremes under stationary climate conditions, pooling data from observations or a single historical ensemble member could increase the sample size and partially reduce uncertainties. However, for studying very extreme compound events and non-stationary climate conditions, very large sample sizes from multiple ensemble members are essential14,26,32,92,93,94,95. Furthermore, climate models are an essential tool for investigating future changes in compound events. We recommend using SMILEs to avoid estimating potentially incorrect compound event frequencies and associated changes. SMILEs are especially important when the correlations between the compound drivers are weak and for very high-dimensional compound events, that is, when identifying compound event characteristics becomes particularly difficult with short sample sizes.

As we illustrated with several analyses, the large sample sizes of SMILEs allow for tackling the challenge of robustly estimating statistical features of compound events. However, we note that—as for virtually all of the research related to climate and weather—research gaps do not arise merely from statistical challenges. It is important to advance our understanding of the physical processes behind compound events and their changes via in-depth analyses that combine models and long observational datasets. The large sample size of SMILEs can help advance the physical understanding of compound events. For example, a SMILE-based analysis revealed that a relatively small percentage increase in precipitation intensities caused by a combination of dynamic and thermodynamic atmospheric changes will drive a disproportionately larger increase in the spatial extent of precipitation extremes32. Similarly, SMILEs helped reveal that the concurrence of El Niño with the cold phase of the Tropical North Atlantic can substantially enhance the probability of concurrent droughts over global breadbasket regions12. Such climate-model-based analyses should be carried out side by side with a climate model evaluation of the primary processes responsible for compound events (see discussion below), e.g. the climate modes of variability.

SMILEs can also help advance our understanding of very extreme events. For example, climate model simulations helped uncover the physical drivers of record-shattering hot events73,96. Given that climate models can have biases, very extreme simulated compound events should be considered plausible only if the identified driving processes are consistent with basic physical principles80. Models that are unable to capture events as extreme as those observed, even with the improved sampling provided by SMILEs, are likely affected by biases in their representation of real-world processes.

SMILEs further allow for robustly quantifying forced changes in compound event features, including the dependencies between compound drivers. In fact, large apparent changes may emerge when analysing data with limited sample size even when no forced trend exists. This effect, which also exists for univariate quantities51, is much enhanced for compound events. Once forced changes are identified, they can be linked to their driving physical processes. For example, SMILEs can help disentangle the physical mechanisms behind dependence changes: for instance, whether the strengthening of land-atmosphere feedbacks97 strengthens temperature-precipitation dependencies53, whether projected changes in storm tracks54,98 influence the dependence between precipitation and wind extremes, and how changes in dependence scale with global warming.

While SMILEs offer many advantages compared to single-member climate model simulations, several considerations are worth keeping in mind. SMILEs have already helped substantially to advance our understanding of climate variability24; however, climate models can have biases that should be considered carefully. That means that climate model skill needs to be evaluated43,86,99. Assuming that any difference between model- and observation-based estimates is only due to biases can be misleading as, especially for multivariate relationships, large differences may arise from internal climate variability43,54. That is, climate models with a few or even a single member may fail to capture observed behaviours not because of model biases but simply because of insufficient sampling of internal climate variability. The improved sampling of internal climate variability in SMILEs allows for better model evaluation100—specifically, it allows for avoiding rejecting models for the wrong reason. In general, we highlight that a purely statistical assessment of the model skill in representing the ultimate quantity of interest (e.g., frequency of extreme events) should always be complemented by a process-based approach that evaluates the model representation of the driving processes of extreme events72,80,101 and therefore the physical realism of the simulated events. Here, we argue that the latter, process-oriented evaluation is particularly relevant for compound events. In fact, when observations have a very short sample size, while SMILE-based model evaluation can avoid rejecting models for the wrong reason, it may be very difficult to disentangle model biases from noise arising from internal climate variability in highly uncertain statistical quantities such as the frequency of compound events.

Models that pass a process-based climate model evaluation may still have residual biases102. Hence, bias adjustment or downscaling methods are operationally used for impact assessments102. For compound events, multivariate downscaling methods103 or bias adjustments104,105,106,107—which adjust both the distributions of the individual variables of interest and their statistical coupling—should be considered104,108. Correcting SMILEs can be challenging80,104; however, there are recent developments that provide bias-adjusted large ensembles80,109 (e.g., the CanLEADv1110), as well as dynamically downscaled SMILEs27 (i.e., SMILEs from regional climate models that should better represent local processes). Building on a perfect-model approach24,75, SMILEs could also be used to assess the skill of multivariate bias-adjustment methods104.

In general, running impact models for many SMILEs can be challenging due to the computational costs of some impact models and the need to bias-adjust climate models111. In this context, storylines derived from SMILEs can be used to explore and communicate a wide range of plausible future climates over a period of, for instance, 30 years (via climate storylines, Fig. 6), as well as the consequences of individual worst-case events (via event-based storylines, Fig. 7). The selection of such storylines should be based on a metric that takes into account the combinations of drivers that cause impacts to a given sector of interest72, hence the choice of metric will depend on the sector and should be guided by impact modellers. For example, exploring impacts associated with a year characterised by widespread droughts over breadbasket regions (event-based storyline) may allow testing different adaptation strategies related to, for instance, irrigation, therefore supporting decision-making1,2,66,68, whereas climate storylines derived from the frequencies of cold spells occurring together with low winds might be explored for the renewable energy sector112. Overall, employing a limited set of storylines as input to impact models can allow for efficient exploration of a substantial range of possible impacts, therefore supporting climate risk assessments65,72. Given the cost of dynamically downscaling full SMILEs, downscaling single events or ensemble members can provide an effective approach for ultimately driving impact models with high-resolution climate data72.

In our illustrations, we employed seven CMIP5-generation global SMILEs24 (each providing 16–100 ensemble members, see Table 1). However, new-generation CMIP6 outputs that include more SMILEs are now becoming available (see https://www.cesm.ucar.edu/projects/community-projects/MMLEA/ for a current overview). The substantial reduction of uncertainties in compound event frequencies with increasing sample size (Fig. 2) indicates that the many CMIP6 models that are now available with several ensemble members (16 models with at least 10 members) will offer new opportunities for studying different types of compound events. Considering such a variety of models will enable a more comprehensive study of uncertainties from model-to-model differences in climate risk assessments. Furthermore, new SMILEs from regional climate models have also started to increase in number and can allow for comprehensive regional analyses by combining regional and global SMILEs27. The increase in computational resources should eventually enable the generation of high-resolution SMILEs at the global scale. We would like to note that alternatives to SMILEs exist. For example, climate emulators113,114 simulate large sample sizes at low computational cost; however, despite some exceptions88, they are typically not yet tailored to compound events. Furthermore, combining our understanding of physical processes with intelligent use of methods can also help reduce required sample sizes and thus render single-member model simulations more useful. For instance, approaches termed dynamical adjustment allow estimating the forced response of various quantities of interest to climate change, under certain assumptions, with comparably small sample sizes115,116.

Based on a number of illustrative analyses (Table 1), we conclude that assessing compound events and their changes based on datasets of limited sample size may be misleading. Observational data, particularly when not characterised by a very small sample size, remain paramount for analysing extreme events. Nevertheless, climate models are essential tools for studying the dynamics of the most extreme events and for estimating future changes. Hence, given the increasing availability of SMILEs, we argue that they should be preferred over single-member model simulations to provide stakeholders with robust information on future climate risks associated with compound events.

Methods

Data

We used seven SMILEs: CESM1-CAM5117 (including 40 ensemble members), CSIRO-Mk3-6-0118 (30), CanESM290 (50), EC-EARTH119 (16), GFDL-CM3120 (20), GFDL-ESM2M121 (30), and MPI-GE122 (100), providing data for the period 1950–2099 (based on the RCP8.5 emission scenario123 after 2005). We considered the period 1950–1980 as the historical baseline and periods of the same length in a world 2 °C (or 3 °C, in Fig. 6) warmer than pre-industrial conditions in 1870–190026.

Compound events

Given a variable and a model, extremes were defined as values exceeding a percentile-based threshold of the variable’s distribution obtained by pooling together data of the historical period from all available ensemble members of the model. Compound event frequencies were computed empirically via counting26,53,94.

Compound hot-dry events were defined as high mean temperature (above the 90th percentile) and low mean precipitation (below the 10th percentile) values over the warm season (the three consecutive months with highest mean temperature during 1950–1980)26. In Fig. 4, to quantify historical and future Spearman correlation between mean temperature and precipitation, for each ensemble member and grid point, we first removed linear trends in the two variables53 (detrending different ensemble members independently).

To study multi-year droughts, we defined droughts as annual average soil moisture over the total column below the 20th percentile (model MPI-GE). To assess concurrent droughts across soybean growing regions, we used the five soybean regions from Gaupp et al.13; here, a drought was defined when the regionally averaged annual soil moisture was below the 25th percentile. For droughts and compound hot-dry events, data were bilinearly interpolated to a 2.5° spatial grid before all calculations.

For concurrent wind and precipitation extremes, we considered historical daily data of accumulated precipitation and mean wind speed on the native model grid (model CESM1-CAM5). Frequencies of concurrent daily extremes (fPW) are based on precipitation and wind extremes defined as values occurring twice a year on average (i.e., values above their 100 × (1 − 2/365)th percentiles, corresponding approximately to the 99.45th percentile).
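
For concreteness, a short sketch of how such twice-a-year thresholds and fPW can be computed is given below; the placeholder distributions and sample size stand in for daily model output and are not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical pooled daily precipitation and wind speed over all historical members.
n_days = 40 * 31 * 365                      # e.g., 40 members x 31 years of daily data
prec = rng.gamma(shape=0.5, scale=2.0, size=n_days)
wind = rng.weibull(2.0, size=n_days) * 5.0

# Extremes occurring twice a year on average: above the 100*(1 - 2/365)th percentile.
q = 100 * (1 - 2 / 365)
prec_thr = np.percentile(prec, q)
wind_thr = np.percentile(wind, q)

# Empirical frequency of concurrent daily precipitation and wind extremes (fPW).
f_pw = np.mean((prec > prec_thr) & (wind > wind_thr))
print(f"threshold percentile: {q:.2f}, fPW = {f_pw:.5f}")
```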

Overall, we note that considering more extreme thresholds to define compound events (e.g., the 16th percentile to define a soil moisture drought) would result in higher relative uncertainties in the frequency of compound events than those shown in Fig. 2.

Ensemble mean and uncertainty quantification

Each of the seven SMILEs consists of several simulations starting from different initial conditions, resulting in multiple ensemble members that span a range of plausible climates and associated distributions for any quantity of interest X. Given a SMILE, we obtained robust estimates of X as the average of X from individual ensemble members (ensemble mean). Furthermore, when X is a projected change, e.g., the change in the frequency of compound events (Δf), the forced response of f in the considered SMILE is the ensemble mean of Δf.

For a given SMILE, following Maher et al. (2021)25 and Bevacqua et al. (2022)26, we quantified the uncertainty in the statistical quantity X in a single realisation due to internal climate variability as twice the standard deviation of X from individual ensemble members of a given SMILE model (2 UIV). In Fig. 6a, b, where we consider multiple SMILE models to compute such an uncertainty due to internal climate variability, we obtain an average of UIV across models as \(U_{\mathrm{IV}}=\sqrt{\frac{1}{N^{\mathrm{mod}}}\sum_{s=1}^{N^{\mathrm{mod}}}\sigma^{2}(X_{s})}\), where Nmod = 7 is the number of models, and σ(Xs) is the standard deviation of X from individual ensemble members of the model s. In Fig. 6a, b, we show the uncertainty in the future fHD due to model-to-model differences (UMD) relative to the sum of UMD and UIV. To this end, we quantified UMD based on the standard deviations of the ensemble-mean of fHD in the seven SMILEs26.
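
A minimal sketch of this partitioning is shown below, using hypothetical fHD values standing in for the seven SMILEs (the member counts follow the Data section; the synthetic values themselves are assumptions).

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical projected fHD at one grid point: one array per SMILE,
# holding one value per ensemble member.
members_per_model = [40, 30, 50, 16, 20, 30, 100]
f_hd = [rng.normal(loc=0.05 + 0.01 * i, scale=0.01, size=n)
        for i, n in enumerate(members_per_model)]

# Internal-variability uncertainty: square root of the across-model mean of the
# within-model variances of fHD across ensemble members.
u_iv = np.sqrt(np.mean([np.var(f, ddof=1) for f in f_hd]))

# Model-difference uncertainty: standard deviation of the ensemble means across models.
ens_means = np.array([f.mean() for f in f_hd])
u_md = np.std(ens_means, ddof=1)

print(f"U_IV = {u_iv:.4f}, U_MD = {u_md:.4f}")
print(f"fraction of uncertainty from model differences: {u_md / (u_md + u_iv):.2f}")
```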

Dependence of uncertainty on sample size

In Fig. 2, following the procedure of Bevacqua et al.26, we computed the ratio of twice the standard deviation to the mean of f (i.e., \(2\,U_{\mathrm{IV}}/\overline{f}\)) across Nens = 12 ensemble members (for compound precipitation-wind extremes we use Nens = 30) for a range of fixed sample sizes Nyears. The Nens members were obtained via sampling data from pooled concatenated historical ensemble members of the SMILE considered to analyse the compound event of interest. When assessing (1) compound hot-dry events and (2) precipitation-wind extremes, we randomly sampled calendar years54 from the concatenated data. For assessing (3) droughts over, e.g., two consecutive years, we first modified each ensemble member simulation by creating a dataset containing all pairs of two consecutive annual soil moisture values (we disregarded the first year of each ensemble member for which no previous soil moisture value exists). Then, we concatenated these modified ensemble members. Finally, we sampled the Nens ensemble members by taking pairs from the pooled concatenated data. To obtain dashed and dotted lines in Fig. 2a, we first computed the Spearman correlation between temperature and precipitation (based on all pooled ensemble members) and then selected the grid points with the 35% strongest and weakest coupling (i.e., strongest and weakest negative correlation coefficient), respectively.
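
The resampling can be sketched as follows with synthetic pooled data; sampling without replacement, the coupling strength, and all names are choices made for this illustration rather than details of the original procedure.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical pooled historical warm-season data (all ensemble members concatenated).
temp = rng.normal(size=3100)
prec = -0.8 * temp + rng.normal(size=3100)     # negative T-P coupling (hot tends to be dry)

t_hot = np.percentile(temp, 90)
p_dry = np.percentile(prec, 10)
hd = (temp > t_hot) & (prec < p_dry)           # indicator of compound hot-dry years

def relative_uncertainty(event, n_years, n_ens=12):
    """2*std/mean of the event frequency across n_ens resamples of n_years each."""
    freqs = np.array([rng.choice(event, size=n_years, replace=False).mean()
                      for _ in range(n_ens)])
    return 2 * freqs.std(ddof=1) / freqs.mean()

for n_years in (31, 62, 124, 248):
    print(n_years, "years:", round(relative_uncertainty(hd, n_years), 2))
```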

Required sample size for attribution

Extending the approach of Zscheischler and Lehner (2022)43, based on synthetic data of two variables X and Y with a given Pearson correlation corXY, we quantified—in the presence of hypothetical combinations of anthropogenically forced positive trends TrendX and TrendY in the two variables—the minimum sample size required for attribution of univariate high extremes and concurrent high extremes. We repeated the experiment three times, for different values of the correlations corXY (−0.5, 0, and +0.5). Note that we assume no trends in the compound drivers’ dependency43. Each experiment was carried out in three steps:

1. Based on a bivariate standard Gaussian distribution with a given correlation corXY, we simulated N pairs (X, Y) both for a reference period representing hypothetical preindustrial conditions, and for a hypothetical present-day period under forced trends in the variables (via modifying the means of the distribution of the two variables by TrendX and TrendY). To identify the sample size N required for attribution, such simulations were conducted for a set of N ranging from 5 to 3000 and repeated Nbootstrap = 5000 times each.
2. For each of the Nbootstrap simulations associated with each sample size N, extreme events of individual variables were defined as values above the 90th percentile of the reference period. The probability ratio PR = p1/p0 was computed, where p0 is the empirically estimated probability of an extreme event occurring in the reference period simulations and p1 is that probability for the present-day period. PRs were computed for the two univariate extreme events and concurrent extremes.
3. We identified the minimum sample size N for which the attribution is robust, defined as a sample size for which the 95% confidence interval of PR is greater than one in 90% of the Nbootstrap simulations, where the confidence interval was estimated following Zscheischler and Lehner (2022)42,43. A simplified sketch of this procedure is given below.
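
The sketch below follows the three steps for the compound event only, with reduced repetition counts and a plain percentile bootstrap standing in for the confidence interval of Zscheischler and Lehner (2022); it is an illustration under these assumptions, not the original implementation.

```python
import numpy as np

rng = np.random.default_rng(8)

def compound_attribution_rate(n, trend_x, trend_y, rho=0.0, n_rep=100, n_boot=100, q=90):
    """Fraction of n_rep synthetic experiments in which the compound-event PR is
    confidently above 1 (simple percentile bootstrap), for sample size n."""
    cov = [[1.0, rho], [rho, 1.0]]
    successes = 0
    for _ in range(n_rep):
        ref = rng.multivariate_normal([0.0, 0.0], cov, size=n)            # reference period
        fut = rng.multivariate_normal([trend_x, trend_y], cov, size=n)    # present-day period
        thr = np.percentile(ref, q, axis=0)                               # univariate thresholds
        p0 = max(np.mean((ref > thr).all(axis=1)), 1.0 / n)               # guard against p0 = 0
        pr_boot = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)                                   # resample the factual period
            pr_boot.append(np.mean((fut[idx] > thr).all(axis=1)) / p0)
        if np.percentile(pr_boot, 2.5) > 1.0:
            successes += 1
    return successes / n_rep

# The required sample size is the smallest n for which attribution succeeds
# in at least 90% of the synthetic experiments.
for n in (100, 300, 1000):
    print(n, compound_attribution_rate(n, trend_x=0.4, trend_y=0.4, rho=0.0))
```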

Event-based storylines

Compound events are found in a multidimensional space for which, in contrast to a univariate space, there is no unique natural ordering; therefore, identifying the most extreme compound events requires employing a metric whose extreme values are associated with meaningful events from an impact perspective. Here, we use a copula-based approach to select the most extreme events77,78. To identify extreme compound wintertime (December-February) precipitation and wind events, we first defined, for all pooled members of CESM1-CAM5, the variables X1 and X2 as the spatially-weighted average of daily precipitation and wind, respectively, over Portugal (Fig. 7). Second, through the R-package copula124, we computed the bivariate empirical copula C of these (X1, X2), that is, the empirical joint cumulative distribution function125 of (U1 = F1(X1), U2 = F2(X2)), where Fi is the empirical cumulative distribution function of Xi. The pairs (X1, X2) associated with the highest copula values (and therefore the highest probability Pr(X1 ≤ x1, X2 ≤ x2)) are then selected as the most extreme events. To identify extreme compound droughts across soybean breadbasket regions, we employed the same conceptual approach, but extended it to five dimensions and used data from the model MPI-GE. That is, we considered five variables (X1, X2, . . . , X5) that were defined as the negative values (given the interest in their extremely low values) of the breadbasket-regionally averaged annual soil moisture and computed the associated five-dimensional empirical copula C. Finally, the combinations (X1, …, X5) associated with the highest copula values were selected as the most extreme events.
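
While the analysis used the R-package copula, the bivariate selection can be sketched in Python as follows, with synthetic placeholder data (the distributions, sample size, and coupling are assumptions); the five-dimensional case is analogous.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(9)

# Hypothetical daily regional-mean precipitation and wind (a subset of pooled members).
n = 5000
prec = rng.gamma(0.5, 2.0, n)
wind = 0.3 * prec + rng.weibull(2.0, n) * 5.0

# Empirical copula value at each observed pair: C(u1, u2) = Pr(U1 <= u1, U2 <= u2),
# estimated by counting pairs that are jointly below each point in rank space.
u = np.column_stack([rankdata(prec), rankdata(wind)]) / (n + 1)
copula_values = np.array([np.mean((u[:, 0] <= u1) & (u[:, 1] <= u2)) for u1, u2 in u])

# The most extreme compound event maximises the empirical copula.
worst = np.argmax(copula_values)
print("selected event-based storyline (index, precipitation, wind):",
      worst, prec[worst], wind[worst])
```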

Perfect-model approach for evaluating a statistical model of spatially compounding droughts

For Fig. 8, we used the breadbasket-regionally averaged annual soil moisture data for the historical period. We tested how a multivariate statistical model fitted to pseudo-observations with a limited sample size (here, 31 years from the first member) represents the compound event statistics of the underlying climate defined by pooling all available ensemble members (31 × 100 years).

Following the modelling of Bevacqua et al.20, the multivariate statistical model consists of a five-dimensional probability density function (pdf) of the variables (X1, X2, . . . , X5) (the soil moisture in the five breadbasket regions). The pdf is decomposed into the five marginal distributions of Xi and the copula density c, where c accounts for the dependence amongst the variables Xi regardless of their marginal distributions and was modelled via pair copula constructions (PCCs)20. Firstly, marginal distributions were fitted through a kernel density estimate89 (via the stats R-package126, function density). Then, we fitted pair-copula families and selected the best PCC structure via the function RVineStructureSelect (based on AIC) of the VineCopula R-package127. Copulas were fitted to the variables Ui = Fi(Xi), where Fi is the empirical cumulative distribution function of Xi.

The resulting statistical model, i.e. the fitted 5-dimensional pdf, is associated with a unique probability of simultaneous droughts across N-regions, which is, however, affected by sampling uncertainty because the pdf is fitted to a sample of finite size. We estimated the sampling uncertainty via a standard parametric bootstrap, similar to previous studies20,21,39, in four steps. (1) We simulated Nbootstrap = 900 samples of (X1, X2, . . . , X5) with the same length as the original data (i.e., 31 years). This was achieved by simulating the variables (u1, u2, . . . , u5) and transforming them to (x1, x2, . . . , x5) via the inverse of the marginal kernel densities. (2) We fitted a pdf model to each of the Nbootstrap samples via the same procedure outlined above. (3) From each of these Nbootstrap models, we simulated a sample of (X1, X2, . . . , X5) with a length of 1000 times the original, so as to robustly obtain Nbootstrap probabilities of simultaneous droughts across N-regions associated with each pdf. (4) Finally, we computed the centred 95% confidence interval of such Nbootstrap probabilities, i.e. the final sampling uncertainty in the probability of simultaneous droughts across N-regions (shown via grey boxplots in Fig. 8 along with an overlaid black line indicating the mean of the Nbootstrap probabilities).
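
The four bootstrap steps can be sketched as follows; a bivariate Gaussian copula with empirical normal-score correlation stands in for the five-dimensional pair-copula model, the data are synthetic, and the drought definition is expressed directly on the copula (uniform) scale, so the sketch illustrates the procedure rather than reproducing the fitted model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

# Pseudo-observations: 31 years of soil moisture in two regions (stand-in for 5-D data).
n_years = 31
obs = rng.multivariate_normal([0, 0], [[1, 0.4], [0.4, 1]], size=n_years)

def fit_copula_corr(x):
    """Fit a Gaussian copula via the correlation of the normal scores of the ranks."""
    z = stats.norm.ppf(stats.rankdata(x, axis=0) / (len(x) + 1))
    return np.corrcoef(z, rowvar=False)

def prob_joint_drought(corr, n_sim=100_000, q=0.25):
    """Probability that both regions are simultaneously below their q-quantiles
    (computed on the copula scale, equivalent for monotone margins)."""
    sim = stats.norm.cdf(rng.multivariate_normal([0, 0], corr, size=n_sim))
    return np.mean((sim < q).all(axis=1))

corr_hat = fit_copula_corr(obs)
p_hat = prob_joint_drought(corr_hat)

# Steps (1)-(3): simulate samples of the original length from the fitted model,
# refit, and recompute the probability from a long simulation of each refit.
p_boot = []
for _ in range(900):
    sample = rng.multivariate_normal([0, 0], corr_hat, size=n_years)
    p_boot.append(prob_joint_drought(fit_copula_corr(sample), n_sim=20_000))

# Step (4): centred 95% confidence interval of the bootstrap probabilities.
lo, hi = np.percentile(p_boot, [2.5, 97.5])
print(f"estimated probability: {p_hat:.3f}, 95% sampling uncertainty: ({lo:.3f}, {hi:.3f})")
```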

Finally, we note that a user may be interested in carrying out the analyses using different ensemble members of a given SMILE as pseudo-observations (or using different SMILEs for the overall analysis).