The trend and spatial spread of multisectoral climate extremes in CMIP6 models

Climate change could exacerbate extreme climate events. This study investigated the global and continental representations of fourteen multisectoral climate indices during the historical (1979–2014), near future (2025–2060) and far future (2065–2100) periods under two emission scenarios, in eleven Coupled Model Intercomparison Project (CMIP) General Circulation Models (GCM). We ranked the GCMs based on five metrics centred on their temporal and spatial performances. Most models followed the reference pattern during the historical period. MPI-ESM ranked best in replicating the daily precipitation intensity (DPI) in Africa, while CANESM5 GCM ranked first in heatwave index (HI), maximum consecutive dry days (MCCD). Across the different continents, MPI-LR GCM performed best in replicating the DPI, except in Africa. The model ranks could provide valuable information when selecting appropriate GCM ensembles when focusing on climate extremes. A global evaluation of the multi-index causal effects for the various indices shows that the dry spell total length (DSTL) was the most crucial index modulating the MCCD for all continents. Also, most indices exhibited a positive climate change signal from the historical to the future. Therefore, it is crucial to design appropriate strategies to strengthen resilience to extreme climatic events while mitigating greenhouse gas emissions.

Global climate models (GCMs) are critical for understanding climate change. Still, their coarseness restricts their usefulness for creating climate change adaptation and mitigation policies, especially at regional scales where climate change impacts are more pronounced 1 . Over the past two decades, many world regions have experienced increased intensity and frequency of extreme climate events. These extreme events have been attributed to climate change [2][3][4][5] . Nevertheless, past generations of GCMs predicted a rise in such severe occurrences [6][7][8] .
Coupled Model Intercomparison Project Phase 6 (CMIP6) 9 provides significant advantages that combine representative concentration pathways (RCPs) with shared socioeconomic pathways (SSPs), model advancement, and better modelling of synoptic processes 8,[10][11][12][13][14] . However, as these GCMs become more widely available, it is critical to continue evaluating their ability to represent extreme climatic events worldwide.
Seneviratne et al. 7 reported more heavy rainfall episodes, decreased cold extremes, and increased daily temperature extremes globally. Adeyeri et al. 2 observed that warm spell duration and the frequency of warm days and nights exhibited statistically significant positive trends over the transboundary Lake Chad Basin. Ge et al. 13 investigated future changes in precipitation extremes over Southeast Asia. Chen et al. 15 evaluated and compared CMIP6 and Coupled Model Intercomparison Project Phase 5 (CMIP5) model performance in simulating extreme seasonal precipitation in the western North Pacific and East Asia. Ridder et al. 16 reported future changes in return periods for the co-occurrence of heatwaves, drought, extreme winds, and precipitation based on a CMIP6 multimodel ensemble.
Dike et al. 17 evaluated seasonal precipitation extremes using CMIP6 and reported an increase in the severity of projected dry spells over Central Asia. Collazo et al. 18 considered CMIP6 models' ability to represent observed extreme temperatures and concluded that the chosen models could not simulate cold days trends. Das et al. 19 assessed the impact of climate change on temperature extremes and projected a substantial rise in temperature

Data and methods
The CMIP6 dataset incorporates different shared socioeconomic pathways (SSPs). This study focuses on eleven CMIP6 models (Supplementary Table S1) under two SSP scenarios (SSP 370 and 585) during historical , near future (2025-2060), and far future (2065-2100) periods for Africa, Asia, North America, South America, Europe, and Oceania. SSP370 is the medium-to-high end of future emissions and temperature scenarios 27 , while SSP585 is the only SSP with emissions sufficient to deliver the 8.5 W/m 2 level of forcing in 2100. Table 1 presents the sector-based indices 28 .
The reference dataset, W5E5 29,30 , is a daily-resolution observed climate data on a global 0.5° × 0.5° lat-lon grid. W5E5 was selected as the reference dataset because it has been used to bias-correct climate models for impact studies 31 . All precipitation outputs were re-gridded using second-order conservative mapping due to the significant variability and unequal distribution of precipitation. In contrast, all temperature outputs were re-gridded using bilinear interpolation 32 to prevent erroneous scale gap effects 31 . All outputs were re-grided to a single 1° × 1° grid. GCM ranking. The performance of each GCM was ranked for each climate index during the historical period using five different metrics 23,[33][34][35] ; namely, percentage bias, mean absolute error, index of agreement, correlation, and root mean square error. These performance metrics were combined for a universal rank Eqs. (1) and (2) in which all metrics were assigned an equal weight 35,36 . www.nature.com/scientificreports/ where T i,j is the error of metric j and particular model i in space and time. All metrics m are added together to get the total relative error T * i,tot per GCM per index. It should be noted that the index of agreement and correlation were reversed for the procedures min() and max() because the best-performing models should have a total relative score closer to zero. Rank 1 is the best-performing model for each index.
Empirical cumulative distribution function (ECDF). The ECDF is calculated by ordering all the unique observations in the data sample and calculating the cumulative probability for each as the number of observations less than or equal to a given observation divided by the total number of observations (n). This is given as 37 ; where 1X is the indicator of event X. When t is fixed, indicator 1X i ≤ t is a Bernoulli random variable with parameter p = F(t). F n (t) is the unbiased estimator of F(t). The independent, identically distributed real random variables with the shared cumulative distribution function F(t) are denoted by Xi(t).
We examined the ability of the GCMs to simulate the monthly climatology of the extreme indices by visually inspecting and subjecting the ECDF distributions to Kolmogorov-Smirnov test to understand the distribution differences between the reference and the outputs from the GCMs.
Correlation, partial correlation, index of importance and causal effects. The degree of association is a metric that shows how strong the relation between two variables is without considering that a third variable may influence both variables. When several factors impact the phenomenon under examination, partial correlation becomes extremely important.
However, due to the changing variance and linearity of the extreme events, we used the spearman correlation to benchmark the monotonic relationship between the extreme events.
Partial correlation is the correlation of two variables while controlling for a third or more other variables. The relationship is said to be partial when two variables are correlated while conditioning the third or several other variables 38 .
The partial correlation is given as: where ρUVW is the partial correlation of variables U and V conditioned on W, ρUV is the correlation between varaibles U and V, ρUW is the correlation between variables U and W while ρVW is the correlation between variables V and W. For a set of t controlled variables, W is Statistically significant correlation and partial correlation were examined using the Student's t-test at a 95%, 99% and 99.9% confidence level.
Due to the mutuality of these extreme events, we also quantified the causal influence of the candidate's extreme event on the responses of the other extreme occurrences.
Several methods have been proposed to assess this, but the most popular is the permutation of importance, based on the random forest algorithm 39,40 . However, this approach is unsuitable for highly correlated events, as it cannot distinguish between an event's conditional and marginal influence 41 .
As a result, the Conditional Variable Relevance-also known as the Conditional Permutation Importance (CPI)-was used to assess more partial importance in random forests. The CPI is presented as: For the regression trees in the random forest (k) are the prediction error of tree t in a random forest with predictors p and ntrees of trees based on the out-of-bag sample β (t) , respectively, before the permutation of the out-of-bag values of X k . β (t) is the out-of-bag cardinal number sample for tree t, and I() is the indicator function. y (t) i is the function f (t) (x i ) , which is the random forest prediction of the out-of-bag observation i before permutation. x p(i)k is the ith X k observation after permutation.
The forest permutation of importance PI (k) is the average overall tree-wise permutation PI where PI In the CPI, out-of-bag values of one predictor X (k) are conditionally permuted on other predictors Z (−k) . The extreme events are subjected to CPI following Debeer and Strobl 40 , while the causality assumptions are presented by van der Laan 42 .
Climate change signals in extreme events. For risk management purposes, climate projections should include a wide range of feasible spectrums of anticipated future climate change 36 . The climate change signals (CCS) in extreme events were calculated as the difference between (i) the far future and the historical period, (ii) the near future and the historical period, and (iii) the far future and the near future.
Trends and significance. Spatial plots based on robustness and significance criteria are presented for all indices' historical and future assessments. Following Haensler et al. 43 , this study considers a signal robust if 66% of the models agree. Statistically significant trends and climate change signals were examined using the Student's t-test at a 95% confidence level 35 . The trend in the climate extreme events was calculated based on the Mann-Kendall trend test, while the trend's magnitude was estimated using Sen's slope 35,[44][45][46] .

Results
Spatial spread climatology. The climatology was computed for all indices for the historical period. For instance, the climatology of extreme temperature range (ETR) ranged from 2 to 47 °C ( Supplementary Fig. S1), with the tropics having the smallest range. BCC replicated the spatial spread, with slight overestimation over Russia and underestimation around the Brazilian and West African coasts. Additionally, other models had varying degrees of overestimation or underestimation. Generally, the range increased away from the equator in the Northern and Southern Hemispheres. This pattern is visible in all models. The departure of the models from the reference is presented in Fig. 1. Most models had evident cold biases in the Southern Hemisphere. However, some models, e.g., BCC, performed well in North America, Greenland, West Africa, and some parts of Russia. The hot bias was highest in CANESM5, with ETR values as high as 23 °C in Greenland and China. MPI-ESM, MPI-LR, and NORESM2 all recorded cold tendencies over entire continents. The spatial spread for other indices is presented in the supplementary file ( Supplementary Fig. S2 to S9).

Regional ECDF.
Taking the ETR as an example, Supplementary Fig. S10 shows the distribution of the ECDF on the continental scale. Over Africa, four models (CANESM5, BCC, UKESM, and INM) reasonably replicated the reference ETR ( Supplementary Fig. 10a, Table 2). However, based on the Kolmogorov-Smirnov statistics ( Table 2), INM showed the lowest distribution distance (0.14), although with a P-value lesser than 0.05. This means INM performed best in capturing the ECDF distribution. However, the P-value suggested that INM distribution was not the same as the reference distribution. MIROC6 overestimated this by up to 5 °C. At the same time, IPSL, MPI-LR, MPI-ESM, and NORESM2 underestimated the distribution by up to 7 °C. Over Asia, GCM performance and ranking. Figure 2 shows the multi-index continental GCM ranks. Over Africa ( Fig. 2a; Supplementary Table S2), MPI-ESM performed best in replicating the daily precipitation intensity (DPI), while BCC performed worst (ranked 11th). For ETR, four models (CANESM5, BCC, UKESM, and INM), as observed in the ECDF earlier, ranked best in Africa. Over Asia, CANESM5 ranked first in heatwave index (HWI), maximum consecutive dry days (MCCD), daily temperature range (DTR), and dry spell total length (DSTL). In contrast, BCC ranked first in heatwave maximum length (HWML), hot spell maximum length (HSML), and DTRV. As observed by the ECDF of ETR over Asia, MIROC6 was the best-performing model in ETR. Over Europe and North America, IPSL ranked first in replicating the observed warm and dry days (WDD). In contrast, MPI-LR ranked first in reproducing the reference DPI over South America and Oceania, respectively. Notably, the hot spell frequency (HSF) and heat wave frequency (HWF) were well replicated by all CMIP6 models, as the error margin was between 2 to − 3 days/decade for all models.
Correlation and partial correlation. Figure 3 shows the multi-index correlation and partial correlation plot for the reference on the continental scale. For instance, in Africa, the HSML was strongly positively correlated with heatwave frequency (HWF) (88%), HWI (87%), HWML (90%), and heatwave total length (HWTL) (90%), all at the 95% confidence level (upper diagonal). However, this is different when considering partial correlation. The partial correlation plot (lower diagonal) shows that the HSML was positively correlated with heatwave frequency (HWF) (21%), HWI (59%), and HWML (27%), while negatively correlated with HWTL (− 18%), all at the 95% confidence level. The same pattern was observed in Asia and Oceania. The partial correlation values and directions differed in North America, South America, and Europe. The analysis was repeated for the ensemble-mean of the CMIP6 (Supplementary Fig. S11). Generally, the reference and model ensemble-mean have similar correlation and partial correlation patterns, albeit with differing magnitudes.
Causality and importance. Figure 4 presents the CPI results for the reference data. Over the entire continent, HWTL was greatly influenced by the hot spell maximum length (HSML) and HWF. There was no universal pattern of causality for ETR, as the importance-modulating index varied with the continent. In Africa, HWTL, DTRV, and HSML were the most influential indices for ETR, while in Asia, HWI, DTR, and DTRV were the most significant. In contrast, DTRV and DTR were the most influential indices in Europe. HWTL and HWF greatly influenced changes in the pattern of HSML on all continents, while DSTL was the most crucial index modulating the MCCD for all continents. Supplementary Fig. S12 to S18 show the CPI of other extreme indices over the entire continent. Additionally, there was general agreement between the index of importance observed in the reference and the multimodel ensemble-mean.
Trends and CMIP6 trend consistency. As a case study, Fig. 5 shows the trends in ETR for reference and the CMIP6 models during the historical period. For reference, there were some statistically significant negative trends in ETR in some parts of North America and Asia, with values between − 4.1 and − 0.3 °C/decade. However, there was a typical positive trend of ETR in Africa and South America, though some parts were not statistically significant. Moreover, the CMIP6 models could not capture the significant trends in the reference. They also exhibited some underestimations, particularly in South America, and overestimations in Greenland. Except for IPSL and CNRM, most models showed a general warm bias over North America and northern Asia www.nature.com/scientificreports/ www.nature.com/scientificreports/ ( Supplementary Fig. S19). Furthermore, there was a characteristic cold bias in South America, Africa, and Oceania. The range of the model trend biases was between − 3.4 and 4.7 °C/decade. There was a consistent negative trend of DTR over most parts of West and North Africa, southern Europe, and Central Asia, with agreement by all eleven models (Fig. 6). The positive DTRV trend consistency was prominent mainly between latitudes 15° N and 35° N; and longitudes 25° W and 60° E, while an apparent negative trend was observed over Europe, northern Australia, and western Canada, with agreement by at least nine models. The consistent positive trend of HTWL was particularly noticeable in West and North Africa, most parts of Australia, and South America.
Projection and change signal. Taking ETR as a case study, Fig. 7 shows the CCS in trends between the historical, near future (NF), and far future (FF) for both SSP 370 and 585 using the CMIP6 ensemble-mean. Although there were some significant negative trends of ETR in some parts of North America and some significant positive trends in some parts of South America, the ensemble-mean generally replicated the spatial trend in other parts of the world during the historical period, although with different magnitudes and significance. In most cases, most parts of the tropics witnessed a positive trend of ETR during the historical period. The FF under SSP 370 witnessed a significant widespread negative trend in Europe and Alaska. Similarly, other parts of the world mainly had positive trends. The trend values varied between − 4.9 and 3.5 °C/decade.
We further explored the climate change signal in the trends for the different periods.
The CCS between the historical and SSP 370 NF showed a rise in the trend of ETR from the historical to SSP 370 NF periods in Europe, with a magnitude between 1.0 and 4.0 °C/decade. However, there was a general drop in this trend for most parts of Canada and some parts of Russia. Moreover, most parts of the world experienced an increasing trend from the historical to SSP 370 NF. In reverse, the CCS between the historical and SSP 585 NF revealed that most areas of Canada that were previously accustomed to negative trends were reversed to positive. In contrast, the once positive trend in the USA under SSP 370 NF became negative. China also had a more intense positive trend during this period. A further look into the CCS between SSP 370 NF and SSP 370 FF shows a significant deepening in the trend of ETR over most parts of Europe, with magnitudes of between − 2.1 and − 5.0 °C/decade. At the same time, some other areas like north-eastern Canada had a significant rise in the trend by up to 4 °C. Supplementary Fig. S20 shows the result of the same experiment considering DTR. Figure 8 shows the yearly and total cumulative heatwave index (HWI). Africa had the highest days of HWI. The reference series values ranged from 149 (in 1983) to 171 days/year (in 2012). However, the highest days in the historical CMIP6 ensemble-mean varied from 120 to 130 days/year. The highest days of HWI for the entire period of study manifested during the far future under SSP 585, with 181 days/year from 2098 to 2100. Nevertheless, comparing the reference with the historical simulations showed an underestimation of HWI days for all continents. The total cumulative HWI index showed Africa having more than 6000 days of HWI during the far future under SSP 370 and 585, while Asia maintained a value of 2000 days under the same SSPs. Zonal and meridional cross-section ETR. Zonal and meridional cross-sections were used to understand further the influence of land, sea, and latitude differences on ETR (Fig. 9). Over Africa, there was an apparent increase of ETR away from the equator and the coast. However, high ETR was visible in desert regions, specifically between 20 and 30°N (Sahara Desert) and between 25 and 32°S (Kalahari Desert). More importantly, the ETR was highest in the Kalahari Desert for all years due to cooler (freezing) nights than in the Sahara Desert. Furthermore, the ETR decreased toward the Mediterranean due to the ocean's high specific heat capacity. ETR was highest in regions with a dense landmass and reduced toward the ocean due to the land's low specific heat capacity compared to the ocean.
ETR was lowest over scattered landscapes (− 10° N to 10° N) in Asia and increased with denser landmass (Fig. 9c). Meridionally, ETR was highest in mountainous regions due to the ability of these regions to warm and cool faster than the surrounding regions. ETR also decreased toward the Pacific Ocean. Generally, ETR ranged between 3.0 and 72.0 °C for the displayed continents.
Additionally, there were cold biases in most CMIP6 models along the tropics (Fig. 10). However, NORESM2 reasonably captured the low ETR along the equator and a few degrees away from the equator. IPSL and INM underestimated the low ETR along the equator by 5 to 20 °C. In contrast, MPI-LR, MPI-ESM, CANESM5, and BCC captured the high ETR north of 50°N reasonably well. The models, overall, mirrored the reference patterns. The ETR reduced as it approached the Arctic due to ice, increased as it advanced toward the Mediterranean, and declined as it approached the equator. The broad range was between 1.0 and 73.0 °C.

Discussion and conclusion
In this study, we explored the potentials of eleven GCMs participating in CMIP6 project in representing fourteen climate extreme indices that are useful for different sectors, including health, agriculture and water resources. Understanding of climate model skills, multi-index causal effects and global climate change signals of extremes is limited, but is key in reinforcing well-informed decisions in the various sectors.
For the selected models, individual CMIP6 GCM results often varied significantly between regions and models. For example, the top-performing GCM in simulating the reference DPI tended to exhibit consistently strong performance across continents, whereas the best-performing model in simulating DTSL differed among continents.
Additionally, there were observable biases in the climatology of various indices. The distribution of the ECDF supported these inconsistencies. Notably, all models behaved differently in capturing the ECDF of different indices and for different continents. These inconsistencies could be attributed to the various land surface www.nature.com/scientificreports/ www.nature.com/scientificreports/ schemes and simulation of features such as vegetation 36 and orography 47 , unrealistic large-scale variability 35,48 , and contrasting internal variability between climate models and observations 23,49 . Nevertheless, previous studies [50][51][52] showed that CMIP GCMs struggled to capture daily temperature minima and maxima. Yet, understanding extreme temperature events is critical since many earth system processes are influenced by the maximum and lowest temperatures as well as the mean temperature 36,52 .
The performance of GCM in simulating precipitation extremes also varied widely across the globe. However, it is essential to note that the highest biases were recorded immediately north and south of the equator, particularly for DSTL. The signal of these biases varied among the different models, suggesting the inability of the models to correctly resolve important biophysical features such as solar radiation, vegetation, and cloud features, especially at daily scales 53,54 or other synoptic-scale system interactions 55 . In contrast to Srivastava et al. 56 , who reported that UKESM was one of the best-performing models in the US for precipitation simulation, we found that IPSL performed best in simulating MCCD for both Africa and North America, while MPI-ESM and MPI-LR performed best in simulating DPI in Africa and North America, respectively. Across all continents, IPSL performed best for MCCD, while MPI-LR performed best for DPI. Remarkably, the MPI model family performed best in capturing the DPI across the six continents. This may have been produced from model inter-dependence 36,56,57 .
Despite the different biases associated with the various models, the correlation and partial correlation patterns were similar for the reference and model ensemble-mean, although with different magnitudes. Also, the causal impact for different indices varied. More precisely, the causal impact for ETR did not follow a universal pattern since the important modulating index differed depending on the continent. Furthermore, the multimodel ensemble-mean and the significant index agreed with the reference.
There was a mix of trends for the various specific indices for each location. Despite a rise in both maximum and minimum temperatures, the significant negative trends of DTR suggested quicker warming of the minimum temperature 2 .
Nonetheless, these results have some practical implications for decision-makers in different sectors. Due to changes in land cover, land use, and the resources employed in urban centres, cities have greater surface temperatures than rural areas (urban heat island phenomenon 53,58-60 ). Additionally, urban heat island impacts lengthen the duration of heat events during heatwaves 61 ; however, this behaviour is different for the various atmospheric circulation regions. For example, we could not observe increased heatwave total length in most urban areas in the mid-latitudes. Yet, the reverse is for urban areas in the tropics. This demonstrates that rather than urban heat islands, large-scale meteorological conditions 62 , significantly impact the length of the duration of heatwaves in midlatitude.
The consequences of drought on water demand and supply by natural systems and people, on the other hand, will be amplified as the climate warms 63,64 . Furthermore, rising temperatures worsen heavy precipitation by increasing atmospheric moisture, promoting the precipitation event through moisture convergence at low altitudes 65,66 and increasing evapotranspiration rates arid areas 46 . This will lead to more intense hydrometeorological situations, such as floods and droughts 5 and significantly impact the amount and quality of available water as well as river discharge timing and amplitude 67 . As a result, humans, society and natural systems are at risk 67,68 .
Therefore, it is crucial to design appropriate strategies to strengthen resilience to extreme climatic events while also mitigating further GHG emissions. However, optimal adaptation/mitigation strategies for climate change could be hampered by incorrect information from the poor representation of climatic events in data sets.
Beyond the specifics of this research, future work could investigate extreme historical events with additional reference datasets, bias-corrected or downscaled CMIP6 datasets, especially for extreme precipitation events. Additionally, more CMIP6 models could be utilized.

Data availability
CMIP6 data are publicly available through the Earth System Grid Federation at: http:// esgf. llnl. gov/. The W5E5 reference dataset is distributed by the GFZ Data Services and can be downloaded at https:// doi. org/ 10. 5880/ pik. 2019. 023. The derived data generated for the study are available from the corresponding author on reasonable request.