Introduction

The Coupled Model Intercomparison Project Phase 6 (CMIP6) is the most recent effort to coordinate the design and distribution of a large number of state-of-the-art climate model simulations of past, present, and future climates1. It serves as the basis for the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR6)2. The observed global warming trend is generally well captured by the state-of-the-art climate system models participating in CMIP63. Almost two-thirds of the impacts related to atmospheric and ocean temperature changes can be confidently attributed to anthropogenic forcing4. Global mean surface air temperature is linearly related to the equivalent-CO2 concentration5. The equivalent-CO2 concentration is calculated by converting the amounts of all greenhouse gases to the amount of carbon dioxide with the same global warming potential. The contribution of equivalent-CO2 to historical global mean surface air temperature is about 70% on multi-decadal or longer timescales.

Greenhouse gases (GHG), CO2 in particular, are considered to be the major external forcing in the 20th century and will also have a crucial impact in the 21st century projection. A high correlation between global mean near-surface air temperature and the CO2 concentrations has also been found in future projections6. However, many of the latest CMIP6 models have larger climate sensitivities than the previous CMIP5 generation7,8,9. Higher climate sensitivity indicates the ‘hot model’ problem in CMIP6 models10, and the larger climate sensitivity range suggests a wider range of warming responses to CO2-forcing and larger model uncertainty in future warming projections11.

Several studies have tried to narrow the uncertainty in estimates of past and future human-induced warming using detection and attribution techniques or metrics of climate sensitivity12,13. The recent global warming trend is strongly correlated with the transient climate response (TCR), so the past warming trend can be used to constrain future warming projections in climate models on global scales14. Most previous studies, however, focus on projections at the global scale15, whereas it is regional-scale changes that are of vital importance for the impacts of climate change. Under global warming, climate changes are more extreme regionally and induce more severe impacts. For example, the frequency of heatwaves has increased since the mid-twentieth century, but trend magnitudes are not globally uniform. Decadal trends in the frequency of heatwaves are largest over northern South America, the Middle East, and the Maritime Continent, at 50% per decade, but range between 10% and 30% per decade over most other regions16. In this study, we also constrain future warming projections but focus on regional land surface air temperature (LSAT) trend projections under the high-emission SSP5–8.5 scenario and the intermediate-emission SSP2–4.5 scenario17. SSP5–8.5 and SSP2–4.5 update RCP8.5 and RCP4.5 from CMIP5 and have the most participating climate models. We focus only on the surface air temperature trend over land because the Atlantic multidecadal oscillation (AMO) and Pacific decadal oscillation (PDO) account for most of the decadal variability over the oceans5,18,19.

Results

Historical LSAT trend: observation and CMIP6 simulations

Figure 1a shows the LSAT trend from 1950 to 2014 in the CRU dataset. Almost the entire global land surface has experienced significant surface warming. The warming trend is most pronounced over the mid- and high latitudes of the Northern Hemisphere continents and northeast Africa, and weakest over the Tibetan Plateau and its downstream area and over the Andes Mountains along the western edge of South America. These global warming features are generally well captured by the ensemble mean of the 33 CMIP6 models (MME, Fig. 1b). The spatial correlation coefficient between MME and CRU is 0.55, higher than the correlations for 32 of the 33 models. However, the LSAT trend in MME is spatially much smoother than in CRU. As shown in Fig. 1d, the regions where MME underestimates the LSAT trend by more than 0.5 °C 100 years−1 are those where the observed warming rate is higher than 1 °C 100 years−1, e.g., over Asia and Alaska. Conversely, MME overestimates the trend where the observed warming rate is lower than 1 °C 100 years−1, e.g., over the continental United States (U.S.) and South America. The amplitude of the underestimation over Asia and of the overestimation over the U.S. is almost half of the observed regional warming rate. That is, reasonably reproducing the LSAT trend in Asia and the U.S. remains a major challenge for current models. The longitudinal gradient between Europe and northern Asia seen in CRU is also much weaker in MME.
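As a rough illustration of the two diagnostics used above, the following Python sketch computes a least-squares trend at each grid-point (in °C 100 years−1) and a pattern correlation between two trend maps. The function names, the (time, lat, lon) array layout, and the cosine-latitude area weighting are assumptions made for illustration; the text does not state how the spatial correlation is weighted.

```python
import numpy as np

def linear_trend_per_century(years, field):
    """Least-squares trend at each grid-point, in field units per 100 years.

    years: 1-D array, shape (nt,)
    field: array, shape (nt, nlat, nlon); a NaN at any time step
           yields a NaN trend at that grid-point.
    """
    years = np.asarray(years, dtype=float)
    field = np.asarray(field, dtype=float)
    t = years - years.mean()                      # centered time axis
    anomalies = field - field.mean(axis=0)        # remove the time mean
    slope = np.tensordot(t, anomalies, axes=(0, 0)) / np.sum(t ** 2)
    return 100.0 * slope

def pattern_correlation(a, b, lat):
    """Weighted Pearson correlation between two (nlat, nlon) maps.

    Weights are cos(latitude) (an assumption); only grid-points that
    are finite in both maps are used.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(a)
    ok = np.isfinite(a) & np.isfinite(b)
    a, b, w = a[ok], b[ok], w[ok]
    da = a - np.average(a, weights=w)
    db = b - np.average(b, weights=w)
    cov = np.average(da * db, weights=w)
    return cov / np.sqrt(np.average(da ** 2, weights=w) * np.average(db ** 2, weights=w))
```

Applied to annual mean LSAT from 1950 to 2014, `linear_trend_per_century` would give maps comparable in units to Fig. 1a and b, and `pattern_correlation` would give a number comparable to the quoted 0.55, although the exact value depends on the weighting and masking choices.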

Fig. 1: Land surface air temperature (LSAT) trends from 1950 to 2014.

a CRU, b the raw ensemble (MME), and c the constrained ensemble (CMME) under the 0.5–1.5 OBT of the 33 CMIP6 coupled models. Global mean LSAT trends are marked at the top-right corner of panels a–c. d, e Trend biases in MME and CMME, respectively. f Trend differences between CMME and MME. Units: °C 100 years−1.

The standard MME tends to be a better estimate of the forced climate change of the real system than the result from any particular model, because model errors cancel each other out in a large enough ensemble, and it has been widely used in studies of climate change simulations and projections2. However, it may fail to capture the real LSAT trend, since heavily biased simulations may dilute the real signals or lead to unreliable results. The capability of individual models in reproducing the LSAT trend may also vary with region; e.g., models that reasonably reproduce the LSAT trend in Asia may fail to simulate it in the U.S. or other areas. In this study, we build an ensemble at each grid-point based on a chosen OBS-Based Threshold (OBT), which we call a “constrained multi-model ensemble” (hereafter CMME; see the Methods section for details). The general features of the LSAT trend in CMME (Fig. 1c), constrained by the chosen OBT, are similar to those in MME. The blank areas over land, such as the Andes Mountains, are where no model captures the LSAT trend under the OBT. Although the global mean LSAT trend barely changes (1.71 °C 100 years−1 in MME versus 1.73 °C 100 years−1 in CMME), the spatial correlation coefficient with CRU increases to 0.93 in CMME. The mean CMME bias is 0.3 °C 100 years−1 (Fig. 1e), less than half of that in MME (0.71 °C 100 years−1). As expected, the differences between CMME and MME are similar to the MME biases but of opposite sign over 93% of the total area (Fig. 1f). The regions with the largest differences between CMME and MME are those where the MME shows the largest biases, i.e., Asia and the U.S. Therefore, we conclude that the warming amplitude in MME may be largely biased on regional scales, although its geographic distribution of recent warming is better than that of most individual models. The CMME further improves on the geographic distribution of the LSAT trend in MME and on its amplitude at regional scales, because by construction the CMME is closer to the observed trends.

Model performance in reproducing LSAT trends varies with region. As shown in Fig. 2b, more than 20 models are able to reproduce the LSAT trend over North Africa and the high latitudes of Asia under the OBT. However, model performance is lower over the U.S., western South America, India, and East Asia, where the LSAT trend is smaller than 1 °C 100 years−1 (Fig. 1a). We further quantify model capability in reproducing the regional LSAT trend in the 44 IPCC AR6 Working Group 1 (WG1) reference regions over land20 (Fig. 2c). Model capability is defined as the fraction of models that contribute to CMME under the OBT for each region. Region names and their acronyms are listed in Table 1, together with the model capability in each region. Model capability is relatively high over Central America, central Eurasia, and northern and southern Africa, but lower over North and South America and East and South Asia.
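One possible reading of the model-capability definition above is sketched below in Python. It assumes a boolean selection mask of shape (model, lat, lon) marking where each model contributes to CMME, and takes the capability of a region to be the area-weighted mean, over the grid-points of that region, of the fraction of contributing models. The function name, the inputs, and the cosine-latitude weighting are hypothetical; the text does not specify how grid-point counts are aggregated to regions.

```python
import numpy as np

def regional_capability(selected, region_mask, lat):
    """Model capability (%) in one region.

    selected:    bool array (nmodel, nlat, nlon), True where a model
                 contributes to CMME at that grid-point.
    region_mask: bool array (nlat, nlon), True inside the region.
    lat:         1-D array of latitudes in degrees, shape (nlat,).
    """
    selected = np.asarray(selected, dtype=bool)
    region_mask = np.asarray(region_mask, dtype=bool)
    # Fraction of models contributing at each grid-point.
    fraction = selected.mean(axis=0)
    # Area weights (cos latitude), zero outside the region.
    weights = np.cos(np.deg2rad(lat))[:, None] * region_mask
    return float(100.0 * np.sum(fraction * weights) / np.sum(weights))
```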

Fig. 2: Capability of CMIP6 models in reproducing LSAT trend from 1950 to 2014.

a Percentage of grid-points at which the LSAT trend is missing in CMME (x-axis) versus pattern correlation between CMME and OBS (y-axis) under different OBS-Based Thresholds (OBT). The global mean number of models in CMME under each OBT is marked to the right of the corresponding colored dot. b Number of CMIP6 models that contribute to the CMME under the 0.5–1.5 OBT at each grid-point. c Percentage of models that contribute to CMME in each IPCC AR6 WG1 reference region, defined as model capability in this study. The acronyms of the regions in c are listed in Table 1.

Table 1 Model capability in reproducing the LSAT trend from 1950 to 2014 in the 44 IPCC AR6 WG1 reference regions (units: %), and the LSAT trend in 2050–2100 under the SSP5–8.5 and SSP2–4.5 scenarios in MME together with the adjustment (CMME minus MME; units: °C 100 years−1).

Adjustments of CMME relative to MME in future LSAT projections

The impact of the constrained ensemble under the high-emission scenario SSP5–8.5 is assessed (Fig. 3a–c). Since SSP5–8.5 is intended to explore an extremely high-risk future and its projections are perhaps not the most realistic21, we also examine the impact under the more optimistic SSP2–4.5 scenario (Fig. 3d–f). The LSAT trends from 2050 to 2100 are examined. The growth rates of CO2 concentration in SSP5–8.5 and SSP2–4.5 are about 11.7 ppmv year−1 and 2.0 ppmv year−1, respectively. The spatial patterns of the warming under the two scenarios are similar and close to those of the 20th century. Globally, the LSAT trend in MME is about 7.62 °C 100 years−1 under the SSP5–8.5 scenario (Fig. 3a) and 2.40 °C 100 years−1 under the SSP2–4.5 scenario (Fig. 3d). Consistent with the historical period, the global mean LSAT trend projections in CMME are also close to those in MME (Fig. 3b and e). However, the regional differences are pronounced and generally grow with the growth rate of CO2 concentration. Apart from the Andes Mountains, where model capability is low, the effect of the constraint on LSAT projections resembles its effect in the historical period (Fig. 1f) over more than 79% and 71% of the land surface under the SSP5–8.5 scenario (Fig. 3c) and the SSP2–4.5 scenario (Fig. 3f), respectively: regions with exaggerated historical warming trends are also the regions where CMME tends to reduce the projected warming rate, and vice versa. According to the CMME adjustments, the projected warming over the Eurasian continent may be more intense than in the raw ensemble (MME), whereas the warming risk over the U.S. may be lower than expected. The similarity between the CMME impacts under the two scenarios indicates that the CMME effect is independent of scenario and points to the essential role of the CO2-concentration trend.
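The area percentages quoted above (79% and 71%) could be obtained with a diagnostic of the following kind, which measures the fraction of land area where the future adjustment (CMME minus MME) has the same sign as the historical one. This is only a sketch under assumptions: the function name is hypothetical, area is approximated by cosine-latitude weights, and the text does not say exactly how resemblance was measured.

```python
import numpy as np

def sign_agreement_fraction(adj_hist, adj_future, lat):
    """Area fraction where two (nlat, nlon) adjustment maps share a sign.

    Grid-points that are NaN in either map (e.g. ocean, or land where
    CMME is missing) are excluded from both numerator and denominator.
    """
    a = np.asarray(adj_hist, dtype=float)
    b = np.asarray(adj_future, dtype=float)
    valid = np.isfinite(a) & np.isfinite(b)
    # cos(latitude) weights, zero at invalid grid-points.
    weights = np.cos(np.deg2rad(lat))[:, None] * valid
    same_sign = valid & (np.sign(a) == np.sign(b))
    return float(np.sum(weights * same_sign) / np.sum(weights))
```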

Fig. 3: Patterns of LSAT trend under the SSP5–8.5 and SSP2–4.5 scenarios from 2050 to 2100.

(a) and (d) are results in MME. (b) and (e) are results in CMME. The global mean LSAT trend is marked at the top-right corner of the corresponding plot. (c) and (f) are the differences between CMME and MME. Units: °C 100 years−1.

Regions with the most considerable adjustments in future projections

The regional LSAT projections in the raw and constrained ensembles are further detailed over the 44 land reference regions. Given the similarity of the adjustments under the two scenarios, we mainly focus on the SSP5–8.5 scenario. As shown in Fig. 4a, LSAT increases monotonically over all regions, but the warming rates differ considerably. The three regions with the highest warming rates in MME (higher than 9.9 °C 100 years−1) are the Russian Arctic (RAR), North-Eastern North America (NEN), and the Russian Far East (RFE), all of which are located at high latitudes in the Northern Hemisphere. The three regions with the lowest warming rates (lower than 4.9 °C 100 years−1) are the Caribbean (CAR), New Zealand (NZ), and Southern South America (SSA), all surrounded by ocean.

Fig. 4: LSAT trend projections under the SSP5–8.5 scenario.

a The range of annual mean LSAT anomalies across the 44 IPCC AR6 WG1 reference regions in MME from 2020 to 2100 (shading) and the evolution of LSAT anomalies in the regions with the highest and lowest warming rates. The anomalies are relative to the year 2020. b The differences in LSAT trend from 2050 to 2100 between CMME and MME in the 44 regions. For a fair comparison, MME values are excluded over areas where CMME values are missing.

The LSAT trend during the second half of the 21st century (2050–2100) in each region is also listed in Table 1. The CMME adjustments vary from -0.90 °C 100 years−1 in Eastern North America (ENA) to 0.59 °C 100 years−1 in East Central Asia (ECA). Counting only adjustments larger in magnitude than 0.2 °C 100 years−1, warming trends are intensified over only 7 regions but suppressed over 13 regions. The amplification of warming is evident in Asia and Africa, whereas the suppression of warming is mostly located over the American continents. Relative to the regional LSAT trend in MME, the warming adjustment is generally weaker, with the largest value over ECA (7%); the cooling adjustment is largest over ENA (-11.6%) and Central North America (CNA, -10.5%) in North America, Northern South America (NSA, -8.2%) in South America, Greenland/Iceland (GIC, -7.1%), and NZ (-10%). The adjustment may also be partly attributed to forcings other than GHG. Previous observational and model-based studies have found a cooling trend over the southeast and central United States, i.e., the United States ‘warming hole’, which has been attributed to anthropogenic aerosol forcing or internal climate variability, with the dominant factor varying by season, region, and time22. Although the amplitudes are smaller, the LSAT trends in MME and the adjustments in CMME under the SSP2–4.5 scenario vary linearly across the 44 regions with those under the SSP5–8.5 scenario, with correlation coefficients of 0.96 and 0.87, respectively.
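The absolute and relative adjustments quoted above are linked by simple arithmetic, sketched here in Python with hypothetical function names. The relative adjustment is taken to be 100 × (CMME − MME) / MME for each region, which is our reading of "relative to the regional LSAT trend in MME"; if the ENA values of -0.90 °C 100 years−1 and -11.6% refer to the same trend, they would imply an MME trend of roughly 7.8 °C 100 years−1 there.

```python
import numpy as np

def relative_adjustment(mme_trend, cmme_trend):
    """Adjustment (CMME minus MME) as a percentage of the MME trend."""
    mme = np.asarray(mme_trend, dtype=float)
    cmme = np.asarray(cmme_trend, dtype=float)
    return 100.0 * (cmme - mme) / mme

def count_adjusted_regions(adjustment, threshold=0.2):
    """Number of regions with adjustment above +threshold and below -threshold."""
    adj = np.asarray(adjustment, dtype=float)
    intensified = int(np.sum(adj > threshold))
    suppressed = int(np.sum(adj < -threshold))
    return intensified, suppressed
```

With the 44 regional adjustments from Table 1 as input, `count_adjusted_regions` would be expected to return the 7 intensified and 13 suppressed regions stated in the text.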

Generally, as suggested by CMME, the warming challenge in future projections may be more intense over Asia and less intense over the American continents than suggested by the raw CMIP6 MME projections. The physical mechanism of the CMME adjustment is discussed in the next subsection.

LSAT-response to CO2-forcing and its impact on the LSAT trend reproduction

Global warming in the 20th century is mainly attributable to the increase in well-mixed greenhouse gases (WMGHGs), mainly CO2. The 1pctCO2 experiment is a transient climate simulation, in which the CO2 is the only anthropogenic external forcing. It is an idealized CO2-forced experiment that resembles the CO2 forcing during the Industrial Era. Here we compare the LSAT-response to CO2-forcing in MME and CMME in 1pctCO2 to quantify the effect of our bias-correction procedure, i.e., the CMME, in eliminating CO2-forcing response biases in climate models.

Figure 5a and b show the linear LSAT trend from 1850 to 1910 in the 1pctCO2 experiments for MME and CMME. The great similarity between the LSAT-response to CO2-forcing (Fig. 5a) and the LSAT trend with all forcings (Fig. 1b), with a correlation coefficient of 0.91, confirms the dominant role of CO2 in global warming. The warming trend in CMME generally resembles the MME result, but with different amplitudes of regional warming (Fig. 5b). Figure 5c shows the differences between CMME and MME in the LSAT-response to CO2-forcing in the 1pctCO2 experiment, which allows us to compare the CO2-response in MME with what the response would be if the historical LSAT biases were smaller. Interestingly, these differences closely resemble the historical trend biases in the MME, with opposite sign (Fig. 1d). Over more than 15,000 model grid-points, the corresponding spatial correlation coefficient is -0.76, significant at the 99% confidence level according to Student’s t-test. That is, the historical LSAT trend biases in MME are mainly due to model capability in simulating the CO2-forcing response. The future constraint by CMME under SSP5–8.5 and SSP2–4.5 is therefore physically related to its more reasonable CO2-forcing response, since CO2 is also the dominant forcing in these scenarios.
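For reference, the significance test mentioned above can be sketched with the standard t statistic for a correlation coefficient, t = r·sqrt((n − 2)/(1 − r²)). The function name below is hypothetical, and treating all grid-points as independent samples is an assumption: neighboring grid-points are spatially correlated, so the effective sample size is likely much smaller than 15,000.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation.

    r: sample correlation coefficient (|r| < 1)
    n: number of (assumed independent) sample pairs, n > 2
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# With r = -0.76 and n = 15000 the statistic is about -143, far beyond
# the large-sample two-sided 99% critical value of about 2.58.
t_all_points = correlation_t_statistic(-0.76, 15000)

# Even with only 20 effectively independent samples, |t| is about 5.
t_few_points = correlation_t_statistic(-0.76, 20)
```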

Fig. 5: Linear LSAT trend in the 1pctCO2 experiment from 1850 to 1910, when CO2 increases from the pre-industrial value to about 500 ppmv.

(a) MME and (b) CMME. (c) Differences between CMME and MME. Units: °C 100 years−1.

Discussion

The radiative forcing due to anthropogenic activity has increased continuously during the Industrial Era, with increases in atmospheric CO2 playing a major role. By observational constraint, our study found that model response to CO2-forcing is crucial for model performances in reproducing recent LSAT trends, especially on regional scales. Therefore, under future scenarios with increasing CO2 emission, models with better capability in reproducing recent LSAT trends are expected to give better estimations of future LSAT change.

In this study, we introduced a method to constrain future LSAT projections, especially on regional scales, based on an observationally constrained multi-model ensemble (CMME). The CMME is essentially an extension of the standard MME method in which the effect of poorly performing models is removed from the mean, so that its estimates of the future are informed by present-day model performance. Under the SSP5–8.5 and SSP2–4.5 scenarios, the CMME is close to the raw CMIP6 ensemble (MME) on a global scale, but the regional warming amplitudes are quite different. CMIP6 models may underestimate future warming over Asia but overestimate it over the American continents. More specifically, the cooling adjustments over Central and Eastern North America (CNA and ENA) are more than 10% of the MME projection.

Therefore, we should take care when handling regional LSAT projections by CMIP6 models, since the CMIP6 model response to CO2-forcing may be larger or smaller than expected and vary with region. Moreover, we should take great caution with projections over the regions with larger projection adjustment and lower model capability, i.e., South-Western South America (SWS) and New Zealand (NZ), where factors other than CO2 may also be important for the LSAT changes.

Methods

The constrained ensemble

The constrained ensemble is a set of ensemble members obtained by selecting models at each grid-point based on their LSAT trends from 1950 to 2014 under a chosen OBS-Based Threshold (OBT). The OBT is a range centered on the observed LSAT trend. A model contributes to the statistics of the CMME only at the grid-points where it satisfies the observational constraint. As shown in Fig. 2a, as the OBT narrows, both the pattern correlation coefficient with the observation and the number of invalid grid-points (grid-points with missing values) increase in CMME. We choose 0.5 to 1.5 times the observed trend at each grid-point as the threshold for CMME. In CMME, about 21 models (62% of the models) are selected on average, the pattern correlation is about 0.93, and less than 3% of grid-points have missing values.
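The selection procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function and argument names are hypothetical, the OBT is read as 0.5 to 1.5 times the observed trend, and the treatment of negative observed trends (swapping the bounds so the range stays centered on the observation) is an assumption.

```python
import numpy as np

def constrained_ensemble(obs_trend, model_hist_trends, model_target,
                         lower=0.5, upper=1.5):
    """Grid-point-wise constrained multi-model mean.

    obs_trend:         (nlat, nlon) observed historical trend.
    model_hist_trends: (nmodel, nlat, nlon) simulated historical trends.
    model_target:      (nmodel, nlat, nlon) quantity to average, e.g. the
                       historical or projected trend of each model.
    Returns (cmme, selected): cmme is NaN where no model passes the
    constraint; selected is the boolean mask of contributing models.
    """
    obs = np.asarray(obs_trend, dtype=float)
    hist = np.asarray(model_hist_trends, dtype=float)
    target = np.asarray(model_target, dtype=float)

    # Bounds of the OBT; min/max keeps them ordered if obs is negative.
    lo = np.minimum(lower * obs, upper * obs)
    hi = np.maximum(lower * obs, upper * obs)

    # Comparisons with NaN are False, so missing data never pass.
    selected = (hist >= lo) & (hist <= hi)
    count = selected.sum(axis=0)
    total = np.where(selected, target, 0.0).sum(axis=0)

    cmme = np.full(obs.shape, np.nan)
    np.divide(total, count, out=cmme, where=count > 0)
    return cmme, selected
```

Passing the historical trends as `model_target` would give a map comparable to Fig. 1c, while passing each model's 2050–2100 trend would give the constrained projection; in both cases the selection is based on the 1950–2014 trends only.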

We further examine the distribution of selected grid-points in each model (figures not shown). Under the chosen OBT, the percentage of valid grid-points among all land surface grid-points ranges from 41.2% to 71.3% across models, with an average of about 60.7%. There are also no noticeable discontinuities in the coverage of valid grid-points. Quantile mapping bias correction algorithms are commonly used to modify future model projections23. However, due to the limited number of model projections, the near-surface air temperature (TAS) trends are distributed in discrete steps after quantile mapping (figures not shown), which is inconsistent with the continuous nature of TAS trend maps. That is, quantile mapping bias correction algorithms may not be suitable here.

In our study, the constrained ensemble is used to calibrate the regional LSAT projections under the SSP5–8.5 and SSP2–4.5 scenarios in the 21st century. A 3-point 1–2–1 low-pass filter is used to remove fluctuations shorter than 50 years in CRU and all the model outputs, with the aim of minimizing the impact of internal variability, especially the AMO and PDO. Further comparison shows that the effect of the filter on model selection is small, which suggests that the performance-based selection picks models mainly according to their forced responses, not their internal variability.
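A 3-point 1–2–1 filter replaces each value by a weighted mean of itself and its two neighbors, with weights 1/4, 1/2, and 1/4. The Python sketch below shows one way to apply it to a time series. The function name, the optional repeated passes, and the end-point treatment (repeating the first and last values) are assumptions; the text does not specify these details or the time resolution at which the filter is applied to reach the stated 50-year cutoff.

```python
import numpy as np

def smooth_121(series, passes=1):
    """Apply a 3-point 1-2-1 low-pass filter to a 1-D series.

    End points are handled by repeating the edge values (an assumption),
    so the output has the same length as the input.
    """
    x = np.asarray(series, dtype=float)
    kernel = np.array([0.25, 0.5, 0.25])
    for _ in range(passes):
        padded = np.pad(x, 1, mode="edge")
        x = np.convolve(padded, kernel, mode="valid")
    return x
```

A constant or linear series passes through the interior of the filter unchanged, whereas a point-to-point oscillation is removed, which is the behavior wanted when estimating a forced trend.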

The observation and all the model outputs are interpolated to a regular 1° × 1° grid by bilinear interpolation.
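Bilinear interpolation on a regular latitude-longitude grid is separable: it can be done as linear interpolation along longitude followed by linear interpolation along latitude. The sketch below illustrates this with NumPy only. The function name is hypothetical, and the simplifications are assumptions: coordinates must be ascending, longitudes are not wrapped across the dateline, target points outside the source range take the edge value, and missing values are not treated specially.

```python
import numpy as np

def bilinear_regrid(field, lat_src, lon_src, lat_dst, lon_dst):
    """Bilinearly interpolate a (nlat, nlon) field to a new regular grid."""
    field = np.asarray(field, dtype=float)
    # Step 1: interpolate each source latitude row along longitude.
    along_lon = np.array([np.interp(lon_dst, lon_src, row) for row in field])
    # Step 2: interpolate each target longitude column along latitude.
    out = np.array([np.interp(lat_dst, lat_src, col) for col in along_lon.T]).T
    return out

# Example target: cell centers of a regular 1 x 1 degree grid.
lat_1deg = np.arange(-89.5, 90.0, 1.0)
lon_1deg = np.arange(0.5, 360.0, 1.0)
```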

Data

Gridded observational dataset

Gauge-based gridded monthly mean LSAT from the Climatic Research Unit (CRU TS4) at the University of East Anglia, from 1950 to 2014, serves as the observational reference (ref. 24). The dataset is derived from historical weather station records and interpolated onto a 0.5° grid over the land surface. It is highly reliable, especially in mid-latitudes where the observation network is dense.

CMIP6 model datasets

Simulations from 33 models of the Coupled Model Intercomparison Project Phase 6 (CMIP6) are used. Four experiments are relevant to our study: the all-forcing historical experiment from 1950 to 2014, the idealized 1pctCO2 experiment from 1850 to 1910, and the Shared Socioeconomic Pathways 5–8.5 (SSP5–8.5) and 2–4.5 (SSP2–4.5) experiments in the 21st century. Table 2 lists the names, institutions, and atmospheric-component resolutions of the models. To give each model equal weight, we use only the first member of each model when multiple runs were produced.

Table 2 Information on the 33 CMIP6 coupled models (CGCMs) used in this study.

In the 1pctCO2 experiment, CO2 increases at a prescribed rate of 1% per year from 285 ppmv in 1850 AD until the concentration doubles at model year 70. Although the 1pctCO2 experiment is an idealized configuration, it can be used to identify the climate response to the CO2 increase, as the experiment does not include confounding effects from other climate forcings such as aerosols and land-use change.
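The concentrations quoted for the 1pctCO2 experiment follow from compound growth, C(t) = C0 × 1.01^t. The short Python sketch below (with hypothetical function names, and taking 285 ppmv at model year 0 as stated above) shows that the concentration doubles after about 70 years and reaches roughly 518 ppmv after 60 years, consistent with the "about 500 ppmv" given for 1910.

```python
import math

def co2_1pct(model_year, c0=285.0, rate=0.01):
    """CO2 concentration (ppmv) after `model_year` years of compound growth."""
    return c0 * (1.0 + rate) ** model_year

def doubling_time(rate=0.01):
    """Years needed for the concentration to double at a fixed growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)

co2_after_60_years = co2_1pct(60)   # end of the 1850-1910 analysis period
years_to_double = doubling_time()   # a little under 70 years
```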

The SSP5–8.5 scenario represents the high end of the range of future pathways and produces a radiative forcing of 8.5 W m−2 in 2100. The SSP2–4.5 scenario represents the intermediate-emission scenario with a nominal 4.5 W m−2 radiative forcing level by 2100.