Introduction

The global mean surface air temperature change (\(\Delta T\)) is an important metric of climate change. It brings about vast consequences, such as changes in the hydrological cycle1,2,3,4,5, clouds6,7,8, radiative fluxes3,6,8,9, sea levels10,11, sea ice11,12,13, carbon emissions6,14, as well as regional climate15,16. To mitigate climate change, the United Nations Framework Convention on Climate Change proposed the Paris Climate Agreement to limit the global temperature increase to below 2 °C and to pursue efforts to limit it to 1.5 °C. According to estimations, achieving these goals in this century requires carbon dioxide removal (CDR)17,18,19,20,21. This raises the question of how the cooling effect (CE) of CDR contributes to the reversal of climate change, a question tightly linked to the pressure of realizing the temperature goals.

If CDR is applied after the increase in CO2 concentration, the \(\Delta T\) exhibits hysteresis: it continues to rise for several years and then decreases; it cannot return to zero when the CO2 concentration recovers to an unperturbed level4,11,14,22,23. To date, it has been reported that the CDR rate, equilibrium climate sensitivity (ECS) and ocean heat uptake may alter the hysteresis of the \(\Delta T\)14. However, it is still unclear which factors are relatively important to the CE of CDR.

Most of the literature on the impacts of CDR is based on simulations with complex models, such as coupled general circulation models (CGCMs) and Earth system models (ESMs). Due to the great computational expense, investigations of CDR are mainly based on a single model or a few models6,10,11,14,24. This, to some extent, has hindered the understanding of the effects of individual climatic properties on the behavior of the \(\Delta T\) under CDR. For example, it is difficult to modulate the parameters associated with ocean processes. Recently, the Carbon Dioxide Removal Model Intercomparison Project (CDRMIP)17 has been sponsored as a contribution to the Coupled Model Intercomparison Project phase 6 (CMIP6)25. Within the project, although 8 models are used to conduct 1pctCO2-cdr simulations (in which the CO2 concentration declines) and the climatic properties of the models differ, this sample is still inadequate for discussing the effect of one climatic property while isolating the others.

The forcing and response energy budget framework may expand the understanding. To seek a straightforward understanding of the \(\Delta T\), the Intergovernmental Panel on Climate Change (IPCC) proposed the forcing and response energy budget framework26,27,28. It is expressed as follows:

$$\Delta N=F-\lambda \Delta T$$
(1)

where \(\Delta N\) is the net incoming radiative flux at the top of the atmosphere (TOA), \(F\) is the effective radiative forcing (ERF) and \(\lambda\) is the climate feedback parameter (net radiative flux feedback to the TOA per unit \(\Delta T\)). Conventionally, if the CO2 concentration is 2 times the preindustrial level and the climate reaches equilibrium, the \(\Delta T\) equals the ECS (for convenience, the present analysis uses 4 times the preindustrial concentration instead). To evaluate the transient response, the IPCC further introduced a two-layer energy balance model (EBM)28,29,30,31,32, which is an accepted tool for translating ECS into the transient climate response28. Because of its small computational cost, the framework provides an opportunity to examine the effect of each climatic property while isolating the effects of the others.
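As a short worked step (the subscript “eq” is notation introduced here for illustration only), setting \(\Delta N=0\) in Eq. (1) gives the equilibrium warming under a constant forcing:

$$\Delta {T}_{{\rm{eq}}}=\frac{F}{\lambda }$$

When \(F\) is the ERF at the reference CO2 level, \(\Delta {T}_{{\rm{eq}}}\) corresponds to the ECS in the sense used in this analysis.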

Results

Reproducibility of the EBM

The EBM parameters were calibrated on the outputs of 45 CMIP6 models, so that each CMIP6 model yields a surrogate EBM. The calibration uses years 1–150 of the Abrupt 4×CO2 simulation and years 1–140 of the 1pctCO2 simulation; the EBM results for years 141–280 are predictions. The EBM captures the \(\Delta T\) evolution under CO2 forcing well.

In the Abrupt 4×CO2 simulation, the \(\Delta T\) of every CMIP6 model is captured by its surrogate EBM. In this simulation, the CO2 concentration abruptly quadruples relative to the preindustrial level and is then held fixed. The \(\Delta T\) in both the CMIP6 and EBM results rises dramatically and then increases at a slower rate (Supplementary Fig. 1); compared model by model, the two sets of results are nearly identical. Taking the \(\Delta T\) in year 150 as an example, the individual CMIP6 and EBM results coincide well (Fig. 1a).

Fig. 1: The reproducibility of the ∆T in EBMs.

The scatterplot of the \(\Delta T\) (unit: K) in EBM (x axis) and CMIP6 (y axis). In (a), the results are the \(\Delta T\) in year 150 in Abrupt-4×CO2 simulations. In (b), the results are the change in year 140 in 1pctCO2 simulations. In (c), the results are the CE of CDR in 1pctCO2-cdr simulations; the CE is defined as the \(\Delta T\) differences between the ends of 1pctCO2 simulations and 1pctCO2-cdr simulations.

In the 1pctCO2 simulation, most of the surrogate EBMs capture the \(\Delta T\) evolution in the CMIP6 results. The \(\Delta T\) evolution of both the CMIP6 models and their surrogate EBMs displays the same ramp-up feature, and nearly all the surrogate EBMs fit the \(\Delta T\) evolution of their CMIP6 models well (Supplementary Fig. 2). When the increase in the CO2 concentration ceases in year 140 (Fig. 2a), the \(\Delta T\) values in most CMIP6 models and their surrogate EBMs are nearly identical (Fig. 1b). The exceptions are CESM2, GISS-E2-1-G and KACE-1-0-G (Fig. 1b and Supplementary Fig. 2). For CESM2, the mismatch in temperature in year 140 possibly arises because the model does not fulfil our assumed ERF-CO2 relationship (Eq. 14). GISS-E2-1-G displays an unusual plateau in the \(\Delta T\) after 82 years (Supplementary Fig. 2). In KACE-1-0-G, the initial global mean surface air temperature is ~1 K cooler than at the end of the preindustrial simulation; it does not satisfy the protocol that the 1pctCO2 simulation initiates from the end of the preindustrial simulation25. Therefore, GISS-E2-1-G and KACE-1-0-G are excluded from the subsequent analysis.

Fig. 2: The evolution of atmospheric CO2 concentration and ∆T.

The evolution of the CO2 concentration (a; unit: ppm) is the forcing of the single parameter sensitive and reconstructed simulations. The time series of \(\Delta T\) (unit: K) are the results of the single parameter sensitive simulations. b–h show the results of the sets of EBM simulations in which one parameter (indicated above each panel) was varied while the other parameters were set to their multimodel averages; the colors of the curves from light to dark represent the varied parameter from small to large. The x-axis denotes the year.

For the 1pctCO2-cdr simulation, only 8 models are available. Since the 1pctCO2-cdr simulation is the extension of the 1pctCO2 simulation, the CE of CDR is defined as the temperature difference between the end of the 1pctCO2 simulation (year 140 in Fig. 2a) and that of the 1pctCO2-cdr simulation (year 280 in Fig. 2a). Considering the interference of climatic internal variability, six surrogate EBMs (ACCESS-ESM1-5, CNRM-ESM2-1, CanESM5, GFDL-ESM4, MIROC-ES2L and UKESM1-0-LL) reasonably reproduce the CE, all with errors smaller than their climatic internal variability (Fig. 1c, Supplementary Fig. 2 and Supplementary Table 2 in Supplementary Information). Two surrogate EBMs (CESM2 and NorESM2-LM) underestimate the CE (Fig. 1c and Supplementary Table 2). For CESM2, similar to the situation in the 1pctCO2 simulation, the difference in CE is possibly due to a mismatch of the ERF-CO2 relationship. For NorESM2-LM, the \(\Delta T\) evolution during years 141–280 is nearly symmetric to that during years 1–140, an “unusual” behavior that does not match the \(\Delta T\) behavior under a dropping CO2 concentration described in previous literature4,6,11,14,22,23,33. Despite the “unusual” behaviors and climatic internal variability, the surrogate EBMs are able to reproduce the CE in CMIP6 models under the application of CDR.
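The CE definition above can be written as a one-line computation. The following is a minimal sketch; the function name and the convention that index 0 of the series holds year 1 are assumptions made here for illustration.

```python
def cooling_effect(delta_t, end_rampup=140, end_cdr=280):
    """Return the CE of CDR: dT at the end of the ramp-up minus dT at the end of CDR.

    delta_t is a sequence of annual-mean temperature changes (K), with
    delta_t[0] holding year 1 (an indexing convention assumed for this sketch).
    A positive value means the surface is cooler at year 280 than at year 140.
    """
    return delta_t[end_rampup - 1] - delta_t[end_cdr - 1]
```

With this sign convention, a positive CE denotes net cooling over the CDR period.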

As the EBM contains the simplified energetic processes associated with the \(\Delta T\), its good reproducibility in the aforementioned scenarios instills confidence in its ability to study the effects of climatic properties on the CE of CDR. Taking advantage of the low computational cost of the EBM, the present analysis conducted 391 EBM simulation ensembles (see “Methods” section for more information) with varying climatic properties, greatly augmenting the sample size relative to the 1pctCO2-cdr simulations, which comprise only 8 available models.

The CE of CDR

Under CDR conditions, the \(\Delta T\) is affected by the accumulation of the ERF if the CO2 concentration increases before the CDR is applied. Considering this, the EBMs are driven by the idealized evolution of CO2 concentration below: during years 1–140, it increases 1% per year; during 141–280, the pathway mirrors that of years 1–140 (Fig. 2a). This favors the acquisition of data with a high signal-to-noise ratio. The evolution of the \(\Delta T\) discussed in the following is based on the outputs of EBMs under such a pathway.
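The idealized pathway can be sketched as follows. The function name is hypothetical, the preindustrial value of 284.7 ppm is the one quoted in “Methods”, and the mirroring is implemented by reversing the ramp-up, which is one reading of the description above.

```python
def co2_pathway(co2_0=284.7, rate=0.01, n_up=140):
    """Return CO2 concentrations (ppm) for years 0..2*n_up, indexed by year.

    Years 0..n_up compound at `rate` per year; years n_up+1..2*n_up mirror
    the ramp-up so that the concentration returns to co2_0.
    """
    up = [co2_0 * (1.0 + rate) ** k for k in range(n_up + 1)]
    down = up[-2::-1]  # year n_up-1 back to year 0, without repeating the peak
    return up + down
```

After 140 years of 1% compounding the concentration is 1.01^140 times the preindustrial value, that is, approximately quadrupled.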

As changes in parameters affect the CE of CDR, the present analysis conducted single parameter sensitive simulations to investigate the effects of the parameters in the EBM (see “Methods”). The evolution of the \(\Delta T\) is displayed in Fig. 2. Overall, the \(\Delta T\) increases with the CO2 concentration from year 1 to 140; it decreases as the CO2 concentration declines from year 141 to 280; when the CO2 concentration recovers to preindustrial level, the \(\Delta T\) cannot fall back to 0. The \(\Delta T\) varies as the parameters alter.

The averages and diversities of CDR CE in the single parameter sensitive simulation, as well as the reconstructed simulations, are displayed in Fig. 3a. In the reconstructed results, the averaged CE of CDR is 3.5 K with a standard deviation of 0.7 K.

Fig. 3: The average and diversity of the CE features resulting from individual and all parameters.

The error-bar plots of the features of the \(\Delta T\) under CDR scenarios in EBMs. For each of the 7 leftmost parameters on the x-axis, the corresponding bar represents the span of the \(\Delta T\) feature in the single parameter sensitive simulations; “Rec” represents the results of the reconstructed simulations. The dots are averaged results; the top and bottom lines of the error bars indicate +1 and −1 intermodel standard deviation, respectively. The features include the CE of CDR (a; unit: K) and the lag when cooling emerges after year 140 (b).

The leading factor affecting the CE is the coefficient of the vertical heat exchange in the ocean (parameter \(\gamma\)). The sole change in \(\gamma\) leads to a CE of CDR with an average of 3.5 K and a standard deviation of 0.7 K, which is almost identical to the reconstructed results (Fig. 3a). This parameter means that, for the same temperature difference between the Earth’s surface and the deep ocean, a higher value yields a larger heat exchange between the two. For instance, if the Earth’s surface is warmer than the deep ocean, more heat is lost from the former to the latter, resulting in a smaller temperature gap between the two layers and a smaller temperature response of the Earth’s surface; if the Earth’s surface is cooling and the deep ocean is warmer than it, the surface receives more heat from the deep ocean, also leading to a smaller temperature gap between the two layers and a slower cooling of the Earth’s surface.

Because the net energy gained by the Earth’s surface (\(C\,d\left(\Delta T\right)/dt\); Figs. 4g and 5a, b) is small in magnitude before year 200, Eqs. (4) and (6) in “Methods” can be approximated as:

$$F-\lambda \Delta T-\varepsilon H\approx 0$$
(2)
Fig. 4: The evolution of temperature and energy fluxes when the coefficient of vertical heat exchange in the ocean changes.

The time series in EBMs when the coefficient of vertical heat exchange in the ocean (\(\gamma\)) varies: \(\Delta T\) (a; unit: K), temperature change in the deep ocean (b; unit: K), changes in temperature differences between the Earth’s surface and the deep ocean (c; unit: K), ERF (d; unit: W m−2), energy feedback (\(-\lambda \Delta T\); e; unit: W m−2), energy loss of the Earth’s surface to the deep ocean (\(-\varepsilon H\); f; unit: W m−2), and net energy gained by the Earth’s surface (the sum of (d–f); g; unit: W m−2). The colors of the curves from light to dark represent \(\gamma\) from small to large. The results are the output of the single parameter sensitive simulations in which \(\gamma\) was varied and the other parameters were fixed.

Fig. 5: The energy fluxes during three periods and the derivative of the fluxes during one period when the coefficient of vertical heat exchange in the ocean changes.

The averaged energy fluxes in the single parameter sensitive simulations during three periods: a years 1–140, b years 141–200 and c years 261–280. d is the derivative of the fluxes during years 261–280. Red, blue, green and gray bars are the ERF, heat transfer to the deep ocean (\(-\varepsilon H\)), climate feedback (\(-\lambda \Delta T\)) and \(C\,d\left(\Delta T\right)/dt\), respectively. From left to right along the x-axis, \(\gamma\) goes from large to small. Positive and negative values mean that the Earth’s surface receives and loses energy, respectively. The units of (a–c) are W m−2; those of (d) are W m−2 year−1.

This equation indicates that the ERF is approximately balanced by heat transfer from the Earth’s surface to the deep ocean and radiation feedback to the TOA.
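One way to see the role of \(\gamma\) in Eq. (2) is to substitute \(H=\gamma \left(\Delta T-\Delta {T}_{0}\right)\) from Eq. (6) in “Methods”, where \(\Delta {T}_{0}\) is the deep-ocean temperature change, and solve for \(\Delta T\). This rearrangement is a sketch added for illustration and is only as accurate as the approximation in Eq. (2):

$$\Delta T\approx \frac{F+\varepsilon \gamma \Delta {T}_{0}}{\lambda +\varepsilon \gamma }$$

As long as \(\lambda \Delta {T}_{0} < F\), increasing \(\gamma\) lowers the right-hand side, so the surface warms less for the same ERF.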

When the CO2 concentration increases, the Earth’s surface receives energy due to CO2 forcing, resulting in warming (Fig. 4a, d). Since the deep ocean has a much larger heat capacity than the Earth’s surface (Supplementary Table 1), the surface warms faster than the deep ocean (Fig. 4c). A portion of the injected energy flows into the deep ocean (Fig. 4f) through advection, diffusion and mixing34, while the rest is primarily fed back to the TOA (Fig. 4e). If only \(\gamma\) varies, a higher \(\gamma\) value leads to more heat being transferred from the Earth’s surface to the deep ocean (higher \(\varepsilon H\)), resulting in a lower efficiency of the ERF in warming the Earth’s surface (Fig. 5a). This is accompanied by a warmer deep ocean, a less-warmed Earth’s surface and a smaller temperature difference between the two layers. Consequently, the \(\Delta T\) in year 140 is smaller (Fig. 4a, b, f).

After year 140 but before the temperature gap between the Earth’s surface and the deep ocean disappears, higher \(\gamma\) values lead to a slower decrease in the \(\Delta T\). During this period, although the ERF is decreasing, it is still the primary forcing (Fig. 5b). As in years 1–140, the injected energy is partially transferred to the deep ocean and partially returned to the TOA. With higher \(\gamma\) values, more energy is transferred to the deep ocean, resulting in lower efficiency in driving changes in the \(\Delta T\) (Eqs. 2 and 3 and Fig. 5b); consequently, when the ERF declines, the \(\Delta T\) drops at a slower pace (Fig. 4a, d). Furthermore, if \(\gamma\) is higher, the rate of the \(\Delta T\) decrease must be slower to maintain a consistent relationship between the net energy budget and the \(\Delta T\) change (refer to Supplementary Discussion in Supplementary Information).

As year 280 approaches, the CO2 forcing has nearly vanished, and the warmed Earth’s surface emits outgoing radiative fluxes, thereby cooling itself (Fig. 4a, d, e). Since the deep ocean is warmer than the Earth’s surface, it provides heat to the surface (Fig. 4c, f). As the derivative of \(C\,d\left(\Delta T\right)/dt\) is small in this period (Fig. 5d), differentiating both sides of Eq. (4) with respect to t yields:

$${F}^{{\prime} }-\lambda {\Delta T}^{{\prime} }-\varepsilon {H}^{{\prime} }\approx 0$$
(3)

This means that the rate of decline of the ERF is balanced by the rates of change of the climate feedback and of the energy loss from the Earth’s surface to the deep ocean.

Higher \(\gamma\) values tend to result in more dramatic changes in the heat exchange between the deep ocean and the Earth’s surface; when approaching year 280, this leads to a higher \(-\varepsilon {H}^{{\prime} }\) (Fig. 5d). According to Eq. (3) and with \({F}^{{\prime} }\) being fixed in all ensembles (Fig. 5d), the \({\Delta T}^{{\prime} }\) must be higher to maintain the balance of \({F}^{{\prime} }\), \(-\lambda {\Delta T}^{{\prime} }\) and \(-\varepsilon {H}^{{\prime} }\). Given that the \({\Delta T}^{{\prime} }\) is negative, its magnitude must be smaller and the \(\Delta T\) needs to decrease at a slower rate (Fig. 4a).
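Solving Eq. (3) for the surface cooling rate makes this argument explicit (a rearrangement added here as a sketch, assuming \(\lambda > 0\)):

$${\Delta T}^{{\prime} }\approx \frac{{F}^{{\prime} }-\varepsilon {H}^{{\prime} }}{\lambda }$$

With \({F}^{{\prime} }\) identical across the ensemble, a larger \(-\varepsilon {H}^{{\prime} }\) raises \({\Delta T}^{{\prime} }\); since \({\Delta T}^{{\prime} }\) is negative, this corresponds to slower cooling.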

Based on the above results, it is evident that a higher coefficient of the vertical heat exchange in the ocean results in a slower decrease in \(\Delta T\) both during years 141–200 and in the years approaching 280, indicating a lower CE of CDR. Conversely, a lower \(\gamma\) value corresponds to a higher CE (Fig. 4a). This finding aligns with Ehlert and Zickfeld10, who suggested that enhanced vertical mixing results in a lower CE. The out-of-phase relationship of the \(\Delta T\) between years 140 and 280 is not observed in the other single parameter sensitive simulations (Fig. 2), further supporting the notion that the coefficient of the vertical heat exchange in the ocean is the primary factor influencing the CE of CDR.

Simply speaking, the deep ocean acts as a “reservoir” of heat: when the CO2 concentration increases, a higher \(\gamma\) leads to a faster injection of heat into the “reservoir”. This results in a less-warmed Earth’s surface and a larger amount of heat being stored in the deep ocean. Conversely, when the CO2 forcing is close to vanishing, the “reservoir” releases heat and limits the cooling of the Earth’s surface.

The estimation of ERF due to CO2 concentration change (F) plays a secondary role in the CE variability. In the single parameter sensitive simulation, the sole change in F yielded output with an averaged CE of CDR of 3.4 K and a standard deviation of 0.6 K (Fig. 3a). In this set of simulations, the timings of the processes, such as the peak of the \(\Delta T\) (year 144) and the transition of the deep ocean from heat uptake to heat release (years 245–246), do not change; the response amplitudes of the associated processes simply scale with F (Fig. 6). A higher F leads to a stronger ERF, hence a higher temperature response of the Earth’s surface and the deep ocean, a larger temperature gap between the Earth’s surface and the deep ocean, stronger vertical heat exchange in the ocean, more energy gained by the Earth’s surface when the CO2 concentration increases, and more energy lost by the Earth’s surface when the CO2 concentration decreases (Fig. 6). Thus, a higher F induces a relatively higher CE of CDR (Fig. 6a).

Fig. 6: The evolution of temperature and energy fluxes when the estimation of ERF varies.

The time series of energy processes and temperature change when F varies: the \(\Delta T\) (a; unit: K), temperature change in the deep ocean (b; unit: K), changes in temperature differences between the Earth’s surface and the deep ocean (c; unit: K), ERF (d; unit: W m−2), energy feedback (\(-\lambda \Delta T\); e; unit: W m−2), energy loss of the Earth’s surface to the deep ocean (\(-\varepsilon H\); f; unit: W m−2) and net energy gained by the Earth’s surface (the sum of (d–f); g; unit: W m−2). The colors of the curves from light to dark represent F from small to large. The results are the output of the single parameter sensitive simulations in which F was varied and the other parameters were fixed.

The individual contributions of the remaining parameters are much smaller (Fig. 3a), and therefore, their roles are not discussed here.

The timing of CE emergence

Another important aspect related to the CE of CDR is the timing of the CE emergence (TOCEE). Previous studies have generally agreed that if CDR is applied, the \(\Delta T\) may peak several years after the maximum CO2 concentration1,6,14,23,24. This implies that the CE of CDR does not emerge immediately upon the initiation of CDR. In this context, we define the TOCEE as the first year after year 140 in which the \(\Delta T\) is lower than that in year 140. In the reconstructed simulations, the mean TOCEE is 13.6 years. If one is interested in the CE during a period shorter than the entire CDR process, it is crucial to understand this TOCEE: before the TOCEE, the CE is less than zero, indicating a warming effect; only after the TOCEE is the CE positive. The relationship between CE and CO2 concentration is therefore not strictly linear.
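The TOCEE definition can be sketched as a simple search. The function name, the choice to report the result as a lag in years after year 140, and the convention that index 0 holds year 1 are assumptions made for this illustration.

```python
def tocee(delta_t, peak_year=140):
    """Return the lag (years after peak_year) at which cooling first emerges.

    delta_t[0] holds year 1 (indexing assumed for this sketch). Cooling
    "emerges" in the first year whose dT is below the dT of peak_year.
    Returns None if that never happens within the series.
    """
    reference = delta_t[peak_year - 1]
    for year in range(peak_year + 1, len(delta_t) + 1):
        if delta_t[year - 1] < reference:
            return year - peak_year
    return None
```

For a series that keeps warming for a few years after year 140 and then cools, the function returns the number of years until the temperature first drops below its year-140 value.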

However, the influences of the parameters on the TOCEE are minor. In the reconstructed simulations, the TOCEE exhibits relatively small diversity, with a standard deviation of 4.3 years (Fig. 3b). From the single parameter sensitive simulations, it is evident that the primary factor affecting the TOCEE is the heat capacity of the deep ocean (\({C}_{0}\)), accounting for a standard deviation of 3.0 years. The diversity in TOCEE is also influenced by the ECS, the coefficient of vertical heat exchange in the ocean (\(\gamma\)) and the heat capacity of the Earth’s surface (C), with standard deviations of 2.6, 2.6 and 2.1 years, respectively. Considering the limited diversity in the TOCEE, the role of the parameters is not extensively discussed in the present analysis.

Discussion

The EBM results indicate that even under the same CDR pathway (reducing the CO2 concentration by 1% per year for 140 years to the pre-industrial level), the CE of CDR varies. Two factors primarily affect the CE: the primary one is the coefficient of the vertical heat exchange in the ocean and the second one is the estimation of the ERF response to changes in CO2 concentration. Additionally, it is worth noting that the CE does not manifest immediately after the application of CDR; typically, it emerges approximately 14 years later.

One implication of the findings concerns the coefficient of vertical heat exchange in the ocean. The question is how to define the depth of the boundary between the two layers. According to the estimation method in Geoffroy et al.29 and the mean heat capacities of the Earth’s surface and deep ocean in Supplementary Table 1 in the Supplementary Information, the depth of the upper ocean is approximately 93 m, which roughly matches the previous estimation of the mixed-layer depth35. Thus, if the vertical heat exchange coefficient between the mixed layer and the layer below can be estimated accurately, a more precise projection of the CE of CDR becomes possible.

The second implication concerns the estimation of ERF. Despite the substantial effort and progress made in understanding the relationship between ERF and changes in atmospheric CO2 concentration26,28,36,37, the ERF estimation still significantly affects the CE of CDR. To minimize the uncertainty in CE projection, a more accurate estimation of ERF is also required.

The third implication concerns the representativeness of model outputs in the CDRMIP. If the outputs of 8 models in the CDRMIP are used instead of outputs of the 43 CMIP6 models, the relative contributions to the CE diversity vary due to differences in the diversities of the parameters. Among the models participating in CDRMIP, owing to the larger standard deviation of ε (0.36; 0.24 in CMIP6 models; see Supplementary Table 3 in Supplementary Information) and smaller standard deviation of γ (0.18 W m−2 K−1; 0.33 W m−2 K−1 in the CMIP6 models), ε contributes the most to the diversity of CE, followed by γ (Supplementary Fig. 3 in Supplementary Information). Therefore, when discussing the CE diversity among the models, due to the inadequate sampling, the current 8 models in the CDRMIP cannot fully represent the feature exhibited by the 43 CMIP6 models.

In addition, one may be concerned about the effect of the Atlantic meridional overturning circulation (AMOC) on the vertical heat exchange in the ocean. This concern arises from the following findings: (1) in response to changes in CO2 concentration, the AMOC may undergo a similar variation with slight differences in phase38; (2) the AMOC is an important heat conveyor in the redistribution of global energy39. The relationship between the AMOC strength and the vertical heat exchange in the ocean appears contradictory. Some studies showed that a stronger AMOC tends to delay global warming40,41, as it leads to a more vigorous vertical heat exchange in the ocean. On the other hand, a stronger decline in the AMOC may lead to less global warming42,43,44, suggesting that a weaker mean AMOC is associated with enhanced ocean heat uptake and intensified vertical heat exchange. This contradiction is reconciled by the explanation, proposed by He et al.45, that the base-climate ocean circulation, including the AMOC, is more influential, as it establishes the basic state for transient climate responses. This supports the view that the fixed coefficient of vertical heat exchange in the EBM is able to capture the majority of changes in global temperature. However, it is essential to keep in mind that the coefficient of vertical heat transport in the ocean is not constant. Compared with a varying vertical heat transport, the present simplification may lead to slight differences in ocean heat uptake, as well as in the global climate response.

Methods

CMIP6 dataset

The parameter calibration of the EBM and the performance evaluation are based on CMIP6 simulations25, including (1) preindustrial control simulation: the CO2 concentration in the CGCMs or ESMs is held constant (284.7 ppm). The minimum number of simulation years is 500. (2) Abrupt 4×CO2 simulation: the CO2 concentration in the CGCMs and ESMs is abruptly quadrupled relative to the preindustrial level and then held constant. The simulation is initiated from the end of the preindustrial control simulation. The minimum number of simulation years is 150. (3) 1pctCO2 simulation: starting from the preindustrial level, the CO2 concentration in the CGCMs and ESMs is increased 1% per year until it quadruples at year 140. Then, it is held constant for 10 years. The simulation is initiated from the end of the preindustrial control simulation. Only the outputs of the first 140 years are used in the present analysis. (4) 1pctCO2-cdr simulation: starting from the level at the 140th year of the 1pctCO2 simulation, the CO2 concentration in the CGCMs and ESMs is decreased 1% per year until it recovers to the preindustrial level. Then, it is held constant17. The simulation is initiated from the 140th year of the 1pctCO2 simulation. The minimum number of simulation years is 140. Only the outputs of the first 140 years are used in the present analysis.

Forty-five models are used in the present analysis and are listed in Supplementary Table 1 in the Supplementary Information. For all of them, the outputs of the preindustrial control, Abrupt 4×CO2 and 1pctCO2 simulations are available. Only eight models (ACCESS-ESM1-5, CESM2, CNRM-ESM2-1, CanESM5, GFDL-ESM4, MIROC-ES2L, NorESM2-LM and UKESM1-0-LL) conducted the 1pctCO2-cdr simulation, and they were used in the present analysis.

EBM and parameter calibration

The present analysis employed a two-layer EBM29,30,31 to analyze the effects of climatic properties on the \(\Delta T\) evolution under a CDR pathway. The model splits the climate system into two layers: (1) the Earth’s surface, including the atmosphere, the land surface, and the upper ocean, and (2) the deep ocean. It includes some key climatic processes, such as climate feedback, heat transfer between the two layers, the heat capacity of the climate system, and the efficacy of deep-ocean heat uptake. The equations of the EBM are

$$C\frac{d\left(\Delta T\right)}{{dt}}=F-\lambda \Delta T-\varepsilon H$$
(4)
$${C}_{0}\frac{d\left({\Delta T}_{0}\right)}{{dt}}=H$$
(5)
$$H=\gamma \left(\Delta T-{\Delta T}_{0}\right)$$
(6)

In Eqs. (4)–(6), \(\Delta {T}_{0}\) is the temperature change in the deep ocean; F, λ, ε, γ, C, \({C}_{0}\) and H are the ERF, climate feedback parameter, efficacy factor of ocean heat uptake, heat exchange coefficient between the two layers, heat capacity of the upper layer, heat capacity of the deep ocean and heat flux gained by the deep ocean, respectively. \(\varepsilon H\) is the heat loss from the Earth’s surface to the deep ocean. The equivalent climate feedback is \(\lambda +\left(\varepsilon -1\right)H/\Delta T\), which serves as a time-varying climate feedback parameter.
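To make Eqs. (4)–(6) concrete, the following is a minimal sketch of a time integration. The forward-Euler scheme, the sub-annual step count and the function name are choices made here for illustration; the numerical scheme actually used in the analysis is not stated in the text.

```python
def run_ebm(forcing, lam, gamma, eps, C, C0, steps_per_year=10):
    """Integrate the two-layer EBM and return annual (surface, deep) dT lists.

    forcing: annual ERF values F (W m-2), one per simulated year.
    lam, gamma: W m-2 K-1; C, C0: W yr m-2 K-1; eps: dimensionless.
    Forward Euler with `steps_per_year` sub-steps (an assumed scheme).
    """
    dt = 1.0 / steps_per_year
    T, T0 = 0.0, 0.0
    surface, deep = [], []
    for F in forcing:
        for _ in range(steps_per_year):
            H = gamma * (T - T0)               # Eq. (6): flux into the deep ocean
            dT = (F - lam * T - eps * H) / C   # Eq. (4): surface layer tendency
            dT0 = H / C0                       # Eq. (5): deep-ocean tendency
            T += dt * dT
            T0 += dt * dT0
        surface.append(T)
        deep.append(T0)
    return surface, deep
```

For example, `run_ebm([7.0] * 150, lam=1.13, gamma=0.73, eps=1.28, C=7.3, C0=106.0)` integrates a step forcing; these numbers are placeholders chosen for illustration, not the calibrated values in Supplementary Table 1. Under a constant forcing, both layers approach F/λ, consistent with Eq. (1) at equilibrium.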

The parameters are calibrated based on the Abrupt 4×CO2 simulations in CMIP6 using the method in Geoffroy et al.30. Each model underwent 50 iterations to generate the parameters:

(1) For the first iteration, we set \(\varepsilon =1\). The values of F, ECS and λ are obtained by performing a linear regression of \(\Delta N\) against \(\Delta T\), which are CMIP6 outputs, following Eq. (1). To calibrate C, \({C}_{0}\) and γ, we introduce four parameters \({\tau }_{f}\), \({\tau }_{s}\), \({a}_{f}\) and \({a}_{s}\). We use the following equation:

$$\ln \left(1-\frac{\Delta T}{{\rm{ECS}}}\right)\approx \ln {a}_{s}-\frac{t}{{\tau }_{s}}$$
(7)

where t is time. The linear regression of the left-hand side against t over years 30–150 yields \({a}_{s}\) and \({\tau }_{s}\). Then, using:

$${a}_{f}=1-{a}_{s}$$
(8)

we can determine \({a}_{f}\). Averaging over the first 10 years of the Abrupt 4×CO2 simulations and following:

$${\tau }_{f}=\frac{t}{\ln {a}_{f}-\ln \left(1-\frac{\Delta T}{{\rm{ECS}}}-{a}_{s}{e}^{-\frac{t}{{\tau }_{s}}}\right)}$$
(9)

we can calculate \({\tau }_{f}\). To avoid division by 0, NorESM2-LM and NorESM2-MM utilized the first 4 and 7 years, respectively. C, C0 and γ can be calculated as follows:

$$C=\frac{\lambda }{\frac{{a}_{f}}{{\tau }_{f}}+\frac{{a}_{s}}{{\tau }_{s}}}$$
(10)
$${C}_{0}=\frac{\lambda \left({\tau }_{f}{a}_{f}+{\tau }_{s}{a}_{s}\right)-C}{\varepsilon }$$
(11)
$$\gamma =\frac{{C}_{0}}{{\tau }_{f}{a}_{s}+{\tau }_{s}{a}_{f}}$$
(12)

Then, we can obtain \(\varepsilon H\).

(2) For iteration i (\(i\ge 2\)), the calculation process is slightly different. By adding Eq. (4) to Eq. (5), and noting that the total heat-content change of the two layers equals \(\Delta N\), we obtain:

$$\Delta N=F-\lambda \Delta T-\left(\varepsilon -1\right)H$$
(13)

The multi-linear regression of \(\Delta N\) against \(\Delta T\) and \(H\) provides F, ECS, λ and \(\varepsilon\). The value of \(H\) is taken from iteration (i−1). The other steps remain the same as in the first iteration.

Finally, 50 iterations yield the specific values of these parameters, listed in Supplementary Table 1 of the Supplementary Information.
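The first-iteration steps in Eqs. (7)–(12) can be sketched as below. This is an illustrative reading, not the authors' code: the function names are hypothetical, \({\tau }_{f}\) is taken as the mean of the per-year estimates from Eq. (9), and Eq. (11) is used with the prefactor λ in the numerator, which is what dimensional consistency with C requires. The iteration on ε and H is omitted.

```python
import numpy as np

def fit_slow_mode(t, delta_t, ecs, start=30, stop=150):
    """Eq. (7): regress ln(1 - dT/ECS) on t over years start..stop."""
    mask = (t >= start) & (t <= stop)
    slope, intercept = np.polyfit(t[mask], np.log(1.0 - delta_t[mask] / ecs), 1)
    return np.exp(intercept), -1.0 / slope  # a_s, tau_s

def fit_fast_mode(t, delta_t, ecs, a_s, tau_s, n_years=10):
    """Eqs. (8)-(9): a_f = 1 - a_s; tau_f averaged over the first n_years."""
    a_f = 1.0 - a_s
    tt, dT = t[:n_years], delta_t[:n_years]
    residual = 1.0 - dT / ecs - a_s * np.exp(-tt / tau_s)
    tau_f = np.mean(tt / (np.log(a_f) - np.log(residual)))
    return a_f, tau_f

def layer_parameters(lam, eps, a_f, tau_f, a_s, tau_s):
    """Eqs. (10)-(12): heat capacities and exchange coefficient."""
    C = lam / (a_f / tau_f + a_s / tau_s)
    C0 = (lam * (tau_f * a_f + tau_s * a_s) - C) / eps
    gamma = C0 / (tau_f * a_s + tau_s * a_f)
    return C, C0, gamma
```

Applied to a synthetic two-exponential response with known \({a}_{s}\), \({\tau }_{s}\) and \({\tau }_{f}\), the two fitting functions recover the prescribed values closely, which is a convenient sanity check before using model output.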

The ERF, F, is a function of the CO2 concentration (CO2) and satisfies32

$$F\left({\rm{CO}}2\right)={F}_{4\times }\left\{\left(1-f\right)\frac{\ln \left({\rm{CO}}2/{{\rm{CO}}2}_{0}\right)}{\ln 4}+f{\left[\frac{\ln \left({\rm{CO}}2/{{\rm{CO}}2}_{0}\right)}{\ln 4}\right]}^{2}\right\}$$
(14)

In Eq. (14), f is the fraction of non-log-linearity of the ERF, \({F}_{4\times }\) represents the ERF when the CO2 concentration is 4 times the preindustrial level, and \({{\rm{CO}}2}_{0}\) is the CO2 concentration during the preindustrial period. f is calibrated based on the 1pctCO2 simulation using the method described in Geoffroy and Saint-Martin31. To evaluate f, we define:

$$x={\log }_{4}\left[\frac{{\rm{CO}}2}{{{\rm{CO}}2}_{0}}\right]-1$$
(15)
$$y=\frac{1}{{\log }_{4}\left[\frac{{\rm{CO}}2}{{{\rm{CO}}2}_{0}}\right]}\frac{\Delta N+\lambda \Delta T+\left(\varepsilon -1\right)H}{{F}_{4\times }}-1$$
(16)

then, f is calculated as:

$$f=\frac{\sum _{t}{xy}}{\sum _{t}{x}^{2}}$$
(17)

Years 35–140 of the 1pctCO2 simulation are used to minimize the influence of noise. The calculation of f is iterative. In the first iteration, H is set to zero; for iteration i (\(i\ge 2\)), H is determined from the results of iteration (i−1). Fifty iterations yield f.
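The forcing relation and the estimator of f in Eqs. (15)–(17) can be sketched as follows. The function names are hypothetical, the forcing is written in terms of \({\log }_{4}\left({\rm{CO}}2/{{\rm{CO}}2}_{0}\right)=x+1\) as in Eqs. (15)–(16), and the iteration on H is left out: the diagnosed forcing \(\Delta N+\lambda \Delta T+\left(\varepsilon -1\right)H\) is simply passed in as an array.

```python
import numpy as np

def erf_co2(co2, co2_0, f4x, f):
    """ERF as a function of CO2, with a quadratic correction of weight f."""
    L = np.log(co2 / co2_0) / np.log(4.0)  # log4(CO2 / CO2_0) = x + 1
    return f4x * ((1.0 - f) * L + f * L ** 2)

def estimate_f(co2, co2_0, f4x, forcing_diag):
    """Least-squares slope through the origin of y on x, Eqs. (15)-(17).

    forcing_diag is the diagnosed forcing dN + lam*dT + (eps - 1)*H (W m-2).
    """
    L = np.log(co2 / co2_0) / np.log(4.0)
    x = L - 1.0                            # Eq. (15)
    y = forcing_diag / (f4x * L) - 1.0     # Eq. (16)
    return np.sum(x * y) / np.sum(x ** 2)  # Eq. (17)
```

Feeding `estimate_f` with a forcing generated by `erf_co2` returns the prescribed f, because in that case y equals f·x exactly.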

In Geoffroy and Saint-Martin31, there is another option in which f is assigned a value of 0.09. For each model, we used the EBM to reproduce the \(\Delta T\) in the 1pctCO2 simulation with both the calibrated f and the assigned f. If the calibrated f gave the lower root mean square error relative to the CMIP6 result, it was adopted; otherwise, the assigned f was used. The parameter f is listed in Supplementary Table 1 in Supplementary Information.

EBM Simulations

To study the effects of the parameters, the present analysis conducted the following simulations: (1) Single parameter sensitive simulation: To investigate the effects of the parameters in Eqs. (4)–(6) on the evolution of the \(\Delta T\), we varied one parameter across the calibrated values of the CMIP6 models and set the other parameters to the mean of the CMIP6 models (the parameters are listed in Supplementary Table 1 in Supplementary Information; the parameters of GISS-E2-1-G and KACE-1-0-G are not used due to incorrect setup). The EBM (Eqs. (4)–(6)) is forced by the CO2 concentration pathway, that is, during years 1–140, it uses the same concentration as that in the 1pctCO2 simulation; during years 141–280, it uses the same concentration as that in the 1pctCO2-cdr simulation (see Fig. 2a). The total ensemble number is 301. (2) Reconstructed simulation: This simulation was designed to evaluate the performance of the EBM and determine the importance of the parameters. For each CMIP6 model, the surrogate EBM uses its corresponding parameters in Supplementary Table 1 in Supplementary Information. The EBM is forced by the CO2 concentration pathway, as in the single parameter sensitive simulation. The total ensemble number is 45. (3) Abrupt 4×CO2 reconstructed simulation: This simulation was designed to evaluate the reproducibility of the EBM for the Abrupt 4×CO2 simulation in CMIP6. For each CMIP6 model, the surrogate EBM uses its corresponding parameters in Supplementary Table 1 in Supplementary Information. The EBM is forced by the CO2 concentration, as in the Abrupt 4×CO2 simulation in CMIP6. The total ensemble number is 45.
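The design of the single parameter sensitive simulations can be sketched as a small ensemble builder. The function name and the dictionary layout are hypothetical. The count of 301 would be consistent with 43 retained models times 7 varied parameters, which is an inference from the numbers in the text rather than a statement in it.

```python
def single_parameter_ensemble(calibrated):
    """Build one member per (parameter, model) pair.

    calibrated: {model_name: {parameter_name: value}}, with the same
    parameter names for every model (a layout assumed for this sketch).
    Each member takes one parameter from one model and the multimodel
    mean for all other parameters.
    """
    models = sorted(calibrated)
    names = sorted(next(iter(calibrated.values())))
    mean = {p: sum(calibrated[m][p] for m in models) / len(models) for p in names}
    members = []
    for p in names:
        for m in models:
            params = dict(mean)           # start from the multimodel mean
            params[p] = calibrated[m][p]  # overwrite only the varied parameter
            members.append((p, m, params))
    return members
```

Each returned parameter set can then be passed to an EBM integration driven by the CO2 pathway described above.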