Introduction

The climate feedback parameter λ determines the magnitude of the Earth's radiative response to a given change in global mean surface temperature (GMST). The less negative λ is, the smaller the radiative response of the Earth to a given change in GMST. This means that, under an external forcing that raises GMST, such as increased atmospheric CO2 concentrations, the climate system will warm more when λ is less negative. λ therefore plays a central role in the magnitude of the Earth's climate response to increasing atmospheric CO2 concentrations. However, λ is poorly constrained.

In addition, the magnitude of the climate response depends on the inverse of λ, so estimates of the magnitude of the climate response are very sensitive to uncertainties in estimates of λ. This is illustrated with a simple metric of the magnitude of the climate response: the equilibrium climate sensitivity (ECS). The ECS is the equilibrium surface temperature response to a doubling of atmospheric CO2 concentrations. It is inversely related to the climate feedback parameter λ through the global energy budget of the climate system at equilibrium:

$$\mathrm{ECS}=-\frac{\Delta F_{2\times }}{\lambda_{eq}}$$
(1)

where ΔF2× is the radiative forcing anomaly due to a doubling of atmospheric CO2 concentrations. Because of the inverse relation in Eq. (1), the uncertainty in the climate feedback parameter is amplified when propagated into ECS error budgets, and it becomes the dominant source of ECS uncertainty1,2 and thus of uncertainty in projections of the amplitude of future climate change.
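The amplification implied by Eq. (1) can be illustrated with a minimal sketch. The numerical values below are illustrative assumptions, not estimates from this study; they only show that a symmetric interval in λ maps onto a strongly asymmetric interval in ECS.

```python
def ecs(delta_f_2x, lam):
    """ECS from Eq. (1): minus the 2xCO2 forcing divided by the feedback parameter."""
    if lam >= 0:
        raise ValueError("Eq. (1) assumes a negative (stabilising) feedback parameter")
    return -delta_f_2x / lam


# Illustrative values only (assumptions, not results of the study).
DELTA_F_2X = 4.0         # W m-2
LAMBDA_CENTRAL = -1.0    # W m-2 K-1
LAMBDA_HALF_RANGE = 0.5  # W m-2 K-1

ecs_central = ecs(DELTA_F_2X, LAMBDA_CENTRAL)
ecs_low = ecs(DELTA_F_2X, LAMBDA_CENTRAL - LAMBDA_HALF_RANGE)   # more negative lambda
ecs_high = ecs(DELTA_F_2X, LAMBDA_CENTRAL + LAMBDA_HALF_RANGE)  # less negative lambda

print(f"ECS = {ecs_central:.2f} K, range [{ecs_low:.2f}, {ecs_high:.2f}] K")
```

With these assumed numbers, the upper half of the ECS interval is three times wider than the lower half, even though the interval in λ is symmetric.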

The main reason why the climate feedback parameter remains uncertain is that it cannot be measured directly. It is instead estimated with a variety of methods, ranging from process understanding of individual feedbacks to observational constraints derived from paleo-reconstructions or from the recent record of the global energy budget. Among these methods, the global energy budget approach is arguably the simplest and most direct way to estimate the climate feedback parameter. It is simple compared to the process understanding approach because it uses only estimates of the radiative forcing ΔF, the surface temperature change ΔT, and the planetary heat uptake ΔN through the global energy balance framework ΔN = ΔF + λΔT. It is direct compared to paleo-reconstruction methods because it uses instrumental observations of the current climate system instead of proxy indicators of a past and potentially different climate system. For these reasons there is a priori confidence in the global energy budget approach, and it has been widely used in the literature to estimate the historical climate feedback parameter and derive an observational constraint on the uncertainty range of the ECS2,3,4,5.

We count three types of methods that derive estimates of λ and of the ECS from observations of the global energy budget: methods based on the difference in the energy budget between two time periods, methods based on a regression of the energy budget, and methods that analyse the changes of the energy budget in response to internal variability. A major issue with all these observational methods is that they provide estimates of the climate feedback parameter during the historical period, \(\tilde{\lambda }\), which potentially differ from the climate feedback parameter at equilibrium, λeq (the climate feedback parameter involved in the ECS definition, see Eq. (1)).

Indeed, there is theoretical and modelling evidence that the global climate feedback changes with time under increasing CO2 concentrations. General circulation models (GCMs) show that λ tends to get less negative over time as equilibrium is approached. As a consequence, historical simulations of GCMs show climate feedback parameters that are generally more negative than the equilibrium climate feedback parameter. There are two reasons for this. First, in GCM simulations λ depends on the climate state, that is, on the magnitude of the climatological mean in T and F6,7,8,9,10,11. Second, λ varies with the pattern of sea surface temperature (SST)6,12,13,14. However, over the historical period and under a doubling of the atmospheric CO2 concentration, the climate state changes by only a few degrees, so the climate state dependence of λ is small and the pattern effect explains most of the shift of λ towards less negative values as equilibrium is approached. This shift means that estimates of the climate feedback parameter derived from historical observations of ΔT, ΔF and ΔN are systematically more negative than the equilibrium climate feedback parameter and need to be corrected for the pattern effect to provide a constraint on the ECS.

To correct \(\tilde{\lambda }\) estimates for the pattern effect, the classical approach in the literature is to use GCM simulations to quantify the relationship between the historical \(\tilde{\lambda }\) and the equilibrium λeq2,11,15. This is a first step forward, but it is not yet satisfactory because the extent to which GCMs can capture the time evolution of λ over decades, associated with changes in the SST pattern, has not been evaluated. Here we use a recent regression method2 to derive the first observational estimate of the time variations of \(\tilde{\lambda }\) since 1970, and we compare this estimate with climate model simulations of \(\tilde{\lambda }\).

Over a given time period, the climate feedback parameter \(\tilde{\lambda }\) is related to the global mean temperature change ΔT, the radiative forcing change ΔF, and the change in planetary heat uptake ΔN, by the global energy budget:

$$\tilde{\lambda }=\frac{\Delta N-\Delta F}{\Delta T}$$
(2)

We use a differential form of the energy budget2 and regress ΔN − ΔF against ΔT for all time windows longer than 25 years that start after 1957. This yields estimates of the time variations of \(\tilde{\lambda }\) over 1970–2005 (see Methods). We use this approach rather than the method3,11,16 based on the difference between two historical states (a reference state around 1860 and a present state) because regressions are a better estimator of \(\tilde{\lambda }\) and make full use of the available data2,17. In addition, compared to the difference method, the regression method relies on recent instrumental data only, and does not depend on a late 19th century reference state that is still largely unknown2.
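The windowed regression can be sketched as follows. This is a minimal illustration on synthetic data, not the processing chain of the study: the function names are hypothetical, `window` is assumed to count annual values, and the central date is taken as the mean of the first and last year of each window.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def sliding_lambda(years, d_n, d_f, d_t, window=25):
    """Return (central year, slope of dN - dF against dT) for each window."""
    out = []
    for start in range(len(years) - window + 1):
        stop = start + window
        residual = [n - f for n, f in zip(d_n[start:stop], d_f[start:stop])]
        centre = (years[start] + years[stop - 1]) / 2
        out.append((centre, ols_slope(d_t[start:stop], residual)))
    return out


# Synthetic annual series built with a known feedback of -1.5 W m-2 K-1.
years = list(range(1957, 2018))
d_t = [0.015 * (y - 1957) for y in years]
d_f = [0.04 * (y - 1957) for y in years]
d_n = [f - 1.5 * t for f, t in zip(d_f, d_t)]

series = sliding_lambda(years, d_n, d_f, d_t, window=25)
print(series[0], series[-1])
```

Repeating the call with larger values of `window` would give the time series for longer windows.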

For ΔT, we use the HadCRUT4 ensemble data set of surface temperatures averaged globally and annually (see Fig. 1 and Methods). For ΔF, we use the IPCC AR6 forcing data18 and the associated uncertainty (see Methods). The planetary heat uptake ΔN has long been the limiting dataset for estimating \(\tilde{\lambda }\) over past decades. Here we derive an estimate of N by combining the estimates of the continent, ice and atmosphere heat uptake from a recent inventory19 with an estimate of the ocean heat uptake (see Fig. 1). The ocean heat uptake largely dominates the planetary heat uptake (more than 90%19) and is the largest source of uncertainty20. We derive the estimate of the ocean heat uptake primarily from ocean in-situ temperature analyses. We account for the uncertainty due to the interpolation of sparse in-situ data before the 1980s and due to instrumental corrections by using seven different products. All in-situ temperature analyses are based on the same in-situ data and can thus share some systematic uncertainty. To account for this systematic uncertainty, we use an independent estimate of the ocean heat uptake computed from the thermal expansion of the ocean, which was derived from a recent reconstruction of the 20th century sea level budget21 (see Methods).

Fig. 1: Time series of the different terms of the Earth energy budget.

a Time series of radiative forcing. b Planetary heat uptake. c Global mean surface temperature. All time series are low-pass filtered with a cutoff period of 15 years (see Methods). Confidence interval is 17–83%. Dashed lines indicate the time series corrected for major volcanic eruptions.

Results

Figure 2 shows the estimates of \(\tilde{\lambda }\) over any time-window longer than 25 years that is included in the period 1957–2017. In the following, we focus on the estimates of \(\tilde{\lambda }\) over successive 25-year windows included within 1957–2017 (see the dashed grey line at y = 25 yr on Fig. 2). It yields a time series for \(\tilde{\lambda }\) over 1970–2005 (Figs. 2a and 3a). This time series confirms that \(\tilde{\lambda }\) has changed with time (Fig. 3a).

Fig. 2: Estimates of the median climate feedback parameter.

Climate feedback parameter \(\tilde{\lambda }\) estimated over any window longer than 25 years included within 1957–2017. The vertical axis indicates the length of the window in years. The horizontal axis indicates the central date of the window in years. As an example, the first 25-year window covers the period 1957–1982 and its central date is 1970, so its coordinates on the triangle are x = 1970, y = 25. Note that a horizontal section of the triangle at the coordinate y = p gives the time series of \(\tilde{\lambda }\) computed over successive windows of length p years. a Estimates of \(\tilde{\lambda }\) not corrected for the effect of major volcanic eruptions. b Estimates of \(\tilde{\lambda }\) corrected for the effect of major volcanic eruptions. Horizontal dashed lines indicate the horizontal sections used to compute the \(\tilde{\lambda }\) time series in Fig. 3.

Fig. 3: Time series of the climate feedback parameter.

Time series of the climate feedback parameter \(\tilde{\lambda }\) computed over windows of length 25 years (a, b) and over windows of length 40 years (c). These time series correspond to the horizontal sections indicated on the triangle of Fig. 2 (see the grey dashed lines in Fig. 2). a Time series of \(\tilde{\lambda }\) computed over successive windows of length 25 years (light blue curve) and 17–83% confidence level (light blue shaded area). The grey lines indicate the time series of \(\tilde{\lambda }\) computed over successive windows of length 25 years, derived from the AMIP-piForcing simulations. The yellow dot indicates the estimate of \(\tilde{\lambda }\) derived with the planetary heat uptake record of the DEEP-C project52. The magenta dot indicates the estimate of \(\tilde{\lambda }\) derived with the planetary heat uptake record of the CCI sea level budget project51 (see Methods). b Time series of \(\tilde{\lambda }\) computed over successive windows of length 25 years and corrected for volcanic eruptions (light blue curve) and 17–83% confidence level (light blue shaded area). The orange dashed line is the NOAA ERSST v5 PDO index, low-pass filtered with a cutoff period of 15 years. c Time series of \(\tilde{\lambda }\) computed over successive windows of length 40 years (light blue curve) and 17–83% confidence level (light blue shaded area). The grey curves indicate the time series of \(\tilde{\lambda }\) computed over successive windows of length 40 years, derived from the AMIP-piForcing simulations.

Across the second half of the 20th century, \(\tilde{\lambda }\) has varied within the range [−3.2, −1.0] W m−2 K−1 (see Figs. 2a and 3a). In particular, \(\tilde{\lambda }\) shows a rapid increase of +2.1 W m−2 K−1 between 1976 and 1990, followed by a continuous decrease of −1.9 W m−2 K−1 between 1990 and 2005, which accelerates between 1998 and 2005. These variations are likely caused by the pattern effect, as the climate state has changed little since 1970. In particular, the sharp time variations of \(\tilde{\lambda }\) in the 1970s and in the late 1990s could be explained by the pattern effect induced by large volcanic eruptions. Indeed, stratospheric sulfate aerosol forcing from large volcanic eruptions tends to produce a large radiative response in the few years following the eruption, with relatively limited global temperature change compared to other forcings17. For 25-yr windows that include the large volcanic eruptions of Pinatubo (1991), Mt Agung (1963) and El Chichón (1982), this could result in a significant change of \(\tilde{\lambda }\). However, after correcting for volcanic eruptions (see dashed lines in Fig. 1 and Methods) and rerunning the regressions, we find that the rapid increase of +2.1 W m−2 K−1 in the 1970s is reduced in amplitude by only 40%, while the continuous decrease since 1980, which accelerates after 2000, remains unchanged (see Figs. 2b and 3b). We find that the residual increase in \(\tilde{\lambda }\) between 1973–1979 and 1987–1993 is strictly positive at the 89.8% confidence level (CL) and the decrease between 1987–1993 and 2001–2005 is strictly negative at the 90.4% CL, meaning that \(\tilde{\lambda }\) has varied significantly over the last decades in response to a process other than volcanic eruptions.
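One simple way to attach a confidence level to the sign of a change is to count, across an ensemble of \(\tilde{\lambda }\) estimates, the fraction of members for which the change has that sign. The sketch below assumes that each ensemble member provides one estimate for the early period and one for the late period; this pairing, the function name and the five-member ensemble are illustrative assumptions, not the procedure documented in the Methods.

```python
def confidence_of_increase(lambda_early, lambda_late):
    """Fraction of ensemble members for which lambda increased from early to late."""
    if len(lambda_early) != len(lambda_late):
        raise ValueError("both periods need the same ensemble members")
    differences = [late - early for early, late in zip(lambda_early, lambda_late)]
    return sum(1 for d in differences if d > 0) / len(differences)


# Hypothetical five-member ensemble (values in W m-2 K-1, for illustration only).
early = [-2.4, -2.0, -1.9, -2.2, -1.2]
late = [-1.1, -1.3, -0.9, -1.0, -1.4]

cl = confidence_of_increase(early, late)
print(f"increase is strictly positive in {100 * cl:.0f}% of members")
```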

The drop in \(\tilde{\lambda }\) in the early 2000s (see the bottom right corner of the triangle in Fig. 2b) occurs during the global warming “pause”, also called the “hiatus” period. In our record, which is low-pass filtered with a 15-year cutoff (see Methods), the hiatus period is visible from 2003 to 2012 (Fig. 1). Over this period, surface warming slowed while the ocean heat uptake slightly increased and the forcing kept increasing at about the same pace (Fig. 1). The sudden change in the rate of global mean surface temperature rise caused a drop in \(\tilde{\lambda }\).

There has been debate in the literature on the intensity of the slowdown in global mean surface temperature rise, with some sea surface datasets showing a faster decrease in sea surface temperature (SST) in the 2000s than others22. We tested the regression with another global mean surface temperature dataset, which is based on a different SST product (see Methods). This slightly reduced the observed drop in \(\tilde{\lambda }\) (see supplementary Fig. S1) but did not remove it, meaning that this recent drop in \(\tilde{\lambda }\) is robust in observations.

The Clouds and the Earth’s Radiant Energy System (CERES)23 provides the spectral breakdown of the top-of-atmosphere (TOA) radiation budget between the hiatus period (2003–2012) and the post-hiatus period (after 2012). CERES measurements show that the global mean outgoing shortwave radiation remained relatively weak until the end of the hiatus and then sharply decreased because of a decrease in tropical Pacific low cloud cover whose regional distribution closely matched that of SST warming24,25. The SST warming showed a pattern typical of the positive phase of the PDO over the eastern Pacific. These observations are consistent with recent model studies, which suggest that recent time variations in \(\tilde{\lambda }\) are due to large variations in the global mean cloud feedback in response to changes in the Pacific SST pattern26,27,28,29,30,31. Specifically, the physical mechanism proposed by these studies is that a positive (resp. negative) shift in the PDO induces a cooling (resp. warming) in the western tropical Pacific and a warming (resp. cooling) in the eastern tropical Pacific, which weakens (resp. strengthens) the low-level atmospheric stability in the eastern part of the tropical Pacific. The decreased (resp. increased) stability leads to reduced (resp. increased) low-level cloud cover in this region, which in turn reduces (resp. amplifies) the global reflection of solar radiation at TOA, meaning a strengthening (resp. weakening) of the global mean cloud feedback.

This study provides a longer record of the Earth energy imbalance (EEI) than CERES, which makes it possible to explore the time variations of \(\tilde{\lambda }\) before the 2000s. We find that the drop in \(\tilde{\lambda }\) during the hiatus period is not exceptional. \(\tilde{\lambda }\) was also around −2 W m−2 K−1 in the early 1970s and experienced a sharp increase in the mid 1970s followed by a continuous decrease since then. To identify the SST pattern that is related to these time variations in \(\tilde{\lambda }\), we regress the time series of \(\tilde{\lambda }\) (estimated with the successive 25-yr windows, see the light blue line in Fig. 3a and b) against the local SST, which is first low-pass filtered with a cutoff period of 15 years (to remove noise, see Methods). The correlation map shows a typical basin-scale PDO pattern in the Pacific (with the large East-West dipole and the North-South Pacific asymmetry, see Fig. 4). The correlation is positive in the eastern part of the Pacific and negative in the western part, meaning that a cooling (resp. warming) in the western Pacific and a warming (resp. cooling) in the eastern Pacific is associated with a strengthening (resp. weakening) of the global feedback (i.e. a global feedback that becomes less negative, resp. more negative). This comparison of \(\tilde{\lambda }\) against local SST shows that \(\tilde{\lambda }\) time variations are indeed associated with the Pacific SST seesaw characterised by the PDO. The relationship between \(\tilde{\lambda }\) and the PDO SST pattern holds over the total period 1970–2005 and also over the shorter period 1980–2005.
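A correlation map of this kind can be sketched as a Pearson correlation computed cell by cell. The example below is a minimal illustration with two hypothetical grid cells and synthetic values; the function names and the dictionary layout keyed by (latitude, longitude) are assumptions, and the low-pass filtering step is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_map(lam_series, sst_by_cell):
    """Correlate the lambda time series with the SST series of every grid cell."""
    return {cell: pearson(lam_series, sst) for cell, sst in sst_by_cell.items()}


# Synthetic example: one cell in phase with lambda, one in antiphase.
lam_series = [-2.0, -1.6, -1.1, -0.9, -1.3, -1.8]
sst_by_cell = {
    (0.0, -110.0): [0.1 * (v + 2.0) for v in lam_series],   # hypothetical eastern Pacific cell
    (0.0, 150.0): [-0.1 * (v + 2.0) for v in lam_series],   # hypothetical western Pacific cell
}

corr = correlation_map(lam_series, sst_by_cell)
print(corr)
```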

Fig. 4: Correlation maps between the climate feedback parameter time series and local SST time series.

a, b Correlation map between the observed time series of the climate feedback parameter \(\tilde{\lambda }\) computed over windows of length 25 years (i.e. the light blue time series plotted in Fig. 3a or b) and the observed local SST (from HadCRUT4), low-pass filtered with a cutoff period of 15 years. c, d Same as panels (a) and (b) but computed from outputs of the ensemble mean of AMIP-piForcing experiments. e, f Same as panels (c) and (d) but the correlation is computed against the simulated total cloud fraction rather than the local SST. Note that in panels (a, c, e) the correlation maps are computed over the whole period 1970–2005 and in panels (b, d, f) the correlation maps are computed over the period 1980–2005. Note also that the middle and bottom panels are reproduced in the supplementary information (Figs. S7 and S8) with the AMIP-piForcing experiment of each model rather than the ensemble mean.

To check the synchronicity of \(\tilde{\lambda }\) variations with the PDO oscillations, we correlate the time series of \(\tilde{\lambda }\) (estimated with the successive 25-yr windows, see the light blue line in Fig. 3a and b) with the PDO index. We also test the correlation of \(\tilde{\lambda }\) with other climate modes of variability, including the Indian Ocean Dipole (IOD), the Southern Annular Mode (SAM) and the Atlantic Multidecadal Oscillation (AMO) (see Methods). Only the correlation with the PDO is significant (with p-value < 0.05, see the supplementary information Table S1). The degree of correlation with the PDO is high (>0.78) and is maximum for a 0 yr lag, confirming that multidecadal variations in \(\tilde{\lambda }\) are synchronous with the PDO (Fig. 3b). However, the smoothing of the data prevents us from identifying any significant lag below 1 yr between the PDO and the time variations of \(\tilde{\lambda }\). Nor can we determine which leads the other.
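The lag analysis can be sketched as a search for the lag that maximises the correlation between the two series. The example below uses a synthetic 20-year oscillation in place of the smoothed PDO index; the function names, the lag range and the synthetic responses are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, lag):
    """Correlation between x(t) and y(t + lag); a positive lag means x leads y."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return pearson(x, y)


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] that maximises the lagged correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: lagged_correlation(x, y, k))


# Synthetic index and two synthetic responses: synchronous, and delayed by 3 years.
index = [math.sin(2 * math.pi * t / 20) for t in range(60)]
lam_sync = [-2.0 + 0.5 * v for v in index]
lam_late = [-2.0 + 0.5 * math.sin(2 * math.pi * (t - 3) / 20) for t in range(60)]

print(best_lag(index, lam_sync, 5), best_lag(index, lam_late, 5))
```

With annual, low-pass filtered series, such a search can only resolve lags of at least one time step, which is consistent with the limitation stated above.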

Overall, the fact that \(\tilde{\lambda }\) time variations are associated with the typical Pacific SST seesaw characterised by the PDO, and the synchronicity between the \(\tilde{\lambda }\) time series and the PDO index, show that the pattern effect is closely related to the PDO not only during the hiatus period (as shown by CERES) but also at multidecadal time scales. This line of evidence, combined with previous evidence from climate model studies showing that the PDO modulates the global mean cloud feedback26,27,28,29,30,31, suggests that, at multidecadal time scales, the primary driver of the pattern effect is the PDO.

This result has consequences for the decadal variability of climate. If confirmed, it means that when the PDO switches to a long positive (resp. negative) phase (as happened in 1976, when the PDO switched from a negative phase between 1960 and 1976 to a positive phase between 1976 and 1998), the SST pattern switches as well and the negative (resp. positive) feedback due to the SST pattern changes to a positive (resp. negative) feedback, leading to rapid enhanced (resp. damped) warming for several decades.

For the long-term future, an important question is the extent to which climate model simulations can reproduce the time variations in \(\tilde{\lambda }\) and its synchronicity with the PDO. This validation over the past is a necessary step to gain confidence in the capacity of climate models to simulate the forced pattern effect under increasing atmospheric CO2 concentration and to simulate the real-world ECS. We compared our estimate of \(\tilde{\lambda }(t)\) with estimates derived from Atmospheric General Circulation Model (AGCM) simulations forced at the surface by observed historical SST (AMIP-piForcing simulations). These simulations allow us to evaluate, in climate models, the radiative response of the climate system to the observed SST pattern (see Methods).

We find that, on long time scales (i.e. for windows of 40-year length and above), AGCM estimates of \(\tilde{\lambda }(t)\) are consistent across models and are consistent with observations despite a small bias (Fig. 3c). The bias is probably due to a more severe regression dilution in the observed estimate of \(\tilde{\lambda }(t)\) than in the AGCM simulation estimates, because observations of the surface temperature contain additional noise coming from instrumental errors (see Methods).
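The regression dilution mentioned here follows from the classical errors-in-variables result. As a sketch, assume that the observed temperature change is the true ΔT plus independent noise of variance \(\sigma_{\varepsilon }^{2}\) that is uncorrelated with ΔN − ΔF (these assumptions are made here for illustration and are not spelled out in the text). In the large-sample limit, the ordinary least squares slope then tends to

$$\hat{\tilde{\lambda }}\;\rightarrow \;\tilde{\lambda }\,\frac{\sigma_{T}^{2}}{\sigma_{T}^{2}+\sigma_{\varepsilon }^{2}}$$

where \(\sigma_{T}^{2}\) is the variance of the true ΔT within the regression window. The factor is at most 1, so the estimated feedback parameter is biased towards zero, and the bias grows with the noise level of the temperature record.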

At shorter time scales, however (for example, for windows of length 25 years), AGCMs show very weak decadal variations in \(\tilde{\lambda }\) since 1970 (see Fig. 3a, Fig. S9 and also Fig. 6a of ref. 32, which shows AMIP-piForcing simulations of \(\tilde{\lambda }\) over successive 30-year windows). These weak variations do not reproduce the observed variations in \(\tilde{\lambda }\) associated with the PDO (Fig. 3a). To investigate this discrepancy between AGCMs and observations further, we consider the ensemble mean of AMIP-piForcing simulations and plot the correlation maps of \(\tilde{\lambda }\) with the local SST, as we have done before with observations. Results with the ensemble mean are representative of the AGCM AMIP-piForcing simulations in general, as these simulations are generally consistent with each other in terms of \(\tilde{\lambda }\) and SST32 (see Figs. S7 and S8 of the supplementary information).

We find that the ensemble mean of AMIP-piForcing simulations behaves differently over 1980–2005 than over 1970–2005. Over the period 1980–2005, the correlation map between \(\tilde{\lambda }\) and the local SST shows a general PDO pattern as in observations, although the correlation is significant for different portions of the PDO pattern (Fig. 4). Overall, this means that the relationship between \(\tilde{\lambda }\) and the PDO is broadly similar in observations and in the ensemble mean of AMIP-piForcing simulations after 1980. This is in agreement with ref. 32, who find in AMIP-piForcing simulations that the Earth has warmed after 1980 with a strong tropical Pacific SST gradient, leading to an important pattern effect and a strongly negative \(\tilde{\lambda }\) of −1.65 ± 0.46 W m−2 K−1. However, from the correlation maps of Fig. 4, we also note that there are important differences between observations and AMIP-piForcing simulations. Over the South American stratocumulus region, correlations for the ensemble mean of AMIP-piForcing simulations are strongly positive while they are weakly negative (or not significant) in the observations. Off the west coast of North America, correlations are strongly positive in the observations but weakly positive or not significant for the ensemble mean of AMIP-piForcing simulations.

Over the longer period 1970–2005 the situation is different. The correlation map does not show the PDO pattern in the ensemble mean of AMIP-piForcing simulations (unlike in observations, see Fig. 4). It shows a more globally uniform pattern with a local positive correlation in the Atlantic sector of the Southern Ocean. This result suggests that before 1980 the relation between \(\tilde{\lambda }\) and SST changes in the AMIP-piForcing simulations, with the sensitivity of \(\tilde{\lambda }\) to the Southern Ocean becoming dominant over the sensitivity to the Pacific SST. Such a change is not seen in observations (Fig. 4). Note that, although the ensemble mean of AMIP-piForcing simulations behaves differently before 1980, this does not lead to a significant change in \(\tilde{\lambda }\) before 1980 (Fig. 3). We suspect this is because the cloud responses in the eastern equatorial Pacific and in the central Pacific compensate each other before 1980 (Fig. 4e, f).

Overall, we find that in AMIP-piForcing simulations, \(\tilde{\lambda }\) variations are only weakly associated with the Pacific SST changes induced by the PDO. They are also associated with SST changes in other regions, in particular in the Southern Ocean. This sensitivity of \(\tilde{\lambda }\) to the Southern Ocean SST is only marginal in observations compared to the sensitivity to the PDO.

Why do AGCM simulations show a low sensitivity of \(\tilde{\lambda }\) to Pacific SST before 1980? Is it because AGCMs show too little sensitivity of tropical low clouds to eastern Pacific SST changes (as suggested by ref. 33, who find that AMIP models have too little cloud sensitivity to SST, particularly for shortwave TOA flux; see their Figs. S7S11)? Is it because AGCMs show too high a sensitivity of eastern Pacific low clouds to the Southern Ocean SST? Or is it because of both? Could it also be atmospheric stochasticity alone, without any SST changes, that causes decadal variations in the TOA net flux (see e.g. ref. 34, their supplementary information Fig. 17c,d)? We do not know yet. These are important questions to answer because, if such systematic biases across AGCMs are confirmed, the forced pattern in climate model simulations, which shows a positive low cloud feedback in response to the late eastern Pacific SST warming, may be biased as well. This would mean that climate models’ estimates of the ECS are systematically biased. This is an important issue which calls for further research.

Methods

Estimating changes in radiative forcing, global mean surface temperature and planetary heat uptake

Radiative forcing

Here we use the ensemble of effective radiative forcing (ERF) from the IPCC AR635 in order to propagate the uncertainties properly and estimate rigorously the confidence level of the time variations in \(\tilde{\lambda }\). The ERF of the IPCC AR6 is computed from two prescribed SST and sea-ice experiments after removing the top-of-atmosphere energy budget change associated with the land surface temperature response. It therefore includes the aerosol and the non-aerosol cloud adjustments. Note, however, that the IPCC ERF only accounts for the direct radiative effect of the land surface temperature response and not for the indirect effect on water vapour and tropospheric temperature. The latter effect is small, of the order of a tenth of a W m−2 (ref. 18). The IPCC AR6 provides the ERF in the form of an ensemble of 100,000 time series, which has been generated to include all sources of uncertainty, including the uncertainty due to the time correlation in errors. With this ensemble, we can therefore propagate the uncertainty from the observed forcing to the estimate of \(\tilde{\lambda }\) (see below).
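Propagating a forcing ensemble to \(\tilde{\lambda }\) can be sketched as repeating the regression once per ensemble member and reading percentiles off the resulting distribution. The example below is a minimal illustration with 101 synthetic forcing members; the function names, the percentile interpolation rule and the synthetic series are assumptions, and the uncertainties in ΔN and ΔT are ignored.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def propagate_forcing_uncertainty(forcing_members, d_n, d_t):
    """Regress dN - dF on dT for each forcing member; return (p17, median, p83)."""
    lams = []
    for d_f in forcing_members:
        residual = [n - f for n, f in zip(d_n, d_f)]
        lams.append(ols_slope(d_t, residual))
    return percentile(lams, 17), percentile(lams, 50), percentile(lams, 83)


# Synthetic example: hypothetical forcing members that differ by a scale factor.
d_t = [0.02 * i for i in range(30)]
d_n = [0.5 * t for t in d_t]
forcing_members = [[(1.0 + 0.02 * k) * t for t in d_t] for k in range(101)]

p17, median, p83 = propagate_forcing_uncertainty(forcing_members, d_n, d_t)
print(f"lambda = {median:.2f} [{p17:.2f}, {p83:.2f}] W m-2 K-1")
```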

Global mean surface temperature

Global mean surface temperature (GMST) time series are taken from ref. 36 v2 (https://www-users.york.ac.uk/~kdc3/papers/coverage2013/series.html), spanning from 1850 to present. Following the recommendation of ref. 37, a scale factor of 1.09 is applied to this dataset to correct for the bias induced by the satellite-retrieved sea surface temperature present in the dataset. Since we use the ordinary least squares method with GMST as the predictor, uncertainty in GMST cannot be accounted for in the regression. However, we conduct a sensitivity study (see below) to check the sensitivity of our \(\tilde{\lambda }\) estimates to the GMST product. In this sensitivity study we use the GISS GMST time series38 and the ERSSTv5 dataset39.

Planetary heat uptake

We estimate the total planetary heat uptake by making an inventory of the heat stored in the climate system. We account for the heat stored in the cryosphere, in land, in the atmosphere and in the ocean. We use the most recent estimate of the heat stored in the cryosphere, the atmosphere and land from ref. 19. The heat stored in the cryosphere, land and atmosphere represents only ~7% of the total planetary heat uptake, while the ocean heat uptake (OHU) represents the remaining 93%. Thus, the OHU places a strong constraint on the planetary heat uptake. For this reason, in this study we pay special attention to the estimate of the OHU and the associated uncertainty (see below). Observations of OHU are primarily derived from the in situ record of ocean subsurface temperatures using the TEOS-10 equations of sea water2,40. For the regression method we need time series of N that are as long as possible to make full use of the available information. The longest global and continuous ocean temperature datasets are provided by four research groups41,42,43,44. They are objective mappings of the ocean temperature over 1955–2017, obtained either with a statistical interpolation of the World Ocean Atlas (WOA) in situ data41,42,43 or with a combination of statistical interpolation of WOA in situ data and climate model information44. The four products are based on the same database of in situ data, but they differ because they use different in-filling strategies for data gaps, different corrections for instrumental biases and different climatologies. To account for these differences we use an ensemble of five datasets. This ensemble comprises the NOAA dataset43, the MRI-JMA dataset42, the IAP dataset44 (Institute of Atmospheric Physics) and two versions of the EN4 dataset41 (UK Met Office’s ENACT/ENSEMBLES version 4), each version using a different correction of the instrumental biases45,46.
We take as the best estimate of OHU the average of the ensemble (hereafter named the “objective mapping solution”) because averaging reduces the effect of errors specific to individual datasets and emphasizes the variability common to all datasets. The uncertainty in the objective mapping solution is characterized by the spread of the ensemble. An issue with in situ data is that the coverage is not global and gets sparser as we go back in time. The deep ocean below 2000 m remains poorly observed. The main source of observations is repeated hydrographic campaigns by scientific research vessels. In addition, before the ARGO era, data collection relied primarily on ships of opportunity, leaving areas outside of major trade routes with few direct observations. The objective mapping approach is efficient at reconstructing ocean temperatures back to ~1950, but it does not extend below 2000 m because of the sparse sampling at these depths. To cope with this issue, we add to the objective mapping solution an extra deep OHU below 2000 m depth of 0.0 ± 0.04 W m−2 between 1960 and 1990 and then 0.07 ± 0.04 W m−2 from 1991 to 2017 (ref. 19).
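The construction of the objective mapping solution can be sketched as an ensemble mean plus the deep-ocean term quoted above. In the example below, the member values are hypothetical, the spread is taken as the sample standard deviation, and the two uncertainties are combined in quadrature; these choices are illustrative assumptions rather than the documented procedure.

```python
import math
from statistics import mean, stdev

DEEP_OHU_SIGMA = 0.04  # W m-2, uncertainty of the deep-ocean term quoted in the text


def deep_ocean_ohu(year):
    """Deep-ocean (below 2000 m) heat uptake in W m-2, using the values quoted in the text."""
    # Assumption: the 1960-1990 value is also applied to any earlier year.
    return 0.0 if year <= 1990 else 0.07


def full_depth_ohu(year, upper_ocean_members):
    """Return (best estimate, uncertainty) of the full-depth OHU for one year."""
    best = mean(upper_ocean_members) + deep_ocean_ohu(year)
    spread = stdev(upper_ocean_members)  # assumed measure of ensemble spread
    sigma = math.sqrt(spread ** 2 + DEEP_OHU_SIGMA ** 2)  # assumed independent errors
    return best, sigma


# Hypothetical upper-ocean heat uptake from five products for the year 2000 (W m-2).
members_2000 = [0.5, 0.6, 0.7, 0.6, 0.6]
best, sigma = full_depth_ohu(2000, members_2000)
print(f"OHU(2000) = {best:.2f} +/- {sigma:.2f} W m-2")
```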

The deep ocean estimate from ref. 19 is based on a statistical method to detect large-scale trends in the deep ocean temperature from repeat hydrographic sampling47,48. It has a coarse spatial resolution, in particular before the mid-1980s, meaning that its uncertainty is potentially underestimated before this period. To account for this potentially biased uncertainty, we use another estimate of the top-to-bottom ocean heat content (OHC) change for 1946–201949 based on an autoregressive artificial neural network (hereafter called the “ARANN solution”). It uses in situ temperature data from the WOA and performs an iterative autoregression that adjusts spatio-temporal correlation scales over time (estimated from the in situ temperature data itself). The autoregressive neural network propagates information from well-sampled times and regions to more sparsely sampled areas. This approach, which has been successfully tested with synthetic data produced from ocean models, is more robust to sparse data than the objective mapping approach. We therefore use the ARANN solution as another estimate of the total OHC since 1957, along with the objective mapping solution. We assume that the distance between the ARANN solution and the objective mapping solution provides an estimate of the uncertainty due to the poor sampling of the ocean below 2000 m depth before the mid 1980s.

Another issue with in-situ data is that they can be systematically biased. A good example is the systematic instrumental bias in expendable bathythermographs discovered in the 2000s45. This means that the ARANN solution and the objective mapping solution, which are both based on the same data (WOA), can both be systematically biased in the same way. To account for this source of systematic uncertainty we use a totally independent estimate of the global OHC changes since 1957 based on the sea level budget approach20. We use the sea level budget reconstruction from ref. 21 and consider global mean thermosteric sea level changes (GMTSL) estimated as the difference between the global mean sea level changes (GMSL) and the barystatic sea level changes (BSL). We derive the global ocean heat uptake over 1957–2017 by multiplying the GMTSL by the expansion efficiency of heat20,50. This solution is called hereafter the “thermosteric solution”. It is based on sources of data that are totally independent from in-situ temperature data. GMSL changes are derived from 559 tide gauge records from the Permanent Service for Mean Sea Level until 1993 and from satellite altimetry after 1993. We use the GMSL time series from 1957 only, because the number of available tide gauges drops dramatically before that date (particularly in the Indian and subPacific oceans which are key) and the closure of the sea level budget is of lesser quality21. The BSL is derived from GRACE and GRACE-FO data from 2002 on. Before 2002, each mass contribution to the BSL is considered separately. For glaciers, a first estimate comes from a global glacier model driven by 20th century surface forcing; a second estimate, starting in 1961, comes from in situ glaciological and geodetic observations of about 200 glaciers, extrapolated to obtain a global estimate.
For the Greenland Ice Sheet, three estimates are used: a mass-balance reconstruction that covers 1900–2003, input-output estimates that cover 1972–2003, and a multi-method assessment that covers 1993–2003. For Antarctica, no mass-balance reconstruction exists before the satellite era, so a small Antarctic Ice Sheet contribution of 0.05 ± 0.04 mm yr−1 is assumed before 1993, based on an existing compilation; for 1993–2003, a multi-method assessment is used to derive the mass changes, with a spatial pattern based on the pattern observed by GRACE. For terrestrial water storage, the natural contribution comes from a 20th century reconstruction and the anthropogenic contribution comes from an inventory of artificial reservoir impoundment and groundwater depletion. Because the thermosteric solution is based on data that are independent from in situ temperature data, we assume that the distance between the thermosteric solution on one side and the objective mapping solution and the ARANN solution on the other side provides an estimate of the potential systematic uncertainty.
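The thermosteric solution described above can be sketched as follows. This is a minimal illustration and not the processing chain of ref. 21: the function name, the conversion coefficient `heat_per_metre` (joules of ocean heat content per metre of thermosteric rise, standing in here for the expansion efficiency of heat) and the normalisation by the Earth surface area are assumptions introduced for the example.

```python
import numpy as np

EARTH_AREA_M2 = 5.1e14                  # approximate Earth surface area (assumed normalisation)
SECONDS_PER_YEAR = 365.25 * 86400.0

def thermosteric_heat_uptake(gmsl_m, bsl_m, heat_per_metre):
    """Return (gmtsl, ohu) from yearly sea level series given in metres.

    gmtsl : GMSL minus BSL, in metres
    ohu   : year-to-year heat uptake in W m-2 (length n - 1)
    heat_per_metre is a hypothetical conversion coefficient in J per metre.
    """
    gmsl_m = np.asarray(gmsl_m, dtype=float)
    bsl_m = np.asarray(bsl_m, dtype=float)
    gmtsl = gmsl_m - bsl_m                      # thermosteric residual of the budget
    ohc_j = gmtsl * heat_per_metre              # heat content anomaly in joules
    ohu = np.diff(ohc_j) / (SECONDS_PER_YEAR * EARTH_AREA_M2)
    return gmtsl, ohu
```

The yearly differencing is only one possible way to turn a heat content series into an uptake rate; a trend fitted over a longer window would be less sensitive to year-to-year noise.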

To estimate the ocean heat uptake time series since 1957 we make an ensemble of the three solutions: the objective mapping solution, which only starts in 1960 (because the in situ temperature data are too sparse before), the thermosteric solution, which starts in 1957 (because the tide gauge data are too sparse before), and the ARANN solution. We take the ensemble mean as our best estimate of the ocean heat uptake because averaging reduces the effect of errors specific to individual datasets and emphasizes the variability common to all datasets. We take the spread of the ensemble as the uncertainty. In epochs when the three solutions come to close agreement, this uncertainty is low, below the level of uncertainty of the objective mapping solution. This means that at these epochs, the systematic uncertainty due to the spatial coverage of the in situ data or to instrumental biases is below the uncertainty due to the processing of the solutions. At these particular epochs, we take the uncertainty of the objective mapping solution as the uncertainty.
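The ensemble step above can be sketched as follows. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, years not covered by a solution are marked with NaN, and the "spread" is taken to be the standard deviation across available solutions, which the text does not specify.

```python
import numpy as np

def ensemble_ohu(objective, thermosteric, arann, objective_sigma):
    """Ensemble mean and uncertainty of three yearly OHU series.

    Years not covered by a solution are NaN. The ensemble spread is
    floored at the uncertainty of the objective mapping solution.
    """
    stack = np.vstack([objective, thermosteric, arann]).astype(float)
    best = np.nanmean(stack, axis=0)        # ensemble mean over available solutions
    spread = np.nanstd(stack, axis=0)       # spread as a standard deviation (assumption)
    # Where the objective mapping uncertainty is undefined, apply no floor.
    floor = np.nan_to_num(np.asarray(objective_sigma, dtype=float), nan=0.0)
    return best, np.maximum(spread, floor)
```

Flooring with `np.maximum` reproduces the rule that the uncertainty never falls below that of the objective mapping solution when the three solutions happen to agree closely.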

In Fig. 3a of the main text, we derive two point estimates of the planetary heat uptake based on recent data (see pink and yellow points in Fig. 3a). The first estimate is a planetary heat uptake derived from the European Space Agency Sea Level Budget Closure Climate Change Initiative (ESA SLBC CCI) dataset v2.251 (https://dap.ceda.ac.uk/neodc/esacci/sea_level_budget_closure/data/timeseries_slb_elements/v2.2//ESACCI_SLBC_TimeSeriesOfSeaLevelBudgetElements_v2.2.csv). In this estimate we use the sea level budget approach to derive the ocean heat uptake. We follow precisely the same approach as for the thermosteric solution but with the ESA sea level budget closure data. From this calculation we can estimate only one point because the ESA sea level budget closure data cover only the period 1993–2018. With this N dataset, we found \(\tilde{\lambda }=-1.958[-4.047;0.592]W\cdot {m}^{-2}\cdot {K}^{-1}\) (median, 17–83%). The second point estimate of the planetary heat uptake is the direct TOA Earth energy imbalance reconstruction from ref. 52 (https://researchdata.reading.ac.uk/271/18/DEEPC_TOA_NET_v04.0_198501-201906.nc) (hereafter DEEP-C). It is calculated from the ERBS WFOV v3.0 dataset over 1985–1999, and from CERES v4.1 data over 2000–2019. With this N dataset, we found \(\tilde{\lambda }=-1.468[-1.945;-1.027]W\cdot {m}^{-2}\cdot {K}^{-1}\) (median, 17–83%).

Estimate of the climate feedback parameter by regression

First, all time series of the radiative forcing, the GMST and the planetary heat uptake are filtered with a low-pass filter with a cutoff period of 15 years in order to remove the interannual effect of the internal variability. For the planetary heat uptake derived from the thermosteric solution, the filtering with a cutoff period of 15 years is particularly important because tide gauge records are known to be affected by coastal effects, which introduce spurious interannual to decadal variability when the records are aggregated to estimate the GMSL53. With a cutoff period of 15 years we expect to remove most of this spurious variability in the global mean sea level estimate derived from tide gauge records53.
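The filtering step can be illustrated with a short sketch. The text does not specify the type of low-pass filter, so the brick-wall Fourier filter below is an assumption chosen for brevity, and the function name `lowpass_fft` is hypothetical.

```python
import numpy as np

def lowpass_fft(series, cutoff_years, dt_years=1.0):
    """Remove all variability with a period shorter than cutoff_years."""
    x = np.asarray(series, dtype=float)
    mean = x.mean()
    spectrum = np.fft.rfft(x - mean)
    freqs = np.fft.rfftfreq(x.size, d=dt_years)     # frequencies in cycles per year
    spectrum[freqs > 1.0 / cutoff_years] = 0.0      # brick-wall cutoff
    return np.fft.irfft(spectrum, n=x.size) + mean

# Usage: a 30-year oscillation survives, a 3-year oscillation is removed.
t = np.arange(60)
slow = np.sin(2 * np.pi * t / 30.0)
fast = 0.5 * np.sin(2 * np.pi * t / 3.0)
filtered = lowpass_fft(slow + fast, cutoff_years=15.0)
```

A Fourier filter treats the record as periodic, so a series with a strong trend would typically need detrending or padding before filtering to limit edge effects.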

The time series of the radiative forcing, the GMST and the planetary heat uptake show time-correlation in their errors, thus errors in the \(\tilde{\lambda }\) time series are likely to be correlated in time as well. This time correlation must be taken into account to evaluate the significance of the estimated time variations in \(\tilde{\lambda }\). To estimate the uncertainty in \(\tilde{\lambda }\), including the uncertainty due to time-correlation in errors, we propagate the uncertainty in radiative forcing, GMST and planetary heat uptake through the regression. First we generate a priori an ensemble of 500 radiative forcing time series and 500 N time series which include the time correlation of their errors. The 500 radiative forcing time series are drawn randomly from the 100,000 time series provided by the IPCC AR6. For the 500 N time series, we assume that the time correlation in the errors in N is dominated by the error induced by the internal variability. The time correlation of the internal variability in N decreases exponentially from 1 to 0 with a typical e-folding time scale of 2 years (as suggested by climate models54). So, to draw the 500 N time series, we use a variance-covariance matrix with diagonal terms equal to the variance of the ensemble of the three ocean heat uptake estimates (i.e. the in situ solution, the ARANN solution and the sea level budget solution, see methods) and off-diagonal terms following a covariance that decreases exponentially with a 2-yr e-folding time scale.
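The construction of the time-correlated N ensemble can be sketched as follows. The function names are hypothetical, and drawing the members from a multivariate Gaussian is an assumption of this sketch; the text only specifies the variance-covariance matrix.

```python
import numpy as np

def exponential_covariance(sigma, efold_years=2.0):
    """Covariance with yearly standard deviation `sigma` and a correlation
    that decays exponentially with the lag (in years)."""
    sigma = np.asarray(sigma, dtype=float)
    index = np.arange(sigma.size)
    lag = np.abs(np.subtract.outer(index, index))
    return np.outer(sigma, sigma) * np.exp(-lag / efold_years)

def draw_members(mean, sigma, n_members=500, efold_years=2.0, seed=0):
    """Draw n_members series around `mean` with time-correlated errors
    (Gaussian errors are an assumption of this sketch)."""
    cov = exponential_covariance(sigma, efold_years)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.asarray(mean, dtype=float), cov, size=n_members)
```

With a 2-year e-folding time scale, errors two years apart keep a correlation of exp(−1) ≈ 0.37, so neighbouring years in each drawn series are far from independent.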

Then, we consider all windows of 25 years or more between 1957 and 2017. In total, between 1957 and 2017, there are 35 successive 25-year windows, 34 successive 26-year windows, etc., and one single 61-year window. For each window, each N time series and each F time series, we regress ΔN − ΔF against ΔT to estimate \(\tilde{\lambda }\)2. In total, for each window, we compute 500 values of \(\tilde{\lambda }\) from which we deduce a distribution of \(\tilde{\lambda }\) (note that positive values of \(\tilde{\lambda }\) are not truncated). The median of this distribution of \(\tilde{\lambda }\) is then plotted as a point in the triangle in Fig. 2.
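The window-by-window regression can be sketched as follows for a single pair of N and F time series. The function name and the dictionary layout are hypothetical, and the sketch uses every window that fits in the record without any treatment of filter edge effects.

```python
import numpy as np

def window_slopes(years, delta_n, delta_f, delta_t, min_length=25):
    """OLS slope of (dN - dF) against dT for every window of at least
    min_length years. Returns {(start_year, end_year): slope in W m-2 K-1}."""
    years = np.asarray(years)
    response = np.asarray(delta_n, dtype=float) - np.asarray(delta_f, dtype=float)
    temperature = np.asarray(delta_t, dtype=float)
    n = years.size
    slopes = {}
    for length in range(min_length, n + 1):
        for start in range(0, n - length + 1):
            stop = start + length
            slope, _intercept = np.polyfit(temperature[start:stop], response[start:stop], 1)
            slopes[(int(years[start]), int(years[stop - 1]))] = slope
    return slopes
```

Repeating this call for each of the 500 pairs of N and F members and taking the median of the resulting slopes for a given window gives one point of the triangle diagram.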

Note that for each N time series and each F time series we computed one \(\tilde{\lambda }\) time series. So, in total we computed 500 \(\tilde{\lambda }\) time series. On these 500 \(\tilde{\lambda }\) time series, we evaluate the change in \(\tilde{\lambda }\) between 1976 and 1990 and the change between 1990 and 2005. We find that 89.8% (resp. 90.4%) of the 500 time series lead to a change in \(\tilde{\lambda }\) between 1976 and 1990 (resp. between 1990 and 2005) that is strictly <0 (resp. >0).
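The significance test on the sign of the changes amounts to counting ensemble members. A minimal sketch, assuming the 500 time series are stored in an array indexed by member and by year (the function name and this layout are hypothetical):

```python
import numpy as np

def fraction_with_sign(lambda_members, years, year_start, year_end):
    """Fractions of members whose change between two years is <0 and >0.

    lambda_members has shape (n_members, n_years).
    """
    lam = np.asarray(lambda_members, dtype=float)
    years = np.asarray(years)
    i_start = int(np.flatnonzero(years == year_start)[0])
    i_end = int(np.flatnonzero(years == year_end)[0])
    change = lam[:, i_end] - lam[:, i_start]
    return float(np.mean(change < 0)), float(np.mean(change > 0))
```

Because each member carries its own realization of the time-correlated errors, the fraction of members agreeing on the sign of a change accounts for the correlation between the two end points.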

We apply the same procedure when N is derived from the ESA CCI sea level budget or from DEEP-C, except that the procedure is applied to a single window spanning the full length of the corresponding record.

As an indicator of the Pacific Decadal Oscillation we use the PDO index ERSST v555 (https://www.ncei.noaa.gov/pub/data/cmb/ersst/v5/index/ersst.v5.pdo.dat). The index is also filtered with a low pass filter and a cutoff period at 15 years to be comparable with \(\tilde{\lambda }\).

Correction for volcanic eruptions

The long time series of ΔN, ΔF and ΔT are affected by three major volcanic eruptions between 1957 and 2017: Agung (1963), El Chichon (1982) and Pinatubo (1991). In order to evaluate and correct their effect on the historical record of the variables ΔF, ΔN and ΔT, we use climate model simulations of the historical period. Climate model simulations of the historical period do not have the same realization of the internal variability as the real-world climate, so a single simulation cannot be used directly to evaluate the effect of volcanic eruptions. We therefore use large ensembles of historical simulations to remove the internal variability and isolate the forced response of the climate. In each large ensemble, we take the median across realizations of ΔN and ΔT during the 3 years following the eruptions of Agung, El Chichon and Pinatubo and we use this estimate to correct the historical time series of ΔN and ΔT. To correct ΔF we use the simulation of the historical period forced with natural forcing only and forced at the surface boundary with the SST of the control simulation (the so-called piclim-histnat simulation from the Radiative Forcing Model Intercomparison Project (RFMIP) of the Coupled Model Intercomparison Project phase 6 (CMIP6)56). In this simulation the radiative response is null because the SST is kept at the level of the control simulation, so the TOA radiative imbalance N gives an estimate of the forcing (which is here the natural forcing). We isolate the forcing during the 3 years following the eruptions of Agung, El Chichon and Pinatubo and use this estimate to correct the historical time series of ΔF. In total only 4 CMIP6 models provide both large ensembles of the historical simulation and the piclim-histnat simulation, namely CanESM557, IPSL-CM6A-LR58, CNRM-CM6-159 and MIROC660. We estimate a correction in ΔN, ΔT and ΔF from each model and take the multi-model mean to correct the historical time series of ΔN, ΔT and ΔF used in the manuscript.
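The aggregation of the model-based correction can be sketched as follows. Several choices here are assumptions not stated in the text: the model anomalies are taken as already expressed relative to a volcano-free baseline, the correction is applied by subtraction, and "the 3 years following" is counted as the three calendar years starting at the eruption year. All names are hypothetical.

```python
import numpy as np

ERUPTION_YEARS = (1963, 1982, 1991)   # Agung, El Chichon, Pinatubo

def multi_model_correction(model_anomalies):
    """Median across realizations for each model, then mean across models.

    model_anomalies: list of arrays of shape (n_realizations, n_years).
    """
    per_model = [np.median(np.asarray(a, dtype=float), axis=0) for a in model_anomalies]
    return np.mean(per_model, axis=0)

def apply_volcanic_correction(years, series, correction, n_years=3):
    """Subtract `correction` from `series` during n_years starting at each
    eruption year (window convention and subtraction are assumptions)."""
    years = np.asarray(years)
    correction = np.asarray(correction, dtype=float)
    corrected = np.asarray(series, dtype=float).copy()
    for eruption in ERUPTION_YEARS:
        mask = (years >= eruption) & (years < eruption + n_years)
        corrected[mask] -= correction[mask]
    return corrected
```

Taking the median across realizations before averaging across models keeps a single model with a very large ensemble from dominating the correction.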

Sensitivity of the \(\tilde{\lambda }\) estimate to surface temperature

A limitation of our approach is that the ordinary least squares regression cannot account for the uncertainty in GMST, while all other sources of uncertainty have been taken into account. To evaluate the impact of this limitation we test the sensitivity of our estimates of \(\tilde{\lambda }\) to GMST by using another GMST dataset, namely the GISS GMST time series38. The estimate of \(\tilde{\lambda }\) with the GISS GMST is shown in supplementary information Fig. S1 and shows no significant differences with Fig. 2. The pattern in time variations of \(\tilde{\lambda }\) is the same. We find a drop in \(\tilde{\lambda }\) in the 1970s and in the 2000s when \(\tilde{\lambda }\) is computed over windows shorter than 35 years and we find a fairly constant \(\tilde{\lambda }\) when \(\tilde{\lambda }\) is computed over windows longer than 35 years.

Note that there is a systematic source of uncertainty in \(\tilde{\lambda }\) which comes from the surface temperature record and which is not evaluated by changing the GMST product. It is the variability in GMST which does not produce a radiative response of the form \(\tilde{\lambda }T\). This type of uncertainty is called regression dilution (see for example ref. 17, their appendix D). The regression dilution generates a bias and a dispersion around the true \(\tilde{\lambda }\). We cannot estimate the bias in \(\tilde{\lambda }\) from observations because we do not know the variability of GMST which does not produce a radiative response of the form \(\tilde{\lambda }T\). So we did not estimate the bias generated by regression dilution and we did not correct for it. However, we estimated the uncertainty range in \(\tilde{\lambda }\) generated by regression dilution with the spread in \(\tilde{\lambda }\) across different AGCM historical realizations, which show different realizations of the noise in GMST that does not produce a radiative response of the form \(\tilde{\lambda }T\) (see ref. 2 for more details on the method). Our computation of the uncertainty includes the uncertainty due to the dispersion of \(\tilde{\lambda }\) generated by the regression dilution. Note that in observations the instrumental noise is an additional noise in GMST that does not produce a radiative response of the form \(\tilde{\lambda }T\). The regression dilution due to this noise has not been evaluated. It probably explains part of the bias in \(\tilde{\lambda }\) between the estimates from AMIP-piForcing simulations and from observations (Fig. 3).
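The effect of regression dilution can be illustrated with synthetic data. The sketch below is a textbook errors-in-variables example, not the AGCM-based method of ref. 2; the value of `true_lambda` and the unit noise variances are arbitrary illustrative choices.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least squares slope of y against x."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

rng = np.random.default_rng(0)
true_lambda = -1.5                          # illustrative value, W m-2 K-1
t_signal = rng.normal(0.0, 1.0, 20000)      # GMST variability that drives a response
t_noise = rng.normal(0.0, 1.0, 20000)       # GMST variability with no radiative response
response = true_lambda * t_signal           # radiative response of the form lambda * T

slope_clean = ols_slope(t_signal, response)
slope_diluted = ols_slope(t_signal + t_noise, response)
# In expectation the slope is attenuated by
# var(signal) / (var(signal) + var(noise)), which is 0.5 in this setup.
```

In this idealized setup the regression on the noisy temperature returns a slope about half as negative as the true value, which shows why GMST noise that produces no radiative response biases the estimate towards zero.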

Estimate of \(\tilde{\lambda }\) in AMIP-piForcing simulations

To compare our estimate of \(\tilde{\lambda }\) with climate model simulations we use the AMIP-piForcing simulations61 which have the same realization of the internal variability as in the real climate. In AMIP-piForcing simulations six AGCMs (see Table 1) are forced with monthly time-varying observed fields of SST and sea ice from 1871 to 2010 using the Atmospheric Model Intercomparison Project (AMIP) II boundary condition dataset62,63,64. All simulations have natural and anthropogenic forcings held constant at preindustrial conditions. With constant forcing the variation in radiative fluxes comes only from the changing SST and sea ice boundary conditions, allowing the radiative response of the Earth to be accurately diagnosed directly from the TOA imbalance N65. So, to estimate \(\tilde{\lambda }\) in AMIP-piForcing simulations, we apply exactly the same approach as in observations except that we regress the radiative response of the Earth (i.e. N in AMIP-piForcing simulations) against T. On long windows (longer than 35 years) AMIP-piForcing simulations capture the observed quasi-constant \(\tilde{\lambda }\) (Fig. 3c). However, on short windows (shorter than 35 years) the AMIP-piForcing simulations fail to capture the significant time variations in \(\tilde{\lambda }\) (see also Fig. S9 in the supplementary information). The issue probably comes from the atmosphere model because the SST is prescribed in AMIP simulations. This issue is potentially linked with the insufficient sensitivity of tropical low clouds in AGCMs to tropical SST changes33.

Table 1 AMIP-piForcing simulations used in this study.

Sensitivity of the results to the time smoothing of data

We tested the sensitivity of our results to the time smoothing of the input data. We repeated all the computations, the figures and the correlation maps with input data (i.e. GMST, radiative forcing and planetary heat uptake) either unfiltered or filtered with a low-pass filter with a cutoff period of 5 or 10 years. The change of filtering affects the estimate of \(\tilde{\lambda }\) only marginally. There is slightly more noise in the estimate of \(\tilde{\lambda }\) as the cutoff period of the filtering decreases (see Fig. S2). This is because the regression itself, which is done on windows of length > 25 yrs, already filters the data. However, the filtering of the indices (PDO, IPO, SAM, IOD and AMO) and of the local SST has an impact on the correlation calculation and the correlation maps (see Figs. S3, S4, S5 and S6). When we apply a filter with a decreasing cutoff period to the indices and to the input data of the \(\tilde{\lambda }\) computation, we find that the correlation between PDO/IPO and the \(\tilde{\lambda }\) time series decreases, and so does its significance (Table 2). The correlation remains dominant (corr. 0.33) and significant at the 80% CL (p-value < 0.1) with a 5-yr cutoff period, but for shorter cutoff periods the correlation is no longer significant. The reason comes from the filtering of the PDO indices. With no filtering, the interannual variability dominates in the PDO index and this interannual variability does not correlate with the \(\tilde{\lambda }\) time series, which only shows decadal variability.

Table 2 Correlation coefficient and p-value of \(\tilde{\lambda }\) against the PDO index for different filtering applied to the input data.

A similar situation occurs for the correlation maps between the \(\tilde{\lambda }\) time series and the local SST. Whatever the filtering, the correlation maps show the same pattern for observations and AGCM simulations. But as the cutoff period of the filtering decreases, the significance of the correlation decreases. The level of significance is down to 80% CL for large portions of the Pacific when we apply a 5-yr cutoff period. As for the PDO index, the reason is the filtering of the local SST rather than the filtering of the input data used to compute \(\tilde{\lambda }\). With no filtering the interannual variability dominates in the local SST and this interannual variability does not correlate with the \(\tilde{\lambda }\) time series, which only shows decadal variability.

Overall we find that the estimate of \(\tilde{\lambda }\) remains the same whatever the filtering. However, the correlation with the PDO index and with local SST holds only for smoothing at 5 and 10 years and less so with no filtering. The reason is that interannual variability dominates in the PDO index and in the local SST, generating a level of noise too high to obtain a significant correlation with \(\tilde{\lambda }\), which only shows decadal variability.