Introduction

Global mean surface temperature (GMST) series are crucial for monitoring global warming. The warming can be quantified by a change from a base period (e.g. pre-industrial), or by the rate of change (the warming rate) over a time interval1. The GMST naturally fluctuates in time, displaying short periods of accelerated or decelerated warming (Fig. 1). Changes in the warming rate have received considerable attention in the scientific literature and news media, with episodes of accelerated/decelerated warming (i.e., surges and slowdowns) being recently debated2,3,4,5,6,7. These fluctuations may happen in the presence of long-term warming8,9 and can arise from short-term variability (or noise) in the surface temperatures. Here, trend refers to the long-term change in mean temperatures and noise refers to fluctuations about that trend.

Fig. 1: GMST anomalies from four datasets with superimposed piecewise linear model fitted trends.

GMST anomalies are from NASA (blue), HadCRUT (yellow), NOAA (red), and Berkeley (grey) with fitted trends (thick lines) for A continuous models with changing autocorrelation, B discontinuous models with changing autocorrelation. Note: the model fits only show the trend in different regimes.

Noise in temperature series is often characterized by a short-memory process such as an autoregression. In this and other short-memory models, the ocean and other slow components of the climate system respond slowly to random atmospheric forcing, producing variability at time scales longer than that of white noise10. Short-memory fluctuations can be large enough to temporarily mask a long-term warming trend, creating the appearance of a slowdown. They can also exacerbate a warming trend, mimicking a surge11. The key question is whether these fluctuations are occurring without any change in the underlying warming trend, or whether there has been an increase (warming surge) or decrease (warming slowdown) in the trend. To answer such questions, one needs to model the short-term variability in the GMST.

Several studies suggested that a slowdown in warming (the so-called hiatus) occurred in the late 1990s and investigated its causes3. The slowdown was attributed to several factors, including large-scale variability in the Pacific Ocean12,13,14,15,16 and external forcings15,17,18. However, studies focusing on the detection of this warming pause showed that the rate of change had not declined, and that this period (from approximately 1998–2012) was not unusual given the level of short-term variability present in the data11,19,20,21,22. More specifically, studies analyzing GMST using changepoint detection methods, which are designed to objectively detect the timing of trend changes, showed no warming rate changes circa 199811,19,21. Further, a study that assumed the changepoint time was known and occurred in 1998 showed that the trends before and after 1998 were statistically indistinguishable22. Overall, evidence for a pause or slowdown circa 1998 lacked a sound statistical basis.

As per the Intergovernmental Panel on Climate Change (IPCC), detection of change refers to the “process of demonstrating that climate or a system affected by the climate has changed in some defined statistical sense without providing a reason for that change”23,24. Typically, the process of attribution requires that a change is statistically detected23.

The major agencies monitoring GMSTs all rank 2023 as the warmest year since the start of the instrumental record in 185025,26,27,28. Clearly, global warming has not paused, and the current discussion about the rate of warming in the news media and literature has shifted to whether there has been a warming acceleration2,4,5,29,30. For example, one study29 suggests that the warming rate has increased since 1990 due to a global increase in the Earth’s energy imbalance (the difference between incoming solar radiation and infrared radiation emitted to space)31,32,33. Another recent study predicts an acceleration in the warming rate after 20102. With lessons learned from the hiatus debate still fresh, we assess whether a warming surge is statistically detectable.

Changepoint techniques are used here to assess whether there has been a warming surge since the 1970s, and if so, to estimate when the surge(s) started (see Methods). Piecewise linear regression models that allow for trend changes are fitted to GMST datasets and assessed via changepoint techniques. When the timing of any change is not known a priori, as is the case here, changepoint methods account for the many possible places where a new regime can begin, preventing statistical significance from being overstated by “cherry-picking” the changepoint times. This is a critical statistical point. Alongside this, two types of changepoint models are used: discontinuous models, where process means in the different regimes of the linear regression do not necessarily connect, and continuous models, where the process means in successive regimes connect. While a continuous model is more physically realistic for a globally averaged GMST21, discontinuous fits are also provided as a sensitivity check on model choice; a discontinuous model may approximate a continuous change in annual measurements. Our model fits are assessed by verifying whether the residual assumptions are met (see Supplementary Information). Four GMST records are analyzed at the annual time scale (see Methods). The Berkeley, HadCRUT, and NOAA series range from 1850 to 2023, while the NASA series covers 1880 to 2023. Since little evidence of a surge is found, a simulation study investigates how many additional years of GMST observations will be needed before a change in the warming rate becomes detectable.

Results

Can we detect a warming surge yet?

Continuous and discontinuous models were fitted to all annual GMST series (see Methods). Model fits and timings of any detected changepoints are listed in Table 1 and illustrated in Fig. 1. For the continuous model, a single changepoint is detected near 1970 in all datasets (Fig. 1a). Similarly, we find one changepoint in all datasets for the discontinuous models (Fig. 1b). While the timings detected are slightly earlier for the discontinuous models, neither case indicates any change in trend after the 1970s. We allow the first-order autocorrelation parameter to vary between segments in the fits presented in Fig. 1a, b because a previous study that analyzed global surface temperature time series with changepoint detection suggested a reduction in autocorrelation after the 1960s11. A changing autocorrelation allows us to better capture the larger serial autocorrelation in surface temperatures in the earlier part of the record. To assess the sensitivity of our results to this choice, we also include continuous and discontinuous models fitted with a fixed autocorrelation parameter (the same in all regimes) (Supplementary Note 1). While there is variability in the number and timings of changes in the earlier part of the record, no warming surge is detected beyond the 1970s (Supplementary Figs. 1 and 3). With a changing autocorrelation, the residual assumptions appear valid for both the continuous and discontinuous models, but there is more leftover autocorrelation in the residuals when imposing a fixed AR(1) autocorrelation (Supplementary Note 2; Supplementary Fig. 2; Supplementary Table 1).

Table 1 Changepoints detected in four global mean surface temperature datasets within continuous and discontinuous changepoint models

To illustrate how a change in assumptions can yield false detections, we also fitted a discontinuous model that does not account for autocorrelation (assuming independent errors) to the HadCRUT dataset (see Fig. 2). We include this model here to emphasize that ignoring autocorrelation can lead to spurious detection of changepoints. In fact, multiple spurious changes in the rate of warming are detected after 1970 with this model. A warming acceleration in the 1970s is detected, followed by a different regime with a similar trend starting in 2000, and finally an acceleration in warming in 2012. Similar results are found in the other datasets (Supplementary Fig. 4). However, these fits are not valid as the residuals are all strongly autocorrelated (Supplementary Fig. 5; Supplementary Table 1). This illustrates how changepoint analyses can produce substantially different results if autocorrelation is ignored11,34,35. Furthermore, false detection issues are exacerbated with discontinuous fits, which tend to enhance the impression of a change in trend21.

Fig. 2: An example of GMST time series with a spurious fit.

A HadCRUT GMST anomalies (black) with fitted superimposed discontinuous piecewise linear trends (red) calculated assuming independent errors and B scatterplot of the errors illustrating a positive correlation. A hypothesis test provides strong evidence that the independent errors assumption is not valid, with a p-value < 0.000008 (see Supplementary Table 1).

We focus on whether there has been a change in the warming trend, but here we also quantify how unusual the observed 2023 temperature is. In Fig. 3, we fit a continuous model with changing autocorrelation (the same model as in Fig. 1a) withholding 2023. We then use this model to make a one-step-ahead prediction for 2023, representing the expected anomaly given a continued warming trend and the observed autocorrelation in the four respective datasets. The observed 2023 anomalies are much larger than the predicted values (Fig. 3). More specifically, the observed anomalies for 2023 all lie above the 99th percentile of the prediction distributions (based on the statistical predictions and their associated standard errors), indicating a large departure from the ongoing warming trend.

Fig. 3: Observed vs predicted GMST anomalies for 2023.

A GMST anomalies from the NASA (blue), HadCRUT (yellow), NOAA (red), and Berkeley (grey) series with superimposed piecewise linear model (as in Fig. 1a). The fits displayed here include the autocorrelation. The model is fitted until 2022, and used to predict the 2023 data point (square) for each dataset based on the fitted trend and autocorrelation. The actual observations for 2023 are also presented (circle). B Zoom in over 2023 with observations (circle) and predictions (square). The 2023 predictions include 95% prediction intervals (vertical bars). The observed 2023 anomalies are all outside the 95% intervals, indicating a large departure from the expected mean.

How many years are needed to detect a surge?

The fitted models suggest that no changepoints (surges or pauses) have occurred after the 1970s in the GMSTs analyzed. However, it would be somewhat naive to categorically conclude that no surge has occurred, since it is possible that the change in trend is too small, or that there is not yet enough data, for statistical detection. In this section, we consider how far into the future GMST must be observed, and how large a surge must be, for a statistically significant change to be identified given the current warming rate.

With the HadCRUT GMST from 1970–2023, we computed how large a surge would need to be to become statistically detectable at the α = 0.05 significance level (see Methods). Elaborating, during the 1970–2023 period, the maximum difference in trends occurs in 2012, with the estimated segments being 1970–2012 and 2013–2023, respectively. Enforcing continuity between the two regimes, the estimated trends are 0.019 °C/year (first segment) and 0.029 °C/year (second segment), a 53% increase. Accounting for the short-term variability in the HadCRUT GMST over 1970–2023 and the added uncertainty in the changepoint location, the second segment (2013–2023) would need a slope of at least 0.039 °C/year (more than a 100% increase) to be statistically different from 0.019 °C/year at the α = 0.05 significance level right now. The estimated slope of 0.029 °C/year falls far short of this needed increase. While it is still possible there was a change in the warming rate starting in 2013, the HadCRUT record is simply not long enough for the surge to be statistically detectable at this time.

Figure 4 shows the magnitude of trend change required for different potential changepoint locations from 1990 to 2015 and for time series extending from 2024 until 2040. The changepoint times considered encompass surge timings suggested in the scientific literature and media2,4,29. For example, to detect a warming surge that starts in 1990 over the period of 1970–2024, the magnitude of the surge needs to be at least 67% relative to the 1970–1990 trend. This is equivalent to a change of trend from 0.018 °C/year over 1970–1990 to 0.030 °C/year over 1991–2024. If observations are extended into the future until 2030, the minimum surge detectable is 61%, becoming 55% by 2040.

Fig. 4: Minimum magnitude of a detectable (5% critical level) warming surge (%) given a range of potential timings for the start of the surge and different timings for the end of the time series.

Surge magnitude estimates are based on the observed trend, variability, and autocorrelation of the HadCRUT global mean surface temperature. Assuming a starting point in 1970, we consider a potential changepoint in trend for each year between 1990 and 2015 and vantage years from 2024 to 2040. The colorbar indicates the minimum magnitude of a surge required for it to be detectable given the timing of the surge and the vantage year.

To detect a surge starting in 2008 (as suggested in ref. 4) with a 2024 vantage year, the magnitude of the surge increase needs to be at least 75%. Extending to the vantage year 2040, a surge would need to increase by at least 39% to be detectable.

To detect a warming surge starting in 2010 and ending in 2024, the trend needs to have changed by 84% (equivalent to a trend of 0.034 °C/year from 2010–2024). If the time series extends to 2030, the surge would need to change by at least 58% (a magnitude of 0.028 °C/year from 2010–2030) to be detectable. If the time series is further extended to 2040, a surge of at least a 39% change (corresponding to a magnitude of 0.026 °C/year from 2010–2040) could be detectable.

We present this simulation based on the time series properties of the HadCRUT data here; however, patterns are the same across all GMST datasets (Supplementary Notes 3–4; Supplementary Table 2; Supplementary Figs. 6–8). These estimates include the added uncertainty of an unknown changepoint location (see Methods). If the timing of a surge were known a priori from separate observational platforms (e.g., from satellites) or from climate model simulations, the minimum detectable surge magnitude would reduce (given the same short-term variability). In this case, the changepoint location is known and the problem reduces to testing whether the trends are the same before and after the hypothesized change time. Across all datasets, an increase of at least 55% is needed for a warming surge to be detectable in 2024.

Over the different periods considered in the literature, the hardest surge timing to detect is 2015 when observations end in 2024. In this case, there are only nine years of observations after the change. The trend over those nine years would need to be 133% larger (0.044 °C/year) to become detectable. Based on Fig. 4, it is harder to detect a surge when it occurs close to the series’ end. This behavior is also observed in a simulation study on the detection power for an increase in warming given different vantage years and surge timings (Supplementary Figs. 9–10). Detection power is lost with shorter time series (early vantage years) and for a late surge.

Discussion

GMST series fluctuate in time due to short-term variability, often creating the appearance of surges and/or slowdowns in warming. While these fluctuations may mimic an increase/decrease in the warming trend, they can simply arise from random noise in the series. This is important considering the warming hiatus discussion over the last decade and the more recent alleged warming acceleration. Formal detection of surges and pauses should account for noise (or short-term variability) and the additional uncertainty of identifying the changepoint times (unless the timing of a changepoint is suggested by independent model/theory/observations).

Here, several changepoint models were used to assess whether an acceleration in warming has occurred since 1970. Different changepoint model types were considered to assess sensitivity to model choice. After accounting for short-term variability in the GMST (characterized by an autoregressive process), a warming surge could not be reliably detected anytime after 1970. This holds regardless of whether the changepoint models impose continuity of mean responses between regimes or whether the autocorrelation is fixed or allowed to vary between regimes. We further demonstrate that an acceleration is detected with a discontinuous model that assumes independent errors, which is not a statistically valid model choice. Model fits should be assessed for overall goodness of fit and should produce residuals with zero mean and no autocorrelation (white noise). In the Supplementary Information, this is done by analyzing residuals from the model fits and testing them for autocorrelation.

While our focus is on whether there has been a continued acceleration in the rate of global warming, our analysis recognizes how unusual surface temperature anomalies were in 202336,37,38. Our model fit (the continuous mean response model with changing autocorrelation, Fig. 1a) shows that the 2023 anomaly is larger than the 99th percentile of the expected mean, indicating a large departure from the current warming trend. One could consider including exogenous variables (such as ENSO) in the model, which would reduce variability in the residuals and enable statistical detection of any changes sooner39. The fact that trend changes in GMST records were not detected after the 1970s does not rule out that some small changes may have occurred; indeed, the records may be too short (or the changes not large enough) to be detectable amidst the short-term variability. As such, a simulation study was conducted to assess when a warming surge will become detectable in the future. A change in the warming rate on the order of 35% around 2010 becomes detectable circa 2035. This is the case for either an acceleration or a slowdown in warming; our simulations allow for an increase or a decrease in the trend (two-sided Student’s t-test). Detection times would shorten with one-sided testing (say, warming only), but this is not deemed justifiable given the recent discussion about a pause. Indeed, testing for a warming increase because the same observations suggest an increase will tend to overstate significance. Finally, our conclusion that an acceleration is not yet detectable at the global level may not apply at regional levels, and a rigorous detection of regional warming surges should be the focus of future work.

Our conclusions are based on piecewise linear models. While piecewise linear models provide a good first-order approximation to any nonlinearities and prevent overfitting the data, no model will perfectly describe our scenario. Our assumption that global surface temperatures contain first-order autocorrelation, which describes the dependence between year-to-year noise values, is a short-memory structure whose correlations decay geometrically with lag. Other types of models used to describe the noise in surface temperature observations include long-memory models (where the decay follows a power law)40,41,42. However, we do not consider these models here, as identifying long memory requires long time series and its presence tends to be more prominent in sea surface temperatures43,44. Stochastic trend models, where the GMST trend cointegrates with the trend in radiative forcings, have also been considered45,46,47. That said, these models have mainly been used for detection and attribution studies, and we focus here on detecting a change in the warming trend.

Methods

Data

The following four GMST time series were analyzed in this study: the NASA series (1880–2023) and the HadCRUT, NOAA, and Berkeley series (each 1850–2023), all taken as annual global mean anomalies.

Changepoint models

Our work entails fitting several changepoint time series models that use piecewise linear regression to partition the GMST record into regimes with distinct trends. This work is most concerned with changes in the trend of the series.

Changepoint analyses partition the data into different segments at the changepoint times. To describe this mathematically, our model allows for m changepoints during the data record t ∈ {1, …, N}, which occur at the times τ1, …, τm, where the ordering 0 = τ0 < τ1 < τ2 < ⋯ < τm < N = τm+1 is imposed. The time t segment index r(t) takes the value of unity for t ∈ {1, …, τ1}, two for t ∈ {τ1 + 1, …, τ2}, …, and m + 1 for t ∈ {τm + 1, …, N}. Hence, the m changepoint times partition the series into m + 1 distinct segments. The model for the whole series is

$${X}_{t}=E[{X}_{t}]+{\epsilon }_{t},$$

where E[Xt] is the regression function. The regression functions considered in this manuscript include a continuous (joinpoint) model, where we impose process means to meet at the changepoint times, and its discontinuous counterpart. The model errors {ϵt} all have a zero mean and allow for autocorrelation; more is said about this component below.

The trend model regression structure we use has the simple piecewise linear form

$$E[{X}_{t}]={\alpha }_{r(t)}+{\beta }_{r(t)}t,$$

where βr(t) and αr(t) are the trend slope and intercept, respectively, of the linear regression in force during regime r(t). An equivalent representation is

$$E[{X}_{t}]=\left\{\begin{array}{cc}{\alpha }_{1}+{\beta }_{1}t,&\hfill 0={\tau }_{0} < \, t\le {\tau }_{1},\\ {\alpha }_{2}+{\beta }_{2}t,&\hfill {\tau }_{1} \, < \, t\le {\tau }_{2},\\ \vdots &\vdots \\ {\alpha }_{m+1}+{\beta }_{m+1}t,&\hfill {\tau }_{m} \, < \, t\le {\tau }_{m+1}=N.\\ \end{array}\right.$$
(1)

If continuity of the regression response E[Xt] is imposed at the changepoint times, the restrictions

$${\alpha }_{i}+{\beta }_{i}{\tau }_{i}={\alpha }_{i+1}+{\beta }_{i+1}{\tau }_{i},\qquad 1\le i\le m,$$

are imposed. These restrictions result in a model having m changepoints and m + 2 free regression parameters. Writing the model in terms of the free parameters α1, β1, …, βm+1 only gives

$${X}_{t}={\alpha }_{1}+\sum _{i=1}^{r(t)-1}({\beta }_{i}-{\beta }_{i+1}){\tau }_{i}+{\beta }_{r(t)}t+{\epsilon }_{t}.$$
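For concreteness, this continuous model can be fitted by noting that the hinge regressor (t − τi)+ carries the slope change βi+1 − βi, so continuity at the changepoint times is automatic. The sketch below is our own illustration (not the authors' code); it fits this mean together with AR(1) errors using R's arima function, and the helper name, the series object gmst, and the example changepoint index are assumptions for illustration only.

```r
## Minimal sketch: continuous ("joinpoint") piecewise linear mean with AR(1)
## errors, for known changepoint times tau (in the paper the taus are
## estimated by minimizing a penalized likelihood; see below).
fit_joinpoint_ar1 <- function(x, tau) {
  N <- length(x)
  t <- seq_len(N)
  ## Hinge regressors (t - tau_i)_+ ; each coefficient is a slope change
  ## beta_{i+1} - beta_i, so the fitted mean is continuous at every tau_i.
  hinges <- sapply(tau, function(ti) pmax(t - ti, 0))
  X <- cbind(t, hinges)
  colnames(X) <- c("t", paste0("hinge", seq_along(tau)))
  arima(x, order = c(1, 0, 0), xreg = X, include.mean = TRUE)
}

## Hypothetical usage: an annual anomaly series `gmst` (1850-2023) with a
## single changepoint placed at 1970 (series index 121), as in Fig. 1a.
# fit <- fit_joinpoint_ar1(gmst, tau = 121)
# fit$coef  # ar1, intercept (alpha_1), slope (beta_1), slope change (beta_2 - beta_1)
```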

The model errors \({\{{\epsilon }_{t}\}}_{t = 1}^{N}\) are a zero mean autocorrelated time series. For a first-order autoregression (AR(1)), such a process obeys the difference equation

$${\epsilon }_{t}=\phi {\epsilon }_{t-1}+{Z}_{t},$$
(2)

where {Zt} is independent and identically distributed Gaussian noise with mean E[Zt] ≡ 0 and variance Var[Zt] ≡ σ2, and ϕ ∈ (−1, 1) is an autocorrelation parameter representing the correlation between consecutive errors. It is important to allow for autocorrelation in the model errors in climate changepoint analyses11,34,48,49: failure to account for autocorrelation can lead one to conclude that the estimated number of changepoints, \(\hat{m}\), is larger than it should be. Higher-order autoregressions are easily accommodated should a first-order scheme be deemed insufficient. We will also consider cases where the autoregressive parameter changes at each changepoint time; these essentially let ϕ depend on time t via the regime index r(t). The changepoint times for the autocorrelation and mean structure are constrained to be the same.
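The error recursion in Eq. (2), including the option of a different autocorrelation parameter in each regime, is easy to simulate. The sketch below is our own illustration under stated assumptions; the regime boundary and parameter values shown are illustrative, not the paper's estimates.

```r
## Minimal sketch: simulate AR(1) errors whose autocorrelation parameter may
## change at the changepoint times (the same times as the mean structure).
simulate_ar1_errors <- function(N, tau, phi, sigma) {
  bounds <- c(0, tau, N)   # regime boundaries; length(phi) = length(tau) + 1
  eps <- numeric(N)
  prev <- 0
  for (r in seq_along(phi)) {
    for (t in (bounds[r] + 1):bounds[r + 1]) {
      eps[t] <- phi[r] * prev + rnorm(1, sd = sigma)  # Eq. (2), regime-specific phi
      prev <- eps[t]
    }
  }
  eps
}

## Illustrative call: larger autocorrelation before an assumed 1970 changepoint.
# set.seed(1)
# e <- simulate_ar1_errors(N = 174, tau = 121, phi = c(0.5, 0.1), sigma = 0.1)
```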

Estimation of the model parameters proceeds as follows. For a given number of changepoints m and their occurrence times τ1, …, τm, one first computes maximum likelihood estimators of the regression parameters, i.e., of all αi and βi. This fit yields a model likelihood, which we base on the Gaussian distribution since the series are globally and annually averaged. This likelihood is denoted by L(m; τ1, …, τm).

The hardest part of the estimation scheme lies with estimating the changepoint configuration. This is done via a Gaussian penalized likelihood. In particular, the penalized likelihood objective function O of form

$$O(m;{\tau }_{1},\ldots ,{\tau }_{m})=-2\ln (L(m;{\tau }_{1},\ldots ,{\tau }_{m}))+C(m;{\tau }_{1},\ldots ,{\tau }_{m})$$

is minimized over all possible changepoint configurations. The penalty C(m; τ1, …, τm) is a charge for having m changepoints at the times τ1, …, τm in a model. As the number of changepoints in the model increases, the model fit improves and \(-2\ln (L)\) correspondingly decreases. However, eventually, adding additional changepoints to the model does little to improve its fit. The positive penalty term counteracts “overfitting” the number of changepoints, balancing likelihood improvements with a cost for having an excessive number of model parameters (changepoints and linear parameters within each segment). Many penalty types have been proposed to date in the statistics literature49. One that works well in changepoint problems is the Bayesian Information Criterion (BIC) penalty

$$C(m;{\tau }_{1},\ldots ,{\tau }_{m})=p\ln (N),$$

where p is the total number of free parameters in the model. Table 2 lists values of p for the various model types encountered in this paper. For example, a continuous model with a global AR(1) structure and m changepoints has the m changepoint times and m + 2 free regression parameters; with ϕ and σ2 also contributing to the parameter total, p = 2m + 4.

Table 2 Model penalties for fitting in trend models

Finding the best m and τ1, …, τm can be accomplished via a dynamic programming algorithm called PELT50 or a genetic algorithm search as in refs. 51,52. PELT is computationally rapid, performs an exact optimization of the penalized likelihood, and was used here.
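To make the objective concrete, the sketch below evaluates O(m; τ1, …, τm) for a candidate configuration of the continuous model with a global AR(1), using the BIC penalty p ln(N) with p = 2m + 4. It reuses the illustrative fit_joinpoint_ar1 helper above, and a brute-force search over candidate times stands in for PELT; it is not the implementation used in the paper.

```r
## Minimal sketch: BIC-penalized likelihood for a candidate changepoint
## configuration (continuous model, global AR(1)).
penalized_objective <- function(x, tau) {
  m <- length(tau)
  fit <- fit_joinpoint_ar1(x, tau)        # Gaussian AR(1) regression likelihood
  p <- 2 * m + 4                          # m taus, m + 2 regression params, phi, sigma^2
  -2 * fit$loglik + p * log(length(x))    # -2 ln L + p ln N
}

## Illustrative brute-force search for a single changepoint (PELT handles the
## general multiple-changepoint search exactly and far more efficiently).
# taus <- 5:(length(gmst) - 5)
# best_tau <- taus[which.min(sapply(taus, function(ti) penalized_objective(gmst, ti)))]
```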

Given a specified model with trend and autocorrelation structure, a prediction interval can be computed for the last regime. Assuming that the forecast errors are normal, a 95% prediction interval for an h-step-ahead forecast is given by

$${\hat{X}}_{t+h}\pm 1.96\hat{{\sigma }_{h}}$$
(3)

where h represents the number of years ahead, \({\hat{X}}_{t+h}\) is the predicted temperature anomaly and \(\hat{{\sigma }_{h}}\) is an estimate of the standard deviation of the h-step forecast distribution. The prediction intervals are computed using the model fit from the final segment in each series.
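A minimal sketch of this prediction step, assuming the illustrative helper above and a hypothetical series gmst_to_2022 ending in 2022 (as used for Fig. 3), is:

```r
## One-step-ahead prediction for 2023 with a 95% interval, Eq. (3).
# N    <- length(gmst_to_2022)
# fit  <- fit_joinpoint_ar1(gmst_to_2022, tau = 121)        # changepoint index is illustrative
# newX <- cbind(t = N + 1, hinge1 = pmax(N + 1 - 121, 0))   # regressors for 2023
# pr   <- predict(fit, n.ahead = 1, newxreg = newX)
# as.numeric(pr$pred) + c(-1.96, 1.96) * as.numeric(pr$se)  # 95% prediction interval
```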

Testing trend differences

How can one determine the statistical significance of a potential surge in the warming rate at some point since 1970? To address this question, a model is needed. Whilst our running example here considers the HadCRUT series since 1970, other GMST datasets are easily analyzed (see Supplementary Material).

A simplification of (1) for a single continuous changepoint is

$$E[{X}_{t}]=\left\{\begin{array}{ll}{\alpha }_{1}+{\beta }_{1}t,\hfill &\hfill 0 \, < \, t\le \tau ,\\ {\alpha }_{1}+{\beta }_{1}\tau +{\beta }_{2}(t-\tau ),\quad &\hfill \tau \, < \, t\le N.\end{array}\right.$$
(4)

There are other good techniques for detecting a single changepoint in slope; two are the HAC tests of ref. 53 and the two-phase regression models of ref. 54 (the latter would have to be modified for autocorrelation). We develop a simple procedure here that is very accessible.

If the single changepoint is known to occur at time τ, then the Student’s t-based statistic

$${T}_{\tau }=\frac{{\hat{\beta }}_{2}-{\hat{\beta }}_{1}}{\widehat{{{\rm{Var}}}}{\left({\hat{\beta }}_{2}-{\hat{\beta }}_{1}\right)}^{\frac{1}{2}}}$$
(5)

can be used to make inferences. Here, \({\hat{\beta }}_{1}\) and \({\hat{\beta }}_{2}\) are the estimated trends of the two segments before and after time τ. A change in the warming rate (positive or negative) is suggested when |Tτ| is too large to be explained by chance variation (as gauged by a Student’s t distribution with N − 3 degrees of freedom); a surge corresponds to a significantly positive Tτ. In computing \({{\rm{Var}}}({\hat{\beta }}_{2}-{\hat{\beta }}_{1})\), the AR(1) parameters ϕ and σ2 are needed. Estimates of the two slopes, the AR(1) parameters, and their standard deviations are provided by time series fitting software (such as the arima function in R55).
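In the hinge parameterization sketched earlier, the coefficient of (t − τ)+ is exactly β2 − β1 and its standard error is reported by the fit, so Tτ can be read off directly. A minimal sketch (the data object name is hypothetical):

```r
## Minimal sketch: T_tau of Eq. (5) for a known changepoint time tau.
T_tau <- function(x, tau) {
  t <- seq_len(length(x))
  X <- cbind(t = t, hinge = pmax(t - tau, 0))   # hinge coefficient = beta_2 - beta_1
  fit <- arima(x, order = c(1, 0, 0), xreg = X, include.mean = TRUE)
  fit$coef["hinge"] / sqrt(fit$var.coef["hinge", "hinge"])
}

## For the 1970-2023 HadCRUT series (54 annual values), tau = 43 corresponds
## to a hypothesized surge starting in 2013:
# T_tau(hadcrut_1970_2023, tau = 43)   # about 1.53 in the paper
```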

Extreme care is needed when τ is unknown. Should τ be selected visually from among the many possible times without accounting for this selection, statistical mistakes can ensue. This is why changepoint techniques are needed. For example, τ = 43, which corresponds to 2012, has been suggested as a time when warming accelerated4. For the 1970–2023 data, ignoring the selection of the changepoint, a |Tτ| of 2.007 or more indicates a warming rate change at the 0.05 significance level; a two-tailed Student’s t-test was employed to allow for either an increase or a decrease in the warming rate. For our specific example, \(| {T}_{43}| =| {\beta }_{2}^{2013,2023}-{\beta }_{1}^{1970,2012}| /\hat{\sigma }=| 0.0286-0.0187| /0.0065=1.5281\) is not statistically significant at the 0.05 significance level.

If the time of the warming rate change is unknown (as is common), statistical significance is determined based on the null hypothesis distribution of

$${T}_{\max }=\max _{\ell \le \tau \le u}| {T}_{\tau }| ,$$
(6)

where ℓ and u are values that truncate the admissible changepoint times near the data boundaries for numerical stability. The \({T}_{\max }\) statistic has significantly different statistical properties (more tail area) than Tτ for a fixed τ. A common truncation requirement, and one that we follow, is to truncate 10% at the series boundaries: ℓ = 0.1N and u = 0.9N. If the calculated \({T}_{\max }\) statistic exceeds the threshold QN, where QN is the 0.95 quantile of the null hypothesis distribution of \({T}_{\max }\), then a statistically significant rate change is declared with 95% confidence. The most likely changepoint time, \(\hat{\tau }\), is estimated as the τ at which \(| {T}_{\tau }|\) attains its maximum \({T}_{\max }\). Statistical tests of this type are discussed in refs. 56 and 57, where large sample distributions are derived to determine QN. However, due to the relatively short series since 1970, we use a Monte Carlo method with Gaussian AR(1) errors to determine statistical significance.

Elaborating, our Monte Carlo approach simulates many series using parameter estimates from the current data under the null hypothesis. For example, with the 1970–2023 HadCRUT data, the null hypothesis parameters are estimated as \({\hat{\alpha }}_{1}^{1970-2023}=-0.17\), \({\hat{\beta }}_{1}^{1970,2023}=0.0199\), \({\hat{\beta }}_{2}=0\) (there is no second segment under the null hypothesis), \(\hat{\phi }=0.0865\), and \(\hat{\sigma }=0.097\) (Table 3). One hundred thousand time series were then simulated, \({T}_{\max }\) was computed for each series, and the 0.95 quantile of these values was identified to estimate QN.
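A minimal sketch of this Monte Carlo step is given below; it reuses the illustrative T_tau helper, uses far fewer replicates than the one hundred thousand in the paper (for speed), and may need guarding against occasional arima convergence failures in practice.

```r
## Minimal sketch: Monte Carlo estimate of Q_N under the null model
## (single linear trend, Gaussian AR(1) errors).
simulate_Q <- function(N, alpha, beta, phi, sigma, nrep = 2000) {
  t <- seq_len(N)
  lo <- ceiling(0.1 * N); hi <- floor(0.9 * N)   # 10% boundary truncation
  Tmax <- replicate(nrep, {
    x <- alpha + beta * t + arima.sim(list(ar = phi), n = N, sd = sigma)
    max(abs(sapply(lo:hi, function(tau) T_tau(x, tau))))
  })
  quantile(Tmax, 0.95)
}

## Null parameters estimated from the 1970-2023 HadCRUT series (Table 3):
# set.seed(1)
# simulate_Q(N = 54, alpha = -0.17, beta = 0.0199, phi = 0.0865, sigma = 0.097)
```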

Table 3 Estimated parameters for simulating annual global surface temperature anomalies over 1970–2023 (null model)

The simulated 0.95 quantile for the HadCRUT series is QN = 3.1082. The largest Tτ statistic occurs in 2012 and is \(| {T}_{43}| =1.5281(={T}_{\max })\), which is far from the required threshold of 3.1082. Hence, there is little evidence for a statistically significant change in the warming rate over 1970–2023 in the annual HadCRUT series; this conclusion holds for all GMST datasets considered in this paper.

So how large would the slope need to be in the second segment to declare a significant surge? We answer this for a baseline segment of 1970–2012 and a second segment of 2013–2023. Note that \({\hat{\beta }}_{1}\) and the denominator of Tτ do not depend on \({\hat{\beta }}_{2}\); thus, we can set \({T}_{\max }={T}_{43}=3.1082={Q}_{N}\) and solve for \({\hat{\beta }}_{2}\). This results in \({\hat{\beta }}_{2}=0.0388\). Hence, an increase of 100(0.0388 − 0.0187)/0.0187 ≈ 107% between the two segment slopes is required for 95% statistical confidence.
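The inversion amounts to a single line of algebra. A sketch with the HadCRUT numbers quoted above (small differences from the values in the text reflect rounding of the displayed inputs):

```r
## Minimal sketch: smallest second-segment slope declared significant,
## obtained by setting |T_tau| equal to the Monte Carlo threshold Q_N.
minimal_slope <- function(beta1, se_diff, Q) beta1 + Q * se_diff

# b2_min <- minimal_slope(beta1 = 0.0187, se_diff = 0.0065, Q = 3.1082)  # ~0.039
# 100 * (b2_min - 0.0187) / 0.0187                                       # ~107-108% increase
```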

The same logic can be used to determine the future warming rates necessary for statistical significance. Using the HadCRUT series up to 2023, there is no statistical evidence of a surge starting in 2012 relative to the 1970–2012 segment. Will this still be the case in 2025? How about 2040? For any potential surge starting between 1990 and 2015, the data from 1970–2023 were used to estimate the warming trend slope, intercept, and AR(1) structure. We then simulated cutoff quantiles for 95% statistical significance as above for several considered vantage years, pushing out to 2040. Since the estimated standard deviation of the slope difference depends only on the segment lengths and the AR(1) parameters, the procedure above can be inverted to find the minimal slope necessary to induce statistical significance.

Using the HadCRUT series, one hundred thousand Gaussian series were simulated up to 2040 under our best working model (no surge, \({\beta }_{1}^{1970,2023}=0.0199\), α = − 0.17, ϕ = 0.0865, and σ = 0.097). This gives the Monte Carlo quantile estimate Q71 = 2.9877. Setting the Tτ statistic corresponding to a changepoint at 2012 equal to this threshold and solving for the minimally significant slope of the 2013–2040 segment gives \({\hat{\beta }}_{1}^{1970,2012}+\widehat{{{\rm{Var}}}}{({\hat{\beta }}_{2}^{2013,2040}-{\hat{\beta }}_{1}^{1970,2012})}^{1/2}{Q}_{71}=0.0187+(2.9877)\;0.0025=0.0262\). In short, a 40% increase in the 2013–2040 warming rate relative to the 1970–2012 rate will be needed to declare a significant warming surge by 2040.

The above process was repeated for each surge year from 1990 to 2015 and each vantage year from 2024 to 2040. For each surge year start τ, the minimum statistically significant slope is calculated assuming that \({T}_{\max }=| {T}_{\tau }|\). For each τ, this minimally significant slope is compared to the estimated slope of the 1970–(1969+τ) segment to calculate its associated percent change. The results for the HadCRUT series are displayed in Fig. 4. Overall, one sees that either substantially increased warming or many more years of observations will be required before any warming surge can be declared with a reasonable degree of confidence. This process was repeated for the other GMST datasets based on the null hypothesis parameters listed in Table 3. Results are presented in the Supplementary Information.