Introduction

September 2023 was the warmest September on record globally, and its temperature anomaly was the highest of any month in any year since 1940 in the ERA5 dataset [1]. September 2023 also broke the previous monthly record by an exceptionally large margin, exceeding the previous record, set in 2020, by 0.5 °C (Fig. 1). This is the largest margin by which a monthly record has been broken, in any month, in the entire ERA5 dataset.

Fig. 1: September 2023 shattered the record globally.

Time series of September global mean temperature in 1940–2023 based on ERA5 reanalysis. Black circles indicate record-breaking Septembers before 2023. Temperatures are given as anomalies with respect to the 1991–2020 period.

In addition to September, June, July, and August 2023 were also by far the warmest on record globally. However, September broke its previous record by the largest margin of these months and is therefore the subject of this communication.

Results

The observed record margin is a rare event in the climate model simulations

We argue that internal climate variability alone is unlikely to explain the unusually large margin by which the September record was broken. To illustrate this, we consider simulations from three climate model ensembles: the Coupled Model Intercomparison Project Phase 6 [2] (CMIP6), the 100-member Max Planck Institute Grand Ensemble [3] (MPI-GE), and the 100-member Community Earth System Model version 2 Large Ensemble [4] (CESM2-LE). These are well-established models known to simulate reliably both internal climate variability, such as the El Niño-Southern Oscillation, and the forced response to greenhouse gas forcing.

By searching each model simulation over the period 1970–2050 for the margins by which September records are broken, we obtain a total of 5166 September record margins in the CMIP6 models, 1431 in MPI-GE and 2068 in CESM2-LE (see Methods). These distributions are shown in Fig. 2a–c. The distribution of record margins reflects both unforced internal variability and the forced, greenhouse gas-induced trend. A larger trend, higher variability, or both, increase the likelihood of large record margins.
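The effect of trend and variability on record margins can be illustrated with a toy Monte Carlo experiment. This is a minimal sketch under loudly stated assumptions, not the analysis used here: each synthetic "member" is a linear trend plus Gaussian white noise over 81 years (1970–2050), and the trend values, noise levels, 0.3 °C threshold and function names are all hypothetical. Real internal variability is autocorrelated and ENSO-modulated, so the numbers say nothing about the actual ensembles; only the direction of the effect is of interest.

```python
import numpy as np


def largest_record_margin(series: np.ndarray, min_years: int = 10) -> np.ndarray:
    """Largest record margin per member (0.0 if the member sets no record).

    series has shape (n_members, n_years). A year t may set a record only if
    t >= min_years - 1, i.e. at least min_years values are available.
    """
    prev_max = np.maximum.accumulate(series, axis=1)[:, :-1]  # running max up to year t-1
    jumps = series[:, 1:] - prev_max                           # excess over previous max in year t
    eligible = jumps[:, min_years - 2:]                        # keep years t >= min_years - 1
    return np.where(eligible > 0, eligible, 0.0).max(axis=1)


def share_with_large_margin(trend_per_year: float, noise_sd: float,
                            threshold: float = 0.3, n_members: int = 2000,
                            n_years: int = 81, seed: int = 0) -> float:
    """Fraction of synthetic members whose largest record margin >= threshold.

    Hypothetical toy model: linear trend + independent Gaussian noise each year.
    """
    rng = np.random.default_rng(seed)
    years = np.arange(n_years)
    noise = rng.normal(0.0, noise_sd, size=(n_members, n_years))
    series = trend_per_year * years + noise  # trend broadcast across members
    return float(np.mean(largest_record_margin(series) >= threshold))


base = share_with_large_margin(trend_per_year=0.02, noise_sd=0.10)
steeper = share_with_large_margin(trend_per_year=0.03, noise_sd=0.10)
noisier = share_with_large_margin(trend_per_year=0.02, noise_sd=0.15)
print(f"base {base:.3f}, steeper trend {steeper:.3f}, larger variability {noisier:.3f}")
```

Because the same random seed is reused, the steeper-trend case shifts every record margin upward for identical noise, which isolates the trend effect; the noisier case scales the noise instead.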

Fig. 2: Model-simulated record margins for September global mean temperature in 1970–2050.

Top row: all margins by which the previous record was broken in the model simulations in 1970–2050. The number of samples in each model ensemble is shown in parentheses. The black dashed line shows the observed margin in ERA5, and its percentile rank in the model-simulated distribution is shown at the top. Bottom row: distributions of the most extreme margin in each simulation. The black solid line shows the generalized extreme value distribution fitted to these extreme margins. a, d CMIP6 ensemble; b, e MPI-GE ensemble; c, f CESM2-LE ensemble.

We briefly compare the magnitude of internal variability in the models and observations by calculating the standard deviation of the detrended September mean temperature. The model-simulated standard deviation of September temperature tends to be larger than the observed one, especially in CMIP6 (Supplementary Fig. 1). This suggests that internal variability in the models is at least as large as in the observations, and thus that the probabilities are not being underestimated because of too weak variability.
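A minimal sketch of this variability diagnostic follows. The text does not specify how the series are detrended, so a least-squares linear trend is an assumption here, and `detrended_std` is a hypothetical helper name. The synthetic series only stand in for ERA5 and model output.

```python
import numpy as np


def detrended_std(years, temps) -> float:
    """Standard deviation of temperatures after removing a least-squares linear trend.

    Assumes linear detrending; ddof=2 accounts for the two fitted parameters
    (slope and intercept).
    """
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps, dtype=float)
    slope, intercept = np.polyfit(years, temps, deg=1)
    residuals = temps - (slope * years + intercept)
    return float(np.std(residuals, ddof=2))


# Synthetic stand-ins (not ERA5 or model data): same trend, noise scaled by 1.5
rng = np.random.default_rng(42)
years = np.arange(1970, 2023)
noise = rng.normal(0.0, 0.1, size=years.size)
obs_like = 0.02 * (years - 1970) + noise
model_like = 0.02 * (years - 1970) + 1.5 * noise

ratio = detrended_std(years, model_like) / detrended_std(years, obs_like)
print(f"model/observed detrended standard deviation: {ratio:.2f}")
```

A ratio above 1 for a model would indicate interannual variability at least as large as observed, which is the comparison summarised in Supplementary Fig. 1.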

The observed margin in September 2023 (0.5 °C) is shown as a black dashed line in Fig. 2. The observation falls in the far right tail of the model-simulated distributions. In CMIP6 (Fig. 2a), only three of the 5166 model-simulated margins exceed the observed margin, corresponding to the 99.94th percentile of the model distribution. In MPI-GE (Fig. 2b), the observation lies entirely outside the simulated distribution, and in CESM2-LE (Fig. 2c) only one simulated margin is larger than the observed one, corresponding to the 99.95th percentile.

We calculate the probability of the observed margin from a fitted Generalised Extreme Value (GEV) distribution. In the fit, we consider only the most extreme margin from each simulation, similar to the observations (see Methods). For CMIP6, we obtain a p-value of 0.004 (Fig. 2d; see Supplementary Table 1 for the confidence intervals). For MPI-GE and CESM2-LE, the p-values are 0.018 and 0.01, respectively (Fig. 2e, f). These values are generally consistent with the empirical probabilities, with the probabilities for the 1990–2050 period and with those for the August–October period (Supplementary Fig. 2, Supplementary Table 1).

We repeat the analysis for September after excluding the climate models whose transient climate response, according to the analysis of Hausfather et al. [5], falls outside the likely (66% probability) range of 1.4–2.2 °C. This further reduces the p-value for the CMIP6 models to p = 0.002 (Supplementary Table 1).

Furthermore, the discrepancy between the observed and simulated margins is almost as striking when the simulated margins from all calendar months are considered. In this case, the p-values of the observed margin are 0.029 in CMIP6, 0.017 in MPI-GE and 0.025 in CESM2-LE (Supplementary Table 1). This holds even though internal climate variability is larger in the northern hemisphere (NH) winter months than in the NH summer months, because El Niño tends to peak in NH winter. Record margins are therefore generally larger in the NH winter months than in the NH summer months.

For comparison, we also briefly examined the probability of the observed record margin of 0.47 °C in February 2016. In this case, we obtained p-values of 0.115 for CMIP6, 0.078 for MPI-GE and 0.141 for CESM2-LE. The margin observed in February 2016 was therefore about an order of magnitude more likely than the one observed in September 2023, and thus more plausibly due to internal variability alone. It is also worth noting that in February 2016 the 2015–2016 super El Niño had just peaked, and its impact on global temperature was near its maximum. This is not the case for September 2023.
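The "order of magnitude" comparison can be checked directly from the p-values quoted above. This short sketch only rearranges those numbers; summarising the three ratios with a geometric mean is our illustrative choice, not a statistic used in the text.

```python
import math

# GEV p-values quoted in the text: (September 2023 margin, February 2016 margin)
p_values = {
    "CMIP6": (0.004, 0.115),
    "MPI-GE": (0.018, 0.078),
    "CESM2-LE": (0.01, 0.141),
}

# How many times more likely the February 2016 margin is in each ensemble
ratios = {name: feb / sep for name, (sep, feb) in p_values.items()}
# Geometric mean as a single summary of ratios that span a wide range
geo_mean = math.prod(ratios.values()) ** (1 / len(ratios))

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
print(f"geometric mean: {geo_mean:.1f}x")
```

The ratios range from about 4 (MPI-GE) to about 29 (CMIP6), with a geometric mean of about 12.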

The most plausible explanation for the model-observation discrepancy in September 2023 would be that the observed combination of forced warming and internal variability is so rare that it does not occur in the models. A strengthening El Niño following a triple La Niña, such as the one observed in 2020–2022, has occurred only a few times since 1950, and not earlier in the 21st century [6]. However, large ensembles are designed to capture such rare climate anomalies, and we found that no member of MPI-GE and only one member of CESM2-LE simulated a temperature jump as large as the one observed in September 2023.

Increased solar activity may also have contributed to the record margin in September 2023. However, solar forcing is included in the CMIP6 models [7]. While it may have added a few hundredths of a degree to the record margin, increased solar activity is therefore unlikely to have contributed to the model-observation discrepancy, although solar cycle 25 may have risen slightly faster than the estimate prescribed in the scenario.

Discussion

Since state-of-the-art climate models generally cannot reproduce the observed margin, we argue that it is highly unlikely (p ~ 1%) that internal climate variability alone caused the large increase in global mean temperature in September 2023. It is therefore likely that additional external forcings, such as (1) the Raikoke and Hunga Tonga volcanic eruptions [8,9] and (2) the reduction of sulphur pollution from ships [10], contributed to the observed temperature anomaly.

The Raikoke eruption in June 2019 injected enough sulphate into the stratosphere that it may have had a small cooling effect on global mean temperature in September 2020. The Hunga Tonga eruption in January 2022 injected large amounts of both water vapour and sulphate aerosols into the stratosphere, with both warming and cooling effects on climate. Based on a literature review, we estimate that the combined effect of the two eruptions on the temperature difference between September 2020 and September 2023 may be 0.02–0.07 °C (Supplementary Note 1).

A number of studies have estimated the radiative forcing caused by the reduction of sulphate aerosol pollution from international shipping. The estimates span a wide range, from 0.02 to 0.60 W m−2. Notably, studies using global climate models with interactive aerosols [11,12,13] have produced higher estimates than studies relying on chemical transport models driven by offline meteorology. In early 2020, sulphur emissions from shipping were reduced by an estimated 80%, or 8.5 Tg yr−1 [14,15]. In the climate model scenarios, however, sulphur dioxide emissions from shipping are reduced gradually rather than stepwise [16]. Based on the literature, we estimate that the reduction of sulphur emissions from shipping may have increased the temperature difference between September 2020 and September 2023 by 0.05–0.075 °C (Supplementary Note 2).

In summary, our analysis suggests that the record margin observed in September 2023 was an extremely unlikely outcome. In principle, such a low probability could be due to (1) an exceptional manifestation of internal variability (i.e. a strong El Niño following a three-year La Niña), (2) the models underestimating the magnitude of internal variability, (3) external forcings not being accurately prescribed in the models, or (4) a combination of the above factors.

The combined effects of the volcanic eruptions and the reduction of sulphur emissions from global shipping may plausibly have caused a temperature increase of 0.07–0.15 °C between September 2020 and September 2023. If so, the global mean temperature of September 2023 would still be exceptional, but less unlikely than without these forcings: a 0.1 °C reduction in the observed margin would increase the p-value in CMIP6 by a factor of 15. In any case, our results call for further analysis of the impact of other external forcings on the global climate in 2023.
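The ranges in this paragraph follow from simple arithmetic on the estimates in Supplementary Notes 1 and 2. A minimal sketch, assuming the volcanic and shipping effects add linearly so that the bounds of the two ranges can be summed end to end; the factor of 15 is taken from the text, not recomputed.

```python
# Estimated contributions to the September 2020-2023 temperature difference (°C)
volcanic = (0.02, 0.07)    # Raikoke + Hunga Tonga (Supplementary Note 1)
shipping = (0.05, 0.075)   # shipping sulphur reduction (Supplementary Note 2)

# Assumes the two effects are additive, so the range bounds simply add up
combined = (volcanic[0] + shipping[0], volcanic[1] + shipping[1])

observed_margin = 0.5  # °C, ERA5 record margin for September 2023
# Record margin left after subtracting the combined forced contribution
residual_margin = (observed_margin - combined[1], observed_margin - combined[0])

p_cmip6 = 0.004                 # GEV p-value of the observed margin in CMIP6
p_cmip6_minus_01 = 15 * p_cmip6  # stated effect of a 0.1 °C smaller margin

print(f"combined forcing effect: {combined[0]:.3f}-{combined[1]:.3f} °C")
print(f"remaining margin: {residual_margin[0]:.3f}-{residual_margin[1]:.3f} °C")
print(f"CMIP6 p-value for a 0.4 °C margin: ~{p_cmip6_minus_01:.2f}")
```

The upper bound of 0.145 °C is rounded to 0.15 °C in the text, which would leave a record margin of roughly 0.36–0.43 °C, and a CMIP6 p-value of about 0.06 for a 0.4 °C margin.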

Methods

Observations and climate models

We use monthly mean near-surface temperature from both observations and climate models. The observational data come from the ERA5 reanalysis [17]. We compare the observed temperatures with three climate model ensembles: all available realisations from the Coupled Model Intercomparison Project Phase 6 [2] (CMIP6), the 100-member Max Planck Institute Grand Ensemble [3] (MPI-GE), and the 100-member Community Earth System Model version 2 Large Ensemble [4] (CESM2-LE). Model uncertainty is addressed by considering 42 models in the CMIP6 ensemble, while internal climate variability is addressed using the two single-model large ensembles.

We use ERA5 for the comparison because it provides near-surface (2-m) air temperature over both land and ocean, which allows a like-for-like comparison with the climate models. Conventional observational datasets, in contrast, blend 2-m air temperature over land with sea surface temperature over the oceans.

Quantifying record margins in climate models

From the models, we consider the period 1970–2050 and search for the margins by which the global monthly records are broken. We require a minimum of 10 years in the time series, so the first record is searched for in the 1970–1979 time series, and so on. For the future period, we use the SSP2-4.5 scenario for CMIP6, RCP4.5 for MPI-GE and SSP3-7.0 for CESM2-LE. As we focus only on the period before 2050, the results do not depend markedly on the choice of emission scenario. We chose the period 1970–2050 because the global warming rate in the models over this period is similar to the observed rate, whereas in SSP2-4.5 the warming rate decreases during the latter half of the century. As a sensitivity test, we also repeated the analysis using the 1990–2050 period from the models (Supplementary Table 1).
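A minimal sketch of the record search for a single simulation follows. It assumes that a new record must strictly exceed all earlier values and reads "a minimum of 10 years" as meaning that the tenth year (1979 for series starting in 1970) is the first year eligible to set a record; both readings are our assumptions. The input is assumed to already be the September series for 1970–2050 with historical and scenario output joined, and `record_margins` and `largest_margin` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

import numpy as np


def record_margins(years: Sequence[int], temps: Sequence[float],
                   min_years: int = 10) -> List[Tuple[int, float]]:
    """Return (year, margin) for each new record in a single simulation.

    A year can set a record only when at least `min_years` values, including
    itself, are available; the margin is the excess over the previous maximum.
    """
    if min_years < 2:
        raise ValueError("min_years must be at least 2")
    years = np.asarray(years)
    temps = np.asarray(temps, dtype=float)
    records = []
    for i in range(min_years - 1, len(temps)):
        previous_max = temps[:i].max()
        if temps[i] > previous_max:  # ties do not count as new records
            records.append((int(years[i]), float(temps[i] - previous_max)))
    return records


def largest_margin(years, temps, min_years: int = 10) -> float:
    """Most extreme record margin in one simulation (0.0 if no record is set)."""
    margins = [m for _, m in record_margins(years, temps, min_years)]
    return max(margins, default=0.0)


# Toy 15-year series (hypothetical values, not model output)
years_toy = np.arange(1970, 1985)
temps_toy = [0.0, 0.5, 0.1, 0.2, 0.0, 0.3, 0.1, 0.2, 0.4,
             0.6, 0.55, 0.9, 0.8, 0.9, 1.2]
recs = record_margins(years_toy, temps_toy)
print(recs)
print(largest_margin(years_toy, temps_toy))
```

In the toy series, the 1971 value of 0.5 enters the baseline but cannot itself count as a record, and the 1983 value ties the existing maximum and is therefore not counted. The per-simulation maximum from `largest_margin` is what enters the extreme-value fit described next.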

When calculating the probabilities, we consider only the most extreme margin from each simulation, similar to the observations. We fit the Generalised Extreme Value (GEV) distribution to these extreme margins and calculate the probability that the simulated margin is equal to or greater than the observed margin. We assume that the margins of individual model realisations are independent of each other, although we acknowledge that this may not be entirely true. The uncertainty of the GEV probability is estimated by bootstrapping: we resample the margin data 1000 times by randomly drawing N samples with replacement, where N is the number of simulations in the model ensemble, and refit the GEV to each resample. This yields a bootstrap ensemble of p-values, from which the 5th and 95th percentiles are calculated.
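The fit and bootstrap described above can be sketched with SciPy's `genextreme` distribution. This is a minimal illustration, not the code used for this study: it assumes a maximum-likelihood fit via `genextreme.fit` with all three parameters free, and `gev_exceedance` and `bootstrap_p_range` are hypothetical names. Note that SciPy's shape parameter `c` has the opposite sign to the shape parameter ξ used in much of the climate literature. The synthetic maxima only stand in for per-simulation extreme margins.

```python
import numpy as np
from scipy.stats import genextreme


def gev_exceedance(maxima, observed: float) -> float:
    """P(margin >= observed) from a GEV fitted to per-simulation maximum margins."""
    c, loc, scale = genextreme.fit(np.asarray(maxima, dtype=float))
    # The GEV is continuous, so P(X >= x) equals the survival function at x
    return float(genextreme.sf(observed, c, loc=loc, scale=scale))


def bootstrap_p_range(maxima, observed: float, n_boot: int = 1000,
                      seed: int = 0) -> tuple:
    """5th and 95th percentiles of GEV p-values over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    maxima = np.asarray(maxima, dtype=float)
    p_boot = np.empty(n_boot)
    for b in range(n_boot):
        # Draw N margins with replacement, N = number of simulations
        resample = rng.choice(maxima, size=maxima.size, replace=True)
        p_boot[b] = gev_exceedance(resample, observed)
    low, high = np.percentile(p_boot, [5, 95])
    return float(low), float(high)


# Synthetic stand-in for 100 per-simulation maximum margins (not model output)
maxima = genextreme.rvs(-0.1, loc=0.2, scale=0.05, size=100, random_state=1)
p = gev_exceedance(maxima, observed=0.5)
# Fewer resamples than the 1000 used in the text, only to keep the demo quick
low, high = bootstrap_p_range(maxima, observed=0.5, n_boot=100)
print(f"p = {p:.4f} (5-95%: {low:.4f}-{high:.4f})")
```

Each bootstrap resample is refitted from scratch, so the percentile range reflects uncertainty in all three GEV parameters rather than in the shape alone.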

In addition to the GEV fit, we calculate the probabilities empirically as the number of simulated margins equal to or greater than the observed margin, divided by the total number of simulated margins.
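The empirical probability, and the percentile rank quoted in the Results, can be sketched as below. The helper names are hypothetical, and expressing the percentile rank as 100 × (1 − exceedance fraction) is our assumption, chosen because it reproduces the quoted 99.94th and 99.95th percentiles. The toy arrays reproduce only the quoted counts (3 of 5166 margins above the observation in CMIP6, 1 of 2068 in CESM2-LE), not the actual margin values.

```python
import numpy as np


def empirical_exceedance(margins, observed: float) -> float:
    """Share of simulated margins equal to or greater than the observed margin."""
    margins = np.asarray(margins, dtype=float)
    return float(np.mean(margins >= observed))


def percentile_rank(margins, observed: float) -> float:
    """Percentile of the observed margin within the simulated distribution."""
    return 100.0 * (1.0 - empirical_exceedance(margins, observed))


# Toy stand-ins matching only the quoted counts, not the simulated values
toy_cmip6 = np.concatenate([np.full(5163, 0.1), np.full(3, 0.6)])
toy_cesm2 = np.concatenate([np.full(2067, 0.1), np.full(1, 0.6)])

print(f"CMIP6: {percentile_rank(toy_cmip6, 0.5):.2f}th percentile")
print(f"CESM2-LE: {percentile_rank(toy_cesm2, 0.5):.2f}th percentile")
```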