## Introduction

The idea that there is unpredictable variability in the weather and climate has been demonstrated in the seminal papers by Ed Lorenz1,2 and is popularised as ‘chaos’ and the ‘butterfly effect’: whereby a tiny disturbance such as the flap of a butterfly’s wings can grow into large-scale differences in future weather patterns. This leads to inherent uncertainty in any practical meteorological forecast and suggests fundamental limits on the predictability of the climate system. This sensitivity to initial conditions led to the ideas behind the development of ensemble weather prediction involving multiple numerical realisations;3 an approach that was subsequently extended to longer range forecasts4 and is now routinely used in seasonal predictions5 and longer climate projections.6 In these ensemble prediction systems, individual member simulations differ by small perturbations, which grow with time due to unpredictable variability7 or ‘chaos’, limiting predictability but also allowing the ensemble to capture the uncertainty in the future state of the system that arises due to uncertainty in initial conditions and/or model formulation. Although the underlying equations in climate and weather prediction models are fundamentally deterministic, and are therefore not random, the range of outputs from ensemble predictions is often treated probabilistically when either measuring the skill of retrospective predictions8,9 or expressing the outcome of a particular forecast.10

The time horizon for predicting individual weather events is typically estimated to be about 2 weeks for the mid-latitude atmosphere.11,12 However, some components of the climate system are predictable well beyond these timescales. For example, the Madden–Julian Oscillation,13 El Niño–Southern Oscillation,14 Atlantic Multidecadal Variability15,16 and the Quasi-Biennial Oscillation17 are all predictable at monthly, seasonal or even longer timescales. Importantly, these sources of predictable variability also have remote teleconnections,18,19 leading to predictability in mid-latitude surface climate (i.e., average weather conditions) at seasonal and decadal lead times.

Given this, in the following section we assess the predictability for both observed (O) and model ensemble member (M) regional climate by dividing the temporal variability into predictable (signal, S) and unpredictable (noise, N) components:20,21

$$O = S_{o} + N_{o},\qquad M = S_{m} + N_{m} \tag{1}$$

## A signal-to-noise paradox in climate predictions

To demonstrate the existence of a signal-to-noise paradox, we now compare the signal and noise components of observations and models. Climate model predictions, initialised with observational analyses and using fully coupled ocean–atmosphere models, now show potentially useful levels of prediction skill for year-to-year variations in the winter North Atlantic Oscillation (NAO). This implies predictability of European and North American winter climate a season or even longer ahead.20,22,23,24,25,26,27,28,29

All of these studies use ensembles and form an ensemble mean model prediction $\mathbf{M}$ to reduce the level of unpredictable noise:

$$\mathbf{M} = S_m + N_m/\sqrt{n}, \tag{2}$$

where n is the number of ensemble members.
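The $1/\sqrt{n}$ noise reduction in Eq. (2) can be checked numerically. The sketch below is illustrative only: it assumes that each member's noise is independent and Gaussian, and the function name `ensemble_mean_noise_std` is hypothetical.

```python
import numpy as np


def ensemble_mean_noise_std(sigma_noise: float, n_members: int,
                            n_years: int = 50_000, seed: int = 0) -> float:
    """Empirical standard deviation of the noise left in an n-member ensemble mean.

    Assumes independent Gaussian noise N_m in each member (an illustrative
    assumption). The predictable signal S_m is identical in every member, so it
    passes through the ensemble mean unchanged and is omitted here.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma_noise, size=(n_years, n_members))
    return float(noise.mean(axis=1).std())


for n in (1, 4, 16, 64):
    empirical = ensemble_mean_noise_std(1.0, n)
    print(f"n={n:3d}  empirical={empirical:.3f}  1/sqrt(n)={1 / np.sqrt(n):.3f}")
```

The empirical and theoretical columns should agree closely, which is why large ensembles are needed before the ensemble mean approaches the pure signal $S_m$.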

If the ensembles are run for a number of historical cases, for example, a series of past winters, then the squared correlation $r_{mo}^2$ between the year-to-year variability in the model ensemble mean $\mathbf{M}(t)$ (where time $t$ denotes a particular year) and the observed variability $O(t)$ provides an estimate of the predictable fraction of the observed variance30:

$$r_{mo}^2 = \frac{\sigma_{So}^2}{\sigma_{So}^2 + \sigma_{No}^2}, \tag{3}$$

where $\sigma_{So}^2$ and $\sigma_{No}^2$ are the variances of the signal and noise components of the observations, respectively.
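Eq. (3) can be illustrated with synthetic data. This is a sketch under stated assumptions: the model signal is taken to be identical to the observed signal ($S_m = S_o$), all components are Gaussian, and the ensemble is large. `squared_corr_with_obs` is a hypothetical helper.

```python
import numpy as np


def squared_corr_with_obs(sigma_s: float, sigma_no: float, sigma_nm: float,
                          n_members: int, n_years: int = 20_000, seed: int = 1) -> float:
    """Squared correlation between a synthetic ensemble mean and synthetic observations."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(0.0, sigma_s, n_years)                  # S_o = S_m (assumed)
    obs = signal + rng.normal(0.0, sigma_no, n_years)           # O = S_o + N_o
    members = signal[:, None] + rng.normal(0.0, sigma_nm, (n_years, n_members))  # M = S_m + N_m
    r_mo = np.corrcoef(members.mean(axis=1), obs)[0, 1]
    return float(r_mo ** 2)


# Observed signal fraction is 1 / (1 + 4) = 0.2; with 200 members the estimate lies close to it.
print(round(squared_corr_with_obs(1.0, 2.0, 2.0, n_members=200), 3))
```

With fewer members the residual noise in the ensemble mean pulls the estimate below 0.2, anticipating the finite-ensemble bias discussed later.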

In the limit of a large ensemble ($n \to \infty$), Eq. (2) implies that the model noise vanishes and the ensemble mean consists only of the modelled predictable signal $S_m$. The proportion of modelled variance that is predictable may therefore be obtained as the variance of $S_m$ divided by the total variance of individual model members. Eade et al.20 used these definitions of observed and modelled predictability to define the ratio of predictable components (RPC) between observations and model:

$$\mathrm{RPC}^2 = \frac{\sigma_{So}^2/\left(\sigma_{So}^2 + \sigma_{No}^2\right)}{\sigma_{Sm}^2/\left(\sigma_{Sm}^2 + \sigma_{Nm}^2\right)} = \frac{r_{mo}^2}{\sigma_{Sm}^2/\left(\sigma_{Sm}^2 + \sigma_{Nm}^2\right)}. \tag{4}$$

In a perfect model, the RPC should equal 1: the observations and the model members should contain the same proportion of predictable variance, so the squared correlation $r_{mo}^2$ should match the predictable proportion of variance in the model.

If the RPC is less than one, then the correlation of the model ensemble mean with observations ($r_{mo}$) is smaller than would be expected from the predictable fraction of variance in the model. RPC values below 1 are commonly found in climate predictions, especially in tropical seasonal predictions.20,31 This can be caused by several factors, including too few ensemble members to eliminate unpredictable noise, a lack of spread in the forecast ensemble, systematic errors in predicted signals (such as poorly structured teleconnections), or imperfect initialisation leading to ‘shocks’ in the forecasts.

If, on the other hand, the RPC is greater than one, then the correlation is higher than would be expected from the proportion of signal in the ensemble variance. RPC values above 1 were not generally expected, but this possibility has been considered32 and examples have now been found in a number of different ensemble seasonal predictions, particularly in winter predictions of the NAO and Arctic Oscillation.21,23,25,28,29 For example, in the seasonal forecasts of the NAO reported by Scaife et al.,23 the standard deviation of the predictable ensemble mean signal was around 2 hPa, the total ensemble standard deviation was around 8 hPa and the correlation was around 0.6, so RPC ≈ 0.6/(2/8) = 2.4 > 2. The high correlation score is therefore inconsistent with the small predictable signal in the model, and the discrepancy has been shown to be highly statistically significant,21 hence a ‘signal-to-noise paradox’.26
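The arithmetic of this example can be written as a small helper. This is a sketch: `rpc` is a hypothetical name, and its inputs are standard deviations, so it returns the square root of Eq. (4) directly.

```python
def rpc(r_mo: float, signal_std: float, total_std: float) -> float:
    """Ratio of predictable components, i.e. the square root of Eq. (4).

    r_mo       -- correlation between the ensemble mean and the observations
    signal_std -- standard deviation of the model's predictable signal (sigma_Sm)
    total_std  -- total member standard deviation, sqrt(sigma_Sm^2 + sigma_Nm^2)
    """
    if not 0.0 < signal_std <= total_std:
        raise ValueError("expected 0 < signal_std <= total_std")
    return r_mo / (signal_std / total_std)


print(rpc(0.6, 2.0, 8.0))   # NAO example from the text: 2.4
print(rpc(0.25, 2.0, 8.0))  # 'perfect' model, r_mo equal to signal_std / total_std: 1.0
```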

An interesting consequence of the signal-to-noise paradox follows from an alternative form of Eq. (4) based on correlations alone:

$$\mathrm{RPC}^2 = \frac{r_{mo}^2}{r_{mm}^2}. \tag{5}$$

If RPC > 1, then Eq. (5) implies that the correlation between the model ensemble mean and the observations ($r_{mo}$) exceeds the average correlation between the model ensemble mean and a single ensemble member ($r_{mm}$). In this case we arrive at the counterintuitive result that the model is better at predicting the real world than it is at predicting itself.20 Figure 1 illustrates this explicitly for a set of seasonal predictions of the NAO. The correlation of the modelled NAO (black line) climbs with ensemble size due to the suppression of unpredictable noise (Eq. (2)), asymptoting at the predictable limit where a very large ensemble has suppressed all noise. If we replace the observations with a single ensemble member (excluding that member from the ensemble mean, so as to avoid an artificially high correlation arising from the member sharing its own realisation of noise), then the resulting correlation should ideally be the same, as each ensemble member is meant to represent an alternative, but perfectly viable, version of the observed evolution.
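Eq. (5) and the leave-one-out procedure described above can be sketched on synthetic data. Everything here is an illustrative assumption rather than the forecast system behind Fig. 1: Gaussian signal and noise, an observed signal fraction of 0.2, a model signal damped to half the observed amplitude, and member noise chosen so that the total member variance matches the observed variance. The model signal fraction is then 0.05, so the true RPC is 2 by construction. `paradox_ensemble` and `rpc_from_correlations` are hypothetical helpers.

```python
import numpy as np


def paradox_ensemble(n_years: int = 20_000, n_members: int = 40,
                     model_signal: float = 0.5, seed: int = 2):
    """Synthetic hindcasts in which the model signal is too weak (illustrative assumption).

    Observations: O = S + N_o with var(S) = 1 and var(N_o) = 4 (signal fraction 0.2).
    Members: M_i = model_signal * S + N_i, with noise variance chosen so that the
    total member variance equals the observed variance of 5.
    """
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, 1.0, n_years)
    obs = s + rng.normal(0.0, 2.0, n_years)
    noise_std = np.sqrt(5.0 - model_signal ** 2)
    members = model_signal * s[:, None] + rng.normal(0.0, noise_std, (n_years, n_members))
    return obs, members


def rpc_from_correlations(obs: np.ndarray, members: np.ndarray):
    """Eq. (5): RPC = r_mo / r_mm, with r_mm averaged over leave-one-out ensemble means."""
    n = members.shape[1]
    r_mo = np.corrcoef(members.mean(axis=1), obs)[0, 1]
    total = members.sum(axis=1)
    r_mm_each = []
    for i in range(n):
        # Exclude member i from the mean so that it is not correlated with its own noise.
        others_mean = (total - members[:, i]) / (n - 1)
        r_mm_each.append(np.corrcoef(others_mean, members[:, i])[0, 1])
    r_mm = float(np.mean(r_mm_each))
    return float(r_mo), r_mm, float(r_mo / r_mm)


obs, members = paradox_ensemble()
r_mo, r_mm, rpc_value = rpc_from_correlations(obs, members)
print(f"r_mo={r_mo:.2f}  r_mm={r_mm:.2f}  RPC={rpc_value:.2f}")
```

In this toy setup the model ensemble mean correlates better with the synthetic observations than with its own members, reproducing the qualitative behaviour of Fig. 1.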

However, as shown in Fig. 1, in practice the correlation between the ensemble mean and observations ($r_{mo}$) is higher than the correlation between the ensemble mean and individual ensemble members ($r_{mm}$), yielding an RPC value in excess of 2 as explained above, and suggesting that the model is better able to predict the real world than it is able to predict itself. Now, as the total ensemble standard deviation $\sqrt{\sigma_{Sm}^2 + \sigma_{Nm}^2}$ is close to the observed variability of 8 hPa, the only remaining term in Eq. (4) is the signal standard deviation ($\sigma_{Sm}$), which must therefore be at least two times too small. Note that independent sets of ensemble predictions give a similar result26 and other climate models show similar effects in their predictions of the NAO and AO.22,25,28,29
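The step from RPC > 2 to a signal that is at least two times too small can be written out explicitly. This assumes, following Eq. (3), that $r_{mo} \approx \sigma_{So}/\sigma_O$ with $\sigma_O^2 = \sigma_{So}^2 + \sigma_{No}^2$, and that the total model spread matches the observed variability:

$$\mathrm{RPC} = \frac{r_{mo}}{\sigma_{Sm}\big/\sqrt{\sigma_{Sm}^2 + \sigma_{Nm}^2}} \approx \frac{\sigma_{So}/\sigma_O}{\sigma_{Sm}/\sigma_O} = \frac{\sigma_{So}}{\sigma_{Sm}} \quad \text{when} \quad \sqrt{\sigma_{Sm}^2 + \sigma_{Nm}^2} \approx \sigma_O,$$

so that $\mathrm{RPC} > 2$ implies $\sigma_{Sm} < \sigma_{So}/2$.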

Note also that practical calculations of the RPC are expected to underestimate the true value.20 This is because any practical ensemble is finite in size, so the correlation with observations ($r_{mo}$) will likely be lower than that of an infinite ensemble. Furthermore, the variance of the ensemble mean, which stands in for $\sigma_{Sm}^2$, will likely be higher than that of an infinite ensemble due to incomplete suppression of noise. According to Eq. (4), the RPC from any practical ensemble is therefore also likely to be an underestimate. We deduce that ensemble mean signals are likely more than two times too small for the NAO, and we recommend using Eq. (5) to calculate the RPC, as it is an unbiased estimate.
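A toy perfect-model check illustrates why Eq. (5) is preferred. In the sketch below (an illustrative assumption, not a published procedure) the observations are statistically identical to any ensemble member, so the true RPC is 1. With only 10 members, estimating $\sigma_{Sm}^2$ from the ensemble-mean variance in Eq. (4) gives a value well below 1, whereas the correlation ratio of Eq. (5) stays close to 1 because $r_{mo}$ and $r_{mm}$ are both diluted by residual noise. `compare_rpc_estimators` is a hypothetical helper.

```python
import numpy as np


def compare_rpc_estimators(n_members: int = 10, n_years: int = 20_000,
                           sigma_s: float = 1.0, sigma_n: float = 3.0, seed: int = 3):
    """Return (RPC from Eq. 4, RPC from Eq. 5) for a perfect model whose true RPC is 1."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, sigma_s, n_years)
    obs = s + rng.normal(0.0, sigma_n, n_years)    # observations behave like one more member
    members = s[:, None] + rng.normal(0.0, sigma_n, (n_years, n_members))
    ens_mean = members.mean(axis=1)
    r_mo = np.corrcoef(ens_mean, obs)[0, 1]

    # Eq. (4): the finite-ensemble mean variance stands in for the model signal variance.
    signal_fraction = ens_mean.var() / members.var(axis=0).mean()
    rpc_eq4 = np.sqrt(r_mo ** 2 / signal_fraction)

    # Eq. (5): r_mm from leave-one-out means, diluted by noise in the same way as r_mo.
    total = members.sum(axis=1)
    r_mm = np.mean([
        np.corrcoef((total - members[:, i]) / (n_members - 1), members[:, i])[0, 1]
        for i in range(n_members)
    ])
    rpc_eq5 = r_mo / r_mm
    return float(rpc_eq4), float(rpc_eq5)


eq4, eq5 = compare_rpc_estimators()
print(f"Eq. (4): {eq4:.2f}   Eq. (5): {eq5:.2f}")
```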

Finally, as noted above, the signal-to-noise paradox is not limited to the NAO. Although it is clearest in and around the Atlantic basin, it also occurs in parts of the Pacific and the Southern Hemisphere (Fig. 2), where it appears in predictions of the Southern Annular Mode.33 Similar behaviour has been found on longer timescales in both interannual and decadal predictions20,26,34,35 and in other predicted variables such as surface temperature,20 wind,28 and rainfall.20,36

## A signal-to-noise paradox in atmosphere-only models

So far we have seen that initialised climate predictions of the atmospheric circulation in the Atlantic sector exhibit a signal-to-noise paradox, where they are better at predicting the real world than they are at predicting themselves. In this section we show that this idea could potentially explain a number of earlier results from atmosphere-only climate models forced by specified ocean conditions, and that the signal-to-noise paradox appears to have been present in generations of previous climate models.

Early studies of NAO variability and its potential predictability, given specified ocean surface conditions, gave moderate but highly significant correlations with observations if enough ensemble members were used to eliminate unpredictable variability in the model.37,38,39 This important result suggested that there might be significant long-range predictability of the winter NAO (as has since been demonstrated), but it was inconclusive at the time because these were not actual forecasts. The specified ocean conditions in these experiments contained information from the future, in particular about the subsequent behaviour of the NAO, as the NAO leaves a tripolar imprint in ocean temperatures40 which could feed back to the atmosphere. Careful arguments were put forward to suggest that the reproduction of observed NAO variability in atmosphere-only model experiments might therefore be due to the limitations of the experiment in specifying the future ocean conditions, and that this could give a misleading overestimate of actual predictability.41 Despite this limitation, ensembles of these atmospheric simulations appear to have contained the same paradoxical result found in long-range forecasts and discussed in the previous section. Figure 3 shows the skill of reproducing NAO variability in one such ensemble of atmospheric model simulations. The same slow climb of correlation skill with ensemble size occurred (blue curve), and the correlation between the ensemble mean and the observed NAO, although modest, ultimately rose to a level exceeding the typical correlation with single ensemble members (black dots). Given the striking similarity between these results and the coupled model predictions in Fig. 1, it appears that these early simulations were subject to the same signal-to-noise paradox.

Other atmosphere model experiments suggest that the signal-to-noise paradox might also be present on multidecadal timescales. Although it has since declined,42 the large multidecadal increase of the NAO from its low values in the 1960s to its very high values in the early 1990s has been the subject of many studies. The mechanisms behind this shift are still only partly understood, and studies have linked it to changes in the Indian Ocean basin,43 changes in the stratosphere,44 changes in the tropics,45,46 simple internal variability,47 coupled ocean–atmosphere cycles,48 or even climate change.49,50 However, there is a common thread to many of these studies, in that (apart from very rare exceptions) these experiments consistently reproduce only a fraction of the observed low-frequency variability, even when multiple models and multiple ensemble members are considered.51 Although it is difficult to assess the significance of such results because the period of rapid NAO increase has been preselected from the observational record, this underestimation of multidecadal variability of the NAO in atmosphere-only (and coupled ocean–atmosphere52) model experiments is consistent with the weak reproduction of NAO variability in the signal-to-noise paradox.

Numerous other studies also show weak modelled signals in simulations of the atmospheric circulation in the Atlantic sector. The NAO-like response to the stratospheric quasi-biennial oscillation in winter appears to be underestimated in climate simulations17,53,54 as is the NAO-like response to tripolar Atlantic SST anomalies55,56 and the apparent response to Arctic sea ice perturbations.57

In summary, it appears that the signal-to-noise paradox has been present in atmospheric climate models for some time and it may occur across a wide range of timescales and in the Atlantic response to a wide range of phenomena.

## A signal-to-noise paradox in the response to external climate forcing?

A first line of evidence for a signal-to-noise paradox in the climate response to external forcing comes from the simulated response to tropical volcanic eruptions. Analysis of historical climate data in post-eruption winters again suggests a response in sea level pressure that projects strongly onto the winter Arctic Oscillation or NAO,58,59 as shown in Fig. 4 (left panel).

However, when volcanic aerosol forcing is added to climate models, the response in the AO or NAO is of the correct sign but invariably weak (Fig. 4, noting the different scale bars) and much weaker than observed for both multimodel and individual model studies.59,60,61,62 For example, Stenchikov et al.59 state that: “…associated dynamic perturbations and winter surface warming over Northern Europe and Asia in the post-volcano winters is much weaker in the models than in observations”. Furthermore, it has also been shown that the observed response appears to be too large to be easily explained as chance aliasing of internal variability of the Arctic Oscillation onto post-volcanic winters.60,62,63 In contrast, the mean global cooling response to volcanic eruptions in climate models does not show this feature and may even be too strong,64,65 so it is likely that in this case the global irradiance forcing is sufficient and it is again the regional response in the North Atlantic that appears to lack amplitude. Several studies also point out a prolonged response to volcanic forcing, with a second-winter response similar to that in the winter immediately following the eruption,58 but this lagged response is not generally reproduced in climate models either.59,62 The weak model response in the two winters following explosive tropical volcanic eruptions may therefore be another example of the signal-to-noise paradox, with a similar pattern but weaker amplitude Atlantic sector response than is found in observations.

A second line of evidence for a signal-to-noise paradox in the response to external climate forcing comes from a number of studies pointing out that the surface response to solar variability may be too weak in climate model experiments.66,67,68 Recent modelling studies have confirmed a regional response in sea level pressure that maps onto the Arctic Oscillation and NAO,67,69,70 as has repeatedly been suggested from analysis of historical climate observations.71,72,73 A connection with the signal-to-noise paradox described here comes from the observed response to the 11-year solar cycle, which reaches its maximum not at the peak of the solar cycle but at a lag of a few (2–4) years.69,74 In climate model experiments,69 the transient response to a step change in solar forcing grows year on year in association with a growing tripolar anomaly in North Atlantic sea surface temperature. This tripolar SST pattern is known to feed back positively onto the atmospheric circulation associated with the NAO,37,38,40,55,75 so the integrating effect of the ocean (due to the relatively long decay time of oceanic anomalies and the annual re-emergence of the solar-induced heat content anomaly in the Atlantic) could delay the maximum response,69,74 as shown in Fig. 5.

If the solar cycle is treated as a boundary condition with an 11-year sinusoidal period, and the response as an ocean-integrated (cosine) wave with the same period, then we expect a lag of π/2 radians (one quarter cycle) in the timing of the maximum response. This is approximately 3 years (11/4 ≈ 2.75 years) for the 11-year solar cycle, as observed. In contrast, when taken as a whole, the reported responses to the 11-year solar cycle in current climate models appear to be weak, and most model simulations show no clear lag.68,70,74,76,77 Scaife et al.74 showed that this could result from too weak a feedback in the surface climate response to solar variability. We should of course note that observational estimates of the amplitude of solar irradiance variations continue to be refined,78 and this uncertainty in forcing may contribute to uncertainty in the amplitude of the solar response in surface climate. Nevertheless, the existence of a lag in the observed response and the apparent difficulty in simulating this lag in climate models together provide additional evidence of a weak response of the atmospheric circulation in the North Atlantic sector to external forcing, which is again consistent with the signal-to-noise paradox in climate predictions.
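The quarter-cycle lag follows from a short calculation, assuming for illustration a purely sinusoidal forcing $F(t) = \sin(\omega t)$ with $\omega = 2\pi/T$, and a response $R(t)$ that simply integrates the forcing with no damping:

$$R(t) = \int_0^t \sin(\omega t')\,dt' = \frac{1 - \cos(\omega t)}{\omega} = \frac{1}{\omega}\left[1 + \sin\!\left(\omega t - \frac{\pi}{2}\right)\right].$$

The forcing peaks at $\omega t = \pi/2$ and the response at $\omega t = \pi$, so the lag is $\Delta t = (\pi/2)/\omega = T/4$, or about 2.75 years for $T = 11$ years. If the ocean memory is finite (a damped rather than pure integrator), the lag is shorter than $T/4$.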

Our last example of the possible effects of the signal-to-noise paradox is the tropospheric climate response to the development of the ozone hole. While some studies report successful simulation of the temporal evolution,79 a number of studies have noted a weaker than observed response in the Southern Annular Mode (SAM),80,81,82,83 which is often attributed to coincidental internal variability. While this coincidental alignment of forced and internal variability in the SAM is perfectly possible, we again note that the common tendency for models to simulate weaker than observed changes is consistent with a signal-to-noise paradox in the forced response of the SAM in models. Indeed, such a paradox has recently been found in seasonal climate predictions of the SAM.33

In summary, the signal-to-noise paradox may therefore also apply to a range of responses to external forcing, including volcanic forcing, solar variability, and ozone depletion.

## Implications

If the signal-to-noise ratio is underestimated by climate model simulations and predictions, then individual model ensemble members cannot be regarded as realisations equivalent to the observed climate, because each contains a smaller proportion of predictable variance than the observations. This has a number of important implications:

1. Many measures will give inaccurate estimates of the forecast skill that is potentially available. These include error measures such as root-mean-squared error and mean-squared skill score,84 as well as probabilistic measures, including reliability and Brier skill score, that are based on the distribution of ensemble members.30 Errors in the signal-to-noise ratio can be corrected in forecasts by a postprocessing step that amplifies the predictable signal,20,85 but measures that assess the raw model data without such a correction will be misleading. Note that the anomaly correlation does not depend on the amplitude of the ensemble mean signal and is therefore unaffected, so it should be routinely included in any skill assessment, provided that a large enough ensemble is used to estimate the predictable (ensemble mean) signal accurately.

2. Seasonal forecasts of tropical regions are typically overconfident, and their statistical properties may be improved by techniques such as stochastic physics86 that often increase the ensemble spread. However, such techniques could potentially exacerbate problems where the signal-to-noise ratio is too small and the models are under-confident.

3. Predictability is often estimated from model ensembles, for example, by assessing the skill of predicting a single model member instead of the observations.87,88,89 This has often been regarded as an upper limit of the skill that could be achieved using a particular model90 because it mimics the situation in which each ensemble member is initialised with perfect observations. However, if the signal-to-noise ratio were too large, then the additional predictability in the model ensembles would be an overestimate, rather than representing potential for future improvement. Conversely, if the modelled signal-to-noise ratio is too small, as found here, then the real world is more predictable than the model (Fig. 1) and this approach will underestimate the true predictability.

4. Event attribution91,92 seeks to quantify the change in the probability of weather events due to human influences. One approach is to compare a large ensemble of model simulations driven by observed SSTs with another ensemble driven by counterfactual SSTs, obtained by removing the anthropogenic signal. This relies on the model correctly simulating the amplitude of the response to SSTs and will give incorrect results where the signal-to-noise ratio is too small, especially in the North Atlantic sector.

5. Large ensembles of model simulations suggest that natural internal variability is the major source of uncertainty in regional climate change projections over the coming decades.93 This approach relies on the models correctly responding to external forcing, including greenhouse gases, anthropogenic aerosols and ozone. If the signal-to-noise paradox also applies to the response to these forcing factors, then the role of internal variability will be overestimated by this technique.

6. Although the signal-to-noise paradox highlights a potentially serious problem with climate models, its discovery helps to reveal that skilful forecasts are now possible for some phenomena such as the NAO,23 including some of the most extreme cases94 that were previously thought to be unpredictable.95 We note that a large ensemble is required to extract the maximum predictable signal (Fig. 1), and postprocessing is needed to boost its magnitude.20

7. Resolving the signal-to-noise paradox and correcting it in climate models could increase the strength of the model response to a whole host of phenomena and would settle longstanding debates about whether various teleconnections are real. It would also enable smaller ensembles to be used for detection, attribution and prediction, and could increase the skill of climate forecasts and climate services.
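As a hedged illustration of the postprocessing mentioned in items 1 and 6 above, one simple variance-based adjustment rescales the ensemble mean so that it explains a fraction $r_{mo}^2$ of the observed variance, and rescales member departures from the mean so that the total variance matches the observations. This is a sketch in the spirit of ref. 20 under these assumptions, not necessarily the exact published method. `adjust_signal_to_noise` is a hypothetical name, and in practice the scaling factors should be estimated out of sample (for example, by cross-validation) rather than from the same hindcasts they are applied to.

```python
import numpy as np


def adjust_signal_to_noise(members: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Rescale hindcasts so the ensemble mean explains a fraction r_mo**2 of observed variance.

    members has shape (n_years, n_members); obs has shape (n_years,).
    In-sample version for illustration only.
    """
    ens_mean = members.mean(axis=1)
    r_mo = np.corrcoef(ens_mean, obs)[0, 1]
    if r_mo <= 0:
        raise ValueError("no positive skill to amplify")
    obs_std = obs.std()

    # Signal: give the ensemble-mean anomaly a standard deviation of r_mo * sigma_O.
    mean_anom = ens_mean - ens_mean.mean()
    new_mean = ens_mean.mean() + mean_anom * (r_mo * obs_std / mean_anom.std())

    # Noise: scale departures from the mean so that total variance equals sigma_O**2.
    dev = members - ens_mean[:, None]
    new_dev = dev * (np.sqrt(1.0 - r_mo ** 2) * obs_std / dev.std())
    return new_mean[:, None] + new_dev


# Usage on synthetic data with a too-weak model signal (illustrative assumption).
rng = np.random.default_rng(0)
s = rng.normal(size=2000)
obs = s + rng.normal(0.0, 2.0, 2000)
members = 0.5 * s[:, None] + rng.normal(0.0, 2.18, (2000, 30))
adjusted = adjust_signal_to_noise(members, obs)
print(f"raw mean std={members.mean(axis=1).std():.2f}  adjusted={adjusted.mean(axis=1).std():.2f}")
```

The adjustment leaves the anomaly correlation unchanged, consistent with item 1, and only alters the amplitudes of the signal and noise.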

## Conclusions

We have provided a wide range of evidence for a ‘signal-to-noise paradox’ in climate science. The paradox lies in the fact that climate models are better able to predict observed climate variability than would be expected from their low signal-to-noise ratio. However, in many cases, the total amount of variability found in ensemble member simulations closely matches that found in observations, and so it is not just a simple case of models being too ‘noisy’ or containing too much variability. We instead conclude that the amplitude of predictable signals in response to boundary conditions or external forcing may be much too weak, especially in the Atlantic sector. This helps to explain why so many climate modelling studies show clear relationships between model and observations only after anomalies are ‘standardised’. These anomalously weak signals in predictions hamper the use of seasonal and decadal predictions, undermine the validity of probabilistic and ensemble approaches and prevent the accurate estimation of forced climate variability in the Atlantic sector.

The signal-to-noise paradox appears to be ubiquitous across timescales: it appears on timescales of seasons20,23,24,25,36,38 years20,26,39 and multi-decades.43,50,51 It may even be present on multi-century timescales in the Atlantic sector as there is proxy observational evidence for negative NAO and associated European cooling in the Little Ice Age96,97 but numerous studies have noted only weak model responses in the NAO98 and associated temperatures.99

The signal-to-noise paradox also appears to be ubiquitous across different climate models, spanning many years of model development and using a wide variety of ensemble generation techniques.23,25,27,28,29,38,39,43,55

The signal-to-noise paradox appears to be robust across different experimental procedures. It appears as weak signals in ensemble forecasts20,23,25,26,28,36 in atmosphere-only simulations forced by prescribed ocean conditions29,38,39,51,55 and in models subjected to changes in radiative forcing.59,61,62,74,99

While this review cannot provide absolute proof, it summarises a growing body of evidence for a signal-to-noise paradox in initialised climate predictions. A chance alignment of unpredictable and predictable variability could in principle lead to an apparent paradox in this context, but this is very unlikely.21 Instead, the evidence suggests that it may arise from an underestimate in the strength of a wide variety of North Atlantic teleconnections in climate models. The reasons for this remain unclear, but there are a number of obvious candidates, including: lack of extratropical ocean–atmosphere coupling, weak eddy feedback in current resolution models, errors in remote teleconnections, or errors in parametrised processes such as atmospheric convection. Some of the further supporting evidence given here may eventually be explained by other means, but there is also evidence that the signal-to-noise paradox may be present in the modelled response of the Atlantic sector to external radiative forcing. We do not yet know whether it applies to the regional response to anthropogenic greenhouse gases, but of course that is an important question for future research, as it could imply large changes in regional climate that are currently unrepresented in climate model projections.

### Data Availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.