INTRODUCTION

The signal-to-noise paradox1,2 highlights a serious deficiency in climate models and has recently generated a lot of interest in the climate prediction community. In a perfect seasonal forecast system, each forecast ensemble member should behave equivalently to the observations, in the sense that the skill of the model in predicting the observations (or the correlation of the model ensemble mean with the observations, rmo) should be statistically indistinguishable from the skill of the model in predicting its own forecast ensemble members (or the average correlation of any given ensemble member with the mean of all other members, rmm). In other words, the Ratio of Predictable Components (RPC = rmo/rmm) should equal 1. The signal-to-noise paradox occurs when RPC is greater than 1, as is found to be the case in almost all current seasonal forecast systems3. Forecasts in regions where RPC > 1 are ‘under-confident’, and this leads to a number of issues, including the apparently paradoxical result that the model is better at predicting the real world than its own ensemble forecast members. The standard metrics for forecast skill will then underestimate the skill that is potentially available in a forecast system2. Additionally, a very large ensemble is needed to extract the predictable signal, and the variance of the ensemble mean must be inflated to match the observed predictable signal1. Small signal-to-noise ratios also lead to uncertainty in decadal predictions, and very large ensembles are required to remove this uncertainty. For example, the North Atlantic Oscillation (NAO) predictable signals are around 2–3 times too small in seasonal forecasts, and could be as much as ten times too small in decadal forecasts4,5. Since the error variance of the ensemble mean decreases in inverse proportion to ensemble size, such weak signals require 4–9 (seasonal) and 100 (decadal) times more ensemble members than would a perfect model, representing a significant computational cost4,6,7. Therefore there is a current need for large ensembles, and a significant effort in the seasonal and decadal forecast communities to try to explain and resolve the signal-to-noise paradox.
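
The ensemble-size figures quoted above can be illustrated with a short worked relation. This is a sketch under a simple assumption that is not stated in the source: the model signal amplitude is too small by a factor k while the noise variance is realistic, and the symbols k, N_model and N_perfect are introduced here for illustration only. Requiring the ensemble-mean signal-to-noise variance ratio of the model to match that of a perfect system gives

$$\frac{(\sigma_{\mathrm{signal}}/k)^{2}}{\sigma_{\mathrm{noise}}^{2}/N_{\mathrm{model}}}=\frac{\sigma_{\mathrm{signal}}^{2}}{\sigma_{\mathrm{noise}}^{2}/N_{\mathrm{perfect}}}\quad\Rightarrow\quad N_{\mathrm{model}}=k^{2}N_{\mathrm{perfect}}$$

so that k = 2–3 corresponds to 4–9 times more members, and k = 10 to 100 times more members.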

Currently, theories as to why the paradox exists include: a deficiency of atmospheric eddy feedback in climate models due to insufficient spatial resolution8, a deficiency in simulated ocean eddies alongside weak ocean–atmosphere coupling9,10,11, a bias in surface drag leading to too much baroclinic instability12, and inaccurate regime persistence13,14. Some of these may be interrelated15, but in this study, we focus on the first of these ideas, a deficiency in atmospheric eddy feedback.

Eddy feedback is the process whereby interaction with small-scale transient eddies amplifies large-scale quasi-stationary climate anomalies in the mid-latitudes. It is an essential part of the process whereby remote influences impact the tropospheric jets16,17,18,19,20,21,22,23,24,25,26,27. Eddy feedback is therefore crucial for simulating the correct strength of the change to the tropospheric jets caused by remote influences in climate models, from monthly through to centennial time scales. However, eddy feedback has been shown to be deficient in current climate models8,24,28. Although there have been several hypotheses relating to the interaction of wave propagation and breaking with the latitudinal and vertical structure of the jets in models29,30,31, none have yet suggested a means of resolving this model deficiency.

By computing the magnitude of eddy feedback in a range of seasonal forecast systems we demonstrate that these systems are deficient in eddy feedback, and we show an important link between this deficiency and the signal-to-noise error in those systems. We consider how reducing the eddy feedback deficiency in forecast systems has the potential to improve their skill. A mechanism whereby this may occur, via the influence of the El Niño Southern Oscillation (ENSO) on the Arctic Oscillation32,33, is discussed. Finally, the potential for more accurate eddy feedback to improve regional forecast skill is considered.

RESULTS

Signal-to-noise

In order to investigate a link between eddy feedback and signal-to-noise ratio, data from seventeen seasonal forecast systems is used, alongside the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis version 5, ERA534,35 dataset. We define an eddy feedback parameter (EFP) for ERA5 and for each forecast system28, quantifying the link between atmospheric wave driving and the zonal mean wind. Predictable components (rmm and rmo) are formed from an Arctic Oscillation (AO) index. Details of the reanalysis data and all forecast systems, equations for the eddy feedback parameter, and definitions of the AO and the predictable components are all given in the ‘Methods’ section.

We find that the forecast system predictable components, both the model–observed skill (rmo) and the model–model skill (rmm), are significantly positively correlated with eddy feedback (Fig. 1a and b). Systems are all deficient in eddy feedback, thus Fig. 1 demonstrates that systems with greater and more realistic values of EFP also exhibit increased skill. The values of rmm are around 0.25, lower than the values of rmo which are around 0.4, demonstrating the signal-to-noise error. The RPC becomes ill-defined as model skill (rmo) tends to zero2, thus we focus on systems with significant skill when considering the RPC. Figure 1c shows a significant negative correlation between eddy feedback and RPC in these skilful systems. Again, since RPC is greater than 1 in the vast majority of systems (the signal-to-noise paradox), a greater and more realistic value of EFP leads to a value of RPC closer to 1 and, therefore, a reduced signal-to-noise error. Indeed, the regression line in Fig. 1c crosses RPC = 1 very close to the value of EFP calculated from observational reanalysis. This figure supports the hypothesis that increased eddy feedback, potentially through increased horizontal resolution, would improve weak signal-to-noise ratios in forecast systems8.

Fig. 1: Eddy feedback relation to skill and RPC.

Eddy feedback parameter (EFP) correlated with model–model skill (rmm), model–observed skill (rmo) and RPC (the ratio of predictable components, where RPC = rmo/rmm), as calculated from the AO index using geopotential height at 850 hPa (AO GPH850). EFP is defined as the correlation squared between the December–January–February horizontal EP-flux divergence and the zonal mean zonal wind, at each latitude at 500 hPa, area weighted 25N–72N. The value of EFP computed from ERA5 reanalysis data is shown with a black vertical line in a and b and as a black circle in c. Thick black lines show linear least squares regression lines. In b the point where the regression line crosses the observed value of EFP is marked with a hollow black box as discussed in relation to Fig. 4. Models used in c are those that have significance at the 90% level in b, equating to an rmo value greater than 0.277. In Table 1, the same subset of models is used at all levels. See ‘Methods’ section for explanation of different symbols.

Figure 1 shows results calculated at 850 hPa, but all results shown are largely independent of height (see Table 1, where results are shown using geopotential height (GPH) at 500 hPa and 850 hPa, and using mean sea-level pressure (MSLP)). Throughout this study, correlations between two quantities do not demonstrate causality; rather, they show where there is a significant link or relationship between those quantities. Nevertheless, Fig. 1 shows improvements in simulated eddy feedback to be linked to improved signal-to-noise ratios, and also linked to increased system skill.

Table 1 Correlation (r) between eddy feedback parameter (EFP) and skill/RPC.

ENSO teleconnection

The impact of ENSO on the AO is well documented, with teleconnection pathways going via the troposphere and stratosphere. The importance of eddy feedback for the response via the tropospheric pathway has long been known36. In this study, we focus on the stratospheric teleconnection pathway, for which the mechanism is as follows. El Niño leads to an intensification and eastward shift of the climatological Aleutian cyclone in the North Pacific, causing increased planetary wave flux to enter the stratosphere37, where it leads to a weaker than average winter stratospheric polar vortex38. This weak vortex then projects onto a negative AO25. The opposite occurs for La Niña. One way of diagnosing this teleconnection is to consider GPH anomalies, composited onto El Niño and La Niña years.

Figure 2 shows maps of the difference in GPH 500 hPa anomalies between El Niño and La Niña years, for the reanalysis (Fig. 2a), and for composites across systems with strong eddy feedback (Fig. 2b) and weak eddy feedback (Fig. 2c)—see ‘Methods’ section for details. As expected, the signal in the reanalysis is that of a negative AO in response to El Niño, particularly strong in the north Atlantic, projecting onto the NAO, and in the north Pacific (strengthening of the climatological Aleutian cyclone37). The same signal is captured in the forecast systems, with both composites showing a deficiency in the magnitude of the signal, but the strong eddy feedback composite being closer to the reanalysis than the weak eddy feedback composite, consistent with that found in earlier-generation climate models24. In addition, the difference between the strong and weak composites is a clear AO signal that looks very similar to that in the reanalysis, and is statistically significant (Fig. 2d).

Fig. 2: ENSO teleconnection.

ENSO anomalies of the 500 hPa GPH field for a ERA5, composites over models with b strong and c weak eddy feedback24, and d the difference between strong and weak composites (b minus c). ENSO anomalies are formed by taking GPH anomalies in El Niño years (Niño 3.4 index > 1.5 K) minus those in La Niña years (Niño 3.4 index < −1.5 K). Model composites based on eddy feedback are defined in the Methods section. Stippling in d shows where the difference in model composites is statistically significant at the 95% level.

This is highly suggestive of a link between eddy feedback and the strength of the ENSO teleconnection to the AO captured in a forecast system, namely that the strength of the teleconnection increases and becomes more realistic as the strength of the eddy feedback increases and becomes more realistic. To quantify this, the ENSO teleconnection ‘strength’ is defined as the regression coefficient between the normalised Niño 3.4 index and the AO index in a forecast system (see Methods section for details). A strongly positive correlation of 0.69 is found between the ENSO teleconnection strength and the eddy feedback parameter across all the forecast systems33 (Fig. 3a).

Fig. 3: EFP, teleconnection strength, and predictable signals.

Correlation of ensemble mean a eddy feedback parameter (EFP) and ENSO teleconnection strength, b ENSO teleconnection strength and the predicted AO signal (measured by the ensemble mean variance), and c EFP and the predicted AO signal. The EFP and ENSO teleconnection strength are defined in the Methods section. The value of the EFP computed from ERA5 reanalysis data is shown with a black vertical line in a and c.

To further investigate the implications of this link, we find a very strong positive correlation of 0.89 between the ENSO teleconnection strength and the predictable signal, measured by the ensemble mean variance, in the forecast systems (Fig. 3b). Further, we find a significantly positive correlation of 0.49 between eddy feedback and the predictable signal in the forecast systems (Fig. 3c). In summary, our results suggest that increased eddy feedback is related to a stronger ENSO teleconnection which, in turn, leads to stronger simulated predictable signals.

Potential gain in regional skill

We have shown that increased (more realistic) eddy feedback is associated with increased ENSO teleconnection strength, improved signal-to-noise ratio, and increased AO forecast skill. We now consider the potential gain in regional prediction skill if the eddy feedback in forecast systems were at realistic levels.

Taking the AO as an example, we define the current average model skill as the average, across all model systems, of rmo (i.e. the average of all rmo values in Fig. 1b). It is also possible to calculate the potential skill that could be achieved by increasing the eddy feedback in forecast systems to equal that in the reanalysis. This is defined as the skill value where the regression line across all systems crosses the ERA5 value of eddy feedback parameter, shown by a hollow black box in Fig. 1b. The potential gain in skill is then simply the potential skill minus the current average model skill. By using rmo scatter plots similar to Fig. 1b, but defined over smaller regions, this approach is applied more generally to consider the potential gain in regional prediction skill (further details are given in the Methods section).

The current average model skill is found to be high in the tropics and around the region of the Aleutian low in the extratropics3 (Fig. 4a). It is generally lower in the extratropics, but significant in the NAO regions (the Azores and Iceland6,39). The potential gain in skill that arises from correct eddy feedback is shown in Fig. 4b. In the extratropics this is comparable in magnitude to the current average skill, particularly in the NAO regions and east Asia, suggesting the potential to double the skill of seasonal forecasts in these regions. As might be expected, the regions where there is potentially the greatest skill to be gained by improved eddy feedback correspond closely to those regions where the signal-to-noise error is largest in the skilful forecast systems (Fig. 4c)1,3, consistent with improved eddy feedbacks driving larger predictable signals. There are also some regions (for example central North America and continental northern Europe) where improvement in eddy feedback would not lead to any increase in model skill, and in these regions, the model is currently over-confident (RPC < 1).

Fig. 4: Model skill.

Maps of a Average model skill, b potential gain in skill by improving EFP to observed values (see Methods section for details) and c Ratio of Predictable Components averaged across skilful models (those used in Fig. 1c). In b, red colours denote regions where skill could potentially be improved by a more accurate simulation of the eddy feedback parameter. In c, red colours denote regions where the signal-to-noise paradox is evident, and white indicates regions where model skill is negative for all models.

DISCUSSION

In this study, we have investigated one potential cause of the signal-to-noise paradox—that of a deficiency in eddy feedback in current seasonal forecast systems. We have considered the impacts of improving the accuracy of the eddy feedback in these systems.

We find that increased eddy feedback is strongly linked with a reduced signal-to-noise error, with linear regression suggesting that the error would be completely removed (RPC = 1) if the eddy feedback in forecast systems were equal to that in the reanalysis.

Consistent with improving the RPC, we find that increased eddy feedback increases model–model predictability (rmm) and model–observed skill (rmo). In particular, increased and more accurate eddy feedback is found to increase the strength of the ENSO teleconnection to the AO in forecast systems which, in turn, leads to stronger predictable signals. The potential gain in skill from corrected eddy feedback is considered regionally. It is found that correlation skill scores could roughly double in many extratropical regions, particularly in the region of the NAO, if eddy feedback were of a realistic magnitude. Thus, improving eddy feedback in forecast systems may yield significant improvement in the skill of seasonal forecasts in the extratropics.

The regions where there is potentially the greatest skill to be gained by improved eddy feedback correspond closely to those regions where the signal-to-noise error is largest. This emphasises the point that improving eddy feedback in forecast systems may be important for both improving the signal-to-noise error in these systems, and increasing their skill.

It is important to note that our analysis does not prove cause and effect: we cannot say that imposing the observed eddy feedback in a forecast system, or including a parameterisation of the missing eddy feedback, would cure the signal-to-noise paradox. For example, if larger predictable signals were achieved by another means, this would impact the simulated jet stream and therefore potentially be realised as increased eddy feedback. Nevertheless, our results are still strongly suggestive of eddy feedback deficiency contributing to the signal-to-noise error.

The eddy feedback parameter used in this study is a seasonal mean, zonal mean, quantity as recently used to study the atmospheric circulation response to climate change28. To investigate, in addition, the transient component of the synoptic eddy forcing, and the effects of eddy feedback on the full three-dimensional flow, one might use the eddy-induced growth rate40,41. This may help in better understanding the physical mechanisms underlying eddy feedback, and is a topic for future research.

Since it is unlikely that model resolution will increase in the near future to anything like that required to accurately and fully resolve eddy feedback8, our results motivate an increased effort to both understand the physical mechanisms underlying eddy feedback and to conceive how to implement imposed eddy feedback or a parameterisation of missing eddy feedback in forecast systems.

METHODS

Data

The ECMWF Reanalysis version 5, ERA534,35, is used as the ‘observations’ or ‘truth’ in this study. Seasonal forecast systems are from a variety of sources. The most up-to-date systems from the Copernicus Climate Data Store (C3S) for which all required data is available are the ECMWF SEAS5 system (ECMWF42), Météo-France System 7 (METEO43), the Centro Euro-Mediterraneo sui Cambiamenti Climatici SPS3.5 (CMCC44), the Deutscher Wetterdienst GCFS2.1 (DWD45), the National Centers for Environmental Prediction CFS version 2 (NCEP46), the Japan Meteorological Agency CPS2 (JMA47), and the land-initialised versions of the UK Met Office Global Seasonal Forecast system version 5 (UKMOGC2LI48). These are shown as circles on all scatter plots in this study. Older seasonal forecast system versions also obtained from C3S are DWD GCFS2.0 (DWDSYS245), CMCC SPS3 (CMCCSYS349), METEO System 6 (METEOSYS650), and climatological-land versions of UKMO GloSea5 (UKMOGC248), denoted by squares on the scatter plots. There are three systems from the DEMETER ensembles51 for which the required data is available, denoted UKMO-D, SMPI-D, and SCNR-D. These are shown as crosses on the scatter plots. There are two systems from the North American Multi-Model Ensemble (NMME) for which data was obtainable, namely the CCCma climate model version 3 (CanCM352) and version 4 (CanCM453), denoted by triangles on the scatter plots. Finally, the new Met Office Global Seasonal Forecast system version 6, also available on C3S (UKMOGC348), is added as a hexagon on the scatter plots. The number of ensemble members available from each system is listed in Table 2.

Table 2 Forecast system ensemble members.

For all of these systems, forecasts start at the beginning of November. Daily zonal wind (u) and meridional wind (v) at 500 hPa, monthly mean GPH at 500 hPa and 850 hPa, and monthly MSLP are used for all winter months (December–January–February), corresponding to a forecast lead time of 2–4 months. The years 1993–2016 are available for C3S systems (23 winters), 1970–2001 for DEMETER systems (31 winters), and 1981–2012 for NMME systems (31 winters). For each system, the corresponding period is used from ERA5.

Eddy feedback parameter

Eddy feedback, the feedback between small-scale transient eddies and large-scale quasi-stationary climate anomalies, can be quantified as the link between resolved wave driving and the zonal mean wind in climate models. Here, the eddy feedback parameter is defined as follows. Using daily u and v data at 500 hPa, the zonal acceleration due to the quasi-geostrophic component of the horizontal EP-flux divergence54 is computed:

$$\frac{\nabla \cdot \mathbf{F}_{\mathrm{H}}}{\rho a\cos \phi }=\frac{-1}{a\cos^{2}\phi }\frac{d\left(\overline{u^{\prime} v^{\prime}}\cos^{2}\phi \right)}{d\phi }$$
(1)

where ρ is density, ϕ is latitude, a is the radius of the Earth, overbar represents a zonal mean, and \(^{\prime}\) represents the residual after removing the zonal mean. The December–January–February mean of this zonal acceleration is then formed from the daily values, for each year. Next, the December–January–February mean of the zonal mean zonal wind, \(\overline{u}\) is computed for each year. Then the correlation at each latitude, across years, between the zonal acceleration and \(\overline{u}\) is calculated. The area-weighted average, 25N–72N, of this correlation squared is the eddy feedback parameter.
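
The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the code used in the study: the function name, the array layout `(year, day, lat, lon)`, the Earth radius value, and the use of centred finite differences via `np.gradient` are all assumptions made here, and the latitude grid is assumed to be strictly increasing and to exclude the poles.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # assumed value for the Earth radius a


def eddy_feedback_parameter(u, v, lat_deg, lat_min=25.0, lat_max=72.0):
    """Sketch of the eddy feedback parameter at a single pressure level.

    u, v    : daily DJF winds, shape (year, day, lat, lon) (hypothetical layout)
    lat_deg : 1-D latitudes in degrees, strictly increasing, poles excluded
    """
    phi = np.deg2rad(lat_deg)
    cos_phi = np.cos(phi)

    # Primed quantities: residual after removing the zonal mean.
    u_prime = u - u.mean(axis=-1, keepdims=True)
    v_prime = v - v.mean(axis=-1, keepdims=True)
    upvp_bar = (u_prime * v_prime).mean(axis=-1)  # (year, day, lat)

    # Equation (1): -1 / (a cos^2 phi) * d(u'v' cos^2 phi) / d phi,
    # with the derivative approximated by finite differences.
    flux = upvp_bar * cos_phi**2
    accel = -np.gradient(flux, phi, axis=-1) / (EARTH_RADIUS_M * cos_phi**2)

    # DJF means for each year.
    accel_djf = accel.mean(axis=1)       # (year, lat)
    u_bar_djf = u.mean(axis=(1, 3))      # (year, lat)

    # Correlation across years at each latitude.
    a_anom = accel_djf - accel_djf.mean(axis=0)
    u_anom = u_bar_djf - u_bar_djf.mean(axis=0)
    r = (a_anom * u_anom).sum(axis=0) / np.sqrt(
        (a_anom**2).sum(axis=0) * (u_anom**2).sum(axis=0)
    )

    # Area-weighted (cos latitude) average of r^2 over 25N-72N.
    band = (lat_deg >= lat_min) & (lat_deg <= lat_max)
    return np.average(r[band] ** 2, weights=cos_phi[band])
```

Because the result is an average of squared correlations, it lies between 0 and 1 and is unchanged if both wind components are rescaled by a constant factor.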

This definition is identical to that used previously28, except simplified in two ways due to the limited amount of data available for some systems. Firstly, we only use data at 500 hPa, as opposed to a vertical integral between 600 hPa and 200 hPa. Data at 500 hPa is available for all systems, and the signal has been found to be coherent across the different vertical levels28. Secondly, only the quasi-geostrophic component of the horizontal EP-flux divergence is included. However, the other components of the horizontal EP-flux divergence are small. Thus, these two simplifications make the calculations possible across many more systems, and should not impact the results.

Predictable components

The ratio of predictable components is defined as

$$\mathrm{RPC}=\frac{r_{\mathrm{mo}}}{r_{\mathrm{mm}}}$$
(2)

where rmo, the skill of the model in predicting the observations, is defined for each system as the correlation across years between ERA5 quantities and those from the system ensemble mean1,2. The definition of rmm, the model–model skill, has been previously given as1

$$r_{\mathrm{mm}}=\sqrt{\frac{\sigma_{\mathrm{ens\;mean}}^{2}}{\sigma_{\mathrm{total}}^{2}}}$$
(3)

where \({\sigma }_{{\rm{ens}}\;{\mathrm{mean}}}^{2}\) is the variance of the ensemble mean and \({\sigma }_{{{{\rm{total}}}}}^{2}\) is the total variance of individual ensemble members. It has also been defined as the average of the correlation of a single ensemble member with the mean of all other ensemble members2.

Here we use an improved calculation of rmm that is less sensitive to ensemble size. Equation (3) will be an over-estimate for rmm when the number of ensemble members used is small1. Further, the average ensemble member correlation2 is found to be an under-estimate when the number of ensemble members used is small. Therefore, rmm is here defined as follows.

Starting from equations for the total variance and the ensemble mean variance55:

$$\sigma_{\mathrm{total}}^{2}=\sigma_{\mathrm{signal}}^{2}+\sigma_{\mathrm{noise}}^{2}$$
(4)
$$\sigma_{\mathrm{ens\;mean}}^{2}=\sigma_{\mathrm{signal}}^{2}+\frac{\sigma_{\mathrm{noise}}^{2}}{N}$$
(5)

combine and rearrange to obtain

$$\sigma_{\mathrm{signal}}^{2}=\frac{N\sigma_{\mathrm{ens\;mean}}^{2}-\sigma_{\mathrm{total}}^{2}}{N-1}.$$
(6)
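
As a sketch of the intermediate step (not spelled out in the source), multiplying equation (5) by N and subtracting equation (4) eliminates the noise variance:

$$N\sigma_{\mathrm{ens\;mean}}^{2}-\sigma_{\mathrm{total}}^{2}=\left(N\sigma_{\mathrm{signal}}^{2}+\sigma_{\mathrm{noise}}^{2}\right)-\left(\sigma_{\mathrm{signal}}^{2}+\sigma_{\mathrm{noise}}^{2}\right)=(N-1)\,\sigma_{\mathrm{signal}}^{2}$$

and dividing by N − 1 gives equation (6).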

Then rmm = σsignal/σtotal such that

$$r_{\mathrm{mm}}=\sqrt{\frac{N\left(\sigma_{\mathrm{ens\;mean}}^{2}/\sigma_{\mathrm{total}}^{2}\right)-1}{N-1}}.$$
(7)

The total variance is computed, across years, by using the individual ensemble members and bootstrapping using 1,000,000 random samples with replacement. The ensemble mean variance is computed with a single calculation. The equation for rmm used here takes a value in between the estimates for equation (3) and the average ensemble member correlation, and is found to be stable regardless of the number of ensemble members (not shown).
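
A minimal NumPy sketch of equations (2) and (7) is given below. It is illustrative only: the function name and array layout are hypothetical, and the total variance is estimated here as the simple average, over members, of each member's variance across years, which stands in for the bootstrap procedure described above.

```python
import numpy as np


def predictable_components(obs, ens):
    """Return (r_mo, r_mm, RPC) for one index.

    obs : observed index, shape (year,)
    ens : forecast index per member, shape (member, year) (hypothetical layout)
    """
    n_members = ens.shape[0]
    ens_mean = ens.mean(axis=0)

    # Model-observed skill: correlation across years with the ensemble mean.
    r_mo = np.corrcoef(obs, ens_mean)[0, 1]

    # Variances across years. ASSUMPTION: total variance is a plain average
    # over members, not the bootstrap estimate used in the study.
    var_ens_mean = ens_mean.var(ddof=1)
    var_total = ens.var(axis=1, ddof=1).mean()

    # Equation (7); clipped at zero so the square root stays real when
    # sampling noise makes the signal variance estimate negative.
    r_mm_sq = (n_members * var_ens_mean / var_total - 1.0) / (n_members - 1.0)
    r_mm = np.sqrt(max(r_mm_sq, 0.0))

    # Equation (2); undefined when r_mm is zero.
    rpc = r_mo / r_mm if r_mm > 0 else np.nan
    return r_mo, r_mm, rpc
```

The clipping at zero is a design choice of this sketch, not something described in the source; it only matters for very small ensembles or very weak signals.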

AO and NAO indices

The AO index is defined as the zonal mean area-weighted 30N–60N mean minus the zonal mean area-weighted 60N–90N mean48, and computed using GPH at 500 hPa and 850 hPa, and MSLP. The NAO index is defined as an area-weighted mean centred around the Azores (28.5–20W, 36–40N) minus an area-weighted mean centred around Iceland (25–16.5W, 63.5–70N)56, and computed using MSLP.
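
The AO index definition can be sketched as below. The function names and the `(..., lat, lon)` array layout are hypothetical, and cos(latitude) weighting is assumed as the form of area weighting on a regular latitude–longitude grid. Note that in this sketch a grid point lying exactly at 60N would be included in both bands.

```python
import numpy as np


def area_weighted_band_mean(field, lat_deg, lat_south, lat_north):
    """Zonal-mean, cos(latitude)-weighted mean of field over a latitude band.

    field   : array with trailing dimensions (lat, lon)
    lat_deg : 1-D latitudes in degrees matching the lat dimension
    """
    band = (lat_deg >= lat_south) & (lat_deg <= lat_north)
    weights = np.cos(np.deg2rad(lat_deg[band]))
    zonal_mean = field[..., band, :].mean(axis=-1)
    return np.average(zonal_mean, axis=-1, weights=weights)


def ao_index(field, lat_deg):
    """AO index: 30N-60N band mean minus 60N-90N band mean."""
    mid_lat = area_weighted_band_mean(field, lat_deg, 30.0, 60.0)
    polar = area_weighted_band_mean(field, lat_deg, 60.0, 90.0)
    return mid_lat - polar
```

The same band-mean helper could be restricted in longitude to form the two NAO boxes, although that extension is not shown here.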

ENSO teleconnection

The El Niño Southern Oscillation (ENSO57) index used in this study is the December–January–February mean of sea surface temperature anomalies in the Niño 3.4 region (5N-5S, 120-170W) obtained from https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php. The regression coefficient, across years, of this normalised Niño 3.4 index with the AO index (calculated using GPH 850 hPa) is used to define the strength of the ENSO teleconnection. For each system, the ensemble mean AO index is regressed against the Niño 3.4 index to give the value of the teleconnection strength (in metres) shown in Fig. 3a and b.
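
The teleconnection strength can be sketched as a least-squares slope. This is an illustration only: the function name is hypothetical, and "normalised" is assumed here to mean that the Niño 3.4 index has its mean removed and is divided by its standard deviation across years.

```python
import numpy as np


def teleconnection_strength(nino34, ao_ens_mean):
    """Regression coefficient of the ensemble mean AO index on the
    normalised Nino 3.4 index, across years.

    nino34      : DJF Nino 3.4 index, shape (year,)
    ao_ens_mean : ensemble mean AO index in metres, shape (year,)
    """
    # ASSUMPTION: normalisation means standardising to zero mean, unit variance.
    nino_norm = (nino34 - nino34.mean()) / nino34.std(ddof=1)
    slope, _intercept = np.polyfit(nino_norm, ao_ens_mean, 1)
    return slope
```

With this normalisation the slope has the units of the AO index (metres) per standard deviation of the Niño 3.4 index.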

For the maps in Fig. 2, we first divide the systems into three roughly equal-sized groups: those with high values of the eddy feedback parameter (DWD, CMCC, DWDSYS2, CMCCSYS3, and CanCM4), forming a ‘Strong’ model composite; those with low values of the eddy feedback parameter (NCEP, SMPI-D, SCNR-D, METEOSYS6, and CanCM3), forming a ‘Weak’ model composite; and the remaining seven, forming a ‘neutral’ composite (although this composite is not used). Next, we define years for which there were El Niño events and La Niña events. In order to have roughly equal numbers of events, we here define the threshold such that an event is said to have occurred if the magnitude of the Niño 3.4 index is greater than 1.5 K. Using ERA5, and each system individually, we form El Niño minus La Niña differences of the GPH 500 hPa field. In the case of the models, these are then averaged to form the composite maps shown in Fig. 2b and c. In the case of the observations, El Niño minus La Niña differences are formed using ERA5 for each of the three model periods (1993–2016, 1970–2001, and 1981–2012) with the average of these differences shown in Fig. 2a. The result is found to be insensitive to the exact period chosen.
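
The El Niño minus La Niña difference for a single system (or for ERA5) can be sketched as follows. The function name and array layout are hypothetical; the 1.5 K threshold is the one stated above.

```python
import numpy as np


def enso_composite_difference(gph_anom, nino34, threshold=1.5):
    """Mean GPH anomaly in El Nino years minus that in La Nina years.

    gph_anom  : GPH anomalies, shape (year, lat, lon) (hypothetical layout)
    nino34    : DJF Nino 3.4 index in K, shape (year,)
    threshold : event threshold on the magnitude of the index, in K
    """
    el_nino = nino34 > threshold
    la_nina = nino34 < -threshold
    if not el_nino.any() or not la_nina.any():
        raise ValueError("need at least one El Nino and one La Nina year")
    return gph_anom[el_nino].mean(axis=0) - gph_anom[la_nina].mean(axis=0)
```

A model composite would then be the average of these difference maps over the systems in the ‘Strong’ or ‘Weak’ group, assuming the maps share a common grid.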

Potential skill and RPC maps

In order to compute the skill maps in Fig. 4, the following method is used. Using the GPH 850 hPa field, a scatter plot, equivalent to that shown in Fig. 1b, is computed for each area-weighted 10° longitude × 5° latitude region of the globe (a total of 36 × 36 = 1296 plots). For each plot, the average model skill (shown in Fig. 4a) is the straight average, across all model systems, of rmo. The potential skill is the value of rmo where the linear least-squares best fit line crosses the ERA5 value of eddy feedback parameter (equivalent to that shown by a hollow black box in Fig. 1b). This makes the assumption that the relationship between eddy feedback and skill remains linear at least up to the observed value of eddy feedback. The potential gain in skill (shown in Fig. 4b) is then the potential skill minus the average model skill, in each region.
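
For a single region, the calculation described above can be sketched as below. The function and argument names are hypothetical, and `np.polyfit` is used here as one way of obtaining the linear least-squares fit.

```python
import numpy as np


def potential_skill_gain(efp_models, r_mo_models, efp_obs):
    """Average skill, potential skill, and potential gain for one region.

    efp_models  : eddy feedback parameter of each system, shape (system,)
    r_mo_models : model-observed skill of each system, shape (system,)
    efp_obs     : eddy feedback parameter from the reanalysis (scalar)
    """
    # Linear least-squares fit of skill against eddy feedback across systems.
    slope, intercept = np.polyfit(efp_models, r_mo_models, 1)

    # Potential skill: fitted line evaluated at the observed eddy feedback,
    # assuming the relationship stays linear up to that value.
    potential = slope * efp_obs + intercept
    average = np.mean(r_mo_models)
    return average, potential, potential - average
```

Applying this function to each of the 1296 regions in turn would give maps analogous to Fig. 4a and b.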

RPC maps are computed by dividing average model–observed skill (rmo, shown in Fig. 4a) by average model–model skill (rmm, computed in the same way) for each region for each model. The RPC values in each region are then averaged across all skilful models (those included in Fig. 1c) to produce the RPC map shown in Fig. 4c.