Decadal predictability of North Atlantic blocking and the NAO

Can multi-annual variations in the frequency of North Atlantic atmospheric blocking and mid-latitude circulation regimes be skilfully predicted? Recent advances in seasonal forecasting have shown that mid-latitude climate variability does exhibit significant predictability. However, atmospheric predictability has generally been found to be quite limited on multi-annual timescales. New decadal prediction experiments from NCAR are found to exhibit remarkable skill in reproducing the observed multi-annual variations of wintertime blocking frequency over the North Atlantic and of the North Atlantic Oscillation (NAO) itself. This is partly due to the large ensemble size that allows the predictable component of the atmospheric variability to emerge from the background chaotic component. The predictable atmospheric anomalies represent a forced response to oceanic low-frequency variability that strongly resembles the Atlantic Multi-decadal Variability (AMV), correctly reproduced in the decadal hindcasts thanks to realistic ocean initialization and ocean dynamics. The occurrence of blocking in certain areas of the Euro-Atlantic domain determines the concurrent circulation regime and the phase of known teleconnections, such as the NAO, consequently affecting the stormtrack and the frequency and intensity of extreme weather events. Therefore, skilfully predicting the decadal fluctuations of blocking frequency and the NAO may be used in statistical predictions of near-term climate anomalies, and it provides a strong indication that impactful climate anomalies may also be predictable with improved dynamical models.


INTRODUCTION
In the well-observed historical period, climate in different parts of the world is shown to undergo a warming trend, but also significant interdecadal variations that compensate, or exacerbate the former 1 . These variations are associated not only with changes in the radiative forcing, but also to natural variability in the atmospheric and oceanic circulation patterns 2 . Decadal predictions fill the gap between assessing long-term climate trends (climate projections) and predicting short-term climatic anomalies (seasonal forecasting), thus targeting near-term regional climatic anomalies with multi-annual lead time and responding to an increasingly needed service to society.
Skillful prediction of atmospheric circulation anomalies is known to be extremely challenging on the multi-annual timescale [3][4][5] . Arguably, since decadal predictability for the atmosphere is primarily a boundary-value problem, larger ensemble sizes are needed to let the predictable signal emerge from noise, the latter referring to the inherently unpredictable component of atmospheric low-frequency variability. In this regard, the 40-member Community Earth System Model-Decadal Prediction Large Ensemble (CESM-DPLE) 6 run by the National Center for Atmospheric Research (NCAR) provides new opportunities to explore atmospheric predictability.
Focusing on the extratropical Euro-Atlantic sector, the tropospheric wintertime circulation anomalies projecting on the respective dominant pattern of variability, the North Atlantic Oscillation (NAO), have been shown to exhibit significant predictability on the seasonal timescale [7][8][9] . Given that the NAO variability strongly relates to variations in Greenland blocking frequency 10 , comparable seasonal predictability in winter is expected, and has been found for the latter 11 . The NAO has been considered by many as white noise and therefore unpredictable, but the use of large ensemble sizes in seasonal forecasting showed that the seasonal to interannual NAO variability contains, in fact, predictable components. As recent studies indicate [12][13][14] , these components represent the response to boundary forcings (sea surface temperatures, sea ice, snow cover, and soil moisture), while the initial state of the atmosphere (including the stratosphere) may also contribute to sub-seasonal predictability 15,16 .
Dynamical processes analogous to those underlying seasonal predictability may be active on longer than interannual timescales, associated with lower-frequency oceanic forcing. In fact, a number of observational and modeling studies [17][18][19][20][21][22][23] provide supporting evidence that ocean dynamics may drive part of the observed decadal atmospheric variability. This part is believed to be small, though potentially predictable 24 .
In turn, atmospheric circulation anomalies over the North Atlantic exert a strong influence on the ocean 25,26 via surface fluxes of heat, freshwater and momentum (wind stress). A number of different feedback mechanisms have also been proposed [27][28][29] to explain the bidirectional air-sea interaction occurring at timescales beyond interannual. An overarching idea is that the atmosphere responds to the ocean preferably through its own intrinsic modes of variability, among which the NAO is the dominant one in the North Atlantic sector 30 . The NAO exhibits large interannual variability, only a small fraction of which is likely to be forced by the ocean. While the NAO drives ocean circulation anomalies 31 , the ocean integrates the atmospheric forcing and in turn exerts a delayed forcing on the atmosphere. This further atmospheric response projects on the NAO, reddening its spectrum 32 and/or giving rise to a quasi-oscillatory behavior with characteristic spectral peaks 33 . Of course, the reddness of the NAO spectrum may also be related to and explained by a number of different mechanisms 34,35 . Clearly, depending on the spatial pattern and the timescale of the oceanic forcing, the oceanic feedback to the atmosphere may involve different physical processes. Focusing on the multi-decadal timescale, there is no definite consensus regarding the mechanism that gives rise to the North Atlantic variability 36 .
In this framework, given the high predictability of sea surface temperature (SST) variations in the North Atlantic 6 , an oceanic forcing on the mid-latitude atmospheric circulation would be a source of predictability for the latter in the decadal range. By exploiting the large ensemble of the CESM decadal prediction simulations, this study assesses mid-latitude atmospheric predictability through blocking and the NAO. Detecting decadal predictability for components of the atmospheric circulation is remarkable and opens the door to new possibilities.

RESULTS
Decadal skill for blocking and the NAO The CESM-DPLE is documented by Yeager et al. 6 , while additional details about the model and the hindcast data can be found in the last section (Methods). An element that is key for the present study is the relatively large ensemble size (40 members) that renders this data set unique. Given the lower signal-to-noise ratio characterizing climate forecasts as compared to the real world 37 , ensemble averaging is essential for extracting the predictable signal from the bulk of chaotic variability, particularly for the atmospheric circulation over the North Atlantic. Anomalies that survive ensemble averaging do so thanks to being similar across a large proportion of the ensemble, and therefore are indicative of a common predictable component. Instead, anomalies that vanish through averaging do so because they are different from member to member, as chaotic noise would be. A large ensemble size makes the isolation of the signal (through averaging) more effective 38 , and it is this well-documented effect that makes CESM-DPLE particularly powerful for assessing atmospheric decadal predictability at mid-latitudes.
A sufficiently realistic representation of ocean and atmospheric dynamics, including air-sea interaction processes, is a prerequisite for a model to correctly represent decadal atmospheric predictability. In this regard, a realistic model climatology is indicative of the fidelity of the model. The representation of the observed climatology of blocking frequency for winter (December to March, DJFM) by CESM-DPLE is assessed in Fig. 1. This is for all ensemble members and for each initialization year , using those lead years that fall in the historical period used to define the observed climatology . Mean bias correction has been applied as described in "Methods". Mean bias correction generally improves blocking in models 39 , yet, here the blocking frequency remains underestimated in the North Atlantic domain, particularly over Greenland. This is a problem common to many state-of-theart climate models 40 . Not surprisingly, given the above-mentioned underestimation, the interannual variability of winter blocking frequency is equally underestimated by CESM-DPLE, although the spatial distribution of variability is very similar to observations (Fig. 1).
An area of high blocking frequency is identified for this analysis (blue frame in Fig. 1) relating to North Atlantic high-latitude blocking (HLB) 41 . The rationale for choosing this area is that HLB tends to be accompanied by a southerly displaced eddy-driven jet (negative NAO), as shown in previous studies 10,11,30,42 . Conversely, the absence of blocking over the North Atlantic tends to coincide with a positive NAO regime 43 and an eddy-driven jet close to its climatological position 10 . Hence, the area relating to HLB is chosen due to its direct link to the NAO, which is the leading mode of variability in the North Atlantic sector and the most documented variability pattern representing the atmospheric response to extratropical North Atlantic SST anomalies. Figure 2a documents the predictive skill of the 40-member ensemble mean for the number of blocking days in winter belonging to blocking episodes (as described in Methods) occurring anywhere in the HLB area (blue frame in Fig. 1). Specifically, for HLB it is required that the center of the detected blocking falls within the selected area. The anomaly correlation coefficient (ACC) is shown for all possible lead-year ranges, determined by the start lead-year (ordinate) and the end lead-year (abscissa). For example, for the initialization year 1990 the leadyear range LY [3][4][5][6][7][8] represents the average of the DJFM anomalies falling between December 1992 and March 1998. The blue markers (open circles) in this figure indicate that the respective correlation coefficients were found to be not statistically significant. Hereafter, this type of plot is referred to as ACC matrix. Evidently, for HLB the skill is statistically significant over various lead-year ranges, reaching as high as 0.65 for LY [1][2][3][4][5][6][7][8] (indicated by the "X" marker). Details about the statistical testing can be found in "Methods". It is noted that the predictive skill for HLB is largely unaffected by linear detrending of the timeseries (see Fig. 6).
The same figure (panel c) shows the ACC matrix for the NAO index. As for HLB blocking, the skill is statistically significant over various lead-year ranges, reaching as high as 0.63 for LY [2][3][4][5][6][7][8] and 0.58 for LY [1][2][3][4][5][6][7][8], which is comparable to that for HLB. The NAO index is defined hereafter zonally averaging the mean sea level pressure (MSLP) at 35°N and 65°N, between 80°W and 30°E, as indicated by the blue lines in Fig. 3d. This definition 44 accounts for the zonal migration of the NAO centers of action. The traditional definition 45 yields a comparable ACC matrix with correlation differences that do not exceed 0.09. Linear detrending of both the observed and the model NAO timeseries reduces the ACC values (see Fig. 6) indicating that MSLP over the North Atlantic exhibits trends associated with external forcing 46 . Figure 2b and d show how the ACC increases with ensemble size. Each line in these plots corresponds to a lead-year range (cell) in the respective ACC matrix. The skill increases almost monotonically with the ensemble size, which strengthens our confidence that the detected skill relates to a real predictable signal. Furthermore, the skill has clearly not saturated at N = 40 (even more so for HLB compared to the NAO), pointing to the potential benefit of further increasing the ensemble size.
For both HLB and the NAO, Fig. 2 shows also (dashed-dotted lines) the skill of the sub-ensemble mean against a single member of the ensemble (averaged across all possible member permutations). The fact that the CESM-DPLE ensemble mean has more skill in predicting the observed anomalies than any single member of the ensemble is a common feature across decadal and seasonal prediction systems alike 46 , particularly referring to the midlatitude North Atlantic. This has been referred to as the "signal to noise paradox" 37 although there is no apparent logical contradiction. Instead, this behavior is to be expected for imperfect models that partly under-represent (or misrepresent) the physical processes underlying predictability, and implies potential for further increase in predictive skill with future improvement of models.
The timeseries corresponding to the high skill for HLB and the NAO for LY [1][2][3][4][5][6][7][8] are shown in Fig. 3. For the observed timeseries, the averaging in lead time corresponds to an 8-year running average, while this is not the case for the model timeseries, every point of which is computed with completely distinct data (model runs with different initialization year). This explains why the observed timeseries (blue lines) are smoother than the model ensemble mean (dashed orange lines). More importantly, it is evident that for both HLB and the NAO the high correlation between the observed and model timeseries comes from multidecadal timescales. In fact, for both quantities the correlation increases significantly after smoothing the model timeseries (details in Methods). As expected, the HLB and the NAO timeseries are highly anticorrelated (−0.95 between the observed timeseries). In addition, the long-term variations of HLB and the flipped-sign NAO resemble those of the Atlantic multi-decadal variability 47 (AMV). As will be argued below, this similarity is not coincidental.
In Fig. 3, the panels b and d map the skill for blocking and MSLP for the lead-year ranges that exhibit maximum skill, respectively for HLB and the NAO. Large, coherent areas of statistically significant skill can be seen in the areas determining the respective indices.
To conclude, in this section significant decadal predictive skill was demonstrated for high-latitude blocking in the North Atlantic and the NAO, in both cases arising from multi-decadal timescales. In the following section a first step is made to pin down the origin of the associated predictability.
Origin of predictability A number of studies 20,21,23,48,49 provide evidence that sea surface temperature (SST) anomalies over the North Atlantic subpolar gyre drive atmospheric circulation anomalies projecting on the NAO pattern, particularly via a modification of the location and strength of the SST gradient in the Gulf Stream extension. This SST gradient is indeed important for the existence of the North Atlantic stormtrack, which in turn controls the eddy-driven jet [50][51][52] . Then, as the stormtrack tends to move meridionally (poleward or equatorward) responding to the SST gradient, the formation of HLB is affected (inhibited or favored, respectively). Although a general and widely accepted theory for blocking does not exist, HLB represents a specific circulation regime whose frequency is thought to be modulated in part by North Atlantic SSTs. Following the SST front, at the western side of the North Atlantic, a poleward (equatorward) displaced stormtrack with the associated increased (decreased) westerlies just south of Greenland would inhibit (favor) anticyclonic circulation over Greenland, which tends to form there through barotropic dynamics and orographic forcing 53 , contributing to HLB formation. The occurence of HLB tends to deviate the extratropical cyclones to the south of the block therefore displacing the stormtrack 30,54 and the eddy-driven jet 10,30 to the south, in synergy with the SST forcing. Other mechanisms to explain the atmospheric response to extratropical North Atlantic SST anomalies have been proposed, including a direct diabatic adjustment to surface heating anomalies and a delayed adjustment of the flow through the action of the modified baroclinic activity (eddy-mean flow interaction) 55,56 . In this regard, changes in the strength of the SST gradient may also be important.
Although a thorough and conclusive assessment of the physical mechanisms underlying the detected predictability are beyond the scope of the present study, some basic diagnostics are shown to support the hypothesized driving role of North Atlantic SST, as outlined above. First the circulation anomalies that accompany HLB are examined. Figure 4a, b show the anomalies of the storm activity and the eddy-driven jet concurrent with anomalous HLB frequency in winter. The downstream weakening and southward shift of both the eddy-driven jet and the stormtrack associated with HLB detected in CESM-DPLE (Fig. 4a, b) are consistent with results obtained with observations 10,30,57 . However, it is important to note that the respective anomalies are about five times weaker for CESM-DPLE (about 2 m s −1 ) compared to observations (about 10 m s −1 for the "south jet" regime 57 ). This difference is due to three distinct causes: (i) the composite differences presented here are based on DJFM-mean anomalies as opposed to day-by-day composites 10,57 ; (ii) as the composite differences in Fig. 4 are based on ensemble-mean anomalies, the ensemble averaging strongly decreases the amplitude of the anomalies; and (iii) lowfrequency variability and blocking over the North Atlantic is severely under-represented in CESM simulations 58,59 and this is even more true if blocking detection is performed without meanbias correction. Consequently, in the presented composite differences based on HLB, the impacts of blocking (in a statistical sense) on the eddy-driven jet and the stormtrack are strongly under-represented.
The control mechanism described above involves a pattern of SST anomalies resembling the AMV 26 . In Fig. 4c a very similar pattern is identified (referred to as P HLB ) by compositing on ensemble-mean anomalies of HLB in CESM-DPLE. To clear the possible doubt that the associated SST anomalies might rather be the result of HLB and NAO winter variability 27,60 , the SST anomalies are computed for the preceding autumn (September to November: SON). After averaging autumn SST anomalies over several years and ensemble members, what survives is lowfrequency (decadal to multi-decadal) variability that is representative also of the respective winter SST anomalies. Furthermore, although one would expect the atmospheric response to be established relatively fast, and thus the wintertime SST anomalies to be more relevant than the autumn ones, there is evidence also for a delayed response to extratropical North Atlantic SST anomalies that takes 2-3 months to develop 56 . The composite pattern P HLB closely resembles the AMV (pattern correlation 0.60, statistically significant at the 0.01 level, assessed for SON), as revealed by the large loading over the subpolar gyre and the comma-shaped feature over the eastern subtropical Atlantic. Notably, the magnitude of the composite SST anomaly is comparable with that of the AMV pattern. These findings remain largely unaffected if simultaneous (DJFM) SST anomalies are used instead of SON anomalies: the magnitude increases slightly but the pattern does not change much (not shown).
As a further step to assess the driving role of this SST pattern, the respective anomalies of the magnitude of the SST gradient are shown in Fig. 4d, revealing a weakening and an equatorward shift with respect to the climatological SST front (thick, black line), which is consistent with the control mechanism discussed above. In addition, a weakening of the jet occurs downstream, resulting from the reduced baroclinicity upstream. One may argue that the weakening and the displacement of the SST front (about 1°l atitude) is too small to cause such widespread anomalies in the stormtrack and the jet via the modification of low-level baroclinicity in the atmosphere caused by anomalous surface heat fluxes (consequently affecting HLB and the NAO). 61 However, one should consider that the relative frequency of residence of the climate system in each of two competing circulation regimes (blocking versus unblocked flow, or NAO −/+) may be affected significantly even by weak boundary forcings 62 . In addition, the detected SST pattern involves not only an SST front shift, but also a perturbation of surface heat fluxes over the whole area of the subpolar gyre (meant to be the region of the North Atlantic Ocean characterized by cyclonic barotropic flow, commonly invoked as the region north of 50°N 63 ; here the term is used loosely to indicate the area with the strongest positive SST anomalies, broadly [15-60°W, 45-60°N]). On one hand, changes in atmospheric baroclinicity over the SST front are important for cyclogenesis and Eady growth-rate, while, on the other hand, positive SST anomalies over the whole subpolar gyre tend to warm the overlaying atmosphere and thus reduce the thermal contrast between the air-masses contributing to baroclinic activity. Both of these effects may contribute to the weakening of the stormtrack and of the jet downstream. Also, differences in diabatic heating may impact blocking downstream 64 . Last but not least, by altering low-level atmospheric baroclinicity and meridional eddy heat fluxes upstream, the SST anomalies associated with the P HLB pattern may force a downstream atmospheric response similar to the one detected here for the stormtrack and the jet 23,65 .
As a last step, it is necessary to demonstrate that CESM-DPLE is able to reproduce the observed variability (correct temporal evolution) of the identified SST pattern (P HLB ) and the AMV. The respective timeseries shown in Fig. 5 reveal a striking similarity in the multi-decadal evolution of these two patterns and, in addition, that CESM-DPLE skillfully predicts this evolution (ACC values: 0.87 for P HLB and 0.90 for the AMV). Moreover, this time evolution (P HLB timeseries) largely matches the one of HLB itself (ACC: 0.83 for the respective ensemble-mean LY [1][2][3][4][5][6][7][8] timeseries).
One may wonder whether HLB and the NAO exhibit a similar response to the AMV in longer observational data sets. Unfortunately, due to limited observations (surface-only) prior to 1950, different reanalyses exhibit significant discrepancies in blocking frequency for the first half of the twentieth centrury 66 . On the other hand, most reanalyses agree very well in the period examined here (post 1950). The respective response of North Fig. 4 Composite differences based on model high-latitude blocking (storm track activity, zonal wind, SST and SST gradient). a The standard deviation of the high-frequency geopotential height anomaly at 500hPa (indicating stormtrack activity, see Methods) for CESM-DPLE. The dashed contours show the model climatology for winter (DJFM), while the color shading shows the composite difference of the respective anomalies based on winters of high (top 10%) and low (bottom 10%) ensemble-mean HLB. b: as in a but for the zonal wind at 850hPa (indicating the eddy-driven jet). c: as in a but for the SST anomalies in the preceding autumn (SON). This pattern, resembling the AMV, is referred to as P HLB . d: as in c but for the magnitude of the SST gradient with the black solid line representing the climatological position of the SST front. In each map, the white contours correspond to the respective anomalous field (anomalies added to climatology). Statistical significance is indicated by dots (see "Methods").
Altantic blocking frequency to the AMV is also broadly consistent across different reanalyses 67 (albeit assessed not to be statistically significant) and agrees with the HLB response found in this study. Interestingly, a very recent study 23 using NOAA twentieth Century Reanalysis and ERA20C data sets for 1901-2010 finds a robust relationship between North Atlantic blocking variability and the AMV, and proposes a mechanistic explanation that is very much in line with the dynamical arguments discussed above (changes in the SST gradient impacting on atmospheric baroclinicity, which in turn modify baroclinic activity and the eddy-driven jet, finally reshaping the distribution of blocking). Figure 6 shows how trends associated with external forcing affect the decadal predictive skill. Linear detrending applied to the timeseries before computing the ACC has little impact for HLB, while for the NAO the ACC values are generally reduced (the maximum value from 0.63 in Fig. 2 becomes 0.54). This reduction indicates that MSLP exhibit trends, arguably related to global warming, that are well captured by CESM-DPLE due to the prescribed external forcing. Nevertheless, the realistic initialization accounts for most of the CESM-DPLE skill (Fig. 6: comparing panels  a to b and c to d).
The presented evidence, albeit strong, cannot be considered conclusive, since other possible mechanisms might also be at work to account for atmospheric predictability, including the role of the stratosphere 68,69 . However, it should be emphasized that the decadal prediction simulations analyzed in this study 6 are made with a low-top atmospheric model, which is, in principle, characterized by a limited representation of stratospheric dynamics. The air-sea interaction is bidirectional, and interannual to multi-decadal variability in the North Atlantic domain may involve other processes and feedbacks. The fact that the uninitialized Community Earth System Model-Large Ensemble (CESM-LENS) simulations under-represent the AMV 70 arguably indicates that some coupled processes and feedbacks are still poorly represented in the model. Consequently, without an AMV evolution similar to the observed 71 , it is no surprise that the uninitialized CESM-LENS simulations do not exhibit statistically significant skill for HLB (Fig. 6). Instead, in CESM-DPLE the realistic initialization of the subsurface ocean seems to provide the necessary forcing, leading to atmospheric predictability.
Interestingly, a very recent study 72 shows that AMV predictability in CESM-DPLE may be used to predict multi-annual anomalies of precipitation over Europe via the association between AMV and the eddy-driven jet. However, no direct predictive skill was detected for the eddy-driven jet variability 59 . This may appear inconsistent with the NAO and HLB predictability presented here, but it can be understood considering that, as discussed earlier, low-frequency variability over the North Atlantic and particularly HLB are strongly under-represented in CESM simulations (even more so if blocking detection is performed without mean-bias correction). Consequently, skilfully predicting the multi-annual evolution of HLB frequency and the NAO (referring to standardized timeseries) does not guarantee that the actual ensemble-mean circulation anomalies in the model, averaged over the same lead-year range, contain similar skill. That said, it should be noted that the skill in this study 59 was assessed for a single lead-year (LY5), namely without averaging across a lead-year range (which has been shown to be essential for the emergence of the predictable signal). Moreover, in that study the authors concluded that there was no predictive skill based on the small magnitude of the signal in CESM-DPLE (their Fig. 13c). However, standardization of the signal (as done in ACC) is important given the weak amplitude of the ensemble mean anomalies. Even though decadal predictions (CESM-DPLE) show significant skill in predicting HLB and the NAO 46 , the expected climatic impacts may not be skillfully predicted until further model improvements allow for a more realistic representation of mean climate and of its variability. Nevertheless, skillfully predicting HLB and the NAO may already be used in statistical predictions of nearterm climate anomalies supporting a range of multi-disciplinary applications.
To conclude, referring to climate variations associated with atmospheric circulation changes, the unprecedented decadal predictive skill presented in this study prompts new investigations to better understand the physical processes underlying this predictability and explore the benefits that the latter carry for better predicting key meteorological variables and weather extremes.

METHODS Data
The present study assesses the decadal predictability of blocking frequency anomalies in winter using daily mean geopotential height fields at 500 hPa (Z500). In particular, Z500 data are used from the Community Earth System Model (CESM) Decadal Prediction Large Ensemble (DPLE) simulations, documented by Yeager et al. 6 . These are decadal hindcast ensemble simulations consisting of 40 members, initialized every November from 1954 to 2015 and run for 10 years. The ensembles are generated through perturbations to the atmospheric initial conditions (air temperature). From the same simulations, monthly mean SST fields are used to assess the origin of the atmospheric predictability. Also, zonal wind fields at 850 hPa (U850) are used to assess the response of the eddy-driven jet. Furthermore, to assess the impact of realistically initializing the ocean on the predictive skill, namely the added value of initialization as opposed to the skill explained by the external forcing alone, the results are compared against the uninitialized CESM Large Ensemble (CESM-LENS) historical simulations, also consisting of 40 realizations (1954-2027) run with the same external forcings as the initialized runs (DPLE). The observational data used for verification and comparison include daily mean Z500 fields from NCAR/NCEP Reanalysis 73 and monthly mean SST fields from HadISST 74 .

Blocking detection
The detection of atmospheric blocking was performed following the twodimensional extension 75   days of instantaneous blocking in each winter season (December to March, DJFM), a 5-day threshold for persistence was applied to determine the days belonging to prolonged blocking episodes, hereafter referred to as blocking days. It is noted that prior to the blocking detection mean bias correction 39 has been applied to the daily Z500 fields by subtracting a lead-year dependent daily climatology (to account also for model drift) and then adding the resulting anomalies to the respective observed climatology. All Z500 fields were interpolated onto a coarser regular grid (2.5°× 2.5°), and blocking detection was performed in the latitudinal zone 30°-75°N.

Statistical significance
For assessing the statistical significance of correlation coefficient values (Figs 2 and 3), accounting for autocorrelation, the effective degrees of freedom were calculated following Bretherton et al. 77 . A one-sided Student's t-test against the null hypothesis of non-positive correlation has been applied at the 0.05 significance level. Also, for the composite differences shown in Fig. 4, the statistical significance is assessed via a onesided Student's t-test (significance level 0.05) against the null hypothesis that the anomalies used to compute the composite differences come from populations with means that are equal, or have a difference of the opposite sign of the displayed difference. It was assumed that the two populations have the same (unknown) variance, which was estimated via pooling.

Miscellaneous technical details
To compute the interannual standard deviation (STD) for CESM-DPLE shown in Fig. 1, the STD was first computed separately for each ensemble member {1, 2, 3…40} and each lead-year {1, 2, 3…10} using only the forecast years (DJFM seasons) corresponding to the historical period 1964-2017 that was used to evaluate the observed interannual STD.
In Fig. 3, there is an apparent dissimilarity between the observed (NCEP/ NCAR reanalysis) and the model (CESM-DPLE) timeseries, with the latter being more noisy than the former. To understand this, one has to consider that the respective timeseries are produced in different ways. Namely, for the observed timeseries, when the 8-year moving average is applied to match LY [1][2][3][4][5][6][7][8] in model, two neighboring points/years are the result of averaging eight values, seven of which are identical. This is not the case for CESM-DPLE timeseries because two neighboring points/years are computed from entirely different data (different initialization years correspond to entirely distinct simulations). There is always an 8-year average applied, but for the model the 8 data values are always different, while for the observations they are not. This explains why the observed timeseries are much smoother. Considering squared-coherence, one can argue that the ACC would be more meaningfully computed after a similar level of smoothness (spectrum) is attained between the examined timeseries; for this the ACC is also computed after a centered 7-year running average has been applied to the CESM-DPLE ensemble-mean timeseries. As one would expect, this method yields higher ACC values (Fig. 3). For mapping the predictive skill for blocking (Fig. 3), the ACC was computed at every grid point after averaging the blocking frequency between this and all (8) neighboring grid points in a 5°× 5°box. Effectively, this averaging means that some information is lost regarding the exact location of the blocking center, yet blocking, and especially persistent blocking, is known to be a large-scale phenomenon affecting extensive areas around it. Therefore, the skill presented is still considered spatially accurate.
In Fig. 4, the standard deviation of the high-frequeny transients of Z500 is used as an indicator of baroclinic activity 30 . For this purpose a bandpass Fourier filter has been applied to the daily timeseries retaining transients with periods of 2-10 days.