The role of the Pacific Decadal Oscillation and ocean-atmosphere interactions in driving US temperature predictability

Heatwaves can have devastating impact on society and reliable early warnings at several weeks lead time are needed. Previous studies showed that north-Pacific sea surface temperatures (SST) can provide long-lead predictability for eastern US temperature, mediated by an atmospheric Rossby wave. The exact mechanisms, however, are not well understood. Here we analyze two different Rossby waves associated with temperature variability in western and eastern US, respectively. Causal discovery analyses reveal that both waves are characterized by positive ocean-atmosphere feedbacks at daily timescales. Only for the eastern US, a long-lead causal link from SSTs to the Rossby wave exists, which generates summer temperature predictability. We show that this SST forcing mechanism originates from the evolution of the winter-to-spring Pacific Decadal Oscillation (PDO). During pronounced winter-to-spring PDO phases (either positive or negative) eastern US summer temperature forecast skill more than doubles, providing a temporary window of enhanced long-lead predictability.


INTRODUCTION
Quasi-stationary or recurrent Rossby waves in boreal summer play an important role in the development of high impact heat waves 1,2 . Such Rossby waves create persistent clear-sky high pressure systems, which, in combination with soil desiccation and land-atmosphere feedbacks, can lead to extreme heatwaves such as seen in Russia 2010 3,4 and the United States 2012 5 . These Rossby waves (RW) can arise due to internal atmospheric variability, with a preferred phase 6 that largely depends on orography 7 , land-ocean boundaries 8 and atmospheric waveguidability [8][9][10] . Vorticity anomalies induced by e.g., tropical convection or mid-latitude sea surface temperature (SST) anomalies can also force quasi-stationary Rossby waves 7,[11][12][13] .
Understanding the role of mid-latitude ocean-atmosphere interactions in generating and maintaining Rossby waves is needed to improve subseasonal-to-seasonal (S2S) predictions [14][15][16] and climate change projections [17][18][19] . Currently, these interactions are not well understood. We know that, especially on intra-seasonal timescales, mid-latitude SST anomalies are predominantly forced by atmospheric variability 20,21 , yet the ocean can also influence the atmosphere 22,23 . The initial atmospheric response to diabatic heating at the ocean surface is baroclinic, with a low-level trough and high-level ridge slightly downstream of a mid-latitude warm SST anomaly 7 . Subsequently, the baroclinic response is modified to a local or slightly downwind-shifted warm ridge (barotropic) response via a transient eddy feedback 13,24 . In the upper atmosphere, the warm ridge is associated with a negative vorticity anomaly. The atmosphere responds to this negative vorticity anomaly by moving air equatorward, mainly at the downstream edge of the warm ridge. This adjustment can lead to a downstream Rossby wave response consisting of alternating highs and lows 25 .
The atmospheric response to SST anomalies is thus complicated due to the transient eddy feedback, which strongly depends on the strength of the background flow, and therefore also on season and location of the anomaly 22 . A stronger atmospheric response is expected when the SST anomaly is close to the storm tracks and when the storm tracks are strong (e.g., in winter) 26 . This sensitivity of the atmospheric response to the storm track's characteristics is also linked to the waveguidability of the jet stream 26 . Vorticity disturbances in the storm track near the core of the jet will be refracted to the core 9,27 , thereby generating a more zonally elongated Rossby wave response. A higher waveguidability is found for a strong and/or more narrow jet stream, leading to a stronger atmospheric wave-response 28 . The jet and storm track are tightly coupled 29 , and it is thus likely that both strongly affect the atmospheric response to an SST induced vorticity anomaly.
The atmospheric response also depends on the persistence of an imposed mid-latitude SST anomaly. While the timescale of the baroclinic adjustment is only a few days, to reach the equilibrium barotropic adjustment takes approx. 1 to 2 months 13,30 . The SST persistence is governed by the oceanic Rossby wave response to atmospheric forcing 31 , yet it is also affected by the thermal inertia of the ocean mixed layer and the turbulent heat fluxes 32 . The mixed layer is shallower during summer, and therefore SST anomalies are less persistent (dissipating within a couple of months) compared to winter (>1 year) 33,34 . Vice versa, the shallower mixed layer also means that the persistence of summer SST is more sensitive to atmospheric forcing 34 . All these factors illustrate the complexity of the coupled ocean-atmosphere Rossby wave interactions, with (1) a seasonally varying ocean-atmosphere coupling strength, (2) a seasonally varying persistence of SST, (3) the slow atmospheric baroclinic-to-barotropic adjustment to an SST anomaly, and (4) the dependence on the location of the SST anomaly (and background atmospheric state).
Here, we focus on United States (US) temperature variability and its relationship with atmospheric Rossby waves, and how these Rossby waves interact with north-Pacific SST anomalies. Previous work showed that extra-tropical Pacific SSTs, associated with a Rossby wave, provide long-lead predictability for eastern US hot temperatures 15 . Follow-up work showed that using only SST precursors to predict high temperature events renders predictive skill up to 50 days lead-time, while adding local soil moisture only slightly improves skill at shorter lead-times (up to 30 days lead-time) 16 . Thus, we focus here on interactions between SST and Rossby Waves (SST-RW), and how those interactions affect the long-lead SST signal for eastern US temperature. It was hypothesized that, in summer, amplifying twoway feedbacks between the Rossby wave and the underlying SST pattern can generate long-lead predictability 15 . The SST pattern would initially arise as response to a strong atmospheric Rossby wave, and subsequently amplify via positive ocean-atmosphere feedbacks. An alternative hypothesis is that the long-lead SST signal predominantly originates from the Pacific Decadal Oscillation (PDO), since the robust SST precursor pattern as found in 16 , projects strongly onto the PDO pattern. This suggests that the low-frequency PDO dynamics leads to a continuous and persistent boundary condition for the atmosphere 16 . Both processes might also simultaneously contribute to the long-lead signal between SST and eastern US temperature.
To test these hypotheses, we use a causal discovery technique to quantify the SST-RW coupling strength of the Rossby wave associated with eastern US temperature variability. As a comparison, we perform the same analyses for the western US and show that this region is modulated by a different Rossby wave pattern with different dynamical characteristics. Importantly, we show that only for the eastern RW, a long-lead SST signal exists, and thus long-lead predictability is possible. If the long-lead signal for the eastern RW originates from a positive SST-RW feedback, we expect to find a stronger SST-RW coupling on (sub)-synoptic timescales (1-10 days) compared to the western RW. On the other hand, if the ocean is forcing the atmosphere by acting as a boundary forcing, we expect to find a pronounced upward ocean-toatmosphere link for the eastern RW.
To measure the coupling strength, lagged univariate correlation analyses are inadequate since the autocorrelation of both the Rossby wave and especially the SST variability will spuriously inflate the correlation coefficient 35 . Therefore, we will use a causal discovery algorithm which has been specifically developed to deal with strongly autocorrelated climate data 36 .

RESULTS
Quantifying ocean-atmosphere coupling of Rossby Waves Figure 1a, b shows that western (T w ) and eastern (T E ) US summer temperatures strongly correlate with two distinct Rossby wave patterns, here called the western (RW W ) and eastern RW (RW E ) pattern. These are phase-shifted by about half a wavelength with respect to each other. T w and T E are the area-weighted 15-day mean anomaly temperature timeseries of the western and eastern US spatial cluster, respectively. The western and eastern US temperature clusters are based on gridcells that tend to show simultaneous occurrences of warm temperature periods 15,16 ("Method").
The western RW pattern is more zonally elongated and resembles a dominant Northern Hemispheric mode of variability. The eastern RW consists of an arcing pattern over the Pacific and North America. This wave-pattern is reminiscent of the winter Pacific North American (PNA) pattern in its negative phase 37 and the ENSO-forced atmospheric bridge response 38 . Interestingly, it does not resemble the summer PNA pattern, and it does not appear to be related to a circumglobal mode of variability. See Supplementary Note 1 for a more detailed discussion and evidence. Hence, while the RW W clearly relates to an atmospheric mode of variability, the RW E does not appear to match any summer mode of variability. Figure 1e, f show there is a strong instantaneous and lagged SST correlation with eastern US temperature (T E ). For the west (T w ), no long-lead signal SST signal is detected (Fig. 1d), only an instantaneous one (Fig. 1c). To investigate the role of oceanatmosphere feedbacks, we quantify the coupling strength between SST and the western and eastern Rossby waves, respectively, using the Peter and Clark -Momentary Conditional Information (PCMCI) algorithm. To visualize the causal dependencies found by PCMCI, we plot Causal Effect Networks (CEN), which are directed network graphs.
By calculating the spatial covariance of the RW patterns within the green rectangles (shown in Fig. 1a, b), we quantify timeseries that capture the RW variability for both the western and eastern RW, referred to as RW W t and RW E t ("Method"). Figure 2a, g show the SST correlation with the RW W t and RW E t timeseries, respectively. By calculating the spatial covariance within the green rectangle of these SST correlation patterns, we capture the SST variability (SST W t and SST E t ) associated with the two Rossby waves ("Method"). We use the PCMCI algorithm that consists of two-steps: (1) an adaptation of the PC 39 (Peter and Clark) algorithm and (2) the Momentary Conditional Information metric 40 . If one is interested to test the causal relationship between the timeseries x t-1 and y t , first, the PC-step estimates the lagged parents of both timeseries (x t-1 and y t ) by iteratively performing conditional independence tests. The MCI-step tests if x t-1 and y t are conditionally independent, given the influence of the lagged parents of both x t-1 and y t . To measure conditional independence, we use partial correlation analyses, e.g., parcorr(x t-1 , y t |Z), in which Z is a single or set of timeseries to condition on. For more information, see "Method". To measure the SST-RW feedback on (sub)-synoptic timescales we perform the PCMCI analysis on 1, 5, 10 and 15 day means. To measure the effect of lower-frequency variability, we aggregate to 30, 45, 60 day mean timescales. For conciseness, we present the CEN's for 1, 5, 10, 30 and 60 day means, as for the other timescales (15 and 45 day mean) we only found instantaneous links (not informing us about the directionality of the forcing). Note that we focus on summer (June, July, August), yet when using 60-day means, we extend into May, June, July, and August to increase sample-size (2 datapoints per year instead of 1).
Since a CEN depicts causal links as a yes/no answer, we implement a sensitivity analysis to ensure we present robust links only. We do this by repeating the PCMCI analysis 70 times on slightly different subsets of data ("Method"). For the CENs, a link is only shown if it is significant (α FDR = 0.05) in 60 out of 70 perturbation experiments. The spread in the link strength that results from the sensitivity experiments also represents the uncertainty due to sampling. Given this spread, a double-sided t test is used to measure if the western SST-RW link strengths (n = 70 perturbations) are significantly different (α = 0.05) from the eastern SST-RW coupling (n = 70 perturbations), indicated with a * in Table 1.
The CENs for the west and east both show a positive two-way coupling between SST and the Rossby waves on daily timescales, but there are important differences (Fig. 2). The causal influence of the atmosphere on the ocean is more pronounced for the western US Rossby wave as compared to the eastern one. On the daily and 5-day timescale, the downward causal link for the west (RW W →SST W ) is stronger by about a factor 2 compared to the east (Table 1). Similarly, the instantaneous RW-SST link on these timescales is also stronger for the western RW (again by about a factor 2). On longer timescales than 5-day means, no robust directed links for the western RW are found (only instantaneous links).
On timescales longer than 5-day means, the influence of the ocean to the atmosphere is consistently stronger for the eastern Rossby wave compared to the western one. Using 60-day aggregated data, the 60-day lagged causal link SST E →RW E is stronger by roughly a factor 10 ( Table 1). This 60-day aggregated SST E tÀ1 ! RW E t link is very robust and found to be causal by PCMCI in all the 70 perturbation experiments. Figure 2 also shows that the SST E t timeseries has a higher persistence compared to SST W t , as indicated by the higher autostrength values (color of SST nodes). The auto-strength value is similar to the conventional autocorrelation, but accounts for factors that might artificially inflate it. For Fig. 2f, l, the autostrength is calculated by the partial correlation of SST t versus SST t-1 , conditioned on SST t-2 (parcorr(SST t-1 , SST t |SST t-2 )).
Note that for the eastern RW, for timescales longer than daily, there are no causal links from the atmosphere to the ocean. More precisely, the parcorr RW E tÀ1 ! SST E t jSST E tÀ1 À Á link is non-significant. From this, it can be deduced that the higher persistence of SST E t in the first place results from the past SST state (SST E tÀ1 ) with only a minor influence of antecedent atmospheric forcing (RW E tÀ1 ). To get a better understanding on the interaction between RW E and SST prior to the summer, we calculate the ocean-atmosphere coupling between SST and the eastern RW for winter and spring (see Supplementary Note 1). The ensure that we focus on the same RW pattern, we project the summer eastern RW pattern (as defined Fig. 1b) onto the winter and spring z500 field. Supplementary Fig. 8 verifies how the Rossby wave timeseries correlates with z500 variability. For JJA we retrieve a pattern very similar to what is shown in Fig. 1b, confirming that our RW E t index is a good proxy for the eastern Rossby wave. In winter and spring, the RW E t index projects strongly on the same eastern Rossby wave and additionally correlates with the tropical belt and is again similar to the Pacific-North-American pattern and the ENSO-forced teleconnection called the atmospheric bridge, which starts above the ENSO region and arcs over the Pacific-North American domain 41 . In Supplementary Fig. 9 the correlation maps in winter (panel a) and spring (panel g) between SST and the RW E t index show a clear resemblance to main features of the PDO pattern in its negative phase. The CENs in panels b-f show that during winter, we find a strong downward forcing (atmosphere-toocean). For spring, in addition to a strong downward forcing, we also observe two-way coupling on the 5-day mean timescale (panels h-l). In the next section, we use partial correlation to further investigate the importance of different processes in winter and spring for the upward (ocean-to-atmosphere) forcing that we find in summer (Fig. 2l).

Explaining the long-lead causal link
Here we show that the long-lead upward ocean-forcing that drives the eastern US Rossby wave in summer (as identified above), is closely related to low-frequency PDO variability. From here on, we work with 2-month mean data (instead of 60-day means) to ease interpretation. On the 2-month mean timescale, eastern RW correlation pattern is clearly in phase with the PDO pattern, while the western RW is not. This can be seen in Fig. 3 which shows the PDO (1st EOF loading) pattern together with the instantaneous and lag-1 (corresponding to a 2-months lag) correlations maps between SSTs and the western and eastern RW, respectively.
In Fig. 4, we investigate how the intra-seasonal evolution of the eastern Rossby wave, ENSO and PDO (at lag 2) affect the SSTA signal at lag 1. We do so by creating lag-1 correlation maps that are conditioned on different actors at lag 2. Figure 4a shows the SST-RW E correlation pattern at lag 1 (2 months). The correlation values and their significance are reduced when conditioning on RW E at lag 2 ( Fig. 4b), implying that the Rossby wave activity at lag 2 plays some role in forcing the SST signal at lag 1. We calculate the low-frequency (winter-to-spring mean) ENSO tÀ2 and PDO tÀ2 timeseries as described in "Method". When conditioning on ENSO tÀ2 , the SST signal does not weaken indicating that the mean ENSO state has little or no effect (Fig. 4c). Figure 4d shows the SST t-1 influence on RW E t is most effectively weakened when conditioning on the winter-to-spring PDO variability (PDO tÀ2 ). Thus, most of the information originates from winter-to-spring PDO variability. These results show that the lagged SST signalrelevant for forcing the May-August eastern RW (RW E t ) -is influenced in the first place by the winter-to-spring PDO state. An additional (but smaller) influence is provided by the prior atmospheric wave forcing. That they both show an influence is in line with strong co-variability between the PDO and RW E in spring ( Supplementary Fig. 10).

Temperature predictability
A robust lagged Pacific SST signal can only be found for temperature in the eastern US cluster, but not for the west (Fig. 5). This is in line with the lack of any long-lead causal links to the western RW (Fig. 2) or any significant lagged SST correlation for western US temperature variability (Fig. 1d). For the eastern US, the lagged SST signal is clearly strongest for July-August (Fig. 5). The correlating regions (Fig. 5, July-August mean) are clustered into the mid-and eastern Pacific region (Fig. 6), these two regions are used as a mask to calculate 1-dimensional spatial mean timeseries ("Method"). These timeseries will be used for predictions in "Results". This method is referred as the response-guided dimensionality reduction (DR), i.e., the dimensionality reduction is based on precursor regions that correlate with the target variable 16,42 .
We make out-of-sample Ridge Regression forecasts for July-August mean T E using two different precursor sets, i.e., The * indicates a significant (α = 0.05) difference given the uncertainty due to sampling (see "Results").

S. Vijverberg and D. Coumou
(a) precursors found by the response-guided DR and (b) the PDO climate index timeseries (Fig. 6). The test data is obtained by splitting the data into training and test years using a 10-fold stratified cross-validation (see "Method"). Both DR methods are done on only training data and extrapolated to the test set to make out-of-sample predictions. We use the correlation coefficient and the Mean Absolute Error Skill Score (MAE-SS) for the verification. The latter is defined as  where MAE dim is the error when always predicting the climatological mean of the T E anomalies (≈0).
The response-guided DR leads to better forecast skill compared to using the PDO as a predictor (Table 2 and "Discussion" for more details). We construct Ridge Regressions using lag 1, lag 1 and 2, and lag 1, 2 and 3 ( Table 2). We observe that using lag 1 and 2 (May-June and March-April) and the Response-guided DR provides the best predictive skill for July-August temperatures in eastern US (out-of-sample correlation value of 0.62). Hence, although there is a clear link to PDO variability, using a more tailored method to extract the signal renders a clear boost in skill (see "Discussion").

Window of opportunity by winter-to-spring PDO state
July-August mean eastern US temperature are substantially more predictable when the forecasted summer is preceded by a strong (instead of a weak) winter-to-spring PDO state (Fig. 7). We demonstrate this by comparing the skill during the 50% of years with strongest winter-to-spring PDO states (either positive or negative) with the skill of the 50% with weak PDO states. The winter-to-spring PDO state is defined by the DJFMAM mean PDO. During strong positive or negative PDO states, there is a 54% reduction in the MAE compared to years with weak PDO states (Fig. 7, left column). When comparing the forecast skill to the climatological benchmark (MAE-SS), we observe that most skill is present in these strong PDO state years (Fig. 7, right column). This result is robust when using different train-test splits. If we make a stricter selection of anomalous winter-to-spring PDO states (top 30%), the skill further increases with MAE-SS values ranging between 0.48 and 0.57 and correlation values ranging between 0.85 and 0.89 (Supplementary Table 1). During weak PDO state years, the model hardly outperforms a climatological mean temperature forecast.
Similarly, using partial correlation to remove the PDO tÀ2 from the lag 1 SST timeseries and the PDO tÀ3 from the lag 2 SST timeseries causes the forecast skill to vanish (mean MAE-SS¼ 0:03 0:20 À0:23 , with the lower and upper subscript denoting the C.I. at α = 0.05). Once more this indicates that the low-frequency antecedent PDO evolution is the background mechanism that is vital for predictability and that it can be used to identify a window of opportunity at the time of the forecast.

DISCUSSION
We show that two different Rossby waves are important drivers of temperature variability in western and eastern US, respectively (Fig. 1a, b). While both Rossby waves correlate equally strong with surface temperatures over the US on synoptic timescales (15-day means), a long-lead signal between temperature and SST is only present for the eastern US (Fig. 1c-f). As hypothesized in the introduction, the CEN analyses confirms that the associated summer eastern RW is forced by the low-frequency north-Pacific SST variability (Fig. 2).
We show that low-frequency PDO variability is a crucial aspect for this long-lead signal, and thus for predictability ("Results"). In our view, the mid-and eastern Pacific timeseries are the direct causal precursors, while the antecedent low-frequency PDO dynamics are vital to develop the persistent and high amplitude signal that is needed to force a persistent RW response in summer. This is in line with modeling experiments which show that a persistent [order of 2 months] SST forcing is needed for a  barotropic (RW-like) response to develop and that a stronger boundary forcing (higher amplitude SSTA) results in a stronger atmospheric response 13 . To first order, the PDO pattern arises from extra-tropical atmospheric forcing and the corresponding oceanic Rossby wave response 43,44 . This downward forcing is strongest in winter 31 , as is also observed in our winter CEN (Supplementary Note 2). Multiple processes are important for strengthening the PDO variability, such as (1) the re-emergence mechanism 32 , (2) the ENSO teleconnection named "the atmospheric bridge" 41,45,46 and (3) active ocean-atmosphere coupling 43,46 and the associated local positive feedbacks 47 . However, the relative importance of these processes is uncertain. Our CEN analysis quantifying the SST-RW coupling for winter and spring indeed support that processes (2) and (3) strengthen the PDO pattern (Supplementary Note 2). The CENs show that forcing in winter and spring is predominantly downward (from atmosphere to ocean). In spring, we also observe a more pronounced two-way feedback. Finally, the forcing is predominantly upward in summer. This is consistent with observational findings (their Fig. 3) 21 and previous work 23,48 .
Since persistence is a requirement to get a clear barotropic RW response, the spatial patterns of any low-frequency mode of SST variability will provide a physical constraint on the location and phase-position of quasi-stationary Rossby waves and therefore on the downstream surface impact of the Rossby waves. Results for the western RW supports the latter hypothesis, as its wave pattern is not in phase with the PDO pattern (Fig. 3). We argue that this is the reason why the western RW is not forced by the north-Pacific SSTs at longer timescales (Fig. 2), and therefore no long-lead SST signal is found for western US temperature (Fig. 1d). In contrast, the eastern RW is in phase with the PDO pattern (Fig. 3) resulting in a long-lead SST signal that forces the atmosphere (Fig. 2). The persistent SST forcing originates from the co-evolution of winter-to-spring PDO dynamics and the associated ocean-atmosphere interactions. Hence, these are the key process behind predictability for the eastern US summer temperature.
We show that using the mid-and eastern Pacific SST timeseries yields higher forecast skill compared to using the PDO index as a predictor ("Results"), in line with previous work 16 . The PDO timeseries captures variability in a much larger domain over the Pacific and therefore includes disturbances that are irrelevant to the Rossby wave forcing mechanisms described in the introduction. In contrast, the mid-and eastern Pacific regions are the core PDO regions whichbased on theoretical and modeling experimentsare expected to force an eastern RW-like response 13,21 . Moreover, while the PDO (simply explaining most variance of SSTA in the North Pacific) suggests that the mid-and eastern Pacific regions are part of the same variability, the correlation between the mid-and eastern Pacific timeseries is only −0.56. Using a separate mid-and eastern Pacific SST timeseries (extracted by the Response-Guided Dimensionality Reduction, RGDR), enables the Ridge regression to (1) learn a more detailed model and (2) use timeseries that are more directly related to the forcing of the RW. Nevertheless, the importance of the background PDO state is further illustrated by the considerable increase in forecast skill for the July-August mean temperature for years with a persistent high amplitude winter-to-spring PDO state ("Results").
Seasonal dependence of the lagged SST signal for eastern US temperature is evident from Fig. 5. The exact reason for this specific window of predictability is not fully understood yet. It might be explained by (1) the atmosphere being less chaotic in summer which results in a higher signal-to-noise-ratio or, (2) the seasonal cycle of solar radiation resulting in a stronger impact of high-pressure systems on surface temperature during summer months or (3), potentially amplifying effects of soil-moisture deficits become important near the end of summer 49 . We also note that it is likely that the persistent summer eastern RWforced by spring SSTsleads to both higher temperatures and reduced rainfall, thereby simultaneously affecting summer soilmoisture content. Similarly, the winter-to-spring atmospheric variability that is associated with a strong winter-to-spring PDO state might already affect rainfall over eastern US in those months.
We unraveled the role of ocean-atmosphere feedbacks that are driving long-lead predictability of eastern US summer temperature based on careful analyses with causal discovery algorithms. As shown in "Results", understanding the sources of predictability paves the way for identifying windows of enhanced S2S predictability and our approach might be successful in finding other potential windows of predictability.

METHOD Data
Our analysis relies on 42 years of data (1979-2020) from the ERA-5 reanalysis 50 . Daily maximum 2-meter temperature (on a 0.25°x 0.25°grid) is calculated by computing the daily maximum of the "maximum 2 m temperature since previous post-processing", with a step-size of 1 h. We use daily mean sea surface temperature (SST), geopotential height at 500 hPa (z500) and meridional wind at 300 hPa (v300), all on a 1 x 1 degree grid. Z500 Denotes the thickness of the atmospheric layer at 500 hPa, therefore, it clearly discriminates between high-and low-pressure systems, and it is directly affected by vortex stretching/compression that can arise due to diabatic anomalies 51 . Meridional wind (v) at 300 hPa is less affected by lower tropospheric disturbances, therefore, v300 is often used to investigate large-scale Rossby wave patterns 8,52 .
For the daily data, we determine the seasonal cycle by (1) applying a 25-day rolling mean, (2) calculating the multi-year mean of each day-ofyear of the smoothened timeseries and (3) then construct the final seasonal cycle by fitting the first 6 annual harmonics to the calculated seasonal cycle based on the smoothened data. The 1-dimensional target timeseries for the analyses in "Results" are aggregated from pre-processed daily data to 2-month means. We also use raw monthly mean SST data as input when analyzing data on the 2-monthly timescale for computational efficiency. For the raw monthly mean SST data, we construct the seasonal cycle by calculating the multi-year mean of each month. We subtract the seasonal cycle from the daily/monthly data and remove the climate change signal by subtracting the long-term linear trend of each gridcell.

Clustering North American temperature events
We use Hierarchical agglomerative clustering to identify coherently behaving regions 15 (Fig. 8). We apply the clustering on US gridcells and a part of Canada and Greenland (up to 70°N). Regions that tend to experience temperature above the 66th percentile simultaneously are clustered together. Because the dynamics behind temperature variability might be different at high elevation, we excluded all gridcells with an altitude above 1500 m (e.g., the Rocky Mountains). We performed the clustering for a range of temporal aggregations [5, 10, 15, 30 days] and number of clusters [4,5,6,7,8,9,10] to test for robustness. From the results presented in the Supplementary Method 1, we choose the two robust clusters, to simplify notations we refer to this as the western and eastern US cluster. By testing the spatial decorrelation radius within each cluster ( Supplementary Fig. 2), we verified that the size of the clusters is appropriate. Of these two clusters an area-weighted spatial mean temperature is calculated, rendering two 1-dimensional timeseries. The timeseries convey the western and eastern US daily maximum temperature variability, these are referred to by T W and T E , respectively.

Link between temperature, circulation, and sea surface temperature
To quantify the temperature versus z500 relationship, we aggregated to 15-day means and calculate one-point correlation maps (α FDR = 0.05) at lag 0 for both the west (T W ) and eastern US (T E ) temperature timeseries. We account for the False Discovery Rate using the Benjamini/Hochberg correction 53,54 . For Fig. 1 and Fig. 2a, g, we test for robustness of the correlation maps by re-calculating them on 70 subsets (36 years) sampled from the 42 years of data. See "Method" for more information. In the one-point correlation maps, gridcells are only presented as significant if they are found significant in 60/70 subsets of data. The RW pattern (RW pattern ) is defined by the significantly correlating gridcells within the green rectangle as shown in the z500 correlation maps (Fig. 1a, b). We reduce it to a 1-dimensional timeseries by calculating the areaweighted spatial covariance, i.e., using only the Nsignificantly correlating grid cells. Where w i denotes the area weight at grid cell i, RW pattern denotes a vector with the correlation values of the significantly correlating grid cells, the overbar denotes the spatial mean and z(t) denotes the geopotential height field at time t. Temperature correlates strongest with local geopotential height. The higher correlation values result in a much stronger weight for the local high-pressure system compared to adjacent lows and highs of the RW pattern. To obtain a RW timeseries where the high and lows have equal weights, we set all significant positively (negatively) correlating gridcells to 1 (−1). This is done only for the RW timeseries. We tested other options, but this method led to a timeseries that was best capable to reproduce the RW pattern that the timeseries is supposed to capture (Supplementary Method 2). We use this procedure to calculate both the west (RW W t ) and eastern US RW E t À Á RW timeseries, which we will use for the PCMCI and (partial) correlation analysis.

Causal effect network using PCMCI
To obtain the link between the RW timeseries and north-Pacific SST, we first calculate one-point correlation maps with SSTA versus both the west and eastern RW timeseries (Eq. 1). These correlation maps show the RW imprint on the SSTA. Substituting RW pattern for SST pattern and z for SST in Eq. (1), we obtain a 1-dimensional timeseries for the SST pattern. At this point, we have the SST pattern timeseries and the RW pattern timeseries, i.e., (SST W t and RW W t ) and (SST E t and RW E t ). To quantify the SST-RW coupling strength, we use the PCMCI algorithm 55 in combination with conditional independence (CI) tests based on partial correlation 56 . For each significantly correlating link, a partial correlation analysis is performed which is conditioning on all relevant information that might statistically inflate the correlation link strength. Fig. 8 Gridcells which show more frequent simultaneous warm temperature periods occurrences are combined using Hierarchical agglomerative clustering. Warm temperature periods are defined as 15-day mean temperature exceeding the 66th percentile. The white gridcells indicate (the Rocky) mountains (altitude > 1500 metre). These were left out of the analysis because temperature variability at high altitudes might have a different relationship with Rossby wave variability compared to low altitude gridcells. The purple contour lines indicate the western and eastern US spatial clusters.
are the estimated parents of x i t , excluding the to-betested variable x i iÀτ . All links are tested in both directions, as well as instantaneous, i.e., from τ min = 0 up to τ max . If the MCI is significant (α FDR = 0.05) when conditioning on all the parents and assuming the underlying assumptions are satisfied 35 , the link is deemed causal. When we state there is a causal link, it should be interpreted as causal within the context of the experiment, i.e., not the result of a spurious link due to the past SST evolution or RW occurrences, with the past (i.e., maximum lag considered) being limited by τ max .
Sensitivity analyses are performed by re-iterating the analysis workflow, i.e., from calculating the RW pattern and timeseries ("Method") up to the CEN, each time using a unique set of 36 out of the 42 years of data. Since we apply PCMCI repeatedly on different subsets of data and PCMCI tests many different dependency tests within the algorithm, we correct for the False Discovery Rate (FDR) using the Benjamin/Hochberg correction. With these sensitivity analyses, we are propagating uncertainties due to leaving out data through the entire workflow. Similar types of robustness/stability tests are becoming more common in the machine learning community 57,58 . The results of the sensitivity analyses are also used to quantitively compare the western vs the eastern CENs in Table 1.

Partial correlation maps
We use the partial correlation conditional independence tests to construct latitude, longitude maps where we test the influence of a potential confounder of interest. We use these maps to regress out the influence of the RW at lag 2, the low-frequency ENSO and the low-frequency PDO timeseries when testing the link between SST t-1 and RW E t . The lowfrequency variability is obtained by apply a 6-month rolling mean (indicated by ENSO t and PDO t ). When selecting the dates at lag 2 we approx. select the winter-to-spring mean timeseries. We ensure that the rolling mean is based on data prior and including lag 2 to avoid information leakage. The ENSO is calculated using the area-weighted nino3.4 bounding box [5°N-5°S, 170-120°W]. The PDO pattern is found by calculating the first area-weighted Empirical Orthogonal Function of Pacific SSTA [115-250°E,20-70°N]. For the (partial) correlation maps in Fig. 4, we use the same cross-validation as introduced in "Method" to obtain different subsets of data. Hence, the partial correlation maps are calculated 10 times on subsets of 36 years.

Forecasting
To investigate the seasonal dependence, we use a response-guided approach 16,42,59,60 . This approach encompasses methods that reduce dimensionality of the precursor field based on a relationship to a target, instead of using some statistic of the precursor field (e.g., maximizing the explained variance). First, we calculate one-point correlation maps based on training data (at lag 1). Second, adjacent regions of the same correlation sign are grouped together into precursor regions. This is done using the Density-based spatial clustering of applications with noise (DBSCAN) 61 . Third, for each precursor region, an area-weighted and correlation-value weighted spatial mean is calculated.
The resulting 1-dimensional timeseries are standardized and then fitted on the training data using a Ridge regression. The regularization parameter α is tuned using the default Generalized Cross-Validation 62 . The alphas range between 0.1 and 1.5, with 25 steps spaced evenly on a log scale with base = 10. The standardizing and fitting are done on the same training data as is used to calculate the correlation maps.
We use the Pearson-r correlation coefficient and mean absolute error skill score (MAE-SS) for verifying the deterministic forecasts. The MAE gives equal weights to each observation/forecast pair, making the analysis we present in Fig. 7 a fairer comparison between the two data subsets. It is defined as, MAE ¼ 1 n P n i¼n y pred;t À y true;t ; where n are the number of observation/forecast pairs, y pred , i is the predicted value at timestep t and y true is the observation at timestep t.
We implement a stratified 10-fold cross-validation (training sets consist of 36 or 35 yrs, test sets 4 or 5 yrs). The stratification is achieved by creating the training sets, such that these contain similar statistics in terms of the magnitude of July-August temperature values. This ensures that the training/test sets are good approximations of the climatological US temperature dynamics. Since we cannot reliably estimate the skill score based on a single test set of 4 years. We implement a double cross-validation for tuning the regularization parameter within each training sample, as done in 16 . This means that we fit (and tune) a statistical model on each training set and use that to forecast the test set. The verification metrics are computed on the 10 concatenated test sets.

CODE AVAILABILITY
Python code for all output presented in this article are publicly available in the Github release: https://github.com/semvijverberg/RGCPD/releases/tag/v0.95-a.