Introduction

The important role of the Atlantic Meridional Overturning Circulation (AMOC) in the climate system has been extensively studied1,2,3,4,5,6,7,8,9,10,11,12. Without an AMOC and its associated northward heat transport, northern and western Europe could be much colder1,2,5,6,9, Arctic sea ice could expand1, the Inter-Tropical Convergence Zone (ITCZ) could shift southward3,5,9, and sea level along the East Coast of North America could be higher12. Compared with these changes in the mean climate, the impact of the AMOC on extreme weather has not yet been investigated systematically. One reason is that previous generations of global climate models were designed primarily for studying large-scale, long-term climate rather than local daily weather, which requires high resolution, frequent data output, and a regional focus. Nonetheless, several recent studies have shown that an AMOC slowdown could contribute to summer heatwaves over Europe13,14, flooding and droughts15, stronger and more active Atlantic hurricanes16,17, and extratropical storms18.

During the past decade, NOAA's Geophysical Fluid Dynamics Laboratory (GFDL) has been working towards a unified, seamless modeling system suitable for studying weather, climate, and their complex interactions within a single framework. Recent progress in model development and the rapid growth of supercomputing power have provided better tools for tackling important weather-climate issues. Here, we use the high-resolution version (C192) of the GFDL global coupled modeling system CM4 (refs. 19,20,21,22,23; see the “Methods” section) to investigate the influence of the AMOC on extreme cold weather in the U.S. during winter. As low-frequency, high-impact events, extreme cold snaps can be disastrous (https://www.ncdc.noaa.gov/billions/), particularly for the southern U.S. states, where winter temperatures are typically mild24,25.

Results

Control simulation and water-hosing experiment with GFDL CM4C192

Under 1950 radiative forcing, a 100-year control simulation has been carried out with CM4C192 as part of GFDL’s participation in the High Resolution Model Intercomparison Project26. Owing to the refined resolution of both the atmosphere (0.5°) and the ocean (0.25°), CM4C192 better simulates synoptic-scale phenomena, including hurricanes and severe winter storms, atmospheric rivers and blocking, ocean eddies and jets, and storm surge and coastal flooding12,19,20,21,23. In addition, the simulated AMOC has a mean strength of about 18 Sv (1 Sv = 10⁶ m³ s⁻¹) at 26°N, which compares well with observations19,23 (Supplementary Fig. 1a).

To investigate the impact of the AMOC on mid-latitude weather, we consider an idealized case: a climate state without an active AMOC, with everything else kept the same. To obtain it, we perform a standard water-hosing experiment by imposing a 0.6 Sv freshwater input over the northern North Atlantic1,3 (see the “Methods” section for details). This design should produce strong, rapid signals that can be attributed clearly to the AMOC, avoiding complications from other factors. In addition, the high-resolution coupled model is computationally expensive, which currently precludes long transient and ensemble simulations.

In response to the freshwater perturbation, the AMOC almost shuts down in about 20 years (Supplementary Fig. 1b, c). The atmosphere in the Northern Hemisphere approaches a new quasi-equilibrium state after year 20. In the following analysis, we compare years 21–100 of the hosing experiment with the 100-year control run to identify response characteristics of daily weather to the AMOC shutdown.

Energy transport across 40°N and Bjerknes compensation between the ocean and atmosphere

In the control run of CM4C192, the atmosphere and ocean together transport up to 5.7 petawatts (PW; 1 PW = 10¹⁵ W) of heat poleward in the annual mean, compensating for the differential solar heating between low and high latitudes27,28,29 (Fig. 1a, b and Supplementary Fig. 2). In the Northern Hemisphere, the maximum total transport occurs at about 40°N. At mid-latitudes, the atmosphere mixes air of different temperatures and transports heat poleward very efficiently through fast-moving turbulent weather systems, especially during winter. In the annual mean, the oceanic transport of about 0.8 PW at 40°N, largely due to the AMOC16,30,31, is far smaller than its atmospheric counterpart of 4.8 PW, but it nonetheless represents an enormous amount of heat in the global energy balance (Fig. 1). Note that CM4C192 likely underestimates the northward heat transport in the Atlantic: the simulated maximum of about 1 PW at 26°N is lower than the recent observational estimate of about 1.3 PW16,31 (Fig. 1c).

We consider the atmosphere north of 40°N as a whole (the “northern atmosphere”) and perform a detailed heat budget analysis for December, January, and February (DJF). During boreal winter in the control, the northern atmosphere loses 13.3 PW of heat at the top of the atmosphere (TOA) but gains 6.1 PW from the surface (Fig. 1a). The resulting deficit of 7.2 PW is compensated by atmospheric heat transport across 40°N, mainly by mid-latitude weather processes, especially baroclinic transient eddies. Without an AMOC and its northward heat transport in the hosing experiment (Fig. 1c), the TOA heat loss decreases by 0.6 PW and the surface heat gain decreases by 1.1 PW (Fig. 1a). To compensate for the net increase in the heat deficit (1.1 − 0.6 = 0.5 PW), the atmosphere must transport about 0.5 PW more heat northward across 40°N. This “Bjerknes compensation” mechanism32,33,34,35,36 acts to stabilize the mean temperature and maintain the energy balance of the northern atmosphere in a climate without an AMOC.
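The budget arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses only the DJF numbers quoted in the text; the helper `transport_required` is a hypothetical name, and the sketch assumes no change in atmospheric heat storage, so the TOA loss minus the surface gain must equal the transport across 40°N.

```python
# DJF heat budget of the atmosphere north of 40°N (PW), from Fig. 1a.
# Assumes no change in atmospheric heat storage, so the budget must close.

def transport_required(toa_loss, sfc_gain):
    """Northward atmospheric heat transport across 40°N (PW) needed to
    balance the TOA heat loss against the surface heat gain."""
    return toa_loss - sfc_gain

control = transport_required(toa_loss=13.3, sfc_gain=6.1)
# Hosing run: TOA loss reduced by 0.6 PW, surface gain reduced by 1.1 PW
hosing = transport_required(toa_loss=13.3 - 0.6, sfc_gain=6.1 - 1.1)

print(f"control {control:.1f} PW, hosing {hosing:.1f} PW, "
      f"extra {hosing - control:.1f} PW")
# control 7.2 PW, hosing 7.7 PW, extra 0.5 PW
```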

Fig. 1: Energy balance of the northern atmosphere in the climates with and without an active AMOC.

a Schematic shows the energy balance for the entire atmosphere north of 40°N. The left half with black numbers (annual/DJF) shows heat fluxes (PW) at the top, bottom and southern boundaries in the long-term control run of CM4C192. The right half with red numbers shows the heat flux anomalies during years 21–100 of the hosing experiment relative to the control. The positive and negative values indicate enhanced and reduced heat fluxes, respectively. Only the annual mean value is shown for the oceanic transport. The blue and yellow shadings denote the atmosphere and AMOC, respectively. b Annual northward heat transport by the global atmosphere and global ocean as a function of latitude in the control run. c Annual northward heat transport of the global ocean and the Atlantic in the control and during years 21–100 of the hosing experiment. The green vertical dashed line marks 40°N.

The enhanced atmospheric heat transport during winter is achieved through more active weather processes at mid-latitudes33. In the control, intense north–south atmospheric heat exchange occurs over a broad region around 40°N. At 850 hPa, large atmospheric eddy temperature fluxes27 (v′T′; see “Methods” section) are found over eastern North America and the western North Atlantic, East Asia and the North Pacific, and Europe and the Middle East (Fig. 2a). These regions coincide with the mid-latitude storm tracks, where extratropical cyclones and anticyclones continuously develop and propagate, efficiently mixing warm and cold air masses. In particular, the U.S. east of the Rocky Mountains37 has some of the highest values of v′T′ (Fig. 2a).

Fig. 2: Enhanced atmospheric heat transport by transient eddies in response to the shutdown of AMOC.

a Atmospheric eddy temperature flux (v′T′) (°C m s⁻¹) at 850 hPa in the long-term control. v′ and T′ are band-pass filtered with a Lanczos filter to retain synoptic variations on 3–15-day timescales. Positive and negative values indicate northward and southward transport of sensible heat, respectively. The green asterisks mark Chicago, Houston, and New York. The thin grey lines show surface topography at 1000 m intervals. b Anomalies of the atmospheric eddy temperature flux (°C m s⁻¹) during years 21–100 of the hosing experiment relative to the control. c Anomalies of the surface heat flux (W m⁻²) during years 21–100 of the hosing experiment relative to the control; negative values indicate a reduction of the upward heat flux. The freshwater perturbation is input into the ocean region within the green box. All values in a, b, and c are for DJF. See Supplementary Fig. 2 for the TOA and surface heat fluxes in the control run.

The atmospheric eddy temperature flux is sensitive to both the change in AMOC heat transport and the surface heat flux anomalies in the northern Atlantic and the Arctic (Fig. 2c). After the AMOC shutdown in the hosing experiment of CM4C192, v′T′ increases substantially at northern latitudes (Fig. 2b). North of 40°N, the increase in the eddy sensible heat flux is concentrated over the northern North Atlantic, where the mean cooling is largest and is amplified by the sea ice feedback (Supplementary Fig. 3b). South of 40°N, the increases in v′T′ are pronounced over the eastern U.S. and the North Pacific (Fig. 2b). Note that a southward intrusion of a frigid Arctic air mass counts as a large northward temperature flux: both v′ and T′ are negative and large in magnitude, so their product is large and positive. In addition, the atmospheric eddy latent heat flux (v′q′) shows a consistent increase in the 20°–40°N latitude band (Supplementary Fig. 4).

Response of the U.S. extreme cold weather to the AMOC shutdown

During years 21–100 of the hosing experiment, the global annual mean surface air temperature is about 1 °C lower than in the control (Supplementary Fig. 5a). This global cooling, centered over the northern North Atlantic, results from cloud, water vapor, and sea ice feedbacks associated with the reduced northward ocean heat transport38,39. Other changes in the large-scale mean climate in the hosing experiment are generally similar to previous results1.

Next, we focus on the U.S. daily surface air temperature (Ts) in DJF. Compared with the ERA5 reanalysis40 for 1979–2021 (see “Methods” section), CM4C192 simulates the mean and daily variations of Ts in DJF well in the control run (Supplementary Fig. 6). For extremely cold temperatures, we evaluate model performance at Chicago, Houston, and New York, three large cities representing the Midwest, the South, and the Northeast of the U.S., respectively. At Chicago, the daily temperature anomaly relative to the daily climatology (ΔTs; see “Methods” section) reached its lowest value of −23.5 °C on January 31, 2019 in the detrended and deseasonalized ERA5 data (Fig. 3a, b). The extremeness of the recent Texas cold snap in February 2021 (https://en.wikipedia.org/wiki/February_2021_North_American_cold_wave) is even more striking: ΔTs at Houston plummeted to −23.4 °C on February 16, 2021, far colder than in previous extreme events (Fig. 3c, d). At New York, the coldest ΔTs, about −16.3 °C, occurred on January 18, 1982 and on February 20 and 24, 2015 (Fig. 3e, f).

Fig. 3: Data-model comparison of DJF daily temperature anomalies (ΔTs) at three cities of the U.S.

a, b Chicago; c, d Houston; e, f New York. a, c, e Time series of ERA5 for 1979–2021 and of the 50-year control simulation of CM4C192. Both curves are detrended and deseasonalized so that the mean is zero. The coldest ΔTs at each city in ERA5 is marked with its date of occurrence. b, d, f Histograms of the 42-year ERA5 data and the 100-year control simulation of CM4C192. Note that the x-axis uses a logarithmic scale and denotes probability (ci/N; ci, bin count; N, total count). The solid horizontal lines show the mean. The dashed horizontal lines denote the return levels for the 1-in-10-year and 1-in-100-year cold events. These values, along with the mean and three higher moments of the time series, are listed in the upper left corner, from left to right: mean, standard deviation, skewness, kurtosis, \(\widehat{\Delta T}_{\rm s}^{10}\), and \(\widehat{\Delta T}_{\rm s}^{100}\).

At the three cities, CM4C192 simulates the general statistics of ΔTs well in the control run, including the standard deviation, skewness, and kurtosis (Fig. 3). However, the model underestimates extreme cold events, as evidenced by its higher (less negative) 10-year and 100-year return levels (\(\widehat{\Delta T}_{\rm s}^{10}\) and \(\widehat{\Delta T}_{\rm s}^{100}\); see “Methods” section for the return level calculation), especially at Houston (Fig. 3). Differences in resolution and external forcing, as well as existing model biases, are among the possible reasons for the differences between ERA5 and the CM4C192 simulations.

After the AMOC shutdown in the hosing experiment, the intensity and frequency of extremely cold daily temperatures over the U.S. increase disproportionately compared with the mean temperature response (Figs. 4 and 5 and Supplementary Fig. 7). At Chicago, the 10-year and 100-year return levels (\(\widehat{\Delta T}_{\rm s}^{10}\) and \(\widehat{\Delta T}_{\rm s}^{100}\)) drop by 3.4 °C and 3.6 °C, respectively, in the hosing experiment, compared with a mean cooling of 1.6 °C relative to the control (Fig. 4a, b). \(\widehat{\Delta T}_{\rm s}^{100}\) in the control (−20.9 °C) is almost identical to \(\widehat{\Delta T}_{\rm s}^{10}\) in the hosing run (−20.8 °C), suggesting that the 100-year extreme cold event of the control could occur every 10 years at Chicago after the AMOC shutdown. At Houston, \(\widehat{\Delta T}_{\rm s}^{100}\) drops even more, by 4.6 °C, from −14.8 °C in the control to −19.4 °C in the hosing run (Fig. 4c, d). This change is more than five times larger than the mean cooling of 0.9 °C (Fig. 5f). Interestingly, this drop brings \(\widehat{\Delta T}_{\rm s}^{100}\) in CM4C192 closer to that of ERA5 (Fig. 3c, d). At New York, \(\widehat{\Delta T}_{\rm s}^{10}\) and \(\widehat{\Delta T}_{\rm s}^{100}\) drop by 5.6 °C and 5.4 °C, respectively, compared with a mean cooling of 2 °C (Fig. 4e, f). Extremely cold temperatures at or below the control value of \(\widehat{\Delta T}_{\rm s}^{100}\) (−15.7 °C) occur more frequently in the hosing experiment, on about 60 days.
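As a quick consistency check on these numbers, the extreme-to-mean ratio (the quantity mapped in Fig. 5f) can be computed from the city values quoted above. This sketch simply divides the quoted changes; the gridded ratio in Fig. 5f is evaluated per grid point and may differ slightly from these city-based values.

```python
# Changes (hosing minus control, °C) quoted in the text:
# (100-year return level, DJF mean)
changes = {
    "Chicago": (-3.6, -1.6),
    "Houston": (-4.6, -0.9),
    "New York": (-5.4, -2.0),
}

# Ratio of the extreme response to the mean response, as mapped in Fig. 5f.
ratios = {city: d_ext / d_mean for city, (d_ext, d_mean) in changes.items()}

for city, ratio in ratios.items():
    print(f"{city:9s} {ratio:4.1f}")
```

Houston stands out with a ratio above 5, matching the statement that its extreme response is more than five times the mean cooling.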

Fig. 4: Response of DJF daily temperature anomalies (ΔTs) at three cities of the U.S. in the hosing experiment.

a, b Chicago; c, d Houston; e, f New York. a, c, e Time series for 100 years, or 9000 DJF days. In both curves, the daily climatology of the control has been removed, so the mean cooling remains in the curve of the hosing run. b, d, f Histograms. The y-axis and x-axis are the temperature anomaly and the number of days, respectively; note that the x-axis uses a logarithmic scale. The solid horizontal lines show the long-term mean. The dashed horizontal lines denote the return levels for the 1-in-10-year and 1-in-100-year cold events. The statistics of the time series are listed in the upper left corner (from left to right: long-term mean, standard deviation, skewness, kurtosis, \(\widehat{\Delta T}_{\rm s}^{10}\), and \(\widehat{\Delta T}_{\rm s}^{100}\)). These statistics are calculated from years 1–100 of the control run and years 21–100 of the hosing run. The 90% confidence bounds of \(\widehat{\Delta T}_{\rm s}^{10}\) and \(\widehat{\Delta T}_{\rm s}^{100}\), quantified by bootstrapping, are shown in Supplementary Fig. 8.

Fig. 5: Changes in statistics of DJF daily temperature anomalies (ΔTs) over mid-latitude land areas in the hosing experiment.

a Long-term mean (°C), b standard deviation (°C), c skewness, d kurtosis, e 100-year return level (\(\widehat{\Delta T}_{\rm s}^{100}\); °C), and f ratio of the extreme (e) and mean (a) responses. The values show the changes in these statistics during years 21–100 of the hosing experiment relative to the long-term control. In f, large positive values over North America indicate amplified responses of extremely cold daily temperatures relative to the mean cooling, and negative values indicate that the extreme and mean temperature responses have opposite signs. See Supplementary Fig. 7 for these statistics in the long-term control simulation.

To assess the uncertainty associated with the extreme value analysis, we apply the two-sample Kolmogorov–Smirnov test to the annual coldest ΔTs at Chicago, Houston, and New York in the control and hosing runs. At the 5% significance level, the test rejects the null hypothesis that the control and hosing samples are drawn from the same distribution. For the return level estimates, we use the bootstrap method to quantify the 90% confidence bounds41 (Supplementary Fig. 8). The results confirm that the drops of \(\widehat{\Delta T}_{\rm s}^{10}\) and \(\widehat{\Delta T}_{\rm s}^{100}\) in the hosing experiment relative to the control are statistically significant at all three cities.
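A two-sample Kolmogorov–Smirnov test of this kind can be sketched with SciPy's `ks_2samp`. The samples below are synthetic Gumbel draws standing in for the annual coldest ΔTs (100 control winters and 80 hosing winters); they are illustrative assumptions, not model output, and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic stand-ins for the annual coldest DJF anomalies (°C): 100 control
# winters and 80 hosing winters (years 21-100). Not model output.
control_coldest = -rng.gumbel(loc=12.0, scale=2.5, size=100)
hosing_coldest = -rng.gumbel(loc=15.0, scale=3.0, size=80)

result = ks_2samp(control_coldest, hosing_coldest)
reject_h0 = result.pvalue < 0.05  # 5% significance level

print(f"D = {result.statistic:.2f}, p = {result.pvalue:.1e}, reject H0: {reject_h0}")
```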

Impact factors for the change in return levels

In the hosing experiment, the drops in the return levels of extremely cold temperatures could be caused by multiple factors42 (Fig. 5): the mean cooling, increased overall variance, reduced skewness, changes in the seasonal cycle (Supplementary Fig. 9), and individual extratropical cyclones and anticyclones that become stronger and propagate farther south. At New York, the mean cooling (−2.0 °C), the increased standard deviation (from 4.7 °C to 5.4 °C), and the reduced skewness (from 0.4 to 0), as well as more extreme individual weather events, all contribute to the drops of \(\widehat{\Delta T}_{\rm s}^{10}\) and \(\widehat{\Delta T}_{\rm s}^{100}\) in the hosing run (Fig. 4e). Similarly, these factors help explain the intensification of extreme cold weather over western Europe (Fig. 5 and Supplementary Fig. 7), together with the increase in snow cover there (Supplementary Fig. 10). By contrast, snow cover in the hosing experiment changes little over the U.S. because the mean cooling there is small (Fig. 5a).
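How the mean shift and the wider spread combine can be illustrated with a Gaussian toy model using the New York values quoted above. This is only an illustration under the assumption of normally distributed ΔTs: it ignores the skewness and kurtosis changes and the individual-event effects discussed here, and it uses the 1st percentile (a hypothetical choice) as a stand-in for a cold extreme rather than a GEV return level.

```python
from scipy.stats import norm

def cold_quantile(mean, std, p=0.01):
    """Lower p-quantile of a Gaussian, a stand-in for a cold extreme (°C)."""
    return norm.ppf(p, loc=mean, scale=std)

# New York values quoted in the text (°C); the control anomaly mean is zero
q_control = cold_quantile(mean=0.0, std=4.7)
q_mean_shift = cold_quantile(mean=-2.0, std=4.7)    # mean cooling only
q_mean_and_std = cold_quantile(mean=-2.0, std=5.4)  # plus the wider spread

for label, q in [("control", q_control), ("mean shift", q_mean_shift),
                 ("mean + std", q_mean_and_std)]:
    print(f"{label:10s} {q:6.1f} °C")
```

In this toy, the mean shift alone lowers the cold quantile by exactly 2 °C, while the wider spread adds more than 1.5 °C on top, which is why a change in variance can make the extreme response exceed the mean response.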

By comparison, the drops of \(\widehat{\Delta T}_{\rm s}^{10}\) and \(\widehat{\Delta T}_{\rm s}^{100}\) at Houston are mainly caused by individual extreme weather events rather than by the overall variability and skewness (Fig. 4c). This is consistent with the increase in kurtosis, which measures the tailedness of the temperature distribution (i.e., the prevalence of outliers). In fact, the large drops of \(\widehat{\Delta T}_{\rm s}^{100}\) over the Great Plains just east of the Rocky Mountains are related to increased kurtosis, which also dominates the ratio of the extreme and mean responses (Fig. 5d–f). The AMOC shutdown sharpens the meridional temperature gradient at northern mid-latitudes and increases the baroclinicity of the atmosphere, leading to stronger weather systems that propagate farther south.

Note that the analysis above is based on daily temperature anomalies (ΔTs) relative to the daily climatology of the control (\(\tilde{T}_{\rm s}\)). Because the seasonal cycle has relatively small curvature in DJF (Supplementary Fig. 9), the largest negative anomalies also correspond to the coldest local weather during winter. Among the three cities, Chicago is located in the continental interior and is generally colder than coastal Houston and New York. The absolute daily temperature (Ts) at Chicago can drop as low as −27.4 °C in the hosing run of CM4C192, compared with the coldest temperatures of −14.3 °C at Houston and −21.6 °C at New York.

Conclusions

In this study, we use a state-of-the-art, high-resolution global weather/climate modeling system to investigate the influence of the AMOC on extreme winter weather. Because the U.S. lies upwind of the North Atlantic, its mean winter temperatures are thought to be less influenced by the AMOC than those on the downwind European side (Fig. 5a and Supplementary Fig. 3b). From a simple energy balance perspective, without invoking detailed atmospheric dynamics, we show here that the AMOC can modulate daily temperature extremes more efficiently over the U.S. (Fig. 5e). The AMOC shutdown and the reduced northward heat transport in the Atlantic can excite more extremely cold weather over the U.S. during winter. This amplified response in the tail of the temperature distribution can be several times larger than that of the mean (Fig. 5f).

This sensitivity of extreme weather over the land interior to the deep ocean circulation may seem surprising, but it is nevertheless a robust response required by Bjerknes compensation. Because the mountain ranges over North America are oriented north–south (Fig. 2), winter Arctic outbreaks can push frigid polar air masses from Canada all the way south to the Gulf of Mexico. We find that this channel of intense atmospheric heat exchange becomes even more active after the AMOC shutdown, intensifying extreme cold events over the U.S. In other words, an active AMOC in the present-day climate likely makes U.S. winters less harsh and extreme.

According to some recent observational studies, the AMOC has weakened during the past century43. In particular, the northward heat transport at 26°N in the North Atlantic decreased by 0.17 PW, from 1.32 PW during 2004–2008 to 1.15 PW during 2009–2016, as a result of a recent AMOC slowdown event31. This reduction in ocean heat transport influenced the northern atmosphere through heat flux anomalies at the ocean surface. Its magnitude represents a sizeable fraction of the reduction induced by the AMOC shutdown in the CM4C192 simulations (Fig. 1).

Nevertheless, the model simulations carried out here constitute a sensitivity study. Given the highly idealized nature of the hosing experiment, one should be cautious about its implications for extreme cold weather in future climates. This caution is underscored by the opposite trends in mean temperature and Arctic sea ice between the ERA5 data and the CM4C192 simulation (Supplementary Fig. 3). In addition, compared with a shutdown, an AMOC slowdown could cause a similar but more gradual response of extreme weather. Despite these caveats, one thing is certain: Bjerknes compensation, which follows from the basic law of energy conservation, should continue to operate in future climates. Anything that alters one pathway of energy flow will trigger a response in the others.

Methods

The GFDL CM4C192 model

CM4C192 is the high-resolution version of the latest generation of climate models developed and used at GFDL19. On various metrics, it ranks among the best-performing CMIP6 models44. The atmospheric model (AM4)20,21,22 adopts a finite-volume cubed-sphere dynamical core with 192 grid cells along each edge of a cube face (~0.5° grid spacing). It has 33 vertical levels, and the model top is located at 1 hPa. The model incorporates updated physics such as a double-plume scheme for shallow and deep convection and a new mountain gravity wave drag parameterization21. Owing to improvements in resolution, physics, and dynamics, CM4C192 simulates strong synoptic systems such as hurricanes45 and atmospheric rivers22 well.
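The nominal ~0.5° spacing follows from the cubed-sphere geometry: four of the six cube faces wrap around the equator. The sketch below assumes a mean Earth radius of 6371 km; actual C192 cell sizes vary across each face, so these are nominal values only.

```python
import math

N_FACE = 192            # cells along each edge of a cube face (C192)
R_EARTH_KM = 6371.0     # mean Earth radius (assumed value)

# Four of the six cube faces wrap around the equator.
cells_around_equator = 4 * N_FACE
dx_deg = 360.0 / cells_around_equator
dx_km = 2 * math.pi * R_EARTH_KM / cells_around_equator
total_columns = 6 * N_FACE**2

print(f"{dx_deg:.3f} deg, {dx_km:.0f} km, {total_columns} columns")
```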

The ocean model of CM4C192 is based on the Modular Ocean Model version 6 (MOM6)23. It uses the Arbitrary Lagrangian–Eulerian (ALE) algorithm in the vertical, which allows different vertical coordinates, including geopotential and isopycnal coordinates, to be combined. The model adopts the C-grid stencil in the horizontal and is configured on a tripolar grid. It has an eddy-permitting horizontal resolution of 0.25° and 75 hybrid vertical layers down to a maximum bottom depth of 6500 m. The vertical grid spacing can be as fine as 2 m near the ocean surface.

Daily or even hourly data of important atmospheric variables are saved to facilitate analyses of weather and extreme events. These variables include surface air temperature (Ts), precipitation, sea level pressure, atmospheric temperature (T) at 250 and 850 hPa, zonal and meridional winds (u, v) at 250 and 850 hPa, and specific humidity (q) at 850 hPa. The model uses a noleap calendar with 365 days in every year.

Control run and water-hosing experiment with CM4C192

The initial condition is taken from a long-term control simulation under 1850 radiative forcing. During the 100-year control run under 1950 radiative forcing, the global mean surface air temperature shows a slight increase (Supplementary Fig. 5a). This drift arises mainly from some high-latitude regions. At low and mid-latitudes, Ts is quite stable in the control run, without any clear trend (Supplementary Fig. 5b–d).

In the water-hosing experiment, a 0.6 Sv freshwater flux is added uniformly to the northern North Atlantic over the region 65°W–5°E, 50°N–75°N (green box in Fig. 2c) for 100 years. This freshwater addition is not compensated elsewhere, so it leads to about 5 m of global sea level rise over the 100-year period. The added freshwater enters at the local sea surface temperature. Thus, while it is a mass source that reduces regional and global ocean salinity, it is not a heat source or sink and therefore does not affect the heat budget analysis here.
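The quoted sea level rise can be checked with back-of-the-envelope arithmetic. This sketch assumes a global ocean area of about 3.6 × 10¹⁴ m² and ignores density (steric) changes and any change in ocean area.

```python
SECONDS_PER_YEAR = 365 * 86_400   # noleap calendar, as used by the model
OCEAN_AREA_M2 = 3.6e14            # approximate global ocean area (assumed)

hosing_flux = 0.6e6               # 0.6 Sv in m^3 s^-1
years = 100

added_volume = hosing_flux * SECONDS_PER_YEAR * years   # m^3
sea_level_rise = added_volume / OCEAN_AREA_M2           # m, no steric effects

print(f"{added_volume:.2e} m^3 -> {sea_level_rise:.1f} m")
```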

Atmospheric and oceanic heat transport

In this study, we use both direct and indirect methods to calculate the heat transport by the atmosphere and ocean. In the long-term control run, assuming negligible long-term heat storage, the total northward heat transport by the global atmosphere and ocean at latitude ϕ can be estimated by integrating the net radiative flux at TOA from the South (or North) Pole to latitude ϕ:

$$Q_{\rm t}(\phi)=\int_{-\pi/2}^{\phi}\int_{0}^{2\pi}F_{\rm TOA}\,R^{2}\cos\phi^{\prime}\,{\rm d}\lambda\,{\rm d}\phi^{\prime}$$
(1)

Qt is the total northward heat transport; FTOA is the net downward radiative flux at TOA; R is Earth’s radius; λ and ϕ are longitude and latitude, respectively. Similarly, the atmospheric heat transport (Qa) is estimated as

$$Q_{\rm a}(\phi)=\int_{-\pi/2}^{\phi}\int_{0}^{2\pi}(F_{\rm TOA}-F_{\rm sfc})\,R^{2}\cos\phi^{\prime}\,{\rm d}\lambda\,{\rm d}\phi^{\prime},$$
(2)

where Fsfc is the net downward heat flux at the surface.
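Equations (1) and (2) can be evaluated numerically as cumulative sums over latitude bands. This is a minimal sketch, assuming a regular latitude-longitude grid of cell centres ordered from south to north and a net downward flux in W m⁻²; the function name `northward_transport` is hypothetical, and the model's native cubed-sphere grid would require its own cell areas. The idealised flux F0(1 − 3 sin²ϕ) is chosen because its global integral vanishes and its transport has the closed form 2πR²F0 sin ϕ cos²ϕ, which the sketch reproduces.

```python
import numpy as np

R_EARTH = 6.371e6  # m, mean Earth radius

def northward_transport(flux, lat_deg, lon_deg):
    """Eqs. (1)-(2): integrate a net downward flux (W m^-2) from the South
    Pole. flux has shape (nlat, nlon) on a regular grid of cell centres with
    latitude increasing northward; returns the northward transport (W)
    across the northern edge of each latitude band."""
    dlat = np.deg2rad(lat_deg[1] - lat_deg[0])
    dlon = np.deg2rad(lon_deg[1] - lon_deg[0])
    # Cell area R^2 cos(phi) dlambda dphi
    area = R_EARTH**2 * np.cos(np.deg2rad(lat_deg))[:, None] * dlon * dlat
    band_total = (flux * area).sum(axis=1)   # integral over longitude
    return np.cumsum(band_total)             # integral from the South Pole

# Idealised test flux with zero global mean
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(0.5, 360.0, 1.0)
F0 = 100.0
profile = F0 * (1.0 - 3.0 * np.sin(np.deg2rad(lat)) ** 2)
flux = np.repeat(profile[:, None], lon.size, axis=1)

Q = northward_transport(flux, lat, lon)
edges = np.deg2rad(lat + 0.5)
Q_exact = 2 * np.pi * R_EARTH**2 * F0 * np.sin(edges) * np.cos(edges) ** 2
print(f"max northward transport: {Q.max() / 1e15:.1f} PW")
```

Under the same assumptions, Qa follows by passing the field FTOA − Fsfc instead of FTOA.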

We adopt the direct method to calculate the heat transport in each ocean basin: the meridional heat flux is integrated zonally from the western to the eastern boundary and vertically from the bottom to the surface, and the results are then summed over the ocean basins.

$$Q_{\rm o}(\phi)=\sum_{\rm basin}\int_{-H}^{\eta}\int_{\rm w}^{\rm e}\rho_{\rm w}c_{\rm p}Tv\,R\cos\phi\,{\rm d}\lambda\,{\rm d}z$$
(3)

Qo is the global ocean heat transport, T the ocean potential temperature, v the ocean meridional velocity, ρw the seawater density, and cp the specific heat capacity of seawater; η and H denote the ocean surface and bottom, and w and e the western and eastern boundaries of each basin, respectively.
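A discrete form of Eq. (3) for one latitude row is sketched below. The constant reference density and heat capacity are assumed values, not necessarily those used in MOM6, the function name is hypothetical, and the toy two-layer overturning (18 Sv of 18 °C water northward above a 3 °C return flow) is purely illustrative. Because the toy section carries no net mass, the result does not depend on whether T is expressed in °C or K.

```python
import numpy as np

RHO_W = 1035.0  # kg m^-3, constant reference density (assumed)
CP_W = 3992.0   # J kg^-1 K^-1, seawater heat capacity (assumed)

def ocean_heat_transport(T, v, dx, dz, mask):
    """Discrete Eq. (3) along one latitude row.

    T, v : (nz, nx) potential temperature (°C) and meridional velocity (m s^-1)
    dx   : (nx,) zonal cell widths R cos(phi) dlambda (m)
    dz   : (nz,) layer thicknesses (m)
    mask : (nz, nx) True for ocean cells of the basins being summed
    Returns the northward heat transport (W).
    """
    cell_area = dz[:, None] * dx[None, :]          # vertical face area (m^2)
    return float(np.sum(RHO_W * CP_W * T * v * mask * cell_area))

# Toy two-layer overturning: 18 Sv of 18 °C water flows north in the top
# 1000 m and returns south as 3 °C water below, across a 5000 km-wide basin.
psi = 18e6                                         # m^3 s^-1
dx = np.array([5.0e6])
dz = np.array([1000.0, 3000.0])
v = np.array([[psi / (dz[0] * dx[0])], [-psi / (dz[1] * dx[0])]])
T = np.array([[18.0], [3.0]])
Q = ocean_heat_transport(T, v, dx, dz, np.ones_like(T, dtype=bool))
print(f"{Q / 1e15:.2f} PW")
```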

Sensible and latent heat fluxes from atmospheric transient eddies

To calculate the atmospheric eddy heat fluxes, we apply a Lanczos bandpass filter46 to daily atmospheric temperature (T), specific humidity (q), and meridional wind (v) to isolate their variations on the synoptic timescale of 3–15 days. The seasonal cycle is removed from each time series before the filter is applied.

$$x^{\prime}(t)=\sum_{k=-L}^{L}w(k)\,x(t-k)$$
(4)
$$w(k)=\left(\frac{\sin 2\pi f_{2}k}{\pi k}-\frac{\sin 2\pi f_{1}k}{\pi k}\right)\frac{\sin(\pi k/L)}{\pi k/L}$$
(5)
$$k=-L,\ldots ,0,\ldots ,L$$

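x and x′ represent the original and filtered time series of T, q, or v, respectively. f1 and f2 are the cutoff frequencies of the bandpass filter, here 1/15 and 1/3 cycles per day. w(k) is the set of weights within the filter window (L = 25, i.e., 2L + 1 = 51 weights); at k = 0, both factors in Eq. (5) take their limiting values, giving w(0) = 2(f2 − f1).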
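The filter of Eqs. (4) and (5) and the resulting eddy flux v′T′ can be sketched in NumPy as follows. This is an illustration under stated assumptions: the function names are hypothetical, the input series are assumed to be deseasonalized daily data, and the first and last L days, where the window is incomplete, are simply masked. The final line illustrates the sign argument made in the Results: a northerly (v′ < 0), cold (T′ < 0) anomaly contributes a positive v′T′.

```python
import numpy as np

def lanczos_bandpass_weights(f1, f2, L):
    """Weights w(k), k = -L..L, of Eq. (5); f1 < f2 in cycles per day."""
    k = np.arange(-L, L + 1, dtype=float)
    w = np.empty_like(k)
    nonzero = k != 0
    kn = k[nonzero]
    w[nonzero] = (np.sin(2 * np.pi * f2 * kn) - np.sin(2 * np.pi * f1 * kn)) / (np.pi * kn)
    w[~nonzero] = 2.0 * (f2 - f1)          # k -> 0 limit of the bracket
    sigma = np.sinc(k / L)                 # np.sinc(x) = sin(pi x)/(pi x)
    return w * sigma

def bandpass(x, f1=1 / 15, f2=1 / 3, L=25):
    """Eq. (4) applied to a deseasonalized daily series x; the first and last
    L values, where the filter window is incomplete, are set to NaN."""
    w = lanczos_bandpass_weights(f1, f2, L)
    y = np.convolve(x, w, mode="same")      # y[t] = sum_k w(k) x(t - k)
    y[:L] = np.nan
    y[-L:] = np.nan
    return y

def eddy_flux(v, T, **kw):
    """Time mean of v'T' from band-passed v and T (NaN edges ignored)."""
    return np.nanmean(bandpass(v, **kw) * bandpass(T, **kw))

# A 7-day wave passes the 3-15-day band; a 60-day wave is suppressed.
t = np.arange(2000)
fast = np.sin(2 * np.pi * t / 7)
slow = np.sin(2 * np.pi * t / 60)
print(np.nanmax(np.abs(bandpass(fast))), np.nanmax(np.abs(bandpass(slow))))

# Cold northerly outbreak: v' < 0 and T' < 0 together give v'T' > 0.
print(eddy_flux(-fast, -fast) > 0)
```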

Analysis on extreme daily surface air temperature

The anomaly of daily surface air temperature is the departure from its daily climatology.

$$\Delta T_{\rm s}(x,y,t)=T_{\rm s}(x,y,t)-\tilde{T}_{\rm s}(x,y,t_{1}),\quad t_{1}=1,2,\ldots,365$$
(6)

Ts, \(\tilde{T}_{\rm s}\), and ΔTs are the daily temperature, its climatology, and its anomaly, respectively, and t1 is the calendar day of t. As DJF comprises the three coldest months in the Northern Hemisphere mid-latitudes, \(\tilde{T}_{\rm s}\) varies relatively little over DJF compared with the annual cycle (Supplementary Fig. 9). Note that ΔTs in the hosing experiment is calculated relative to \(\tilde{T}_{\rm s}\) of the control, so changes in the seasonal cycle (mean, amplitude, and timing) in the hosing run also contribute to ΔTs (Supplementary Fig. 9).
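Equation (6) and the use of the control climatology for the hosing run can be sketched for a noleap record as follows. The sketch assumes continuous daily series starting on January 1 and uses the raw calendar-day mean as the climatology (no smoothing, which is an assumption); the function name and the synthetic data are hypothetical. The DJF index set contains 90 days, consistent with 9000 DJF days in 100 years.

```python
import numpy as np

DAYS_PER_YEAR = 365                 # noleap calendar
DJF = np.r_[0:59, 334:365]          # 0-based day-of-year indices of Jan, Feb and Dec

def daily_anomaly(ts_daily, clim=None):
    """Eq. (6) for a continuous noleap record ts_daily of shape (nyears*365,)
    starting on January 1. If clim is None the raw 365-day climatology is
    computed from ts_daily itself; otherwise the given one (e.g. from the
    control run) is removed. Returns anomalies of shape (nyears, 365)."""
    ts = ts_daily.reshape(-1, DAYS_PER_YEAR)   # (year, calendar day t1)
    if clim is None:
        clim = ts.mean(axis=0)
    return ts - clim, clim

# Synthetic control and hosing records (°C); the hosing run is 1.5 °C colder
rng = np.random.default_rng(1)
seasonal = -10.0 * np.cos(2 * np.pi * (np.arange(365) - 15) / 365)
control = np.tile(seasonal, 100) + rng.normal(0.0, 5.0, 100 * 365)
hosing = np.tile(seasonal, 80) - 1.5 + rng.normal(0.0, 5.0, 80 * 365)

anom_control, clim_control = daily_anomaly(control)
anom_hosing, _ = daily_anomaly(hosing, clim=clim_control)   # control climatology
print(anom_control[:, DJF].mean(), anom_hosing[:, DJF].mean())
```

Because the hosing anomalies are taken relative to the control climatology, the mean cooling stays in the hosing anomalies, as in Fig. 4.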

To calculate return levels of extremely cold daily temperatures, we use the block maxima approach of extreme value analysis41,47. We consider the time series of −ΔTs and select the maximum daily value (i.e., the coldest daily temperature) in DJF of each year. We then fit the generalized extreme value (GEV) distribution to the annual maxima of −ΔTs:

$$G(x)=\exp\left\{-\left[1+k\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/k}\right\}$$
(7)
$$1+k\frac{x-\mu }{\sigma }\, > \,0$$

k, σ, and μ are the shape, scale, and location parameters of the GEV distribution, respectively. For k = 0 (taken as the limit k → 0), the GEV distribution reduces to the Gumbel distribution; for k > 0 and k < 0, it becomes the Fréchet and Weibull distributions, respectively. Once the three parameters are determined, the return levels (\(\widehat{\Delta T}_{\rm s}^{10}\) and \(\widehat{\Delta T}_{\rm s}^{100}\)) can be estimated with the inverse cumulative distribution function of the GEV distribution. For example,

$$-\widehat{\Delta T}_{\rm s}^{100}=\mu-\frac{\sigma}{k}\left\{1-\left[-\ln\left(1-\frac{1}{100}\right)\right]^{-k}\right\}$$
(8)
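The GEV fit and Eq. (8) can be sketched with SciPy's `genextreme`. Note that SciPy's shape parameter c has the opposite sign to k in Eq. (7), so k = −c. The annual maxima below are synthetic and the function name is hypothetical; the sketch only shows that `genextreme.isf` and Eq. (8) give the same return level.

```python
import numpy as np
from scipy.stats import genextreme

def gev_return_level(annual_max, T_years):
    """Fit a GEV to annual maxima of -ΔTs and return (return level of ΔTs
    in °C, (k, mu, sigma)). SciPy's shape parameter c equals -k in Eq. (7)."""
    c, loc, scale = genextreme.fit(annual_max)
    z = genextreme.isf(1.0 / T_years, c, loc=loc, scale=scale)  # exceeded once per T years
    return -z, (-c, loc, scale)

# Synthetic annual maxima of -ΔTs (°C) for 100 winters; not model output.
rng = np.random.default_rng(3)
annual_max = genextreme.rvs(0.1, loc=12.0, scale=2.5, size=100, random_state=rng)

rl10, params = gev_return_level(annual_max, 10)
rl100, _ = gev_return_level(annual_max, 100)

# Eq. (8) written out with the fitted (k, mu, sigma) reproduces SciPy's value.
k, mu, sigma = params
rl100_eq8 = -(mu - sigma / k * (1 - (-np.log(1 - 1 / 100)) ** (-k)))
print(f"10-year {rl10:.1f} °C, 100-year {rl100:.1f} °C (Eq. 8: {rl100_eq8:.1f} °C)")
```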

To assess the uncertainty of the return level estimates and determine whether the changes in return level in the hosing experiment are statistically significant, we use the bootstrap method41,48 to generate 10,000 resamples of the annual maxima of −ΔTs and quantify the 90% confidence bounds.
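A percentile bootstrap of the return level can be sketched as below. The source does not state whether a parametric or non-parametric bootstrap was used; this sketch assumes non-parametric resampling of the annual maxima with percentile bounds, the function name is hypothetical, and the demonstration uses a small number of resamples (instead of 10,000) on synthetic data so that it runs quickly.

```python
import numpy as np
from scipy.stats import genextreme

def bootstrap_return_level(annual_max, T_years, n_boot=10_000, conf=0.90, seed=0):
    """Percentile bootstrap bounds for the T-year return level of ΔTs (°C).

    Resamples the annual maxima of -ΔTs with replacement, refits the GEV to
    each resample and returns the (lower, upper) bounds at level `conf`."""
    rng = np.random.default_rng(seed)
    levels = np.empty(n_boot)
    for i in range(n_boot):
        resample = rng.choice(annual_max, size=annual_max.size, replace=True)
        c, loc, scale = genextreme.fit(resample)
        levels[i] = -genextreme.isf(1.0 / T_years, c, loc=loc, scale=scale)
    alpha = (1.0 - conf) / 2.0
    lower, upper = np.quantile(levels, [alpha, 1.0 - alpha])
    return lower, upper

# Synthetic annual maxima of -ΔTs for 100 control winters (not model output);
# a small n_boot keeps the demonstration fast.
rng = np.random.default_rng(4)
control_max = genextreme.rvs(0.1, loc=12.0, scale=2.5, size=100, random_state=rng)
lo, hi = bootstrap_return_level(control_max, 100, n_boot=100)
print(f"90% bounds of the 100-year return level: [{lo:.1f}, {hi:.1f}] °C")
```

Running the function separately on the control and hosing maxima and comparing the two intervals is one way to judge whether a drop is significant; the exact criterion is not specified here.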

ERA5 reanalysis

ERA5 combines vast amounts of historical observations with advanced modeling and data assimilation to provide global estimates of the atmosphere40. For the data-model comparison in this study, we use 3-hourly global surface air temperature data from January 1, 1979 to February 28, 2021. The data, with a 0.25° horizontal resolution, were downloaded from the Copernicus Climate Change Service (https://doi.org/10.24381/cds.adbb2d47). February 29 of leap years is removed before the data-model comparison.
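Matching the 3-hourly ERA5 record to the model's noleap calendar can be sketched with NumPy datetime arithmetic. The source does not say how daily values were derived from the 3-hourly data; this sketch assumes daily means of the eight 3-hourly values starting at 00 UTC, and the function name is hypothetical.

```python
import numpy as np

def noleap_daily_means(values, times):
    """Average 3-hourly values (8 per day, starting at 00 UTC) to daily means
    and drop February 29 so the record matches a 365-day calendar."""
    if values.size % 8:
        raise ValueError("expected whole days of 3-hourly data")
    daily = values.reshape(-1, 8).mean(axis=1)
    days = times[::8].astype("datetime64[D]")
    month_start = days.astype("datetime64[M]")
    month = month_start.astype(int) % 12 + 1
    day_of_month = (days - month_start.astype("datetime64[D]")).astype(int) + 1
    keep = ~((month == 2) & (day_of_month == 29))
    return daily[keep], days[keep]

# 3-hourly axis from 1979-01-01 00 UTC to 2021-02-28 21 UTC
times = np.arange(np.datetime64("1979-01-01T00"), np.datetime64("2021-03-01T00"),
                  np.timedelta64(3, "h"))
values = np.zeros(times.size)            # placeholder for temperature data
daily, days = noleap_daily_means(values, times)
print(daily.size, days[0], days[-1])
```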