Main

Improving our ability to characterize internal and natural climate variability is pivotal to improving our long-term climate projections1, particularly at the regional scale2. The statistics of natural variability can be quantified up to decadal timescales using instrumental observations, but indirect climate proxies are needed for longer timescales.

Proxy-model comparisons aiming to validate climate models generally focus on whether long-term (forced) trends found in proxies can be reproduced by global climate models (GCMs)3,4. In this regard, trends over the Holocene from pollen-based climate reconstructions in northern mid-latitudes were shown to be consistent with the forcing and simulated warming in the TraCE-21ka experiment5. However, the long-term multi-millennial trend is a rather limited measure that does not evaluate the ability of climate models to simulate realistic climate variability, both forced and internal, at all timescales. Furthermore, climate reconstructions of continental or global average temperatures based on sedimentary proxies generally do not preserve the amplitude of millennial climate variability because of age uncertainty (Methods and Extended Data Fig. 1). To evaluate climate variability across timescales, we consider the power spectral density (PSD, or simply spectrum), St), which provides an estimate of how variance is distributed with frequency, or equivalently with timescale Δt. Others6 have provided the classical view of climate variability being dominated by specific quasi-periodic processes, such as solar variations linked to orbital cycles over a mostly random and uncorrelated (that is, white) noise background. Since then, an increasing number of climate proxies and model simulations7 have shown a contrasting view of a variance that continuously increases with increasing timescale. Such a behaviour is often well-approximated by a power-law-scaling relationship8,9 and can be summarized by the scaling exponent β such that \({{S\left( {\Delta t} \right)} \sim {\Delta t^\beta}}\).

Several studies have suggested that GCMs can simulate realistic climate variability for global and continental mean temperatures across timescales7,10,11 that are dominated by the response to external forcing. However, at the regional scale, where only a small fraction of the variability is forced12, the capability of GCMs has been called into question for decadal and longer timescales13,14,15. Marine proxies suggest a strong scaling of regional sea-surface temperature (SST) variance from annual to millennial timescales (β ≈ 1)9 that contrasts with the weaker scaling found in GCMs at longer-than-decadal timescales (β < 0.5 in most regions)15,16,17. This leads to an increasing discrepancy between reconstructed SSTs and model simulations at longer timescales, reaching two orders of magnitude in variance at the millennial timescale8.

On land, instrumental temperatures show a fundamentally different behaviour compared to the oceans, with only a weak timescale dependency up to multi-decadal timescales (β ≈ 0.1)16,17,18. This low-frequency weather has been termed ‘climate noise’19 or the ‘macroweather regime’20 and starts at sub-monthly timescales. This difference can be explained by the much smaller heat capacity of land surfaces compared to the oceanic mixed layer21. If the macroweather-type behaviour over land held to longer timescales, internal variability would only play a minor role on the uncertainty of regional climate projections at multi-decadal and longer timescales22,23. However, a different scaling behaviour of ocean and land at longer timescales would imply an increasingly large variability discrepancy between terrestrial and marine regions, which seems physically implausible given their coupling by the atmosphere, and in contradiction with both diffusive energy balance models (EBMs)24,25 and GCMs17. This leaves us with the conundrum that we must either reject altogether the marine proxies or see a fundamental change of variability in the terrestrial domain on longer timescales.

Given the limited instrumental record, this fundamental question on the scaling of variability between land and ocean cannot be answered from observations without terrestrial proxy data. Holocene Greenland ice-core δ18O timeseries suggest that the weak scaling behaviour found in the instrumental data extends to millennial timescales (akin to white noise with β ≈ 0)26. However, it is unclear whether the climate variability derived from the high-altitude Greenland ice cores is representative of other terrestrial regions27 and to what extent the proxy variability reflects temperature variability28.

Land-temperature variability from pollen records

To elucidate the behaviour of climate variability over land at timescales longer than centennial, we compiled and analysed an extensive collection of recent Holocene Northern Hemisphere pollen data covering the past 8,000 years, providing the largest spatial coverage of any land-based proxy at millennial timescales. This compilation of pollen data comprises 1,744 unique records from the Northern Hemisphere, 680 of which are from North America, 804 from Europe and 260 from Asia, including the most extensive dataset so far for China29 and Siberia30. We produced summer (June–July–August (JJA)) temperature reconstructions (hereafter simply termed temperature), as summer temperatures are usually well-correlated with other variables driving vegetation growth31, and found a robust signal of millennial-scale variability for all continents (Extended Data Fig. 2). Our results are not sensitive to the choice of summer temperature (Extended Data Fig. 3a) or a potential influence of precipitation (Extended Data Fig. 3b). Pollen-based reconstructions rely on the assumption of dynamical equilibrium between climate and vegetation to calibrate a transfer function based on the modern distribution of vegetation and the associated spatial gradient in the climatology. This assumption is timescale-dependent and is generally more valid at longer timescales31. In this respect, millennial-scale estimates of temperature variability, on which our quantitative results hinge, should thus be particularly robust, especially considering there is evidence that vegetation responds on timescales of centuries or faster32,33. Finally, we verified that the estimates of millennial variability were not systematically biased by human influences (Supplementary Fig. 2) and that our estimates are robust to the time uncertainty and irregular spacing of the timeseries (Extended Data Fig. 1).

Pollen-based reconstructions, once coupled with instrumental data34, enable us to establish a comprehensive picture of regional land-temperature variability from inter-annual to millennial timescales (Fig. 1 and Methods). The spectrum of instrumental air temperature shows a rather flat scaling behaviour that is characteristic of the macroweather regime. At longer timescales, however, the pollen-based reconstructions show a strong increase in variability with increasing timescale (Fig. 1a). This suggests that, even in the relatively stable recent Holocene, there exists substantial centennial- to millennial-scale temperature variability. Although human influences have historically impacted vegetation composition, they do not dominate this result, as they were not found to systematically bias millennial-scale temperature variability (Supplementary Fig. 2). This behaviour clearly differs from the rather flat macroweather regime and resembles power-law scaling with an exponent β near one for timescales longer than centennial. Traces of this increased scaling behaviour at multi-decadal timescales already appear in the instrumental record, and it is corroborated by dendrochronological timeseries and long instrumental series (Extended Data Fig. 4 and Supplementary Note 1).

Fig. 1: Average spectral estimates of local land temperature over the Northern Hemisphere.
figure 1

a, PSD estimates of land air temperature (T land) from pollen-based reconstructions along instrumental data and model simulations, both extracted at the pollen record locations; logarithmically spaced axes were used. Also shown are estimates from timeseries detrended with a 23-kyr sinusoidal or with respect to log(CO2) (dashed). The number of pollen records contributing to each timescale is indicated below (brown axis). b, Average spectral estimates from reconstructed annual SSTs9, and instrumental data at the corresponding locations. Observational spectra from a are reproduced. Linear combinations of power laws with slope β = 1.2 and white-noise series are shown as dashed-dotted lines (land in green, sea in blue and white-noise levels in grey). Shading indicates 90% confidence intervals around the mean.

Source data

We benchmark our result against three transient climate-model simulations of the recent Holocene: IPSL35, ECHAM536 and the last 8,000 years of the TraCE-21ka deglaciation experiment37 (that is, after the last freshwater forcing events). The transient simulations of these GCMs show the weak scaling behaviour characteristic of the macroweather regime, although with higher annual to decadal variability relative to the instrumental data, as previously recognized17. However, they fail to capture the increase in variability observed in the reconstructions at multi-decadal timescales. Instead, they show a relatively weak increase at multi-centennial timescales and a sharp increase at millennial timescales due to spectral leaking from the orbital 23-kyr precession cycle (Fig. 1a and Extended Data Fig. 5). As a result of this divergence in variability scaling, there is an increasing deficit in temperature variability observed in the simulations compared to the reconstructions. The range of variance ratios between the reconstructions and the different model simulations increases from 6–8 over the 100–300 years timescale to 11–54 (12–85 with orbital detrending) over the 1,000–3,000 years timescales band, resembling the discrepancy between models and proxy data for regional SST variability9. Similar results are also obtained with the fully forced Coupled Model Intercomparison Project – Phase 5 (CMIP5) simulations of the last millennium, and accounting for volcanic forcing over the Holocene is unlikely to resolve this discrepancy (Extended Data Fig. 6).

Marine influence on the spatial pattern of millennial variability

Comparison of the reconstructions over land with estimates of marine variability9 shows a very similar low-frequency scaling behaviour (Fig. 1b). The observed parallel scaling behaviour is expected if both components vary more coherently as climatic variability becomes a global phenomenon over longer timescales24, as also indicated by coherent land and ocean average temperatures5. EBMs suggest that this parallel behaviour of land and oceans on long timescales is due to heat exchange between the land and ocean compartments. In such models, land air temperature can be described as a linear combination of the SST and a time-dependent forcing over land25,38; the resulting variability spectrum over land is then a linear combination of the spectra of each term (Fig. 1b) when the two are uncorrelated (Supplementary Note 2). In this framework, the change in scaling behaviour can be regarded as a transition from the macroweather regime at shorter timescales, dominated by a weakly scaling forcing component akin to white noise over land21, to an oceanic regime dominated by the SST component at timescales longer than decadal. Interestingly, the parallel behaviour between land- and ocean-temperature spectra on multi-decadal to millennial timescales provides no evidence for additional terrestrial slow climate feedbacks. The oceanic component present in land-temperature variability appears amplified by a factor of ~4 in PSD or ~2 in amplitude (Fig. 1b). This factor is similar to the land–sea warming contrast39 observed during the last century40 and within the range of land–sea warming ratios measured in GCMs41. This is thought to be the result of local feedbacks, such as evaporation feedback, when moisture availability over land limits evaporative cooling in comparison with marine regions42, and also because of the asymmetry in the land–ocean heat exchange, which favours land due to its lower specific humidity41.

The extensive spatial coverage of pollen-based reconstructions allows us to perform a spatial analysis of the millennial-scale temperature variability PSD1,000–3,000 years (the mean PSD over 1,000–3,000 years; Fig. 2a) and investigate the potential link to oceanic influence. The spatial coherence (Moran’s I = 0.2, P < 0.001; Methods) shows that the variability estimates are not drowned out by local noise. Over Europe, millennial-scale variability decreases inland along the path of prevailing winds blowing from the Atlantic Ocean, and is lowest over Fennoscandia (Finland and Scandinavia), where blocking events are most frequent43. Similarly, China’s high millennial variability is linked to variability in the East Asian Summer Monsoon44, and further north in eastern Siberia the dominant westerlies bring little oceanic influence. This further suggests that higher millennial variability relies on higher connectivity to oceans, as implied by EBMs, although compounded by local sensitivity. The high variability in central Asia remains an outlier given the strong continentality there, but the significance is lower because of the sparseness of records. It is also possible that the lower connectivity to oceans is compensated by the stronger local climate sensitivity40, which may be linked to hydrological feedback due to the arid conditions42 and to snow-albedo feedback at higher elevations45. Meanwhile, in North America, the lowest millennial variability is found in the prairies, near the centre of the continent, where the westerlies predominantly blow from the northwest, and the oceanic influence is lowest.

Fig. 2: Spatial patterns of temperature variability in instrumental data and pollen-based reconstructions.
figure 2

a, Map of millennial variability estimated from the summer temperature spectra of pollen-based reconstructions as the mean PSD over the 1,000–3,000-yr timescale band in 4° × 4° spatial bins with opacity linearly proportional to the number of records in the bin and saturating when there are five records (see Extended Data Fig. 7 for raw and smoothed estimates). b, Map of the multi-decadal scaling exponent β from the spectra of instrumental temperature records fitted over the 10–60-yr timescale band. c, Map of sub-decadal variability, mean PSD over the 2–10-yr timescale band, estimated from the spectra of instrumental temperature records. A logarithmically spaced colour scale was used for a and c.

Source data

Links in the continuum of land-temperature variability

We use the instrumental data to study the mechanisms governing the spatial distribution of the millennial variability and the continuum of variability. The scaling of variability in instrumental data has already been shown to be related to the strength of the annual cycle and of the sub-decadal variability8. If we aggregate the instrumental data and reconstructions based on the sub-decadal variability PSD2–10 years (the mean PSD over 2–10 years; Fig. 3 and Extended Data Figs. 7 and 8), a clear relationship appears with the emergence of the low-frequency regime, quantified by the multi-decadal scaling exponent β10–60 years (β regressed over 10–60 years): locations with lower sub-decadal variability thus show a stronger increase of variability towards longer timescales, as indicated by higher multi-decadal scaling (Figs. 2b,c and 3 and Extended Data Fig. 8). We should thus expect an inversion where regions of low sub-decadal variability, typically characterized by more maritime influences, would become regions of high variability at longer-than-centennial timescales. Similarly, continental regions characterized by high sub-decadal variability would be expected to become regions of relatively low variability on longer timescales.

Fig. 3: Spectral estimates of land temperature as a function of sub-decadal variability.
figure 3

a, PSD estimates of the instrumental temperature (Δt < 60 years) and of the pollen-based reconstructions (Δt > 100 years) binned according to sub-decadal instrumental variability (Methods and Extended Data Fig. 7c,d); logarithmically spaced axes were used. Shading indicates 90% confidence intervals around the mean. b, Relationship of the instrumental sub-decadal temperature variability PSD2–10 years and the pollen-based millennial temperature variability PSD1,000–3,000 years. Vertical and horizontal lines indicate the standard error of the mean within each bin. c, As in b, but between β10–60 years and PSD1,000–3,000 years.

Source data

Indeed, this hypothesized relationship is confirmed by the pollen-based reconstructions. Their estimates of millennial temperature variability PSD1,000–3,000 years show a strong anti-correlation with the sub-decadal variability PSD2–10 years (r = −0.95, P < 0.03; Fig. 3b) and a strong correlation with the multi-decadal scaling β10–60 years (r = 0.92, P < 0.01; Fig. 3c). These substantial strong relationships between the pollen-based reconstructions and independent instrumental temperature data demonstrate a fundamental link of temperature variability from sub-decadal to millennial timescales. The spatial pattern of the variability (Fig. 2 and Extended Data Fig. 5) further suggests that this relationship is caused by a varying marine influence. This is supported by a strong correlation (r = −0.93, P < 0.02) of the spatial pattern of millennial variability with the relative land influence index46 (RLI; ‘Relative land influence index’), which quantifies the relative contributions of land and ocean to atmospheric fluxes at a given location using a Lagrangian trajectory model (Extended Data Fig. 9). An exception is Central and East Asia, where the RLI is highest, but millennial variability remains high. Finally, our result provides continentality as a complementary explanation for the relationship between the amplitude of the annual cycle (an indicator for continentality) and the inter-annual variability scaling in the instrumental relationship proposed in ref. 8. Therefore, our findings complete the linkage between seasonal and millennial land-temperature variability.

Discussion

Our results indicate that current GCMs underestimate regional temperature variability over land at timescales longer than multi-decadal (Fig. 1a and Extended Data Fig. 5). In combination with the spatial pattern of variability (Fig. 2), this suggests that the deficit in low-frequency variability is related to an underestimation of marine variability9. The interpretation of climate-sensitive proxies remains an area of active research, and, in principle, it remains possible that the observed model-data mismatches stem from non-climatic variability. However, several lines of evidence argue against this interpretation. First, as yet, there are no known archival processes that could artificially create such power-law scaling in sedimentary archives47. Specifically, known processes such as counting errors, spatial or temporal aliasing, and bioturbation in the sediment cannot explain the power spectra of variability found here (Figs. 1 and 3). The vegetation’s response time to climate can affect the variability, but it would only reduce it on the faster timescales, in contrast to increasing it on millennial timescales. Second, the consistency between independent marine and terrestrial archives (Fig. 1b) provides further support for the temperature-variability reconstruction. Although, in principle, similar non-climatic effects could create scaling in both sedimentary archives, in each case, annually resolved independent archives (corals for SST9, tree records for land temperature; Extended Data Fig. 4) confirm the scaling behaviour and amplitude. Finally, the spatial relationship with the independent instrumental temperature data (Fig. 3) also indicates that this is no artefact from the proxy data.

Thus, pollen-based reconstructions support the paradigm of an increasing continuum of climate variability with increasing timescales8, in contradiction with the local temperature variability in current GCM simulations and the classical picture of ref. 6. More importantly, our results extend previous findings from instrumental data8 and demonstrate a fundamental link between inter-annual, multi-decadal and millennial timescales driven by the interaction of marine and terrestrial temperature variability modulated by continentality.

This fundamental behaviour of temperature variability has implications for the relative impact of natural and anthropogenically forced variability. High-latitude regions characterized by high inter-annual variability show a weaker oceanic regime and, ultimately, less natural variability on millennial timescales. As these regions are also highly sensitive to anthropogenic forcing, the impact of anthropogenic warming, relative to natural variability, will be greater. However, regions of strong maritime influence, where most of the world’s population is located, could see large natural variability that is not simulated by current GCM projections, which tend to display monotonous warming40. It is thus possible that, until now, the stronger natural variability at multi-decadal timescales in maritime regions has partly overshadowed the anthropogenic warming in those regions, which could explain their lower observational transient climate sensitivity40. Integrative archives such as glaciers should be particularly sensitive to this increased memory18 and could be used to verify our findings. Large compilations of climate archives have the potential to inform us on the spatial patterns of slow variability and their underlying causes. Further studies combining multiple proxies over land and ocean, and the development and inclusion of proxy system models48 for a more direct model-proxy comparison, show great promise to improve our understanding of the spatio-temporal correlation structure of climate variability.

Methods

Reconstructions

The fossil dataset LegacyPollen 1.049 includes taxonomically harmonized fossil pollen data from North America and Europe obtained via the Neotoma Palaeoecological Database50, and from Asia combining refs. 29 and 30. The datasets were combined, keeping the 70 most common taxa according to Hill’s second number. A fossil database comprising 1,744 records was retrieved based on the requirement that the resulting spectral estimates covered timescales of at least one-fifth of an order of magnitude, that is, 0.2 in a base-10 logarithm, below one-half the length (to avoid the well-known multitaper low bias at long timescales)51. Extreme outliers (with logarithm of PSD1,000–3,000 years more than four standard deviations away from the mean of all records) were discarded (n = 5), but their inclusion did not affect any of our conclusions.

The method and data for calibration agree with those of ref. 52. The weighted averaging partial least square (WAPLS)53 method was used to calibrate transfer functions relating the pollen assemblages to the summer temperature, with leave-one-out cross-validation to assess the model performance. The WAPLS method exploits the modern spatial relationship between a modern calibration dataset of pollen assemblages and the climatology to calibrate a transfer function that is then applied on the fossil assemblages to reconstruct timeseries. The modern climate data used for calibration were the average of the JJA climatologies, that is, the summer temperature, for the years from 1970 to 2000 as obtained from WorldClim 2.154. The modern pollen dataset used for calibration consists of 15,379 sampling sites. For each fossil record location, we selected a unique subset of modern sites within a 2,000-km radius so as to reduce the impact of false analogues55; if there were fewer than 30 samples within the radius, the reconstructions were not performed and the record discarded. The pollen percentages of both the modern and fossil databases were square-root-transformed to decrease the dominance of abundant taxa with high productivity. The number of retained WAPLS components was selected using a randomization t-test based on the criterion that adding a component should improve the root-mean-square error by at least 5%56. The transfer function with the retained number of components was then used to predict the summer temperature of the fossil assemblages, that is, the reconstructions. For the sensitivity tests, the same method was also applied to reconstruct annual mean temperature and annual precipitation (Extended Data Fig. 3), also using the climatologies from WorldClim 2.154 for the calibration of each variable.

Robustness of the reconstruction and RandomTF test

To test the robustness of the summer-temperature reconstructions, we performed replicability experiments following the methodology in ref. 5. For each continent, the reconstructions were divided into two sub-groups (odd and even indices after ordering them according to longitude) and averaged in the temporal domain after linear interpolation at a resolution of 100 yr and detrended using a 6.5-kyr LOESS filter. The result shows replicability of the millennial-scale variability for all continents, with good correlation between the odd and even sub-groups: r = 0.45 for North America, r = 0.59 for Europe and r = 0.34 for Asia (Extended Data Fig. 2a–c). These values are within the expectation for surrogate data with time uncertainty (Extended Data Fig. 2d–i and ‘Surrogate data and time uncertainty’).

Following ref. 57, we also tested the summer-temperature reconstructions for significance using the RandomTF test from the R package ‘palaeoSig’ and found that 528 of 1,744 reconstructions were substantial (P < 0.1). However, this significance test is rather conservative and there are several reasons for the creation of type II errors (false negatives), including a low diversity of taxa, a small number of sub-fossil observations, an input climate signal that is less variable or an inadequate training set57. Accordingly, a higher P value does not necessarily mean that the summer temperature has not been recorded, but rather that the information is insufficient to confirm it. The power of the RandomTF test can be low, as it is only sensitive to reconstructions with strong trends31. The use and caveats (particularly how it is prone to false negatives) of this test are debated and discussed in the literature58,59,60,61,62,63,64. Notably, in ref. 65, no correlation was found between the RandomTF P values and transfer function performance.

We also found no difference in performance based on the RandomTF test results. We reproduced the replicability analysis described above, but dividing the records into two groups of ‘significant locations’ (P < 0.1) and ‘not significant locations’ (P > 0.1). We would expect a clear improvement in odd–even replicability for the former if the test is skilful, but instead we find similar correlations (r = 0.52 and r = 0.54, respectively). Furthermore, the two groups are well-correlated together (r = 0.54), indicating they are recording the same millennial variability signal.

Therefore, instead of unduly discarding most records, we decided to include all records in the main analysis. We show that our conclusions are robust and continue to hold even if we restrict our analysis to the ‘significant locations’ only (P < 0.1), which yielded similar results to the ‘not significant locations’ (P > 0.1) (Supplementary Fig. 3).

Testing for anthropogenic impacts

We considered all series entirely covering 0–8 ka and defined two temporal windows: the more recent (0–4 ka) and the more distant (4–8 ka) past. The 1,000–2,000-yr timescale band was taken to calculate the variance ratio between the two 4-kyr windows. We only included those series in our analyses whose spectral estimates covered at least one-fifth of an order of magnitude (on a base-10 logarithmic scale) for both time periods, a criterion that was met by 859 records. We found no systematic variance increase in the more recent half of the series. In fact, the more recent period, where human impacts may have contributed to an increased variability, is ~10% less variable than the earlier one. Similarly, the spatial distribution (Supplementary Fig. 2) did not show any obvious spatial patterns that could be related to human occupation, displaying a non-significant Moran’s I of 0.016 (P > 0.1; ‘Moran’s I’). If human occupation was the dominant driver of millennial-scale variability, we would have expected to observe an increase in variability over both Europe and China, where human occupation has been increasing the most over the last 4,000 years compared to the preceding 4,000 years. We thus conclude that human impacts on vegetation did not have a substantial enough impact on the slow variability to systematically bias millennial-scale variability estimates.

Instrumental data

We considered two instrumental datasets—HadCRUT566 and the Berkeley Earth Surface Temperature (BEST)34 land and ocean product—both covering the period 1850–2020. As an additional test, the breakpoint-adjusted monthly station data provided by BEST were considered. The BEST dataset has the advantage of providing full spatio-temporal coverage at a high spatial resolution covering all pollen locations. On the other hand, it appears that the spatial autocorrelation model assumed for the interpolation leads to a bias low in variability compared to the non-infilled HadCRUT5 and the station data (Extended Data Fig. 10). The choice of the optimal instrumental dataset is thus a trade-off between minimizing variance biases from the reconstruction method and minimizing the spatial sampling bias. Both datasets were used for the global analysis (Fig. 1) because the spatial sampling bias of HadCRUT5 is not important for the average spectrum of extra-tropical land regions (Extended Data Fig. 10). However, because it is important for spatial comparisons with the reconstructions (Figs. 2 and 3) to have both datasets at the same location, BEST was used as it provides spatial coverage of all the pollen record sites. The BEST equal area product was used for calculations, and the regular 1° × 1° product was used for visual display. The instrumental data were detrended from anthropogenic influences to a first-order component67 proportional to historical timeseries of doublings in atmospheric carbon dioxide concentration (that is, log(CO2))68. As this detrending might lead to a negative bias on the lowest frequencies, we show the series with and without log(CO2) detrending in Fig. 1, but generally perform the detrending for the subsequent results. We note that all our analyses are robust to these choices, or to the use of alternative instrumental datasets69.

As an additional test of the increase in variability at longer timescales that is shown in pollen and tree-ring records, we also considered long instrumental timeseries from the BEST station data. We identified 70 series that covered more than 170 years (the length of the gridded product) of summer-JJA temperature with gaps of at most five years. They further support the scaling increase at multi-decadal timescales, in agreement with the dendrochronological data (Extended Data Fig. 4).

Model data

Three model simulations of the recent Holocene were considered: IPSL35, ECHAM536 and CCSM370. The first two are recent Holocene transient simulations of the past 6,000 years, and the latter is the TraCE-21ka deglaciation experiment37. We only retained the last 8,000 years of TraCE-21ka because it is comparable to the recent Holocene transient simulations, containing no more freshwater forcing events, which were the main drivers of deglaciation. We selected the average 2-m summer air temperature (JJA) and averaged it annually.

Because the long-term trends in summer temperature are linked to the precession in Earth’s orbit, we also analysed the timeseries after detrending for a 23-kyr sinusoid rather than using the standard linear detrending performed before computing spectral estimates. This approach attempts to minimize power leakage from the orbital forcing frequencies onto the observed frequencies (Fig. 1a, lines). The reduction in leaked power was not nearly as important in the case of the pollen-based reconstructions (Fig. 1a and Extended Data Fig. 5).

Spectral estimates

The PSD estimates were calculated using the multitaper method71, adapted for irregular sampling through linear interpolation72, with number of tapers ntapers = 3 and time-bandwidth parameter ω = 2, which yield up to ntapers × ω = 6 degrees of freedom for the individual spectral estimates. Standard linear detrending was performed before computing the multitaper spectrum. Only timescales greater than twice the maximal resolution were kept, in order to minimize power loss due to the interpolation47. When averaging several spectra from timeseries with different temporal resolution (and thus different timescale/frequency sampling), the spectra were first linearly interpolated on a common timescale axis and the arithmetic mean taken. The PSD estimates were smoothed using a Gaussian kernel with a constant width of 0.03 in the logarithm (base 10) of the timescale73.

The confidence intervals were derived from the chi-squared distribution (χ2) of the PSD estimates. For the mean spectra, the degrees of freedom of each spectral estimate account for the frequency-dependent smoothing kernel and the number of individual spectra, limited by the expected effective spatial degrees of freedom at a given timescale, Given that the spatial coverage of our results is limited to extra-tropical land regions, we assumed a conservative value of a maximum of five effective spatial degrees of freedom for all timescales74. The degrees of freedom of the χ2 distribution were estimated by moment-matching from an ensemble of surrogate data consisting of 1,000 realizations of fractional (or power-law) noise75, each time simulated with the estimated scaling exponent of the spectrum.

The mean spectra are more representative of the regions most sampled by the pollen records. To obtain comparable estimates, the instrumental and model timeseries were extracted at the same locations and processed in the same way. The mean local spectra of instrumental summer temperature at the location of pollen records and for the entire extra-tropical land regions (Extended Data Fig. 10, light and dark green) are very similar, thus supporting that the space sampled by the pollen records is representative for the extra-tropical land regions.

We conducted a sensitivity study to test the effect of potential non-climatic noise in reconstructions of the spectral estimates (Fig. 1). We assumed that the main noise sources, from aliasing and from the finite number of pollen grains and a misspecification of pollen taxa, are uncorrelated between samples (‘white’). To do this we computed the expected spectrum of a given record when replacing the temperature signal by a white noise and subtracted it from the reconstructions9. The confidence interval in Fig. 1 includes the χ2-based confidence interval of the spectrum of the original reconstructions and that of the reconstructions when subtracting a noise level of 0.5 K from all records. This shows that the uncertainty is highest on the faster timescales, as here the relative effect of the noise is largest. The provided confidence intervals should be a conservative bound, as the noise level of 0.5 K corresponds to more than twice the median spread of the leave-one-out cross-validation ensembles.

Surrogate data and time uncertainty

To test the robustness of local spectral estimates to time uncertainty and irregular sampling in the pollen records, we produced surrogate data using the TraCE-21ka model simulation37. The simulated summer-JJA temperature was extracted at the location of pollen records and degraded to the resolution of the corresponding pollen records by sub-sampling the series after a low-pass filter with a characteristic timescale of twice the mean resolution75. In this way, a database analogous to the pollen database was created. We then introduced time uncertainty based on the statistical uncertainty from the radiocarbon dating of the Bacon age models76. The pollen database has a median time uncertainty of 250 yr, which is too optimistic77, as it does not take into account other effects such as, for example, uncertainty in the reservoir age, temporal variations in the reservoir age and the effect of pre-aged material and dated material not being representative of the sediment layer because of sediment mixing. We thus conducted a sensitivity test by amplifying the uncertainty around the mean age model of each record by factors of √2 and further added a random offset to each core to model the reservoir age effect, each time drawn from a normal distribution with a mean of 500 years and a standard deviation of √2 × 250 years. As the uncertainties add in quadrature, we obtained a database with a time uncertainty of 500 years. This is a realistic, but possibly optimistic, estimate according to the few studies comparing radiocarbon-based lake chronologies to independent constraints from widespread ecological events or to independent dating techniques78,79,80.

In analogy to the spectral estimation of the true pollen timeseries, we estimated the average local spectra of temperature variability from the time-uncertain surrogate timeseries. The resulting mean spectra are consistent between the original annual model data and the surrogate databases with and without time uncertainty (Extended Data Fig. 1), showing that our method is robust to irregular resolution of our timeseries and realistic estimates of time uncertainty.

To study the spectrum of spatial means, we computed the average of the surrogate timeseries with and without time uncertainty after interpolating each series to a 100-yr resolution following ref. 5. The spectra of average (hemispheric) temperature show strong power loss at all timescales due to the irregular resolution and time uncertainty (Extended Data Fig. 1). Only the variability at the longest timescales is preserved, as the long-term trend is robust to these effects. This shows that the amplitude of variability from mean timeseries over large spatial scales cannot be quantitatively interpreted without knowledge of the amount and structure of time uncertainty.

Variance ratios

Variance ratios were computed by taking the ratio between the mean PSD over the same timescale band between different series after interpolating in the spectral domain. Because the ratio of two χ2-distributed variables follows an F-distribution, the ratios were multiplied by (d – 2)d−1, where d is the number of degrees of freedom of the denominator27.

Sub-decadal variability binning

The data were aggregated based on the mean sub-decadal variability PSD2–10 years, defined as the mean PSD over the 2–10-yr band. We calculated PSD2–10 years for each of the 624 instrumental grid points for which pollen records were present nearby, ordered the results, and split them into eight non-overlapping bins (Extended Data Fig. 7c,d). Each pollen record was assigned to the nearest instrumental grid point and averaged in the spectral domain. Varying the number of bins, for example, using 20 bins instead of 8, led to similar correlations. The standard errors of the millennial variability estimates were used as weights for the correlation calculation (using Pearson’s correlation) and for the visual representation in Fig. 3b,c. The P values were calculated taking into account the spatial autocorrelation of the fields and the binning process. We performed Monte-Carlo experiments using surrogate fields generated based on the randomization of the phase of the two-dimensional Fourier transform81. Ten thousand surrogate fields were thus generated for PSD2–10 years and β10–60 years, and the analysis was repeated to obtain the correlation coefficients after binning, from which the P values were calculated based on the empirical quantile. The significance is not sensitive to the number of bins (Supplementary Fig. 4).

Relative land influence index

As a further test of the relationship between millennial variability and marine influence, PSD1,000–3,000 years was compared to the RLI metric developed by McKinnon and colleagues46 (Extended Data Fig. 9). The RLI quantifies the relative contributions of land and ocean to atmospheric fluxes at a given location using a Lagrangian trajectory model and reanalysis data. The expected relationship, less millennial variability for higher RLI, is observed, except for the very high RLI values (RLI > 0.78, indicated in red in Extended Data Fig. 9a). This is further evidenced by binning the reconstructions according to RLI into eight bins of 198 records for those with RLI < 0.78, and a further ninth bin with the 158 records with RLI > 0.78 (Extended Data Fig. 9b). The latter contains records in Siberia, Central Asia and East Asia where the RLI is highest (Extended Data Fig. 9c), but the millennial variability remains high (except for the records in Siberia, which follow the expected trend). This could either be related to strong local climate sensitivity, which may be linked to hydrological feedback due to the arid conditions42 or to the snow-albedo feedback at higher elevations45, and to a bias in the pollen-based temperature estimates from precipitation-sensitive vegetation in this region29. Also, the discrepancy between RLI and the sub-decadal variability over East Asia suggests that RLI might not be an appropriate metric of continentality there. Notably, East Asia is one of the few regions where an EBM using RLI as a mixing parameter does not reproduce the seasonal gain (associated to continentality) skilfully25. The P value was obtained in the same way as for the sub-decadal variability binning, accounting for the spatial autocorrelation and the distribution of the fields.

Moran’s I

Moran’s I spatial autocorrelation index was calculated using the method from ref. 82 as implemented in the R-package ‘ape’83. The weight matrix used corresponds to the inverse of the distance between sites.