Introduction

Earth’s climate is a complex nonlinear system in which multiple feedback mechanisms control the stability, variability and/or abrupt transitions between different climatic states (see e.g. Refs.1, 2). Such feedbacks generate internal, intrinsic climatic oscillations and can amplify or damp the effects of external forcing factors, see e.g. Refs.3,4,5,6,7,8. In such a framework, identifying cause-effect relationships from an observed behavior is often difficult, and refined mathematical approaches and data analysis methods able to go beyond correlation estimates are needed to disentangle causation links, as discussed for example in Refs.9,10,11.

One outstanding example of climate variability is the case of glacial-interglacial oscillations of the Pleistocene. In the last three million years, Earth’s climate fluctuated between prolonged glacial periods, slowly developing through global temperature decrease and the build-up of extended ice sheets, and shorter interglacials with milder climate, generated by a relatively rapid (in geological sense) melting of the ice12,13,14,15. Such glacial cycles are believed to be a nonlinear reaction of the climate, or of some of its sub-systems, to the slow variation of the orbital forcing via amplifying feedbacks16. The Antarctic ice cores drilled at Vostok17 and, more recently, by the EPICA project18 have revealed the details of the glacial oscillations in the last few hundred thousand years19, 20. Approximately synchronous variations of the reconstructed Antarctic temperature and of carbon dioxide concentration in the paleo-atmosphere are visible21, 22, although the precise lead-lag relationships between (Antarctic) temperature and CO\(_2\) concentration are still a matter of debate. In particular, such lead-lag relations could vary on both the time scale and the specific period considered23. For example, a detailed correlation analysis of a high-resolution Antarctic ice core has indicated that during the last glacial period there is a lagged variation of CO\(_2\) with respect to temperature on millennial time scales, which however becomes more complex at centennial time scales24. The interpretation of such time-scale dependence of the lead-lag relationships between CO\(_2\) and T can be offered in terms of the presence of multiple mechanisms at different time scales23. One possibility is the interplay of a slower process associated with the reorganization of the Southern Ocean carbon cycle, and faster (possibly abrupt) processes associated with Northern Hemisphere Dansgaard-Oeschger events24, 25. Of course, whether the dynamics have been considered synchronous depends a lot on the quality of the age model and on whether lagged correlation or actual differences in specific change points are considered.

On the other hand, the correlation between two variables is not, in itself, a reliable measure of causation, as already pointed out in Ref.23 for paleoclimate dynamics. A typical case in which correlation fails to catch the underlying causal structure is when two mutually independent variables \(x_1(\tau )\) and \(x_2(\tau )\) are driven by a common forcing \(f(\tau )\), \(\tau\) being the absolute time. In this case, a strong correlation can be easily misinterpreted as causation. This situation is frequently encountered in the climate system, as discussed in Ref.10. A relevant challenge is thus to disentangle the cause-effect relationships from the analysis of the two signals. In past works on glacial oscillations, the issue of causation has been addressed by the work in Ref.26, which looked at the causal structure of the temperature-CO\(_2\) concentration relations using an information flow approach27. However, confounding factors can be present and different causal relationships can exist on different time scales23. These issues should not be overlooked and they are further addressed in the present manuscript.

One reliable definition of cause-effect relationship, able to take into account different behaviors at different scales, is based on the observation of the average trajectory of \(x_2(\tau )\) after an active perturbation of the variable \(x_1(\tau )\) has been performed. This kind of “probing” resembles the idea behind the mathematical formal definition of causation by Judea Pearl28. In dynamical systems it can be characterized by looking at the linear response matrix function29, 30

$$\begin{aligned} R_{ij}(t;\tau ) = \frac{\overline{\delta x_i(\tau +t)}}{\delta x_j(\tau )}. \end{aligned}$$
(1)

Here \(\delta x_j(\tau )\) is the value of an instantaneous, external perturbation operated on the variable \(x_j\) at time \(\tau\), while \(\overline{\delta x_i(\tau +t)}\) is the average (over many repetitions of the experiment) of the difference between the perturbed trajectory of \(x_i(\tau )\) and its unperturbed evolution. In what follows, we will assume that the above defined quantity does not depend on the absolute time \(\tau\), but only on the lag t, so that \(R_{ij}(t;\tau )\equiv R_{i,j}(t)\). From Eq. (1) it can be deduced that \({R}_{ij}(0)=\delta _{ij}\) by definition, where \(\delta _{ij}\) is the Kronecker-delta. Thus the diagonal entries decay from the starting value \(R_{ii}(0) = 1\), while the off-diagonal entries grow from the starting value \(R_{i\ne j} = 0\). From a physical point of view, this simply means that the variables of a system cannot generate an immediate reaction on the others, as some (possibly very small) delay has to occur between a “cause” and its “effect”.

Of course, Eq. (1) cannot be applied to any problem for which only time series referring to past events are available. However, a series of well known results of response theory show that \(R_{ij}(t)\) can be written in terms of time correlations of suitable quantities31, 32. One of the possible formulations of this principle, sometimes called generalized Fluctuation–Dissipation Relation (generalized FDR), is:

$$\begin{aligned} R_{ij}(t) = -\left\langle x_i(t)\frac{\partial }{\partial x_j}\log P( \mathbf {x} )\big |_{ \mathbf {x} (0)}\right\rangle . \end{aligned}$$
(2)

Here \(P( \mathbf {x} )\) is the stationary probability distribution of the whole phase-space vector \(\mathbf {x} =(x_1,x_2, \ldots , x_n)\) describing the system dynamics; the vector \(\mathbf {x}\) is meant to include all the variables that determine the behavior of \(x_i\) and \(x_j\).

In particular, if the dynamics of a system of n variables \(x_1,\dots ,x_n\) is linear, the matrix of the response functions simplifies to (see Supplemental Material for a complete derivation)

$$\begin{aligned} \mathbb {R}(t)=\mathbb {C}(t) \mathbb {C}^{-1}(0)\,, \end{aligned}$$
(3)

that is easily determined from the elements of the correlation matrix, \(C_{ij}(t)= \langle x_i(\tau + t)x_j(\tau )\rangle\), of the available data sets [where \(\mathbb {C}^{-1}(0)\) is the inverse matrix of \(C_{ij}(0) = \langle x_i(\tau )x_j(\tau )\rangle\)]. Hereafter, we will denote with \(\mathbb {M}\) a matrix and with \(M_{ij}\) the scalar values of its entries. To appreciate the difference with a simple correlations analysis, it is useful to explicitly write the matrix (3) for a two-dimensional system [x(t), y(t)], where the inversion of \(\mathbb {C}(0)\) can be easily performed. We assume that x(t) and y(t) have zero average and unitary variance, as in the following we will always consider normalized signals of this sort. In this simple case, the 2\(\times\)2 response matrix reads

$$\begin{aligned} \mathbb {R}(t) = \begin{pmatrix} \dfrac{C_{xx}(t) - C_{xy}(0)\;C_{xy}(t)}{1-C_{xy}^2(0)} &{} \dfrac{C_{xy}(t) - C_{xy}(0)\;C_{xx}(t)}{1-C_{xy}^2(0)} \\ &{} \\ \dfrac{C_{yx}(t) - C_{xy}(0)\;C_{yy}(t)}{1-C_{xy}^2(0)} &{} \dfrac{C_{yy}(t) - C_{xy}(0)\;C_{yx}(t)}{1-C_{xy}^2(0)} \end{pmatrix}. \end{aligned}$$
(4)

The relation \(C_{xx}(0) = C_{yy}(0) = 1\), following from the normalization of the data, has been used. In general, \(C_{xy}(t)\) is different from \(C_{yx}(t)\), the symmetry \(C_{xy}(0) = C_{yx}(0)\) holding true only for \(t=0\). We remark that, even if the response can be computed as a combination of correlation functions, it provides information about the causal structure of the system which could not be deduced from cross-correlations alone29. Note that this result is quite robust: the presence of small nonlinearities in the dynamics is not expected to spoil the ability of Eq. (3) to detect causal links30.

In principle, the generalized FDR solves the problem of inferring causal relations for any dynamical system, but its application strongly relies on the assumption that the dynamics of the chosen set of observables does not depend on any variable that is external to the system (i.e., in the language of stochastic processes, that the dynamics is Markovian). Moreover, the shape of the steady state distribution has to be known, at least approximately.

The application of this kind of analysis to the EPICA paleoclimate data is thus limited by two factors: (i) the lack of knowledge of a proper set of variables \(\mathbf {x} =(x_1, \ldots , x_n)\) fully describing a Markovian dynamics and (ii) the relative shortage of data (about 1600 measurements, covering a range of 800 kyr), which would not allow a reliable estimate of the joint probability distribution \(P( \mathbf {x} )\), even if a valid set of observables \(\mathbf {x}\) was known.

In this paper, we focus on the variations of the carbon dioxide concentration, [CO\(_2\)], and the reconstructed temperature T, during glacial-interglacial oscillations. Assuming a sufficient time-scale separation between the astronomical forcing (with a typical time of the order of 20 kyr or more) and the internal climate variability on millennial or shorter time scales, we delineate a qualitative analysis of the causation relations between [CO\(_2\)] and T in the last 800 kyr. The strategy is based on the definition of a proper set of “fast” components whose dynamics is assumed to mainly reflect internal climate variability, associated with the interaction of the different climate sub-systems. Notice, however, that the intensity of such fast fluctuations could be modulated by the climate background state and thus by the astronomical forcing33. Here, the important hypothesis is that the fast oscillations are not completely “slaved” to the slow forcing. Within the reasonable assumption that the mutual interactions of the fast components can be approximately linearized, the response on fast time-scales of the order of one to two kyr can be inferred by exploiting Eq. (3), which, for these variables, can be evaluated with good accuracy even with the available quantity of data. Special attention should be given to the temporal resolution of the paleoclimatic data, which, in some portions of the record, may hamper the ability to safely detect millennial-scale oscillations, a point that is further addressed below and in the “Methods” section. Important messages of this work are that (a) FDR analysis provides relevant information on the causation relationships in (paleo)climate signals, (b) considering only unfiltered data including all time scales can lead to incomplete results, masking the possible emergence of more complex causal relationships on specific time scales, and (c) there is a clear long-term temporal variation in the strength of the millennial-scale causal relationships between temperature and CO\(_2\) concentration.

Results

As discussed in the Introduction, the generalized FDR can be used to unravel causal links between the variables of a physical system by analyzing their correlations, provided that (i) the dynamics is not subject to an external common driving which simultaneously forces several variables and (ii) the stationary distribution is known, at least approximately. The coupled dynamics of T and [CO\(_2\)] in the last 800 kyr does not fulfill any of these two conditions, as their behaviour is heavily conditioned by the external astronomical driving, and the relatively small amount of available data (about 1600 measurements spanning the whole record) does not allow to reconstruct a reliable joint probability distribution for the two quantities.

In the “Methods” section, we show that both the above difficulties can be circumvented as long as there is a sufficient scale separation between the (slow) typical time scale of the external driving and the (short) characteristic times of the interaction between the variables. This is indeed the case for Pleistocene climatic variability, where the time scale of the external astronomical forcing is of the order of 20 kyr or more (Milankovitch cycles16), while intense climatic fluctuations occur on a much faster time scale, of about 1 kyr.

The key ingredient for the analysis is thus a proper high-pass temporal filtering of the signals: before applying the FDR, we subtract from the time series of T and [CO\(_2\)] a running average (see “Methods” for details) over windows, \(T_w\), of a few kyr. This basically allows to filter out any possible spurious correlation due for example to a common influence of the slow external forcing, while keeping the relevant information on the short-time mutual relationship. In addition, the distribution of the filtered variables is approximately Gaussian (“Methods” section, Fig. 4): this is consistent with the working hypothesis that the dynamics on the fast scales is approximately linear, and that the use of FDR in the form of Eq. (3) is justified.

The results of the analysis are shown in Fig. 1, where the response function computed with the generalized FDR is plotted as a function of time. The approach based on a straightforward application of Eq. (3) to the unfiltered data would suggest that the relative influence of temperature on CO\(_2\) concentration, \(T\rightarrow [\)CO\(_2]\), is much stronger than the reversed causal link [CO\(_2]\rightarrow T\). When the formula is instead applied to the filtered data, a different scenario is observed. For the whole time series, the relative influence of the temperature on [CO\(_2\)] is at most \(\simeq 0.1\) (in a scale in which the self-response at time 0 is set equal to 1), and it almost vanishes for lags beyond 2 kyr; on the other hand the intensity of the [CO\(_2]\rightarrow T\) causal relationship increases with respect to what is observed without high-pass filtering, doubling that of the reverse relation on the scale of 1 kyr. The causal link [CO\(_2]\rightarrow T\) does now vanish at longer lags, displaying an oscillating behavior as shown also by the Transfer Entropy results.

Figure 1
figure 1

Analysis of the mutual influence between temperature T and CO\(_2\) concentration in paleoclimate data. Panels (a) and (b) show the deviation from average of the two signals as a function of time (blue curves), normalized by the standard deviation. The daily mean insolation at \(65^{\circ }\) N summer solstice, revealing the typical time scales of the external driving, is also reported (red curves; see “Methods” sections for details on the data sources). In panels (c) and (d) the response function, computed according to the Generalized FDR, is plotted. The analytical formula is given by the non-diagonal elements \(R_{xy}(t)\) and \(R_{yx}(t)\) of the linear response matrix (4), where x and y are the normalized [CO\(_2\)] and T signals shown in panels (a), (b) (and their high-pass filtered analogues). Panel (c) refers to the effect of T on [CO\(_2\)], while panel (d) shows the opposite relation. Red circles represent the results of a direct application of Eq. (3) on raw data, apparently suggesting that \(T\rightarrow\)[CO\(_2\)] is stronger than [CO\(_2\)]\(\rightarrow T\). The response on data filtered over \(T_w=3\) kyr window (blue squares) instead indicates that the impact of [CO\(_2\)] on T becomes larger. The result is robust with respect of \(T_w\) variations by one kyr (green up/down triangles). A similar analysis, where TE are computed instead of generalised FDR, is shown in Panels (e) and (f). Here, we have considered the data for which the temporal resolution of the temperature record has been degraded to become similar to that of CO\(_2\) concentration, as discussed in the “Methods” section. For the undegraded temperature data, the role of CO\(_2\) driving is even larger, see Fig. S5 of the Supplemental Material.

As discussed in the “Methods” section, the width \(T_w\) of the time window for the filter has been chosen to be 3 kyr. Figure 1 also shows that the method is quite robust with respect to the choice of \(T_w\): indeed, we obtained an analogous behaviour of the response functions using \(T_w = 2\) kyr and \(T_w = 4\) kyr.

An important point concerns the temporal resolution of the two signals considered. Figure S4 of the Supplemental Material shows that the resolution of the CO\(_2\) record is generally coarser than that of temperature, and both vary in time. To avoid possible spurious effects generated by the different temporal resolution and the different weight of the spline interpolation, we opted for degrading the resolution of the temperature signal to that of the carbon dioxide concentration (and vice-versa, in the few intervals where the temporal resolution of [CO\(_2\)] is higher than that of T). Considering instead the results of the original (non degraded) data, as shown in Fig. S5 of the Supplemental Material, the main messages do not change.

A similar analysis, in which the transfer entropy (TE) between the two variables is computed before and after the filtering procedure (Fig. 1e,f), confirms that a high-pass filtering of the data is crucial to elucidate the causal relationships between temperature and CO\(_2\) concentration in the last 800 kyr of the Pleistocene. A brief discussion on the concept of TE, which may be regarded as complementary to that of response, is given in the “Methods” section, where also further details about its application in this context are provided. Here it is worth noticing that the qualitative results are similar to those achieved with the generalized FDR approach: a direct application of the formula to the raw data would suggests a large influence of T on [CO\(_2\)]; on the other hand, a proper temporal filtering leads to a more complex picture. A remarkable similarity between the behaviour of the two observables can be appreciated in Fig. 1d,f (recalling that TE, at variance with response, is a positive-definite quantity).

Discussion

The issue of which climatic signal drives which in the glacial-interglacial record is widely debated. For example, the analysis of Caillon et al.34 indicated that CO\(_2\) lagged Antarctic deglacial warming by 800 ± 200 years during a specific deglaciation event (Termination III 240,000 years ago). Subsequently, Parrenin et al.22 found no asinchronicity between Antarctic temperatures and CO\(_2\) variations in the last deglaciation event (Termination TI), even though the situation is not always clear and it can vary with time35, 36. The work of Stips et al.26, based on the use of the information flow to detect causal relations, revealed a complex pattern of cause-effect links, with a predominance of the Antarctic temperature driving CO\(_2\) concentration when the whole record is considered. The analysis of millennial-scale fluctuations in the last glacial period showed that CO\(_2\) seems to lag temperature by 500–1000 yr24, while more complex relationships may exist on centennial time scales. Finally, the work of van Nes et al.23 concluded that different relationships can exist on different time scales. Clearly, crucial to all these lag analyses is the availability of a safely calibrated time scale for both temperature and CO\(_2\). In any case, here we found that the maximum value of the FDR for the effect of the CO\(_2\) concentration on temperature for the high-pass filtered data is found between 500 and 1500 yr when considering the whole 800-kyr record. Thus, even an uncertainty of the age model of the order of 500 yr does not qualitatively change the results.

Here, we analysed causality links adopting the generalized Fluctuation–Dissipation Relation (see the “Methods” section for a detailed discussion of how this approach works), further confirming the results using the Transfer Entropy method. The main finding is that, using the data from Refs.19, 20, we detected a causal link of temperature on [CO\(_2\)] when considering the whole unfiltered record that includes both millennial-scale fluctuations and longer-term glacial-interglacial oscillations, in keeping with previous results26. On the scales of the astronomical forcing, albedo changes could drive temperature variations and consequently affect the whole cascade of climatic processes, including CO\(_2\) changes. The causal link T \(\rightarrow\) [CO\(_2\)] could thus be generated either by slow climatic processes, such as the global ocean’s temperature-dependent ability to store CO\(_2\), or simply reflect the fact that both climatic signals are controlled by a common driver, namely, the astronomical forcing with the related changes in summer solar insolation at high latitudes.

On the other hand, the significant novelty of our analysis is that the high-pass filtered paleoclimatic signal, including only fluctuations on scales of a few millennia, displays a more complex pattern of causal relationships, with mutual driving of [CO\(_2\)] and temperature. This result is robust with respect to the precise value of the threshold used in the high-pass filter, which was varied between 2 and 4 kyr.

The results shown in Fig. 1 refer to the whole 800-kyr temporal period covered by the record, and they differ from the outcomes of some of previous analyses, performed on other signals spanning a more limited time range24. This supports the view that the strength of the causal links between temperature and CO\(_2\), or more generally between the various components of the climate system, can vary with time, in line with the conclusions of Ref.23. Such view is confirmed by Figs. S6 and S7 of the Supplemental Material, where we analysed the first or the second half of the record. The results indicated that the strength of the causal links is different in the two periods. On millennial time scales, we always observed a mutual effect between [CO\(_2\)] and T. In particular, the link [CO\(_2\)] \(\rightarrow T\) is much stronger in the first 400 kyr of the record (Fig. S6), while in the last 400 kyr the reverse link \(T \rightarrow\) [CO\(_2\)] becomes more relevant (Fig. S7).

To further explore this issue, in Fig. 2 we show the value of the Generalized FDR for \(T \rightarrow\) [CO\(_2\)] and [CO\(_2\)] \(\rightarrow T\) at lag 1 kyr (where the first maximum of the FDR is generally located), for a moving window of 250 ky slided along the whole time series. Interestingly, the effect of [CO\(_2\)] on temperature is stronger in the earlier part of the record and it decreases in the course of time, while the reverse link \(T \rightarrow\) [CO\(_2\)] grows in the early part of the record and then stabilizes at an approximately stationary value. Towards the end of the record, the two reverse links have comparable strength. At present, given the limited amount of data it can be difficult to disentangle between real variability and statistical fluctuations, but the results suggest a variability in the relative importance of the causal links between temperature and CO\(_2\) concentration. As a word of caution, we also mention that some of the inferred changes in causal relationships could be a result of changes in the data properties and the possibly non-stationary resolution properties of the two time series. In any case, these complex causal relationships would have been completely lost if we had considered only the unfiltered data.

Figure 2
figure 2

Values of the Generalized FDR at lag \(t=1\) kyr for the link \(T \rightarrow\) [CO\(_2\)] (a) and [CO\(_2\)] \(\rightarrow T\) (b), computed in a moving window with width \(\Delta =250\) kyr, slided along the whole record. All details as in panels (c) and (d) of Fig. 1.

A detailed analysis of the climatic processes inducing millennial-scale changes in CO\(_2\) concentration is beyond the scope of this work. Here, we simply mention that the causes of the fluctuations of CO\(_2\) concentration on such time scales are widely debated and still not fully clear, but most interpretations involve the role of CO\(_2\) outgassing associated with changes in the ocean overturning circulation and/or marine ecosystem functioning37, 38. The negative value of the FDR, observed at a lag of about 3 kyr, can also point to a coupled oscillation in the climate system, although its nature remains currently undetermined. In any case, the results of our analysis support the view that internal climate mechanisms, rather than direct orbital forcing, are responsible for the main variability at millennial time scales in the last 800 kyr, in keeping with the conclusions of Ref.23. Further work using simple models such as that of Ref.24 could help further addressing this issue; in this respect, it is worth noticing that the conclusion of Ref.24 are consistent with our results for the last part of the analyzed time interval (see Fig. 2).

We emphasize that the results of the analysis presented here have to be regarded as qualitative. In fact, the relative scarcity of currently available data does not allow to claim the detection, within reasonable accuracy, of the detailed causal structure of glacial-interglacial dynamics on the whole spectrum of time scales. From a methodological point of view, our work clearly shows that the direct application of causality detection methods to unfiltered data may provide only a part of the story. The results reported here indicate that causality analysis can be a powerful approach to study paleoclimatic signals (such as multiple isotope records from ice cores, speleothems or sediments), provided that the data set is long enough and almost-linear interactions between the relevant climatic variables can be assumed on the temporal scales of interest.

Methods

Dataset

The time series of reconstructed temperature and CO\(_2\) concentration analyzed here are obtained from the EPICA Dome C ice drilling project in Antarctica, as described in Ref.19. Here we use the data described in Ref.20, where the CO\(_2\) concentration record was obtained by blending different ice cores and the chronology for the first 200 kyrs was revised and improved. In comparing the CO\(_2\) concentration and temperature records, the issue of the gas-ice age difference (the so-called delta age) and its uncertainty should be kept in mind39. Here, we adopt the chronology indicated in the Ref.20. The time series of the insolation was calculated using the software provided at the site https://sites.google.com/site/geokotov/software and on the reconstructions of Ref.40.

Response function in the presence of slow external driving

In this section we export the FDR formalism to cases in which slow time-dependent external driving is present. This is relevant in the climate context, where insolation drives the system on very slow time scales, compared to those for which experimental data are available.

First we will show that, for the considered class of models, the response function can be written in terms of correlation functions of suitably defined fast components of the dynamics. These components can be estimated, within reasonable approximations, from a proper filtering of the time series of the original variables. An example with a toy model is then discussed to illustrate the analytical results.

Application of generalized FDR

We will limit ourselves to the study of models in which the dynamics of the n variables \({x_i},\,i=1,\ldots ,n\) representing the state coordinates can be written as

$$\begin{aligned} \dot{ \mathbf {x} }=-\mathbb {A} \mathbf {x} + \mathbf {c} f(t)+ \varvec{\xi } (t)\,, \end{aligned}$$
(5)

where A is a \(n\times n\) invertible, positive-definite and diagonalizable matrix; \(\mathbf {c}\) is an n-dimensional vector of amplitudes, \(\xi\) denotes a \(\delta\)-correlated diagonal noise \(\langle \xi _i(t) \xi _j(s)\rangle = D_i\,\delta (t-s)\delta _{ij}\). We call \(\tau _0\) the relaxation time of the free dynamics (i.e., without the forcing term), determined by the inverse of the spectral radius of A. The conditions on A insure that the dynamics will not diverge in time. The diagonalizability requirement could actually be relaxed, as it is not essential to the proof, but it allows to simplify calculations: see Sec. 2 of the Supplemental Material for a brief discussion on this point. For our application to paleoclimate time series, it is reasonable to consider a slow forcing (e.g. periodic, or quasiperidic, with long periods)

$$\begin{aligned} f(t)=\sum _{i=1}^l a_i \cos (t/\tau _i+\phi _i)\,, \end{aligned}$$
(6)

where \(\{a_i\}\) and \(\{\phi _i\}\) are dimensionless constants O(1). We assume that the \(\{\tau _i\}\) are much larger than \(\tau _0\), i.e.

$$\begin{aligned} \tau _l\ge \tau _{l-1}\ge \cdots \ge \tau _1\gg \tau _0. \end{aligned}$$
(7)

We are interested in the response of the system to an instantaneous perturbation \(\mathbf {x} (0) \rightarrow \mathbf {x} (0) + \delta \mathbf {x} (0)\). In particular, we want to compute the response function (1), assuming that we ignore the details of the model, and that we only have access to the measured trajectories of the system. The generalized FDR (2) cannot be used tout court in this context; indeed, due to the presence of the external forcing, f(t), the dynamics is not Markovian, because our set of variables \(\mathbf {x}\) does not completely describe the state of the system.

It is then useful to decompose the full dynamics (5) into “slow” \(\mathbf {x} _S\) and “fast” \(\mathbf {x} _F= \mathbf {x} - \mathbf {x} _S\) components, evolving as

$$\begin{aligned} \dot{ \mathbf {x} }_S= & {} -\mathbb {A} \mathbf {x} _S+ \mathbf {c} f(t) \end{aligned}$$
(8)
$$\begin{aligned} \dot{ \mathbf {x} }_F= & {} -\mathbb {A} \mathbf {x} _F+ \varvec{\xi } (t)\,. \end{aligned}$$
(9)

The above definition identifies a slow set of variables as those whose dynamics is only affected by the slow external forcing, while the uncorrelated noise is only present in the fast variables evolution.

Let us assume that the instantaneous change \(\mathbf {x} (0) \rightarrow \mathbf {x} (0) + \delta \mathbf {x} (0)\), occurring at time \(t=0\), entirely affects the fast components \(\mathbf {x} _F\). Of course we can always make such an assumption, as the only constraint imposed by the definition (9) is that the sum of \(\mathbf {x} _F(0)+ \mathbf {x} _S(0)\) is increased by \(\delta \mathbf {x} (0)\). Denoting with “\(\mathbf {x} ^{(P)}\)” the perturbed dynamics one has therefore

$$\begin{aligned} \mathbf {x} ^{(P)}(t)- \mathbf {x} (t)= \mathbf {x} ^{(P)}_F(t)- \mathbf {x} _F(t)\quad \quad \forall t>0\,, \end{aligned}$$
(10)

which follows from the independence of the evolutions, \(\mathbf {x} _F\) and \(\mathbf {x} _S\). By definition, Eq. (1), the response function for the complete dynamics \(\mathbf {x}\) is equal to that of the fast variables \(\mathbf {x} _F\).

The physical meaning of this choice is easily understood in the context of paleoclimate, where the slow dynamics can be associated to the effect of the astronomical forcing, while the behaviour of the fast components is meant to be related to the internal climate dynamics. In this case our choice is equivalent to saying that the latter components are actually modified by an instantaneous perturbation (e.g. a large emission of CO\(_2\) due to a volcano eruption), while the former, which only depend on astronomical motion, are not affected by this kind of events.

At this point, if the trajectories of the fast components, \(\mathbf {x} _F\), were accessible, a plain employ of Eq. (2) would be possible, since the fast dynamics does not depend on the external forcing f(t) and it is therefore Markovian. Moreover, since it is also described by a linear model, we could straightforwardly apply Eq. (3) and get:

$$\begin{aligned} \mathbb {R}(t)=\mathbb {C}_F(t)\mathbb {C}_F^{-1}(0) \end{aligned}$$
(11)

with

$$\begin{aligned} \mathbb {C}_F(t)=\langle \mathbf {x} _F(t) \mathbf {x} ^T_F(0)\rangle \,. \end{aligned}$$
(12)

For the class of dynamics described by Eq. (5), this result provides an easy way to compute the response functions, once the dynamics of the fast variables is known.

Evaluating of fast correlations from data filtering

The computation of the response functions by means of Eq. (11) requires the evaluation of the correlation function matrix \(C_F(t)\) appearing on the right hand side. The latter is usually not accessible from experiments and observations: if a large time-scale separation is present, however, such correlation functions can be estimated by considering a proper filtering of the dynamics. Remarkably, the quality of the approximation increases with the separation between the time scales.

The idea is to replace \(\mathbf {x} _F\) by

$$\begin{aligned} \tilde{ \mathbf {x} }(t)= \mathbf {x} (t)-\int _{-\infty }^{\infty }ds\,\mathcal {G}(t-s) \mathbf {x} (s)\,, \end{aligned}$$
(13)

with

$$\begin{aligned} \mathcal {G}(t)=\frac{e^{-t^2/2T_w^2}}{\sqrt{2 \pi } T_w}\,, \end{aligned}$$
(14)

i.e. subtracting from the full dynamics a suitably defined running average. Here \(T_w\) is the characteristic time-window of the Gaussian filter. The idea, not new41, is that the filtered signal mimics the behaviour of the slowly varying components, so that \(\tilde{ \mathbf {x} }(t)\) can be seen as a “surrogate” of \(\mathbf {x} _F(t)\); unlike \(\mathbf {x} _F(t)\), however, \(\tilde{ \mathbf {x} }(t)\) can be easily computed from empirical data.

One of the advantages of using a Gaussian filter42 relies on the possibility to show analytically that

$$\begin{aligned} \langle \tilde{ \mathbf {x} }(t)\tilde{ \mathbf {x} }^T(t')\rangle \simeq \langle \mathbf {x} _F(t) \mathbf {x} _F^T(t')\rangle + O(\max \{\tau _0/T_w,T_w^2/\tau _1^2\})\,. \end{aligned}$$
(15)

The details of the proof, which involves easy but tedious computations, are reported in the Supplemental Material, Sec. “Evaluation of the error”. Here, the main point of the computation is the possibility to always find a \(T_w\) such that both \(\tau _0/T_w\) and \(T_w^2/\tau _1^2\) are small, provided that the time-scale separation between \(\tau _0\) and \(\tau _1\) is large enough. The optimal order of magnitude for the width of the window is given by

$$\begin{aligned} T_w \sim (\tau _0 \tau _1^2)^{1/3}\,, \end{aligned}$$
(16)

which ensures

$$\begin{aligned} \frac{\tau _0}{T_w} \simeq \frac{T_w^2}{\tau _1^2} \ll 1\,. \end{aligned}$$
(17)

Thus, the correlation functions in Eq. (11) can be written in terms of the quantities (13), which can be straightforwardly obtained form the time series. In other words, one has

$$\begin{aligned} \mathbb {R}(t) \simeq \tilde{\mathbb {C}}(t)\tilde{\mathbb {C}}^{-1}(0) \end{aligned}$$
(18)

with \(\tilde{\mathbb {C}}(t)=\langle \tilde{ \mathbf {x} }(t)\tilde{ \mathbf {x} }^T(0)\rangle\).

Section “Numerical examples” of the Supplemental Material illustrates how the above proposed combination of filtering and generalized FDR works in practice. We also show, numerically, that the method is robust with respect to the presence of (small) nonlinear terms in the fast dynamics.

Application to paleoclimate

To apply the proposed analysis to paleoclimate dynamics we assume, as a working hypothesis, that the dynamics of temperature and [CO\(_2\)] can be approximated by a model of the form (5). Here \(\tau _1\) is the typical time-scale of the Milankovitch series (approximately 40 kyr), while \(\tau _0\) is a characteristic time of the fast dynamics of T and [CO\(_2\)], which we expect to be of the order of the kyr. The time-scale separation should then allow the application of our analysis, at least at a qualitative level. This scenario is supported by consistency checks which will be described in the remaining part of this Section.

Data pre-processing

Our study of generalized FDR on paleoclimate time series has required a pre-processing to make the data ready for the analysis. First, we operated a “data alignment”, since the time series needed to be synchronized to make the generalized FDR applicable, while the original data were obviously not. In particular, the data of temperature and [CO\(_2\)] were recorded on different set of times \(a=\{t_1,t_2,\ldots ,t_n\}\), \(b=\{t'_1,t'_2,\ldots ,t'_k\}\). We used a spline interpolation method to align the two datasets, in such a way that the resulting series were characterised by time intervals of 0.5 kyrs between consecutive entries (close to the original average time interval).

A relevant point concerns the fact that temperature and CO\(_2\) have different temporal resolution, with CO\(_2\) showing more sparse data than temperature, in most parts of the record. This is reported in Fig. 3a, which displays the temporal resolution of the T and [CO\(_2\)] records. In any case, the temporal resolution rarely becomes lower than 1 kyr, thus affecting only the shortest lag considered (0.5 kyr). The different abundance of data between the two signals may introduce spurious statistical effects when interpolating: points generated from a set with lower density are more correlated, and this may affect the subsequent analysis, at the shortest time scales. To avoid this kind of effects we degraded the temporal resolution of the records in such a way that they were always locally comparable. In particular, we divided the total observation interval (800 kyr) into 80 equal segments. For each of these 10 kyr intervals, we compared the number of available data for T and [CO\(_2\)], and we deleted from the larger sample a number of entries equal to the difference. The data to be deleted were taken at regular intervals in the sequence.

The analysis of the data with undegraded temporal resolution confirms the findings reported here, and it is discussed in Sec. 5 of the Supplemental Material, see also Fig. S5. The analysis of the two halves of the signal, shown in Figs. S6 and S7 of the Supplementary Material , was performed on the same temperature signal with reduced temporal resolution used here.

After filtering, we rescaled the values of temperature and [CO\(_2\)] in order to be standardized: zero average and unitary variance. This is necessary to get rid of the degree of freedom due to the arbitrary choice of measure units, and allows to make comparisons between response functions relative to different physical quantities (see Ref.30 for a discussion on this point).

Width of the time window

Figure 3
figure 3

(a) Temporal resolution of the temperature (azure circles) and CO\(_2\) concentration (red diamonds) along the record. In the analysis, we have degraded the temporal resolution of temperature to match that of CO\(_2\). (b) Power spectrum of the time series for T (azure) and [CO\(_2\)] (red). The vertical green band indicates the frequencies corresponding to the three values of \(T_w\) employed in the “Results” section. (c), (d) High-pass-filtered versions of the T and [CO\(_2\)] signals, for the whole 800-kyr period, obtained by using a threshold \(T_w =\) 3 kyr, compared to the original datasets. Data are shown as normalized deviations from average.

The power spectra of T and [CO\(_2\)], Fig. 3b, show a common regime, smoothly decreasing with almost power-law dependence at time scales shorter than about 10 kyr. As such, there is no spectral gap at a precise frequency. We applied the high-pass filter at a time scale that is much shorter than the scale of the astronomical forcing. In the “Results” sections we take as a reference value \(T_w=3\) kyr, and we repeated the analysis also for \(T_w=2\) kyr and \(T_w=4\) kyr. The effect of this filtering procedure on the data series can be estimated by looking at Fig. 3c,d, where the signals before and after applying the filter are plotted for the whole dataset. In general, the millennial-scale oscillations tend to be more pronounced during the glacial periods.

Effect of the filter

The action of the Gaussian filter (13) is clearly visible in Fig. 4, reporting the histograms of \(T-\langle T \rangle\) and \([\text{CO}_2] - \langle [\text{CO}_2]\rangle\), before and after filtering. What can be deduced by the comparison between the original and the final distributions is that the filtering has a twofold effect: it makes the distributions Gaussian-like and, at the same time, it reduces the excursion of the signal.

The former fact can be seen as an hint (although not a proof) that the filtered variables have an almost-linear dynamics, which is our working hypothesis. The latter indicates that most of the variability is on longer time scales, where the effects of the slow forcing and/or of stronger nonlinear climatic responses (both being removed when the signal is filtered), are non-negligible.

Figure 4
figure 4

Histograms of the centered variables \(T-\langle T \rangle\) and \([\text{CO}_2] - \langle [\text{CO}_2]\rangle\) before (Panels a, c) and after the filtering (Panels b, d). The filter makes the distribution of the signal Gaussian-like (compare with the Gaussian fits, dashed line in the right column); at the same time it reduces the excursion of the signal, since it removes the large oscillations due to the slow external forcing. Binning: for each plot, 60 bins of equal size are considered, ranging from the lowest to the highest recorded value.

Finally, in Fig. 5 we show the cross correlations of the two signals, before and after the filtering procedure. From these plots it is clear that the effect of the filter consists in removing from the analysis the large contributions coming from correlations on longer time scales. It should also be remarked that the cross correlations alone, even after the filter, are not very informative about the causal relations between the signals: in order to give insightful information on the causal structure of the system they must be properly combined with the self correlations, as prescribed for instance by the generalized FDR formalism.

Figure 5
figure 5

Lagged cross correlations for the T and [CO\(_2\)] signals, before and after applying the Gaussian filter discussed in the text.

Transfer entropy: a complementary approach

Transfer entropy was introduced by Schreiber27 as an indicator of the information which a given time-dependent signal \(x_1(t)\) provides about a second variable \(x_2(t)\). The basic idea is to measure how much information is lost about the distribution of \(x_2(t)\) when the knowledge of \(x_1(t)\) is ignored.

For a two-variable Markovian system, the TE with lag t is defined as

$$\begin{aligned} TE_{1\rightarrow 2}(t)=H[x_2|x_2](t)-H[x_2|x_1,x_2](t) \end{aligned}$$
(19)

where

$$\begin{aligned} H[y|x](t)=-\int dx\, dy\, P(x,0;y,t) \ln P(x,0;y,t)+\int dx\, P(x) \ln P(x)\, \end{aligned}$$
(20)

is the conditional Shannon entropy. Here P(x) represents the marginal x probability density functions, while P(x, 0; yt) is the joint distribution of x at time 0 and y at time t assuming stationarity. If the dynamics is linear the above relations can be simplified, as shown in Ref.43 (see Sec. 4 of the Supplemental Material for details): in particular, it can be shown that TE is a (complicated) function of two-points correlations functions.

From the point of view of the application to paleoclimatic series, the study of TE presents therefore the same difficulties encountered in the case of generalized FDR: the system under study is not Markovian, and the limited amount of data does not enable to determine a reliable functional form for the probability density functions appearing in Eq. (20). It may thus be expected that the above-discussed filtering procedure, by isolating the fast component of the correlation functions, allows to get rid of the spurious long-time-scale correlations, as in the case of the generalized FDR. This expectation seems to be confirmed by Fig. 1.

It is worth mentioning that an alternative rigorous attempt to assess causation was due to Granger44, who suggested that the link \(x_1\rightarrow x_2\) holds if the knowledge of the past history of \(x_1\) enhances the ability to predict future values of \(x_2\). Remarkably, Granger causality and TE have been shown to be equivalent in linear auto-regressive systems45.

With respect to the detection of causal links in a given system, TE and Granger’s approach can be regarded as complementary to responses. While the former focuses on our ability to predict future values of the considered process, the latter aims at defining the interaction mechanisms internal to the system30.