Distinguishing time-delayed causal interactions using convergent cross mapping

An important problem across many scientific fields is the identification of causal effects from observational data alone. Recent methods (convergent cross mapping, CCM) have made substantial progress on this problem by applying the idea of nonlinear attractor reconstruction to time series data. Here, we expand upon the technique of CCM by explicitly considering time lags. Applying this extended method to representative examples (model simulations, a laboratory predator-prey experiment, temperature and greenhouse gas reconstructions from the Vostok ice core, and long-term ecological time series collected in the Southern California Bight), we demonstrate the ability to identify different time-delayed interactions, distinguish between synchrony induced by strong unidirectional-forcing and true bidirectional causality, and resolve transitive causal chains.


Two-Species Model System with Bidirectional Causality
We generalize the simple model system consisting of 2 coupled logistic difference equations as follows: [Eq. S1] where τ d is the time delay for the effect of x on y. For each simulation run, we sample a new fixed value for the growth rates R x and R y from the uniform distribution (3.7, 3.9), as well as new values for the interaction coefficients A xy and A yx from the uniform distribution (0.05, 0.1). In addition each simulation is initialized with random starting points with x(1) and y(1) drawn from the uniform distribution (0.01, 0.99), and run for 3000 time steps. For each of the different values for the time delay: τ d = 0, τ d = 2, and τ d = 4, we ran a total of 500 simulations (when populations reached negative values or increased beyond carrying capacity, we sampled new coefficients and re-ran the simulation). Using extended CCM, we analyze each simulation using E = 2, τ = 1, The results are depicted in Figure S1, with boxplots for the value of the cross map lag (l) that gives the highest cross map skill (ρ). Because nearly all simulations had identical values for the optimal cross map lag (l), the boxplots are depicted as straight lines with just a few outliers.
As expected, "y xmap x" (red), depicting the causal effect of y on x has an optimal cross map lag of l = -1, because the y affects x with a lag of 1 time step (y(t) influences x(t+1)). Conversely, the optimal cross map lag for "x xmap y" (blue) changes depending on τ d ; this is also expected since xmap y" appears to accurately recover the time delay parameter τ d : for example, the optimal l is nearly always -3 when τ d = 2 (meaning x(t) influences y(t+3) and therefore it takes 3 time steps for y to respond to x).

Two-Species Model System with Synchrony
We also generalize the modified form of the above system that produces synchrony with strong forcing from x to y only: For each simulation, R x is sampled from the uniform distribution (3.7, 3.9), R y is sampled from the uniform distribution (2.5, 3.2), and A yx is sampled from the uniform distribution (0.7, 0.9). As above, the system is initialized with random starting points with x(1) and y(1) drawn from the uniform distribution (0.01, 0.99), and run for 3000 time steps. We ran a total of 500 simulations Results for the "generalized synchrony" model are shown in Figure S2, with boxplots showing the value of the cross map lag (l) that gives the highest cross map skill (ρ). Again, we see that the optimal cross map lag (l) is generally negative in the direction of true causality (red, "y xmap x") and positive in the direction of synchrony (blue, "x xmap y").

Four-Species Model System
To test the robustness of extended CCM in distinguishing between direct and indirect causality, we generalize the 4-species model system with a transitive causal chain: For each simulation, the growth parameters are sampled as follows: R 1 is drawn from the uniform distribution (3.8, 4.0), R 2 and R 3 are both drawn from the uniform distribution (3.5, 3.7), and R 4 is drawn from the uniform distribution (3.7, 3.9). The interaction parameters A 21 , A 32 , and A 43 are all drawn from the uniform distribution (0.3, 0.5). As above, the system is initialized with random starting points with each y i (1) drawn from the uniform distribution (0.01, 0.99), and run for 3000 time steps. We ran a total of 500 simulations (when populations reached negative values or increased beyond carrying capacity, we sampled new coefficients and re-ran the simulation).
Using extended CCM, we analyze each simulation using E = 4, τ = 1, selecting a random library Results for this analysis are shown in Figure S3, with bagplots (1) depicting the bivariate boxplots for the optimal cross map lags (l) and corresponding cross map skill (ρ). As in Figure 3, the top row of panel b shows that the optimal cross map lags are close to 0 and show high cross map skill, as would be expected for these direct interactions. In contrast, the indirect interactions generally have optimal cross map lags that are more negative, and lower cross map skill, with the most indirect interaction (from y 1 to y 4 , identified using y 4 xmap y 1 ) showing the most negative cross map lag and the lowest cross map skill. We note that the variance in cross map skill is quite high, indicating that it may not be as useful in separating direct from indirect interactions in real systems, whereas cross map lag shows clearer separation. Boxplots of optimal cross map lag (l) are shown for 500 random simulations of the 2-species logistic model with bidirectional causality and with three different time delays, τ d . Except for a few outliers, the optimal cross map lag when using x to cross map y (blue, "x xmap y") is -1, as would be expected, because x responds to y within a single time step. In the opposite direction, a larger time delay (τ d ) in the effect of x on y results in larger negative values for the optimal cross map lag when using y to cross map x (red, "y xmap x"). Boxplots of optimal cross map lag (l) are shown for 500 random simulations of the 2-species logistic model with unidirectional causality producing generalized synchrony. Except for a few outliers, the optimal cross map lag when using y to cross map x (red, "y xmap x") is negative, and positive in the opposite direction (blue, "x xmap y"). This is expected, because x has a true causal influence on future values of y, mean y is better at cross mapping to past values of x; conversely, the lack of an actual effect of y on x, but rather "generalized synchrony" means that x is better at cross mapping future values of y. (A) In this system, y 1 causes y 2 causes y 3 causes y 4 such that indirect causation from y 1 to y 3 , y 2 to y 4 , and y 1 to y 4 occurs. (B) Bagplots show the optimal cross map lag (l) and corresponding cross map skill (ρ) for 500 random simulations of this system. The white central area depicts the 95% confidence interval for the median value, while the darker colored region is the "bag" containing the central 50% of points (i.e., similar to an interquartile range), and the lighter colored region is