## Abstract

An abundance of complex dynamical phenomena exists in nature and human society, requiring sophisticated analytical tools to understand and explain them. Causal analysis of observational time series data is essential for comprehending complex systems when controlled experiments are not feasible or ethical. Although data-based causal discovery methods have been widely used, there is still a lack of direct approaches aligned with the intuitive definition of causality, i.e., whether interventions on one element lead to changes in the subsequent development of others. To solve this problem, we propose the method of intervened reservoir computing (IRC), which constructs a neural network replica of the original system and applies interventions to it. This approach enables controlled trials in digital twins of the underlying systems, allowing the intervened evolution to be observed. Simulated and real-world data are used to test our approach and demonstrate its accuracy in inferring causal networks. Given the importance of causality in understanding complex dynamics, we anticipate that IRC could serve as a powerful tool for various disciplines to decipher the intrinsic mechanisms of natural systems from observational data.

## Introduction

Causal intuition is ingrained in the nature of human thinking; its continual exploration has driven the development of modern civilization and technology. Practically, it is essential to decipher the underlying causal relationships in various complex systems, such as atmospheric systems^{1}, neurological systems^{2}, complex diseases^{3}, economic systems^{4}, ecosystems^{5}, and the earth system^{6}. Observation-only causality identification techniques are particularly important because ethical and technical constraints often make active experiments with precise interventions impossible. A number of methods have been previously developed, such as those based on Granger causality (GC)^{7}, entropy^{8}, and the theory of dynamical systems and Takens' embedding theorem^{5}. Researchers have focused on improving these three approaches, including by incorporating the latest machine learning insights. Reservoir computing Granger (RCG)^{9} uses reservoir computing to replace the autoregression in GC and considerably alleviates the false-positive and false-negative problems in causality detection. Reservoir computing causality (RCC)^{10} combines the ideas of GC and convergent cross mapping (CCM) to improve the detection of causal directions within real-world data by utilizing the sign of the relative time lag between the two sequences (representing the past or the future) at optimal predictive performance. Echo state GC (ES-GC)^{11} modifies and optimizes the reservoir computing (RC) formulation to model multivariate signal coupling and achieves internal decoupling using multiple reservoirs. However, causal discovery remains one of the most challenging problems despite long-standing attempts^{12,13}.
These methods are not fully aligned with the intuitive definition of causality, i.e., whether one thing's appearance or change leads to another's; this inconsistency introduces theoretical biases and can lead to unsatisfactory results. A feasible method based directly on such a definition is still lacking. With the continuous development of machine learning techniques, neural-network-fitted numerical systems offer new possibilities owing to their intrinsic replicability and ease of manipulation.

The successful construction of digital twins for nonlinear systems^{14,15}, their impressive predictive capabilities^{16}, and their ability to solve notoriously challenging tasks^{17} demonstrate the superior performance of RC-based neural network structures. Reservoir computing is a specialized form of recurrent neural network (RNN). It features a unique architecture with randomly generated input and hidden layers, alongside a trainable output layer. This design has proven to be highly efficient for training through least squares optimization and offers superior performance in reconstructing and predicting time series data from intricate dynamic systems. The process involves mapping the input time series into a ‘reservoir’ space, which is theoretically established to create a high-dimensional embedding, enabling a rich dynamical representation.

Here, we present an approach named intervened reservoir computing (IRC) to detect dynamical causality (DC) from time series data. It relies on a simple but direct principle: intervene in *X* and observe the resulting effects on *Y*. An RC neural network is employed to create an embedding of the target system and generate a base sequence using a standard iteration mode termed closed-loop and a sequence with intervened input using intervened-loop. The difference between the two sequences will be substantial in the presence of causal influence. Conversely, the simulated intervention operation ideally produces no effect if no causal connection exists between the two variables. The comparison of the two sequences makes it possible to distinguish between the existence and different intensities of causal influences. The accuracy of IRC is exhibited in simulated data of coupled chaotic systems and two real-world datasets. The results indicate that IRC can accurately and completely reconstruct causal networks. Moreover, the approach provides reasonable results on real-world datasets, in line with prior research and discoveries.

## Methods

### Dynamical causality

The framework of dynamical causality was recently formulated^{18}, where a general system is described by the following delayed autonomous differential equation:

Here the state at time *t* is represented by \({{{{{{{{\boldsymbol{x}}}}}}}}}^{t}={[{x}_{1}^{t},{x}_{2}^{t},\cdots ,{x}_{n}^{t}]}^{\top }\), \({{{{{{{\boldsymbol{f}}}}}}}}={[\, {f}_{1},{f}_{2},\cdots ,{f}_{n}]}^{\top }\) contains *n* smooth functions, and 0 < *θ*_{1} < *θ*_{2} < ⋯ < *θ*_{s} are *s* time delays. The discrete version of Eq. (1) is

where \(\hat{{{{{{{{\boldsymbol{F}}}}}}}}}\) denotes functions to approximate the evolution operator, {*x*^{t−1}, *x*^{t−2}, ⋯ , *x*^{t−q}} are *q* sampled points along the trajectory in [*t* − *τ*, *t*) and *τ* = *q* refers to the memory time.

The determination of causality is contingent upon whether the current effect variable depends on the historical trajectory of the causal variable. If ∃ *k* ∈ {1, 2, ⋯ , *q*}, such that \(\frac{\partial {\hat{{{{{{{{\boldsymbol{F}}}}}}}}}}_{i}}{\partial {x}_{j}^{t-k}}\ne 0\), discrete DC is defined from *x*_{j} to *x*_{i}. Otherwise, *x*_{j} has no discrete DC to *x*_{i}. Practically, both the unperturbed trajectory and the trajectory subject to the artificial intervention of *x*_{j} are indispensable for assessing the presence of such a deviation.
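To make the definition concrete, the partial-derivative condition can be checked numerically when the map is known. The sketch below (a toy illustration of our own, not the paper's procedure) uses a hypothetical two-variable map in which *x*_{2} drives *x*_{1} but not the reverse, and probes ∂F̂_{i}/∂x_{j}^{t−k} by central finite differences with *q* = 1:

```python
import numpy as np

def F(x_prev):
    # Hypothetical discrete map: x1 depends on both variables,
    # x2 depends only on itself -> DC from x2 to x1, none from x1 to x2.
    x1, x2 = x_prev
    return np.array([0.5 * x1 + 0.3 * np.tanh(x2), 0.9 * x2])

def has_dc(j, i, x, eps=1e-6):
    """Finite-difference check of dF_i/dx_j at state x (0-based indices, one lag)."""
    dx = np.zeros_like(x)
    dx[j] = eps
    deriv = (F(x + dx)[i] - F(x - dx)[i]) / (2 * eps)
    return abs(deriv) > 1e-8

x = np.array([0.4, -0.2])
print(has_dc(1, 0, x))  # x2 -> x1: True
print(has_dc(0, 1, x))  # x1 -> x2: False
```

In practice the map is unknown, which is why IRC replaces this analytic check with interventions on a trained surrogate.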

### Reservoir computing

Making a direct intervention is impossible without knowing the explicit system equations, which is almost always the case in real-world applications; machine learning techniques, however, enable the reconstruction of the underlying system. Reservoir computing serves as an efficient paradigm to fulfill this task. The improved architecture described in the following facilitates the artificial manipulation of input time series, replicating real-world interventions.

As a framework that originates from recurrent neural networks, RC functions by mapping the input to a fixed nonlinear high-dimensional computational space, i.e., a reservoir, followed by a trainable output matrix to extract features from the reservoir and map them to the output, and serves as a robust tool for time series forecasting. Assuming that a complex system is subject to sampling at a given time interval *τ*, a set of *N*_{u}-dimensional observations {*u*^{0}, *u*^{τ}, *u*^{2τ}, …} can be obtained. RC is trained to generate a predicted output \({{{{{{{{\boldsymbol{u}}}}}}}}}_{{{{{{{{\rm{p}}}}}}}}}^{t+\tau }\in {{\mathbb{R}}}^{{N}_{{{{{{{{\rm{u}}}}}}}}}}\) that closely approximates *u*^{t+τ} in response to the input \({{{{{{{{\boldsymbol{u}}}}}}}}}^{t}\in {{\mathbb{R}}}^{{N}_{{{{{{{{\rm{u}}}}}}}}}}\).

The most widely used architecture of RC is composed of three functional components: the input matrix *W*_{in} ∈ ℝ^{N_r × N_u}, the reservoir state *r*^{t} ∈ ℝ^{N_r} with state adjacency matrix **W** ∈ ℝ^{N_r × N_r}, and the output matrix *W*_{out} ∈ ℝ^{N_u × N_r}. The input matrix *W*_{in} and the state adjacency matrix **W** are randomly generated and remain fixed. The majority of the elements in *W*_{in} are zero, and only one element in each column is given a value sampled from a uniform distribution in [−*σ*_{in}, *σ*_{in}], where *σ*_{in} is the input scaling, one of the hyperparameters. The connectivity between nodes within the reservoir is characterized by the state adjacency matrix **W**, whose spectral radius is controlled by another hyperparameter *ρ*. The reservoir can be considered an Erdős–Rényi network, where the presence or absence of each edge is determined by a fixed probability; the network's connectivity is regulated by the mean degree of connectivity per node, *d*. Two steps are needed to calculate the output *u*_{p}^{t+τ}. First, the reservoir state *r*^{t+τ} is updated at each iteration as a function of the input *u*_{in}^{t} and itself, *r*^{t}, by

where \(\tanh \left({{{{{{{\boldsymbol{v}}}}}}}}\right)\) for a vector \({{{{{{{\boldsymbol{v}}}}}}}}={({v}_{1},{v}_{2},\ldots ,{v}_{{N}_{{{{{{{{\rm{u}}}}}}}}}})}^{\top }\) is defined as \({(\tanh {v}_{1},\tanh {v}_{2},\ldots ,\tanh {v}_{{N}_{{{{{{{{\rm{u}}}}}}}}}})}^{\top }\). Then, the predicted output is obtained from the reservoir through the linear readout matrix *W*_{out}, using

Studies have shown, however, that this architecture may cause the predicted trajectory to stray from the actual attractor towards a symmetric one, an issue that can be resolved by adding biases to the input and the reservoir state^{19,20,21}. Thus, we have the improved versions of Eq. (3) and Eq. (4):

where [ ⋅ ; ⋅ ] indicates stacking arrays vertically (row wise), \({{{{{{{{\boldsymbol{W}}}}}}}}}_{{{{{{{{\rm{in}}}}}}}}}\in {{\mathbb{R}}}^{{N}_{{{{{{{{\rm{r}}}}}}}}}\times ({N}_{{{{{{{{\rm{u}}}}}}}}}+1)}\) and \({{{{{{{{\boldsymbol{W}}}}}}}}}_{{{{{{{{\rm{out}}}}}}}}}\in {{\mathbb{R}}}^{{N}_{{{{{{{{\rm{u}}}}}}}}}\times ({N}_{{{{{{{{\rm{r}}}}}}}}}+1)}\).

The training process, that is, the process of adjusting the matrix *W*_{out}, is done by minimizing the normalized mean square error (NMSE) between the output \({{{{{{{{\boldsymbol{u}}}}}}}}}_{{{{{{{{\rm{p}}}}}}}}}^{t+\tau }\) and the actual observed value *u*^{t+τ}, over a training set of \({N}_{{{{{{{{\rm{tr}}}}}}}}}\) points,

where \(\left\Vert \cdot \right\Vert\) is the *L*_{2} norm and *σ*^{2} denotes the variance of *u*^{iτ}. Consider the process of minimizing NMSE as the least squares minimization problem, which can be solved as a linear equation through ridge regression

where \({{{{{{{\boldsymbol{R}}}}}}}}\in {{\mathbb{R}}}^{({N}_{{{{{{{{\rm{r}}}}}}}}}+1)\times {N}_{{{{{{{{\rm{tr}}}}}}}}}}\) and \({{{{{{{\boldsymbol{U}}}}}}}}\in {{\mathbb{R}}}^{{N}_{{{{{{{{\rm{u}}}}}}}}}\times {N}_{{{{{{{{\rm{tr}}}}}}}}}}\) are the horizontal concatenation of the updated reservoir states *r*^{iτ} and the observed value \({{{{{{{{\boldsymbol{u}}}}}}}}}^{i\tau }(i=1,2,...,{N}_{{{{{{{{\rm{tr}}}}}}}}})\)^{22}. \({{{{{{{\boldsymbol{I}}}}}}}}\in {{\mathbb{R}}}^{({N}_{{{{{{{{\rm{r}}}}}}}}}+1)\times ({N}_{{{{{{{{\rm{r}}}}}}}}}+1)}\) is the identity matrix, and *β* is a user-defined Tikhonov regularization parameter preventing overfitting to the training data which generally takes a very small value.

When forecasting sequences using a trained RC, the reservoir needs to be updated through only Eq. (5a) using data prior to the starting point to satisfy the echo state property^{23} and obtain the initial reservoir state. We call this process idle iteration.
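The pipeline of Eqs. (5)–(7) can be sketched in a few lines of NumPy. All hyperparameter values below (*N*_{r}, *σ*_{in}, *ρ*, *d*, *β*) and the toy training signal are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N_u, N_r = 3, 300                            # input and reservoir dimensions (illustrative)
sigma_in, rho, d, beta = 0.5, 0.9, 6, 1e-6   # assumed hyperparameters

# Input matrix: one nonzero entry per column, uniform in [-sigma_in, sigma_in].
W_in = np.zeros((N_r, N_u + 1))              # +1 column for the input bias
for col in range(N_u + 1):
    W_in[rng.integers(N_r), col] = rng.uniform(-sigma_in, sigma_in)

# Erdos-Renyi reservoir with mean degree d, rescaled to spectral radius rho.
W = (rng.random((N_r, N_r)) < d / N_r) * rng.uniform(-1, 1, (N_r, N_r))
W *= rho / max(abs(np.linalg.eigvals(W)))

def update(r, u, b_in=1.0):
    """Eq. (5a): next reservoir state from current state and bias-augmented input."""
    return np.tanh(W @ r + W_in @ np.append(u, b_in))

# Toy training data: a smooth multivariate signal standing in for observations.
T = 2000
ts = np.linspace(0, 40, T + 1)
U = np.stack([np.sin(ts), np.cos(1.3 * ts), np.sin(0.7 * ts)], axis=1)

r = np.zeros(N_r)
R, Y = [], []
for t in range(T):
    r = update(r, U[t])
    R.append(np.append(r, 1.0))              # reservoir state stacked with output bias
    Y.append(U[t + 1])
R, Y = np.array(R).T, np.array(Y).T          # shapes (N_r+1, T) and (N_u, T)

# Eq. (7): ridge regression for the readout matrix.
W_out = Y @ R.T @ np.linalg.inv(R @ R.T + beta * np.eye(N_r + 1))
pred = W_out @ R
nmse = np.mean((pred - Y) ** 2) / Y.var()
print(nmse)                                  # small for this smooth toy signal
```

A short warm-up segment before the test starting point would play the role of the idle iteration described above.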

### Intervened reservoir computing

Describing the causal relationships within a complex system as a causal network, the basic idea of IRC is that eliminating the influence of one node will lead to a (no) change in the subsequent evolutionary trajectory of another node if a causal link does (does not) exist. To compare whether it leads to a change, control and experimental results need to be generated using closed-loop and intervened-loop, respectively, as described later in this section. The two loop modes use the same RC parameters, i.e., *W*_{in}, **W**, and *W*_{out}, where *W*_{out} is obtained by standard RC training directly using the observational time series as in Eq. (7).

#### Closed-loop

To compare the changes in subsequent evolution resulting from an intervention, an unintervened baseline sequence should first be generated as a control group. Its generation can be considered a simple forecasting task and is accomplished through a self-iterative process called the closed-loop, shown in the blue dashed rectangle in Fig. 1. The process of generating the unintervened sequence \({\{{{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\rm{CL}}}}}}}}}^{t}\}}_{t = 0}^{{N}_{{{{{{{{\rm{evo}}}}}}}}}}\) using the closed-loop can be described as:

where \({{{{{{{{\boldsymbol{r}}}}}}}}}_{{{{{{{{\rm{CL}}}}}}}}}^{t}\) denotes the reservoir state at time *t* under closed-loop and the input used to update the reservoir is the output of the previous iteration. The starting point of the sequence is randomly selected among the testing data and denoted as *t* = 0, followed by *N*_{idle} idle iterations. The last result of the idle iteration serves as the initial state of the closed-loop.

#### Intervened-loop

The key to designing the structure of the intervened neural network, as well as the identification of causal influence (from node *x*_{j} to node *x*_{i}, for example), is to minimize the difference between the two predicted sequences in the absence of causal influence while maximizing the difference in the presence of causal influence. Based on this idea, we present the intervened-loop to simulate the elimination of the causal effect from *x*_{j} to *x*_{i} and to acquire intervened sequences, as shown in the loop process boxed by the red dashed box in Fig. 1. To implement the intervened-loop, it is necessary to employ two RC structures called aiding structure (AS) and intervened structure (IS), whose reservoir states are relatively independent of each other. The intervened structure has an input with node *x*_{j} artificially set to zero, denoting an intervention of eliminating its influence, and the output provided to the final result retains only node *x*_{i}. The aiding structure requires an unaltered input and the output provided to the final result keeps all elements except node *x*_{i}. Precisely, obtaining the intervened sequence \({\{{{{{{{{{\boldsymbol{x}}}}}}}}}_{j\to i}^{t}\}}_{t = 0}^{{N}_{{{{{{{{\rm{evo}}}}}}}}}}\) through the intervened-loop can be described as:

where **0**_{k} denotes a vector whose *k*-th dimension is zero and all other dimensions are one, **1**_{k} denotes a vector whose *k*-th dimension is one and all other dimensions are zero, ⊗ denotes element-wise product, \({{{{{{{{\boldsymbol{r}}}}}}}}}_{j\to i,{{{{{{{\rm{AS}}}}}}}}}^{t}\) and \({{{{{{{{\boldsymbol{r}}}}}}}}}_{j\to i,{{{{{{{\rm{IS}}}}}}}}}^{t}\) denote the reservoir state of aiding structure and intervened structure at time *t*, respectively, and *j* → *i* indicates that the sequence is obtained by eliminating the causal influence from node *x*_{j} to node *x*_{i}. In the intervened structure, \({{{{{{{{\bf{0}}}}}}}}}_{j}\otimes {{{{{{{{\boldsymbol{x}}}}}}}}}_{j\to i}^{t}\) achieves the evolution of the system unaffected by node *x*_{j} via adjusting the *j-*th dimension of the input to zero. And \({{{{{{{{\bf{1}}}}}}}}}_{i}\otimes {{{{{{{{\boldsymbol{W}}}}}}}}}_{{{{{{{{\rm{out}}}}}}}}}[{{{{{{{{\boldsymbol{r}}}}}}}}}_{j\to i,{{{{{{{\rm{IS}}}}}}}}}^{t};{b}_{{{{{{{{\rm{out}}}}}}}}}]\) restricts the result of the intervened structure to be used only to produce the *i-*th dimension of the current iteration. Thus, the intervened-loop simulates the artificial elimination and only the elimination of causal influence directed from node *x*_{j} to node *x*_{i}, regardless of its actual existence.

The intervened-loop provides distinguishable results for different cases with or without causation. The two structures share the same network parameters *W*_{in}, **W**, *W*_{out}, and the initial reservoir state with the closed-loop, i.e., *r*_{j→i,AS}^{0} = *r*_{j→i,IS}^{0} = *r*_{CL}^{0}. Ideally, when there is no causal influence from node *x*_{j} to node *x*_{i}, the outputs of both structures, *W*_{out} *r*_{j→i,AS}^{t} and *W*_{out} *r*_{j→i,IS}^{t}, should yield highly similar results for node *x*_{i}. In this case, the intervened-loop is considered equivalent to the closed-loop, thus ensuring the minimization of the difference produced by the intervened-loop and the closed-loop. When the causal influence exists, *x*_{i} in the result of the first iteration should undergo a shift of a certain magnitude compared to the original trajectory. In the following iterations, the impact on *x*_{i} due to the artificial intervention will accumulate, while the slight change in *x*_{i} leads to subsequent changes in other variables. After sufficient iterations, the sequence generated by the intervened-loop will exhibit considerable inconsistency in comparison to that from the closed-loop, which comes from the elimination of the causal influence from node *x*_{j} to node *x*_{i} at each iteration.
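The wiring of the two loops can be sketched as follows. This is a minimal illustration of the masking mechanics of Eqs. (8) and (9) only: the readout here is random rather than trained, so the generated trajectories carry no dynamical meaning, and all sizes and weights are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N_u, N_r = 3, 100
W_in = rng.uniform(-0.5, 0.5, (N_r, N_u + 1))
W = rng.uniform(-1, 1, (N_r, N_r))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
# Untrained random readout: only the loop wiring is demonstrated here;
# a real run would train W_out on observations first (Eq. (7)).
W_out = rng.uniform(-0.1, 0.1, (N_u, N_r + 1))

step = lambda r, u: np.tanh(W @ r + W_in @ np.append(u, 1.0))
readout = lambda r: W_out @ np.append(r, 1.0)

def closed_loop(r0, x0, n_evo):
    r, x, traj = r0.copy(), x0.copy(), [x0.copy()]
    for _ in range(n_evo):
        r = step(r, x)
        x = readout(r)
        traj.append(x)
    return np.array(traj)

def intervened_loop(r0, x0, j, i, n_evo):
    """Eliminate the influence of node j on node i, as in Eq. (9)."""
    mask0_j = np.ones(N_u); mask0_j[j] = 0.0    # 0_j: zero the j-th input dimension
    mask1_i = np.zeros(N_u); mask1_i[i] = 1.0   # 1_i: keep only the i-th output dimension
    r_as, r_is = r0.copy(), r0.copy()           # aiding / intervened structures
    x, traj = x0.copy(), [x0.copy()]
    for _ in range(n_evo):
        r_as = step(r_as, x)                    # AS sees the unaltered input
        r_is = step(r_is, mask0_j * x)          # IS sees input with x_j zeroed
        x = (1 - mask1_i) * readout(r_as) + mask1_i * readout(r_is)
        traj.append(x)
    return np.array(traj)

r0 = np.zeros(N_r)
x0 = rng.uniform(-1, 1, N_u)
base = closed_loop(r0, x0, 10)
interv = intervened_loop(r0, x0, j=1, i=0, n_evo=10)
print(np.abs(base - interv).max())
```

Note that after the first iteration only dimension *i* differs between the two loops; in later iterations the deviation propagates to the other dimensions through the feedback.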

### Reconstruction of causal networks

Predicted intervened sequences need to be generated and compared with the unintervened sequence for each pair of nodes (*x*_{i}, *x*_{j}), *i*, *j* = 1, 2, . . . , *N*, *i* ≠ *j* to reconstruct the entire causal network. Considering that the consequences of removing the causal influence of node *x*_{j} are closely related to the absolute value of *x*_{j}, to fairly compare the trajectory differences between pairs of nodes, the trajectory deviation indices **TDI** = [TDI_{j→i}] are calculated to minimize the influence of varying *x*_{j} values for all pairs of nodes through the following equation,

where \({x}_{{{{{{{{\rm{CL}}}}}}}},k}^{t}\) denotes the *k*-th dimension of \({{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\rm{CL}}}}}}}}}^{t}\) and \({x}_{j\to i,k}^{t}\) denotes the *k*-th dimension of \({{{{{{{{\boldsymbol{x}}}}}}}}}_{j\to i}^{t}\). A filtering process is used by setting a minimum threshold value *j*_{threshold} for \({\bar{x}}_{j}\) to maintain the stability of the values and to prevent the detrimental effects caused by dividing by a value too close to zero.
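Since Eq. (10) itself is not reproduced above, the sketch below implements one plausible reading of the index (mean absolute deviation of node *x*_{i} between the two sequences, normalized by the mean magnitude of *x*_{j}, with the threshold guard); it should be checked against the paper's actual formula:

```python
import numpy as np

def tdi(x_cl, x_int, i, j, j_threshold=1e-3):
    """Trajectory deviation index for the pair j -> i.

    x_cl, x_int: arrays of shape (N_evo + 1, N) from the closed-loop and the
    intervened-loop. This is an assumed reading of Eq. (10): mean deviation of
    node i, normalized by the mean magnitude of node j.
    """
    dev = np.mean(np.abs(x_int[:, i] - x_cl[:, i]))
    x_j_bar = np.mean(np.abs(x_cl[:, j]))
    # Guard against dividing by a value too close to zero.
    return dev / max(x_j_bar, j_threshold)

# Toy check: an intervention that shifts node 0 yields a larger TDI than none.
rng = np.random.default_rng(2)
base = rng.normal(size=(11, 3))
shifted = base.copy()
shifted[:, 0] += 0.5                      # pretend the intervention moved node 0
print(tdi(base, shifted, i=0, j=1))       # positive
print(tdi(base, base, i=0, j=1))          # exactly 0.0
```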

The whole procedure is repeated for multiple RCs with different initial states to guarantee the reliability of the detected results. Pseudo-code for the full procedure can be found in Algorithm 1.

### Algorithm 1

**Reconstruction of the causal network**

1: Generate simulated data (or load real-world data) as \({\{{{{{{{{{\boldsymbol{u}}}}}}}}}^{t}\}}_{t = 1}^{{N}_{{{{{{{{\rm{tr}}}}}}}}}}\)

2: **for** net = 1 to *N*_{net} **do**

3: Initialize the RC and obtain *W*_{in}, *W*

4: Train the RC using \({\{{{{{{{{{\boldsymbol{u}}}}}}}}}^{t}\}}_{t = 1}^{{N}_{{{{{{{{\rm{tr}}}}}}}}}}\)

5: **for** rep = 1 to *N*_{rep} **do**

6: Select a test starting point randomly

7: Execute the idle iterations and obtain the initial reservoir state *r*^{0} using Eq. (5a)

8: Generate sequence \({\{{{{{{{{{\boldsymbol{x}}}}}}}}}_{{{{{{{{\rm{CL}}}}}}}}}^{t}\}}_{t = 0}^{{N}_{{{{{{{{\rm{evo}}}}}}}}}}\) using Eq. (8)

9: **for** each pair of nodes (*x*_{i}, *x*_{j}) **do**

10: \({{{{{{{{\boldsymbol{r}}}}}}}}}_{j\to i,{{{{{{{\rm{AS}}}}}}}}}^{0}\leftarrow {{{{{{{{\boldsymbol{r}}}}}}}}}^{0}\), \({{{{{{{{\boldsymbol{r}}}}}}}}}_{j\to i,{{{{{{{\rm{IS}}}}}}}}}^{0}\leftarrow {{{{{{{{\boldsymbol{r}}}}}}}}}^{0}\)

11: Generate sequence \({\{{{{{{{{{\boldsymbol{x}}}}}}}}}_{j\to i}^{t}\}}_{t = 0}^{{N}_{{{{{{{{\rm{evo}}}}}}}}}}\) using Eq. (9)

12: Calculate TDI_{j→i} using Eq. (10)

13: **end for**

14: Obtain **TDI**

15: **end for**

16: **end for**

17: Calculate the average of **TDI**

## Results and discussion

### Benchmark dataset: networks of coupled Lorenz systems

To validate the IRC method, we consider a network of *N* nodes, each of which is a Lorenz system with slightly different parameters. For the *k*-th node, its internal dynamics can be described by:

where *c* is a constant controlling the strength of the coupling, set to 0.3, and the value of \({a}_{kl}^{\left(x,y\right)}\) depends on the existence and weight of a connection from *l* to *k* in the network: 0 for no connection, 0.5 for a moderate connection, and 1 for a strong connection. The adjacency matrix \({{{{{{{\boldsymbol{A}}}}}}}}=[{a}_{ij}^{(x,y)}](i,j=1,2,...,N)\) represents the connectivity within the network and is used as the ground truth to judge the accuracy of our method. The parameter *h*_{k}, randomly chosen from the interval \(\left[-h,h\right]\) with *h* set to 0.06, is the source of heterogeneity among nodes. Independent white noise terms \({\varepsilon }_{k,d}^{t}\,(d=x,y,z)\) with zero mean and standard deviation *σ*_{dyn} = 10^{−4} are added to represent dynamical noise. In addition, the data are normalized and corrupted with observational noise before being made available for training. To be precise, for the *k*-th node,

where *Q*_{1,k}, *Q*_{2,k}, *Q*_{3,k} are the 25th, 50th, and 75th percentile of the corresponding data \(\{{{{{{{{{\boldsymbol{u}}}}}}}}}_{k}^{t}={({x}_{k}^{t},{y}_{k}^{t},{z}_{k}^{t})}^{\top }\}\), independent white noise term \({{{{{{{{\boldsymbol{\varepsilon }}}}}}}}}_{{{{{{{{\rm{obs}}}}}}}},k}^{t}\) with zero mean and standard deviation *σ*_{obs} = 10^{−4} is added representing the observational noise, and \(\{{{{{{{{{\boldsymbol{u}}}}}}}}}_{{{{{{{{\rm{obs}}}}}}}},k}^{t}\}\,(k=1,2,\cdots \,,N)\) is the time series that is actually used to train the RC.
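A sketch of this benchmark generation in NumPy. The exact coupled Lorenz equations and the normalization formula appear in the paper's (elided) displayed equations; the coupling form, the placement of the heterogeneity *h*_{k}, and the quartile-based normalization below are therefore our assumptions:

```python
import numpy as np

def simulate_lorenz_network(A, h, c=0.3, dt=0.002, n_steps=20000, seed=0):
    """Euler-Maruyama integration of N coupled, heterogeneous Lorenz nodes.

    A[k, l] is the coupling weight from node l to node k. Coupling into the
    x-equation and heterogeneity on the sigma parameter are assumed forms.
    """
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    sigma, r_lor, b = 10.0, 28.0, 8.0 / 3.0
    h_k = rng.uniform(-h, h, N)               # node heterogeneity
    x = rng.uniform(-1, 1, (N, 3))
    out = np.empty((n_steps, N, 3))
    for t in range(n_steps):
        dx = sigma * (1 + h_k) * (x[:, 1] - x[:, 0]) + c * A @ x[:, 0]
        dy = x[:, 0] * (r_lor - x[:, 2]) - x[:, 1]
        dz = x[:, 0] * x[:, 1] - b * x[:, 2]
        noise = rng.normal(0, 1e-4, (N, 3))   # dynamical noise, sigma_dyn = 1e-4
        x = x + dt * np.stack([dx, dy, dz], axis=1) + noise
        out[t] = x
    return out

def robust_normalise(u, sigma_obs=1e-4, seed=1):
    """Quartile-based normalization plus observational noise (one plausible
    reading of the preprocessing: center by Q2, scale by Q3 - Q1)."""
    rng = np.random.default_rng(seed)
    q1, q2, q3 = np.percentile(u, [25, 50, 75], axis=0)
    return (u - q2) / (q3 - q1) + rng.normal(0, sigma_obs, u.shape)

A = np.array([[0.0, 1.0], [0.0, 0.0]])        # single directed edge 2 -> 1
traj = simulate_lorenz_network(A, h=0.06)
obs = robust_normalise(traj.reshape(traj.shape[0], -1))
print(obs.shape)  # (20000, 6)
```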

We start with a primary case with only one directed edge to verify whether the intervened-loop produces a different trajectory compared to the closed-loop in the presence of a causal relation. An adjacency matrix

is used, which represents a network containing two nodes, 1 and 2, and a directed edge from 2 to 1. Figure 2 illustrates the difference between the results produced by the two loops in the absence and presence of causal links. It is worth noting that when performing the IRC approach on the pair of nodes with no causation from 1 to 2, the closed-loop and intervened-loop give highly similar evolutionary results. On the contrary, in the presence of causation, the result of the intervened-loop quickly deviates from that of the closed-loop. The calculated **TDI** gives a precise characterization of the observed trajectory deviation, showing that it is significantly larger when causation exists than in the reversed direction (no causation).

To demonstrate the capability of IRC in detecting causality in larger networks and differentiating the underlying causal strength, we generate adjacency matrices **A** with the number of nodes *N* = 5, 10, 20, whose elements are randomly assigned one of three possible weights, {0, 0.5, 1}, representing no influence, moderate influence, and strong influence, respectively. A representative network structure for *N* = 20 is shown in Fig. 3a. The detected TDI_{j→i} is plotted according to the underlying causal weight, i.e., the value of \({a}_{ij}^{(x,y)}\), as presented in Fig. 3b, showing significant distinctions among the three categories for all sizes of the considered networks. Furthermore, for the sake of robustness, we randomly generate ten adjacency matrices for each *N* and present all their results in Fig. 4. IRC demonstrates its effectiveness in accurately distinguishing the existence of causal links, that is, whether \({a}_{ij}^{(x,y)}\) is zero or not, while retaining good discrimination between moderate and strong influence. Three representative methods are used for comparison, namely GC^{7}, transfer entropy (TE)^{8}, and CCM^{5}. Considering the inability of the comparative approaches to discover causal strength between different influence weights, moderate and strong influences are both treated as existence of causation. We consider causal network reconstruction as a task of classifying whether there is a causal influence between each pair of nodes and use receiver operating characteristic (ROC) curves to demonstrate the results, as shown in the internal subplots of Fig. 3b. The IRC method presents excellent classification accuracy, as the corresponding area under the receiver operating characteristic (AUROC) reaches one, providing a notable advantage over the existing methods.
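Treating reconstruction as binary classification of off-diagonal links, the AUROC can be computed directly from the rank statistic, without any plotting library. The TDI matrix below contains illustrative stand-in numbers, not the paper's results:

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank (Mann-Whitney) formulation: the probability that a
    randomly chosen positive pair scores higher than a negative one."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    # Count wins and half-count ties of positives over negatives.
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Stand-in TDI matrix for a 4-node network against a ground-truth adjacency.
A_true = np.array([[0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 0],
                   [1, 0, 0, 0]])
TDI = np.array([[0.00, 0.31, 0.02, 0.01],
                [0.01, 0.00, 0.27, 0.02],
                [0.02, 0.01, 0.00, 0.01],
                [0.18, 0.02, 0.01, 0.00]])
mask = ~np.eye(4, dtype=bool)                 # exclude self-links
print(auroc(TDI[mask], A_true[mask] > 0))     # 1.0: scores perfectly separated
```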

### Real-world dataset 1: El Niño-Southern Oscillation time series

El Niño is a phenomenon in the equatorial Pacific Ocean characterized by five consecutive overlapping 3-month running means of sea surface temperature (SST) anomalies of at least +0.5 °C in the Niño 3.4 region. Visit https://www.ncei.noaa.gov/access/monitoring/enso/sst for a detailed description and the source of the maps used in Fig. 5a.

El Niño-Southern Oscillation (ENSO) is a recurring climate pattern across the tropical Pacific that shifts back and forth irregularly every two to seven years. It brings predictable shifts in ocean surface temperature and disrupts wind and rainfall patterns across the tropics. Exploring the ENSO time series can help us understand the patterns and trends of this climate phenomenon. By analyzing the data over time, scientists can better predict the occurrence and intensity of future El Niño and La Niña events. This information can be used to prepare for and mitigate the impacts of these events on weather patterns, agriculture, fisheries, and other sectors.

As depicted in Fig. 5a, the ENSO phenomenon is studied by monitoring the SST in four regions: Nino 1+2, Nino 3, Nino 3.4, and Nino 4. In this experiment, plain monthly area-averaged SST data calculated from HadISST1 for these regions are collected from January 1870 to December 2022, yielding a total of 1836 × 4 samples from the National Oceanic and Atmospheric Administration (NOAA) website, https://psl.noaa.gov/gcos_wgsp/Timeseries/, and the IRC approach is used to reconstruct the causal relationships among these four sets of time series data. An interpolation method is applied to improve the smoothness of the data.

During normal conditions, trade winds blow west along the equator in the Pacific Ocean, carrying warm water from the eastern Pacific to the western Pacific. In some years, however, during El Niño, the trade winds weaken, and warm surface water is pushed back east, toward the west coast of South America. La Niña has the opposite effect: trade winds become even stronger than usual, pushing more warm water toward Asia. Thus, a bidirectional causal chain between neighboring Niño regions is formed. Recent research also attributes this causal chain to the co-varying atmospheric Walker circulation and oceanic thermocline feedback and explains how these subareas' SSTs are coupled using physical equations^{24}. The reconstructed causal network and its visualization are presented in Fig. 5a, where cells with TDI_{j→i} greater than 0.02 are boxed. Intuitively, each region has two causal connections back and forth with its neighboring regions. Despite the non-constant direction of flow at the ocean surface, our result is consistent with a previous study^{25} and matches well with the conclusions of existing research on El Niño.

### Real-world dataset 2: air pollution and hospital admission of cardiovascular diseases in Hong Kong

Clean air is considered a fundamental requirement for human health and well-being. The World Health Organization (WHO) has published air quality guidelines covering particulate matter, ozone, nitrogen dioxide, and sulfur dioxide. Studies have shown that exposure to present-day concentrations of ambient particulate matter consistently increases the risk of cardiovascular events^{26}. Recorded data on air pollution and hospital admission of cardiovascular diseases in Hong Kong from 1994 to 1997 are collected^{27,28,29} as the second real-world time series example. Similar to ENSO, the data is interpolated.

According to the IRC approach, NO_{2} and respirable suspended particulates (RSP) exhibited stronger causal effects on cardiovascular diseases (CVDs) compared to SO_{2} and O_{3}. The reconstructed network is shown in Fig. 5b, with the threshold set to 0.01. This conclusion is highly consistent with previous research findings^{30,31,32}.

### Further experiments

Recent studies have shown that the indirect chain effect and the common driver effect, both of which induce spurious causality, are critical factors affecting the reconstruction of multivariate causal networks^{33}. We use coupled Lorenz systems with the following two causal networks to more intuitively validate the ability of IRC to identify spurious causality and determine the direction of causal loops, as well as to discuss the effect of different parameters on the TDI results:

A visualization is shown in Fig. 6a.

#### Choice of a suitable *N*_{evo}

Overall, the mean and standard deviation of the TDI_{j→i} with causation increase as *N*_{evo} increases, while those of the TDI_{j→i} without causation remain stable, providing a larger possible range of feasible thresholds. But when observational noise exists, there is a proper range of *N*_{evo} values that is not excessively small or large to support the satisfactory performance of IRC, as shown in Fig. 6b, c. The baseline prediction error increases when the observation noise is large, at which point a smaller number of iterations (*N*_{evo} = 1, 2, 3, 4) is not effective in distinguishing between the deviation caused by the intervention and the error from the prediction itself, which are both on the order of 10^{−4}. This also illustrates the necessity of enhancing the influence of interventions through intervened-loop design. A larger number of iterations is also not preferred since larger standard deviations create difficulties in distinguishing different coupling strengths within a single system on the one hand, and the accumulation of prediction errors is more likely to lead to extreme values (*N*_{evo} = 17, 18, 19, 20) on the other hand. In calculating the **TDI** in Figs. 2, 3, 4, 5a, b, *N*_{evo} is set to 20, 10, 10, 5, and 4, respectively.

#### The relationship between TDI_{j→i} and causal strength

Intuitively, a larger coupling strength produces a larger influence from each intervention and, eventually, a larger trajectory deviation. The relationship between the coupling strength *c* and TDI_{j→i} for a prediction length *N*_{evo} of 7 is shown in Fig. 6d, exhibiting a high positive correlation between TDI_{j→i} (with causation) and *c*. From these results, it is reasonable to regard TDI_{j→i} as a relative confidence in the existence of a causal link and a measure of its strength within a system. We also use a t-test to examine whether TDI values differ significantly between pairs of nodes with and without a causal relation, with *p*-value results shown in Table 1.

#### Robustness to noise

We consider the effects of dynamical noise with level *σ*_{dyn} ranging from 0 to 1.4 and observational noise with level *σ*_{obs} ranging from 0 to 0.06 separately. Experiments show that the scale of dynamical noise has a negligible effect on IRC, whereas observational noise gradually degrades performance as its level increases, as shown in Fig. 6e, f. However, IRC loses its efficacy only when the observational noise is significantly large. This is expected: excessive observational noise obscures the dynamics of the unknown system, which is exactly what IRC tries to reproduce in the digital system, whereas dynamical noise does not.

#### What time series length is sufficient

Results show that IRC performance improves with increasing data length and gradually stabilizes, consistent with intuition. Meanwhile, we find the sufficient length to be closely related to the sampling interval *τ*. In other words, the time span of the sequence needs to be long enough to present the entire dynamics of the unknown system. When *τ* is 0.01, a training length \({N}_{{{{{{{{\rm{tr}}}}}}}}}\) of about 4000 is needed to obtain the desired result, while when *τ* is 0.02, the required length is halved to about 2000, as shown in Fig. 6g, h. On the other hand, also consistent with intuition, TDI is related to *τ*, since TDI portrays the changes resulting from the elimination of causality. Therefore, absolute values of TDI are not comparable across systems on different time scales.

#### Application to discrete-time systems

We use coupled Logistic maps to generate simulated data, whose *k*-th node is described by:

where *c* = 0.3 is the coupling constant, the adjacency matrix **A** = [*a*_{ij}] (*i*, *j* = 1, 2, …, *N*) represents the connectivity within the network, and \({\varepsilon }_{k}^{t}\) is independent white noise with zero mean and standard deviation 0.005. We generate 100 sets of time series with length \({N}_{{{{{{{{\rm{tr}}}}}}}}}=5000\) in each of the following two scenarios:

and *γ*_{1} = 3.7, *γ*_{2} = 3.72, *γ*_{3} = 3.78. For each pair of nodes, TDI_{j→i} is categorized as with or without causal links based on the ground truth *a*_{ij}. Although RC is not the preferred tool for fitting discrete systems, the IRC approach still gives fairly correct classifications with small prediction lengths *N*_{evo}. "Correct" here means that, in a single simulation, the minimum TDI_{j→i} over all causally linked node pairs is greater than the maximum over unlinked pairs. The results are shown in Fig. 7a, b, with *p*-values and the numbers of correct simulations given in the subtitles and **A** in the lower left.
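A data generator for this benchmark can be sketched as below. The exact coupling form of the map did not survive extraction here, so one common diffusive variant is assumed: *x*_{k}^{t+1} = *γ*_{k} *x*_{k}^{t}(1 − *x*_{k}^{t}) + *c* Σ_{j} *a*_{kj}(*x*_{j}^{t} − *x*_{k}^{t}) + *ε*_{k}^{t}, with *c* = 0.3 and noise standard deviation 0.005 as stated in the text.

```python
# Hedged sketch of coupled-Logistic-map data generation (assumed coupling form).
import random

def simulate(A, gammas, c=0.3, n_tr=5000, noise_std=0.005, seed=0):
    rng = random.Random(seed)
    n = len(A)
    x = [rng.uniform(0.1, 0.9) for _ in range(n)]
    series = []
    for _ in range(n_tr):
        nxt = []
        for k in range(n):
            drive = sum(A[k][j] * (x[j] - x[k]) for j in range(n))
            val = gammas[k] * x[k] * (1 - x[k]) + c * drive + rng.gauss(0, noise_std)
            nxt.append(min(max(val, 0.0), 1.0))  # keep the map on [0, 1]
        x = nxt
        series.append(x)
    return series

# A small 3-node chain 1 -> 2 -> 3 with the gammas given in the text
A = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
data = simulate(A, gammas=[3.7, 3.72, 3.78])
print(len(data), len(data[0]))
```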

Moreover, for a strong autocorrelation case, we use the following model:

where the autocorrelation *α*_{k} is set to 0.8, the coupling strength *c* is set to 0.3, **A** = [*a*_{ij}] (*i*, *j* = 1, 2, …, *N*) represents the connectivity within the network, and \({\varepsilon }_{k}^{t}\) is independent white noise with zero mean and standard deviation 0.5. Similar to the Logistic map, in scenarios *A*_{3} and *A*_{4}, as shown in Fig. 7c, the IRC approach yields accurate and distinguishable TDI_{j→i} values that are consistent with the ground truth of causality between nodes, free from spurious causality despite the strong autocorrelation.
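The strong-autocorrelation benchmark can likewise be sketched. Since the model equation was lost in extraction, a standard linear autoregressive form is assumed: *x*_{k}^{t+1} = *α*_{k} *x*_{k}^{t} + *c* Σ_{j} *a*_{kj} *x*_{j}^{t} + *ε*_{k}^{t}, with *α*_{k} = 0.8, *c* = 0.3, and noise standard deviation 0.5 as stated in the text.

```python
# Hedged sketch of the strong-autocorrelation data generation (assumed form).
import random

def simulate_ar(A, alpha=0.8, c=0.3, n_tr=5000, noise_std=0.5, seed=0):
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n
    series = []
    for _ in range(n_tr):
        # comprehension reads the old x before the new state is assigned
        x = [alpha * x[k] + c * sum(A[k][j] * x[j] for j in range(n))
             + rng.gauss(0, noise_std) for k in range(n)]
        series.append(x)
    return series

series = simulate_ar([[0, 0], [1, 0]])  # node 1 drives node 2
print(len(series))
```

With *α* = 0.8 each node strongly predicts its own future, which is exactly the regime where purely predictive criteria risk spurious detections.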

#### Choice of other parameters

To make the trained RC reproduce the dynamics of the target system as faithfully as possible, it is necessary to choose suitable hyperparameters. We use K-fold cross-validation and grid search^{19} on Lorenz system datasets to determine *σ*_{in} and *ρ*. In continuous (discrete) system experiments, *σ*_{in} is set to 0.5 (1). *ρ* = 0.9, *b*_{in} = *b*_{out} = 1, *β* = 1 × 10^{−4}, and *d* = 2 are used in all experiments.
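The selection loop can be sketched as follows. `fit_and_score` is a hypothetical stand-in for training an RC on the training folds and measuring its validation error; here a toy quadratic loss whose minimum is placed at (*σ*_{in}, *ρ*) = (0.5, 0.9) merely illustrates the K-fold grid-search mechanics.

```python
# Hedged sketch of grid search over sigma_in and rho with K-fold CV.
def fit_and_score(sigma_in, rho, train_idx, val_idx):
    # placeholder loss: pretend the best setting is sigma_in=0.5, rho=0.9
    return (sigma_in - 0.5) ** 2 + (rho - 0.9) ** 2

def grid_search_kfold(sigma_grid, rho_grid, n_samples=100, k=5):
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    best, best_err = None, float("inf")
    for s in sigma_grid:
        for r in rho_grid:
            err = 0.0
            for i in range(k):                     # rotate the validation fold
                val = folds[i]
                train = [j for f in range(k) if f != i for j in folds[f]]
                err += fit_and_score(s, r, train, val)
            err /= k                               # mean validation error
            if err < best_err:
                best, best_err = (s, r), err
    return best

print(grid_search_kfold([0.1, 0.5, 1.0], [0.7, 0.9, 1.1]))  # -> (0.5, 0.9)
```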

As for the reservoir size *N*_{r}, RCs fitting higher-dimensional complex systems require larger reservoirs. We allocate *N*_{rpd} reservoir nodes per dimension, i.e., *N*_{r} = *N*_{u}*N*_{rpd}. For the coupled Lorenz system, an *N*_{rpd} of about 40 works well enough, as shown in Fig. 6i.

Practically, we use *N*_{net} RCs with different initializations, each evaluated from *N*_{rep} randomly selected starting positions in the test dataset, and average all results into the final TDI, thereby reducing the influence of the reservoir's internal structure and of the starting positions on the predicted sequences. Larger *N*_{net} and *N*_{rep} give more stable IRC outcomes but also entail higher computational cost. In calculating the TDI in Figs. 2, 3, 4, 5a, b, *N*_{net} is set to 1, 5, 5, 25, and 30, respectively, and *N*_{rep} is set to 20, 20, 30, 20, and 15, respectively.
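The averaging scheme can be sketched directly. `tdi_once` is a hypothetical stand-in for one full intervened-prediction run of a single reservoir from a single starting position; here it simply returns a noisy toy TDI value to mimic run-to-run variation.

```python
# Hedged sketch of averaging TDI over N_net reservoirs and N_rep start points.
import random

def tdi_once(net_id, start_pos, rng):
    # toy noisy TDI estimate standing in for one intervened-prediction run
    return 0.01 + rng.gauss(0, 0.002)

def averaged_tdi(n_net=5, n_rep=20, test_len=1000, seed=0):
    rng = random.Random(seed)
    total, count = 0.0, 0
    for net in range(n_net):            # different reservoir initializations
        for _ in range(n_rep):          # different random starting positions
            start = rng.randrange(test_len)
            total += tdi_once(net, start, rng)
            count += 1
    return total / count

print(f"{averaged_tdi():.4f}")
```

Averaging over `n_net * n_rep` runs shrinks the standard error of the estimate by the square root of that product, which is the stabilizing effect described above.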

## Conclusion

Unlike existing methods (including Granger causality, transfer entropy, convergent cross mapping, etc.) that detect causality by exploiting relationships among multiple time series, intervened reservoir computing (IRC), though similarly using sequence forecasting as a tool, is fundamentally different. First, IRC implements the idea of controlled trials by constructing digital twins of the underlying systems to detect causality. Granger causality is based on the principle that if *X* causes *Y*, then incorporating *X*’s past information improves the prediction of *Y*’s future. This principle requires a separability condition on the time series that is usually not satisfied in real-world nonlinear systems, a limitation addressed by the convergent cross-mapping method based on phase space reconstruction. Convergent cross mapping confirms the existence of causation from *X* to *Y* when the neighbors of a point in the reconstructed manifold *M*_{Y} remain, under the cross-map, neighbors of the point with the same time index in *M*_{X}. Our IRC method, in contrast, admits active intervention and confirms that *X* causes *Y* if an intervention on *X* results in a subsequent change in *Y*’s future, an idea completely different from both Granger causality and convergent cross mapping. Moreover, Granger causality is implemented by comparing two different autoregressive predictors, ranging from the original linear predictors to more recent neural network predictors, one of which includes the information of the cause while the other does not; IRC is instead implemented by observing the deviation of a single predictor that fully replicates the target system when an intervention is imposed, again a design distinct from that of Granger causality.

To summarize, inspired by the idea of controlled trials, which lies at the center of the causal principle, and taking advantage of the ease of manipulation that reservoir computing offers in replicating real-world complex systems, we devise the IRC framework, allowing for accurate causal discovery. The core of this framework is the criterion that causality is inferred when a manual intervention on one variable leads to a change in the evolution of another. While reservoir computing (RC) keeps the computational requirements low, IRC as a flexible framework focuses on comparing intervention effects on a digital system, so its neural network component can in theory be replaced on demand. Experiments on simulated and real-world datasets validate IRC’s ability to achieve accurate detection results despite complex network topologies. The results show that IRC not only detects correct causal directions but also provides valuable information about the underlying causal strengths. The application of IRC to real-world systems indicates that it can serve as an essential tool across various disciplines for exploring causal mechanisms.

Future work can extend IRC to detect hidden nodes and to perform causal discovery in the presence of hidden variables in real-world complex systems. Practically, some limitations remain for the IRC framework. For example, the application of IRC presupposes that RC or other neural networks can reproduce the dynamics of the target system; the time complexity of computing the trajectory deviation indices grows with the square of the network size; dense networks are difficult to infer accurately; and, as with many existing methods, the choice of thresholds remains an imperfectly solved problem. Despite these limitations, IRC provides innovative concepts and tools that empower us to further explore the evolutionary mechanisms of nature, deepening our understanding of its intrinsic principles.

## Data availability

All datasets generated in this study can be reproduced using the publicly available code at https://github.com/Jintong-CNS/Intervened-Reservoir-Computing. The links/references for the public datasets used and analyzed during the current study are all provided in the main text.

## Code availability

The code used in this study is freely available in the public GitHub repository at https://github.com/Jintong-CNS/Intervened-Reservoir-Computing.

## References

1. Wang, Q. Multifractal characterization of air polluted time series in China. *Physica A* **514**, 167–180 (2019).
2. Hamilton, J. P., Chen, G., Thomason, M. E., Schwartz, M. E. & Gotlib, I. H. Investigating neural primacy in major depressive disorder: multivariate Granger causality analysis of resting-state fMRI time-series data. *Mol. Psychiatry* **16**, 763–772 (2011).
3. Zhong, J. et al. Uncovering the pre-deterioration state during disease progression based on sample-specific causality network entropy (SCNE). *Research* **7**, 0368 (2024).
4. Dritsakis, N. Tourism as a long-run economic growth factor: an empirical investigation for Greece using causality analysis. *Tour. Econ.* **10**, 305–316 (2004).
5. Sugihara, G. et al. Detecting causality in complex ecosystems. *Science* **338**, 496–500 (2012).
6. Runge, J. et al. Inferring causation from time series in earth system sciences. *Nat. Commun.* **10**, 2553 (2019).
7. Granger, C. W. Investigating causal relations by econometric models and cross-spectral methods. *Econometrica* **37**, 424–438 (1969).
8. Schreiber, T. Measuring information transfer. *Phys. Rev. Lett.* **85**, 461 (2000).
9. Wang, M. & Fu, Z. A new method of nonlinear causality detection: reservoir computing Granger causality. *Chaos Solitons Fractals* **154**, 111675 (2022).
10. Huang, Y., Fu, Z. & Franzke, C. L. Detecting causality from time series in a machine learning framework. *Chaos* **30**, 063116 (2020).
11. Duggento, A., Guerrisi, M. & Toschi, N. Echo state network models for nonlinear Granger causality. *Philos. Trans. R. Soc. A* **379**, 20200256 (2021).
12. Yang, L., Lin, W. & Leng, S. Conditional cross-map-based technique: from pairwise dynamical causality to causal network reconstruction. *Chaos* **33**, 063101 (2023).
13. Ying, X. et al. Continuity scaling: a rigorous framework for detecting and quantifying causality accurately. *Research* **2022**, 9870149 (2022).
14. Kong, L.-W., Weng, Y., Glaz, B., Haile, M. & Lai, Y.-C. Reservoir computing as digital twins for nonlinear dynamical systems. *Chaos* **33**, 033111 (2023).
15. Duan, X.-Y. et al. Embedding theory of reservoir computing and reducing reservoir network using time delays. *Phys. Rev. Res.* **5**, L022041 (2023).
16. Chen, P., Liu, R., Aihara, K. & Chen, L. Autoreservoir computing for multistep ahead prediction based on the spatiotemporal information transformation. *Nat. Commun.* **11**, 4568 (2020).
17. Tong, Y. et al. Earthquake alerting based on spatial geodetic data by spatiotemporal information transformation learning. *Proc. Natl Acad. Sci. USA* **120**, e2302275120 (2023).
18. Shi, J., Chen, L. & Aihara, K. Embedding entropy: a nonlinear measure of dynamical causality. *J. R. Soc. Interface* **19**, 20210766 (2022).
19. Racca, A. & Magri, L. Robust optimization and validation of echo state networks for learning chaotic dynamics. *Neural Netw.* **142**, 252–268 (2021).
20. Huhn, F. & Magri, L. Learning ergodic averages in chaotic systems. In *International Conference on Computational Science*, 124–132 (Springer, 2020).
21. Lu, Z. et al. Reservoir observers: model-free inference of unmeasured variables in chaotic systems. *Chaos* **27**, 041102 (2017).
22. Lukoševičius, M. & Jaeger, H. Reservoir computing approaches to recurrent neural network training. *Comput. Sci. Rev.* **3**, 127–149 (2009).
23. Yildiz, I. B., Jaeger, H. & Kiebel, S. J. Re-visiting the echo state property. *Neural Netw.* **35**, 1–9 (2012).
24. Fang, X., Dijkstra, H., Wieners, C. & Guardamagna, F. A nonlinear full-field conceptual model for ENSO diversity. *J. Clim.* **37**, 3759–3774 (2024).
25. Ma, D., Ren, W. & Han, M. A two-stage causality method for time series prediction based on feature selection and momentary conditional independence. *Physica A* **595**, 126970 (2022).
26. Franklin, B. A., Brook, R. & Pope III, C. A. Air pollution and cardiovascular disease. *Curr. Probl. Cardiol.* **40**, 207–238 (2015).
27. Lee, B.-J., Kim, B. & Lee, K. Air pollution exposure and cardiovascular disease. *Toxicol. Res.* **30**, 71–75 (2014).
28. Wong, T. W. et al. Air pollution and hospital admissions for respiratory and cardiovascular diseases in Hong Kong. *Occup. Environ. Med.* **56**, 679–683 (1999).
29. Fan, J. & Zhang, W. Statistical estimation in varying coefficient models. *Ann. Stat.* **27**, 1491–1518 (1999).
30. Milojevic, A. et al. Short-term effects of air pollution on a range of cardiovascular events in England and Wales: case-crossover analysis of the MINAP database, hospital admissions and mortality. *Heart* **100**, 1093–1098 (2014).
31. Leng, S. et al. Partial cross mapping eliminates indirect causal influences. *Nat. Commun.* **11**, 2632 (2020).
32. Chen, F. & Li, C. Inferring structural and dynamical properties of gene networks from data with deep learning. *NAR Genomics Bioinforma.* **4**, lqac068 (2022).
33. Runge, J., Nowack, P., Kretschmer, M., Flaxman, S. & Sejdinovic, D. Detecting and quantifying causal associations in large nonlinear time series datasets. *Sci. Adv.* **5**, eaau4996 (2019).

## Acknowledgements

This work is sponsored by the National Key R&D Program of China (No. 2022YFC2704604), the National Natural Science Foundation of China (No. 12101133), and “Chenguang Program” supported by Shanghai Education Development Foundation and Shanghai Municipal Education Commission (No. 20CG01). This work is also sponsored by Science and Technology Commission of Shanghai Municipality (Nos. 2021SHZDZX0103 and 21DZ1201402).

## Author information

### Authors and Affiliations

### Contributions

S.Y.L. conceived the idea; C.G., J.F.S., and S.Y.L. designed the research; J.T.Z. performed the research; J.T.Z., Z.X.G., R.X.H., C.G., J.F.S., and S.Y.L. analyzed the data and wrote the paper.

### Corresponding authors

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Peer review

### Peer review information

*Communications Physics* thanks Rui Liu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Supplementary information

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Zhao, J., Gan, Z., Huang, R. *et al.* Detecting dynamical causality via intervened reservoir computing.
*Commun Phys* **7**, 232 (2024). https://doi.org/10.1038/s42005-024-01730-6

