Model-free inference of direct network interactions from nonlinear collective dynamics

The topology of interactions in network dynamical systems fundamentally underlies their function. Accelerating technological progress makes data about collective nonlinear dynamics in physical, biological, and technological systems massively available. Detecting direct interaction patterns from those dynamics still constitutes a major open problem. In particular, current nonlinear dynamics approaches mostly require a priori knowledge of a model of the (often high-dimensional) system dynamics. Here we develop a model-independent framework for inferring direct interactions solely from recordings of the nonlinear collective dynamics generated. Introducing an explicit dependency matrix in combination with a block-orthogonal regression algorithm, the approach works reliably across many dynamical regimes, including transient dynamics toward steady states, periodic and non-periodic dynamics, and chaos. Together with its capabilities to reveal network (two-point) as well as hypernetwork (e.g., three-point) interactions, this framework may thus open up nonlinear dynamics options for inferring direct interaction patterns across systems where no model is known.


Supplementary Note 1 Algorithm for Revealing Network Interactions (ARNI)
We decompose units' dynamics into interaction terms with other units in the network as

$$\dot{x}_i = \sum_{j=1}^{N} g_{ij}(x_j) + \sum_{j=1}^{N}\sum_{s=1}^{N} g_{ijs}(x_j, x_s) + \sum_{j=1}^{N}\sum_{s=1}^{N}\sum_{w=1}^{N} g_{ijsw}(x_j, x_s, x_w) + \cdots, \qquad (1)$$

where $g_{ij}\colon \mathbb{R} \to \mathbb{R}$, $g_{ijs}\colon \mathbb{R}^2 \to \mathbb{R}$, $g_{ijsw}\colon \mathbb{R}^3 \to \mathbb{R}$ and, in general, $g_{ij_1 j_2 \ldots j_K}\colon \mathbb{R}^K \to \mathbb{R}$ represent interactions from units $j_k$ to unit $i$ up to the $K$-th order, such that $k \in \{1, 2, \ldots, K\}$. Specifically, decomposition (1) separates contributions to units' dynamics arising from different orders, e.g., pairwise and higher-order interactions with other units in the system. Thus, it may be employed to infer network connectivity. Here, reconstructing the network connectivity amounts to identifying which interactions in (1) capture the network dynamics best. However, given that $f_i$ and $\Lambda_i$ are unknown, the functions $g_{ij_1 j_2 \ldots j_K}$ are not directly accessible. Still, one may expand these functions in terms of basis functions as

$$g_{ij_1 j_2 \ldots j_k}(x_{j_1}, x_{j_2}, \ldots, x_{j_k}) = \sum_{p=1}^{P_k} c_{ij_1 j_2 \ldots j_k,\,p}\, h_p(x_{j_1}, x_{j_2}, \ldots, x_{j_k}), \qquad (2)$$

where $P_k$ indicates the number of basis functions employed in the expansion of order $k$. The rationale behind (2) is to express the unknown functions $g_{ij_1 j_2 \ldots j_K}$ as linear combinations of simpler, known functions [1].
Defining the vectors of basis functions $h_j(x_j) := \left(h_1(x_j), h_2(x_j), \ldots, h_{P_1}(x_j)\right) \in \mathbb{R}^{P_1}$ (3) and coefficients $c_{ij} := \left(c_{ij,1}, c_{ij,2}, \ldots, c_{ij,P_1}\right) \in \mathbb{R}^{P_1}$, and analogously for all expansions in Supplementary Equation (2), we may rewrite (2) as

$$\dot{x}_i = \sum_{j=1}^{N} c_{ij} \cdot h_j(x_j) + \sum_{j=1}^{N}\sum_{s=1}^{N} c_{ijs} \cdot h_{js}(x_j, x_s) + \cdots. \qquad (4)$$

Now, let us assume that we are given a high-dimensional time series of a network, $x_{i,m} = x_i(t_m)$ for all $i \in \{1, 2, \ldots, N\}$, where $t_m = m\Delta t + t_0$ is the sampling time for $m \in \{1, 2, \ldots, M\}$ and $\dot{x}_{i,m} = \dot{x}_i(t_m)$ is the time derivative of unit $i$ computed (e.g., through finite differences) at $t_m$. We may then rewrite Supplementary Equation (4) as the linear system

$$\dot{X}_i = c_i H, \qquad (5)$$

where $\dot{X}_i := \left(\dot{x}_{i,1}, \dot{x}_{i,2}, \ldots, \dot{x}_{i,M}\right) \in \mathbb{R}^{1 \times M}$, $c_i$ concatenates all coefficient vectors $c_{ij}$, $c_{ijs}$, $c_{ijsw}$ (and so on), $H$ vertically stacks the matrix blocks

$$H_j := \begin{pmatrix} h_1(x_{j,1}) & \cdots & h_1(x_{j,M}) \\ \vdots & & \vdots \\ h_{P_1}(x_{j,1}) & \cdots & h_{P_1}(x_{j,M}) \end{pmatrix} \in \mathbb{R}^{P_1 \times M}, \qquad (6)$$

and matrices $H_{js}$ and $H_{jsw}$ are constructed in a similar manner to (6). The reconstruction problem then becomes identifying the non-zero interaction terms in Supplementary Equation (5) that best capture the unit's dynamics. Yet, the coefficient vectors $c_{ij}$, $c_{ijs}$, $c_{ijsw}$ (and so on) are unknown, thereby hindering the retrieval of $\Lambda_i$ from (5). Solving the linear system (5) consists of finding block-structured solutions for $c_i$. These structured solutions are composed of blocks of non-zero entries (representing the non-zero interactions acting on unit $i$) distributed along $c_i$, cf. Supplementary Figure 1. However, such solutions cannot be retrieved in standard manners (e.g., Moore-Penrose pseudo-inversion or minimization of the $L_1$-norm, cf. [2]). Thus, solving systems of this kind requires more sophisticated methods. Specifically, given a linear system

$$y = cD, \qquad (7)$$

where $y \in \mathbb{R}^{1 \times M}$ is a vector of $M$ measurements and $D \in \mathbb{R}^{Nd \times M}$ is a matrix divided into $N$ matrix blocks $D_j \in \mathbb{R}^{d \times M}$ for all $j \in \{1, 2, \ldots, N\}$, we may define a solution structured in blocks for $c$ as [3,4,5,6,7]

$$c = \left(c_1, c_2, \ldots, c_N\right), \qquad (8)$$

where $c_j \in \mathbb{R}^d$ for all $j \in \{1, 2, \ldots, N\}$ represent the blocks. Recent efforts on solving systems (7) subject to block-structured solutions (8) have mainly focused on block-sparse¹ solutions of under-determined systems ($Nd \gg M$) [3,4,5,6,7], cf. Supplementary Figure 1. Such approaches (e.g.
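For concreteness, the construction of the blocks $H_j$ in (6) from a measured time series may be sketched as follows. This is an illustration only: the monomial basis $h_p(x) = x^p$ and the function name `build_blocks` are our choices, not prescribed by the method.

```python
import numpy as np

def build_blocks(x, P=5):
    """Construct candidate blocks H_j for the linear system X_dot_i = c_i H.

    x : array of shape (N, M) -- time series of N units at M sample times.
    Returns a list of N blocks, each of shape (P, M): row p of H_j holds the
    basis function h_p evaluated on the series of unit j. As an illustrative
    choice we use a monomial basis h_p(x) = x**p.
    """
    N, M = x.shape
    return [np.vstack([x[j] ** p for p in range(1, P + 1)]) for j in range(N)]
```

Blocks for higher-order interactions $H_{js}$, $H_{jsw}$ would be built the same way from basis functions of two or three time series at once.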
group lasso [3], sparse regression using mixed norms [5], block orthogonal matching pursuit [6]) take advantage of the indeterminacy of system (7) to iteratively construct a block-sparse solution from a family of solutions of (7), employing specific heuristics (e.g., minimizing $\|c_j\|_2$ for $j \in \{1, 2, \ldots, N\}$ while maximizing the number of zero blocks [4]). Here, we developed the Algorithm for Revealing Network Interactions (ARNI), which reverse-engineers the non-zero interactions in Supplementary Equation (5) while relying solely on measured time series of $x$. Specifically, ARNI: (i) constructs function spaces for each interaction using the known basis functions and measured time series, (ii) finds the best-suited set of interactions that reproduces the measured $\dot{x}_i$ under a squared-error-minimization scheme, and (iii) returns the explicit connections $\Lambda_i$ of the unit. Formally, ARNI is based on the Block Orthogonal Least Squares (BOLS) algorithm [9]. In contrast to the methods described in [3,4,5,6,7], ARNI recovers connections by minimizing orthogonal projection errors on sequentially enlarged function spaces generated by the basis functions, cf. Supplementary Figures 2 and 3. Thus, ARNI iteratively finds the composite function space that best captures the recorded dynamics. Conceptually, the algorithm consists of the following steps:

1. Select a generic model for interactions by choosing the orders $K$, $P_1$, $P_2$, $P_3$ and so on, in (2).

2. Select models for the basis functions $h_p(x_j)$, $h_p(x_j, x_s)$ and so on, cf. Supplementary Note 6.

3. Compute the vectors $\dot{X}_i$ and the matrix blocks $H_j$, $H_{js}$, $H_{jsw}$ and so on from the measured time series, cf. (6).

4. Relabel the candidate blocks as $H_1, H_2, \ldots, H_Q$, where $H_{N+j} := H_{jj}$, $H_{N+N^2+j} := H_{jjj}$ and so on, and $Q$ is the total number of candidate blocks, indexed by $\mathcal{N} := \{1, 2, \ldots, Q\}$.

5. Initialize the set of inferred incoming interactions as $L_{i,0} := \emptyset$ and the iteration counter as $l := 1$.

6. Compute $\forall\, n \in \mathcal{N} \setminus L_{i,(l-1)}$ the projection-error function

$$\hat{\epsilon}_i^{\,n}(l) := \sigma\!\left( \dot{X}_i \left( I - \hat{H}_n^{\dagger} \hat{H}_n \right) \right),$$

where $\hat{H}_n$ vertically stacks the blocks $H_r$ for $r \in L_{i,(l-1)} \cup \{n\}$, $\sigma(y)$ stands for the standard deviation of the entries of $y$, and $\dagger$ indicates the Moore-Penrose pseudo-inverse.

7. Select the candidate $n$ minimizing $\hat{\epsilon}_i^{\,n}(l)$, set $L_{i,l} := L_{i,(l-1)} \cup \{n\}$, and denote by $\epsilon_i(l)$ the corresponding residual $\dot{X}_i ( I - \hat{H}_n^{\dagger} \hat{H}_n )$.

8. If the standard deviation $\sigma_l := \sigma(\epsilon_i(l)) \leq \theta$, STOP the algorithm and RETURN $L_{i,(l-1)}$ as the set of incoming interactions.

9. Otherwise, set $l := l + 1$ and REPEAT from steps 6 and 7 until the condition in step 8 is satisfied or $\mathcal{N} \setminus L_{i,(l-1)} = \emptyset$.

¹ In the weak definition we follow in this work, a sparse vector consists of an array containing only a few entries different from zero.

The projection-error function in step 6 differs from the cost function introduced in the main manuscript. Also, for the condition $\mathcal{N} \setminus L_{i,(l-1)} = \emptyset$, the algorithm has a complexity of $O(Q^2)$.
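The steps above can be sketched as a greedy projection routine. The following is a simplified illustration of the BOLS-style selection, not the authors' reference implementation: the function name, the stopping details, and the toy data in the usage are our assumptions.

```python
import numpy as np

def arni_select(x_dot_i, blocks, theta=1e-3):
    """Greedy block-orthogonal selection (sketch of the ARNI/BOLS core).

    x_dot_i : shape (M,)        -- measured derivatives of unit i
    blocks  : list of (P, M)    -- candidate blocks H_n, one per interaction
    theta   : stopping threshold on the std of the projection residual
    Returns the ordered list of selected block indices.
    """
    selected, remaining = [], set(range(len(blocks)))
    while remaining:
        errors = {}
        for n in remaining:
            # stack already-selected blocks together with candidate n
            Hn = np.vstack([blocks[r] for r in selected + [n]])
            # orthogonal projection of x_dot_i onto the row space of Hn,
            # via the Moore-Penrose pseudo-inverse
            proj = x_dot_i @ np.linalg.pinv(Hn) @ Hn
            errors[n] = np.std(x_dot_i - proj)
        best = min(errors, key=errors.get)   # candidate with minimal error
        selected.append(best)
        remaining.discard(best)
        if errors[best] <= theta:            # residual small enough: stop
            break
    return selected
```

Each pass enlarges the function space by one candidate block, projects $\dot{x}_i$ onto it, and keeps the candidate yielding the smallest residual standard deviation, mirroring steps 6 to 9.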

Supplementary Note 2 Summary of Scaling Behavior
In what follows, we briefly summarize how the approach scales with different intrinsic features of the networked systems, specifically the system size, the number of incoming connections per unit, and the noise level in the time series. We also explain a key aspect of the algorithmic complexity of our approach. As shown in the main manuscript, the number $M_{0.95}$ of measurements required to achieve reconstructions with AUC > 0.95 scales ...
• sublinearly, likely logarithmically, $\sim \log(N)$ (see Figure 3a in the main manuscript), with the number $N$ of dynamical variables in the network (which equals the number of units if they exhibit one-dimensional dynamics). Similar results have been reported in [10,2].
• linearly, $\sim n_i$ (see Figure 3b in the main manuscript), with the number $n_i$ of incoming connections per unit.
• supralinearly (see Figure 3c in the main manuscript) with the noise strength $\eta$.
Concerning the algorithmic complexity of our approach, if $Q$ is the total number of proposed expansions on the right-hand side of Supplementary Equation (5), the total number of projections for the worst-case condition $\mathcal{N} \setminus L_{i,(l-1)} = \emptyset$ is given by $\sum_{l=1}^{Q} (Q - l + 1) = Q(Q+1)/2$. Therefore, in the worst case, where all $Q$ expansions are needed to approximate the right-hand side of Supplementary Equation (5) or the threshold $\theta$ is too small, our algorithm has a complexity of $O(Q^2)$.
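The quadratic complexity can be checked by counting projections directly: in the worst case the algorithm runs through all $Q$ iterations, and at iteration $l$ it evaluates one projection per candidate not yet selected, i.e., $Q - l + 1$ of them. A minimal check of this count (the function name is ours):

```python
def worst_case_projections(Q):
    # iteration l tests each of the Q - l + 1 candidates not yet selected
    return sum(Q - l + 1 for l in range(1, Q + 1))
```

The total is the triangular number $Q(Q+1)/2$, hence $O(Q^2)$.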

Supplementary Note 3 Quantifying quality of reconstruction
Our reconstruction problem may be seen as a binary classification problem where one has to identify whether network interactions are active or not. Thus, if an interaction is active and it is classified as active, it is counted as a true positive, yet if it is classified as inactive, it is counted as a false negative. Conversely, if the interaction is inactive and it is classified as inactive, it is counted as a true negative, but if it is classified as active, it is counted as a false positive.

True positives and false positives rates
We measure the predictive power of our approach using the true positive rate (TPR) and the false positive rate (FPR). The TPR quantifies the fraction of active interactions that were correctly identified as active out of all active interactions (actual interactions) present in the dynamics. Analogously, the FPR quantifies the fraction of inactive interactions that were incorrectly identified as active out of all inactive interactions available. Both quantities are defined as

$$\mathrm{TPR} = \frac{\text{true positives}}{\text{total number of positives}}, \qquad \mathrm{FPR} = \frac{\text{false positives}}{\text{total number of negatives}}.$$
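These definitions translate directly into code; below is a minimal sketch operating on boolean adjacency matrices (the function name and data layout are our illustrative choices):

```python
import numpy as np

def tpr_fpr(pred, truth):
    """True/false positive rates from boolean adjacency matrices (True = active)."""
    tp = np.sum(pred & truth)     # active, classified as active
    fn = np.sum(~pred & truth)    # active, classified as inactive
    fp = np.sum(pred & ~truth)    # inactive, classified as active
    tn = np.sum(~pred & ~truth)   # inactive, classified as inactive
    return tp / (tp + fn), fp / (fp + tn)
```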

Receiver Operating Characteristics (ROC) curve
A ROC curve is a two-dimensional graph in which TPR is plotted against FPR. Its purpose is to depict the trade-off between true positives and false positives of a given classifier [11]. Depending on its internal parameters, the classifier may yield better or worse results. Thus, ROC curves provide a graphical way to assess the predictive power of such classifiers with respect to their internal parameters, cf. Supplementary Figure 4a. Ideally, one seeks points in the ROC space as close to the upper-left corner as possible; points there represent classifiers with high TPR and low FPR. Thus, classifiers with ROC scores close to the upper-left corner are more likely to correctly predict true positives than those close to the diagonal, provided that they are applied to a test data set that has similar dynamical properties (stationarity) and is comparable in length to the data set used to evaluate the ROC score. The diagonal line TPR = FPR represents a random-guessing classifier, cf. Supplementary Figure 4a. Thus, any classifier that appears below this line performs worse than random guessing; conversely, any classifier that appears above it performs better than random guessing.

Area Under the Curve (AUC) score
Still, one may want to characterize the predictive power of a classifier by a single scalar variable rather than by interpreting a two-dimensional graph. A broadly used method is to measure the Area Under the ROC Curve, also known as the AUC score [11]. As its name indicates, the AUC score quantifies the area below the ROC curve of a given classifier; thereby, the AUC score lies in [0, 1], cf. Supplementary Figure 4b.
The AUC score provides an easy way to compare the predictive power of different classifiers. The higher the AUC score of a classifier is, the higher its predictive power. Also, the AUC score is equivalent to the probability that the classifier will rank (in a list of probable interactions) a randomly-selected active interaction higher than a randomly-selected inactive interaction [11].
In our implementation, we track the evolution of the minimal projection error $\min_{n \in \mathcal{N} \setminus L_{i,(l-1)}} \hat{\epsilon}_i^{\,n}(l)$ for each iteration $l$, cf. Supplementary Figure 3. Then, we compute the ROC curve by (i) systematically varying a threshold $\beta$ on the sequence of minimal projection errors, and (ii) calculating the TPR and FPR for each $\beta$. In particular, given a specific $\beta$, we consider as correct interactions those inferred in all iterations whose minimal projection error is above $\beta$. Finally, we compute the AUC score of the resulting curve.
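The ranking interpretation of the AUC score mentioned above gives a compact way to compute it directly from per-interaction scores, without tracing the full curve. A sketch (the function name and toy scores are ours; higher score means more confidently active):

```python
import numpy as np

def auc_score(scores, truth):
    """AUC via its ranking interpretation: the probability that a randomly
    chosen active interaction receives a higher score than a randomly chosen
    inactive one (ties count one half)."""
    pos = scores[truth]            # scores of truly active interactions
    neg = scores[~truth]           # scores of truly inactive interactions
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```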

Supplementary Note 4 Random network model systems
To test our framework, we generated high-dimensional time series by simulating directionally coupled networks of dynamical systems. The resulting time series were used as inputs for reconstructing network structural connections by means of ARNI. To mimic real-world recordings, we sampled time series with a sampling interval of $\Delta t = 1$ for all simulations (except where stated otherwise). Nonetheless, we note that simulations were computed with time steps $\delta t \ll \Delta t$.

Ensembles inspired by gene regulatory circuits
To test the predictive power of our approach on systems mimicking gene regulation, we simulated networks of dynamical systems exhibiting Michaelis-Menten kinetics [12,13], with $n_i$ randomly-selected incoming connections per node. Here, the entries $J_{ij}$ of $J \in \mathbb{R}^{N \times N}$ are given by $J = R \circ A$, where $\circ$ stands for the entry-wise matrix product, $A \in \{0,1\}^{N \times N}$ is a randomly generated adjacency matrix with a fixed number $n_i$ of non-zero entries per row, and $R \in \mathbb{R}^{N \times N}$ has entries $R_{ij}$ randomly drawn from a uniform distribution on $[1, 5]$.
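The simulation setup can be illustrated as follows. Note that the specific Michaelis-Menten-type coupling $\dot{x}_i = -x_i + \sum_j J_{ij}\, x_j/(1 + x_j)$, the integration scheme, and all parameter values below are our assumptions for illustration; only the construction $J = R \circ A$ with $R_{ij} \sim U[1,5]$ follows the description above.

```python
import numpy as np

def simulate_mm(N=10, n_i=3, steps=2000, dt=0.01, seed=0):
    """Euler integration of a random network with Michaelis-Menten-type
    kinetics (assumed form: x_i' = -x_i + sum_j J_ij * x_j / (1 + x_j))."""
    rng = np.random.default_rng(seed)
    A = np.zeros((N, N))
    for i in range(N):                              # n_i incoming links per node
        A[i, rng.choice(N, size=n_i, replace=False)] = 1.0
    J = rng.uniform(1.0, 5.0, size=(N, N)) * A      # J = R ∘ A, R_ij ~ U[1, 5]
    x = rng.random(N)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * (-x + J @ (x / (1.0 + x)))
        traj.append(x.copy())
    return np.array(traj)                           # shape (steps + 1, N)
```

In practice one would integrate with a fine internal step $\delta t$ and then subsample at $\Delta t = 1$, as stated above.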

Networks and Hypernetworks of Kuramoto-like oscillators
To introduce functional complexity in the coupling functions, we selected a model of Kuramoto-like oscillators with coupling functions having two Fourier modes [14], cf. Supplementary Equation (17), with $n_i$ randomly-selected incoming connections per node. Here, the entries $J_{ij}$ of $J \in \mathbb{R}^{N \times N}$ are given by $J = R \circ A$, where $A \in \{0,1\}^{N \times N}$ is a randomly generated adjacency matrix with a fixed number $n_i$ of non-zero entries per row and $R \in \mathbb{R}^{N \times N}$ has entries $R_{ij}$ randomly drawn from a uniform distribution on $[0.5, 1]$. The natural frequencies $\omega_i$ are randomly selected from a uniform distribution on $[-2, 2]$, and $\xi_i$ represents external white noise affecting the dynamics of unit $i$. To characterize the selectivity of ARNI between network and hypernetwork interactions, we simulated extended models of (17) including second-order interactions, with $n_i$ randomly-selected incoming interactions per node. Differently from (17), here we introduce the second-order interaction matrix $E^i \in \mathbb{R}^{N \times N}$ for all $i \in \{1, 2, \ldots, N\}$. Specifically, the element $E^i_{jk}$ weighs how units $j$ and $k$ together directly influence unit $i$. Thus, it follows that the elements $E^i_{ik}$ for all $k \in \{1, 2, \ldots, N\}$ represent regular network interactions. The entries of $E^i$ are set by $E^i = R \circ A$, where $A \in \{0,1\}^{N \times N}$ is a randomly generated matrix with a fixed number $n_i$ of 1's across the matrix, and the entries $R_{ij}$ of $R \in \mathbb{R}^{N \times N}$ are randomly drawn from a uniform distribution on $[0.5, 1]$.
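A pairwise version of this setup may be sketched as below. The exact two-mode coupling function of (17) is not reproduced here; the form $\sin(d) + 0.3\sin(2d)$ with $d = x_j - x_i$, the noise amplitude, and all other parameters are illustrative assumptions. Only $J = R \circ A$ with $R_{ij} \sim U[0.5, 1]$ and $\omega_i \sim U[-2, 2]$ follow the description above.

```python
import numpy as np

def simulate_kuramoto(N=10, n_i=3, steps=5000, dt=0.005, noise=0.05, seed=1):
    """Euler-Maruyama integration of Kuramoto-like phase oscillators with an
    assumed two-Fourier-mode coupling sin(d) + 0.3*sin(2d), d = x_j - x_i."""
    rng = np.random.default_rng(seed)
    A = np.zeros((N, N))
    for i in range(N):                              # n_i incoming links per node
        A[i, rng.choice(N, size=n_i, replace=False)] = 1.0
    J = rng.uniform(0.5, 1.0, size=(N, N)) * A      # J = R ∘ A, R_ij ~ U[0.5, 1]
    omega = rng.uniform(-2.0, 2.0, size=N)          # natural frequencies
    x = rng.uniform(0.0, 2.0 * np.pi, size=N)
    traj = [x.copy()]
    for _ in range(steps):
        d = x[None, :] - x[:, None]                 # d[i, j] = x_j - x_i
        drift = omega + (J * (np.sin(d) + 0.3 * np.sin(2.0 * d))).sum(axis=1)
        x = x + dt * drift + np.sqrt(dt) * noise * rng.normal(size=N)
        traj.append(x.copy())
    return np.array(traj)
```

The $\sqrt{dt}$ scaling of the noise term implements the external white noise $\xi_i$ in the Euler-Maruyama sense.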

Networks of Rössler oscillators
To test the robustness of our approach against chaotic dynamics, we simulated networks of coupled Rössler oscillators [15]. The dynamics of each oscillator is governed by three coupled differential equations, where $n_i$ is the number of incoming connections to unit $i$. As before, the entries of $J$ are given by $J = R \circ A$. Similarly, $\xi_i^k$ with $k \in \{1, 2, 3\}$ represent external noisy signals acting on the unit's components.
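A noise-free sketch of such a network is given below. The standard Rössler parameters $(a, b, c) = (0.2, 0.2, 5.7)$ and the choice of diffusive coupling acting on the first component are our illustrative assumptions; only $J = R \circ A$ follows the description above.

```python
import numpy as np

def simulate_roessler(N=5, n_i=2, steps=10000, dt=0.002,
                      a=0.2, b=0.2, c=5.7, seed=2):
    """Euler integration of Roessler oscillators with assumed diffusive
    coupling sum_j J_ij (x_j1 - x_i1) on the first component."""
    rng = np.random.default_rng(seed)
    A = np.zeros((N, N))
    for i in range(N):                              # n_i incoming links per node
        A[i, rng.choice(N, size=n_i, replace=False)] = 1.0
    J = rng.uniform(0.5, 1.0, size=(N, N)) * A      # J = R ∘ A
    x = rng.uniform(-1.0, 1.0, size=(N, 3))         # (x1, x2, x3) per unit
    for _ in range(steps):
        dx = np.empty_like(x)
        dx[:, 0] = -x[:, 1] - x[:, 2] + J @ x[:, 0] - J.sum(axis=1) * x[:, 0]
        dx[:, 1] = x[:, 0] + a * x[:, 1]
        dx[:, 2] = b + x[:, 2] * (x[:, 0] - c)
        x = x + dt * dx
    return x
```

External noise $\xi_i^k$ could be added per component in the Euler-Maruyama manner, as in the Kuramoto example above.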

Supplementary Note 5 Biological model systems
To test performance on biological model systems, we simulated the glycolytic oscillator in yeast [16] and circadian clock in Drosophila [17] and inferred their interactions from their collective dynamics.

Glycolytic oscillator model
Glycolytic oscillation is one of the best-studied examples of oscillations at the cellular level. All its factors oscillate at different phases with a shared, relatively short period. In yeast cells, these oscillations were also shown to synchronize across the cell population [18]. Here we use a single-cell model for anaerobic glycolytic oscillations in yeast, containing the influx of glucose and the outflux of pyruvate and/or acetaldehyde, resulting in coupled ODEs for seven species [16], where $S_1$ represents glucose, $S_2$ the glyceraldehyde-3-phosphate and dihydroxyacetone phosphate pool, $S_3$ 1,3-bisphosphoglycerate, $S_4$ the cytosolic pyruvate and acetaldehyde pool, $S_5$ NADH, $S_6$ ATP, and $S_7$ the extracellular pyruvate and acetaldehyde pool. Parameters were set according to [19] and are shown in the Supplementary Information.

Circadian Clock
Circadian clocks underlie the response to the day-night cycle. Specifically, we simulated a model of the circadian clock in Drosophila [17], where oscillations are driven by a negative feedback between the per and tim genes, which code for the PER and TIM proteins, and the PER-TIM complex. The rate equations for the 10-species circadian clock involve $M_T$ and $M_P$, the tim and per mRNAs, respectively; $T_0$, $T_1$ and $T_2$, forms of the TIM protein; $P_0$, $P_1$ and $P_2$, forms of the PER protein; and $C$ and $C_N$, forms of the PER-TIM complex. The parameter values used for the simulations are based on [17]. The total levels of the PER and TIM proteins, $P_t$ and $T_t$, respectively, are given by $P_t = P_0 + P_1 + P_2 + C + C_N$ and $T_t = T_0 + T_1 + T_2 + C + C_N$.

Supplementary Note 6 Choosing a proper family of basis functions
Selecting an appropriate class of basis functions to represent the network interactions is a vital step of our approach. Choosing basis functions that capture the intrinsic nature of the interactions (e.g., single $h(x_i)$, pairwise $h(x_i, x_j)$, triplet $h(x_i, x_j, x_w)$, and so on) leads to optimal results. However, finding an appropriate class of basis functions should not be confused with finding a specific set of basis functions. While the former implies finding basis functions of the correct order $k \in \{1, 2, \ldots, K\}$, cf. Supplementary Figure 5, the latter implies finding a unique set of basis functions capable of representing the network dynamics.
Reconstruction of networks of Kuramoto-like oscillators (17) using different sets of basis functions reveals that employing bases resembling the intrinsic nature of the interactions (e.g., pairwise) leads to optimal results, Supplementary Figure 5. The rationale is that expansions in basis functions seek to span functional spaces; thus, functions involving $x_i$ and $x_j$ together, $h(x_i, x_j)$, are more suitable for representing the dependencies induced by pairwise interactions in (17) than functions involving only single variables, $h(x_i)$ and $h(x_j)$.
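To make the distinction concrete, here is a sketch contrasting a pairwise basis with a single-variable basis for Fourier-type functions (the specific basis choices and function names are our illustrative assumptions):

```python
import numpy as np

def pairwise_block(x_i, x_j, P=3):
    """Pairwise basis h_p(x_i, x_j) = sin(p * (x_j - x_i)), matching the
    difference-coupling structure of Kuramoto-like interactions."""
    return np.vstack([np.sin(p * (x_j - x_i)) for p in range(1, P + 1)])

def single_block(x_j, P=3):
    """Single-variable basis h_p(x_j) = sin(p * x_j)."""
    return np.vstack([np.sin(p * x_j) for p in range(1, P + 1)])
```

A sinusoidal pairwise coupling $g(x_j - x_i)$ lies exactly in the span of the rows of `pairwise_block`, but generally not in the span of single-variable blocks, which illustrates why bases of the correct order perform better for model (17).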