Anticipating regime shifts by mixing early warning signals from different nodes

Real systems showing regime shifts, such as ecosystems, are often composed of many dynamical elements interacting on a network. Various early warning signals have been proposed for anticipating regime shifts from observed data. However, it is unclear how one should combine early warning signals from different nodes for better performance. Based on theory of stochastic differential equations, we propose a method to optimize the node set from which to construct an early warning signal. The proposed method takes into account that uncertainty as well as the magnitude of the signal affects its predictive performance, that a large magnitude or small uncertainty of the signal in one situation does not imply the signal’s high performance, and that combining early warning signals from different nodes is often but not always beneficial. The method performs well particularly when different nodes are subjected to different amounts of dynamical noise and stress.

example, are interconnected by mutualistic, prey-predator, and other types of interactions.In a climate system, geographical regions are interconnected by, for example, water and heat transport.Network resilience is a comprehensive approach with which to understand how dynamics on networks respond to perturbations and system failures 16 .For dynamics on networks that possibly show tipping events and are relevant to these applications, it is highly likely that the network structure is complex 17,18 and that, related to this, all the nodes do not emit equally useful early warning signals [19][20][21][22][23][24] .Specifically, nodes that are about to tip may be emitting more useful early warning signals than other nodes in the same system that are still far from tipping.Evidence in favor of this view is that, as a dynamical system gradually changes, some nodes may tip earlier than others, showing multistage transitions 11,[25][26][27][28][29] .Therefore, selecting an appropriate set of sentinel nodes from which one calculates the early warning signal may improve the quality of the signal in addition to saving the cost of observing uninformative nodes.
Several methods for selecting a sentinel node set, denoted by S, for constructing early warning signals have been proposed.The dynamical network biomarker theory searches for an S that maximizes a composite index 9,10,30,31 .The index is the average pairwise correlation of the sample of data between the nodes within S multiplied by the standard deviation of the samples across the nodes in S, which is divided by the average pairwise correlation of the samples between a node in S and a node outside S. Another analytical approach is to identify a multidimensional linear dynamics model near the bifurcation point only from observed data and then use the dominant eigenvector associated with the estimated dynamics to select S (i.e., as the set of the ith nodes such that the ith entry of the dominant eigenvector is the largest in terms of the absolute value) 23 .Note that an earlier study pointed out the usefulness of the dominant eigenvector 19 .Participation of the node in the dominant eigenvector is also used for selecting S in other studies 21,24 .In addition, an eigenvector-based method has also been proposed for ranking the nodes most sensitive to perturbations 32 .A different study numerically showed that nodes with small degrees (i.e., small number of the neighboring nodes) are good performers 20 .A method to determine S based on network control theory also found an advantage of selecting nodes with small degrees 27 .Using the nodes receiving the highest total input from other nodes is also a powerful heuristic for setting S 29 .Simply using the nodes with the highest fluctuations also improves upon other naive methods 29 .Note that the last two methods are often better than using all the nodes in the network as S, and the same may be true for other node selection methods.
Except for the dynamical network biomarker theory, these methods provide ranking of nodes and suggest that we should include the top-ranked nodes in S. It is unclear whether combining the nodes in S improves an early warning signal compared to when the single top node in S is used.Any early warning signal (e.g., variance of the signal, lagged autocorrelation) is noisy because, by design, an early warning signal exploits the information about impending tipping events hidden in noisy observations.If a given early warning signal measured for each ith node is independent for the different nodes in S, then averaging them including the case of a weighted average is expected to produce a better early warning signal owing to the central limit theorem.Quantitatively, the standard deviation of the early warning signal should decrease according to ∝ n −1/2 , where n is the number of nodes in S, and ∝ represents "in proportion to".However, nodes in a complex system are interrelated, such as through edges in the case of conventional networks, and the states of different nodes are correlated in general [33][34][35] .Therefore, it is not a trivial concern whether or not combination of the top-ranked nodes generates a high-quality early warning signal.For example, suppose that S is composed of n nodes close to each other in the network and that the states of these n nodes are hence strongly correlated.In this case, averaging the early warning signals over the n nodes does not help much to reduce their fluctuation because the n early warning signals are similar to each other.We may be then tempted to select n nodes that are far from each other on the network even if some of the n nodes do not provide early warning signals of top quality.
In fact, the aforementioned dynamical network biomarker theory aims to optimize the set S, which the original authors call the dominant group, implicitly resolving this problem 9,10,30,31 .However, the composite index that they propose is heuristic, and why this method works well in medical applications 10,36 and how the effectiveness of their method translates to other applications such as in ecology are elusive.Furthermore, this theory and most other proposals of early warning signals neglect that early warning signals are themselves noisy.Considering fluctuations of early warning signals in their design is necessary for at least two reasons.First, if a candidate early warning signal carries a lot of noise, then that signal may not be useful even if its expected value is sensitive to the approach of the system towards the tipping point.A recent study highlighted interplay of the dominant eigenvector direction of the Jacobian matrix of the dynamics and the primary noise direction that is dragged towards the nodes receiving high dynamical noise 24 .Second, in the aforementioned example of n nodes, let us assume that the expectation of the single-node early warning signal is the same for all the n nodes.We stated that averaging over the n early warning signals would be beneficial relative to the single-node early warning signals only if the n signals are not strongly correlated with each other.In this situation, the expectation of the averaged early warning signal is the same as that of the single-node early warning signal.The bonus of the averaging is only present in the reduced fluctuation in the averaged as opposed to the single-node early warning signal.To enable such discussion, we need to assess the fluctuation as well as the magnitude of early warning signals.
In the present study, we develop a mathematical framework for node set selection for early warning signals based on stochastic differential equations on networks.By assuming dynamics near the equilibrium and using the variance of the node's state as the early warning signal, we propose an index whose maximization gives an optimal node set for constructing an early warning signal.We demonstrate our method with analytically solvable networks with two or three nodes and with numerically investigated larger networks combined with various dynamics models.

Theory
We consider an N-dimensional noisy nonlinear dynamical system in continuous time.We regard each of the N dynamical elements as a node and the entire dynamical system as stochastic dynamics on a network with N nodes.We denote by x i ðtÞ 2 R the state of the ith node at time t 2 R. We write xðtÞ = ðx 1 ðtÞ, . . ., x N ðtÞÞ > , where ⊤ represents the transposition.We assume that x(t) obeys a set of stochastic differential equations in the Itô sense given by dxðtÞ = FðxðtÞÞdt + BdW ðtÞ, ð1Þ where F : R N !R N ; B is an N × N matrix; W(t) represents an N-dimensional vector of independent Wiener processes (i.e., white noise).Assume that the dynamics given by Eq. ( 1) has an equilibrium in the absence of noise, which we denote by x * = ðx * 1 , . . ., x * n Þ > .By linearizing Eq. ( 1) around x * , we obtain a set of linear stochastic differential equations given by where z(t) = x(t) − x * , and the N × N matrix A is the sign-flipped Jacobian matrix of F at x(t) = x * .Equation ( 2) is a multivariate Ornstein-Uhlenbeck (OU) process 24,37 , and x * is asymptotically stable in the absence of noise if and only if A is positive definite.The covariance matrix in the equilibrium, corresponding to z * = (0, …, 0) ⊤ , is given as the solution to the Lyapunov equation given by where C = (C ij ) is the N × N covariance matrix, and C ij represents the covariance between z i (t) and z j (t), which is equal to the covariance between x i (t) and x j (t), at equilibrium 37,38 .The solution C is unique if the real part of all the eigenvalues of A is positive 38 .
Outcomes of the covariance matrix such as the standard deviation and correlation coefficient are often used as early warning signals for noisy multivariate dynamics.In practice, we need to estimate these quantities from samples.Therefore, we consider the sample variance of x i (t), which is a major early warning signal 1,[39][40][41] , and its average over a given node set 29,42 , as candidates of early warning signals.The choice of the variance rather than the standard deviation is because of mathematical tractability.We emphasize that the rationale behind averaging the sample variance over nodes is that we may be then able to obtain a less fluctuating early warning signal than that calculated from a single node.
Assume that we observe L samples of x i (t) from each node i in the given node set S. We denote by V i the unbiased sample variance of x i (t) calculated from the L samples.Let us consider the average of V i over the n = |S| nodes in set S as an early warning signal.Without loss of generality, we assume that S = {1, 2, …, n}, where n ≤ N. We denote this average by By assuming that z i,1 , …, z i,L are i.i.d. and using E[z i,ℓ ] = 0, we obtain where E denotes the expectation, Tr denotes the trace, C is the leading principal minor of order n of C (i.e., the submatrix of C composed of C ij with i, j ∈ {1, …, n}), and λ i is the ith eigenvalue of C. As shown in Supplementary Note 1, we also obtain the variance of V S , denoted by var ½ V S , as follows: The coefficient of variation (CV), i.e., the standard deviation divided by the mean, of V S , which quantifies the relative uncertainty in the estimation of V S , is given by Equation ( 7) has a couple of implications.First, the CV decays according to ∝ L −1/2 as the number of samples, L, increases, regardless of the node set S. This scaling corresponds to the central limit theorem.Second, in the case of a single node, CV = ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2=ðL À 1Þ p regardless of the node.Therefore, a large value of the sample covariance, V i , or its expectation, E½ V i , at a node i compared to at other nodes does not imply that V i is better than V j 's (with j ≠ i) as an early warning signal.A large signal carries a proportionately large amount of noise.This property also holds true when one uses the sample standard deviation, instead of the sample variance, of single nodes as early warning signal.Third, the CV with n ≥ 2 is always smaller than the CV with n = 1 (i.e., ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2=ðL À 1Þ p ) unless all but one eigenvalues of C are equal to 0. Note that this result holds true owing to λ i ≥ 0 ∀ i, which follows from the fact that C is a covariance matrix and therefore positive semidefinite.This result motivates us to use the average of V i over nodes as opposed to V i at a single node with the aim of improving the performance of the early warning signal.We will investigate this possibility in the following sections.

Coupled nonlinear dynamics on networks with two or three nodes
In this section, we analyze coupled nonlinear dynamics on networks with N = 2 and N = 3 nodes.Through this analysis, we highlight relevance of multistage transitions, impact of averaging V i over nodes, heterogeneous amount of dynamical noise given to different nodes, and uncertainty of early warning signals, among other things.We then propose a measure of the quality of early warning signals in the form of V S and a method to select an optimal node set for constructing early warning signals.
Two nodes connected by a directed edge.We consider a network composed of two nodes and a directed edge of weight w(≥0); see Fig. 1a for a schematic.We assume that node 1 influences node 2 but not vice versa.We also assume that, as a bifurcation parameter, denoted by r, gradually increases, node 1 undergoes a saddle-node bifurcation and that node 2 also undergoes a saddle-node bifurcation either almost at the same time as node 1 or after r has further increased.The model is given by dx 1 ðtÞ = f ðx 1 ðtÞ, rÞdt + σ 1 dW 1 ðtÞ, ð8Þ dx 2 ðtÞ = f ðx 2 ðtÞ, r À ΔrÞ + wðx 1 ðtÞ + 1Þ where Δr(≥ 0) is a constant, σ 1 and σ 2 are the intensities of dynamical noise applied to nodes 1 and 2, respectively, and f(x) satisfies the following conditions.First, we assume f(x, r) = r + x 2 when r ≤ 0 and x ≤ ffiffiffiffiffiffi Àr p + Δx, where Δx( > 0) is a small constant.This condition guarantees that, in the absence of coupling and dynamical noise, Eqs. ( 8) and ( 9) are both the topological normal form of the saddle-node bifurcation 43 .In other words, dx/dt = f(x, r) with r < 0 has a stable equilibrium x * = À ffiffiffiffiffiffi Àr p and an unstable equilibrium x * = ffiffiffiffiffiffi Àr p , which collide at x * = 0 when r = 0.In Fig. 2, we show an example bifurcation diagram of single-node deterministic dynamics given by dx/dt = f(x, r).If σ 1 = 0, then x 1 (t) undergoes a saddle-node bifurcation at r = 0 as r increases starting with a negative value.If σ 2 = 0 and w = 0, then x 2 (t) undergoes a saddle-node bifurcation at r = Δr.Second, we assume that f(x, r) is continuous in terms of x and r for simplicity.Third, we assume that f(c, r) = 0 for ∀ r ≥ 0 for a unique positive value of c, which is larger than Δx.This implies that, in the absence of noise, x = c is the unique  stable equilibrium after a node undergoes a saddle-node bifurcation as r gradually increases.This assumption in combination with the continuity assumption for f(x, r) also implies that the stable equilibrium apart from x * = À ffiffiffiffiffiffi Àr p persists for some r < 0 although its position changes from x = c in general.Therefore, there are two stable equilibria at least in some range of x < 0 near x = 0, as shown in Fig. 2.
We assume that w > 0 and consider this dynamical system in the range of bifurcation parameter r ∈ [ − 1, 0), with (x 1 , x 2 ) satisfying − 1 ≤ x 1 , x 2 < 0 in the absence of dynamical noise.In other words, we assume that x 1 and x 2 are both near the lower stable equilibrium in Fig. 2. In this situation, the input that node 2 receives from node 1, i.e., w(x 1 + 1), is positive.To prevent node 2 from transiting from its lower to the upper state through a saddle-node bifurcation earlier than node 1 when r gradually increases, we assume that w ≤ Δr, which guarantees that − Δr + w(x 1 + 1) ≤ 0. Let us gradually increase r starting with r = − 1.Then, x 1 jumps from 0 to c at r = 0 via a saddle-node bifurcation.We distinguish between two cases depending on the change in x 2 right after the transition of node 1 from x 1 = 0 to x 1 = c.If − Δr + w(c + 1) > 0, then x 2 transits from x 2 = 0 to x 2 = c immediately after x 1 does without a further increase in r (i.e., at r = 0).Otherwise, x 2 does not transit at r = 0, and x 2 does so at a positive value of r if we further increase r.The latter case is a multistage transition 28,29 .
The lower equilibrium of the coupled dynamical system, which exists if and only if r ≤ 0 and is locally stable when r < 0, is given by The linearized dynamics around x * is given by Eq. ( 2) with By combining Eqs. ( 3) and (11), we obtain Note that x , and C 22 , respectively, also diverge.
To investigate the quality of V 1 , V 2 , and V f1, 2g as early warning signals, we set w = 0.5, σ 1 = 0.1, L = 100, and used three pairs of Δr and σ 2 values, which we refer to as scenarios.In Fig. 3, we show and std ½ V f1, 2g as we gradually increase r under the three scenarios.In Fig. 3a, we show the results for the first scenario, for which we set Δr = 1 and σ 2 = σ 1 = 0.1.The solid lines represent the expected value of each early warning signal.The shaded area represents the mean ± standard deviation.With these parameter values, when node 1 transits from x 1 = 0 to x 1 = c at r = 0, node 2 is still far from the bifurcation point.This is because the input from node 1 to node 2, which is equal to w(x 1 + 1) ≈ 0.5, is substantially smaller than − Δr( = 1) such that x 2 approximately obeys dx 2 = ( − 0.5 + x 2 ) dt + σ 2 dW 2 (t) near r = 0. Right after node 1 transits from x 1 = 0 to x 1 = c at r = 0, the input from node 1 to node 2 jumps to w(x 1 + 1) ≈ 0.5(c + 1).Therefore, a multistage transition in which nodes 1 and 2 transit from their lower to the upper state at different r values occurs if 0.5(c + 1) < 1, i.e., 0 < c < 1. Otherwise (i.e., c ≥ 1), node 2 also transits from its lower to the upper state at r = 0. Regardless of whether or not a multistage transition occurs, Fig. 3a indicates that E½ V 1 is more sensitive to increases in r than E½ V 2 and E½ V f1, 2g , and that std ½ V 1 is larger than std ½ V 2 and std ½ V f1, 2g .Therefore, it is not obvious which of V 1 , V 2 , or V f1, 2g is a better early warning signal.
To rank these different candidates of early warning signals, we define the following index.We measure each early warning signal's mean and standard deviation at r = − 0.3 and r = − 0.1.We do so based on the premise that it is necessary to measure signals at least at two r values to estimate how responsive the signal is to the change in the environment, i.e., r.As we explained in Supplementary Note 1, V 1 , V 2 , and V f1, 2g obey a normal distribution at equilibrium.Therefore, we measure a distance, denoted by d, between the normal distribution at r = − 0.3 and that at r = − 0.1 for each early warning signal.If d is large, it is relatively easy to tell the increase in the signal with relatively small uncertainty as r increases from − 0.3 to − 0.1, approaching a tipping point.Therefore, we propose that an early warning signal attaining a larger d value is better.Although there are various measures of the separability of two normal distributions 44 , inspired by the t-statistic, we define where μ 1 and var 1 are the mean and variance of the normal distribution at r = − 0.3, and μ 2 and var 2 are those at r = − 0.1.We show in Supplementary Note 2 that the results shown in the remainder of this section are robust with respect to the choice of the two r values for calculating d.Note that d can be small even if the mean of the early warning signal grows by a large amount between the two r values.This is the case when var 1 or var 2 is large, i.e., when the uncertainty of the signal is large.In this situation, tracking the early warning signal is relatively uninformative.In contrast, even if the absolute magnitude of the increase in the signal is small (i.e., small |μ 1 À μ 2 |), the signal provides a good early warning if the uncertainty of the signal is small.We have found that dð V 1 Þ = 2:58, dð V 2 Þ = 1:27, and dð V f1, 2g Þ = 2:78.Therefore, we suggest that V 1 is a better early warning signal than V 2 .We emphasize that it is not because V 1 is larger than V 2 but because V 1 responds more strongly to an increase in r than V 2 does with relatively small uncertainty.Furthermore, V f1, 2g , i.e., the average of V 1 and V 2 , realizes a larger value of d than V 1 .Therefore, we suggest that V f1, 2g is a better early warning signal than V 1 and V 2 .This is intuitively because the averaging cancels out noise in the two signals, while V 1 and V 2 are not independent of each other due to the edge between nodes 1 and 2. We note that E½ V f1, 2g is smaller than E½ V 1 (see Fig. 3a), challenging a natural idea of using single nodes or node sets with the largest variance as sentinel nodes.
We show in Fig. 3b the mean and standard deviation of the signals for the second scenario; we changed Δr from 1 to 0.5.In this scenario, x 2 (t) always transits from its lower to the upper state immediately after node 1 does so at r = 0.This is because, when r → 0 − , Eq. ( 8) implies that x 1 (t) → 0 − if we ignore the noise term, and r − Δr + w(x 1 (t) + 1) ≈ 0 combined with Eq. ( 9) implies that x 2 (t) → 0 − .Therefore, even with a small c value, both nodes 1 and 2 sequentially transit from its lower to the upper state at r = 0. Figure 3b Because the quality of V 2 is much better than in the first scenario, the average signal, V f1, 2g , is substantially better than V 1 in terms of d in the present second scenario, while V f1, 2g was only marginally better than V 1 in the first scenario.
In addition to different amounts of constant stress as parametrized by Δr, different nodes may be subject to different amounts of dynamical noise.We show in Fig. 3c the results for the third scenario, in which we set Δr = 1 and σ 2 = 0.2.In this scenario, the saddle-node bifurcation for node 2 is delayed such that a multistage transition may occur, as in the first scenario.The difference to the first scenario is that, in the present scenario, node 2 receives a larger dynamical noise than node 1.Therefore, V 2 is noisier than V 1 and V f1, 2g , which is apparent in Fig. 3c.If we only measure the early warning signals at one r value (e.g., r = − 0.3 or r = − 0.1), then one may be tempted to regard that V 2 is a better early warning signal than V 1 and V f1, 2g because E½ V 2 is larger than E½ V 1 and E½ V f1, 2g .However, this conclusion is erroneous for two reasons.First, std ½ V 2 is larger than std ½ V 1 and std ½ V f1, 2g .Second, V 2 is not much sensitive to the change in r, as Fig. 3c shows.In fact, we obtain dð V 1 Þ = 2:58, dð V 2 Þ = 0:97, and dð V f1, 2g Þ = 2:12.This result suggests that V 1 is a better signal than V 2 and V f1, 2g .In this case, taking the average of V 1 and V 2 does not help because V 2 is too poor.Lessons from the analysis of the present scenario are that (i) a large signal value does not necessarily imply a better early warning signal and that (ii) it is sometimes better to exclude some nodes from the calculation of the early warning signal if those nodes are either only marginally responsive to the change in the bifurcation parameter or they carry larger fluctuations than other nodes.
Chain with three nodes.In this section, we consider three nodes of identical nonlinear dynamics, except possible differences in the noise strength, coupled as an undirected chain network with N = 3 nodes.We show a schematic of the network in Fig. 1b.Specifically, we consider the set of stochastic differential equations given by where f(x) is given in the "Two nodes connected by a directed edge" section.If c is small enough and σ 1 = σ 2 = σ 3 , node 2 transits from its lower to the upper state earlier than nodes 1 and 3 as r(≥ − 1) increases from a negative value because node 2 receives positive input from nodes 1 and 3 while node 1 and 3 each receive input solely from node 2.
When − 1 ≤ r ≤ 0, the equilibrium in the absence of noise satisfying If we ignore the dynamical noise, the first saddle-node bifurcation, which is the transition of node 2 from its lower to the upper state, occurs at r = r c , where r c satisfies 0 > r c > r 00 c 2w½Àðw + 1Þ + ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi wðw + 1Þ p , as r gradually increases from r = − 1.At r = r c , the two parabolas given by Eqs. ( 19) and ( 20) are tangent to each other.See Supplementary Note 3 for the derivation including that of the uniqueness of the stable solution satisfying x * 1 ð = x * 3 Þ < 0 and x * 2 < 0 when r ≤ r c .We set w = 0.05, which leads to r c ≈ r 00 c ≈ À 0:082.Therefore, we calculate the d values at r = − 0.3 and − 0.1 as we did in the "Two nodes connected by a directed edge" section.We show in Supplementary Note 2 that the following results are reasonably robust against variation in the two r values.
(3), we obtain By substituting Eqs. ( 22)-( 25) into Eqs.( 5) and ( 6), we obtain the mean and standard deviation of each early warning signal for any node set S. Because V 1 and V 3 obey the same normal distribution, we obtain where Corr is the Pearson correlation coefficient between x 1 (t) and x 3 (t) in the equilibrium.Therefore, std ½ V f1, 3g is smaller than std ½ V 1 unless x 1 (t) and x 3 (t) are perfectly correlated, which does not happen because they are subjected to independent Wiener processes dW 1 (t) and dW 3 (t).Therefore, V f1, 3g is always better than V 1 and V 3 for this model.In addition, due to the assumed symmetry between nodes 1 and 3, S = {1, 2}, {1, 3}, and {1, 2, 3} exhaust all possibilities of combining multiple nodes' signals into one early warning signal.Therefore, it suffices to compare the performance of V 2 , V f1, 2g , V f1, 3g , and We set σ 2 = 0.1 and L = 100, and consider three values of σ 1 .In the first scenario, we set σ 1 = σ 2 = 0.1.For this case, we show the mean and standard deviation of V 2 , V f1, 2g , V f1, 3g , and V all as a function of r in Fig. 4a.Note that the result for V 1 is the same as that for V f1, 3g except that std ½ V 1 is 2/(1 + Corr)( > 1) times larger than std ½ V f1, 3g .It is not straightforward to tell from Fig. 4a which signal is better than others.However, we expect that V 2 provides a better early warning signal than V 1 because the transition of node 2 from the lower to the upper state as r increases from r ≈ − 0.5 is more impending than the transition of nodes 1 and 3.In fact, we obtain dð V 1 Þ = 3:52, dð V 2 Þ = 4:72, dð V f1, 2g Þ = 5:84, dð V f1, 3g Þ = 4:97, and dð V all Þ = 6:77, verifying this prediction.Furthermore, it is the best to use V all , i.e., the average of the sample covariance over all nodes.
In the second scenario, we set σ 1 = 0.7 such that nodes 1 and 3 receive larger dynamical noise than node 2. We show the behavior of the early warning signals in this case in Fig. 4b.The figure indicates that the mean magnitude of any signals including 3 , and V all ) is much larger than that of V 2 .However, this phenomenon does not imply that it is better to use a signal including V 1 because the enhanced signal size owes to the increased amount of dynamical noise.We obtain dð and dð V all Þ = 5:10.Therefore, we suggest that we should only use node 2 as early warning signal under the present scenario because V 2 is the least affected by the dynamical noise.Note that the magnitude of V 2 in the present scenario is similar to that in the first scenario (see Fig. 4a).
In the third scenario, we set σ 1 = 0.015 such that node 2 receives large dynamical noise than nodes 1 and 3. We show the behavior of the early warning signals in this case in Fig. 4c.We find that E½ V 2 is much larger than E½ V f1, 3g ð = E½ V 1 Þ, which does not use V 2 .Note that, the magnitude of V 2 is similar to that in the last two scenarios because we kept σ 2 = 0.1 in all the three scenarios.We obtain and dð V all Þ = 4:84.Therefore, we suggest that V f1, 3g is the best early warning signal under the present scenario.In the second and third scenarios, the best early warning signal turns out to be the one having the smallest mean magnitude at both r = − 0.3 and r = − 0.1.

Numerical results for larger networks and various dynamical systems
We propose to use the node set S maximizing d as the sentinel node set from which we calculate the early warning signal, V S .In the networks with two or three nodes analyzed in the "Coupled nonlinear dynamics on networks with two or three nodes" section, the maximizers of d are composed of nodes with small dynamical noise (i.e., small σ i ).However, in networks with larger numbers of nodes, it may not be the best to select nodes with small dynamical noise.It is because x i 's of these nodes may be strongly correlated with each other, which typically occurs when these nodes are closely located in the network.In this case, averaging V i over these nodes, which yields V S , does not help much to reduce the fluctuation, potentially making this V S a poor early warning signal.Therefore, to test the performance of our method to select the sentinel node set in networks with larger numbers of nodes, we carry out numerical simulations with different networks and dynamical systems in this section.
Setup for numerical simulations.We use six undirected and unweighted networks with N nodes and denote a network's adjacency matrix by (w ij ), with w ii = 0 and w ij = w ji ∈ {0, 1} ∀ i, j ∈ {1, …, N}.
Our theory requires the covariance matrix, C, or at least its eigenvalues, estimated at two values of the bifurcation parameter.In the previous sections, we theoretically calculated C by linearizing the original dynamics around the equilibrium and solving the Lyapunov equation given by Eq. (3).However, to know matrix A in Eq. (3), we need to know the term driving the dynamics of the node, i.e., the derivative of F in Eq. ( 1), which depends on the single node's intrinsic dynamics and form of the coupling between nodes.However, we usually do not have access to such detailed information given empirical data, while estimating them from observed data is an active research field (e.g., ref. 23).Therefore, we use the sample covariance matrix as an estimator of C.This choice is for simplicity, and there are better estimators of the covariance matrix that are useful especially when the number of samples, L, is small relative to the number of variables, N 45 ; we will discuss this limitation of the present study in the "Discussion" section.
Results for the coupled double-well model.We first consider a coupled double-well model on networks with dynamical noise given by Equation ( 26) represents dynamics of species abundance 25 or climates in interconnected regions 26 .Parameters r 1 , r 2 , and r 3 control the location of the equilibria and satisfy r 1 < r 2 < r 3 ; D(≥0) is the strength of coupling between nodes, and u i is a constant stress given to the ith node, equivalent to − Δr in Eq. ( 9).When D or u i ∀ i is sufficiently small, x = ðx 1 , . . ., x N Þ > in which all the nodes are in their lower state near r 1 is the unique stable equilibrium.In contrast, when D or u i ∀ i is sufficiently large, x in which all the nodes are in their upper state near r 3 is the unique stable equilibrium.In between, we initially set all the nodes in their lower state at the start of each simulation, and as one gradually increases D or u i , all nodes may flip to the upper stable state at once or in multiple stages 28,29 .
For this system, we numerically assess the performance of the variance of a single node or its average over the nodes in set S as early warning signal to anticipate the first transition from the lower state to the upper state.We use either u i or D as the bifurcation parameter in a given sequence of simulations.If the bifurcation parameter is u i , we started with D = 0.05 and u i = u = 0 ∀ i and gradually increased u until the first transition occurs (see the Methods section for the precise definition of the transition and parameter values for the numerical simulations).We computed the covariance matrix at reasonably separated two values of u, denoted by u (1) and u (2) (see the Methods section for details).At each of u = u (1) and u = u (2) , we collected L = 100 samples of each x i , i ∈ S and calculated d using Eq. ( 15).As a demonstration, we show how noisy early warning signals gradually increase as we gradually increase u in one series of simulations in Fig. 5.When we select n = 5 nodes uniformly at random, the sample covariance averaged over the nodes in S (i.e., V S ) gradually increases until the first node transits from its lower to the upper state (see the orange line).When we use the set of n = 5 nodes maximizing d, the sensitivity of V S to the increase in u is notably larger near the first tipping point (see the blue line).Therefore, the maximizer of d provides an apparently better early warning signal than a uniformly randomly selected node set.
We assessed the performance of the sample covariance averaged over the nodes in S using the Kendall's τ, which is conventionally used 13 .We evaluated S with n = {1, 2, 3, 4, 5} nodes and also the S composed of all N nodes, which refer to as "all", as a control.We exhaustively examined all possible S with n = 1 or n = 2 and uniformly randomly sampled 5000 sets of S for each n ∈ {3, 4, 5} due to a large number of combinations.In Supplementary Note 4, we also provide a stopping criterion and a numerical demonstration when one wants to explore S with n > 5 nodes for better solutions (i.e., S with larger d values).
We show the results of one set of simulations on the Barabási-Albert (BA) model network with 50 nodes and average degree 〈k〉 = 3.88 in Fig. 6.We gradually increased u as the bifurcation parameter.Figure 6a shows the relationship between the Kendall's τ and d when the early warning signal is the variance of a single x i .Each symbol corresponds to a node.We find that the node maximizing d, indicated by the intersection of the two dashed lines, is the third best performer in terms of τ.At each value of n ∈ {2, 3, 4, 5}, the maximizer of d is not the top performer in terms of τ but is reasonably good performer (see Fig. 6b-e).For example, the pair of nodes attaining the largest d, corresponding to the symbol at the intersection of the two dashed lines in Fig. 6b, provides the eighth best early warning signal among all the N(N − 1)/2 = 1225 pairs of nodes.At each value of n ∈ {1, …, 5}, we find that τ and d are positively correlated; the Pearson correlation coefficient between τ and d is equal to 0.42, 0.46, 0.45, 0.44, and 0.43 for n = 1, 2, 3, 4, and 5, respectively.It should also be noted that both d and τ generally increase as n increases.When all nodes are used, d and τ are the largest (see Fig. 6f).Therefore, in the present case, using all the nodes is a better choice than using particular combinations of n ≤ 5 nodes.However, a reasonably large τ is attained only using n ≤ 5 nodes if we choose appropriate node sets based on d.One can find S that maximizes d only by observing the covariance matrix at two bifurcation parameter values.
We quantified the performance of the optimized node set, i.e., the node set that maximizes d at each value of n, using p 1 and p 2 .For a given n, we define p 1 as twice the fraction of node sets whose τ is larger than that for the optimized node set.We note that 0 ≤ p 1 ≤ 2. We define p 2 as ðτ max À τ * Þ=ðτ max À hτiÞ, where 〈τ〉 is the average of τ over all the node sets with n nodes examined, τ max is the largest (i.e., best) τ value among the node sets with n nodes examined, and τ * is the τ value realized by the maximizer of d with n nodes.Although we have assumed that a larger positive τ is better, the definition of p 1 and p 2 are similar when the nodes transit from their upper to lower states through tipping points and therefore a larger negative τ implies a better performance of an early warning signal (Supplementary Note 5).If the maximizer of d is the best node set, realizing the largest τ, then both p 1 and p 2 are equal to 0 and the smallest.Smaller p 1 and p 2 values are better.If half the node sets are better than the maximizer of d, meaning that the maximizer of d is no better than a uniformly random pick, then we obtain p 1 = 1.Similarly, p 2 = 1 is equivalent to τ * = 〈τ〉, implying that the maximizer of d is no better than a random pick.If the maximizer of d is the worst performer, we obtain p 1 = 2 and a value of p 2 larger than 1.
We show in Fig. 7a the p 1 and p 2 values for the double-well dynamics on the BA network with n ∈ {1, …, 5}.Each small symbol represents the p 1 and p 2 values for a single series of simulations in which we gradually increased the value of u towards the first transition.Figure 6 shows the relationship between τ and d in one of the 50 series of simulations whose results are shown in Fig. 7a.The larger symbols in Fig. 7 indicate the average over the 50 series of simulations.We find that the optimizer of d is not a top but reasonably good performer because the p 1 and p 2 values are substantially smaller than 1 in most runs.The performance of the maximizer of d relative to that of uniformly randomly selected node sets with the same number of nodes, n, degrades as n increases.
Robustness of node set optimization under different heterogeneity scenarios, networks, dynamical systems, and other factors.To examine the generality of the effectiveness of the node set optimization, we ran simulations, assessed the performance of the early warning signals in terms of the Kendall's τ, and calculated p 1 and p 2 for the following variations.First, we have assumed that all nodes are homogeneous except with regard to the position in the network.However, as we examined with Fig. 3a and b, different nodes may have different Fig. 5 | Early warning signal, V S , for the maximizer of d (blue line) and a set of nodes selected uniformly at random (orange line).Both sentinel node sets are composed of n = 5 nodes.We also show x * i for each node at each value of u.We used the coupled double-well dynamics on a BA network with N = 50 nodes and gradually increased u. propensity to tip, yielding a multistage transition for the entire system.To examine this case, we set u i = u + Δu i in Eq. ( 26), where Δu i independently obeys the uniform density on [ − 0.25, 0.25].We continue to use u as the bifurcation parameter.A node with large u i tends to transit from its lower state to the upper state earlier as u gradually increases.
Note that a heterogeneous distribution of u i makes it more difficult to find a good node set S only from the information on the network structure.We show the p 1 and p 2 values when u i is heterogeneous and σ i is homogeneous in Fig. 7b.The results are qualitatively the same as those for the case in which both u i and σ i are homogeneous (see     7a).Separately, we also consider the case in which both the node stress u i and the strength of the dynamical noise σ i are heterogeneous.We set σ i = σ + Δσ i , where σ = 0.05, and Δσ i independently obeys the uniform density on [ − 0.9σ, 0.9σ].We show p 1 and p 2 for this case in Fig. 7c.The results are qualitatively the same as those when both u i and σ i are homogeneous (Fig. 7a) and when only u i is heterogeneous (Fig. 7b).
Second, we have considered six networks including the BA model, three of which are model networks and the other three are empirical networks (see Methods for the networks).Third, we have considered four dynamical system models on networks including the coupled double-well dynamics; the other three dynamics are a model of mutualistic interaction dynamics among species 46 , a gene regulatory system governed by the Michaelis-Menten equation 46 , and the deterministic approximation of the susceptible-infectious-susceptible (SIS) model on networks 47 .These four models of dynamics are diverse in the sense that the SIS model shows a transcritical bifurcation to mark the onset of epidemic spreading, while the other three models show saddle-node bifurcations.In addition, we started the mutualistic interaction and gene regulatory dynamics from the upper state for each node and gradually decreased the bifurcation parameter value, which is the opposite to the case of the double-well and SIS dynamics.This is because the loss of resilience such as species loss in mutualistic dynamics and cell death in gene regulatory dynamics is of a practical concern for these dynamical systems 46 .Fourth, we also varied D instead of u as bifurcation parameter until the first bifurcation occurs.For the SIS model, we only considered an equivalent to D, which is the infection rate parameter, but not u because introducing and varying u is unrealistic for the SIS model; see Methods for the discussion.
We show the p 1 and p 2 values for n ∈ {1, …, 5}, the three different scenarios for u i and σ i (i.e., constant across all nodes or heterogeneous), two networks, four different dynamical systems, and two different bifurcation parameters (i.e., u or D) in Fig. 8.Each figure panel contains the average of p 1 and p 2 over 50 series of simulations for the three scenarios of heterogeneity (i.e., both u i and σ i are homogeneous, only u i is heterogeneous, and both of them are heterogeneous) for one network and one dynamical system.Note that Fig. 8a is for the doublewell dynamics on the BA network, corresponding to Figs. 6 and 7. We find that the node set based on the largest d value performs reasonably well for the two networks (see Fig. 8a-g for the BA network and Fig. 8h-n for the Chesapeake Bay carbon flow network) and for all the four models of dynamics.The node set maximizing d tends to work better relative to uniformly randomly picked node sets, yielding smaller p 1 and p 2 values, when u i and σ i are both heterogeneous across the nodes (shown by the dotted lines) than when either u i or σ i is homogeneous (shown by the solid and dashed lines).This is presumably because the proposed method actively discards nodes with high intrinsic noise by comparing the distribution of the signal variance at two values of the bifurcation parameter.The node set maximizing d tends to work better relative to uniformly random node sets with the same number of nodes, n, when n is smaller.However, even at n = 5, the p 1 and p 2 values are at most approximately 0.7 and much smaller in a majority of cases.The results are qualitatively the same for the other four networks, with a caveat that the maximizer of d performs poorly for the gene regulatory dynamics model on some networks (see Supplementary Note 6).
We also carried out the following robustness tests.Our choice of the two bifurcation parameter values at which we sample the covariance matrix and then calculate d (see Methods for the definition of the two bifurcation parameter values) has been arbitrary.We used one bifurcation parameter value close to the start of each series of simulation and the other value close to the first tipping point, regardless of the type of bifurcation.This is because, with this choice, the difference between μ 1 and μ 2 , which is the numerator of d (see Eq. ( 15)), is larger than when the two bifurcation parameter values are closer.Then, the contrast of d for different choices of S may be larger, potentially helping to single out the S that maximizes d.To test the robustness of our results with respect to the choice of the two bifurcation parameter values, we measured p 1 and p 2 for various pairs of the bifurcation parameter value, which are different in terms of the separation between the two values and the closeness to the first tipping point.For the three scenarios for u i and σ i (i.e., constant across all nodes or heterogeneous), six networks, four dynamical systems, and two bifurcation parameters (i.e., u or D), we have found that the result that p 1 and p 2 are small in most cases is robust.Specifically, p 1 and p 2 are considerably smaller than 1 when they are so for the original pairs of the bifurcation parameter value, the two bifurcation parameter values are not too close to each other, and the larger bifurcation parameter value is reasonably close to the first tipping point.(See Supplementary Note 2 for the numerical results.)Not all the regime shifts accompany critical slowing down.Early warning signals based on critical slowing down, including V S for any node set S, should be insensitive to such types of regime shifts 1,11,14,48 .To verify that this is the case, we investigated coupled double-well dynamics on the BA network experiencing a state transition from the lower to the upper state of x i (t) driven by either dynamical noise or impulse input given to x i (t)'s.Dynamical noise and impulse input are two major scenarios in which regime shifts occur without critical slowing down 11,14 .It should be noted that, in both scenarios, all the system parameter values remain the same throughout a simulation.As expected, V S is found to be insensitive to impending regime shifts until they are about to occur, and therefore, τ is close to 0. These results hold true for any choice of S, and the S maximizing d does not yield a particularly large τ value.See Supplementary Note 7 for details.
In contrast to the last case, critical slowing down can occur even when there is no sudden regime shift.Representative examples of this phenomenon are transcritical and Hopf bifurcations 14,49 .The gene regulatory and SIS models that we have employed show transcritical bifurcations, at least in well-mixed populations.We showed that V S increases just before the bifurcation, which is captured by large τ values, in the gene regulatory and SIS models (see Supplementary Note 8 for numerical evidence).Therefore, consistent with the previous studies 14,49 , we are not claiming that V S specifically increases when a bifurcation accompanying a large regime shift, which is typically a saddle-node bifurcation, is being approached.
Last, we derived theory for the variance of x i (t) and its average over nodes for mathematical convenience.However, the standard deviation of x i (t) rather than the variance of x i (t) is also a common early warning signal for anticipating tipping points 13,50,51 .We verified that the results on the advantage of the maximizer of d remained almost the same when we used the average of the sample standard deviation of x i (t) over i ∈ S as the early warning signal (see Supplementary Note 9).
Comparison with other node selection methods.A simple heuristic to select a node set for constructing early warning signals by averaging is to use the n nodes with the largest standard deviation of x i Figure 9 shows an example.In this figure, we used the double-well dynamics on the BA network, as we did for Fig. 6, and assumed that both u i and σ i are heterogeneous.The figure indicates that the maximizer of d substantially outperforms Large SD except when n = 4 and n = 5, for which the maximizer of d only slightly outperforms Large SD.The compromised performance of Large SD in this scenario is because Large SD tends to select nodes with large σ i although a large σ i value simply reflects that the ith node is inherently noisier than others.
To compare between the maximizer of d and Large SD more systematically, we compared the Kendall's τ values attained by these two strategies for the six networks, the four dynamics models, and whether the stress or dynamical noise is homogeneous or heterogeneous across the nodes.As we show in Supplementary Note 8, we confirmed that the maximizer of d outperforms Large SD in a majority of cases when both the stress, u i , and the noise strength, σ i , are nodedependent.The maximizer of d outperforms Large SD even when σ i is node-independent for the mutualistic interaction dynamics and D being used as the bifurcation parameter.In the other cases, the maximizer of d is either on par with or slightly inferior to Large SD.However, there is no case in which Large SD substantially outperforms the maximizer of d (i.e., Large SD outperforms the maximizer of d in terms of τ by at most ≈ 0.078), whereas the maximizer of d often substantially outperforms Large SD.
Furthermore, we compared the maximizer of d with another heuristic algorithm with which one selects the n nodes that receive the highest total input (or lowest total input, depending on the direction of the bifurcation being considered) from the other nodes near the bifurcation point 29 , which we refer to as "High/Low Input".The results of comparison between the maximizer of d and High/Low Input were similar to the case of the comparison between the maximizer of d and Large LD (see Supplementary Note 10 for the methods and results).In other words, the maximizer of d tends to be better than the High/Low Input algorithm, in particular when the nodes' dynamics are assumed to be heterogeneous.We note that High/Low Input requires the adjacency matrix of the network, i.e., complete information on the network structure, which neither the maximizer of d nor Large SD requires.

Discussion
Based on theory of OU processes, we proposed an objective function d for identifying sets of nodes, S, that are expected to reliably alert impending tipping events when we combine the early warning signals from S by averaging.While we focused on anticipating the first bifurcation in the entire network, the proposed method is equally applicable to anticipating the second and later transitions in the case of multistage transitions in which different nodes experience regime shifts at different timings 11,[25][26][27][28][29] .We numerically demonstrated the proposed method with various dynamics models, networks, and system heterogeneity scenarios to confirm its good performances.Application to domain-specific problems such as in ecology, climate dynamics, and disease progression is saved for future work.In these and other applications, further complexity of systems in question in addition to the network structure and nonlinearity that we have neglected, such as the mixture of positive and negative interactions 18 and nonequilibrium dynamics 32 , may require alternations of our method, posing further research questions.
Nonetheless, our method is readily applicable to empirical data because it does not require the information about the network structure or the dynamical model equations.It only requires the covariance matrix among the observables measured at two values of the bifurcation parameter, or two different states of the networked system, to determine an optimal sentinel node set, S. In ecological 52,53 and psychopathological 54 applications of early warning signals, it is customary to calculate fluctuation-based early warning signals such as the variance or covariance from multivariate time series data with sliding time windows, thus obtain time series of early warning signals, and use them to give alerts for impending tipping points.To apply our methods to empirical multivariate time series data, once one has determined S as the maximizer of d, one only needs to collect signals from the nodes in S in each of the sliding time windows, calculate the early warning signal (i.e., V S ), and monitor it.
The proposed method tended to outperform other heuristics when the amount of dynamical noise depended on nodes.The assumption of heterogeneous noise, also made in a previous analytical study 24 , is probably realistic because the nodes constituting a complex dynamical system correspond to different species or geographical patches in ecosystems, different genes or symptoms in complex systems of diseases, different firms or countries in a financial network, and so on.Although it is not easy to measure dynamical noise for each node in isolation in empirical complex systems, there is no a priori reason to assume that different nodes are subject to the same amount of intrinsic noise.As we have shown, heterogeneous noise across nodes generally confuses early warning signals estimated from observed data because, in heterogeneous noise cases, a larger variance of an x i does not imply that x i provides a better early warning signal.We overcame this problem by quantifying fluctuations of the early warning signal, not just fluctuations of x i , and measuring it at two bifurcation parameter values.We propose that we should observe the system at two sufficiently distant bifurcation parameter values (practically, two distant times) for identifying good early warning signals.The early warning signals at nodes whose fluctuation is large due to large intrinsic noise do not substantially grow between the two bifurcation parameter values.In contrast, the early warning signals at nodes closely approaching their tipping point substantially grow between the two bifurcation parameter values.An alternative strategy is to use lagged autocorrelation as early warning signal because nodes with large intrinsic noise may produce small autocorrelation.Autocorrelation of multivariate OU processes is analytically tractable, whereas it is more complicated than the variance and covariance 37 .Analysis of autocorrelation and its average over nodes as early warning signals with the present theoretical framework warrants future work.
In numerical simulations, we examined the size of the node set n ∈ {1, …, 5}.For n = 1 and 2, we inspected all the node sets for their performance and d values.In contrast, we randomly sampled node sets for n ≥ 3 due to the combinatorial explosion.If the number of nodes, N, is less than approximately 20, it may be feasible to exhaustively investigate all the node sets across all values of n to find the exact maximizer of d.Note that the calculation of a single d only involves the square and summation of the entry of the N × N covariance matrix and therefore is not costly.When N is larger, we need to find node sets that only approximately maximize d.This combinatorial problem has scarcely been discussed in early warning signal research community.It is because prior studies highlighting that different nodes can emit early warning signals of different quality either used systems with a small number of nodes, typically with N ≤ 5 19,[21][22][23][24] , or linearly ranked the N nodes 20,27,29 , thus implicitly abandoning the potential benefit of wisely combining different nodes.However, real-world complex systems whose tipping events are of practical interest are more often than not larger complex networks 17,18 .Although we provided a stopping criterion for exploring different n values and gave a demonstration (see Supplementary Note 4), further deploying heuristics for combinatorial optimization to approximate maximization of d for reasonably large n and N is left for future work.
We used the sample covariance matrix as the estimator of the true covariance matrix for simplicity.In fact, although the sample covariance matrix is an unbiased estimator in the limit of L → ∞, it is an unreliable estimator when L is small relative to N, and there are better choices such as different sparse estimators and covariance shrinkage methods 45 .In the present study, the sample covariance matrix is not problematic because we used a relatively small N and large L.However, in practical situations, we may only have a small number of samples, in which case one should consider a different covariance estimator.Furthermore, we confined ourselves to linear combinations of early warning signals measured at single nodes.However, a natural extension is to exploit covariance of x i and x j , where i ≠ j, to construct early warning signals.Examples of such early warning signals include the leading eigenvalue of the covariance matrix 21,55,56  Eq. ( 29) represents that the jth node infects the ith node.The second term represents the recovery of the ith node.We set μ = 1 without loss of generality; multiplying λ, μ, and σ i by the same positive constant is equivalent to changing the time scale and not to change these parameter values.For this model, we only use λ as the bifurcation parameter.This is because adding a term u i dt on the right-hand side of Eq. ( 29) would imply that infectious individuals can spontaneously appear in the population in the absence of any infected host, which is unrealistic.For the same reason, we do not have a concept of heterogeneous node stress in this model.We initially set λ = 0.In the absence of dynamical noise, it holds true that 0 ≤ x i < 1 ∀ i at equilibrium.We set σ i = σ + Δσ i , where σ = 5 × 10 −4 .

Networks
We used the following three model networks and three empirical networks in our numerical simulations.All networks are undirected and unweighted by construction, or coerced to be so.
The first network was generated by the Erdős-Rényi random graph with 50 nodes and exactly 125 edges.We connected a uniformly randomly selected pair of nodes that had not yet been adjacent to each other, one by one, until we had 125 edges.It happened that the generated network was connected.By construction, the average degree 〈k〉 = 5.
The second network is a network generated by the BA model with N = 50 nodes.We set the number of edges per each additional node to m = 2.The initial condition was the complete graph with 3 nodes (i.e., triangle).In the limit N → ∞, the model produces a degree distribution with a power-law tail, i.e., p(k) ∝ k −3 , where k is the degree, and p(k) is the probability that the degree is equal to k.The resulting network was connected and had 97 edges, yielding 〈k〉 = 3.88.
The third network is the largest connected component of the node fitness model proposed by refs.60-62.This model produces networks with heterogeneous degree distributions as follows.We start with an empty network with N = 50 nodes and assign to each node i a fitness score f i = ði + i 0 À 1Þ Àα , where α is a parameter that controls the heterogeneity of the degree distribution, and i constrains the maximum degree.We set α = 2.Then, we independently connect each pair of nodes (i, j) by an edge with probability We took the largest connected component of the resulting network, which included N = 49 nodes and 125 edges, yielding 〈k〉 = 5.1.The fourth network is a record of carbon flows in the Chesapeake Bay marine ecosystem 63 .The original carbon flow data is directed and forms a connected network.We used the undirected, unweighted version of the network stored by the Koblenz Network Collection 64 .This network has 39 nodes, 170 edges, and 〈k〉 = 8.72.
The fifth network is the food web of a freshwater stream collected in southern New Zealand 65 .Two species are connected if there was evidence of one consuming the other.The original data is a directed and unweighted network with 49 nodes and 110 edges 66 .We coerced the network to be undirected and retained the largest connected component, resulting in a network with 48 nodes, 110 edges, and 〈k〉 = 4.58 The sixth network is a social network of wild dolphins observed in Doubtful Sound, New Zealand 67 .The network is connected, undirected, and unweighted, with 62 nodes, 159 edges, and 〈k〉 = 5.13.

Calculation and performance assessment of early warning signals
We conducted 50 independent series of simulations for each combination of dynamics model, network, and whether both node stress (i.e., u i ) and noise strength (i.e., σ i ) were homogeneous, only u i was heterogeneous, or both u i and σ i were heterogeneous.Each series consisted of simulations at linearly increasing values of a bifurcation parameter in the case of the double-well or SIS dynamics, and linearly decreasing values of a bifurcation parameter in the case of the mutualistic interaction or gene regulatory dynamics.When we moved to a next value of the bifurcation parameter, we increased it by Δu = 0.025 or ΔD = 0.0025 for the double-well dynamics, decreased it by Δu = 0.1 or ΔD = 0.01 for the mutualistic interaction dynamics, decreased it by Δu = 0.01 or ΔD = 0.01 for the gene regulatory dynamics, and increased it by Δλ = 0.0025 for the SIS dynamics.
Each simulation given the dynamics model began from the same value of x i ∀ i (double-well: x i = 1, mutualistic interaction: x i = 5, gene regulatory: x i = 5, SIS: x i = 0.001) regardless of the bifurcation parameter value.We used the Euler-Maruyama method with Δt = 0.01 to simulate each dynamics.In the SIS and gene regulatory dynamics, whenever we obtain a negative value of x i (t) for any i due to dynamical noise, we reset x i (t) = 0. We allowed 100 generic time units (TU) to discard transients, except in the case of the mutualistic dynamics, for which we allowed 10 TU due to a shorter characteristic time scale of the mutualistic dynamics model.After discarding transients, we considered the system at equilibrium and took L = 100 evenly spaced samples from each x i (t), i ∈ {1, 2, …, N}; samples were spaced 1 TU apart, with the exception of the mutualistic dynamics model for which the samples were spaced 0.1 TU apart.We stopped a series of simulations with a gradually changing bifurcation parameter value once any x i was no longer near its initial state, specifically, once any x i satisfies the following condition: x i ≥ 3 for the double-well dynamics, x i < 0.1 for the mutualistic interaction dynamics, x i < 0.1 for the gene regulatory dynamics, and x i ≥ 0.1 for the SIS dynamics.
After all simulations in a given series were complete, we calculated the early warning signals for each node set S with |S| = n, based on the L samples taken at each value of the bifurcation parameter.In addition, to calculate d, we used Eq. ( 5) to obtain μ 1 = 1 n ii and μ 2 = 1 n ii , and Eq. ( 6) to obtain var 1 = 2 P n i = 1 . Then, we used Eq. ( 15) to calculate d for the node set S. Note that this computation is easy once we have calculated C (1) and C (2) .We selected the two values of the bifurcation parameter at which we calculated C (1) and C (2) as follows.Suppose that the bifurcation parameter is u and that we have simulated the dynamics at u = u k , k 2 f1, 2, . . ., Kg.Thus, the simulation with u K was the last simulation in which all x i remained near their initial state at equilibrium.We selected k ð1Þ = round ð0:1 KÞ and k ð2Þ = round ð0:9 KÞ, where round() denotes rounding to the closest integer.Then, we calculated C (1) and C (2) from the L samples obtained at u = u ð1Þ u k ð1Þ and u = u ð2Þ u k ð2Þ , respectively.We followed the same procedure to select the two values of D at which we calculated C (1) and C (2) when the bifurcation parameter was D.
In addition to by maximizing d, we also determined node set S by the Large SD algorithm 29 for comparison purposes, which proceeds as follows.First, we collected L samples from each x i (t) at u = u (2) (or D = D (2) when D instead of u was the bifurcation parameter) and calculated its sample standard deviation.Second, we defined S as the set of the n nodes with the largest sample standard deviation of x i (t).Third, we averaged the sample standard deviation of x i (t) over i ∈ S and used it as early warning signal.
We quantify the extent to which a given node set S signals the proximity of tipping by the Kendall's τ rank correlation between the early warning signal (i.e., the sample variance averaged over the nodes in S unless we state otherwise) and the bifurcation parameter 13 .Because of critical slowing down, the variance of x i (t) grows large as the dynamical system approaches a tipping point 1 .In our simulations, when the bifurcation parameter linearly increases (i.e., double-well and SIS dynamics), the value of a perfect early warning signal would monotonically increase, resulting in τ = 1.When the bifurcation parameter linearly decreases (i.e., mutualistic interaction and gene regulatory dynamics), a perfect early warning signal yields τ = − 1.

Fig. 1 |
Fig. 1 | Schematic of two networks.a A network with N = 2 nodes connected by a directed edge.b A symmetric chain network with N = 3 nodes.The coupling strength is denoted by w.The stress given to each node is either r or r − Δr.

Fig. 2 |
Fig.2| An example bifurcation diagram of single-node dynamics given by Eq. (8) without dynamical noise.The solid and dashed lines represent stable and unstable equilibria, respectively.

Fig. 6 |
Fig. 6 | Relationships between the Kendall's τ and d before the first major transition in the coupled double-well dynamics on a BA network with N = 50 nodes.We gradually increased u.We considered node sets S with n ∈ {1, 2, 3, 4, 5, N} nodes.a n = 1.b n = 2. c n = 3. d n = 4. e n = 5. f n = N.The τ and d values for the node set maximizing d for each n value are highlighted by the dashed lines.The cross represents "Large SD'', i.e., the node set comprised of the n nodes with the largest sample standard deviation of x i (t).

Fig. 7 |
Fig. 7 | Performance of the optimized node set in anticipating the tipping point in the double-well dynamics on the BA model network with N = 50 nodes.a Homogeneous stress and homogeneous noise.b Heterogeneous stress and homogeneous noise.c Heterogeneous stress and heterogeneous noise.The smaller symbols show the p 1 and p 2 values for the individual series of simulations.The larger symbols show the average over 50 series of simulations.

Fig.
Fig.7a).Separately, we also consider the case in which both the node stress u i and the strength of the dynamical noise σ i are heterogeneous.We set σ i = σ + Δσ i , where σ = 0.05, and Δσ i independently obeys the uniform density on [ − 0.9σ, 0.9σ].We show p 1 and p 2 for this case in Fig.7c.The results are qualitatively the same as those when both u i and σ i are homogeneous (Fig.7a) and when only u i is heterogeneous (Fig.7b).Second, we have considered six networks including the BA model, three of which are model networks and the other three are empirical networks (see Methods for the networks).Third, we have considered four dynamical system models on networks including the coupled double-well dynamics; the other three dynamics are a model of mutualistic interaction dynamics among species46 , a gene regulatory system governed by the Michaelis-Menten equation46 , and the deterministic approximation of the susceptible-infectious-susceptible (SIS) model on networks47 .These four models of dynamics are diverse in the sense that the SIS model shows a transcritical bifurcation to mark the onset of epidemic spreading, while the other three models show saddle-node bifurcations.In addition, we started the mutualistic interaction and gene regulatory dynamics from the upper state for each node and gradually decreased the bifurcation parameter value, which is the opposite to the case of the double-well and SIS dynamics.This is because the loss of resilience such as species loss in mutualistic dynamics and cell death in gene regulatory dynamics is of a practical concern for these dynamical systems46 .Fourth, we also varied D instead of u as bifurcation parameter until the first bifurcation occurs.For the SIS model, we only considered an equivalent to D, which is the infection rate parameter, but not u because introducing and varying u is unrealistic for the SIS model; see Methods for the discussion.We show the p 1 and p 2 values for n ∈ {1, …, 5}, the three different scenarios for u i and σ i (i.e., constant across all nodes or heterogeneous), two networks, four different dynamical systems, and two different bifurcation parameters (i.e., u or D) in Fig.8.Each figure panel contains the average of p 1 and p 2 over 50 series of simulations for the three scenarios of heterogeneity (i.e., both u i and σ i are homogeneous, only u i is heterogeneous, and both of them are heterogeneous) for one network and one dynamical system.Note that Fig.8ais for the doublewell dynamics on the BA network, corresponding to Figs.6 and 7. We find that the node set based on the largest d value performs reasonably well for the two networks (see Fig.8a-g for the BA network and Fig.8h-n for the Chesapeake Bay carbon flow network) and for all the four models of dynamics.The node set maximizing d tends to work better relative to uniformly randomly picked node sets, yielding smaller p 1 and p 2 values, when u i and σ i are both heterogeneous across the nodes (shown by the dotted lines) than when either u i or σ i is homogeneous (shown by the solid and dashed lines).This is presumably because the proposed method actively discards nodes with high intrinsic noise by comparing the distribution of the signal variance at two values of the bifurcation parameter.The node set maximizing d tends to work better relative to uniformly random node sets with the same number of nodes, n, when n is smaller.However, even at n = 5, the p 1 and p 2 values are at most approximately 0.7 and much smaller in a majority of cases.The results are qualitatively the same for the other four networks, with a caveat that the maximizer of d performs poorly for the gene regulatory dynamics model on some networks (see Supplementary Note 6).We also carried out the following robustness tests.Our choice of the two bifurcation parameter values at which we sample the covariance matrix and then calculate d (see Methods for the definition of the two bifurcation parameter values) has been arbitrary.We used one bifurcation parameter value close to the start of each series of simulation and the other value close to the first tipping point, regardless of the type of bifurcation.This is because, with this choice, the difference between μ 1 and μ 2 , which is the numerator of d (see Eq. (15)), is larger than when the two bifurcation parameter values

Fig. 8 |
Fig.8| Performance of the node set maximizing d on the BA and Chesapeake Bay networks.The squares and circles represent p 1 and p 2 , respectively, for the given n, dynamics, network, and condition (i.e., whether u i or σ i is homogeneously or heterogeneously distributed) averaged over 50 series of simulations.a-g BA network.h-n Chesapeake Bay carbon flow network.The panels on the leftmost column correspond to the double-well dynamics, and the second to the fourth columns to the mutualistic interaction, gene regulatory, and SIS dynamics, respectively.The combination of the dynamics model and bifurcation parameter is as follows.(a, h): double-well, u. (b, i): mutualistic interaction, u. (c, j): gene regulatory, u. (d, k): double-well, D. (e, l): mutualistic interaction, D. (f, m): gene regulatory, D. (g, n): SIS, λ.

Fig. 9 |
Fig. 9 | Relationships between the Kendall's τ and d in the coupled double-well dynamics on the BA network with N = 50 nodes under heterogeneous node stress and heterogeneous dynamical noise.The network is the same as the one used in Fig. 6.We considered node sets S with n ∈ {1, 2, 3, 4, 5, N} nodes.a n = 1.b n = 2. c n = 3. d n = 4. e n = 5. f n = N.The τ and d values for the node set maximizing d are highlighted by the dashed lines.The cross represents "Large SD'', i.e., the node set comprised of the n nodes with the largest sample standard deviation of x i (t).
* 1 and x * 2 are negative such that C 11 , C 12 , and C 22 are positive.Here we recall that E denotes the expectation, and we let std denote the standard deviation.Because x * 1 = 0 À as r → 0 − , all of C 11 , C 12 , and C 22 diverge as r → 0 − .Therefore, E½ V 1 , E½ V 2 , and E½ V f1, 2g , which are equal to C 11 , C 22 , and (C 11 + C 22 )/2, respectively, all diverge as r → 0 − according to ∝ (−r) −1/2 .However, we argue that this fact does not immediately imply that V 1 , V 2 , or V f1, 2g is a high-quality early warning signal for the saddle-node bifurcation occurring at r = 0 because std ½ V 1 , std ½ V 2 , and std ½ V f1, 2g , which are proportional to [57][58][59]I[57][58][59]. It should be noted that we used the cross covariance, C ð1Þ ij and C ð2Þ ij , where i ≠ j, only to evaluate the uncertainty of our early warning signals, not to construct early warning signals.Early warning signals for heterogeneous networks when the number of samples is limited are a challenging open question.