Duality between predictability and reconstructability in complex systems

Predicting the evolution of a large system of units using its structure of interaction is a fundamental problem in complex system theory. And so is the problem of reconstructing the structure of interaction from temporal observations. Here, we find an intricate relationship between predictability and reconstructability using an information-theoretical point of view. We use the mutual information between a random graph and a stochastic process evolving on this random graph to quantify their codependence. Then, we show how the uncertainty coefficients, which are intimately related to that mutual information, quantify our ability to reconstruct a graph from an observed time series, and our ability to predict the evolution of a process from the structure of its interactions. We provide analytical calculations of the uncertainty coefficients for many different systems, including continuous deterministic systems, and describe a numerical procedure when exact calculations are intractable. Interestingly, we find that predictability and reconstructability, even though closely connected by the mutual information, can behave differently, even in a dual manner. We prove how such duality universally emerges when changing the number of steps in the process. Finally, we provide evidence that predictability-reconstruction dualities may exist in dynamical processes on real networks close to criticality.


I. INTRODUCTION
The relationship between structure and function is fundamental in complex systems [1-3], and important efforts have been invested in developing network models to better understand it. In particular, models of dynamics on networks [4-7] have been proposed to assess the influence of network structure over the temporal evolution of the activity in the system. In turn, data-driven models [8,9], dimension-reduction techniques [10-13] and mean-field frameworks [14-18] have deepened our predictive capabilities. Among other things, these theoretical approaches have shed light on the relationship between dynamical criticality and many network properties, such as the degree distribution [14,16], the eigenvalue spectrum [19-21] and the group structure [17,22,23]. Fundamentally, these contributions justify our inclination for measuring and using real-world networks as a proxy to predict the behavior of complex systems.
Interestingly, dynamics prediction and network reconstruction are usually considered separately, even though they are related to one another. The emerging field of network neuroscience [46,47] is perhaps the one using both notions most actively: network reconstruction to build brain connectomes from functional time series, then dynamics prediction to infer various brain disorders from these connectomes [48,49]. Recent theoretical works have also taken advantage of these notions to show that dynamics can depend only weakly on the structure. In Ref. [50], it was shown that time series generated by a deterministic dynamics evolving on a specific graph can be accurately predicted using a broad range of other graphs. These findings highlight how poor our intuition can be with regard to the relationship between predictability and reconstructability. Furthermore, recent breakthroughs in deep learning on graphs have benefited from proxy network substrates to enhance the predictive power of their models [51-53], with applications in epidemiology [9,54] and pharmaceutics [55,56]. However, the use of graph neural networks and these proxy network substrates is only supported by numerical evidence and lacks a rigorous theoretical justification. As a result, their enhanced predictability remains to be fully corroborated. There is therefore a need for a solid theoretical foundation of reconstructability, predictability and their relationship in networked systems.
In this work, we establish a rigorous framework that lays such a foundation based on information theory. Information theory has been regularly applied to networks and dynamics in the past. In network science, it has been used to characterize random graph ensembles [57-59] (e.g., the configuration model [60,61] and stochastic block models [62,63]), to develop network null models [64] and to perform community detection [65,66]. In stochastic dynamical systems, information-theoretical measures have been proposed to quantify their predictability [67-69], complexity [70,71] and emergence [72]. In statistical mechanics, information transmission has been shown to reach a maximum value near the critical point of spin systems in equilibrium [73,74].
Our objective is to combine these ideas into a single framework, motivated by recent works involving spin dynamics on lattices [75,76] and deterministic dynamics [50]. Our contributions are fourfold. First, we use the mutual information between structure and dynamics as the foundation of a general framework to quantify the structure-function relationship in complex systems. Second, this codependence naturally leads to the definition of measures of predictability and reconstructability. Doing so allows us to conceptually unify prediction and reconstruction problems, i.e., two classes of problems that are usually treated separately. Third, we design efficient numerical techniques for evaluating these measures on large systems. Finally, we identify a new phenomenon, a duality, where our prediction and reconstruction capabilities can vary in opposite directions. These findings further our understanding of the complexity of modeling networked complex systems, such as the brain, where both prediction and reconstruction techniques play critical roles.

A. Information theory of dynamics on random graphs
Let us consider a random graph G whose support, G_N, consists of the set of all graphs of N vertices, each of which has a non-zero prior probability P(G*) with G* ∈ G_N. From the Bayesian perspective, the random graph G represents our prior knowledge of the structure of the system of interest. We also consider a stochastic process (also called a dynamics hereafter) of length T, denoted X, evolving on a realization of G and representing the possible dynamical states of the system. We denote by P(X | G) the probability of a random time series X = (X_{i,t})_{i,t} conditioned on G, where X_{i,t} is the random state, with support Ω, of vertex i at time t. Together, X and G form a Bayesian chain G → X, where the arrow indicates conditional dependence [77].
We are interested in the mutual information between X and G, denoted I(X; G), which is a symmetric measure that quantifies the codependence between the dynamics X and the structure G [78], with I(X; G) = 0 when they are independent. It is equivalently given by

I(X; G) = H(G) − H(G | X) = H(X) − H(X | G),    (1)

where

H(G) = −Σ_{G* ∈ G_N} P(G*) log P(G*)    and    H(X) = −Σ_{X*} P(X*) log P(X*)

are respectively the marginal entropies of G and X, and

H(X | G) = −Σ_{G* ∈ G_N} P(G*) Σ_{X*} P(X* | G*) log P(X* | G*)    and    H(G | X) = −Σ_{X*} P(X*) Σ_{G* ∈ G_N} P(G* | X*) log P(G* | X*)

are their corresponding conditional entropies. In the previous equations, the marginal distribution of X, the evidence, is defined as P(X) = Σ_{G* ∈ G_N} P(G*) P(X | G*), and the posterior is obtained from Bayes' theorem as P(G | X) = P(G) P(X | G) / P(X), using the given graph prior P(G) and the dynamics likelihood P(X | G). In the case where Ω is a countable set (i.e., vertices have discrete dynamical states), I(X; G) is a non-negative measure bounded by 0 ≤ I(X; G) ≤ min{H(G), H(X)}. Figure 1(a) provides an illustration of Eq. (1).

[FIG. 1 caption, panels (b) and (c): (b) The highly predictable / weakly reconstructable scenario, where H(X) ≪ H(G), meaning that I(X; G) contains most of the information related to the dynamics, but only a small fraction of the information related to the graph. (c) The reverse scenario, i.e., highly reconstructable / weakly predictable, where H(G) ≪ H(X), meaning that I(X; G) contains most of the information related to the graph, but only a small fraction of the information related to the dynamics.]
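For very small systems, every quantity in Eq. (1) can be computed exactly by brute-force enumeration. The sketch below is a toy illustration, not the paper's estimator: it enumerates all 8 simple graphs on N = 3 vertices under a uniform prior and a heat-bath, Glauber-like dynamics with a hypothetical coupling J, then checks the bound 0 ≤ I(X; G) ≤ min{H(G), H(X)}.

```python
from itertools import product, combinations
from math import exp, log2

N, T, J = 3, 3, 1.0                      # toy sizes; J is a hypothetical coupling
pairs = list(combinations(range(N), 2))  # the 3 possible undirected edges
graphs = list(product((0, 1), repeat=len(pairs)))  # all 8 simple graphs on N = 3
P_G = 1.0 / len(graphs)                  # uniform graph prior P(G*)
states = list(product((0, 1), repeat=N)) # spin configurations of one time step

def transition(g, x, y):
    """P(X_{t+1} = y | X_t = x, G = g) under a heat-bath (Glauber-like) update."""
    p = 1.0
    for i in range(N):
        h = sum((2 * x[j] - 1)           # +1 for active, -1 for inactive neighbours
                for (a, b), e in zip(pairs, g) if e and i in (a, b)
                for j in [b if a == i else a])
        p_up = 1.0 / (1.0 + exp(-2.0 * J * h))
        p *= p_up if y[i] == 1 else 1.0 - p_up
    return p

def likelihood(g, s):                    # P(X = s | G = g), uniform initial state
    p = 1.0 / len(states)
    for t in range(T - 1):
        p *= transition(g, s[t], s[t + 1])
    return p

all_series = list(product(states, repeat=T))
P_X = {s: sum(P_G * likelihood(g, s) for g in graphs) for s in all_series}
H_X = -sum(p * log2(p) for p in P_X.values() if p > 0)
H_XG = -sum(P_G * q * log2(q) for g in graphs for s in all_series
            for q in [likelihood(g, s)] if q > 0)
H_G = log2(len(graphs))                  # 3 bits under the uniform prior
I = H_X - H_XG
print(f"H(G) = {H_G:.3f}, H(X) = {H_X:.3f}, I(X;G) = {I:.4f}")
```

Exhaustive enumeration scales as |Ω|^{NT} × |G_N| and is only meant to make the definitions concrete; beyond a handful of vertices, the estimators described later in the paper are needed.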
The measures presented in Eq. (1) and above can all be interpreted in the context of information theory. Information is generally measured in bits, which can be interpreted as the minimal number of binary (i.e., yes/no) questions needed to convey a piece of information. While entropy measures the uncertainty of random variables like X and G, i.e., the minimal number of bits of information needed to determine their value, mutual information represents the reduction in uncertainty about one variable when the other is known. The fact that it is symmetric means that this reduction goes both ways: the reduction in the dynamics uncertainty when the structure is known is equal to that of the structure when the dynamics is known. Hence, mutual information measures the amount of information shared by X and G.
As an illustration, let us consider the physical example of a spin system such as the one described by the Glauber dynamics [79] depending on G (see Table I). For a given value of the coupling parameter J ≥ 0, the spins will be more (large J) or less (small J) likely to align with their first neighbors in G. Now, suppose that J = 0, which means that the spins flip independently of each other and of G with probability 1/2 at each time step. Hence, H(X | G) = NT bits, corresponding to the maximum entropy of X: we need precisely one binary question for each spin at each time for a given structure G, e.g., "Is the spin of vertex i at time t up?". When J > 0, correlation is introduced between connected spins. As a result, a single question about the spin of vertex i at time t can provide additional information about the spins of other vertices at other times and thus, H(X | G) < NT. The interpretation of H(X) is analogous to that of H(X | G), as it measures the number of binary questions needed to determine X when the graph is unknown. From this perspective, the mutual information I(X; G), expressed as the difference between H(X) and H(X | G), is the reduction in the number of questions needed to predict X ensuing from the knowledge of G. Hence, I(X; G) measures how much information about X is determined by G, or how influential G is over X, and is therefore related to its predictability.
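The J = 0 count can be verified directly: when every spin flips independently with probability 1/2, all 2^{NT} time series are equally likely given any graph, so H(X | G) = NT bits. A minimal check:

```python
from math import log2

N, T = 2, 2
# At J = 0 every spin flips independently with probability 1/2, so all 2^(N*T)
# time series are equally likely given any graph G.
n_series = 2 ** (N * T)
P = 1.0 / n_series
H_X_given_G = -sum(P * log2(P) for _ in range(n_series))
print(H_X_given_G)
```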
Similar observations can be made from the structural perspective. Suppose that X is again the Glauber dynamics and that G is a random graph where each edge exists independently with probability p. This yields H(G) = [N(N − 1)/2] h(p), where h(p) = −p log₂ p − (1 − p) log₂(1 − p) is the binary entropy function and N(N − 1)/2 is the total number of possible undirected edges. When p = 1/2, we have H(G) = N(N − 1)/2 bits, which is again the maximum entropy of G. We therefore need precisely one binary question for each of the N(N − 1)/2 possible edges of the graph, e.g., "Is there an edge between i and j?", to completely determine its value. When the dynamics X is known, H(G | X) is interpreted similarly to H(G), but also takes into account the observation of the spins X, which introduces correlation between the edges of G. As a result, each bit can provide information about more than one edge, even in the case p = 1/2 where we a priori need one bit per possible edge to fully reconstruct G. Consequently, the knowledge of X reduces the uncertainty about G (i.e., H(G | X) ≤ H(G), see [78, Theorem 2.6.5]). The difference between H(G) and H(G | X), i.e., I(X; G), thus measures how much information about G is revealed by knowing X, which in turn is related to its reconstructability.
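This closed form is easy to encode; a quick sketch (the function names are ours), assuming independent edges as in the text:

```python
from math import comb, log2

def binary_entropy(p):
    """h(p) = -p log2 p - (1 - p) log2(1 - p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def er_entropy(N, p):
    """H(G) of an Erdos-Renyi graph: one independent bit source per possible edge."""
    return comb(N, 2) * binary_entropy(p)

print(er_entropy(10, 0.5))  # 45 possible edges, 1 bit each at p = 1/2
```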
In practice, we can argue that I(X; G) is related to the performance of reconstruction algorithms such as the cross-correlation matrix method [30]. Figure 2 provides evidence of this relationship by comparing the performance of common reconstruction algorithms with the mutual information (see Section IV D for details). Indeed, when I(G; X) = 0, the score of all algorithms is comparable to that of a random edge/no-edge classifier between each pair of vertices (with an AUC of 0.5), and all methods seem to peak around the mutual information maximum before decreasing again for larger coupling. Note that a similar comparison could, in principle, be carried out for the predictability as well, but the problem of measuring a graph's influence over a process is less documented than that of graph reconstruction, which makes it harder to investigate.
The mutual information I(X; G) is therefore both a measure of predictability and of reconstructability, thereby unifying these two concepts under one single framework. We say that a system is perfectly predictable when the mutual information contains all the information about X, that is, when I(X; G) = H(X) [see Fig. 1(b)]. Likewise, we say that it is perfectly reconstructable when I(X; G) = H(G) [see Fig. 1(c)]. Consequently, whenever I(X; G) > 0, we expect the system to be predictable and reconstructable to a certain degree. Otherwise, when I(X; G) = 0, the system is said to be both unpredictable and unreconstructable. Yet, I(X; G) by itself is hardly comparable from one system to another. Indeed, a specific value of I(G; X) may correspond to opposing scenarios when it comes to predictability and reconstructability, as shown in Fig. 1(b-c). Thus, it is more convenient to use normalized quantities such as the uncertainty coefficients,

U(G | X) = I(X; G) / H(G)    and    U(X | G) = I(X; G) / H(X),

as measures, bounded between 0 and 1, of the relative degrees of reconstructability and predictability, respectively. The interpretation of the reconstructability measure U(G | X) is straightforward: it is the fraction of the structural information that can be recovered from the dynamics. Similarly, the predictability measure U(X | G) is interpreted as the fraction of dynamical information that is determined, or influenced, by the structure. Strictly speaking, U(X | G) is not a standard measure of predictability, i.e., a measure that quantifies the influence of the initial conditions of a system, or of its past states, over its future states [67,68,80,81]. However, as the graph can be interpreted as an element of the initial conditions that remains constant throughout the process, our measure U(X | G) is compatible with the terminology of "predictability". It is nevertheless possible to use our framework to build predictability measures that depend explicitly on the past, but these do not highlight the relationship between structure and dynamics as clearly as the measures presented in this section. We refer to Appendix IV C for further details.
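Once the entropies are in hand, the uncertainty coefficients are simple normalizations. A sketch on a minimal ensemble (two vertices, a single possible edge, uniform prior, hypothetical coupling), where H(G) = 1 bit:

```python
from itertools import product
from math import exp, log2

T, J = 4, 1.5                    # chain length and a hypothetical coupling
graphs = (0, 1)                  # edge absent / present; uniform prior, H(G) = 1 bit
states = list(product((0, 1), repeat=2))

def trans(g, x, y):
    """Heat-bath update: with the edge present, each spin tends to copy the other."""
    p = 1.0
    for i in (0, 1):
        p_up = 1.0 / (1.0 + exp(-2.0 * J * g * (2 * x[1 - i] - 1)))
        p *= p_up if y[i] else 1.0 - p_up
    return p

def likelihood(g, s):            # uniform initial state
    p = 1.0 / len(states)
    for t in range(T - 1):
        p *= trans(g, s[t], s[t + 1])
    return p

series = list(product(states, repeat=T))
P_X = [sum(0.5 * likelihood(g, s) for g in graphs) for s in series]
H_X = -sum(p * log2(p) for p in P_X if p > 0)
H_XG = -sum(0.5 * q * log2(q) for g in graphs for s in series
            for q in [likelihood(g, s)] if q > 0)
I = H_X - H_XG
H_G = 1.0
U_rec, U_pred = I / H_G, I / H_X     # U(G|X) and U(X|G)
print(f"U(G|X) = {U_rec:.3f}, U(X|G) = {U_pred:.3f}")
```

Here H(X) > H(G), so the same I(X; G) yields U(X | G) < U(G | X): the toy system sits closer to the scenario of Fig. 1(c) than to that of Fig. 1(b).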

B. θ-Duality between predictability and reconstructability
Predictability and reconstructability in dynamics on random graphs offer two perspectives on the same information shared by G and X, two sides of the same coin. However, this does not mean that predictability and reconstructability go hand in hand, even though they are related: a high value of U(G | X) does not necessarily imply a high value of U(X | G), which can be somewhat counterintuitive. In other words, a maximally influential structure, with respect to the dynamics, will not necessarily be easier to reconstruct. This observation is well illustrated by Figs. 1(b)-(c), where U(G | X) and U(X | G) can take opposing values, depending on H(G) and H(X), for the same value of I(X; G).
As an example, let us consider X to be a Markov chain evolving on a random graph G for different values of the number of time steps, T. Theorem 1 (see App. IV B) states that, for any Markov chain whose entropy rate is non-zero and for sufficiently large T, U(G | X) is an increasing function of T, while U(X | G) is a decreasing one. This is a consequence of the fact that the mutual information is strictly increasing with T, and so is U(G | X) whenever H(G) is independent of T. Yet, we show in App. IV B that I(X; G) increases more slowly than H(X) with T, which results in a decreasing U(X | G). We refer to this opposing behavior as a duality between U(G | X) and U(X | G) with respect to T, or a T-duality for short [82].
Figure 3 illustrates the universality of the T-duality using different binary Markov chains (i.e., Ω = {0, 1}). In each of these chains, the probability of a time series factorizes as P(X | G) = P(X_1) Π_{t=1}^{T−1} P(X_{t+1} | X_t, G), where P(X_{t+1} | X_t, G) is the transition probability from state X_t to state X_{t+1}. We also denote the activation (0 → 1) and deactivation (1 → 0) probability functions by α(n_{i,t}, m_{i,t}) and β(n_{i,t}, m_{i,t}), respectively, where n_{i,t} and m_{i,t} denote the number of inactive and active neighbors of vertex i at time t.
We consider three well-known Markov chain models of different origins: the Glauber dynamics, the Susceptible-Infectious-Susceptible (SIS) dynamics and the Cowan dynamics. The aforementioned Glauber dynamics [79], which describes the time-reversible evolution of magnetic spins aligning in a crystal, has been studied extensively because of its critical behavior and its phase transition. Its stationary distribution is given by the Ising model, which has found many applications in condensed-matter physics [83] and statistical machine learning [77,84], to name a few. The SIS dynamics is a canonical model in network epidemiology [5], often used for modeling influenza-like diseases [85], where periods of immunity after recovery are short. In this model, susceptible (or inactive) vertices get infected by each of their infected (active) first neighbors with a constant transmission probability, and recover from the disease with a constant recovery probability. The simplicity of the SIS model has allowed for deep mathematical analysis of its absorbing-state phase transition [14,16,19]. Finally, the Cowan dynamics [86] has been proposed to model neuronal activity in the brain. In this model, quiescent neurons fire if their input current, coming from their firing neighbors, is above a given threshold. Its mean-field approximation [87] reduces to the Wilson-Cowan dynamics [88], one of the most influential models in neuroscience [89]. For each model, we can identify an inactive state (down, susceptible or quiescent) and an active one (up, infectious or firing). The corresponding activation and deactivation probabilities are given in Table I.
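The activation and deactivation functions of Table I can be written as small closures. The functional forms below are the standard choices for these three models (heat-bath updates for Glauber, independent transmissions for SIS, a logistic threshold for Cowan) and should be read as assumptions of this sketch rather than a verbatim copy of Table I; default parameter values follow the table caption:

```python
from math import exp

def sigma(x):
    """Logistic function, as defined in the caption of Table I."""
    return 1.0 / (1.0 + exp(-x))

# Arguments: n = number of inactive neighbours, m = number of active neighbours.
def glauber(J):
    alpha = lambda n, m: sigma(2 * J * (m - n))    # heat-bath: align with majority
    beta = lambda n, m: sigma(-2 * J * (m - n))
    return alpha, beta

def sis(lam, rec=0.5, eps=1e-3):
    # Independent transmissions plus spontaneous activation with probability eps.
    alpha = lambda n, m: 1 - (1 - eps) * (1 - lam) ** m
    beta = lambda n, m: rec
    return alpha, beta

def cowan(nu, a=7.0, mu=1.0, rec=0.5):
    alpha = lambda n, m: sigma(a * (nu * m - mu))  # logistic threshold on the input
    beta = lambda n, m: rec
    return alpha, beta
```

For instance, the Glauber pair satisfies α(n, m) + β(n, m) = 1, and the SIS activation reduces to eps when no neighbor is active.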
Figure 3 numerically supports Theorem 1 and clearly illustrates the T-duality for each dynamics and for different values of their parameters. We used the Erdős-Rényi model as the random graph on which these dynamics evolve. The support G_N is the set of all simple graphs of N vertices with E edges, each occurring with equal probability. It is also important to note that the T-duality seems to persist for the past-dependent measures presented in Appendix IV C, as illustrated by Fig. 6, for different values of τ, which we recall is the length of the past Markov chain. However, it does not hold for all values of τ, especially those that scale with T such that τ = T − ξ, where ξ < T is constant. Hence, it is tempting to conjecture that there exists a scaling g(T) such that the T-duality can persist when τ is dominated by g(T), and cannot otherwise. More details are available in Appendix IV C.
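Theorem 1 can also be probed numerically on a system small enough for exact enumeration. The sketch below uses a two-vertex toy ensemble (one possible edge, uniform prior, hypothetical coupling) and sweeps T; U(G | X) increases at every step, while U(X | G), whose decrease is only guaranteed for sufficiently large T, has dropped below its initial value by T = 8:

```python
from itertools import product
from math import exp, log2

J = 1.5                              # hypothetical coupling
graphs = (0, 1)                      # uniform prior over edge absent / present
states = list(product((0, 1), repeat=2))

def trans(g, x, y):
    p = 1.0
    for i in (0, 1):
        p_up = 1.0 / (1.0 + exp(-2.0 * J * g * (2 * x[1 - i] - 1)))
        p *= p_up if y[i] else 1.0 - p_up
    return p

def coefficients(T):
    """Exact U(G|X) and U(X|G) for a chain of length T, by full enumeration."""
    H_X, H_XG = 0.0, 0.0
    for s in product(states, repeat=T):
        likes = []
        for g in graphs:
            q = 1.0 / len(states)    # uniform initial state
            for t in range(T - 1):
                q *= trans(g, s[t], s[t + 1])
            likes.append(q)
            H_XG -= 0.5 * q * log2(q)
        p = 0.5 * sum(likes)
        H_X -= p * log2(p)
    I = H_X - H_XG
    return I / 1.0, I / H_X          # H(G) = 1 bit

Ts = list(range(2, 9))
results = [coefficients(T) for T in Ts]
for T, (u_rec, u_pred) in zip(Ts, results):
    print(f"T = {T}: U(G|X) = {u_rec:.4f}, U(X|G) = {u_pred:.4f}")
```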
TABLE I. Activation and deactivation probability functions, α(n, m) and β(n, m), respectively, for the binary dynamics considered in this study, where n corresponds to the number of inactive neighbors, whose states are 0, and m corresponds to the number of active neighbors, whose states are 1. We define σ(x) = [exp(−x) + 1]^{−1} as the logistic function. Some of these parameters are fixed throughout the paper: β = 0.5 for SIS and Cowan, and a = 7 and µ = 1 for Cowan. The coupling parameters (J for Glauber, λ for SIS and ν for Cowan) are specified in each figure. Also, to prevent the SIS dynamics from becoming completely inactive, we allow the inactive vertices to spontaneously activate with probability ε = 10^{−3} [90].

The observation of the T-duality begs for a more general definition of duality for any arbitrary parameter θ
(see Appendix IV A). In fact, we say that U(G | X) and U(X | G) are dual with respect to θ, or θ-dual, in an interval Θ if and only if the signs of their derivatives with respect to θ differ for every θ* ∈ Θ:

[∂U(G | X)/∂θ · ∂U(X | G)/∂θ]_{θ = θ*} < 0.    (8)

This criterion formally relies on the existence of regions Θ where the variations of U(G | X) and U(X | G) with respect to θ are contradictory, regardless of their amplitude. We use this criterion to relate the existence of extrema of U(G | X) and U(X | G) with that of regions of θ-duality (see Lemma 1 in App. IV A), and to prove Theorem 1.
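This criterion translates directly into a numerical test: scan θ on a grid, approximate both derivatives by finite differences, and flag the sub-intervals where their signs differ. A sketch on synthetic single-peak curves (stand-ins for measured uncertainty coefficients, peaking at θ = 1 and θ = 2):

```python
def dual_intervals(thetas, u_rec, u_pred):
    """Indices (k, k+1) of grid intervals where the finite-difference slopes of
    the two uncertainty coefficients have strictly opposite signs."""
    out = []
    for k in range(len(thetas) - 1):
        d_rec = u_rec[k + 1] - u_rec[k]
        d_pred = u_pred[k + 1] - u_pred[k]
        if d_rec * d_pred < 0:
            out.append((k, k + 1))
    return out

# Synthetic single-peak curves (stand-ins), peaking at theta = 1 and theta = 2.
thetas = [0.25 * k for k in range(13)]                  # grid over [0, 3]
u_rec = [max(0.0, t * (2 - t)) for t in thetas]          # maximum at theta = 1
u_pred = [t * (4 - t) / 4 for t in thetas]               # maximum at theta = 2
dual = dual_intervals(thetas, u_rec, u_pred)
print(dual)
```

On these curves the flagged region is exactly the gap between the two peaks, consistent with the connection between extrema and regions of θ-duality (Lemma 1).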
Knowing the existence of the T-duality and having a general definition of θ-duality, it is now natural to ask whether there exist other types of θ-dualities in dynamics on random graphs. A large variety of parameters could lead to interesting θ-dualities, some controlling the general behavior of the dynamics, and others controlling structural properties of the random graph which, in turn, also impact the dynamics. Most of them require large systems if the effects of varying θ on X and G are to be significant (e.g., phase transitions). However, in high-dimensional systems, theoretical and numerical challenges arise in the evaluation of the reconstructability and the predictability, which complicates the search for dualities. We address this problem in the next section.

C. Duality and criticality
Despite their different nature and range of applications, the three models presented in Table I share several properties of interest. For instance, each model has a coupling parameter that controls the influence of the state of the first neighbors on the transition probabilities. They also all feature a phase transition in the infinite-size limit whose position is determined by the coupling parameter (see Fig. 5 and App. IV F). We now investigate the influence of criticality on the existence of θ-dualities, where θ is a coupling parameter.
For the Glauber dynamics, this parameter is the coupling constant J, which dictates the reduction (increase) in the total energy of a spin configuration when two neighboring spins are parallel (antiparallel). The Glauber dynamics features a continuous phase transition at a critical point J_c between a disordered and an ordered phase: for J < J_c the spins are disordered, resulting in a vanishing magnetization, while for J > J_c the magnetization is non-zero. For the SIS dynamics, it is the transmission rate λ that acts as a coupling parameter. Like the Glauber dynamics, the SIS dynamics possesses a continuous phase transition between an absorbing, or inactive, state, reached when λ < λ_c and from which the system cannot escape, and an active state, when λ > λ_c, where a non-zero fraction of the vertices remains active over time [91]. The Cowan dynamics can feature either a continuous or a first-order phase transition between an inactive and an active phase depending on the value of the slope a, with coupling parameter ν, i.e., the potential gain for each firing neighbor. The continuous and first-order phase transitions of the Cowan dynamics are quite different in that the latter is characterized by two thresholds, namely the backward and forward thresholds ν_c^b < ν_c^f, respectively (see Appendix IV F for further details). Hence, the Cowan dynamics has a first-order phase transition that exhibits a bistable region ν ∈ (ν_c^b, ν_c^f), where both the inactive and active phases are reachable depending on the initial conditions.
To account for the heterogeneous network structure observed in a wide range of complex systems [1], we simulate the dynamics on the configuration model, a random graph whose (potentially heterogeneous) degree sequence k is fixed and whose support G_N corresponds to the set of all loopy multigraphs of degree sequence k. The probability of a multigraph G* in this ensemble, generated by uniform stub matching, is

P(G*) = [Π_i k_i!] / [(2E − 1)!! Π_{i<j} M_{ij}! Π_i 2^{M_{ii}} M_{ii}!],

where M_{ij} counts the number of edges connecting vertices i and j in the multigraph G*, M_{ii} counts the self-loops at vertex i, and 2E = Σ_i k_i is the number of half-edges in G*. Like the Erdős-Rényi model, the configuration model fixes the number of edges, but it also fixes the degree distribution ρ(k).
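Sampling from this ensemble is commonly done by stub matching: each vertex i receives k_i half-edges, the stub list is shuffled, and consecutive stubs are paired, which produces the self-loops and multi-edges of the ensemble. A sketch (the degree sequence is an arbitrary example):

```python
import random
from collections import Counter

def configuration_model(degrees, rng):
    """Uniform stub matching; returns a Counter of multigraph edges M_ij."""
    if sum(degrees) % 2:
        raise ValueError("the degree sequence must have an even sum")
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)                           # random pairing of half-edges
    edges = Counter()
    for a, b in zip(stubs[::2], stubs[1::2]):
        edges[(min(a, b), max(a, b))] += 1       # (i, i) entries are self-loops
    return edges

rng = random.Random(42)
degrees = [3, 2, 2, 2, 1, 1, 1]                  # arbitrary example, 2E = 12 stubs
edges = configuration_model(degrees, rng)

# Degrees are preserved exactly; a self-loop contributes 2 to its vertex.
realized = [0] * len(degrees)
for (i, j), mult in edges.items():
    realized[i] += mult
    realized[j] += mult
print(realized)
```

By construction, the realized degrees match the prescribed sequence k exactly, whatever the random pairing.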
Figure 4 shows the predictability and reconstructability of the three dynamics evolving on graphs drawn from the configuration model with a geometric degree distribution ρ(k), as estimated by the MF estimator. First, these results allow us to compare the dynamics with one another. For example, on the one hand, the Glauber dynamics is globally less predictable than the other two, since its predictability coefficient is overall smaller. In other words, the knowledge of a graph G* provides less information about X* in the Glauber dynamics than in the others, relative to the total amount of information needed to reconstruct X*. This is related to the time reversibility of the Glauber dynamics, which allows any vertex to transition from the inactive to the active state (and vice versa) with non-zero probability at any time, effectively making the Glauber dynamics more random than the others, i.e., H(X) is greater for Glauber than for the other processes. On the other hand, the SIS and Cowan dynamics are portrayed by the MF estimator as practically unpredictable and unreconstructable when their coupling parameter is below their respective critical point. This occurs precisely in the inactive phase, where no mutual information can be generated after a short time, once the system reaches the inactive state. By contrast, the Glauber dynamics does not reach an inactive state below its critical point, which explains the gradual increase in predictability and reconstructability in that region.
Several additional observations are worth making. All dynamics exhibit maxima for U(X | G) and U(G | X) which delineate a region of duality, illustrated by the shaded areas (two for Cowan, that is, one for each branch). These regions are close to, but systematically above, the respective phase transition thresholds. A similar phenomenon in spin dynamics on non-random lattices has been reported by previous works [75,76], in which the information transmission rate between spins, a measure akin to I(X; G), is maximized above the critical point. Our numerical results are consistent with theirs, and suggest that their findings regarding near-critical systems apply beyond spin dynamics on fixed lattices, to other types of processes on more heterogeneous and random structures.

III. DISCUSSION
In this work, we used information theory to characterize the structure-function relationship through mutual information. We showed how mutual information is a natural starting point to define both predictability and reconstructability in dynamics on networks, in turn showing how they are intrinsically unified. Our approach is quite general, allowing the exploration of different configurations of dynamics on networks of the form G → X, thus varying the nature of the process itself as well as the random graph on which it evolves. Our framework could be extended to adaptive systems [92-95], where X and G influence each other (i.e., X ↔ G). The relationship between X and G could also go the other way around: a system in which X generates a graph G (i.e., X → G). Hyperbolic graphs [96,97] fall into this category, where X represents a set of coordinates, and our framework could be extended to quantify the feasibility of network geometry inference [98-100].
We found efficient ways to estimate the mutual information numerically, thus allowing us to investigate relatively large systems. More work on this front is required, however, since the evaluation of these estimators remains quite computationally costly. It would be worth investigating simpler models, for which it is possible to evaluate U(X | G) and U(G | X) analytically, or at least approximately. In particular, dimension-reduction methods [11,13,101] and approximate master equations [102,103] are promising avenues for obtaining reliable approximations of these quantities. Central to our findings is the peculiar discovery that predictability and reconstructability are not only related, but sometimes dual to one another. We proved that such a θ-duality appears when the length of the process changes, and presented numerical evidence of duality near criticality in three different dynamics on random heterogeneous networks. These findings generalize and formalize, while being consistent with, previous works [75,76], and suggest that criticality in these systems is intrinsically related to the duality.
From a practical perspective, the existence of such a θ-duality can be critical to network modeling applications, since it also suggests a predictability-reconstructability trade-off. On the one hand, we can choose the parameter θ such that the uncertainty of the reconstructed structure is minimized, at the expense of having a less informative structure with respect to the dynamics. On the other hand, we can consider the reverse case, where the process is maximally influenced by the inferred structure, whose uncertainty is nevertheless not minimized. Analogous to the position-momentum duality of the Heisenberg uncertainty principle in quantum mechanics, the predictability-reconstructability duality must be accounted for in our network models if we are to disentangle complex systems.

ACKNOWLEDGMENTS
We are grateful to Guillaume St-Onge and Vincent Painchaud for useful comments, and to Simon Lizotte and François Thibault for their help in designing the software. This work was supported by the Fonds de recherche du Québec - Nature et technologies (VT, PD), the Conseil de recherches en sciences naturelles et en génie du Canada (CM, VT, AA, PD), and the Sentinelle Nord program of Université Laval, funded by the Fonds d'excellence en recherche Apogée Canada (CM, VT, AA, PD). We acknowledge Calcul Québec and Compute Canada for their technical support and computing infrastructures.

A. Formal definition of θ-duality
In what follows, we define the duality between predictability and reconstructability by taking a more general stance: instead of considering a stochastic process X evolving on a random graph G, we let X be conditioned on an arbitrary discrete random variable Y. First, we define the local duality of the uncertainty coefficients. The latter are considered as continuously differentiable functions of a parameter θ whose domain is some non-empty interval of the real line.
The definition of the θ-duality, a global property, follows that of the local duality.

Proof. Let θ_R and θ_P be the extremum points of U(Y | X) and U(X | Y), respectively. Thus

∂U(Y | X)/∂θ |_{θ_R} = 0    and    ∂U(X | Y)/∂θ |_{θ_P} = 0.

Suppose for a moment that θ_R < θ_P and let Θ = (θ_R, θ_P). This implies that ∂U(Y | X)/∂θ changes sign at θ_R, before ∂U(X | Y)/∂θ, whose sign change happens at θ_P. On the one hand, if the extremum points θ_R and θ_P are both maxima (or both minima), then ∂U(Y | X)/∂θ and ∂U(X | Y)/∂θ have different signs in Θ. Hence, inequality (8) is verified in this region. The uncertainty coefficients are therefore θ-dual in Θ.
On the other hand, if the uncertainty coefficients are θ-dual in Θ, then inequality (8) is satisfied in this interval. This in turn implies that, throughout Θ, either ∂U(Y | X)/∂θ > 0 > ∂U(X | Y)/∂θ or ∂U(Y | X)/∂θ < 0 < ∂U(X | Y)/∂θ. Therefore, the endpoints of Θ are either both maximum points or both minimum points.
Finally, repeating the same arguments with θ R > θ P and Θ = (θ P , θ R ) leads to the same conclusions about θ-duality of U (X | Y ) and U (Y | X) in Θ.

B. Universality of the T -duality
We demonstrate the universality of the T-duality, where T is the number of steps in the process X. First, we need to show that the mutual information is a monotonically increasing function of T.

Lemma 2. Let X = (X_1, X_2, ..., X_T) be a Markov chain of length T whose transition probabilities are conditioned on some discrete random variable Y that is independent of T, and such that H(X_{t+1} | X_t) > 0 for all t ∈ {1, ..., T − 1}. Suppose moreover that the state spaces of X and Y are finite. Then the mutual information I(X; Y) is nonzero and monotonically increasing with T ∈ Z_+.
Proof. Let us define a Markov chain X′ = (X_1, X_2, ..., X_{T−1}) of size T − 1, such that the concatenation of X′ with the state variable X_T yields X. Hence, we can express the mutual information between X and Y in terms of X′ as I(X; Y) = I(X′, X_T; Y). Furthermore, proving the monotonicity of the mutual information can be reformulated as proving the following inequality:

I(X′, X_T; Y) > I(X′; Y)    (10)

for all T. By the chain rule for conditional mutual information, that is, I(X′, X_T; Y) = I(X_T; Y | X′) + I(X′; Y), inequality (10) becomes

I(X_T; Y | X′) = H(X_T | X′) − H(X_T | X′, Y) > 0.    (11)

The term H(X_T | X′) − H(X_T | X′, Y) is always at least non-negative, by virtue of the non-negativity of mutual information [78, Theorem 2.6.5]. Then, to prove inequality (11), we must verify that this term cannot vanish, which would require either (i) H(X_T | X′) = 0, or (ii) H(X_T | X′) = H(X_T | X′, Y) > 0. According to the hypothesis H(X_{t+1} | X_t) > 0 for all t ∈ {1, ..., T − 1}, condition (i) cannot be true. Moreover, condition (ii) implies that I(X; Y) = I(X_T, X′; Y) = I(X′; Y) = 0. Therefore, the only instance where inequality (10) is not satisfied is when the Markov chain X is independent of Y, i.e., I(X; Y) = 0 for all lengths T. However, this contradicts the assumption about the transition probabilities. Hence, I(X; Y) > 0 and monotonically increases with T.
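The chain rule invoked above, I(X′, X_T; Y) = I(X_T; Y | X′) + I(X′; Y), can be verified numerically on any small joint distribution; a sketch with an arbitrary hand-built pmf over three binary variables:

```python
from math import log2

# An arbitrary joint pmf p(x_prev, x_T, y) over three binary variables.
p = {(0, 0, 0): 0.15, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.10,
     (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.30}

def H(dist):
    """Entropy in bits of a pmf given as a dict value -> probability."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)

def marginal(keep):
    """Marginal pmf over the coordinates listed in `keep`."""
    out = {}
    for k, q in p.items():
        kk = tuple(k[i] for i in keep)
        out[kk] = out.get(kk, 0.0) + q
    return out

# I(X', X_T; Y) = H(X', X_T) + H(Y) - H(X', X_T, Y)
I_joint = H(marginal((0, 1))) + H(marginal((2,))) - H(p)
# I(X'; Y) = H(X') + H(Y) - H(X', Y)
I_prev = H(marginal((0,))) + H(marginal((2,))) - H(marginal((0, 2)))
# I(X_T; Y | X') = H(X_T | X') - H(X_T | X', Y)
I_cond = H(marginal((0, 1))) + H(marginal((0, 2))) - H(p) - H(marginal((0,)))
print(I_joint, I_prev + I_cond)
```

The two printed values coincide, and I(X_T; Y | X′) is non-negative, as used in the proof.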
Before presenting the main result of this section, let us make a few remarks about the restrictions imposed in the last lemma. The condition H(X_{t+1} | X_t) > 0 for all t ∈ {1, . . . , T − 1} only asserts that the Markov chain is nondeterministic, in the sense that knowing the state of the chain at time t does not completely eliminate the uncertainty about the state at time t + 1. This condition is satisfied for a wide variety of stochastic processes, including irreducible Markov chains, where there is always a nonzero probability of transitioning from any state to any other state in a finite number of time steps. Moreover, the finiteness of the state spaces of the chain X and the variable Y is imposed to make H(X), H(Y), and I(X; Y) finite. This in turn ensures that the uncertainty coefficients U(Y | X) and U(X | Y) are well defined for all T ∈ Z+, a property that is necessary to prove the next lemma.
Lemma 3. Let X and Y respectively be a Markov chain and a discrete random variable as in Lemma 2. Then the uncertainty coefficients U(X | Y) and U(Y | X), interpreted as functions of T ∈ Z+, can be uniquely generalized to functions, respectively f(T) and g(T), that are holomorphic for all T ∈ C, and thus real analytic for all T ∈ R+. Moreover, H(X) can be extended to a function h(T) that is analytic for all T ∈ R+ except where f(T) = 0.
The functions f and g are thus holomorphic in the whole complex plane and bounded on the positive real axis. This allows us to use a special case of Carlson's theorem [106, Theorem 2.8.1], according to which holomorphic functions that are bounded on the positive real axis are uniquely defined by their values on the set Z+. Therefore, f is the unique extension of U(X | Y) that is holomorphic for all T ∈ C. Note that the restriction of f to the positive real axis is real analytic on this domain. Thus, there is a unique extension of U(X | Y) that is real analytic for all T ∈ R+ and that can be further extended to a holomorphic function for all T ∈ C. The same conclusion holds for g and U(Y | X).
To finish the proof, we need to tackle H(X). We cannot use the same strategy as above because H(X) is not a bounded function of T ∈ Z+. However, by definition, the identity

H(X) = U(Y | X) H(Y) / U(X | Y)   (12)

is valid whenever U(X | Y) > 0. Now, according to Lemma 2, I(X; Y) > 0 and hence U(X | Y) > 0 for all T ∈ Z+. This means that Eq. (12) is well defined for all T ∈ Z+. To extend the domain of validity of the identity, we use the analytic functions f and g introduced above and define a new function h as

h(T) := g(T) H(Y) / f(T).

The values of h coincide with those of H(X) for all T ∈ Z+, so that Eq. (12) defines a unique extension of H(X). Moreover, h is analytic for all T ∈ R+ except at the points T where f(T) = 0.
Lemma 3 ensures the existence of analytic extensions for the uncertainty coefficients, considered as functions of the positive integer T. These extensions can thus be evaluated and differentiated without restriction on the whole domain R+, which is a desirable property that will soon be exploited. However, the same lemma does not guarantee the monotonicity of the extensions on R+ in the event where they are monotone on Z+, although we will assume that it is the case from now on. This is a reasonable assumption, since numerical methods generalizing the well-known Fritsch-Butland algorithm [107] have recently been developed to construct smooth (i.e., at least continuously differentiable) and monotone interpolating functions from any finite monotone dataset [108, 109]. With this assumption in hand, together with Lemmas 2 and 3, we now proceed to prove our main theoretical result: the universality of the T-duality in Markov chains.
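For instance, a smooth monotone extension of the kind invoked here can be built with SciPy's `PchipInterpolator`, a shape-preserving scheme closely related to the Fritsch-Butland construction. This is a sketch on hypothetical monotone data, not the actual uncertainty coefficients of the main text.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# The uncertainty coefficient U(Y | X) is known only at integer T and is
# assumed monotone; PCHIP builds a C^1, monotonicity-preserving
# interpolant on the continuum, as required to differentiate w.r.t. T.
T = np.arange(1, 11)
U = 1.0 - 0.8 ** T            # hypothetical monotone data in [0, 1]

f = PchipInterpolator(T, U)   # smooth, shape-preserving extension
Tq = np.linspace(1, 10, 500)
vals = f(Tq)                  # monotone on the continuum, by construction
```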
Theorem 1. Let X and Y respectively be a Markov chain and a discrete random variable as in Lemma 2. Additionally, suppose that X has a finite nonzero entropy rate and that Y has a nonzero entropy. Then there exists a positive constant τ such that the uncertainty coefficients U(X | Y) and U(Y | X) are T-dual for all T ≥ τ.

Proof. By Lemma 3, the uncertainty coefficients U(X | Y) and U(Y | X), and H(X), which were originally defined as real functions of T ∈ Z+, have unique analytic extensions on the positive real axis, i.e., T ∈ R+. This allows us to treat U(X | Y), U(Y | X), and H(X) as continuously differentiable functions of T, where U(Y | X) = I(X; Y)/H(Y) and H(X) are also monotone. Now, by hypothesis, the entropy rate of the Markov chain X, R := lim_{T→∞} H(X)/T, is well defined and nonzero. Hence, H(X) ~ RT, i.e., H(X) is positive and asymptotically linearly increasing with T. Moreover, since Y is independent of T and I(X; Y) > 0, it follows from Lemma 2 that I(X; Y) is monotonically increasing with T. As a result, U(Y | X) = I(X; Y)/H(Y) is also monotonically increasing, since its denominator is independent of T by assumption. This translates into the strict inequality ∂U(Y | X)/∂T > 0. If there exists a T-duality, i.e., a domain of T where Eq. (8) holds, then U(X | Y) must be monotonically decreasing with T, that is, ∂U(X | Y)/∂T < 0 in that domain. To prove this, note that Eq. (12) relates the two uncertainty coefficients as U(X | Y) = U(Y | X) H(Y)/H(X). Differentiating with respect to T leads to the differential equation

∂U(X | Y)/∂T = [H(Y)/H(X)] ∂U(Y | X)/∂T − [U(Y | X) H(Y)/H(X)²] ∂H(X)/∂T,   (14)

where we used the fact that ∂H(Y)/∂T = 0. Hence, to show that U(X | Y) is monotonically decreasing with T, the following inequality must hold:

[1/U(Y | X)] ∂U(Y | X)/∂T < [1/H(X)] ∂H(X)/∂T.   (15)

Suppose for a moment that U(X | Y) is in fact increasing, so that Eq. (15) is false; this will eventually give rise to a contradiction. Let g(T) := U(Y | X) and h(T) := H(X) be continuous functions of T whose derivatives with respect to T are respectively given by g′(τ) := ∂g(T)/∂T |_{T=τ} and h′(τ) := ∂h(T)/∂T |_{T=τ}. If inequality (15) is false, then

g′(τ)/g(τ) ≥ h′(τ)/h(τ).   (16)

Using Grönwall's inequality [110, Theorem 1.2.1], we get

g(T) ≥ [g(T₁)/h(T₁)] h(T) for all T ≥ T₁,   (17)

for any reference point T₁. So far, we have established that h(T) = H(X) ~ RT and that U(Y | X) is monotonically increasing. We have also proved that if U(X | Y) is not monotonically decreasing with T, then inequality (17) is satisfied. However, the latter inequality and h(T) ~ RT readily imply that g(T) belongs to the class Ω(T), which is the set of all g(T) such that there exist positive constants, S and T*, for which g(T) ≥ ST for all T ≥ T* (i.e., Knuth's Big Omega [111]). Two cases must be considered. First, if ST* > 1, then g(T) ≥ ST* > 1 for all T ≥ T*, which contradicts the bound g(T) ≤ 1. Second, if ST* ≤ 1, we can choose T** > 1/S so that g(T) ≥ ST** > 1 for all T ≥ T**. This again contradicts the inequality g(T) ≤ 1 whenever T ≥ T**. As a result, inequality (17) cannot be satisfied when T ≥ τ, with τ = max{T*, T**}. We thus conclude that U(X | Y) is monotonically decreasing while U(Y | X) is monotonically increasing for all T ≥ τ, i.e., the uncertainty coefficients are T-dual for all T ≥ τ.

C. Past-dependent mutual information
FIG. 5. Information diagrams for the past-dependent information measures. On panel (a), we show the information diagram of the random variable triplet (X, Y, G), where X represents the past states, Y the future, and G the structure of the system. On panel (b), we highlight the quantities of interest for computing I(Y; G | X), that is, the graph (left) and state (right) entropies involved in Eq. (19), and the mutual informations (middle) in Eq. (18). Panels (c) and (d) show two extreme scenarios where the length of the past τ is small and large, respectively, which illustrates how the different information measures change with τ.

We present a generalization of the mutual information in which the Markov chain, hereafter denoted Z =
(X, Y), is partitioned into two parts, namely the past states X and the future states Y, both conditioned on a random graph G. The past X and the future Y are both Markov chains, of respective lengths τ and T − τ, where T is the complete length of Z. By separating the past from the future, we can define new information measures that are closer to more standard predictability measures [67, 81] interested in quantifying how knowledge about the past influences our capacity to predict the future. In this new scenario, we define the past-dependent mutual information as

I(Y; G | X) = I(X, Y; G) − I(X; G),   (18)

which is a conditional mutual information, where both I(X, Y; G) and I(X; G) can be expressed as before from Eq. (1). As illustrated by the information diagram of Fig. 5(a), we can expand this mutual information in two ways, using either the dynamical or the structural interpretation. These information measures are highlighted in Fig. 5(b). Similarly to Sec. II A, we then define the partial uncertainty coefficients, bounded between 0 and 1, U_X(Y | G) and U_X(G | Y), measuring the partial predictability of Y from G and the partial reconstructability of G given Y, respectively. The physical interpretation of the conditional mutual information I(Y; G | X) is very analogous to that of I(Z; G) presented in Sec. II A, but still demands further clarification. Indeed, it is still a measure of uncertainty reduction between a dynamics Y and a random graph G, but where the mutual information associated with the past states X has already been taken into account, as expressed by Eq. (18). Hence, we expect I(Y; G | X) to decrease when the length τ of the past chain X increases [see Figs. 5(c,d)]. In terms of the relationship between structure and dynamics, the interpretation of I(Y; G | X) is less straightforward. Indeed, the influence of G over Y can be reduced when X is given, because a fraction of the structural information is hidden in X. This can potentially be misleading, since it does not necessarily imply that the influence of G over the complete dynamics has been reduced. The behaviors of U_X(Y | G) and U_X(G | Y) can still result in θ-dualities, as shown below, but these dualities are harder to interpret because of the structural information hidden in X.
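The identity I(Y; G | X) = I(X, Y; G) − I(X; G) can be checked numerically on a miniature joint distribution. This is a sketch; the three binary variables and their coupling are hypothetical stand-ins for the graph and the dynamics.

```python
import itertools
import math

def H(p):
    """Entropy of a dict-encoded distribution, in nats."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

# Hypothetical 3-variable joint over tiny alphabets: G is a "structure",
# X the past state, Y the future state; Y correlates with x XOR g.
joint = {}
for g, x, y in itertools.product((0, 1), repeat=3):
    p_y = 0.8 if y == (x ^ g) else 0.2
    joint[(g, x, y)] = 0.5 * 0.5 * p_y

def marginal(keys):
    out = {}
    for (g, x, y), p in joint.items():
        k = tuple({'g': g, 'x': x, 'y': y}[c] for c in keys)
        out[k] = out.get(k, 0.0) + p
    return out

def I(a, b):
    """Mutual information between variable groups a and b."""
    return H(marginal(a)) + H(marginal(b)) - H(marginal(a + b))

# Past-dependent mutual information as a difference of mutual informations.
I_cond = I(['x', 'y'], ['g']) - I(['x'], ['g'])
```

Here X alone carries no information about G, yet the pair (X, Y) does, so the conditional mutual information is strictly positive.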
Using the partial uncertainty coefficients in Eqs. (20), the T-duality result generalizes as follows. Let Z = (X, Y) be a Markov chain partitioned into a past X and a future Y, of respective lengths τ and T − τ, such that Z is conditioned on a discrete random variable W. Then, there exists a function g(T) such that, if τ is dominated by g(T), the partial uncertainty coefficients U_X(Y | W) and U_X(W | Y) are T-dual, and they are not otherwise.

D. Estimators of the mutual information
The mutual information I(X; G) is generally intractable. Its intractability stems from the evaluation of the evidence probability,

P(X) = Σ_{G ∈ G_N} P(X | G) P(G).   (21)

Indeed, this sum potentially contains a number of terms that grows exponentially with the number of vertices N in the random graph. More specifically, the evidence probability appears in two entropy terms needed to compute the mutual information, namely the marginal entropy H(X) = ⟨− log P(X)⟩ and the reconstruction entropy H(G | X) = ⟨− log [P(G) P(X | G)/P(X)]⟩, where ⟨f(Y)⟩ denotes the expectation of f(Y). Fortunately, the evidence probability, and in turn the mutual information, can be estimated efficiently using Monte Carlo techniques, which we present in this section.

Graph enumeration approach
For sufficiently small random graphs (N ≤ 5), the evidence probability can be computed efficiently by enumerating all graphs of G_N and explicitly adding each term of Eq. (21). Then, we can estimate the mutual information by sampling M graph-state pairs, denoted (G*(m), X*(m)), and computing the arithmetic average

(1/M) Σ_{m=1}^{M} [ log P(X*(m) | G*(m)) − log P(X*(m)) ].

The error of this estimator scales as 1/√M. In Fig. 3, we used this estimator to compute the mutual information, with M = 1000.
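A minimal sketch of this enumeration-based estimator, with an abstract three-element "graph" space standing in for G_N and i.i.d. binary states standing in for the dynamics; all model choices here are illustrative assumptions.

```python
import math
import random

random.seed(0)

# Hypothetical miniature setup: 3 "graphs" with a uniform prior, each
# inducing a different Bernoulli parameter for T i.i.d. binary states.
GRAPHS = [0.2, 0.5, 0.8]          # P(x_t = 1 | G)
PRIOR = [1 / 3] * 3
T, M = 20, 2000

def log_likelihood(x, theta):
    k = sum(x)
    return k * math.log(theta) + (T - k) * math.log(1 - theta)

def log_evidence(x):
    # Eq. (21): explicit sum over the enumerated "graph" space.
    return math.log(sum(p * math.exp(log_likelihood(x, th))
                        for p, th in zip(PRIOR, GRAPHS)))

# Monte Carlo average over sampled (G*, X*) pairs drawn from the joint.
acc = 0.0
for _ in range(M):
    g = random.choices(range(3), weights=PRIOR)[0]
    x = [1 if random.random() < GRAPHS[g] else 0 for _ in range(T)]
    acc += log_likelihood(x, GRAPHS[g]) - log_evidence(x)
I_hat = acc / M   # estimate of I(X; G) in nats; error ~ 1/sqrt(M)
```

By construction each sampled term is at most log 3 = H(G), so the estimate respects the entropy bound on the mutual information.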

Variational mean-field approximation
In this approach, we estimate the posterior probability instead of the evidence probability. According to Bayes' theorem, the posterior probability is

P(G | X) = P(X | G) P(G) / P(X).

Behind this estimator is a variational mean-field (MF) approximation that assumes the conditional independence of the edges. For simple graphs, the MF posterior is

P_MF(G | X) = Π_{i<j} π_ij(X)^{A_ij} [1 − π_ij(X)]^{1 − A_ij},

where π_ij(X) := P(A_ij = 1 | X) is the marginal conditional probability of existence of the edge (i, j) given X.
For multigraphs, a similar expression can be obtained, but it instead involves the probability π_ij(ω | X) := P(M_ij = ω | X) that there are ω multiedges between i and j. In this case, the MF posterior becomes

P_MF(G | X) = Π_{i≤j} Π_ω π_ij(ω | X)^{δ_{M_ij, ω}},

where δ_{x,y} is the Kronecker delta. The MF approximation yields an upper bound on the true posterior entropy,

H(G | X) ≤ H_MF(G | X),   (26)

as a consequence of the conditional independence between the edges [78, Theorem 2.6.5]. Using the MF approximation and a strategy similar to the exact estimator, we compute the MF estimator of the mutual information by replacing the reconstruction entropy H(G | X) with its MF counterpart, estimated from the sampled pairs (G*(m), X*(m)). To compute P_MF(G*(m) | X*(m)), we sample a set Q(m) of Q graphs from the posterior distribution, and estimate π_ij(X) by maximum likelihood, i.e., by the fraction of times the edge (i, j) is seen in Q(m). An analogous maximum-likelihood estimate is made in the multigraph case, where π_ij(ω | X) is estimated from the number of times there were ω multiedges between i and j in Q(m). This estimator is a lower bound of the mutual information, a consequence of Eq. (26). Hence, it is biased, and the extent of this bias depends on the quality of the conditional independence assumption with respect to the true random graph. Note that the MF estimator can yield negative estimates of the mutual information (see Fig. 7).
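The maximum-likelihood step for the marginals π_ij can be sketched as follows, with synthetic posterior samples standing in for MCMC output; the edge marginals and sample sizes are assumptions.

```python
import math
import random

random.seed(1)

# Hypothetical posterior sample Q of simple graphs on N = 4 vertices,
# encoded as dicts edge -> 0/1, drawn here from known edge marginals
# as a stand-in for MCMC output.
N, Q = 4, 500
edges = [(i, j) for i in range(N) for j in range(i + 1, N)]
true_pi = {e: random.uniform(0.1, 0.9) for e in edges}
samples = [{e: int(random.random() < true_pi[e]) for e in edges}
           for _ in range(Q)]

# Maximum-likelihood estimate of the marginals pi_ij(X) from the sample:
# the fraction of sampled graphs containing each edge.
pi = {e: sum(s[e] for s in samples) / Q for e in edges}

def log_posterior_mf(G):
    """MF log-posterior: product over edges of Bernoulli factors."""
    lp = 0.0
    for e in edges:
        p = min(max(pi[e], 1e-9), 1 - 1e-9)   # guard against log(0)
        lp += math.log(p) if G[e] else math.log(1 - p)
    return lp
```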
In Figs. 7 and 4, we fix the number of graphs sampled from the posterior distribution to Q = 1000, and propose 5N moves between each sample, as also mentioned in App. IV E.

Annealed importance sampling
Whereas the MF estimator is a biased estimator of the posterior probability P(G | X), there exist other Markov chain Monte Carlo (MCMC) techniques that tackle the problem of estimating the evidence probability directly. The one we consider in this paper is an annealed importance sampling (AIS) procedure called the stepping-stone (SS) algorithm [112].
The stepping-stone algorithm takes advantage of the fact that it is possible to sample efficiently from the posterior distribution P(G | X) using MCMC (see Section IV E). In order to compute an accurate estimator of the evidence probability P(X), the procedure samples the space G_N according to P_β(G | X), where 0 ≤ β ≤ 1 is an inverse temperature parameter that dampens the influence of the likelihood:

P_β(G | X) ∝ P(X | G)^β P(G).

The inverse temperature basically allows the Markov chain to navigate G_N efficiently, so as to construct an accurate estimator of P(X) from graph samples that are neither all too close to nor too far from the maximum of the posterior. More specifically, the AIS estimator is defined by

P_AIS(X) = Π_{k=1}^{K} ⟨ P(X | G)^{β_k − β_{k−1}} ⟩_{β_{k−1}},

where 0 = β_0 < β_1 < · · · < β_K = 1 and ⟨·⟩_β denotes the expectation with respect to P_β(G | X). Taking the log of this equation gives us an estimator of the log-evidence probability, which we can use to compute the mutual information directly. Although the estimator of P_AIS is unbiased, the one for the log-evidence probability introduces a bias: ⟨log P(X)⟩ ≥ ⟨log P_AIS(X)⟩.
This bias can be arbitrarily reduced by increasing K [112], although we found that doing so provides diminishing returns. Using the AIS estimator of the evidence probability, we obtain an AIS estimator of the mutual information. Following Ref. [112], we use values of β_k distributed according to a beta distribution Beta(α, 1), i.e., β_k = (k/K)^{1/α}, such that α controls how skewed toward zero the sequence {β_k}_k is. For Fig. 7, we fix α = 0.5 and K = 20 and, for each value of β_k, we sample 1000 graphs from P_{β_k}(G | X*), proposing 5N moves between each sample (see Appendix IV E).
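The stepping-stone recursion can be sketched on a conjugate toy model where each tempered posterior can be sampled exactly, so the estimate can be compared to the closed-form evidence; the Beta-Bernoulli setup is an illustrative assumption, not the graph model of the main text.

```python
import math
import random

random.seed(2)

# Toy stepping-stone run: the tempered posterior P_beta(theta | X) of a
# Bernoulli likelihood with uniform prior is Beta(1 + b*k, 1 + b*(T-k)),
# standing in for the graph-space MCMC of the main text.
T, k = 30, 21                       # data: k successes in T trials
K, alpha, Q = 20, 0.5, 400
betas = [0.0] + [(j / K) ** (1 / alpha) for j in range(1, K + 1)]

def log_lik(theta):
    return k * math.log(theta) + (T - k) * math.log(1 - theta)

log_Z = 0.0
for b0, b1 in zip(betas, betas[1:]):
    # Exact draws from P_{b0}(theta | X) = Beta(1 + b0*k, 1 + b0*(T - k)).
    draws = [random.betavariate(1 + b0 * k, 1 + b0 * (T - k))
             for _ in range(Q)]
    # Ratio of normalizers via importance weights L(theta)^(b1 - b0).
    w = [math.exp((b1 - b0) * log_lik(th)) for th in draws]
    log_Z += math.log(sum(w) / Q)

# Exact log-evidence for comparison: log B(k+1, T-k+1) (uniform prior).
exact = (math.lgamma(k + 1) + math.lgamma(T - k + 1)
         - math.lgamma(T + 2))
```

The accumulated sum of log mean weights estimates the log-evidence, which should land close to the exact Beta-function value.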

Evaluation of the mutual information in large systems
Next, we evaluate the quality of each estimator on small and large systems. Figure 7(a) shows the behavior of I(X; G) for the Glauber dynamics on a small Erdős-Rényi random graph, as approximated using the MF and AIS estimators, and compares them to the exact evaluation based on the explicit graph enumeration used in Fig. 3. As expected, the two estimators provide a lower and an upper bound for I(X; G), and these bounds are fairly tight.
Several caveats are in order. On the one hand, the bias of the AIS estimator can, in principle, be reduced arbitrarily by increasing the number K of temperature steps, but its evaluation quickly becomes computationally costly. On the other hand, the evaluation of the MF estimator is comparatively quicker, but cannot be improved by further sampling. The AIS estimator is accordingly closer to the exact value throughout, but it can sometimes overestimate the mutual information above its upper bound, since H(X) is overestimated while H(X | G) is not. The MF estimator can also yield negative values of I(X; G) for small values of J, i.e., regimes where H(G | X) ≈ H(G), due to an overestimated H(G | X) becoming larger than H(G).
Figure 7(b) shows the same experiment as Fig. 7(a) but with larger graphs of N = 100 vertices, and leads to similar observations: the AIS estimator is always greater than the MF estimator, and both estimators sometimes yield approximate values of I(X; G) outside the valid range [0, max{H(G), H(X)}]. Interestingly, these bounds are nevertheless fairly close to one another, as in the case N = 5.

Biases of the uncertainty coefficients
When an estimation of the mutual information is biased, it necessarily follows that an estimation of the resulting uncertainty coefficients will also be biased. Fortunately, we can show that the direction of the bias does not change, either for the reconstructability U(G | X) or the predictability U(X | G). Suppose that I_ε = I(X; G)(1 + ε) is an estimator of the mutual information, where ε ∈ R is a small bias which can be either positive or negative. Then, the corresponding estimators of the uncertainty coefficients, which we denote P_ε and R_ε for the predictability and the reconstructability, respectively, are

P_ε = I_ε / [I_ε + H(X | G)]

and

R_ε = I_ε / H(G) = U(G | X)(1 + ε).

Note that we also suppose that H(G) and H(X | G) are not affected by the bias ε. For the first expression, we consider the first-order development of P_ε with respect to ε:

P_ε ≈ U(X | G) { 1 + [1 − U(X | G)] ε }.

Indeed, given that 0 ≤ U(X | G) ≤ 1, the leading bias term [1 − U(X | G)] ε must have the same sign as ε.
The second expression clearly shows that the bias of R_ε is exactly ε. Therefore, both P_ε and R_ε retain the direction of the bias of I_ε.
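A quick numerical check of this sign argument, using the definitions U(X | G) = I/H(X) and H(X) = I + H(X | G); the entropy values below are arbitrary placeholders.

```python
# Check that a relative bias eps on the mutual information propagates
# with the same sign to both uncertainty-coefficient estimators.
I, H_XgG, H_G = 0.7, 1.3, 2.0      # hypothetical values, in nats

def estimators(eps):
    I_eps = I * (1 + eps)
    P = I_eps / (I_eps + H_XgG)    # predictability; H(X | G) unaffected
    R = I_eps / H_G                # reconstructability
    return P, R

P0, R0 = estimators(0.0)           # unbiased reference values
```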

E. Markov chain Monte-Carlo algorithm
To sample from the posterior distribution, we use a Markov chain Monte Carlo (MCMC) algorithm where, starting from a graph G*, we propose a move, denoted Ḡ* ← G*, according to a proposition probability P(Ḡ* | G*), and accept it with the Metropolis-Hastings probability

min{ 1, Δ P(G* | Ḡ*) / P(Ḡ* | G*) },   (36)

where Δ = P(Ḡ*) P(X* | Ḡ*) / [P(G*) P(X* | G*)] is the ratio between the joint probabilities of the two graphs with X*. This ratio can be computed efficiently in O(T), by keeping in memory n_{i,t}, the number of inactive neighbors, and m_{i,t}, the number of active neighbors, of each vertex i at each time t (see Ref. [39]). Equation (36) allows us to sample from the posterior distribution P(G | X) without computing the intractable normalization constant P(X). We collect graph samples every Nδ moves, where we fix δ = 5 in all experiments.
We consider two types of random graphs with different constraints: the Erdős-Rényi model and the configuration model. Hence, we need two different sampling propositions for our MCMC algorithm, one for each model. We assume that the support of the Erdős-Rényi model is the set of all simple graphs of N vertices with E edges. In this case, we consider a hinge-flip move, where an edge (i, j) is sampled uniformly from the edge set of the graph G and a vertex k is sampled uniformly from its vertex set. Then, with probability 1/2, we rewire the edge (i, j) by selecting either i or j to connect with k. Note that, because we consider the support G_N of G to be a space of simple graphs, all moves resulting in the addition of a self-loop or a multiedge are rejected with probability 1. As a result, the proposition probability is the same for any move Ḡ* ← G*. For the configuration model, we assume that the support is the set of all loopy multigraphs of N vertices whose degree sequence is k. In this case, we propose double-edge-swap moves according to the prescription of Ref. [113], to which we refer for further details.
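A sketch of the hinge-flip sampler for the Erdős-Rényi support; the target distribution here is a hypothetical placeholder for the joint P(G) P(X* | G), not the actual dynamics likelihood.

```python
import math
import random

random.seed(3)

# Hinge-flip Metropolis-Hastings on simple graphs with N vertices and
# E edges. The target rewards the degree of vertex 0 (a stand-in only).
N, E, J = 6, 7, 0.8

def log_target(edges):
    return J * sum(1 for e in edges if 0 in e)

all_pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
edges = set(random.sample(all_pairs, E))        # initial simple graph

for _ in range(2000):
    i, j = random.choice(sorted(edges))          # edge, uniform
    k = random.randrange(N)                      # vertex, uniform
    keep = i if random.random() < 0.5 else j     # endpoint kept by hinge
    new = tuple(sorted((keep, k)))
    if keep == k or new in edges:                # self-loop or multiedge:
        continue                                 # reject with probability 1
    proposal = (edges - {(i, j)}) | {new}
    delta = log_target(proposal) - log_target(edges)
    # Proposal probability 1/(2EN) is identical both ways, so the
    # Metropolis-Hastings ratio reduces to the joint-probability ratio.
    if random.random() < min(1.0, math.exp(delta)):
        edges = proposal
```

Because rejected moves leave the graph unchanged, the chain stays in the simple-graph support with exactly E edges throughout.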

F. Numerical estimation of the phase transition thresholds
We evaluate the phase transition thresholds of each dynamics using standard finite-size scaling techniques and Monte Carlo simulations (see Fig. 8). For Glauber, an adequate order parameter to visualize the phase transition is the magnetization M := (1/N) Σ_i |2X_i − 1|, where the absolute value breaks the spin symmetry [84]. In this process, it is well known that the susceptibility of the order parameter M, given by

χ_M := N ( ⟨M²⟩ − ⟨M⟩² ),

diverges at the threshold J = J_c of the phase transition for infinite-size systems [84]. In finite systems, χ_M instead reaches a maximum at J = J_c. We use this fact to locate J_c and show the corresponding results in Fig. 8(a). For the SIS dynamics, a similar finite-size scaling analysis can be carried out, but a suitable order parameter is rather the average state ⟨X⟩ := (1/N) Σ_i X_i. We also use a definition of the susceptibility that is more convenient for spreading processes [19], given in terms of X,

χ_X := N ( ⟨X²⟩ − ⟨X⟩² ) / ⟨X⟩,

which also diverges at the phase transition threshold λ = λ_c for infinite-size systems. We show the results for SIS in Fig. 8(b). Finally, for the Cowan dynamics, we have a first-order phase transition characterized by a discontinuity of the order parameter ⟨X⟩ in the infinite-size limit, and a bistable region bounded by two thresholds ν_c^b < ν_c^f. To find these two thresholds, we evaluate the order parameter ⟨X⟩ for varying values of the parameter ν, and find the location where the discontinuity occurs. We obtain the forward and backward branches by using different initial conditions, where the system is respectively nearly inactive (with one active vertex) and completely active (with no inactive vertex).
For the Cowan dynamics, it is important to mention that, since we consider relatively small systems (N = 1000 vertices), the bistable region is not clearly defined. Hence, a system starting on the forward branch can jump to the backward branch with a nonzero probability. This is why the expected discontinuity at the threshold is, in fact, populated [see Fig. 8(c)]. This finite-size effect would be reduced by considering larger systems, but increasing N is unfortunately too computationally costly at the moment. Hence, to get a reasonable estimation of the thresholds in this scenario, we uniformly sample the set of ν's, compute ⟨X⟩ for all values of ν, and find the point ν* corresponding to the maximum gap between two consecutive points. Then, to increase the precision of this estimation, we zoom in on a region centered at ν* and repeat, until convergence. This method provides reasonably accurate estimates of the thresholds.
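The maximum-gap zooming procedure can be sketched as follows, with a synthetic noiseless order-parameter curve standing in for the Monte Carlo estimates; the jump location and functional form are assumptions.

```python
import math

# Hypothetical order parameter with a discontinuity at nu_c = 1.3,
# standing in for the Monte Carlo estimates of <X> in the Cowan dynamics.
NU_C = 1.3

def order_parameter(nu):
    return 0.05 if nu < NU_C else 0.7 + 0.1 * math.tanh(nu - NU_C)

lo, hi, n = 0.0, 3.0, 21
for _ in range(6):                       # zoom iterations
    nus = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    X = [order_parameter(nu) for nu in nus]
    gaps = [X[i + 1] - X[i] for i in range(n - 1)]
    imax = max(range(n - 1), key=lambda i: gaps[i])
    lo, hi = nus[imax], nus[imax + 1]    # zoom in on the largest gap
nu_star = 0.5 * (lo + hi)                # estimated threshold
```

Each zoom shrinks the bracketing interval by a factor 1/(n − 1), so a handful of iterations pins down the discontinuity to high precision.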

FIG. 1 .
FIG. 1. Information diagram of dynamics on random graphs. (a) Areas represent amounts of information: the entropies related to G are shown on the left in blue and those related to X on the right in orange. The mutual information, in red, corresponds to the information shared by G and X. (b) The highly predictable / weakly reconstructable scenario, where H(X) ≪ H(G), meaning that I(X; G) contains most of the information related to the dynamics, but only a small fraction of the information related to the graph. (c) The reverse scenario, i.e., highly reconstructable / weakly predictable, where H(G) ≪ H(X), meaning that I(X; G) contains most of the information related to the graph, but only a small fraction of the information related to the dynamics.

FIG. 2 .
FIG. 2. Comparison between the mean-field (MF) estimator of I(X; G) (red circles) and the average area under the ROC curve (AUC) of the correlation matrix method [30] (light blue squares), the Granger causality method [31] (green diamonds) and the transfer entropy method [32] (purple triangles), as a function of the coupling constant, for time series of length T = 100 generated with the Glauber dynamics on an Erdős-Rényi graph of size N = 100 with M = 250 edges. The mutual information is shown on the left axis and the AUC scores on the right axis. The theoretical maximum values of both measures are indicated by the vertical dashed line.

FIG. 4 .
FIG. 4. Dynamics evolving on configuration model graphs with geometric degree distribution: (a) Glauber dynamics, (b) SIS dynamics and (c) Cowan dynamics. We generated graphs of N = 1000 vertices, where the degree distribution ρ(k) = (1 − p)p^k is geometric with p = ⟨k⟩/(1 + ⟨k⟩) and ⟨k⟩ = 5, and time series of length T = 2000. See Table I for the remaining parameters. Similar to Fig. 3, U(G | X) is shown in blue (left axis) and U(X | G) in orange (right axis). We show, for each dynamics, the uncertainty coefficients as a function of the coupling parameter: J for Glauber, λ for SIS and ν for Cowan. The vertical dot-dashed lines correspond to the phase transition thresholds of each dynamics, estimated from Monte Carlo simulations (see Appendix IV F). For the Cowan dynamics, the forward and backward branches are shown with their corresponding thresholds and dual regions (see main text).

Definition 1 (Local duality). The uncertainty coefficients U(X | Y) and U(Y | X) are locally dual with respect to θ at θ = θ* if and only if their derivatives with respect to θ are nonzero and of opposite signs at θ*, i.e.,

∂U(X | Y)/∂θ |_{θ=θ*} · ∂U(Y | X)/∂θ |_{θ=θ*} < 0.

Definition 2 (θ-Duality). The uncertainty coefficients U(X | Y) and U(Y | X) are dual with respect to θ, or θ-dual, in the interval Θ if and only if they are locally dual for all values of θ* in Θ.
From these definitions, we relate the presence of extrema of U(X | Y) and U(Y | X) to the existence of a θ-duality.
Lemma 1. Let Θ be a non-empty subinterval of the variable θ such that one endpoint is a local extremum of U(X | Y) and the other a local extremum of U(Y | X). Moreover, suppose that U(X | Y) and U(Y | X) have no critical points in the interior of Θ. Then the extremum points delineate a region of θ-duality if and only if they are both maxima (or both minima).

Proof.
We first consider U(X | Y) and U(Y | X), which are defined in Eqs. (2a) and (2b). These can be interpreted as functions of T ∈ Z+ whose values belong to the interval [0, 1]. According to Guichard's theorem [104, Theorem 5.2.1] (see also [105, Theorem 15.13]), there exist two functions of z ∈ C, denoted f and g, that are holomorphic in the whole complex plane and whose values at z = T ∈ Z+ equal those of U(X | Y) and U(Y | X), respectively. Now, U(X | Y) and U(Y | X), and consequently f(z) and g(z), have bounded values for all z = T ∈ Z+. Moreover, f and g are holomorphic, so their restrictions to the axis z = T ∈ R are real analytic. Hence, on that axis, f and g are Lipschitz continuous, which means that there are positive and finite constants a and b such that |f(T) − f(T′)| ≤ a|T − T′| and |g(T) − g(T′)| ≤ b|T − T′| for all T, T′ ∈ R. Choosing T′ = T + ε with T ∈ Z+ and |ε| < 1, we conclude that f(T′) and g(T′) have finite values for all T′ ∈ R+.
the posterior distribution P(G | X*(m)). Then, we estimate the probabilities π_ij(X) by maximum likelihood, and the expectation is evaluated with respect to G*_k ∼ P_{β_k}(G | X*), for each k. Similarly to the mean-field estimator, we estimate this expectation by collecting a sample Q(m)_k of Q graphs distributed according to P_{β_k}(G | X*(m)), for each k.

FIG. 7 .
FIG. 7. Estimators of the mutual information in the Glauber dynamics on Erdős-Rényi graphs as a function of the normalized coupling parameter J k: (a) N = 5, E = 5 and T = 100; (b) N = 100, E = 250 and T = 1000. The solid line in (a) corresponds to the exact evaluation of I(X; G) and is the same as in Fig. 3(a). The circles and squares in both (a) and (b) represent the values of I(X; G) computed using the AIS and MF estimators, respectively. The dashed line indicates the upper bound of I(X; G), i.e., max{H(G), H(X)}. The gray area shows the admissible values of I(X; G) bounded by the biased MF and AIS estimators.

FIG. 8 .
FIG. 8. Numerical estimation of the phase transition thresholds: (a) Glauber dynamics, (b) SIS dynamics, (c) Cowan dynamics. For panels (a) and (b), the left axis (green) shows the order parameter (green circles), and the right axis (purple) shows the susceptibility (purple squares). For panel (c), only the order parameter is shown, but for both the forward (right triangles) and backward (left triangles) branches. The values of the thresholds are indicated by the vertical dashed lines. We used the same parameters as those of Fig. 4, but increased the number of steps to T = 10^4 to better sample the dynamics. Each marker has been averaged over 48 realizations.