Constructing minimal models for complex system dynamics

One of the strengths of statistical physics is the ability to reduce macroscopic observations into microscopic models, offering a mechanistic description of a system’s dynamics. This paradigm, rooted in Boltzmann’s gas theory, has found applications from magnetic phenomena to subcellular processes and epidemic spreading. Yet, each of these advances were the result of decades of meticulous model building and validation, which are impossible to replicate in most complex biological, social or technological systems that lack accurate microscopic models. Here we develop a method to infer the microscopic dynamics of a complex system from observations of its response to external perturbations, allowing us to construct the most general class of nonlinear pairwise dynamics that are guaranteed to recover the observed behaviour. The result, which we test against both numerical and empirical data, is an effective dynamic model that can predict the system’s behaviour and provide crucial insights into its inner workings. In statistical physics, the observable macroscopic behaviour of a system is obtained from a microscopic model of its components. Here, the authors extend this approach to systems with no known microscopic dynamics, by looking at the system’s response to external perturbations.

D espite the marked advances in mapping social, biological and technological networks [1][2][3][4][5] , our ability to predict and manipulate their behaviour is currently limited due to the absence of accurate microscopic models of their interaction dynamics. Indeed, the observed behaviour of complex systems is governed by the interplay between their topology and their dynamics 6,7 , prompting us to develop reliable methodologies for the construction of dynamic models for complex systems. The problem is that unlike physical systems, where the interactions are described in terms of a set of fundamental rules, such as the laws of electromagnetism driving the microscopic interactions between particles, for most complex systems such rules remain to be discovered, limiting our ability to rely on a theoretical understanding of the system's components when constructing the appropriate model. Here we develop a methodology to construct dynamic models directly from empirical observations, assuming minimal a priori knowledge about the system. Our goal is to directly translate these observations into a mechanistic model that accurately captures the interactions between the system's components, ultimately providing the system's equation of motion. Such models can provide both theoretical insights into the inner workings of the system and predictive power on its expected behaviour. Consider a complex system with N components (nodes), whose activities x i (t) (i ¼ 1,y,N) are driven by ordinary differential equations of the form Here M 0 (x i (t)) describes the self-dynamics of each component and M(x i (t), x j (t)) captures the impact of i's neighbour j on the state of i. The adjacency matrix, A ij , describes which components interact. With the appropriate choice of the nonlinear M 0 (x i (t)) and M(x i (t), x j (t)), equation (1) has been used before to describe the dynamics of a wide range of complex systems, like metabolic networks 8,9 , the spread of infectious disease on a social network 10,11 or interspecies interactions in ecological systems 12 , to mention only a few. Furthermore, for many biological 9,13,14 technological 15 or social 10,11,16 systems the interaction term factorizes as M(x i (t), x j (t)) ¼ M 1 (x i (t))M 2 (x j (t)), in which case the system's dynamics is uniquely characterized by three independent functions, together defining the system's model as a point in the model space M (Fig. 1a,b). For systems of unknown microscopic dynamics, the challenge is to infer the appropriate model by identifying M 0 (x), M 1 (x) and M 2 (x) that accurately describe the system's observable behaviour. (The treatment of more general systems, in which M(x i , x j ) cannot be factorized, is discussed in Supplementary Note 5). Traditionally, uncovering m involves three steps: (i) observation of the system's macroscopic behaviour X (ref. 17); (ii) inference of the microscopic model, m, from X ; (iii) validation, showing that m can predict the observed behaviour X . While this program has been successfully carried out for some well studied systems, the three steps above are difficult to replicate for complex systems: for instance, we lack general guidelines to choose the observation X and a methodology to reliably translate it into a microscopic model. Hence steps (i) and (ii) require either an intuitive leap to guess the right observation and its inferred m, or exogenous knowledge, such as a mechanistic understanding of the nature of the interactions between the system's components, knowledge we currently lack for many complex systems. Moreover, once m is found, step (iii) is insufficient to verify its uniqueness. Indeed, the inferred m could be one of a family of potential models that predict X , which, absent any additional knowledge or observations, are all equally likely candidates for the system. This implies that rather than a specific point in the model space, what the observation X truly allows us to infer is a broader subspace MðX Þ & M, comprising all models m that can be validated against X . A model not included in MðX Þ can be ruled out by X , however, all models that are included in this subspace are equally likely candidates and X alone cannot be used to settle between them. Hence our goal is to develop a general method to infer MðX Þ, relying on minimal a priori knowledge of the structure of M 0 (x), M 1 (x) and M 2 (x).
Our method provides a systematic formalism to treat all steps (i)-(iii) above for constructing m directly from empirical data. As our input observation X we use the system's response to external perturbations. This represents a common empirical exploration of complex systems, such as genetic perturbations in biology 18,19 , monitoring the impact of local failures in technological systems 20 or tracking the spread of information in social networks 21,22 . Our key result is linking the observed system response to the leading , Figure 1 | Reverse engineering the dynamics of a complex system. (a) The dynamics of the system is captured by the unknown nonlinear dynamic equation (1), which we attempt to reconstruct from empirical observations. (b) The dynamic equation is composed of three functions M 0 (x), M 1 (x) and M 2 (x), defining the system's model m as a point in the model space, M. (c,d) To find this point we measure the system's response to permanent perturbations, allowing us to extract a set of observable functions directly linked to m: (c) the transient response is characterized by the steady-state x i and the relaxation time t i , from which we extract the exponents x and y; (d) the asymptotic response allows us to measure six additional functions (see Table 1 terms of m, providing a direct formulation by which to translate X into an equation of the form (1). The inferred equation, reverse engineered directly from the data, does not provide a specific model m, but rather, defines the exact boundaries of MðX Þ, providing the most general class of dynamics that can be used to describe the observed system in light of X .

Results
From observation to inference and validation. To infer m we express its components in terms of a Hahn series 23 as which is a generalization of the Taylor expansion to include both negative and real powers. The powers P i (n) represent a wellordered set in ascending order with n, namely P i (0) represents the leading power in the expansion of M i (x) around x 0 , P i (1)4P i (0) is the next power and so on. For certain systems the functional form of (3)-(5) is known, and the challenge is to infer the specific coefficients A n , B n and C n , which capture the model parameters, such as the rate constants in (1; refs 24-26). In contrast, here we consider systems whose microscopic model itself is unknown. Our goal is thus to uncover the functional form of (3)-(5), which is captured by the powers P i (n) that participate in the expansion. Hence we do not focus here on distinctions such as M 2 (x) ¼ x 2 or M 2 (x) ¼ 2x 2 , a distinction regarding the coefficient C n , which describes the rate at which the interaction occurs. Rather, our focus here is on distinguishing between M 2 (x)Bx 2 versus, say, M 2 (x)B1-x À 2 , as the different powers expressed in the expansion, P 2 (n), capture different interaction mechanisms, and hence provide an insight into the fundamental characteristics of the system's dynamics.
To infer the leading powers of (3)-(5) we link them to a set of experimentally accessible observables, related to the system's response to external perturbations. These observables, in turn, allow us to recover the structure of (3)-(5), reverse engineering the dynamics (1) to its leading terms. To conduct the observation, first we subject the nodes to external perturbations: this entails permanently perturbing the steady-state activity, x j of node j, and capturing the response of all other node activities x i (t). This could be achieved either through a controlled experiment, as frequently done in genetic perturbations 18 , or by monitoring natural perturbations, like observing the spread of ideas or memes in a social network 21,22 , or cascading failures in technological systems 20 . The impact of these perturbations is captured by two quantities (observations): Transient response T . The temporal dynamics following a permanent perturbation is characterized by the time-dependent relaxation from the original steady state, x i , to the perturbed steady-state x i (t-N) ¼ x i þ dx i . By linearizing (1) around the steady state we find (Supplementary Note 1) where t i is node i's relaxation time and B i ðl ij ; tÞ accounts for the propagation of the permanent perturbation along a distance l ij from the source j to the observed node i.
Asymptotic response G. After relaxation (tcmax(t i )), the system reaches a new, permanently perturbed state, captured by the response matrix 6,7,19,27 which quantifies the response of node i to j's perturbation (Supplementary Note 2). By extracting a set of empirically observable exponents from the transient T ð Þ and the asymptotic G ð Þ responses, we can link x i (t) (6) and G ij (7) to m (2) via (Supplementary Note 3) where RðxÞ $ and x 0 and y 0 are arbitrary constants. Equations (8)-(11) take a set of observables (exponents) extracted from the two observations, T (6) and G (7), as input (see Table 1), and provide the leading powers in the expansions (3)-(5) as output. From T we can directly extract two exponents (Supplementary Note 1): (i) y captures the dependence of the relaxation time on the node's degree k i , which follows the scaling t i $ k y i and (ii) x describes the steady-state activity, which can either follow (Supplementary Note 1) , the dissipation rate that captures the behaviour of the distancedependent propagation function, G(l). This function describes the aggregated response of all nodes at distance l from a perturbation. For a system following (1) we have G(l) ¼ e À bal , where a ¼ ln(hk 2 i/hki À 1) is the expansion rate of the network. Note that to measure the observables used for the reconstruction (8)-(11) we must have access to the degrees k i of all the nodes, requiring us to know the network topology A ij . Later we show that even without access to A ij , a partial reconstruction of m is still possible.
The inference presented in (8)- (11) provides only the leading terms of M 0 (x), M 1 (x) and M 2 (x). This leaves a degree of freedom to add additional terms, as denoted by Oðx Y AE ðyÞ Þ, which involves all the terms of order higher (Y þ (y)) or lower (Y À (y)) than x y . Finally, in (11) we used the unsigned Y(y) ¼ Y sign(y) (y), which allows the inclusion of higher (lower) powers for y40 (yo0). Hence by measuring the three characteristic exponents of G (d, j and b), together with the two characteristic exponents of T (y and x), we can reconstruct m to its leading order terms, allowing us to write an effective continuum equation (1) for the system.
Equations (8)-(11) represent our main result, offering a formalism to systematically reconstruct the dynamical equation governing a complex system. As expected, they do not point to a specific model, m, but narrow down the space M of all the potential models into the minimal subspace MðG; T Þ of the models that are consistent with observations G and T (Fig. 1e). This subspace is robust to adding higher order terms and to parameter selection (that is, rate constants). Therefore it defines a minimal model, capturing the essential aspects of the mechanisms underlying the system's interactions. Most importantly, our formalism guarantees that all models included in MðG; T Þ will successfully validate against G and T , and conversely all models for which m= 2MðG; T Þ will not be consistent with observations G and T . Hence if and only if m 2 MðG; T Þ can m be validated against the experimentally measured x and y (T ), and d, j and b (G).
Inferring the dynamics of a model system. To illustrate the predictive power of the developed formalism we first apply it to gene regulation, where the interaction is captured by a Hill function as (refs 13,14,28) Here B is a rate constant, a represents the level of self-regulation and the Hill coefficient, h, describes the level of cooperativity in gene regulation. We set B ¼ 1, a ¼ 1/2 and h ¼ 1/3, hence the real dynamics (12) is captured by the model Next we assume that m Real is unknown and preform an in silico reconstruction using (8)- (10). To observe T and G we perturb each node around the steady state and numerically measure x i (t) (6) and G ij (7) (Supplementary Note 4). For T we find x ¼ 2.0±0.02 and y ¼ 1.0±0.01 (Fig. 2a,b); for G we observe Fig. 2c-e). Therefore where Z ¼ 0.50±0.01 and r ¼ 0.33±0.03. Hence the reverse engineered dynamical model for the system has the form where we only included the leading terms of (14). Equation (15) accurately recovers the self-dynamics and the interaction terms to leading order (Fig. 2f-h). Indeed, expanding M 2 (x) in (13) for large x leads to the inferred M 2 (x) in (15). Hence, the inferred M 2 (x) not only captures the qualitative form of the Hill function, that is, the saturation of the interaction term for large x (ref. 28), but also the precise form of that saturation as 1 À x À 1/3 þ y. This demonstrates that our formalism can correctly reverse engineer the system's microscopic dynamics directly from observing T and G.
The inferred m can be also used to predict a broader range of macroscopic functions of direct experimental interest. Consider for example the probability density P(G) that a response term G ij is between G and G þ dG. We can show that P(G)BG À n , where . Another quantity frequently observed in biological 18 , social 21,22,29 and technological 20,30,31 networks is the cascade size distribution, P(C), representing the probability that exactly C nodes exhibit a significant response (above a threshold) to a perturbation. This distribution is driven by the degree distribution through . Finally, the incoming cascade of node i, defined as the group of all nodes whose perturbation impacted i above a threshold, can be shown to follow Q i $ k s i where s ¼ (b À d)/(b þ 1). Hence, by recovering b, j and d, the inferred m is guaranteed to also predict P(G), P(C) and P(Q) (Supplementary Note 2).
Inferring the dynamics of empirical systems. In experimental settings, we rarely have access to all the components of G and T . In some cases we may lack access to the temporal dynamics, unable to measure x and y; in others we lack a map of the underlying network, missing k i . Fortunately, the three additional exponents n, o and s, associated with the cascades and with the distribution of terms in G ij (Table 1), provide an excess of experimentally accessible quantities to arrive at the original exponents required for the reconstruction (8)- (10). This redundancy enables us to obtain insights from partial observations as well. We illustrate this by inferring the dynamical model for two systems that span rather different domains of inquiry: cell biology and human activity.
Reverse-engineering subcellular dynamics. We demonstrate the utility of our formalism using results obtained from highthroughput gene perturbation experiments for S. cerevisiae. Here G ij measures the change in the expression levels of 6,222 yeast genes induced by 55 individual genetic perturbations 32 . As the data set is not time resolved, we lack access to T . We also lack an accurate map of the underlying regulatory/protein interaction network, hence we cannot directly measure d, j and b. Our method offers valuable insights even under these rather limiting circumstances. First, we measure the distribution of terms in G ij , which we find to follow P(G)BG À n where n ¼ 2.0 ± 0.1 (Fig. 3a). Using Table 1 this translates to b ¼ 0. Next we measure the distribution P(Q) of the incoming cascades of all the nodes (Fig. 3b). Its bounded nature implies that it is disconnected from the degree heterogeneity (P(k)) 33,34 , possible only if s ¼ 0,

Function
Characteristic exponent

Additional information
Parameter in equation (6) Relaxation Parameter in equation (6) Asymptotic response G To observe the system's dynamics through its transient (T ) and asymptotic (G) response to perturbations, we focus on five observables (xi, ti, Si, Ii and G(l)), each with its characteristic exponent (x, y, d, j and b). Additional functions can be derived from these five observables, such as the cascades (Ci), the incoming cascades (Qi) and the response distribution (P(G)), giving rise to additional exponents (o, s and n), offering a redundancy in the predictions that is useful when we explore incomplete experimental data.
where, for simplicity, we once again omitted the higher order terms. Equation (16) predicts a family of potential models, with an arbitrary M 1 (x) and Z and r, degrees of freedom originating in our partial coverage of the functions used as input in (8)- (10). Indeed, it is expected than the less specific is the observation X , the broader are the limits of the inferred subspace MðXÞ. Despite these degrees of freedom, (16) offers crucial insight into the biological mechanisms that underlie the observed dynamics, helping us distinguish between the two classes of dynamical processes that potentially drive the expression patterns of genes in the studied experiment. The first process is regulatory interactions (RIs), the mutual activation/inhibition of genes, in which the interaction term has the form of a switch-like function, for example, Hill function (12), saturating as M 2 (x-N)-1 (M 2 (x-N)-0), to describe the activation (inhibition) 13,14 (Figs 2h and 3c). A competing process is protein-protein interactions (PPIs), a biochemical mechanism in which proteins physically bind to each other. PPIs are expressed in (1) through mass action kinetics by non-saturating polynomial terms. Indeed, according to the law of mass action a physical binding interaction mA þ rB-AB contributes a term of the form x m A x r B to the relevant equations in (1) (refs 9,15,35,36). Hence the polynomial (non-saturating) form of M 0 (x) and M 2 (x) in (16) indicates that in this experiment the system's dynamics is dominated by biochemical interactions, such as protein binding, degradation and dimerization, rather than genetic regulation.
To directly test this prediction we used validated lists of 2,930 PPIs (ref. 34) and 1,079 RIs 37 . It is expected that a large response G ij indicates an increased probability of finding a direct i, j link. Hence we can use G ij to predict the already known PPI/RI links. The standard measures to evaluate such predictions are the areas under the receiver operating curve (AUROC) and under the precision recall curve (AUPR) 38 (Supplementary Note 6). Measuring these curves (Fig. 3d,e), we find that in this experiment G ij is indeed more predictive of PPIs than of RIs: for PPIs we obtain AUROC ¼ 0.580 (P value 0.03, Supplementary Note 6) and AUPR ¼ 1.8 Â 10 À 3 (P value 0.07), while for RIs we find AUROC ¼ 0.504, and AUPR ¼ 1.4 Â 10 À 3 , both not significantly better than random (P value B1). Hence, even though cellular dynamics is driven by a combinations of both RIs and PPIs, in this experiment G ij emphasizes PPIs significantly more than the RIs, supporting our conclusion that PPIs, driven by biochemistry, offer the dominant contribution to the experimentally observed G ij , in agreement with the inferred (16). This finding is, in fact, consistent with other studies indicating that PPIs play a significant role in shaping the profile of expression data [39][40][41] . Therefore the strength of our formalism is not only its ability to reconstruct the continuum model (16), but also its ability to detect the dominance of PPIs in this experiment using only the observed data G ij . Reverse-engineering human dynamics. While building continuum models for biological systems has a long tradition, the diversity of human interaction has lead to a paucity of continuum models capturing human dynamics 42,43 . Here we rely on a data set that captures B6 Â 10 4 exchanges between 1,899 users of an online instant messaging service (UCIonline) during a B7 month period (ref. 44). This allowed us to construct A ij by linking each user pair that exchanged at least one message during the documented period. As here we cannot conduct a controlled perturbation experiment, we rely on proxies that capture quantities associated with x i in (6) and G ij in (7). First we measure the number of messages sent by user i during a 3 hours interval, x i (t), to obtain its time-dependent activity. We take x i ¼ ox i (t)4 as a proxy for i's steady-state activity and G ij ¼ ox i x j 4=ox 2 i 4 as a proxy for (7) (Supplementary Note 7). We find x ¼ 1.23±0.03, d ¼ 0 and n ¼ 1.60±0.03, predicting b ¼ 0.67 (Table 1, Fig. 4a-c). Hence equations (8)- (10) predict that the continuum model capturing the dynamics of this social system has the form where Z ¼ 0.81, r ¼ 0.54 and M 1 (x) is an arbitrary function.
To understand the inferred dynamics, we expand M 1 (x i ) as in (4) and take its leading power to be m, namely P 1 (0) ¼ m. Hence the self-dynamic term in (17) i . Note that this term provides the node's dynamics in isolation, as in the absence of interacting partners equation (1) reduces to _ x i ðtÞ ¼ M 0 ðx i Þ. Here as we describe the message exchanges between linked individuals, an isolated node should become inactive, x i (t-N) ¼ 0, a condition satisfied if we set x 0 ¼ 0. Hence, taking only the leading terms of m Human we obtain the continuum equation To evaluate m we consider the workload W i of pending messages that have to be sent or replied and its impact on i's activity x i . When the workload W i is large, i experiences a significant pressure to respond, increasing its activity x i . Yet W i decreases with every email i sends, hence a highly active i will rapidly decrease its workload and activity. Equation (18) predicts that the workload should increase with i's activity as (Supplementary Note 7) where z ¼ 2-Z-m. To test the validity of this prediction we measured the incoming messages of all nodes as a proxy for their workload, finding that the predicted scaling (19) holds for over three orders of magnitude with z ¼ 0.73 ± 0.02, predicting m ¼ 0.46 (Fig. 4d). The second term on the right hand side of equation (18) describes the impact of the neighbours' activity x j on x i . An active neighbour j increases i's activity, prompting it to reply or forward its incoming messages. The saturating nature of M 2 (x), however, indicates that a neighbour's impact is bounded, reaching a maximum of y 0 . This agrees with our expectation that even an extremely active neighbour cannot drive its contacts to be active beyond their maximal capacity.
To validate (18) we used an independent data set, recording B3 Â 10 5 email exchanges between 2,688 users during a B6-month period (Epoch) (ref. 45). The exponents extracted from this data set are very close to those obtained from UCIonline: x ¼ 1.26±0.04, d ¼ 0 and n ¼ 1.57±0.05 (Fig. 4f-h). The workload (19), however, has z ¼ 0.35 ± 0.1, which significantly differs from that of UCIonline (Fig. 4i). Reverse engineering the dynamics from these exponents leads to (18), with Z ¼ 0.79, m ¼ 0.86 and r ¼ 0.60. This striking agreement between the structure of the two equations inferred from two independent data sets, indicates that the reverse engineered (18) captures the fundamental dynamical characteristics of human Protein dynamics Regulatory dynamics  Table 1 this predicts b ¼ 0. (b) The incoming cascade size distribution, P(Q), is bounded (see Supplementary Note 6), indicating that Q i is independent of k i ; consequently s ¼ 0, predicting also d ¼ 0 (Table 1). (c) As here we only have access to two functions from the observation G and no access to T a all, the model subspace, MðGÞ, that we infer includes a broad range of potential models, as indicated by the nonspecific form of (16). Still, the inferred model can help us distinguish between two competing processes in cellular dynamics, which occupy distinct areas in the model space M: RIs are characterized by M 2 (x), which saturates for large x (activation/inhibition 13,14 ); protein-protein interactions are captured by a non-saturating polynomial M 2 (x) (mass action 8,9 ). The inferred model (16) communications. The only notable difference between the two inferred models is in the value of m, which is higher in Epoch than in UCIonline. This parameter characterizes the correlation between a node's activity and its tendency to respond to incoming messages (Supplementary Note 7). Hence the greater value of m in Epoch suggests that the propensity to respond to incoming communications is more strongly dependent on the activity in email communication than in instant messaging. To test if this is indeed the case we measured the responsiveness, R i , of all nodes, defined as the average ratio between the incoming and outgoing traffic between i and its interacting partners. As m is greater for Epoch than UCIonline, we expect a stronger dependency in Epoch between R i and x i than in UCIonline. Indeed, as shown in Fig. 4e,j this prediction is valid: the scaling of R i with x i , is greater for Epoch (g ¼ 0.37) than for UCIonline (g ¼ 0.14). This dynamical distinction, correctly predicted by the reverse engineered equations, provides independent support for the predictive power of our formalism. The empirical results presented above demonstrate the practical applicability of our methodology, which is a result of the robust nature of its underlying observables. Indeed, characteristic exponents and scaling laws, the basis of our reverse-engineering formalism, are often universal 46 , and hence unaffected by the microscopic details of the system's topology and dynamics 7 . This allows us to reliably measure the observables even for real systems, which rarely satisfy all of the model assumptions, for example, they are subject to the effect of noise, both in their dynamics as well as in their topology. For example, many real networks feature degree correlations, which, strictly speaking, violate our formalism's predictions (Supplementary Note 1). Fortunately, the observed exponents are rather insensitive to such microscopic discrepancies, and can be accurately extracted from both model and empirical systems, even in the presence of degree correlations. To exemplify this we simulated the predicted human dynamics (18) on the empirical network of Email Epoch. Even though this network features rather strong degree correlations, we show in Supplementary Note 7 that the measured obsevables, that is, the exponents x, d and n, can be accurately fitted by the predictions of our formalism.

Discussion
In summary, the technological and experimental advances of recent years have offered a wealth of data, capturing the detailed node-level dynamics of biological, social and technological systems. It is difficult, however, to extract predictive power of these systems without a mechanistic model. Such models are rare for complex systems, however. Here we address this challenge as a reverse-engineering problem, showing that we can use the data to peek into the inner mechanisms of the system, providing an analytical microscope into the dynamics of a complex system. We tested our formalism under rather strict conditions, inferring m from scratch, relying only on the system's macroscopic behaviour (G and T ). A more realistic scenario, however, is to use the proposed method in conjunction with some prior knowledge about the system's microscopic dynamics. Often we seek a resolution between two or more competing models, as was the case in our inference of the cellular dynamics. If the two candidate models have a different functional form in (1), they will occupy distinct subspaces in M, and the more likely of the two can be decisively determined (Fig. 3c). In other cases, the inference can be supported by some a priori knowledge pertaining to the system's behaviour. For instance, in human dynamics we postulated, based on the nature of the observed system, that isolated individuals should become inactive. Coupled  (Table 1). (d) We approximate the workload W i of user i by measuring its incoming messages, finding that W i $ x z i with z ¼ 0.73. (e) The responsiveness, R i , denoting the ratio of outgoing to incoming messages in i's account is found to scale as R i $ x g i , with g ¼ 0.
14. (f-h) We measured the exponents x, d and n from an independent email data set (Epoch 45 ), finding that the two systems feature similar behaviour. (i) The workload, however, differs across the two data sets with z ¼ 0.35, predicting that in the the reverse engineered equation (18) the exponent m is greater in Epoch than in UCIonline. (j) As a consequence (18) predicts a stronger dependence between R i and x i for email communications compared with online messaging. Indeed we find that for Epoch g ¼ 0.37, greater than the value observed for UCIonline (0.14). This correct prediction of (18) provides independent support for its validity. All error bars represent 95% confidence intervals (Supplementary Note 4). NATURE COMMUNICATIONS | DOI: 10.1038/ncomms8186 ARTICLE with our formalism (8)- (10), this allowed us to complete the reconstruction of (18). Other practical considerations in reverse engineering are addressed in Supplementary Note 7.
Finally, the fact that our formalism infers a subspace MðX Þ, rather than a specific model m provides us with exact bounds on the predictive power of an observation. Indeed, it tells us that our observation, X , provides us with theoretical grounds to select m 2 MðX Þ over m 0 = 2MðXÞ. At the same time, however, our formalism shows that X cannot be used to discriminate between any set of models within MðX Þ, by that marking the theoretical limits on the specificity of the inferred model.