Main

The currently incomplete maps of molecular interactions between cellular components limit our understanding of the molecular mechanisms behind human disease1,2,3,4,5,6. Ultimately, high-throughput projects7,8,9,10 are expected to provide the accurate maps of interactomes necessary to systematically unlock disease mechanisms. Yet, as a complete interaction map is currently not at hand, we need to develop tools that allow us to infer the structure of cellular networks from empirically obtained biological data11,12. Many current tools designed to infer functional and physical interactions in the cell rely on the global response matrix,

$$G_{ij} = \frac{dx_i}{dx_j} \tag{1}$$

which captures the change in node i's activity in response to changes in node j's13. This matrix can be measured directly from gene knockout or overexpression experiments, or inferred indirectly using related measures such as Pearson or Spearman correlations14, mutual information15,16 or Granger causality17. Traditional methods for predicting links15,16,18,19 assume that the magnitude of Gij correlates with the likelihood of a direct functional or physical link between nodes i and j. Yet Gij cannot distinguish between direct and indirect relationships: a path i → k → j can result in a measurable response observed between i and j, falsely suggesting the existence of a direct link between them (Fig. 1a,b).

Figure 1: Silencing indirect links.

(a) The experimentally observed global response matrix, Gij, accounts for direct as well as indirect correlations, with no clear separation between them. The source of Gij could be gene coexpression data, statistical correlations or genetic perturbation experiments. (b) In the absence of a clear separation in Gij assigned to direct and indirect correlations, our ability to infer direct physical links (solid lines) is limited. Simple thresholding, that is, accepting all links for which Gij exceeds a predefined threshold, is known to predict spurious links (thick dashed lines) and overlook true links (thin solid lines). (c) Although the average Gij terms associated with direct links are higher than the average terms associated with indirect links, as captured by the discrimination ratio, ΔG, the difference is not sufficient to fully discriminate between direct and indirect links. (d) Silencing is achieved through equation (5), which exploits the flow of information in the network: the flow from the source (j) to the target (i) is carried through the indirect effect Gkj (brown) coupled with the direct impact Sik of the target's nearest neighbor k. By silencing the indirect contributions, equation (5) provides the local response matrix, Sij, whose nonzero elements correspond to direct links. (e,f) In Sij the terms associated with indirect links are silenced, allowing us to detect only the direct links of the underlying network. (g) As indirect terms become much smaller in Sij, we obtain a greater discrimination ratio, ΔS. (h) The degree of silencing, κ, captures the increase observed in the discrimination ratio by the transition from Gij to Sij (through equation (5)).

Several methods to correct for such effects have been proposed. Information theory approaches evaluate the association between nodes by measuring the entropy of their mutual activities, where a low entropy indicates a statistical dependence between the node activities16,18,20; probabilistic models, such as the graphical Gaussian model, allow one to evaluate the correlation between i and j, while controlling for the state of node k, and thereby provide a more indicative measure of direct linkage21,22,23,24,25; other models rely on assumptions pertaining to the network topology, such as the tendency of real networks to exhibit strong degree correlations26. The ultimate solution, however, should enable us to fully unwind the direct from the indirect effects, providing a measure that distinctly indicates the existence of direct links. Consequently, we focus here on the local response matrix

$$S_{ij} = \frac{\partial x_i}{\partial x_j} \tag{2}$$

in which the contribution of indirect effects is eliminated. In contrast with equation (1), which allows for global changes in i and j's environment, here the “∂” indicates that Sij is defined to capture only local effects, namely the response of i to changes in j when all surrounding nodes except i and j remain unchanged. Hence Sij > 0 implies a direct link between i and j.
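As a minimal illustration of the difference between the global and the local response, consider a three-node cascade j → k → i with no other links. This toy configuration is an assumption made here for illustration, not an example taken from the data. Writing the local response as a partial derivative and applying the chain rule gives:

```latex
\begin{aligned}
G_{kj} &= \frac{dx_k}{dx_j} = S_{kj}, \\
G_{ij} &= \frac{dx_i}{dx_j}
        = \frac{\partial x_i}{\partial x_k}\,\frac{dx_k}{dx_j}
        = S_{ik}\,S_{kj} \neq 0, \\
S_{ij} &= \frac{\partial x_i}{\partial x_j} = 0 .
\end{aligned}
```

The global response between i and j is nonzero although no direct link exists, whereas the local response vanishes.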

We derive a method for calculating the local response matrix (2) from experimentally accessible correlation measures, allowing us to mathematically discriminate direct from indirect links (Fig. 1). We show that the resulting Sij matrix, in which the contribution of indirect paths is silenced, is more discriminative than the empirically obtained Gij matrix, enhancing our ability to extract direct links from experimentally collected correlation data.

Results

The silencing method

To extract Sij from the experimentally accessible Gij, we formally link equations (1) and (2) via

$$G_{ij} = \sum_{k=1}^{N} S_{ik} G_{kj}, \qquad i \neq j \tag{3}$$

Equation (3) is exact and the sum accounts for all network paths connecting i and j (Supplementary Note, I.1–2). It is of limited use, however, as it requires us to solve N² coupled algebraic equations. In Supplementary Note, I.1, we show that equation (3) can be reformulated as

$$S = \left( G - I + D(S \cdot G) \right) G^{-1} \tag{4}$$

where I is the identity matrix and D(M) sets the off-diagonal terms of M to zero. To obtain an approximate solution for S, we use the fact that, typically, perturbations decay rapidly as they propagate through the network, so that the response observed between two nodes is dominated by the shortest path between them. This allows us to approximate D(S · G) with D((G − I)G) (Supplementary Note, I.3), obtaining

$$S = \left( G - I + D\big( (G - I)\,G \big) \right) G^{-1} \tag{5}$$

Equation (5), our main result, provides Sij from the experimentally accessible Gij. It achieves this through a 'silencing effect', in which direct response terms are preserved, whereas indirect responses are silenced. To understand this, consider a specific term in Gij, documenting the response of node i to j's perturbation. As indicated by equation (3), this response is a consequence of all direct and indirect paths leading from j to i. As we document below, the transformation (5) detects the indirect paths and silences them, maintaining only the contribution of the direct paths (Fig. 1d–f). An alternative method to approximate D(S · G) in equation (4) is to use an iterative scheme, in which Sij is evaluated first via equation (5) and then used as input in equation (4), repeating the process until sufficient accuracy is achieved (Supplementary Note, I.1).
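The transformation described above, S = (G − I + D((G − I)G))G⁻¹, can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function name `silence` and the toy cascade used as input are assumptions introduced here, and G is assumed to be square and invertible.

```python
import numpy as np

def silence(G):
    """Approximate the local response matrix S from a square, invertible G."""
    G = np.asarray(G, dtype=float)
    I = np.eye(G.shape[0])
    # D(.) keeps only the diagonal of its argument.
    D = np.diag(np.diag((G - I) @ G))
    numerator = G - I + D
    # S = numerator @ inv(G), computed by solving G^T S^T = numerator^T.
    return np.linalg.solve(G.T, numerator.T).T

# Toy input (assumption): linear cascade 0 -> 1 -> 2 -> 3, local response 0.5 per step.
S_true = 0.5 * np.eye(4, k=-1)
G = np.linalg.inv(np.eye(4) - S_true)  # includes indirect terms such as G[2, 0]
S = silence(G)
```

For this loop-free toy cascade the diagonal correction vanishes, so the indirect terms of G are removed and only the direct links remain in S. Solving a linear system instead of forming the inverse explicitly is a numerical design choice, not something prescribed by the text.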

Silencing in model systems

To demonstrate the predictive power of equation (5), we implemented Michaelis-Menten dynamics on a model network (Supplementary Note, III), as commonly used to model gene regulation27,28. We obtained Gij by perturbing the activity of each node and then calculated Sij using equation (5). Figure 2a shows the Gij and Sij terms associated with interacting and noninteracting node pairs. Although Gij is higher for direct interactions, the overlap between the orange and the green symbols indicates a lack of a clear threshold q that separates direct and indirect interactions. In contrast, Sij displays a clear separation between direct and indirect interactions, accurately predicting each direct link. Indeed, the receiver operating characteristic (ROC) curve derived from Gij (Fig. 2b) has an area of AUROC = 0.91, reflecting inherent limitations in separating direct from indirect interactions based on Gij only. In contrast, for Sij we obtain AUROC = 0.997 (blue), where the true-positive rate reaches 100% with a false-positive rate of <10⁻³. Also, as opposed to Gij, for which precision increases gradually with the threshold q (Fig. 2c), Sij's precision jumps to 1 for q > 10⁻⁴. Hence, in our well-controlled model system, any nonzero Sij corresponds effectively to a direct link.
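The two evaluation measures used above, the area under the ROC curve and the precision at a threshold q, can be computed from a list of pair scores (Gij or Sij terms) and a list of true link labels. The following is a minimal sketch, not the evaluation code behind Figure 2; the function names and the four-pair toy data are assumptions.

```python
import numpy as np

def auroc(scores, labels):
    """Probability that a true link outscores a non-link (ties count as 1/2)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # every positive against every negative
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def precision_at(scores, labels, q):
    """Fraction of predicted links (score > q) that are true links."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    predicted = scores > q
    if not predicted.any():
        return float("nan")  # no link is predicted at this threshold
    return float(labels[predicted].mean())

# Toy scores and ground truth (hypothetical values).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
area = auroc(scores, labels)
precision = precision_at(scores, labels, q=0.5)
```

The pairwise comparison uses memory proportional to the number of positive pairs times the number of negative pairs, so it suits small examples; a rank-based formula would be preferable for a full-size network.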

Figure 2: Network inference in model systems.

We numerically simulated Michaelis-Menten dynamics on a scale-free network (refs. 40,41,42), extracting the correlations Gij between all pairs of nodes (Supplementary Note, III). (a) Gij and Sij associated with interacting and noninteracting node pairs. Sij silences the correlations associated with indirect interactions, resulting in a clear separation between direct and indirect interactions, a phenomenon absent from Gij. (b) ROC curve obtained from Gij (red, AUROC = 0.91) and Sij (blue, AUROC = 0.997). The Sij network reaches 100% accuracy with a negligible amount of false positives. TPR, true-positive rate; FPR, false-positive rate. (c) Precision obtained for threshold q for Gij and Sij. The gradual rise of the Gij-based precision indicates that for a broad range of thresholds only a small fraction of the links will be identified. In contrast, the steep rise in precision for Sij indicates its enhanced discriminative power between direct and indirect links; virtually any nonzero Sij corresponds to a directly interacting pair. (d) The discrimination ratio, Δ, is much higher in Sij compared to Gij. This indicates that Sij is a much better predictor of direct versus indirect interactions. The silencing metric (6), which captures the increase in the discrimination ratio, is κ = 15.0. (e) Silencing increases with the path length dij between i and j, so that the more indirect the link, the more dramatic the silencing. (f) The source of Sij's success is the silencing effect, here illustrated on correlations measured for a linear cascade. The reconstruction of the cascade from Gij is confounded by numerous nonvanishing indirect correlations. In Sij the indirect correlations are silenced, providing a perfect reconstruction.

The performance of equation (5) is due to the silencing effect. It leaves Gij unchanged if i and j are linked, whereas it systematically lowers all Gij not rooted in a direct interaction. To quantify this effect we measured the discrimination ratio ΔG = 〈Gij〉Dir/〈Gij〉Indir (ΔS = 〈Sij〉Dir/〈Sij〉Indir), which captures the ratio between Gij (Sij) terms associated with direct links and those associated with indirect links. We find that Sij is much more discriminative than Gij owing to its silencing of indirect responses. This effect can be quantitatively measured through the silencing metric

$$\kappa = \frac{\Delta S}{\Delta G} \tag{6}$$

which captures the increased power of Sij to discriminate between direct and indirect links compared to Gij (Fig. 1h). In our model system we find that κ = 15, a silencing of more than an order of magnitude (Fig. 2d). Furthermore, the longer the distance dij between two nodes, the stronger the silencing (Fig. 2e). As an illustration, consider a linear cascade in which changes in any node result in a finite response Gij by all other nodes (Fig. 2f). Equation (5) silences all indirect responses, while leaving the response of direct links effectively unchanged, offering a discriminative measure that enables a perfect reconstruction of the original network.
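The discrimination ratio and the silencing metric can be sketched as follows, taking κ as the ratio of the discrimination ratio of Sij to that of Gij. The function names, the boolean mask `direct` marking known direct links, and the numerical values of the toy matrices are assumptions made for illustration; the averages are taken over off-diagonal terms of square matrices.

```python
import numpy as np

def discrimination_ratio(M, direct):
    """Mean of terms on direct links divided by mean of terms on indirect pairs."""
    M = np.asarray(M, dtype=float)
    direct = np.asarray(direct, dtype=bool)
    off_diag = ~np.eye(M.shape[0], dtype=bool)  # self-pairs are excluded
    return M[direct & off_diag].mean() / M[~direct & off_diag].mean()

def silencing(G, S, direct):
    """Increase in the discrimination ratio in the transition from G to S."""
    return discrimination_ratio(S, direct) / discrimination_ratio(G, direct)

# Toy three-node cascade 0 -> 1 -> 2 (hypothetical numbers).
direct = np.zeros((3, 3), dtype=bool)
direct[1, 0] = direct[2, 1] = True
G = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.25, 0.5, 1.0]])
S = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.05, 0.5, 0.0]])
delta_G = discrimination_ratio(G, direct)
kappa = silencing(G, S, direct)
```

In this toy case the direct terms are identical in both matrices, so the whole gain comes from the reduction of the single indirect term.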

Predicting molecular interactions in E. coli

To test the predictive power of equation (5) on real data, we used the E. coli data sets distributed by the DREAM5 network inference challenge19. The input data include a compendium of microarray experiments measuring the expression levels of 4,511 E. coli genes (141 of which are known transcription factors) under 805 different experimental conditions (Supplementary Note, IV.1). We constructed three separate global response matrices Gij between the 141 transcription factors and their 4,511 potential target genes, based on (i) Pearson correlations, (ii) Spearman rank correlations and (iii) mutual information, which are three commonly used methods for link detection (Supplementary Note, IV.3). From each of the three Gij matrices, we obtained Sij via equation (5), and compared the performance of Gij with the pertinent Sij. To validate our predictions we relied on the gold standard used in the DREAM5 challenge, consisting of 2,066 established gene regulatory interactions. Measuring AUROC from Gij and Sij, we found an improvement of 56% for Pearson correlations (Fig. 3a), 67% for Spearman rank correlations (Fig. 3b) and a smaller improvement of 6% for mutual information (Fig. 3c), allowing us to improve upon the top-performing inference methods19.
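Two of the three global response matrices mentioned above, those based on Pearson and Spearman correlations, can be built from an expression table as sketched below. This is an illustrative sketch, not the DREAM5 pipeline: it assumes a hypothetical array `X` with one row per experimental condition and one column per gene, it correlates all columns with each other rather than transcription factors with targets, and the rank transform does not average tied values.

```python
import numpy as np

def pearson_matrix(X):
    """Genes x genes Pearson correlation matrix from a conditions x genes table."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)

def spearman_matrix(X):
    """Spearman correlations: Pearson correlations of per-gene ranks (ties not averaged)."""
    X = np.asarray(X, dtype=float)
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

# Hypothetical data: 50 conditions, 4 genes; gene 1 is a monotone function of gene 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = np.exp(X[:, 0])
G_pearson = pearson_matrix(X)
G_spearman = spearman_matrix(X)
```

Because the rank transform is invariant under monotone maps, the Spearman term between genes 0 and 1 equals 1, while the Pearson term is lower.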

Figure 3: Inferring regulatory interactions in E. coli.

(a) Starting from gene expression data, we used Pearson correlations in expression patterns to construct Gij for 4,511 E. coli genes, obtaining Sij via equation (5). We compared our predictions to a gold standard of experimentally verified genetic regulatory links19. The area under the ROC curve (AUROC) is increased from 0.59 to 0.64 in the transition from Gij to Sij, representing a 56% improvement (above the baseline of 0.5 for a random guess). TPR, true-positive rate; FPR, false-positive rate. (b) An improvement of 67% is observed for Spearman rank correlations. (c) A less dramatic improvement of 6% is shown when Gij is constructed using mutual information (MI). (d) The discrimination ratio for all three methods compared with that obtained from the pertinent Sij matrix. The transition to Sij increases the discrimination between direct and indirect interactions by a factor of 2 or more, so that indirect interactions have a considerably lower expression in Sij. (e,f) This observation becomes even more dramatic when focusing on two specific motifs: cascades and co-regulators. In Gij the indirect correlation between X and Y, which is induced by the intermediate node, I, may lead to the false prediction of the spurious X-Y link. Thanks to silencing, the discrimination between the direct and indirect links in these motifs is increased by a factor of 3 or more for Pearson and Spearman correlations, and by a factor of 2 for mutual information.
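The 56% figure quoted in panel (a) follows from measuring each AUROC relative to the random-guess baseline of 0.5. Using the rounded values given in the caption:

```latex
\frac{\mathrm{AUROC}_S - 0.5}{\mathrm{AUROC}_G - 0.5} - 1
  = \frac{0.64 - 0.5}{0.59 - 0.5} - 1
  = \frac{0.14}{0.09} - 1
  \approx 0.56
```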

We further tested the discrimination ratio, Δ, and the silencing, κ, for each of these methods, finding that indirect correlations are subject to an average of twofold silencing in the transition from Gij to Sij (Fig. 3d). Silencing is especially crucial in the presence of the cascade and co-regulation motifs (Fig. 3e,f), where most inference methods indicate a spurious link between X and Y, owing to the indirect correlation mediated by node I. Indeed, the transformation (5) silences these indirect correlations by a factor of three or more for Pearson and Spearman correlations and by a smaller factor (1.6 or 2.1) for mutual information, overcoming one of the most common hurdles of inference methods, which tend to over-represent triadic motifs19.

The effects of noise and uncertainty

As all experimental data are subject to noise, the global response matrix, Gij, is characterized by some degree of uncertainty. To test the performance of our methodology in the presence of noise, we repeated the numerical experiment of Figure 2, this time adding Gaussian noise to Gij, which allows us to calculate silencing as a function of the noise-to-signal ratio θ (Fig. 4). As expected, silencing is unaffected by small values of θ, so that κ features a plateau below θ ≈ 0.1. For large θ, silencing decays as κ ∼ θ⁻¹, demonstrating that the performance of the method decreases slowly as the noise level is increased. Indeed, as opposed to a rapid exponential decay, the observed, slower, power-law dependence indicates that the method is rather tolerant to noise. Silencing is lost only when the noise reaches the critical level θC ≈ 0.75, when the signal is almost completely overridden by noise, leading to κ = 1 (Supplementary Note, V.1).
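The noise experiment can be sketched as follows. The precise definition of θ is given in the Supplementary Note and is not reproduced here, so this sketch makes an assumption: the noise standard deviation is set to θ times the mean absolute off-diagonal term of Gij, and the diagonal is left untouched. The function name `add_noise` and the toy matrix are hypothetical.

```python
import numpy as np

def add_noise(G, theta, rng):
    """Add zero-mean Gaussian noise to the off-diagonal terms of G.

    Assumption: noise std = theta * mean |off-diagonal G|.
    """
    G = np.asarray(G, dtype=float)
    off_diag = ~np.eye(G.shape[0], dtype=bool)
    signal = np.abs(G[off_diag]).mean()
    noisy = G + rng.normal(0.0, 1.0, size=G.shape) * theta * signal
    np.fill_diagonal(noisy, np.diag(G))  # keep the self-response terms unchanged
    return noisy

rng = np.random.default_rng(1)
G = np.array([[1.0, 0.2, 0.1], [0.5, 1.0, 0.3], [0.25, 0.5, 1.0]])
G_clean = add_noise(G, 0.0, rng)   # theta = 0 returns G unchanged
G_noisy = add_noise(G, 0.5, rng)
```

Sweeping θ, applying equation (5) to each noisy matrix and recomputing κ would trace a curve of the kind described in the text.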

Figure 4: Silencing in a noisy environment.

To test the method's performance in the presence of a noisy input we added Gaussian noise to the numerically obtained Gij, and measured the silencing, κ, versus the noise-to-signal ratio θ. For low noise levels (θ ≲ 0.1), silencing is relatively unharmed. At higher noise levels, silencing decreases as κ ∼ θ⁻¹, a slow decay that supports the robustness of the method. Silencing is lost at θC ≈ 0.75, when the signal is almost fully driven by the noise.

Hidden nodes offer another source of uncertainty. They represent the fact that in most cases we are unable to read the states of all nodes in the system29. To illustrate the effect of the hidden nodes on the performance of the silencing method, we consider the case of a simple cascade i → k → j, where the intermediate node k is hidden. In this scenario, equation (5) will not be able to silence the indirect i → j link, because in the observable system, the Gij term cannot be attributed to any indirect path. Hence, absent any other information about the system, it is mathematically impossible to infer the indirectness of Gij, as the removal of k isolates i from j30. This touches upon the fundamental mechanism of silencing: the silencing transformation (5) exploits the flow of information through indirect paths (Fig. 1 and Supplementary Note, I.2). Consequently, if as a result of hidden nodes, the network fragments into several components such that the node pair i and j become isolated from each other, then all indirect paths between them become hidden and the pertinent Gij term will not be silenced (Fig. 5a,b). Hence silencing is expected to fail only when the network breaks into many isolated components so that most node pairs become isolated. Fortunately, a fundamental property of complex networks is that with average degree 〈k〉 > 1, one needs to remove a large fraction of the nodes to fragment the underlying giant connected component31,32,33,34. Therefore we can build on percolation theory, which allows us to analytically predict how the size of the largest connected component changes with the random removal of a certain fraction of nodes35,36. The calculation shows that silencing is maintained as long as the fraction of hidden nodes is smaller than

where (Supplementary Note, V.2). This equation indicates that for large 〈k〉 the method will be reliable even if a large fraction of the nodes are hidden.
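The hidden-cascade argument made above can be checked numerically. The sketch below assumes a toy linear cascade 0 → 1 → 2 with a local response of 0.5 per step, and a helper `silence` that applies the transformation S = (G − I + D((G − I)G))G⁻¹; both the helper name and the numbers are assumptions made for illustration.

```python
import numpy as np

def silence(G):
    """Apply S = (G - I + D((G - I) G)) G^{-1} to a square, invertible G."""
    G = np.asarray(G, dtype=float)
    I = np.eye(G.shape[0])
    numerator = G - I + np.diag(np.diag((G - I) @ G))
    return np.linalg.solve(G.T, numerator.T).T

# Toy cascade 0 -> 1 -> 2 (assumption): rows are targets, columns are sources.
a = 0.5
S_true = np.array([[0.0, 0.0, 0.0], [a, 0.0, 0.0], [0.0, a, 0.0]])
G_full = np.linalg.inv(np.eye(3) - S_true)

# All nodes observed: the indirect term between nodes 0 and 2 is silenced.
S_full = silence(G_full)

# Intermediate node 1 hidden: only nodes 0 and 2 remain in the input.
observed = [0, 2]
G_obs = G_full[np.ix_(observed, observed)]
S_obs = silence(G_obs)
```

With all three nodes observed, the term linking nodes 0 and 2 drops to zero. Once node 1 is hidden, the same term passes through unchanged, because no observable indirect path accounts for it.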

To test this prediction, we revisited the numerically obtained Gij analyzed in Figure 2 and measured the degree of silencing after randomly removing an increasing fraction of nodes. In each case we also measured the ratio between isolated and connected node pairs (ρ). We found that, as predicted, the degree of silencing is driven mainly by ρ, approaching κ ≈ 1 (no silencing) when ρ ≥ 1, namely, when the isolated pairs begin to dominate the network (Fig. 5c). Here as 〈k〉 = 4, equation (7) predicts ηC ≈ 0.57, that is, the method will fail only when almost 60% of the nodes are hidden. Note that for biological networks, 〈k〉 is expected to be in the range of 〈k〉 ≈ 10 (ref. 37), predicting ηC ≈ 0.8. Namely, one needs to lose access to 80% of the nodes for silencing to lose its effectiveness.
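The ratio ρ between isolated and connected node pairs can be obtained by finding the connected components of the observable subnetwork and counting pairs within and across components. The sketch below is illustrative: the function name `pair_ratio`, the adjacency-dictionary input and the five-node path used as an example are assumptions, and links are treated as undirected.

```python
from collections import deque

def pair_ratio(adjacency, hidden):
    """Return (connected pairs, isolated pairs, rho) for the observable subnetwork.

    adjacency: dict mapping each node to its neighbours (undirected links).
    hidden: collection of nodes whose states cannot be read.
    """
    visible = set(adjacency) - set(hidden)
    seen, sizes = set(), []
    for start in visible:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search restricted to visible nodes
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour in visible and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    n = len(visible)
    connected = sum(s * (s - 1) // 2 for s in sizes)
    isolated = n * (n - 1) // 2 - connected
    rho = isolated / connected if connected else float("inf")
    return connected, isolated, rho

# Hypothetical path network 0 - 1 - 2 - 3 - 4 with node 2 hidden.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
result = pair_ratio(path, hidden={2})
```

Hiding the middle node splits the path into two components of two nodes each, leaving 2 connected pairs and 4 isolated pairs.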

Figure 5: Performance with hidden nodes.

(a) A network with N = 8 nodes, of which a fraction η = 1/4 are hidden. The observable subnetwork has 6 nodes, 5 forming a connected component (with 10 connected node pairs) and 1 isolated (5 isolated pairs). The ratio between isolated and connected node pairs here is ρ = 5/10. Equation (5), applied to the observable network, successfully silences the indirect Gij terms among the nodes of the connected component. However, the correlations between the isolated node and the rest of the network, lacking an indirect path, are not silenced. (b) To test the silencing in the presence of hidden nodes, we used the numerically obtained Gij (Fig. 2) from which we eliminated a fraction η of the nodes, obtaining an observable network with 10⁴ isolated node pairs (ρ ≈ 10⁻³). After applying equation (5) to the remaining nodes, we found that the silencing of Gij terms associated with connected node pairs is unaffected (orange bar), whereas for the isolated node pairs, silencing drops to κ = 1, namely no silencing (purple bar). Hence, for the isolated node pairs, Sij is not more predictive than Gij. (c) Increasing the fraction of hidden nodes, η (top horizontal axis), we measured κ versus ρ. As expected, silencing is observed as long as most node pairs are connected via finite paths (ρ < 1). However, when the number of hidden nodes is increased to the point that the isolated pairs dominate (ρ > 1), silencing is no longer observed (κ = 1). The critical fraction of hidden nodes, ηC, corresponds to ρ = 1, the point at which silencing no longer plays an important role. Here we find ηC ≈ 0.57 (blue arrow), in agreement with the prediction of equation (7).

Discussion

With computational complexity O(N³), equation (5) is scalable and requires no assumptions about the network topology. By silencing indirect effects, it turns the raw correlation data into a predictive Sij matrix, dominated by direct interactions. It is especially suited to treat perturbation data, such as genetic perturbation experiments, in which case Gij describes the response of all genes (dxi) as a consequence of the perturbation of the source gene (dxj)38. In practice, however, Gij could be the result of a broader set of experimental realizations where other measures are used to evaluate the association between nodes, typically statistical measures such as Pearson or Spearman correlation coefficients. Still, our empirical results (Fig. 3) clearly show that the transformation (5) successfully applies to these empirically accessible measures as well. Hence, silencing is largely insensitive to the specific process by which Gij was constructed.

The method's broad applicability is rooted in the fact that it does not depend on the value of each specific term in Gij, but rather on the global relationships between them. Indeed, the global structure of Gij reflects the patterns of propagation of the perturbations along the network. Equation (5) helps uncover these paths from the raw data, disentangling the direct from the indirect effects. These patterns of information flow are inherent to the underlying network structure and should not depend on the specific experimental realization of equation (1). For instance, a cascade i → j → k will be characterized by a decreasing correlation propagating along the arrows, a large correlation between i and j and a weaker one between i and k. Although the magnitude of these correlations might depend on the size or the form of i's perturbation as well as on the statistical measure we used to evaluate them, the decay pattern required to infer the structure of the cascade is an inherent property of the network flow and can be successfully detected by the silencing method (Supplementary Note, I.4).

The silencing transformation is derived from fundamental mathematical principles of dynamical correlations in networks. Hence it is expected to apply under rather general conditions. However, as equation (5) indicates, it requires that the input matrix, Gij, is invertible. This imposes some limitations when Gij is constructed from statistical correlation measures. For instance, in the empirical results of Figure 3a, we constructed Gij from Pearson correlations, using the states of 4,511 nodes measured under 805 experimental conditions. In general, if the number of experimental conditions is smaller than the number of nodes, the resulting Pearson correlation matrix may be singular. In this case, additional processing will be required before equation (5) can be applied. In this work, following the DREAM5 protocol, we only focused on the correlations between the 141 known transcription factors and the rest of the nodes, which lead to an invertible Gij (Supplementary Note, IV). Other means to ensure Gij's invertibility are discussed in Supplementary Note, IV.4.
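The singularity problem described above is easy to reproduce, and the sketch below also shows one generic remedy. The remedy, shrinking the correlation matrix toward the identity, is a common regularization technique swapped in here for illustration; it is not necessarily one of the means discussed in Supplementary Note, IV.4. The random data and the shrinkage weight `lam` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))        # 5 conditions, 10 nodes (hypothetical data)
G = np.corrcoef(X, rowvar=False)    # 10 x 10 Pearson correlation matrix
rank = np.linalg.matrix_rank(G)     # at most 4: fewer conditions than nodes

# Generic remedy (assumption): shrink G toward the identity matrix.
lam = 0.1                           # hypothetical shrinkage weight
G_reg = (1 - lam) * G + lam * np.eye(G.shape[0])
rank_reg = np.linalg.matrix_rank(G_reg)
```

The shrunk matrix has full rank and can be inverted, at the cost of a bias that grows with `lam`.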

Isolating indirect effects in correlation data, a fundamental challenge of network inference, is typically approached through local probabilistic tools12,14,15,16,17,18. In contrast, the success of the silencing method is rooted in its exploitation of the global network topology39. It relies on the fundamental principles of network structure and dynamics to identify and silence the effects of indirect paths. The ability to extract Sij from Gij could also have implications for our understanding of network dynamics. Indeed, Gij is a global network measure, as its magnitude is determined by the numerous indirect paths connecting i and j. Hence, for a given dynamics, the Gij matrix will take a different form depending on the network topology, making it a poor predictor of the system's dynamics. By eliminating indirect effects, Sij measures the effect gene i would have on gene j had they been isolated from the rest of the network. It thus helps us quantify the dynamical mechanism that governs individual pairwise interactions, avoiding the convolution of dynamical and topological effects present in experimental data. For instance, consider a set of perturbation experiments providing Gij. The structure of Gij reflects the microscopic mechanisms that govern the pairwise interactions, for example, genetic regulation and biochemical processes. It is difficult, however, to extract this information from Gij because its terms are a convolution of many interactions, reflecting the many paths leading from i to j. The transition to Sij, via equation (5), allows us to treat each isolated interaction on its own, providing a direct observation of the microscopic interaction mechanism. A direct application of this fact could be the derivation of a rate equation that governs the system's dynamics from Gij, as well as the prediction of the universality class and the scaling laws governing the system's response to perturbations. Hence equation (5) helps translate the ever-growing amount of data on global correlations into valuable local information.

Methods

Methods and any associated references are available in the online version of the paper.