**Barzel and Barabási reply:**

Bastiaens *et al*.^{1} raise several pertinent issues regarding the silencing method we proposed in ref. 2. They argue that the method is conceptually similar to modular response analysis (MRA)^{3,4,5} and that the use of correlation-based predictions as input for silencing generates symmetrizable network predictions, which prevents the inference of directionality. We agree that the principles that underpin the silencing method reported in our manuscript^{2} are similar to those used to derive MRA^{3,4,5} methods and regret that we did not cite the relevant literature, which we were unaware of at the time of publication. However, the main contribution in ref. 2 was that silencing, unlike MRA, is designed to improve correlation and mutual information–based predictions. These statistical similarity measures are frequently used in the context of link prediction^{6,7}, and thus, a method that can enhance their predictive power is a useful contribution towards the mapping of regulatory interactions^{8,9,10}.

We find, however, the second criticism of Bastiaens *et al*.^{1}, regarding the use of symmetrical predictions as inputs for silencing, to be rather unusual, as it does not seem to be directed towards our method, but rather towards the common practice of using the symmetrical correlation and mutual information–based methods for link predictions. Indeed, it has no relevance to our silencing method, which does not advocate the use of such predictions, but rather is designed to improve them. We would like to make it clear that silencing is not a stand-alone method, but instead should be used as a post-processing step for enhancing preexisting predictions. The symmetry that Bastiaens *et al*.^{1} criticize originates from the characteristics of the preexisting predictions (e.g., correlations), but has little bearing on the improvement to these predictions offered by our silencing method.

The final criticism of Bastiaens *et al*.^{1} is that our evaluation of the performance of our silencing method did not follow the precise DREAM5 protocol that was used by Marbach *et al*.^{6}. However, silencing was not designed to compete with the methods reported by Marbach *et al*.^{6} in DREAM5; instead, we created our method to improve them. Such improvement is independent of whether one does or does not follow the DREAM5 protocol.

The reservations of Bastiaens *et al*.^{1} regarding the applicability of our method to predictions based on correlation and mutual information have prompted us to improve the method's implementation by adding a preprocessing step that broadens the range of suitable input predictions. We present below a substantially expanded validation, reinforcing the conclusions in our original paper^{2}. The improved code, now tested using the full DREAM5 evaluation criteria, achieves an average score increase for link prediction of 96% for *Escherichia coli* and several orders of magnitude for *Saccharomyces cerevisiae*. Both the original and improved source codes are on figshare (http://dx.doi.org/10.6084/m9.figshare.1348220) and GitHub (https://github.com/baruchbarzel/NatureBiotech-31-720) as “Software 1–3”. In the following text, we respond in detail to the criticisms raised by Bastiaens *et al*.^{1}.

First, we agree that the principles that we used to derive the silencing method have common roots with the derivation of MRA^{3,4,5}, a mapping that, as opposed to our approximation, offers an exact solution to the fundamental equation (4) in our original paper^{2}. However, whereas MRA was shown to enhance perturbation experiments, the strength of our silencing method, as reported in the original paper^{2}, is that it also accounts for correlation-based predictions, namely *G*_{ij} constructed from statistical similarity measures (see below). This is a crucial complement to MRA, because most current inference efforts rely strongly on correlations and other statistical similarity measures^{6}. As we show in Figure 3 from our original paper and discuss in this response, our implementation of the silencing method allows us to enhance the predictive power not only of perturbation-based experiments, for which MRA is designed, but also of correlation-based predictions, thereby offering a broader range of application than MRA.

Second, Bastiaens *et al*.^{1} argue that the application of the silencing method to correlation-based *G*_{ij} results in “symmetrizable” network predictions, which violate the directionality of real biological networks. We find this difficult to reconcile, given that correlation-based matrices are perfectly symmetrical to begin with. It is therefore impossible for any methodology that uses correlation-based matrices as input to recover directionality. The information on the directions of the links is lost in the construction of *G*_{ij} and cannot be retrieved without exogenous inputs, such as a list of transcription factors, as provided in the DREAM5 challenge^{6}, which we used to validate our method.
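The symmetry argument can be checked on a toy example. The following is a minimal sketch, assuming hypothetical time series in which *x*_{0} drives *x*_{1} and *x*_{1} drives *x*_{2}; the data and the helper names (`pearson`, `correlation_matrix`) are illustrative only and are not taken from the released software. Although the toy system is directed, the resulting correlation matrix is symmetric, so the direction of each link cannot be read from it.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_matrix(series):
    """Matrix of pairwise Pearson correlations between the given series."""
    return [[pearson(a, b) for b in series] for a in series]

# Hypothetical directed chain x0 -> x1 -> x2 (toy data, not from the paper).
x0 = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8]
x1 = [0.5 * v + 0.1 * (-1) ** t for t, v in enumerate(x0)]
x2 = [0.5 * v + 0.05 * (t % 3 - 1) for t, v in enumerate(x1)]

G = correlation_matrix([x0, x1, x2])

# G[i][j] equals G[j][i] for every pair, regardless of which node drives which.
is_symmetric = all(
    abs(G[i][j] - G[j][i]) < 1e-12 for i in range(3) for j in range(3)
)
```

Any post-processing that takes only `G` as input starts from this symmetric object, which is why exogenous information such as a transcription factor list is needed to assign directions.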

This criticism may have resulted from a misunderstanding of the goal of our original paper: silencing is not a stand-alone method. Rather, it is designed to take a preexisting *G*_{ij} as input and enhance its predictive power. Thus, the criticism of Bastiaens *et al*.^{1} might be better directed toward the input matrix *G*_{ij}, on account of its symmetrical structure, and not toward the output provided by our method, *S*_{ij}. Indeed, the use of correlation-based matrices for gene network inference is common practice^{6,7,8,9}, despite the justified reservations of Bastiaens *et al*.^{1}. Thus, as imperfect as these inputs are, there is a need to develop methods that improve their performance. The true test is not whether the silenced *S*_{ij} matrix recovers the network's directionality, because that information is already absent from *G*_{ij}, but rather whether *S*_{ij} improves on *G*_{ij}'s predictive power, namely, whether it predicts direct links with higher fidelity. Our results as reported in our original paper clearly document that it does.

We agree with Bastiaens *et al*.^{1} that perturbations, the input for which silencing is ultimately designed, have different properties from correlations. However, like many other successful scientific applications, the silencing method is based on specific approximations. In this case, the approximation is that statistical similarity measures, such as correlations (equation (1)), can be used as substitutes for the terms of the linear response matrix (equation (2)).

At no point do we claim that the silencing method is exact when applied to equation (1). Indeed, equation (1) and equation (2) represent different measures, with several distinct mathematical properties. However, both are aimed at capturing similar characteristics of the system: quantifying the association between the activities of pairs of nodes *i* and *j*. Loosely speaking, these two quantities are expected to show similar behavior. For instance, a large *G*_{ij} in equation (2) indicates that *x*_{i} exhibits a strong response to changes in *x*_{j}. Under most circumstances such dependency will lead to a correspondingly large correlation in equation (1). Indeed, state changes in *x*_{i} will follow changes in *x*_{j}, which in turn, will lead to strong statistical correlations between them (Supplementary Note 1). Thus, applying the method to matrices of the form of equation (1) constitutes an uncontrolled approximation that must be tested either numerically or empirically, as we do in the paper, showing that the approximation of equation (1) is highly beneficial under both tests (Figs. 2 and 3 in ref. 2 and Fig. 1 below).
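For concreteness, the two kinds of quantities contrasted here can be sketched as follows. These are not the typeset equations of this reply. The first line assumes the Pearson correlation as one representative statistical similarity measure, and the second assumes a derivative form for the linear response, chosen only because it matches the description that a large *G*_{ij} means *x*_{i} responds strongly to changes in *x*_{j}.

```latex
% Assumed example of a statistical similarity measure (Pearson correlation):
G_{ij} = \frac{\langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle}{\sigma_i \, \sigma_j}

% Assumed form of a linear response term (response of x_i to a change in x_j):
G_{ij} = \frac{\mathrm{d} x_i}{\mathrm{d} x_j}
```

The first expression is symmetric under exchange of *i* and *j*, whereas the second in general is not, which is one of the distinct mathematical properties referred to above.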

To better understand the class of input matrices that represent valid candidates for silencing, we return to the original derivation of the silencing method. We show that the silencing transformation, equation (5) in ref. 2, can be written as a geometric series in powers of *S* (equation (3)), in which *I*_{D} = *I* − *D*((*G* − *I*)*G*) (Supplementary Note 2). Equation (3) is a matrix representation of the derivation based on network paths that we provide in Supplementary Note I.2 in our paper^{2}. Indeed, since *S*_{ij} ≠ 0 only along direct links, the terms of *S*^{n} account for all paths of length *n* that link *i* and *j*. Thus, equation (3) describes the observed response matrix *G*_{ij} as a summation over the contribution of all paths leading from the source node *j* to the target node *i*. A similar approach is reported in Feizi *et al*.^{10}, published in the same issue of *Nature Biotechnology* as our original paper^{2}, where the authors present an almost identical method to silencing, starting from equation (3) and taking *I*_{D} to be the identity matrix, *I*.
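The step from the silencing transformation to a series over paths can be sketched in a few lines. This is a reconstruction under a stated assumption, namely that equation (5) of ref. 2 has the form shown on the first line; it is not a copy of the typeset equation (3), whose exact layout may differ.

```latex
\begin{align}
% Assumed form of the silencing transformation (equation (5) in ref. 2):
S &= \bigl( G - I + D \bigr) \, G^{-1},
  \qquad D \equiv \mathcal{D}\bigl( (G - I) \, G \bigr) \\
% Multiply from the right by G and rearrange:
(I - S) \, G &= I - D = I_D \\
% If the geometric series in S converges, invert (I - S):
G &= (I - S)^{-1} I_D = \Bigl( \sum_{n=0}^{\infty} S^{n} \Bigr) I_D
\end{align}
```

Under this assumption, setting *I*_{D} = *I* leaves a pure geometric series in *S*, which is consistent with the comparison to ref. 10 made above.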

This derivation of the method, based on network paths, provides us with the general criterion that an input matrix must satisfy to be silence-able, that is, a good candidate for the silencing method. Consider a perturbation propagating along a path *i*→*k*→*j*. According to equation (3), the contribution of this single path is given by

$$S_{jk} \, S_{ki}. \tag{4}$$

Indeed, summing over all paths of length two between *i* and *j* provides

$$\sum_{k} S_{jk} \, S_{ki},$$

which, in accordance with equation (3), is nothing but the *j*, *i* term of *S*^{2}. Thus silencing is applicable as long as the propagation along paths follows the multiplicative rule of equation (4). This allows us to relax the stringent criteria for *G*_{ij}, and expand it from perturbation-based matrices of the form of equation (2) to a broader class of inputs, including other measures that satisfy equation (4). Thus, the application of silencing to correlation-based predictions of the form of equation (1) is valid as long as correlations propagate multiplicatively as in equation (4). Although this is not guaranteed, it is commonly the case that such multiplicative propagation is observed (Supplementary Note 2). Such interpretation of the propagation of indirect correlations was previously offered by Wright's path coefficients^{11}.
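The multiplicative rule can be illustrated on a small hypothetical network. The sketch below assumes the convention used in the text, in which `S[target][source]` holds the direct effect of the source node on the target node; the four-node network and its weights are invented for illustration. It checks that summing the products of direct effects over all two-step paths from node 0 to node 3 reproduces the corresponding entry of *S*^{2}.

```python
def matmul(A, B):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical direct-link matrix, indexed as S[target][source].
# Direct links: 0 -> 1 (0.5), 0 -> 2 (0.4), 1 -> 3 (0.3), 2 -> 3 (0.2).
S = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.2, 0.0],
]

source, target = 0, 3

# Explicit sum over intermediate nodes k of the two-step paths source -> k -> target.
path_sum = sum(S[target][k] * S[k][source] for k in range(len(S)))

# The same quantity read off as the (target, source) entry of S squared.
S2 = matmul(S, S)
indirect_effect = S2[target][source]
```

Here nodes 0 and 3 share no direct link (`S[3][0]` is zero), yet the two indirect paths through nodes 1 and 2 contribute 0.3 × 0.5 + 0.2 × 0.4 to the aggregate response. This is the kind of indirect contribution that silencing is meant to remove.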

To summarize, we agree with Bastiaens *et al*.^{1} that correlation-based matrices of the form of equation (1) have different properties from the response matrices defined in equation (2). The question is, however, whether the application of the method to correlation-based matrices represents a valid approximation. Our derivation provides the relevant criterion: that the propagation along network paths is governed by a multiplicative rule, as described in equation (4).

Third, Bastiaens *et al*.^{1} criticize the empirical validation in our original paper (Fig. 3 in ref. 2) for not adhering to the protocol that was used in the DREAM5 challenge^{6}. Indeed, our implementation benefited from two advantages that the original participating groups did not have, namely the list of participating transcription factors (141 versus 334) and the number of nodes in the gold standard used for validation (4,511 versus 1,080). Thus Bastiaens *et al*.^{1} are correct to point out these differences, which prevented them from successfully reproducing our findings. Our goal, however, was not to compete with the methods in DREAM5, but to improve them. We maintain that the improvement achieved by silencing remains valid, even if our evaluation protocol differed from that used in DREAM5, as long as we consistently used the same criteria both before and after applying silencing. Indeed, our validation rigorously and fairly tested the silencing method against all the other methods reported under exactly the same experimental conditions, although those conditions were not the same as those used in the original evaluation of the methods reported in DREAM5. We emphasize that the two reported advantages were invested in the construction of the input matrix, *G*_{ij}, and not in the method's output matrix, *S*_{ij}. As explained above, silencing is not a stand-alone method; it is designed to take the prediction of an existing method, for example, Pearson correlations, as input, and improve on it by silencing indirect paths. The challenge is thus to construct the best possible input matrix, benefiting from all a priori knowledge available, and then show that silencing can further improve its predictive power. Indeed, the deviations from the DREAM5 protocol only improved the baseline performance of the preexisting predictions. Because the improvement for which our method was tested is measured with respect to that baseline, the reported deviations did not grant any advantage to our method.

Bastiaens *et al*.^{1} also claim that the silencing method failed to improve the input when tested using the same protocol as the DREAM5 challenge evaluation. We tested this and found that the original code for silencing presented with our original paper^{2} (see figshare and GitHub, “Software 2”) performed poorly when tested using the DREAM5 evaluation scheme, confirming the concerns of Bastiaens *et al*.^{1}.

The above finding prompted us to reassess the performance of the silencing method, resulting in an improved implementation that is presented in this response (see figshare and GitHub, “Software 1”). Consider again the path-based derivation of equation (3). As the equation suggests, for silencing to be effective, *G*_{ij} must be the result of a geometric series, which aggregates the contributions of all paths. If this is indeed the case, the silencing transformation uses self-consistency to expose the kernel of the series *S*_{ij} from its sum *G*_{ij}. However, what is implicitly assumed in this analysis is that such a self-consistent solution is available, namely that the geometric series at the right-hand side of equation (3) is convergent. This requires that the spectrum of *S*_{ij}, *λ*_{S}, lies strictly between −1 and 1, which is the condition expressed by equation (5) (Supplementary Note 2).

Thus the full criterion for an input matrix *G*_{ij} to be silence-able is that there exists a matrix *S*_{ij} for which both equation (3) and equation (5) are simultaneously satisfied. Satisfying equation (3) is, of course, an intrinsic property of *G*_{ij}, as we discussed in detail above. Equation (5), however, can always be satisfied by renormalizing the off-diagonal terms of *G*_{ij}^{10}.
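The role of the convergence condition can be seen in a two-node toy case. The sketch below is illustrative only: the matrices are hypothetical, and the helper `series_residual` is not part of the released software. It uses the fact that a matrix of the form [[0, a], [a, 0]] has eigenvalues +a and −a, and measures how far the truncated series Σ *S*^{n} is from inverting (*I* − *S*).

```python
def matmul(A, B):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def series_residual(S, n_terms):
    """Largest entry (in absolute value) of (I - S) * sum_{n=0}^{n_terms} S^n - I.

    The residual tends to zero only if the geometric series converges.
    """
    n = len(S)
    eye = identity(n)
    partial_sum = identity(n)   # n = 0 term
    power = identity(n)
    for _ in range(n_terms):
        power = matmul(power, S)
        partial_sum = [
            [partial_sum[i][j] + power[i][j] for j in range(n)] for i in range(n)
        ]
    i_minus_s = [[eye[i][j] - S[i][j] for j in range(n)] for i in range(n)]
    product = matmul(i_minus_s, partial_sum)
    return max(abs(product[i][j] - eye[i][j]) for i in range(n) for j in range(n))

# Eigenvalues are +0.5 and -0.5: inside (-1, 1), so the series converges.
S_convergent = [[0.0, 0.5], [0.5, 0.0]]
# Eigenvalues are +1.5 and -1.5: outside (-1, 1), so the series diverges.
S_divergent = [[0.0, 1.5], [1.5, 0.0]]

residual_ok = series_residual(S_convergent, 40)
residual_bad = series_residual(S_divergent, 40)
```

In the first case the residual shrinks toward zero as terms are added, so a kernel *S* can be recovered self-consistently from the sum. In the second case the partial sums grow without bound and no such solution exists.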

We now offer an improved implementation of the method, in which we add a preprocessing step that renormalizes the raw correlation-based matrix by multiplying all off-diagonal terms by a constant until equation (5) is satisfied (Supplementary Note 3; see also ref. 10, in which a similar approach was introduced). Such renormalization preserves the ranks of all entries in *G*_{ij}, and thus has no effect on its performance as a link-prediction matrix. This step is not required if *G*_{ij} is a perturbation-based matrix of the form of equation (2), but it is crucial for correlation-based matrices, in which the specific values of the terms are arbitrary, and only their ranking plays a role in the prediction. Using this renormalization scheme we re-applied the silencing method, this time following the precise DREAM5 protocol^{6}. We used the original data sets and the evaluation scripts provided by the DREAM5 team to make sure that we strictly adhere to the 'rules of the game' (data sets were downloaded from ref. 10 and evaluation scripts from ref. 6). Source code and data reproducing this analysis are available on figshare (http://dx.doi.org/10.6084/m9.figshare.1348220) and GitHub (https://github.com/baruchbarzel/NatureBiotech-31-720).
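The rank-preserving property of this preprocessing step can be sketched as follows. This is not the released implementation: the function names are hypothetical, the input matrix is a toy example, and the stopping rule `row_sums_below_one` is a simple stand-in, since the actual criterion is the spectral condition of equation (5), which is evaluated on *S*_{ij}. Only the rescaling itself and its effect on the ranking are meant to be illustrated.

```python
def rescale_offdiagonal(G, alpha):
    """Multiply every off-diagonal entry of G by alpha; leave the diagonal as is."""
    n = len(G)
    return [
        [G[i][j] if i == j else alpha * G[i][j] for j in range(n)]
        for i in range(n)
    ]

def offdiagonal_ranking(G):
    """Off-diagonal index pairs sorted from the strongest to the weakest entry."""
    n = len(G)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return sorted(pairs, key=lambda p: G[p[0]][p[1]], reverse=True)

def row_sums_below_one(G):
    """Stand-in acceptance test (an assumption, not the paper's equation (5))."""
    n = len(G)
    return all(
        sum(abs(G[i][j]) for j in range(n) if j != i) < 1.0 for i in range(n)
    )

def renormalize(G, is_acceptable, alpha=0.9, max_steps=1000):
    """Shrink the off-diagonal terms by a constant factor until the test passes."""
    for _ in range(max_steps):
        if is_acceptable(G):
            return G
        G = rescale_offdiagonal(G, alpha)
    raise ValueError("acceptance test not met within max_steps")

# Hypothetical correlation-like input matrix.
G_raw = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.3],
    [0.8, 0.3, 1.0],
]

G_scaled = renormalize(G_raw, row_sums_below_one)
ranking_unchanged = offdiagonal_ranking(G_raw) == offdiagonal_ranking(G_scaled)
```

Because every off-diagonal term is multiplied by the same positive constant, the order of the candidate links is the same before and after the rescaling, so the matrix makes identical link predictions while becoming a more suitable input for silencing.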

Of the methods used in DREAM5, the most relevant are the ones implementing correlation/relevance-based predictions. Thus, we tested the silencing method on Pearson correlations, Spearman correlations, Relevance networks^{7}, ARACNE^{8} and the context likelihood of relatedness (CLR) algorithm^{9}. In addition to the results presented in our original paper^{2}, we now include results for all three organisms that were used to score DREAM5: the *in silico* model organism (Fig. 1a), and the empirical data sets for *S. cerevisiae* (Fig. 1b) and *E. coli* (Fig. 1c). Scoring was done using the evaluation script provided by the DREAM5 team^{6} (Supplementary Note 3). With the exception of CLR (see below), we find that silencing significantly improves all tested methods, with an average score increase of 54%, excluding CLR, and 41%, including CLR (Fig. 1d). We observe the most substantial improvement for the empirical data sets: the average improvement for *E. coli* is 96%, and for *S. cerevisiae* silencing enhances the average score by several orders of magnitude. In DREAM5 most methods performed very poorly on the gold standard constructed for *S. cerevisiae*, some scoring no better than a random guess^{6}, with scores as low as 2 × 10^{−4} (ARACNE) or 8 × 10^{−5} (Spearman). The fact that silencing was able to improve these predictions more than 1,000-fold (to 0.95 and 0.63, respectively) shows that silencing can extract hidden information even from extremely low-quality inputs. The only exception is CLR, for which silencing leads to a marginal decrease in overall performance. This might be because CLR, like silencing, uses global information to prune indirect effects^{9}. Thus silencing is perhaps redundant for use with CLR. Note that the overall score in DREAM5, which averages performance of an algorithm over all three organisms, gives little weight to *S. cerevisiae*, whose typical scores are much lower than those of the other two organisms. 
For instance, the 5,000-fold increase observed for ARACNE in *S. cerevisiae*, from 2 × 10^{−4} to 0.95, is marginalized by the 110% increase for this method in *E. coli* (3.7 to 7.8) and the 25% increase for *in silico* (29.5 to 36.7) because averaging gives substantially more weight to the latter two organisms. Had we controlled for that, the overall improvement would have been significantly higher, and even CLR would have shown an overall increase in performance (−12%, −18%, and +38% for *in silico*, *E. coli* and *S. cerevisiae*, respectively). In response to a remark of Bastiaens *et al*.^{1} that AUPR provides a more relevant measure than AUROC for these systems^{12}, we tested the specific improvement achieved in the score for AUPR. We find that AUPR is the main source of improvement, showing an average increase of 122% across all methods and algorithms (Fig. 1e,f). These results substantially reinforce the conclusions from our original paper^{2} and show that we can achieve a large improvement on a wide range of methods using our approach.
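The weighting effect described above can be reproduced from the scores quoted in the text. The sketch below uses only the ARACNE and CLR numbers given in this paragraph. It assumes that the overall score is a plain arithmetic mean over the three organisms and that "controlling" for the weighting means averaging the per-organism relative changes; both are simplifying assumptions made for illustration, and the actual DREAM5 scoring script may differ.

```python
def percent_increase(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before

# ARACNE scores before and after silencing, as quoted in the text.
aracne = {
    "in silico": (29.5, 36.7),
    "E. coli": (3.7, 7.8),
    "S. cerevisiae": (2e-4, 0.95),
}

# Relative improvement within each organism.
per_organism = {name: percent_increase(b, a) for name, (b, a) in aracne.items()}

# Fold change for S. cerevisiae.
fold_change_yeast = aracne["S. cerevisiae"][1] / aracne["S. cerevisiae"][0]

# Assumed overall score: arithmetic mean of the three organism scores.
mean_before = sum(b for b, _ in aracne.values()) / len(aracne)
mean_after = sum(a for _, a in aracne.values()) / len(aracne)
overall_increase = percent_increase(mean_before, mean_after)

# CLR per-organism changes (in percent) quoted in the text, averaged with equal weight.
clr_changes = [-12.0, -18.0, 38.0]
clr_equal_weight_mean = sum(clr_changes) / len(clr_changes)
```

Under these assumptions the overall ARACNE score rises by roughly 37%, even though the *S. cerevisiae* score alone rises by a factor of several thousand, because the mean is dominated by the much larger *in silico* and *E. coli* scores. Averaging the quoted CLR changes with equal weight per organism gives a small positive value, in line with the statement that CLR would then show an overall increase.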

A last claim of Bastiaens *et al*.^{1} is that our approximation based on correlation decay disagrees with biological reality. We concur that in certain cases a local perturbation may increase as it propagates along a network path, rather than decay. However, our application of the silencing method focused on statistical similarity measures, such as correlations, which always decrease along paths, and by definition cannot exceed unity. Moreover, even regarding perturbations, we argue that such amplification is not typical in biological networks. Indeed, if small perturbations were repeatedly amplified during their propagation, the implications for the stability and robustness of living cells would be dramatic; every local disturbance would lead to a macroscopic response and the modular nature of the cell's functionality would be constantly disrupted by the cross-talk between distant genes. Thus, it is not surprising that both theoretical and empirical analyses of cellular dynamics indicate, time and again, that the impact of perturbations is, in most cases, strictly local^{13}. Studies have shown that perturbations typically feature an exponential decay as they penetrate the network^{14,15,16,17,18}. Others have quantified the impact of perturbations by measuring cascade sizes, that is, the number of genes that exhibit a significant response following a perturbation. These reports find that most cascades are tiny and only rarely does a perturbation affect a substantial number of genes^{19,20,21}. This paucity of large cascades further supports the notion that most perturbations do not penetrate deeply into the network.

Finally, the premise of network inference relies on the notion that the magnitude of the terms in the prediction matrix *G*_{ij} correlates with the likelihood of direct linkage^{6,7,8,9}. If, as Bastiaens *et al*.^{1} suggest, there are cases where the *G*_{ij} terms systematically increase with the distance between *i* and *j*, then in these cases *G*_{ij} is a poor candidate for network inference in general, with or without silencing, and thus we would not consider it a suitable input for our method.

To summarize, although we disagree with much of the criticism made by Bastiaens *et al*., we wish to thank them for raising several important issues and igniting a discussion that has ultimately led to the development of the improved silencing algorithm presented here.

## Change history

### 30 April 2015

In the version of this article initially published, citations were given for Supplementary Software 1 and 2; these files have been removed from the website and the citations replaced by ones for figshare and GitHub (see supplementary information for explanation). The Software files have been available on figshare since publication and are now also available at GitHub (https://github.com/baruchbarzel/NatureBiotech-31-720), the link for which we have added to the paper. The changes have been made in the HTML and PDF versions of the article.

## References

1. Bastiaens, et al. *Nat. Biotechnol.* **33**, 336–339 (2015).
2. Barzel, B. & Barabási, A.-L. *Nat. Biotechnol.* **31**, 720–725 (2013).
3. Kholodenko, B.N. et al. *Proc. Natl. Acad. Sci. USA* **99**, 12841–12846 (2002).
4. Kholodenko, B.N. *Nat. Cell Biol.* **9**, 247–249 (2007).
5. Kholodenko, B.N., Yaffe, M.B. & Kolch, W. *Sci. Signal.* **5**, re1 (2012).
6. Marbach, D. et al. *Nat. Methods* **9**, 796–804 (2012).
7. Butte, A.J. & Kohane, I.S. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. *Pacific Symposium on Biocomputing* **5**, 415–426 (2000).
8. Margolin, A.A. et al. *BMC Bioinformatics* **7**, S7 (2006).
9. Faith, J.J. et al. *PLoS Biol.* **5**, 1 (2007).
10. Feizi, S. et al. *Nat. Biotechnol.* **31**, 726–733 (2013).
11. Wright, S. *J. Agric. Res.* **20**, 557–585 (1921).
12. Davis, J. & Goadrich, M. The relationship between Precision-Recall and ROC curves. in *Proceedings of the 23rd International Conference on Machine Learning* (ACM, Pittsburgh, Pennsylvania, 2006).
13. Gulbahce, N. et al. *PLoS Comput. Biol.* **8**, e1002531 (2012).
14. Barzel, B. & Biham, O. *Phys. Rev. E* **80**, 046104 (2009).
15. Barzel, B. & Barabási, A.-L. *Nat. Phys.* **9**, 673–681 (2013).
16. Maslov, S. & Ispolatov, I. *Proc. Natl. Acad. Sci. USA* **104**, 13655–13660 (2007).
17. Maslov, S. & Ispolatov, I. *New J. Phys.* **9**, 273–283 (2007).
18. Yan, K.-K., Walker, D. & Maslov, S. *Phys. Rev. Lett.* **101**, 268102 (2008).
19. Kauffman, S. *Physica A* **340**, 733–740 (2004).
20. Furusawa, C. & Kaneko, K. *Phys. Rev. Lett.* **90**, 088102 (2003).
21. Lu, T. et al. *BMC Bioinformatics* **6**, 37–49 (2005).

## Ethics declarations

### Competing interests

The authors declare no competing financial interests.

## Supplementary information

### Supplementary Text and Figures

Supplementary Notes 1–3 (PDF 833 kb)

## About this article

### Cite this article

Barzel, B., Barabási, AL. Response to Letter of Correspondence – Bastiaens *et al*. *Nat Biotechnol* **33**, 339–342 (2015). https://doi.org/10.1038/nbt.3184
