Inferability of transcriptional networks from large scale gene deletion studies

Generating a comprehensive map of molecular interactions in living cells is difficult and great efforts are undertaken to infer molecular interactions from large scale perturbation experiments. Here, we develop the analytical and numerical tools to quantify the fundamental limits for inferring transcriptional networks from gene knockout screens and introduce a network inference method that is unbiased and scalable to large network sizes. We show that it is possible to infer gene regulatory interactions with high statistical significance, even if prior knowledge about potential regulators is absent.

knockout libraries, ranging from microbes 3,4 to higher eukaryotes 5 and open up a much more informative data source than inferring gene regulatory networks from unspecific perturbations, such as stress or changes in growth conditions 6 . However, the detection of direct interactions between two genes from association measures -for example the covariance between transcript levels -remains a highly non-trivial task, given the significant noise among biological replicates, the frequent case where the number of parameters exceeds the number of independent data points, and the high dimensionality of the inference problem.
The existence of a direct interaction between gene A as a source of regulation (source node) and gene B as a target of regulation (target node) can be detected if a significant part of the transcriptional activity of B can be explained by the transcriptional activity of A but not by the activities of the remaining genes in the network. Thus a necessary condition for identifiability or inferability of links is the knowledge about the information that can be transmitted by alternative routes in the network, which can be obtained by perturbing the involved nodes 7 . As most gene perturbation screens are incomplete -for example due to the fact that essential genes cannot be knocked out -we have in general the situation that a significant amount of interactions within an N -gene network are non-inferable, regardless of the amount of experimental replicates and the strength of perturbations. Direct interactions inferred from transcriptome data typically oversimplify the molecular complexity behind gene regulation, which frequently involves protein-protein interactions and modifications on protein or DNA level.
To calculate the maximum number of links that can be inferred from knockout screens in absence of other constraints, we consider a directed network of N nodes, with node activities as observables and a fraction q of node activities strongly perturbed by external forces. Given that the perturbed nodes are randomly distributed within the network, we can analytically calculate the expected fraction of inferable links, F (q), by a counting procedure illustrated in Fig. 1a (Online Methods and Supplementary Note 1).
As F (q) is an upper bound for the expected number of directed links that can be inferred from stationary node activities, we now ask how this bound is related to the structural properties of the network. To compare different network architectures, it is useful to define the network inferability, I F , as the area under the F (q)-curve, I F := 1 0 F (q)dq. Comparison of I F between two general classes of network structures with node degrees either power law or Poisson distributed shows that networks that are enriched with nodes of high out degree (outgoing hubs) are the most difficult ones to infer (Fig. 1b). Differences in inferability due to network structure are most predominant for networks with low mean degree and become less predominant with high mean degree ( Fig. 1c). As our measure of inferablity, I F , is essentially determined by the outdegree distribution the curve starts saturating for scale-free exponents γ > 3, as in this regime the variance of the number of links per node is essentially constant for increasing γ and fixed mean degree 8 (Fig. 1d).
The network inferability, I F , is asymptotically independent of network size (Fig. 1e). We further investigated the inferability of causal interactions in biological and social networks as a function of the mean degree (Fig. 1f). The decreasing trend can be explained by the higher number of alternative routes that come with a stronger connected network. The low inferability of gene regulatory networks can be attributed to master regulators that regulate a large fraction of the genome (hubs with high outdegree), whereas the comparatively high inferability of protein interaction networks is a consequence of the low number of different binding domains per protein and that only a fraction of the existing interactions have been identified due to limitations of experimental methods 9 . If we assume that the probability P (k, l, m|k → l) can be factorized, the resulting inferability measure, I * F , is simply a function of the outdegree distributions, P (k) and P (l). We observed that I * F ≈ I F for all networks investigated in this work (Fig. 1f, inset). This result shows that for a large variety of networks structures the outdegree is the dominating factor that determines network inferability.
Inference of transcriptional networks on a genome scale is best realised by methods that are (i) asymptotically unbiased, (ii) scalable to large network sizes, (iii) sensitive to feed-forward loops 10 , and (iv) can handle data sets with and without knowledge about which nodes are targeted by experimentally induced perturbations 7,[11][12][13][14][15] (Supplementary Note 2). Inference methods for directed networks typically require individual perturbation of all nodes 7 or many perturbations of different strengths to compute conditional association measures 6,16 or conditional probabilities 17 .
Generation of time course data seems to be the most natural way to infer directed networks by simply analysing the temporal ordering of the transcriptional activities 18,19 . However, this approach precludes the use of knockout experiments and requires fast acting perturbations in combination with monitoring node activities over time, which is experimentally demanding, especially if nodes respond on very different time scales 20 .
Inference is further complicated by the fact that transcriptome data contains a significant amount of stochastic variation between biological replicates despite pooling over millions of cells ( Supplementary Fig. 1a). It is interesting to see that the variation across biological replicates for baker's yeast 3 is close to a normal distribution and follows almost exactly a t-distribution with 11 degrees of freedom over five standard deviations ( Supplementary Fig. 1a, inset) and that biological noise is much larger than technical noise ( Supplementary Fig. 1b). As biological noise may arise from subtle differences in growth conditions that induce changes in gene regulation, we expected to see significant cross-correlations among genes ( Supplementary Fig. 1c), whose magnitude is much larger than expected by chance ( Supplementary Fig. 1c, inset). These cross-correlations can be exploited for inferring the structure of undirected networks 11 , if the driving noise is independent and identically distributed for all nodes (Supplementary Note 2). In contrast, technical noise not only reduces the statistical significance for detecting true interactions but can also induce a significant fraction of false positive interactions, especially if the interaction network under investigation is sparse. Such noise induced misclassification of links can be illustrated by a simple linear network A → B → C for which standard inference methods interpret the information that A has about C erroneously as a direct link between A and C if the state of B is corrupted by measurement noise ( Supplementary Fig. 1d). The reason is that a part of the correlation between A and C cannot be explained by B.
To make use of the rapidly growing amount of single-gene knockout screens for which transcriptome data is 3 or may become available 21,22 , we developed a method to infer directed networks on a genome scale where the number of genetic perturbations is typically below the number of nodes or genes in the network (Online Methods). In brief, our method uses the concept of probabilistic principle component analysis 23 to compute partial response coefficients (PRC) that are asymptotically unbiased with respect to Gaussian measurement noise ( Supplementary Fig. 1d).
Additionally the algorithm provides a feature to identify non-inferable links, which are removed before statistical analysis. In absence of noise, our numerical method correctly predictes the fraction of links that are inferable, F (q) (Supplementary Notes 1). To evaluate the performance of our method we generated two synthetic knockout data sets that closely resemble the gene regulatory network structure of baker's yeast, using the GeneNetWeaver software 24 that uses a hierarchical network structure and our own generative model that uses a scale free network structure (Supplementary Notes 3). We added Gaussian measurement noise to the synthetic data with a standard deviation of 10% the log 2 fold-change in expression level for each perturbation for each gene.
Residual bootstrapping among replicates was used to quantify the statistical significance of the inferred link strengths. In comparison with standard inference methods, such as partial correlations [11][12][13]25, 26 , our method shows a significantly higher performance in absence of any penalties that enforce sparse network structures (Fig. 2b, left panel). The improved performance of our method can be assigned to the fact that it is unbiased with respect to measurement noise (Online Methods).
To further improve the predictive power of our method we included the prior knowledge that transcriptional networks are highly sparse. Sparsity constraints are typically realized by penalising either the existence of links or the link strengths by adding appropriate cost functions, such as L 1 -norm regularized regression (Lasso) 27 . Adding a cost function to the main objective comes with the problem to trade-off the log-likelihood against the number of links in the network whose strength is allowed to be non-zero. In absence of experimentally verified interactions there is no obvious way how to determine a suitable regularization parameter that weights the likelihood against the cost function, which is one of the great weaknesses of such methods.
In our approach we reduce network complexity by assuming that functionally relevant information in molecular networks can only pass through nodes whose response to perturbations is significantly above their biological noise level. The individual noise levels can be estimated from natural variations between wild type experimental replicates ( Supplementary Fig. 1). The significance level that removes nodes from the network with high noise-to-signal ratio can be set by the desired false discovery rate. It can be shown that removal of noisy nodes imposes a sparsity constraint on the inference problem (Online Methods). The different steps required to arrive at a list of significant links are illustrated in Fig. 2a. In the first step, genes are grouped in clusters that are co-expressed under all perturbations. These clusters are treated as single network nodes in the subsequent steps. In the second step, only those samples are extracted from the dataset that correspond to perturbation of a chosen gene -the source node -with no other genes perturbed (node 5 in Fig. 2a). From this reduced dataset, we identify all nodes in the network that change expression above a given significance level upon perturbing the source node. These significantly responding nodes define a subnetwork for each source node, which is typically much smaller in size than the complete network. In the third step, we collect all perturbation data from the complete dataset for all nodes that are part of the subnetwork. Before inferring a direct interaction that points from the source node to a given target node in the subnetwork, we remove the perturbation data of target node and all nodes that respond significantly to perturbations of the target node (green arrows in Fig. 2a). The second and third steps realise numerically the counting procedure of inferable links as illustrated in Fig. 1a. Significant links are identified by partial response coefficients in combination with residual bootstrapping over replicates (Online Methods). In the fourth step, we collect all clusters of co-expressed genes that contain two nodes in total with one of the nodes perturbed and check statistical significance of the directed link between them. In the fifth step, all significant links are collected in an edge list. We refer to these five steps as the clustering method. If we remove all links from the edge list that have more than one node as a source or more than one node as a target, we obtain an edge list that corresponds to links between single genes. This reduced edge list would also arise by skipping the clustering step and we refer to remaining inference steps that compute links between single genes as subnetwork method.
The Lasso method followed by bootstrapping has been benchmarked as one of the highest performing network inference methods for in silico generated expression data 28 . The receiver operating characteristic (ROC) curve of the subnetwork method shows better performance than the Lasso method (Fig. 2b, middle and right panel) after adjusting the regularization parameter of the Lasso method such that the area under the ROC curve is maximized. However, a significant performance boost for the Lasso method can be achieved by applying the second step of our method that removes noisy nodes, resulting in comparable performance of Lasso with the subnetwork method for the case that validation data exists such that the regularisation parameter can be determined To get insight into the optimal experimental design for generating data for network inference, we computed the fraction of correctly inferred links and compared them against the fraction of independently perturbed nodes for different numbers of experimental replicates. We compared three different variants of our approach: partial response coefficients (PRC), PRC together with subnetwork method, PRC together with clustering method (Fig. 4C). As all variants share PRC as underlying inference method (Online Methods), the observed strong increase in performance can be assigned to the sparsity constraint that comes with the subnetwork method or the clustering method. Due to this constraint, both the subnetwork method and the clustering method can have higher accuracy than the noise-free analytical solution, as the latter does not enforce sparse network structures. The results show that in presence of 10% measurement noise the amount of available replicates limits the true positive rate, even if 100% of nodes are perturbed. However, inference of more than 80% of the network can only be achieved if the number of replicates is sufficiently high.
To evaluate the performance of our approach on real data, we use one of the largest publicly available transcriptome data sets 3 , compromising transcriptomes that cover 6170 genes for 1441 single gene knockouts. We use the galactose utilisation network as a gene regulatory example, which is one of the best characterised gene regulatory modules in yeast 29 . The regulatory mechanism of the GAL4 gene as a key regulator is shown in Fig. 4D, left panel. As information about phosphorylation and protein interaction is absent in expression data, the inferred network structure from transcriptome data with GAL4 and GAL80 perturbed is different from the known gene regulation. By sorting genes with respect to their number of statistically significant outgoing links, we can identify potential key regulators. Besides transcription factors, the regulators with highest statistical significance are factors involved in chromatin remodelling, signalling kinases, and genes involved in ubiquitination (Supplementary Tables 1-3). This result -although expected for eukaryotes -is inaccessible for inference methods that a priori fix known transcription factors as regulatory sources. However, as the 1441 knockout genes of this dataset compromise just 23% of the genes for which transcript levels have been measured, we can estimate from our simulations that we have inferred less than 10% of the direct interactions in the transcriptional network of yeast. However, the recently available double knockout screens in yeast 22 drastically increased the number of targeted perturbations, which can boost the prediction accuracy of network inference methods once transcriptome data for a fraction of these mutants become available.    PRC with subnetwork method (green) and Lasso (black) both applied to a subset of significantly responding nodes that were selected with 1% false discovery rate, Lasso regression applied to all 300 nodes (blue), and PRC from left panel (red) for comparison. c) True positives for the same scale-free network of (b) with 2, 4 and 8 experimental replicates with 5% false discovery rate for both significantly responding nodes and link strength: PRC (red), PRC with subnetwork method (green), PRC with subnetwork and clustering method (blue), and F (q) (black line). d) The S. cerevisiae GAL network as an example for a gene regulatory network where phosphorylated Mig1 sets the basal expression levels of Gal4 and one of its many regulatory targets Gal3. Gal4 protein can activate Gal3 but is inactivated upon binding of Gal80 protein. The transcriptome dataset contains knockout mutants for Gal80 and Mig1 but not for the remaining Gal genes. A schematic representation of the key molecular mechanisms (left) and links inferred from transcriptome data as described in the main text. Data for a three-node network with two links was generated by applying independent perturbations on node 1 and node 2. The link strength of the non-existing link from node 1 to node 3 relative to the existing link from node 2 to node 3 was inferred using three different methods (i) partial correlations (blue squares), (ii) conditional mutual information (green triangles), and PRC (red circles).

Methods
Partial response coefficients (PRC). We aim at inferring direct interactions between N observable molecular components, such as transcripts or proteins, by measuring their copy numbers or concentrations, y ∈ R N . We assume that the available dataset has been generated from P perturbation experiments, {y k } P k=1 , which may also include experimental replicates. We further assume that the molecular targets of the perturbations are known, as it is the case for gene knockout or knockdown experiments. The elements of the interaction matrix A ∈ R N ×N define the strength of the directed interactions among the molecular components, for example A ij quantifies the direct impact of component j on component i. Given the available experimental data, our aim is to correctly classify the off-diagonal elements of A as zero or nonzero to obtain the structural organization of the interaction network. We assume that the observed component abundance, y obs k , differs from the true value, y k , by additive measurement noise, (t), which is characterized by zero mean, We assume that the observed data can by described to sufficient accuracy by a linear stochastic procesṡ from which we obtain in the stationary regime, t t 0 , a relation between A and the covariance matrix of observed component abundances We exploit Eq. (3) to infer directed networks from correlation data. Here, we assume that component abundances are obtained from averaging over a large number of cells in the stationary growth phase. In this case, fast fluctuating perturbations that arise from thermal noise and can be observed only on single-cell level average out. To infer the interaction matrix, A, we start with singular value decomposition of the matrix product with U and V orthogonal matrices and Σ a diagonal matrix containing the singular values. The negative definite matrix A is invertible and hence has full rank. In the following, we show that it is possible to infer the directed link strength between a sender node j and a receiver node i, if all direct perturbations on receiver node i are removed from the dataset and if a significant partial correlation between i and j exists. Removing the perturbation data for node i implies that the matrix B has at least one zero entry. As a consequence, N 0 ≥ 1 singular values are zero -as in general not all nodes are perturbed -and the corresponding rows of U span the left nullspace of A −1 B. In the absence of fast fluctuating perturbations, γ = 0, we can rewrite the covariance matrix as Assuming that the observed node activities follow a multivariate normal distribution, we can find estimates for the unknown orthogonal matrix U , the singular values Σ, and the observational noise The maximum of the log-likelihood function is determined by the conditions dL/dU k = 0, dL/dΣ kk = 0, and dL/dσ 2 = 0, which results in showing that maximum likelihood estimates of U , σ 2 , and Σ are uniquely determined by the sample covariance matrix S. Our analysis is mathematically equivalent to the probabilistic interpretation of principle component analysis 23 .
Solving the matrix equation, Eq. (4), for A gives with Σ + the pseudoinverse of Σ. As the matrix A has full rank, we complement Σ + with an unknown diagonal matrix Σ 0 that has nonzero values where Σ + has zero values and vice versa and complement BV with an unknown orthogonal matrix W . Note that by construction, Σ + U T and Σ + U T map from complementary subspaces and thereby ensure that A has full rank. The fact that V , W , and Σ 0 cannot be determined from S shows that A cannot be computed from a single covariance matrix. A more general case arises when measurement noise is independent but not isotropic, σ 2 I → σ 2 D, with D = diag(r 1 , r 2 , ..., r N ) a diagonal matrix with known positive elements that contains scaled noise variances, r i := σ 2 i /σ 2 , resulting in A transformation to isotropic noise is possible by multiplying both sides of Eq. (13) by D − 1 2 , which changes the result Eq. (12) to with U the eigenvectors of D − 1 2 SD − 1 2 .
Case N 0 = 1. We assume that the i-th node is the only unperturbed node in the network and consequently define B il = 0 for all l. From Eq. (12) we obtain a unique solution for the i-th row with U j1 the j-th element of the eigenvector of S that has the smallest eigenvalue. Note that the first term in the brackets vanishes as B il = 0 and Σ 0 11 is the only nonzero element of Σ 0 . The important point is that any dependency on σ has dropped out, which makes this method asymptotically unbiased with respect to measurement noise. The fact that we can determine the elements of the i-th row of A only relative to a reference value, A ii , is rooted in fact that we have to de-  (15) if the data quality is sufficiently high.
Case N 0 > 1. For all i that are not perturbed or whose perturbation data has to be removed from the dataset we get from Eq. (12) Non-unique solutions of Eq. (16) can arise if a given fraction of the variance of the receiver node i can be explained by more than one sender node, for example, when a perturbed node j targets two nodes with index i and l. In this case it is unclear from the node activity data whether i is affected directly by j or indirectly through l, or by a combination of both routes. If the node l is unperturbed, we can use the simple criteria shown in Fig. 1a to decide whether the link from j to i is inferable or not. If node l is weakly perturbed, a statistical criteria is needed to decide about inferability, which can be computed numerically as follows: To find out whether j transmits a significant amount of information to i that is not in l, we first remove the perturbed node j from the network and determine the link strengths A for the remaining network of size N − 1. To construct a possible realisation of A we set in Eq. (16) the nonzero values of Σ 0 to unity and use W = U to arrive at the expression with U determined from the sample covariance matrix with the i-th column and i-th row removed.
Using the inferred link strength from Eq. (17) we can rewrite Eq. (3) as a two-node residual inference problem between j and i, where we obtain a lower bound for link strength from node j to i by using the variation of i that could not be explained by A . Defining byÃ,B andD the 2 × 2 analogs to the full problem we obtaiñ withC the covariance matrix of the vectorỹ obs = (y obs j , y obs An estimate for the minimum relative link strength from node j to node i can be calculated from Eq. (14) and is given byÃ Eq. (19) can be considered as an asymptotically unbiased response coefficient between node 1 as target node and node 2 as source node, as any dependency on σ 2 has dropped out. An estimate for the maximum relative link strength from node j to node i follows from Eq. (19)  Fraction of non-inferable links. Inferability of a directed link between source and target node requires that the remaining network may not contain the same information that is transmitted between them. A sufficient condition is that all information that the remaining network receives from the source node is destroyed by sufficiently strong perturbations. If the target node is not perturbed, information from the source node my reach the remaining network through the target node. In this case also the targets of the target node must be perturbed (Fig. 1a). Counting network motifs that satisfy these conditions gives the number of inferable links. If the network size, N , is significantly larger than the number of outgoing links for both the source and target nodes, we can approximate the fraction of inferable links, F (q), by the expression (Supplementary Note 1) Here, P (k, l, m|k → l) is the conditional probability of finding two connected nodes in the directed network, where the source node has k ≥ 1 outgoing links, the target node has l ≥ 0 outgoing links, and both share m nodes as common targets of their outgoing links. The first term in the brackets corresponds to the case that independent perturbation data for node B exists (Fig. 1a, left panel) and the second term to the case where independent perturbation data for node B is absent (Fig. 1a, right panel). In the calculation of F (q) we assumed that the links in the network are identical with respect to the information they can transmit. Sparsity constraints by removing noisy nodes As network inference typically comes with an insufficient amount of independent perturbations and experimental replicates we run into the problem of overfitting the data. In this case, noisy information from many network nodes is collected to explain the response of a given target node. L 1 -norm regularized regression (Lasso) systematically removes many links, where each link explains only a small part of the variation of the target node, in favour of few links, where each link contributes significantly. In our approach we remove noisy nodes and thus their potential outgoing links, where the critical noise level is determined by independent biological noise. Which nodes are removed depends on the source node that is perturbed.

Data preparation
In the absence of noise, our algorithm removes weakly responding nodes from the network. We thereby assume that the existence of many indirect interactions between source and target node by first distributing the signal of the source node among many weakly responding nodes and then collecting these week signals to generate a significantly responding target node is much less likely than the existence of a single direct interaction. However, in the noise-free case we run into the same problem as Lasso to determine the right cut-off (regularization parameter).
Synthetic Data Synthetic data was generated using our own model and GeneNetWeaver 24 -an open access software that has been designed for benchmarking network inference methods. With GeneNetWeaver, networks were generated from a model that closely resembles the structure of the yeast regulatory network 24 , and steady state levels of node activities were computed using ordinary differential equations (ODEs). In our model, we first generated scale-free networks with an exponent of 2.5 and an average degree of 2. Then, we solved a system of ordinary differential equations with non-linear regulatory interactions between nodes to obtain steady state values of node activities, e.g. transcript levels. For both models, logarithmic fold changes of node activities were calculated (transcriptional levels upon perturbation relative to wild levels), and gaussian measurement noise was added.  Assume a subnetwork where a source node (A) targets a node (B), given that the 8 out-degree of node A is k, the out-degree of node B is l, and A and B have c nodes 9 as common targets. We aim to infer the link from node A to node B (See figure 10 1). Fig. 1: Example of a directed subnetwork. We would like to infer the link from node A to node B (shown in bold), where A has out-degree 4, B has outdegree 3, and A and B share 1 common target.

A B
We denote a direct link from source node (A) to target node (B) as inferable 13 if there exits a detectable amount of mutual information between A and B that 14 1 2 Network's Inferability 2 cannot be transmitted by any alternative route through the network. This requires 15 that at least one nodes of each alternative route is perturbed and thereby part of 16 the transmitted information is destroyed. To meet this requirement, either one of 17 the following conditions must be fulfilled (Fig. 1a, main text): 18 19 1. All nodes that are targeted by A are perturbed, including node B. 20 2. Except B, all nodes that are targeted by A and B are perturbed. 21 By collecting all subnetworks that fulfil conditions 1 or 2 we can calculate the 22 average fraction of inferable links, F (N p ), using N := N −k−1 and N p : Here, N is network size, N p is number of perturbed nodes, k is the out-degree 24 of the source node (A), l is the out-degree of the target node (B), c is the number 25 of common nodes targeted by A and B. We further defined by P (k|A → B) the 26 conditional probability that for any two connected nodes, source node (A) has 27 out-degree k and P (k, l, c|A → B) is conditional probability that for any two con-28 nected nodes, source node (A) has out-degree k , target node (B) has out-degree 29 l, and A and B target c common nodes. The first term in the numerator counts where q denotes fraction of perturbed nodes, q = N p /N . F (q) curves (Fig. 1b, main text). We therefore define a inferability measure, I F , 44 as the area under the curve of F(q) that reflects how difficult it is to infer links for 45 a given network structure Not that I F is independent of network size. For a sufficient large networks N 1 47 and when feed forward loops rare, we can approximate the joint probability by According to Eqs. (3) and (4), inferability mainly depends on the out-degree of  (Fig. 2a), a scale-free network where 63 hubs are souces of links (Fig. 2b) and a random network (Fig. 2c). In all three The constraints imposed by the conditions (i)-(iii) explain the poor performance 15 of partial correlations in inferring undirected links for gene regulatory networks 16 [3], where most of data is generated by perturbations that persist on long time 17 scales, measurement noise is significant, and the network is directed. In contrast, 18 conditions (i)-(iii) are satisfied to very good approximation for identifying phys- 19 ical contacts between amino acids in folded proteins [4], where single nucleotide 20 substitutions occur on much shorter time scales than significant changes in pro- with diagX a matrix with elements {X 11 , X 22 , ...} on the main diagonal and zero el- In the derivation we used that by definition diagA with I the identity matrix.

41
The lack of a unique relation between G and A is rooted in the fact that the  preprocessing steps are described elsewhere [1]. We re-arranged the original pre-26 processed data layout to make it compatible with our network inference algorithm.

27
Specifically, M-values corresponding to dye-swap experiments were multiplied by 28 -1 and only knock-out experiments corresponding to genes that were also included 29 in the chip design were kept in the data set. Since there is insufficient information about the biological processes leading to 33 variation among biological replicates (biological noise), we assume that the process 34 is equal for all genes. By bringing the wild type data to unit variance for each gene, 35 we show that the variation among the wild type data logarithmic fold changes 36 is approximately Gaussian distributed (suppl. figure 1a). By taking the mean 37 logarithmic fold change of each biological replicate (wild type data only) and then 38 calculating the differences of the technical replicates to this mean, the distribution 39 of the technical variance can be found (suppl. figure 1b). To estimate how well network inference works on data which is similar to the data 47 set provided by Kemmeren et al. [1], the input data to the inference algorithms 48 was generated as described below. In short, steady state solutions of ordinary 49 differential equations were used to generate absolute gene expression levels, which 50 were then turned into logarithmic fold changes, and measurement noise was added. We chose scale free network models with exponent 2.5 to simulate a comparable 54 network structure to yeast; the average degree was set to 2. Up to a rounding 55 error, 80 % of links were set as inhibiting. Non-linear regulatory interactions were 56 simulated by solving ordinary differential equations. First, wild type expression 57 levels were calculated by solving a system of equations of the form shown below.

58
The Hill-coefficients h ki were sampled from a uniform distribution between 1 and 59 2. The constant K ki was set to 0.5. The linear degradation term λ i was set to 2 60 to assure stability. The coefficients a ki and b ki were set to 1 and -1 respectively for 61 inhibiting links and to 0 and 1 respectively for activating links.
Here, u i denotes the basal gene expression rate. To compute the response y ij of a 63 gene i to a knock-out of gene j, the above shown system of ODEs was modified by 64 setting the basal expression rate for the knocked out gene to zero and by removing 65 all links onto and away from the gene j. For better comparability with other publications, gene expression data was also 68 generated with the program "GeneNetWeaver" [3]. This data was only used for 69 comparing inference methods using ROC curves. From the "gold standard" yeast   we excluded all Pseudogenes and dubious ORFs listed on yeastgenome.org [5].  Nodes were grouped into clusters if they were not sufficiently linearly independent.

183
The following steps were followed: 184 1. All node activities were normalized to unit reference node activity (wild type 185 data logarithmic fold change) variance.   corresponding to a certain cutoff to select significantly affected nodes. That is, for 245 the most stringent cutoff, the network size was smaller than for the less stringent 246 cutoff because less nodes were found to be significantly affected by perturbations.

247
The cutoffs correspond to false discovery rates of 10 -8 and 10 -6 .  To show that network inference methods are biased towards measurement noise, 272 we simulated node activity data for the limiting case of infinitely many replicates.

273
In that case, all parameters are estimated perfectly because uncertainty resulting Where B is a diagonal matrix with element b ii = 1 if node i is perturbed. The 277 signal-to-noise ratio was adjusted by changing the value of σ 2 .   The noise-to-signal ratio was adjusted to 10 %. Data was generated with both Gunawan [6]. To infer a link onto a certain node, the data corresponding to a 299 perturbation of that node was removed prior to calculating the covariance ma-  , where G ji = N 0 where N 0 is the number of 334 unperturbed nodes and U i is the i th eigenvector of the covariance matrix.

335
Note that links were not identified as artifacts.

336
Comparison of regularized inference methods For a fair comparison, we used For simplicity, we assumed here that the reference state is 0. Here, Y are the true node activities that are masked by measurement noise to yield observed node activities Y obs , and the matrix A T denotes the link strengths. In the steady state and under the assumption that the measurement uncertainties are Gaussian distributed, a maximum a posteriori (MAP) function of the parameters A T and U given the observed data Y obs can be defined. The logarithm of this function is: to A T ii . The L 1 Norm ("Lasso" regularization) and L 2 Norm ("Tikhonov-Miller"