Robustness and lethality in multilayer biological molecular networks

Robustness is a prominent feature of most biological systems. Most previous related studies have been focused on homogeneous molecular networks. Here we propose a comprehensive framework for understanding how the interactions between genes, proteins and metabolites contribute to the determinants of robustness in a heterogeneous biological network. We integrate heterogeneous sources of data to construct a multilayer interaction network composed of a gene regulatory layer, a protein–protein interaction layer, and a metabolic layer. We design a simulated perturbation process to characterize the contribution of each gene to the overall system’s robustness, and find that influential genes are enriched in essential and cancer genes. We show that the proposed mechanism predicts a higher vulnerability of the metabolic layer to perturbations applied to genes associated with metabolic diseases. Furthermore, we find that the real network is comparably or more robust than expected in multiple random realizations. Finally, we analytically derive the expected robustness of multilayer biological networks starting from the degree distributions within and between layers. These results provide insights into the non-trivial dynamics occurring in the cell after a genetic perturbation is applied, confirming the importance of including the coupling between different layers of interaction in models of complex biological systems.

links. Applying this description, our process corresponds to treating all the links in the gene regulatory network as "AND" operators and all the protein-metabolite interlayer links as thresholded "OR" operators.
Supplementary Note 2. Calculating the equivalent coupling strengths between gene regulatory and PPI networks Genes in the gene regulatory network and proteins in the PPI network form one-to-one interdependent pairs. Owing to the incompleteness of the data, the gene regulatory and PPI networks are only partially interdependent of each other. We use N G and N P to denote the numbers of nodes in the gene regulatory and PPI networks, respectively, and N I denotes the number of gene-protein pairs. If the interdependencies between the gene regulatory and PPI networks are random in topology without any correlations in the degrees of the connected gene-protein pairs, then the coupling strengths of the regulatory and PPI networks are q Rand G = N I /N G and q Rand P = N I /N P . In the real network cases, the connections between genes and proteins are not random, leading to a situation in which the coupling strengths are difficult to quantify. To solve this problem, we propose a method for finding equivalent coupling strengths so that the couplings between the regulatory and PPI networks can be treated as random in theoretical computations.
In the gene regulatory network, the fraction of genes that do not have corresponding proteins is p GI = (N G − N I )/N G , and these nodes form a sub-network with no connections to the PPI network. We refer to this sub-network as an independent regulatory sub-network.
Treating the independent regulatory sub-network as an isolated network and given its degree distribution, P GI (k in , k out ), we can estimate the final functional node size f GI S in the independent regulatory sub-network by substituting P GI (k in , k out ) with P Gene (k in , k out ) in Eqs.
(1) and (3) (see main text). As a result, the final functional node size is p GI * f GI S if only the nodes in the independent regulatory sub-network remain, which is equal to the case of intentionally perturbing all of the genes that are connected to the PPI network. If we convert such a perturbation strategy into random perturbations, how many nodes are required to be perturbed in the isolated gene regulatory network so that the final functional node size is p GI * f GI S ? Figure S2 shows mappings from random perturbations on 1 − p fraction of genes to the functional node sizes. With such mapping, we could find an equivalent random S3 perturbation fraction 1 − p eG , having the final functional node size p GI * f GI S . Thus, the equivalent coupling strength from the gene regulatory network to the PPI network could be approximated by q G = 1 − p eG . Similarly, treating the independent PPI sub-network as an isolated network and given its degree distribution, P PI (k), the giant connected component size is . Based on mapping of the fraction p of remaining nodes to the size of the giant connected component in the isolated PPI, as shown in Figure S3, we can find a sub-network formed by a fraction p eP of randomly chosen nodes with its giant connected component being W PI * (N P − N I )/N P . The equivalent coupling strength from the PPI to the gene regulatory network is q P = 1 − p eP .  We apply the theoretical framework to multilayer ER networks which models the failure mechanisms in the multilayer biological molecular networks, in order to show the generality of our framework. In the multilayer ER networks, a directed ER network is partially interdependent with an undirected ER network by one-to-one correspondence, and such an undirected ER network give multiple supports to another undirected ER network. Figure S4 shows the solution of the final functional node sizes of three coupled ER networks after ran-

Supplementary Note 4. Essential and cancer genes enrichment and validation
To measure the enrichment of a set of genes in the ranking obtained from the influence scores, we identify the considered genes as the true positives of a classification problem.
The influence score of a gene is thus analogous to the confidence of our model in assigning that gene to the positive class. Under this perspective, the enrichment of the influential genes can be evaluated with standard statistical tools. In this work, we considered the precision-recall curve as a meaningful performance metric in virtue of its robustness to class imbalance. The performance associated with each precision-recall curve is evaluated through the average precision score (APS), a score defined as APS = n (R n − R n−1 )P n where P n and R n are the precision and recall at the n-th threshold. A large value of APS indicates that the considered genes are more likely to have high influence scores. This S6 evaluation was performed on essential and cancer genes.
To uncover the mechanism that the essential and cancer genes are enriched in the genes with high influence scores, we calculate the number of genes becoming dysfunctional in the second round of the cascading process. The removal of each gene could cause its corresponding protein and the proteins that apart from the largest connected component losing its function, which is called the first round in the cascading failure process. If failure in the PPI network goes back to the gene regulatory network, more genes stops functioning, which could cause more proteins becoming dysfunctional. This is called the second round in the cascading failure process. We find that the removal of essential and cancer could cause more genes and proteins becomes dysfunctional in the second round, than the removal of non-essential and non-cancer genes, as shown in the Fig. 2c and 2d in the main text.
There are different definitions of essential genes. At the individual level, a gene can be defined as essential if its loss of function will lead to individual death or loss of fitness. At the population level, a gene is essential if the biological system shows intolerance to lossof-function variants [2]. Next, we will show that genes important to a system's robustness are enriched in genes with high scores of essentiality valued by different metrics. We choose three additional sets of essential genes measured by: (1) the probability of haploinsufficiency (Phi), (2) the probability of loss-of-function intolerance (pLI), and (3) the essential genes found by Dickinson et al. [3]. As Figure S7 shows, the result holds for the essential genes valued by these three different metrics. Since that the metrics for essentiality are highly correlated with one another [3], we can infer that this results also holds for other metrics not tested here.  ple compartments that map to the same chemical compound, the overall metabolite coverage of the STITCH-based metabolite network was higher than that of Recon in terms of the unique compounds to which the metabolites mapped. To assess fairly the agreement of the two networks, we focused on the edge overlap between the networks formed by the set of metabolites common to STITCH and Recon networks. In all Recon models tested, there was a statistically significant enrichment of overlapping edges (see Table S2). Notably, the number of overlapping edges was markedly increased in the latest version, Recon 3D, suggesting that STITCH networks successfully capture the improvements in the subsequent versions of Recon models and hinting at the gradual convergence of these datasets.
To compare further our approach with alternative methods that identify nodes with impact on robustness of biological systems, we used an established in silico single gene deletion approach based on flux balance analysis (FBA) [6][7][8] in which the effect of the single gene knockout on the objective value (typically the biomass, also used here as the sole objective value) is simulated. In this framework, one can keep track of the impact of the single gene knockout by optimizing the objective function post-knockout and comparing it with the original objective value. A gene is considered essential if restricting the flux of all reactions that depend on it to zero causes the objective (i.e., the growth rate) also to be either zero or below a given viability threshold. This method also applies in the same way to the deletion of reactions from the metabolic network, resulting in the identification of essential reactions. Using Python implementations of the COBRA Toolbox [4,5], we performed this analysis for the Recon3D human metabolic model. We found that in Recon3D, only one gene (BiGG ID: 54675 AT1) is truly essential, resulting in a complete depletion of metabolic fluxes with its removal ( Figure S21). In addition, we kept a record of all other genes whose deletion results in some decrease in flux in the biomass equation. We found 39 additional such genes, but since the impact of their deletion on the biomass was much smaller in comparison S8 (< 20% of unperturbed value), they would not conventionally be deemed essential ( Figure   S21). The rest of the > 2, 000 genes in Recon3D had no impact on the biomass.
To ensure that this small number of essential genes by the FBA criteria is in agreement with other human metabolic models, we ran the same analysis for all of the 107 metabolic models in the BiGG database covering over 20 organisms. The Recon models (Recon1 and Recon3D), along with the two other human metabolic models (iAB RBC 283 and iAT PLT 636) in BiGG, consistently showed the most robust response to these single gene knockouts with a very small proportion of genes and reactions deemed essential by the above criteria. This finding is in contrast to many other organisms having as high as 4% and 2% of metabolic reactions and enzyme-encoding genes, respectively, flagged as essential ( Figure S22).    pval < 1e-6 pval < 1e-6 pval < 1e-6 pval < 1e-6 pval < 1e-6 pval < 1e-6 pval < 1e-6 pval < 1e-6 pval=3.75e-02 pval=6.11e-02 pval=6.27e-02 pval=2.65e-02 Test of overlap