Introduction

Biological functions and characteristics are consequences of complex interactions between numerous components1. These components can be molecules such as DNA, RNA, proteins and other small molecules or larger units such as cells, tissues, whole organisms or entire ecosystems. These interactions are organized into a hierarchy of networks. Networks at different levels of this hierarchy have been studied extensively. For instance, at the subcellular level, transcriptional regulatory networks (TRN) model protein–DNA interactions1,2,3,4,5,6,7,8,9,10,11,12, protein–protein interaction (PPI) networks capture physical interactions between proteins6,13,14,15,16,17,18,19,20,21,22,23,24,25,26 and metabolic networks map interactions between the set of biochemical reactions in an organism1,27,28,29. Analysis of individual network layers has answered important biological questions ranging from organization of gene expression5,8,29,30,31, predicting phenotype from molecular interaction networks16,24, to understanding disease biology32,33,34,35,36.

However, biological networks do not function in isolation. These networks comprise of different types of interactions and even interact with other networks1,37. For instance, TRN and PPI networks interact with each other. Proteins are translated from genes in accordance with the regulatory program encoded in the TRN. These translated proteins interact with each other in the PPI layer. Transcription factor proteins interact with other proteins in the PPI layer and also regulate downstream genes in the TRN network. Further, PPI networks can also encode different kinds of physical interactions between proteins, such as the ones revealed by Yeast Two-Hybrid (Y2H) binary, Affinity Purification (AP) protein complexes, synthetic lethality, dosage lethality, genetic interactions, etc,14. Such multilayer networks (comprising multiple networks) can be interdependent when different network layers interact with each other to form a network of networks (NON) architecture38. For instance, the interaction between TRN and PPI networks forms an interdependent NON (Fig. 1). Alternatively, multilayer networks can be multiplex with different networks, which encode distinct types of interactions between the same molecular species such as the different types of PPI interactions.

Until recently, network science has focused largely on the study of individual biological networks. Even some of the studies that worked with multiple networks aggregated or integrated the different networks and did not consider a multilayer approach39,40,41,42. This could partly be attributed to the fact that multilayer networks have gained popularity only in recent years, especially in statistical physics38. Now, extensive work has been done to study robustness properties of multilayer networks43,44,45,46,47,48,49,50,51,52,53. Counter-intuitively, interdependent networks are more fragile to random failure than independent individual networks43. Real interdependent networks mitigate this vulnerability by means of specific intra- and interlayer degree–degree correlation or coupling46,51. For a given TRN–PPI interdependent (or multiplex) network (Fig. 1A), degree–degree coupling (\(C_D\)) is quantified as the correlation between the connectivity of a protein in the PPI network, K, and the connectivity of its corresponding gene in the TRN, either in-degree (\(k_{in}\), number of regulations incident on a gene from upstream transcription factors), out-degree (\(k_{out}\), number of downstream genes regulated by a transcription factor), or total degree (\(k = k_{in}+k_{out}\)). In this case, \(C_D\) can be negative, positive, or zero. Particularly, positive \(C_D\) makes the multiplex robust to attack (Fig. 1A)54,55. With positive \(C_D\), hub nodes are likely to be hub nodes in all the network layers (Fig. 1A, top). For negative \(C_D\), hub nodes in one layer are dependent on spokes in the other network layer. For zero \(C_D\), hubs and spokes are randomly dependent on each other across network layers (Fig. 1B, bottom). Here, we investigate \(C_D\) across species and assess whether it is positive, negative, or uncoupled, and if there exists a global trend across various species.

Figure 1
figure 1

Degree–degree coupling and redundancy as potential modulators of robustness in molecular multiplexes. Degree–degree coupling (\(C_D\)) and redundancy coupling (\(C_R\)) can assume any value in the multiplex of TRN (yellow layer) and PPI (green layer) networks across species. (A) (Right, Top) For \(C_D > 0\), highly connected genes in TRN (red spheres) are more likely to produce hubs or highly connected proteins in PPI (red spheres), while sparsely connected genes in TRN (blue spheres) are highly likely to produce spokes or sparsely connected proteins in PPI (blue spheres). (Right, Bottom) For \(C_D = 0\), TRN and PPI will be uncoupled in the multiplex, and the association between genes and proteins would be randomized. Here, highly connected genes produce spoke proteins and vice-versa sparsely connected genes produce hub proteins. (Left) Based on theoretical studies, \(C_D\) is expected to be positively correlated with robustness. Robustness is quantified by area under the attack curve for the Mutually Connected Giant Component (MCGC) (“Methods”). (B) (Left, Top) For \(C_R > 0\), there will be non-zero number of edges which are simultaneously present in both TRN and PPI (edges marked green). (Left, Bottom) For \(C_R = 0\), TRN and PPI will not have common edges (no green edges). (Right) Based on theoretical studies, \(C_R\) is expected to be positively correlated with robustness. Black directed edges represent regulation of downstream genes by transcription factors in TRN. Black undirected edges represent protein–protein interactions in PPI. Dashed edges represent correspondence between a gene and its corresponding protein.

Edge overlap or redundancy between network layers also mitigates vulnerability in interdependent networks56. Two genes in the TRN network have an interaction between them if one is the transcription factor for the other. While in PPI networks, interaction between two proteins depicts physical or functional interaction between these proteins. We define multiplex redundancy as similarity in interactions between the TRN and PPI networks. We quantify redundancy (\(C_R\)) by counting the number of common interactions simultaneously in both the TRN and PPI networks. If TRN and PPI networks are represented as graphs, then \(C_R\) can be measured by counting the number of edges which are simultaneously present in both the graphs (Fig. 1B and “Methods”). Higher redundancy makes the multiplex more robust to attack (Fig. 1B).

We study the multilayer network of TRN and PPI networks in nine different species, namely H. pylori, M. tuberculosis, E. coli, S. cerevisiae, C. elegans, D. melanogaster, A. thaliana, M. musculus and H. sapiens, spanning two domains of life (bacteria and eukaryotes). We collected TRN networks from nine different sources, and 16 different PPI networks from five different sources (see “Methods” and Supplementary Table S1). Two of the PPI sources are publicly curated databases–BioGRID14 and HINT57. Interlayer connectivity is defined by one-to-one correspondence between a gene and its corresponding protein. Therefore, this multilayer network can be reduced to an equivalent multiplex network43. Henceforth, we call the TRN–PPI multilayer network a TRN–PPI multiplex. Based on quality control, five (S. cerevisiae, C. elegans, D. melanogaster, M. musculus and H. sapiens) of the nine species were used for further analysis; multiplexes with visually continuous percolation curves representing second-order like behavior were studied. Here, we show that for species TRN–PPI multiplexes, positive \(C_D\) increases robustness to targeted attack on the genes and proteins. Further, increasing \(C_R\) also increases robustness. We find a trade-off between robustness and independence. Independent multiplexes with no degree–degree coupling and redundancy are highly vulnerable, while positively degree–degree coupled and highly redundant multiplexes are highly robust. We show that this trade-off exists for different species individually. Multiplex coupling is also correlated with the distribution of functionally important genes and proteins such as essential, disease and pathogen-interacting genes and proteins. These functionally important genes are selectively situated in redundant and essential parts of the multiplex and, consequently, are vulnerable. Interlayer degree–degree coupling and redundancy offer design mechanisms for tuning robustness in molecular multiplex networks.

Results

TRN–PPI multiplex is non-randomly coupled across species

We study coupling between TRN and PPI networks using two multiplex properties—degree–degree coupling (\(C_D\)) and redundancy coupling (\(C_R\)). \(C_D\) is quantified by Pearson’s or Spearman’s rank correlation between PPI degree, K, and TRN out-degree, \(k_{out}\) (Fig. 2A–D, “Methods”). We quantify \(C_R\) by counting the total number of interactions present simultaneously in both TRN and PPI networks (Fig. 1B, “Methods”). We find that \(C_D\) is significantly positive across different eukaryotes (Fig. 2A–D and Supplementary Figure S1); \(C_D\) values are statistically significant as evident from the p-values (two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) and z-scores for comparison against the null model (Fig. 2A–D and Supplementary Figures S19 and S20). \(C_R\) is also non-randomly positive across different eukaryotes (Fig. 2E–H and Supplementary Figure S2). Node-specific (for each gene–protein pair in the multiplex) \(C_R\) values are more long-tailed compared to a randomly shuffled null model. Shuffled null model is generated by randomly shuffling labels on genes in TRN, while keeping protein labels fixed in PPI.

Figure 2
figure 2

TRN–PPI multiplex is coupled across species. Scatter plot for degree (K) of proteins in the PPI network versus out-degree (\(k_{out}\)) of genes in the TRN for eukaryotes, (A) yeast, (B) fly, (C) mouse and (D) human. Both K and \(k_{out}\) values have been log-transformed after adding 1. The PPI networks are from the BioGRID database (“Methods”). Linear interpolated fits between K and \(k_{out}\) are also shown (green line) with 95% confidence region shaded in gray. Degree–degree coupling (\(C_D\)) values and corresponding p-values (two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) are also shown. Distribution of redundancy (\(C_R\)) for gene–protein pairs for species multiplex and the randomly shuffled null model for eukaryotes, (E) yeast, (F) fly, (G) mouse and (H) human. For each gene–protein pair, \(C_R\) is quantified by the number of redundant edges incident on that gene–protein pair. Shuffled null model is generated by randomly shuffling labels on genes in TRN, while keeping protein labels fixed in PPI. Jensen Shannon divergence (JSD) between distributions of \(C_R\) in organismal and shuffled multiplexes is also given in (EH).

Figure 3
figure 3

Multiplex attack curves for eukaryotes. Relative size of the mutually connected giant component (MCGC) is plotted as a function of the fraction of gene–protein pairs attacked and removed from the multiplex (“Methods”). Attack curves are shown for five species using PPI networks from either BioGRID or HINT databases (“Methods”); database for a given panel are annotated next to the species name. Along with the attack curves for the species (red), attack curves for the Zero-Coupling–Zero-Redundancy (Supplementary Methods) null model are also shown (gray). Under this null model, multiplexes have no degree–degree coupling and no redundancy. On average, species attack curves are more robust than the null model. Robustness is quantified by area under the curve. Attack on the multiplex (organism or null) is repeated 1000 times.

Multiplex degree–degree and redundancy couplings modulate robustness

We study the relationship between multiplex degree–degree (\(C_D\)) and redundancy (\(C_R\)) couplings and robustness (R) across species and for individual species. We use a previously reported formalism to study topological robustness of the TRN–PPI multiplex under targeted attack on its nodes43,58 (“Methods”). Robustness is related to the size of the mutually connected giant component (MCGC). MCGC is defined as the largest connected component between both layers of the multiplex (“Methods”). Buldyrev et al.43 and Kleinberg et al.58 track the size of MCGC under attack to quantify robustness. We specifically focus on targeted attack. Under targeted attack, at each step of the attack, gene–protein pairs are removed in decreasing order of multiplex degree, \(K_{mult}(i)=max(K(i), k_{out}(i))\)58, where \(K_{mult}(i)\) is the multiplex degree for gene–protein pair i, K(i) is the degree of protein i in the PPI network and \(k_{out}(i)\) is the out-degree of gene i in the TRN network (“Methods”). Absolute robustness is then measured by tracking the relative size of MCGC (MCGC divided by number of gene–protein pairs in the multiplex) as we successively remove gene–protein pairs from the multiplex. Figure 3 and Supplementary Figure S3 show the relative size of MCGC as a function of the fraction of gene–protein pairs removed during targeted attack for all the species. We call the curve for MCGC the “attack curve”. Absolute robustness is quantified by area under the attack curve (we will call this area RobustArea) (“Methods”). Large RobustArea implies large robustness for a given multiplex, and vice versa small RobustArea means low robustness. Cohen’s d59 is used to quantify effect size for robustness by comparing RobustArea for a given multiplex against an appropriate null model (“Methods”). This quantity is used as an estimate of robustness (R) in this work. We use a Zero-Coupling-Zero-Redundancy null model (see Supplementary methods). Under this null model, we generate multiplexes with \(C_D\) and \(C_R\) fixed to zero.

Attack curves for different eukaryotic species are shown in Fig. 3. Visually, we see that organismal RobustArea values are larger than the null model on average (except for yeast with HINT PPI network). This is quantified in Fig. 4A, where \(C_D\) and R are positively correlated across different eukaryotic species with Spearman’s correlation coefficient of 0.68 (p-value = 0.044, two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)), and \(C_R\) and R are positively correlated across different eukaryotic species with Spearman’s correlation coefficient of 0.72 (p-value = 0.037, two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)). We also quantify the dependence of R on \(C_D\) and \(C_R\) for individual species (Fig. 4B,C). For each species, we sample a subset of the TRN–PPI multiplex such that the sampled multiplex has specific values of \(C_D\) and \(C_R\) (see Supplementary Methods). This sampling is repeated 1000 times. Attack curves and RobustArea are then computed for the sampled multiplexes. Therefore, for a given combination of \(C_D\) and \(C_R\), we get a distribution of RobustArea, and R can be calculated. For each species, we explore \(C_D\) and \(C_R\) values over a grid (Fig. 4B). Changing both \(C_D\) and \(C_R\) increases R (Fig. 4B and Supplementary Figure S6). For all the species, multiplexes with the highest \(C_D\) and \(C_R\) values exhibit maximum R (Fig. 4B and Supplementary Figure S6). Further, for a fixed value of \(C_R\), R increases with \(C_D\). Similarly, for fixed \(C_D\), R also increases with \(C_R\). The effect of \(C_R\) is stronger than \(C_D\), which is evident for the human multiplex, where R is high for high \(C_R\) irrespective of \(C_D\). The independent effect of \(C_D\) and \(C_R\) on R is shown in Fig. 4C.

Figure 4
figure 4

Degree–degree and redundancy couplings modulate multiplex robustness. (A) Robustness to targeted attack (R) (right) is positively correlated with degree–degree coupling (\(C_D\)) (left) across species with Spearman’s rank correlation coefficient of 0.6803 (p-value = 0.04372). Robustness to targeted attack (R) (right) is also positively correlated with redundancy coupling (\(C_R\)) (center) across species with Spearman’s rank correlation coefficient of 0.7167 (p-value = 0.03687). \(C_R\) is computed as a z-score against the null model (“Methods”) (B) We sample a subset of gene–protein pairs from the species multiplexes (sizes for the subsets are: S. cerevisiae-1000, D. melanogaster-2000, M. musculus-300, H. sapiens-500) with specific \(C_D\) and \(C_R\) values. We repeat the sampling 100 times. We explore \(C_D\) and \(C_R\) over a grid. For each point over the 2D grid, the heatmap shows the robustness (R) value. R is computed by comparing RobustArea of any point over the grid against the lower-left point of the grid (with \(C_D\) = 0, \(C_R\) = 0). For each point over the grid, for each of the sampled subsets, targeted attack is performed. Mean R values are shown here. Lower and upper 95% confidence interval (CI) values are shown in Supplementary Figure S5. \(C_{D}^{out}\): degree–degree coupling between \(k_{out}\) and K, \(C_{D}^{in}\): degree–degree coupling between \(k_{in}\) and K. (C) (Left) Dependence of robustness on degree–degree coupling while redundancy is fixed. For each species curve, R is calculated by comparison against the sampled multiplex with \(C_D = 0\). (Right) Dependence of robustness on redundancy coupling while degree–degree coupling is fixed to the value in the full multiplex. For each species curve, R is calculated by comparison against the sampled multiplex with \(C_R = 0\). In all the panels, error bars show 95% CIs. In panel (B), BioGRID PPI networks are used; results with HINT PPI networks are shown in Supplementary Figure S6B. For panel C (left), \(C_R\) is set equal to the number of redundant edges in the multiplex.

Figure 5
figure 5

Functionally important genes and proteins are redundant and essential. (A) (Left) Essential genes and proteins have lower robustness (R) to targeted attack across species against RanDP-RZ null model. (Right) Lower R is accompanied by higher redundancy (\(C_R\)) for essential genes and proteins against RanDP-RZ null model. Panels (BD) show results for the human multiplex. (B) (Left) Pathogen-related genes and proteins have lower R to targeted attack across pathogens against RanDP-RZ null model. (Right) Lower R is accompanied by higher \(C_R\) for pathogen genes and proteins against RanDP-RZ null model. (C) (Left) Disease-related genes and proteins have lower R to targeted attack across diseases against RanDP-RZ null model. (Right) Lower R is accompanied by higher \(C_R\) for disease genes and proteins against RanDP-RZ null model. (D) (Left) Oncogenes (tumor suppressor genes (TSGs)) and proteins have lower R to targeted attack against RanDP-RZ null model. (Right) Lower R is accompanied by higher \(C_R\) for oncogenes (TSGs) and proteins against RanDP-RZ null model. In all the panels, database used for PPI networks is annotated on the y-axis (“Methods”). In all the panels, relative robustness (R) is measured against RanDP-RZ null model (“Methods”), and \(C_R\) values are calculated as the difference in redundancy against RanDP-RZ null model. Results against RanDP and RanDP-no\(C_D\) null models are shown in Supplementary Figures S42 and S41, respectively. Error bars show 95% CIs. For all the panels, \(C_R\) is calculated as the mean difference in the number of redundant edges between the species’ multiplex and the null model.

Across species, we find that \(C_D\), \(C_R\) and R are positively correlated with the number of gene–proteins in the multiplex (Supplementary Figure S4). To assess whether across species correlation between \(C_D\) (\(C_R\)) and R is simply an artifact of the difference between the sizes of the multiplexes or not, we sample two subsets of different sizes for yeast, fly and human (Supplementary Figures S16S18). For each of these species, we find that the larger sized subset has a larger robustness while keeping \(C_R\) and \(C_D\) fixed (Supplementary Figure S16). However, despite this dependence of R on multiplex size, dependence on \(C_D\) and \(C_R\) can still be assessed by comparing the two subsets (Supplementary Figures S17 and S18). This suggests that the correlations with R seen in Fig. 4A,C are indeed because of \(C_D\) and \(C_R\) in addition to the dependence on the number of gene–protein pairs in the multiplex.

We also considered a configuration null model–Multiplex-Configuration (see Supplementary Methods)– to assess multiplex robustness. Briefly, under this null model, gene and protein degrees in TRN and PPI, respectively, are fixed, while edges are randomly shuffled. Additionally, the one-to-one correspondence between genes in TRN and proteins in PPI is fixed as well. Multiplex-Configuration preserves \(C_D\), while randomly shuffling \(C_R\). With the Multiplex-Configuration model, robustness of species multiplex is less pronounced compared to the Zero-Coupling-Zero-Redundancy model (see Supplementary Figures S21 and S22); two of the multiplexes (C. elegans HINT and yeast HINT) are even less robust than the null model. However, these results do not contradict our previous conclusions regarding TRN–PPI multiplexes. Zero-Coupling-Zero-Redundancy and Multiplex-Configuration null models answer different questions. The former investigates, given the TRN and PPI networks, whether the one-to-one correspondence between genes in TRN and proteins in PPI creates non-random coupling between the two network layers. Whereas, the latter null model quantifies the impact of TRN and PPI network structures on redundancy coupling. In this work, we specifically focus on the coupling between TRN and PPI created by the one-to-one correspondence between their nodes, and not on the impact of individual network structures.

Essential genes and proteins are essential for multiplex robustness

We collected essential and non-essential genes for three species (yeast, fly and human) from the Online GEne Essentiality (OGEE) database60,61. To gauge importance of essential genes for the multiplex, we selectively attack essential genes and proteins in decreasing order of multiplex degree. Partial attack curves are generated by successively attacking essential genes in decreasing order of multiplex degree. The attack process is halted once all the genes in the set of essential genes have been removed (“Methods”). RobustArea is quantified by computing area under such partial attack curves. R is calculated by comparison against three random null models–Random Degree Preserving (RanDP), Random Degree Preserving-Redundancy Zero (RanDP-RZ) and Random Degree Preserving-no \(\pmb {C_D}\) (RanDP-no\(C_D\)). Steps to generate random subsets from these models are given in Supplementary Methods. Briefly, RanDP generates subsets by matching network degrees in the random subsets to essential gene–protein pairs in the multiplex. RanDP-RZ further ensures that \(C_R\) is zero in the sampled subsets. Both RanDP and RanDP-RZ preserve \(C_D\). On the other hand, RanDP-no\(C_D\) matches degree distributions rather than individual gene or protein degrees; this does not preserve \(C_D\). RanDP-no\(C_D\) perfectly matches all the degree distributions to essential genes and proteins (Supplementary Figures S38S40). However, RanDP (Supplementary Figures S32S34) and RanDP-RZ (Supplementary Figures S35S37) do not perfectly match all the degree distributions.

For all the null models, essential genes and proteins are more vulnerable than random genes and proteins for yeast and human multiplexes (Fig. 5A, Supplementary Figures S41 and S42). Fly essential genes are more vulnerable only against the RanDP-RZ model (Fig. 5A). Vulnerability of essential genes in a species multiplex is greater than dictated by either TRN or PPI networks. We establish this by performing targeted attack on TRN and PPI networks and comparing against attack on the multiplex (Supplementary Figures S23S31). For the human multiplex, essential genes are topologically important in PPI and TRN networks as well (Supplementary Figures S23S28). In the fly multiplex, essential genes are more and less vulnerable than random genes and proteins in TRN and PPI, respectively (Supplementary Figures S23S28). Essential genes and proteins in the yeast multiplex are equally/less and more vulnerable than random genes and proteins in TRN and PPI, respectively (Supplementary Figures S23S28). However, comparing TRN and PPI attack curves against null model attack curves for the multiplex shows that attack on the multiplex is more lethal than attack on the individual networks (Supplementary Figures S29S31). Moreover, yeast and human essential genes are situated in highly topologically important parts of the multiplex, which is evident by higher vulnerability compared to random genes and proteins (Supplementary Figures S29S31). Fly essential genes and proteins appear topologically more important than random genes and proteins only against the RanDP-RZ null model (Supplementary Figures S29S31). Collectively, these results show that essential genes and proteins are topologically essential in species multiplex, and this importance is not trivially dependent on the indepdendent relevance in TRN and PPI.

In the human multiplex, higher \(C_R\) of essential genes compared to random genes co-occurs with lower R (Fig. 5A, Supplementary Figures S41 and S42). This suggests that redundancy might play a role in the topological importance of essential genes. We also study the impact of \(C_R\) on robustness by sampling subsets of genes and proteins (size 100) in the human multiplex (Supplementary Figure S15). As the redundancy of the sampled genes and proteins increases, robustness decreases against a random set of genes. Thus, redundancy might control selective placement of a subset of genes and proteins in important parts of the multiplex. For yeast and fly, such correlation between \(C_R\) and R is only seen against the RanDP-RZ model (Fig. 5A). This implies that \(C_R\) might not be the only property controlling topological importance of essential genes.

We conclude that attacking essential genes breaks the multiplex faster than attacking a random set of genes, which shows that essential genes and proteins are situated in a highly important part of the multiplex.

Pathogen- and disease-related genes and proteins are situated in essential parts of the human multiplex

Pathogen

We collected human-pathogen protein–protein interaction data for 13 different pathogens. Data for 12 of these 13 pathogens was collected from a publicly available database, HPIDB 3.062,63. This is a curated database which currently contains 69,787 unique protein interactions between 66 host and 668 pathogen species. We studied interactions for 12 different human pathogens from HPIDB 3.0 (Fig. 5B). Besides these 12 pathogens, we also included human-pathogen protein interactions for various human coronaviruses (HCoVs). We collected a list of 119 human proteins which interact with various HCoVs64. Therefore, in total we have 13 pathogens in our analysis.

Similar to essential genes, we assess topological relevance of pathogen-related genes and proteins using RanDP, RanDP-RZ and RanDP-no\(C_D\) null models. For all the pathogens, except Zika, targeted attack on the pathogen-related genes and proteins makes the multiplex highly vulnerable against all the null models (Fig. 5B and Supplementary Figures S41 and S42). As with essential genes, we also gauged the topological relevance of pathogen-related genes and proteins in TRN and PPI independently. Pathogen-related genes are highly vulnerable to attack and topologically essential in PPI (Supplementary Figures S26S28). In TRN, majority of the pathogens exhibit a similar behavior (Supplementary Figures S23S25). Similar to essential genes, pathogen-related genes are more vulnerable in the multiplex than in TRN or PPI independently (Supplementary Figures S29S31).

For all the pathogens with high vulnerability to targeted attack, this vulnerability co-occurs with higher \(C_R\) for the pathogen-related genes and proteins compared to a random set of genes (Fig. 5B and Supplementary Figures S41 and S42). Further, given our simulations (Supplementary Figure S15), this suggests that higher redundancy makes the pathogen-related genes and proteins highly important for the human multiplex. R and \(C_R\) for pathogen-related genes and proteins are negatively correlated with Spearman’s rank correlation of − 0.82 (p-value = \(2.025\times 10^{-6}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)), − 0.87 (p-value = \(1.873\times 10^{-6}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) and − 0.86 (p-value = \(1.898\times 10^{-6}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) against RanDP, RanDP-RZ and RanDP-no\(C_D\) null models, respectively (Supplementary Figures S7S9). Even after controlling for the different number of pathogen-related gene–protein pairs for different pathogens, R and \(C_R\) are negatively correlated with Spearman’s rank correlation of − 0.71 (p-value = \(6.706\times 10^{-5}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)), − 0.85 (p-value = \(1.905\times 10^{-6}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) and − 0.85 (p-value = \(1.908\times 10^{-6}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) against RanDP, RanDP-RZ and RanDP-no\(C_D\) null models, respectively (Supplementary Figures S7S9). In agreement with this conclusion, pathogen-related genes and proteins are significantly enriched in the human multiplex (Supplementary Figure S13).

Disease

We collected disease-related genes from a publicly available database, DisGeNET65,66,67. The current version (v6.0) contains gene-disease associations between 17,549 genes and 24,166 diseases, disorders, traits, and clinical or abnormal human phenotypes. We collected disease-gene associations for diseases which have at least 100 genes (with the HINT PPI network) in the human multiplex considered in this work. After this filtering, we retain 24 diseases in our analysis (Fig. 5C).

We assess topological relevance of disease-related genes and proteins using RanDP, RanDP-RZ and RanDP-no\(C_D\) null models. For most of the diseases, targeted attack on the disease-related genes and proteins makes the multiplex highly vulnerable against all the null models (Fig. 5C and Supplementary Figures S41 and S42). We also gauged the topological relevance of disease-related genes and proteins in TRN and PPI independently. Disease-related genes are highly vulnerable to attack and topologically essential in PPI (Supplementary Figures S26S28). In TRN, a majority of the diseases exhibit a similar behavior (Supplementary Figures S23S25). Similar to essential and pathogen-related genes, disease-related genes are more vulnerable in the multiplex than in TRN or PPI independently (Supplementary Figures S29S31).

For all the diseases with high vulnerability to targeted attack, this vulnerability co-occurs with higher \(C_R\) for the disease-related genes and proteins compared to a random set of genes (Fig. 5C and Supplementary Figures S41 and S42). Further, given our simulations (Supplementary Figure S15), this suggests that higher redundancy makes the disease-related genes and proteins highly important for the human multiplex. R and \(C_R\) for disease-related genes are negatively correlated with Spearman’s rank correlation of − 0.30 (p-value = 0.03986, two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)), − 0.7129 (p-value = \(6.898\times 10^{-8}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) and − 0.608 (p-value = \(7.29\times 10^{-6}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) against RanDP, RanDP-RZ and RanDP-no\(C_D\) null models, respectively (Supplementary Figures S10S12). Even after controlling for the different number of disease-related gene–proteins pairs for different diseases, R and \(C_R\) are negatively correlated with Spearman’s rank correlation of − 0.35 (p-value = 0.01475, two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)), − 0.68 (p-value = \(3.131\times 10^{-7}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) and − 0.56 (p-value = \(4.589\times 10^{-5}\), two-tailed z-test using Fisher’s z-transformation for \(\alpha = 0.05\)) against RanDP, RanDP-RZ and RanDP-no\(C_D\) null models, respectively (Supplementary Figures S10S12). In agreement with this conclusion, disease-related genes and proteins are significantly enriched in the human multiplex (Supplementary Figure S13).

We also collected the set of genes that contain mutations which have been causally implicated in cancer from the Network of Cancer Genes (NCG) database68. NCG also includes information on whether a given cancer gene is an oncogene or a tumor suppressor gene (TSG). As before, we conduct robustness analysis on the set of oncogenes and TSG against the three null models. We find that targeted attack on oncogenes (or TSGs) makes the multiplex highly vulnerable (Fig. 5D and Supplementary Figures S41 and S42). These sets are more vulnerable in the multiplex than in TRN or PPI independently (Supplementary Figures S23S31). Multiplex vulnerability co-occurs with higher \(C_R\) for the set of oncogenes (or TSGs) (Fig. 5D and Supplementary Figures S41 and S42).

Discussion

Recently, robustness properties of different biological multiplex and multilayer networks have been studied. These include brain networks51,58, multiplex of PPI interactions58, and a TRN-metabolic multilayer network69. However, these studies have limitations, which are described next. Kleineberg et al.58 do not draw broader conclusions about the organization of PPI multiplex across different species. Further, the different layers encode different types of interactions between proteins. This framework does not capture interaction between different types of molecules. Their conclusions are based on a generative model of network growth. This makes the results contingent on the accuracy of the generative model. This generative model is based on geometric principles and does not incorporate biological motivations or mechanisms. Klosik et al.69 study robustness under random failure of a TRN-metabolic multilayer network. The study only focuses on E. coli and there are no species wide comparisons. Moreover, they do not study the dependence of robustness on degree–degree coupling and redundancy. Another recent study focuses on the interdependent or multiplex network of TRN–PPI-metabolic networks in human70. Liu et al. 201970 show that this multiplex is more robust than an uncoupled or shuffled multiplex. They also showed that essential and cancer genes are preferentially arranged in essential parts of the multiplex. However, they do not study the dependence of robustness on multiplex properties. Further, there is no cross-species analysis.

This study bridges the gap between theoretical developments and sub-cellular multilayer networks of molecular interactions. The central goal of our work is to investigate the organization and traits of molecular multiplexes. We focus our attention on the multiplex of TRN and PPI networks across five different eukaryotes. Our analysis spans five different TRN and 9 different PPI networks. We show that degree–degree coupling and redundancy are universal principles that shape robustness of the multiplex. Both are independent modulators of robustness. Though maximum robustness is achieved for a degree–degree coupling of 1 and a completely redundant multiplex, the observed species multiplexes have low absolute degree–degree coupling and redundancy. This suggests that robustness is not the only evolutionary pressure shaping the TRN–PPI multiplex. Independence might be a countering force to robustness. One possible explanation for low absolute degree–degree coupling, redundancy and hence robustness could be an inability to tune degree–degree coupling and redundancy independently. Redundancy and degree–degree coupling are positively correlated across species (Supplementary Figure S14). Multiplexes in nature might be tuning degree–degree coupling and redundancy in unison. Therefore, increasing robustness would increase redundancy as well. If redundancy were high, both TRN and PPI layers would be encoding similar interactions and the amount of unique information captured by the multiplex would be low71. Species multiplexes might have an upper bound on redundancy which could explain the low absolute values for robustness.

Robustness, independence and redundancy are only some of the pressures which might affect the structure of TRN–PPI multiplex across the domains of life. Other topological factors might possibly be involved. For instance, theoretical studies have established that controllability72 and navigability73 both depend on multiplex structure. Further, these results have been confirmed in macroscale networks72,73. For multiplex networks, with one-to-one correspondence between the nodes in the two layers, controllability decreases with increasing degree–degree coupling72. This means that the TRN–PPI multiplex might become less controllable at high degree–degree coupling, and more genes and proteins will need to be controlled to steer the multiplex towards a desired state. Navigability is negatively affected by redundancy as well. As the number of overlapping edges, and hence redundancy, increases, navigability decreases73. Navigability is quantified by two different metrics; maximum entropy of trajectories explored by a random walker over the multiplex, and uniformity in the steady state probability distribution of node occupation under random walks over the multiplex. Maximum entropy decreases and probability distribution of node occupation becomes more heterogeneous with increasing edge overlap. At high redundancy or edge overlap, low maximum entropy will mean that a random walker can only explore a limited set of trajectories, and highly heterogeneous steady-state distributions will lead to unbiased occupancy among nodes. Therefore, controllability and navigability might exert countering pressures to robustness in shaping the structure of the TRN–PPI multiplex. Disassortative mixing is another important property of molecular networks74. Individual network disassortativity might interact with multiplex coupling to create higher order effects, where degrees of neighbors in individual networks might be coupled in the multiplex. Such higher order coupling may have additional impact on robustness and other multiplex properties.

We have identified degree–degree coupling and redundancy as two modulators of TRN–PPI multiplex robustness across five different eukaryotes. These modulators can potentially be tuned to control robustness in naturally existing TRN–PPI multiplexes. Further, we can even custom design synthetic TRN–PPI multiplexes to have desired robustness values. For instance, if robustness is the desired property for a set of genes, the multiplex could be rewired such that protein hubs are also highly regulated transcriptionally. On the other hand, if independence between TRN and PPI layers is the desired behavior, degree–degree coupling and redundancy can be reduced synthetically. In principle, similar ideas can be extended to multiplexes comprising different types of molecular species, for example, protein coding mRNA, miRNA and protein-binding mRNA. The results of this study can be easily extended to other molecular multiplexes and can inform the design of novel multiplexes with different molecular species to achieve a desired biological function.

Besides the global design principles for multiplex organization, we have also shown that functionally important genes and proteins have a distinct distribution over the TRN–PPI multiplex. Essential, disease- and pathogen-related genes and proteins are preferentially situated in essential parts of the multiplex. This topological placement is dictated by redundancy. Attack on these functionally important genes quickly dismantles the multiplex. For diseases and pathogens, this suggests that these diseases and pathogens might have evolved with the human multiplex and preferentially interact with the vulnerable genes and proteins. Thus, multiplex framework can be useful in the study of disease evolution. Network analysis has previously been used for repurposing existing drugs75. We believe that our multiplex approach might help in better identification of drug targets, since a multiplex better captures the complexity of the underlying molecular networks. Therefore, multiplex framework might have application in network medicine.

One limitation of any network analysis of molecular interactions is incomplete data. This problem is further compounded due to partial overlap between the observed TRN and PPI networks. However, it has been shown previously that if the size of the incomplete network is above a certain threshold such that a giant component exists, incomplete networks are representative of the complete network76. This suggests that our analysis is representative of the complete species multiplexes. Addition of more genes and proteins in the multiplex might change the specific values of \(C_D\), \(C_R\) and R, however the general dependence between robustness and degree–degree coupling and redundancy will hold. Further, we have not incorporated information about isoform proteins in this study. However, it is straightforward to include such information. In the presence of isoforms, the correspondence between TRN genes and PPI proteins will be one-to-many rather than one-to-one.

Quantifying structure and topology in complex biological networks has been actively researched within network biology. Design principles include the universality of scale-free networks1,77 (from metabolic networks across species28 and gene regulatory networks5,8,13,30,78,79, to power grids and the internet80), lethal deletions in the hubs of yeast16, disassortative mixing in molecular networks74, and the existence of sub-modules and reoccurring network motifs9,12. Network biology has only recently investigated the influence of multilayered multiplex networks in comparison to single network layers in isolation51,58,69,70,71,81. This study contributes to the understanding of internetwork connectivity in layered molecular interaction networks. It is the first to compare TRN–PPI multiplexes across species. We discover global trends across species with degree–degree coupling and edge redundancy positively correlated with increased robustness. Robustness is explored in the context of the TRN–PPI multiplex and is proposed as one of the selective pressures by which evolution has shaped internetwork connectivity, degree–degree coupling and redundancy. The design principles presented here may be useful for the future design and understanding of multiplex networks and to improve efficacy for targeting specific gene subgroups, e.g. in disease. This research presents a multiplex framework for additional investigations of design principles in interlayered biological networks.

Methods

Data

Networks

We compiled network data for nine different species. These nine species span two domains of life, namely bacteria and eukaryotes. There are three bacteria—H. pylori, M. tuberculosis and E. coli—and six eukaryotes—S. cerevisiae, C. elegans, D. melanogaster, A. thaliana, M. musculus and H. sapiens. For the eukaryotes, we collected three datasets—one TRN and two PPI networks. Among the bacteria, E. coli also has three datasets (one TRN and two independent PPI networks), while H. pylori and M. tuberculosis have one TRN and one PPI networks each. These datasets have been collected from diverse sources (see Supplementary Table S1). We use PPI data from multiple published sources—species-specific publications15,18,24, BioGRID database14 and HINT database57. Different experimental methods uncover different information about PPI networks26. Therefore, we only use PPI networks inferred from Yeast two-hybrid (Y2H) experiments82. Y2H infers binary protein–protein interactions and is a prominent strategy for identifying protein–protein interactions83. BioGRID and HINT do not have data for H. pylori and M. tuberculosis. PPI networks for these bacteria were collected from individual publications, Häuser et al.15 and Wang et al.24 respectively. E. coli only exists in HINT. We include another published PPI network for E. coli18. A consolidated database of TRN networks across species does not exist. Therefore, we collected protein–DNA interactions from different publications. References for all the species are given in Supplementary Table S1. Available TRN and PPI networks are incomplete. Consequently, they only contain a fraction of the total number of possible genes and proteins in the genome and proteome, respectively. Further, TRN and PPI networks used in this study have different numbers of genes and proteins (see Supplementary Table S1, Supplementary Information Additional File 2). For a given species, we have only considered genes and proteins which are present in both TRN and PPI networks in our analysis.

The following characteristics of the TRN and PPI networks used in this study are included in Supplementary Table S1—Number of genes and proteins, % proteome coverage in the TRN–PPI multiplex (fraction of the total proteome covered in the multiplex), number of network edges (edges represent connections between genes and proteins in TRN and PPI networks respectively), average degrees (average K in PPI and average \(k_{in}\) or \(k_{out}\) in TRN) and size of the Largest Connected Component (LCC) (subset of genes/proteins in TRN/PPI where every gene/protein is reachable from every other gene/protein) are shown for all nine species.

Essential genes

List of essential genes was collected for three species (yeast, fly and human) from the Online GEne Essentiality (OGEE) database60,61. The database has gene essentiality information on 48 species.

Pathogen-related genes

We collected human-pathogen protein–protein interaction data for 12 different pathogens from a publicly available database HPIDB 3.062,63. This curated database contains 69,787 unique protein interactions between 66 host and 668 pathogen species. Human-pathogen protein interactions for various human coronaviruses (HCoVs) were collected from a recently published paper64. In total, we analyzed pathogen-related gene–protein pairs for 13 pathogens.

Disease-related genes

We collected disease-related genes from a publicly available database DisGeNET65,66,67. The current version (v6.0) contains gene-disease associations between 17,549 genes and 24,166 diseases, disorders, traits, and clinical or abnormal human phenotypes. We collected disease-gene associations for diseases which have at least 100 genes in the human multiplex considered in this work.

Oncogenes and tumor suppressor genes

We collected the set of genes which contain mutations which have been causally implicated in cancer from the Network of Cancer Genes (NCG)68. NCG also includes information on whether a given cancer gene is an oncogene or TSG.

Multiplex formulation of transcriptional regulatory and protein–protein interaction networks

TRN and PPI networks are modeled as interdependent networks (Fig. 1A). TRN layer encodes the transcriptional program for producing proteins from genes. The proteins translated from the TRN layer participate in protein–protein interactions in the PPI layer. There is one-to-one correspondence between genes and proteins in the TRN and PPI network layers. This specific configuration of interdependent networks can be reduced to a multiplex network43, and we can apply the framework developed by Buldyrev et al.43.

We use graph theory to model and analyze TRN–PPI multiplex in this work. TRN and PPI networks are modeled as graphs with nodes representing genes and proteins respectively (Fig. 1A). Connections between nodes are represented by edges. PPI edges are undirected. Edges in TRN have directionality—transcription factors have edges emanating from them, while downstream genes have incoming edges. The connectivity pattern of edges is quantified by the concept of degree at each node. In PPI networks, degree (K) is the number of edges incident on a protein. For TRN, in-degree (\(k_{in}\)) is the number of transcription factors upstream of a gene, and out-degree (\(k_{out}\)) is the number of genes downstream of a transcription factor.

Since TRN and PPI layers have different coverage of the genome and proteome (see Supplementary Table S1), all analysis was done with genes and proteins present in both TRN and PPI networks.

Quantifying multiplex coupling

Degree–degree coupling

We quantify degree–degree coupling (\(C_D\)) using either Pearson’s correlation or Spearman’s rank correlation coefficient (Eq. 1).

$$\begin{aligned} C_D = cor(k_{out}, K), \end{aligned}$$
(1)

where cor() is the sample Pearson’s correlation or Spearman’s rank correlation coefficient, \(k_{out}\) is the TRN out-degree and K is the PPI degree.

Redundancy coupling

Redundancy coupling (\(C_R\)) is quantified by the number of edges simultaneously present in TRN and PPI. Assume that \(G^1\) and \(G^2\) are graphs representing TRN and PPI networks respectively, and \(V^1\) and \(V^2\) are the corresponding vertex sets. Let \(E(G^1)\) and \(E(G^2)\) be the edge sets for \(G^1\) and \(G^2\) respectively. The elements of the edge sets are vertex pairs. For instance, \((V^1_i, V^1_j)\in E(G^1)\) means that gene i is a transcription factor regulator for gene j, \((V^2_i, V^2_j)\in E(G^2)\) means that proteins i and j interact with each other. Interactions common between TRN and PPI can be mathematically represented by the number of common edges between \(G^1\) and \(G^2\) (Eq. 2).

$$\begin{aligned} Edges_{12} = |\{e \mid e \in E(G^1), e \in E(G^2) \}|, \end{aligned}$$
(2)

where \(Edges_{12}\) is the number of edges common between \(G^1\) and \(G^2\) and e represents an edge either in \(G^1\) or \(G^2\). In Fig. 2, we compute node-specific redundancy coupling. Here, \(C_R\) for each gene–protein is equal to the number of redundant edges incident on that gene–protein pair. \(C_R\) is either calculated as a z-score, which is computed as \(C_R = \frac{Edges_{12} - mean(Edges_{12}^{null})}{sd(Edges_{12}^{null})}\), where \(Edges_{12}^{null}\) is the number of redundant edges in a null model, or \(C_R = Edges_{12}\) or \(C_R = mean(Edges_{12}) - mean(Edges_{12}^{null})\). The definition of \(C_R\) used is specified in each figure’s caption.

Multiplex robustness

Quantifying multiplex robustness

We use MCGC to quantify multiplex robustness43. MCGC is the set of genes and proteins which are simultaneously connected in both the network layers—every gene/protein in MCGC is reachable from every other gene/protein in MCGC. MCGC is computed by finding the intersection between the largest connected components (LCCs) of the TRN and PPI network layers. To quantify response to targeted attack, we track the size of the largest MCGC at each step of the attack. We simulate attack on the multiplex via the following algorithm.

  1. 1.

    Compute multiplex degree for all gene–protein pairs in the multiplex. Multiplex degree is defined as \(K_{mult}(i)=max(K(i), k_{out}(i))\)58, where \(K_{mult}(i)\) is the multiplex degree for gene–protein pair i, K(i) is the degree of protein i in the PPI network and \(k_{out}(i)\) is the out-degree of gene i in the TRN network. Order multiplex degrees into a list of gene–protein pairs, D, arranged in decreasing order of multiplex degree.

  2. 2.

    At step L of the attack, remove the Lth gene–protein pair in D. Removing a gene (protein) from TRN (PPI) layer may lead to the failure of dependent proteins (genes) in the PPI (TRN) layer. This failure may progress recursively, affecting more nodes in the multiplex. This process is called a cascade of failures43.

  3. 3.

    After removing the attacked gene–protein pair and other failed dependent nodes at step L (cascade of failures), find the LCC in either TRN or PPI layer. At this stage, MCGC coincides with the LCC. Compute the size of MCGC. For computing size of MCGC, the TRN network is converted to an undirected version. Therefore, we calculate “weak” MCGC, where weak refers to the undirected nature of TRN.

  4. 4.

    Repeat steps 2 and 3 of this algorithm until MCGC breaks down.

This algorithm will generate a sequence of values, which give the trajectory of the MCGC as the multiplex is successively attacked. If we plot this trajectory as a function of the fraction of gene–protein pairs removed from the multiplex, robustness to attack can be assessed from either the area under the curve or the critical number of nodes removed for which the MCGC is fragmented46. We use area under the curve (RobustArea) as the measure for multiplex robustness. Thus, RobustArea is given as

$$\begin{aligned} RobustArea = \int _0^1 RMCGC(f)\, df, \end{aligned}$$
(3)

where \(RobustArea \in (0, 1]\) is the area under the curve, f is the fraction of gene–protein pairs removed during the attack, and RMCGC(f) is the size of MCGC relative to the total number of gene–protein pairs in the multiplex (n) after f fraction of gene–protein pairs have been removed from the multiplex. RobustArea quantifies absolute robustness. Relative robustness (R) is computed by comparing RobustArea against a null model. Cohen’s d is used to compute effect size of relative robustness (Eq. 4).

$$\begin{aligned} R = \frac{{mean(RobustArea_{obs}) - mean(RobustArea_{null})}}{\sqrt{\frac{var(RobustArea_{obs}) + var(RobustArea_{null})}{2}}}, \end{aligned}$$
(4)

where R is the relative robustness, \(RobustArea_{obs}\) and \(RobustArea_{null}\) are the RobustArea values for the observed multiplex and null model respectively and mean() and var() are the mean and variance functions respectively. We have assumed that \(RobustArea_{obs}\) and \(RobustArea_{null}\) have the same number of samples.

Since the attack is stochastic, given multiple nodes can have the same multiplex degree, we repeat targeted attack multiple times. For the species multiplexes, we repeat the attack either 1000 or 100 times.

Robustness for partial attack curves

For Fig. 5, we estimate robustness for a subset of functionally important gene–protein pairs and compare that against a random set of gene–protein pairs in the multiplex. Here, we explain the strategy to conduct such a comparison. Assume that \(S = \{S_1, S_2,\ldots , S_M\}\) is a collection of sets of gene–protein pairs in a multiplex. Here M is the total number of sets. Sets \(S_i\), \(i \in \{1, 2,\ldots , M\}\), can be mutually exclusive or not. Let \(M_{min}\) be the size of the smallest set in S. For an equitable comparison, we randomly sample (under an appropriate model) \(M_{min}\) number of gene–protein pairs from all the subsets, except for the smallest subset. We sample each subset 100 times. For each sampled version of a subset \(S_i \in S\), we attack the gene–protein pairs in \(S_i\) in decreasing order of multiplex degree for gene–protein pairs in that set, using the attack algorithm explained previously. We stop the attack once \(M_{min}\) number of gene–protein pairs have been removed from the multiplex. We also perform a similar partial attack on a set of randomly selected gene–protein pairs (under an appropriate null model). The size of the random set is set equal to \(M_{min}\). Robustness can be calculated for each subset based on the obtained partial attack curves. Relative robustness for each subset is calculated by comparing RobustArea for that subset against the random set.