Measuring criticality in control of complex biological networks

Someya, Wataru; Akutsu, Tatsuya; Schwartz, Jean-Marc; Nacher, Jose C.

doi:10.1038/s41540-024-00333-9

Download PDF

Article
Open access
Published: 20 January 2024

Measuring criticality in control of complex biological networks

npj Systems Biology and Applications volume 10, Article number: 9 (2024) Cite this article

936 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

Recent controllability analyses have demonstrated that driver nodes tend to be associated to genes related to important biological functions as well as human diseases. While researchers have focused on identifying critical nodes, intermittent nodes have received much less attention. Here, we propose a new efficient algorithm based on the Hamming distance for computing the importance of intermittent nodes using a Minimum Dominating Set (MDS)-based control model. We refer to this metric as criticality. The application of the proposed algorithm to compute criticality under the MDS control framework allows us to unveil the biological importance and roles of the intermittent nodes in different network systems, from cellular level such as signaling pathways and cell-cell interactions such as cytokine networks, to the complete nervous system of the nematode worm C. elegans. Taken together, the developed computational tools may open new avenues for investigating the role of intermittent nodes in many biological systems of interest in the context of network control.

Sustaining a network by controlling a fraction of nodes

Article Open access 28 January 2023

Input node placement restricting the longest control chain in controllability of complex networks

Article Open access 07 March 2023

On quantification and maximization of information transfer in network dynamical systems

Article Open access 05 April 2023

Introduction

Controllability analysis integrates control theory concepts with complex networks techniques^1,2. Recent works have shown that minimum driver nodes (controllers) are not only important to control the network but also that they have been associated to important biological functions^3,4,5,6,7. In particular, network controllability has been applied to identify cancer-related genes^{3,8,9,10,11,12}, drug-targets^8,13, disease-related non-coding RNAs (ncRNAs)¹⁴, virus-targeted proteins^3,15,16 and has also been used to predict specific neuron functions¹⁷ as well as to unveil the human brain functional structure¹⁸.

Several network controllability methodologies, such as Maximum Matching methodology (MM)¹, Minimum Dominating Set (MDS) methodology², and Feedback Vertex Set (FVS) methodology¹⁹ have been proposed to determine the minimum set of driver nodes required to drive a system from an initial state to desired final state in finite time. However, the minimum driver node set is not necessarily determined uniquely in general. Therefore, we have to classify nodes into critical (a node that appears in all minimum driver node sets), redundant (a node that does not appear in any minimum driver node set) and intermittent (the remaining one). Note that this classification has been done in all major controllability approaches shown above^20,21,22. Extensive data analyses have demonstrated that the critical nodes play a prominent role in protein interaction networks as well as in metabolic pathways based on identified associations between them and human diseases and other important biological functions in a cell^{5,6,7,9,10,14,15,16,20,21}. Note that the analyses have not been limited to human cells, and that, for example, critical proteins have been observed as enriched among significantly overexpressed genes that are abundant in the spiral form of Helicobacter pylori, and that, therefore, they play a role in the mechanistic switch of the two bacterial forms⁷. In sharp contrast, redundant and intermittent control categories have received much less attention. While redundant nodes are not involved in control at all, intermittent nodes show a quite different profile because they are partially involved in control, and consequently their biological importance remains largely unexplored.

To quantify the likelihood that a node is a driver node in the context of a maximum matching approach, a so-called control capacity metric was proposed²³. The solution was obtained by using a random sampling algorithm, which was used to obtain the distribution of control capacity in several real-world networks²³. Here, we propose a criticality metric in the context of a Minimum Dominating Set (MDS) model that measures the importance of each intermittent node based on the number of times it appears in an MDS solution.

For an undirected graph $G\left(V,E\right)$, a subset U of V is called a dominating set (DS) if each node is in U or has at least a neighbor in U. The minimum dominating set (MDS) is a dominating set with the minimum cardinality. Note that MDS may not be determined uniquely. It was proven in ref. ² that if every edge is bidirectional and every node in an MDS (resp., DS) can assign an arbitrary value to itself and send arbitrary values to all of its outgoing links separately, at any time point, then the system can be driven to any desired state by using MDS nodes (resp., DS nodes) as driver nodes, in the sense of linear structural controllability (in ref. ²⁴). Furthermore, it was proven in ref. ¹⁶ that the system can be driven to any desired state by using MDS nodes as driver nodes for a certain class of non-linear discrete time systems, where similar properties may hold for wider classes of non-linear systems. For directed graphs, the same properties hold if we modify the definition of DS so that U is called a dominating set if each node is in U or has at least one incoming edge from some node in U.

While the maximum matching problem can be solved in polynomial time, the MDS problem is a well-known NP-hard problem. That means it is not expected to find an algorithm that solves this problem in polynomial time. Therefore, when the size of network increases, its computational time increases exponentially. Because the computation of criticality not only requires the computation of a single MDS solution but requires to know all possible solutions, it is clear that the criticality metric, although simple in its definition, poses significant computational challenges. Because of the intrinsic complexity of this problem, and to efficiently compute criticality metric, we developed a new algorithm (see Fig. 1d and details in the Methods section) and applied it to several biological networks to identify intermittent nodes with high criticality associated with relevant bio-medical functions. It is worth noting that while the sampling method proposed in²³ may be used to estimate the criticality based on MDS, their method is based on Markov chain and thus convergence to the uniform distribution may need many repetitions. Since MDS is NP-hard and thus its computation needs long CPU time (different from maximum matching), it is preferable to use a small number of repetitions. Therefore, we developed another heuristic method to estimate the criticality based on MDS.

**Fig. 1: Illustration of the criticality metric computation in a graph.**

Suppose that we have a directed graph $G\left(V,E\right)$ where $V=\left\{{v}_{1},{v}_{2},\ldots ,{v}_{N}\right\}$ is a set of nodes and E is a set of directed links. Let M_set be the set of all the minimum dominating sets (MDSs) as its elements for the $G\left(V,E\right)$, and M be an element of this set. We then define the criticality of each v_i by

$${{CR}}_{i}=\frac{\left|\left\{M\in {M}_{{set}}{\rm{|}}{v}_{i}\in M\right\}\right|}{\left|{M}_{{set}}\right|}$$

(1)

Note that this definition can also be applied to undirected graphs by considering a graph G with a set E of undirected links.

By using the metric shown in Eq. 1, we developed and implemented an efficient algorithm to compute criticality in directed biological networks. The proposed algorithm uses the Hamming distance concept among others to make feasible the computation in large networks (see Fig. 1d). As described above, and shown in Eq. 1, the criticality calculation is computationally intensive since it requires all possible MDS solutions, which increases exponentially with the network size. Moreover, the MDS problem itself is an NP-hard problem. Therefore, new efficient algorithms are required for its computation. First, we proposed and implemented an algorithm to efficiently computer criticality. The performance of the algorithm was evaluated using artificially generated scale-free networks (see Fig. 2 and Supplementary Figs. 1–5). The computational experiments demonstrated the correctness of the solution of the proposed heuristic algorithm as well as the improvements obtained in the computational time by using Hamming distance techniques among others. Next, we applied the proposed criticality algorithm to three biological networks and systems: the human RTK signal transduction pathway, the human cytokine-cytokine interaction network and the connectome of the nematode Caenorhadbitis elegans. The application of the developed algorithm to several biological networks unveiled that the intermittent nodes play a more important role than previously assumed. First, our findings show that the set of high criticality proteins in the human RTK signalling pathway are associated with phosphorylation and cancer. Moreover, they are statistically significantly enriched with disease genes associated to 16 specific human disorders, from congenital abnormalities to musculoskeletal diseases. Second, by taking advantage of the latest development on assembling cell-cell interaction data as well as cytokine-cytokine interaction network, we performed a criticality analysis of these networks with special attention of the inflammation network module related to the COVID-19 patients. While for the cell network most of the critical cells are involved or adjacent to this inflammation module, the results for the cytokine network show that the algorithm does not only recover well-known cytokines that are part of the module such as IL4 (critical) and CSF2 (high criticality), but also identifies new ones that are adjacent to this module such as IL20 (critical) and IL7, IL18 and TNFSF13 (high criticality), suggesting similar functional roles. The analysis of the neural network of C. elegans organism also assisted to classify intermittent nodes with high criticality into specific neuron types. Taken together, the application of the proposed criticality analysis in biological systems and the developed algorithmic tools in the context of controllability theory makes it possible to unveil previously overlooked biological functions of intermittent nodes in large networks.

**Fig. 2: Computational time versus the network size.**

Results

In this study, we first evaluated the performance of the algorithm using artificially generated scale-free networks. We then applied the proposed efficient criticality algorithm to the signal transduction pathways, cell-cell and cytokine-cytokine interaction networks and the connectome of the C. elegans organism.

Overview of the proposed criticality algorithm in MDS

Here we summarize the main computational steps involved in the proposed algorithm before showing the data analysis results. Note that the algorithm is described in detail in the Methods section and that the proofs of two mathematical propositions derived by us and involved in the algorithm can be found in the Methods section. As shown in the scheme (Fig. 1d), the first step requires the identification of all nodes control categories in the MDS. This can be efficiently computed by a direct application of the algorithm and techniques described in²¹. Next, the algorithm computes the initial value of Hamming distance K (between an existing MDS solution and the newly identified MDS solution) and calculates an MDS by using Integer Linear Programming (ILP) under the constraint that the Hamming distance must be greater or equal than K (see Eq. 20, the last constraint in the Methods section). The algorithm repeats the process until no solution for MDS is found (see Fig. 1d). Then, the criticality metric (Eq. 1) is computed. If the difference of criticality between two rounds is enough small (less than a threshold θ, where θ is a parameter that determines the precision of the computed criticality (see Supplementary Figs. 1–5)) or $K\le 2-{|MDS|}$, then the algorithm terminates. Otherwise, it updates the Hamming distance by reducing the value as K = K-2. Then, the algorithm re-starts the computation with different Hamming distance (see Fig. 1d). Note that the θ parameter can be adjusted in order to improve the accuracy of the criticality computation. In this work, experiments were done with small value of $\theta =0.01,\theta =0.02$ (see red lines in Supplementary Figs. 1–5).

Algorithm performance using artificially generated scale-free networks

First, to assess the computational performance of the algorithm using the Hamming distance, the computation time was measured using artificially generated scale-free networks. The degree exponent γ of the networks used in the analysis is 2.5. The results for average degrees $< {k}_{{in},{out}} > =\mathrm{3,4}$ are shown in Fig. 2a, b. The horizontal axis represents the number of nodes in the network and the vertical axis represents the computation time (seconds). The Hamming ($\theta =0.01,\theta =0.02$) curves are the results of the criticality algorithm using the Hamming distance with the specified value θ of 0.01 and 0.02 to terminate the calculation, respectively (see Fig. 2). The green dashed-line (normal) curve is the result when the criticality is calculated without using the Hamming distance-based algorithm. The results shown in Fig. 2a, b indicate that the proposed algorithm with Hamming distance provides significant improvements in computational time.

To assess the effect of the average degree, we consider two cases for $< {k}_{{in},{out}} > =3$ and 4. As shown in Fig. 2a, b (left and right), there is a shift in the Hamming ($\theta =0.01$) curve as the network is more complex for higher average degree, but the shift is very large indicating that the computation is significantly influenced by the average degree. In addition, the standard deviation is relatively large for the same network size, indicating that the criticality computation problem is even more complex than that of the original MDS which is a well-known NP-hard problem.

The Hamming distance K for the ILP calculation at the end of the computation is expressed as $K=2h-\left|{MDS}\right|$ (here, h means the number of the different nodes between an MDS sample and the new one), and from this, the value of h reads as $h=(K+\left|{MDS}\right|)/2.$

Note that we use a capital letter K to denote the Hamming distance, whereas we use lower-case letters, k_in, k_out and k to denote the in-degree, out-degree, and total degree of a node, respectively.

We then computed the dependence between the final h value and the network size, as shown in Fig. 2c. While for $< {k}_{{in},{out}} >$ = 3, the h value grows fast and becomes larger when increasing the network size, the curve displays a lower slope and takes smaller values for $< {k}_{{in},{out}} > =4$, being around 1 ~ 3 even with large increments of network size. By comparing Hamming curves $(\theta =0.01$) and ($\theta =0.02$) with $< {k}_{{in},{out}} > =4$, we observe that the h at the end of the calculation only differs by 1 ~ 2 as an average, but when combined with the information from Fig. 2a, b, the computational times becomes larger even when the h is different only by 1 or 2, because the number of MDS solutions with the same size also rapidly increases. Additional analyses on the algorithm performance and accuracy can be found in Supplementary Note 1 and Supplementary Figs. 1–5.

High criticality proteins in RTK signalling pathways are associated with phosphorylation and cancer

The Receptor Tyrosine Kinases (RTKs) pathway is the largest subnetwork of the signal transduction pathways and plays a key role in many biological processes such as growth, cell differentiation and metabolism²⁵. However, down-regulation and alterations in its genes lead to multiple human diseases, including various types of cancer. The signalling pathways have recently been analysed using controllability methods such as maximum matching⁸, minimum dominating sets²¹, minimum feedback vertex set²². While in all these studies the proteins of the pathway can be classified into three control categories, namely, critical, intermittent and redundant, the relative importance of the intermittent nodes in the network may have been overlooked.

First, we mapped the computed criticality metric onto the RTK signalling pathway (Fig. 3a). This continuous measure contrasts with previous discrete definitions of control classifications. Interestingly, most of the intermittent nodes have a large range of values of criticality and a noticeable fraction of them also has a high criticality above 0.5 (Fig. 3b). A detailed list of the proteins with high criticality is shown in Supplementary Table 1 (Excel file). The network also contains a large fraction of critical nodes. The reason is that many proteins act as receptors without incoming links from other proteins. The fraction of critical nodes excluding receptors (in-degree k_in≠0) is shown on the figure.

**Fig. 3: Criticality in the RTK signalling pathway.**

We then used a set of phosphorylated proteins derived from PhosphositePlus^8,26,27 to determine if there was a relation between phosphorylation and high criticality. PhosphositePlus is a comprehensive, manually curated resource gathering information about experimentally observed post-translational modifications (PTM). Cell rapid responses to environmental changes and conditions are mediated by the modification of certain protein binding features which are orchestrated by PTMs such as acetylation, ubiquitination and phosphorylation (pS/pT and pY). In this analysis, the subset of proteins affected by tyrosine phosphorylation (pY) was used.

The statistical analysis shows that the high criticality proteins are statistically significantly enriched with tyrosine phosphorylation (pY) mechanism (p-value = 0.0046; all p-values derived from two-tailed Fisher’s exact tests) (see Fig. 4). In contrast, the low criticality proteins are not significantly enriched by phosphorylation. This shows the biological importance of intermittent nodes which seems to be highly regulated by PTM related processes. See the Supplementary Note 2 and Supplementary Table 1 (Excel file) for the list of the high criticality proteins associated with the tyrosine phosphorylation (pY) mechanism.

**Fig. 4: Enrichment of high and low criticality proteins of RTK signaling pathways with phosphorylation (pY) and cancer datasets.**

In addition, we used a cancer-related gene-centric data from CancerGenes database which integrates four public resources namely, NCBI Entrez Gene, Ensembl BioMart, active promoter regions and the Sanger Institute COSMIC^8,28. Then, we similarly mapped it on the RTK signaling pathway. The results also showed a statistically significant enrichment of high criticality proteins with cancer genes (p-value = 0.0059; two-tailed Fisher’s exact test) (see Fig. 4). See also the Supplementary Table 1 (Excel file) for the list of proteins with high criticality associated with the cancer genes set.

To assess the relation between high criticality and node degree, we recomputed the enrichment in the mentioned above biological datasets by using node degree as a metric instead of high criticality. In a directed network, a node has in-degree k_in and out-degree k_out, and the sum of both is called total degree k. First, we considered a set of top degree nodes, which are obtained by selecting the same number of high criticality nodes as observed in the RTK signaling network ordered by degree. The second set is a set of high degree nodes composed by the same number of high criticality nodes as observed in the RTK signaling network and ordered by degree but excluding the same number of nodes as critical nodes observed in the network. Because our high criticality enrichment analysis is excluding the critical nodes (i.e., criticality score 1), this set also represents a fair benchmark for comparison purposes. Note that for each set, three sets are actually computed using in-degree k_in, out-degree k_out, and is total degree k. As shown in Fig. 4a, for top degree analysis, the high criticality gives highest enrichment and statistical significance for both phosphorylation and cancer genes sets.

For the high degree analysis, the results shown in Fig. 4b indicate that high criticality also gives higher enrichment than the rest of metrics, and only for k_in gives the same enrichment score. However, this case is not relevant for control purposes. Note that for k_out, there is negative enrichment (depletion). Therefore, this analysis reveals the specificity of high criticality nodes in the investigated datasets.

High criticality proteins in RTK signalling pathways are also associated with specific diseases

A closer insight on the high criticality proteins can be obtained by examining larger datasets corresponding to multiple and diverse human diseases. Here we used sets of genes corresponding to 299 diseases, which were used previously to unveil human disease modules in the interactome network²⁹, and mapped them on the RTK signaling pathway.

As shown in Fig. 5, the statistically significantly enrichment of high criticality proteins is observed in 16 different diseases. For example, congenital, hereditary and neonatal diseases are enriched by high criticality proteins. See the Supplementary Table 1 (Excel file) for a list of the identified high criticality proteins associated with each disease. It has been reported that developmental abnormalities and syndromes are linked to both cancer and RTK signaling pathways^25,30. Our algorithm identifies high criticality proteins that overlapped significantly with this disease (p-value = 0.0047; two-tailed Fisher’s exact test). Among cardiovascular diseases, the ischemic heart disease represents the largest cause of deaths in the world. It has also been reported that specific RTK targets are required to be regulated in gene expression to successfully treat ischemic heart diseases³¹. The analysis showed a statistically significant association of the cardiovascular disease (p-value = 0.0091; two-tailed Fisher’s exact test) and cardiovascular abnormalities (p-value = 0.0006; two-tailed Fisher’s exact test) with high criticality proteins. Several works have also highlighted the importance of inhibiting of RTKs targets as an improved therapeutic strategy for vascular disorders³². Moreover, we computed the fraction of critical, high criticality, low criticality, and redundant proteins in the RTK signaling pathway.

**Fig. 5: Enrichment of high criticality proteins in the RTK signaling pathways with human diseases.**

To assess that the high criticality measure outperforms degree centrality, we performed the same analysis as shown in Fig. 4 by using the high degree (Fig. 5a) and top degree (Fig. 5b) sets of nodes. The results clearly indicate that the identified set of high criticality proteins show the highest enrichment in all cases for high degree nodes, and also in almost all cases for top degree nodes except in three diseases for which k_out gave the same result.

Figure 6 shows the fraction of critical and high criticality nodes computed for each disease. Critical nodes appear in all of the MDS solutions, therefore are regarded as important nodes for control. Interestingly, the high criticality fraction tends to be sometimes larger than the critical one. In almost all of the analysed diseases, the combination of critical nodes and high criticality nodes represents more than half of the total nodes. The importance of these intermittent nodes that appear in many MDS solutions could have not been unveiled without using our proposed algorithm. Pathological processes, for example, do not have critical proteins, but a large fraction of the nodes are high criticality proteins.

**Fig. 6: Fraction of control categories for each disease.**

Key cytokines computed from cell interactions data in Covid-19 patients have high criticality

A recent study has inferred cell-cell interactions and cytokine-cytokine interactions that can be used to investigate multiple biological and disease processes such as inflammatory diseases³³. Each cell produces cytokines and different cells also have cytokine receptors. The combination of cytokine-receptor interactions and the tissue/cell-cytokine and receptor interactions led to obtaining two networks called cell-cell interaction network and cytokine-cytokine interaction network.

The cell network consists of 18 blood cell types and 24 tissues that are combined as 42 nodes. By means of the cytokines, a total of 581 interactions and communication channels between two cells were determined. The cytokine network consists of 115 nodes and the number of interactions between cytokines is 2818.

We then applied our criticality algorithm to both networks and determined the control categories of each network including critical, high and low criticality nodes.

Figure 7a shows a part of the entire cell network. The grey links correspond to the cell-cell interaction networks. The orange links correspond to those inter-cellular interactions involving elevated cytokines in COVID-19 patients³³. The red links represent interactions mediated by IL4 and IL5 cytokines, which are key cytokines involved in the inflammatory process. The figure shows the critical, low criticality and redundant cells in red, green and blue, respectively. Interestingly, all the critical cells identified in our analysis were adjacent to at least another cell already involved in the highlighted module of COVID-19 (see the six red nodes with dashed circles) or already present in the module such as the NK-cell. This suggests that the critical cells may also play a key role in the COVID-19 inflammatory response. Moreover, the skeletal muscle cell type shows up as low criticality and it is also connected to the main module. Myeloid DC, memory B- cell, MAIT T-cell and intermediate monocyte belong to the module and are also identified as low criticality cells. NK-cell is a critical cell and it is also included in the COVID-19 module.

**Fig. 7: Criticality in the cell-cell interactions network.**

As shown in Fig. 7b the network has approximately a 30% of nodes classified as intermittent nodes with low criticality and 17% of nodes are critical nodes.

The complete cytokine network consists of 115 nodes and 2818 interactions between cytokines³³. Figure 8a shows the subnetwork for the cytokines interactions that is overexpressed (blue links) in COVID-19 patients. Red links denote cytokine-cytokine interactions involving IL4 and IL5 cytokines. This network module is enlarged by several critical and high critical cytokines identified by our proposed algorithm. Interestingly, the IL20 cytokine is a critical cytokine and it is adjacent to the main module. The IL7, IL18 and TNFSF13 are high criticality cytokines identified by the algorithm. They are also adjacent to the main module, therefore, they are denoted with dashed lines. Finally, the CSF2 is a central cytokine in the module that has also high criticality value. This analysis shows that the combination of critical, high and low criticality control categories may lead to expand the knowledge of specific disease modules.

**Fig. 8: Criticality in the cytokine-cytokine network.**

The cytokine interaction network consists of a small fraction of critical nodes. One of the reasons for this is that the network is dense in terms of number of interactions among cytokines. Interestingly, this network shows a noticeable high criticality fraction which is even larger than that of the critical category (see Fig. 8b). This highlights the importance of the developed algorithm because it enables us to unveil control-related features in important subnetworks and molecules. Moreover, as shown in Fig. 8a these high criticality cytokines were not located randomly in the network. Instead, they were adjacent to at least one cytokine involved in the COVID inflammation network module. A list of the cytokines with the highest criticality values are shown in Table 1.

Table 1 Critical and high criticality cytokines in the COVID-19 cytokine-cytokine interaction network.

Full size table

In order to verify that the high criticality nodes (the four nodes shown in yellow in Fig. 8a) identified in the cytokine interaction network can indeed have stronger control properties on the COVID inflammation module than randomly selected nodes, we computed a series of metrics, namely the average shortest path from high criticality nodes to the network module <d > , the total number of all directed edges outgoing from any high criticality node and incoming to any node in the module (links) and the fraction of nodes in the module that have an incoming edge from high criticality nodes (cov) (that is, the module coverage ratio by the high criticality nodes) (see methods section for mathematical definitions). In each case from Fig. 9a to Fig. 9d, the values of the metric for the four high criticality nodes are indicated by red arrows and the average metric value of the redundant nodes in the module for COVID-19 is indicated by a blue line for comparison. Each histogram shows the distribution of each metric computed by randomly selecting four nodes from the cytokine interaction network. The number of trials of random selection for each metric is 10,000. The vertical axis of each figure shows the frequency of occurrence of the metric values of the randomly selected nodes as a percentage, and the horizontal axis shows the metric value.

**Fig. 9: Network metrics analysis for high criticality nodes.**

In Fig. 9a, the <d> value of the high criticality nodes is clearly smaller than the majority values obtained computed using randomly selected samples (p-value = 0.0063; one-tailed randomization test). This confirms that the high criticality nodes can control the module more easily than other nodes. In Fig. 9b, the links for the high criticality nodes are around 20 (p-value = 0.0045; one-tailed randomization test), while the links value for the randomly selected nodes peaks at around 10 in average, indicating that high criticality nodes have a greater impact on the module than the other nodes. In Fig. 9c, the cov metric of the module nodes is 1.0 for the high criticality nodes, indicating that these nodes are enough to effectively controlling the module (p-value = 0.0341; one-tailed randomization test). Note that Fig. 9c shows a discontinuous histogram, but this is because there are 10 module-constituting nodes and only 11 possible values for the coverage ratio. Figure 9d shows that the high criticality nodes coverage for the entire cytokine interaction network is over 90% (p-value = 0; one-tailed randomization test), indicating that the regulation of these cytokines may have a significant impact not only on the regulation of modules related to COVID-19, but also on other diseases.

To assess our results in comparison with other metrics, we selected five centrality metrics, namely, closeness, betweenness, page rank, out-degree and total degree. We then computed the same metrics <d > , links, and cov by using the set of top degree/centrality nodes (Fig. 9) and the set of high degree/centrality nodes (Supplementary Fig. 6 and Supplementary Note 3) ordered according to each centrality measure mentioned above. For the set of top degree/centrality nodes, the high criticality performs better than all the examined centrality for the cov metric in both the COVID module (Fig. 9c) and the entire network (Fig. 9d). Regarding <d> and links metrics, high criticality scores second after the out degree (Fig. 9a, b). Next, the results for the high degree set of nodes also show the best performance for the high criticality nodes when the cov metric is examined for the complete network (Supplementary Fig. 6d), the best result for the coverage of the COVID module scoring same as the out-degree (Supplementary Fig. 6c), and shows the second rank in the results for <d> and links measures (Supplementary Fig. 6a, b). It is worth mentioning that coverage is the metric most closely related to actual control among the other two. Moreover, it is to be noted that the high criticality nodes are part of a set that satisfies controllability properties derived by advanced techniques², while sets obtained by other centrality metrics such as page rank or degree, although they can be used to rank network nodes, do not satisfy controllability conditions.

Motor neurons in the C. elegans connectome tend to have high criticality

The nematode worm C. elegans is the only organism for which a detailed completed map of the nervous system has been reconstructed at the single neuron level resolution. It is therefore one of the best examples to test and apply control theory methods¹⁷. The C. elegans connectome is a highly dense directed network composed of several neuron classes. The network and neuron classification were compiled from information obtained from WormAtlas database^34,35. Ambient stimuli induce responses of sensory neurons acting in a role of input signal nodes distributing the incoming signals through a network composed of 281 neurons and 97 muscles (nodes), being connected by a total of 5256 directed synaptic interactions and/or neuro-muscular junctions. Motor neurons are connected to muscles, while inter-neurons mediate between neurons distributing the signals. Poly-modal neurons refer to those neurons classified as capable of double or more functionality.

By using this network, we applied the criticality algorithm and unveiled for the first time a subset of neurons that have a high criticality. See the Supplementary Note 2 and Supplementary Table 1 (Excel file) for a detailed list of the identified high criticality neurons in the C. elegans network. Figure 10a shows the C. elegans connectome network with circle colors denoting the neuron function according to figure legend and filled nodes referring to critical (red) and high criticality (orange) categories identified by the algorithm. Figure 10b shows the fraction of neuron types in each control category. The results show that the largest fraction of critical nodes as well as high criticality nodes are motor neurons. The amount of motor neurons in each control category decreases further from low criticality to redundant nodes. On the other hand, muscles play a more passive role in terms of control and therefore, they tend to appear in low criticality and redundant categories. High criticality nodes are also composed of sensory, interneuron and poly-modal neurons. No critical protein played a sensory role, however, intermittent neurons with high criticality may also be engaged as sensory tasks. The enrichment analysis shows the largest enrichment scores for critical and high criticality nodes for motor neurons (p-value = 0.0002 and p-value = 0.047; two-tailed Fisher’s exact tests, respectively) (see Fig. 10c). These results show not only reasonable functional findings explained by the MDS control approach but also demonstrate that the uncovered intermittent nodes with high criticality play more important and variate biological roles than previously thought.

**Fig. 10: Criticality in the connectome network.**

Here we also compared our results with those obtained from the set of high degree (Supplementary Fig. 7a, c) and the set of top degree nodes (Supplementary Fig. 7b, d). See also Supplementary Note 3. First, the analysis of the fraction of nodes (Supplementary Fig. 7a, b) indicates that high criticality nodes target a much more diverse set of nodes. While for the set of high degree nodes motor neurons are still identified in a low percentage (Supplementary Fig. 7a), they are completely overlooked for the set of top degree nodes (Supplementary Fig. 7b). Regarding the enrichment analysis, we observed similar findings. For the sets of high degree nodes (Supplementary Fig. 7c) and top degree nodes (Supplementary Fig. 7d), the out-degree nodes tend to target interneurons, while the high criticality nodes target motor neurons. These results confirm that the high criticality metric extracts unique information from the network, that could not be derived by using degree-based metrics.

Discussion

In this work, we have proposed an efficient algorithm based on a new criticality metric to analyse and determine the functional roles of intermittent nodes in complex biological networks. The computation of the criticality metric in a large network is challenging because by increasing the network size, the number of solutions of the MDS increases very fast. The criticality definition requires knowing all the possible MDS solutions of a given network, therefore it was necessary to develop an efficient algorithm to compute this metric. The algorithm can be applied to any type of directed networks. In this work, the application of this algorithm to several directed networks unveiled the importance and functional roles of intermittent nodes in different systems, from cellular level such as signaling pathways and cell-cell interactions such as cytokine networks, to large scale nervous systems from C. elegans. In our view, the proposed criticality metric and its efficient computation algorithm may open new avenues for investigating the role of intermittent nodes in many biological systems of interest in the context of the MDS framework for network control.

Methods

Hamming distance between minimum dominating sets

From the definition in Eq. 1, the calculation of criticality requires the computation of all MDS solutions. However, the number of the solutions of the MDS increases at a very fast rate as the network size increases, making it very difficult to calculate it in a large-scale network. In order to efficiently and approximately calculate the criticality in large networks, we introduce the concept of Hamming distance between MDSs in the proposed algorithm.

Let $G\left(V,E\right)$ be a graph with a set of nodes $V=\left\{{v}_{1},{v}_{2},\ldots ,{v}_{N}\right\}$ and a set of edges $E=\left\{{e}_{1},{e}_{2},\ldots ,{e}_{M}\right\}$. For a binary vector b, b[i] denotes the ith element of b. We represent a subset S of V by an n-dimensional binary vector b_S such that b_S[i] = 1 iff. ${v}_{i}\in {\rm{S}}$. Let ${d}_{H}\left({{\bf{b}}}_{1},{{\bf{b}}}_{2}\right)$ denote the Hamming distance between two binary vectors b₁ and b₂. Then, we can obtain the following two propositions. Note that, intuitively, bs represents an MDS and that the Hamming distance between b₁ and b₂ represents how many nodes are different or swapped between two MDSs.

Proposition 1

Suppose that both S₁ and S₂ are minimum dominating sets of $G\left(V,E\right)$. Then, ${d}_{H}\left({{\bf{b}}}_{{S}_{1}}{\boldsymbol{,}}{{\bf{b}}}_{{S}_{2}}\right)$ is even.

(Proof)

Let MDS be a minimum dominating (MDS) for $G\left(V,E\right)$. Let ${S}_{1}\backslash {S}_{2}=\left\{s,|,s\in {S}_{1},s\notin {S}_{2}\right\}.$ We define the following numbers:

${x}_{1}=\left|{S}_{1}{\rm{\backslash }}{S}_{2}\right|$,
${x}_{2}=\left|{S}_{2}{\rm{\backslash }}{S}_{1}\right|$,
$y=\left|{S}_{1}\cap {S}_{2}\right|$.

Since both S₁ and S₂ are MDSs, we have ${x}_{1}+y={x}_{2}+y={|MDS|}$ from which ${x}_{1}={x}_{2}$ follows. Therefore, the proposition follows from the following equality:

$${d}_{H}\left({{\bf{b}}}_{{S}_{1}},{{\bf{b}}}_{{S}_{2}}\right)={x}_{1}+{x}_{2}=2{x}_{1}=2{x}_{2}$$

(2)

Proposition 2

Let MDS, CMDS, and IMDS be a minimum dominating set, the set of critical

nodes, and the set of intermittent nodes, respectively. Then,

$${d}_{H}\left({{\bf{b}}}_{{S}_{1}},{{\bf{b}}}_{{S}_{2}}\right)\le 2\cdot \min \left(\left|{IMDS}\right|+\left|{CMDS}\right|-\left|{MDS}\right|,\left|{MDS}\right|-\left|{CMDS}\right|\right).$$

(3)

Let S₁ and S₂ be MDSs. We define x and y by

$$x=\left|{S}_{1}{\rm{\backslash }}{(S}_{1}\cap {S}_{2})\right|={\rm{|}}{S}_{2}{\rm{\backslash }}{(S}_{1}\cap {S}_{2}){\rm{|}}$$

(4)

$$y={\rm{|}}{(S}_{1}\cap {S}_{2}){\rm{\backslash }}{CMDS}{\rm{|}}.$$

(5)

Then, we have the following:

$$x+y+{|CMDS|}={|MDS|},$$

(6)

$${d}_{H}\left({{\bf{b}}}_{{S}_{1}},{{\bf{b}}}_{{S}_{2}}\right)=2x,$$

(7)

$$2x+y\le |IMDS|,$$

(8)

where the last inequality comes from the following facts:

$${\rm{|}}({S}_{1}\cup {S}_{2}){\rm{|}}=2x+y+{\rm{|}}{CMDS}{\rm{|}},$$

(9)

$$({S}_{1}\cup {S}_{2}){\rm{\backslash }}{CMDS}\subseteq {IMDS}.$$

(10)

We can see from Eqs. 6 and 8 that the following hold:

$$2x\le {\rm{|}}{IMDS}{\rm{|}}-y={\rm{|}}{IMDS}{\rm{|}}-{\rm{|}}{MDS}{\rm{|}}+{\rm{|}}{CMDS}{\rm{|}}+x,$$

(11)

$$x\le {\rm{|}}{IMDS}{\rm{|}}-{\rm{|}}{MDS}{\rm{|}}+{\rm{|}}{CMDS}{\rm{|}}.$$

(12)

Therefore, we have:

$${d}_{H}\left({{\bf{b}}}_{{S}_{1}},{{\bf{b}}}_{{S}_{2}}\right)=2x\le 2\left(\left|{IMDS}\right|+\left|{CMDS}\right|-\left|{MDS}\right|\right).$$

(13)

In addition, we can see from Eq. (6) and y ≥ 0 that the following holds:

$$x\le {\rm{|}}{IMDS}{\rm{|}}-{\rm{|}}{CMDS}{\rm{|}}.$$

(14)

Therefore, we have:

$${d}_{H}\left({{\bf{b}}}_{{S}_{1}},{{\bf{b}}}_{{S}_{2}}\right)=2x\le 2\left(\left|{IMDS}\right|-\left|{CMDS}\right|\right)$$

(15)

Efficient algorithm for computing criticality in networks

Here we present an algorithm for the computation of the newly proposed criticality control category metric (See also Fig. 1d for a simplified schema of the algorithm steps). The algorithm uses the Hamming distance as well as the two mathematical propositions presented in the above section. The algorithm is implemented by following the steps below:

1.
Determine all nodes in the network into the MDS control categories (critical, intermittent and redundant) using the Critical MDS (CMDS) control category classification algorithm and select one MDS, which is called S₁.
2.
Set constraints x_i = 1 and x_i = 0 on the critical and redundant vertices determined from step (1), respectively.
3.
Calculate the maximum difference n between the MDS solution (S₁) and the newly identified MDS solution (S₂) in this step, and the initial value of the Hamming distance K for the ILP computation by using the network control configuration obtained from step (1).
4.
Set the constraint that the Hamming distance (between an existing MDS solution and the newly identified MDS solution) must be greater or equal than K
5.
Calculate an MDS solution by using ILP computation under the constraints of step (4) (see Eq. 20).
6.
Repeat steps 4 and 5 until there are no more MDS solutions.
7.
Calculate the criticality metric as shown in Eq. 1.
8.
Compute the difference of the criticality between the current and previous K calculation. (Note this step can only execute from the second round since we need two criticality values to compute the difference).
9.
Update the constraints on the Hamming distance (see details below).
10.
Repeat the steps (4 to 9) until the specific conditions for algorithm termination are satisfied (see details below).

The constraints on the Hamming distance for the identified MDS solution in step (3) are explained as follows. If the identified MDS solution at step (1) is S₁ and the new arbitrarily obtained MDS solution in step (3) is S₂, the maximum difference n between these two MDS solutions can be calculated as follows:

Let us consider $h=\left|{S}_{2}\backslash {S}_{1}\right|$, then the Hamming distance ${d}_{H}\left({{\bf{b}}}_{{S}_{1}},{{\bf{b}}}_{{S}_{2}}\right)$ between S₁ and S₂ can be expressed as

$${d}_{H}\left({{\bf{b}}}_{{S}_{1}},{{\bf{b}}}_{{S}_{2}}\right)=2h\le 2\min \left(\left|{IMDS}\right|+\left|{CMDS}\right|-\left|{MDS}\right|,\left|{MDS}\right|-\left|{CMDS}\right|\right).$$

(16)

By invoking Proposition 1 and 2, the above expression can be rewritten in terms of h and reads as

$$h=\frac{{d}_{H}\left({{\bf{b}}}_{{S}_{1}},{{\bf{b}}}_{{S}_{2}}\right)}{2}\le \min \left(\left|{IMDS}\right|+\left|{CMDS}\right|-\left|{MDS}\right|,\left|{MDS}\right|-\left|{CMDS}\right|\right).$$

(17)

Therefore, when $h=\min \left(\left|{IMDS}\right|+\left|{CMDS}\right|-\left|{MDS}\right|,\left|{MDS}\right|-\left|{CMDS}\right|\right)$, the maximum value is obtained, which can be used to define n.

Next, we explain how to determine the Hamming distance K for the ILP calculation. Let us consider ${x}_{i}\left({x}_{i}\in {S}_{1}\cap {S}_{2}\right)=-1$, ${x}_{i}\left({x}_{i}\in {S}_{2}\backslash {S}_{1}\right)=1$, ${x}_{i}\left({x}_{i}\in \bar{{S}_{2}}\right)=0$, and $h=\left|{S}_{2}\backslash {S}_{1}\right|$, then we define the Hamming distance K for the ILP calculation as the sum of x_i in S₂, and that can be written as

$$K=-\left|{CMDS}\right|-\left(\left|{MDS}\right|-\left|{CMDS}\right|-k\right)+h=2h-\left|{MDS}\right|.$$

(18)

By using the obtained n (as shown above), we can calculate the initial value of the Hamming distance K for the ILP calculation, that is

$$K=2n-\left|{MDS}\right|.$$

(19)

Then, by using the K obtained above, the ILP formulation in step (4) is given by the following Eq. 20.

Minimize

$$\sum _{v\in V}{x}_{v}$$

Subject to

$$\begin{array}{c}{x}_{u}+\sum _{(v,u)\in E}{x}_{v}\ge 1\, \forall u\in V,\\ {x}_{v}=1\,\forall v\in CMDS,\\ {x}_{v}=0\,\forall v\in RMDS,\\ -\sum _{v\in S}{x}_{v}+\sum _{v\in \bar{S}}{x}_{v}\ge K\,\forall S\in MDSs,\\ {x}_{v}\in \{0,1\}\,\forall v\in V.\end{array}$$

(20)

In addition, as the MDSs include the solutions of all calculated MDS, the constraint equation related to Hamming distance K (see Eq. 20) requires ${\rm{|}}{MDSs|}$ equations. Note that the number of these equations increases proportionally to the number of MDS solutions.

By updating the constraint equation in step (9), the definition of the Hamming distance K ($K=2h-\left|{MDS}\right|$) for the ILP calculation indicates that if $h=\left|{S}_{2}\backslash {S}_{1}\right|$ changes by 1, the Hamming distance K changes by 2, therefore the Hamming distance K is updated as K = K-2 for each round (see Eq. 20).

The computation to determine the difference in criticality mentioned in step (8) is explained below.

When $\left|{IMDS}\right|=I$, the criticality of v_i computed at the Hamming distance K is expressed as ${{CR}}_{i}^{K}$. Then, the difference of criticality ${{dCR}}_{K,K-2}$ between K and K-2 is calculated by using Eq. 21.

$${{dCR}}_{K,K-2}=\sqrt{\frac{1}{I}\mathop{\sum }\limits_{i=1{\rm{;}}{v}_{i}\in {IMDS}}^{N}{\left({{CR}}_{i}^{K}-{{CR}}_{i}^{K-2}\right)}^{2}}$$

(21)

While the critical and redundant nodes have a criticality score of 1 and 0, respectively, the intermittent nodes are the only ones that may change their criticality score between K and K-2. Therefore, in Eq. 21, ${{dCR}}_{K,K-2}$ is computed using only intermittent nodes.

Regarding the specific conditions mentioned in step (10), the algorithm is terminated when ${{dCR}}_{K,K-2}$ is below the pre-defined termination threshold θ. The value of θ is a free parameter that can be set up in the algorithm before computation starts. By decreasing this value, we can increase the precision of the computed criticality.

When $\left|{IMDS}\right|\ne 0$ ($\left|{IMDS}\right|=0$) because the minimum value of $h=\left|{S}_{2}\backslash {S}_{1}\right|$ is 1 (0), from the definition of Hamming distance K ($K=2h-\left|{MDS}\right|$), the calculation is terminated when $K=2-\left|{MDS}\right|$ ($K=-\left|{MDS}\right|$), respectively. Therefore, by combining the above two cases, the termination condition can be written as $K\le 2-\left|{MDS}\right|$ (see Fig. 1d).

In this algorithm, the Hamming distance K takes its maximum value at the beginning of the computation. Therefore, it is possible to start the MDS calculation from the most constrained case where intermittent nodes in the MDS solutions must be interchanged. By gradually relaxing the constraint of the number of interchanges, it is possible to efficiently discriminate between nodes included in many MDS solutions (high criticality) and those included only in a few MDS solutions (low criticality).

Evaluation methods for the high criticality proteins

Enrichment analysis

To investigate the relationship between high criticality (HCR) proteins and disease-related proteins, the enrichment score was computed. The enrichment score indicates how abundant these proteins are in the set of high criticality proteins. The computation method is described as follows:

First, the ratio f^D indicates the fraction between the number of the disease-associated proteins N^D and the total number of proteins N constituting the network:

$${f}^{D}=\frac{{N}^{D}}{N}$$

(22)

The ratio ${f}_{{HCR}}^{D}$ indicates the fractions between the number of each disease-related protein in the high criticality category ${N}_{{HCR}}^{D}$ and the number of high criticality proteins N_HCR in the network:

$${f}_{{HCR}}^{D}=\frac{{N}_{{HCR}}^{D}}{{N}_{{HCR}}}$$

(23)

From above equations, we can define the enrichment ${E}_{{HCR}}^{D}$ of a set of proteins (related to diseases or other biological function of interest in general) in the set of high criticality proteins as follows:

$${E}_{{HCR}}^{D}={\log }_{2}\left(\frac{{f}_{{HCR}}^{D}}{{f}^{D}}\right)$$

(24)

The statistical significance of these values was calculated by Fisher’s exact probability test (two-tailed test). In this study, the relationship between high criticality proteins and each disease-related protein is considered to be statistically significant when the p-value calculated by the test is less than 0.05.

The same method was used to calculate the enrichment of phosphorylation mechanisms-associated proteins as well as for the enrichment of each functional neuron category in the set of high criticality proteins in the C. elegans organism.

The enrichment for different control categories was also computed with this method replacing the set of high criticality by those of each control category such as critical, intermittent and redundant. Note that the analysis classified the intermittent set into high criticality and low critical sets based on the criticality score (see Eq. 1) above or below 0.5, respectively.

Assessment of high criticality nodes in control of network module

In order to verify that the high criticality nodes have stronger control properties on the COVID inflammation module, the metrics described below were calculated. In each following equation, S_M represents the set nodes included in the module disease, and S_HCR represents the set of identified high criticality nodes.

$$< d > =\frac{1}{\left|{S}_{{HCR}}\right|\times \left|{S}_{M}\right|}\sum _{{v}_{j}\in {S}_{{HCR}}}\sum _{{v}_{i}\in {S}_{M}}d\left({v}_{j},{v}_{i}\right)$$

(25)

In Eq. 25, $d\left({v}_{j},{v}_{i}\right)$ represents the shortest path distance from nodes v_j to v_i.

The shortest path distance means the minimum number of edges that must be passed from the initial node to the destination node. Therefore, <d> means the average of the sum of all shortest path distances from each high criticality node to each module node. Thus, from Eq. 8, the closer this value is to 1.0, the deeper the structural association between high criticality and the disease module is in the network.

Next, Eqs. 26 and 27 show the metrics for links and cov, respectively.

$${links}=\sum _{{v}_{i}\in {S}_{M}}\left|\left\{{v}_{j}\in {S}_{{HCR}}{\rm{|}}\left({v}_{j},{v}_{i}\right)\in E\right\}\right|$$

(26)

$$\mathrm{cov}=\frac{1}{\left|{S}_{M}\right|}\sum _{{v}_{i}\in {S}_{M}}\min \left(1,\left|\left\{{v}_{j}\in {S}_{{HCR}}{\rm{|}}\left({v}_{j},{v}_{i}\right)\in E\right\}\right|\right)$$

(27)

where E indicates the set of directed edges.

The links means the total number of all directed edges outgoing from any high criticality node and incoming to any node in the module. The cov means the fraction of nodes in the module that have an incoming edge from high criticality nodes (that is, the module coverage ratio by the high critical nodes). From both equations, the higher the value of links indicates that the high criticality nodes have a greater impact on the module, and the closer the value of cov is to 1.0, the more effective the control of the high criticality nodes is in controlling the entire module.

After calculating the metric scores for the high criticality nodes (red arrows in Fig. 9), to assess the randomness impact and to evaluate the statistical significance of the results, we performed a statistical analysis as follows. We constructed sets of nodes with same size of the number of observed high criticality nodes. Each set of nodes was randomly selected from the entire network and their metric values were calculated. This random selection was performed 10,000 times. The results are shown (see Fig. 9) as histogram plots of the distribution of the calculated metric values in the random samples and a comparison with the metric values of the observed high criticality nodes (red arrows).

Network centrality metrics

To compute the closeness, betweenness and Page Rank network metrics we used the Networkx python package version: 2.8.4.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data used in this study is publicly available from databases^25,35 and from previous publications (Supplemental Material section) referenced in the text^8,29,33: https://doi.org/10.1073/pnas.1603992113⁸, https://doi.org/10.1126/science.1257601²⁹, https://doi.org/10.3390/cells10092242³³. The list of the identified intermittent proteins and neurons nodes with high criticality is shown in the Supplementary Table 1 (Excel file) that accompanies this manuscript. Data that support the tables and figures of this study are available from the corresponding author upon request.

Code availability

The custom code for the developed algorithm used in the analysis is publicly available from https://github.com/wataru-s1/criticality.

References

Liu, Y.-Y., Slotine, J.-J. & Barabási, A.-L. Controllability of complex networks. Nature 473, 167–173 (2011).
Article CAS PubMed Google Scholar
Nacher, J. C. & Akutsu, T. Dominating scale-free networks with variable scaling exponent: heterogeneous networks are not difficult to control. N. J. Phys. 14, 073005 (2012).
Article Google Scholar
Wuchty, S. Controllability in protein interaction networks. Proc. Natl Acad. Sci. USA 111, 7156–7160 (2014).
Article CAS PubMed PubMed Central Google Scholar
Zhang, X.-F., Ou-Yang, L., Zhu, Y., Wu, M.-Y. & Dai, D.-Q. Determining minimum set of driver nodes in protein-protein interaction networks. BMC Bioinforma. 16, 146 (2015).
Article Google Scholar
Basler, G., Nikoloski, Z., Larhlimi, A., Barabási, A.-L. & Liu, Y.-Y. Control of fluxes in metabolic networks. Genome Res. 26, 956–968 (2016).
Article CAS PubMed PubMed Central Google Scholar
Ishitsuka, M., Akutsu, T. & Nacher, J. C. Critical controllability in proteome-wide protein interaction network integrating transcriptome. Sci. Rep. 6, 23541 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wuchty, S. et al. Proteome data improves protein function prediction in the interactome of helicobacter pylori. J. Mol. Cell. Proteom. 17, 961 (2018).
Article CAS Google Scholar
Vinayagama, A. et al. Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets. Proc. Natl Acad. Sci. USA 113, 4976–4981 (2016).
Article Google Scholar
Wakai, K., Ishitsuka, M., Kishimoto, T., Ochiai, T. & Nacher, J. C. Identification of genes and critical control proteins associated with inflammatory breast cancer using network controllability. PLoS ONE 12, e0186353 (2017).
Article PubMed PubMed Central Google Scholar
Guo, W.-F. et al. A novel network control model for identifying personalized driver genes in cancer. PLoS Comput. Biol. 15, e1007520 (2019).
Article PubMed PubMed Central Google Scholar
Schwartz, J. M., Otokuni, H., Akutsu, T. & Nacher, J. C. Probabilistic controllability approach to metabolic fluxes in normal and cancer tissues. Nat. Commun. 10, 2725 (2019).
Article PubMed PubMed Central Google Scholar
Pan, C. et al. Control analysis of protein-protein interaction network reveals potential regulatory targets for MYCN. Front. Oncol. 11, 633579 (2021).
Article CAS PubMed PubMed Central Google Scholar
Sun, P. G. Co-controllability of drug-disease-gene network. N. J. Phys. 17, 085009 (2015).
Article Google Scholar
Kagami, H., Akutsu, T., Maegawa, S., Hosokawa, H. & Nacher, J. C. Determining associations between human diseases and non-coding RNAs with critical roles in network control. Sci. Rep. 5, 14577 (2015).
Article CAS PubMed PubMed Central Google Scholar
Goodacre, N., Devkota, P., Bae, E., Wuchty, S. & Uetz, P. Protein-protein interactions of human viruses. Semin. Cell Dev. Biol. 99, 31 (2020).
Article CAS PubMed Google Scholar
Ravindran, V. et al. Network controllability analysis of intracellular signalling reveals viruses are actively controlling molecular systems. Sci. Rep. 9, 2066 (2019).
Article PubMed PubMed Central Google Scholar
Yan, G. et al. Network control principles predict neuron function in the Caenorhabditis elegans connectome. Nature 550, 519–523 (2017).
Article CAS PubMed PubMed Central Google Scholar
Lee, B., Kang, U., Chang, H. & Cho, K.-H. The hidden control architecture of complex brain networks. iScience 13, 54–162 (2019).
Article Google Scholar
Mochizuki, A., Fiedler, B., Kurosawa, G. & Saito, D. Dynamics and control at feedback vertex sets. II: A faithful monitor to determine the diversity of molecular activities in regulatory networks. J. Theor. Biol. 335, 130–146 (2013).
Article PubMed Google Scholar
Jia, T. et al. Emergence of bimodality in controlling complex networks. Nat. Commun. 4, 2002 (2013).
Article PubMed Google Scholar
Ishitsuka, M., Akutsu, T. & Nacher, J. C. Critical controllability analysis of directed biological networks using efficient graph reduction. Sci. Rep. 7, 14361 (2017).
Article PubMed PubMed Central Google Scholar
Bao, Y. et al. Analysis of critical and redundant vertices in controlling directed biological networks using feedback vertex sets. J. Comput. Biol. 25, 1071–1090 (2018).
Article CAS PubMed Google Scholar
Jia, T. & Barabási, A.-L. Control capacity and a random sampling method in exploring controllability of complex networks. Sci. Rep. 3, 1–6 (2013).
Article Google Scholar
Lin, C.-T. Structural controllability. IEEE Trans. Autom. Control 19, 201–208 (1974).
Article Google Scholar
Fazekas, D. et al. SignaLink 2–a signaling pathway resource with multi-layered regulatory networks. BMC Syst. Biol. 7, 7 (2013).
Article PubMed PubMed Central Google Scholar
Woodsmith, J., Kamburov, A. & Stelzl, U. Dual coordination of post translational modifications in human protein networks. PLOS Comput. Biol. 9, e1002933 (2013).
Article CAS PubMed PubMed Central Google Scholar
Beck, P. V. et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261–D270 (2012).
Article Google Scholar
Higgins, M. E., Claremont, M., Major, J. E., Sander, C. & Lash, A. E. Cancer Genes: A gene selection resource for cancer genome projects. Nucleic Acids Res. 35, D721–D726 (2007).
Article CAS PubMed Google Scholar
Menche, J. et al. Uncovering disease-disease relationships through the incomplete interactome. Science 347, 6224 (2015).
Article Google Scholar
McDonell, L. M., Kernohan, K. D., Boycott, K. M. & Sawyer, S. L. Receptor tyrosine kinase mutations in developmental syndromes and cancer: two sides of the same coin. Hum. Mol. Genet. 24, R60–R66 (2015).
Article CAS PubMed PubMed Central Google Scholar
Heliste, J. et al. Receptor tyrosine kinase profiling of ischemic heart identifies ROR1 as a potential therapeutic target. BMC Cardiovasc. Disord. 18, 196 (2018).
Article CAS PubMed PubMed Central Google Scholar
Batchu, S. N. & Korshunov, V. A. Novel tyrosine kinase signaling pathways: Implications in vascular remodeling. Curr. Opin. Nephrol. Hypertens. 21, 122–127 (2012).
Article CAS PubMed PubMed Central Google Scholar
Olbei, M. et al. CytokineLink: A cytokine communication map to analyse immune responses- case studies in inflammatory bowel disease and Covid-19. Cells 10, 2242 (2021).
Article CAS PubMed PubMed Central Google Scholar
Varshney, L. R., Chen, B. L., Paniaqua, E., Hall, D. H. & Chklovskii, D. B. Structural properties of the C. elegans neuronal network. PLoS Comput. Biol. 3, e1001066 (2011).
Article Google Scholar
Altun, Z. F. et al. (ed.s) 2002-2023. WormAtlas: https://www.wormatlas.org/neuronalwiring.html.

Download references

Acknowledgements

T.A. was partially supported by JSPS KAKENHI Grant Numbers 22H00532 and 22K19830. J.C.N. was partially supported by JSPS KAKENHI Grant Number 18K11535. This research was also supported in part by the Research Collaboration Projects of the Institute for Chemical Research, Kyoto University.

Author information

Authors and Affiliations

Department of Information Science, Faculty of Science, Toho University, Funabashi, Chiba, 274-8510, Japan
Wataru Someya & Jose C. Nacher
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Uji, 611-0011, Japan
Tatsuya Akutsu
School of Biological Sciences, University of Manchester, Manchester, M13 9PT, UK
Jean-Marc Schwartz

Authors

Wataru Someya
View author publications
You can also search for this author in PubMed Google Scholar
Tatsuya Akutsu
View author publications
You can also search for this author in PubMed Google Scholar
Jean-Marc Schwartz
View author publications
You can also search for this author in PubMed Google Scholar
Jose C. Nacher
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

W.S. and J.C.N. conducted the principal research and developed the main algorithm. W.S. analysed the empirical datasets, implemented the algorithms and prepared figures. J.C.N. and T.A. designed research, contributed new analytical and theoretical tools and wrote the paper. J.-M.S. contributed to biological discussion and data interpretation. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Jose C. Nacher.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

Supplementary Table 1

Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Someya, W., Akutsu, T., Schwartz, JM. et al. Measuring criticality in control of complex biological networks. npj Syst Biol Appl 10, 9 (2024). https://doi.org/10.1038/s41540-024-00333-9

Download citation

Received: 02 March 2023
Accepted: 04 January 2024
Published: 20 January 2024
DOI: https://doi.org/10.1038/s41540-024-00333-9

Subjects

Abstract

Similar content being viewed by others

Sustaining a network by controlling a fraction of nodes

Input node placement restricting the longest control chain in controllability of complex networks

On quantification and maximization of information transfer in network dynamical systems

Introduction

Results

Overview of the proposed criticality algorithm in MDS

Algorithm performance using artificially generated scale-free networks

High criticality proteins in RTK signalling pathways are associated with phosphorylation and cancer

High criticality proteins in RTK signalling pathways are also associated with specific diseases

Key cytokines computed from cell interactions data in Covid-19 patients have high criticality

Motor neurons in the C. elegans connectome tend to have high criticality

Discussion

Methods

Hamming distance between minimum dominating sets

Efficient algorithm for computing criticality in networks

Evaluation methods for the high criticality proteins

Enrichment analysis

Assessment of high criticality nodes in control of network module

Network centrality metrics

Reporting summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary Material

Supplementary Table 1

Reporting summary

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links