Network controllability analysis of intracellular signalling reveals viruses are actively controlling molecular systems

In recent years control theory has been applied to biological systems with the aim of identifying the minimum set of molecular interactions that can drive the network to a required state. However, in an intra-cellular network it is unclear how control can be achieved in practice. To address this limitation we use viral infection, specifically human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV), as a paradigm to model control of an infected cell. Using a large human signalling network comprised of over 6000 human proteins and more than 34000 directed interactions, we compared two states: normal/uninfected and infected. Our network controllability analysis demonstrates how a virus efficiently brings the dynamically organised host system into its control by mostly targeting existing critical control nodes, requiring fewer nodes than in the uninfected network. The lower number of control nodes is presumably to optimise exploitation of specific sub-systems needed for virus replication and/or involved in the host response to infection. Viral infection of the human system also permits discrimination between available network-control models, which demonstrates that the minimum dominating set (MDS) method better accounts for how the biological information and signals are organised during infection by identifying most viral proteins as critical driver nodes compared to the maximum matching (MM) method. Furthermore, the host driver nodes identified by MDS are distributed throughout the pathways enabling effective control of the cell via the high ‘control centrality’ of the viral and targeted host nodes. Our results demonstrate that control theory gives a more complete and dynamic understanding of virus exploitation of the host system when compared with previous analyses limited to static single-state networks.

about host-cell subsystems that are perturbed during infection and to investigate approved drug targets [10][11][12] . It is clear that the specific intra-cellular functions required by a virus to replicate and maintain an infection are the important focus for the virus, and the individual molecular interactions with the host system are just a means to this end 13 . Linked to networks, graph theory is widely used as a model to describe and visualize perturbation in host cellular systems at the molecular level. For example, the tendency of 'hubs' (highly connecting proteins) and high centrality proteins, to be targeted by viruses has been highlighted 8,14,15 . However, in some cases these can be explained by the over-representation of highly-connected molecules in the host function being used 10 . The majority of these studies analyse and visualise the virus-host relationships in static networks, i.e., all of the interactions represented as active, disregarding the temporal and spatial nature of infection. In reality the virus must interact with a complex non-linear dynamic host system. Study of the dynamic nature of infection tends to be limited to specific sub-cellular systems, for example, using logical models in the context of T-cell signalling provided an approach that dynamically modelled the host-viral interactions to identify potential drug targets 16 .
Control theory has emerged as a mathematical framework for understanding how best to control an engineered system, and has largely been applied to study complex dynamic networks and identify ways to control network behaviour [17][18][19][20][21] . The aim is to identify the minimum number of inputs, termed 'driver' nodes, that can steer the system from any initial state to any final state in finite time. Past studies of network controllability have identified the driver proteins to be associated with human diseases like cancer [22][23][24][25][26] and other molecular interaction networks [27][28][29][30] .
Viral infection is unique as a system for the study of the applicability of controllability to a natural system, as the virus makes many interactions with the host system and is explicitly exploiting host functions. Past studies have explored the use of control theory in biology. However, in these systems only a few molecules are involved in any instance, e.g., in the case of disease or drug therapy based studies, and so studying all known disease-associated molecules or all drug targets at the same time has little biological meaning 23,31 . In studies where viruses have been investigated explicitly the virus-host interactions have not been included in the network 27,28 , and so control theory has only been partially applied to the system and the effect of inclusion of viral proteins has not been modelled.
HIV-1 and HCV remain major causes of infection worldwide. An enormous amount of research has been carried out on HIV-1, leading to a very detailed understanding of the virus and its use of host systems 3,6 , while studies of HCV have so far identified fewer protein-host interactions 7,32 . The primary goal of studying these viruses is to obtain a detailed and coherent understanding of infection and viral replication. Critical to this goal is an understanding of viral-host interactions, the host's immune response and the anti-viral mechanisms that play key roles in combating infection.
In this work, we have modelled HIV-1 and HCV host interactions as directed PPI networks both with and without the virus-host interactions included. The signalling network represents biological pathways and is directed; it therefore shows the ordered 'flow' of information through the network and so captures some of the network's dynamics. We examine these networks from a controllability perspective and test whether viruses follow the principles of control theory during 'hijack' of a host system (Fig. 1). This study of viral infection from a controllability perspective permits, for the first time, testing of the applicability of this mathematical framework to intra-cellular networks and discrimination between available control models (maximum matching versus minimum dominating set 18,19 ) using a directed network in two states: normal/uninfected and infected. Our results lead to a novel understanding of the infection mechanism that goes beyond analyses limited to single-state networks, demonstrating the applicability of control theory to the study of infection and validating its use in the study of intra-cellular networks.

Results
Identification of driver nodes in the uninfected network. We first identified the driver nodes without the presence of virus in a large directed signalling network comprised of 6339 nodes and 34814 interactions, data from Vinayagam et al. 27 . In this network the nodes represent proteins and the edges/links represent directed interactions between them (Supplementary Data File 1). The minimum number of driver nodes (N D ) were identified and compared using two established models (i) minimum dominating set (MDS) 19 and (ii) maximum matching (MM) 18 (Fig. 2, Methods). This analysis classified 1398 (22%) of the nodes as driver nodes based on the MDS method, compared to 2282 (36%) of them classified by MM ( Table 1). The ease with which one can control a network is determined by the minimum number of driver nodes (N D ). The lower the number of driver nodes, the easier it is to control the network. Interestingly, the MDS method identified fewer driver nodes compared to MM.
For a given graph, there can be more than one MM or MDS set and hence there can be multiple minimum driver node sets, all with the same size N D . We therefore determined the role of each node as a driver node and classified it as 'critical' if it was present in all driver node sets, 'intermittent' if it was present in some but not all driver node sets and 'redundant' if it was never part of any driver node set ( Fig. 2, Methods). We find that 14% of the nodes were critical in MDS, while only 6% of the nodes were classified as critical using the MM model. Similarly, MDS classified more nodes as redundant (67%) compared to MM (42%). On the other hand, the MM model classified most of the nodes as intermittent (53%) compared to MDS (20%) ( Table 1). Since MDS classified fewer nodes in total as critical or intermittent, fewer overall driver nodes are required to control the network with MDS. MDS thus better reflects optimal control of the signalling network by identifying fewer driver nodes compared to MM.
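The critical/intermittent/redundant classification can be illustrated on a toy directed graph. The sketch below is only an illustration under stated assumptions: it enumerates every minimum dominating set by brute force, which is feasible only for a handful of nodes and is not the algorithm used for the 6339-node network. The function names and the four-node graph are hypothetical.

```python
from itertools import combinations

def is_dominating(nodes, edges, chosen):
    """True if every node is in `chosen` or has an incoming edge from `chosen`."""
    covered = set(chosen)
    covered.update(v for u, v in edges if u in chosen)
    return covered >= set(nodes)

def all_minimum_dominating_sets(nodes, edges):
    """Brute-force enumeration of all minimum dominating sets (toy graphs only)."""
    for size in range(1, len(nodes) + 1):
        found = [set(c) for c in combinations(nodes, size)
                 if is_dominating(nodes, edges, set(c))]
        if found:
            return found
    return []

def classify_nodes(nodes, edges):
    """Label each node by how often it appears across all minimum dominating sets."""
    sets = all_minimum_dominating_sets(nodes, edges)
    roles = {}
    for v in nodes:
        hits = sum(v in s for s in sets)
        if hits == len(sets):
            roles[v] = "critical"       # present in every MDS
        elif hits == 0:
            roles[v] = "redundant"      # present in no MDS
        else:
            roles[v] = "intermittent"   # present in some MDS only
    return roles

# Hypothetical toy network: a -> b, and a two-node cycle c <-> d.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("c", "d"), ("d", "c")]
print(classify_nodes(toy_nodes, toy_edges))
```

In this toy graph the two minimum dominating sets are {a, c} and {a, d}, so `a` (which has no incoming edge) is critical, `c` and `d` are intermittent and `b` is redundant.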
The driver nodes identified by MM have low degree; in particular, the critical nodes have zero in-degree, corresponding to receptors on the cell with which extracellular molecules (ligands) interact in order to convey signals to the cell via the signalling pathways. MM only identifies receptors as the driver nodes because this model of control tends to prioritize as drivers those nodes at the beginning of long linear chains, which are controlled externally. The internal nodes of these chains are controlled internally through maximum matched links, whose couplings are consistent with linear response systems as described by the linear mathematical equations shown in Liu et al. 18 . Molecular signalling systems, however, are inherently non-linear. Thus, the MM model only detects the initial nodes, such as receptors, as the driver nodes, a result also observed in other signalling network studies 27,31 .
In contrast, the MDS method does not require structural controllability: here, a single integrator node that receives a unique signal from an input link is controllable ( Fig. 1B).
In order to assess the roles of proteins in the context of cell signalling, we classified them as signalling proteins, kinases, receptors or transcription factors 27 . In total the proteins were classified into 1006 signalling proteins, 545 receptors, 366 kinases and 1150 transcription factors (Supplementary Data File 2). We observed that the critical driver nodes obtained through the MDS model played diverse roles in signalling processes and were highly enriched for receptors, consistent with MM (see Supplementary Material, Tables S1 and S2). MDS also showed enrichment for signalling proteins and kinases; the redundant nodes were enriched for transcription factors while the intermittent nodes showed no distinct enrichment. The dynamics underlying biological systems are non-linear, as their components interact and respond robustly to external and internal cues. In a signalling network, different proteins interact with each other and influence the signal information. Abnormal signal transduction triggers aberrant biological processes that might result in disease. The MDS model not only identified fewer driver nodes, but also identified driver proteins at different levels in the directed network, better reflecting the biological reality of both up- and downstream proteins being vital in the control of pathways.
HIV-1 targeted driver nodes in the uninfected network. Next we looked at the association of virus proteins with driver nodes to identify if they are preferentially targeted by the virus. Out of 6339 proteins in this human directed PPI network, 2529 nodes have been reported to be interacting with HIV-1 from HHID 6 (See Methods). Of the different MDS driver node classes that interact with an HIV-1 protein we observed that, compared to random samples, 50% of the critical driver nodes were significantly interacting with HIV-1 (Z-score = 6.50) ( Table 2), rather than intermittent and redundant nodes, indicating that critical driver nodes are enriched in the virus-host interaction set.
To gain more insight into the role of driver nodes and their preference as viral targets, we propose a novel metric for the MDS model: the control centrality (CC) metric, which measures the 'power' of a node in controlling other nodes, i.e., the number of controllable nodes in the network. For MDS, the control centrality of a node v is $k_{\mathrm{out}} + 1$, in which $k_{\mathrm{out}}$ denotes the node out-degree. We used this method to compute the control centrality metric for critical, intermittent and redundant nodes (Table 3). For comparison, we averaged this measure and observed that the control centrality of critical driver nodes was higher than that of intermittent and redundant nodes. The facts that critical driver nodes in the uninfected network were predominantly receptors (45%) and that they have higher power based on CC explain their preference as targets by HIV-1 for viral entry and modulation of key host functions. While there is an overall decrease in the number of critical driver nodes identified by both of the models, intermittent and redundant nodes exhibited almost no change between the uninfected and infected states (Table 1). This shows that the virus does not target all of the driver nodes, as it is not necessarily important for the virus to control the entire network. Importantly, it is only for MDS that the majority of HIV-1 nodes are critical driver nodes (12 of 22), while for MM only 1 of 22 has this property (Table 1). These 12 HIV-1 proteins are: Nef, gp120, Tat, Pr55(gag), capsid, Rev, gp41, Vpr, Vif, retropepsin, Vpu and p51. In addition, the critical nodes identified by MM tend to be peripheral in the network as they correspond to the zero in-degree nodes (Fig. 3). Collectively these results confirm that MDS matches the viral control of the system much more accurately than MM, as it identifies the viral proteins as driver nodes along with both up- and downstream host proteins in the signalling network.
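As a minimal sketch of the control centrality calculation, the snippet below computes out-degree plus one for every node and averages it per driver-node class. The function names, the toy edge list and the role labels are hypothetical and only serve to show the arithmetic.

```python
from collections import defaultdict

def control_centrality(nodes, edges):
    """MDS control centrality of each node: out-degree plus one."""
    k_out = {v: 0 for v in nodes}
    for u, _ in set(edges):          # ignore duplicated edges
        k_out[u] += 1
    return {v: k_out[v] + 1 for v in nodes}

def mean_cc_by_class(cc, roles):
    """Average control centrality within each driver-node class."""
    grouped = defaultdict(list)
    for v, role in roles.items():
        grouped[role].append(cc[v])
    return {role: sum(vals) / len(vals) for role, vals in grouped.items()}

# Hypothetical toy network and role labels.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("c", "d"), ("d", "c")]
toy_roles = {"a": "critical", "b": "redundant",
             "c": "intermittent", "d": "intermittent"}

cc = control_centrality(toy_nodes, toy_edges)
print(cc)
print(mean_cc_by_class(cc, toy_roles))
```

A node with no outgoing edges still has a control centrality of 1, reflecting that a driver node controls at least itself.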

Identification of driver nodes in the HIV
These results indicate that inclusion of the virus has increased the controllability of the network, i.e., the virus set of interactions increases the total number of interactions facilitating the hijacking of the cell, and controlling the network more efficiently as fewer driver nodes are required for control. Further, the control centrality analysis for the HIV-1 infected network indicates that HIV-1 molecules are an order of magnitude more 'powerful' than host molecules (Table 3).
Table 3. Control Centrality (CC) metric of MDS driver nodes.
In the HIV-1 infected network, the MDS model identified 42 nodes as new critical control nodes, while 646 nodes were 'preserved' critical nodes, i.e., they retained their driver node status even after HIV-1 infection (Supplementary Material, Figure S1A, and Supplementary Data File 3). Again, this indicates that the controllability model is capturing the biological signal in terms of the virus mechanism of control.
We also looked at those MDS critical driver nodes that HIV-1 interacts with. 438 critical driver nodes were HIV-1 targets in the uninfected network, which reduced to 271 in the infected network (Supplementary Material, Figure S1B). This marked change is presumably due to the inclusion of the HIV-1 host interactions causing many host molecules to lose their critical driver node status. This indicates the presence of virus is fundamentally changing the controllability of the host system. Interestingly, after infection 248 HIV-1 target proteins still preserved their critical driver node status (Supplementary Material, Figure S1B and Supplementary Data File 3). These 248 critical driver nodes are not just important from a control point of view but are also the nodes that are HIV-1 targets (206 of 2087 virus-host interactions) or directed at HIV-1 (109 of 679 host-virus interactions) and could act as potential drug/intervention targets.
We next investigated the biological properties of the preserved critical driver nodes interacting with HIV-1 by submitting them to the Reactome Pathway database 33 for enrichment analysis. These critical nodes are enriched for 1341 pathways in the Reactome database. As expected, given the nature of the data, the vast majority were active in signal transduction, followed by the immune system. Interestingly, the critical driver nodes seem to play roles in many other cellular subsystems, ranging from cell cycle to developmental biology, disease, programmed cell death and matrix organisation among others, though not as prominently as in the first two (Table 4). Using the preserved critical proteins, the enriched pathways were analysed to ascertain the top 10 for each set of proteins ( Table 4). The list was curated using the FDR, the p-value for significance, the ratio of hit proteins in the pathway and the similarity percentage between proteins within the enriched systems. Not surprisingly given HIV-1's life cycle, the top five pathways are specific to the immune system, though most of them also have moderate overlap with RAF/MAP kinase cascades in signal transduction. The next three pathways exclusively deal with signal transduction, followed by programmed cell death, and the last pathway deals with gene expression, transcription in particular. These analyses ascertain the different pathways that the preserved critical driver nodes are involved in and demonstrate that the MDS model is a useful method to identify the different intra-cellular pathways that HIV-1 interacts with.
Control robustness: before and after infection. To analyse the robustness to control of the signalling network we classified each node into one of the following three categories 27 : (1) 'indispensable' , i.e., positive control factor, if we have to control more driver nodes in its absence; (2) 'dispensable' , i.e., negative control factor, if we have to control fewer driver nodes in its absence; and (3) 'neutral' control factor if in its absence there is no change in the number of driver nodes to be controlled (Supplementary Data File 2).
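The three control-factor categories can be sketched for the MDS model by removing one node at a time and recomputing the minimum number of driver nodes. This is an illustrative brute-force version under stated assumptions: the helper names and the toy graph are hypothetical, and exhaustive search is only practical for very small graphs.

```python
from itertools import combinations

def mds_size(nodes, edges):
    """Size of a minimum dominating set, found by exhaustive search."""
    nodes = list(nodes)
    for size in range(1, len(nodes) + 1):
        for cand in combinations(nodes, size):
            covered = set(cand) | {v for u, v in edges if u in cand}
            if covered >= set(nodes):
                return size
    return 0  # empty graph

def control_factor(nodes, edges, v):
    """Compare N_D before and after deleting node v together with its edges."""
    before = mds_size(nodes, edges)
    rest_nodes = [n for n in nodes if n != v]
    rest_edges = [(a, b) for a, b in edges if v not in (a, b)]
    after = mds_size(rest_nodes, rest_edges)
    if after > before:
        return "indispensable"   # more driver nodes needed without v
    if after < before:
        return "dispensable"     # fewer driver nodes needed without v
    return "neutral"             # no change in the number of driver nodes

# Hypothetical toy network: hub h points to x, y, z; w is isolated.
toy_nodes = ["h", "x", "y", "z", "w"]
toy_edges = [("h", "x"), ("h", "y"), ("h", "z")]
for node in toy_nodes:
    print(node, control_factor(toy_nodes, toy_edges, node))
```

Removing the hub `h` forces each of its former targets to become a driver node, removing the isolated node `w` saves one driver node, and removing a leaf such as `x` changes nothing.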
Nodes were categorised and compared based on this classification in both networks using MDS and MM, and the minimum number of driver nodes (N D ) was calculated. Interestingly, MDS classified a much smaller number of nodes as indispensable (503) or dispensable (770) compared to MM (1330 and 2347, respectively), with twice the number of neutral nodes identified by MDS compared to MM (Table 5). This demonstrates that MDS performs more efficiently than MM. A similar trend was seen in the HIV-1 infected network, with a smaller number of nodes classified by MDS as indispensable (397 human and 11 HIV-1) or dispensable (719 human and 3 HIV-1) compared to MM (1331 human and 19 HIV-1 proteins versus 2346 human and 1 HIV-1). In addition, on comparing the difference in node characterisation between the two states (normal/uninfected and infected), the number of indispensable nodes fell by about 20% to 397 for the MDS model but showed no change for the MM model (Table 5). Again these results indicate that MDS is a more appropriate model of control for the infected network than MM, as this model captured the change in the number of nodes classified as indispensable before and after infection, reflecting the change in control following virus infection.
Comparing the alternative characterisations of nodes obtained from the MDS model to identify the overall node control profile, we find that all of the indispensable nodes were critical and all redundant nodes were neutral in both networks, and we have deduced a mathematical proof for this property of MDS (see Methods). From a control point of view, indispensable nodes not only determine the ease of control, but also are the driver nodes, since the ability to efficiently control a system is determined by the minimum number of driver nodes. Further, from a biological perspective, these nodes were frequently targeted by HIV-1 (62%) compared to other node classes (see Supplementary Material, Table S3).

Controllability analysis of HCV human molecular interactome.
We also performed controllability analysis using the MDS model for an HCV infected network to validate our findings with another virus. HCV has a slightly bigger genome than HIV-1, and its proteome comprises eleven gene products. HCV interaction data were obtained from the HCVpro database 7 . A total of 674 interactions were retrieved and curated by matching host proteins with the directed signalling network 34 . Filtering resulted in 11 HCV gene products forming 389 interactions with the PPI network by targeting 325 proteins (Supplementary Data File 1). The direction of the signal flow was inferred from the trend of the interaction type observed for each HCV gene product in the associated literature found on the HCVpro database, and by comparing the interaction type to the classification system proposed in ref. 15 . The final network consists of 6,350 nodes and 35,202 edges. HCV interacts with 325 proteins, of which 140 were driver nodes and 77 were classified as critical in the uninfected network. In the HCV infected network, 1390 (22%) of the nodes were characterised as driver nodes, and among all nodes, 13% were critical, 20% intermittent and 66% redundant (Table S4). All 11 HCV proteins are among the 852 critical driver nodes. Between the uninfected and HCV-infected networks, a higher number of proteins (836 nodes) share critical driver node status, most likely because HCV interacts with fewer proteins (given available data). 16 proteins become critical after HCV infection, while 38 proteins lose their critical node status (Supplementary Material Figure S2A). We analysed 65 preserved critical driver nodes for pathway enrichment (Supplementary Material Figure S2B, Table S5). Many of the top 10 pathways were similar to those obtained for HIV-1, in particular CD28 co-stimulation, PDGF signalling, signalling by NGF, Fc epsilon receptor signalling and the apoptosis pathway.
Regarding signal transduction pathways, the viruses have only NGF and PDGF signalling in common, whilst the remaining signalling pathways seem to be particular to HCV. Interestingly, about 40 proteins are common virus-host interaction partners shared between HIV-1 and HCV (Supplementary Material, Table S6); of these, 39 are annotated with virus-to-host interactions and 24 with host-to-virus interactions.

Control models on minimum dominating set.
Given that the MDS model more accurately quantifies the viral hijack and control, here we describe the relation between controllability and the minimum dominating set (MDS). Although we only consider directed networks, analogous properties hold for undirected networks if each edge is regarded as two edges with opposite directions. We assume in the MDS model that every node in the MDS can assign an arbitrary value to itself and send arbitrary values to all of its outgoing links separately, at any time point. We refer to this condition as Condition (#1). Intuitively, this condition means that each node can have its own driver signal, which makes control easy. Although this condition might be too strong as a control model of biological systems, it is reasonable for control of artificial systems such as computer networks 19 . Furthermore, several studies showed that many nodes in an MDS are biologically important ones 28,30,35 , which suggests that the MDS may capture some important control properties of biological systems. For linear structural controllability in the sense of the MM model 18 , the following theorem was proved for undirected networks 19 by modifying the network structure and applying the theory of Liu et al. 18 . The same proof can be extended to directed networks 19 .

Theorem 1. Under Condition (#1), a network with linear dynamics is structurally controllable by selecting the nodes in an MDS as driver nodes.

Furthermore, MDS can be applied to controllability analysis of certain kinds of non-linear networks. Here we consider discrete-time systems as a concrete example. Suppose that a given network has n nodes, $x_1, \dots, x_n$. Let $x_i(t)$ denote the state of node $x_i$ at time t. We consider the following dynamics:

$$x_i(t+1) = f_i\big(x_{i_1}(t), \dots, x_{i_k}(t)\big),$$

where $x_{i_1}, \dots, x_{i_k}$ are input nodes to $x_i$ and k depends on i. Let D be a subset of R (D ⊆ R), where R is the set of real numbers. Suppose that each $f_i$ satisfies [...], where this condition is referred to as Condition (#2). Various functions satisfy this condition for appropriate domains D. For example, every linear function depending on all inputs clearly satisfies this condition for D = R. For other examples, the following functions satisfy the condition [...], respectively, where $\alpha_i$, $\beta_i$ and the $a_{i_j}$ are non-zero constants, and the additional condition $(\forall j)(x_{i_j} \neq 0)$ must be satisfied in the former function.

Let U be the set of nodes in an MDS. Let E′ be the set of edges outgoing from U. Under Condition (#1), each $x_i \in U$ has its own control signal $u_i(t)$ and each $(x_j, x_i) \in E'$ has its own control signal $u_{j,i}(t)$. Then, the dynamics of the above system is given by [...]. Under this model, the dynamics of the whole system can be represented as $x(t+1) = F(x(t), u(t))$, where $u(t)$ is a vector consisting of the $u_i(t)$s and $u_{j,i}(t)$s. Let $d_u$ denote the number of dimensions of $u(t)$. We will omit '(t)' from $u(t)$ when t is not relevant.

Proposition 1. Suppose that the system satisfies Conditions (#1) and (#2). Then, for any real vectors $a = (a_1, \dots, a_n) \in R^n$ and $b = (b_1, \dots, b_n) \in D^n$, there exists a $d_u$-dimensional real vector u such that $b = F(a, u)$.

This proposition means that we can drive the system from state a to state b in one step by sending adequate signals to MDS nodes and their outgoing edges.
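The one-step claim can be illustrated for the simplest case mentioned in the text, linear update functions with non-zero coefficients. The controlled update rule used below is an assumption made for illustration: a node in the MDS takes the value of its own signal, and every other node evaluates its linear function with each input arriving from an MDS node replaced by the signal on that edge. The function names, weights and states are hypothetical.

```python
def one_step_control(weights, U, a, b):
    """Choose node and edge signals that move state a to state b in one step.

    weights[i][j] is the non-zero coefficient of input j in the linear f_i.
    U is a dominating set, so every node outside U has an input in U.
    """
    node_signals = {i: b[i] for i in U}            # MDS nodes set themselves
    edge_signals = {}
    for i, inputs in weights.items():
        if i in U:
            continue
        from_U = [j for j in inputs if j in U]
        pivot = from_U[0]                          # exists because U dominates i
        for j in from_U[1:]:
            edge_signals[(j, i)] = 0.0             # arbitrary choice (assumption)
        fixed = sum(w * a[j] for j, w in inputs.items() if j not in U)
        edge_signals[(pivot, i)] = (b[i] - fixed) / inputs[pivot]
    return node_signals, edge_signals

def step(weights, U, a, node_signals, edge_signals):
    """Apply the assumed controlled update rule once."""
    nxt = {}
    for i in a:
        if i in U:
            nxt[i] = node_signals[i]
        else:
            nxt[i] = sum(w * (edge_signals[(j, i)] if j in U else a[j])
                         for j, w in weights[i].items())
    return nxt

# Hypothetical toy system: edges 1->2, 1->3, 2->3, with MDS U = {1}.
weights = {2: {1: 2.0}, 3: {1: 1.0, 2: -0.5}}
U = {1}
a = {1: 0.3, 2: -1.0, 3: 4.0}      # arbitrary start state
b = {1: 1.0, 2: 5.0, 3: -2.0}      # arbitrary target state

signals = one_step_control(weights, U, a, b)
print(step(weights, U, a, *signals))
```

Because every node outside U has at least one input edge carrying a free signal and its coefficient is non-zero, the required signal can always be solved for, which is the intuition behind the proposition in the linear case.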

Discussion
We have compared two network states (uninfected and infected) from a controllability perspective and find HIV-1 and HCV are driver agents in the host system. Importantly, the MDS model more effectively captures the dynamics of viral infection than MM. The performance of MDS validates, for the first time, the applicability of the control theory framework for the study of intra-cellular signalling networks and as a model for studying viral use of host cells. Our results clearly demonstrate the way in which a natural control system (virus exploitation of a host cell) can be used to understand the control of information flow in intra-cellular networks. This hints at the possibility of learning how to synthetically control complex biological systems. In terms of understanding infection, the virus is mainly 'driving' the network by exploiting its usual dynamic organisation, i.e., mostly targeting the existing critical driver nodes, with some of the critical driver nodes representing the response to infection. Interestingly, we demonstrate that indispensable nodes, the positive control factors, are always the MDS critical driver nodes. With the addition of the high-powered viral proteins (as measured by our CC analysis), this control is achieved more efficiently with fewer host molecules acting as critical driver nodes. The MDS proteins, specifically the critical driver nodes, are effectively 'central' molecules in terms of information flow in the system 36 being enriched significantly for proteins that are highly-connected and often multi-functional.
An alternative graph theoretical approach for the study of controllability is the feedback vertex set (FVS) for control 37,38 . For a directed or undirected network, an FVS is a subset of nodes whose removal makes the network acyclic. FVS has been applied to analyses of biological networks. It was shown that singleton attractors in a Boolean network can be enumerated by examining the states of nodes in the minimum size FVS 39 . FVS was then applied to the analysis of Boolean models of metabolic networks 40 . For control of biological systems, Fiedler et al. 37 and Mochizuki et al. 38 showed that, by selecting an FVS as the set of driver nodes, the network can be driven to any statically or periodically steady state under a wide range of dynamical models. The FVS method was shown to be useful for controlling real biological networks 41 as well as to have some advantages over the maximum matching method 42 . Furthermore, the concepts of critical/redundant/intermittent nodes were also applied to the FVS method 43 . However, computation of the minimum size FVS is a well-known NP-hard problem. Although ILP-based methods are known for computing the minimum size FVS 43,44 , they require huge computation time for large-scale networks 43 , unlike the case of MDS. Since our work focused on large-scale PPI networks, we only compared the MM and MDS methods, which address general controllability to arbitrary states.
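The FVS definition for directed networks can be checked mechanically: delete the candidate nodes and test whether the remaining graph is acyclic. The sketch below does this with a topological-sort style peeling of zero in-degree nodes. It only verifies a given candidate set and does not search for a minimum FVS; the function name and toy graph are hypothetical.

```python
def is_feedback_vertex_set(nodes, edges, candidate):
    """True if deleting `candidate` leaves a directed graph without cycles."""
    remaining = set(nodes) - set(candidate)
    indeg = {v: 0 for v in remaining}
    succ = {v: [] for v in remaining}
    for u, v in edges:
        if u in remaining and v in remaining:
            succ[u].append(v)
            indeg[v] += 1
    # Repeatedly peel off nodes with no remaining incoming edge.
    stack = [v for v in remaining if indeg[v] == 0]
    peeled = 0
    while stack:
        u = stack.pop()
        peeled += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    # Every node can be peeled exactly when no cycle is left.
    return peeled == len(remaining)

# Hypothetical toy network: cycle a -> b -> c -> a, plus c -> d.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(is_feedback_vertex_set(toy_nodes, toy_edges, {"a"}))
print(is_feedback_vertex_set(toy_nodes, toy_edges, {"d"}))
```

Removing `a` breaks the only cycle, whereas removing `d` leaves the cycle intact.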
In conclusion, using control theory to analyse a complex signalling network in the context of viral infection has proven to be a useful tool for furthering systems biology research. As the virus is mostly interacting with the host systems to replicate itself, involving the up- and down-regulation of specific host functions, control theory offers an ideal model for the study of this control of information flow, and we believe it opens a new discipline of viral control theory. This has the potential to enhance our ability to interfere with infection; for example, a better understanding of the aberrant functions stimulated as a result of infection will be helpful in treating the side-effects/symptoms of infection. Fully understanding the entanglement of viruses with host systems will be the key to limiting their harmful tendencies.

Methods
Data procurement and network construction. The human directed signalling network was obtained from Vinayagam et al. 27 . It consists of 6,339 proteins and 34,814 interactions. Interaction direction represents potential signal flow between interacting proteins, which was predicted using a Naïve Bayesian Classifier. The classifier assigns a score to each interaction ranging from 0.5 to 1 if there is signal flow, otherwise it assigns a score of 0 34 (Supplementary Data File 1).

HIV-1 infected network.
HIV-1 interactions were obtained from HHID 6 . A total of 15,230 interactions were retrieved and were further curated by ignoring the number of publications, counting each reaction type only once and selecting only those interactions that had shared nodes with the signalling network. In this network 2,529 human proteins interact with HIV-1, of which 308 host proteins are involved in virus replication and infectivity based on siRNA/shRNA-mediated knock-down of human genes in different cell lines. The remaining 2,221 human proteins are assigned directions from virus to host or host to virus, giving a total of 5,811 interactions. The directions of the HIV-host interactions were assigned using the method provided by MacPherson et al. 13 , where each HHID interaction was assigned a direction based on its interaction type. Direction represents whether the virus protein acts upon the host or vice versa. For example, "Nef activates ACHE" would be given a forward direction as the virus protein acts upon the host, whereas "Nef is activated by ACHE" would be attributed a backward direction, since it is the host protein that activates the virus protein 13 .
Driver node identification.
Minimum dominating set method. Driver nodes are identified by calculating the minimum dominating set for a given network. For a graph G(V, E), where V is the set of nodes and E is the set of edges, a subset S ⊆ V is called a dominating set (DS) if every node in V is either an element of S or is adjacent to an element of S. For a directed graph, a node v ∈ V is said to be dominated by a node u ∈ S if there exists a directed edge (u, v) ∈ E; S is then a dominating set if each node in V is either in S or dominated by some node in S. A minimum dominating set (MDS) is a dominating set with the minimum number of nodes. The MDS forms the driver node set.
Since the computation of an MDS is NP-hard, we used integer linear programming (ILP) to compute the MDS by assigning a 0-1 variable to each vertex v, which takes the value 1 if v is part of the MDS and 0 otherwise 19 . A graph can have multiple minimum dominating sets and hence multiple minimum driver node sets with the same size N D . Each node is therefore categorized based on its presence in the driver node sets. If a node is present in all MDSs, it is a critical driver node; if it is present in some but not all MDSs, it is an ordinary/intermittent driver node; and if it is never part of any MDS, it is a redundant/non-driver node. To address critical controllability in the large-scale PPI networks analysed in this study, we used a fast algorithm adapted to directed networks that performs efficient graph reduction using heuristics and mathematical propositions 45,46 . The algorithm for the undirected case was used to analyse large protein interaction networks integrating transcriptome data 29 .
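For reference, the standard 0-1 program for a minimum dominating set in a directed graph can be written as follows. This is the textbook formulation and is given as a sketch; the exact ILP and the reduction rules used in the cited algorithms may differ.

```latex
\begin{aligned}
\text{minimise}\quad   & \sum_{v \in V} x_v \\
\text{subject to}\quad & x_v + \sum_{u \,:\, (u,v) \in E} x_u \;\ge\; 1 && \forall v \in V, \\
                       & x_v \in \{0, 1\} && \forall v \in V .
\end{aligned}
```

Each constraint states that node v is either selected itself or has at least one selected in-neighbour, and the optimum equals N D . One straightforward (though not necessarily the fastest) way to classify a node v is to re-solve the program with one extra constraint: if forcing $x_v = 0$ raises the optimum above N D , v is critical; if forcing $x_v = 1$ raises it, v is redundant; otherwise v is intermittent.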
Maximum matching method. Driver nodes were identified using the controllability package by Liu et al. 18. The algorithm converts the network into a bipartite graph of two disjoint sets by splitting each node x_i into two nodes (x_i^+ and x_i^−), forming the sets of out and in nodes, and placing an edge (x_j^+ → x_i^−) if the directed edge x_j → x_i was in the original graph. The maximum matching for the digraph was then identified using the Hopcroft-Karp algorithm. The unmatched nodes are the driver nodes and the minimum number of driver nodes is denoted by N_D. As there can be multiple maximum matchings for a digraph, multiple minimum driver node sets exist with the same size N_D. A node is therefore assigned to one of three categories according to whether it is never matched, occasionally matched or always matched in the in set. If a node is always unmatched it is a critical driver node, if it is matched in some maximum matchings and unmatched in others it is an ordinary/intermittent driver node, and if it is always matched it is a redundant/non-driver node. Once a matching is found, a matched node i is picked from the in set and the node j matched to it in the out set is identified. While keeping the current matching, node i with all its edges is temporarily removed. Starting from node j, the algorithm searches for an augmenting path that ends at an unmatched node and alternates between unmatched and matched links. If there is no augmenting path, node i is always matched and is therefore redundant. If there is an augmenting path, node i is replaceable and hence intermittent. A node is critical if and only if it has in-degree zero 47. This procedure is repeated for all nodes in the graph.
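The maximum matching procedure can be illustrated with a minimal Python sketch. It is not the controllability package of Liu et al.: it uses a simple augmenting-path search in place of Hopcroft-Karp, and it decides whether a node is always matched by recomputing the matching with that node's in-copy removed rather than by the single augmenting-path check described above. All names and the toy graph are hypothetical.

```python
def max_matching(nodes, edges, skip_in=None):
    """Maximum matching of the bipartite graph that has an edge (j+, i-) for
    every directed edge j -> i. Returns a dict: in-node -> matched out-node.
    If `skip_in` is given, the in-copy of that node is left out."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        if v != skip_in:
            succ[u].append(v)
    match_of_in = {}

    def augment(u, seen):
        # Try to match out-node u, re-routing earlier matches if needed.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_of_in or augment(match_of_in[v], seen):
                match_of_in[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())
    return match_of_in

def classify_mm(nodes, edges):
    """Return one minimum driver node set and the category of every node."""
    base = max_matching(nodes, edges)
    drivers = [v for v in nodes if v not in base]  # unmatched in the in set
    has_in_edge = {t for _, t in edges}
    labels = {}
    for v in nodes:
        if len(max_matching(nodes, edges, skip_in=v)) == len(base):
            # Some maximum matching leaves v unmatched.
            labels[v] = "intermittent" if v in has_in_edge else "critical"
        else:
            labels[v] = "redundant"  # v is matched in every maximum matching
    return drivers, labels

# Hypothetical toy graph: a -> b, a -> c, b -> d
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("a", "c"), ("b", "d")]
drivers, labels = classify_mm(toy_nodes, toy_edges)
print(len(drivers), labels)
```

Here N_D = 2: node a has in-degree zero and is critical, b and c compete for the single outgoing match of a and are intermittent, and d is always matched and therefore redundant.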
Calculation of control centrality measure of nodes in minimum dominating set. Control centrality is a measure of the power of a node to control its sub-systems or other nodes. Mathematically, for a directed network the control centrality of a node v is k_out + 1, where k_out is the out-degree of v. The reasoning for this count is that an MDS driver node controls itself and its outgoing edges independently 19,45.

Classification of nodes based on their impact to size of driver nodes set in MDS. Nodes were also classified according to the change in the size of the MDS after removal of the node. With N_D and N_D′ denoting the MDS size before and after removal, a node is indispensable if N_D′ > N_D, dispensable if N_D′ < N_D, and if N_D = N_D′ the node is neutral. We mainly considered undirected networks; however, the proofs can be applied to directed networks too. First, we present an example showing that there exists a node which is critical but is not indispensable. See networks (A) and (B) in Figure S3. Clearly, v_1 is a critical node in the original network of Figure S3A because exclusion of v_1 from the MDS increases the size of the DS. After removal of v_1, the size of the MDS does not change and thus v_1 is neutral. The same properties hold for the directed network in Figure S3B. We also show that in an MDS, every indispensable node is critical and every redundant node is neutral. These properties are different from those obtained by the MM method 27.
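A small Python sketch can make the distinction between critical and indispensable concrete. It assumes that a node is classed by comparing the MDS size before and after its removal, and it uses exhaustive search, so it is only suitable for toy graphs. The three-node directed graph below is a hypothetical example constructed for illustration; it is not the network of Figure S3.

```python
from itertools import combinations

def minimum_dominating_sets(nodes, edges):
    """All minimum dominating sets of a directed graph (exhaustive search)."""
    closed = {v: {v} for v in nodes}  # each node together with its out-neighbours
    for u, w in edges:
        closed[u].add(w)
    for k in range(len(nodes) + 1):
        sets = [set(c) for c in combinations(nodes, k)
                if set().union(*(closed[v] for v in c)) == set(nodes)]
        if sets:
            return sets

def removal_class(nodes, edges, v):
    """Compare the MDS size before and after deleting v and its edges."""
    before = len(minimum_dominating_sets(nodes, edges)[0])
    rest_nodes = [n for n in nodes if n != v]
    rest_edges = [(a, b) for a, b in edges if v not in (a, b)]
    after = len(minimum_dominating_sets(rest_nodes, rest_edges)[0])
    if after > before:
        return "indispensable"
    if after < before:
        return "dispensable"
    return "neutral"

def control_centrality(edges, v):
    """Out-degree of v plus one (the node itself)."""
    return sum(1 for a, _ in edges if a == v) + 1

# Hypothetical toy graph: v1 -> a, v1 -> b, b -> a
toy_nodes = ["v1", "a", "b"]
toy_edges = [("v1", "a"), ("v1", "b"), ("b", "a")]
print(minimum_dominating_sets(toy_nodes, toy_edges))  # the only MDS is {v1}
print(removal_class(toy_nodes, toy_edges, "v1"))
print(control_centrality(toy_edges, "v1"))
```

In this toy graph v1 belongs to the only MDS and is therefore critical, yet after its removal node b alone dominates the remaining graph, so the MDS size stays at one and v1 is neutral. Its control centrality is 2 + 1 = 3.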

Proposition 1. In MDS, every indispensable node is critical.
Proof. We prove the proposition by contraposition. That is, we show that if v is not critical, v is not indispensable. Let G(V, E) be an undirected graph. Suppose that v ∈ V is not a critical node in G(V, E). Then there must exist an MDS U ⊂ V such that v ∉ U. Let G′(V′, E′) be the undirected graph obtained by removing v (i.e., deleting v and its connecting edges). Then U is a dominating set of G′(V′, E′) because each node u ∈ V − {v} was dominated by some node w ∈ U (w ≠ v) in G(V, E). It means that the MDS size of G′(V′, E′) is not larger than that of G(V, E). Therefore, v is not indispensable.

Proposition 2. In MDS, every redundant node is neutral.
Proof. Let G(V, E) be an undirected graph. First we show that any redundant node is not indispensable. Suppose that v is a redundant node in G, which means that v does not appear in any MDS of G. Let G′ be the graph obtained by removing v from G. Let U ⊂ V be an MDS of G. Then v ∉ U holds, which implies that U remains a dominating set in G′. Therefore, the size of an MDS does not increase after removal of v and thus v is not indispensable. Next we show by contraposition that any redundant node is not dispensable. Let W be an MDS of G. Suppose that v is dispensable. Let G′ be the graph obtained by removing v from G. Since v is dispensable, there must exist an MDS W′ of G′ such that |W′| < |W|. Let U′ = W′ ∪ {v}. Then U′ is a dominating set of G with |U′| ≤ |W|. Clearly, U′ is an MDS of G, which implies that v is not redundant. By combining the above two properties, we can see that any redundant node is neither indispensable nor dispensable. Therefore, the proposition holds.
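Propositions 1 and 2 can also be checked numerically on small cases. The Python sketch below, offered as a sanity check rather than a substitute for the proofs, enumerates every undirected simple graph on n labelled nodes and counts the nodes that would contradict either proposition. The function names are hypothetical and the brute-force search is practical only for a handful of nodes.

```python
from itertools import combinations

def mds_family(nodes, edges):
    """All minimum dominating sets of an undirected graph (exhaustive search)."""
    nbrs = {v: {v} for v in nodes}  # closed neighbourhood of each node
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    for k in range(len(nodes) + 1):
        fam = [set(c) for c in combinations(nodes, k)
               if set().union(*(nbrs[v] for v in c)) == set(nodes)]
        if fam:
            return fam

def count_violations(n):
    """Count (graph, node) pairs that contradict Proposition 1 or 2."""
    nodes = list(range(n))
    pairs = list(combinations(nodes, 2))
    violations = 0
    for mask in range(2 ** len(pairs)):  # every possible edge subset
        edges = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
        fam = mds_family(nodes, edges)
        before = len(fam[0])
        for v in nodes:
            rest = [u for u in nodes if u != v]
            rest_edges = [e for e in edges if v not in e]
            after = len(mds_family(rest, rest_edges)[0])
            critical = all(v in s for s in fam)
            redundant = not any(v in s for s in fam)
            if after > before and not critical:  # indispensable but not critical
                violations += 1
            if redundant and after != before:    # redundant but not neutral
                violations += 1
    return violations

for n in range(1, 5):
    print(n, count_violations(n))
```

Consistent with the two propositions, the count is zero for every n tried here.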