Control efficacy of complex networks

Controlling complex networks has become a forefront research area in network science and engineering. Recent efforts have led to theoretical frameworks of controllability to fully control a network through steering a minimum set of driver nodes. However, in realistic situations not every node is accessible or can be externally driven, raising the fundamental issue of control efficacy: if driving signals are applied to an arbitrary subset of nodes, how many other nodes can be controlled? We develop a framework to determine the control efficacy for undirected networks of arbitrary topology. Mathematically, based on non-singular transformation, we prove a theorem to determine rigorously the control efficacy of the network and to identify the nodes that can be controlled for any given driver nodes. Physically, we develop the picture of diffusion that views the control process as a signal diffused from input signals to the set of controllable nodes. The combination of mathematical theory and physical reasoning allows us not only to determine the control efficacy for model complex networks and a large number of empirical networks, but also to uncover phenomena in network control, e.g., hub nodes in general possess lower control centrality than an average node in undirected networks.

single node to control a fraction of nodes in a directed network. Here we shall consider the more general situation where multiple input signals imposed on more than a single node are unable to control the entire network but but only a part of it, for broader classes of complex networks including weighted, undirected networks, with or without local loop structure. The main result is a control efficacy theorem and its proof, which provides a rigorous assessment of the role of an arbitrary set of nodes in partial control of the underlying network. In particular, for any given control inputs, our theorem gives the corresponding set of nodes in the network that are controllable. Our theorem of control efficacy can measure control centrality of a single node in terms of imposing a single external input signal on the node. The control centrality can be generally evaluated in arbitrary undirected networks with any distribution of link weights. We anticipate our finding to be practically significant for situations where the underlying network system is not fully accessible from the standpoint of delivering control signals. For example, for a social network, only a very limited set of nodes may be manipulated for control. For a neuronal network, only a small set of nodes can be perturbed externally. Our theory gives, in these realistic applications, a quantitative picture of what portion of the network may be controlled.

Results
Control efficacy of complex networks. Consider an undirected network of N nodes described by the following linear time invariant (LTI) dynamical system [37][38][39] : where the vector x ≡ (x 1 , … , x N ) T represents the state of all nodes at time t,  = a ( ) ij is an N × N coupling matrix of the network with the element a ij representing the weight of a directed link from node j to node i (a ij = a ji for an undirected network), u(t) is the input signal of m controllers: u = (u 1 , … , u m ) T , and  is the N × m control matrix with  ∈ b ij representing the strength of the input signal u j (t) on node i. According to the classical control theory 40 , the controllability of the system is determined completely and uniquely by the combined matrix ( , ) A B . For an initial state = t x x ( ) 0 , if there exists a control input u(t) that can drive the system to the final state, say x(t 1 ) = 0, within the finite time interval [t 0 , t 1 ], we say that the state x is a controllable state of the system and denote it as x + . The classic Kalman rank condition 40 stipulates that the linear system Eq. (1) is controllable if and only if the has full rank. When the system is not fully controllable or, equivalently, when the state space is not filled entirely with the controllable states x + , there can still be a controllable subspace spanned by the column vectors of the Kalman controllability matrix C A B ( , ) . The dimension of the controllable subspace is the rank of C A B ( , ) : = R rank( ) ( , ) C A B , which characterizes the control efficacy of the system.
There are two difficulties in determining the rank of the controllability matrix: (1) the task is often computationally prohibitive for large networks, and (2) the controllability matrix is typically nearly singular due to the dramatic differences among its elements, making the numerical rank computation inaccurate or even divergent. We are thus led to develop a feasible and effective method to calculate the rank R. The starting point is a non-singular linear transformation. In particular, for an arbitrary matrix  in the system equation [Eq. (1)], there exists 30 a non-singular matrix  such that = ⋅ ⋅ −1 A P J P or , where λ i (i = 1, … , l) are the distinct eigenvalues of  and  λ ( ) i is the Jordan block matrix of  associated with the eigenvalue λ i . For an undirected network, the coupling matrix  is diagonalizable and the matrix  reduces to the diagonal matrix with elements being all the eigenvalues 41 .
Applying the non-singular transformations  = ⋅ − y x 1 and = ⋅ −1 Q P B, we can rewrite Eq. (1) in the following form: where  is a diagonal matrix (see Method). The controllability matrix  for the transformed system is We can verify that the systems Eqs (1) and (2) possess the same degree of controllability in the sense that , i.e., the rank of the controllability matrix of the original system is equal to C D Q rank( ) ( , ) , which can be calculated reliably and accurately. For an arbitrary undirected network, we can prove that rank( ) ( , ) C D Q is determined by the corresponding element values in the transformed control input matrix Q P B = ⋅ −1 associated with the distinct eigenvalues (Method). A schematic illustration of our method to calculate the rank for undirected networks with self loops is presented in Fig. 1, and an explicit example is given in Supplementary Note 1. Our key analytic results are as follows.
Single control input. When there is only a single controller (i.e., when the control input matrix  is a column vector), the task of calculating rank( )  is reduced to counting the corresponding nonzero elements in the matrix . Letting the element corresponding to the eigenvalue λ i be λ For each distinct, non-degenerate eigenvalue, the corresponding nonzero elements in  contribute equally to the value of rank( )  . For a degenerate eigenvalue, if there are no corresponding nonzero elements in , the contribution of this eigenvalue to  rank( ) is zero. If the corresponding elements in  are nonzero, the degenerate eigenvalue contributes one to rank( )  (see Method).

Multiple control inputs.
When there are m control inputs, the matrix  has the dimension N × m. In this case, the control efficacy is determined by the sum of the rank values of the sub-block matrices composed of the corresponding rows in the transformed control input matrix  for each distinct eigenvalue (see Method). We have i l 1 i Control Centrality. Given a complex network, it is often necessary to quantify the relative importance of the nodes with respect to a specific function. For this purpose, various kinds of centrality measures 23 were proposed in the past, such as the degree centrality, the closeness centrality 42 , the betweenness centrality 43 , the eigenvector centrality 44,45 , and PageRank 46 . Control centrality has been defined in directed networks for quantifying the relative importance of nodes in effecting control 36 , i.e., if an external driver signal is applied to a node in a directed network, how many other nodes can be controlled? Our task is to extend the definition of control centrality to undirected networks and offer a control centrality measure. Specifically, For node i in an undirected network, its control centrality is nothing but the dimension of the controllable subspace. When a single driving signal is applied to i, the corresponding control input matrix  is effectively reduced to a vector b (i) with a single non-zero element. For convenience, we set this element to be unity, let R (i) be the rank of the controllability matrix, and rewrite the system as The adjacency matrix  and the control input matrix  for a simple undirected network with self-loops. Each colored lattice point in  represents an element, where the colors from white to black correspond to element values from zero to one, respectively. A similar color scheme applies to the matrix . (b) Through a nonsingular matrix transformation, the system ( , ) A B is converted into the equivalent system D Q ( , ), where  is a diagonal matrix. Distinct eigenvalues of  correspond to different subblocks marked with different colors. The rank of the controllability matrix for the transformed system ( ) ( , ) C D Q is equal to the sum of the rank values of the corresponding subblocks in the transformed control input matrix , which is identical to the rank of the controllability matrix of the original system A B ( , ).
Scientific RepoRts | 6:28037 | DOI: 10.1038/srep28037 where u is the strength of the input signal. The value of R (i) can be used to characterize node i's ability to control the whole network. In Method, we provide a proof for the following inequality, which gives the upper bound of R (i) : where num(λ) is the number of the distinct eigenvalues of the matrix . If R (i) = N, then node i alone can control the whole system. However, for R (i) = 1, node i is not able to control any other node in the networks. A value of R (i) between 1 and N gives the dimension of the controllable subspace of node i. To compare the control centrality in networks with different size, the normalized control centrality r (i) can be defined as the ratio of R (i) to the network size N. Then the average value, maximum and minimum values of r (i) are For the networks with random weights, the control centrality and the normalized control centrality can be denoted by R i W ( ) and r i W ( ) respectively. We employ the criteria of control efficacy to explore undirected chains. To our surprise, complex phenomena associated with prime numbers emerge in the extremely simple regular network. Figure 2(a) shows, for an undirected chain graph of size N = 155 with identical link weights, that the values of the nodal control centrality are distributed symmetrically. For certain node (e.g., 1 or 155), the chain is fully controllable with a single input signal. For majority of the nodes, the control centrality measure is less than N. In fact, we can show analytically that the control centrality value of each node is given by (see Supplementary Note 2) where GCD(m, n) is the greatest common divisor of the positive integers m and n. Figure 2(b) shows, the distribution of the control centrality values versus the network size N. Two clusters of periodic behavior of R (i) present as N is increased. The periodic phenomena can be verified in terms of Eq. (10).
The combination of the two clusters of periodic behavior lead to the emergence of complex control centrality in undirected chains. Let num(R) be the number of the distinct control centrality values in a chain with a certain size. The dependence of num(R) on N is shown in Fig. 2(c). Analytically, num(R) can be determined from the following equation where, for fixed N, num(R) is the total number of all integer solutions of f a and f b that satisfy f a · f b = N + 1 (see Supplementary Note 2): Thus, the solution of num(R) is related with prime numbers, accounting for the complex result of num(R) in a simple chain structure. Specifically, if N + 1 is a prime number, there is only one integer solution of Eq. For an undirected chain graph with random weights, the control centrality is simpler than that of a directed chain graph. We can as well offer theoretical results (see Supplementary Note 2). Specifically, when N = 2n + 1 is odd, we have When N = 2n is even, we have The results are graphically shown in Fig. 2(d), which differs from the results in Fig. 2(c) with identical weights. The fact that random weights eliminates lots of linear correlations in the network matrix accounts for the simplification of the control centrality.
For undirected regular graphs with identical weights, the eigenvalues can be calculated analytically 10,47,48 , so can be R (i) , as listed in Table 1. We see that the control centrality of the star network is either 2 or 3, and that of the fully connected network is 2, regardless of the network size. For a ring network, almost half of the network can be controlled by a single input. The results for chain networks are relatively more complicated, for which the control centrality values are symmetrically distributed, which can be obtained from Eq. (10) (see Supplementary Note 2). In general, the control centrality R i W ( ) of the undirected regular graphs with random weights is simpler than that of the undirected regular graphs with identical weights, because of the elimination of linear correlation by random weights (see Table 1). Figure 3(a) shows the control centrality versus a key structural parameter, the connecting probability, for undirected Erdös-Rényi (ER) random networks with identical link weights. We see that, regardless of the network size, in the regime of small values of the connecting probability p, the value of r increases monotonically with p, indicating that making the network more dense can on average enhance the control efficacy. However, in the extreme regime where p is close to unity (e.g., exceeding 0.99), r begins to decrease toward zero due to the effect of identical link weights. The control centrality of ER random networks associated with random link weights differs significantly when the network becomes very dense. Specifically, Fig. 3(a) show that r is always one, as p

Network
Eigenvalues num(λ) approaches unity. The difference is as well attributed to the elimination of linear correlation by random weights, but such effect of random is negligible in sparse ER network. Figure 3(b) shows, for Barabási-Albert (BA) scale free networks, r versus half of the average degree m = 〈 k〉 /2, where we see that r increases rapidly toward unity as 〈 k〉 is increased, regardless of the number of new links associated with the addition of a new node into the network during its growing process. We see that, qualitatively similar to ER networks, making a scale free network more densely connected can enhance its control efficacy. We also see that because of the general sparsity of the BA network, random link weights have negligible effect on r for m ≤ 2 compared to identical link weights.
We characterize the control efficacy for a number of real world (empirical) networks. The results are listed in Table 2. (For the empirical networks with random weights, its corresponding control centrality are slightly higher than the origin network topology.) An issue is whether the hub nodes carry a stronger control centrality in undirected networks. We find that the average control centrality of hub nodes is generally smaller than that of the other nodes in undirected networks, which is consistent with the finding that driver nodes avoid hubs in directed networks 49 . To demonstrate this counterintuitive phenomenon, we divide the nodes into three groups in terms of their degrees: low, medium and high. Figure 4(a) shows, for model ER and BA networks, that the control centrality is generally higher for low-degree nodes than that for the hubs. Figure 4(b) shows the mean degree of the nodes with the maximum control centrality versus the mean degree 〈 k〉 of all nodes, for each empirical network in Table 2. We see that the values of k r max , the degree value at which maximum control efficacy is achieved, are significantly smaller than or comparable to 〈 k〉 , indicating the nodes with large values of control centrality are generally not hubs. To provide further evidence for the determining role of nodal degree in the control efficacy, we randomize each empirical network by converting it into an ER random network, keeping the network size N and its diameter L unchanged. As shown in Fig. 4(c), for some networks there is no correlation between the values of r for the original and randomized networks, indicating that the full randomization process has effectively eliminated any topological features of the original network that determine the control efficacy. We then apply a   degree-preserving procedure 4,50,51 that randomly rewires the links but keeps the degree of each node unchanged. Contrary to the case of full randomization [ Fig. 4(c)], when the nodal degrees are preserved, there is little change in the value of r, indicating strongly that degree is the key characteristic that determines the control efficacy.

Identification of controllable nodes.
For an arbitrary undirected network, given a control input matrix, we can obtain the dimension of the controllable subspace by calculating the control efficacy. An issue of practical importance is how to identify the actual set of nodes that can be controllable, i.e., the set of controllable nodes for a given control input configuration. Here, we offer a general method based on network diffusion dynamics to address this issue. Specifically, note that the N × Nm controllability matrix C can be expressed iteratively as For the N × N matrix  s , between any pair of nodes (e.g., i and j), there exists a path of length s: Regarding the nonzero elements of  as sources of diffusion, the controllability matrix C A B ( , ) can be viewed as being formed by a diffusion process from the nodes with control matrix  to all the controllable nodes in the network in (N − 1) time steps, generating the corresponding diffusion mode for each column of C A B ( , ) . At time step s, the matrix product ⋅ − s 1 A B is a linear combination of the mode at the s step and the modes from all prior forward steps. The rank of ( , ) C A B is determined by the number of the distinct modes of diffusion. In general, unless C A B ( , ) has a full rank, there is interdependence among its columns. Using this fact, we can prove that, for fixed  = r rank( ) , the distinct diffusion modes are fully contained in the former r iteration steps (Supplementary Note 3). Consequently, we can implement the following elementary column transformation on the controllability matrix to obtain  so that the controllable nodes correspond to the maximal linearly independent group of the rows. To illustrate the method explicitly, we present a concrete example, as shown in Fig. 5, where the diffusion process can be seen by noting the newly appeared diffusion mode (color marked) at each time step. We next perform the elementary column transformation on  to obtain its column canonical form that reveals the linear dependence among the rows, where the rows that are linearly independent of other rows correspond to the controllable nodes. Note that the configuration of drivers is not unique as it depends on the order with which the elementary transform is implemented. While there are many possible choices of the linearly dependent rows, the number of controllable nodes is fixed and determined solely by = r rank( )  . Our procedure of finding the driver nodes is rigorous, as guaranteed by our theory of control efficacy and the column canonical form associated with the matrix rank.

Discussion
For complex networks in the real world, from the standpoint of control not every node is externally accessible. Often, control signals can be applied to a limited set of nodes or just a few nodes. If the network structure is known, theoretically it is possible to determine a specific set of nodes to apply the control signals, e.g., through identification of maximum matching in SCT. However, the set of control nodes so determined may not overlap with the set of externally accessible nodes. Under these circumstances it is not possible to control the whole network. Nonetheless, there are situations where full control of the entire network is not necessary. A fundamental question is then, if control is applied to a few nodes or even a single node, what fraction of the network can be controlled? That is, for a complex network of arbitrary structure, what is the control efficacy or, equivalently, the dimension of the controllable subspace of the underlying network? Figure 5. Illustration of our method to identify the set of controllable nodes. Take as an example a simple undirected network with self-loops. (a) A step by step illustration of the diffusion process over the network from the driver node, where a control signal is applied at node 1. The newly appeared mode at each step is marked with different colors. At step 7, the iteration column vector A B 6 can be expressed as a linear combination of the former modes, so the corresponding value of the control efficacy is 6. (b) For the controllability matrix C A B ( , ) , its column canonical form generated by the elementary column transformation. For a fixed value of the control efficacy measure r, the column canonical form can be performed only for r iterations of the column vector. (c) There is a one-to-one correspondence between the controllable nodes and the rows that are linearly dependent upon others in the column canonical form. In the specific case shown, there are four distinct configurations of the controllable nodes (marked in blue). Nevertheless, the number of controllable nodes is fixed and solely determined by Scientific RepoRts | 6:28037 | DOI: 10.1038/srep28037 The issue of control efficacy (or control centrality if control is applied at a single node) was addressed in a previous work 36 but for directed networks. The contribution of the present paper is a rigorous framework based on the theory of exact controllability 10,14 to determine, for undirected complex networks of arbitrary structure (regular, random, or scale-free, weighted or unweighted, with or without self loops, etc.), their control efficacy. From the mathematical control theory, the control efficacy is given by the rank of the Kalman controllability matrix, the determination of which is computationally prohibitive for large networks. Utilizing the non-singular similarity transformation, we discovered a mathematical theorem that enables us to convert rank calculation into a counting problem in terms of the block matrices associated with the distinct eigenvalues of the network coupling matrix. The framework allows us to determine, rigorously, the control efficacy of not only model complex networks, but also a large number of real world networks. Physically, we developed the picture of diffusion, i.e., to view the control process as a signal originated from the driver node and diffused through the controllable subnetwork. The powerful combination of rigorous mathematical theorem and physical reasoning leads to the discovery of striking phenomena in controlling complex networks. For example, more densely connected networks in general have stronger control efficacy, regardless of their topology, and nodal degree is key to control efficacy. However, hub nodes in general have low values of control centrality as compared with majority of the nodes in the network.
From the perspective of fundamental science, our framework of control efficacy represents an important step forward in understanding, quantitatively, the controllability of complex networks at the detailed level of individual nodes. (Extension of our control efficacy framework to analyzing the efficacy of observability of complex networks is straightforward -see Supplementary Note 5). Practically, our theory provides a method and algorithms that can be used to identify efficiently the nodes that possess the strongest possible control centrality. This can have significant applications. For example, given a social network, our framework allows the nodes with the largest control efficacy, i.e., the nodes that can control the largest possible fraction of the network, to be identified. Similarly, for a complex infrastructure network, we can determine a small set of critical nodes to obtain maximum possible control of the network to achieve the highest possible energy efficiency.
Despite our initial success as reported in the present paper, many outstanding issues remain. For example, in real world networks the estimated link weights are not exactly known, which will lead to errors in determining the control efficacy. A mathematical uncertainty or error analysis is needed, but at the present a rigorous treatment seems difficult. Also, our framework of control efficacy relies on complete knowledge of the network structure.
What if there is missing information about nodes, links and/or their weights? -at the present we do not have a theory to deal with this practically important issue. Last but not least, our entire theory is based on hypothesizing the underlying complex network as a linear and time invariant dynamical systems. Although much effort has been dedicated to controlling complex networks with nonlinear dynamics 52-54 , a general approach for measuring control efficacy remains to be an outstanding problem 55 . The main challenge stems from the fact that the control efficacy is determined by both network structure and dynamics, in contrast to the network governed by linear dynamics. Much further effort is called for in the extremely rapidly developing field of controlling complex networks.

Methods
For an undirected network with arbitrary link weights [Eq. (1)], the matrix  is symmetric and so is diagonalizable: there exists an orthogonal matrix  and a diagonal matrix  such that = ⋅ ⋅ = ⋅ ⋅ −

T 1
A P D P P D P with , where λ i 's (i = 1, … , l) are the distinct eigenvalues of  and  λ i is the diagonal block matrix of  associated with λ i . The size of λ i  is given by the multiplicity of λ i . We write Single control input. When the system is subject to a single control input, the control matrix  and the is an N × 1 column vector. If  has zero element, the corresponding row in the the transformed Kalman matrix ⋅ −1 ( , ) P C A B is zero. For the nonzero elements in , the corresponding eigenvalues can be of two types.
(i) Case I: Distinct eigenvalues. An illustrative example for this case is shown in Fig. 1(a), where the values of q 1 , q 2 and q 3 are assumed to be nonzero, corresponding to the eigenvalues λ 1 , λ 2 and λ 3 , respectively. The corresponding row of (q 1 , q 2 , q 3 ) in ⋅ −1 ( , ) P C A B is a Vandermonde matrix, whose rows are linearly independent. In this case, the rank of the controllability matrix is nothing but the number of the nonzero elements corresponding to distinct eigenvalues of .
(ii) Case II: Degenerate eigenvalues. When there is eigenvalue multiplicity, the rows of P C A B ⋅ −1 ( , ) are linearly dependent upon each other. An example of the controllability matrix is Scientific RepoRts | 6:28037 | DOI: 10.1038/srep28037  1  2  1  1 1   2  1 2  1  2  2  1 2   3  2 3  2  2  3  3  3   4  3 4  3  2  4  4  4   6  3 6  3  2  6  3  6   2   2 where the rows of q 4 and q 6 are linearly dependent. If the linearly dependent rows in  have nonzero elements, they together contribute one to the rank of the controllability matrix. For a single control input, the calculation of the rank of ( , ) C A B is thus equivalent to counting the corresponding nonzero elements in . We have Since the control matrix  has a single column, the control centrality of the input node is given by i ( ) where the num(λ) is the number of the distinct eigenvalues of .

Multiple control inputs.
With multiple control input signals, the transformed control matrix  has the dimensional N × m. To illustrate our method of rank calculation explicitly, we consider the first two columns in . The matrices  and  can be written as

P C A B
For the case where the control matrix  has distinct eigenvalues, if certain rows of the transformed control matrix  contain nonzero elements, the corresponding rows of the transformed Kalman matrix P C ⋅ −1 must be linearly independent of each other. The matrix P C A B ⋅ −1 ( , ) can be organized into a block matrix form, where each block corresponds to one distinct eigenvalue and its dimension is the multiplicity of the eigenvalue. The rank of such a matrix is the sum of the rank values of the sub-block matrices. In particular, letting the algebraic multiplicity of the eigenvalue λ 1 be l 1 , we have  In general, for multiple control inputs, the control efficacy = ⋅ − rank( ) rank( ) 1 ( , ) C P C A B is the sum of the rank values of the sub-block matrices: