Introduction

One fundamental challenge in network science is to understand the impact of a network's structural properties on its functionality. In the last decade considerable advances have been made, particularly on the structural transitions that can strongly affect numerous dynamical processes on networks. Much has been discovered, such as the application of k-core percolation1,2,3,4 and giant components5,6,7,8,9 in the analysis of epidemic and information spreading on socio-technical systems10,11,12, the use of dominating sets in disease outbreak detection, control and social influence propagation13,14,15,16,17 and the fragility in many real networks caused by their multilayer connections18,19,20. Nevertheless, much remains unknown. Core percolation, as one example, is a structural transition in complex networks with a long history. The core represents the remainder of the greedy leaf removal (GLR) procedure in a network21,22. While core percolation has applications in several important problems such as conductor-insulator transitions, maximum matching and the minimum vertex cover problem21,23,24,25,26, the physical importance of the core and how the core structure affects the dynamics of a network are not entirely understood.

Recent advances have brought us an analytical framework of core percolation in uncorrelated random networks with arbitrary degree distribution27. The tools introduced allow us not only to predict the emergence of the core but also to calculate the expected core size. These findings reveal an interesting interplay between the core and the controllability of complex networks. For example, it is observed that the sudden change in controllability robustness and the emergence of the two control modes coincide with the formation of the core28,29, suggesting a strong connection between the two topics. Here we analytically explore this connection. The rest of the paper is organized as follows. In the first two subsections, we briefly review the analytical framework of core percolation and the basic concepts of network controllability. In the third subsection, we demonstrate the role of the core in control. The core structure determines not only the control mode but also the stability of the control mode under structural perturbations. Finally, in the fourth subsection we study the controllability robustness, the ability to maintain control under node failures. By applying the tools of core percolation, we obtain an analytical expression for the fraction of nodes playing different roles in sustaining the controllability of the network.

Results

Analytical Framework of Core Percolation

In core percolation, leaf nodes and their neighbors are removed iteratively from the network according to the GLR procedure21,22. Specifically, a node with degree one is randomly chosen. This node and its neighbor are removed with all their links. Nodes that become isolated are also removed. This procedure is repeated until no node with degree one is left, and the core emerges as the compact cluster of nodes that remains. To systematically study core percolation, two categories of removable nodes are introduced27: α removable nodes, which can become isolated without being removed directly (e.g. nodes 3− and 4− in Fig. 1), and β removable nodes, which can become a neighbor of a leaf (e.g. nodes 3+, 5+ and 5− in Fig. 1). The category of a node i in an arbitrary graph G can be determined by the categories of its neighbor nodes in the subgraph G \ i, where node i and all its links are excluded: node i is α removable in G if all its neighbor nodes are β removable in G \ i, and β removable in G if at least one neighbor is α removable in G \ i. Correspondingly the fraction of α and β nodes can be expressed as

where P(k) is the network's degree distribution, whose generating function appears in the expression, and the two parameters are, respectively, the probabilities of finding an α node and a β node at the end of a randomly chosen link in the absence of that link (in the subgraph defined above). These two parameters have been found as

in which Q(k) = kP(k)/〈k〉 is the excess degree distribution, 〈k〉 is the average degree and A(x) is an auxiliary function defined through Q(k). On the basis of Eq.(2), the α parameter can be solved as the smallest root of the function f(x) ≡ A(A(x)) − x, which can then be used to calculate the β parameter. The two parameters can then be applied in determining the fraction of α and β nodes outlined in Eq.(1).
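As a hedged numerical illustration of the root-finding step, the sketch below treats an undirected Erdős-Rényi network with mean degree c. The closed forms it uses are assumptions of this sketch, taken from the commonly used formulation of core percolation: A(x) = Σ Q(k)(1 − x)^(k−1), which reduces to exp(−cx) for a Poisson degree distribution, the relation 1 − β = A(α), and the expressions for the α, β and core fractions given in the comments. The function name `er_core_percolation` is hypothetical.

```python
import math


def er_core_percolation(c, tol=1e-12, max_iter=100_000):
    """Core percolation quantities for an Erdős-Rényi graph with mean degree c.

    All closed forms are assumptions of this sketch (Poisson degree
    distribution, standard core percolation relations).
    """
    def A(x):
        # Assumed A(x) = sum_k Q(k) (1 - x)^(k - 1) = exp(-c x) for Poisson P(k).
        return math.exp(-c * x)

    def G(x):
        # Generating function of the Poisson degree distribution.
        return math.exp(-c * (1.0 - x))

    # Smallest root of f(x) = A(A(x)) - x. Because A is decreasing, A(A(.)) is
    # increasing, so iterating from x = 0 climbs monotonically to the smallest
    # fixed point.
    alpha = 0.0
    for _ in range(max_iter):
        nxt = A(A(alpha))
        converged = abs(nxt - alpha) < tol
        alpha = nxt
        if converged:
            break

    beta = 1.0 - A(alpha)          # assumed relation 1 - beta = A(alpha)
    n_alpha = G(beta)              # assumed fraction of alpha removable nodes
    n_beta = 1.0 - G(1.0 - alpha)  # assumed fraction of beta removable nodes
    # Assumed core fraction: G(1 - alpha) - G(beta) - (1 - alpha - beta) G'(beta),
    # with G'(x) = c G(x) for a Poisson distribution.
    n_core = G(1.0 - alpha) - G(beta) - (1.0 - alpha - beta) * c * G(beta)
    return {"alpha": alpha, "beta": beta,
            "n_alpha": n_alpha, "n_beta": n_beta, "n_core": n_core}


if __name__ == "__main__":
    for c in (2.0, 4.0):
        print(c, er_core_percolation(c))
```

Below the critical mean degree the iteration lands on the unique fixed point, so the α and β fractions sum to one and the core fraction vanishes. Above it the smallest root separates from the other roots and a finite core fraction appears.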

Figure 1

A bipartite graph with a core (highlighted).

Nodes 1+, 2+, 1− and 2− are the core nodes. Nodes 3− and 4− are α removable because they can become isolated without being removed directly (e.g. when node 3− is chosen in the GLR procedure, nodes 3− and 3+ with all their links are removed and node 4− becomes isolated). Nodes 3+, 5+ and 5− are β removable as they can become a neighbor of a leaf.
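The GLR procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `greedy_leaf_removal` and the example edge list are hypothetical, and the example graph is only loosely modeled on the node labels of Fig. 1, since the exact links of that figure are not given in the text.

```python
import random


def greedy_leaf_removal(edges, seed=None):
    """Return the set of core nodes left after greedy leaf removal."""
    rng = random.Random(seed)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def remove(node):
        # Delete a node together with all of its links.
        for nb in adj.pop(node):
            adj[nb].discard(node)

    while True:
        leaves = sorted(v for v, nbs in adj.items() if len(nbs) == 1)
        if not leaves:
            break
        leaf = rng.choice(leaves)   # a randomly chosen node of degree one
        (neighbor,) = adj[leaf]     # its single neighbor
        remove(leaf)
        remove(neighbor)
        # Nodes that became isolated are removed as well.
        for v in [v for v, nbs in adj.items() if not nbs]:
            del adj[v]
    return set(adj)


# Hypothetical bipartite example: a 4-cycle core plus removable nodes.
example_edges = [
    ("1+", "1-"), ("1+", "2-"), ("2+", "1-"), ("2+", "2-"),  # core
    ("3+", "3-"), ("3+", "4-"), ("3+", "2-"),                # leaves 3-, 4-
    ("5+", "5-"),                                            # isolated pair
]

if __name__ == "__main__":
    print(sorted(greedy_leaf_removal(example_edges, seed=0)))
```

In this example the leaf is picked at random, but the remaining core does not depend on which leaf is chosen first.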

To generalize core percolation to a directed network, different removal procedures have been introduced27,30. Here we adopt the one that converts the directed network to a bipartite graph by splitting each node i of the directed network into two nodes i+ (upper) and i− (lower) (Fig. 2a, b). A directed link from node i to j becomes a connection from node i+ to node j− in the bipartite representation. The out- and in-degree distributions become the degree distributions P+(k) and P−(k) in the bipartite graph, respectively. Correspondingly Eq.(2) becomes

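and the α parameter of each set is obtained as the smallest root of the corresponding composite function, defined analogously to f(x) from the excess degree distributions of the + and − sets. The fractions of α and β nodes in the + and − set are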

It is noteworthy that in sparse networks without cores, all nodes are either α or β removable, so the fractions of α and β nodes in each set sum to one. As the network becomes denser, a core emerges after the average degree exceeds a critical value. At the formation of the core, different categories of removable nodes and the unremovable core nodes appear, so the fractions of α and β nodes sum to less than one. Nevertheless, the link-end probabilities used to find the fractions of α and β nodes can also be used to find the fraction of core nodes in the + and − set as

Figure 2

(a) An example of a directed network with five nodes. (b) The bipartite representation of the directed network in (a), where a node i in the directed network is split into two nodes i+ (upper) and i− (lower). A directed link from node 1 to node 2 in (a) corresponds to a connection between node 1+ and node 2−. (c) Three different control configurations to control the network in (a), indicating different participation of nodes in control: node 1 is critical as it is a driver node in all cases, node 2 is redundant because it does not participate in any of the driver node sets and nodes 3, 4 and 5 are intermittent as they are driver nodes in some configurations but not all. (d–f) The control of the network in (a) after removing one node and all its links. (e) Node 1 is structurally redundant as its removal does not change the number of driver nodes. (d) Node 2 is structurally critical because more driver nodes are needed in its absence. (f) The number of driver nodes decreases by 1 without node 3, therefore node 3 is structurally ordinary.

Controllability of Complex Networks

The controllability of complex systems is a fundamental challenge of contemporary science that draws considerable interest across multidisciplinary fields28,29,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50. According to control theory51,52, the dynamic process of controlling a linear time-invariant system can be described by the equation dx(t)/dt = Ax(t) + Bu(t), where the state vector x(t) = (x1(t), …, xN(t))T captures the state of a system of N components at time t. The N × N state matrix A corresponds to the internal interactions of the system. The input matrix B is an N × M matrix indicating how the M external signals u(t) = (u1(t), …, uM(t))T are applied to the system to drive it from any initial state to any desired final state within finite time. Recently an efficient methodology has been introduced to identify the minimum driver node set (MDS), the smallest set of nodes whose time-dependent control yields control over the whole system28. The procedure is to convert a directed network into a bipartite graph (Fig. 2a, b) and find the maximum matching of the bipartite graph53. The minimum driver nodes are the unmatched nodes in the − set. If a perfect matching exists and all nodes in the − set are matched, one input signal is sufficient to drive the system. In this case the number of driver nodes is one.
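The matching-based identification of driver nodes can be illustrated with a short sketch. It uses a simple augmenting-path search for the maximum matching, which is one possible choice and not necessarily the algorithm used in the cited work. The function names `maximum_matching` and `minimum_driver_nodes` are hypothetical, and the recursive search is intended for small examples only.

```python
def maximum_matching(links):
    """Maximum matching of the bipartite representation of a directed network.

    links: iterable of (i, j) pairs, each a directed link i -> j, i.e. a
    connection between node i+ and node j-. Returns a dict that maps every
    matched node j- (keyed by j) to its partner i+ (stored as i).
    """
    out_neighbors = {}
    for i, j in links:
        out_neighbors.setdefault(i, []).append(j)
    match_of_minus = {}

    def try_augment(i, visited):
        # Look for an augmenting path that starts at node i+.
        for j in out_neighbors.get(i, []):
            if j in visited:
                continue
            visited.add(j)
            if j not in match_of_minus or try_augment(match_of_minus[j], visited):
                match_of_minus[j] = i
                return True
        return False

    for i in out_neighbors:
        try_augment(i, set())
    return match_of_minus


def minimum_driver_nodes(nodes, links):
    """One minimum driver node set: the nodes left unmatched in the - set."""
    matched = maximum_matching(links)
    drivers = [v for v in nodes if v not in matched]
    if not drivers:
        # Perfect matching: a single input suffices. Which node receives it
        # is an arbitrary choice in this sketch.
        drivers = [nodes[0]]
    return drivers


if __name__ == "__main__":
    print(minimum_driver_nodes([1, 2, 3], [(1, 2), (2, 3)]))          # chain
    print(minimum_driver_nodes([1, 2, 3], [(1, 2), (1, 3)]))          # star
    print(minimum_driver_nodes([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))  # cycle
```

The function returns only one of possibly many minimum driver node sets. Which set is found depends on the order in which the links are explored.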

The methodology proposed indicates the existence of multiple MDSs, hence a node does not necessarily participate in all MDSs (Fig. 2c)32. Accordingly a node can be categorized by its participation in control: critical if it participates in all MDSs, redundant if it is not included in any MDS and intermittent if it is in some MDSs but not all. The fraction of critical nodes nc is purely determined by the in-degree distribution as nc = Pin(0). However, the fraction of redundant nodes nr displays a bifurcation after some critical average degree 〈kc〉: networks with identical degree distribution and average degree can have a very high or a very low value of nr (Figs. 3, 4). This bimodality leads to two distinct control modes, with a significant difference in the total number of MDS choices. While the two control modes coexist with equal probability in dense networks with identical in- and out-degree distributions, networks with different in- and out-degree distributions may have one mode dominate or follow only one control mode29.
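The three categories can be checked by brute force on small networks. The sketch below is an illustration under stated assumptions, not the method of the cited work: a node is treated as redundant when its − copy is matched in every maximum matching, which is tested by deleting that copy and checking whether the matching size drops. The names `matching_size` and `classify_nodes` are hypothetical, and the special case of a perfect matching is ignored.

```python
def matching_size(links, excluded_minus=None):
    """Size of the maximum matching, optionally with one - node deleted."""
    out_neighbors = {}
    for i, j in links:
        if j != excluded_minus:
            out_neighbors.setdefault(i, []).append(j)
    match_of_minus = {}

    def try_augment(i, visited):
        for j in out_neighbors.get(i, []):
            if j in visited:
                continue
            visited.add(j)
            if j not in match_of_minus or try_augment(match_of_minus[j], visited):
                match_of_minus[j] = i
                return True
        return False

    return sum(try_augment(i, set()) for i in out_neighbors)


def classify_nodes(nodes, links):
    """Label each node as critical, redundant or intermittent."""
    full = matching_size(links)
    has_incoming = {j for _, j in links}
    roles = {}
    for v in nodes:
        if v not in has_incoming:
            # In-degree zero: v- can never be matched, so v is in every MDS.
            roles[v] = "critical"
        elif matching_size(links, excluded_minus=v) < full:
            # Deleting v- shrinks the matching: v- is always matched.
            roles[v] = "redundant"
        else:
            # Some maximum matching leaves v- unmatched, others match it.
            roles[v] = "intermittent"
    return roles


if __name__ == "__main__":
    print(classify_nodes([1, 2, 3, 4], [(1, 2), (2, 3), (2, 4)]))
```

In the example, node 1 has no incoming link and is critical, node 2 is matched in every maximum matching and is redundant, and nodes 3 and 4 compete for the single outgoing match of node 2 and are intermittent.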

Figure 3

(a–c) Dependency between the core and the control modes for networks with γout = 3, γin = 2.85 and different sizes. The bifurcation emerges at the formation of the core. The solid lines in (a) and (b) correspond to Eq.(6) with the expected core structure. The dashed lines are based on Eq.(6) for the opposite core structure, found in the minority of network realizations that differ from the expectation. (c) The expected fraction of core nodes in the + and − set based on Eq.(5). The difference between the two fractions is very small (≈ 0.005) and the two curves almost overlap. (d,e) The distribution of nr in an ensemble of network realizations with γout = 3, γin = 2.85, 〈k〉 = 10 and different sizes. As the network size increases, the core structure is closer to the expectation, eliminating the lower branch of the bifurcation curve.

Figure 4

(a), (b) Two networks in different control modes, with a high (a) and a low (b) fraction of redundant nodes. (c), (d) The core of the directed network in (a) and (b), respectively. A network with a high fraction of redundant nodes (a) yields a core with more nodes in the + set than in the − set (c) and vice versa.

Core Percolation and the Bimodality in Control

The emergence and the absence of bimodality can be best explained using the knowledge of core percolation. Indeed, β nodes, as neighbors of a leaf node, are matched in all possible maximum matching configurations, whereas α nodes are not. Therefore β nodes in the − set are redundant nodes and α nodes in the − set are not redundant. Before the formation of the core, a node is either an α or a β node, hence the fraction of redundant nodes nr equals the fraction of β nodes in the − set, or equivalently one minus the fraction of α nodes in the − set, both of which can be found by Eqs.(3) and (4).

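When the core emerges, however, α and β nodes are no longer the only nodes in the network and nr depends on the shape of the core. When there are fewer core nodes in the + set than in the − set, the core nodes in the − set are not all always matched. This corresponds to the lower branch of the bifurcation, where nr equals the fraction of β nodes in the − set, as only the β nodes in the − set are redundant (Fig. 4b, d). When there are more core nodes in the + set than in the − set, however, all the core nodes in the − set are always matched, giving rise to a large number of always matched nodes. In this case, the network is on the upper branch of the bifurcation curve, where nr equals one minus the fraction of α nodes in the − set, as only α nodes in the − set are not redundant (Fig. 4a, c). In summary, depending on the core structure, nr can be high or low as

\[
n_r =
\begin{cases}
n_\beta^{-}, & n_{\mathrm{core}}^{+} < n_{\mathrm{core}}^{-} \\
1 - n_\alpha^{-}, & n_{\mathrm{core}}^{+} > n_{\mathrm{core}}^{-}
\end{cases}
\qquad (6)
\]

where $n_\alpha^{-}$ and $n_\beta^{-}$ denote the fractions of α and β nodes in the − set and $n_{\mathrm{core}}^{\pm}$ the fraction of core nodes in the + and − set.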

Eq.(6) confirms the emergence of the bifurcation at the formation of the core (Fig. 3). More importantly, it explains the condition for the coexistence of the two control modes. When the in- and out-degree distributions are the same, the expected numbers of core nodes in the + and − sets are identical, so either set is equally likely to contain more core nodes (the probability that the two numbers are exactly equal is negligible in large systems). Hence the network is equally likely to be in either of the control modes. When the in- and out-degrees are asymmetric, however, Eq.(5) gives different core sizes for the two sets27, implying that in sufficiently large networks where the mean-field equation Eq.(5) applies, only one type of core structure is allowed. Consequently any tiny difference between the in- and out-degree distributions eliminates the bifurcation of nr and forces a large network to follow one or the other control mode29. Yet, in small systems the mean-field equation may not hold and a core structure different from the expectation can emerge due to random fluctuations. As an example, networks with γout = 3 and γin = 2.85 have a very small difference between the expected core sizes of the two sets (Fig. 3c). While one core structure is expected, networks with size N = 1000 also generate cores with the opposite imbalance, and the two modes coexist with one mode that dominates in probability (Fig. 3b, d). As the network size increases (e.g. N = 5000), the expected difference between the numbers of core nodes in the two sets becomes more significant and networks that fall into the other branch become very rare (Fig. 3a, e). Eventually the gap between the two core sizes exceeds the fluctuations and one branch vanishes as the system size increases, effectively forcing the system into only one control mode.

The above results reveal the importance of the core as a fundamental structure that determines the two control modes. This feature allows us to switch the control mode by changing the core balance. The most intuitive way to induce such a switch is to reverse the direction of all links in the network. A network that originally has more core nodes in the + set, and is therefore on the upper branch of the bifurcation, has more core nodes in the − set in the transpose network and lies on the lower branch, and vice versa. While changing the direction of all links is drastic, sometimes the switch of the control mode can be induced by only local changes, occasionally as little as flipping the direction of a single well chosen link29. Our chance of finding such a link depends on the core structure: if the core sizes of the + and − sets are close, the network is sensitive to link changes. Indeed in a sample of 1000 realizations of Erdős-Rényi networks5, all networks that are able to switch control modes via a single link reversal have very close core sizes in the two sets (Fig. 5a, b). As the core size depends on the network size and degree distribution, the control mode is more stable in large networks and in networks with asymmetric in- and out-degrees, in which more structural changes are required to change the control mode. Small networks with identical in- and out-degree distributions, however, are more likely to have close core sizes and are therefore sensitive to structural perturbations (Fig. 5c, d).
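The effect of reversing all links on the core balance can be illustrated on a toy network. The sketch below builds the bipartite representation, applies leaf removal and counts the core nodes in each set. The function name `core_balance` and the example links are hypothetical. Leaves are removed in a fixed order here, which is a simplification of the random choice in the GLR procedure.

```python
def core_balance(links):
    """Number of core nodes in the + and - sets of the bipartite representation."""
    adj = {}
    for i, j in links:
        adj.setdefault((i, "+"), set()).add((j, "-"))
        adj.setdefault((j, "-"), set()).add((i, "+"))

    while True:
        leaf = next((v for v, nbs in adj.items() if len(nbs) == 1), None)
        if leaf is None:
            break
        (neighbor,) = adj[leaf]
        # Remove the leaf and its neighbor together with all their links.
        for node in (leaf, neighbor):
            for nb in adj.pop(node):
                adj[nb].discard(node)
        # Remove nodes that became isolated.
        for v in [v for v, nbs in adj.items() if not nbs]:
            del adj[v]

    n_plus = sum(1 for _, side in adj if side == "+")
    return n_plus, len(adj) - n_plus


# Hypothetical example: nodes 1 and 2 each point to nodes 3, 4 and 5.
links = [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]
transpose = [(j, i) for i, j in links]

if __name__ == "__main__":
    print(core_balance(links))      # core sizes (+ set, - set)
    print(core_balance(transpose))  # the two sets exchange roles
```

Reversing every link exchanges the roles of the + and − sets, so the core imbalance changes sign while the core itself keeps the same nodes.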

Figure 5

(a) Distribution of the core imbalance between the + and − sets in 1000 realizations of Erdős-Rényi networks with size N = 500 and 〈k〉 = 6. (b) Among the samples in (a), the distribution of the core imbalance conditioned on the existence of a single link whose reversal causes a switch of control mode. The networks sensitive to structural perturbations typically have very close numbers of core nodes in the two sets. (c) The chance to switch the control mode by flipping one link's direction decreases with network size. The statistics are based on 1000 realizations of Erdős-Rényi networks with 〈k〉 = 6 and different sizes N. (d) The chance to switch the control mode by flipping one link's direction decreases with degree asymmetry. The statistics are based on 1000 realizations of scale-free networks with 〈k〉 = 12 and N = 500.

Core Percolation and the Robustness of Control

The control mode can be sensitive to structural changes, but the network controllability is relatively robust. Removing a node or link can change the number of driver nodes ND by at most one41,42,54. To measure the importance of nodes in sustaining the controllability of the network, a different set of node categories has been introduced28. A node is structurally critical if the number of driver nodes has to be increased to maintain full control in its absence (ND increases by 1), structurally redundant if it can be removed without affecting the current set of driver nodes (ND does not change) and structurally ordinary if it is neither structurally critical nor structurally redundant (ND decreases by 1) (Fig. 2d–f). While the fraction of nodes in each category can be studied numerically, an analytical approach has been missing.

Here we use the tools of core percolation to derive an analytical expression for the fraction of nodes in each category. In the controllability problem, a node has dual roles. On one hand, a node's dynamics is controlled via internal or external channels pointing to it. On the other hand, a node serves as a means to control the neighboring nodes it points to. These dual roles can be best seen in a network's bipartite representation. The nodes in the + set can be considered as “superiors” that influence others internally and the nodes in the − set are “subordinates” that need to be controlled. Accordingly the consequence of a node's removal depends on the node's role in both the + and the − set. If a node in the − set is always matched, its removal will not change the number of unmatched nodes in the − set. Otherwise, the number of unmatched nodes in the − set will decrease by 1. Similarly if a node in the + set is always matched, it matches a node in the − set in all matchings. Removing this node increases the number of unmatched nodes in the − set by 1. Otherwise, the number of unmatched nodes in the − set will not change as there exist alternative configurations matching the same number of nodes. The impact of node i's removal (node i+ and node i− in the bipartite graph) is summarized in Table 1. As the number of driver nodes equals the number of unmatched nodes in the − set, we readily have the relationship between a node's structural role in control and its matching status. A node i in a directed network is structurally critical if in its bipartite representation both nodes i− and i+ are always matched. A node i is structurally ordinary if neither node i− nor i+ is always matched. Otherwise node i is structurally redundant.

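Table 1 Impact of node i's removal on the number of unmatched nodes in the − set. Node i in the directed network corresponds to nodes i+ and i− in its bipartite representation. 0 means no change; +1 and −1 mean that the number of unmatched nodes in the − set increases or decreases by 1, respectively.

Node i−               Node i+               Net change    Structural role
always matched        always matched        +1            critical
always matched        not always matched    0             redundant
not always matched    always matched        0             redundant
not always matched    not always matched    −1            ordinary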
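The structural roles can also be obtained by brute force on small networks, by removing each node in turn and recomputing the number of unmatched nodes in the − set. The sketch below is illustrative only. The names `unmatched_count` and `structural_roles` are hypothetical, and the sketch counts unmatched nodes directly, so it ignores the convention that a perfectly matched network still needs one driver node.

```python
def unmatched_count(nodes, links):
    """Number of nodes left unmatched in the - set by a maximum matching."""
    out_neighbors = {}
    for i, j in links:
        out_neighbors.setdefault(i, []).append(j)
    match_of_minus = {}

    def try_augment(i, visited):
        for j in out_neighbors.get(i, []):
            if j in visited:
                continue
            visited.add(j)
            if j not in match_of_minus or try_augment(match_of_minus[j], visited):
                match_of_minus[j] = i
                return True
        return False

    for i in out_neighbors:
        try_augment(i, set())
    return len(nodes) - len(match_of_minus)


def structural_roles(nodes, links):
    """Classify nodes by the change in unmatched nodes caused by their removal."""
    base = unmatched_count(nodes, links)
    labels = {1: "critical", 0: "redundant", -1: "ordinary"}
    roles = {}
    for v in nodes:
        rest = [u for u in nodes if u != v]
        rest_links = [(i, j) for i, j in links if v not in (i, j)]
        roles[v] = labels[unmatched_count(rest, rest_links) - base]
    return roles


if __name__ == "__main__":
    print(structural_roles([1, 2, 3], [(1, 2), (2, 3)]))  # chain 1 -> 2 -> 3
    print(structural_roles([1, 2, 3], [(1, 2), (1, 3)]))  # star 1 -> 2, 1 -> 3
```

In the chain, the middle node is always matched in both sets, so its removal requires one more unmatched node. In the star, the two target nodes are never always matched in either set, so removing either one lowers the count by one.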

The expression of always matched nodes in the − set (nr) is obtained in Eq.(6). With the symmetry between the + and − sets, we can find the expression of structurally critical, ordinary and redundant nodes as

and

where nsc, nso and nsr are fractions of structurally critical, structurally ordinary and structurally redundant nodes.
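As a hedged illustration of how such expressions can be assembled, suppose that a node's matching status in the + set is independent of its status in the − set, and write $n_r^{-}$ for the fraction of always matched nodes in the − set (the quantity nr of Eq.(6)) and $n_r^{+}$ for its counterpart in the + set, obtained by exchanging the roles of the two sets. The symbols $n_r^{\pm}$ are introduced here for illustration only. Under these assumptions the classification rule stated above gives

```latex
\begin{align}
  n_{sc} &= n_r^{+}\, n_r^{-}, \\
  n_{so} &= \left(1 - n_r^{+}\right)\left(1 - n_r^{-}\right), \\
  n_{sr} &= 1 - n_{sc} - n_{so}
          = n_r^{+}\left(1 - n_r^{-}\right) + n_r^{-}\left(1 - n_r^{+}\right).
\end{align}
```

The three fractions sum to one by construction, and each product corresponds to one row of the matching-status rule: both copies always matched, neither always matched, or exactly one always matched.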

The results of Eqs.(7–9) are numerically tested (Fig. 6a). There is a sudden change of nsc, nso and nsr at the formation of the core (〈kc〉 = 2e in the Erdős-Rényi network22,30) accompanied by the change of the fraction of α and β nodes. Note that whether a node is a driver node depends on the matching in the − set only. The fraction of always matched nodes in the − set can be high or low depending on which of the two core structures occurs. This generates the bifurcation feature where nodes' participation in control differs dramatically. When a node's role in controllability robustness is concerned, however, it depends on matching in both the − and + sets. Therefore nsc, nso and nsr do not show a bifurcation feature. Indeed, when the in- and out-degree distributions are the same, both core structures are equally likely, but β+ = β− and α+ = α−, giving rise to a single value of nsc, nso and nsr regardless of the shape of the core. When the in- and out-degree distributions are different, only one core structure (more core nodes in the + set or more in the − set) is allowed in infinite networks, generating only one value of nsc, nso and nsr. Small networks with different in- and out-degree distributions do have cores of either type (Fig. 3). But the difference in the expected values of nsc, nso and nsr between the two cases is typically small (Fig. 6b). As a result no obvious bifurcation curve can be observed given the random noise due to the small system size.

Figure 6

(a) nsc, nso and nsr in Erdős-Rényi networks with N = 10000 and different average degree 〈k〉. The solid lines correspond to Eqs.(7–9). There is a sudden change at 〈k〉 = 2e when the core emerges. (b) Analytical results of nsc, nso and nsr based on Eqs.(7–9) in networks with γout = 3, γin = 2.85 and different 〈k〉. The solid lines correspond to the expected core structure and the dashed lines are the results for the core structure different from the expectation. The difference between the results for the two core structures is small. Therefore even though both core structures are possible in small systems, no bifurcation is observed.

Discussion

In summary, we apply the recent advances in core percolation to the controllability of complex networks. Historically, core percolation has been found to be related to a wide range of important problems in complex networks. Here we add a new connection to network controllability. In particular, we reveal the importance of the core as a fundamental structure that generates the two control modes. The core structure determines the control mode, which is related to the participation of nodes in control under the minimum driver node sets. The stability of the control mode under structural perturbations also relies on the balance of the core. Moreover, we derive an analytical expression for the fraction of nodes with different roles in controllability robustness. The expression obtained depends on the structure of the core.

The results presented raise several intriguing questions awaiting answers. For example, it is found that switching the balance of the core is crucial in changing the control mode. However, an efficient algorithm to identify the series of structural variations needed to change the core balance is still missing, so we lack a method to change the control mode in arbitrary networks. The calculation of controllability robustness is based on uncorrelated in- and out-degree distributions. The effects of higher order correlations require further investigation. Finally, the analytical framework of core percolation is limited to model networks. In many real systems, the core is quite different from the analytical expectations. For example, many real networks have cores made of multiple pieces, which are not observed in model networks; many real networks containing cores are not dense enough to yield cores in theory; and for those that are dense enough, the analytical prediction of the core size can be off. The theoretical work in this paper cannot be simply generalized to real networks without proper modifications. Such studies are left for future work.

Methods

Generating a scale free network

The scale-free networks55 analyzed are generated via the static model56. We start from N disconnected nodes indexed by an integer i (i = 1, …, N). A weight w±, parameterized by a real number α± in the range [0, 1), is assigned to each node in the out (+) and the in (−) set. Two nodes i and j are randomly selected from the out set and the in set, respectively, with probabilities proportional to their weights. Nodes i and j are connected if there is no connection between them, corresponding to a directed link from node i to node j in the digraph. Otherwise another pair is randomly chosen. The procedure is repeated until the required number of links is created, so that 〈kin〉 = 〈kout〉 = 〈k〉/2. The degree distribution under this construction can be expressed in terms of Γ(s), the gamma function, and Γ(s, x), the upper incomplete gamma function. In the large k limit, the distribution becomes a power law.

Eliminating correlations between in- and out-degree distribution

The calculation of controllability robustness is based on the assumption that a node's role in the + set is independent of its role in the − set, which requires independent in- and out-degrees. The scale-free network directly generated by the static model has degree correlations. For example, node 1 has the largest expected degree in both the + and − sets. To eliminate the degree correlation, we randomize the sequence of w±.
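The generation and randomization steps can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes the weight form commonly used in the static model, with the weight of node i proportional to i raised to the power −α, and it takes the number of links to be N〈k〉/2 so that the mean in- and out-degree are both 〈k〉/2. The function name `static_model_digraph` and its parameters are hypothetical.

```python
import random
from itertools import accumulate


def static_model_digraph(n, mean_degree, alpha_out, alpha_in,
                         shuffle_weights=True, seed=None):
    """Directed scale-free network from the static model (illustrative sketch).

    Returns a set of directed links (i, j) with node indices 0..n-1.
    """
    rng = random.Random(seed)
    # Assumed weight form: w_i proportional to i^(-alpha), i = 1..n.
    w_out = [(i + 1) ** -alpha_out for i in range(n)]
    w_in = [(i + 1) ** -alpha_in for i in range(n)]
    if shuffle_weights:
        # Randomize both weight sequences so that a node's expected
        # out-degree is independent of its expected in-degree.
        rng.shuffle(w_out)
        rng.shuffle(w_in)
    cum_out = list(accumulate(w_out))
    cum_in = list(accumulate(w_in))
    indices = range(n)

    # Each link adds one out-degree and one in-degree, so n * <k> / 2 links
    # give <k_in> = <k_out> = <k> / 2.
    n_links = round(n * mean_degree / 2)
    links = set()
    while len(links) < n_links:
        i = rng.choices(indices, cum_weights=cum_out)[0]
        j = rng.choices(indices, cum_weights=cum_in)[0]
        # An existing link is skipped and another pair is drawn. Self-loops
        # are not excluded here, since the text does not exclude them.
        links.add((i, j))
    return links


if __name__ == "__main__":
    net = static_model_digraph(n=200, mean_degree=6, alpha_out=0.5,
                               alpha_in=0.5, seed=1)
    print(len(net))
```

Setting `shuffle_weights=False` keeps the index order of both weight sequences, which reproduces the correlated case in which the same node has the largest expected in- and out-degree.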