Identifying critical edges in complex networks

Critical edges in complex networks are edges that play a more significant role than other edges in the structure and function of a network. Identifying them has attracted much attention because of its theoretical significance and its wide range of applications. Taking into account both the topological structure of a network and its ability to disseminate information, this report proposes BCC_MOD, an edge ranking algorithm based on cliques and paths. The effectiveness of the method is evaluated with the SIR model, the susceptibility index S and the size of the giant component σ. It is compared with well-known existing metrics (the Jaccard coefficient, the Bridgeness index, betweenness centrality and the Reachability index) on nine real networks. Experimental results show that the proposed method outperforms these methods in identifying critical edges with respect to both network connectivity and spreading dynamics.

The proposed method is compared with the Jaccard coefficient, Bridgeness, betweenness centrality and the Reachability index on nine real networks whose basic topological features differ widely. Three evaluation metrics are used: the SIR model [24,25], the susceptibility index S [26] and the size of the giant component σ [27]. The results show that the proposed method decomposes networks quickly and has a greater impact on information spreading.

Results
If many different cliques contain both endpoints of an edge, the edge is not very important from the perspective of spreading, because information can still travel between its endpoints through the other members of those cliques. Based on this observation and on the betweenness centrality of edges, a new index, BCC_MOD (Betweenness Centrality and Clique Model), is proposed to measure the importance of an edge e(u, v). BCC_MOD combines local and global characteristics of the network: removing edges with high BCC_MOD scores has a large effect on spreading. The performance of BCC_MOD is compared with that of the Jaccard coefficient, Bridgeness, Reachability and Betweenness. The results show that, in most cases, BCC_MOD decomposes networks more quickly and has a greater impact on information spreading than the other methods. Detailed definitions of the indices are given in the Methods section.
Data Description. Nine undirected, unweighted networks are used to evaluate the performance of the edge ranking methods:
(1) Jazz, a collaboration network between jazz musicians.
(2) Oz, a network of friendship ratings among 217 residents of a residence hall on the Australian National University campus.
(3) Highschool, a network of friendships between boys in a small high school in Illinois.
(4) Innovation, a network of innovation spread among 246 physicians in four towns in Illinois: Peoria, Bloomington, Quincy and Galesburg.
(5) Lesmis, a network of co-occurrences of characters in Victor Hugo's novel Les Misérables.
(6) Train, a network of contacts between suspected terrorists involved in the Madrid train bombing of March 11, 2004, as reconstructed from newspapers.
(7) PowerGrid, a network describing the power grid of the Western States of the United States of America.
(8) Email, a network of email communication at the University Rovira i Virgili in Tarragona, in the south of Catalonia, Spain.
(9) Router, a network of autonomous systems of the Internet connected to each other.
All data can be downloaded from the Chicago network dataset [28], and the basic topological properties of the nine networks are shown in Table 1. To ensure diversity, the nine networks differ widely in the number of nodes and edges, average degree, maximum degree, average clustering coefficient and degree heterogeneity.
Evaluation metrics. The susceptibility index S, the size of the giant component σ and the SIR spreading model are used to evaluate the performance of the ranking methods.
Susceptibility index S. For network connectivity, the susceptibility index S is used to evaluate the performance of the methods. It is defined as
$$S = \frac{\sum_{s < s_{max}} n_s s^2}{n},$$
where n_s is the number of components of size s, s_max is the size of the giant component, and n is the size of the whole network. Concretely, the edges are first sorted in descending order of their ranking scores. S is then recalculated after each edge is removed from the network, one by one from the highest to the lowest score. In this report, the parameter p is defined as
$$p = \frac{m_r}{m},$$
where m is the total number of edges and m_r is the number of removed edges. The results are shown in Table 2 and Fig. 1. In Lesmis, Highschool, Jazz, Train, Email and Oz, BCC_MOD reaches its largest S at the smallest p. In Innovation, all methods perform equally. In PowerGrid and Router, the peak S of BCC_MOD appears second earliest. Thus, in most cases the peak S of BCC_MOD appears earlier than that of the other methods, which demonstrates that BCC_MOD breaks down the network quickly. Moreover, the peak S of BCC_MOD is the highest among all methods in every network except Email and Router, which means that BCC_MOD does the greatest damage to the networks. In terms of network connectivity, therefore, BCC_MOD decomposes networks quickly and does the greatest damage to them in most cases.
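The removal procedure behind Table 2 and Fig. 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. It takes S in the percolation form S = Σ_{s<s_max} n_s s² / n, so every component strictly smaller than the giant one contributes. Nodes are assumed to be labelled 0..n−1. The function names, the toy path graph and the edge ranking are hypothetical and stand in for whichever ranking method is being evaluated.

```python
from collections import Counter


def component_sizes(n, edges):
    """Sizes of the connected components of an undirected graph on nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return list(Counter(find(x) for x in range(n)).values())


def susceptibility(n, edges):
    """S = sum_{s < s_max} n_s * s^2 / n over all components smaller than the giant one."""
    sizes = component_sizes(n, edges)
    s_max = max(sizes)
    return sum(s * s for s in sizes if s < s_max) / n


def susceptibility_curve(n, ranked_edges):
    """Remove edges one by one in ranked (descending-score) order and return
    (p, S) pairs, where p = m_r / m after m_r removals."""
    m = len(ranked_edges)
    return [(m_r / m, susceptibility(n, ranked_edges[m_r:])) for m_r in range(1, m + 1)]


# Toy path 0-1-2-3-4 with a hypothetical ranking of its four edges.
ranked = [(1, 2), (2, 3), (0, 1), (3, 4)]
curve = susceptibility_curve(5, ranked)
peak_p, peak_s = max(curve, key=lambda point: point[1])
print(curve)           # [(0.25, 0.8), (0.5, 0.2), (0.75, 0.6), (1.0, 0.0)]
print(peak_p, peak_s)  # 0.25 0.8
```

The p at which the curve peaks is the quantity compared across methods in Table 2. A method whose peak appears at a smaller p breaks the network apart sooner. Recomputing components after every removal costs O(m²), which is acceptable for a sketch but not for large networks.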
The size of the giant component σ. Besides the susceptibility index S, the size of the giant component σ is used to evaluate the performance of the methods. The edges are first sorted in descending order of their scores. The size of the giant component σ is then recorded after each edge is removed, one by one from the highest to the lowest score. The results are shown in Fig. 2; the faster a curve falls, the better the method. In Fig. 2(b,c,f,h,i), the curve of BCC_MOD falls fastest, which means that BCC_MOD breaks down the network quickly. In Fig. 2(d,g), BCC_MOD falls almost as fast as the best method. In Fig. 2(a), σ drops quickly after a relatively slow start. These results demonstrate that BCC_MOD decomposes networks quickly in most cases.
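Removing edges in ranked order is the reverse of adding them to an empty graph, starting from the lowest-ranked edge. The whole σ curve can therefore be computed in a single union-find pass instead of recomputing components after every removal. The Python sketch below assumes that σ is reported as the fraction s_max / n and that nodes are labelled 0..n−1. Both are illustrative assumptions, and the function name is hypothetical.

```python
def giant_component_curve(n, ranked_edges):
    """Return sigma[k] = (size of giant component) / n after removing the
    first k edges of ranked_edges (k = 0..m)."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    m = len(ranked_edges)
    largest = 1  # with every edge removed, each node is isolated
    sigma = [0.0] * (m + 1)
    sigma[m] = largest / n
    # Re-insert edges from the lowest-ranked to the highest-ranked one.
    for k in range(m - 1, -1, -1):
        u, v = ranked_edges[k]
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru  # union by size
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        sigma[k] = largest / n  # the graph now holds ranked_edges[k:]
    return sigma


ranked = [(1, 2), (2, 3), (0, 1), (3, 4)]  # hypothetical ranking on path 0-1-2-3-4
print(giant_component_curve(5, ranked))  # [1.0, 0.6, 0.4, 0.4, 0.2]
```

A ranking whose curve drops toward small σ after only a few removals corresponds to the "faster falling" curves described for Fig. 2.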
SIR model. In the SIR model, every node is in one of three states, counted as follows:
(1) S(t), the number of susceptible nodes, which may be infected but are not yet infected.
(2) I(t), the number of infected nodes, which spread the disease or information to susceptible neighbors.
(3) R(t), the number of recovered nodes, which have recovered from the disease (or lost interest in the information) and can never be infected again.
At each step, each infected node infects each of its susceptible neighbors with probability μ, and infected nodes recover with probability β (for simplicity, β = 1 in this report). The process stops when no infected node remains. To estimate the influence of a single node u, u is set as the only infected node and all other nodes as susceptible. The normalized final affected scale is defined as
$$F(t_c, u) = \frac{n_R(t_c, u)}{n},$$
where n_R(t_c, u) is the number of finally affected nodes when node u is initially infected under the SIR model, t_c is the time at which the process stops, and n is the number of nodes. To estimate the influence of edges, the average influence of all nodes is computed before and after a given fraction of edges is removed. The relative difference of the real infected scale is
$$R_s = \frac{F^{(1)}(t_c) - F^{(2)}(t_c)}{F^{(1)}(t_c)},$$
where $F^{(i)}(t_c) = \frac{1}{n}\sum_{u} F^{(i)}(t_c, u)$ is the average final infected scale over all nodes. F^{(1)}(t_c) is the result for the original network, and F^{(2)}(t_c) is the result after a fraction p of the edges has been removed.
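A minimal Monte Carlo sketch of this SIR procedure in Python follows, with β = 1. Every infected node therefore has exactly one chance to infect each susceptible neighbor before it recovers. Graphs are plain dicts mapping each node to a set of neighbors. R_s is assumed to be the relative drop (F^(1) − F^(2)) / F^(1). The function names, the number of runs and the random seed are illustrative choices, not values taken from the report.

```python
import random


def final_fraction(adj, source, mu, rng):
    """One SIR run with beta = 1 started from `source`; returns n_R(t_c, source) / n."""
    recovered = set()
    infected = {source}
    while infected:  # stop when no infected node remains
        newly_infected = set()
        for u in infected:
            for w in adj[u]:
                susceptible = w not in infected and w not in recovered and w not in newly_infected
                if susceptible and rng.random() < mu:
                    newly_infected.add(w)
        recovered |= infected  # beta = 1: every infected node recovers after one step
        infected = newly_infected
    return len(recovered) / len(adj)


def average_final_fraction(adj, mu, runs, rng):
    """F(t_c): mean of F(t_c, u) over all source nodes u, `runs` simulations each."""
    total = sum(final_fraction(adj, u, mu, rng) for u in adj for _ in range(runs))
    return total / (len(adj) * runs)


def relative_difference(adj, removed_edges, mu, runs=100, seed=0):
    """Assumed R_s = (F1 - F2) / F1, where F2 is measured after deleting removed_edges."""
    pruned = {u: set(nbrs) for u, nbrs in adj.items()}
    for u, v in removed_edges:
        pruned[u].discard(v)
        pruned[v].discard(u)
    rng = random.Random(seed)
    f1 = average_final_fraction(adj, mu, runs, rng)
    f2 = average_final_fraction(pruned, mu, runs, rng)
    return (f1 - f2) / f1


# Path 0-1-2-3: with mu = 1 spreading is deterministic, so cutting (1, 2) halves F.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(relative_difference(path, [(1, 2)], mu=1.0, runs=3))  # 0.5
```

For 0 < μ < 1 the estimate is stochastic, so many runs per source node are needed before comparing R_s across ranking methods.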
Table 3 shows the Spearman correlation coefficients between the ranking scores and the relative differences of the real infected scale R_s with μ/μ_c = 2, where μ_c is the epidemic threshold of the network. BCC_MOD has the highest Spearman correlation in PowerGrid, Lesmis, Router, Jazz, Innovation, Train and Email. These results demonstrate that the edges BCC_MOD removes first have a greater impact on the dissemination of real information. Figure 3 shows R_s after removing the top 5% of ranked edges under different infection rates. Across these rates, BCC_MOD yields higher R_s than the Jaccard, Bridgeness, Betweenness and Reachability methods. In general, removing the top 5% of edges ranked by BCC_MOD has a significant impact on information spreading. Figure 4 shows R_s for different fractions p of removed edges with μ/μ_c = 2, and BCC_MOD again yields higher R_s than the other methods. These results demonstrate that BCC_MOD has a greater impact on information spreading than the other methods when only a small fraction of edges is removed.
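Spearman's coefficient is the Pearson correlation of the ranks, with tied values given the average of the ranks they span. The dependency-free Python sketch below illustrates it. Pairing each edge's ranking score with the R_s measured after removing that edge alone is our reading of the setup, not something the text spells out, and the scores and R_s values are hypothetical. `scipy.stats.spearmanr` computes the same quantity.

```python
def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


scores = [0.9, 0.4, 0.7, 0.1]   # hypothetical edge ranking scores
r_s = [0.30, 0.05, 0.12, 0.01]  # hypothetical R_s after removing each edge
print(spearman(scores, r_s))    # 1.0
```

Only the ordering matters here. A ranking method scores well whenever its top-ranked edges are also the ones whose removal reduces spreading most, whatever the scale of its scores.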

Discussion
The results in this report show that if many different cliques contain both endpoints of an edge, the edge is not important from the perspective of spreading. We propose a global structural index, BCC_MOD, and compare it with four well-known topological indices using the susceptibility index S, the size of the giant component σ and the SIR model. The results show that BCC_MOD performs well in identifying critical edges with respect to both network connectivity and spreading dynamics. As the SIR experiments indicate, BCC_MOD is effective in quantifying the spreading influence of edges. This can help in real-life applications such as controlling the spread of diseases or rumors and withstanding targeted attacks on network infrastructure. Moreover, formal definitions of cliques generally assume that network links are undirected. In directed networks the definition of cliques must be modified [29,30], and the algorithm for mining critical edges changes correspondingly. Although the method performs well, its computational cost is high […]. Hence, the computational complexity of BCC_MOD is O(mn + M(n)) in undirected networks. BCC_MOD is a global index with a moderate computational load and is expected to be applied to small and medium-sized undirected networks. Optimizing the algorithm for large-scale and directed networks will be part of our future work. Besides the SIR model, other well-known dynamical processes can measure the importance of edges. For example, the susceptible-infected-susceptible (SIS) spreading model [33] can examine how much information passes through an edge over a period of time.

Methods
Betweenness centrality. Edge betweenness centrality reflects the idea that the more shortest paths between node pairs pass through an edge e(u, v), the more important that edge is. The betweenness centrality of an edge e(u, v) [15] is defined as
$$BC(u, v) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}},$$
where σ_st is the number of shortest paths between nodes s and t, and σ_st(e) is the number of those paths that pass through e(u, v).
Combining BC(u, v) with the cliques that contain the edge, the index BCC_MOD(u, v) (…) can be defined to measure the importance of an edge e(u, v). Here BC(u, v) is the betweenness centrality of edge e(u, v), k_u and k_v are the degrees of nodes u and v, and C(u, v)_i is the number of cliques of size i containing edge e(u, v). In this report, a clique is any fully connected subgraph, not necessarily a maximal one. For example, C(u, v)_4 = 3 means that three cliques of size 4 contain edge e(u, v). The larger the score, the more important the edge. As shown in Fig. 5(a,c), the degrees of nodes 1 and 2 are 7 and 8, respectively. In Fig. 5(a) (maximum clique size 4), C(1, 2)_3 = 5 and C(1, 2)_4 = 2. When edge e(1, 2) is removed, many paths from node 1 to node 2 remain, so the effect on spreading is small. However, in Fig. 5(c) (maximum clique size 3), C(1, 2)_3 = 1. Removing edge e(1, 2) there has a large effect on spreading, since only one path, (1, 3, 2), leads from node 1 to node 2. Table 4 shows the effect probability p_e of nodes 2, 3 and 9 with node 1 as the original infected source, on the SIR spreading model with a full contact process. Taking node 2 as an example, under μ = 0.2 its effect probability is 0.3733 in Fig. 5(a) and 0.2240 in Fig. 5(b). In Fig. 5(c,d), by contrast, it is 0.2392 and 0.0380, respectively.
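The two structural ingredients named in this paragraph can be computed directly. A clique of size i that contains e(u, v) consists of u, v and a clique of size i − 2 inside their common neighborhood, so C(u, v)_i can be counted by enumerating cliques among the common neighbors. Edge betweenness can be computed with Brandes' algorithm. The Python sketch below works for small graphs only, because clique enumeration is exponential in the worst case. It assumes graphs are dicts of neighbor sets with comparable node labels. The betweenness it returns counts each unordered node pair once, which is half the ordered-pair sum and leaves the edge ranking unchanged. The sketch stops at these ingredients and does not attempt the combined BCC_MOD score. The example graph is hypothetical, not the toy network of Fig. 5.

```python
from collections import Counter, deque


def clique_counts_on_edge(adj, u, v):
    """Counter {i: C(u, v)_i}: number of complete subgraphs of size i (maximal
    or not, i >= 3) that contain edge (u, v)."""
    counts = Counter()

    def extend(chosen, candidates):
        # `candidates` are common neighbours adjacent to every chosen node;
        # taking them in list order enumerates each clique exactly once.
        for idx, w in enumerate(candidates):
            counts[chosen + 3] += 1  # chosen nodes + w + u + v
            extend(chosen + 1, [x for x in candidates[idx + 1:] if x in adj[w]])

    extend(0, sorted(adj[u] & adj[v]))
    return counts


def edge_betweenness(adj):
    """Edge betweenness over unordered node pairs (Brandes' algorithm, unweighted)."""
    bc = Counter()
    for s in adj:
        dist, sigma, preds, order = {s: 0}, Counter({s: 1}), {s: []}, []
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            x = queue.popleft()
            order.append(x)
            for w in adj[x]:
                if w not in dist:
                    dist[w], preds[w] = dist[x] + 1, []
                    queue.append(w)
                if dist[w] == dist[x] + 1:
                    sigma[w] += sigma[x]
                    preds[w].append(x)
        delta = Counter()
        for w in reversed(order):  # accumulate dependencies back towards s
            for x in preds[w]:
                share = sigma[x] / sigma[w] * (1 + delta[w])
                bc[(min(x, w), max(x, w))] += share
                delta[x] += share
    return {e: val / 2 for e, val in bc.items()}  # each pair was seen from both ends


# Hypothetical graph: edge (1, 2) lies in two 4-cliques and one extra triangle.
g = {1: {2, 3, 4, 5, 6, 7}, 2: {1, 3, 4, 5, 6, 7}, 3: {1, 2, 4}, 4: {1, 2, 3},
     5: {1, 2, 6}, 6: {1, 2, 5}, 7: {1, 2}}
print(clique_counts_on_edge(g, 1, 2))  # Counter({3: 5, 4: 2})
```

As in the Fig. 5(a) discussion, many small cliques around an edge signal alternative routes between its endpoints. These counts are what dampen an edge's importance relative to its betweenness alone.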
Jaccard coefficient. The Jaccard coefficient of an edge e(u, v) is defined as
$$J(u, v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|},$$
where Γ(u) is the set of neighbors of node u.
Bridgeness. The Bridgeness of an edge e(u, v) is defined as
$$B(u, v) = \frac{\sqrt{S_u S_v}}{S_{e(u, v)}},$$
where S_u, S_v and S_{e(u, v)} are the sizes of the maximum cliques containing node u, node v and edge e(u, v), respectively.
Reachability index. The Reachability index of edge e(u, v) is defined as (…), where |V| is the number of nodes, G_{e(u, v)} is the subnetwork obtained by removing edge e(u, v) from the original network, and R(s; G_{e(u, v)}) is the number of nodes reachable from node s in G_{e(u, v)}.
Table 4. The ratio of infected cases among 10000 simulations for nodes 2, 3 and 9, with node 1 as the original infected source, before and after edge e(1, 2) is removed in the toy network shown in Fig. 5, under different infection probabilities μ.
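Two of the baseline indices can be sketched in Python for comparison. The sketch assumes the usual neighborhood-overlap form of the edge Jaccard coefficient, J(u, v) = |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|. It assumes the Bridgeness form B(u, v) = √(S_u S_v) / S_{e(u, v)}, where S denotes maximum-clique sizes. The maximum-clique search is a simple branch and bound that is practical only for small graphs. Graphs are dicts of neighbor sets, and the function names and example graph are hypothetical. The Reachability index is not included.

```python
import math


def jaccard(adj, u, v):
    """|Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)| over the neighbour sets of u and v."""
    return len(adj[u] & adj[v]) / len(adj[u] | adj[v])


def largest_clique_within(adj, candidates):
    """Size of the largest clique among `candidates` (exponential branch and bound)."""
    best = 0

    def grow(size, remaining):
        nonlocal best
        best = max(best, size)
        for i, w in enumerate(remaining):
            if size + len(remaining) - i <= best:
                return  # even taking every remaining node cannot beat `best`
            grow(size + 1, [x for x in remaining[i + 1:] if x in adj[w]])

    grow(0, sorted(candidates))
    return best


def max_clique_containing(adj, nodes):
    """Size of the largest clique containing all of `nodes` (assumed mutually adjacent)."""
    common = set.intersection(*(adj[x] for x in nodes))
    return len(nodes) + largest_clique_within(adj, common)


def bridgeness(adj, u, v):
    """B(u, v) = sqrt(S_u * S_v) / S_e(u, v)."""
    s_u = max_clique_containing(adj, [u])
    s_v = max_clique_containing(adj, [v])
    s_e = max_clique_containing(adj, [u, v])
    return math.sqrt(s_u * s_v) / s_e


# K4 on {0, 1, 2, 3} plus a pendant edge (3, 4).
g = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
print(jaccard(g, 0, 1), jaccard(g, 3, 4))        # 0.5 0.0
print(bridgeness(g, 0, 1), bridgeness(g, 3, 4))  # 1.0 1.4142135623730951
```

The pendant edge (3, 4) gets the lower Jaccard coefficient and the higher Bridgeness, consistent with both indices treating an edge outside dense cliques as a bridge.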