Introduction

The structure and function of complex networks have attracted a great deal of attention in many branches of science1. Networks mediate the spread of information, and sometimes a few initial seeds can affect large portions of a network. Such information cascades are observed in many situations, for example, cascading failures in power grids, disease contagion between individuals, innovations and rumors propagating through social networks, and large grass-roots social movements in the absence of centralized control. Finding critical nodes and edges is therefore an important and interesting problem. With the rapid development of internet media, information exchange between individuals is becoming more frequent and the mechanisms of information diffusion more complex. Many methods have been used to measure the importance of nodes in networks. Degree centrality2, semi-local centrality3, k-shell4 and H-index5,6 are based on node degrees. Closeness centrality7, betweenness centrality8 and eccentricity centrality9 are based on paths in networks. PageRank10, LeaderRank11 and HITS12 are based on eigenvectors. Sleep scheduling13 is one approach to saving the residual energy of wireless nodes in energy-constrained large-scale industrial wireless sensor networks while satisfying network connectivity and reliability. Critical edges also play a significant role in information diffusion. In complex networks it is sometimes impractical to forbid all communications of a node, so it becomes necessary to cut some important communication links instead. Critical edge analysis helps to guide or control information dissemination from a global perspective.

To explore the transmission of information, many studies have focused on network topology to find critical edges. Degree product14 supposes that edges connecting two high-degree nodes are critical. Betweenness centrality of edges15,16 and betweenness centrality of a group of edges17 suppose that edges linking two connected components are important. Average node reachability and the maximum flow of a network characterize its ability to transmit information, and critical edges strongly influence both quantities18,19. In the Jaccard coefficient20, if node i and node j have many common neighbors, information can spread easily from node i to node j even without a direct connection, so an edge is more important if its end nodes have fewer common neighbors. Complex networks may contain many cliques. In Bridgeness21, if an edge is removed, information can still spread through the other edges of the clique that contained it, so, intuitively, edges in smaller cliques are more important.

Moreover, the ability to disseminate information is also an index for evaluating the importance of edges. A study of online social networks found three different spreading mechanisms: social spreading, self-promotion and broadcast22. An edge is important if most of the information spreads through it23.

In this report, we use only the topology of networks to rank the importance of edges, considering both local characteristics (degrees of nodes, cliques) and global characteristics (betweenness centrality). The proposed method is compared with the Jaccard coefficient, Bridgeness, Betweenness centrality and the Reachability index under three evaluation metrics: the SIR model24,25, the susceptibility index S26 and the size of the giant component σ27. The comparison uses nine real networks that differ widely in their basic topological features. The results show that the proposed method can quickly decompose networks and has a greater impact on information spreading.

Results

If many different cliques contain both end nodes of an edge, the edge is not so important from the perspective of spreading. Based on this observation and on the betweenness centrality of edges, a new index, BCCMOD (Betweenness Centrality and Clique Model), is proposed to measure the importance of an edge e(u, v). BCCMOD combines local and global characteristics. Removing edges with high BCCMOD scores has a large effect on spreading. The performance of BCCMOD is compared with that of the Jaccard coefficient, Bridgeness, Reachability and Betweenness. The results show that, in most cases, BCCMOD decomposes networks more quickly and has a greater impact on information spreading than the other methods. The detailed definitions of the indices are given in the Methods section.

Data Description

Nine undirected and unweighted networks are used to evaluate the performance of the edge ranking methods. (1) Jazz, a collaboration network between jazz musicians. (2) Oz, a network of friendship ratings between 217 residents of a residence hall on the Australian National University campus. (3) Highschool, a network of friendships between boys in a small high school in Illinois. (4) Innovation, a network of innovation spread among 246 physicians in towns in Illinois, namely Peoria, Bloomington, Quincy and Galesburg. (5) Lesmis, a network of co-occurrences of characters in Victor Hugo’s novel Les Miserables. (6) Train, a network of contacts between suspected terrorists involved in the Madrid train bombing of March 11, 2004, as reconstructed from newspapers. (7) PowerGrid, a network representing the power grid of the Western States of the United States of America. (8) Email, a network of email communication at the University Rovira i Virgili in Tarragona, in the south of Catalonia, Spain. (9) Router, a network of autonomous systems of the Internet connected with each other. All data can be downloaded from Chicago network dataset28, and the basic topological properties of the nine networks are shown in Table 1. To guarantee diversity, the nine networks differ widely in their numbers of nodes and edges, average degree, maximum degree, average clustering coefficient and degree heterogeneity.

Table 1 The basic topological features of nine real networks.

Evaluation metrics

The susceptibility index S, the size of the giant component σ and the SIR spreading model are used to evaluate the performance of the ranking methods.

Susceptibility index S

As a network connectivity metric, the susceptibility index S is used to evaluate the performance of the methods. It is defined as:

$$S=\sum _{s < {s}_{{\rm{\max }}}}\frac{{n}_{s}{s}^{2}}{n},$$
(1)

where \(n_s\) is the number of components whose size equals s, \(s_{\max}\) is the size of the giant component, and n is the size of the whole network. In detail, edges are first sorted in descending order of their ranking score and then removed from the network one by one, from the highest score to the lowest; the susceptibility index S is calculated after each removal. In this report, the parameter p is defined as:

$$p=\frac{{m}_{r}}{m},$$
(2)

where m is the total number of edges and \(m_r\) is the number of removed edges.
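The removal procedure and Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the reported results: the function names are hypothetical, the graph is assumed to be given as a node list plus a list of edges already sorted from highest to lowest score, and the condition s < s_max is read literally, so components tied for the largest size are all excluded from the sum.

```python
from collections import deque

def component_sizes(nodes, edges):
    """Sizes of the connected components of an undirected graph, found by BFS."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            x = queue.popleft()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        sizes.append(size)
    return sizes

def susceptibility(sizes, n):
    """Eq. (1): sum of s^2 over components strictly smaller than the largest, over n."""
    s_max = max(sizes)
    return sum(s * s for s in sizes if s < s_max) / n

def susceptibility_curve(nodes, ranked_edges):
    """Remove edges in ranked order; return (p, S) pairs with p = m_r / m (Eq. (2)).
    Assumes the network has at least one edge."""
    m, n = len(ranked_edges), len(nodes)
    curve = []
    for m_r in range(m + 1):
        remaining = ranked_edges[m_r:]  # the m_r top-ranked edges are removed
        sizes = component_sizes(nodes, remaining)
        curve.append((m_r / m, susceptibility(sizes, n)))
    return curve
```

Recomputing the components from scratch after every removal keeps the sketch short; for large networks an incremental scheme would be preferable.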

The results are shown in Table 2 and Fig. 1. BCCMOD reaches its largest S at the minimum p in Lesmis, Highschool, Jazz, Train, Email and Oz. In Innovation, all methods have the same effect. In PowerGrid and Router, the largest S of BCCMOD appears second earliest. Thus, in most cases the largest S of BCCMOD appears earlier than that of the other methods, which demonstrates that BCCMOD can break down the network quickly. Moreover, the largest S of BCCMOD is the highest among all methods for all networks except Email and Router, which means that BCCMOD does the greatest damage to networks. In terms of network connectivity, therefore, BCCMOD quickly decomposes networks and does the greatest damage to them in most cases.

Table 2 The value of p corresponding to the largest S.

Figure 1 The susceptibility index S for different values of p.

The size of giant component σ

Besides the susceptibility index S, the size of the giant component σ is used to evaluate the performance of the methods. In detail, edges are first sorted in descending order of their score and then removed from the network one by one, from the highest score to the lowest; the size of the giant component σ is counted after each removal.
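The procedure described above can be sketched as follows. The function names and the input format (a dictionary mapping each edge to its ranking score) are hypothetical, and σ is returned here as a raw node count; whether σ is normalized by n in the figures is not stated in the text, so any normalization is left to the caller.

```python
def giant_component_size(nodes, edges):
    """Number of nodes in the largest connected component (iterative DFS)."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        best = max(best, size)
    return best

def giant_component_curve(nodes, scores):
    """scores maps each edge (u, v) to its ranking score (at least one edge assumed).
    Returns (p, sigma) pairs, where sigma is the giant component size in nodes."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    m = len(ranked)
    return [(m_r / m, giant_component_size(nodes, ranked[m_r:]))
            for m_r in range(m + 1)]
```

On a four-node path whose middle edge has the highest score, the curve drops from 4 to 2 after the first removal, which is the fast decay that a good ranking should produce.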

The results are shown in Fig. 2. The faster the curve falls, the better the method performs. In Fig. 2(b,c,f,h,i), the curve of BCCMOD falls the fastest, which means that BCCMOD can break down the network quickly. In Fig. 2(d,g), the falling speed of BCCMOD is close to the best among all methods. In Fig. 2(a), the size of the giant component σ drops relatively slowly at the beginning but quickly afterwards. These results demonstrate that BCCMOD can quickly decompose networks in most cases.

Figure 2 The size of giant component σ for different values of p.

SIR model

In the SIR model, there are three states: (1) S(t) denotes the number of susceptible nodes, which are not yet infected but may become infected; (2) I(t) denotes the number of infected nodes, which spread the disease or information to susceptible nodes; (3) R(t) denotes the number of recovered nodes, which have recovered from the disease or lost interest in the information and will never be infected again. In a network, each infected node infects each of its susceptible neighbors with a certain probability μ. At each step, infected nodes recover with probability β (for simplicity, β = 1 in this report). The process stops when no infected node remains. To estimate the influence of a single node, we set that node to be infected and all others to be susceptible. The normalized final affected scale is defined as

$$F({t}_{c},u)=\frac{{n}_{R({t}_{c},u)}}{n},$$
(3)

where \(n_{R(t_c,u)}\) is the final number of affected nodes when node u is the initially infected node under the SIR model, and \(F(t_c,u)\) is the corresponding normalized scale. To estimate the influence of edges, we calculate the average influence of all nodes after a certain fraction of edges is removed. We define the index

$${R}_{s}=\frac{{F}^{\mathrm{(1)}}({t}_{c})-{F}^{\mathrm{(2)}}({t}_{c})}{{F}^{\mathrm{(1)}}({t}_{c})},$$
(4)

where \(F^{(i)}(t_c)\) is the average final infected scale over all nodes, i.e., \({F}^{(i)}({t}_{c})=\frac{1}{n}{\sum }_{u\in V}F({t}_{c},u)\), and \(F^{(1)}(t_c)\) and \(F^{(2)}(t_c)\) are the results for the original network and for the network after a fraction p of the edges has been removed, respectively.
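Equations (3) and (4) can be illustrated with a small Monte Carlo sketch. It assumes that each infected node makes one independent infection attempt per susceptible neighbor with probability μ and recovers after a single step (β = 1), which is one reading of the description above. The function names, the adjacency format `{node: set_of_neighbors}` and the `runs` parameter are hypothetical.

```python
import random

def sir_final_fraction(adj, seed_node, mu, rng):
    """One SIR run with beta = 1. Returns n_R / n, the recovered fraction (Eq. (3))."""
    infected, recovered = {seed_node}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and v not in newly_infected and rng.random() < mu:
                    newly_infected.add(v)
        recovered |= infected       # beta = 1: infected nodes recover after one step
        infected = newly_infected
    return len(recovered) / len(adj)

def average_final_fraction(adj, mu, runs, rng):
    """F(t_c): mean of F(t_c, u) over every seed node u and over `runs` repetitions."""
    total = 0.0
    for u in adj:
        for _ in range(runs):
            total += sir_final_fraction(adj, u, mu, rng)
    return total / (len(adj) * runs)

def relative_difference(f_original, f_removed):
    """Eq. (4): R_s = (F1 - F2) / F1."""
    return (f_original - f_removed) / f_original
```

With μ = 1 the process is deterministic and every node in the seed's component is reached, so cutting a four-node path in the middle gives F(1) = 1, F(2) = 0.5 and Rs = 0.5. For 0 < μ < 1 the result depends on the random generator and on the number of runs.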

Table 3 shows the Spearman correlation coefficients between the ranking scores and the relative differences of real infected scale Rs with μ/μc = 2, where \({\mu }_{c}=\frac{\langle k\rangle }{\langle {k}^{2}\rangle -\langle k\rangle }\) in this report; all results are averaged over 200 independent implementations. Edges are sorted in descending order of score and divided into 50 parts. At each step, only one part of the edges is removed (the other 49 parts remain) and the corresponding relative difference of real infected scale is calculated. In this way, two sequences are obtained (the scores of each 2% of edges and the relative differences of real infected scale), and the Spearman correlation coefficient between them is computed. Table 3 shows that BCCMOD has the maximal Spearman correlation in PowerGrid, Lesmis, Router, Jazz, Innovation, Train and Email. These results demonstrate that the edges which BCCMOD removes preferentially have a greater impact on the dissemination of real information.
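As a small worked illustration of the threshold formula above, the following sketch computes μc from a degree sequence and derives the infection probability for a given ratio μ/μc. The function names are hypothetical, and the degree sequence is assumed to contain at least one node with degree above 1 so that the denominator is non-zero.

```python
def epidemic_threshold(degrees):
    """mu_c = <k> / (<k^2> - <k>), computed from the degree sequence."""
    n = len(degrees)
    k_mean = sum(degrees) / n
    k2_mean = sum(k * k for k in degrees) / n
    return k_mean / (k2_mean - k_mean)

def infection_probability(degrees, ratio=2.0):
    """mu for a chosen ratio mu / mu_c (the text uses mu / mu_c = 2)."""
    return ratio * epidemic_threshold(degrees)
```

For a ring, where every node has degree 2, the formula gives ⟨k⟩ = 2 and ⟨k²⟩ = 4, hence μc = 1. Note that for such sparse, homogeneous graphs a ratio of 2 would exceed 1 and would need to be capped to remain a valid probability; heterogeneous networks have much smaller thresholds.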

Table 3 Spearman correlation coefficients between the ranking scores and the relative differences of real infected scale Rs.

Figure 3 shows the relative differences of real infected scale Rs after removing the top 5% of ranked edges under different infection rates. BCCMOD yields a higher Rs than the Jaccard, Bridgeness, Betweenness and Reachability methods across infection rates. In general, removing the top 5% of edges ranked by BCCMOD has a significant impact on information spreading.

Figure 3 The relative differences of real infected scale Rs after removing the top 5% of ranked edges under different infection rates. All results are averaged over 100 independent implementations.

Figure 4 shows the relative differences of real infected scale Rs for different fractions p of removed edges with μ/μc = 2. BCCMOD yields a higher Rs than the other methods across the different fractions of removed edges. These results demonstrate that, when only a small fraction of edges is removed, BCCMOD has a greater impact on information spreading than the other methods.

Figure 4 The relative differences of real infected scale Rs, with each node taken as seed, for different fractions p of removed edges. All results are averaged over 100 independent implementations under μ/μc = 2.

Discussion

In this report, the results show that if many different cliques contain both end nodes of an edge, then the edge is not important from the perspective of spreading. We propose a global structural index, called BCCMOD, and compare it with four well-known topological indices using the susceptibility index S, the size of the giant component σ and the SIR model. The results show that BCCMOD performs well in identifying critical edges in terms of both network connectivity and spreading dynamics. As indicated by the experiments on the SIR model, BCCMOD is effective in quantifying the spreading influence of edges. This will help in real-life applications such as controlling the spread of diseases or rumors and withstanding targeted attacks on network infrastructures. Moreover, formal definitions of cliques generally assume that network links are undirected. In directed networks the definition of cliques must be modified29,30, and the algorithm for mining critical edges changes correspondingly. Although the method performs well, its high computational complexity prevents its use in large-scale networks. In BCCMOD, the degrees of all nodes must be determined (running time O(m)), and the time complexity of calculating the betweenness centrality of all edges in undirected networks is O(mn)31. The time complexity of finding all cliques in undirected networks is O(M(n)), where M(n) is the cost of multiplying two n × n matrices32 (for sparse matrices, M(n) is \(O(n^2)\)). The computational complexity of BCCMOD is therefore O(mn + M(n)) in undirected networks. BCCMOD is a global index with a moderate computational load and is expected to be applicable to small and medium-sized undirected networks. Optimizing our algorithm for large-scale networks and directed networks will be part of our future work.
Besides the SIR model, there are other well-known dynamical processes for measuring the importance of edges; for example, the susceptible-infected-susceptible (SIS) spreading model33 can examine how much information passes through an edge over a period of time.

Methods

Betweenness centrality

The betweenness centrality of edges rests on the idea that the more shortest paths between node pairs pass through the edge e(u, v), the more important that edge is. The betweenness centrality of an edge e(u, v)15 is defined as:

$$BC(u,v)=\sum _{s\ne t\in V}\frac{{\delta }_{st}(u,v)}{{\delta }_{st}},$$
(5)

where \(\delta_{st}\) is the number of all shortest paths between node s and node t, and \(\delta_{st}(u,v)\) is the number of those shortest paths that pass through the edge e(u, v). The larger the score BC, the more important the edge.
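One way to evaluate Eq. (5) on an unweighted, undirected graph is a Brandes-style accumulation over breadth-first searches, sketched below. This is a swapped-in standard technique, not necessarily the procedure used for the reported results. The function name and the adjacency format `{node: set_of_neighbors}` are assumptions, and the graph is assumed to have no self-loops. Because the sum in Eq. (5) runs over ordered pairs s ≠ t, the sketch does not halve the result, so each unordered pair contributes twice.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of Eq. (5). Result keys are frozenset({u, v})."""
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bc[frozenset((v, w))] += share
                delta[v] += share
    return bc
```

On the three-node path 1-2-3, edge e(1, 2) lies on the shortest paths of the ordered pairs (1, 2), (2, 1), (1, 3) and (3, 1), so its score is 4. The sketch runs one BFS per node, consistent with the O(mn) cost cited in the Discussion.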

Critical edge identification method

Generally, from the perspective of information spreading, the more important the two end nodes of an edge are, the more important the edge is. On the other hand, if many different cliques contain e(u, v), then even if e(u, v) is removed, information can still spread easily from u to v (or from v to u) through the other edges of these cliques. Based on these two points, combined with the betweenness centrality of edges, a new index BCCMOD (Betweenness Centrality and Clique Model)

$$BC{C}_{MOD}(u,v)=\frac{{k}_{u}{k}_{v}\cdot BC(u,v)}{\sum _{i=3}^{n}C{(u,v)}_{i}},$$
(6)

can be defined to measure the importance of an edge e(u, v), where BC(u, v) is the betweenness centrality of edge e(u, v), \(k_u\) and \(k_v\) are the degrees of node u and node v respectively, and \(C(u,v)_i\) is the number of cliques of size i containing edge e(u, v) (in this report, a clique means any fully connected subgraph, not only a maximal one). For example, \(C(u,v)_4 = 3\) means that three cliques of size 4 contain edge e(u, v). In this method, the larger the score, the more important the edge. For example, as shown in Fig. 5(a,c), the degrees of nodes 1 and 2 are 7 and 8 respectively. In Fig. 5(a) (maximum clique size 4), \(C(1,2)_3\) is 5 and \(C(1,2)_4\) is 2. When edge e(1, 2) is removed, many paths from node 1 to node 2 remain, so the effect on spreading is small. However, in Fig. 5(c) (maximum clique size 3), \(C(1,2)_3\) is 1, and removing edge e(1, 2) has a large effect on spreading since only one path (1, 3, 2) from node 1 to node 2 remains. Table 4 shows the effect probability pe of nodes 2, 3, and 9 when node 1 is the original infected source in the SIR spreading model with the full contact process. Taking node 2 as an example, in Fig. 5(a,b) its effect probability is 0.3733 and 0.2240 respectively under μ = 0.2, whereas in Fig. 5(c,d) it is 0.2392 and 0.0380 respectively under μ = 0.2.
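A minimal sketch of Eq. (6) follows. It relies on the observation that a clique of size i containing e(u, v) corresponds to a clique of size i − 2 among the common neighbors of u and v, so the counts can be obtained by enumerating cliques inside the common neighborhood. The function names and the adjacency format are hypothetical. Equation (6) is undefined when no clique of size 3 or more contains the edge; returning infinity in that case is an assumption of this sketch, not something stated in the text.

```python
import math

def clique_counts_for_edge(adj, u, v):
    """Return {i: C(u, v)_i}, the number of cliques of size i >= 3 containing (u, v)."""
    common = list(adj[u] & adj[v])
    counts = {}

    def extend(size, candidates):
        # Each candidate is adjacent to every node already in the current clique.
        for idx, w in enumerate(candidates):
            counts[size + 1] = counts.get(size + 1, 0) + 1
            extend(size + 1, [x for x in candidates[idx + 1:] if x in adj[w]])

    extend(2, common)  # start from the 2-clique {u, v}
    return counts

def bccmod(adj, u, v, bc_uv):
    """Eq. (6): k_u * k_v * BC(u, v) divided by the total clique count of the edge."""
    total = sum(clique_counts_for_edge(adj, u, v).values())
    if total == 0:
        return math.inf  # ASSUMPTION: edge in no triangle is treated as maximally important
    return len(adj[u]) * len(adj[v]) * bc_uv / total
```

With the counts quoted above for Fig. 5(a), degrees 7 and 8 and clique counts 5 and 2, the score is 7 · 8 · BC(1, 2) / 7 = 8 · BC(1, 2), whereas for Fig. 5(c) the single triangle gives 56 · BC(1, 2). The enumeration is exponential in the size of the largest clique, so the sketch is only suitable for small or sparse networks.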

Figure 5 Four toy networks.

Table 4 The ratio of infected cases among 10000 simulations of nodes 2, 3, and 9 with the original infected source being node 1 before and after edge e(1, 2) being removed in the toy network shown in Fig. 5 under different infected probability μ.

The Jaccard coefficient of an edge e(u, v) is defined as

$${J}_{e(u,v)}=\frac{|{{\rm{\Gamma }}}_{u}\cap {{\rm{\Gamma }}}_{v}|}{|{{\rm{\Gamma }}}_{u}\cup {{\rm{\Gamma }}}_{v}|},$$
(7)

where u and v are two related nodes of the edge e(u, v) and \(\Gamma_u\) is the set of u’s neighbors.

The Bridgeness index of an edge e(u, v) is defined as

$${B}_{e(u,v)}=\frac{\sqrt{{S}_{u}{S}_{v}}}{{S}_{e(u,v)}},$$
(8)

where \(S_u\), \(S_v\) and \(S_{e(u,v)}\) are the sizes of the maximum cliques that contain node u, node v and edge e(u, v), respectively.

The Reachability index of edge e(u, v) is defined as

$${R}_{e(u,v)}=\frac{1}{|V|}\sum _{s\in V}|R(s;{G}_{e(u,v)})|,$$
(9)

where |V| is the number of nodes, \(G_{e(u,v)}\) is the subnetwork obtained by removing edge e(u, v) from the original network, and \(|R(s;{G}_{e(u,v)})|\) is the number of nodes reachable from node s in \(G_{e(u,v)}\).
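Two of the baseline indices, the Jaccard coefficient of Eq. (7) and the Reachability index of Eq. (9), can be sketched as follows. The function names and the adjacency format `{node: set_of_neighbors}` are hypothetical. For Eq. (9) the sketch assumes that a node counts as reachable from itself, which the definition leaves open; in an undirected network every node of a component of size c then reaches c nodes, so the sum over s equals the sum of c² over components.

```python
def jaccard(adj, u, v):
    """Eq. (7): shared neighbors of u and v divided by all neighbors of u or v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def reachability(adj, u, v):
    """Eq. (9): mean number of reachable nodes after removing edge (u, v).
    ASSUMPTION: each node is counted as reachable from itself."""
    removed = {u, v}
    seen, total = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if {x, y} == removed:
                    continue  # skip the removed edge in both directions
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        total += size * size  # every node in this component reaches `size` nodes
    return total / len(adj)
```

On the four-node path 1-2-3-4, removing the middle edge leaves two components of size 2 and a Reachability of (4 + 4) / 4 = 2, while removing an end edge gives (1 + 9) / 4 = 2.5.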