Efficient network disintegration under incomplete information: the comic effect of link prediction

The study of network disintegration has attracted much attention because of its wide applications, including suppressing epidemic spreading, destabilizing terrorist networks, preventing financial contagion, controlling rumor diffusion and perturbing cancer networks. The crux of the matter is to find the critical nodes whose removal will lead to network collapse. This paper studies the disintegration of networks with incomplete link information. An effective method is proposed to find the critical nodes with the assistance of link prediction techniques. Extensive experiments on both synthetic and real networks suggest that, by using a link prediction method to recover part of the missing links in advance, the method can largely improve the network disintegration performance. Moreover, to our surprise, we find that when the amount of missing information is relatively small, our method even outperforms the results based on complete information. We refer to this phenomenon as the “comic effect” of link prediction: the network is reshaped through the addition of some links identified by link prediction algorithms, and the reshaped network is like an exaggerated but characteristic comic of the original one, in which the important parts are emphasized.

targeted strategy. Schneider et al. 24 developed an immunization approach based on optimizing the susceptible size, which outperforms the best known strategy based on immunizing the highest-betweenness links or nodes.
In the early works on network disintegration, it was usually assumed that the attacker can obtain perfect information on the network structure; in other words, the observed networks were assumed to be complete. However, complete information on the network structure is not always available in realistic cases, and growing attention has been paid to network disintegration with imperfect information. Dezső et al. 25 proposed a biased treatment strategy against virus spreading based on uncertain information, in which the likelihood of identifying and administering a cure to an infected node depends on its degree k as $k^{\alpha}$. Li et al. 26 studied the optimal attack problem based on incomplete information, meaning that the attacker can obtain information about only part of the nodes, although that information is certain. Moreover, many studies [27][28][29][30] focused on disintegration strategies based on local information, i.e. knowledge of the neighborhood.
Unlike the above studies, which consider either uncertain information or partial information at the individual level, in this paper we focus on another important and frequent scenario of imperfect information, in which part of the links (i.e., interactions between nodes) are missing in the observed network. In many real networks, such as food webs 31 , terrorist networks 32 , sexual contact networks 33 , protein-protein interaction networks 34 , and disease relationship networks 35 , it is easy to obtain information about the nodes, but detecting the relations or interactions between nodes is usually costly or even infeasible. The missing links may reduce the network disintegration performance. To address this problem, a potential approach is to recover the missing links (or part of them), which recalls the so-called "link prediction" problem 36 . Link prediction algorithms aim to estimate the likelihood that a link exists between two nodes based on the observed network structure and the attributes of the nodes. Therefore, before the attack we can use a link prediction algorithm to recover part of the missing links and then identify the targets based on the "improved" network. Experiments on both synthetic and real networks show that, with the assistance of link prediction, the performance of disintegration can be largely improved.

Results
Network disintegration model based on link prediction. A network can be represented by a simple undirected graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of links. Multiple links and self-loops are not allowed. Let $N = |V|$ and $W = |E|$ be the number of nodes and the number of links, respectively. Let $k_i$ be the degree of node $v_i$, which equals the number of links connected to node $v_i$. We assume that all nodes are known but part of the link information is missing. Denote by $E_O$ and $E_M$ the sets of observed links and missing links, respectively. Clearly, we have $E = E_O \cup E_M$ and $E_O \cap E_M = \emptyset$. Therefore, the observed network can be represented by $G_O = (V, E_O)$. We define the ratio $\alpha = |E_M|/W$ as the magnitude of missing link information. Denote by $G_P = (V, E_O \cup E_P)$ the improved network obtained by adding the predicted links $E_P$, drawn from the set of candidate links $\Omega_P$ ($E_P \subseteq \Omega_P$). We define the ratio $\beta = |E_P|/|E_O|$ as the magnitude of additional link information. In general, we have $E_P \neq E_M$ because of prediction errors. Denote by $E_+ = E_P \cap E_M$ the set of links that are correctly predicted. We use the true positive rate (recall or sensitivity) $R_{TPR} = |E_+|/|E_M|$ to measure the proportion of correctly predicted links among the missing links $E_M$, and the positive predictive value (precision) $R_{PPV} = |E_+|/|E_P|$ to measure the proportion of correctly predicted links among the predicted links $E_P$. To illustrate the mathematical description of link prediction intuitively, we give the iceberg diagram for the link prediction problem in Fig. 1. In a manner of speaking, the network is like an iceberg: we can only see the part above sea level and do not know the rest under the sea. Link prediction is a technique to infer the invisible part based on knowledge of the observed part.
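As an illustration of the quantities just defined, the following minimal Python sketch computes β, the recall $R_{TPR}$ and the precision $R_{PPV}$ from three link lists. The function names `link_key` and `prediction_quality` and the toy links are hypothetical and introduced only for this example; links are treated as unordered node pairs.

```python
def link_key(u, v):
    """Canonical form of an undirected link, so (u, v) and (v, u) compare equal."""
    return (u, v) if u <= v else (v, u)


def ratio(numerator, denominator):
    """Safe division: returns 0.0 when the denominator is zero (illustrative choice)."""
    return numerator / denominator if denominator else 0.0


def prediction_quality(observed, missing, predicted):
    """Return (beta, R_TPR, R_PPV) for the given observed, missing and predicted links."""
    E_O = {link_key(u, v) for u, v in observed}
    E_M = {link_key(u, v) for u, v in missing}
    E_P = {link_key(u, v) for u, v in predicted}
    E_plus = E_P & E_M                      # correctly predicted links
    beta = ratio(len(E_P), len(E_O))        # magnitude of additional link information
    recall = ratio(len(E_plus), len(E_M))   # R_TPR = |E+| / |E_M|
    precision = ratio(len(E_plus), len(E_P))  # R_PPV = |E+| / |E_P|
    return beta, recall, precision


# Toy example: four observed links, two missing links, two predicted links,
# one of which (1-3) is actually a missing link.
observed = [(1, 2), (2, 3), (3, 4), (4, 5)]
missing = [(1, 3), (2, 4)]
predicted = [(3, 1), (1, 4)]
print(prediction_quality(observed, missing, predicted))
```

In this toy case one of the two predicted links is correct, so both recall and precision are one half.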
We identify the targets based on the improved network $G_P$ and then carry out the attack in the original complete network $G$. Note that if a node is attacked, its attached links are removed together with it. Denote by $\hat{V} \subseteq V$ the set of nodes that are attacked (i.e., targets) and by $\hat{E} \subseteq E$ the set of removed links; then the network obtained after node attacks is $\hat{G} = (V - \hat{V}, E - \hat{E})$. We define $f = |\hat{V}|/N \in [0, 1]$ as the strength coefficient of node attacks. Among the many attack strategies 28 we apply the most widely used "high degree strategy" in this paper. In this strategy, nodes are attacked according to their rank of degree, i.e., high-degree nodes are attacked first. Let $k_i^O$ be the degree of node $v_i$ in $G_O$ and $k_i^P$ be the degree of node $v_i$ in $G_P$. Without link prediction, we remove nodes in descending order of the degree $k_i^O$. With link prediction, we remove nodes in descending order of the degree $k_i^P$. As the attack strength coefficient $f$ increases, the network eventually collapses at a critical value $f_c$, which is generally used to measure the structural robustness of a complex network from the view of defenders: the larger $f_c$ is, the more robust the network is. Here we employ $f_c$ to evaluate the performance of a network disintegration strategy from the view of attackers. A smaller $f_c$ implies more efficient network disintegration.
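The procedure above can be sketched in a few lines of Python: nodes are ranked by their degree in a "ranking" network (either $G_O$ or $G_P$) and removed one by one from the true network $G$ until the collapse criterion κ = ⟨k²⟩/⟨k⟩ < 2 given in the Methods is met. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the ranking is computed once (degrees are not recalculated after each removal), ties are broken by input order, and a single realization is used instead of an ensemble average.

```python
def degrees(nodes, links):
    """Degree of every node in the network defined by `links`."""
    deg = {v: 0 for v in nodes}
    for u, v in links:
        deg[u] += 1
        deg[v] += 1
    return deg


def kappa(adj):
    """kappa = <k^2>/<k> over the surviving nodes; 0.0 if no links remain."""
    ks = [len(neighbors) for neighbors in adj.values()]
    total = sum(ks)
    return sum(k * k for k in ks) / total if total else 0.0


def critical_fraction(nodes, true_links, ranking_links):
    """Fraction of nodes removed from the true network when kappa first drops below 2.

    Nodes are removed in descending order of their degree in the ranking network.
    """
    nodes = list(nodes)
    rank_deg = degrees(nodes, ranking_links)
    order = sorted(nodes, key=lambda v: rank_deg[v], reverse=True)  # stable for ties
    adj = {v: set() for v in nodes}
    for u, v in true_links:
        adj[u].add(v)
        adj[v].add(u)
    if kappa(adj) < 2:
        return 0.0
    for t, v in enumerate(order, start=1):
        for w in adj.pop(v):          # remove the node and its attached links
            adj[w].discard(v)
        if kappa(adj) < 2:
            return t / len(nodes)
    return 1.0


# Toy example: node 0 is the hub of the true network, but the observed
# links make node 1 look like the highest-degree node.
nodes = [0, 1, 2, 3, 4, 5]
true_links = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
observed_links = [(1, 2), (1, 0)]
print(critical_fraction(nodes, true_links, true_links))      # ranking from complete information
print(critical_fraction(nodes, true_links, observed_links))  # ranking from the observed network
```

In the toy example the ranking based on the incomplete network attacks node 1 before the true hub, so twice as many removals are needed to reach κ < 2.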
[…], respectively. After the addition of three predicted links, their degrees in the improved network $G_P$ (see Fig. 2 […] will be removed preferentially as shown in Fig. 2(c), and the network $\hat{G}$ obtained after removing node $v_B$ is still connected. In contrast, based on the improved network, node $v_E$, which has the largest degree $k_E^P = 4$, will be removed preferentially as shown in Fig. 2(e), and the network $\hat{G}$ obtained after removing node $v_E$ is disintegrated into two components.

Comic effect of link prediction.
To analyze the impact of link prediction on network disintegration, we first perform experiments on synthetic networks. Because scale-free networks with a power-law degree distribution $p(k) \sim k^{-\lambda}$ are ubiquitous in the real world, our study first focuses on network disintegration in scale-free networks. Random scale-free networks with degree distribution $p(k) = (\lambda - 1) m^{\lambda - 1} k^{-\lambda}$ are generated using the method proposed in ref. 38. In Fig. 3, we report the dependence of the critical attack strength coefficient $f_c$ on the magnitude of link prediction information β. We use the resource allocation (RA) link prediction algorithm 37 to predict the missing links. For comparison, we also show the case of complete link information, i.e. α = 0, which is usually considered the ideal case.
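As a brief check on the degree distribution quoted above, the following worked equations treat k as a continuous variable with lower cutoff m. Both the continuous approximation and the reading of m as the minimum degree are assumptions made here for illustration; the generation method of ref. 38 may differ in detail.

```latex
% Normalization (requires \lambda > 1):
\int_{m}^{\infty} (\lambda-1)\, m^{\lambda-1} k^{-\lambda}\, dk
  = (\lambda-1)\, m^{\lambda-1} \cdot \frac{m^{1-\lambda}}{\lambda-1} = 1 .

% Cumulative distribution and inverse-transform sampling with u uniform on [0,1):
F(k) = 1 - \left(\frac{m}{k}\right)^{\lambda-1},
\qquad
k = m\,(1-u)^{-1/(\lambda-1)} .

% Mean degree (requires \lambda > 2):
\langle k \rangle = \int_{m}^{\infty} k\, p(k)\, dk = \frac{\lambda-1}{\lambda-2}\, m .
```

Under these assumptions the prefactor $(\lambda - 1) m^{\lambda - 1}$ is exactly the constant that normalizes the power law on $[m, \infty)$.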
From Fig. 3, we can see that as the number of missing links increases, the $f_c$ curve shifts gradually to the top-left. For α = 0.1, α = 0.3 and α = 0.5, $f_c$ first decreases with β and then increases once β > β*. We call the region $[0, \beta^*]$ the "valid prediction area" (VPA) and the region $(\beta^*, \beta_{\max})$ the "excessive prediction area" (EPA), where the inclusion of any additional predicted links has a negative effect on the performance of network disintegration. To our surprise, we find an area in which the performance of our method is even better than the "ideal case", where the critical attack strength coefficient is $f_c^0$. We call this area the "surpassing prediction area" (SPA), see Fig. 3(a). Figure 4(a) shows the performance of network disintegration under the optimal magnitude of link prediction information (i.e., $f_c^*$), along with the performance of network disintegration without link prediction (i.e., $f_c$ at β = 0). The difference between $f_c^*$ and the value of $f_c$ at β = 0 indicates the contribution of the additional links predicted by the link prediction algorithm. We find that when α < 0.24, $f_c^*$ is lower than $f_c^0$, which corresponds to the SPA. This can be explained as follows: link prediction amplifies the heterogeneity of node importance and reshapes the network structure, like drawing an exaggerated and characteristic comic. We refer to this phenomenon as the "comic effect" of link prediction. The values of $f_c^*$ and of $f_c$ at β = 0 meet at α = 0.6, implying that in some cases we can reconstruct the original network to improve the performance of network disintegration even when about 60% of the links are missing.
It is worth pointing out that when α is large enough, as in Fig. 3(d) where α = 0.7, there is no "valid prediction area" and β* = 0. This suggests that link prediction is counterproductive for network disintegration if too many links are missing. The reason is that link prediction accuracy is usually very low when the prediction is based on an observed network with many missing links 39 . These results show that when the link information is incomplete, a proper number of additional links can efficiently improve the performance of network disintegration and can even yield better performance (i.e., lower $f_c$) than the case with complete information. It is true that the links added by link prediction may connect the wrong nodes, so we may not recover the original network completely. However, through link prediction we partly recover the ranking of node importance, which is critical in network disintegration. We also show in Fig. 4(b) the optimal magnitude of link prediction information β* as a function of the magnitude of missing link information α. We find that β* decreases monotonically with α and eventually reaches zero at about α = 0.6, which suggests that the fewer links are missing, the more predicted links (usually with high accuracy) should be added to obtain the best effect. Conversely, if more links are missing, fewer predicted links should be added, because adding more links leads to more mistakes owing to the low accuracy of link prediction. The dependence of the critical attack strength coefficient $f_c$ on the parameters α and β is shown in Fig. 5, where the VPA, EPA and SPA can be clearly partitioned.
Figure 5. Critical attack strength coefficient $f_c$ in the (α, β) plane. The original network is the same as the one used in Fig. 3. The red dashed line marks the optimal magnitude of link prediction information β*. The regions to the left and right of the red dashed line correspond to the valid prediction area (VPA) and the excessive prediction area (EPA), respectively. The area under the green dash-dot line is the surpassing prediction area (SPA). The results are averaged over 100 independent realizations of link prediction.
The measure $f_c$ is the critical fraction of nodes at which the network completely collapses. However, sometimes we are also interested in the case in which the network suffers severe damage without completely collapsing. Figure 6 reports the fraction of nodes in the giant component after node attacks, S, as a function of the attack strength coefficient f for various magnitudes of missing link information α. Here we set β = β* for the corresponding α, namely β = 0.85 for α = 0.1, β = 0.55 for α = 0.3, β = 0.1 for α = 0.5 and β = 0 for α = 0.7. The effect of network disintegration can be characterized by the area under the curve of S: the smaller the area, the more efficient the network disintegration. Therefore, the area between the curve of S with link prediction (dotted lines) and without link prediction (solid lines) demonstrates the improvement in network disintegration performance achieved with the assistance of link prediction. The improvement of our method is significant for small α, and the "comic effect" of link prediction appears in the case of α = 0.1, see Fig. 6(a).
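The area-under-the-curve comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: S is normalized by the original number of nodes N, the curve is sampled after every single node removal, and the area is approximated with the trapezoidal rule. The function names and the toy path network are hypothetical.

```python
from collections import deque


def giant_fraction(nodes, links, removed):
    """Size of the largest connected component after removing `removed`, divided by N."""
    alive = set(nodes) - set(removed)
    adj = {v: [] for v in alive}
    for u, v in links:
        if u in alive and v in alive:
            adj[u].append(v)
            adj[v].append(u)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                      # breadth-first search over one component
            x = queue.popleft()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        best = max(best, size)
    return best / len(nodes)


def s_curve(nodes, links, order):
    """S after removing the first t nodes of `order`, for t = 0, 1, ..., N."""
    return [giant_fraction(nodes, links, order[:t]) for t in range(len(order) + 1)]


def area_under(curve):
    """Trapezoidal area under S(f), with f sampled in steps of 1/N."""
    steps = len(curve) - 1
    return sum((curve[i] + curve[i + 1]) / 2 for i in range(steps)) / steps


# Toy example: a path 0-1-2-3-4 attacked in two different orders.
nodes = [0, 1, 2, 3, 4]
links = [(0, 1), (1, 2), (2, 3), (3, 4)]
middle_first = s_curve(nodes, links, [2, 0, 1, 3, 4])
end_first = s_curve(nodes, links, [0, 1, 2, 3, 4])
print(area_under(middle_first), area_under(end_first))
```

Removing the middle node first splits the path immediately, so its S curve encloses a smaller area than the end-first order, which is the sense in which a smaller area indicates more efficient disintegration.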
Experiments on real networks. The study of disintegration is important for many real-world systems, such as rumor spreading in online social networks, disease transmission through airlines, and food webs. To evaluate the performance of our method, we investigate four real-world networks: (i) the political blogosphere network (PB) 40 , (ii) the network of the US air transportation system (USAir) (http://toreopsahl.com/datasets/#usairports), (iii) the food web of south Florida during the wet season (Foodweb) 41 and (iv) the collaboration network between jazz musicians (Jazz) 42 . Basic statistics of these networks are shown in Table 1. As we can see, all networks are well connected, with high clustering coefficients and short average distances.
We simulate the prediction and disintegration process on these networks, and the results are shown in Fig. 7. All four networks exhibit a pattern similar to that of the synthetic networks: the critical attack strength coefficient $f_c$ decreases at first as the ratio of additional links increases, and beyond an optimal ratio the performance of disintegration degrades as more links are added. It is interesting to observe that all four networks have a large "surpassing prediction area", where $f_c$ decreases even below the value obtained under complete information.

Discussion
Network disintegration with incomplete link information is an important and challenging problem. In this paper, we introduced link prediction as a strategy for attackers to improve the performance of network disintegration. We showed that although missing link information harms the effect of network disintegration, link prediction can help to improve the performance remarkably. We found, with surprise, that if the magnitude of missing link information is not too large, the effect of network disintegration with the assistance of link prediction can be even better than in the case of complete link information. We called this phenomenon the "comic effect" of link prediction. Although link prediction does not recover the missing information completely, it reshapes the network like an exaggerated but characteristic comic. As a result, the importance of the key nodes is emphasized by adding a number of predicted links. We believe that the comic effect of link prediction may exist in many contexts, not only in network disintegration. For example, link prediction can not only help to improve the classification accuracy of partially labeled networks 43 but also be used in recommender systems 44 . These useful applications demonstrate that hidden information revealed by link prediction can help to improve the accuracy of information filtering algorithms. Moreover, we exposed the area of excessive prediction, where the addition of more predicted links makes a negative contribution. An optimal magnitude of link prediction information is obtained when the critical attack strength coefficient reaches its minimum. Beyond this optimal magnitude, the contribution of link prediction to network disintegration decreases and can even be negative.
Figure 6. … (dash dot lines), attacks without link prediction (dot lines) and attacks with optimal link prediction information (solid lines). The filled area demonstrates the improvement of the effect of network disintegration due to link prediction. The original network is the same network as in Fig. 3. For different α, we set β = β* as shown in Fig. 3. The results are averaged over 100 independent realizations of link prediction.
Table 1. Basic statistics of four real networks. N and W are the number of nodes and links. ⟨k⟩ is the average degree; C is the clustering coefficient; r is the assortativity; ⟨l⟩ is the average shortest distance.
In addition, we found that the optimal magnitude of link prediction information decreases as the amount of missing link information increases, indicating that new links should be added very cautiously when many links are missing. For real applications, how to obtain the optimal magnitude of link prediction information is still an open and challenging problem, as we usually do not know the proportion of missing links and it is thus difficult to evaluate the algorithm's performance. According to the results in this paper, adding a small number of predicted links is usually beneficial when the number of missing links is moderate. Future studies are required to evaluate the choice of appropriate link prediction algorithms to achieve better network disintegration performance 45 .

Methods
Algorithms for link prediction. The link prediction problem has been a long-standing challenge in the modern information era. Its main goal is to estimate the likelihood of existence of nonobserved links based on the known topology and node attributes. The simplest link prediction index is the common neighbors (CN) index, which follows the common-sense idea that two nodes, x and y, are more likely to have a link if they have many common neighbors 46 :
$$s_{xy}^{\mathrm{CN}} = |\Gamma(x) \cap \Gamma(y)|,$$
where $\Gamma(t)$ denotes the set of neighbors of node t. The Resource Allocation (RA) index 37 is an improved index based on CN, which assigns less-connected neighbors more weight. The index is motivated by resource allocation dynamics on networks. Consider a pair of nodes, x and y, which are not directly connected. Node x can send some resource to y, with their common neighbors acting as transmitters. The similarity between x and y can be defined as the amount of resource y receives from x. The mathematical expression is
$$s_{xy}^{\mathrm{RA}} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}.$$
Performance measurement of network disintegration. In the context of complex networks, the critical removal fraction of nodes $f_c$ for the disintegration of networks is generally used to characterize network robustness from the view of defenders. The larger $f_c$ is, the more robust the network is. This measure emerged from random graph theory and was stimulated by Albert et al. 4 . Instead of a strict extreme property, it considers statistically how the removal of nodes leads to a deterioration of network performance, and eventually to the collapse of the network at a given critical removal fraction $f_c$. The most common performance measurements include the diameter, the size of the largest component and the average path length. We choose $\kappa \equiv \langle k^2 \rangle / \langle k \rangle < 2$ as the criterion for the collapse of networks 47,48 , where the angular brackets $\langle \cdot \rangle$ denote an ensemble average. After each node is removed, we calculate κ. When κ becomes less than 2, we record the number of nodes t removed up to that point. The threshold is calculated as $f_c = \langle t \rangle / N$. Here we employ $f_c$ to measure the effect of a network disintegration strategy from the view of attackers. A smaller $f_c$ implies more efficient network disintegration.
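Returning to the link prediction indices introduced at the start of the Methods, the following minimal Python sketch scores every unconnected node pair with the RA index (the sum of 1/k_z over the common neighbors z of the pair) and keeps the top-ranked pairs as predicted links. The function names, the tie-breaking rule, the rounding of β|E_O| to an integer, and the choice to skip pairs with no common neighbors are illustrative assumptions rather than details taken from the paper.

```python
from itertools import combinations


def build_adjacency(nodes, links):
    """Neighbor sets Γ(t) of the observed network."""
    adj = {v: set() for v in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def ra_scores(nodes, observed_links):
    """RA score for every unconnected pair that has at least one common neighbor."""
    adj = build_adjacency(nodes, observed_links)
    scores = {}
    for x, y in combinations(sorted(nodes), 2):
        if y in adj[x]:
            continue                      # already linked in the observed network
        common = adj[x] & adj[y]          # the CN index would be len(common)
        if common:
            scores[(x, y)] = sum(1.0 / len(adj[z]) for z in common)
    return scores


def predict_links(nodes, observed_links, beta):
    """Return the round(beta * |E_O|) highest-scoring candidate links."""
    scores = ra_scores(nodes, observed_links)
    n_add = int(round(beta * len(observed_links)))
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))  # ties: smaller pair first
    return ranked[:n_add]


# Toy example with five observed links and beta = 0.4 (two links to add).
nodes = [0, 1, 2, 3, 4]
observed = [(0, 2), (1, 2), (0, 3), (1, 3), (3, 4)]
print(ra_scores(nodes, observed))
print(predict_links(nodes, observed, beta=0.4))
```

Because pairs without common neighbors receive no score here, fewer than β|E_O| links may be returned on very sparse observed networks; a fuller implementation would need a rule for that case.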