Link Prediction in Evolving Networks Based on Popularity of Nodes

Link prediction aims to uncover the underlying relationships behind networks, and can be used to predict missing edges or identify spurious ones. The key issue in link prediction is estimating the likelihood of potential links. Most classical methods based on static structure ignore the temporal aspects of networks; because they cannot capture time-varying features, they perform poorly in evolving networks. In this paper, we propose the hypothesis that the ability of each node to attract links depends not only on its structural importance but also on its current popularity (activeness), since active nodes are more likely to attract future links. We then propose a novel approach, the popularity based structural perturbation method (PBSPM), together with a fast algorithm, to characterize the likelihood of an edge from both the existing connectivity structure and the current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. In addition, visual results and statistical analysis reveal that the proposed methods tend to predict future edges between active nodes rather than between inactive nodes.

methods, global methods and quasi-global methods. Local similarity is mainly based on common neighbors: examples are the well-known Common Neighbor (CN) index, which counts the number of common neighbors [21], and the Adamic-Adar (AA) and Resource Allocation (RA) indices, which suppress the contribution of large-degree neighbors [22,23]. For large networks, Cui et al. proposed a fast algorithm for calculating the number of common neighbors [24]. Global similarity emphasizes the global topology of the network; an example is the Katz index, which counts all of the paths between two nodes [25]. Quasi-global similarity is a good trade-off between local and global similarity methods; examples are the Local Path (LP) index, which considers only the short paths in the Katz index [23], and the Local Random Walk (LRW) index, which focuses on limited random walks in a local area [26]. Beyond these, algorithms based on maximum likelihood methods and other elaborate models have been proposed. Clauset et al. proposed a Hierarchical Structure Model that uses a dendrogram and performs well on hierarchical networks [27]. Lü et al. proposed a Structural Perturbation Method that approximates the observed network by repeated random perturbations and outperforms state-of-the-art methods in accuracy and robustness [28]. From the perspective of information theory, Xu et al. proposed the Path Entropy index, which considers the information entropy of shortest paths and penalizes long paths [29]. Tan et al. proposed a Mutual Information (MI) method with high accuracy and reasonable computation time; it uses the common neighbors of a node pair and defines the likelihood of a link as the conditional self-information of that link existing between the pair, given their common neighbors [30]. Zhu et al. generalized the MI index into Neighbor Set Information, which applies to multiple structural features and improves accuracy [31].
Real networks are highly dynamic, with nodes and edges continually coming and going [32]. However, the aforementioned algorithms all ignore the temporal aspects of real networks, in particular the trend of nodes: a node that was active yesterday and contacted numerous neighbors may be unpopular today. Inspired by this, we propose the hypothesis that the emergence of future links is determined not only by the existing network structure but also by the popularity of the endpoints. Fig. 1 illustrates the effect of popularity. The red node will enter the network and connect with one of the existing nodes. In Fig. 1(a), static analysis suggests that node 10 prefers to connect with the large-degree node 1. When the birth time of each edge is given, as in Fig. 1(b), it is clear that node 3 is highly popular, because it is the only node attracting edges at the present time $t_2$. In practice, the fresh edge is more likely to form between node 10 and the active node 3 in the next period $t_3$. To account for this scenario, and unlike previous works that predict potential links mostly from static networks, we propose a popularity based structural perturbation method (PBSPM) and its fast algorithm, which integrate the popularity of nodes with the observed network topology to predict future edges. Experimental results on real-world networks show that the proposed methods outperform traditional approaches in accuracy and robustness.

Results
Popularity metrics. The definition of popularity is related to the temporal trend of nodes, which can be obtained from statistics and analysis of the relevant historical information. Of two nodes with the same degree, one may have connected with its neighbors at an early stage and formed no connections since, while the other developed most of its connections at a late stage. Intuitively, the latter node is more likely to attract fresh edges in the near future. Given this, a straightforward way to evaluate the popularity of a node is to count the edges it has recently attracted.
Given an undirected and unweighted network $G(V, E)$, where $V$ and $E$ are the sets of nodes and links, respectively, each link carries a time-stamp that records when it entered the network. In this work, multi-links and self-loops are not allowed. Let $k_i(t)$ denote the degree of node $i$ at time $t$. In the next time span $T$, node $i$ attracts $\Delta k_i(t, T)$ new edges,

$$\Delta k_i(t, T) = k_i(t + T) - k_i(t). \tag{1}$$

Note that $\Delta k_i(t, T)$ in Eq. (1), which is determined by both $t$ and $T$, cannot reflect the relative popularity of node $i$: even when large-degree nodes become inactive, they still attract more fresh edges than small-degree nodes because of the preferential attachment mechanism. To solve this issue, for a dataset spanning from $t_a$ to $t_c$, we divide its edges into a fresh set and an old set according to a boundary $t_b \in (t_a, t_c)$. An edge constructed in $(t_a, t_b)$ belongs to the old set; otherwise it belongs to the fresh set. The fractions of old edges and fresh edges are denoted $p_{older}$ and $p_{fresher}$. $p_{fresher}$ can be understood as the observation length of historical information. The popularity of node $i$ is then

$$s_i = \frac{k_{i,fresher}}{k_{i,all}}, \tag{2}$$

where $k_{i,all}$ and $k_{i,fresher}$ are the whole degree and the fresher degree of node $i$. Equation (2) overcomes the drawbacks of simply counting the new edges and quantifies popularity in a normalized range. Clearly, if all links of node $i$ lie in the fresh set, $s_i = 1$. Conversely, if all links of node $i$ lie in the old set, node $i$ is dormant and $s_i = 0$. Therefore $s_i \in [0, 1]$, and a higher $s_i$ means a higher popularity.

Figure 1. Illustration of the popularity. The fresh link and node 10 will be added into the existing network at the next time $t_3$. In panel (a), the attractiveness of nodes is determined by static features. According to preferential attachment, node 10 prefers to connect with node 1 because it has the largest degree. In panel (b), temporal effects are considered. The currently popular node 3 may become attractive and connect with node 10 at time $t_3$.
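The popularity metric can be illustrated with a minimal Python sketch. It assumes that Eq. (2) takes the ratio form $s_i = k_{i,fresher} / k_{i,all}$, which matches the boundary cases stated above, and that an edge born at or after $t_b$ counts as fresh. The function name `popularity` and the toy edge list are hypothetical and are not taken from the datasets.

```python
from collections import defaultdict


def popularity(edges, t_b):
    """Return {node: s_i} for time-stamped edges (u, v, t).

    Assumption: s_i = k_fresher / k_all, with edges at t >= t_b treated as fresh.
    """
    k_all = defaultdict(int)
    k_fresh = defaultdict(int)
    for u, v, t in edges:
        for node in (u, v):
            k_all[node] += 1
            if t >= t_b:
                k_fresh[node] += 1
    return {node: k_fresh[node] / k_all[node] for node in k_all}


# Hypothetical toy network: node 1 is dormant, node 4 is entirely fresh.
toy_edges = [(1, 2, 0), (1, 3, 1), (2, 3, 5), (3, 4, 6)]
s = popularity(toy_edges, t_b=4)
```

In this toy case node 1 has only old edges and node 4 only fresh ones, so they sit at the two ends of the range $[0, 1]$.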
Popularity based structural perturbation method. In this section, we propose the hypothesis that the observed network is determined by latent attractors (e.g. similar hobbies, ages, gender, location) that independently influence its structural properties. For an attractor $x_k = (x_{k,1}, x_{k,2}, \ldots, x_{k,n})^T$, $x_{k,i}$ represents the attractiveness of node $i$ for the latent attractor $x_k$. Inspired by the configuration model, the probability $p_{ij}$ that an edge exists between two nodes $i$ and $j$ is proportional to $x_{k,i} x_{k,j}$. Supposing that there are $m$ kinds of attractors, $p_{ij}$ is defined as the weighted influence of each attractor,

$$p_{ij} = \sum_{k=1}^{m} w_k x_{k,i} x_{k,j}, \tag{3}$$

where $w_k$ is a tunable parameter that balances the relative influence of each attractor $x_k$. The problem is to find the optimal $w_k$ and $x_{k,i}$ that make $p_{ij}$ approximate $a_{ij}$ as closely as possible. Considering a network $G$ with adjacency matrix $A = (a_{ij})_{n \times n}$, a special case is $p_{ij} = 1$ if $a_{ij} = 1$ and $p_{ij} = 0$ otherwise. For the optimal $w_k$ and $x_k$,

$$A = \sum_{k=1}^{n} w_k x_k x_k^T, \tag{4}$$

where $n$ is the size of the network. Eq. (4) can then be understood as a matrix decomposition, with $w_k$ and $x_k$ representing eigenvalues and eigenvectors, respectively.

In practice, many random connections exist in networks. Lü et al. proposed the structural perturbation method (SPM), which reduces the influence of this randomness [28]. In SPM, a small fraction $p^H$ of edges, $\Delta A$, is removed from the network, and the adjacency matrix $A^R$ of the remaining network is decomposed into

$$A^R = \sum_{k=1}^{n} \lambda_k x_k x_k^T, \tag{5}$$

where $\lambda_k$ and $x_k$ are the eigenvalues and the corresponding orthonormal eigenvectors of $A^R$. Treating $\Delta A$ as a perturbation of $A^R$, the perturbed matrix is

$$\tilde{A} = \sum_{k=1}^{n} (\lambda_k + \Delta\lambda_k) x_k x_k^T, \tag{6}$$

where $\Delta\lambda_k \approx x_k^T \Delta A x_k / (x_k^T x_k)$ is the coupling influence of $x_k$ on $\lambda_k$. $\tilde{A}$ is in fact a special case of the probability matrix of Eq. (3), in which $(\lambda_k + \Delta\lambda_k)$ and the elements of eigenvector $x_k$ represent the weight difference and the attractiveness for attractor $x_k$, respectively. As argued above, the ability of node $i$ to attract new edges is determined by both the latent attractors and its current popularity. To better reflect practice, an advanced attractiveness $x'_{k,i}$ is defined by Eq. (7), which combines $x_{k,i}$ with the popularity $s_i$ through a parameter $\alpha$ that indicates the degree of temporal popularity. Equation (7), a combination of the static attractiveness and popularity, captures both the static features and the temporal information of the evolving pattern. Substituting $x_k$ with $x'_k$ in Eq. (6) to predict future links gives

$$\tilde{A}' = \sum_{k=1}^{n} (\lambda_k + \Delta\lambda_k) x'_k {x'_k}^T. \tag{8}$$

Keeping only $m$ attractors, we substitute $w_k$ and $x_k$ in Eq. (4) with $\lambda_k$ and $x_k$ in Eq. (5). Following the same transition as from Eq. (5) to Eq. (8), we obtain

$$\tilde{A}' \approx \sum_{k=1}^{m} (\lambda_k + \Delta\lambda_k) x'_k {x'_k}^T. \tag{9}$$

Experiments on real networks. The proposed method PBSPM, which integrates the attractiveness $x_{k,i}$ and the popularity $s_i$, reduces to the original SPM when $\alpha = 0$. As $\alpha$ increases, PBSPM increasingly prefers to predict links between popular nodes. Figure 2 compares the performance of PBSPM with that of SPM ($\alpha = 0$) under different $p_{fresher}$. The precision tends to stabilize or reach its best value when $\alpha$ balances the static attractiveness and the popularity. Clearly, the optimal value of $\alpha$ varies between networks. For Hypertext, Infec and UcSoci, future links are highly likely to form between active nodes. For the Haggle dataset, however, the temporal trend of nodes is less obvious; hence its precision curve peaks at $\alpha = 2$, in contrast to the other three networks, whose curves eventually stabilize as $\alpha$ increases. Overall, when $\alpha \in [3, 5]$, PBSPM improves on SPM in all four networks. Moreover, for the different lengths of historical information $p_{fresher}$, all the curves show some level of superiority in precision, suggesting that $p_{fresher}$ has a general and robust range. In fact, it is difficult to choose the optimal value, which should balance the length of historical information against that of future information (the probe set). For the 10% probe set in this experiment, $p_{fresher} = 0.1$ is the balanced option, because the corresponding curves all show large improvements.

Reducing the number of eigenvectors reduces the computational complexity. To address the high computational complexity, we propose the fast PBSPM, which uses only a few eigenvectors with large eigenvalues; these reflect the backbone structure of networks well [33]. In practical networks, a huge gap exists in the eigenvalue space: eigenvectors with large eigenvalues play more important roles than those with small eigenvalues.
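The perturbation step of SPM, together with the truncation to the leading eigenpairs used by the fast variant, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' implementation: it uses the first-order correction $\Delta\lambda_k \approx x_k^T \Delta A x_k / (x_k^T x_k)$, ignores any special handling of degenerate eigenvalues, performs a single perturbation instead of averaging over ten, and omits the popularity reweighting of Eq. (7). The function name `spm_scores` and its arguments are hypothetical.

```python
import numpy as np


def spm_scores(A, p_H=0.1, m=None, rng=None):
    """One structural perturbation of a symmetric 0/1 adjacency matrix A.

    Returns the perturbed matrix built from all eigenpairs (m=None) or from
    the m eigenpairs with the largest eigenvalues.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    edge_idx = np.flatnonzero(A[iu, ju])           # positions of existing edges
    n_remove = max(1, int(round(p_H * edge_idx.size)))
    removed = rng.choice(edge_idx, size=n_remove, replace=False)

    dA = np.zeros(A.shape)                         # removed edges (perturbation set)
    dA[iu[removed], ju[removed]] = 1.0
    dA = dA + dA.T
    A_R = A - dA                                   # remaining network

    lam, X = np.linalg.eigh(A_R)                   # columns of X are orthonormal
    # First-order eigenvalue shift x_k^T dA x_k (the denominator x_k^T x_k is 1).
    dlam = np.einsum("ik,ij,jk->k", X, dA, X)

    order = np.argsort(lam)[::-1]                  # largest eigenvalues first
    if m is not None:
        order = order[:m]
    Xm = X[:, order]
    return (Xm * (lam[order] + dlam[order])) @ Xm.T


# Hypothetical 6-node network used only to exercise the function.
A_toy = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
    A_toy[i, j] = A_toy[j, i] = 1.0
S_full = spm_scores(A_toy, p_H=0.2, rng=0)
S_top1 = spm_scores(A_toy, p_H=0.2, m=1, rng=0)
```

With `m=None` the full decomposition is used, as in SPM; a small `m` keeps only the leading eigenpairs, which is the idea behind the fast variant. A dense `eigh` call is used here for brevity, so this sketch does not itself achieve the reduced complexity.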
Taking Hypertext as an example, Fig. 3(a) plots the precision for various $m$ in Eq. (9). Compared with SPM, the curve shows significant improvements and peaks at $m = 1$, confirming the effectiveness of Eq. (9). The distinct $g_1$ indicates a huge gap between $\lambda_1$ and $\lambda_2$, while the other gaps ($m \geq 2$) are all close to 0, suggesting that the huge gap $g_1$ causes the decline of precision when $m > 1$. We therefore choose $m = 1$ as the optimal value for Hypertext. Analogously, the values for Haggle, Infec and UcSoci are set to $m = 2$, 19 and 2, respectively, after which $g_m$ approaches 0. Consequently, calculating the top-$m$ eigenvalues and corresponding eigenvectors requires only $O(n^2)$ time, and reconstructing the similarity matrix (Eq. 9) needs $O(m \times n^2)$ time. To reduce the randomness, the fast PBSPM repeats the random perturbation ten times and obtains the averaged similarity matrix in $O(10 \times (mn^2 + n^2))$ time. Hence, with $m \ll n$ and increasing size $n$, the time complexity of fast PBSPM is $O(n^2)$, in contrast with the $O(n^3)$ of PBSPM and SPM, where the decomposition and reconstruction consume $O(n^3)$ time. For comparison, the time complexity is $O(n^2)$ for local similarity based methods such as CN, RA and AA, and $O(n^3)$ for Katz and SRW. Table 1 and Table 2 list the precision values and computation time of the different link prediction algorithms. The proposed methods achieve remarkable improvements: at most 84.84% for Hypertext, 28.42% for Haggle, 6.19% for Infec and 95.97% for UcSoci. Nevertheless, PBSPM suffers from a huge computational cost that limits its wider application. Fast PBSPM, a good trade-off between computational complexity and accuracy, has a reasonable computational cost and high accuracy. Because of the repeated steps in the experimental procedure, the fast algorithm still consumes more time than some traditional predictors with the same time complexity.
Additionally, the attractors ignored by the fast algorithm contain secondary information that may either improve the accuracy as useful information or degrade the performance as network noise; hence its precision fluctuates slightly around that of PBSPM. In general, the proposed methods are highly robust, performing well on disparate networks, while other baselines give poor predictions for some networks. Apart from the precision improvements, we also quantify the difference between the ages of the links selected by the various methods, which can be understood as the average popularity of their endpoints,

$$s^* = \frac{1}{2|E^P|} \sum_{e_{ij}} (s_i + s_j),$$

where the sum runs over the edges $e_{ij}$ selected by a given predictor. According to Table 3, links selected by the proposed methods are much older than the others; that is, potential links prefer to form between active nodes in the near future.
In the following, we focus on the performance of SPM and PBSPM to explore the underlying reasons for the improvements. To examine the effect of popularity, four typical nodes from the training set of Hypertext are chosen: the large-degree nodes 1 and 3 ($k_{1,training} = 78$, $s_1 = 0.051$; $k_{3,training} = 93$, $s_3 = 0.032$) and the active nodes 91 and 113 ($k_{91,training} = 29$, $s_{91} = 0.289$; $k_{113,training} = 14$, $s_{113} = 1$). We analyse their predicted connections and the corresponding variation of attractiveness. Figure 4 plots the predicted future links attached to the selected nodes by SPM and PBSPM when $p_{fresher} = 0.05$ and $\alpha = 9$. The principal eigenvector $x_1$ of $A^R$ and the advanced $x'_1$ in the optimal case are then calculated to quantify the attractiveness for the most weighted attractor. The principal eigenvector also characterizes the ranking of nodes, i.e. their importance [34,35]. In Fig. 4(a), nodes 1 and 3 ($x_{1,1} = 0.1715$, $x_{1,3} = 0.1899$), which have high importance, are much more attractive than nodes 91 and 113 ($x_{1,91} = 0.0648$, $x_{1,113} = 0.0329$); in particular, node 113, with the lowest importance, has no connections at all. In contrast, high popularity enhances the attractiveness of the active nodes and results in a burst of links connecting to them in Fig. 4(b), notably to the most active node 113. In summary, PBSPM emphasizes nodes with higher popularity so that they attract many more links, whereas inactive nodes, despite their importance, are weakened and receive fewer connections.

Table 2. Computation time of different methods for four networks. All the results are averaged over ten runs on an AMD R7 computer with MATLAB R2016b and 8GB RAM.
The figures above help to explain how popularity affects several typical nodes, but it is a reasonable conjecture that the improvements result from the advanced attractiveness of all nodes. As argued above, the principal eigenvector denotes the attractiveness for the most weighted attractor. Because the leading term $(\lambda_1 + \Delta\lambda_1) x_1 x_1^T$ occupies the main body of $\tilde{A}$, neglecting the constant factor $\lambda_1 + \Delta\lambda_1$, the similarity $\tilde{a}_{ij}$ is mainly determined by the eigenvector $x_1$. The Pearson correlation coefficient (CC) between the principal eigenvector and the degree in the probe set, which reflects holistically the extent to which the attractiveness $x_{1,i}$ coincides with the real degree increment $k_{i,probe}$, is computed as

$$CC = \frac{\sum_i (x_{1,i} - \bar{x}_1)(k_{i,probe} - \bar{k}_{probe})}{\sqrt{\sum_i (x_{1,i} - \bar{x}_1)^2} \sqrt{\sum_i (k_{i,probe} - \bar{k}_{probe})^2}},$$

where $\bar{x}_1$ and $\bar{k}_{probe}$ are the means over all nodes. The CC between the advanced $x'_1$ and the degree in the probe set is obtained similarly. Table 4 lists the variation of CC after the addition of popularity, and the coupling influence $\Delta\lambda_1$, averaged over ten independent perturbations. The positive $\Delta CC$ in all four networks suggests that the attractiveness of some nodes is corrected to match the future degree increment. Furthermore, the positive $\Delta\lambda_1$ strengthens the improvement in correlation. As a result, popular nodes are assigned more connecting opportunities, which raises the precision.

Table 3. Average age of links selected by predictors. The bold data are averaged over ten runs and obtained under the optimal parameters.

Table 4. Variation of correlation coefficient $\Delta CC$ and coupling influence $\Delta\lambda_1$. Each value is averaged over ten perturbations.
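The correlation between an attractiveness vector and the degree increment in the probe set can be checked with a short sketch of the standard Pearson coefficient. The function name `pearson_cc` and the numbers in the example are hypothetical placeholders, not values from the experiments.

```python
import numpy as np


def pearson_cc(x, k):
    """Pearson correlation between attractiveness x and probe-set degree k."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(k, dtype=float)
    dx = x - x.mean()
    dk = k - k.mean()
    return float((dx @ dk) / np.sqrt((dx @ dx) * (dk @ dk)))


# Hypothetical values for five nodes.
x1 = [0.17, 0.19, 0.06, 0.03, 0.10]
k_probe = [2, 1, 5, 7, 3]
cc = pearson_cc(x1, k_probe)
```

Computing the same quantity for the static and the advanced attractiveness and taking the difference would give a $\Delta CC$ of the kind reported in Table 4.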
Finally, to demonstrate the feasibility of the proposed methods in practical applications, we compare the fast PBSPM on continuous temporal networks with time series (TS) based methods, which have been applied effectively to temporal link prediction [36-38]. For each network, the dataset is divided into $T_N$ snapshots $(G_1, G_2, \ldots, G_{T_N})$, each covering a time period of $P_{length} = 7$ days. Setting a time window $T = 5$, we use the graph series $(G_t, G_{t+1}, \ldots, G_{t+T-1})$ and its reduced static graph $G_{t \sim t+T-1}$ to predict the links that will occur in $G_{t+T}$ ($t = 1, 2, \ldots, T_N - T$). The popularity of each node is then calculated via Eq. (11). During the evolution, certain mechanisms drive the network organization regularly and the structural features remain relatively stable. Hence, we obtain the optimal $\alpha$ and $m$ from the known networks observed in the period $1 \leq t \leq 6$ ($G_{1 \sim 5}$ as the training set, $G_6$ as the probe set) and apply them to the subsequent predictions. Figure 5 shows the precision at successive time steps and the average accuracy of the different methods. For LKMLR, although the fast PBSPM sometimes falls behind, its average precision shows a slight advantage (Fig. 5(a), (c)). For Wiki, the fast PBSPM not only leads at every time step but also achieves a much higher average accuracy than the TS based methods (Fig. 5(b), (d)). These experimental results demonstrate that the fast PBSPM has promising applications in evolving networks.

Discussion
In this paper, we propose PBSPM and its fast algorithm to predict future links. The main contribution is to investigate the popularity (activeness) of nodes in real-world evolving networks and apply it to link prediction. Unlike previous works that calculate temporal effects with complex theories, we infer the popularity of each node from its recently active edges. We then propose the hypothesis that the future network is influenced by both the existing structure and the popularity of nodes. By introducing popularity into the perturbation method, PBSPM can distinguish active from inactive nodes among those that were historically important, and prefers to predict new edges attached to active nodes. Subsequently, the fast method is proposed to avoid the high computational complexity. Experimental results on real-world evolving networks show that, compared with traditional methods, the proposed methods achieve better precision and robustness. Further experiments are conducted to uncover the underlying reasons for the improvements.
The performance of the proposed methods depends largely on the popularity of each node. In other words, the popularity based methods are more applicable to networks with obvious temporal effects, where the popularity metric can effectively quantify the popularity of each node. Hence, improving the popularity metric should enhance the precision of link prediction, which we leave as future work. Since our work mainly explores prediction in evolving networks, it has possible applications in traffic prediction, airline control, recommendation in social networks, and so on.

Methods
Experimental procedures. To predict the future links of evolving networks with PBSPM, there are five steps to follow:
Step 1: We first divide the network into the training set $E^T$ and the probe set $E^P$ based on the birth time of each edge; the corresponding adjacency matrices are denoted by $A^T$ and $A^P$.
Step 2: The training set is further divided into the old set and the fresh set to calculate the popularity via Eq. (2) or Eq. (11).
Step 3: We perturb the training set by randomly removing a small fraction $p^H = 0.1$ of edges, $\Delta A$; obviously, $A^T = A^R + \Delta A$.
Step 4: We decompose the matrix $A^R$ and obtain $\tilde{A}'$ via Eq. (7) and Eq. (8).
Step 5: We repeat steps 3 and 4 ten times. In other words, we perform the perturbation ten times to obtain the averaged $\langle \tilde{A}' \rangle$, where the score $\langle \tilde{a}'_{ij} \rangle$ represents the likelihood that a link exists between nodes $i$ and $j$. Finally, the non-observed edges with the top-$|E^P|$ scores are chosen as potential future edges.
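The first and last stages of the procedure above, the time-based split and the selection of the highest-scoring non-observed pairs, can be sketched as follows. The helper names `split_by_time` and `top_candidates` are hypothetical, the default 10% probe fraction follows the experiments described in the Results, and the score matrix is taken as given (any similarity matrix can be plugged in). Splitting by count is a simplification: edges that share a time-stamp may end up on both sides of the boundary.

```python
import numpy as np


def split_by_time(edges, probe_fraction=0.1):
    """Sort (u, v, t) edges by birth time; the latest fraction becomes the probe set."""
    ordered = sorted(edges, key=lambda e: e[2])
    n_probe = max(1, int(round(probe_fraction * len(ordered))))
    return ordered[:-n_probe], ordered[-n_probe:]


def top_candidates(score, A_train, L):
    """Return the L non-observed node pairs (i < j) with the highest scores."""
    n = A_train.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    unobserved = A_train[iu, ju] == 0          # skip pairs already linked in training
    iu, ju = iu[unobserved], ju[unobserved]
    best = np.argsort(score[iu, ju])[::-1][:L]
    return [(int(iu[b]), int(ju[b])) for b in best]


# Hypothetical time-stamped edges on four nodes.
toy_edges = [(0, 1, 10), (1, 2, 20), (2, 3, 30), (0, 2, 40), (1, 3, 50)]
train, probe = split_by_time(toy_edges, probe_fraction=0.2)
```

In the procedure above, `L` would be set to the size of the probe set and `score` to the averaged matrix from Step 5.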
Data description. In this work, six datasets are considered to evaluate the performance of the algorithms. To simplify the problem, we ignore the direction and weight of links and remove isolated nodes. The networks are divided into a historical training set and a future probe set solely according to the time-stamps attached to the edges.

Evaluation metric. AUC (Area Under the receiver operating characteristic Curve) and Precision are two standard metrics used to measure link prediction algorithms [43,44]. The former randomly compares the score of a missing link with that of a non-existent link to evaluate the performance. The latter focuses on the links with the top-$L$ scores. When dealing with highly skewed datasets, precision gives a more informative picture of an algorithm's performance [45]. Hence, we choose Precision as the metric to evaluate the accuracy of the proposed method and the baselines. Precision is defined as the ratio of accurately predicted links to all selected links. Namely, if we select the top-$L$ links among all ranked non-observed links and only $L_r$ of them are predicted correctly in the probe set $E^P$, then the accuracy of the predictor is

$$Precision = \frac{L_r}{L}.$$

In our experiments, we select $L = |E^P|$ and count how many of the top-$|E^P|$ links really exist in the probe set.
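As a small illustration of the metric, the sketch below computes Precision as $L_r / L$ for a list of predicted pairs against a probe set of undirected edges. The function name `precision` and the example pairs are hypothetical.

```python
def precision(predicted, probe_edges):
    """Fraction of predicted node pairs that appear in the probe set (L_r / L)."""
    probe = {frozenset(e) for e in probe_edges}   # undirected: (u, v) equals (v, u)
    hits = sum(1 for e in predicted if frozenset(e) in probe)
    return hits / len(predicted)


# Hypothetical example: one of the two predicted links is in the probe set.
example = precision([(0, 2), (1, 3)], [(2, 0), (0, 3)])
```

Using `frozenset` makes the comparison independent of the order in which the two endpoints are written.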
Baselines. For comparison, we briefly introduce five traditional algorithms based on all three kinds of structural similarity.
(1) Common Neighbors (CN), related to the concept of triadic closure, is the best-known method. It assumes that two target nodes tend to connect with each other if the new connection would produce more triangles in the graph.

$$s_{xy}^{CN} = |\Gamma(x) \cap \Gamma(y)|,$$

where $\Gamma(x)$ is the set of neighbors of node $x$ and $\Gamma(x) \cap \Gamma(y)$ represents the set of common neighbors of $x$ and $y$.
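A minimal sketch of the CN index follows, assuming the network is stored as a dictionary that maps each node to its set of neighbors. The function name `common_neighbors` and the toy graph are hypothetical.

```python
def common_neighbors(adj, x, y):
    """CN score: number of neighbors shared by nodes x and y."""
    return len(adj[x] & adj[y])


# Hypothetical undirected graph: a and b share the neighbors c and d.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}
score_ab = common_neighbors(adj, "a", "b")
score_ae = common_neighbors(adj, "a", "e")
```

Linking a and b would close two triangles (through c and d), which is the triadic-closure intuition behind the index.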