An information-theoretic model for link prediction in complex networks

Various structural features of networks have been used to develop link prediction methods. However, because different features highlight different aspects of network structure, it is difficult to benefit from all of the features that might be available. In this paper, we investigate the role of network topology in predicting missing links from the perspective of information theory. In this way, the contributions of different structural features to link prediction are measured in terms of their information values. We then propose an information-theoretic model that can accommodate multiple structural features. Furthermore, we design a novel link prediction index, called Neighbor Set Information (NSI), based on this model. Our experimental results show that the NSI index performs well on real-world networks compared with other typical proximity indices.

Definition 1 Let X be a discrete random variable with possible outcomes x_k and probability distribution p(x_k). The self-information of the outcome x_k is defined as [1]

    I(x_k) = -\log p(x_k),    (1)

where the base of the logarithm is specified as 2.

The self-information indicates the uncertainty of the outcome x_k and is determined by the probability p(x_k): the higher the self-information, the lower the probability of the outcome. Similarly, given the conditional probability p(x_i | y_j), the conditional self-information is defined as

    I(x_i | y_j) = -\log p(x_i | y_j).    (2)

I(x_i | y_j) indicates the uncertainty of the outcome x_i when the outcome y_j is given. When x_i and y_j are independent of each other, I(x_i | y_j) equals I(x_i).

Definition 2 Consider two variables X and Y with a joint probability distribution function p(x, y) and marginal probability distribution functions p(x) and p(y). The mutual information I(X; Y) is defined as [2]

    I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}.    (3)

Accordingly, the mutual information of a pair of outcomes, I(x_i; y_j) = I(X = x_i; Y = y_j), can be written as

    I(x_i; y_j) = \log \frac{p(x_i | y_j)}{p(x_i)} = I(x_i) - I(x_i | y_j).    (4)

The mutual information gives the reduction in uncertainty of an outcome x_i when the outcome y_j of another variable is given.
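To make these definitions concrete, the following sketch (helper names are our own; base-2 logarithms as specified above) computes self-information, conditional self-information, and the pointwise mutual information of Eq. (4):

```python
import math

def self_information(p: float) -> float:
    """Eq. (1): I(x) = -log2 p(x), in bits."""
    return -math.log2(p)

def conditional_self_information(p_cond: float) -> float:
    """Eq. (2): I(x|y) = -log2 p(x|y)."""
    return -math.log2(p_cond)

def pointwise_mutual_information(p_x: float, p_x_given_y: float) -> float:
    """Eq. (4): I(x; y) = I(x) - I(x|y) = log2(p(x|y) / p(x))."""
    return self_information(p_x) - conditional_self_information(p_x_given_y)

# A fair coin flip carries exactly one bit of self-information.
print(self_information(0.5))                     # 1.0
# If knowing y raises p(x) from 0.25 to 0.5, one bit of uncertainty is removed.
print(pointwise_mutual_information(0.25, 0.5))   # 1.0
```

Note that the pointwise mutual information is negative whenever observing y makes x less likely, which matters for the link prediction model below.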
As to the link prediction problem, we primarily estimate the probability that node pairs are connected based on some prior known information. For a node pair (x, y), if the event that the pair is connected is denoted as L^1_{xy}, the self-information of this event is I(L^1_{xy}) = -\log p(L^1_{xy}), where p(L^1_{xy}) is the prior probability that nodes x and y are connected. If the common neighbors of node pair (x, y) are known, the uncertainty of the event L^1_{xy} is reduced. Formally, the set of common neighbors of node pair (x, y) is defined as O_{xy} = {z : z ∈ Γ(x) ∩ Γ(y)}, where Γ(x) is the neighbor set of node x. The conditional self-information of the event L^1_{xy} given the common neighbors is then I(L^1_{xy} | O_{xy}). A higher value of I(L^1_{xy} | O_{xy}) means that the event L^1_{xy} is less likely to happen. Based on this idea, we can distinguish which links are more likely to be formed.

The Derivation of the Information-theoretic Model
We first consider the case with only one topological feature. Given a disconnected node pair (x, y) and one feature F associated with (x, y), once the feature variable set Ω corresponding to feature F is obtained, the probability score can be defined as

    s_{xy} = -I(L^1_{xy} | Ω).    (5)

According to the definition of mutual information [2], I(L^1_{xy} | Ω) can be written as

    I(L^1_{xy} | Ω) = I(L^1_{xy}) - I(L^1_{xy}; Ω),    (6)

where I(L^1_{xy}) is the self-information of the event that node pair (x, y) is connected, and I(L^1_{xy}; Ω) denotes the mutual information between the event that node pair (x, y) is connected and the event that the feature variable set Ω is available, which indicates the reduction in uncertainty of the connection between nodes x and y when Ω is given.
If the feature variables in Ω are assumed to be independent of each other, then

    I(L^1_{xy}; Ω) = \sum_{ω ∈ Ω} I(L^1_{xy}; ω).    (7)

In practice, whether this assumption holds depends on the chosen feature, but it is true in most cases. For example, the common neighbors in the set Ω = O_{xy} = {ω : ω ∈ Γ(x) ∩ Γ(y)} are independent of each other. I(L^1_{xy}; ω) can be further derived as

    I(L^1_{xy}; ω) = I(L^1_{xy}) - I(L^1_{xy} | ω),    (8)

where I(L^1_{xy} | ω) is the conditional self-information of the event that node pair (x, y) is connected when a feature variable ω is known.
We substitute Eqs. (6), (7) and (8) into Eq. (5) and obtain

    s_{xy} = -I(L^1_{xy}) + \sum_{ω ∈ Ω} [ I(L^1_{xy}) - I(L^1_{xy} | ω) ].    (9)

Now consider the case with multiple topological features, where the variable set for feature i is denoted as Ω_i. We adopt a parameter λ_i to weight the contribution of feature i to the final connection likelihood, and define the probability score as

    s_{xy} = -\sum_{i} λ_i I(L^1_{xy} | Ω_i).    (10)
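As a minimal illustration of the multi-feature score in Eq. (10), the weighted combination of per-feature conditional self-information values can be sketched as follows (function and argument names are our own):

```python
def multi_feature_score(cond_infos, weights):
    """Eq. (10): s_xy = -sum_i lambda_i * I(L1_xy | Omega_i).

    cond_infos: conditional self-information I(L1_xy | Omega_i) for each feature i.
    weights: the parameters lambda_i weighting each feature's contribution.
    """
    return -sum(lam * info for lam, info in zip(weights, cond_infos))

# Two features: the first leaves 2 bits of uncertainty, the second 1 bit.
score = multi_feature_score([2.0, 1.0], [1.0, 0.5])  # -(1*2 + 0.5*1) = -2.5
```

Because each term is a remaining uncertainty, node pairs whose features leave less uncertainty receive higher (less negative) scores.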

The Neighbor Set Information Approach to Link Prediction
In this section, we introduce an information-theoretic approach based on the neighbor set in detail. As shown in Fig. 1, the features we consider here are the common neighbors and the links across the two neighbor sets. For a disconnected node pair (x, y), the set of common neighbors is denoted as O_{xy} = {z : z ∈ Γ(x) ∩ Γ(y)}, and the set of links across the two neighbor sets is defined as

    P_{xy} = { l_{st} : l_{st} ∈ E, s ∈ Γ(x), t ∈ Γ(y) },

where E denotes the link set of the network and Γ(x) is the neighbor set of node x.
First, let us discuss the information provided by the common neighbors. If the event of the connection of node pair (x, y) is denoted as L^1_{xy}, the prior probability p(L^1_{xy}) of the connection of node pair (x, y) can be defined as

    p(L^1_{xy}) = \frac{M}{N(N-1)/2},    (11)

which indicates the link density in the training set, where N and M are the numbers of nodes and links in the training set, respectively. According to information theory [1,2], the effect of common neighbors on the connection of two nodes can be estimated by

    I(L^1_{xy} | O_{xy}) = I(L^1_{xy}) - I(L^1_{xy}; O_{xy}),    (12)

where I(L^1_{xy}) is the self-information of the event that node pair (x, y) has one link, and I(L^1_{xy}; O_{xy}) is the mutual information between the event that node pair (x, y) is connected and the event that the common neighbor set O_{xy} is known. I(L^1_{xy}; O_{xy}) indicates the reduction in uncertainty of the connection between node x and node y when the common neighbors are available.
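As a sketch, the link-density prior of Eq. (11) and the corresponding self-information I(L^1_{xy}) can be computed directly from the node and link counts of the training graph (helper names are illustrative):

```python
import math

def prior_link_probability(num_nodes: int, num_links: int) -> float:
    """Eq. (11): link density p(L1_xy) = M / (N*(N-1)/2)."""
    return num_links / (num_nodes * (num_nodes - 1) / 2)

def prior_self_information(num_nodes: int, num_links: int) -> float:
    """I(L1_xy) = -log2 p(L1_xy): uncertainty of a random pair being linked."""
    return -math.log2(prior_link_probability(num_nodes, num_links))

# Toy training graph: 5 nodes and 5 of the 10 possible links.
p = prior_link_probability(5, 5)   # 0.5
i = prior_self_information(5, 5)   # 1.0 bit
```

In sparse real networks p(L^1_{xy}) is small, so the prior self-information is large and the observed features must remove substantial uncertainty to yield a high score.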
If the nodes in O_{xy} are assumed to be independent of each other, then

    I(L^1_{xy}; O_{xy}) = \sum_{z ∈ O_{xy}} I(L^1_{xy}; z),    (13)

where z is one of the common neighbors of node x and node y. According to the definition of mutual information, I(L^1_{xy}; z) can be written as

    I(L^1_{xy}; z) = I(L^1_{xy}) - I(L^1_{xy} | z) = I(L^1_{xy}) + \log p(L^1_{xy} | z).    (14)

In particular, p(L^1_{xy} | z) is the clustering coefficient of node z, which can be written as

    p(L^1_{xy} | z) = \frac{N_{△z}}{N_{△z} + N_{∧z}},    (15)

where N_{△z} and N_{∧z} are respectively the numbers of connected and disconnected node pairs whose common neighbors include node z. We substitute Eq. (13) into Eq. (12) and obtain

    I(L^1_{xy} | O_{xy}) = I(L^1_{xy}) - \sum_{z ∈ O_{xy}} I(L^1_{xy}; z),    (16)

where I(L^1_{xy}) and I(L^1_{xy}; z) can be calculated by Eqs. (11) and (14), respectively.

Moreover, the information carried by the links across the two neighbor sets can also be used to make predictions. The effect of the links between the two neighbor communities on the connection likelihood can be estimated by

    I(L^1_{xy} | P_{xy}) = I(L^1_{xy}) - I(L^1_{xy}; P_{xy}),    (17)

where I(L^1_{xy}; P_{xy}) is the mutual information between the event that node pair (x, y) is connected and the event that the links across the neighbor sets of nodes x and y are known. I(L^1_{xy}; P_{xy}) denotes the reduction in uncertainty of the connection of node pair (x, y) when the links between the neighbor sets of nodes x and y are obtained. If the links in P_{xy} are assumed to be independent of each other, then

    I(L^1_{xy}; P_{xy}) = \sum_{l_{st} ∈ P_{xy}} I(L^1_{xy}; l_{st}),    (18)

where l_{st} is a link in P_{xy} with endpoints s and t. In addition, I(L^1_{xy}; l_{st}) can be written as

    I(L^1_{xy}; l_{st}) = I(L^1_{xy}) - I(L^1_{xy} | l_{st}) = I(L^1_{xy}) + \log p(L^1_{xy} | l_{st}),    (19)

where I(L^1_{xy} | l_{st}) is the conditional self-information of the event that node pair (x, y) is connected when link (s, t) lies between the neighbor sets of nodes x and y. Here p(L^1_{xy} | l_{st}) can be calculated in a similar way to p(L^1_{xy} | z):

    p(L^1_{xy} | l_{st}) = \frac{N_{st}}{N_{st} + N_{⊓st}},    (20)

where N_{st} stands for the number of connected node pairs whose neighbors are s and t respectively, and N_{⊓st} denotes the number of disconnected node pairs whose neighbors are s and t respectively. We substitute Eq. (18) into Eq. (17) and obtain

    I(L^1_{xy} | P_{xy}) = I(L^1_{xy}) - \sum_{l_{st} ∈ P_{xy}} I(L^1_{xy}; l_{st}),    (21)

where I(L^1_{xy}) and I(L^1_{xy}; l_{st}) can be calculated by Eqs. (11) and (19), respectively.
Now, given a disconnected node pair (x, y), we can obtain the common neighbors and the links across the pair's neighbor sets. After calculating the effect of the common neighbors and of the links between the neighbor sets using Eqs. (16) and (21), respectively, the connection likelihood score for this non-adjacent node pair can be defined as

    s_{xy} = -[ λ_1 I(L^1_{xy} | O_{xy}) + λ_2 I(L^1_{xy} | P_{xy}) ].    (22)

According to this equation, the score can be calculated locally from the neighbor sets of nodes x and y using the information-theoretic model; thus, we call it the Neighbor Set Information (NSI) index. According to information theory, the smaller I(L^1_{xy} | O_{xy}) and I(L^1_{xy} | P_{xy}) are, the higher the probability of a future link between nodes x and y tends to be. Therefore, we define the score as the negation of the weighted sum of I(L^1_{xy} | O_{xy}) and I(L^1_{xy} | P_{xy}). For a simpler formalization, we define λ = λ_2 / λ_1 and obtain

    s_{xy} = -[ I(L^1_{xy} | O_{xy}) + λ I(L^1_{xy} | P_{xy}) ].    (23)
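Under the assumptions above, the NSI score of Eq. (23) can be sketched as follows. This is a simplified illustration, not the authors' reference implementation: in particular, the conditional probability p(L^1_{xy} | l_{st}) of Eq. (20) is approximated here by averaging the two endpoints' clustering-based estimates rather than counting node pairs adjacent to s and t.

```python
import math
from itertools import combinations

def nsi_score(adj, x, y, lam=1.0):
    """Sketch of the NSI index, Eq. (23), for a disconnected pair (x, y).

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    lam: the ratio lambda = lambda_2 / lambda_1 weighting the cross-link term.
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    p_prior = m / (n * (n - 1) / 2)      # link density, Eq. (11)
    i_prior = -math.log2(p_prior)        # I(L1_xy)

    def p_given(z):
        # Eq. (15): fraction of z's neighbor pairs that are themselves linked.
        connected = disconnected = 0
        for u, v in combinations(adj[z], 2):
            if v in adj[u]:
                connected += 1
            else:
                disconnected += 1
        total = connected + disconnected
        return connected / total if total else p_prior

    # Information remaining after seeing the common neighbors, Eq. (16).
    i_cn = i_prior
    for z in adj[x] & adj[y]:
        p = p_given(z)
        if p > 0:
            i_cn -= i_prior + math.log2(p)   # subtract I(L1_xy; z), Eq. (14)

    # Information remaining after seeing cross-neighbor-set links, Eq. (21).
    i_cross = i_prior
    for s in adj[x]:
        for t in adj[y]:
            if s != t and t in adj[s]:
                # Simplified stand-in for Eq. (20): average endpoint estimates.
                p = (p_given(s) + p_given(t)) / 2
                if p > 0:
                    i_cross -= i_prior + math.log2(p)

    # Eq. (23): less remaining uncertainty -> higher connection likelihood.
    return -(i_cn + lam * i_cross)
```

Note that the ranking depends on both the number of common neighbors and how informative each one is: a common neighbor whose own neighborhood is sparser than the global link density contributes negative mutual information and lowers the score.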

Data Description
Our experiments include 12 real-world networks drawn from disparate fields. Details are as follows, and the basic structural features are presented in Table S1. All networks are treated as undirected and unweighted. To better characterize the properties of the networks, if a network is not connected, we only consider its largest connected component.
• Karate [3] -A friendship network of a karate club at a US university in the 1970s.
Table S1. The basic structural features of the twelve real-world networks. N and M are the numbers of nodes and links in the network, respectively. e is the network efficiency [12], defined as e = \frac{2}{N(N-1)} \sum_{x,y ∈ V, x ≠ y} d_{xy}^{-1}, where d_{xy} is the shortest path distance between node x and node y. C and r are the clustering coefficient [10] and the assortativity coefficient [13]. ⟨k⟩ and ⟨d⟩ denote the average degree and the average shortest distance. H is the degree heterogeneity, defined as H = ⟨k²⟩/⟨k⟩².

• C.elegans (Celegans) [5] -A neural network of the nematode worm C. elegans.
• Email [7] -A network of Alex Arenas's email interchanges.

• Political Blogs (PB) [8] -A network of the US political blogs.
• Kohonen [4] -A network of articles on the topic of self-organizing maps or with references to Kohonen.
• EPA [4] -A network of web pages linking to the website www.epa.gov.
• Power [10] -An electrical power grid of the western United States.
• Router [11] -The router-level topology of the Internet.