Link prediction based on matrix factorization by fusion of multi-class organizations of the network

Link prediction aims at forecasting latent or unobserved edges in complex networks and has a wide range of real-world applications. Most existing methods and models exploit only one class of organization of the network, and thus lose important information hidden in its other organizations. In this paper, we propose a link prediction framework, called NMF^3 here, which makes the best of the structure of networks at different levels of organization based on nonnegative matrix factorization. We first map the observed network into another space by kernel functions, which yields organizations of different order. Then we combine the adjacency matrix of the network with one of these other organizations, which gives the objective function of our framework for link prediction based on nonnegative matrix factorization. Third, we derive an iterative algorithm to optimize the objective function, which converges to a local optimum, and we propose a fast optimization strategy for large networks. Lastly, we test the proposed framework with two kernel functions on a series of real-world networks under different sizes of training set, and the experimental results show the feasibility, effectiveness, and competitiveness of the proposed framework.

Definition of the link prediction problem. As in most works on link prediction, we consider an unweighted and undirected network G = (V, E), where V and E represent the sets of nodes and edges of the network, respectively; n = |V| and m = |E| are the numbers of nodes and edges. The adjacency matrix of the network is denoted as A: if nodes i and j have a link then A_ij = A_ji = 1, and A_ij = A_ji = 0 otherwise. As required for link prediction, we divide the edges of the network into a training set and a test set, denoted E_1 and E_2, with E = E_1 ∪ E_2 and E_1 ∩ E_2 = Ø. We use A_1 and A_2 to denote the matrix forms of E_1 and E_2 over all the nodes in V, respectively; both are symmetric 0-1 matrices and A_1 + A_2 = A.
We let L = |E_2|/2 be the number of edges in the test set, so it is easy to see that |E_1| = 2(m − L). The set of all possible edges of the network that lie outside the training set, which we call the candidate set, has size

|Ē| = n(n − 1)/2 − (m − L).

We then need to learn a model from the training set E_1, compute a likelihood score for each edge in the candidate set, select the edges with the top L values, and validate them on the test set E_2 with some evaluation indexes. Here, we will introduce the formation of our proposed link prediction framework, including how to map the network into another space to obtain other classes of organization structure based on kernel functions, and how to construct the proposed model.
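The problem setup above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function name and edge representation are our own choices.

```python
import random

def split_edges(edges, n, test_fraction=0.1, seed=0):
    """Split the edge list E into training set E1 and test set E2,
    and report the candidate-set size |E_bar| = n(n-1)/2 - (m - L)
    as defined in the text (edges stored once per undirected pair)."""
    rng = random.Random(seed)
    m = len(edges)
    L = int(round(test_fraction * m))           # edges held out for testing
    shuffled = edges[:]
    rng.shuffle(shuffled)
    E2 = shuffled[:L]                           # test set
    E1 = shuffled[L:]                           # training set
    candidate_size = n * (n - 1) // 2 - (m - L) # node pairs outside E1
    return E1, E2, candidate_size
```

A model trained on `E1` then scores all `candidate_size` pairs, and the top `L` scores are checked against `E2`.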

Formation of the proposed NMF^3
Kernel function. Kernel functions have been widely applied in pattern recognition and machine learning. Based on a fixed nonlinear feature-space mapping φ(x), a kernel function is generally given by the relation

k(x, x′) = φ(x)^T φ(x′).

For a given network, we can regard each column of the adjacency matrix, i.e., the first-order links of one node, as the feature vector of that node. A series of kernel functions can therefore be applied to the network to obtain different classes of organization structure, such as the polynomial kernel, a non-stationary kernel suited to problems where all the training data are normalized, and the Gaussian and exponential kernels, which are examples of radial basis functions.
Without loss of generality, in this paper we introduce two classic kernel functions as instances of the proposed framework, the linear kernel 22 and the covariance kernel 23. On the network they are given by

K_1(X) = X^T X,
K_2(X)_ij = (X_·i − μ)^T (X_·j − μ),

where X is the adjacency matrix of the observed network (or training set), X_·i is the i-th column of X and μ is the average of all the X_·i. Obviously, both K_1(X) and K_2(X) are symmetric positive semidefinite matrices, and neither introduces additional parameters. The linear kernel K_1(X) extracts the local structure information of the network, while the covariance kernel K_2(X) extracts global structure information. Although we only exploit these two kernel functions in this paper, we believe the proposed framework can easily be extended and scaled.
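The two kernels can be computed directly from the adjacency matrix. The sketch below is one reading of the definitions above (column inner products for the linear kernel, mean-centred column inner products for the covariance kernel); the exact forms in the original equations were lost in extraction.

```python
import numpy as np

def linear_kernel(X):
    """K1(X) = X^T X: entry (i, j) is the inner product of the
    neighbourhood (column) vectors of nodes i and j -- local structure."""
    return X.T @ X

def covariance_kernel(X):
    """Covariance-style kernel: centre each column by the mean column mu
    and take inner products -- global structure. One standard reading
    of the text's definition."""
    mu = X.mean(axis=1, keepdims=True)  # average of all columns X_.i
    Xc = X - mu
    return Xc.T @ Xc
```

Both results are symmetric positive semidefinite, matching the claim in the text.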
Detail of the framework. Before introducing our proposed framework, we briefly review nonnegative matrix factorization for link prediction. Based on the adjacency matrix X of the observed network (training set), the objective function can be written as

min_{W≥0, H≥0} D(X‖WH) + λ f(W, H),

where D(X‖WH) represents the distance between X and WH, such as the quadratic loss function or the K-L divergence. W and H are the latent feature matrices (basis matrix and coefficient matrix), of size n × C and C × n, respectively, where C is the number of latent features, or the inner rank of X. f(W, H) is a penalty function on W and H, such as the L_1 or L_2 norm 24. Without loss of generality, we consider the simple case of the quadratic loss and rewrite the objective function as

min_{W≥0, H≥0} ½ Σ_ij (X_ij − (WH)_ij)² + (λ/2) (Σ_ij W_ij² + Σ_ij H_ij²),

or in matrix form as

min_{W≥0, H≥0} ½ ‖X − WH‖_F² + (λ/2)(‖W‖_F² + ‖H‖_F²).   (5)

How can the other organization structure be fused with equation 5 in a principled way? Motivated by nonnegative matrix factorization for recommender systems, we propose the objective function of NMF^3 as follows:

min_{W≥0, H≥0} ½ Σ_ij (1 + γR_ij)(X_ij − (WH)_ij)² + (λ/2)(‖W‖_F² + ‖H‖_F²),   (6)

where R is the organization structure obtained by the kernel function, which has the same size as X, and the parameter γ scales the strength of R. After optimizing equation 6, we can compute the similarity of all the edges in the candidate set from WH. The settings of the parameters γ and λ are given in the experimental results, and the optimization of the objective function, the detailed algorithm, and the scaling to large networks can be found in the Methods section.
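The fused objective can be evaluated in a few lines. This sketch assumes the weighted-quadratic reading of equation 6 reconstructed above (each squared residual weighted by 1 + γR_ij, plus L2 penalties); the function name is ours.

```python
import numpy as np

def nmf3_objective(X, R, W, H, gamma=0.1, lam=2.0):
    """Weighted quadratic objective: each entry of the residual X - WH
    is weighted by (1 + gamma * R_ij), plus L2 penalties on W and H.
    With gamma = 0 this reduces to plain regularized NMF."""
    residual = X - W @ H
    weighted = (1.0 + gamma * R) * residual ** 2
    penalty = (W ** 2).sum() + (H ** 2).sum()
    return 0.5 * weighted.sum() + 0.5 * lam * penalty
```

After optimization, the entry (WH)_ij serves as the similarity score of the candidate edge (i, j).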

Evaluation index.
To quantify the performance of link prediction methods, we introduce three evaluation metrics: the area under the receiver operating characteristic curve (AUC) 25, Precision 26 and Prediction-Power 27.
In fact, link prediction methods output an ordered list of all the edges in the candidate set Ē according to the computed similarity values. Based on the rank of edges in Ē, the AUC value is the probability that a randomly selected edge in the test set E_2 has a higher score than a randomly selected edge in the candidate set Ē. The AUC can be approximately calculated as

AUC = (t′ + 0.5 t″)/t,

where t, t′ and t″ are the number of independent random comparisons, the number of times the edge from the test set E_2 has the higher score, and the number of times the two scores are equal, respectively. If we select the edges with the top L similarity values, denoted E_p, then the Precision can be computed as

Precision = |E_p ∩ E_2| / L,

which represents the accuracy of the link prediction method.
As discussed in ref. 27, the Prediction-Power (PP) is defined as

PP = Precision / Precision_Random,

where Precision_Random, the performance of the random predictor, can be computed as L/(n(n − 1)/2 − (m − L)); this metric assesses the deviation from the mean random-predictor performance.
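The three metrics can be sketched as follows. This is an illustrative implementation, assuming the sampling estimate of AUC and the ratio form of PP given above; function names are ours.

```python
import random

def auc_score(scores_test, scores_candidate, trials=10000, seed=0):
    """Sampling estimate of AUC = (t' + 0.5 t'') / t: compare the score
    of a random test edge against a random candidate edge."""
    rng = random.Random(seed)
    higher = equal = 0
    for _ in range(trials):
        s_test = rng.choice(scores_test)
        s_cand = rng.choice(scores_candidate)
        if s_test > s_cand:
            higher += 1
        elif s_test == s_cand:
            equal += 1
    return (higher + 0.5 * equal) / trials

def precision_at_L(top_L_edges, test_edges):
    """Precision = |E_p intersect E_2| / L: fraction of the top-L
    predicted edges that actually appear in the test set."""
    hits = len(set(top_L_edges) & set(test_edges))
    return hits / len(top_L_edges)

def prediction_power(precision, n, m, L):
    """PP relative to the random predictor, whose precision is
    L / (n(n-1)/2 - (m - L)) as given in the text."""
    p_random = L / (n * (n - 1) / 2 - (m - L))
    return precision / p_random
```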
Scientific Reports | 7: 8937 | DOI:10.1038/s41598-017-09081-9

Baseline methods. We compare our method with several well-known methods, including the CN, AA, RA, Salton, Jaccard, ACT and CRA indices and some widely used global methods, which are denoted as follows.
(1) Common Neighbors (CN) 15, defined between nodes x and y as

s_xy^CN = |Γ(x) ∩ Γ(y)|,

where Γ(x) denotes the set of neighbors of node x.
(2) Adamic-Adar (AA) 28, defined as

s_xy^AA = Σ_{z ∈ Γ(x) ∩ Γ(y)} 1/log k_z,

where k_z is the degree of node z. This index takes the degrees of the common neighbors of the two nodes into account and assigns more weight to the less-connected neighbors.
(3) Resource Allocation (RA) 29, defined as

s_xy^RA = Σ_{z ∈ Γ(x) ∩ Γ(y)} 1/k_z;

the RA index assigns different weights to the common neighbors.
(4) Salton index 30, defined as

s_xy^Salton = |Γ(x) ∩ Γ(y)| / √(k_x k_y);

this index is also based on the number of common neighbors, with another normalization.
(5) Jaccard index 8, defined as

s_xy^Jaccard = |Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|,

the ratio between the size of the intersection of Γ(x) and Γ(y) and the size of their union.
(6) Average Commute Time (ACT) 31, defined as

s_xy^ACT = 1 / (l⁺_xx + l⁺_yy − 2 l⁺_xy),

which means that two nodes are more similar if they have a smaller average commute time; the similarity between nodes x and y is defined as the reciprocal of the average commute time between them, where l⁺_xx represents an element of the matrix L⁺, the pseudo-inverse of the Laplacian matrix of the network.
(7) CRA, an extended similarity index based on RA, introduced by Carlo Vittorio Cannistraci in ref. 27 and defined as

s_xy^CRA = Σ_{z ∈ Γ(x) ∩ Γ(y)} |α_z|/k_z,

where α_z refers to the subset of neighbors of z that are also common neighbors of nodes x and y.
(8) SPM, the structural perturbation method 32 for link prediction, which assumes that the regularity of a network is reflected in the consistency of structural features before and after a random removal of a small set of links.
(9) HSM, the hierarchical structure model proposed by Aaron Clauset in ref. 17, which can infer hierarchical structure from network data and predict missing links.
(10) SBM, which handles data reliability in complex networks and infers missing and spurious links based on the stochastic block model 18.
(11) LR, a robust-principal-component-analysis-based method 33 for estimating the missing links in complex networks; we set the weighting parameter that balances the low-rank property and sparsity to 0.1.
(12) LOOP, a probabilistic algorithmic framework that defines a predefined structural Hamiltonian 34 based on the network organization and predicts each non-observed link by computing the conditional probability of adding the link to the observed network.
As far as we know, LOOP and SPM have had nearly the best performance on link prediction recently; however, both methods are time-consuming, especially LOOP.
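The local similarity baselines above reduce to set operations on neighbor sets. The following sketch implements CN, AA, RA, Salton and Jaccard for a graph stored as a neighbor dictionary; the representation and function name are our own choices.

```python
import math

def similarity_indices(adj, x, y):
    """Local similarity indices from the baseline list, for a graph
    given as an adjacency dict {node: set(neighbours)}."""
    cn = adj[x] & adj[y]                  # common neighbors of x and y
    k = lambda z: len(adj[z])             # node degree
    union = adj[x] | adj[y]
    return {
        "CN": len(cn),
        "AA": sum(1.0 / math.log(k(z)) for z in cn if k(z) > 1),
        "RA": sum(1.0 / k(z) for z in cn),
        "Salton": len(cn) / math.sqrt(k(x) * k(y)) if k(x) and k(y) else 0.0,
        "Jaccard": len(cn) / len(union) if union else 0.0,
    }
```

For example, on a path-plus-triangle graph the indices for a node pair differ only in how the shared neighbors are weighted and normalized.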
In this paper, NMF^3-1 and NMF^3-2 denote the proposed framework with the linear kernel and the covariance kernel, respectively.

Experimental results
Datasets. We evaluate the performance of our proposed framework on ten high-quality networks from various areas, including social, biological, and technological networks. The networks used in the experiments are described as follows, and their basic statistical features are shown in Table 1. Directed links are treated as undirected, multiple links are treated as a single unweighted link, and self-loops are removed.
(1) Jazz 35

Results and analysis. Parameter settings: we select six networks, FWFD, FWMW, Jazz, Metabolic, USAir and Celegans, from the ten networks, and analyze the sensitivity of the link prediction performance of our framework to γ and λ. As shown in Fig. 1, we set the proportion of the training set to 0.9 and take the widely used Precision index for link prediction as evidence. The performances on FWFD, Jazz, Metabolic, USAir and Celegans become gradually stable. Although different settings of γ and λ have a significant influence on the prediction results, our framework still performs better than the other baseline methods across these settings. Without loss of generality, we set γ = 0.1 and λ = 2 in the subsequent experiments.
As shown in Tables 2, 3 and 4, we report the performance on the ten real-world networks with a training-set proportion of 0.9 based on Precision, AUC and PP, respectively. The values in bold represent the largest value in each column; each row of a table reports the experimental results of one method, including the average value and standard deviation over 100 random divisions of the network into training and test sets. From Table 2, our method NMF^3, SPM and LOOP have the most competitive performance based on Precision compared with the other methods. As shown in Table 3, our method and SPM perform better based on AUC, and the RA index is the best choice among the similarity-based indices. In Table 4, we also show the mean value of PP of each method in the last column. Also of note, N/A in the tables indicates that the value could not be computed because the corresponding method is not applicable to large-scale networks. In general, our methods perform nearly on par with LOOP and SPM on the three evaluation indexes; however, both of those methods are considerably more time-consuming than the method proposed in this paper, and we analyze the computational complexity in the Methods section.
Furthermore, we analyze the experimental results on the networks with fractions of the training set ranging from 0.9 down to 0.2, as reported in Figs 2, 3 and 4 (some methods could not be run on the larger networks, especially SPM and LOOP). The black lines represent the performance of the proposed NMF^3-1 and NMF^3-2 methods, the purple lines correspond to the SPM and LOOP methods, and the remaining lines are the other global methods (HSM, SBM, and LR) and the similarity-based methods. From the results, it is obvious that SPM, LOOP and our methods perform better than, and are competitive with, the others. Two exceptions appear in the figures, the FWFD and FWMW networks, on which our proposed framework performs markedly better than the other methods. In fact, our methods have competitive and stable performance. (In Table 1, r is the assortativity coefficient.)

Table 3. Link prediction accuracy measured by AUC on the 10 real networks. We compared our methods (NMF^3-1, NMF^3-2) with the other methods on the 10 network data sets; the values are averages over 100 runs. For each data set, the observed links are partitioned into a training set (90%) and a test set (10%).

Table 4. Link prediction accuracy measured by Prediction-Power on the 10 real networks. We compared our methods (NMF^3-1, NMF^3-2) with the other methods on the 10 network data sets; the values are averages over 100 runs. For each data set, the observed links are partitioned into a training set (90%) and a test set (10%).

Discussion
In this paper, we have proposed a link prediction framework which can make use of multi-class organizations of the network. We take two kernel functions as special cases of the proposed framework, and experiments show the feasibility, effectiveness, and competitiveness of the framework.
As an extension of nonnegative matrix factorization, our proposed framework for link prediction not only inherits its advantages but also takes full advantage of multiple organizations of the network via kernel functions. Furthermore, we proposed a gradient descent algorithm to optimize the objective function and extended it to large networks. Moreover, since it is based on nonnegative matrix factorization, our framework is easily extended to directed and weighted networks, simply by letting X be directed and weighted. We believe the proposed method highlights research on exploiting different kinds of structural information for link prediction.
There remain some limitations of, and possible improvements to, our proposed framework. One is how to set the parameters γ and λ adaptively on different networks. Moreover, since our framework only combines the adjacency matrix with one other organization of the network, making the best of more classes of structure information in a principled and effective way is our next work.

Methods
In this section, we introduce how to optimize objective function 6 with a gradient descent algorithm, give a simple operational procedure for the algorithm, and propose a strategy to scale the algorithm to larger networks.
Parameter learning. The determination of the number of latent features C is a very important and difficult problem in matrix factorization. Here, since it is not our primary concern, we adopt an easy and effective method for the automatic determination of C, Colibri 42, which seeks a nonorthogonal basis by sampling the columns of the input matrix.
Because objective function 6 is non-convex, we alternately update W with H fixed and H with W fixed under the Majorization-Minimization framework 43. We rewrite objective function 6 as

L(W, H) = ½ Σ_ij (1 + γR_ij)(X_ij − (WH)_ij)² + (λ/2)(‖W‖_F² + ‖H‖_F²).

Here, · represents element-wise multiplication. To enforce the nonnegativity constraints on W and H, we introduce the Lagrangian and write equation 6 as

L(W, H, Φ, Ψ) = L(W, H) + tr(ΦW^T) + tr(ΨH^T),

where Φ and Ψ are the Lagrange multipliers. Following the Karush-Kuhn-Tucker (KKT) optimality conditions 44, Φ_ij W_ij = 0 and Ψ_ij H_ij = 0, which lead to the multiplicative update rules

W ← W · [((1 + γR) · X)H^T] / [((1 + γR) · (WH))H^T + λW],
H ← H · [W^T((1 + γR) · X)] / [W^T((1 + γR) · (WH)) + λH],

where the multiplication and division are element-wise.

Algorithm for NMF^3. Here, we summarize the algorithm for the proposed NMF^3, following the link prediction procedure, in Algorithm 1.
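One round of the multiplicative updates can be sketched as follows. This assumes the weighted-quadratic form of the objective reconstructed above; the small `eps` guards against division by zero and is our addition.

```python
import numpy as np

def nmf3_update(X, R, W, H, gamma=0.1, lam=2.0, eps=1e-12):
    """One round of multiplicative updates: the numerator and
    denominator are the negative and positive parts of the gradient
    of the weighted objective, as in standard regularized NMF."""
    Wt = 1.0 + gamma * R                      # entry-wise weights
    # Update W with H fixed
    num_W = (Wt * X) @ H.T
    den_W = (Wt * (W @ H)) @ H.T + lam * W + eps
    W = W * num_W / den_W
    # Update H with W fixed
    num_H = W.T @ (Wt * X)
    den_H = W.T @ (Wt * (W @ H)) + lam * H + eps
    H = H * num_H / den_H
    return W, H
```

Starting from positive random W and H, repeated calls keep the factors nonnegative and do not increase the objective, consistent with the Majorization-Minimization guarantee.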
Complexity analysis and discussion. Here, we give a simple complexity analysis of the proposed algorithm. The most time-consuming parts are the updates of W and H. For each iteration, the time cost of ((1 + γR) · X)H^T is O(N²C + N²), and the time cost of ((1 + γR) · (WH))H^T + λW is O(N²C + NC² + N²), so the total time cost of the algorithm is O(n_iter(N²C + NC² + N² + NC)) ~ O(n_iter N²C), where n_iter is the number of iterations. If we exploit the sparsity of real-world networks, the time cost can be reduced to O(n_iter mC), where m is the number of edges of the network. A remaining issue is that the algorithm only converges to a local minimum, so we need to run the algorithm several times and choose the solution with the smallest value of the objective function.
Scale to large networks. In order to deal with large networks, we rewrite objective function 6 as

L(W, H) = ½ Σ_{i~j} (1 + γR_ij)(X_ij − W_i·H_·j)² + (λ/2)(‖W‖_F² + ‖H‖_F²),   (25)

where i ~ j indicates that there exists an edge between nodes i and j, so that we only compute over the observed links in the training set of the network. The optimization process of equation 25 is similar to Algorithm 1.
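Restricting the sum to observed pairs makes per-edge stochastic updates natural. The sketch below shows one possible gradient step on a single pair under the edge-restricted objective above; it is our illustration of the idea, not the authors' algorithm, and the projection to nonnegative values is our addition.

```python
import numpy as np

def sgd_step_on_edge(X, R, W, H, i, j, gamma=0.1, lam=2.0, lr=0.01):
    """One stochastic step on a single observed pair (i, j): only the
    i-th row of W and the j-th column of H are touched, so the cost
    per step is O(C) instead of O(n^2)."""
    w = 1.0 + gamma * R[i, j]
    err = X[i, j] - W[i, :] @ H[:, j]
    grad_Wi = -w * err * H[:, j] + lam * W[i, :]
    grad_Hj = -w * err * W[i, :] + lam * H[:, j]
    W[i, :] -= lr * grad_Wi
    H[:, j] -= lr * grad_Hj
    # Project back onto the nonnegative orthant after the step
    np.maximum(W[i, :], 0.0, out=W[i, :])
    np.maximum(H[:, j], 0.0, out=H[:, j])
    return W, H
```

Sweeping over the training edges (plus a sample of non-edges, if desired) gives one pass of stochastic optimization over equation 25.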