An information theoretic approach to link prediction in multiplex networks

The entities of real-world networks are connected via different types of connections (i.e., layers). The task of link prediction in multiplex networks is to find missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applying SimBins to various datasets from diverse domains, we find that it outperforms the compared methods (both baseline and state-of-the-art) in most instances. Furthermore, we argue that SimBins adds only minor computational overhead to the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.

of node pairs. In 17 , the mutual information (MI) of common neighbors is incorporated to estimate the connection likelihood of a node pair. In addition, the Path Entropy (PE) 18 similarity index takes the quantity and length of paths, as well as their entropy, into account, which results in a better assessment of connection likelihood for node pairs. In 19 , the authors proposed an information-theoretic method that benefits from several structural features at the same time. Using information theory, they score each structural feature separately and then combine the scores by weighted summation. They then apply the idea to common neighbors and connectivity of neighbor sets as two structural features. Although most of the link prediction literature is devoted to unweighted networks, a few works have targeted weighted networks. In 20 , the authors use a weighted mutual information to predict weighted links, which benefits from both structural properties and link weights. The results are promising when compared to both weighted and unweighted methods.
In a coarse-grained sense, learning-based link prediction models belong to a different class than the aforementioned similarity-based ones. They learn a group of parameters by processing the input graph and use certain models, such as feature-based prediction (HPLP 21 ) and latent feature extraction (Matrix Factorization 15 ). Representation learning has helped automate the entire process of link prediction, especially feature selection; node2vec 22 and DGI 23 are examples. Recently, an interesting multiplex embedding model called DMGI 24 has also been proposed, which is basically an extension of DGI. Learning-based methods often yield better results than their similarity-based counterparts, but that does not make similarity-based models obsolete. On the one hand, similarity-based models provide a better understanding of the underlying characteristics of networks. Take, for example, common neighbors (CN), which reflects the high clustering property of networks 18 , or the Adamic-Adar index, which is based on the size of the common nodes' neighborhoods 9 . On the other hand, similarity-based methods often require less computational effort, making them suitable for online prediction without costly training procedures or feature selection stages 25 .

Related works
Complex networks research focused on single-layer (simplex or monoplex) networks for many years. The study of multi-layer (multiplex or heterogeneous) networks has gained the attention of researchers in the past few years. Refs. 26,27 provide noteworthy reviews of the history of multi-layer networks. Attempts to predict links in multi-layer networks are not abundant; some are discussed here.
Hidden geometric correlation in real multiplex networks 28 is an interesting work which shows that multiplex networks are not just random combinations of single-layer networks. The authors employ these geometric correlations for trans-layer link prediction, i.e., incorporating observations of other layers to predict connections in a specific layer. This work is followed by a study arguing that a link persistence factor is required to explain the high edge overlap in real multiplex systems 29 . In heterogeneous networks (i.e., networks with different types of nodes and relations), several similarity-search approaches have been proposed. PathSim 30 is a meta-path-based similarity measure that can find similar peers in heterogeneous networks (e.g., authors in similar fields in a bibliographic network). The intuition behind PathSim is that two peer objects are similar if they are not only strongly connected, but also share comparable visibility (number of path instances from a node to itself). HeteSim 31 is another method of the same kind which can measure the similarity of objects of different types, inspired by the intuition that two objects are related if they are referenced by related objects. Their drawback, however, is their dependence on the connectivity degrees of node pairs (neglecting further information provided by the meta-paths themselves) and the need to use a single, usually symmetric, meta-path. In 32 , a mutual information model has been employed to tackle these problems. Most meta-path-based models lack an automated meta-path selection mechanism; in other words, pre-defined meta-paths (mostly specific to the dataset under study) are used for prediction. In the previously discussed methods, including longer meta-paths requires much more computation to analyze them and determine their effects.
Link prediction for multiplex networks has been addressed by researchers using features and machine learning. A study of a multiplex online social network demonstrates, based on available side information, that multiplex links (link overlap) are associated with significantly higher interaction between users 33 . The authors consider the Jaccard similarity of the extended neighborhood of nodes in the multiplex network as a feature for training a classifier for the link prediction task. A similar work on the same dataset benefits from node-based and meta-path-based features 34 . A specialized type of these meta-paths is tailored to originate from and end at communities. The effectiveness of the features has been examined by binary classification for the link prediction task. Recently, other interlayer similarity features, based on degree, betweenness, clustering coefficient and similarity of neighbors, have been used 35 .
Furthermore, the issue of link prediction has been investigated in a scientific collaboration multiplex network 36 . The authors have proposed a supervised rank aggregation paradigm to benefit from the node pairs ranking information which is available in other layers of the network. Another study uses rank aggregation method on a time-varying multiplex network 37 .
Yao et al. in 38 discuss the issue of layer relevance and its effect on the link prediction task. The authors use the global link overlap rate (GOR) and the Pearson correlation coefficient (PCC) of node features as measures of layer relevance, and later use them to combine the basic similarity measures of each layer. The results support that the more relevant the layers are, the better the link prediction performance. In this work, well-known single-layer similarity measures like CN, RA, and LPI are used. We compare our work with their best performing methods. They show that LPI, as a quasi-local metric, is the best choice of base similarity measure. For interlayer relevance both GOR and PCC perform well, and we refer to them as YaoGL and YaoPL, respectively. Samei et al. have studied the effect of other layers on the target layer using the global link overlap rate 39 . Two features based on hyperbolic distance are used, WCN and HP. WCN uses the network embedded in geometric space and calculates the hyperbolic distance of nodes to weigh the importance of common neighbors. HP considers the hyperbolic distance of nodes

Methods
Link prediction in multiplex networks. Consider a multiplex network
$$G(V, E^1, E^2, \ldots, E^M), \quad E^\alpha \subseteq V \times V, \quad \alpha \in \{1, 2, \ldots, M\},$$
where $M$, $V$ and $E^\alpha$ are the number of layers, the set of all nodes and the set of existing edges in layer $\alpha$ of the multiplex network, respectively. Let $U = V \times V$ be the set of all possible node pairs. The current research studies undirected multiplex networks; therefore, it is assumed that $G(V, E^\alpha)$ for any arbitrary layer $\alpha$ is an undirected simple graph. Link prediction in multiplex networks is concerned with predicting missing links in an arbitrary target layer $T \in \{1, 2, \ldots, M\}$ with the help of other auxiliary layers. To be able to evaluate the proposed method, $E^T$, i.e., the set of edges in the target layer, is divided into a training set $E^T_{train}$ (90% of $E^T$) and a test set $E^T_{test}$ (the remaining 10%):
$$E^T = E^T_{train} \cup E^T_{test}, \quad E^T_{train} \cap E^T_{test} = \emptyset.$$
Only the information provided by the training set is used in the prediction task and, eventually, $E^T_{test}$ is compared to the output of the proposed algorithm (link-existence likelihood scores for a subset of $U - E^T_{train}$, including $E^T_{test}$) to determine the performance of the method. To be more specific, link likelihood scores are calculated for the node pairs of $E^T_{test}$ and a random subset $Z^T_{test} \subset U - E^T$ of non-existent links with $|Z^T_{test}| = 2\,|E^T_{test}|$. In short, only a subset of the links not observed in the training set is scored, in order to limit the computational cost, which will be discussed in detail later. Notice the coefficient 2, a ratio incorporated to reflect the link imbalance in real networks (which are mostly sparse by nature 44 ).
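The evaluation setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_target_layer`, its parameters and the rejection-sampling strategy for non-links are assumptions; only the 90/10 split and the 2:1 ratio of sampled non-links to test links come from the text.

```python
import random

def split_target_layer(edges, nodes, train_frac=0.9, neg_ratio=2, seed=0):
    """Split target-layer edges into train/test sets and sample non-links.

    Hypothetical helper: returns (train_edges, test_edges, sampled_non_links).
    """
    rng = random.Random(seed)
    # Store undirected edges in canonical (min, max) order, without self-loops.
    edges = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    rng.shuffle(edges)
    cut = int(round(train_frac * len(edges)))
    train, test = edges[:cut], edges[cut:]

    existing = set(edges)
    nodes = sorted(nodes)
    n_possible = len(nodes) * (len(nodes) - 1) // 2
    # Sample neg_ratio non-links per test link (capped by what is available).
    wanted = min(neg_ratio * len(test), n_possible - len(existing))
    negatives = set()
    while len(negatives) < wanted:
        u, v = rng.sample(nodes, 2)
        pair = (min(u, v), max(u, v))
        if pair not in existing:
            negatives.add(pair)
    return train, test, sorted(negatives)
```

Rejection sampling is adequate here because real networks are sparse, so a random pair is rarely an existing edge.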
In the present study, the issue under scrutiny is how employing one layer of the multiplex network, such as $A$, facilitates the task of link prediction in another layer $T$, where $T, A \in \{1, \ldots, M\}$ and $T \neq A$, i.e., a duplex subset of the multiplex network. In the 'Discussion' section, we argue how one can extend the proposed method to utilize the structural information of multiple layers for link prediction.
Evaluation methods. In their ideal form, link prediction algorithms rank the non-observed links in a network so that all latent links are situated at the top of the ranking and all non-existent links underneath. This ranking is based on a link-likelihood score that is assigned to node pairs corresponding to non-observed links in the network. For imperfect rankings, a metric is required to assess the quality of the ranking. Here, we describe the two evaluation metrics used in this research.
AUC: The Area Under the Receiver Operating Characteristic Curve (AUC or AUROC) 45 is prominent in the literature for evaluating link prediction methods 16 . AUC indicates the probability that a randomly chosen missing link is scored higher than a randomly chosen non-existent link, and is computed as:
$$AUC = \frac{n' + 0.5\,n''}{n}, \quad (1)$$
where, out of $n$ independent comparisons ($n = 10000$ in our experiments), a randomly chosen latent link has a higher score than a randomly chosen non-existent link $n'$ times and the two are equally scored $n''$ times. AUC is 1 if the node pairs are flawlessly ranked and 0.5 if the scores are independent and identically distributed; the higher the AUC, the better the scoring scheme.
Precision: Given the list of non-observed links ranked by score, the precision is defined as the ratio of the missing links to the number of selected items from the top of the list. That is to say, if we take the top-$L$ links as the predicted ones, among which $L_r$ links are known missing links, Precision is defined as:
$$Precision = \frac{L_r}{L}. \quad (2)$$
Here, we consider $L = |E^T_{test}|$. Clearly, higher precision indicates higher prediction accuracy.
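Both metrics can be sketched in a few lines. The function names and the input layout (lists of scores, and a list of `(pair, score)` tuples) are assumptions made for illustration; the sampled AUC follows the $(n' + 0.5\,n'')/n$ form and the precision uses $L = |E^T_{test}|$ by default.

```python
import random

def sampled_auc(pos_scores, neg_scores, n=10000, seed=0):
    """Estimate AUC from n random (missing link, non-existent link) comparisons."""
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n):
        p = rng.choice(pos_scores)   # score of a random missing (test) link
        q = rng.choice(neg_scores)   # score of a random non-existent link
        if p > q:
            higher += 1
        elif p == q:
            ties += 1
    return (higher + 0.5 * ties) / n

def precision_at_L(scored_pairs, test_edges, L=None):
    """Fraction of the top-L scored pairs that are true test links."""
    test_edges = set(test_edges)
    L = len(test_edges) if L is None else L
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    hits = sum(1 for pair, _ in ranked[:L] if pair in test_edges)
    return hits / L
```

For example, a scoring scheme that assigns the same score to every pair yields a sampled AUC of exactly 0.5, since every comparison is a tie.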
Data. Various real-world multiplex network datasets from different domains are selected for investigation, from social (Physicians, NTN and CS-Aarhus) to technological (Air/Train and London Transport) and biological systems (C. Elegans, Drosophila and Human Brain). They also have diverse characteristics that are briefly introduced in Table 1.
Air/Train. This dataset is taken from 46 . To relate the train stations to the geographically nearby airports, in 28 all train stations within 50 km of an airport are aggregated into a super-node. The super-nodes are then considered connected if they share a common train station, or if one train station of one super-node is directly connected to a station of the other super-node. Air is the network of airports and Train is the network of aggregated train station super-nodes.
C. Elegans. The network of neurons of the nematode Caenorhabditis Elegans that are connected through miscellaneous synaptic connection types: Electric, Chemical Monadic and Chemical Polyadic 47 .
Drosophila Melanogaster (DM). Layers of this network represent different types of protein-protein interactions belonging to the fly Drosophila Melanogaster, namely suppressive genetic interaction and additive genetic interaction. More details can be found in 48,49 .
Human Brain (HB). The human brain multiplex network is taken from 28,50 . It consists of a structural or anatomical layer and a functional layer that connect 90 different regions of the human brain (nodes) to each other. The structural network is gathered by dMRI and the functional network by BOLD fMRI 50 . In this multiplex network, the structural connections are obtained by setting a threshold on connection probability of brain regions (which is proportional to density of axonal fibers in between) 28 . The functional interactions are derived in a similar manner, by putting a threshold on the connection probability of regions which is proportional to a correlation coefficient measured for activity of brain region pairs 28 .
Physicians. Taken from 51 , the Physicians multiplex dataset contains 3 layers which relate physicians in four US towns by different types of relationships; to be specific, advice, discuss and friendship connections.
Noordin Top Terrorist Network (NTN). Taken from 52 , this multiplex dataset describes the relationships among 78 individuals (Indonesian terrorists) with respect to exchanged communications, financial businesses, common operations and mutual trust.
London Transport. For the purpose of studying navigability performance under network failures, De Domenico et al. 53 gathered a dataset for public transport of London consisting of 3 different layers; the tube, the overground, and the docklands light railway (DLR). Nodes are stations which are linked to each other if a real connection exists between them in the corresponding layer.
CS-Aarhus. This dataset is taken from 54 , a study conducted among the employees of the Department of Computer Science at Aarhus University in Denmark. The network consists of 5 different interactions corresponding to current work relationships, repeated leisure activities, regularly eating lunch together, co-authorship of publications and friendship on Facebook.
www.nature.com/scientificreports/
SacchPomb. The SacchPomb dataset is taken from 28,48 and represents the multiplex genetic and protein interaction network of Schizosaccharomyces pombe (fission yeast). The multiplex consists of 5 layers corresponding to 5 different types of interactions. Layer 1 corresponds to direct interaction, Layer 2 to colocalization, Layer 3 to physical association, Layer 4 to synthetic genetic interaction, and Layer 5 to association. More details on the data can be found in 48 .
Node multiplexity in Table 1 shows the fraction of nodes in a multiplex network that are active (have at least one link attached) in more than one layer.
Information theory background. This sub-section is concerned with the issue of introducing necessary concepts of information theory, as it lays out the main mathematical background of the proposed method. What follows is the definition of self-information and mutual information.
Given a random variable $X$, the self-information or surprisal of the occurrence of event $x \in X$ with probability $p(x)$ is defined as 55 :
$$I(x) = \log \frac{1}{p(x)} = -\log p(x). \quad (3)$$
The self-information expresses how much uncertainty or surprise there is in the occurrence of an event; the less probable the outcome is, the more surprise it conveys. The base of the logarithm is assumed to be 2 throughout the paper, so uncertainty is measured in bits of information.
Let us proceed with the definition of mutual information between two random variables $X$ and $Y$ with joint probability mass function $p(x, y)$ and marginal probability mass functions $p(x)$ and $p(y)$, respectively. The mutual information $I(X; Y)$ is 56 :
$$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}. \quad (4)$$
Consequently, the mutual information of two events $x \in X$ and $y \in Y$ can be denoted as 17 :
$$I(x; y) = \log \frac{p(x \mid y)}{p(x)} = -\log p(x) - \left(-\log p(x \mid y)\right) = I(x) - I(x \mid y). \quad (5)$$
In fact, mutual information indicates how much two variables depend on each other, i.e., for a variable $X$, how much uncertainty is reduced by observing another variable $Y$. The mutual information is zero if and only if the two variables are independent. In the following section, we describe how these two measures play their roles in the design of our method.
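As a small worked illustration of these two quantities (the function names and the numbers are hypothetical, chosen only to make the arithmetic easy to follow): if a random node pair is linked with probability 0.05, but with probability 0.4 once it is known to lie in some high-similarity bin, then knowing the bin removes about 3 bits of uncertainty about the link.

```python
import math

def self_information(p):
    """Surprisal in bits of an event with probability p (0 < p <= 1)."""
    return -math.log2(p)

def event_mutual_information(p_x, p_x_given_y):
    """I(x; y) = I(x) - I(x | y): reduction in surprisal of x after observing y."""
    return self_information(p_x) - self_information(p_x_given_y)

# Illustrative numbers only: prior link probability 0.05, in-bin probability 0.4.
gain_bits = event_mutual_information(0.05, 0.4)   # log2(0.4 / 0.05) = log2(8)
```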

Base similarity measures.
There is extensive literature on similarity measures that determine how similar two nodes are in a single-layer network, as partially presented in the introduction of this paper. In our proposed method, a subset of these similarity indices (both local and global) is used as the base measures on top of which the multiplex link prediction model is built.
CN 1 : Perhaps the most well-known and typical way to measure the similarity of two nodes $x$ and $y$ is to count the number of their common neighbors:
$$S^{CN}_{xy} = |\Gamma(x) \cap \Gamma(y)|, \quad (6)$$
where $\Gamma(x)$ and $\Gamma(y)$ are the sets of neighbors of $x$ and $y$, respectively.
RA 10 : In Resource Allocation, each common neighbor $z$ of $x$ and $y$ is regarded as holding a unit of resource that it distributes equally among its own neighbors, so its contribution is inversely proportional to its degree $k_z$:
$$S^{RA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}. \quad (7)$$
ACT 1 : Random-walk based methods account for the steps required for reaching one node starting from some arbitrary node. Average Commute Time measures the average number of steps required for a random walker starting from node $x$ to reach node $y$ and return to $x$. To reduce the computational cost, the pseudo-inverse of the Laplacian matrix is used to calculate the commute time, and the similarity is taken as its reciprocal (up to a constant factor):
$$S^{ACT}_{xy} = \frac{1}{l^+_{xx} + l^+_{yy} - 2\,l^+_{xy}}, \quad (8)$$
where $l^+_{xy}$ is the $[x, y]$ entry of the pseudo-inverse Laplacian matrix, i.e., $l^+_{xy} = [L^+]_{xy}$. The pseudo-inverse of the Laplacian is calculated as 57 :
$$L^+ = \left(L - \frac{e e'}{n}\right)^{-1} + \frac{e e'}{n}, \quad (9)$$
where $e$ is a column vector of 1's ($e'$ is its transpose) and $n$ is the total number of nodes.
LPI 10,13 : To provide a good tradeoff between accuracy and computational complexity, the Local Path Index (LPI) is introduced as an index that takes local paths into consideration, with a wider horizon than CN. It is defined as:
$$S^{LPI} = A^2 + \varepsilon A^3, \quad (10)$$
where $A$ is the adjacency matrix and $\varepsilon$ is a free parameter. Clearly, this measure degenerates to CN when $\varepsilon = 0$. If $x$ and $y$ are not directly connected, $(A^3)_{xy}$ is equal to the number of different paths of length 3 connecting $x$ and $y$. This index can be extended to higher-order paths and, when paths of infinite length are considered, it converges to the Katz index. The LP index performs remarkably better than the neighborhood-based indices, such as RA and CN. Throughout the current work, $\varepsilon$ is set to $10^{-4}$ wherever LPI is used. This is the same for the compared methods. In 16 , it is stated that the value of $\varepsilon$ can be directly set to a very small number instead of searching for its optimum, which may take a long time. In particular, the essential advantage of using a second-order neighborhood is to improve the distinguishability of similarity scores.
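The four base measures can be computed directly from a dense adjacency matrix, as in the following sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, `numpy.linalg.pinv` stands in for the closed-form pseudo-inverse, the graph is assumed connected for ACT, and dense matrix products are only practical for small networks.

```python
import numpy as np

def base_similarities(A, eps=1e-4):
    """Return CN, RA, LPI and ACT score matrices for an undirected simple graph.

    A is a symmetric 0/1 adjacency matrix. Diagonal entries of the outputs
    refer to self-pairs and should be ignored.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)

    cn = A @ A                                   # (A^2)_xy = number of common neighbors
    inv_deg = np.divide(1.0, deg, out=np.zeros(n), where=deg > 0)
    ra = A @ np.diag(inv_deg) @ A                # sum over common neighbors z of 1/k_z
    lpi = cn + eps * (A @ A @ A)                 # A^2 + eps * A^3

    lap = np.diag(deg) - A                       # graph Laplacian L = D - A
    lap_pinv = np.linalg.pinv(lap)               # Moore-Penrose pseudo-inverse L^+
    d = np.diag(lap_pinv)
    commute = d[:, None] + d[None, :] - 2.0 * lap_pinv
    act = np.divide(1.0, commute, out=np.zeros((n, n)), where=commute > 1e-12)
    return {"CN": cn, "RA": ra, "LPI": lpi, "ACT": act}
```

On the three-node path 0-1-2, for instance, nodes 0 and 2 share the single neighbor 1 of degree 2, so CN is 1 and RA is 0.5 for that pair.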
For more details on base similarity measures, readers are encouraged to see surveys on link prediction algorithms 16,58 .

Results
Does the structure of one layer of a multiplex provide any information on the formation of links in some other layer of the same network? Take a social multiplex network, for example, in which one layer represents people's work relationships and the other represents their friendships. Intuitively, it can be conjectured that in a real multiplex like this sample social network, structural changes in one layer can affect the other; if two people become colleagues, their chances of being friends will probably not be the same as before. More specifically, is there any correlation among the structures of the layers of a multiplex network? This question has been positively answered in previous studies with different approaches. In 28 , a null model is created for a multiplex network by randomly reshuffling inter-layer node-to-node mappings. Subsequently, it is shown that the geometric inter-layer correlations are destroyed in the null model compared to the original network.
Various structural features can be analyzed to uncover correlations between layers. Direct links, common neighbors, paths 1 and eigenvectors 59 are such examples. In the following sections we develop a set of tools that help collect evidence about inter-layer correlations in multiplex networks, as basic intuitions supporting the proposed link prediction framework.
Partitioning Node Pairs (Binning). Consider two layers $T, A \in \{1, 2, \ldots, M\}$, $T \neq A$, of a multiplex network with $M$ layers and node set $V$. $T$ is the target layer, in which the likelihood of the presence of links is to be predicted, and $A$ is the auxiliary layer assisting the prediction task. A subset $U' \subseteq U$ of node pairs is considered:
$$U' = E^T_{train} \cup Z^T_{train}, \quad Z^T_{train} \subset U - E^T_{train},$$
where $Z^T_{train}$ is a random sample of node pairs that are not linked in the training set. The size of $Z^T_{train}$ is twice that of $E^T_{train}$, so that $U'$ is a suitable representative of the target layer given the link imbalance phenomenon in real complex systems. Two different partitions of $U'$ are formed (using equal-depth binning, described in the following paragraph):
(i) with respect to the target layer $T$: $\{S^T_1, S^T_2, \ldots, S^T_{b_T}\}$;
(ii) with respect to the auxiliary layer $A$: $\{S^A_1, S^A_2, \ldots, S^A_{b_A}\}$.
These partitions are referred to as bins of node pairs in the current study. The numbers of bins w.r.t. the target and auxiliary layers are $b_T$ and $b_A$, respectively. An equal-depth (frequency) binning strategy is applied to the target-layer similarity scores of the node pairs in $U'$, so that each partition $S^T_i$, $i \in \{1, 2, \ldots, b_T\}$, contains approximately the same number of members (node pairs). The same strategy is applied to the similarity scores in the auxiliary layer $A$, establishing the partitions $S^A_j$, $j \in \{1, 2, \ldots, b_A\}$. It should be noted that $S^T_i$ and $S^A_j$ belong to two different partitions of the same set, namely $U'$; the superscript in the notation distinguishes the two. Therefore, for $i = j$, $S^T_i$ is not necessarily equal to $S^A_j$, because the former partitioning is based on similarity in the target layer while the latter is based on similarity in the auxiliary layer.
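Equal-depth binning of a score vector can be sketched with quantile cut points, as below. The function name and the tie-handling rule (a score equal to a cut point goes to the higher bin) are assumptions; the text does not specify how ties are resolved. Bin indices here start at 0.

```python
import numpy as np

def equal_depth_bins(scores, n_bins=10):
    """Assign each score a bin index in 0..n_bins-1 with roughly equal counts."""
    scores = np.asarray(scores, dtype=float)
    # Interior quantile cut points (n_bins - 1 of them).
    cuts = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # side="right": a score equal to a cut point is placed in the higher bin.
    return np.searchsorted(cuts, scores, side="right")
```

With heavily tied scores (for example, many pairs with similarity 0), several cut points coincide and some bins stay empty, which is the situation the paper handles by imputing values for empty bins.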
Aforementioned partitions (bins) form the building blocks of how the multiplex networks are scrutinized in this paper, as they put forward a coarse-grained view of the data; tolerating the insignificant fluctuations observed in particular regions of the networks. The setting denoted above will be used from now onwards, to avoid any further repetitions.
Intra-layer and trans-layer connection probabilities. The foregoing discussion introduces two key measures for the target and auxiliary layer bins, namely $S^T_i$ and $S^A_j$: (1) the intra-layer connection probability $p_{intra}(S^T_i)$, and (2) the trans-layer connection probability $p^T_{trans}(S^A_j)$. The intra-layer connection probability in $S^T_i$ is the connection likelihood of the pairs in that bin. This measure can also be expressed as the conditional probability of connection of an arbitrary node pair $(x, y)$ in layer $T$, given their similarity (bin) in the same layer:
$$p_{intra}(S^T_i) = p\left(L^T = 1 \mid (x, y) \in S^T_i\right), \quad (11)$$
where $L^T = 1$ is the event that a randomly selected pair $(x, y)$ is linked in layer $T$. Empirically, $p_{intra}(S^T_i)$ is computed as the proportion of linked node pairs in $S^T_i$ to all node pairs in that set:
$$p_{intra}(S^T_i) = \frac{|S^T_i \cap E^T_{train}|}{|S^T_i|}. \quad (12)$$
The intra-layer connection probability of each bin is shown for four different multiplex (duplex) networks in Fig. 1. In the data-driven observations of this paper, wherever a similarity measure is involved, the Resource Allocation (RA) index is used unless otherwise specified. Additionally, the numbers of bins in both the target and auxiliary layers, i.e., $b_T$ and $b_A$, are set to 10. Our experiments show that too small a number of bins leads to a significant drop in prediction results.
In most cases, increasing the number of bins either has no effect on prediction results or degrades them (although not significantly). Additionally, a large number of bins brings unnecessary computational complexity to our algorithm. We have also tried a more adaptive approach for choosing the number of bins, by maximizing the entropy of the distribution of node pairs over bins, which led to no substantial improvement in prediction. A value between 10 and 50 is recommended, as SimBins shows no significant sensitivity in terms of accuracy within this range and the computational overhead is minuscule. The bars with dashed lines in Fig. 1 represent imputed values. Because of the high frequency of certain similarity values (such as 0 scores in RA for node pairs with no common neighbors), a perfect equal-depth binning may not be feasible; as a result, a number of bins will contain no sample node pairs. The value of the intra-layer connection probability for these bins has been imputed using a penalized least squares method which allows fast smoothing of gridded (missing) data 60 . In addition to giving clearer observations, this imputation lets us fix the number of bins and handle missing data in a systematic way. The results indicate that as similarity increases (higher bin numbers), the intra-layer connection probability increases as well, depicting a positive correlation between similarity (bin number) and intra-layer connection probability, as stated in the seminal work of Liben-Nowell and Kleinberg 1 .
Trans-layer connection probability is defined analogously, except that although connection in the target layer $T$ is concerned, the similarity scores of node pairs are taken from the auxiliary layer $A$. Similar to formula (11), $p^T_{trans}(S^A_j)$ can be defined as follows:
$$p^T_{trans}(S^A_j) = p\left(L^T = 1 \mid (x, y) \in S^A_j\right). \quad (13)$$
The empirical value of the trans-layer connection probability is calculated likewise:
$$p^T_{trans}(S^A_j) = \frac{|S^A_j \cap E^T_{train}|}{|S^A_j|}. \quad (14)$$
In other words, $p^T_{trans}$ w.r.t. $A$ relates the similarity of node pairs in layer $A$ to their probability of connection in layer $T$. The trans-layer connection probability of four duplexes is depicted in the left column of Fig. 2. Moreover, the node pairs in $S^A_j$ can be divided into two disjoint sets based on their connectivity in the auxiliary layer. The trans-layer connection probabilities for the node pairs connected in the auxiliary layer, $S^A_j \cap E^A$, and for the unconnected ones, $S^A_j \cap (U - E^A)$, are then:
$$p^T_{trans}(S^A_j \cap E^A) = \frac{|S^A_j \cap E^A \cap E^T_{train}|}{|S^A_j \cap E^A|} \quad (15)$$
and:
$$p^T_{trans}\left(S^A_j \cap (U - E^A)\right) = \frac{|S^A_j \cap (U - E^A) \cap E^T_{train}|}{|S^A_j \cap (U - E^A)|}, \quad (16)$$
as shown in the middle and right columns of Fig. 2, respectively. The bars with dotted lines represent imputed trans-layer connection probabilities, similar to the intra-layer connection probabilities in Fig. 1. Inspecting the values of trans-layer connection probabilities for the datasets under study, a rising pattern is prominent when moving to bins corresponding to higher similarity ranges. Drosophila (Fig. 2d1-3) is an exceptional case, where similarity in the auxiliary (Additive) layer shows no correlation with connection in the target (Suppressive) layer. Apart from this kind of irregularity in the data, the available evidence suggests that in most real multiplex networks, the probability of connection in one (target) layer of the network is positively correlated with similarity in some other (auxiliary) layer, i.e., higher similarity in the auxiliary layer can be a signal of higher connection probability in the target layer.
This observation supports the claim that, for link prediction in the target layer, not only the similarity of nodes in that same layer but also their similarity in some other auxiliary layer can be utilized. Notice that this rising pattern in $p_{trans}$ is observed in almost all datasets under scrutiny, independent of the choice of similarity measure.
The previously described property of trans-layer connection probability lies at the heart of the current study, shaping the main idea of the proposed multiplex link prediction method. In addition, the connectedness of the node pairs in the auxiliary layer leads to a significant increase in the trans-layer connection probabilities. In the Human Brain and Physicians networks, the presence of a link in the auxiliary layer is strong evidence of connectivity in the target layer. The case is similar for the Air/Train network, but with lower certainty. The Drosophila network is an exception, as before. These findings are consistent with the link persistence phenomenon reported in 29 . Here, we propose a consolidated method which considers the similarity of node pairs in the target and auxiliary layers, and also their connectedness in the auxiliary layer, as the underlying evidence for calculating the uncertainty of linkage in the target layer.
Furthermore, by simultaneously partitioning $U'$ based on similarity in both the target and auxiliary layers, we obtain $b_T \times b_A$ partitions, or 2d-bins. Within each 2d-bin, the fraction of target-layer links among all node pairs is computed, i.e., the empirical connection probability in the target layer. In Fig. 3, the empirical probability of connection in 2d-bins is presented for the same duplexes as in Fig. 2.
Several results can be inferred by scrutinizing Fig. 3. The increase of the empirical probability of connection along the horizontal axis expresses the effectiveness of the similarity measure in the target layer; the higher the bin number, the larger the fraction of node pairs that have formed links. Another aspect of the figure is the rise of the empirical probability of connection when moving to higher bin numbers in the auxiliary layer, i.e., along the vertical axis (except for Drosophila in Fig. 3d1-3), which is a sign of positive correlation between the probability of connection in the target layer and similarity in the auxiliary layer, fully consistent with Figs. 1 and 2. This cross-layer correlation between connection and similarity is observed in the majority of the datasets under study, a subset of which is presented above. Interestingly, when the similarity of a node pair is very low in the target layer, high similarity in the auxiliary layer leads to a higher connection probability between them. The following sub-sections are concerned with how to estimate the probability of connection in the target layer of a multiplex network by incorporating other layers' structural information with a systematic approach that generalizes beyond specific data.
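The empirical intra-layer and trans-layer probabilities are all per-bin link fractions over $U'$, differing only in which binning is used and which subset of pairs is counted. A minimal sketch, assuming hypothetical arrays aligned over the pairs of $U'$ (`bin_ids` with 0-based bin indices, `linked` marking membership in the training links of the target layer, and an optional boolean `mask`):

```python
import numpy as np

def bin_link_fraction(bin_ids, linked, n_bins, mask=None):
    """Fraction of target-layer training links in each bin (NaN for empty bins)."""
    bin_ids = np.asarray(bin_ids, dtype=int)
    linked = np.asarray(linked, dtype=float)
    if mask is None:
        mask = np.ones(len(bin_ids), dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    total = np.bincount(bin_ids[mask], minlength=n_bins)
    hits = np.bincount(bin_ids[mask], weights=linked[mask], minlength=n_bins)
    return np.divide(hits, total, out=np.full(n_bins, np.nan), where=total > 0)
```

Passing target-layer bins gives the intra-layer probabilities; passing auxiliary-layer bins gives the trans-layer probabilities, and adding `mask=connected_in_A` or `mask=~connected_in_A` (a hypothetical boolean array) gives the connected and unconnected variants. Empty bins are left as NaN here; the paper imputes them with a smoothing method that this sketch does not reproduce.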
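The 2d-bin table can be sketched in the same spirit. The function name and inputs are assumptions: `bins_T` and `bins_A` hold the 0-based target and auxiliary bin indices of each pair in $U'$, and `linked` marks pairs linked in the target-layer training set.

```python
import numpy as np

def link_fraction_2d(bins_T, bins_A, linked, b_T=10, b_A=10):
    """Empirical target-layer connection probability in each (target, auxiliary) 2d-bin."""
    bins_T = np.asarray(bins_T, dtype=int)
    bins_A = np.asarray(bins_A, dtype=int)
    total = np.zeros((b_T, b_A))
    hits = np.zeros((b_T, b_A))
    # np.add.at accumulates correctly when the same cell index repeats.
    np.add.at(total, (bins_T, bins_A), 1.0)
    np.add.at(hits, (bins_T, bins_A), np.asarray(linked, dtype=float))
    return np.divide(hits, total, out=np.full((b_T, b_A), np.nan), where=total > 0)
```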

Fusion of decisions.
Consider two independent decision makers that each estimate the probability of occurrence of a certain event corresponding to a binary random variable. They declare probabilities $p$ and $q$ (where $0 \leq p, q \leq 1$) for the same event, respectively. One would want to reach a consensus based on these two different opinions. This goal can be achieved with various functions that operate on the input probabilities. The AND operator is one such function:
$$p \wedge q = p\,q. \quad (17)$$
Another option is the OR operator, defined as:
$$p \vee q = 1 - (1 - p)(1 - q). \quad (18)$$
The more interesting function in the context of the current research is the OR operator, because it fits the problem of link prediction much better, being less sensitive to variations in only one of the input probabilities. We will return to the issue of fusion of decisions in the following sub-section when characterizing the link prediction model.
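A short numerical comparison makes the difference between the two operators concrete (function names are illustrative). When one of the two opinions is close to zero, the AND fusion collapses to zero, while the OR fusion stays close to the other opinion.

```python
def fuse_and(p, q):
    """Probability that both independent decision makers' events occur."""
    return p * q

def fuse_or(p, q):
    """Probability that at least one of two independent events occurs."""
    return 1.0 - (1.0 - p) * (1.0 - q)

# One source is uninformative (q = 0): AND discards p, OR keeps it.
and_result = fuse_and(0.3, 0.0)
or_result = fuse_or(0.3, 0.0)
```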
The multiplex link prediction model. On these grounds, a model is suggested to predict probability of connection between node pairs in a layer of the multiplex network such as T which incorporates information both from the layer itself and from some other auxiliary layer A . The similarity between two distinct nodes x and y is defined as:   where S T xy is the min-max normalized similarity score of the pair (x, y) in target layer T i.e., the probability of connection in target layer (without any knowledge on bins partitioning) is estimated with similarity in that same layer, intuitively. The second term in Eq. (20) is the mutual information of (x, y) being connected in the target layer and belonging to S T i and S A j bins; which is estimated as follows: Equation (22) propounds the view that a group of node pairs dwelling in known target and auxiliary bins can be looked at similarly. To be more specific, if the goal is to obtain the mutual information between the event that (x, y) are connected and the event that it resides in both S T i and S A j , a possible workaround is to estimate it with the reduction in uncertainty of connection of any node pair due to which bins (target and auxiliary) it belongs to. Thus, according to Eq. (5), we proceed by expanding the right-hand side of Eq. (22): The term I(L T = 1) in Eq. (23) is the self-information of that a randomly chosen node pair is linked in target layer T . Clearly, I(L T = 1) is the same for every node pair in the multiplex network; therefore, it does not affect the scoring (node pairs ranking), and it can be safely neglected. Thus, to carry out the model specification, I(L T = 1|S T i , S A j ) needs to be calculated; which is the conditional self-information of that a randomly chosen node pair is linked in layer T when the pair's state of binning in target and auxiliary layer is known. Using Eq.
. On the basis of our discussion on fusion of decisions, the probability $p(L^T = 1 \mid S^T_i, S^A_j)$ for any randomly selected node pair $(x, y)$ that is a member of $S^T_i \cap S^A_j$ is estimated by combining $p_{intra}(S^T_i)$, i.e., the intra-layer connection probability in target layer $T$, and $p^T_{trans}(S^A_j)$, i.e., the trans-layer connection probability in $T$ with respect to auxiliary layer $A$. Therefore, similar to Eq. (18), the OR operation on the intra-layer and trans-layer connection probabilities results in: $p(L^T = 1 \mid S^T_i, S^A_j) = 1 - (1 - p_{intra}(S^T_i))(1 - p^T_{trans}(S^A_j))$. It should be noted that the trans-layer connection probability is defined separately for connected and unconnected node pairs in the auxiliary layer, according to Eqs. (15) and (16), respectively. Putting it all together, we incorporate Eqs. (15) and (16) to obtain the final SimBins score of Eq. (25). Algorithm 1 outlines the entire scheme. Now that our multiplex scoring model is complete, we proceed by evaluating the method on the datasets introduced earlier.
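The information-theoretic part of the score can be illustrated with a small sketch. It assumes base-2 logarithms, treats the intra-layer and trans-layer probabilities as already estimated per bin, and uses hypothetical function names and made-up numbers; it is not the authors' implementation and does not reproduce the full scoring equation.

```python
import math


def fuse_or(p_intra: float, p_trans: float) -> float:
    """OR fusion of two probabilities that are treated as independent."""
    return 1.0 - (1.0 - p_intra) * (1.0 - p_trans)


def conditional_self_information(p_intra: float, p_trans: float) -> float:
    """I(L=1 | bins) = -log2 p(L=1 | bins); the base-2 logarithm is an assumption."""
    p_cond = fuse_or(p_intra, p_trans)
    return math.inf if p_cond == 0.0 else -math.log2(p_cond)


def mutual_information(p_prior: float, p_intra: float, p_trans: float) -> float:
    """I(L=1) - I(L=1 | bins): uncertainty removed by knowing the pair's bins."""
    return -math.log2(p_prior) - conditional_self_information(p_intra, p_trans)


# Illustrative numbers only (not taken from the paper's datasets).
p_prior = 0.01         # overall link density of the target layer
pair_a = (0.20, 0.10)  # (p_intra, p_trans) of the bins that hold pair a
pair_b = (0.05, 0.02)  # the same for pair b

mi_a = mutual_information(p_prior, *pair_a)
mi_b = mutual_information(p_prior, *pair_b)

# The prior term is shared by all pairs, so ranking by -I(L=1 | bins) alone
# yields the same order as ranking by the full mutual information.
same_order = (mi_a > mi_b) == (
    -conditional_self_information(*pair_a) > -conditional_self_information(*pair_b)
)
print(round(mi_a, 3), round(mi_b, 3), same_order)
```

The final comparison mirrors the argument in the text that the self-information of the prior can be dropped without changing the ranking of node pairs.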
The diagram in Fig. 4 illustrates the process of node-pair similarity calculation in SimBins. The main source of information is the structure of the target and auxiliary layers. The train and test sets are derived from the target layer and include both links and non-existent links (the test set is later used for evaluation). The rest of the process consists of partitioning the train set ($U'$) according to the base similarity scores in $T$ and $A$ and the connectedness in $A$. Accordingly, the intra-layer and trans-layer connection probabilities of each partition (bin) are calculated and fed to the final SimBins scoring in Eq. (25).
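The binning step described above can be sketched as follows. This is a simplified illustration of equal-depth partitioning followed by a per-bin estimate of the connection probability; the function names are hypothetical, ties are split by sort order (the source does not say how ties are handled), and the toy scores are made up.

```python
def equal_depth_bins(scores, n_bins):
    """Assign each score a bin index so that bins hold (nearly) equal numbers of pairs."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])  # O(m log m) sort
    bins = [0] * len(scores)
    for rank, k in enumerate(order):
        bins[k] = rank * n_bins // len(scores)
    return bins


def connection_probability_per_bin(bins, linked, n_bins):
    """Fraction of node pairs in each bin that are linked in the target layer."""
    totals = [0] * n_bins
    hits = [0] * n_bins
    for b, is_linked in zip(bins, linked):
        totals[b] += 1
        hits[b] += int(is_linked)
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


# Toy train set: similarity score of each node pair and whether it is linked.
scores = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7]
linked = [0, 1, 0, 1, 0, 1]

bins = equal_depth_bins(scores, n_bins=3)
probs = connection_probability_per_bin(bins, linked, n_bins=3)
print(bins, probs)
```

Binning the same pairs by their similarity in the auxiliary layer, while still counting links in the target layer, would give the trans-layer counterpart under the same assumptions.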
Experimental results. The link prediction performance on 9 different datasets, comprising a total of 29 network layers that form 52 layer pairs, is reported based on both AUC (Table 2) and Precision (Table 3). In Table 2, for each base measure, the highest mean AUC is shown in bold and, for each duplex (all 52 rows), the highest AUC among all of the methods (independent of the base measure) is underlined. SimBins dominates the baseline methods and proves to be an effective multiplex link prediction method for several reasons: (i) Most of the time, SimBins is superior to the baseline methods (i.e., bold entries). This is further supported by the fact that SimBins achieves a higher average of all mean AUCs (the last row of the table). (ii) In a large fraction of duplexes (37 of 52), the overall best mean AUC belongs exclusively to SimBins (in 6 other duplexes, SimBins shares the best performance with another method). (iii) SimBins performs better than the single-layer method ($S^T$) in most cases, whereas for the similarity addition method ($S^T + S^A$) this is observed less frequently, meaning that our method is capable of using the other layer's information effectively.

Moreover, $SB_{T,A}$ is more robust against deceptive signals than $S^T + S^A$. Consider Drosophila, for example. The slightly negative correlation between similarity in the auxiliary layer (Suppressive) and connection probability in the target layer (Additive), as previously discussed (Fig. 2-d), causes a performance reduction for $S^T + S^A$, whereas SimBins still performs as well as, if not better than, $S^T$. A similar outcome can be observed for NTN and London Transport, more clearly when ACT is used as the base similarity measure. In CS-Aarhus, where Facebook is the target layer, both $S^T$ and $S^T + S^A$ perform even worse than random scoring (expected 50% AUC), while SimBins keeps the performance at about 70% to 80%.
As the last row indicates, the average mean AUC of SimBins is higher than that of both baseline methods, regardless of the choice of base measure.
There are occasions in which SimBins cannot improve the link prediction performance compared to the base similarity measure. In Drosophila, the absence of inter-layer correlation, as discussed earlier, is the underlying reason. In London Transport, node multiplexity is far too low, as shown in Table 1. Consequently, very few nodes are shared among different layers, which makes exploiting structural similarities between layers a hard task.
The above discussion also holds true for the Adamic-Adar 9 , Preferential Attachment 8 , and LRW 15 similarity measures: we performed the same experiments with them and obtained similar results, but we omit the corresponding details for the sake of brevity.
Interestingly, the results suggest that choosing LPI as the base similarity measure leads to the best overall performance in most of the multiplex networks. Using LPI as the base similarity measure for SimBins gives the best performance, with an average mean AUC of 85.0% over all 52 duplexes under study.
The evaluation of the methods based on the Precision metric, as reported in Table 3, confirms our earlier discussion. This metric quantifies the quality of the top entries of the sorted list of unobserved links, while AUC considers the quality of the ranking over the whole list. Here too, SimBins is superior to the two baseline methods. Specifically, in 38 duplexes out of 52 the best performance based on the Precision metric belongs to SimBins, while in 2 duplexes it shares the best performance with another baseline method. So, the results of Tables 2 and 3 confirm the superiority of SimBins over the baseline methods regardless of the choice of base similarity measure and evaluation metric, and also suggest that using SimBins with LPI as the base similarity measure leads to the best performance.
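The difference between the two metrics can be made concrete with a short sketch. It assumes the definitions commonly used in the link prediction literature (AUC as the chance that a held-out link outscores a non-existent link, with ties counted as one half, and Precision as the share of true links among the top-L ranked pairs); the function names, the choice of L, and the toy scores are illustrative assumptions, not details taken from the paper.

```python
def auc(pos_scores, neg_scores):
    """Chance that a held-out link scores higher than a non-existent link (ties = 0.5)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def precision_at(scored_pairs, true_links, top_l):
    """Share of held-out links among the top_l highest-scored unobserved pairs."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:top_l]
    return sum(1 for pair, _ in ranked if pair in true_links) / top_l


# Toy example: two held-out links and three non-existent links.
auc_value = auc(pos_scores=[0.9, 0.6], neg_scores=[0.7, 0.2, 0.6])

scored = [(("a", "b"), 0.9), (("a", "c"), 0.7), (("b", "c"), 0.6), (("c", "d"), 0.2)]
held_out = {("a", "b"), ("b", "c")}
precision_value = precision_at(scored, held_out, top_l=2)
print(auc_value, precision_value)
```

AUC uses every pairwise comparison in the list, whereas Precision looks only at the head of the ranking, so a method can do well on one and poorly on the other.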
Finally, we compare SimBins with three state-of-the-art methods, namely, YaoPL, YaoGL 38 , and SameiHP 39 . An introduction to these methods is given in the 'Related Works' section. The scoring schema of these methods can be summarized as Eq. (26). The base similarity measure used in these methods ($S^T$ and $S^A$ for the target and auxiliary layers $T$ and $A$, respectively) is LPI for the two former methods and HP for the latter. Moreover, the layer relevance measure ($\mu_{T,A}$) is PCC for YaoPL and GOR for YaoGL and SameiHP. Based on the recommendation of the authors, the parameter is set to $\phi = 0.5$. The results of the experiments are shown in Table 4. Clearly, SimBins achieves the best performance (85.0%) in terms of average mean AUC over all 52 duplexes. Also, in 25 duplexes SimBins is the best performing method (the sole best in 18 cases and sharing the best performance with another method in 7 cases), while the second best is SameiHP, with the best performance in 13 duplexes. It should also be noted that the SameiHP method fluctuates strongly across different networks and has the lowest average mean AUC. So, SimBins based on LPI is our choice, as it performs well across a diverse set of multiplex networks.

Complexity analysis. Consider a duplex network $G(V, E^{[1]}, E^{[2]})$, where layer 1 is the target layer and layer 2 is the auxiliary layer. Let $O(\theta)$ represent the computational complexity of the base similarity measures. The similarity of node pairs in both layers is needed for the subset $U'$ of $U = V \times V$, as formulated in the 'Partitioning Node Pairs (Binning)' section. Therefore, the computational complexity of measuring similarities is $O(\sum_{i=1,2} \theta m_i)$. Partitioning $U'$ into equal-depth bins requires sorting the similarities, which costs $O(m \log m)$. Notice that for obtaining a full ranking of the propensity of links, SimBins, like the majority of link prediction algorithms, would need at least $O(n^2)$ computations, where $n = |V|$, which is not easily scalable to very large networks without pruning the $n^2$ space. To be specific, for a full ranking, SimBins would have a computational complexity of $O(\theta n^2 + m \log m)$, in which $O(\theta n^2)$ is the dominating term in real networks, meaning that SimBins imposes minor overhead on the base similarity measures. This makes SimBins appropriate for use with large networks like SacchPomb, which we studied in this paper.

Discussion
In this manuscript, we explored the intra-layer and trans-layer connection probabilities in multiplex networks and verified that in many real multiplex networks, the connection probability within an arbitrary layer is correlated with similarity in other layers of the same multiplex. We also observed that connectedness in one layer of the multiplex increases the probability of linkage in other layers. Subsequently, we developed a consolidated link prediction model by incorporating information theory concepts to characterize the intuitions gathered from the observed evidence.
The proposed method works on a pair of layers of a multiplex, i.e., a duplex. Different ideas can be pursued to extend it to use the topology of multiple layers for link prediction. Considering a target layer $T$ and auxiliary layers $A_1, \dots, A_M$, the simplest idea is to add up the SimBins scores of all possible layer pairs, symbolically $SB_{T,\{A_1,\dots,A_M\}} = \sum_{i=1}^{M} SB_{T,A_i}$, where $SB_{T,A_i}$ is computed according to Eq. (25). The other idea, not as straightforward as the previous one, is to compose and study bins of more than two dimensions. This extension, although more systematic, might suffer from heavy sparsity of samples (imagine node pairs residing in 3D bins).
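The simplest multi-layer extension mentioned above, summing the duplex scores over all auxiliary layers, could look like the following sketch. It assumes the per-duplex SimBins scores have already been computed and stored per node pair; the function name, the dictionary layout, and the score values are hypothetical placeholders.

```python
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def multilayer_simbins(pair: Pair, duplex_scores: List[Dict[Pair, float]]) -> float:
    """Sum the duplex scores SB_{T,A_i} of one node pair over all auxiliary layers."""
    return sum(scores[pair] for scores in duplex_scores)


# Placeholder duplex scores for a target layer T and two auxiliary layers.
sb_t_a1 = {("x", "y"): 1.5, ("x", "z"): 0.25}
sb_t_a2 = {("x", "y"): 0.5, ("x", "z"): 2.0}

combined = {
    pair: multilayer_simbins(pair, [sb_t_a1, sb_t_a2]) for pair in sb_t_a1
}
ranking = sorted(combined, key=combined.get, reverse=True)
print(combined, ranking)
```

A plain sum gives every auxiliary layer the same weight, so a layer with a misleading signal can change the ranking; this sketch does not attempt to address that.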
Finally, SimBins is compared with two baseline methods (the base similarity measure in the target layer and the simple addition of similarities in the target and auxiliary layers) and three state-of-the-art methods (YaoPL, YaoGL and SameiHP) on 9 multiplexes. It is shown that SimBins outperforms the two baseline methods in most cases. Besides, it rarely performs worse than the target similarity and is more robust to deceptive signals than the simple addition of similarities. In some networks, such as London Transport and Drosophila, SimBins seems to be unprofitable as a result of a massively condensed distribution of node-pair similarities and negative inter-layer correlations. On the other hand, when compared with the state-of-the-art methods, SimBins attains the overall best average AUC and performs consistently well across various multiplex networks. This can be attributed to the design of the proposed method, which incorporates information from both the connectedness and the similarity of nodes in different layers. It is shown that SimBins imposes negligible computational overhead on the base similarity measures (we applied the method to a large network with a few thousand nodes and edges, SacchPomb, with minor computational burden). Using an equal-width strategy for partitioning node pairs leads to even more efficiency due to its $O(m)$ complexity (instead of $O(m \log m)$ for equal-depth binning), although the accuracy of prediction might be affected.
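The equal-width alternative mentioned above can be sketched as a single linear pass with no sorting. This is an illustrative version under the assumption that bin edges are spread uniformly between the smallest and largest score; the function name and sample scores are hypothetical.

```python
def equal_width_bins(scores, n_bins):
    """Assign bins by position in the score range; one pass, no sorting needed."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0] * len(scores)  # all scores identical: a single bin
    width = (hi - lo) / n_bins
    # The top edge is clamped so the maximum score falls in the last bin.
    return [min(int((s - lo) / width), n_bins - 1) for s in scores]


# Scores bunched near zero, as in a condensed similarity distribution.
scores = [0.0, 0.1, 0.2, 0.9, 1.0]
bins = equal_width_bins(scores, n_bins=2)
print(bins)
```

Because the edges ignore how the scores are distributed, bins can end up very unevenly filled when similarities are concentrated in a narrow range, which is one way the prediction accuracy could be affected.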

Because our method falls under the structural similarity category, it may not beat learning-based approaches that are of higher computational complexity. As discussed earlier in this section, extending SimBins to use similarities in multiple layers simultaneously can be further explored as a future direction. The proposed method integrates intra-layer structural similarities and connectedness in the auxiliary layers in a systematic way; it is shown to boost the performance of link prediction in multiplex networks while maintaining a low computational complexity.

Table 3. Average Precision over 100 iterations for the networks under study. Each row shows the performance of the link prediction methods on a duplex of a multiplex network, grouped by the base similarity measure in use. Columns show the average Precision over 100 iterations for the prediction methods $S^T$ (similarity score of only the target layer), $S^T + S^A$ (addition of the similarity scores of the target and auxiliary layers), and $SB^A_T \equiv SB_{T,A}$ (SimBins).

Table 4. Comparison of average AUC over 100 iterations for the networks under study with state-of-the-art methods. Performance evaluation of the link prediction methods on 52 real-world duplex networks based on the AUC measure. The three left columns give the name of the multiplex network, the target layer of link prediction, and the auxiliary layer that assists the prediction task. From left to right, the evaluated methods are SimBins using LPI as the base similarity measure (our proposed method); YaoPL, a state-of-the-art method that utilizes LPI with PCC as the layer relevance measure; YaoGL, which utilizes LPI with GOR as the layer relevance measure; and SameiHP, a state-of-the-art method that utilizes hyperbolic distance as the dissimilarity measure within each layer with GOR as the layer relevance measure. Bold and underlined entries are the best results in each row.