Characterizing dissimilarity of weighted networks

Measuring the dissimilarity between networks is a basic problem that arises in many fields. Building on the D-measure, which was proposed for unweighted networks, we propose a quantitative dissimilarity metric for weighted networks (WD-metric). Crucially, we construct a distance probability matrix of a weighted network, which captures comprehensive information about the network. Moreover, we define the complementary graph and alpha centrality of a weighted network. Several synthetic and real-world networks are used to verify the effectiveness of the WD-metric. Experimental results show that the WD-metric can effectively capture the influence of weight on the network structure and quantitatively measure the dissimilarity of weighted networks. It can also be used as a criterion for backbone extraction algorithms of complex networks.

It is generally accepted that weights are coupled in a non-trivial way to the binary network topology, playing an important part in structural organization, functionality and dynamics. For instance, the spreading of emerging diseases in the international airport network is closely related to the number of passengers travelling from one airport to another. In many applications of similarity comparison, such as discriminating between neurological disorders 31 and quantifying changes in temporally evolving networks 32 , a more accurate similarity measurement can be obtained by taking the edge weights into account when the networks are weighted. In particular, when comparing two weighted complete graphs, such as the similarity networks between cities obtained by different methods 33 , the difference comes mainly from the edge weights, so a dissimilarity metric for weighted networks becomes indispensable.
In view of the above analysis, we propose a quantitative dissimilarity metric for comparing weighted networks based on the method proposed by Schieber et al. 30 . The initial weighted networks are assumed to carry similarity weights. Firstly, the shortest path lengths are measured through reciprocal edge weights and rescaled so that their average matches the average shortest path length of the binary counterpart of the network. We can then construct a probability matrix based on the distance between each pair of nodes, which captures comprehensive information about the network. Secondly, the Jensen-Shannon divergence is used to compare the distance distribution vectors obtained from the probability matrix. Thirdly, the complementary graph and alpha centrality of a weighted network are defined, and the differences in alpha centrality between the original weighted networks and between their complementary graphs are each quantified through the Jensen-Shannon divergence. Finally, several synthetic and real-world networks are used to verify the effectiveness and necessity of the proposed WD-metric. Moreover, the WD-metric is used to compare original real networks with their backbones, extracted with the Disparity filter and the Global Threshold filter at similar edge densities, indicating that the proposed metric can be used as a criterion for backbone extraction algorithms of complex networks.
Methods
D-measure. To measure the difference between two unweighted networks $G$ and $G'$, Schieber et al. proposed a dissimilarity metric (D-measure), defined as a three-term function 30 :

$$D(G, G') = \omega_1 \sqrt{\frac{J(\mu_G, \mu_{G'})}{\log 2}} + \omega_2 \left| \sqrt{\mathrm{NND}(G)} - \sqrt{\mathrm{NND}(G')} \right| + \frac{\omega_3}{2} \left( \sqrt{\frac{J(P_{\alpha G}, P_{\alpha G'})}{\log 2}} + \sqrt{\frac{J(P_{\alpha G^c}, P_{\alpha G'^c})}{\log 2}} \right) \tag{1}$$

where $\omega_1$, $\omega_2$ and $\omega_3$ are arbitrary weights of the terms satisfying $\omega_1 + \omega_2 + \omega_3 = 1$, $J$ is the Jensen-Shannon divergence, and $P_{\alpha G}$ and $P_{\alpha G^c}$ are the alpha centrality distributions of $G$ and of its complement $G^c$. Instead of comparing vectors whose elements are numbers, such as the number of nodes or edges or the average degree, Schieber et al. considered vectors whose elements are sets of probability distributions. In particular, for each node $i = 1, 2, \ldots, N$, the node-distance distribution $P_i = \{p_i(j)\}$ was defined as the fraction of nodes at distance $j$ from node $i$. The set of $N$ node-distance distributions $\{P_1, \ldots, P_N\}$ contains detailed topological information, such as the degree (the number of nodes at distance 1 from $i$) and the closeness centrality (the sum of the inverse distances from $i$ to all other nodes). The network node dispersion (NND) was then defined as:

$$\mathrm{NND}(G) = \frac{J(P_1, \ldots, P_N)}{\log(d + 1)} \tag{2}$$

where $d$ is the diameter of $G$. The first term of Formula (1) compares the averaged connectivity distributions of nodes, $\mu_G$ and $\mu_{G'}$, that is, the sets $\mu_j$ $(j = 1, 2, \ldots, d)$ and $\mu'_j$ $(j = 1, 2, \ldots, d')$, and captures the global topological differences between $G$ and $G'$. The second term analyzes the heterogeneity of nodes by comparing the connectivity distribution $P_i$ $(i = 1, 2, \ldots, N)$ of each node, normalized by $\log(d + 1)$. In addition, because many networks, such as most k-regular networks, have $\mathrm{NND} = 0$, the third term compares the alpha centrality of the graphs and of their complements.
Because of the importance of weight in the study of network structure and function, designing an efficient, quantitative dissimilarity metric applicable to weighted networks is both meaningful and necessary. Therefore, we propose the WD-metric based on the D-measure.
WD-metric. Once weights are taken into account, the distances between nodes become real numbers rather than the integers of an unweighted network. How can they be converted to integers for computing the node-distance distributions while preserving their meaning as n-th order neighbors? In addition, little is known about the complement of a weighted network. Moreover, choosing reasonable parameter values for the alpha centrality of a weighted network is also an important part.
Consider a weighted network $G_\omega = \langle V_\omega, E_\omega \rangle$, where $V_\omega$ and $E_\omega$ are the sets of nodes and edges of $G_\omega$, and denote by $W$ the adjacency matrix of $G_\omega$. For consistency in interpreting and processing distances, we take $\omega_{ij}$ to be a similarity weight, with $\omega_{ij} = 0$ if nodes $i$ and $j$ are disconnected. In addition, we normalize the weights by dividing by the maximum weight, so the similarity weights lie in [0,1].
The distance distribution of weighted network. Given a network with similarity weights, the reciprocal of each weight is taken as the length of the corresponding edge. $L_\omega$ is the matrix of shortest path lengths, whose entry $l_{ij}$, the weighted distance from node $i$ to node $j$, is a continuous real number rather than an integer. Instead of simply rounding it, we first rescale $L_\omega$ by multiplying it by $\bar{L}/\bar{L}_\omega$ ($\bar{L}_\omega$ and $\bar{L}$ are the average shortest path lengths of the weighted network and its binary counterpart, respectively) to get $L'_\omega$, and then take the ceiling of each value to get $L''_\omega$. In this way the original real distances are binned, so we can count the nodes at each distance from node $i$ and divide the counts by $N - 1$ to obtain the node-distance distributions of the weighted network, $P^\omega_i$ $(i = 1, 2, \ldots, N)$. Most importantly, rescaling the distances retains the topological meaning of n-th order neighbors. The set of $N$ node-distance distributions $\{P^\omega_1, P^\omega_2, \ldots, P^\omega_N\}$ forms a matrix $P^\omega$ whose element $p^\omega_i(j)$ is the fraction of nodes connected to node $i$ at distance $j$, as in the unweighted case. In particular, the matrix $P^\omega$ includes one column for disconnected nodes, so our method also works for disconnected networks. See Supplementary Note 1 for a detailed description with a simple example.
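The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the dense list-of-lists input, the placement of the "unreachable" column last, and the small floating-point tolerance before the ceiling are all assumptions. It also assumes a symmetric matrix with at least one edge.

```python
import heapq
import math


def dijkstra(length, source):
    """Shortest path lengths from `source`; `length` is a dense matrix, math.inf = no edge."""
    dist = [math.inf] * len(length)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, step in enumerate(length[u]):
            if d + step < dist[v]:
                dist[v] = d + step
                heapq.heappush(heap, (dist[v], v))
    return dist


def mean_finite_distance(dist):
    """Average distance over ordered pairs i != j that are connected."""
    vals = [d for i, row in enumerate(dist)
            for j, d in enumerate(row) if i != j and d < math.inf]
    return sum(vals) / len(vals)


def distance_distribution(W):
    """Distance probability matrix of a similarity-weighted network W (weights in [0, 1]).

    Row i is the node-distance distribution of node i: column j-1 holds the
    fraction of nodes at (rescaled, ceiled) distance j, and the last column
    holds the fraction of unreachable nodes (column order is an assumption).
    """
    n = len(W)
    weighted = [[1.0 / w if w > 0 else math.inf for w in row] for row in W]
    binary = [[1.0 if w > 0 else math.inf for w in row] for row in W]
    Lw = [dijkstra(weighted, s) for s in range(n)]
    Lb = [dijkstra(binary, s) for s in range(n)]
    scale = mean_finite_distance(Lb) / mean_finite_distance(Lw)

    # Ceil the rescaled distances. The 1e-9 tolerance is an assumption that
    # absorbs floating-point noise; it is not part of the described method.
    binned = [[max(1, math.ceil(d * scale - 1e-9)) if d < math.inf else None
               for d in row] for row in Lw]
    longest = max(b for i, row in enumerate(binned)
                  for j, b in enumerate(row) if i != j and b is not None)

    counts = [[0] * (longest + 1) for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                col = longest if binned[i][j] is None else binned[i][j] - 1
                counts[i][col] += 1
    return [[c / (n - 1) for c in row] for row in counts]
```

For a three-node path with weights 1.0 and 0.5, `distance_distribution([[0, 1.0, 0], [1.0, 0, 0.5], [0, 0.5, 0]])` places the weakly connected end node at distance 2 from both other nodes, whereas the binary path would put its neighbor at distance 1.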
Complement of weighted network. The complement of a weighted network has received very little discussion. We define the complementary graph of a weighted network by analogy with the complement of an unweighted network.
For an unweighted network $G$ with adjacency matrix $A(G)$, its complementary graph $G^c$ can be written in matrix form as $A(G^c) = K_n - A(G)$, where $K_n$ is the matrix whose entries are all equal to one.
For a weighted network $G_\omega$ with similarity weights in [0,1] and adjacency matrix $W(G_\omega)$, the complementary graph is defined correspondingly as $W(G^c_\omega) = K_n - W(G_\omega)$, where $K_n$ is again the matrix whose entries are all equal to one.
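As a small illustration of the two definitions above, the following sketch normalizes a weight matrix by its maximum weight and forms the complement entrywise. The function names and the `zero_diagonal` option are hypothetical. Taken literally, subtracting from the all-ones matrix leaves ones on the diagonal, so the option to clear the diagonal is offered only as an assumption for readers who want to exclude self-loops.

```python
def normalize(W):
    """Divide every weight by the maximum weight so that weights lie in [0, 1]."""
    top = max(max(row) for row in W)
    return [[w / top for w in row] for row in W]


def complement(W, zero_diagonal=False):
    """Entrywise K_n - W, where K_n is the all-ones matrix.

    zero_diagonal is a hypothetical convenience flag (not from the text):
    when True, the diagonal of the result is set to 0 to drop self-loops.
    """
    n = len(W)
    C = [[1.0 - W[i][j] for j in range(n)] for i in range(n)]
    if zero_diagonal:
        for i in range(n):
            C[i][i] = 0.0
    return C
```

For example, `complement(normalize([[0, 2, 0], [2, 0, 1], [0, 1, 0]]))` turns the strongest edge into a missing edge and the missing edge into a full-weight edge.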
Alpha centrality. Alpha centrality considers not only the interactions between nodes but also information about each node that is independent of the others 34 , and it is widely studied as an important network property. It is generally written as:

$$x = \alpha A^{T} x + \beta, \quad \text{that is,} \quad x = (I - \alpha A^{T})^{-1} \beta$$

where $A$ is the adjacency matrix of network $G$, $\alpha$ is the attenuation factor and $\beta$ is the exogenous factor vector. It can be proved that the solution of this equation converges for $\alpha < 1/\lambda_{\max}$, where $\lambda_{\max}$ is the spectral radius of the network.
According to the Perron-Frobenius theory, for a real symmetric nonnegative matrix $M$, $\lambda_{\max} \le \max_i \sum_j M_{ij}$. Therefore, in a graph, $\lambda_{\max}$ cannot exceed the maximum degree. Schieber et al. set $\alpha = 1/N$ and took the link density of every node as the exogenous factor vector for an unweighted network. In a weighted network $G_\omega$, the adjacency matrix $W$ is also symmetric, so $\lambda_{\max}$ is bounded from above by the maximum node strength. Because the weights of $G_\omega$ lie in [0,1], the maximum node strength is bounded from above by $N$. Hence, we set $\alpha$ and $\beta$ in terms of $\bar{\omega}$, the average weight, and $S$, the node strength vector.
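To make the convergence condition concrete, here is a minimal sketch that computes alpha centrality by fixed-point iteration of $x \leftarrow \alpha A^{T} x + \beta$ and checks $\alpha$ against the row-sum bound. The function names are hypothetical, and the values of $\alpha$ and $\beta$ are left as arguments because this sketch does not assume any particular weighted setting. The usage example applies the unweighted choice $\alpha = 1/N$ and reads "link density of every node" as degree divided by $N - 1$, which is an assumption.

```python
def max_row_sum(A):
    """Upper bound on the spectral radius of a nonnegative matrix."""
    return max(sum(row) for row in A)


def alpha_centrality(A, alpha, beta, tol=1e-12, max_iter=10000):
    """Solve x = alpha * A^T x + beta by fixed-point iteration.

    Convergence is guaranteed here when alpha < 1 / max_row_sum(A) for a
    symmetric nonnegative A, a sufficient version of alpha < 1 / lambda_max.
    """
    n = len(A)
    x = list(beta)
    for _ in range(max_iter):
        new = [alpha * sum(A[j][i] * x[j] for j in range(n)) + beta[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("no convergence; alpha may be >= 1 / lambda_max")


# Usage on a triangle: alpha = 1/N and beta = degree / (N - 1) (assumed reading).
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
N = len(A)
beta = [sum(row) / (N - 1) for row in A]
centrality = alpha_centrality(A, 1.0 / N, beta)
```

On the triangle every node has the same centrality, which follows from solving $x = \tfrac{2}{3}x + 1$ by hand.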
As is well known, the JS divergence is often used to measure the difference between two probability distributions. Therefore, when considering the influence of alpha centrality, we process the calculated alpha centrality vector $V_\alpha$ to obtain $P_\alpha$, a discrete probability distribution with one more dimension than $V_\alpha$.
Expression of the WD-metric. Considering the effects of global and local features, we can obtain a few related vectors based on the above definitions of the distance probability matrix, complementary graph and alpha centrality of a weighted network.
First of all, from the distance probability matrix $P^\omega$ we obtain the average proportion of neighbors of each order, that is, the column averages of $P^\omega$. Further, we calculate the node dispersion of the weighted network (WNND), which is defined from $J$, the JS divergence of the node-distance distributions, and $m$, the number of columns of the distance probability matrix $P^\omega$. Finally, the quantitative dissimilarity metric of weighted networks is proposed as a three-term function with term weights $\omega_1$, $\omega_2$ and $\omega_3$. Here we set the weights $\omega_1 = \omega_2 = 0.45$ and $\omega_3 = 0.1$, as Schieber et al. did, to quantify structural dissimilarities between weighted networks. On the one hand, for consistency, we want the weighted dissimilarity metric to remain applicable to unweighted networks. On the other hand, the weights represent the influence on the network difference of global features (first term), local features (second term) and network heterogeneity (third term), respectively. The value of each term of the WD-metric is supposed to be proportional to that of its unweighted counterpart. We computed it for several pairs of real networks and obtained basically consistent results.
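The pieces named in this subsection can be combined as in the sketch below. It rests on two loudly stated assumptions that are not confirmed by the text: that WNND is the JS divergence of the rows of $P^\omega$ divided by $\log m$, and that the WD-metric has the same three-term form as the D-measure of Schieber et al. All function names are hypothetical, the alpha centrality distributions are taken as ready-made inputs, and the "unreachable" column is assumed to be the last one.

```python
import math


def column_means(P):
    """Average proportion of neighbors of each order (column averages of P)."""
    return [sum(row[j] for row in P) / len(P) for j in range(len(P[0]))]


def js_divergence(dists):
    """Jensen-Shannon divergence of equally weighted distributions of equal length."""
    mu = column_means(dists)
    total = sum(p * math.log(p / mu[j])
                for row in dists for j, p in enumerate(row) if p > 0)
    return max(total / len(dists), 0.0)  # clip tiny negative rounding errors


def dispersion(P):
    """Node dispersion, ASSUMING the normalizer is log(m), m = number of columns."""
    m = len(P[0])
    return js_divergence(P) / math.log(m) if m > 1 else 0.0


def pad(p, m):
    """Extend p to m columns by inserting zeros before the last (unreachable) column."""
    return p[:-1] + [0.0] * (m - len(p)) + p[-1:]


def js_term(p, q):
    """Square root of the two-distribution JS divergence, normalized by log 2."""
    return math.sqrt(js_divergence([p, q]) / math.log(2))


def wd_sketch(P1, P2, Pa1, Pa2, Pac1, Pac2, weights=(0.45, 0.45, 0.1)):
    """Three-term dissimilarity, ASSUMING the same form as the D-measure.

    P1, P2: distance probability matrices; Pa*, Pac*: alpha centrality
    distributions of the networks and of their complements (equal lengths).
    """
    w1, w2, w3 = weights
    mu1, mu2 = column_means(P1), column_means(P2)
    m = max(len(mu1), len(mu2))
    first = js_term(pad(mu1, m), pad(mu2, m))
    second = abs(math.sqrt(dispersion(P1)) - math.sqrt(dispersion(P2)))
    third = (js_term(Pa1, Pa2) + js_term(Pac1, Pac2)) / 2
    return w1 * first + w2 * second + w3 * third
```

With this form, comparing a network with itself gives 0, and two fully disjoint distributions contribute the maximum value of 1 to a JS term.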

Results
Using the proposed WD-metric, several groups of experiments are performed on synthetic and real networks to verify the necessity and validity of the metric. Unless otherwise specified, the dissimilarity values (D-values) between synthetic networks are averages over 100 runs, and the size of each synthetic network is N = 100.
Complete graphs with four edge weight distributions. To verify the effectiveness of the WD-metric in comparing diverse weighted networks, weights drawn from different distributions are first added to complete graphs, and the dissimilarity values (D-values) between the complete graphs with and without weights are then calculated and shown in Figs. 1 and 2. As shown in Fig. 1, there is a significant difference between a complete graph before and after weighting. Meanwhile, the D-values change gradually with the corresponding parameters under the different weighting modes. This indicates that our method captures the influence of the weights on the network structure. Besides comparing a weighted network with an unweighted one, we also compare two weighted networks. As the red lines in Fig. 2 show, the D-values between two networks with the same topology but different weights are relatively small, yet they still change significantly with the weights, which further indicates that the WD-metric effectively captures the effect of weight on the network.

Incomplete graphs with different edge densities.
Having observed the differences between weighted complete graphs, we next examine the performance of the WD-metric on weighted incomplete graphs. We use the WD-metric to observe the differences before and after weighting on Erdős–Rényi (ER) and Barabási–Albert (BA) networks with different densities. As the black curves in Fig. 3 show, there is little difference between two unweighted networks (UD-values) at any given density. However, the colored curves show that the difference after weighting (WD-values) increases markedly in most cases, except on ER networks with small p. A possible reason is that a small connection probability splits the ER network into many disconnected groups, so the UD-values are relatively large. Moreover, in this case a small number of edge weights has little effect on the network, so there is no clear difference between UD-values and WD-values. In addition, the colored curves show that the WD-values increase overall with the edge density. That is, when the network is sparse the weights have little impact on the structure, while in a dense network they have a greater impact.
Comparison between neural networks. As an interdisciplinary technology, neural networks have been widely used in recent years to tackle problems such as classification and prediction 35 . Figure 4 shows a simplified two-layer neural network, composed of neurons in the input, hidden and output layers, connected by weighted edges. A neural network is a typical weighted network with a specific function. By repeatedly training on data and adjusting the edge weights, the new neural network usually predicts or classifies better. We use the WD-metric to compare neural networks with different prediction or classification accuracy. If two neural networks with closer accuracy also show a smaller dissimilarity, this further supports the validity of the WD-metric in capturing the function of weighted networks.
Here, we perform experiments on the classical BP neural network for pattern recognition of handwritten digits. Using four training sets of size 10, 100, 1000 and 10,000, we obtain four neural networks with different weights but the same topology. The WD-metric is then used to compare these networks. Table 1 shows that when the training-set sizes differ, the WD-metric captures the differences between the corresponding neural networks with different classification ability. The D-values increase gradually between the network trained on 10 samples and the networks trained on 100, 1000 and 10,000 samples, while the D-values decrease gradually between the network trained on 10,000 samples and the networks trained on 10, 100 and 1000 samples. Thus, the larger the difference in classification accuracy between two networks, the larger the D-value between them. The results further show that the WD-metric is quantitative and effective for measuring the distance between networks whose functions differ because of their weights.
Distances between real weighted networks. Following the comparisons on synthetic networks, we examine the performance of the WD-metric on real-world networks by making pairwise comparisons among various weighted real networks; the results are shown in Fig. 5a.
Seventeen data sets of four network types are considered: Animal, Online Communication, Human Contact and Human Social. Table 2 shows their basic statistics. All networks presented here are freely available from The Koblenz Network Collection (http://konect.uni-koblenz.de/). We also calculate the differences between these networks when the weights are ignored; the results are shown in Fig. 5b. There is a significant difference between the two figures. Moreover, as shown in Fig. 5a, the dissimilarities between Reality Mining and the other networks are very large when weights are considered. When they are not, as shown in Fig. 5b, Reality Mining does not stand out from the other networks, which further indicates the necessity of a dissimilarity metric for weighted networks. We also find that the similarity between some networks of the same type is higher.
Application of the WD-metric to backbone extraction. In a large-scale network, extracting the truly relevant nodes or connections that form the network's backbone helps to build a reduced but meaningful representation of the network and to understand its fundamental structure and function 36 . However, many existing extraction methods mainly aim to retain one or more topological attributes. For example, the classical Disparity filter proposed by Serrano 37 was shown only qualitatively to be superior to the global threshold filter, mainly through the heterogeneity of the weight distribution. In contrast, our proposed WD-metric quantitatively measures the dissimilarity of weighted networks from comprehensive information. Figure 6 presents the D-values between the U.S. Airport and Residence Hall networks and their backbones. On the one hand, as the edge density increases, the D-values decrease overall, consistent with the fact that a subgraph with larger density retains more information.
On the other hand, the blue line is almost always below the red line, indicating quantitatively and intuitively that the disparity filter is superior to the global threshold filter. The WD-metric can thus be used as a criterion for backbone extraction algorithms of complex networks.
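For readers who want to reproduce this kind of comparison, the two filters can be sketched as follows. This is an assumption-laden illustration rather than the procedure used for Fig. 6. The disparity filter follows the commonly cited significance formula $(1 - w_{ij}/s_i)^{k_i - 1}$ for a node of degree $k_i$ and strength $s_i$, an edge is kept when it is significant for at least one endpoint, and degree-1 endpoints are treated as never significant. The function names, the symmetric zero-diagonal input and the `edge_density` helper are hypothetical.

```python
def edges(W):
    """Undirected edges (i, j), i < j, of a symmetric weight matrix."""
    n = len(W)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if W[i][j] > 0]


def global_threshold(W, threshold):
    """Keep every edge whose weight is at least `threshold`."""
    return {(i, j) for i, j in edges(W) if W[i][j] >= threshold}


def disparity_filter(W, alpha):
    """Keep edges that are statistically significant for at least one endpoint."""
    degree = [sum(1 for w in row if w > 0) for row in W]
    strength = [sum(row) for row in W]

    def significance(i, j):
        if degree[i] <= 1:
            return 1.0  # assumption: a degree-1 endpoint never makes an edge significant
        return (1.0 - W[i][j] / strength[i]) ** (degree[i] - 1)

    return {(i, j) for i, j in edges(W)
            if min(significance(i, j), significance(j, i)) < alpha}


def edge_density(kept, n):
    """Fraction of the n(n-1)/2 possible undirected edges that were kept."""
    return len(kept) / (n * (n - 1) / 2)


# Usage: a star whose hub has one dominant edge and two weak ones.
W = [[0, 0.9, 0.05, 0.05],
     [0.9, 0, 0, 0],
     [0.05, 0, 0, 0],
     [0.05, 0, 0, 0]]
backbone = disparity_filter(W, alpha=0.05)
```

Tuning `alpha` and `threshold` until `edge_density` matches for both filters gives backbones of similar density, which can then be compared with the original network through a dissimilarity metric.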

Discussion
In this paper, we propose a quantitative dissimilarity metric applicable to weighted networks (WD-metric), based on the D-measure 30 , which applies only to unweighted networks. It also performs well on disconnected networks. Various experiments have shown that the WD-metric can capture the influence of weight on the network structure and can quantitatively and effectively measure the dissimilarity of weighted networks. In addition, it can depict the influence of edge density on network structure: when the network is sparse, the weights have little impact on the structure, while in a dense network they have a greater impact. Furthermore, the WD-metric can be used as a criterion for backbone extraction algorithms of complex networks.
We have compared some real-world networks and obtained the dissimilarity values between them through the WD-metric, but without further analyzing the practical significance of these values. Scholars from different fields can combine it with various practical problems to yield interesting results and applications. Moreover, developing a new backbone extraction method that minimizes the D-value between the original network and its backbone is a meaningful idea. In addition, more attention can be paid to the relationship between network differences and network functionalities such as percolation and spreading dynamics. How to set the weight of each term of the WD-metric also deserves serious consideration.
www.nature.com/scientificreports/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.