Abstract
Module or community structures are widespread in complex networks, and optimizing statistical measures is one of the most popular approaches for revealing and identifying them in real-world applications. In this paper, we focus on the critical behaviors of (Quasi)Surprise, a statistical measure of community structure, and compare it with other measures. In particular, we investigate in detail how various network parameters affect these measures. We derive the critical number of dense subgraphs at which partitions transition, and provide phase diagrams that display and compare the phase transitions of the measures. We also reveal a “potential well” effect for (Quasi)Surprise, which general greedy (agglomerative or divisive) algorithms may have difficulty crossing. Finally, we introduce an extension of Quasi-Surprise for the study of multiscale structures. The results help explain the critical behaviors of (Quasi)Surprise and may provide useful insight for designing effective community-detection tools.
Introduction
Networks with complex architectures of connections are ubiquitous in nature. Biological, technological and social networks exhibit many common topological properties, such as clustering, correlation and modularity^{1}. Modularity means that networks often consist of communities, clusters or modules, i.e., groups of vertices with dense connections inside and sparser connections between them^{1}. Communities are closely related to real functional groupings in real-world systems^{2,3} and can affect dynamical behaviors such as information diffusion and synchronization^{4,5,6}. Because of their relevance in practical applications, many methods have been proposed to detect communities in graphs, such as spectral analysis^{7,8}, random walks^{9,10,11,12}, synchronization^{13,14}, diffusion^{15}, statistical models^{16,17}, label propagation^{18,19,20,21}, optimization^{22,23,24,25,26} and others^{27,28,29,30,31,32,33,34,35}. Most of them optimize quality functions that capture the intuitive notion of community structure, such as Modularity^{36}, Hamiltonians^{16,37}, (Quasi)Surprise (S_{q})^{38,39,40}, Significance (S_{g})^{41} and “fitness” functions^{42,43}.
For real applications, it is crucial to analyze the behavior of these methods in depth. For instance, some methods, e.g. those based on modularity optimization and Bayesian inference, have been shown to undergo phase transitions from detectable to undetectable structures, which limit their achievable performance^{44,45,46}. On the other hand, the limits of Modularity, such as the resolution limit^{47,48,49}, suggest the possible existence of multiscale structures in networks and have motivated various (improved) methods, especially multiresolution Modularity and Potts-based Hamiltonians^{50,51,52,53,54}. Various approaches have also been proposed to improve Modularity-based methods^{55,56}. For example, Lai et al. improved the Modularity-based belief propagation method by exploiting correlations between communities to better estimate the number of communities^{56}.
Surprise (Sp) is a recently proposed statistical measure of community structure, defined as the negative logarithm of the probability, under a cumulative hypergeometric distribution, that a random network contains at least as many intracommunity links as observed^{38}. Although it performs well in many networks^{38,39,57}, it is inherently limited to unweighted networks, and its complex nonlinear form makes it computationally expensive to optimize directly. Recently, Traag et al.^{40} proposed an accurate asymptotic approximation of Surprise, called Quasi-Surprise (Sq). It extends Surprise naturally to weighted networks and makes it more amenable to theoretical analysis and efficient optimization.
So far, little work has examined the behaviors of (Quasi)Surprise, although it has been applied to the study of module structures. Compared with Modularity as a reference, (Quasi)Surprise appears in many networks to be immune to the resolution limit because of its very high resolution, and thus better able to discover the underlying community structure. However, this impression may be misleading and needs to be clarified. Moreover, we observe a “potential well” effect in (Quasi)Surprise that is absent in Modularity and Significance. It may make (Quasi)Surprise harder to optimize than measures such as Modularity, even though optimizing any of these measures is NP-hard.
To analyze the critical behaviors of (Quasi)Surprise in community detection, we first study the effect of various network parameters on (Quasi)Surprise, derive the critical number of dense subgraphs at which communities merge or split, and provide phase diagrams showing the parameter regions in which dense subgraphs merge, together with comparisons with other measures such as Modularity and Significance. We then show that merging a single group of dense subgraphs may be more difficult than expected, because of a “potential well” effect on optimization algorithms: (Quasi)Surprise is not a monotonic function of the number of merged dense subgraphs. This may lead to “false” optimal solutions for (Quasi)Surprise. We further show that heterogeneity of degrees and community sizes accelerates the splitting of communities. Finally, we propose multiscale versions of Quasi-Surprise and Significance to handle multiscale networks.
Results
Effect of network parameters
For convenience, we construct a set of single-level networks with r “dense subgraphs” (also called “predefined communities”) placed on a circle, each connected to its adjacent dense subgraphs (see Method and Fig. 1A). In these networks, consider a partition into r/x communities, each containing x adjacent dense subgraphs (predefined communities); we denote it Partition X (Fig. 1A shows an example with X = 3). The predefined partition is the special case x = 1. For Partition X, the probability that a link lies within a community can be written as,
and its expected value can be written as,
Here, m_{int} is the number of intracommunity links in the partition with r/x communities; m is the number of links in the network; n_{c} is the number of vertices in each dense subgraph (predefined community); p_{in} is the probability of linking vertices within the same dense subgraph; p_{out} is the probability of linking two vertices in adjacent dense subgraphs; M_{int} is the maximal possible number of intracommunity links in the partition; and M is the maximal possible number of links in the network. As a result, \(S=m\,D({q}_{x}\parallel {\bar{q}}_{x})\) for the partition, which is a function of several variables (see Method for details).
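A sketch of how q_x and \(\bar{q}_x\) can follow from the construction in the Method section, under our own assumptions: expected link counts, each unordered vertex pair counted once, n_c ≫ 1 so that n_c(n_c − 1) ≈ n_c², and x dividing r with x < r. The overall normalization of m cancels in the ratio, and the result is consistent with β in the critical-point expression below through q_x = q_1 + β.

```latex
\begin{aligned}
m &\approx r\,\tfrac{n_c^2}{2}\,p_{in} + r\,n_c^2\,p_{out},\qquad
m_{int} \approx r\,\tfrac{n_c^2}{2}\,p_{in} + \tfrac{r}{x}\,(x-1)\,n_c^2\,p_{out},\\
q_x &= \frac{m_{int}}{m} = \frac{p_{in} + 2p_{out}(x-1)/x}{p_{in}+2p_{out}}
     = 1-\frac{a}{x},\qquad a=\frac{2p_{out}}{p_{in}+2p_{out}},\\
M &\approx \frac{(r\,n_c)^2}{2},\qquad M_{int}\approx \frac{r}{x}\cdot\frac{(x\,n_c)^2}{2},\qquad
\bar{q}_x = \frac{M_{int}}{M} \approx \frac{x}{r}.
\end{aligned}
```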
First, we show how Quasi-Surprise depends on the number x of dense subgraphs contained in each community of Partition X (Fig. 2). For small r, S(x)/S(1) is less than 1 and decreases with x, so dense subgraphs do not merge; as expected, when r is large enough, S(x)/S(1) has a clear peak, so merging of x dense subgraphs appears. In the most modular networks (Fig. 2, Right), where clique-like dense subgraphs are connected by only one intercommunity link, merging is very difficult or almost impossible because it requires very large r, whereas for other parameters (e.g. Fig. 2, Left) merging occurs much more easily. The difficulty of merging dense subgraphs decreases sharply as p_{out}/p_{in} increases.
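The ratio S(x)/S(1) can be evaluated directly from S = m D(q_x ∥ \(\bar{q}_x\)). The sketch below assumes the large-n approximations q_x = 1 − a/x with a = 2p_out/(p_in + 2p_out) and \(\bar{q}_x\) ≈ x/r; the helper names (`kl_bernoulli`, `q_x`, `qbar_x`, `quasi_surprise_ratio`) are hypothetical, and the parameters are illustrative rather than those of Fig. 2.

```python
import math


def kl_bernoulli(x: float, y: float) -> float:
    """Kullback-Leibler divergence D(x || y) between Bernoulli(x) and Bernoulli(y).

    Uses the convention 0 * ln 0 = 0 and requires 0 < y < 1.
    """
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def q_x(x: int, p_in: float, p_out: float) -> float:
    """Expected fraction of links inside communities for Partition X (assumption)."""
    a = 2 * p_out / (p_in + 2 * p_out)
    return 1 - a / x


def qbar_x(x: int, r: float) -> float:
    """Large-n approximation of M_int / M for Partition X (valid for x < r)."""
    return x / r


def quasi_surprise_ratio(x: int, r: float, p_in: float, p_out: float) -> float:
    """S(x) / S(1) with S = m * D(q || qbar); the common factor m cancels."""
    s_x = kl_bernoulli(q_x(x, p_in, p_out), qbar_x(x, r))
    s_1 = kl_bernoulli(q_x(1, p_in, p_out), qbar_x(1, r))
    return s_x / s_1


# Small ring (r = 8) versus large ring (r = 400) with p_out / p_in = 0.1.
for r in (8, 400):
    print(r, [round(quasi_surprise_ratio(x, r, 1.0, 0.1), 3) for x in (1, 2, 4)])
```

With these illustrative parameters, the ratio for x = 2 stays below 1 at r = 8 and exceeds 1 at r = 400, matching the qualitative trend described above.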
Second, we compare the normalized S(r) curves for different values of x (Fig. 3A,B). S increases with r. For small r, S(r, x = 1) is the largest. As r increases, S(r, x = 2), S(r, x = 3) and S(r, x = 4) in turn become the largest; that is, merging of 2, 3 and 4 dense subgraphs is successively preferred. As in Fig. 2, merging hardly occurs in the most modular networks (Fig. 3A) but may occur much more easily in other networks (e.g. Fig. 3B).
Third, we show that for every x the (normalized) S curves decrease with p_{out}/p_{in} (Fig. 3C,D). For small p_{out}/p_{in}, S(x = 1) is the largest, so there is no merging. As p_{out}/p_{in} increases, S(x = 2), S(x = 3) and S(x = 4) in turn become the largest; that is, merging of 2, 3 and 4 dense subgraphs is successively preferred. In larger networks (i.e. larger r), merging appears more easily.
For comparison, we also analyzed the effect of network parameters on the other quality functions (the original Surprise, Significance and Modularity), which behave similarly to Quasi-Surprise (see Figs S1–S3).
Phase transition in merging/disconnecting dense subgraphs
When the number r of dense subgraphs or the ratio p_{out}/p_{in} is large enough, dense subgraphs may merge. To study the critical number \({r}_{x}^{\ast }\) of dense subgraphs, we first analyze the transition of Quasi-Surprise from one partition to another in the single-scale networks (see Method). When \({\rm{\Delta }}S=S(x)-S(1)\ge 0\), Partition X is preferred over the predefined partition. Solving ΔS = 0 for r gives,
where \(\beta =\frac{x-1}{x}\frac{2{p}_{out}}{{p}_{in}+2{p}_{out}}\). For comparison, we analytically derive the critical point of Modularity for the merging of r/x groups of x dense subgraphs in these networks (see Method),
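As a consistency check, here is a sketch of how the Modularity critical point can be obtained under the same expected-count assumptions, using the standard form Q = Σ_c [l_c/m − (d_c/2m)²]. By symmetry of the ring, each of the r/x merged communities carries a fraction x/r of the total degree, so the null-model term is (r/x)(x/r)² = x/r. Its x = 2 case reproduces the value r_2 = p_in/p_out + 2 quoted for Modularity in the potential-well section.

```latex
\begin{aligned}
Q(x) &= q_x - \frac{x}{r}, \qquad q_x = q_1 + \beta,\\
\Delta Q &= Q(x) - Q(1) = \beta - \frac{x-1}{r} = 0
\;\Longrightarrow\;
r_x^{\ast} = \frac{x-1}{\beta} = \frac{x\,(p_{in} + 2p_{out})}{2p_{out}},\\
r_2^{\ast} &= \frac{p_{in}}{p_{out}} + 2 .
\end{aligned}
```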
See Fig. 4 and the Supplementary Information (SI) for the critical points of other measures in these networks.
For Quasi-Surprise, the critical number \({r}_{x}^{\ast }\) of dense subgraphs increases monotonically with x for all values of p_{out}/p_{in} (Fig. 4A): the larger x, the larger \({r}_{x}^{\ast }\). Moreover, \({r}_{x}^{\ast }\) decays quickly as p_{out}/p_{in} increases for all x (Fig. 4B): the larger p_{out}/p_{in}, the more easily merging appears, i.e., the lower the resolution. The resolution of Quasi-Surprise is also far higher than that of Modularity for small p_{out}/p_{in}.
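The critical number \({r}_{x}^{\ast }\) can also be located numerically by solving ΔS = 0 with bisection. The sketch below treats r as continuous and assumes the large-n forms q_x = 1 − a/x and \(\bar{q}_x\) ≈ x/r; on r ≥ 2x the difference S(x) − S(1) increases with r under these forms, which is what makes bisection valid. The Modularity line uses the closed form x(p_in + 2p_out)/(2p_out) derived under the same assumptions. All function names are hypothetical.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) for Bernoulli distributions, with 0 * ln 0 = 0 (needs 0 < y < 1)."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def delta_s_per_link(r, x, p_in, p_out):
    """[S(x) - S(1)] / m for the single-level ring (large-n approximations)."""
    a = 2 * p_out / (p_in + 2 * p_out)
    return kl_bernoulli(1 - a / x, x / r) - kl_bernoulli(1 - a, 1 / r)


def critical_r_quasi_surprise(x, p_in, p_out, rel_tol=1e-10):
    """Smallest r (treated as continuous) with S(x) >= S(1), found by bisection.

    Returns 2x if merging is already preferred at r = 2x.
    """
    if x < 2 or p_out <= 0:
        raise ValueError("need x >= 2 and p_out > 0")
    lo = 2.0 * x
    if delta_s_per_link(lo, x, p_in, p_out) >= 0:
        return lo
    hi = 2.0 * lo
    while delta_s_per_link(hi, x, p_in, p_out) < 0:  # bracket the root
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if delta_s_per_link(mid, x, p_in, p_out) < 0:
            lo = mid
        else:
            hi = mid
    return hi


def critical_r_modularity(x, p_in, p_out):
    """Modularity threshold x (p_in + 2 p_out) / (2 p_out) for the same ring."""
    return x * (p_in + 2 * p_out) / (2 * p_out)


for ratio in (0.1, 0.3):
    sq = [round(critical_r_quasi_surprise(x, 1.0, ratio), 1) for x in (2, 3)]
    print(ratio, sq, critical_r_modularity(2, 1.0, ratio))
```

For p_out/p_in = 0.1 the Quasi-Surprise threshold for x = 2 comes out in the hundreds, an order of magnitude above the Modularity value of 12, in line with the higher resolution described above.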
Figure 4C compares the critical r values of several measures (Quasi-Surprise, Surprise, Significance and Modularity) in the single-level networks (see SI). On one hand, the critical r values of all measures behave similarly as functions of p_{out}/p_{in}; on the other hand, the curves allow a precise comparison of the measures' resolutions.
Figure 4 can be regarded as a phase diagram: below each curve, the partition that merges x communities is not preferred over the predefined one. The phase diagrams show that community merging is not as difficult as one might imagine, for example in the most modular networks with small p_{out}/p_{in}. Note that, for simplicity, we have considered only the transition from the predefined partition to Partition X. A more detailed analysis, e.g. comparing Partition X = 3 with Partition X = 2, shows that the critical r values are larger than estimated above (see Method). See also Fig. 4D and the SI for the phase transitions of the various measures in double-level networks.
We further verify these conclusions directly by optimizing the quality functions with the Louvain algorithm. Table 1 shows that, in the single-level networks, (a) for a fixed number r of dense subgraphs, all methods merge dense subgraphs once p_{out} is large enough, because the additional links between dense subgraphs draw them closer together; (b) as r increases, the value of p_{out} needed for merging decreases, meaning that merging becomes easier; (c) the “potential well” effect appears in some cases where f* is less than 1 but no merging occurs (see the following section); (d) Modularity is more inclined to merge dense subgraphs than the other methods, whereas (Quasi)Surprise and Significance display relatively high resolution (for (Quasi)Surprise, partly due to the “potential well” effect); (e) comparing results for different p_{in} values, a decrease in p_{in} reduces the p_{out} and r values needed for merging (see Tables 1 and S1); and (f) the double-level networks give similar results (see Tables S2 and S3).
We also applied the above methods to a set of real-world networks (Table S7). As in the model networks, Quasi-Surprise has higher resolution than Modularity and thus tends to find more communities in the real-world networks. The original Surprise and Significance find even more communities than the other methods because their resolution is higher still.
Effect of “potential well” on community detection
Because (Quasi)Surprise is nonlinear, an unexpected phenomenon may appear when one searches for its optimal community structure in an agglomerative manner, e.g., with the Louvain algorithm. For some statistical measures, e.g. Modularity, whenever Partition X is preferred, merging even one (or two) group(s) of x dense subgraphs already increases the measure. This may not hold for (Quasi)Surprise, because, as a function of the number of merged groups of dense subgraphs, (Quasi)Surprise has a pronounced region of low S that general greedy (agglomerative or divisive) algorithms may fail to cross. Borrowing a concept from quantum mechanics, we call this the “potential well” effect.
Now consider a partition of the single-level networks in which only one group of x dense subgraphs merges. For small increments of q and \(\bar{q}\) relative to the predefined partition, the increment of Quasi-Surprise can be estimated by,
where \({\rm{\Delta }}q=\frac{2(x-1){p}_{out}}{{p}_{in}+2{p}_{out}}\cdot \frac{1}{r}\) and \({\rm{\Delta }}\bar{q}=({x}^{2}-x)/{r}^{2}\). Solving ΔS = 0 for r gives,
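A leading-order sketch of the single-group threshold, obtained from the first-order expansion of S = m D(q ∥ \(\bar{q}\)) around the predefined partition. It assumes q_1 = 1 − a with a = 2p_out/(p_in + 2p_out), \(\bar{q}_1\) = 1/r, Δq = (x − 1)a/r, and Δq̄ = (x² − x)/r², which follows from \(\bar{q}\) ≈ x/r when all groups merge; it is our approximation, not necessarily the paper's exact expression.

```latex
\begin{aligned}
\frac{\Delta S}{m} &\approx \Delta q\,\ln\frac{q_1(1-\bar q_1)}{\bar q_1(1-q_1)}
  + \Delta\bar q\left(\frac{1-q_1}{1-\bar q_1}-\frac{q_1}{\bar q_1}\right)\\
&= \frac{x-1}{r}\left[a\ln\frac{q_1(r-1)}{a} - x\,q_1 + \frac{x\,a}{r-1}\right],\\
\Delta S = 0,\ r\gg 1 \;&\Longrightarrow\;
r_{x'} \approx \frac{a}{q_1}\exp\!\left(\frac{x\,q_1}{a}\right)
 = \frac{2p_{out}}{p_{in}}\exp\!\left(\frac{x\,p_{in}}{2p_{out}}\right).
\end{aligned}
```

For p_out/p_in = 0.1 and x = 2 this estimate gives \(r_{2'}\approx 0.2\,e^{10}\approx 4.4\times 10^{3}\), illustrating how far the single-group threshold can lie above the all-groups threshold in modular networks.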
Figure 4B clearly shows that the line “x = 2 (1 group)” lies above the line “x = 2”, and the line “x = 3 (1 group)” lies above the line “x = 3”, where “(1 group)” denotes partitions in which only x (e.g. 2 or 3) dense subgraphs merge into one group while all other dense subgraphs remain separate. This counterintuitive result means that merging a single group of x dense subgraphs is harder than merging all r/x groups.
Figure 5A,B confirms this conclusion. When r > r_{2}, the partition with r/2 groups of 2 merged dense subgraphs already has a larger S than the predefined one. However, an agglomerative algorithm accepts the merging of a single group of 2 dense subgraphs only when \(r > {r}_{2^{\prime} }\), because below that point such a merge decreases S even if r > r_{2} (see Fig. 5C–J). As a result, the expected 2-community-merging partition may not be found until \(r > {r}_{2^{\prime} }\).
Figure 5C–J clearly shows that the increment ΔS decreases monotonically with the number of merged groups of 2 dense subgraphs when r < r_{2}, and increases monotonically when \(r > {r}_{2^{\prime} }\). In these two cases, general agglomerative algorithms should be able to find the optimal partition for S. However, when \({r}_{2} < r < {r}_{2^{\prime} }\), there is a “potential well”: ΔS first decreases and then increases. The same holds for 3-community merging. General greedy algorithms may find it difficult to cross the “potential well”. This is frustrating, because it may lead some algorithms to “false” optimal results; from another viewpoint, however, it is “good” news, because S is then less prone to the resolution problem than Equation (3) suggests.
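The well can be reproduced by tracing S/m as k groups of 2 adjacent dense subgraphs are merged one at a time. This sketch assumes q = q_1 + k(x − 1)a/r and \(\bar{q}\) = 1/r + k(x² − x)/r² (expected counts, large n), with illustrative parameters r = 400 and p_out/p_in = 0.1 chosen to fall between the two thresholds under these approximations; `quasi_surprise_k_groups` is a hypothetical helper.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) for Bernoulli distributions, with 0 * ln 0 = 0 (needs 0 < y < 1)."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def quasi_surprise_k_groups(k, r, p_in, p_out, x=2):
    """S/m when k disjoint groups of x adjacent dense subgraphs are merged."""
    a = 2 * p_out / (p_in + 2 * p_out)
    q = (1 - a) + k * (x - 1) * a / r       # intracommunity link fraction
    qbar = 1 / r + k * (x * x - x) / r ** 2  # its expected value
    return kl_bernoulli(q, qbar)


r, p_in, p_out = 400, 1.0, 0.1
profile = [quasi_surprise_k_groups(k, r, p_in, p_out) for k in range(r // 2 + 1)]
k_min = min(range(len(profile)), key=profile.__getitem__)
print("first merge changes S/m by", profile[1] - profile[0])
print("merging all groups changes S/m by", profile[-1] - profile[0])
print("bottom of the well at k =", k_min)
```

Here the first merge lowers S while merging every pair raises it, so a greedy agglomerative step that only accepts increases in S stops at k = 0.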
Could divisive algorithms solve the “potential well” problem? Probably not, because another kind of “potential well” exists for them in some cases. As shown in Fig. 5F,J, S for Partition X = 2 is less than that for Partition X = 1, yet disconnecting dense subgraphs, i.e. decreasing the fraction of merged dense subgraphs, first decreases S. Because of this “potential well”, general greedy divisive algorithms may be unable to move, e.g., from Partition X = 2 to Partition X = 1. Perhaps only ergodic but time-consuming algorithms can overcome the “potential well” and find the “true” optimal partition for Quasi-Surprise.
Does the “potential well” problem also exist for other measures? We have confirmed that the original Surprise also has a “potential well”, whereas none was found for Significance or Modularity because of their additive (accumulative) property over communities (see Fig. S4). We have proved rigorously that Significance has the same critical point for the merging of r/2 groups and of a single group of 2 dense subgraphs,
where \(H(y)=-y\,\mathrm{ln}(y)-(1-y)\,\mathrm{ln}(1-y)\), p_{1} = p_{i}, p_{2} = (p_{i} + p_{o})/2 and p = (p_{i} + 2p_{o})/r. Modularity also has the same critical point \({r}_{2}={r}_{2^{\prime} }={p}_{in}/{p}_{out}+2\) in both cases. Therefore neither shows the “potential well” effect (see SI for the proof).
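The additivity argument can be illustrated numerically. This sketch assumes the Significance form Σ_c binom(n_c, 2) D(p_c ∥ p) of Traag et al. with binom(k, 2) ≈ k²/2, so merging one pair of dense subgraphs changes Significance by n_c²·g and merging all r/2 pairs by (r/2)·n_c²·g, with the same factor g = 2D(p_2 ∥ p) − D(p_1 ∥ p); both moves therefore change sign at the same r. The closed form drops O(p) terms and is accurate only when p = (p_i + 2p_o)/r is small. Function names and the subgraph size `n_c` are hypothetical.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) for Bernoulli distributions, with 0 * ln 0 = 0 (needs 0 < y < 1)."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def entropy(y):
    """H(y) = -y ln y - (1 - y) ln(1 - y), with 0 * ln 0 = 0."""
    return -sum(t * math.log(t) for t in (y, 1 - y) if t > 0)


def significance_gain_factor(r, p_i, p_o):
    """g = 2 D(p2 || p) - D(p1 || p), shared by one-pair and all-pairs merges."""
    p1, p2 = p_i, (p_i + p_o) / 2
    p = (p_i + 2 * p_o) / r
    return 2 * kl_bernoulli(p2, p) - kl_bernoulli(p1, p)


def significance_critical_r(p_i, p_o):
    """(p_i + 2 p_o) exp{(2H(p2) - H(p1)) / (2 p2 - p1)}, O(p) terms dropped."""
    p1, p2 = p_i, (p_i + p_o) / 2
    return (p_i + 2 * p_o) * math.exp((2 * entropy(p2) - entropy(p1)) / (2 * p2 - p1))


n_c = 20  # hypothetical dense-subgraph size
r_star = significance_critical_r(1.0, 0.1)
for r in (0.99 * r_star, 1.01 * r_star):
    g = significance_gain_factor(r, 1.0, 0.1)
    print(f"r = {r:.4g}: one pair {n_c ** 2 * g:+.3g}, all pairs {(r / 2) * n_c ** 2 * g:+.3g}")
```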
Because Table 1 shows only slight signs of the “potential well” effect, we further tested it for the different measures by optimizing them directly (Fig. 6). As discussed above, when the number r of dense subgraphs is large enough that S2 (Q2) > S1 (Q1), dense subgraphs should merge; in these networks, for example, the identified partition should switch from the first-level partition to the second-level one. However, because of the “potential well”, Quasi-Surprise and Surprise show clear delays in this transition as r increases: the identified partition remains the first-level one even though S2 (Q2) > S1 (Q1). No such delay is observed for Significance and Modularity. This confirms that the “potential well” effect exists for Quasi-Surprise and Surprise but not for Significance and Modularity.
Effect of heterogeneity of vertex degree and community size
To study how heterogeneity of vertex degrees and community sizes affects Quasi-Surprise, we further apply the various methods to a classical class of modular benchmark networks, the Lancichinetti-Fortunato-Radicchi (LFR) networks^{58} (Fig. 1C), which mimic general properties of many real-world networks, such as community structure and heterogeneous vertex degrees and community sizes.
First, we generate a set of networks with increasing heterogeneity of vertex degrees and community sizes by increasing the maximal degree and the maximal community size. Because of random fluctuations of links, these two kinds of heterogeneity gradually make the link density within communities inhomogeneous. When this inhomogeneity is large enough in a community, the community may be split. The results (Tables 2 and S4 and Fig. S7) show that (1) almost all methods work well in homogeneous networks; (2) heterogeneity of community size leads to splitting of communities (see Sq, Sp, Sg and Mod); (3) heterogeneity of degree further aggravates this splitting; and (4) the partitions found by Sq, Sp and Sg contain more groups of vertices than the predefined ones, because these measures (as well as LP^{19}) are more inclined to split communities than other methods (such as Modularity^{22,36}, Infomap^{10}, Walktrap^{11} and OSLOM^{59}).
Next, we study the effect of the mean degree. As the mean degree decreases, links within communities become sparser, and random fluctuations gradually make the link density within communities inhomogeneous. As a result, communities again tend to split (see Tables S5 and S6, Figs S8–S11 and the accompanying analysis).
We also analyzed the effect of network size on these results. For Quasi-Surprise, Surprise, Significance and Modularity, as network size increases, (1) the tendency to split communities gradually weakens; (2) the difference between the identified and predefined partitions gradually decreases; and (3) the normalized mutual information (NMI) gradually increases (see Tables 2 and S4–S6, Figs S8–S11).
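NMI compares an identified partition with the predefined one. The exact normalization used here is not stated, so this sketch assumes the common form NMI = 2I(A;B)/(H(A) + H(B)), with partitions given as per-vertex label lists; `nmi` is a hypothetical helper.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)) of two partitions."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * math.log(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions put every vertex in one community
    return 2 * mutual / (h_a + h_b)


predefined = [0, 0, 0, 0, 1, 1, 1, 1]
split = [0, 0, 2, 2, 1, 1, 1, 1]  # first community split in two
print(nmi(predefined, predefined), nmi(predefined, split))
```

Splitting a predefined community lowers the NMI below 1, which is why excessive splitting shows up as reduced NMI in the tables.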
Extension to multiscale networks
Analysis of critical behaviors in multiscale networks
We further study the critical behaviors of various measures in networks with two-scale community structures, a generalization of the single-scale networks (Fig. 1B). In these networks, the critical r value of Sq can be estimated by (see SI for the proof),
For the merging of only one group of 2 (first-level) communities in these networks,
When p_{out1} = p_{out2}, the networks are equivalent to the single-scale networks. When p_{out1} > p_{out2}, two-scale structures emerge. As p_{out2}/p_{in} decreases, \({r}_{2}^{\ast }\) decreases. When p_{out2} = 0, \({r}_{2}^{\ast }\approx \frac{{p}_{out1}}{{p}_{in}}\cdot {(\frac{2{p}_{in}}{{p}_{in}+{p}_{out1}})}^{\frac{{p}_{in}}{{p}_{out1}}+1}\), so the effect of p_{out2} is limited and \({r}_{2}^{\ast }\) is mainly determined by p_{out1}/p_{in}; equation (9) behaves similarly (see Fig. S6). See SI for a detailed analysis of the critical behaviors of the various measures in these networks, and Figs 4D, S5 and S6 for a systematic comparison of their phase diagrams (Sq, Sp, Sg and Mod).
In these networks, (Quasi)Surprise also has \({r}_{2} < {r}_{2^{\prime} }\) (Figs 4D, S5 and S6). On the one hand, this means that the “potential well” effect persists when \({r}_{2} < r < {r}_{2^{\prime} }\), so a general greedy optimization algorithm may favor either the first-level partition (agglomerative algorithms, see Fig. 6) or the second-level partition (divisive algorithms), leading to “false” optimal solutions. The identified level therefore depends on the algorithm applied.
On the other hand, which level is identified is closely related to the number of communities in the network, partly because (Quasi)Surprise is a single-scale method. When r < r_{2}, the first level is found; when \(r > {r}_{2^{\prime} }\), the second level is preferred; and when \({r}_{2} < r < {r}_{2^{\prime} }\), the identified level depends on whether the optimization algorithm can cross the “potential well” to find the true optimum. Moreover, the critical value \({r}_{2^{\prime} }\) (as well as r_{2}) decreases quickly as p_{out1}/p_{in} increases; therefore, for a network with fixed r, small p_{out1}/p_{in} favors the first level, while large p_{out1}/p_{in} favors the second level (in this case, communities merge easily).
Because of its additive (accumulative) property, Modularity has the same critical value \({r}_{2}^{\ast }\) for the merging of r/2 groups and of a single group of 2 communities in these networks, \({r}_{2}^{\ast }=({p}_{in}+{p}_{out1}+{p}_{out2})/{p}_{out1}\), and so does Significance, \({r}_{2}^{\ast }=({p}_{in}+{p}_{out1}+{p}_{out2})\cdot \exp \{(2H({p}_{2})-H({p}_{1}))/(2{p}_{2}-{p}_{1})\}\), where p_{1} = p_{in} and p_{2} = (p_{in} + p_{out1})/2 (see SI). They therefore do not show the potential-well phenomenon, but they still have a similar resolution problem in these networks: the second level is found when \(r > {r}_{2}^{\ast }\), and the first level otherwise.
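The two closed forms above are easy to evaluate side by side. This sketch simply codes them (function names are hypothetical) and uses the fact that setting p_out2 = p_out1 should recover the single-level thresholds, e.g. p_in/p_out + 2 for Modularity.

```python
import math


def entropy(y):
    """H(y) = -y ln y - (1 - y) ln(1 - y), with 0 * ln 0 = 0."""
    return -sum(t * math.log(t) for t in (y, 1 - y) if t > 0)


def modularity_r2_double(p_in, p_out1, p_out2):
    """(p_in + p_out1 + p_out2) / p_out1."""
    return (p_in + p_out1 + p_out2) / p_out1


def significance_r2_double(p_in, p_out1, p_out2):
    """(p_in + p_out1 + p_out2) exp{(2H(p2) - H(p1)) / (2 p2 - p1)}."""
    p1, p2 = p_in, (p_in + p_out1) / 2
    return (p_in + p_out1 + p_out2) * math.exp(
        (2 * entropy(p2) - entropy(p1)) / (2 * p2 - p1)
    )


for p_out2 in (0.1, 0.05, 0.0):
    print(p_out2, modularity_r2_double(1.0, 0.1, p_out2),
          f"{significance_r2_double(1.0, 0.1, p_out2):.3g}")
```

In both formulas p_out2 enters only through the prefactor, so lowering it shifts the threshold only mildly, while p_out1/p_in controls the exponent (Significance) or the denominator (Modularity).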
Extension to multiscale community detection
As discussed above, (Quasi)Surprise, like many other methods, is a single-scale method with limited resolution, so the identified partition or level depends closely on network parameters such as the inter- and intracommunity link densities and the number of communities (note that the “potential well” effect also depends on the number of communities). Since multiscale structures exist widely in complex networks, developing multiresolution (multiscale) methods is important. Here, we propose an extension of Quasi-Surprise to multiscale networks by adjusting the random model (see Method), because the original Quasi-Surprise depends closely on the difference between the probability that links lie within communities and its expected value under the random model. Other measures such as Significance and Modularity can be extended to multiscale networks in the same way (see Method). The exact formulations of the methods are provided in the additional material.
To demonstrate the effectiveness of the multiscale methods in detecting communities at different scales, we apply them to two kinds of networks with multiscale community structures (Figs 7, S12 and S13). The results show that they identify the partitions at the predefined scales. The proposed multiscale extensions are simple yet effective, and provide alternative approaches for analyzing community structure at different scales.
Discussion and Conclusion
Community (or module) structure exists widely in complex networks, and detecting communities is an important problem in the study of complex networks. Many methods have been proposed to detect community structure, and optimizing a quality function is one of the most important strategies. Existing methods help discover the intrinsic structure of networks, but each has its own scope of application. It is therefore necessary to study the behaviors of these methods, both for theory and for real applications: doing so helps us understand the methods themselves and can drive their improvement or the development of more effective ones. However, little work has so far examined the behaviors of (Quasi)Surprise, a statistical measure of community structure.
This paper provides a detailed study of the critical behaviors of (Quasi)Surprise, accompanied by comparisons with other methods, including Significance, Modularity, Infomap, Walktrap, LP and OSLOM. We analyzed the effect of various network parameters on (Quasi)Surprise and derived the critical number of dense subgraphs for merging/splitting. To display the phase transitions of the measures from one partition to another, we provided phase diagrams of the critical points, which give a clear comparison across measures. The critical number of dense subgraphs for (Quasi)Surprise increases superexponentially with the difference between intra- and intercommunity link probabilities, but it is close to that of Modularity when this difference is small; in that regime, the gap between the resolutions of (Quasi)Surprise and Modularity is far smaller than in the most modular structures.
We revealed a “potential well” effect for (Quasi)Surprise in community detection. In some cases, merging just one group of x dense subgraphs may be harder than merging all r/x groups, because when \({r}_{2} < r < {r}_{2^{\prime} }\), (Quasi)Surprise as a function of the number of merged dense subgraphs has a pronounced “potential well”. Greedy (agglomerative or divisive) algorithms, e.g. the popular Louvain algorithm, generally have difficulty crossing it. This may yield “false” optimal solutions, although it may also implicitly mitigate, to some extent, the resolution problem or excessive splitting of communities. Perhaps only ergodic but time-consuming algorithms, e.g. simulated annealing, can avoid the problem.
Overall, (Quasi)Surprise tends to split communities because of heterogeneity in link density, degree and community size; it often displays higher resolution and thus identifies more communities than other methods such as Modularity, but it may also split communities excessively when the link density inside communities is inhomogeneous, e.g. because of heterogeneous degrees and community sizes.
Moreover, multiscale structures are believed to exist widely in complex networks. In multiscale networks, such as the double-level networks above, different methods may identify structures at different scales. (Quasi)Surprise is a statistical measure of interest for community detection, but it is a single-scale method. Our results suggest the need for multiresolution methods, although developing them may not be easy for (Quasi)Surprise. We proposed an extension of Quasi-Surprise to multiscale networks, which provides an alternative approach for identifying multiscale structures.
Finally, we expect that the above analysis will help in understanding the critical behaviors of these statistical measures and provide useful insight for developing more effective community-detection methods in complex networks.
Method
Networks
Single-level networks
For convenience of analysis, we construct a set of community-loop networks with single-level community structure, in which r “dense subgraphs” (predefined communities) are placed on a circle and connected to their adjacent ones. For each network, let n_{c} be the number of vertices in each (predefined) community, so that n = r · n_{c} is the number of vertices in the network; p_{in} is the probability of linking vertices within the same (predefined) community; p_{out} is the probability of linking vertices in two adjacent (predefined) communities; and \(m=r\cdot {n}_{c}^{2}{p}_{in}+2r\cdot {n}_{c}^{2}{p}_{out}\) is the number of edges in the network (Fig. 1A).
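A minimal generator for this construction, assuming independent Bernoulli links and the standard random module. In this sketch each unordered pair is sampled once, so the expected edge count is r·n_c(n_c − 1)/2·p_in + r·n_c²·p_out; the ratios q and \(\bar{q}\) used in the analysis do not depend on this overall normalization. `community_loop_graph` and `predefined_labels` are hypothetical names.

```python
import random


def community_loop_graph(r, n_c, p_in, p_out, seed=None):
    """Single-level community loop as a set of edges (i, j) with i < j.

    Dense subgraph s holds vertices s*n_c .. (s+1)*n_c - 1. Pairs inside a
    subgraph are linked with probability p_in; pairs in subgraphs s and
    s+1 (mod r) with probability p_out.
    """
    if r < 3:
        raise ValueError("need r >= 3 so that adjacent pairs on the loop are distinct")
    rng = random.Random(seed)
    edges = set()
    for s in range(r):
        block = range(s * n_c, (s + 1) * n_c)
        t = (s + 1) % r
        next_block = range(t * n_c, (t + 1) * n_c)
        for i in block:
            for j in block:
                if i < j and rng.random() < p_in:
                    edges.add((i, j))
            for j in next_block:
                if rng.random() < p_out:
                    edges.add((min(i, j), max(i, j)))
    return edges


def predefined_labels(r, n_c):
    """Predefined partition: vertex v belongs to dense subgraph v // n_c."""
    return [v // n_c for v in range(r * n_c)]


edges = community_loop_graph(r=6, n_c=10, p_in=0.9, p_out=0.05, seed=1)
print(len(edges), predefined_labels(6, 10)[:12])
```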
Double-level networks
To further analyze the merging and breakup of communities, we construct community-loop networks with double-level community structure. Let r be the number of first-level communities and n_{c} the number of vertices in each of them, so that n = r · n_{c} is the number of vertices in the network. p_{in} is the probability of linking vertices within the same first-level community; p_{out1} is the probability of linking vertices in adjacent first-level communities that belong to the same second-level community; and p_{out2} (< p_{out1}) is the probability of linking vertices in adjacent first-level communities that belong to different second-level communities (Fig. 1B).
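A minimal generator sketch for the double-level loop. The grouping is an assumption: consistent with the r/2-group formulas used for these networks, first-level communities 2t and 2t + 1 are taken to form second-level community t, so the link between neighbors s and s + 1 uses p_out1 when s is even and p_out2 otherwise. `double_level_loop_graph` is a hypothetical name.

```python
import random


def double_level_loop_graph(r, n_c, p_in, p_out1, p_out2, seed=None):
    """Double-level community loop as a set of edges (i, j) with i < j.

    Assumed grouping: first-level communities 2t and 2t+1 form second-level
    community t. Neighbours s and s+1 (mod r) are linked with p_out1 when they
    share a second-level community (s even) and with p_out2 otherwise.
    """
    if r < 4 or r % 2:
        raise ValueError("need an even r >= 4")
    rng = random.Random(seed)
    edges = set()
    for s in range(r):
        t = (s + 1) % r
        p_between = p_out1 if s % 2 == 0 else p_out2
        for i in range(s * n_c, (s + 1) * n_c):
            for j in range(i + 1, (s + 1) * n_c):  # inside first-level community s
                if rng.random() < p_in:
                    edges.add((i, j))
            for j in range(t * n_c, (t + 1) * n_c):  # towards the next community
                if rng.random() < p_between:
                    edges.add((min(i, j), max(i, j)))
    return edges


edges = double_level_loop_graph(r=8, n_c=10, p_in=0.9, p_out1=0.2, p_out2=0.02, seed=1)
print(len(edges))
```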
LFR networks
Lancichinetti-Fortunato-Radicchi (LFR) networks are a classical class of modular benchmark networks that mimic general properties of many real-world networks, such as community structure and heterogeneous vertex degrees and community sizes^{58}. In these networks, vertex degrees and community sizes follow power-law distributions with exponents τ_{1} and τ_{2}, respectively, and a common mixing parameter μ controls the ratio between each vertex's external degree (with respect to its community) and its total degree. The smaller μ, the more pronounced the communities. Here, low values of μ are used, so communities are well separated from each other (Fig. 1C).
Hierarchical networks
The networks have 256 vertices and a two-scale community structure^{14}. The first scale consists of 4 groups of 64 vertices, and the second scale consists of 16 groups of 16 vertices. Each vertex has k_{in0} links to vertices in its innermost community (its group of 16), k_{in1} links to vertices in its outermost community (the rest of its group of 64), and 1 link to any other vertex chosen at random in the network.
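A sketch of sampling such a network, assuming each pair of vertices is linked independently with a probability chosen so that the expected numbers of links per vertex are k_in0 (within its group of 16), k_in1 (to the other 48 vertices of its group of 64) and k_out = 1 (to the 192 vertices outside its group of 64). Reading the single random link as pointing outside the group of 64 is an assumption, and `hierarchical_network` is a hypothetical name.

```python
import random
from itertools import combinations


def hierarchical_network(k_in0, k_in1, k_out=1, seed=None):
    """Sample a 256-vertex two-scale benchmark (illustrative sketch).

    Vertex v lies in fine group v // 16 (16 groups of 16) and coarse group
    v // 64 (4 groups of 64).  Pair probabilities are set so that, on
    average, each vertex has k_in0, k_in1 and k_out links at the three levels.
    """
    if k_in0 > 15 or k_in1 > 48 or k_out > 192:
        raise ValueError("expected degrees exceed the available partners")
    rng = random.Random(seed)
    p0 = k_in0 / 15    # 15 other vertices in the same fine group
    p1 = k_in1 / 48    # 48 other vertices in the same coarse group
    p2 = k_out / 192   # 192 vertices outside the coarse group
    edges = set()
    for u, v in combinations(range(256), 2):
        if u // 16 == v // 16:
            p = p0
        elif u // 64 == v // 64:
            p = p1
        else:
            p = p2
        if rng.random() < p:
            edges.add((u, v))
    return edges
```

With k_in0 = 15 and no other links, the sketch produces 16 disjoint complete groups of 16 vertices, a quick sanity check of the group indexing.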
Statistical measures for community structure
Surprise (S_{p})
Surprise is a statistical measure of the quality of a community partition, with higher values corresponding to better partitions^{38}. It has been shown to characterize community structure better than modularity on several complex benchmarks. Given a community partition of a network, Surprise is defined as the negative logarithm of the probability that at least the observed number of intracommunity links appears in an Erdős-Rényi random network. Following a cumulative hypergeometric distribution, it can be written as
\[{S}_{p}=-\log \sum _{j={m}_{{\rm{int}}}}^{\min (m,{M}_{{\rm{int}}})}\frac{\binom{{M}_{{\rm{int}}}}{j}\binom{M-{M}_{{\rm{int}}}}{m-j}}{\binom{M}{m}},\]
where M is the maximal possible number of links in a network; M_{int} is the maximal possible number of intracommunity links in a given partition; m is the number of existing links in the network; while m_{int} is the number of existing intracommunity links in the partition.
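A minimal sketch of evaluating Surprise exactly from the four counts just defined. It uses exact integer binomials so that the very small tail probability does not underflow, and the natural logarithm, which is an assumption (the base only rescales S_p). The function name is hypothetical.

```python
import math


def surprise(M, M_int, m, m_int):
    """Exact Surprise: -ln P(X >= m_int), X ~ Hypergeometric(M, M_int, m).

    M: possible links in the network, M_int: possible intracommunity links,
    m: existing links, m_int: existing intracommunity links.
    """
    upper = min(m, M_int)
    if m_int > upper:
        raise ValueError("m_int cannot exceed min(m, M_int)")
    # Numerator of the cumulative hypergeometric tail, as an exact integer
    tail = sum(math.comb(M_int, j) * math.comb(M - M_int, m - j)
               for j in range(m_int, upper + 1))
    return math.log(math.comb(M, m)) - math.log(tail)
```

For a partition of an n-vertex graph into communities of sizes n_s, one would pass M = n(n−1)/2 and M_int = Σ_s n_s(n_s−1)/2.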
Quasi-Surprise (S_{q})
The original definition of Surprise applies to unweighted networks and involves complex nonlinear terms, which make theoretical analysis and numerical computation difficult. It is therefore useful to have an effective approximate expression for Surprise. Keeping only the dominant term of the sum and applying Stirling’s approximation to the binomial coefficients yields Quasi-Surprise,
\[{S}_{q}=m\,D(q\,\|\,\bar{q}),\]
where q = m_{int}/m is the probability that a link lies within a community; \(\bar{q}={M}_{{\rm{int}}}/M\) is the expected value of q; and \(D(x\,\|\,y)=x\,\mathrm{ln}\,\tfrac{x}{y}+(1-x)\,\mathrm{ln}\,\tfrac{1-x}{1-y}\) is the Kullback-Leibler divergence, which measures how much a Bernoulli distribution with parameter x differs from one with parameter y^{40}.
The original Quasi-Surprise is based on the difference between the probability that links lie within communities and its expected value in the random model. We propose an alternative way to extend Quasi-Surprise to the multiscale case: a resolution parameter γ adjusts the expected value in the random model, which yields the multiscale Quasi-Surprise S_{q}(γ).
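A minimal sketch of Quasi-Surprise as m·D(q‖q̄). For the multiscale variant, the sketch assumes, purely for illustration, that γ multiplies the expected value q̄, so that γ = 1 recovers the standard measure; this form is an assumption, and `kl_bernoulli` and `quasi_surprise` are hypothetical names.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) between Bernoulli(x) and Bernoulli(y); 0 * ln 0 is taken as 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(x, y) + term(1 - x, 1 - y)


def quasi_surprise(M, M_int, m, m_int, gamma=1.0):
    """S_q = m * D(q || gamma * qbar), with q = m_int / m and qbar = M_int / M.

    gamma = 1 gives the standard Quasi-Surprise; scaling qbar by gamma is an
    ASSUMED form of the multiscale variant, used for illustration only.
    """
    q = m_int / m
    qbar = gamma * M_int / M
    if not 0 < qbar < 1:
        raise ValueError("gamma * M_int / M must lie strictly between 0 and 1")
    return m * kl_bernoulli(q, qbar)
```

Under this assumed form, raising γ moves the reference value γq̄ toward q whenever q > γq̄, so partitions with many intracommunity links score lower at higher resolution.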
Significance (S_{g})
Significance is another recently proposed measure of the quality of community structure; it evaluates how likely it is that dense communities appear in random networks^{41}. It is defined as
\[{S}_{g}=\sum _{s}\binom{{n}_{s}}{2}D({p}_{s}\,\|\,p).\]
Here the sum runs over all communities; n_{s} is the number of vertices in community s; the density p_{s} of community s is the ratio of the number of existing edges in the community to the maximum possible number; and the density p of the network is the same ratio for the whole network. Significance can also be optimized directly as an objective function to find the optimal community partition.
Because the original Significance is based on the difference between the density of each community and the density of the network, we extend it to the multiscale case by using a resolution parameter γ to adjust the density of the network. This yields the multiscale Significance S_{g}(γ).
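A minimal sketch of Significance computed from per-community vertex and edge counts. The optional `gamma` scales the network density p; as with the Quasi-Surprise sketch, this is an assumed illustration of "adjusting the density of the network", and γ = 1 gives the standard measure. The function names are hypothetical.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) between Bernoulli(x) and Bernoulli(y); 0 * ln 0 is taken as 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(x, y) + term(1 - x, 1 - y)


def significance(community_sizes, community_edges, n, m, gamma=1.0):
    """S_g = sum_s C(n_s, 2) * D(p_s || gamma * p).

    community_sizes[s] = n_s, community_edges[s] = edges inside community s,
    n = vertices in the network, m = edges in the network.
    """
    p = gamma * m / math.comb(n, 2)        # (scaled) density of the network
    if not 0 < p < 1:
        raise ValueError("gamma * network density must lie in (0, 1)")
    total = 0.0
    for n_s, m_s in zip(community_sizes, community_edges):
        pairs = math.comb(n_s, 2)
        if pairs == 0:
            continue                       # singletons contribute nothing
        total += pairs * kl_bernoulli(m_s / pairs, p)
    return total
```

For two triangles joined by one bridge (n = 6, m = 7), each triangle has p_s = 1 against p = 7/15, so S_g = 6 ln(15/7), while putting all vertices in one community gives S_g = 0.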
Modularity (Q or Mod)
For a given community division of a network, modularity is defined as^{36}
where M is the total number of edges in the network, \({k}_{s}^{in}\) is the inner degree of group s, k_{s} is the total degree of group s, and the sum runs over all communities in the network. Modularity is the fraction of edges that fall within communities minus the expected value of this fraction in a random graph (the null model). In general, the larger the modularity, the better the division, and it has become one of the most popular quality functions for community detection.
To detect communities at different scales, a multiscale modularity Q(γ) can be defined, where γ is the resolution parameter.
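A minimal sketch of modularity with a resolution parameter, written in the standard form Q(γ) = Σ_s [l_s/M − γ(k_s/2M)²], where l_s is the number of edges inside community s. We assume this Reichardt-Bornholdt-style form for illustration (γ = 1 recovers Newman-Girvan modularity) rather than quoting the paper's exact expression; `modularity` is a hypothetical helper.

```python
from collections import defaultdict


def modularity(edges, membership, gamma=1.0):
    """Q(gamma) = sum_s [ l_s / M - gamma * (k_s / (2 M))**2 ].

    l_s: edges inside community s; k_s: total degree of community s;
    M: number of edges.  gamma = 1 gives Newman-Girvan modularity.
    """
    M = len(edges)
    inner = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            inner[membership[u]] += 1
    return sum(inner[s] / M - gamma * (degree[s] / (2 * M)) ** 2 for s in degree)
```

For two disjoint triangles split into the two triangles, Q = 2(3/6 − 1/4) = 0.5; adding one bridge edge lowers it to 5/14.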
Normalized mutual information (NMI)
This measure comes from information theory and estimates the similarity between two community partitions^{42}. NMI = 1 when the two partitions match perfectly, and the poorer the match, the smaller the NMI. NMI is often used to evaluate community detection methods on networks with known community structure, by measuring how much of that structure a method correctly recovers.
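A compact sketch of NMI for two non-overlapping partitions, using the common normalization 2I(A;B)/(H(A)+H(B)). This variant is an assumption for illustration: the NMI introduced in ref. 42 is designed for overlapping covers and is computed differently. The function name is hypothetical.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)) of two
    non-overlapping partitions given as equal-length label lists."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in ca.values())
    h_b = -sum(c / n * math.log(c / n) for c in cb.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every vertex in one group
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in cab.items())
    return 2 * mi / (h_a + h_b)
```

Relabeling communities does not change the value, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1, whereas two independent splits such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 0.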
Critical number of dense subgraphs in partition transition
For the single-level networks, we derive the critical number \({r}_{{x}_{1}\to {x}_{2}}^{\ast }\) of dense subgraphs at which the preferred partition changes from Partition X_{1} to Partition X_{2}, where Partition X denotes the partition in which the dense subgraphs are merged into r/x groups of x adjacent subgraphs each. When \({\rm{\Delta }}S=S({x}_{2})-S({x}_{1}) > 0\), Partition X_{2} is preferred over Partition X_{1}. Here, ΔS divided by the number of links can be written as,
where \(r-{x}_{1}\approx r-{x}_{2}\approx r\), \(D(x\,\|\,y)=x\,\mathrm{ln}\,\tfrac{x}{y}+(1-x)\,\mathrm{ln}\,\tfrac{1-x}{1-y}\) and \(\beta =(\frac{{x}_{2}-1}{{x}_{2}}-\frac{{x}_{1}-1}{{x}_{1}})\frac{2{p}_{out}}{{p}_{in}+2{p}_{out}}\).
Solving ΔS = 0 for r gives the critical number of dense subgraphs, \({r}_{{x}_{1}\to {x}_{2}}^{\ast }\).
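As a numerical cross-check of this threshold for Quasi-Surprise (S = m·D(q‖q̄)), the sketch below assumes the following for a ring of dense subgraphs merged into groups of x adjacent ones: the intracommunity link fraction is q_x = 1 − f/x with f = 2p_out/(p_in + 2p_out), which makes q_{x2} − q_{x1} equal to β above, and q̄_x ≈ x/r. It then finds the sign change of ΔS/m by bisection instead of a closed form. These approximations and the names `delta_s_per_link` and `critical_r` are illustrative assumptions.

```python
import math


def kl_bernoulli(x, y):
    """D(x || y) between Bernoulli(x) and Bernoulli(y); 0 * ln 0 is taken as 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(x, y) + term(1 - x, 1 - y)


def delta_s_per_link(r, x1, x2, p_in, p_out):
    """Delta S / m = D(q_x2 || x2 / r) - D(q_x1 || x1 / r) on a community loop."""
    f = 2 * p_out / (p_in + 2 * p_out)   # fraction of inter-subgraph links
    q1, q2 = 1 - f / x1, 1 - f / x2
    return kl_bernoulli(q2, x2 / r) - kl_bernoulli(q1, x1 / r)


def critical_r(x1, x2, p_in, p_out, lo=None, hi=1e9, tol=1e-9):
    """Bisection for the (real-valued) r at which Delta S changes sign."""
    lo = 2.0 * x2 if lo is None else lo

    def g(r):
        return delta_s_per_link(r, x1, x2, p_in, p_out)

    if not g(lo) < 0 < g(hi):
        raise ValueError("root not bracketed; adjust lo and hi")
    while hi - lo > tol * lo:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid   # below threshold: Partition X1 still preferred
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With p_in = 0.5, p_out = 0.05, x_1 = 1 and x_2 = 2, the root found this way lies above 12, the corresponding threshold obtained from modularity for the same parameters, in line with Quasi-Surprise resisting the merging of dense subgraphs longer than modularity does.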
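For comparison, we derive the corresponding critical point of modularity in the same networks, for the same merging of the dense subgraphs into r/x groups of x. For Partition X,
\[{Q}_{x}\approx 1-\frac{1}{x}\,\frac{2{p}_{out}}{{p}_{in}+2{p}_{out}}-\frac{x}{r}.\]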
Solving Q_{x2} − Q_{x1} = 0 for r gives
\[{r}_{{x}_{1}\to {x}_{2}}^{\ast }=\frac{{p}_{in}+2{p}_{out}}{2{p}_{out}}\,{x}_{1}{x}_{2}.\]
For x_{1} = 1, \({r}_{x}^{\ast }=\frac{{p}_{in}+2{p}_{out}}{2{p}_{out}}{x}_{2}\).
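A quick numerical check of the modularity threshold, assuming (as a derivation sketch under the same ring model) that Q_x ≈ 1 − f/x − x/r with f = 2p_out/(p_in + 2p_out). Equating Q_{x2} and Q_{x1} then gives r* = x_1x_2(p_in + 2p_out)/(2p_out), which reduces to the expression above for x_1 = 1. The function names are hypothetical.

```python
def modularity_partition(r, x, p_in, p_out):
    """Approximate Q for r/x groups of x adjacent dense subgraphs on a ring."""
    f = 2 * p_out / (p_in + 2 * p_out)  # fraction of inter-subgraph links
    return 1 - f / x - x / r


def modularity_critical_r(x1, x2, p_in, p_out):
    """r* = x1 * x2 * (p_in + 2 p_out) / (2 p_out), where Q_x2 = Q_x1."""
    return x1 * x2 * (p_in + 2 * p_out) / (2 * p_out)
```

With p_in = 0.5 and p_out = 0.05, r* = 12 for x_1 = 1 and x_2 = 2: for any larger r, modularity already prefers merging pairs of dense subgraphs.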
References
1. Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
2. Chen, P. & Redner, S. Community structure of the physical review citation network. Journal of Informetrics 4, 278–290 (2010).
3. Zhang, S.-H., Ning, X.-M., Ding, C. & Zhang, X.-S. Determining modular organization of protein interaction networks by maximizing modularity density. BMC Syst. Biol. 4, 1–12 (2010).
4. Stegehuis, C., van der Hofstad, R. & van Leeuwaarden, J. S. H. Epidemic spreading on complex networks with community structures. Sci. Rep. 6, 29748 (2016).
5. Nematzadeh, A., Ferrara, E., Flammini, A. & Ahn, Y.-Y. Optimal Network Modularity for Information Diffusion. Phys. Rev. Lett. 113, 088701 (2014).
6. Yan, S., Tang, S., Fang, W., Pei, S. & Zheng, Z. Global and local targeted immunization in networks with community structure. J. Stat. Mech. 2015, P08010 (2015).
7. Cheng, J.-J. et al. A divisive spectral method for network community detection. J. Stat. Mech. 2016, 033403 (2016).
8. Qin, X., Dai, W., Jiao, P., Wang, W. & Yuan, N. A multi-similarity spectral clustering method for community detection in dynamic networks. Sci. Rep. 6, 31454 (2016).
9. Su, Y., Wang, B. & Zhang, X. A seed-expanding method based on random walks for community detection in networks with ambiguous community structures. Sci. Rep. 7, 41830 (2017).
10. Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA 105, 1118–1123 (2008).
11. Pons, P. & Latapy, M. Computing communities in large networks using random walks. J. Graph Algorithms Appl. 10, 191–218 (2006).
12. Delvenne, J.-C., Yaliraki, S. N. & Barahona, M. Stability of graph communities across time scales. Proc. Natl. Acad. Sci. USA 107, 12755–12760 (2010).
13. Chen, J., Wang, H., Wang, L. & Liu, W. A dynamic evolutionary clustering perspective: Community detection in signed networks by reconstructing neighbor sets. Physica A 447, 482–492 (2016).
14. Arenas, A., Díaz-Guilera, A. & Pérez-Vicente, C. J. Synchronization Reveals Topological Scales in Complex Networks. Phys. Rev. Lett. 96, 114102 (2006).
15. Cheng, X.-Q. & Shen, H.-W. Uncovering the community structure associated with the diffusion dynamics on networks. J. Stat. Mech. 2010, P04024 (2010).
16. Reichardt, J. & Bornholdt, S. Statistical mechanics of community detection. Phys. Rev. E 74, 016110 (2006).
17. Karrer, B. & Newman, M. E. J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011).
18. Han, J., Li, W. & Deng, W. Multi-resolution community detection in massive networks. Sci. Rep. 6, 38998 (2016).
19. Gregory, S. Finding overlapping communities in networks by label propagation. New J. Phys. 12, 103018 (2010).
20. Liu, W., Jiang, X., Pellegrini, M. & Wang, X. Discovering communities in complex networks by edge label propagation. Sci. Rep. 6, 22470 (2016).
21. Hou Chin, J. & Ratnavelu, K. A semi-synchronous label propagation algorithm with constraints for community detection in complex networks. Sci. Rep. 7, 45836 (2017).
22. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
23. Li, H. J., Bu, Z., Li, A. H., Liu, Z. D. & Shi, Y. Fast and Accurate Mining the Community Structure: Integrating Center Locating and Membership Optimization. IEEE Transactions on Knowledge and Data Engineering 28, 2349–2362 (2016).
24. Li, H. J. & Daniels, J. J. Social significance of community structure: Statistical view. Phys. Rev. E 91, 10 (2015).
25. Mei, G., Wu, X., Chen, G. & Lu, J.-a. Identifying structures of continuously-varying weighted networks. Sci. Rep. 6, 26649 (2016).
26. Sobolevsky, S., Campari, R., Belyi, A. & Ratti, C. General optimization technique for high-quality community detection in complex networks. Phys. Rev. E 90, 012811 (2014).
27. Jia, C., Li, Y., Carson, M. B., Wang, X. & Yu, J. Node Attribute-enhanced Community Detection in Complex Networks. Sci. Rep. 7, 2626 (2017).
28. Quiles, M. G., Macau, E. E. N. & Rubido, N. Dynamical detection of network communities. Sci. Rep. 6, 25570 (2016).
29. Fu, J., Zhang, W. & Wu, J. Identification of leader and self-organizing communities in complex networks. Sci. Rep. 7, 704 (2017).
30. Yang, Z., Algesheimer, R. & Tessone, C. J. A Comparative Analysis of Community Detection Algorithms on Artificial Networks. Sci. Rep. 6, 30750 (2016).
31. Ju, Y., Zhang, S., Ding, N., Zeng, X. & Zhang, X. Complex Network Clustering by a Multi-objective Evolutionary Algorithm Based on Decomposition and Membrane Structure. Sci. Rep. 6, 33870 (2016).
32. Ding, Z., Zhang, X., Sun, D. & Luo, B. Overlapping Community Detection based on Network Decomposition. Sci. Rep. 6, 24115 (2016).
33. Chen, Y., Zhao, P., Li, P., Zhang, K. & Zhang, J. Finding Communities by Their Centers. Sci. Rep. 6, 24017 (2016).
34. Žalik, K. R. Maximal Neighbor Similarity Reveals Real Communities in Networks. Sci. Rep. 5, 18374 (2015).
35. Yang, L., Jin, D., Wang, X. & Cao, X. Active link selection for efficient semi-supervised community detection. Sci. Rep. 5, 9039 (2015).
36. Newman, M. E. J. & Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004).
37. Reichardt, J. & Bornholdt, S. Detecting Fuzzy Community Structures in Complex Networks with a Potts Model. Phys. Rev. Lett. 93, 218701 (2004).
38. Aldecoa, R. & Marín, I. Surprise maximization reveals the community structure of complex networks. Sci. Rep. 3, 1060 (2013).
39. Nicolini, C. & Bifone, A. Modular structure of brain functional networks: breaking the resolution limit by Surprise. Sci. Rep. 6, 19250 (2016).
40. Traag, V. A., Aldecoa, R. & Delvenne, J. C. Detecting communities using asymptotical surprise. Phys. Rev. E 92, 022816 (2015).
41. Traag, V. A., Krings, G. & Van Dooren, P. Significant Scales in Community Structure. Sci. Rep. 3, 2930 (2013).
42. Lancichinetti, A., Fortunato, S. & Kertész, J. Detecting the overlapping and hierarchical community structure in complex networks. New J. Phys. 11, 033015 (2009).
43. Havemann, F., Heinz, M., Struck, A. & Gläser, J. Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels. J. Stat. Mech. 2011, P01023 (2011).
44. Nadakuditi, R. R. & Newman, M. E. J. Graph Spectra and the Detectability of Community Structure in Networks. Phys. Rev. Lett. 108, 188701 (2012).
45. Decelle, A., Krzakala, F., Moore, C. & Zdeborová, L. Inference and Phase Transitions in the Detection of Modules in Sparse Networks. Phys. Rev. Lett. 107, 065701 (2011).
46. Reichardt, J. & Leone, M. (Un)detectable Cluster Structure in Sparse Networks. Phys. Rev. Lett. 101, 078701 (2008).
47. Zhang, X. S. et al. Modularity optimization in community detection of complex networks. Europhys. Lett. 87, 38002 (2009).
48. Good, B. H., de Montjoye, Y.-A. & Clauset, A. Performance of modularity maximization in practical contexts. Phys. Rev. E 81, 046106 (2010).
49. Fortunato, S. & Barthélemy, M. Resolution limit in community detection. Proc. Natl. Acad. Sci. USA 104, 36–41 (2007).
50. Xiang, J. et al. Local modularity for community detection in complex networks. Physica A 443, 451–459 (2016).
51. Xiang, J. et al. Multi-resolution community detection based on generalized self-loop rescaling strategy. Physica A 432, 127–139 (2015).
52. Ronhovde, P. & Nussinov, Z. Local resolution-limit-free Potts model for community detection. Phys. Rev. E 81, 046114 (2010).
53. Arenas, A., Fernández, A. & Gómez, S. Analysis of the structure of complex networks at different resolution levels. New J. Phys. 10, 053039 (2008).
54. Li, H. J., Wang, Y., Wu, L. Y., Zhang, J. H. & Zhang, X. S. Potts model based on a Markov process computation solves the community structure problem effectively. Phys. Rev. E 86, 10 (2012).
55. Xiang, J. et al. Enhancing community detection by using local structural information. J. Stat. Mech. 2016, 033405 (2016).
56. Lai, D., Shu, X. & Nardini, C. Correlation enhanced modularity-based belief propagation method for community detection in networks. J. Stat. Mech. 2016, 053301 (2016).
57. Jiang, Y., Jia, C. & Yu, J. An efficient community detection algorithm using greedy surprise maximization. J. Phys. A 47, 165101 (2014).
58. Lancichinetti, A., Fortunato, S. & Radicchi, F. Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046110 (2008).
59. Lancichinetti, A., Radicchi, F., Ramasco, J. J. & Fortunato, S. Finding Statistically Significant Communities in Networks. PLoS ONE 6, e18961 (2011).
Acknowledgements
This work was supported by the construct program of the key discipline in Hunan province, the Training Program for Excellent Innovative Youth of Changsha, the Beijing Natural Science Foundation (Grant No. 9182015), the National Natural Science Foundation of China (Grant No. 61702054 and Grant No. 71871233), the Hunan Provincial Natural Science Foundation of China (Grant No. 2018JJ3568), the Scientific Research Fund of Education Department of Hunan Province (Grant No. 17A024), the Scientific Research Project of Hunan Provincial Health and Family Planning Commission of China (Grant No. C2017013), and the Hunan key laboratory cultivation base of the research and development of novel pharmaceutical preparations (Grant No. 2016TP1029).
Author information
Contributions
J.X., H.J.L., Z.B., Z.W. and J.M.L. devised the study, analyzed the data and wrote the manuscript; H.J.L., L.T., M.H.B. and J.M.L. performed the experiments and prepared the figures. All authors contributed to the discussion of results and reviewed the manuscript.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Xiang, J., Li, H., Bu, Z. et al. Critical analysis of (Quasi-)Surprise for community detection in complex networks. Sci Rep 8, 14459 (2018). https://doi.org/10.1038/s41598-018-32582-0
Keywords
 Community Detection
 Dense Subgraphs
 General Greedy
 Multiscale Network
 Single-level Network