Critical analysis of (Quasi-)Surprise for community detection in complex networks

Module or community structures exist widely in complex networks, and optimizing statistical measures is one of the most popular approaches for revealing and identifying such structures in real-world applications. In this paper, we focus on the critical behaviors of (Quasi-)Surprise, a statistical measure of interest for community structure, accompanied by a series of comparisons with other measures. Specifically, the effect of various network parameters on the measures is thoroughly investigated. The critical number of dense subgraphs in partition transition is derived, and phase diagrams are provided to display and compare the phase transitions of the measures. A "potential well" effect is revealed for (Quasi-)Surprise, which may be difficult for general greedy (agglomerative or divisive) algorithms to get across. Finally, an extension of Quasi-Surprise is introduced for the study of multi-scale structures. The experimental results help in understanding the critical behaviors of (Quasi-)Surprise and may provide useful insight for the design of effective tools for community detection.

predefined communities), denoted by Partition X (see Fig. 1A for an example of a partition with x = 3). The predefined partition of the network is the special case x = 1. For Partition X in the networks, the probability that a link exists within a community can be written as $q_x = m_{in}/m$, and its expected value in the random model as $\langle q \rangle_x = M_{int}/M$. Here, $m_{in}$ is the number of existing intra-community links in the partition with r/x communities; m is the number of existing links in the network; $n_c$ is the number of vertices in each dense subgraph (i.e., predefined community); $p_{in}$ is the probability of linking two vertices within the same dense subgraph; $p_{out}$ is the probability of linking two vertices in two adjacent dense subgraphs; $M_{int}$ is the maximal possible number of intra-community links in the partition; and M is the maximal possible number of links in the network. As a result, $S(x) = m\,D(q_x \,\|\, \langle q \rangle_x)$ for the partition, which is a multivariate function (see Methods for details).
Firstly, we show the relation between Quasi-Surprise and the number x of dense subgraphs contained in each community of Partition X (Fig. 2). For small r-values, S(x)/S(1) is less than 1 and decreases with x, so no merging of dense subgraphs occurs; as expected, when r is large enough, S(x)/S(1) has a clear peak, so the merging of x dense subgraphs appears. In the most modular networks (Fig. 2, Right), where clique-like dense subgraphs are connected by only one inter-community link, merging is very difficult or almost impossible (because it requires large r-values), whereas for other parameters (e.g., Fig. 2, Left) merging is clearly easier. The difficulty of merging dense subgraphs decreases greatly as $p_{out}/p_{in}$ increases.
Secondly, we compare the normalized S(r)-curves for different x-values (Fig. 3A,B). The S(r)-values increase with r. For small r-values, S(r, x = 1) is larger than the others. As r increases, S(r, x = 2), S(r, x = 3) and S(r, x = 4) become the largest in turn; that is, the merging of 2, 3 and 4 dense subgraphs is preferred in turn. As in Fig. 2, merging hardly appears in the most modular networks (Fig. 3A), but it may appear much more easily in other networks (e.g., Fig. 3B).
Thirdly, we show that, for different x-values, the (normalized) S-curves decrease with $p_{out}/p_{in}$ (Fig. 3C,D). For small $p_{out}/p_{in}$-values, S(x = 1) is larger than the others, so there is no merging. As $p_{out}/p_{in}$ increases, S(x = 2), S(x = 3) and S(x = 4) become the largest in turn; that is, the merging of 2, 3 and 4 dense subgraphs is preferred in turn. For larger networks (i.e., larger r-values), merging appears more easily.
For the sake of comparison, we also analyzed the effect of network parameters on other quality functions, namely the original Surprise, Significance and Modularity, which behave similarly to Quasi-Surprise (see Figs S1-S3).
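As a rough numerical illustration of the dependence of S(x)/S(1) on r described above, the following sketch evaluates the expected Quasi-Surprise of Partition X on the ring of dense subgraphs, taking S(x) = m D(q || <q>) with q = m_in/m and <q> = M_int/M. The function names, the use of expected rather than sampled link counts, the exact pair counts n(n-1)/2, and the parameter values (n_c = 10, p_in = 1, p_out = 0.1) are assumptions made for this example only.

```python
import math


def binary_kl(x, y):
    """Binary Kullback-Leibler divergence D(x || y) with natural logarithms."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def quasi_surprise(r, x, n_c, p_in, p_out):
    """Expected Quasi-Surprise S(x) of Partition X (x must divide r, x < r)."""
    intra = r * n_c * (n_c - 1) / 2 * p_in   # links inside dense subgraphs
    inter = r * n_c ** 2 * p_out             # links between adjacent subgraphs
    m = intra + inter
    # each community of x subgraphs contains x - 1 pairs of adjacent subgraphs
    m_in = intra + (r / x) * (x - 1) * n_c ** 2 * p_out
    M_int = (r / x) * (x * n_c) * (x * n_c - 1) / 2
    M = (r * n_c) * (r * n_c - 1) / 2
    return m * binary_kl(m_in / m, M_int / M)


def ratio(r, x, n_c=10, p_in=1.0, p_out=0.1):
    """S(x) / S(1): values above 1 mean Partition X beats the predefined one."""
    return quasi_surprise(r, x, n_c, p_in, p_out) / quasi_surprise(r, 1, n_c, p_in, p_out)


if __name__ == "__main__":
    for r in (12, 60, 1200):
        print(r, [round(ratio(r, x), 3) for x in (1, 2, 3, 4)])
```

Under these assumed parameters the ratio for x = 2 stays below 1 for a short ring and exceeds 1 for a long one, which is the qualitative behaviour reported for Fig. 2.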
Phase transition in merging/disconnecting dense subgraphs. When the number r of dense subgraphs in the networks or $p_{out}/p_{in}$ is large enough, the dense subgraphs may merge. To study the critical number $r_x^*$ of dense subgraphs, we first analyze the transition of Quasi-Surprise from one partition to another in the single-scale networks (see Method). When $\Delta S = S(x) - S(1) \ge 0$, the predefined partition is not preferred over Partition X. By solving ΔS = 0 for r, one can obtain the critical number $r_x^*$. For Quasi-Surprise, $r_x^*$ increases monotonically with x for different $p_{out}/p_{in}$-values (Fig. 4A): the larger x is, the larger the critical $r_x^*$. Moreover, $r_x^*$ decays quickly as $p_{out}/p_{in}$ increases for all x-values (Fig. 4B): the larger $p_{out}/p_{in}$ is, the more easily merging appears, in other words, the lower the resolution. The resolution of Quasi-Surprise is far higher than that of Modularity for small $p_{out}/p_{in}$-values. Figure 4C compares the critical r-values of the various methods (Quasi-Surprise, Surprise, Significance and Modularity) in the single-level networks (see SI). On the one hand, the critical r-values of the methods behave similarly as functions of $p_{out}/p_{in}$; on the other hand, the curves of the critical r-values give a precise comparison of the resolutions of the methods. Figure 4 can be regarded as a set of phase diagrams in which, below the corresponding curve, the x-community-merging partition is not preferred over the predefined one. The phase diagrams show that community merging is not as difficult as might be imagined, e.g., in the most modular networks with small $p_{out}/p_{in}$-values. Note that, for simplicity, we have only considered the transition from the predefined partition to Partition X.
Through more detailed analysis, e.g., comparing Partition X = 3 with Partition X = 2, one can find that the critical r-values are larger than estimated above (see Method). Moreover, see Fig. 4D and the SI for the phase transitions of the various methods in double-level networks.
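The critical number discussed above can also be located numerically. The sketch below solves ΔS = S(x) - S(1) = 0 for r by bisection on a logarithmic scale, again assuming S(x) = m D(q || <q>) with expected link counts on the ring of dense subgraphs. It treats r as a real number and assumes a single sign change of ΔS; the helper names and default parameters are illustrative choices, not the closed-form expression derived in the paper.

```python
import math


def binary_kl(x, y):
    """Binary Kullback-Leibler divergence D(x || y) with natural logarithms."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def delta_s_per_link(r, x, n_c, p_in, p_out):
    """(S(x) - S(1)) / m for a ring of r dense subgraphs; r may be fractional."""
    intra = n_c * (n_c - 1) / 2 * p_in   # expected links inside one subgraph
    inter = n_c ** 2 * p_out             # expected links between two neighbours

    def kl_for(parts):
        q = (intra + (parts - 1) / parts * inter) / (intra + inter)
        q_exp = (parts * n_c - 1) / (r * n_c - 1)
        return binary_kl(q, q_exp)

    return kl_for(x) - kl_for(1)


def critical_r(x, n_c=10, p_in=1.0, p_out=0.1, r_max=1e12):
    """Value of r at which Partition X starts to beat the predefined partition."""
    lo, hi = x + 1.0, r_max
    if delta_s_per_link(lo, x, n_c, p_in, p_out) >= 0:
        return lo
    if delta_s_per_link(hi, x, n_c, p_in, p_out) < 0:
        return None  # no transition found below r_max
    while hi - lo > 1e-6 * lo:
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if delta_s_per_link(mid, x, n_c, p_in, p_out) < 0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for p_out in (0.05, 0.1, 0.3):
        print(p_out, [critical_r(x, p_out=p_out) for x in (2, 3, 4)])
```

With these assumed parameters the computed threshold grows with x and shrinks as p_out/p_in grows, in line with the trends described for Fig. 4A,B.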
We further verify the above conclusions directly by using the Louvain algorithm to optimize the quality functions. Table 1 shows that, in the single-level networks, (a) for a fixed number r of dense subgraphs, merging of dense subgraphs appears for all methods when $p_{out}$ is large enough, because the increase of links between dense subgraphs brings them closer and closer together; (b) as r increases, the $p_{out}$-values needed for merging become smaller, meaning that merging becomes easier; (c) the "potential well" effect appears in some cases where f* is less than 1 but there is no merging (see the "potential well" in the following section); (d) Modularity is more inclined to merge dense subgraphs than the other methods, while (Quasi-)Surprise and Significance display relatively high resolution (partly due to the "potential well" effect); (e) the results for different $p_{in}$-values show the effect of decreasing $p_{in}$ (see Tables 1 and S1); and (f) in the double-level networks, the results are similar (see Tables S2 and S3).
Figure 4 (caption fragment). ... x-values in single-level networks. The inset graph gives a more explicit comparison (lines with symbols for Quasi-Surprise; the corresponding plain lines for Modularity (Mod)). "(1 group)" denotes community partitions in which just a few (e.g., x = 2 or 3) dense subgraphs merge into 1 group, generating one community with 2 or 3 dense subgraphs, while the other dense subgraphs are still considered independent communities.
Moreover, the above methods are applied to a set of real-world networks (Table S7). Similar to the results in the model networks, Quasi-Surprise has higher resolution than Modularity, and thus it tends to generate more communities in the real-world networks than Modularity does. The original Surprise and Significance find more communities in the networks than the other methods, because they have higher resolution.
Table 1. "F1|F2|F3|F4" (the numbers separated by vertical lines) denotes the ratios of the number of communities identified by $S_q$, $S_p$, $S_g$ and Q, respectively, to the number of predefined communities (i.e., dense subgraphs), in the single-level networks with different numbers of predefined communities and different $p_{out}$-values ($p_{in}$ = 1.0). "(f1|f2|f3|f4)" (the numbers in parentheses, separated by vertical lines) denotes the ratios of the quality-function values of the predefined partition to those of Partition X = 2, for $S_q$, $S_p$, $S_g$ and Q, respectively. If F1, F2, F3 or F4 is less than 1, community merging has appeared, that is, at least 2 communities have been merged into a larger community by the corresponding method ($S_q$, $S_p$, $S_g$ or Q). If f1, f2, f3 or f4 is less than 1, Partition X = 2 should be preferred by $S_q$, $S_p$, $S_g$ or Q, but the identified partition is not necessarily Partition X = 2 or a partition with merged communities; this depends on the algorithm used, owing to the "potential well" effect.
Effect of "potential well" on community detection. Because of the nonlinearity of (Quasi-)Surprise, an unexpected phenomenon may appear when one searches for the optimal community structure of (Quasi-)Surprise in an agglomerative manner, e.g., by the Louvain algorithm. Generally, for some statistical measures, e.g., Modularity, the merging of one (or two) group(s) of x dense subgraphs is allowed, that is, it increases the measure, whenever Partition X is allowed. However, this may not be the case for (Quasi-)Surprise, because (Quasi-)Surprise as a function of the number of merged groups of dense subgraphs has an obvious region of low S-values, which may be difficult for general greedy (agglomerative or divisive) algorithms to get across. We call this phenomenon the "potential well" effect, borrowing the concept from quantum mechanics. Now consider a partition in which only one group of x dense subgraphs merges in the single-level networks. For small increments of $q$ and $\langle q \rangle$ relative to the original partition, the increment ΔS of Quasi-Surprise can be estimated. By solving ΔS = 0 for r, one can obtain the corresponding critical number $r'_x$ as a function of x, $p_{in}$ and $p_{out}$. Figure 4B clearly shows that the line "x = 2 (1 group)" is above the line "x = 2", and the line "x = 3 (1 group)" is above the line "x = 3", where "(1 group)" denotes community partitions in which only a few (e.g., x = 2 or 3) dense subgraphs merge into 1 group while the other dense subgraphs remain separate. This is a counterintuitive result, because the merging of a single group of x dense subgraphs is more difficult than the merging of r/x groups of x dense subgraphs. Figure 5A,B confirms this conclusion. When $r > r_2$, the partition with r/2 groups of 2 merged dense subgraphs already has a larger S-value than the predefined one. However, generating one group of 2 merged dense subgraphs is accepted by agglomerative algorithms only when $r > r'_2$, because before that it decreases S, even if $r > r_2$ (see Fig. 5C-J). As a result, the expected 2-community-merging partition may not be found until $r > r'_2$. Figure 5C-J clearly shows that the increment of S (ΔS) decreases monotonically with the number of merged groups of 2 dense subgraphs when $r < r_2$, while it increases monotonically when $r > r'_2$. In these two cases, general agglomerative algorithms should be able to find the optimal partition for S. However, when $r_2 < r < r'_2$, there exists a "potential well", where ΔS first decreases and then increases. This is also the case for 3-community merging. It may be difficult for general greedy algorithms to get across the "potential well".
This is frustrating, because it may lead to "false" optimal results for some algorithms; from another viewpoint, however, it may be "good" news, because it means that S is less likely to encounter the resolution problem than estimated by Equation (3).
Could the "potential well" problem be solved by divisive algorithms? The answer may be negative, because another kind of "potential well" still exists in some cases for these algorithms. As shown in Fig. 5F,J, S of Partition X = 2 is less than that of Partition X = 1, but disconnecting dense subgraphs, i.e., decreasing the fraction of merged dense subgraphs, decreases S. Because of this "potential well", general greedy divisive algorithms may be unable to get across it, e.g., from Partition X = 2 to Partition X = 1. Perhaps only ergodic but time-consuming algorithms could solve the "potential well" problem and find the "true" optimal partition for Quasi-Surprise.
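The "potential well" described above can be made visible with a small calculation. The sketch below computes the expected change ΔS of Quasi-Surprise when g disjoint pairs of adjacent dense subgraphs are merged on the ring (g = 0 is the predefined partition, g = r/2 is Partition X = 2). It assumes S = m D(q || <q>) with expected link counts; the function names and the parameters (n_c = 10, p_in = 1, p_out = 0.1, and the three ring lengths) are hypothetical choices for illustration.

```python
import math


def binary_kl(x, y):
    """Binary Kullback-Leibler divergence D(x || y) with natural logarithms."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def quasi_surprise_with_merges(r, g, n_c=10, p_in=1.0, p_out=0.1):
    """Expected S when g disjoint pairs of adjacent dense subgraphs are merged."""
    intra = n_c * (n_c - 1) / 2 * p_in        # expected links inside one subgraph
    inter = n_c ** 2 * p_out                  # expected links between two neighbours
    m = r * (intra + inter)
    m_in = r * intra + g * inter              # each merged pair absorbs one bundle
    pairs_single = n_c * (n_c - 1) / 2
    pairs_double = 2 * n_c * (2 * n_c - 1) / 2
    M_int = (r - 2 * g) * pairs_single + g * pairs_double
    M = r * n_c * (r * n_c - 1) / 2
    return m * binary_kl(m_in / m, M_int / M)


def delta_s_profile(r, **params):
    """Delta S relative to the predefined partition for g = 0 .. r/2 merged pairs."""
    base = quasi_surprise_with_merges(r, 0, **params)
    return [quasi_surprise_with_merges(r, g, **params) - base
            for g in range(r // 2 + 1)]


if __name__ == "__main__":
    for r in (100, 300, 6000):
        profile = delta_s_profile(r)
        print(r, round(profile[1], 3), round(min(profile), 3), round(profile[-1], 3))
```

For the intermediate ring length in this example, the first merge lowers S although merging all pairs raises it, so a greedy agglomerative step would be rejected even though Partition X = 2 scores higher than the predefined partition.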
Does the "potential well" problem also exist in other methods? For comparison, we have confirmed that the original Surprise also has the "potential well", while it was not found in Significance and Modularity, because of their additive property over communities (see Fig. S4). We have proved strictly that Significance, like Modularity, has the same critical point in the two cases, i.e., for r/2 groups and for a single group of 2 dense subgraphs merging. Therefore they have no "potential well" effect (see SI for the proof).
Table 1 shows only a slight sign of the "potential well" effect; we have therefore further tested the "potential well" effect of the different measures by directly optimizing them (Fig. 6). As discussed above, when the number r of dense subgraphs is large enough that S2 (Q2) > S1 (Q1), the dense subgraphs should merge. In these networks, for example, the identified partition should undergo a transition from the first-level partition to the second-level one. However, because of the "potential well" effect, as r increases Quasi-Surprise and Surprise show clear delays in the transition: the identified partition is still the first-level one, although S2 (Q2) > S1 (Q1). This kind of delay is not observed for Significance and Modularity. This confirms that the "potential well" effect really exists in Quasi-Surprise and Surprise, but not in Significance and Modularity.

Effect of heterogeneity of vertex degree and community size.
To study the effect of the heterogeneity of vertex degrees and community sizes on Quasi-Surprise, we further apply the various methods to a classical type of modular network, the Lancichinetti-Fortunato-Radicchi (LFR) networks [58] (Fig. 1C), which are able to mimic general properties of many real-world networks, such as community structure and the heterogeneity of vertex degrees and community sizes.
Firstly, a set of networks with different heterogeneity of vertex degrees and community sizes is generated by increasing the maximal degree and the maximal community size. These two kinds of heterogeneity make inhomogeneity of the link density within communities emerge gradually, owing to the random fluctuations of links. When the inhomogeneity of link density in a community is large enough, the community may be split. The results (Tables 2 and S4 and Fig. S7) show that (1) almost all methods work well in homogeneous networks; (2) the heterogeneity of community size leads to the splitting of communities (see $S_q$, $S_p$, $S_g$ and Mod); (3) the heterogeneity of degree further aggravates the splitting of communities; and (4) the partitions found by $S_q$, $S_p$ and $S_g$ contain more groups of vertices than the predefined ones, because these measures (as well as LP [19]) are more inclined to split communities, especially in comparison with other methods (such as Modularity [22,36], Infomap [10], Walktrap [11] and OSLOM [59]).
Then, we study the effect of the mean degree of the networks. With the decrease of the mean degree of the networks, links within communities become more and more sparse. Thus, the inhomogeneity of link density will emerge gradually in the communities due to the random fluctuations. As a result, communities will also tend to split (see Tables S5 and S6, Figs S8-S11 and the details of analysis).
Further, we also analyzed the effect of the network size on the above results. For Quasi-Surprise, Surprise, Significance and Modularity, with the increase of network size, (1) the tendency to split is also weakened
Figure 6. Effect of the "potential well" in community detection. S1/S2, Sd/S2, Q1/Q2 and Qd/Q2 as functions of the number of (first-level) communities in the two-level networks. S1 (Q1), S2 (Q2) and Sd (Qd) denote the values of the quality functions for the first-level partition (without community merging), the second-level partition (with community merging), and the partition identified by the different methods, respectively. Each measure is tested in three networks, denoted by I, II and III. The parameters in the networks are set as follows:

Extension to multi-scale networks. Analysis of critical behaviors in multi-scale networks.
We further study the critical behaviors of the various measures in networks with two-scale community structures, a generalization of the single-scale networks (Fig. 1B). In these networks, the critical r-value of $S_q$ can be estimated analytically (see SI for the proof). In these networks, (Quasi-)Surprise also has $r_2 < r'_2$ (Figs 4D, S5 and S6). On the one hand, this means that the "potential well" effect still exists when $r_2 < r < r'_2$, and thus a general greedy optimization algorithm may prefer either the first-level partition (for agglomerative algorithms, see Fig. 6) or the second-level partition (for divisive algorithms), leading to "false" optimal solutions. So the identified level depends on the algorithm applied.
On the other hand, which level is identified is closely related to the number of communities in the networks, partly because (Quasi-)Surprise is just a single-scale method. When $r < r_2$, the first level is found; when $r > r'_2$, the second level is preferred; when $r_2 < r < r'_2$, the identified level depends on whether the optimization algorithm can get across the "potential well" to find the true optimal solution. Moreover, the critical value $r'_2$ (as well as $r_2$) decreases quickly as $p_{out1}/p_{in}$ increases; therefore, in a network with fixed r, small $p_{out1}/p_{in}$-values favor the first level, while large $p_{out1}/p_{in}$-values favor the second level (in this case, the communities merge easily).
Because of the accumulative property, Modularity has the same critical $r_2^*$-value for r/2 groups and for a single group of 2 communities merging in the networks, and so does Significance (its expression involves the densities $p_1 = p_{in}$ and $p_2 = (p_{in} + p_{out1})/2$; see SI). They thus do not show the potential-well phenomenon, but a similar resolution problem remains in the networks: the second level is found when $r > r_2^*$, otherwise the first level is found.
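The three regimes described above can be summarized as a small decision rule. The sketch below is only a restatement of the text in code: the thresholds `r2` and `r2_prime` are hypothetical inputs that would have to come from the analysis (they are not computed here), and the function name and the two algorithm labels are assumptions for this example.

```python
def identified_level(r, r2, r2_prime, algorithm="agglomerative"):
    """Return 'first' or 'second' following the three regimes in the text."""
    if not r2 < r2_prime:
        raise ValueError("expected r2 < r2_prime")
    if r < r2:
        return "first"      # the first-level partition has the larger S
    if r > r2_prime:
        return "second"     # even a single merge increases S
    # inside the potential well the greedy search direction decides:
    # agglomerative searches stay at the first level, divisive ones at the second
    return "first" if algorithm == "agglomerative" else "second"


if __name__ == "__main__":
    for r in (50, 300, 5000):
        print(r, identified_level(r, 245, 4400),
              identified_level(r, 245, 4400, algorithm="divisive"))
```

The numbers 245 and 4400 in the usage lines are placeholders, not values taken from the paper.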
Extension to multi-scale community detection. As discussed above, (Quasi-)Surprise, like many other methods, is a single-scale method with limited resolution, so the identified partitions or levels depend closely on network parameters such as the (inter- and intra-community) link densities and the number of communities in the networks (note that the "potential well" effect is also related to the number of communities).
Table 2. Ratio of the number of communities identified by different methods to that of the predefined ones, in the LFR networks with different network sizes (N) and different heterogeneity of vertex degrees and community sizes. Other parameters: $k_m$ = 10, $C_{min}$ = 20, µ = 0.1, $\tau_1$ = 2 and $\tau_2$ = 2. Increasing $k_{max}$ and $C_{max}$ increases the heterogeneity of degree and of community size, respectively.
To display the effectiveness of the multi-scale methods in detecting communities at different scales, the methods are applied to two kinds of networks with multi-scale community structures (Figs 7, S12 and S13). The results show that they are able to identify the partitions at the predefined scales in the networks. The proposed multi-scale extensions are simple yet effective, and they provide alternative approaches for analyzing community structures at different scales.

Discussion and Conclusion
Community structure (or module structure) exists widely in various complex networks, and detecting communities (or modules) is an issue of interest in network research. Many methods have been proposed to detect community structures, and optimizing quality functions for community structure is one of the most important strategies. The existing methods help in discovering the intrinsic structures of networks, but each has its own scope of application. Therefore, it is necessary to study the behaviors of the methods for both theoretical research and real applications. This helps in understanding the methods themselves and can promote their improvement or the development of more effective methods. However, there has so far been little work on the behaviors of (Quasi-)Surprise, a statistical measure of interest for community structure.
This paper provided a detailed study of the critical behaviors of (Quasi-)Surprise, accompanied by a series of comparisons with other methods, including Significance, Modularity, Infomap, Walktrap, LP and OSLOM. We analyzed the effect of various network parameters on (Quasi-)Surprise and derived the critical number of dense subgraphs for the merging/splitting of dense subgraphs. To display the phase transitions of the various measures from one partition to another, we provided phase diagrams of the critical points for the merging/splitting of dense subgraphs, which give a clear comparison of the critical points of the various measures. The critical number of dense subgraphs for (Quasi-)Surprise increases clearly super-exponentially with the difference between the inter- and intra-community link probabilities, but it is close to that of Modularity when the difference of link probabilities is small, in which case the difference between the resolutions of (Quasi-)Surprise and Modularity is far smaller than in the most modular structures.
A "potential well" effect for (Quasi-)Surprise was revealed in community detection. In some cases, the merging of just one group of x dense subgraphs may be more difficult than the merging of all r/x groups of x dense subgraphs, because, when $r_2 < r < r'_2$, (Quasi-)Surprise as a function of the number of merged groups of dense subgraphs has an obvious "potential well". The "potential well" is generally difficult for greedy (agglomerative or divisive) algorithms, e.g., the popular Louvain algorithm, to get across. This may result in "false" optimal solutions for these algorithms, although it may also implicitly mitigate, to some extent, the resolution problem or the excessive splitting of communities. Perhaps only ergodic but time-consuming algorithms, e.g., simulated annealing, can avoid the problem.
Overall, (Quasi-)Surprise tends to split communities for reasons such as the heterogeneity of link density, degree and community size. It often displays higher resolution and thus identifies more communities than other methods, e.g., Modularity, but it may also lead to the excessive splitting of communities owing to the density inhomogeneity inside communities, e.g., that caused by the heterogeneity of degrees and community sizes.
Moreover, it is believed that multi-scale structures exist widely in various complex networks. In multi-scale networks, e.g., the double-level networks above, different methods may identify structures at different scales. (Quasi-)Surprise is a statistical measure of interest for community detection, but it is just a single-scale method. The results suggest the necessity of developing multi-resolution methods, though this may not be easy for (Quasi-)Surprise. We proposed an extension of Quasi-Surprise to multi-scale networks, which provides an alternative approach for identifying multi-scale structures.
Finally, we expect that the above analysis will be helpful for understanding the critical behaviors of the statistical measures and will provide useful insight for developing more effective community-detection methods in complex networks.

Method
Networks. Single-level networks. For convenience of analysis, a set of community-loop networks with single-level community structure is constructed, where r "dense subgraphs" (predefined communities) are placed on a circle and each is connected to its adjacent ones. For each network, let $n_c$ be the number of vertices in each (predefined) community, and $n = r \cdot n_c$ the number of vertices in the network; $p_{in}$ is the probability of linking two vertices within the same (predefined) community; $p_{out}$ is the probability of linking two vertices in two adjacent (predefined) communities; and $m = r \cdot n_c^2\, p_{in}/2 + r \cdot n_c^2\, p_{out}$ is the number of edges in the network (Fig. 1A).
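A minimal sketch of how such a community-loop network could be sampled is given below. The function names, the vertex numbering (block k holds vertices k·n_c to (k+1)·n_c - 1) and the seeded random generator are assumptions for this example; the expected edge count uses the exact number of vertex pairs n_c(n_c - 1)/2, which is close to n_c²/2 for large n_c.

```python
import itertools
import random


def community_loop_edges(r, n_c, p_in, p_out, seed=0):
    """Sample the edge list of a single-level community-loop network (r >= 3)."""
    rng = random.Random(seed)
    blocks = [range(k * n_c, (k + 1) * n_c) for k in range(r)]
    edges = []
    for k, block in enumerate(blocks):
        # links inside the k-th dense subgraph
        for u, v in itertools.combinations(block, 2):
            if rng.random() < p_in:
                edges.append((u, v))
        # links to the next dense subgraph on the circle
        for u in block:
            for v in blocks[(k + 1) % r]:
                if rng.random() < p_out:
                    edges.append((u, v))
    return edges


def expected_edges(r, n_c, p_in, p_out):
    """Expected number of edges of the sampled network."""
    return r * n_c * (n_c - 1) / 2 * p_in + r * n_c ** 2 * p_out


if __name__ == "__main__":
    sample = community_loop_edges(8, 10, 1.0, 0.1)
    print(len(sample), expected_edges(8, 10, 1.0, 0.1))
```

With p_in = 1 every dense subgraph is a clique, so only the number of inter-community links fluctuates between samples.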

Double-level networks.
To further analyze the phenomena in the merging and breakup of communities, we construct community-loop networks with double-level community structures. Let r be the number of communities and $n_c$ the number of vertices in each community at the first level, with $n = r \cdot n_c$ the number of vertices in the network. $p_{in}$ is the probability of linking two vertices within the same first-level community; $p_{out1}$ is the probability of linking two vertices in two adjacent first-level communities contained in the same second-level community; and $p_{out2}$ ($< p_{out1}$) is the probability of linking two vertices in two adjacent first-level communities contained in two different second-level communities (Fig. 1B).
LFR networks. Lancichinetti-Fortunato-Radicchi (LFR) networks are a classical type of modular network, able to mimic general properties of many real-world networks, such as community structure and the heterogeneity of vertex degrees and community sizes [58]. In these networks, vertex degrees and community sizes follow power-law distributions with exponents $\tau_1$ and $\tau_2$ respectively, and a common mixing parameter μ controls the ratio of the external degree of each vertex (with respect to its community) to the total degree of the vertex. The smaller the μ-value, the more pronounced the communities. Here, low μ-values are used, so communities are well separated from each other (Fig. 1C).
Statistical measures for community structure. Surprise ($S_p$). Surprise is a statistical approach for assessing the quality of a community partition in a network, with higher values corresponding to better partitions [38]. It was shown that Surprise can characterize community structures better than Modularity in several complex benchmarks. Given a community partition of a network, Surprise is defined as the negative logarithm of the probability that the observed number of intra-community links, or more, appears in Erdős-Rényi random networks. According to a cumulative hypergeometric distribution, it can be written as $S_p = -\log \sum_{i=m_{in}}^{\min(m,\, M_{int})} \binom{M_{int}}{i}\binom{M - M_{int}}{m - i} \Big/ \binom{M}{m}$.
Quasi-Surprise ($S_q$). Quasi-Surprise is defined as $S_q = m\,D(q \,\|\, \langle q \rangle)$, with $q = m_{in}/m$ and $\langle q \rangle = M_{int}/M$, where $D(x \,\|\, y) = x \ln\frac{x}{y} + (1 - x)\ln\frac{1 - x}{1 - y}$ is the Kullback-Leibler divergence, which measures the distance between two probability distributions x and y [40].
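The two measures can be evaluated directly from four counts. The sketch below assumes the standard cumulative hypergeometric form of Surprise and the form S_q = m D(q || <q>) for Quasi-Surprise, with q = m_in/m and <q> = M_int/M; the function names and the tiny input values are illustrative only. Exact integer arithmetic is used for the hypergeometric tail, which is practical only for small networks.

```python
import math


def binary_kl(x, y):
    """Binary Kullback-Leibler divergence D(x || y) with natural logarithms."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def surprise(m_in, m, M_int, M):
    """Exact Surprise: -ln P(at least m_in of the m links are intra-community)."""
    tail = sum(
        math.comb(M_int, i) * math.comb(M - M_int, m - i)
        for i in range(m_in, min(m, M_int) + 1)
    )
    # math.log accepts arbitrarily large Python integers
    return math.log(math.comb(M, m)) - math.log(tail)


def quasi_surprise(m_in, m, M_int, M):
    """Quasi-Surprise m * D(q || <q>) from the same four counts."""
    return m * binary_kl(m_in / m, M_int / M)


if __name__ == "__main__":
    # toy counts: 5 links, 3 of them intra-community, 4 of 10 pairs intra-community
    print(surprise(3, 5, 4, 10), quasi_surprise(3, 5, 4, 10))
```

Both functions take the observed links (m_in, m) and the possible links (M_int, M), so they can be applied to any partition once these counts are known.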
The original Quasi-Surprise is based on the difference between the probability of links existing within communities and its expected value in the random model. We propose an alternative approach that extends Quasi-Surprise to the multi-scale case by using a resolution parameter to adjust the expected value in the random model. This results in the multi-scale Quasi-Surprise, $S_q(\gamma) = m\,D(q \,\|\, \gamma \langle q \rangle)$, where γ is the resolution parameter.
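To illustrate how a resolution parameter can shift the preferred scale, the sketch below assumes that γ rescales the expected intra-community fraction, i.e. S_q(γ) = m D(q || γ<q>), and evaluates it for Partition X on the ring of dense subgraphs with expected link counts. This functional form, the function names and the parameters (r = 300, n_c = 10, p_in = 1, p_out = 0.1) are assumptions for this example; γ<q> must stay below 1.

```python
import math


def binary_kl(x, y):
    """Binary Kullback-Leibler divergence D(x || y) with natural logarithms."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def multiscale_quasi_surprise(r, x, gamma, n_c=10, p_in=1.0, p_out=0.1):
    """Assumed form m * D(q || gamma * <q>) for Partition X on the ring."""
    intra = n_c * (n_c - 1) / 2 * p_in   # expected links inside one subgraph
    inter = n_c ** 2 * p_out             # expected links between two neighbours
    m = r * (intra + inter)
    q = (intra + (x - 1) / x * inter) / (intra + inter)
    q_exp = (x * n_c - 1) / (r * n_c - 1)
    return m * binary_kl(q, gamma * q_exp)


def preferred_x(r, gamma, candidates=(1, 2, 3, 4)):
    """Candidate x with the largest multi-scale Quasi-Surprise."""
    return max(candidates, key=lambda x: multiscale_quasi_surprise(r, x, gamma))


if __name__ == "__main__":
    for gamma in (0.5, 1.0, 2.0):
        print(gamma, preferred_x(300, gamma))
```

In this example, raising γ from 1 to 2 moves the preferred partition from the merged pairs back to the predefined dense subgraphs, which is the intended role of a resolution parameter.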

Significance ($S_g$).
Significance is also a recently proposed measure for estimating the quality of community structures, which looks at how likely dense communities are to appear in random networks [41]. It is defined as $S_g = \sum_s \binom{n_s}{2} D(p_s \,\|\, p)$. Here the sum runs over all communities; $n_s$ is the number of vertices in community s; the density of community s, $p_s$, is the ratio of the number of existing edges to the maximum possible number in the community; and the density of the network, p, is the ratio of the number of existing edges to the maximum possible number in the whole network. It can also be directly optimized as an objective function to find the optimal community partition.
To extend the original Significance to multi-scale cases, we use a parameter to adjust the density of the network, because the original Significance is based on the difference between the density of a community and the density of the network. As a result, the multi-scale Significance can be written as $S_g(\gamma) = \sum_s \binom{n_s}{2} D(p_s \,\|\, \gamma p)$, where γ is the resolution parameter.
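A small sketch of Significance, with an optional resolution parameter, is given below. It assumes the form S_g(γ) = Σ_s C(n_s, 2) D(p_s || γp), where each community is described by its number of vertices and internal edges; the function names and the input format are choices made for this example.

```python
import math


def binary_kl(x, y):
    """Binary Kullback-Leibler divergence D(x || y) with natural logarithms."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def significance(communities, n, m, gamma=1.0):
    """Significance of a partition.

    communities: list of (n_s, m_s) = (vertices, internal edges) per community
    n, m: number of vertices and edges of the whole network
    gamma: resolution parameter scaling the network density (assumed form)
    """
    p = m / (n * (n - 1) / 2)          # density of the whole network
    total = 0.0
    for n_s, m_s in communities:
        pairs = n_s * (n_s - 1) / 2
        if pairs == 0:
            continue                   # single-vertex communities contribute nothing
        total += pairs * binary_kl(m_s / pairs, gamma * p)
    return total


if __name__ == "__main__":
    # two 4-cliques joined by one edge: n = 8, m = 13
    print(significance([(4, 6), (4, 6)], 8, 13))
    print(significance([(8, 13)], 8, 13))
```

For the single-community partition the community density equals the network density, so its Significance is zero at γ = 1.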
Modularity (Q or Mod). For a given community division of a network, it is defined as [36] $Q = \sum_s \left[ \frac{k_s^{in}}{2M} - \left( \frac{k_s}{2M} \right)^2 \right]$, where M is the total number of edges in the network, $k_s^{in}$ is the inner degree of group s, $k_s$ is the total degree of group s, and the sum runs over all communities in the given network. Modularity evaluates the fraction of edges within communities in the network minus the expected value in a random graph (i.e., in the null model). In general, the larger the modularity, the better the division. In recent years, it has become one of the most popular quality functions for community detection.
To detect communities at different scales, the multi-scale Modularity can be defined as $Q(\gamma) = \sum_s \left[ \frac{k_s^{in}}{2M} - \gamma \left( \frac{k_s}{2M} \right)^2 \right]$, where γ is the resolution parameter.
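The definition can be checked on a toy graph. The sketch below assumes the form Q(γ) = Σ_s [k_s^in/(2M) - γ (k_s/(2M))²], where k_s^in counts each internal edge twice; the function name and the edge-list and membership formats are choices made for this example.

```python
from collections import defaultdict


def modularity(edges, membership, gamma=1.0):
    """Multi-scale Modularity of a partition of an undirected simple graph.

    edges: list of (u, v) pairs; membership: dict mapping vertex -> community
    """
    M = len(edges)
    k_in = defaultdict(int)    # inner degree: 2 per internal edge
    k_tot = defaultdict(int)   # total degree of each community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        k_tot[cu] += 1
        k_tot[cv] += 1
        if cu == cv:
            k_in[cu] += 2
    return sum(k_in[s] / (2 * M) - gamma * (k_tot[s] / (2 * M)) ** 2
               for s in list(k_tot))


if __name__ == "__main__":
    # two triangles joined by a single edge
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    print(modularity(edges, membership))
    print(modularity(edges, membership, gamma=2.0))
```

Increasing γ strengthens the null-model penalty, so larger communities become less favorable.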

Normalized mutual information (NMI).
This measure is taken from information theory and estimates the similarity between two community partitions [42]. When the partitions match perfectly, NMI = 1; otherwise, the poorer the match, the smaller the value of NMI. NMI is often used to evaluate the performance of methods by assessing the amount of community information they correctly extract in networks with known community structures.
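A compact sketch of NMI between two label assignments follows. The text does not state which normalization is used; this example assumes the common choice NMI = 2 I(A;B) / (H(A) + H(B)), and the function name and the convention of returning 1 when both partitions are trivial are likewise assumptions.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information 2*I(A;B) / (H(A) + H(B)) of two labelings."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions put every vertex in one community
    return 2 * mutual / (h_a + h_b)


if __name__ == "__main__":
    print(nmi([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9]))  # same partition, relabeled
    print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))              # independent partitions
```

The score depends only on how vertices are grouped, not on the label values themselves.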
Critical number of dense subgraphs in partition transition. For the single-level networks, we derive the critical number $r^*_{x_1 \to x_2}$ of dense subgraphs for the transition from Partition $X_1$ to Partition $X_2$ (Partition X denotes the partition with r/x groups of x merged dense subgraphs). When $\Delta S = S(x_2) - S(x_1) > 0$, Partition $X_2$ is preferred over Partition $X_1$. Here, ΔS, divided by the number of links, can be written as,