Interplay between k-core and community structure in complex networks

The organisation of a network into maximal sets of nodes having at least k neighbours within the set, known as k-core decomposition, has been used for studying various phenomena. It has been shown that nodes in the innermost k-shells play a crucial role in contagion processes, emergence of consensus, and resilience of the system. It is known that the k-core decomposition of many empirical networks cannot be explained by the degree of each node alone, or equivalently, by random graph models that preserve the degree of each node (i.e., the configuration model).
Here we study the k-core decomposition of some empirical networks as well as that of some randomised counterparts, and examine the extent to which the k-shell structure of the networks can be accounted for by the community structure. We find that preserving the community structure in the randomisation process is crucial for generating networks whose k-core decomposition is close to the empirical one. We also highlight the existence, in some networks, of a concentration of the nodes in the innermost k-shells into a small number of communities.
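As an illustration of the definition above, the k-shell index of each node can be obtained by iteratively pruning nodes of low degree. The following is a minimal sketch, assuming an undirected simple graph given as a list of edges; the function name `k_shell_indices` is hypothetical and is not taken from the authors' code.

```python
from collections import defaultdict


def k_shell_indices(edges):
    """Return {node: k-shell index} for an undirected simple graph.

    A node has k-shell index k if it belongs to the k-core but not to
    the (k+1)-core. Isolated nodes do not appear in an edge list and are
    therefore not returned.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k cannot be in the (k+1)-core.
        to_remove = [n for n in remaining if degree[n] <= k]
        if not to_remove:
            k += 1
            continue
        while to_remove:
            n = to_remove.pop()
            if n not in remaining:
                continue  # already pruned via another neighbour
            remaining.discard(n)
            shell[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        to_remove.append(m)
    return shell


# A triangle (0, 1, 2) with a pendant node 3 attached to node 0.
example_shells = k_shell_indices([(0, 1), (1, 2), (0, 2), (0, 3)])
```

In this toy graph the pendant node is pruned at k = 1, whereas the three nodes of the triangle survive until k = 2.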

2 Effects of changing the resolution of the Louvain method
A pitfall of the Louvain method is its propensity to merge small communities owing to the existence of a lower bound on the size of the communities that it can detect (i.e., the resolution limit of modularity) [1,2]. However, communities in empirical networks may not have a typical size, and small communities generally coexist with large ones. A method for mitigating the resolution limit is to introduce a resolution parameter r ∈ (0, 1] into the Louvain algorithm [3]. By tuning r, it is possible to vary the resolution scale of the detected communities from large (i.e., r ∼ 1) to small (i.e., r ∼ 0) communities. We denote the Louvain method with a resolution r ≠ 1 by LvnR.
Figure 2 in the main text shows that commB combined with the communities found by the SBM reproduces the tail of P≥(k_s) of the empirical networks most accurately. Table 1 indicates that SBM tends to find more communities than Lvn, although there are exceptions. To understand whether the difference in the number of communities found by SBM and Lvn is the reason behind their different performances, we extracted the communities using LvnR with r ∈ {0.1, 0.3, 0.5}. For the sake of brevity, we only discuss the results for r = 0.3 in the following text; we have verified that the results for the other values of r are similar.
In Supplementary Table S2 we show, for all the data sets considered, the number of communities, N_c, and the modularity, Q, corresponding to the community structure found using Lvn, SBM, and LvnR with r = 0.3. With LvnR, the number of communities is similar to or even larger than that obtained with SBM. Supplementary Figures S3 and S4 show P≥(k_s) plotted against k_s and the four indicators used for comparing the innermost k-shells, respectively, including the results obtained with LvnR. These figures indicate that using r = 0.3 as opposed to r = 1 improves the ability of the Louvain algorithm to mimic the k-shell structure. In Supplementary Fig. S5 we summarise the performances of the different shuffling methods including LvnR.
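For reference, the two quantities reported per partition, N_c and Q, can be computed from an edge list and a community assignment. The sketch below assumes the standard Newman-Girvan modularity for undirected, unweighted networks, Q = Σ_c [L_c / L − (d_c / 2L)²], where L_c is the number of edges inside community c and d_c is the total degree of its nodes; whether the authors used exactly this variant is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity of a partition (undirected, unweighted).

    edges      : list of (u, v) pairs
    membership : dict mapping each node to its community label
    """
    n_edges = len(edges)
    intra = defaultdict(int)    # edges with both end points in the community
    deg_sum = defaultdict(int)  # sum of the degrees of the community's nodes
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        deg_sum[cu] += 1
        deg_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(
        intra[c] / n_edges - (deg_sum[c] / (2 * n_edges)) ** 2
        for c in deg_sum
    )


# Two triangles joined by a single edge, split into their natural communities.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
n_communities = len(set(toy_membership.values()))  # N_c
q_value = modularity(toy_edges, toy_membership)    # Q
```

Placing all nodes in a single community gives Q = 0 under this definition, which serves as a quick sanity check.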
Although LvnR mimics the k-shell features better than SBM for some data sets and indicators (approximately 20% of the cases), SBM still attains the highest success ratio, f_X, for each indicator. The few cases in which commB-LvnR does better than commB-SBM are the networks for which the difference between the empirical P≥(k_s) and the P≥(k_s) obtained from the deg shuffling appears small, such as the Emails and Cookpad UK networks. In these networks, the differences between the P≥(k_s) of the networks obtained using commB-LvnR and commB-SBM are also small. By contrast, other data sets such as Condensed Matter, Computer Science, and Words show larger differences between their P≥(k_s) and that obtained using the deg method. In these data sets, commB-SBM is much better than commB-LvnR, although commB-LvnR is better than commB-Lvn. Overall, these results suggest that the increase in the number of communities enabled by a small r does not reconstruct the k-shell structure more accurately than SBM does.

We report the fraction of data sets for which a given combination of the shuffling method and the community detection method yields an indicator's value closest to that for the original network. In addition to the methods considered in Fig. 2, we also consider the case of the communities extracted using the Louvain method with r = 0.3. See the caption of Fig. 2 for notations and legends.
To investigate which features of the SBM and Louvain algorithms are responsible for the difference in their performances, we study the average size of the communities to which nodes of a certain k-shell belong, s_C, as a function of the k-shell index, k_s. In Supplementary Fig. S6, we observe that s_C for the communities found by the Louvain method (Lvn) stays nearly constant across the entire range of k_s values. By contrast, with SBM, s_C monotonically decreases as k_s increases, such that nodes in inner k-shells tend to belong to smaller communities. Although a high resolution (i.e., a small r value) in the Louvain method produces a large number of communities (see Supplementary Table S2), the behaviour of s_C for Lvn and LvnR is similar, with the major difference that the value of s_C is smaller for LvnR.
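The quantity s_C can be sketched as follows. We assume here that the average is taken over the nodes of each k-shell, so that a community hosting several nodes of the same k-shell is counted once per node; averaging over distinct communities instead would be an equally plausible reading of the definition. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def mean_community_size_per_shell(shell, membership):
    """Return {k_s: average size of the communities hosting nodes of that shell}.

    shell      : dict mapping each node to its k-shell index
    membership : dict mapping each node to its community label
    """
    size = Counter(membership.values())  # community label -> number of nodes
    total = defaultdict(int)
    count = defaultdict(int)
    for node, ks in shell.items():
        total[ks] += size[membership[node]]
        count[ks] += 1
    return {ks: total[ks] / count[ks] for ks in sorted(total)}


# Toy input: three nodes in the 2-shell share one community of size 3,
# whereas the single node in the 1-shell forms a community on its own.
toy_shell = {0: 2, 1: 2, 2: 2, 3: 1}
toy_groups = {0: "a", 1: "a", 2: "a", 3: "b"}
s_c = mean_community_size_per_shell(toy_shell, toy_groups)
```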
These results altogether lead us to conclude that our method combined with the community structure identified using the Louvain method with a higher resolution does not outperform our method combined with SBM in grasping the features of the k-core decomposition. This may be because SBM is capable of finding more universal mesostructures than those found by the Louvain method [4][5][6].

3 The LFR model
The Lancichinetti-Fortunato-Radicchi (LFR) model generates networks in which both the node degree and the community size (i.e., the number of nodes belonging to a community) follow power-law distributions [7]. Such features are found in many empirical networks [8] and have led to the success of the LFR model as a generator of benchmark networks for testing community detection algorithms [2]. A main finding presented in the main text is that preserving the community structure of the original network in addition to the degree of each node improves the ability of the shuffling methods to mimic the k-core decomposition of the original networks. Here, to test whether the community structure and the degree of each node, rather than a possible intricate association between the two, are sufficient for mimicking the features of the k-core decomposition observed for many empirical networks, we generated networks using the LFR model and analysed their k-cores and those of the shuffled counterparts.
The LFR algorithm depends on the following parameters: (i) the exponent, t_1 ∈ [2, 3], of the degree distribution P(k) ∝ k^(−t_1); (ii) the exponent, t_2 ∈ [1, 2], of the community size distribution P(S_c) ∝ S_c^(−t_2); (iii) the mixing parameter, µ ∈ [0, 1], specifying the fraction of inter-community edges for a node (µ = 0 indicates that a node is connected exclusively with nodes belonging to its own community, and µ = 1 indicates that a node is connected only with nodes belonging to communities different from its own); and (iv) one of the following: the average degree, ⟨k⟩, the minimum degree, k_min, or the minimum number of communities, min N_c. Because the algorithm is stochastic, some realisations may fail to produce a network fulfilling all the requirements. Therefore, we have to set the parameter values so as to ensure the algorithm's convergence.
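The mixing parameter fixes how the edges of each node are divided between its own community and the rest of the network. As a minimal sketch of how this division can be measured on a generated network, the hypothetical helper below returns, for every node, both the fraction of its edges that stay within its community and the complementary fraction, so that the realised split can be compared with the value of µ supplied to the generator. The input format (an edge list and a node-to-community dictionary) is an assumption.

```python
from collections import defaultdict


def edge_split_per_node(edges, membership):
    """Return {node: (intra_fraction, inter_fraction)}.

    intra_fraction : share of the node's edges ending in its own community
    inter_fraction : share of the node's edges ending in other communities
    """
    intra = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        same = membership[u] == membership[v]
        for node in (u, v):
            degree[node] += 1
            if same:
                intra[node] += 1
    return {
        node: (intra[node] / degree[node], 1 - intra[node] / degree[node])
        for node in degree
    }


# Two triangles joined by the edge (2, 3); only nodes 2 and 3 have an
# edge leaving their community.
split = edge_split_per_node(
    [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)],
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"},
)
```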
To encompass a good spectrum of networks, we consider four batches of parameter sets, which are summarised in Supplementary Table S3, together with the properties of the generated networks. Each batch of parameter sets consists of a value of t_1, a value of t_2, and seven values of µ ranging from 0.1 to 0.8. We assumed N = 10000 nodes and used the implementation of the LFR algorithm in the NetworkX Python package [9].
For each network generated, we extracted its k-core decomposition and calculated the four indicators. We did the same for the shuffled counterparts generated using the deg, commA, and commB methods. In analogy to Supplementary Fig. S1, in Supplementary Figs. S7-S10 we show the survival function of the probability distribution of the k-shell index, P≥(k_s), for the original LFR networks and the shuffled counterparts, with one figure per (t_1, t_2) pair. A visual inspection of Supplementary Figs. S7-S10 highlights the existence of three trends.
First, Supplementary Figs. S7 and S8 indicate that, in networks generated using the smaller t_1 values (i.e., parameter batches 1 and 2 in Supplementary Table S3), the shuffled networks generated by deg, commA-Lvn, and commB-Lvn attain a k-core decomposition with a degeneracy, D, considerably higher than the original one. In contrast, Supplementary Figs. S9 and S10 indicate that, with the larger t_1 values (i.e., parameter batches 3 and 4), we recover the same trend as that shown in Fig. 1. In other words, D for the original networks is larger than that for the shuffled networks. The difference between the original D and its shuffled counterpart seems to be influenced by the value of t_1, but not by t_2 or µ.
Second, P≥(k_s) for the original LFR networks mainly decreases smoothly as k_s increases, without plateaus or abrupt drops. Therefore, the k-core decomposition of LFR networks does not return any k-shell that is empty or much more populated than its adjacent k-shells. This result is in stark contrast to that for various empirical networks, e.g., the Facebook 1 data set (see Supplementary Fig. S1).
Third, regardless of the values of t_1, t_2, and µ, the commB-SBM shuffling method produces networks whose P≥(k_s) is closer to the original one than that produced by the other shuffling methods. This result is consistent with that for the empirical networks presented in the main text.
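The two quantities compared in the three trends above, the degeneracy D and the survival function P≥(k_s), can be sketched as follows. We assume that D is the largest k-shell index in the network and that P≥(k_s) is the fraction of nodes whose k-shell index is at least k_s; the function name is hypothetical.

```python
from collections import Counter


def survival_function(shell_indices):
    """Return (D, {k_s: fraction of nodes with k-shell index >= k_s}).

    shell_indices : iterable with one k-shell index per node
    """
    values = list(shell_indices)
    n_nodes = len(values)
    counts = Counter(values)
    degeneracy = max(values)

    survival = {}
    remaining = n_nodes  # nodes whose k-shell index is >= the current k_s
    for ks in range(degeneracy + 1):
        survival[ks] = remaining / n_nodes
        remaining -= counts.get(ks, 0)
    return degeneracy, survival


# One node in the 1-shell and three nodes in the 2-shell.
D, p_geq = survival_function([1, 2, 2, 2])
```

With this representation, an empty k-shell appears as two consecutive k_s values with the same P≥(k_s), i.e., a plateau, whereas a heavily populated k-shell appears as an abrupt drop.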
In a nutshell, the analysis of the k-core decomposition of networks generated by the LFR model reveals that the presence of communities is not enough to explain the main properties of the k-shell structure observed in the empirical networks.

Supplementary Table S3: Summary of the properties of the networks generated with the LFR model. For each combination of parameters t_1, t_2, and µ we report the number of edges, L, minimum degree, k_min, average degree, ⟨k⟩, maximum degree, k_max, degeneracy, D, number of communities, N_c, and modularity, Q, for communities extracted using either the Louvain (Lvn) or stochastic block model (SBM) method. All networks have N = 10000 nodes.

Supplementary Table S3). See the caption of Supplementary Fig. S7 for notations and legends.

4 Relationship between community structure and k-core decomposition
In this section, we examine the number of communities to which the nodes in each k-shell belong, with the aim of determining whether those nodes are concentrated into one or a small number of communities, particularly for nodes in the innermost k-shells. This analysis is similar to that shown in Supplementary Fig. S6, where we focused on the average community size. Supplementary Figure S11 shows the number of distinct communities to which the nodes with a given k_s value belong, denoted by n_C(k_s), for all the data sets. In agreement with Fig. 3, some data sets show a strong concentration of the innermost k-shells (i.e., nodes with large k_s values) into one or a few communities.
Next, we ask whether the number of communities across which each k-shell is distributed is a byproduct of random interactions. To answer this question, first, for each network, we extract communities using either Lvn or SBM. Second, we compute n_C(k_s) for each k_s. Third, we compute the same quantity for the case in which we permute the association between the k-shell index of each node, k_s(i), and the community membership of the node, g(i), uniformly at random; in fact, it is sufficient to randomly permute either {k_s(1), ..., k_s(N)} or {g(1), ..., g(N)}, not both. Fourth, we calculate the number of communities to which the set of nodes with a given k_s value belong after the permutation, which is denoted by n_C^S(k_s). Fifth, using an approach similar to the calculation of the rich-club coefficient [10], we compute
ϕ(k_s) = n_C^S(k_s) / n_C(k_s)    (S1)
for each k_s. A value of ϕ(k_s) larger (smaller) than 1 indicates that the number of communities to which the nodes having the k_s value belong is smaller (larger) than in the case of the randomised association between the nodes and communities.
Therefore, ϕ(k_s) larger than 1 implies that the nodes with the given k-shell index, k_s, are concentrated into a relatively small number of communities as compared to randomised counterparts. In Supplementary Fig. S12 we plot ϕ(k_s) against k_s for all the data sets. We observe that, with the exception of the Spanish and British Cookpad networks, ϕ(k_s) tends to be larger than 1. This result implies that, on average, nodes of a given k-shell tend to belong to fewer communities than in the randomised case. We stress that the permutation of either the k-shell index or the community membership sequences may return networks whose k-shell and community structure are not physically plausible. For instance, if a node i receives a k-shell index value of α upon randomisation and α is larger than k_i (i.e., the degree of node i), then the node cannot belong to the corresponding k-shell.

Figure S12: Ratio, ϕ(k_s) (see (S1)), plotted against the k-shell index, k_s, for all the data sets. We identified the community structure using either Lvn or SBM. Each panel accounts for a different data set. Results are averaged over one hundred runs of randomisation of the association between the node's k-shell index and community label. The horizontal dashed lines represent ϕ(k_s) = 1.
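The permutation test described in this section can be sketched as follows. We assume that ϕ(k_s) is the number of communities spanned by a k-shell after randomisation, averaged over the runs, divided by the number observed in the original assignment; this reading is consistent with the statement that ϕ(k_s) > 1 corresponds to fewer communities than in the randomised case. The function names, the default of 100 runs, and the seed argument are illustrative assumptions.

```python
import random
from collections import defaultdict


def communities_per_shell(shell, membership):
    """Return {k_s: number of distinct communities hosting nodes of that shell}."""
    groups = defaultdict(set)
    for node, ks in shell.items():
        groups[ks].add(membership[node])
    return {ks: len(labels) for ks, labels in groups.items()}


def phi(shell, membership, runs=100, seed=0):
    """Ratio between the randomised and the observed number of communities."""
    rng = random.Random(seed)
    nodes = list(shell)
    labels = [membership[node] for node in nodes]
    observed = communities_per_shell(shell, membership)

    shuffled_sum = defaultdict(float)
    for _ in range(runs):
        # Permuting the community labels alone is sufficient.
        rng.shuffle(labels)
        permuted = dict(zip(nodes, labels))
        for ks, n_c in communities_per_shell(shell, permuted).items():
            shuffled_sum[ks] += n_c
    return {ks: shuffled_sum[ks] / runs / observed[ks] for ks in sorted(observed)}


# Toy input in which each k-shell sits entirely inside one community.
toy_shell = {i: 1 for i in range(4)}
toy_shell.update({i: 2 for i in range(4, 8)})
toy_membership = {i: "a" if i < 4 else "b" for i in range(8)}
ratios = phi(toy_shell, toy_membership)
```

Because the denominator does not change across runs, averaging the numerator over the runs is equivalent to averaging the ratio itself.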