Multi-resolution community detection in massive networks

Aiming to improve the efficiency and accuracy of community detection in complex networks, we propose a new algorithm based on the idea that communities can be detected from subnetworks by comparing the internal and external cohesion of each subnetwork. In our method, similar nodes are first gathered into meta-communities, which are then either retained or merged through a multilevel label propagation process until all of them satisfy our community criterion. Our algorithm requires neither a priori information about the communities nor the optimization of any objective function. Experimental results on both synthetic and real-world networks show that our algorithm performs well and runs extremely fast compared with several other popular algorithms. By tuning a resolution parameter, we can also observe communities at different scales and thereby reveal the hierarchical structure of the network. To further explore the effectiveness of our method, we applied it to the E. coli transcriptional regulatory network and found that all the identified modules have strong structural and functional coherence.

the two corresponding meta-communities. The number of edges between nodes of the same meta-community gives the weight of that meta-community's self-loop in the new network (see Fig. S1 c→d, e→f, g→h).
Step 4: Repeat steps 2 and 3 iteratively until the number of meta-communities no longer changes. The communities can then be restored from the last stable aggregated network (see Fig. S1 h→i). Figure S2 shows the number of meta-communities detected by our algorithm at each iteration on the real-world networks. The number of meta-communities drops sharply at each iteration, so our algorithm generally converges after a small number of iterations, even on large networks. Figure S3 shows the evolution of modularity during the detection process of our algorithm on the four real-world networks.
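The aggregate-and-repeat loop of steps 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not our implementation: the community criterion and the rule for retaining or merging meta-communities are not reproduced here, so the hypothetical helper `propagate` substitutes plain weighted label propagation (in the style of Raghavan et al.) as the merge step, with a deterministic tie-break chosen only for illustration. The hypothetical `aggregate` follows step 3: edges between two meta-communities are summed into one weighted edge, and edges inside a meta-community become its self-loop. The loop stops when the number of meta-communities no longer changes, and every original node is mapped back through all levels.

```python
from collections import defaultdict


def aggregate(edges, label):
    """Step 3 sketch: collapse each meta-community into one weighted node.

    edges: list of (u, v, w) undirected weighted edges, each listed once.
    label: dict mapping each node to its meta-community id (0..k-1).
    Edges inside a meta-community accumulate into its self-loop (a, a, w).
    """
    agg = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((label[u], label[v]))
        agg[(a, b)] += w
    return [(a, b, w) for (a, b), w in agg.items()]


def propagate(nodes, edges, max_rounds=100):
    """Hypothetical stand-in merge step: asynchronous weighted label propagation."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        if u != v:  # a self-loop is stored once and votes for the node's own label
            adj[v].append((u, w))
    label = {n: n for n in nodes}
    for _ in range(max_rounds):
        changed = False
        for n in sorted(nodes):
            score = defaultdict(float)
            for m, w in adj[n]:
                score[label[m]] += w
            if not score:
                continue  # isolated node keeps its own label
            best = max(score.values())
            candidates = [lab for lab, s in score.items() if s == best]
            # Tie-break: keep the current label if it is among the best, else take the largest.
            new = label[n] if label[n] in candidates else max(candidates)
            if new != label[n]:
                label[n], changed = new, True
        if not changed:
            break
    # Relabel the surviving labels as consecutive ids 0..k-1.
    ids = {lab: i for i, lab in enumerate(sorted(set(label.values())))}
    return {n: ids[label[n]] for n in nodes}


def multilevel(nodes, edges):
    """Step 4 sketch: merge and aggregate until the meta-community count is stable."""
    membership = {n: n for n in nodes}  # original node -> node of the current level
    level_nodes = list(nodes)
    level_edges = [(u, v, 1.0) for u, v in edges]
    while True:
        label = propagate(level_nodes, level_edges)
        k = len(set(label.values()))
        if k == len(level_nodes):
            break  # no merges at this level: the aggregated network is stable
        membership = {n: label[c] for n, c in membership.items()}
        level_edges = aggregate(level_edges, label)
        level_nodes = list(range(k))
    return membership


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(multilevel(range(6), edges))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Plain label propagation is known to be sensitive to update order and tie-breaking, which is one reason a dedicated retain-or-merge criterion is preferable to it in practice.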

Additional analysis of our method
In general, modularity shows an increasing trend during the detection process (Fig. S3), even though we do not optimize it directly. This provides further experimental evidence for the effectiveness of our method. Note, however, that modularity does not always increase during our detection process. This indicates that, unlike modularity optimization, our method does not merge small communities merely to increase modularity, which allows it to avoid the resolution limit of modularity-based methods (see Fig. S7).
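The resolution limit can be checked numerically on a standard example: a ring of 30 five-node cliques, each joined to the next by a single edge. The sketch below computes Newman-Girvan modularity, Q = sum over communities c of [L_c/m - (d_c/2m)^2], where L_c is the number of edges inside c, d_c the total degree of its nodes and m the number of edges. Merging adjacent cliques in pairs yields a higher Q than keeping every clique separate, so a method that maximizes Q would merge them. The helper names and the ring and clique sizes are illustrative assumptions, not the parameters used for Fig. S7.

```python
from collections import defaultdict


def modularity(edges, label):
    """Newman-Girvan modularity Q = sum_c [L_c / m - (d_c / 2m)^2], unweighted graph."""
    m = len(edges)
    inside = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def ring_of_cliques(n_cliques, k):
    """n_cliques cliques of size k; consecutive cliques are joined by a single edge."""
    edges = []
    for c in range(n_cliques):
        base = c * k
        edges += [(base + i, base + j) for i in range(k) for j in range(i + 1, k)]
        edges.append((base, ((c + 1) % n_cliques) * k))  # bridge to the next clique
    return edges


edges = ring_of_cliques(30, 5)
nodes = range(30 * 5)
one_per_clique = {n: n // 5 for n in nodes}
adjacent_pairs = {n: n // 10 for n in nodes}  # cliques 2i and 2i+1 merged
print(round(modularity(edges, one_per_clique), 4))  # 0.8758
print(round(modularity(edges, adjacent_pairs), 4))  # 0.8879
```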

Additional benchmarks
Here we include three more algorithms in our experiments; the full list of algorithms is given in Table S1. In Figures S4, S5 and S6, each point corresponds to an average over 100 different network realizations. Figure S4 shows the results of the different algorithms on the GN benchmark [12]. Modularity-based methods generally perform well here, especially SG and FADM. Our method obtains reasonable performance and runs extremely fast. Figure S5 shows the results on the LFR benchmark [13]. In contrast to the GN benchmark, modularity-based methods do not perform remarkably well on the LFR networks, and they perform worse on larger networks with smaller communities because of the resolution limit of modularity [14]. Our algorithm performs fairly well on the LFR networks and runs very fast, in some cases even faster than LP. Infomap performs best on the larger networks. LE performs rather poorly and fails to find communities at low mixing parameters. Figure S6 shows the number of communities detected by the different algorithms as a function of average degree on ER and SF random networks. The GN algorithm is excluded because it is too slow for this analysis. In both cases, our method, LP and Infomap always find a single community containing all nodes of the network once the network is sufficiently dense.
In contrast, the remaining five methods (FG, FADM, Louvain, WT and LE) perform less well, as they always find several communities even when the network is sufficiently dense. Figure S7 shows the results of the different algorithms on networks made of cliques connected to each other by single edges. As we can see, our method correctly finds all the predefined cliques in all cases. Figure S8 shows the performance of the different algorithms on the real-world networks. The performance of our method is competitive with that of the other methods, and its running time is fairly low, lower than that of LP in most cases. The actual values of modularity, NMI and execution time are shown in Table S2.
In the last column, n and m denote the number of nodes and the number of edges in the network, respectively.
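The GN benchmark and the NMI score used above can be outlined as follows. This is a sketch under assumptions: it uses the standard Girvan-Newman construction (128 nodes in four groups of 32, expected degree 16, expected external degree `z_out`) and the NMI normalization 2I(A;B)/(H(A)+H(B)) popularized by Danon et al.; neither the benchmark parameters nor the NMI variant is restated in this section. The helper names `gn_benchmark` and `nmi` are hypothetical. Calling the generator with `seed=0..99` and averaging the scores mirrors the 100 realizations behind each point.

```python
import math
import random
from collections import Counter


def gn_benchmark(z_out, seed=None):
    """One realization of the Girvan-Newman benchmark (assumed standard setup).

    128 nodes in 4 planted groups of 32; each node has expected degree 16,
    of which z_out (in expectation) links leave its own group.
    Returns (edges, planted group of each node).
    """
    rng = random.Random(seed)
    n_groups, size, k = 4, 32, 16
    n = n_groups * size
    p_in = (k - z_out) / (size - 1)  # edge probability inside a group
    p_out = z_out / (n - size)       # edge probability between groups
    truth = [i // size for i in range(n)]
    edges = [
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if rng.random() < (p_in if truth[u] == truth[v] else p_out)
    ]
    return edges, truth


def nmi(a, b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) between two labelings of the same nodes."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mutual = sum(
        nxy / n * math.log(n * nxy / (count_a[x] * count_b[y]))
        for (x, y), nxy in joint.items()
    )
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in a single community
    return 2 * mutual / (h_a + h_b)


edges, truth = gn_benchmark(z_out=4, seed=0)
print(2 * len(edges) / len(truth))  # average degree, close to 16
coarse = [g // 2 for g in truth]    # merge planted groups 0+1 and 2+3
print(round(nmi(truth, truth), 4), round(nmi(truth, coarse), 4))  # 1.0 0.6667
```

In a real experiment, `coarse` would be replaced by the partition returned by the method under test.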

Detailed results on the E. coli network
We analysed the E. coli transcriptional network in detail. Figure S9 shows the modules identified by our algorithm with λ = 1: 5 isolated nodes, 26 modules with two operons, 9 modules with three operons and 22 modules with more than three operons. Table S3 lists all the operons in the 22 modules that have more than three operons.

Time complexity analysis
We measured the speed of our algorithm by timing its analysis of LFR networks of increasing size N, and compared the results with those of four other fast methods: LP, FG, Louvain and Infomap. As shown in Fig. S10, the computation time of all five methods grows linearly with network size. However, our algorithm is significantly faster than the other four methods, which makes it very efficient for large-scale networks.
high, or the algorithm is not suitable for that particular network. In our algorithm, the resolution parameter is set to 0.6.
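Linear scaling of the kind shown in Fig. S10 can be checked by timing a method on graphs of increasing size and fitting the slope of log(time) against log(N). A slope near 1 indicates linear growth, while a constant-factor speedup only shifts the line down without changing its slope. This is a sketch under assumptions: `detect` stands for any community detection function, `degree_count` is a hypothetical placeholder used only to exercise the harness, and `sparse_random_graph` is a simple random-edge stand-in, not the LFR generator used for Fig. S10.

```python
import math
import random
import time


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) vs log(N); about 1 means linear scaling."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def sparse_random_graph(n, avg_degree, rng):
    """Stand-in test graph: n * avg_degree / 2 uniformly random edges (not LFR)."""
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(n * avg_degree // 2)]


def time_method(detect, sizes, avg_degree=20, seed=0):
    """Wall-clock time of detect(edges) on one random graph per size."""
    rng = random.Random(seed)
    times = []
    for n in sizes:
        edges = sparse_random_graph(n, avg_degree, rng)
        start = time.perf_counter()
        detect(edges)
        times.append(time.perf_counter() - start)
    return times


def degree_count(edges):
    """Placeholder linear-time 'method' used only to exercise the harness."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


sizes = [5_000, 10_000, 20_000, 40_000]
times = time_method(degree_count, sizes)
print(f"fitted exponent: {scaling_exponent(sizes, times):.2f}")
```

Single timings are noisy; repeating each size several times and taking the median gives a more stable fit.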