Isolate sets partition benefits community detection of parallel Louvain method

Community detection is a vital task in many fields, such as social networks, and financial analysis, to name a few. The Louvain method, the main workhorse of community detection, is a popular heuristic method based on modularity. But it is difficult for the sequential Louvain method to deal with large-scale graphs. In order to overcome the drawback, researchers have proposed several parallel Louvain methods (Parallel Louvain Method, PLM), which suffer two challenges: (1) latency in the information synchronization and (2) communities swap. To tackle these two challenges, we propose a graph partition algorithm for the parallel Louvain method. Different from existing graph partition algorithms, our graph partition algorithm divides the graph into subgraphs called isolate sets, in which vertices are relatively decoupled from others, and the PLM computes and synchronizes information without delay and communities swap. We first describe concepts and properties of isolate sets. In the second place, we propose an algorithm to divide the graph into isolate sets, which enjoys the same computation complexity as the breadth-first search. Finally, we propose the isolate-set-based parallel Louvain method, which calculates and updates vertices information without latency and communities swap. We implement our method with OpenMP on an 8-cores PC. Experiments on 18 graphs show that our parallel method achieves a maximum 4.62 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document}× speedup compared with the sequential method, and outputs higher modularity on 14 graphs.

In the serial Louvain method, all vertices in the graph are accessed sequentially, and the calculation and synchronization of vertices are carried out in sequence.
The calculation of each vertex is based on the latest information of the formerly calculated vertex. In the PLMs, the parallel computing vertices have waiting time for other related vertices to synchronize information.
This latency would delay the convergence of PLMs, and reduce the modularities of graphs; As for the second difficulty, when two different vertices move to each other's community after calculation, the vertices swap occurs [18][19][20][21][22][23][24] .
After the vertices swap, few mergers between the vertex happens, and the community structure remains. The exchanged vertices only affect the community labels without improving the modularity of the community. Therefore, the vertices swap in the computing process will also hurt the convergence of the method. Most existing PLMs deal with the first difficulty in two ways: one is synchronizing information by compare and swap (CAS) vertices information in hash tables 15 , the other one is using "ghost"vertices (or virtual vertices) between subgraphs in different process 25 . These two existing solutions do not remove the latency entirely and have additional calculation and memory overheads.
And these existing PLMs deal with the second problem by labeling scheme, which reduces the modularity of PLMs.
From the existing PLMs, we find that the above two major difficulties of latency of synchronization and vertices swap are related to the topology of the graph.
In the PLMs, the difficulty of vertices swap is related to the parallel computing of adjacent vertices. What is more, the latency of information synchronization is not only caused by parallel computing of the adjacent vertices, but also by the vertices with common adjacent vertices.
In order to improve the efficiency of parallel computing without additional overhead, the parallel computing vertices are expected to decouple from their adjacent vertices and the vertices with common adjacent vertices.
In order to solve the bottlenecks of synchronization latency and vertex swap in the parallelization of Louvain method, we propose a novel graph partition algorithm of isolate sets partition algorithm, which divides adjacent vertices and vertices with common adjacent vertices into a series of subgraphs.
Therefore, the vertices in an isolate set are relatively independent in parallel computing, and cannot swap with each other.
Then we propose a PLM based on the isolate sets. It should be noted that our method requires undirected graphs.
We divide the graph network into subgraphs called isolate sets. Using properties of isolate sets, vertices in the same isolate set can be computed and synchronized without latency and without additional overheads. And vertices in the same isolate set are unable to swap communities naturally. Our methods are implemented on an 8-cores personal, and extensive experiments show that the proposed parallel method achieves a speedup of 4.62× compared to the sequential Louvain method, and meanwhile our method improves the modularity of communities in most cases.
The main contributions of this paper can be summarized as follows: • The definition and certain properties of the isolate set are proposed. Especially the vertices in isolate sets have properties of relative independence, which is the base of our method. • An algorithm for dividing graphs into isolate sets is given, and its computation complexity is equivalent to breadth-first search. Our partition algorithm might be not only used in parallel Louvain methods but also implemented on other parallel graph algorithms. • Isolate-set-based parallel Louvain method (IPLM) is proposed. Experiments show our method is scalable with higher modularity than existing PLMs on the shared memory architecture.

Results
This section employs 18 frequently used graphs. The experiment mainly shows the modularity and speedup of our method. The experimental results on popular datasets and analyses are reported as below.     Table 2, in the dataset wikipedia, the graph is covered by 1327 isolate sets. While 90% of vertices are included in the first 13 isolate sets (0.98% of all isolate sets), the remaining 1314 isolate sets contain only 10% vertices of the graph. By contrast, the dataset LiveJournal is covered by 450 isolate sets and 90% of vertices are included in the first 17 isolate sets (3.78% of all isolate sets). Obviously, vertices in LiveJournal are connected stronger, which affects the efficiency of our method.
On the dataset LiveJournal, our method has a little lower speedup than hash-table-based method with 2 threads, because this graph has less edges (34,681,189) than others. On this graph, the hash-table-based method has less waiting time for information synchronization, while the isolate-set-based method needs more time on the partitioning graph. However, the increasing threads reduce more computation time on our method than the hash-table-based one. And the speedup of our method is higher than hash-table-based method with 8 threads.
On the dataset Wikipedia, our isolate-set-based method has an obviously higher speedup than the existing method. Because the graph dataset Wikipedia has a large number of edges of 101,355,853. Hash-table-based method requires more waiting time to synchronize and update information. However, the latency in the information synchronization is reduced in our method, which enlarges the speedup of our method dramatically.
As shown in the Figs. 3 and 4, in our method, the time occupation of isolate sets partition is about 3.3% on LiveJournal and 6.0% on Wikipedia, respectively. Our partition algorithm costs less time on partitioning and consequently reduces much time on the information synchronization.
In the perspective of the community detection modularity, our method deals with the communities swap by properties of isolate sets. Restriction vanishes on the movement of vertices in our method, which indicates that the method has more searching spaces. Therefore, our method outputs higher modularity than other methods such as the sequential Louvain method and hash-

Experiments on extended graph datasets. Speedup experiments.
The improvement on the speedup of our method on extended graph datasets is shown in the Fig. 6.
We implement hash-table-based method and our isolate-set-based method on 18 graph datasets(including the dataset LiveJournal and Wikipedia). These datasets have vertices from 196,591 to 12,082,987, and edges from 899,792 to 375,535,932. Our method presents a higher speedup than hash-table-based method on 17 graph datasets. The exception is the dataset web-Google, on which our method has 2.14× speedup and is lower than the hash-table-based method of 2.66× . On the dataset Wikipedia-link-en, our method obtains 4.62× speedup with 8 threads (the best case). In contrast, the speedup is 1.22× with hash-table-based method. What's more, the average speedup of our method on these 18 datasets is 2.69× , which is higher than hash-table-based method of 1.78×.
Experiments of the modularity. The final community detection performance of our method on extended graph datasets is shown in Table 3.
As shown in Table 3, our method obtains the highest modularity on the 14 of total of 18 datasets. The sequential Louvain method obtains the highest modularity on 2 datasets. On the datasets amazon.ungraph and web-Google, the modularity of isolate-set-based method is lower than the sequential method by 0.01%. Among the     www.nature.com/scientificreports/ datasets, com-dblp, amazon, and com-orkut have the ground-truths. We compare the NMI of the three methods on these datasets. The results show that there is a slight difference between these three methods. However, on the datesets as-skitter and com-lj.ungraph, our methods' improvements are respectively 4.60% and 4.76% than the sequential method. What's more, our method can improve the quality of community detection in most cases.

Discussion
To further accelerate Louvain method of community detection method, we propose the definition of isolate sets in a graph, and prove several useful properties of the isolate sets.
Additionally, a graph partition algorithm is proposed, which partitions vertices of a graph into a series of isolate sets.
We propose a parallel Louvain method based on the isolate set partition algorithm. The isolate-set-based PLM takes advantage of the properties of the isolate set to synchronize and update information efficiently in parallel computing without latency.
After the first stage of the Louvain method, two vertices belonging to different communities join the communities that the other vertex is in, which is called community swap.
In the sequential Louvain method, only one vertex is computed and moved at a time. By contrast, several vertices are moved at the same time in PLMs. Existing studies solved the problem by the minimum label heuristic method, which restricts the direction of movement of vertices and decreases the search space. In our method, because of the properties of isolate sets, the vertices and their neighbors fail to fall in the same isolate set. A vertex and its neighbor cannot be moved at the same time. Therefore, the community swap problem is avoided.
Our method requires no extra computation and memory overhead and increases the searching spaces of the method and improves the quality of community detection.
The limitation of our method is that partitioning isolate sets needs extra time. And we will improve the efficiency of the partition algorithm.
To verify the efficiency of our method, we use 18 graph datasets and conduct comparisons with state-of-theart methods. The experiments results show that our method obtains 4.62 × speedup on the personal computer with 8 threads and improves the modularity of communities detection in most cases. However, our method degrades a little on a small graph, because the partition algorithm occupies a constant time.
Theoretically, the method depends on the statistical mode of degrees of vertices in the graph, which means the majority of vertices are connected strongly or weakly. When most vertices are connected strongly, the "tail" of the distribution of isolate sets is long, and there are plenty of isolate sets containing few vertices.
In the future, we will improve the efficiency of our method algorithm and implement our method on multicore architecture for community detection on large-scale graphs.

Methods
We observe that the two challenges mentioned in the last section are related to the topology of the graphs.
The parallel computing vertices are expected to be relatively independent of each other, and the relatively independent vertices reduce the latency in synchronization and thus avoid communities swap. www.nature.com/scientificreports/ Therefore, we propose an isolate sets partition algorithm, which divides the graph to isolate sets. The vertices in the same isolate set are relatively independent, which means that they have no common adjacent vertices. And our partition method has a computation complexity equivalently to the breadth-first search (BFS).
Definition and properties of isolate sets. In order to help to propose and describe our isolate partition algorithm, we give several definitions and lemmas. We define the dependency set, isolate set [29][30][31][32] , and maximum isolate set. Based on the definitions, we prove that the maximum isolate set is non-unique and that an undirected graph can be covered by a series of maximum isolate sets. G(V, E), ∀v ∈ G , N + (v) = {v} ∪ N(v) . The set N + is defined as the dependency set of vertex v.

Definition 1 Given an undirected graph network
In a graph G (V, E), the union of a vertex v and the neighborhoods of v ( N(v)) is defined as the dependency set of vertex v. Obviously, the dependency set of vertex v consists of vertex v and all adjacent vertices of vertex v.
In a graph network G(V, E), the set s is a subset of V. Vertices v i and v j are arbitrary two vertices in the set s, and sets N + (v i ) and N + v j are the dependency sets of v i and v j . If the intersection of these two dependency sets is empty set, the subset s is called an isolate set of graph G.
According to Definition 2, intersections of the dependency sets of vertices in an isolate set are empty. Sometimes, the isolate set is similar to the independent set, in which the vertices have no common edges. What distinguishes is that two arbitrary vertices in an isolate set have no common adjacent vertices. Especially, the sets of the single vertex are isolate sets.

the isolate set s is defined as an maximum isolate set of graph G.
In a graph G(V, E), s is an isolate set of G. Two arbitrary vertices v j ∈ G − s and the vertex v i ∈ s exist, which cannot make the intersection of these two dependency sets N + (v i ) and N + v j empty set. The isolate set s is an maximum isolate set of the graph G. That is to say, there is no vertex in set G − s that can be divided to the isolate set s.

Lemma 1 The maximum isolate set is non-unique. Except for the graph G(V, E) including one vertex, the maximum isolate set of G is unique, and the isolate set of G is V.
Proof 1 Suppose there is a graph G including 2 vertices at least, and the isolate set s is the only maximum isolate set of G. There is an arbitrary vertex v located in the complementary set of graph G ( ∀v ∈ G − s ). The vertex v forms a set s * containing itself ( s * = {v} ). According to the Definition 2, the set s * is another isolate set of graph G. Hence, if the set s * is a maximum isolate set, it conflicts with the former assumption that the maximum isolate set is unique. If the set s * is not a maximum isolate set, there must be a maximum isolate set including the set s * , which is different from the set s and also conflicts with the assumption.

Proof 2 Given an undirected graph G(V, E)
, we can find a series of isolate sets {s i } . The graph G can be covered with union of these isolate sets. Suppose the graph G cannot be covered with the union of isolate sets n i s i , and a vertex v 0 ∈ G − n i s i . According to the Lemma 1, we divide the vertex into a new isolate set s n+1 = {v 0 } . The union of isolate set n+1 i s i covers the graph G, which conflicts with the former assumption. Therefore, we can find a series of isolate sets s i to cover the graph G. The Lemmas 1 and 2 indicate that the maximum isolate sets can be found by the graph search algorithm. And we propose an algorithm for partitioning the graph into maximum isolate sets.
Isolate sets partition algorithm. According to the definition of isolate sets, in an isolate set the neighborhood of a vertex are totally different from the neighborhood of another vertex, which means that these vertices have been decoupled. The decoupling vertices can be parallelly computed without waiting for information synchronization, and the vertices cannot swap with each other in the movement. Therefore, the latency in the information synchronization is reduced without computing and memory overhead. At the same time, the communities swap is avoided.
According to the Lemma 2, an undirected graph can be divided into several isolate sets. The union of these isolate sets contains the whole vertices. In the first stage of the Louvain method, isolate sets are computed serially, and then vertices in the same isolate set are calculated in parallel to accelerate the Louvain method. However, the intersection of these isolate sets may not be empty. Such a fact means these isolate sets may share the same vertices, which are computed more than one time, unreasonably.
This part proposes a novel graph partition algorithm based on breadth-first search. Our algorithm enjoys the advantage of the avoidance of the repeat computation for the intersections. The algorithm divides the whole vertices of an undirected graph to several isolate sets. The union of these isolate sets covers vertices V of the graph G, and the intersections of these sets are empty, that is, we find s 1 , s 2 , s 3 , · · · , s n ⊂ G(V , E) such that i s i = V and i s i = ∅ . The pseudocodes are displayed in Algorithm 1.

Our algorithm contains 3 steps:
In the first step, the set N is initialized with vertices set V. In other words, the set N includes all vertices in the graph.
In the second step, we search an isolate set s i from the set N. Firstly, a set N ′ is initialized with the set N. Secondly, the two-level breadth-first search is implemented from an arbitrary vertex v j in the set N ′ . After searching, these searched vertices form the set B. That is to say, the set B is the union of adjacent vertices of v j and the neighbor vertices of adjacent vertices of v j .
Thirdly, the vertex v j is moved into the isolate set s i , and the other vertices in the set B are moved out of the set N ′ . These three processes are implemented repeatedly until the set N ′ is an empty set.
When the set N ′ becomes an empty set, the isolate set s i is found out.
In the third step, the vertices in the isolate sets are moved out of the set N. And the same operation as step 2 and step 3 is implemented alternatively until the set N becomes an empty set. When the set N is empty, we get the desired non-intersected isolate sets whose union keeps all vertices.
What surprised is that our algorithm enjoys quite low computation complexity from the experimental observation. Although it is quite hard to prove such a phenomenon rigorously, several explanations may help the understanding: Let m be number of the edges and n be the number of vertices, respectively. In the process of generating an isolate set, all the edges in the graph are traversed by breadth-first search only once. And the vertices of the graph are traversed by breadth-first search once. The computation complexity of this part is equal to the breadth-first search, which is O(n + m) . Then the traversed vertices are moved once, which has the computation complexity of O(n). The computation complexity of generating an isolate set is O(2n + m) . These steps are implemented iteratively, the total computation complexity of our partition algorithm is O(k(2n + m)) , where k is the iteration number of the algorithm. In the experiments, we found that k is always small. Therefore, the isolate Scientific Reports | (2022) 12:8248 | https://doi.org/10.1038/s41598-022-11987-y www.nature.com/scientificreports/ set partition algorithm does not degrade to the breadth-first search algorithm too much in the perspective of computation complexity. How to bound k theoretically will be left as future work.

Isolate-set-based parallel louvain method.
According to the timing of information synchronization, prior works on synchronization latency indicate two types: non-real-time synchronization and real-time synchronization. The non-real-time synchronization method utilizes the former information to implement the calculation and does not synchronize vertices information immediately after vertices calculation.
Many PLMs with a non-real-time method have been proposed, among which different synchronization opportunities are selected. Therefore, the latencies of these methods are different.
In some studies, the vertices information is updated at the end of stage 1 of the Louvain method, and the latency is the entire iteration time.
The speedup of these methods is from 3 to 6 × based on OpenMP 14 and 12× based on GPUs 33 .
To reduce the delay of synchronization, in some studies 25,[34][35][36][37][38] , the graph is partitioned into subgraphs. During an iteration, information is updated when a subgraph has finished computing.
The real-time synchronization method was proposed by Que et al. 15 . They utilize thread-safe hash tables to manage the vertices'information.
Parallel computing vertices in different threads read and update information in the hash table in parallel by Compare and Swap (CAS) operations, which guarantees the updated information is accessed to the other vertices (threads).
This work achieves a speedup of 49.8× in the"Giant Blue" supercomputer. Naim et al. 39 use two hash tables to manage the vertices information and communities information, which is different from the previous studies.
The authors implement their method on GPUs, and their work ultimately reaches the highest speedup of 270× on several medium graphs.
However, the hash tables have massive storage overhead and computation overhead, which is unaffordable for personal computers.
Moreover, the hash-table-based methods cannot eliminate the latency entirely, and it is difficult for hashtable-based methods to deal with large complex graphs on personal computers, which usually have less memory than supercomputers.

Information synchronization and vertices movement.
In the first stage of Louvain method, every vertex is traversed. When traversed, the vertex utilizes information about itself and its neighbors for the calculation to update. Unfortunately, latency always exists in the updating, which limits the power of parallelism. Among the current PLMs, one popular methodology uses the hash tables, in which vertices' information is organized by hash tables. When different vertices utilize and update the information of the same vertex at the same time, the CAS operation (compare-and-swap is an atomic instruction) guarantees the correct operation sequence of writing first and then reading. The computed information can be used to update other vertices. In this way, the vertices are computed parallelly.
Nevertheless, two bottlenecks trouble hash tables synchronizing information. The first one is that hash tables need additional memory and computation overhead. On a graph with hundreds of millions of vertices, methods based on hash tables require huge memory, which may overdrive personal computers' memory. The second one is that the likelihood of collision increases. When several parallel computing vertices modify the information of the same location, a significant waiting time is needed for completing these operations.
What's more, state-of-the-art PLMs deal with the communities swap by minimum label heuristic method. The minimum label heuristic method labels communities and utilize the notation to guide vertices movements, which restricts vertices movements and decreases the modularity of community detection.
To this end, we propose a parallel Louvain method based on isolate sets. Vertices in the same isolate set are decoupled. In our method, the parallel computing vertices are in the same isolate set. That is to say, these vertices have completely different neighbors. The parallel computing vertices in an isolate set can synchronize information in time without memory and computation overhead. In the first stage of the Louvain method, the computation of these vertices utilize information of their neighbor vertices without waiting for other vertices synchronization. After computation, these vertices update their information, which has no influence on other parallel computing vertices. Because of the properties of isolate sets, the vertices and their neighbors fail to fall in the same isolate set. A vertex and its neighbor cannot be moved at the same time. Therefore, the communities swap is avoided.
An example is shown in the Fig. 7. Vertices 3, 6, and 9 belong to isolate set s 1 , which are computed parallelly. The computation of vertex 3 requires the information of vertices 7 and 8. The computation of vertex 6 requires the information of vertices 1, 2, and 5. The computation of vertex 9 requires the information of vertex 4. The parallel computations of vertices 3, 6, and 9 are independent of each other. After computing, these three vertices synchronize information to their neighbor without waiting, because the neighbor vertices are occupied by these three vertices exclusively. And the movement target of vertex 3, 6, and 9 is one of its neighbor vertices, which are not in the parallel computing vertices. The communities swap is avoided.
We implement our method in OpenMP and C++ program language. Due to the shared memory programming of OpenMP, vertices in an isolate set are forked to different threads, and these threads carry out calculations and information synchronization independently. www.nature.com/scientificreports/ Implementation of the algorithm. From the definition, traversing these isolate sets is equal to traversing all the vertices. The existing PLMs traverse the vertices in order of their index. Different from the existing Louvain methods, our method traverses the vertices in order of the isolate sets. Our method computes vertices in the same isolate set parallelly. During the parallel computing of vertices, the information is synchronized in time, and the vertices are moved parallelly. In this way, the method utilizes the property of isolate sets to automatically synchronize information in an isolate set and update community information without relying on other information synchronization methods, such as hash tables. Therefore, all vertices in the graph are computed without latency and communities swap in the first stage of Louvain method. The isolate-set-based parallel Louvain method involves three stages and the pseudocodes are displayed in Algorithm 1: The first step is to partition the graph into a series of isolate sets. These isolate sets can cover all the vertices in the graph, and the intersection of these isolate sets is an empty set.
The second step is to iteratively traverse these isolate sets and calculate the community information of the vertices. The first time of the second stage of the method being executed, the communities' information of all vertices needs to be initialized, and each vertex in the graph is divided into a different community. After the initialization is completed, these isolate sets are traversed in turn. When the isolate set is traversed, the vertices in this isolate set are computed parallelly. After computation, these vertices are moved in parallel to the communities of their largest Q , and synchronize their information. Then this stage is implemented iteratively until the modularity of the entire graph no longer changes.
The third step is to restructure the graph, which merge vertices belonging to the same community into a new vertex, according to the updated vertices information. After the second stage of calculation, the vertices in the graph are moved to the neighbor vertices of the largest increase in modularity. And the communities information of vertices has been changed. The vertices belonging to the same community are merged into a new vertex, and the edges between these vertices are ignored. The edges between different communities are reserved.
These three steps are applied alternatively on the reconstructed graph until the modularity no longer changes or the change of modularity is less than a certain threshold. The threshold t threshold is a small quantity, which improves the robustness of the algorithm and ensures that the algorithm ends properly. What's more, the threshold avoids the extreme cases where all communities are merged into one community. And the final community detection result is obtained.