Hierarchical Decomposition for Betweenness Centrality Measure of Complex Networks

Betweenness centrality is an indicator of a node’s centrality in a network. It is equal to the number of shortest paths from all vertices to all others that pass through that node. Most of real-world large networks display a hierarchical community structure, and their betweenness computation possesses rather high complexity. Here we propose a new hierarchical decomposition approach to speed up the betweenness computation of complex networks. The advantage of this new method is its effective utilization of the local structural information from the hierarchical community. The presented method can significantly speed up the betweenness calculation. This improvement is much more evident in those networks with numerous homogeneous communities. Furthermore, the proposed method features a parallel structure, which is very suitable for parallel computation. Moreover, only a small amount of additional computation is required by our method, when small changes in the network structure are restricted to some local communities. The effectiveness of the proposed method is validated via the examples of two real-world power grids and one artificial network, which demonstrates that the performance of the proposed method is superior to that of the traditional method.

Many real-world systems, such as social networks 1,4 , electric power grids 9 , communication networks 6 , and biological networks 3,5 , have a community structure, which consists of subsets of vertices that are densely connected to each other but sparsely connected to the rest of the network. In such a structure, the strong ties (intracommunity edges) are primarily along the shortest paths between the vertices of the same communities; conversely, the weak ties (intercommunity edges) that connect two communities are covered by many shortest paths between the nodes of different communities. Many methods [22][23][24][25][26][27][28][29][30][31] , inspired by this pattern, have been proposed to detect the community structure of networks 22 . Particularly, Girvan et al. proposed one elegant method 3 for community detection that iteratively deletes the highest betweenness edges and hierarchically decomposes the networks. Conversely, the BC calculation could be simplified by using the local structural information of these communities in a graph.
In this article, a new decomposition method that uses the local structural information from different communities is proposed to speed up the betweenness calculation for complex networks with a community structure. This method is proved to be rigorously valid, and can be applied to any network with community structure, inspective of any community detection methods. This improvement is much more evident for those networks that has numerous communities, uniform community size and strong hierarchy structure. Our analysis shows that the runtime complexity for such weighted and unweighted networks with known community structure can be even reduced from + O nm n l og n ( ) 2 and O(nm) to + and O(nm) to O(n 2 ), respectively, where c is the number of communities of the networks. Furthermore, the proposed method features a parallel structure, hence, its computation speed can be enhanced via parallel calculation. Moreover, when small changes in the network structure are restricted to some local communities, e.g., an line outage in power grids, only some additional computations are required for our method while a complete recalculation is needed by other methods. Therefore the proposed method is suitable for real-world, online BC-related application. The proposed method is compared with the traditional methods, and the results validate its superiority.

Hierarchical decomposition modelling.
Here we illustrate the model of hierarchical community structure and the updated community with a simple network composed of three communities, as shown in Fig. 1. Community of network consists of subset of vertices that are densely connected to each other but sparsely connected to the rest of the network such as C 1 , C 2 and C 3 in Fig. 1. The intercommunity vertices (v 7 , v 10 , v 16 , v 20 , v 21 ) are the terminal vertices of the intercommunity edges (e 7,10 , e 7,21 , e 16,21 , e 16,20 ). Those vertices (v 13 , v 18 , v 19 ) and the edges (e 10,13 , e 13,16 , e 20,21 ) that lie on the shortest paths between the intercommunity vertices of the same The community structure including three communities (C 1 , C 2 , C 3 ). (c) The hierarchy structure (HSN). (d) The original communities (C 1 , C 2 ) and the updated community ( ′ C 3 ). For the community C 3 , the condition that the shortest paths (e 21,18 + e 18,19 + e 19,20 ) between the two intercommunity vertices (v 20 , v 21 ) in C 3 are equal or greater than the shortest paths (e 21,16 + e 16,20 ) in the HSN, is satisfied, thus the community C 3 is updated to ′ C 3 by copying those vertices and edges (v 20, v 21, e 21,16 and e 16,20 ) through which the shortest paths between the two intercommunity vertices (v 20 , v 21 ) pass in the HSN. community (ISP) are called the ISP-vertices and the ISP-edges, respectively. Community structure consists of independent communities, as shown in Fig. 1(b). The hierarchy structure is the top level of a network and is also called the hierarchical subnet (HSN), which consists of intercommunity edges, intercommunity vertices, ISPvertices and ISP-edges, as shown in Fig. 1(c).
The proposed decomposition approach for BC calculation is to turn the BC computation in global network into the computation in hierarchical subnet (HSN) and every independent community (more detailed in Algorithm). However, the BC computation within some original communities will generate calculation errors if the communities satisfy the condition that the shortest path lengths between the intercommunity vertex pairs of the community are equal or greater than that of the HSN (see Supplementary Lemma S1). To solve this problem, the original communities satisfying the condition must be updated.
Those communities satisfying the aforementioned condition are updated by copying those vertices and edges through which the shortest paths between the intercommunity vertex pairs pass in the HSN into the community, such as the updated community ′ C 3 from C 3 shown in Fig. 1(d). For the communities that do not satisfy the condition, the update is not needed, e.g., the community C 1 and C 2 shown in Fig. 1(d).
Algorithm: For a network with a hierarchical community structure, the HSN plays a vital role because it bridges all communities, especially the intercommunity vertices and the intercommunity edge of the HSN. This indicates that the betweenness calculation for a network with a hierarchical community structure can be decomposed into two main stages by the integrated use of local and global information. The first stage involves searching for the shortest paths and calculating BC within local communities and the HSN independently. In the second stage, BC is calculated for every pair of vertices from different communities via the HSN. In the proposed method, we use breadth-first search to search for the shortest paths of unweighted networks and Dijkstra's algorithm for weighted networks, and Brandes' method is used to calculate BC in every community and the HSN. More specifically, the proposed method for computing betweenness is stated as follows (also see the pseudo-code of the method in Table 1): Step 1. Mark the intercommunity edges and the intercommunity vertices based on the community structure. Note that the structures of some networks are unknown in advance. In this case, the community structure of the network is detected by using a well-known fast method [22][23][24][25][26][27][28][29] .
Step 2. Isolate each community from the other communities. Then, search for and store the shortest paths for all intercommunity vertex pairs in each community. Mark ISP-vertices and ISP-edges.
Step 3. Distill the hierarchy structure to construct HSN by using the vertices and edges marked in Steps 1 and 2. Search and store the shortest paths for all intercommunity vertex pairs in HSN, calculate the BC with Brandes' method and store the number of the shortest paths between each intercommunity vertex pair for those vertices and edges through which the shortest paths pass.
Step 4. For any two intercommunity vertices in each community, if the shortest path length between them in the community (in Step 2) is equal or greater than that in the HSN (in Step 3), then update the community by copying those vertices and edges through which the shortest paths between them pass in the HSN into the community, as illustrated in Fig. 1(d).
Step 5. Search and store the shortest paths for all vertex pairs in each community except the intercommunity vertex pairs, calculate the BC with Brandes' method and store the number of the shortest paths between each intercommunity vertex pair for those vertices and edges through which the shortest paths pass.
Step 6. Based on the data saved in Steps 3 and 5, calculate the shortest paths for any vertex pairs of different communities except the intercommunity vertex pairs via the HSN and update the BC for those vertices and edges through which the shortest paths pass. The Step 6 of the proposed algorithm can be further decomposed ( Supplementary Fig. S1), and the BCs of vertices and edges lying in the shortest paths between the vertices of different communities, can be updated according to the Supplementary equations S6, S7 and S8.
The computational efficiency of the proposed decomposition method is superior to the computational efficiency of other methods 4,10,14,16 that are based on the global structural information, because only the local structural information from the communities and the HSN of network are required. Furthermore, from the mathematical analysis and proof (see Supplementary Method Section S1.2, S1.3), it is shown that the proposed method described by Steps 1-6 is rigorous and valid, and can be applied to any networks with hierarchical community structure, inspective of any community detection methods.

Computational complexity. The preprocessing runtime in Step 2 is w i m i and
for unweighted and weighted networks, respectively, where m i and w i are the number of edges and intercommunity vertices of the community C i , respectively. The runtime of Step 3 for unweighted and weighted networks is wq and (wq + w 2 log w), respectively, where w and q are the numbers of intercommunity vertices and edges in the HSN, respectively. For a community C i , the runtime of the BC computation inside for the unweighted network and the weighted network, respectively, where n i is the number of vertices of the community C i (in Step 5).
The runtime of the BC computation from a non-intercommunity vertex of any a community C i to all vertices of other communities is (n − n i ) for both the unweighted and the weighted networks (in Step 6), where n is the number of vertices in the network. Then, for all vertices of C i , the runtime is (n i − w i )(n − n i ). Furthermore, the BC computation runtime for all communities of an unweighted network can be calculated by , and c is the number of communities of the network. Similarly, for a weighted network, the BC computation runtime for all communities is given by Then, we analyze two particular networks: the one without hierarchical community structure, and the other one with hierarchical community structure of many communities which possess the same number of vertices, edges, intercommunity vertices, ISP-vertices and ISP-edges. For the former one, the whole network is a community, and thus the number of community is equal to 1 and w = 0, T uw and T w reach the maximum value at (T uw ) max = mn and (T w ) max = mn + n 2 log n, respectively. For the latter one, = = w n i n c , T uw and T w can be simplified as follows: if The community structure of the network is unknown then Detect the community structure; end if C 0 ← The number of communities; Mark the intercommunity edges and vertices; for i = 1 to C 0 do search and store the shortest paths for all intercommunity vertex pairs in C i ;

end for
Mark all ISP-vertices and ISP-edges; by using Brands' method; Construct HSN and search and store the shortest paths in HSN; Update the community C i ; end for Where k is average degree of the network.
, thus, T uw and T w can be approximated as follows: The above analysis shows that the computation complexities of the proposed BC method are ∈ T n mn ( , ) for the unweighted and weighted networks with known community structure, respectively. More specifically, the computation complexity is related to the hierarchical community characteristics of network: more numerous communities, more uniform community size and stronger hierarchical structure (smaller HSN) result in less computational time. For the networks with unknown community structure, community detection time must be considered in our method.
Extension to a dynamic environment and the parallel computation. In the dynamic environment, there could be frequent small changes of network topological structure which are restricted to some local communities. For instance, a line outage is likely to be trigged due to a random failure in power grids. In such a dynamic environment, the BC of networks needs frequent online update, for example, reevaluating the capacities of power transmission lines or traffic capacities in transportations.
In this aspect, a complete recalculation is required by other methods and this is time-consuming. In contrast, much less time is needed for our method in such a dynamic environment. For example, as shown in Fig. 2(a), only the BC within C 3 and between C 3 and C i (i = 1, 2, … , 5) are recalculated, and the total runtime complexity is . Hence, from equation (5), the runtime of the dynamic betweenness calculation is approximately decreased to n 3 /n times the runtime of the static betweenness calculation.
Our method also has a natural advantage in parallel calculation. The computational resources, including the memories and procedures required by our method, can be conveniently divided in a parallel manner, because the information used to search for the shortest paths comes from the independent communities and HSN. Furthermore, the data sets stored in Steps 3 and 5, which are used to calculate the betweenness, are also independent. A simple parallel implementation of our method is the allocation of computing resources according to the communities and the HSN. For instance, as shown in Fig. 2(b), a server and five CUPs are assigned to calculate the BC for a given network with five communities, and their computational tasks are assigned as follow: 1. The server detects and isolates the communities of network, marks the intercommunity edges and vertices, assigns computational tasks according the size and number of communities and send the community information to other CPUs; 2. Each CPU searches the shortest paths for all intercommunity vertex pairs in its community, marks and shares ISP-vertices and ISP-edges with the server. 3. The server constructs HSN, searches the shortest paths for HSN and shares the information of shortest paths with each CPU; each CPU calculates the BC and shares the shortest paths of its community with the server. 4. CPU i computes the BC and shortest paths within community C i and between C i and (C i+1 and C i+2 ) (if i > 5, i = i− 5) and shares those computational information with the server, the server updates BC for all vertices and edges.
For the above analysis, one can see that the main task of CPU i is to compute the BC of vertices and edges in its community C i and between C i and (C i+1 and C i+2 ), while the server coordinates tasks primarily, calculates BC of the HSN and updates BC for all vertices and edges. The computational tasks should be uniformly assigned in practical application for avoiding cask effect.

Results
Hierarchical community structure of test networks. We tested the proposed method on one artificial network with a known community structure and two representative real-world networks with unknown community structures, i.e., the Henan provincial power grid and the Gansu provincial power grid in China 32,33 . The artificial network consists of 9 random subnets (communities) in which each vertex has 16 edges on average. Each subnet has the same number of vertices and 3 interconnected intercommunity vertices. The Henan power grid consists of 310 vertices (nodes) and 932 edges, and the Gansu power grid has 1569 vertices and 4326 edges. The vertices represent transformer substations or power plants, and the edges denote their interconnections 34,35 .
The community structure of the artificial network is known in advance, but the community structure of each power grid needs to be detected before applying our method. Three detection methods are presented to detect the community structure of power grids. The first one is the voltage information based detection method (VIDM). The community structure information of a power grid can be detected via the voltage grade of the vertices and edges, because a power grid is designed and operated according to the voltage grade. Therefore, the information-theory based method 31 is used to divide each power grid into different communities by deleting the edges with high-level voltage (500 kV for the Henan power grid, 750 kV and 330 kV for the Gansu power grid). By using this method, the Henan and the Gansu power grids are divided into 9 and 5 communities, respectively. The second one is the geographical information based detection method (GIDM), which is used to divide the Henan and the Gansu power grids into 15 and 13 communities, respectively. The third one is the detection method 24 proposed by Radicchi et al. (RCDM). It is an effective and efficient approach to determine the community structures of networks that yields the correct number of communities without prior knowledge. By using the RCDM, the Henan and the Gansu power grids are divided into 9 and 6 communities, respectively, as shown in Table 2.
Computation accuracy. We test the effectiveness of our method on one artificial network with a known community structure and two real-world power grids with unknown community structures. The performance of our method is compared with the performance of the well-known Brandes' method. As shown in Fig. 3 (and see Supplementary Fig. S3), the BCs of each vertex and edge obtained by Brandes' method and our method perfectly match each other. Taking the results of Brandes' method as a reference, we also calculate the relative errors of our method. The accumulated relative betweenness errors of vertices and edges are both close to zero. The results indicate that the proposed method is rigorous and valid, and our method is accurate regardless of network partition methods (see also Method and Supplementary Method Section S1.2 and S1.3 for a detailed analysis and proof).
In Figure 4, for the artificial network, our method can speed BC calculation up to 6.17 times, as compared with Brandes' method. For the Gansu power grid which is partitioned into 5, 6 and 13 communities by VIDM, GIDM, and RCDM, respectively, as shown in Table 2, and the speedup factors of the hybrid methods, defined as the ratio of runtime of Brandes' method to that of the hybrid methods, are 1.57, 1.97 and 2.64, respectively. Even if the number of communities is the same, for instance, the Henan power grid is divided into 9 communities by VIDM and RCDM, respectively, as shown in Table 2, the asymmetrical community sizes bring about diverse computation speeds (the speedup factors are 2.32 and 2.51, respectively) (see also equation (1) and equation (2)).
The effect of the number of partitioned communities on the computational efficiency is compared, as shown in Fig. 5. This figure shows that the increasing numbers of communities lead to larger computational speedup factors (see also equation (1) and equation (2)), since the speedup factor curves of the artificial network, the Henan and Gansu power grids rise.
The strong community structure 36 , as illustrated in Fig. 6(a), is another factor that affects the computational efficiency of the proposed method. The strong (or weak) community structure means sparser (or denser) intercommunity edges between communities, namely, smaller (or greater) interconnection probability between communities. As shown in Fig. 6(b), one can see that the speedup factor gradually drops to 1, when the network structure is changed from strong community structure to no community structure (increasing interconnection probability). This is because a weaker community structure brings about a greater size of the hierarchical subnet (HSN) and results in a greater computational complexity (see also equation (1) and equation (2)). Figure 7, the artificial network consists of 9 random subnets (communities). Each subnet has 32 vertices randomly interconnected including three interconnected intercommunity vertices. Each vertex possesses 6 edges on average. Then we increase the number of vertices from 288 to 3456 with a step of 576 while keeping the number of communities  fixed. Note that some edges are added to ensure the average degree is fixed during the modification. As shown in Fig. 7(a), the runtime of both our and Brandes' method grow with increasing number of vertices in power function as theoretical predictions (i.e., O(n 2 ) and O(nm), respectively), furthermore, the runtime of our method is significantly lower than that of Brandes' method, which indicates the faster computational speed of our method. The runtime of our method in the dynamic environment is also shown in Fig. 7(b). The improvement rate of runtime is defined as a ratio of the runtime difference between Brands' and our methods to the runtime of Brands' method. Figure 7(b) shows that the computation time in the dynamic environment is significantly lower than the runtime required for Brandes' method. This is because that the previous BC information can be used to update the BC, when the inner structures of the communities are changed. Figure 7 also shows that the fewer the number of communities is changed, the faster the calculation speed is. This means that if more communities are changed,  more calculation is required to update the BC. Figure 7(b) shows the improvement rates of the runtime, which indicates that the improvement rates in the dynamic environment is higher than that of Brandes' method. The fewer the number of communities is changed, the higher the improvement rate is. Moreover, Fig. 7 shows that the improvement rate is decreased with the increase of the network size, which occurred because more computation time is required to search for the shortest paths of each vertex-pair in each community, when the number of communities is kept the same and the size of each community becomes increasingly larger. Notably, the improvement rate decreases slowly in a dynamic environment. This good performance is gained since that only additional computation is needed with the change of the community structure.

Runtime improvement rates on network size and dynamic environment. In
Investigation of our methods on networks of different density. As a first, we give a theoretical analysis on the effects of the density of both unweighted and weighted networks on the speedup factor of our method. From Brands' method and the equations (3)(4), the theoretical speedup factors, for unweighted and weighted networks with homogenous community structure, can be derived as follows: Secondly, we validate the above analysis on a larger set of artificial networks by varying average degrees for unweighted and weighted networks. As shown in Fig. 8, the speedup factors grow linearly with average degree as theoretical predictions of equation (9) and (10), when the networks are very sparse (in the initial stages). For both unweighted and weighted networks with the same number of communities, the speedup factors become increasingly hyperbolic with the growth of average degrees, which agree well with the theoretical results of equation (7) and (8). From the comparison of Fig. 8(a) and Fig. 8(b) (or equation (7) and (8)), one can see that the weighted networks have better speedup performance than unweighted networks.
Parallel implementation of our methods. Here, a representative example of the parallel implementation of our method is provided. The performance of parallel computation is tested on five artificial networks by six computers as shown in Fig. 9. The tasks of the six computers (CPUs or server) are divided as shown in Table 3,     Table 3. The task partition of parallel computation. For a computer (CPU), its internal task is to calculate the BC in and between its five communities assigned by the server, while its external task is to compute the BC between its communities and the communities of other CPUs (C i ↔ C j ). The main task of the server is to calculate the BC in the HSN, assign the tasks to other computers and update the BC.

Discussion
In this article, we propose a new decomposition method to enhance the efficiency of the betweenness calculation for networks with community structures. This method (including steps 1-6 in Methods) is rigorous and valid, and can be applied to any networks with community structure, inspective of any community detection methods (more detailed mathematical analysis and proof in Supplementary Method Section S1.2 and S1.3). The computational efficiency of our method is related to the hierarchical community characteristics of networks. Namely, more uniform community size, more numerous communities and stronger hierarchical structure (smaller HSN) in networks will result in less computational complexity of our method (see equation (1), equation (2) and . For such networks, if  c k 2 , the runtime of our method for the weighted networks can be even reduced from , and the runtime for the unweighted networks can be reduced from O(nm) to O(n 2 ), where n, m and c are the numbers of vertices, edges and communities, respectively (detailed in the computational complexity of Method Section). Our method can also speed up the betweenness calculation for other atypical network with community structure. Thus, our method shows better performance than traditional methods. Moreover, the time complexity in a dynamic environment can also be effectively reduced by using our method.
For networks with an unknown community structure, it is necessary to detect the community structure of the networks, and the runtime of our method must include the runtime of detecting the community structure. Indeed, the community partition quality of detection methods influences directly the computational performance of our method (see complexity analysis in Method Section), and unsuitable detection methods may worse the computational speed of our method. However, if the community structure is known in advance, our method is much faster than other methods. In all case studies, our method plus other detection methods (including GIDM, VIDM, RCDM) is still much faster than traditional methods. For a larger scale network with unknown community structure, our hybrid method may be inapplicable because of excessive division cost from the detection methods such as RCDM. Furthermore, our method is naturally suitable for parallel calculation because the shortest path and betweenness within each community can be independently calculated. It is promising that our method can also help to enhance the calculation speed of the variants of betweenness and centrality, such as stress centrality 1,14 .