Null Model and Community Structure in Multiplex Networks

The multiple relationships among objects in complex systems can be described well by multiplex networks, which contain rich information of the connections between objects. The null model of networks, which can be used to quantify the specific nature of a network, is a powerful tool for analysing the structural characteristics of complex systems. However, the null model for multiplex networks remains largely unexplored. In this paper, we propose a null model for multiplex networks based on the node redundancy degree, which is a natural measure for describing the multiple relationships in multiplex networks. Based on this model, we define the modularity of multiplex networks to study the community structures in multiplex networks and demonstrate our theory in practice through community detection in four real-world networks. The results show that our model can reveal the community structures in multiplex networks and indicate that our null model is a useful approach for providing new insights into the specific nature of multiplex networks, which are difficult to quantify.


Results
The Basic Model and Redundancy. In this paper, we represent a network by its adjacency matrix because the matrix contains all the connection relationships in the network. A multiplex network consists of a set of networks. Therefore, we use the set of adjacency matrices of the individual networks to preserve the complete connection information of the multiplex network. That is, MN = {A_1, A_2, …, A_k, …, A_M}, k ≤ M, where M denotes the number of networks in the multiplex network, A_k = (a_ij)_{N×N} is the adjacency matrix of single network k, and N is the number of nodes in the network.
Because more than one network exists in a multiplex network, an edge between node i and node j can exist in duplicate. This redundancy represents the degree of repetition of a graph structure; it therefore captures the phenomenon that a set of nodes constituting a community in one network tends to also constitute a community in other networks. Such redundancy is a basic attribute of multiplex networks. Here, we first define the ERC (Fig. 1a-c) as follows:
Definition 1. Edge Redundancy (ERC): The ERC refers to the number of duplicates of an edge in a multiplex network. We use m_ij to represent the measure:
m_{ij} = \sum_{k=1}^{M} \mathbb{1}(e_{ij} \in E_k) - 1 \qquad (1)
where E_k is the set of edges in layer k, e_ij is the edge between node i and node j, and \mathbb{1}(\cdot) equals 1 when its argument holds and 0 otherwise. Formula (1) gives the number of layers that contain an edge between node i and node j, minus one. To a certain extent, the ERC captures the phenomenon that an edge that exists in one network tends to appear in other networks. Intuitively, edges with high ERC values should lie within a community rather than between communities. Naturally, we can divide the edges into M groups according to the ERC. We also define the NRD as follows:
Definition 2. Node Redundancy Degree (NRD): The NRD of node i counts, for each ERC value, the nodes j that are connected to node i by an edge with that ERC value. We use r_i^m to represent the m-order NRD of node i, which denotes the number of connected nodes j for which the ERC of edge e_ij is equal to m:
r_i^m = \sum_{j} \delta(m_{ij}, m), \quad m = 0, 1, \ldots, M - 1 \qquad (2)
where M denotes the number of single networks in the multiplex network, and δ(a, b) = 1 if a = b and 0 otherwise. The NRD r_i^m is the degree of node i restricted to the edges whose ERC equals m (Fig. 1a,b,c). When the multiplex network degenerates to a single network, the NRD r_i^m becomes the degree k_i of node i. Therefore, the NRD is a new parameter that measures the degrees of nodes in multiplex networks.
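The two definitions above can be illustrated with a short Python sketch. It assumes each layer is given as a list of undirected edges; the function names and the toy data are illustrative and are not taken from the paper. The toy network mirrors the Fig. 1 example in which edge (1, 6) appears in three layers.

```python
from collections import defaultdict

def edge_redundancy(layers):
    """ERC m_ij: number of layers containing edge {i, j}, minus one."""
    counts = defaultdict(int)
    for edges in layers:
        # Normalise each undirected edge and ignore duplicates within a layer.
        for i, j in {tuple(sorted(e)) for e in edges}:
            counts[(i, j)] += 1
    return {edge: c - 1 for edge, c in counts.items()}

def node_redundancy_degree(layers):
    """NRD r_i^m: number of neighbours j of i whose edge (i, j) has ERC m."""
    num_layers = len(layers)
    nrd = defaultdict(lambda: [0] * num_layers)
    for (i, j), m in edge_redundancy(layers).items():
        nrd[i][m] += 1
        nrd[j][m] += 1
    return dict(nrd)

# Toy three-layer network (hypothetical data): edge (1, 6) is in every layer.
layers = [
    [(1, 6), (1, 2)],
    [(1, 6), (2, 3)],
    [(6, 1), (1, 2)],
]
erc = edge_redundancy(layers)
nrd = node_redundancy_degree(layers)
print(erc[(1, 6)])  # 2
print(nrd[1])       # [0, 1, 1]
```

Here node 1 has one neighbour reached by a 1-ERC edge and one reached by a 2-ERC edge, so its 2-order NRD is 1. The sketch assumes there are no self-loops.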
Definition 3. Redundancy Matrix (RM): The RM is the sum of the adjacency matrices of all layers:
RM = (rm_{ij})_{N \times N} = \sum_{k=1}^{M} A_k
where A_k = (a_ij)_{N×N} represents the adjacency matrix of each single network k. The element rm_ij in the matrix refers to the number of occurrences of the edge between node i and node j. That is, rm_ij = m_ij + 1. Using the RM, we can simplify the calculation of the ERC and the NRD as follows:
m_{ij} = rm_{ij} - 1, \qquad r_i^m = \sum_{j} \delta(rm_{ij}, m + 1)
Null Model with Redundancy for Multiplex Networks. One of the null models of a single network, proposed by Newman, uses the node degree k_i to determine the structures of random networks; later, Mahadevan proposed their higher-order representations (see Supplementary Note 1). Because the NRD extends the concept of node degree to multiplex networks, we use the NRD to define null models and their higher-order representations in multiplex networks. The null model with redundancy for multiplex networks is based on the configuration model of a single network 58,59 and dK-graphs 27 . The model can also be explained by Laplacian dynamics 60 and random walks 35 . In this model, the edge probabilities of the configuration method and the random-walk method are unified. Based on the null model of a single network, we introduce our null model with redundancy for multiplex networks (NMR):
Definition 4. Null Model with Redundancy for Multiplex Networks (NMR): The NMR is a network model that matches the original multiplex network in NRD but is otherwise taken to be a random network instance.
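As a minimal sketch of how the redundancy matrix simplifies these calculations, the following Python code sums the layer adjacency matrices and reads the NRD off the result, using the relation rm_ij = m_ij + 1 stated above. The helper `matches_in_nrd` is a hypothetical name; it only checks the constraint in Definition 4 and does not generate a null model. Adjacency matrices are assumed to be symmetric lists of lists with a zero diagonal.

```python
def redundancy_matrix(adjacency_layers):
    """Element-wise sum of the layer adjacency matrices (lists of lists)."""
    n = len(adjacency_layers[0])
    return [[sum(a[i][j] for a in adjacency_layers) for j in range(n)]
            for i in range(n)]

def nrd_from_rm(rm, num_layers):
    """r_i^m = number of j with rm_ij == m + 1, for m = 0 .. M - 1."""
    return [[sum(1 for x in row if x == m + 1) for m in range(num_layers)]
            for row in rm]

def matches_in_nrd(layers_a, layers_b):
    """True if two multiplex networks share every node's NRD vector."""
    m = len(layers_a)
    return (len(layers_b) == m and
            nrd_from_rm(redundancy_matrix(layers_a), m) ==
            nrd_from_rm(redundancy_matrix(layers_b), m))

# Two hypothetical layers on three nodes; edge (0, 1) appears in both.
a1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a2 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
rm = redundancy_matrix([a1, a2])
node_nrd = nrd_from_rm(rm, 2)
print(rm)        # [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
print(node_nrd)  # [[0, 1], [1, 1], [1, 0]]
```

Swapping the order of the layers leaves the RM unchanged, so `matches_in_nrd([a1, a2], [a2, a1])` holds; the NRD constrains the multiplex network as a whole rather than any single layer.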

Definition 5. K-Order NMR: This network model matches the original multiplex network in size and k-order NRD distribution P(r) but is otherwise taken to be an instance of a random network.
The 1-order NMR is shown in Fig. 2. A 1-order NMR is a random model for the whole multiplex network rather than for each layer. Therefore, the aggregated information can be encoded into the multiplex structure. However, the NMR is a randomized version not only of the aggregate of the original network but also of each layer of the network, under the constraint of the NRD. In Fig. 2, each network layer is connected differently in the NMR and in the original network. The connections in each layer are also randomized, but they are not completely random. The NRD is a measure that applies to the whole multiplex network. It describes the relationships among the layers and ensures that they are not completely independent in the multiplex network. Therefore, randomization under the constraint of the NRD can randomize both the aggregated information and the information in each layer while preserving the basic relationships among the layers in a multiplex network.
Figure 1 (caption fragment). (b) The synthetic network of the multiplex networks in (a). We combine these three networks into one network by adding an edge between two nodes if there is any edge between them in one of the three networks. (c) The ERC of the multiplex networks in (a). Edge (1, 6) appears three times in the multiplex networks, that is, it is repeated twice. Therefore, the ERC of edge (1, 6) is 2. (d) The NRD of the multiplex networks in (a). Node 1 has a 2-order NRD that is equal to 1 because there is one edge (1, 6) whose ERC is equal to 2 that is connected with node 1.
Scientific Reports | (2018) 8:3245 | DOI:10.1038/s41598-018-21286-0
Note that in multiplex networks, we use the NRD distribution instead of the degree distribution, and "of the same size" means that the model has the same number of nodes N and number of networks M as the original multiplex network. Here, we provide the details of the K-Order NMR; a summary is shown in Table 1.
0-Order: A random network with the same number of nodes N, number of networks M, and average NRD as in the original multiplex network.
1-Order: A random network with the same number of nodes N, number of networks M, and NRD distribution P_1(r) as in the original multiplex network.
2-Order: A random network with the same number of nodes N, number of networks M, and 2-order NRD distribution P_2(r_1, r_2) as in the original multiplex network.
n-Order: A random network with the same number of nodes N, number of networks M, and n-order NRD distribution P_n(r_1, r_2, …, r_n) as in the original multiplex network.
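The statistics that the 0-order and 1-order models must preserve can be sketched as follows. The text does not spell out how the average NRD and P_1(r) are computed, so this sketch makes two assumptions: the average is taken per ERC level m, and P_1(r) is the fraction of nodes that share a given NRD vector. The function names and the input data are illustrative.

```python
from collections import Counter

def average_nrd(nrd):
    """0-order statistic (assumed form): mean of r_i^m over nodes, per level m."""
    n = len(nrd)
    num_levels = len(next(iter(nrd.values())))
    return [sum(vec[m] for vec in nrd.values()) / n for m in range(num_levels)]

def nrd_distribution(nrd):
    """1-order statistic (assumed form): fraction of nodes per NRD vector r."""
    n = len(nrd)
    counts = Counter(tuple(vec) for vec in nrd.values())
    return {r: c / n for r, c in counts.items()}

# Hypothetical NRD vectors for four nodes of a three-layer network.
nrd = {1: [0, 1, 1], 2: [1, 1, 0], 3: [1, 0, 0], 6: [0, 0, 1]}
avg = average_nrd(nrd)
dist = nrd_distribution(nrd)
print(avg)              # [0.5, 0.5, 0.5]
print(dist[(0, 1, 1)])  # 0.25
```

Higher orders would constrain joint distributions over connected nodes, by analogy with the dK-series for single networks; they are not covered by this sketch.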

Modularity of Multiplex Networks.
In this study, with the NMR, we propose the modularity of a multiplex network. Based on the modularity of a single network (see Supplementary Note 5), the modularity of a multiplex network refers to the actual number of edges within communities minus the expected number of such edges in the 1-order NMR.
In a multiplex network, the actual number of edges between node i and node j is rm_ij in the RM, and the expected number of such edges in the 1-order NMR is [equation missing in source], where μ is the total number of edges, p(i, j) is the probability of there being an edge between node i and node j in the NMR (see Methods), M is the number of layers in the original multiplex network and μ_m is the number of m-ERC edges, meaning that there are μ_m edges whose ERC equals m. According to this definition, we can obtain the modularity function of a multiplex network: [equation missing in source]

where g_i refers to the community to which node i belongs, δ(g_i, g_j) = 1 if g_i = g_j, and δ(g_i, g_j) = 0 otherwise. When the number of networks M is 1, the multiplex network degenerates into a single network and the modularity function of the multiplex network automatically becomes the single-network modularity proposed by Newman. Thus, we can consider the modularity function of a multiplex network as an extension of single-network modularity to multiple networks. Compared with the modularity of a multi-slice network, this function focuses on the impact of the NRD instead of on virtual connections, which do not exist in reality. Thus, our framework is more in line with the actual structures of multiplex networks and is a more acceptable measure for analysing multiplex networks.
Table 1. Summary of the k-order NMR (columns: tag, property symbol, k-order distribution). Only the last row is recoverable: n-order, P_n, P(r_1, r_2, …, r_n).
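A minimal Python sketch of a modularity of this kind is given below. The expected-edge term is an assumption, not the paper's formula: it applies the configuration-model form separately to each ERC level, so that the expected number of edges between i and j is the sum over m of (m + 1) r_i^m r_j^m / (2 μ_m), and it normalises by 2μ with μ counted over all layers. This choice reduces to Newman's single-network modularity when M = 1, which is the property stated above.

```python
def multiplex_modularity(rm, communities, num_layers):
    """Modularity from a redundancy matrix under an assumed null-model term."""
    n = len(rm)
    # r[i][m]: number of neighbours of i joined by an edge present in m + 1 layers.
    r = [[sum(1 for x in rm[i] if x == m + 1) for m in range(num_layers)]
         for i in range(n)]
    # mu_m[m]: number of edges with ERC m (each edge is seen from both ends).
    mu_m = [sum(r[i][m] for i in range(n)) / 2 for m in range(num_layers)]
    # mu: total number of edges, counted once per layer in which they appear.
    mu = sum((m + 1) * mu_m[m] for m in range(num_layers))
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] != communities[j]:
                continue
            # Assumed expectation: configuration model applied per ERC level.
            expected = sum((m + 1) * r[i][m] * r[j][m] / (2 * mu_m[m])
                           for m in range(num_layers) if mu_m[m] > 0)
            q += rm[i][j] - expected
    return q / (2 * mu)

# Single-layer check (M = 1): two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
rm = [[0] * 6 for _ in range(6)]
for i, j in edges:
    rm[i][j] = rm[j][i] = 1
q_single = multiplex_modularity(rm, [0, 0, 0, 1, 1, 1], num_layers=1)
print(round(q_single, 4))  # 0.3571
```

For this graph Newman's modularity of the two-triangle partition is 5/14, which the sketch reproduces. The network must contain at least one edge, otherwise the final division is undefined.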

Community Detection in Multiplex Networks.
We first give a definition of community in a multiplex network: Definition 6. Community in a Multiplex Network: In a multiplex network, a community consists of a group of nodes that are tightly connected. Here, the tight connection means that many more edges exist within the community than among the communities. Note that each layer of a multiplex network contains the same nodes but the edges are different; the number of edges between two nodes should be calculated from all layers of the network.
We ran several community-detection algorithms on the Twitter event networks, the Noordin terrorist relationship networks, the students' cooperation social networks and the global terrorism networks. These algorithms are BGLL for multiplex networks (BGLLMN) 61,62 , bridge detection (BD) 63 , tensor decomposition for multiplex networks (TD) 64 , Modularity-driven Ensemble-Based Community Detection (M-EMCD) 65 , Multidimensional Label Propagation Algorithm (MDLPA) 66 , Multilayer Local Community Detection (ML-LCD) 67 and our modularity function for multiplex networks (see Supplementary Note 2). Figure 3 shows the results of this quantitative comparison (see Supplementary Note 3) on three of the tested networks and indicates that the modularity function for multiplex networks yields higher-quality communities than the other tested methods. In addition, the results in Fig. 3 show that communities in real networks always have much higher redundancy, which verifies the importance of checking the redundancy in multiplex networks.
Twitter Event Networks. We analysed the relationships among events detected from Twitter. The tweet stream is captured through the Tweet API 68 . Tweets are clustered using similar keywords to detect the Twitter events. Each node in the network represents a Twitter event. We build three networks to construct the multiplex relationship among Twitter events (see Supplementary Note 4). The results of the four community-detection algorithms are visualized in Fig. 4. To facilitate the visualization, we combined the three networks into one network. Nodes of the same colour represent a community, meaning that these nodes correspond to the same event. In Fig. 4, the BD results are not obviously better than those of the six other algorithms, but the other six algorithms cannot be ranked by visual inspection. Therefore, we present the community quality measures in Table 2. As listed in Table 2, the three measures for our method are much higher than those of the other methods, especially redundancy, which is 0.16 for our method and 0.2 for the ground truth. The high redundancy and node similarity lead to the high accuracy (72%) of our method, which is considerably higher than the accuracies achieved by BD and TD. BGLLMN also attains high accuracy (72%) because it is based on the novel modularity of a single network. When we combine these three networks into one network, some connection information is lost, but these losses are determined by the network structures. When the losses are relatively low, BGLLMN can exhibit good performance; however, the community redundancy of BGLLMN (0.07) is still much lower than that of our method (0.16). The three newer algorithms (M-EMCD, MDLPA and ML-LCD) also achieve relatively high accuracy but low redundancy, which indicates that our modularity for multiplex networks captures the redundancy of the network.
Noordin Terrorist Relationship Networks. Using the Noordin terrorist network data 69,70 , we constructed the multiplex terrorist relationship networks based on six relationships between terrorists. Each node in the network represents a terrorist (see Supplementary Note 4). The results of the four community-detection algorithms are visualized in Fig. 5. To facilitate visualization, we combined the six networks into one network. Nodes of the same colour represent a community, meaning that these nodes likely belong to the same terrorist organization. Figure 5 leads to almost the same conclusion as the previous test: the results of BD are not better than those of the three other methods. The number of communities found by BGLLMN is smaller than that found by TD and by our method. This may explain its high node similarity (shown in Table 3): because most of the nodes are placed in the same community, pairs of nodes have more common neighbours (see Supplementary Note 3). However, according to Table 3, our method still has the highest community redundancy (0.30) and accuracy (27%), which again shows that the communities in real multiplex networks always have high redundancy. The three newer algorithms again perform well on node similarity and accuracy but have low redundancy. In addition, the accuracy of all the algorithms is low because the data are noisy and the ground truth may not agree with the network structure. Therefore, we can judge the algorithms only by comparing them with one another. Based on the results, our method performs better than the others (see Table 3).
Students' Cooperation Social Networks. The Students' Cooperation Social Networks dataset is constructed based on a Computer and Network Security course given at Ben-Gurion University of the Negev 71 in which students are required to submit a paper to specific web sites. We built the students' cooperation social networks based on the course website log. Each node in the network represents a student (see Supplementary Note 4). The results of the four community-detection algorithms are visualized in Fig. 6. To facilitate the visualization, we combined the three networks into one network. Nodes of the same colour represent a community, meaning that these nodes likely belong to the same group. In the Students' Cooperation Social Networks, the first network represents the partner relationships between pairs of students (see Supplementary Note 4). We use these disconnected communities as the ground truth and the other two networks as noise data. Intuitively, BGLLMN and our method perform better than BD and TD, as shown in Fig. 6, because the community discrimination in the BD and TD results is insufficient. From the measure comparison in Table 4, we can see directly that BD and TD have lower values on all three measures than the other five methods. Although MDLPA has the highest redundancy (0.21), its accuracy (45%) is much lower than that of our method (57%). This is because MDLPA detects 29 communities, fewer than our method (49), whereas there are 51 communities in the real network. Our method has the highest values of node similarity (0.34) and accuracy (59%). The results show that in an environment with noisy networks, our method demonstrates a strong anti-noise capability.
Global Terrorism Networks. From the database of global terrorism 72 , we created four networks in which one terrorist organization is connected to another if they both performed an attack in the same country during the same year. Each node in the network represents a terrorist organization (see Supplementary Note 4). Nodes of the same colour represent a community, meaning that these nodes performed an attack in the same country. In Fig. 7a, there are four complete sub-graphs in each network. The other single nodes in the networks are organizations that did not attack during this year and in this country; therefore, there is no connection between them. We find that the six other community-detection algorithms (BGLLMN, BD, TD, M-EMCD, MDLPA, ML-LCD) obtained the same results: the community is divided into these four networks, as shown in Fig. 7b. When we combine these four networks into one network, none of the edges are redundant except for the edges in the red box in Fig. 7c, which displays the results of our algorithm. The nodes in the red box are connected to each other by edges with a weight of 2. The four nodes are divided into different communities, meaning that our algorithm can reveal the organizations that performed attacks twice in two countries. More generally, our community detection function captures the edges with high redundancy, leading to the high redundancy of communities. This also explains why we achieved high accuracy on the three multiplex networks described above (see Tables 2, 3, and 4).

Discussion
The results reported in the preceding section demonstrate the advantageous community detection performance on real-world multiplex networks based on the NMR. In all three networks, our algorithm obtained considerably higher values on all three measures: node similarity, community redundancy and accuracy with ground truth. In turn, the meaningful community structures with different redundant parts of multiplex networks are revealed by our NMR, as demonstrated on the fourth multiplex network. Therefore, we have shown that our framework accurately reflects the community quality and that it maximally preserves the community redundancy, which indicates that it could be a reasonable function for community detection in multiplex networks. The general conclusion from the results presented in this paper is that communities in real-world networks always have much higher redundancy, which verifies the importance of capturing the NRD in multiplex networks. Both the theoretical and experimental results show that NRD is a reasonable measure for describing the connection relationships of multiplex networks. With regard to a single network, NRD automatically degenerates to the node degree. Therefore, NRD is a more general and fundamental measure that includes the node degree  as a specific case for single networks. Indeed, this measure can be used in systems with arbitrary nodes, edges and layers-not only in social networks as described above but also in other multi-layer networks such as traffic networks, metabolic networks, epidemic networks and Internet topology. In a more general sense, the NMR is a general null model for any multiple-relationship system such as the social networks utilized above. We developed the NMR and its higher-order representation using the basic configuration method based on the NRD. The rationality of the NMR can also be explained by the traditional random-walk theory. 
The connection between the 0-order NMR and the original networks is almost completely random, except for size. As the order increases, the model gradually becomes closer to the original multiplex network, and as more attributes match those of the original network, the model becomes the same as the original multiplex network. For different purposes, the order of the NMR can be controlled to guarantee the connection similarity to the original network, and other properties of the original network can be exposed by the comparison.
The general significance of the NMR is that in addition to community structure, many other specific properties can be revealed through the different orders of the model. These properties, including motif identities, propagation-rate threshold, redundancy-distribution correlations and synchronization-state stability, have already been shown to be important in network science. Additionally, the NMR can be used in directed networks based on in-and-out NRD. For example, a comparison of the number of structures appearing in the NMR with the same in-and-out NRD distribution may help researchers determine whether this higher-order structure is the most important motif in the original multiplex network. Our future work is based on such extensions of our NMR and its high-order representations, which may lead to some problems involving the applications of all systems with multi-relationships that can be described by multiplex networks.
Finally, our null model of multiplex networks provides a powerful tool for the structure analysis of complex systems with multiple relationships. Through comparisons, the specific nature of these multi-relationship systems can be exposed quantitatively by the NMR. We believe that the NMR can give rise to much stronger and more general applications in many areas, including social science, Internet topology, bioscience, engineering, economics, and education, where multi-relationship systems can be described by multiplex networks. To accomplish this, much more work needs to be done to gain a deeper understanding of the model and its high-order representations, such as a determination of the NRD distribution law. We hope that many more attributes of multi-relationship systems can be modelled and analysed through the null model with redundancy for multiplex networks.

Generation of the 1-Order Null Model with Redundancy for Multiplex Network.
To generate the 1-order NMR, we introduce the random configuration model of multiplex networks based on the configuration model in single networks. [Steps (1) to (3) are only partly legible in the source.] […] r_i^m edges that can be assigned for the m-ERC edges. For the entire network, a total of 2μ_m edges can be assigned for the m-ERC edges. We consider that the process of one edge selecting the two end nodes is independent. Therefore, the probability that node i and node j in network A i are assigned an m-ERC edge is [equation missing in source]. (4) Assign all the edges to the model. Note that the processes of edge assignment to networks and to nodes are independent. Thus, at the end of the assignment processes, the probability of an edge existing between node i and node j in one network is [equation missing in source]. Thus, in M networks, the probability of an edge existing between node i and node j is: [equation missing in source]
Explanation of Random Travel. We can also generate our NMR based on a random walk under Laplacian dynamics 60 . Here, we suppose there is a traveller who travels randomly from any one node to any other node in a multiplex network, even if the two nodes are in different networks. In contrast to a random walk on a single network, the traveller can travel between different networks only when two nodes are connected in any network. Thus, we call the agent a "traveller" rather than a "walker".
Because the edges can be divided into M groups according to their ERC values, we can divide the multiplex network into M layers in which the ERC is the same for all edges in each layer. Thus, the traveller can travel among all the layers of the multiplex network, which means that the traveller can choose edges with different ERC values to travel between layers. In the m-layer, where the ERC values of all the edges are equal to m, the probability of the traveller travelling from node i to node j in the model is [equation missing in source]. The random travel process, similar to the random walk process, is a Markov process. When the process is stable in each layer, the steady-state probability distribution is [equation missing in source]. Thus, the joint probability of the traveller travelling from node i to node j in one network in the model is: [equation missing in source]
The probability of each edge occurring in the random travel model is the same as that in the random configuration model. Thus, the two models are unified for multiplex networks, which verifies the correctness and validity of our NMR.
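The configuration-style generation described at the start of this section can be sketched in Python as stub matching per ERC level. This is a sketch under stated assumptions rather than the paper's procedure: node i contributes r_i^m stubs for level m, stubs are paired uniformly at random, and each resulting m-ERC edge is placed in m + 1 layers chosen uniformly at random. Self-loops and repeated pairs are not rejected, so the NRD of the output can deviate slightly from the input. The function names and the input data are hypothetical.

```python
import random

def pair_stubs(nrd, m, rng):
    """Randomly pair the m-ERC stubs; node i contributes nrd[i][m] stubs."""
    stubs = [node for node, vec in nrd.items() for _ in range(vec[m])]
    rng.shuffle(stubs)
    # Consecutive stubs form an edge; a leftover odd stub would be dropped.
    return list(zip(stubs[0::2], stubs[1::2]))

def generate_one_order_nmr(nrd, num_layers, seed=0):
    """Return one edge list per layer for a randomised multiplex network."""
    rng = random.Random(seed)
    layers = [[] for _ in range(num_layers)]
    for m in range(num_layers):
        for i, j in pair_stubs(nrd, m, rng):
            # An m-ERC edge is present in exactly m + 1 layers, chosen at random.
            for k in rng.sample(range(num_layers), m + 1):
                layers[k].append((i, j))
    return layers

# Hypothetical NRD vectors for four nodes of a three-layer network.
nrd = {1: [0, 1, 1], 2: [1, 1, 0], 3: [1, 0, 0], 6: [0, 0, 1]}
null_layers = generate_one_order_nmr(nrd, num_layers=3, seed=1)
total_edges = sum(len(layer) for layer in null_layers)
print(total_edges)  # 6
```

With one edge at each of the ERC levels 0, 1 and 2, the model places 1 + 2 + 3 = 6 edges across the three layers regardless of the random seed.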

Fast Algorithm of Community Detection based on the Multiplex Networks Modularity Function.
In the era of big data, the scale of networks is becoming increasingly large. Thus, we propose a new fast algorithm for community detection based on the multiplex networks modularity function in large networks (FCDMNN). This work is based on the work of V. D. Blondel 61 . The steps in the algorithm are as follows:
(1) Initialization: We regard each node in the multiplex network as a community. Thus, the number of communities is N, which also denotes the number of nodes.
(2) Traverse each node i in the multiplex network to find all the nodes connected with node i. Compute the modularity increment ΔQ of each neighbouring node k of node i. ΔQ is defined as follows: [equation missing in source], where [symbol missing in source] is the expected number of edges of the 1-order NMR.
(3) Find the community C_k of node k with the maximum ΔQ. Add node i to community C_k.
(4) Repeat steps (2) and (3) until the communities no longer change.
(5) When step (4) is complete, regard each community as a node. The edges within each community can be regarded as the loopback weighted edges of the new node. Here, the weight is the number of edges within the community to which the node belongs.
(6) The edges between two communities can be regarded as the weighted edges of the two new nodes. Here, the weight is the total number of edges between the two communities to which the nodes belong.
(7) Repeat steps (2)-(5) until the communities no longer change.
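Steps (1) to (4) above can be sketched in Python as a greedy local-moving loop. The sketch makes several assumptions: the modularity uses the same assumed per-ERC-level expectation, the sum over m of (m + 1) r_i^m r_j^m / (2 μ_m), rather than the paper's formula; the modularity is recomputed from scratch for every tentative move instead of using an incremental ΔQ; and the aggregation steps (5) to (7) are omitted. All function names are illustrative.

```python
def modularity(rm, comm, num_layers):
    """Multiplex modularity under an assumed per-ERC-level null model."""
    n = len(rm)
    r = [[sum(1 for x in rm[i] if x == m + 1) for m in range(num_layers)]
         for i in range(n)]
    mu_m = [sum(r[i][m] for i in range(n)) / 2 for m in range(num_layers)]
    mu = sum((m + 1) * mu_m[m] for m in range(num_layers))
    q = 0.0
    for i in range(n):
        for j in range(n):
            if comm[i] == comm[j]:
                q += rm[i][j] - sum(
                    (m + 1) * r[i][m] * r[j][m] / (2 * mu_m[m])
                    for m in range(num_layers) if mu_m[m] > 0)
    return q / (2 * mu)

def local_moving(rm, num_layers):
    n = len(rm)
    comm = list(range(n))                  # step (1): one community per node
    best_q = modularity(rm, comm, num_layers)
    changed = True
    while changed:                         # step (4): repeat until stable
        changed = False
        for i in range(n):                 # step (2): visit every node
            original = comm[i]
            best_c = original
            for k in range(n):
                if k == i or rm[i][k] == 0 or comm[k] == original:
                    continue
                comm[i] = comm[k]          # tentatively join k's community
                q = modularity(rm, comm, num_layers)
                if q > best_q + 1e-12:     # accept strict improvements only
                    best_q, best_c = q, comm[k]
            comm[i] = best_c               # step (3): keep the best move
            changed = changed or best_c != original
    return comm, best_q

# Single-layer toy graph: two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
rm = [[0] * 6 for _ in range(6)]
for i, j in edges:
    rm[i][j] = rm[j][i] = 1
comm, q = local_moving(rm, num_layers=1)
print(comm[0] == comm[1] == comm[2], comm[3] == comm[4] == comm[5])  # True True
```

On this toy graph the loop separates the two triangles. Recomputing the modularity for every move keeps the sketch short but is far slower than an incremental ΔQ, so it is suitable only for small examples.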

The time complexity of FCDMNN is [expression missing in source], where r_max^m denotes the maximum m-order NRD and N refers to the number of nodes. Compared with the BGLL algorithm, the time complexity of our algorithm is slightly higher. However, for large networks, ∑_m r_max^m is far less than the number of nodes N. Thus, the time complexities of the two algorithms are both O(N). In addition, our algorithm is suited to multiplex networks, and the quality of the resulting communities is better than that of other multiplex-network community detection algorithms.