Complex Network Clustering by a Multi-objective Evolutionary Algorithm Based on Decomposition and Membrane Structure

Complex network clustering has attracted considerable attention in recent years. In this study, a multi-objective evolutionary algorithm based on membranes is proposed to solve the network clustering problem. The population is divided evenly among different membrane structures, and the evolutionary algorithm is carried out within each membrane structure. Individuals in the population are eliminated according to the vector of each membrane. In the proposed method, two evaluation objectives, termed Kernel K-means and Ratio Cut, are minimized. Extensive experimental comparisons with state-of-the-art algorithms show that the proposed algorithm is effective and promising.

Scientific Reports | 6:33870 | DOI: 10.1038/srep33870
We seek the optimal solution through the evolution of particles within each membrane structure and the exchange of the best particles between adjacent membrane structures. Experimental results indicate that, in terms of both running time and clustering quality, MOEA/DM performs better than MOEA/D. The rest of this paper is organized as follows. Section 2 describes community detection and the concept of multi-objective optimization. Section 3 elucidates the proposed MOEA/DM. Section 4 presents the experimental studies. Section 5 concludes the study.

Clustering problem related background
Network community detection based on the graph. A network is usually expressed as a graph, $G = G(N, V)$, where $N$ represents the nodes and $V$ represents the relationships between the network's nodes. The graph can also be expressed as an adjacency matrix, $A$. For every element $a_{ij}$ of $A$, $a_{ij} = w_{ij}$ if $L(i, j)$ exists and $a_{ij} = 0$ otherwise, where $L(i, j)$ denotes that node $i$ and node $j$ are connected and $w_{ij}$ represents the weight of the link between the two nodes. The purpose of network community detection is to determine the characteristic similarities of the nodes in the network and then classify them. If a network is undirected and unweighted, then $a_{ij} = 1$ when an edge connects nodes $i$ and $j$; otherwise, $a_{ij} = 0$. The degree of node $i$ is defined as $k_i = \sum_{j=1}^{n} A_{ij}$. For a community $S_c$ that contains node $i$, the degree can also be expressed as $k_i = k_i^{in}(S_c) + k_i^{out}(S_c)$, where $k_i^{in}(S_c) = \sum_{j \in S_c} A_{ij}$ counts the connections of node $i$ inside $S_c$ and $k_i^{out}(S_c) = \sum_{j \notin S_c} A_{ij}$ counts its connections to the rest of the network. Suppose the network is divided into $m$ communities, $S = \{S_1, S_2, \ldots, S_m\}$. A community $S_c$ is called a community in a strong sense if

$$\forall i \in S_c: \quad k_i^{in}(S_c) > k_i^{out}(S_c), \qquad (1)$$

and a community in a weak sense if

$$\sum_{i \in S_c} k_i^{in}(S_c) > \sum_{i \in S_c} k_i^{out}(S_c). \qquad (2)$$

In a strong community, each node has more connections within the community than with the rest of the network. In a weak community, the sum of the degrees within the community is greater than the sum of the degrees toward the rest of the network.
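The strong and weak community conditions can be checked directly from an adjacency matrix. The following is a minimal sketch, assuming an undirected, unweighted network without self-loops; the function names and the toy graph are illustrative and do not come from the paper.

```python
from typing import List, Set, Tuple


def adjacency(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Build a symmetric 0/1 adjacency matrix (hypothetical helper)."""
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    return adj


def in_out_degree(adj: List[List[int]], community: Set[int], i: int) -> Tuple[int, int]:
    """Return (k_in, k_out) of node i with respect to the given community."""
    k_in = sum(adj[i][j] for j in community)
    k_out = sum(adj[i]) - k_in  # total degree minus internal degree
    return k_in, k_out


def is_strong_community(adj: List[List[int]], community: Set[int]) -> bool:
    """Strong sense: every node has more internal than external links."""
    for i in community:
        k_in, k_out = in_out_degree(adj, community, i)
        if k_in <= k_out:
            return False
    return True


def is_weak_community(adj: List[List[int]], community: Set[int]) -> bool:
    """Weak sense: summed internal degree exceeds summed external degree."""
    total_in = total_out = 0
    for i in community:
        k_in, k_out = in_out_degree(adj, community, i)
        total_in += k_in
        total_out += k_out
    return total_in > total_out


# Toy graph: a triangle {0, 1, 2} whose node 2 also links to nodes 3, 4, 5.
toy = adjacency(6, [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (2, 5)])
print(is_strong_community(toy, {0, 1, 2}), is_weak_community(toy, {0, 1, 2}))
```

In the toy graph, node 2 has two internal and three external links, so the triangle fails the strong condition while still satisfying the weak one.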

Multi-objective Optimization(MOP).
A multi-objective optimization problem is stated as follows:

$$\min F(x) = (f_1(x), f_2(x), \ldots, f_m(x))^T \quad \text{subject to } x \in \Omega, \qquad (3)$$

where $\Omega$ is the variable space and $F: \Omega \to R^m$ contains $m$ objective functions; $R^m$ is defined as the objective space. Unlike a single-objective optimization problem, which produces one optimal solution, problem (3) may have many, even infinitely many, solutions that represent different trade-offs among the objectives. These solutions are called Pareto optimal. Let $u, v \in R^m$. For the minimization problem (3), $u$ is said to dominate $v$ if $u_i \le v_i$ for every $i \in \{1, 2, \ldots, m\}$ and there exists at least one $j \in \{1, 2, \ldots, m\}$ such that $u_j < v_j$. If there is no point $X \in \Omega$ such that $F(X)$ dominates $F(X^*)$, then $X^*$ is a Pareto optimal solution. The set of objective vectors of all Pareto optimal solutions is called the Pareto front. However, it is time-consuming, and often impossible, to find the entire Pareto front. Therefore, most algorithms aim to find an evenly distributed subset of the Pareto front to represent the whole.
Under certain conditions, a multi-objective optimization problem (MOP) can be decomposed into several single-objective optimization problems (SOPs). There are two types of approaches for decomposing an MOP into a group of SOPs. The first type consists of weight-aggregation-based decomposition approaches, in which a set of weight vectors is used to convert an MOP into a number of SOPs through a scalarization method; the weighted Tchebycheff approach 27 and the PBI approach 21 are the most widely used. The second type decomposes the objective space into a group of subspaces using a set of weight or reference vectors, and it has been widely used in recent years.
The Tchebycheff method is classic and is expressed as:

$$\min g^{te}(x \mid \lambda, z^*) = \max_{1 \le i \le m} \lambda_i \, | f_i(x) - z_i^* | \quad \text{subject to } x \in \Omega, \qquad (4)$$

where $\lambda = (\lambda_1, \ldots, \lambda_m)$ is a weight vector and $z^* = (z_1^*, \ldots, z_m^*)$ is the reference point. For each non-dominated solution $x^*$ of (3), there exists a weight vector $\lambda$ such that $x^*$ is the optimal solution of (4). We cannot determine in advance whether (1) the Pareto front is convex and (2) the two objectives used in this paper are discontinuous. When the Pareto front is non-convex, the weighted sum approach does not work well, which is why we choose the Tchebycheff approach.
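Pareto dominance and the Tchebycheff scalarization can be illustrated with a few lines of code. This is a minimal sketch, assuming minimization (the abstract states that both objectives are minimized) and a reference point taken as the component-wise minimum of the candidate objective vectors; the function names and the sample vectors are hypothetical.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v when every objective is minimized."""
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def tchebycheff(f: Sequence[float], weights: Sequence[float], z_star: Sequence[float]) -> float:
    """Weighted Tchebycheff value: max_i weights[i] * |f[i] - z_star[i]|."""
    return max(w * abs(fi - zi) for fi, w, zi in zip(f, weights, z_star))


def reference_point(points: List[Sequence[float]]) -> List[float]:
    """Component-wise minimum of the objective vectors seen so far."""
    return [min(p[i] for p in points) for i in range(len(points[0]))]


# Objective vectors of four hypothetical candidate solutions.
candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
z = reference_point(candidates)
weights = (0.5, 0.5)
best = min(candidates, key=lambda f: tchebycheff(f, weights, z))
print(z, best)
```

Each weight vector defines one single-objective subproblem; changing `weights` moves the subproblem's optimum along the front, which is how a set of weight vectors covers different trade-offs.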

MOEA/DM for Community Detection
Introduction of MOEA/DM algorithm. MOEA/D was proposed in 2007. However, we discovered that its Pareto front lacks diversity. We assume that the reason is that a number of subproblems may correspond to the same non-dominated solution. Therefore, we propose the MOEA/DM algorithm, which reduces the number of subproblems and increases the probability that different subproblems have different optimal solutions.
Objective function. For an unsigned network, the degree of a node reveals the closeness between nodes. Modularity density ($D$) 28, widely used in a variety of community detection algorithms, is one of the most basic measures based on node degree. Given a partition $S = (S_1, S_2, \ldots, S_m)$ of the graph, where $S_i$ is the vertex set of subgraph $G_i$ ($i = 1, 2, \ldots, m$), $D$ is defined as:

$$D = \sum_{i=1}^{m} \frac{L(S_i, S_i) - L(S_i, \bar{S}_i)}{|S_i|}, \qquad (5)$$

where we define $L(S_a, S_b) = \sum_{i \in S_a, j \in S_b} A_{ij}$, and $\bar{S}_i$ denotes the set of nodes outside $S_i$. In Eq. (5), each term of the sum is the ratio between the difference of the internal and external degrees of the subgraph $S_i$ and the size of the subgraph. MOEA/D divides $D$ into two parts that serve as two objectives: one is the negative ratio association (NRA) and the other is the ratio cut (RC).
RC measures the connection density between communities, and NRA measures the connection density within communities. Minimizing these two objectives simultaneously yields communities with dense internal connections and sparse connections between communities. Given a partition $S = (S_1, \ldots, S_m)$, the $S_i$ are the decision variables, and $m$ is the scale (i.e., the number of decision variables) of the problem.
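The two objectives can be computed from an adjacency matrix and a partition. The sketch below assumes the definitions commonly used in the modularity density literature, with NRA equal to the negated sum of $L(S_i, S_i)/|S_i|$ and RC equal to the sum of $L(S_i, \bar{S}_i)/|S_i|$, so that $D = -\mathrm{NRA} - \mathrm{RC}$; the explicit formulas are not present in this text, so treat them as an assumption. All function names are illustrative.

```python
from typing import Iterable, List, Set, Tuple


def adjacency(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Build a symmetric 0/1 adjacency matrix (hypothetical helper)."""
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    return adj


def link_weight(adj: List[List[int]], a: Iterable[int], b: Iterable[int]) -> int:
    """L(a, b): sum of A[i][j] over i in a and j in b."""
    b = list(b)
    return sum(adj[i][j] for i in a for j in b)


def nra(adj: List[List[int]], partition: List[Set[int]]) -> float:
    """Negative ratio association (assumed form): lower means denser communities."""
    return -sum(link_weight(adj, s, s) / len(s) for s in partition)


def ratio_cut(adj: List[List[int]], partition: List[Set[int]]) -> float:
    """Ratio cut (assumed form): lower means fewer links between communities."""
    nodes = set(range(len(adj)))
    return sum(link_weight(adj, s, nodes - s) / len(s) for s in partition)


def modularity_density(adj: List[List[int]], partition: List[Set[int]]) -> float:
    """D recombined from the two objectives: D = -NRA - RC."""
    return -nra(adj, partition) - ratio_cut(adj, partition)


# Two triangles joined by the single edge (2, 3).
g = adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
parts = [{0, 1, 2}, {3, 4, 5}]
print(nra(g, parts), ratio_cut(g, parts), modularity_density(g, parts))
```

Because an undirected edge inside a community appears twice in the adjacency matrix, each triangle contributes $L(S_i, S_i) = 6$ in this example.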
Encoding and decoding of discrete population position. We model a network with the locus-based adjacency representation 29 from genetic algorithms. Each individual has one gene per node, and each gene takes the index of a node as its value. Thus, a value of $j$ assigned to the $i$th gene is interpreted as a link between nodes $i$ and $j$, and, in the resulting clustering solution, these two nodes are in the same cluster. Decoding this representation requires the identification of all connected components. All nodes belonging to the same connected component are then assigned to one cluster. A main advantage of this representation is that it is unnecessary to fix the number of clusters in advance, because the number of clusters is determined automatically in the decoding step. Figure 3 illustrates the locus-based adjacency scheme for a network of seven nodes.
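The decoding step described above amounts to finding connected components. The following is a minimal sketch that uses a union-find structure and 0-based node indices (the paper counts nodes from 1); the seven-gene genotype is a made-up example and is not the one in Fig. 3.

```python
from typing import List


def decode(genes: List[int]) -> List[int]:
    """Map a locus-based genotype to cluster labels.

    genes[i] = j is read as a link between nodes i and j; every connected
    component of the resulting graph becomes one cluster.
    """
    n = len(genes)
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(genes):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two components

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    label_of_root = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in label_of_root:
            label_of_root[root] = len(label_of_root)
        labels.append(label_of_root[root])
    return labels


genotype = [1, 2, 0, 4, 3, 6, 5]  # hypothetical individual for seven nodes
print(decode(genotype))  # [0, 0, 0, 1, 1, 2, 2]
```

The number of clusters (three here) falls out of the decoding and is never passed in, which is the advantage the text attributes to this representation.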

Crossover.
We choose two-point crossover in preference to uniform crossover, because two-point crossover better maintains effective node connections in the network. Given two parents, A and B, we first randomly select two points $i$ and $j$ ($1 \le i \le j \le N$), and then everything between the two points is swapped between the parents. An example of the operation of two-point crossover on the encoding is shown in Fig. 4.
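The crossover operator can be sketched as follows. This is an illustrative version that uses 0-based positions and assumes the swapped segment includes both cut points; the parent genotypes are made up, and the function also returns the cut points so that the result can be inspected.

```python
import random
from typing import List, Tuple


def two_point_crossover(
    parent_a: List[int], parent_b: List[int], rng: random.Random
) -> Tuple[List[int], List[int], Tuple[int, int]]:
    """Swap the genes between two random cut points (inclusive)."""
    n = len(parent_a)
    i, j = sorted((rng.randrange(n), rng.randrange(n)))
    child_a = parent_a[:i] + parent_b[i:j + 1] + parent_a[j + 1:]
    child_b = parent_b[:i] + parent_a[i:j + 1] + parent_b[j + 1:]
    return child_a, child_b, (i, j)


rng = random.Random(0)
a = [1, 2, 0, 4, 3, 6, 5]
b = [2, 0, 1, 2, 5, 4, 5]
child_a, child_b, cuts = two_point_crossover(a, b, rng)
print(cuts, child_a, child_b)
```

Every gene of a child is copied from the same locus of one parent, so if both parents encode only links that exist in the network, the children do as well.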

Combination of Evolutionary Algorithms and Membrane Structure. The main idea of MOEA/DM is that the objective space is divided into a number of membrane structures and the solutions of each membrane structure are initialized. The population evolves within each membrane to screen out the best solution in that membrane, which is then passed to the adjacent membrane structure. In each round of evolution within a membrane, the worst-performing solution is also removed. As a result, each subproblem has relatively more candidate solutions to choose from, and the solution retained in each membrane is the best one found there. After a number of iterations, the solutions in each membrane structure are taken as the best solutions of the subproblem corresponding to that membrane structure.
In comparisons with MOEA/D and MOPSO on four networks with known classifications and two networks with unknown classifications, MOEA/DM requires much less computation time than the other two algorithms and also achieves better results. The algorithm flow is shown in Fig. 5.
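The membrane-based loop described in this section can be sketched on a toy problem. This is not the authors' implementation: it assumes real-valued solutions, a per-membrane scalar fitness standing in for the Tchebycheff value of that membrane's weight vector, mutation as the only variation operator, and a ring layout in which membrane k passes its best solution to membrane k + 1. All names are hypothetical.

```python
import random
from typing import Callable, List

Fitness = Callable[[float, int], float]          # (solution, membrane index) -> cost
Mutate = Callable[[float, random.Random], float]


def split_into_membranes(population: List[float], cell_num: int) -> List[List[float]]:
    """Divide the population evenly among cell_num membranes."""
    assert len(population) % cell_num == 0, "population must divide evenly"
    size = len(population) // cell_num
    return [population[k * size:(k + 1) * size] for k in range(cell_num)]


def evolve(
    membranes: List[List[float]],
    fitness: Fitness,
    mutate: Mutate,
    generations: int,
    rng: random.Random,
) -> List[List[float]]:
    """Evolve inside each membrane, then pass each best solution to its neighbour."""
    for _ in range(generations):
        # Step 1: local evolution. Add one offspring, then drop the worst member.
        for k, cell in enumerate(membranes):
            cell.append(mutate(rng.choice(cell), rng))
            cell.remove(max(cell, key=lambda s: fitness(s, k)))
        # Step 2: communication. Snapshot the bests first so order does not matter.
        bests = [min(cell, key=lambda s: fitness(s, k)) for k, cell in enumerate(membranes)]
        for k, best in enumerate(bests):
            target = (k + 1) % len(membranes)
            cell = membranes[target]
            worst = max(cell, key=lambda s: fitness(s, target))
            if fitness(best, target) < fitness(worst, target):
                cell[cell.index(worst)] = best
    return membranes


# Toy run: membrane k prefers solutions close to the value k.
rng = random.Random(1)
population = [rng.uniform(-10.0, 10.0) for _ in range(12)]
cells = split_into_membranes(population, 4)
evolve(cells, lambda x, k: abs(x - k), lambda x, r: x + r.gauss(0.0, 0.5), 50, rng)
print([round(min(c, key=lambda x: abs(x - k)), 2) for k, c in enumerate(cells)])
```

Because each membrane only ever discards its worst member, the best cost inside a membrane cannot get worse from one generation to the next, and each membrane stays small, which is consistent with the running-time argument made in the text.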

Experimental Results
In this paper, we compare our proposed algorithm with one EA-based algorithm (MOEA/D) and one PSO-based algorithm (MOPSO), mainly in terms of running time and Q 4,29 values. Experimental parameters are listed in Table 1. So that the three algorithms can be compared, the number of iterations of each is set to 200 generations. As the performance index, we use the modularity proposed by Newman and Girvan, which is defined as:

$$Q = \frac{1}{2M} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2M} \right) \delta(c_i, c_j),$$

where $M$ is the number of edges in the network, $k_i$ is the degree of node $i$, $c_i$ is the community of node $i$, and $\delta(c_i, c_j) = 1$ if $c_i = c_j$ and 0 otherwise. For each test instance, both MOEA/D and MOPSO were run independently 100 times on the same computer (Intel(R) Celeron(R) M CPU 520 machine, 1.6 GHz, 512 MB memory). The operating system is Windows 8.
Many parameters, shown in Table 1, can be set flexibly. One of them, which we call niche, is used to store the dominant solutions; it determines the neighborhood size and influences the performance of our algorithm. To find the best value, we ran the algorithm 200 times with different values of niche. These experiments showed that niche = 13 is the most suitable value for this algorithm. Apart from niche, the population size popsize, the iteration number maxgen (gmax), and the cell number CellNum also affect the experimental results. We determined their values by varying one parameter at a time while holding the others fixed. We set popsize to 120, gmax to 200, and CellNum to 40.
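The Newman-Girvan modularity Q used as the quality index can be computed directly from an adjacency matrix and a list of community labels. This is a minimal sketch for an undirected, unweighted network; the function names and the toy graph are illustrative.

```python
from typing import List, Tuple


def adjacency(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Build a symmetric 0/1 adjacency matrix (hypothetical helper)."""
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    return adj


def modularity(adj: List[List[int]], labels: List[int]) -> float:
    """Q = (1 / 2M) * sum over same-community pairs of (A_ij - k_i * k_j / 2M)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # every undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Two triangles joined by the single edge (2, 3).
g = adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
print(modularity(g, [0, 0, 0, 1, 1, 1]))  # 5/14, about 0.357
print(modularity(g, [0, 0, 0, 0, 0, 0]))  # 0.0 for a single community
```

The double loop costs O(n^2) time, which is adequate for networks of the size used in these experiments; an edge-list formulation would be preferable for large sparse graphs.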
Experimental results on real-world networks. In this section, we demonstrate the effect of applying MOEA/DM to six real-world networks. Of these, the dolphin social network 30, the American college football network 31, Zachary's karate club network 32, and the political book network compiled by V. Krebs have known true classifications. For the Santa Fe Institute (SFI) network 33 and the netscience network 34, the true classification is unknown. The characteristics of the networks are given in Table 2. Table 3 reports the Q values and running times of MOEA/DM on the networks whose classification is known. Table 4 reports the Q values and running times of MOEA/DM on the two networks whose exact classification is unknown.
Comparison of algorithms on the karate network. Karate network. The karate network is a classic data set in the field of social network analysis. In the early 1970s, the sociologist Zachary spent two years observing the social relations among the 34 members of a karate club at an American university. Based on the interactions of these members inside and outside the club, he constructed a social network consisting of 34 nodes. An edge between two nodes means that the two corresponding members interacted frequently as friends. Figure 6(a) shows the true clustering of the karate network, and Fig. 6(b) presents the clustering produced by MOEA/DM. In Fig. 6(b), MOEA/DM divides the network into four categories: the top part is divided into two categories and the bottom part is also divided into two. In Fig. 6(a), Point 10 (red) belongs to the upper part of the real structure. In our prediction, shown in Fig. 6(b), Point 10 (blue) belongs to the lower part of the predicted structure. Other papers designate points such as Point 10 as fuzzy nodes, i.e., nodes that can be classified into either the first cluster or the second one.
In the classification process, such points have just two edges, which connect to two different categories, so they can be assigned to either category. Although we obtained four categories (rather than two), we divided the network correctly. Table 3 shows that, although the Q value for MOEA/DM is consistent with the values for MOEA/D and MOPSO, the running time of MOEA/DM is shorter than those of MOEA/D and MOPSO.

Comparison of algorithms on the dolphin network. Dolphin network.
By studying the living habits of 62 bottlenose dolphins in New Zealand, Lusseau 30 found that the dolphins' interactions followed a specific pattern, and constructed a social network containing 62 nodes. This dolphin network is naturally separated into two large groups: female and male. Figure 7(a) shows the true clustering of the dolphin network, and Fig. 7(b) presents the clustering produced by MOEA/DM. In Fig. 7(b), MOEA/DM divides the network into four categories: the top part is divided into two categories and the bottom part is also divided into two. Table 3 shows that, in terms of the Q value, MOEA/DM performs the same as MOPSO, and both perform better than MOEA/D. However, as indicated in Table 3 and shown graphically in Fig. 8, the running time of MOEA/DM is less than half that of MOPSO.
Comparison of algorithms on the football network. Football network. The American college football network 31 represents the schedule of games between American college football teams during a regular season. Each node is a team, and an edge connects two teams that played each other. The teams are organized into conferences, and teams play more games within their own conference than against teams from other conferences, so the conferences serve as the true classification.
The network is divided into twelve categories, as shown in Fig. 9(a). Figure 9(b) shows the classification results obtained with the MOEA/DM algorithm. A comparison of Fig. 9(a) with Fig. 9(b) shows that the football network has a more complex structure than the dolphin and karate networks. In the football network, nodes that belong to the same class are relatively decentralized. The real network structure, shown in Fig. 9(a), and our predicted network structure, shown in Fig. 9(b), have the same number of categories. From Fig. 9(b) we extracted the three categories on the right and placed them in Fig. 9(c). The three categories in Fig. 9(c) contain misclassified points: the points marked 58 and 29 and the points marked 43, 37, and 91 are assigned to the wrong positions. An analysis of these points reveals a common characteristic: they have more edges to other classes than to their own class. The MOPSO algorithm divides the network into similar categories, but more than 10 points are incorrectly placed. Table 3 shows that, in terms of the Q value, MOEA/DM performs better than MOEA/D and MOPSO. As indicated in Table 3 and shown graphically in Fig. 8, the running time of MOEA/DM is less than half that of MOPSO.
Table 4. Comparison of the data on the unknown.
Comparison of algorithms on the polbooks network. American political book network. The American political book network was compiled by V. Krebs from books on American politics sold by Amazon's online bookstore. An edge between two books indicates that they were frequently bought by the same readers. This information comes from the "customers who bought this book also bought these other books" feature on each book's web page. Based on the viewpoints and evaluations of the readers of the Amazon books, Mark Newman divided the nodes into three categories: "liberal," "conservative," and "centrist." The network is divided into three categories, as shown in Fig. 10(a). Figure 10(b) shows the classification results obtained with the MOEA/DM algorithm. A comparison of Fig. 10(a) with Fig. 10(b) shows that the political book network has a more complex structure than the football network. In the political book network, nodes that belong to the same class are relatively decentralized. MOEA/DM divides the network into eight categories. In the part marked in red, 9 points are incorrectly placed. The category marked in blue is divided into four major subcategories, marked in orange, green, pink, and yellow. Table 3 shows that, in terms of the Q value, MOEA/DM performs the same as MOEA/D, and both perform better than MOPSO. However, as indicated in Table 3 and shown graphically in Fig. 8, the running time of MOEA/DM is substantially less than that of MOEA/D.
Experimental results on unknown networks. The SFI network 34. Figure 11 shows the result of MOEA/DM. According to the figure, MOEA/DM divides the network into twelve categories and MOPSO divides it into eight categories. Table 4 shows that the Q value obtained by MOEA/DM is better than that obtained by MOPSO, and that MOEA/DM takes less time than MOPSO. In Fig. 11, it is clear that MOEA/DM splits the network into eight communities, and MOPSO also splits the same network into eight. From the top of Fig. 11 to the bottom, the category at the top, shown in blue, represents a group of scientists using agent-based models to study problems in economics and traffic flow. The second category, shown in red, represents a group of scientists working on mathematical models in ecology. The third category, which is made up of four parts shown in red, yellow, green, and blue, represents a group of scientists working primarily in statistical physics. Both algorithms subdivide this group into four smaller ones. At the bottom of the figure is a group working primarily on the structure of RNA.

Concluding remarks
This study introduced MOEA/DM, an algorithm that combines a membrane structure with an evolutionary algorithm. In studying the MOEA/D algorithm, we found that one non-dominated solution can correspond to multiple subproblems. MOEA/DM addresses both the number of subproblems and the solutions associated with each subproblem: it reduces the number of subproblems so that each subproblem holds more solutions, and it adds the membrane structure to try to ensure that different subproblems hold different solutions. Experiments on real networks show that this improvement has a certain effect. The three improvements of MOEA/DM are summarized as follows:
1. The diversity of the proposed algorithm is high because it partitions the objective space evenly by weight vectors and divides it into several membrane structures.
2. The time efficiency of the proposed algorithm is higher than those of MOEA/D and MOPSO because the objective space is divided evenly into several parts and only a few particles evolve within each membrane.
3. The results of the proposed algorithm are better than those of MOEA/D and MOPSO, and it spends much less time. Figure 8 illustrates that MOEA/DM has a great advantage in running time.
Nevertheless, the experiments on real networks indicate that, although MOEA/DM locates the real communities rapidly and accurately, it inevitably produces errors in the number of communities. If an estimate of the number of categories in the network is available, the results are better. In addition to the problem of the number of categories, we found that certain points (points whose classification is ambiguous) are still misclassified with high probability. Such errors usually arise at points that connect two communities with the same number of edges. If the two objectives can be optimized further, better results may be obtained.
It is anticipated that the proposed algorithm for complex network clustering can be applied in bioinformatics, for example to disease gene networks [35][36][37], DNA-binding protein network identification 38,39, and protein remote homology detection 40. It is also of interest to consider machine learning methods [41][42][43][44][45] for network clustering. Recently, spiking neural network models (see, e.g., refs 46-48), particularly self-organizing ones 49,50, have been a hot topic in the field of machine learning, and interesting results may be obtained with this new, powerful model.