A multi-similarity spectral clustering method for community detection in dynamic networks

Community structure is one of the fundamental characteristics of complex networks, and many methods have been proposed for community detection. However, most of these methods are designed for static networks and are not suitable for dynamic networks that evolve over time. Recently, the evolutionary clustering framework was proposed for clustering dynamic data, and it can also be used for community detection in dynamic networks. In this paper, a multi-similarity spectral clustering (MSSC) method is proposed as an improvement over previous evolutionary clustering methods. To detect the community structure in dynamic networks, our method considers different similarity measures of networks. First, multiple similarity matrices are constructed for each snapshot of a dynamic network. Then, a dynamic co-training algorithm is proposed that bootstraps the clusterings obtained under the different similarity measures. Compared with a number of baseline models, the experimental results show that the proposed MSSC method performs better on widely used synthetic and real-world datasets with ground-truth community structures that change over time.

Scientific Reports | 6:31454 | DOI: 10.1038/srep31454

Two frameworks, PCQ and PCM, were proposed26 to incorporate temporal smoothness in spectral clustering. In both frameworks, a cost function is defined as the sum of the traditional cluster-quality cost and a temporal-smoothness term. Our method follows the evolutionary clustering strategy, but with one major difference. The intuitive goal of spectral clustering is to detect latent communities in networks such that points in the same community are similar and points in different communities are dissimilar. There are several similarity measures for evaluating the similarity between two vertices. A common approach is to encode prior knowledge about objects using a kernel, such as the linear kernel, Gaussian kernel or Fisher kernel. A large proportion of existing spectral clustering algorithms use only one similarity measure. However, clustering results based on different similarity matrices may be notably different11,27. Here, we introduce a multi-similarity method into the evolutionary spectral clustering algorithm, which simultaneously considers multiple similarity matrices.
Inspired by Abhishek Kumar et al.28, we propose a multi-similarity spectral clustering (MSSC) method and a dynamic co-training algorithm for community detection in dynamic networks. The proposed method preserves the evolutionary information of the community structure by combining the current data and historic partitions. The idea of co-training was originally proposed in semi-supervised learning as a bootstrapping procedure in which two hypotheses are trained on different views29. The co-training idea assumes that the two views are conditionally independent and sufficient, i.e., each view can conditionally independently produce a classifier and is sufficient for classification on its own. The classification in one view is then restricted to be consistent with that in the other views. Co-training has been used to classify web pages using the text on the page as one view and the anchor text of hyperlinks on other pages that point to the page as the other view30. In other words, the text in a hyperlink on one page can provide information about the page to which it links. Similarly to semi-supervised learning, in the proposed dynamic co-training approach the clusterings based on different similarity measures are refined using information from one another. This process is repeated for a pre-defined number of iterations.
Moreover, the problem of how to determine the weight of the temporal penalty to the historic partitions, which reflects the user preferences on the historic information, remains. In many cases, this parameter depends on the users' subjective preference 26 , which is undesirable. We propose an adaptive model to dynamically tune the temporal smoothness parameter.
In summary, we introduce multiple similarity measures in the evolutionary spectral clustering method. We propose a dynamic co-training method, which accommodates multiple similarity measures and regularizes current communities according to the temporal smoothness of historic ones. Then, an adaptive approach is presented to learn the change in weight of the temporal penalty over time. Based on these ideas, a multi-similarity evolutionary spectral clustering method is presented to discover communities in dynamic networks using the evolutionary clustering 23 and dynamic co-training method. The performance of the proposed MSSC method is demonstrated on some widely used synthetic and real-world datasets with ground-truths.

Results
To quantitatively compare our algorithm with others, we compare the values of the normalized mutual information (NMI)31 and the sum of squares for error (SSE)32 on various networks from the literature. The NMI is a well-known entropy measure in information theory that quantifies the similarity of two clusterings (in this paper, between the community structure Ĝ obtained using our method and G obtained from the ground truth). Assume that the i-th row of Ĝ indicates the community membership of the i-th node (i.e., if the i-th node belongs to the k-th community, then ĝ_ik = 1 and ĝ_ik′ = 0 for k′ ≠ k). NMI can be defined as

$$\mathrm{NMI}(\hat G, G) = \frac{I(\hat G; G)}{\tfrac{1}{2}\left(H(\hat G) + H(G)\right)},$$

which is the mutual information I(Ĝ; G) normalized by the average of the two entropies H(Ĝ) and H(G). The NMI value lies between 0 and 1; a higher NMI indicates higher consistency, and NMI = 1 corresponds to identical partitions. SSE can be defined as

$$\mathrm{SSE} = \left\| \hat G \hat G^{T} - G G^{T} \right\|_F^2,$$

which measures the distance between the community structure represented by Ĝ and that represented by G. A smaller SSE indicates a smaller difference between the predicted values and the factual values.
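As a concrete illustration of the two metrics, the following sketch computes NMI (normalized by the arithmetic mean of the entropies) and SSE from integer community labels. The helper names are ours, not from the paper.

```python
import math
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Mutual information normalized by the average of the two entropies."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in Counter(zip(a, b)).items():
        pxy = c / n
        mi += pxy * math.log(pxy * n * n / (pa[x] * pb[y]))
    h = (entropy(a) + entropy(b)) / 2
    return mi / h if h > 0 else 1.0

def membership_matrix(labels, k):
    """G[i, c] = 1 iff node i belongs to community c."""
    G = np.zeros((len(labels), k))
    G[np.arange(len(labels)), labels] = 1.0
    return G

def sse(labels_pred, labels_true, k):
    """|| G_hat G_hat^T - G G^T ||_F^2 between two partitions."""
    Gh = membership_matrix(labels_pred, k)
    G = membership_matrix(labels_true, k)
    return float(np.sum((Gh @ Gh.T - G @ G.T) ** 2))

pred = [1, 1, 0, 0]  # same partition as the truth, labels swapped
true = [0, 0, 1, 1]
print(nmi(pred, true))     # 1.0: NMI is invariant to relabelling
print(sse(pred, true, 2))  # 0.0
```

Note that both metrics compare partitions, not label names, so a relabelled but identical partition scores NMI = 1 and SSE = 0.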
We compare the accuracy against three previously published spectral clustering algorithms for detecting communities in dynamic networks: the preserving cluster quality method (PCQ)26, the preserving cluster membership method (PCM)26 and the traditional two-stage method. PCQ and PCM are two frameworks that incorporate temporal smoothness in spectral clustering. In both frameworks, a cost function is defined as the sum of the traditional cluster-quality cost and a temporal-smoothness one. Although the two frameworks have similar expressions for the cost function, the temporal cost in PCQ measures how well the current partition clusters the historic data, which makes the clusters depend on both current and historic data, whereas the temporal cost in PCM is the difference between the current partition and the historic partition, which prevents the clusters from deviating dramatically from the recent history. The traditional two-stage method divides the network into discrete time steps and performs static spectral clustering11 at each time step. Each approach is repeated 10 times, and the average result and variance are reported. The parameter for PCQ and PCM is α = 0.9. We begin by inferring communities in three synthetic datasets with known embedded communities. Next, we study two real-world datasets, where communities were identified by human domain experts. For concreteness and simplicity, we restrict ourselves in this paper to the case of two similarity measures; the proposed method can be extended to more than two similarity matrices. We choose the Gaussian kernel and the linear kernel as the similarity measures among the data points. The similarity matrices are then

$$W^{(1)}_{ij} = v_i^T v_j, \qquad W^{(2)}_{ij} = \exp\left(-\frac{\|v_i - v_j\|^2}{2\sigma^2}\right),$$

where v_i and v_j represent the m-dimensional feature vectors and i ≠ j. In our experiments, v_i is a column vector of the adjacency matrix A at snapshot t, which is denoted by A_t.
In other words, v_i is an n-dimensional feature vector. σ is set equal to the median of the pairwise Euclidean distances between the data points.

Table 1. The performance on different GN-benchmark networks. For parameter z = 4, 5 and 6, the average degree of each node is 16 and 20 at each snapshot, and 1, 3 and 6 nodes, respectively, are randomly selected to change their cluster membership. Note that the NMI and SSE values are averages over 10 snapshots.
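The two similarity matrices used in the experiments can be sketched as follows, taking each node's feature vector to be a column of the snapshot adjacency matrix as described above. This is a minimal illustration, not the authors' code.

```python
import numpy as np

def similarity_matrices(A):
    """Linear-kernel and Gaussian-kernel similarities between the columns of A."""
    V = np.asarray(A, dtype=float).T   # row i = feature vector v_i (i-th column of A)
    W_lin = V @ V.T                    # linear kernel: v_i^T v_j
    sq = np.sum(V ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * W_lin, 0.0)  # squared distances
    n = len(V)
    # sigma = median of the pairwise Euclidean distances (off-diagonal pairs)
    sigma = np.median(np.sqrt(D2[np.triu_indices(n, k=1)]))
    W_gauss = np.exp(-D2 / (2 * sigma ** 2))                     # Gaussian kernel
    return W_lin, W_gauss

# toy snapshot adjacency matrix A_t
A_t = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
W_lin, W_gauss = similarity_matrices(A_t)
```

Both outputs are symmetric, and the Gaussian kernel has ones on its diagonal since each point is at distance zero from itself.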
In SYN-FIX, 128 nodes are divided into four communities of 32 nodes. Every node has an average degree of 16 and shares z links with nodes outside its community. Then, 3 nodes are randomly selected from each community and randomly assigned to the other three communities. For SYN-VAR, the generating method for SYN-FIX is modified to introduce the forming and dissolving of communities and the attaching and detaching of nodes. The initial network contains 256 nodes, which are divided into 4 communities of 64 nodes. Then, 10 consecutive networks are generated by randomly choosing 8 nodes from each community, and a new community is generated with these 32 nodes. This process is performed for 5 timestamps before the nodes return to their original communities. Every node has an average degree of 16 and shares z links with nodes outside its community. A new community is created once at each timestamp for 2 ≤ t ≤ 5. Therefore, the numbers of communities for 1 ≤ t ≤ 10 are 4, 5, 6, 7, 8, 8, 7, 6, 5, and 4. At each snapshot for 2 ≤ t ≤ 10, 16 nodes are randomly deleted, and 16 new nodes are added to the network. Table 2 shows the accuracy and error of the community memberships obtained by the four algorithms for SYN-FIX and SYN-VAR with z = 3 and z = 5. It shows that the MSSC method handles dynamic networks well when the number of communities varies. When z = 3, the community structure is easy to detect because there is less noise; hence, although MSSC does not stand out in NMI, it has a lower error for SYN-FIX.
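To illustrate how such a benchmark can be produced, the sketch below generates SYN-FIX-like snapshots. The edge probabilities are our own assumption, chosen to match the stated average degree of 16 with z inter-community links per node; this is not the exact generator used in the paper.

```python
import random

def syn_fix_snapshots(T=10, n=128, k=4, z=3, moved=3, seed=0):
    """Generate T snapshots of a GN-style network with drifting memberships."""
    rng = random.Random(seed)
    size = n // k
    labels = [i // size for i in range(n)]
    snapshots = []
    for _ in range(T):
        # move `moved` random nodes out of each community into another one
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            for i in rng.sample(members, min(moved, len(members))):
                labels[i] = rng.choice([d for d in range(k) if d != c])
        # expected degree 16, of which z links leave the community (assumption)
        p_in, p_out = (16 - z) / (size - 1), z / (n - size)
        edges = {(i, j) for i in range(n) for j in range(i + 1, n)
                 if rng.random() < (p_in if labels[i] == labels[j] else p_out)}
        snapshots.append((list(labels), edges))
    return snapshots

snapshots = syn_fix_snapshots()
```

Each snapshot is a (labels, edge set) pair, so the ground-truth memberships needed for NMI and SSE are available at every time step.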
Synthetic dataset #3. The third synthetic dataset is used to study the MSSC method in dynamic networks where the number of nodes changes. Greene et al.31 developed a set of benchmarks based on the embedding of events in synthetic graphs. Five dynamic networks are generated without overlapping communities for five different event types: birth and death, expansion, contraction, merging and splitting, and switch. A single birth event occurs when a new dynamic community appears, and a single death event occurs when an old dynamic community dissolves. A single merging event occurs if two distinct dynamic communities observed at snapshot t − 1 match a single step community at snapshot t, and a single splitting event occurs if a single dynamic community at snapshot t − 1 matches two distinct step communities at snapshot t. The expansion of a dynamic community occurs when its corresponding step community at snapshot t is significantly larger than the previous one, and its contraction occurs when the corresponding step community at snapshot t is significantly smaller than the previous one. A switch event occurs when nodes move among the communities. The performance on a small example dynamic graph produced by the generator is shown in Fig. 1(c,d); it involves 1000 nodes, 17 embedded dynamic communities and a single contraction event. To evaluate the methods, we constructed five different synthetic networks for the five event types, each covering 1000 nodes over 10 snapshots. In each of the five synthetic datasets, 20% of node memberships were randomly permuted at each snapshot to simulate the natural movement of users among communities over time. The snapshot graphs share a number of parameters: the nodes have a mean degree of 15, a maximum degree of 50, and a mixing parameter of μ = 0, which controls the overlap between communities. The communities were constrained to have sizes in the range [20, 100].
Table 3 shows the performance of the five methods on the different event types. We also find that the standard deviation for MSSC is smaller, which implies that its clustering results are more stable.
Real-World Datasets. NEC Blog Dataset. The blog data were collected by an NEC in-house blog crawler.
Given seeds of manually picked highly ranked blogs, the crawler discovered blogs that were densely connected with the seeds, resulting in an expanded set of blogs that communicated with each other. The NEC blog dataset has been used in several previous studies on dynamic networks24,26,35. The dataset contains 148,681 entry-to-entry links among 407 blogs crawled over 15 months starting from July 2005. First, we construct an adjacency matrix, where the nodes correspond to blogs and the edges are interlinks among the blogs (obtained by aggregating all entry-to-entry links). In the blog network, the number of nodes changes across snapshots. The blogs roughly form 2 main clusters: the larger cluster consists of blogs with technology focuses, and the smaller cluster contains blogs with non-technology focuses (e.g., politics, international issues, digital libraries). Therefore, in the following studies, we set the number of clusters to 2. Because the edges are sparse, we take 4 weeks as a snapshot and aggregate all edges in each month into an affinity matrix for that snapshot. Figure 2 shows the performance. Figure 2(a) shows that although MSSC does not perform as well as NA-based PCQ and PCM in the first few snapshots, MSSC begins to outperform them as time progresses. In addition, MSSC retains a lower variance than NA-based PCQ and PCM. This result suggests that the benefits of MSSC accumulate over time more than those of NA-based PCQ and PCM. Furthermore, Fig. 2(b) shows that MSSC has lower errors even though it does not outperform the baselines in NMI at a few snapshots.
KIT E-mail Dataset. Furthermore, we consider a large number of snapshots of the e-mail communication network of the Department of Informatics at KIT36. The network of e-mail contacts at the department is an ever-changing graph over 48 consecutive months, from September 2006 to August 2010. The vertices represent members, and the edges correspond to e-mail contacts weighted by the number of e-mails sent between two individuals. Because the edges are sparse, we construct the adjacency matrix among the 231 active members. In the e-mail network, the clusters are the different departments of computer science at KIT. The number of clusters is 14, 23, 25, 26, and 27 for snapshots of 1, 2, 3, 4, and 6 months, respectively, because smaller intervals leave more data points that are treated as isolated points; therefore, when we take one month as a snapshot, the number of clusters is smallest. Because of limited space, we show the NMI scores and SSE values for the 8-snapshot situation (each snapshot is six months) in Fig. 2(c,d). We observe that MSSC outperforms the baseline methods. To study the effect of considering historic information, Table 4 reports the performance when taking 1, 2, 3, 4 and 6 months as a snapshot; shorter snapshots allow the method to exploit more historic information and achieve smaller error. Therefore, the SSE is smallest when the dynamic networks are considered as 48 snapshots.

Table 3. The performance for five dynamic networks, with five different event types: birth and death, expansion, contraction, merging and splitting, and switch.

Discussion
In this paper, to obtain a highly effective spectral clustering method for community detection in dynamic networks, we propose the MSSC method, which considers different similarity measures together. We first construct multiple similarity matrices for each snapshot of a dynamic network and present a dynamic co-training method that bootstraps the clusterings of the different similarity measures using information from one another. Furthermore, the proposed dynamic co-training method, which considers the evolution between two neighbouring snapshots, preserves the historic information of the community structure. Finally, we use a simple but effective method to adaptively estimate the temporal smoothing parameter in the objective.
We have evaluated our MSSC method on both synthetic and real-world networks with ground truths and compared it with three state-of-the-art spectral clustering methods. The experimental results show that the method effectively detects communities in dynamic networks for most of the analysed data sets, across various network and community sizes.
In all of our experiments, we observe that the major improvement in performance is obtained in the first iteration. The performance varies around that value in subsequent iterations. Therefore, in this paper, we show the results after the first iteration. In general, the algorithm does not converge, which is also the case with the semi-supervised co-training algorithm 28 .
However, the number of clusters or communities must be pre-specified for each snapshot. Determining the number of clusters is an important and difficult problem in the field of model selection, for which no generally accepted solution currently exists. Some previously suggested approaches are cross-validation37, minimum description length methods that use two-part or universal codes38, and maximization of a marginal likelihood39. Our algorithm can use any of these methods to automatically select the number of clusters k because it still builds on the fundamental spectral clustering algorithm. Additionally, as a spectral clustering method, MSSC must construct an adjacency matrix and calculate the eigen-decomposition of the corresponding Laplacian matrix. Both steps are computationally expensive: for a data set of n data points, they have complexities of O(n²) and O(n³), respectively, which are unbearable burdens for large-scale applications40. There are options to accelerate spectral clustering, such as landmark-based spectral clustering (LSC), which selects p (p ≪ n) representative data points as landmarks and represents the remaining points as linear combinations of these landmarks41,42. Liu et al.43 introduced a sequential reduction algorithm based on the observation that some data points quickly converge to their true embedding, so an early-stop strategy can speed up the decomposition. Yan, Huang, and Jordan44 also provided a general framework for fast approximate spectral clustering.

Table 4. The performance for the KIT E-mail Dataset. The e-mail networks take 1, 2, 3, 4 and 6 months as a snapshot, respectively.
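Beyond the model-selection approaches cited above, a heuristic commonly used with spectral methods is to choose k at the largest gap between consecutive eigenvalues of the normalized Laplacian (the eigengap heuristic). The sketch below is our own illustration of that heuristic, not part of MSSC.

```python
import numpy as np

def eigengap_k(W, k_max=10):
    """Pick k at the largest gap between consecutive Laplacian eigenvalues."""
    d = W.sum(axis=1)
    dis = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - dis[:, None] * W * dis[None, :]  # normalized Laplacian
    vals = np.sort(np.linalg.eigvalsh(L))[:k_max]         # smallest eigenvalues
    gaps = np.diff(vals)
    return int(np.argmax(gaps)) + 1

# two disconnected triangles: the spectrum has exactly two zero eigenvalues,
# so the largest gap sits after the second eigenvalue and k = 2 is returned
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
print(eigengap_k(W))  # 2
```

For a graph with exactly k connected components the normalized Laplacian has eigenvalue 0 with multiplicity k, which is why the gap after the k-th eigenvalue is informative.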

Methods
Traditional spectral clustering. In this section, we review the traditional spectral clustering approach 11 .
The basic idea of spectral clustering is to cluster based on the spectrum of a Laplacian matrix. Given a set of data points {x_1, x_2, …, x_n}, the intuitive goal of clustering is to find a reasonable method to divide the data points into several groups, with greater similarity within each group and dissimilarity among the groups. From the viewpoint of graph theory, the data can be represented as a similarity-based graph G = (V, E) with vertex set V and edge set E. Each vertex v_i in this graph represents a data point x_i, and the edge between vertices v_i and v_j is weighted by the similarity W_ij. The adjacency matrix is a square matrix A such that its element A_ij is one when there is an edge from vertex v_i to vertex v_j and zero otherwise. Two common variants of spectral clustering are average association and normalized cut45. The two partition criteria, maximizing the association within groups and minimizing the disassociation among groups, are identical (the proof is provided in the literature45). Unfortunately, each variant is associated with an NP-hard problem. For the normalized cut, the relaxed problem can be written as11,26,45

$$\max_{Z \in \mathbb{R}^{n \times k}} \operatorname{tr}\left(Z^T D^{-1/2} W D^{-1/2} Z\right) \quad \text{s.t. } Z^T Z = I,$$

where D is the diagonal degree matrix. In our algorithm, we use the normalized cut as the partition criterion. The optimal solution to this problem is to set Z to be the eigenvectors that correspond to the k smallest eigenvalues of the normalized Laplacian L = I − D^{-1/2} W D^{-1/2}. Then, all data points are projected into the eigen-space, and the k-means algorithm is applied to the projected points to obtain the clusters. The focus of our work is the definition of the similarity matrix in the spectral clustering algorithm, i.e., computing the relaxed eigenvectors Z under different similarity measures.
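The pipeline just described can be sketched in a few lines: build the normalized Laplacian, take the k eigenvectors of the smallest eigenvalues, and run k-means on the rows of the embedding. A tiny farthest-point-seeded k-means keeps the sketch self-contained; it is an illustration, not production code.

```python
import numpy as np

def spectral_clustering(W, k, n_iter=50):
    """Normalized-cut spectral clustering: Laplacian -> k eigenvectors -> k-means."""
    d = W.sum(axis=1)
    dis = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - dis[:, None] * W * dis[None, :]  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                           # eigenvalues ascending
    Z = vecs[:, :k]                                       # k smallest -> embedding
    # simple k-means on the rows of Z, seeded by farthest-point initialization
    centers = [Z[0]]
    for _ in range(1, k):
        d2 = np.min([((Z - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Z[int(np.argmax(d2))])
    centers = np.array(centers)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

# two disconnected triangles should be recovered exactly
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = spectral_clustering(W, 2)
```

On this toy graph the embedding rows are constant within each connected component, so k-means trivially separates the two triangles.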

Different similarity measures.
In spectral clustering, a similarity matrix must be constructed to quantify the similarity among the data points. The performance of the spectral clustering algorithm heavily depends on the choice of similarity measure46. There are several constructions to transform a given set of data points into their similarities. A common approach in machine learning is to encode prior knowledge about the data vertices using a kernel27. The linear kernel, which is given by the inner products between implicit representations of data points, is the simplest kernel function. Assume that the i-th node in V can be represented by an m-dimensional feature vector $v_i \in \mathbb{R}^m$, and that the distance between the i-th and j-th nodes is the Euclidean distance $\|v_i - v_j\|$. The linear kernel can be used as a type of similarity measure, i.e., the similarity matrix W is given by $W_{ij} = v_i^T v_j$. The Gaussian kernel function is one of the most common similarity measures for spectral clustering11 and can be written as $W_{ij} = \exp\left(-\frac{\|v_i - v_j\|^2}{2\sigma^2}\right)$, where the kernel width σ is set equal to the median of the pairwise Euclidean distances between the data points.
There are also some specific kernels for the similarity matrix. Fischer and Buhmann 47 proposed a path-based similarity measure based on a connectedness criterion. Chang et al. 48 proposed a robust path-based similarity measure based on the M-estimator to develop the robust path-based spectral clustering method.
Different similarity measures may reveal the similarity between data points from different perspectives. For example, the Gaussian kernel is based on the Euclidean distances between the data points, whereas the linear kernel is based on the inner products of their implicit representations. Most studies of spectral clustering are based on a single similarity measure, and notably few works consider multiple similarity measures. Therefore, we propose a method that considers multiple similarity measures in spectral clustering; in other words, our goal is a spectral clustering method based on multiple similarity matrices.
Multi-similarity spectral clustering. First, we introduce the basic ideas of multi-similarity spectral clustering in dynamic networks. We assume that the clustering from one similarity measure should be consistent with the clusterings from the other similarity measures, and we bootstrap the clusterings of the different similarities using information from one another through dynamic co-training. The dynamic co-training method, based on the idea of evolutionary clustering, preserves the historic information of the community structure. After a new similarity matrix is obtained by dynamic co-training, we follow the standard procedure of traditional spectral clustering to obtain the clustering result. Figure 3 graphically illustrates the dynamic co-training process.
Specifically, we first compute the similarity matrices under the different similarity measures at snapshot t; the p-th similarity matrix is denoted by $W_t^{(p)}$. Following most spectral clustering algorithms, a solution to the problem of minimizing the normalized cut is the relaxed cluster assignment matrix $Z_t^{(p)}$, whose columns are the eigenvectors corresponding to the k smallest eigenvalues of the associated normalized Laplacian. Then all data points are projected into the eigen-space, and the clustering result is obtained using the k-means algorithm. For a Laplacian matrix with exactly k connected components, its first k eigenvectors are the cluster assignment vectors, i.e., these k eigenvectors contain only the discriminative information among different clusters while ignoring the details within the clusters11. However, if the graph is fully connected, the eigenvectors are no longer exact cluster assignment vectors, but they still contain discriminative information that can be used for clustering. Through co-training, we can use the eigenvectors obtained from one similarity matrix to update another. The updated similarity matrix for the p-th similarity measure at snapshot t can be defined as

$$S_t^{(p)} = \operatorname{sym}\left(Z_t^{(q)} Z_t^{(q)T} W_t^{(p)}\right), \qquad (2)$$

where $Z_t^{(q)}$ denotes the discriminative eigenvectors of the Laplacian from the q-th similarity measure, p, q = 1, 2, …, s and p ≠ q. Equation (3) is the symmetrization operator,

$$\operatorname{sym}(M) = \frac{M + M^T}{2}, \qquad (3)$$

which ensures that the projection of the similarity matrix $W_t^{(p)}$ onto the eigenvectors is a symmetric matrix. Then, we use $S_t^{(p)}$ as the new similarity matrix to compute the Laplacian and solve for the first k eigenvectors to obtain a new cluster assignment matrix $Z_t^{(p)}$. After the co-training procedure has been repeated for a pre-selected number of iterations, the matrix $V = Z_t^{(p)}$ is constructed, where the p-th similarity measure is considered the most informative in advance. Alternatively, if there is no prior knowledge of which similarity measure is most informative, V can be set to the column-wise concatenation of all the $Z_t^{(p)}$s.
For example, we generate two cluster assignment matrices $Z_t^{(1)}$ and $Z_t^{(2)}$, which are combined to form $V = [Z_t^{(1)}, Z_t^{(2)}]$. Finally, the clusters are obtained by running the k-means algorithm on the rows of V.
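One co-training round between two similarity matrices, as described above, can be sketched as follows. The update rule (project one view's similarity matrix onto the other view's eigenvectors, then symmetrize) is our reconstruction of Equation (2) from the surrounding description.

```python
import numpy as np

def top_eigenvectors(W, k):
    """First k eigenvectors (smallest eigenvalues) of the normalized Laplacian."""
    d = W.sum(axis=1)
    dis = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - dis[:, None] * W * dis[None, :]
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]

def cotrain_round(W1, W2, k):
    """Update each similarity matrix using the other view's eigenvectors."""
    Z1, Z2 = top_eigenvectors(W1, k), top_eigenvectors(W2, k)
    sym = lambda M: (M + M.T) / 2.0    # symmetrization operator
    S1 = sym(Z2 @ Z2.T @ W1)           # view 1 refined by view 2
    S2 = sym(Z1 @ Z1.T @ W2)
    return S1, S2

# two views of the same toy graph: the raw adjacency and a smoothed variant
W1 = np.zeros((6, 6))
W1[:3, :3] = 1.0
W1[3:, 3:] = 1.0
np.fill_diagonal(W1, 0.0)
W2 = np.exp(W1 - 1.0)  # assumed second view, for illustration only
S1, S2 = cotrain_round(W1, W2, 2)
```

The symmetrized outputs can be fed back as the new similarity matrices, and the round repeated for a pre-selected number of iterations.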
As described, we can thus accommodate multiple similarities. A further consideration is to follow the evolutionary clustering strategy to preserve the historic information of the community structure within the co-training method. A general framework for evolutionary clustering was proposed as a linear combination of two costs26:

$$\mathrm{Cost} = \alpha \cdot CS + (1 - \alpha) \cdot CT,$$

where CS measures the snapshot quality of the current clustering result with respect to the current data features, and CT measures the goodness-of-fit of the current clustering result with respect to either historic data features or historic clustering results.
Here, we assume that the clusters at any snapshot should mainly depend on the current data and should not shift dramatically from one snapshot to the next. Then, a better approximation to the inner product of the eigenvector matrix and its transpose is defined as

$$Z_t^{(q)} Z_t^{(q)T} \leftarrow \alpha_t^{(q)}\, Z_t^{(q)} Z_t^{(q)T} + \left(1 - \alpha_t^{(q)}\right) Z_{t-1}^{(q)} Z_{t-1}^{(q)T},$$

where $\alpha_t^{(q)}$ is the temporal penalty parameter that controls the weights on the current and historic information. Notice that $Z_t^{(q)} Z_t^{(q)T}$ is then determined by both the current eigenvectors and the historic eigenvectors, so the updated similarity $S_t^{(p)}$ defined in Equation (2), which thus incorporates the history, produces stable and consistent clusters. As $\alpha_t^{(q)}$ increases, more weight is placed on the current information and less on the historic information. Algorithm 1 describes the MSSC algorithm in detail.
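The temporal-smoothness step described above amounts to blending the current projector $Z_t Z_t^T$ with the historic one before the co-training update. A minimal sketch, treating alpha as a given scalar (the paper's adaptive rule for alpha belongs to Algorithm 1 and is not reproduced here):

```python
import numpy as np

def smoothed_projector(Z_t, Z_prev, alpha):
    """Blend current and historic projectors Z Z^T with temporal weight alpha."""
    return alpha * (Z_t @ Z_t.T) + (1.0 - alpha) * (Z_prev @ Z_prev.T)

# toy embeddings: node 1 changed community between snapshots
Z_prev = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
Z_t = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
P = smoothed_projector(Z_t, Z_prev, alpha=0.9)
```

With alpha = 1 only the current snapshot is used, and as alpha decreases the historic partition exerts a stronger pull, which damps abrupt membership changes.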