Clustering method for time-series images using quantum-inspired digital annealer technology

Time-series clustering is a powerful data mining technique for time-series data in the absence of prior knowledge of the clusters. Here we propose a time-series clustering method that leverages an annealing machine, which accurately solves combinatorial optimization problems. The proposed method facilitates an even classification of time-series data into closely located clusters while maintaining robustness against outliers. We compared the proposed method with an existing standard method for clustering an online distributed dataset and found that both methods yielded comparable results. Furthermore, the proposed method was applied to a flow measurement image dataset containing noticeable noise with a signal-to-noise ratio of approximately unity. Despite a small signal variation of approximately 2%, the proposed method effectively classified the data without any overlaps among the clusters. In contrast, the clustering results of the existing methods exhibited overlapping clusters. These results indicate the effectiveness of the proposed method.


Introduction
The collection of large-sized datasets has significantly increased with advancements in data storage and data acquisition technology.][9][10] Additionally, some libraries for timeseries clustering have been made available on the web [11][12][13][14][15] and are widely used.Following the literature, 7,8 time-series clustering is defined as "the process of unsupervised partitioning a given time-series dataset into clusters, in such a way that homogenous time-series are grouped together based on a certain similarity measure, is called time-series clustering."Three main methods are commonly employed for time-series clustering: raw-data-based/shape-based, feature-based, and model-based approaches. 7,8These methods differ in their initial calculation procedures.The raw-data-based/shape-based approach directly uses the raw data for clustering, while the featurebased approach transforms the raw data into a low dimensional feature vector.The model-based approach assumes that the time-series data is generated from a stochastic process model, and the parameters of the model are estimated from the data.Since the performance of model-based approach degrades when clusters are close to each other, raw-data-based and feature-based approaches are usually used. 2,8The subsequent step involves calculating the similarity or distance between two data, feature-vectors, models.Then, the data is grouped into clusters based on the measured similarity or distance using machine learning methods.Clustering algorithms commonly employed for time-series data include partitioning, hierarchical, model-based, and density-based clustering algorithms. 7,8Among partitioning clustering algorithms, k-means clustering is one of the most widely used algorithms. 5,6,16,17Its main advantage lies in its low computational cost.
However, the method requires a number of clusters to be pre-determined by users.In a hierarchical clustering algorithm, number of clusters does not need to be pre-determined.However, once clusters are split or merged using the divisive or agglomerative method, they cannot be adjusted.
Neural network approaches such as self-organizing maps 18 and hidden Markov model 5 are used as model-based clustering approaches.In addition to the above-mentioned disadvantage of the performance degradation for the close clusters, these approaches suffer from high computational cost.Density-based methods, such as density-based spatial clustering of applications with noise (DBSCAN), 19 do not need to pre-determine number of clusters and is robust to outliers.However, in these methods, an appropriate choice of parameters is difficult, and it is known that they suffer from the curse of dimensionality.This study proposes a novel time-series clustering method using an annealing machine as an alternative clustering approach.1][22][23] Therefore, we expect that our proposed method can achieve clustering tasks that are challenging to achieve with existing methods.One unique characteristic of the proposed method is its ability to evenly classify time-series data into closely related clusters while maintaining robustness against outliers.Additionally, the method was specifically designed to evenly classify periodic time-series images into several phase ranges because of assuming a sufficient number of images for each phase, given the long duration of the time-series data relative to the period.This paper provides a comprehensive explanation of our proposed method.We used the third-generation Fujitsu Digital Annealer (DA3), which is one of the quantum-inspired computing technologies, for the clustering computations.DA3 can solve quadratic unconstrained binary optimization (QUBO) problems, and the clustering problem can be formulated as an Ising model which is equivalently a QUBO problem. 24,25DA3 provides a solution in large-scale problem space of up to 100 kbits.0][31] We specifically chose flow measurement data because it was typically high-dimensional (~10 6 ) and contained a measurement noise.For the clustering process, we employed the raw-data-based and the feature-based approaches.Furthermore, we compared our results with those obtained from standard existing methods, specifically "tslearn" 11 available online, and the conditional image sampling (CIS) method 32,33 (only for flow measurement data).

Clustering of online available time-series dataset
7][28] The clustering results obtained using the "TimeSeriesKMeans" function in "tslearn" and the proposed methods are shown in Fig. 1.The "crop" dataset contained twenty-four clusters.However, we present the results of two representative clusters.In this dataset, the correct classifications were known and displayed in Fig. 1.Additionally, ensemble-averaged data for each method were calculated.As shown in Fig. 1(a), the proposed method successfully classified the data, while the results obtained by the standard existing method (tslearn) exhibited some unfavorable classifications.We calculated the root mean squared error (RMSE) between the correct data and the results obtained by the proposed method and "tslearn".The RMSEs of the proposed method and the existing method shown in Fig. 1(a) were 0.013 and 0.015, respectively.This further confirmed that the proposed method surpassed the standard existing method.On the other hand, the RMSEs of the proposed and the existing methods shown in Fig. 1(b) were 0.013 and 0.009, respectively.In this condition, the result obtained by the proposed method was inferior to that of the existing method.However, since the variance of the correct data is large as shown in Fig. 1(b), the classification is inherently difficult.
Consequently, we can conclude that the proposed method provides results comparable to those of conventional methods.

Clustering of flow measurement time-series dataset
We applied our method to the flow measurement dataset to demonstrate its effectiveness for noisy data.A typical data of a snapshot is shown in Fig. 2, and the image shows that the data contains noticeable noise with a signal-to-noise ratio of approximately 1.In this study, we classified this time-series data into nine clusters using the proposed method, "tslearn", and the CIS method.The clustering results are shown in Fig. 3, where the data is presented on a two-dimensional scatter plot using multi-dimensional scaling (MDS).In the MDS calculation, the distance between the data   and   is represented as �sin� , /2��, where || represents the absolute value of , and  , corresponds to the angle between data vectors   and   .Since the Kármán vortex street dataset used here is a periodical phenomenon with a maximum distance of unity, the data were distributed along a circle with a radius of 1/2.As illustrated in Fig. 3, the proposed method successfully classified the data without overlaps.The data points were evenly classified into each cluster, and the cluster sizes were similar to each other.The data points out the circle with the radius of 1/2 were considered outliers, which is a reasonable because these data were considered disturbances deviating from periodic phenomena.However, the outliers were classified into one of the clusters in the standard existing method.This will be inappropriate when calculating ensemble averaging of the data.In the CIS method, only the data on the circle are classified.
However, some clusters exhibited overlapping regions and did not form discrete clusters.Density-based methods, such as density-based spatial clustering of applications with noise (DBSCAN), are known as powerful clustering methods.However, the data points on the circle were classified into a single cluster in DBSCAN.
The ensemble-averaged pressure distributions are shown in Fig. 4. The proposed method (Fig. 4a) and the CIS method (Fig. 4c) well extract a periodic vortex generation despite a small pressure variation of approximately 2%.On the other hand, the pressure distribution obtained from the standard method failed to extract the periodic motion accurately.For example, the vortex located at the upper side suddenly disappeared from phase 2 to phase 3, and the vortex at the upper side reversed its flow direction from phase 5 to phase 6 (Fig. 4b).This discrepancy is attributed to the overlapping clusters observed in Fig. 3b.As the pressure decreases when the vortex comes, we compared the minimum pressure at the center of the vortex between the proposed method and the CIS method.The ensemble-averaged pressure values were / ref = 0.982 ± 0.001 and / ref = 0.984 ± 0.002 for the proposed and the CIS methods, respectively, where the error represents the standard deviation and  ref denotes the atmospheric pressure.The pressure by the CIS method was slightly higher than that of the proposed method, which aligned with the observations in Fig. 4a and c.This difference indicates that the vortex was weakened in the CIS method due to overlapping clusters mentioned earlier, where data from different phases were also included in the ensemble averaging process.These findings provide further evidence that the proposed method is a powerful clustering approach for analyzing periodic phenomena.

Proposed method for time-series clustering
We propose a novel clustering method using an annealing machine.We focused on the raw-databased and the feature-based approaches for time-series data analysis.We considered a clustering problem that a given dataset of  time-series data  = { 1 ,  2 , ⋯ ,   }, where   is a column vector, is classified to  clusters  = { 1 ,  2 , ⋯ ,   }.The Hamiltonian for the clustering problem is described as follows: 34,35 where  ℊ,  = 1 when   belongs to cluster  ℊ and  ℊ,  = 0 when   does not belong to the cluster  ℊ , that is, The similarity or inverse of the distance between   and   is denoted as  , , and ( The second term in Eq. 3 represents a constrained term ensuring each data point belongs to only one cluster. 34,35The value  1 determines the strictness of this constraint, where smaller value allows for the possibility of some data points being treated as outliers and not assigned to any cluster.This study considered the following minimization problem: where the new term, the third term in Eq. 4, adjusts the number of data points classified into each cluster.We denote  ℊ = ∑  ℊ,   to simplify the notation, indicating the number of data points belonging to the cluster  ℊ .Then, the third term of Eq. 4 is written as The mean number of data points and the variance of data points belonging to each cluster are represented by  and  2 , respectively.Equation 5 is written as where the number of the data points  and the clusters  are fixed, and  = / is a constant.
Then, as the variance decreases, i.e., the third term in Eq. 4 becomes smaller, the data points are evenly classified into each cluster.

Time-series dataset for demonstration
We applied the proposed clustering method to two time-series datasets.][28] These time-series data were derived from the images taken by the FORMOSAT-2 satellite.The dataset consists of 24 classes corresponding to an agricultural land-cover map, and each data corresponds to its temporal evolution.The time-series length was 46 , and the number of dimensions of the data was 1.The data was standardized to have a mean of 0 and a variance of 1.
We compared the clustering results obtained by the proposed method and those obtained by "tslearn" 11 In this study, we used the "TimeSeriesKMeans" function in "tslearn."The parameters in the function were set to general settings as follows: the number of clusters was 24, the metric (distance between each data) was Euclidean, the method for initialization was k-means++, and the other parameters were employed default values.This is a standard time-series clustering method.In the proposed method, the Euclidean distance was also used as the metric, and the inverse of the metric was used to minimize the first term in Eq. 4. Since all data points should belong to one of the clusters in this dataset, the parameter  1 was approximately 100 times larger than  2 .In this condition, the solution that all data point belonged to one of the clusters (the second term of Eq. 4 was 0) was obtained.
7][38] PSP is a pressure distribution measurement technique based on the oxygen quenching of the phosphorescence emitted from the dyes incorporated into the PSP coating.The measured data was the pressure distribution induced by the Kármán vortex street behind a square cylinder.The data size was 780 × 780 spatial grids.
The flow velocity was 50 m/s, and the Reynolds number was 1.1 × 10 5 .The pressure difference was too small to detect the PSP technique due to the small variation in the phosphorescence intensity.Then, the measured pressure contained a noticeable noise, and the noise should be reduced from the data.It is well known that the Kármán vortex is a periodic phenomenon.The data was classified into several phase groups and averaged within these groups to reduce the noise and to extract significant patterns, which is one of the purposes of time-series clustering.The cosine similarity measure was used to assess the similarity between the data because we focused on the phase information of the vortex.Since the PSP data was a time-series image data with two spatial dimensions and one temporal dimension, the pressure distribution data was reshaped into a column vector.Consequently, the time-series PSP data is written as  time-series data  = { 1 ,  2 , ⋯ ,   }, where   is a vector corresponding to a reshaped pressure distribution.Since the measured PSP data contains a significant noise, the denoised data was used for the calculation of the similarity.Following the literature, 39 the dataset with small noise can be obtained by considering the truncated singular value decomposition (SVD).We considered a data matrix  = [ 1  2 ⋯   ], where the data matrix  is obtained by arranging vectors   in time-series order.
SVD provides the following representation: where the matrices  and  are unitary matrices, and the superscript ⊤ shows the transpose.
The matrix  is a diagonal matrix of singular values arranged in descending order.It is well known that the data can be approximated by a truncated SVD 40 as follows: where  � is a first  ×  diagonal matrix and  is a truncation rank.The matrices  � and  � are reduced matrices corresponding to  � .Then, we obtained the noise-reduced time-series data or the time-series data of  � = { � 1 ,  � 2 , ⋯ ,  �  }.We set  = 5, which is a commonly used truncation value.Subsequently, the cosine similarity cos  , was calculated as follows: where 〈 �  ,  �  〉 is the inner product of  �  and  �  , and ‖ �  ‖  is the ℓ2 norm of  �  .This study only considered the pressure distribution behind the square cylinder to reduce computational cost.
Substituting  , = cos  , in Eq. 4, we calculated the clustering using DA3.The parameter  1 was several ten times larger than  2 , ensuring that each term in Eq. 4 was of a similar magnitude.
In this condition, some data were classified as outliers.The images within the same cluster were ensemble averaged to extract significant patterns.Here, we note that the original image data  was averaged to extract the patterns, while the truncated dataset of  � was not used.
Considering that cos  , lies within the range of −1 to 1, we introduced the following relation  , , which range from of 0 to 1: where 1 −  , = 0 when  =  (the same data).Then, we define the distance metric between the data as �sin� , /2��, where || represents the absolute value of  and  , is the angle between data vectors.The maximum value of the distance is unity in this distance metric.This distance metric was used in the MDS calculation.
The time-series data were also classified by the "TimeSeriesKMeans" function in "tslearn" described above.Additionally, we used the CIS method 32,33 , which is a specialized methods designed specifically for PSP measurements.In the CIS method, the time-series data were classified into several phase groups based on the pressure data measured by a pressure transducer sensor which is a point sensor with a higher sampling rate than PSP.In other words, the CIS method requires an additional sensor for clustering.This reliance on an extra sensor can be considered one of the limitations of the CIS method.

Conclusions
We propose a novel clustering method using an annealing machine.We added the new term that adjusts the number of data classified into each cluster to a QUBO model.In this study, we applied our proposed method to two distinct datasets: one is the "crop" dataset available from the UEA & UCR time-series classification repository and the other is a flow measurement image dataset obtained in our previous study.For the clustering of "crop" dataset, we also employed a standard existing method distributed as "tslearn," in which the distance between each data was calculated based on the Euclidean distance and the clustering was calculated by the k-means++ algorithm.Comparing the results obtained from our proposed method and the existing method, we observed that the variation of the data points obtained by the proposed method was smaller than that by the existing method.In this dataset, the correct clustering result was provided.Then, we calculated the ensemble averaged data, and the root mean squared errors (RMSEs) between the correct data and the ensemble averaged data were compared.Our findings indicate that both methods provide similar result for this dataset.
Next, we applied our clustering method to the flow measurement image dataset which consisted of the time-series pressure distributions induced by the Kármán vortex street.This dataset exhibited periodicity.Another characteristic of this data is that the dataset contains a noticeable noise with a signal-to-noise ratio of approximately 1.For comparison, the dataset was also classified using the standard existing method and the conditional image sampling (CIS) method, which is specifically designed for flow measurement data.The proposed method successfully classified the data without any overlap between the clusters in spite of the small pressure variation of approximately 2%.On the other hand, both the existing and the CIS methods exhibited overlapping of clusters, failing to form discrete clusters.In particular, the overlap between the clusters calculated by the existing method was large; thus, the vortex suddenly disappeared at times and exhibited reverse flow at other times in the ensemble-averaged pressure distribution.It was also found that the vortex was weakened in the ensemble-averaged pressure distribution obtained 11 by the CIS method.These outcomes highlight the superior performance of the proposed method in the clustering periodic phenomena.

1
is a hyperparameter.The sum ∑  ,  ℊ,   ℊ, ≠ represents the sum of the similarity or the inverse of the distance between two data points belonging to a cluster.The sum ∑  represents the sum over all clusters in the first term of Eq. 1. Clustering can be calculated by minimizing −ℋ 1 , i.e., min − � �  ,  ℊ,   ℊ

1 .
Figure 1.Typical clustering results for "crop" dataset from the UEA & UCR time-series classification repository using the proposed method and the existing method.The data labeled as class 1 and class 17 in the repository are shown in (a) and (b), respectively.

Figure 2 .
Figure 2. Typical pressure distribution, where pressure  is normalized by an atmospheric pressure  ref .Reproduced from Inoue et al. 31 .

Figure 3 .
Figure 3. Clustering results shown in two-dimensional scatter plot based on MDS.(a) the result by the proposed method, (b) that by the existing standard method (tslearn), (c) that by the CIS method.

Figure 4 .
Figure 4. Ensemble averaged pressure distribution for (a) the proposed method, (b) the existing standard method (tslearn), and (c) the CIS method.Pressure  is normalized by an atmospheric pressure  ref .