Introduction

The collection of large datasets has drastically increased with advancements in data storage and data acquisition technologies. Time-series data containing one or multiple variables (e.g., images) that vary with time are extensively recorded and analyzed in various fields, such as science, engineering, medicine, economics, and finance1,2,3. Clustering is a powerful data mining technique for classifying such data into related groups in the absence of sufficient prior knowledge of the groups4,5,6. When applied to time-series data, the technique is referred to as time-series clustering7,8,9. Many studies on time-series clustering have been summarized in review papers2,7,8,9,10,11, and several libraries for time-series clustering are available on the web12,13,14,15,16 and widely used. Following the literature7,8, time-series clustering is defined as “the process of unsupervised partitioning of a given time-series dataset into clusters, in such a way that homogeneous time-series data are grouped together based on a certain similarity measure.”

Three main approaches are commonly employed for time-series clustering: raw-data-based/shape-based, feature-based, and model-based7,8. As an example, the raw-data-based/shape-based approach is illustrated in Fig. 1. These approaches differ in their initial calculation procedures. The raw-data-based/shape-based approach uses the raw data directly for clustering, whereas the feature-based approach transforms the raw data into a low-dimensional feature vector. The model-based approach assumes that the time-series data are generated from a stochastic process model whose parameters are estimated from the data. The raw-data-based and feature-based approaches are more commonly used because the performance of the model-based approach degrades when clusters are close to each other2,8. The subsequent step involves calculating the similarity or distance between two data points, feature vectors, or models. The data are then grouped into clusters based on the measured similarity or distance using machine learning methods.

Clustering algorithms commonly employed for time-series data include partitioning, hierarchical, model-based, and density-based algorithms7,8. Among partitioning algorithms, k-means clustering is one of the most widely used5,6,17,18. Its main advantage lies in its low computational cost; however, it requires the user to pre-determine the number of clusters. Hierarchical clustering does not require the number of clusters to be pre-determined, but once clusters are split or merged by the divisive or agglomerative procedure, the assignment cannot be adjusted. Neural network approaches such as self-organizing maps19 and hidden Markov models5 are employed as model-based clustering approaches. In addition to the above-mentioned performance degradation for close clusters, these approaches suffer from high computational costs. Density-based methods, such as density-based spatial clustering of applications with noise (DBSCAN)20, do not require the number of clusters to be pre-determined and are robust to outliers. However, choosing appropriate parameters for density-based methods is difficult, and they are known to suffer from the curse of dimensionality. Overall, each method has its advantages and disadvantages.
A definitive method that can be used for all datasets does not exist, and an appropriate method should be chosen depending on the purpose and the dataset to be processed. Recently, continued attempts have been made to improve the performance of each method. As examples from studies published in the last three years: extensions of dynamic time warping (DTW)21,22, a measure based on quantile cross-spectral density23, and a measure based on two linear fuzzy information granule time series24 have been proposed to calculate the similarity or distance. A clustering method that focuses on time-varying moments was proposed for financial time-series data25, and a model-based approach built on a mixture of linear Gaussian state-space models was proposed26. A notable research trend is approaches based on deep learning27,28,29, which differ from the previously mentioned unsupervised methods. As a contrasting approach, a computationally efficient method based on single-template matching was proposed30. However, to the best of the authors’ knowledge, no method has been reported that concurrently achieves high clustering performance (e.g., classification of data points that are close to each other, robustness to outliers) and low computational cost.

Fig. 1: Typical clustering procedure of raw-data-based/shape-based approach.

The similarity or distance between each pair of data points is calculated, and the data points are then classified into clusters based on this measure.

In this study, we propose a time-series clustering method that achieves both high clustering performance and low computational cost. To this end, we focused on clustering algorithms using an annealing machine. As mentioned above, with the exception of deep learning, the clustering algorithm itself has received less research attention than the calculation of similarity or distance. Annealing/Ising machines, such as quantum annealers and digital Ising machines, solve combinatorial optimization problems faster and more accurately than conventional computers31,32,33,34. We therefore expect the proposed method to accomplish clustering tasks that are challenging for existing methods. A unique characteristic of the proposed method, not found in existing methods, is its ability to evenly classify time-series data into closely related clusters while maintaining robustness against outliers. More specifically, the method can equally classify periodic time-series images into several phase ranges, assuming a sufficient number of images for each phase, given the long duration of the time-series data relative to the period. This paper provides a comprehensive explanation of the proposed method. We used the third-generation Fujitsu Digital Annealer (DA3), a quantum-inspired computing technology, for the clustering calculation. DA3 solves quadratic unconstrained binary optimization (QUBO) problems, and the clustering problem can be formulated as an Ising model that is equivalent to a QUBO problem35,36. DA3 provides solutions in a large-scale problem space of up to 100 kbits. We applied the proposed method to two time-series datasets: one obtained from “the UEA & UCR time-series classification repository”37,38,39, and the other consisting of flow measurement images capturing the Kármán vortex street, a periodic wake, obtained in our previous studies40,41,42. We specifically chose flow measurement data because they are typically high dimensional (\(\sim 10^6\)) and contain measurement noise. For the clustering process, we employed the raw-data-based and feature-based approaches. Furthermore, we compared our results with those obtained from existing standard methods, specifically “tslearn”12, available online, and the conditional image sampling (CIS) method43,44 (the latter only for the flow measurement data).

Results and discussion

Clustering of online available time-series dataset

We demonstrated the proposed method by classifying the “crop” dataset available from the UEA & UCR time-series classification repository37,38,39. The clustering results obtained using the “TimeSeriesKMeans” function in “tslearn” and the proposed method are shown in Fig. 2. The “crop” dataset contains 24 classes; here we present the results for two representative clusters. For this dataset, the correct classifications are known and are displayed in Fig. 2. In addition, ensemble-averaged data were calculated for each method. As shown in Fig. 2a, the proposed method successfully classified the data, whereas the results obtained by the standard existing method (tslearn) exhibited some unfavorable classifications. We calculated the root mean squared error (RMSE) between the ensemble average of the correctly labeled data and the ensemble averages obtained by the proposed method and by “tslearn”. The RMSEs of the proposed and existing methods shown in Fig. 2a were 0.115 and 0.121, respectively, further confirming that the proposed method surpassed the standard existing method. On the other hand, the RMSEs of the proposed and existing methods shown in Fig. 2b were 0.117 and 0.096, respectively; in this case, the result of the proposed method was inferior to that of the existing method. However, since the variance of the correct data is large, as shown in Fig. 2b, the classification is inherently difficult. Demonstrations for other datasets are provided in Supplementary Note 1. Consequently, we conclude that the results of the proposed method are comparable to those of conventional methods.
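The evaluation above amounts to a per-cluster ensemble average followed by an RMSE. The following is a minimal sketch, assuming a series array X of shape (n_series, length) and label arrays y_true and y_pred; these names are ours for illustration:

```python
import numpy as np

def ensemble_average(X, labels, cluster):
    """Mean series over all members assigned to a given cluster."""
    return X[labels == cluster].mean(axis=0)

def rmse(a, b):
    """Root mean squared error between two equal-length series."""
    return np.sqrt(np.mean((a - b) ** 2))

# e.g., compare ground-truth and predicted ensemble averages of one class:
# err = rmse(ensemble_average(X, y_true, 1), ensemble_average(X, y_pred, 1))
```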

Fig. 2: Typical clustering results for “crop” dataset from the UEA & UCR time-series classification repository using the proposed and existing methods.

The data labeled as class 1 and class 17 in the repository are shown in (a) and (b), respectively.

Clustering of flow measurement time-series dataset

We applied our method to the flow measurement dataset of the Kármán vortex street to demonstrate its effectiveness for noisy data. A typical snapshot is shown in Fig. 3a; the image shows that the data contain noticeable noise, with a signal-to-noise ratio (SNR) of approximately 1. The dimensions of the measurement area are shown in Fig. 3b. This periodic time-series dataset should be classified equally into each phase range because a sufficient number of images were acquired for each phase, owing to the long duration of the measurement relative to the period. It is therefore a typical dataset for demonstrating the effectiveness of the proposed method. We classified this time-series data into nine clusters using the proposed method, “tslearn,” and the CIS method. The clustering results are shown in Fig. 4, where the data are presented on a two-dimensional scatter plot obtained by multi-dimensional scaling (MDS). In the MDS calculation, the distance between the data xi and xj is represented as \(|\sin ({\theta }_{i,j}/2)|\), where |a| denotes the absolute value of a and θi,j is the angle between the data vectors xi and xj. Since the Kármán vortex street dataset is a periodic phenomenon and the maximum distance is unity, the data points were distributed along a circle with a radius of 1/2. As illustrated in Fig. 4a, the proposed method successfully classified the data points without overlaps. The data points were evenly distributed among clusters of similar size, which is a favorable result. The data points lying off the circle of radius 1/2 were treated as outliers, which is reasonable because they correspond to disturbances deviating from the periodic phenomenon. In contrast, the standard existing method assigned these outliers to one of the clusters (Fig. 4b), which is inappropriate when calculating the ensemble average of the data. The CIS method classified only the data points on the circle, as shown in Fig. 4c; however, some clusters exhibited overlapping regions and did not form discrete clusters. Density-based methods, such as DBSCAN, are known to be powerful clustering methods; however, DBSCAN classified all the data points on the circle into a single cluster.

Fig. 3: Typical raw data of PSP measurement and calculation condition.

(a) Typical pressure distribution, where the pressure p is normalized by the atmospheric pressure pref. Reproduced from Inoue et al.42. (b) Dimensions of the experimental setup and the area used for the similarity calculation.

Fig. 4: Clustering results shown in two-dimensional scatter plot based on MDS.

(a) Result of the proposed method, (b) the existing standard method (tslearn), and (c) the conditional image sampling (CIS) method.

The ensemble-averaged pressure distributions are shown in Figs. 5–7. The proposed method (Fig. 5) and the CIS method (Fig. 7) effectively extracted the periodic vortex generation despite a small pressure variation of approximately 2%. On the other hand, the pressure distribution obtained from the standard method failed to accurately capture the periodic motion: for example, the vortex located at the upper side suddenly disappeared from phase 2 to phase 3, and the vortex at the upper side reversed its flow direction from phase 5 to phase 6 (Fig. 6). This discrepancy can be attributed to the overlapping clusters observed in Fig. 4b. Since the pressure decreases when a vortex passes, we compared the minimum pressure at the vortex center between the proposed and CIS methods. The ensemble-averaged pressure values were p/pref = 0.982 ± 0.001 and p/pref = 0.984 ± 0.002 for the proposed and CIS methods, respectively, where the error represents the standard deviation and pref denotes the atmospheric pressure. The pressure obtained by the CIS method was slightly higher than that of the proposed method, consistent with the observations in Figs. 5 and 7. This difference indicates that the vortex was weakened in the CIS method because of the previously mentioned overlapping clusters, whereby data from different phases were included in the ensemble-averaging process. These findings provide further evidence that the proposed method is a powerful clustering approach for analyzing periodic phenomena.

Fig. 5: Ensemble-averaged pressure distribution for the proposed method.

The images are in phase order, and the vortices flow in this order. The pressure p is normalized by the atmospheric pressure pref.

Fig. 6: Ensemble-averaged pressure distribution for the existing standard method (tslearn).

The vortices do not flow in phase order. The pressure p is normalized by the atmospheric pressure pref.

Fig. 7: Ensemble-averaged pressure distribution for the conditional image sampling (CIS) method.

The vortices are weaker than those of the proposed method. The pressure p is normalized by the atmospheric pressure pref.

Conclusions

We proposed a novel clustering method using an annealing machine, adding to a QUBO model a term that adjusts the number of data points classified into each cluster. We applied the proposed method to two distinct datasets: the “crop” dataset available from the UEA & UCR time-series classification repository, and a flow measurement image dataset obtained in our previous study. For clustering the “crop” dataset, we also employed a standard existing method distributed as “tslearn,” in which the distance between data points was the Euclidean distance and clustering was performed by the k-means algorithm with k-means++ initialization. Comparing the results of the two methods, we observed that the variation of the data points obtained by the proposed method was smaller than that obtained by the existing method. For this dataset, the correct clustering is provided; we therefore calculated the ensemble-averaged data and compared the root mean squared errors (RMSEs) between the correct data and the ensemble-averaged data. Our findings indicate that both methods provide similar results for this dataset.

Next, we applied the proposed clustering method to the flow measurement image dataset, which consists of time-series pressure distributions induced by the Kármán vortex street. This dataset is periodic and contains noticeable noise, with a signal-to-noise ratio of approximately 1. For comparison, the dataset was also classified using the standard existing method and the conditional image sampling (CIS) method, which is specifically designed for flow measurement data. The proposed method successfully classified the data without any overlap between clusters despite the small pressure variation of approximately 2%. In contrast, both the existing and CIS methods exhibited overlapping clusters and failed to form discrete clusters. In particular, the overlap between the clusters produced by the existing method was large; consequently, in the ensemble-averaged pressure distribution, the vortex suddenly disappeared at times and exhibited reverse flow at other times. The vortex was also weakened in the ensemble-averaged pressure distribution obtained by the CIS method. These outcomes highlight the superior performance of the proposed method in clustering periodic phenomena. The clustering algorithm using an annealing machine is promising for large datasets. However, the calculation of similarity or distance is still conducted on conventional computers, which is a major limitation that must be resolved when handling large datasets.

Methods

Proposed method for time-series clustering

We propose a clustering method using an annealing machine, focusing on the raw-data-based and feature-based approaches for time-series data analysis. We consider a clustering problem in which a given dataset of n time-series data \(\mathbf{X}=\{\mathbf{x}_1,\mathbf{x}_2,\cdots,\mathbf{x}_n\}\), where \(\mathbf{x}_i\) is a column vector, is classified into \(k\) clusters \(c=\{c_1,c_2,\cdots,c_k\}\). Since the Digital Annealer (DA) is designed to solve QUBO problems, the objective function is expressed as a QUBO problem. The Hamiltonian for the clustering problem is described as follows45,46:

$$\mathcal{H}=\sum_{c}\sum_{i\neq j}d_{i,j}\,q_{g,i}\,q_{g,j}-\lambda_{1}\sum_{\mathbf{X}}\left(\sum_{c}q_{g,j}-1\right)^{2}\tag{1}$$

where \(q_{g,i}=1\) when \(\mathbf{x}_i\) belongs to cluster \(c_g\) and \(q_{g,i}=0\) when \(\mathbf{x}_i\) does not belong to cluster \(c_g\), that is,

$$q_{g,i}=\begin{cases}1: & \mathbf{x}_{i}\in c_{g}\\ 0: & \mathbf{x}_{i}\notin c_{g}\end{cases}\tag{2}$$

The similarity (or the inverse of the distance) between \(\mathbf{x}_i\) and \(\mathbf{x}_j\) is denoted by \(d_{i,j}\), and \(\lambda_1\) is a hyperparameter. The sum \(\sum_{i\neq j}d_{i,j}\,q_{g,i}\,q_{g,j}\) represents the total similarity (or inverse distance) between pairs of data points belonging to the same cluster, and \(\sum_{c}\) in the first term of Eq. (1) denotes the sum over all clusters. Clustering is performed by minimizing \(-\mathcal{H}\), i.e.,

$$\min\;-\sum_{c}\sum_{i\neq j}d_{i,j}\,q_{g,i}\,q_{g,j}+\lambda_{1}\sum_{\mathbf{X}}\left(\sum_{c}q_{g,j}-1\right)^{2}\tag{3}$$

The second term in Eq. (3) is a constraint term ensuring that each data point belongs to only one cluster45,46. The value of λ1 determines the strictness of this constraint: a smaller value allows some data points to be treated as outliers and left unassigned to any cluster. This study considers the following minimization problem:

$$\min\;-\sum_{c}\sum_{i\neq j}d_{i,j}\,q_{g,i}\,q_{g,j}+\lambda_{1}\sum_{\mathbf{X}}\left(\sum_{c}q_{g,j}-1\right)^{2}+\lambda_{2}\sum_{c}\left(\sum_{j}q_{g,j}\right)^{2}\tag{4}$$

where the third term in Eq. (4) adjusts the number of data points classified into each cluster. To simplify the notation, we write \(S_g=\sum_{j}q_{g,j}\) for the number of data points belonging to cluster \(c_g\). The third term of Eq. (4) is then written as

$$\sum_{c}\left(\sum_{j}q_{g,j}\right)^{2}=\sum_{c}S_{g}^{2}\tag{5}$$

Let μ and σ2 denote the mean and variance of the number of data points belonging to each cluster, respectively. Noting that \(\sum_{c}S_{g}=k\mu\) and \(\sum_{c}(S_{g}-\mu)^{2}=k\sigma^{2}\), Eq. (5) can be written as

$$\sum_{c}S_{g}^{2}=\sum_{c}\left\{\left(S_{g}-\mu\right)^{2}+2S_{g}\mu-\mu^{2}\right\}=k\left(\sigma^{2}+\mu^{2}\right)\tag{6}$$

When m data points are classified into one of the \(k\) clusters, the mean \(\mu=m/k\) is constant. Thus, as the variance decreases, i.e., as the third term in Eq. (4) becomes smaller, the data points are classified more evenly into the clusters. Likewise, as the total number of classified data points decreases, the mean μ decreases and the third term also becomes smaller. In other words, adding this term allows the number of data points in each cluster to be adjusted simply by varying λ2; the effect of λ2 on the clustering of the flow measurement dataset is discussed in Supplementary Note 2. Such an adjustment is difficult in many existing clustering algorithms.
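To make the mapping concrete, the following is a minimal sketch of how Eq. (4) can be assembled into a QUBO matrix with NumPy; the DA3-specific API and the integer scaling described below are omitted, and the function name build_clustering_qubo is ours for illustration. The resulting matrix can be passed to any QUBO solver.

```python
import numpy as np

def build_clustering_qubo(d, k, lam1, lam2):
    """Assemble the QUBO matrix for Eq. (4).

    d    : (n, n) symmetric similarity matrix (diagonal ignored)
    k    : number of clusters
    lam1 : weight of the one-cluster-per-point constraint
    lam2 : weight of the cluster-size balancing term

    Binary variable q[g, i] is flattened to index g * n + i. Returns an
    upper-triangular Q such that the energy is sum_{u <= v} Q[u, v] x_u x_v
    (the constant n * lam1 from the one-hot penalty is dropped).
    """
    n = d.shape[0]
    Q = np.zeros((n * k, n * k))
    for g in range(k):
        for i in range(n):
            u = g * n + i
            # Diagonal: -lam1 from the one-hot penalty and +lam2 from the
            # balancing term (q^2 = q for binary variables).
            Q[u, u] += -lam1 + lam2
            # Same cluster, different data points: -d_ij reward plus lam2;
            # ordered sums in Eq. (4) count each pair twice, hence the 2x.
            for j in range(i + 1, n):
                Q[u, g * n + j] += 2.0 * (-d[i, j] + lam2)
            # Same data point, different clusters: one-hot penalty +2*lam1.
            for h in range(g + 1, k):
                Q[u, h * n + i] += 2.0 * lam1
    return Q
```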

Time-series dataset for demonstration

We applied the proposed clustering method to two time-series datasets. The first, named “crop,” was obtained from the UEA & UCR time-series classification repository37,38,39. These time-series data were derived from images taken by the FORMOSAT-2 satellite. The dataset consists of 24 classes corresponding to an agricultural land-cover map, and each data point corresponds to its temporal evolution. The time-series length was 46, and the data were one-dimensional. The data were standardized to have a mean of 0 and a variance of 1. We compared the clustering results obtained by the proposed method with those obtained by the “TimeSeriesKMeans” function in “tslearn”12. The parameters of the function were set to general settings as follows: the number of clusters was 24, the metric (distance between data points) was Euclidean, the initialization method was k-means++, and the other parameters were left at their default values. This is a standard time-series clustering method. In the proposed method, the Euclidean distance was also used as the metric, and its inverse was used to minimize the first term in Eq. (4). The data were multiplied by \(10^4\) before being transferred to DA3 because it handles only integer values. Since all data points should belong to one of the clusters in this dataset, the parameter λ1 was set approximately 100 times larger than λ2; the actual values used for the calculation are given in the code attached in Supplementary Note 3. Under this condition, a solution in which all data points belonged to one of the clusters (i.e., the second term of Eq. (4) was zero) was obtained.
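The baseline configuration described above can be reproduced in a few lines. The following is a minimal sketch, assuming tslearn's bundled UCR/UEA downloader and the repository name "Crop" for this dataset:

```python
from tslearn.datasets import UCR_UEA_datasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.clustering import TimeSeriesKMeans

# Load the "crop" dataset from the UEA & UCR repository.
X_train, y_train, X_test, y_test = UCR_UEA_datasets().load_dataset("Crop")

# Standardize each series to zero mean and unit variance.
X = TimeSeriesScalerMeanVariance(mu=0.0, std=1.0).fit_transform(X_train)

# k-means clustering with the Euclidean metric and k-means++ initialization.
km = TimeSeriesKMeans(n_clusters=24, metric="euclidean",
                      init="k-means++", random_state=0)
labels = km.fit_predict(X)
```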

The second dataset used in this study was the flow image data obtained in our previous studies40,41,42, measured using the pressure-sensitive paint (PSP) method47,48,49. The PSP method is a pressure-distribution measurement technique based on the oxygen quenching of the phosphorescence emitted from dyes incorporated into the PSP coating. The measured data were the pressure distribution induced by the Kármán vortex street behind a square cylinder, as shown in Fig. 3a. The data size was 780 × 780 spatial grid points, the flow velocity was 50 m/s, and the Reynolds number was \(1.1\times 10^5\). The number of data points (snapshots) was 720. The pressure fluctuation was difficult to detect with the PSP technique because the corresponding variation in phosphorescence intensity was small; consequently, the measured pressure contained noticeable noise that had to be reduced. Since the Kármán vortex street is a periodic phenomenon, the data were classified into several phase groups and averaged within these groups to reduce the noise and extract the underlying patterns, which is one of the purposes of time-series clustering. The cosine similarity was used to assess the similarity between the data because we focused on the phase information of the vortex. Since the PSP data were time-series image data with two spatial dimensions and one temporal dimension, each pressure distribution was reshaped into a column vector. Consequently, the time-series PSP data are written as n time-series data \(\mathbf{X}=\{\mathbf{x}_1,\mathbf{x}_2,\cdots,\mathbf{x}_n\}\), where \(\mathbf{x}_i\) is a vector corresponding to a reshaped pressure distribution. Because the measured PSP data contain significant noise (SNR ~ 1), denoised data were used for the similarity calculation. Following the literature50, a dataset with reduced noise can be obtained by truncated singular value decomposition (SVD). We consider the data matrix \(\mathbf{Y}=[\mathbf{x}_1\,\mathbf{x}_2\cdots\mathbf{x}_n]\), obtained by arranging the vectors \(\mathbf{x}_i\) in time-series order. SVD provides the following representation:

$$\mathbf{Y}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\tag{7}$$

where the matrices U and V are unitary matrices and the superscript ⊤ denotes the transpose. The matrix Σ is a diagonal matrix of singular values arranged in descending order. The data can be approximated by a truncated SVD51 as follows:

$$\widetilde{\mathbf{Y}}=\widetilde{\mathbf{U}}\,\widetilde{\boldsymbol{\Sigma}}\,\widetilde{\mathbf{V}}^{\top}\tag{8}$$

where \(\widetilde{\boldsymbol{\Sigma}}\) is the leading r × r diagonal block of \(\boldsymbol{\Sigma}\) and r is the truncation rank. The matrices \(\widetilde{\mathbf{U}}\) and \(\widetilde{\mathbf{V}}\) are the reduced matrices corresponding to \(\widetilde{\boldsymbol{\Sigma}}\). We thus obtain the noise-reduced time-series data matrix \(\widetilde{\mathbf{Y}}=[\widetilde{\mathbf{x}}_1\,\widetilde{\mathbf{x}}_2\cdots\widetilde{\mathbf{x}}_n]\), or equivalently the time-series data \(\widetilde{\mathbf{X}}=\{\widetilde{\mathbf{x}}_1,\widetilde{\mathbf{x}}_2,\cdots,\widetilde{\mathbf{x}}_n\}\). We set r = 5, a commonly used truncation value. Subsequently, the cosine similarity \(\cos\theta_{i,j}\) was calculated as follows:

$$\cos\theta_{i,j}=\frac{\left\langle\widetilde{\mathbf{x}}_{i},\widetilde{\mathbf{x}}_{j}\right\rangle}{\left\Vert\widetilde{\mathbf{x}}_{i}\right\Vert_{2}\left\Vert\widetilde{\mathbf{x}}_{j}\right\Vert_{2}}\tag{9}$$

where \(\langle\widetilde{\mathbf{x}}_i,\widetilde{\mathbf{x}}_j\rangle\) is the inner product of \(\widetilde{\mathbf{x}}_i\) and \(\widetilde{\mathbf{x}}_j\), and \(\Vert\widetilde{\mathbf{x}}_i\Vert_2\) is the \(\ell_2\) norm of \(\widetilde{\mathbf{x}}_i\). In the similarity calculation, we considered only the pressure distribution behind the square cylinder to reduce the computational cost (see Fig. 3b). Substituting \(d_{i,j}=\cos\theta_{i,j}\) into Eq. (4), we performed the clustering on DA3. Since these data were also multiplied by \(10^4\) before being transferred to DA3, \(d_{i,j}\sim 10^3\). The parameter λ1 was set 40 times larger than λ2 (\(\lambda_1=1\times 10^6\)), ensuring that each term in Eq. (4) was of similar magnitude. Under this condition, some data were classified as outliers. The images within the same cluster were ensemble averaged to extract the underlying patterns. We note that the original image data X were averaged to extract the patterns; the truncated dataset \(\widetilde{\mathbf{X}}\) was not used for averaging.
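As a concrete illustration of the two preceding steps, the following is a minimal sketch of the rank-r denoising of Eq. (8) and the pairwise cosine similarity of Eq. (9) with NumPy, assuming the data matrix has already been cropped to the region behind the cylinder; the function names are ours:

```python
import numpy as np

def denoise_truncated_svd(Y, r=5):
    """Rank-r approximation of the data matrix Y = [x1 x2 ... xn]
    (pixels x snapshots), keeping the r largest singular values (Eq. (8))."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

def cosine_similarity_matrix(Y_tilde):
    """Pairwise cosine similarity (Eq. (9)) between the columns
    (snapshots) of the denoised data matrix."""
    Xn = Y_tilde / np.linalg.norm(Y_tilde, axis=0, keepdims=True)
    return Xn.T @ Xn  # entry (i, j) is cos(theta_ij)
```

The resulting similarity matrix, after integer scaling, is what is substituted for \(d_{i,j}\) in Eq. (4).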

Considering that \(\cos\theta_{i,j}\) lies within the range of −1 to 1, we introduce the following quantity \(r_{i,j}\), which ranges from 0 to 1:

$$r_{i,j}=\frac{\cos\theta_{i,j}+1}{2}=\cos^{2}\frac{\theta_{i,j}}{2}=1-\sin^{2}\frac{\theta_{i,j}}{2}\tag{10}$$

where \(1-r_{i,j}=0\) when i = j (identical data). We therefore define the distance between data points as \(|\sin(\theta_{i,j}/2)|\), where |a| denotes the absolute value of a and \(\theta_{i,j}\) is the angle between the data vectors. The maximum value of this distance is unity. This distance metric was used in the MDS calculation.
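For the two-dimensional visualization in Fig. 4, the similarity matrix can be converted to this distance and embedded by metric MDS. The following sketch uses scikit-learn's MDS with a precomputed dissimilarity matrix; this is our assumption, as the embedding implementation is not specified here:

```python
import numpy as np
from sklearn.manifold import MDS

def mds_embedding(cos_theta, random_state=0):
    """2-D MDS embedding from the |sin(theta/2)| distance metric."""
    c = np.clip(cos_theta, -1.0, 1.0)  # numerical safety
    D = np.sqrt((1.0 - c) / 2.0)       # |sin(theta/2)| via the half-angle identity
    np.fill_diagonal(D, 0.0)
    return MDS(n_components=2, dissimilarity="precomputed",
               random_state=random_state).fit_transform(D)
```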

The time-series data were also classified by the “TimeSeriesKMeans” function in “tslearn” described above. In addition, we used the CIS method43,44, a specialized method designed specifically for PSP measurements. In the CIS method, the time-series data are classified into several phase groups based on the pressure measured by a pressure transducer, a point sensor with a higher sampling rate than PSP. In other words, the CIS method requires an additional sensor for clustering, and this reliance on an extra sensor can be considered one of its limitations.