Research on stress curve clustering algorithm of Fiber Bragg grating sensor

The global stress distribution and state parameter analysis of the building's main structure is an urgent problem to be solved in the online state assessment technology of building structure health. In this paper, a stress curve clustering algorithm of fiber Bragg grating stress sensor based on density clustering algorithm is proposed. To solve the problem of large dimension and sparse sample space of sensor stress curve, the distance between samples is measured based on improved cosine similarity. Aiming at the problem of low efficiency and poor effect of traditional clustering algorithm, density clustering algorithm based on mutual nearest neighbor is used to cluster. Finally, the classification of the daily stress load characteristics of the sensor is realized, which provides a basis for constructing the mathematical analysis model of building health. The experimental results show that the stress curve clustering method proposed in this paper is better than the latest clustering algorithms such as HDBSCAN, CBKM, K-mean++,FINCH and NPIR, and is suitable for the feature classification of stress curves of fiber Bragg grating sensors.

In Eq. ( 2), ε is the axial strain of the fiber grating, α is the thermal expansion coefficient of the fiber, ξ is the thermal optical coefficient of the fiber, T is the temperature change.P e is called the effective photoelastic coef- ficient, and can be expressed as In Eq. ( 3), υ is the Poisson's ratio.When using a fiber with pure silica core and boron dioxide doped cladding, P 11 = 0.121 , P 12 = 0.27 , υ = 0.17 .When the effective refractive index of the fiber is n eff = 1.46 , the effective photoelastic coefficient P e = 0.22 .When B is measured by a fiber grating demodulator, its axial stress change ε can be expressed as 3

Stress data processing method of FBG sensor
There are many methods for structural health monitoring in different scenarios.Kahandawa et al. 3 proposed a fixed filter decoding system and an integrated artificial neural network algorithm for extracting strain from embedded FBG sensor.On the basis of Kalman filter, Song et al. 4 taked the strain value measured by FBG sensor as the observed signal.Through the gain matrix, new information sequence and covariance matrix generated by Kalman filter, the least square algorithm is used to estimate the load size in real time.However, this algorithm can only calculate the real-time local load of the building, and cannot give the current overall health status of the building based on the past sensor data.Zhang et al. 5 proposed a model reconstruction predicting algorithm based on PSO-SVR to achieve the self-repairing of the FBG sensor network in SHM system.Through this algorithm, the reliability and survivability of the FBG-based SHM system is enhanced if partial FBG sensors are invalid.Stotaw Talbachew Hayle et al. 6 used deep learning technique to accurately identify the Bragg wavelength of FBGs in the condition of the partially or fully overlapped spectra to improve the reliability and detection accuracy of the sensor system even the number of overlaps FBGs spectra increases.Jiang et al. 7 developed an FBG sensingbased structural health monitoring system for Chinese ancient Chuan-dou-type timber buildings that aims at monitoring structural deformation.Methods proposed by Zhang et al. 8 can accurately identify structural macro strain modal shapes and are much more robust than traditional modal parameter based indexes for structural damage detection.Sierra-Pérez et al. 9 presented a damage detection methodology based on optimal baseline selection by means of clustering techniques which includes the use of hierarchical nonlinear PCA as a nonlinear modeling technique.Luckey et al. 10 proposed conceptual explainable artificial intelligence framework that provide a basis for improving ML acceptance and transparency and therefore increase trust in ML algorithms implemented in SHM applications.
Most of the above methods used FBG sensors to monitor loads in different scenarios in real time, without assessing or predicting the overall health status of buildings from a global perspective.So this paper puts forward a scheme to solve this problem.

Clustering algorithms
Clustering algorithm is a commonly used unsupervised learning algorithm and can be used to find hidden patterns in datasets.The purpose of clustering is to divide the samples in a dataset into several disjoint subsets.The data of fiber grating building stress sensor is usually collected at a certain time interval.The stress data collected from the same sensing point exists in the form of curves.In this paper, the daily load stress curve of stress sensor is taken as a sample, and the clustering algorithm is used to analyze it, and the mathematical model of building health is built.Traditional clustering algorithms have low clustering efficiency and poor clustering effect [11][12][13] .For example, the clustering effect of K-means algorithm 14 depends heavily on the selection of cluster centers.The clustering results often fall into local optimum due to inappropriate initial cluster centers or the influence of noise and boundary points, and cannot adapt to non-convex shape data, and the K-means algorithm also needs to specify the number of clusters 15,16 .CBKM 17

Density clustering algorithm based on mutual nearest neighbor
This subsection mainly describes the specific process of mutual nearest neighbor-based density estimation clustering.Given the number of neighbors k, k is a positive integer, and the distance between x and y is d x, y .Assuming that the sample set is S, the set of k nearest neighbors of a sample point x in the set, that is, the set of k sample points closest to the sample point x is expressed as: The k-inverse neighbor set of sample point x is expressed as: The k-mutual neighbor set of sample point x is expressed as: Algorithm 1 describes the overall clustering algorithm improved from 28 , followed by a detailed discussion of its time complexity Vol:.( 1234567890  MakeClusterBackbones(Algorithm2): After removing the noise points with |MNN k p i | = 0 , the remaining points are divided into backbone points and non-backbone points.Backbone points are used to construct the skeleton of a cluster and non-backbone points are processed in the second stage.When two backbone points are mutual neighbors ,they are connected to form the skeleton of the same cluster.
(2) Merge Non Backbones Points (Algorithm3): For each non-backbone point w i , find out the MNN k (w i ) of that contains the most backbone points belongs to certain cluster.If there are some backbone points belongs to a certain cluster in MNN k (w i ) and the number of that is greater than or equal to k/5 , it is determined that it belongs to the certain cluster.Otherwise, w i is marked as a noise.

Time complexity
Suppose that the number of points in the dataset is N, the number of nearest neighbors is k, the dimension of the data is D, and the number of clusters constructed is C. The time complexity for constructing the distance matrix is O NlogN 28 and 29 .In algorithm 1, it takes K iterations to calculate both NN k p i and RNN k p i of each point, so the time complexity of this part is O(Nk) 30 .The time complexity analysis of the remaining part of algorithm 1 is as follows:

Pattern recognition algorithm for stress curves of fiber bragg grating sensors
Figure 2 shows the pattern recognition algorithm flow of the stress curve of the fiber grating building stress sensor based on the mutual neighbor clustering algorithm, in which the clustering adopts the algorithm described in 3.1.
Firstly, the daily load data set of the building stress sensor is obtained.The data set contains P samples, and each sample has a data set matrix of Q time point attributes.The building health monitoring system collects sample data through fiber grating sensors installed in the building, and each sample is the stress daily load of a sensor.
Secondly, preprocess the acquired daily load data of the building stress sensor to obtain an initial cluster, wherein the preprocessing progress includes missing value processing, data standardization, data regularization and data dimensionality reduction on the daily load data of the building stress sensor.After this step, the initial cluster obtained is a dataset matrix of p samples with q attributes per sample.
Specifically, the missing value processing is to delete the samples with few valid values, and complete the missing values of the samples with many valid values.Of course, when deleting attributes with few valid values, redundant attributes can be deleted together.
In the process of deleting samples, if n samples are deleted, p samples remain, where p = P − n.In addition, there are various ways to supplement the missing values.In this application, the average of the existing validity is taken as the filling value of the missing value.Those skilled in the art can choose other supplementary methods, which will not affect the subsequent analysis process.

Start Obtain daily load data from building stress sensors
Preprocessing building stress sensor daily load data to obtain initial clusters Perform dimensionality reduction processing on the initial cluster to obtain a dimensionality reduction cluster; Clustering algorithm 1 is used to cluster the dimensionality reduction clusters to obtain sensor clusters with similar characteristics, and then to obtain the clustering features of sensor daily load.This formula realizes the proportional scaling of the original data, where X norm is the normalized data, and X is the original data.

Output pattern recognition result
Data regularization is to subtract each attribute from the corresponding mean of the attribute, and then divide by the corresponding variance of the attribute.After standardization and regularization, the data of each attribute are clustered around 0 and the variance is 1, that is, the obtained sample data has zero mean and unit variance.
Data dimensionality reduction is to use PCA (Principal Component Analysis), to reduce the dimensionality of the data set, and obtain the processed dimensionality reduction cluster S. In the process of dimensionality reduction, if the number of dimensionality reduction is q, then the dimensionality reduction cluster S after dimensionality reduction is a dataset matrix with p samples and each sample has q attributes.
A clustering algorithm is used to classify the data in the data set S, so as to identify sensor groups with similar characteristics.Using the density estimation clustering algorithm based on mutual neighbors, the p samples in the dataset are divided into M classes, where M is a positive integer.Since the dimension of the sample data after dimensionality reduction is still large, in order to avoid the problem of sparse sample space, the distance between the samples is calculated by the improved cosine similarity: The Calinski-Harabasz (CH) value is used to evaluate the clustering effect, and the result with the largest CH value is selected, and finally M classes of sensor clusters are obtained.The specific calculation process of Calinski-Harabasz (CH) is as follows: In the above formula, m represents the number of clusters, h represents the current class, trB(h) represents the trace of the inter-class dispersion matrix, and trW(h) represents the trace of the intra-class dispersion matrix.The larger the CH value, the closer the samples within the class, the more dispersed the samples between different classes, that is, the better clustering results.

Experimental results
In order to verify the feasibility and superiority of the proposed algorithm, K-Means++, CBKM, HDBSCAN, FINCH, NPIR and the proposed algorithm are used to cluster the data sets with known classification labels and the actual collected data of fiber grating sensor, and then the clustering effect is analyzed.

Analysis of clustering results of actual collected data
The experimental device platform used in this paper to collect fiber grating sensor data consists of fiber demodulator, FBG sensor, and stress generator.The experimental platform is shown in Fig. 9.The FBG sensor is placed in the stress generator device which can apply different types of force shocks to the FBG sensor through programming.The center wavelength of the FBG is demodulated through the GM8050C fiber demodulator to calculate the stress value.The GM8050C fiber demodulator has 4 channels, each of which can be connected to 16 FBG stress sensors in series.
In this paper, the stress curves of 4 FBG sensors are collected as data samples through the experimental device platform.The collection time interval is 2 min and the collection period is 24 h.The daily load data of the stress sensor consists of 720 stress data.Each stress curve sample is collected continuously for 3 months.The proposed algorithm, K-Means++, HDBSCAN,CBKM,FINCH and NPIR algorithms were used to compare the clustering effect of the daily load data of S1, S2, S3, S4 sensors and the mixed data of the four sensors respectively.The optimal results of the Calinski-Harabasz(CH) index obtained by the experimental test are shown in Table 1.

Result and discussion
Experimental results show that proposed algorithm greatly outperforms other algorithms on many datasets such as A1, A2, Aggregation, Compound, D31, Pathbased, Mouse, R15, S1.The performance of NPIR, HDBSCAN and CBKM on Frame, Jain, Spiral, G2-2-10, Vary density datasets is almost equal to that of the proposed algorithms.This is due to the framework adopted by the proposed algorithm in the clustering process can distinguish real (8) clusters in noisy data, even if they are connected to each other or even overlap, provided that they have distinguishable densities.The clustering effect of HDBSCAN deteriorates at different densities between clusters because of the fixed neighborhood radius.Centroid based algorithms like K-Means++and CBKM fail when the centroid of a cluster is closer to the data points belongs to other cluster than to that of its own cluster.FINCH will output incorrect clustering results due to inappropriate location of the merged cluster center during hierarchical merging.NPIR perform poorly when the data is noisy or the cluster density is different and slightly overlapped.The process of calculating the MNN k of each point in the proposed algorithm is actually constructing the structure of mutual neighbor graph, and then completing the clustering by dividing the subgraphs.The structure of the mutual nearest neighbor graph is determined only by the parameter K, and there are no hyperparameters related to the distance between points.Therefore, the algorithm still performs well when the cluster density is different or there are connections between clusters.The last row of the Table1 shows the ranking of the proposed algorithm in the performance of all six algorithms based on Calinski-Harabasz (CH) index.The results show that the CH value of the algorithm in this paper are both greater than that of other five algorithms, which proves that the clustering effect of the algorithm in this paper is suitable for the actual sensor daily load curve.In general, it can be seen that the clustering effect of the  proposed algorithm in this paper is better than that other five algorithms, and the data of the fiber grating stress sensor can be clustered with higher accuracy.However, the proposed algorithm still has problem: (1)the k value for the nearest neighbor in the the algorithm still cannot be determined adaptively.(2)The time complexity of the algorithm is still unacceptably high when large samples need to be processed.

Conclusions
Aiming at the clustering problem of the stress curve of the fiber grating stress sensor, this paper obtains the initial cluster by preprocessing the stress data of the fiber grating building stress sensor.The initial cluster is dimensionally reduced to obtain a dimensionality-reduced cluster.The initial clusters are clustered by the clustering algorithm based on mutual neighbors, and the clustering effectiveness index is used to evaluate the clustering results, and finally sensor groups with similar characteristics are obtained.The cosine similarity is used to calculate the sample distance based on the mutual neighbor clustering algorithm, which solves the problem of sparse sample space caused by the large dimension of the sample data.The experimental results show that compared with the other five clustering algorithms, the proposed algorithm has a better pattern recognition effect.

Algorithm1:
Clustering algorithm based on mutual nearest neighbor Input:Data points },k //size of the neighborhood of a point Output: //set of generated clusters Intialization: let backbone point set , non-backbone point set 1 caculate the of each point use BallTree algorithm 2 construct digraph ⃗ ) where ⃗ ⃗ ( ) Algorithm 1 consists of the following two stages: (1) The first stage: P are sorted based on |RNN k (x)| in a descending order.Points in dataset which have the property of |MNN k p i | = 0 are labeled as noise points.

Figure 2 .
Figure 2. Pattern recognition algorithm flow of the stress curve.

Figure 3 .
Figure 3.Comparison of NMI scores in standard data sets.

Figure 4 .
Figure 4. Comparison of AMI scores in standard data sets.

Figure 5 .
Figure 5.Comparison of ARI scores in standard data sets.

Figure 6 .
Figure 6.Comparison of purity scores in standard data sets.

Figure 7 .
Figure 7.Comparison of homogeneity scores in standard data sets.

Figure 8 .
Figure 8.Comparison of completeness scores in standard data sets.
is an improved algorithm of KMeans algorithm.It improves

Table 1 .
The CH results of the proposed algorithm and other five algorithms.