A power quality disturbances classification method based on multi-modal parallel feature extraction

Power quality disturbances (PQDs) are an important problem affecting the safe and stable operation of power systems. Traditional single-modal methods not only require a large number of parameters but also usually focus on only one type of feature, so the extracted information is incomplete and the complex, diverse PQD types in modern power systems are difficult to identify. To address this, this paper proposes a multi-modal parallel feature extraction and classification model. The model attends to both the temporal and spatial features of PQDs, which effectively improves classification accuracy, and adopts a lightweight design to reduce the number of model parameters. A long short-term memory (LSTM) network extracts temporal features from the one-dimensional time-series modality of the PQD, while a lightweight residual network (LResNet) is designed to extract spatial features from its two-dimensional image modality. The two feature sets are then fused into a multi-modal spatio-temporal feature (MSTF), which is finally input to a support vector machine (SVM) for classification. Simulation results on 20 PQD signal types show that the proposed multi-modal model reaches a classification accuracy of 99.94% with a parameter size of only 0.08 MB. Compared with ResNet18, accuracy is improved by 2.55% and the number of parameters is reduced by 99.25%.

Methods represented by convolutional neural networks (CNNs) transform time-series signals into two-dimensional images and then automatically extract spatial features from the images 23. Ref. 24 used continuous wavelets to convert PQDs into color images and then applied a Bayesian CNN for classification. This method achieves reasonable results but ignores the influence of some temporal features. Methods represented by recurrent neural networks (RNNs) extract temporal features from one-dimensional PQD signals and then perform classification 25. Ref. 26 proposes introducing a dual attention mechanism into Bi-LSTM to increase the weight of important features, reducing computational complexity and improving accuracy. This approach effectively extracts the temporal features of the signal but does not consider the impact of spatial features on classification. Ref. 27 proposes a hybrid neural network model that converts the PQD into a disturbance image, uses a CNN to automatically extract spatial features of the image, and then inputs the temporal features into a gated recurrent unit (GRU) for classification. This method extracts the spatio-temporal characteristics of the PQD signal but may lose some temporal features during image conversion. Although the above methods eliminate the interference of human factors, these single-modality models are prone to varying degrees of feature loss during feature extraction, which affects PQD classification. Meanwhile, to achieve better classification results, these deep learning models keep increasing depth by stacking layers, which significantly increases the number of parameters and the computational burden.
In recent years, inspired by humans' multisensory (visual and auditory) perception of the world, research on classification methods has gradually shifted from unimodal to multi-modal approaches 28. Multimodal data fusion aims to combine differently distributed and different types of data, including images, audio, and measurement signals, in a single space 29. It is currently used mostly in medical diagnostics, acoustics, and vision [30][31][32], while research on PQD recognition still suffers from limited feature extraction methods and large numbers of model parameters 33. Multimodal information is obtained by fusing information from different modalities, and it carries more information than any single modality. Most previous studies adopt a single-modal approach, extracting features from either one-dimensional signals or two-dimensional images. However, the PQD types in modern power systems are complex and diverse, and most single-modal methods are prone to feature loss during extraction, so they cannot fully capture the characteristics of the signal 34. To address these issues, this paper combines both types of data and proposes a PQD classification method based on multimodal LResNet-LSTM parallel feature extraction. The model simultaneously takes the one-dimensional time-series signal and the two-dimensional disturbance image of the PQD as input. LSTM and LResNet extract the temporal and spatial features, which are then fused into a multimodal spatio-temporal feature (MSTF) and finally input into an SVM for classification.
Based on the above research, this paper proposes a PQD classification method built on multi-modal LResNet-LSTM parallel feature extraction: LResNet and LSTM extract spatial and temporal features in parallel, the features are fused into the MSTF, and the MSTF is finally input into the SVM for classification.
The main contributions of this paper are as follows:

Parallel feature extraction module
Because LSTM must finish computing one time step before it can compute the next when processing sequences, it cannot be directly combined with LResNet into a single parallel computing model. Therefore, the PFE module is split into two separate sub-modules, SFE and TFE, which extract the spatial and temporal features of the PQD signal respectively.

SFE module
The SFE module consists of two parts: image conversion and feature extraction. Image conversion is performed with the Gramian angular field (GAF), which maps the PQD signal into the polar coordinate system and encodes it into a disturbance image 35. This method has been widely used to convert 1-D signals into 2-D images. The conversion process is as follows.
Step 1: The time-series signal x(t) is mapped into the polar coordinate system, where the encoded signal value determines the angle and r is the radius.
Step 2: Convert to Gram matrix.
Step 3: The imagesc function in MATLAB is called to render the matrix as an image; each matrix element specifies the color of one pixel.
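The three steps above can be sketched in code. The Gramian angular summation field (GASF) variant is assumed here, since the excerpt does not state which GAF variant the paper uses:

```python
import numpy as np

def gasf_image(x):
    """Convert a 1-D signal into a Gramian angular summation field (GASF) image.

    A minimal sketch of the GAF encoding described above (GASF variant assumed).
    """
    x = np.asarray(x, dtype=float)
    # Rescale the signal to [-1, 1] so that arccos is defined
    x_tilde = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    # Polar encoding: the value gives the angle, the time stamp the radius
    phi = np.arccos(np.clip(x_tilde, -1.0, 1.0))
    # Gram matrix of pairwise angle sums: G[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

# One 50 Hz cycle sampled at 3 kHz (the paper's sampling frequency)
signal = np.sin(2 * np.pi * 50 * np.arange(60) / 3000)
img = gasf_image(signal)
print(img.shape)  # (60, 60): one pixel per pair of time steps
```

The resulting matrix can then be rendered as a color image (the role of MATLAB's imagesc in Step 3).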
This article designs a lightweight residual network that focuses on the inherent relationships in the spatial domain, aiming to extract the key information in the image. The structure of LResNet is shown in Fig. 2, where BN denotes batch normalization and Swish is an activation function. ReLU is a commonly used activation function in neural networks. For positive inputs its derivative is 1, so the gradient does not decay as quickly as with the sigmoid function, which speeds up training and mitigates vanishing gradients. However, when the ReLU input is negative the output is always 0, so the unit is never activated. Swish overcomes this problem of ReLU being inactive for negative inputs 36. The Swish function is given in Eq. (3).
where β is a constant.
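A minimal sketch of Eq. (3), the Swish function x · sigmoid(βx):

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation, Eq. (3): x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))

# Unlike ReLU, negative inputs yield a small non-zero output
print(np.round(swish(np.array([-2.0, 0.0, 2.0])), 4))
```

The non-zero response for negative inputs is exactly the property the text cites for replacing ReLU in LResNet.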
The PQD image is first passed through a convolution operation to generate a 112 × 112 × 64 feature map. The maximum pooling layer compresses the input feature map and extracts the main features, while also enhancing the robustness of the model. To keep the model lightweight, each residual block uses two depthwise separable convolutions (GConv).
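The parameter saving of a depthwise separable convolution over a standard convolution can be checked with a quick count. The channel widths below are illustrative assumptions, since the excerpt does not give LResNet's exact layer sizes:

```python
def conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Assumed example layer: 3x3 kernels, 64 input channels, 128 output channels
std = conv_params(3, 64, 128)                  # 73,728 parameters
sep = depthwise_separable_params(3, 64, 128)   # 576 + 8,192 = 8,768 parameters
print(std, sep, round(100 * (1 - sep / std), 1))  # ~88% fewer parameters
```

Savings of this magnitude per block are what make the overall 0.08 MB model size plausible.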

TFE module
The PQD signal is fed into the TFE module to extract temporal features, which first requires normalizing the raw data. In this paper, min-max normalization is used to map the disturbance signal x into [0, 1], as given in Eq. (4).
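Assuming Eq. (4) is the standard min-max formula x' = (x − min x)/(max x − min x), the normalization is a one-liner:

```python
import numpy as np

def min_max_normalize(x):
    """Min-max normalization to [0, 1] (standard form assumed for Eq. (4))."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(min_max_normalize([2.0, 4.0, 6.0]).tolist())  # [0.0, 0.5, 1.0]
```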
Then the normalized data are input into the LSTM for training to extract temporal features. LSTM is a neural network designed to overcome the vanishing-gradient problem of recurrent neural networks 37,38. The input gate of the LSTM reads the data, the forget gate discards useless information while keeping valid information, and the output gate passes the valid information on to the next time step. The calculation formulas are shown in Eqs. (5)-(10). At time t, x_t is the input, σ is the sigmoid function, f_t is the forget gate, i_t is the input gate, g_t is the output of the tanh function, and o_t is the output gate; C_{t-1} is the carrier of the previous round of global information and h_{t-1} is the intermediate state output of the previous round, while C_t and h_t are the corresponding quantities of this round. W_f, W_i, W_g, W_o are the weights of the corresponding gates, and b_f, b_i, b_g, b_o their biases.
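A minimal sketch of one LSTM time step implementing the standard gate computations that Eqs. (5)-(10) describe; the layer sizes and weight initialization below are hypothetical, not from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step with forget, input, and output gates.

    Each weight W_f, W_i, W_g, W_o acts on the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])   # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])   # input gate
    g_t = np.tanh(W["g"] @ z + b["g"])   # candidate from tanh
    c_t = f_t * c_prev + i_t * g_t       # new global-information carrier C_t
    o_t = sigmoid(W["o"] @ z + b["o"])   # output gate
    h_t = o_t * np.tanh(c_t)             # intermediate state output h_t
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4                    # hypothetical sizes
W = {k: rng.standard_normal((n_hidden, n_hidden + n_in)) * 0.1 for k in "figo"}
b = {k: np.zeros(n_hidden) for k in "figo"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [0.1, 0.5, -0.3]:               # a short input sequence
    h, c = lstm_step(np.array([x]), h, c, W, b)
print(h.shape)  # (4,)
```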
Because LSTM and LResNet run at different speeds, the TFE and SFE modules do not finish feature extraction at the same time. Therefore, the module that finishes first must wait for the other before its output is passed to the MFF layer for feature fusion. Experiments show that the TFE module runs faster, so the extracted temporal features must be stored until the SFE module completes spatial feature extraction. Usually such data is held in main memory until the CPU issues a call instruction, but CPU access to main memory is slow. To speed up the call, we place the temporal features in a cache between the CPU and main memory. When SFE completes its operation, the CPU reads the temporal feature data directly from the cache at high speed, accelerating program execution.

MFF and classification
The SFE extracts spatial features of size 1 × 1 × 128 from the PQD signal, and the TFE extracts temporal features of size 1 × 1 × 128. The two feature vectors are directly concatenated into a fused 1 × 1 × 256 vector, as shown in Fig. 3. This direct concatenation preserves the spatio-temporal feature information of the PQD signal to the maximum extent. SVM is a machine learning algorithm that can handle high-dimensional data, overcomes the curse of dimensionality, offers good robustness and interpretability, and generalizes well, providing reliable results; it is therefore chosen as the classifier in this paper. The SVM maps the multimodal disturbance features into a high-dimensional space via a kernel function and searches for an optimal hyperplane in that space to classify the PQDs 39. The classification problem can be transformed into a quadratic programming problem whose objective function and constraints are shown in Eq. (11).
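The MFF fusion step is a plain concatenation and can be sketched as follows; only the 128-dimension feature sizes come from the paper, and the feature values here are random stand-ins:

```python
import numpy as np

# Hypothetical outputs of the two sub-modules for a batch of 5 samples
rng = np.random.default_rng(1)
spatial = rng.standard_normal((5, 128))   # SFE output, flattened 1x1x128
temporal = rng.standard_normal((5, 128))  # TFE output, flattened 1x1x128

# MFF layer: direct concatenation into the multi-modal spatio-temporal feature
mstf = np.concatenate([spatial, temporal], axis=1)
print(mstf.shape)  # (5, 256): one 256-dimensional MSTF per sample
```

Each 256-dimensional MSTF row is then what the SVM receives for training and classification.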
In Eq. (11), C is the penalty factor, ξ_i is the slack term, x_i is the training data, y_i is the classification label, ω is the weight matrix, and b is the bias parameter. The linear kernel function K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ is used to find the optimal hyperplane through a global search, and the optimal classification decision function is obtained as shown in Eq. (12).
In Eq. (12), a_i* and b* are solutions of the above quadratic program, n is the number of training samples, x_i are the training vectors, and y_i their corresponding labels.
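For reference, the standard soft-margin SVM objective and decision function consistent with the symbols defined above read as follows (a reconstruction, since Eqs. (11) and (12) themselves are not reproduced in this excerpt):

```latex
% Eq. (11): objective function and constraints
\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\lVert \omega \rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.} \quad y_i\left(\omega^{\top}\varphi(x_i) + b\right) \ge 1 - \xi_i,\qquad \xi_i \ge 0

% Eq. (12): optimal classification decision function
f(x) = \operatorname{sign}\!\left(\sum_{i=1}^{n} a_i^{*}\, y_i\, K(x_i, x) + b^{*}\right)
```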

Simulation experiments PQD dataset
The mathematical model of the PQD signal is based on the IEEE 1159-2019 standard 40. The mathematical model of each single disturbance is shown in Table 1; composite disturbances are formed by superimposing single disturbances. MATLAB's rand function is used to generate 20 types of power quality signals with random amplitudes and random disturbance onset times within the parameter ranges, at a sampling frequency of 3 kHz, and 30 dB white noise is added to simulate interference during acquisition. The PQD signals include normal voltage (S1), sag (S2), swell (S3), interruption (S4), harmonics (S5), transient oscillation (S6), flicker (S7), transient pulse (S8), sag + oscillation (S9), swell + oscillation (S10), flicker + oscillation (S11), and the remaining composite disturbance types (S12-S20).

Table 1. Mathematical model of power quality signal.
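As an illustration, a voltage sag (S2) with 30 dB additive white noise might be generated as follows. The sag expression used here is the common IEEE 1159-style form; the exact Table 1 models are not reproduced in this excerpt, so this parametrization and the fixed parameter values are assumptions:

```python
import numpy as np

def sag_signal(depth=0.5, t1=0.04, t2=0.12, f=50.0, fs=3000, dur=0.2, snr_db=30):
    """Generate a voltage-sag PQD signal with additive white Gaussian noise.

    Assumed model: v(t) = (1 - depth*(u(t-t1) - u(t-t2))) * sin(2*pi*f*t).
    """
    n = int(dur * fs)
    t = np.arange(n) / fs
    window = ((t >= t1) & (t < t2)).astype(float)     # u(t-t1) - u(t-t2)
    clean = (1.0 - depth * window) * np.sin(2 * np.pi * f * t)
    # Scale the noise power to the requested SNR (30 dB in the paper)
    p_signal = np.mean(clean ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    noisy = clean + np.random.default_rng(0).normal(0, np.sqrt(p_noise), n)
    return t, noisy

t, v = sag_signal()
print(t.size)  # 600 samples: 0.2 s at 3 kHz
```

In the paper the sag depth and onset/recovery times would be drawn randomly within the Table 1 parameter ranges rather than fixed as here.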

Evaluation indicators
In this paper, accuracy, recall, precision, F1 score, and the number of parameters are used as evaluation metrics. The calculation formulas are shown in Eqs. (13)-(15).
where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
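The metrics of Eqs. (13)-(15) take their standard forms; a minimal sketch with illustrative confusion-matrix counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts
    (standard forms assumed for Eqs. (13)-(15))."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts: one missed detection of the positive class
acc, prec, rec, f1 = classification_metrics(tp=99, tn=880, fp=0, fn=1)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))
```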

Simulation analysis
The two sub-modules of the multi-modal model require only simple training; the outputs of their intermediate layers are then fused in the MFF layer and finally input into the SVM to perform PQD classification.
The SFE module converts the PQD signal into an image using GAF; the conversion result is shown in Fig. 4. LResNet is trained for 1 epoch with a batch size of 30 and a learning rate of 0.001. The LSTM has 128 neurons, a maximum of 30 epochs, a batch size of 30, and an initial learning rate of 0.001 that decays by a factor of 0.1 every 10 epochs. Both modules use the Adam optimizer with cross-entropy as the loss function; the formula is shown in Eq. (16).
In Eq. (16), y is the expected output and a is the actual output.
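Assuming Eq. (16) is the standard multi-class cross-entropy L = −Σ y ln a (consistent with y as expected output and a as actual output), a minimal sketch:

```python
import numpy as np

def cross_entropy(y, a, eps=1e-12):
    """Multi-class cross-entropy, averaged over the batch.

    y: one-hot expected outputs; a: predicted class probabilities.
    eps guards against log(0).
    """
    a = np.clip(a, eps, 1.0)
    return float(-np.mean(np.sum(y * np.log(a), axis=1)))

y = np.array([[0.0, 1.0, 0.0]])       # true class is the second of three
a = np.array([[0.1, 0.8, 0.1]])       # model assigns it probability 0.8
print(round(cross_entropy(y, a), 4))  # 0.2231 (= -ln 0.8)
```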
The confusion matrix of the classification results is shown in Fig. 5: only one S2 sample (red box) in the test set was incorrectly identified as S20, and all other types were classified correctly. To verify the role of each module, ablation experiments were conducted; the experimental schemes and classification results are shown in Table 2. As Table 2 shows, the accuracy of Scheme 3 increased by 6.22% compared with Scheme 2, and that of Scheme 5 increased by 0.11% compared with Scheme 4, verifying that the model with the Swish activation function outperforms the one with ReLU. The accuracies when using LSTM or LResNet alone are only 76.94% and 86.44%, because the feature information extracted by a single modality is insufficient. Fusing the features of the two modalities raises the accuracy significantly, to 99.94%, without adding many parameters: the model is only 0.08 MB. Multi-modal features therefore contain more information than single-modal features and can fully capture the characteristics of PQD signals.
To demonstrate the advantages of the multi-modal model, ten single-modal models are built for comparison: GRU, AlexNet, GoogLeNet, Xception, ResNet18, ResNet50, ResNet101, EfficientNet-B0, MobileNetV2, and ShuffleNetV1. The same dataset was used for all experiments. The evaluation metrics are calculated according to Eqs. (13)-(15), and the comparison results are shown in Table 3.

Real data validation
To further validate the feasibility of the method, a set of real signals is used as input. The dataset is provided by the Kaggle public database and includes six categories of power quality signals: normal voltage (S1), sag (S2), harmonics (S5), transient pulse (S8), sag + oscillation (S9), and sag + harmonics (S12), with 600 samples per type. The confusion matrix of the classification results is shown in Fig. 6.
As can be seen in Fig. 6, 2 groups of S1 are incorrectly identified as S8 due to noise; 2 groups of S2 are identified as S8 because of the small sag amplitude and noise interference; 11 groups of S8 are identified as S1 because of the small pulse amplitude; 1 group of S8 is incorrectly identified as S5 because of the large number of pulses in the sample; and 1 group of S9 is identified as S1 because of the small sag and oscillation amplitudes. Although real data differ somewhat from simulated data, the proposed model still achieves 99.53% classification accuracy, only 0.41% less than the simulation results, verifying the effectiveness of the method.

Conclusion
For the problem of PQD classification, this paper proposes a PQD classification model based on multimodal LResNet-LSTM parallel feature extraction.
• The proposed model consists of three modules: PFE, MFF, and classification. The two sub-modules of PFE, SFE and TFE, extract spatial and temporal features in parallel; the two feature sets are then merged into an MSTF, which is finally input into the SVM for classification. The model recognizes 20 types of PQDs with an accuracy of 99.94% and a parameter size of only 0.08 MB.
• A simple, lightweight residual network (LResNet) was designed. Unlike traditional ResNet18, the residual block of LResNet uses two depthwise separable convolutions, greatly reducing the number of parameters, and LResNet uses the Swish activation function instead of the original ReLU, which improves classification performance.
• A high-speed cache is used to handle the data-storage problem caused by the asynchronous execution of SFE and TFE. However, cache capacity is limited and unsuitable for large-scale data storage, so in future work we will consider a method that keeps the two modules synchronized so that they complete feature extraction simultaneously, avoiding extra data transfers and reads.
• Unlike traditional deep learning models that improve classification accuracy by increasing depth, the proposed model improves accuracy while also reducing the number of parameters.

Multi-modal PQD classification model
PQD classification model framework
• This paper proposes a PQD classification model with multi-modal parallel feature extraction. The model designs the spatial feature extraction (SFE) module and the temporal feature extraction (TFE) module based on LResNet and LSTM respectively, and the multi-modal feature fusion (MFF) module fuses the features extracted by the two sub-modules.
• Classification: This module uses an SVM as the classifier; the MSTF is first input for training, and the test set is then input into the trained SVM for classification.

Table 3. Comparison of multi-modal and single-modal models.