Microseismic records classification using capsule network with limited training samples in underground mining

The identification of suspicious microseismic events is the first crucial step in microseismic data processing. Existing automatic classification methods rely on training with large data sets, which is difficult to apply in mines that lack a long history of manual data processing. In this paper, we present a method based on capsule networks (CapsNet) to automatically classify microseismic records with limited samples in underground mines. We divide each microseismic record into 33 frames, then extract 21 commonly used time- and frequency-domain features from each frame. Consequently, a 21 × 33 feature matrix is utilized as the input of CapsNet. On this basis, we use training sets of different sizes to train the classification models separately. Each trained model is tested on the same test set containing 3,200 microseismic records and compared with convolutional neural networks (CNN) and traditional machine learning methods. Results show that the accuracy of our proposed method reaches 99.2% with limited training samples, and that it is superior to CNN and traditional machine learning methods in terms of Accuracy, Precision, Recall, F1-Measure, and reliability.

Underground engineering disturbs the stress state of the rock mass, leading to a large number of microseismic events 1 . By post-processing these records (e.g., P-wave arrival picking 2 , event location 3 , and source parameter calculation [4][5][6] ), the mechanical state of the corresponding rock mass can be adequately characterized, which is especially beneficial for disaster early warning in underground mining [7][8][9] . However, during the underground mining process, the microseismic monitoring system often receives interference from blasting operations, ore extraction, mechanical operations, high-voltage cables, and magnetic fields 10 . Therefore, quickly and accurately identifying microseismic records from a large number of suspicious records is a crucial task. Currently, the classification of suspicious microseismic records depends on the visual scanning of waveforms by experienced analysts 11 . However, manual classification of microseismic records is a time-consuming, tedious task that is prone to subjective bias. For these reasons, automatic classification of microseismic records is urgently needed. Over the years, many automatic classification methods have been proposed to address these problems in the seismic and microseismic fields. Scarpetta et al. 12 established a specialized neural discrimination method for low-magnitude seismic events, quarry blasts, underwater explosions, and thunder sources at Mt. Vesuvius Volcano, Italy. Langer 13 , Esposito 14 and Curilem 15 used machine learning to classify seismic records at the Soufrière Hills volcano (Montserrat), Stromboli island (southern Italy), and the Villarrica volcano (Chile), respectively. Malovichko 16 utilized a set of seismic characteristics and a multivariate maximum-likelihood Gaussian classifier to quantify the probability that a particular event belongs to a population of blasts.
Vallejos and McKinnon 17 presented an approach to the classification of seismic records from two mines in Ontario, Canada, using logistic regression and neural network classification techniques. Hammer et al. 18 attempted to automatically classify seismic signals from scratch by utilizing a hidden Markov model and 30 features extracted from waveforms. Ma et al. 4 realized the discrimination of mine microseismic events by Bayes discriminant analysis. Dong et al. 19,20 proposed a discrimination method for seismic and blasting events based on a Fisher classifier, a naive Bayesian method, and logistic regression; this method uses the logarithm of the seismic moment, the logarithm of the seismic energy, and the probability density function of the arrival time between adjacent sources as features.

Results
We analyze and discuss the proposed method based on the actual application process of the automatic classification method. The accuracy and reliability of CapsNet, CNN, and other methods are compared. Figure 1 shows the actual application process of the automatic classification method in the mine.
Training process. Based on the microseismic records from the Huangtupo Copper and Zinc Mine, five training sets were created in different proportions, containing 400, 800, 1,200, 1,600, and 2,000 microseismic records; 20% of each training set is used as the validation set. Moreover, a dataset of 3,200 microseismic records (800 of each type), sharing no elements with any training set, was used as a universal test set. Training sets of different sizes define different training processes, which are summarized in Table 1. The purpose of the different training processes is to test the performance and reliability of CapsNet and CNN under limited samples. As shown in Fig. 2, we trained these two networks in the different training processes (as listed in Table 1). The CapsNet consists of 2 convolution layers, a maxpooling layer, 2 ReLU layers, and a unique dynamic routing layer; the CNN consists of 2 convolution layers, a maxpooling layer, 5 ReLU layers, 5 batch normalization layers, 3 fully connected layers, 2 dropout layers, a softmax layer, and a classification layer. The minibatch size for every training process is 10, and training ends after 30 epochs. The minibatch accuracy, validation accuracy, minibatch loss, and validation loss during training were recorded and are shown in Fig. 3. From Fig. 3, the training process of CapsNet is stable and converges rapidly: the accuracy, loss, and validation curves closely match the training curves. For CNN, however, the training curves oscillate strongly throughout the 30 epochs and eventually settle in a poorly converged state, even when high accuracy is occasionally reached. Through the different training processes, we obtained five classification models each for CapsNet and CNN.
Accuracy and comparison. Based on the training process of the classification models, this section uses the test set to evaluate these models.
Moreover, the classification results of the deep learning methods are compared with those of commonly used machine learning methods. The test set consists of 3,200 actual microseismic records of the Huangtupo Copper and Zinc Mine, with 800 records for each category; none of these records appeared during the training and validation processes. As evaluation indexes, Accuracy, Precision, Recall, and F1-Measure are adopted 25 . The general F-Measure is F_α = (1 + α²) · Precision · Recall / (α² · Precision + Recall), which, when α = 1, reduces to the most common F1-Measure: F_1 = 2 · Precision · Recall / (Precision + Recall). Figure 4 shows the test results of the trained CapsNet and the trained CNN; these results demonstrate that the accuracy of CapsNet is always higher than that of CNN. For a more detailed comparison, the abovementioned Precision, Recall, and F1-Measure are calculated for each type of microseismic record. Figure 5a,b show the Precision for each type of record in the different training processes. From Fig. 5a,b, the Precision of CapsNet is much larger than that of CNN for both microseismic and blasting records, while for ore extraction and noise the two are almost identical. This reveals that CapsNet's Precision is superior to CNN's across the experiments. Likewise, Fig. 5c,d show the Recall for each type of record in the different training processes. From Fig. 5c,d, the Recall of CapsNet is much larger than that of CNN for both blasting and ore extraction records, and for microseismic and noise records the gap still exists but is small. This reveals that CapsNet's Recall is superior to CNN's across the experiments. Through the F1-Measure, we consider the above two indicators jointly. Figure 5e,f show the F1-Measure for each type of record in the different training processes; the values for CNN are always lower than those for CapsNet. Multiple indicators thus reveal that CapsNet has clear advantages over CNN in the classification of microseismic records.
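The Precision, Recall, and F1-Measure above can all be computed directly from a confusion matrix. The sketch below is illustrative only; the example matrix is hypothetical, not the paper's actual test results.

```python
import numpy as np

def per_class_metrics(conf):
    """Per-class Precision, Recall, and F1 from a confusion matrix.

    conf[i, j] = number of records with true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                   # true positives per class
    precision = tp / conf.sum(axis=0)    # column sum = all records predicted as class j
    recall = tp / conf.sum(axis=1)       # row sum = all records truly in class i
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical 4-class matrix: microseismic, blast, ore extraction, noise
conf = [[790, 5, 3, 2], [4, 792, 2, 2], [2, 3, 794, 1], [1, 2, 2, 795]]
precision, recall, f1 = per_class_metrics(conf)
```

Overall Accuracy is the trace of the matrix divided by the total number of test records.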
Moreover, a comparison of the classification performance between the deep learning approaches and traditional machine learning methods is presented. Decision trees and k-nearest neighbors (kNN) are often used to classify microseismic records. Therefore, we tested these models utilizing the same dataset as training process 5 (details in Table 1) and compared their results with the findings from the deep learning approach proposed herein. Table 2 shows the classification results from the different classification models, including the CapsNet and CNN presented in this paper, all utilizing the same dataset and features. In terms of testing accuracy, the CapsNet performed best: its testing accuracy reached 99.2%, while the accuracies of the machine learning methods were below 90%. Every index of the CapsNet proposed in this paper outperformed those of the other methods. These findings demonstrate that the CapsNet has excellent efficiency and reliability for the classification of microseismic data.
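As one illustration of the traditional baselines, a minimal kNN classifier (majority vote over Euclidean nearest neighbors on the flattened feature matrices) can be written from scratch; this does not reproduce the paper's exact kNN or decision tree settings.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test record by majority vote of its k nearest
    training records (Euclidean distance on flattened features)."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)    # distance to every training record
        nearest = y_train[np.argsort(d)[:k]]       # labels of the k closest records
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])      # majority vote
    return np.array(preds)
```

In practice each 21 × 33 feature matrix would be flattened to a 693-dimensional vector before being passed to such a classifier.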

Discussion and conclusion
Additionally, to show that CapsNet has clear advantages over CNN in microseismic records classification, we analyze the reliability of the two networks through their classification probabilities. In deep learning, the final predicted output is composed of the decision probabilities of the corresponding labels, and the label with the maximum probability value is taken as the predicted class of the input. The probability used in this paper is the maximum probability of the predicted output. Figure 6 shows the distribution of classification probability in the different training processes and classification results (correct and incorrect). For example, Fig. 6a1,a2 show the probability distributions of test samples whose predicted class matches the label after training process 5 for CapsNet and CNN, respectively; by contrast, Fig. 6a3,a4 show those of incorrect classifications.
For the correct classifications, the results of CapsNet are concentrated at higher probability values, almost always above 0.70, whereas a larger percentage of CNN results fall below 0.70. Moreover, for the incorrect classifications, an excellent classifier should attribute the failure to a hesitant state, that is, the output probabilities of all types should be similar and low. However, the results of CNN are concentrated at high probabilities above 0.90: many samples are strongly misclassified. CapsNet's results are the opposite of CNN's. Detailed probability distribution comparisons are shown in Fig. 7. In summary, CNN's strong predictive confidence for both correct and incorrect classifications results in lower reliability than CapsNet. CapsNet's strong prediction of correct classifications and weak prediction of incorrect classifications can effectively help inspectors screen the results in specific situations.
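The reliability analysis above splits the maximum predicted probability by classification outcome. A minimal sketch, with array shapes assumed for illustration:

```python
import numpy as np

def split_confidences(probs, labels):
    """probs: (n, n_classes) per-class output probabilities;
    labels: (n,) true class indices.
    Returns the max probabilities of correct and incorrect predictions."""
    pred = probs.argmax(axis=1)    # predicted class = index of max probability
    conf = probs.max(axis=1)       # the probability used in the paper
    return conf[pred == labels], conf[pred != labels]
```

Histogramming the two returned arrays reproduces the kind of distributions compared in Figs. 6 and 7.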
To demonstrate more intuitively the advantages of CapsNet under limited data, we designed a set of repeated experiments. We prepared training sets with different amounts of data, containing 400, 800, 1,200, 1,600, 2,000, 4,000, 8,000, 12,000, and 16,000 microseismic records, and we define data volumes of no more than 2,000 records as limited training samples. For each amount of data, we train and test the model four times. As shown in Fig. 8, for each amount of data, we train four models for classification. The experimental results show that, under limited training samples, CapsNet still achieves high accuracy and stability, whereas CNN's accuracy is low and its variation is large. Consequently, CapsNet will outperform CNN in accuracy and stability in real applications facing the everlasting scarcity of labeled seismic or microseismic data. Thus, CapsNet is the better option when little labeled data is at hand.
We propose a deep learning approach based on CapsNet to realize the automatic classification of microseismic records with limited samples in underground mining. CapsNet is a fully connected network composed of a series of interconnected capsules. To convert a microseismic record into an input for CapsNet, we divide the record's waveform into 33 frames and extract 21 feature parameters from each frame. Consequently, a 21 × 33 matrix representing the microseismic record is utilized as the input of the CapsNet. On this basis, we use training sets of different sizes to train the classification models separately. The trained models are tested on the same test set containing 3,200 microseismic records and compared with CNN. Results show that CapsNet achieves stable convergence faster than CNN with limited training samples. We then use Accuracy, Precision, Recall, and F1-Measure as evaluation indexes; results show that CapsNet is superior to CNN and traditional machine learning methods on all of these indicators. Finally, we analyze the reliability of the classification results of CapsNet and CNN, and CapsNet again performs better. These results all indicate the reliability and practicability of CapsNet for the automatic classification of microseismic records with limited samples in underground mining.

Methods
The principle of the CapsNet. At present, deep learning architectures based on CNNs are widely used in various fields, such as image recognition and autonomous driving [26][27][28] . However, due to the convolution operation of CNN, only the existence of a feature is retained in the recognition process, while the orientation of the feature and its spatial relationships are ignored. Moreover, the downsampling of the maxpooling layer discards much crucial information. Therefore, conventional deep learning methods represented by CNN require a large amount of data for training 29 .
The Capsule Network (CapsNet) is a novel type of deep learning architecture that attempts to overcome the abovementioned disadvantages of conventional deep learning. Figure 9 shows a typical architecture of CapsNet. The architecture is shallow, with only two convolutional layers (Conv1 and Conv2 in Fig. 9) and one fully connected (FC) layer 30 . The outputs of these layers are Conv1d, the Primary Capsule (PrimaryCaps), and the Digit Capsule (DigitCaps). CapsNet is robust to complex combinations of features and requires less training data. CapsNet has also produced some unique breakthroughs related to spatial hierarchies between features 32 . A capsule is a vector that can contain any number of values, each of which represents a feature of the object (such as a picture) that needs to be identified 33 . In a CNN, each value of a convolutional layer is the result of a convolution operation; the convolution operation is a linear weighted summation, so each value is a scalar. In CapsNet, however, each capsule is a vector, which can represent not only the presence of a feature but also its direction and state.
Moreover, the CapsNet uses the dynamic routing algorithm to achieve data transmission between the capsule layers (as shown in Fig. 10), which overcomes the shortcomings of the traditional pooling layer 34 . In the dynamic routing algorithm, a non-linear "squashing" function (Eq. 6) is used to ensure that short vectors shrink to almost zero length and long vectors shrink to a length slightly below 1:

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)    (6)

where v_j is the vector output of capsule j and s_j is its total input. s_j is a weighted sum of all prediction vectors û_j|i of the previous layer, s_j = Σ_i c_ij û_j|i , where û_j|i is produced by multiplying the output u_i by a weight matrix W_ij , û_j|i = W_ij u_i . The coupling coefficients c_ij are obtained from the routing logits b_ij by a softmax, c_ij = exp(b_ij) / Σ_k exp(b_ik), and the initial value of b_ij is 0. Therefore, in the forward propagation process of solving s_j , the weight matrix W_ij is initialized with random values and b_ij is initialized to 0 to obtain c_ij ; the dynamic update of b_ij then continuously optimizes the coupling coefficient c_ij . This series of calculations realizes dynamic routing propagation between the two layers of capsules 35 .
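The squashing function and routing-by-agreement described above can be sketched in NumPy; this is an illustration following the notation û_j|i, b_ij, c_ij, not the authors' implementation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink short vectors toward zero and long vectors to length just below 1."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, n_iter=3):
    """u_hat: (n_in, n_out, d) prediction vectors u_hat_{j|i} = W_ij u_i
    from the lower capsule layer; returns the output capsules v_j."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                               # logits b_ij start at 0
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax -> c_ij
        s = (c[..., None] * u_hat).sum(axis=0)                # s_j = sum_i c_ij u_hat
        v = squash(s)                                         # v_j = squash(s_j)
        b = b + (u_hat * v[None]).sum(axis=-1)                # b_ij += u_hat . v_j
    return v
```

The agreement term û_j|i · v_j increases b_ij when a lower capsule's prediction aligns with an output capsule, so routing iteratively concentrates each input's contribution on the outputs it agrees with.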
Except for the coupling coefficients c_ij , which are updated by dynamic routing, the other convolution parameters of the entire network and the W_ij in the CapsNet are updated according to the loss function

L_k = T_k max(0, m⁺ − ‖v_k‖)² + λ (1 − T_k) max(0, ‖v_k‖ − m⁻)²

where T_k = 1 if class k is present, and m⁺ = 0.9 and m⁻ = 0.1 by default. λ down-weights the loss for absent classes, which stops the initial learning from shrinking the lengths of the activity vectors of all the digit capsules 30 .
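The margin loss above can be sketched as follows, assuming one-hot targets T_k; lam = 0.5 is the commonly used default for the down-weighting factor λ.

```python
import numpy as np

def margin_loss(v_norm, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """v_norm: (n_classes,) capsule output lengths ||v_k||;
    targets: (n_classes,) one-hot vector with T_k = 1 for the present class."""
    present = targets * np.maximum(0.0, m_pos - v_norm) ** 2          # present class
    absent = lam * (1 - targets) * np.maximum(0.0, v_norm - m_neg) ** 2  # absent classes
    return float(np.sum(present + absent))
```

The loss is zero when the present class's capsule length reaches m⁺ and every absent class's length falls below m⁻.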

Dataset. The Huangtupo Copper and Zinc Mine is located in the southwest of Hami city, Xinjiang Uygur
Autonomous Region, China. Two large goaf areas (No.1 and No.2 goaf in Fig. 11) have formed in this mine because of the use of non-pillar sublevel caving. Moreover, as the lower and upper parts of the ore body are mined at the same time, a larger and more unstable goaf area (No.3 goaf in Fig. 11) has formed at the mining junction. The volumes of these three goaves are 120,068.60 m³, 42,633.25 m³, and 183,483.19 m³, respectively. Among them, the No.3 goaf area is much larger than the other two, and it is also the most dangerous. As shown in Fig. 11, the No.3 goaf area is interconnected with multiple mining routes, which poses a severe hazard.
To understand the stability of the rock mass, a microseismic system performs continuous monitoring around the goaves and stopes. Eight single-component accelerometers with a sensitivity of 10 V/g and a sampling frequency of 10 kHz were embedded in the Huangtupo Copper and Zinc Mine; their coordinates are shown in Fig. 12.
Hundreds of events are triggered in the Huangtupo Copper and Zinc Mine every day. Considering our processing goal of monitoring rock activity and providing early warning, these events are categorized into four types: microseismic events, blasts, ore extraction, and noise. All events triggered between September 2017 and January 2019 were manually labeled and selected as our dataset. An example of each type of event is shown in Fig. 13. Each record is divided into overlapping frames, so that sampling points repeatedly appear between adjacent frames to avoid large differences between them. As a consequence, we obtain 33 frames for each microseismic record under the condition that each record includes 10,000 sampling points. The purpose of waveform framing is to preserve the characteristics of the time sequence while transforming the waveform. Moreover, to maintain continuity between adjacent frames and attenuate the frequency leakage caused by signal truncation, each frame is multiplied by a Hamming window after the microseismic records are framed 10 . Assuming that the framed microseismic record is S(n), n = 0, 1, 2, …, N − 1, multiplying it by the Hamming window w(n) gives S_w(n) = S(n) · w(n), with w(n) = 0.54 − 0.46 cos(2πn / (N − 1)), where N is the number of sampling points within a frame. Then, we extract time- and frequency-domain features from each frame. Table 3 gives an overview of the 21 features most frequently employed in the literature that are used for each frame in this study. It is worth mentioning that these features were selected by the genetic algorithm (GA)-optimized correlation-based feature selection (CFS) method; for a more detailed implementation of the feature selection, see references 35,36 . Zero-crossing rates are used to determine whether a microseismic record is present in a frame 37 . Energy and energy entropy can indicate signal strength, and the strengths of the different types of microseismic records show distinct differences 38 .
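Overlapping framing with a Hamming window can be sketched as below; frame_len and hop are illustrative placeholders, since the exact frame length and overlap that yield 33 frames from 10,000 samples follow the paper's framing scheme.

```python
import numpy as np

def frame_signal(s, frame_len, hop):
    """Split a record into overlapping frames (hop < frame_len, so sampling
    points repeat between adjacent frames) and window each frame."""
    n_frames = 1 + (len(s) - frame_len) // hop
    w = np.hamming(frame_len)    # w(n) = 0.54 - 0.46 cos(2*pi*n / (N-1))
    return np.stack([s[i * hop : i * hop + frame_len] * w for i in range(n_frames)])
```

Each row of the returned array is one windowed frame, ready for time- and frequency-domain feature extraction.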
The spectral centroid, spectral spread, spectral entropy, spectral flux, and spectral rolloff form the low-level spectral features, which describe the structure of a frame's spectrum using a single quantity 39,40 ; these features can be extracted in either linear or logarithmic frequency domains using spectral amplitudes, power values, logarithmic values, etc. Mel frequency cepstral coefficients (MFCCs) are an interesting variation on the linear cepstrum and are widely used in signal analysis; MFCCs are the most widely used features in signal recognition, mainly due to their ability to represent the signal spectrum concisely 41,42 . Additionally, the harmonic ratio can be used to indicate the proportion of the signal composed of the non-microseismic part 43 .
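Two of the low-level spectral features, the centroid and spread, can be sketched for a single windowed frame as follows; the amplitude-spectrum normalisation used here is a common convention and may differ in detail from the definitions in refs. 39,40.

```python
import numpy as np

def spectral_centroid_spread(frame, fs):
    """Spectral centroid (Hz) and spread (Hz) of one windowed frame."""
    spec = np.abs(np.fft.rfft(frame))                # amplitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)  # bin frequencies in Hz
    p = spec / (spec.sum() + 1e-12)                  # normalised spectral distribution
    centroid = np.sum(freqs * p)                     # spectral "centre of mass"
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    return centroid, spread
```

For a pure tone the centroid sits at the tone frequency and the spread is near zero; broadband noise pushes the centroid toward mid-band and inflates the spread, which is what makes these quantities discriminative across record types.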
As a consequence, a microseismic record is transformed into a 21 × 33 feature matrix by framing and feature extraction. Figure 14 shows the process and result of the transformation. This 21 × 33 feature matrix is the initial input of the CapsNet. In Table 3, N is the length of the signal; f_i(j) is the frequency of the j-th point of the i-th frame; and E_i(j) is the spectral energy of the corresponding frequency of the i-th frame.