Risk assessment of coal mine water inrush based on PCA-DBN

To provide an effective risk assessment of water inrush for coal mine safety production, a BP neural network prediction method for water inrush, optimized by principal component analysis (PCA) and a deep belief network (DBN), is proposed. Because a DBN suffers from a long training time when a classification model is built on high-dimensional data, PCA is first used to reduce the dimensionality of the many factors affecting water inrush from the coal seam floor. This reduces the number of variables, the redundancy among them and the difficulty of feature extraction, and it shortens the training time of the model. A DBN is then used to extract secondary features from the processed nonlinear data: low-level features are combined into a more abstract high-level representation that expresses the nonlinear relationship between the characteristics of water inrush. Finally, a prediction model is established to predict water inrush in coal mines. The superiority of the method was verified by comparing its predictions for actual working faces with the actual situation in typical mining areas of North China. The prediction accuracy of coal mine water inrush obtained by this algorithm is 94%, while the prediction accuracy of the traditional BP algorithm is 70% and that of the SVM algorithm is 88%.

Geology, Chinese Academy of Sciences, put forward the theory of a "strong seepage channel" in the 1990s, which holds that the presence of a water inrush channel is the key to the occurrence of water inrush 2 . Qian Minggao, an academician at the China University of Mining and Technology, proposed the key strata (KS) theory of stope floor rock according to the layered structure characteristics of the floor rock 18 . However, most of these prediction methods for coal mine water inrush are limited to the evaluation of a single key control factor and rely on calculation methods derived from geological theory. The nonlinear dynamic characteristics of the occurrence process of water inrush and a va

Main controlling factors of water inrush from coal seam floor
Five major factors influence water inrush from the coal seam floor: the confined aquifer, the coalfield geological structure, the water barrier condition, the aquifer performance and the mine pressure failure development zone.
Aquifer conditions. The aquifer provides the water source and the driving force for water inrush. The main influencing factors of a confined aquifer are the aquifer water pressure, the working face distance and the aquifer thickness.
Coal seam condition. The dip angle of the coal seam is the main factor affecting the depth of mining failure. A smaller mining failure depth effectively preserves the thickness of the waterproof layer. When the coal seam is thick, it must be mined in layers, and the mining of each layer damages the integrity of the floor and reduces its waterproof performance.

Structure condition.
The geological structure provides passages for water inrush, and faults are the main structural factor in water inrush accidents. Faults have two main effects. First, groundwater passages form because of the stress damage caused by the fractured rock stratum and coal mining. Second, the mechanical strength of the rock in the fault zone is greatly reduced by tectonic stress.
Water barrier condition. The water barrier is a water-resisting layer between the coal seam floor and the aquifer, which inhibits water inrush. The lithological combination influences the performance of the water barrier. The stratigraphic lithology indicates the mechanical strength of the rock layer or its ability to resist water pressure.
Mining condition. Mine excavation destroys the original pressure balance of the mine, and the resulting changes in the geological and hydrological conditions of the coal seam will induce water inrush accidents. The main influencing factors of confined water pressure are mining area, strike length and mining height. The influencing factors of coal seam water inrush are shown in Table 1.

The theory of methods
PCA. PCA is a dimensionality reduction algorithm. Its principle is to convert multiple correlated indicators, through a linear transformation, into a small number of mutually uncorrelated comprehensive indicators, which are ordered by the amount of variance they explain. This reduces the dimension of the original data, extracts its main information and minimizes the information lost during dimensionality reduction.
The variables influencing the occurrence of water inrush overlap in the information they carry, which increases the cost and running time of the classification prediction algorithm and reduces its prediction success rate. PCA is therefore applied to the original feature data to eliminate redundant information within an acceptable loss range, retain the key evaluation index factors and reduce the dimensionality of the evaluation index 19 .
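The PCA procedure described above can be written compactly as follows. This is the standard textbook formulation, offered as a sketch rather than the exact computation performed in this study. The notation is assumed here and does not come from the original text: X is the n × p matrix of n samples and p influencing factors, Z is its standardized form, R is the correlation matrix, and λ_k and u_k are the eigenvalues and eigenvectors of R.

```latex
% Standardize indicator j over the n samples (s_j is the sample standard deviation)
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad i = 1,\dots,n,\quad j = 1,\dots,p

% Correlation matrix of the standardized data
R = \frac{1}{n-1}\, Z^{\top} Z

% Eigen-decomposition with eigenvalues sorted in decreasing order
R\,u_k = \lambda_k u_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

% Scores of the samples on the first m principal components
Y = Z\,[\,u_1, \dots, u_m\,]

% Share of the total variance retained by the first m components
\eta_m = \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{k=1}^{p} \lambda_k}
```

Because the columns of Y are uncorrelated, the redundant information shared by the original indicators is removed, and η_m measures how much of the original variance the reduced data still carry.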

RBM.
RBM is a probabilistic graphical model that can be interpreted as a stochastic neural network. In the classic RBM structure, neurons located in the same layer have no connections with each other. This structure was developed from the Boltzmann machine (BM) and overcomes the unacceptably slow training of the traditional BM 20 . An RBM is composed of two layers of neurons, as shown in Fig. 1. Neurons in different layers are fully connected by undirected links, and there is no connection between neurons in the same layer. Data are input at the visible layer and, after passing through the neurons and the weight matrix, are output by the hidden layer.
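Given a state (v, h) of the visible and hidden units, the energy function of the RBM is

$$E(v, h \mid \Theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j$$

where a_i and b_j are the biases of the visible and hidden units and w_{ij} is the weight between visible unit i and hidden unit j. Based on the energy function, the following probability distribution under the condition Θ = (w n×m , a, b) can be obtained:

$$P(v, h \mid \Theta) = \frac{e^{-E(v, h \mid \Theta)}}{Z}, \qquad Z = \sum_{v, h} e^{-E(v, h \mid \Theta)}, \qquad P_{\Theta}(v) = \frac{1}{Z} \sum_{h} e^{-E(v, h \mid \Theta)}$$

where Z is the normalization coefficient. The activation probabilities of h and v are obtained through the sigmoid activation function σ(x) = 1/(1 + e^{-x}):

$$P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big) \qquad (5)$$

$$P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j=1}^{m} w_{ij} h_j\Big) \qquad (6)$$

The core formulas of the RBM algorithm are these activation formulas of h and v. Data are input at the visible layer, and the characteristic index is mapped from the visible layer to the neurons of the hidden layer through Eq. (5). The output value obtained is then reconstructed at the visible layer v through Eq. (6), and the error between the reconstructed data and the original data is calculated. The weight parameters between the visible and hidden layers are adjusted to minimize this error, so that the reconstructed data represent the original input data as closely as possible and feature extraction is achieved. In fact, the goal of the training process of the RBM algorithm is to solve the Markov maximum likelihood estimation problem; that is, for fixed input data, the value of P Θ (v) is maximized by adjusting the internal parameters of the RBM.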
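The mapping of Eq. (5), the reconstruction of Eq. (6) and the reconstruction error can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: it assumes the standard sigmoid forms of the two activation formulas, uses activation probabilities directly instead of sampled binary states, and the function names and toy parameter values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, w, b):
    """Eq. (5): P(h_j = 1 | v), where w[i][j] links visible unit i to hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, w, a):
    """Eq. (6): P(v_i = 1 | h), the reconstruction of the visible layer."""
    return [sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def reconstruction_error(v, w, a, b):
    """Squared error between the input and its reconstruction."""
    h = hidden_probs(v, w, b)
    v_rec = visible_probs(h, w, a)
    return sum((x - y) ** 2 for x, y in zip(v, v_rec))


# Hypothetical toy parameters: 3 visible units and 2 hidden units.
w = [[2.0, -2.0], [2.0, -2.0], [-2.0, 2.0]]
a = [0.0, 0.0, 0.0]  # visible biases
b = [0.0, 0.0]       # hidden biases
v = [1.0, 1.0, 0.0]

h = hidden_probs(v, w, b)
err = reconstruction_error(v, w, a, b)
```

Training adjusts w, a and b so that this error shrinks over the whole data set; with the toy weights above, the first hidden unit responds strongly to the input pattern and the reconstruction is already close to v.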

DBN network structure.
A typical DBN model is constructed by stacking multiple RBMs. Compared with a shallow neural network, this stacked structure has a deeper network hierarchy and better model generalization ability. Traditional neural networks rely on the selection of data features, whereas a DBN can extract hidden features from the input data by setting multiple hidden layers 21 .

www.nature.com/scientificreports/
The DBN is composed of cascaded RBMs with a back-propagation algorithm adopted in the top layer, as shown in Fig. 2. The training process is divided into two parts: pretraining and parameter fine-tuning. In pretraining, the input data are trained layer by layer without supervision, starting from the bottom RBM, and the output of each layer is used as the input of the RBM above it. This structure can effectively select the feature information. Parameter fine-tuning is supervised training that tunes the network as a whole: the error between the network output and the expected output is propagated back layer by layer to fine-tune the parameters of the entire network 22 . The original data are shown in Table 2.
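The layer-by-layer pretraining stage can be sketched as below. This is an illustrative sketch under several assumptions and not the network used in the study: each RBM is trained with one-step contrastive divergence (CD-1) using activation probabilities rather than sampled states, the inputs are assumed to be scaled to [0, 1], the supervised fine-tuning stage is omitted, and all function names, layer sizes and toy data are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def up(v, w, b):
    """Hidden activation probabilities of one RBM for one sample."""
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]


def down(h, w, a):
    """Reconstruction of the visible layer from hidden activations."""
    return [sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def train_rbm(data, n_hidden, epochs, lr, rng):
    """Unsupervised CD-1 training of a single RBM; returns weights and hidden biases."""
    n_visible = len(data[0])
    w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    a, b = [0.0] * n_visible, [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            h0 = up(v0, w, b)    # visible -> hidden
            v1 = down(h0, w, a)  # hidden -> reconstructed visible
            h1 = up(v1, w, b)    # hidden activations of the reconstruction
            for i in range(n_visible):
                a[i] += lr * (v0[i] - v1[i])
                for j in range(n_hidden):
                    w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            for j in range(n_hidden):
                b[j] += lr * (h0[j] - h1[j])
    return w, b


def pretrain_dbn(data, layer_sizes, epochs=100, lr=0.1, seed=0):
    """Greedy pretraining: the output of each RBM is the input of the next one."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        w, b = train_rbm(inputs, n_hidden, epochs, lr, rng)
        layers.append((w, b))
        inputs = [up(v, w, b) for v in inputs]
    return layers


def extract_features(v, layers):
    """Propagate one sample upward through all pretrained layers."""
    for w, b in layers:
        v = up(v, w, b)
    return v


# Hypothetical toy data: two binary patterns over 6 inputs, repeated three times.
data = [[1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 0, 1]] * 3
layers = pretrain_dbn(data, layer_sizes=[4, 2])
features = extract_features(data[0], layers)
```

In a complete model, the features produced by the top RBM would feed a supervised output layer, and back propagation would then adjust all weights jointly, as described for the fine-tuning stage.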
The DBN prediction model. Coal mine water inrush accident data are nonlinear and high-dimensional, and the factors related to water inrush accidents are interrelated in complex ways. Most current prediction and evaluation methods cannot effectively extract the large number of hidden features in the data, which leads to an incomplete water inrush accident model, lowers the prediction accuracy and fails to provide effective support for safe mining in coal mines. The model design in this paper therefore has two main aims: converting the high-dimensional influencing factors into low-dimensional data that are easy to train on, and extracting the features in the data more completely 23 .
The PCA data dimensionality reduction. The PCA algorithm is used to reduce the dimensionality of the main control factors of coal mine water inrush, after the data actually sampled at the coal mine have been standardized. SPSS software is used to perform principal component analysis on the corresponding measured data. The selection criterion is that the cumulative variance contribution rate of the retained principal components must exceed 80%. Since the cumulative variance contribution rate of the first six principal components is approximately 83%, these six components contain most of the information required for water inrush prediction and are therefore used for floor water inrush evaluation. The contribution rate and cumulative contribution rate of the principal components are shown in Table 3.
The data in Table 4 are entered into the DBN model, and the results are shown in Table 5 and Figs. 4 and 5. There are three incorrect predictions, which corresponds to an accuracy of 94%. The incorrect predictions may result from an insufficient sample size and from features lost during dimensionality reduction, so better dimensionality reduction methods could improve the accuracy of the algorithm. The accuracy of the BP neural network using oversampled data is 80%, that of the water inrush coefficient method is 60%, that of the SVM algorithm using SMOTE-oversampled data is 88%, and that of the DBN algorithm trained on the unexpanded training set is 85%. It can be seen from Table 6 that the accuracy of the water inrush risk prediction model proposed in this paper is higher than that of these methods.
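The component-selection criterion used for the PCA step (keep the leading principal components until their cumulative variance contribution rate exceeds 80%) can be sketched in Python as follows. The function names are hypothetical, and the eigenvalues are made up purely for illustration; they are not the values reported in Table 3.

```python
def contribution_rates(eigenvalues):
    """Variance contribution rate of each principal component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [value / total for value in ordered]


def components_needed(eigenvalues, threshold=0.80):
    """Return the smallest k whose cumulative contribution rate exceeds the
    threshold, together with the cumulative rate reached at that k."""
    cumulative = 0.0
    for k, rate in enumerate(contribution_rates(eigenvalues), start=1):
        cumulative += rate
        if cumulative > threshold:
            return k, cumulative
    # Threshold never exceeded: keep every component.
    return len(eigenvalues), cumulative


# Hypothetical eigenvalues for illustration only (not the data of this study).
example_eigenvalues = [4.0, 2.0, 1.5, 1.0, 0.8, 0.4, 0.3]
k, reached = components_needed(example_eigenvalues)
```

For these illustrative eigenvalues the first three components explain 75% of the variance, so a fourth is needed to pass the 80% threshold. The same rule, applied to the measured data, is what leads to six components being retained at a cumulative rate of approximately 83%.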
The model proposed in this paper can be directly applied to the prediction of water inrush in the coal fields of North China. The prediction results show that the DBN can effectively extract features and performs well on nonlinear, interrelated data such as water inrush influencing factors. The preprocessing function can effectively improve the prediction effect of the BP neural network. In summary, the DBN prediction model based on PCA has a good predictive effect on water inrush data. It can also make a more accurate water inrush risk assessment for coal mine safety production.
Table 4. Part of the data after PCA dimensionality reduction.

Conclusion
There are many risk factors affecting coal floor water inrush, and some of the data are redundant. Principal component analysis reduces the data dimension without damaging the integrity of the data and lowers the cost of training. Compared with training on the original features and with the PCA-BP model, the PCA-DBN model extracts the characteristics that influence water inrush from the original data more effectively, improving the training accuracy and the generalization performance of the model. As a result, the PCA-DBN model can eliminate the defects of traditional algorithms for feature selection, extract implicit characteristics from complex hydrogeological information, and effectively filter missing and noisy data to establish a more reliable evaluation model for water inrush accidents. The case analysis shows that the predictions of the model are consistent with the actual water inrush situation in coal mines, and the following conclusions are drawn:
(1) Multidimensional redundant input data complicate the structure of the DBN. PCA is used to reduce the dimensionality of the data and extract the features of the high-dimensional data, which are then input into the deep belief network. This simplifies the network structure and improves the accuracy of the model.
(2) Compared with the traditional BP network, the PCA-BP network model and the water inrush coefficient method, the PCA-DBN model proposed in this paper has the highest prediction accuracy. In subsequent research, the network model can be optimized from the structure of the DBN itself, and other algorithms can be integrated to further improve the model's accuracy [24][25][26].