Application of an improved naive Bayesian analysis for the identification of air leaks in boreholes in coal mines

Pan, Hong-yu; He, Sui-nan; Zhang, Tian-jun; Song, Shuang; Wang, Kang

doi:10.1038/s41598-022-20504-0


Article
Open access
Published: 27 September 2022


Scientific Reports volume 12, Article number: 16081 (2022)


Abstract

Borehole extraction is the basic method for controlling gas in coal mines. The quality of borehole sealing determines the effectiveness of gas extraction, and many factors give rise to different types of borehole leaks. To accurately identify the types of leaks from boreholes, characteristic parameters such as gas concentration, flow rate and negative pressure were selected, and new indexes were established to identify leaks. A model based on an improved naive Bayes framework was constructed for the first time in this study and applied to analyse and identify boreholes in the 229 working face of the Xiashijie Coal Mine. Eight features related to the sealing sections of single boreholes were taken as parameters, and 144 training samples from 18 groups of real-time monitoring time series data and 96 test samples from 12 groups were selected to verify the accuracy and speed of the model. The results showed that the model eliminated strong correlations between the original characteristic parameters and successfully identified the leakage conditions and categories of 12 boreholes. The identification rate of the new model was 98.9%, and its response time was 0.0020 s. Compared with the single naive Bayes algorithm model, the identification rate was 31.8% higher and the response time was 55% shorter. The model developed in this study fills a gap in the use of algorithms to identify types of leaks in boreholes, provides a theoretical basis and accurate guidance for evaluating the sealing quality of boreholes and for borehole repairs, and supports the improved use of boreholes to extract gas from coal mines.


Introduction

One of the most common and dangerous natural hazards associated with coal mining is methane, which can mix with air and cause disasters. Gas extraction is a fundamental measure for preventing and controlling disasters and accidents in coal mines¹,². Drainage boreholes are used to extract gas from coal seams³. However, the concentration of gas extracted by boreholes in China is generally low because of leaks: air flows through leakage channels into a borehole and reduces the negative pressure available for gas extraction, so the boreholes extract gas at low concentrations⁴,⁵. Effective identification of the presence and types of air leaks is therefore vital to improve the efficiency of gas extraction.

Studies of the mechanism of borehole leakage have led to the development of physical models. Zhang T⁶ explained that air leakage was caused by local changes in strain⁷,⁸ around a borehole. Zhang C⁹ studied the mechanism of air leakage through the cracks around a borehole and concluded that the leakage mechanisms of these fractures differed depending on the extraction stage, which provided a theoretical basis for classifying and identifying leaks in boreholes. To further analyse the flow state and characteristic changes of air leaks in boreholes, researchers have constructed physical models of the leakage mechanism. Zhang J¹⁰,¹¹ combined numerical simulations of leakage around a borehole in coal with the rheological and viscoelastic–plastic characteristics of coal to build a dynamic leakage model of the borehole. Based on an analysis of the coupled flow of methane and air in borehole fractures, Fan J¹² used the finite difference method (FDM) to construct a coupled flow model of the air leakage components in boreholes. Zhang Y¹³ constructed a physical model of air leakage in boreholes and classified leaks into three types according to their source, i.e., roadway fissure zones, borehole fissure zones, and the materials used in the sealing sections of boreholes. Wang Z¹⁴ analysed the mechanism of air leakage from boreholes by numerical simulation and established a dynamic leakage model of drainage boreholes. Wang H¹⁵ and Zhang Y¹⁶ examined how factors around roadways and boreholes affect gas concentration through air leakage and constructed an air-gas mixed-flow coupling model. These physical models explain the mechanisms of gas extraction and air leakage in boreholes and provide a theoretical basis for the classification of air leaks from boreholes.
However, constructing a physical model of air leakage in a borehole is complicated, and such models cannot be applied quickly to guide field practice. An efficient mathematical model for identifying leaks is therefore still needed.

Advances in computer science, applied mathematics and artificial intelligence have promoted in-depth research on identification models for coal mining¹⁷⁻²². However, the algorithms used to construct these models have limitations. Hierarchical cluster analysis cannot reassign data once they have been clustered and uses only a small number of iterations. The chaotic immune particle swarm optimization-probabilistic neural network (CIPSO-PNN) optimizes the PNN, but the search for the best solution is long, and the model is complex. In Fisher's discriminant analysis, the number, representativeness and correctness of the training samples directly affect the recognition accuracy of the model. In addition, the algorithms of existing discriminant classification models cannot adapt to differences in the relationships among data characteristics with multiparameter nonlinearity, which is important for discriminating leaks in extraction boreholes. Naive Bayes classification, a method based on Bayes' theorem and the assumption that features are conditionally independent, has stable classification efficiency²³. Compared with the above algorithms, decision trees and artificial neural networks, naive Bayes performs better on small samples and has the minimum error rate, and it has been widely used in coal mines²⁴,²⁵. It is therefore applied here to identify air leaks in gas boreholes. However, because its accuracy is low when features are linearly correlated, it needs to be improved.

In summary, research on models of leaks from gas extraction boreholes based on the mechanism of air leakage is relatively mature, and such models are widely used in coal mining. However, little research has used machine learning methods to analyse multisource characteristic information about air leakage and establish a mathematical model for recognizing leaks from boreholes. In this study, we collected data on leaks from boreholes and applied multisource data fusion (MDF) theory and principal component analysis (PCA). We improved the traditional naive Bayesian classification (NBC) method and established a mathematical model to identify the types of air leaks from boreholes. The model fills a gap in the use of algorithms to identify types of leaks in boreholes used to extract gas from coal mines, provides a theoretical basis and accurate guidance for evaluating the sealing quality of boreholes and for borehole repairs, and supports improved application of boreholes for gas extraction in coal mines.

Construction of an improved naive Bayesian model for the identification of air leaks from gas drainage boreholes

Feature information selection

According to previous studies⁴,¹⁵,²⁶⁻³¹, air leaks from gas drainage boreholes can be divided into the three types shown in Table 1.

Table 1 Types of borehole leaks.


As shown in Fig. 1, the coal seam contains many cracks. Because of poor sealing, air from the roadway enters the borehole through these cracks, causing the borehole to leak. In addition, loose connections between the extraction pipes lower the extracted gas concentration. Based on the actual conditions of the gas drainage boreholes in the 229 working face of the Xiashijie Coal Mine, eight characteristics that reflect the gas drainage performance of a borehole were used in the leak identification model: A₁, extraction flow; A₂, gas concentration at 0 m; A₃, gas concentration at 2 m; A₄, gas concentration at 6 m; A₅, gas concentration at 9 m; A₆, gas concentration at 12 m; A₇, negative pressure at the orifice; and A₈, extraction negative pressure.

Identification model construction

The model is built on a naive Bayes classifier (NBC) using the eight characteristics above. However, the NBC cannot accommodate missing data on air leakage in gas extraction boreholes, its identification and classification accuracy is low for strongly correlated information, and some data can easily have a disproportionate impact on the overall model³². Therefore, as shown in Fig. 2, MDF and principal component analysis (PCA) were used to improve the traditional NBC, and a model for identifying air leaks from gas extraction boreholes was established as follows:

Data preprocessing

Let the m-dimensional sample data of gas drainage borehole leakage be $\mathbf{x} = (x_{1}, x_{2}, \ldots, x_{m})$, with n independent observations $\mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{n}$. These observations are used to build the gas drainage borehole leakage data matrix:

$$\mathbf{X} = \left[ \mathbf{x}_{1}, \mathbf{x}_{2}, \ldots, \mathbf{x}_{n} \right]^{\mathrm{T}} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}$$

(1)

$\mathbf{x}_{i} = (x_{i1}, x_{i2}, \ldots, x_{im})$ represents the observation sample of group i, i = 1, 2, …, n, and $x_{ij}$ represents the jth variable of the ith observation sample, where j = 1, 2, …, m.

Following MDF theory, the training samples for gas drainage borehole leaks are processed at the data level³³, and the processed data are standardized with Eqs. (2)–(4).

$$x_{ij}^{1} = \frac{x_{ij} - \overline{x}_{j}}{\sqrt{s_{jj}}}, \quad i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, m$$

(2)

$$\overline{x}_{j} = \frac{1}{n}\sum\limits_{i = 1}^{n} {x_{ij} }$$

(3)

$$s_{jj} = \frac{1}{n - 1}\sum_{i = 1}^{n} \left( x_{ij} - \overline{x}_{j} \right)^{2}, \quad j = 1, 2, \ldots, m$$

(4)

In Eqs. (2)–(4), $x_{ij}^{1}$ represents the standardized value, $\overline{x}_{j}$ represents the sample mean of the jth characteristic, and $s_{jj}$ represents the sample variance of the jth characteristic. Standardization removes the influence of the different units (dimensions) of the characteristics. The standardized gas drainage borehole leakage data are still denoted X.
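The standardization in Eqs. (2)–(4) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' MATLAB code; the function name `standardize` and the toy matrix are hypothetical, and every characteristic is assumed to have nonzero variance (a constant column would cause a division by zero).

```python
import numpy as np

def standardize(X):
    """Column-wise z-score standardization following Eqs. (2)-(4).

    X is an (n, m) array: n observation groups (rows) by m characteristics (columns).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    x_bar = X.mean(axis=0)                              # Eq. (3): mean of each characteristic
    s_jj = ((X - x_bar) ** 2).sum(axis=0) / (n - 1)     # Eq. (4): sample variance
    return (X - x_bar) / np.sqrt(s_jj)                  # Eq. (2)

# Hypothetical toy data: 4 groups x 3 characteristics (not the mine data)
X = np.array([[2.0, 10.0, 1.0],
              [2.1, 30.0, 3.0],
              [1.9, 20.0, 2.0],
              [2.0, 40.0, 6.0]])
Z = standardize(X)
print(Z.mean(axis=0).round(6), Z.std(axis=0, ddof=1).round(6))
```

After standardization each column has zero mean and unit sample variance, which is what makes the correlation-matrix step that follows a simple matrix product.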

Principal component selection

The correlation coefficient matrix of the gas drainage borehole leakage sample data after standardization is calculated as follows:

$$\mathbf{R} = \left[ r_{ij} \right]_{m \times m} = \frac{1}{n - 1}\mathbf{X}^{\mathrm{T}}\mathbf{X}$$

(5)

where

$$r_{ij} = \frac{1}{n - 1}\sum_{l = 1}^{n} x_{li} x_{lj}, \quad i, j = 1, 2, \ldots, m$$

(6)
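A short sketch of Eqs. (5)–(6), assuming the columns of `Z` have already been standardized with the sample variance (denominator n − 1). The helper name `correlation_matrix` and the random data are illustrative only; the comparison with `numpy.corrcoef` checks that the matrix product reproduces the Pearson correlation matrix of the original characteristics.

```python
import numpy as np

def correlation_matrix(Z):
    """Sample correlation matrix of standardized data, Eqs. (5)-(6).

    Z is an (n, m) array whose columns have zero mean and unit sample variance,
    so R = Z^T Z / (n - 1) is the m x m correlation matrix.
    """
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    return Z.T @ Z / (n - 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(18, 8))                     # hypothetical 18 groups x 8 characteristics
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = correlation_matrix(Z)
# Cross-check against NumPy's Pearson correlation (columns treated as variables)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
```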

Solving the characteristic equation of the sample correlation matrix R gives m eigenvalues and the corresponding m unit eigenvectors:

$$\begin{gathered} \left| \mathbf{R} - \lambda \mathbf{I} \right| = 0 \\ \lambda_{1} \ge \lambda_{2} \ge \lambda_{3} \ge \cdots \ge \lambda_{m} \end{gathered}$$

(7)

In Eq. (7), $\lambda$ denotes an eigenvalue of R, and the eigenvalues are sorted in descending order.

The variance contribution rate of the kth principal component and the cumulative variance contribution rate of the first k principal components are calculated as follows:

$$z_{k} = \frac{\lambda_{k}}{\sum_{i = 1}^{m} \lambda_{i}}$$

(8)

$$Z_{k} = \sum_{j = 1}^{k} \frac{\lambda_{j}}{\sum_{i = 1}^{m} \lambda_{i}}$$

(9)

The number of principal components k is chosen as the smallest value for which the cumulative contribution rate $Z_{k} \ge 85\%$; this reduces the dimensionality and eliminates information overlap.
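Eqs. (7)–(9) and the 85% selection rule can be sketched as follows. The function `select_components` and the 3 × 3 matrix are hypothetical; `numpy.linalg.eigh` is used because a correlation matrix is symmetric, and it returns eigenvalues in ascending order, so they are re-sorted to match Eq. (7).

```python
import numpy as np

def select_components(R, threshold=0.85):
    """Eigen-decomposition of R and selection of k, following Eqs. (7)-(9)."""
    eigvals, eigvecs = np.linalg.eigh(R)             # R is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]                # lambda_1 >= lambda_2 >= ... >= lambda_m
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    z = eigvals / eigvals.sum()                      # Eq. (8): contribution rate of each component
    Z_cum = np.cumsum(z)                             # Eq. (9): cumulative contribution rate
    k = int(np.searchsorted(Z_cum, threshold)) + 1   # smallest k with Z_k >= threshold
    return eigvals, eigvecs, z, Z_cum, k

# Hypothetical 3 x 3 correlation matrix: features 1 and 2 are strongly correlated
R = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
eigvals, eigvecs, z, Z_cum, k = select_components(R)
print(eigvals.round(3), Z_cum.round(3), k)
```

Here the eigenvalues are 1.8, 1.0 and 0.2, the cumulative contribution rates are 60%, about 93.3% and 100%, and the first two components are therefore retained.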

Construction of new indexes

The unit eigenvectors corresponding to the first k principal components are obtained:

$$a_{i} = \left( a_{1i}, a_{2i}, \ldots, a_{mi} \right)^{\mathrm{T}}, \quad i = 1, 2, \ldots, k$$

(10)

Linear transformation with k unit eigenvectors as coefficients yields:

$$Y_{i} = a_{i}^{T} {\mathbf{x}}\quad i = 1,2,3 \ldots {\text{k}}$$

(11)

That is, the orthogonal transformation linearly combines potentially correlated variables or influencing factors in the gas drainage borehole leakage data into a set of new, linearly uncorrelated variables. This simplifies the data structure, extracts the data characteristics, and yields new identification indexes for the improved naive Bayes model.

As shown in Fig. 3, the original eight-dimensional characteristic information (A₁, A₂, …, A₈) selected above is converted into a new k-dimensional identification index (Y₁, Y₂, …, Y_k), with k < 8. The correlated characteristic information is combined so that the new indexes retain most of the information in the original variables³⁴ while eliminating overlapping information.
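The projection of Eqs. (10)–(11) can be sketched as below, assuming standardized data and the correlation matrix of Eq. (5). The function `principal_scores` and the random data are hypothetical. The check at the end shows the property described above: the new indexes are uncorrelated, and their covariance matrix is the diagonal matrix of the retained eigenvalues. Note that eigenvector signs are arbitrary, so individual scores may differ in sign from those produced by other software.

```python
import numpy as np

def principal_scores(Z, k):
    """Project standardized data onto the first k unit eigenvectors, Eqs. (10)-(11).

    Returns the (n, k) matrix whose column i holds Y_i = a_i^T x for every sample,
    together with the k largest eigenvalues of the correlation matrix.
    """
    n = Z.shape[0]
    R = Z.T @ Z / (n - 1)                         # correlation matrix, Eq. (5)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]         # positions of the k largest eigenvalues
    A_k = eigvecs[:, order]                       # unit eigenvectors a_1..a_k as columns
    return Z @ A_k, eigvals[order]

rng = np.random.default_rng(1)
X = rng.normal(size=(18, 8))                      # hypothetical 18 groups x 8 features
X[:, 1] += 0.9 * X[:, 0]                          # inject a correlation between two features
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Y, lam = principal_scores(Z, k=5)
# The new indexes are uncorrelated: their covariance is diag(lambda_1..lambda_k)
print(np.allclose(Y.T @ Y / (Z.shape[0] - 1), np.diag(lam)))
```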

Modelling

The data matrix $\mathbf{Y} = [\mathbf{y}_{1}, \mathbf{y}_{2}, \ldots, \mathbf{y}_{k}]$ is constructed from the new leakage indexes of the drainage boreholes, where each column $\mathbf{y}_{j} = [y_{1}^{(j)}, y_{2}^{(j)}, \ldots, y_{n}^{(j)}]^{\mathrm{T}}$ holds the values of the jth index for the n samples. Here $y_{i}^{(j)}$ is the jth feature of sample i, $y_{i}^{(j)} \in \{ a_{j1}, a_{j2}, \ldots, a_{js_{j}} \}$, and $a_{jl}$ is a possible value of the jth feature, with i = 1, 2, …, n; j = 1, 2, …, k; l = 1, 2, …, s_j. The set of sample categories is $G = \{ g_{1}, g_{2}, \ldots, g_{T} \}$, and the category of each sample $y_{i}$ belongs to G.

The prior probability and the conditional probability of each air leakage category of the extraction boreholes are then calculated. Because the characteristic data obtained after principal component optimization are normally distributed, a Gaussian function is used for the conditional probability, as shown in Eqs. (12)–(13).

$$P\left( {Y = {\text{g}}_{t} } \right) = \frac{{\sum\limits_{i = 1}^{n} {I\left( {{\text{y}}_{i} = {\text{g}}_{t} } \right)} }}{n}\quad {\text{t = 1,2,3}} \ldots {\text{T}}$$

(12)

$$P\left( y_{i}^{(j)} = a_{jl} \mid Y = g_{t} \right) = \frac{1}{\sqrt{2\pi\sigma_{Y = g_{t}}^{2}}} \exp\left( -\frac{\left( a_{jl} - u_{y = g_{t}} \right)^{2}}{2\sigma_{Y = g_{t}}^{2}} \right)$$

(13)

In Eq. (13), $u_{y = g_{t}}$ is the mean of the normalized sample data of category $g_{t}$, and $\sigma_{Y = g_{t}}^{2}$ is their variance. For a given leakage sample $y_{i}$ of an extraction borehole, the posterior probability is then calculated as follows:

$$P\left( Y = g_{t} \mid y_{i} \right) \propto P\left( Y = g_{t} \right)\prod_{j = 1}^{k} P\left( Y^{(j)} = y^{(j)} \mid Y = g_{t} \right)$$

(14)

The category of each case is then taken as the one with the maximum posterior probability, which gives the probability model for identifying gas leakage from extraction boreholes in Eq. (15):

$$G_{y_{i}} = \arg\max_{g_{t}} P\left( Y = g_{t} \right)\prod_{j = 1}^{k} P\left( Y^{(j)} = y^{(j)} \mid Y = g_{t} \right)$$

(15)

where $G_{y_{i}}$ is the category with the maximum posterior probability, i.e., the identified leakage category of the extraction borehole.
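Eqs. (12)–(15) together describe a Gaussian naive Bayes classifier. The sketch below is a minimal NumPy version under stated assumptions, not the authors' MATLAB program: the function names, the use of the maximum-likelihood variance (denominator n) and the small `var_floor` that keeps variances positive are all assumptions, and the posterior is evaluated in log space to avoid numerical underflow when many densities are multiplied. The training scores are hypothetical.

```python
import numpy as np

def fit_gaussian_nb(Y, labels, var_floor=1e-9):
    """Estimate the quantities used in Eqs. (12)-(13).

    Y: (n, k) new-index scores; labels: n category labels.
    var_floor is an assumed small constant that keeps variances positive.
    """
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    priors = np.array([np.mean(labels == g) for g in classes])            # Eq. (12)
    means = np.array([Y[labels == g].mean(axis=0) for g in classes])      # u, per class and index
    variances = np.array([Y[labels == g].var(axis=0) for g in classes]) + var_floor  # sigma^2
    return classes, priors, means, variances

def predict(y, classes, priors, means, variances):
    """Maximum a posteriori category, Eqs. (14)-(15), evaluated in log space."""
    y = np.asarray(y, dtype=float)
    # Log of the Gaussian density of Eq. (13) for every class (rows) and index (columns)
    log_lik = -0.5 * (np.log(2 * np.pi * variances) + (y - means) ** 2 / variances)
    log_post = np.log(priors) + log_lik.sum(axis=1)    # log of Eq. (14), up to a constant
    return classes[np.argmax(log_post)]                 # Eq. (15)

# Hypothetical two-index training scores for three leak types
Y_train = np.array([[0.1, 0.0], [-0.2, 0.3], [3.1, 0.2], [2.8, -0.1], [0.0, 3.2], [0.3, 2.9]])
labels = ["I", "I", "II", "II", "III", "III"]
params = fit_gaussian_nb(Y_train, labels)
print(predict([2.9, 0.1], *params))
```

A new sample with scores (2.9, 0.1) lies close to the type II training samples and is assigned to that category. scikit-learn's `GaussianNB` implements the same model and could be used instead of a hand-written version.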

In actual extraction, some gas drainage boreholes drain well: when a borehole is well sealed, the gas concentration differs little between measuring positions. Based on the air leakage characteristics of drainage boreholes, the gas concentration at measuring position b in borehole $y_{i}$ is denoted $C_{y_{i}}^{(b)}$, i = 1, 2, …, n, for positions 0, 1, 2, …, b, where position 0 is at the orifice (0 m) and position b is the deepest measuring point.

$$\frac{{C_{{y_{i} }}^{{\left( {b - 1} \right)}} }}{{C_{{y_{i} }}^{\left( b \right)} }} \ge \frac{{C_{{y_{i} }}^{\left( 0 \right)} }}{{C_{{y_{i} }}^{\left( b \right)} }} \ge 90\%$$

(16)

A borehole that satisfies Eq. (16) has a good gas drainage effect, and its leakage type need not be evaluated. Combining Eq. (16) with Eq. (15), the gas drainage borehole leakage identification model is constructed as follows:

$$G_{y_{i}} = \begin{cases} \text{no leak (good drainage effect)}, & \dfrac{C_{y_{i}}^{(b-1)}}{C_{y_{i}}^{(b)}} \ge \dfrac{C_{y_{i}}^{(0)}}{C_{y_{i}}^{(b)}} \ge 90\% \\[2ex] \arg\max_{g_{t}} P\left( Y = g_{t} \right)\prod_{j = 1}^{k} P\left( Y^{(j)} = y^{(j)} \mid Y = g_{t} \right), & \text{otherwise} \end{cases}$$

(17)

This model identifies both whether a borehole leaks and, if it does, the type of leak.
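The two-stage logic of Eqs. (16)–(17) can be sketched as follows. The function names, the concentration values and the placeholder `stub` classifier are hypothetical; in practice the classifier would be the naive Bayes model of Eq. (15). Note that Eq. (16), taken literally, only compares positions 0 and b − 1 with the deepest position b, and the sketch implements exactly that comparison.

```python
def has_good_drainage(concentrations, threshold=0.90):
    """Screening rule of Eq. (16).

    concentrations: gas concentrations ordered from position 0 (orifice)
    to position b (deepest measuring point); at least two values are needed.
    """
    c = [float(v) for v in concentrations]
    deepest = c[-1]
    # Chained comparison mirrors Eq. (16): C(b-1)/C(b) >= C(0)/C(b) >= threshold
    return c[-2] / deepest >= c[0] / deepest >= threshold

def identify(concentrations, scores, classifier, threshold=0.90):
    """Two-stage model of Eq. (17): screen first, then classify with Eq. (15)."""
    if has_good_drainage(concentrations, threshold):
        return "no leak"
    return classifier(scores)

# Hypothetical concentrations (%) at 0, 2, 6, 9 and 12 m
sealed = [40.5, 41.0, 42.0, 43.5, 44.0]
leaky = [12.0, 18.0, 25.0, 35.0, 44.0]
stub = lambda scores: "type II"        # hypothetical placeholder for the Bayesian classifier
print(identify(sealed, None, stub), identify(leaky, None, stub))
```

The well-sealed borehole passes the 90% screen and is never sent to the classifier, whereas the leaky one falls to 12/44 ≈ 27% at the orifice and is classified.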

Model application

Data acquisition and preprocessing

The model was applied to the gas drainage boreholes of the 229 working face in the Xiashijie Coal Mine, Tongchuan, shown in Fig. 4. The No. 4 coal seam is mainly mined in this working face; the seam thickness is 0–34.28 m, the original gas content is 3.48 m³/t, and the gas pressure is 0.4 MPa, which classifies the mine as a high-gas mine. The gas in the coal seam is extracted by parallel boreholes.

For this study, the gas concentration, flow rate and negative pressure at different depths of each borehole were measured. We designed a detection device, shown in Fig. 5, that connects to each borehole and collects data. By changing the length of the probe, we monitored the gas concentration and extraction flow at different positions in the borehole. The collected data were used to establish the model discussed in this paper.

According to the actual layout of the test boreholes, 240 monitoring data points in 30 groups were selected from the gas flow, concentration and negative pressure sensors and divided into 18 groups of training samples and 12 groups of test samples at a ratio of 6:4. The 18 groups of training samples were then preprocessed.

Gas drainage borehole leakage data are multidimensional and multivariate³⁵,³⁶, with complex correlations. MDF theory was used to preprocess the data³⁷. Newton interpolation was used to eliminate abnormal values and fill missing values in the training sample data of the leaking boreholes³⁸ according to Eqs. (18) and (19):

$$f\left[ x_{n}, x_{n-1}, \ldots, x_{1}, x \right] = \frac{f\left[ x_{n-1}, \ldots, x_{1}, x \right] - f\left[ x_{n}, x_{n-1}, \ldots, x_{1} \right]}{x - x_{n}}$$

(18)

$$\begin{aligned} f(x) \approx{} & f(x_{1}) + (x - x_{1}) f\left[ x_{2}, x_{1} \right] \\ & + (x - x_{1})(x - x_{2}) f\left[ x_{3}, x_{2}, x_{1} \right] + \cdots \\ & + (x - x_{1})(x - x_{2}) \cdots (x - x_{n-1}) f\left[ x_{n}, x_{n-1}, \ldots, x_{1} \right] \end{aligned}$$

(19)

For each missing or abnormal entry, the interpolated value $f(x)$ at the corresponding position x of the sequence was substituted. This eliminated abnormal values that would affect the overall analysis, filled missing values caused by sensor problems, human operation and other factors, and provided complete and accurate data for identifying air leaks in boreholes. The complete sample data are shown in Table 2.
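Newton interpolation with the divided differences of Eq. (18) and the polynomial of Eq. (19) can be sketched in plain Python. The function names and the toy series (values of t² + 1 with the entry at t = 3 missing) are hypothetical; abnormal values are assumed to have been flagged beforehand (e.g. from a box plot) and replaced by `None`.

```python
def divided_differences(xs, ys):
    """Newton coefficients f[x1], f[x2, x1], ..., f[xn, ..., x1] built with Eq. (18)."""
    n = len(xs)
    coef = [float(y) for y in ys]
    for order in range(1, n):
        # Update from the end so that lower-order differences are still available
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef

def newton_interpolate(xs, ys, x):
    """Evaluate the Newton polynomial of Eq. (19) at x by nested multiplication."""
    coef = divided_differences(xs, ys)
    result = coef[-1]
    for i in range(len(xs) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

def fill_missing(series):
    """Replace None entries using all known points of the series (positions as x)."""
    xs = [i for i, v in enumerate(series) if v is not None]
    ys = [series[i] for i in xs]
    return [newton_interpolate(xs, ys, i) if v is None else v for i, v in enumerate(series)]

print(fill_missing([1.0, 2.0, 5.0, None, 17.0, 26.0]))   # [1.0, 2.0, 5.0, 10.0, 17.0, 26.0]
```

Using every known point gives a polynomial whose degree grows with the series length and can oscillate; a short local window of neighbouring points is a common safeguard, although the article does not state which window was used.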

Table 2 Multisource data table for gas drainage boreholes.


Table 2 shows the 18 groups of borehole training sample data, of which 15 groups correspond to boreholes with leaks and 3 groups to boreholes without leaks. For boreholes 16, 17 and 18, the gas concentrations at 0 m, 2 m, 6 m, 9 m and 12 m change little, and the concentration ratios exceed 90% as required by Eq. (16); their drainage effect was therefore good, and the type of leak need not be identified. In contrast, because of air leakage, the gas concentration in the boreholes of groups 1–15 decreased greatly from the bottom of the borehole to the orifice.

The leakage data of the first 15 groups of gas extraction boreholes were standardized with Eqs. (2)–(4); the results are shown in Table 3. Box plots of the original and standardized data were compared (box plots can also be used to detect outliers). Table 2 and Fig. 6a show that the extraction flow rates in the 15 groups of training samples were very similar, approximately 2.0 m³/min, which is relatively low. The pressure loss from the extraction negative pressure to the orifice negative pressure was marked. Because of air leakage in the gas drainage boreholes, the gas concentrations at 0 m, 2 m, 6 m, 9 m and 12 m differed greatly between groups, and the distribution of gas concentration along each borehole was not uniform; the positions of the different types of air leaks in the boreholes also differed. Figure 6b shows that the range and distribution trend of the standardized data were consistent with those of the original data. After standardization, the range of the data was reduced to [0, 1] and the influence of the units of each characteristic was eliminated, which prepared the data for the subsequent PCA used to determine the new leak identification indexes.

Table 3 Standardized data for gas extraction boreholes.


New indexes of the model for the identification of leaks

In this study, PCA was used to construct several representative new leak identification indexes as linear combinations of the original characteristics. Whether PCA is applicable depends on the correlations among the original characteristics³⁹,⁴⁰, so the Kaiser–Meyer–Olkin (KMO) test and Bartlett's test of sphericity were performed in SPSS; the results are shown in Table 4.

Table 4 KMO and Bartlett tests.


As shown in Table 4, the Bartlett test statistic was 65.343, and its significance (p value) was approximately 0, which is less than the significance level α = 0.05. The null hypothesis that the correlation matrix is an identity matrix was therefore rejected; that is, the original variables are significantly correlated. In addition, the KMO value was greater than 0.5. Together, these results indicate that the air leakage data of the gas drainage boreholes are suitable for PCA.
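The two statistics reported by SPSS can be reproduced from a correlation matrix with the standard textbook formulas: Bartlett's statistic χ² = −[(n − 1) − (2p + 5)/6] ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO measure computed from the correlations and the partial correlations obtained from R⁻¹. The sketch below is illustrative; the function names and the equicorrelated matrix are hypothetical, and the p value can be obtained from the chi-square survival function (e.g. `scipy.stats.chi2.sf`), which is not called here.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's statistic for H0: R is an identity matrix (n observations)."""
    p = R.shape[0]
    _sign, logdet = np.linalg.slogdet(R)          # numerically stable log-determinant
    chi2 = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)                 # partial correlations from the inverse
    off = ~np.eye(R.shape[0], dtype=bool)         # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)

# Hypothetical equicorrelated 3 x 3 matrix (rho = 0.5) with n = 15 observations
R = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
chi2, dof = bartlett_sphericity(R, n=15)
print(round(chi2, 3), dof, round(kmo(R), 3))
```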

Table 5 shows the correlation coefficient matrix of the air leakage characteristics, calculated from the standardized data in Table 3. The closer the absolute value of a correlation coefficient is to 1, the stronger the correlation between the two corresponding characteristics; e.g., the correlation coefficient between the gas concentrations at 6 m and 9 m is 0.7447, which indicates a strong correlation. The closer the coefficient is to 0, the weaker the correlation; e.g., the correlation coefficient between the gas concentrations at 2 m and 12 m is 0.0593, which indicates a weak correlation. A negative correlation coefficient indicates that the two characteristics are inversely correlated; for example, the correlation coefficient between the gas concentrations at 9 m and 12 m is −0.1529.

Table 5 Correlation coefficient matrix for air leakage characteristics of gas drainage boreholes.


As shown in Table 5, some of the 8 selected leakage characteristics of gas drainage boreholes are strongly correlated. Using all 8 characteristics directly to identify leakage would lead to incorrect decisions and reduce the accuracy of the identification model. Therefore, PCA is applied to the training sample data to obtain the eigenvalues and contribution rates and to select appropriate principal components that eliminate the strong correlations in the feature data.

The eigenvalues and cumulative contribution rates of the eight types of feature information were calculated, as shown in Table 6. The eigenvalue corresponding to A₁ (extraction flow) was the largest, as was its variance contribution. The eigenvalues and contributions of A₂–A₈ decreased in turn, and the contributions of the 6th–8th principal components (A₆–A₈) were small enough to be ignored. The cumulative contribution rate of extraction flow and the gas concentrations at 0 m, 2 m, 6 m and 9 m reached 95.54%, exceeding the 85% threshold of Eq. (9), so these were preliminarily taken as the main identification indexes of the improved naive Bayesian model for identifying extraction borehole leaks. In practical terms, flow rate and concentration are the main variables of gas extraction in a borehole, and the concentrations at 0 m, 2 m, 6 m and 9 m reflect concentration changes along the borehole. Negative pressure has a linear relationship with concentration and flow rate, so a change in negative pressure affects both; thus, selecting 5 principal components is appropriate.

Table 6 Eigenvalues of correlation coefficients.


To further confirm the rationality of this selection, a scree plot was used. A scree plot shows how the eigenvalues decrease with component number, and the steepness of the decrease indicates whether the number of selected components is reasonable. The scree plot in Fig. 7 shows that the slope k₁ over principal components A₁–A₅ is −0.63645, a steep trend, whereas the slope k₂ over principal components A₅–A₈ is −0.11931, a relatively flat trend. Principal component A₅ is the inflection point, so it is reasonable to select five principal components.
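The article does not state how the segment slopes k₁ and k₂ were computed. The sketch below assumes a least-squares line fitted to the eigenvalues of each segment with `numpy.polyfit`; the helper `segment_slope` and the eigenvalues are hypothetical and are not the values in Table 6. A simpler alternative would be the endpoint difference, e.g. (λ₅ − λ₁)/4.

```python
import numpy as np

def segment_slope(eigvals, start, stop):
    """Least-squares slope of the scree curve over components start..stop (1-based, inclusive)."""
    idx = np.arange(start, stop + 1)
    vals = np.asarray(eigvals, dtype=float)[start - 1:stop]
    slope, _intercept = np.polyfit(idx, vals, 1)
    return slope

# Hypothetical eigenvalues of an 8 x 8 correlation matrix (they sum to 8); not Table 6
lam = [3.2, 1.9, 1.2, 0.8, 0.5, 0.2, 0.12, 0.08]
k1 = segment_slope(lam, 1, 5)     # steep part of the curve
k2 = segment_slope(lam, 5, 8)     # flat tail after the inflection point
print(round(k1, 3), round(k2, 3))
```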

The strongly correlated original features A₁–A₈ were reconstructed into the selected principal component features Y₁–Y₅, and the component analysis matrix of Y₁–Y₅ was obtained from the PCA (Table 7) to define the new feature indexes for gas drainage borehole leakage.

Table 7 Component analysis matrix.


For each of the new indexes Y₁–Y₅, the higher the loading coefficient of an original feature, the more closely that feature is related to the new index and the more it contributes to it. According to the component analysis matrix in Table 7, the new indexes of gas drainage borehole leakage are expressed as:

$$\begin{aligned} Y_{1} & = 0.15881A_{1} + 0.44685A_{2} + 0.34194A_{3} + 0.30134A_{4} + 0.37329A_{5} + 0.35162A_{6} + 0.50706A_{7} + 0.21746A_{8} \\ Y_{2} & = 0.03039A_{1} - 0.39790A_{2} + 0.33001A_{3} + 0.46679A_{4} + 0.42654A_{5} - 0.46037A_{6} - 0.25446A_{7} + 0.23521A_{8} \\ Y_{3} & = 0.57660A_{1} - 0.10068A_{2} - 0.34096A_{3} - 0.30251A_{4} + 0.04862A_{5} - 0.06492A_{6} - 0.04217A_{7} + 0.66429A_{8} \\ Y_{4} & = 0.76498A_{1} - 0.06039A_{2} - 0.32301A_{3} - 0.08075A_{4} - 0.19592A_{5} - 0.15732A_{6} - 0.23946A_{7} - 0.42281A_{8} \\ Y_{5} & = - 0.20849A_{1} + 0.00492A_{2} + 0.62830A_{3} - 0.21162A_{4} - 0.43937A_{5} + 0.19106A_{6} - 0.24950A_{7} + 0.47451A_{8} \\ \end{aligned}$$

According to these expressions and the PCA matrix (Table 7), in the first principal component Y₁ the loading coefficients of the gas concentrations at different depths (A₂–A₆) and of the orifice negative pressure (A₇) were greater than those of the other features, so these are the main influences on Y₁. Y₁ was therefore interpreted as the negative pressure-concentration factor of borehole leakage. The loading coefficients of A₃–A₅ in the second principal component Y₂ were higher, so Y₂ was interpreted as the hole depth-concentration factor of borehole leakage. By analogy, Y₃ was interpreted as the negative pressure-flow factor, Y₄ as the single flow factor, and Y₅ as the negative pressure-hole gas concentration factor of borehole leakage.
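The loading expressions above can be applied directly as a matrix product. In the sketch below the coefficients are copied from the expressions for Y₁–Y₅; the function name `new_indexes` is hypothetical, and the input features are assumed to be standardized in the same way as the training data. The last lines list, for each new index, the original feature with the largest absolute loading, which supports the interpretation given in the text (for example, A₁ dominates Y₄ and A₈ dominates Y₃).

```python
import numpy as np

# Loading coefficients of Y1..Y5 (rows) on A1..A8 (columns), copied from the expressions above
LOADINGS = np.array([
    [0.15881, 0.44685, 0.34194, 0.30134, 0.37329, 0.35162, 0.50706, 0.21746],
    [0.03039, -0.39790, 0.33001, 0.46679, 0.42654, -0.46037, -0.25446, 0.23521],
    [0.57660, -0.10068, -0.34096, -0.30251, 0.04862, -0.06492, -0.04217, 0.66429],
    [0.76498, -0.06039, -0.32301, -0.08075, -0.19592, -0.15732, -0.23946, -0.42281],
    [-0.20849, 0.00492, 0.62830, -0.21162, -0.43937, 0.19106, -0.24950, 0.47451],
])

def new_indexes(A):
    """Scores Y1..Y5 for standardized features A1..A8, shape (8,) or (n, 8)."""
    return np.asarray(A, dtype=float) @ LOADINGS.T

# Original feature (1-based A number) with the largest absolute loading on each new index
dominant = np.argmax(np.abs(LOADINGS), axis=1) + 1
print(dominant)                                  # [7 4 8 1 3]
```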

Analysis of test sample results

Y₁, Y₂, Y₃, Y₄ and Y₅ were used as the new identification indexes of the improved naive Bayesian model for extraction borehole leakage, and the prior and conditional probabilities under the new indexes were calculated with Eqs. (12)–(13). The model was trained in MATLAB on the relationship between the new index scores and the leakage types of the 15 groups of samples above, as shown in Table 8. The final score of each index was used as a training sample of the improved NBC.

Table 8 Improved naive Bayesian training samples.


To verify the accuracy and reliability of the model, 12 sets of gas drainage borehole data corresponding to three different types of leaks in the Xiashijie Coal Mine were collected as test samples, as shown in Table 9. The 240 data points in the 30 groups of training and test samples were thus divided into a training set and a test set at a ratio of 6:4. This ratio was intended to avoid both a poor identification rate and overfitting caused by an unsuitable training set and inaccurate model verification caused by a test set that is too small. We adopted the hold-out verification method, a two-subset (twofold) partition related to the k-fold cross-validation scheme shown in Fig. 8⁴¹,⁴². The data set was divided into a training set and a test set for verification, and the average accuracy of the final verification results was calculated.
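A hold-out split at the 6:4 ratio can be sketched as follows. The random permutation is an assumption made for illustration: the article assigns groups according to the borehole layout and does not state that the split was random. With 30 groups and 8 features per group, the split yields 18 training and 12 test groups, i.e. 144 and 96 data points.

```python
import numpy as np

def holdout_split(n_groups, train_ratio=0.6, seed=None):
    """Randomly partition group indexes into training and test sets (hold-out)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_groups)
    n_train = int(round(train_ratio * n_groups))
    return np.sort(order[:n_train]), np.sort(order[n_train:])

train_idx, test_idx = holdout_split(30, 0.6, seed=42)
features_per_group = 8                                    # A1..A8
print(len(train_idx) * features_per_group, len(test_idx) * features_per_group)   # 144 96
```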

Table 9 Test sample data.


The 12 test samples were classified with both the single NBC and the improved naive Bayes extraction borehole identification model; the results are shown in Table 10. The single NBC identified 3 boreholes with type I leaks, 5 with type II leaks and 2 with type III leaks, but it could not identify boreholes without leaks. The improved naive Bayesian model identified 2 boreholes without leaks, 3 with type I leaks, 4 with type II leaks and 3 with type III leaks.

Table 10 Improved naive Bayesian identification results.


Table 10 indicates that the single NBC misidentified the No. 8 gas drainage borehole: it was identified as type II, whereas its real type was type III. This error arose from the mutual influence among the original eight characteristics, and the single NBC also could not determine whether a gas drainage borehole was leaking. Its recall rate was 75%, and its identification (training) time was 0.0045 s. For the improved naive Bayesian air leakage identification model of gas extraction boreholes, the identification rate was 98.9%, the identification accuracy improved by 31.8%, and the training time decreased to 0.0020 s, an improvement of 55%. To further analyse the error rate, confusion matrices of the two models were drawn for comparison⁴³. They showed that the improved model could fully identify the types of borehole leakage in the Xiashijie Coal Mine, with high identification accuracy and a fast identification speed.

As shown in Fig. 9, the single naive Bayes model failed to identify the boreholes without leaks because their characteristic values could not be calculated. As a result, it produced identifications for only 10 of the 12 groups, and one type III borehole was mistakenly identified as type II. The improved naive Bayes model correctly identified all 12 groups of boreholes, and its predicted values were consistent with the true values. The identification analysis therefore showed that the improved naive Bayesian gas drainage borehole leakage identification model was superior to the single NBC. Because of this advantage, it can identify the type of air leak more accurately and provide further guidance for borehole sealing and repair, thereby improving the efficiency of gas extraction and helping to prevent gas disasters.
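A sketch of how such a confusion matrix can be tabulated. The single-NBC predictions below are reconstructed from the counts in Table 10 as reported in the text (3 type I, 5 type II and 2 type III, with leak-free boreholes unidentified, against true counts of 2, 3, 4 and 3), which implies that one true type III borehole was labelled type II. The borehole order, the "unidentified" column and the helper `confusion_matrix` are hypothetical.

```python
from collections import Counter

LABELS = ["none", "I", "II", "III"]  # leak-free boreholes and leak types I to III
COLUMNS = LABELS + ["unidentified"]  # extra column for boreholes the model could not classify


def confusion_matrix(y_true, y_pred):
    """Rows are true classes, columns are predicted classes (including 'unidentified')."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in COLUMNS] for t in LABELS]


# True classes of the 12 test boreholes (2 / 3 / 4 / 3, as identified by the improved model).
y_true = ["none"] * 2 + ["I"] * 3 + ["II"] * 4 + ["III"] * 3
# Single-NBC predictions reconstructed from the reported counts; the order is assumed.
y_pred_single = ["unidentified"] * 2 + ["I"] * 3 + ["II"] * 4 + ["II", "III", "III"]

matrix = confusion_matrix(y_true, y_pred_single)
correct = sum(matrix[i][i] for i in range(len(LABELS)))
accuracy = correct / len(y_true)
for label, row in zip(LABELS, matrix):
    print(f"{label:>4} {row}")
print(f"accuracy = {correct}/{len(y_true)} = {accuracy:.0%}")  # accuracy = 9/12 = 75%
```

The 9 correct identifications out of 12 coincide with the 75% recall reported for the single NBC.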

Conclusions

1)
Using multisource data fusion (MDF) theory and principal component analysis (PCA), the traditional naive Bayes method was improved, and an enhanced naive Bayes air leakage identification model for gas drainage boreholes was constructed. The new model overcame the inability of the naive Bayes method to handle missing and nonstandard data, and it eliminated the misevaluation caused by the superposition of large amounts of feature information during the identification of leaks in gas drainage boreholes.
2)
The model was applied to the 229 working face of the Xiashijie Coal Mine using eight characteristic parameters of the gas drainage boreholes. A total of 240 data points from 30 groups of gas drainage borehole data were divided into training samples and test samples, and all 12 groups of test data were identified successfully: 2 boreholes without leaks, 3 boreholes with type I leaks, 4 boreholes with type II leaks and 3 boreholes with type III leaks, consistent with the actual conditions of the gas drainage boreholes. This study therefore provides a basis for improving gas drainage efficiency and ensuring safe mining in the Xiashijie Coal Mine.
3)
The feasibility of the model was verified by the hold-out method. The recall rate of the model was 98.9%, and its running time was 0.0020 s. Compared with the single naive Bayes method, the running time was reduced by 55% and the identification accuracy increased by 31.8%. The improved model fills a gap in the identification of leaks in boreholes and provides a theoretical basis for evaluating the quality of borehole sealing and guiding borehole repairs.
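The decorrelation step credited to PCA in conclusion 1) can be illustrated with a dependency-free sketch: centre the features, form their covariance matrix, extract the leading eigenvectors and project the data onto them. The eigenvectors are found here by power iteration with deflation, a swapped-in numerical method because the paper does not describe its PCA implementation; the data, the helper `principal_components` and the choice of k = 2 are hypothetical, and the MDF step is not shown.

```python
import math


def mean_center(X):
    """Subtract each column's mean from every row."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centred data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    return [dot(row, v) for row in A]


def principal_components(C, k, iters=1000):
    """Leading k eigenpairs of a symmetric matrix via power iteration with deflation."""
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [1.0] * len(C)
        for _ in range(iters):
            w = mat_vec(C, v)
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:  # no variance left to explain
                return pairs
            v = [x / norm for x in w]
        lam = dot(v, mat_vec(C, v))  # Rayleigh quotient = eigenvalue estimate
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(len(C))] for i in range(len(C))]
    return pairs


# Hypothetical data: features 1 and 2 are strongly correlated, feature 3 is not.
X = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.7],
    [4.0, 8.1, 0.2],
    [5.0, 9.8, 0.6],
]
Xc = mean_center(X)
C = covariance(Xc)
pairs = principal_components(C, k=2)
scores = [[dot(row, v) for _, v in pairs] for row in Xc]  # projected, decorrelated features
score_cov = covariance(scores)
explained = [lam / sum(C[i][i] for i in range(len(C))) for lam, _ in pairs]
print("explained variance ratio:", [round(e, 3) for e in explained])
print("covariance of scores:", score_cov)
```

The off-diagonal entries of `score_cov` are numerically zero, so the projected scores sit closer to the naive Bayes assumption of independent features than the raw, correlated parameters do (uncorrelated is weaker than independent, so this is an approximation).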

Data availability

All data generated or analysed during this study are included in this published article.

References

1. Cheng, L. et al. A sequential approach for integrated coal and gas mining of closely-spaced outburst coal seams: Results from a case study including mine safety improvements and greenhouse gas reductions. Energies 11(11), 3023 (2018).
2. Niu, Y. et al. Experimental study and field verification of stability monitoring of gas drainage borehole in mining coal seam. J. Pet. Sci. Eng. 189, 106985 (2020).
3. Lin, B. et al. Significance of gas flow in anisotropic coal seams to underground gas drainage. J. Pet. Sci. Eng. 180, 808–819 (2019).
4. Liu, P., Jiang, Y. & Fu, B. A novel approach to characterize gas flow behaviors and air leakage mechanisms in fracture-matrix coal around in-seam drainage borehole. J. Nat. Gas Sci. Eng. 77, 103243 (2020).
5. Liu, P. et al. Evaluation of underground coal gas drainage performance: Mine site measurements and parametric sensitivity analysis. Process Saf. Environ. Prot. 148, 711–723 (2021).
6. Zhang, T. et al. Strain localization characteristics of perforation failure of perforated specimens. J. China Coal Soc. 45(12), 4087–4094. https://doi.org/10.13225/j.cnkj.jccs.2019.143 (2020) (in Chinese).
7. Wang, K., Pan, H. & Zhang, T. Experimental study of prefabricated crack propagation in coal briquettes under the action of a CO₂ gas explosion. ACS Omega 6(38), 24462–24472 (2021).
8. Wang, K. et al. Experimental study on the radial vibration characteristics of a coal briquette in each stage of its life cycle under the action of CO₂ gas explosion. Fuel 320, 123922 (2022).
9. Zhang, C. et al. Experimental research and field application of anti-sloughing support material in gas extraction borehole sealing section. J. Min. Saf. Eng. 38(1), 199–205. https://doi.org/10.13545/j.cnki.jmse.2020.0029 (2021) (in Chinese).
10. Junxiang, Z., Bo, L. & Yuning, S. Dynamic leakage mechanism of gas drainage borehole and engineering application. Int. J. Min. Sci. Technol. 28(3), 505–512 (2018).
11. Zhang, J. et al. A fully multifield coupling model of gas extraction and air leakage for in-seam borehole. Energy Rep. 7, 1293–1305 (2021).
12. Fan, J. et al. A coupled methane/air flow model for coal gas drainage: Model development and finite-difference solution. Process Saf. Environ. Prot. 141, 288–304 (2020).
13. Zhang, Y., Zou, Q. & Guo, L. Air-leakage model and sealing technique with sealing–isolation integration for gas-drainage boreholes in coal mines. Process Saf. Environ. Prot. 140, 258–272 (2020).
14. Wang, Z. et al. A coupled model of air leakage in gas drainage and an active support sealing method for improving drainage performance. Fuel 237, 1217–1227 (2019).
15. Wang, H. et al. Study on sealing effect of pre-drainage gas borehole in coal seam based on air-gas mixed flow coupling model. Process Saf. Environ. Prot. 136, 15–27 (2020).
16. Zhang, Y. et al. A novel failure control technology of cross-measure borehole for gas drainage: A case study. Process Saf. Environ. Prot. 135, 144–156 (2020).
17. Liu, Q. et al. Application of the comprehensive identification model in analyzing the source of water inrush. Arabian J. Geosci. 11(9), 1–10 (2018).
18. Hui, L. & Xiaojun, Z. Predictive analysis of impact hazard level of coal rock mass based on fuzzy inference network. J. Intell. Fuzzy Syst. 38(2), 1509–1518 (2020).
19. Jiang, C. et al. Identification model and indicator of outburst-prone coal seams. Rock Mech. Rock Eng. 48(1), 409–415 (2015).
20. Wang, H. & Zhang, Q. Dynamic identification of coal-rock interface based on adaptive weight optimization and multi-sensor information fusion. Inf. Fusion 51, 114–128 (2019).
21. Li, N., Feng, X. & Jimenez, R. Predicting rock burst hazard with incomplete data using Bayesian networks. Tunn. Undergr. Space Technol. 61, 61–70 (2017).
22. Li, B., Wu, Q. & Liu, Z. Identification of mine water inrush source based on PCA-FDA: Xiandewang coal mine case. Geofluids 2020 (2020).
23. Caruana, R. & Niculescu-Mizil, A. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning 161–168 (2006).
24. Dimitrios, P. & Andreas, B. Enhancing machine learning algorithms to assess rock burst phenomena. Geotech. Geol. Eng. 39(8), 5787–5809 (2021).
25. Huang, P. et al. Research on piper-PCA-bayes-LOOCV discrimination model of water inrush source in mines. Arabian J. Geosci. 12(11), 1–14 (2019).
26. Ba, Q. Research on gas leakage mechanism and detection technology of coal mine gas drainage. IOP Conf. Ser. Earth Environ. Sci. 446(5), 052018 (2020).
27. Zhang, X. et al. Study on the influence mechanism of air leakage on gas extraction in extraction boreholes. Energy Explor. Exploit. 40(5), 1344–1359 (2022).
28. Zheng, C. et al. Effects of coal properties on ventilation air leakage into methane gas drainage boreholes: Application of the orthogonal design. J. Nat. Gas Sci. Eng. 45, 88–95 (2017).
29. Hao, J. et al. Analysis of gas leakage field and location determination of gas leakage in surrounding rock of gas extraction borehole. Coal Eng. 51(5), 143–147. https://doi.org/10.11799/ce201905033 (2019) (in Chinese).
30. Zhou, H., Shen, K. & Chen, B. Classification of leakage types and application of efficient hole-sealing technology for gas drainage drilling. J. Min. Saf. Prot. 46(1), 33–36, 42. https://doi.org/10.3969/j.issn.1008-4495.2019.01.008 (2019) (in Chinese).
31. Ping, G. Study on leakage model of gas extraction borehole and optimization of sealing process. Coal Technol. 39(6), 82–85. https://doi.org/10.13301/j.cnki.ct.2020.06.025 (2020) (in Chinese).
32. Zhao, Y. & Tian, S. Identification of hidden disaster causing factors in coal mine based on Naive Bayes algorithm. J. Intell. Fuzzy Syst. 41(2), 2823–2831 (2021).
33. He, Y. et al. Rock hardness identification based on optimized PNN and multi-source data fusion. Proc. Inst. Mech. Eng. C J. Mech. Eng. Sci. 236(7), 3701–3716 (2022).
34. Uddin, M. P., Mamun, M. A. & Hossain, M. A. Effective feature extraction through segmentation-based folded-PCA for hyperspectral image classification. Int. J. Remote Sens. 40(18), 7190–7220 (2019).
35. Ju, Q. & Hu, Y. Source identification of mine water inrush based on principal component analysis and grey situation decision. Environ. Earth Sci. 80(4), 1–14 (2021).
36. Zhou, F., Wang, X. & Liu, Y. Gas drainage efficiency: An input–output model for evaluating gas drainage projects. Nat. Hazards 74(2), 989–1005 (2014).
37. Cai, J. et al. Numerical analysis of multi-factors effects on the leakage and gas diffusion of gas drainage pipeline in underground coal mines. Process Saf. Environ. Prot. 151, 166–181 (2021).
38. Pérez-Ortiz, J. A. et al. Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets. Neural Netw. 16(2), 241–250 (2003).
39. Öcal, M. E. et al. Industry financial ratios: application of factor analysis in Turkish construction industry. Build. Environ. 42(1), 385–392 (2007).
40. Zhang, J. et al. Investigation of carbon dioxide emission in China by primary component analysis. Sci. Total Environ. 472, 239–247 (2014).
41. Ji, J. et al. Application of GSK-XGBOOST model in prediction of bottom hole air temperature. J. China Saf. Sci. Technol. 18(3), 131–136. https://doi.org/10.11731/j.issn.1673-193x.2022.03.020 (2022) (in Chinese).
42. Liu, Y. & Wang, Y. Review of various cross-validation estimation methods of generalization error. Appl. Res. Comput. 32(5), 1287–1290, 1297. https://doi.org/10.3969/j.issn.1001-3695.2015.05.002 (2015) (in Chinese).
43. Yang, X. Survey for performance measure index of classification learning algorithm. Comput. Sci. 48(8), 209–219. https://doi.org/10.11896/jsjkx.200900216 (2021) (in Chinese).

Acknowledgements

This research was financially supported by the National Natural Science Foundation of China (51874234, 52104215). The authors are also grateful to the anonymous reviewers for their constructive comments.

Author information

Authors and Affiliations

College of Safety Science and Engineering, Xi’an University of Science and Technology, Xi’an, 710054, People’s Republic of China
Hong-yu Pan, Sui-nan He, Tian-jun Zhang, Shuang Song & Kang Wang

Contributions

The main research idea and manuscript preparation were contributed by H.P.; S.H. drafted the manuscript and verified the research; S.S. and T.Z. gave several suggestions from a professional point of view and supervised the manuscript; K.W. assisted in finalizing the research work and manuscript. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Sui-nan He.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pan, Hy., He, Sn., Zhang, Tj. et al. Application of an improved naive Bayesian analysis for the identification of air leaks in boreholes in coal mines. Sci Rep 12, 16081 (2022). https://doi.org/10.1038/s41598-022-20504-0

Received: 11 May 2022
Accepted: 14 September 2022
Published: 27 September 2022
DOI: https://doi.org/10.1038/s41598-022-20504-0
