Abstract
Based on nonlinear algorithmic theory, an R-SVM water source discrimination model and prediction method were established. The Piper trilinear diagram was used to compare qualitatively the differences in ionic composition, and R-type factor analysis was used to reduce the dimension of the input indicators. Taking the mine water samples of Zhaogezhuang Coal Mine as an example, and based on chemical composition analysis of water samples from different monitoring points, six indexes (Na+, Ca2+, Mg2+, Cl–, SO42– and HCO3–) were selected as the discrimination factors. According to the water characteristics of each aquifer and the actual needs of discrimination, the water inrush sources in the mining area were divided into four categories: goaf water (class I), Ordovician carbonate (class II), sandstone fracture water from the 13 coal system (class III), and sandstone fracture water from the 12 coal system (class IV). A total of 56 typical water inrush samples were used as training samples and 11 groups as prediction samples, with typical ion content as the input index and water source type as the output. The R-SVM water source discriminant analysis model was implemented with SPSS Statistics and MATLAB; it automatically establishes the mapping between the water quality indexes and the evaluation standards, enabling rapid and accurate discrimination of water sample data. In the verification example for Zhaogezhuang mine, the classification accuracy of the R-SVM model was 90.90%. The coupled model has high accuracy, good applicability and discriminant ability, and offers guidance for the prevention and control of water damage and related field work.
Introduction
With the development of the economy and society, the demand for mineral resources is steadily increasing. Mineral resources are the indispensable material foundation for human production activities1,2,3. Over the years, the development and utilization of mineral resources have shifted the focus of mining towards complex, hard-to-mine bodies, such as deep orebodies, broken soft orebodies, alpine-area orebodies, low-grade orebodies, and “three lower and one upper” ore bodies4. As mining intensity and depth increase, extraction within complex geological structures becomes more challenging and gives rise to more engineering problems. Among these, mine water disasters are a prominent threat to mining operations. Hence, the timely and precise identification of water source categories is an essential prerequisite for averting water-related disasters and provides a scientific foundation for swift rescue and management5,6.
Water chemistry data plays a crucial role in understanding the fundamental characteristics of aquifers and is vital for discriminating water sources7. Qualitative and quantitative methods are commonly employed to analyze water chemistry information for this purpose. Qualitative analysis, combined with water level dynamics, provides a rough determination of the syncline level. Piper's trilinear water chemistry analysis, on the other hand, is a convenient and visual tool for water quality classification and ion distribution8. The modified D-Piper trilinear diagram provides a solution for the challenge of visualizing ion distribution in large data sets9, leading to improved visualization and interpretation with an increase in data points. In addition, it is crucial to consider physicochemical information such as isotopes and radioactive elements in water bodies to reflect the essential characteristics and historical evolution of hydrogeology. The hydrogeochemical distribution, recharge sources, indicator tests, influencing factors, and evolutionary laws are analyzed based on conventional water chemistry, trace elements, and isotopes of the aquifer10. Gibbs’ semi-qualitative model11 is employed to analyze the hydration types of surface water and shallow groundwater, providing insights into the controlling factors, formation mechanisms, and recharge sources of isotopes in various aquifers. This analysis reveals the distinct weathering and hydration characteristics of different water bodies. However, qualitative methods alone face limitations in similar aquifers due to the ambiguous relationship between indicators, overlapping water quality characteristics, and unclear distribution boundaries12. 
To overcome these limitations, quantitative analysis13 is utilized to uncover the inherent laws of water chemistry data, establish mathematical models for determining water source types, elucidate the close connection between water quality indicators and determination criteria, and minimize the errors associated with qualitative analysis methods. Fisher function discrimination of water source locations based on fuzzy clustering and factor analysis14,15 and Bayes classification of water sources16,17 have been employed to determine the sources of sudden water inflow in mine areas, with improved discrimination accuracy. Groundwater is subject to the coupling of multiple factors because of the variability of mine geological structure, the complexity of hydrogeological characteristics, and the diversity of mining conditions, which results in fuzzy connections and complex nonlinear relationships between water quality indicators and discrimination criteria. However, model studies that simplify the indexes through data dimensionality reduction are limited, and the redundancy of information between water chemical components reduces discrimination accuracy, so the discrimination model requires further optimization.
This study addresses the water quality assessment system by introducing an approach that combines qualitative and quantitative analysis. A key contribution of this research is the use of Piper's trilinear diagram to analyze, through point mapping, the variation pattern of ionic composition in aquifers and their water chemistry characteristics. By comparing the differences in ionic composition among aquifers and evaluating the proximity to the target water body, an initial classification of water quality is established. This fills a gap in existing research on mining the internal information of risk factors with machine learning, and provides a foundation for subsequent quantitative water source discrimination. To this end, a coupled discrimination model integrating the R-factor and the Support Vector Machine is developed to uncover inherent characteristics within water chemistry data and automatically establish the mapping between water quality indices and evaluation criteria. This approach enables precise identification of water source types and provides guidance for effective water damage control in practical engineering applications.
Theoretical basis
Principle of R-factor dimensionality reduction
There are m test variables \(Z_{i} (i = 1,2,3, \cdots ,m)\), which may be correlated. Each \(Z_{i}\) contains the independent common factors \(f_{j} \left( {j = 1,2, \cdots ,p} \right)\), \(p \le m\), and the variables also contain m mutually uncorrelated unique factors \(u_{1} ,u_{2} ,u_{3} , \cdots ,u_{m}\); u and f are mutually uncorrelated. Each Z can be linearly characterized by f and u as18:
Expressed as matrix:
Abbreviated as:
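The displayed equations for these three forms are not reproduced in this text. As a hedged sketch, the standard orthogonal factor model consistent with the definitions above can be written as follows; the loading symbols \(a_{ij}\), the unique-factor coefficients \(c_{i}\), and the matrices A and C are notation assumed here for illustration and may differ from the source's own equations.

```latex
% Assumed notation: a_{ij} is the loading of Z_i on the common factor f_j,
% c_i is the coefficient of the unique factor u_i.
% Linear form for each variable:
Z_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{ip} f_p + c_i u_i , \qquad i = 1,2,\cdots ,m

% Matrix form:
\begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_m \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mp}
\end{bmatrix}
\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_p \end{bmatrix}
+
\begin{bmatrix} c_1 u_1 \\ c_2 u_2 \\ \vdots \\ c_m u_m \end{bmatrix}

% Abbreviated form, with F = (f_1,\cdots,f_p)^T, u = (u_1,\cdots,u_m)^T,
% A = (a_{ij})_{m \times p} and C = \mathrm{diag}(c_1,\cdots,c_m):
Z = AF + Cu
```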
The factor analysis method lies in replacing Z by F through Eqs. (2) and (3), conditioned on \(p < m\), which can streamline the number of dimensions to reduce redundancy. The specific steps are19:
(1) Construct the sample matrix and perform a correlation test.
Collect the p-dimensional random variable \(X = (x_{1} ,x_{2} , \cdots x_{p} )^{T}\) and construct the sample matrix:
The KMO or Bartlett test is used to test the correlation of the variables. If the correlation coefficients are less than 0.3, dimensionality reduction is not meaningful. If the correlation is strong, the commonality of the variables can be extracted and the data are suitable for factor analysis.
(2) Standardize the data to obtain the standardized matrix.
The standardization is done through the following:
The standardized matrix is obtained:
(3) Calculate the correlation matrix.
The correlation coefficient matrix is obtained as follows:
In addition,
The correlation calculation is performed on the standardized matrix Z. The eigenvalues and the corresponding eigenvectors are obtained from the characteristic equation \(|R - \lambda I_{p} | = 0\) of the correlation matrix, and the common factors are then extracted so that their cumulative contribution covers more than 85% of the information.
(4) Calculate the factor loading matrix, rotate the loading matrix, and obtain the matrix U.
$$U = \left[ {\begin{array}{*{20}c} {u_{1}^{T} } \\ {u_{2}^{T} } \\ \vdots \\ {u_{n}^{T} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {u_{11} } & {u_{12} } & \cdots & {u_{1p} } \\ {u_{21} } & {u_{22} } & \cdots & {u_{2p} } \\ \vdots & \vdots & {} & \vdots \\ {u_{n1} } & {u_{n2} } & \cdots & {u_{np} } \\ \end{array} } \right]. \tag{9}$$
Here \(u_{i}\) is the principal component vector of the i-th sample, and \(u_{ij}\) is the projection of that vector on the j-th unit eigenvector.
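Steps (1) to (4) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function name `r_factor_extract` is hypothetical, factors are extracted by the principal-component method from the correlation matrix, and the rotation of the loading matrix is omitted. The 85% threshold is the one quoted above.

```python
import numpy as np

def r_factor_extract(X, min_cumulative=0.85):
    """Principal-component extraction of common factors from an n x p data matrix.

    Hypothetical helper: unrotated loadings only (no varimax rotation).
    """
    X = np.asarray(X, dtype=float)
    # Step (2): standardize each column (zero mean, unit sample variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step (3): p x p correlation matrix of the variables.
    R = np.corrcoef(Z, rowvar=False)
    # Eigenvalues and eigenvectors of R, i.e. the roots of |R - lambda I| = 0.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    # Keep the fewest factors whose cumulative contribution reaches the threshold.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cumulative, min_cumulative)) + 1, len(eigvals))
    # Step (4), unrotated: loading = unit eigenvector scaled by sqrt(eigenvalue).
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    # Rows of `scores` are the projections of each sample on the unit eigenvectors.
    scores = Z @ eigvecs[:, :k]
    return k, cumulative[:k], loadings, scores
```

Calling `r_factor_extract(X)` on an n × 6 matrix of ion concentrations would return the number of retained factors, their cumulative contribution rates, the loading matrix, and an n × k matrix playing the role of U in Eq. (9).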
Support vector machine principle
The Support Vector Machine (SVM) simplifies complex problems by establishing nonlinear mapping relationships and is well suited to nonlinear complex systems. By performing inner product operations in the transformed space, it automatically establishes the mapping between water quality indicators and evaluation criteria, so that the category of each predicted sample can be classified effectively. The principle is shown in Fig. 1.
The support vector machine consists of three parts: the input layer, the intermediate inner product kernel function layer, and the output layer. The water source discriminants \(X_{1} ,X_{2} ,X_{3} , \cdots ,X_{n}\), which represent the sample feature information, are input into the Support Vector Machine model. The intermediate kernel function layer maps the input variables into a high-dimensional space in which the optimal solution is sought; the explicit form of the mapping does not need to be known. After this nonlinear transformation, the discriminated water source type is output in the output layer20.
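The three-layer structure can be illustrated with a small sketch: kernel evaluations against the support vectors form the intermediate layer, and the sign of their weighted sum is the output. The RBF kernel with width parameter g is an assumption made here (the text names the parameter g but not the kernel), and the function names and any numeric values passed in are hypothetical.

```python
import math

def rbf_kernel(x, z, g):
    """Assumed RBF kernel K(x, z) = exp(-g * ||x - z||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decide(x, support_vectors, labels, alphas, b, g):
    """Output layer: sign of the kernel-weighted sum over the support vectors.

    labels are +1/-1, alphas are the multipliers of a trained model,
    b is the bias term; all are supplied by the caller.
    """
    s = sum(a * y * rbf_kernel(sv, x, g)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + b >= 0 else -1
```

Only kernel values enter the computation, which is why the explicit form of the mapping into the high-dimensional space is never needed.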
The procedure of SVM classification operation is as follows21,22:
① Determine the input sample variable as \(\{ x_{i} \} \subset X = R^{n}\) and the output variable as \(y_{i} \in Y = \{ 1, - 1\}\).
② Select the optimal combination of parameters, where the kernel function is \(K\left( {x_{i} ,x} \right) = \varphi \left( {x_{i} } \right) \cdot \varphi \left( x \right)\).
③ Solve \(\min\limits_{a} \frac{1}{2}\sum\limits_{i = 1}^{L} {\sum\limits_{j = 1}^{L} {a_{i} } } a_{j} y_{i} y_{j} K\left( {x_{i} ,x_{j} } \right) - \sum\limits_{i = 1}^{L} {a_{i} }\) subject to the constraints.
④ The optimal solution \(a^{*} = (a_{1}^{*} ,a_{2}^{*} ,a_{3}^{*} , \cdots ,a_{L}^{*} )\) is obtained from the above calculation.
After mapping the data into a higher-dimensional space with a nonlinear mapping \(\varphi :R^{d} \to H\), the optimization problem can be transformed into:
Introducing Lagrange multipliers yields:
The dual objective function is:
\(K(x_{i} ,x) = \varphi (x_{i} ) \cdot \varphi (x)\) is a kernel function that maps the data implicitly before learning. The classification decision function is then:
The soft margin, with the penalty factor C and the slack variables \(\xi_{i} (\xi_{i} \ge 0)\) introduced, is optimized as:
The optimal decision function can be obtained as:
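The displayed equations for these steps are not reproduced in this text. The standard SVM formulation below is a hedged sketch that follows the same order and uses the symbols defined above (\(a_{i}\), \(y_{i}\), K, C, \(\xi_{i}\), L); the weight vector w and the bias b are notation assumed here, and the source's own equations may differ in form.

```latex
% Assumed notation: w and b are the weight vector and bias of the separating
% hyperplane in H; a_i are the Lagrange multipliers; L is the number of samples.
% Hard-margin problem after the mapping \varphi:
\min_{w,b} \ \frac{1}{2}\|w\|^{2}
\quad \text{s.t.} \quad y_i \left( w \cdot \varphi(x_i) + b \right) \ge 1, \qquad i = 1,\cdots ,L

% Lagrangian with multipliers a_i \ge 0:
\mathcal{L}(w,b,a) = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{L} a_i \left[ y_i \left( w \cdot \varphi(x_i) + b \right) - 1 \right]

% Dual objective (equivalent to the minimization in step 3 of the procedure):
\max_{a} \ \sum_{i=1}^{L} a_i - \frac{1}{2}\sum_{i=1}^{L}\sum_{j=1}^{L} a_i a_j y_i y_j K(x_i ,x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{L} a_i y_i = 0, \quad a_i \ge 0

% Decision function:
f(x) = \operatorname{sgn}\left( \sum_{i=1}^{L} a_i^{*} y_i K(x_i ,x) + b^{*} \right)

% Soft margin with penalty factor C and slack variables \xi_i:
\min_{w,b,\xi} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{L} \xi_i
\quad \text{s.t.} \quad y_i \left( w \cdot \varphi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0

% The decision function keeps the same form, with the dual constraint 0 \le a_i \le C.
```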
Optimal parameter solving
In this paper, the grid search method is chosen for parameter optimization. A fixed-step grid search23 is used: an exhaustive (brute-force) search that combines coarse and fine steps, starting with a large step size in the search space, in which all candidate points are enumerated in turn. The value ranges of c and g are both set to \([2^{ - 10} ,2^{10} ]\). The process and principle of the optimization search are shown in Fig. 2.
The support vector machine steps for the optimization of the grid search method are as follows24,25:
(1) Create a coordinate grid. Set \(X = \left[ {\begin{array}{*{20}c} {X_{1} ,X_{2} } \\ \end{array} } \right]\), \(Y = \left[ {\begin{array}{*{20}c} {Y_{1} ,Y_{2} } \\ \end{array} } \right]\). Set up the training learner, pick the step size L, enter the parameter search range, and define the grid parameter nodes \(c = 2^{X}\), \(g = 2^{Y}\).
(2) Use K-fold cross-validation to find the classification accuracy. The samples are divided into N subsets; in each round one subset serves as the test set and the remaining N-1 subsets serve as the training set used for model building. The accuracy evaluation method is then applied to obtain the classification accuracy corresponding to each set of parameters.
(3) Traverse the coordinate grid. Among all traversed parameter combinations, the one with the smallest mean square error, that is, the (c, g) combination with the highest classification accuracy, is selected to obtain the optimal trainer, and the accuracy of the optimal trainer is output.
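The three steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the MATLAB implementation used in the study: `fit_predict` and `score` are hypothetical callables supplied by the user (for example, wrappers around any SVM library), and the grid is taken over the exponents so that each node is \(c = 2^{x}\), \(g = 2^{y}\).

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Split sample indices into n_folds disjoint subsets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def cross_val_accuracy(fit_predict, X, y, c, g, n_folds=5):
    """Mean accuracy over the folds; each fold is the test set exactly once.

    fit_predict(X_train, y_train, X_test, c, g) is a hypothetical callable
    that trains a classifier and returns predicted labels for X_test.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx], c, g)
        hits = sum(p == y[i] for p, i in zip(preds, test_idx))
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)

def grid_search(score, lo, hi, step):
    """Traverse c = 2**x, g = 2**y on a fixed-step exponent grid.

    score(c, g) returns the quantity to maximize (e.g. cross-validated
    accuracy). Returns (best_c, best_g, best_score).
    """
    n = int(round((hi - lo) / step)) + 1
    exponents = [lo + i * step for i in range(n)]
    best = (None, None, float("-inf"))
    for x in exponents:
        for y in exponents:
            value = score(2.0 ** x, 2.0 ** y)
            if value > best[2]:
                best = (2.0 ** x, 2.0 ** y, value)
    return best
```

A typical call would pass `lambda c, g: cross_val_accuracy(fit_predict, X, y, c, g)` as `score`, with `lo=-10`, `hi=10` to match the search range stated above.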
Analysis of water information
Hydrogeologic conditions in the study area
The coal seams in the Zhaogezhuang Coal mine are predominantly distributed within the Upper Taiyuan Formation (Zhaoge Formation) of the Shanxi Formation (Da Miaozhuang Formation). The presence of faults on the eastern, western, southern, and northern boundaries has resulted in the uplift and exposure of the Ordovician limestone due to tectonic activity. This faulting has led to the development of intense structural karst. Consequently, the gently inclined limestone has formed troughs, and a robust karst development zone has emerged along the eastern boundary fault of the Kaiping block. The overlying Quaternary loose layers exhibit coarse particle size, exceptional permeability, and high water content, serving as a prominent conduit for groundwater movement and constituting the primary strong runoff zone in the regional groundwater system. The hydrodynamic forces are notably strong, displaying characteristics of concentrated conduit flow. Furthermore, a portion of the groundwater in the eastern part of the Shahe River basin in the Zhaogezhuang mine infiltrates the field's interior through the Leizhuang fault, with groundwater flowing from the northeast to the southwest.
The Zhaogezhuang Coal Mine has developed five major aquifer systems from the Cambrian to the Quaternary: the Cambrian aquifer, the Ordovician limestone aquifer, the coal-bearing formation sandstone aquifer, the Tangshan limestone aquifer, and the Quaternary alluvial aquifer. The Quaternary alluvial aquifer in the study area is relatively thin and has minimal impact on coal mining operations. In contrast, the Cambrian aquifer predominantly interacts with the Ordovician aquifer. Consequently, the Ordovician aquifer assumes a pivotal role in water influx incidents within the study area, particularly in cases of deep water influx. The principal contributors to these occurrences are the Ordovician limestone aquifer and the sandstone aquifers within the coal-bearing rock series. To maximize differentiation of water source types, the study selected the six most widely distributed ions in groundwater as discriminative indexes26,27: Na+, Ca2+, Mg2+, Cl–, SO42– and HCO3–. K+ was combined with Na+ due to their low variation range.
Data index extraction and collection
For data selection, the deep mining process of the Zhaogezhuang mine was primarily threatened by Ordovician carbonate water from the Ordovician aquifer, followed by goaf water damage and sandstone water damage. As a result, four water sample types were chosen: goaf water (from the I aquifer), Ordovician carbonate (from the II aquifer), sandstone fracture water from the 13 coal system (from the III aquifer), and sandstone fracture water from the 12 coal seam (from the IV aquifer section A). To screen the typical water sample data, 67 groups were selected from 19 boreholes based on the anion and cation balance test and the hydrogeological data of Zhaogezhuang. Among these groups, 18 were from goaf water, 13 from Ordovician carbonate, 17 from 13 coal seam sandstone fracture water, and 19 from 12 coal seam sandstone fracture water. The four water sample sources are indicated by I, II, III, and IV respectively. The water samples were submitted to the Testing and Analysis Center of Hebei Coalfield Geology Bureau for chemical analysis. The water quality testing report provided analysis of the main ions and the total hardness (TH) using ion chromatography. Additionally, the bicarbonate ion (HCO3–) and total alkalinity (TA) were determined by titration with dilute sulfuric acid and methyl orange indicator. The pH value was measured using a pH tester. Subsequently, the data on the nine discriminant indices of the mine water were organized and presented in Table 1 (attached).
Of the 67 sets of typical water sample data collected from the Zhaogezhuang mining area, 56 were used as training samples for the learning machine, as shown in Table 2 (attached), while the remaining 11 sets were reserved as test samples, labeled G1 to G11, as presented in Table 3. The distribution of anion and cation content was illustrated using three-dimensional diagrams, with the cation content distribution depicted in Fig. 3 and the anion content distribution shown in Fig. 4.
Water chemistry characterization
Analysis of statistical characteristic values
The statistical characteristic values of water chemistry were calculated and analyzed from the ion content of the 67 groups of water samples from Zhaogezhuang mine. In the study area, the goaf water differs clearly from the other three types of water samples in ionic composition. Among the anions of the goaf water, SO42– has the highest content, 78.022 mmol·L–1, whereas HCO3– is the dominant anion in the other water samples. The goaf water is therefore easier to identify than the other three water sources: a sample whose dominant anion is SO42– can be initially classified as goaf water. Among the cations, Ca2+ has the highest content in all four types of water samples. In addition, across all water samples the contents of Ca2+ and HCO3– are higher than those of the other ions, which indicates that Ca2+ and HCO3– have strong recognition ability.
The goaf water
The hydrochemical indices of the goaf water are shown in Table 4. The water chemical composition of the four water sample types from Zhaogezhuang differed significantly, and their concentrations were related to the water source cycle. In the goaf water, SO42– had the highest concentration among the anions, ranging from 60.47 mmol·L–1 to 85.55 mmol·L–1 and accounting for 78% of the anions, followed by HCO3–; Cl– had the lowest concentration. The cations were mainly Ca2+ and Mg2+, with Na+ having the lowest concentration. The coefficient of variation is the ratio of the standard deviation to the mean and indicates the degree of dispersion of the data. The coefficient of variation of Cl– was the largest at 0.9, followed by Na+ at 0.41, and the rest were smaller, indicating that the concentrations of these two ions are poorly uniform in the water.
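The coefficient of variation defined above is straightforward to compute. The sketch below uses the sample standard deviation, which is an assumption (the text does not say whether the sample or population form was used), and the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Ratio of the sample standard deviation to the mean of a list of concentrations."""
    return statistics.stdev(values) / statistics.mean(values)
```

Applied to the concentrations of one ion across the samples of one water source, a value near 0 indicates uniform concentrations, while values such as the 0.9 reported for Cl– indicate strong dispersion.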
Ordovician carbonate
The hydrochemical indices of the Ordovician carbonate are shown in Table 5. The pH of the Ordovician carbonate is 7.30–7.94, which is weakly alkaline. HCO3– and SO42– together account for 86.6% of the anions, and the cation concentrations follow the order Ca2+ > Mg2+ > Na+, with Ca2+ and Mg2+ accounting for 92.88%; the water chemistry type is Ca-Mg-HCO3. The coefficients of variation of the Ordovician carbonate follow the order SO42– > Cl– > Na+ > Mg2+ > HCO3– > Ca2+, and all six are less than 0.5. The coefficients of variation of the anions Cl–, SO42– and HCO3– are generally greater than those of the cations Na+, Mg2+ and Ca2+.
Sandstone fracture water from the 13 coal system
The hydrochemical indices of the sandstone fracture water from the 13 coal system are shown in Table 6. Among the anions in the fracture water of the 13 coal sandstone, HCO3– has the highest concentration, up to 79.58 mmol·L–1, while the contents of SO42– and Cl– are lower; among the cations, Ca2+ has the highest concentration, followed by Mg2+. The coefficients of variation of the 13 coal system sandstone fracture water differ little except for Na+, which is less than 0.1, and the ion concentrations are relatively uniformly distributed.
Sandstone fracture water from the 12 coal system
The anions in the fracture water of the 12 coal seam sandstone are mainly HCO3–, with a mean concentration of 71.79 mmol·L–1. The cations are dominated by Ca2+, up to 64.36 mmol·L–1, followed by Mg2+ with a mean concentration of 32.57 mmol·L–1, and finally Na+. The coefficients of variation of the sandstone fracture water in the 12 coal seam follow the order Mg2+ > Ca2+ > Na+ > Cl– > HCO3–, and the coefficient of variation of Mg2+ is as high as 0.69.
The hydrochemical indices of the sandstone fracture water from the 12 coal system are shown in Table 7. To study the hydraulic connection between individual aquifers, the degree of connection K between them can be calculated quantitatively28,29. Since the Cl– concentration is minimally disturbed by other factors and is mainly influenced by the formation itself, the degree of hydraulic connection between two aquifers can be obtained from the difference between their average Cl– concentrations. If K is less than 0.2, the two aquifers have a strong hydraulic connection; if K is between 0.2 and 0.4, the hydraulic connection is moderate; if K is greater than 0.4, the hydraulic connection is weak30,31.
Here Cl1 is the average Cl– concentration in aquifer 1 and Cl2 is the average Cl– concentration in aquifer 2.
According to Eq. (16), the K values between the Ordovician carbonate and each of the goaf water, the sandstone fracture water of the 13 coal system and the sandstone fracture water of the 12 coal system are all 0.25, which indicates a moderate degree of hydraulic connection. The K value between the goaf water and the sandstone fracture water of the 13 coal system is 0.025, and that between the goaf water and the sandstone fracture water of the 12 coal seam is 0.03; by the thresholds above, both indicate a strong hydraulic connection. The K value between the sandstone fracture water of the 13 coal system and that of the 12 coal seam is 0.001, which indicates a very strong hydraulic connection. In summary, there is a definite hydraulic connection between the goaf water and the other aquifers, which increases the difficulty of discrimination.
Piper trilinear diagram analysis
The hydrogeological conditions in Zhaogezhuang Coal Mine are characterized by complexity and variability. As demonstrated by the previous analysis of the goaf water composition and other water sources, they exhibit distinguishable differences. To further investigate the distribution patterns of aquifer water samples, the Piper trilinear diagram method was employed for analysis. The ion contents were represented as points on the diagram, allowing for inference of the water chemistry type and quality pattern of the aquifer based on the scatter position of the water samples.
The water samples of the study area were plotted for hydrochemical analysis on the Piper trilinear diagram shown in Fig. 5. The goaf water is located in the upper right corner, near Ca2+, Mg2+ and SO42–, Cl–; it is mainly of the Ca·Mg-Cl·SO4 type, with individual samples of the Ca·Mg-SO4 type. The Ordovician carbonate water samples are located on the left of the diamond-shaped area, and the water quality type is Ca·Mg-HCO3. In the left triangle, the cations of the Ordovician carbonate samples are mainly Mg2+ and Ca2+; in the right triangle, the anions are mainly HCO3– and SO42–. The sandstone fracture water from the 13 coal system is located in the middle and left; the cations plot mainly near Ca2+, the anions are scattered toward the end members with high proportions of HCO3– and SO42–, and the water quality type is Ca·Mg-HCO3. The sandstone fracture water samples from the 12 coal system are highly similar to those from the 13 coal system in the trilinear diagram: the water chemistry type is Ca·Mg-HCO3, the cations are mainly Ca2+ and Mg2+, and the anions are mainly HCO3– and CO32–. In summary, the water quality types of the Ordovician carbonate and of the sandstone fracture water from the 13 and 12 coal seams are the same, with overlapping characteristics and indistinct distribution boundaries, so further quantitative discrimination is needed.
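The coordinates plotted in the two triangles of a Piper diagram are the relative shares of the major cations and anions in equivalent units. A minimal sketch of that calculation follows; the function name and the sample values are hypothetical, input is assumed to be in mmol/L and is converted to meq/L by multiplying by the ionic charge, and Na here stands for Na+ with K+ combined as described earlier.

```python
# Absolute ionic charges used to convert mmol/L to meq/L.
CHARGE = {"Ca": 2, "Mg": 2, "Na": 1, "Cl": 1, "SO4": 2, "HCO3": 1}

def piper_percentages(mmol):
    """Return (cation %, anion %) dictionaries in milliequivalent percent.

    mmol: dict mapping ion name to concentration in mmol/L (assumed unit).
    """
    meq = {ion: mmol[ion] * CHARGE[ion] for ion in CHARGE}
    cation_total = sum(meq[i] for i in ("Ca", "Mg", "Na"))
    anion_total = sum(meq[i] for i in ("Cl", "SO4", "HCO3"))
    cations = {i: 100.0 * meq[i] / cation_total for i in ("Ca", "Mg", "Na")}
    anions = {i: 100.0 * meq[i] / anion_total for i in ("Cl", "SO4", "HCO3")}
    return cations, anions

# Hypothetical sample, for illustration only.
sample = {"Ca": 3.0, "Mg": 1.5, "Na": 1.0, "Cl": 1.0, "SO4": 1.0, "HCO3": 7.0}
cation_pct, anion_pct = piper_percentages(sample)
```

A sample whose largest shares fall on Ca2+, Mg2+ and HCO3– would plot toward the left of the diamond, which is where the Ca·Mg-HCO3 waters described above cluster.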
Model building and application
Dimensionality reduction based on R-factor
Before the operation, the data are normalized to the interval [0, 1] to make the indicators comparable and to ensure the stability of the calculation. The normalized water sample data are shown in Table 8 (attached).
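One common way to map each indicator onto [0, 1] is min-max scaling. The sketch below assumes that this is the normalization intended (the text does not name the formula), and the function name is hypothetical.

```python
def min_max_normalize(column):
    """Scale a list of values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant indicator carries no information; map it to 0 everywhere.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

The function would be applied to each of the six ion columns separately, so that indicators with different concentration ranges become comparable.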
There is a non-linear association between the indicators. To reduce the correlation between the data, the optimal number of common factors for the six indicators (sodium, calcium, magnesium, chloride, sulfate, and bicarbonate ions) was determined to be 3, denoted as Y1, Y2, and Y3. SPSS software was used to analyze the 67 groups of samples and 6 evaluation indicators of Zhaogezhuang following the R-type factor calculation steps. The eigenvalues and contribution rates of the main factors are shown in Table 9.
The cumulative contribution rate of the first three principal factors reaches 96.660%, which indicates that the factors extracted by dimensionality reduction contain 96.660% of the information of the original index data. When the cumulative contribution rate reaches 80%, it shows that the extracted principal factors are reasonable and effective, which indicates that these three principal factors cover most of the water chemistry information and can effectively replace the original indexes.
The factor correlation matrix is as follows:
A correlation coefficient above 0.8 in absolute value indicates a strong correlation, one between 0.3 and 0.8 a moderate correlation, and one below 0.3 no correlation. The correlation coefficient between Na+ and Ca2+ is −0.416, indicating a moderate correlation, while those between Na+ and Mg2+ (−0.167), Cl– (0.231), SO42– (−0.104), and HCO3– (0.080) all indicate no correlation. The correlation coefficient between Ca2+ and Mg2+ is −0.799, a moderate correlation by the same thresholds, and Ca2+ is weakly correlated with the other ions. Similarly, Mg2+ is not correlated with Na+ and is weakly correlated with the other ions, while Cl– and SO42– are strongly correlated, as are SO42– and HCO3–.
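The thresholds stated above can be applied mechanically to each entry of the correlation matrix. The following sketch computes a Pearson coefficient and labels it; applying the thresholds to the absolute value is an assumption made here, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_strength(r):
    """Label a coefficient using the 0.3 and 0.8 thresholds on |r|."""
    magnitude = abs(r)
    if magnitude > 0.8:
        return "strong"
    if magnitude >= 0.3:
        return "moderate"
    return "none"
```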
Using the maximum variance orthogonal rotation method, SPSS rotates to obtain the rotated component matrices. The factor loading matrix and the rotated component matrix were:
The component conversion matrix is:
Three new main components Y1, Y2, and Y3 were extracted, and the factor score coefficient matrix based on SPSS operations was as follows:
According to the factor score coefficient matrix, the expressions of the main factors Y1, Y2, and Y3 are:
The original data of water samples (I), water samples (II), water samples (III), and water samples (IV) from Zhaogezhuang mine were substituted into the model expressions of the three main factors Y1, Y2, and Y3, and the factor score matrices were as follows:
R-SVM model establishment
The R-SVM model is shown in Fig. 6. First, the R-factor is used to reduce the dimensionality of the data. The three common factors Y1, Y2, and Y3 are then used as the input variables of the model, and the four types of water sources H are used as the output, establishing the mapping \(F({\text{Y1,Y2,Y3}}) \to H\), which automatically captures the complex connections between the input variables and the types of water sources. The grid search method is used to find the optimal combination of parameters for the Support Vector Machine model. The training set data are then used to train the model, and the trained model is used to predict the water sample types of the testing set. The predicted types are compared with the actual types to correct for any deviations. This process is repeated until the model achieves a satisfactory level of accuracy in predicting the types of water samples.
Parameter search and model application
Six indicators (sodium, calcium, magnesium, chloride, sulfate and bicarbonate ions) are used as input variables of the SVM, and the four water source types (goaf water, Ordovician carbonate, sandstone fracture water from the 13 coal system and sandstone fracture water from the 12 coal system) are used as outputs of the model, so that the SVM establishes the mapping between the two and captures their nonlinear relationship. First, the 56 sets of training samples and 11 sets of prediction samples are substituted into the grid search method to search for the parameters. The value ranges of the parameters c and g are set to \({\text{c}} \in \left[ {2^{ - 10} ,2^{10} } \right]\) and \({\text{g}} \in \left[ {2^{ - 10} ,2^{10} } \right]\), with step size L = 0.2, according to the operation process of the SVM.
The three common factors of Zhaogezhuang obtained after dimensionality reduction were used as the input variables of the model, and the four water source types of Zhaogezhuang mine (goaf water, Ordovician carbonate, sandstone fracture water from the 13 coal system, and sandstone fracture water from the 12 coal system) were used as the outputs, establishing the mapping between the common factors and the water source types. The factor scores of the 67 sets of sample data after dimensionality reduction were substituted into the grid-search SVM model to find the best model for training, and the best parameter combination, c = 1 and g = 2.8284, was finally obtained. The result of the optimization search is shown in Fig. 7.
Substituting c = 1 and g = 2.8284 into the SVM model, the type attributes were predicted for the 11 sets of data to be discriminated, and the final results are shown in Fig. 8 and Table 10. The model misjudged one sample, classifying a Type II Ordovician carbonate sample as Type III sandstone fracture water from the 13 coal system, and classified the other 10 samples correctly. This indicates that the model is suitable for water source discrimination in Zhaogezhuang Coal Mine and can distinguish the water sources effectively.
Table 11 presents a comparative analysis of model performance across different optimization types. The accuracy and precision metrics were employed to evaluate the models' efficacy. The Fisher optimization type exhibits the lowest performance in terms of accuracy and precision. The Grid optimization type shows a significant improvement in both accuracy and precision compared to the Fisher type. Notably, the R-type grid optimization type demonstrates the highest level of performance, surpassing both the Fisher and Grid types in terms of accuracy and precision.
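The two metrics used in the comparison can be computed as in the sketch below. The label lists are hypothetical: only the totals (11 test samples, one Type II sample predicted as Type III) come from the text, and the per-class composition of the test set is assumed for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_per_class(y_true, y_pred):
    """For each label: correct predictions of that label / all predictions of it."""
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        truths = [t for t, p in zip(y_true, y_pred) if p == label]
        result[label] = (sum(t == label for t in truths) / len(truths)
                         if truths else float("nan"))
    return result

# Hypothetical test set of 11 samples with one II -> III misjudgment.
y_true = ["I"] * 3 + ["II"] * 2 + ["III"] * 3 + ["IV"] * 3
y_pred = ["I"] * 3 + ["II", "III"] + ["III"] * 3 + ["IV"] * 3
test_accuracy = accuracy(y_true, y_pred)
test_precision = precision_per_class(y_true, y_pred)
```

With 10 of 11 samples correct, the accuracy is 10/11, which corresponds to the 90.90% reported for the R-SVM model; the single misjudgment lowers only the precision of the class that received the wrong prediction.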
The coupled R-SVM discriminant model characterized the water sources more effectively and in a more targeted way than the other models compared in Table 11. The R-factor simplification supplies the common factors as new discriminants, reducing the redundancy among the model inputs. The coupled R-SVM discriminant model can also complement the qualitative analysis of water chemistry and provide rapid identification of water sources.
Conclusion
For Zhaogezhuang Coal Mine, a coal mine with submarine mining, identifying and predicting the source of mine water inrush is of great significance to safe and efficient production, and preventing and controlling water inrush requires that the mine water source be identified effectively and accurately. By analyzing the water source data from different parts of the mine, an effective water source discrimination model was established and its effectiveness and practicability were verified. The conclusions of the study are as follows:
(1) The chemical composition data of 67 water samples from Zhaogezhuang Coal Mine were collected. According to the chemical composition analysis of the selected mine water sources, the main ions identified were Na+, Ca2+, Mg2+, Cl–, SO42– and HCO3–. The water inrush sources in the mining area were divided into four categories: goaf water (type I), Ordovician carbonate (type II), sandstone fracture water from the 13 coal system (type III), and sandstone fracture water from the 12 coal system (type IV). The analysis and comparison of this water source information supports the establishment of the water source discrimination model.
(2) R-type factor analysis was used to reduce the dimensionality of the original data, yielding three common factors (Y1, Y2, and Y3) and the factor scores of the water samples. This reduction of the indicator attributes filtered out redundant features and improved efficiency.
(3) The coupled R-SVM model achieved a classification accuracy of 90.90% in water source discrimination for the Zhaogezhuang mine. Compared with traditional qualitative approaches, the model explores the internal laws of the data and provides accurate discrimination, improving upon both the Fisher discriminant function and the SVM model used alone.
Data availability
The data used to support the findings of this research are included within the paper.
Acknowledgements
This research was funded by the National Emergency Management System Construction Project (grant 20VYJ061), the Construction and Empirical Research on Early Warning Index System of Major Engineering Safety Risks Based on Optimal Control Theory, National Natural Science Foundation of China (grant 71271031), the Innovation Fund for Doctoral Students of Beijing University of Posts and Telecommunications (grant CX2023102), and the Graduate Innovation and Entrepreneurship Project (2024-YC-A180).
Author information
Contributions
Q.Z. performed the data analyses and wrote the manuscript; C.W. provided research funding support; Y.Y. contributed significantly to analysis and manuscript preparation; W.L. performed the experiment and data analyses; Y.Z. helped perform part of the finite element analysis. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zheng, Q., Wang, C., Yang, Y. et al. Identification of mine water sources using a multi-dimensional ion-causative nonlinear algorithmic model. Sci Rep 14, 3305 (2024). https://doi.org/10.1038/s41598-024-53877-5