Application of improved and optimized fuzzy neural network in classification evaluation of top coal cavability

Longwall top coal caving is one of the main methods of thick coal seam mining in China, and the classification evaluation of top coal cavability in a longwall top coal caving working face is of great significance for improving coal recovery. However, the empirical and numerical simulation methods currently used to evaluate top coal cavability suffer from high cost and low efficiency. Therefore, in order to improve evaluation efficiency and reduce evaluation cost, and according to the characteristics of the classification evaluation of top coal cavability, this paper improves and optimizes the fuzzy neural network developed by Nauck and Kruse and establishes a fuzzy neural network prediction model for the classification evaluation of top coal cavability. At the same time, to ensure that the optimized and improved fuzzy neural network retains the global approximation ability that a neural network should have, its global approximation is verified. Data collected from the database of published papers in CNKI are then used as sample data to train, validate, and test the established fuzzy neural network model. Finally, the tested model is applied to the classification evaluation of top coal cavability in the 61107 longwall top coal caving working face of Liuwan Coal Mine. The evaluation result is that the top coal cavability of this working face is grade II, consistent with engineering practice.


An improved fuzzy neural network and verification of its global approximation
An improved fuzzy neural network. The fuzzy neural network developed by Nauck and Kruse is composed of an input layer, a hidden (rule) layer, and an output layer, in which the first layer fuzzifies the crisp input parameters, the second layer encodes the preconditions of the rule set, and the third layer performs defuzzification, as shown in Fig. 1. This paper improves the network under this framework and changes it into a fuzzy neural network composed of an input layer, a membership generation layer, a reasoning layer, an activation layer, and an output layer, as shown in Fig. 2. The details of each layer are as follows:
Layer 1 This layer is the input layer; crisp independent variable values are input here.
Layer 2 This layer is the membership generation layer. The crisp independent variable values passed in from layer 1 are fuzzified by the membership functions set in this layer. Each crisp value is transformed into the corresponding fuzzy membership value, completing the fuzzification.
Layer 3 This layer is the reasoning layer. The fuzzy membership values passed in from layer 2 undergo conjunction (algebraic product) calculations to form the corresponding fuzzy rules. Therefore, each neuron in this layer is a fuzzy rule, and the output of each neuron is the activation strength of the associated rule.
Layer 4 This layer is the activation layer, which combines the activation strengths of all rules passed in from layer 3 to form an estimate for each class, achieving defuzzification and thereby obtaining a crisp value. The activation function then processes this crisp value to obtain the label code of the corresponding predicted category.
Layer 5 This layer is the output layer. The final output value is obtained by inverse-coding the prediction-category code passed in from layer 4 (in classification problems, the target value is generally one-hot encoded during preprocessing).
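The coding and inverse-coding step of layers 4 and 5 can be illustrated with a minimal sketch. The paper's implementation is in MATLAB; this Python fragment is only an assumed illustration of one-hot encoding of the four cavability grades and its inverse.

```python
import numpy as np

# Hypothetical sketch: one-hot encoding of the cavability grades I-IV and the
# inverse coding performed by the output layer (layer 5).
GRADES = ["I", "II", "III", "IV"]

def one_hot(grade: str) -> np.ndarray:
    """Encode a grade label as a one-hot vector (preprocessing step)."""
    v = np.zeros(len(GRADES))
    v[GRADES.index(grade)] = 1.0
    return v

def inverse_code(vector: np.ndarray) -> str:
    """Layer 5: recover the grade label from a (predicted) one-hot vector."""
    return GRADES[int(np.argmax(vector))]

label = inverse_code(one_hot("II"))  # -> "II"
```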
The improved fuzzy neural network adopts the Gaussian membership function, because studies 17,18 show that taking the Gaussian function as the membership function of a fuzzy neural network generally yields a model with good performance. The membership function adopted is shown in Eq. (1).
where u ij is the fuzzy membership value of the j-th node corresponding to the i-th variable; x i is the i-th input variable; m ij is the membership function cluster center of the j-th node corresponding to the i-th variable; σ ij is the membership function width of the j-th node corresponding to the i-th variable.
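The Gaussian membership function of Eq. (1) takes the standard form u ij = exp(−(x i − m ij)² / (2σ ij²)). The paper's implementation is in MATLAB; the following Python fragment is only an illustrative sketch of this computation for one input vector, with hypothetical centers and widths.

```python
import numpy as np

def gaussian_membership(x, m, sigma):
    """Eq. (1): u_ij = exp(-(x_i - m_ij)^2 / (2 * sigma_ij^2)).

    x: length-n input vector; m, sigma: (n, j) arrays of cluster centers
    and widths (one column per membership node)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)   # broadcast over nodes j
    m = np.asarray(m, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.exp(-((x - m) ** 2) / (2.0 * sigma ** 2))

# One variable, three membership nodes with assumed centers 0.2, 0.5, 0.8.
m = np.array([[0.2, 0.5, 0.8]])
sigma = np.full((1, 3), 0.2)
u = gaussian_membership([0.5], m, sigma)
```

An input sitting exactly on a cluster center receives full membership 1; membership decays smoothly with distance from the center.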
The defuzzification function of the network activation layer is the sum, over the reasoning-layer nodes, of the products of each node's activation strength and its weight, as shown in Eqs. (2) and (3). The defuzzification function is a second improvement in addition to the network structure: the defuzzification function of a general fuzzy neural network takes a form similar to Eq. (4), which involves complex calculation. Therefore, in order to reduce the calculation difficulty and realize fast calculation, the network's defuzzification function is improved.
where w i is the weight of the i-th node in the reasoning layer; ψ i is the algebraic product of the membership degrees of the i-th node in the reasoning layer; u ij is the membership degree of the j-th node corresponding to the i-th variable; n is the number of variables; m is the number of reasoning-layer nodes.
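The improved defuzzification of Eqs. (2) and (3), y = Σ w i ψ i with ψ i = Π u ij, can be sketched as follows. This is an illustrative Python fragment (the paper's code is MATLAB), and it assumes for simplicity that rule i takes the product of the memberships in column i of the membership matrix; the paper does not spell out the exact wiring between the two layers.

```python
import numpy as np

def defuzzify(u, w):
    """Sketch of Eqs. (2)-(3).

    u: (n_vars, n_rules) membership matrix, assuming rule i uses column i;
    w: (n_rules,) reasoning-layer weights.
    psi_i = prod_j u[j, i]   (Eq. 3, rule activation strength)
    y     = sum_i w_i psi_i  (Eq. 2, crisp value)"""
    psi = np.prod(u, axis=0)          # activation strength of each rule
    return float(np.dot(w, psi)), psi

u = np.array([[1.0, 0.5],
              [1.0, 0.5]])            # two variables, two rules
y, psi = defuzzify(u, w=np.array([2.0, 4.0]))
```

Compared with the form of Eq. (4), this weighted sum avoids the normalization step, which is the fast-calculation improvement the text describes.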
The classification evaluation of top coal cavability is in essence a multi-classification problem, and the softmax function is a typical and very mature activation function in neural networks for multi-classification problems 19 . Therefore, the softmax function is used to activate the crisp value obtained after defuzzification, and the final prediction value is then obtained. The softmax function is shown in Eq. (5).
where y t,k is the output value of the k-th neuron of the activation layer for the t-th sample. The softmax cross-entropy loss function can measure the similarity between the predicted and actual values in neural networks for multi-classification problems 20 . Therefore, the softmax cross-entropy loss function is used as the loss function of this paper, as shown in Eq. (6).
where ŷ is the predicted label vector; y* is the actual label vector.
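The softmax activation of Eq. (5) and the cross-entropy loss of Eq. (6) are standard; a minimal Python sketch (illustrative only, not the paper's MATLAB implementation) is:

```python
import numpy as np

def softmax(z):
    """Eq. (5): softmax over the activation-layer outputs.
    Shifting by max(z) is a common numerical-stability trick; it does not
    change the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y_pred, y_true):
    """Eq. (6): cross-entropy between predicted probabilities and a one-hot
    target vector (small epsilon guards against log(0))."""
    return float(-np.sum(y_true * np.log(y_pred + 1e-12)))

p = softmax(np.array([2.0, 1.0, 0.1, -1.0]))      # four grade scores
loss = cross_entropy(p, np.array([1.0, 0.0, 0.0, 0.0]))
```

The softmax output is a probability vector over the four grades, and the loss is smallest when the predicted probability of the true grade approaches 1.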
The reverse adjustment gradients of the network are shown in Eqs. (7)-(9), where L is the loss function; y t,k is the output value of the k-th neuron of the activation layer for the t-th sample; y * is the actual label vector; η is the learning rate; u ij is the membership degree of the j-th node corresponding to the i-th variable; x i is the i-th input variable; m ij is the membership function cluster center of the j-th node corresponding to the i-th variable; σ ij is the membership function width of the j-th node corresponding to the i-th variable; ψ i is the algebraic product of the membership degrees of the i-th node in the reasoning layer; w ij is the weight; n is the number of variables.
Verification of the global approximation of the improved fuzzy neural network. Both artificial and fuzzy neural networks are universal approximators 21,22 , i.e., they both have global approximation ability. Therefore, it is necessary to demonstrate the global approximation of the proposed improved fuzzy neural network, to prove that the modified network is still a neural network and possesses its properties. The Stone-Weierstrass theorem is an effective tool for proving the global approximation of a network 23 , so it is adopted here. To prove the global approximation of the improved fuzzy neural network, it is only necessary to prove that the output function y satisfies the three lemmas of the Stone-Weierstrass theorem 24 : (1) (y, d ∞ ) is an algebra, i.e., y is closed under addition, multiplication, and scalar multiplication.
To prove that (y, d ∞ ) is an algebra, assume that y 1 , y 2 ∈ y. Since both u ji1 (x j ) and u ji2 (x j ) are Gaussian, their product is still Gaussian. Therefore, y 1 (x)⋅y 2 (x) takes a form equivalent to Eq. (2), and y 1 (x)⋅y 2 (x) ∈ y is proved.
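The key step, that the product of two Gaussian membership functions is again Gaussian (up to a constant factor), is a standard identity; it is written out here for completeness:

```latex
\exp\!\Big(-\frac{(x-m_1)^2}{2\sigma_1^2}\Big)\,
\exp\!\Big(-\frac{(x-m_2)^2}{2\sigma_2^2}\Big)
= C\,\exp\!\Big(-\frac{(x-m)^2}{2\sigma^2}\Big),
\quad\text{where}\quad
\sigma^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2},\qquad
m = \frac{m_1\sigma_2^2 + m_2\sigma_1^2}{\sigma_1^2+\sigma_2^2},\qquad
C = \exp\!\Big(-\frac{(m_1-m_2)^2}{2(\sigma_1^2+\sigma_2^2)}\Big).
```

Since the constant C can be absorbed into the rule weight, the product remains of the form of Eq. (2).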
At the same time, for any c ∈ R it is easy to obtain an expression equivalent to Eq. (2), so cy 1 (x) ∈ y. Similarly, y 1 (x) + y 2 (x) is also equivalent in form to Eq. (2), so y 1 (x) + y 2 (x) ∈ y. Based on the above, it is proved that (y, d ∞ ) is an algebra. (2) (y, d ∞ ) separates points on U, i.e., for any given x 0 , y 0 ∈ U with x 0 ≠ y 0 , y(x 0 ) ≠ y(y 0 ).
To prove that (y, d ∞ ) separates points on U, suppose two fuzzy sets are defined on the i-th subspace of U with their corresponding membership functions, and two fuzzy sets (B 1 , u B1 1 ) and (B 2 , u B2 1 ) are defined on the output universe R with their corresponding membership functions. In addition, suppose the fuzzy rule base is composed of two fuzzy rules, i.e., m = 2. Based on these assumptions, since x 0 ≠ y 0 implies x 0 i ≠ y 0 i for some i, the situation of Eq. (20) does not occur. In this case, when w 1 ≠ w 2 , there must be y(x 0 ) ≠ y(y 0 ), so (y, d ∞ ) separates points on U. (3) (y, d ∞ ) vanishes at no point of U: for any x 0 ∈ U, a function y ∈ y with positive weights satisfies y(x 0 ) > 0, because Gaussian membership values are strictly positive.
The above proof shows that the improved fuzzy neural network satisfies the three lemmas of the Stone-Weierstrass theorem, i.e., the improved fuzzy neural network has global approximation ability. This also shows that the improved fuzzy neural network is still a neural network and has the properties of a neural network.

Influencing factors of top coal cavability and its evaluation grade division
Influencing factors of top coal cavability. The two main categories of factors that affect top coal cavability are geological factors and mining technology factors. Of these, geological factors occupy the dominant position, because they often determine the technical means that should be adopted in longwall top coal caving mining 25 . Therefore, this paper mainly studies top coal cavability under geological factors. According to engineering practice, the burial depth of the coal seam (H), the thickness of the coal seam (M), the thickness of the gangue (m j ), the uniaxial compressive strength of the coal (Rc), the degree of crack development (DN, i.e., the product of the number of through fractures N per 1 m 2 of coal surface and the fractal dimension D 1 of the distribution of the number of fractures counted on coal specimens), and the filling coefficient of the direct roof (k, k = Σhk p /M) are important geological factors affecting top coal cavability 26 . Therefore, this paper takes the above factors as the influencing factors for evaluating top coal cavability.
Classification of top coal cavability. The classification of top coal cavability is itself a fuzzy, complex, and uncertain problem. However, this paper aims to verify the applicability and superiority of the improved and optimized fuzzy neural network in the classification evaluation of top coal cavability; therefore, classification standards and methods of top coal cavability are not discussed here. Cavability is simply classified into grades I, II, III, and IV based on the broad experience of field workers. The specific conditions of each grade are shown in Table 1.

Data and data preprocessing
Since sample data are not easy to obtain, the sample data in this paper are collected from the database of published papers in CNKI, yielding 61 groups. However, some of the obtained samples have missing parameters. Considering the scarcity of sample data, this paper fills the missing parameters with the average value within the same grade. At the same time, to minimize the influence of outliers on the model, the Mahalanobis distance method 27 is used to detect outliers, and samples detected as outliers at a confidence level of 0.5% are eliminated. After processing, 60 groups of data samples are obtained, as shown in attached Table 2.
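The two preprocessing steps described above, grade-wise mean imputation and Mahalanobis-distance outlier screening, can be sketched as follows. This is an illustrative Python fragment with toy data (the paper's processing is not published as code); the function names and the pseudo-inverse fallback are assumptions of this sketch.

```python
import numpy as np

def impute_by_grade(X, grades):
    """Fill missing entries (NaN) in X with the mean of the same cavability
    grade, column by column."""
    X = X.copy()
    for g in np.unique(grades):
        rows = grades == g
        col_means = np.nanmean(X[rows], axis=0)
        idx = np.where(np.isnan(X) & rows[:, None])
        X[idx] = np.take(col_means, idx[1])
    return X

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each sample to the sample mean;
    pinv is used so a near-singular covariance does not crash the sketch.
    Samples whose distance exceeds the chosen chi-square cutoff would be
    flagged as outliers."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d = X - mu
    return np.einsum('ij,jk,ik->i', d, cov_inv, d)

# Toy example: one missing value, filled with the within-grade column mean.
X = np.array([[1.0, np.nan],
              [3.0, 4.0]])
grades = np.array(["I", "I"])
X_filled = impute_by_grade(X, grades)
d2 = mahalanobis_sq(np.array([[0., 0.], [1., 1.], [2., 2.], [0., 1.]]))
```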

Fuzzy neural network design and its training and test
Fuzzy neural network design. Network design generally consists of several parts: the number of network layers, the number of neurons in each layer, anti-overfitting design, optimization design to improve convergence speed, initial value settings, and other hyperparameter settings. Therefore, according to the needs of the classification evaluation of top coal cavability and the characteristics of the obtained data, the network in this paper is designed according to the improved fuzzy neural network principle.
(1) Network layer design The number of network layers is closely related to the prediction output error. Theory and practice show that the prediction output error continues to decrease as the number of network layers increases. Nevertheless, with more layers the neural network model also becomes more complex, increasing the training time. Studies 28 have shown that increasing the number of neurons in the hidden layer is an easier way to achieve a good training effect than increasing the number of network layers, and it can also reduce training time to a certain extent. This paper aims to obtain a good training effect while spending less training time. Therefore, the designed network is a simple fuzzy neural network with five layers: the input layer, membership generation layer, reasoning layer, activation layer, and output layer.
(2) Setting the number of neurons in each layer The geological factors affecting top coal cavability mainly include burial depth, coal seam thickness, gangue thickness, uniaxial compressive strength of coal, degree of crack development, and filling coefficient of the direct roof, and the obtained data parameters are mainly these as well. Therefore, there are six neurons in the input layer. The membership generation layer and reasoning layer belong to the hidden layers. Studies have shown that increasing the number of hidden neurons can effectively reduce the training error, but not indefinitely: once the number of hidden neurons exceeds a certain value, the error no longer decreases but instead increases, and the network loses its generalization ability; conversely, if the number of hidden neurons is too small, the training error cannot be reduced sufficiently. Therefore, according to the empirical Eq. (21) 29 , this paper sets the number of neurons in the membership generation layer to 18 and the number of neurons in the reasoning layer to 6. Since top coal cavability has 4 grades, the number of neurons in the activation layer is set to 4. In addition, only one parameter (the top coal cavability grade) needs to be output, so the output layer has one neuron.
where N h is the number of hidden layer neurons; N is the number of neurons in the input layer.
(3) Anti-overfitting design and accelerated network convergence design Neural networks generally encounter two problems during training: overfitting and slow convergence, so corresponding measures must be taken. For the overfitting problem, regularization generally works well. For slow convergence, a variable learning rate and the Adam (Adaptive Moment Estimation) optimization algorithm generally achieve good results and significantly improve the network's convergence speed 30 . Therefore, this paper adopts regularization to prevent overfitting during training, and uses the Adam optimization algorithm together with an exponentially decaying learning rate to achieve rapid convergence. A regular term is added after the loss function, as shown in Eq. (22); the network reverse adjustment gradients and the Adam algorithm flow are shown in Eqs. (23)-(33).
where Δw ij , Δm ij , and Δσ ij are the initial network adjustment gradients; n is the total number of samples; η 0 is the initial learning rate; η t is the real-time learning rate; t is the number of iterations; β is the decay index, which is 0.9; ϖ is the constant that prevents the learning rate from decaying to 0, which is 10 −5 ; λ is the regularization parameter; κ 1 , κ 2 , ε, and α are hyperparameters, which are 0.9, 0.99, 10 −8 , and 0.001, respectively; V ∆wij , V ∆mij , V ∆σij are intermediate quantities with initial values of 0; V Corrected ∆wij , V Corrected ∆mij , V Corrected ∆σij are correction quantities; ∆w′ ij , ∆m′ ij , ∆σ′ ij are the final network adjustment gradients.
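The Adam update with an exponentially decaying learning rate can be sketched as follows. This is an illustrative Python fragment, not the paper's MATLAB code; the exact decay formula η t = η 0 β^t + ϖ is a plausible reading of the description (the paper's Eqs. (23)-(33) are not reproduced in the text), and the hyperparameter values follow the paper (κ 1 = 0.9, κ 2 = 0.99, ε = 10⁻⁸).

```python
import numpy as np

def exp_decay_lr(eta0, t, beta=0.9, varpi=1e-5):
    """Assumed exponential decay: eta_t = eta0 * beta^t + varpi, where varpi
    keeps the learning rate from decaying all the way to 0."""
    return eta0 * beta ** t + varpi

class Adam:
    """Minimal Adam (Adaptive Moment Estimation) update for one parameter."""
    def __init__(self, kappa1=0.9, kappa2=0.99, eps=1e-8):
        self.k1, self.k2, self.eps = kappa1, kappa2, eps
        self.m = self.v = 0.0   # first and second moment estimates
        self.t = 0              # iteration counter

    def step(self, w, grad, lr):
        self.t += 1
        self.m = self.k1 * self.m + (1 - self.k1) * grad
        self.v = self.k2 * self.v + (1 - self.k2) * grad ** 2
        m_hat = self.m / (1 - self.k1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.k2 ** self.t)
        return w - lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Usage sketch: minimize f(w) = w^2 (gradient 2w) with the decaying rate.
opt = Adam()
w = 5.0
for t in range(200):
    w = opt.step(w, 2.0 * w, exp_decay_lr(0.5, t))
```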

(4) Initial weight and other initial value settings
The parameters of the membership functions of the improved fuzzy neural network are the cluster center (m) and the width (σ), and generating memberships is the first step of the network; the cluster centers and widths are then adjusted according to the error. Therefore, according to the numbers of neurons in the input and membership generation layers, the initial membership function cluster centers and widths are both set as 6 × 18 random Gaussian matrices with values in the range [0, 1].
The network designed in this paper has no weight transfer in the input layer, membership generation layer, or fuzzy layer, so only the weights from the fuzzy layer to the activation layer need to be set; the initial weights are set as an 18 × 4 random Gaussian matrix with values in the range [0, 1].
Since this paper adopts an exponentially decaying variable learning rate, the initial learning rate of the network can take a relatively large value; it is therefore set to 3.0.
The maximum number of training iterations directly affects the generalization ability of the network. Generally, as the number of training iterations increases, the training and test errors decrease; however, with too many iterations, overfitting appears and the test error increases instead of decreasing. For this reason, and since the network is not complex, the number of iterations is set to 100.
When training reaches a specified error requirement, it should be stopped, which can prevent overfitting to a certain extent and effectively reduce training time. Generally, a training error of 10 −4 meets the requirements, so this paper sets the training-error stopping threshold to 10 −4 .

(5) Network training and test design
Network training and testing require at least one training set and one test set, and the two sets must be independent of each other. Since only 60 groups of samples remain after data preprocessing, this paper takes samples 1-50 as the training samples and samples 51-60 as the test data. In addition, a model's generalization ability is an important measure of its reliability and robustness. If the generalization ability can be verified during training, it can be measured well to a certain extent, and overfitting can be partly avoided. Therefore, the K-fold cross-validation method is used: part of the training set serves as the training subset and the other part as the validation set. Given the small number of samples, tenfold cross-validation is adopted to better test the model's generalization ability. In addition, the sample data show an imbalance in the grades of top coal cavability, which must be considered in order to obtain a stable and robust prediction model. Stratified sampling can alleviate class imbalance to a certain extent, so tenfold cross-validation with stratified sampling is used to train the model. The distribution of top coal cavability grades is shown in Fig. 3.
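Stratified tenfold splitting as described above can be sketched as follows. This Python fragment is an assumed illustration (the paper's implementation is MATLAB): within each grade, samples are shuffled and dealt round-robin to the k folds, so every fold keeps roughly the overall grade proportions.

```python
import numpy as np

def stratified_kfold(labels, k=10, seed=0):
    """Return k folds of sample indices such that each fold preserves the
    class proportions of `labels` as closely as possible."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)                       # shuffle within the class
        for i, sample in enumerate(idx):
            folds[i % k].append(int(sample))   # deal round-robin to folds
    return [np.array(sorted(f)) for f in folds]

# Toy imbalanced data: 40 samples of one grade, 20 of another, as k=10 folds.
labels = np.array([0] * 40 + [1] * 20)
folds = stratified_kfold(labels, k=10)
```

Each fold then serves once as the validation set while the remaining nine folds are used for training.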

Analysis of network training and test results.
After the above design, the network was implemented in MATLAB, then trained and tested. Finally, the training diagram of the tenfold cross-validation (Fig. 4), the validation diagram of the tenfold cross-validation (Fig. 5), and the test result graph (Fig. 6) were obtained. The training chart in Fig. 4 shows that in each fold the prediction accuracy of the network behaves as expected, increasing with the number of iterations. When the model has fully converged, its prediction accuracy reaches more than 92.5%, i.e., at least 37 of the 40 training samples are predicted correctly, showing that the model fits the benchmark data well during training. Figure 5 shows that the validation prediction accuracy of each fold reaches more than 80%, i.e., more than 8 of the 10 validation samples are predicted correctly, and the overall average validation accuracy of the tenfold cross-validation reaches 92%. This shows that the model has good generalization ability, with no overfitting or underfitting. This generalization ability is also confirmed by the final test: all 10 test samples are predicted correctly, i.e., the test accuracy is 100%. The training, validation, and testing results fully show that the optimized and improved network generalizes well and is suitable for evaluating top coal cavability. Through model prediction and evaluation, the result is that the top coal cavability grade of the 61107 longwall top coal caving face in Liuwan Coal Mine is grade II.
From the project site, the predicted result is consistent with the actual situation. In the production process, the top coal of the 61107 longwall top coal caving working face caves well and is discharged well from the coal caving supports, although there are occasional large blocks in the discharged coal and occasional blocking, which can be resolved by taking corresponding measures.

Conclusion
Because of the shortcomings of the current classification evaluation of top coal cavability, the fuzzy neural network developed by Nauck and Kruse is improved and optimized, and the global approximation that the improved network should have as a neural network is demonstrated. To avoid overfitting, improve the convergence speed of the model, and give it good generalization ability, corresponding optimization designs are made. The model is constructed using MATLAB software and is trained and tested. Finally, the trained and tested network is used to evaluate the top coal cavability grade of the 61107 longwall top coal caving working face of Liuwan Coal Mine, and the prediction result is consistent with the actual situation of the project. This fully proves that the improved and optimized fuzzy neural network has good applicability in the classification evaluation of top coal cavability and provides another, more scientific and reliable method for this evaluation.