Introduction

Mining activities and civil projects are carried out using one of the most important operations, namely rock blasting, as a wide and economical way to rock breakage and displacement of them1. In this regard, the rock mass is drilled (drilling operations), and then many blast-holes are charged using explosive materials (blasting operations). Inevitably, blasting operations are caused several side environmental consequences/issues such as flyrock, back-break, dust pollution, air-overpressure, and ground vibration2,3,4,5,6,7. The blast-induced, air over-pressure, ground vibration, and flyrock are the most adverse phenomenon among them1,8,9. Therefore, the blasting sites and mine environment must be safe by monitoring and eliminating the adverse effects of the aforementioned consequences. It should be noted that the accurate amount of each phenomenon should be determined/predicted before conducting the operations. The pre-split and power-deck methods can be used to minimization adverse effects1.

Ground vibration is the most crucial side environment effect due to bench blasting based on previous investigations10,11. The effective parameters on ground vibration should be identified for its prediction/evaluation. The ground vibrations can be measured/recorded based on two different terms: peak particle velocity (PPV) and frequency12,13,14,15. According to various standards, the PPV is the most famous representative for estimating and evaluating blast-induced ground vibration in surface mines1,16,17. The most significant parameters on PPV are the number of blast-holes, hole depth, burden, spacing, powder factor, the charge per delay, and the distance between installed seismograph and blasting bench18,19,20,21.

In recent decades, many models have been introduced for PPV prediction in mines and open pits. The empirical models have been developed by Davis et al.22, Ambraseys and Hendron23, Dowding24, Roy25, and Rai and Singh26 for estimation of blast-induced PPV. However, the performance of empirical predictive models is weak and unacceptable. In addition, the empirical equations do not have the ability to accurately predict the PPV values while they must be accurately estimated to overcome the adverse effects. On the other hand, new computational techniques i.e., soft computing (SC) and artificial intelligence (AI) are capable to resolve science and engineering problems in terms of accuracy level27,28,29,30.

In the field of PPV, a vast range of SC/AI techniques have been proposed for prediction purposes7,31,32,33,34,35. For example, Hasanipanah et al.35 predicted the PPV values using a genetic algorithm. They concluded that this optimization algorithm can predict PPV values with high accuracy. Imperialist competitive algorithm (ICA) as another optimization algorithm was employed to estimate the value of PPV in the research conducted by Armaghani et al.6. They concluded that the ICA algorithm is capable for PPV prediction with high performance. In another study, Taheri et al.36 combined artificial neural network (ANN) and artificial bee colony (ABC) to the prediction of PPV; then results were compared to empirical equations. Their results indicated that the performance of the ANN-ABC model is higher than empirical models. Fuzzy system (FS) combined with ICA was introduced in the study conducted by Hasanipanah et al.13 to predict PPV. The results of their hybrid model showed that FS-ICA can forecast PPV with a high level of accuracy. Fouladgar et al.37 used the cuckoo search (CS) as a novel swarm intelligence algorithm for PPV prediction induced by mine blasting. Additionally, Hasanipanah et al.38 established a particle swarm optimization (PSO) technique for forecasting PPV values. In other studies, different techniques such as adaptive neuro-fuzzy inference system (ANFIS) were developed by Iphar et al.39 for the estimation of PPV with an acceptable degree of prediction performance. Table 1 summarises the most important studies related to PPV estimation by utilizing the AI and SC techniques.

Table 1 Literature review of PPV estimation using AI and SC methods.

An overview of the literature demonstrated that various SC/AI models have been established to estimate the PPV values. Nevertheless, scholars are always looking for models with the highest performance to enhance the accuracy of developed predictive models and decrease the adverse effect of PPV on the environment. Hence, in this study, to increase the accuracy and performance of AI models in the estimation of PPV, an ensemble of XGBoost as well as ANN models are proposed. According to certain research, no machine learning algorithm could ever consistently outperform every other algorithm. In reaction to this assertion, the ensemble learning method was created. Contrary to traditional machine learning approaches, which try to learning a single hypothesis from train dataset, ensemble learning algorithms develop numerous hypotheses and integrate them to solve a specific issue. Ensemble algorithms have resulted in significant improvements and minimized the overfitting issue by integrating numerous learners and fully using these learners. They also offer the flexibility to handle various jobs. Three well-known ensemble approaches include bagging, boosting, and stacking, while there are a few variations and more ensemble algorithms that have been put to use in real-world scenarios63. In this way, several publications analysed the performance capability of ensemble models in the various fields such as health science64, sport science65, agriculture66,67, finance68, wireless sensor network (WSN)69 and geosciences70.

The combination of multiple networks and creating an ensemble system can reduce the risk of incorrect results and potentially improve the accuracy and generalization capability. Indeed, an ensemble technique is a robust machine learning method that combines several learners, e.g., ANNs or any other machine learning methods, to improve overall prediction accuracy. In most cases, an ensemble of machine learning methods in comparison to a single learner gives better results70,71,72. This study will introduce a new viewpoint of ensemble modeling to estimate PPV based on two machine learning methods, i.e., XGBoost and ANN models as a stacked generalization technique. For comparison purposes, the performance of the ANNs ensemble method is compared to the XGBoost ensemble method. The more accurate model in forecasting blast-produced PPV will be selected based on the statistical results of all proposed models.

The main research questions are presented as follows:

  • How to increase the accuracy of predictive models?

  • How is the accuracy level of the model evaluated?

  • How is the performance of the proposed model compared to the literature?

  • How to measure the validity of the model?

  • How is output parameter performance measured against input parameters?

Case study and data preparation

This study was focused on the Anguran lead–zinc open-pit mine (Iran), which is located at between longitudes 47° 23′ 27″ N and 47° 25′ 50″ N, and between latitudes 36° 36′ 37″ N and 36° 38′ 04″ N. In addition, the altitude of this mine reported about 2935 m above sea level. Anguran is one of the largest mines in the Middle East (Fig. 1), which is operated with an annual 1.2 Mt extraction rate.

Figure 1
figure 1

Location of Anguran lead–zinc mine and designed pit73 (this figure is modified by EdrawMax, version 12.0.7, www.edrawsoft.com).

The previous studies considered the blast design parameters as effective parameters on PPV intensity. In this study, we considered the seven blasting pattern design parameters which are used as models’ inputs. These parameters include the number of blast-holes (n), hole depth (ld), burden (B), spacing (S), powder factor (q), the charge per delay (Q), and the distance between installed seismograph and blasting bench (d). A total number of 162 blasting rounds were monitored and the effective parameters were measured during operations. The descriptive statistics of the aforementioned parameters are tabulated in Table 2. In the Anguran mine, initiation sequence is inter-row with the time delay of 9 to 23 ms.

Table 2 The properties of the parameters and their ranges.

The significant relationships between effective parameters and PPV were determined using Pearson cross-correlation. The Pearson test measured the linear correlation of bivariable. The Pearson correlation between parameters and PPV is demonstrated in Table 3, in which the values are calculated in the range of − 1 to + 1. The positive and negative values indicated the positive and negative dependence degree, respectively. Besides, the value of 0 denoted no correlation between the two parameters74.

Table 3 Pearson’s correlation matrix of parameters and PPV.

As can be found, the correlation between PPV and PF is high and positive; while PPV and Di have a low and negative correlation. The matrix plot of all parameters is shown in Fig. 2.

Figure 2
figure 2

Scatter matrix plot of all parameters considered in this study.

Method background

Artificial neural network (ANN)

ANN is one of the AI techniques, which first presented in the 1970s. The application of ANN has penetrated various fields of science75. A model of ANN is designed based on activities of artificial neural of the human brain. The architecture of an ANN is constructed using the input layer, hidden layer(s), and output layer76. Noteworthy, each layer includes many nodes (neurons) which are linked to each other by the weight of the processing components (connections). Input signals, which are the same as input data, are propagated throughout the network using input neurons. Then, input signals pass through the hidden layer(s) and access the output layer. In other words, some calculations are performed during passing signals in each layer and then delivered to the subsequent layer77,78,79. These calculations are formulated in Eq. (1) which simulated the training process of the network80

$$y = f_{i} \left( {\sum\limits_{i = 1}^{n} {w_{ij} x_{j} + b_{i} } } \right)$$
(1)

where f denotes activation function, w is the weight of connections, b indicates bias, and x is input data. Notably, the monolayer architecture of the neural network is suitable for simple problems, as well multi-layer architecture is used for complex problems81. However, an ANN architecture with two hidden layers for solving engineering problems is usually efficient75.

Extreme gradient boosting (XGBoost)

XGBoost is one of the applicable artificial intelligence techniques, which is firstly introduced by Chen et al.82 in 2015. XGBoost, as an AI method, is developed based on the gradient boosting decision. The most important characteristic of this method is creating boosted trees effectively and generating them in parallel. Besides, XGBoost deals with well-known classification and regression problems e.g., Bhattacharya et al.83, Duan et al.75, Nguyen et al.84, Ren et al.85, and Zhang and Zhan86. In XGBoost, gradient boosting (GB) creates a status under which an objective function (OF) is determined. The optimization of the value of OF is the core of the XGBoost algorithm, which operating to each various optimization technique. Overcoming the problems of data science has made it a robust algorithm. In XGBoost, parallel tree boosting of GB decision tree and GB machine can accurately solve many problems75,84. Training loss (L) and regularization (Ω) are the two main components of an OF in this algorithm that defined as follows:

$$OF\left( \theta \right) = L\left( \theta \right) \, + \Omega \left( \theta \right)$$
(2)

The model performance related to training data is measured using training loss. Notably, the control and overcome overfitting problem as a model complexity is performed by the regularization term. In this regard, the complexity associated with each tree is calculated in several ways; nevertheless, the following formula is widely used to determine the complexity:

$$\Omega \left( f \right) = \left( {\gamma \cdot n} \right) + 1/2\lambda \cdot \sum\limits_{j = 1}^{n} {\left( {\omega_{j}^{2} } \right)}$$
(3)

where n indicates the number of leaves and \(\omega\) denotes the vector of scores on leaves. In XGBoost, the structure score is the OF represented as:

$$OF = \sum\limits_{j = 1}^{n} q + \left( {\gamma \cdot n} \right)$$
(4)
$$q = \left( {G_{j} \cdot \omega_{j} } \right) + \left( {1/2\left[ {H_{j} + \lambda } \right]\omega_{j}^{2} } \right)$$
(5)

where q is the best \(\omega_{j}\) for a presented structure (a quadratic form). Noteworthy, the \(\omega_{j}\) is an independent vector.

Ensemble modeling

The ensemble of multiple individual learners (base models) is a robust way to enhance the performance and accuracy of artificial intelligence predictive models. In other words, the ensemble model deals with the combination of various models with different results87. In general, ensemble modeling includes two components, i.e., an ensemble of base models and a combiner. Training several base models/networks by different subsets of the training data, and employing the different architectures for each of the base models are two common techniques to build the base models71, which in current work later method for constructing the base models are used. Also, to the combination of base models, different strategies are proposed where all attempt to reduce the error of estimation.

Generally, combiners are divided into two main groups, i.e., trainable and non-trainable methods. For the combination of the outputs of the base models to achieve a single solution two non-trainable methods, i.e., majority voting and averaging methods, are widely used by scholars, e.g., Barzegar and Asghari Moghaddam88, Dogan and Birant89, and Krogh and Vedelsby90. As such, the mixture of experts and stacked generalization are two trainable combiners that are successfully used in different studies, e.g., Alizadeh et al.70, Jacobs et al.91, and Wolpert92. The trainable combination methods are trained by outputs of base models and expected correct results to predict the final results. The trainable combiners for predicting models that there are complex relations between inputs and targets are more efficient.

In this study, for each of the methods, i.e., XGBoost and ANN, several models to predict the PPV by stacked generalization technique were combined. In this regard, some ANNs models with a different number of hidden nodes, various activation functions, and different training algorithms for predicting PPV were used. Then top ANNs architectures were combined by the stacked generalization methods to construct the ensemble ANNs that named EANNS model. Notably, various XGBoost models as individual models are developed with different nrounds and different maximum depth for PPV estimation, and then top XGBoost models were combined by the stacked generalization technique, which this newly constructed model is called ensemble XGBoosts (EXGBoosts) model. Figure 3 represents the framework of EANNs and EXGBoosts methods, respectively.

Figure 3
figure 3

A schematic representation of EANNs and EXGBoosts methods for predicting PPV.

Stacking ensemble model

The stacking model basis is divided into two main phases, which are referred to as level-0 and level-1 structures, respectively. Base models are referred to as level-0, whereas the meta model at level-1 allows base-model predictions to be combined. Estimates provided by base-models are employed throughout the meta-training model's phase. In the case of regression or classification, the predictions result of the basic-models are utilized as inputs and can be of genuine use to the meta-model69. The methods of ANN and XGBoost are employed as the base-models in our research. Noteworthy, these models' several separately architectures are each employed individually as meta-learners.

Pre-analysis of modeling process

This study develops EXGBoosts and EANNs models with seven effective variables and only one output variable to estimate PPV in Anguran lead–zinc mine. In the first step of modeling, all data were normalized in the interval of [0,1], for better network training. Equation (6) was used for normalization of data:

$$x_{NORM} = \left( {\frac{{\left[ {x_{i} - x_{min} } \right]}}{{\left[ {x_{max} - x_{min} } \right]}}} \right)$$
(6)

where xnorm denotes normalized value, xmax and xmin are the maximum and minimum values, and xi indicates the input value. In the second step, to present the PPV predictive models, the collected data from the blasting site is randomly divided into two parts, i.e., training and testing datasets. In this regard, 80% of the whole data, namely approximately 130 blasting events, were specified randomly to the training part of models. While the remaining data (approximately 32 blasting events) were used for evaluation of the models' performance.

In the third step, several base models are developed for PPV estimation and the performance of models is compared and evaluated using several statistical indicators such as coefficient determination (R2), root mean square error (RMSE), mean absolute error (MAE), the variance accounted for (VAF), and Accuracy (Eqs.7 to 11). These indices are calculated to evaluate the relationship between measured PPV values and estimated one by developed models.

$$R^{2} = 1 - \left( {\frac{{\sum\limits_{i = 1}^{n} {(O_{i} - P_{i} )^{2} } }}{{\sum\limits_{i = 1}^{n} {(P_{i} - \overline{P}_{i} )^{2} } }}} \right)$$
(7)
$$RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {(O_{i} - P_{i} )^{2} } }$$
(8)
$$Accuracy = 100 - \left( \frac{100}{N} \right) \times \frac{{2 \times \sum\limits_{i = 1}^{n} {\left| {O_{i} - P_{i} } \right|} }}{{\left( {O_{i} - P_{i} } \right)}}$$
(9)
$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {O_{i} - P_{i} } \right|}$$
(10)
$$VAF = 100 \cdot \left( {1 - \frac{{var(O_{i} - P_{i} )}}{{var(O_{i} )}}} \right)$$
(11)

where Oi, Pi, and \(\overline{P}_{i}\) are measured value, predicted value, and the average of the predicted values, respectively. Also, n indicates the number of datasets. However, the value of R2, RMSE, MAE, VAF, and Accuracy for the most accurate system are one, zero, zero, 100, and 100, respectively.

PPV predictive models

ANN model

In the present study, for PPV prediction in a surface mine the multi-layer perceptron (MLP) artificial neural network as the most popular structure of ANN was used. The MLP structure contains at least one hidden layer. Hence, the determination of the training algorithm, number of hidden nodes, and hidden layers is a challenge in MLP modeling. In other words, the MLP structure must be designed to train optimally. The feedforward-backpropagation algorithm was used for MLP structure training. In addition, the “trial-and-error” procedure was employed to achieve an MLP model with an optimal structure to predict accurately PPV value. Therefore, 15 different MLP models as base models were developed (Table 4). As can be found, each of the models was trained with different training algorithms, hidden activation functions, output activation functions, and architectures. To determine the optimal architecture, the validation indices of R2, RMSE, Accuracy, MAE, and VAF that were formulated in Eqs. (7) to (11) were separately calculated for ANN training and testing datasets. Remarkably, the scoring system proposed by Zorlu et al.93 was applied to calculate the rate of each indices for MLP developed models. Table 5 shows the rating indices and ranking of MLP models. Based on results, base model number three with two hidden layers, “tansig” as hidden and output activation functions, and Levenberg–Marquardt (LM) training algorithm is the best base model for PPV prediction. This base model had the 141 total rates out of 150, that the values of (0.948, 0.567, 0.350, 94.767, 94.247) and (0.928, 0.293, 0.487, 92.773, 90.254) are obtained for R2, RMSE, MAE, VAF, and Accuracy of training and testing datasets, respectively.

Table 4 The base models of ANN and their evaluations.
Table 5 Performance of the base models of ANN and their rankings.

XGBoost model

Herein, the XGBoost algorithm is used for PPV prediction. Before that, two main stopping criteria, including maximum tree depth and nrounds, were determined. These criteria have a significant impact on the performance of models. Similar to MLP networks, the overfitting problem there is also in XGBoost, which is occurred when the tree depth and the nrounds are set in the much values. Therefore, the range of [1–3] and [50–200] are considered for the maximum tree depth and nrounds. Similar to the ANN, the “trial-and-error” technique was used to determine an XGBoost model with the best performance. As shown in Table 6, the validation indices were computed to evaluate the base models of XGBoost performance. To construct the ensemble of XGBoost, 15 base models with different values of nrounds and maximum tree depth were developed. Based on Table 7, 15 base models of XGBoost were evaluated using Zorlu et al.93 scoring system. The results were shown that XGBoost base model number two, with the values of 50 and 1 for nrounds and maximum tree depth had the best performance in the PPV prediction, which this base model of XGBoost gets the score of 145 out of 150. The validation indices, i.e., R2, RMSE, MAE, VAF, and Accuracy were calculated as (0.977, 0.650, 0.402, 97.578 (%), 96.828) and (0.979, 0.536, 0.680, 97.895(%), 96.528) for training and testing datasets, respectively. However, a comparison between top base models of XGBoost and ANN reveals the superiority of the XGBoost method in the prediction of PPV.

Table 6 The base models of XGBoost and their evaluations.
Table 7 Performance of the base models of XGBoost and their rankings.

Ensemble model of ANNs (EANNs) to predict PPV

For the ensemble model of ANN, first, 15 base models for ANN are developed, and then after evaluation of the base models, five top base models for combination were chosen, that the scores of these models were 141, 127, 118, 106, and 100 out of 150, respectively. The correlation of measured PPV and predicted ones by five base models are illustrated in Fig. 4. After that, the stacked generalization combination technique was employed to combine the selected base models. For combination, the results of selected base models a feed-forward neural network with sigmoid activation function for hidden layers were used (Fig. 5). The input data of the combiner network consists of seven variables and the target dataset is the measured value of PPV.

Figure 4
figure 4

Correlation graph between measured and predicted values of PPV, using five top base models of ANN.

Figure 5
figure 5

The architecture of the ensemble ANN model for PPV prediction in Anguran mine (this figure is generated by EdrawMax, version 12.0.7, www.edrawsoft.com).

The correlation graph of predicted values using the stacked generalization technique and measured values is illustrated in Fig. 6. The values of (0.960, 0.402, 0.233, 95.963(%), 95.724) and (0.941, 0.189, 0.219, 92.827(%), 95.713) were obtained for both R2, RMSE, MAE, VAF, and Accuracy of training and testing datasets, respectively. Results proved that the EANNs model predicts PPV better than individual ANN (base models), so that the EANNs model 41% and 55% improved the RMSE of PPV prediction for training and testing part, respectively, in comparison with the best base model.

Figure 6
figure 6

Correlation graph between predicted data (EANNs model) and measured data.

Ensemble model of XGBoosts (EXGBoosts) to predict PPV

To construct EXGBoosts model for the prediction of PPV, first, several XGBoost models were developed. In this regard, 15 constructed XGBoost models were analyzed, and the five top base models with the highest score were selected. The numbers 145, 126, 115, 100, and 98 were the scores of the five top base models. The EXGBoosts model was structured based on a combination of five XGBoost base models. The base models using stacked generalization technique was combined to predict PPV. Figure 7 showed the correlation of PPV estimations by five XGBoost base models and measured values of PPV. The combiner was structured using a nrounds of 15 and a maximum tree depth of three. The results of stacked generalization show, the accuracy of the EXBoosts model in comparison with the best XGBoost base models is better (Fig. 8 and Table 8). To better comparing of the applied methods capability in estimating of PPV value, the performance of developed ANN, EANNs, XGBoost, and EXGBoosts models are tabulated in Table 8. The obtained statistical indices indicated that the EXGBoosts model with the value of (0.990, 0.391, 0.257, 99.013(%), 98.216) and (0.968, 0.295, 0.427, 96.674(%), 96.059) for R2, RMSE, MAE, VAF, and Accuracy of training and testing datasets, respectively, represents the highest performance for prediction of PPV among all applied models. Besides, EXGBoosts model 66% and 82% improved the RMSE of PPV prediction for training and testing part, respectively, in comparison with the best base model. The obtained results of performance indices regarding to our model presented in Table 9. This table compares the prediction accuracy and performance level of out proposed approach with three latest research. The results demonstrates that EXGBoost model has more performance capacity in model and estimation of PPV in comparison with the other methods.

Figure 7
figure 7

Correlation graph between predicted PPV by various XGBoost base models and measured data.

Figure 8
figure 8

Correlation graph between predicted data (EXGBoosts model) and measured data.

Table 8 Performance of the ensemble and the best individuals of ANN and EXGBoosts.
Table 9 Accuracy comparison of our proposed technique with other reseach.

It is known that the significance of the estimation of level l (where l reveals the percentage of estimation) stands the quotient of the number of samples in which the estimations are within l absolute limit of measured values divided by the total number of samples. A common metric for evaluating the best models is P(0.25) ≥ 0.75 or 75%94. The level of 25% was used to test model in our study.

In which, where n is the number of dataset, Pi denotes the predicted value, and Oi indicates the observed values.

The 25% level estimation of ANN, XGBoost, EANNs, and EXGBoosts are showed in Table 10. As can be seen, the ANN at P(0.25) is not acceptable in validation dataset, but other models is acceptable in both testing and validation datasets. It can be concluded that the ensemble models developed in this study have the highest performance and capability in predicting PPV.

Table 10 Estimation level at 25% in testing and validation datasets.

Multiple parametric sensitivity analysis (MPSA)

In this part, a parametric analysis was conducted to specify which influential parameters have the highest impact on the average PPV value. In this regard, a multiple parametric sensitivity analysis (MPSA) was performed that follows the six main steps for applying to the outputs of the system for a specific set of parameters. These steps are as follows:

Step 1 Selecting the effective parameters to be subjected.

Step 2 Adjusting the range of input parameters.

Step 3 Generating a set of independent parameters in the form of random numbers with a uniform distribution for each parameter.

Step 4 Running the machine learning method utilizing the generated series and calculating the objective function using Eq. (12). The objective function was computed using the sum of square errors between measured and predicted values95:

$$f_{h} = \sum\limits_{i = 1}^{n} {\left[ {x_{o,h} - x_{c,h} \left( i \right)} \right]^{2} }$$
(12)

where fh denots the objective function value for a particular PPVt variable h; xo,h indicates the measured values; xc,h(i) is the calculated value xc for variable h for each generated inputs; and n is the number of variables contained in the random set. In the computation process, the Monte Carlo simulation was used to generate 162 random data for seven effevtive parameters used in this study. At each iteration of the model, the trained models were provided with the newly produced values for one parameter.

Step 5 Determining the relative importance of effevtive parameters separately using Eq. (13)95:

$$\delta_{h} = \frac{{f_{h} }}{{x_{o,h} }}$$
(13)

In which, h is the variable that is used to introduce pairs of effective parameters. The outcomes that were achieved for each of the evaluated parameters were produced by using the technique that was provided to the PPVt model. Equation (13) had a significant importance in the accomplishment of these results.

Step 6 Evaluating parametric sensitivity and determining relative relevance of effective parameters using Eq. (14)95:

$$\gamma = \sum\limits_{h = 1}^{{i_{PPV,max} }} {\delta_{h} }$$
(14)

where the δt is computed from the first series of dataset (h = 1) to the maximum values (\(i_{PPV,max}\)), which is 162 data for developed model in this study. Table 11 provides a tabular breakdown of the value spectrum that was employed throughout the evaluating of each parameter.

Table 11 The range of γ index to determine sensitivity of each parameter95.

The lower the γ index value for each parameter, the less sensitive the st model is to that parameter, and the higher the γ index, the more sensitive the model is to the parameter under consideration. Table 11 has presented the γ index to evaluate the impact of model parameters and identify the most sensitive parameters. The calculated γ index for each parameter is depicted in Fig. 9. It can be found that the order of the sensitivity of the PPV to the parameters is ld < S < n < Q < q < B < d. It can be concluded that the PPV is highly sensitive to d, B, q, Q, and n, as well as sensitive to S and ld.

Figure 9
figure 9

The impact of effective parameters on PPV.

Influence of delay sequence on PPV

The seismic energy is what causes the blasting vibrations to be generated, and it also literally symbolizes the problems created to the rock-mass that extends beyond the boundaries of the explosion patch. The blasting pattern design specifications, explosives type and properties, and the physio-mechanical characteristics of the rock-mass all affect how much PPV occurs. The generation of PPV for several experimental implementing blasting has been obtained; the PPV value is reported as 5.12–17.23, 3.91–12.14, and 1.48–5.93 in the delay sequence (row to row) of 9, 15, and 23 ms. It can be concluded that the 23 ms delay between the rows will assist in lowering the PPV, which may be lowered up to a particular value by choosing the right delay sequence in production blast, according to field observations and data analysis.

The superimposition of waveform due to delay sequence refers to the effect of time delays on the coherence of signals. When two or more signals are delayed relative to each other, their waveforms may overlap and interfere with each other, resulting in a composite waveform that may be difficult to interpret. The impact of this effect on the outcome of a result depends on the specific context of the analysis. In some cases, such as in signal processing or communication systems, delay sequences are intentionally introduced to improve signal quality or reduce interference. In these cases, the superimposition of waveforms may be a desirable effect. However, in other cases, such as in physiological or biological signal analysis, the superimposition of waveforms due to delay sequences can lead to a loss of information and inaccuracies in the analysis. For example, in electroencephalogram (EEG) recordings, time delays between signals from different brain regions can result in overlapping waveforms that make it difficult to identify the underlying brain activity.

Results and discussions

This paper was accurately focused on estimation PPV due to mine blasting. In this way, the most effective parameters on PPV variation were identified. Two AI-based models i.e., ANN and XGBoost, were considered for choosing the best between PPV predictive models. For each predictive method, an ensemble model, i.e., EANNs and EXGBoosts, was developed, and the best one was chosen. The obtained results from statistical indicators (R2 and RMSE) associated with the best predictive models of ANN, XGBoost, EANNs, and EXGBoosts for training and testing parts were illustrated in Figs. 10 and 11.

Figure 10
figure 10

The value of R2, RMSE, and MAE for selecting the best model in the predicting PPV values.

Figure 11
figure 11

The value of VAF and accuracy for selecting the best model in the predicting PPV values.

The predictive model of EXGBoosts has specified capable of presenting the highest performance prediction level in train and test parts. Therefore, EXGBoosts was found a superior accuracy level regarding statistical indicators values among other predictive models. The R2 values of (0.948, 0.977, 0.960, and 0.990) and (0.928, 0.979, 0.941, and 0.968) were calculated for training and testing phases of ANN, XGBoost, EANNs, and EXGBoosts models, respectively. Besides, RMSE values of (0.567, 0.650, 0.402, and 0.391) and (0.293, 0.536, 0.189, and 0.295) were obtained for training and testing parts of ANN, XGBoost, EANNs, and EXGBoosts models, respectively. The EXGBoosts model revealed a maximum performance and minimum system error between other predictive models. In situations where the testing datasets reflect adequate generalizability of predictive techniques, the excellent efficiency of the train phases suggests the success of the learning procedures of the predictive models.

Benefits and drawbacks of the study

The main benefit of this study is in improving the performance and accuracy of the proposed ANN and XGBoost models. These models separately provide lower accuracy than the ensemble models. Therefore, using the combination of these methods and constructing an ensemble model, it is possible to predict the PPV with acceptable accuracy. Noteworthy, neural network base models each have different results and have uncertainty due to being a black-box. However, the ensemble model solves this problem to an acceptable. This study also has drawbacks. In this study, only two AI models have been used i.e., ANN and XGBoost. However, the number of AI models can be increased to reach maximum accuracy. It should be noted that the number of base models in this study is acceptable; nevertheless, more models can be obtained and run the ensemble model based on them.

Conclusions

In this study, the PPV induced from bench blasting is studied in Anguran lead–zinc mine, Iran. Considering the crucial importance of the adverse effects of ground vibration in blasting operations, the prediction of PPV at a high level of accuracy is essential. Therefore, this study investigates the ensemble of various artificial intelligence models to construct an accurate model for PPV estimation using 162 blasting datasets and seven effective parameters. For this aim, several ANN and XGBoost base models were developed and the five top base models among them were combined to generate EANNs and EXGBoosts models. To combination of top base models’ outputs and achieve a single result stacked generalization technique was used. The statistical indexes of R2, RMSE, MAE, VAF, and Accuracy were used to evaluate the performance of developed models and a scoring system was applied to select the best ANN and XGBoost base models with optimal structure. The results revealed that the EANNs with R2 of (0.960, and 0.941), RMSE of (0.402, and 0.189), MAE of (0.233, and 0.219), VAF of (95.963(%), and 92.827(%)), and Accuracy of (95.724, and 95.713) for training and testing datasets, respectively, and EXGBoosts model with R2 of (0.990, and 0.968), RMSE of (0.391, and 0.295), MAE of (0.257, and 0.427), VAF of (99.013(%), and 96.674(%)), and Accuracy of (98.216, and 96.059) for training and testing datasets, respectively, were two efficient machine learning ensemble methods for forecasting PPV. Comparison of the results of developed ensemble methods, i.e., EANNs and EXGBoosts, with the best individual models showed the superiority of ensemble modeling in predicting PPV in surface mines. Moreover, EXGBoosts model was most accurate compared to the EANN model. In the final step of this study, the effectiveness of each input variable on PPV intensity is determined using the CA method, which results denoted the spacing has the most impact on PPV. From practical applications, the proposed model can be updated for other engineering fields, specially mining and civil activities. Meanwhile, the ensemble machine learning approach can be applied to improve performance capacity of machine learning techniques and increase the accuracy level of prediction targets. The proposed models can be used to analyze safety data and identify potential hazards, blasting safety zone, and risks in blasting operations. The PPV values can be predicted before blasting operations to check any potential issues or damage to the workers, equipment and surrounding residential area. If the predicted results are higher than those suggested in literature or standards, the blasting pattern/design can be reviewed again to have a predicted PPV values within the suggested safe ranges. Generally, machine learning algorithms can be used to analyze environmental data and monitor the impact of mining operations on the environment.