Introduction

One of the most important issues facing the oil industry, and the drilling industry in particular, is the cost of drilling, which has attracted much attention in recent decades. Many factors affect the cost of drilling, the most important of which is the drilling time of the well, which can increase drilling costs several times over. Reducing drilling time is therefore one of the most significant goals of drilling engineers1,2,3. In other words, one of the major aims of drilling optimization is to lessen the total time4. For this purpose, two approaches have been proposed: choosing optimum drilling variables (e.g. picking a suitable drilling fluid type and drill-bit) and instantaneous analysis to optimize operational parameters such as rotary speed and weight on bit while drilling4.

The major factor affecting drilling time is the rate of penetration (ROP)5. Hence, the precision of the ROP model is critical6. Many parameters affect the drilling rate, including drilling mud properties, formation characteristics, rotary speed, and bit characteristics2,7. The main parameters that affect ROP are presented in Fig. 1. Some of these parameters are uncontrollable, such as formation characteristics, and others are controllable, such as the properties of the drilling mud. The effects of individual parameters on ROP, such as rock strength, revolutions per minute (RPM), and weight on bit (WOB), can easily be investigated8. Increasing the uniaxial compressive strength of the formation rock causes hardening and thus decreases the penetration rate8,9. The drilling parameters are also controllable factors for changing the drilling rate. The bit type and its design10, and the fit between bit and formation, are effective in increasing the drilling rate. Although increasing RPM11 increases the drilling rate in soft formations, this is not directly observed in hard formations, where low rotational speeds can result in better drilling rates. The flow rate and the characteristics of the drilling mud, such as plastic viscosity (PV), mud weight (MW), and yield point (YP), determine the ability of the mud to transfer drilling cuttings to the surface. Better cutting transportation to the surface prevents the accumulation and regrinding of cuttings and increases the drilling rate. The WOB determines the degree of contact and penetration of the bit into the formation, which depends upon the type of rock; WOB increases the drilling rate only up to a certain point, beyond which it has no great impact on the drilling rate12,13. Many models have been proposed to predict ROP, but they are problematic because they are obtained either in the lab or from incomplete field data2,14. In other words, the effects of the drilling variables on ROP have not yet been understood completely15. Different methods have been proposed to optimize the drilling rate, but because a large number of parameters influence it, developing an efficient and comprehensive model is very difficult. The complex relationships between these parameters have also prevented the development of a comprehensive model2,14.

Figure 1

The main factors affecting ROP.

Normally, two main approaches are used to predict ROP: traditional models and machine learning (ML) models.

Some well-known traditional correlations are as follows:

Maurer16 developed Eq. (1) for rolling cutter bits:

$$ROP = \frac{K}{{S^{2} }}\left( {\frac{W}{{d_{b} }} - \frac{{W_{o} }}{{d_{b} }}} \right)^{2} N.$$
(1)

In the above equation, S and K are the compressive rock strength and the constant of proportionality, respectively. Wo and W are the threshold bit weight and the bit weight, respectively. db is the drill-bit diameter and N denotes the rotary speed.

Another traditional model for ROP was introduced by Bingham17:

$$ROP = K\left( {\frac{W}{{d_{bit} }}} \right)^{{a_{5} }} N.$$
(2)

where ROP, W, \(d_{bit}\), and N refer to the rate of penetration (ft/hr), weight on bit (klbs), bit diameter (in), and rotary speed (rot/min), respectively. K and \(a_{5}\) are Bingham coefficients and take different values for various formations18.
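To make the roles of the coefficients concrete, the following minimal sketch evaluates the Bingham correlation (Eq. (2)); the values of K and a5 here are illustrative placeholders, since in practice they must be fitted to each formation.

```python
# A minimal sketch of the Bingham correlation (Eq. 2). The coefficient
# values K and a5 are illustrative placeholders; in practice they are
# fitted to each formation.
def bingham_rop(wob_klbs, bit_diameter_in, rpm, K=1.0, a5=1.2):
    """Return ROP (ft/hr) from WOB (klbs), bit diameter (in), and rotary speed (rpm)."""
    return K * (wob_klbs / bit_diameter_in) ** a5 * rpm

print(bingham_rop(wob_klbs=35.0, bit_diameter_in=8.5, rpm=120.0))
```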

One of the most important ROP models was developed by Bourgoyne and Young19. This model, which involves eight parameters, is extensively employed in the industry20 and is given by Eq. (3).

$$\frac{dD}{{dt}} = {\text{exp}}\left( {a_{1} + \mathop \sum \limits_{j = 2}^{8} a_{j} x_{j} } \right)$$
(3)

where D shows the well depth, t denotes the time, and the coefficients a1 to a8 are associated with formation strength, formation compaction, pore pressure, differential pressure, the weight-on-bit exponent, the rotary speed exponent, drill-bit tooth wear, and bit hydraulic jet impact, respectively. Afterwards, Bourgoyne et al.18 suggested an adaptation of their original ROP model:

$$ROP = \left( {f_{1} } \right)*\left( {f_{2} } \right)* \ldots \left( {f_{8} } \right)$$
(4)

In the above equation, the functions f1 to f8 involve the empirical coefficients a1–a8. As stated by Soares and Gray6, the main difference between Eqs. (3) and (4) is in the last function: Eq. (3) uses Eckel's hydraulic Reynolds number, whereas Eq. (4) uses a power-law function of the hydraulic jet impact force. Although the BYM equations represent all important features of drilling, some parameters required by the model are not easily measured in real time (e.g. drill bit wear, differential pressure)6.

A general drag bit model was introduced by Hareland and Rampersad21:

$$ROP = \frac{{14.14*N_{c} NA_{v} }}{{d_{b} }}$$
(5)

where Nc shows the number of cutters and Av is the area of rock compressed ahead of a cutter, which takes a different formulation depending on the drill-bit type. More details can be found in the work of Soares et al.22.

The definite relationships among the drilling parameters are not well understood and are very difficult to find15. Hence, several studies23,24,25 have been conducted to better comprehend the connections among the drilling parameters and how they affect the ROP. For instance, Motahhari et al.23 suggested an ROP model for polycrystalline diamond compact (PDC) bits:

$$ROP = W_{f} \frac{{GN^{y} W^{\alpha } }}{{d_{b} S}}$$
(6)

In this equation, S shows the confined rock strength, \(\alpha\) and y represent the coefficients of the ROP model, and Wf denotes the wear function. G is a coefficient related to bit geometry and bit-rock interactions. Deng et al.24 suggested a theoretical ROP model for roller cone bits and validated it with experimental data. In this model, the rock dynamic compressive strength was used instead of the static compressive strength, which improved the accuracy of the theoretical model. Equation (7), developed by Al-AbdulJabbar et al.25, is based on regression analysis:

$$ROP = 16.96\frac{{W^{a} *N*T*SSP*Q}}{{PV*\rho *d_{b}^{2} *UCS^{b} }}$$
(7)

where 16.96 is a unit-conversion constant, \({\uprho }\) shows the mud density, T denotes the torque, SSP represents the standpipe pressure, Q shows the flow rate, PV represents the plastic viscosity, and UCS denotes the uniaxial compressive strength. Nonlinear regression was used to calculate the coefficients a and b.

Equation (8) was proposed by Warren8:

$$ROP = { }\left( {\frac{{aS^{2} d_{b}^{3} }}{{NW^{2} }} + \frac{c}{{Nd_{b} }}} \right)^{ - 1}$$
(8)

where S shows the rock strength, and \({\text{a}}\) and \({\text{c}}\) denote constants8.

The effects of other factors, such as chip hold-down26,27, bit wear28, and cutting geometry29,30, were considered by many researchers. Eckel31 stated that mud properties have no direct effect on ROP, while Paiaman et al.32 showed that increasing the plastic viscosity and mud weight can decrease the rate of penetration. Moraveji et al.33 developed a model and showed that the gel strength, WOB, and YP/PV ratio have a remarkable effect on ROP.

Soares et al.22 showed the limitations of traditional ROP models such as the model introduced by Bourgoyne et al.19. ML methods are attractive for predicting ROP, and their superiority over traditional models has been demonstrated by several researchers8,34,35,36. The first work on predicting ROP by ML was conducted by Bilgesu et al.37. The ability of neural networks to find complex relationships in data has led to this approach being adopted for predicting drilling rates. Nowadays, artificial neural networks (ANNs) are widely used in the oil industry; we briefly mention a few examples in the following. Alarfaj et al.38 predicted ROP using ANNs and compared several models. They concluded that the extreme learning machine (ELM) gives the most accurate results, but they did not consider the effects of flow rate, RPM, MW, and bit diameter in their models. Ansari et al.39 used error analysis of multivariate regression to select the best parameters for predicting ROP and then used support vector regression (SVR) techniques to model ROP. Finally, committee support vector regression (CSVR) based on the imperialist competitive algorithm (ICA) was employed to predict ROP. Their results showed that the CSVR-ICA model can improve the results of SVR39. Hegde et al.36 evaluated two different approaches for ROP prediction, physics-based and data-driven modeling; their results showed that the data-driven model predicted better than traditional models36. Soares and Gray6 studied the real-time predictive capabilities of ML and analytical ROP models. Their results showed that ML models reduce the error much more than analytical ones; in addition, among the analytical models, the best performance belonged to BYM6. Ashrafi et al.40 employed hybrid artificial intelligence models to predict ROP; based on their results, the particle swarm optimization-multi-layer perceptron (PSO-MLP) model achieved the best performance40. The use of ANNs for ROP prediction during drilling operations was also evaluated by Diaz et al.41. Gan et al.42 suggested a new hybrid model to estimate ROP and demonstrated its excellent prediction performance42. Mehrad et al.43 used mud logging and geomechanical parameters to predict ROP with hybrid algorithms. The least-squares support vector machine-cuckoo optimization algorithm (LSSVM-COA) had the best performance among the models used, and the difference between the training and testing errors of the LSSVM-COA model was small43.

This study develops an empirical correlation and several smart models, including least squares support vector machine (LSSVM), multilayer perceptron (MLP), decision tree (DT), and radial basis function (RBF) models, for ROP based on a large data bank (more than 5000 data points) obtained from drilling in southern fields of Iran. Gradient boosting (GB) is used for DT optimization, and the Bayesian regularization algorithm (BRA), scaled conjugate gradient algorithm (SCGA), and Levenberg–Marquardt algorithm (LMA) are used to train the MLP models. What distinguishes this study is the consideration of more effective parameters in developing the models: depth (D), weight on bit (WOB), pit total (PT), pump pressure (PP), hookload (H), surface torque (ST), rotary speed (RS), flow in (Fi), flow out (Fo), and wellhead pressure (Wp). The accuracy and validity of the proposed models are evaluated by statistical and graphical techniques. In addition, the Leverage approach is employed to check the validity of the experimental data and the applicability domain of the proposed models.

Modelling

Generalized reduced gradient (GRG)

To develop an empirical correlation for ROP, we proposed a structure for the correlation and used the generalized reduced gradient (GRG) method to optimize its coefficients. The optimum structure was obtained by a trial-and-error procedure. GRG is one of the techniques for solving multivariable optimization problems and can handle both nonlinear and linear problems44. In this method, the variables are adjusted so that the active constraints remain satisfied as the search moves from one point to another. GRG builds a linear approximation of the gradient at a specific point y, and the objective gradient and the constraint gradients are treated together: the gradient of the objective function can be expressed in terms of the gradients of the constraints. The search can then move in a feasible direction while the size of the search space is reduced. For an objective function f(y) subject to constraints h(y)45:

$$Minimize: \, f\left( y \right) = y$$
(9)
$$Subjected \, to: h_{k} \left( y \right) = 0$$
(10)

GRG can be stated as follows45:

$$\frac{df}{{dy_{k} }} = \nabla y_{k}^{t} f - \nabla y_{i}^{t} f\left( {\frac{\partial h}{{\partial y_{i} }}} \right)^{ - 1} \left( {\frac{\partial h}{{\partial y_{k} }}} \right)$$
(11)

One of the vital conditions for f(y) to be minimized is that df(y) = 0. Interested readers can find more details in the literature46,47,48,49.
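As a rough illustration of this coefficient-fitting step, the sketch below minimizes a sum-of-squares objective with SciPy. SLSQP is a related gradient-based method used here as a stand-in for GRG (which is built into tools such as Excel Solver); the two-term correlation structure and the synthetic data are hypothetical, not those of this study.

```python
# A minimal sketch of fitting correlation coefficients by gradient-based
# optimization. SLSQP stands in for GRG; the data and the four-coefficient
# correlation are hypothetical.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
depth = rng.uniform(1000.0, 4000.0, 200)        # ft (synthetic)
wob = rng.uniform(10.0, 50.0, 200)              # klbs (synthetic)
rop_obs = 60.0 - 0.01 * depth + 0.3 * wob + rng.normal(0, 1, 200)

def rop_model(a, d, w):
    # Illustrative structure: ROP = a0 + a1*D + a2*WOB + a3/WOB
    return a[0] + a[1] * d + a[2] * w + a[3] / w

def sse(a):
    # Objective: sum of squared errors between observed and modeled ROP
    return np.sum((rop_obs - rop_model(a, depth, wob)) ** 2)

res = minimize(sse, x0=np.zeros(4), method="SLSQP")
print(res.x, res.fun)
```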

Decision Tree (DT)

DT is a non-parametric supervised learning method that can be applied to both classification and regression problems. Morgan and Sonquist50 introduced Automatic Interaction Detection (AID), known as the first decision tree. Messenger and Mandell51 introduced THeta Automatic Interaction Detection (THAID), the first classification tree algorithm. A decision tree is a hierarchical flow chart involving branches, a root node, internal nodes, and leaf nodes. The top node, which has no incoming branch, is called the root node and represents the entire sample space. Nodes with one incoming branch and two or more outgoing edges are the internal (test) nodes. Leaves, or terminal nodes, are the nodes that show the final results. Splitting, stopping, and pruning are the three main procedures for building a decision tree52. In the splitting step, the data are separated into a number of subsets based on testing the most significant attribute over the training instances. Various splitting criteria, such as the Gini index, information gain, gain ratio, classification error, and twoing, can be considered for classification trees, while standard deviation reduction and variance reduction are used for regression trees53. Figure 2 shows an example of a decision tree, which can be used for both regression and classification. Data splitting starts from the root node and proceeds to the internal nodes until the stopping criteria are reached or a predefined homogeneity is satisfied. Applying stopping criteria reduces the complexity of the tree and helps avoid overfitting; without them, splitting produces a complex tree that fits the training data well but not the test data, and the stopping thresholds themselves must be tuned to appropriate values. If the stopping methods do not work properly, the pruning technique is applied to avoid overfitting: a complete tree is built first and is then pruned back to smaller trees by removing nodes that contribute little information gain on validation data.

Figure 2

The schematic diagram of DT.
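As a minimal sketch of a regression tree of the kind described above, the following uses scikit-learn on synthetic data; the feature meanings and the stopping-criterion values (max_depth, min_samples_leaf) are illustrative only.

```python
# A minimal sketch of a regression tree on synthetic drilling-style data;
# feature names and values are illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3))       # e.g. scaled depth, WOB, rotary speed
y = 30 * X[:, 1] + 10 * np.sin(6 * X[:, 2]) + rng.normal(0, 1, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# max_depth and min_samples_leaf act as stopping criteria that limit tree
# growth and help avoid overfitting, as described above
tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=5).fit(X_tr, y_tr)
print("test R^2:", tree.score(X_te, y_te))
```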

Radial basis function (RBF) neural network

RBF and MLP are the most widely used artificial neural network (ANN) models. The differences are that the RBF model has a simpler design and its structure is fixed, consisting of only three layers. It should also be noted that the mapping mechanisms of MLP and RBF are unalike: in the RBF method, the output values are obtained from the distances of the data points to points called centers. Centers are chosen in three different ways: (a) supervised, (b) unsupervised, (c) fixed. Each neuron applies a transfer function; thus, for the output f(zi) we have:

$$f\left( {z_{i} } \right) = { }\emptyset \left( {z_{i} } \right) \times w^{t} + b$$
(12)

where the terms \(\emptyset \left( {z_{i} } \right)\), \(w^{t}\), and \(b\) refer to the transfer function, the transposed weight matrix, and the bias vector, respectively.

Equation (13) shows the Gaussian function, which is generally used as the transfer function in RBF models:

$$\emptyset \left( r \right) = \exp \left( { - 0.5 \times \left( {\frac{r}{\sigma }} \right)^{2} } \right)\;{\text{for}}\;\sigma {\text{ > }}0$$
(13)

Other common radial functions are:

$$\emptyset \left( r \right) = \sqrt {1 + \left( {\frac{r}{\sigma }} \right)^{2} }$$
(14)
$$\emptyset \left( r \right) = \frac{1}{{\sqrt {1 + \left( {\frac{r}{\sigma }} \right)^{2} } }}$$
(15)
$$\emptyset \left( r \right) = 1 + \left( {\frac{r}{\sigma }} \right)^{2}$$
(16)
$$\emptyset \left( r \right) = { }r^{2} \ln \left( r \right)$$
(17)

The distance of point \(z_{k}\) from center \(c_{i}\) is denoted \(\left\| {z_{k} - c_{i} } \right\|\); thus, we have:

$$\varphi _{{ki}} \left( z \right) = \exp \left( { - 0.5 \times \frac{{\left\| {z_{k} - c_{i} } \right\|^{2} }}{{\sigma ^{2} }}} \right),\;i = 1, \ldots ,N\;{\text{and}}\;k = 1, \ldots ,M$$
(18)

The numbers of kernels and input points, the centers, and the Gaussian transfer function are denoted by N, M, \(c_{i}\), and \(\varphi_{ki} \left( z \right)\), respectively.

According to the above statements, outputs are obtained by54,55,56,57:

$$output = f\left( {z_{k} } \right) = w_{0} + \sum\limits_{{i = 1}}^{N} {w_{i} \,\varphi _{{ki}} \left( z \right)} ,\;k = 1, \ldots ,M$$
(19)

The schematic of the RBF model and the flowchart for the proposed RBF model are illustrated in Figs. 3 and 4, respectively. The spread coefficient and the maximum number of neurons in the RBF model are 2 and 100, respectively. In addition, the Gaussian function was used as the transfer function for the RBF model in the present study.
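For concreteness, here is a minimal NumPy sketch of the Gaussian RBF forward pass of Eqs. (13), (18), and (19); the spread value of 2 follows the text above, while the centers, weights, and inputs are illustrative.

```python
# A minimal sketch of a Gaussian RBF network forward pass (Eqs. 13, 18, 19).
# Centers and weights are illustrative; sigma = 2 follows the spread value
# reported in the text.
import numpy as np

def rbf_predict(Z, centers, weights, w0, sigma=2.0):
    # Pairwise distances between inputs Z (M, d) and centers (N, d)
    d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    phi = np.exp(-0.5 * (d / sigma) ** 2)   # Gaussian transfer function, Eq. (13)
    return w0 + phi @ weights               # weighted sum plus bias, Eq. (19)

rng = np.random.default_rng(2)
centers = rng.uniform(size=(20, 3))
weights = rng.normal(size=20)
Z = rng.uniform(size=(5, 3))
print(rbf_predict(Z, centers, weights, w0=0.1))
```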

Figure 3

The schematic diagram of RBF.

Figure 4

Flowchart for the suggested RBF model72.

Multilayer perceptron (MLP)

MLP is a feed-forward ANN that can have several layers. A simple MLP model consists of at least three layers; in this case, a hidden layer connects the input and output layers. The layers are composed of neurons; except for the input layer, the neurons of the other layers contain a nonlinear activation function. The number of layers and the number of neurons in each layer can be determined by considering the number of input data and the complexity of the problem. The network is trained using a supervised back-propagation algorithm. Weights and biases are the parameters of each neuron. Several functions can be used as transfer functions in the hidden and output neurons. Some of these functions are presented below:

$$Binary \, step: \, f\left( z \right) = \left\{ {\begin{array}{*{20}l} 0 \hfill & {for} \hfill & {z < 0} \hfill \\ 1 \hfill & {for} \hfill & {z \ge 0} \hfill \\ \end{array} } \right.$$
(20)
$${\text{Tansig}}: \, f\left( z \right) = { }\frac{{e^{z} - e^{ - z} }}{{e^{z} + e^{ - z} }} = \frac{2}{{1 + e^{ - 2z} }} - 1$$
(21)
$${\text{Logsig}}: \, f\left( z \right) = \frac{1}{{1 + e^{ - z} }}$$
(22)
$$ArcTan:f\left( z \right) = { }tan^{ - 1} \left( z \right){ }$$
(23)
$$Sinusoid:f\left( z \right) = {\text{sin}}\left( z \right)$$
(24)
$$Purelin:{ }f{ }\left( z \right){ } = { }z$$
(25)

In the present study, Purelin, Tansig, and Logsig are the three transfer functions used for the MLP models; as Eq. (26) shows, the output layer uses the linear Purelin function and the hidden layers use nonlinear ones. For example, the output of an MLP model with two hidden layers is calculated as follows:

$$output = purelin\left( {w_{3} \times \left( {tan\;sig\left( {w_{2} \times \left( {\log sig\left( {w_{1} z} \right) + b_{1} } \right)} \right) + b_{2} } \right) + b_{3} } \right)$$
(26)

where \(b_{1}\), \(b_{2}\), and \(b_{3}\) refer to the first hidden layer, second hidden layer, and output layer bias vectors, respectively. The weight matrices of the first hidden layer, second hidden layer, and output layer are denoted by \(w_{1}\), \(w_{2}\), and \(w_{3}\)54,55,57,58. The schematic of a single-hidden-layer MLP model is illustrated in Fig. 5.
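The following numeric sketch evaluates Eq. (26) as written (with each bias added after its activation); the layer sizes and weights are illustrative placeholders.

```python
# A numeric sketch of Eq. (26): a two-hidden-layer MLP forward pass with
# logsig, tansig, and purelin transfer functions. All sizes and weights
# are illustrative.
import numpy as np

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))   # Eq. (22)

def tansig(z):
    return np.tanh(z)                 # Eq. (21), equivalent form

rng = np.random.default_rng(3)
w1, b1 = rng.normal(size=(8, 10)), rng.normal(size=8)   # 10 inputs -> 8 neurons
w2, b2 = rng.normal(size=(5, 8)), rng.normal(size=5)    # 8 -> 5 neurons
w3, b3 = rng.normal(size=(1, 5)), rng.normal(size=1)    # 5 -> 1 output

z = rng.uniform(size=10)              # one input sample
output = w3 @ (tansig(w2 @ (logsig(w1 @ z) + b1)) + b2) + b3  # purelin = identity
print(output)
```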

Figure 5

The schematic diagram of MLP.

Least square support vector machine (LSSVM)

LSSVM was first suggested by Suykens and Vandewalle59. In LSSVM, a set of linear equations is solved, which simplifies the learning process. Equation (27) shows the LSSVM cost function:

$$Cost\; function = \frac{1}{2} We^{T} We + \frac{1}{2}Tu\mathop \sum \limits_{j = 1}^{Num} Ve_{j}^{2}$$
(27)

Here, the superscript T represents the transpose of a variable and We shows the regression weight. The error variable of the LSSVM algorithm is shown by \(Ve_{j}\) and Tu shows the tuning parameter.

The cost function is subject to the following constraint:

$$Z_{j} = We^{T} \varphi \left( {y_{j} } \right) + c + Ve_{j}$$
(28)

By setting the derivatives of the Lagrangian of the LSSVM to zero, as follows, the model's parameters can be obtained:

$$\left\{ {\begin{array}{*{20}l} {\frac{dL}{{dWe}} = 0 \to We = \mathop \sum \limits_{j = 1}^{Num} \alpha_{j} \varphi \left( {y_{j} } \right)} \hfill \\ {\frac{dL}{{dc}} = 0 \to \mathop \sum \limits_{j = 1}^{Num} \alpha_{j} = 0} \hfill \\ {\frac{dL}{{dVe_{j} }} = 0 \to \alpha_{j} = Tu\,Ve_{j} ;\; j = 1,2, \ldots ,Num} \hfill \\ {\frac{dL}{{d\alpha_{j} }} = 0 \to We^{T} \varphi \left( {y_{j} } \right) + c + Ve_{j} - Z_{j} = 0;\;j = 1,2, \ldots ,Num} \hfill \\ \end{array} } \right.$$
(29)

By solving Eq. (29), the parameters of the LSSVM can be obtained. The unknown parameters in Eq. (29) are We, c, \(Ve_{j}\), and \(\alpha_{j}\).

Tu and \(\sigma^{2}\) control the reliability of the LSSVM. In this study, the values of Tu and \(\sigma^{2}\) are 24.7959 and 2.2514, respectively.
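As an illustration, the KKT conditions of Eq. (29) reduce to a single linear system for the bias c and the multipliers α, which the sketch below solves with an RBF kernel using the Tu and σ² values reported above. The data are synthetic, and the kernel-width convention (σ² vs. 2σ²) varies between implementations.

```python
# A minimal sketch of LSSVM regression with an RBF kernel: the KKT
# conditions (Eq. 29) reduce to one linear system. Tu and sigma^2 follow
# the values reported above; the data are synthetic.
import numpy as np

def rbf_kernel(A, B, sigma2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2))

rng = np.random.default_rng(4)
X = rng.uniform(size=(100, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] + rng.normal(0, 0.05, 100)

Tu, sigma2 = 24.7959, 2.2514
K = rbf_kernel(X, X, sigma2)
n = len(y)

# Assemble and solve the linear system for bias c and multipliers alpha:
# [0, 1^T; 1, K + I/Tu] [c; alpha] = [0; y]
A = np.zeros((n + 1, n + 1))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
A[1:, 1:] = K + np.eye(n) / Tu
sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
c, alpha = sol[0], sol[1:]

# Predict at new points: y(x) = sum_j alpha_j K(x, x_j) + c
X_new = rng.uniform(size=(5, 3))
print(rbf_kernel(X_new, X, sigma2) @ alpha + c)
```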

Optimization algorithms

Levenberg–Marquardt algorithm

To train the MLP model, training algorithms are used to optimize the weight and bias values. Levenberg–Marquardt is one of these algorithms and is used to solve nonlinear least-squares problems. In this method, even with an inappropriate initial guess for the weights and biases, the algorithm is able to reach the final answer. Because the performance function has a sum-of-squares form, the gradient and Hessian matrices are approximated as follows:

$$g = { }J^{T} e$$
(30)
$$H = { }J^{T} J$$
(31)

Here, the Jacobian matrix and network errors vector are denoted by \(J\) and \(e\).

By updating the equations, the weight values in each step are obtained as:

$$w_{k + 1} = { }w_{k} - \left( {J^{T} J + \eta I} \right)^{ - 1} J^{T} e$$
(32)

It should be noted that \(\eta\) is a damping constant that is increased or decreased at each step depending on the behavior of the performance function60.
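A minimal sketch of this update rule on a toy least-squares problem is given below; the exponential model, the data, and the damping schedule are illustrative, not those used in this study.

```python
# A minimal sketch of Levenberg-Marquardt style updates (Eq. 32) for a
# tiny least-squares fit; model and data are illustrative.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 50)
y = 2.0 * np.exp(1.5 * x) + rng.normal(0, 0.05, 50)

w = np.array([1.0, 1.0])      # initial guess for [a, b] in a*exp(b*x)
eta = 1e-2                    # damping factor, adapted each step

for _ in range(50):
    pred = w[0] * np.exp(w[1] * x)
    e = pred - y              # error vector
    # Jacobian of residuals with respect to the parameters [a, b]
    J = np.column_stack([np.exp(w[1] * x), w[0] * x * np.exp(w[1] * x)])
    w_new = w - np.linalg.solve(J.T @ J + eta * np.eye(2), J.T @ e)
    # Accept the step and shrink damping if the error improved; else grow it
    if np.sum((w_new[0] * np.exp(w_new[1] * x) - y) ** 2) < np.sum(e ** 2):
        w, eta = w_new, eta * 0.5
    else:
        eta *= 2.0
print(w)   # should approach [2.0, 1.5]
```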

Bayesian regularization algorithm (BRA)

Like Levenberg–Marquardt, the Bayesian regularization algorithm is used to optimize the weights and biases and minimize the squared errors. The objective function is defined as follows:

$$F\left( w \right) = \alpha E_{w} + \beta E_{D}$$
(33)

in which \(\alpha\) and \(\beta\) are objective function parameters, \(E_{D}\) is the sum of squared network errors, \(E_{w}\) is the sum of squared network weights, and F(w) is the objective function. Bayes' theorem is used to determine \(\alpha\) and \(\beta\), and a Gaussian distribution is assumed for both the network weights and the training set. These parameters are updated and the procedure is repeated until convergence is achieved61.

Boosting method

Schapire62 introduced the boosting method, which is a type of ensemble method. In this method, several weak predictors/learners are combined to create a stronger learner, with each weak learner trained to correct the errors of the previous ones. One of the most popular types of boosting is gradient boosting, which is used in this paper.

Gradient boosting (GB)

Gradient boosting is one type of boosting method in which new learners are fitted to the residual errors made by the previous learners63. GB can be considered a form of functional gradient descent (FGD), in which a specific loss is reduced by adding a learner at each step of gradient descent64. The GB algorithm is as follows:

  1. Initialize \(g_{0} \left( y \right) = argmin_{\gamma } \mathop \sum \limits_{q = 1}^{Nu} O\left( {x_{q} ,\gamma } \right)\)

  2. Iterate for c = 1 to C (C is the number of tree learners):

    a. Compute the negative gradient \(a_{q} = - \left[ {\frac{{\partial O\left( {x_{q} ,g\left( {y_{q} } \right)} \right)}}{{\partial g\left( {y_{q} } \right)}}} \right]_{{g = g_{c - 1} }} ,\;q = 1,2, \ldots ,Nu\)

    b. Fit a regression tree \(F_{c} \left( y \right)\) to the targets \(\left\{ {a_{q} ,\;q = 1,2, \ldots ,Nu} \right\}\)

    c. Compute the gradient descent step size from the following equation:

      $$t = argmin_{\gamma } \mathop \sum \limits_{q = 1}^{Nu} O\left( {x_{q} ,g_{c - 1} \left( {y_{q} } \right) + \gamma F_{c} \left( {y_{q} } \right)} \right)$$

    d. Update the model as \(g_{c} \left( y \right) = g_{c - 1} \left( y \right) + tF_{c} \left( y \right)\)

  3. For a test point y (whose output is unknown), the prediction is \(g_{C} \left( y \right)\)

The parameters of GB used in this study are presented in Table 1.

Table 1 The parameters used in Gradient Boosting trees.
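A minimal scikit-learn sketch of gradient boosting with tree learners, of the kind used for the DT-GB model, is shown below; the hyperparameter values are placeholders rather than the ones in Table 1, and the data are synthetic stand-ins for the ten drilling inputs.

```python
# A minimal sketch of gradient boosting with tree learners; hyperparameters
# and data are illustrative placeholders, not the study's values.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.uniform(size=(1000, 10))    # stand-in for the ten drilling inputs
y = 20 * X[:, 0] + 5 * np.sin(8 * X[:, 4]) + rng.normal(0, 0.5, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
gb = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                               max_depth=4).fit(X_tr, y_tr)
print("test R^2:", gb.score(X_te, y_te))
```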

Results and discussion

In this research, 5040 data points from the South Azadgan field in Iran have been used. Table 2 summarizes the statistical parameters of this dataset. In all the developed models, depth (D), weight on bit (WOB), pit total (PT), pump pressure (PP), hook load (H), surface torque (ST), rotary speed (RS), flow in (Fi), flow out (Fo), and wellhead pressure (Wp) were considered as inputs, and ROP was regarded as the output. Histograms of the inputs and output are presented in Fig. 6. As shown in Fig. 6, most of the surface torque data lie between 75 and 175. The flow out and flow in data vary between 50–100% and 600–800 gal/min, respectively (Fig. 6). Hook load data vary from 75 to 125 k-lbs, and most of them are 50 k-lbs (Fig. 6). Pump pressure and wellhead pressure data vary from 1000 to 2000 psi and from 0 to 10 psi, respectively (Fig. 6). Pit total data lie between 200 and 280 bbls (Fig. 6). Most of the weight-on-bit data are around 35 k-lbs (Fig. 6). Most of the rotary speed data in our study are from 25 to 100 rpm. The maximum ROP in our data is around 25 ft/hr (Fig. 6). Figure 7 shows box plots of the inputs and output data. As shown in Fig. 7, the WOB data are lower than 50, while the pit total data are higher than 200. The hook load data vary between 50 and 100 (Fig. 7). The ranges of surface torque and rotary speed are less than 150 and 100, respectively. In addition, the ranges of flow out and wellhead pressure are less than 100 and 25, respectively (Fig. 7). As shown in Fig. 7, the 25th to 75th percentiles of the ROP data lie below 50. Figure 7 shows that pump pressure varies from 750 to more than 1500. As shown in Fig. 7, all of the flow in data are less than 1000. Figure 8 shows ROP vs. depth for our data; as shown in Fig. 8, ROP decreases with increasing depth.

Table 2 Statistical parameters of the inputs and output data.
Figure 6

Histogram of inputs and output data.

Figure 7

Box plot of inputs and output data.

Figure 8

The relation between ROP and depth.

In the first step, to estimate ROP from the input parameters, the following correlation was developed and its coefficients were optimized by GRG:

$$\begin{aligned} {\text{ROP}} &= a_{1} + \left( {a_{2} *D} \right) + \left( {a_{3} *Wp} \right) + \left( {a_{4} *Fo} \right) + \left( {a_{5} *Fi} \right) + \left( {a_{6} *RS} \right) \\ &\quad + \left( {a_{7} *ST} \right) + \left( {a_{8} *H} \right) + \left( {a_{9} *PP} \right) + \left( {a_{10} *PT} \right) + \left( {a_{11} *WOB} \right) + \left( {\frac{{a_{12} }}{D}} \right) \\ &\quad + \left( {\frac{{a_{13} }}{WOB}} \right) + \left( {\frac{{a_{14} }}{PT}} \right) + \left( {\frac{{a_{15} }}{PP}} \right) + \left( {\frac{{a_{16} }}{H}} \right) + \left( {\frac{{a_{17} }}{ST}} \right) + \left( {\frac{{a_{18} }}{RS}} \right) + \left( {\frac{{a_{19} }}{Fi}} \right) + \left( {\frac{{a_{20} }}{Fo}} \right) \\ &\quad + \left( {\frac{{a_{21} }}{Wp}} \right) + \left( {a_{22} *\ln \left( {Wp} \right)} \right) + \left( {a_{23} *\ln \left( {Fo} \right)} \right) + \left( {a_{24} *\ln \left( {Fi} \right)} \right) \\ &\quad + \left( {a_{25} *\ln \left( {RS} \right)} \right) + \left( {a_{26} *\ln \left( {ST} \right)} \right) + \left( {a_{27} *\ln \left( H \right)} \right) + \left( {a_{28} *\ln \left( {PP} \right)} \right) \\ &\quad+ \left( {a_{29} *\ln \left( {PT} \right)} \right) + \left( {a_{30} *\ln \left( {WOB} \right)} \right) + \left( {a_{31} *\ln \left( D \right)} \right) \\ \end{aligned}$$
(34)

where a1–a31 are constants, which are presented in Table 3. As shown in Eq. (34), all the involved parameters are available and recorded during the drilling operation; therefore, this correlation can be used to estimate ROP roughly. Although the developed correlation can give us a good sense of ROP, for a more accurate estimation it is recommended to use artificial intelligence (AI) methods, which are more flexible and can solve complicated problems. In this study, the AI methods LSSVM, MLP, RBF, and DT were used. To develop the AI models, the databank was first randomly separated into two subgroups: the training set, on which the model learns and tries to find the optimum predictive model, and the test set, which is used to investigate the prediction capability of the developed model. The classification of data points for the intelligent models and the developed correlation is as follows:

  1. 80 percent of the data were used for training.

  2. 20 percent of the data were used for testing.

Table 3 The constants of developed correlation.

LMA, BRA, and SCGA are the three algorithms used to train the MLP models, and GB is the optimization technique used for the DT model.

Statistical evaluation

To evaluate and compare the models developed in this study, a statistical analysis of the errors is performed. For this purpose, the values of the standard deviation (SD), average absolute percent relative error (AAPRE), coefficient of determination (R2), root mean square error (RMSE), and average percent relative error (APRE) are computed, and the results are summarized in Table 4. Equations (35)–(39) present the formulations employed to calculate these parameters58,65.

$$SD = \sqrt {\frac{1}{Num}\mathop \sum \limits_{l = 1}^{Num} \left( {\frac{{ROP_{exp,l} - ROP_{pred,l} }}{{ROP_{exp,l} }}} \right)^{2} }$$
(35)
$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{l = 1}^{Num} \left( {ROP_{exp,l} - ROP_{pred,l} } \right)^{2} }}{{\mathop \sum \nolimits_{l = 1}^{Num} \left( {ROP_{pred,l} - \overline{ROP} } \right)^{2} }}$$
(36)
$$AAPRE = \frac{100}{{Num}}\mathop \sum \limits_{l = 1}^{Num} \frac{{\left| {ROP_{exp,l} - ROP_{pred,l} } \right|}}{{ROP_{exp,l} }}$$
(37)
$$RMSE = \sqrt {\frac{1}{Num}\mathop \sum \limits_{l = 1}^{Num} \left( {ROP_{exp,l} - ROP_{pred,l} } \right)^{2} }$$
(38)
$$APRE = \frac{100}{{Num}}\mathop \sum \limits_{l = 1}^{Num} \frac{{ROP_{exp,l} - ROP_{pred,l} }}{{ROP_{exp,l} }}$$
(39)
Table 4 Statistical error analysis of the developed models for the rate of penetration.
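For reference, the sketch below implements Eqs. (35)–(39) directly; rop_exp and rop_pred are placeholder arrays, and \(\overline{ROP}\) in Eq. (36) is taken as the mean of the experimental values.

```python
# A minimal sketch of the error metrics in Eqs. (35)-(39); rop_exp and
# rop_pred are placeholder arrays of measured and predicted ROP.
import numpy as np

def error_metrics(rop_exp, rop_pred):
    rel = (rop_exp - rop_pred) / rop_exp
    return {
        "SD": np.sqrt(np.mean(rel ** 2)),                        # Eq. (35)
        # Eq. (36) as written, with the mean taken over experimental ROP
        "R2": 1 - np.sum((rop_exp - rop_pred) ** 2)
                 / np.sum((rop_pred - rop_exp.mean()) ** 2),
        "AAPRE": 100 * np.mean(np.abs(rel)),                     # Eq. (37)
        "RMSE": np.sqrt(np.mean((rop_exp - rop_pred) ** 2)),     # Eq. (38)
        "APRE": 100 * np.mean(rel),                              # Eq. (39)
    }

rop_exp = np.array([10.0, 15.0, 22.0, 30.0])
rop_pred = np.array([11.0, 14.0, 23.5, 28.0])
print(error_metrics(rop_exp, rop_pred))
```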

Two suitable statistical measures for comparing the developed models are AAPRE and R2. As presented in Table 4, the R2 of the developed correlation is 0.807. Among the MLP models, the best performance was obtained with BRA, followed by LMA and SCGA: the R2 values of MLP-SCGA, MLP-LMA, and MLP-BRA are 0.944, 0.965, and 0.969, respectively. The AAPRE of these models is in good agreement with the R2 results: 13.88% for MLP-BRA, 14.05% for MLP-LMA, and 18.49% for MLP-SCGA. As shown in Table 4, RBF had the worst performance among the developed models, with an AAPRE of 21.409% and an R2 of 0.937. The R2 and AAPRE for LSSVM are 0.971 and 10.497%, respectively. As stated in Table 4, DT-GB had the best performance among the developed models, with an AAPRE of 9.013% and an R2 of 0.977. Therefore, DT-GB has the best performance among the developed models, followed by LSSVM, MLP-BRA, MLP-LMA, MLP-SCGA, and RBF.

Graphical analysis of models

Figure 9 shows the crossplots for the developed models. In these plots, the modeled ROP values are plotted versus the experimental data. The more tightly the data cluster around the line Y = X, the more accurate the model; in other words, the line Y = X is a visual criterion for a quick examination of model accuracy, and the closer R2 is to 1, the more remarkable the conformance of the model with the experimental data. Subplot (a) of Fig. 9 presents the crossplot of the developed correlation. As shown in subplot (a), up to an ROP of 50, the developed correlation obtains acceptable predictions; however, at high ROP values, the scattering of data around the 45° line is obvious. As shown in subplot (b) of Fig. 9, except at high ROP values, the concentration of the data around the unit-slope line is good for MLP-LMA, with the training set concentrated more tightly than the testing set. The same results were achieved for MLP-BRA; however, a better concentration of the data is noticed for MLP-BRA than for MLP-LMA (subplot (c) of Fig. 9). In contrast, scattering of data is obvious for MLP-SCGA (subplot (d) of Fig. 9), where the scattering of the testing set is much greater than that of the training data. In subplot (e) of Fig. 9, it can be seen that the estimations of the RBF model are scattered around the Y = X line, with scattering of the testing data obvious at both high and low ROP values. Although scattering of the test data is noticeable, the concentration of the training data around the Y = X line is acceptable for LSSVM (subplot (f) of Fig. 9). Subplot (g) of Fig. 9 shows that the best performance among the AI models belongs to DT-GB, with a good concentration of the data around the 45° straight line.

Figure 9

Cross plots of the implemented intelligent models.

The error distributions of the proposed correlation and the developed models are presented in Fig. 10. In each subplot, the percent relative error is plotted against the rate of penetration. Subplot (a) of Fig. 10 shows that the developed correlation has reasonable predictions at low ROP values, where the concentration of the data points around the zero-error line is good. As shown in subplot (b) of Fig. 10, the concentration of the data around the zero-error line is suitable for MLP-LMA. In addition, subplot (c) of Fig. 10 shows a much better concentration of the data around the zero-error line for MLP-BRA than for MLP-LMA. However, the concentration of the data points estimated by the MLP-SCGA model around the zero-error line is not as good as that of the two other MLP models (subplot (d) of Fig. 10). The statistical analysis showed that the performance of RBF is not good, and both its cross plot and its error distribution confirm this finding (subplot (e) of Fig. 10). As illustrated in subplot (f) of Fig. 10, the concentration of the training data around the zero-error line is satisfactory for the LSSVM model, although the concentration of the testing data is not good at some points. As displayed in subplot (g) of Fig. 10, the predictions of DT-GB show a very appropriate concentration around the zero-error line at both high and low ROP values, which supports the superiority of DT-GB.

Figure 10

Error distribution plots of the proposed models.

Figure 11 shows a comparison between the experimental ROP and the ROP values predicted by the DT-GB model for the first 100 testing data points. As shown in Fig. 11, the best developed model in this study, DT-GB, gives good predictions; except at some data points, the predictions of DT-GB match the experimental ROP well.

Figure 11

Comparison of experimental data and output of DT-GB model for the first 100 testing data points.

Figure 12 compares the statistical errors of the developed models using bar charts. Each subplot of Fig. 12 confirms that the best and worst performances belong to DT-GB and RBF, respectively.

Figure 12

Comparison of statistical errors of the intelligent models.

3D plots of the absolute relative error of the DT-GB model versus different parameters, including hook load, depth, ROP, and WOB, are shown in Fig. 13. As shown in subplot (a) of Fig. 13, the maximum absolute relative error is seen when WOB is around 18 k-lbs and depth is 4000 ft. In subplot (b) of Fig. 13, the maximum absolute relative error is reported when ROP is 14 ft/hr and depth is 4000 ft. Also, at a hook load of 90 k-lbs and a depth of 4000 ft, the model has a high error (subplot (c) of Fig. 13).

Figure 13

Absolute relative error contour versus different parameters (a) WOB and depth (b) ROP and depth (c) hook load and depth.

Figure 14 shows cumulative frequency vs. absolute relative error. More than 50% of the ROP values predicted by the DT-GB model have an absolute relative error of less than 10%, as do about 50% of the values predicted by the LSSVM, MLP-LMA, and MLP-BRA models. For MLP-SCGA and RBF, about 40% and around 30% of the predicted ROP values, respectively, have an absolute relative error of less than 10%.

Figure 14

Cumulative frequency vs. absolute relative error of different models proposed in this study.

Sensitivity analysis

A sensitivity analysis was performed to study the quantitative effects of all input parameters on the ROP of the developed model. The relevancy factor with directionality (r) was chosen for this purpose: the magnitude of r shows the level of effect of an input on the model output, and its sign shows the direction of the impact66. The following formula defines r:

$$r\left( {In_{n} ,OU} \right) = \frac{{\mathop \sum \nolimits_{j = 1}^{m} \left( {In_{n,j} - \overline{{In_{n} }} } \right)\left( {OU_{j} - \overline{OU} } \right)}}{{\sqrt {\mathop \sum \nolimits_{j = 1}^{m} \left( {In_{n,j} - \overline{{In_{n} }} } \right)^{2} \mathop \sum \nolimits_{j = 1}^{m} \left( {OU_{j} - \overline{OU} } \right)^{2} } }}$$
(40)

In the above equation, \(In_{n}\) and \(OU\) show the nth input of the model and the predicted ROP, respectively. The relative effect of the input variables on the ROP estimated by the proposed DT-GB model is shown in Fig. 15. As shown in Fig. 15, pit total, rotary speed, and flow in have a positive effect on the ROP, while depth, weight on bit, pump pressure, hook load, surface torque, and wellhead pressure have negative impacts on the ROP. The highest absolute value of r belongs to depth; therefore, depth has the most important effect among the inputs on the predicted ROP value.
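A minimal sketch of Eq. (40), which is the Pearson correlation between each input and the predicted ROP, is given below; the input matrix and output vector are synthetic placeholders.

```python
# A minimal sketch of the relevancy factor in Eq. (40): the Pearson
# correlation between each input and the predicted ROP. Arrays are
# synthetic placeholders.
import numpy as np

def relevancy_factor(inputs, output):
    # inputs: (m, n) matrix of m samples and n input variables
    dev_in = inputs - inputs.mean(axis=0)
    dev_out = output - output.mean()
    num = dev_in.T @ dev_out
    den = np.sqrt((dev_in ** 2).sum(axis=0) * (dev_out ** 2).sum())
    return num / den

rng = np.random.default_rng(7)
X = rng.uniform(size=(200, 10))       # stand-in for the ten drilling inputs
rop = 5 - 3 * X[:, 0] + 2 * X[:, 6] + rng.normal(0, 0.2, 200)
print(relevancy_factor(X, rop))       # one r per input; sign = direction
```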

Figure 15

The relative effect of input variables on the ROP based on DT-GB model.

Applicability area of the developed model and outlier analysis

Outliers are data points that deviate from the bulk of the data. Such data frequently appear in large sets of experimental data, and their presence can affect the accuracy and reliability of models. Hence, finding these data points is necessary when developing models67,68,69,70,71. In this study, the leverage approach has been employed for determining outliers67,69,70,71. In this method, the deviation of the predicted values from the corresponding experimental data is calculated. More details about this method can be found in the literature67,68,69,70.

Figure 16 shows the Williams plot for the predicted ROP obtained by the DT-GB model. The out-of-leverage and suspect data points flagged in Fig. 16 are listed in Table 5. As shown in Fig. 16, the majority of the data points are positioned in the applicability domain (− 3 ≤ R ≤ 3 and 0 ≤ hat ≤ 0.0057). Therefore, the developed model, DT-GB, has statistical validity and high reliability. A small number of data points fall outside the applicability domain, which is negligible. In this plot, two definitions are important: good high leverage and bad high leverage. Good high leverage points are data whose R lies between − 3 and 3 but whose hat value exceeds hat*. These data points differ from the bulk of the data and lie outside the feasibility domain of the developed model; however, they may still be predicted well by the model. Data whose R is less than − 3 or greater than 3 are known as bad high leverage; these are experimentally doubtful data or outliers67,68,69,70.
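For reference, the sketch below computes the Williams plot quantities under common assumptions: hat values from the diagonal of H = X(XᵀX)⁻¹Xᵀ, standardized residuals R, and a warning leverage hat* = 3(p + 1)/n. The data are synthetic placeholders.

```python
# A minimal sketch of the leverage (Williams plot) quantities: hat values
# from the input matrix and standardized residuals of the predictions.
# The warning leverage hat* = 3(p + 1)/n is a common convention.
import numpy as np

def williams_quantities(X, y_exp, y_pred):
    # Hat matrix diagonal: H = X (X^T X)^(-1) X^T
    hat = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    resid = y_exp - y_pred
    std_resid = resid / resid.std()     # standardized residuals R
    n, p = X.shape
    hat_star = 3 * (p + 1) / n          # warning leverage limit
    return hat, std_resid, hat_star

rng = np.random.default_rng(8)
X = rng.uniform(size=(500, 10))
y_exp = X @ rng.normal(size=10) + rng.normal(0, 0.1, 500)
y_pred = y_exp + rng.normal(0, 0.1, 500)
hat, R, hat_star = williams_quantities(X, y_exp, y_pred)
print(hat_star, (np.abs(R) > 3).sum(), (hat > hat_star).sum())
```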

Figure 16

William's plot for discovering the probable outliers and the feasibility domain of the developed model, DT-GB, in this study.

Table 5 Inputs and output which are out of the applicability domain of the DT-GB model based on the leverage approach.

Conclusions

In this study, new methods were used to predict drilling rates. Since the parameters affecting the drilling rate differ and conditions vary from field to field, it is always difficult to develop a comprehensive, efficient, and precise model. A model that can accommodate more parameters can better predict the drilling rate. Therefore, we developed a correlation and smart models, including MLP, RBF, LSSVM, and DT, with ten input parameters. The main findings of this study are as follows:

  1. 1.

The developed correlation and smart models need only parameters that are accessible in the field and can give fast predictions of ROP.

  2. 2.

All four smart models predict drilling rates well, which should increase the tendency to use smart methods for predicting drilling rates.

  3. 3.

The best predictions belong to the DT-GB model, with an R2 of 0.977. The LSSVM model also has acceptable performance, with an R2 of 0.971. The MLP models likewise perform well, and the worst performance among the developed models belongs to RBF.

  4. 4.

    Sensitivity analysis showed that flow in, rotary speed, and pit total have positive effects on ROP, while other parameters have negative effects. Among input parameters, depth has the greatest effect on ROP.

  5. 5.

The leverage approach indicated that the developed DT-GB model is statistically valid and only a few data points lie outside the applicability domain of the model.