Hybrid physics-machine learning models for predicting rate of penetration in the Halahatang oil field, Tarim Basin

Rate of penetration (ROP) is a key factor in drilling optimization, cost reduction and drilling cycle shortening. Due to the systematicity, complexity and uncertainty of drilling operations, however, it has always been a problem to establish a highly accurate and interpretable ROP prediction model to guide and optimize drilling operations. To solve this problem in the Tarim Basin, this study proposes four categories of hybrid physics-machine learning (ML) methods for modeling. One of which is residual modeling, in which an ML model learns to predict errors or residuals, via a physical model; the second is integrated coupling, in which the output of the physical model is used as an input to the ML model; the third is simple average, in which predictions from both the physical model and the ML model are combined; and the last is bootstrap aggregating (bagging), which follows the idea of ensemble learning to combine different physical models’ advantages. A total of 5655 real data points from the Halahatang oil field were used to test the performance of the various models. The results showed that the residual modeling model, with an R2 of 0.9936, had the best performance, followed by the simple average model and bagging with R2 values of 0.9394 and 0.5998, respectively. From the view of prediction accuracy, and model interpretability, the hybrid physics-ML model with residual modeling is the optimal method for ROP prediction.

(1) Limitation of the understanding of drilling downhole physics.A drilling system is a complex system 10 with mixed elements, multiscale coupling and multiple process intertwined; however, accurately depicting all drilling processes is still difficult to in the existing physical model, and some physical processes are still unknown.For example, in the actual drilling process, the influence of drilling fluid on ROP is complex, and the influence of drilling fluid displacement, viscosity and density on ROP is not clearly understood, resulting in the assumption and simplification of physical model modeling and thereby triggering uncertainty in ROP prediction.(2) Underdetermined system problems.Even if most physicals in drilling processes are clear, some parameter inversions are often underdetermined systems; that is, the number of observation equations is less than the number of unknown parameters, resulting in unstable effects of the drilling speed model.For example, in the actual drilling process, the heterogeneity of rocks and the wear of drill bits cannot be directly observed, and simple assumptions and some deductions will affect the effectiveness of the physical model.(3) Insufficient accuracy.Despite making a lot of efforts (theoretical and experimental), modeling the ROP as a mathematical function of some variables is not trivial because this is a highly non-linear problem 2 .The systematic, complex and uncertain drilling conditions of the downhole drilling process and the limitations of traditional ROP modeling result in a lack of sufficiently high accuracy.
Due to the limitations of physical models and the development of ML algorithms and the improvement in computer computing power, data-driven ML methods have been widely applied to ROP prediction and have achieved good results.One of the earliest applications of ML methods for ROP prediction dates back to 1997.Bilgesu creatively designed a one-hidden layer feed-forward back-propagation network to predict ROP and demonstrated the applicability of this neural network approach 11 .Mendes 12 also presented a methodology based on a neural network model for ROP and a neuro-genetic adaptive controller to address the problem that relationships between operational variables affecting ROP are complex and not easily modeled.In addition, with the boom in ML algorithms approximately 2010, more and more ML methods are being used for ROP prediction, including Moran 13 , Arabjamaloei 14 , Esmaeili 15 , Ning 16 , Zare 17 , Bodaghi 18 , Hegde 19 , Mantha 20 , Hegde 21 , Anemangely 22 , Soares 7 , Sabah 23 , Felipe 2 , Korhan 24 , Li 25 , Mohammad 26 , Gan 27 , Hazbeh 28 , Salaheldin 29 , Zhang 30 , Ren 31 , Zhang 32 , Brenjkar 33 , Riazi 34 , Song 35 , Wang 36 , Mohammad 37 , Kaveh 38 and so on.Judging from the increasing number of articles published each year in recent years on the use of machine learning for ROP prediction, it can be amply demonstrated that ML methods are well suited for application in the field of ROP prediction.
The ML methods used in the more than thirty articles mentioned above are all different, and to provide a clearer picture of the specifics of ML used for ROP prediction, the above articles are categorized according to the five ML algorithm types: artificial neural network (ANN), support vector machines (SVM), random forests (RF), hybrid model and deep learning (DL) methods.For each method, an additional information is given in the field of detail, and in the "Detail" column, the models used for comparison in the articles are in parentheses, as shown in Table 1.
Table 1 shows a history of the application of ML models in predicting ROP, and the more commonly used methods include ANN, SVM, RF and hybrid models.Moreover, recent years have also seen the emergence of approaches involving hybrid models and DL for ROP prediction.For example, in 2016, B. Mantha et al. realized the use of Step-Wise regression, neural networks (NN, KNN), support vector regression (SVR), categorical regression trees (CART) and other models for prediction 20 .Ensemble methods such as RF and Boosting help improve accuracy and reduce errors.
In 2022, based on data-driven thinking, Zhang proposed solving the real-time optimization problem of ROP by combining attention-based Gated Recurrent Unit networks and fully connected neural networks, which are accurate and robust and can make predictions after training on the first few data streams 32 .Compared with the traditional data-driven model, the proposed model shows great superiority due to its subnetwork structure, gated loop unit network and attention physical.
In 2022, Gan used a mobile window strategy, extreme learning machines, and tenfold cross-validation to build an ROP model 27 .Alternately, two steps (modeling and forecasting) are performed within a moving drilling depth window to more accurately predict the ROP.
In 2023, Zhang used RF, ANN and SVM combined with real-time workflow to predict drilling speed in real time 35 and optimized drilling parameters through the NSG-III algorithm through an objective function for ROP and MSE to obtain a better real-time prediction effect during drilling.
Considering the above for the application of ML methods in ROP prediction, ML models simulate the "inductive" and "speculative" processes of humans through "training" and "prediction" respectively, to model and solve typical problems.Unlike the explicit expression of mechanistic models, ML models establish implicit associations between different variables through training on data; this is often a typical "black box" model.One of the key advantages of ML models is that when physical activity is unknown, they can skip the understanding Table 1.Classification of ML methods employed in ROP prediction.

ANN
Three-layer feed-forward back propagation Bilgesu 11 , Mendes 12 , Moran 13 , Arabjamaloei 14 Multi-layer perceptron (MLP) networks trained with a back-propagation algorithm (BP) Esmaeili 15 , Ning 16 , Zare 17 , Anemangely 22 MLP with particle swarm optimization algorithm (PSO) (MLP, a radial basis function (RBF) ANN, SVM) Sabah 23 ANN with an improved genetic algorithm (IGA) Li 25 MLP with Firefly algorithm (FF), Gravitational search algorithm (GSA), Artificial bee colony algorithm (ABC), Independent component analysis (ICA) Hazbeh 28 ANN with 1 hidden layer, 20 neurons, fitnet as a network function, trainbr as a training function, tansig as a transfer function (adaptive neuro-fuzzy inference system (ANFIS), SVM) Salaheldin 29 ANN with extreme learning machine (ELM) Gan 27 combining an attention-based Gated Recurrent Unit network and fully connected neural networks Zhang 32 multilayer perceptron neural network (MLPNN), radial basis function neural network (RBFNN) (adaptive neurofuzzy inference system (ANFIS), and support vector regression (SVR)) Brenjkar 33 MLP with Bayesian Regularization Algorithm (BRA) (Radial Basis Function, Decision Tree (DT), Least Square Vector Machine (LSSVM)) Mohsen 34 SVM Support vector regression with the genetic algorithm (GA) and the cuckoo search algorithm (CS) Bodaghi 18 least-squares support-vector machines (LSSVM) with cuckoo optimization algorithm (COA), particle swarm optimization (PSO), and genetic algorithms (GA) (SVR-COA, MLP-COA, linear multivariate regression (LMR), and nonlinear multivariate regression (NLMR)) www.nature.com/scientificreports/ of the physical process and go directly to data-driven modeling, especially when the training data are sufficient, and high modeling accuracy can often be obtained.In addition, although ML is time-consuming in the training stage, it generally has high computational efficiency in the test application stage, which has become one of its important advantages.Despite this, ML still has some limitations, especially in the drilling process of complex processes, which often have the following problems: (1) Insufficient generalization.The lack of sufficient training samples is the most common problem of ML in drilling applications, and the use of limited samples to learn complex drilling processes is prone to overfitting, even if the training samples show high modeling accuracy, the test application accuracy will be greatly reduced.In particular, when the actual numerical range, variable relationship, etc. are not covered by the training sample, the prediction result is more likely to be extremely biased, that is, the typical generalization ability is insufficient.(2) Insufficient migration.Regionality is the essential characteristic of rocks in drilling, and the differences between different regions manifest not only as differences in different geological elements but also as differences in the relationships between various elements.As a result, it is often difficult for ML models trained in one region to migrate to other regions for application.Second, the Earth's surface elements and their interrelationships are undergoing constant change, and human activities make them more drastic; therefore, models of different time spans in the same region are often difficult to apply.In addition, insufficient scale migration is another dilemma in drilling applications.(3) Insufficient interpretability.The goal of scientific research is not only to develop a usable model, but also to discover the intrinsic causal relationships and driving patterns between different variables, and use them to explain theories and hypotheses, thereby contributing to the advancement of scientific knowledge 9 .One of the outstanding problems of ML is the lack of interpretability; although it can also obtain relatively high accuracy under specific conditions, lacks the ability to explain the internal physical process.
According to the result of the analysis of the application of ROP prediction models in different articles, the physical models and the ML models have different characteristics for ROP prediction and no single ROP model is suitable for all drilling site conditions.More importantly, most of the above studies have updated physical models or fused ML models to improve accuracy, but few studies have combined physical models and ML models to improve both accuracy and interpretability.On the one hand, the construction methods of the hybrid models are various, and it is uncertain which kind of construction method has more accurate prediction results.On the other hand, there are kinds of physical and ML models in the hybrid models, and it is also unknown which combination of physical and ML models is the best.Therefore, it is hard to satisfy the high-accuracy and great interpretability requirement for ROP prediction by the previous method, and it is necessary to deepen the studies on hybrid models of ROP prediction.
To construct an ROP prediction model for the Halahatang oil field that achieves high prediction accuracy and maintains a certain degree of interpretability, this paper combines the advantages of different physical models and ML models to propose four novel hybrid physics-ML models because there is a growing consensus that solutions to complex science and engineering problems require novel methodologies that are able to integrate traditional physics-based modeling approaches with state-of-the art ML techniques.This paper starts from the operation parameters that can be conveniently controlled by a field driller: WOB, RPM and Q.On the basis of several classical physical models, combined with commonly used ML methods for data training and experiments, the two parameters are combined to construct an ROP prediction model.There are four novel kinds of hybrid modeling approaches designed to reach this goal.The results obtained from this study can be applied in drilling parameter optimization and ROP management of drilling wells to add technical and economic benefits in the future 31 .

Procedure of the hybrid physics-ML ROP modeling
In general, no ROP prediction model can be adapted for all drilling fields.From the previous demonstration, physical models and ML models have different advantages; perhaps the hybrid physics-ML ROP prediction model which combines physical models and ML models can have better accuracy and interpretability.The hybrid ROP prediction model may be more suitable as long as it is rebuilt before it is used in new drilling fields via the hybrid ROP modeling procedure.The procedure is shown in Fig. 1.
On the top of Fig. 1, four stages, which include collecting data from drilled wells, data denoising, feature selection, division of dataset are used for data processing before modeling.In the middle of Fig. 1, the goal of this stage is to select suitable physical and ML models from among the different models through three model performance evaluation metrics.On the bottom of Fig. 1, this study puts forward four novel approaches for hybrid physics-ML modeling, which include residual modeling, integrated coupling, simple average and bagging, maybe in the future, there will be more approaches for hybrid modeling.At the end of the procedure, a suitable model for this field will be selected by comparing the RMSE, MAPE and R 2 .However, perhaps the best hybrid model is not unitary because of the characteristics of different regions despite having the same modeling procedure.

Tarim Basin (Xinjiang) dataset
In this study, comprehensive logging data from an ultra-deep well in the Halahatang oil field of the Tarim Basin in southern Xinjiang were used.The Halahatang area is the main oil-bearing zone in the Tarim Basin, and the Ordovician-rich carbonate-rich oil and gas resources in this area, which are deep unconventional oil and gas resources, are buried at a depth of more than 7000 m (Fig. 2a-c).Due to the drilling depth generally above 7000 m, the lithology is carbonate, and the drilling period is long (Fig. 2d).The basic data of this well is shown in Table 2, the measured data relevant to ROP are composed of formation parameters including dc exponent (DC), normal dc exponent (DCN), formation pressure gradient ( g p ), forma- tion fracture pressure gradient ( G f ), Poisson's ratio ( µ ), drilling parameters containing WOB, RPM, drill fluid flow rate (Q), torque (TORQUE), drilling time (DT), hook load (HL), stand pipe pressure (SPP), well trajectory parameters including measured depth (MD), true vertical depth (TVD), downhole parameters consisting of drill-bit diameter ( d b ), drill-bit footage ( d f ), and drilling fluid parameters containing drilling fluid equivalent density (ECD).
Ten PDC drill bits were used in the well, two sizes, 406.4 mm and 241.3 mm.The entire well spans 10 different strata including the Quaternary, Neogene, Paleogene, Cretaceous, Jurassic, Triassic, Carboniferous, Devonian, Silurian and Ordovician strata.Due to the serious lack of data for the first drill bit and the tenth drill bit in the logging data, only the second to eighth drill bits were selected for experimentation, and the different drill bits drilled into the well section are shown in Table 3: Post drilling data analysis and preprocessing from the preliminary description of the data in Table 2 reveal that due to sensor errors or measurement errors, several zero value points or mutation values need to be processed, to improve the data quality and lay a good foundation for subsequent model fitting.

Data denoising
According to the basic statistics of the dataset in Table 2, the data quality of individual variables is poor, such as the variance in Q is large, there are obvious abnormal peaks and valleys in the distribution of Q, so data noise reduction is needed to improve the quality of data.Different data denoising methods may be suitable for different datasets and three data denoising methods are selected to improve the quality of data including standard deviation, quartile deviation and Savitzky-Golay (SG).
The preprocessing of data was divided into four situations: one was the raw data (as a control group), one was to determine the abnormal data according to a distance of 3 times the standard deviation, one was to preprocess the data according to 1.5 times the quartile difference, and the other was to smooth the data according to the SG algorithm.The SG smoothing filter 40 is used to remove some of the noise in the original dataset.This method applies a polynomial function to reduce noise in data variables, replacing values identified as noise in the data records with values generated by the SG function 23 .Based on the identified least squares error, the nth order polynomial function is derived from the selected drill encounter stratigraphic points.The number of points chosen should be odd and greater than the order of the derived polynomial function.If a higher polynomial order is applied, or if the number of data records fitted within that interval decreases, the derived polynomial function preserves the data trend of the variable.However, a decrease in the order of the polynomials or an increase in the number of data records used to define a particular interval can disrupt the data trend of the variable, resulting in excessive smoothing of the data.
To obtain a good SG filtering effect, a sensitivity analysis of the SG algorithm is carried out, the smoothing effect of the SG algorithm is performed on the data, and the filtering effects of the window length and polynomial order are compared; these two aspects are considered.One is to try to maintain a smoothing effect, that is, to use the standard deviation of the field data and the smoothed data to ensure that the dispersion degree of the data is better, and the other is to compare R 2 to ensure that the smoothed data are as good as possible and that the field data are consistent.The window length was set from 3 to 29 for experimental comparison: As shown in Fig. 3, the best filtering parameters are 17 and 1 because R 2 is close to 1 and the data standard deviation is relatively small, resulting in better data noise reduction.Preprocessing via the SG algorithm, clearly revealed that the data quality is better, and some abnormal points are removed more smoothly, which is more in line with the actual drilling situation.
From Figs. 4, 5, 6, and 7, it can be seen that denoising is necessary to ensure that the impact of various abnormal situations can be removed and the data more in line with the real situation can be obtained.For example, in Fig. 6, Q has a dozen exceptions that were filtered by the SG algorithm so that the overall trend of the Q data is more consistent with the actual drilling conditions.

Data spilt
According to the drilling field, different drilling bits are suitable for different well sections and formations.To avoid the influence of different kinds of bits, the drilling data from the drilled well were divided into nine datasets, namely, the whole well dataset, first bit dataset, second bit dataset, third bit dataset, fourth bit dataset, fifth bit dataset, sixth bit dataset, seventh bit dataset and eighth bit dataset according to the type of bit used; the specific segments are shown in Table 3.
Whether through the regression fitting of physical models or during the training process of ML models, how to divide the training dataset and test dataset has a great impact on the prediction effect.Moreover, during the drilling process in the drilling field, the logging data are acquired segment by segment, and the random spilt method perhaps can't be meet.Therefore, in this study, nine datasets all were spilt with two kinds of method including randomly divided and divided according to the ration before and after.

Feature selection
Table 2 shows that there are approximately 18 kinds of drilling variables, which may be related to the ROP.However, some drilling variables need not be taken into account because they may not be measured directly but may  be calculated twice by some of the measured variables, such as DC and DCN.In addition, some variables, such as MD, TVD and Bit Depth, have the same meaning and effect and simply consider MD.
According to the mechanistic analysis, 14 variables were retained, namely, MD, HL, RPM, DT, ROP, WOB, TORQUE, SPP, Q, ECD, d b , µ , G f and g p .However, to quantitatively analyze the relationship between each variable and the ROP, correlation analysis was carried out.Common methods for correlation analysis include Pearson, Spearman and Kendall correlation analyses, which quantify correlations by correlation coefficient.However, in statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data 41 ; Spearman's rank correlation coefficient is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables) 42 ; and the Kendall rank correlation coefficient is a statistic used to measure the ordinal association between two measured quantities 43 .The three correlation analysis methods generally need to follow a linear or normal distribution or order, and are not suitable for variable correlation related to ROP.
The maximal information coefficient (MIC) 44 , which was proposed in 2011, has two excellent properties: generality and equitability.In other words, the MIC can detect various relationships including linear, non-linear, functional and non-functional relationships.The MIC values of the different types of relationships are similar at the same noise level 45 .Therefore, in this study, the MIC was selected as the correlation analysis method.The  www.nature.com/scientificreports/correlation heat map obtained from the MIC is shown in Fig. 8. Variables with high correlation can be used as references for feature selection and modeling.

Physical models
In 1965, Bingham proposed an early R-W-N form of an ROP prediction model considering the influence of drill bit diameter.An empirical coefficient b is added to the ratio of the drilling pressure to the diameter of the drill bit, and a comprehensive drilling coefficient a is also included, which is calculated and fitted using the field data of each formation 3 .The specific model is as follows: ( where a and b are dimensionless coefficients of experience corresponding to each formation, WOB is the weight on the drilling bits (KN), d b is the diameter of the drill bit (mm), and RPM is the bit revolution per minute (rev/ min).Although both model coefficients (a and b) were determined for the entire rock formation, they could not account for changes in the physical behavior of the well within different operating parameter areas 7 .
In 1967, Eckel studied the influence of the drilling fluid flow rate and hydraulic parameters on drilling speed through the indoor laboratory of micro drill bits, and proposed a drilling speed prediction equation that includes the drilling fluid flow rate on the basis of the drilling speed equation proposed by traditional experiments 4 .
where K is the comprehensive dimensionless coefficient of drilling; a, b and c are dimensionless coefficients; WOB is the drilling pressure; RPM is the turntable speed; Q is the drilling fluid flow rate (L/s); ρ is the specific gravity of the drilling fluid (dimensionless density); d is the nozzle diameter; µ is the drilling fluid viscosity; and k is the dimensionless coefficient fitted by the microdrill laboratory and field data.
Although Eq. ( 2) describes the relationship between the properties of the drilling fluid and the ROP to a certain extent, in actual drilling operations, the density and viscosity of the drilling fluid are designed in advance and generally do not change during the actual drilling process.The operating parameters that are convenient for driller adjustment and control on site are the WOB, RPM and Q. Considering the actual working conditions, Eckel's drilling speed equation is simplified, and a simplified Eckel drilling speed prediction equation is proposed, as shown in Eq. ( 3).
In 1974, Bourgoyne and Young 6 proposed a broad ROP model that included all common drilling parameters.Subsequently, in 1986, the newly proposed modified Bourgoyne and Young formula eliminated normalization factors and constant parameters 46 , simplifying the model and making it dependent on the measurement of core variables in real-time drilling optimization: Although Bourgoyne and Young's mechanical ROP model provides a comprehensive description of the drilling process, many of the parameters used in the model are difficult or impossible to measure in real time with prior art and must be approximated, such as the pore pressure gradient and drill bit wear.In addition, the model relies on normalization constants for drill bit depth and the WOB, RPM and Q terms, first derived from the 1970s.Nascimento 47 and Kutas 48  (3) Vol:.( 1234567890)

Artificial neural network
In 1997, Bilgesu introduced neural networks to train a new ROP prediction model, which was well applied in real situations 11 .The structure of ANN is shown in Fig. 9.

Support vector machine
In 2015, Bodaghi proposed an "optimized support vector machine regression" method for ROP prediction 18 .
Optimizing the parameters of the support vector machine by optimizing the parameters of the cuckoo search algorithm has a higher ROP prediction accuracy than the genetic algorithm and the pattern search and grid search algorithms.The largest edge hyperplane is based entirely on the data point located at the edge.These two points, equidistant from the hyperplane, are called support vectors. Figure 10 shows an ideal example of detecting drilling speed anomalies.Given the additional drilling parameters and actual site conditions, it is unreasonable to expect linearly separable data to distinguish between efficient drilling and high vibration, inadequate hole clearing, or drill bit ladle operation areas.Support vector machines overcome this problem by allowing some data points to violate margins.

Random forest
In 2015, Hegde used limited ground measurements to predict ROP using Trees, Bagging, and Random Forest, respectively, and random forests provided the best accuracy for the data used; therefore, real-time, closed-loop applications were used 19 .Different regression algorithms, including Least Squares Regression, Ridge and Lasso Regression, and Principal Components Regression, have also been used to predict ROP incidence.The random forest algorithm is a Bagging ensemble algorithm based on decision trees 49 .The structure of the random forest algorithm is shown in Fig. 11.The calculation relies mainly on the construction of multiple decision trees, and the average value of each decision tree is taken as the final prediction result.The original sample sampling of each decision tree is random, and the sampling processes are independent of each other.The specific training process of the random forest regression algorithm is shown in Fig. 11.
(1) Randomly put back the sampling to construct the original samples X and Y of the decision tree, and construct the root node of the decision tree.

Model performance evaluation metrics
Whether it is a classical physical model or an ML model, it is essentially a fitting of the real drilling law, and the effect of its fitting needs to be evaluated and measured by certain indicators.Common model performance evaluation indicators are selected: the root mean square error (RMSE), average absolute percentage error (MAPE) and coefficient of determination (R 2 ).The RMSE is the square root of the square of the deviation between the predicted value and the true value and the ratio of n observations; this metric measures the deviation between the predicted value and the true value, and is sensitive to outliers in the data.where N is the number of samples, ROP Field,i represents the actual drilling speed at the site and ROP Model,i rep- resents the drilling speed predicted by the drilling speed model.The MAPE, which measures the relative error of a predicted value to a true value, is one of the most popular metrics for evaluating forecast performance, as defined below: The R 2 evaluates the fitting performance of the regression model.When R 2 is close to 1, the model has good prediction accuracy.

Hyperparameter selection in ML models
The hyperparameters control the structure of ML models and determine their performances, which are parameters that are not directly learnt within training process.There are no definite rules for hyperparameter selection, as optimal model structure varies by application.Each ML model has multiple different hyperparameters.For instance, the hyperparameters of the RF include the number of trees in the forest (n_estimators), the function to measure the quality of a split, the number of features to consider when looking for the best split (max_features) and the minimum number of samples required to be at a leaf node (min_samples_leaf).The number of trees in the forest and the minimum number of samples required to be at a leaf node are closely related to the training accuracy of the RF model.The more n_estimators and min_samples_leaf, the more accurate the prediction performance is; however, the training costs of the RF increase.Therefore, it is necessary to balance the relationship between training accuracy and training costs.Researchers typically define a grid and search for the best hyperparameter combinations with cross-validation.The same methodology was applied in this study, and it was implemented using the sklearn.model_selection.GridSearchCV function of Python's scikit-learn 50 .
After referring to the hyperparameters in Sepideh 51 and several cross-validations, the optimal hyperparameters of those ML models selected are shown in Table 4.To provide a more specific explanation of the process of hyperparameter selection, the optimization process of two RF hyperparameters, n_estimators and max_features, is illustrated in Fig. 12.As shown in Fig. 12, the mean_test_score of max_features = None(1.0)is the highest among the three max_features, and the mean_test_score is the average test core obtained from fivefold cross validation so that overfitting can be avoid in the process of hyperparameter selection.Among the n_estimators values, n_estimators = 14 had the highest mean_test_score, and max_features = None (1.0) and n_estimators = 140 were ultimately selected as the optimized hyperparameters.In this way, two hyperparameters are selected, and the hyperparameters of the other ML models are also determined in accordance with this process.

Hybrid physics-ML models
The purpose of ROP prediction is to better guide the optimization of drilling parameters, and three physical data models including WOB, RPM and Q are selected from the actual application situation on site, namely, the physical model proposed by Bingham in 1965 (Eq.1), the modified physical model proposed by Eckel in 1967  (Eq. 3) and the physical model proposed by Soares in 2019 (Eq.7).With a certain amount of logging data, the optimize method in Python's SciPy library is used for multivariate nonlinear fitting to obtain the empirical coefficients of the three physical models.4. Hyperparameters of ML models.

ANN
The ith element represents the number of neurons in the ith hidden layer (1000, 500)

Activation function for the hidden layer relu
The solver for weight optimization adam

Residual modeling
The residual approach may be the oldest and most common approach for directly addressing the imperfection of physics-based models in the scientific community; an ML model (usually linear regression) learns to predict the errors or residuals made by a physics-based model 52 .The structure of the residual model is shown in Fig. 13.First, drilling data are used to train the physics model with a regression method to obtain the empirical coefficients of the physics model.Then, the physics model can obtain the ROP phy when the drilling data are the input of the physics model.Finaly, the drilling data and ROP phy are used as the inputs of the ML model which is subsequently trained on a hybrid residual model.

Integrated coupling
Different physical ROP models have different emphases, and the ML model can combine their different advantages when the output of the physical model is used as input to the ML model.A detailed description of this approach of hybrid modeling is shown in Fig. 14.First, the drilling data are used for three physical model regression fitting.Then, the output from Bingham model, Eckel model and Soares model are recognized as inputs to the ML model to train the ML model, which may yield better prediction than individual physics models.

Simple average
Physical models have better generalization performance and interpretability than ML models, but most of ML models have better accuracy when enough good data are available for training ML models.When predictions from both physics and the ML model are combined, the prediction of the ROP may improve.In this study, the simple average is the combined approach, which is shown in Fig. 15.First, the physics model and ML model are regression fit and trained alone to obtain predictions from both the physics model and the ML model.Then, the average of ROP phy and ROP ML is recognized as the final prediction of the ROP from the hybrid model.

Bagging
Ensemble learning works by building and combining multiple learners to accomplish learning tasks; this approach is sometimes referred to as multi-classifier system or as committee-based learning 53 .By combining multiple learners, ensemble learning often achieves significantly better generalization performance than does single learner learning by creating and selecting good and different learners.According to the methods used to generate individual learners, the current ensemble learning methods can be roughly divided into two categories: serialization methods with strong dependency between individual learners, which must be generated serially; and parallel methods with no strong dependency between individual learners, which can be generated simultaneously.
The former is represented by Boosting and the latter is represented by Bagging and Random Forest.Because the physics prediction ROP models are not strongly dependent, this study selects Bagging as the ensemble learning method and different physics models are recognized as individual learners.
Bagging is a type of parallel ensemble learning 54 , that is directly based on bootstrap sampling 55 .The specific process is shown in Fig. 16.First, the drilling data are divided into three training sets for physics models through bootstrap sampling and every training set accounts for approximately 63.6% of the drilling data without

Results and discussion
In this study, the Bingham, Eckel, and Soares models were selected as physical models for ROP prediction.Among the ML models, ANN, SVM and RF were also selected for training and testing.Three evaluation metrics, RMSE, MAPE and R 2 , were used to comprehensively compare the performances of these models, and the optimal physical and ML models were selected to establish hybrid models.Four categories of hybrid models, namely, residual modeling, integrated coupling, simple average and Bagging, were established and compared with the optimal physical and ML models.To further investigate the influence of different models on hybrid models, physical models and ML models were used to establish different hybrid models.In total, 34 kinds of drilling speed prediction models were used for a total of 2448 groups of experiments.

Comparison of different physical models
In this section, the performances of three physical models, namely, the Bingham, Eckel and Soares models, are compared in detail, and the optimal physical model is selected.First, the coefficients in these physical models must be obtained.The Trust Region Reflective (TRF) algorithm, is motivated by the process of solving a system of equations that constitute the first-order optimality condition for a bound-constrained minimization problem, as formulated in STIR 56 .This algorithm is particularly suitable for large sparse problems with bounds and works quite robustly in unbounded and bounded problems; thus, it is chosen as the selected algorithm for solving the non-linear least squares problem to obtain the empirical coefficients contained in the physical models.The performances of these three ROP prediction models are compared, and the results are shown in Figs. 17 through 18.To better reflect that the empirical coefficients obtained by the TRF algorithm can accurately predict ROP; the training set and test set are randomly divided at a ratio of 4:1, and the comparison between the ROP predicted by the three physical models and the original ROP is shown in Fig. 17. Figure 17 shows that the prediction results of the named Soares model fit the actual ROP curve more closely, confirming that this model is more suitable for this field.The R 2 of the amended Soares model (0.7900) is larger than that of the other physical models, and its RMSE (9.0874) and MAPE (0.4190) are both smaller than those of the other physical models, indicating that the prediction accuracy of the amended Soares model is significantly better (Fig. 18 and Table 5).Therefore, this model is selected as the physical part of the hybrid part of the hybrid model for ROP prediction.

Comparison of different ML models
In practice, the ROP performances of different ML models are quite different for different regions, and there is no a unified model suitable for all situations.In this section, the performances of three ML algorithms, ANN, SVM and RF, were compared, and the optimal ML algorithm for the Tarim Basin was selected.
The training results of the ML models are shown in Figs.19 and 20.Due to its powerful learning capability, the prediction result of the RF model matches the original value well as shown in Fig. 19.Moreover, the training curve of the SVM also achieved a good fit.The R 2 of the SVM prediction result is 0.7929, and the R 2 of the RF prediction result is 0.9938 (Fig. 20 and Table 6).
What's more, the RMSE and MAPE values of RF are also better than those of ANN and SVM; thus, the RF model outperforms than the other two models, and it is selected as the ML part of the hybrid model.
As shown in Fig. 19 and Table 6, the RF model has a high level of ROP prediction accuracy.However, it is well known that ML models are prone to overfitting, and to avoid overfitting, this study adopts the k-fold cross-validation (CV) method to avoid overfitting and ensure generalization.With respect to the tenfold CV used in this study, the entire dataset was split into 10 smaller sets.The following procedure is followed for each of the k "folds", as shown in Fig. 21: A model is trained using k − 1 of the folds as training data; The result model is validated on the remaining part of the data (it is used as a test to compute a performance measure such as accuracy).
The performance measure reported by k-fold CV is then the average of the values computed in the loop.It was repeated ten times until each fold was used as the validation set.A tenfold CV was performed for all three ML models, and the specific results are shown in Fig. 22.
The performance of the ML models during ten-fold CV was assessed by calculating the train score and test score, and the score represents the R 2 .According to the similar score of each fold in Fig. 22, these models can obtain similar results on different validation sets.These models are verified to have good generalization.Furthermore, as shown in Fig. 22, there was not much difference between the train score and the test score for each cross-validation, especially for RF.It was verified that these models were not overfitted.The fluctuations in the train score and test score in the SVM and ANN models may be caused by the two models being more sensitive to fluctuations in the data.There may be different distributions between the test data and the training data, and the model may be better adapted to the distribution of the test data, resulting in higher test accuracy.As shown in Fig. 23, Soares, Second and Fourth cannot obtain very good accurate results compared with the other models.However, when RF error compensation was introduced, the prediction results of the first hybrid model fit the actual ROP curve more closely in most well sections.According to Fig. 24 and Table 7, the prediction accuracy of the first hybrid model (R 2 = 0.9936) is better than that of the RF model (R 2 = 0.9934).Moreover, the prediction accuracy is also greatly improved compared with that of the Soares model.The RMSE and MAPE also exhibited better performance than did the other five models.8), indicating that the prediction accuracies of the Fourth model are higher.

Comparison of different hybrid models
Based on the above analyses, it can be concluded that the best hybrid model with high prediction accuracy is obtained only when the optimal physical and ML models are selected, which can provide important guidance for future ROP prediction modeling.

Comparison of the effects of different dataset sizes on models
In the drilling field, the ROP will change with rounding trip operations or when connections are made; consequently, the ROP prediction model may not reach enough accuracy when the dataset falls short of more than 5000 records in the above experiment.To study the effects of different dataset sizes on ROP models, an additional eight sets of tests were added to compare the full interval data according to the drilling depth of each bit (Table 3).The prediction results are shown in Figs. 26 and 27.
As shown in Fig. 27, four different kinds of ROP models including the Soares model, the RF model, the First hybrid model and the Second hybrid model, were selected for comparison.According to Figs. 27, 28, and 29, the RF model and the First hybrid model, which are suitable for ROP prediction in the field, are less sensitive to changes in dataset size, but the accuracy of the Soares and the Second model is strongly affected by the dataset size.It is also proven that the First hybrid model has better generalization performance and accuracy and is more suitable for field application.

Comparison of the effects of different data denoising methods on the model
In the experiments in the above section, most of them used the whole well data after noise reduction by the SG algorithm to conduct the experiments.To compare the advantages and disadvantages of the SG algorithm and the effects of data denoising methods on the model, common standard deviation and quartile deviation were also used to conduct control experiments with the original data.The prediction results are shown in Fig. 30.
As shown in Fig. 30, from the overall trend, in the case of the same model prediction, the R2 is the highest, and the RMSE and MAPE are the lowest after SG noise reduction, except that the Second hybrid model is not suitable for the SG algorithm.In other words, the ROP prediction model should be retrained before it is used for new drilling areas, not only for physical models, ML models and hybrid models but also the data denoising methods and dataset sizes.In this study, a logging dataset was collected after real drilling, and the data of all the well segments were obtained.After the data noise reduction processing, the training set and the test set are randomly divided to obtain a better prediction effect.Moreover, in the real-time drilling process, the ROP prediction model can be trained only by drilled data, which include tens, hundreds or even thousands of meters of data that will be drilled in the next time.Perhaps this situation is similar to the division according to the front-to-back ratio, and cannot obtain a better drilling rate prediction effect like random division.How to ensure that the ROP prediction

Further discussion on hybrid models for ROP prediction
Judging from the above accuracy of the ROP predictions of the different models, the hybrid model has advantages in terms of prediction accuracy and interpretability, but it also has several limitations, such as the computational complexity of the hybrid models being greater than that of the single model.The computational complexity of hybrid models can be divided into p steps 57 , each corresponding to the training of a model (M 0 , M 1 , …, M p ).The evaluated hybrid models are trained sequentially, and their computational complexity can be described as M 0 + M 1 + ••• + M p , where M p is the computational complexity in ROP prediction.In this way, the four hybrid models are approximately two times more expensive than the single model, because the hybrid models use at least two kinds of models, including physical models and ML models.This work applied six combinations of hybrid models, Bingham, Eckel, Soares or ANN, SVM and RF models.The three physical models were trained and fitted using the curve_fit function in scipy.optimize 58, whose core method is non-linear least squares method to fit the ROM model.The computational complexity of it depends on the size and complexity problem, as well as the number of iterations required.Therefore, in this study, it is approximately assumed that the computational complexity is O(m 2 n), where n are the training examples and m is the total number of features/variables.The ANN training process has a complexity of O(emnk), where e is the number of epochs and k is the number of neurons 59 .The SVM training process has a complexity of O(nm), where n is the size of the dataset, and m represents the number of input features 60 .The RF training process has a complexity of O(Mmnlog(n)), where M is the number of trees, m is the number of features and n is the number of data samples in the training set 61 .
In the previous section, through ten-fold CV experiment, it was proven that the generalization of the model is not problematic in the Halahatang oil field.However, further in-depth research needs to be conducted in additional regions to further optimize the model and study generalizability.

Conclusions
Through the above research and analysis, the following conclusions can be drawn: 1.The Soares model considering MD and Q was proven to be the most accurate for the Tarim Basin field among all physical models such as the Bingham and Eckel models.The R 2 score of the Soares model is approximately 3 times greater than that of the Bingham model and 11.9% greater than that of the Eckel model.2. Three ML algorithms were used to create machine learning models.Compared with these machine learning models, the RF model yielded the best results, with the highest correlation coefficient (R 2 = 0.9934) and lowest prediction error (MAPE = 0.0501 and RMSE = 1.6072).3. Four hybrid models, including residual modeling, integrate couple, simple average and bagging, were established.Most hybrid models showed higher accuracy than physical models and greater interpretability than conventional ML models.4. The performance of the hybrid model with error compensation by ML was optimal among all the ROP prediction models.The R 2 score improved greatly, specifically by approximately 25.77% in comparison with that of the Soares model.Moreover, the best hybrid model can be achieved when both physical and ML models with the best performances were selected for hybrid modeling.5.The idea of a hybrid model with residual modeling and combining predictions from both physical and ML models are recommended for different drilling operations due to its clear physical meaning, awesome generalization capability, low modeling difficulty, and good interpretability.The RF algorithm is also recommended when using artificial intelligence because of its outstanding predictive accuracy.6.The RF model and First hybrid model are relatively insensitive to dataset size compared with the other ROP models.7. Most ROP models achieve better performance after denoising with the SG algorithm than after denoising with Stand deviation or Quartile deviation, a suitable denoising reduction method should be selected through experimentation and analysis.8. Dividing the training set and test set according to the ration before and back, the accuracy of the trained model is far less than that of the random division model.To employ the ROP model in real-time drilling, how to make the ROP prediction model accurately predict the ROP of the adjacent drilling interval on the basis of the drilled interval data may be the focus of further research.9. Further studies can focus on the optimizing drilling parameters to increase the ROP.In addition, in this study, the selected ML algorithms were mostly common ML methods, that can be combined with physical models by deep learning, transfer learning, reinforcement learning, large model and other algorithms. https://doi.org/10.1038/s41598-024-56640-ywww.nature.com/scientificreports/

Figure 1 .
Figure 1.Procedure of the hybrid ROP modeling.

Figure 2 .
Figure 2. Generalized geologic setting of the Halahatang region 39 .(a) Location of the Tarim Basin, China; (b) location of the study area in the northern Tarim Basin; (c) structural map of the Halahatang oil field; (d) and comprehensive stratigraphic column for the northern Tarim Basin.

Figure 4 .
Figure 4. Comparison of the SG denoised and original WOB.

Figure 5 .
Figure 5.Comparison of the SG denoised and original RPM.

Figure 6 .
Figure 6.Comparison of the SG denoised and original Q.

Figure 7 .
Figure 7.Comparison of the SG denoised and original ROP.

Figure 8 .
Figure 8. Maximal information coefficient results of the drilling data reveal the correlations between the variables: the larger the coefficient is, the greater the correlation is; the smaller the coefficient is, the smaller the correlation is.
exposed this problem by reporting Bourgoyne and Young model applications by different authors with different normalization factors and proposing new values.In 2019, Cesar Soares proposed a new drilling speed model 7 based on real-time drilling speed prediction and the problems of Bourgoyne and Young.The validity of the coefficients can be guaranteed only if the model predicts the exact same bit, drilling fluid, formation and mechanical drilling speed under similar operating conditions.Therefore, in a certain range of drill bits or mud, the same coefficient values are not applicable, and one model coefficient can absorb the influence of all constant parameters.The newly proposed modified Bourgoyne(2) ROP = K * WOB a RPM b kQρ dµ c 303a 2 (10000−D) f 3 = e 2.303a 3 D 0.69 (gp−9.0)

Figure 11 .
Figure 11.The structure of RF.

1 RF 2 Vol
type to be used in the algorithm rbf Degree of the polynomial kernel function 3 Kernel coefficient for 'rbf ' , 'poly' and 'sigmoid' scale Regularization parameter 50.0 Epsilon in the epsilon-SVR model 0.The number of trees in the forest 140The function to measure the quality of a split SEThe number of features to consider when looking for best split 1.0The minimum number of samples required to be at a leaf node 14:5957 | https://doi.org/10.1038/s41598-024-56640-ywww.nature.com/scientificreports/Three common ML algorithms were selected-ANN, SVM, and RF as the learning model.This study proposes four approaches of hybrid physics-ML modeling.

Figure 12 .
Figure 12.The process of grid search CV.

Figure 13 .
Figure 13.The structure of residual modeling.

Figure 14 .
Figure 14.The structure of integrated coupling.

Figure 15 .
Figure 15.The structure of simple average.

Figure 16 .
Figure 16.The structure of Bagging.

Figure 17 .
Figure 17.Prediction results of the physical models.

Figure 18 .
Figure 18.Proportionally divide the impact of different data preprocessing methods.

Figure 20 .
Figure 20.Evaluation indexes of ML model results.

Figure 21 .
Figure 21.The procedure of tenfold CV.

Figure 22 .
Figure 22.The result for tenfold CV.

Figure 23 .
Figure 23.Prediction results of physical, ML and hybrid models.
To further study the influence of different physical and ML models on the performance of the first hybrid model, the two best physical models (the Soares and Eckel model) and the two best ML models (SVM and RF) were selected for cross-mixing.The first model was established with the Eckel model as the ROP prediction model and the SVM as the error compensation model, and the second model was established with the Eckel model as the prediction model and the RF as the error compensation model.The third model takes the Soares model as prediction model combined with the SVM as the error compensation model, and the last model takes the Soares model as the prediction model and uses the RF as the error compensation model.The prediction results are shown in Figs. 25 and 26.As shown in Fig.25, all four models achieved outstanding performances, thus verifying the effectiveness of the first hybrid model.The best ROP prediction model was selected by comparing the performance indices of all the models.The R 2 values of the different models are close, but the Fourth model is slightly better than the other models.Moreover, the RMSE and MAPE values of the Fourth model are significantly lower than those of the other three models (Fig.25 and Table

Figure 24 .
Figure 24.Evaluation indexes of the physical, ML and hybrid models.

Figure 25 .
Figure 25.Prediction results of different hybrid models.

Figure 26 .
Figure 26.Evaluation indexes of different hybrid models.

Figure 27 .
Figure 27.Prediction accuracy (R 2 ) of different models with different dataset sizes.

Figure 28 .
Figure 28.MAPE of different models with different dataset sizes.

Figure 29 .
Figure 29.RMSE of different models with different dataset sizes.

Figure 31 .
Figure 31.Evaluation indexes of proportion and random.

Table 3 .
Different drill bits are drilled into the well section.

Table 5 .
Evaluation indices of the physical model results.Prediction results of ML models.

Table 6 .
Evaluation indexes of ML model results.

Table 7 .
Evaluation indices of physical, ML, and hybrid models.

Table 8 .
Evaluation indexed of different hybrid models.