Real-time prediction of Poisson’s ratio from drilling parameters using machine learning tools

Rock elastic properties such as Poisson’s ratio influence wellbore stability, in-situ stress estimation, drilling performance, and hydraulic fracturing design. Conventionally, Poisson’s ratio is either measured in laboratory experiments or derived from sonic logs; the main concerns with these methods are the availability of data and samples, cost, and time consumption. In this paper, an alternative real-time technique utilizing drilling parameters and machine learning is presented. The main added value of this approach is that the drilling parameters are more likely to be available and can be collected in real time during the drilling operation without additional cost. These parameters include weight on bit, penetration rate, pump rate, standpipe pressure, and torque. Two machine learning algorithms were used: artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS). To train and test the models, 2905 data points from one well were used, while 2912 data points from a different well were used for model validation. The lithology of both wells contains carbonate, sandstone, and shale. The tuning parameters of each algorithm were optimized to ensure that the best prediction was achieved. A good match between the actual and predicted Poisson’s ratio was achieved with both methods, with correlation coefficients between 0.98 and 0.99 using ANN and between 0.97 and 0.98 using ANFIS. The average absolute percentage error values were between 1 and 2% for the ANN predictions and around 2% when ANFIS was used. Based on these results, combining drilling data with machine learning is a strong tool for real-time prediction of geomechanical properties without additional cost.

Rock elasticity is a major identifier of rock mechanical properties and reflects the ability of the rock to recover from a deformation caused by external forces. Two main properties are used to define rock elasticity: Young's modulus and Poisson's ratio. These geomechanical properties describe the relationship between the applied forces and the resulting deformation 1 . Young's modulus (E) is a measure of stiffness and is defined as the ratio of stress to strain, while Poisson's ratio (ν) is the ratio between lateral and longitudinal strain (ε). Rock elastic properties influence hydraulic fracturing design, drilling performance, in-situ stress estimation, and wellbore stability [2][3][4][5] .
There are two options for estimating Poisson's ratio: core samples or well logs. The Poisson's ratio determined by compression tests on core plug samples is called the static Poisson's ratio, while the dynamic Poisson's ratio is derived from shear and compressional acoustic wave velocity logs 6 using the following equation:
$$\nu_{dyn} = \frac{V_P^{2} - 2V_S^{2}}{2\left(V_P^{2} - V_S^{2}\right)} \qquad (1)$$
where ν dyn is the dynamic Poisson's ratio, and V S and V P are the shear and compressional wave velocities respectively.
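As a worked illustration of the dynamic Poisson's ratio calculation, the following minimal sketch assumes the standard isotropic elastic relation ν dyn = (V P² − 2 V S²) / (2 (V P² − V S²)). The function name and the velocity values are hypothetical and are not taken from the wells in this study.

```python
def dynamic_poissons_ratio(vp: float, vs: float) -> float:
    """Dynamic Poisson's ratio from compressional (vp) and shear (vs) velocities.

    Both velocities must use the same units (for example m/s).
    Assumes the standard isotropic elastic relation.
    """
    if vp <= 0 or vs <= 0:
        raise ValueError("velocities must be positive")
    if vs >= vp:
        raise ValueError("shear velocity must be lower than compressional velocity")
    vp2, vs2 = vp * vp, vs * vs
    return (vp2 - 2.0 * vs2) / (2.0 * (vp2 - vs2))


# Illustrative velocities only (Vp/Vs = 2), not field data
print(round(dynamic_poissons_ratio(4000.0, 2000.0), 4))  # 0.3333
```

Applied sample by sample to a sonic log, the same function would yield the continuous ν dyn profile discussed in the text.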
The advantage of ν dyn over ν static is that it provides a continuous profile. In addition, obtaining core samples is expensive and time-consuming. Because static and dynamic values of Poisson's ratio usually differ from each other, many researchers have presented empirical correlations between static and dynamic Poisson's ratio based on linear regression [7][8][9] . However, some of these correlations were developed using a limited number of samples and for a specific type of formation, as summarized in Table 1.
In Table 1, ν st is the static Poisson's ratio, ν dyn is the dynamic Poisson's ratio, and V p and V s are the compressional and shear wave velocities respectively.
Artificial intelligence (AI) has a wide range of engineering, medical and industrial applications [10][11][12] . The use of machine learning in the oil industry is growing fast in various sectors. These applications include, but are not limited to, estimation and optimization of drilling parameters [13][14][15][16][17][18] , drilling fluid properties [19][20][21] , reservoir fluid properties [22][23][24][25][26][27] , petrophysical properties [28][29][30][31][32] , and geomechanical properties [33][34][35][36] . Several models relating static and dynamic Poisson's ratio have been developed using machine learning methods such as artificial neural network (ANN), Fuzzy Logic (FL), Functional Network (FN), and Alternating Conditional Expectation (ACE), as presented in Table 2. Even though these models give good correlations between predicted and actual static Poisson's ratio, they still require the shear and compressional velocities, which are used to estimate the dynamic Poisson's ratio and may not always be available. Therefore, correlating Poisson's ratio with drilling parameters, which are available from the first encounter with the well, would be extremely beneficial. Moreover, successful applications of drilling data to obtain information that usually requires logs, namely bulk density and sonic velocity logs, have been reported 44,45 . Furthermore, the use of drilling data in the estimation of formation pressure and the detection of abnormal pressure zones is an old technique 46,47 .
The approach presented in this paper is based on the idea that drilling data are always available and are easier and earlier to obtain than conventional well logs and core samples. The use of drilling parameters for real-time estimation of Poisson's ratio using different AI techniques is investigated and presented in this paper.

Data and methods
To predict Poisson's ratio from the drilling parameters, the following steps were followed. Data for drilling parameters and Poisson's ratio were gathered for two wells. Data from one well were used to build the model using several machine learning techniques. The dataset from the second well was hidden from the algorithms and used later to validate the built model. Figure 1 summarizes the methodology used for Poisson's ratio prediction.
Data description. The data for this study were gathered from drilling activities in the Middle East. The data cover the drilling parameters and the corresponding Poisson's ratio values recorded while drilling the 12.25″ intermediate hole section of vertical wells. As shown in Fig. 2, the complex lithology of the formations drilled in Well-1 covers three formation types (shale, sandstone, and carbonate rocks).
Well-1 has a total of 2905 data points used to build the model, with 70% of the data points used for training and 30% for testing the model. The 2912 data points from well-2 were hidden from the AI algorithms and used later to validate the built model. Besides Poisson's ratio (PR), which is set as the target output, each data point contains six drilling parameters used as inputs. The drilling parameters, listed below, were obtained from field measurements and used in building this model:
Data analysis. Before feeding the data into the machine learning algorithms, the datasets were cleaned of noise and outliers using Matlab code. Statistical analysis of the dataset used to build the models is presented in Table 3. The correlation coefficients between PR and the different drilling parameters are given in Fig. 3. It shows relatively strong correlations between PR and some drilling parameters such as WOB, torque, and pump flow rate. Lower correlation coefficients for other parameters do not necessarily imply the absence of a relationship between these inputs and PR; rather, they mean that a linear equation does not describe the relationship between these inputs and the output.
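The point that a low linear correlation coefficient does not rule out a relationship can be illustrated with a minimal sketch on synthetic numbers (not the well data). The `pearson_r` helper is written here for illustration and assumes the usual Pearson definition.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic input, symmetric around zero
x = [i / 10 for i in range(-50, 51)]
y_linear = [2 * v + 1 for v in x]    # exact linear dependence on x
y_quadratic = [v ** 2 for v in x]    # exact, but nonlinear, dependence on x

print(round(pearson_r(x, y_linear), 3))          # 1.0
print(round(abs(pearson_r(x, y_quadratic)), 3))  # 0.0
```

The quadratic output is fully determined by the input, yet its linear correlation coefficient is essentially zero, which is why inputs with low coefficients can still carry information for a nonlinear model.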

Machine learning algorithms.
To construct the models relating Poisson's ratio to the drilling parameters, two machine learning methods were used separately: artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS). ANN is a very common machine learning tool that is inspired by biological neurons in the brain 48 . ANN can function as supervised or unsupervised machine learning in regression, classification, and clustering problems 49 . ANN is composed of different components such as neurons, transfer functions, training functions, learning functions, and hidden layers 37 . In the literature, there are many reported successful applications of ANN in the oil and gas industry 32,35,36,50,51 .
Adaptive neuro-fuzzy inference system (ANFIS) was developed in the 1990s and integrates the principles of neural networks and fuzzy logic (FL) 52,53 . In this method, ANN is used to set the fuzzy rules in FL 54 . This integration of the two methods provides improved performance 55 . Similar to ANN, ANFIS has various reported applications in the oil industry 56-59 .
Models evaluation. ANN and ANFIS were used for model construction. These algorithms use 70% of the dataset from well-1 to build the model and 30% of the data to test it internally over several iterations, and then choose the best fit. Once the model was obtained, data from well-2 were used as an external validation set. To evaluate all model trials, two statistical parameters were used: correlation coefficient (R) and average absolute percentage error (AAPE). R and AAPE are calculated using Eqs. (2) and (3):
$$R = \frac{N\sum \nu_{given}\,\nu_{Predicted} - \sum \nu_{given}\sum \nu_{Predicted}}{\sqrt{\left[N\sum \nu_{given}^{2} - \left(\sum \nu_{given}\right)^{2}\right]\left[N\sum \nu_{Predicted}^{2} - \left(\sum \nu_{Predicted}\right)^{2}\right]}} \qquad (2)$$
$$AAPE = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{\nu_{given,i} - \nu_{Predicted,i}}{\nu_{given,i}}\right| \qquad (3)$$
where ν given and ν Predicted are the available and the predicted Poisson's ratio respectively, and N is the total number of data points.
Figure 1. Flow chart for the methodology used to generate AI-model.
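The two evaluation metrics can be sketched in a few lines, assuming R is the Pearson correlation coefficient and AAPE is the mean of the absolute relative errors expressed in percent. The function names and the sample values are hypothetical and are not results from this study.

```python
import math


def correlation_coefficient(given, predicted):
    """Pearson correlation coefficient (R) between given and predicted values."""
    n = len(given)
    sx, sy = sum(given), sum(predicted)
    sxy = sum(g * p for g, p in zip(given, predicted))
    sxx = sum(g * g for g in given)
    syy = sum(p * p for p in predicted)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def aape(given, predicted):
    """Average absolute percentage error, in percent."""
    n = len(given)
    return 100.0 / n * sum(abs((g - p) / g) for g, p in zip(given, predicted))


# Made-up Poisson's ratio values, for illustration only
nu_given = [0.25, 0.30, 0.20, 0.28]
nu_predicted = [0.26, 0.29, 0.21, 0.28]
print(correlation_coefficient(nu_given, nu_predicted))
print(aape(nu_given, nu_predicted))
```

Because AAPE divides by the given value, it is only meaningful when no given Poisson's ratio is zero, which holds for typical rock values.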
Sensitivity and optimization. Multiple runs were performed with each method to determine the best tuning parameters of the algorithms. This was done by running the two machine learning methods inside nested for-loops covering the range of tested parameters. In the ANN models, different numbers of neurons, network functions, training functions, and transfer functions were used. In ANFIS, different cluster radii and numbers of iterations were used. Table 4 shows the full range of parameters used to obtain the best models.
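The nested for-loop search over tuning parameters can be sketched as an exhaustive grid search. This is a minimal sketch, not the authors' Matlab code: the grid values, the `mock_evaluate` scorer, and its numbers are all hypothetical stand-ins for training a model and returning its testing correlation coefficient.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Evaluate every parameter combination and keep the highest score.

    itertools.product is equivalent to one nested for-loop per parameter.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        results.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, results


# Hypothetical grid; the ranges actually tested are those of Table 4
param_grid = {
    "neurons": [5, 10, 15, 20, 25, 30],
    "transfer_function": ["tansig", "logsig", "purelin"],
}


def mock_evaluate(params):
    """Stand-in for 'train the model and return its score' (arbitrary numbers)."""
    penalty = {"tansig": 0.0, "logsig": 0.1, "purelin": 0.3}[params["transfer_function"]]
    return 1.0 - penalty - 0.01 * abs(params["neurons"] - 20)


best_params, best_score, results = grid_search(param_grid, mock_evaluate)
print(best_params, len(results))
```

In practice `evaluate` would wrap the training call, and recording every (parameters, score) pair is what makes sensitivity plots per parameter possible afterwards.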

Results and discussion
Avoiding overfitting. Overfitting is a very troublesome problem in machine learning, in which the model fits the training data very well but performs poorly in validation and testing. Overfitting results in a model that is limited to the training data and cannot be generalized to data from different sources. In this work, overfitting was addressed in several ways. In machine learning, when the number of parameters used to optimize the fit, such as weights and biases, is too large compared to the number of data points, the chances of overfitting increase. As indicated in the data description section, more than 2000 data points were used to train the model, which is a relatively high number. This quantity of data helped to improve model generalization. Moreover, the models were built to be as simple as possible. For instance, in ANN one layer of neurons was used, and the number of neurons was chosen to be as small as possible without significantly affecting the fitting performance.
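The comparison between the number of fitted parameters and the number of training points can be made concrete with a short calculation. This is a rough sketch that assumes a fully connected network with one hidden layer, six inputs and one output; the choice of 25 hidden neurons is illustrative only and is not stated here as the final model size.

```python
def n_parameters(n_inputs, n_hidden, n_outputs=1):
    """Count weights plus biases of a fully connected single-hidden-layer network."""
    hidden_layer = (n_inputs + 1) * n_hidden    # input weights + one bias per hidden neuron
    output_layer = (n_hidden + 1) * n_outputs   # hidden weights + one bias per output
    return hidden_layer + output_layer


n_train = 2905 * 70 // 100                      # 70% of the well-1 data points
params = n_parameters(n_inputs=6, n_hidden=25)  # 25 neurons: illustrative assumption
print(n_train, params, round(n_train / params, 1))  # 2033 201 10.1
```

Under these assumptions there are roughly ten training points per adjustable parameter, which is consistent with the argument that the data quantity supports generalization.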
Additionally, the algorithms used have an early stopping feature to avoid overfitting. With this feature, part of the training data is set aside and is not used to build the model; instead, it is used for early validation. The fitting performance for training and validation is estimated at each iteration. In Fig. 4, both the training and validation errors decrease with each iteration until point A, after which the model starts to overfit and the validation error starts to increase. Owing to the early stopping feature, the parameters at point A are used in the model instead of those at point B, even though point B has a lower training error.
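The early stopping logic can be sketched as follows. This is a simplified illustration, not the internal implementation of the algorithms used; the `patience` threshold and the error curves are hypothetical.

```python
def early_stopping_index(validation_errors, patience=3):
    """Return the iteration whose parameters are kept.

    Training stops once the validation error has failed to improve for
    `patience` consecutive iterations (patience value is an assumption).
    """
    best_index, best_error = 0, float("inf")
    waited = 0
    for i, err in enumerate(validation_errors):
        if err < best_error:
            best_index, best_error = i, err
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_index


# Made-up error curves: training error keeps falling, validation error turns upward
train_errors = [0.50, 0.30, 0.20, 0.15, 0.12, 0.10, 0.09, 0.08]
val_errors = [0.55, 0.35, 0.26, 0.22, 0.24, 0.27, 0.31, 0.36]

point_a = early_stopping_index(val_errors)
print(point_a)  # 3
```

Iteration 3 plays the role of point A: later iterations have a lower training error, but their validation error is higher, so their parameters are discarded.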

Artificial neural network. Sensitivity.
To ensure the best results from ANN, different numbers of neurons, network functions, training functions, and transfer functions were used. Figures 5, 6, 7 and 8 present the sensitivity analysis on these parameters. Increasing the number of neurons gives better results; however, the computational time increases as well. In addition, there is no significant increase in the correlation coefficients when more than 25 neurons are used, as shown in Fig. 5. Except for one case, there were no significant variations when different network functions were used, as demonstrated in Fig. 6. Sensitivity analyses on training and transfer functions showed the largest variations, with correlation coefficients ranging between 0.75 and 0.99, as illustrated in Figs. 7 and 8.
Validation. The dataset from well-1 was used to build the model and to perform the sensitivity analysis. After the model was built, data from well-2 were used to validate it. Good results were achieved in both wells even though the algorithm trained and tested the model using only the data from the first well. The correlation coefficients were 0.992, 0.988 and 0.980 for training, testing, and validation respectively, and the AAPE values were all between 1 and 2%. Figure 9 shows a comparison between the actual and ANN-predicted Poisson's ratio for well-1 and well-2.
Adaptive neuro-fuzzy inference system. Validation. The same procedure used for ANN was used to build and validate the ANFIS model. The dataset from Well-1 was used to train and test the model using different parameters, and the Well-2 dataset was used to validate the built model. Even though all correlation coefficients were higher than 0.97 and the AAPE values were less than 2.2%, the ANN results presented earlier are better. The actual Poisson's ratio is compared with the Poisson's ratio predicted by ANFIS in Fig. 12.
Computational cost. Besides the key performance indices (correlation coefficient and average absolute percentage error), the computational cost is an essential factor in comparing the methods used. The calculation time (in seconds) was recorded in each run for the two models in order to compare their computational efficiency. As shown in Fig. 13, ANN outperformed the ANFIS model: 90% of the ANN runs took less than 4.32 s, while the corresponding value for ANFIS was more than 300 s.
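One way to record per-run calculation times and report the time below which 90% of runs fall is sketched here. The helper names are hypothetical, the nearest-rank percentile is one possible convention among several, and the workload is a placeholder rather than an actual training run.

```python
import math
import time


def time_runs(fn, n_runs):
    """Call fn n_runs times and return the wall-clock duration of each call in seconds."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def percentile_nearest_rank(values, pct):
    """Smallest value such that at least pct percent of the values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def placeholder_workload():
    """Stand-in for one model training run."""
    sum(i * i for i in range(10_000))


durations = time_runs(placeholder_workload, n_runs=20)
print(percentile_nearest_rank(durations, 90))
```

Reporting a high percentile rather than the mean keeps a few unusually slow runs from dominating the comparison between methods.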
Model. Different parameter combinations were tested to ensure an optimum fit. Table 5 displays the ANN and ANFIS parameters that yielded the best matches between the predicted and given values. The best fit was obtained using ANN, with a correlation coefficient of around 0.99 in training and testing and 0.98 in the validation process, and an AAPE between 1 and 2%. The generated model is expressed by Eq. 4, while Table 6 shows the weights and biases that are used in the model.
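To illustrate how a white-box model built from tabulated weights and biases is evaluated, the following sketch shows a generic forward pass through a network with one hidden layer. It assumes a hyperbolic tangent transfer function in the hidden layer and a linear output, and that the inputs have already been normalized; these are assumptions, not the confirmed form of Eq. 4. The weights below are placeholders, not the values of Table 6.

```python
import math


def ann_predict(x, w1, b1, w2, b2):
    """Forward pass of a single-hidden-layer network.

    x  : list of (normalized) inputs
    w1 : one list of input weights per hidden neuron
    b1 : one bias per hidden neuron
    w2 : one output weight per hidden neuron
    b2 : output bias
    Hidden transfer function is tanh (assumed); the output layer is linear (assumed).
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Placeholder weights for a 3-input, 2-neuron network (illustration only)
w1 = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
b1 = [0.05, -0.10]
w2 = [0.6, -0.4]
b2 = 0.25

print(ann_predict([0.2, -0.5, 0.8], w1, b1, w2, b2))
```

With the actual weights and biases and the correct normalization, the same few lines would let the model be applied to other datasets without retraining.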

Conclusions
Conventionally, Poisson's ratio is estimated from sonic log data, which may not always be available. An alternative real-time prediction of Poisson's ratio from drilling data has been proposed in this paper by employing different machine learning tools. In light of the presented outcomes, the study can be concluded with the following statements:
• Compared to other means used to predict Poisson's ratio, drilling data are more likely to be available at an early stage of the well's life without additional cost and time. Therefore, the prediction of Poisson's ratio from drilling data will be very beneficial.
• Two machine learning methods were investigated and both yielded a good match; however, a slightly better prediction of Poisson's ratio was achieved using ANN. The sensitivity and optimization of the different parameters used in the algorithms have been presented and the best results were reported.
• The correlation coefficient between the actual and predicted values ranged between 0.97 and 0.99, while the average errors were all less than 2.2%. The best model was presented as a white box so that it can be used with other datasets.

Recommendations
Given the outcomes presented in this paper, which confirm that good predictions of Poisson's ratio can be obtained from drilling data, it is recommended to investigate other machine learning methods. Moreover, the use of drilling data in the prediction of other geomechanical properties could be investigated using a similar approach. It is also worth mentioning that the data used in this study are from the same field; therefore, to generate a general model, data from different sources could be combined and used together.