Real-time prediction of formation pressure gradient while drilling

Accurate real-time pore pressure prediction is crucial to drilling operations, both technically and economically. It saves cost and time and allows the right decisions to be taken before problems occur. The available correlations for pore pressure prediction depend on logging data, formation characteristics, or a combination of logging and drilling parameters. The objective of this work is to apply artificial neural networks (ANN) and an adaptive neuro-fuzzy inference system (ANFIS) to build two models that estimate the formation pressure gradient in real time from the available drilling data. The input parameters are rate of penetration (ROP), mud flow rate (Q), standpipe pressure (SPP) and rotary speed (RS). A data set obtained from several vertical wells was used to develop the predictive models, and a different data set was used to validate the proposed artificial intelligence (AI) models. Both models forecast the output with a good correlation coefficient (R) for training and testing, and the average absolute percentage error (AAPE) did not exceed 2.1%. In the validation stage, the developed models estimated the pressure gradient with good accuracy. This study demonstrates the reliability of the proposed models for estimating the pressure gradient while drilling from drilling data. Moreover, an ANN-based correlation is provided that can be used directly, by introducing the optimized weights and biases whenever the drilling parameters are available, instead of running the ANN model.

Radial basis transfer function
b1 Input layer biases
b2 Output layer bias
w1 Weights linking inputs and hidden layer
w2 Weights linking output and hidden layer
Subscripts
i Index of each neuron in the hidden layer
n Normalized value
Formation pressure is the pressure exerted by the fluids within the rock pore space. At a given depth, the normal pressure gradient originates from the weight of the saltwater column extending from the surface to the point of interest. Deviation from the normal trend is described as abnormal pressure, which can be either subnormal or overpressure 1. Normal pressure is not constant; it depends on the amount of dissolved salts, the fluid types, the presence of gas and the temperature gradient. Supernormal pressure (overpressure) is formation pressure exceeding the normal hydrostatic pressure, while subnormal pressure is lower than the normal pressure. Supernormal pressure results from normal pressure plus an additional pressure source. The excess pressure may be attributed to geological, mechanical or geochemical causes, or a combination of these 2. Abnormal pressure zones may lead to severe technical and economic problems such as kicks and blowouts. Subnormal pressure may lead to loss of circulation and differential pipe sticking, which may require additional casing strings and hence higher drilling costs 2. Accurate real-time formation pressure estimation may enable better well path and casing design, better wellbore stability analysis, an effective mud program and lower overall drilling costs 3,4. Formation pressure estimation can be either quantitative or qualitative. Most techniques compare the observed trend lines with the normal trend lines graphically to pick out anomalous changes that may indicate abnormal pressure zones. The existing techniques in the literature use well logs, strata properties and drilling parameters.
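As a worked illustration of the normal gradient described above, the sketch below converts a fluid density in lbm/gal (ppg) to a hydrostatic gradient using the standard field factor of 0.052 psi/ft per ppg. It then labels a measured gradient relative to an assumed normal value. The 0.465 psi/ft default (a figure often quoted for Gulf Coast saltwater) and the ±0.01 psi/ft tolerance band are illustrative assumptions, not values from this study. As noted above, the normal gradient itself varies with salinity, fluid type and temperature.

```python
# Field conversion: fluid density in lbm/gal (ppg) -> pressure gradient in psi/ft.
PSI_PER_FT_PER_PPG = 0.052


def hydrostatic_gradient(density_ppg: float) -> float:
    """Hydrostatic pressure gradient (psi/ft) of a static fluid column."""
    return PSI_PER_FT_PER_PPG * density_ppg


def classify_gradient(pg: float, normal_pg: float = 0.465, tol: float = 0.01) -> str:
    """Label a formation pressure gradient relative to an assumed normal gradient.

    normal_pg and tol are illustrative assumptions, not values from the study.
    """
    if pg < normal_pg - tol:
        return "subnormal"
    if pg > normal_pg + tol:
        return "overpressure"
    return "normal"
```

For example, fresh water at 8.33 ppg gives about 0.433 psi/ft, which this rule would label subnormal against a 0.465 psi/ft saltwater baseline.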
Hottman and Johnson 5 were the first to estimate pore pressure from shale logging data. They constructed cross plots that relate the pressure gradient to the resistivity ratio or to the sonic travel time difference between the observed values and the normal trend. Matthews and Kelly 6 used a semi-log scale for the Hottman and Johnson correlation. Pennebaker 7 replaced the sonic travel time difference used by Hottman and Johnson 5 with the sonic travel time ratio, and estimated the pore pressure from an X-Y cross plot similar to that of Hottman and Johnson. This technique applied a single global trendline for a given rock type, which may not hold for all rock types. Eaton 8 confirmed that the formation pressure and overburden pressure gradients both affect log-derived properties, so the Hottman and Johnson correlations should be extended to include the overburden stress effect. Eaton 8 proposed an empirical model based on sonic data to predict the pressure gradient in shale formations.
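For reference, the sonic form of Eaton's method is usually written as below. Here P_p/D is the pore pressure gradient, S/D the overburden gradient, P_n/D the normal pore pressure gradient, Δt_n the normal-trend shale transit time and Δt_o the observed transit time at the same depth. This is the commonly quoted textbook form rather than a transcription of Eaton's paper, and the exponent 3.0 is an empirical default that is often recalibrated locally. In undercompacted shale, Δt_o exceeds Δt_n, so the ratio falls below 1 and the predicted gradient rises above P_n/D toward S/D.

```latex
\frac{P_p}{D} = \frac{S}{D} - \left(\frac{S}{D} - \frac{P_n}{D}\right)\left(\frac{\Delta t_n}{\Delta t_o}\right)^{3.0}
```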
Gardner et al. 9 analysed the data used by Hottman and Johnson and introduced another way to estimate the formation pressure that involves the overburden pressure. Bowers 10 noted that a power-law relationship exists between effective stress and sonic velocity. The author estimated the formation pressure from sonic data by rearranging the power equation and replacing the effective stress with σ_v − P_p, the overburden stress minus the pore pressure. Shell introduced another sonic-based prediction technique, the Tau model, by adding a "Tau" parameter to the effective stress equation 11,12. Foster and Whalen 13 were the first to use the equivalent depth method, a vertical method, to estimate the formation pressure from electrical logs. Ham 14 later applied the equivalent depth approach with sonic, resistivity and density data to predict the formation pressure and the drilling fluid weight needed in Gulf Coast wells. Eaton 15,16 introduced empirical models based on resistivity or conductivity to estimate the pressure gradient in shale from well logs. This method is reasonably applicable in sedimentary basins where under-compaction is the main source of overpressure 17,18. Because ROP alone is an unreliable indicator of pore pressure, it should be corrected or normalized for variations in the other drilling parameters. Bingham 19 proposed the D exponent as an attempt to correct the ROP for variations in weight on bit (WOB), RS and hole diameter. Jorden and Shirley 20 modified Bingham's approach by introducing another term called d_exp. Rehm and McClendon 21 adjusted the Jorden and Shirley d_exp to account for changes in drilling fluid density, giving the corrected exponent d_c. Quantitatively, formation pressure can be estimated from d_c values using the Eaton method and the ratio method. Eaton 15 and Contreras et al. 22 observed that the corrected d_exp plot closely resembles the resistivity plot.
Therefore, Eaton developed a prediction model for the formation pressure gradient using the observed d_c, the normal-trend d_c value, and the overburden and normal formation pressure gradients. The ratio method was proposed as a simple technique to estimate the pore pressure from d-exponent, resistivity or sonic data without the overburden pressure 1.
Li et al. 53 used ANN to estimate the formation pressure in the Saertu and Xingshugang oil fields in Daqing. The authors included input parameters such as sonic transit time, gamma ray (GR), natural potential and pipe pressure. Hu et al. 54 employed ANN to estimate the pore pressure, with inputs such as depth, density, sonic transit time and GR. Keshavarzi and Jahanbakhshi 55 applied neural networks to estimate the formation pressure gradient in the Asmari field, using porosity, permeability, density and depth as inputs. Aliouane et al. 56 introduced an ANN model to estimate the formation pressure from well logs in a shale gas reservoir. Rashidi and Asadi 57 proposed an ANN model to estimate the formation pressure using mechanical specific energy and drilling efficiency. Ahmed et al. 58 used ANN to create a prediction model for formation pressure using seven inputs that combine well logs and drilling data. Ahmed et al. 59 compared five machine learning techniques for predicting the formation pressure with the same input parameters as Ahmed et al. 58.
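The d-exponent corrections and the two d_c-based estimators discussed in this review (Eaton and the ratio method) can be sketched as follows. The formulas are the forms commonly quoted in drilling-engineering texts, not transcriptions from the cited papers. The Eaton exponent of 1.2 is the usual default for d_c and should be treated as a calibration parameter. The function names are illustrative. Note that, unlike the AI models in this study, every estimator here needs a normal-trend value (`dc_normal`).

```python
import math


def d_exponent(rop_ft_h, rpm, wob_lbf, bit_diameter_in):
    """Jorden-Shirley d-exponent in field units (ft/h, rpm, lbf, in)."""
    numerator = math.log10(rop_ft_h / (60.0 * rpm))
    denominator = math.log10(12.0 * wob_lbf / (1.0e6 * bit_diameter_in))
    return numerator / denominator


def corrected_d_exponent(d, normal_density_ppg, mud_density_ppg):
    """Rehm-McClendon correction of d for the drilling-fluid density in use."""
    return d * normal_density_ppg / mud_density_ppg


def eaton_dc_gradient(overburden_grad, normal_grad, dc_observed, dc_normal, exponent=1.2):
    """Eaton pore pressure gradient (psi/ft) from observed vs normal-trend dc."""
    ratio = dc_observed / dc_normal
    return overburden_grad - (overburden_grad - normal_grad) * ratio ** exponent


def ratio_method_gradient(normal_grad, dc_observed, dc_normal):
    """Ratio method: scale the normal gradient by dc_normal / dc_observed."""
    return normal_grad * dc_normal / dc_observed


# Example inputs: 50 ft/h, 120 rpm, 30,000 lbf on an 8.5 in bit.
d = d_exponent(50.0, 120.0, 30000.0, 8.5)  # about 1.57
```

In both estimators, an observed d_c below the normal trend (slower-than-expected compaction) pushes the predicted gradient above the normal gradient.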
The models in the literature use logging data, which may not be available while drilling because logging while drilling (LWD) is not used in all wells. Even when LWD tools are present in the drill string, they are placed tens of feet above the bit, so their readings do not reflect the instantaneous response of the formations being penetrated. Other models use reservoir properties derived from either logging data or lab measurements, which limits their use while drilling. The motivation of this work is to forecast the formation pressure gradient in real time while drilling using only the available drilling data, without combining them with other data that are not available in all wells. This maximizes the benefit of the available drilling data, at no additional cost, for predicting a crucial parameter that improves drilling operations technically and economically. The goal of this study is to use ANNs and ANFIS to propose two models for real-time formation pressure gradient prediction from the available drilling data. Moreover, an ANN-based correlation is provided that can be used directly to estimate the gradient. Unlike the existing empirical equations, the models in this study do not need a normal pressure trend to estimate the gradient.

Methodology
The methodology started with data collection, followed by data cleaning and filtering. Data analysis was then performed to gain more insight into the data sets. After that, the data were randomly divided into training and testing sets while ensuring that both sets were representative. The next stage was to select initial model parameters for the first runs. The parameters were then updated and the process repeated until the best results were obtained. Once the optimum results were reached, the model hyperparameters were extracted. Finally, the models were validated on a blind holdout data set that was not involved in developing the predictive models. Figure 1 briefly shows the methodology used in this work to develop the AI models.
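The random division with a representativeness check can be sketched as follows. The 70/30 training/testing ratio, the seed loop and the range-coverage criterion (every test value must fall inside the training range) are assumptions made for illustration. The article does not state the split ratio or how representativeness was judged.

```python
import numpy as np


def random_split(X, y, train_fraction=0.7, seed=0):
    """Shuffle rows and split them into training and testing subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]


def is_representative(a_train, a_test):
    """True if every column of the test set lies inside the training min/max."""
    a_train = np.asarray(a_train).reshape(len(a_train), -1)
    a_test = np.asarray(a_test).reshape(len(a_test), -1)
    return bool(np.all(a_test.min(axis=0) >= a_train.min(axis=0))
                and np.all(a_test.max(axis=0) <= a_train.max(axis=0)))


def representative_split(X, y, train_fraction=0.7, max_tries=100):
    """Retry random splits until inputs and target are both covered by training."""
    for seed in range(max_tries):
        X_tr, X_te, y_tr, y_te = random_split(X, y, train_fraction, seed)
        if is_representative(X_tr, X_te) and is_representative(y_tr, y_te):
            return X_tr, X_te, y_tr, y_te
    raise RuntimeError("no representative split found")
```

Requiring the test range to sit inside the training range avoids asking the network to extrapolate, which is a common reason for a poor testing score.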

Data processing and analysis
Data description. A data set of about 3145 points was provided from vertical sections in the same area. It included the drilling data, the formation pressure and the depth. The drilling data were used as model inputs to predict the formation pressure gradient as the output. These drilling data included hydraulic data, such as Q and SPP, and mechanical data, such as RS, ROP, torque (T) and WOB. They can be recorded either at the surface or downhole while drilling, and they are influenced by the strata being penetrated and their fluid content. Statistical analysis of the field data showed that the data covered a broad range of the inputs and the output, as presented in Table 1. For instance, the formation pressure gradient was well represented, covering subnormal, normal and supernormal values. Table 2 shows a sample of the field data used in this study. The relationship between each pair of variables was tested in terms of R, as shown in Fig. 2. Moreover, cross plots of each drilling parameter against the pore pressure gradient were constructed, as shown in Fig. 3.
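The pairwise R analysis behind Fig. 2 can be reproduced with a Pearson correlation matrix, as in the minimal sketch below. It assumes that the data are arranged as one row per sample and that the column order is Q, SPP, RS, ROP, T, WOB, Pg. The column names and their order are assumptions; the article does not specify a file layout.

```python
import numpy as np

# Assumed column order (hypothetical layout, one row per depth sample).
COLUMNS = ["Q", "SPP", "RS", "ROP", "T", "WOB", "Pg"]


def correlation_matrix(data):
    """Pearson R between every pair of columns (rows are samples)."""
    return np.corrcoef(np.asarray(data, dtype=float), rowvar=False)


def correlations_with_target(data, columns=COLUMNS, target="Pg"):
    """R of each input column against the target column."""
    r = correlation_matrix(data)
    t = columns.index(target)
    return {name: float(r[i, t]) for i, name in enumerate(columns) if name != target}
```

Pearson R only captures linear association, which is one reason the cross plots in Fig. 3 are still useful alongside the matrix.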
Data processing. In AI, prediction quality depends directly on data quality. The data set was therefore cleaned by eliminating unrepresentative values such as −999 values and NaN (not a number) entries. Outliers, which are observations located outside the overall pattern of a distribution, should then be removed because they may cause serious problems in statistical analysis 60. Outliers may arise from human and/or instrument error. Outlier detection can be conducted in many ways. Examples include the Z-score (removing values located more than a certain number of standard deviations from the mean) and the box-and-whisker plot (removing values beyond the upper and lower limits determined by dividing the data into four quartiles) 61. The quality and reliability of the inputs were checked by various techniques, such as comparing the recorded variables with the operating ranges of the equipment and with similar variables in offset wells within the field. Moreover, the output was compared to the formation pressure gradient values produced by known trends. The parameters optimization process is summarized in Table 3. The R, coefficient of determination (R²) and AAPE were computed by Eqs. (1), (2) and (3) as presented in Supplementary Appendix 1. The hyperparameters giving the highest R and R² and the minimum errors (RMSE, MSE and AAPE) were selected. The optimum configuration was a single hidden layer of 10 neurons. The model was built as a cascade-forward network (newcf) with the Levenberg-Marquardt algorithm (trainlm) as the training function to obtain the optimum weights and biases, using a learning rate of 0.12. A log-sigmoid (logsig) activation function connected the input and hidden layers, and a linear (purelin) activation function linked the hidden and output layers. Figure 4 shows a typical structure of the proposed ANN model. The proposed ANN model consists of three layers.
The first layer contains the inputs, the second layer contains the hidden neurons with their weights and biases, and the third layer is the output layer. The input parameters for the model were Q, ROP, SPP and RS. The ANN model predicted the formation pressure gradient with R values of 0.981 and 0.973 for training and testing, respectively. The RMSE ranged from 0.015 to 0.018 psi/ft and the AAPE did not exceed 2.22% for training and testing. The training and testing results are summarized in Table 4. The error (predicted − actual) histogram in Fig. 5 shows that most predicted values have very small errors, between −0.02 and 0.02 psi/ft. Training performance was monitored in terms of mean squared error, as shown in Fig. 6, with the best validation performance at epoch 48. Figure 7 presents cross plots of the estimated versus recorded target values, with the points falling close to the 45° line. The recorded and predicted target values were plotted together along the selected intervals in Fig. 8, indicating high estimation accuracy.
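The cleaning and outlier screening described under Data processing can be sketched as follows. Rows containing −999 or NaN are dropped first. Rows are then kept only if every column passes a Z-score test or a box-plot (interquartile-range) test. The threshold of 3 standard deviations and the 1.5 × IQR whisker length are common defaults assumed here; the article does not report which limits were applied. Both masks assume that each column varies, because a constant column gives a zero standard deviation.

```python
import numpy as np


def drop_invalid(data, null_value=-999.0):
    """Remove rows containing the null flag or NaN/inf values."""
    data = np.asarray(data, dtype=float)
    valid = np.all(np.isfinite(data) & (data != null_value), axis=1)
    return data[valid]


def zscore_mask(data, threshold=3.0):
    """True for rows whose every column lies within `threshold` std devs of the mean."""
    z = np.abs((data - data.mean(axis=0)) / data.std(axis=0))
    return np.all(z <= threshold, axis=1)


def iqr_mask(data, k=1.5):
    """True for rows whose every column lies inside the box-plot whiskers."""
    q1, q3 = np.percentile(data, [25, 75], axis=0)
    iqr = q3 - q1
    inside = (data >= q1 - k * iqr) & (data <= q3 + k * iqr)
    return np.all(inside, axis=1)


# Typical use: cleaned = drop_invalid(raw); kept = cleaned[iqr_mask(cleaned)]
```

The box-plot rule relies on quartiles rather than the mean and standard deviation, so a few extreme values distort it less than the Z-score rule.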
New empirical correlation for formation pressure gradient. The weights and biases were extracted from the optimized ANN model, as listed in Table 5, to provide an empirical equation for predicting the pore pressure gradient from the available drilling parameters. The developed equation in normalized form is given by Eq. (1).
Table 3. Parameters optimization process.
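A minimal sketch of evaluating such a correlation is shown below. It assumes a single hidden layer of 10 log-sigmoid neurons feeding a linear output, as described for the optimized ANN. The min–max scaling to [−1, 1] is a stand-in for Eqs. (2) and (3), whose exact form is not reproduced here. Any direct input-to-output connections that a cascade-forward (newcf) network may hold are omitted, because the nomenclature lists only w1, b1, w2 and b2. In practice, w1 (10 × 4), b1 (10), w2 (10) and b2 would come from Table 5. The scaling limits would come from the statistics in Table 6, which are assumed here to include each variable's minimum and maximum.

```python
import numpy as np


def logsig(x):
    """Log-sigmoid transfer function between the input and hidden layers."""
    return 1.0 / (1.0 + np.exp(-x))


def normalize(x, x_min, x_max):
    """Assumed min-max scaling to [-1, 1] (stand-in for Eq. (2))."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def denormalize(x_n, x_min, x_max):
    """Inverse of normalize() (stand-in for Eq. (3))."""
    return (x_n + 1.0) * (x_max - x_min) / 2.0 + x_min


def predict_pg(inputs, in_min, in_max, pg_min, pg_max, w1, b1, w2, b2):
    """Pressure gradient (psi/ft) from [Q (gpm), SPP (psi), RS (rpm), ROP (ft/h)].

    w1: (10, 4) input-to-hidden weights, b1: (10,) hidden biases,
    w2: (10,) hidden-to-output weights, b2: scalar output bias.
    """
    x_n = normalize(np.asarray(inputs, dtype=float), in_min, in_max)
    hidden = logsig(w1 @ x_n + b1)   # hidden-layer outputs, one per neuron
    pg_n = float(w2 @ hidden + b2)   # linear (purelin) output neuron
    return denormalize(pg_n, pg_min, pg_max)
```

Keeping the input order fixed inside `predict_pg` mirrors the requirement in the steps below that inputs be supplied in a fixed order and in fixed units.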
Steps to estimate the pressure gradient using the ANN-based correlation.
1. Normalize the input drilling parameters into PR_n, SPP_n, RS_n and ROP_n using Eq. (2) and the statistical data in Table 6.
2. Calculate the normalized output Pg_n using Eq. (1) and the optimum weights and biases listed in Table 5. The input data should be ordered as follows, in these units: pump rate (GPM), SPP (psi), rotary speed (RPM) and ROP (ft/h).
3. Denormalize the obtained Pg_n to the actual Pg value using Eq. (3), where Pg_n is the normalized Pg estimated by the developed correlation and Pg is the actual value (psi/ft).
ANFIS model. ANFIS applications in petroleum engineering have shown high reliability as a predictive tool 65.
Genfis 1, which uses grid partitioning, and Genfis 2, which uses subtractive clustering, were both tested to obtain the model. Genfis 2 gave better results than Genfis 1; consequently, the ANFIS model was created with the subtractive clustering technique. The optimization process tested different combinations of cluster radius and number of iterations. The best results were obtained with a Sugeno-type FIS, a cluster radius of 0.2 and 400 iterations. The ANFIS model predicted the target with R values of 0.98 and 0.97 for training and testing, respectively. The RMSE was around 0.02 psi/ft and the AAPE did not exceed 2.1% for training and testing. The training and testing results are summarized in Table 7. Figure 9 presents cross plots of the predicted versus recorded target values, with the points falling close to the 45° line. The recorded and estimated values were plotted together along the selected intervals in Fig. 10, indicating high prediction accuracy.
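The idea behind subtractive clustering, which Genfis 2 uses to place the fuzzy rules, can be sketched as follows. Every data point is scored by how many neighbours lie within the cluster radius, and the highest-scoring point becomes a cluster centre. Scores near that centre are then reduced before the next centre is chosen, and each centre seeds one Sugeno rule. This is a simplified version of Chiu's method, not MATLAB's implementation. It uses a single stopping ratio instead of the full accept/reject rules, and the squash factor (1.5) and stopping ratio (0.15) are assumptions. The 0.2 radius default matches the value reported above.

```python
import numpy as np


def subtractive_clustering(X, radius=0.2, squash=1.5, stop_ratio=0.15):
    """Simplified subtractive clustering; returns cluster centres in original units."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Xs = (X - lo) / span                                   # scale columns to [0, 1]
    sq = (Xs ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T, 0.0)  # squared distances
    alpha = 4.0 / radius ** 2                              # neighbourhood for the potential
    beta = 4.0 / (squash * radius) ** 2                    # neighbourhood for the reduction
    potential = np.exp(-alpha * d2).sum(axis=1)            # density score of each point
    first_peak = potential.max()
    centres = []
    while True:
        k = int(np.argmax(potential))
        if potential[k] < stop_ratio * first_peak:
            break                                          # remaining points are too weak
        centres.append(Xs[k].copy())
        # Reduce scores near the new centre so the next centre lies elsewhere.
        potential = potential - potential[k] * np.exp(-beta * d2[:, k])
    return np.array(centres) * span + lo
```

A smaller radius produces more, tighter clusters and therefore more rules, which is why the radius was one of the two parameters tuned for the ANFIS model.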

Models validation.
The proposed ANN and ANFIS models were validated using a blind holdout data set that was not involved in developing the models. A data set of 92 points from the same field was used to feed the models and to compare the recorded and estimated pressure gradient values. The models provided continuous profiles of the target from the profiles of the drilling data. Both ANN and ANFIS predicted the target with an R of about 0.99 between the recorded and estimated values in validation. Additionally, the RMSE was around 0.01 psi/ft and the AAPE did not exceed 1.63% for both models. Figure 11 presents cross plots of the predicted versus recorded target values, with the points falling close to the 45° line. The proposed models therefore performed well on both the testing and validation data sets, neither of which was used in the training stage.
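The statistics reported for training, testing and validation can be computed from paired recorded and predicted values with a short helper like the one below. The formulas are standard definitions, assumed to match those in Supplementary Appendix 1. In particular, R² is taken here as 1 − SS_res/SS_tot, which can differ from the square of R for a biased model. The AAPE calculation assumes that no recorded gradient is zero.

```python
import numpy as np


def evaluate(actual, predicted):
    """R, R², RMSE (same units as the data) and AAPE (%) for paired values."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    r = float(np.corrcoef(a, p)[0, 1])                          # Pearson R
    r2 = float(1.0 - np.sum((a - p) ** 2) / np.sum((a - a.mean()) ** 2))
    rmse = float(np.sqrt(np.mean((a - p) ** 2)))
    aape = float(100.0 * np.mean(np.abs((a - p) / a)))          # percent
    return {"R": r, "R2": r2, "RMSE": rmse, "AAPE": aape}
```

Because the target is a gradient in psi/ft, the RMSE is in psi/ft, while AAPE is dimensionless and comparable across fields with different gradient ranges.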

Conclusion
This work introduced a novel way to estimate the formation pressure gradient while drilling by applying AI to the available surface drilling data. Unlike the empirical models in the literature, the developed models do not need a normal trend to predict the formation pressure. They can be integrated into any automatic drilling system to estimate the pressure gradient while drilling at low cost. They may also reduce non-productive time by anticipating time-consuming drilling problems before they occur. The tool may therefore improve drilling operations technically and economically, both while drilling and in pre-drilling design, by supporting the right decisions and helping to avoid problems such as kicks, blowouts and circulation losses. The results of this work can be summarized as follows:
• The optimum ANN model has one hidden layer of 10 neurons, a newcf network trained with the Levenberg-Marquardt algorithm (trainlm) at a 0.12 learning rate, and a log-sigmoid transfer function.
• The optimum parameters of the ANFIS model based on subtractive clustering are a cluster radius of 0.2 and 400 iterations.
• The proposed models can predict the pore pressure gradient with reasonable accuracy, as indicated by R around 0.975 and RMSE around 0.018 psi/ft.
• The ANN-based correlation can be used directly by introducing the optimum weights and biases whenever the drilling parameters are available, instead of running the ANN model.