Log interpretation method of resistivity low-contrast oil pays in Chang 8 tight sandstone of Huanxian area, Ordos Basin by support vector machine

Resistivity low-contrast oil pays are an unconventional oil resource whose physical and electrical properties show no obvious difference from those of water layers, which makes them difficult to identify from geophysical well logging responses. In this study, the support vector machine (SVM) method was used to interpret the resistivity low-contrast oil pays in the Chang 8 tight sandstone reservoir of the Huanxian area, Ordos Basin. First, the input logging curves were selected by analyzing the relationship between reservoir fluid types and logging data. Then, an SVM classification model for fluid identification and a support vector regression (SVR) model for reservoir parameter prediction were constructed. Finally, the two models were applied to interpret the resistivity low-contrast oil pays in the study area. The application results show that the fluid recognition accuracy of the SVM classification model is higher than that of the logging cross plot method, the back propagation (BP) neural network method and the radial basis function (RBF) neural network method. The permeability and water saturation predicted by the SVR regression model are more accurate than those obtained from the experimental fitting models, which indicates that it is feasible to carry out logging interpretation and evaluation of resistivity low-contrast oil pays by the SVM method. The research results provide a reference for the review of old wells and technical support for the exploration and development of new strata.

With the increasing volatility of international oil prices and the continuous reduction of oil reservoir scale, strongly concealed resistivity low-contrast oil resources have received much interest in recent years. Research on logging interpretation and evaluation methods for resistivity low-contrast oil pays has become the most practical way to supplement conventional oil resources and reduce oilfield exploration costs 1,2,27. A resistivity low-contrast oil pay differs little from a water layer in its porosity and resistivity logging responses, and its oil saturation is relatively low 3,4. At present, low-porosity and low-permeability reservoirs, represented by tight sandstone, have become the main battlefield for ensuring the supply of oil and gas resources 5. However, the complex pore structure and strong heterogeneity of tight sandstone reservoirs reduce the sensitivity of the logging response to the pore fluid. As a result, more resistivity low-contrast oil pays develop, and they are more difficult to interpret and identify with conventional logging interpretation methods 6,7,25.
In recent years, data mining technology has been increasingly applied in oil exploration and development, especially for unconventional reservoirs with unclear logging response characteristics, and using it to solve complex problems in actual oilfield production is of great significance 8-10. Some classical algorithms, such as the neural network method, the support vector machine and the fuzzy clustering method, provide new techniques for the identification of resistivity low-contrast oil pays 11,12. Guo et al. 13 predicted the water saturation at the lower limit of three water models by using the generalized regression neural network (GRNN) and the particle swarm optimization support vector machine (PSO-SVM), and the results are in good agreement with the core analysis results in the Sulige tight sandstone reservoir. Chen and Peng 14 used a BP neural network to learn the mathematical characteristics of the logging curves of low resistivity oil reservoirs, which improved the accuracy of fluid identification and reservoir parameter prediction. Singh et al. 15 used stepwise linear regression and the multilayer feed forward neural (MLFN) network method to predict the 2D distribution of P-wave velocity, resistivity, porosity and gas hydrate saturation. Miah et al. 16 used the multilayer perceptron artificial neural network (MLP-ANN) and kernel function-based least-squares support vector machine (LS-SVM) techniques to develop predictive models for water saturation, and the prediction performance was better than that of other models. Baouche and Nabawy 17 applied the fuzzy logic technique to divide the Southern Hassi R'Mel Gas Field into several hydraulic flow units with various reservoir properties, and then predicted the permeability of each flow unit.
www.nature.com/scientificreports/
With the deepening of research, many machine learning algorithms based on theoretical mathematics have been proposed, each with its own advantages and disadvantages. However, the key to applying such methods to the log interpretation of actual formations is to select appropriate training data as input 18,19. In this study, the support vector machine (SVM) learning method, which is based on the VC dimension theory of statistical learning and the structural risk minimization (SRM) principle, was used to establish the interpretation models. By analyzing the relationship between the logging response and the pore fluid, the training data were optimized, and an SVM classification model for fluid identification and a support vector regression (SVR) model for reservoir parameter prediction were established. The application results show that the log interpretation models established by the SVM method are more effective than the conventional methods, which indicates that it is feasible to identify and evaluate resistivity low-contrast oil pays with the SVM method.

Geological and logging response characteristics of research area
The Ordos Basin is the second largest sedimentary basin in China, bearing more than half of China's energy output 20,21. The Huanxian area is located in the southwestern Ordos Basin, and the regional geological structure crosses the Tianhuan Depression and the Yishan Slope from west to east (Fig. 1). The Chang 8 member of the Yanchang Formation developed in the Huanxian area is a typical tight sandstone reservoir with large sedimentary thickness. The oil in the Chang 8 tight sandstone reservoir mainly comes from the overlying Chang 7 high-quality source rock, which gives it great exploration and development potential 22,23. However, with the deepening of oil and gas exploration and development in this area, the problem of identifying and evaluating resistivity low-contrast oil reservoirs has become increasingly prominent 7,24,25. According to previous studies, the genesis of resistivity low-contrast oil reservoirs is very complex and is usually controlled by many factors 26,27. Figure 2 shows the relationship between the resistivity and density logging responses of oil layers and water layers, established from the oil test data in the study area, and Table 1 lists the logging response characteristics of the different fluids. The density (DEN) and reservoir resistivity (RT) values of the resistivity low-contrast oil reservoir are lower than those of the high resistivity oil reservoir. The relative shale content (ΔGR) differs little between the resistivity low-contrast oil reservoir and the high resistivity oil reservoir, indicating that the shale content of the reservoir has little effect on resistivity. In addition, the relative amplitude of the spontaneous potential (ΔSP) in the resistivity low-contrast oil reservoir is higher than that in the high resistivity oil reservoir, which suggests that the difference in formation water salinity is likely to be an important reason for the change in electrical properties.
Besides, the complex pore structure and high irreducible water saturation in tight sandstone reservoir make it difficult for conventional logging to identify and evaluate resistivity low-contrast oil reservoir, which seriously restricts the exploration progress and development of oil resources in this area. Therefore, it is important to develop more effective methods to provide new logging technical support for the exploration and development of resistivity low-contrast oil layers.

Method and theory
Unlike the neural network method, which requires the number of hidden-layer nodes to be determined, the basic idea of a support vector machine for reservoir parameter prediction is to map the input space to a high-dimensional space by introducing a kernel function and then to solve for a linearly separating hyperplane or function in this high-dimensional space that separates all data types in the original space. The greater the separation distance is, the better the classification effect. In this way, nonlinear discrimination of the data in the original space is realized 28.
Take $T = \{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$ with $x_i \in \mathbb{R}^P$ as the input data, where $x_i$ is the logging data related to the predicted parameters and $y_i$ is the core analysis data, that is, the target value.
Suppose that, in the high-dimensional space, the hyperplane or linear function that separates the two types of samples satisfies:

$$\langle w \cdot x \rangle + b = 0 \tag{1}$$

where $w$ is the weight vector of unknown coefficients in the high-dimensional space and $b$ is a constant term. For function (1) to distinguish all input data samples without error, the constraint $y_k(\langle w \cdot x_k \rangle + b) - 1 \ge 0$ should be satisfied. The classification interval is maximal when $\phi(w) = \frac{1}{2} w^T w$ is minimal. In this way, the problem of solving for the optimal hyperplane in the high-dimensional space is transformed into the following convex programming problem:

$$\min_{w,\, b,\, \xi} \; \frac{1}{2} w^T w + C \sum_{k=1}^{n} \xi_k \quad \text{s.t.} \quad y_k(\langle w \cdot x_k \rangle + b) \ge 1 - \xi_k, \quad \xi_k \ge 0, \quad k = 1, 2, \ldots, n \tag{2}$$

where $\xi_k$ is a nonnegative slack variable introduced when the sample data are not linearly separable, and $C$ is a penalty parameter: the greater its value is, the heavier the penalty for misclassification. The first term in the objective function (2) enlarges the classification interval, which controls the generalization ability of the model. The second term is the training error, which reduces the empirical risk.
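The role of the slack variables and the penalty parameter can be illustrated with a minimal sketch that evaluates the soft-margin objective, $\frac{1}{2} w^T w + C \sum_k \xi_k$, for a given hyperplane. The function name `soft_margin_objective` and the toy samples are hypothetical and are not taken from the study; the sketch only evaluates the objective and does not solve the optimization problem.

```python
def soft_margin_objective(w, b, samples, C):
    """Evaluate 0.5 * w.w + C * sum(slacks) for a candidate hyperplane.

    samples is a list of (x, y) pairs with labels y in {-1, +1}.
    Returns the objective value and the list of slack values.
    """
    slacks = []
    for x, y in samples:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # The slack is zero when y * (w.x + b) >= 1 already holds.
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks


# Hypothetical 2-D toy samples (illustration only, not data from the paper).
samples = [((2.0, 0.0), 1), ((-2.0, 0.0), -1), ((0.5, 0.0), 1)]
objective, slacks = soft_margin_objective(w=(1.0, 0.0), b=0.0, samples=samples, C=10.0)
print(objective, slacks)
```

Only the third sample lies inside the margin, so it alone contributes a nonzero slack; increasing `C` makes that violation more expensive relative to the margin term.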
To map the training data set to the high-dimensional space, a kernel function needs to be introduced; that is, the convex programming problem of Eq. (2) is transformed into a quadratic programming problem. The expected weight vector can be written as $w = \sum_{i=1}^{n} (\alpha_i^* - \alpha_i)\, x_i$, and the analytical expression of the support vector machine regression function is as follows:

$$f(x) = \sum_{i=1}^{n} (\alpha_i^* - \alpha_i)\, K(x, x_i) + b$$

where $\alpha_i$ and $\alpha_i^*$ are the nonnegative Lagrange multipliers and $K(x, x_i)$ is a kernel function satisfying the Mercer condition. The commonly used kernel functions mainly include the polynomial kernel function, Gaussian kernel function, radial basis function kernel function and sigmoid kernel function.
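As a minimal sketch of how such a regression function is evaluated once the multipliers are known, the snippet below computes $f(x) = \sum_i (\alpha_i^* - \alpha_i) K(x, x_i) + b$ with a Gaussian radial basis kernel. The kernel form $\exp(-\lVert x - x_i \rVert^2 / \text{width}^2)$, with `width` playing the role of $\sqrt{2}\sigma$, is an assumption, and the support vectors, coefficients and bias are hypothetical values rather than fitted results.

```python
import math


def rbf_kernel(x, z, width):
    """Gaussian RBF kernel exp(-||x - z||^2 / width^2) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / width ** 2)


def svr_predict(x, support_vectors, coeffs, b, width):
    """Evaluate f(x) = sum_i coeffs[i] * K(x, x_i) + b.

    coeffs[i] stands for the difference alpha*_i - alpha_i.
    """
    return sum(c * rbf_kernel(x, sv, width) for c, sv in zip(coeffs, support_vectors)) + b


# Hypothetical support vectors and multipliers (illustration only).
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
coeffs = [0.5, -0.25]
prediction = svr_predict((0.0, 0.0), support_vectors, coeffs, b=0.1, width=2.0)
print(prediction)
```

Under this assumed kernel form, a width of 2 would correspond to $\gamma = 1/\text{width}^2 = 0.25$ in LIBSVM's $\exp(-\gamma \lVert u - v \rVert^2)$ parameterization.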
The input sample data have different physical meanings, dimensions and orders of magnitude, so the original data must be normalized before learning and training. The normalization method selected in this paper is the mapminmax function, and its normalization formula is:

$$x' = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1$$

where $x'$ is the normalized data, $x$ is the input data, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the input data, and the normalized data range between −1 and 1.
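The min-max scaling to the range from −1 to 1 can be sketched in a few lines. This is a plain Python stand-in for the mapping, $x' = 2(x - x_{\min})/(x_{\max} - x_{\min}) - 1$, not the MATLAB function itself; the sample values are hypothetical, and raising an error for a constant curve is a choice made here for the sketch.

```python
def mapminmax(values, lo=-1.0, hi=1.0):
    """Linearly rescale values so that min -> lo and max -> hi."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Choice made for this sketch: refuse to scale a constant curve.
        raise ValueError("cannot normalize a constant sequence")
    scale = (hi - lo) / (x_max - x_min)
    return [lo + scale * (v - x_min) for v in values]


# Hypothetical resistivity readings in ohm-m (illustration only).
rt_values = [10.0, 15.0, 20.0]
rt_scaled = mapminmax(rt_values)
print(rt_scaled)
```

In practice the minimum and maximum found on the training samples would be reused to scale the test and validation samples, so that all inputs share one mapping.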
The libsvm toolbox in MATLAB is used for SVM model learning and training, and the radial basis function is selected as the kernel function, that is, $K(x, x_i) = \exp\left(-\lVert x - x_i \rVert^2 / (2\sigma^2)\right)$. The combination of grid search and k-fold cross validation is used to determine the best penalty factor ($C$) and kernel function parameter ($\sqrt{2}\sigma$); that is, different combinations of the penalty factor and the kernel function parameter are used for training, the mean square error of each combination is calculated, and the combination with the smallest mean square error is taken as the optimal parameters. Figure 3 shows the flowchart for constructing the classification model and the regression model with the SVM method. Within the input data, the training samples are used for model training, the testing samples are used to determine the optimal model parameters, and the model validation samples are used to check the application effect of the constructed models.

SVM classification model. Fluid identification using SVM is a multiclassification problem, but the SVM method was originally designed for two-class problems. Therefore, it is necessary to extend the SVM method and construct a reasonable multiclassification coding scheme. At present, there are four main methods to construct SVM multiclassifiers: "one against one", "one against rest", "SVM decision tree" and "one-time solution method". When solving practical multiclassification problems, the "one against one" method performs better than the other methods 29,30. Therefore, this method is selected to construct the SVM multiclassifier in this paper. The basic idea is that, if there are k classes of data, a classifier is constructed for each pair of class I and class J data, where I < J, so k(k−1)/2 classifiers need to be trained.
For each pair of class I and class J data, a two-class problem needs to be solved, and the results are combined by voting: if the classifier judges that the sample belongs to class I, the number of votes of class I is increased by 1; otherwise, the number of votes of class J is increased by 1. The final output is the class with the largest number of votes.
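The pairwise voting scheme can be sketched as follows. The binary deciders here are hypothetical nearest-center rules on a single scalar feature and merely stand in for trained two-class SVMs; the class labels 2, 1, −1 and −2 follow the fluid coding used in this paper, while the center values are invented for illustration.

```python
from itertools import combinations


def one_vs_one_predict(x, classes, binary_deciders):
    """Combine pairwise classifiers by voting.

    binary_deciders[(ci, cj)](x) returns True when x is judged to be class ci
    and False when it is judged to be class cj.
    """
    votes = {c: 0 for c in classes}
    for ci, cj in combinations(classes, 2):
        winner = ci if binary_deciders[(ci, cj)](x) else cj
        votes[winner] += 1
    # Ties are broken in favor of the class listed first (a choice of this sketch).
    return max(classes, key=lambda c: votes[c]), votes


classes = [2, 1, -1, -2]
# Hypothetical class centers on one normalized feature (not from the paper).
centers = {2: 0.9, 1: 0.6, -1: 0.1, -2: 0.3}
deciders = {
    (ci, cj): (lambda x, a=centers[ci], b=centers[cj]: abs(x - a) <= abs(x - b))
    for ci, cj in combinations(classes, 2)
}
label, votes = one_vs_one_predict(0.85, classes, deciders)
print(len(deciders), label, votes)
```

With four fluid classes, 4 × 3 / 2 = 6 pairwise classifiers are built, which matches the k(k−1)/2 count.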
To build the SVM classification model for fluid identification, the input logging data or parameters sensitive to the pore fluid must first be determined. Only conventional logging curves are generally available in the study area; nuclear magnetic resonance logs and array acoustic logs are not widely used. Therefore, according to the characteristics of the logging curves, the fluid identification factors sensitive to the fluid type are selected as the input data, including $(\mathrm{PERM}/\varphi)^{1/2}$, $D_R$, $QT$, $R_t$, $\Delta SP$, $R_{wa}$ and $R_{wa\_SP}$, where $(\mathrm{PERM}/\varphi)^{1/2}$ is the comprehensive physical property index, in which PERM is the permeability and $\varphi$ is the porosity of the reservoir. $QT$ is the total hydrocarbon logging value; the greater the value is, the greater the probability of oil and gas. $R_t$ is the resistivity logging value. The specific calculation methods of the other parameters are as follows:

$$\Delta SP = \frac{SP - SP_{shale}}{SP_{sand} - SP_{shale}}$$

where $\Delta SP$ is the relative amplitude of the spontaneous potential (when the salinity difference of the formation water is small, the higher the oil saturation of the reservoir is, the smaller the $\Delta SP$ value); $SP$ is the spontaneous potential logging value; and $SP_{shale}$ and $SP_{sand}$ are the spontaneous potential values of pure mudstone and pure sandstone, respectively.
where $D_R$ is the resistivity difference parameter, and its value is related to the characteristics of mud invasion into the permeable formation. $AT10$, $AT20$, $AT30$, $AT60$ and $AT90$ are the resistivity logs at depths of 10 in, 20 in, 30 in, 60 in and 90 in from the wellbore, respectively. In the target interval studied, the permeability of the reservoir is poor, and the micropores are relatively developed. For fresh water mud, the oil layer is characterized by low invasion, while the water layer is characterized by high invasion. Therefore, the value of $D_R$ is large for the oil layer and small for the water layer 31,32.
$$R_{wa} = \frac{R_t\, \varphi^{m}}{a\, b}$$

where $R_{wa}$ is the apparent formation water resistivity calculated by the Archie formula when the reservoir water saturation is assumed to be 100%, $m$ is the cementation index, and $a$ and $b$ are the lithology coefficients.
where $R_{wa\_SP}$ is the resistivity of the pure water layer calculated from the spontaneous potential logging data, $R_{mf}$ is the resistivity of the mud filtrate, $U_{SSP}$ is the static spontaneous potential value, and $K$ is the diffusion adsorption electromotive force coefficient. In water-saturated layers, $R_{wa}$ is equal to or less than $R_{wa\_SP}$, and as the reservoir oil saturation increases, $R_{wa}$ becomes higher than $R_{wa\_SP}$.
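Two of these fluid identification factors can be illustrated with a short sketch. It assumes the usual relative-amplitude form $\Delta SP = (SP - SP_{shale})/(SP_{sand} - SP_{shale})$ and the Archie relation with water saturation set to 100%, $R_{wa} = R_t \varphi^m/(ab)$. The function names, the default values $m = 2$ and $a = b = 1$, and all input readings except the 12.6 Ω·m resistivity quoted for well M165 are hypothetical; $D_R$ and $R_{wa\_SP}$ are not computed here, and the $R_{wa\_SP}$ value is simply supplied as an input.

```python
def relative_sp(sp, sp_shale, sp_sand):
    """Relative SP amplitude: 0 at the shale baseline, 1 in clean sandstone."""
    return (sp - sp_shale) / (sp_sand - sp_shale)


def apparent_water_resistivity(rt, phi, m=2.0, a=1.0, b=1.0):
    """Archie formula solved for water resistivity with Sw = 100%.

    The defaults m = 2 and a = b = 1 are generic placeholder values,
    not parameters calibrated for the study area.
    """
    return rt * phi ** m / (a * b)


# Hypothetical readings (illustration only); 12.6 ohm-m is the average
# resistivity quoted for the tested interval of well M165.
delta_sp = relative_sp(sp=-40.0, sp_shale=0.0, sp_sand=-80.0)
r_wa = apparent_water_resistivity(rt=12.6, phi=0.10)
r_wa_sp = 0.08  # assumed to come from a separate SP-based calculation
oil_indicated = r_wa > r_wa_sp
print(delta_sp, r_wa, oil_indicated)
```

The final comparison mirrors the rule stated above: an apparent water resistivity above the SP-derived water-layer value points toward increasing oil saturation.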
The output is represented by digital labels for the different fluid types, in which 2 represents the oil layer, 1 represents the oil-water layer, −2 represents the water layer, and −1 represents the dry layer. According to the oil test conclusions of the target interval in the study area, the input logging parameters are matched with the labels representing the different pore fluid types to form the input data set of the model. To ensure that the input data set is effective and representative, 204 samples are selected in the study area, of which 185 form the training sample set and 19 form the test sample set. Table 2 shows the logging parameters and oil test results of these 19 test samples. Figure 4 shows the maps of the mean square error and correlation coefficient obtained by the fourfold cross validation method under different combinations of $C$ and $\sqrt{2}\sigma$. By searching for the penalty factor and kernel function parameter that give the smallest mean square error and the highest correlation coefficient on the 19 test samples, the optimal parameter combination of the classification model is found to be $C = 4096$ and $\sqrt{2}\sigma = 2$.
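The parameter search described above can be sketched in pure Python as a grid search scored by k-fold cross validation. To keep the example self-contained, the model being tuned is a simple RBF-weighted smoother with a shrinkage term 1/C; it is a stand-in and NOT an SVM. The toy data, the power-of-two grids and all function names are hypothetical, chosen only so that the grids contain the reported values 4096 and 2.

```python
import math


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def smoother_predict(train_x, train_y, x, C, width):
    """Stand-in model (NOT an SVM): RBF-weighted average shrunk by 1/C."""
    weights = [math.exp(-((x - z) ** 2) / width ** 2) for z in train_x]
    return sum(w * t for w, t in zip(weights, train_y)) / (sum(weights) + 1.0 / C)


def cv_mse(x_data, y_data, k, C, width):
    """Mean squared error over all held-out samples of a k-fold split."""
    total = 0.0
    for fold in k_fold_indices(len(x_data), k):
        held = set(fold)
        train_x = [v for i, v in enumerate(x_data) if i not in held]
        train_y = [v for i, v in enumerate(y_data) if i not in held]
        for i in fold:
            pred = smoother_predict(train_x, train_y, x_data[i], C, width)
            total += (pred - y_data[i]) ** 2
    return total / len(x_data)


def grid_search(x_data, y_data, k, C_grid, width_grid):
    """Return (best_mse, best_C, best_width) over the full parameter grid."""
    scores = [(cv_mse(x_data, y_data, k, C, w), C, w) for C in C_grid for w in width_grid]
    return min(scores)


# Hypothetical toy data and grids (illustration only).
x_data = [i / 10 for i in range(20)]
y_data = [math.sin(v) for v in x_data]
C_grid = [2.0 ** e for e in range(-2, 13, 2)]   # 0.25 ... 4096
width_grid = [2.0 ** e for e in range(-4, 4)]   # 0.0625 ... 8
best_mse, best_C, best_width = grid_search(x_data, y_data, 4, C_grid, width_grid)
print(best_mse, best_C, best_width)
```

The folds here are contiguous and unshuffled for brevity; shuffling the samples before splitting is usually preferable when the data are ordered by depth or by well.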
SVR regression model. The permeability and water saturation of unconventional reservoirs are strongly affected by the pore structure, and it is difficult to obtain these two parameters from conventional logging curves. Therefore, the support vector regression (SVR) method is used to construct the prediction models of reservoir permeability and water saturation. Building a reservoir parameter prediction model with SVR follows the same basic process as the SVM classification model, which is to first select as input the data set that correlates best with the prediction target. The relationship between permeability, water saturation and the logging curves is very complex. To determine the appropriate input training set, different combinations of logging data were used as the input training data, and the optimal input data set was selected by comparing the errors of the prediction models. The combinations of input logging data are shown in Table 3, including logging curves reflecting the reservoir lithology (ΔSP and ΔGR), reservoir physical properties (DEN, AC, CNL) and reservoir electrical properties (RT), as well as the reservoir porosity calculated by core calibration.

From 16 wells in the study area, approximately 252 reliable and representative closed coring data are selected to analyze the reservoir permeability and water saturation. Fifty samples are randomly selected for back judgment, and the optimal combination of input training data is selected according to the average relative error of the back judgment results. Figure 5 shows the change in the average relative error of the regression permeability model and the regression water saturation model for the different input data sets. Combination 4 has the smallest average relative error (10.7%) in predicting reservoir permeability, which reflects that reservoir permeability is jointly affected by porosity and shale content. Adding porosity data does not improve the accuracy.
Combination 5 has the smallest average relative error (2.1%) in predicting reservoir water saturation. The average relative error of the saturation regression model changes little from combination 2 to combination 5, fluctuating around 2%. This illustrates that the porosity data calculated by conventional methods improve the accuracy only slightly, and it also shows that the reservoir water saturation is mainly related to the electrical and comprehensive physical properties of the reservoir. Therefore, combination 4 is selected as the optimal input training data set for the SVR permeability model, and combination 5 is selected as the optimal input training data set for the SVR water saturation model.

Table 4 compares the fluid identification results of the 19 test samples obtained with the SVM classification model, the cross plot of porosity and resistivity logs, the BP model and the RBF model. The samples whose oil tests produced only oil are the resistivity low-contrast oil pays. The SVM classification model has the highest fluid identification accuracy (89.473%), followed by the RBF model (84.210%) and the BP model (78.947%), and the conventional fluid recognition method has the lowest accuracy (68.421%). This shows that using the SVM classification model to identify resistivity low-contrast oil layers is effective and feasible. Moreover, compared with the commonly used artificial neural network algorithms (BP and RBF), the SVM classification model has advantages in small-sample training, with stronger generalization ability and better stability.

SVR model. Figure 6 shows the log interpretation result of a low resistivity oil production well (M165), in which the testing interval is 2590-2596.5 m and the average resistivity is about 12.6 Ω·m.
The 8th and 9th

In addition, the permeability and water saturation calculated with the SVR model and with the conventional models are compared with the core analysis data of 129 sealed cores from 12 wells (Fig. 7). The results show that the average relative error of the permeability calculated by the multiple logging curve regression model is 0.385, whereas that of the permeability predicted by the SVR model is 0.259. The average relative error of the water saturation calculated by the Archie model is 0.188, whereas that of the water saturation predicted by the SVR model is 0.097. This further verifies that the constructed SVR prediction model is feasible and effective.
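The two evaluation measures used in this section can be sketched briefly. The reported identification accuracies are consistent with 17, 16, 15 and 13 correctly identified samples out of 19; these counts are inferred here from the percentages and are not stated in the text. The predicted and measured values in the relative error example are hypothetical, and the function names are chosen for this sketch.

```python
def accuracy_percent(n_correct, n_total):
    """Classification accuracy in percent."""
    return 100.0 * n_correct / n_total


def mean_relative_error(predicted, measured):
    """Average of |predicted - measured| / |measured| over all samples."""
    ratios = [abs(p - m) / abs(m) for p, m in zip(predicted, measured)]
    return sum(ratios) / len(ratios)


# Correct counts inferred from the reported percentages (an assumption).
inferred_counts = {"SVM": 17, "RBF": 16, "BP": 15, "cross plot": 13}
accuracies = {name: accuracy_percent(n, 19) for name, n in inferred_counts.items()}

# Hypothetical water saturation values (illustration only).
measured_sw = [0.10, 0.20, 0.40]
predicted_sw = [0.11, 0.18, 0.40]
mre = mean_relative_error(predicted_sw, measured_sw)
print(accuracies, mre)
```

Because the relative error divides by the measured value, samples with very small core permeability or saturation dominate the average, which is worth keeping in mind when comparing models on tight reservoirs.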

Application effect analysis
Discussion. Based on the support vector machine learning method, this paper constructs a classification model for the identification of resistivity low-contrast oil reservoirs and an SVR regression model for reservoir parameter prediction. The support vector machine learning method requires relatively few training samples, is not affected by local extrema and has strong generalization ability, which gives it great advantages over the classical neural network method in solving complex practical problems such as nonlinear regression and classification. The application effect analysis also shows that the constructed models are more accurate than the classical neural network prediction methods and the conventional logging interpretation models. However, it should be noted that the optimal input data set must be selected effectively during model construction. Therefore, to improve the application effect of the SVM method in other similar areas, the logging response and reservoir characteristics of the resistivity low-contrast oil pays should be analyzed to build an optimal input data set.

Conclusions
(1) There is no obvious difference in physical and electrical properties between the resistivity low-contrast oil pay and the water layer in the tight sandstone reservoir of the Chang 8 member in the Huanxian area, Ordos Basin. It is difficult to effectively identify and evaluate resistivity low-contrast oil pays by using conventional logging data, which seriously restricts the exploration progress and development benefits of oil resources in this area.
(2) This study analyzed the relationship between the logging response and the pore fluid to optimize the input training data set. The SVM learning method was used to construct the SVM classification model for fluid identification and the SVR regression model for reservoir parameter prediction.
(3) The application results show that the SVM classification model has the highest fluid identification accuracy, and the conventional fluid recognition method (cross plot of porosity and resistivity logs) has the lowest. The reservoir permeability and water saturation predicted by the SVR regression model are more consistent with the core analysis results, which indicates that it is effective and feasible to interpret resistivity low-contrast oil pays with the SVM method.