Introduction

With the increasing volatility of international oil prices and the continuous reduction in oil reservoir scale, strongly concealed resistivity low-contrast oil resources have received much interest in recent years. Research on logging interpretation and evaluation methods for resistivity low-contrast oil pays has become the most practical choice for supplementing conventional oil resources and reducing oilfield exploration costs1,2,27. A resistivity low-contrast oil pay differs little from a water layer in its porosity and resistivity logging responses, and its oil saturation is relatively low3,4. At present, low-porosity and low-permeability reservoirs, represented by tight sandstone, have become the main battlefield for securing the supply of oil and gas resources5. However, the complex pore structure and strong heterogeneity of tight sandstone reservoirs reduce the sensitivity of the logging response to pore fluid. As a result, more resistivity low-contrast oil pays are developed, and such reservoirs are more difficult to interpret and identify with conventional logging interpretation methods6,7,25.

In recent years, data mining technology has been increasingly applied in oil exploration and development, especially for unconventional reservoirs with unclear logging response characteristics, and using it to solve complex problems in actual oilfield production is of great significance8,9,10. Some classical algorithms, such as the neural network method, support vector machine and fuzzy clustering method, provide new techniques for identifying resistivity low-contrast oil pays11,12. Guo et al.13 predicted the water saturation at the lower limit of three water models by using the generalized regression neural network (GRNN) and particle swarm optimization support vector machine (PSO-SVM), and the results agreed well with the core analysis results in the Sulige tight sandstone reservoir. Chen and Peng14 used a BP neural network to learn the mathematical characteristics of the logging curves of low-resistivity oil reservoirs, which improved the accuracy of fluid identification and reservoir parameter prediction. Singh et al.15 used stepwise linear regression and the multilayer feed-forward neural network (MLFN) method to predict the 2D distribution of P-wave velocity, resistivity, porosity, and gas hydrate saturation. Miah et al.16 used the multilayer perceptron artificial neural network (MLP-ANN) and kernel function-based least-squares support vector machine (LS-SVM) techniques to develop predictive models for water saturation, and the prediction performance was better than that of other models. Baouche and Nabawy17 applied the fuzzy logic technique to divide the Southern Hassi R'Mel Gas Field into several hydraulic flow units with different reservoir properties and then predicted the permeability of each flow unit.
With the deepening of research, many machine learning algorithms based on theoretical mathematics have been proposed, and each has its own advantages and disadvantages. However, the key to applying this kind of method to the log interpretation of actual formations is to select appropriate training data as input18,19. In this study, the support vector machine (SVM) learning method, which is based on the VC dimension theory of statistical learning and the structural risk minimization (SRM) principle, was used to establish the interpretation model. By analyzing the relationship between logging response and pore fluid, the training data were optimized, and an SVM classification model for fluid identification and support vector regression (SVR) models for reservoir parameter prediction were established. The application results show that the log interpretation models established by the SVM method are more effective than the conventional methods, which proves that it is feasible to identify and evaluate resistivity low-contrast oil pays based on the SVM method.

Geological and logging response characteristics of research area

The Ordos Basin is the second largest sedimentary basin in China, bearing more than half of China's energy output20,21. The Huanxian area is located in the southwestern Ordos Basin, and its regional geological structure crosses the Tianhuan Depression and the Yishan Slope from west to east (Fig. 1). The Chang 8 member of the Yanchang Formation developed in the Huanxian area is a typical tight sandstone reservoir with large sedimentary thickness. The oil in the Chang 8 tight sandstone reservoir mainly comes from the overlying Chang 7 high-quality source rock, which gives it great exploration and development potential22,23. However, with the deepening of oil and gas exploration and development in this area, the problem of identifying and evaluating resistivity low-contrast oil reservoirs has become increasingly prominent7,24,25.

Figure 1

Geographical location of the research area.

According to previous studies, the genesis of resistivity low-contrast oil reservoirs is very complex and is usually controlled by many factors26,27. Figure 2 shows the relationship between the resistivity and density logging responses of oil layers and water layers, established from the oil test data in the study area, and Table 1 lists the logging response characteristics of the different fluids. The density (DEN) and reservoir resistivity (RT) values of the resistivity low-contrast oil reservoir are lower than those of the high-resistivity oil reservoir. The relative shale content (\(\Delta\)GR) differs little between the resistivity low-contrast oil reservoir and the high-resistivity oil reservoir, indicating that the shale content of the reservoir has little effect on resistivity. In addition, the relative amplitude of spontaneous potential (\(\Delta\)SP) in the resistivity low-contrast oil reservoir is higher than that in the high-resistivity oil reservoir, which suggests that differences in formation water salinity are likely an important reason for the change in electrical properties. Besides, the complex pore structure and high irreducible water saturation of the tight sandstone reservoir make it difficult for conventional logging to identify and evaluate resistivity low-contrast oil reservoirs, which seriously restricts the exploration and development of oil resources in this area. Therefore, it is important to develop more effective methods that provide new logging technical support for the exploration and development of resistivity low-contrast oil layers.

Figure 2

The cross plot of reservoir resistivity and density.

Table 1 Logging response characteristics of different fluids.

Method and theory

Unlike the neural network method, which requires the number of hidden-layer neurons to be determined, the basic idea of a support vector machine for reservoir parameter prediction is to map the input space to a high-dimensional space by introducing a kernel function and then to solve for a linearly separating hyperplane or function in this high-dimensional space that separates all data classes of the original space. The greater the separation distance is, the better the classification effect. In this way, nonlinear discrimination of the data in the original space is realized28.

Take \(\text{T} = \left\{ ({\mathbf{x}}_{{\mathbf{i}}}, \text{y}_{\text{i}}) \mid \text{i} = 1,2,\ldots,\text{n} \right\}\) with \({\mathbf{x}}_{{\mathbf{i}}} \in R^{P}\) as the input data, where \({\mathbf{x}}_{{\mathbf{i}}}\) is the logging data related to the predicted parameters and \(\text{y}_{\text{i}}\) is the core analysis data, that is, the target value.

Suppose that in the high-dimensional space, the hyperplane or linear function that can separate the two types of samples satisfies:

$$g(\text{x}_{i} ) = \left\langle {{\mathbf{w}}_{ij}\cdot {\mathbf{x}}_{i} } \right\rangle + \text{b}_{ij}$$
(1)

where \({\mathbf{w}}_{ij}\) is the weight vector representing high-dimensional unknown coefficients and \(\text{b}_{ij}\) is a constant term. To use function (1) to separate all input data samples without error, the constraint \(\text{y}_{\text{k}} (\left\langle {\mathbf{w}}\cdot{\mathbf{x}}_{\text{k}} \right\rangle + \text{b}) - 1 \ge 0\) should be satisfied for every sample. Because the classification interval equals \(2/\left\| {\mathbf{w}} \right\|\), it is maximized when \(\phi ({\mathbf{w}}) = \frac{1}{2}{\mathbf{w}}^{\text{T}} {\mathbf{w}}\) is minimized. In this way, the problem of solving the optimal hyperplane in high-dimensional space is transformed into the minimization of the following convex programming function:

$$\phi ({\mathbf{w}},\xi ) = \frac{1}{2}{\mathbf{w}}^{\text{T}} {\mathbf{w}} + \text{C}\sum\limits_{\text{k} = 1}^{\text{n}} \xi_{\text{k}}$$
(2)

subject to the following constraint:

$$\text{y}_{\text{k}} (\left\langle {\mathbf{w}}\cdot{\mathbf{x}}_{\text{k}} \right\rangle + \text{b}) \ge 1 - \xi_{\text{k}},\,\, \text{k} = 1,\ldots,\text{n}$$
(3)

where \(\xi_{k}\) is a nonnegative slack variable introduced when the sample data are linearly inseparable, and \(\text{C}\) is a penalty parameter: the greater its value, the heavier the penalty for misclassification. The first term in the objective function (2) enlarges the classification interval, which controls the generalization ability of the model. The second term is the training error, which reduces the empirical risk.
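As a minimal sketch of how the two terms of Eq. (2) trade off, the following Python snippet evaluates the objective for a given \({\mathbf{w}}\), \(\text{b}\) and \(\text{C}\), taking each slack as the smallest nonnegative value allowed by Eq. (3). The function name and the toy samples are illustrative assumptions and do not come from the study data.

```python
def soft_margin_objective(w, b, samples, labels, C):
    """Evaluate Eq. (2), using the smallest slacks that satisfy Eq. (3)."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Smallest nonnegative slack with y_k * (<w, x_k> + b) >= 1 - xi_k.
    slacks = [max(0.0, 1.0 - y * (dot(w, x) + b))
              for x, y in zip(samples, labels)]
    margin_term = 0.5 * dot(w, w)      # first term: controls the interval
    penalty_term = C * sum(slacks)     # second term: training error
    return margin_term + penalty_term, slacks


# Toy 2-D samples (made up for illustration only).
samples = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0)]
labels = [1, 1, -1]
objective, slacks = soft_margin_objective((1.0, 0.0), 0.0, samples, labels, C=10.0)
print(objective, slacks)
```

Only the second sample lies inside the margin, so it is the only one that contributes a nonzero slack; a larger \(\text{C}\) makes that violation more expensive.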

To map the training data set to the high-dimensional space, a kernel function needs to be introduced, and the convex programming problem of Eq. (2) is transformed into its dual quadratic programming problem. The expected weight vector can be written as \({\mathbf{w}} = \sum\limits_{i = 1}^{n} (\alpha_{i} - \alpha_{i}^{*}) {\mathbf{x}}_{{\mathbf{i}}}\), and finally, the analytical expression of the support vector machine regression function is as follows:

$$f(\text{x}) = \sum\limits_{i = 1}^{n} (\alpha_{i} - \alpha_{i}^{*}) K(\text{x},{\mathbf{x}}_{{\mathbf{i}}}) + \text{b}$$
(4)

where \(\alpha_{i}\) and \(\alpha_{i}^{*}\) are the nonnegative Lagrange multipliers and \(K(\text{x},{\mathbf{x}}_{{\mathbf{i}}} )\) is a kernel function satisfying the Mercer condition. The commonly used kernel functions mainly include the polynomial kernel function, Gaussian kernel function, radial basis function kernel function and sigmoid kernel function.
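To make Eq. (4) concrete, the sketch below evaluates the regression function for a handful of support vectors with a Gaussian kernel. The multipliers, the offset and the kernel width are made-up values chosen for illustration; in practice they are produced by the training procedure.

```python
import math


def gaussian_kernel(x, xi, sigma):
    """K(x, x_i) = exp(-||x - x_i||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, alpha, alpha_star, b, sigma):
    """Eq. (4): f(x) = sum_i (alpha_i - alpha_i^*) * K(x, x_i) + b."""
    total = 0.0
    for sv, a, a_star in zip(support_vectors, alpha, alpha_star):
        total += (a - a_star) * gaussian_kernel(x, sv, sigma)
    return total + b


# Hypothetical trained quantities (not taken from the paper).
support_vectors = [(0.0, 0.0), (1.0, 0.0)]
alpha = [0.8, 0.0]
alpha_star = [0.0, 0.3]
prediction = svr_predict((0.0, 0.0), support_vectors, alpha, alpha_star,
                         b=0.1, sigma=1.0)
print(prediction)
```

Only samples with a nonzero difference \(\alpha_{i} - \alpha_{i}^{*}\) contribute to the sum, which is why the prediction depends on the support vectors alone.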

The input sample data have different physical meanings, dimensions and orders of magnitude, so the original data must be normalized before learning and training. The normalization method selected in this paper is the mapminmax function, and its normalization formula is:

$$\widehat{x} = 2*(\text{x} - \text{x}_{\min } )/(\text{x}_{\max } - \text{x}_{\min } ) - 1$$
(5)

where \(\widehat{x}\) is the normalized data, \(\text{x}\) is the input data, \(\text{x}_{\max }\) and \(\text{x}_{\min }\) are the maximum and minimum values of the input data, and the range of normalized data is between −1 and 1.
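Eq. (5) can be sketched in a few lines of Python. This is a plain re-implementation of the formula for a single logging curve, not the MATLAB mapminmax routine itself, and the resistivity values are invented for illustration. A constant curve is rejected here because Eq. (5) would divide by zero.

```python
def map_min_max(values):
    """Scale a list of values linearly to the range [-1, 1] following Eq. (5)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Eq. (5) is undefined for a constant curve (zero denominator).
        raise ValueError("cannot normalize a constant curve")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Hypothetical resistivity readings in ohm-m (illustration only).
rt = [8.0, 12.6, 20.0, 32.0]
normalized = map_min_max(rt)
print(normalized)
```

Each input curve is normalized separately, and the minimum and maximum found on the training data should be reused when new samples are fed to the trained model.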

The libsvm toolbox in MATLAB is used for SVM model learning and training, and the radial basis function is selected as the kernel function, that is, \(\text{K}(\text{x}_{\text{i}}, \text{x}_{\text{j}}) = \exp \left( - \frac{\left\| \text{x}_{\text{i}} - \text{x}_{\text{j}} \right\|^{2}}{2\sigma^{2}} \right)\). Grid search combined with k-fold cross validation is used to determine the best penalty factor (\(\text{C}\)) and kernel function parameter (\(\sqrt 2 \sigma\)): the mean square error obtained through training is calculated for different combinations of the penalty factor and kernel function parameter, and the combination with the smallest mean square error is taken as the optimal parameters.
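The search procedure described above can be outlined as follows. This is a sketch under stated assumptions rather than the workflow actually used: `fold_mse` is a hypothetical callback that would train a model on the training indices with a given parameter pair and return the mean square error on the held-out fold, and here it is replaced by a made-up error surface so that the snippet runs on its own. The kernel parameter is written as gamma because libsvm parameterizes the radial basis kernel as \(\exp(-\gamma \left\| u - v \right\|^{2})\), which corresponds to \(\gamma = 1/(2\sigma^{2})\) in the notation above.

```python
import itertools
import math


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(c_values, gamma_values, n_samples, k, fold_mse):
    """Return (mean MSE, C, gamma) of the best parameter combination."""
    folds = k_fold_indices(n_samples, k)
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n_samples) if i not in held]
            scores.append(fold_mse(c, gamma, train, held_out))
        mean_mse = sum(scores) / k
        if best is None or mean_mse < best[0]:
            best = (mean_mse, c, gamma)
    return best


def toy_fold_mse(c, gamma, train_idx, test_idx):
    # Made-up error surface with its minimum at C = 4, gamma = 0.5;
    # a real callback would train and score an SVM on the given indices.
    return (math.log2(c) - 2.0) ** 2 + (math.log2(gamma) + 1.0) ** 2


powers_of_two = [2.0 ** e for e in range(-2, 5)]
best = grid_search(powers_of_two, powers_of_two, n_samples=185, k=4,
                   fold_mse=toy_fold_mse)
print(best)
```

Searching on a grid of powers of two is a common convention; in practice the samples would also be shuffled before the folds are formed.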

Figure 3 shows the flowchart for constructing the classification model and regression model by using the SVM method. Within the input data, the training samples are used for model training, the testing samples are used to determine the optimal model parameters, and the model validation samples are used to check the application effect of the constructed models.

Figure 3

Flowchart of constructing the classification model and regression model by using the SVM method.

SVM classification model

Fluid identification using SVM is a multiclassification problem, but the SVM method was originally designed for two-class problems. Therefore, it is necessary to extend the SVM method and construct a reasonable multiclassification coding scheme. At present, there are four main methods to construct SVM multiclassifiers: "one against one", "one against rest", "SVM decision tree" and "one-time solution method". When solving practical multiclassification problems, the "one against one" method performs better than the other methods29,30. Therefore, this method is selected to construct the SVM multiclassifier in this paper. Its basic idea is that, if there are k classes of data, one classifier is constructed for every pair of classes I and J with I < J, so k(k−1)/2 classifiers need to be trained. Each classifier solves a two-class problem for classes I and J, and a voting method combines the results: if the classifier judges that a sample belongs to class I, the number of votes of class I is increased by 1; otherwise, the number of votes of class J is increased by 1. The final output is the class with the largest number of votes.
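The voting scheme can be sketched as follows. The pairwise decision function is a hypothetical stand-in for a trained two-class SVM, and the four integer labels are placeholders for four fluid classes; ties are broken arbitrarily in favor of the class that received its first vote earliest, which is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(classes, pairwise_decide, x):
    """Run every pairwise classifier on x and return the majority-vote class."""
    votes = Counter()
    for class_i, class_j in combinations(classes, 2):
        # Each pairwise classifier casts one vote for class_i or class_j.
        votes[pairwise_decide(class_i, class_j, x)] += 1
    winner = votes.most_common(1)[0][0]
    return winner, votes


def toy_decide(class_i, class_j, x):
    # Hypothetical stand-in for a trained binary SVM: choose the class
    # whose label value is closer to the scalar input x.
    return class_i if abs(class_i - x) <= abs(class_j - x) else class_j


classes = [2, 1, -1, -2]   # placeholder labels for four fluid classes
label, votes = one_against_one_predict(classes, toy_decide, 1.8)
print(label, dict(votes))
```

With k = 4 classes the loop runs k(k−1)/2 = 6 pairwise classifiers, so the votes always sum to 6.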

To build the SVM classification model for fluid identification, we must first determine the input logging data or parameters sensitive to the pore fluid. The study area is covered mainly by conventional logging curves, whereas nuclear magnetic resonance logs and array acoustic logs are not widely available. Therefore, according to the characteristics of the logging curves, the fluid identification factors sensitive to fluid type are selected as the input data, including \({(}{\text{PERM}}{/}\phi {)}^{{{1/2}}}\), \(\text{D}_{{\text{R}}}\), \(\text{QT}\), \(\text{Rt}\), \({\Delta \text{SP}}\), \(\text{R}_{{\text{wa}}}\) and \(\text{R}_{{{\text{wa}\_\text{SP}}}}\), where \({(}{\text{PERM}}{/}\phi {)}^{{{1/2}}}\) is the comprehensive physical property index, in which \({\text{PERM}}\) represents the permeability and \(\phi\) is the porosity of the reservoir. \(\text{QT}\) is the total hydrocarbon logging value; the greater the value, the greater the probability of oil and gas. \(\text{Rt}\) is the resistivity logging value. The other parameters are calculated as follows:

$$\Delta \text{SP} = \frac{\text{SP}_{\text{Shale}} - \text{SP}}{\text{SP}_{\text{Shale}} - \text{SP}_{\text{sand}}}$$
(6)

where \({\Delta {\text{SP}}}\) is the relative amplitude of the spontaneous potential. When the salinity difference of formation water is small, the higher the oil saturation of the reservoir is, the smaller the \({\Delta \text{SP}}\) value; \(\text{SP}\) is the spontaneous potential logging value; and \(\text{SP}_{{\text{Shale}}}\) and \(\text{SP}_{{\text{sand}}}\) are the spontaneous potential values of pure mudstone and pure sandstone, respectively.
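A short worked example of Eq. (6) is given below. The baseline readings are invented for illustration, and the function name is not from the original work.

```python
def relative_sp(sp, sp_shale, sp_sand):
    """Eq. (6): relative amplitude of the spontaneous potential."""
    return (sp_shale - sp) / (sp_shale - sp_sand)


# Hypothetical readings in mV: shale baseline 80, clean-sand line 40.
delta_sp = relative_sp(sp=50.0, sp_shale=80.0, sp_sand=40.0)
print(delta_sp)  # 0.75
```

By construction the parameter is 0 on the shale baseline and 1 on the pure sandstone line.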

$$\text{D}_{\text{R}} { = }\frac{{\text{AT90}}}{{\text{AT10}}} \times \frac{{\text{AT90}}}{{\text{AT20}}} \times \frac{{\text{AT90}}}{{\text{AT30}}} \times \frac{{\text{AT90}}}{{\text{AT60}}}$$
(7)

where \(\text{D}_{\text{R}}\) is the resistivity difference parameter, and its value is related to the characteristics of mud invasion into the permeable formation. \(\text{AT10}\), \(\text{AT20}\), \(\text{AT30}\), \(\text{AT60}\) and \(\text{AT90}\) are the resistivity logs with investigation depths of 10 in, 20 in, 30 in, 60 in and 90 in from the wellbore, respectively. In the target interval studied, the permeability of the reservoir is poor, and micropores are relatively developed. With fresh water mud, the oil layer is characterized by low invasion, while the water layer is characterized by high invasion. Therefore, the value of \(\text{D}_{\text{R}}\) is large for the oil layer and small for the water layer31,32.
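Eq. (7) can be illustrated with two invented resistivity profiles, one increasing away from the wellbore and one decreasing. The numbers are assumptions chosen only to show how the parameter separates the two patterns.

```python
def resistivity_difference(at10, at20, at30, at60, at90):
    """Eq. (7): product of the deep-to-shallower resistivity ratios."""
    return (at90 / at10) * (at90 / at20) * (at90 / at30) * (at90 / at60)


# Hypothetical profiles in ohm-m (illustration only).
d_r_increasing = resistivity_difference(8.0, 9.0, 10.0, 11.0, 12.0)
d_r_decreasing = resistivity_difference(16.0, 15.0, 14.0, 13.0, 12.0)
print(d_r_increasing, d_r_decreasing)
```

When the deepest reading exceeds the shallower ones, every ratio is greater than 1 and the product amplifies the separation; the opposite profile gives a value below 1.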

$$\text{R}_{{\text{wa}}} { = }\frac{{\text{R}_{\text{t}} *\phi^{\text{m}} }}{{\text{ab}}}{ }$$
(8)

where \(\text{R}_{{\text{wa}}}\) is the apparent formation water resistivity calculated by the Archie formula when the reservoir water saturation is assumed to be 100%, \(\text{m}\) is the cementation index, and \(\text{a}\) and \(\text{b}\) are the lithology coefficients.

$$\text{R}_{\text{wa}\_\text{SP}} = \frac{\text{R}_{\text{mf}}}{10^{\text{U}_{\text{SSP}}/\text{K}}}$$
(9)

where \(\text{R}{}_{{{\text{wa}\_\text{SP}}}}\) is the resistivity of the pure water layer calculated by spontaneous potential logging data. \(\text{R}_{{\text{mf}}}\) is the resistivity of the mud filtrate, and \(\text{U}_{{\text{SSP}}}\) is the static spontaneous potential value. \(\text{K}\) is the diffusion adsorption electromotive force coefficient. In water-saturated layers, \(\text{R}_{{\text{wa}}}\) is equal to or less than \(\text{R}{}_{{\text{wa}\_\text{SP}}}\), and with the increase in reservoir oil saturation, \(\text{R}_{{\text{wa}}}\) is higher than \(\text{R}{}_{{{\text{wa}\_\text{SP}}}}\).
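The comparison between Eqs. (8) and (9) can be sketched as below. All input values (resistivity, porosity, mud filtrate resistivity, static spontaneous potential, the coefficient K and the Archie parameters) are assumptions chosen for illustration, and the sign convention of the static spontaneous potential follows Eq. (9) as written.

```python
def apparent_rw(rt, porosity, m, a, b):
    """Eq. (8): apparent water resistivity assuming 100% water saturation."""
    return rt * porosity ** m / (a * b)


def rw_from_sp(rmf, u_ssp, k):
    """Eq. (9): water resistivity derived from the spontaneous potential."""
    return rmf / 10.0 ** (u_ssp / k)


# Hypothetical inputs (illustration only).
rwa = apparent_rw(rt=12.6, porosity=0.10, m=1.99, a=1.0, b=1.13)
rwa_sp = rw_from_sp(rmf=0.5, u_ssp=70.0, k=70.0)

# R_wa above R_wa_SP points to hydrocarbons according to the text above.
oil_indicated = rwa > rwa_sp
print(rwa, rwa_sp, oil_indicated)
```

In this invented case the apparent water resistivity is roughly twice the value derived from the spontaneous potential, which is the pattern the text associates with increasing oil saturation.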

The output is represented by numeric labels for the different fluid types, in which 2 represents the oil layer, 1 represents the oil–water layer, −2 represents the water layer, and −1 represents the dry layer. According to the oil test conclusions for the target interval in the study area, the input logging parameters are matched with the labels representing the different pore fluid types to form the input data set of the model. To ensure the effectiveness and representativeness of the input data, 204 samples are selected in the study area, of which 185 form the training set and 19 form the test set. Table 2 shows the logging parameters and oil test results of these 19 test samples.

Table 2 The logging parameters and oil test results of these 19 test sample sets.

Figure 4 shows the plan maps of the mean square error and correlation coefficient trained by the fourfold cross validation method under different \(\text{C}\) and \(\sqrt 2 \sigma\) parameter combinations. By looking for the penalty factor and kernel function parameters with the smallest mean square error and the highest correlation coefficient of 19 test sample sets, the optimal penalty factor and kernel function parameter combination of the classification model is C = 4096 and \(\sqrt 2 \sigma\) = 2.

Figure 4

(a) The mean square errors of testing sample sets with different combinations of penalty factors and kernel function parameters, (b) the correlation coefficient of testing sample sets with different combinations of penalty factors and kernel function parameters.

SVR regression model

The permeability and water saturation of unconventional reservoirs are seriously affected by pore structure, and it is difficult to obtain these two parameters from conventional logging curves. Therefore, the support vector regression (SVR) method is used to construct the prediction models of reservoir permeability and water saturation. The procedure for building a reservoir parameter prediction model with SVR is the same as the basic process of the SVM classification model, which is to first select the optimal data set with high correlation to the prediction target as the input. The relationship between permeability, water saturation and the logging curves is very complex. To determine the appropriate input training set, different combinations of logging data were used as the input training data, and the optimal input data set was selected by comparing the errors of the prediction models. The combinations of input logging data are shown in Table 3, including logging curves reflecting the reservoir lithology (ΔSP and ΔGR), reservoir physical properties (DEN, AC, CNL) and reservoir electrical properties (RT), as well as the reservoir porosity calculated by the core-calibrated logging curve method (POR). The optimal values of the SVR model parameters (\(\text{C}\) and \(\sqrt 2 \sigma\)) are again obtained by the fourfold cross validation method.

Table 3 The combination of different input logging data sets.

From 16 wells in the study area, approximately 252 reliable and representative sealed coring data are selected to analyze the reservoir permeability and water saturation. Fifty samples are randomly selected for back judgment, and the optimal input training sample set combination is selected according to the average relative error of the back judgment results. Figure 5 shows the change in the average relative error of the regression permeability model and regression water saturation model when using different input data sets. Combination 4 has the smallest average relative error (10.7%) in predicting reservoir permeability, which reflects that reservoir permeability is jointly affected by porosity and shale content. Adding porosity data does not improve the accuracy. Combination 5 has the smallest average relative error (2.1%) in predicting reservoir water saturation. The average relative error of the saturation regression model changes little from combination 2 to combination 5, fluctuating around 2%, which shows that the porosity data calculated by conventional methods improve the accuracy only slightly and that the reservoir water saturation is mainly related to the electrical and comprehensive physical properties of the reservoir. Therefore, combination 4 is finally selected as the optimal input training data set for the SVR regression permeability model, and combination 5 is selected for the SVR regression water saturation model.
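The selection criterion can be sketched as follows, assuming the usual definition of the average relative error as the mean of |predicted − measured| / |measured|; the text does not spell out the formula, so this definition and the sample values are assumptions.

```python
def average_relative_error(predicted, measured):
    """Mean of |predicted - measured| / |measured| over all samples."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return total / len(measured)


# Hypothetical core permeabilities and model predictions in mD.
measured = [0.5, 1.0, 2.0]
predicted = [0.45, 1.1, 2.0]
are = average_relative_error(predicted, measured)
print(round(are, 4))
```

The input combination whose back judgment samples give the smallest value of this measure would then be kept.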

Figure 5

Characteristics of the average relative error by using different input data set combinations.

Application effect analysis

SVM classification model

To evaluate the reliability of the SVM classification model for fluid recognition, the conventional fluid recognition method (cross plot of porosity and resistivity log), the back propagation neural network (BP) method and the radial basis function neural network (RBF) method were introduced for comparison. The input parameters of the BP and RBF neural network prediction models are the same as those of the SVM classification model. The optimal BP model has two layers with 12 and 14 neurons, respectively, and its training function is the gradient descent function with an adaptive learning rate (traingdx function). The Gaussian function is selected as the basis function of the RBF model, and the optimal Gaussian width of the trained model is 0.1. Table 4 compares the fluid identification results for the 19 test samples obtained with the SVM classification model, the cross plot of porosity and resistivity log, the BP model and the RBF model. The samples whose oil test produced only oil are resistivity low-contrast oil pays. The SVM classification model has the highest fluid identification accuracy (89.473%), followed by the RBF model (84.210%) and the BP model (78.947%), and the conventional fluid recognition method has the lowest fluid identification accuracy (68.421%). This shows that using the SVM classification model to identify resistivity low-contrast oil layers is effective and feasible. Moreover, compared with the commonly used artificial neural network algorithms (BP and RBF), the SVM classification model has certain advantages in small-sample training, with stronger generalization ability and better stability.
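The reported accuracies are consistent with simple hit counts on the 19 test samples. The sketch below computes the accuracy from a count of correct identifications; the counts 17, 16, 15 and 13 are inferred here from the reported percentages and are not stated in the text.

```python
def accuracy_percent(n_correct, n_total):
    """Identification accuracy as a percentage of the test samples."""
    return 100.0 * n_correct / n_total


# Correct counts inferred from the reported percentages (an assumption).
inferred_counts = {"SVM": 17, "RBF": 16, "BP": 15, "cross plot": 13}
accuracies = {name: accuracy_percent(n, 19) for name, n in inferred_counts.items()}
for name, value in accuracies.items():
    print(f"{name}: {value:.3f}%")
```

Under this reading, the gap between the SVM model and the RBF model on this test set is a single sample, so the ranking should be interpreted with the small test size in mind.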

Table 4 Comparison of fluid identification results by different methods.

SVR model

Figure 6 shows the log interpretation result of a low-resistivity oil production well (M165), in which the testing interval is 2590–2596.5 m and the average resistivity is about 12.6 Ω∙m. The 8th and 9th tracks in Fig. 6 are the calculation results of reservoir permeability and water saturation, respectively. The blue solid line in the 8th track is the permeability calculated by multiple regression on the acoustic and density logs, and the yellow solid line is the permeability curve predicted by the SVR model. The blue solid line in the 9th track is the saturation calculated by the Archie saturation model, whose parameters a = 1.0, b = 1.13, m = 1.99 and n = 1.85 were obtained from rock-electrical experiments on 16 cores. The yellow solid line in the 9th track is the water saturation curve predicted by the SVR model. The reservoir parameters calculated by the SVR model are more consistent with the core analysis results.
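For reference, the conventional baseline can be sketched with the Archie parameters quoted above, assuming the standard form \(S_{w} = \left( \text{a}\,\text{b}\,R_{w} / (\phi^{\text{m}} R_{t}) \right)^{1/\text{n}}\), which reduces to Eq. (8) when the water saturation is 100%. The porosity and formation water resistivity used here are invented values, not data from Well M165.

```python
def archie_water_saturation(rt, porosity, rw, a=1.0, b=1.13, m=1.99, n=1.85):
    """Archie water saturation: Sw = (a * b * Rw / (phi^m * Rt)) ** (1 / n)."""
    return (a * b * rw / (porosity ** m * rt)) ** (1.0 / n)


# Rt is the average resistivity quoted for the tested interval;
# porosity and Rw are hypothetical values chosen for illustration.
sw = archie_water_saturation(rt=12.6, porosity=0.10, rw=0.05)
print(sw)
```

Because the result depends directly on the assumed formation water resistivity, any uncertainty in formation water salinity propagates into the computed saturation.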

Figure 6

Comparison of reservoir permeability and saturation calculated by the SVR regression model and conventional method (Well M165).

In addition, the permeability and water saturation calculated by the SVR model and the conventional models are compared with the core analysis data of 129 sealed cores from 12 wells (Fig. 7). The results show that the average relative error of the permeability calculated by the multiple logging curves regression model is 0.385, whereas that of the permeability predicted by the SVR model is 0.259. The average relative error of the water saturation calculated by the Archie model is 0.188, whereas that of the saturation predicted by the SVR model is 0.097. This further verifies that the constructed SVR prediction model is feasible and effective.

Figure 7

The comparison results of reservoir permeability (a) and water saturation (b) calculated by the SVR regression model and conventional method, respectively.

Discussion

Based on the support vector machine learning method, this paper constructs a classification model for resistivity low-contrast oil reservoir identification and SVR regression models for reservoir parameter prediction. The support vector machine learning method requires relatively few training samples, is not affected by local extrema and has strong generalization ability, which gives it great advantages over the classical neural network method in solving complex practical problems such as nonlinear regression and classification. The application effect analysis also shows that the constructed models have higher accuracy than the classical neural network prediction methods and conventional logging interpretation models. However, it should be noted that the optimal input data set must be carefully selected during model construction. Therefore, to improve the application effect of the SVM method in other similar areas, the logging response and reservoir characteristics of resistivity low-contrast oil pays should be analyzed to build an optimal input data set.

Conclusions

(1) There is no obvious difference in physical and electrical properties between the resistivity low-contrast oil pay and the water layer in the tight sandstone reservoir of the Chang 8 member in the Huanxian area, Ordos Basin. It is difficult to effectively identify and evaluate resistivity low-contrast oil pays by using conventional logging data, which seriously restricts the exploration progress and development benefits of oil resources in this area.

(2) This study analyzed the relationship between the logging response and pore fluid to optimize the input training dataset. The SVM learning method was used to construct the SVM classification model and SVR regression model for fluid identification and reservoir parameter prediction.

(3) The application results show that the SVM classification model has the highest fluid identification accuracy, and the conventional fluid recognition method (cross plot of porosity and resistivity log) has the lowest. The reservoir permeability and water saturation predicted by the SVR regression model are more consistent with the core analysis results, which proves that it is effective and feasible to interpret resistivity low-contrast oil pays based on the SVM method.