Application of mathematical statistics to shale gas-bearing property evaluation and main controlling factor analysis

Gas-bearing property evaluation and main controlling factor analysis have remained a concern in shale gas research. The application of principal component analysis, an important mathematical statistical method, in gas-bearing property evaluation and main controlling factor analysis of the Longmaxi shale in the Weirong area, Sichuan Basin, was examined. The Longmaxi shale exhibits high heterogeneity, manifested in the organic matter abundance, mineral composition, and pore structure. Seven geological factors, including the temperature, pressure, TOC content, clay content, brittle mineral content, pore volume, and specific surface area (SSA), were selected in principal component analysis. Four principal components with geological significance, such as mineral composition, formation condition, pore structure, and organic matter abundance, were extracted through principal component analysis, and further constituted a comprehensive factor. Shale gas-bearing properties were evaluated according to the score of the comprehensive factor. The Longmaxi shale could be identified as exhibiting good, medium, and poor gas-bearing properties based on the comprehensive factor scores of these samples. According to each geological factor’s contribution to the comprehensive factor, combined with geological analysis, it could be considered that gas-bearing properties are primarily controlled by pore volume, SSA, and clay content, followed by TOC content, brittle mineral content, temperature and pressure.


Geological setting
The Weirong block is located in the southern Sichuan Basin (Fig. 1). The basement of the Sichuan Basin comprises Middle and Upper Proterozoic strata, on which Sinian to Jurassic strata were successively deposited except for the lack of Devonian and Carboniferous strata. The Longmaxi Formation contains a set of marine shales deposited during the early Silurian 26 . The Longmaxi Formation can be divided into two members based on the lithology and biological characteristics 27 . The first member of the Longmaxi Formation (S 1 l 1 ) comprises gray black and black carbonaceous shale, with abundant graptolite and radiolarian fossils. The lithology of the second member (S 1 l 2 ) primarily indicates dark gray and grayish-green shale and silty shale, and biological fossils are underdeveloped. Vertically, the lithological change in the Longmaxi Formation indicates a trend whereby the sand content gradually increases and the clastic grain size increases from the bottom up, reflecting the change in the sedimentary environment from deep to shallow-water shelf environments. The Longmaxi Formation was uplifted and denuded in a large area during the late Silurian. This formation is mainly distributed in the east, www.nature.com/scientificreports/ south, and southwest of the Sichuan Basin, with a residual area of approximately 14.4 × 10 4 km 2 . The burial depth of the Longmaxi Formation in the Weirong area ranges from approximately 3000-3500 m. The thickness of the organic-rich shale of S 1 l 1 ranges from 80 to 90 m, which is the focus of shale gas exploration. Exploration and drilling results suggest that the Longmaxi Formation in the Weirong block is generally subject to a high pressure, and the pore fluid pressure coefficient varies between ~ 1.94 and 2.05 28 , which provides suitable conditions for shale gas preservation but also causes great difficulties in fracturing development of shale gas.

Materials and methods
Source rock and reservoir parameters. The Longmaxi shale samples from well WY1, WY11, and WY23 were analyzed for experimental data in the Experimental Research Center of Wuxi Research Institute of Petroleum Geology, SINOPEC. TOC measurement, X-ray diffraction (XRD) analysis, porosity test, low-pressure nitrogen adsorption was carried out on sixty-one samples, respectively. The TOC content, mineral composition, porosity, pore structure parameters of these shale samples were obtained. Twelve samples were prepared for kerogen microscopic examination to analyze the organic matter type. Eight samples were tested for bitumen reflectance to obtain organic matter maturity. Forty-two samples were performed on a gas chromatograph to obtain the composition of natural gas. Three samples were measured for the carbon isotopic composition of methane. For the specific operation process of the above experiment, please refer to Han et al. 32 , Pei et al. 33 , and Wu et al. 34 .
Total gas content measurement. The field desorption method directly measures the gas content of sixty-one shale samples. The core just taken out from the wellhead was quickly put into the desorption tank for shale gas natural desorption. When shale gas desorption, the mud circulation temperature is adopted for the first 3 h to simulate the gas desorption during coring. Then the temperature in the next 6-8 h is set to 110 °C. At this temperature, the residual gas is ignored. The desorbed gas volume was observed and recorded at different times. The gas desorption ends when the reading change of high-precision flowmeter is no more than 0.1 cm 3 . The lost gas volume was recovered by the USBM method. The measured desorption gas volume and calculated loss gas volume add up to the total gas content.

Principal component analysis. Principal component analysis, a mathematical statistical method,
involves the reduction in the dimensions of multiple variables 35 . The principle of this method entails the transformation of numerous possibly correlated variables into fewer linearly uncorrelated variables through orthogonal transformation 36 . The transformed variables are referred to as principal components, which can effectively reflect the information of the original variables. This study used principal component analysis to evaluate shale gas-bearing properties. The influencing factors of gas-bearing properties can be divided into three types: the first type includes gas generation conditions, such as the organic matter abundance and maturity; the second type includes gas storage conditions, such as the pore volume and specific surface area; and the third type includes gas preservation conditions, such as the formation pressure and fault development degree [37][38][39][40] . Combined with the actual geological features of the Longmaxi shale, seven original variables, including the temperature, pressure, TOC, clay content, brittle mineral content, pore volume, and specific surface area (SSA), were considered. The steps of principal component analysis are as follows: 1. Primitive variable X b (b = 1, 2, …, 7) can be standardized as Y b (b = 1, 2, …, 7) to reduce dimensions and eliminate order of magnitude differences among the various primitive variables, Eqs. (1)- (3).
where x ab is the value of the a-th sample of the b-th variable, x b is the mean value of 61 samples of the b-th variable, s b is the standard deviation of the 61 samples of the b-th variable, and y ab is the value of the a-th sample of the b-th variable after standardization. 2. The correlation coefficient matrix R = (r ab ) 7×7 of the standardized data matrix Y = (y ab ) 61×10 can be calculated based on Eq. (4).
where r ab is the correlation coefficient between the a-th and b-th variables, r aa = 1, and r ab = r ba . 3. The eigenvalue λ b (b = 1, 2, …, 7; λ 1 ≥ λ 2 ≥ … ≥ λ 7 ≥ 0) and standard orthogonal eigenvector β b (b = 1, 2, …, 7) of the correlation coefficient matrix R = (r ab ) 7×7 can be obtained. The information contribution rate δ b (b = 1, 2, …, 7) and cumulative contribution rate α b (b = 1, 2, …, 7) of λ b (b = 1, 2, …, 7) can be calculated according , a = 1, 2, · · · , 61; b = 1, 2, . . . , 7 (2)  Workflow. This study was conducted in four steps (Fig. 2). The first step was the establishment of a geological database of the Longmaxi shale and selection of geological variables, including the formation temperature and pressure, clay content, brittle mineral content, pore volume, specific surface area, and TOC content. The second step entailed the substitution of values of these geological variables of the Longmaxi shale samples into SPSS software for principal component analysis. The principle and steps of principal component analysis are described in "Principal component analysis" section. Four principal components were extracted through principal component analysis, constituting a comprehensive factor. The comprehensive score of each shale sample and weight coefficient of each variable of the comprehensive factor could also be obtained. Based on the results of principal component analysis, third and fourth steps were performed. In the third step, the gas-bearing property could be evaluated based on the calculated comprehensive scores. Shale samples with high comprehensive scores exhibit good gas-bearing properties. Moreover, the gas content in shale could be quantitatively predicted by establishing a relationship between the comprehensive score and gas content. In the fourth step, the weight coefficient of each variable of the comprehensive factor could be determined, reflecting the relative impact of this factor on the gas-bearing property. Combining the weight coefficients of each variable with geological analysis, the main controlling factors of the gas-bearing property could be further determined.

Results and discussion
Establishment of a geological database. Organic geochemical characteristics. The kerogen macerals of the Longmaxi shale samples mainly include sapropelic amorphous and planktonic alginite but hardly include vitrinite. The type index (TI) value ranges from 78.22 to 100, indicating that the organic matter in the Longmaxi www.nature.com/scientificreports/ shale is dominated by type I kerogen ( Table 1). The bitumen reflectance (R b ) of the Longmaxi shale ranges from 2.72 to 3.22%, and the equivalent vitrinite reflectance (R o ) ranges from 2.08 to 2.39%, demonstrating that the Longmaxi shale has generally entered the stage of overmature thermal evolution ( Table 1). The results of gas chromatographic analysis reveal that methane is the main component of shale gas, with a methane content ranging from 73.82 to 97.72% and a drying coefficient higher than 95%. The stable carbon isotope composition of methane (δ 13 C 1 ‰) ranged from approximately − 31.9‰ to − 35.8‰. According to the δ 13 C 1 ‰ and shale gas composition results (Fig. 3a), it could be determined that the organic parent material is lipoid, which is consistent with the results of kerogen microscopic examination. Moreover, δ 13 C 1 ‰ can provide information on the maturity of organic matter. Stahl 41 , Schoell 42 , and Chen et al. 43 established quantitative relationships between δ 13 C 1 ‰ and R o for oil-type gas. Based on these relationships, the Longmaxi shale can be classified as generally overmature, with R o exceeding 2.0% (Fig. 3b). The Longmaxi shale exhibits a TOC content ranging from 0.06 to 6.04 wt% (averaging 2.06 wt%), indicating a suitable shale hydrocarbon generation potential (Supplementary Table 1). R o is calculated with the following equation: R o = 0.618R b + 0.4 44 . The type index can be calculated with the following equation TI = 100 × a + 100 × b 1 + 50 × b 2 + 10 × c 1 − 75 × c 2 − 100 × d 45 , where a is the percentage of sapropelite, b 1 is the percentage of resinite in exinite, b 2 is the percentage of other exinite components, c 1 is the percentage of perhydrous vitrinite, c 2 is the percentage of normal vitrinite, and d is the percentage of inertinite. A blank column indicates that no data are measured. A dash indicates that the data value is zero.
Mineral compositions. The XRD analysis shows that the Longmaxi shale mainly contains clay minerals and quartz, and their average contents are 42.3% and 33.9%. Dolomite, calcite, pyrite, and feldspar are not abundant,  Pore structure characteristics. Nitrogen adsorption/desorption isotherms for twelve shale samples retrieved from two wells are shown in Fig. 5. These isotherms are characterized by an obvious hysteresis loop and Point B. The adsorption isotherm is convex under a low relative pressure, caused by monolayer coverage. When the adsorption isotherm is almost linear with the relative pressure, the single-layer adsorption process ends, and multilayer adsorption begins. The adsorption isotherm reveals a slightly concave trend under a high relative pressure and does not reach the limiting uptake. Then, the relative pressure drops, and the desorption isotherm does not coincide with the adsorption isotherm, forming a hysteresis loop associated with capillary condensation in pores. Based on the IUPAC classification 49 , the nitrogen adsorption/desorption isotherms belong to type IV isotherms, and the hysteresis loops are mainly of the H2 and H3 types. The information reflected by the isotherm shape indicates that the Longmaxi shale contains abundant mesopores, mainly ink bottle-shaped and slit-like pores. The change rate of the pore volume and SSA of the Longmaxi shale with the pore diameter is shown in Fig. 6. The pore volume obviously changes when the pore diameter ranges from 0.6 to 1.0 nm and 2.0 to 10.0 nm. The SSA obviously varies within the 0.6-1.0 nm pore size range. Thus, the pore size of the Longmaxi shale primarily ranges from 0.6 to 1.0 nm and 2.0 to 10.0 nm. In addition, a quantitative description of the pore structure can be obtained from the generated nitrogen adsorption/desorption isotherms. The pore volume and SSA of the shale range from 0.011 to 0.046 cm 3 /g (averaging 0.026 cm 3 /g) and 6.304 to 37.011 m 2 /g (averaging 20.009 m 2 /g), respectively (Supplementary Table 1). The pore volume of mesopores is the largest, with a mean value of 0.015 cm 3 /g, followed by micropores and macropores, with average values of 0.008 cm 3 /g and 0.003 cm 3 /g, respectively. Micropores exhibit the largest SSA, with an average value of 15.385 m 2 /g. In contrast, mesopores and macropores exhibit a smaller specific area. The average SSA values of mesopores and macropores are 4.585 m 2 /g and 0.039 m 2 /g, respectively.
Application of principal component analysis. Seven factors, including the TOC content, temperature, pressure, pore volume, specific surface area, clay content, and brittle mineral content, were selected from the geological database for principal component analysis (Supplementary Table 1). These factors are related to gasbearing properties, and these data are easy to obtain. The results of principal component analysis are reliable since the Kaiser-Meyer-Olkin (KMO) value is 0.73, which is greater than 0.6, and the significance of Bartlett's test of sphericity is lower than 0.05. Four principal components were extracted, namely, FAC1, FAC2, FAC3, and FAC4. Each principal component is a linear combination of the original variables. The weight coefficients of each variable of these four principal components are listed in Table 2. FAC1, FAC2, FAC3, and FAC4 captured 33.9%,  www.nature.com/scientificreports/ 32.8%, 25.6%, and 7.1%, respectively, of the total variability. FAC1, FAC2, FAC3, and FAC4 could explain 99.4% of the total variability. Thus, we used these four principal components to evaluate the gas-bearing property. FAC1 exhibited a positive relationship with the brittle mineral content but was negatively correlated with the clay content (Fig. 7a). FAC2 was positively correlated with the temperature and pressure (Fig. 7a). FAC3 attained a positive relationship with the pore volume and specific surface area (Fig. 7b). FAC4 was positively correlated with the TOC content (Fig. 7b). Based on these observations, FAC1-FAC4 could reflect geological significance well and could be interpreted as the mineral composition, formation condition, pore structure, and organic matter abundance, respectively. A comprehensive factor (CFAC) was obtained based on FAC1, FAC2, FAC3, and FAC4. The scores of FAC1, FAC2, FAC3, FAC4, and CFAC of each sample were calculated according to Eqs. (9)-(13), the eigenvectors of all standardized variables in Eqs. (9)-(13) are listed in Table 2, and the results are provided in Table 3. The weight coefficients of each variable of CFAC are listed in Table 2. The first three factors contributing the most to CFAC were the pore volume, SSA, and clay content ( Table 2).
where X 1 -X 7 are standardized variables; X 1 denotes the formation temperature, K; X 2 denotes the formation pressure, MPa; X 3 denotes TOC, wt%; X 4 denotes the pore volume, cm 3 /g; X 5 denotes the pore surface area, m 2 /g; X 6 denotes the clay content, %; and X 7 denotes the brittle mineral content, %.
13) CFAC sample = 0.341FAC1 + 0.330FAC2 + 0.258FAC3 + 0.071FAC4 Table 2. Factor score coefficient matrix and variance explained by each factor. a The ratio of the variance contribution refers to the variance explained by a single principal component (FAC1, FAC2, FAC3 or FAC4) to the total variance explained by the first four principal components.  www.nature.com/scientificreports/ Gas-bearing property evaluation. The measured gas content in the Longmaxi shale samples ranges from 0.31 to 5.32 m 3 /t. In general, shale with a gas content < 2 m 3 /t can be regarded as exhibiting poor gas-bearing properties, shale with a gas content ranging from 2 to 4 m 3 /t exhibits medium gas-bearing properties, and shale exhibits high gas-bearing properties given a gas content > 4 m 3 /t. In this study, the measured gas content in the Longmaxi shale samples was used to verify the principal component analysis results. The findings suggest that the comprehensive score obtained via principal component analysis is positively correlated with the measured gas content (Fig. 8). Therefore, the comprehensive score of principal component analysis can also be used to effectively evaluate the Longmaxi shale gas-bearing properties. A linear relationship between the comprehensive score and shale gas content was further established. The comprehensive scores corresponding to gas contents of 2 m 3 /t and 4 m 3 /t are − 0.209 and 0.788, respectively. Thus, shale with a score < − 0.209 can be evaluated as exhibiting poor gas-bearing properties, shale with a score ranging from − 0.209 to 0.788 exhibits medium gasbearing properties, and shale with a score > 0.788 exhibits good gas-bearing properties. Gas-bearing property evaluation of a single well reveals that the results based on principal component analysis suitably agree with those based on the measured gas content, with the coincidence rates for wells WY11 and WY23 reaching 80.8% and 77.1%, respectively (Fig. 9). In addition, apparent differences in gas-bearing properties between these two wells are observed. Well WY11 exhibits poor gas-bearing properties, whereas the gas-bearing properties of well WY23 are primarily medium to good. This phenomenon may occur because the geological factors influencing the gas-bearing properties in these two wells are quite different.

Main controlling factors of the gas-bearing properties.
The main controlling factors affecting the shale gas-bearing properties were revealed via principal component analysis. The above indicates that the geological factors contributing to the comprehensive factor include the pore volume, SSA, clay content, temperature, pressure, TOC content, and brittle mineral content, with the pore volume, SSA, and clay content yielding the greatest contribution (Table 2). Except for the clay content, which negatively contributes to the comprehensive factor, the other geological factors positively contribute to the comprehensive factor. Moreover, geological analysis reveals that most of these geological factors attain good relationships with the measured gas content, www.nature.com/scientificreports/ especially SSA, pore volume, TOC, brittle mineral content, and clay content (Fig. 10). The positive correlations among SSA, pore volume and gas content reflect the control of the pore structure on the gas-bearing properties (Fig. 10a,b), which is also shown in previous literature 50 . The pore volume and SSA primarily provide the occurrence space of free and adsorbed gas, respectively 51 . The larger the pore volume and SSA are, the more conducive these conditions to shale gas enrichment, which often indicates good gas-bearing properties 52 . Consistent with previous studies 53, 54 , the TOC content exerts a positive effect on the gas content in this study (Fig. 10c), as reflected by the gas supply and storage. On the one hand, a high TOC content reflects a high gas generation potential, which is the premise of good gas-bearing properties. On the other hand, the TOC content is positively related to SSA (Fig. 10d), indicating that the organic pores developed within the organic matter provide a very large SSA for adsorbed gas. In addition, it has been revealed in previous studies that organic pores notably contribute to the pore volume in regard to free gas 55,56 . A change in mineral composition can also lead to a dif-  www.nature.com/scientificreports/ ference in shale gas-bearing properties. The brittle mineral content is positively correlated with the gas content (Fig. 10e), while the clay mineral content is positively correlated with the gas content (Fig. 10f). There are primary intergranular pores, dissolved pores, and microcracks related to brittle minerals, which are conducive to the storage of shale gas, especially free gas 22 . In contrast, plastic clay minerals can be compressed and deformed under the action of stress and can increasingly fill pores, which is not conducive to pore preservation 57 . Therefore, a mineral composition with a high brittle mineral content and low clay content improves the gas-bearing property of shale. Finally, based on the results of principal component analysis and geological analysis, the main controlling factors of shale gas-bearing properties are pore volume, SSA, and clay mineral content. The secondary controlling factors are TOC content, brittle mineral content, temperature and pressure. Since temperature and pressure are external factors, this study will not focus on them. It is thus clear that the Longmaxi shale samples with good gas-bearing properties exhibit a high TOC content, large pore space, and high brittle mineral content. In contrast, shale with poor gas-bearing properties exhibits the opposite trend (Fig. 11).
Guidance for shale gas exploration. In the era of big data, it is inevitable that digital and intelligent oilfields are constructed, which is very beneficial to the optimization of oilfield exploration and development decisions and improvement in economic benefits 58 . The Longmaxi Formation is an important shale gas-producing layer in China. After decades of research, abundant basic geological data of the Longmaxi shale are available, which provides essential support for the establishment of a vast geological database. A geological database comprises an indispensable part of digital and intelligent oilfields. How can we effectively use a comprehensive geological database to guide digital and intelligent exploration of shale gas? The answer lies in the realization of data analysis intelligence, i.e., the compilation of algorithms. Principal component analysis is a basic, simple, and efficient algorithm. In this paper, effective evaluation of shale gas-bearing properties was realized using principal component analysis, which can provide guidance for digital and intelligent shale gas exploration (Fig. 12). Specifically, four principal components with geological significance, such as the pore structure, formation condition, pore structure, and organic matter abundance, are extracted through principal component analysis in this study. The shale gas-bearing property is evaluated according to the score of the comprehensive factor comprising these four principal components. Gas-bearing property evaluation is the result of comprehensively considering the above geological factors, which are generally also considered in sweet spot prediction of shale gas 59,60 . Therefore, the gas-bearing property evaluation results obtained by principal component analysis can be used as a comprehensive index for shale gas sweet spot prediction. Combined with other geological indicators in the geological database, such as regional structural characteristics, shale physical properties, and surface conditions 61 , the superposition analysis is carried out to realize the graded evaluation of sweet spots in the study area. www.nature.com/scientificreports/

Conclusions
The principal component analysis method can effectively evaluate the gas-bearing properties of the Longmaxi shale, with an accuracy of approximately 80%. Moreover, combining this method and geological analysis, the main controlling factors, namely, SSA, pore volume, and clay content, of gas-bearing properties can be determined. According to the gas-bearing characteristics, it can be considered that the Longmaxi Formation contains three levels of shale gas reservoirs with good, medium, and poor gas bearing properties. Good gas-bearing shale exhibits a gas content > 4 m 3 /t and a comprehensive score > 0.788, with an average TOC content of 4.17 wt%, average SSA of 29.032 m 2 /g, average pore volume of 0.035 m 3 /g, and average brittle mineral content of 65%. Medium gas-bearing shale exhibits a gas content of 2-4 m 3 /t and a comprehensive score of − 0.209 to 0.788, with average TOC, SSA, pore volume, and brittle mineral contents of 2.33 wt%, 23.002 m 2 /g, 0.029 m 3 /g, and 56%, respectively. Poor gas-bearing shale exhibits a gas content < 2 m 3 /t and a comprehensive score < − 0.209, with average TOC, SSA, pore volume, and brittle mineral contents of 0.97 wt%, 12.967 m 2 /g, 0.019 m 3 /g, and 44%, respectively. In addition, principal component analysis can be used as an algorithm to realize intelligent analysis. The application of principal component analysis in gas-bearing property evaluation lays a foundation for intelligent shale gas exploration, which is an inevitable trend in the era of big data.

Data availability
All data generated or analyzed during this study are included in this published article.