Towards estimation of CO2 adsorption on highly porous MOF-based adsorbents using Gaussian process regression approach

In recent years, new measures to control greenhouse gas emissions have been implemented to address global climate concerns. The earth's average temperature is rising mainly because burning fossil fuels releases large amounts of CO2 into the atmosphere. Effective capture techniques are therefore needed to reduce the CO2 concentration. In this regard, metal organic frameworks (MOFs) are known as promising materials for CO2 adsorption. Studying how the adsorption conditions and the structural properties of MOFs affect their CO2 adsorption ability will therefore open new doors for their application in CO2 separation technologies. However, the high cost of the corresponding experiments, together with instrument errors, makes computational methods quite beneficial. The present study therefore proposes Gaussian process regression (GPR) models with four kernel functions to estimate CO2 adsorption in terms of the pressure, temperature, pore volume, and surface area of MOFs. To this end, 506 CO2 uptake values were collected from the literature and assessed. The proposed GPR models performed very well, and the model with the exponential kernel function was the best predictive tool, with an R² value of 1. A sensitivity analysis was also employed to investigate the influence of the input variables on CO2 adsorption, which showed that pressure is the most influential parameter. Overall, CO2 adsorption by different MOFs can be estimated accurately with artificial intelligence tools.

The concentration of atmospheric CO2 has increased from 270 ppm before the industrial revolution to more than 400 ppm today, mainly due to the increasing consumption of fossil fuels 1. In addition, it is widely believed that CO2 plays a major role in global climate change 2. Thus, carbon capture technology has been employed as a promising route to reduce the CO2 concentration in the atmosphere and inhibit global warming 3,4. Several approaches have been studied for CO2 capture: membranes 5,6, chemical absorption 7,8, physical adsorption 9, and fluidized bed technologies 10. However, these methods suffer from drawbacks such as high energy consumption, complex regeneration processes, and low CO2 capture capacity. For long-term CO2 removal, an appropriate adsorbent should provide (1) a periodic structure that allows CO2 to be captured and released reversibly, (2) high CO2 selectivity, (3) a CO2 adsorption capacity that can be optimized by chemical functionalization, and (4) thermal, chemical, and mechanical stability 11,12. Metal-organic frameworks (MOFs) are among the most applicable porous compounds owing to their tunable chemical structure, adjustable chemical functionality, and high thermal stability, which open potential applications in gas adsorption [13][14][15][16]. MOFs are built from two main components, metal ions or clusters and organic ligands, which together form a 3D structure with a network of channels and uniform pores. In addition to this robust 3D structure, the main characteristics of MOFs are their permanent porosity and modular nature. These features allow MOFs to adsorb other molecules as guests while sustaining their structures with negligible damage 17,18.
www.nature.com/scientificreports/
Compared with other porous materials, the most important advantage of MOFs is that their functionality and pore size can be designed by choosing the metal ion, the functional group, the organic ligand, and the activation method 19. For example, MOF-5 or IRMOF-1, containing zinc atoms linked to terephthalic acid molecules, possesses large voids for gas capture, while M-dobdc or M-MOF-74 (M = Mg, Ni, Co, Zn), with unsaturated metal centers in their 3D structures, provides extra sites for bonding with guest molecules 20,21. Besides, the pore sizes of MOFs range from several angstroms to a few nanometers, depending on the organic linker 15. Several studies reported high CO2 adsorption capacities for MOF materials, ranging from 8.0 to 10.2 mol/kg at 298 K and 15 bar. CuBTC or HKUST-1 is one of the most explored MOFs for gas adsorption and storage 17,22-24. Comparing zeolites and MOFs, the adsorption capacity of the benchmark zeolite 13X at higher pressures is much lower than that of MOFs 22. Additionally, when micropore diffusion is the rate-controlling mechanism for CO2 adsorption, adsorption in NaX and 5A zeolites proceeds more slowly than in MOF materials 25. Owing to these features, MOFs stand out among the various porous materials as promising candidates for gas adsorption applications.
Despite the numerous studies of gas-solid adsorption systems, investigating this phenomenon from a unified viewpoint is still challenging 26. Experimental studies are time-consuming and costly, and instrument errors affect the measured adsorption results. On the other hand, many adsorption isotherm models are valid only over a specific range of data because they were developed under simplifying assumptions 27. Accordingly, a comprehensive and accurate model for examining gas adsorption on MOFs is needed. Intelligent methods (machine learning algorithms), namely least-squares support vector machine (LS-SVM), artificial neural network (ANN), random forest (RF), adaptive neuro-fuzzy inference system (ANFIS), and radial basis function network (RBF), can be used as alternatives to mathematical models to solve such problems precisely and without the difficulties of experimental work 28,29. Compared with conventional mathematical approaches, these intelligent models have been highly successful in solving complex, non-linear optimization problems [30][31][32][33][34][35][36][37][38][39].

Methodology
Gaussian process regression. This study uses Gaussian process regression (GPR), a machine learning technique that deals with uncertainty within a probabilistic (Bayesian) framework and handles complex problems straightforwardly 41,42. Non-linear GPR models need relatively little training data and can incorporate new evidence as the available data increase. Typically, the small number of hyperparameters to optimize during training makes this model less prone to overfitting 43. In the GPR technique, the training samples determine the model parameters; the GPR model is then developed by adding prior information to the modeling procedure and merging it with the actual (laboratory-measured) data 41. In contrast to traditional learning models, which look for the single best match to the experimental data, GPR computes posterior distributions over models 44.
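Generally, a GPR model is established as follows. Let the input and target variables be denoted by x and y, and let $T = \{(x_{T,i}, y_{T,i})\}_{i=1}^{n_T}$ and $L = \{(x_{L,i}, y_{L,i})\}_{i=1}^{n}$ be the randomly chosen test and training data sets, respectively. The starting point of GPR modeling is the general equation

$$y_L = f(x_L) + \varepsilon \tag{1}$$

where $x_L$ denotes the independent variables and $y_L$ the targets of the training data points. Here $\varepsilon \sim \mathcal{N}(0, \sigma^2_{\text{noise}} I_n)$ is the observation noise, $\sigma^2_{\text{noise}}$ is the noise variance, and $I_n$ is the $n \times n$ identity matrix. Each observed y is therefore related to the function f(x) through a Gaussian noise model 45. GPR treats f as a random function that is fully defined by its mean and covariance functions. Likewise, for the test set we can write

$$y_T = f(x_T) + \varepsilon \tag{2}$$

where $x_T$ denotes the independent variables and $y_T$ the targets of the test data set. The function f(x) is distributed as a Gaussian process with mean function m(x) and covariance function k(x, x′) (also called the kernel function) 45:

$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big) \tag{3}$$

The mean function m(x) can be specified using explicit basis functions. Because it can be difficult to identify a fixed m(x), the calculations are usually simplified by setting m(x) to zero 41,45. Thus,

$$f(x) \sim \mathcal{GP}\big(0,\, k(x, x')\big) \tag{4}$$

Combining Eqs. (1) and (4) gives the distribution of y:

$$y \sim \mathcal{N}\big(0,\, K(x, x) + \sigma^2_{\text{noise}} I\big) \tag{5}$$

where K(x, x) is the covariance matrix whose entries are the kernel k evaluated at pairs of inputs. Considering all the parameters and noise terms described above, the training and test targets follow

$$y_L \sim \mathcal{N}\big(0,\, K(x_L, x_L) + \sigma^2_{\text{noise}} I_n\big) \tag{6}$$

$$y_T \sim \mathcal{N}\big(0,\, K(x_T, x_T) + \sigma^2_{\text{noise}} I_{n_T}\big) \tag{7}$$

Combining Eqs. (6) and (7) gives the joint Gaussian distribution

$$\begin{bmatrix} y_L \\ y_T \end{bmatrix} \sim \mathcal{N}\left(0,\, \begin{bmatrix} K(x_L, x_L) + \sigma^2_{\text{noise}} I_n & K(x_L, x_T) \\ K(x_T, x_L) & K(x_T, x_T) + \sigma^2_{\text{noise}} I_{n_T} \end{bmatrix}\right) \tag{8}$$

The distribution of $y_T$ then follows from the conditioning rule for Gaussians, where $\mu_T$ and $\Sigma_T$ are the mean and the covariance:

$$y_T \mid x_T, x_L, y_L \sim \mathcal{N}(\mu_T, \Sigma_T) \tag{9}$$

$$\mu_T = K(x_T, x_L)\big[K(x_L, x_L) + \sigma^2_{\text{noise}} I_n\big]^{-1} y_L \tag{10}$$

$$\Sigma_T = K(x_T, x_T) + \sigma^2_{\text{noise}} I_{n_T} - K(x_T, x_L)\big[K(x_L, x_L) + \sigma^2_{\text{noise}} I_n\big]^{-1} K(x_L, x_T) \tag{11}$$

Thus, given the independent variables and the training data set, the outputs of the test data can be predicted. During training, the choice of kernel function, which must yield a symmetric and invertible covariance matrix, can significantly affect the predictive power of the established GPR model.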
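As a minimal sketch of the zero-mean predictive equations above (the posterior mean μ_T and covariance Σ_T obtained by Gaussian conditioning), the following dependency-free Python computes the mean and the diagonal of Σ_T at each test point. It is an illustration under stated assumptions, not the authors' code: the squared exponential kernel with ℓ = σ = 1 is a placeholder, the noise variance is set by hand rather than fitted, and the helper names (`gpr_predict`, `cholesky`, `forward_sub`, `backward_sub`) are hypothetical. A Cholesky factorization replaces the explicit matrix inverse for numerical stability.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def squared_exponential(x: Point, xp: Point, ell: float = 1.0, sigma: float = 1.0) -> float:
    """Placeholder kernel: sigma^2 * exp(-r^2 / (2 * ell^2))."""
    r2 = sum((a - b) ** 2 for a, b in zip(x, xp))
    return sigma ** 2 * math.exp(-r2 / (2.0 * ell ** 2))


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Return lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_sub(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L z = b for lower-triangular L."""
    z: List[float] = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def backward_sub(low: List[List[float]], z: Sequence[float]) -> List[float]:
    """Solve L^T w = z, reading L^T from the lower-triangular L."""
    n = len(low)
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * w[k] for k in range(i + 1, n))
        w[i] = (z[i] - s) / low[i][i]
    return w


def gpr_predict(
    x_train: Sequence[Point],
    y_train: Sequence[float],
    x_test: Sequence[Point],
    kernel: Kernel,
    noise_var: float,
) -> Tuple[List[float], List[float]]:
    """Zero-mean GPR: posterior mean and variance of y_T at each test point."""
    n = len(x_train)
    # K(x_L, x_L) + sigma_noise^2 * I_n
    k_ll = [
        [kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    low = cholesky(k_ll)
    # alpha = [K(x_L, x_L) + sigma_noise^2 I_n]^-1 y_L, via two triangular solves
    alpha = backward_sub(low, forward_sub(low, y_train))
    means: List[float] = []
    variances: List[float] = []
    for xt in x_test:
        k_tl = [kernel(xt, xl) for xl in x_train]  # one row of K(x_T, x_L)
        means.append(sum(k * a for k, a in zip(k_tl, alpha)))
        v = forward_sub(low, k_tl)  # L^-1 K(x_L, x_T) for this test point
        variances.append(kernel(xt, xt) + noise_var - sum(vi * vi for vi in v))
    return means, variances


if __name__ == "__main__":
    # Hypothetical 1-D toy data, not the MOF data bank.
    x_l = [[0.0], [1.0], [2.0]]
    y_l = [0.0, 1.0, 4.0]
    mu, var = gpr_predict(x_l, y_l, [[1.0], [10.0]], squared_exponential, 1e-8)
    print(mu, var)
```

At a training input the mean reproduces the observed target, while far from the data the mean falls back to the zero prior mean and the variance to the prior variance plus noise. In practice the kernel hyperparameters and σ²_noise would be tuned, for example by maximizing the log marginal likelihood, and the four inputs (P, T, Vp, S) would usually be scaled first because their magnitudes differ widely; both steps are assumed here rather than taken from the study.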
To find the most appropriate kernel function for the current study, four common and diverse kernel functions, namely Matern, exponential, squared exponential, and rational quadratic, were examined during training. With $r = \lVert x - x' \rVert$, these functions have the following forms:
• Matern kernel function:
$$k(x, x') = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, r}{\ell}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\, r}{\ell}\right) \tag{12}$$
• Exponential kernel function:
$$k(x, x') = \sigma^2 \exp\left(-\frac{r}{\ell}\right) \tag{13}$$
• Rational quadratic kernel function:
$$k(x, x') = \sigma^2 \left(1 + \frac{r^2}{2\alpha \ell^2}\right)^{-\alpha} \tag{14}$$
• Squared exponential kernel function:
$$k(x, x') = \sigma^2 \exp\left(-\frac{r^2}{2\ell^2}\right) \tag{15}$$
where ℓ, α > 0, σ, and σ² are the length scale, the scale-mixture parameter, the amplitude, and the variance, respectively. K_ν is the modified Bessel function of the second kind, ν is a positive parameter, and Γ is the gamma function. The exponential and squared exponential kernels are special cases of the Matern kernel: ν = 1/2 gives the exponential kernel, and the limit ν → ∞ gives the squared exponential kernel.

The data bank comprises 506 experimental CO2 uptake values for different MOFs collected from the literature (Table S1) 17,40. The pressure (P, bar), temperature (T, K), pore volume (Vp, cm³/g), and surface area (S, m²/g) of the MOFs are the model inputs, while the CO2 uptake (xCO2, mmol/g) is the model output. To establish the most accurate model, 20% of the data were randomly held out as the test set to assess the validity of the model, and the remaining 80% were used as the training set to develop the models of the MOF-CO2 systems. Five statistical parameters, Eqs. (16)-(20), were used to evaluate the precision of the models: the coefficient of determination R² (agreement between the experimental and calculated values), mean square error (MSE), standard deviation (STD), root mean square error (RMSE), and mean relative error (MRE).
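The four kernel forms can be written directly in Python. The sketch below is hypothetical and isotropic (one length scale ℓ shared by all inputs, with r the Euclidean distance). The study does not state which Matern order ν it used, so only the closed-form half-integer cases ν = 1/2, 3/2, and 5/2 are implemented; a general ν needs the modified Bessel function K_ν, which is not in the Python standard library. The function names are our own.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _distance(x: Point, xp: Point) -> float:
    """Euclidean distance r = ||x - x'|| (isotropic kernels, one shared length scale)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xp)))


def exponential(x: Point, xp: Point, ell: float = 1.0, sigma: float = 1.0) -> float:
    """sigma^2 * exp(-r / ell)."""
    return sigma ** 2 * math.exp(-_distance(x, xp) / ell)


def squared_exponential(x: Point, xp: Point, ell: float = 1.0, sigma: float = 1.0) -> float:
    """sigma^2 * exp(-r^2 / (2 ell^2))."""
    r = _distance(x, xp)
    return sigma ** 2 * math.exp(-(r ** 2) / (2.0 * ell ** 2))


def rational_quadratic(
    x: Point, xp: Point, ell: float = 1.0, sigma: float = 1.0, alpha: float = 1.0
) -> float:
    """sigma^2 * (1 + r^2 / (2 alpha ell^2))^(-alpha)."""
    r = _distance(x, xp)
    return sigma ** 2 * (1.0 + r ** 2 / (2.0 * alpha * ell ** 2)) ** (-alpha)


def matern(x: Point, xp: Point, nu: float = 1.5, ell: float = 1.0, sigma: float = 1.0) -> float:
    """Matern kernel in closed form for the half-integer cases nu = 1/2, 3/2, 5/2.

    Other values of nu need the modified Bessel function K_nu, so they are rejected.
    """
    s = _distance(x, xp) / ell
    if nu == 0.5:  # identical to the exponential kernel
        return sigma ** 2 * math.exp(-s)
    if nu == 1.5:
        t = math.sqrt(3.0) * s
        return sigma ** 2 * (1.0 + t) * math.exp(-t)
    if nu == 2.5:
        t = math.sqrt(5.0) * s
        return sigma ** 2 * (1.0 + t + t * t / 3.0) * math.exp(-t)
    raise ValueError("this sketch only supports nu in {0.5, 1.5, 2.5}")


if __name__ == "__main__":
    a, b = [0.0, 1.0], [1.0, 2.5]  # two hypothetical (scaled) input vectors
    se = squared_exponential(a, b)
    for nu in (0.5, 1.5, 2.5):
        print(nu, abs(matern(a, b, nu=nu) - se))  # for this pair, the gap to SE shrinks as nu grows
    print(abs(rational_quadratic(a, b, alpha=1e6) - se))  # very large alpha approaches SE
```

The usage lines check the relationships discussed in the text: ν = 1/2 reproduces the exponential kernel, and increasing ν moves the Matern kernel toward the squared exponential one. The rational quadratic kernel likewise approaches the squared exponential kernel as α grows.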

Estimation of the precision of the collected data. Data points whose behavior is inconsistent with the rest of the data bank are identified as suspected data; they mainly reflect experimental errors. Recognizing suspected data is crucial because their presence in the data bank can lead to inaccurate predictions by the established model. Therefore, the Leverage method is used to detect suspected (outlier) data and improve the quality of the data bank. In this method, the Hat matrix (H) and the critical leverage limit (H*) are used to identify outliers; they are defined as follows 46:

$$H = U^{\mathsf T}\left(U U^{\mathsf T}\right)^{-1} U \tag{21}$$

$$H^{*} = \frac{3(i + 1)}{j} \tag{22}$$

where U is a matrix of dimension i × j (one column per training point), i is the number of model parameters, and j is the number of training points. To investigate the precision of the CO2 adsorption data bank, the standardized residuals are plotted against the Hat values (the diagonal elements of H) in Fig. 1, the so-called Williams plot. The region bounded by the critical leverage limit and by standardized residuals from −3 to 3 is regarded as the reliable region of the Williams plot. All the collected data points for CO2 uptake by different MOFs fall within this region, so the dataset is suitable for training and testing the models.
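The Leverage method can be sketched in plain Python. This sketch assumes the widely used critical limit H* = 3(i + 1)/j and defines the standardized residual of each point as its residual minus the mean residual, divided by the sample standard deviation of the residuals; other standardizations (e.g., dividing by sqrt(MSE·(1 − h))) are also common, so treat that choice as an assumption. Each entry of `points` is one training point, so the Gram matrix built below is the i × i product of U (points as columns) with its transpose, and each hat value is the quadratic form of a point with the inverse Gram matrix. The function names and the toy design matrix (an intercept column plus one input) are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def hat_values(points: Sequence[Sequence[float]]) -> List[float]:
    """Diagonal of the Hat matrix; each entry of `points` is one training point u_k (length i)."""
    i = len(points[0])
    # Gram matrix: sum over points of u_k u_k^T (i x i)
    gram = [[sum(u[a] * u[b] for u in points) for b in range(i)] for a in range(i)]
    # h_k = u_k^T gram^-1 u_k
    return [sum(x * y for x, y in zip(u, solve(gram, u))) for u in points]


def williams_check(
    points: Sequence[Sequence[float]], residuals: Sequence[float]
) -> Tuple[float, List[float], List[bool]]:
    """Return H*, the standardized residuals, and an in-domain flag for each point."""
    i, j = len(points[0]), len(points)
    h_star = 3.0 * (i + 1) / j
    h = hat_values(points)
    mean_e, sd_e = statistics.mean(residuals), statistics.stdev(residuals)
    sr = [(e - mean_e) / sd_e for e in residuals]
    flags = [hk <= h_star and abs(s) <= 3.0 for hk, s in zip(h, sr)]
    return h_star, sr, flags


if __name__ == "__main__":
    # Hypothetical design: intercept column plus one input, 10 training points.
    pts = [[1.0, float(t)] for t in range(10)]
    res = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1, 0.02, -0.03, 0.04, -0.03]
    h_star, sr, flags = williams_check(pts, res)
    print(h_star, all(flags))
```

Plotting `sr` against the hat values, with a vertical line at H* and horizontal lines at ±3, gives a Williams plot; points outside that box would be flagged as suspected data.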

Results and discussion
Sensitivity analysis. To propose a precise model, it is vital to identify how each input affects CO2 uptake by MOFs. Sensitivity analysis is used for this purpose: the relevancy factor r of each input parameter is calculated as follows 47,48:

$$r(X_k, Y) = \frac{\sum_{i=1}^{n} (X_{k,i} - \bar{X}_k)(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_{k,i} - \bar{X}_k)^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \tag{23}$$

where X_k,i, X̄_k, Y_i, and Ȳ are the i-th value of the k-th input, the average of the k-th input, the i-th output, and the average of the outputs, respectively, and n is the number of data points. A larger value of r for an input parameter indicates a stronger effect on CO2 adsorption, and vice versa. The effect of each input variable on CO2 adsorption is shown in Fig. 2. The sensitivity analysis indicates that pressure and MOF surface area, with r values of 0.68 and 0.52, are the most influential inputs for estimating CO2 adsorption. Both have a direct relationship with CO2 uptake. Furthermore, increasing the pore volume of the MOFs results in higher CO2 adsorption. It is worth mentioning that the small value of r for temperature may be related to its limited variation in the experimental data.
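The relevancy factor can be computed column by column. The sketch below assumes the usual Pearson-type definition (the covariance of the k-th input with the output divided by the product of their standard deviations), written in terms of the variables X_k,i, X̄_k, Y_i, and Ȳ defined in the text. The function name is our own, and the input values in the usage block are hypothetical, not taken from the data bank.

```python
import math
from typing import Sequence


def relevancy_factor(x: Sequence[float], y: Sequence[float]) -> float:
    """Relevancy factor r between one input column x and the output y (Pearson form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of input and output
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical rows (P in bar, T in K) and uptakes in mmol/g, for illustration only.
    inputs = {"P": [1.0, 5.0, 10.0, 20.0], "T": [298.0, 298.0, 273.0, 313.0]}
    uptake = [1.2, 3.5, 6.1, 8.0]
    for name, column in inputs.items():
        print(name, round(relevancy_factor(column, uptake), 3))
```

Because r lies between −1 and 1, its sign shows the direction of the relationship (direct or inverse) and its magnitude the strength. An input that never varies makes the denominator zero, so r is undefined for it.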
Modeling results. To examine the accuracy of the proposed models, statistical parameters are used to quantify the agreement between the experimental and predicted CO2 adsorption values. These parameters are reported in Table 1. The R² values of 1.00, 0.998, 0.997, and 0.997 are obtained for GPR mod[…]
To further confirm the precision of the established models, the experimental and predicted CO2 adsorption values are shown together in Fig. 3. There is excellent agreement between the experimental CO2 adsorption values and the predictions of the different GPR models: for all proposed models, the predicted values closely follow the experimental data. Thus, the proposed GPR models are highly capable of predicting CO2 adsorption.
The predicted CO2 adsorption values are plotted against the experimental data for all the models in Fig. 4. All the predicted values lie close to their experimental counterparts, and the fitted lines have correlation coefficients higher than 0.98. The fitted lines nearly coincide with the 45° line, which demonstrates the precision of all the GPR models in reproducing the experimental CO2 adsorption data; the bisector (45° line) serves as the reference for perfect agreement. Among them, the GPR model with the exponential kernel function gives the most precise results, with a correlation coefficient of 1. Figure 5 shows the relative deviations between the experimental CO2 adsorption values and those predicted by all the GPR models. As shown, the absolute relative deviations are below 30% for the Matern, squared exponential, and rational quadratic kernels and below 20% for the exponential kernel.
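The statistical parameters and the point-wise relative deviations discussed above can be computed as follows. R², MSE, RMSE, and MRE use their standard definitions; the STD formula (the square root of the summed squared relative errors divided by n − 1) is an assumption, because definitions of STD vary between studies. The relative measures require non-zero experimental values, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def relative_deviations(y_exp: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Absolute relative deviation (%) of each prediction; y_exp must be non-zero."""
    return [100.0 * abs(e - p) / abs(e) for e, p in zip(y_exp, y_pred)]


def metrics(y_exp: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R2, MSE, RMSE, MRE (%), and an assumed STD for one model's predictions."""
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    sse = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))  # residual sum of squares
    sst = sum((e - mean_exp) ** 2 for e in y_exp)  # total sum of squares
    mse = sse / n
    rel = [(e - p) / e for e, p in zip(y_exp, y_pred)]  # signed relative errors
    return {
        "R2": 1.0 - sse / sst,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MRE": 100.0 * sum(abs(d) for d in rel) / n,
        # Assumed STD definition: sqrt(sum(rel^2) / (n - 1))
        "STD": math.sqrt(sum(d * d for d in rel) / (n - 1)),
    }


if __name__ == "__main__":
    # Hypothetical uptakes in mmol/g, for illustration only.
    exp_vals, pred_vals = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
    print(metrics(exp_vals, pred_vals))
    print(max(relative_deviations(exp_vals, pred_vals)))
```

Checking `max(relative_deviations(...))` for each kernel against a threshold corresponds to reading the 20% and 30% bounds off a relative-deviation plot.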
According to the results, the proposed GPR models showed excellent performance in predicting CO2 adsorption. To verify that the suggested models estimate CO2 adsorption by different MOFs with sufficient precision, the results of the current study are compared with the models developed for the same purpose by Dashti et al. 25. The statistical parameters of the Dashti et al. study, including R², MSE, and STD, are listed in Table S2. Among the four algorithms examined there, RBF gave the best predictions, with R² = 0.997, MSE = 0.204, and STD = 4.211. In comparison, all the GPR models established here estimate CO2 adsorption better; in particular, the GPR model with the exponential kernel function achieves R² = 1.00, MSE = 0.02, and STD = 0.14.
As shown in Fig. 6, MOF-177 has the highest CO2 adsorption capacity, 33.5 mmol/g, which is much higher than that of the other MOFs. Next come IRMOF-11, -1, and -3, with Zn4O(O2C)6-type frameworks, which show excellent CO2 adsorption capacities at room temperature. These MOFs have large effective pore sizes, which induce a sigmoidal (stepped) shape in their adsorption isotherms 24. The CO2 adsorption isotherms of MOF-2, MOF-74, Norit RB2, MOF-505, and Cu3(BTC)2 are monotonic (Type I). Strong CO2 adsorption at low pressure produces a "knee" in these isotherms, while the maximum capacity is reached at high pressure as the pores become saturated. Figure 7 shows the CO2 adsorption isotherms of Co(BDP), Cu-BTTri, BeBTB, Mg2(dobdc), and MOF-177 at 313 K. MOF-177 and BeBTB perform much better than the other MOFs in CO2 adsorption because of their higher surface areas (see Table S1). The isotherm of Co(BDP) has a step-like feature, which might be attributed to its flexible structure allowing gate opening 49,50. Cu-BTTri and Mg2(dobdc) adsorb large amounts of CO2 at low pressures, which is related to their surface areas and to the additional polarizing effect of the metal cations on the framework surface. Because of the high polarizability and quadrupole moment of CO2, the surface area can affect the amount of CO2 adsorbed by a MOF. Figure 8 shows the effect of temperature on CO2 adsorption.

Conclusion
In the current study, GPR models based on different kernel functions have been established to estimate the CO2 adsorption ability of MOFs in terms of pressure, temperature, pore volume, and surface area. For this purpose, 506 experimental CO2 uptake values were collected from the literature and assessed. Four kernel functions (exponential, squared exponential, Matern, and rational quadratic) were studied. Excellent agreement was found between the experimental CO2 adsorption values and those predicted by the developed GPR models, confirming their strong ability to determine CO2 uptake. Among the proposed models, the GPR model based on the exponential kernel function proved to be the most precise predictive tool, with R² = 1.00, MSE = 0.02, and STD = 0.14. The suggested GPR models also perform better than previously reported models. The sensitivity analysis indicates that pressure is the most influential variable in CO2 adsorption by MOFs, and the surface area of the MOFs is the second most influential parameter. These findings may be helpful to engineers and researchers working on gas separation technologies.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.