On the evaluation of the carbon dioxide solubility in polymers using gene expression programming

Evaluation, prediction, and measurement of carbon dioxide (CO2) solubility in different polymers are crucial for engineers in various chemical applications, such as extraction and the generation of novel materials. In this paper, correlations based on gene expression programming (GEP) were generated to predict carbon dioxide solubility in three polymers. The results showed that the generated correlations are efficient and predict carbon dioxide solubility with satisfactory average absolute relative errors of 9.71%, 5.87%, and 1.63% for polystyrene (PS), polybutylene succinate-co-adipate (PBSA), and polybutylene succinate (PBS), respectively. Trend analysis based on Henry’s law illustrated that increasing pressure and decreasing temperature lead to an increase in carbon dioxide solubility. Finally, outlier discovery was applied using the leverage approach to detect suspected data points. The outlier detection demonstrated the statistical validity of the developed correlations. The Williams plots of the three generated correlations showed that all of the data points are located in the valid zone except one point for the PBS polymer and three points for the PS polymer.

www.nature.com/scientificreports/
quartz crystal microbalance technique. The following year, Webb et al. 22 and Sato et al. 23 evaluated the diffusion and solubility of CO2 in polymers under high pressures and temperatures. According to their research, solubility increased with increasing pressure and decreased with increasing temperature. In 2000, Sato et al. 15 suggested empirical relations to determine the solubility and diffusion coefficient of CO2. They considered pressure and temperature as the independent variables in the ranges of 1.025-20.144 MPa and 323.15-453.15 K, respectively. They found that the solubility of CO2 in molten polymers increases with increasing pressure and decreasing temperature. A year later, Hilic et al. 24 measured the solubility of N2 and CO2 in polystyrene at pressures from 3.05 to 45 MPa and temperatures from 338 to 402 K, using an experimental technique based on a vibrating-wire force sensor. They found that solubility increased linearly with increasing pressure and decreasing temperature. In the same year, Sato et al. 25 26 studied CO2 solubility in alkanolamine solutions at temperatures of 40, 60 and 80 °C and pressures of 0.1-50 psia. They reported the vapor-liquid equilibrium of CO2 in these solutions. In the same year, Sato et al. 27 28 predicted the adsorption of CO2 in various polymers based on a group contribution equation of state (EoS) with input ranges of 283-453 K and 1-200 bar for temperature and pressure, respectively. Their best result was an average absolute relative error (AARE) of 5.5% for polystyrene. In 2006, Li et al. 29 measured gas solubilities and diffusivities in polylactide at a temperature of 180-200 K and pressures up to 28 MPa using a magnetic suspension balance (MSB). Furthermore, they adopted a theoretical model based on Fick's second law to extract the diffusion coefficients of N2 and CO2 in polylactide.
They found that CO2 exhibited lower diffusivity than N2 at the same temperature. In the same year, Nalawade et al. 9 used SCCO2 as a green solvent for processing polymer melts. They concluded that SCCO2 is applicable in many polymerization processes due to its high solubility in polymers. In 2007, Lei et al. 30 36 presented a comprehensive review of CO2-polymer systems. They used two types of multiscale methods, namely thermodynamic calculation models and computer simulation, to determine CO2 solubility in polymers. Their developed model can be utilized in chemistry and the chemical industries, for example for phase rheological properties and polymer self-assembly. In 2022, various experimental, theoretical, and modeling studies were carried out to measure the solubility of CO2 and other gases in water-polymer systems. Sun et al. 37 measured CO2 solubility in oil-based and water-based drilling fluids using the sample analysis approach. Their results indicated that the salting-out effect of the electrolyte on gas solubility strengthens as the molar concentration of ions increases. Their study also showed that the errors in CO2 solubility in the oil-based and water-based drilling fluids are 6.75% and 3.47%, respectively. In addition, Ushiki et al. 38 evaluated CO2 solubility and diffusivity in polycaprolactone (PCL) using the perturbed-chain statistical associating fluid theory (PC-SAFT) and free volume methods. According to their work, CO2 solubility conformed to Henry's law, and the PC-SAFT EoS described the solubility sufficiently well. Also, Kiran et al. 39 assessed the diffusivity and solubility of CO2 and N2 in polymers. They used the Sanchez-Lacombe EoS to model solubility. Furthermore, Ricci et al. 40 provided a comprehensive theoretical framework for the supercritical sorption and transport of CO2 in polymers.
In their study, CO2 sorption was modelled using data available across the critical region, at different temperatures and at pressures up to 18 MPa. The present research focuses mainly on generating accurate correlations for CO2 solubility prediction, with the pressure and temperature of the polymer as input variables. The generated correlations are based on the gene expression programming (GEP) technique. A comprehensive databank comprising 53 data points for PBS, 43 data points for PBSA, and 92 data points for the PS polymer was collected 15,20,24,25 . After the correlations are generated, statistical and graphical error tests are applied to assess their accuracy. Likewise, the capability of the correlations to predict the real trend of CO2 solubility with changes in pressure and temperature is appraised. Finally, the leverage approach is applied to detect outlier data points in the dataset.

Data collection
In this research, the GEP algorithm was implemented to predict CO2 solubility in three different polymers, namely PBS, PBSA, and polystyrene (PS). For this purpose, 53 data points for PBS, 43 data points for PBSA, and 92 data points for the PS polymer were collected 15,20,24,25 . In this work, the pressure and temperature of carbon dioxide were considered as input parameters. A summary of the gathered data points is shown in Table 1. As Table 1 shows, the data cover extensive ranges of CO2 temperature and pressure.

Correlation development
To generate the CO2 solubility correlations, the gene expression programming (GEP) evolutionary algorithm was applied. GEP, which was first proposed by Ferreira in 2001 41 , is a full genotype/phenotype technique in which the chromosomes and the expression trees form an inseparable, functional entity 42 . This technique is extensively used in computer programming and modeling applications [43][44][45][46] . The programs evolved by GEP are complex tree structures that adapt by changing their size, shape, and composition. By encoding trees as vectors of symbols and transforming these vectors into trees only to assess their fitness, the technique produces trees indirectly 47 . This soft computing technique is a strong predictive algorithm that is widely used in various fields. In general, the GEP technique has two components, namely the chromosomes and the expression trees (ETs). The candidate solutions are encoded in the chromosomes, which are linear strings of fixed length, and each chromosome is then decoded into the actual candidate solution, termed the expression tree 48 . After the chromosomes of the first-generation individuals are produced and selected according to the fitness function to reproduce with modification, the individuals of the new generation undergo the same developmental process: expression of the genomes, confrontation with the selection environment, and reproduction with modification 49 . Additionally, gene expression programming automatically creates algebraic expressions to solve nonlinear problems 50 . The schematic flowchart of the GEP procedure is depicted in Fig. 1.
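The decoding of a chromosome into an expression tree can be illustrated with a minimal sketch. It assumes a single gene written in the breadth-first (Karva) notation commonly used in GEP, a small function set of +, - and *, and the two terminals P and T. The gene string, the function set, and the names `decode` and `evaluate` are hypothetical choices for illustration and are not the settings used to generate the correlations in this work.

```python
import operator

# Assumed function set (symbol -> (arity, implementation)).
# Terminals are "P" (pressure) and "T" (temperature). Illustrative only.
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def decode(gene):
    """Translate a linear gene into a nested expression tree.

    The gene is read left to right and the tree is filled level by level:
    each function symbol takes the next unused symbols as its children.
    Symbols left over at the end of the gene are simply not expressed.
    """
    nodes = [{"symbol": s, "children": []} for s in gene]
    queue = [nodes[0]]
    next_free = 1
    while queue:
        node = queue.pop(0)
        arity = FUNCTIONS[node["symbol"]][0] if node["symbol"] in FUNCTIONS else 0
        for _ in range(arity):
            child = nodes[next_free]
            next_free += 1
            node["children"].append(child)
            queue.append(child)
    return nodes[0]


def evaluate(node, inputs):
    """Evaluate an expression tree for given terminal values."""
    symbol = node["symbol"]
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, inputs) for child in node["children"]))
    return inputs[symbol]


# Hypothetical gene: head "*+P", tail "PTTP". It encodes (P + T) * P;
# the last two symbols are not expressed.
gene = "*+PPTTP"
tree = decode(gene)
value = evaluate(tree, {"P": 2.0, "T": 3.0})
print(value)
```

Because the tail contains only terminals, every gene of this form decodes to a valid tree, which is why genetic operators can modify the linear string freely.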

Results and discussion
Development of correlations. In the present study, the tree-based GEP soft computing approach was used to develop accurate correlations for predicting CO2 solubility in different polymers. The developed correlations consider CO2 solubility as a function of pressure and temperature of cor- where P and T denote the pressure and temperature of the aforementioned polymers, respectively. In the above correlations, the units of P and T are MPa and K, respectively. The correlations generated in this study are applicable for CO2 solubility prediction over various ranges of temperature and pressure of the mentioned polymers.
Statistical performance assessment. To show and compare the precision of the generated correlations, several important statistical parameters were applied, including the root mean square error (RMSE), standard deviation (SD), coefficient of determination (R2), average relative error (ARE), and average absolute relative error (AARE) 51 . These terms are given below: where Ei is the partial deviation that is described as: where n, S(exp), S(cal) and S(avg) are the number of data points, the actual CO2 solubility value, the calculated CO2 solubility value, and the average of the actual data points, respectively. These statistical parameters for the three generated correlations are detailed for the training, testing, and whole datasets in Table 3. As described in this table, the AARE of the correlation for the PBS polymer is lower than that of the other two correlations generated in this work. The results demonstrate that the correlation generated for the PBS polymer has the lowest standard deviation (0.028) and RMSE (0.00178). However, the correlations developed for the other two polymers also have acceptable accuracy. As presented in Table 3, the AARE values for the PBS and PBSA polymers are lower than the AARE for the PS polymer, which is due to the nature of the experimental data for the PS polymer. The generated correlations are therefore reliable, and different error values may be obtained because of the nature of the experimental data for different materials (such as polymers).
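As a hedged illustration of how such parameters can be computed, the sketch below uses definitions that are common in the correlation-development literature: a percent relative deviation Ei = 100 (S(exp) - S(cal)) / S(exp), with ARE and AARE as the mean and the mean absolute value of Ei. The exact forms of SD and R2 shown here are assumptions and may differ from those used for Table 3. The function name `error_metrics` and the sample values are hypothetical.

```python
import math


def error_metrics(s_exp, s_cal):
    """Return ARE, AARE, RMSE, SD and R2 for paired actual/calculated values.

    The definitions are assumed (commonly used forms), not taken from the paper.
    """
    n = len(s_exp)
    # Percent relative deviation of each point.
    e = [100.0 * (x - c) / x for x, c in zip(s_exp, s_cal)]
    are = sum(e) / n
    aare = sum(abs(v) for v in e) / n
    ss_res = sum((x - c) ** 2 for x, c in zip(s_exp, s_cal))
    rmse = math.sqrt(ss_res / n)
    # Standard deviation of the fractional deviations (assumed form).
    sd = math.sqrt(sum(((x - c) / x) ** 2 for x, c in zip(s_exp, s_cal)) / (n - 1))
    s_avg = sum(s_exp) / n
    ss_tot = sum((x - s_avg) ** 2 for x in s_exp)
    r2 = 1.0 - ss_res / ss_tot
    return {"ARE": are, "AARE": aare, "RMSE": rmse, "SD": sd, "R2": r2}


# Hypothetical solubility values in g/g, for illustration only.
metrics = error_metrics([0.10, 0.20, 0.40], [0.11, 0.19, 0.40])
print(metrics)
```

ARE keeps the sign of the deviations and so indicates systematic over- or under-prediction, whereas AARE measures the typical magnitude of the error.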
Graphical performance assessment. This section presents a graphical comparison between the results of the generated correlations and the actual data. The predicted CO2 solubility values in the PBS polymer are plotted versus the actual ones in Fig. 2a. Likewise, the predicted CO2 solubility values in the PBSA and PS polymers are plotted versus the experimental data in Fig. 2b,c, respectively. The closer the plotted data points are to the 45° line, the better the agreement between the correlation and the data. According to these plots, the results of the generated user-friendly correlations show satisfactory agreement around the ideal line. Additionally, the relative error curves of the developed correlations for CO2 solubility in the PBS, PBSA, and PS polymers are presented in Fig. 3a-c, respectively.
Furthermore, to show the accuracy of the presented correlations in different ranges of pressure and temperature, the performance of the correlations in terms of AARE was plotted against five sets of pressure and three sets of temperature. Figure 4 shows the AARE of the correlations in different ranges of the input parameters. Across the pressure ranges, the correlation for CO2 solubility in the PBS polymer shows a steady performance, and its AARE is lower than 2.9% in all ranges. In addition, the correlation for CO2 solubility in the PBS polymer performs reliably up to the last temperature range. This figure confirms the efficiency of the correlation developed for CO2 solubility in the PBS polymer compared with the other correlations developed in the present study.
Next, the cumulative frequency analysis of the absolute percent relative error (APRE) for the correlations generated in this work is shown in Fig. 5. According to this figure, the correlation for CO2 solubility in the PBS polymer estimates more than 90% of the CO2 solubility values with an APRE of less than 5%, and more than 98% of the values with an APRE of less than 10%.
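The cumulative frequency read from such a plot is the fraction of data points whose absolute percent relative error does not exceed a given threshold. A minimal sketch is given below; the function name `cumulative_frequency` and the error values are hypothetical and are not data from this study.

```python
def cumulative_frequency(abs_errors, thresholds):
    """Fraction of points whose absolute percent relative error is <= each threshold."""
    n = len(abs_errors)
    return [sum(1 for e in abs_errors if e <= t) / n for t in thresholds]


# Hypothetical absolute percent relative errors of five predictions.
errors = [1.0, 2.0, 4.0, 6.0, 12.0]
fractions = cumulative_frequency(errors, [5.0, 10.0])
print(fractions)
```

In this toy example, three of the five points have an error of at most 5% and four of the five have an error of at most 10%.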
Additionally, the absolute relative errors of the generated correlations were compared. Figure 6 shows the AARE comparison between the aforementioned correlations. According to this figure, the correlation developed for CO2 solubility in the PBS polymer has the highest accuracy and the lowest AARE among the correlations generated in this research.

Trend analysis of the generated correlations. Trend analysis is a well-known technique for visualizing how the output varies with changes in the input variables 52,53 . The predictions of the CO2 solubility correlations are plotted versus temperature and pressure in Fig. 7 to investigate whether the generated correlations follow the expected trends of CO2 solubility with changes in pressure and temperature. According to Henry's law, CO2 solubility increases with decreasing temperature and increasing pressure 54 . Carbon dioxide also has a plasticizing effect 55 . This means that, as pressure increases, CO2 molecules are forced between the polymer chains, which expands the pore space between the molecules and consequently increases their mobility 56,57 . This makes it possible to absorb more gas molecules. Likewise, as the temperature decreases, the CO2 molecules have lower kinetic energy and are less likely to escape from the solution into a state with more freedom 58 . As a consequence, the solubility increases.
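The expected trend can be sketched with a simple Henry's law model in which the dissolved amount is proportional to pressure and the Henry's constant follows a van't Hoff-type temperature dependence with a negative heat of solution. This is an illustrative assumption, not one of the GEP correlations developed in this work. The function name `henry_solubility` and the parameter values `h0` and `dh_s` are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J/(mol K)


def henry_solubility(p_mpa, t_k, h0=1.0e-4, dh_s=-10000.0):
    """Illustrative solubility S = H(T) * P with H(T) = h0 * exp(-dh_s / (R * T)).

    h0 (g/g per MPa) and dh_s (J/mol, negative for exothermic sorption)
    are made-up values chosen only to show the trend.
    """
    henry_constant = h0 * math.exp(-dh_s / (R * t_k))
    return henry_constant * p_mpa


# Solubility rises with pressure at fixed temperature ...
s_low_p = henry_solubility(5.0, 350.0)
s_high_p = henry_solubility(10.0, 350.0)
# ... and falls with temperature at fixed pressure.
s_hot = henry_solubility(10.0, 400.0)
print(s_low_p, s_high_p, s_hot)
```

With a negative heat of solution, the exponential factor shrinks as temperature rises, which reproduces the qualitative behavior described above.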
Outlier discovery of the developed correlations. Outlier discovery plays an important role in identifying data that differ from the other data points in a dataset 59 . The leverage technique is a trustworthy method for outlier discovery that uses the values of the standardized residuals and a matrix, namely the Hat matrix, made of the actual and the predicted values obtained from the correlations 60 . According to this approach, if most of the data points are located in the ranges of − 3 ≤ R ≤ 3 (R denotes the standardized residual) and 0 ≤ Hi ≤ H*, the results of the generated correlations are dependable and valid [61][62][63] . Figures 8, 9 and 10 show the Williams plots of the generated correlations for CO2 solubility in the PBS, PBSA, and PS polymers, respectively. For the PBS polymer, all of the data points except one are located in the valid zone. The results of the generated correlation for the PBSA polymer show that all of the data points are located in the valid region. Finally, Fig. 10
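A minimal sketch of the leverage check is given below. It assumes the widely used formulation in which the Hat matrix is built from the input variables, H = X (X^T X)^-1 X^T with an intercept column, and the leverage limit is H* = 3(k + 1)/n for k inputs and n data points. These definitions, the scaling of the residuals, the function names `hat_diagonal` and `williams_check`, and the data values are all assumptions for illustration and may differ from the procedure used for Figs. 8-10.

```python
import math


def hat_diagonal(X):
    """Diagonal of H = X (X^T X)^-1 X^T for a design matrix given as a list of rows."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    # Invert A by Gauss-Jordan elimination with partial pivoting.
    M = [A[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    inv = [row[p:] for row in M]
    return [sum(x[i] * inv[i][j] * x[j] for i in range(p) for j in range(p)) for x in X]


def williams_check(inputs, s_exp, s_cal):
    """Return (hat value, standardized residual, is_valid) for each data point."""
    n = len(inputs)
    X = [[1.0, p, t] for p, t in inputs]  # intercept, pressure, temperature
    h = hat_diagonal(X)
    h_star = 3.0 * len(X[0]) / n  # assumed leverage limit 3(k + 1)/n
    res = [e - c for e, c in zip(s_exp, s_cal)]
    sd = math.sqrt(sum(r * r for r in res) / n)  # assumed residual scaling
    out = []
    for hi, ri in zip(h, res):
        r_std = ri / sd if sd > 0 else 0.0
        out.append((hi, r_std, hi <= h_star and abs(r_std) <= 3.0))
    return out


# Hypothetical (pressure in MPa, temperature in K) inputs and solubilities in g/g.
inputs = [(2.0, 323.0), (8.0, 323.0), (14.0, 373.0),
          (20.0, 423.0), (5.0, 453.0), (11.0, 393.0)]
s_exp = [0.010, 0.040, 0.050, 0.055, 0.012, 0.035]
s_cal = [0.011, 0.039, 0.052, 0.054, 0.012, 0.036]
result = williams_check(inputs, s_exp, s_cal)
print(result)
```

A point is treated as valid only when both conditions hold; points with a hat value above H* are out of leverage, and points with |R| > 3 are suspected.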

Conclusions
The present research aimed to predict CO2 solubility, an influential parameter in polymerization processes. Three polymers were considered in this work: PBS, PBSA, and PS. For this purpose, the gene expression programming (GEP) technique was applied to a wide-ranging dataset gathered from the literature. The results showed that the correlation generated for the PBS polymer has the highest accuracy in predicting CO2 solubility, with an AARE of 1.63%, SD of 0.028, and RMSE of 0.001. The CO2 solubility curves plotted in the trend analysis demonstrated that all three correlations generated in this study follow the actual trends of CO2 solubility variation. The simple generated correlations can be applied over wide ranges of pressure and temperature with high accuracy. The leverage approach showed that all the data points appear to be reliable and valid except four, which fell in the lower suspected and out-of-leverage zones. To simulate CO2 solubility in polymers more precisely in future work, it is recommended to generate new correlations and to develop intelligent schemes.

Figure 10. The Williams plot of the generated correlation for the PS polymer (standardized residuals versus hat values; legend: valid data, out of leverage, leverage limit, upper suspected limit, lower suspected limit).