Developing an accurate empirical correlation for predicting anti-cancer drugs’ dissolution in supercritical carbon dioxide

This study introduces a universal correlation, based on a modified form of the Arrhenius equation, to estimate the solubility of anti-cancer drugs in supercritical carbon dioxide (CO2). The correlation combines an Arrhenius-shape term with a departure function and predicts the solubility from pressure, temperature, and carbon dioxide density. The pre-exponential part of the Arrhenius term is a linear function of temperature and carbon dioxide density, and its exponential part is a function of the inverse of pressure. The departure function correlates linearly with the natural logarithm of the ratio of carbon dioxide density to temperature. The reliability of the proposed correlation is validated against all literature data for the solubility of anti-cancer drugs in supercritical CO2, and its predictive performance is compared with ten empirical correlations available in the literature. The developed correlation gives an absolute average relative deviation (AARD) of 9.54% for 316 experimental measurements, whereas the most accurate literature correlation gives AARD = 14.90% over the same database. This 56.2% improvement in the accuracy of solubility prediction for anti-cancer drugs in supercritical CO2 is the primary outcome of the current study.


Results and discussion
This section presents the idea behind the modified Arrhenius correlation, adjusts its unknown coefficients, and compares its accuracy with that of other available correlations. The performance of the modified Arrhenius correlation is then analyzed using different graphical methods. Finally, the correlation is employed to examine the effect of operating conditions on anti-cancer drug solubility in supercritical CO2 (SCCO2).
Developing the modified Arrhenius correlation. Extensive data processing was performed on the experimental solubility values of each drug in SCCO2 to reach the following general form of the proposed correlation:

$$y_2 = \text{Arrhenius term} + \text{departure function} \tag{11}$$

Equation (11) states that the anti-cancer drug solubility in SCCO2 can be accurately estimated by combining an Arrhenius term and a departure function.
At this stage, it is necessary to clarify how the pre-exponential and exponential parts of the Arrhenius term are related to the influential variables. The departure function is then incorporated to reduce the deviation between the Arrhenius term predictions and the experimental measurements.

Table 1. Literature data for solubility of anti-cancer drugs in supercritical carbon dioxide. Columns: CO2 (1) + drug (2); temperature (K); pressure (MPa); CO2 density (kg/m3); drug solubility* × 10^6.

Table 2. Available empirical correlations for solute/drug solubility in supercritical carbon dioxide.
Correlation: Formula
Equation (1), Jouyban et al. 36: $\ln y_2 = a_1 + a_2\rho + a_3 P^2 + a_4 P T + a_5 T/P + a_6 \ln(\rho)$
Equation (2), Kumar and Johnston 37: $\ln y_2 = a_1 + a_2\rho + a_3/T$
Equation (3), Garlapati and Madras 38: $\ln y_2 = a_1 + (a_2 + a_3\rho)\ln(\rho) + a_4/T + a_5\ln(\rho T)$
Equation (4), Bian et al. 39: […]
Equation (5), Bartle et al. 40: $\ln$ […]
Equation (6), Méndez-Santiago and Teja 41: $T\ln(y_2 P) = a_1 + a_2\rho + a_3 T$
Equation (7), Sodeifian et al. 42: $\ln$ […]
Equation (8), Tan et al. 43: $\ln y_2 = a_1\ln(\rho T) + a_2\rho + a_3/T + a_4$
Equation (9), Gordillo et al. 44: $\ln y_2 = a_1 + a_2 P + a_3 P^2 + a_4 P T + a_5 T + a_6 T^2$
Equation ( […]

Spearman and Pearson are two well-known relevancy discovery methods in the field of data processing 62. They express the relevancy between a feature-response pair of variables by a factor in the range of −1 to +1. Negative, zero, and positive factors correspond to indirect dependency, no relation, and direct dependency, respectively 62,66. The strength of either a direct or an indirect relevancy increases with the magnitude of the factor 67. Furthermore, a higher absolute value of the Spearman factor than of the Pearson factor indicates that the non-linear relationship is stronger than the linear one, and vice versa 62,66. Figure 1 shows the relevancy factors between anti-cancer drug solubility and pressure, temperature, and pure SCCO2 density. This figure confirms that direct relationships exist between the response and all feature variables. The anti-cancer drug solubility has the strongest relationship with pressure and the weakest dependency on temperature. Moreover, since the Pearson factors for temperature and CO2 density are higher than the Spearman ones, the linear relationship dominates the non-linear one for these two variables. The higher Spearman than Pearson factor for pressure shows that the anti-cancer drug solubility relates non-linearly to pressure. These findings agree with the mathematical form of the Arrhenius model: the pre-exponential term can be a function of temperature and CO2 density, and the exponential term provides the non-linear relation with pressure.
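The comparison of Pearson and Spearman factors described above can be illustrated with a minimal sketch. It assumes the usual definitions (Spearman as the Pearson factor of the ranks); the function names and the synthetic pressure/response values are hypothetical and are not taken from Table 1.

```python
import math

def pearson(x, y):
    """Pearson factor: strength of the linear relationship, in [-1, +1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman factor: Pearson factor computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical feature/response pair (illustrative only, not from Table 1):
# a monotonic but non-linear response of the form exp(-c / P).
pressure = [12.0, 16.0, 20.0, 24.0, 28.0, 32.0]
response = [math.exp(-60.0 / p) for p in pressure]

r_p = pearson(pressure, response)
r_s = spearman(pressure, response)
print(f"Pearson = {r_p:.3f}, Spearman = {r_s:.3f}")
```

For a monotonic but curved response such as this one, the Spearman factor exceeds the Pearson factor, which is the pattern the text reports for pressure.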
The previous findings indicate a linear dependency of the anti-cancer drug solubility on temperature and CO2 density and a non-linear relationship with pressure. Figures 2, 3 and 4 are plotted to confirm these findings through visual inspection.
The experimental solubility of typical anti-cancer drugs in SCCO2 as a function of temperature is shown in Fig. 2. This figure confirms that the temperature dependency of the solubility is almost linear. The departure function compensates for the deviation from the linear relationship.
Since the density of pure SCCO2 changes with both pressure and temperature, the dependency of the anti-cancer drug solubility on CO2 density alone cannot be shown in a two-dimensional graph. Hence, Fig. 3 depicts the solubility of a typical anti-cancer drug versus the product of pressure and CO2 density. The linear dependency of the solubility on the pure SCCO2 density can be inferred from this figure. As with temperature, the departure function can compensate for the deviation from the linear relationship between drug solubility and CO2 density.
The semi-logarithmic plot of typical anti-cancer drug solubility in SCCO2 versus the inverse of pressure is shown in Fig. 4. This figure confirms that the solubility relates exponentially to the inverse of pressure, i.e., $\exp(-E_a/P)$. The observed deviation between the experimental data and the predictions of the Arrhenius term for the pressure effect is then reduced by applying the departure function.
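The semi-logarithmic check can be sketched as follows: if the solubility behaves as $A\exp(-E_a/P)$ at fixed temperature, then $\ln y$ plotted against $1/P$ is a straight line with slope $-E_a$. The data below are synthetic and the values of `A_true` and `Ea_true` are hypothetical; the sketch only shows the mechanics of the fit, not the paper's results.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical isothermal data generated from y = A * exp(-Ea / P).
A_true, Ea_true = 2.0e-4, 45.0
pressures = [12.0, 16.0, 20.0, 24.0, 28.0, 32.0]
solubility = [A_true * math.exp(-Ea_true / p) for p in pressures]

# Semi-log transform: ln(y) versus 1/P.
inv_p = [1.0 / p for p in pressures]
ln_y = [math.log(y) for y in solubility]
intercept, slope = fit_line(inv_p, ln_y)

Ea_fit = -slope            # slope of the semi-log line is -Ea
A_fit = math.exp(intercept)  # intercept is ln(A)
print(f"Ea = {Ea_fit:.3f}, A = {A_fit:.3e}")
```

With real measurements the points scatter around the line, and that residual scatter is what the departure function is meant to absorb.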
In summary, the following Arrhenius-shape correlation 68 is inferred to estimate the anti-cancer drug solubility in SCCO2 (Eq. 12):

$$\text{Arrhenius term} = f_1(T, \rho)\,\exp\left(-\frac{E_a}{P}\right) \tag{12}$$

Some deviation is expected between the Arrhenius term predictions and the actual solubility data. However, the accuracy of the Arrhenius-shape model can be enhanced by diminishing the observed deviation. Therefore, a new term (the departure function) is added to the Arrhenius-shape part to compensate for this deviation. The observed deviation shows the highest compatibility with the natural logarithm of the ratio of CO2 density to temperature. In summary, the general form of the proposed correlation is obtained by combining the Arrhenius term and the departure function (Eq. 14).

Figure 1. Relevancy between the solubility of anti-cancer drugs in supercritical CO2 and temperature, pressure, and carbon dioxide density.
Equation (15) presents the final form of the proposed correlation for estimating the solubility of anti-cancer drugs in supercritical CO2.
The pre-exponential part of the Arrhenius term linearly combines the effects of temperature and CO2 density, while its exponential part is a function of pressure only. The departure function relates linearly to the natural logarithm of the ratio of CO2 density to temperature.
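One functional form consistent with this verbal description is written out below as an illustration. The coefficient names $a_1$ to $a_6$, and the choice to give the departure function its own intercept, are assumptions made for this sketch; the paper's exact parameterization may differ.

```latex
\begin{align}
\text{Arrhenius term} &= (a_1 + a_2 T + a_3 \rho)\,\exp\!\left(-\frac{a_4}{P}\right) \\
\text{departure function} &= a_5 + a_6 \ln\!\left(\frac{\rho}{T}\right) \\
y_2 &= (a_1 + a_2 T + a_3 \rho)\,\exp\!\left(-\frac{a_4}{P}\right) + a_5 + a_6 \ln\!\left(\frac{\rho}{T}\right)
\end{align}
```

In this reading, $a_1 + a_2 T + a_3\rho$ plays the role of $f_1(T, \rho)$ and $a_4$ the role of $E_a$ in the Arrhenius term.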
Adjusting the coefficients of the correlations. After determining the general form of the proposed correlation, its coefficients must be adjusted using an appropriate method. The differential evolution (DE) optimization algorithm 69,70 is employed to adjust these unknown coefficients through non-linear regression. The absolute average relative deviation (AARD%) between the model predictions and the actual measurements is the objective function of the optimization stage. The AARD% formula is given by Eq. (16) 71. Table 3 presents the adjusted coefficients for estimating the solubility of different anti-cancer drugs in SCCO2.
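The fitting step can be sketched as below. The AARD% function uses the usual definition (mean of |experimental − calculated| / experimental, times 100), which is assumed to match Eq. (16). The optimizer is a minimal DE/rand/1/bin variant written for illustration, not the authors' implementation; the two-parameter model, its bounds, the control parameters, and the synthetic data are all hypothetical.

```python
import math
import random

def aard_percent(y_exp, y_calc):
    """Absolute average relative deviation, in percent (assumed form of Eq. 16)."""
    n = len(y_exp)
    return 100.0 / n * sum(abs(e - c) / e for e, c in zip(y_exp, y_calc))

def differential_evolution(cost, bounds, pop_size=20, f=0.7, cr=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin minimizer with in-place population updates.
    Returns (best_vector, best_cost, best_cost_per_generation)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(ind) for ind in pop]
    history = [min(costs)]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy selection: never accept a worse trial
                pop[i], costs[i] = trial, trial_cost
        history.append(min(costs))
    best = min(range(pop_size), key=lambda k: costs[k])
    return pop[best], costs[best], history

# Hypothetical two-parameter model y = A * exp(-Ea / P) and synthetic "measurements".
pressures = [12.0, 16.0, 20.0, 24.0, 28.0, 32.0]
y_exp = [2.0e-4 * math.exp(-45.0 / p) for p in pressures]

def objective(params):
    a, ea = params
    y_calc = [a * math.exp(-ea / p) for p in pressures]
    return aard_percent(y_exp, y_calc)

best, best_aard, history = differential_evolution(
    objective, bounds=[(1.0e-5, 1.0e-3), (1.0, 100.0)])
print(f"A = {best[0]:.3e}, Ea = {best[1]:.2f}, AARD = {best_aard:.3f}%")
```

Because selection is greedy, the best AARD% in the population can only stay the same or decrease from one generation to the next.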
The literature has already used some of the correlations in Table 2 to estimate anti-cancer drug solubility in SCCO2. In those cases, the researchers readjusted the coefficients and applied them to drug/SCCO2 systems. The coefficients of the remaining correlations are readjusted in the current study. The Supplementary file presents these coefficients, and the resulting AARD% values are reported in Table 4. In this table, the highlighted (gray) cells are calculated in the present study, and the unshaded cells are those reported in the literature. As mentioned earlier, the coefficients used to calculate these AARD% values are presented in the Supplementary file. Bold font marks the smallest AARD% (the best result) obtained for a specific anti-cancer drug in supercritical CO2.
The modified Arrhenius correlation provides the most accurate results for the solubility of six out of twelve anti-cancer drugs in SCCO2 (i.e., sorafenib tosylate, sunitinib malate, azathioprine, tamsulosin, 5-fluorouracil, and thymidine). The correlation derived by Bian et al. 39 predicts the solubility of busulfan, tamoxifen, and decitabine in supercritical CO2 with the highest accuracy. Finally, the Garlapati and Madras 38, Sodeifian et al. 42, and Tan et al. 43 correlations each provide the most accurate predictions for only one anti-cancer drug.
Figure 5 shows the results of the ranking analysis on the accuracy of the modified Arrhenius model and the available empirical correlations for calculating the solubility of different anti-cancer drugs in supercritical CO2. The correlation proposed in the current study not only gives the most accurate predictions for six anti-cancer drugs but also takes two second and three third ranks. Its worst accuracy is associated with capecitabine solubility in SCCO2 (the fourth rank). The correlation proposed by Bian et al. 39, with three first, two second, four third, one fourth, and one ninth ranks, is the next most reliable model for the given task. On the other hand, the correlations proposed by Gordillo et al. 44, Jouyban et al. 36, and Tan et al. 43 have the highest levels of uncertainty, respectively.
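The per-drug ranking described above amounts to sorting the correlations by their AARD% for each drug. A minimal sketch is given below; the function name, the model labels, and the AARD% values are placeholders chosen for illustration and are not taken from Table 4. Ties are assumed to share the better rank.

```python
def rank_correlations(aard_by_model):
    """Return {model: rank}, where rank 1 is the smallest AARD%.
    Tied models share the same (best) rank."""
    ordered = sorted(aard_by_model.values())
    return {model: ordered.index(value) + 1
            for model, value in aard_by_model.items()}

# Hypothetical AARD% values for one drug (placeholders, not from Table 4).
example = {"model A": 8.1, "model B": 12.4, "model C": 8.1, "model D": 20.0}
ranks = rank_correlations(example)
print(ranks)
```

Repeating this for every drug and counting how often each correlation takes each rank gives the kind of summary that the ranking figure reports.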
Overall ranking of the correlations. This section compares the accuracy of the modified Arrhenius model and the available empirical correlations in the literature over the whole database (solubility of all anti-cancer drugs in supercritical CO2). Figure 6 illustrates the results of the ranking analysis for the overall accuracy of the considered empirical correlations.
As expected, the modified Arrhenius correlation (with the smallest overall AARD = 9.54%) takes first place for the whole experimental databank. The Bian et al. correlation 39, with an overall AARD = 14.90%, is the next most accurate model. All available correlations in the literature therefore have an AARD% equal to or higher than 14.90%. Indeed, measured relative to its own AARD of 9.54%, the modified Arrhenius correlation improves the accuracy of the available literature models by at least 56.2%.
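The reported 56.2% can be reproduced from the two AARD values when the difference is normalized by the AARD of the modified Arrhenius correlation. This reading is an inference from the reported numbers. For comparison, the second line normalizes by the literature value instead.

```latex
\begin{align}
\frac{14.90 - 9.54}{9.54} \times 100\% &\approx 56.2\% \\
\frac{14.90 - 9.54}{14.90} \times 100\% &\approx 36.0\%
\end{align}
```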
Performance monitoring of the modified Arrhenius correlation. The agreement between the experimental solubility data and the values calculated by the modified Arrhenius correlation is plotted in Fig. 7. This figure includes the solubility of all anti-cancer drugs in supercritical carbon dioxide. Despite the very small magnitude of the solubility data (~$10^{-4}$), acceptable agreement can be observed between the actual and calculated values. The modified Arrhenius correlation provides an $R^2$ (regression coefficient, Eq. 17a 72) of 0.98479 and a standard error of $2.02 \times 10^{-5}$ for all 316 experimental data.
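As an illustration of the $R^2$ statistic, the sketch below uses the common coefficient-of-determination definition, $1 - SS_{res}/SS_{tot}$. Whether this matches Eq. 17a exactly is an assumption, and the numerical values are hypothetical, chosen only to be of the order of $10^{-4}$.

```python
def r_squared(y_exp, y_calc):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form of Eq. 17a)."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(y_exp, y_calc))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot

# Hypothetical values of the order of 1e-4 (illustrative only).
y_exp = [1.0e-4, 2.0e-4, 3.0e-4, 4.0e-4]
y_calc = [1.1e-4, 1.9e-4, 3.2e-4, 3.9e-4]
r2 = r_squared(y_exp, y_calc)
print(f"R^2 = {r2:.4f}")
```

Because $R^2$ is a ratio of sums of squares, it is unaffected by the small absolute scale of the solubility values.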
Table 3. Adjusted coefficients of the proposed correlation for estimating the solubility of anti-cancer drugs in supercritical CO2.

Table fragment (last column: Gordillo 44). Sorafenib tosylate: 10.30 19, 13.70 19, 15.30 19.

This figure confirms that the proposed correlation has successfully correlated the experimental solubility data with their corresponding influential variables. Excluding only three experiments, all other solubility measurements are estimated within −0.5 < RD < 0.5.
Differentiating between outlier/valid data. This section focuses on distinguishing valid from suspect data. Experimentally measured information often contains noise 74 and uncertainties 75. The leverage method is used to conduct this analysis 76. As Fig. 9 shows, the leverage method discriminates between valid (□ symbols) and suspect (○ symbols) information by plotting the standardized residual (SR) as a function of the hat index. The SR is obtained by dividing the residual error (RE) by its standard deviation (SD). Equations (18) to (21) present the formulas for the RE, the average value of RE, the SD, and the SR, respectively 77,78. Applying the leverage method to the experimental databank and the estimated values of anti-cancer drug solubility (Fig. 9) shows that the majority of the experimental data (92.72%) is valid, and only 23 data points may be outliers.
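The standardized-residual step of this analysis can be sketched as follows. Two choices here are assumptions: the RE is taken as experimental minus calculated, and the SD is the sample standard deviation. The hat index and warning leverage are not computed in this sketch. The ±3 limit and the synthetic residuals are illustrative.

```python
import math

def standardized_residuals(y_exp, y_calc):
    """SR = RE / SD, following the description in the text.
    Assumptions: RE = y_exp - y_calc, SD = sample standard deviation of RE."""
    re = [e - c for e, c in zip(y_exp, y_calc)]
    n = len(re)
    re_mean = sum(re) / n
    sd = math.sqrt(sum((r - re_mean) ** 2 for r in re) / (n - 1))
    return [r / sd for r in re]

def flag_suspect(sr, limit=3.0):
    """Indices whose standardized residual lies outside [-limit, +limit]."""
    return [i for i, s in enumerate(sr) if abs(s) > limit]

# Hypothetical data: 20 points with small alternating residuals and one large one.
residuals = [1.0e-6 if i % 2 == 0 else -1.0e-6 for i in range(20)]
residuals[7] = 2.0e-5  # deliberately planted suspect point
y_exp = [1.0e-4] * 20
y_calc = [e - r for e, r in zip(y_exp, residuals)]

sr = standardized_residuals(y_exp, y_calc)
suspect = flag_suspect(sr)
print(suspect)
```

In the full leverage method, each point is additionally placed along the hat-index axis and compared with the warning leverage, so a point can be flagged on either axis.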
The accuracy of the modified Arrhenius correlation was confirmed above using experimental data and comparison with other available models in the literature. Moreover, the current analysis confirms the validity of the experimental databank. Therefore, the modified Arrhenius correlation can be readily used in real applications.
The number of possible outliers for each anti-cancer drug is reported in Fig. 10. The experimental solubility data for capecitabine, paclitaxel, and 5-fluorouracil, with no outliers, appear to be the most reliable. On the other hand, the solubility measurements of decitabine and tamoxifen in SCCO2 (with seven and six outliers, respectively) are the most questionable.
Investigating the effect of operating conditions. Table 4 showed that the modified Arrhenius correlation predicts the solubility of sunitinib malate (AARD = 3.89%) and thymidine (AARD = 16.64%) with the highest and lowest accuracies, respectively. This section investigates the effect of pressure and temperature on the solubility of these anti-cancer drugs in SCCO2 using both experimental data and model predictions. Figure 11 shows the effect of isothermal variation of the operating pressure on the sunitinib malate solubility in supercritical carbon dioxide, while Fig. 12 shows the same for the thymidine/SCCO2 binary system.
Apart from some scattered data in Fig. 12, the solubility of anti-cancer drugs in SCCO2 generally increases with increasing pressure or temperature. This finding is in complete agreement with the relevancy analysis (see Fig. 1). Moreover, an acceptable level of agreement exists between the actual solubility data and the associated predictions of the modified Arrhenius correlation.
The relatively high scatter of the measurements for the thymidine/SCCO2 system (especially at higher temperatures) is responsible for the observed deviation between the actual and modeled data. It is worth noting that these are still the most accurate predictions among the eleven empirical correlations (Supplementary Information).
Investigating the effect of drug type. From the average solubility of the different anti-cancer drugs, it is concluded that busulfan and tamoxifen have the highest tendency to dissolve in supercritical CO2, while sorafenib tosylate and tamsulosin show the lowest tendency. Figures 13 and 14 present the modeling and experimental data for the two high-solubility and the two low-solubility anti-cancer drugs in SCCO2, respectively. The provided AARD of 7.92% (busulfan) and 7.40% (tamoxifen) for […] It should be mentioned that this level of uncertainty for such an ultra-low variable (anti-cancer drug solubility in SCCO2) has its own scientific and practical merits.

Legend of the leverage plot: valid data; suspect data; upper suspect limit, 3% cut-off; lower suspect limit, −3% cut-off; warning leverage.
Maximum achievable drug solubility in SCCO2. The previous analysis showed that busulfan is the most soluble anti-cancer drug in supercritical CO2. Therefore, to locate the operating condition that maximizes the busulfan solubility in SCCO2, the solubility must be examined over all pressures and temperatures. Figure 15 shows the busulfan solubility in SCCO2 over the full range of operating conditions from experimental and modeling perspectives.
As in all other analyses, the modified Arrhenius correlation performs excellently here. The figure also clarifies that the positive effect of pressure on the drug solubility intensifies with increasing temperature. In other words, the slope of solubility with respect to pressure increases with temperature.
Finally, both experimental data and modeling results show that the highest busulfan solubility in SCCO2 may be achieved at the highest allowable temperature and pressure (i.e., P = 40 MPa, T = 338 K).

Conclusion
A combination of an Arrhenius-shape term and a departure function is proposed to correlate anti-cancer drug solubility in supercritical carbon dioxide. The pre-exponential part of the Arrhenius-shape term is linearly related to temperature and carbon dioxide density, and its exponential part is a function of the inverse of pressure. The departure function is linearly related to the natural logarithm of the ratio of carbon dioxide density to temperature. The developed correlation outperformed all well-known literature equations for predicting solute solubility in supercritical carbon dioxide. The modified Arrhenius correlation provided AARD = 9.54% and $R^2$ = 0.98479 for all experimental datasets in the literature. In contrast, the most accurate correlation in the literature (i.e., the Bian et al. correlation) showed AARD = 14.90% for the same database. The developed correlation therefore improves the accuracy of predicting anti-cancer drug solubility in supercritical CO2 by more than 56%. The relevancy analysis showed that anti-cancer drug solubility in supercritical CO2 increases with increasing pressure or temperature. Furthermore, less than 7.5% of the literature data are found to be suspect, and the remaining data (more than 92.5%) are valid measurements.

Data availability
All data generated or analyzed during this study are available on reasonable request from the corresponding author.