Investigation of COSMO-SAC model for solubility and cocrystal formation of pharmaceutical compounds

In this study, a predictive model named COSMO-SAC was investigated in solid/liquid equilibria for pharmaceutical compounds. The examined properties were the solubility of drug in the pure and mixed solvents, octanol/water partition coefficient, and cocrystal formation. The results of the original COSMO-SAC model (COSMO-SAC (2002)) was compared with a semi-predictive model named Flory–Huggins model and a revised version of the COSMO-SAC (COSMO-SAC (2010)). The results indicated the acceptable accuracy of the COSMO-SAC (2002) in the considered scope. The results emphasized on the suitability of the COSMO-SAC model for simple molecules containing C, H, and O by covalent and hydrogen bonding interactions. Applicability of the COSMO-SAC for more complicated molecules made of various functional groups such as COO and COOH doubly requires more modification in the COSMO-SAC.


Methods
COSMO file and sigma profile. As described before, the basis of the COSMO-SAC model is quantum mechanics through density function theory calculations. Several commercial or free software provide preliminary information for COSMO-SAC in the form of a text file called COSMO-file. Dmol 3 module in Materials Studio and academic free software GAMESS are few examples. In COSMO calculations, a molecule separates into several parts called segment and charge distributions over entire segments are calculated in order to neutralize whole molecule. Location of segments, segment areas and charge densities are the computed properties in COSMO file. In order to perform COSMO-SAC calculations, the following data must obtain from COSMO-file: (1) surface area ( A ) and cavity volume of the molecule ( V ), (2) location of segment (a vector with x, y and z coordination), its charge density ( σ * n ) and area ( A n (σ ) ). The mentioned information were modified in order to make the sigma profile ( p(σ ) ) required for COSMO-SAC calculations. Klamt et al. 4 introduced the following equation to average the charge densities from COSMO-file In the above equation, d mn is the distance between two segments n and m. The r n (segment radius) is obtained from segment area as follows: Mullins et al. 16 reported the value of r ave . The sigma profile defined as the probability of finding segments with charge density σ m : where n is determined from accounting the number of segments with specific charge density σ m and A(σ m ) is surface area with charge density σ m .
Generally, for most molecules, charge density values range between − 0.025 to 0.025 ė A 2 . Four steps for generating the sigma profile are as below: 1. Consider 50 intervals by 0.001 increments in charge density range − 0.025 to 0. 025. 2. Each interval is defined by lower and upper bounds, σ left and σ right . Firstly, find the charge densities distributed at interval i and calculated their contributions according to: 3. Afterward, calculate probabilities at lower and upper bounds of interval i as below: (1) σ m = n σ * www.nature.com/scientificreports/ 4. The sigma profile is generated by plotting sigma values versus the calculated probabilities.
As described in literature review, some authors divided the sigma profile into parts to have a better description of hydrogen-bonding (hb) interactions. Hsieh et al. 22 proposed to separate the sigma profile into non hydrogen bounding, hydroxyl group (OH) and non-hydroxyl group as follows equation (COSMO-SAC (2010)): where p NHB (σ m ) donates probabilities of all non-hydrogen bounding atoms, p OH (σ m ) shows probabilities of OH bounding and p OT (σ m ) determines F, N, and hydrogen atoms connected to F and N atoms. The above-mentioned contributions were determined as follows: where σ o is threshold for hydrogen bounding determination and its values is 0.007 ė A 2 . COSMO-SAC model. COSMO-SAC (2002). In the COSMO-SAC model, activity coefficients computed by solvation energy were obtained from ab initio solvation calculation at two steps: (1) the dissolution of a solute in the conductor, (2) conversion of the conductor into a real solvent. The activity coefficient of component i in solvent S in the COSMO-SAC ( γ i,S ) obtained by considering two contributions; combinatorial part. (γ C i,s ) and residual part(γ R i,s ) as follows 6 : The size and shape differences of the molecules are accounted in the combinatorial part and calculated by the Staverman-Guggenheim term as follows 26 : where θ i , φ i and l i are defined as follows: In the above expressions, q i and r i are related to cavity volume of component i ( V i ) and total surface area of molecule i ( A i ) obtained from the COSMO-file and defined as follows: where r o and q o are the normalized volume and normalized surface area. The residual part of the COSMO-SAC (2002) was defined as follows 6,17 : where n i , effective segment number of molecule i, is correlated with effective segment surface area ( a eff ) and surface area of molecule i ( A i ) according to below expression: where Ŵ(σ m ) is the segment activity coefficient and calculated from: www.nature.com/scientificreports/ The exchange energy �W(σ m , σ n ) is defined: The c hb and σ hb are the energy-type constant and cutoff value for hydrogen bonding interaction 16 . The σ acc and σ don are maximum and minimum values of σ m and σ n . α ′ accounts the misfit energy and the T and R are system temperature and the universal gas constant. The values of above mentioned parameters are reported in Mullins et al. 16 . In Eq. (16), the sigma profile for the mixture ( P S (σ ) ) are obtained from: COSMO-SAC (2010). After establishing NHB, OH, and OT sigma profiles, the segment activity coefficient calculates as follows: where subscript j shows pure liquid or mixture and subscript t denotes NHB, OH, and OT sites. The exchange energy has defined based on interaction between segments of different types, and is given by: In the above equation, φ is the volume fraction ( φ i = x i V i i x i V i ) and V is the molar volume.χ is the Flory-Huggins interaction parameter obtained from the Hansen solubility ( δ ) contributions in the forms non-polar (dispersion) forces (d), polar forces (p) and hydrogen-bonding (h) effects as follows 27 : The Hansen solubility parameters and their contributions were obtained by group contribution methods according to the following equations 28 : . Solid-liquid equilibria. In solid-liquid equilibria, the solid solubility in liquid phase is calculated according to the following expression: where x i and γ i stand the solubility and activity coefficient of compound i. The activity coefficient in the above expression was computed from the considered models as described before. H m , C P and T m represent the fusion enthalpy, the heat capacity of phase change between solid and liquid phases and the melting point temperature, respectively. In the current study, the second term of Eq. (27) was neglected ( C P = 0).

Partition coefficient.
When the equilibrium condition between two immiscible liquid phases establishes, the components distribute between two phases. The distribution of component i between two phases α and β measured by partition coefficient as follows 15 : where x α i and x where a and b are stoichiometric coefficient of substances A and B in the cocrystal. In the above equations, the K cc is solubility product and are computed by the following equation: The activity coefficients in Eq. (31) computed from the examined model. The solubility product (K CC ) is depend only on temperature and independent to solvent type. By knowing solubility product at single point, it can be applied to other conditions. After obtaining solubility product for desired system, the invariant points as intersections of cocrystal line and solubility line were computed by simultaneous solvation of Eqs. (27) and (31). Afterward, the cocrystal region is determined by varying drug mole fraction between two invariant points and obtaining API mole fraction from Eq. (31).

Statistical analysis.
In order to explore model precision in comparison to experimental data, several statistics were applied such as absolute average percentage deviation (% AAD), root mean square error (RMSE), mean square error (MSE), normalized root mean square error (NRMSE) and normalized mean square error (NMSE). MSE, NRMSE and NMSE were obtained from goodness of Fit function in MATLAB programming software. Absolute average percentage deviation was calculated as following equations: where cal are exp calculated and experimental data of desired properties and n is number of experimental data. The root mean square error (RMSE) was obtained as follows:  Figures 1 and 2 compare sigma profiles generated in current studies for ibuprofen and acetyl salicylic acid in comparison to sigma profiles in the database provided by Mullins et al. 16 . Based   www.nature.com/scientificreports/ on Figs. 1 and 2, the same trends between results in this study and Mullins et al. 16 were observed. The small departures between two curves originated from the software version and the sigma profile generation program.
After generating the sigma profiles and providing the COSMO-SAC computation program for the activity coefficient, the solubilities in the binary and ternary systems were calculated and compared by experimental data obtained from the literature.  15 . The accuracy of these two COSMO-SAC models has been comprehensively examined through a very large dataset, containing 29,173 data points of infinite dilution activity coefficient and 139,921 VLE data points of 6940 binary mixtures 31 . The mentioned inconsistency arises from different universal constants implemented in sigma profile generation. The differences in investigated systems attribute the second reason for the observed inconsistency.
It is interesting that the COSMO-SAC (2002) was obtained by only eight universal constant parameters without any further modifications. A list of considered pharmaceutical compounds and their physical properties and references for experimental data were presented in supplementary materials (Table S1).
The Hansen solubility parameters, molar volumes for the Flory-Huggins model and the COSMO molar volume of the examined pharmaceutical compounds and solvents were presented on Table 1. Based on Table1, the molar volume obtained from group contribution method in Barton 28 and the COSMO calculations have some difference. Table 2 reports the COSMO-SAC (2002), the COSMO-SAC (2010), and the Flory-Huggins results for some pharmaceutical compounds categorized by the solvent type and sorted according to absolute average deviations (AAD%). The RMSE results for the COSMO-SAC (2002), the COSMO-SAC (2010), and the Flory-Huggins models were also reported in Table 2. Based on Table 2, the predictive model of the COSMO-SAC (2002) has a wide range of errors that are in agreement with errors reported by Hsieh et al. 15 . The COSMO-SAC (2010) and the Flory-Huggins have larger errors compared to the COSMO-SAC (2002).
According to Table 2, pharmaceutical compounds containing H, C and O with the lowest hydrogen bonding numbers have the lower error. Besides, the structure of molecule has a remarkable influence on accuracy. In the case of acetaminophen and acetyl salicylic acid, by solvent replacement from ethanol to acetone, deterioration in model prediction was observed. The impact of eliminating F atom from flurbiprofen observes in the lower error reported for ibuprofen. Although borneol and isoborneol have the same chemical formula, the accuracy of the COSMO-SAC (2002) for them is entirely different. The above studies implied that molecular structure, atoms, and intermolecular interaction must be widely incorporated into the COSMO-SAC model. Since, the COSMO-SAC (2002) provides better approximations of solubility in examined systems, we prefer utilizing the original COSMO-SAC (2002) in our further investigation on the binary and ternary systems. Afterward, two www.nature.com/scientificreports/ models, the COSMO-SAC (2002) and the Flory-Huggins models were considered for the octanol/water partition coefficient and cocrystal formation. Afterward, the ternary systems of pharmaceutical compounds in binary solvents were also examined. On the basis of Table 2, two pharmaceutical compounds, acetaminophen and salicylic acid, were suggested. Acetaminophen consists of 20 atoms H, C, N, and O and two functional groups, OH and NH. Salicylic acid consists of 16 atoms H, C, and O, and two functional groups, OH and COOH. Figure 4 presents the comparison between the experimental and calculated solubilities of acetaminophen in ethanol/water mixtures as a function of ethanol mole fraction at two temperatures, 293.15 and 303.15 K. According to Fig. 4, a good agreement between experimental data and the COSMO-SAC calculations observe. The observed trends of the COSMO-SAC as a function temperature match with the reported experiments.   www.nature.com/scientificreports/ Figure 5 shows the calculated solubility of salicylic acid in ethanol/ethyl acetate mixture compared to experimental data. On the basis of Fig. 5, a departure from experimental data was observed at higher ethyl acetate mole fraction. The ethyl acetate has a functional group COO which its interaction with COOH in salicylic acid has been ignored in the COSMO-SAC (2002).
The octanol/water partition coefficients for some pharmaceutical compounds obtained from the COSMO-SAC model. In Table 3, the results of the octanol/water partition coefficient from the COSMO-SAC model compared to experimental data from the national library of medicine 34 Table 3, the various accuracies obtained regarding activity ratio in the octanol/water partition coefficient. In the octanol/water partition coefficient, if the errors in the numerator and denominator cancel each other out, a good accuracy between the COSMO-SAC computation and experiment is harvested. Otherwise, the discrepancies in obtained errors were seen. It is possible that the COSMO-SAC model fails for solubility prediction (such as dapsone) but presents a reasonable estimation of the octanol/water partition coefficient due to the above discussions. As observed from Table 3, the simple molecules made of H, C, and O by only hydrogen bonding  Table 3, the octanol/water partition coefficients obtained from the Flory-Huggins model are farm from experimental data. In order to investigate a more complex system, a three-phases diagram of ternary system is explored by considering the sulfamethazine/salicylic acid cocrystal formation in methanol at 283.15 K, which studied by Ahuja et al. 35 . Details of calculation and methods were described in "Cocrystal formation" section. After performing the computation by the COSMO-SAC (2002), a triangular diagram of the considered system was plotted by a free software named ProSim Ternary Diagram. On the basis of Fig. 6 and experimental plots in Ahuja et al. 35 , some differences between experiments and the COSMO-SAC calculations were observed. The cocrystal region for SM/SA predicted by the COSMO-SAC is wider, while experimental data imply on the narrow region. The solubility line of SM in SA + ME mixture expanded in the COSMO-SAC model in comparison to experiments which interpreted by the COSMO-SAC ability in the considered system. The predicted solubility line of SA in the SM + SA is appropriately closer to the reported experimental data which indicates the good performance of the COSMO-SAC for SA. The reported inconsistencies in observed results originated from molecular structure, constituent atoms, and their interactions. The electronegative atoms S and N in sulfamethazine create the observed discrepancies, while their contributions were not considered in the COSMO-SAC (2002) model. The ternary phase diagram carbamazepine (CBZ)/acetylsalicylic acid (ASA) in ethanol (ET) at 298.15 K were computed by the COSMO-SAC (2002) and plotted in Fig. 7. Veith et al. 29 studied the CBZ/ASA/ET by PC-SAFT EOS. According to Veith et al. 29 , the PC-SAFT EOS without binary interaction parameters estimated the narrow  www.nature.com/scientificreports/

Conclusions
The COSMO-SAC as a predictive model has been gained a great attention in thermodynamic modeling and phase equilibria considerations. The eight universal parameters and predefined atomic radiuses for C, H, O, S, N, F, and Cl are the general basis of the COSMO-SAC model. In the current study, the COSMO-SAC model implemented in solid-liquid phase equilibria in form of solubility data in binary and ternary systems, octanol/ water partition coefficient, and cocrystal studies. For more comparison, the COSMO-SAC model was also compared with the Flory-Huggins model. The obtained results implied that molecular structure, constituent atoms, functional group, and their interactions have remarkable impacts on the obtained results. In general, the simple molecules made of atoms H, C, and O under special condition, atom N by simple covalent and hydrogen bonding interactions can be deliberated by the COSMO-SAC model. The presence of other atoms such as F and S and other functional groups such as COO and COOH made complex systems. This complexity provides some opportunities to modify the original the COSMO-SAC model. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.