Group contribution and atomic contribution models for the prediction of various physical properties of deep eutectic solvents

The urgency of advancing green chemistry from labs and computers into industry is well known. Deep eutectic solvents (DESs) are a promising category of novel green solvents that combine the main advantages of liquids and solids. Furthermore, they can be designed or engineered to have the characteristics desired for a given application. However, because they are rather new, no general models are available that predict the properties of DESs without requiring other properties as input. This is a particular setback when screening is required for feasibility studies, since a vast number of DESs are envisioned. For the first time, this study presents five group contribution (GC) and five atomic contribution (AC) models for the densities, refractive indices, heat capacities, speeds of sound, and surface tensions of DESs. The models, developed using the most up-to-date databank of various types of DESs, simply decompose the molecular structure into a number of predefined groups or atoms. The resulting AARD% values for densities, refractive indices, heat capacities, speeds of sound, and surface tensions were, respectively, 1.44, 0.37, 3.26, 1.62, and 7.59% for the GC models, and 2.49, 1.03, 9.93, 4.52, and 7.80% for the AC models. Perhaps even more important for designer solvents is the predictive capability of the models, which was also shown to be highly reliable. Accordingly, very simple yet highly accurate models are provided that are global for DESs and require no physical property information, making them useful predictive tools for a category of green solvents that is only starting to show its potential in green technology.


Equations 1 to 30 in Table 1 present the developed GC and AC models. All of the investigated properties are functions of temperature; thus, T denotes the system temperature in kelvin. The superscripts G and A denote the type of the model, GC or AC, respectively. Mw is the molecular weight of the DES in g/mol. $X_{1,i}$ and $X_{2,i}$ (where X = ρ, n, Cp, u or σ) are the contributions (weights) of each group/atom of type i for the GC and AC models. $k_i$ and $l_j$ indicate the numbers of occurrences of the functional group/atom of type i in the HBA molecule and of type j in the HBD molecule, respectively. p is the total number of HBA functional groups/atoms and q is the total number of HBD functional groups/atoms. $m_{HBA}$ and $m_{HBD}$ are the normalized numbers of moles of the HBA and HBD components making up the desired DES. For this purpose, the values of $m_{HBA}$ and $m_{HBD}$ are normalized based on the smaller of the two values in any DES. For example, if, for an arbitrary DES, the numbers of moles of HBA and HBD are 2 and 3, respectively, both should be divided by 2 (the smaller number of moles), leading to $m_{HBA}$ = 1 and $m_{HBD}$ = 1.5. These normalized values must be used in Eqs. 1 to 30. In this way, when using the proposed models, one of the HBA or HBD mole numbers will always be equal to one, while the other will be equal to or greater than one. The proposed models were developed based on data at atmospheric pressure; therefore, they are not recommended at higher or lower pressures.

Table 1. The list of proposed AC and GC models for density, refractive index, heat capacity, speed of sound and surface tension of DESs at atmospheric pressure. [Table body largely lost in extraction. Surviving AC-model entries, as extracted: density (g/cm3), Eq. 18: "... Mw T −0.7434 + 0.5139"; refractive index, Eq. 21: "n A = n A 1 Mw −0.2975 + n A 2 Mw T −2.9213 + 1.4335"; heat capacity (J/mol K); speed of sound (m/s); surface tension (mN/m), Eq. 30: "σ A = σ A 1 − σ A 2 T 0.0099 + 40.4052".]

Scientific Reports | (2021) 11:6684 | https://doi.org/10.1038/s41598-021-85824-z www.nature.com/scientificreports/

For each of the investigated physical properties, Table 2 presents the contributions (weights) of the functional groups for the GC models, while Table 3 lists the corresponding values for the atoms in the AC models. The deviations are shown separately for the training and test datasets in Fig. 1 for the GC models and in Fig. 2 for the AC models. These figures show that, for all of the properties, there are no significant differences between the training and test datasets in terms of deviations, as both sets cover similar ranges. This is quite promising for the accuracy of predictions, and holds for both the GC models (Fig. 1) and the AC models (Fig. 2).
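The mole-number normalization described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (divide both mole numbers by the smaller one); the function name `normalize_mole_numbers` is a hypothetical helper and not part of the original work.

```python
from typing import Tuple


def normalize_mole_numbers(n_hba: float, n_hbd: float) -> Tuple[float, float]:
    """Divide both mole numbers by the smaller one, so the smaller becomes 1."""
    if n_hba <= 0 or n_hbd <= 0:
        raise ValueError("mole numbers must be positive")
    smallest = min(n_hba, n_hbd)
    return n_hba / smallest, n_hbd / smallest


# Example from the text: 2 mol HBA and 3 mol HBD.
m_hba, m_hbd = normalize_mole_numbers(2, 3)
print(m_hba, m_hbd)  # 1.0 1.5
```

For an equimolar DES, both normalized values are equal to one.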
Following the above validation of the test dataset using Figs. 1 and 2, all of the following discussions are for the entire databank, as we saw no need to separate the correlative and predictive datasets, which show rather similar accuracies.
The accuracies of the models are further investigated using the statistical parameters of absolute average relative deviation percent (AARD%), absolute relative deviation percent (ARD%), relative deviation percent (RD%), absolute average deviation (AAD), and standard deviation (S), as defined by Eqs. (32)-(36), respectively. In these equations, N is the number of investigated data points, $X_i^{Model}$ is the value of the property X calculated by the model, and $X_i^{Exp}$ is the corresponding experimental value of the property X, where X can be ρ, n, Cp, u, or σ. The values of these statistical parameters for the entire dataset are presented in Table 4 for both the GC and AC models. The small deviations with respect to the experimental values indicate the accuracies of both models.
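Because Eqs. (32)-(36) are not reproduced in this text, the following Python sketch uses common textbook forms of these statistics. Taking the deviations relative to the experimental value, the sign convention of RD% (model minus experiment), and the N − 1 form of S are all assumptions and may differ from the definitions used in the paper. The sample values are invented for illustration only.

```python
import math
from typing import List, Sequence


def rd_percent(model: Sequence[float], exp: Sequence[float]) -> List[float]:
    """Signed relative deviation (%) of each point (assumed: model minus experiment)."""
    return [100.0 * (m - e) / e for m, e in zip(model, exp)]


def ard_percent(model: Sequence[float], exp: Sequence[float]) -> List[float]:
    """Absolute relative deviation (%) of each point."""
    return [abs(r) for r in rd_percent(model, exp)]


def aard_percent(model: Sequence[float], exp: Sequence[float]) -> float:
    """Average of the ARD% values over all N points."""
    values = ard_percent(model, exp)
    return sum(values) / len(values)


def aad(model: Sequence[float], exp: Sequence[float]) -> float:
    """Average absolute deviation, in the units of the property."""
    return sum(abs(m - e) for m, e in zip(model, exp)) / len(exp)


def std_dev(model: Sequence[float], exp: Sequence[float]) -> float:
    """Standard deviation of the model from experiment (assumed N - 1 form)."""
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(model, exp)) / (len(exp) - 1))


# Invented sample values, for illustration only.
x_exp = [1.0, 2.0, 4.0]
x_model = [1.1, 1.9, 4.0]
print(round(aard_percent(x_model, x_exp), 2), round(aad(x_model, x_exp), 4))
```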
According to the results, the GC models show smaller errors than the AC models for almost all of the statistical parameters. The greatest differences in accuracy between the two models are observed for refractive index, heat capacity, and speed of sound, for which the GC AARD% values are nearly one-third of those of the corresponding AC models. For density, the GC model still shows smaller errors than the AC model, while for surface tension, both models show comparable errors (AARD% of 7.59 and 7.80, respectively). While a GC functional group can distinguish between, for example, acids and alcohols, in the AC models a hydrogen atom behaves the same whether in a hydrocarbon, an acid, or an alcohol. Chemistry, of course, has taught us the significant differences in the behavior of the H atom within CH3 and OH. This highlights the main advantage of the GC models over the AC models. However, the AC models of this study, although less accurate, still have acceptable errors and can be used not only for estimation but also for prediction. The AC models have the main advantage of simplicity. Decomposition of a compound into its atoms is, in fact, so simple that an atomic model can very easily be incorporated into computer codes and software. This is not as easily done for the GC models. Furthermore, decomposition into atoms always gives a unique result for a specific structure, while decomposition into groups can sometimes be open to different interpretations, leading to different results. Because both models have acceptable results and errors, this double-model study allows users a freedom of choice depending on their aims, circumstances, and desired accuracy. For a more detailed investigation, the values of AARD% and maximum ARD% are also presented individually for each specific DES in Tables S1-S5 for both the GC and AC models.
Furthermore, to check the distribution of errors over the entire range, the data points were categorized into four ranges according to their ARD% values, and the counts are reported in Table 5. Given the results of Table 4, which show that the GC models have the lower errors, it is expected that the GC models will have a greater number of data points in the lower ARD% ranges than the AC models. This is validated by Table 5. For all five physical properties, the GC models have the greatest number of data points within the smallest ARD% category. In the AC models, however, the data are more evenly distributed throughout the various error categories, although the greatest number of data points still falls within the lowest-error category. This holds for all of the properties. The distribution of the data into different ranges according to their RD% is shown graphically in Fig. 3. By differentiating between positive and negative relative deviations, this figure can indicate any possible bias toward overestimation or underestimation, which could not be distinguished using the ARD% distribution comparison of Table 5. According to Fig. 3, both models show rather normal behavior in RD%, with no bias in their estimations, as the bell-shaped curves are more or less symmetric around zero. This holds for all of the different properties. Furthermore, the rather tall and slim shapes of the RD% curves are evidence of the high accuracy of the property models for the majority of the data, in contrast to the flatter shapes that would have resulted if the accuracies were not high for a larger number of data points. It is further observed in Fig. 3 that the peaks of the GC models are higher than the corresponding AC peaks, indicating the more reliable results of the GC models for a greater number of DESs.
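The kind of error-distribution check described above can be sketched as follows. The four ARD% ranges of Table 5 are not given in this text, so the bin edges below (1, 5 and 10%) are hypothetical placeholders, as are the function names and the sample values.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def count_by_ard_range(ard_values: Sequence[float],
                       edges: Sequence[float] = (1.0, 5.0, 10.0)) -> List[int]:
    """Count points per ARD% range: [0, e1), [e1, e2), [e2, e3), [e3, inf).

    The default edges are hypothetical, not those of Table 5.
    """
    counts = [0] * (len(edges) + 1)
    for value in ard_values:
        counts[bisect_right(edges, value)] += 1
    return counts


def signed_bias(rd_values: Sequence[float]) -> Tuple[int, int, float]:
    """Return (number overestimated, number underestimated, mean RD%)."""
    over = sum(1 for r in rd_values if r > 0)
    under = sum(1 for r in rd_values if r < 0)
    return over, under, sum(rd_values) / len(rd_values)


# Invented values, for illustration only.
print(count_by_ard_range([0.2, 0.9, 1.0, 3.0, 7.5, 12.0]))  # [2, 2, 1, 1]
print(signed_bias([-1.0, 2.0, 0.5, -0.5]))                  # (2, 2, 0.25)
```

Similar counts of positive and negative RD% values, together with a mean RD% close to zero, correspond to the roughly symmetric curves described for Fig. 3.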
Table 6 presents the results of the two models based on the molecular weights of the DESs, categorized into four groups (molecular weight ranges: < 100, 100-150, 150-200, and > 200). While some group contribution methods in the literature show systematic changes in error with increasing molecular weight, this is not the case with the GC model of this study for any of the properties. In the case of the AC model, however, greater errors are observed at the larger molecular weights for heat capacity and speed of sound. The performances of the models are further investigated according to the nature of the HBA and HBD constituents, and the comparisons are presented in Tables S6 and S7 of the Supporting Information.
All of the models of this study are also compared to the available literature models on DESs for each property (Table 4). It should be noted that component-specific literature models were not considered in this comparison, i.e., correlations developed for only a specific DES with equation constants that are valid for that one particular DES only. However, for a more comprehensive investigation, since models specific to DESs are very limited, we have also considered correlations for a close family of solvents, i.e., the ionic liquids. Additionally, physical property models for organic compounds have also been considered for a broader comparison. Table 4 presents these results for each of the investigated physical properties.
Regarding density, the only available generalized DES model in the literature is the correlation of Haghbakhsh et al. 25. The GC and AC models of this work show lower AARD% values, and similar values of AAD and S, compared with the correlation of Haghbakhsh et al. 25. However, a further issue of importance, in addition to accuracy, is the wide applicability and simplicity of a model. The correlation of Haghbakhsh et al. 25 has the following functionality.
This temperature-dependent function requires the critical temperature (Tc), critical volume (Vc), and acentric factor (ω) of the DES. These properties, when not available, can be calculated by the modified Lydersen-Joback-Reid group contribution model for each of the HBA and HBD components 117,118, followed by the use of an appropriate mixing rule, such as the Lee-Kesler mixing rules 119, to calculate the desired property for the DES. In this manner, the calculations of the input parameters alone require nine different calculations, six of which are themselves group contribution in nature. The calculations required by the models of this study are far less cumbersome. In addition to the models of Haghbakhsh et al. 25 and Mjalli et al. 18, which are specific to DESs, the general density correlations of Rackett 120 and of Spencer and Danner 121 were compared to the proposed GC/AC models of this study. In general, the present GC and AC models are both superior not only to the general models of Rackett 120 and of Spencer and Danner 121, but also to the correlations developed specifically for DESs.
In the literature, there is only one generalized model available for the refractive indices of DESs, as given by Taherzadeh et al. 20. The results, presented in Table 4, indicate that the GC approach outperforms the other two. The AC model shows slightly better results than those of Taherzadeh et al. 20. Since the literature model requires knowledge of the critical pressure and acentric factor, which are themselves calculated by a combination of other group contribution models and mixing rules 117-119, the two models of this work are not only higher in accuracy, but also easier in calculations. Furthermore, the results of two models by Riazi and Daubert 122,123, as well as the models of Riazi and Al-Sahhaf 124 and Lorentz-Lorenz 125, all developed generally for organic compounds, are compared in Table 4. The results indicate that the Riazi and Al-Sahhaf 124 and Lorentz-Lorenz 125 models are promising for DESs; however, both of the proposed GC/AC models still outperform them.

Apart from the models proposed in this study, there is one further generalized model available in the literature for calculating the heat capacities of DESs, as proposed by Taherzadeh et al. 23. According to Table 4, the GC model is more accurate than the model of Taherzadeh et al. 23. This is the case for all of the statistical parameters investigated. The model of Taherzadeh et al. 23 shows better results than the AC model, at the cost of more cumbersome calculations. The AC model is easier to use than both the model of Taherzadeh et al. 23 and the GC model. In addition to the model of Taherzadeh et al. 23, the literature models for the next closest families of substances were considered in the comparisons. These include the heat capacity correlations of Ahmadi et al. 126, Huang et al. 127, Ge et al. 128 and Oster et al. 129, which were developed for ionic liquids. The results (Table 4) indicate that none of the heat capacity models proposed for ionic liquids are suitable for DESs.
For comparison with DES literature models on the speed of sound, only one general correlation was available, namely the approach of Peyrovedin et al. 24. According to the results given in Table 4, the GC model shows higher accuracies with respect to all of the statistical parameters investigated. Following the GC model, the AC model shows a better AARD% value than the DES model of Peyrovedin et al. 24. The GC/AC models also show better results than the ionic liquid-specific models of Haghbakhsh et al. 130, Hekayati and Esmaeilzadeh 131, Gardas and Coutinho 132 and Singh and Singh 133.
The literature correlation of Haghbakhsh et al. 22, specifically developed for the surface tension of DESs, has the following functionality. Table 4 shows that the GC model has the smallest statistical errors in all aspects, and so it is the most reliable of the three. Following the GC model, the AC model is more accurate than the model of Haghbakhsh et al. 22. The AC model is the simplest of the three models, and the model of Haghbakhsh et al. 22 requires the greatest amount of calculation, since the values of critical temperature, critical pressure, critical volume and acentric factor, when not available, need to be calculated by other group contribution methods 117,118 and mixing rules 119, in addition to [...] 25. Also, both the GC and AC models show better results than the organic compound models of Escobedo and Mansoori 134, Curl and Pitzer 135 and Gharagheizi et al. 136, which is of course expected as these are more generalized models.

Discussion
To date, no direct group contribution models are available in the literature for estimating a variety of physical properties of DESs of various types and natures. To fill this vital gap, we decided to propose two models, a group contribution model and an atomic contribution model, for the estimation of some of the most important physical properties of DESs, covering density, refractive index, heat capacity, speed of sound and surface tension. The methods presented are general and applicable to a great range of DESs. This is not only because a large number of the groups or atoms of DESs are covered, but also because the databank used to develop the models is the most recent and complete set of data to date. Furthermore, because the group contribution models consider the effects of different functional groups, they are also predictive models, possessing the physical background of group contribution models. Therefore, with the current exponential growth of academic and industrial interest in DESs, the models provided in this study can be of significant value for the estimation of physical properties, which are often necessary for progress in the field of DESs. With both the group and atomic contribution models, our goal was simplicity of the groups for ease of use. For this reason, the number of groups in the model is rather small compared to typical group contribution models, and the groups themselves are quite simple. Because of this, we expect that users will not be confronted with the ambiguities and doubts, or even the multiple structural decomposition possibilities, that often occur when using literature GC methods.
In order to develop the models, the most complete experimental databank to date was gathered from the literature. This includes 1239, 117, 461, 398 and 538 data points from 149, 142, 24, 37 and 98 DESs, for density, refractive index, heat capacity, speed of sound and surface tension, respectively. Each databank was divided randomly into the two groups of training (70-80%) and testing (20-30%) datasets.
An extensive and comprehensive statistical investigation of errors was carried out on the developed GC and AC models. The results were shown to be quite accurate for all of the properties, with the GC model being superior to the AC model regarding errors. In brief, the calculated values of AARD% for the proposed GC models were 1.44, 0.37, 3.26, 1.62 and 7.59% for density, refractive index, heat capacity, speed of sound and surface tension, respectively. The corresponding values for the AC models were 2.49, 1.03, 9.93, 4.52 and 7.80%. Such results are not surprising because the GC models break the molecular structure into groups, whereas the AC models divide it simply into atoms. Therefore, if the chemical formulas of two or more different components are the same (for example glucose, fructose and mannose as the HBD), the AC models cannot differentiate among them, while the GC models can. The AC models are also unable to distinguish among isomers. By proposing both AC and GC models in this study, we have provided the freedom of choice between greater simplicity and higher accuracy, depending on the aims, needs and limitations of the users. The choice can therefore be different in different cases.
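The limitation noted above can be illustrated with a short Python sketch. Glucose, fructose and mannose share the molecular formula C6H12O6, so an atom count, which is the only structural input of an AC model, is identical for all three. The parser `atom_counts` is a hypothetical helper that handles only simple formulas without parentheses.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C6H12O6'.

    Minimal parser: element symbols followed by optional integers,
    with no support for parentheses, charges or hydrates.
    """
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


# The three sugars mentioned in the text share one molecular formula.
hbd_formulas = {"glucose": "C6H12O6", "fructose": "C6H12O6", "mannose": "C6H12O6"}
decompositions = {name: atom_counts(f) for name, f in hbd_formulas.items()}

# All three give the same atomic decomposition, hence the same AC prediction.
print(decompositions["glucose"] == decompositions["fructose"] == decompositions["mannose"])
```

A group-based decomposition, by contrast, depends on how the atoms are connected and can therefore differ between such compounds.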
For all of the physical properties covered in this work, the proposed GC models showed greater accuracy than the available literature correlations. However, the proposed AC models, while being more reliable than the literature correlations for density, refractive index, and surface tension, had less accuracy in the cases of heat capacity and speed of sound.
To summarize the pros and cons of the models proposed here in comparison to those available in the literature for the estimation of DES physical properties, we point to the following. The literature correlations for DESs are either component-specific models, or else they have been developed for very limited numbers of DESs, and so are not widely applicable to all types of DES families. For each of the properties of density, refractive index, heat capacity, speed of sound and surface tension, there is only one global DES model available so far in the open literature, each of which has been compared here in detail by providing numeric results of their errors. These generalized literature correlations for DESs are worthy in their own right; however, the models presented here can be considered preferable due to several general advantages from various perspectives, as follows: (i) In the literature correlations, the critical properties (and sometimes acentric factors) were used as input parameters. Since these often cannot be measured experimentally, they must be obtained indirectly, by first calculating them for the HBA and HBD components separately and then using a mixing rule to calculate the property for the DES. This makes the calculation of the input parameters difficult and time-consuming, while the method presented here requires no input parameters other than the groups presented in the tables, so the calculations are quite easy and fast; (ii) Furthermore, the models used for critical property and acentric factor calculations were developed for ionic liquids, not DESs, possibly resulting in high errors for these input properties when extended to DESs; (iii) A further issue is the comparison of the theoretical background of the models. The DES literature correlations are purely empirical in nature, and although they were developed for a large databank on DESs, they are still merely empirical models.
It is possible that their extrapolation to the new DESs of the future will produce high errors. However, the proposed GC/AC models are group/atomic contribution models and, as such, have a more solid theoretical background than the purely empirical models. This is because the effects of the interactions of the various functional groups have been trained in the model development process, and therefore they have more predictive characteristics; (iv) While the GC and AC models are both quite simple and their calculations are straightforward, the AC models in particular are so simple that they can very easily be programmed and incorporated into software. Models that can be easily integrated into various software are of great value in today's academic and engineering world; (v) One further great advantage of the models of this work, as with all other group contribution methods, is their independence of any experimental measurements on the DES. This allows for screening tests of DESs without actually requiring the DES to be prepared in laboratories, saving cost and time. This is invaluable in a field of science that is still in its infancy, with innumerable DESs that can be envisioned. While the above lists the advantages of the proposed models with respect to correlative approaches, it should be noted that thermodynamic-based models can also be employed for the estimation of physical properties. However, since DESs are very complex mixtures involving hydrogen bonds, only the more elaborate and sophisticated thermodynamic models can handle such systems; for example, popular equations of state such as Peng-Robinson and Soave-Redlich-Kwong are rendered useless for DESs.
Regardless, even the more thermodynamically suitable models, which are much more cumbersome and time-consuming, are still not accurate if used in a purely theoretical (predictive) mode. Such thermodynamic models, for example the association-type equations of state, are fitted to experimental data through adjustable parameters that help reduce the errors. Thus, while the thermodynamic models do indeed have higher predictability and extrapolative power than the models presented here, this comes at the cost of losing the advantages mentioned in the previous paragraph for the proposed AC/GC models.
One further point to consider when choosing an approach for physical property estimation of DESs is the nature of DESs. In contrast to most solvents, which are pure, DESs are mixtures. Not only are they mixtures, but they are quite complex mixtures with various types of intermolecular interactions, including hydrogen bonds. This causes certain issues when attempting to model them, among which is the choice to consider the DES as a pseudo-component or as a true mixture of two or more components. In many of the estimation models, such as global correlations and equations of state, input parameters such as the critical properties and acentric factors of the DES are required, which usually cannot be measured experimentally. If the pseudo-component approach is taken to estimate the values, the only procedure up to now is to calculate the desired properties of the HBA and HBD components separately, followed by the use of a mixing rule to obtain, for example, the desired critical properties of the DES. This is not an ideal procedure, because the errors of the various steps build up, especially considering the very nonideal behavior of the components in such a complex system. Unfortunately, no models that directly estimate the critical properties of a DES are yet available in the literature. Therefore, the most serious challenge facing the pseudo-component pathway is to develop accurate models which can directly estimate the critical properties of the DES, or any other required input parameter for that matter. Until such models become available, we suggest avoiding correlations and semi-empirical models which use the critical properties of the DESs as their input parameters. Direct calculation models, such as the group/atomic contribution models, are more suitable in this respect. Also, other models which use only those physical properties of DESs that are experimentally measurable (such as molecular weight, density, viscosity, etc.) as their input parameters are suggested for higher accuracy.
However, such methods can no longer be used for screening tests of novel envisioned DESs, while the GC/AC models can. On the other hand, the mindset of considering the DES as a true mixture of components, instead of one pseudo-component, also has its pros and cons. Such an approach is more theoretically realistic and would be safer to use when extrapolations are called for. However, only very highly sophisticated thermodynamic approaches can handle the highly nonideal behavior of DES mixtures, i.e. detailed models that can account for all the various types of physical phenomena and interactions in the hydrogen-bond networks. Furthermore, such models most often involve fitting parameters that must be optimized to experimental data. This would also prevent the use of such approaches as screening tools on DESs which have not yet been made in the labs. Furthermore, since such approaches are cumbersome and time-consuming, they are not the typical and commonplace techniques used by the research and engineering communities, and so there is the real risk that oversimplified models will be used, perhaps without realizing the extent of the risks of errors. Therefore, at the end of the day, there is still no one superior approach available, and the proper choice of estimation technique is ultimately case-specific, depending on the task at hand, the type and amount of information available, and the goal of the estimations (for example, as a screening tool). Due to all of the shortcomings and issues mentioned above, there is still much room for progress in this field and many challenges need to be overcome.
However, precisely because of the variety of goals of the different users, it is urgent that all the different pathways be pursued and developed further, be it the simple engineering correlations based on physical property input, the group contribution approach which requires absolutely no physical property data, or the more elaborate approaches based on strong thermodynamic theories, such as equations of state, computational techniques, etc. Every single one of these pathways is still at its early stages for DESs and there is much room for progress in all. However, a serious obstacle to progress is the inevitable fact that DESs are only a newly introduced category of solvents; hence, the amount of published physical property data is still insignificant compared to the number of potential DESs. This is even more serious for some of the less-investigated properties, such as speed of sound and heat capacity. The progress and accuracy of the modelling approaches go hand-in-hand with the extent and diversity of the physical property databanks. Therefore, parallel to researchers enriching the models, experimentalists need to contribute their share for true progress in the field.

Methods
The basic procedure in group contribution models is that the molecular structure of a compound is considered to be made up of a number of functional groups. Specific numeric values, known as contributions or weights, are determined for each of the groups. The contribution of each of the groups is multiplied by the number of occurrences of that group in the structure, and the resulting summation over all the groups is used within a mathematical function specific to the desired property. This procedure is highly dependent on how the chemical structure is decomposed. For complicated compounds, decomposition is not always easy. In some group contribution methods, it is even possible that the decomposition of the structure can be carried out in more than one way, with differing functional groups, thus resulting in different calculated values for a property. In addition to this, structural decomposition into groups is a judgment-based task which is not easily programmable in computer software. While still following the mindset of the group contribution approach, atomic contribution models alleviate both of these issues. This is because in the AC procedure, the molecule is decomposed down to its atoms. Since the type and number of occurrences of these atoms are the only input parameters of the model, there is no risk of multiple methods of decomposition, and the simple approach is also easily programmable and software-friendly. However, while AC models are very simple, they have absolutely no way of distinguishing the position of the atoms in the structure, and so they cannot differentiate isomers, or even different compounds with the same molecular formula.
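The summation step described above can be sketched as follows. This shows only the "occurrences times contributions" sum, weighted here by the normalized mole numbers of the HBA and HBD; that weighting, the group labels and every numeric contribution below are illustrative assumptions and are not the values of Tables 2 and 3. The property-specific functions of Table 1, into which such sums would enter, are not reproduced.

```python
from typing import Dict


def contribution_sum(occurrences: Dict[str, int],
                     contributions: Dict[str, float],
                     moles: float = 1.0) -> float:
    """Sum (number of occurrences x contribution) over all groups or atoms.

    Raises KeyError if a group has no tabulated contribution, since the
    model would not be applicable to such a structure.
    """
    return moles * sum(count * contributions[group]
                       for group, count in occurrences.items())


# Hypothetical group labels and placeholder weights, for illustration only.
hba_contributions = {"CH3": 2.0, "CH2": 1.0, "OH": -1.0}
hbd_contributions = {"CH2": 0.5, "OH": -2.0}

hba_groups = {"CH3": 3, "CH2": 2, "OH": 1}   # k_i for the HBA
hbd_groups = {"CH2": 2, "OH": 2}             # l_j for the HBD
m_hba, m_hbd = 1.0, 2.0                      # normalized mole numbers

total = (contribution_sum(hba_groups, hba_contributions, m_hba)
         + contribution_sum(hbd_groups, hbd_contributions, m_hbd))
print(total)
```

An AC model would use the same routine with atom labels (for example C, H, O, N) as the dictionary keys, which is why it is simpler to automate.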
By considering the specific advantages and disadvantages of each of the GC and AC models, we decided to propose both models for the estimation of densities, refractive indices, heat capacities, speeds of sound and surface tensions of DESs.
The data collected on each physical property were divided randomly into the two groups of training (70-80%) and testing (20-30%). In this manner, in order to check the predictive ability of the models, a number of DESs were totally set aside and not used for development of the mathematical functionalities and the adjustable coefficients.
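A random split of this kind can be sketched in Python as shown below. Splitting at the level of whole DESs (so that the test DESs never appear in training) is one reading of the description above; the function name, the 75% default and the fixed seed are assumptions made for reproducibility of the example.

```python
import random
from typing import Iterable, List, Tuple


def split_by_des(des_names: Iterable[str],
                 train_fraction: float = 0.75,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly assign whole DESs to training and test sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    names = sorted(set(des_names))      # deterministic starting order
    rng = random.Random(seed)           # local generator, reproducible
    rng.shuffle(names)
    n_train = round(train_fraction * len(names))
    return names[:n_train], names[n_train:]


# Hypothetical DES labels, for illustration only.
all_des = [f"DES-{i}" for i in range(1, 11)]
train, test = split_by_des(all_des, train_fraction=0.8)
print(len(train), len(test))  # 8 2
```

All data points of a DES placed in the test list would then be excluded from model fitting.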
Various mathematical functionalities were investigated. For the sake of higher accuracy, functional groups or atoms were considered separately for the HBA and HBD structures. The GC and AC models were developed and optimized for each physical property with the aid of a genetic algorithm. Equation 42 gives the objective function considered and applied to the training dataset.
where $X_i^{Model}$ is the physical property calculated by the GC or AC model and $X_i^{Exp}$ is the corresponding experimental value.
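To illustrate how a genetic algorithm can fit model coefficients to training data, the following is a minimal, self-contained Python sketch. It is not the authors' implementation: Eq. 42 is not reproduced in this text, so the objective is assumed here to be AARD%, the fitted model is a toy linear function of temperature, the data are synthetic, and the population size, elite fraction and mutation width are arbitrary choices.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Synthetic "experimental" data, for illustration only.
TEMPS = [293.15, 303.15, 313.15, 323.15, 333.15]
X_EXP = [1.2000, 1.1940, 1.1880, 1.1820, 1.1760]


def aard_percent(model: Sequence[float], exp: Sequence[float]) -> float:
    """Assumed objective: average absolute relative deviation in percent."""
    return 100.0 * sum(abs(m - e) / abs(e) for m, e in zip(model, exp)) / len(exp)


def objective(params: Sequence[float]) -> float:
    """Error of the toy model X = a + b*T against the synthetic data."""
    a, b = params
    return aard_percent([a + b * t for t in TEMPS], X_EXP)


def genetic_minimize(func: Callable[[Sequence[float]], float],
                     bounds: Sequence[Tuple[float, float]],
                     pop_size: int = 40,
                     generations: int = 60,
                     seed: int = 0) -> Tuple[List[float], List[float]]:
    """Tiny elitist GA: keep the best quarter, refill by crossover and mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=func)
        history.append(func(population[0]))      # best error this generation
        elite = population[: pop_size // 4]      # survivors, kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Gaussian mutation, clipped to the search bounds.
            child = [min(hi, max(lo, g + rng.gauss(0.0, 0.05 * (hi - lo))))
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = elite + children
    best = min(population, key=func)
    history.append(func(best))
    return best, history


best, history = genetic_minimize(objective, bounds=[(0.0, 3.0), (-0.01, 0.01)])
print(best, history[-1])
```

Because the best individuals are carried over unchanged, the recorded best error can never increase from one generation to the next. In practice, an established optimization library would normally be preferred over a hand-written loop like this one.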

Data availability
The data that support the findings of this work are available from the corresponding author upon reasonable request.