Introduction

Separation equipment has always been an integral part of chemical processes1, pharmaceutical industries2, water/wastewater treatment processes3,4,5, and environmental protection6. The primary role of a separation process is to remove contaminants from feedstocks and from effluent liquid and gas streams, purify final products, and recover unreacted materials7. Although diverse separation processes have already been established, solvent-based techniques have a wider range of applications than the alternatives.

The green chemistry principles introduced by Anastas and Warner call for chemical processes that eliminate, or at least reduce, the use and generation of harmful substances8. Since traditional organic solvents adversely affect ecosystems and human health, researchers have paid great attention to synthesizing green, sustainable, and environmentally friendly solvents. These efforts have produced supercritical fluids9, renewable solvents10, liquid polymers11, and ionic liquids12.

Deep eutectic solvents (DESs) have recently been recommended as materials that retain the favorable features of ionic liquids while overcoming their undesirable characteristics13,14,15,16. A deep eutectic solvent is readily synthesized by mixing two main agents, i.e., a hydrogen bond donor (HBD) and a hydrogen bond acceptor (HBA)17. Hydrogen bond formation between the HBA and HBD yields a mixture whose melting point is much lower than that of its ingredients17. Generally, deep eutectic solvents are biodegradable, inexpensive, non-toxic, non-volatile, thermally/chemically stable, and easy to manufacture17.

Despite their short history, deep eutectic solvents have been used in diverse applications, including material synthesis18, separation processes13, nanotechnology19, environmental protection17, biotechnology20, and pharmaceutical processing21.

Accurate values of the volumetric properties of deep eutectic solvents, such as density, are essential for the feasibility study and detailed design of any industrial application of DESs22. Moreover, selecting a DES with a desired density through laboratory-scale investigations alone is an arduous task.

Therefore, a predictive tool for the density of deep eutectic solvents may be helpful in this regard. Although a few empirical correlations have been built for estimating the density of DESs (see the Empirical correlations section), to the best of our knowledge, no intelligent scheme has been suggested yet. Hence, this work utilizes seven intelligent schemes to calculate the DES density from readily available variables, i.e., temperature and the DES’s inherent features (acentric factor, critical pressure, and critical temperature), and compares their prediction accuracy. To the best of our knowledge, this is the most comprehensive modeling study yet conducted for automating DES characterization. The databank includes a large number of laboratory-scale density measurements gathered from the literature to ensure that the suggested paradigms are general and robust. The constructed intelligent estimators prove more reliable than the correlations proposed in the literature.

Laboratory-measured datasets

The objective of the current study is to construct an intelligent tool that approximates the density of deep eutectic solvents precisely. Like regression-based correlations23, all intelligent methods need a laboratory-measured database to adjust their parameters and test their prediction reliability24,25. Thus, 1239 experimentally measured density data points for deep eutectic solvents have been gathered from thirty references and used in the model development/validation stages. Table 1 summarizes the collected density data. It lists the hydrogen bond donors and hydrogen bond acceptors of the considered deep eutectic solvents. As Table 1 shows, the gathered databank includes thirteen HBA and forty-two HBD ingredients. The table also indicates the number of measurements and the ranges of the working temperature and measured density.

Table 1 Summary of the reported laboratory-measured density for diverse deep eutectic solvents in the literature.

Critical pressure, critical temperature, and acentric factor

This study aims to build a single model to anticipate the density of 149 different deep eutectic solvents. Therefore, it is necessary to include inherent characteristics of these materials in the list of independent variables. The three-parameter corresponding states theory explains that each material has its own specific acentric factor, critical temperature, and critical pressure26. Hence, these parameters could help the machine learning method distinguish different deep eutectic solvents and discriminate among their density values27. Haghbakhsh et al.28 utilized the improved Lydersen-Joback–Reid group contribution12 and the Lee-Kesler mixing rules29 to estimate the acentric factor and critical temperature/pressure of different deep eutectic solvents.

Table 2 presents the range of these inherent characteristics for all considered deep eutectic solvents28. The supplementary Excel file includes the entire experimental databank utilized in the current study.

Table 2 The reported critical pressure, critical temperature, and acentric factor for the considered deep eutectic solvents28.

To reduce the table size, the values are reported for groups of deep eutectic solvents that share the same hydrogen bond acceptor. Specific values of the acentric factor, critical temperature, and critical pressure for each deep eutectic solvent can be found in the article by Haghbakhsh et al.28.

Estimation scenarios for density of deep eutectic solvents

The literature has suggested several empirical correlations for estimating the liquid’s density. Furthermore, the current study focuses on seven machine learning methods to anticipate the density of 149 deep eutectic solvents. The mathematical formulation/background of the available empirical correlations and machine learning methods has been briefly reviewed in this section.

Empirical correlations

Rackett correlation

Rackett’s correlation is likely the first equation developed to calculate the density of a saturated liquid54. As Eq. (1) explains, the molar volume (\(\nu\)) is estimated as a function of temperature (T), critical pressure (Pc), critical molar volume (\(\nu_{c}\)), and critical temperature (Tc). R and Tr denote the gas constant and the reduced temperature (Eq. 2), respectively.

$$\nu = \left( {RT_{c} /P_{c} } \right)\left( {P_{c} \nu_{c} /RT_{c} } \right)^{{1 + \left( {1 - T_{r} } \right)^{0.2857} }}$$
(1)
$$T_{r} = T/T_{c}$$
(2)

Equation (3) then gives the density (\(\rho\)) from the molecular weight (M) and the estimated molar volume.

$$\rho = M/\nu$$
(3)

Although Rackett’s correlation was initially suggested for the density of saturated liquids, it has also provided good predictions for deep eutectic solvents55.
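As a minimal sketch of Eqs. (1) to (3), the Rackett estimate can be coded in a few lines. The function names and the numerical inputs below are hypothetical and chosen only for illustration; SI units are assumed (K, Pa, m3/mol, kg/mol), which the text does not prescribe.

```python
R_GAS = 8.314462618  # universal gas constant, J/(mol K)


def rackett_molar_volume(T, Tc, Pc, vc, R=R_GAS):
    """Molar volume from Eqs. (1)-(2); meaningful only for T <= Tc."""
    Tr = T / Tc                    # reduced temperature, Eq. (2)
    base = Pc * vc / (R * Tc)      # dimensionless group raised to a power in Eq. (1)
    return (R * Tc / Pc) * base ** (1.0 + (1.0 - Tr) ** 0.2857)


def rackett_density(T, Tc, Pc, vc, M, R=R_GAS):
    """Density from Eq. (3): molecular weight divided by molar volume."""
    return M / rackett_molar_volume(T, Tc, Pc, vc, R)


# Hypothetical inputs (not taken from the databank of this study)
rho = rackett_density(T=298.15, Tc=650.0, Pc=3.0e6, vc=3.5e-4, M=0.110)
print(f"Rackett density estimate: {rho:.1f} kg/m3")
```

A quick consistency check is that at T = Tc the exponent in Eq. (1) equals one, so the function returns the critical molar volume itself.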

Spencer and Danner correlation

Spencer and Danner incorporated a base molar volume measurement (\(\nu_{ref}\)) at a base temperature (\(T_{ref}\)) into Rackett’s correlation56. Equations (4) and (5) introduce the modified Rackett model, i.e., the Spencer and Danner correlation. The exponent in Eq. (4) is the difference of two power terms, so that \(\nu = \nu_{ref}\) at \(T = T_{ref}\).

$$\nu = \nu_{ref} Z^{{\left( {1 - T_{r} } \right)^{0.2857} - \left[ {1 - \left( {T_{ref} /T_{c} } \right)} \right]^{0.2857} }}$$
(4)
$$Z = \left( {\nu_{ref} P_{c} /RT_{c} } \right)^{{\left[ {1 + \left( {1 - \left( {T_{ref} /T_{c} } \right)} \right)^{0.2857} } \right]^{ - 1} }}$$
(5)
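The Spencer and Danner model can be sketched in the same way. The code below assumes the usual form of the correlation, in which the exponent of Z is the difference \((1 - T_{r})^{0.2857} - (1 - T_{ref}/T_{c})^{0.2857}\); the function name and the sample values are hypothetical.

```python
def spencer_danner_molar_volume(T, Tc, Pc, v_ref, T_ref, R=8.314462618):
    """Molar volume anchored to one reference measurement (v_ref at T_ref)."""
    tau_ref = (1.0 - T_ref / Tc) ** 0.2857
    # Eq. (5): Z follows from inverting Rackett's equation at the reference point
    Z = (v_ref * Pc / (R * Tc)) ** (1.0 / (1.0 + tau_ref))
    # Exponent of Eq. (4): difference of the two power terms (assumed form)
    phi = (1.0 - T / Tc) ** 0.2857 - tau_ref
    return v_ref * Z ** phi


# Hypothetical reference point: 9.0e-5 m3/mol measured at 298.15 K
v_320 = spencer_danner_molar_volume(T=320.0, Tc=650.0, Pc=3.0e6,
                                    v_ref=9.0e-5, T_ref=298.15)
print(f"Estimated molar volume at 320 K: {v_320:.3e} m3/mol")
```

By construction, the exponent vanishes at T = T_ref, so the function reproduces the reference measurement exactly.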

Mjalli et al. correlation

Mjalli et al.57 suggested a technical correlation for the density of deep eutectic solvents by reformulating the Spencer and Danner model. Equations (6) and (7) express the mathematical shape of the developed correlation by Mjalli et al.57.

$$\nu = \nu_{ref} Z^{{\left( {1 - T_{r} } \right)^{{2.2857 - \left[ {1 - \left( {T_{ref} /T_{c} } \right)} \right]^{2.2857} }} }}$$
(6)
$$Z = \left( {\nu_{ref} P_{c} /RT_{c} } \right)^{{\left[ {0.2083 + \left( {T_{ref} /T_{c} } \right)} \right]^{2.2857} }}$$
(7)

Haghbakhsh et al. correlation

Haghbakhsh et al. recently proposed a correlation for calculating the density of deep eutectic solvents from the working and critical temperatures, acentric factor (ω), and critical molar volume28.

$$\rho = \alpha - 4.64 \times 10^{ - 4} \times T$$
(8)
$$\alpha = - 1.13 \times 10^{ - 6} \times T_{c}^{2} + 2.566 \times 10^{ - 3} \times T_{c} + 0.2376 \times \omega^{0.2211} - 4.67 \times 10^{ - 4} \times \nu_{c}$$
(9)
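Equations (8) and (9) translate directly into code. The sketch below only reproduces the printed coefficients; the units are those of the original correlation28, which this section does not state, and the function name and input values are hypothetical.

```python
def haghbakhsh_density(T, Tc, omega, vc):
    """Density from Eqs. (8)-(9); units follow the original correlation."""
    alpha = (-1.13e-6 * Tc ** 2
             + 2.566e-3 * Tc
             + 0.2376 * omega ** 0.2211
             - 4.67e-4 * vc)           # Eq. (9)
    return alpha - 4.64e-4 * T         # Eq. (8): linear decrease with temperature


# Hypothetical inputs for illustration only
rho_300 = haghbakhsh_density(T=300.0, Tc=600.0, omega=0.5, vc=300.0)
rho_310 = haghbakhsh_density(T=310.0, Tc=600.0, omega=0.5, vc=300.0)
print(rho_300 - rho_310)  # temperature slope times 10 K
```

Because \(\alpha\) does not depend on T, the correlation predicts the same temperature slope for every solvent, and only the intercept differs between DESs.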

It can be seen that all empirical correlations utilize the temperature and inherent characteristics of the material (a combination of νc, Pc, Tc, and ω) to formulate the liquid’s density. Since the first three inherent properties (Tc, Pc, and νc) are related through the following equation, it is unnecessary to utilize all of them.

$$P_{c} \times \nu_{c} = R \times T_{c}$$
(10)

Therefore, the current study only utilizes temperature, Tc, Pc, and ω to estimate the DES’s density employing different intelligent estimators (Eq. 11).

$$\rho_{pred}^{DES} = f\left( {T,P_{c}^{DES} ,T_{c}^{DES} ,\omega^{DES} } \right)$$
(11)

Computational intelligent methods

Wide ranges of supervised and unsupervised artificial intelligence techniques have been suggested and applied in different modeling studies58,59,60,61,62,63. The working procedures of the used machine learning methods, i.e., least-squares support vector regression (LSSVR), hybrid neuro-fuzzy system, and five types of artificial neural networks have been briefly explained in this section.

Least-squares support vector regression

This intelligent estimator employs a kernel function (linear, polynomial, or Gaussian) to map the original independent variables (\(\xi\)) into a higher-dimensional computational domain. Equation (12) defines these functions.

$$\varphi \left( {\xi_{i} ,\xi_{j} } \right) = \left\{ {\begin{array}{*{20}l} {\xi_{i}^{T} \xi_{j} } \hfill & {Linear} \hfill \\ {\left( {\xi_{i}^{T} \xi_{j} +\varepsilon } \right)^{\sigma } } \hfill & {Polynomial} \hfill \\ {\exp \left( { - \left\| {\xi_{i} - \xi_{j} } \right\|^{2} /2\delta^{2} } \right)} \hfill & {Gaussian} \hfill \\ \end{array} } \right.$$
(12)

The superscript of T shows the transpose operation. In addition, \(\varepsilon\), \(\sigma\), and \(\delta\) are the kernel-related parameters.

It is then possible to linearly relate the dependent (\(\gamma\)) to the independent (\(\chi\)) variables in this new computational domain utilizing Eq. (13).

$$\gamma_{LSSVR} (\chi ) = w^{T} \varphi \left( \chi \right) + b$$
(13)

In Eq. (13), \(\gamma_{LSSVR}\) represents the estimated target by the least-squares support vector regression. Furthermore, w and b are adjustable coefficients of this intelligent model. In summary, the kernel type is the main topology feature of the LSSVR that should be determined by a practical scenario like the trial-and-error process64.

The detailed working process of the least-squares support vector machine has recently been explained by Nabavi et al.64.
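For illustration, a minimal LSSVR with the Gaussian kernel of Eq. (12) can be written with NumPy. The sketch assumes the standard least-squares SVM formulation, in which the bias b and the dual coefficients are obtained from one linear system; this is not necessarily the implementation used in the present study. The names `lssvr_fit`, `lssvr_predict`, and `reg` (the regularization constant) are hypothetical, and the data are synthetic.

```python
import numpy as np


def gaussian_kernel(A, B, delta):
    """Gaussian kernel of Eq. (12) between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * delta ** 2))


def lssvr_fit(X, y, reg=1.0e3, delta=0.3):
    """Solve [[0, 1^T], [1, K + I/reg]] [b, alpha]^T = [0, y]^T."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = gaussian_kernel(X, X, delta) + np.eye(n) / reg
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]             # bias b, dual coefficients alpha


def lssvr_predict(X_new, X_train, b, alpha, delta=0.3):
    """Kernel expansion equivalent to Eq. (13) in the dual form."""
    return gaussian_kernel(X_new, X_train, delta) @ alpha + b


# Synthetic example: a scaled temperature in [0, 1] and a linearly decreasing target
X = np.linspace(0.0, 1.0, 30)[:, None]
y = 1.2 - 0.3 * X[:, 0]
b, alpha = lssvr_fit(X, y)
print(lssvr_predict(np.array([[0.55]]), X, b, alpha))
```

In practice, the inputs (T, Pc, Tc, ω) would be scaled to comparable ranges first, and `reg` and `delta` would be tuned, for example by the trial-and-error procedure mentioned above.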

Artificial neural networks

This neuron-based machine learning method is among the most widely used tools, serving as either an estimator65,66 or a classifier67. Each neuron of an artificial neural network performs a linear operation (LPart) followed by a non-linear one (NLPart) as follows68:

$$LPart = \vec{w}\vec{\xi } + b$$
(14)
$$NLPart = \phi \left( {\vec{w}\vec{\xi } + b} \right)$$
(15)

w, b, and \(\phi\) are weight and bias coefficients and activation function, respectively. Although a linear activation function exists, the non-linear, continuous, and differentiable ones often provide artificial neural networks with a better generalization ability69. Equation (16) defines several widely-used activation functions in the field of artificial neural networks.

$$\phi \left( {LPart} \right) = \left\{ {\begin{array}{*{20}l} {LPart} \hfill & {Linear} \hfill \\ {\frac{1}{{1 + \exp \left( { - LPart} \right)}}} \hfill & {Logarithm\,sigmoid} \hfill \\ {\frac{2}{{1 + \exp \left( { - 2\, \times LPart} \right)}} - 1} \hfill & {Tangent\,sigmoid} \hfill \\ {\exp \left( { - LPart^{2} /2\delta^{2} } \right)} \hfill & {Gaussian} \hfill \\ \end{array} } \right.$$
(16)
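The neuron operations of Eqs. (14) and (15) and the activation functions of Eq. (16) can be sketched as follows. The function names are hypothetical (loosely following common neural network toolbox terminology), and the weights and inputs are arbitrary.

```python
import numpy as np


def linear(z):
    return z


def logsig(z):
    """Logarithm sigmoid of Eq. (16); output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def tansig(z):
    """Tangent sigmoid of Eq. (16); mathematically identical to tanh(z)."""
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0


def gaussian(z, delta=1.0):
    """Gaussian activation of Eq. (16) with spread delta."""
    return np.exp(-z ** 2 / (2.0 * delta ** 2))


def neuron(xi, w, b, phi):
    """Linear part, Eq. (14), followed by the activation, Eq. (15)."""
    lpart = np.dot(w, xi) + b
    return phi(lpart)


# Arbitrary four-input neuron (e.g., scaled T, Pc, Tc, and omega)
xi = np.array([0.2, 0.5, 0.7, 0.1])
w = np.array([0.4, -0.3, 0.8, 0.1])
print(neuron(xi, w, b=0.05, phi=tansig))
```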

Different artificial neural networks can be built by inserting neurons in several successive neuronic layers. The multilayer perceptron70, recurrent71,72,73, cascade feedforward70, radial basis function70, and general regression74 neural networks are those neuron-based estimators utilized in the current study. Interested readers are referred to the book written by Hagan et al. for the detailed understanding of the working procedure of these artificial neural networks75.

Hybrid neuro-fuzzy systems

Combining the artificial neural network76,77 with fuzzy logic78,79 has resulted in a new class of machine learning methods, namely the adaptive neuro-fuzzy inference system (ANFIS)80,81. This method estimates a target response employing five successive layers (i.e., fuzzification, rule, normalization, defuzzification, and output)82. Shojaei et al. have comprehensively described the mathematical operations performed in each layer of the adaptive neuro-fuzzy inference system82. The membership function utilized in the fuzzification layer83, the number of clusters80, the cluster radius84, and the training algorithm25 are the main structural features, which are often tuned by trial and error.

Results and discussions

This section explains the procedure followed to choose the best intelligent method for estimating the DES density and to determine its structural features. The accuracy of this smart approach is then compared with that of the correlations available in the literature. Several numerical and graphical analyses are also employed to further examine the accuracy of the best model for predicting the density of deep eutectic solvents.

Constructing intelligent models

Topology determination

The topology of machine learning methods is often determined by trial and error85,86,87. This practical scenario changes the core features of a machine learning scheme and monitors its accuracy in the different stages of model development88,89,90. Table 3 specifies the core features of the considered intelligent techniques and the ranges investigated during the trial-and-error procedure. The literature indicates that artificial neural networks with one hidden layer are accurate enough to simulate a wide range of problems72,91,92,93. Consequently, the multilayer perceptron (MLP), recurrent (RNN), cascade feedforward (CFF), general regression (GR), and radial basis function (RBF) neural networks have been built with only one hidden layer.

Table 3 The name and range of deciding features of each intelligent estimator during the trial-and-error process.

Selecting the best topology of the intelligent methods

The core features of the machine learning methods have been changed according to the values reported in Table 3, both training and testing stages have been performed, and the accuracy has been monitored utilizing several statistical indexes. Five criteria, namely MAPE (mean absolute percentage error), MAE (mean absolute error), RAPE (relative absolute percentage error), RMSE (root mean square error), and R2 (regression coefficient), have been utilized to monitor the accuracy of the developed intelligent scenarios and select the most precise ones.

Equations (17) to (21) express the mathematical forms of the MAPE, MAE, RAPE, RMSE, and R2, respectively.

$$MAPE\% = \left( {100/n} \right) \times \sum\limits_{i = 1}^{n} {\left( {\left| {\rho_{\exp } - \rho_{pred} } \right|/\rho_{\exp } } \right)}_{i}$$
(17)
$$MAE = \left( {1/n} \right) \times \sum\limits_{i = 1}^{n} {\left| {\rho_{\exp } - \rho_{pred} } \right|}_{i}$$
(18)
$$RAPE\% = 100 \times \sum\limits_{i = 1}^{n} {\left| {\rho_{\exp } - \rho_{pred} } \right|_{i} /\sum\limits_{j = 1}^{n} {\left| {\rho_{\exp } - \rho_{\exp }^{ave} } \right|_{j} } }$$
(19)
$$RMSE = \sqrt {\sum\limits_{i = 1}^{n} {\left( {\rho_{\exp } - \rho_{pred} } \right)}_{i}^{2} /n}$$
(20)
$$R^{2} = 1 - \left\{ {\sum\limits_{i = 1}^{n} {\left( {\rho_{\exp } - \rho_{pred} } \right)_{i}^{2} /\sum\limits_{i = 1}^{n} {\left( {\rho_{\exp } - \rho_{\exp }^{ave} } \right)_{i}^{2} } } } \right\}$$
(21)

These equations only need the actual (\(\rho_{\exp }\)), predicted (\(\rho_{pred}\)), and average (\(\rho_{\exp }^{ave}\)) density values and numbers of the dataset (n) to measure the accuracy of any constructed model.
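The five indexes of Eqs. (17) to (21) can be computed together, as in the following sketch. The function name `accuracy_indices` and the three sample densities are hypothetical.

```python
import numpy as np


def accuracy_indices(rho_exp, rho_pred):
    """Return MAPE%, MAE, RAPE%, RMSE, and R2 as defined in Eqs. (17)-(21)."""
    rho_exp = np.asarray(rho_exp, dtype=float)
    rho_pred = np.asarray(rho_pred, dtype=float)
    err = rho_exp - rho_pred              # prediction error of each sample
    dev = rho_exp - rho_exp.mean()        # deviation from the average measurement
    return {
        "MAPE%": 100.0 * np.mean(np.abs(err) / rho_exp),
        "MAE": np.mean(np.abs(err)),
        "RAPE%": 100.0 * np.sum(np.abs(err)) / np.sum(np.abs(dev)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "R2": 1.0 - np.sum(err ** 2) / np.sum(dev ** 2),
    }


# Three hypothetical density samples (measured versus predicted)
idx = accuracy_indices([1000.0, 1100.0, 1200.0], [1010.0, 1100.0, 1190.0])
for name, value in idx.items():
    print(f"{name}: {value:.4f}")
```

Note that MAE and RMSE carry the unit of the density, whereas MAPE, RAPE, and R2 are dimensionless.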

The most precise density estimations obtained by each machine learning method are reported in Table 4. The accuracy monitoring shows that (1) the Gaussian function is the best kernel for the LSSVR, (2) eleven hidden neurons give the best MLP, (3) ten hidden neurons provide the CFF with the best performance, (4) a spread factor of 0.04312 and 1053 hidden neurons should be used in the GR structure, (5) the RBF is best constructed with a spread factor of 1.0526 and eleven hidden neurons, and (6) the ANFIS (adaptive neuro-fuzzy inference system) with the subtractive clustering membership function, twelve clusters, and the hybrid training algorithm has the best performance.

Table 4 The most precise prediction obtained by different intelligent estimators (1053 training and 186 testing datasets).

Although all these prediction accuracies confirm a high level of consistency with the laboratory-measured densities, the LSSVR and the RBF neural network present the most and least precise results, respectively. To confirm this claim systematically, the subsequent analysis ranks the selected intelligent models based on their prediction accuracy in the different stages of model development.

Selecting the best intelligent model using the ranking analysis

The ranking analysis is a well-established procedure to arrange several models based on their performance. The previous step measured the prediction ability of the seven selected intelligent models using five well-known statistical indexes. Now the ranking analysis utilizes the numerical values of these statistical indexes to arrange them from the best to the worst model. Equation (22) indicates that the selected models have been ranked based on their average rankings over five statistical criteria (indx).

$$Rank = round\left( {\sum\limits_{indx = 1}^{5} {Rank_{indx} } /5} \right)$$
(22)

This ranking analysis has been applied separately to the models’ performances during the learning and testing stages. Furthermore, the rank orders of the chosen intelligent models have also been tracked over the whole 1239 datasets. Figure 1 displays the rank order of the LSSVR, the artificial neural network models (i.e., MLP, RNN, RBF, CFF, and GR), and the ANFIS over these three databases. It can be easily inferred that the LSSVR, ranked first in all three cases, and the RBF neural network, ranked seventh in all three cases, are the best and worst tools for calculating the density of deep eutectic solvents, respectively. The ranking order of the other constructed models is also displayed in this figure.
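The ranking of Eq. (22) can be sketched as follows. The sketch assumes that rank 1 is the best model for each index, that R2 is the only index for which larger is better, and that halves are rounded up; ties are not treated specially. The function name and the scores of the three models are hypothetical.

```python
import numpy as np


def overall_rank(scores, higher_is_better):
    """Rounded average rank of each model over all indices, Eq. (22).

    scores: array of shape (n_models, n_indices)
    higher_is_better: one boolean per index (True for R2)
    """
    scores = np.asarray(scores, dtype=float)
    flags = np.asarray(higher_is_better, dtype=bool)
    # Flip the sign where larger is better, so smaller always means better
    signed = np.where(flags, -scores, scores)
    # Double argsort converts the values of each column into ranks 1..n_models
    ranks = signed.argsort(axis=0).argsort(axis=0) + 1
    return np.floor(ranks.mean(axis=1) + 0.5).astype(int)


# Columns: MAPE%, MAE, RAPE%, RMSE, R2 (hypothetical values for three models)
scores = [[0.3, 3.0, 4.0, 6.0, 0.998],
          [0.5, 5.0, 7.0, 9.0, 0.995],
          [0.9, 9.0, 12.0, 15.0, 0.985]]
print(overall_rank(scores, [False, False, False, False, True]))
```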

Figure 1 The ranking order of the intelligent estimators in different stages of the model development.

In summary, the LSSVR equipped with the Gaussian kernel function is the most trustworthy model for calculating the density of deep eutectic solvents from temperature and the inherent characteristics (i.e., ω, Tc, and Pc) of the involved substance. This highly accurate model anticipates the 1239 density samples of the 149 deep eutectic solvents with MAPE = 0.26%, MAE = 2.94, RAPE = 4.06%, RMSE = 5.65, and R2 = 0.99798.

Validation stage

The LSSVR versus empirical correlations

This section compares the accuracy of the suggested LSSVR with that of four empirical correlations from the literature (Rackett, Spencer and Danner, Mjalli et al., and Haghbakhsh et al.) for estimating the 1239 densities of the deep eutectic solvents. Figure 2 presents this comparison in terms of the MAPE. The results confirm that the LSSVR is the most accurate tool for estimating the density of deep eutectic solvents. The LSSVR anticipates the 1239 density samples of 149 deep eutectic solvents with MAPE = 0.26%, while the most accurate empirical correlation (the Spencer and Danner model) presents MAPE = 1.02% for the same database. The suggested LSSVR therefore reduces the best previously achieved MAPE by more than 74%, since (1.02 − 0.26)/1.02 ≈ 0.745.

Figure 2 The prediction accuracy of the LSSVR and four empirical correlations in the literature28 for estimating the DES density over the same database.

Validation using graphical inspections

The anticipated densities by the LSSVR (\(\rho_{LSSVR}\)) versus their counterpart experimental values (i.e., cross-plot) have been shown in Fig. 3. This cross-plot separately presents the LSSVR predictions for both learning and testing steps. Two straight lines associated with the relative deviation percent (RD%) of − 2% and + 2% have also been added to this figure. Equation (23) expresses the formula of the RD%.

$$RD\% = 100 \times \left[ {\left( {\rho_{\exp } - \rho_{LSSVR} } \right)/\rho_{\exp } } \right]_{i} \;\;\;i = 1,2, \ldots ,n$$
(23)

Figure 3 The consistency between experimental values of DES’s density and the LSSVR’s prediction.

Figure 3 shows that about ten density samples have been predicted with an RD% lower than − 2% or higher than + 2%, i.e., outside the ± 2% band. This observation confirms the excellent ability of the built LSSVR to estimate the density of deep eutectic solvents.

The kernel density estimation is a reliable method for visually inspecting the compatibility between a given variable’s actual and anticipated values. As Fig. 4 shows, this method depicts the cumulative distribution function (CDF) as a function of the experimental values of a given variable. Figure 4A–C illustrate the compatibility between actual and anticipated density values over the training and testing subdivisions and the whole database. Excluding the intermediate values of the DES’s density, a remarkable consistency can be seen between actual and predicted values. Moreover, it can be detected that both the experimental data and the LSSVR predictions have a standard Gaussian distribution shape.

Figure 4 Utilizing the kernel density estimation method to check the LSSVR validity in the training (A) and testing (B) stages and against the whole databank (C).

The difference between the actual and predicted densities (the residual error, i.e., RE) is another statistical index applied to monitor the prediction accuracy of the built LSSVR. The mathematical expression of the RE is given in Eq. (24).

$$RE_{i} = \left( {\rho_{\exp } - \rho_{LSSVR} } \right)_{i} \;\;\;i = 1,2, \ldots ,n$$
(24)

Based on reported results in Fig. 5, 61% of the available samples have been estimated with a residual error of less than 2 kg/m3. Moreover, the LSSVR successfully anticipated 84% of the experimental databank with an RE of lower than 5 kg/m3. Only 16% of the gathered database has been estimated with a residual error of higher than 5 kg/m3. All these observations confirm the excellent compatibility between calculated densities by the LSSVR and their related actual measurements.
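Cumulative fractions of this kind can be obtained from the residual errors of Eq. (24) with a few lines of code. The sketch below assumes that the thresholds apply to the absolute value of the residual error; the function name and the four sample densities are hypothetical.

```python
import numpy as np


def fraction_within(rho_exp, rho_pred, limit):
    """Fraction of samples whose absolute residual error is below `limit`."""
    re = np.asarray(rho_exp, dtype=float) - np.asarray(rho_pred, dtype=float)  # Eq. (24)
    return float(np.mean(np.abs(re) < limit))


# Hypothetical measured and predicted densities in kg/m3
rho_exp = [1000.0, 1100.0, 1200.0, 1300.0]
rho_pred = [1001.0, 1103.0, 1194.0, 1300.0]
print(fraction_within(rho_exp, rho_pred, limit=2.0))
print(fraction_within(rho_exp, rho_pred, limit=5.0))
```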

Figure 5 The cumulative frequency of the residual error (RE) of the LSSVR for estimating the DES’s density.

Checking the reliability of the gathered database

The gathered experimental data played a central role in the development, validation, and selection of the machine learning methods above. Furthermore, this experimental databank has been used to compare the accuracy of the empirical correlations and the selected LSSVR. All previous findings are valid only if the gathered laboratory-measured densities are themselves acceptably valid. The leverage method is a well-trusted technique to detect both valid and outlier data in an experimentally measured database94. This technique plots the standardized residuals (SR) against the Hat index89. Equation (27) explains that the SR is obtained by dividing each residual error by the standard deviation (SD) of the residual errors. Equations (25) and (26) give the average residual error (\(RE^{ave}\)) and the SD, respectively.

$$RE^{ave} = \left( {1/n} \right) \times \sum\limits_{i = 1}^{n} {RE_{i} }$$
(25)
$$SD = \sqrt {\sum\limits_{i = 1}^{n} {\left( {RE_{i} - RE^{ave} } \right)}^{2} /n}$$
(26)
$$SR_{i} = RE_{i} /SD\;\;i = 1,2,...,n$$
(27)

Furthermore, numerical values of the Hat index (HI) can be reached by applying Eq. (28) on the matrix of the independent variables (\(\xi\))95. The superscripts of T and -1 stand for the transpose and inverse operations, respectively.

$$HI = \xi \left( {\xi^{T} \xi } \right)^{ - 1} \xi^{T}$$
(28)

Figure 6 shows the plot of SR versus the HI values associated with the DES’s density databank. The leverage method states that the region bounded by the -3 < SR <  + 3 and HI lower than the critical leverage is valid, and all other positions are the suspect domain96. Equation (29) helps calculate the critical leverage (CL) from the number of independent variables (NIV) and experimental data points (n)83,95. Having four independent variables and 1239 data points, the CL equals 0.0121.

$$CL = 3 \times \left( {NIV + 1} \right)/n$$
(29)

Figure 6 The results of applying the leverage method on the gathered density databank.

The leverage method shows that 1210 out of 1239 data points lie in the valid zone, and only 29 density samples may be outlier measurements. The validity of the gathered database is therefore confirmed, and all previous findings based on this databank are trustworthy.
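The leverage analysis can be sketched as follows. The code assumes the usual hat-matrix definition \(\xi(\xi^{T}\xi)^{-1}\xi^{T}\), whose diagonal entries are taken as the Hat index of each sample, together with Eqs. (25) to (27) and (29). The function name is hypothetical, and the random inputs only stand in for the real databank.

```python
import numpy as np


def leverage_analysis(xi, residuals):
    """Return Hat indices, standardized residuals, critical leverage, and a validity mask."""
    xi = np.asarray(xi, dtype=float)
    re = np.asarray(residuals, dtype=float)
    n, niv = xi.shape
    # Diagonal of xi (xi^T xi)^-1 xi^T without forming the full n-by-n matrix
    hat = np.einsum("ij,jk,ik->i", xi, np.linalg.inv(xi.T @ xi), xi)
    sd = np.sqrt(np.mean((re - re.mean()) ** 2))    # Eqs. (25)-(26)
    sr = re / sd                                    # Eq. (27)
    cl = 3.0 * (niv + 1) / n                        # Eq. (29)
    valid = (np.abs(sr) < 3.0) & (hat < cl)         # valid zone of the leverage plot
    return hat, sr, cl, valid


# Synthetic stand-in: 1239 samples and four independent variables
rng = np.random.default_rng(0)
hat, sr, cl, valid = leverage_analysis(rng.normal(size=(1239, 4)),
                                       rng.normal(size=1239))
print(f"critical leverage = {cl:.4f}, valid points = {int(valid.sum())}")
```

With four independent variables and 1239 samples, the critical leverage evaluates to 3 × 5/1239 ≈ 0.0121, in agreement with the value quoted above.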

LSSVR accuracy for predicting the density of each deep eutectic solvent class

It may be a good idea to monitor the prediction accuracy of the LSSVR against the deep eutectic solvents with the same HBA agent. Since the average relative deviation (Eq. 30)97 clarifies both underestimated and overestimated predictions, it has been selected to measure the LSSVR accuracy in this stage.

$$ARD\% = \left( {100/n} \right) \times \sum\limits_{i = 1}^{n} {\left[ {\left( {\rho_{\exp } - \rho_{LSSVR} } \right)/\rho_{\exp } } \right]}_{i}$$
(30)

Figure 7 shows that the density of the thirteen classes of deep eutectic solvents with HBA#1 to HBA#13 (see Table 1) has been estimated with ARD% values ranging from − 0.24 to + 0.17%. The deep eutectic solvents having HBA #1, 9, and 13 have been underestimated by the LSSVR. On the other hand, the DESs with HBA #3, 5, and 12 have been overestimated. The ARD% associated with the other deep eutectic solvent classes is almost equal to zero.
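A class-wise evaluation of Eq. (30) can be sketched as follows. The function name, the HBA labels, and the density values are hypothetical; a positive ARD% indicates that the model underestimates the measurements on average.

```python
import numpy as np


def ard_by_group(rho_exp, rho_pred, group):
    """Average relative deviation (Eq. 30) for each group label, in percent."""
    rho_exp = np.asarray(rho_exp, dtype=float)
    rho_pred = np.asarray(rho_pred, dtype=float)
    group = np.asarray(group)
    rd = (rho_exp - rho_pred) / rho_exp       # signed relative deviation
    return {g: 100.0 * float(rd[group == g].mean())
            for g in np.unique(group).tolist()}


# Four hypothetical samples belonging to two HBA classes
ard = ard_by_group(rho_exp=[1000.0, 1000.0, 1200.0, 1200.0],
                   rho_pred=[990.0, 1000.0, 1212.0, 1200.0],
                   group=[1, 1, 3, 3])
print(ard)
```

Unlike the MAPE, this signed average lets positive and negative deviations cancel, which is why it reveals systematic under- or overestimation rather than overall accuracy.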

Figure 7 The observed average relative deviation between actual and predicted densities of the deep eutectic solvent with the same HBA agent.

Investigating the effect of temperature, and HBD/HBA types

The effect of temperature on the density of deep eutectic solvents with a specific HBA agent (i.e., choline chloride) and different HBD substances can be deduced from Fig. 8. This figure reports both the experimentally measured densities and the corresponding values simulated by the LSSVR, and it shows excellent agreement between them. The LSSVR effectively discriminates between the effects of the HBD type and the working temperature on the density of the choline chloride-based DESs and accurately estimates all individual data points. As for conventional liquids, the density of deep eutectic solvents decreases with increasing working temperature. This observation has been attributed to the growth of the intermolecular void volume in the DES as the temperature increases98.

Figure 8 The performance of the LSSVR model in identifying the HBD effect on the density of DESs with choline chloride as the HBA.

The density variation of deep eutectic solvents with the temperature and HBA type has been exhibited in Fig. 9. All DESs in this analysis have glycerol as their HBD agent. A high level of compatibility between actual density values and their counterparts estimated by the LSSVR can be seen in Fig. 9. The LSSVR distinguishes the effect of HBA type and temperature on the DES’s density and accurately anticipates all individual density data points.

Figure 9 Monitoring the ability of the LSSVR model to anticipate the HBA effect on the density of DESs with glycerol as the HBD.

Simple flowchart of our study

A simple and understandable flowchart for the stages followed in the current research study has been presented in Fig. 10. This figure can be broken down into four distinct parts as follows:

  1. Developing machine learning methods
  2. Comparing accuracy performances of the machine learning methods and empirical correlations
  3. Selecting the model with the highest prediction accuracy
  4. Utilizing the model chosen for further analyzing purposes

Figure 10 A simple flowchart for explaining the stages followed in the present study.

Conclusion

The accuracy of seven machine learning methods and four empirical correlations has been compared to find the most accurate tool for estimating the density of 149 deep eutectic solvents. Extensive statistical analyses showed that the least-squares support vector regression equipped with the Gaussian kernel function is more accurate than the other methods investigated. This scheme predicted 1239 experimentally measured densities with MAPE = 0.26%, MAE = 2.94, RAPE = 4.06%, RMSE = 5.65, and R2 = 0.99798. Visual inspections utilizing the cross-plot, kernel density estimation, residual error, and average relative deviation have also confirmed a high level of compatibility between the LSSVR predictions and their experimentally measured counterparts. Investigating the experimental database with the leverage technique identified 1210 valid and 29 suspect data points. Furthermore, the constructed LSSVR successfully captures the effect of temperature and of the HBA and HBD types on the density of deep eutectic solvents. The current research may be viewed as an initiative towards constructing reliable models for anticipating the properties of DESs. Such a model promotes efficient solvent synthesis and can help design and simulate new processes utilizing DESs.