Introduction

Ionic liquids (ILs), salts that are molten around room temperature, have attracted attention in the chemical industry because of their excellent physicochemical properties1. In particular, their ability to replace traditional organic solvents that cause environmental pollution makes them attractive for green or sustainable chemistry2. Moreover, because their structural diversity is practically unlimited, their potential applications may be boundless. Detailed information on the properties and applications of ILs is given in a review article3.

On the other hand, from a toxicological viewpoint ILs can act as toxicants, and because of their structural stability they may have a lasting impact on the environment. Therefore, the toxicological nature of ILs should be clarified before they are released into the environment. So far, millions of IL structures have been proposed and several thousand are described in the literature. However, toxicological testing of such a large set of substances would be labor intensive and time and material consuming, so it cannot feasibly cover all types of ILs. For extensive testing and proactive design of the numerous IL structures, computational modelling is a desirable alternative to experimental determination because it is faster, safer, and less expensive. Indeed, OECD4 and REACH5 recommend approaches such as quantitative structure activity relationships (QSAR) for risk management. With this motivation, several researchers have developed theoretical models for the toxicological effects of ILs in various toxicity testing systems, including water fleas6,7,8,9,10,11,12,13,14, algae12,15,16, animal cells12,15,17,18,19, bacteria12,15,20,21,22,23,24,25,26,27, and enzyme activity12,17,28,29,30,31,32. However, because the previous models use physico-chemically ambiguous parameters and each study employs different parameters depending on the toxicity testing system, it is hard to gain a comprehensive understanding of how IL structure affects toxicological activity towards various environmental organisms. Moreover, QSAR models for some responses, e.g., algal photosynthetic activity and HeLa cell line growth, have not been developed because of the lack of experimental toxicity data. Therefore, to address these shortcomings, a simple linear model based on 50 types of biological responses with about 1600 data points on the toxicity of ILs was developed using unified parameters, namely linear free energy relationship (LFER) descriptors.
In seeking relationships among biological responses, the concept of this research is loosely related to a previous article33, which showed that, based on a quantitative toxicity-toxicity relationship (QTTR), interspecies correlations between different biological responses to toxicants can be used to predict missing toxicity data for a particular compound. However, the QTTR model requires experimentally measured toxicity values, and it cannot address toxicological mechanisms on a molecular basis. Therefore, in this study, the LFER descriptors employed as unified parameters were calculated by in silico methods, i.e., density functional theory (DFT)34, the conductor-like screening model (COSMO)35, and the obprop internet freeware36.

Explanation of the theoretical model

For modelling, the linear free energy relationship (LFER) concept was used because it consists of simple and well-defined solute descriptors (E, S, A, B, V, J, J+)37 and has previously been used for predicting the toxicological effects of chemicals38. The LFER model is:

$$\mathrm{SP} = eE + sS + aA + bB + vV + jJ + j^{+}J^{+} + c \qquad (1)$$

where SP stands for solute property. The lowercase letters (e, s, a, b, v, j, j+, and c) are system parameters (sometimes called system coefficients) that describe the molecular interactions of the toxicity testing system. They can be determined by multiple linear regression (MLR) analysis. The capital letters are solute descriptors that describe the intrinsic molecular interaction potentials of an atom or a molecule. The meanings of the solute descriptors are as follows: E [cm3 mol−1/10]: excess molar refraction due to interactions of n- or π-electron pairs; S [dimensionless]: dipolarity/polarizability arising from dipole-dipole and dipole-induced dipole interactions; A and B [dimensionless]: hydrogen bonding acidity and hydrogen bonding basicity; V [cm3 mol−1/100]: McGowan volume; J and J+ [dimensionless]: ionic interactions of the anion and the cation, respectively.

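Since the modelling assumed that each IL is dissociated into two ions, i.e., a cation and an anion, each term was divided into a cationic and an anionic part, as in Eq. (2):

$$\mathrm{SP} = e_c E_c + s_c S_c + a_c A_c + b_c B_c + v_c V_c + j^{+}J^{+} + e_a E_a + s_a S_a + a_a A_a + b_a B_a + v_a V_a + jJ + c \qquad (2)$$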

where the solute property covers five endpoints, i.e., half maximal effective concentration (EC50), half maximal lethal concentration (LC50), half maximal inhibitory concentration (IC50), minimal inhibitory concentration (MIC) and minimal biocidal concentration (MBC), in log units of mM. So that positive system parameters correspond to higher IL toxicity, the five endpoints were converted to inverted logarithms, e.g., log 1/EC50. The subscripts ‘c’ and ‘a’ denote cation and anion, respectively. Experimental values of the LFER descriptors are not available for all IL ions; thus they were calculated according to the methods developed by our group39.
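As a small illustration of the endpoint transformation described above, the following sketch converts a concentration in mM to its inverted logarithm. The function name and the example concentrations are hypothetical and are not taken from the data set.

```python
import math


def inverted_log(conc_mM: float) -> float:
    """Return log10(1/c) for an endpoint concentration c given in mM."""
    if conc_mM <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(conc_mM)


# Hypothetical EC50 values in mM: a lower EC50 (more toxic) gives a
# higher log 1/EC50, so larger model values mean higher toxicity.
for ec50 in (10.0, 1.0, 0.01):
    print(ec50, inverted_log(ec50))
```

With this convention, a positive system parameter means that the corresponding descriptor raises the predicted toxicity.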

Results and Discussion

To build a comprehensive model covering various biological responses to IL toxicity, we hypothesized that toxicity test methods share a similar response pattern but differ in sensitivity, depending on the IL chemical structure and the organism’s tolerance to toxicants. Where this hypothesis does not hold, for example for a toxic compound with a specific mode of action or for a testing method governed by different molecular interactions, the model shows a large prediction error and/or a low coefficient of determination in the statistical analysis.

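To account for the sensitivity of each method, we amended the LFER model of Eq. (2) by adding a term zxZx, as shown in Eq. (3):

$$\mathrm{SP} = e_c E_c + s_c S_c + a_c A_c + b_c B_c + v_c V_c + j^{+}J^{+} + e_a E_a + s_a S_a + a_a A_a + b_a B_a + v_a V_a + jJ + c + z_x Z_x \qquad (3)$$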

where the subscript ‘x’ of the Z term indicates the toxicity testing method (1~58, see Table 1). Initially, the system parameters of the model and the sensitivity terms were studied based on the toxicity values of 52 methods (No. 1~52 in Table 1), and the remaining six systems (No. 53~58 in Table 1) were used for an example study. In the MLR analysis, each Zx is fixed at +1 for data from its own testing system and at zero for all others (see an example in Table S2). The system parameter z represents the degree of sensitivity of the toxicity testing system. It can be calculated as the average of the differences between predicted and experimental values in a specific system, but it is also determined automatically by MLR. The other system parameters of Eq. (3) (i.e., ec, sc, ac, bc, vc, j+, ea, sa, aa, ba, va, j and c) are determined in the same MLR step. In the statistical analysis, the importance of each term of Eq. (3) was also checked in order to simplify the model. Significance was judged by the probability values (p-values) estimated by MLR: a term with a p-value higher than 0.05 was considered not significant and excluded, while a term with a lower p-value was considered significant and kept in the model. The analysis showed that three anion descriptors, i.e., Sa, Aa, and Ba, did not contribute significantly (p-values higher than 0.05), while all cation-related terms and the remaining anionic terms (Ea, Va and J) made reasonable contributions to the model (p-values lower than 0.05). The determined system parameters of the nine selected terms, which explain a common toxic effect of ILs, are given below:

$$\mathrm{SP} = 2.254E_c - 2.545S_c + 0.646A_c - 1.471B_c + 1.650V_c + 2.917J^{+} - 0.201E_a + 0.418V_a + 0.131J - 0.709 + z_x Z_x \qquad (4)$$

Table 1 The studied biological systems (endpoints) and number of data points (N) used for modelling.

where the system parameters, including the zx values, were estimated from 1633 data points of 44 toxicity testing systems. The R2 value is 0.880 and the standard error is 0.465 log units, which indicates reasonable accuracy of Eq. (4). The z values of each testing method are given in Table 1. The z value of the leukemia rat cell line is zero because it automatically became the reference test method, as it has the largest number of data points among the studied systems. The magnitude of the z value indicates the degree of sensitivity: the higher the value, the higher the sensitivity.
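A minimal sketch of how Eq. (4) could be evaluated for one cation-anion pair is given here. The coefficients are those quoted in the text; the descriptor values and the function name `eq4_value` are hypothetical placeholders, not entries from the supporting information.

```python
# Coefficients of Eq. (4) as quoted in the text.
CATION_COEFFS = {"E": 2.254, "S": -2.545, "A": 0.646,
                 "B": -1.471, "V": 1.650, "J": 2.917}
ANION_COEFFS = {"E": -0.201, "V": 0.418, "J": 0.131}
CONSTANT = -0.709


def eq4_value(cation, anion, z=0.0):
    """Calculated toxicity (inverted log scale) for one IL in one test system.

    cation and anion map descriptor names to values; z is the sensitivity
    coefficient of the test system (0 for the reference system).
    """
    cation_part = sum(CATION_COEFFS[k] * cation[k] for k in CATION_COEFFS)
    anion_part = sum(ANION_COEFFS[k] * anion[k] for k in ANION_COEFFS)
    return cation_part + anion_part + CONSTANT + z


# Hypothetical descriptor values, for illustration only.
cation = {"E": 0.5, "S": 1.2, "A": 0.3, "B": 0.2, "V": 1.4, "J": 1.0}
anion = {"E": -0.1, "V": 0.4, "J": 1.0}
print(eq4_value(cation, anion))
```

Passing a non-zero `z` shifts the result by the sensitivity of the chosen test system, which is the role of the zxZx term.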

In selecting biological responses to build Eq. (4), the acceptable value of R2 was internally set to 0.6. In the case of the MBC of R. rubra (32), R2 is slightly lower than 0.6 but the standard error is low, i.e., 0.34 log unit; we therefore included the MBC of R. rubra when building Eq. (4). Among the 52 methods initially studied, eight (No. 45~52 in Table 1) were exceptional cases, as shown by their large prediction errors. These are acetylcholinesterase (45), L. minor (46), P. subcapitata (47), MBC of P. vulgaris (48), MIC of P. aeruginosa (49), MBC of P. aeruginosa (50), MIC of S. marcescens (51), and MBC of S. marcescens (52). When Eq. (4) was applied to predict log 1/EC50 values for acetylcholinesterase, the data points were widely scattered in the correlation between calculated and observed values (see No. 45 in Figure S2). This might be because the toxicological interactions of ILs in enzyme inhibition differ from those in the other systems (No. 1~44 in Table 1). Because there are too few data points or the experimental data span a narrow range, L. minor (46) and the antimicrobial testing methods, i.e., MBC of P. vulgaris (48), MIC of P. aeruginosa (49), MBC of P. aeruginosa (50), MIC of S. marcescens (51), and MBC of S. marcescens (52), could not be predicted by Eq. (4) (see No. 46 and 48~52 in Figure S2). In the case of the growth rate of P. subcapitata (47), the correlation by Eq. (4) depended on which of two research groups the data came from, i.e., Prof. Yun and coworkers40,41,42 or Prof. Pretti and coworkers43. The values calculated by Eq. (4) correlated well with the observed ones for the former (R2 of 0.857), while those for the latter were scattered (see Figure S3). Models for the exceptional cases outside the application domain of Eq. (4) should be developed individually. Additionally, some outliers were observed; they are given in supporting information 2.
In general, the outliers are large cations with long alkyl chains, e.g., trihexyldecylphosphonium [P666-10]+, trihexyldodecylphosphonium [P666-12]+, and trihexyltetradecylphosphonium [P666-14]+. We presume that their steric effects and low water solubility lead to lower toxic effects than those calculated by Eq. (4).

As expected, the correlation between observed toxicity values and those calculated by Eq. (4) has a somewhat different linear slope for each system (see Figure S1), because each toxicity testing system has a different tolerance depending on the IL chemical structure. Nevertheless, the linear directions or trends of the data distributions between observed and calculated values were similar, as hypothesized at the beginning of this study. The z values were calculated as the average of the differences between the measured log 1/EC50 and that calculated by Eq. (4). The linear slope (αx) and constant (βx) between the observed values and those calculated by Eq. (4) were determined for each system by linear fitting using SigmaPlot, and the values are given in Table 1. The fits are shown in Figure S1 of Supplementary information 1.
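The sensitivity-related terms described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' SigmaPlot procedure: z is taken as the mean of (observed − calculated), and αx and βx come from an ordinary least-squares line with the observed values regressed on the calculated ones. The function names and the toy numbers are hypothetical.

```python
def sensitivity_offset(observed, calculated):
    """z: mean difference between observed and calculated values.

    Assumption: the difference is taken as observed minus calculated.
    """
    return sum(o - c for o, c in zip(observed, calculated)) / len(observed)


def linear_fit(calculated, observed):
    """Least-squares slope (alpha) and constant (beta) of observed vs calculated.

    Assumption: observed values are regressed on calculated values.
    """
    n = len(calculated)
    mean_x = sum(calculated) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in calculated)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(calculated, observed))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta


# Toy values for one hypothetical test system (inverted log scale).
calculated = [0.0, 1.0, 2.0, 3.0]
observed = [1.0, 3.0, 5.0, 7.0]
print(sensitivity_offset(observed, calculated))
print(linear_fit(calculated, observed))
```

Only a handful of measured values per system are needed for this step, since just two parameters are fitted.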

The comprehensively approachable prediction method provides several advantages

First, it simplifies the interpretation of IL toxicity across many test systems by using nine descriptors and three sensitivity terms, as in Eq. (4). However, as the outliers and the exceptional testing systems (45~52 in Table 1) show, the prediction method cannot fully replace experimental determination. Nevertheless, it is helpful for experimental design, and Eq. (4) can be expected to serve as a starting point for estimating the toxic effects of ILs on different endpoints of environmental organisms.

Second, the toxicological interactions of IL cations and anions identified by the model can facilitate the design of new IL structures or the selection of IL ions by combining existing cations and anions. In particular, the toxic effect of the anion, which is one of the open issues in estimating IL toxicity, was clearly explained by three terms, i.e., Ea, Va, and J. The three system parameters indicate that IL toxicity increases with increasing McGowan volume and ionic interaction of the anion, while an increase in the excess molar refraction of the anion decreases IL toxicity. Based on the anionic part of Eq. (4), i.e., −0.201 Ea + 0.418 Va + 0.131 J, the 57 anions can be ranked by the magnitude of their toxic effect (Table S3). Note that additional effects that increase or decrease anion toxicity, e.g., hydrolysis or biodegradability, were not considered. Similarly, the toxic magnitude of cations can be calculated from the cationic part of Eq. (4) (2.254 Ec − 2.545 Sc + 0.646 Ac − 1.471 Bc + 1.650 Vc + 2.917 J+), as given in Table S4. In terms of molecular interactions, the contributions of the McGowan volume and charge interactions of the cation follow the same trend as those of Va and J, while the contribution of Ec is opposite in sign to that of Ea because, in general, Ea values are near zero or negative, unlike those of cations. As seen in Eq. (4), the cation needs more descriptors (i.e., Sc, Ac and Bc) than the anion. The H-bonding acidity term of the cation (Ac) is proportional to the toxic effect of ILs, while the dipolarity/polarizability and H-bonding basicity terms of the cation reduce the toxicity.
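To illustrate how anions could be ordered by the anionic part of Eq. (4), the sketch below ranks two made-up anions. The coefficients are those quoted in the text; the anion names and descriptor values are hypothetical and do not correspond to entries in Table S3.

```python
def anion_contribution(e_a, v_a, j):
    """Anionic part of Eq. (4): -0.201*Ea + 0.418*Va + 0.131*J."""
    return -0.201 * e_a + 0.418 * v_a + 0.131 * j


# Hypothetical (Ea, Va, J) descriptor triples, for illustration only.
anions = {
    "anion_small": (0.0, 0.3, 1.0),
    "anion_large": (-0.2, 1.5, 1.0),
}

# Sort from the largest to the smallest calculated contribution.
ranked = sorted(anions, key=lambda name: anion_contribution(*anions[name]),
                reverse=True)
for name in ranked:
    print(name, round(anion_contribution(*anions[name]), 4))
```

The same pattern applies to cations by swapping in the six cationic coefficients.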

Third, the degree of biological inhibition by ILs in each toxicity test method can be expressed numerically by the sensitivity-related terms, i.e., zx, αx and βx, shown in Table 1. This helps in understanding the environmental aspects of ILs and in selecting ILs for an appropriate use, e.g., high antimicrobial activity for medicinal applications or low toxicity for sustainable chemistry.

Fourth, the value calculated by Eq. (4) can be used as an indicator that is correlated with toxicity values from a new toxicity testing system. A small data set is of course still needed to determine the sensitivity-related terms, i.e., zx, αx and βx. Unlike a full modelling procedure, which requires plenty of data points, the value calculated by Eq. (4) (2.254 Ec − 2.545 Sc + 0.646 Ac − 1.471 Bc + 1.650 Vc + 2.917 J+ − 0.201 Ea + 0.418 Va + 0.131 J − 0.709) can be correlated with only a few experimental data. As a validation test, toxicity values for six algae species, i.e., B. paxillifer, G. amphibium, C. vulgaris, O. submarina, S. marinoi, and C. meneghiniana, were correlated as examples. Each species has 10 toxicity values measured by Latała et al44,45. The values calculated by Eq. (4) correlate well with the observed ones, with R2 of 0.82~0.98 (Fig. 1). Their sensitivity terms were estimated and are given as No. 53~58 in Table 1. To validate the predicted values of the test set, the mean absolute error (MAE) criteria derived by Roy et al.46 were used, since R2-based metrics sometimes depend strongly on the distribution of the data. The MAE-based estimation can be performed with the internet freeware tool ‘XternalValidationPlus’ (available at http://dtcab.webs.com/software). Detailed information, including the theoretical background, is given in the article by Roy et al.46. After removing the 5% of data with the highest deviations, the MAE-based criteria gave an MAE (95% data) of 0.2302, indicating good predictability.
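The idea behind the MAE (95% data) figure can be sketched as follows. This is a simplified illustration and not the XternalValidationPlus tool: it assumes that the 5% of predictions with the largest absolute errors are dropped (rounding the count down) before averaging, and it does not reproduce the full set of criteria in Roy et al.46. The function name and the toy numbers are hypothetical.

```python
def mae_after_trimming(observed, predicted, trim_fraction=0.05):
    """Mean absolute error after dropping the largest-error predictions.

    Assumption: the number of dropped points is len(data) * trim_fraction,
    rounded down.
    """
    errors = sorted(abs(o - p) for o, p in zip(observed, predicted))
    n_drop = int(len(errors) * trim_fraction)
    kept = errors[:len(errors) - n_drop]
    return sum(kept) / len(kept)


# Toy example: 19 predictions off by 0.1 log unit and one off by 5.0.
observed = [0.0] * 20
predicted = [0.1] * 19 + [5.0]
print(mae_after_trimming(observed, predicted))        # trimmed MAE
print(mae_after_trimming(observed, predicted, 0.0))   # untrimmed MAE
```

Trimming keeps a single badly predicted compound from dominating the error estimate, which is why the criterion is less sensitive to the data distribution than R2.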

Figure 1 Correlations between the observed log 1/EC50 of ILs for the growth rate of algae species and the values calculated by the comprehensive model.

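Since each fit between the observed and calculated values of ILs has a different slope (αx) and constant (βx), the model can be expressed as below:

$$\mathrm{SP} = \alpha_x\,\mathrm{SP}_{\mathrm{Eq.\,(4)}} + \beta_x \qquad (5)$$

where SP with the subscript Eq. (4) denotes the value calculated by Eq. (4).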

As a result, the values calculated by Eq. (5) agree better with the measured toxicity values (R2 of 0.901 and SE of 0.426 log units) than those calculated by Eq. (4). The fit by Eq. (5) is shown in Fig. 2.
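The comparison between the two equations can be illustrated with a short sketch. It assumes that Eq. (5) rescales the Eq. (4) value with the system-specific slope αx and constant βx, and it uses the usual definition R2 = 1 − SSres/SStot. The function names and the toy numbers are hypothetical.

```python
def apply_eq5(eq4_value, alpha, beta):
    """Rescale an Eq. (4) value with the slope and constant of one system."""
    return alpha * eq4_value + beta


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values for one hypothetical system with alpha = 2 and beta = 1.
eq4_values = [0.0, 1.0, 2.0]
observed = [1.0, 3.0, 5.0]
eq5_values = [apply_eq5(v, 2.0, 1.0) for v in eq4_values]

print(r_squared(observed, eq4_values))  # before rescaling
print(r_squared(observed, eq5_values))  # after rescaling
```

In this toy case the rescaled values match the observations exactly, whereas the unscaled values follow the right trend but with the wrong slope and offset.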

Figure 2 A correlation between the calculated [by Eq. (5)] and the observed toxicity values of ILs for 50 testing methods.

Conclusions

In this study, we correlated for the first time a large body of toxicity data (spanning around 8 orders of magnitude) for ILs (comprising around 250 cations and 60 anions) in 58 systems with unified physicochemical (LFER) descriptors and developed a comprehensive prediction method. It is therefore possible to predict the toxicity in a given test system, or even species sensitivity distributions, for nearly all possible ionic salt combinations. Moreover, through the modelling, the contribution of the molecular interaction potentials of IL cations and anions to various toxicological responses was estimated numerically, and the sensitivity (including tolerance) of each testing system was quantified. The prediction model includes the excess molar refraction, McGowan volume, and charge interaction of both cation and anion, and the dipolarity/polarizability, H-bonding acidity and H-bonding basicity of the cation. This indicates that ILs may act as separated ions. Six terms (Ec, Ac, Vc, J+, Va, and J) are associated with an increase in IL toxicity, while the remaining terms (Sc, Bc, and Ea) lead to a decrease.

The comprehensive method is expected to provide faster and safer toxicity estimation than experiments; thus it will be useful for efficiently managing numerous types of ILs and for designing eco-friendly IL structures. Nevertheless, further study is needed to check the range of its application domain and to simplify the prediction model. Indeed, Eq. (4), which gives the intrinsic values of IL ions in a specific toxicity testing method, still has nine terms, excluding the sensitivity terms (zx, αx, and βx). Moreover, since the prediction steps via Eq. (4) and Eq. (5) were validated only with the growth rates of six algae species, external validation and reliability should be examined further. Furthermore, models for the exceptional cases (e.g., acetylcholinesterase inhibition) should be built using the same descriptors, to help us understand their toxic mechanisms on the basis of the same chemical meanings.

Experimental

Database of ionic liquids and their abbreviations

For model development and the example study used for validation, half maximal effective concentration (EC50), half maximal lethal concentration (LC50), half maximal inhibitory concentration (IC50), minimum inhibitory concentration (MIC) and minimum biocidal concentration (MBC) values of ILs, around 2200 data points for 58 toxicity testing batteries (as given in Table 1), were collected from the literature6,20,21,22,39,40,41,42,43,44,45,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71. The selected ILs comprise several head groups (i.e., piperidinium, sulfonium, melamine, morpholinium, guanidinium, ammonium, phosphonium, imidazolium, pyridinium, quinolinium, and purinium with differently functionalized substituents), giving around 200 types of cations, and 57 types of anions. The lists and abbreviations of the IL ions are given in Supplementary information 1, and the collected data set is given in supporting information 2.

The studied toxicity testing methods for modeling

As listed in Table 1, a total of 58 biological responses to IL toxicity were studied for modelling, as follows:

[Cell line] - Viability of three different animal cell lines, i.e., leukemia rat IPC-81, MCF-7, and HeLa cells, under different experimental conditions, i.e., incubation time (24 h or 48 h) or the presence and absence of 10 percent foetal bovine serum (FBS);

[Algae] - Growth rate tests of eight algal species, i.e., Scenedesmus vacuolatus, Pseudokirchneriella subcapitata, Bacillaria paxillifer, Geitlerinema amphibium, Chlorella vulgaris, Oocystis submarina, Skeletonema marinoi, Cyclotella meneghiniana; and photosynthetic activity of P. subcapitata;

[Water flea] - Immobilization of Daphnia magna; [Enzyme] - Inhibition of acetylcholinesterase activity; [Duckweed] - Growth response of Lemna minor;

[Bacteria and fungi] - Inhibition of luminescence (Vibrio fischeri); growth rate of gram-positive (Listeria monocytogenes L4 and Staphylococcus aureus S244) and gram-negative bacteria (Escherichia coli E149 and Aeromonas hydrophila A97); and antimicrobial properties, i.e., MIC and MBC of ILs: [Rod type] Pseudomonas aeruginosa NCTC 6749, E. coli ATCC 25922, Proteus vulgaris NCTC 4635, Klebsiella pneumoniae ATCC 33495, Salmonella enteritidis, Listeria monocytogenes, Serratia marcescens ATCC 8100; [Bacillus type] Bacillus subtilis ATCC 6633; [Cocci type] Staphylococcus epidermidis ATCC 12228, Staphylococcus aureus ATCC 6538, Staphylococcus aureus (MRSA) ATCC 43300, Enterococcus hirae ATCC 10541, Micrococcus luteus ATCC 9341, Enterococcus faecalis ATCC 29212, Moraxella catarrhalis ATCC 25238; [Yeast-like fungi] - Candida albicans ATCC 10231, Rhodotorula rubra PhB, Candida glabrata DSM 11226, Candida tropicalis KKP 334, Saccharomyces cerevisiae ATCC 9763, Saccharomyces cerevisiae JG, Saccharomyces cerevisiae JGCDR1, Geotrichum candidum, and Rhodotorula mucilaginosa.

Modelling method

The modelling steps can be briefly described as follows:

  1. The theoretical model was developed based on the LFER concept (Eq. 2), where the descriptors of the cation and the anion were provided separately since the two ions may act in a dissociated state. A dummy variable Z (i.e., +1) was further added to the model for each testing method in order to estimate its sensitivity coefficient (z), as in Eq. (3) (see Table S2).

  2. After inserting the calculated descriptors of the cations and anions and the dummy variable of each biological system into Eq. (3), multiple linear regression was performed to analyze the relevance of the employed descriptors to the toxicity values by checking their p-values. After excluding the descriptors with p-values higher than 0.05, the MLR analysis was performed again. From this step, the system coefficients of the selected descriptors and the sensitivity value (z) of each method were determined, as in Eq. (4).

  3. Based on the system coefficients of Eq. (4), including z, an intrinsic value of each IL in a specific toxicity testing system was calculated.

  4. The calculated intrinsic values of the ILs were correlated with the experimentally measured toxicity values to determine the sensitivity-related terms (αx and βx).
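Steps 1 and 2 above can be sketched with a toy regression that includes a dummy variable. This is a minimal illustration in plain Python, not the SPSS analysis used in the study: it uses a single descriptor, two hypothetical test systems ("ref" and "sysB"), made-up data, and it omits the p-value screening. The reference system receives no dummy column, so its z is zero by construction.

```python
def one_hot(system, systems):
    """Dummy variables Z_x: 1 for the row's own test system, 0 otherwise."""
    return [1.0 if system == s else 0.0 for s in systems]


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(rows, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy data: (descriptor value, test system, log 1/EC50).
data = [(0.0, "ref", 1.0), (1.0, "ref", 3.0), (2.0, "ref", 5.0),
        (0.0, "sysB", 1.5), (1.0, "sysB", 3.5), (3.0, "sysB", 7.5)]
non_reference = ["sysB"]

# Design matrix columns: descriptor, dummy variable(s), constant.
rows = [[x] + one_hot(s, non_reference) + [1.0] for x, s, _ in data]
y = [value for _, _, value in data]

slope, z_b, c = fit_mlr(rows, y)
print(slope, z_b, c)
```

In the toy data the second system is offset by a constant 0.5 log units from the reference system, and the fitted coefficient of its dummy variable recovers that offset as its sensitivity value.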

Computational details

Calculation of the LFER descriptors requires several sub-parameters. To obtain them, the targeted IL structures were calculated using density functional theory (DFT)34 and the conductor-like screening model (COSMO)35 in the Turbomole program package (version 5.10)72. First, reasonable starting IL structures were optimized at the (RI-)BP86/SV(P) level73,74,75,76 in the gas phase. The vibrational frequencies of each structure were calculated using AOFORCE77,78. The structures were further refined with the TZVP basis set79, and then a full optimization was performed with inclusion of COSMO35. These calculations produced a .ccf file for each IL structure, and the files were read to obtain the sub-parameters by COSMO-RS80 based on the BP-TZVP-C21-0108 parameterization. For the molar refractivity of each molecule, we used the obprop internet freeware36. The chemical meaning of the sub-parameters and the methods for calculating the LFER descriptors from them are given in Supplementary information 1. The values are given in supporting information 2 (Excel file).

Statistical analysis

Multiple linear regression was performed with SPSS 12.0.1 for Windows, and fitting was done with SigmaPlot for Windows version 10.0.

Additional Information

How to cite this article: Cho, C.-W. et al. Comprehensive approach for predicting toxicological effects of ionic liquids on several biological systems using unified descriptors. Sci. Rep. 6, 33403; doi: 10.1038/srep33403 (2016).