Introduction

Metal (M)-ligand (L) complexes are among the most important compounds in modern industry, with applications in electro-/electroless plating1, selective separation of rare or toxic elements2,3, drug design4, and analytical chemistry5. Among the various properties of M-L complexes, their stability constants in aqueous solution, which express the binding strength between M and L, play an essential role in these fields. For example, because the stability constants determine the concentration of free metal cations in solution, they affect the quality of the plating film and the efficiency of separating target metals. In a solution containing both M and L, M-Ln complexes form through stepwise ligand addition to the metal cation as follows:

$$ \text{M} + \text{L} \rightleftharpoons \text{M-L} \;\Rightarrow\; K_{1} = \frac{\left[ \text{M-L} \right]}{\left[ \text{M} \right]\left[ \text{L} \right]}, $$
(1)
$$ \text{M-L}_{n-1} + \text{L} \rightleftharpoons \text{M-L}_{n} \;\Rightarrow\; K_{n} = \frac{\left[ \text{M-L}_{n} \right]}{\left[ \text{M-L}_{n-1} \right]\left[ \text{L} \right]}, $$
(2)

where Kn is the n-th stepwise equilibrium constant. Using Eqs. (1) and (2), the n-th overall stability constant βn is defined as:

$$ \beta_{n} = \log \left( K_{1} \times \cdots \times K_{n} \right) = \log \frac{\left[ \text{M-L}_{n} \right]}{\left[ \text{M} \right]\left[ \text{L} \right]^{n}} . $$
(3)
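As a minimal illustration of Eqs. (1)–(3), the following sketch accumulates stepwise constants into overall stability constants. The function name and the numerical values are hypothetical and serve only as an example; the sketch also assumes that "log" denotes the base-10 logarithm, which is the usual convention for stability constants but is not stated explicitly here.

```python
import math

def overall_stability_constants(stepwise_K):
    """Return [beta_1, ..., beta_n], where beta_n = log10(K_1 * ... * K_n).

    Assumption: base-10 logarithm. Summing logs avoids overflow of the product.
    """
    betas = []
    running = 0.0
    for K in stepwise_K:
        if K <= 0:
            raise ValueError("equilibrium constants must be positive")
        running += math.log10(K)   # log(K_1 * ... * K_n) = sum of log K_i
        betas.append(running)
    return betas

# Hypothetical stepwise constants K_1 > K_2 > K_3, for illustration only
example_K = [1.0e4, 1.0e3, 1.0e2]
example_betas = overall_stability_constants(example_K)
```

Working with sums of logarithms rather than products of constants keeps the values in the range in which βn is normally tabulated.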

Furthermore, βn depends intrinsically not only on the constituent elements of the ligand but also on its molecular structure. Given the enormous number of M-L combinations in chemical space, it is impractical to measure the overall stability constants of all candidates in order to find promising ligands. Therefore, there is a great need for efficient methods that predict the stability constants of arbitrary M-L pairs to accelerate the design or screening of ligands for specific metals.

Over the past decades, machine-learning approaches have been employed to predict various properties of M-L complexes, such as the spin-state splitting6 and the volcano plot7. In general, there are two ways to predict the properties of M-L complexes by machine learning: using features calculated from the M-L complex itself, which are usually derived from first-principles calculations, or using features calculated from M and L separately. Because it is not obvious what three-dimensional structure an M-L complex will adopt in aqueous solution, most machine-learning studies aiming to predict overall stability constants have relied on compositional and/or topological features of metals and ligands8,9,10,11,12,13,14,15,16,17,18. The limitations of these previous works, which are the issues to be resolved in this study, are summarized here. First, the variety of cations needs to be expanded, because most previous reports covered a limited set of fewer than 20 metals. Second, a machine-learning model that predicts multi-order βn needs to be developed, because previous studies focused mainly on β1. Third, the regression models in previous works cannot be used for Bayesian optimization, which is a powerful technique for finding the optimum candidate19,20. Because Bayesian optimization requires both the predicted value and the predicted variance to choose promising candidates, Gaussian process regression (GPR) is particularly suitable. GPR is a nonlinear, nonparametric regression algorithm that has been used to derive not only material and molecular properties but also force fields for molecular dynamics simulation19. To date, there has been no report on GPR models for predicting stability constants. Fourth, the interpretability of the machine-learning model needs to be improved. If the relevance of both cation and ligand properties to the overall stability constants can be evaluated, the results can be compared with physical understanding. Although Chaube et al. reported the feature importance of both cations and ligands through analyses of their machine-learning models, such as random forest feature importance and permutation importance, none of the cation features ranked in the top 10, even though βn is determined by the interaction between cation and ligand8. Moreover, to our knowledge, it remains unclear which properties are critical for multi-order βn. Thus, quantitatively predicting the overall stability constants of arbitrary M-L pairs across a diverse chemical space remains a challenge.

In this work, we address the above four issues. We collected experimental overall stability constants from existing publications to prepare a large training dataset containing 19,810 data points. This dataset is composed of two sub-datasets: one has 13,559 data points for β1 of 57 cations, and the other has 6251 data points for multi-order βn (n = 2–6) of 50 cations. Using compositional and topological features of both cations and ligands as descriptors, we trained a GPR model for predicting β1. Subsequently, we developed another GPR model for predicting multi-order βn by employing the predicted β1 value of the corresponding M-L pair as one of the features. To improve the interpretability of our models, we performed a sensitivity analysis. It was found that electrical features, such as electronegativity and ionic properties, of both cations and ligands are the most important for predicting β1. Furthermore, the predicted β1 value was found to have the strongest relevance for predicting multi-order βn of the corresponding M-L pair. These results are consistent with the physical understanding of complex formation. Finally, the GPR models exhibited high generalizability to ligands that were not contained in the training datasets and to those located near the edge of the applicability domain. Our machine-learning modeling and analysis provide novel insights into complex formation and are expected to provide a pathway to accelerating molecular design and screening for various applications.

Results

Visualization of the initial dataset

Details on how the initial dataset was prepared are described in the Methods section. As one description of the chemical space, Fig. S1 shows the distribution of the molecular weights of the ligands. To our knowledge, no previous study on the prediction of overall stability constants has used such a large dataset (19,810 data points covering 57 cations). The increased number of data points and cation species is expected to improve the generalizability of the machine-learning model. Figure 1a summarizes the total number of entries for each cation. Note that our dataset encompasses diverse metals, including alkali metals, alkaline-earth metals, noble metals, transition metals, and rare-earth metals. There is a large amount of data for Cu2+, Ni2+, Zn2+, Co2+, Cd2+, Ag+, and Ca2+, which together account for 50% of the total data. Figure 1b shows the distribution and the total numbers of data points, cations, and ligands for each βn. As shown in Fig. 1b, although there are many experimental results up to β4, the number for β5 and β6 is quite small. Because of this limitation on the data for β5 and β6, we created two machine-learning models: one for predicting the first overall stability constant β1 and one for predicting multi-order βn (n = 2–6) using appropriate descriptors (see the Methods section).

Figure 1

(a) Number of experimental results for each cation in the initial dataset, which is composed of 57 cations and 2706 ligands. (b) Distribution of each βn in the initial dataset. The total numbers of data points, cations, and ligands are also displayed.

Sensitivity analysis and optimization of the GPR model for predicting β1

We prepared a total of 118 features to create a GPR model for predicting β1 (see the Methods section). Feature selection is critically important for creating a machine-learning model with high predictive performance. In GPR, the relevance of each feature is usually interpreted as the inverse of its length-scale parameter, but some previous reports have pointed out that this approach sometimes does not work well21,22,23. Accordingly, we evaluated the relevance of each feature via sensitivity analysis using the Kullback–Leibler (KL) divergence as a measure23, with the perturbation set to 0.001. Figure 2a shows the standardized relevance of the 10 highest-ranked features obtained from the full-feature β1 GPR model with optimized hyperparameters (all results are listed in Supplementary Information S2). The total contribution of these 10 features reaches 0.755. As shown in Fig. 2a, the Pauling electronegativity of the metal is the most relevant feature for predicting β1. Ionic properties, such as molecular charge, cation charge, and ionic radius, are also highly relevant. Among the ligand features, Moreau–Broto autocorrelation features of the topological structure (AATS0Z, AATS0i, and ATSC3se) and fragmental features (NssO and NssNH) are in the top 10. AATS0Z, AATS0i, and ATSC3se are computed from the molecular graph and are weighted by the atomic number, ionization potential, and Sanderson electronegativity of the elements in the ligand, respectively. NssO and NssNH are the numbers of -O- and -NH- substructures, respectively. Because oxygen and nitrogen act as coordination sites owing to their high electronegativity, it is reasonable that the relevance scores of NssO and NssNH are high.
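The idea behind the KL-based sensitivity analysis can be sketched as follows. This is a simplified illustration, not the estimator of ref. 23 as used in this work: each feature is shifted by a small perturbation, the KL divergence between the Gaussian predictive distributions before and after the shift is computed, and the quantity sqrt(2·KL)/Δ is averaged over the data points and then normalized to sum to one. The function `toy_predict` is a hypothetical stand-in for the predictive mean and variance of a trained GPR model.

```python
import numpy as np

def kl_gaussian(mu0, var0, mu1, var1):
    """KL( N(mu0, var0) || N(mu1, var1) ) for univariate Gaussians."""
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def kl_relevance(predict, X, delta=1e-3):
    """Normalized relevance of each feature from perturbed predictions.

    predict: callable returning (mean, variance) arrays for the rows of X.
    delta:   perturbation added to one feature at a time.
    """
    X = np.asarray(X, dtype=float)
    mu0, var0 = predict(X)
    scores = np.zeros(X.shape[1])
    for m in range(X.shape[1]):
        X_shift = X.copy()
        X_shift[:, m] += delta                      # perturb feature m only
        mu1, var1 = predict(X_shift)
        kl = np.clip(kl_gaussian(mu0, var0, mu1, var1), 0.0, None)
        scores[m] = np.mean(np.sqrt(2.0 * kl) / delta)
    return scores / scores.sum()                    # standardized relevance

def toy_predict(X):
    """Hypothetical predictor: depends on features 0 and 1, ignores feature 2."""
    mean = 3.0 * X[:, 0] + 0.5 * X[:, 1]
    var = np.ones(X.shape[0])
    return mean, var

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
relevance = kl_relevance(toy_predict, X_demo)
```

For this toy predictor the relevance is proportional to the absolute slope of the mean with respect to each feature, so the ignored feature receives a score of zero.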

Figure 2

(a) The top 10 highest ranked features through sensitivity analysis using a Kullback–Leibler divergence as a measure for predicting β1. (b) Predictive performance for the validation samples as a function of the number of features. Features are arranged in descending order of relevance. The black dashed line corresponds to the top 59 features. (c) Parity plot between true and predicted β1 values of the validation data using the best GPR model. Error bars indicate 1σ uncertainty of the predicted value.

Next, we optimized the features of the β1 GPR model while monitoring the predictive performance. Note that ordinary cross-validation does not reflect the intended use of predicting unknown ligands, because the same ligands inevitably appear in both the training and validation data, which may lead to an overestimation of the predictive performance. Thus, we extracted 20 appropriate ligands based on the applicability domain of our model and calculated the mean absolute error (MAE) and coefficient of determination (R2) for them. The selection rule for the validation samples is described in Supplementary Information S3; we emphasize that the 20 selected ligands are not contained in the training dataset. Figure 2b summarizes the predictive performance for the validation data as a function of the descriptor dimension, with the features added in descending order of the relevance scores shown in Fig. 2a. We conclude that the best feature set for predicting β1 consists of the top 59 features (MAE: 1.31, R2: 0.84), comprising 8 cation features, 49 ligand features, and 2 experimental conditions. Furthermore, Fig. 2c shows the parity plot between the true and predicted β1 values of the validation data using the best GPR model, which indicates the high generalizability of our model. Cross-validation of the feature-optimized GPR model for predicting β1 also indicated good predictive performance (see Supplementary Information S5).
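The ligand-wise hold-out and the two metrics can be sketched as follows. The selection of the 20 validation ligands follows the applicability-domain rule of Supplementary Information S3, which is not reproduced here; the sketch only shows how every record of a held-out ligand is kept out of the training set and how MAE and R2 are computed. All function names and the example labels are hypothetical.

```python
import numpy as np

def split_by_ligand(ligand_ids, held_out_ligands):
    """Return (train_idx, val_idx) so that no held-out ligand is in training."""
    ligand_ids = np.asarray(ligand_ids)
    val_mask = np.isin(ligand_ids, list(held_out_ligands))
    return np.where(~val_mask)[0], np.where(val_mask)[0]

def mean_absolute_error(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical ligand labels: one label per data point
demo_ligands = ["L1", "L1", "L2", "L3", "L3", "L2"]
train_idx, val_idx = split_by_ligand(demo_ligands, {"L3"})
```

Splitting by ligand rather than by data point is what prevents a ligand measured with several cations from leaking into both sets.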

Sensitivity analysis and optimization of the GPR model for predicting multi-order βn

As demonstrated for β1, feature selection is also critical for predicting the multi-order overall stability constants βn. For Co2+, Ni2+, and Cu2+ in particular, it has been reported16 that β1 and β2 are linearly correlated. In the present study, we demonstrate that strong correlations between β1 and βn are observed not only for other cations but also for higher coordination numbers. Figure 3 summarizes the relationship between the experimental multi-order overall stability constants βn and the predicted β1 values of the corresponding M-L pairs. Note that not all M-L pairs for βn are contained in the dataset for β1. As shown in Fig. 3, there is a strong correlation between the true βn and the predicted β1 values for each n, resulting in large positive Pearson correlation coefficients (PCC). Therefore, the predicted β1 of the corresponding M-L pair is expected to be a highly effective feature for predicting multi-order βn. Because we succeeded in predicting β1 by combining features of cations and ligands, it should be feasible to predict multi-order βn by using features of the M-L complex and L. Consequently, we prepared a total of 60 features to create a GPR model for predicting multi-order βn (see the Methods section).
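The way the β1 prediction feeds the βn model can be sketched as a simple descriptor builder. Following the Methods section, the M-L complex is described by the predicted β1, its predicted standard deviation, and the complex charge, and the ligand part contains n−1 and further ligand features. The function name, the argument names, and the ordering of the entries are hypothetical choices made for this illustration.

```python
import numpy as np

def build_beta_n_descriptor(beta1_mean, beta1_std, cation_charge,
                            ligand_charge, n, ligand_features):
    """Assemble one descriptor vector for the multi-order beta_n model.

    beta1_mean, beta1_std: predictive mean and standard deviation of beta_1
                           from the beta_1 GPR model for the same M-L pair.
    n:                     coordination number of the target M-L_n complex.
    """
    if n < 2:
        raise ValueError("the multi-order model covers n >= 2")
    complex_charge = cation_charge + ligand_charge   # charge of the M-L complex
    head = [beta1_mean, beta1_std, complex_charge, n - 1]
    return np.concatenate([np.asarray(head, dtype=float),
                           np.asarray(ligand_features, dtype=float)])

# Hypothetical inputs, for illustration only
demo_descriptor = build_beta_n_descriptor(
    beta1_mean=5.2, beta1_std=0.4, cation_charge=2, ligand_charge=-1,
    n=3, ligand_features=[0.1, 0.7])
```

Passing the standard deviation alongside the mean lets the second model see how reliable the β1 estimate is for a given pair.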

Figure 3

Relationships between experimental multi-order βn and predicted β1 of the corresponding M-L pair. The results of Pearson correlation coefficients (PCC) are also displayed.

As for the β1 GPR model, Fig. 4a shows the standardized relevance of the 10 highest-ranked features obtained from the full-feature βn GPR model with optimized hyperparameters (the full result is provided in Supplementary Information S4). The total contribution of these features reaches 0.986. As shown in Fig. 4a, the predicted β1 for the M-L pair is clearly the most important feature. NaaO and nBridgehead are fragmental features, defined as the number of –O– substructures within aromatic rings and the number of bridgehead atoms, respectively. X_VSAY features, such as SlogP_VSA4, PEOE_VSA13, and EState_VSA2, are defined as the sum of the van der Waals surface area (VSA) of atoms whose property X lies in the range Y. In particular, PEOE_VSA13 and EState_VSA2 are related to the three-dimensional distribution of electrons and are calculated using the partial equalization of orbital electronegativities (PEOE) method24 and the electrotopological state index (EState) method25, respectively. JGI2 is also a topological feature, computed as the second-order mean topological charge. After optimizing the number of features (see Supplementary Information S3), the best predictive performance (MAE: 1.30, R2: 0.92) was obtained with the top 25 features and is comparable to that of the best β1 model. Figure 4b shows the parity plot between the true and predicted multi-order βn values of the validation samples using the best GPR model, again indicating the high generalizability of our model.

Figure 4

(a) The top 10 highest ranked features through sensitivity analysis using a Kullback–Leibler divergence as a measure for predicting multi-order βn. (b) Parity plot between true and predicted multi-order βn values of the validation data using the best GPR model. Error bars indicate 1σ uncertainty of the predicted value.

Discussion

In this section, we discuss the important features for predicting β1 and multi-order βn. The sensitivity analysis of the GPR model for predicting β1 shows that β1 is sensitive to electronegativity- and ionic-related features. In principle, when the electron polarization between the cation and the element at the coordination site of the ligand is small, a strong coordination bond is formed between them2. The electron distribution between them is determined not only by the difference in electronegativities but also by the size of the cation. In addition, the Coulomb interaction between the cation and a negatively charged ligand assists the formation of stable M-L complexes. Therefore, as shown in Fig. 2a, it is reasonable that features related to the electronegativity and ionic properties of both metals and ligands exhibited high relevance scores for predicting β1. We believe that these results were obtained because experimental data for a wide variety of cations were used. Given that the electronegativities of the lanthanides are very similar, we presume that the importance of cation features such as electronegativity was underestimated in the study by Chaube et al.8. However, because PEOE_VSA2, which was the most important feature in their study, is also related to electronegativity24, our results do not contradict the findings of previous studies.

Next, we focus on the relationship between multi-order βn and β1. Because the n-th equilibrium constant Kn satisfies the relationship of \(K_{1} > K_{2} > \cdots > K_{n}\), one can derive the following universal inequality:

$$ \begin{array}{*{20}c} {\beta_{n - 1} < \beta_{n} < n\beta_{1} .} \\ \end{array} $$
(4)
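For completeness, the origin of Eq. (4) can be traced back to Eq. (3) in two steps. This is a short sketch rather than a full proof, and the lower bound additionally assumes log Kn > 0 (that is, Kn > 1), an assumption that is not stated explicitly above:

$$ \beta_{n} = \sum_{i = 1}^{n} \log K_{i} = \beta_{n - 1} + \log K_{n} , \qquad \beta_{n} = \sum_{i = 1}^{n} \log K_{i} < n \log K_{1} = n\beta_{1} \quad (n \ge 2) . $$

The first relation gives βn−1 < βn whenever log Kn > 0, and the second follows because every Ki with i ≥ 2 is smaller than K1.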

Equation (4) implies that, for positive β1, the ratio βn/β1 is always larger than 1 and smaller than n, which is observed in Fig. 3 with a few exceptions. Considering that β1 reflects the cation–ligand binding strength to some degree, this suggests that a strong correlation between βn and β1 is an intrinsic property of complex formation. In addition, VSA-related features are important for the multi-order βn model presumably because three-dimensional structural effects such as steric hindrance are more influential than in the formation of 1:1 M-L complexes. Finally, we mention the relationship between βi and βj (i, j > 1). As shown in Fig. 3, the multi-order stability constants depend linearly on β1, which may also imply a linear relationship between βi and βj. We believe that these empirical trends can be useful for roughly predicting the stability constants of M-Ln complexes that become soluble only when multiple ligands are coordinated.

Conclusion

In this study, we developed two machine-learning models: one for predicting the first overall stability constant β1 and the other for predicting the multi-order overall stability constants βn. Trained on a large dataset, the models cover more than 50 cations and show high generalizability. To our knowledge, this is the first machine-learning model created to predict multi-order overall stability constants. Moreover, the relevance scores of the features of both cations and ligands were quantitatively evaluated through sensitivity analysis to improve the interpretability of our models. The most relevant features are consistent with the physical understanding of complex formation. We believe that our findings are useful for the design and screening of new ligands for various applications. In particular, because the predicted β1 value was found to be the most important property for predicting multi-order βn of the corresponding M-L pair, further development of the β1 model is expected to be necessary in the future. Finally, we mention the advantages and disadvantages of our GPR models. One advantage is that new ligands can be searched for efficiently through Bayesian optimization, because the prediction uncertainty is quantified by the GPR model; this is a topic we will study in the future. However, our models still cannot be applied to polyatomic cations, such as NH4+ and UO2+, because we focused only on monatomic cations in this study. Descriptors for these cations might be prepared by averaging the features of their constituent elements. By resolving these remaining issues, we expect to realize a machine-learning model for predicting the stability constants of arbitrary complexes.

Methods

Dataset preparation

The experimental values of the n-th overall stability constants βn for the M:L = 1:n complexes (n = 1–6) and the corresponding experimental conditions were collected from the NIST Critically Selected Stability Constants of Metal Complexes Database26 and the literature27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58. Data for several heavy metals (i.e., Am, Cm, Cf, Bk, Es, Fm, and Md) and for ligands containing elements such as Te, Se, As, Mn, Co, Fe, W, Mo, Cr, and Re were excluded because of difficulties in constructing descriptors. Moreover, we selected experimental data according to the following priorities, from highest to lowest: (i) data with a temperature of 25 °C and an ionic strength of 0.1; (ii) data with a temperature of 25 °C and any ionic strength; (iii) data with any temperature and an ionic strength of 0.1; (iv) data with the maximum overall stability constant. In the case of duplicates, the data point with the largest overall stability constant was employed. Consequently, 19,810 M-Ln complexes remained, covering 57 cations and 2706 ligands. The chemical structures of the ligands are represented by SMILES (Simplified Molecular Input Line Entry System).
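The priority rule for choosing one record per complex can be sketched as follows. The record layout (dictionary keys), the function names, and all values are hypothetical; the actual database schema is not described here. Ties within the same priority level are resolved in favor of the largest stability constant, as stated above.

```python
import math

def priority(record):
    """Lower value = higher priority, following the rule in the text."""
    at_25c = math.isclose(record["temperature"], 25.0)
    at_i01 = math.isclose(record["ionic_strength"], 0.1)
    if at_25c and at_i01:
        return 0
    if at_25c:
        return 1
    if at_i01:
        return 2
    return 3

def select_records(records):
    """Keep one record per (cation, ligand, n) according to the priorities."""
    best = {}
    for rec in records:
        key = (rec["cation"], rec["ligand"], rec["n"])
        rank = (priority(rec), -rec["beta"])      # ties: largest beta first
        if key not in best or rank < best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]

# Hypothetical records with placeholder names and values
demo_records = [
    {"cation": "M_a", "ligand": "L_a", "n": 1, "temperature": 25.0, "ionic_strength": 0.1, "beta": 8.1},
    {"cation": "M_a", "ligand": "L_a", "n": 1, "temperature": 25.0, "ionic_strength": 0.1, "beta": 8.3},
    {"cation": "M_a", "ligand": "L_a", "n": 1, "temperature": 20.0, "ionic_strength": 0.1, "beta": 9.0},
    {"cation": "M_b", "ligand": "L_b", "n": 2, "temperature": 30.0, "ionic_strength": 1.0, "beta": 13.0},
    {"cation": "M_b", "ligand": "L_b", "n": 2, "temperature": 30.0, "ionic_strength": 0.5, "beta": 13.5},
]
selected = select_records(demo_records)
```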

Feature engineering

Following the previous study8, we used cation properties, ligand compositional and topological features, and experimental conditions, namely temperature and ionic strength, as the machine-learning descriptors for predicting β1. For the cation descriptors, we initially selected 12 element-level features, such as cation charge, atomic number, melting point, molar specific heat capacity, ionic radius, polarizability, electron affinity, Pauling electronegativity, and numbers of unfilled electrons in s, p, d, and f orbitals59,60. We used the molecular descriptor calculation software Mordred to generate compositional and topological descriptors for the ligands61. In addition, we included the molecular charge of the ligand in aqueous solution as one of the ligand features. After removing features that have only a single value or a null value, 587 ligand features remained. Subsequently, we calculated the Pearson correlation coefficient Corr(i, j) for each pair of ligand features i and j, and excluded feature j if the absolute value of Corr(i, j) was greater than 0.7. Furthermore, to avoid multicollinearity among the features, we iteratively removed the feature with the largest variance inflation factor (VIF) until the VIF of every feature was less than 4. For predicting multi-order βn, we employed the predicted β1, the standard deviation of the predicted β1, and the charge of the M-L complex, namely the sum of the cation charge and the molecular charge, as the descriptors for the M-L complex. The descriptors for the ligands consisted of the ligand features that were not used in the best β1 GPR model and the number of ligands to be additionally coordinated to the M-L complex, namely n−1. After feature engineering, the final datasets for predicting β1 and multi-order βn had shapes of 13,559 data points × 118 features and 6251 data points × 60 features, respectively.
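The two pruning steps, the pairwise-correlation filter and the iterative VIF removal, can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the filter keeps the earlier feature of a correlated pair (the order of comparison is not specified above), and the VIF is obtained from the diagonal of the inverse correlation matrix, which equals 1/(1 − R²) for each feature regressed on the others. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def correlation_filter(X, threshold=0.7):
    """Drop feature j if |Corr(i, j)| > threshold for an already kept feature i."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[i, j] <= threshold for i in kept):
            kept.append(j)
    return kept

def vif_scores(X):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

def vif_filter(X, columns, max_vif=4.0):
    """Iteratively remove the column with the largest VIF until all are < max_vif."""
    columns = list(columns)
    while len(columns) > 1:
        scores = vif_scores(X[:, columns])
        worst = int(np.argmax(scores))
        if scores[worst] < max_vif:
            break
        columns.pop(worst)
    return columns

# Synthetic features: c is a noisy sum of a and b, d is nearly a copy of a
rng = np.random.default_rng(0)
a, b, e1, e2 = rng.normal(size=(4, 5000))
c = a + b + 0.7 * e1
d = a + 0.1 * e2
X_demo = np.column_stack([a, b, c, d])

after_corr = correlation_filter(X_demo)       # removes d (almost identical to a)
after_vif = vif_filter(X_demo, after_corr)    # removes c (explained by a and b)
```

The example shows why both steps are useful: feature c passes the pairwise filter because its correlation with a or b alone stays below 0.7, yet it is largely a linear combination of the two and is caught only by the VIF criterion.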

Gaussian process regression

In GPR, the similarity between data points xi and xj is measured by a kernel k(xi, xj), which in turn defines a covariance matrix. GPR is a powerful technique because it naturally provides both predicted values and their uncertainties. A well-known kernel choice62,63 is the Matérn kernel with ν = 3/2, which is given by:

$$ \begin{array}{*{20}c} {k\left( {{\varvec{x}}_{i} , {\varvec{x}}_{j} } \right) = \sigma^{2} \left( {1 + \frac{\sqrt 3 r}{l}} \right)\exp \left( { - \frac{\sqrt 3 r}{l}} \right) ,} \\ \end{array} $$
(5)

where σ and l are hyperparameters representing the signal amplitude and the length scale, which reflects the relevance of the features, and r is the Euclidean distance between data points xi and xj. As shown in Eq. (5), the usual Matérn kernel with ν = 3/2 has a single length-scale parameter l. In this study, however, because the relevance of each descriptor should differ, the Matérn kernel with ν = 3/2 is modified to have the automatic relevance determination (ARD) structure as follows:

$$ \begin{array}{*{20}c} {k\left( {{\varvec{x}}_{i} , {\varvec{x}}_{j} } \right) = \sigma^{2} \left( {1 + \sqrt 3 r_{{{\text{ARD}}}} } \right)\exp \left( { - \sqrt 3 r_{{{\text{ARD}}}} } \right) ,} \\ \end{array} $$
(6)
$$ \begin{array}{*{20}c} {r_{{{\text{ARD}}}} = \sqrt {\mathop \sum \limits_{m = 1}^{d} \frac{{\left( {x_{im} - x_{jm} } \right)^{2} }}{{l_{m}^{2} }}} ,} \\ \end{array} $$
(7)

where d is the dimension of the descriptor and lm is the length scale of the m-th feature. Our GPR modeling was performed using PyTorch64 and GPyTorch65.
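Equations (6) and (7) and the resulting predictive distribution can be sketched in plain NumPy as follows. This is an illustrative re-implementation, not the code used in this work: it assumes a zero prior mean, fixed hyperparameters, and a small hypothetical noise term, whereas the actual models were built with GPyTorch and optimized hyperparameters. All names and numerical values are chosen for illustration only.

```python
import numpy as np

def matern32_ard(X1, X2, lengthscales, sigma=1.0):
    """Matern nu = 3/2 kernel with one length scale per feature (Eqs. 6 and 7)."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    scaled = (X1[:, None, :] - X2[None, :, :]) / np.asarray(lengthscales, dtype=float)
    r_ard = np.sqrt(np.sum(scaled ** 2, axis=-1))        # Eq. (7)
    s = np.sqrt(3.0) * r_ard
    return sigma ** 2 * (1.0 + s) * np.exp(-s)           # Eq. (6)

def gp_posterior(X_train, y_train, X_test, lengthscales, sigma=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at X_test."""
    K = matern32_ard(X_train, X_train, lengthscales, sigma)
    K = K + noise * np.eye(len(K))                       # observation noise / jitter
    K_star = matern32_ard(X_train, X_test, lengthscales, sigma)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, np.asarray(y_train, dtype=float)))
    mean = K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    var = sigma ** 2 - np.sum(v ** 2, axis=0)            # k(x*, x*) = sigma^2
    return mean, var

# Hypothetical two-feature data, for illustration only
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y_train = np.array([1.0, 2.0, 0.5])
X_test = np.array([[0.0, 0.0], [100.0, 100.0]])
mean, var = gp_posterior(X_train, y_train, X_test, lengthscales=[1.0, 2.0])
```

Near a training point the variance collapses toward the noise level, whereas far from the data it returns to the prior value σ², which is the behavior that makes the predicted uncertainty usable for Bayesian optimization. In GPyTorch, a kernel of this form is available as `MaternKernel(nu=1.5, ard_num_dims=d)` wrapped in a `ScaleKernel`; whether exactly this construction was used is not stated here.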