Introduction

The excretion process in the urine involves three main processes: glomerular filtration, tubular secretion, and reabsorption1. In glomerular filtration, only the unbound drug in plasma is filtered and enters the tubular lumen, depending on the glomerular filtration rate (GFR) and the drug fraction unbound in plasma (fu,p). Active tubular secretion is mediated by several transporters for numerous acidic, basic, and some large neutral compounds. A variety of transporters are expressed predominantly in the proximal tubule, executing sequential uptake and efflux that facilitate renal tubular secretion2. Reabsorption is mediated by passive diffusion and reuptake by transporters, with the former being especially important for exogenous compounds. Thus, renal excretion is a result of complicated multiple-transport systems, with previous studies reporting that compounds can be classified into reabsorption, intermediate, and secretion types depending on the ratio of renal clearance (CLr) to glomerular filtration3,4,5.

Two important pharmacological indicators in renal drug excretion are the fraction of drug excreted unchanged in urine (fe) and renal clearance (CLr). fe is an important quantitative indicator showing the contribution of renal excretion to overall drug elimination, and CLr is defined as the proportionality term between the urinary excretion rate of unchanged drug and the plasma concentration1. Predicting the degree of fe during the drug discovery stage is important to determine the basic principle for the subsequent development stage. Moreover, renally excreted drugs should in general be avoided, or administered at low dosages, in patients with renal failure6,7.

The pharmacokinetic profile of a drug is an amalgamation of various properties, such as dissolution, intestinal absorption, plasma protein binding, metabolism, biliary excretion, distribution, and renal excretion. Recently, computer-aided drug design using in silico models to predict the absorption, distribution, metabolism, excretion, and toxicity (ADMET) parameters8,9,10 has attracted considerable attention in the field of drug development. This approach is effective for evaluating the physicochemical properties and in vivo pharmacokinetics during the early stages of drug discovery. In addition, the use of in silico prediction techniques minimizes the expenses and risks of subsequent withdrawals during clinical trials.

Properly validated in silico models for ADMET prediction can assist drug design by helping medicinal chemists prioritize suitable lead compounds in the optimization process of early drug discovery. Whereas industrial medicinal chemists may have access to comprehensive commercial suites to predict ADMET properties, this process is difficult for most academic researchers. Alternatively, models built using freely available computational tools can be easily shared with other researchers or can be integrated into other packages. Therefore, such models would constitute valuable assets for both academia and industry.

To the best of our knowledge, no models to predict fe and CLr based only on structure information have been developed using freely available software. For the prediction of fe, Doddareddy et al.11 generated a binary classification model of fe from structural information calculated using Volsurf and Molconn-Z, with threshold values of fe set to 0.2 in a dataset containing 130 compounds. This model correctly classified 65–80% of the compounds in the test sets. Kusama et al.12 established a binary classification model to predict the major clearance pathways and provided an online prediction system, CPathPred, which was subsequently improved by Toshimoto et al.13 and Wakayama et al.14. In the latter prediction model14, threshold values of fe were set to 0.25 for the prediction of renal excretion, yielding an F-measure of 0.67 on the test set for renal excretion with the input of four fundamental parameters (charge, molecular weight [MW], logD, and fu,p). To predict the CLr, allometric scaling approaches and in vitro–in vivo extrapolation approaches have been extensively utilized. Nevertheless, although allometric scaling is a practical tool, it requires in vivo CLr data in several animal species, which may be difficult for academic researchers to obtain15,16. The in vitro–in vivo extrapolation approaches have successfully determined and incorporated in vitro permeability data from Caco-2 or LLCPK1 cells into prediction models17,18,19; however, it remains necessary to experimentally determine the individual scaling factors. Furthermore, unique quantitative structure-pharmacokinetics relationships have been constructed to predict the CLr of drugs or drug-like compounds in humans20.

Although the accuracy of previously reported models has been improved14,20, such models rely upon either the direct input of experimental values or commercial software for the calculation of descriptors or values of pKa and logD. It is difficult to find free software that can calculate logD; moreover, even though ChemAxon (Marvin)21 can calculate pKa for individual compounds, it cannot calculate this value for multiple compounds simultaneously from the command line. As it is essential to perform calculations batch-wise when new structures are brought into our prediction system, we could not find suitable free software to calculate logD and pKa for the purpose of this open model.

Previously, we constructed prediction models of the human unbound fraction in plasma (fu,p)22, with the fu,p prediction models released via a freely available tool (fu,p Predictor, http://adme.nibiohn.go.jp/fup/). As approximately 10% of the blood volume is filtered at the glomerulus by the hydraulic pressure exerted by the arterial blood and, as a general rule, only the unbound drug in plasma is filtered, the value of fu,p significantly impacts the renal glomerular filtration23. Accordingly, Dave et al.20 pointed out that the fu,p represents the most important determinant of CLr prediction. Moreover, fu,p has been included as one of the four default descriptors in fe prediction in several reports12,13,14. Thus, we considered that our fu,p prediction models22 might be expanded to predict fe and CLr.

Here, we created fe and CLr datasets of 411 and 401 compounds, respectively, and generated two types of predictions: 1) binary classification models of fe and 2) a two-step prediction system of CLr through a combination of the classification and regression models, incorporating structure information without any experimental values but with predicted fu,p values, using freely available software. Moreover, the contribution of fu,p to the accuracy of regression models for CLr prediction was considered. These in silico prediction models are freely available.

Methods

Data set preparation and descriptor calculation

The dataset for fe prediction was acquired from Benet et al.24 and PharmaPendium25. The dataset for CLr prediction was acquired from the ChEMBL database and the dataset reported by Varma et al.3,26,27 and Ito et al.5. Both datasets were created after careful curation to select the values of fe or CLr in healthy adult humans for a single administration to obtain higher prediction accuracy28. The details of curation are provided in Supplementary Methods.

For the fe, a dataset containing 411 compounds (343 from Benet et al.24,27 and 68 from PharmaPendium) with fe, fu,p, and structure information was assembled (Dataset_fe). The list of 343 compounds and their fe values are summarized in Supplementary Table S1; detailed information for the 68 compounds acquired from PharmaPendium has not been presented owing to licensing restrictions.

For the CLr, a dataset containing 401 compounds with experimental CLr including fu,p values and structure information was assembled (Dataset_CLr); the clearance ratio (CR)5, which is also referred to as the renal extraction ratio29, to categorize compounds into three excretion types was calculated using the following equation:

$$CR=CLr/(fu,p\times {\rm{GFR}})$$

The GFR used in this study was 1.8 mL/min/kg (126 mL/min in a 70 kg man). The compounds were categorized into three types based on their CR. The compounds that displayed CR < 0.67, 0.67 ≤ CR < 1.5, or 1.5 ≤ CR were classified into reabsorption (R) type (net reabsorbed compounds), intermediate (IM) type (apparently not reabsorbed or secreted compounds), and secretion (S) type (net secreted compounds), respectively5. Predicted fu,p was calculated using our previously developed fu,p predictor22. Ionization profiles in the data set were extracted from the ChEMBL database.
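The CR calculation and the three-way categorization described above can be sketched in a few lines. This is an illustrative Python sketch rather than the authors' code (the analysis in this study was performed in R); the function names are hypothetical, and only the GFR value and the CR thresholds are taken from the text.

```python
# Illustrative sketch; function names are hypothetical, not from the paper.
GFR_ML_MIN_KG = 1.8  # GFR used in this study (mL/min/kg)


def clearance_ratio(clr, fu_p, gfr=GFR_ML_MIN_KG):
    """Return CR = CLr / (fu,p x GFR).

    clr and gfr are in mL/min/kg; fu_p is the unbound fraction in plasma.
    """
    if fu_p <= 0 or gfr <= 0:
        raise ValueError("fu_p and gfr must be positive")
    return clr / (fu_p * gfr)


def excretion_type(cr):
    """Map a clearance ratio onto the R / IM / S categories."""
    if cr < 0.67:
        return "R"   # net reabsorbed
    if cr < 1.5:
        return "IM"  # apparently neither reabsorbed nor secreted
    return "S"       # net secreted


# Usage: a compound with CLr = 3.6 mL/min/kg and fu,p = 0.5 has a CR of
# about 4 and therefore falls into the secretion type.
print(excretion_type(clearance_ratio(3.6, 0.5)))
```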

We employed the open source programs Mordred (ver. 1.0.0)30 and PaDEL-Descriptor31 to calculate the two-dimensional (2D) descriptors and fingerprints (Extended, KlekotaRoth, and AtomPairs2D), respectively. LogDpH7.4 and pKa (apKa) values were calculated using ChemAxon calculator plugin software (Budapest, Hungary) because of the importance of LogD and pKa as pharmacokinetic parameters; these values were used only for visualizing the chemical space by principal component analysis (PCA).

Data analysis

Data analysis was performed in R (version 3.5.1)32, and the results were visualised using the ggplot233 and ggfortify34 packages. In total, 11 descriptors, i.e., MW, topological polar surface area, SLogP, LogD pH 7.4, apKa, bpKa, hydrogen bond acceptor (HBAcc), hydrogen bond donor (HBDon), number of aromatic atoms (nAromAtom), number of aromatic bonds (nAromBond), and the number of rotatable bonds (nRot), were used for PCA.

Processes of model construction

The caret35 package in R was used to build the prediction models. An overview of the common process in model construction is shown in Supplementary Scheme S1. The data sets were split into training and test sets using random selection at a ratio of 8:2. In the training set, descriptors that showed near-zero-variance and absolute correlations >0.90 were identified and excluded by calculating the frequency ratio using the nearZeroVar function and by creating a correlation matrix using the findCorrelation function in the caret package. Thereafter, descriptors that significantly contributed to the prediction accuracy were selected using the Boruta36 algorithm to automatically rank and omit descriptors based on the random forest (RF) classification algorithm with the training set. Boruta is a wrapper built around the RF classification algorithm implemented in the R package randomForest37, which provides unbiased and stable selection of important and non-important attributes.
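The two pre-filtering steps (removal of near-zero-variance descriptors and of descriptors with absolute correlation >0.90) can be illustrated with a simplified Python sketch. It is not a reimplementation of caret: the frequency-ratio and percent-unique cutoffs mirror the documented nearZeroVar defaults (95/5 and 10), whereas the greedy correlation filter is a simplifying assumption that differs from findCorrelation, which removes the member of each correlated pair with the larger mean absolute correlation. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a descriptor column that is constant or nearly constant."""
    counts = Counter(values).most_common(2)
    if len(counts) == 1:
        return True  # a single distinct value: zero variance
    freq_ratio = counts[0][1] / counts[1][1]
    percent_unique = 100.0 * len(set(values)) / len(values)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_descriptors(columns, cutoff=0.90):
    """Return the names of the descriptors that survive both filters.

    columns maps a descriptor name to its list of values. Descriptors are
    visited in insertion order, and one is dropped when it correlates too
    strongly with a descriptor that has already been kept (greedy
    assumption, unlike caret's findCorrelation).
    """
    kept = {}
    for name, values in columns.items():
        if near_zero_variance(values):
            continue
        if any(abs(pearson(values, other)) > cutoff for other in kept.values()):
            continue
        kept[name] = values
    return list(kept)
```

In this sketch the result depends on the column order, which is one reason the caret procedure, rather than this simplification, should be used to reproduce the reported descriptor sets.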

Prediction models were constructed using various machine learning techniques including linear and non-linear methods; i.e., RF, support vector machine (SVM with radial functions), artificial neural network (ANN), and partial least squares (PLS), to obtain the most accurate model for our data set. To adopt each technique, the train function was passed with method parameters set as rf, svm, nnet, and pls in the caret package. We used the automatic grid search of each tuning parameter with four (tuneLength = 4) values of each in the caret package to prioritize the optimal parameters for our predictions and models were created using a 10-fold cross validation. For 3-class classification, the RF algorithms can naturally handle multiclass classification, whereas all-versus-all and all-versus-rest approaches were used for multiclass SVM in the e1071 package38 and multinomial log-linear models via neural networks in the nnet package39, respectively. The generated models were evaluated with the test set. Kappa (True accuracy), balanced accuracy, sensitivity, and specificity obtained from the confusion matrix in classification models, and r-squared (r2, coefficient of determination) and root mean squared error (RMSE) in regression models were used to evaluate their performance on the test set. The best models were chosen according to the value of Kappa or r2 of the test set in the classification and regression model, respectively.
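For readers less familiar with the classification metrics named above, the following Python sketch derives them from a 2 × 2 confusion matrix. It uses the standard definitions of Cohen's kappa, sensitivity, specificity, and balanced accuracy; the function name is hypothetical, and which class is treated as "positive" is an assumption that must match the convention of the software used.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute classification metrics from 2 x 2 confusion-matrix counts.

    tp/fn are positives predicted correctly/incorrectly;
    tn/fp are negatives predicted correctly/incorrectly.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "kappa": kappa,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }
```

Because kappa subtracts the agreement expected by chance, it is less flattering than raw accuracy on imbalanced data sets, which is presumably why it was used here to select the best models.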

Model construction for fe and CLr prediction

As descriptors, more than 1600 2D descriptors calculated via Mordred and 5640 Extended, KlekotaRoth, and AtomPairs2D fingerprints generated using PaDEL-Descriptor were prepared, and descriptors for which the calculation failed were excluded (Supplementary Information 3). The 6974 and 6976 descriptors in fe and CLr prediction models were initially used for model construction and descriptors selected using the Boruta36 algorithm were finally applied for the predictions. Dataset_fe was split into 328 and 83 compounds for training and test sets, respectively, using random selection and the prediction model was constructed. Dataset_CLr containing 401 compounds was split by random selection at a 1:9 ratio into 41 and 360 compounds to isolate the external test set. Thereafter, the other 360 compounds were split at 8:2 (278 and 72 compounds) for 3-class classification models; in parallel, the other 360 compounds were classified into three excretion types; 94 reabsorption (R), 86 intermediate (IM), and 180 secretion (S) type compounds according to their CR calculated using CLr, fu,p, and GFR values. Subsets were defined as Dataset_CLr_R, Dataset_CLr_IM, and Dataset_CLr_S, respectively. An overview of CLr model construction is shown in Supplementary Scheme S2.
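A seeded random split of the kind described above can be sketched as follows. This is a plain shuffle-and-cut in Python, given only as an illustration: the study used the caret package in R, whose partitioning routine may differ in detail, and the function name and seed are hypothetical.

```python
import random


def random_split(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and cut them into training and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Usage: 411 compound identifiers split at 8:2 give 328 training and
# 83 test compounds, the sizes reported for Dataset_fe.
train, test = random_split(range(411))
print(len(train), len(test))
```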

Results

Distribution and chemical space analysis in Dataset_fe and Dataset_CLr

Dataset_fe and Dataset_CLr, consisting of 411 and 401 compounds, respectively, were weighted towards the lower range of fe and CLr, and 220 compounds were common to both. The distribution of fe in Dataset_fe and that of CLr on a logarithmic scale in Dataset_CLr are shown in Fig. 1a,b, and that of CLr on the original scale is shown in Supplementary Fig. S1; this skew towards lower values was also observed in the data sets used in previous reports11,20. The chemical spaces of the two datasets were visualized by PCA along with classification, with the threshold set to 0.30 in Dataset_fe (Fig. 1c) and with the CR types (R, IM, and S) in Dataset_CLr (Fig. 1d). A total of 11 descriptors, all of which are generally considered to be important parameters for synthetic expansion, were used for the analysis. Compounds with higher fe were less lipophilic than those with lower fe, reflecting the fact that water-soluble drugs generally undergo renal excretion. In Dataset_CLr, the chemical spaces of the R, IM, and S types largely overlapped, and it was difficult to separate the three classes using these 11 descriptors, indicating that R, IM, and S compounds have similar physicochemical properties (Fig. 1d). As expected, R type compounds showed a lower CLr, S type compounds a higher CLr, and IM type compounds a medium CLr (Fig. 1e). The averages of CLr were 0.20, 1.02, and 2.50 mL/min/kg in the R, IM, and S types, respectively. The relationship between fe and CLr or observed fu,p on a logarithmic scale, depending on the ionization properties of the compounds, was also analysed. No trend existed in the distribution of CLr for any ionization property, and the assembled data set spanned a chemical space similar to that of the approved drugs (Supplementary Fig. S2).

Figure 1

(a) Distribution of fe in Dataset_fe consisting of 411 compounds. Average and median are shown in the top-right. (b) Distribution of CLr with logarithmic scale in Dataset_CLr consisting of 401 compounds. Average and median are shown in the top left. (c) The chemical space of Dataset_fe with classification by the threshold set to 0.30. The frames indicate 95% normal confidence ellipses in the assembled 411 compounds with fe ≥ 0.3 (red) and fe < 0.3 (green). (d) The chemical space of Dataset_CLr in 96 intermediate (IM, red circle), 104 reabsorption (R, green triangle), and 201 secretion (S, blue square) types. (e) Plot of compound counts depending on CR type. Average and median of CLr in each CR type are shown on the right.

Classification models to predict the extent of fe

Binary classification models were created with the fe threshold value set to 0.30 to define the low and high/medium classes, with 158 and 253 compounds classified into the high/medium and low class, respectively. This threshold was chosen according to previous reports2,14. Fifty-one descriptors were finally selected in the training set using the Boruta algorithm36. Prediction models were trained on a training set comprising 328 compounds, to which four machine learning methods (RF, SVM with radial functions, ANN, and PLS) were applied. Each model was validated on the common test set containing 83 compounds; the statistical results of the models are summarized in Table 1. Kappa was 0.46–0.52 and 0.29–0.49 in the training and test set, respectively. Balanced accuracy and specificity, the latter being the proportion of low fe class compounds that were correctly identified, were 0.63–0.74 and 0.76–0.90 in the test set. RF showed the highest Kappa in the test set; the RF parameters (ntree and mtry) were 500 and 14, and the model was defined as Model_fe. In parallel, to evaluate the statistical influence of fu,p as a descriptor on fe prediction accuracy, prediction models of fe were constructed with or without fu,p values (observed and predicted). Paired t-test analysis revealed no significant difference between the Kappa of Model_fe and those of other models with fu,p (Supplementary Table S2).

Table 1 Statistical results of the binary classification models for fe prediction by each of the four models.

SLogP was the most important descriptor in all the models, whereas fu,p was ranked as the second most important descriptor in the models with fu,p. The top ranked descriptors according to their variable importance for the best models are listed in Supplementary Table S3, and the main important descriptors, including lipophilicity descriptors such as SLogP, were common to all three models. In addition, topological descriptors such as ATS (Moreau-Broto autocorrelation), MATS (Moran autocorrelation), GATS (Geary autocorrelation), chi related index (Molecular connectivity), and ETA (Extended topochemical atom) were also determined as important descriptors.

Relationship between CLr and fu,p

The relationship between CLr and fu,p was analysed in Dataset_CLr. The correlation coefficient (r) between CLr and observed fu,p in logarithmic scale was moderate (r = 0.54) (Fig. 2a); however, the correlation between CLr and observed fu,p increased (r = 0.72, 0.98, and 0.80 in the R, IM, and S type, respectively) in the subsets defined by CR type (Fig. 2b), suggesting that fu,p values used as a descriptor are likely to be effective for creating CLr prediction models when the dataset is sub-clustered by CR type. In comparison, the correlation did not change in subsets of Dataset_CLr defined by ionization properties (Supplementary Fig. S3). In addition, fu,p in the IM type was significantly higher than that in the other types (Fig. 2c). This indicated that the mechanism of renal excretion in these compounds is mainly glomerular filtration, with the contribution of secretion by transporters or reabsorption by lipophilicity being low.

Figure 2

Relationship between CLr in logarithmic scale and observed fu,p. (a) Whole Dataset_CLr (401 compounds), and (b) sub-categorized by CR type (104, 96, and 201 compounds in reabsorption [R], intermediate [IM] and secretion [S] type, respectively). (c) Boxplot of observed fu,p in each excretion type. n; compound counts, r; correlation coefficient.

Figure 3

Plot of predicted and observed CLr by three regression models with predicted fu,p value. (a) in the test set (66 compounds) and (b) external test set (41 compounds).

Furthermore, upon comparison of the observed and predicted fu,p values, as shown in Supplementary Fig. S4, a correlation could be seen between observed and predicted fu,p values (r = 0.84), with 72.8% and 84.0% of the predicted fu,p values falling within 2-fold and 3-fold error, respectively. This indicated that the fu,p predicted by fu,p predictor22 correlated well with the observed fu,p.
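The "within n-fold error" statistic used here and in the following sections can be made explicit with a short Python sketch. It assumes the common definition of fold error as the larger of predicted/observed and observed/predicted; the function names are hypothetical.

```python
def fold_error(predicted, observed):
    """Return the fold error (always >= 1) between two positive values."""
    if predicted <= 0 or observed <= 0:
        raise ValueError("values must be positive")
    return max(predicted / observed, observed / predicted)


def percent_within_fold(predicted, observed, fold):
    """Percentage of prediction/observation pairs within the given fold error."""
    hits = sum(fold_error(p, o) <= fold for p, o in zip(predicted, observed))
    return 100.0 * hits / len(predicted)


# Usage with made-up fu,p values: the fold errors are 1, 2.5, 1.8, and 4.
pred = [0.1, 0.5, 0.9, 0.2]
obs = [0.1, 0.2, 0.5, 0.8]
print(percent_within_fold(pred, obs, 2), percent_within_fold(pred, obs, 3))
```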

Prediction models for CLr

A comprehensive CLr prediction model incorporating the whole Dataset_CLr using several machine learning methods was constructed for a randomly selected training set. This was validated by the test set with or without fu,p values. Although the average of r2 appeared to slightly increase (from 0.24 to 0.32) when fu,p was added as a descriptor, the highest r2 of all the models was 0.4 in the test set (Supplementary Table S4). As previously reported by Dave et al.20, a single model was not able to predict the renal clearance of all examined compounds.

As a next step, subsets of Dataset_CLr by CR type were generated and defined as Dataset_CLr_R, Dataset_CLr_IM, and Dataset_CLr_S as described in the Methods section. Regression models to predict the value of CLr were generated using four machine learning methods (RF, SVM with radial functions, ANN, and PLS). Three types of descriptor sets were applied: 1) 6,976 descriptors, 2) 6,976 descriptors + predicted fu,p, and 3) 6,976 descriptors + observed fu,p in each dataset. The statistical results of each model are summarized in Table 2, and r2 of the best model and the average of r2 among several models with different randomized splits of training and test set are shown. The p-values were calculated using the paired t-test with r2 against models without fu,p. All the models showed a significantly higher r2 when fu,p values were applied as descriptors: r2 in the test set increased from 0.38 to 0.66, 0.56 to 0.92, and 0.41 to 0.62 in the R, IM, and S type, respectively, when the observed fu,p was included as a descriptor, indicating that inclusion of fu,p values as a descriptor increased the accuracy of the prediction model. In addition, r2 in the test set also increased significantly with predicted fu,p values, although these r2 values were slightly lower than those of the models with observed fu,p. Among the models with predicted fu,p values, PLS in the R type and RF in the IM and S types showed the best prediction capability; these models were defined as Model_CLr_R, Model_CLr_IM, and Model_CLr_S, respectively. Fold errors of the best models are also summarized; the percentage of samples within 2-fold error increased from 37.5% to 56.3% in R type, from 68.8% to 100% in IM type, and from 48.6% to 62.9% in S type compounds using the observed fu,p as a descriptor. The percentage of samples within the 2-fold error also increased with predicted fu,p, as compared with that in the models without fu,p (to 43.8, 87.5, and 57.1% in R, IM, and S type, respectively).
To ensure that this result was not derived from the inclusion of training compounds in the fu,p prediction model, whose fu,p can be predicted accurately in general, compounds included in the training set of the fu,p prediction model were excluded from the test set, with fold errors indicated in parentheses. Although the number of compounds in the R type was small and this could accordingly not be compared accurately, the same trend as with the entire data set was observed in the IM and S types. Predicted and observed CLr using Model_CLr_R, Model_CLr_IM, and Model_CLr_S in the test set and the external test set containing 41 compounds are plotted in Fig. 3a,b; 75.8% and 65.9% of the compounds fell within 3-fold error, respectively.

Table 2 Statistical results and fold error of the best regression models for CLr prediction with or without fu,p.

The top ranked descriptors according to their variable importance for the three defined best models and a description of those descriptors are summarized in Supplementary Tables S5 and S6. Predicted fu,p was the most important descriptor in all the models.

To achieve CLr prediction using structure information alone, three-class classification models to distinguish CR types (R, IM, and S) were constructed. The statistical results are summarized in Table 3. The RF model showed the highest Kappa (true accuracy) value of 0.32 in the test set, with balanced accuracy of 0.70, 0.58, and 0.68 in the R, IM, and S type, respectively, and was defined as Model_CLr_CR. Although sensitivity in the R and IM type was not sufficiently high (0.56 and 0.29, respectively), 75% of S type compounds were successfully categorized into the correct type. The other raw parameters are shown in Supplementary Table S7. We also constructed three-class classification models with or without fu,p; no significant difference in accuracy was detected (Supplementary Table S8).

Table 3 Statistical results of the 3-class classification models for CLr prediction.

CLr was predicted with the two-step prediction using CLr regression models (Model_CLr_R, Model_CLr_IM, and Model_CLr_S) following the prediction of CR type by a three-class classification model (Model_CLr_CR). An external test set consisting of 41 compounds that were not used to generate any model was used for the validation. The observed and predicted CLr values are plotted in Fig. 4; 39.0% and 43.9% of the predicted CLr values fell within 2- and 3-fold error, respectively. The external validation set was then split into higher and lower ranges of observed or predicted CLr, using the average CLr of IM type compounds (CLr = 1.02 mL/min/kg) as the boundary. When the compounds were split according to the observed value of CLr, 70.5% of the compounds fell within 2-fold error in the higher range, and 20.8% and 29.2% of the compounds fell within 2- and 3-fold error in the lower range of CLr. When the compounds were split by the predicted value of CLr, more compounds fell within 2- and 3-fold error in the higher range than in the lower range (78.6% in the higher range and 18.5% and 25.9% in the lower range). Using a combination of the classification model of CR type and the regression model of CLr in R, IM, and S type, CLr could be predicted from the structure information using only freely available software, especially in the higher range of CLr. We also tried two-step CLr prediction models with or without fu,p, and the percentages of compounds within 2- and 3-fold error did not differ (Supplementary Table S9).
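The control flow of the two-step prediction (classify the CR type first, then apply the regression model for that type) can be sketched as below. This Python sketch shows only the dispatch logic: the classifier and the three regressors are toy stand-ins with made-up rules and coefficients, not Model_CLr_CR, Model_CLr_R, Model_CLr_IM, or Model_CLr_S, and the feature names are hypothetical.

```python
def two_step_predict(features, classify, regressors):
    """Predict the CR type, then CLr with the regressor for that type.

    classify: callable returning "R", "IM", or "S" for a feature mapping.
    regressors: mapping from CR type to a callable returning CLr.
    """
    cr_type = classify(features)
    return cr_type, regressors[cr_type](features)


# Toy stand-ins (assumptions for illustration only).
def toy_classifier(features):
    if features["slogp"] > 3.0:
        return "R"
    return "S" if features["charged"] else "IM"


toy_regressors = {
    "R": lambda f: 0.2 * 1.8 * f["fu_p"],
    "IM": lambda f: 1.8 * f["fu_p"],
    "S": lambda f: 1.8 * f["fu_p"] + 1.0,
}

compound = {"slogp": 1.2, "charged": False, "fu_p": 0.5}
print(two_step_predict(compound, toy_classifier, toy_regressors))
```

A consequence of this design is that an error in the first step selects the wrong regression model, so the accuracy of the type classification bounds the accuracy of the final CLr estimate.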

Figure 4

Plot of predicted and observed CLr in the external validation set consisting of 41 compounds by the two-step prediction system with predicted fu,p value.

Discussion

We developed an in silico prediction system to classify compounds into their degree of unchanged excretion in the urine and to predict the value of CLr using freely available tools without requiring any experimental data. Initially, a binary prediction model of fe was successfully generated; the threshold was set to 0.30 according to Varma et al.40, to define the compounds that are well- or poorly-eliminated in the urine. The inclusion of fu,p did not significantly affect the Kappa in the fe prediction models; rather, Model_fe without fu,p was sufficiently able to predict fe, equivalent to the results of previous studies11,12,13,14. The majority of the important variables identified in the generated models to predict fe were common, such that descriptors related to lipophilicity such as SLogP, topological descriptors related to electronic energy, and ionization potential indicators such as AATS, GATS, MATS, and chi comprised the key components of the models. Because lipophilicity is an important determinant for the choice between liver and renal excretion, it is natural that SLogP was the most important descriptor in all the models. In addition, hydrogen bonding interaction descriptors, including ionization potential, total energy, electronic energy, and sum of the total net charge were included in the previously constructed models4. Therefore, the inclusion of the descriptors related to lipophilicity, electronic energy, and ionization potential led to the models being able to successfully capture the key factors for fe prediction. Drug metabolism is generally important as one of the determinants for fe, because the compounds that are well metabolized show smaller values of fe27,40,41,42. 
We believe that it is ideal to predict fe in consideration of metabolic clearance as a task in future model construction because our fe prediction model did not take metabolism into consideration; this matter should be addressed in future studies wherein metabolic information has been collected.

In general, renal impairment alters drug efficacy, often increasing their pharmacological and toxicological effects owing to high concentrations7. Moreover, hepatic clearance is known to be impaired in patients with end-stage renal disease because of the accumulation of uremic toxins, which is influenced by the expression of several CYPs43,44,45. Information on renal clearance is useful in the early stages of drug discovery, not only for understanding pharmacokinetic profiles but also for avoiding potential risk in the population with renal impairment, as well as in those with renal disease and advanced age4. Our binary model (Model_fe) can be used to screen lead compounds in the early stage of drug discovery (Fig. 5 left). For example, Model_fe is appropriate for selecting compounds showing low fe that are not eliminated via the kidney, with an assumption that the drug could be administered to patients with renal impairment.

Figure 5

Application of the generated prediction models. Left: In silico prediction system for fe in humans. Right: Two step in silico prediction system for CLr in humans. R; Reabsorption, IM; Intermediate, S; Secretion.

We concluded that a single in silico CLr prediction model was unable to predict CLr even if the fu,p value was applied as a descriptor, and no discernible linkages between CLr and ionization property were observed in our study. Comprehensive prediction will be difficult because renal excretion is a result of multiple processes with different mechanisms such as glomerular filtration, secretion, and reabsorption, which are mediated by active transport and by lipophilicity-dependent passive diffusion. This interpretation is in accordance with that of Dave et al.20, who also reported that splitting these compounds according to their ionization property did not improve prediction accuracy of CLr. Dave et al.20 finally constructed quantitative structure-pharmacokinetics relationship models that could be used to predict CLr of compounds that (1) undergo net reabsorption, and (2) are substrates and/or inhibitors of human renal transporters. Although the models were accurate, experimental information, such as the class of the compounds in the Biopharmaceutics Drug Disposition Classification System (BDDCS)24 and whether those compounds are substrates and/or inhibitors of renal transporters, is required in advance to determine suitable prediction models. Thus, we aimed to generate a CLr prediction model in which an external input is not required, using only chemical structure information, to devise a practical tool for drug design processes prior to chemical synthesis.

Previously, fu,p was reported as the most important determinant of renal excretion5,12,20. However, the inclusion of fu,p as a descriptor did not significantly affect fe and CR type prediction accuracy when the whole dataset was used in this study. In contrast, r2 of the regression models with the subset of each CR type was significantly increased when observed and predicted fu,p values were included (Table 2). The results suggest that because of the multiple mechanisms of renal excretion, the impact of fu,p was observably low in the overall prediction, whereas when Dataset_CLr was subclustered into three CR types, the influence of fu,p became more visible among the compounds with similar mechanisms.

The appearance of a drug in the urine is the net result of glomerular filtration, secretion, and reabsorption, for which CLr is defined as follows:

$$CLr=(1-FR)\,(fu,p\times {\rm{GFR}}+CLs)$$

where FR and CLs are the fraction reabsorbed from the lumen and the secretion clearance, respectively. When the compounds belong to R, IM, and S types, CLr is expressed by the following respective equations:

$${\rm{Reabsorption}}\,{\rm{type}}\,({\rm{R}}):CLr=(1-{\rm{FR}})\,(fu,p\times {\rm{GFR}})$$
$${\rm{Intermediate}}\,{\rm{type}}\,({\rm{IM}}):CLr=fu,p\times {\rm{GFR}}$$
$${\rm{Secretion}}\,{\rm{type}}\,({\rm{S}}):CLr=fu,p\times {\rm{GFR}}+{\rm{CLs}}$$
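As an illustration of how the three type-specific expressions relate to the general definition, the minimal Python sketch below evaluates CLr with FR or CLs set to zero as appropriate. The function names and all numerical inputs (a GFR of 1.8 mL/min/kg, fu,p of 0.5, FR of 0.5, and CLs of 2.0 mL/min/kg) are illustrative assumptions and are not taken from the dataset used in this study.

```python
def renal_clearance(fu_p, gfr, fr=0.0, cl_s=0.0):
    """General definition: CLr = (1 - FR) * (fu,p * GFR + CLs).

    fu_p : fraction unbound in plasma (0 to 1)
    gfr  : glomerular filtration rate (e.g. mL/min/kg)
    fr   : fraction reabsorbed from the lumen (0 to 1)
    cl_s : secretion clearance (same units as gfr)
    """
    if not 0.0 <= fu_p <= 1.0:
        raise ValueError("fu_p must lie between 0 and 1")
    if not 0.0 <= fr <= 1.0:
        raise ValueError("fr must lie between 0 and 1")
    if gfr < 0.0 or cl_s < 0.0:
        raise ValueError("gfr and cl_s must be non-negative")
    return (1.0 - fr) * (fu_p * gfr + cl_s)


def clearance_ratio(cl_r, fu_p, gfr):
    """Ratio of CLr to the filtration clearance fu,p * GFR."""
    filtration = fu_p * gfr
    if filtration <= 0.0:
        raise ValueError("fu_p * gfr must be positive")
    return cl_r / filtration


if __name__ == "__main__":
    # Illustrative (assumed) inputs, not values from the study.
    FU_P, GFR = 0.5, 1.8

    cases = {
        "R": renal_clearance(FU_P, GFR, fr=0.5),      # CLs neglected
        "IM": renal_clearance(FU_P, GFR),             # FR and CLs neglected
        "S": renal_clearance(FU_P, GFR, cl_s=2.0),    # FR neglected
    }
    for cr_type, cl_r in cases.items():
        ratio = clearance_ratio(cl_r, FU_P, GFR)
        print(f"{cr_type}: CLr = {cl_r:.3f}, CLr/(fu,p*GFR) = {ratio:.3f}")
```

Under these simplified expressions, the ratio CLr/(fu,p × GFR) equals 1 − FR for the R type (below 1), equals 1 for the IM type, and equals 1 + CLs/(fu,p × GFR) for the S type (above 1), which is consistent with classifying compounds by the ratio of CLr to glomerular filtration.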

CLr in all of the R, IM, and S types depends on fu,p, and fu,p determines CLr most directly in the IM type. In the R and S types, on the other hand, FR and CLs also affect CLr in addition to fu,p; information on renal transporters or metabolism related to FR and CLs is therefore important for CLr prediction in these types. Consistent with this, when the averages of r2 in Table 2 were compared, r2 increased to the greatest degree in the IM type model (from 0.43 to 0.88 in the test set).

As shown in Figs. 3 and 4, the two-step prediction model of CLr was generated by combining several models. In the first step, the CR type is predicted using a three-class classification model (Model_CLr_CR). In the second step, one of the three regression models (Model_CLr_R, Model_CLr_IM, or Model_CLr_S) is chosen according to the prediction of Model_CLr_CR, and the final value of CLr is then predicted. Notably, 12 of the 13 compounds that were misclassified in the first-step three-class classification did not fall within 3-fold error in the final CLr prediction, indicating that improved accuracy in step 1 is necessary. Although it was difficult to identify a commonality among the misclassified compounds, many of them carried cationic charges (Fig. S5). Adding similar compounds to the dataset, or including charge-related descriptors such as pKa or logD, should improve accuracy. In the present study, we could not include pKa or logD as descriptors because we were unable to find freely available pKa or logD calculators suitable for our prediction system. It is therefore necessary to bear in mind that the accuracy of CLr prediction is low, particularly when the predicted CLr is <1.02 mL/min/kg. In contrast, 78.6% of the compounds in the higher range of predicted CLr were within 2-fold error, indicating that predictions for compounds with predicted CLr >1.02 mL/min/kg are sufficiently reliable. The model can therefore be used for compound design and the subsequent optimization of lead compounds in the early stages of drug discovery (Fig. 4 right).
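The control flow of the two-step prediction and the fold-error criterion can be sketched in Python as follows. This is only a schematic under stated assumptions: the classifier and regressors are passed in as plain callables standing in for Model_CLr_CR and Model_CLr_R/IM/S, whose internals are not reproduced here, and the function names, the single toy descriptor, and the numerical values in the demonstration are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]   # returns "R", "IM", or "S"
Regressor = Callable[[Features], float]  # returns a predicted CLr


def predict_clr_two_step(
    features: Features,
    classify_cr_type: Classifier,
    regressors: Dict[str, Regressor],
) -> Tuple[str, float]:
    """Step 1: predict the CR type. Step 2: apply that type's regressor."""
    cr_type = classify_cr_type(features)
    if cr_type not in regressors:
        raise KeyError(f"no regression model for CR type {cr_type!r}")
    return cr_type, regressors[cr_type](features)


def fold_error(observed: float, predicted: float) -> float:
    """Symmetric fold error: max(obs/pred, pred/obs), always >= 1."""
    if observed <= 0.0 or predicted <= 0.0:
        raise ValueError("observed and predicted must be positive")
    return max(observed / predicted, predicted / observed)


def within_fold(observed: float, predicted: float, fold: float) -> bool:
    """True if the prediction lies within `fold`-fold of the observation."""
    return fold_error(observed, predicted) <= fold


if __name__ == "__main__":
    # Hypothetical stand-ins for the trained models (assumptions only).
    def toy_classifier(x: Features) -> str:
        return "S" if x[0] > 1.0 else "R"

    toy_regressors: Dict[str, Regressor] = {
        "R": lambda x: 0.2,
        "IM": lambda x: 0.9,
        "S": lambda x: 3.0,
    }

    cr_type, cl_r = predict_clr_two_step([2.0], toy_classifier, toy_regressors)
    print(cr_type, cl_r, within_fold(observed=2.0, predicted=cl_r, fold=2.0))
```

Because step 2 dispatches solely on the class returned in step 1, a misclassified compound is necessarily passed to a regression model built for a different CR type, which is one way to read the observation that most misclassified compounds also fell outside the 3-fold error range.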

Our dataset is one of the largest among those previously reported3,14. However, several hundred compounds are not sufficient to account for all potential chemical diversity. We hope to further expand the number of compounds, although it is currently difficult to retrieve quality data from public databases. It is, therefore, desirable to develop an integrated database of high-quality curated data with enough compounds to cover a larger chemical space.

We have developed a publicly available prediction system for renal excretion, focused on fe and CLr, that is based on structure information alone and built with freely available software. The prediction of CLr values from structure information was made possible by a two-step procedure: a three-class classification into the three CR types, followed by one of three regression models that predicts the value of CLr according to the CR type. Moreover, the accuracy of the regression models increased when observed and predicted fu,p values were added, with the contribution of fu,p being highest in the regression model of the IM type. In the external validation set, 78.6% of the samples fell within 2-fold error in the higher range of CLr. These prediction systems are expected to be practical tools that help medicinal chemists prioritize compounds for synthesis during the drug design process. A new web resource (http://adme.nibiohn.go.jp/renal_ex) has been established to provide online access to the system for the prediction of overall renal excretion, as described in this study.