Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks


Triaging unpromising lead molecules early in the drug discovery process is essential for accelerating its pace while avoiding the costs of unwarranted biological and clinical testing. Accordingly, medicinal chemists have been trying for decades to develop metrics—ranging from heuristic measures to machine-learning models—that could rapidly distinguish potential drugs from small molecules that lack drug-like features. However, none of these metrics has gained universal acceptance and the very idea of ‘drug-likeness’ has recently been put into question. Here, we evaluate drug-likeness using different sets of descriptors and different state-of-the-art classifiers, reaching an out-of-sample accuracy of 87–88%. Remarkably, because these individual classifiers yield different Bayesian error distributions, their combination and selection of minimal-variance predictions can increase the accuracy of distinguishing drug-like from non-drug-like molecules to 93%. Because total variance is comparable with its aleatoric contribution reflecting irreducible error inherent to the dataset (as opposed to the epistemic contribution due to the model itself), this level of accuracy is probably the upper limit achievable with the currently known collection of drugs.
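The minimal-variance selection described in the abstract can be sketched in a few lines. The sketch below assumes each molecule's drug-likeness probability has been sampled several times from a Bayesian classifier ensemble (for example, via Monte Carlo dropout); the `select_min_variance` helper and the 0.01 variance threshold are illustrative assumptions, not the paper's actual implementation.

```python
from statistics import mean, pvariance

def select_min_variance(mc_samples, var_threshold=0.01):
    """For each molecule, average the Monte Carlo probability samples drawn
    from a Bayesian classifier ensemble and keep only low-variance
    (i.e., low-uncertainty) predictions.

    mc_samples: dict mapping molecule id -> list of sampled probabilities.
    Returns a dict of confident predictions: id -> (mean_prob, variance).
    """
    confident = {}
    for mol_id, samples in mc_samples.items():
        mu, var = mean(samples), pvariance(samples)
        if var <= var_threshold:  # discard high-uncertainty predictions
            confident[mol_id] = (mu, var)
    return confident

# Toy example: one molecule with consistent samples, one with noisy samples.
samples = {
    "mol_a": [0.91, 0.93, 0.92, 0.90],  # low variance -> kept
    "mol_b": [0.30, 0.80, 0.55, 0.10],  # high variance -> dropped
}
print(select_min_variance(samples))
```

Restricting evaluation to the retained (low-variance) subset is what lifts accuracy on that subset, at the cost of abstaining on the uncertain molecules.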


Fig. 1: Comparison of selected property distributions in various drugs.
Fig. 2: Comparison of QED distributions in different datasets.
Fig. 3: Illustration of neural network architectures used in the present work.
Fig. 4: Different classifiers differ in terms of predicted drug-likeness and prediction variance.
Fig. 5: Comparison of sensitivity–precision curves obtained by varying variance thresholds.
Fig. 6: Comparison of ROC curves of selected classifiers.

Data availability

Datasets used in this work are deposited online. Source Data are provided with this paper.

Code availability

Computer codes underlying this work are made freely available for non-commercial uses under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license and are deposited online.


References

  1. Lipinski, C. A. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337–341 (2004).

  2. Shultz, M. D. Two decades under the influence of the rule of five and the changing properties of approved oral drugs. J. Med. Chem. 62, 1701–1714 (2018).

  3. Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).

  4. Berman, H., Henrick, K. & Nakamura, H. Announcing the worldwide Protein Data Bank. Nat. Struct. Mol. Biol. 10, 980 (2003).

  5. Yusof, I. & Segall, M. D. Considering the impact drug-like properties have on the chance of success. Drug Discov. Today 18, 659–666 (2013).

  6. Mochizuki, M., Suzuki, S. D., Yanagisawa, K., Ohue, M. & Akiyama, Y. QEX: target-specific druglikeness filter enhances ligand-based virtual screening. Mol. Divers. 23, 11–18 (2018).

  7. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).

  8. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  9. Griffiths, R.-R. & Hernández-Lobato, J. M. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chem. Sci. 11, 577–586 (2020).

  10. Putin, E. et al. Adversarial threshold neural computer for molecular de novo design. Mol. Pharmaceutics 15, 4386–4397 (2018).

  11. Mignani, S. et al. Present drug-likeness filters in medicinal chemistry during the hit and lead optimization process: how far can they be simplified? Drug Discov. Today 23, 605–615 (2018).

  12. Li, Q., Bender, A., Pei, J. & Lai, L. A large descriptor set and a probabilistic kernel-based classifier significantly improve druglikeness classification. J. Chem. Inf. Model. 47, 1776–1786 (2007).

  13. Hu, Q., Feng, M., Lai, L. & Pei, J. Prediction of drug-likeness using deep autoencoder neural networks. Front. Genet. 9, 585 (2018).

  14. Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2017).

  15. Irwin, J. J., Sterling, T., Mysinger, M. M., Bolstad, E. S. & Coleman, R. G. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 52, 1757–1768 (2012).

  16. Fialkowski, M., Bishop, K. J. M., Chubukov, V. A., Campbell, C. J. & Grzybowski, B. A. Architecture and evolution of organic chemistry. Angew. Chem. Int. Ed. 44, 7263–7269 (2005).

  17. Kowalik, M. et al. Parallel optimization of synthetic pathways within the Network of Organic Chemistry. Angew. Chem. Int. Ed. 124, 8052–8056 (2012).

  18. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

  19. RDKit: Open-Source Cheminformatics (RDKit).

  20. Hong, H. et al. Mold2, molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics. J. Chem. Inf. Model. 48, 1337–1344 (2008).

  21. Cadeddu, A., Wylie, E. K., Jurczak, J., Wampler-Doty, M. & Grzybowski, B. A. Organic chemistry as a language and the implications of chemical linguistics for structural and retrosynthetic analyses. Angew. Chem. Int. Ed. 53, 8108–8112 (2014).

  22. Woźniak, M. et al. Linguistic measures of chemical diversity and the ‘keywords’ of molecular collections. Sci. Rep. 8, 7598 (2018).

  23. Defferrard, M., Bresson, X. & Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems Vol. 29, 3844–3852 (NIPS, 2016).

  24. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning 1263–1272 (PMLR, 2017).

  25. Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).

  26. Duvenaud, D. K. et al. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems 2224–2232 (NIPS, 2015).

  27. Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595–608 (2016).

  28. Roszak, R., Beker, W., Molga, K. & Grzybowski, B. A. Rapid and accurate prediction of pKa values of C–H acids using graph convolutional neural networks. J. Am. Chem. Soc. 141, 17142–17149 (2019).

  29. Leshno, M., Lin, V. Y., Pinkus, A. & Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 6, 861–867 (1993).

  30. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2, 303–314 (1989).

  31. Liu, B., Dai, Y., Li, X., Lee, W. S. & Yu, P. S. Building text classifiers using positive and unlabeled examples. In Proceedings of 3rd IEEE International Conference on Data Mining 179–186 (IEEE, 2003).

  32. Fusilier, D. H., Montes-Y-Gómez, M., Rosso, P. & Cabrera, R. G. Detecting positive and negative deceptive opinions using PU-learning. Inf. Process. Manag. 51, 433–443 (2015).

  33. Kwon, Y., Won, J.-H., Kim, B. J. & Paik, M. C. Uncertainty quantification using Bayesian neural networks in classification: application to biomedical image segmentation. Comput. Stat. Data Anal. 142, 106816 (2020).

  34. Doak, B. C., Zheng, J., Dobritzsch, D. & Kihlberg, J. How beyond rule of 5 drugs and clinical candidates bind to their targets. J. Med. Chem. 59, 2312–2327 (2015).

  35. Kiureghian, A. D. & Ditlevsen, O. Aleatory or epistemic? Does it matter? Struct. Saf. 31, 105–112 (2009).

  36. Chen, C., Liaw, A. & Breiman, L. Using Random Forest to Learn Imbalanced Data (Univ. California, 2004).

  37. Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

  38. Tox21 Challenge (National Institutes of Health, accessed 3 February 2020).

  39. Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954 (2016).

  40. Jaeger, S., Fulle, S. & Turk, S. Mol2vec: unsupervised machine learning approach with chemical intuition. J. Chem. Inf. Model. 58, 27–35 (2018).

Acknowledgements


This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA) under AMD award HR00111920027. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the US Government. A.W. and B.A.G. gratefully acknowledge funding from the Symfonia Award, grant 2014/12/W/ST5/00592, from the Polish National Science Centre (NCN). Data analysis and writing of the paper by B.A.G. were supported by the Institute for Basic Science, Korea (project code IBS-R020-D1). We also thank R. Zagórowicz and colleagues from IMAPP sp. z o.o. for technical help in code development, and M. Eder and colleagues from the Institute of Polish Language, Cracow, Poland, for helpful discussions.

Author information




W.B. and A.W. designed and performed most of the analyses and calculations. S.S. helped with data collection and analysis. B.A.G. conceived and supervised research. All authors wrote the manuscript.

Corresponding author

Correspondence to Bartosz A. Grzybowski.

Ethics declarations

Competing interests

The authors are consultants and/or stakeholders of Allchemy, Inc.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison of Receiver Operating Characteristic curves of selected classifiers for certain types of compounds removed from the training set.

The Quantitative Estimate of Drug-likeness (QED, dark blue curves)3 is compared with the Bayesian Neural Network (BNN) extension of the RDKit AE model and with the combination of BNN classifiers selected according to their prediction uncertainty. The models were evaluated on the subsets of the auxiliary test set obeying either the classic Rule of 5 (Ro5; n = 2071 molecules) or the extended Rule of 5 (eRo5; n = 63), whereas training (with the exception of QED, which is a predefined quantity) was performed on the DRUGS+ZINC training set from which Ro5 and eRo5 compounds were excluded (leaving n = 1127 molecules). Definitions of the Ro5, eRo5 and bRo5 rules were taken from ref. 34. Source data

Extended Data Fig. 2 Distributions of drug-likeness probability for drug candidates considered in Phase 3 clinical trials.

The figure quantifies the performance of our classifiers on drug candidates in advanced clinical trials. We expected that these molecules, having made it so far in the development pipeline, would have been thoroughly evaluated by expert medicinal chemists and would therefore be free of obvious non-drug-like features. In panel a, this hypothesis is tested using the GCNN+RDKit classifier applied to a set of n = 727 Phase 3 candidates retrieved from ChEMBL39. As seen, ~15% of these candidates are predicted to fall below the 0.5 drug-likeness level, and 32% below the 0.85 level; in other words, the majority, but not all, of the molecules are found to be drug-like, as could be expected. b, A related question is whether the classifiers could provide information about molecules that are likely to fail a clinical trial, although it should be remembered that such failures can be due not only to scientific reasons but also to economic or IP factors, or to the superior performance of a similar-indication product from a competitor. We were able to identify only a limited number of molecules that failed Phase 3 investigations (n = 89 from the ClinTox dataset37). The histogram shows that on this set, using the GCNN+RDKit classifier, ~15% of compounds have drug-likeness below 0.5 and 37% below 0.85; in other words, the factors that make these drugs fail are not picked up by our filters. This result reinforces our belief that the classifiers we report are most useful for triaging candidates considered early in the drug-discovery process and containing more obvious non-drug-like features. Source data

Extended Data Fig. 3 Performance of a ‘naïve’ 1-Nearest Neighbor classifier based on Tanimoto similarity.

As described in the main text, the negative, non-drug sets were pruned of molecules whose similarity to drugs was above a 0.85 threshold of Tanimoto (ECFP4-based) similarity. This pre-processing step, following Bickerton’s QED work3, is intended to remove from the sets of non-drugs the drugs’ metabolites or immediate synthetic precursors. Tanimoto similarity and the more accurate models we developed in the present work are indeed correlated to some degree; for instance, the Pearson correlation coefficient between the Tanimoto similarity to known drugs and the drug-likeness computed with our GCNN+RDKit AE classifier is 0.615. To quantify the accuracy of the ‘naïve’ Tanimoto-based measure, we constructed a 1-Nearest Neighbor (1-NN) classifier on the dataset used throughout our study, using the same train-test splits as for the MLPs reported in Table 1. In this approach, the training set (comprising molecules labeled as ‘drug’ or ‘non-drug’) is stored in memory; there is no parameter optimization of any kind. Prediction of drug-likeness for each test molecule (query) involves the following steps: (i) Tanimoto similarities (based on ECFP4) between the test molecule and all molecules in the training set are calculated; (ii) the most similar of these training-set molecules is selected; (iii) if this most similar molecule is a drug, the test molecule is classified as a drug; (iv) if it is a non-drug, the test molecule is classified as a non-drug. As shown in the Table, this conceptually simplistic classifier provides a surprisingly strong baseline of 82.6% (on the ZINC non-drug dataset), only 3–4% lower than the MLP based on ECFP4 reported in the main text. This means that, in terms of accuracy, all models scoring below ~82% are effectively no better than classification by Tanimoto similarity. Source data
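The four steps above amount to a few lines of code. In practice the fingerprints would be ECFP4 bit vectors computed with RDKit; the sketch below stands in for them with plain Python sets of on-bits, and the toy training set is purely hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0

def one_nn_classify(query_fp, training_set):
    """Steps (i)-(iv): find the training molecule most similar to the query
    and return its label ('drug' or 'non-drug') together with the similarity."""
    best_label, best_sim = None, -1.0
    for fp, label in training_set:
        sim = tanimoto(query_fp, fp)
        if sim > best_sim:
            best_sim, best_label = sim, label
    return best_label, best_sim

# Toy training set: hypothetical on-bit sets standing in for ECFP4 fingerprints.
train = [({1, 2, 3, 4}, "drug"), ({10, 11, 12}, "non-drug")]
print(one_nn_classify({1, 2, 3, 9}, train))  # → ('drug', 0.6)
```

Because no parameters are fitted, the whole "model" is the stored training set, which is exactly why this baseline is memory-bound rather than compute-bound at training time.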

Extended Data Fig. 4 Counts and percentages of compounds obeying Rule of 5, extended Rule of 5 and ‘beyond’ Rule of 5 in selected datasets.

Definitions of Ro5, eRo5 and bRo5 were taken from ref. 34. These rules are disjoint and do not cover the whole chemical space. For ZINC (non-drugs), a random sample of 10,000 compounds (before removal of those with high Tanimoto similarity to FDA drugs) was analyzed. Source data
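For reference, the classic Rule of 5 check is straightforward to implement. The sketch below uses the strict all-four-criteria form of Lipinski's rule (the eRo5/bRo5 definitions from ref. 34 involve additional descriptors and are not reproduced here); the descriptor values in the example are hypothetical, and in practice they would be computed from structures with a toolkit such as RDKit.

```python
def passes_ro5(mw, clogp, hbd, hba):
    """Classic Lipinski Rule of 5 (strict form): molecular weight <= 500 Da,
    cLogP <= 5, <= 5 hydrogen-bond donors, <= 10 hydrogen-bond acceptors."""
    return mw <= 500 and clogp <= 5 and hbd <= 5 and hba <= 10

# Hypothetical descriptor values for illustration only.
print(passes_ro5(mw=350.4, clogp=2.1, hbd=2, hba=5))   # prints True
print(passes_ro5(mw=820.0, clogp=6.3, hbd=6, hba=12))  # prints False
```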

Extended Data Fig. 5 Percentages of drug-similar compounds estimated with PU-learning.

Two approaches were considered: incremental extension of reliable negatives (RN) from Liu et al.31 and decremental refinement of RN from Fusilier and co-workers32. In the incremental variant, the drug-likeness threshold for RN detection was set to 0.2, whereas in the latter method a threshold of 0.5 was used. Source data
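The incremental reliable-negatives variant can be sketched as a simple loop: train on positives versus the current negative pool, move any unlabeled example scored below the threshold into the RN set, and repeat until nothing changes. The `incremental_rn` helper and the one-dimensional toy scorer below are illustrative assumptions, not the implementation from ref. 31.

```python
def incremental_rn(positives, unlabeled, train_fn, threshold=0.2, max_iter=10):
    """Sketch of incremental reliable-negative (RN) extension (after Liu et al.31):
    repeatedly train a classifier on positives vs the current negatives and move
    unlabeled examples scored below `threshold` into RN until no change.

    train_fn(pos, neg) must return a scoring function x -> P(positive).
    Returns (reliable negatives, remaining unlabeled, i.e. drug-similar).
    """
    rn, pool = [], list(unlabeled)
    # Initial pass: treat the whole unlabeled pool as tentative negatives.
    score = train_fn(positives, pool)
    for _ in range(max_iter):
        new_rn = [x for x in pool if score(x) < threshold]
        if not new_rn:
            break  # converged: no more reliable negatives found
        rn.extend(new_rn)
        pool = [x for x in pool if x not in new_rn]
        score = train_fn(positives, rn)  # retrain on the refined RN set
    return rn, pool

# Toy 1-D demo: a stand-in "classifier" scoring proximity to the positive mean.
def toy_train(pos, neg):
    mu = sum(pos) / len(pos)
    return lambda x: 1.0 / (1.0 + abs(x - mu))

rn, similar = incremental_rn([0.9, 1.0, 1.1], [1.05, 0.95, 6.0, 7.5], toy_train)
print(rn, similar)  # prints [6.0, 7.5] [1.05, 0.95]
```

The fraction of the unlabeled pool that never enters RN is the "drug-similar" percentage reported in the table.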

Extended Data Fig. 6 Comparison of toxicity and drug-likeness predictions.

Evaluation was performed by RDKit AE on compounds listed in Supplementary Table 4 and grouped here as a, “Toxic compounds” (n = 23) and b, “Organic solvents” (n = 13). Auxiliary test set data were used in c, non-US drugs (n = 1281) and d, ZINC (n = 1281). The probability of toxicity was evaluated with a balanced Random Forest classifier trained on ClinTox dataset37. For results of other classifiers including toxicity predictions trained on the Tox21 dataset38, see SI, Section 6. Source data

Supplementary information

Supplementary Information

Supplementary Figs. 1–11 and Tables 1–7.

Source data

Source Data Fig. 1

Raw data used to compute histograms.

Source Data Fig. 2

Raw data used to compute histograms.

Source Data Fig. 4

Raw data used to prepare scatterplots.

Source Data Fig. 5

Raw data used for plotting precision-sensitivity curves.

Source Data Fig. 6

Raw data used for calculation of ROC curves.

Source Data Extended Data Fig. 1

Raw data used for calculation of ROC curves.

Source Data Extended Data Fig. 2

Raw data used to compute histograms.

Source Data Extended Data Fig. 3

Data presented in the table.

Source Data Extended Data Fig. 4

Data presented in the table.

Source Data Extended Data Fig. 5

Data presented in the table.

Source Data Extended Data Fig. 6

Raw data used to prepare scatterplots.


About this article


Cite this article

Beker, W., Wołos, A., Szymkuć, S. et al. Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks. Nat Mach Intell 2, 457–465 (2020).
