A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design

A preprint version of the article is available at arXiv.


Generative models for structure-based molecular design hold considerable promise for drug discovery, with the potential to speed up the hit-to-lead development cycle while improving the quality of drug candidates and reducing costs. Data sparsity and bias are, however, the two main roadblocks to the development of three-dimensionally aware models. Here we propose a training protocol based on multilevel self-contrastive learning for improved bias control and data efficiency. The framework leverages the large data resources available for two-dimensional generative modelling with datasets of ligand–protein complexes, resulting in hierarchical generative models that are topologically unbiased, explainable and customizable. We show how, by deconvolving the generative posterior into chemical, topological and structural context factors, we not only avoid common pitfalls in the design and evaluation of generative models, but also gain detailed insight into the generative process itself. This improved transparency considerably aids method development and allows fine-grained control over novelty versus familiarity.

Fig. 1: Generative framework and PQR learning approach.
Fig. 2: Vocabulary shift across molecular libraries.
Fig. 3: Baseline-corrected enrichment in the PQR model.
Fig. 4: Visualization of the posterior.
Fig. 5: Analysis of experimental SARs.

Data availability

The datasets used in this study are publicly available; for pointers, see the code repository (ref. 58).

Code availability

The source code and pre-trained models are available from the repository cited in ref. 58.


  1. Schneider, G. Automating drug discovery. Nat. Rev. Drug Discov. 17, 97–113 (2018).

  2. Boström, J., Brown, D. G., Young, R. J. & Keserü, G. M. Expanding the medicinal chemistry synthetic toolbox. Nat. Rev. Drug Discov. 17, 709–727 (2018).

  3. Blakemore, D. C. et al. Organic synthesis provides opportunities to transform drug discovery. Nat. Chem. 10, 383–394 (2018).

  4. Erlanson, D. A., Fesik, S. W., Hubbard, R. E., Jahnke, W. & Jhoti, H. Twenty years on: the impact of fragments on drug discovery. Nat. Rev. Drug Discov. 15, 605–619 (2016).

  5. Anderson, A. C. The process of structure-based drug design. Chem. Biol. 10, 787–797 (2003).

  6. Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18, 463–477 (2019).

  7. Chen, H., Engkvist, O., Wang, Y., Olivecrona, M. & Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today 23, 1241–1250 (2018).

  8. Paul, D. et al. Artificial intelligence in drug discovery and development. Drug Discov. Today 26, 80–93 (2021).

  9. Tong, X. et al. Generative models for de novo drug design. J. Med. Chem. 64, 14011–14027 (2021).

  10. Sousa, T., Correia, J., Pereira, V. & Rocha, M. Generative deep learning for targeted compound design. J. Chem. Inf. Model. 61, 5343–5361 (2021).

  11. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminf. 9, 48 (2017).

  12. Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

  13. Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).

  14. Born, J. et al. Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2. Mach. Learn. Sci. Technol. 2, 025024 (2021).

  15. You, J., Liu, B., Ying, R., Pande, V. & Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. In NIPS 6412–6422 (2018).

  16. Jin, W., Yang, K., Barzilay, R. & Jaakkola, T. Learning multimodal graph-to-graph translation for molecule optimization. In ICLR (2019).

  17. Jin, W., Barzilay, R. & Jaakkola, T. S. Junction tree variational autoencoder for molecular graph generation. In ICML 2328–2337 (2018).

  18. Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. CoRR abs/2001.09382 (2020).

  19. Jin, W., Barzilay, R. & Jaakkola, T. Hierarchical generation of molecular graphs using structural motifs. In ICML 4839–4848 (2020).

  20. Chen, Z., Min, M. R., Parthasarathy, S. & Ning, X. A deep generative model for molecule optimization via one fragment modification. Nat. Mach. Intell. 3, 1040–1049 (2021).

  21. Joshi, R. P. et al. 3D-Scaffold: a deep learning framework to generate 3D coordinates of drug-like molecules with desired scaffolds. J. Phys. Chem. B 125, 12166–12176 (2021).

  22. Simm, G. N. C., Pinsler, R., Csányi, G. & Hernández-Lobato, J. M. Symmetry-aware actor-critic for 3D molecular design. In ICLR (2021).

  23. Ghanbarpour, A. & Lill, M. A. Seq2mol: automatic design of de novo molecules conditioned by the target protein sequences through deep neural networks (2020).

  24. Skalic, M., Sabbadin, D., Sattarov, B., Sciabola, S. & De Fabritiis, G. From target to drug: generative modeling for the multimodal structure-based ligand design. Mol. Pharmaceutics 16, 4282–4291 (2019).

  25. Xu, M., Ran, T. & Chen, H. De novo molecule design through the molecular generative model conditioned by 3D information of protein binding sites. J. Chem. Inf. Model. 61, 3240–3254 (2021).

  26. Krishnan, S. R. et al. De novo structure-based drug design using deep learning. J. Chem. Inf. Model. (2021).

  27. Wang, M. et al. RELATION: a deep generative model for structure-based de novo drug design. J. Med. Chem. (2022).

  28. Zhang, J. & Chen, H. De novo molecule design using molecular generative models constrained by ligand–protein interactions. J. Chem. Inf. Model. (2022).

  29. Imrie, F., Hadfield, T. E., Bradley, A. R. & Deane, C. M. Deep generative design with 3D pharmacophoric constraints. Chem. Sci. 12, 14577–14589 (2021).

  30. Li, Y., Pei, J. & Lai, L. Structure-based de novo drug design using 3D deep generative models. Chem. Sci. 12, 13664–13675 (2021).

  31. Green, H., Koes, D. R. & Durrant, J. D. Deepfrag: a deep convolutional neural network for fragment-based lead optimization. Chem. Sci. 12, 8036–8047 (2021).

  32. Ragoza, M., Masuda, T. & Koes, D. R. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem. Sci. 13, 2701–2713 (2022).

  33. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  34. Godinez, W. J. et al. Design of potent antimalarials with generative chemistry. Nat. Mach. Intell. 4, 180–186 (2022).

  35. Krenn, M., Häse, F., Nigam, A., Friederich, P. & Aspuru-Guzik, A. Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach. Learn. Sci. Technol. 1, 045024 (2020).

  36. Cross, S. & Cruciani, G. Fragexplorer: Grid-based fragment growing and replacement. J. Chem. Inf. Model. 62, 1224–1235 (2022).

  37. Tan, X. et al. Discovery of pyrazolo[3,4-d]pyridazinone derivatives as selective DDR1 inhibitors via deep learning based design, synthesis, and biological evaluation. J. Med. Chem. 65, 103–119 (2022).

  38. Piticchio, S. G. et al. Discovery of novel BRD4 ligand scaffolds by automated navigation of the fragment chemical space. J. Med. Chem. 64, 17887–17900 (2021).

  39. Gebauer, N. W. A., Gastegger, M., Hessmann, S. S. P., Müller, K.-R. & Schütt, K. T. Inverse design of 3D molecular structures with conditional generative neural networks. Nat. Commun. 13, 973 (2022).

  40. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).

  41. Brown, N., Fiscato, M., Segler, M. H. & Vaucher, A. C. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 59, 1096–1108 (2019).

  42. Schnabel, T., Swaminathan, A., Singh, A., Chandak, N. & Joachims, T. Recommendations as treatments: debiasing learning and evaluation. In ICML 1670–1679 (ICML, 2016).

  43. Hu, L., Benson, M. L., Smith, R. D., Lerner, M. G. & Carlson, H. A. Binding MOAD (mother of all databases). Proteins 60, 333–340 (2005).

  44. Ahmed, A., Smith, R. D., Clark, J. J., Dunbar, J. B. & Carlson, H. A. Recent improvements to Binding MOAD: a resource for protein–ligand binding affinities and structures. Nucleic Acids Res. 43, D465–D469 (2015).

  45. Smith, R. D. et al. Updates to Binding MOAD (mother of all databases): polypharmacology tools and their utility in drug repurposing. J. Mol. Biol. 431, 2423–2433 (2019).

  46. Wangtrakuldee, P. et al. Discovery of inhibitors of Burkholderia pseudomallei methionine aminopeptidase with antibacterial activity. ACS Med. Chem. Lett. 4, 699–703 (2013).

  47. Helgren, T. R. et al. Rickettsia prowazekii methionine aminopeptidase as a promising target for the development of antibacterial agents. Bioorg. Med. Chem. 25, 813–824 (2017).

  48. Zhou, C., Ma, J., Zhang, J., Zhou, J. & Yang, H. Contrastive learning for debiased candidate generation in large-scale recommender systems. In KDD 3985–3995 (2021).

  49. Khac, P. H. L., Healy, G. & Smeaton, A. F. Contrastive representation learning: a framework and review. IEEE Access 8, 193907–193934 (2020).

  50. You, Y. et al. Graph contrastive learning with augmentations. In NeurIPS 5812–5823 (NeurIPS, 2020).

  51. Wang, Y., Wang, J., Cao, Z. & Barati Farimani, A. Molecular contrastive learning of representations via graph neural networks. Nat. Mach. Intell. 4, 279–287 (2022).

  52. Landrum, G. RDKit: Open-Source Cheminformatics (2020).

  53. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds (ICLR, 2019).

  54. Enamine REAL Compounds (Enamine, 2020).

  55. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

  56. Shultz, M. D. Two decades under the influence of the rule of five and the changing properties of approved oral drugs: miniperspective. J. Med. Chem. 62, 1701–1714 (2019).

  57. Gao, M. & Skolnick, J. Apoc: large-scale identification of similar protein pockets. Bioinformatics 29, 597–604 (2013).

  58. Poelking, C. & Chan, L. libpqr_v0.3 (Zenodo, 2022).

  59. Reppert, S. M. et al. Molecular characterization of a second melatonin receptor expressed in human retina and brain: the mel1b melatonin receptor. Proc. Natl. Acad. Sci. USA 92, 8734–8738 (1995).

  60. Boivin, R. P., Luu-The, V., Lachance, R., Labrie, F. & Poirier, D. Structure–activity relationships of 17α-derivatives of estradiol as inhibitors of steroid sulfatase. J. Med. Chem. 43, 4465–4478 (2000).

  61. Güzel, O., Innocenti, A., Scozzafava, A., Salman, A. & Supuran, C. T. Carbonic anhydrase inhibitors. Phenacetyl-, pyridylacetyl- and thienylacetyl-substituted aromatic sulfonamides act as potent and selective isoform VII inhibitors. Bioorg. Med. Chem. Lett. 19, 3170–3173 (2009).


Acknowledgements

L.C. acknowledges funding from Astex through the Sustaining Innovation Postdoctoral Program. We thank C. Murray and D. Branduardi for thoughtful comments on the manuscript, and L. Colwell for fruitful discussions.

Author information

C.P. and M.V. conceived the project. C.P. developed the PQR formalism. L.C. and C.P. developed the code, ran the experiments, performed the data analysis and wrote the paper. R.K. contributed to data preprocessing and visualization. All authors contributed to discussions and to the preparation of the manuscript.

Corresponding author

Correspondence to Carl Poelking.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Jannis Born and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Discussion, Figs. 1–4 and Tables 1–6.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chan, L., Kumar, R., Verdonk, M. et al. A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design. Nat Mach Intell 4, 1130–1142 (2022).
