Accelerated rational PROTAC design via deep learning and molecular simulations

Abstract

Proteolysis-targeting chimeras (PROTACs) have emerged as effective tools to selectively degrade disease-related proteins by using the ubiquitin-proteasome system. Developing PROTACs involves extensive tests and trials to explore the vast chemical space. To accelerate this process, we propose a novel deep generative model for the rational design of PROTACs in a low-resource setting, which is then guided to sample PROTACs with optimal pharmacokinetics through deep reinforcement learning. Applying this method to the bromodomain-containing protein 4 target protein, we generated 5,000 compounds that were further filtered through machine learning-based classifiers and physics-driven simulations. As a proof of concept, we identified, synthesized and experimentally tested six candidate bromodomain-containing protein 4-degrading PROTACs, of which three were validated by cell-based assays and western blot analysis. One lead candidate was further tested and demonstrated favourable pharmacokinetics in mice. This combination of deep learning and molecular simulations may facilitate rational PROTAC design and optimization.
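To make the idea of guiding a generative model towards favourable pharmacokinetics more concrete, the following is a minimal sketch of reward-guided sampling. It is not the PROTAC-RL implementation: it assumes a toy categorical policy over four hypothetical linker fragments, a made-up PK score for each, and a plain REINFORCE update with a moving-average baseline in place of the deep model and reward used in the paper.

```python
import math
import random

# Hypothetical linker fragments (SMILES-like strings) and made-up PK scores
# in [0, 1]. Neither the fragments nor the scores come from the paper.
LINKERS = ["CCOCCOCC", "CCCCCC", "C1CCNCC1", "CC#CC"]
TOY_PK_SCORE = {"CCOCCOCC": 0.9, "CCCCCC": 0.3, "C1CCNCC1": 0.6, "CC#CC": 0.1}


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(steps=2000, lr=0.1, seed=0):
    """Run REINFORCE with a moving-average baseline on a categorical policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(LINKERS)  # start from a uniform policy
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one linker from the current policy.
        idx = rng.choices(range(len(LINKERS)), weights=probs, k=1)[0]
        reward = TOY_PK_SCORE[LINKERS[idx]]
        advantage = reward - baseline
        # Gradient of log pi(idx) w.r.t. logit j is (1[j == idx] - probs[j]).
        for j in range(len(logits)):
            grad = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
        # Update the baseline as an exponential moving average of rewards.
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


final_probs = train_policy()
best = LINKERS[final_probs.index(max(final_probs))]
print(best, [round(p, 3) for p in final_probs])
```

In this toy setting the policy should shift probability mass towards the fragment with the highest assumed score. In a realistic workflow the reward would come from a learned or simulated PK predictor, and the policy would be a sequence model generating full linkers rather than choosing from a fixed list.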

Fig. 1: Approach overview.
Fig. 2: Performance of PROTAC-RL.
Fig. 3: In silico post-generation screening.
Fig. 4: Bioactivity and PK characterization.
Fig. 5: Binding mode analysis and atomistic simulation.

Data availability

All data used in this paper are publicly available and can be accessed at http://cadd.zju.edu.cn/protacdb/ for the PROTAC-DB dataset, https://zinc15.docking.org/ for the ZINC dataset and https://www.rcsb.org for the protein crystal structure. Source data are provided with this paper.

Code availability

A demo, instructions and code for PROTAC-RL are available at https://github.com/biomed-AI/PROTAC-RL.

Acknowledgements

This study was supported by the National Key R&D Program of China (2020YFB0204803, Y.Y.), the National Natural Science Foundation of China (61772566, Y.Y.) and the Guangdong Key Field R&D Plan (2019B020228001, Y.Y.; 2018B010109006, Y.Y.). We thank R. Hu, W. Lu, L. Shi and J. Zhang for helpful discussions.

Author information

Contributions

S.Z. and Y.Y. contributed the concept and experimental design. S.Z., Y.T. and C.L. contributed the code implementation. Z.W., X.S. and Y.T. contributed the development of the molecular simulations part. S.Z. and Q.Z. contributed to the wet experiment design. Y.Y., S.Z. and Y.T. wrote the manuscript. H.C. participated in the discussion and revision of the manuscript. All authors contributed to the interpretation of the results. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Yuedong Yang.

Ethics declarations

Competing interests

S.Z., Z.W., C.L., Z.Z. and X.S. work directly or indirectly for Galixir. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Guowei Wei and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary text, Figs. 1–9, Tables 1–6 and chemical synthesis and analytical data.

Reporting summary

Source data

Source Data Fig. 4

Unprocessed western blots for Fig. 4d–f.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zheng, S., Tan, Y., Wang, Z. et al. Accelerated rational PROTAC design via deep learning and molecular simulations. Nat Mach Intell 4, 739–748 (2022). https://doi.org/10.1038/s42256-022-00527-y

  • DOI: https://doi.org/10.1038/s42256-022-00527-y
