Learning characteristics of graph neural networks predicting protein–ligand affinities

Abstract

In drug design, compound potency prediction is a popular machine learning application. Graph neural networks (GNNs) predict ligand affinity from graph representations of protein–ligand interactions typically extracted from X-ray structures. Despite some promising findings leading to claims that GNNs can learn details of protein–ligand interactions, such predictions are also controversially viewed. For example, evidence has been presented that GNNs might not learn protein–ligand interactions but memorize ligand and protein training data instead. We have carried out affinity predictions with six GNN architectures on community-standard datasets and rationalized the predictions using explainable artificial intelligence. The results confirm a strong influence of ligand—but not protein—memorization during GNN learning and also show that some GNN architectures increasingly prioritize interaction information for predicting high affinities. Thus, while GNNs do not comprehensively account for protein–ligand interactions and physical reality, depending on the model, they balance ligand memorization with learning of interaction patterns.
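The idea of rationalizing a prediction by attributing it to graph edges can be illustrated with a small, generic sketch. The code below is not the authors' EdgeSHAPer implementation. It is a plain permutation-sampling estimate of Shapley values over edges, and the toy graph, the edge weights and the `toy_model` scoring function are hypothetical stand-ins for a trained GNN.

```python
import random

# Toy interaction graph; every edge is tagged by type (all values are made up).
EDGES = [
    ("L1", "L2", "ligand"),
    ("L2", "L3", "ligand"),
    ("P1", "P2", "protein"),
    ("L1", "P1", "interaction"),
    ("L3", "P2", "interaction"),
]

# Hypothetical per-type weights standing in for what a trained model has learned.
WEIGHTS = {"ligand": 1.0, "protein": 0.1, "interaction": 0.5}


def toy_model(edges):
    """Hypothetical stand-in for a GNN: maps a subset of edges to a score."""
    score = sum(WEIGHTS[kind] for _, _, kind in edges)
    # Small synergy term so that edge contributions are not purely additive.
    n_inter = sum(1 for _, _, kind in edges if kind == "interaction")
    return score + 0.25 * n_inter * (n_inter - 1)


def shapley_edges(model, edges, n_samples=2000, seed=0):
    """Estimate per-edge Shapley values by sampling random edge orderings."""
    rng = random.Random(seed)
    phi = [0.0] * len(edges)
    for _ in range(n_samples):
        order = list(range(len(edges)))
        rng.shuffle(order)
        kept = []
        prev = model(kept)
        for idx in order:
            # Marginal contribution of this edge given the edges added so far.
            kept.append(edges[idx])
            curr = model(kept)
            phi[idx] += curr - prev
            prev = curr
    return [p / n_samples for p in phi]


phi = shapley_edges(toy_model, EDGES)

# Aggregate attributions by edge type (ligand, protein, interaction).
by_kind = {}
for (_, _, kind), value in zip(EDGES, phi):
    by_kind[kind] = by_kind.get(kind, 0.0) + value
```

Because the marginal contributions within each sampled ordering telescope, the edge attributions sum to the difference between the score for the full graph and the score for the empty graph. Grouping them by edge type gives a rough picture of how much a prediction leans on ligand, protein or interaction edges in this toy setting.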

Fig. 1: Rationalizing affinity predictions based on protein–ligand interaction graphs.
Fig. 2: Relative proportions of edges determining predictions for different affinity subranges.
Fig. 3: Varying numbers of edges determining GC-GNN predictions.
Fig. 4: Edges determining GraphSAGE predictions.
Fig. 5: Mapping of edges determining predictions.

Data availability

The data generated in this study are freely available on GitHub at https://github.com/AndMastro/protein-ligand-GNN. The ligand interaction graph data were taken from Volkov et al. (ref. 22). PDBbind data are available at http://www.pdbbind.org.cn/. Source data are provided with this paper.

Code availability

The code generated in this study is freely available on GitHub at https://github.com/AndMastro/protein-ligand-GNN, with an archived version also available through Zenodo (ref. 54) at https://doi.org/10.5281/zenodo.8358539. A reproducible code capsule is available through CodeOcean at https://codeocean.com/capsule/9675097 (ref. 55). EdgeSHAPer code can be accessed on Zenodo (ref. 56) at https://doi.org/10.5281/zenodo.8358595 and on GitHub at https://github.com/AndMastro/EdgeSHAPer.

References

  1. Akamatsu, M. Current state and perspectives of 3D-QSAR. Curr. Top. Med. Chem. 2, 1381–1394 (2002).

  2. Lewis, R. A. & Wood, D. Modern 2D QSAR for drug discovery. WIREs Comp. Mol. Sci. 4, 505–522 (2014).

  3. Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A. & Vapnik, V. Support vector regression machines. Adv. Neur. Inform. Proc. Syst. 9 (1996).

  4. Smola, A. J. & Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 14, 199–222 (2004).

  5. Svetnik, V. et al. Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958 (2003).

  6. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  7. Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18, 463–477 (2019).

  8. Lavecchia, A. Deep learning in drug discovery: opportunities, challenges and future prospects. Drug Discov. Today 24, 2017–2032 (2019).

  9. Kim, J., Park, S., Min, D. & Kim, W. Comprehensive survey of recent drug discovery using deep learning. Int. J. Mol. Sci. 22, 9983 (2021).

  10. Bajorath, J. Deep machine learning for computer-aided drug design. Front. Drug Discov. 2, 829043 (2022).

  11. Guedes, I. A., Pereira, F. S. S. & Dardenne, L. E. Empirical scoring functions for structure-based virtual screening: applications, critical aspects, and challenges. Front. Pharmacol. 9, 1089 (2018).

  12. Liu, J. & Wang, R. Classification of current scoring functions. J. Chem. Inf. Model. 55, 475–482 (2015).

  13. Li, H., Sze, K.-H., Lu, G. & Ballester, P. J. Machine-learning scoring functions for structure-based virtual screening. WIREs Comp. Mol. Sci. 11, e1478 (2021).

  14. Gleeson, M. P. & Gleeson, D. QM/MM calculations in drug discovery: a useful method for studying binding phenomena? J. Chem. Inf. Model. 49, 670–677 (2009).

  15. Williams-Noonan, B. J., Yuriev, E. & Chalmers, D. K. Free energy methods in drug design: prospects of ‘alchemical perturbation’ in medicinal chemistry. J. Med. Chem. 61, 638–649 (2018).

  16. Gomes, J., Ramsundar, B., Feinberg, E. N. & Pande, V. S. Atomic convolutional networks for predicting protein-ligand binding affinity. Preprint at https://doi.org/10.48550/arXiv.1703.10603 (2017).

  17. Jimenez, J., Skalic, M., Martinez-Rosell, G. & De Fabritiis, G. K(DEEP): protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model. 58, 287–296 (2018).

  18. Stepniewska-Dziubinska, M. M., Zielenkiewicz, P. & Siedlecki, P. Development and evaluation of a deep learning model for protein-ligand binding affinity prediction. Bioinformatics 34, 3666–3674 (2018).

  19. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M. & Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 20, 61–80 (2008).

  20. Jiang, D. et al. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J. Cheminform. 13, 12 (2021).

  21. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. Proc. Mach. Learn. Res. 70, 1263–1272 (2017).

  22. Volkov, M. et al. On the frustration to predict binding affinities from protein–ligand structures with deep neural networks. J. Med. Chem. 65, 7946–7958 (2022).

  23. Shen, H., Zhang, Y., Zheng, C., Wang, B. & Chen, P. A Cascade graph convolutional network for predicting protein–ligand binding affinity. Int. J. Mol. Sci. 22, 4023 (2021).

  24. Xiong, J., Xiong, Z., Chen, K., Jiang, H. & Zheng, M. Graph neural networks for automated de novo drug design. Drug Discov. Today 26, 1382–1393 (2021).

  25. Son, J. & Kim, D. Development of a graph convolutional neural network model for efficient prediction of protein-ligand binding affinities. PLoS ONE 16, e0249404 (2021).

  26. Nguyen, T. et al. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics 37, 1140–1147 (2021).

  27. Wang, J. & Dokholyan, N. V. Yuel: improving the generalizability of structure-free compound–protein interaction prediction. J. Chem. Inf. Model. 62, 463–471 (2022).

  28. Yang, J., Shen, C. & Huang, N. Predicting or pretending: artificial intelligence for protein-ligand interactions lack of sufficiently large and unbiased datasets. Front. Pharmacol. 11, 69 (2020).

  29. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https://doi.org/10.48550/arXiv.1609.02907 (2016).

  30. Velickovic, P. et al. Graph attention networks. Preprint at https://doi.org/10.48550/arXiv.1710.10903 (2017).

  31. Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? Preprint at https://doi.org/10.48550/arXiv.1810.00826 (2018).

  32. Hu, W. et al. Strategies for pre-training graph neural networks. Preprint at https://doi.org/10.48550/arXiv.1905.12265 (2019).

  33. Hamilton, W., Ying, Z. & Leskovec, J. Inductive representation learning on large graphs. Adv. Neur. Inform. Proc. Syst. 31 (2017).

  34. Morris, C. et al. Weisfeiler and Leman go neural: higher-order graph neural networks. In Proc. AAAI Conference on Artificial Intelligence Vol. 33, 4602–4609 (2019).

  35. Wang, R., Fang, X., Lu, Y., Yang, C. Y. & Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 48, 4111–4119 (2005).

  36. Liu, Z. et al. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 31, 405–412 (2015).

  37. Liu, Z. et al. Forging the basis for developing protein-ligand interaction scoring functions. Acc. Chem. Res. 50, 302–309 (2017).

  38. Schmitt, S., Kuhn, D. & Klebe, G. A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol. 323, 387–406 (2002).

  39. Desaphy, J., Raimbaud, E., Ducrot, P. & Rognan, D. Encoding protein-ligand interaction patterns in fingerprints and graphs. J. Chem. Inf. Model. 53, 623–637 (2013).

  40. Mastropietro, A., Pasculli, G., Feldmann, C., Rodríguez-Pérez, R. & Bajorath, J. EdgeSHAPer: bond-centric Shapley value-based explanation method for graph neural networks. iScience 25, 105043 (2022).

  41. Mastropietro, A., Pasculli, G. & Bajorath, J. Protocol to explain graph neural network predictions using an edge-centric Shapley value-based approach. STAR Protoc. 3, 101887 (2022).

  42. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neur. Inform. Proc. Syst. 30 (2017).

  43. Shapley, L. S. in Contributions to the Theory of Games (AM-28) Vol. II (eds Kuhn, H. W. & Tucker, A. W.) 307–317 (Princeton Univ. Press, 1953).

  44. Ying, Z., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. GNNExplainer: generating explanations for graph neural networks. Adv. Neur. Inform. Proc. Syst. 32, 9240–9251 (2019).

  45. Pfungst, O. Clever Hans (the horse of Mr. Von Osten): contribution to experimental animal and human psychology. J. Philos. Psychol. Sci. Method 8, 663–666 (1911).

  46. Lapuschkin, S. et al. Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10, 1096 (2019).

  47. Da Silva, F., Desaphy, J. & Rognan, D. IChem: a versatile toolkit for detecting, comparing, and predicting protein-ligand interactions. ChemMedChem 13, 507–510 (2018).

  48. Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Proc. 7th Python in Science Conference (SciPy2008) (eds Varoquaux, G. et al.) 11–15 (2008).

  49. Ahsan, M. M., Mahmud, M. P., Saha, P. K., Gupta, K. D. & Siddique, Z. Effect of data scaling methods on machine learning algorithms and model performance. Technologies 9, 52 (2021).

  50. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  51. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neur. Inform. Proc. Syst. 32, 8024–8035 (2019).

  52. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at https://doi.org/10.48550/arXiv.1903.02428 (2019).

  53. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2014).

  54. Mastropietro, A. & Pasculli, G. AndMastro/protein-ligand-GNN: v.1.0.0. Zenodo https://doi.org/10.5281/zenodo.8358539 (2023).

  55. Mastropietro, A., Pasculli, G. & Bajorath, J. Predicting affinities from simplistic protein-ligand interaction representations–what do graph neural networks learn? CodeCapsule. Code Ocean codeocean.com/capsule/8085311 (2023).

  56. Mastropietro, A., Feldmann, C. & Pasculli, G. EdgeSHAPer: v.1.1.0. Zenodo https://doi.org/10.5281/zenodo.8358595 (2023).

Acknowledgements

This work was partly supported (A.M.) by the EC H2020RIA project ‘SoBigData++’ (grant no. 871042), PNRR MUR project no. PE0000013-FAIR and PNRR MUR project no. IR0000013-SoBigData.it.

Author information

Contributions

Conceptualization was done by J.B. Methodology was done by A.M., G.P. and J.B. Data were prepared and code was written by A.M. and G.P. The investigation was carried out by A.M. and G.P. Analysis was done by A.M. and J.B. The original draft was written by A.M. and J.B. Review and editing of the draft were done by A.M. and J.B.

Corresponding author

Correspondence to Jürgen Bajorath.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Sai Pooja Mahajan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Table 1.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Table 1

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mastropietro, A., Pasculli, G. & Bajorath, J. Learning characteristics of graph neural networks predicting protein–ligand affinities. Nat Mach Intell 5, 1427–1436 (2023). https://doi.org/10.1038/s42256-023-00756-9

  • DOI: https://doi.org/10.1038/s42256-023-00756-9
