Predicting drug–protein interaction using quasi-visual question answering system

A Publisher Correction to this article was published on 11 August 2020

This article has been updated


Identifying novel drug–protein interactions is crucial for drug discovery. For this purpose, many machine learning-based methods have been developed based on drug descriptors and one-dimensional protein sequences. However, protein sequences cannot accurately reflect interactions in three-dimensional space. Yet directly inputting three-dimensional structures is inefficient because the resulting three-dimensional matrices are sparse, and it is further limited by the small number of co-crystal structures available for training. Here we propose an end-to-end deep learning framework that follows the visual question answering paradigm. It predicts interactions by representing proteins as two-dimensional distance maps derived from monomer structures (the image) and drugs as molecular linear notations (the string). For efficient training, we introduce a dynamic attentive convolutional neural network that learns fixed-size representations from variable-length distance maps, together with a self-attentional sequential model that automatically extracts semantic features from the linear notations. Extensive experiments show that our model achieves competitive performance against state-of-the-art baselines on the Directory of Useful Decoys, Enhanced (DUD-E), human and BindingDB benchmark datasets. Attention visualization further provides a biological interpretation by highlighting regions of both the protein and the drug molecules.
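The protein-side idea in the abstract is to turn a variable-length residue distance map into a fixed-size representation. The minimal NumPy sketch below shows only that shape property and is not the DrugVQA implementation. It assumes one Cα coordinate per residue. The learned convolutional features are replaced by a hypothetical per-residue neighbour-count profile (`residue_features`), and pooling uses a structured self-attentive scheme in the style of Lin et al. (2017) with random weights. All function names are illustrative, and the pooled values carry no meaning; only their shapes do.

```python
import numpy as np


def distance_map(ca_coords: np.ndarray) -> np.ndarray:
    """Return the (L, L) matrix of pairwise Euclidean distances between residues.

    Assumes one 3D point per residue (e.g. its C-alpha atom), given as an (L, 3) array.
    """
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]  # (L, L, 3) coordinate differences
    return np.sqrt((diff ** 2).sum(axis=-1))              # symmetric, zero on the diagonal


def residue_features(dmap: np.ndarray, thresholds=(6.0, 8.0, 10.0, 12.0)) -> np.ndarray:
    """Hypothetical stand-in for learned convolutional features.

    For each residue, count the other residues within each distance threshold,
    giving an (L, len(thresholds)) array.
    """
    counts = [(dmap < t).sum(axis=1) - 1 for t in thresholds]  # "- 1" drops the residue itself
    return np.stack(counts, axis=1).astype(float)


def self_attentive_pool(h: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Structured self-attentive pooling in the style of Lin et al. (2017).

    h: (L, d) per-residue features; w1: (d_a, d); w2: (r, d_a).
    Returns an (r, d) matrix whose shape does not depend on L.
    """
    scores = w2 @ np.tanh(w1 @ h.T)                      # (r, L) attention logits
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)        # each row sums to 1 over residues
    return attn @ h                                      # (r, d) weighted sums of residue features


rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 4))  # random weights: d_a = 8 hidden units, d = 4 features
w2 = rng.normal(size=(3, 8))  # r = 3 attention rows
for length in (50, 120):
    # toy random-walk chain standing in for a real monomer structure
    coords = np.cumsum(rng.normal(scale=2.0, size=(length, 3)), axis=0)
    dmap = distance_map(coords)
    pooled = self_attentive_pool(residue_features(dmap), w1, w2)
    print(length, dmap.shape, pooled.shape)
# prints "50 (50, 50) (3, 4)", then "120 (120, 120) (3, 4)"
```

Both toy chains produce a 3 × 4 pooled matrix even though their distance maps differ in size. That fixed size is what lets a downstream classifier accept proteins of any length.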


Fig. 1: The framework of the proposed DrugVQA model.
Fig. 2: Performance comparisons of our proposed method and baselines on seen and unseen protein targets from the BindingDB dataset.
Fig. 3: Importance visualization of pocket and ligand pairs.

Data availability

All data used in this paper are publicly available and can be accessed at for the DUD-E dataset, for the BindingDB-IBM dataset, for human dataset and for the protein crystal structure.

Code availability

Demo, instructions and code for DrugVQA are available at

Acknowledgements

The work was supported in part by the National Key R&D Program of China (2018YFC0910500), the GD Frontier and Key Tech Innovation Program (2018B010109006, 2019B020228001), the National Natural Science Foundation of China (61772566, U1611261, 81801132 and 81903540) and the programme for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211).

Author information




S.Z., Y.L. and Y.Y. contributed the concept and implementation. S.Z. and Y.L. co-designed the experiments. S.Z. and Y.L. were responsible for programming. All authors contributed to the interpretation of results. S.Z. and Y.Y. wrote the manuscript. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Jun Xu or Yuedong Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Supplementary dataset details, neural network training and performance details, visualization details, Supplementary Figs. 1–5 and Table 1.

About this article

Cite this article

Zheng, S., Li, Y., Chen, S. et al. Predicting drug–protein interaction using quasi-visual question answering system. Nat Mach Intell 2, 134–140 (2020).
