Abstract
Information on residue–residue distances between interacting proteins is important for modelling the structures of protein complexes and valuable for understanding the molecular mechanisms of protein–protein interactions. With the advent of deep learning, many methods have been developed to accurately predict the intra-protein residue–residue contacts of monomers. However, accurately predicting inter-protein residue–residue contacts for protein complexes, especially hetero-protein complexes, remains challenging. Here we develop DeepInter, a protein language model-based deep learning method that predicts the inter-protein residue–residue contacts of protein complexes by introducing a triangle-aware mechanism of triangle update and triangle self-attention into the deep neural network. We extensively validate DeepInter on diverse test sets of 300 homodimeric, 28 CASP-CAPRI homodimeric and 99 heterodimeric complexes and compare it with state-of-the-art methods including CDPred, DeepHomo2.0, GLINTER and DeepHomo. The results demonstrate the accuracy and robustness of DeepInter.
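To give a feel for what a triangle update does, the following is a minimal sketch of the "outgoing edges" triangle multiplicative update as formulated for AlphaFold (Jumper et al., listed in the references). It is not DeepInter's implementation, which this section does not describe. The sketch assumes a square pair representation, omits layer normalization, and uses hypothetical names throughout (`triangle_update_outgoing`, `gated_projection` and all weight arguments). The key idea is that the feature for residue pair (i, j) is refined by combining the features of pairs (i, k) and (j, k) over every third residue k, so that each predicted pair stays geometrically consistent with the other two sides of the triangle.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(vec, weight):
    """Multiply one channel vector by a weight matrix (one row per output channel)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def gated_projection(z, w_value, w_gate):
    """Project every pair feature z[i][k] and scale it by a sigmoid gate."""
    return [
        [
            [sigmoid(g) * p
             for g, p in zip(linear(zik, w_gate), linear(zik, w_value))]
            for zik in row
        ]
        for row in z
    ]


def triangle_update_outgoing(z, w_a, w_ga, w_b, w_gb, w_out, w_g):
    """Simplified triangle multiplicative update (outgoing edges).

    z is an n x n x c pair representation stored as nested lists.
    Weight shapes (all hypothetical): w_a, w_ga, w_b, w_gb are hidden x c,
    w_out is c x hidden, w_g is c x c. Layer normalization is omitted.
    """
    n = len(z)
    hidden = len(w_a)
    a = gated_projection(z, w_a, w_ga)  # features of edges i -> k
    b = gated_projection(z, w_b, w_gb)  # features of edges j -> k
    updated = []
    for i in range(n):
        row = []
        for j in range(n):
            # Combine edges (i, k) and (j, k) over every third residue k.
            tri = [sum(a[i][k][c] * b[j][k][c] for k in range(n))
                   for c in range(hidden)]
            gate = [sigmoid(g) for g in linear(z[i][j], w_g)]
            delta = linear(tri, w_out)
            # Gated residual update of the pair feature for (i, j).
            row.append([zij + g * d
                        for zij, g, d in zip(z[i][j], gate, delta)])
        updated.append(row)
    return updated
```

Calling `triangle_update_outgoing` on a small pair representation returns a representation of the same shape. A practical model would use learned weights, layer normalization and a tensor library rather than nested lists.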
Data availability
The data that support the findings of this study are available from the corresponding author upon request. A full list of the protein complexes used in this study is provided in Supplementary Data 10. All protein structures used in this study are available in the PDB, and the UniRef30_2020_03 sequence database used in this study is available at https://www.uniprot.org/help/uniref/. Source data are provided with this paper.
Code availability
The DeepInter package is freely available at http://huanglab.phys.hust.edu.cn/DeepInter/ and https://doi.org/10.5281/zenodo.8304327 (ref. 61).
References
Yadid, I. & Tawfik, D. S. Reconstruction of functional beta-propeller lectins via homo-oligomeric assembly of shorter fragments. J. Mol. Biol. 365, 10–17 (2007).
Goodsell, D. S. & Olson, A. J. Structural symmetry and protein function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153 (2000).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Proteins 89, 1607–1617 (2021).
Ko, J. & Lee, J. Can AlphaFold2 predict protein-peptide complex structures accurately? Preprint at https://www.biorxiv.org/content/10.1101/2021.07.27.453972v2 (2021).
Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein–protein interactions using AlphaFold2. Nat. Commun. 13, 1265 (2022).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2 (2021).
Yan, Y. & Huang, S. Y. Accurate prediction of inter-protein residue-residue contacts for homo-oligomeric protein complexes. Brief. Bioinformatics 22, bbab038 (2021).
Yan, Y., Tao, H., He, J. & Huang, S. Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 15, 1829–1852 (2020).
Du, Z. et al. The trRosetta server for fast and accurate protein structure prediction. Nat. Protoc. 16, 5634–5651 (2021).
Yan, Y., Tao, H. & Huang, S. Y. HSYMDOCK: a docking web server for predicting the structure of protein homo-oligomers with Cn or Dn symmetry. Nucleic Acids Res. 46, W423–W431 (2018).
Soltanikazemi, E., Quadir, F., Roy, R. S., Guo, Z. & Cheng, J. Distance-based reconstruction of protein quaternary structures from inter-chain contacts. Proteins 90, 720–731 (2022).
Hopf, T. A. et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014).
Roy, R. S., Quadir, F., Soltanikazemi, E. & Cheng, J. A deep dilated convolutional residual network for predicting interchain contacts of protein homodimers. Bioinformatics 38, 1904–1910 (2022).
Quadir, F., Roy, R. S., Halfmann, R. & Cheng, J. DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning. Sci. Rep. 11, 12295 (2021).
Sanchez-Garcia, R., Sorzano, C. O. S., Carazo, J. M. & Segura, J. BIPSPI: a method for the prediction of partner-specific protein-protein interfaces. Bioinformatics 35, 470–477 (2019).
Sanchez-Garcia, R., Macias, J. R., Sorzano, C. O. S., Carazo, J. M. & Segura, J. BIPSPI+: mining type-specific datasets of protein complexes to improve protein binding site prediction. J. Mol. Biol. 434, 167556 (2022).
Zhao, Z. & Gong, X. Protein-protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans. Comput. Biol. Bioinform. 16, 1753–1759 (2019).
Liu, J. & Gong, X. Attention mechanism enhanced LSTM with residual architecture and its application for protein-protein interaction residue pairs prediction. BMC Bioinformatics 20, 609 (2019).
Soleymani, F., Paquet, E., Viktor, H., Michalowski, W. & Spinello, D. Protein-protein interaction prediction with deep learning: a comprehensive review. Comput. Struct. Biotechnol. J. 20, 5316–5341 (2022).
Baranwal, M. et al. Struct2Graph: a graph attention network for structure based predictions of protein-protein interactions. BMC Bioinformatics 23, 370 (2022).
Hu, X., Feng, C., Zhou, Y., Harrison, A. & Chen, M. DeepTrio: a ternary prediction system for protein-protein interaction using mask multiple parallel convolutional neural networks. Bioinformatics 38, 694–702 (2022).
Soleymani, F., Paquet, E., Viktor, H. L., Michalowski, W. & Spinello, D. ProtInteract: a deep learning framework for predicting protein-protein interactions. Comput. Struct. Biotechnol. J. 21, 1324–1348 (2023).
Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl Acad. Sci. USA 108, E1293–E1301 (2011).
Jones, D. T., Buchan, D. W., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184–190 (2012).
Ekeberg, M., Lövkvist, C., Lan, Y., Weigt, M. & Aurell, E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys. Rev. E 87, 012707 (2013).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
Li, Y. et al. Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput. Biol. 17, e1008865 (2021).
Adhikari, B., Hou, J. & Cheng, J. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 34, 1466–1472 (2018).
Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, e1005324 (2017).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
Wang, W., Peng, Z. & Yang, J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2, 804–814 (2022).
Lin, P., Yan, Y. & Huang, S. Y. DeepHomo2.0: improved protein-protein contact prediction of homodimers by transformer-enhanced deep learning. Brief. Bioinformatics 24, bbac499 (2023).
Xie, Z. & Xu, J. Deep graph learning of inter-protein contacts. Bioinformatics 38, 947–953 (2022).
Guo, Z., Liu, J., Skolnick, J. & Cheng, J. Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nat. Commun. 13, 6963 (2022).
Bitbol, A. F., Dwyer, R. S., Colwell, L. J. & Wingreen, N. S. Inferring interaction partners from protein sequences. Proc. Natl Acad. Sci. USA 113, 12180–12185 (2016).
Szurmant, H. & Weigt, M. Inter-residue, inter-protein and inter-family coevolution: bridging the scales. Curr. Opin. Struct. Biol. 50, 26–32 (2018).
Gueudré, T., Baldassi, C., Zamparo, M., Weigt, M. & Pagnani, A. Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc. Natl Acad. Sci. USA 113, 12186–12191 (2016).
Ovchinnikov, S., Kamisetty, H. & Baker, D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife 3, e02030 (2014).
Zeng, H. et al. ComplexContact: a web server for inter-protein contact prediction using deep learning. Nucleic Acids Res. 46, W432–W437 (2018).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
Lensink, M. F. et al. The challenge of modeling protein assemblies: the CASP12-CAPRI experiment. Proteins 86, 257–273 (2018).
Lensink, M. F. et al. Blind prediction of homo- and hetero-protein complexes: the CASP13-CAPRI experiment. Proteins 87, 1200–1221 (2019).
Rao, R. et al. MSA transformer. Proc. 38th International Conference on Machine Learning 139, 8844–8856 (PMLR, 2021).
Hatos, A. et al. DisProt: intrinsic protein disorder annotation in 2020. Nucleic Acids Res. 48, D269–D276 (2020).
Xu, J. & Zhang, Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 26, 889–895 (2010).
Mirdita, M. et al. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 45, D170–D176 (2017).
Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM–HMM alignment. Nat. Methods 9, 173–175 (2011).
Seemayer, S., Gruber, M. & Söding, J. CCMpred—fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 30, 3128–3130 (2014).
Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 20, 473 (2019).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Si, Y. & Yan, C. Improved protein contact prediction using dimensional hybrid residual networks and singularity enhanced loss function. Brief. Bioinformatics 22, bbab341 (2021).
Su, H. et al. Improved protein structure prediction using a new multi-scale network and homologous templates. Adv. Sci. 8, e2102592 (2021).
Hubbard, S. J. & Thornton, J. M. NACCESS: computer program (Department of Biochemistry and Molecular Biology, University College London, 1993).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive datasets. Nat. Biotechnol. 35, 1026–1028 (2017).
Steinegger, M. & Söding, J. Clustering huge protein sequence sets in linear time. Nat. Commun. 9, 2542 (2018).
Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollar, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42, 318–327 (2020).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR) (2015).
Lin, P., Tao, H., Li, H. & Huang, S.-Y. Protein-protein contact prediction by geometric triangle-aware protein language models. Zenodo (2023); https://doi.org/10.5281/zenodo.8304327
Acknowledgements
This work was supported by the National Natural Science Foundation of China (grant nos. 32161133002 and 62072199) and a startup grant from Huazhong University of Science and Technology.
Author information
Authors and Affiliations
Contributions
S.-Y.H. conceived and supervised the project. P.L. and S.-Y.H. designed and performed the experiments. P.L., H.T. and H.L. analysed the data. P.L., H.T., H.L. and S.-Y.H. wrote the paper. All authors reviewed and approved the final version of the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Jacob Huth, in collaboration with the Nature Machine Intelligence team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Impact of MSA depth and contact density, Impact of sequence cropping size, Impact of conformational changes, Impact of intrinsically disordered proteins, Impact of structural similarity, Impact of intra-protein distance usage, Supplementary Tables 1–4, Fig. 1, algorithm and reference.
Supplementary Data
Supplementary data 1–10.
Source data
Source Data
Source data for Tables 1–3 and Figs. 2–4, and statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lin, P., Tao, H., Li, H. et al. Protein–protein contact prediction by geometric triangle-aware protein language models. Nat Mach Intell 5, 1275–1284 (2023). https://doi.org/10.1038/s42256-023-00741-2
DOI: https://doi.org/10.1038/s42256-023-00741-2