Integrated structure prediction of protein–protein docking with experimental restraints using ColabDock

An Author Correction to this article was published on 10 September 2024

This article has been updated

A preprint version of the article is available at bioRxiv.

Abstract

Protein complex structure prediction plays important roles in applications such as drug discovery and antibody design. However, because prediction accuracy is limited, predictions are frequently inconsistent with experiments. Here we present ColabDock, a general framework that adapts deep learning structure prediction models to integrate experimental restraints of different forms and sources without further large-scale retraining or fine-tuning. With a generation–prediction architecture and a trained ranking model, and with AlphaFold2 as the structure prediction model, ColabDock outperforms HADDOCK and ClusPro, not only in complex structure predictions with simulated residue and surface restraints but also in those assisted by nuclear magnetic resonance chemical shift perturbation and by covalent labelling. It also assists antibody–antigen interface prediction with emulated interface scan restraints, which could be obtained from experiments such as deep mutational scanning. We hope that ColabDock, as a unified framework, can help to bridge the gap between experimental and computational protein science.
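To make the idea of a residue restraint concrete, the following minimal sketch checks how many inter-chain residue pairs in a predicted complex fall within a distance cutoff. It is an illustration only and is not taken from the ColabDock code: the function name `satisfied_fraction`, the use of one representative coordinate per residue, and the 8.0 Å cutoff are all assumptions chosen for the example.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
ResidueKey = Tuple[str, int]  # (chain ID, residue number)


def satisfied_fraction(
    coords: Dict[ResidueKey, Coord],
    restraints: List[Tuple[ResidueKey, ResidueKey]],
    cutoff: float = 8.0,  # hypothetical cutoff in angstroms, not from the article
) -> float:
    """Return the fraction of residue-pair restraints within `cutoff`.

    `coords` holds one representative coordinate per residue (for
    example a C-beta atom); that choice is an assumption of this sketch.
    """
    if not restraints:
        raise ValueError("at least one restraint is required")
    hits = sum(
        1
        for res_a, res_b in restraints
        if math.dist(coords[res_a], coords[res_b]) <= cutoff
    )
    return hits / len(restraints)


# Toy coordinates: A10 to B5 is 5 angstroms apart, A10 to B20 is 30.
toy_coords = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("B", 5): (3.0, 4.0, 0.0),
    ("B", 20): (0.0, 0.0, 30.0),
}
toy_restraints = [(("A", 10), ("B", 5)), (("A", 10), ("B", 20))]
print(satisfied_fraction(toy_coords, toy_restraints))  # 0.5
```

A score of this kind could be used to compare candidate complexes against experimental evidence; how ColabDock itself encodes and ranks restraints is described in the article, not here.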

Fig. 1: The workflow of ColabDock.
Fig. 2: Performance of ColabDock on the validation set.
Fig. 3: Comparison of ColabDock, HADDOCK and ClusPro on the benchmark set.
Fig. 4: ColabDock performance and restraint analysis on the CSP set.
Fig. 5: ColabDock performance and restraint analysis on the CL set.
Fig. 6: Comparison of ColabDock, HADDOCK and ClusPro on the antibody–antigen benchmark set.

Data availability

All the Protein Data Bank (PDB) samples used in this study are publicly available and can be downloaded from the RCSB PDB website (https://www.rcsb.org/). Information on the synthetic datasets is listed in Supplementary Table 7. Data used in the experimental datasets are listed in Supplementary Table 8. All the data used in this study are available at https://doi.org/10.17605/OSF.IO/N6R48 (ref. 41).
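As a convenience for readers who want to fetch PDB entries programmatically, the sketch below builds a download URL for a classic four-character PDB ID and optionally saves the file. The article only points to the RCSB website; the file-server URL pattern `https://files.rcsb.org/download/<ID>.pdb`, the helper names `rcsb_url` and `download_entry`, and the example ID `1AY7` are assumptions made for illustration and are not taken from the article or its supplementary tables.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed RCSB file-server pattern; not stated in the article.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def rcsb_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Build the download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{pdb_id}.{fmt}"


def download_entry(pdb_id: str, out_dir: str = ".", fmt: str = "pdb") -> Path:
    """Download one entry into `out_dir` and return the local path.

    Requires network access; errors from urllib are left to propagate.
    """
    out_path = Path(out_dir) / f"{pdb_id.strip().upper()}.{fmt}"
    urlretrieve(rcsb_url(pdb_id, fmt), str(out_path))
    return out_path


# Building the URL needs no network access.
print(rcsb_url("1ay7"))  # https://files.rcsb.org/download/1AY7.pdb
```

Calling `download_entry("1ay7")` would then write `1AY7.pdb` to the current directory. The IDs actually used in the study are listed in the supplementary tables cited above.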

Code availability

The ColabDock code (ref. 42) is available on GitHub at https://github.com/JeffSHF/ColabDock, with the DOI https://doi.org/10.5281/zenodo.10467048, under the Apache 2.0 license. A Colab notebook is also provided at https://colab.research.google.com/github/JeffSHF/ColabDock/blob/dev/ColabDock.ipynb for ease of use.

References

  1. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. In Proc. 2023 International Conference on Learning Representations (ICLR, 2023). https://doi.org/10.48550/arXiv.2210.01776

  2. Tsaban, T. et al. Harnessing protein folding neural networks for peptide–protein docking. Nat. Commun. 13, 176 (2022).

  3. Masters, M., Mahmoud, A. H., Wei, Y. & Lill, M. A. Deep learning model for efficient protein–ligand docking with implicit side-chain flexibility. J. Chem. Inf. Model. 63, 1695–1707 (2023).

  4. Zheng, W., Wuyun, Q., Freddolino, P. L. & Zhang, Y. Proteins: Structure, Function, and Bioinformatics (Wiley, 2023).

  5. Peng, Z., Wang, W., Wei, H., Li, X. & Yang, J. Improved protein structure prediction with trRosettaX2, AlphaFold2, and optimized MSAs in CASP15. Proteins Struct. Funct. Bioinf. 91, 1704–1711 (2023).

  6. Wallner, B. Improved multimer prediction using massive sampling with AlphaFold in CASP15. Proteins 91, 1734–1746 (2023).

  7. Pierce, B. G. et al. ZDOCK server: interactive docking prediction of protein–protein complexes and symmetric multimers. Bioinformatics 30, 1771–1773 (2014).

  8. Cheng, T. M.-K., Blundell, T. L. & Fernandez-Recio, J. pyDock: electrostatics and desolvation for effective scoring of rigid-body protein–protein docking. Proteins 68, 503–515 (2007).

  9. Torchala, M., Moal, I. H., Chaleil, R. A. G., Fernandez-Recio, J. & Bates, P. A. SwarmDock: a server for flexible protein–protein docking. Bioinformatics 29, 807–809 (2013).

  10. de Vries, S. J., van Dijk, M. & Bonvin, A. M. J. J. The HADDOCK web server for data-driven biomolecular docking. Nat. Protoc. 5, 883–897 (2010).

  11. Dominguez, C., Boelens, R. & Bonvin, A. M. J. J. HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).

  12. Comeau, S. R., Gatchell, D. W., Vajda, S. & Camacho, C. J. ClusPro: a fully automated algorithm for protein-protein docking. Nucleic Acids Res. 32, W96–W99 (2004).

  13. Comeau, S. R., Gatchell, D. W., Vajda, S. & Camacho, C. J. ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics 20, 45–50 (2004).

  14. Kozakov, D. et al. The ClusPro web server for protein–protein docking. Nat. Protoc. 12, 255–278 (2017).

  15. Vajda, S., Hall, D. R. & Kozakov, D. Sampling and scoring: a marriage made in heaven. Proteins 81, 1874–1884 (2013).

  16. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  17. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).

  18. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  19. Roney, J. P. & Ovchinnikov, S. State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101 (2022).

  20. Jendrusch, M., Korbel, J. O. & Sadiq, S. K. AlphaDesign: a de novo protein design framework based on AlphaFold. Preprint at bioRxiv https://doi.org/10.1101/2021.10.11.463937 (2021).

  21. Moffat, L., Kandathil, S. M. & Jones, D. T. Design in the DARK: learning deep generative models for de novo protein design. Preprint at bioRxiv https://doi.org/10.1101/2022.01.27.478087 (2022).

  22. Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).

  23. Frank, C. et al. Efficient and scalable de novo protein design using a relaxed sequence space. Preprint at bioRxiv https://doi.org/10.1101/2023.02.24.529906 (2023).

  24. Jiang, W. & Zheng, S. Structural insights into galanin receptor signaling. Proc. Natl Acad. Sci. USA 119, e2121465119 (2022).

  25. Jin, Z. et al. Structure of a TOC–TIC supercomplex spanning two chloroplast envelope membranes. Cell 185, 4788–4800.e13 (2022).

  26. Drake, Z. C., Seffernick, J. T. & Lindert, S. Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling. Nat. Commun. 13, 7846 (2022).

  27. Mitternacht, S. FreeSASA: an open source C library for solvent accessible surface area calculations. F1000Res 5, 189 (2016).

  28. Almagro, J. C. et al. Second antibody modeling assessment (AMA-II): 3D antibody modeling. Proteins 82, 1553–1562 (2014).

  29. Anishchenko, I., Kundrotas, P. J. & Vakser, I. A. Modeling complexes of modeled proteins. Proteins 85, 470–478 (2017).

  30. Ganea, O.-E. et al. Independent SE(3)-equivariant models for end-to-end rigid protein docking. In Proc. 2022 International Conference on Learning Representations (ICLR, 2022). https://doi.org/10.48550/arXiv.2111.07786

  31. Yan, Y., Tao, H., He, J. & Huang, S.-Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 15, 1829–1852 (2020).

  32. Yin, R., Feng, B. Y., Varshney, A. & Pierce, B. G. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci. 31, e4379 (2022).

  33. Huang, M. et al. The mechanism of an inhibitory antibody on TF-initiated blood coagulation revealed by the crystal structures of human tissue factor, Fab 5G9 and TF·5G9 complex. J. Mol. Biol. 275, 873–894 (1998).

  34. Bryant, P., Kelkar, A., Guljas, A., Clementi, C. & Noé, F. Structure prediction of protein–ligand complexes from sequence information with Umol. Nat. Commun. 15, 4536 (2024).

  35. Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).

  36. Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).

  37. Vreven, T. et al. Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J. Mol. Biol. 427, 3031–3041 (2015).

  38. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

  39. Joachims, T. Optimizing search engines using clickthrough data. In Proc. Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 133–142 (ACM, 2002).

  40. Basu, S. & Wallner, B. DockQ: a quality measure for protein–protein docking models. PLoS ONE 11, e0161879 (2016).

  41. Feng, S. et al. ColabDock (data). OSF https://doi.org/10.17605/OSF.IO/N6R48 (2024).

  42. Feng, S. et al. ColabDock (source code). Zenodo https://doi.org/10.5281/ZENODO.10467048 (2024).

Acknowledgements

We thank G. Jones from the Vajda lab for very helpful discussions on the usage of ClusPro. We also thank X. Lin for helpful discussions during revision. Z.C. thanks Z. Wang for the unwavering emotional support throughout this project. Financial support from the National Natural Science Foundation of China (92053202, 92353304 and 22050003 to Y.Q.G.) and New Cornerstone Science Foundation (NCI202305 to Y.Q.G.) is gratefully acknowledged. This work is supported by Changping Laboratory (S.F., Y.X., Y.Q.G. and S.L.). This work is also supported by Amgen (S.O.).

Author information

Contributions

S.L., Y.Q.G. and S.O. developed overall concepts in the paper and supervised the project. S.F., Z.C., C.Z. and S.O. developed and benchmarked the model and/or contributed to the code. Z.C., C.Z. and Y.X. performed data collection and analysis. S.F., Z.C., C.Z. and S.L. wrote the initial draft of the manuscript. All authors contributed ideas to the work and assisted in manuscript editing and revision.

Corresponding authors

Correspondence to Sergey Ovchinnikov, Yi Qin Gao or Sirui Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Dongbo Bu and Arne Elofsson for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–12, Tables 1–9, Notes 1–7 and references.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Feng, S., Chen, Z., Zhang, C. et al. Integrated structure prediction of protein–protein docking with experimental restraints using ColabDock. Nat Mach Intell 6, 924–935 (2024). https://doi.org/10.1038/s42256-024-00873-z

  • DOI: https://doi.org/10.1038/s42256-024-00873-z
