Abstract
Protein complex structure prediction plays an important role in applications such as drug discovery and antibody design. However, owing to limited prediction accuracy, predictions are frequently inconsistent with experimental observations. Here we present ColabDock, a general framework that adapts deep learning structure prediction models to integrate experimental restraints of different forms and sources without further large-scale retraining or fine-tuning. Using AlphaFold2 as the structure prediction model, ColabDock, with its generation–prediction architecture and trained ranking model, outperforms HADDOCK and ClusPro not only in complex structure prediction with simulated residue and surface restraints but also in prediction assisted by nuclear magnetic resonance chemical shift perturbation and by covalent labelling. It also assists antibody–antigen interface prediction with emulated interface scan restraints, which could be obtained by experiments such as deep mutational scanning. We hope that ColabDock, as a unified framework, can help bridge the gap between experimental and computational protein science.
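To make the idea of a residue restraint more concrete, the following is a minimal sketch of how one might measure what fraction of residue-pair restraints a predicted two-chain complex satisfies. It is not how ColabDock scores restraints. It rests on several hypothetical assumptions: each restraint names one residue on chain A and one on chain B, each residue is represented by a single coordinate (for example its Cβ atom), and a restraint counts as satisfied when that distance falls below an illustrative 8 Å cutoff. The names `Restraint` and `restraint_satisfaction` are hypothetical and do not come from the ColabDock code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Restraint:
    # Hypothetical residue-pair restraint: residue index i on chain A
    # is assumed to be in contact with residue index j on chain B.
    i: int
    j: int


def restraint_satisfaction(coords_a, coords_b, restraints, cutoff=8.0):
    """Return the fraction of restraints whose inter-residue distance is below `cutoff` (in Å).

    coords_a, coords_b: lists of (x, y, z) tuples, one representative atom per
    residue (an assumption of this sketch). An empty restraint list returns 0.0.
    """
    if not restraints:
        return 0.0
    satisfied = 0
    for r in restraints:
        # Euclidean distance between the two representative atoms.
        d = math.dist(coords_a[r.i], coords_b[r.j])
        if d < cutoff:
            satisfied += 1
    return satisfied / len(restraints)


# Toy example: the first pair is 5 Å apart (satisfied), the second is 16.2 Å apart (violated).
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
chain_b = [(0.0, 5.0, 0.0), (20.0, 0.0, 0.0)]
restraints = [Restraint(0, 0), Restraint(1, 1)]
print(restraint_satisfaction(chain_a, chain_b, restraints))  # 0.5
```

A satisfaction fraction like this is only one simple way to compare candidate models against sparse experimental information. The cutoff and the choice of representative atom are free parameters of the sketch, not values taken from the paper.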
Data availability
All Protein Data Bank (PDB) structures used in this study are publicly available and can be downloaded from the RCSB PDB website (https://www.rcsb.org/). Information on the synthetic datasets is listed in Supplementary Table 7, and the data used in the experimental datasets are listed in Supplementary Table 8. All data used in this study are available at https://doi.org/10.17605/OSF.IO/N6R48 (ref. 41).
Code availability
The ColabDock code (ref. 42) is available on GitHub at https://github.com/JeffSHF/ColabDock (https://doi.org/10.5281/zenodo.10467048) under the Apache 2.0 license. For ease of use, a Colab notebook is also provided at https://colab.research.google.com/github/JeffSHF/ColabDock/blob/dev/ColabDock.ipynb.
References
Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. In Proc. 2023 International Conference on Learning Representations (ICLR, 2023). https://doi.org/10.48550/arXiv.2210.01776
Tsaban, T. et al. Harnessing protein folding neural networks for peptide–protein docking. Nat. Commun. 13, 176 (2022).
Masters, M., Mahmoud, A. H., Wei, Y. & Lill, M. A. Deep learning model for efficient protein–ligand docking with implicit side-chain flexibility. J. Chem. Inf. Model. 63, 1695–1707 (2023).
Zheng, W., Wuyun, Q., Freddolino, P. L. & Zhang, Y. Proteins: Structure, Function, and Bioinformatics (Wiley, 2023).
Peng, Z., Wang, W., Wei, H., Li, X. & Yang, J. Improved protein structure prediction with trRosettaX2, AlphaFold2, and optimized MSAs in CASP15. Proteins 91, 1704–1711 (2023).
Wallner, B. Improved multimer prediction using massive sampling with AlphaFold in CASP15. Proteins 91, 1734–1746 (2023).
Pierce, B. G. et al. ZDOCK server: interactive docking prediction of protein–protein complexes and symmetric multimers. Bioinformatics 30, 1771–1773 (2014).
Cheng, T. M.-K., Blundell, T. L. & Fernandez-Recio, J. pyDock: electrostatics and desolvation for effective scoring of rigid-body protein–protein docking. Proteins 68, 503–515 (2007).
Torchala, M., Moal, I. H., Chaleil, R. A. G., Fernandez-Recio, J. & Bates, P. A. SwarmDock: a server for flexible protein–protein docking. Bioinformatics 29, 807–809 (2013).
de Vries, S. J., van Dijk, M. & Bonvin, A. M. J. J. The HADDOCK web server for data-driven biomolecular docking. Nat. Protoc. 5, 883–897 (2010).
Dominguez, C., Boelens, R. & Bonvin, A. M. J. J. HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).
Comeau, S. R., Gatchell, D. W., Vajda, S. & Camacho, C. J. ClusPro: a fully automated algorithm for protein–protein docking. Nucleic Acids Res. 32, W96–W99 (2004).
Comeau, S. R., Gatchell, D. W., Vajda, S. & Camacho, C. J. ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics 20, 45–50 (2004).
Kozakov, D. et al. The ClusPro web server for protein–protein docking. Nat. Protoc. 12, 255–278 (2017).
Vajda, S., Hall, D. R. & Kozakov, D. Sampling and scoring: a marriage made in heaven. Proteins 81, 1874–1884 (2013).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Roney, J. P. & Ovchinnikov, S. State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101 (2022).
Jendrusch, M., Korbel, J. O. & Sadiq, S. K. AlphaDesign: a de novo protein design framework based on AlphaFold. Preprint at bioRxiv https://doi.org/10.1101/2021.10.11.463937 (2021).
Moffat, L., Kandathil, S. M. & Jones, D. T. Design in the DARK: learning deep generative models for de novo protein design. Preprint at bioRxiv https://doi.org/10.1101/2022.01.27.478087 (2022).
Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).
Frank, C. et al. Efficient and scalable de novo protein design using a relaxed sequence space. Preprint at bioRxiv https://doi.org/10.1101/2023.02.24.529906 (2023).
Jiang, W. & Zheng, S. Structural insights into galanin receptor signaling. Proc. Natl Acad. Sci. USA 119, e2121465119 (2022).
Jin, Z. et al. Structure of a TOC–TIC supercomplex spanning two chloroplast envelope membranes. Cell 185, 4788–4800.e13 (2022).
Drake, Z. C., Seffernick, J. T. & Lindert, S. Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling. Nat. Commun. 13, 7846 (2022).
Mitternacht, S. FreeSASA: an open source C library for solvent accessible surface area calculations. F1000Res 5, 189 (2016).
Almagro, J. C. et al. Second antibody modeling assessment (AMA-II): 3D antibody modeling. Proteins 82, 1553–1562 (2014).
Anishchenko, I., Kundrotas, P. J. & Vakser, I. A. Modeling complexes of modeled proteins. Proteins 85, 470–478 (2017).
Ganea, O.-E. et al. Independent SE(3)-equivariant models for end-to-end rigid protein docking. In Proc. 2022 International Conference on Learning Representations (ICLR, 2022). https://doi.org/10.48550/arXiv.2111.07786
Yan, Y., Tao, H., He, J. & Huang, S.-Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 15, 1829–1852 (2020).
Yin, R., Feng, B. Y., Varshney, A. & Pierce, B. G. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci. 31, e4379 (2022).
Huang, M. et al. The mechanism of an inhibitory antibody on TF-initiated blood coagulation revealed by the crystal structures of human tissue factor, Fab 5G9 and TF·5G9 complex. J. Mol. Biol. 275, 873–894 (1998).
Bryant, P., Kelkar, A., Guljas, A., Clementi, C. & Noé, F. Structure prediction of protein–ligand complexes from sequence information with Umol. Nat. Commun. 15, 4536 (2024).
Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).
Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 384, eadl2528 (2024).
Vreven, T. et al. Updates to the integrated protein–protein interaction benchmarks: docking benchmark version 5 and affinity benchmark version 2. J. Mol. Biol. 427, 3031–3041 (2015).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Joachims, T. Optimizing search engines using clickthrough data. In Proc. Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 133–142 (ACM, 2002).
Basu, S. & Wallner, B. DockQ: a quality measure for protein–protein docking models. PLoS ONE 11, e0161879 (2016).
Feng, S. et al. ColabDock (data). OSF https://doi.org/10.17605/OSF.IO/N6R48 (2024).
Feng, S. et al. ColabDock (source code). Zenodo https://doi.org/10.5281/zenodo.10467048 (2024).
Acknowledgements
We thank G. Jones from the Vajda lab for very helpful discussions on the usage of ClusPro. We also thank X. Lin for helpful discussions during revision. Z.C. thanks Z. Wang for unwavering emotional support throughout this project. Financial support from the National Natural Science Foundation of China (92053202, 92353304 and 22050003 to Y.Q.G.) and New Cornerstone Science Foundation (NCI202305 to Y.Q.G.) is gratefully acknowledged. This work is supported by Changping Laboratory (S.F., Y.X., Y.Q.G. and S.L.). This work is also supported by Amgen (S.O.).
Author information
Contributions
S.L., Y.Q.G. and S.O. developed overall concepts in the paper and supervised the project. S.F., Z.C., C.Z. and S.O. developed and benchmarked the model and/or contributed to the code. Z.C., C.Z. and Y.X. performed data collection and analysis. S.F., Z.C., C.Z. and S.L. wrote the initial draft of the manuscript. All authors contributed ideas to the work and assisted in manuscript editing and revision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Nature Machine Intelligence thanks Dongbo Bu and Arne Elofsson for their contribution to the peer review of this work.
Supplementary information
Supplementary Information
Supplementary Figs. 1–12, Tables 1–9, Notes 1–7 and references.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Feng, S., Chen, Z., Zhang, C. et al. Integrated structure prediction of protein–protein docking with experimental restraints using ColabDock. Nat Mach Intell 6, 924–935 (2024). https://doi.org/10.1038/s42256-024-00873-z
DOI: https://doi.org/10.1038/s42256-024-00873-z