Rotamer-free protein sequence design based on deep learning and self-consistency

A Publisher Correction to this article was published on 02 August 2022

This article has been updated

A preprint version of the article is available at Research Square.

Abstract

Several previously proposed deep learning methods to design amino acid sequences that autonomously fold into a given protein backbone yielded promising results in computational tests but did not outperform conventional energy function-based methods in wet experiments. Here we present the ABACUS-R method, which uses an encoder–decoder network trained using a multitask learning strategy to predict the sidechain type of a central residue from its three-dimensional local environment, which includes, besides other features, the types but not the conformations of the surrounding sidechains. This eliminates the need to reconstruct and optimize sidechain structures, and drastically simplifies the sequence design process. Thus iteratively applying the encoder–decoder to different central residues is able to produce self-consistent overall sequences for a target backbone. Results of wet experiments, including five structures solved by X-ray crystallography, show that ABACUS-R outperforms state-of-the-art energy function-based methods in success rate and design precision.
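
The self-consistency idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the ABACUS-R implementation: `toy_predictor` is a hand-written stand-in for the trained encoder–decoder, `neighbours` uses sequence-adjacent positions instead of the three-dimensional local environment, and the update schedule (sweeping all positions in random order until nothing changes) is one simple choice that may differ from the one used in the paper.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical interface: (position, neighbour residue types) -> predicted type.
Predictor = Callable[[int, List[str]], str]


def neighbours(i: int, n: int) -> List[int]:
    """Toy local environment: sequence-adjacent positions only (assumption;
    the paper uses the 3D environment on the target backbone)."""
    return [j for j in (i - 1, i + 1) if 0 <= j < n]


def toy_predictor(i: int, neighbour_types: List[str]) -> str:
    """Stand-in for the trained network; NOT the ABACUS-R model."""
    if i % 3 == 0:
        # Pretend every third position is buried and prefers leucine.
        return "L"
    # Pretend exposed positions pair a lysine with a neighbouring glutamate.
    return "K" if "E" in neighbour_types else "E"


def self_consistent_design(
    n: int, predict: Predictor, max_sweeps: int = 20, seed: int = 0
) -> Tuple[str, bool]:
    """Re-predict each residue from its neighbours' types until no residue changes."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(n)]  # random starting sequence
    for _ in range(max_sweeps):
        changed = 0
        order = list(range(n))
        rng.shuffle(order)  # visit central residues in random order
        for i in order:
            # Only sidechain types of the neighbours are used, no conformations.
            new_type = predict(i, [seq[j] for j in neighbours(i, n)])
            if new_type != seq[i]:
                seq[i] = new_type
                changed += 1
        if changed == 0:  # every residue agrees with its own prediction
            return "".join(seq), True
    return "".join(seq), False


if __name__ == "__main__":
    sequence, converged = self_consistent_design(9, toy_predictor)
    print(sequence, converged)
```

Because each prediction depends only on the types of the surrounding residues, no rotamer sampling or sidechain optimization step appears in the loop; a sequence is accepted once every position's predicted type matches the type it already has.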

Fig. 1: An overview of the ABACUS-R method.
Fig. 2: Performance of Modeleval in computational tests.
Fig. 3: Results of overall sequence design for natural backbones.
Fig. 4: Results of experimental analysis of designed proteins.
Fig. 5: Variable sidechain types and sidechain packing have been designed by ABACUS-R.

Data availability

The following data are available from Zenodo (ref. 63): complete lists of proteins for training and testing the models; the amino acid sequences designed for the 100 targets by Modeleval; the amino acid sequences and DNA sequences of the experimentally examined proteins. The experimentally solved protein structures have been deposited in the PDB under accession codes: 7VQL (1r26-A3, 10.2210/pdb7VQL/pdb); 7VQV (1r26-A6, 10.2210/pdb7VQV/pdb); 7VQW (1r26-A7, 10.2210/pdb7VQW/pdb); 7VTY (1cy5-A7, 10.2210/pdb7VTY/pdb); 7VU4 (1r26-B4, 10.2210/pdb7VU4/pdb). Source Data are provided with this paper.

Code availability

The source code is available from Code Ocean (ref. 64) at https://doi.org/10.24433/CO.3351944.v1.

References

  1. Kuhlman, B. & Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 20, 681–697 (2019).

  2. Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).

  3. Silva, D.-A. et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature 565, 186–191 (2019).

  4. Cao, L. et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science 370, 426–431 (2020).

  5. Siegel, J. B. et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels–Alder reaction. Science 329, 309–313 (2010).

  6. Cui, Y. et al. Development of a versatile and efficient C–N lyase platform for asymmetric hydroamination via computational enzyme redesign. Nat. Catal. 4, 364–373 (2021).

  7. Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).

  8. Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011).

  9. Xiong, P. et al. Protein design with a comprehensive statistical energy function and boosted by experimental selection for foldability. Nat. Commun. 5, 1–9 (2014).

  10. Xiong, P. et al. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics 36, 136–144 (2020).

  11. Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).

  12. Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).

  13. Johansson, K. E. et al. Computational redesign of thioredoxin is hypersensitive toward minor conformational changes in the backbone template. J. Mol. Biol. 428, 4361–4377 (2016).

  14. Marin, F. I., Johansson, K. E., O’Shea, C., Lindorff-Larsen, K. & Winther, J. R. Computational and experimental assessment of backbone templates for computational redesign of the thioredoxin fold. J. Phys. Chem. B 125, 11141–11149 (2021).

  15. Murphy, G. S. et al. Increasing sequence diversity with flexible backbone protein design: the complete redesign of a protein hydrophobic core. Structure 20, 1086–1096 (2012).

  16. Zhou, J., Panaitiu, A. E. & Grigoryan, G. A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc. Natl Acad. Sci. USA 117, 1059–1068 (2020).

  17. Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 1–11 (2022).

  18. Dahiyat, B. I. & Mayo, S. L. De novo protein design: fully automated sequence selection. Science 278, 82–87 (1997).

  19. Simonson, T. et al. Computational protein design: the Proteus software and selected applications. J. Comput. Chem. 34, 2472–2484 (2013).

  20. Huang, X., Pearce, R. & Zhang, Y. EvoEF2: accurate and fast energy function for computational protein design. Bioinformatics 36, 1135–1142 (2020).

  21. Liang, S., Li, Z., Zhan, J. & Zhou, Y. De novo protein design by an energy function based on series expansion in distance and orientation dependence. Bioinformatics 38, 86–93 (2021).

  22. Zhou, X. et al. Proteins of well-defined structures can be designed without backbone readjustment by a statistical model. J. Struct. Biol. 196, 350–357 (2016).

  23. Han, M. et al. Selection and analyses of variants of a designed protein suggest importance of hydrophobicity of partially buried sidechains for protein stability at high temperatures. Protein Sci. 28, 1437–1447 (2019).

  24. Liu, R., Wang, J., Xiong, P., Chen, Q. & Liu, H. De novo sequence redesign of a functional Ras-binding domain globally inverted the surface charge distribution and led to extreme thermostability. Biotechnol. Bioeng. 118, 2031–2042 (2021).

  25. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  26. Wang, S., Li, W., Liu, S. & Xu, J. RaptorX-property: a web server for protein structure property prediction. Nucl. Acids Res. 44, W430–W435 (2016).

  27. Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).

  28. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  29. Ingraham, J., Garg, V. K., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In Advances in Neural Information Processing Systems Vol 32 (NeurIPS, 2019).

  30. Strokach, A., Becerra, D., Corbi-Verge, C., Perez-Riba, A. & Kim, P. M. Fast and flexible protein design using deep graph neural networks. Cell Syst. 11, 402–411 (2020).

  31. Qi, Y. & Zhang, J. Z. H. DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet. J. Chem. Inform. Model. 60, 1245–1252 (2020).

  32. Zhang, Y. et al. ProDCoNN: protein design using a convolutional neural network. Proteins Struct. Funct. Bioinform. 88, 819–829 (2020).

  33. Torng, W. & Altman, R. B. 3D Deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinform. 18, 1–23 (2017).

  34. Chen, S. et al. To improve protein sequence profile prediction through image captioning on pairwise residue distance map. J. Chem. Inform. Model. 60, 391–399 (2019).

  35. Ovchinnikov, S. & Huang, P.-S. Structure-based protein design with deep learning. Curr. Opin. Chem. Biol. 65, 136–144 (2021).

  36. Sillitoe, I. et al. CATH: increased structural coverage of functional space. Nucl. Acids Res. 49, D266–D273 (2021).

  37. Ruder, S. An overview of multi-task learning in deep neural networks. Preprint at https://arxiv.org/abs/1706.05098 (2017).

  38. Bava, K. A., Gromiha, M. M., Uedaira, H., Kitajima, K. & Sarai, A. ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucl. Acids Res. 32, D120–D121 (2004).

  39. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (ICLR, 2021).

  40. Li, A. J., Sundar, V., Grigoryan, G. & Keating, A. E. TERMinator: a neural framework for structure-based protein design using tertiary repeating motifs. Preprint at https://arxiv.org/abs/2204.13048 (2022).

  41. Conway, P., Tyka, M. D., DiMaio, F., Konerding, D. E. & Baker, D. Relaxation of backbone bond geometry improves protein energy landscape modeling. Protein Sci. 23, 47–55 (2014).

  42. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucl. Acids Res. 33, 2302–2309 (2005).

  43. Huang, B. et al. A backbone-centred energy function of neural networks for protein design. Nature 602, 523–528 (2022).

  44. Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9, 173–175 (2012).

  45. Buel, G. R. & Walters, K. J. Can AlphaFold2 predict the impact of missense mutations on structure? Nat. Struct. Mol. Biol. 29, 1–2 (2022).

  46. Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl Acad. Sci. USA 108, E1293–E1301 (2011).

  47. Krivov, G. G., Shapovalov, M. V. & Dunbrack, R. L. Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins Struct. Funct. Bioinform. 77, 778–795 (2009).

  48. Bengio, S., Vinyals, O., Jaitly, N. & Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems Vol. 28 (NeurIPS, 2015).

  49. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (NeurIPS, 2017).

  50. Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016).

  51. Frishman, D. & Argos, P. Knowledge-based protein secondary structure assignment. Proteins Struct. Funct. Bioinform. 23, 566–579 (1995).

  52. Paszke, A. et al. Pytorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems Vol. 32 (NeurIPS, 2019).

  53. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR, 2015).

  54. Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).

  55. The PyMOL Molecular Graphics System v.1.8 (Schrödinger, LLC, 2015).

  56. Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).

  57. Lee, W., Westler, W. M., Bahrami, A., Eghbalnia, H. R. & Markley, J. L. PINE-SPARKY: graphical interface for evaluating automated probabilistic peak assignments in protein NMR spectroscopy. Bioinformatics 25, 2085–2087 (2009).

  58. Zhang, W.-Z. et al. The protein complex crystallography beamline (BL19U1) at the Shanghai Synchrotron Radiation Facility. Nucl. Sci. Tech. 30, 1–11 (2019).

  59. Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326 (1997).

  60. Kabsch, W. Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr. D 66, 133–144 (2010).

  61. Vagin, A. & Teplyakov, A. Molecular replacement with MOLREP. Acta Crystallogr. D 66, 22–25 (2010).

  62. Adams, P. D. et al. Phenix: building new software for automated crystallographic structure determination. Acta Crystallogr. D 58, 1948–1954 (2002).

  63. Liu, Y. Rotamer-Free Protein Sequence Design Based on Deep Learning and Self-Consistency (Zenodo, 2022); https://doi.org/10.5281/zenodo.6592054.

  64. Liu, Y. et al. ABACUS-R: Rotamer-free protein sequence design based on deep learning and self-consistency (Code Ocean, 2022); https://doi.org/10.24433/CO.3351944.v1.

Acknowledgements

This work was supported by the National Key R&D Program of China (grant no. 2018YFA0900703 to H.Y.L. and 2018YFA0901600 to Q.C.), the National Natural Science Foundation of China (grant no. 21773220 to H.Y.L., 31971175 and 32171411 to Q.C.), the Tianjin Synthetic Biotechnology Innovation Capacity Improvement Project (grant no. TSBICIP-PTJS-001 to H.Y.L.), and the Youth Innovation Promotion Association, Chinese Academy of Sciences (grant no. 2017494 to Q.C.). We thank the members of staff from the BL19U1 and BL02U1 beamlines of the National Facility for Protein Science in Shanghai (NFPS) and of the Shanghai Synchrotron Radiation Facility for assistance during crystallographic data collection. We thank M. Lv and Y. Yun for their assistance with X-ray diffraction data collection and processing.

Author information

Contributions

H.Y.L., H.Q.L., Y.F.L. and W.L.W. conceived the computational framework. Q.C., L.Z. and Y.F.L. designed the experimental study. Y.F.L. and W.L.W. wrote the computer programs and performed the calculations under the supervision of H.Y.L. and H.Q.L. L.Z. performed experimental analyses under the supervision of Q.C. and H.Y.L. M.Z., C.C.W. and F.D.L. analyzed the crystallographic data. J.H.Z. collected and processed NMR data. Y.F.L., W.L.W. and H.Y.L. wrote the paper with input from all of the other authors.

Corresponding authors

Correspondence to Houqiang Li, Quan Chen or Haiyan Liu.

Ethics declarations

Competing interests

H.Y.L., Q.C., H.Q.L., Y.F.L. and W.L.W. have filed a patent application (no. 202210091553.7) relating to rotamer-free protein sequence design in the name of the University of Science and Technology of China. The other authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Jue Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Handling editor: Jie Pan, in collaboration with the Nature Computational Science team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–21, Tables 1–14 and references.

Reporting Summary

Peer Review File

Supplementary Data 1

Raw data for Supplementary Figs. 1a–d, 2a,b, 3a,c, 5c, 6a, 7 and 11, and protein lists.

Source data

Source Data Fig. 2

Raw data of recovery rate of Modeleval for single residues in a test set.

Source Data Fig. 3

Intermediate results for designing the overall sequences for 100 targets during self-consistency iteration. Overall sequence recovery rate and Rosetta energy for ABACUS-R-designed sequences. ABACUS-R-designed sequences for 100 targets.

Source Data Fig. 4

Raw data for three fast protein liquid chromatography experiments, three ¹H NMR spectra, three DSC spectra, one HSQC spectrum and two crystal structures for three ABACUS-R designed sequences.

Source Data Fig. 5

PDB files and validation reports of X-ray structures, including those shown in this figure.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Zhang, L., Wang, W. et al. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat Comput Sci 2, 451–462 (2022). https://doi.org/10.1038/s43588-022-00273-6

  • DOI: https://doi.org/10.1038/s43588-022-00273-6
