Rotamer-free protein sequence design based on deep learning and self-consistency

Liu, Yufeng; Zhang, Lu; Wang, Weilun; Zhu, Min; Wang, Chenchen; Li, Fudong; Zhang, Jiahai; Li, Houqiang; Chen, Quan; Liu, Haiyan

doi:10.1038/s43588-022-00273-6

Article
Published: 21 July 2022


Nature Computational Science volume 2, pages 451–462 (2022)

A Publisher Correction to this article was published on 02 August 2022

This article has been updated

A preprint version of the article is available at Research Square.

Abstract

Several previously proposed deep learning methods for designing amino acid sequences that autonomously fold into a given protein backbone yielded promising results in computational tests but did not outperform conventional energy function-based methods in wet experiments. Here we present ABACUS-R, which uses an encoder–decoder network, trained with a multitask learning strategy, to predict the sidechain type of a central residue from its three-dimensional local environment. Besides other features, this environment includes the types, but not the conformations, of the surrounding sidechains. This eliminates the need to reconstruct and optimize sidechain structures and drastically simplifies the sequence design process: iteratively applying the encoder–decoder to different central residues produces self-consistent overall sequences for a target backbone. Results of wet experiments, including five structures solved by X-ray crystallography, show that ABACUS-R outperforms state-of-the-art energy function-based methods in success rate and design precision.
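The self-consistency idea can be illustrated with a minimal Python sketch. It is not the ABACUS-R implementation (the authors' code is on Code Ocean): `toy_predictor` is a hypothetical stand-in for the trained encoder–decoder, and the random starting sequence, the partial-update rule (`update_fraction`) and the stopping criterion are assumptions chosen for illustration. What the sketch does show is the core loop: each residue type is re-predicted from the types of its neighbours only, with no sidechain conformations involved, until no position changes.

```python
import random
from typing import Callable, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical predictor signature standing in for the trained encoder-decoder:
# it sees a central position plus the residue *types* of the current sequence
# (no sidechain conformations) and returns one score per entry of AMINO_ACIDS.
Predictor = Callable[[int, Sequence[str]], List[float]]


def best_type(scores: List[float]) -> str:
    """Return the amino acid with the highest score (first one wins ties)."""
    return AMINO_ACIDS[max(range(len(scores)), key=scores.__getitem__)]


def self_consistent_design(length: int, predict: Predictor, max_iters: int = 100,
                           update_fraction: float = 0.2, seed: int = 0) -> List[str]:
    """Re-predict residue types until every residue is its own best prediction."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]  # assumed random start
    for _ in range(max_iters):
        # Re-predict every position, conditioning on the current neighbour types.
        proposals = [best_type(predict(i, seq)) for i in range(length)]
        changed = [i for i in range(length) if proposals[i] != seq[i]]
        if not changed:
            break  # fixed point reached: the sequence is self-consistent
        # Accept only a random subset of the changes per round (assumed rule)
        # so that coupled neighbours do not flip back and forth together.
        k = max(1, int(update_fraction * len(changed)))
        for i in rng.sample(changed, k):
            seq[i] = proposals[i]
    return seq


def toy_predictor(i: int, seq: Sequence[str]) -> List[float]:
    """Hypothetical scorer: a fixed per-position preference (mimicking the
    backbone environment) plus a small bonus for matching a neighbour type."""
    preferred = "MKVLAGSTEQ"[i % 10]
    neighbours = {seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)}
    return [(1.0 if aa == preferred else 0.0) + (0.5 if aa in neighbours else 0.0)
            for aa in AMINO_ACIDS]


designed = self_consistent_design(10, toy_predictor)
print("".join(designed))  # prints MKVLAGSTEQ
```

Because the toy preference term always outweighs the neighbour bonus, the loop converges to the preferred sequence; with a real network the neighbour terms matter far more, which is why a gradual, partial update is a reasonable (but here assumed) design choice.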

**Fig. 1: An overview of the ABACUS-R method.**

**Fig. 2: Performance of Model_eval in computational tests.**

**Fig. 3: Results of overall sequence design for natural backbones.**

**Fig. 4: Results of experimental analysis of designed proteins.**

**Fig. 5: Variable sidechain types and sidechain packing have been designed by ABACUS-R.**

Data availability

The following data are available from Zenodo⁶³: complete lists of proteins for training and testing the models; the amino acid sequences designed for the 100 targets by Model_eval; and the amino acid and DNA sequences of the experimentally examined proteins. The experimentally solved protein structures have been deposited in the PDB under accession codes 7VQL (1r26-A3, 10.2210/pdb7VQL/pdb), 7VQV (1r26-A6, 10.2210/pdb7VQV/pdb), 7VQW (1r26-A7, 10.2210/pdb7VQW/pdb), 7VTY (1cy5-A7, 10.2210/pdb7VTY/pdb) and 7VU4 (1r26-B4, 10.2210/pdb7VU4/pdb). Source Data are provided with this paper.

Code availability

The source code is available from Code Ocean⁶⁴ at https://doi.org/10.24433/CO.3351944.v1.

Change history

02 August 2022
A Correction to this paper has been published: https://doi.org/10.1038/s43588-022-00305-1

References

1. Kuhlman, B. & Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 20, 681–697 (2019).
2. Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).
3. Silva, D.-A. et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature 565, 186–191 (2019).
4. Cao, L. et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science 370, 426–431 (2020).
5. Siegel, J. B. et al. Computational design of an enzyme catalyst for a stereoselective bimolecular Diels–Alder reaction. Science 329, 309–313 (2010).
6. Cui, Y. et al. Development of a versatile and efficient C–N lyase platform for asymmetric hydroamination via computational enzyme redesign. Nat. Catal. 4, 364–373 (2021).
7. Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).
8. Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011).
9. Xiong, P. et al. Protein design with a comprehensive statistical energy function and boosted by experimental selection for foldability. Nat. Commun. 5, 1–9 (2014).
10. Xiong, P. et al. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics 36, 136–144 (2020).
11. Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
12. Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).
13. Johansson, K. E. et al. Computational redesign of thioredoxin is hypersensitive toward minor conformational changes in the backbone template. J. Mol. Biol. 428, 4361–4377 (2016).
14. Marin, F. I., Johansson, K. E., O’Shea, C., Lindorff-Larsen, K. & Winther, J. R. Computational and experimental assessment of backbone templates for computational redesign of the thioredoxin fold. J. Phys. Chem. B 125, 11141–11149 (2021).
15. Murphy, G. S. et al. Increasing sequence diversity with flexible backbone protein design: the complete redesign of a protein hydrophobic core. Structure 20, 1086–1096 (2012).
16. Zhou, J., Panaitiu, A. E. & Grigoryan, G. A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc. Natl Acad. Sci. USA 117, 1059–1068 (2020).
17. Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 1–11 (2022).
18. Dahiyat, B. I. & Mayo, S. L. De novo protein design: fully automated sequence selection. Science 278, 82–87 (1997).
19. Simonson, T. et al. Computational protein design: the Proteus software and selected applications. J. Comput. Chem. 34, 2472–2484 (2013).
20. Huang, X., Pearce, R. & Zhang, Y. EvoEF2: accurate and fast energy function for computational protein design. Bioinformatics 36, 1135–1142 (2020).
21. Liang, S., Li, Z., Zhan, J. & Zhou, Y. De novo protein design by an energy function based on series expansion in distance and orientation dependence. Bioinformatics 38, 86–93 (2021).
22. Zhou, X. et al. Proteins of well-defined structures can be designed without backbone readjustment by a statistical model. J. Struct. Biol. 196, 350–357 (2016).
23. Han, M. et al. Selection and analyses of variants of a designed protein suggest importance of hydrophobicity of partially buried sidechains for protein stability at high temperatures. Protein Sci. 28, 1437–1447 (2019).
24. Liu, R., Wang, J., Xiong, P., Chen, Q. & Liu, H. De novo sequence redesign of a functional Ras-binding domain globally inverted the surface charge distribution and led to extreme thermostability. Biotechnol. Bioeng. 118, 2031–2042 (2021).
25. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
26. Wang, S., Li, W., Liu, S. & Xu, J. RaptorX-property: a web server for protein structure property prediction. Nucl. Acids Res. 44, W430–W435 (2016).
27. Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
28. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
29. Ingraham, J., Garg, V. K., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In Advances in Neural Information Processing Systems Vol. 32 (NeurIPS, 2019).
30. Strokach, A., Becerra, D., Corbi-Verge, C., Perez-Riba, A. & Kim, P. M. Fast and flexible protein design using deep graph neural networks. Cell Syst. 11, 402–411 (2020).
31. Qi, Y. & Zhang, J. Z. H. DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet. J. Chem. Inform. Model. 60, 1245–1252 (2020).
32. Zhang, Y. et al. ProDCoNN: protein design using a convolutional neural network. Proteins Struct. Funct. Bioinform. 88, 819–829 (2020).
33. Torng, W. & Altman, R. B. 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinform. 18, 1–23 (2017).
34. Chen, S. et al. To improve protein sequence profile prediction through image captioning on pairwise residue distance map. J. Chem. Inform. Model. 60, 391–399 (2019).
35. Ovchinnikov, S. & Huang, P.-S. Structure-based protein design with deep learning. Curr. Opin. Chem. Biol. 65, 136–144 (2021).
36. Sillitoe, I. et al. CATH: increased structural coverage of functional space. Nucl. Acids Res. 49, D266–D273 (2021).
37. Ruder, S. An overview of multi-task learning in deep neural networks. Preprint at https://arxiv.org/abs/1706.05098 (2017).
38. Bava, K. A., Gromiha, M. M., Uedaira, H., Kitajima, K. & Sarai, A. ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucl. Acids Res. 32, D120–D121 (2004).
39. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (ICLR, 2021).
40. Li, A. J., Sundar, V., Grigoryan, G. & Keating, A. E. TERMinator: a neural framework for structure-based protein design using tertiary repeating motifs. Preprint at https://arxiv.org/abs/2204.13048 (2022).
41. Conway, P., Tyka, M. D., DiMaio, F., Konerding, D. E. & Baker, D. Relaxation of backbone bond geometry improves protein energy landscape modeling. Protein Sci. 23, 47–55 (2014).
42. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucl. Acids Res. 33, 2302–2309 (2005).
43. Huang, B. et al. A backbone-centred energy function of neural networks for protein design. Nature 602, 523–528 (2022).
44. Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9, 173–175 (2012).
45. Buel, G. R. & Walters, K. J. Can AlphaFold2 predict the impact of missense mutations on structure? Nat. Struct. Mol. Biol. 29, 1–2 (2022).
46. Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl Acad. Sci. USA 108, E1293–E1301 (2011).
47. Krivov, G. G., Shapovalov, M. V. & Dunbrack, R. L. Jr. Improved prediction of protein side-chain conformations with SCWRL4. Proteins Struct. Funct. Bioinform. 77, 778–795 (2009).
48. Bengio, S., Vinyals, O., Jaitly, N. & Shazeer, S. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems Vol. 28 (NeurIPS, 2015).
49. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (NeurIPS, 2017).
50. Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016).
51. Frishman, D. & Argos, P. Knowledge-based protein secondary structure assignment. Proteins Struct. Funct. Bioinform. 23, 566–579 (1995).
52. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems Vol. 32 (NeurIPS, 2019).
53. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR, 2015).
54. Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).
55. The PyMOL Molecular Graphics System v.1.8 (Schrödinger, LLC, 2015).
56. Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).
57. Lee, W., Westler, W. M., Bahrami, A., Eghbalnia, H. R. & Markley, J. L. PINE-SPARKY: graphical interface for evaluating automated probabilistic peak assignments in protein NMR spectroscopy. Bioinformatics 25, 2085–2087 (2009).
58. Zhang, W.-Z. et al. The protein complex crystallography beamline (BL19U1) at the Shanghai Synchrotron Radiation Facility. Nucl. Sci. Tech. 30, 1–11 (2019).
59. Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326 (1997).
60. Kabsch, W. Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr. D 66, 133–144 (2010).
61. Vagin, A. & Teplyakov, A. Molecular replacement with MOLREP. Acta Crystallogr. D 66, 22–25 (2010).
62. Adams, P. D. et al. PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr. D 58, 1948–1954 (2002).
63. Liu, Y. Rotamer-Free Protein Sequence Design Based on Deep Learning and Self-Consistency (Zenodo, 2022); https://doi.org/10.5281/zenodo.6592054.
64. Liu, Y. et al. ABACUS-R: Rotamer-free protein sequence design based on deep learning and self-consistency (Code Ocean, 2022); https://doi.org/10.24433/CO.3351944.v1.

Acknowledgements

This work was supported by the National Key R&D Program of China (grant nos. 2018YFA0900703 to H.Y.L. and 2018YFA0901600 to Q.C.), the National Natural Science Foundation of China (grant nos. 21773220 to H.Y.L. and 31971175 and 32171411 to Q.C.), the Tianjin Synthetic Biotechnology Innovation Capacity Improvement Project (grant no. TSBICIP-PTJS-001 to H.Y.L.) and the Youth Innovation Promotion Association, Chinese Academy of Sciences (grant no. 2017494 to Q.C.). We thank the staff of the BL19U1 and BL02U1 beamlines of the National Facility for Protein Science in Shanghai (NFPS) and of the Shanghai Synchrotron Radiation Facility for assistance during crystallographic data collection. We thank M. Lv and Y. Yun for their assistance with X-ray diffraction data collection and processing.

Author information

These authors contributed equally: Yufeng Liu, Lu Zhang and Weilun Wang.

Authors and Affiliations

MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
Yufeng Liu, Lu Zhang, Min Zhu, Chenchen Wang, Fudong Li, Jiahai Zhang, Quan Chen & Haiyan Liu
CAS Key Laboratory of GIPAS, School of Information Science and Technology, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China
Weilun Wang & Houqiang Li
Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
Fudong Li, Jiahai Zhang, Quan Chen & Haiyan Liu
School of Data Science, University of Science and Technology of China, Hefei, Anhui, China
Haiyan Liu

Contributions

H.Y.L., H.Q.L., Y.F.L. and W.L.W. conceived the computational framework. Q.C., L.Z. and Y.F.L. designed the experimental study. Y.F.L. and W.L.W. wrote the computer programs and performed the calculations under the supervision of H.Y.L. and H.Q.L. L.Z. performed the experimental analyses under the supervision of Q.C. and H.Y.L. M.Z., C.C.W. and F.D.L. analyzed the crystallographic data. J.H.Z. collected and processed the NMR data. Y.F.L., W.L.W. and H.Y.L. wrote the paper with input from all of the other authors.

Corresponding authors

Correspondence to Houqiang Li, Quan Chen or Haiyan Liu.

Ethics declarations

Competing interests

H.Y.L., Q.C., H.Q.L., Y.F.L. and W.L.W. have filed a patent application (no. 202210091553.7) relating to rotamer-free protein sequence design in the name of the University of Science and Technology of China. The other authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Jue Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Handling editor: Jie Pan, in collaboration with the Nature Computational Science team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–21, Tables 1–14 and references.

Reporting Summary

Peer Review File

Supplementary Data 1

Raw data for Supplementary Figs. 1a–d, 2a,b, 3a,c, 5c, 6a, 7 and 11, and protein lists.

Source data

Source Data Fig. 2

Raw data on the single-residue recovery rates of Model_eval on a test set.

Source Data Fig. 3

Intermediate results of the self-consistency iterations when designing overall sequences for 100 targets; overall sequence recovery rates and Rosetta energies of the ABACUS-R-designed sequences; and the ABACUS-R-designed sequences for the 100 targets.

Source Data Fig. 4

Raw data for three fast protein liquid chromatography experiments, three ¹H NMR spectra, three DSC thermograms, one HSQC spectrum and two crystal structures for three ABACUS-R-designed sequences.

Source Data Fig. 5

PDB files and validation reports of X-ray structures, including those shown in this figure.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Zhang, L., Wang, W. et al. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat Comput Sci 2, 451–462 (2022). https://doi.org/10.1038/s43588-022-00273-6

Received: 27 December 2021
Accepted: 07 June 2022
Published: 21 July 2022
Issue Date: July 2022
DOI: https://doi.org/10.1038/s43588-022-00273-6