  • Review Article

Sparks of function by de novo protein design

Abstract

Information in proteins flows from sequence to structure to function, with each step causally driven by the preceding one. Protein design is founded on inverting this process: specify a desired function, design a structure executing this function, and find a sequence that folds into this structure. This ‘central dogma’ underlies nearly all de novo protein-design efforts. Our ability to accomplish these tasks depends on our understanding of protein folding and function and our ability to capture this understanding in computational methods. In recent years, deep learning-derived approaches for efficient and accurate structure modeling and enrichment of successful designs have enabled progression beyond the design of protein structures and towards the design of functional proteins. We examine these advances in the broader context of classical de novo protein design and consider implications for future challenges, including fundamental capabilities such as sequence and structure co-design and conformational control considering flexibility, and functional objectives such as antibody and enzyme design.

Fig. 1: The central dogma of de novo protein design.
Fig. 2: Defining functional motifs in protein design.
Fig. 3: Controlling protein structure to scaffold functional elements.
Fig. 4: Hierarchical nature of diffusion models.
Fig. 5: Designing sequence to specify structure.
Fig. 6: Examples of functional de novo design.

References

  1. Chothia, C. Principles that determine the structure of proteins. Annu. Rev. Biochem. 53, 537–572 (1984).

  2. Korendovych, I. V. & DeGrado, W. F. De novo protein design, a retrospective. Q. Rev. Biophys. 53, e3 (2020).

  3. Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).

  4. Baker, D. What has de novo protein design taught us about protein folding and biophysics? Protein Sci. 28, 678–683 (2019).

  5. DeGrado, W. F., Summa, C. M., Pavone, V., Nastri, F. & Lombardi, A. De novo design and structural characterization of proteins and metalloproteins. Annu. Rev. Biochem. 68, 779–819 (1999).

  6. Regan, L. & DeGrado, W. F. Characterization of a helical protein designed from first principles. Science 241, 976–978 (1988).

  7. Harbury, P. B., Plecs, J. J., Tidor, B., Alber, T. & Kim, P. S. High-resolution protein design with backbone freedom. Science 282, 1462–1467 (1998).

  8. Dahiyat, B. I. & Mayo, S. L. De novo protein design: fully automated sequence selection. Science 278, 82–87 (1997).

  9. Dahiyat, B. I. & Mayo, S. L. Protein design automation. Protein Sci. 5, 895–903 (1996).

  10. Walsh, S. T. R., Cheng, H., Bryson, J. W., Roder, H. & DeGrado, W. F. Solution structure and dynamics of a de novo designed three-helix bundle protein. Proc. Natl Acad. Sci. USA 96, 5486–5491 (1999).

  11. Levinthal, C. Are there pathways for protein folding? J. Chim. Phys. 65, 44–45 (1968).

  12. Maynard Smith, J. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970).

  13. Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).

  14. Gront, D., Kulp, D. W., Vernon, R. M., Strauss, C. E. M. & Baker, D. Generalized fragment picking in Rosetta: design, protocols and applications. PLoS ONE 6, e23294 (2011).

  15. Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).

  16. Shapovalov, M. V. & Dunbrack, R. L. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure 19, 844–858 (2011).

  17. Leman, J. K. et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 17, 665–680 (2020).

  18. Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).

  19. Vorobieva, A. A. et al. De novo design of transmembrane β barrels. Science 371, eabc8182 (2021).

  20. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).

  21. Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Preprint at bioRxiv https://doi.org/10.1101/2023.10.09.561603 (2023).

  22. Sheffler, W. et al. Fast and versatile sequence-independent protein docking for nanomaterials design using RPXDock. PLoS Comput. Biol. 19, e1010680 (2023).

  23. Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).

  24. Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. In Proceedings of the 40th International Conference on Machine Learning (eds. Krause, A. et al.) Vol. 202, 20978–21002 (PMLR, 2023); https://proceedings.mlr.press/v202/lin23a.html

  25. Wu, K. E. et al. Protein structure generation via folding diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15611 (2022).

  26. Yim, J. et al. SE(3) diffusion model with application to protein backbone generation. In Proceedings of the 40th International Conference on Machine Learning (eds. Krause, A. et al.) Vol. 202, 40001–40039 (PMLR, 2023); https://proceedings.mlr.press/v202/yim23a.html

  27. Bose, J. A. et al. SE(3)-stochastic flow matching for protein backbone generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.02391 (2024).

  28. Yim, J. et al. Fast protein backbone generation with SE(3) flow matching. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.05297 (2023).

  29. Fu, C. et al. A latent diffusion model for protein structure generation. Preprint at arXiv https://doi.org/10.48550/arXiv.2305.04120 (2023).

  30. Liu, Y., Chen, L. & Liu, H. Diffusion in a quantized vector space generates non-idealized protein structures and predicts conformational distributions. Preprint at bioRxiv https://doi.org/10.1101/2023.11.18.567666 (2023).

  31. Strokach, A., Becerra, D., Corbi-Verge, C., Perez-Riba, A. & Kim, P. M. Fast and flexible protein design using deep graph neural networks. Cell Syst. 11, 402–411 (2020).

  32. Ingraham, J., Garg, V., Barzilay, R. & Jaakkola, T. Generative models for graph-based protein design. In Advances in Neural Information Processing Systems (eds. Wallach, H. et al.) Vol. 32. (Curran Associates, 2019); https://proceedings.neurips.cc/paper_files/paper/2019/file/f3a4ff4839c56a5f460c88cce3666a2b-Paper.pdf

  33. Gao, Z. et al. PiFold: toward effective and efficient protein inverse folding. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.12643 (2022).

  34. Yi, K. et al. Graph denoising diffusion for inverse protein folding. Preprint at arXiv https://doi.org/10.48550/arXiv.2306.16819 (2023).

  35. Hsu, C. et al. Learning inverse folding from millions of predicted structures. In Proceedings of the 39th International Conference on Machine Learning (eds. Chaudhuri, K. et al.) Vol. 162, 8946–8970 (PMLR, 2022); https://proceedings.mlr.press/v162/hsu22a.html

  36. Xiong, P. et al. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics 36, 136–144 (2020).

  37. Liu, Y. et al. Rotamer-free protein sequence design based on deep learning and self-consistency. Nat. Comput. Sci. 2, 451–462 (2022).

  38. Heinzinger, M. et al. ProstT5: bilingual language model for protein sequence and structure. Preprint at bioRxiv https://doi.org/10.1101/2023.07.23.550085 (2023).

  39. Su, J. et al. SaProt: protein language modeling with structure-aware vocabulary. Preprint at bioRxiv https://doi.org/10.1101/2023.10.01.560349 (2023).

  40. Gruver, N. et al. Protein design with guided discrete diffusion. Preprint at arXiv https://doi.org/10.48550/arXiv.2305.20009 (2023).

  41. Repecka, D. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat. Mach. Intell. 3, 324–333 (2021).

  42. Greener, J. G., Moffat, L. & Jones, D. T. Design of metalloproteins and novel protein folds using variational autoencoders. Sci. Rep. 8, 16189 (2018).

  43. Jin, W., Wohlwend, J., Barzilay, R. & Jaakkola, T. Iterative refinement graph neural network for antibody sequence–structure co-design. Preprint at arXiv https://doi.org/10.48550/arXiv.2110.04624 (2022).

  44. Martinkus, K. et al. AbDiffuser: full-atom generation of in-vitro functioning antibodies. Preprint at arXiv https://doi.org/10.48550/arXiv.2308.05027 (NeurIPS, 2023).

  45. Luo, S. et al. Antigen-specific antibody design and optimization with diffusion-based generative models for protein structures. In Advances in Neural Information Processing Systems (eds. Koyejo, S. et al.) Vol. 35, 9754–9767 (Curran Associates, 2022); https://proceedings.neurips.cc/paper_files/paper/2022/file/3fa7d76a0dc1179f1e98d1bc62403756-Paper-Conference.pdf

  46. Davison, J. Zero-shot learning in modern NLP. Joe Davison Blog joeddav.github.io/blog/2020/05/29/ZSL.html (2020).

  47. Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems (eds. Bengio, S. et al.) Vol. 31 (Curran Associates, 2018); https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf

  48. Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M. & Le, M. Flow matching for generative modeling. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.02747 (2023).

  49. Liu, X., Gong, C. & Liu, Q. Flow straight and fast: learning to generate and transfer data with rectified flow. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.03003 (2022).

  50. Albergo, M. S., Boffi, N. M. & Vanden-Eijnden, E. Stochastic interpolants: a unifying framework for flows and diffusions. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.08797 (2023).

  51. Somnath, V. R. et al. Aligned diffusion Schrödinger bridges. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.11419 (2023).

  52. Conte, L. L., Chothia, C. & Janin, J. The atomic structure of protein–protein recognition sites. J. Mol. Biol. 285, 2177–2198 (1999).

  53. Woolfson, D. N. A brief history of de novo protein design: minimal, rational, and computational. J. Mol. Biol. 433, 167160 (2021).

  54. Sesterhenn, F. et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science 368, eaay5051 (2020).

  55. Wang, J. et al. Scaffolding protein functional sites using deep learning. Science 377, 387–394 (2022).

  56. Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).

  57. Scott, A. J. et al. Constructing ion channels from water-soluble α-helical barrels. Nat. Chem. 13, 643–650 (2021).

  58. Mahendran, K. R. et al. A monodisperse transmembrane α-helical peptide barrel. Nat. Chem. 9, 411–419 (2017).

  59. Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).

  60. Cao, L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022).

  61. Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020).

  62. Eguchi, R. R. et al. Deep generative design of epitope-specific binding proteins by latent conformation optimization. Preprint at bioRxiv https://doi.org/10.1101/2022.12.22.521698 (2022).

  63. Glasscock, C. J. et al. Computational design of sequence-specific DNA-binding proteins. Preprint at bioRxiv https://doi.org/10.1101/2023.09.20.558720 (2023).

  64. Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).

  65. Gainza, P. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176–184 (2023).

  66. Torres, S. V. et al. De novo design of high-affinity binders of bioactive helical peptides. Nature https://doi.org/10.1038/s41586-023-06953-1 (2023).

  67. Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).

  68. Chu, A. E., Fernandez, D., Liu, J., Eguchi, R. R. & Huang, P.-S. De novo design of a highly stable ovoid TIM barrel: unlocking pocket shape towards functional design. Biodes. Res. 2022, 9842315 (2022).

  69. Huang, P.-S. et al. RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS ONE 6, e24109 (2011).

  70. Marcos, E. et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat. Struct. Mol. Biol. 25, 1028–1034 (2018).

  71. Huang, P.-S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol. 12, 29–34 (2016).

  72. Jiang, L. et al. De novo computational design of retro-aldol enzymes. Science 319, 1387–1391 (2008).

  73. Winnifrith, A., Outeiral, C. & Hie, B. Generative artificial intelligence for de novo protein design. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.09685 (2023).

  74. Huang, B. et al. A backbone-centred energy function of neural networks for protein design. Nature 602, 523–528 (2022).

  75. Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).

  76. Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).

  77. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  78. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  79. Baek, M. et al. Efficient and accurate prediction of protein structure using RoseTTAFold2. Preprint at bioRxiv https://doi.org/10.1101/2023.05.24.542179 (2023).

  80. Frank, C. et al. Efficient and scalable de novo protein design using a relaxed sequence space. Preprint at bioRxiv https://doi.org/10.1101/2023.02.24.529906 (2023).

  81. Tischer, D. et al. Design of proteins presenting discontinuous functional sites using deep learning. Preprint at bioRxiv https://doi.org/10.1101/2020.11.29.402743 (2020).

  82. Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).

  83. Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019).

  84. Radford, A., Metz, L. & Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. Preprint at arXiv https://doi.org/10.48550/arXiv.1511.06434 (2015).

  85. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at arXiv https://doi.org/10.48550/arXiv.1312.6114 (2013).

  86. Karras, T., Laine, S. & Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 4401–4410 (IEEE, 2018).

  87. Anand, N. & Huang, P. Generative modeling for protein structures. In Advances in Neural Information Processing Systems (eds. Bengio, S. et al.) Vol. 31 (Curran Associates, 2018); https://proceedings.neurips.cc/paper_files/paper/2018/file/afa299a4d1d8c52e75dd8a24c3ce534f-Paper.pdf

  88. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (eds. Larochelle, H. et al.) Vol. 33, 6840–6851 (Curran Associates, 2020); https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf

  89. Song, Y. et al. Score-based generative modeling through stochastic differential equations. Preprint at arXiv https://doi.org/10.48550/arXiv.2011.13456 (2021).

  90. Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems (eds. Ranzato, M. et al.) Vol. 34, 8780–8794 (Curran Associates, 2021); https://proceedings.neurips.cc/paper_files/paper/2021/file/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf

  91. Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.15019 (2022).

  92. Li, C. T. & Farnia, F. Mode-seeking divergences: theory and applications to GANs. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (eds. Ruiz, F., Dy, J. & van de Meent, J.-W.) Vol. 206, 8321–8350 (PMLR, 2023); https://proceedings.mlr.press/v206/ting-li23a.html

  93. Lee, J. S., Kim, J. & Kim, P. M. Score-based generative modeling for de novo protein design. Nat. Comput. Sci. 3, 382–392 (2023).

  94. Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

  95. Chu, A. E., Cheng, L., Nesr, G. E., Xu, M. & Huang, P.-S. An all-atom protein generative model. Preprint at bioRxiv https://doi.org/10.1101/2023.05.24.542194 (2023).

  96. Basanta, B. et al. An enumerative algorithm for de novo design of proteins with diverse pocket structures. Proc. Natl Acad. Sci. USA 117, 22135–22145 (2020).

  97. Mravic, M. et al. Packing of apolar side chains enables accurate design of highly stable membrane proteins. Science 363, 1418–1423 (2019).

  98. Sumida, K. H. et al. Improving protein expression, stability, and function with ProteinMPNN. Preprint at bioRxiv https://doi.org/10.1101/2023.10.03.560713 (2023).

  99. Koga, R. et al. Robust folding of a de novo designed ideal protein even with most of the core mutated to valine. Proc. Natl Acad. Sci. USA 117, 31149–31156 (2020).

  100. Norn, C. et al. Protein sequence design by conformational landscape optimization. Proc. Natl Acad. Sci. USA 118, e2017228118 (2021).

  101. Goverde, C. A., Wolf, B., Khakzad, H., Rosset, S. & Correia, B. E. De novo protein design by inversion of the AlphaFold structure prediction network. Protein Sci. 32, e4653 (2023).

  102. Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 746 (2022).

  103. Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).

  104. Yang, K. K., Zanichelli, N. & Yeh, H. Masked inverse folding with sequence transfer for protein representation learning. Protein Eng. Des. Sel. 36, gzad015 (2023).

  105. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).

  106. Jeliazkov, J. R., del Alamo, D. & Karpiak, J. D. ESMFold hallucinates native-like protein sequences. In NeurIPS Workshop on Machine Learning in Structural Biology. Preprint at bioRxiv https://doi.org/10.1101/2023.05.23.541774 (2023).

  107. Rettie, S. A. et al. Cyclic peptide structure prediction and design using AlphaFold. Preprint at bioRxiv https://doi.org/10.1101/2023.02.25.529956 (2023).

  108. Roney, J. P. & Ovchinnikov, S. State-of-the-art estimation of protein model accuracy using AlphaFold. Phys. Rev. Lett. 129, 238101 (2022).

  109. Gazizov, A., Lian, A., Goverde, C., Ovchinnikov, S. & Polizzi, N. F. AF2BIND: predicting ligand-binding sites using the pair representation of AlphaFold2. Preprint at bioRxiv https://doi.org/10.1101/2023.10.15.562410 (2023).

  110. Fleishman, S. J. & Baker, D. Role of the biomolecular energy gap in protein design, structure, and evolution. Cell 149, 262–273 (2012).

  111. Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).

  112. Nijkamp, E., Ruffolo, J., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978 (2023).

  113. Ferruz, N., Schmidt, S. & Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 13, 4348 (2022).

  114. Alamdari, S. et al. Protein generation with evolutionary diffusion: sequence is all you need. Preprint at bioRxiv https://doi.org/10.1101/2023.09.11.556673 (2023).

  115. Kamisetty, H., Ovchinnikov, S. & Baker, D. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era. Proc. Natl Acad. Sci. USA 110, 15674–15679 (2013).

  116. Rao, R. et al. Evaluating protein transfer learning with TAPE. In Advances in Neural Information Processing Systems (eds. Wallach, H. et al.) Vol. 32 (Curran Associates, 2019); https://proceedings.neurips.cc/paper_files/paper/2019/file/37f65c068b7723cd7809ee2d31d7861c-Paper.pdf

  117. Vig, J. et al. BERTology meets biology: interpreting attention in protein language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2006.15222 (2021).

  118. Bepler, T. & Berger, B. Learning the protein language: evolution, structure, and function. Cell Syst. 12, 654–669 (2021).

  119. Verkuil, R. et al. Language models generalize beyond natural proteins. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521521 (2022).

  120. Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500999 (2022).

  121. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  122. Hie, B. et al. A high-level programming language for generative protein design. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521526 (2022).

  123. Mackenzie, C. O., Zhou, J. & Grigoryan, G. Tertiary alphabet for the observable protein structural universe. Proc. Natl Acad. Sci. USA 113, E7438–E7447 (2016).

  124. Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822 (2018).

  125. Shin, J. E. et al. Protein design and variant prediction using autoregressive generative models. Nat. Commun. 12, 2403 (2021).

  126. Brookes, D., Park, H. & Listgarten, J. Conditioning by adaptive sampling for robust design. In Proceedings of the 36th International Conference on Machine Learning (eds. Chaudhuri, K. & Salakhutdinov, R.) Vol. 97, 773–782 (PMLR, 2019); https://proceedings.mlr.press/v97/brookes19a.html

  127. Lisanza, S. L. et al. Joint generation of protein sequence and structure with RoseTTAFold sequence space diffusion. Preprint at bioRxiv https://doi.org/10.1101/2023.05.08.539766 (2023).

  128. Langan, R. A. et al. De novo design of bioactive protein switches. Nature 572, 205–210 (2019).

  129. Praetorius, F. et al. Design of stimulus-responsive two-state hinge proteins. Science 381, 754–760 (2023).

  130. Wei, K. Y. et al. Computational design of closely related proteins that adopt two well-defined but structurally divergent folds. Proc. Natl Acad. Sci. USA 117, 7208–7215 (2020).

  131. St-Jacques, A. D. et al. Computational remodeling of an enzyme conformational landscape for altered substrate selectivity. Nat. Commun. 14, 6058 (2023).

  132. Pesce, F. et al. Design of intrinsically disordered protein variants with diverse structural properties. Preprint at bioRxiv https://doi.org/10.1101/2023.10.22.563461 (2023).

  133. Leaver-Fay, A., Jacak, R., Stranges, P. B. & Kuhlman, B. A generic program for multistate protein design. PLoS ONE 6, e20937 (2011).

  134. Wankowicz, S. A. et al. Uncovering protein ensembles: automated multiconformer model building for X-ray crystallography and cryo-EM. Preprint at bioRxiv https://doi.org/10.1101/2023.06.28.546963 (2023).

  135. Kim, J., McFee, M., Fang, Q., Abdin, O. & Kim, P. M. Computational and artificial intelligence-based methods for antibody development. Trends Pharmacol. Sci. 44, 175–189 (2023).

  136. North, B., Lehmann, A. & Dunbrack, R. L. A new clustering of antibody CDR loop conformations. J. Mol. Biol. 406, 228–256 (2011).

  137. Raybould, M. I. et al. Five computational developability guidelines for therapeutic antibody profiling. Proc. Natl Acad. Sci. USA 116, 4025–4030 (2019).

  138. Lipsh-Sokolik, R. et al. Combinatorial assembly and design of enzymes. Science 379, 195–201 (2023).

  139. Yeh, A. H. W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).

  140. Jing, B. et al. EigenFold: generative protein structure prediction with diffusion models. Preprint at arXiv https://doi.org/10.48550/arXiv.2304.02198 (2023).

  141. Zheng, S. et al. Towards predicting equilibrium distributions for molecular systems with deep learning. Preprint at arXiv https://doi.org/10.48550/arXiv.2306.05445 (2023).

  142. Abdin, O. & Kim, P. M. PepFlow: direct conformational sampling from peptide energy landscapes through hypernetwork-conditioned diffusion. Preprint at bioRxiv https://doi.org/10.1101/2023.06.25.546443 (2023).

  143. Wallner, B. AFsample: improving multimer prediction with AlphaFold using massive sampling. Bioinformatics 39, btad573 (2023).

  144. Wayment-Steele, H. K. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature https://doi.org/10.1038/s41586-023-06832-9 (2023).

  145. Khakzad, H. et al. A new age in protein design empowered by deep learning. Cell Syst. 14, 925–939 (2023).

  146. Minami, S. et al. Exploration of novel αβ-protein folds through de novo design. Nat. Struct. Mol. Biol. 30, 1132–1140 (2023).

  147. Bonet, J. et al. Rosetta FunFolDes — a general framework for the computational design of functional proteins. PLoS Comput. Biol. 14, e1006623 (2018).

  148. Dieleman, S. Diffusion Models are Autoencoders https://sander.ai/2022/01/31/diffusion.html (2022).

  149. Boyken, S. E. et al. De novo design of tunable, pH-driven conformational changes. Science 364, 658–664 (2019).

  150. Bethel, N. P. et al. Precisely patterned nanofibres made from extendable protein multiplexes. Nat. Chem. 15, 1664–1671 (2023).

  151. Kurihara, K. et al. Crystal structure and activity of a de novo enzyme, ferric enterobactin esterase Syn-F4. Proc. Natl Acad. Sci. USA 120, e2218281120 (2023).

  152. Naudin, E. A. et al. Acyl transfer catalytic activity in de novo designed protein with N-terminus of α-helix as oxyanion-binding site. J. Am. Chem. Soc. 143, 3330–3339 (2021).

  153. Mulligan, V. K. et al. Computational design of mixed chirality peptide macrocycles with internal symmetry. Protein Sci. 29, 2433–2445 (2020).

Acknowledgements

We thank S. Ovchinnikov for feedback on the manuscript. For readers interested in more depth on physics-based modeling approaches, such as Rosetta, we recommend other reviews (refs. 2, 3, 17). We also recommend two recent reviews with related perspectives, focusing more on the details and impact of machine learning on protein engineering and design, especially on direct sequence modeling (refs. 73, 145). A.E.C. is supported by the NSF GRFP and the Merck SEEDS Program. T.L. is supported by a Stanford Graduate Fellowship. P.-S.H. is supported by the NIH (R01GM147893), the American Cancer Society (ACS 134055-IRG-218), the BASF CARA project and the Discovery Innovation Fund.

Author information

Contributions

Planning, figure production and writing of the Review: all authors. Reference curation: A.E.C. and T.L. Supplementary tables: A.E.C. and T.L.

Corresponding author

Correspondence to Po-Ssu Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Biotechnology thanks Philip Kim and Kevin Yang for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Tables 1 and 2

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chu, A.E., Lu, T. & Huang, PS. Sparks of function by de novo protein design. Nat Biotechnol 42, 203–215 (2024). https://doi.org/10.1038/s41587-024-02133-2

  • DOI: https://doi.org/10.1038/s41587-024-02133-2
