
  • Review Article
  • Published:

Opportunities and challenges in design and optimization of protein function

Abstract

The field of protein design has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application, but recent strategies that combine structure-based and sequence-based calculations, as well as machine learning tools, have dramatically improved protein engineering and design. In this Review, we discuss how these methods have enabled the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their increased reliability, computational design methods have been applied to improve therapeutics and enzymes for green chemistry and have generated vaccine antigens, antivirals and drug-delivery nano-vehicles. Moreover, the high success of design methods reflects an increased understanding of basic rules that govern the relationships among protein sequence, structure and function. However, de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging but necessary next step if we are to realize our objective of generating new-to-nature activities.


Fig. 1: Goals of protein design methodologies.
Fig. 2: Computational design of stable proteins.
Fig. 3: Design of enhanced protein activity.
Fig. 4: Design methods must navigate an astronomically large sequence space that is extremely sparse in functional proteins.
Fig. 5: Comparison of structural features in natural and de novo-designed proteins.
Fig. 6: Applications of de novo protein design.


References

  1. Arnold, F. H. Innovation by evolution: bringing new chemistry to life (Nobel Lecture). Angew. Chem. Int. Ed. Engl. 58, 14420–14426 (2019).

    Article  CAS  PubMed  Google Scholar 

  2. Winter, G. Harnessing evolution to make medicines (Nobel Lecture). Angew. Chem. Int. Ed. Engl. 58, 14438–14445 (2019).

    Article  CAS  PubMed  Google Scholar 

  3. Trudeau, D. L. & Tawfik, D. S. Protein engineers turned evolutionists-the quest for the optimal starting point. Curr. Opin. Biotechnol. 60, 46–52 (2019).

    Article  CAS  PubMed  Google Scholar 

  4. Packer, M. S. & Liu, D. R. Methods for the directed evolution of proteins. Nat. Rev. Genet. 16, 379–394 (2015).

    Article  CAS  PubMed  Google Scholar 

  5. Arnold, F. H. The nature of chemical innovation: new enzymes by evolution. Q. Rev. Biophys. 48, 404–410 (2015).

    Article  CAS  PubMed  Google Scholar 

  6. Arnold, F. H. Combinatorial and computational challenges for biocatalyst design. Nature 409, 253–257 (2001).

    Article  CAS  PubMed  Google Scholar 

  7. Tokuriki, N., Stricher, F., Serrano, L. & Tawfik, D. S. How protein stability and new functions trade off. PLoS Comput. Biol. 4, e1000002 (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  8. Tokuriki, N. et al. Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat. Commun. 3, 1257 (2012).

    Article  PubMed  Google Scholar 

  9. Goldsmith, M. et al. Overcoming an optimization plateau in the directed evolution of highly efficient nerve agent bioscavengers. Protein Eng. Des. Sel. 30, 333–345 (2017).

    Article  CAS  PubMed  Google Scholar 

  10. Fleishman, S. J. & Baker, D. Role of the biomolecular energy gap in protein design, structure, and evolution. Cell 149, 262–273 (2012).

    Article  CAS  PubMed  Google Scholar 

  11. Stranges, P. B. & Kuhlman, B. A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds. Protein Sci. 22, 74–82 (2013).

    Article  CAS  PubMed  Google Scholar 

  12. Baker, D. What has de novo protein design taught us about protein folding and biophysics? Protein Sci. 28, 678–683 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Khare, S. D. & Fleishman, S. J. Emerging themes in the computational design of novel enzymes and protein-protein interfaces. FEBS Lett. 587, 1147–1154 (2013).

    Article  CAS  PubMed  Google Scholar 

  14. Baker, D. An exciting but challenging road ahead for computational enzyme design. Protein Sci. 19, 1817–1819 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Baek, M. & Baker, D. Deep learning and protein structure modeling. Nat. Methods 19, 13–14 (2022).

    Article  CAS  PubMed  Google Scholar 

  16. Pan, X. & Kortemme, T. Recent advances in de novo protein design: principles, methods, and applications. J. Biol. Chem. 296, 100558 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Korendovych, I. V. & DeGrado, W. F. De novo protein design, a retrospective. Q. Rev. Biophys. 53, e3 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Woolfson, D. N. A brief history of de novo protein design: minimal, rational, and computational. J. Mol. Biol. 433, 167160 (2021).

    Article  CAS  PubMed  Google Scholar 

  19. Kortemme, T. De novo protein design — from new structures to programmable functions. Cell 187, 526–544 (2024).

    Article  CAS  PubMed  Google Scholar 

  20. Yue, K. & Dill, K. A. Inverse protein folding problem: designing polymer sequences. Proc. Natl Acad. Sci. USA 89, 4163–4167 (1992).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Bowie, J. U., Lüthy, R. & Eisenberg, D. A method to identify protein sequences that fold into a known three-dimensional structure. Science 253, 164–170 (1991).

    Article  CAS  PubMed  Google Scholar 

  22. Weinstein, J., Khersonsky, O. & Fleishman, S. J. Practically useful protein-design methods combining phylogenetic and atomistic calculations. Curr. Opin. Struct. Biol. 63, 58–64 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Huang, P.-S., Boyken, S. E. & Baker, D. The coming of age of de novo protein design. Nature 537, 320–327 (2016).

    Article  CAS  PubMed  Google Scholar 

  24. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023). Applying diffusion models to backbone generation yields large de novo-designed proteins and assemblies. Available as a Colab notebook.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).

    Article  CAS  PubMed  Google Scholar 

  26. Chevalier, A. et al. Massively parallel de novo protein design for targeted therapeutics. Nature 550, 74–79 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Cao, L. et al. Design of protein-binding proteins from the target structure alone. Nature 605, 551–560 (2022). Repertoires of miniprotein binders for 12 different antigens are designed based solely on the structure of the target antigen site.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Bershtein, S., Segal, M., Bekerman, R., Tokuriki, N. & Tawfik, D. S. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444, 929–932 (2006).

    Article  CAS  PubMed  Google Scholar 

  29. Zhao, H. & Arnold, F. H. Directed evolution converts subtilisin E into a functional equivalent of thermitase. Protein Eng. 12, 47–53 (1999).

    Article  CAS  PubMed  Google Scholar 

  30. Anfinsen, C. B. Principles that govern the folding of protein chains. Science 181, 223–230 (1973).

    Article  CAS  PubMed  Google Scholar 

  31. Levinthal, C. Are there pathways for protein folding? J. Chim. Phys. 65, 44–45 (1968).

    Article  Google Scholar 

  32. Dill, K. A. Polymer principles and protein folding. Protein Sci. 8, 1166–1180 (1999).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Brocchieri, L. & Karlin, S. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res. 33, 3390–3400 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Johansson, K. E. et al. Computational redesign of thioredoxin is hypersensitive toward minor conformational changes in the backbone template. J. Mol. Biol. 428, 4361–4377 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Cherny, I. et al. Engineering V-type nerve agents detoxifying enzymes using computationally focused libraries. ACS Chem. Biol. 8, 2394–2403 (2013).

    Article  CAS  PubMed  Google Scholar 

  36. Baran, D. et al. Principles for computational design of binding antibodies. Proc. Natl Acad. Sci. USA 114, 10900–10905 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Murphy, P. M., Bolduc, J. M., Gallaher, J. L., Stoddard, B. L. & Baker, D. Alteration of enzyme specificity by computational loop remodeling and design. Proc. Natl Acad. Sci. USA 106, 9215–9220 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Fleishman, S. J. et al. Computational design of proteins targeting the conserved stem region of influenza hemagglutinin. Science 332, 816–821 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Whitehead, T. A. et al. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat. Biotechnol. 30, 543–548 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Goldenzweig, A. & Fleishman, S. J. Principles of protein stability and their application in computational design. Annu. Rev. Biochem. 87, 105–129 (2018).

    Article  CAS  PubMed  Google Scholar 

  41. Khersonsky, O. & Fleishman, S. J. Why reinvent the wheel? Building new proteins based on ready-made parts. Protein Sci. 25, 1179–1187 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Goldenzweig, A. et al. Automated structure- and sequence-based design of proteins for high bacterial expression and stability. Mol. Cell 63, 337–346 (2016). Combining phylogenetic analysis with atomistic design calculations improves expression and stability of diverse proteins. Available as a web server.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Khersonsky, O. et al. Automated design of efficient and functionally diverse enzyme repertoires. Mol. Cell 72, 178–186.e5 (2018). An evolution-guided atomistic design method enhances enzyme activity levels. Available as a web server.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Hanning, K. R., Minot, M., Warrender, A. K., Kelton, W. & Reddy, S. T. Deep mutational scanning for therapeutic antibody engineering. Trends Pharmacol. Sci. 43, 123–135 (2022).

    Article  CAS  PubMed  Google Scholar 

  45. Fox, R. J. et al. Improving catalytic function by ProSAR-driven enzyme evolution. Nat. Biotechnol. 25, 338–344 (2007).

    Article  CAS  PubMed  Google Scholar 

  46. Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).

    Article  CAS  PubMed  Google Scholar 

  47. Taft, J. M. et al. Deep mutational learning predicts ACE2 binding and antibody escape to combinatorial mutations in the SARS-CoV-2 receptor-binding domain. Cell 185, 4008–4022.e14 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Bedbrook, C. N., Yang, K. K., Rice, A. J., Gradinaru, V. & Arnold, F. H. Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization. PLoS Comput. Biol. 13, e1005786 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  49. Balchin, D., Hayer-Hartl, M. & Hartl, F. U. In vivo aspects of protein folding and quality control. Science 353, aac4354 (2016).

    Article  PubMed  Google Scholar 

  50. McLendon, G. & Radany, E. Is protein turnover thermodynamically controlled? J. Biol. Chem. 253, 6335–6337 (1978).

    Article  CAS  PubMed  Google Scholar 

  51. Kwon, W. S., Da Silva, N. A. & Kellis, J. T. Jr. Relationship between thermal stability, degradation rate and expression yield of barnase variants in the periplasm of Escherichia coli. Protein Eng. 9, 1197–1202 (1996).

    Article  CAS  PubMed  Google Scholar 

  52. Parsell, D. A. & Sauer, R. T. The structural stability of a protein is an important determinant of its proteolytic susceptibility in Escherichia coli. J. Biol. Chem. 264, 7590–7595 (1989).

    Article  CAS  PubMed  Google Scholar 

  53. Shusta, E. V., Kieke, M. C., Parke, E., Kranz, D. M. & Wittrup, K. D. Yeast polypeptide fusion surface display levels predict thermal stability and soluble secretion efficiency. J. Mol. Biol. 292, 949–956 (1999).

    Article  CAS  PubMed  Google Scholar 

  54. Christendat, D. et al. Structural proteomics: prospects for high throughput sample preparation. Prog. Biophys. Mol. Biol. 73, 339–345 (2000).

    Article  CAS  PubMed  Google Scholar 

  55. Mehlin, C. et al. Heterologous expression of proteins from Plasmodium falciparum: results from 1000 genes. Mol. Biochem. Parasitol. 148, 144–160 (2006).

    Article  CAS  PubMed  Google Scholar 

  56. Klenk, C., Ehrenmann, J., Schütz, M. & Plückthun, A. A generic selection system for improved expression and thermostability of G protein-coupled receptors by directed evolution. Sci. Rep. 6, 21294 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Andréll, J. & Tate, C. G. Overexpression of membrane proteins in mammalian cells for structural studies. Mol. Membr. Biol. 30, 52–63 (2013).

    Article  PubMed  Google Scholar 

  58. Bloom, J. D., Labthavikul, S. T., Otey, C. R. & Arnold, F. H. Protein stability promotes evolvability. Proc. Natl Acad. Sci. USA 103, 5869–5874 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Rosace, A. et al. Automated optimisation of solubility and conformational stability of antibodies and proteins. Nat. Commun. 14, 1937 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Wijma, H. J., Fürst, M. J. L. J. & Janssen, D. B. A computational library design protocol for rapid improvement of protein stability: FRESCO. Methods Mol. Biol. 1685, 69–85 (2018).

    Article  CAS  PubMed  Google Scholar 

  61. Musil, M. et al. FireProt: web server for automated design of thermostable proteins. Nucleic Acids Res. 45, W393–W399 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Campeotto, I. et al. One-step design of a stable variant of the malaria invasion protein RH5 for use as a vaccine immunogen. Proc. Natl Acad. Sci. USA 114, 998–1002 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Peleg, Y. et al. Community-wide experimental evaluation of the pross stability-design method. J. Mol. Biol. 433, 166964 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Pokorna, S. et al. Design of a stable human acid-β-glucosidase: towards improved Gaucher disease therapy and mutation classification. FEBS J. 290, 3383–3399 (2023).

    Article  CAS  PubMed  Google Scholar 

  65. Borgert, S. R. et al. Moonlighting chaperone activity of the enzyme PqsE contributes to RhlR-controlled virulence of Pseudomonas aeruginosa. Nat. Commun. 13, 7402 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Barber-Zucker, S. et al. Stable and functionally diverse versatile peroxidases designed directly from sequences. J. Am. Chem. Soc. 144, 3564–3571 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Williams, J. A. et al. Structural and computational design of a SARS-CoV-2 spike antigen with improved expression and immunogenicity. Sci. Adv. 9, eadg0330 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Mao, G. et al. A sustainable approach for degradation and detoxification of malachite green by an engineered polyphenol oxidase at high temperature. J. Clean. Prod. 328, 129437 (2021).

    Article  CAS  Google Scholar 

  69. Lambert, A. R., Hallinan, J. P., Werther, R., Głów, D. & Stoddard, B. L. Optimization of protein thermostability and exploitation of recognition behavior to engineer altered protein–DNA recognition. Structure 28, 760–775.e8 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Khersonsky, O. et al. Stable mammalian serum albumins designed for bacterial expression. J. Mol. Biol. 435, 168191 (2023).

    Article  CAS  PubMed  Google Scholar 

  71. Sherkhanov, S. et al. Isobutanol production freed from biological limits using synthetic biochemistry. Nat. Commun. 11, 4292 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Allouche-Arnon, H. et al. Computationally designed dual-color MRI reporters for noninvasive imaging of transgene expression. Nat. Biotechnol. 40, 1143–1149 (2022).

    Article  CAS  PubMed  Google Scholar 

  73. Doble, M. V. et al. Engineering thermostability in artificial metalloenzymes to increase catalytic activity. ACS Catal. 11, 3620–3627 (2021).

    Article  CAS  Google Scholar 

  74. Hsieh, C.-L. et al. Stabilized coronavirus spike stem elicits a broadly protective antibody. Cell Rep. 37, 109929 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Higgins, M. K. Can we AlphaFold our way out of the next pandemic? J. Mol. Biol. 433, 167093 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Graham, B. S., Gilman, M. S. A. & McLellan, J. S. Structure-based vaccine antigen design. Annu. Rev. Med. 70, 91–104 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Hsieh, C.-L. & McLellan, J. S. Protein engineering responses to the COVID-19 pandemic. Curr. Opin. Struct. Biol. 74, 102385 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. U.S. National Library of Medicine. ClinicalTrials.gov https://clinicaltrials.gov/study/NCT05790889 (2023).

  79. Hettiaratchi, M. H. et al. Reengineering biocatalysts: computational redesign of chondroitinase ABC improves efficacy and stability. Sci. Adv. 6, eabc6378 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Rosenzweig, E. S. et al. Chondroitinase improves anatomical and functional outcomes after primate spinal cord injury. Nat. Neurosci. 22, 1269–1275 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Busch, S. A., Horn, K. P., Silver, D. J. & Silver, J. Overcoming macrophage-mediated axonal dieback following CNS injury. J. Neurosci. 29, 9967–9976 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Schueler-Furman, O., Wang, C., Bradley, P., Misura, K. & Baker, D. Progress in modeling of protein structures and interactions. Science 310, 638–642 (2005).

    Article  CAS  PubMed  Google Scholar 

  83. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Tennenhouse, A. et al. Computational optimization of antibody humanness and stability by systematic energy-based ranking. Nat. Biomed. Eng. 8, 30–44 (2023).

    Article  PubMed  Google Scholar 

  86. Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science https://doi.org/10.1126/science.adl2528 (2024).

  87. Abanades, B. et al. ImmuneBuilder: deep-learning models for predicting the structures of immune proteins. Commun. Biol. 6, 575 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Zelnik, I. D. et al. Computational design and molecular dynamics simulations suggest the mode of substrate binding in ceramide synthases. Nat. Commun. 14, 2330 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Weinstein, J. J. et al. One-shot design elevates functional expression levels of a voltage-gated potassium channel. Preprint at bioRxiv https://doi.org/10.1101/2022.12.28.522065 (2022).

  90. Schymkowitz, J. et al. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–W388 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Bednar, D. et al. FireProt: energy- and evolution-based computational design of thermostable multiple-point mutants. PLoS Comput. Biol. 11, e1004556 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  92. Marques, S. M., Planas-Iglesias, J. & Damborsky, J. Web-based tools for computational enzyme design. Curr. Opin. Struct. Biol. 69, 19–34 (2021).

    Article  CAS  PubMed  Google Scholar 

  93. Breen, M. S., Kemena, C., Vlasov, P. K., Notredame, C. & Kondrashov, F. A. Epistasis as the primary factor in molecular evolution. Nature 490, 535–538 (2012).

    Article  CAS  PubMed  Google Scholar 

  94. Weinreich, D. M., Watson, R. A. & Chao, L. Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174 (2005).

    CAS  PubMed  Google Scholar 

  95. Smith, J. M. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970).

    Article  CAS  PubMed  Google Scholar 

  96. Starr, T. N. & Thornton, J. W. Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Yang, G. et al. Higher-order epistasis shapes the fitness landscape of a xenobiotic-degrading enzyme. Nat. Chem. Biol. 15, 1120–1128 (2019).

    Article  CAS  PubMed  Google Scholar 

  98. Goldsmith, M. & Tawfik, D. S. Enzyme engineering: reaching the maximal catalytic efficiency peak. Curr. Opin. Struct. Biol. 47, 140–150 (2017).

    Article  CAS  PubMed  Google Scholar 

  99. Corbella, M., Pinto, G. P. & Kamerlin, S. C. L. Loop dynamics and the evolution of enzyme activity. Nat. Rev. Chem. 7, 536–547 (2023).

    Article  CAS  PubMed  Google Scholar 

  100. Sumbalova, L., Stourac, J., Martinek, T., Bednar, D. & Damborsky, J. HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information. Nucleic Acids Res. 46, W356–W362 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Stourac, J. et al. Caver Web 1.0: identification of tunnels and channels in proteins and analysis of ligand transport. Nucleic Acids Res. 47, W414–W422 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  102. Klaus, M., Buyachuihan, L. & Grininger, M. Ketosynthase domain constrains the design of polyketide synthases. ACS Chem. Biol. 15, 2422–2432 (2020).

    Article  CAS  PubMed  Google Scholar 

  103. Ospina, F. et al. Selective biocatalytic N-methylation of unsaturated heterocycles. Angew. Chem. Int. Ed. Engl. 61, e202213056 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Gomez de Santos, P. et al. Repertoire of computationally designed peroxygenases for enantiodivergent C–H oxyfunctionalization reactions. J. Am. Chem. Soc. 145, 3443–3453 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  105. Beltrán-Nogal, A. et al. Surfing the wave of oxyfunctionalization chemistry by engineering fungal unspecific peroxygenases. Curr. Opin. Struct. Biol. 73, 102342 (2022).

    Article  PubMed  Google Scholar 

  106. Warshel, A. Electrostatic origin of the catalytic power of enzymes and the role of preorganized active sites. J. Biol. Chem. 273, 27035–27038 (1998).

    Article  CAS  PubMed  Google Scholar 

  107. Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Tsuboyama, K. et al. Mega-scale experimental analysis of protein folding stability in biology and design. Nature 620, 434–444 (2023). More than a million miniproteins were designed and screened to learn the determinants of foldability and stability.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Lipsh-Sokolik, R. et al. Combinatorial assembly and design of enzymes. Science 379, 195–201 (2023).

    Article  CAS  PubMed  Google Scholar 

  110. Weinstein, J. Y. et al. Designed active-site library reveals thousands of functional GFP variants. Nat. Commun. 14, 2890 (2023). Millions of active-site variants were designed in the GFP active site and used to learn molecular determinants of activity.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Khersonsky, O. & Fleishman, S. J. What have we learned from design of function in large proteins? BioDesign Res. 2022, 9787581 (2022).

    Article  Google Scholar 

  112. Lambert, T. J. FPbase: a community-editable fluorescent protein database. Nat. Methods 16, 277–278 (2019).

    Article  CAS  PubMed  Google Scholar 

  113. Hoch, S. Y., Weinstein, J. Y., Netzer, R., Hakeny, K. & Fleishman, S. J. GGAssembler: economical design of gene libraries with precise control over mutations. Preprint at bioRxiv https://doi.org/10.1101/2023.05.18.541394 (2023).

  114. Povolotskaya, I. S. & Kondrashov, F. A. Sequence space and the ongoing expansion of the protein universe. Nature 465, 922–926 (2010).

    Article  CAS  PubMed  Google Scholar 

  115. Notin, P., Rollins, N., Gal, Y., Sander, C. & Marks, D. Machine learning for functional protein design. Nat. Biotechnol. 42, 216–228 (2024).

    Article  CAS  PubMed  Google Scholar 

  116. Ho, S. P. & DeGrado, W. F. Design of a 4-helix bundle protein: synthesis of peptides which self-associate into a helical protein. J. Am. Chem. Soc. 109, 6751–6758 (1987).

    Article  CAS  Google Scholar 

  117. Richardson, J. S. et al. Looking at proteins: representations, folding, packing, and design. Biophysical Society National Lecture, 1992. Biophys. J. 63, 1185–1209 (1992).

    CAS  PubMed  PubMed Central  Google Scholar 

  118. Broome, B. M. & Hecht, M. H. Nature disfavors sequences of alternating polar and non-polar amino acids: implications for amyloidogenesis. J. Mol. Biol. 296, 961–968 (2000).

    Article  CAS  PubMed  Google Scholar 

  119. Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 547–552 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Dahiyat, B. I. & Mayo, S. L. De novo protein design: fully automated sequence selection. Science 278, 82–87 (1997).

    Article  CAS  PubMed  Google Scholar 

  121. Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Marcos, E. et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat. Struct. Mol. Biol. 25, 1028–1034 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Shakhnovich, E. I. Protein design: a perspective from simple tractable models. Fold. Des. 3, R45–58 (1998).

    Article  CAS  PubMed  Google Scholar 

  125. McMillan, P. F., Clary, D. C. & Wolynes, P. G. Energy landscapes and solved protein-folding problems. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 363, 453–467 (2004).

    Google Scholar 

  126. Govindarajan, S. & Goldstein, R. A. Why are some proteins structures so common? Proc. Natl Acad. Sci. USA 93, 3341–3345 (1996).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  127. Helling, R. et al. The designability of protein structures. J. Mol. Graph. Model. 19, 157–167 (2001).

    Article  CAS  PubMed  Google Scholar 

  128. Tóth-Petróczy, A. & Tawfik, D. S. The robustness and innovability of protein folds. Curr. Opin. Struct. Biol. 26, 131–138 (2014).

    Article  PubMed  Google Scholar 

  129. Pierce, N. A. & Winfree, E. Protein design is NP-hard. Protein Eng. 15, 779–782 (2002).

    Article  CAS  PubMed  Google Scholar 

  130. Kuhlman, B. & Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 20, 681–697 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  131. Street, A. G. & Mayo, S. L. Computational protein design. Structure 7, R105–9 (1999).

    Article  CAS  PubMed  Google Scholar 

  132. Bhardwaj, G., Mulligan, V. K., Bahl, C. D. & Gilmore, J. M. Accurate de novo design of hyperstable constrained peptides. Nature 538, 329–335 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Pan, X. et al. Expanding the space of protein geometries by computational design of de novo fold families. Science 369, 1132–1136 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  134. Verkuil, R. et al. Language models generalize beyond natural proteins. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521521 (2022).

  135. Lisanza, S. L. et al. Joint generation of protein sequence and structure with RoseTTAFold sequence space diffusion. Preprint at bioRxiv https://doi.org/10.1101/2023.05.08.539766 (2023).

  136. Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022). An artificial-intelligence-based sequence design method improves design success rate relative to previous, physics-based methods. Available as a Colab notebook.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  137. Wicky, B. I. M. et al. Hallucinating symmetric protein assemblies. Science 378, 56–61 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  138. Huang, B. et al. A backbone-centred energy function of neural networks for protein design. Nature 602, 523–528 (2022).

    Article  CAS  PubMed  Google Scholar 

  139. Anand, N. et al. Protein sequence design with a learned potential. Nat. Commun. 13, 746 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  140. Harteveld, Z. et al. Deep sharpening of topological features for de novo protein design. OpenReview.net https://openreview.net/forum?id=DwN81YIXGQP (2022).

  141. Eguchi, R. R., Choe, C. A. & Huang, P.-S. Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput. Biol. 18, e1010271 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  142. Wang, J. et al. Scaffolding protein functional sites using deep learning. Science 377, 387–394 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  143. Kim, D. E. et al. De novo design of small beta barrel proteins. Proc. Natl Acad. Sci. USA 120, e2207974120 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  144. Goverde, C. A. et al. Computational design of soluble analogues of integral membrane protein structures. Preprint at bioRxiv https://doi.org/10.1101/2023.05.09.540044 (2023).

  145. Harteveld, Z. et al. Exploring “dark matter” protein folds using deep learning. Preprint at bioRxiv https://doi.org/10.1101/2023.08.30.555621 (2023).

  146. Huang, P.-S. et al. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat. Chem. Biol. 12, 29–34 (2016).

  147. Norn, C. et al. Protein sequence design by conformational landscape optimization. Proc. Natl Acad. Sci. USA 118, e2017228118 (2021).

  148. Lee, J. S., Kim, J. & Kim, P. M. Score-based generative modeling for de novo protein design. Nat. Comput. Sci. 3, 382–392 (2023).

  149. Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

  150. Yim, J. et al. Fast protein backbone generation with SE(3) flow matching. Preprint at https://doi.org/10.48550/arXiv.2310.05297 (2023).

  151. Sesterhenn, F. et al. De novo protein design enables the precise induction of RSV-neutralizing antibodies. Science 368, eaay5051 (2020).

  152. Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).

  153. Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020). Computational design of small-molecule binding sites using a precomputed, low-energy constellation of ligand and interacting amino acids.

  154. Marchand, A., Van Hall-Beauvais, A. K. & Correia, B. E. Computational design of novel protein-protein interactions — an overview on methodological approaches and applications. Curr. Opin. Struct. Biol. 74, 102370 (2022).

  155. Linsky, T. W. et al. De novo design of potent and resilient hACE2 decoys to neutralize SARS-CoV-2. Science 370, 1208–1214 (2020).

  156. Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).

  157. Gainza, P. et al. De novo design of protein interactions with learned surface fingerprints. Nature 617, 176–184 (2023). Designing binders of four target proteins using an artificial-intelligence-based strategy that predicts putative binding sites.

  158. Cao, L. et al. De novo design of picomolar SARS-CoV-2 miniprotein inhibitors. Science 370, 426–431 (2020).

  159. Strauch, E.-M. et al. Computational design of trimeric influenza-neutralizing proteins targeting the hemagglutinin receptor binding site. Nat. Biotechnol. 35, 667–671 (2017).

  160. Silva, D.-A. et al. De novo design of potent and selective mimics of IL-2 and IL-15. Nature 565, 186–191 (2019).

  161. Hafler, D. A. Cytokines and interventional immunology. Nat. Rev. Immunol. 7, 423 (2007).

  162. Correia, B. E. et al. Proof of principle for epitope-focused vaccine design. Nature 507, 201–206 (2014).

  163. Azoitei, M. L. et al. Computation-guided backbone grafting of a discontinuous motif onto a protein scaffold. Science 334, 373–376 (2011).

  164. Sesterhenn, F. et al. Boosting subdominant neutralizing antibody responses with a computationally designed epitope-focused immunogen. PLoS Biol. 17, e3000164 (2019).

  165. Jardine, J. G. et al. HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen. Science 351, 1458–1463 (2016).

  166. Marcandalli, J. et al. Induction of potent neutralizing antibody responses by a designed protein nanoparticle vaccine for respiratory syncytial virus. Cell 176, 1420–1431.e17 (2019).

  167. Kanekiyo, M. et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies. Nature 499, 102–106 (2013).

  168. Abbott, R. K. et al. Precursor frequency and affinity determine B cell competitive fitness in germinal centers, tested with germline-targeting HIV vaccine immunogens. Immunity 48, 133–146.e6 (2018).

  169. Arunachalam, P. S. et al. Adjuvanting a subunit COVID-19 vaccine to induce protective immunity. Nature 594, 253–258 (2021).

  170. Walls, A. C. et al. Elicitation of potent neutralizing antibody responses by designed protein nanoparticle vaccines for SARS-CoV-2. Cell 183, 1367–1382.e17 (2020).

  171. Griss, R. et al. Bioluminescent sensor proteins for point-of-care therapeutic drug monitoring. Nat. Chem. Biol. 10, 598–603 (2014).

  172. Dawson, W. M. et al. Differential sensing with arrays of de novo designed peptide assemblies. Nat. Commun. 14, 383 (2023).

  173. Lim, W. A. & June, C. H. The principles of engineering immune cells to treat cancer. Cell 168, 724–740 (2017).

  174. Giordano-Attianese, G. et al. Author Correction: A computationally designed chimeric antigen receptor provides a small-molecule safety switch for T-cell therapy. Nat. Biotechnol. 38, 503 (2020).

  175. Elazar, A. et al. De novo-designed transmembrane domains tune engineered receptor functions. eLife 11, e75660 (2022).

  176. Lajoie, M. J. et al. Designed protein logic to target cells with precise combinations of surface antigens. Science 369, 1637–1643 (2020).

  177. Mushegian, A. R. Are there 10³¹ virus particles on earth, or more, or fewer? J. Bacteriol. 202, e00052-20 (2020).

Acknowledgements

We thank A. Tennenhouse for critical reading. Work in the Fleishman lab was funded by the Volkswagen Foundation grant 9474, the Israel Science Foundation grant 1844, the European Research Council through a Consolidator Award grant 815379, the Dr. Barry Sherman Institute for Medicinal Chemistry, and a donation in memory of Sam Switzer. Work in the Correia lab was supported by the Swiss National Foundation, the National Center of Competence in Molecular Systems Engineering and Fondation Leenaards.

Author information

Contributions

D.L. and C.A.G. researched data for the article. All authors contributed substantially to discussion of the content, wrote the article, and reviewed and/or edited the manuscript before submission.

Corresponding authors

Correspondence to Bruno E. Correia or Sarel Jacob Fleishman.

Ethics declarations

Competing interests

S.J.F. and B.E.C. are named inventors on patents relating to methods and designs described in the manuscript and consult on the application of protein design methods.

Peer review

Peer review information

Nature Reviews Molecular Cell Biology thanks Haiyan Liu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Protein Data Bank: https://www.rcsb.org/

Glossary

Backbone designability

The ability of amino acid sequences to fold into the desired backbone. A backbone that many different sequences can fold into is highly designable.

Backbone generation

Generating a spatial arrangement of the protein backbone excluding the amino acid side chains.

Epistasis

Non-additive effects of combinatorial mutations; for instance, when mutations are tolerated in combination but not individually, or vice versa.
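As a rough illustration of this definition, pairwise epistasis is often quantified as the deviation of a double mutant from the sum of its single-mutant effects. The sketch below assumes the effects are expressed on an additive scale such as ΔΔG (kcal/mol, positive meaning destabilizing); the function name, the sign convention and the numbers are hypothetical and are not taken from the article.

```python
def pairwise_epistasis(effect_a: float, effect_b: float, effect_ab: float) -> float:
    """Deviation of the double mutant from the additive expectation.

    All effects are measured relative to the same wild-type reference and
    on an additive scale (assumed here to be ddG in kcal/mol).
    A result of zero means the two mutations combine additively.
    """
    return effect_ab - (effect_a + effect_b)


# Hypothetical values: each mutation alone is destabilizing,
# but the double mutant is slightly stabilizing.
ddg_a, ddg_b, ddg_ab = 2.0, 2.5, -0.5
epsilon = pairwise_epistasis(ddg_a, ddg_b, ddg_ab)
print(f"epistasis = {epsilon:+.1f} kcal/mol")
```

A strongly non-zero value, as in this made-up case, corresponds to mutations that are tolerated in combination but not individually.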

Fold design

Design of a protein backbone that shares no significant sequence homology with natural proteins. Sometimes denoted de novo design.

Function design

Implementing a new function into a protein scaffold.

Idealized topologies

Simplified geometric representation of protein structure, mostly comprising secondary structure elements connected by short linkers.

Negative design

Designing elements that destabilize undesired (for example, non-functional or aggregation-prone) structural states.

Physics-based methods

Computational methods that apply the laws of physics, usually in the form of force fields, to minimize the energy of protein structures and to analyse or design three-dimensional protein structures.

Positive design

Designing protein elements that improve the stability of a desired structural state.

Protein backbone

The protein mainchain of amino acids connected through covalent amide linkages. Also known as protein scaffold.

Protein optimization

Design with the goal of optimizing desired protein functional aspects such as thermodynamic and kinetic stabilities, production yields, catalytic efficiency, binding affinity, and specificity.

Protein switches

Proteins that toggle between several different conformations upon interacting with a specific molecule or environment.

Relative contact order

Represents the relative complexity of a protein fold. Computed as the extent to which amino acids that are far in the primary sequence are physically close in the 3D structure.
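A minimal sketch of how this quantity can be computed, using one widely used formulation: the mean sequence separation of contacting residue pairs, divided by the chain length. The representation of each residue by a single point (for example its Cα atom), the 8 Å cutoff and the function name are simplifying assumptions for illustration; published definitions typically use all heavy atoms and a shorter cutoff.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def relative_contact_order(
    coords: Sequence[Coord],
    cutoff: float = 8.0,        # assumed contact distance, in angstroms
    min_separation: int = 1,    # smallest sequence separation counted
) -> float:
    """Mean sequence separation of contacting residues, divided by chain length.

    coords holds one representative point per residue, in sequence order.
    """
    n_res = len(coords)
    total_separation = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_separation += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0
    return total_separation / (n_res * n_contacts)


# Toy example: four residues on the corners of a 3.8-angstrom square, so the
# first and last residues are in contact although they are far apart in sequence.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(relative_contact_order(square, cutoff=4.0))
```

Folds dominated by local contacts, such as helical bundles, give low values, whereas folds with many contacts between sequence-distant residues give higher values.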

Sequence space

The theoretical space of possible combinations of protein sequence changes. This space is often too large for experimental or computational enumeration, and design methods must find ways to restrict and sample it efficiently.
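To give a sense of scale, the arithmetic can be worked through directly. The sketch below assumes the 20 canonical amino acids at every position; the restricted example, in which ten positions are each limited to three allowed identities, is hypothetical and only illustrates why restricting the space makes enumeration feasible.

```python
import math
from typing import Iterable

N_AMINO_ACIDS = 20  # canonical amino acids (assumption)


def full_sequence_space(length: int) -> int:
    """Number of possible sequences of a given length."""
    return N_AMINO_ACIDS ** length


def restricted_sequence_space(allowed_per_position: Iterable[int]) -> int:
    """Number of sequences when each designed position has a limited set of options."""
    return math.prod(allowed_per_position)


full = full_sequence_space(100)
restricted = restricted_sequence_space([3] * 10)

print(f"100-residue protein: about 10^{math.log10(full):.0f} sequences")
print(f"10 positions x 3 options: {restricted} sequences")
```

The unrestricted space for even a small protein cannot be enumerated, whereas the restricted space in this toy case contains only tens of thousands of sequences.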

Stability design

Design with the goal of improving protein thermodynamic and kinetic stabilities.

Structure-based design

Design based on computed or experimentally determined molecular structures using physical principles.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Listov, D., Goverde, C.A., Correia, B.E. et al. Opportunities and challenges in design and optimization of protein function. Nat Rev Mol Cell Biol (2024). https://doi.org/10.1038/s41580-024-00718-y
