Advances in protein structure prediction and design

Abstract

The prediction of protein three-dimensional structure from amino acid sequence has been a grand challenge problem in computational biophysics for decades, owing to its intrinsic scientific interest and also to the many potential applications for robust protein structure prediction algorithms, from genome interpretation to protein function prediction. More recently, the inverse problem — designing an amino acid sequence that will fold into a specified three-dimensional structure — has attracted growing attention as a potential route to the rational engineering of proteins with functions useful in biotechnology and medicine. Methods for the prediction and design of protein structures have advanced dramatically in the past decade. Increases in computing power and the rapid growth in protein sequence and structure databases have fuelled the development of new data-intensive and computationally demanding approaches for structure prediction. New algorithms for designing protein folds and protein–protein interfaces have been used to engineer novel high-order assemblies and to design from scratch fluorescent proteins with novel or enhanced properties, as well as signalling proteins with therapeutic potential. In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have enabled.

Fig. 1: Protein-folding landscapes and energies.
Fig. 2: Key steps in template-free structure prediction.
Fig. 3: Overview of the protein design process.
Fig. 4: Using computational design to create proteins that have valuable applications in research and medicine.


Acknowledgements

The authors apologize to the scientists whose important work could not be cited in this Review owing to space constraints. This work was supported by NIH grants R01GM117968 and R35GM131923 to B.K. and R01GM121487 and R01GM123378 to P.B.

Author information

Both authors contributed equally to all aspects of the article.

Correspondence to Brian Kuhlman or Philip Bradley.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Molecular Cell Biology thanks W. DeGrado, T. O. Yeates and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Related links

CASP protein structure prediction experiment: http://predictioncenter.org/casp13

DeepContact: https://github.com/largelymfs/deepcontact

DeepCov: https://github.com/psipred/DeepCov

EVfold: http://evfold.org/evfold-web/evfold.do

RaptorX-Contact: http://raptorx.uchicago.edu/ContactMap/

TripletRes: https://zhanglab.ccmb.med.umich.edu/TripletRes/

OSPREY: https://www2.cs.duke.edu/donaldlab/osprey.php

PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/

Rosetta: https://www.rosettacommons.org/

Glossary

Protein energy functions

Functions that correspond to a mathematical model of the molecular forces that determine protein structures and interactions. The choice of an energy function defines a map from structures onto energy values, referred to as an energy landscape, which can guide structure prediction and design simulations. Typical protein energy functions are linear combinations of multiple terms, each term capturing a distinct energetic contribution (van der Waals interactions, electrostatics, desolvation), with the weights and atomic parameters for these terms chosen by a parameterization procedure that seeks to optimize the agreement between the quantities predicted from the energy function and the corresponding values derived from experiments or from quantum chemistry calculations on small chemical systems.
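To make the "linear combination of multiple terms" concrete, the following is a minimal Python sketch rather than the energy function of any particular package. The atom representation, the function names, the Lennard-Jones parameters and the dielectric constant are all illustrative assumptions, and the desolvation term is omitted for brevity.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical atom record: (x, y, z, partial_charge), coordinates in angstroms.
Atom = Tuple[float, float, float, float]


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between the positions of two atoms."""
    return math.dist(a[:3], b[:3])


def van_der_waals(atoms: List[Atom], epsilon: float = 0.1, sigma: float = 3.5) -> float:
    """Lennard-Jones 12-6 energy over all atom pairs (illustrative parameters)."""
    total = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            x = (sigma / distance(atoms[i], atoms[j])) ** 6
            total += 4.0 * epsilon * (x * x - x)
    return total


def electrostatics(atoms: List[Atom], dielectric: float = 80.0) -> float:
    """Coulomb energy over all atom pairs.

    The factor 332.06 (approximate) converts e^2/angstrom to kcal/mol.
    """
    total = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            r = distance(atoms[i], atoms[j])
            total += 332.06 * atoms[i][3] * atoms[j][3] / (dielectric * r)
    return total


def total_energy(atoms: List[Atom], weights: Dict[str, float]) -> float:
    """Weighted sum of the individual energy terms."""
    terms = {"vdw": van_der_waals(atoms), "elec": electrostatics(atoms)}
    return sum(weights[name] * value for name, value in terms.items())
```

In a real parameterization, the entries of `weights` and the per-atom parameters would be fitted against experimental or quantum chemistry data, as described above; here they are simply passed in by the caller.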

Deep learning

A form of machine learning that employs artificial neural networks with many internal-processing layers to recognize patterns in large and complex datasets, such as visual images and written and spoken language.

Van der Waals interactions

Inter-atomic or inter-molecular interactions that are individually weak (much weaker than covalent or ionic bonds) and relatively short-ranged.

Rotamers

A discrete set of conformations frequently adopted by amino acid side chains.

Degrees of freedom

The free parameters in a system that determine its structure and, hence, its energy. They can be continuous, such as a real-valued backbone torsion angle or atomic position, or discrete (permitting only a finite number of alternatives). Owing to strong torsional preferences, side-chain conformations can be successfully modelled using a discrete set of rotamers, identified by analysis of the structural database.

Phi angle

A torsion angle (or dihedral angle) that describes rotation about the bond that connects the backbone nitrogen and the backbone Cα carbon of an amino acid in a polypeptide chain. It is one of the two primary degrees of freedom (along with the psi angle) per amino acid residue that proteins use to adopt alternative conformations.

Psi angle

A torsion angle (or dihedral angle) that describes rotation about the bond that connects the backbone Cα carbon and backbone carbonyl carbon of an amino acid in a polypeptide chain. It is one of the two primary degrees of freedom (along with the phi angle) per amino acid residue that proteins use to adopt alternative conformations.
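The phi and psi angles can both be computed from the coordinates of four consecutive backbone atoms. The sketch below uses a standard vector formula for a dihedral angle; the function names and the tuple-based coordinate representation are assumptions made for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the axis.
    d0, d2 = _dot(b0, axis), _dot(b2, axis)
    v = (b0[0] - d0 * axis[0], b0[1] - d0 * axis[1], b0[2] - d0 * axis[2])
    w = (b2[0] - d2 * axis[0], b2[1] - d2 * axis[1], b2[2] - d2 * axis[2])
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


def phi(prev_c, n, ca, c):
    """Phi: C(i-1), N(i), CA(i), C(i); rotation about the N-CA bond."""
    return dihedral(prev_c, n, ca, c)


def psi(n, ca, c, next_n):
    """Psi: N(i), CA(i), C(i), N(i+1); rotation about the CA-C bond."""
    return dihedral(n, ca, c, next_n)
```

The same `dihedral` function applies to side-chain chi angles when given the appropriate four side-chain atoms.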

Chi1 angle

A torsion angle (or dihedral angle) in amino acid side chains that is numbered on the basis of the proximity in chemical connectivity of the bond to the protein backbone. Chi1 refers to rotation about the bond closest to the backbone, chi2 is the next closest position and so on. Some amino acids, such as alanine and glycine, have no chi angles, while others, such as lysine and arginine, have up to four.

Rosetta

A software package for the prediction and design of protein structures and interactions that implements a wide range of backbone and side-chain conformational sampling algorithms and sequence optimization methods.

Dead-end elimination

An algorithm for side-chain rotamer optimization that functions by eliminating rotamers that cannot be part of the lowest-energy solution, which shrinks the search space without discarding the global optimum.
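As an illustration of how such an elimination can work, the sketch below applies one widely used pruning test, the Goldstein criterion: rotamer r at position i is removed if some alternative rotamer t at the same position gives a lower energy no matter which rotamers are chosen elsewhere. The data layout (`self_e[i][r]` for one-body energies, `pair[(i, j)][r][s]` with i < j for two-body energies) and the function names are assumptions made for this example.

```python
def pair_energy(pair, i, r, j, s):
    """Two-body energy between rotamer r at position i and rotamer s at position j."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]


def dead_end_elimination(self_e, pair):
    """Return, for each position, the set of rotamers that survive pruning."""
    n = len(self_e)
    alive = [set(range(len(energies))) for energies in self_e]
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Best case for r relative to t over all surviving neighbours.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j == i:
                            continue
                        gap += min(
                            pair_energy(pair, i, r, j, s) - pair_energy(pair, i, t, j, s)
                            for s in alive[j]
                        )
                    if gap > 0:  # r is always worse than t: dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Toy system: two positions with two rotamers each (made-up energies).
toy_self = [[0.0, 5.0], [0.0, 1.0]]
toy_pair = {(0, 1): [[0.0, 0.0], [-1.0, -1.0]]}
surviving = dead_end_elimination(toy_self, toy_pair)
```

In this toy system a single rotamer survives at each position, so no further search is needed; on realistic problems the pruned set is usually passed on to a follow-up search.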

Mean-field optimization

A protocol for designing sequences that assigns a probability to observing each amino acid at each sequence position in the protein and calculates an average (mean-field) energy for the protein based on the assigned probabilities. The probabilities are then adjusted to lower the mean-field energy of the protein.
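One common way to realize this idea is a self-consistent mean-field iteration, sketched below under stated assumptions: each position's probabilities are updated with Boltzmann weights computed from the average energy it experiences given the current probabilities at the other positions. The data layout, the damping scheme and the parameter values are illustrative choices, not a description of a specific published protocol.

```python
import math


def pair_energy(pair, i, a, j, b):
    """Two-body energy between choice a at position i and choice b at position j."""
    return pair[(i, j)][a][b] if i < j else pair[(j, i)][b][a]


def mean_field(self_e, pair, kT=0.6, n_iter=100, damping=0.5):
    """Return per-position probabilities over the allowed choices."""
    n = len(self_e)
    # Start from uniform probabilities at every position.
    probs = [[1.0 / len(e)] * len(e) for e in self_e]
    for _ in range(n_iter):
        new_probs = []
        for i in range(n):
            effective = []
            for a in range(len(self_e[i])):
                energy = self_e[i][a]
                for j in range(n):
                    if j != i:
                        # Average interaction with position j under its current probabilities.
                        energy += sum(
                            probs[j][b] * pair_energy(pair, i, a, j, b)
                            for b in range(len(self_e[j]))
                        )
                effective.append(energy)
            lowest = min(effective)
            weights = [math.exp(-(e - lowest) / kT) for e in effective]
            z = sum(weights)
            # Damped update: mix old and new probabilities to aid convergence.
            new_probs.append([
                damping * old + (1.0 - damping) * w / z
                for old, w in zip(probs[i], weights)
            ])
        probs = new_probs
    return probs


# Toy system: two positions with two choices each (made-up energies).
toy_self = [[0.0, 5.0], [0.0, 1.0]]
toy_pair = {(0, 1): [[0.0, 0.0], [-1.0, -1.0]]}
toy_probs = mean_field(toy_self, toy_pair)
```

A design sequence can then be read out by taking the most probable choice at each position.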

Simulated annealing

A probabilistic approach for identifying low-energy sequences that accepts or rejects sequence changes on the basis of the calculated change in the energy of the protein when a sequence change is made and the temperature of the modelled system. If a change lowers the energy of the protein, it is automatically accepted; if it raises the energy of the system, it is accepted with some probability that depends on how much the energy has increased (a bigger increase in energy is less likely to be accepted) and the current temperature (at higher temperatures it is more likely to accept changes that raise the energy of the system). The temperature is lowered as the simulation progresses, to identify low-energy sequences.
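The acceptance rule described here is commonly implemented with the Metropolis criterion, in which an energy increase ΔE is accepted with probability exp(−ΔE/T). The sketch below is a minimal illustration: `toy_energy`, the `TARGET` sequence, the exponential cooling schedule and all parameter values are hypothetical stand-ins for a real structure-based energy calculation.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKVLAAGI"  # hypothetical optimum used only by the toy energy below


def toy_energy(seq):
    """Stand-in energy: number of positions that differ from TARGET."""
    return sum(1 for a, b in zip(seq, TARGET) if a != b)


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(sequence, energy_fn, t_start=5.0, t_end=0.05, n_steps=5000, seed=0):
    """Return the lowest-energy sequence seen and its energy."""
    rng = random.Random(seed)
    current = list(sequence)
    e_current = energy_fn(current)
    best, e_best = current[:], e_current
    for step in range(n_steps):
        # Exponential cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)
        e_new = energy_fn(current)
        if metropolis_accept(e_new - e_current, temperature, rng):
            e_current = e_new
            if e_current < e_best:
                best, e_best = current[:], e_current
        else:
            current[pos] = old  # reject: restore the previous residue
    return "".join(best), e_best


start = "A" * len(TARGET)
designed, designed_energy = anneal(start, toy_energy)
```

Tracking the best sequence seen, rather than returning only the final state, is a design choice that guards against a late uphill move being the last one accepted.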

Genetic algorithms

A sequence optimization protocol that repeatedly modifies a population of sequences by applying rounds of energy-based selection. The energy of a sequence is calculated by modelling it in the desired protein conformation. Lower-energy sequences are more likely to progress to the next generation. Before each round of selection, the previous winning sequences are recombined with each other and small numbers of mutations are incorporated into the sequences, to search for lower-energy sequences.
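The cycle of recombination, mutation and energy-based selection can be sketched as follows. This is a toy illustration under stated assumptions: `toy_energy` and `TARGET` are hypothetical stand-ins for modelling each sequence in the desired conformation, and single-point crossover with binary tournament selection is only one of several possible choices.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKVLAAGI"  # hypothetical optimum used only by the toy energy below


def toy_energy(seq):
    """Stand-in energy: number of positions that differ from TARGET."""
    return sum(1 for a, b in zip(seq, TARGET) if a != b)


def evolve(energy_fn, length, pop_size=40, n_survivors=20,
           n_generations=60, mutation_rate=0.05, seed=0):
    """Return the lowest-energy sequence found and its energy."""
    rng = random.Random(seed)
    survivors = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                 for _ in range(n_survivors)]
    best = min(survivors, key=energy_fn)
    for _ in range(n_generations):
        # Recombine pairs of previous winners and add a few mutations.
        offspring = []
        for _ in range(pop_size):
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = mother[:cut] + father[cut:]
            child = "".join(
                rng.choice(AMINO_ACIDS) if rng.random() < mutation_rate else aa
                for aa in child
            )
            offspring.append(child)
        # Energy-based selection: the lower-energy member of each random pair advances.
        survivors = []
        for _ in range(n_survivors):
            a, b = rng.sample(offspring, 2)
            survivors.append(a if energy_fn(a) <= energy_fn(b) else b)
        champion = min(survivors, key=energy_fn)
        if energy_fn(champion) < energy_fn(best):
            best = champion
    return best, energy_fn(best)


evolved, evolved_energy = evolve(toy_energy, len(TARGET))
```

Tournament selection makes lower-energy sequences more likely, but not certain, to progress, which helps to preserve diversity in the population.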

Multi-state design algorithms

An approach to designing sequences that satisfy multiple constraints simultaneously. For instance, such algorithms can be used to find protein sequences that are predicted to bind ligand X but not ligand Y. Alternatively, they can be repurposed to find sequences that are simultaneously good at binding both ligand X and ligand Y.
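A simple way to express both use cases is a single objective that rewards low energy in the favoured states and penalizes low energy in the disfavoured states. The sketch below is illustrative only: the scoring form, the function names and the example energies are assumptions, and practical multi-state methods use more elaborate objectives.

```python
def multistate_score(energies, favoured, disfavoured=(), penalty_weight=1.0):
    """Lower is better.

    energies: mapping from state name to the energy of a sequence in that state.
    favoured: states the sequence should adopt or bind (low energy rewarded).
    disfavoured: states the sequence should avoid (low energy penalized).
    """
    reward = sum(energies[state] for state in favoured)
    penalty = sum(energies[state] for state in disfavoured)
    return reward - penalty_weight * penalty


def pick_best(candidates, favoured, disfavoured=()):
    """Return the candidate sequence name with the lowest multi-state score."""
    return min(
        candidates,
        key=lambda name: multistate_score(candidates[name], favoured, disfavoured),
    )


# Made-up binding energies of two candidate sequences to ligands X and Y.
candidates = {
    "seqA": {"X": -10.0, "Y": -9.0},
    "seqB": {"X": -8.0, "Y": -1.0},
}
specific_binder = pick_best(candidates, favoured=["X"], disfavoured=["Y"])
dual_binder = pick_best(candidates, favoured=["X", "Y"])
```

With these made-up numbers, the specificity objective prefers the sequence with the largest gap between X and Y, whereas the dual-binding objective prefers the sequence that binds both ligands well.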

Yeast surface display

An experimental approach for probing a large library of protein sequences (up to tens of millions) for binding to another molecule. In the final yeast library, each yeast cell contains the DNA for one member of the protein library, and this protein is expressed as a fusion protein that presents the protein on the outside of the cell. The cells are mixed with the target molecule, which has been labelled with a fluorescent dye, and then fluorescence-activated cell sorting is used to identify the cells that contain a designed protein that binds the target molecule. DNA sequencing is used to identify the designs that passed selection.

BCL-2 protein family

A family of structurally related proteins that interact with each other to induce or repress apoptosis.

DNA origami

A term describing approaches that use the high sequence specificity of DNA interactions to design DNA sequences that will fold into complex and predictable two- and three-dimensional shapes.

Major histocompatibility complexes (MHCs)

A set of cell surface proteins that bind to antigens from foreign pathogens and present them for recognition by other proteins and cells from the immune system. They are a key component of the acquired immune system.

About this article

Cite this article

Kuhlman, B., Bradley, P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol 20, 681–697 (2019). https://doi.org/10.1038/s41580-019-0163-x

Further reading