De novo design of a non-local β-sheet protein with high stability and accuracy


β-sheet proteins carry out critical functions in biology, and hence are attractive scaffolds for computational protein design. Despite this potential, de novo design of all-β-sheet proteins from first principles lags far behind the design of all-α or mixed-αβ domains owing to their non-local nature and the tendency of exposed β-strand edges to aggregate. Through study of loops connecting unpaired β-strands (β-arches), we have identified a series of structural relationships between loop geometry, side chain directionality and β-strand length that arise from hydrogen bonding and packing constraints on regular β-sheet structures. We use these rules to de novo design jellyroll structures with double-stranded β-helices formed by eight antiparallel β-strands. The nuclear magnetic resonance structure of a hyperthermostable design closely matched the computational model, demonstrating accurate control over the β-sheet structure and loop geometry. Our results open the door to the design of a broad range of non-local β-sheet protein structures.

Fig. 1: Constraints on β-arch geometry.
Fig. 2: Double-stranded β-helix topology specification.
Fig. 3: The NMR structure of BH_10 is nearly identical to the design model.

Data availability

NMR chemical shifts and NOESY cross-peak lists used to determine structures of BH_10 have been deposited in the BMRB with accession code 30495. Coordinates of the ten lowest-energy structures and the restraint lists have been deposited in the wwPDB as PDB 6E5C. The design model of BH_10 is available as Supplementary Dataset 1, and the loop dataset used to analyze the side chain patterns of naturally occurring β-arches is available in Supplementary Dataset 2. Other data are available from the corresponding authors upon reasonable request.


1. Kortemme, T., Ramírez-Alvarado, M. & Serrano, L. Design of a 20-amino acid, three-stranded β-sheet protein. Science 281, 253–256 (1998).
2. Searle, M. S. & Ciani, B. Design of β-sheet systems for understanding the thermodynamics and kinetics of protein folding. Curr. Opin. Struct. Biol. 14, 458–464 (2004).
3. Hughes, R. M. & Waters, M. L. Model systems for β-hairpins and β-sheets. Curr. Opin. Struct. Biol. 16, 514–524 (2006).
4. Marcos, E. & Adriano-Silva, D. Essentials of de novo protein design: methods and applications. WIREs Comput. Mol. Sci. 8, e1374 (2018).
5. Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
6. Hecht, M. H. De novo design of β-sheet proteins. Proc. Natl Acad. Sci. USA 91, 8729–8730 (1994).
7. Plaxco, K. W., Simons, K. T. & Baker, D. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994 (1998).
8. Quinn, T. P., Tweedy, N. B., Williams, R. W., Richardson, J. S. & Richardson, D. C. Betadoublet: de novo design, synthesis, and characterization of a β-sandwich protein. Proc. Natl Acad. Sci. USA 91, 8747–8751 (1994).
9. Nanda, V. et al. De novo design of a redox-active minimal rubredoxin mimic. J. Am. Chem. Soc. 127, 5804–5805 (2005).
10. Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).
11. Voet, A. R. D. et al. Computational design of a self-assembling symmetrical β-propeller protein. Proc. Natl Acad. Sci. USA 111, 15102–15107 (2014).
12. MacDonald, J. T. Synthetic β-solenoid proteins with the fragment-free computational design of a β-hairpin extension. Proc. Natl Acad. Sci. USA 113, 10346–10351 (2016).
13. Ottesen, J. J. & Imperiali, B. Design of a discretely folded mini-protein motif with predominantly β-structure. Nat. Struct. Biol. 8, 535–539 (2001).
14. Hu, X., Wang, H., Ke, H. & Kuhlman, B. Computer-based redesign of β sandwich protein suggests that extensive negative design is not required for de novo β sheet design. Structure 16, 1799–1805 (2008).
15. Hennetin, J., Jullian, B., Steven, A. C. & Kajava, A. V. Standard conformations of beta-arches in β-solenoid proteins. J. Mol. Biol. 358, 1094–1105 (2006).
16. Lin, Y.-R. et al. Control over overall shape and size in de novo designed proteins. Proc. Natl Acad. Sci. USA 112, E5478–E5485 (2015).
17. Kajava, A. V., Baxa, U. & Steven, A. C. β arcades: recurring motifs in naturally occurring and disease-related amyloid fibrils. FASEB J. 24, 1311–1319 (2010).
18. Kuhlman, B. & Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl Acad. Sci. USA 97, 10383–10388 (2000).
19. Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).
20. Richardson, J. S. & Richardson, D. C. Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation. Proc. Natl Acad. Sci. USA 99, 2754–2759 (2002).
21. Marcos, E. et al. Principles for designing proteins with cavities formed by curved β sheets. Science 355, 201–206 (2017).
22. Rohl, C. A., Strauss, C. E. M., Misura, K. M. S. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol. 383, 66–93 (2004).
23. Bradley, P. Toward high-resolution de novo structure prediction for small proteins. Science 309, 1868–1871 (2005).
24. Kuhn, M., Meiler, J. & Baker, D. Strand-loop-strand motifs: prediction of hairpins and diverging turns in proteins. Proteins 54, 282–288 (2004).
25. Bradley, P. & Baker, D. Improved β-protein structure prediction by multilevel optimization of nonlocal strand pairings and local backbone conformation. Proteins 65, 922–929 (2006).
26. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
27. Camacho, C. et al. BLAST: architecture and applications. BMC Bioinformatics 10, 421 (2009).
28. Evangelidis, T. et al. Automated NMR resonance assignments and structure determination using a minimal set of 4D spectra. Nat. Commun. 9, 384 (2018).
29. Holm, L. & Laakso, L. M. Dali server update. Nucleic Acids Res. 44, W351–W355 (2016).
30. Zimmermann, L. et al. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J. Mol. Biol. 430, 2237–2243 (2018).
31. Clark, P. Protein folding in the cell: reshaping the folding funnel. Trends Biochem. Sci. 29, 527–534 (2004).
32. Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).
33. Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
34. Fleishman, S. J. et al. RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLoS ONE 6, e20161 (2011).
35. O’Meara, M. J. et al. Combined covalent-electrostatic model of hydrogen bonding improves structure prediction with Rosetta. J. Chem. Theory Comput. 11, 609–622 (2015).
36. Bhardwaj, G. et al. Accurate de novo design of hyperstable constrained peptides. Nature 538, 329–335 (2016).
37. Sheffler, W. & Baker, D. RosettaHoles2: a volumetric packing measure for protein structure refinement and validation. Protein Sci. 19, 1991–1995 (2010).
38. Jones, D. T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202 (1999).
39. Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
40. Studier, F. W. Protein production by auto-induction in high density shaking cultures. Protein Expres. Purif. 41, 207–234 (2005).
41. Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).
42. Ying, J., Delaglio, F., Torchia, D. A. & Bax, A. Sparse multidimensional iterative lineshape-enhanced (SMILE) reconstruction of both non-uniformly sampled and conventional NMR data. J. Biomol. NMR 68, 101–118 (2017).
43. Lee, W., Tonelli, M. & Markley, J. L. Nmrfam-Sparky: enhanced software for biomolecular NMR spectroscopy. Bioinformatics 31, 1325–1327 (2015).
44. Nerli, S., McShan, A. C. & Sgourakis, N. G. Chemical shift-based methods in NMR structure determination. Prog. Nucl. Mag. Res. Sp 106–107, 1–25 (2018).
45. Lange, O. F. Automatic NOESY assignment in CS-RASREC-Rosetta. J. Biomol. NMR 59, 147–159 (2014).
46. Lange, O. F. & Baker, D. Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins 80, 884–895 (2012).
47. Berjanskii, M. V. & Wishart, D. S. Unraveling the meaning of chemical shifts in protein NMR. Biochim. Biophys. Acta 1865, 1564–1576 (2017).
48. Nilges, M. A calculation strategy for the structure determination of symmetric dimers by 1H NMR. Proteins 17, 297–309 (1993).
49. Nilges, M. Ambiguous distance data in the calculation of NMR structures. Fold Des. 2, S53–S57 (1997).
50. Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319, 209–227 (2002).
51. Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J. Biomol. NMR 56, 227–241 (2013).
52. Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12–21 (2010).
53. Costantini, S., Colonna, G. & Facchiano, A. M. ESBRI: a web server for evaluating salt bridges in proteins. Bioinformation 3, 137–138 (2008).
54. The PyMOL Molecular Graphics System, Version 1.7.2 (Schrödinger, LLC, 2016).


We thank S. Rettie for mass spectrometry assistance and the rest of the Baker laboratory members for discussion. We acknowledge computing resources provided by the Hyak supercomputer system funded by the STF at the University of Washington, and Rosetta@Home volunteers in ab initio structure prediction calculations. Work carried out at the Baker laboratory was supported by the Howard Hughes Medical Institute, Open Philanthropy, and the Defense Threat Reduction Agency. E.M. was supported by a Marie Curie International Outgoing Fellowship (FP7-PEOPLE-2011-IOF 298976). G.O. was supported by a Marie Curie International Outgoing Fellowship (FP7-PEOPLE-2012-IOF 332094). This research was financially supported by the Ministry of Education, Youths and Sports of the Czech Republic within the CEITEC 2020 (LQ1601) project, the Grant Agency of Masaryk University (to K.T.), an R35 Outstanding Investigator Award to N.G.S. through NIGMS (1R35GM125034-01), and the Office of the Director, NIH, under High End Instrumentation Grant S10OD018455, which funded the 800-MHz NMR spectrometer at UCSC. IRB Barcelona is the recipient of a Severo Ochoa Award of Excellence from the Ministry of Economy, Industry and Competitiveness (government of Spain).

Author information




E.M. designed the research, carried out the loop structural analysis, set up the design method and performed design calculations. T.M.C. carried out design calculations, protein expression, purification and CD experiments. A.C.M. collected 4D NMR data. T.E. performed 4D-CHAINS analysis. S.N. carried out AutoNOE-Rosetta calculations. L.C. expressed isotopically labeled proteins and performed SEC-MALS analysis. L.G.N. designed the research and carried out design calculations. A.D. and G.O. helped in protein expression and characterization. K.T. and N.G.S. supervised NMR structure determination. D.B. designed and supervised the research. E.M. and D.B. prepared the manuscript with input from all authors.

Corresponding authors

Correspondence to Enrique Marcos or David Baker.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Coarse-grained representation of the Ramachandran plot based on ABEGO torsion bins.

ABEGO torsion bins provide a convenient way to classify the backbone geometry of protein residues based on the Ramachandran plot region of their ϕ/ψ dihedrals. “A” corresponds to the right-handed α-helix, “B” to the extended region typical of β-strands, “E” to the extended region with a positive ϕ dihedral, “G” to a left-handed α-helix (mostly accessible by l-glycine). The “O” bin is assigned for the cis peptide conformation (torsion around the peptide bond with the preceding residue, Cα(i)-N(i)-C(i – 1)-Cα(i – 1), below 90°). Plotted data were collected from the residue torsional values of all β-arch loops from a non-redundant set of naturally occurring protein structures. Analysis of these loops revealed the sidechain pattern preferences of different loop ABEGO types, a critical feature for design.
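As a rough illustration of the binning described in this legend, the following Python sketch assigns an ABEGO letter to a residue from its ϕ, ψ and ω dihedrals. The function names and the numeric ψ boundaries (−75° and 50° separating A from B, −100° and 100° separating G from E) are assumptions borrowed from commonly used Rosetta-style definitions; the legend itself gives only the qualitative regions and the 90° cutoff for the cis peptide.

```python
def wrap_angle(deg):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def abego_bin(phi, psi, omega=180.0):
    """Return the ABEGO letter for one residue (angles in degrees).

    The psi boundaries below are assumed values, not taken from the legend.
    """
    phi, psi, omega = wrap_angle(phi), wrap_angle(psi), wrap_angle(omega)
    if abs(omega) < 90.0:      # cis peptide bond with the preceding residue
        return "O"
    if phi >= 0.0:             # positive-phi half of the Ramachandran plot
        return "G" if -100.0 <= psi < 100.0 else "E"
    return "A" if -75.0 <= psi < 50.0 else "B"


def abego_string(torsions):
    """Concatenate ABEGO letters for a list of (phi, psi, omega) tuples."""
    return "".join(abego_bin(phi, psi, omega) for phi, psi, omega in torsions)


# Hypothetical three-residue loop: extended, left-handed, helical.
loop = [(-120.0, 130.0, 180.0), (60.0, 40.0, 180.0), (-60.0, -45.0, 180.0)]
print(abego_string(loop))  # BGA
```

Strings such as the one printed here are the kind of loop descriptor that the ABEGO classification produces; the torsion values in the example are invented for illustration.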

Supplementary Figure 2 SEC-MALS analysis of BH_10 and BH_11 designs.

Both proteins are monodisperse and have estimated molecular weights in good agreement with the theoretical value of the monomers.

Supplementary Figure 3 Experimental characterization of the designed protein BH_11.

a, Cartoon representation of the design model. b, Calculated folding energy landscape. Each dot represents the lowest energy structure obtained from ab initio folding trajectories starting from an extended chain (red dots), biased forward folding trajectories (blue dots) or local relaxation of the designed structure (green dots); the x axis shows the Cα-root mean square deviation (RMSD) from the designed model; the y axis shows the Rosetta all-atom energy. c, Far-ultraviolet circular dichroism spectra (blue, 25 °C; red, 95 °C; green, 25 °C after cooling). d, 1H-15N HSQC spectra obtained at 37 °C at a 1H field of 800 MHz.
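The x axis of the energy landscape in panel b is a Cα RMSD. A minimal Python sketch of that quantity follows, under the assumption that the two coordinate sets have the same residue order and have already been optimally superimposed; the superposition step itself is not shown, and the function name `ca_rmsd` is hypothetical rather than part of Rosetta.

```python
import math


def ca_rmsd(model, reference):
    """Root mean square deviation between two lists of (x, y, z) Cα positions.

    Assumes equal length, identical residue order and prior superposition.
    """
    if not model or len(model) != len(reference):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Two toy Cα atoms, each displaced by 5 Å from its reference position.
print(ca_rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
              [(3.0, 4.0, 0.0), (3.0, 4.0, 0.0)]))  # 5.0
```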

Supplementary Figure 4 Validation of BH_10 backbone NMR assignments with TALOS-N analysis.

a, Sequence window displaying the BH_10 protein sequence with each residue colored according to classification in TALOS-N analysis (J. Biomol. NMR 56, 227–241, 2013). Green, good classification; yellow, ambiguous/no classification; red, bad classification (in the case of L66, its bad classification is likely due to its loop position); gray, no classification. b, Bottom, TALOS-N secondary structure index (SSI) derived from combined 1H, 15N and 13C chemical shifts for BH_10. α-helices are colored red and β-sheets are colored aqua. Top, the random coil index order parameter (RCI-S2) for BH_10. Analysis of these NMR backbone resonance assignments using TALOS-N shows excellent agreement between the chemical-shift-derived secondary structure index and the DSSP annotation of the de novo model.

Supplementary Figure 5 NOE patterns observed in β-arcade regions in the lowest-energy BH_10 structural model.

a,b, AutoNOE-Rosetta-assigned NOE contacts (red dashed lines) observed between β-arcade regions (a) and within β-arcade regions (b) in the lowest-energy structural model determined using RASREC-Rosetta. The sidechains of residues involved in NOE contacts are shown as sticks.

Supplementary Figure 6 Surface salt bridges of design BH_10.

Salt bridges of the computational model (top) that were supported by the NOE assignment in the NMR ensemble (bottom). Most of these salt bridges correspond to residues involved in the pairing between β-arcades 1 and 3 (E33:R64, E35:R62 and E78:R23). A salt bridge within β-arcade 3 is also well supported by the NOEs (E15:R62).

Supplementary Figure 7 Loop sequences and patterns of design BH_10.

a, Designed protein sequence and ABEGO strings of loops (β-strands, β-arches and the β-hairpin are colored in green, red and black, respectively). Critical residues determining the sidechain patterns of β-arches are in blue. b, All-atom stereo representation of the design with backbone hydrogen bonding and salt bridge interactions highlighted. Critical loop positions, such as prolines in β-arches or the central β-hairpin residues, are also indicated.

Supplementary Figure 8 Naturally occurring proteins with the most similar structures or sequences to design BH_10.

a, The three closest structural analogs identified with DALI (Nucleic Acids Res. 44, W351–W355, 2016) are homodimers. The sequence identity over structurally aligned regions ranged from 7 to 19%. b, The three protein domains most similar in sequence and with structure available, as identified with HHpred (J. Mol. Biol. 430, 2237–2243, 2018), are also homodimers or part of a larger structure. For these three proteins, the sequence identity with BH_10 was 21% and E-values ranged between 0.14 and 0.39. Each chain is colored differently. The natural proteins identified all exhibit more irregular secondary structures, longer loops and extra elements building protein interfaces.

Supplementary Figure 9 Sequence determinants of β-arch formation.

For the energy landscapes of different types of mutations, we calculated the frequency of formation of β-hairpins between every two consecutive β-strands. Increases of β-hairpin formation were correlated with a decrease of near-native sampling. a,d,g,i, Calculated energy landscapes for mutants assessing different types of interactions (black) are compared with the landscape of BH_10 (red). b,e,h, Effect of mutations on β-strand pairing (red, original design; black, mutant). Mutated loop connections are labeled with the corresponding amino acid substitution. Most of the mutations increase sampling of more local β-hairpin connections. Connection S4-S5 corresponds to the central β-hairpin of the β-helix. c, Sidechain packing interactions stabilizing β-arch loop connections that when mutated to alanine decrease β-arch stability and favor β-hairpin sampling. Mutant V18A favors hairpin sampling in the neighboring β-arch of the same β-arcade. f, Sidechain-backbone hydrogen-bonding interactions stabilizing β-arch loop geometry; upon removal by alanine substitutions, β-hairpin sampling increases. e, Mutations in the S6-S7 and S7-S8 connections favor sampling of β-hairpins between S6 and S8.
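The hairpin-frequency calculation mentioned in this legend can be sketched in Python as follows. The data layout is an assumption made for illustration: each decoy is represented as a set of paired strand indices, and a pairing between strands i and i + 1 is counted as a β-hairpin at that connection. The function name and the toy decoys are hypothetical and do not come from the study.

```python
def hairpin_frequencies(decoy_pairings, n_strands):
    """Fraction of decoys in which consecutive strands i and i+1 are paired.

    decoy_pairings: list of sets of (i, j) strand-index pairs, one set per decoy.
    Returns a dict keyed by connection label, for example "S4-S5".
    """
    if not decoy_pairings:
        raise ValueError("need at least one decoy")
    freqs = {}
    for i in range(1, n_strands):
        hits = sum(
            1 for pairs in decoy_pairings
            if (i, i + 1) in pairs or (i + 1, i) in pairs
        )
        freqs[f"S{i}-S{i + 1}"] = hits / len(decoy_pairings)
    return freqs


# Four invented decoys of an eight-stranded protein.
decoys = [
    {(4, 5), (1, 8)},
    {(4, 5), (6, 7)},
    {(4, 5)},
    {(1, 2), (4, 5)},
]
freqs = hairpin_frequencies(decoys, n_strands=8)
print(freqs["S4-S5"], freqs["S6-S7"])  # 1.0 0.25
```

Comparing such per-connection frequencies between a mutant and the original design is one way to express the shift toward local β-hairpin pairing that the legend describes.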

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9 and Supplementary Tables 1–5

Reporting Summary

Supplementary Dataset 1

β-arch loop dataset analyzed from naturally occurring protein structures

Supplementary Dataset 2

Atomic coordinates of the BH_10 computational design model

About this article

Cite this article

Marcos, E., Chidyausiku, T.M., McShan, A.C. et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat Struct Mol Biol 25, 1028–1034 (2018).
