β-sheet proteins carry out critical functions in biology, and hence are attractive scaffolds for computational protein design. Despite this potential, de novo design of all-β-sheet proteins from first principles lags far behind the design of all-α or mixed-αβ domains owing to their non-local nature and the tendency of exposed β-strand edges to aggregate. Through study of loops connecting unpaired β-strands (β-arches), we have identified a series of structural relationships between loop geometry, side chain directionality and β-strand length that arise from hydrogen bonding and packing constraints on regular β-sheet structures. We use these rules to de novo design jellyroll structures with double-stranded β-helices formed by eight antiparallel β-strands. The nuclear magnetic resonance structure of a hyperthermostable design closely matched the computational model, demonstrating accurate control over the β-sheet structure and loop geometry. Our results open the door to the design of a broad range of non-local β-sheet protein structures.

Data availability

NMR chemical shifts and NOESY cross-peak lists used to determine structures of BH_10 have been deposited in the BMRB with accession code 30495. Coordinates of the ten lowest-energy structures and the restraint lists have been deposited in the wwPDB as PDB 6E5C. The design model of BH_10 is available as Supplementary Dataset 1, and the loop dataset used to analyze the side chain patterns of naturally occurring β-arches is available in Supplementary Dataset 2. Other data are available from the corresponding authors upon reasonable request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Acknowledgements

We thank S. Rettie for mass spectrometry assistance and the rest of the Baker laboratory members for discussion. We acknowledge computing resources provided by the Hyak supercomputer system funded by the STF at the University of Washington, and Rosetta@Home volunteers in ab initio structure prediction calculations. Work carried out at the Baker laboratory was supported by the Howard Hughes Medical Institute, Open Philanthropy, and the Defense Threat Reduction Agency. E.M. was supported by a Marie Curie International Outgoing Fellowship (FP7-PEOPLE-2011-IOF 298976). G.O. was supported by a Marie Curie International Outgoing Fellowship (FP7-PEOPLE-2012-IOF 332094). This research was financially supported by the Ministry of Education, Youth and Sports of the Czech Republic within the CEITEC 2020 (LQ1601) project, the Grant Agency of Masaryk University (to K.T.), an R35 Outstanding Investigator Award to N.G.S. through NIGMS (1R35GM125034-01), and the Office of the Director, NIH, under High End Instrumentation Grant S10OD018455, which funded the 800-MHz NMR spectrometer at UCSC. IRB Barcelona is the recipient of a Severo Ochoa Award of Excellence from the Ministry of Economy, Industry and Competitiveness (government of Spain).

Author information

Author notes

    • Lucas G. Nivón

    Present address: Cyrus Biotechnology, Seattle, WA, USA

    • Audrey Davis

    Present address: Amazon, Seattle, WA, USA

    • Gustav Oberdorfer

    Present address: Institute of Biochemistry, Graz University of Technology, Graz, Austria

  1. These authors contributed equally: Enrique Marcos, Tamuka M. Chidyausiku.


Affiliations

  1. Department of Biochemistry, University of Washington, Seattle, WA, USA

    Enrique Marcos, Tamuka M. Chidyausiku, Lauren Carter, Lucas G. Nivón, Audrey Davis, Gustav Oberdorfer & David Baker

  2. Institute for Protein Design, University of Washington, Seattle, WA, USA

    Enrique Marcos, Tamuka M. Chidyausiku, Lauren Carter, Lucas G. Nivón, Audrey Davis, Gustav Oberdorfer & David Baker

  3. Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, Barcelona, Spain

    Enrique Marcos

  4. Department of Chemistry and Biochemistry, University of California, Santa Cruz, Santa Cruz, CA, USA

    Andrew C. McShan, Santrupti Nerli & Nikolaos G. Sgourakis

  5. CEITEC – Central European Institute of Technology, Masaryk University, Brno, Czech Republic

    Thomas Evangelidis & Konstantinos Tripsianes

  6. Department of Computer Science, University of California, Santa Cruz, Santa Cruz, CA, USA

    Santrupti Nerli



Contributions

E.M. designed the research, carried out the loop structural analysis, set up the design method and performed design calculations. T.M.C. carried out design calculations, protein expression, purification and CD experiments. A.C.M. collected 4D NMR data. T.E. performed 4D-CHAINS analysis. S.N. carried out AutoNOE-Rosetta calculations. L.C. expressed isotopically labeled proteins and performed SEC-MALS analysis. L.G.N. designed the research and carried out design calculations. A.D. and G.O. helped in protein expression and characterization. K.T. and N.G.S. supervised NMR structure determination. D.B. designed and supervised the research. E.M. and D.B. prepared the manuscript with input from all authors.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Enrique Marcos or David Baker.

Integrated supplementary information

  1. Supplementary Figure 1 Coarse-grained representation of the Ramachandran plot based on ABEGO torsion bins.

    ABEGO torsion bins provide a convenient way to classify the backbone geometry of protein residues based on the Ramachandran plot region of their ϕ/ψ dihedrals. “A” corresponds to the right-handed α-helix, “B” to the extended region typical of β-strands, “E” to the extended region with a positive ϕ dihedral and “G” to a left-handed α-helix (mostly accessible by glycine). The “O” bin is assigned for the cis peptide conformation (torsion around the peptide bond with the preceding residue, Cα(i)-N(i)-C(i – 1)-Cα(i – 1), below 90°). Plotted data were collected from the residue torsional values of all β-arch loops from a non-redundant set of naturally occurring protein structures. Analysis of these loops revealed the sidechain pattern preferences of different loop ABEGO types, a critical feature for design.

  2. Supplementary Figure 2 SEC-MALS analysis of BH_10 and BH_11 designs.

    Both proteins are monodisperse and have estimated molecular weights in good agreement with the theoretical value of the monomers.

  3. Supplementary Figure 3 Experimental characterization of the designed protein BH_11.

    a, Cartoon representation of the design model. b, Calculated folding energy landscape. Each dot represents the lowest energy structure obtained from ab initio folding trajectories starting from an extended chain (red dots), biased forward folding trajectories (blue dots) or local relaxation of the designed structure (green dots); the x axis shows the Cα-root mean square deviation (RMSD) from the designed model; the y axis shows the Rosetta all-atom energy. c, Far-ultraviolet circular dichroism spectra (blue, 25 °C; red, 95 °C; green, 25 °C after cooling). d, 1H-15N HSQC spectra obtained at 37 °C at a 1H field of 800 MHz.

  4. Supplementary Figure 4 Validation of BH_10 backbone NMR assignments with TALOS-N analysis.

    a, Sequence window displaying the BH_10 protein sequence with each residue colored according to classification in TALOS-N analysis (J. Biomol. NMR 56, 227–241, 2013). Green, good classification; yellow, ambiguous/no classification; red, bad classification (in the case of L66, its bad classification is likely due to its loop position); gray, no classification. b, Bottom, TALOS-N secondary structure index (SSI) derived from combined 1H, 15N and 13C chemical shifts for BH_10. α-helices are colored red and β-sheets are colored aqua. Top, the random coil index order parameter (RCI-S2) for BH_10. Analysis of these NMR backbone resonance assignments using TALOS-N shows excellent agreement between the chemical-shift-derived secondary structure index and the DSSP annotation of the de novo model.

  5. Supplementary Figure 5 NOE patterns observed in β-arcade regions in the lowest-energy BH_10 structural model.

    a,b, AutoNOE-Rosetta-assigned NOE contacts (red dashed lines) observed between β-arcade regions (a) and within β-arcade regions (b) in the lowest-energy structural model determined using RASREC-Rosetta. The sidechains of residues involved in NOE contacts are shown as sticks.

  6. Supplementary Figure 6 Surface salt bridges of design BH_10.

    Salt bridges of the computational model (top) that were supported by the NOE assignment in the NMR ensemble (bottom). Most of these salt bridges correspond to residues involved in the pairing between β-arcades 1 and 3 (E33:R64, E35:R62 and E78:R23). A salt bridge within β-arcade 3 is also well supported by the NOEs (E15:R62).

  7. Supplementary Figure 7 Loop sequences and patterns of design BH_10.

    a, Designed protein sequence and ABEGO strings of loops (β-strands, β-arches and the β-hairpin are colored in green, red and black, respectively). Critical residues determining the sidechain patterns of β-arches are in blue. b, All-atom stereo representation of the design with backbone hydrogen bonding and salt bridge interactions highlighted. Critical loop positions, such as prolines in β-arches or the central β-hairpin residues, are also indicated.

  8. Supplementary Figure 8 Naturally occurring proteins with the most similar structures or sequences to design BH_10.

    a, The three closest structural analogs identified with DALI (Nucleic Acids Res. 44, W351–W355, 2016) are homodimers. The sequence identity over structurally aligned regions ranged from 7 to 19%. b, The three protein domains most similar in sequence and with an available structure, as identified with HHpred (J. Mol. Biol. 430, 2237–2243, 2018), are also homodimers or part of a larger structure. For these three proteins, the sequence identity with BH_10 was 21% and E-values ranged between 0.14 and 0.39. Each chain is colored differently. The natural proteins identified all exhibit more irregular secondary structures, longer loops and extra elements building protein interfaces.

  9. Supplementary Figure 9 Sequence determinants of β-arch formation.

    For the energy landscapes of different types of mutations, we calculated the frequency of formation of β-hairpins between every two consecutive β-strands. Increases in β-hairpin formation correlated with decreases in near-native sampling. a,d,g,i, Calculated energy landscapes for mutants assessing different types of interactions (black) are compared with the landscape of BH_10 (red). b,e,h, Effect of mutations on β-strand pairing (red, original design; black, mutant). Mutated loop connections are labeled with the corresponding amino acid substitution. Most of the mutations increase sampling of more local β-hairpin connections. Connection S4-S5 corresponds to the central β-hairpin of the β-helix. c, Sidechain packing interactions stabilizing β-arch loop connections that, when mutated to alanine, decrease β-arch stability and favor β-hairpin sampling. Mutant V18A favors hairpin sampling in the neighboring β-arch of the same β-arcade. f, Sidechain-backbone hydrogen-bonding interactions stabilizing β-arch loop geometry; upon removal by alanine substitutions, β-hairpin sampling increases. e, Mutations in the S6-S7 and S7-S8 connections favor sampling of β-hairpins between S6 and S8.
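
The ABEGO classification described in the caption of Supplementary Figure 1 can be illustrated with a short Python sketch. It is a minimal example, not the authors' implementation. The caption fixes only the qualitative regions and the 90° cutoff for the cis ("O") bin; the ψ boundaries used below (-75° to 50° for "A", -100° to 100° for "G") are an assumption based on a commonly used convention, and the function names `abego_bin` and `abego_string` are hypothetical.

```python
def abego_bin(phi, psi, omega=180.0):
    """Return the ABEGO letter for one residue.

    All angles are in degrees and expected in the range [-180, 180].
    """
    # "O": cis peptide bond, i.e. |omega| below 90 degrees (as in the caption).
    if abs(omega) < 90.0:
        return "O"
    if phi < 0.0:
        # Negative phi: right-handed helical ("A") or extended ("B") region.
        # ASSUMPTION: the psi window [-75, 50) for "A" is not given in the text.
        return "A" if -75.0 <= psi < 50.0 else "B"
    # Positive phi: left-handed helical ("G") or extended ("E") region.
    # ASSUMPTION: the psi window [-100, 100) for "G" is not given in the text.
    return "G" if -100.0 <= psi < 100.0 else "E"


def abego_string(torsions):
    """Build an ABEGO string from an iterable of (phi, psi, omega) tuples."""
    return "".join(abego_bin(phi, psi, omega) for phi, psi, omega in torsions)


if __name__ == "__main__":
    # Hypothetical three-residue loop: extended, left-handed, right-handed.
    loop = [(-120.0, 130.0, 180.0), (60.0, 30.0, 180.0), (-60.0, -45.0, 180.0)]
    print(abego_string(loop))
```

Applied to the loop residues of a structure, a function of this kind yields the per-loop ABEGO strings of the sort shown in Supplementary Figure 7a.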

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–9 and Supplementary Tables 1–5

  2. Reporting Summary

  3. Supplementary Dataset 1

    β-arch loop dataset analyzed from naturally occurring protein structures

  4. Supplementary Dataset 2

    Atomic coordinates of the BH_10 computational design model

About this article

Publication history