Abstract
β-sheet proteins carry out critical functions in biology, and hence are attractive scaffolds for computational protein design. Despite this potential, de novo design of all-β-sheet proteins from first principles lags far behind the design of all-α or mixed-αβ domains owing to their non-local nature and the tendency of exposed β-strand edges to aggregate. Through study of loops connecting unpaired β-strands (β-arches), we have identified a series of structural relationships between loop geometry, side chain directionality and β-strand length that arise from hydrogen bonding and packing constraints on regular β-sheet structures. We use these rules to de novo design jellyroll structures with double-stranded β-helices formed by eight antiparallel β-strands. The nuclear magnetic resonance structure of a hyperthermostable design closely matched the computational model, demonstrating accurate control over the β-sheet structure and loop geometry. Our results open the door to the design of a broad range of non-local β-sheet protein structures.
Data availability
NMR chemical shifts and NOESY cross-peak lists used to determine structures of BH_10 have been deposited in the BMRB with accession code 30495. Coordinates of the ten lowest-energy structures and the restraint lists have been deposited in the wwPDB as PDB 6E5C. The design model of BH_10 is available as Supplementary Dataset 1, and the loop dataset used to analyze the side chain patterns of naturally occurring β-arches is available in Supplementary Dataset 2. Other data are available from the corresponding authors upon reasonable request.
References
Kortemme, T., Ramírez-Alvarado, M. & Serrano, L. Design of a 20-amino acid, three-stranded β-sheet protein. Science 281, 253–256 (1998).
Searle, M. S. & Ciani, B. Design of β-sheet systems for understanding the thermodynamics and kinetics of protein folding. Curr. Opin. Struct. Biol. 14, 458–464 (2004).
Hughes, R. M. & Waters, M. L. Model systems for β-hairpins and β-sheets. Curr. Opin. Struct. Biol. 16, 514–524 (2006).
Marcos, E. & Adriano-Silva, D. Essentials of de novo protein design: methods and applications. WIREs Comput. Mol. Sci. 8, e1374 (2018).
Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
Hecht, M. H. De novo design of β-sheet proteins. Proc. Natl Acad. Sci. USA 91, 8729–8730 (1994).
Plaxco, K. W., Simons, K. T. & Baker, D. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277, 985–994 (1998).
Quinn, T. P., Tweedy, N. B., Williams, R. W., Richardson, J. S. & Richardson, D. C. Betadoublet: de novo design, synthesis, and characterization of a β-sandwich protein. Proc. Natl Acad. Sci. USA 91, 8747–8751 (1994).
Nanda, V. et al. De novo design of a redox-active minimal rubredoxin mimic. J. Am. Chem. Soc. 127, 5804–5805 (2005).
Dou, J. et al. De novo design of a fluorescence-activating β-barrel. Nature 561, 485–491 (2018).
Voet, A. R. D. et al. Computational design of a self-assembling symmetrical β-propeller protein. Proc. Natl Acad. Sci. USA 111, 15102–15107 (2014).
MacDonald, J. T. Synthetic β-solenoid proteins with the fragment-free computational design of a β-hairpin extension. Proc. Natl Acad. Sci. USA 113, 10346–10351 (2016).
Ottesen, J. J. & Imperiali, B. Design of a discretely folded mini-protein motif with predominantly β-structure. Nat. Struct. Biol. 8, 535–539 (2001).
Hu, X., Wang, H., Ke, H. & Kuhlman, B. Computer-based redesign of β sandwich protein suggests that extensive negative design is not required for de novo β sheet design. Structure 16, 1799–1805 (2008).
Hennetin, J., Jullian, B., Steven, A. C. & Kajava, A. V. Standard conformations of beta-arches in β-solenoid proteins. J. Mol. Biol. 358, 1094–1105 (2006).
Lin, Y.-R. et al. Control over overall shape and size in de novo designed proteins. Proc. Natl Acad. Sci. USA 112, E5478–E5485 (2015).
Kajava, A. V., Baxa, U. & Steven, A. C. β arcades: recurring motifs in naturally occurring and disease-related amyloid fibrils. FASEB J. 24, 1311–1319 (2010).
Kuhlman, B. & Baker, D. Native protein sequences are close to optimal for their structures. Proc. Natl Acad. Sci. USA 97, 10383–10388 (2000).
Kuhlman, B. et al. Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003).
Richardson, J. S. & Richardson, D. C. Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation. Proc. Natl Acad. Sci. USA 99, 2754–2759 (2002).
Marcos, E. et al. Principles for designing proteins with cavities formed by curved β sheets. Science 355, 201–206 (2017).
Rohl, C. A., Strauss, C. E. M., Misura, K. M. S. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol. 383, 66–93 (2004).
Bradley, P. Toward high-resolution de novo structure prediction for small proteins. Science 309, 1868–1871 (2005).
Kuhn, M., Meiler, J. & Baker, D. Strand-loop-strand motifs: prediction of hairpins and diverging turns in proteins. Proteins 54, 282–288 (2004).
Bradley, P. & Baker, D. Improved β-protein structure prediction by multilevel optimization of nonlocal strand pairings and local backbone conformation. Proteins 65, 922–929 (2006).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Camacho, C. et al. BLAST: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Evangelidis, T. et al. Automated NMR resonance assignments and structure determination using a minimal set of 4D spectra. Nat. Commun. 9, 384 (2018).
Holm, L. & Laakso, L. M. Dali server update. Nucleic Acids Res. 44, W351–W355 (2016).
Zimmermann, L. et al. A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J. Mol. Biol. 430, 2237–2243 (2018).
Clark, P. Protein folding in the cell: reshaping the folding funnel. Trends Biochem. Sci. 29, 527–534 (2004).
Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Fleishman, S. J. et al. RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. PLoS ONE 6, e20161 (2011).
O’Meara, M. J. et al. Combined covalent-electrostatic model of hydrogen bonding improves structure prediction with Rosetta. J. Chem. Theory Comput. 11, 609–622 (2015).
Bhardwaj, G. et al. Accurate de novo design of hyperstable constrained peptides. Nature 538, 329–335 (2016).
Sheffler, W. & Baker, D. RosettaHoles2: a volumetric packing measure for protein structure refinement and validation. Protein Sci. 19, 1991–1995 (2010).
Jones, D. T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202 (1999).
Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
Studier, F. W. Protein production by auto-induction in high density shaking cultures. Protein Expres. Purif. 41, 207–234 (2005).
Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).
Ying, J., Delaglio, F., Torchia, D. A. & Bax, A. Sparse multidimensional iterative lineshape-enhanced (SMILE) reconstruction of both non-uniformly sampled and conventional NMR data. J. Biomol. NMR 68, 101–118 (2017).
Lee, W., Tonelli, M. & Markley, J. L. Nmrfam-Sparky: enhanced software for biomolecular NMR spectroscopy. Bioinformatics 31, 1325–1327 (2015).
Nerli, S., McShan, A. C. & Sgourakis, N. G. Chemical shift-based methods in NMR structure determination. Prog. Nucl. Mag. Res. Sp. 106–107, 1–25 (2018).
Lange, O. F. Automatic NOESY assignment in CS-RASREC-Rosetta. J. Biomol. NMR 59, 147–159 (2014).
Lange, O. F. & Baker, D. Resolution-adapted recombination of structural features significantly improves sampling in restraint-guided structure calculation. Proteins 80, 884–895 (2012).
Berjanskii, M. V. & Wishart, D. S. Unraveling the meaning of chemical shifts in protein NMR. Biochim. Biophys. Acta 1865, 1564–1576 (2017).
Nilges, M. A calculation strategy for the structure determination of symmetric dimers by 1H NMR. Proteins 17, 297–309 (1993).
Nilges, M. Ambiguous distance data in the calculation of NMR structures. Fold Des. 2, S53–S57 (1997).
Herrmann, T., Güntert, P. & Wüthrich, K. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319, 209–227 (2002).
Shen, Y. & Bax, A. Protein backbone and sidechain torsion angles predicted from NMR chemical shifts using artificial neural networks. J. Biomol. NMR 56, 227–241 (2013).
Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12–21 (2010).
Costantini, S., Colonna, G. & Facchiano, A. M. ESBRI: a web server for evaluating salt bridges in proteins. Bioinformation 3, 137–138 (2008).
The PyMOL Molecular Graphics System, Version 1.7.2 (Schrödinger, LLC, 2016).
Acknowledgements
We thank S. Rettie for mass spectrometry assistance and the rest of the Baker laboratory members for discussion. We acknowledge computing resources provided by the Hyak supercomputer system funded by the STF at the University of Washington, and Rosetta@Home volunteers in ab initio structure prediction calculations. Work carried out at the Baker laboratory was supported by the Howard Hughes Medical Institute, Open Philanthropy, and the Defense Threat Reduction Agency. E.M. was supported by a Marie Curie International Outgoing Fellowship (FP7-PEOPLE-2011-IOF 298976). G.O. was supported by a Marie Curie International Outgoing Fellowship (FP7-PEOPLE-2012-IOF 332094). This research was financially supported by the Ministry of Education, Youths and Sports of the Czech Republic within the CEITEC 2020 (LQ1601) project, the Grant Agency of Masaryk University (to K.T.), an R35 Outstanding Investigator Award to N.G.S. through NIGMS (1R35GM125034-01), and the Office of the Director, NIH, under High End Instrumentation Grant S10OD018455, which funded the 800-MHz NMR spectrometer at UCSC. IRB Barcelona is the recipient of a Severo Ochoa Award of Excellence from the Ministry of Economy, Industry and Competitiveness (government of Spain).
Author information
Contributions
E.M. designed the research, carried out the loop structural analysis, set up the design method and performed design calculations. T.M.C. carried out design calculations, protein expression, purification and CD experiments. A.C.M. collected 4D NMR data. T.E. performed 4D-CHAINS analysis. S.N. carried out AutoNOE-Rosetta calculations. L.C. expressed isotopically labeled proteins and performed SEC-MALS analysis. L.G.N. designed the research and carried out design calculations. A.D. and G.O. helped in protein expression and characterization. K.T. and N.G.S. supervised NMR structure determination. D.B. designed and supervised the research. E.M. and D.B. prepared the manuscript with input from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Integrated supplementary information
Supplementary Figure 1 Coarse-grained representation of the Ramachandran plot based on ABEGO torsion bins.
ABEGO torsion bins provide a convenient way to classify the backbone geometry of protein residues based on the Ramachandran plot region of their ϕ/ψ dihedrals. “A” corresponds to the right-handed α-helix, “B” to the extended region typical of β-strands, “E” to the extended region with a positive ϕ dihedral, “G” to a left-handed α-helix (mostly accessible by l-glycine). The “O” bin is assigned for the cis peptide conformation (torsion around the peptide bond with the preceding residue, Cα(i)-N(i)-C(i – 1)-Cα(i – 1), below 90°). Plotted data were collected from the residue torsional values of all β-arch loops from a non-redundant set of naturally occurring protein structures. Analysis of these loops revealed the sidechain pattern preferences of different loop ABEGO types, a critical feature for design.
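The binning described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the caption only gives the qualitative regions and the 90° cutoff for the cis peptide bond, so the ψ boundaries used below (−75°/50° for negative ϕ and −100°/100° for positive ϕ) are assumptions taken from the convention commonly used with Rosetta, and the function names `abego_bin` and `abego_string` are hypothetical.

```python
def abego_bin(phi: float, psi: float, omega: float = 180.0) -> str:
    """Return the ABEGO letter for one residue (all angles in degrees)."""
    # "O": cis peptide bond, |omega| below 90 degrees (cutoff from the caption).
    if abs(omega) < 90.0:
        return "O"
    if phi >= 0.0:
        # Positive phi: left-handed helical region (G) or extended region (E).
        # The psi window is an assumed, commonly used boundary.
        return "G" if -100.0 <= psi < 100.0 else "E"
    # Negative phi: right-handed helical region (A) or beta region (B).
    # The psi window is an assumed, commonly used boundary.
    return "A" if -75.0 <= psi < 50.0 else "B"


def abego_string(torsions):
    """Concatenate ABEGO letters for a list of (phi, psi, omega) tuples."""
    return "".join(abego_bin(phi, psi, omega) for phi, psi, omega in torsions)


if __name__ == "__main__":
    # Hypothetical three-residue loop: extended, left-handed, helical.
    loop = [(-120.0, 130.0, 180.0), (60.0, 40.0, 180.0), (-60.0, -45.0, 180.0)]
    print(abego_string(loop))
```

Applied to every residue of a loop, this yields a short string that can be used to group loops of the same backbone geometry before looking at their sidechain patterns.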
Supplementary Figure 2 SEC-MALS analysis of BH_10 and BH_11 designs.
Both proteins are monodisperse and have estimated molecular weights in good agreement with the theoretical value of the monomers.
Supplementary Figure 3 Experimental characterization of the designed protein BH_11.
a, Cartoon representation of the design model. b, Calculated folding energy landscape. Each dot represents the lowest energy structure obtained from ab initio folding trajectories starting from an extended chain (red dots), biased forward folding trajectories (blue dots) or local relaxation of the designed structure (green dots); the x axis shows the Cα-root mean square deviation (RMSD) from the designed model; the y axis shows the Rosetta all-atom energy. c, Far-ultraviolet circular dichroism spectra (blue, 25 °C; red, 95 °C; green, 25 °C after cooling). d, 1H-15N HSQC spectra obtained at 37 °C at a 1H field of 800 MHz.
Supplementary Figure 4 Validation of BH_10 backbone NMR assignments with TALOS-N analysis.
a, Sequence window displaying the BH_10 protein sequence with each residue colored according to classification in TALOS-N analysis (J. Biomol. NMR 56, 227–241, 2013). Green, good classification; yellow, ambiguous/no classification; red, bad classification (in the case of L66, its bad classification is likely due to its loop position); gray, no classification. b, Bottom, TALOS-N-derived secondary structure index (SSI) derived from combined 1H, 15N and 13C chemical shifts for BH_10. α-helices are colored red and β-sheets are colored aqua. Top, the random coil index order parameter (RCI-S2) for BH_10. Analysis of these NMR backbone resonance assignments using TALOS-N shows excellent agreement between the chemical-shift-derived secondary structure index and the DSSP annotation of the de novo model.
Supplementary Figure 5 NOE patterns observed in β-arcade regions in the lowest-energy BH_10 structural model.
a,b, AutoNOE-Rosetta-assigned NOE contacts (red dashed lines) observed between β-arcade regions (a) and within β-arcade regions (b) in the lowest-energy structural model determined using RASREC-Rosetta. The sidechains for residues involving NOE contacts are highlighted by sticks.
Supplementary Figure 6 Surface salt bridges of design BH_10.
Salt bridges of the computational model (top) that were supported by the NOE assignment in the NMR ensemble (bottom). Most of these salt bridges correspond to residues involved in the pairing between β-arcades 1 and 3 (E33:R64, E35:R62 and E78:R23). A salt bridge within β-arcade 3 is also well supported by the NOEs (E15:R62).
Supplementary Figure 7 Loop sequences and patterns of design BH_10.
a, Designed protein sequence and ABEGO strings of loops (β-strands, β-arches and the β-hairpin are colored in green, red and black, respectively). Critical residues determining the sidechain patterns of β-arches are in blue. b, All-atom stereo representation of the design with backbone hydrogen bonding and salt bridge interactions highlighted. Critical loop positions, such as prolines in β-arches or the central β-hairpin residues, are also indicated.
Supplementary Figure 8 Naturally occurring proteins with the most similar structures or sequences to design BH_10.
a, The three closest structural analogs identified with DALI (Nucleic Acids Res. 44, W351–W355, 2016) are homodimers. The sequence identity over structurally aligned regions ranged from 7 to 19%. b, The three protein domains most similar in sequence and with a structure available, as identified with HHpred (J. Mol. Biol. 430, 2237–2243, 2018), are also homodimers or part of a larger structure. For these three proteins, the sequence identity with BH_10 was 21% and E-values ranged between 0.14 and 0.39. Each chain is colored differently. The natural proteins identified all exhibit more irregular secondary structures, longer loops and extra elements building protein interfaces.
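For readers unfamiliar with the sequence identity figures quoted in this caption, the following sketch shows one simple way such a percentage can be computed from a pairwise alignment. It is an illustration only: the function name `percent_identity` is hypothetical, the example sequences are made up, and the choice of denominator (columns where both sequences have a residue) is an assumption, since DALI and HHpred may normalize differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences contribute a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Made-up alignment: 5 aligned columns, 4 of them identical.
    print(percent_identity("ACDE-FG", "ACQEKF-"))
```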
Supplementary Figure 9 Sequence determinants of β-arch formation.
For the energy landscapes of different types of mutations, we calculated the frequency of formation of β-hairpins between every two consecutive β-strands. Increases of β-hairpin formation were correlated with a decrease of near-native sampling. a,d,g,i, Calculated energy landscapes for mutants assessing different types of interactions (black) are compared with the landscape of BH_10 (red). b,e,h, Effect of mutations on β-strand pairing (red, original design; black, mutant). Mutated loop connections are labeled with the corresponding amino acid substitution. Most of the mutations increase sampling of more local β-hairpin connections. Connection S4-S5 corresponds to the central β-hairpin of the β-helix. c, Sidechain packing interactions stabilizing β-arch loop connections that, when mutated to alanine, decrease β-arch stability and favor β-hairpin sampling. Mutant V18A favors hairpin sampling in the neighboring β-arch of the same β-arcade. f, Sidechain-backbone hydrogen-bonding interactions stabilizing β-arch loop geometry; upon removal by alanine substitutions, β-hairpin sampling increases. e, Mutations in the S6-S7 and S7-S8 connections favor sampling of β-hairpins between S6 and S8.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–9 and Supplementary Tables 1–5
Supplementary Dataset 1
β-arch loop dataset analyzed from naturally occurring protein structures
Supplementary Dataset 2
Atomic coordinates of the BH_10 computational design model
About this article
Cite this article
Marcos, E., Chidyausiku, T.M., McShan, A.C. et al. De novo design of a non-local β-sheet protein with high stability and accuracy. Nat Struct Mol Biol 25, 1028–1034 (2018). https://doi.org/10.1038/s41594-018-0141-6
DOI: https://doi.org/10.1038/s41594-018-0141-6