Naturally occurring, pharmacologically active peptides constrained with covalent crosslinks generally have shapes that have evolved to fit precisely into binding pockets on their targets. Such peptides can have excellent pharmaceutical properties, combining the stability and tissue penetration of small-molecule drugs with the specificity of much larger protein therapeutics. The ability to design constrained peptides with precisely specified tertiary structures would enable the design of shape-complementary inhibitors of arbitrary targets. Here we describe the development of computational methods for accurate de novo design of conformationally restricted peptides, and the use of these methods to design 18–47 residue, disulfide-crosslinked peptides, a subset of which are heterochiral and/or N–C backbone-cyclized. Both genetically encodable and non-canonical peptides are exceptionally stable to thermal and chemical denaturation, and 12 experimentally determined X-ray and NMR structures are nearly identical to the computational design models. The computational design methods and stable scaffolds presented here provide the basis for development of a new generation of peptide-based drugs.
Computer time was awarded by the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. This research used resources of the Argonne Leadership Computing Facility, a Department of Energy (DOE) Office of Science User Facility supported under contract DE-AC02-06CH11357. We thank the University of Washington Hyak supercomputing network for computing and data storage resources, and Rosetta@Home volunteer participants on BOINC for additional computing resources. We are grateful for facility access at the Queensland NMR Network. We thank D. Alonso, J. Bardwell, G. Bhabha, T.J. Brunette, D. Ekiert, A. Ford, N. Hasle, B. Keir, N. Koga, Y. Liu, D. Madden, B. Mao, D. May, V. Ovchinnikov, S. Srivatsan, L. Stewart, R. van Deursen, and M. Williamson for help and advice, and R. Krishnamurty, P. Hosseinzadeh, and A. Vorobieva for critical comments and manuscript suggestions. This work was supported by NIH grant P50 AG005136 supporting the Alzheimer’s Disease Research Center, philanthropic gifts from the Three Dreamers and the Washington Research Foundation, and funding from the Howard Hughes Medical Institute. The Australian Research Council funds D.J.C. as an Australian Laureate Fellow (FL150100146). C.D.B. was supported by NIH grant T32-H600035. T.S. acknowledges NIH support (GM094597), and S.V.S.R.K.P., A.E. and X.X. were supported by NESG funds. E.C. is funded by NIGMS GM090205. We thank P. Rupert and R.K. Strong at the Fred Hutchinson Cancer Research Center for aid in collecting and refining X-ray data for gEHEE_06. G.W.B. was funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services (Federal contract HHSN272201200025C). A portion of this research was performed using EMSL, a DOE Office of Science User Facility sponsored by the Office of Biological and Environmental Research and located at Pacific Northwest National Laboratory.
The authors declare no competing financial interests.
Reviewer Information Nature thanks V. Nanda and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
An Fo − Fc omit-map is shown in blue, contoured at 4σ, for design gEHEE_06. Disulfide sulfur atoms were removed, and the omit-map was calculated following real-space refinement. The gEHEE_06 structure is shown in grey as a cartoon representation. Disulfide bonds are shown here as sticks, with sulfur atoms in yellow and carbon atoms in grey.
Inputs are shown in blue, RosettaScripts-automated parts of the pipeline are in green, parts carried out by Rosetta standalone applications are pink (the fragment picker application) and purple (the various structure prediction applications), parts performed with MD software are yellow, and manual steps are grey. a, Fragment-dependent design workflow. Final computational validation was carried out using MD simulations and fragment-based Rosetta ab initio structure prediction. For peptides containing isolated d-amino acids, these residues were mutated to glycine for Rosetta ab initio structure prediction. b, Fragment-free design workflow using GenKIC. This approach permits design of non-canonical topologies like the mixed HLHR topology, which occurs in no known natural protein. The GenKIC-based structure prediction algorithm is described in Extended Data Fig. 7 and in Supplementary Information.
Extended Data Figure 3 Sidechain placement in non-canonical peptide designs chosen for experimental characterization.
Designs are shown as cartoon and stick representations (top row in each box) and as van der Waals spheres showing sidechain packing (bottom row in each box). l-amino acid residues are shown in cyan, and d-amino acid residues are coloured orange. Sidechains of d- or l-variants of alanine, phenylalanine, isoleucine, leucine, valine, tryptophan and tyrosine are coloured grey to aid visualization of hydrophobic packing interactions. Top box, disulfide-stapled non-canonical peptide designs; bottom box, N-to-C cyclic non-canonical peptide designs.
Fifty independent molecular dynamics (MD) simulations in explicit solvent, all starting from the designed peptide, were used to discriminate good, kinetically stable designs (for example, EHE_D1) from non-optimal designs of the same topology (for example, EHE_X18 and EHE_X11). a, Five representative trajectories from MD simulation runs. Designs that showed good convergence and smaller fluctuations were selected for further experimental characterization. b, r.m.s.d. distribution from all 50 trajectories. The blue line indicates the Gaussian kernel density estimate for the data. Only the last one-third of each trajectory was used for this analysis. Designs with narrower distributions were picked for further testing. c, The concatenated trajectories of all 50 independent runs show lower fluctuations for the more optimal designs.
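The selection criterion in panels b and c (pool the last third of every run, then prefer designs whose r.m.s.d. distribution is narrow) can be sketched in Python. This is a minimal illustration, not the authors' analysis script: the trajectories are synthetic random numbers standing in for per-frame r.m.s.d. values, `simulate_runs` is a hypothetical helper, and the kernel bandwidth uses Scott's rule as an assumed default.

```python
import numpy as np


def last_third(rmsd_traj):
    """Keep only the final third of one run's per-frame r.m.s.d. values (Å)."""
    values = np.asarray(rmsd_traj, dtype=float)
    return values[len(values) - len(values) // 3:]


def pooled_rmsd(trajectories):
    """Concatenate the last-third r.m.s.d. values from all independent runs."""
    return np.concatenate([last_third(t) for t in trajectories])


def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a one-dimensional Gaussian kernel density estimate on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule for 1-D data (an assumed default, not from the source)
        bandwidth = samples.std(ddof=1) * n ** (-1.0 / 5.0)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))


def simulate_runs(mean, sd, n_runs=50, n_frames=300, seed=0):
    """Hypothetical stand-in for 50 MD runs: synthetic r.m.s.d. traces (Å)."""
    rng = np.random.default_rng(seed)
    return [np.abs(rng.normal(mean, sd, n_frames)) for _ in range(n_runs)]


# A kinetically stable design stays close to the model; a poor one drifts.
stable = pooled_rmsd(simulate_runs(mean=1.0, sd=0.2, seed=0))
unstable = pooled_rmsd(simulate_runs(mean=3.0, sd=1.0, seed=1))

grid = np.linspace(0.0, 8.0, 801)
density_stable = gaussian_kde(stable, grid)
density_unstable = gaussian_kde(unstable, grid)

# Narrower pooled distributions mark designs worth testing experimentally.
for name, pooled in [("stable", stable), ("unstable", unstable)]:
    print(f"{name}: mean {pooled.mean():.2f}, s.d. {pooled.std():.2f} (angstrom)")
```

Pooling before estimating the density means a design is only favoured if all 50 runs stay near the model, which matches the caption's emphasis on convergence across independent trajectories.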
The NMR structure of NC_EEH_D1 does not match the designed topology. a, Rosetta-designed model for NC_EEH_D1. b, Ensemble of conformers representing the NMR solution structure. c, Superposition of the designed model (blue) with a representative NMR conformer (green).
Design NC_EHE_D1 and PDB entry 2MA5 show weak but significant (E-value, 2 × 10⁻⁴) sequence alignment, which is highlighted in purple. The aligned region folds into very different structures in the different contexts of peptide and protein.
GenKIC allows sampling of closed conformations of arbitrary chains of atoms, passing through canonical or non-canonical backbone or sidechain linkages. Bond length, bond angle and torsional degrees of freedom in the chain can be fixed, perturbed from a starting value by small amounts, set to user-defined values, or sampled randomly. The algorithm then solves for six torsion angles adjacent to three user-defined pivot atoms in order to enforce closure of the loop. The many solutions from the closure are then filtered internally, and each can be subjected to arbitrary user-defined Rosetta protocols and filtration in order to prune the solution list further. A single solution is selected from those passing filters by a user-defined selection criterion. This flowchart shows the steps in a single invocation of the algorithm; for sampling, a user may specify that the algorithm be applied any number of times. User inputs are shown in blue, steps carried out by the GenKIC algorithm itself are in green, steps carried out by Rosetta code external to the GenKIC algorithm are shown in yellow, and outputs are shown in salmon.
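GenKIC works on a chain's internal coordinates (bond lengths, bond angles and torsions) and asks which torsions make the chain's ends meet. The sketch below illustrates only that representation, not GenKIC itself: it converts internal coordinates to Cartesian positions with the standard NeRF construction and measures the closure gap, leaving out the analytic solution for the six pivot torsions that GenKIC performs. As a geometric check it uses an ideal cyclohexane chair (tetrahedral angles, alternating ±60° torsions), which closes exactly; all function names are hypothetical.

```python
import numpy as np


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF placement: return atom d with |cd| = bond, angle(b, c, d) = angle_deg
    and dihedral(a, b, c, d) = torsion_deg."""
    angle, torsion = np.radians(angle_deg), np.radians(torsion_deg)
    # Coordinates of d in a local frame whose x axis runs along b -> c
    d_local = bond * np.array([
        -np.cos(angle),
        np.sin(angle) * np.cos(torsion),
        np.sin(angle) * np.sin(torsion),
    ])
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    frame = np.column_stack([bc, np.cross(n, bc), n])
    return c + frame @ d_local


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1  # components perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1
    return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))


def build_chain(bond, angle_deg, torsions_deg):
    """Place three seed atoms, then one atom per torsion (uniform bonds/angles)."""
    theta = np.radians(angle_deg)
    coords = [np.zeros(3), np.array([bond, 0.0, 0.0])]
    coords.append(coords[1] + bond * np.array([-np.cos(theta), np.sin(theta), 0.0]))
    for tors in torsions_deg:
        coords.append(place_atom(coords[-3], coords[-2], coords[-1], bond, angle_deg, tors))
    return np.array(coords)


def closure_gap(coords):
    """Distance between the last placed atom and the first; zero means closed."""
    return float(np.linalg.norm(coords[-1] - coords[0]))


TETRAHEDRAL = np.degrees(np.arccos(-1.0 / 3.0))  # about 109.47 degrees

# Six ring atoms plus a seventh that should land back on atom 0.
chair = build_chain(1.54, TETRAHEDRAL, [60.0, -60.0, 60.0, -60.0])
perturbed = build_chain(1.54, TETRAHEDRAL, [50.0, -60.0, 60.0, -60.0])
print(f"chair gap: {closure_gap(chair):.2e}, perturbed gap: {closure_gap(perturbed):.2f}")
```

Perturbing a single torsion by 10° opens the ring; restoring closure after such a perturbation is exactly the problem GenKIC solves analytically by adjusting the torsions around its three pivot atoms.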
a, Flowchart of the steps required to generate a single sampled conformation. In typical usage, this process would be repeated tens of thousands of times to produce many samples. Inputs (the peptide sequence and an optional PDB file for the design structure) are shown in blue, and outputs (the sampled structure, its energy, and its r.m.s.d. from the design structure) are shown in salmon. Steps performed by the GenKIC algorithm are shaded green, and setup and completion steps performed by the simple_cycpep_predict application are shown in yellow. Further details of this algorithm are discussed in Supplementary Information. b, The initial, random peptide conformation with bad terminal peptide bond geometry. c, Ensemble of closed conformations found for a single closure attempt. In this example, residue 7 (cyan) is the fixed anchor residue. Certain regions of the peptide have been set to left- or right-handed helical conformations before solving closure equations. d, A single closed solution with relative cysteine sidechain orientations that pass the initial, low-stringency filter for disulfide (fa_dslf) conformational energy. e, The resulting structure, following sidechain repacking, energy minimization, and cyclic de-permutation.
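Step e ends with a cyclic de-permutation, which implies the sequence was rotated before sampling. Assuming the rotation simply makes an arbitrarily chosen residue the first position, the bookkeeping can be sketched as follows; `permute`, `depermute` and `original_index` are hypothetical helpers, not Rosetta functions, and the example sequence is invented, with lowercase letters marking d-residues as in the mutation names used in these captions.

```python
import random


def permute(residues, offset):
    """Rotate a cyclic sequence (str or list) so residue `offset` comes first."""
    return residues[offset:] + residues[:offset]


def depermute(residues, offset):
    """Undo `permute`, restoring the original residue numbering."""
    k = len(residues) - offset
    return residues[k:] + residues[:k]


def original_index(permuted_index, offset, n):
    """Map a 0-based position in the permuted sequence back to the original."""
    return (permuted_index + offset) % n


# Hypothetical mixed-chirality cyclic sequence; lowercase = d-residue.
design = list("GSDpRVEaKLTw")
rng = random.Random(7)
offset = rng.randrange(len(design))

work = permute(design, offset)  # sampling would run on this rotated copy
restored = depermute(work, offset)
print(offset, "".join(work), "".join(restored))
```

Because a cyclic peptide has no true first residue, rotating the numbering changes only the bookkeeping, so per-residue results must be mapped back with the same offset before comparison with the design model.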
Left column, RP-HPLC traces for the parental designs; middle and right, same for the resurfaced designs where applicable. Traces for proteins run under oxidizing conditions are shown as black lines, while traces for proteins run following reduction with 10 mM DTT are shown as red lines. Insets, gels highlighting the SDS–PAGE mobility of each purified protein under oxidizing (left band) and reducing conditions (right band). Under each row of panels are shown sequence alignments with the mutated positions highlighted in red, along with theoretical isoelectric points as calculated by ProtParam.
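The theoretical isoelectric points here come from ProtParam. For a rough local estimate, the net charge can be summed from Henderson–Hasselbalch terms and the pH where it crosses zero found by bisection. This is a sketch under stated assumptions: the pKa table is a simple fixed set chosen for illustration (ProtParam uses its own table), so values will differ from those in the figure; d-residues (lowercase) are assumed to ionize like their l-counterparts; and the `cyclic` flag drops the terminal groups, which an N-to-C cyclized backbone does not have.

```python
from collections import Counter

# Illustrative fixed pKa values (assumed; ProtParam uses its own table).
# Cys is treated as a free thiol; drop it for fully disulfide-bonded peptides.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(seq, ph, cyclic=False):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    counts = Counter(seq.upper())  # assume d-residues ionize like l-residues
    charge = 0.0
    for aa, pka in POSITIVE_PKA.items():
        charge += counts[aa] / (1.0 + 10.0 ** (ph - pka))
    for aa, pka in NEGATIVE_PKA.items():
        charge -= counts[aa] / (1.0 + 10.0 ** (pka - ph))
    if not cyclic:  # an N-to-C cyclized backbone has no free termini
        charge += 1.0 / (1.0 + 10.0 ** (ph - N_TERM_PKA))
        charge -= 1.0 / (1.0 + 10.0 ** (C_TERM_PKA - ph))
    return charge


def isoelectric_point(seq, cyclic=False, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH where the (monotonically falling) net charge is zero."""
    if net_charge(seq, lo, cyclic) <= 0 or net_charge(seq, hi, cyclic) >= 0:
        raise ValueError("net charge does not change sign between lo and hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(seq, mid, cyclic) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(isoelectric_point("GSWEKRRDE"), 2))
```

Bisection is sufficient because the net charge decreases monotonically with pH; the example sequence is hypothetical.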
a, b, Mutational tolerance of the d-proline, l-proline loop of design NC_cEE_D1 (green in a), assessed by secondary 1Hα chemical shift (p.p.m.) for the design sequence (black bars in b) and the p18d loop mutation (red bars). Eliminating this key proline residue does not result in loss of β-strand signal. c, d, Mutational tolerance of loop region of design NC_HEE_D1 (green in c), as assessed by CD spectroscopy for the design sequence (left plot in d) and for the D19T, p20q, P21D triple mutant (right plot in d). Both proline residues may be mutated without loss of secondary structure or major change in the thermal stability. e–g, Computationally predicted mutational tolerance of design NC_HLHR_D1, across the entire sequence. Each position was successively mutated in silico to d- or l-alanine, arginine, aspartate, phenylalanine, or valine (preserving the position’s chirality), and full folding simulations were carried out with the Rosetta simple_cycpep_predict application. Folding funnel quality was evaluated using the Pnear metric described in Methods. e, Representative plots of energy versus r.m.s.d. from the design structure, plotted for the design sequence (top), for the non-disruptive R14F mutation (middle), and for the e18v mutation (bottom). Results from GenKIC-based structure prediction runs are shown in blue, and relaxation runs, in orange. Note that the bottom case shows many sampled states far from the design state with energy equal to or less than the design state energy. f, Mutational tolerance by position (vertical axis) and mutation (horizontal axis). Blue rectangles represent well-tolerated mutations, and red to black rectangles represent disruptive mutations, based on Pnear evaluation of the folding funnel. Black borders indicate the design sequence. g, Mutational tolerance mapped onto the NC_HLHR_D1 structure, with colours as in f. 
Most positions tolerate mutation well, with only the disulfide bridge (C8–c21) and the salt bridges formed by e18 being highly sensitive. The hydrogen bond networks formed by residues Q5, e24 and s25 show some moderate sensitivity to mutation, as do residues E3 and e16.
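Panels e–g rank mutations by Pnear, which is defined in Methods (not reproduced here). The sketch below assumes a Boltzmann-weighted form, Pnear = Σᵢ exp(−rmsdᵢ²/λ²)·exp(−Eᵢ/kBT) / Σᵢ exp(−Eᵢ/kBT), with λ = 1.5 Å and kBT = 0.62 as assumed defaults; check both the expression and the parameters against Methods before relying on it. A second helper enumerates scan labels that keep each position's chirality (lowercase for d-residues), as the caption describes. Both function names are hypothetical, and the toy energies are invented.

```python
import math


def p_near(rmsds, energies, lam=1.5, kbt=0.62):
    """Boltzmann-weighted probability that the peptide sits near the design state.

    rmsds: r.m.s.d. (Å) of each sampled structure from the design model.
    energies: energy of each sample (Rosetta energy units).
    lam, kbt: assumed defaults; check them against Methods.
    """
    e_min = min(energies)  # shifting energies avoids overflow; it cancels out
    weights = [math.exp(-(e - e_min) / kbt) for e in energies]
    nearness = [math.exp(-((r / lam) ** 2)) for r in rmsds]
    return sum(w * c for w, c in zip(weights, nearness)) / sum(weights)


def chirality_preserving_mutations(seq, targets="ARDFV"):
    """Yield scan labels such as 'R14F' or 'e18v' (lowercase marks a d-residue)."""
    for pos, wild_type in enumerate(seq, start=1):
        for aa in targets:
            mutant = aa.lower() if wild_type.islower() else aa.upper()
            if mutant != wild_type:
                yield f"{wild_type}{pos}{mutant}"


# Toy funnels (illustrative numbers, not data from this study)
good = p_near(rmsds=[0.2, 0.5, 3.0, 5.0], energies=[-50.0, -48.0, -40.0, -38.0])
bad = p_near(rmsds=[0.2, 4.0, 5.0], energies=[-40.0, -45.0, -44.0])
print(f"funnelled design: {good:.3f}, disrupted mutant: {bad:.4f}")
print(list(chirality_preserving_mutations("Re"))[:6])
```

The second toy case mirrors the e18v panel: when states far from the design score lower in energy than the design state, Pnear collapses towards zero even though the design state itself was sampled.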
This file contains Supplementary Sections 1–4. Section 1 contains a detailed description of the computational methods development and example protocols for running the computational methods. Section 2 contains NMR spectra and structure determination statistics. Section 3 contains data from experimental screening of designs. Section 4 contains detailed experimental characterization and validation of reported designs. Collectively, this supplementary information contains details enabling the critical assessment and reproduction of the computational and experimental results described in the main text. (PDF 26062 kb)
This archive contains the PDB output files from Rosetta for all designed peptides reported in the main text. (ZIP 1750 kb)
About this article
Cite this article
Bhardwaj, G., Mulligan, V., Bahl, C. et al. Accurate de novo design of hyperstable constrained peptides. Nature 538, 329–335 (2016). https://doi.org/10.1038/nature19791