A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structural unit¹ are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes². Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications³⁻⁵. Here we use computational protein design to investigate the space of folded structures that can be generated by tandemly repeating a simple helix–loop–helix–loop structural motif. Eighty-three designs with sequences unrelated to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering profiles consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models, with root mean square deviations ranging from 0.7 to 2.5 Å. Our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.
Protein Data Bank
Crystal structures have been deposited in the RCSB Protein Data Bank with the accession numbers 5CWB (DHR4), 5CWC (DHR5), 5CWD (DHR7), 5CWF (DHR8), 5CWG (DHR10), 5CWH (DHR14), 5CWI (DHR18), 5CWJ (DHR49), 5CWK (DHR53), 5CWL (DHR54), 5CWM (DHR64), 5CWN (DHR71), 5CWO (DHR76), 5CWP (DHR79) and 5CWQ (DHR81).
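For comparing design models with the deposited coordinates, the accession list above can be kept as a lookup table. The sketch below is illustrative only: it assumes the RCSB file-server URL pattern `https://files.rcsb.org/download/<ID>.<format>`, the helper `rcsb_download_url` is a hypothetical name, and nothing is actually downloaded.

```python
# Design name -> PDB accession code, transcribed from the deposition statement above.
DHR_PDB_IDS = {
    "DHR4": "5CWB", "DHR5": "5CWC", "DHR7": "5CWD", "DHR8": "5CWF",
    "DHR10": "5CWG", "DHR14": "5CWH", "DHR18": "5CWI", "DHR49": "5CWJ",
    "DHR53": "5CWK", "DHR54": "5CWL", "DHR64": "5CWM", "DHR71": "5CWN",
    "DHR76": "5CWO", "DHR79": "5CWP", "DHR81": "5CWQ",
}


def rcsb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build a coordinate-file URL, assuming the RCSB file-server naming pattern."""
    if fmt not in ("cif", "pdb"):
        raise ValueError("fmt must be 'cif' or 'pdb'")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"


if __name__ == "__main__":
    for design, pdb_id in DHR_PDB_IDS.items():
        print(design, rcsb_download_url(pdb_id))
```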
1. Kajava, A. V. Tandem repeats in proteins: from sequence to structure. J. Struct. Biol. 179, 279–288 (2012)
2. Marcotte, E. M., Pellegrini, M., Yeates, T. O. & Eisenberg, D. A census of protein repeats. J. Mol. Biol. 293, 151–160 (1999)
3. Binz, H. K. et al. High-affinity binders selected from designed ankyrin repeat protein libraries. Nature Biotechnol. 22, 575–582 (2004)
4. Varadamsetty, G., Tremmel, D., Hansen, S., Parmeggiani, F. & Plückthun, A. Designed Armadillo repeat proteins: library generation, characterization and selection of peptide binders with high specificity. J. Mol. Biol. 424, 68–87 (2012)
5. Cortajarena, A. L., Liu, T. Y., Hochstrasser, M. & Regan, L. Designed proteins to modulate cellular networks. ACS Chem. Biol. 5, 545–552 (2010)
6. Kobe, B. & Kajava, A. V. When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci. 25, 509–515 (2000)
7. Huang, P.-S., Feldmeier, K., Parmeggiani, F., Fernandez Velasco, D. A., Höcker, B. & Baker, D. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nature Chem. Biol. http://dx.doi.org/10.1038/nchembio.1966 (2015)
8. Cortajarena, A. L. & Regan, L. Calorimetric study of a series of designed repeat proteins: modular structure and modular folding. Protein Sci. 20, 336–340 (2011)
9. Binz, H. K., Stumpp, M. T., Forrer, P., Amstutz, P. & Plückthun, A. Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J. Mol. Biol. 332, 489–503 (2003)
10. Mosavi, L. K., Minor, D. L. & Peng, Z. Consensus-derived structural determinants of the ankyrin repeat motif. Proc. Natl Acad. Sci. USA 99, 16029–16034 (2002)
11. Main, E. R. G., Xiong, Y., Cocco, M. J., D’Andrea, L. & Regan, L. Design of stable α-helical arrays from an idealized TPR motif. Structure 11, 497–508 (2003)
12. Urvoas, A. et al. Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats. J. Mol. Biol. 404, 307–327 (2010)
13. Lee, S.-C. et al. Design of a binding scaffold based on variable lymphocyte receptors of jawless vertebrates by module engineering. Proc. Natl Acad. Sci. USA 109, 3299–3304 (2012)
14. Parmeggiani, F. et al. Designed Armadillo repeat proteins as general peptide-binding scaffolds: consensus design and computational optimization of the hydrophobic core. J. Mol. Biol. 376, 1282–1304 (2008)
15. Yadid, I. & Tawfik, D. S. Reconstruction of functional β-propeller lectins via homo-oligomeric assembly of shorter fragments. J. Mol. Biol. 365, 10–17 (2007)
16. Coquille, S. et al. An artificial PPR scaffold for programmable RNA recognition. Nature Commun. 5, 5729 (2014)
17. Rämisch, S., Weininger, U., Martinsson, J., Akke, M. & André, I. Computational design of a leucine-rich repeat protein with a predefined geometry. Proc. Natl Acad. Sci. USA 111, 17875–17880 (2014)
18. Lee, J. & Blaber, M. Experimental support for the evolution of symmetric protein architecture from a simple peptide motif. Proc. Natl Acad. Sci. USA 108, 126–130 (2011)
19. Voet, A. R. D. et al. Computational design of a self-assembling symmetrical β-propeller protein. Proc. Natl Acad. Sci. USA 111, 15102–15107 (2014)
20. Parmeggiani, F. et al. A general computational approach for repeat protein design. J. Mol. Biol. 427, 563–575 (2015)
21. Tripp, K. W. & Barrick, D. Enhancing the stability and folding rate of a repeat protein through the addition of consensus repeats. J. Mol. Biol. 365, 1187–1200 (2007)
22. Park, K. et al. Control of repeat-protein curvature by computational protein design. Nature Struct. Mol. Biol. 22, 167–174 (2015)
23. Huang, P.-S. et al. RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS ONE 6, e24109 (2011)
24. Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011)
25. Huang, P.-S. et al. High thermodynamic stability of parametrically designed helical bundles. Science 346, 481–485 (2014)
26. Bradley, P., Misura, K. M. S. & Baker, D. Toward high-resolution de novo structure prediction for small proteins. Science 309, 1868–1871 (2005)
27. Rambo, R. P. & Tainer, J. A. Super-resolution in solution X-ray scattering and its applications to structural systems biology. Annu. Rev. Biophys. 42, 415–441 (2013)
28. Hura, G. L. et al. Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nature Methods 6, 606–612 (2009)
29. Rambo, R. P. & Tainer, J. A. Accurate assessment of mass, models and resolution by small-angle scattering. Nature 496, 477–481 (2013)
30. Hura, G. L. et al. Comprehensive macromolecular conformations mapped by quantitative SAXS analyses. Nature Methods 10, 453–454 (2013)
31. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
32. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009)
33. Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods 9, 173–175 (2012)
34. Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290–D301 (2012)
35. Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009)
36. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005)
37. Di Domenico, T. et al. RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res. 42, D352–D357 (2014)
38. Kabsch, W. XDS. Acta Crystallogr. D 66, 125–132 (2010)
39. Adams, P. D. et al. PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr. D 58, 1948–1954 (2002)
40. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126–2132 (2004)
41. Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D 66, 12–21 (2010)
42. Classen, S. et al. Implementation and performance of SIBYLS: a dual endstation small-angle X-ray scattering and macromolecular crystallography beamline at the Advanced Light Source. J. Appl. Crystallogr. 46, 1–13 (2013)
43. Classen, S. et al. Software for the high-throughput collection of SAXS data using an enhanced Blu-Ice/DCS control system. J. Synchrotron Radiat. 17, 774–781 (2010)
44. Schneidman-Duhovny, D., Hammel, M., Tainer, J. A. & Sali, A. Accurate SAXS profile computation and its assessment by contrast variation experiments. Biophys. J. 105, 962–974 (2013)
45. Schneidman-Duhovny, D., Hammel, M. & Sali, A. FoXS: a web server for rapid computation and fitting of SAXS profiles. Nucleic Acids Res. 38, W540–W544 (2010)
46. Svergun, D., Barberato, C. & Koch, M. H. J. CRYSOL – a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Crystallogr. 28, 768–773 (1995)
47. Petoukhov, M. V. et al. New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Crystallogr. 45, 342–350 (2012)
We thank D. Kim and members of the protein production facility at the Institute for Protein Design. This work was facilitated through the use of advanced computational, storage and networking infrastructure provided by the Hyak supercomputer system at the University of Washington. This work was supported in part by grants from the National Science Foundation (NSF) (MCB-1445201 and CHE-1332907), the Defense Threat Reduction Agency (DTRA), the Air Force Office of Scientific Research (AFOSR) (FA950-12-10112) and the Howard Hughes Medical Institute (HHMI-027779). F.P. was the recipient of a Swiss National Science Foundation Postdoc Fellowship (PBZHP3-125470) and a Human Frontier Science Program Long-Term Fellowship (LT000070/2009-L). SAXS work at the Advanced Light Source SIBYLS beamline was supported by the National Institutes of Health grant MINOS (Macromolecular Insights on Nucleic Acids Optimized by Scattering) GM105404 and by the United States Department of Energy program Integrated Diffraction Analysis Technologies (IDAT). D.C.E. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (Grant DRG-2140-12). G.B. is a recipient of the Merck fellowship of the Damon Runyon Cancer Research Foundation (DRG-2136-12) and is supported by NIH grant K99GM112982. J.A.T. is supported by a Robert A. Welch Distinguished Chair in Chemistry. We thank J. Holton for advice on S-SAD data collection, and the staff of ALS 8.2.1 and 8.3.1 for beamline support. The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the US Department of Energy under Contract No. DE-AC02-05CH11231. ALS beamline 8.3.1 is supported by the UC Office of the President, Multicampus Research Programs and Initiatives grant MR-15-338599 and the Program for Breakthrough Biomedical Research, which is partially funded by the Sandler Foundation.
ALS beamline 8.2.1 and the Berkeley Center for Structural Biology are supported in part by the National Institutes of Health, National Institute of General Medical Sciences, and the Howard Hughes Medical Institute.
The authors declare no competing financial interests.
Extended data figures and tables
a, Flowchart of the design protocol. The green box indicates user-controlled inputs, the grey boxes represent steps where protein structure is created or modified, and the white boxes indicate where structures are filtered. b, Low-resolution backbone build. c, Quick full-atom design (grey) improves the backbone model (red). The superposition in the middle highlights the structural changes introduced. d, Structural profile: a 9-residue fragment is matched against the Protein Data Bank repository for structures within 0.5 Å r.m.s.d. The sequences from these structures are used to generate a sequence profile that influences design. e, Packing filters were used to discard designs with cavities in the core, illustrated as grey spheres.
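Panel d describes building a sequence profile from Protein Data Bank fragments that match a 9-residue design fragment within 0.5 Å r.m.s.d. The sketch below illustrates that idea under stated assumptions: it compares C-alpha coordinates only, uses Kabsch superposition for the r.m.s.d., and searches a small in-memory list of `(coords, sequence)` pairs standing in for the PDB. The names `kabsch_rmsd`, `sequence_profile` and `fragment_db` are hypothetical, not the Rosetta implementation.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kabsch_rmsd(P, Q):
    """C-alpha RMSD between two (N, 3) arrays after optimal rigid-body superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def sequence_profile(query_ca, fragment_db, cutoff=0.5):
    """Per-position amino-acid frequencies from database fragments within `cutoff` Å of the query."""
    hits = [seq for coords, seq in fragment_db if kabsch_rmsd(query_ca, coords) <= cutoff]
    profile = np.zeros((len(query_ca), len(AMINO_ACIDS)))
    for seq in hits:
        for pos, aa in enumerate(seq):
            if aa in AMINO_ACIDS:
                profile[pos, AMINO_ACIDS.index(aa)] += 1.0
    if hits:
        profile /= len(hits)  # convert counts to frequencies
    return profile, len(hits)
```

In a real pipeline the fragment database would be indexed rather than scanned linearly; the linear scan keeps the example short.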
a–c, Percentage of models accepted at the backbone-building or centroid (a), design (b) and ab initio (c) stages. Models are grouped by secondary structure length. The combination of loop 1 and loop 2 lengths is indicated on top. The x and y axes indicate helix 1 and helix 2 lengths, respectively. The side bar indicates the fraction of models in each bin that passed the selection stage. Generally, one-residue loops and large differences between helix lengths reduce the number of selected models. d, Distribution of radius and twist of models at the three stages. e, Number of models passing each design stage (log scale). Of ~2.8 million structures, 761 are accepted.
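The binned acceptance fractions in panels a–c amount to a simple group-by over model records. A minimal sketch, assuming each model is a dictionary with hypothetical keys `loop1`, `loop2`, `helix1`, `helix2` (lengths in residues) and `passed` (whether it survived the stage):

```python
from collections import defaultdict


def acceptance_by_bin(models):
    """Fraction of models passing a stage, binned by (loop1, loop2, helix1, helix2) lengths."""
    total = defaultdict(int)
    accepted = defaultdict(int)
    for m in models:
        key = (m["loop1"], m["loop2"], m["helix1"], m["helix2"])
        total[key] += 1
        accepted[key] += int(m["passed"])
    return {key: accepted[key] / total[key] for key in total}


# Overall funnel from panel e: ~2.8 million starting structures, 761 accepted.
overall_rate = 761 / 2.8e6  # roughly 0.027 %
```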
To assess folding robustness, seven sequence variants were made for each design. a–g, Energy landscapes explored by Rosetta ab initio for the seven variants. Models produced by ab initio search are shown in red and models produced by side-chain repacking and minimization (relax) in green. Models in deep global energy minima near the relaxed structures are considered folded. The variant with the highest density of ab initio models near the relaxed structures was chosen for experimental characterization (blue box). h, Jalview sequence alignment of the first 100 residues of the variants. The height of the yellow bars indicates sequence conservation, and the black bars show how often the consensus residue occurs at each position.
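The variant-selection rule can be expressed as a scoring function over ab initio decoys. The caption does not give numerical thresholds, so the sketch below assumes hypothetical values: a decoy counts as near-native if it is within `rmsd_cutoff = 2.0` Å of the relaxed design model, and only the lowest-energy 10% of decoys (`low_energy_fraction = 0.1`) are examined, as a stand-in for "deep global energy minima". Decoys are assumed to be `(rmsd, energy)` tuples.

```python
def near_native_density(decoys, rmsd_cutoff=2.0, low_energy_fraction=0.1):
    """Fraction of the lowest-energy decoys lying within rmsd_cutoff Å of the relaxed model.

    decoys: list of (rmsd_to_relaxed_model, energy) tuples; thresholds are illustrative.
    """
    if not decoys:
        return 0.0
    ranked = sorted(decoys, key=lambda d: d[1])  # lowest (best) energy first
    k = max(1, int(len(ranked) * low_energy_fraction))  # size of the low-energy subset
    return sum(1 for rmsd, _energy in ranked[:k] if rmsd <= rmsd_cutoff) / k


def pick_variant(variants, **kwargs):
    """Name of the variant whose low-energy decoys are most concentrated near its design model."""
    return max(variants, key=lambda name: near_native_density(variants[name], **kwargs))
```

Restricting the count to low-energy decoys matters: a variant whose near-native decoys are all high in energy has not found a deep minimum near the design, even if many decoys are close in r.m.s.d.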
Parameters for repeat protein family representatives were extracted as described in the Supplementary Information. The DHR models are the 761 proteins validated by in silico folding.
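The extraction procedure for these parameters is given in the Supplementary Information and is not reproduced here. As a generic illustration only, a rigid transform (R, t) that superimposes one repeat onto the next can be decomposed as a screw motion to give a twist (rotation angle), rise (translation along the screw axis) and radius (distance of the repeat centroid from the axis). The function names are hypothetical, and the decomposition is undefined for twists of exactly 0° or 180°.

```python
import numpy as np


def superpose(X, Y):
    """Kabsch fit: return R, t minimising ||X @ R.T + t - Y|| for (N, 3) arrays."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    U, _S, Vt = np.linalg.svd((X - cx).T @ (Y - cy))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cy - R @ cx


def repeat_parameters(repeat_a, repeat_b):
    """Twist (degrees), rise (Å) and radius (Å) of the screw mapping repeat_a onto repeat_b."""
    R, t = superpose(repeat_a, repeat_b)
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    # Rotation axis from the antisymmetric part of R (undefined at 0 or 180 degrees).
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    axis /= np.linalg.norm(axis)
    rise = float(t @ axis)
    # A point p on the screw axis satisfies R p + t = p + rise * axis.
    p, *_ = np.linalg.lstsq(np.eye(3) - R, t - rise * axis, rcond=1e-8)
    offset = repeat_a.mean(axis=0) - p
    radius = float(np.linalg.norm(offset - (offset @ axis) * axis))
    return float(np.degrees(angle)), rise, radius
```

For a synthetic repeat rotated by 30° about the z axis and shifted 10 Å along it, with its centroid 15 Å from the axis, the function recovers twist 30°, rise 10 Å and radius 15 Å.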
Extended Data Figure 5 Superposition between single internal repeats (second repeat) of designs (grey) and crystal structures (yellow).
Aliphatic and aromatic side chains are in red and cysteines are in orange. DHR7 and DHR18 show intra-repeat disulfide bonds, while DHR4 and DHR81 form inter-repeat cystines. DHR5 does not form the expected S–S bond. Core side chains in the design models recapitulate the conformations observed in the crystal structures. Even when the backbone is shifted (for example, DHR5, 8 and 15), rotamers are by and large correctly predicted.
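Whether a disulfide is intra- or inter-repeat depends only on which repeats the two cysteines belong to. A minimal sketch, assuming contiguous repeats of a fixed, hypothetical length starting at residue 1 and a hypothetical SG–SG cutoff of 2.5 Å (a typical S–S bond is about 2.05 Å); `find_disulfides` and its input layout are illustrative, not part of the original analysis.

```python
import itertools
import math


def find_disulfides(cys_sg, repeat_length, max_dist=2.5):
    """Pair cysteine SG atoms within max_dist Å and label each pair intra- or inter-repeat.

    cys_sg: {residue_number (1-based): (x, y, z)} for cysteine SG atoms.
    """
    bonds = []
    for (ri, a), (rj, b) in itertools.combinations(sorted(cys_sg.items()), 2):
        if math.dist(a, b) <= max_dist:
            same_repeat = (ri - 1) // repeat_length == (rj - 1) // repeat_length
            bonds.append((ri, rj, "intra-repeat" if same_repeat else "inter-repeat"))
    return bonds
```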
a, Vr values for the fit of SAXS profiles to design models (dark grey) and to crystal structures (yellow). For 43 designs, the fit to the design model lies within the range of Vr values defined by the crystal structures. DHR49 and DHR76 form dimers in solution, so their models used the dimeric configuration observed in the crystal structures. Designs showing aggregation in the scattering profiles, including DHR5, for which the structure was solved, were not included in this figure. b, c, Pairwise Vr similarity maps³⁰ of the 43 design models: experimental-to-model profile similarity (b) and model-to-model profile similarity (c). Models that are similar to each other show off-diagonal correlation in c, and the same pattern is observed when they are compared to experimental data in b. The display order was obtained by clustering the original design models by structural similarity. The ability to reproduce characteristic patterns across a large set of designs indicates that the models capture the relative structural similarities between the proteins in solution. Scores are colour coded, with red indicating the best agreement and white a lack of agreement.
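Vr (volatility of ratio, ref. 30) compares two scattering profiles through the ratio R(q) = I₁(q)/I₂(q) sampled on common q bins, summing the relative change between consecutive bins; identical or merely rescaled profiles give Vr = 0. The sketch below follows our reading of that definition; the q range, binning and function names are assumptions, and the similarity maps in b and c correspond to `vr_matrix` applied to experimental-versus-model and model-versus-model profile sets.

```python
import numpy as np


def volatility_of_ratio(i1, i2):
    """Vr between two intensity profiles sampled on the same q bins (0 means same shape)."""
    r = np.asarray(i1, dtype=float) / np.asarray(i2, dtype=float)
    r = r / r.mean()  # normalisation; Vr is scale-invariant anyway
    return float(np.sum(np.abs(r[:-1] - r[1:]) / ((r[:-1] + r[1:]) / 2.0)))


def vr_matrix(profiles_a, profiles_b):
    """Pairwise Vr map, e.g. experimental (rows) against model (columns) profiles."""
    return np.array([[volatility_of_ratio(a, b) for b in profiles_b] for a in profiles_a])
```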
Extended Data Figure 7 Designs are stable to chemical denaturation by guanidine hydrochloride (GuHCl).
Circular-dichroism-monitored GuHCl denaturation experiments were carried out on two designs for which crystal structures were solved (DHR4 and DHR14), two with overall shapes confirmed by SAXS (DHR21 and DHR62), and two with overall shapes inconsistent with SAXS (DHR17 and DHR67). In contrast to almost all native proteins, four of the six proteins do not denature at GuHCl concentrations up to 7.5 M. Both designs not confirmed by SAXS were extremely stable to GuHCl denaturation and hence are well-folded proteins; the discrepancies between the computed and experimental SAXS profiles may be due to small amounts of oligomeric species or to variation in overall twist.
DHRs cluster separately from existing repeat proteins. Unlike known α-helical repeat proteins, which are mostly right-handed, DHRs are evenly split between right-handed and left-handed repeats. This result indicates that the handedness observed in known families is not an intrinsic limitation of repeat protein structures. Repeat handedness, as defined by Kobe and Kajava⁶, describes the rotation of the main chain, going from the N to the C terminus, around the axis connecting the repeat centres of mass. The structural similarity tree was built from pairwise TM-scores.
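The caption does not state how the tree was built from the pairwise TM-scores. One common approach, shown here as an assumption rather than the authors' procedure, converts scores to distances as 1 − TM-score (taking the larger of the two chain-normalised scores to symmetrise the matrix) and then applies average-linkage (UPGMA) clustering. The pure-Python `upgma` below returns the tree as nested tuples; all names are hypothetical.

```python
import numpy as np


def tm_distance(tm):
    """Symmetric distance matrix 1 - max(TM_ij, TM_ji) from a (possibly asymmetric) TM-score matrix."""
    tm = np.asarray(tm, dtype=float)
    return 1.0 - np.maximum(tm, tm.T)


def upgma(dist, labels):
    """Average-linkage agglomerative clustering; returns a nested-tuple tree of labels."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # cluster id -> (subtree, size)
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])  # closest pair, keys are (low, high)
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)  # size-weighted average
        d.pop((a, b))
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]


def similarity_tree(tm, labels):
    return upgma(tm_distance(tm), labels)
```

Usage: with proteins A and B sharing a TM-score of about 0.9, C and D about 0.85, and 0.3 across the groups, `similarity_tree` returns `(("A", "B"), ("C", "D"))`.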
DHRs were characterized with four repeats, but the number of internal repeats can be increased without additional design steps. Extended models highlight the differences in twist and radius between the validated designs.
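Because every internal repeat is related to the next by the same rigid transform, an extended model can be generated by applying that transform repeatedly. A minimal sketch, assuming an idealised repeat given as an (N, 3) coordinate array and a transform (R, t) obtained, for example, by superimposing repeat k onto repeat k + 1; terminal capping and sequence threading are ignored, and `extend_repeats` is a hypothetical name.

```python
import numpy as np


def extend_repeats(repeat, R, t, n_repeats):
    """Stack n_repeats copies; copy k is the first repeat transformed k times by (R, t)."""
    copies = [np.asarray(repeat, dtype=float)]
    for _ in range(n_repeats - 1):
        copies.append(copies[-1] @ np.asarray(R).T + np.asarray(t))
    return np.vstack(copies)
```

Applying the same transform preserves twist, rise and radius exactly, which is why extended models differ between designs only through these per-repeat parameters.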
This file contains Supplementary Discussions 1–5, Supplementary Tables 1–16 and additional references.
This file contains Experimental Data part 1.
This file contains Experimental Data part 2.
This file contains Experimental Data part 3.
This file contains Experimental Data part 4.
About this article
Cite this article
Brunette, T., Parmeggiani, F., Huang, PS. et al. Exploring the repeat protein universe through computational protein design. Nature 528, 580–584 (2015). https://doi.org/10.1038/nature16162