Exploring the repeat protein universe through computational protein design


A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit (ref. 1) are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes (ref. 2). Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications (refs 3–5). Here we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix–loop–helix–loop structural motif. Eighty-three designs with sequences unrelated to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models with root mean square deviations ranging from 0.7 to 2.5 Å. Our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.

Figure 1: Schematic overview of the computational design method.
Figure 2: The helical repeat protein universe.
Figure 3: Characterization of designed repeat proteins.
Figure 4: Crystal structures of 15 designs are in close agreement with the design models.

Accession codes

Data deposits

Crystal structures have been deposited in the RCSB Protein Data Bank with the accession numbers 5CWB (DHR4), 5CWC (DHR5), 5CWD (DHR7), 5CWF (DHR8), 5CWG (DHR10), 5CWH (DHR14), 5CWI (DHR18), 5CWJ (DHR49), 5CWK (DHR53), 5CWL (DHR54), 5CWM (DHR64), 5CWN (DHR71), 5CWO (DHR76), 5CWP (DHR79) and 5CWQ (DHR81).


  1. Kajava, A. V. Tandem repeats in proteins: from sequence to structure. J. Struct. Biol. 179, 279–288 (2012)

  2. Marcotte, E. M., Pellegrini, M., Yeates, T. O. & Eisenberg, D. A census of protein repeats. J. Mol. Biol. 293, 151–160 (1999)

  3. Binz, H. K. et al. High-affinity binders selected from designed ankyrin repeat protein libraries. Nature Biotechnol. 22, 575–582 (2004)

  4. Varadamsetty, G., Tremmel, D., Hansen, S., Parmeggiani, F. & Plückthun, A. Designed Armadillo repeat proteins: library generation, characterization and selection of peptide binders with high specificity. J. Mol. Biol. 424, 68–87 (2012)

  5. Cortajarena, A. L., Liu, T. Y., Hochstrasser, M. & Regan, L. Designed proteins to modulate cellular networks. ACS Chem. Biol. 5, 545–552 (2010)

  6. Kobe, B. & Kajava, A. V. When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci. 25, 509–515 (2000)

  7. Huang, P.-S., Feldmeier, K., Parmeggiani, F., Fernandez Velasco, D. A., Höcker, B. & Baker, D. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nature Chem. Biol. (2015)

  8. Cortajarena, A. L. & Regan, L. Calorimetric study of a series of designed repeat proteins: modular structure and modular folding. Protein Sci. 20, 336–340 (2011)

  9. Binz, H. K., Stumpp, M. T., Forrer, P., Amstutz, P. & Plückthun, A. Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J. Mol. Biol. 332, 489–503 (2003)

  10. Mosavi, L. K., Minor, D. L. & Peng, Z. Consensus-derived structural determinants of the ankyrin repeat motif. Proc. Natl Acad. Sci. USA 99, 16029–16034 (2002)

  11. Main, E. R. G., Xiong, Y., Cocco, M. J., D’Andrea, L. & Regan, L. Design of stable α-helical arrays from an idealized TPR motif. Structure 11, 497–508 (2003)

  12. Urvoas, A. et al. Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats. J. Mol. Biol. 404, 307–327 (2010)

  13. Lee, S.-C. et al. Design of a binding scaffold based on variable lymphocyte receptors of jawless vertebrates by module engineering. Proc. Natl Acad. Sci. USA 109, 3299–3304 (2012)

  14. Parmeggiani, F. et al. Designed Armadillo repeat proteins as general peptide-binding scaffolds: consensus design and computational optimization of the hydrophobic core. J. Mol. Biol. 376, 1282–1304 (2008)

  15. Yadid, I. & Tawfik, D. S. Reconstruction of functional β-propeller lectins via homo-oligomeric assembly of shorter fragments. J. Mol. Biol. 365, 10–17 (2007)

  16. Coquille, S. et al. An artificial PPR scaffold for programmable RNA recognition. Nature Commun. 5, 5729 (2014)

  17. Rämisch, S., Weininger, U., Martinsson, J., Akke, M. & André, I. Computational design of a leucine-rich repeat protein with a predefined geometry. Proc. Natl Acad. Sci. USA 111, 17875–17880 (2014)

  18. Lee, J. & Blaber, M. Experimental support for the evolution of symmetric protein architecture from a simple peptide motif. Proc. Natl Acad. Sci. USA 108, 126–130 (2011)

  19. Voet, A. R. D. et al. Computational design of a self-assembling symmetrical β-propeller protein. Proc. Natl Acad. Sci. USA 111, 15102–15107 (2014)

  20. Parmeggiani, F. et al. A general computational approach for repeat protein design. J. Mol. Biol. 427, 563–575 (2015)

  21. Tripp, K. W. & Barrick, D. Enhancing the stability and folding rate of a repeat protein through the addition of consensus repeats. J. Mol. Biol. 365, 1187–1200 (2007)

  22. Park, K. et al. Control of repeat-protein curvature by computational protein design. Nature Struct. Mol. Biol. 22, 167–174 (2015)

  23. Huang, P.-S. et al. RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS ONE 6, e24109 (2011)

  24. Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011)

  25. Huang, P.-S. et al. High thermodynamic stability of parametrically designed helical bundles. Science 346, 481–485 (2014)

  26. Bradley, P., Misura, K. M. S. & Baker, D. Toward high-resolution de novo structure prediction for small proteins. Science 309, 1868–1871 (2005)

  27. Rambo, R. P. & Tainer, J. A. Super-resolution in solution X-ray scattering and its applications to structural systems biology. Annu. Rev. Biophys. 42, 415–441 (2013)

  28. Hura, G. L. et al. Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nature Methods 6, 606–612 (2009)

  29. Rambo, R. P. & Tainer, J. A. Accurate assessment of mass, models and resolution by small-angle scattering. Nature 496, 477–481 (2013)

  30. Hura, G. L. et al. Comprehensive macromolecular conformations mapped by quantitative SAXS analyses. Nature Methods 10, 453–454 (2013)

  31. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

  32. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009)

  33. Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods 9, 173–175 (2012)

  34. Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290–D301 (2012)

  35. Waterhouse, A. M., Procter, J. B., Martin, D. M. A., Clamp, M. & Barton, G. J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009)

  36. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005)

  37. Di Domenico, T. et al. RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res. 42, D352–D357 (2014)

  38. Kabsch, W. XDS. Acta Crystallogr. D 66, 125–132 (2010)

  39. Adams, P. D. et al. PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr. D 58, 1948–1954 (2002)

  40. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126–2132 (2004)

  41. Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D 66, 12–21 (2010)

  42. Classen, S. et al. Implementation and performance of SIBYLS: a dual endstation small-angle X-ray scattering and macromolecular crystallography beamline at the Advanced Light Source. J. Appl. Crystallogr. 46, 1–13 (2013)

  43. Classen, S. et al. Software for the high-throughput collection of SAXS data using an enhanced Blu-Ice/DCS control system. J. Synchrotron Radiat. 17, 774–781 (2010)

  44. Schneidman-Duhovny, D., Hammel, M., Tainer, J. A. & Sali, A. Accurate SAXS profile computation and its assessment by contrast variation experiments. Biophys. J. 105, 962–974 (2013)

  45. Schneidman-Duhovny, D., Hammel, M. & Sali, A. FoXS: a web server for rapid computation and fitting of SAXS profiles. Nucleic Acids Res. 38, W540–W544 (2010)

  46. Svergun, D., Barberato, C. & Koch, M. H. J. CRYSOL – a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Crystallogr. 28, 768–773 (1995)

  47. Petoukhov, M. V. et al. New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Crystallogr. 45, 342–350 (2012)


We thank D. Kim and members of the protein production facility at the Institute for Protein Design. This work was facilitated through the use of advanced computational, storage and networking infrastructure provided by the Hyak supercomputer system at the University of Washington. This work was supported in part by grants from the National Science Foundation (NSF) (MCB-1445201 and CHE-1332907), the Defense Threat Reduction Agency (DTRA), the Air Force Office of Scientific Research (AFOSR) (FA950-12-10112) and the Howard Hughes Medical Institute (HHMI-027779). F.P. was the recipient of a Swiss National Science Foundation Postdoc Fellowship (PBZHP3-125470) and a Human Frontier Science Program Long-Term Fellowship (LT000070/2009-L). SAXS work at the Advanced Light Source SIBYLS beamline was supported by the National Institutes of Health grant MINOS (Macromolecular Insights on Nucleic Acids Optimized by Scattering) GM105404 and by United States Department of Energy program Integrated Diffraction Analysis Technologies (IDAT). D.C.E. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (Grant DRG-2140-12). G.B. is a recipient of the Merck fellowship of the Damon Runyon Cancer Research Foundation (DRG-2136-12) and is supported by NIH grant K99GM112982. J.A.T. is supported by a Robert A. Welch Distinguished Chair in Chemistry. We thank J. Holton for advice on S-SAD data collection, and the staff of ALS 8.2.1 and 8.3.1 for beamline support. The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the US Department of Energy under Contract No. DE-AC02-05CH11231. ALS beamline 8.3.1 is supported by the UC Office of the President, Multicampus Research Programs and Initiatives grant MR-15-338599 and the Program for Breakthrough Biomedical Research, which is partially funded by the Sandler Foundation. ALS beamline 8.2.1 and the Berkeley Center for Structural Biology are supported in part by the National Institutes of Health, National Institute of General Medical Sciences, and the Howard Hughes Medical Institute.

Author information

Contributions

P.-S.H., F.P. and D.B. conceived the de novo repeat protein design project. T.B., F.P., P.-S.H. and D.B. conceived the large-scale conformational sampling approach. T.B. developed the algorithm with help from F.P. and P.-S.H. F.P. and T.B. expressed and characterized the designs with help from P.-S.H. G.B. and D.C.E. set up crystallization trials and solved the crystal structures. F.P., S.E.T., G.L.H. and J.T. collected and analysed the SAXS data. F.P., T.B., P.-S.H. and D.B. wrote the manuscript with help from all the authors.

Corresponding author

Correspondence to David Baker.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Computational protocol for designing de novo repeat proteins.

a, Flowchart of the design protocol. The green box indicates user-controlled inputs, the grey boxes represent steps where protein structure is created or modified, and the white boxes indicate where structures are filtered. b, Low-resolution backbone build. c, Quick full-atom design (grey) improves the backbone model (red). The superposition in the middle highlights the structural changes introduced. d, Structural profile: a 9-residue fragment is matched against the Protein Data Bank repository for structures within 0.5 Å r.m.s.d. The sequences from these structures are used to generate a sequence profile that influences design. e, Packing filters were used to discard designs with cavities in the core, illustrated as grey spheres.
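
The structural-profile step in panel d can be illustrated with a minimal Python sketch. It assumes the fragment search has already returned the sequences of matching 9-residue fragments; the function name `sequence_profile`, the example sequences and the restriction to the 20 standard amino acids are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (assumed alphabet)

def sequence_profile(fragment_seqs):
    """Per-position amino-acid frequencies from equal-length fragment sequences."""
    if not fragment_seqs:
        raise ValueError("need at least one fragment sequence")
    length = len(fragment_seqs[0])
    if any(len(seq) != length for seq in fragment_seqs):
        raise ValueError("all fragments must have the same length")
    profile = []
    for pos in range(length):
        # Count how often each residue type occurs at this position.
        counts = Counter(seq[pos] for seq in fragment_seqs)
        profile.append({aa: counts.get(aa, 0) / len(fragment_seqs) for aa in ALPHABET})
    return profile

# Hypothetical sequences of fragments that matched one 9-residue backbone window.
matches = ["AEELLKKAL", "AEEALKKAL", "SEELLRKAL", "AEELLKKAI"]
profile = sequence_profile(matches)
consensus = "".join(max(column, key=column.get) for column in profile)
print(consensus)
```

A profile of this kind could then be used to bias residue choice at each position during design; how the bias is weighted is not specified in the caption.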

Extended Data Figure 2 Repeat space explored and model discrimination across design stages.

a–c, Percentage of models accepted at the backbone-building or centroid (a), design (b) and ab initio (c) stages. Models are divided according to secondary structure length. The combination of loop 1 and loop 2 lengths is indicated on top. The x and y axes indicate helix 1 and helix 2 lengths, respectively. The fraction of models in each bin that passed the selection stage is indicated in the side bar. Generally, one-residue loops and large differences between helix lengths reduce the number of selected models. d, Distribution of radius and twist of models in the three stages. e, Number of models passing each design stage (log scale). From ~2.8 million structures, 761 are accepted.

Extended Data Figure 3 Model validation by in silico folding.

To assess folding robustness, seven sequence variants were made for each design. a–g, Energy landscapes explored by Rosetta ab initio. Protein models produced by ab initio search are in red; those produced by side-chain repacking and minimization (relax) are in green. Models in deep global energy minima near the relaxed structures are considered folded. The variant with the highest density of ab initio models near the relax region was chosen for experimental characterization (blue box). h, Jalview sequence alignment of the first 100 residues of the variants. The yellow bar height indicates sequence conservation, while the black bar represents how often the consensus sequence occurs.
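
The variant selection described above can be sketched as follows. This is a simplified proxy, not the published criterion: it scores each variant only by the fraction of ab initio models whose r.m.s.d. to the design falls below a cutoff, and ignores energies. The 2.0 Å cutoff, the function names and the example values are assumptions made for illustration.

```python
def near_design_fraction(decoy_rmsds, cutoff=2.0):
    """Fraction of ab initio models within `cutoff` Å r.m.s.d. of the design model."""
    if not decoy_rmsds:
        return 0.0
    return sum(1 for rmsd in decoy_rmsds if rmsd < cutoff) / len(decoy_rmsds)

def pick_variant(rmsds_by_variant, cutoff=2.0):
    """Return the variant name with the highest near-design fraction."""
    return max(
        rmsds_by_variant,
        key=lambda name: near_design_fraction(rmsds_by_variant[name], cutoff),
    )

# Hypothetical r.m.s.d. values (Å) for ab initio models of three sequence variants.
decoys = {
    "variant_1": [1.2, 1.8, 6.5, 9.1],
    "variant_2": [0.9, 1.1, 1.5, 7.4],
    "variant_3": [5.2, 8.8, 10.3, 11.0],
}
best = pick_variant(decoys)
print(best)
```

A fuller treatment would also require that the near-design models sit at the bottom of the energy landscape, as the caption specifies.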

Extended Data Figure 4 Distribution of DHR axial displacement (z) and twist (ω).

Parameters for repeat protein family representatives were extracted as described in the Supplementary Information. The DHR models are the 761 proteins validated by in silico folding.
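
For readers unfamiliar with these parameters, the sketch below shows one standard way to obtain twist, axial displacement and radius from the rigid-body transform that maps one repeat onto the next. It is a generic screw-axis decomposition in plain Python; the procedure actually used is the one given in the Supplementary Information and may differ. The function name and the example transform are assumptions.

```python
import math

def screw_parameters(R, t, centroid):
    """Decompose x' = R x + t (repeat n -> repeat n+1) into screw parameters.

    R is a 3x3 rotation matrix (nested lists), t a translation vector and
    centroid the centre of mass of repeat n. Returns (twist in radians,
    signed axial displacement, radius) in the units of the input coordinates.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    twist = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    sin_w = math.sin(twist)
    if sin_w < 1e-8:
        raise ValueError("twist near 0 or 180 degrees: axis undefined by this formula")
    # Rotation axis from the antisymmetric part of R.
    axis = [
        (R[2][1] - R[1][2]) / (2.0 * sin_w),
        (R[0][2] - R[2][0]) / (2.0 * sin_w),
        (R[1][0] - R[0][1]) / (2.0 * sin_w),
    ]
    # Displacement of the centroid between consecutive repeats.
    moved = [sum(R[i][j] * centroid[j] for j in range(3)) + t[i] for i in range(3)]
    step = [moved[i] - centroid[i] for i in range(3)]
    rise = sum(step[i] * axis[i] for i in range(3))  # component along the axis
    perp = [step[i] - rise * axis[i] for i in range(3)]
    chord = math.sqrt(sum(x * x for x in perp))
    radius = chord / (2.0 * math.sin(twist / 2.0))  # chord = 2 r sin(twist / 2)
    return twist, rise, radius

# Hypothetical transform: 30 degree rotation about z plus a 10 Å shift along z.
c, s = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
R_example = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
twist, rise, radius = screw_parameters(R_example, [0.0, 0.0, 10.0], [20.0, 0.0, 0.0])
print(round(math.degrees(twist), 3), round(rise, 3), round(radius, 3))
```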

Extended Data Figure 5 Superposition between single internal repeats (second repeat) of designs (grey) and crystal structures (yellow).

Aliphatic and aromatic side chains are in red and cysteines are in orange. DHR7 and DHR18 show intra-repeat disulfide bonds, while DHR4 and DHR81 form inter-repeat cystines. DHR5 does not form the expected S–S bond. Core side chains in the designs recapitulate the conformations observed in the crystal structures. Even when the backbone is shifted (for example, DHR5, 8, 15), rotamers are by and large correctly predicted.

Extended Data Figure 6 Structural validation by SAXS.

a, Vr values for the fit of SAXS profiles to design models (dark grey) and crystal structures (yellow). For 43 designs, models are within the range defined by crystal structures. DHR49 and DHR76 form dimers in solution and the models employed the configuration observed in the crystal structures. Designs showing aggregation in the scattering profiles, including DHR5, for which the structure was solved, were not included in this figure. b, c, Pairwise Vr similarity maps (ref. 30) of 43 design models: experimental-to-model profile similarity (b) and model-to-model profile similarity (c). Models that are similar to each other show off-diagonal correlation in c, and the same pattern is observed when compared to experimental data in b. The order of display was obtained by clustering the original designed models by structural similarity. The ability to reproduce characteristic patterns within a large set of designs indicates that the models are capturing the relative structural similarities between proteins in solution. The scores are colour coded, with red indicating best agreement and white indicating lack of agreement.
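
As a rough illustration of the Vr (volatility of ratio) score, the sketch below sums the relative change of the intensity ratio between neighbouring q points. This is a simplified reading of the metric introduced in ref. 30; it assumes both profiles are already sampled on the same q bins, and it omits the binning and normalization steps of the published procedure. The function name and example intensities are hypothetical.

```python
def volatility_of_ratio(i_a, i_b):
    """Simplified Vr between two SAXS intensity profiles on a shared q grid.

    The ratio R(q) = I_a(q) / I_b(q) is flat when the two profiles have the
    same shape, so its point-to-point relative variation is summed as a
    dissimilarity score (0 means identical shape).
    """
    if len(i_a) != len(i_b) or len(i_a) < 2:
        raise ValueError("profiles must share a q grid with at least two points")
    ratios = [a / b for a, b in zip(i_a, i_b)]
    return sum(
        abs(r1 - r2) / ((r1 + r2) / 2.0) for r1, r2 in zip(ratios, ratios[1:])
    )

# Hypothetical binned intensities: the first model differs from the experiment
# only by a scale factor, the second also differs in shape.
experiment = [100.0, 50.0, 20.0, 5.0]
model_same_shape = [50.0, 25.0, 10.0, 2.5]
model_other_shape = [50.0, 20.0, 12.0, 2.0]
vr_same = volatility_of_ratio(experiment, model_same_shape)
vr_other = volatility_of_ratio(experiment, model_other_shape)
print(vr_same, vr_other)
```

Written this way the score is insensitive to an overall scale factor between the two profiles, because each term depends only on the relative change of the ratio.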

Extended Data Figure 7 Designs are stable to chemical denaturation by guanidine hydrochloride (GuHCl).

Circular-dichroism-monitored GuHCl denaturation experiments were carried out for two designs for which crystal structures were solved (DHR4 and DHR14), two with overall shapes confirmed by SAXS (DHR21 and DHR62), and two with overall shapes inconsistent with SAXS (DHR17 and DHR67). In contrast to almost all native proteins, four of the six proteins do not denature at GuHCl concentrations up to 7.5 M. Both designs not confirmed by SAXS were extremely stable to GuHCl denaturation and hence are very well-folded proteins; the discrepancies between the computed and experimental SAXS profiles may be due to small amounts of oligomeric species or variation in overall twist.

Extended Data Figure 8 Structural similarity between DHRs and repeat protein families.

DHRs cluster separately from existing repeat proteins. DHRs are equally distributed between right-handed and left-handed repeats (in terms of repeat handedness), in contrast to known α-helical repeat proteins, which are mostly right-handed. This result indicates that the handedness observed in known families is not an intrinsic limitation of repeat protein structures. Repeat handedness, as defined by Kobe and Kajava (ref. 6), indicates the rotation of the main chain going from the N to the C terminus around the axis connecting the repeat centres of mass. The structural similarity tree was built using pairwise comparisons as measured by TM-score.
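
The TM-score used for the tree can be written down compactly. The sketch below evaluates the score for a fixed superposition, given the distances between aligned residue pairs; TM-align (ref. 36) additionally searches for the alignment and superposition that maximize it, which is not reproduced here. The 0.5 Å floor on d0 for very short chains is an assumption of this sketch, as are the function names and example values.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (Å) of the TM-score."""
    if target_length <= 15:
        return 0.5  # assumed floor; the cube-root formula is undefined here
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(aligned_distances, target_length):
    """TM-score for one superposition, normalized by the target length.

    aligned_distances holds the distance (Å) between each aligned residue
    pair; unaligned target residues contribute zero.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(aligned_distances) > target_length:
        raise ValueError("cannot align more residues than the target contains")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances) / target_length

# Hypothetical 100-residue target: a perfect superposition scores 1, and pairs
# that all sit exactly d0 apart score 0.5.
length = 100
perfect = tm_score([0.0] * length, length)
at_d0 = tm_score([tm_d0(length)] * length, length)
print(perfect, at_d0)
```

A distance such as 1 minus the TM-score for each pair of structures could then feed a standard hierarchical clustering to produce a similarity tree; the clustering method used for this figure is not stated in the caption.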

Extended Data Figure 9 Extended versions of models validated by SAXS and crystallography.

DHRs were characterized as containing four repeats but the number of internal repeats can be increased without additional design steps. Extended models highlight the differences in twist and radius between the validated designs.

Supplementary information

Supplementary Information

This file contains Supplementary Discussions 1-5, Supplementary Tables 1-16 and additional references. (PDF 1156 kb)

Supplementary Data 1

This file contains Experimental Data part 1. (PDF 31983 kb)

Supplementary Data 2

This file contains Experimental Data part 2. (PDF 14051 kb)

Supplementary Data 3

This file contains Experimental Data part 3. (PDF 35230 kb)

Supplementary Data 4

This file contains Experimental Data part 4. (PDF 15058 kb)

Cite this article

Brunette, T., Parmeggiani, F., Huang, PS. et al. Exploring the repeat protein universe through computational protein design. Nature 528, 580–584 (2015).
