Letter

Exploring the repeat protein universe through computational protein design

Nature volume 528, pages 580–584 (24 December 2015)


A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit [1] are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes [2]. Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications [3,4,5]. Here we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix–loop–helix–loop structural motif. Eighty-three designs with sequences unrelated to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models, with root mean square deviations ranging from 0.7 to 2.5 Å. Our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.
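The agreement between crystal structures and design models is reported above as a root mean square deviation (RMSD). As a minimal sketch of what that number measures, the Python below computes the RMSD between two sets of paired atomic coordinates. It assumes the two structures have already been optimally superimposed (in practice a superposition step, for example the Kabsch algorithm, comes first), and the function name `rmsd` and the toy coordinates are hypothetical illustrations, not data or code from the paper.

```python
import math


def rmsd(model, crystal):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumes the coordinate sets are already superimposed and listed in the
    same atom order; no alignment is performed here.
    """
    if len(model) != len(crystal) or not model:
        raise ValueError("coordinate sets must be non-empty and the same length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, crystal):
        # Squared distance between each pair of equivalent atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


# Hypothetical toy coordinates in Å, not taken from any deposited structure.
design_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal_structure = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(f"{rmsd(design_model, crystal_structure):.2f} Å")  # prints "1.00 Å"
```

In this toy case every atom is displaced by exactly 1 Å, so the RMSD is 1 Å. Values of 0.7 to 2.5 Å on that scale indicate that equivalent atoms in the model and the crystal structure lie, on average, within a few ångströms of each other.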



Data deposits

Crystal structures have been deposited in the RCSB Protein Data Bank with the accession numbers 5CWB (DHR4), 5CWC (DHR5), 5CWD (DHR7), 5CWF (DHR8), 5CWG (DHR10), 5CWH (DHR14), 5CWI (DHR18), 5CWJ (DHR49), 5CWK (DHR53), 5CWL (DHR54), 5CWM (DHR64), 5CWN (DHR71), 5CWO (DHR76), 5CWP (DHR79) and 5CWQ (DHR81).


  1. Tandem repeats in proteins: from sequence to structure. J. Struct. Biol. 179, 279–288 (2012)

  2. A census of protein repeats. J. Mol. Biol. 293, 151–160 (1999)

  3. et al. High-affinity binders selected from designed ankyrin repeat protein libraries. Nature Biotechnol. 22, 575–582 (2004)

  4. Designed Armadillo repeat proteins: library generation, characterization and selection of peptide binders with high specificity. J. Mol. Biol. 424, 68–87 (2012)

  5. Designed proteins to modulate cellular networks. ACS Chem. Biol. 5, 545–552 (2010)

  6. When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci. 25, 509–515 (2000)

  7. De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nature Chem. Biol. (2015)

  8. Calorimetric study of a series of designed repeat proteins: modular structure and modular folding. Protein Sci. 20, 336–340 (2011)

  9. Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J. Mol. Biol. 332, 489–503 (2003)

  10. Consensus-derived structural determinants of the ankyrin repeat motif. Proc. Natl Acad. Sci. USA 99, 16029–16034 (2002)

  11. Design of stable α-helical arrays from an idealized TPR motif. Structure 11, 497–508 (2003)

  12. et al. Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats. J. Mol. Biol. 404, 307–327 (2010)

  13. et al. Design of a binding scaffold based on variable lymphocyte receptors of jawless vertebrates by module engineering. Proc. Natl Acad. Sci. USA 109, 3299–3304 (2012)

  14. et al. Designed Armadillo repeat proteins as general peptide-binding scaffolds: consensus design and computational optimization of the hydrophobic core. J. Mol. Biol. 376, 1282–1304 (2008)

  15. Reconstruction of functional β-propeller lectins via homo-oligomeric assembly of shorter fragments. J. Mol. Biol. 365, 10–17 (2007)

  16. et al. An artificial PPR scaffold for programmable RNA recognition. Nature Commun. 5, 5729 (2014)

  17. Computational design of a leucine-rich repeat protein with a predefined geometry. Proc. Natl Acad. Sci. USA 111, 17875–17880 (2014)

  18. Experimental support for the evolution of symmetric protein architecture from a simple peptide motif. Proc. Natl Acad. Sci. USA 108, 126–130 (2011)

  19. et al. Computational design of a self-assembling symmetrical β-propeller protein. Proc. Natl Acad. Sci. USA 111, 15102–15107 (2014)

  20. et al. A general computational approach for repeat protein design. J. Mol. Biol. 427, 563–575 (2015)

  21. Enhancing the stability and folding rate of a repeat protein through the addition of consensus repeats. J. Mol. Biol. 365, 1187–1200 (2007)

  22. et al. Control of repeat-protein curvature by computational protein design. Nature Struct. Mol. Biol. 22, 167–174 (2015)

  23. et al. RosettaRemodel: a generalized framework for flexible backbone protein design. PLoS ONE 6, e24109 (2011)

  24. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011)

  25. et al. High thermodynamic stability of parametrically designed helical bundles. Science 346, 481–485 (2014)

  26. Toward high-resolution de novo structure prediction for small proteins. Science 309, 1868–1871 (2005)

  27. Super-resolution in solution X-ray scattering and its applications to structural systems biology. Annu. Rev. Biophys. 42, 415–441 (2013)

  28. et al. Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nature Methods 6, 606–612 (2009)

  29. Accurate assessment of mass, models and resolution by small-angle scattering. Nature 496, 477–481 (2013)

  30. et al. Comprehensive macromolecular conformations mapped by quantitative SAXS analyses. Nature Methods 10, 453–454 (2013)

  31. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

  32. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009)

  33. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods 9, 173–175 (2012)

  34. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290–D301 (2012)

  35. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009)

  36. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005)

  37. et al. RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res. 42, D352–D357 (2014)

  38. XDS. Acta Crystallogr. D 66, 125–132 (2010)

  39. et al. PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr. D 58, 1948–1954 (2002)

  40. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126–2132 (2004)

  41. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D 66, 12–21 (2010)

  42. et al. Implementation and performance of SIBYLS: a dual endstation small-angle X-ray scattering and macromolecular crystallography beamline at the Advanced Light Source. J. Appl. Crystallogr. 46, 1–13 (2013)

  43. et al. Software for the high-throughput collection of SAXS data using an enhanced Blu-Ice/DCS control system. J. Synchrotron Radiat. 17, 774–781 (2010)

  44. Accurate SAXS profile computation and its assessment by contrast variation experiments. Biophys. J. 105, 962–974 (2013)

  45. FoXS: a web server for rapid computation and fitting of SAXS profiles. Nucleic Acids Res. 38, W540–W544 (2010)

  46. CRYSOL – a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Crystallogr. 28, 768–773 (1995)

  47. et al. New developments in the ATSAS program package for small-angle scattering data analysis. J. Appl. Crystallogr. 45, 342–350 (2012)



We thank D. Kim and members of the protein production facility at the Institute for Protein Design. This work was facilitated through the use of advanced computational, storage and networking infrastructure provided by the Hyak supercomputer system at the University of Washington. This work was supported in part by grants from the National Science Foundation (NSF) (MCB-1445201 and CHE-1332907), the Defense Threat Reduction Agency (DTRA), the Air Force Office of Scientific Research (AFOSR) (FA950-12-10112) and the Howard Hughes Medical Institute (HHMI-027779). F.P. was the recipient of a Swiss National Science Foundation Postdoc Fellowship (PBZHP3-125470) and a Human Frontier Science Program Long-Term Fellowship (LT000070/2009-L). SAXS work at the Advanced Light Source SIBYLS beamline was supported by the National Institutes of Health grant MINOS (Macromolecular Insights on Nucleic Acids Optimized by Scattering) GM105404 and by the United States Department of Energy program Integrated Diffraction Analysis Technologies (IDAT). D.C.E. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (Grant DRG-2140-12). G.B. is a recipient of the Merck fellowship of the Damon Runyon Cancer Research Foundation (DRG-2136-12) and is supported by NIH grant K99GM112982. J.A.T. is supported by a Robert A. Welch Distinguished Chair in Chemistry. We thank J. Holton for advice on S-SAD data collection, and the staff of ALS 8.2.1 and 8.3.1 for beamline support. The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the US Department of Energy under Contract No. DE-AC02-05CH11231. ALS beamline 8.3.1 is supported by the UC Office of the President, Multicampus Research Programs and Initiatives grant MR-15-338599 and the Program for Breakthrough Biomedical Research, which is partially funded by the Sandler Foundation. ALS beamline 8.2.1 and the Berkeley Center for Structural Biology are supported in part by the National Institutes of Health, National Institute of General Medical Sciences, and the Howard Hughes Medical Institute.

Author information

Author notes

    • TJ Brunette, Fabio Parmeggiani & Po-Ssu Huang

    These authors contributed equally to this work.


  1. Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA

    • TJ Brunette, Fabio Parmeggiani, Po-Ssu Huang & David Baker

  2. Institute for Protein Design, University of Washington, Seattle, Washington 98195, USA

    • TJ Brunette, Fabio Parmeggiani, Po-Ssu Huang & David Baker

  3. Department of Cellular and Molecular Pharmacology, UCSF, San Francisco, California 94158, USA

    • Gira Bhabha

  4. Department of Microbiology and Immunology, UCSF, San Francisco, California 94158, USA

    • Damian C. Ekiert

  5. Molecular Biophysics & Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA

    • Susan E. Tsutakawa, Greg L. Hura & John A. Tainer

  6. Department of Chemistry and Biochemistry, University of California, Santa Cruz, California 95064, USA

    • Greg L. Hura

  7. Department of Molecular and Cellular Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA

    • John A. Tainer

  8. Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA

    • David Baker



P.-S.H., F.P. and D.B. conceived the de novo repeat protein design project. T.B., F.P., P.-S.H. and D.B. conceived the large-scale conformational sampling approach. T.B. developed the algorithm with help from F.P. and P.-S.H. F.P. and T.B. expressed and characterized the designs with help from P.-S.H. G.B. and D.C.E. set up crystallization trials and solved the crystal structures. F.P., S.E.T., G.L.H. and J.A.T. collected and analysed the SAXS data. F.P., T.B., P.-S.H. and D.B. wrote the manuscript with help from all the authors.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to David Baker.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Discussions 1–5, Supplementary Tables 1–16 and additional references.

  2. Supplementary Data 1

    This file contains Experimental Data part 1.

  3. Supplementary Data 2

    This file contains Experimental Data part 2.

  4. Supplementary Data 3

    This file contains Experimental Data part 3.

  5. Supplementary Data 4

    This file contains Experimental Data part 4.
