A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit1 are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes2. Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications3,4,5. Here we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix–loop–helix–loop structural motif. Eighty-three designs with sequences unrelated to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models with root mean square deviations ranging from 0.7 to 2.5 Å. Our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.
Protein Data Bank
Crystal structures have been deposited in the RCSB Protein Data Bank with the accession numbers 5CWB (DHR4), 5CWC (DHR5), 5CWD (DHR7), 5CWF (DHR8), 5CWG (DHR10), 5CWH (DHR14), 5CWI (DHR18), 5CWJ (DHR49), 5CWK (DHR53), 5CWL (DHR54), 5CWM (DHR64), 5CWN (DHR71), 5CWO (DHR76), 5CWP (DHR79) and 5CWQ (DHR81).
We thank D. Kim and members of the protein production facility at the Institute for Protein Design. This work was facilitated through the use of advanced computational, storage and networking infrastructure provided by the Hyak supercomputer system at the University of Washington. This work was supported in part by grants from the National Science Foundation (NSF) (MCB-1445201 and CHE-1332907), the Defense Threat Reduction Agency (DTRA), the Air Force Office of Scientific Research (AFOSR) (FA950-12-10112) and the Howard Hughes Medical Institute (HHMI-027779). F.P. was the recipient of a Swiss National Science Foundation Postdoc Fellowship (PBZHP3-125470) and a Human Frontier Science Program Long-Term Fellowship (LT000070/2009-L). SAXS work at the Advanced Light Source SIBYLS beamline was supported by the National Institutes of Health grant MINOS (Macromolecular Insights on Nucleic Acids Optimized by Scattering) GM105404 and by the United States Department of Energy program Integrated Diffraction Analysis Technologies (IDAT). D.C.E. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (Grant DRG-2140-12). G.B. is a recipient of the Merck fellowship of the Damon Runyon Cancer Research Foundation (DRG-2136-12) and is supported by NIH grant K99GM112982. J.A.T. is supported by a Robert A. Welch Distinguished Chair in Chemistry. We thank J. Holton for advice on S-SAD data collection, and the staff of ALS 8.2.1 and 8.3.1 for beamline support. The Advanced Light Source is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the US Department of Energy under Contract No. DE-AC02-05CH11231. ALS beamline 8.3.1 is supported by the UC Office of the President, Multicampus Research Programs and Initiatives grant MR-15-338599 and the Program for Breakthrough Biomedical Research, which is partially funded by the Sandler Foundation.
ALS beamline 8.2.1 and the Berkeley Center for Structural Biology are supported in part by the National Institutes of Health, National Institute of General Medical Sciences, and the Howard Hughes Medical Institute.
This file contains Supplementary Discussions 1-5, Supplementary Tables 1-16 and additional references.
This file contains Experimental Data part 1.
This file contains Experimental Data part 2.
This file contains Experimental Data part 3.
This file contains Experimental Data part 4.
About this article
Scientific Reports (2017)