Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures. The artificial proteins show thermodynamic stabilities similar to natural WW domains, and structure determination of one artificial protein shows excellent agreement with the WW fold at atomic resolution. The relative simplicity of the information used for creating sequences suggests a marked reduction to the potential complexity of the protein-folding problem.
Your institute does not have access to this article
Open Access articles citing this article.
Scientific Reports Open Access 17 January 2022
Nature Communications Open Access 02 November 2021
BMC Bioinformatics Open Access 10 June 2021
Subscribe to Journal
Get full journal access for 1 year
only $3.90 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.
Get time limited or full article access on ReadCube.
All prices are NET prices.
Anfinsen, C. B. Principles that govern the folding of protein chains. Science 181, 223–230 (1973)
Daggett, V. & Fersht, A. The present view of the mechanism of protein folding. Nature Rev. Mol. Cell Biol. 4, 497–502 (2003)
Dinner, A. R., Sali, A., Smith, L. J., Dobson, C. M. & Karplus, M. Understanding protein folding via free-energy surfaces from theory and experiment. Trends Biochem. Sci. 25, 331–339 (2000)
Hidalgo, P. & MacKinnon, R. Revealing the architecture of a K+ channel pore through mutant cycles with a peptide inhibitor. Science 268, 307–310 (1995)
Horovitz, A. & Fersht, A. R. Co-operative interactions during protein folding. J. Mol. Biol. 224, 733–740 (1992)
Chen, J. & Stites, W. E. Higher-order packing interactions in triple and quadruple mutants of staphylococcal nuclease. Biochemistry 40, 14012–14019 (2001)
LiCata, V. J. & Ackers, G. K. Long-range, small magnitude nonadditivity of mutational effects in proteins. Biochemistry 34, 3133–3139 (1995)
Luque, I., Leavitt, S. A. & Freire, E. The linkage between protein folding and functional cooperativity: two sides of the same coin? Annu. Rev. Biophys. Biomol. Struct. 31, 235–256 (2002)
Gerstein, M. & Chothia, C. Packing at the protein-water interface. Proc. Natl Acad. Sci. USA 93, 10167–10172 (1996)
Richards, F. M. & Lim, W. A. An analysis of packing in the protein folding problem. Q. Rev. Biophys. 26, 423–498 (1993)
Lichtarge, O., Bourne, H. R. & Cohen, F. E. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257, 342–358 (1996)
Lockless, S. W. & Ranganathan, R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286, 295–299 (1999)
Hatley, M. E., Lockless, S. W., Gibson, S. K., Gilman, A. G. & Ranganathan, R. Allosteric determinants in guanine nucleotide-binding proteins. Proc. Natl Acad. Sci. USA 100, 14445–14450 (2003)
Shulman, A. I., Larson, C., Mangelsdorf, D. J. & Ranganathan, R. Structural determinants of allosteric ligand activation in RXR heterodimers. Cell 116, 417–429 (2004)
Suel, G. M., Lockless, S. W., Wall, M. A. & Ranganathan, R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nature Struct. Biol. 10, 59–69 (2003)
Peterson, F. C., Penkert, R. R., Volkman, B. F. & Prehoda, K. E. Cdc42 regulates the Par-6 PDZ domain through an allosteric CRIB-PDZ transition. Mol. Cell 13, 665–676 (2004)
Fuentes, E. J., Der, C. J. & Lee, A. L. Ligand-dependent dynamics and intramolecular signalling in a PDZ domain. J. Mol. Biol. 335, 1105–1115 (2004)
Estabrook, R. A. et al. Statistical coevolution analysis and molecular dynamics: identification of amino acid pairs essential for catalysis. Proc. Natl Acad. Sci. USA 102, 994–999 (2005)
Ota, N. & Agard, D. A. Intramolecular signalling pathways revealed by modeling anisotropic thermal diffusion. J. Mol. Biol. 351, 345–354 (2005)
Fodor, A. A. & Aldrich, R. W. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins 56, 211–221 (2004)
Fodor, A. A. & Aldrich, R. W. On evolutionary conservation of thermodynamic coupling in proteins. J. Biol. Chem. 279, 19046–19050 (2004)
Russ, W. P., Lowery, D. M., Mishra, P., Yaffe, M. B. & Ranganathan, R. Natural-like function in artificial WW domains. Nature doi:10.1038/nature03990 (this issue)
Macias, M. J. et al. Structure of the WW domain of a kinase-associated protein complexed with a proline-rich peptide. Nature 382, 646–649 (1996)
Bork, P. & Sudol, M. The WW domain: a signalling site in dystrophin? Trends Biochem. Sci. 19, 531–533 (1994)
Jager, M., Nguyen, H., Crane, J. C., Kelly, J. W. & Gruebele, M. The folding mechanism of a β-sheet: the WW domain. J. Mol. Biol. 311, 373–393 (2001)
Koepf, E. K., Petrassi, H. M., Sudol, M. & Kelly, J. W. WW: An isolated three-stranded antiparallel β-sheet domain that unfolds and refolds reversibly; evidence for a structured hydrophobic cluster in urea and GdnHCl and a disordered thermal unfolded state. Protein Sci. 8, 841–853 (1999)
Bashford, D., Chothia, C. & Lesk, A. M. Determinants of a protein fold. Unique features of the globin amino acid sequences. J. Mol. Biol. 196, 199–216 (1987)
Wintjens, R. et al. 1H NMR study on the binding of Pin1 Trp-Trp domain with phosphothreonine peptides. J. Biol. Chem. 276, 25150–25156 (2001)
Kanelis, V., Rotin, D. & Forman-Kay, J. D. Solution structure of a Nedd4 WW domain-ENaC peptide complex. Nature Struct. Biol. 8, 407–412 (2001)
Schreiber, G. & Fersht, A. R. Energetics of protein-protein interactions: analysis of the barnase-barstar interface by single mutations and double mutant cycles. J. Mol. Biol. 248, 478–486 (1995)
Eriksson, A. E. et al. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science 255, 178–183 (1992)
Bowie, J. U., Reidhaar-Olson, J. F., Lim, W. A. & Sauer, R. T. Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247, 1306–1310 (1990)
Liang, J. & Dill, K. A. Are proteins well-packed? Biophys. J. 81, 751–766 (2001)
Lindorff-Larsen, K., Best, R. B., Depristo, M. A., Dobson, C. M. & Vendruscolo, M. Simultaneous determination of protein structure and dynamics. Nature 433, 128–132 (2005)
Benkovic, S. J. & Hammes-Schiffer, S. A perspective on enzyme catalysis. Science 301, 1196–1202 (2003)
Eisenmesser, E. Z., Bosco, D. A., Akke, M. & Kern, D. Enzyme dynamics during catalysis. Science 295, 1520–1523 (2002)
Frauenfelder, H., McMahon, B. H. & Fenimore, P. W. Myoglobin: the hydrogen atom of biology and a paradigm of complexity. Proc. Natl Acad. Sci. USA 100, 8615–8617 (2003)
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994)
Huang, X. et al. Structure of a WW domain containing fragment of dystrophin in complex with β-dystroglycan. Nature Struct. Biol. 7, 634–638 (2000)
Kasanov, J., Pirozzi, G., Uveges, A. J. & Kay, B. K. Characterizing Class I WW domains defines key specificity determinants and generates mutant domains with novel specificities. Chem. Biol. 8, 231–241 (2001)
John, D. M. & Weeks, K. M. van't Hoff enthalpies without baselines. Protein Sci. 9, 1416–1419 (2000)
Sklenar, V., Piotto, M., Leppik, R. & Saudek, V. Gradient-tailored water suppression for 1H-15N HSQC experiments optimized to retain full sensitivity. J. Magn. Reson. A 102, 241–245 (1993)
Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995)
Johnson, B. A. & Blevins, R. A. NMRView: a computer program for the visualization and analysis of NMR data. J. Biomol. NMR 4, 603–614 (1994)
Brünger, A. T. et al. Crystallography and NMR system (CNS): a new software system for macromolecular structure determination. Acta Crystallogr. D 54, 905–921 (1998)
Linge, J. P., O'Donoghue, S. I. & Nilges, M. Methods in Enzymology 71–90 (Academic Press, 2001)
Cornilescu, G., Delaglio, F. & Bax, A. Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13, 289–302 (1999)
Laskowski, R. A., Rullman, J. A. C., MacArthur, M. W., Kaptein, R. & Thornton, J. M. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8, 477–486 (1996)
Koradi, R., Billeter, M. & Wüthrich, K. MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph. 14, 51–55 (1996)
Delano, W. L. The PyMOL Molecular Graphics Systemhttp://www.pymol.org (2002).
We thank A. G. Gilman, M. Rosen, and members of the Ranganathan laboratory for advice and critical review of the manuscript, and E. Olson for contributing to this project. R.R. and S.W.L. thank the students of the 2001 Woods Hole physiology course for participating in an initial phase of this work. This study was supported by the Robert A. Welch foundation (R.R. and K.H.G.), the Mallinckrodt Foundation Scholar Award (R.R.), and the NIH (K.H.G.). W.P.R. is an associate and R.R. is an investigator of the Howard Hughes Medical Institute.
Atomic coordinates for CC45 have been deposited in the Protein Data Bank under accession code 1YMZ. Reprints and permissions information is available at npg.nature.com/reprintsandpermissions. The authors declare no competing financial interests.
Supplementary Table S1, Supplementary Table S2, Supplementary Methods and Supplementary Figure Legends. (DOC 156 kb)
Statistical coupling analysis for one site in the WW domain family. (PDF 578 kb)
Summary of all sequences constructed and experimentally tested. (PDF 2191 kb)
Assessment of expression and solubility of all sequences constructed. (PDF 12056 kb)
Thermal denaturation and renaturation studies. (PDF 5249 kb)
One-dimensional proton NMR spectra for natural WW domains (a), CC sequences (b), and IC sequences (c). (PDF 6444 kb)
About this article
Cite this article
Socolich, M., Lockless, S., Russ, W. et al. Evolutionary information for specifying a protein fold. Nature 437, 512–518 (2005). https://doi.org/10.1038/nature03991
Scientific Reports (2022)
BMC Bioinformatics (2021)
Nature Biotechnology (2021)
Nature Machine Intelligence (2021)
Nature Communications (2021)