Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Evolutionary information for specifying a protein fold


Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures. The artificial proteins show thermodynamic stabilities similar to natural WW domains, and structure determination of one artificial protein shows excellent agreement with the WW fold at atomic resolution. The relative simplicity of the information used for creating sequences suggests a marked reduction to the potential complexity of the protein-folding problem.

Your institute does not have access to this article

Relevant articles

Open Access articles citing this article.

Access options

Buy article

Get time limited or full article access on ReadCube.


All prices are NET prices.

Figure 1: SCA-based protein design.
Figure 2: Experimental evaluation of artificial proteins (part 1).
Figure 3: Experimental evaluation of artificial proteins (part 2).
Figure 4: Summary of experiments on all natural and artificial WW sequences.
Figure 5: NMR structure determination of CC45, an artificial WW domain.


  1. Anfinsen, C. B. Principles that govern the folding of protein chains. Science 181, 223–230 (1973)

    ADS  CAS  Article  Google Scholar 

  2. Daggett, V. & Fersht, A. The present view of the mechanism of protein folding. Nature Rev. Mol. Cell Biol. 4, 497–502 (2003)

    CAS  Article  Google Scholar 

  3. Dinner, A. R., Sali, A., Smith, L. J., Dobson, C. M. & Karplus, M. Understanding protein folding via free-energy surfaces from theory and experiment. Trends Biochem. Sci. 25, 331–339 (2000)

    CAS  Article  Google Scholar 

  4. Hidalgo, P. & MacKinnon, R. Revealing the architecture of a K+ channel pore through mutant cycles with a peptide inhibitor. Science 268, 307–310 (1995)

    ADS  CAS  Article  Google Scholar 

  5. Horovitz, A. & Fersht, A. R. Co-operative interactions during protein folding. J. Mol. Biol. 224, 733–740 (1992)

    CAS  Article  Google Scholar 

  6. Chen, J. & Stites, W. E. Higher-order packing interactions in triple and quadruple mutants of staphylococcal nuclease. Biochemistry 40, 14012–14019 (2001)

    CAS  Article  Google Scholar 

  7. LiCata, V. J. & Ackers, G. K. Long-range, small magnitude nonadditivity of mutational effects in proteins. Biochemistry 34, 3133–3139 (1995)

    CAS  Article  Google Scholar 

  8. Luque, I., Leavitt, S. A. & Freire, E. The linkage between protein folding and functional cooperativity: two sides of the same coin? Annu. Rev. Biophys. Biomol. Struct. 31, 235–256 (2002)

    CAS  Article  Google Scholar 

  9. Gerstein, M. & Chothia, C. Packing at the protein-water interface. Proc. Natl Acad. Sci. USA 93, 10167–10172 (1996)

    ADS  CAS  Article  Google Scholar 

  10. Richards, F. M. & Lim, W. A. An analysis of packing in the protein folding problem. Q. Rev. Biophys. 26, 423–498 (1993)

    CAS  Article  Google Scholar 

  11. Lichtarge, O., Bourne, H. R. & Cohen, F. E. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257, 342–358 (1996)

    CAS  Article  Google Scholar 

  12. Lockless, S. W. & Ranganathan, R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286, 295–299 (1999)

    CAS  Article  Google Scholar 

  13. Hatley, M. E., Lockless, S. W., Gibson, S. K., Gilman, A. G. & Ranganathan, R. Allosteric determinants in guanine nucleotide-binding proteins. Proc. Natl Acad. Sci. USA 100, 14445–14450 (2003)

    ADS  CAS  Article  Google Scholar 

  14. Shulman, A. I., Larson, C., Mangelsdorf, D. J. & Ranganathan, R. Structural determinants of allosteric ligand activation in RXR heterodimers. Cell 116, 417–429 (2004)

    CAS  Article  Google Scholar 

  15. Suel, G. M., Lockless, S. W., Wall, M. A. & Ranganathan, R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nature Struct. Biol. 10, 59–69 (2003)

    Article  Google Scholar 

  16. Peterson, F. C., Penkert, R. R., Volkman, B. F. & Prehoda, K. E. Cdc42 regulates the Par-6 PDZ domain through an allosteric CRIB-PDZ transition. Mol. Cell 13, 665–676 (2004)

    CAS  Article  Google Scholar 

  17. Fuentes, E. J., Der, C. J. & Lee, A. L. Ligand-dependent dynamics and intramolecular signalling in a PDZ domain. J. Mol. Biol. 335, 1105–1115 (2004)

    CAS  Article  Google Scholar 

  18. Estabrook, R. A. et al. Statistical coevolution analysis and molecular dynamics: identification of amino acid pairs essential for catalysis. Proc. Natl Acad. Sci. USA 102, 994–999 (2005)

    ADS  CAS  Article  Google Scholar 

  19. Ota, N. & Agard, D. A. Intramolecular signalling pathways revealed by modeling anisotropic thermal diffusion. J. Mol. Biol. 351, 345–354 (2005)

    CAS  Article  Google Scholar 

  20. Fodor, A. A. & Aldrich, R. W. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins 56, 211–221 (2004)

    CAS  Article  Google Scholar 

  21. Fodor, A. A. & Aldrich, R. W. On evolutionary conservation of thermodynamic coupling in proteins. J. Biol. Chem. 279, 19046–19050 (2004)

    CAS  Article  Google Scholar 

  22. Russ, W. P., Lowery, D. M., Mishra, P., Yaffe, M. B. & Ranganathan, R. Natural-like function in artificial WW domains. Nature doi:10.1038/nature03990 (this issue)

  23. Macias, M. J. et al. Structure of the WW domain of a kinase-associated protein complexed with a proline-rich peptide. Nature 382, 646–649 (1996)

    ADS  CAS  Article  Google Scholar 

  24. Bork, P. & Sudol, M. The WW domain: a signalling site in dystrophin? Trends Biochem. Sci. 19, 531–533 (1994)

    CAS  Article  Google Scholar 

  25. Jager, M., Nguyen, H., Crane, J. C., Kelly, J. W. & Gruebele, M. The folding mechanism of a β-sheet: the WW domain. J. Mol. Biol. 311, 373–393 (2001)

    CAS  Article  Google Scholar 

  26. Koepf, E. K., Petrassi, H. M., Sudol, M. & Kelly, J. W. WW: An isolated three-stranded antiparallel β-sheet domain that unfolds and refolds reversibly; evidence for a structured hydrophobic cluster in urea and GdnHCl and a disordered thermal unfolded state. Protein Sci. 8, 841–853 (1999)

    CAS  Article  Google Scholar 

  27. Bashford, D., Chothia, C. & Lesk, A. M. Determinants of a protein fold. Unique features of the globin amino acid sequences. J. Mol. Biol. 196, 199–216 (1987)

    CAS  Article  Google Scholar 

  28. Wintjens, R. et al. 1H NMR study on the binding of Pin1 Trp-Trp domain with phosphothreonine peptides. J. Biol. Chem. 276, 25150–25156 (2001)

    CAS  Article  Google Scholar 

  29. Kanelis, V., Rotin, D. & Forman-Kay, J. D. Solution structure of a Nedd4 WW domain-ENaC peptide complex. Nature Struct. Biol. 8, 407–412 (2001)

    CAS  Article  Google Scholar 

  30. Schreiber, G. & Fersht, A. R. Energetics of protein-protein interactions: analysis of the barnase-barstar interface by single mutations and double mutant cycles. J. Mol. Biol. 248, 478–486 (1995)

    CAS  PubMed  Google Scholar 

  31. Eriksson, A. E. et al. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science 255, 178–183 (1992)

    ADS  CAS  Article  Google Scholar 

  32. Bowie, J. U., Reidhaar-Olson, J. F., Lim, W. A. & Sauer, R. T. Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247, 1306–1310 (1990)

    ADS  CAS  Article  Google Scholar 

  33. Liang, J. & Dill, K. A. Are proteins well-packed? Biophys. J. 81, 751–766 (2001)

    ADS  CAS  Article  Google Scholar 

  34. Lindorff-Larsen, K., Best, R. B., Depristo, M. A., Dobson, C. M. & Vendruscolo, M. Simultaneous determination of protein structure and dynamics. Nature 433, 128–132 (2005)

    ADS  CAS  Article  Google Scholar 

  35. Benkovic, S. J. & Hammes-Schiffer, S. A perspective on enzyme catalysis. Science 301, 1196–1202 (2003)

    ADS  CAS  Article  Google Scholar 

  36. Eisenmesser, E. Z., Bosco, D. A., Akke, M. & Kern, D. Enzyme dynamics during catalysis. Science 295, 1520–1523 (2002)

    ADS  CAS  Article  Google Scholar 

  37. Frauenfelder, H., McMahon, B. H. & Fenimore, P. W. Myoglobin: the hydrogen atom of biology and a paradigm of complexity. Proc. Natl Acad. Sci. USA 100, 8615–8617 (2003)

    ADS  CAS  Article  Google Scholar 

  38. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

    CAS  Article  Google Scholar 

  39. Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994)

    CAS  Article  Google Scholar 

  40. Huang, X. et al. Structure of a WW domain containing fragment of dystrophin in complex with β-dystroglycan. Nature Struct. Biol. 7, 634–638 (2000)

    CAS  Article  Google Scholar 

  41. Kasanov, J., Pirozzi, G., Uveges, A. J. & Kay, B. K. Characterizing Class I WW domains defines key specificity determinants and generates mutant domains with novel specificities. Chem. Biol. 8, 231–241 (2001)

    CAS  Article  Google Scholar 

  42. John, D. M. & Weeks, K. M. van't Hoff enthalpies without baselines. Protein Sci. 9, 1416–1419 (2000)

    CAS  Article  Google Scholar 

  43. Sklenar, V., Piotto, M., Leppik, R. & Saudek, V. Gradient-tailored water suppression for 1H-15N HSQC experiments optimized to retain full sensitivity. J. Magn. Reson. A 102, 241–245 (1993)

    ADS  CAS  Article  Google Scholar 

  44. Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995)

    CAS  Article  Google Scholar 

  45. Johnson, B. A. & Blevins, R. A. NMRView: a computer program for the visualization and analysis of NMR data. J. Biomol. NMR 4, 603–614 (1994)

    CAS  Article  Google Scholar 

  46. Brünger, A. T. et al. Crystallography and NMR system (CNS): a new software system for macromolecular structure determination. Acta Crystallogr. D 54, 905–921 (1998)

    Article  Google Scholar 

  47. Linge, J. P., O'Donoghue, S. I. & Nilges, M. Methods in Enzymology 71–90 (Academic Press, 2001)

    Google Scholar 

  48. Cornilescu, G., Delaglio, F. & Bax, A. Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13, 289–302 (1999)

    CAS  Article  Google Scholar 

  49. Laskowski, R. A., Rullman, J. A. C., MacArthur, M. W., Kaptein, R. & Thornton, J. M. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8, 477–486 (1996)

    CAS  Article  Google Scholar 

  50. Koradi, R., Billeter, M. & Wüthrich, K. MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph. 14, 51–55 (1996)

    CAS  Article  Google Scholar 

  51. Delano, W. L. The PyMOL Molecular Graphics System (2002).

Download references


We thank A. G. Gilman, M. Rosen, and members of the Ranganathan laboratory for advice and critical review of the manuscript, and E. Olson for contributing to this project. R.R. and S.W.L. thank the students of the 2001 Woods Hole physiology course for participating in an initial phase of this work. This study was supported by the Robert A. Welch foundation (R.R. and K.H.G.), the Mallinckrodt Foundation Scholar Award (R.R.), and the NIH (K.H.G.). W.P.R. is an associate and R.R. is an investigator of the Howard Hughes Medical Institute.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Rama Ranganathan.

Ethics declarations

Competing interests

Atomic coordinates for CC45 have been deposited in the Protein Data Bank under accession code 1YMZ. Reprints and permissions information is available at The authors declare no competing financial interests.

Supplementary information

Supplementary Notes

Supplementary Table S1, Supplementary Table S2, Supplementary Methods and Supplementary Figure Legends. (DOC 156 kb)

Supplementary Figure S1

Statistical coupling analysis for one site in the WW domain family. (PDF 578 kb)

Supplementary Figure S2

Summary of all sequences constructed and experimentally tested. (PDF 2191 kb)

Supplementary Figure S3

Assessment of expression and solubility of all sequences constructed. (PDF 12056 kb)

Supplementary Figure S4

Thermal denaturation and renaturation studies. (PDF 5249 kb)

Supplementary Figure S5

One-dimensional proton NMR spectra for natural WW domains (a), CC sequences (b), and IC sequences (c). (PDF 6444 kb)

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Socolich, M., Lockless, S., Russ, W. et al. Evolutionary information for specifying a protein fold. Nature 437, 512–518 (2005).

Download citation

  • Received:

  • Accepted:

  • Issue Date:

  • DOI:

Further reading


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing