Evolutionary information for specifying a protein fold

Socolich, Michael; Lockless, Steve W.; Russ, William P.; Lee, Heather; Gardner, Kevin H.; Ranganathan, Rama

doi:10.1038/nature03991

Article
Published: 22 September 2005

Evolutionary information for specifying a protein fold

Michael Socolich^1,2^na1,
Steve W. Lockless^1,2^na1^nAff4,
William P. Russ^1,2,
Heather Lee^1,2,
Kevin H. Gardner^2,3 &
…
Rama Ranganathan^1,2

Nature volume 437, pages 512–518 (2005)Cite this article

6292 Accesses
313 Citations
14 Altmetric
Metrics details

Abstract

Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures. The artificial proteins show thermodynamic stabilities similar to natural WW domains, and structure determination of one artificial protein shows excellent agreement with the WW fold at atomic resolution. The relative simplicity of the information used for creating sequences suggests a marked reduction to the potential complexity of the protein-folding problem.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

Figure 2: **Experimental evaluation of artificial proteins (part 1).**

Figure 3: **Experimental evaluation of artificial proteins (part 2).**

Figure 4: **Summary of experiments on all natural and artificial WW sequences.**

Figure 5: **NMR structure determination of CC45, an artificial WW domain.**

Deciphering protein evolution and fitness landscapes with latent space models

Article Open access 10 December 2019

Xinqiang Ding, Zhengting Zou & Charles L. Brooks III

A physics-based energy function allows the computational redesign of a PDZ domain

Article Open access 07 July 2020

Vaitea Opuu, Young Joo Sun, … Thomas Simonson

The developing toolkit of continuous directed evolution

Article 22 May 2020

Mary S. Morrison, Christopher J. Podracky & David R. Liu

References

Anfinsen, C. B. Principles that govern the folding of protein chains. Science 181, 223–230 (1973)
Article ADS CAS Google Scholar
Daggett, V. & Fersht, A. The present view of the mechanism of protein folding. Nature Rev. Mol. Cell Biol. 4, 497–502 (2003)
Article CAS Google Scholar
Dinner, A. R., Sali, A., Smith, L. J., Dobson, C. M. & Karplus, M. Understanding protein folding via free-energy surfaces from theory and experiment. Trends Biochem. Sci. 25, 331–339 (2000)
Article CAS Google Scholar
Hidalgo, P. & MacKinnon, R. Revealing the architecture of a K⁺ channel pore through mutant cycles with a peptide inhibitor. Science 268, 307–310 (1995)
Article ADS CAS Google Scholar
Horovitz, A. & Fersht, A. R. Co-operative interactions during protein folding. J. Mol. Biol. 224, 733–740 (1992)
Article CAS Google Scholar
Chen, J. & Stites, W. E. Higher-order packing interactions in triple and quadruple mutants of staphylococcal nuclease. Biochemistry 40, 14012–14019 (2001)
Article CAS Google Scholar
LiCata, V. J. & Ackers, G. K. Long-range, small magnitude nonadditivity of mutational effects in proteins. Biochemistry 34, 3133–3139 (1995)
Article CAS Google Scholar
Luque, I., Leavitt, S. A. & Freire, E. The linkage between protein folding and functional cooperativity: two sides of the same coin? Annu. Rev. Biophys. Biomol. Struct. 31, 235–256 (2002)
Article CAS Google Scholar
Gerstein, M. & Chothia, C. Packing at the protein-water interface. Proc. Natl Acad. Sci. USA 93, 10167–10172 (1996)
Article ADS CAS Google Scholar
Richards, F. M. & Lim, W. A. An analysis of packing in the protein folding problem. Q. Rev. Biophys. 26, 423–498 (1993)
Article CAS Google Scholar
Lichtarge, O., Bourne, H. R. & Cohen, F. E. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257, 342–358 (1996)
Article CAS Google Scholar
Lockless, S. W. & Ranganathan, R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286, 295–299 (1999)
Article CAS Google Scholar
Hatley, M. E., Lockless, S. W., Gibson, S. K., Gilman, A. G. & Ranganathan, R. Allosteric determinants in guanine nucleotide-binding proteins. Proc. Natl Acad. Sci. USA 100, 14445–14450 (2003)
Article ADS CAS Google Scholar
Shulman, A. I., Larson, C., Mangelsdorf, D. J. & Ranganathan, R. Structural determinants of allosteric ligand activation in RXR heterodimers. Cell 116, 417–429 (2004)
Article CAS Google Scholar
Suel, G. M., Lockless, S. W., Wall, M. A. & Ranganathan, R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nature Struct. Biol. 10, 59–69 (2003)
Article Google Scholar
Peterson, F. C., Penkert, R. R., Volkman, B. F. & Prehoda, K. E. Cdc42 regulates the Par-6 PDZ domain through an allosteric CRIB-PDZ transition. Mol. Cell 13, 665–676 (2004)
Article CAS Google Scholar
Fuentes, E. J., Der, C. J. & Lee, A. L. Ligand-dependent dynamics and intramolecular signalling in a PDZ domain. J. Mol. Biol. 335, 1105–1115 (2004)
Article CAS Google Scholar
Estabrook, R. A. et al. Statistical coevolution analysis and molecular dynamics: identification of amino acid pairs essential for catalysis. Proc. Natl Acad. Sci. USA 102, 994–999 (2005)
Article ADS CAS Google Scholar
Ota, N. & Agard, D. A. Intramolecular signalling pathways revealed by modeling anisotropic thermal diffusion. J. Mol. Biol. 351, 345–354 (2005)
Article CAS Google Scholar
Fodor, A. A. & Aldrich, R. W. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins 56, 211–221 (2004)
Article CAS Google Scholar
Fodor, A. A. & Aldrich, R. W. On evolutionary conservation of thermodynamic coupling in proteins. J. Biol. Chem. 279, 19046–19050 (2004)
Article CAS Google Scholar
Russ, W. P., Lowery, D. M., Mishra, P., Yaffe, M. B. & Ranganathan, R. Natural-like function in artificial WW domains. Nature doi:10.1038/nature03990 (this issue)
Macias, M. J. et al. Structure of the WW domain of a kinase-associated protein complexed with a proline-rich peptide. Nature 382, 646–649 (1996)
Article ADS CAS Google Scholar
Bork, P. & Sudol, M. The WW domain: a signalling site in dystrophin? Trends Biochem. Sci. 19, 531–533 (1994)
Article CAS Google Scholar
Jager, M., Nguyen, H., Crane, J. C., Kelly, J. W. & Gruebele, M. The folding mechanism of a β-sheet: the WW domain. J. Mol. Biol. 311, 373–393 (2001)
Article CAS Google Scholar
Koepf, E. K., Petrassi, H. M., Sudol, M. & Kelly, J. W. WW: An isolated three-stranded antiparallel β-sheet domain that unfolds and refolds reversibly; evidence for a structured hydrophobic cluster in urea and GdnHCl and a disordered thermal unfolded state. Protein Sci. 8, 841–853 (1999)
Article CAS Google Scholar
Bashford, D., Chothia, C. & Lesk, A. M. Determinants of a protein fold. Unique features of the globin amino acid sequences. J. Mol. Biol. 196, 199–216 (1987)
Article CAS Google Scholar
Wintjens, R. et al. 1H NMR study on the binding of Pin1 Trp-Trp domain with phosphothreonine peptides. J. Biol. Chem. 276, 25150–25156 (2001)
Article CAS Google Scholar
Kanelis, V., Rotin, D. & Forman-Kay, J. D. Solution structure of a Nedd4 WW domain-ENaC peptide complex. Nature Struct. Biol. 8, 407–412 (2001)
Article CAS Google Scholar
Schreiber, G. & Fersht, A. R. Energetics of protein-protein interactions: analysis of the barnase-barstar interface by single mutations and double mutant cycles. J. Mol. Biol. 248, 478–486 (1995)
CAS PubMed Google Scholar
Eriksson, A. E. et al. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science 255, 178–183 (1992)
Article ADS CAS Google Scholar
Bowie, J. U., Reidhaar-Olson, J. F., Lim, W. A. & Sauer, R. T. Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247, 1306–1310 (1990)
Article ADS CAS Google Scholar
Liang, J. & Dill, K. A. Are proteins well-packed? Biophys. J. 81, 751–766 (2001)
Article ADS CAS Google Scholar
Lindorff-Larsen, K., Best, R. B., Depristo, M. A., Dobson, C. M. & Vendruscolo, M. Simultaneous determination of protein structure and dynamics. Nature 433, 128–132 (2005)
Article ADS CAS Google Scholar
Benkovic, S. J. & Hammes-Schiffer, S. A perspective on enzyme catalysis. Science 301, 1196–1202 (2003)
Article ADS CAS Google Scholar
Eisenmesser, E. Z., Bosco, D. A., Akke, M. & Kern, D. Enzyme dynamics during catalysis. Science 295, 1520–1523 (2002)
Article ADS CAS Google Scholar
Frauenfelder, H., McMahon, B. H. & Fenimore, P. W. Myoglobin: the hydrogen atom of biology and a paradigm of complexity. Proc. Natl Acad. Sci. USA 100, 8615–8617 (2003)
Article ADS CAS Google Scholar
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
Article CAS Google Scholar
Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994)
Article CAS Google Scholar
Huang, X. et al. Structure of a WW domain containing fragment of dystrophin in complex with β-dystroglycan. Nature Struct. Biol. 7, 634–638 (2000)
Article CAS Google Scholar
Kasanov, J., Pirozzi, G., Uveges, A. J. & Kay, B. K. Characterizing Class I WW domains defines key specificity determinants and generates mutant domains with novel specificities. Chem. Biol. 8, 231–241 (2001)
Article CAS Google Scholar
John, D. M. & Weeks, K. M. van't Hoff enthalpies without baselines. Protein Sci. 9, 1416–1419 (2000)
Article CAS Google Scholar
Sklenar, V., Piotto, M., Leppik, R. & Saudek, V. Gradient-tailored water suppression for ¹H-¹⁵N HSQC experiments optimized to retain full sensitivity. J. Magn. Reson. A 102, 241–245 (1993)
Article ADS CAS Google Scholar
Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995)
Article CAS Google Scholar
Johnson, B. A. & Blevins, R. A. NMRView: a computer program for the visualization and analysis of NMR data. J. Biomol. NMR 4, 603–614 (1994)
Article CAS Google Scholar
Brünger, A. T. et al. Crystallography and NMR system (CNS): a new software system for macromolecular structure determination. Acta Crystallogr. D 54, 905–921 (1998)
Article Google Scholar
Linge, J. P., O'Donoghue, S. I. & Nilges, M. Methods in Enzymology 71–90 (Academic Press, 2001)
Google Scholar
Cornilescu, G., Delaglio, F. & Bax, A. Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13, 289–302 (1999)
Article CAS Google Scholar
Laskowski, R. A., Rullman, J. A. C., MacArthur, M. W., Kaptein, R. & Thornton, J. M. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8, 477–486 (1996)
Article CAS Google Scholar
Koradi, R., Billeter, M. & Wüthrich, K. MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph. 14, 51–55 (1996)
Article CAS Google Scholar
Delano, W. L. The PyMOL Molecular Graphics Systemhttp://www.pymol.org (2002).

Download references

Acknowledgements

We thank A. G. Gilman, M. Rosen, and members of the Ranganathan laboratory for advice and critical review of the manuscript, and E. Olson for contributing to this project. R.R. and S.W.L. thank the students of the 2001 Woods Hole physiology course for participating in an initial phase of this work. This study was supported by the Robert A. Welch foundation (R.R. and K.H.G.), the Mallinckrodt Foundation Scholar Award (R.R.), and the NIH (K.H.G.). W.P.R. is an associate and R.R. is an investigator of the Howard Hughes Medical Institute.

Author information

Steve W. Lockless
Present address: The Rockefeller University, 1230 York Avenue, New York, New York, 10021, USA
Michael Socolich and Steve W. Lockless: *These authors contributed equally to this work

Authors and Affiliations

Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Texas, 75390-9050, Dallas, USA
Michael Socolich, Steve W. Lockless, William P. Russ, Heather Lee & Rama Ranganathan
Departments of Pharmacology, University of Texas Southwestern Medical Center, Texas, 75390-9050, Dallas, USA
Michael Socolich, Steve W. Lockless, William P. Russ, Heather Lee, Kevin H. Gardner & Rama Ranganathan
Departments of Biochemistry, University of Texas Southwestern Medical Center, Texas, 75390-9050, Dallas, USA
Kevin H. Gardner

Authors

Michael Socolich
View author publications
You can also search for this author in PubMed Google Scholar
Steve W. Lockless
View author publications
You can also search for this author in PubMed Google Scholar
William P. Russ
View author publications
You can also search for this author in PubMed Google Scholar
Heather Lee
View author publications
You can also search for this author in PubMed Google Scholar
Kevin H. Gardner
View author publications
You can also search for this author in PubMed Google Scholar
Rama Ranganathan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rama Ranganathan.

Ethics declarations

Competing interests

Atomic coordinates for CC45 have been deposited in the Protein Data Bank under accession code 1YMZ. Reprints and permissions information is available at npg.nature.com/reprintsandpermissions. The authors declare no competing financial interests.

Supplementary information

Supplementary Notes

Supplementary Table S1, Supplementary Table S2, Supplementary Methods and Supplementary Figure Legends. (DOC 156 kb)

Supplementary Figure S1

Statistical coupling analysis for one site in the WW domain family. (PDF 578 kb)

Supplementary Figure S2

Summary of all sequences constructed and experimentally tested. (PDF 2191 kb)

Supplementary Figure S3

Assessment of expression and solubility of all sequences constructed. (PDF 12056 kb)

Supplementary Figure S4

Thermal denaturation and renaturation studies. (PDF 5249 kb)

Supplementary Figure S5

One-dimensional proton NMR spectra for natural WW domains (a), CC sequences (b), and IC sequences (c). (PDF 6444 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Socolich, M., Lockless, S., Russ, W. et al. Evolutionary information for specifying a protein fold. Nature 437, 512–518 (2005). https://doi.org/10.1038/nature03991

Download citation

Received: 25 January 2005
Accepted: 30 June 2005
Issue Date: 22 September 2005
DOI: https://doi.org/10.1038/nature03991

This article is cited by

Machine learning-aided design and screening of an emergent protein function in synthetic cells
- Shunshi Kohyama
- Béla P. Frohn
- Petra Schwille
Nature Communications (2024)
Extracting phylogenetic dimensions of coevolution reveals hidden functional signals
- Alexandre Colavin
- Esha Atolia
- Kerwyn Casey Huang
Scientific Reports (2022)
PPalign: optimal alignment of Potts models representing proteins with direct coupling information
- Hugo Talibart
- François Coste
BMC Bioinformatics (2021)
MPL resolves genetic linkage in fitness inference from complex evolutionary histories
- Muhammad Saqib Sohail
- Raymond H. Y. Louie
- John P. Barton
Nature Biotechnology (2021)
Expanding functional protein sequence spaces using generative adversarial networks
- Donatas Repecka
- Vykintas Jauniskis
- Aleksej Zelezniak
Nature Machine Intelligence (2021)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Evolutionary information for specifying a protein fold

Abstract

Access options

Similar content being viewed by others

Deciphering protein evolution and fitness landscapes with latent space models

A physics-based energy function allows the computational redesign of a PDZ domain

The developing toolkit of continuous directed evolution

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Competing interests

Supplementary information

Supplementary Notes

Supplementary Figure S1

Supplementary Figure S2

Supplementary Figure S3

Supplementary Figure S4

Supplementary Figure S5

Rights and permissions

About this article

Cite this article

This article is cited by

Machine learning-aided design and screening of an emergent protein function in synthetic cells

Extracting phylogenetic dimensions of coevolution reveals hidden functional signals

PPalign: optimal alignment of Potts models representing proteins with direct coupling information

MPL resolves genetic linkage in fitness inference from complex evolutionary histories

Expanding functional protein sequence spaces using generative adversarial networks

Comments

Form and function instructions

Search

Quick links

Abstract

Access options

Similar content being viewed by others

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Competing interests

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links