  • Insight
A protein taxonomy based on secondary structure

Abstract

Does a protein's secondary structure determine its three-dimensional fold? This question is tested directly by analyzing proteins of known structure and constructing a taxonomy based solely on secondary structure. The taxonomy is generated automatically, and it takes the form of a tree in which proteins with similar secondary structure occupy neighboring leaves. Our tree is largely in agreement with results from the structural classification of proteins (SCOP), a multidimensional classification based on homologous sequences, full three-dimensional structure, information about chemistry and evolution, and human judgment. Our findings suggest a simple mechanism of protein evolution.
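As an illustration only: the abstract does not spell out how similarity between secondary structures is computed, so the following is a minimal sketch, assuming each protein is reduced to a string over a three-letter alphabet (H for helix, E for strand, C for coil) and that pairs are compared by global alignment in the style of Needleman and Wunsch (ref. 19). The scoring values, the normalization in `ss_distance`, and the function names are hypothetical choices, not the authors' parameters or code.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch style) alignment score of two strings.

    The match/mismatch/gap values are illustrative assumptions.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def ss_distance(a: str, b: str) -> float:
    """Hypothetical dissimilarity: 0.0 for identical strings, larger when less alike."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - nw_score(a, b) / longest


if __name__ == "__main__":
    # Toy secondary-structure strings, invented for this example.
    toy = {
        "alpha_1": "CHHHHHHHHCCHHHHHHHC",
        "alpha_2": "CHHHHHHHCCCHHHHHHHC",
        "beta_1": "CEEEEECCEEEEECCEEEC",
    }
    names = sorted(toy)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            print(x, y, round(ss_distance(toy[x], toy[y]), 3))
```

A matrix of such pairwise values is the kind of input a tree-building procedure would need; in this toy set the two helical strings come out much closer to each other than either is to the strand-rich one.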

Figure 1: Similarity tree for the 183 proteins in our data set, generated automatically as described in the Methods.
Figure 2: The tree obtained using VAST (ref. 11).
Figure 3: Primary sequence tree, constructed using the neighbor-joining (NJ) method (ref. 34).
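
For readers unfamiliar with the neighbor-joining method named in the Figure 3 caption, here is a minimal textbook-style sketch of how it derives a tree topology from a pairwise distance matrix. It is not the authors' implementation or the software they used; the function name, the dictionary-of-frozensets input format, and the four-taxon distances are assumptions made for this example, and branch lengths are omitted for brevity.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return a tree topology as nested tuples.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        total = {u: sum(d[frozenset((u, v))] for v in nodes if v != u) for u in nodes}
        # Join the pair that minimizes the neighbor-joining Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]],
        )
        new = (a, b)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k != a and k != b:
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - d[frozenset((a, b))]
                )
        nodes = [k for k in nodes if k != a and k != b] + [new]
    return tuple(nodes)


if __name__ == "__main__":
    # Invented additive distances in which A pairs with B and C pairs with D.
    pairs = {
        ("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
        ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3,
    }
    dist = {frozenset(p): v for p, v in pairs.items()}
    print(neighbor_joining(["A", "B", "C", "D"], dist))
```

Each pass joins one pair of nodes into a new internal node, so leaves that end up in the same innermost tuple are nearest neighbors in the resulting tree.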

References

  1. Minor, D.L. Jr. & Kim, P.S. Context-dependent secondary structure formation of a designed protein sequence. Nature 380, 730–734 (1996).

  2. Itzhaki, L.S., Otzen, D.E. & Fersht, A.R. The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: evidence for a nucleation–condensation mechanism for protein folding. J. Mol. Biol. 254, 260–288 (1995).

  3. Shao, X. & Matthews, C.R. Single-tryptophan mutants of monomeric tryptophan repressor: optical spectroscopy reveals nonnative structure in a model for an early folding intermediate. Biochemistry 37, 7850–7858 (1998).

  4. Clark, P.L., Liu, Z.-P., Rizo, J. & Gierasch, L.M. Cavity formation before stable hydrogen bonding in the folding of a beta-clam protein. Nature Struct. Biol. 4, 883–886 (1997).

  5. Yee, D.P., Chan, H.S., Havel, T.F. & Dill, K.A. Does compactness induce secondary structure in proteins? A study of poly-alanine chains computed by distance geometry. J. Mol. Biol. 241, 557–573 (1994).

  6. Havel, T.F., Crippen, G.M. & Kuntz, I.D. Effects of distance constraints on macromolecular conformation. II. Simulation of experimental results and theoretical predictions. Biopolymers 18, 73–81 (1979).

  7. Reymond, M.T., Merutka, G., Dyson, H.J. & Wright, P.E. Folding propensities of peptide fragments of myoglobin. Protein Sci. 6, 706–716 (1997).

  8. Dyson, H.J. et al. Folding of peptide fragments comprising the complete sequence of proteins. Models for initiation of protein folding II. Plastocyanin. J. Mol. Biol. 226, 819–835 (1992).

  9. Srinivasan, R. & Rose, G.D. LINUS—a simple algorithm to predict the fold of a protein. Proteins Struct. Funct. Genet. 22, 81–99 (1995).

  10. Murzin, A.G., Brenner, S.E., Hubbard, T. & Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995).

  11. Madej, T., Gibrat, J-F. & Bryant, S.H. Threading a database of protein cores. Proteins Struct. Funct. Genet. 23, 356–369 (1995).

  12. Mitchell, E.M., Artymiuk, P.J., Rice, D.W. & Willett, P. Use of techniques derived from graph theory to compare secondary structure motifs in proteins. J. Mol. Biol. 212, 151–166 (1990).

  13. Di Francesco, V., Garnier, J. & Munson, P.J. Protein topology recognition from secondary structure sequences: application of the hidden Markov models to the alpha class proteins. J. Mol. Biol. 267, 446–463 (1997).

  14. Russell, R.B., Copley, R.R. & Barton, G.J. Protein fold recognition by mapping predicted secondary structures. J. Mol. Biol. 259, 349–365 (1996).

  15. Rost, B., Schneider, R. & Sander, C. Protein fold recognition by prediction-based threading. J. Mol. Biol. 270, 471–480 (1997).

  16. Rice, D.W. & Eisenberg, D. A 3D–1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. J. Mol. Biol. 267, 1026–1038 (1997).

  17. Aurora, R. & Rose, G.D. Seeking an ancient enzyme in Methanococcus jannaschii using ORF, a program based on predicted secondary structure comparisons. Proc. Natl. Acad. Sci. USA 95, 2818–2823 (1998).

  18. Holm, L. & Sander, C. Mapping the protein universe. Science 273, 595–603 (1996).

  19. Needleman, S.B. & Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).

  20. Sander, C. & Schneider, R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins Struct. Funct. Genet. 9, 56–68 (1991).

  21. Doolittle, R.F. The multiplicity of domains in proteins. Annu. Rev. Biochem. 64, 287–314 (1995).

  22. Doolittle, R.F. Of Urfs and Orfs 1–103 (University Science Books, Sausalito, California; 1986).

  23. Altschul, S.F., Boguski, M.S., Gish, W. & Wootton, J.C. Issues in searching molecular sequence databases. Nat. Genet. 6, 119–129 (1994).

  24. Smith, H.O., Annau, T.M. & Chandrasegaran, S. Finding sequence motifs in groups of functionally related proteins. Proc. Natl. Acad. Sci. USA 87, 826–830 (1990).

  25. Lipman, D.J. & Pearson, W.R. Rapid and sensitive protein similarity searches. Science 227, 1435–1441 (1985).

  26. Neuwald, A.F., Liu, J.S., Lipman, D.J. & Lawrence, C.E. Extracting protein alignment models from the sequence database. Nucleic Acids Res. 25, 1665–1677 (1997).

  27. Henikoff, S. & Henikoff, J.G. Embedding strategies for effective use of information from multiple sequence alignments. Protein Sci. 6, 698–705 (1997).

  28. Luthy, R., Bowie, J.U. & Eisenberg, D. Assessment of protein models with three-dimensional profiles. Nature 356, 83–85 (1992).

  29. Gibrat, J-F., Madej, T. & Bryant, S.H. Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 6, 377–385 (1996).

  30. Hobohm, U. & Sander, C. Enlarged representative set of protein structures. Protein Sci. 3, 522–524 (1994).

  31. Bernstein, F.C. et al. The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542 (1977).

  32. Levitt, M. & Chothia, C. Structural patterns in globular proteins. Nature 261, 552–558 (1976).

  33. Thompson, J.D., Higgins, D.G. & Gibson, T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994).

  34. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–424 (1987).

  35. Richardson, J.S. The anatomy and taxonomy of protein structure. Adv. Prot. Chem. 34, 168–340 (1981).

  36. Orengo, C.A., Michie, A.D., Jones, D.T., Swindells, M.B. & Thornton, J.M. CATH—a hierarchic classification of protein domain structures. Structure 5, 1093–1108 (1997).

  37. Holm, L. & Sander, C. Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233, 123–138 (1993).

  38. King, J. Genetic analysis of protein folding pathways. Biotechnology 4, 297–303 (1986).

  39. Lattman, E.E. & Rose, G.D. Protein folding — what's the question? Proc. Natl. Acad. Sci. USA 90, 439–441 (1993).

  40. Aurora, R., Creamer, T.P., Srinivasan, R. & Rose, G.D. Local interactions in protein folding: lessons from the α-helix. J. Biol. Chem. 272, 1413–1416 (1997).

  41. Baldwin, R.L. & Rose, G.D. Is protein folding hierarchic? I. Local structure and peptide folding. Trends Biochem. Sci. 24, 26–33 (1999).

  42. Holm, L. & Sander, C. An evolutionary treasure: unification of a broad set of amidohydrolases related to urease. Proteins Struct. Funct. Genet. 28, 72–82 (1997).

  43. Waterman, M.S. Introduction to computational biology: maps, sequences, and genomes (Chapman & Hall, London; 1995).

  44. Cohen, J. & Farach, M. In Proc. of eighth ann. ACM–SIAM symp. on discrete algorithms. (Association for Computing Machinery, New York; 410–416; 1997).

Acknowledgements

We thank R. Srinivasan, V. Murthy and P. Thiessen for helpful suggestions, and J. Cohen for providing access to his tree-construction program, Tande. We are particularly indebted to an anonymous referee for assistance in bringing this paper to fruition. Supported by the Sloan Foundation (T.P.) and the NIH (G.D.R.).

Author information

Correspondence to George D. Rose.

Cite this article

Przytycka, T., Aurora, R. & Rose, G. A protein taxonomy based on secondary structure. Nat Struct Mol Biol 6, 672–682 (1999). https://doi.org/10.1038/10728
