Abstract
Does a protein's secondary structure determine its three-dimensional fold? This question is tested directly by analyzing proteins of known structure and constructing a taxonomy based solely on secondary structure. The taxonomy is generated automatically, and it takes the form of a tree in which proteins with similar secondary structure occupy neighboring leaves. Our tree is largely in agreement with results from the structural classification of proteins (SCOP), a multidimensional classification based on homologous sequences, full three-dimensional structure, information about chemistry and evolution, and human judgment. Our findings suggest a simple mechanism of protein evolution.
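As a rough illustration of how similarity in secondary structure could be turned into input for a tree-building step, the sketch below computes a global alignment score between two secondary-structure strings and converts it into a distance. It is not the authors' procedure: the three-letter alphabet (H for helix, E for strand, C for coil), the scoring values, and the function names `align_score`, `ss_distance`, and `distance_matrix` are all assumptions made for illustration.

```python
from itertools import combinations


def align_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch style) alignment score of two strings.

    The scoring values are illustrative defaults, not taken from the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


def ss_distance(a: str, b: str) -> float:
    """Distance in [0, 1]; 0 means identical strings (hypothetical normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    # Negative scores are clamped to 0 so the distance never exceeds 1.
    return 1.0 - max(align_score(a, b), 0) / longest


def distance_matrix(structures: dict) -> dict:
    """Pairwise distances keyed by (name_a, name_b) for every unordered pair."""
    return {
        (p, q): ss_distance(structures[p], structures[q])
        for p, q in combinations(sorted(structures), 2)
    }


# Toy secondary-structure strings; the names and strings are made up.
toy = {"protA": "HHEE", "protB": "HHE", "protC": "EEEE"}
matrix = distance_matrix(toy)
```

A distance matrix of this kind is the usual input to a distance-based tree-construction method, which would then place proteins with small mutual distances on neighboring leaves.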
Acknowledgements
We thank R. Srinivasan, V. Murthy and P. Thiessen for helpful suggestions, and J. Cohen for providing access to his tree-construction program, Tande. We are particularly indebted to an anonymous referee for assistance in bringing this paper to fruition. Supported by the Sloan Foundation (T.P.) and the NIH (G.D.R.).
Cite this article
Przytycka, T., Aurora, R. & Rose, G. A protein taxonomy based on secondary structure. Nat Struct Mol Biol 6, 672–682 (1999). https://doi.org/10.1038/10728
DOI: https://doi.org/10.1038/10728