  • Opinion

Protein linguistics — a grammar for modular protein assembly?

Abstract

The correspondence between biology and linguistics at the level of sequence and lexical inventories, and of structure and syntax, has fuelled attempts to describe genome structure by the rules of formal linguistics. But how can we define protein linguistic rules? And how could compositional semantics improve our understanding of protein organization and functional plasticity?

Figure 1: Grammatical hierarchies.
Figure 2: The Rosetta Stone.
Figure 3: Generation of a novel lexical protein entry.

Acknowledgements

I wish to thank M. C. Baker and M. Sudol for critically commenting on this manuscript, and the members of the Protein Modules Consortium for inspiring discussions. The author is supported by a Marie Curie Excellence Grant of the Framework Program 6 of the European Union.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Related links

DATABASES

Artificial Intelligence and Molecular Biology (electronic text (PDF) of the out-of-print book)

Cytoscape

FEBS workshop on Modular Protein Domains: from functional plasticity to protein linguistics (official meeting web site)

Protein Modules Consortium

The BIND interaction database

The DIMA domain interaction map

The InterPro database

The MINT database

The Seefeld Convention

FURTHER INFORMATION

Mario Gimona's laboratory

Glossary

Affix

A meaningful element that cannot stand on its own but is added to another element.

Automaton

A device that reads input, conventionally from left to right, and either recognizes or generates language.
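To make the definition concrete, here is a minimal sketch of a recognizing automaton that reads a sequence of symbols from left to right. The symbols (`SH3`, `SH2`, `kinase`), the state names, and the pattern "any number of SH3, then SH2, then kinase" are all hypothetical, chosen only for illustration; they are not a grammar proposed in the article.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical transition table: (current state, symbol read) -> next state.
# The symbols are illustrative tokens only; the accepted pattern
# "zero or more SH3, then SH2, then kinase" is an invented toy rule.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "SH3"): "start",
    ("start", "SH2"): "seen_sh2",
    ("seen_sh2", "kinase"): "accept",
}
ACCEPTING: Set[str] = {"accept"}


def recognizes(symbols: Iterable[str]) -> bool:
    """Read the symbols left to right; return True if the automaton accepts."""
    state = "start"
    for symbol in symbols:
        key = (state, symbol)
        if key not in TRANSITIONS:
            # No rule covers this symbol in this state, so the input is rejected.
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

Calling `recognizes(["SH3", "SH2", "kinase"])` walks through the states `start`, `start`, `seen_sh2`, `accept` and accepts the input, whereas a sequence in any other order is rejected. A table-driven design keeps the "rules" separate from the reading loop, so the toy pattern can be swapped without touching the function.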

Clause

A basic unit of grammatical structure that expresses a single thought.

Grammar

The part of a language that is responsible for assembling basic words into larger words, phrases and clauses in systematic ways. For simplicity, grammar may be viewed as a combination of syntax and morphology.

Lexica

The stocks of basic words.

Linguistics

The study of the nature, structure and variation of language (includes the sub-disciplines of morphology, syntax, semantics and pragmatics).

Module

Different programming languages have different concepts of a module, but several ideas are shared. Modules are similar to objects in an object-orientated language, although a module might contain many procedures and/or functions, which would correspond to many objects. In computer science, a module is described as a portion of a program that carries out a specific function and might be used alone or combined with other modules of the same program.

Phrase

A group of words that appear next to each other or stay together in the arrangement of a sentence and that form a syntactic unit.

Prefix

A meaningful element that cannot stand on its own but is added to the beginning of another element.

Root

The core of a word, before prefixes and suffixes are attached.

Semantics

The branch of linguistics concerned with the meaning of linguistic expressions.

Sentence

A basic unit of a language that expresses a complete thought.

Stem

The form to which prefixes and suffixes attach in order to form a longer word.

Suffix

A meaningful element that cannot stand on its own but is added to the end of another element.

Syntax

The branch of linguistics that studies how words are combined to make phrases and sentences.

Word

A freestanding portion of language with a coherent meaning.

About this article

Cite this article

Gimona, M. Protein linguistics — a grammar for modular protein assembly? Nat Rev Mol Cell Biol 7, 68–73 (2006). https://doi.org/10.1038/nrm1785

  • DOI: https://doi.org/10.1038/nrm1785
