  • Analysis

Detection of parallel functional modules by comparative analysis of genome sequences

Abstract

Parallel functional modules are separate sets of proteins in an organism that catalyze the same or similar biochemical reactions but act on different substrates or use different cofactors. They originate by gene duplication during evolution. Parallel functional modules provide versatility and complexity to organisms, and increase cellular flexibility and robustness. We have developed a four-step approach for genome-wide discovery of parallel modules from protein functional linkages. From ten genomes, we identified 37 cellular systems that consist of parallel functional modules. This approach recovers known parallel complexes and pathways, and discovers new ones that conventional homology-based methods did not previously reveal, as illustrated by examples of peptide transporters in Escherichia coli and nitrogenases in Rhodopseudomonas palustris. The approach untangles intertwined functional linkages between parallel functional modules and expands our ability to decode protein functions from genome sequences.
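To make the definition above concrete, here is a minimal toy sketch in Python. It is not the authors' four-step approach, which this preview does not describe. It only illustrates the underlying idea: if two proteins are functionally linked, and their respective paralogs are also linked to each other, the two links are candidates for belonging to parallel modules. The function name `parallel_link_pairs`, the input format, and the protein labels (`A1`, `B1`, ...) are all hypothetical.

```python
from itertools import combinations


def parallel_link_pairs(links, paralogs):
    """Find pairs of functional links that mirror each other through paralogy.

    links    -- iterable of (protein, protein) functional linkages (undirected)
    paralogs -- iterable of (protein, protein) pairs assumed to be paralogs
    Returns a list of ((a, b), (a2, b2)) where a/a2 and b/b2 are paralogs.
    Illustrative assumption only; not the published method.
    """
    link_set = {frozenset(link) for link in links}
    para_set = {frozenset(pair) for pair in paralogs}

    def is_paralog(x, y):
        return frozenset((x, y)) in para_set

    found = []
    # Sort edges so the output order is deterministic.
    edges = sorted((sorted(e) for e in link_set if len(e) == 2))
    for (a, b), (c, d) in combinations(edges, 2):
        if {a, b} & {c, d}:
            continue  # links sharing a protein cannot be separate modules
        straight = is_paralog(a, c) and is_paralog(b, d)
        crossed = is_paralog(a, d) and is_paralog(b, c)
        if straight or crossed:
            found.append(((a, b), (c, d)))
    return found


# Hypothetical example: A1-B1 and A2-B2 are linked, and A1/A2, B1/B2 are paralogs.
example_links = [("A1", "B1"), ("A2", "B2"), ("A1", "C")]
example_paralogs = [("A1", "A2"), ("B1", "B2")]
print(parallel_link_pairs(example_links, example_paralogs))
```

In this toy input, only the links A1-B1 and A2-B2 are reported as a mirrored pair. The link A1-C has no paralogous counterpart and is ignored.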

Figure 1: The four-step approach for detecting parallel functional modules.
Figure 2: RNA polymerases display a parallel complex pattern in the clustered genome-wide functional linkage map of D. melanogaster.
Figure 3: The peptide transporters in E. coli K12 display a parallel complex pattern.
Figure 4: The nitrogenases in R. palustris display a parallel functional module pattern.
Figure 5: The functional linkage networks of nitrogenase proteins in R. palustris before and after untangling the parallel functional modules.

Acknowledgements

We thank Thomas G. Graeber and Michael J. Thompson for helpful discussions, and HHMI and the Department of Energy for support.

Author information

Corresponding author

Correspondence to David Eisenberg.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Schematic of the Four-Step Approach (PDF 18 kb)

Supplementary Methods (PDF 92 kb)

About this article

Cite this article

Li, H., Pellegrini, M. & Eisenberg, D. Detection of parallel functional modules by comparative analysis of genome sequences. Nat Biotechnol 23, 253–260 (2005). https://doi.org/10.1038/nbt1065

  • DOI: https://doi.org/10.1038/nbt1065
