Abstract
Parallel functional modules are separate sets of proteins in an organism that catalyze the same or similar biochemical reactions but act on different substrates or use different cofactors. They originate by gene duplication during evolution. Parallel functional modules provide versatility and complexity to organisms, and increase cellular flexibility and robustness. We have developed a four-step approach for genome-wide discovery of parallel modules from protein functional linkages. From ten genomes, we identified 37 cellular systems that consist of parallel functional modules. This approach recovers known parallel complexes and pathways, and discovers new ones that conventional homology-based methods did not previously reveal, as illustrated by examples of peptide transporters in Escherichia coli and nitrogenases in Rhodopseudomonas palustris. The approach untangles intertwined functional linkages between parallel functional modules and expands our ability to decode protein functions from genome sequences.
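The definition above can be illustrated with a minimal sketch. This is not the authors' four-step approach, whose steps are not described in this abstract. It assumes two hypothetical inputs: a mapping from each protein to a homolog family (`family_of`) and a list of pairwise functional linkages (`linkages`). All protein and family names are invented toy values. The sketch treats each connected component of the linkage graph as a candidate module and reports groups of modules that share the same set of families.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def parallel_module_groups(family_of, linkages):
    """Group candidate modules that have identical homolog-family composition."""
    modules = connected_components(sorted(family_of), linkages)
    by_signature = defaultdict(list)
    for module in modules:
        fams = [family_of[p] for p in module]
        # Skip singletons and modules containing two members of one family.
        if len(module) < 2 or len(set(fams)) != len(fams):
            continue
        by_signature[frozenset(fams)].append(sorted(module))
    # Keep only family signatures shared by at least two distinct modules.
    return {sig: sorted(mods) for sig, mods in by_signature.items() if len(mods) >= 2}


# Toy input (hypothetical): two three-protein modules built from the same
# families, plus one unrelated linked pair.
family_of = {
    "A1": "famA", "B1": "famB", "C1": "famC",
    "A2": "famA", "B2": "famB", "C2": "famC",
    "X": "famX", "Y": "famY",
}
linkages = [("A1", "B1"), ("B1", "C1"), ("A2", "B2"), ("B2", "C2"), ("X", "Y")]

groups = parallel_module_groups(family_of, linkages)
for signature, mods in groups.items():
    print(sorted(signature), mods)
```

The sketch assumes the modules are already cleanly separated in the linkage graph. In real data, linkages between paralogous modules are intertwined, as the abstract notes, so a single cross-module linkage would merge two modules into one component; resolving that case is the problem the published approach addresses.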
Acknowledgements
We thank Thomas G. Graeber and Michael J. Thompson for helpful discussions, and the HHMI and the Department of Energy for support.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
Schematic of the Four-Step Approach (PDF 18 kb)
Cite this article
Li, H., Pellegrini, M. & Eisenberg, D. Detection of parallel functional modules by comparative analysis of genome sequences. Nat Biotechnol 23, 253–260 (2005). https://doi.org/10.1038/nbt1065
DOI: https://doi.org/10.1038/nbt1065