Revealing the hidden functional diversity of an enzyme family


Millions of protein database entries are not assigned reliable functions, preventing the full understanding of chemical diversity in living organisms. Here, we describe an integrated strategy for the discovery of various enzymatic activities catalyzed within protein families of unknown or little known function. This approach relies on the definition of a generic reaction conserved within the family, high-throughput enzymatic screening on representatives, structural and modeling investigations and analysis of genomic and metabolic context. As a proof of principle, we investigated the DUF849 Pfam family and unearthed 14 potential new enzymatic activities, leading to the designation of these proteins as β-keto acid cleavage enzymes. We propose an in vivo role for four enzymatic activities and suggest key residues for guiding further functional annotation. Our results show that the functional diversity within a family may be largely underestimated. The extension of this strategy to other families will improve our knowledge of the enzymatic landscape.
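One step hinted at in the abstract, dividing a family into groups with similar active sites, can be illustrated with a toy sketch. This is not the authors' pipeline, which according to the abstract also draws on structural modeling and on genomic and metabolic context. The function name `group_by_active_site`, the sequences and the chosen column positions are all hypothetical and serve only to show the idea of keying sequences on the residues found at a few aligned positions.

```python
from collections import defaultdict


def group_by_active_site(alignment, site_columns):
    """Group aligned sequences by the residues at the given alignment columns.

    alignment    : dict mapping sequence name -> aligned sequence string
    site_columns : 0-based alignment columns assumed to line the active site
    """
    groups = defaultdict(list)
    for name, seq in alignment.items():
        # The "signature" is the concatenation of residues at the chosen columns.
        signature = "".join(seq[i] for i in site_columns)
        groups[signature].append(name)
    return dict(groups)


# Made-up toy alignment (NOT real DUF849 sequences).
toy_alignment = {
    "seqA": "MKHEDLT",
    "seqB": "MRHEDIT",
    "seqC": "MKSEGLT",
    "seqD": "MKSEGVT",
}
# Hypothetical active-site columns, chosen only for illustration.
toy_site_columns = [2, 4]

groups = group_by_active_site(toy_alignment, toy_site_columns)
for signature, members in sorted(groups.items()):
    print(signature, members)
```

Sequences that share a signature land in the same group; in practice one would likely allow similar rather than identical residues and combine this with other evidence before proposing a function.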

Figure 1: The BKACE active site pocket and reaction.
Figure 2: Substrates used in enzymatic assays.
Figure 3: Enzymatic screening of the BKACE family.
Figure 4: Dividing the BKACE family into groups of similar active sites.
Figure 5: Proposed in vivo roles of the newly discovered BKACEs.




Acknowledgements

We are grateful to M. Besnard-Gonnet, D. Baud and A. Fossey for excellent technical assistance and to S. Tricot and L. Stuani for MS analyses. We also thank P.L. Saaidi, G. Cohen and M. Stam for helpful discussions and D. Roche, P. Bowe and A. Tolonen for useful comments on the manuscript. This work was supported by grants from Commissariat à l'énergie atomique et aux énergies alternatives (CEA), the CNRS and the University of Evry. A.A.T. Smith has been supported by the MICROME project, European Union Framework Program 7 Collaborative Project (222886-2).

Author information

Contributions

D.V. and M.S. designed and supervised this study. K.B. and A.A.T.S. conducted all of the bioinformatics analyses. F.A. and R.D.M.-M. initiated structural bioinformatics. C.V.-V. and A.Z. contributed to the chemical sections. K.B., A.A.T.S., D.V. and M.S. wrote the manuscript. C.M. and J.W. contributed to the design of this study. A.K. gave support to metabolic analysis. A.P. and V.D.B. supervised experiments, and J.-L.P., V.P., A.D., A.M., N.P., M.B., C.L. and C.P. performed the experiments.

Corresponding author

Correspondence to Marcel Salanoubat.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Data Set descriptions, Supplementary Results, Supplementary Notes 1–4, Supplementary Tables 1–6 and Supplementary Figures 1–7. (PDF 3235 kb)

Supplementary Data Set 1

Multiple sequence alignment of the 725 sequences of the DUF849 family. (TXT 278 kb)

Supplementary Data Set 2

The different clusters of the 725 sequences depending on active site composition, genomic context, phylogeny and sequence similarity. (XLSX 65 kb)

Supplementary Data Set 3

Primer sequences of the 322 candidate proteins. (XLSX 49 kb)

Supplementary Data Set 4

Pre-processed data of the enzymatic screening. (XLSX 78 kb)

Supplementary Data Set 5

The “active site” hierarchical tree of the 725 sequences of the DUF849 family in nexus format. (TXT 10 kb)

About this article

Cite this article

Bastard, K., Smith, A., Vergne-Vaxelaire, C. et al. Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol 10, 42–49 (2014).
