EC-BLAST: a tool to automatically search and compare enzyme reactions

Journal name: Nature Methods

We present EC-BLAST, an algorithm and Web tool for quantitative similarity searches between enzyme reactions at three levels: bond change, reaction center and reaction structure similarity. It uses bond changes and reaction patterns for all known biochemical reactions, derived from atom-atom mapping across each reaction. EC-BLAST has the potential to improve enzyme classification, identify previously uncharacterized or new biochemical transformations, improve the assignment of enzyme function to sequences, and assist in enzyme engineering.
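The figure captions in this summary report Jaccard similarity scores between reactions. As a minimal sketch of how such a score could be computed at the bond-change level, the snippet below applies a weighted Jaccard (Tanimoto) measure to counts of bond changes. The function name `weighted_jaccard`, the bond-change labels and the counts are all invented for illustration, and the formula is a generic weighted Jaccard, not necessarily the exact scoring used by EC-BLAST.

```python
from collections import Counter


def weighted_jaccard(a, b):
    """Weighted Jaccard (Tanimoto) score between two bond-change count profiles.

    Each profile maps a bond-change label to how often it occurs in a reaction.
    Returns a value in [0, 1]; two empty profiles score 0.0 by convention here.
    """
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)  # overlap of the two profiles
    total = sum(max(a[k], b[k]) for k in keys)   # union of the two profiles
    return shared / total if total else 0.0


# Hypothetical bond-change profiles (labels and counts are made up).
reaction_1 = Counter({"C-O": 1, "O-H": 2})
reaction_2 = Counter({"C-O": 1, "O-H": 1, "C-N": 1})

print(weighted_jaccard(reaction_1, reaction_2))  # 0.5
```

Using `Counter` means a bond change absent from one reaction counts as zero there, so the two profiles do not need to share the same keys.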

At a glance


  1. Overview of the EC-BLAST method.
    Figure 1: Overview of the EC-BLAST method.
  2. All-by-all comparison across ~6,000 mapped representative enzyme reactions in the EC-BLAST database (Supplementary Data 1).
    Figure 2: All-by-all comparison across ~6,000 mapped representative enzyme reactions in the EC-BLAST database (Supplementary Data 1).

    (a) Typical output from a reaction query search as a ranked list of reactions. The searches are based on the reaction similarity metrics for (i) bond changes, (ii) reaction centers and (iii) structure similarity. Arrow color illustrates similarity between reactions; green to red represents highly similar to most dissimilar, respectively. (b) Distribution of Jaccard similarity scores (Tw) for the three different metrics, shown as density plots for Tw > 0. The yellow violin shapes indicate the kernel density estimation of the data at different scores; the thick black lines indicate the middle two quartiles in the distribution of each score; and the white circles give the median for each metric. (c) Accuracy plot for the prediction of IUBMB EC sub-subclass, derived for the three metrics for a given cutoff. (d) Receiver operating characteristic plot showing the efficacy of the different measures (with a given cutoff) to correctly predict the EC classification, measured for all EC numbers. The area under the curve for EC sub-subclass using the reaction-center scoring is 0.87.

  3. Characterizing the universe of enzyme reactions using EC-BLAST.
    Figure 3: Characterizing the universe of enzyme reactions using EC-BLAST.

    (a) Distribution of overall top 20 bond changes in the six primary IUBMB EC classes, calculated from ~6,000 reactions. (b) Hierarchical clustering of IUBMB EC classes based on bond changes, using Euclidean distance and the Ward method. “C(R/S)” denotes stereo changes associated with carbon chiral inversion; “%” denotes bond changes in a ring system; and “↔” denotes a change of bond order. The top five bond changes in the six IUBMB EC primary classes are shown. (c) Clustering of 5,073 representative reactions, using a combination of bond and reaction-center similarity scores. Each sphere represents one reaction, colored by primary IUBMB EC class (using BioLayout; ref. 23). All reaction similarity clusters with P < 0.01 and cluster size of at least three reactions are shown arranged in a network according to reaction similarity. Circles indicate two clusters (i and ii) with mixed EC classes. (d) All reaction similarity clusters with P < 0.01 and cluster size of more than ten reactions are shown.
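
The Figure 3b caption mentions hierarchical clustering with Euclidean distance and the Ward method. The sketch below shows one plain-Python way such a clustering could proceed: at each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squares. The names `ward_cost`, `ward_linkage` and `toy_profiles`, as well as the numbers, are hypothetical; this is an illustration of the general technique, not the authors' pipeline or data.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["size"] * b["size"] / (a["size"] + b["size"]) * sq_dist


def ward_linkage(profiles):
    """Agglomerative Ward clustering on squared Euclidean distance.

    profiles maps a label to a numeric vector (e.g. bond-change counts).
    Returns the merges in order as (members_a, members_b, cost) tuples.
    """
    clusters = [
        {"members": (name,), "size": 1, "centroid": list(vec)}
        for name, vec in profiles.items()
    ]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that is cheapest to merge.
        a, b = min(combinations(clusters, 2), key=lambda pair: ward_cost(*pair))
        cost = ward_cost(a, b)
        n = a["size"] + b["size"]
        merged = {
            "members": a["members"] + b["members"],
            "size": n,
            # Size-weighted mean of the two centroids.
            "centroid": [
                (a["size"] * x + b["size"] * y) / n
                for x, y in zip(a["centroid"], b["centroid"])
            ],
        }
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(merged)
        merges.append((a["members"], b["members"], cost))
    return merges


# Made-up profiles: each vector holds counts of three hypothetical bond changes.
toy_profiles = {
    "p1": [10, 0, 1],
    "p2": [9, 1, 0],
    "p3": [0, 8, 2],
    "p4": [2, 9, 1],
}

merges = ward_linkage(toy_profiles)
for members_a, members_b, cost in merges:
    print(members_a, "+", members_b, "cost =", cost)
```

The exhaustive pair search keeps the sketch short but scales poorly; for thousands of profiles a library implementation of Ward linkage would be the more practical choice.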


References

  1. Thompson, R.H.S. Science 137, 405–408 (1962).
  2. Tipton, K. & Boyce, S. Bioinformatics 16, 34–40 (2000).
  3. Yamanishi, Y., Hattori, M., Kotera, M., Goto, S. & Kanehisa, M. Bioinformatics 25, i179–i186 (2009).
  4. Gasteiger, J. Handbook of Chemoinformatics (Wiley, 2003).
  5. Chen, L. & Gasteiger, J. J. Am. Chem. Soc. 119, 4033–4042 (1997).
  6. Leber, M., Egelhofer, V., Schomburg, I. & Schomburg, D. Bioinformatics 25, 3135–3142 (2009).
  7. Faulon, J.-L., Misra, M., Martin, S., Sale, K. & Sapra, R. Bioinformatics 24, 225–233 (2008).
  8. Kotera, M., Okuno, Y., Hattori, M., Goto, S. & Kanehisa, M. J. Am. Chem. Soc. 126, 16487–16498 (2004).
  9. Egelhofer, V., Schomburg, I. & Schomburg, D. PLoS Comput. Biol. 6, e1000661 (2010).
  10. O'Boyle, N.M., Holliday, G.L., Almonacid, D.E. & Mitchell, J.B.O. J. Mol. Biol. 368, 1484–1499 (2007).
  11. Zhang, Q.-Y. & Aires-de-Sousa, J. J. Chem. Inf. Model. 45, 1775–1783 (2005).
  12. Latino, D.A.R.S.D. & Aires-de-Sousa, J.J. Angew. Chem. Int. Edn Engl. 45, 2066–2069 (2006).
  13. Mu, F., Unkefer, P.J., Unkefer, C.J. & Hlavacek, W.S. Bioinformatics 22, 3082–3088 (2006).
  14. Chen, W.L., Chen, D.Z. & Taylor, K.T. Wiley Interdiscip. Rev. Comput. Mol. Sci. 3, 560–593 (2013).
  15. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. Nucleic Acids Res. 40, D109–D114 (2012).
  16. Rahman, S.A., Bashton, M., Holliday, G.L., Schrader, R. & Thornton, J.M. J. Cheminform. 1, 12 (2009).
  17. Jochum, C., Gasteiger, J. & Ugi, I. Angew. Chem. Int. Edn Engl. 19, 495–505 (1980).
  18. Ugi, I. et al. Angew. Chem. Int. Edn Engl. 18, 111–123 (1979).
  19. Steinbeck, C. et al. Curr. Pharm. Des. 12, 2111–2120 (2006).
  20. Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. Bioinformatics 21, 3940–3941 (2005).
  21. Cuff, A.L. et al. Nucleic Acids Res. 39, D420–D426 (2011).
  22. Lees, J. et al. Nucleic Acids Res. 40, D465–D471 (2012).
  23. Theocharidis, A., van Dongen, S., Enright, A.J. & Freeman, T.C. Nat. Protoc. 4, 1535–1550 (2009).
  24. Dalby, A. et al. J. Chem. Inf. Model. 32, 244–255 (1992).
  25. Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D. & Pletnev, I. J. Cheminform. 5, 7 (2013).
  26. Rahman, S.A., Advani, P., Schunk, R., Schrader, R. & Schomburg, D. Bioinformatics 21, 1189–1193 (2005).
  27. Rahman, S.A. Pathway Hunter Tool (PHT) – A Platform for Metabolic Network Analysis and Potential Drug Targeting. PhD thesis, Univ. Cologne (2007).
  28. Dugundji, J. & Ugi, I. in Computers in Chemistry 19–64 (Springer, 1973).
  29. Cahn, R.S., Ingold, C. & Prelog, V. Angew. Chem. Int. Edn Engl. 5, 385–415 (1966).
  30. Prelog, V. & Helmchen, G.N. Angew. Chem. Int. Edn Engl. 21, 567–583 (1982).
  31. Faulon, J.-L. & Bender, A. Handbook of Chemoinformatics Algorithms (Chapman and Hall/CRC, 2010).
  32. O'Boyle, N.M. et al. J. Cheminform. 3, 33 (2011).

Author information


  1. European Molecular Biology Laboratory–European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK.

    • Syed Asad Rahman,
    • Sergio Martinez Cuesta,
    • Nicholas Furnham,
    • Gemma L Holliday &
    • Janet M Thornton
  2. Present addresses: Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, London, UK (N.F.); Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, USA (G.L.H.).

    • Nicholas Furnham &
    • Gemma L Holliday


S.A.R. developed the algorithm, the code and the EC-BLAST tool. S.A.R. and J.M.T. wrote most of the manuscript and performed the statistical analysis. S.M.C. and G.L.H. curated the molecules, tested the chemical validity of the reaction similarity clusters and helped write the manuscript. S.A.R. and N.F. performed the analysis of the PPI family and wrote it up. J.M.T. supervised the whole project and the writing of the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (7,145 KB)

    Supplementary Figures 1–3, Supplementary Tables 1–3, Supplementary Results and Supplementary Notes 1 and 2

Zip files

  1. Supplementary Data 1 (280,912 KB)

    Source raw data for Figure 2 containing similarity scores between EC-Reactions, cluster information, etc.

Excel files

  1. Supplementary Data 2 (3,572 KB)

    Source data for Supplementary Figure 1

  2. Supplementary Data 3 (3,573 KB)

    Source data for Supplementary Figure 2
