Interpretable machine learning for knowledge generation in heterogeneous catalysis

Abstract

Most applications of machine learning in heterogeneous catalysis thus far have used black-box models to predict computable physical properties (descriptors), such as adsorption or formation energies, that can be related to catalytic performance (that is, activity or stability). Extracting meaningful physical insights from these models has proved challenging, because their internal logic is too complex to be readily interpreted. Interpretable machine learning methods, which merge the predictive capacity of black-box models with the physical interpretability of physics-based models, offer an alternative. In this Perspective, we discuss the interpretable machine learning methods available to catalysis researchers, highlight their potential to accelerate hypothesis formation and knowledge generation, and outline critical challenges and opportunities for interpretable machine learning in heterogeneous catalysis.

Fig. 1: Synergistic relationship between black-box and interpretable ML approaches.
Fig. 2: Schematic depiction of black-box, grey-box and glass-box ML methods.
Fig. 3: Grey-box methods in catalysis applications.
Fig. 4: Interpreting glass-box ML results.

Data availability

The panels of Fig. 3 and Fig. 4 were adapted from refs. 25,29,34,38,43,49,51,57,63 and have associated raw data.

References

  1. Vlachos, D. G. in Advances in Chemical Engineering Vol. 30 (ed. Marin, G. B.) 1–61 (Academic, 2005).

  2. Goldsmith, B. R., Esterhuizen, J., Liu, J.-X., Bartel, C. J. & Sutton, C. Machine learning for heterogeneous catalyst design and discovery. AIChE J. 64, 2311–2323 (2018).

  3. Schlexer Lamoureux, P. et al. Machine learning for computational heterogeneous catalysis. ChemCatChem 11, 3581–3601 (2019).

  4. Kitchin, J. R. Machine learning in catalysis. Nat. Catal. 1, 230–232 (2018).

  5. Toyao, T. et al. Machine learning for catalysis informatics: recent applications and prospects. ACS Catal. 10, 2260–2297 (2020).

  6. Artrith, N. & Kolpak, A. M. Understanding the composition and activity of electrocatalytic nanoalloys in aqueous solvents: a combination of DFT and accurate neural network potentials. Nano Lett. 14, 2670–2676 (2014).

  7. Boes, J. R. & Kitchin, J. R. Modeling segregation on AuPd(111) surfaces with density functional theory and Monte Carlo simulations. J. Phys. Chem. C 121, 3479–3487 (2017).

  8. Ulissi, Z. W., Singh, A. R., Tsai, C. & Nørskov, J. K. Automated discovery and construction of surface phase diagrams using machine learning. J. Phys. Chem. Lett. 7, 3931–3935 (2016).

  9. Peterson, A. A. Acceleration of saddle-point searches with machine learning. J. Chem. Phys. 145, 074106 (2016).

  10. Ulissi, Z. W., Medford, A. J., Bligaard, T. & Nørskov, J. K. To address surface reaction network complexity using scaling relations machine learning and DFT calculations. Nat. Commun. 8, 14621 (2017).

  11. Kolsbjerg, E. L., Peterson, A. A. & Hammer, B. Neural-network-enhanced evolutionary algorithm applied to supported metal nanoparticles. Phys. Rev. B 97, 195424 (2018).

  12. Jennings, P. C., Lysgaard, S., Hummelshøj, J. S., Vegge, T. & Bligaard, T. Genetic algorithms for computational materials discovery accelerated by machine learning. NPJ Comput. Mater. 5, 46 (2019).

  13. Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R. & Yu, B. Definitions, methods, and applications in interpretable machine learning. Proc. Natl Acad. Sci. USA 116, 22071–22080 (2019).

  14. Caruana, R. et al. Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In Proc. 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1721–1730 (ACM, 2015).

  15. Unceta, I., Nin, J. & Pujol, O. Towards global explanations for credit risk scoring. Preprint at https://arxiv.org/abs/1811.07698 (2018).

  16. Tan, S., Caruana, R., Hooker, G. & Lou, Y. Distill-and-compare: auditing black-box models using transparent model distillation. In Proc. 2018 AAAI/ACM Conference on AI, Ethics, and Society 303–310 (ACM, 2018).

  17. Azodi, C. B., Tang, J. & Shiu, S.-H. Opening the black box: interpretable machine learning for geneticists. Trends Genet. 36, 442–455 (2020).

  18. Dybowski, R. Interpretable machine learning as a tool for scientific discovery in chemistry. New J. Chem. 44, 20914–20920 (2020).

  19. Rothenberg, G. Data mining in catalysis: separating knowledge from garbage. Catal. Today 137, 2–10 (2008).

  20. Janet, J. P. & Kulik, H. J. Resolving transition metal chemical space: feature selection for machine learning and structure–property relationships. J. Phys. Chem. A 121, 8939–8954 (2017).

  21. Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).

  22. Maley, S. M. et al. Quantum-mechanical transition-state model combined with machine learning provides catalyst design features for selective Cr olefin oligomerization. Chem. Sci. 11, 9665–9674 (2020).

  23. Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).

  24. Gallarati, S. et al. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem. Sci. 12, 6879–6889 (2021).

  25. Ma, X., Li, Z., Achenie, L. E. K. & Xin, H. Machine-learning-augmented chemisorption model for CO2 electroreduction catalyst screening. J. Phys. Chem. Lett. 6, 3528–3533 (2015).

  26. Li, Z., Wang, S., Chin, W. S., Achenie, L. E. & Xin, H. High-throughput screening of bimetallic catalysts enabled by machine learning. J. Mater. Chem. A 5, 24131–24138 (2017).

  27. Zhong, M. et al. Accelerated discovery of CO2 electrocatalysts using active machine learning. Nature 581, 178–183 (2020).

  28. Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for CO2 reduction and H2 evolution. Nat. Catal. 1, 696–703 (2018).

  29. Wexler, R. B., Martirez, J. M. P. & Rappe, A. M. Chemical pressure-driven enhancement of the hydrogen evolving activity of Ni2P from nonmetal surface doping interpreted via machine learning. J. Am. Chem. Soc. 140, 4678–4683 (2018).

  30. Wexler, R. B., Qiu, T. & Rappe, A. M. Automatic prediction of surface phase diagrams using ab initio grand canonical Monte Carlo. J. Phys. Chem. C 123, 2321–2328 (2019).

  31. Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).

  32. Apley, D. W. & Zhu, J. Visualizing the effects of predictor variables in black box supervised learning models. J. R. Stat. Soc. B 82, 1059–1086 (2020).

  33. Tan, S., Caruana, R., Hooker, G., Koch, P. & Gordo, A. Learning global additive explanations for neural nets using model distillation. Preprint at https://arxiv.org/abs/1801.08640 (2018).

  34. Liu, C. et al. Frontier molecular orbital based analysis of solid–adsorbate interactions over group 13 metal oxide surfaces. J. Phys. Chem. C 124, 15355–15365 (2020).

  35. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 4768–4777 (Curran Associates, 2017).

  36. Mine, S. et al. Analysis of updated literature data up to 2019 on the oxidative coupling of methane using an extrapolative machine-learning method to identify novel catalysts. ChemCatChem 13, 3636–3655 (2021).

  37. Ding, R. et al. Machine learning-guided discovery of underlying decisive factors and new mechanisms for the design of nonprecious metal electrocatalysts. ACS Catal. 11, 9798–9808 (2021).

  38. Back, S. et al. Convolutional neural network of atomic surface structures to predict binding energies for high-throughput screening of catalysts. J. Phys. Chem. Lett. 10, 4401–4408 (2019).

  39. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

  40. Andersen, M., Levchenko, S., Scheffler, M. & Reuter, K. Beyond scaling relations for the description of catalytic materials. ACS Catal. 9, 2752–2759 (2019).

  41. Jonayat, A. S. M., van Duin, A. C. T. & Janik, M. J. Discovery of descriptors for stable monolayer oxide coatings through machine learning. ACS Appl. Energy Mater. 1, 6217–6226 (2018).

  42. O’Connor, N. J., Jonayat, A. S. M., Janik, M. J. & Senftle, T. P. Interaction trends between single metal atoms and oxide supports identified with density functional theory and statistical learning. Nat. Catal. 1, 531–539 (2018).

  43. Weng, B. et al. Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts. Nat. Commun. 11, 3513 (2020).

  44. Liu, C.-Y., Zhang, S., Martinez, D., Li, M. & Senftle, T. P. Using statistical learning to predict interactions between single metal atoms and modified MgO(100) supports. NPJ Comput. Mater. 6, 102 (2020).

  45. Ouyang, R., Curtarolo, S., Ahmetcik, E., Scheffler, M. & Ghiringhelli, L. M. SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Phys. Rev. Mater. 2, 083802 (2018).

  46. Wang, Y., Wagner, N. & Rondinelli, J. M. Symbolic regression in materials science. MRS Commun. 9, 793–805 (2019).

  47. Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).

  48. Christensen, M. et al. Data-science driven autonomous process optimization. Commun. Chem. 4, 112 (2021).

  49. Esterhuizen, J. A., Goldsmith, B. R. & Linic, S. Uncovering electronic and geometric descriptors of chemical activity for metal alloys and oxides using unsupervised machine learning. Chem Catal. 1, 923–940 (2021).

  50. Atzmueller, M. Subgroup discovery. WIREs Data Min. Knowl. Discov. 5, 35–49 (2015).

  51. Li, H. et al. Subgroup discovery points to the prominent role of charge transfer in breaking nitrogen scaling relations at single-atom catalysts on VS2. ACS Catal. 11, 7906–7914 (2021).

  52. Goldsmith, B. R., Boley, M., Vreeken, J., Scheffler, M. & Ghiringhelli, L. M. Uncovering structure-property relationships of materials by subgroup discovery. New J. Phys. 19, 013031 (2017).

  53. Foppa, L. & Ghiringhelli, L. M. Identifying outstanding transition-metal-alloy heterogeneous catalysts for the oxygen reduction and evolution reactions via subgroup discovery. Top. Catal. https://doi.org/10.1007/s11244-021-01502-4 (2021).

  54. Sutton, C. et al. Identifying domains of applicability of machine learning models for materials science. Nat. Commun. 11, 4428 (2020).

  55. Hastie, T. J. & Tibshirani, R. J. Generalized Additive Models (Chapman and Hall, 1990).

  56. Lou, Y., Caruana, R. & Gehrke, J. Intelligible models for classification and regression. In Proc. 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 150–158 (ACM, 2012).

  57. Esterhuizen, J. A., Goldsmith, B. R. & Linic, S. Theory-guided machine learning finds geometric structure-property relationships for chemisorption on subsurface alloys. Chem 6, 3100–3117 (2020).

  58. Mavrikakis, M., Hammer, B. & Nørskov, J. K. Effect of strain on the reactivity of metal surfaces. Phys. Rev. Lett. 81, 2819–2822 (1998).

  59. Kitchin, J. R., Nørskov, J. K., Barteau, M. A. & Chen, J. G. Role of strain and ligand effects in the modification of the electronic and chemical properties of bimetallic surfaces. Phys. Rev. Lett. 93, 156801 (2004).

  60. Hammer, B., Morikawa, Y. & Nørskov, J. K. CO chemisorption at metal surfaces and overlayers. Phys. Rev. Lett. 76, 2141–2144 (1996).

  61. Xin, H. & Linic, S. Communications: exceptions to the d-band model of chemisorption on metal surfaces: the dominant role of repulsion between adsorbate states and metal d-states. J. Chem. Phys. 132, 221101 (2010).

  62. Nori, H., Jenkins, S., Koch, P. & Caruana, R. InterpretML: a unified framework for machine learning interpretability. Preprint at https://arxiv.org/abs/1909.09223 (2019).

  63. Feng, J., Lansford, J. L., Katsoulakis, M. A. & Vlachos, D. G. Explainable and trustworthy artificial intelligence for correctable modeling in chemical sciences. Sci. Adv. 6, eabc3204 (2020).

  64. Wang, S., Pillai, H. S. & Xin, H. Bayesian learning of chemisorption for bridging the complexity of electronic descriptors. Nat. Commun. 11, 6132 (2020).

  65. Wang, S.-H., Pillai, H. S., Wang, S., Achenie, L. E. K. & Xin, H. Infusing theory into deep learning for interpretable reactivity prediction. Nat. Commun. 12, 5288 (2021).

  66. Pearl, J. Causal inference in statistics: an overview. Stat. Surv. 3, 96–146 (2009).

  67. Schölkopf, B. et al. Modeling confounding by half-sibling regression. Proc. Natl Acad. Sci. USA 113, 7391–7398 (2016).

  68. Andersen, M. & Reuter, K. Adsorption enthalpies for catalysis modeling through machine-learned descriptors. Acc. Chem. Res. 54, 2741–2749 (2021).

  69. Kim, E. et al. Materials synthesis insights from scientific literature via text extraction and machine learning. Chem. Mater. 29, 9436–9444 (2017).

  70. Tabor, D. P. et al. Accelerating the discovery of materials for clean energy in the era of smart automation. Nat. Rev. Chem. 3, 5–20 (2018).

  71. Yang, L. et al. Discovery of complex oxides via automated experiments and data science. Proc. Natl Acad. Sci. USA 118, e2106042118 (2021).

  72. Flores, R. A. et al. Active learning accelerated discovery of stable iridium oxide polymorphs for the oxygen evolution reaction. Chem. Mater. 32, 5854–5863 (2020).

  73. Tran, K. et al. Computational catalyst discovery: Active classification through myopic multiscale sampling. J. Chem. Phys. 154, 124118 (2021).

  74. Chanussot, L. et al. Open Catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).

  75. Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).

  76. Bartel, C. J. et al. New tolerance factor to predict the stability of perovskite oxides and halides. Sci. Adv. 5, eaav0693 (2019).

  77. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).

  78. Rasmussen, C. E. in Advanced Lectures on Machine Learning (eds Bousquet, O. et al.) 63–71 (Springer, 2004).

  79. Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).

  80. Montoya, J. H. et al. Autonomous intelligent agents for accelerated materials discovery. Chem. Sci. 11, 8517–8532 (2020).

  81. Morris, M. D. Factorial sampling plans for preliminary computational experiments. Technometrics 33, 161–174 (1991).

  82. Augusto, D. A. & Barbosa, H. J. C. Symbolic regression via genetic programming. In Proc. Sixth Brazilian Symposium on Neural Networks Vol. 1, 173–178 (IEEE, 2000).

  83. Herrera, F., Carmona, C. J., González, P. & del Jesus, M. J. An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29, 495–525 (2011).

  84. Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning (Springer, 2001).

  85. Koller, D. & Friedman, N. Probabilistic Graphical Models: Principles and Techniques (MIT Press, 2009).

Acknowledgements

This work was supported by the US DOE Office of Basic Energy Sciences, Division of Chemical Sciences (DE-SC0021008) (analysis of alloy chemisorption) and the CBET-National Science Foundation under DMREF grant no. 2116646. We acknowledge support from the Michigan Institute for Data Science (MIDAS) PODS Grant. J.A.E. acknowledges support from the University of Michigan J. Robert Beyster Computational Innovation Graduate Fellows Program.

Author information

Corresponding authors

Correspondence to Bryan R. Goldsmith or Suljo Linic.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Catalysis thanks Johannes Margraf and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Esterhuizen, J.A., Goldsmith, B.R. & Linic, S. Interpretable machine learning for knowledge generation in heterogeneous catalysis. Nat Catal 5, 175–184 (2022). https://doi.org/10.1038/s41929-022-00744-z

  • DOI: https://doi.org/10.1038/s41929-022-00744-z
