Interpretable machine learning for knowledge generation in heterogeneous catalysis

Esterhuizen, Jacques A.; Goldsmith, Bryan R.; Linic, Suljo

doi:10.1038/s41929-022-00744-z

Perspective
Published: 17 March 2022

Nature Catalysis volume 5, pages 175–184 (2022)

Abstract

Most applications of machine learning in heterogeneous catalysis thus far have used black-box models to predict computable physical properties (descriptors), such as adsorption or formation energies, that can be related to catalytic performance (that is, activity or stability). Extracting meaningful physical insights from these models has proved challenging, because their high degree of complexity makes their internal logic difficult to interpret. Interpretable machine learning methods, which merge the predictive capacity of black-box models with the physical interpretability of physics-based models, offer an alternative. In this Perspective, we discuss the various interpretable machine learning methods available to catalysis researchers, highlight the potential of interpretable machine learning to accelerate hypothesis formation and knowledge generation, and outline critical challenges and opportunities for interpretable machine learning in heterogeneous catalysis.

**Fig. 1: Synergistic relationship between black-box and interpretable ML approaches.**

**Fig. 2: Schematic depiction of black-box, grey-box and glass-box ML methods.**

**Fig. 3: Grey-box methods in catalysis applications.**

**Fig. 4: Interpreting glass-box ML results.**

Data availability

The panels of Fig. 3 and Fig. 4 were adapted from refs. 25, 29, 34, 38, 43, 49, 51, 57 and 63 and have associated raw data.

References

Vlachos, D. G. in Advances in Chemical Engineering Vol. 30 (ed. Marin, G. B.) 1–61 (Academic, 2005).
Goldsmith, B. R., Esterhuizen, J., Liu, J.-X., Bartel, C. J. & Sutton, C. Machine learning for heterogeneous catalyst design and discovery. AIChE J. 64, 2311–2323 (2018).
Schlexer Lamoureux, P. et al. Machine learning for computational heterogeneous catalysis. ChemCatChem 11, 3581–3601 (2019).
Kitchin, J. R. Machine learning in catalysis. Nat. Catal. 1, 230–232 (2018).
Toyao, T. et al. Machine learning for catalysis informatics: recent applications and prospects. ACS Catal. 10, 2260–2297 (2020).
Artrith, N. & Kolpak, A. M. Understanding the composition and activity of electrocatalytic nanoalloys in aqueous solvents: a combination of DFT and accurate neural network potentials. Nano Lett. 14, 2670–2676 (2014).
Boes, J. R. & Kitchin, J. R. Modeling segregation on AuPd(111) surfaces with density functional theory and Monte Carlo simulations. J. Phys. Chem. C 121, 3479–3487 (2017).
Ulissi, Z. W., Singh, A. R., Tsai, C. & Nørskov, J. K. Automated discovery and construction of surface phase diagrams using machine learning. J. Phys. Chem. Lett. 7, 3931–3935 (2016).
Peterson, A. A. Acceleration of saddle-point searches with machine learning. J. Chem. Phys. 145, 074106 (2016).
Ulissi, Z. W., Medford, A. J., Bligaard, T. & Nørskov, J. K. To address surface reaction network complexity using scaling relations machine learning and DFT calculations. Nat. Commun. 8, 14621 (2017).
Kolsbjerg, E. L., Peterson, A. A. & Hammer, B. Neural-network-enhanced evolutionary algorithm applied to supported metal nanoparticles. Phys. Rev. B 97, 195424 (2018).
Jennings, P. C., Lysgaard, S., Hummelshøj, J. S., Vegge, T. & Bligaard, T. Genetic algorithms for computational materials discovery accelerated by machine learning. NPJ Comput. Mater. 5, 46 (2019).
Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R. & Yu, B. Definitions, methods, and applications in interpretable machine learning. Proc. Natl Acad. Sci. USA 116, 22071–22080 (2019).
Caruana, R. et al. Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In Proc. 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1721–1730 (ACM, 2015).
Unceta, I., Nin, J. & Pujol, O. Towards global explanations for credit risk scoring. Preprint at https://arxiv.org/abs/1811.07698 (2018).
Tan, S., Caruana, R., Hooker, G. & Lou, Y. Distill-and-compare: auditing black-box models using transparent model distillation. Proc. 2018 AAAI/ACM Conference on AI, Ethics, and Society 303–310 (ACM, 2018).
Azodi, C. B., Tang, J. & Shiu, S.-H. Opening the black box: interpretable machine learning for geneticists. Trends Genet. 36, 442–455 (2020).
Dybowski, R. Interpretable machine learning as a tool for scientific discovery in chemistry. New J. Chem. 44, 20914–20920 (2020).
Rothenberg, G. Data mining in catalysis: separating knowledge from garbage. Catal. Today 137, 2–10 (2008).
Janet, J. P. & Kulik, H. J. Resolving transition metal chemical space: feature selection for machine learning and structure–property relationships. J. Phys. Chem. A 121, 8939–8954 (2017).
Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).
Maley, S. M. et al. Quantum-mechanical transition-state model combined with machine learning provides catalyst design features for selective Cr olefin oligomerization. Chem. Sci. 11, 9665–9674 (2020).
Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).
Gallarati, S. et al. Reaction-based machine learning representations for predicting the enantioselectivity of organocatalysts. Chem. Sci. 12, 6879–6889 (2021).
Ma, X., Li, Z., Achenie, L. E. K. & Xin, H. Machine-learning-augmented chemisorption model for CO₂ electroreduction catalyst screening. J. Phys. Chem. Lett. 6, 3528–3533 (2015).
Li, Z., Wang, S., Chin, W. S., Achenie, L. E. & Xin, H. High-throughput screening of bimetallic catalysts enabled by machine learning. J. Mater. Chem. A 5, 24131–24138 (2017).
Zhong, M. et al. Accelerated discovery of CO₂ electrocatalysts using active machine learning. Nature 581, 178–183 (2020).
Tran, K. & Ulissi, Z. W. Active learning across intermetallics to guide discovery of electrocatalysts for CO₂ reduction and H₂ evolution. Nat. Catal. 1, 696–703 (2018).
Wexler, R. B., Martirez, J. M. P. & Rappe, A. M. Chemical pressure-driven enhancement of the hydrogen evolving activity of Ni₂P from nonmetal surface doping interpreted via machine learning. J. Am. Chem. Soc. 140, 4678–4683 (2018).
Wexler, R. B., Qiu, T. & Rappe, A. M. Automatic prediction of surface phase diagrams using ab initio grand canonical Monte Carlo. J. Phys. Chem. C 123, 2321–2328 (2019).
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
Apley, D. W. & Zhu, J. Visualizing the effects of predictor variables in black box supervised learning models. J. R. Stat. Soc. B 82, 1059–1086 (2020).
Tan, S., Caruana, R., Hooker, G., Koch, P. & Gordo, A. Learning global additive explanations for neural nets using model distillation. Preprint at https://arxiv.org/abs/1801.08640 (2018).
Liu, C. et al. Frontier molecular orbital based analysis of solid–adsorbate interactions over group 13 metal oxide surfaces. J. Phys. Chem. C 124, 15355–15365 (2020).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 4768–4777 (Curran Associates, 2017).
Mine, S. et al. Analysis of updated literature data up to 2019 on the oxidative coupling of methane using an extrapolative machine-learning method to identify novel catalysts. ChemCatChem 13, 3636–3655 (2021).
Ding, R. et al. Machine learning-guided discovery of underlying decisive factors and new mechanisms for the design of nonprecious metal electrocatalysts. ACS Catal. 11, 9798–9808 (2021).
Back, S. et al. Convolutional neural network of atomic surface structures to predict binding energies for high-throughput screening of catalysts. J. Phys. Chem. Lett. 10, 4401–4408 (2019).
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).
Andersen, M., Levchenko, S., Scheffler, M. & Reuter, K. Beyond scaling relations for the description of catalytic materials. ACS Catal. 9, 2752–2759 (2019).
Jonayat, A. S. M., van Duin, A. C. T. & Janik, M. J. Discovery of descriptors for stable monolayer oxide coatings through machine learning. ACS Appl. Energy Mater. 1, 6217–6226 (2018).
O’Connor, N. J., Jonayat, A. S. M., Janik, M. J. & Senftle, T. P. Interaction trends between single metal atoms and oxide supports identified with density functional theory and statistical learning. Nat. Catal. 1, 531–539 (2018).
Weng, B. et al. Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts. Nat. Commun. 11, 3513 (2020).
Liu, C.-Y., Zhang, S., Martinez, D., Li, M. & Senftle, T. P. Using statistical learning to predict interactions between single metal atoms and modified MgO(100) supports. NPJ Comput. Mater. 6, 102 (2020).
Ouyang, R., Curtarolo, S., Ahmetcik, E., Scheffler, M. & Ghiringhelli, L. M. SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Phys. Rev. Mater. 2, 083802 (2018).
Wang, Y., Wagner, N. & Rondinelli, J. M. Symbolic regression in materials science. MRS Commun. 9, 793–805 (2019).
Murphy, K. P. Machine Learning: A Probabilistic Perspective (MIT Press, 2012).
Christensen, M. et al. Data-science driven autonomous process optimization. Commun. Chem. 4, 112 (2021).
Esterhuizen, J. A., Goldsmith, B. R. & Linic, S. Uncovering electronic and geometric descriptors of chemical activity for metal alloys and oxides using unsupervised machine learning. Chem Catal. 1, 923–940 (2021).
Atzmueller, M. Subgroup discovery. WIREs Data Min. Knowl. Discov. 5, 35–49 (2015).
Li, H. et al. Subgroup discovery points to the prominent role of charge transfer in breaking nitrogen scaling relations at single-atom catalysts on VS₂. ACS Catal. 11, 7906–7914 (2021).
Goldsmith, B. R., Boley, M., Vreeken, J., Scheffler, M. & Ghiringhelli, L. M. Uncovering structure-property relationships of materials by subgroup discovery. New J. Phys. 19, 013031 (2017).
Foppa, L. & Ghiringhelli, L. M. Identifying outstanding transition-metal-alloy heterogeneous catalysts for the oxygen reduction and evolution reactions via subgroup discovery. Top. Catal. https://doi.org/10.1007/s11244-021-01502-4 (2021).
Sutton, C. et al. Identifying domains of applicability of machine learning models for materials science. Nat. Commun. 11, 4428 (2020).
Hastie, T. J. & Tibshirani, R. J. Generalized Additive Models (Chapman and Hall, 1990).
Lou, Y., Caruana, R. & Gehrke, J. Intelligible models for classification and regression. In Proc. 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 150–158 (ACM, 2012).
Esterhuizen, J. A., Goldsmith, B. R. & Linic, S. Theory-guided machine learning finds geometric structure-property relationships for chemisorption on subsurface alloys. Chem 6, 3100–3117 (2020).
Mavrikakis, M., Hammer, B. & Nørskov, J. K. Effect of strain on the reactivity of metal surfaces. Phys. Rev. Lett. 81, 2819–2822 (1998).
Kitchin, J. R., Nørskov, J. K., Barteau, M. A. & Chen, J. G. Role of strain and ligand effects in the modification of the electronic and chemical properties of bimetallic surfaces. Phys. Rev. Lett. 93, 156801 (2004).
Hammer, B., Morikawa, Y. & Nørskov, J. K. CO chemisorption at metal surfaces and overlayers. Phys. Rev. Lett. 76, 2141–2144 (1996).
Xin, H. & Linic, S. Communications: exceptions to the d-band model of chemisorption on metal surfaces: the dominant role of repulsion between adsorbate states and metal d-states. J. Chem. Phys. 132, 221101 (2010).
Nori, H., Jenkins, S., Koch, P. & Caruana, R. InterpretML: a unified framework for machine learning interpretability. Preprint at https://arxiv.org/abs/1909.09223 (2019).
Feng, J., Lansford, J. L., Katsoulakis, M. A. & Vlachos, D. G. Explainable and trustworthy artificial intelligence for correctable modeling in chemical sciences. Sci. Adv. 6, eabc3204 (2020).
Wang, S., Pillai, H. S. & Xin, H. Bayesian learning of chemisorption for bridging the complexity of electronic descriptors. Nat. Commun. 11, 6132 (2020).
Wang, S.-H., Pillai, H. S., Wang, S., Achenie, L. E. K. & Xin, H. Infusing theory into deep learning for interpretable reactivity prediction. Nat. Commun. 12, 5288 (2021).
Pearl, J. Causal inference in statistics: an overview. Stat. Surv. 3, 96–146 (2009).
Schölkopf, B. et al. Modeling confounding by half-sibling regression. Proc. Natl Acad. Sci. USA 113, 7391–7398 (2016).
Andersen, M. & Reuter, K. Adsorption enthalpies for catalysis modeling through machine-learned descriptors. Acc. Chem. Res. 54, 2741–2749 (2021).
Kim, E. et al. Materials synthesis insights from scientific literature via text extraction and machine learning. Chem. Mater. 29, 9436–9444 (2017).
Tabor, D. P. et al. Accelerating the discovery of materials for clean energy in the era of smart automation. Nat. Rev. Chem. 3, 5–20 (2018).
Yang, L. et al. Discovery of complex oxides via automated experiments and data science. Proc. Natl Acad. Sci. USA 118, e2106042118 (2021).
Flores, R. A. et al. Active learning accelerated discovery of stable iridium oxide polymorphs for the oxygen evolution reaction. Chem. Mater. 32, 5854–5863 (2020).
Tran, K. et al. Computational catalyst discovery: Active classification through myopic multiscale sampling. J. Chem. Phys. 154, 124118 (2021).
Chanussot, L. et al. Open Catalyst 2020 (OC20) dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).
Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Bartel, C. J. et al. New tolerance factor to predict the stability of perovskite oxides and halides. Sci. Adv. 5, eaav0693 (2019).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Rasmussen, C. E. in Advanced Lectures on Machine Learning (eds Bousquet, O. et al.) 63–71 (Springer, 2004).
Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
Montoya, J. H. et al. Autonomous intelligent agents for accelerated materials discovery. Chem. Sci. 11, 8517–8532 (2020).
Morris, M. D. Factorial sampling plans for preliminary computational experiments. Technometrics 33, 161–174 (1991).
Augusto, D. A. & Barbosa, H. J. C. Symbolic regression via genetic programming. In Proc. Vol.1. Sixth Brazilian Symposium on Neural Networks 173–178 (IEEE, 2000).
Herrera, F., Carmona, C. J., González, P. & del Jesus, M. J. An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29, 495–525 (2011).
Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning (Springer, 2001).
Koller, D. & Friedman, N. Probabilistic Graphical Models: Principles and Techniques (MIT Press, 2009).

Acknowledgements

This work was supported by the US DOE Office of Basic Energy Sciences, Division of Chemical Sciences (DE-SC0021008) (analysis of alloy chemisorption) and the CBET-National Science Foundation under DMREF grant no. 2116646. We acknowledge support from the Michigan Institute for Data Science (MIDAS) PODS Grant. J.A.E. acknowledges support from the University of Michigan J. Robert Beyster Computational Innovation Graduate Fellows Program.

Author information

Authors and Affiliations

Department of Chemical Engineering, University of Michigan, Ann Arbor, MI, USA
Jacques A. Esterhuizen, Bryan R. Goldsmith & Suljo Linic
Catalysis Science and Technology Institute, University of Michigan, Ann Arbor, MI, USA
Jacques A. Esterhuizen, Bryan R. Goldsmith & Suljo Linic

Corresponding authors

Correspondence to Bryan R. Goldsmith or Suljo Linic.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Catalysis thanks Johannes Margraf and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Esterhuizen, J.A., Goldsmith, B.R. & Linic, S. Interpretable machine learning for knowledge generation in heterogeneous catalysis. Nat Catal 5, 175–184 (2022). https://doi.org/10.1038/s41929-022-00744-z

Received: 12 October 2021
Accepted: 20 December 2021
Published: 17 March 2022
Issue Date: March 2022
DOI: https://doi.org/10.1038/s41929-022-00744-z

