Breakthroughs in molecular and materials discovery require meaningful outliers to be identified in existing trends. As knowledge accumulates, the inherent bias of human intuition makes it harder to elucidate increasingly opaque chemical and physical principles. Moreover, given the limited manual and intellectual throughput of investigators, these principles cannot be efficiently applied to design new materials across a vast chemical space. Many data-driven approaches, following advances in high-throughput capabilities and machine learning, have tackled these limitations. In this Review, we compare traditional, human-centred methods with state-of-the-art, data-driven approaches to molecular and materials discovery. We first introduce the limitations of human-centred Edisonian, model-system and descriptor-based approaches. We then discuss how data-driven approaches can address these limitations by promoting throughput, reducing cognitive overload and biases, and establishing atomistic understanding that is transferable across a broad chemical space. We examine how high-throughput capabilities can be combined with active learning and inverse design to efficiently optimize materials out of millions or an intractable number of candidates. Lastly, we pinpoint challenges to accelerate future workflows and ultimately enable self-driving platforms, which automate and streamline the optimization of molecules and materials in iterative cycles.
Grimaud, A. et al. Activating lattice oxygen redox reactions in metal oxides to catalyse oxygen evolution. Nat. Chem. 9, 457–465 (2017).
Hong, W. T. et al. Charge-transfer-energy-dependent oxygen evolution reaction mechanisms for perovskite oxides. Energy Environ. Sci. 10, 2190–2200 (2017).
Scharber, M. C. et al. Design rules for donors in bulk-heterojunction solar cells — towards 10% energy-conversion efficiency. Adv. Mater. 18, 789–794 (2006).
Chen, T., Sai Gautam, G. & Canepa, P. Ionic transport in potential coating materials for Mg batteries. Chem. Mater. 31, 8087–8099 (2019).
Gorai, P., Famprikis, T., Singh, B., Stevanović, V. & Canepa, P. Devil is in the defects: electronic conductivity in solid electrolytes. Chem. Mater. 33, 7484–7498 (2021).
Franceschetti, A. & Zunger, A. The inverse band-structure problem of finding an atomic configuration with given electronic properties. Nature 402, 60–63 (1999).
Le, T. C. & Winkler, D. A. Discovery and optimization of materials using evolutionary approaches. Chem. Rev. 116, 6107–6132 (2016).
Nouira, A., Sokolovska, N. & Crivello, J.-C. CrystalGAN: learning to discover crystallographic structures with generative adversarial networks. In Proc. AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (eds Martin, A. et al.) (Stanford University, 2019).
Kim, B., Lee, S. & Kim, J. Inverse design of porous materials using artificial neural networks. Sci. Adv. 6, eaax9324 (2020).
Kim, S., Noh, J., Gu, G. H., Aspuru-Guzik, A. & Jung, Y. Generative adversarial networks for crystal structure prediction. ACS Cent. Sci. 6, 1412–1420 (2020).
Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).
Jørgensen, M. S. et al. Atomistic structure learning. J. Chem. Phys. 151, 54111 (2019).
Meldgaard, S. A., Mortensen, H. L., Jørgensen, M. S. & Hammer, B. Structure prediction of surface reconstructions by deep reinforcement learning. J. Phys. Condens. Matter 32, 404005 (2020).
Trasatti, S. Work function, electronegativity, and electrochemical behaviour of metals. III. Electrolytic hydrogen evolution in acid solutions. J. Electroanal. Chem. Interf. Electrochem. 39, 163–184 (1972).
Dahl, S. et al. Role of steps in N2 activation on Ru(0001). Phys. Rev. Lett. 83, 1814–1817 (1999).
Chen, M. S. & Goodman, D. W. The structure of catalytically active gold on titania. Science 306, 252–255 (2004).
Stamenkovic, V. R. et al. Improved oxygen reduction activity on Pt3Ni(111) via increased surface site availability. Science 315, 493–497 (2007).
Jaramillo, T. F. et al. Identification of active edge sites for electrochemical H2 evolution from MoS2 nanocatalysts. Science 317, 100–102 (2007).
Trasatti, S. Electrocatalysis by oxides — attempt at a unifying approach. J. Electroanal. Chem. Interf. Electrochem. 111, 125–131 (1980).
Bockris, J. O'M. & Otagawa, T. The electrocatalysis of oxygen evolution on perovskites. J. Electrochem. Soc. 131, 290–302 (1984).
Hammer, B. & Nørskov, J. K. Why gold is the noblest of all the metals. Nature 376, 238–240 (1995).
Hammer, B. & Nørskov, J. K. Theoretical surface science and catalysis — calculations and concepts. Adv. Catal. 45, 71–129 (2000).
Jacobsen, C. J. H. et al. Catalyst design by interpolation in the periodic table: bimetallic ammonia synthesis catalysts. J. Am. Chem. Soc. 123, 8404–8405 (2001).
Suntivich, J., May, K. J., Gasteiger, H. A., Goodenough, J. B. & Shao-Horn, Y. A perovskite oxide optimized for oxygen evolution catalysis from molecular orbital principles. Science 334, 1383–1385 (2011).
Suntivich, J. et al. Design principles for oxygen-reduction activity on perovskite oxide catalysts for fuel cells and metal–air batteries. Nat. Chem. 3, 546–550 (2011).
Matsumoto, Y., Yoneyama, H. & Tamura, H. Influence of the nature of the conduction band of transition metal oxides on catalytic activity for oxygen reduction. J. Electroanal. Chem. Interf. Electrochem. 83, 237–243 (1977).
Jacobs, R., Hwang, J., Shao-Horn, Y. & Morgan, D. Assessing correlations of perovskite catalytic performance with electronic structure descriptors. Chem. Mater. 31, 785–797 (2019).
Giordano, L. et al. Electronic structure-based descriptors for oxide properties and functions. Acc. Chem. Res. 55, 298–308 (2022).
Kuznetsov, D. A., Peng, J., Giordano, L., Román-Leshkov, Y. & Shao-Horn, Y. Bismuth substituted strontium cobalt perovskites for catalyzing oxygen evolution. J. Phys. Chem. C 124, 6562–6570 (2020).
Yuan, S. et al. Tunable metal hydroxide–organic frameworks for catalysing oxygen evolution. Nat. Mater. 21, 673–680 (2022).
Lopez, N. et al. On the origin of the catalytic activity of gold nanoparticles for low-temperature CO oxidation. J. Catal. 223, 232–235 (2004).
Calle-Vallejo, F., Loffreda, D., Koper, M. T. M. & Sautet, P. Introducing structural sensitivity into adsorption–energy scaling relations by means of coordination numbers. Nat. Chem. 7, 403–410 (2015).
Calle-Vallejo, F. et al. Finding optimal surface sites on heterogeneous catalysts by counting nearest neighbors. Science 350, 185–189 (2015).
Mavrikakis, M., Hammer, B. & Nørskov, J. K. Effect of strain on the reactivity of metal surfaces. Phys. Rev. Lett. 81, 2819–2822 (1998).
Escudero-Escribano, M. et al. Tuning the activity of Pt alloy electrocatalysts by means of the lanthanide contraction. Science 352, 73–76 (2016).
Chattot, R. et al. Surface distortion as a unifying concept and descriptor in oxygen reduction reaction electrocatalysis. Nat. Mater. 17, 827–833 (2018).
Østergaard, T. M. et al. Oxidation of ethylene carbonate on Li metal oxide surfaces. J. Phys. Chem. C 122, 10442–10449 (2018).
Goodfellow, I. et al. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
This work was supported by the Advanced Research Projects Agency–Energy (ARPA-E), US Department of Energy under award number DE-AR0001220, and by the Toyota Research Institute through the Accelerated Materials Design and Discovery programme.
The authors declare no competing interests.
Peer review information
Nature Reviews Materials thanks Ivano Castelli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
About this article
Cite this article
Peng, J., Schwalbe-Koda, D., Akkiraju, K. et al. Human- and machine-centred designs of molecules and materials for sustainability and decarbonization. Nat Rev Mater 7, 991–1009 (2022). https://doi.org/10.1038/s41578-022-00466-5