Abstract
Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model trained on a reference dataset that can be used to establish a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We survey several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. Finally, we conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, which is accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.

Change history
16 November 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41570-022-00446-x
Acknowledgements
The work at Los Alamos National Laboratory (LANL) was supported by the LANL Directed Research and Development Funds (LDRD) and performed in part at the Center for Nonlinear Studies (CNLS) and the Center for Integrated Nanotechnologies (CINT), a US Department of Energy (DOE) Office of Science user facility at LANL. N.F. and M.K. acknowledge financial support from the Director’s Postdoctoral Fellowship at LANL funded by LDRD. K.B. and S.T. acknowledge support from the US DOE, Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under Triad National Security (Triad) contract grant number 89233218CNA000001 (FWP: LANLE3F2). This research used resources provided by the LANL Institutional Computing Program. LANL is managed by Triad National Security for the US DOE’s NNSA, under contract number 89233218CNA000001. A.I.B. acknowledges the R. Gaurth Hansen Professorship. O.I. acknowledges support from the National Science Foundation (NSF) grants CHE-1802789 and CHE-2041108. The work performed by O.I. and R.Z. in part was made possible by the Office of Naval Research (ONR) through support provided by the Energetic Materials Program (MURI grant number N00014-21-1-2476).
Author information
Contributions
N.F., R.Z., M.K., J.S.S., B.N., R.M., Y.W.L., A.I.B., K.B., O.I. and S.T. researched data for the article. N.F., R.Z., M.K., J.S.S., B.N., K.B., O.I. and S.T. contributed substantially to discussion of the content. N.F., R.Z., M.K., N.L., O.I. and S.T. wrote the article. All authors reviewed and/or edited the manuscript before submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Chemistry thanks Eric Bittner, Hao Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Software for Chemistry & Materials — Machine Learning Potentials: https://www.scm.com/product/machine-learning-potentials/
Glossary
- Extensive properties
-
Properties that intrinsically grow with system size. Examples include enthalpies of formation, which scale with the number of chemical bonds in a system, and entropies of solvation.
- Extensibility
-
The applicability of an ML model to systems substantially larger than those in the training set. An example of extensibility is an ML model trained on small organic molecules that can accurately simulate a protein.
- Interaction layers
-
Groups of NN operations that transfer information between atoms. Typically expressed as a local graph convolution, this operation creates new features for each atom based on the features and distances of local neighbouring atoms. Also called message-passing.
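As an illustrative sketch only (not any specific published architecture; the inverse-distance weight is a placeholder for a learned function of distance), a single interaction layer can be expressed as:

```python
# Toy interaction (message-passing) layer: each atom's new feature vector
# is its current one plus a distance-weighted sum of its neighbours' features.
def interaction_layer(features, neighbours):
    """features: list of per-atom feature vectors (lists of floats).
    neighbours: list of (i, j, r_ij) tuples for atom pairs within a cutoff."""
    new = [list(f) for f in features]  # copy so all atoms update simultaneously
    for i, j, r in neighbours:
        w = 1.0 / r  # placeholder weight; real models learn a function of r
        for k, fjk in enumerate(features[j]):
            new[i][k] += w * fjk
    return new

# Two atoms 2 Angstrom apart exchanging two-dimensional features.
print(interaction_layer([[1.0, 0.0], [0.0, 1.0]], [(0, 1, 2.0), (1, 0, 2.0)]))
```

Stacking several such layers lets information propagate beyond an atom's immediate neighbourhood, which is how local graph convolutions capture medium-range effects.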
- Charge equilibration scheme
-
An approach for predicting charge distributions in molecules or solids using tabulated or ML-predicted atomic properties such as atomic electronegativity and hardness, or bond polarizability.
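In its standard form, the charges are obtained by minimizing a second-order expansion of the energy in the atomic charges, subject to conservation of the total charge; here $\chi_i$ is the atomic electronegativity, $\eta_i$ the hardness and $J_{ij}$ the interatomic Coulomb interaction:

$$
E(\{q_i\}) = \sum_i \left( \chi_i q_i + \tfrac{1}{2}\,\eta_i q_i^2 \right) + \sum_{i<j} J_{ij}\, q_i q_j,
\qquad \sum_i q_i = Q_{\text{tot}} .
$$

In ML variants, $\chi_i$ and $\eta_i$ are predicted per atom by the network rather than taken from fixed tables.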
- Transferability
-
The applicability of an ML model to systems not originally included in the training set. Transferable models exhibit high accuracy for chemical systems that are structurally different from the ones in the training set.
- One-hot scheme
-
A binary vector representation for unranked categorical values. For example, three atom types C, H and O are represented by the vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1), respectively.
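A minimal sketch of this encoding (the element ordering is the arbitrary one used in the example):

```python
ELEMENTS = ["C", "H", "O"]  # arbitrary but fixed ordering

def one_hot(element, elements=ELEMENTS):
    """Return a binary vector with a single 1 marking the element's position."""
    vec = [0] * len(elements)
    vec[elements.index(element)] = 1
    return vec

print(one_hot("C"), one_hot("H"), one_hot("O"))
# [1, 0, 0] [0, 1, 0] [0, 0, 1]
```

Because the categories are unranked, the encoding deliberately imposes no ordering or distance between elements.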
- Atomic embedding
-
A fixed-length vector representation of an atom with no information about its environment. Embedding is a learnable map between the atomic number and a high-dimensional feature space. Can incorporate additional atomic physicochemical properties, such as electronegativity.
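As a hypothetical pure-Python sketch (in practice the table is a trainable parameter of the network, for example an embedding layer; the dimension and random initialization here are arbitrary choices):

```python
import random

class AtomicEmbedding:
    """Lookup table mapping atomic number -> fixed-length feature vector.
    In a real network these vectors are parameters updated during training."""
    def __init__(self, max_z=100, dim=8, seed=0):
        rng = random.Random(seed)
        self.table = {z: [rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for z in range(1, max_z + 1)}

    def __call__(self, z):
        return self.table[z]

emb = AtomicEmbedding()
carbon = emb(6)  # every carbon atom starts from the same 8-dimensional vector
```

Interaction layers then refine these environment-free vectors into environment-dependent atomic features.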
- Multitask learning
-
When one model is concurrently trained on multiple labels (for example, total molecular energies and atomic forces). Leverages useful information contained in multiple related tasks to improve the general performance of all tasks.
- Non-extensive properties
-
Properties that do not scale directly with system size. Examples include characteristics of localized electronic states, electronic excitation energies, vibrational frequencies, electron affinities and ionization potentials.
- Wiberg bond index
-
(WBI). A quantitative bonding model that expresses the bond order between pairs of atoms. WBI measures the electron population overlap between atoms and frequently aligns numerically with chemical intuition; for example, the WBI of the C=C bond in ethene (C2H4) is expected to be close to two.
- Δ-ML
-
A composite approach in which baseline values, obtained from cheap QM methods (usually DFT), are corrected towards target values calculated at a more sophisticated level of theory, for example, coupled-cluster or many-body perturbation theory methods.
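Schematically, with $E_{\text{base}}$ the cheap baseline (for example, DFT) and $E_{\text{target}}$ the high-level reference, the ML model learns only the difference between the two levels of theory:

$$
E_{\text{target}}(\mathbf{R}) \approx E_{\text{base}}(\mathbf{R}) + \Delta_{\text{ML}}(\mathbf{R}),
\qquad \Delta_{\text{ML}} \ \text{trained on}\ \{\,E_{\text{target}} - E_{\text{base}}\,\}.
$$

Because the correction is typically smoother than the total energy, far fewer expensive high-level reference calculations are needed than for training a model from scratch.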
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fedik, N., Zubatyuk, R., Kulichenko, M. et al. Extending machine learning beyond interatomic potentials for predicting molecular properties. Nat Rev Chem 6, 653–672 (2022). https://doi.org/10.1038/s41570-022-00416-3
This article is cited by
-
Uncertainty-driven dynamics for active learning of interatomic potentials
Nature Computational Science (2023)