
  • Review Article

Extending machine learning beyond interatomic potentials for predicting molecular properties

A Publisher Correction to this article was published on 16 November 2022


Abstract

Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model, trained on a reference dataset, that establishes a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We survey several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. We conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, a shift accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.
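The surrogate-model idea described above can be sketched in a few lines. The example below is a minimal illustration, not a method from this Review: it uses kernel ridge regression with a Gaussian kernel to map a fixed-length "descriptor" vector for each structure to a scalar property. The descriptors and labels here are synthetic placeholders standing in for real molecular features and reference-level property data.

```python
# Illustrative surrogate model: kernel ridge regression mapping a
# structural descriptor vector to a scalar property. All data below
# are synthetic placeholders, not real molecular descriptors.
import numpy as np

rng = np.random.default_rng(0)

# Toy "reference dataset": 50 structures, 8 descriptor features each,
# with a toy scalar "property" as the regression target.
X = rng.normal(size=(50, 8))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) similarity between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Training amounts to solving (K + lambda * I) alpha = y.
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)

# Prediction for unseen structures is a kernel-weighted sum over the
# training set — the learned structure-to-property relationship.
X_new = rng.normal(size=(5, 8))
y_pred = rbf_kernel(X_new, X) @ alpha
print(y_pred.shape)  # (5,)
```

The same train-then-predict pattern underlies the neural-network models discussed in the Review; only the representation and the regressor change.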


Fig. 1: Coarse atomistic scale of matter from the perspective of a chemist.
Fig. 2: Relationships between chemical structure and properties from local and global perspectives.
Fig. 3: Modern architectures of neural networks for learning local and global properties.
Fig. 4: Machine learning prediction of atomic charges, vibrational spectra, dipoles and quadrupoles.
Fig. 5: Machine learning prediction of spin-polarized charges and total electron density.
Fig. 6: Machine learning prediction of spin-density, bond orders and effective Hamiltonian models.
Fig. 7: Large-scale molecular simulations enabled by machine learning.




    Article  CAS  Google Scholar 

  155. Khorshidi, A. & Peterson, A. A. Amp: a modular approach to machine learning in atomistic simulations. Computer Phys. Commun. 207, 310–324 (2016).

    Article  CAS  Google Scholar 

  156. Kolb, B., Lentz, L. C. & Kolpak, A. M. Discovering charge density functionals and structure-property relationships with PROPhet: a general framework for coupling machine learning and first-principles methods. Sci. Rep. 7, 1192 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  157. Wang, H., Zhang, L., Han, J. & E, W. DeePMD-kit: a deep learning package for many-body potential energy representation and molecular dynamics. Computer Phys. Commun. 228, 178–184 (2018).

    Article  CAS  Google Scholar 

  158. Gao, X., Ramezanghorbani, F., Isayev, O., Smith, J. S. & Roitberg, A. E. TorchANI: a free and open source pytorch-based deep learning implementation of the ANI neural network potentials. J. Chem. Inf. Model. 60, 3408–3415 (2020).

    Article  CAS  PubMed  Google Scholar 

  159. Himanen, L. et al. DScribe: library of descriptors for machine learning in materials science. Comput. Phys. Commun. 247, 106949 (2020).

    Article  CAS  Google Scholar 

  160. Haghighatlari, M. et al. ChemML: a machine learning and informatics program package for the analysis, mining, and modeling of chemical and materials data. WIREs Comput. Mol. Sci. 10, e1458 (2020).

    Article  CAS  Google Scholar 

  161. Lee, K., Yoo, D., Jeong, W. & Han, S. SIMPLE-NN: an efficient package for training and executing neural-network interatomic potentials. Comput. Phys. Commun. 242, 95–103 (2019).

    Article  CAS  Google Scholar 

  162. Shao, Y., Hellström, M., Mitev, P. D., Knijff, L. & Zhang, C. PiNN: a Python library for building atomic neural networks of molecules and materials. J. Chem. Inf. Model. 60, 1184–1193 (2020).

    Article  CAS  PubMed  Google Scholar 

  163. Velde, Gte et al. Chemistry with ADF. J. Comput. Chem. 22, 931–967 (2001).

    Article  Google Scholar 

  164. Larsen, A. H. et al. The atomic simulation environment — a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).

    Article  Google Scholar 

  165. Chen, M. S., Morawietz, T., Mori, H., Markland, T. E. & Artrith, N. AENET–LAMMPS and AENET–TINKER: interfaces for accurate and efficient molecular dynamics simulations with machine learning potentials. J. Chem. Phys. 155, 074801 (2021).

    Article  CAS  PubMed  Google Scholar 

  166. Neese, F. Software update: the ORCA program system — version 5.0. WIREs Comput. Mol. Sci. https://doi.org/10.1002/wcms.1606 (2022).

    Article  Google Scholar 

  167. Cova, T. F. G. G. & Pais, A. A. C. C. Deep learning for deep chemistry: optimizing the prediction of chemical patterns. Front. Chem. 7, 809 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  168. Bzdok, D., Krzywinski, M. & Altman, N. Machine learning: supervised methods. Nat. Methods 15, 5–6 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  169. Shaidu, Y. et al. A systematic approach to generating accurate neural network potentials: the case of carbon. npj Comput. Mater. 7, 52 (2021).

    Article  CAS  Google Scholar 

  170. Botu, V., Batra, R., Chapman, J. & Ramprasad, R. Machine learning force fields: construction, validation, and outlook. J. Phys. Chem. C 121, 511–522 (2017).

    Article  CAS  Google Scholar 

  171. Senftle, T. P. et al. The ReaxFF reactive force-field: development, applications and future directions. npj Comput. Mater. 2, 15011 (2016).

    Article  CAS  Google Scholar 

  172. Leach, A. R. Molecular Modelling: Principles and Applications 2nd edn, Ch. 7 (Pearson, 2001)

  173. Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  174. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

    Article  CAS  PubMed  Google Scholar 

  175. Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 2, 359–366 (1989).

    Article  Google Scholar 

  176. Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991).

    Article  Google Scholar 

  177. Behler, J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Edn 56, 12828–12840 (2017).

    Article  CAS  Google Scholar 

  178. Benoit, M. et al. Measuring transferability issues in machine-learning force fields: the example of gold–iron interactions with linearized potentials. Mach. Learn. Sci. Technol. 2, 025003 (2021).

    Article  Google Scholar 

  179. Anderson, B., Hy, T.-S. & Kondor, R. Cormorant: Covariant Molecular Neural Networks. Preprint at Arxiv https://arxiv.org/abs/1906.04015 (2019).

  180. Jackson, R., Zhang, W. & Pearson, J. TSNet: predicting transition state structures with tensor field networks and transfer learning. Chem. Sci. 12, 10022–10040 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  181. Kocer, E., Mason, J. K. & Erturk, H. A novel approach to describe chemical environments in high-dimensional neural network potentials. J. Chem. Phys. 150, 154102 (2019).

    Article  PubMed  Google Scholar 

Download references

Acknowledgements

The work at Los Alamos National Laboratory (LANL) was supported by the LANL Directed Research and Development Funds (LDRD) and performed in part at the Center for Nonlinear Studies (CNLS) and the Center for Integrated Nanotechnologies (CINT), a US Department of Energy (DOE) Office of Science user facility at LANL. N.F. and M.K. acknowledge financial support from the Director’s Postdoctoral Fellowship at LANL funded by LDRD. K.B. and S.T. acknowledge support from the US DOE, Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under Triad National Security (Triad) contract grant number 89233218CNA000001 (FWP: LANLE3F2). This research used resources provided by the LANL Institutional Computing Program. LANL is managed by Triad National Security for the US DOE’s NNSA, under contract number 89233218CNA000001. A.I.B. acknowledges the R. Gaurth Hansen Professorship. O.I. acknowledges support from the National Science Foundation (NSF) grants CHE-1802789 and CHE-2041108. The work performed by O.I. and R.Z. in part was made possible by the Office of Naval Research (ONR) through support provided by the Energetic Materials Program (MURI grant number N00014-21-1-2476).

Author information


Contributions

N.F., R.Z., M.K., J.S.S., B.N., R.M., Y.W.L., A.I.B., K.B., O.I. and S.T. researched data for the article. N.F., R.Z., M.K., J.S.S., B.N., K.B., O.I. and S.T. contributed substantially to discussion of the content. N.F., R.Z., M.K., N.L., O.I. and S.T. wrote the article. All authors reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Sergei Tretiak.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Chemistry thanks Eric Bittner, Hao Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Software for Chemistry & Materials — Machine Learning Potentials: https://www.scm.com/product/machine-learning-potentials/

Glossary

Extensive properties

Properties that intrinsically grow with system size. Examples include enthalpies of formation, which scale with the number of chemical bonds in a system, and entropies of solvation.

Extensibility

The applicability of a ML model to systems substantially larger than those in the training set. An example of extensibility is a ML model trained on small organic molecules that can accurately simulate a protein.

Interaction layers

Groups of NN operations that transfer information between atoms. Typically expressed as a local graph convolution, this operation creates new features for each atom based on the features and distances of local neighbouring atoms. Also called message-passing.
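
As a toy illustration of this idea, one message-passing step can be sketched in Python. The cosine distance filter and all numbers below are assumptions for illustration, not any published architecture, which would use learned filters instead:

```python
import numpy as np

def interaction_layer(features, coords, cutoff=5.0):
    """One toy message-passing step: each atom's new feature vector is
    its own features plus the features of every neighbour within the
    cutoff, weighted by a smooth cosine distance filter."""
    n = len(features)
    new = np.array(features, dtype=float)  # start from self-features
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = np.linalg.norm(np.asarray(coords[i]) - np.asarray(coords[j]))
            if r < cutoff:
                w = 0.5 * (np.cos(np.pi * r / cutoff) + 1.0)  # smooth cutoff
                new[i] += w * np.asarray(features[j], dtype=float)
    return new

# Two atoms at half the cutoff distance exchange half of each
# other's features (the filter weight there is 0.5).
updated = interaction_layer([[1.0], [0.0]],
                            [[0.0, 0.0, 0.0], [0.0, 0.0, 2.5]])
```

Stacking several such layers lets information propagate beyond the nearest neighbours, which is how local models capture semi-local effects.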

Charge equilibration scheme

An approach for predicting charge distributions in molecules or solids using tabulated or ML-predicted atomic properties such as atomic electronegativity and hardness, or bond polarizability.
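
A minimal sketch of one such scheme: a second-order electrostatic energy in the charges is minimized under a total-charge constraint via a Lagrange multiplier, which reduces to a single linear solve. All electronegativity, hardness and Coulomb values below are illustrative numbers, not fitted parameters:

```python
import numpy as np

def equilibrate_charges(chi, hardness, coulomb, total_charge=0.0):
    """Minimize E(q) = chi.q + (1/2) q.J.q subject to sum(q) = total_charge.
    The diagonal of J holds atomic hardness; off-diagonals hold the
    interatomic Coulomb interaction."""
    n = len(chi)
    J = np.array(coulomb, dtype=float)
    J[np.diag_indices(n)] = hardness            # diagonal = atomic hardness
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = J
    A[:n, n] = A[n, :n] = 1.0                   # constraint row and column
    b = np.concatenate([-np.asarray(chi, dtype=float), [total_charge]])
    return np.linalg.solve(A, b)[:n]

# Toy diatomic: the more electronegative atom (chi = 5.0) acquires
# the negative partial charge, and the charges sum to zero.
q = equilibrate_charges(chi=[5.0, 3.0], hardness=[10.0, 10.0],
                        coulomb=[[0.0, 1.0], [1.0, 0.0]])
```

In ML-augmented variants, the electronegativities and hardnesses fed into this solve are predicted per atom by a neural network rather than tabulated.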

Transferability

The applicability of a ML model to systems not originally included in the training set. Transferable models exhibit high accuracy for chemical systems that are structurally different from the ones in the training set.

One-hot scheme

A binary vector representation for unranked categorical values. For example, three atom types C, H and O are represented by the vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1), respectively.
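
The three-element example above can be written directly in Python (a minimal sketch; the element ordering is taken from the glossary example):

```python
# One-hot encoding for the atom types C, H and O.
ELEMENTS = ["C", "H", "O"]

def one_hot(symbol):
    """Return a binary vector with a single 1 at the element's index."""
    vec = [0] * len(ELEMENTS)
    vec[ELEMENTS.index(symbol)] = 1
    return vec

# Encode the six atoms of methanol, CH3OH.
encoding = [one_hot(s) for s in ["C", "H", "H", "H", "O", "H"]]
```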

Atomic embedding

A fixed-length vector representation of an atom with no information about its environment. Embedding is a learnable map between the atomic number and a high-dimensional feature space. It can incorporate additional atomic physicochemical properties, such as electronegativity.
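
A minimal sketch of such a lookup. The table here is randomly initialized; in a real model its rows are learned parameters, and the sizes `MAX_Z` and `DIM` are assumptions for illustration:

```python
import numpy as np

# A hypothetical embedding table: one row per atomic number.
rng = np.random.default_rng(0)
MAX_Z, DIM = 100, 16            # assumed maximum atomic number and feature size
embedding_table = rng.normal(size=(MAX_Z + 1, DIM))

def embed(atomic_numbers):
    """Map atomic numbers to fixed-length feature vectors by table lookup."""
    return embedding_table[np.asarray(atomic_numbers)]

# Methanol, CH3OH, as atomic numbers: C, H, H, H, O, H.
features = embed([6, 1, 1, 1, 8, 1])
```

Note that every atom of the same element receives an identical vector; it is the subsequent interaction layers that make the representations environment-dependent.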

Multitask learning

When one model is concurrently trained on multiple labels (for example, total molecular energies and atomic forces). Leverages useful information contained in multiple related tasks to improve the general performance of all tasks.
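
A common realization is a weighted sum of per-task losses. The sketch below uses assumed weights `w_e` and `w_f`, not values from any specific model:

```python
import numpy as np

def multitask_loss(pred_e, true_e, pred_f, true_f, w_e=1.0, w_f=0.1):
    """Weighted sum of per-task mean-squared errors for joint training
    on energies and forces."""
    mse_e = float(np.mean((np.asarray(pred_e) - np.asarray(true_e)) ** 2))
    mse_f = float(np.mean((np.asarray(pred_f) - np.asarray(true_f)) ** 2))
    return w_e * mse_e + w_f * mse_f
```

Minimizing this single scalar updates the shared model parameters with gradient information from both tasks at once.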

Non-extensive properties

Properties that do not scale directly with system size. Examples include characteristics of localized electronic states, electronic excitation energies, vibrational frequencies, electron affinities and ionization potentials.

Wiberg bond index

(WBI). A quantitative bonding model that expresses the bond order between pairs of atoms. The WBI measures the overlap of electron populations between atoms and often agrees numerically with chemical intuition; for example, the WBI of the C=C bond in ethene (C2H4) is expected to be close to two.
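
In an orthonormal basis the WBI reduces to W_AB = Σ_{μ∈A} Σ_{ν∈B} P_μν², with P the one-electron density matrix. This can be checked on a toy two-orbital system (a sketch; the density matrix below is constructed by hand, not taken from a calculation):

```python
import numpy as np

def wiberg_index(P, atom_of_basis, A, B):
    """Wiberg bond index between atoms A and B in an orthonormal basis:
    the sum of squared density-matrix elements coupling basis functions
    on A with basis functions on B."""
    mu = [i for i, a in enumerate(atom_of_basis) if a == A]
    nu = [i for i, a in enumerate(atom_of_basis) if a == B]
    return float(sum(P[m, n] ** 2 for m in mu for n in nu))

# One doubly occupied bonding orbital c = (1/sqrt(2), 1/sqrt(2)) over two
# atoms gives P = 2 * outer(c, c) and a WBI of 1 - a single bond, as in H2.
c = np.array([1.0, 1.0]) / np.sqrt(2.0)
P = 2.0 * np.outer(c, c)
w = wiberg_index(P, atom_of_basis=[0, 1], A=0, B=1)
```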

Δ-ML

A composite approach in which baseline values, obtained from cheap QM methods (usually DFT), are corrected towards target values calculated at a more sophisticated level of theory, for example, coupled-cluster or perturbation theory methods.
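
The idea can be sketched with made-up numbers. Here the "model" is only the mean train-set correction; a real Δ-ML model regresses the correction on molecular structure:

```python
def delta_ml_predict(e_baseline_new, e_baseline_train, e_target_train):
    """Learn the baseline-to-target correction from the training pairs
    (here trivially, as its mean) and apply it to new baseline values."""
    corrections = [t - b for b, t in zip(e_baseline_train, e_target_train)]
    mean_corr = sum(corrections) / len(corrections)
    return [e + mean_corr for e in e_baseline_new]

# Illustrative DFT-like baselines and coupled-cluster-like targets.
baseline = [-40.1, -76.2, -56.3]
target = [-40.5, -76.6, -56.7]
pred = delta_ml_predict([-100.0], baseline, target)
```

The appeal of Δ-ML is that the correction is typically smaller and smoother than the target quantity itself, so it can be learned from far fewer expensive reference calculations.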

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fedik, N., Zubatyuk, R., Kulichenko, M. et al. Extending machine learning beyond interatomic potentials for predicting molecular properties. Nat Rev Chem 6, 653–672 (2022). https://doi.org/10.1038/s41570-022-00416-3

