Abstract
First-principles-based exploration of chemical space deepens our understanding of chemistry and might help with the design of new molecules, materials or experiments. Due to the computational cost of quantum chemistry methods and the immense number of theoretically possible stable compounds, comprehensive in silico screening remains prohibitive. To overcome this challenge, we combine atom-in-molecule-based fragments, dubbed ‘amons’ (A), with active learning in transferable quantum machine learning (ML) models. The efficiency, accuracy, scalability and transferability of the resulting AML models are demonstrated for important molecular quantum properties such as energies, forces, atomic charges, NMR shifts and polarizabilities, and for systems including organic molecules, 2D materials, water clusters, Watson–Crick DNA base pairs and even ubiquitin. Conceptually, the AML approach extends Mendeleev’s table to account effectively for chemical environments, which allows the systematic reconstruction of many chemistries from local building blocks.
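To make the idea of building a query molecule from local fragments concrete, the sketch below enumerates, for one query molecule, every connected heavy-atom subgraph up to a size cap. It assumes, as a simplification, that candidate amons can be viewed as such subgraphs; a real amon generator (such as the one in the released aqml code) must additionally saturate dangling valences with hydrogens, respect bond orders, generate 3D geometries and remove chemically equivalent duplicates. The function name `connected_subgraphs` and the adjacency-dict input format are hypothetical.

```python
# Simplified sketch: enumerate connected heavy-atom subgraphs of a molecular
# graph up to a size cap. Hydrogen saturation, bond orders, valence checks and
# removal of chemically equivalent duplicates are deliberately omitted.


def connected_subgraphs(adjacency, max_size):
    """Return every connected vertex subset with 1..max_size atoms.

    adjacency: dict mapping each heavy-atom index to the set of its bonded
    heavy-atom neighbours (hypothetical input format).
    """
    if max_size < 1:
        return set()
    found = set()
    # Size-1 subgraphs: every single heavy atom.
    frontier = {frozenset([atom]) for atom in adjacency}
    while frontier:
        found |= frontier
        grown = set()
        for sub in frontier:
            if len(sub) >= max_size:
                continue
            # Atoms bonded to the subgraph but not yet part of it.
            boundary = set().union(*(adjacency[a] for a in sub)) - sub
            for atom in boundary:
                grown.add(sub | {atom})
        # Each round adds exactly one atom, so new subsets are never repeats.
        frontier = grown
    return found


# Acetic acid heavy atoms: 0 = CH3 carbon, 1 = carboxyl carbon, 2 and 3 = oxygens.
acetic_acid = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
elements = ["C", "C", "O", "O"]

frags = connected_subgraphs(acetic_acid, 3)
for frag in sorted(frags, key=lambda s: (len(s), sorted(s))):
    print(sorted(frag), "".join(elements[a] for a in sorted(frag)))
```

With a cap of three heavy atoms this yields 10 connected fragments, while a pair such as {0, 2} (the methyl carbon and an oxygen it is not bonded to) is never produced. Because the sketch ignores bond orders, the two oxygen-containing pairs look identical here even though C=O and C–OH differ chemically; a practical generator needs bond orders and a canonical form to tell such fragments apart.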

Data availability
All data used in this paper are available at https://github.com/binghuang2018/aqml-data and https://doi.org/10.5281/zenodo.3911072. All pertinent details are specified in the README file.
Code availability
Mixed Python/Fortran code (MIT licence, no restrictions) for generating amons, the aSLATM/SLATM representations and AML models, along with detailed instructions on how to reproduce our results, is available at https://github.com/binghuang2018/aqml and https://doi.org/10.5281/zenodo.3742792.
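The released code trains AML models on atomic aSLATM/SLATM representations. As a hedged, self-contained illustration that neither uses nor reproduces that code, the sketch below shows the general idea of a local-kernel ridge regression: a molecule's property is modelled through a sum of kernel contributions over atom pairs, so a model trained only on small fragments can be evaluated on a larger query. The one-hot element features (a stand-in for aSLATM), the Gaussian width `sigma`, the regularizer `lam`, the helper names and the per-atom "energies" are all hypothetical toy choices.

```python
import numpy as np

# Toy per-atom features: one-hot element type. This stand-in for an atomic
# representation such as aSLATM makes the learned model purely additive.
TYPES = ["H", "C", "O"]


def one_hot_atoms(symbols):
    """(n_atoms, n_types) matrix with one one-hot row per atom."""
    return np.array([[1.0 if s == t else 0.0 for t in TYPES] for s in symbols])


def local_kernel(A, B, sigma):
    """Sum of Gaussian kernels over all atom pairs of molecules A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return float(np.exp(-d2 / (2.0 * sigma**2)).sum())


def fit_krr(train, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights."""
    K = np.array([[local_kernel(a, b, sigma) for b in train] for a in train])
    return np.linalg.solve(K + lam * np.eye(len(train)), y)


def predict_krr(query, train, alpha, sigma):
    """Prediction = sum over training molecules j of alpha_j * k(query, j)."""
    k = np.array([local_kernel(query, b, sigma) for b in train])
    return float(k @ alpha)


# Hypothetical per-atom contributions (arbitrary units) defining a toy
# additive "energy"; these are not values from the paper.
e_atom = {"H": -0.5, "C": -38.0, "O": -75.0}


def toy_energy(symbols):
    return sum(e_atom[s] for s in symbols)


# Small fragments used for training, and a larger query molecule.
fragments = [
    ["C"] + ["H"] * 4,          # CH4
    ["O"] + ["H"] * 2,          # H2O
    ["C", "O"] + ["H"] * 4,     # CH3OH
]
query = ["C", "C", "O"] + ["H"] * 6   # C2H5OH

sigma, lam = 0.1, 1e-8
train = [one_hot_atoms(m) for m in fragments]
y = np.array([toy_energy(m) for m in fragments])
alpha = fit_krr(train, y, sigma, lam)
pred = predict_krr(one_hot_atoms(query), train, alpha, sigma)
print(f"predicted {pred:.4f}, reference {toy_energy(query):.4f}")
```

With one-hot features and a small `sigma`, the kernel effectively counts same-element atom pairs, so the model recovers the additive toy energy of the larger query from the three fragments alone. Real quantum properties are not additive in element counts; that is why environment-aware atomic representations and a set of fragments that covers the query's local environments matter in practice.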
Acknowledgements
D. Bakowies is acknowledged for helpful discussions. O.A.v.L. acknowledges funding from the Swiss National Science Foundation (No. PP00P2_138932 and 407540_167186 NFP 75 Big Data). This research was partly supported by NCCR MARVEL, funded by the Swiss National Science Foundation. Calculations were performed at the sciCORE scientific computing core facility (http://scicore.unibas.ch/) at the University of Basel.
Author information
Contributions
B.H. and O.A.v.L. conceived the idea of amons; B.H. implemented the corresponding algorithms. B.H. and O.A.v.L. designed all model systems for testing purposes, and B.H. carried out all calculations. Both authors analysed the results and wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–14.
Supplementary Video 1
VR recording of the 1,000 most frequent amons of the QM9 dataset.
About this article
Cite this article
Huang, B., von Lilienfeld, O.A. Quantum machine learning using atom-in-molecule-based fragments selected on the fly. Nat. Chem. 12, 945–951 (2020). https://doi.org/10.1038/s41557-020-0527-z
DOI: https://doi.org/10.1038/s41557-020-0527-z
This article is cited by
- Graph neural networks for materials science and chemistry (Communications Materials, 2022)
- QMugs, quantum mechanical properties of drug-like molecules (Scientific Data, 2022)
- cell2mol: encoding chemistry to interpret crystallographic data (npj Computational Materials, 2022)
- Inverse design of 3d molecular structures with conditional generative neural networks (Nature Communications, 2022)
- Machine learning based energy-free structure predictions of molecules, transition states, and solids (Nature Communications, 2021)