Quantum machine learning using atom-in-molecule-based fragments selected on the fly

Abstract

First-principles-based exploration of chemical space deepens our understanding of chemistry and might help with the design of new molecules, materials or experiments. Owing to the computational cost of quantum chemistry methods and the immense number of theoretically possible stable compounds, comprehensive in silico screening remains prohibitive. To overcome this challenge, we combine atom-in-molecule-based fragments, dubbed ‘amons’ (A), with active learning in transferable quantum machine learning (ML) models. The efficiency, accuracy, scalability and transferability of the resulting AML models are demonstrated for important molecular quantum properties, such as energies, forces, atomic charges, NMR shifts and polarizabilities, and for systems including organic molecules, 2D materials, water clusters, Watson–Crick DNA base pairs and even ubiquitin. Conceptually, the AML approach extends Mendeleev’s table to account effectively for chemical environments, which allows the systematic reconstruction of many chemistries from local building blocks.
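The central idea of the abstract, that a larger query system can be predicted from small fragments sharing its local chemical environments, can be illustrated with a toy atom-additive (local) kernel ridge regression. This is a minimal sketch under stated assumptions, not the authors' implementation (that lives in the aqml repository). One-hot element vectors stand in for real local representations such as aSLATM; the Gaussian kernel width, the regularization `lam`, the fragment pool `POOL` and its energies are all made up; and the hypothetical helper `select_fragments` only loosely mimics choosing amons for a given query by keeping fragments whose atomic environments all occur in that query.

```python
import math

# Hypothetical stand-in for local atomic representations (real AML uses e.g. aSLATM).
FEATURES = {"H": (1.0, 0.0, 0.0), "C": (0.0, 1.0, 0.0), "N": (0.0, 0.0, 1.0)}

# Toy pool: made-up energies from E = -1.0*n_C - 0.5*n_H - 0.8*n_N (not paper data).
POOL = {
    "CH4": ({"C": 1, "H": 4}, -3.0),
    "C2H6": ({"C": 2, "H": 6}, -5.0),
    "C2H4": ({"C": 2, "H": 4}, -4.0),
    "NH3": ({"N": 1, "H": 3}, -2.3),
}


def environments(counts):
    """Expand {'C': 2, 'H': 6} into a list of per-atom feature vectors."""
    envs = []
    for element, n in counts.items():
        envs.extend([FEATURES[element]] * n)
    return envs


def atom_kernel(x, y, sigma=0.1):
    """Gaussian kernel between two atomic environments."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mol_kernel(envs_a, envs_b):
    """Local (atom-additive) kernel: sum over all atom pairs."""
    return sum(atom_kernel(a, b) for a in envs_a for b in envs_b)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def select_fragments(query, pool):
    """Hypothetical selection: keep fragments whose environments all occur in the query."""
    query_envs = set(environments(query))
    return [name for name, (counts, _) in pool.items()
            if set(environments(counts)) <= query_envs]


def train_and_predict(query, pool, lam=1e-6):
    """Fit kernel ridge regression on the selected fragments, then predict the query."""
    names = select_fragments(query, pool)
    envs = [environments(pool[n][0]) for n in names]
    y = [pool[n][1] for n in names]
    K = [[mol_kernel(a, b) + (lam if i == j else 0.0) for j, b in enumerate(envs)]
         for i, a in enumerate(envs)]
    alpha = solve(K, y)  # regression weights, one per training fragment
    q = environments(query)
    return names, sum(w * mol_kernel(q, e) for w, e in zip(alpha, envs))
```

With this pool, `train_and_predict({"C": 3, "H": 8}, POOL)` skips NH3 (its N environment does not occur in propane) and returns a value very close to -7.0, because the toy energies are exactly atom-additive. The local kernel is what lets small fragments transfer to a larger query; in realistic settings the environments are continuous and the fit is only approximate.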

Fig. 1: ‘Amons’. Compositional extension of the periodic table.
Fig. 2: The amons of organic chemistry.
Fig. 3: Scalability of AML demonstrated by systematic improvement of predicted atomization energies (E) for two dozen important biomolecules using increasingly larger amons.
Fig. 4: Applicability of AML demonstrated by low prediction errors as function of training set size for various quantum properties.
Fig. 5: Significantly improved learning through amons selection for training compared to random selection.

Data availability

All data used in this paper are available at https://github.com/binghuang2018/aqml-data and https://doi.org/10.5281/zenodo.3911072. All pertinent details are specified in the README file.

Code availability

Mixed Python/Fortran code (MIT licence, no restrictions) for generating amons, the aSLATM/SLATM representations and the AML models, together with detailed instructions on how to reproduce our results, is available at https://github.com/binghuang2018/aqml and https://doi.org/10.5281/zenodo.3742792.

Acknowledgements

D. Bakowies is acknowledged for helpful discussions. O.A.v.L. acknowledges funding from the Swiss National Science Foundation (Nos. PP00P2_138932 and 407540_167186, NFP 75 Big Data). This research was partly supported by NCCR MARVEL, funded by the Swiss National Science Foundation. Calculations were performed at the sciCORE scientific computing core facility (http://scicore.unibas.ch/) at the University of Basel.

Author information

Contributions

B.H. and O.A.v.L. conceived the idea of amons; B.H. implemented the corresponding algorithms. B.H. and O.A.v.L. designed all model systems for testing purposes, and B.H. carried out all calculations. Both authors analysed the results and wrote the paper.

Corresponding author

Correspondence to O. Anatole von Lilienfeld.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–14.

Supplementary Video 1

VR recording of the 1,000 most frequent amons of the QM9 dataset.

About this article

Cite this article

Huang, B., von Lilienfeld, O.A. Quantum machine learning using atom-in-molecule-based fragments selected on the fly. Nat. Chem. 12, 945–951 (2020). https://doi.org/10.1038/s41557-020-0527-z
