Machine learning for molecular and materials science


Here we summarize recent progress in machine learning for the chemical sciences. We outline machine-learning techniques that are suitable for addressing research questions in this domain, as well as future directions for the field. We envisage a future in which the design, synthesis, characterization and application of molecules and materials are accelerated by artificial intelligence.


Fig. 1: Evolution of the research workflow in computational chemistry.
Fig. 2: Errors that arise in machine-learning approaches.
Fig. 3: The generative adversarial network (GAN) approach to molecular discovery.




This work was supported by the EPSRC (grant numbers EP/M009580/1, EP/K016288/1 and EP/L016354/1), the Royal Society and the Leverhulme Trust. O.I. acknowledges support from DOD-ONR (N00014-16-1-2311) and an Eshelman Institute for Innovation award.

Reviewer information

Nature thanks F.-X. Coudert, M. Waller and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information




All authors contributed equally to the design, writing and editing of the manuscript.

Corresponding authors

Correspondence to Olexandr Isayev or Aron Walsh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Butler, K.T., Davies, D.W., Cartwright, H. et al. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
