Machine-learning-assisted materials discovery using failed experiments

Abstract

Inorganic–organic hybrid materials [1–3] such as organically templated metal oxides [1], metal–organic frameworks (MOFs) [2] and organohalide perovskites [4] have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table [5–9]. Nevertheless, the formation of these compounds is not fully understood, and the development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative [10]) provide an alternative to experimental trial and error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility [11], photovoltaic properties [12], gas adsorption capacity [13] or lithium-ion intercalation [14]) to identify promising target candidates for synthetic efforts [11,15]; determination of the structure–property relationship from large bodies of experimental data [16,17], enabled by integration with high-throughput synthesis and measurement tools [18]; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification [19,20] or gas adsorption properties [21]). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions (failed or unsuccessful hydrothermal syntheses) collected from our laboratory’s archived notebooks, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success. In hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies and successfully predicted conditions for the formation of new organically templated inorganic products with a success rate of 89 per cent. Inverting the machine-learning model reveals new hypotheses regarding the conditions for successful product formation.
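The abstract describes joining raw notebook records, including failed reactions, with computed reactant descriptors and training a classifier to predict reaction success. The following is a minimal sketch of that workflow, not the authors’ implementation. The descriptor names and values, the toy reactions and the feature scaling are all hypothetical. The classifier is a plain linear soft-margin SVM trained with Pegasos-style subgradient descent, which was chosen here for brevity. Figure 3 refers to an SVM, but the kernel and software used in the study are not reproduced.

```python
from dataclasses import dataclass

# Hypothetical descriptor table: placeholder names and values standing in for
# the cheminformatics descriptors computed for each organic reactant.
AMINE_DESCRIPTORS = {
    "amine_A": {"polar_surface_area": 52.0, "num_h_donors": 2.0},
    "amine_B": {"polar_surface_area": 26.0, "num_h_donors": 1.0},
}


@dataclass
class Reaction:
    """One notebook entry; success=False marks a failed ('dark') reaction."""
    amine: str
    temperature_c: float
    ph: float
    success: bool


def featurize(rxn: Reaction) -> list[float]:
    """Join raw notebook conditions with (roughly scaled) amine descriptors."""
    desc = AMINE_DESCRIPTORS[rxn.amine]
    return [
        rxn.temperature_c / 100.0,
        rxn.ph / 14.0,
        desc["polar_surface_area"] / 100.0,
        desc["num_h_donors"] / 4.0,
    ]


def train_linear_svm(X, y, lam=0.01, epochs=5000):
    """Linear soft-margin SVM via Pegasos-style subgradient descent.

    Labels are +1 (success) / -1 (failure). A constant 1.0 feature acts as
    the bias; for simplicity it is regularized like the other weights.
    """
    X = [row + [1.0] for row in X]
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss active: move toward yi * xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, features) -> bool:
    """True if the reaction is predicted to give a crystalline product."""
    return sum(wj * xj for wj, xj in zip(w, features + [1.0])) >= 0.0


# Hypothetical toy records; failures are kept because they carry signal.
reactions = [
    Reaction("amine_A", 90.0, 2.0, True),
    Reaction("amine_A", 90.0, 12.0, False),
    Reaction("amine_B", 90.0, 2.5, True),
    Reaction("amine_B", 90.0, 11.0, False),
]
X = [featurize(r) for r in reactions]
y = [1 if r.success else -1 for r in reactions]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

The failed reactions are what make this trainable at all. If only successes were recorded, every label would be +1 and there would be no decision boundary to learn.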

Figure 1: Schematic representation of the feedback mechanism in the dark reactions project.
Figure 2: Comparison of experimental outcomes relating to the formation of templated vanadium-selenite crystals, as a function of amine similarity.
Figure 3: SVM-derived decision tree.
Figure 4: Graphical representation of the three hypotheses generated from the model, and representative structures for each hypothesis.
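Figure 3 is captioned as an SVM-derived decision tree, and the abstract says that inverting the model yields hypotheses. One generic way to make this kind of inversion concrete is the surrogate-model approach: query the trained model on many candidate conditions, fit a small decision tree to the model’s predictions and read each root-to-leaf path as a rule. This is offered as an assumption, not as the authors’ exact procedure. In the sketch below, `black_box_predict`, the feature names and the query grid are hypothetical placeholders, and the tree uses Gini impurity.

```python
from itertools import product

# Hypothetical stand-in for a trained classifier's predict function; the rule
# inside is invented purely so the surrogate tree has something to recover.
def black_box_predict(x: dict) -> bool:
    return x["ph"] < 6.0 and x["amine_psa"] > 30.0

FEATURES = ["ph", "amine_psa", "temperature_c"]

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces Gini impurity, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in FEATURES:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0  # midpoint between adjacent observed values
            left = [l for r, l in zip(rows, labels) if r[f] <= thr]
            right = [l for r, l in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, thr), score
    return best

def fit_tree(rows, labels, depth):
    """Grow a depth-limited tree; leaves hold the majority label."""
    split = best_split(rows, labels) if depth > 0 else None
    if split is None:
        return sum(labels) * 2 >= len(labels)
    f, thr = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= thr]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            fit_tree([r for r, _ in left], [l for _, l in left], depth - 1),
            fit_tree([r for r, _ in right], [l for _, l in right], depth - 1))

def rules(tree, path=()):
    """Flatten the tree into human-readable IF ... THEN ... rules."""
    if isinstance(tree, bool):
        cond = " and ".join(path) or "always"
        return [f"IF {cond} THEN {'success' if tree else 'failure'}"]
    f, thr, left, right = tree
    return (rules(left, path + (f"{f} <= {thr:g}",)) +
            rules(right, path + (f"{f} > {thr:g}",)))

# Label a grid of hypothetical conditions with the *model's* predictions,
# not with laboratory outcomes, then fit the surrogate tree to those labels.
grid = [dict(ph=p, amine_psa=a, temperature_c=t)
        for p, a, t in product([2, 4, 8, 10], [20, 40, 60], [90, 110])]
labels = [black_box_predict(x) for x in grid]
tree = fit_tree(grid, labels, depth=2)
for rule in rules(tree):
    print(rule)
```

Keeping the tree shallow trades fidelity to the model for readability. Each printed path is a candidate hypothesis to test experimentally, not a proven mechanism.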

References

  1. Rao, C. N. R., Behera, J. N. & Dan, M. Organically-templated metal sulfates, selenites and selenates. Chem. Soc. Rev. 35, 375–387 (2006)

  2. Zhou, H.-C., Long, J. R. & Yaghi, O. M. Introduction to metal–organic frameworks. Chem. Rev. 112, 673–674 (2012)

  3. Férey, G. Microporous solids: from organically templated inorganic skeletons to hybrid frameworks...ecumenism in chemistry. Chem. Mater. 13, 3084–3098 (2001)

  4. Stranks, S. D. & Snaith, H. J. Metal-halide perovskites for photovoltaic and light-emitting devices. Nature Nanotechnol. 10, 391–402 (2015)

  5. Cheetham, A. K., Férey, G. & Loiseau, T. Open-framework inorganic materials. Angew. Chem. Int. Ed. 38, 3268–3292 (1999)

  6. Cundy, C. S. & Cox, P. A. The hydrothermal synthesis of zeolites: history and development from the earliest days to the present time. Chem. Rev. 103, 663–702 (2003)

  7. Haushalter, R. C. & Mundi, L. A. Reduced molybdenum phosphates: octahedral-tetrahedral framework solids with tunnels, cages, and micropores. Chem. Mater. 4, 31–48 (1992)

  8. Férey, G. Oxyfluorinated microporous compounds ULM-n: chemical parameters, structures and a proposed mechanism for their molecular tectonics. J. Fluor. Chem. 72, 187–193 (1995)

  9. Rao, C. N. R., Natarajan, S. & Neeraj, S. Exploration of a simple universal route to the myriad of open-framework metal phosphates. J. Am. Chem. Soc. 122, 2810–2817 (2000)

  10. Holdren, J. P. et al. Materials Genome Initiative Strategic Plan. Technical Report December 2014, https://www.whitehouse.gov/sites/default/files/microsites/ostp/NSTC/mgi_strategic_plan_-_dec_2014.pdf (National Science and Technology Council, 2014)

  11. Sokolov, A. N. et al. From computational discovery to experimental characterization of a high hole mobility organic crystal. Nature Commun. 2, 437 (2011)

  12. Hachmann, J. et al. Lead candidates for high-performance organic photovoltaics from high-throughput quantum chemistry – the Harvard Clean Energy Project. Energy Environ. Sci. 7, 698–704 (2014)

  13. Colón, Y. J. & Snurr, R. Q. High-throughput computational screening of metal–organic frameworks. Chem. Soc. Rev. 43, 5735–5749 (2014)

  14. Hautier, G., Fischer, C. C., Jain, A., Mueller, T. & Ceder, G. Finding nature’s missing ternary oxide compounds using machine learning and density functional theory. Chem. Mater. 22, 3762–3767 (2010)

  15. Martin, R. L., Lin, L.-C., Jariwala, K., Smit, B. & Haranczyk, M. Mail-order metal–organic frameworks (MOFs): designing isoreticular MOF-5 analogues comprising commercially available organic molecules. J. Phys. Chem. C 117, 12159–12167 (2013)

  16. Gaultois, M. W. et al. Data-driven review of thermoelectric materials: performance and resource considerations. Chem. Mater. 25, 2911–2920 (2013)

  17. Kalidindi, S. R. & De Graef, M. Materials data science: current status and future outlook. Annu. Rev. Mater. Res. 45, 171–193 (2015)

  18. Zhao, J.-C. High-throughput experimental tools for the materials genome initiative. Chin. Sci. Bull. 59, 1652–1661 (2014)

  19. Yang, S., Lach-hab, M., Vaisman, I. I. & Blaisten-Barojas, E. Identifying zeolite frameworks with a machine learning approach. J. Phys. Chem. C 113, 21721–21725 (2009)

  20. Li, Y. & Yu, J. New stories of zeolite structures: their descriptions, determinations, predictions, and evaluations. Chem. Rev. 114, 7268–7316 (2014)

  21. Fernandez, M., Boyd, P. G., Daff, T. D., Aghaji, M. Z. & Woo, T. K. Rapid and accurate machine learning recognition of high performing metal organic frameworks for CO2 capture. J. Phys. Chem. Lett. 5, 3056–3060 (2014)

  22. Groom, C. R. & Reilly, A. M. Sixth blind test of organic crystal-structure prediction methods. Acta Crystallogr. B70, 776–777 (2014)

  23. Thakur, T. S., Dubey, R. & Desiraju, G. R. Crystal structure and prediction. Annu. Rev. Phys. Chem. 66, 21–42 (2015)

  24. Beran, G. J. O. A new era for ab initio molecular crystal lattice energy prediction. Angew. Chem. Int. Ed. 54, 396–398 (2015)

  25. Wicker, J. G. P. & Cooper, R. I. Will it crystallise? Predicting crystallinity of molecular materials. CrystEngComm 17, 1927–1934 (2015)

  26. Allen, F. H. The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallogr. B58, 380–388 (2002)

  27. Olshansky, J. H. et al. Formation principles for vanadium selenites: the role of pH on product composition. Inorg. Chem. 53, 12027–12035 (2014)

  28. JChem 6.1.3, http://www.chemaxon.com (ChemAxon, 2013)

  29. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning 2nd edn, Ch. 9, 12, 13, 15 (Springer, 2009)

  30. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995)

  31. Üstün, B., Melssen, W. & Buydens, L. M. C. Facilitating the application of support vector regression by using a universal Pearson VII function based kernel. Chemom. Intell. Lab. Syst. 81, 29–40 (2006)

  32. Hall, M. et al. The WEKA data mining software: an update. ACM SIGKDD Explor. Newslett. 11, 10–18 (2009)

  33. Chang, C.-C. & Lin, C.-J. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011)

  34. Joachims, T. Text categorization with support vector machines: learning with many relevant features. In Proc. 10th European Conf. Machine Learning (eds Nédellec, C. & Rouveirol, C.) 137–142 (Springer, 1998)

  35. Leach, A. & Gillet, V. J. An Introduction to Chemoinformatics Ch. 5 (Springer, 2007)

  36. Riniker, S. & Landrum, G. A. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J. Cheminformat. 5, 26 (2013)

  37. Thangavelu, S. G., Butcher, R. J. & Cahill, C. L. Role of N-donor sterics on the coordination environment and dimensionality of uranyl thiophenedicarboxylate coordination polymers. Cryst. Growth Des. 15, 3481–3492 (2015)

  38. R Core Team. R: A Language and Environment for Statistical Computing http://www.R-project.org/ (R Foundation for Statistical Computing, 2015)

  39. Barakat, N. & Diederich, J. Eclectic rule-extraction from support vector machines. Int. J. Comput. Intell. 2, 59–62 (2005)

Acknowledgements

We thank Y. Huang, G. Martin-Noble and D. Reilley for data entry and J. H. Koffer for synthetic efforts. M.Z. acknowledges support for the purchase of a diffractometer from the National Science Foundation (DMR 1337296), the Ohio Board of Regents (grant CAP-491) and Youngstown State University. This work was supported by the National Science Foundation (DMR-1307801). A.J.N. and J.S. each acknowledge the Henry Dreyfus Teacher-Scholar Award program.

Author information

Contributions

S.A.F., J.S. and A.J.N. conceived the project and wrote the paper. A.J.N. supervised the data capture. C.F. developed the web-accessible database. A.J.N. and P.D.F.A. tested the data reliability. J.S. and P.R. developed the reactant descriptors. P.R., C.F. and S.A.F. developed the machine-learning models. J.S. performed diamine selection. P.D.F.A. performed the Cambridge Structural Database search. K.C.E., M.B.W. and A.M. performed the hydrothermal experimental reactions, supervised by A.J.N. M.Z. performed X-ray crystallography on the resulting products. P.D.F.A. performed the statistical analyses. P.D.F.A., A.J.N., J.S. and S.A.F. performed the decision-tree calculation and analysis. All authors discussed the results and commented on the manuscript.

Corresponding authors

Correspondence to Sorelle A. Friedler, Joshua Schrier or Alexander J. Norquist.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

This file contains Supplementary Text and Data, Supplementary Tables 1–10 and Supplementary Figures 1–5. Included are tables of descriptor definitions, model evaluation results, a learning curve, synthetic and crystallographic details, packing figures, amine structures and a full decision tree. (PDF 1975 kb)

Supplementary Data

This file contains the historical reaction data gathered from archived laboratory notebooks. These data were used to construct the SVM model described in the manuscript. (CSV 5260 kb)

Supplementary Data

This file contains information on the new experiments performed during the course of this study to test whether the model improved upon human strategies (“chemical intuition”). These reactions were not used to train the model. (CSV 369 kb)
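Figure 2 compares the outcomes of these new experiments as a function of amine similarity. This excerpt does not define the similarity measure. A common cheminformatics choice, assumed here, is the Tanimoto coefficient between binary structural fingerprints: the number of shared “on” bits divided by the number of bits set in either fingerprint. The fingerprints below are hypothetical sets of bit indices, not real amine fingerprints.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


# Hypothetical fingerprints (indices of set bits) for three amines.
fingerprints = {
    "amine_A": {1, 4, 7, 9},
    "amine_B": {1, 4, 7, 12},
    "amine_C": {2, 5, 13},
}


def most_similar(query: str, library: dict) -> tuple[str, float]:
    """Return the library amine (other than the query) with highest similarity."""
    scores = {name: tanimoto(library[query], fp)
              for name, fp in library.items() if name != query}
    best = max(scores, key=scores.get)
    return best, scores[best]


print(most_similar("amine_A", fingerprints))
```

A score like this could be used to bin each untested amine by its similarity to the nearest amine in the training data. That binning is one plausible way to build a comparison like Figure 2, though this excerpt does not specify it.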

Supplementary Data

This shell script lists the specific model names and parameters used to construct the models described in Supplementary Table 5. (TXT 1 kb)

Supplementary Data

This file contains the full crystallographic details for [C₆H₂₂N₄][VO(C₂O₄)(SeO₃)]₂·2H₂O. (CIF 19 kb)

About this article

Cite this article

Raccuglia, P., Elbert, K., Adler, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016). https://doi.org/10.1038/nature17439

  • DOI: https://doi.org/10.1038/nature17439
