Letter | Published: 05 May 2016

Machine-learning-assisted materials discovery using failed experiments

Nature volume 533, pages 73–76 (05 May 2016)

Abstract

Inorganic–organic hybrid materials [1,2,3] such as organically templated metal oxides [1], metal–organic frameworks (MOFs) [2] and organohalide perovskites [4] have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table [5,6,7,8,9]. Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative [10]) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility [11], photovoltaic properties [12], gas adsorption capacity [13] or lithium-ion intercalation [14]) to identify promising target candidates for synthetic efforts [11,15]; determination of the structure–property relationship from large bodies of experimental data [16,17], enabled by integration with high-throughput synthesis and measurement tools [18]; and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification [19,20] or gas adsorption properties [21]). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on ‘dark’ reactions—failed or unsuccessful hydrothermal syntheses—collected from archived laboratory notebooks from our laboratory, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success.
When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions for new organically templated inorganic product formation with a success rate of 89 per cent. Inverting the machine-learning model reveals new hypotheses regarding the conditions for successful product formation.
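One way to picture the inversion step is to fit a small, readable decision tree to the predictions of an opaque classifier and then read the tree as if-then rules. The block below is a minimal sketch of that idea under stated assumptions, not the authors' procedure: `black_box` is a hypothetical stand-in for a trained model, the descriptors `pH` and `temperature_C`, the grid of conditions and the thresholds are all invented for illustration, and the tree is a plain Gini-impurity tree written from scratch.

```python
from itertools import product

FEATURES = ("pH", "temperature_C")  # hypothetical descriptor names


def black_box(row):
    """Hypothetical stand-in for a trained classifier (1 = product predicted)."""
    ph, temp_c = row
    return int(ph <= 4 and temp_c >= 110)  # invented rule, not from the paper


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (feature, threshold) that lowers weighted Gini most, else None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def fit_tree(rows, labels, depth=2):
    """Fit a depth-limited tree; a leaf is the majority label (ties go to 1)."""
    split = best_split(rows, labels) if depth > 0 else None
    if split is None:
        return int(2 * sum(labels) >= len(labels))
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            fit_tree([r for r, _ in left], [y for _, y in left], depth - 1),
            fit_tree([r for r, _ in right], [y for _, y in right], depth - 1))


def predict_tree(tree, row):
    """Walk the tree from the root to a leaf label."""
    while isinstance(tree, tuple):
        f, t, lo, hi = tree
        tree = lo if row[f] <= t else hi
    return tree


def rules(tree, names, path=()):
    """Flatten the tree into (condition, label) pairs."""
    if not isinstance(tree, tuple):
        return [(" and ".join(path) or "always", tree)]
    f, t, lo, hi = tree
    return (rules(lo, names, path + (f"{names[f]} <= {t:g}",)) +
            rules(hi, names, path + (f"{names[f]} > {t:g}",)))


# Query the stand-in model on a grid of hypothetical conditions,
# then fit the surrogate tree to the model's own predictions.
grid = list(product(range(1, 9), (90, 110, 130)))
tree = fit_tree(grid, [black_box(r) for r in grid], depth=2)

for condition, label in rules(tree, FEATURES):
    print(condition, "->", "product" if label else "no product")
```

On this toy grid the surrogate reproduces the stand-in model exactly, so each printed rule can be read as a candidate hypothesis about when a product forms. With a real model the surrogate would only approximate it, and its agreement with the model should be checked before any rule is trusted.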

References

  1. Organically-templated metal sulfates selenites and selenates. Chem. Soc. Rev. 35, 375–387 (2006)

  2. Introduction to metal–organic frameworks. Chem. Rev. 112, 673–674 (2012)

  3. Microporous solids: from organically templated inorganic skeletons to hybrid frameworks...ecumenism in chemistry. Chem. Mater. 13, 3084–3098 (2001)

  4. Metal-halide perovskites for photovoltaic and light-emitting devices. Nature Nanotechnol. 10, 391–402 (2015)

  5. Open-framework inorganic materials. Angew. Chem. Int. Ed. 38, 3268–3292 (1999)

  6. The hydrothermal synthesis of zeolites: history and development from the earliest days to the present time. Chem. Rev. 103, 663–702 (2003)

  7. Reduced molybdenum phosphates: octahedral-tetrahedral framework solids with tunnels, cages, and micropores. Chem. Mater. 4, 31–48 (1992)

  8. Oxyfluorinated microporous compounds ULM-n: chemical parameters structures and a proposed mechanism for their molecular tectonics. J. Fluor. Chem. 72, 187–193 (1995)

  9. Exploration of a simple universal route to the myriad of open-framework metal phosphates. J. Am. Chem. Soc. 122, 2810–2817 (2000)

  10. Material Genome Initiative Strategic Plan. Technical Report December 2014 (National Science and Technology Council, 2014)

  11. From computational discovery to experimental characterization of a high hole mobility organic crystal. Nature Commun. 2, 437 (2011)

  12. Lead candidates for high-performance organic photovoltaics from high-throughput quantum chemistry – the Harvard Clean Energy Project. Energy Environ. Sci. 7, 698–704 (2014)

  13. High-throughput computational screening of metal–organic frameworks. Chem. Soc. Rev. 43, 5735–5749 (2014)

  14. Finding nature’s missing ternary oxide compounds using machine learning and density functional theory. Chem. Mater. 22, 3762–3767 (2010)

  15. Mail-order metal–organic frameworks (MOFs): designing isoreticular MOF-5 analogues comprising commercially available organic molecules. J. Phys. Chem. C 117, 12159–12167 (2013)

  16. Data-driven review of thermoelectric materials: performance and resource considerations. Chem. Mater. 25, 2911–2920 (2013)

  17. Materials data science: current status and future outlook. Annu. Rev. Mater. Res. 45, 171–193 (2015)

  18. High-throughput experimental tools for the materials genome initiative. Chin. Sci. Bull. 59, 1652–1661 (2014)

  19. Identifying zeolite frameworks with a machine learning approach. J. Phys. Chem. C 113, 21721–21725 (2009)

  20. New stories of zeolite structures: their descriptions, determinations, predictions, and evaluations. Chem. Rev. 114, 7268–7316 (2014)

  21. Rapid and accurate machine learning recognition of high performing metal organic frameworks for CO2 capture. J. Phys. Chem. Lett. 5, 3056–3060 (2014)

  22. Sixth blind test of organic crystal-structure prediction methods. Acta Crystallogr. B 70, 776–777 (2014)

  23. Crystal structure and prediction. Annu. Rev. Phys. Chem. 66, 21–42 (2015)

  24. A new era for ab initio molecular crystal lattice energy prediction. Angew. Chem. Int. Ed. 54, 396–398 (2015)

  25. Will it crystallise? Predicting crystallinity of molecular materials. CrystEngComm 17, 1927–1934 (2015)

  26. The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallogr. B 58, 380–388 (2002)

  27. Formation principles for vanadium selenites: the role of pH on product composition. Inorg. Chem. 53, 12027–12035 (2014)

  28. JChem 6.1.3 (ChemAxon, 2013)

  29. The Elements of Statistical Learning 2nd edn, Ch. 9, 12, 13, 15 (Springer, 2009)

  30. Support-vector networks. Mach. Learn. 20, 273–297 (1995)

  31. Facilitating the application of support vector regression by using a universal Pearson VII function based kernel. Chemom. Intell. Lab. Syst. 81, 29–40 (2006)

  32. The WEKA data mining software: an update. ACM SIGKDD Explor. Newslett. 11, 10–18 (2009)

  33. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011)

  34. Text categorization with support vector machines: Learning with many relevant features. In Proc. 10th European Conf. Machine Learning 137–142 (Springer, 1998)

  35. An Introduction to Chemoinformatics Ch. 5 (Springer, 2007)

  36. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J. Cheminformat. 5, 26 (2013)

  37. Role of N-donor sterics on the coordination environment and dimensionality of uranyl thiophenedicarboxylate coordination polymers. Cryst. Growth Des. 15, 3481–3492 (2015)

  38. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2015)

  39. Eclectic rule-extraction from support vector machines. Int. J. Comput. Intell. 2, 59–62 (2005)

Acknowledgements

We thank Y. Huang, G. Martin-Noble and D. Reilley for data entry and J. H. Koffer for synthetic efforts. M.Z. acknowledges support for the purchase of a diffractometer by the National Science Foundation (DMR 1337296), the Ohio Board of Regents grant CAP-491 and from Youngstown State University. This work was supported by the National Science Foundation (DMR-1307801). A.J.N. and J.S. each acknowledge the Henry Dreyfus Teacher-Scholar Award program.

Author information

Affiliations

  1. Haverford College, 370 Lancaster Avenue, Haverford, Pennsylvania 19041, USA

    • Paul Raccuglia
    • Katherine C. Elbert
    • Philip D. F. Adler
    • Casey Falk
    • Malia B. Wenny
    • Aurelio Mollo
    • Sorelle A. Friedler
    • Joshua Schrier
    • Alexander J. Norquist
  2. Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907-2084, USA

    • Matthias Zeller

Contributions

S.A.F., J.S. and A.J.N. conceived the project and wrote the paper. A.J.N. supervised the data capture. C.F. developed the web-accessible database. A.J.N. and P.D.F.A. tested the data reliability. J.S. and P.R. developed the reactant descriptors. P.R., C.F. and S.A.F. developed the machine-learning models. J.S. performed diamine selection. P.D.F.A. performed the Cambridge Structural Database search. K.C.E., M.B.W. and A.M. performed the hydrothermal experimental reactions, supervised by A.J.N. M.Z. performed X-ray crystallography on the resulting products. P.D.F.A. performed the statistical analyses. P.D.F.A., A.J.N., J.S. and S.A.F. performed the decision-tree calculation and analysis. All authors discussed the results and commented on the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Sorelle A. Friedler or Joshua Schrier or Alexander J. Norquist.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Text and Data, Supplementary Tables 1–10 and Supplementary Figures 1–5. Included are tables of descriptor definitions, model evaluation results, a learning curve, synthetic and crystallographic details, packing figures, amine structures and a full decision tree.

CSV files

  1. Supplementary Data

    This file contains information on the historical reactions, gathered from historical laboratory notebooks. This was the data used to construct the SVM model described in the manuscript.

  2. Supplementary Data

    This file contains information on the new experiments that were performed during the course of this study to test whether the model improved upon human strategies (“chemical intuition”). These reactions were not used to train the model.

Text files

  1. Supplementary Data

    This shell script file contains the specific model names and parameters used in the model construction described in Table S5 of the Supplementary Information.

Crystallographic information files

  1. Supplementary Data

    This file contains the full crystallographic details for [C6H22N4][VO(C2O4)(SeO3)]2·2H2O.

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/nature17439
