The discovery of new structural and functional materials is driven by phase identification, often using X-ray diffraction (XRD). Automation has accelerated the rate of XRD measurements, greatly outpacing XRD analysis techniques that remain manual, time-consuming, error-prone and impossible to scale. With the advent of autonomous robotic scientists or self-driving laboratories, contemporary techniques prohibit the integration of XRD. Here, we describe a computer program for the autonomous characterization of XRD data, driven by artificial intelligence (AI), for the discovery of new materials. Starting from structural databases, we train an ensemble model using a physically accurate synthetic dataset, which outputs probabilistic classifications—rather than absolutes—to overcome the overconfidence in traditional neural networks. This AI agent behaves as a companion to the researcher, improving accuracy and offering substantial time savings. It is demonstrated on a diverse set of organic and inorganic materials characterization challenges. This method is directly applicable to inverse design approaches and robotic discovery systems, and can be immediately considered for other forms of characterization such as spectroscopy and the pair distribution function.
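The idea of reporting probabilities from an ensemble, rather than a single hard label, can be illustrated with a minimal sketch. It assumes that each ensemble member emits raw scores (logits) per candidate phase and that member probabilities are combined by simple averaging. That aggregation rule, the function names, the phase labels and the numbers are all illustrative assumptions and are not taken from the authors' code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probabilities(member_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-phase probabilities over all ensemble members."""
    member_probs = [softmax(logits) for logits in member_logits]
    n_members = len(member_probs)
    n_phases = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / n_members for k in range(n_phases)]


# Hypothetical logits from three ensemble members for three candidate phases.
member_logits = [
    [4.0, 0.5, 0.1],
    [0.3, 3.5, 0.2],
    [3.8, 0.4, 0.3],
]
phases = ["cubic", "tetragonal", "orthorhombic"]  # illustrative labels only
probs = ensemble_probabilities(member_logits)
for name, p in zip(phases, probs):
    print(f"{name}: {p:.2f}")
```

In this toy case each member is individually very confident, but one of them disagrees with the other two, so the averaged probability for the leading phase is noticeably lower than any single member's top score. Disagreement within the ensemble therefore shows up as reduced confidence in the reported classification.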
The experimental datasets and code used for constructing the synthetic datasets are available as examples with the source code. Source data are provided with this paper.
To facilitate the impact of this tool, the approach is kept entirely open-source under the BSD 3-clause license and is being embedded into data acquisition frameworks at central facilities (https://blueskyproject.io). Ongoing development of this tool is located at https://github.com/maffettone/xca. A release at the time of publication and example code for the results contained here are available at https://github.com/bnl/pub-Maffettone_2020_0853. The Bayesian optimization code is available at https://github.com/maffettone/bayes_opt.
We acknowledge financial support from the Engineering and Physical Sciences Research Council (EPSRC) (grant no. EP/N004884/1; P.M.M., M.A.L. and A.I.C.), BNL Laboratory Directed Research and Development (LDRD) projects 20-032 ‘Accelerating materials discovery with total scattering via machine learning’ (P.M.M. and D.O.), the Leverhulme Trust via the Leverhulme Research Centre for Functional Materials Design (P.C. and A.I.C.) and the German Research Foundation (DFG) as part of the Collaborative Research Centre TRR87/3 ‘Pulsed high power plasmas for the synthesis of nanostructured functional layers’ (SFB-TR 87), project C2 (L.B., Y.L. and A.L.). This research utilized the PDF (28-ID-1) Beamline and resources of the National Synchrotron Light Source II, a US Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under contract no. DE-SC0012704. We thank ZGH (Zentrum für Grenzflächendominierte Höchstleistungswerkstoffe, Ruhr-Universität Bochum) and Diamond Light Source for access to beamlines I19 (MT15777) and I11 (EE17193) for XRD measurements.
The authors declare no competing interests.
Peer review information Nature Computational Science thanks Wenhao Sun and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Jie Pan was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
XRD pattern data for Fig. 1a,b.
Sample experimental XRD data from each dataset in Fig. 3.
Probability data for BaTiO3, confusion matrix for ADTA and output probabilities for ternary Ni–Co–Al phase diagrams.
Source data for benchmark plots.
Cite this article
Maffettone, P.M., Banko, L., Cui, P. et al. Crystallography companion agent for high-throughput materials discovery. Nat Comput Sci 1, 290–297 (2021). https://doi.org/10.1038/s43588-021-00059-2