Crystallography companion agent for high-throughput materials discovery

A preprint version of the article is available at arXiv.


The discovery of new structural and functional materials is driven by phase identification, often using X-ray diffraction (XRD). Automation has accelerated the rate of XRD measurements, greatly outpacing XRD analysis techniques that remain manual, time-consuming, error-prone and impossible to scale. With the advent of autonomous robotic scientists and self-driving laboratories, these contemporary analysis techniques prevent XRD from being integrated into such workflows. Here, we describe a computer program, driven by artificial intelligence (AI), for the autonomous characterization of XRD data in the discovery of new materials. Starting from structural databases, we train an ensemble model on a physically accurate synthetic dataset. The model outputs probabilistic classifications, rather than absolute ones, to overcome the overconfidence of traditional neural networks. This AI agent behaves as a companion to the researcher, improving accuracy and offering substantial time savings. We demonstrate it on a diverse set of organic and inorganic materials characterization challenges. The method is directly applicable to inverse design approaches and robotic discovery systems, and could readily be extended to other forms of characterization such as spectroscopy and the pair distribution function.
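The idea that an ensemble reports probabilities rather than a single overconfident label can be illustrated with a minimal sketch. It assumes each trained ensemble member is summarized by its raw output scores (logits) for every candidate phase; the function names and toy numbers are hypothetical, and averaging softmax outputs is a generic ensembling technique, not a description of the exact code behind this agent.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_probabilities(member_logits):
    """Average class probabilities over ensemble members.

    member_logits has shape (n_members, n_patterns, n_classes): the raw scores
    each (hypothetical) trained member assigns to each candidate phase.
    Returns an array of shape (n_patterns, n_classes) whose rows sum to 1.
    """
    return softmax(member_logits).mean(axis=0)


# Toy scores from three members for a single pattern and three candidate phases.
logits = np.array([
    [[8.0, 0.0, 0.0]],  # member 1: near-certain of phase 0
    [[0.0, 8.0, 0.0]],  # member 2: near-certain of phase 1
    [[8.0, 0.0, 0.0]],  # member 3: agrees with member 1
])
probs = ensemble_probabilities(logits)
print(np.round(probs, 3))
```

Each member on its own is almost certain, but because the members disagree, the averaged distribution splits probability between phases 0 and 1 instead of reporting near-certainty for either. That spread is the kind of signal a researcher can use to decide which patterns deserve a closer look.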


Fig. 1: Experimental XRD complexity and training an ensemble using synthetic data.
Fig. 2: Schematic of the crystallography companion agent (XCA).
Fig. 3: Schematics and example XRD data of three different inorganic and organic materials challenges.
Fig. 4: Autonomous XRD analysis results from XCA.
Fig. 5: Comparing different approaches to building a synthetic dataset and classifier.

Data availability

The experimental datasets and code used for constructing the synthetic datasets are available as examples with the source code. Source data are provided with this paper.
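The synthetic-dataset code referred to above is not reproduced here. As a hedged illustration only, the sketch below shows one common way to turn a list of reflections into a training pattern, with a uniform lattice strain that shifts peaks through Bragg's law, a chosen peak width and additive noise. The function `synthetic_pattern`, its parameters, the Gaussian peak shape and the example reflections are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def synthetic_pattern(d_spacings, intensities, two_theta, wavelength=1.5406,
                      strain=0.0, fwhm=0.1, noise=0.0, rng=None):
    """Render a toy powder XRD pattern from a list of reflections.

    d_spacings (in angstrom) and relative intensities stand in for reflections
    calculated from a database structure. `strain` scales every d-spacing
    uniformly, so peaks move according to Bragg's law, lambda = 2 d sin(theta).
    Each peak is a Gaussian with full width at half maximum `fwhm` (degrees 2theta).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d_spacings, dtype=float) * (1.0 + strain)
    ratio = wavelength / (2.0 * d)
    keep = ratio < 1.0  # drop reflections that cannot diffract at this wavelength
    centers = 2.0 * np.degrees(np.arcsin(ratio[keep]))
    amps = np.asarray(intensities, dtype=float)[keep]
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> standard deviation
    peaks = amps[:, None] * np.exp(
        -0.5 * ((two_theta[None, :] - centers[:, None]) / sigma) ** 2)
    pattern = peaks.sum(axis=0)
    if noise > 0:
        pattern = pattern + rng.normal(0.0, noise, size=pattern.shape)
    return pattern / max(pattern.max(), 1e-12)  # normalize to unit maximum


# Hypothetical reflections (d in angstrom, relative intensity) for one phase.
two_theta = np.linspace(10.0, 80.0, 3501)
d_values = [3.2, 2.1, 1.6]
rel_intensities = [1.0, 0.5, 0.3]

clean = synthetic_pattern(d_values, rel_intensities, two_theta)
strained = synthetic_pattern(d_values, rel_intensities, two_theta, strain=0.01,
                             fwhm=0.3, noise=0.01, rng=np.random.default_rng(0))
print(two_theta[clean.argmax()], two_theta[strained.argmax()])
```

Drawing many patterns per phase with randomized strain, width and noise is one way to expose a classifier to the experimental variability it will meet; a positive strain enlarges d and therefore moves peaks to lower 2θ.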

Code availability

To facilitate the impact of this tool, the approach is kept entirely open-source under the BSD 3-clause license and is being embedded into data acquisition frameworks at central facilities. Ongoing development of this tool is located at […]. A release at the time of publication and example code for the results contained here are available at […]. The Bayesian optimization code is available at […].


Acknowledgements

We acknowledge financial support from the Engineering and Physical Sciences Research Council (EPSRC) (grant no. EP/N004884/1; P.M.M., M.A.L. and A.I.C.), BNL Laboratory Directed Research and Development (LDRD) project 20-032 ‘Accelerating materials discovery with total scattering via machine learning’ (P.M.M. and D.O.), the Leverhulme Trust via the Leverhulme Research Centre for Functional Materials Design (P.C. and A.I.C.) and the German Research Foundation (DFG) as part of the Collaborative Research Centre TRR87/3 ‘Pulsed high power plasmas for the synthesis of nanostructured functional layers’ (SFB-TR 87), project C2 (L.B., Y.L. and A.L.). This research used the PDF (28-ID-1) beamline and resources of the National Synchrotron Light Source II, a US Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under contract no. DE-SC0012704. We thank ZGH (Zentrum für Grenzflächendominierte Höchstleistungswerkstoffe, Ruhr-Universität Bochum) and Diamond Light Source for access to beamlines I19 (MT15777) and I11 (EE17193) for XRD measurements.

Author information




Contributions

P.M.M., L.B. and Y.L. conceived the project. P.M.M. led the development of XCA and coordinated the research teams. L.B. contributed to development, prepared the alloy dataset and guided the inorganic dataset synthesis. P.C. and M.A.L. crystallized ADTA and measured its XRD data. Y.L. advised on the machine learning. D.O. measured the BaTiO3 data and advised on the relevant studies. A.L. supervised the development and the alloy studies. A.I.C. supervised the development and the organic materials studies. All authors interpreted the data and prepared the manuscript.

Corresponding authors

Correspondence to Phillip M. Maffettone or Andrew I. Cooper.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks Wenhao Sun and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Jie Pan was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–40, discussion and Tables 1–4.

Source data

Source Data Fig. 1

XRD pattern data for Fig. 1a,b.

Source Data Fig. 3

Sample experimental XRD data from each dataset in Fig. 3.

Source Data Fig. 4

Probability data for BaTiO3, confusion matrix for ADTA and output probabilities for the ternary NiCoAl phase diagrams.

Source Data Fig. 5

Source data for benchmark plots.


About this article


Cite this article

Maffettone, P.M., Banko, L., Cui, P. et al. Crystallography companion agent for high-throughput materials discovery. Nat Comput Sci 1, 290–297 (2021).
