Deep learning for visualization and novelty detection in large X-ray diffraction datasets

Banko, Lars; Maffettone, Phillip M.; Naujoks, Dennis; Olds, Daniel; Ludwig, Alfred

doi:10.1038/s41524-021-00575-9

Download PDF

Article
Open access
Published: 09 July 2021

Deep learning for visualization and novelty detection in large X-ray diffraction datasets

npj Computational Materials volume 7, Article number: 104 (2021) Cite this article

6169 Accesses
28 Citations
8 Altmetric
Metrics details

Subjects

Abstract

We apply variational autoencoders (VAE) to X-ray diffraction (XRD) data analysis on both simulated and experimental thin-film data. We show that crystal structure representations learned by a VAE reveal latent information, such as the structural similarity of textured diffraction patterns. While other artificial intelligence (AI) agents are effective at classifying XRD data into known phases, a similarly conditioned VAE is uniquely effective at knowing what it doesn’t know: it can rapidly identify data outside the distribution it was trained on, such as novel phases and mixtures. These capabilities demonstrate that a VAE is a valuable AI agent for aiding materials discovery and understanding XRD measurements both ‘on-the-fly’ and during post hoc analysis.

Automated classification of big X-ray diffraction data using deep learning models

Article Open access 04 December 2023

Jerardo E. Salgado, Samuel Lerman, … Niaz Abdolrahim

Versatile domain mapping of scanning electron nanobeam diffraction datasets utilising variational autoencoders

Article Open access 21 January 2023

A. Bridger, W. I. F. David, … K. T. Butler

Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks

Article Open access 17 May 2019

Felipe Oviedo, Zekun Ren, … Tonio Buonassisi

Introduction

Innovations in high-throughput and autonomous experimentation^1,2,3,4 are exceedingly increasing the acquisition rate of data, particularly in the case of XRD. Manual analysis of combinatorial datasets, i.e. for the identification of composition–structure-relationships, is a challenging cognitive task that requires the ability to recognize patterns under the awareness of several constraints⁵. Recent progress has been made in using AI for unsupervised XRD dataset decomposition^6,7, crystal structure classification^{8,9,10,11,12,13,14,15,16,17,18,19,20}, and integrating the latter with autonomous experimentation²¹. Classification models are particularly promising, having been developed for a broad scope (classifying crystal system, space group, point group)^10,11,12,15, and specific challenges integrating experimental information^13,17,18,22. Further refinement of classification results could be achieved by model interpretation⁹. In broader approaches, experimentally relevant domain knowledge can be integrated modularly, e.g. by adding chemical composition¹⁵. Broader models operate without specific domain knowledge, such as predicted phases of an investigated materials or experimental measurement parameters, and can be directly applied to a broad suite of classification challenges without fine tuning^17,18. In contrast, experiment specific models have a more relevant, yet more narrow, distribution or training data, and thus struggle when encountering data outside of this distribution. Nonetheless, it is the domain knowledge and prior information (e.g. simulated X-ray diffractograms of expected phases from crystallographic database entries that encompass the breadth of possible experimental non-idealities) that makes these AI agents so successful¹⁸.

These priors stem from expert knowledge and expectation, and as such, even accurate, feed-forward classifiers can come to false conclusions when operating on new or novel data. Here, we define novelty as XRD patterns that are unknown to the AI, e.g. structures that were absent in the training data or XRD patterns of phase mixtures which were not considered in the training data. In the context of materials discovery, material novelty comprises unreported materials of certain chemical composition and crystal structure.

Hence, the recognition of novelty in large amounts of high-throughput data is a key challenge for scientists, and subsequently the tools that they use. An AI agent that is sensitive to novelty (i.e. aware of ‘what it doesn’t know‘) could inform scientists about experiments that mandate further investigation due to failure modes or material novelty. In this way, scientists can invest essential resources into the most promising areas and focus their manual effort where human experience is valued most.

Here, we use a variational autoencoder (VAE) (Fig. 1a) trained on a synthetic dataset as a prior^18,23 to solve commonly occurring visualization and novelty detection challenges in XRD analysis. Notably, this same synthetic dataset can be used to train a state-of-the-art classification model in tandem^18,24. Firstly, we identify regions of learned similarity and potentially degenerate solutions in our prior. Next, we develop a dynamic visualization tool for experimental XRD patterns and the VAE latent space. While this offers a correlation with structural classification, we show that visualizing the latent space with respect to the reconstruction error of the VAE allows for novelty detection during an experiment.

**Fig. 1: Visualization of the VAE latent space representation.**

Results and discussion

Visualizing a synthetic dataset with a variational autoencoder

Autoencoders are a class of self-supervised artificial neural networks that compress input data into a lower dimensional feature-value space (latent space) and then decodes this latent representation back into its original space. These have been broadly applied for the purpose of dimensionality reduction, denoising²⁵, anomaly²⁶ and novelty detection²⁷ and specific materials science applications such as detecting phase transitions²⁸, and translating between different dimensional representations, i.e., 2-d images and 1-d spectra²⁹. VAEs are a special class of autoencoders that approximate a posterior over latent random variables^30,31, jointly optimizing the reconstruction of the input and the Kullback-Leibler (KL) divergence between the latent representation and a smooth (often normal) distribution. This later loss function acts as a regularization mechanism, resulting in an efficient distribution over the latent space, and can be expanded to include physical constraints^32,33. The resulting probabilistic latent space has made these approaches popular for generating new data based on the prior training data³⁴, and inverse design in materials science^35,36.

We first study the behavior of a VAE on XRD data with regards to latent space distribution using a synthetic dataset of 15000 one-dimensional XRD patterns covering three spacegroups (Fm\(\overline{3}\)m (ISCD: 108308), Im\(\overline{3}\)m (ISCD: 108347), P6₃/mmc (ISCD: 622438), 5000 XRD patterns each). The dataset includes different aberrations that are frequently encountered in thin-film XRD patterns such as texture and preferred orientation, peak broadening, and peak shifting¹⁸. Figure 1c shows the latent space distribution of the validation dataset, color-coded by the spacegroup labels. The latent space representation provides direct visual evidence of the clustering properties of the encoder model and distribution of main reflection axes in the XRD patterns (Fig. 1b). In the latent feature space, proximity is a first indicator of structure type. P6₃/mmc and Fm\(\overline{3}\)m show an overlapping region which stems from homometrics: similar patterns that arise from distinct structures with preferred orientation in the training dataset. The XRD patterns, marked by green, red and blue crosses in Fig. 1c, are shown in Fig. 1d and highlight the similarity of Im\(\overline{3}\)m and P6₃/mmc for texturing along (020) and (102), respectively. By examining the latent space distribution with respect to the location of maximal intensity in the XRD patterns, we can see that the latent space is organized according to the main reflection axes of the crystal structures (Fig. 1b). This demonstrates how the learned latent representation can directly point towards possible structural ambiguities in the prior, and creates an explainable AI tool for understanding what determines location in latent space. Classification boundaries are mapped over this latent space to validate that the model is learning a physically meaningful representation, and to provide a visualization of the latent distribution and structural similarity (Fig. 2).

**Fig. 2: Classification and novelty detection using the reconstruction error.**

By interfacing a VAE trained on synthetic data with an experimental datastream³⁷, this latent representation can visualize a measurement in real time. To demonstrate how this visualization behaves with new or unknown phases, a set of 1000 XRD patterns of a novel phase, unknown to the model, with P4₂/mnm spacegroup (ICSD: 601378) are generated using the same parameter variation as in the training dataset. In Fig. 2, the reconstruction error is mapped over the latent space for the test set of all phases (known and unknown/ novel). The decision boundaries of a KNN classifier, trained on the latent representation are outlined. While it could be naively assumed from the visualized location in latent space (Fig. 2b) that the P4₂/mnm phase was the Im\(\overline{3}\)m or P6₃/mmc phase, the reconstruction error increases by an order of magnitude, which means that the naive assumption is most likely wrong. This indicates a posterior distribution which does not capture the information contained in the input XRD pattern: this is new to the model. The model encodes a latent representation for the P4₂/mnm phase near the P6₃/mmc and Im\(\overline{3}\)m phases which suggests similarly important reflections (e.g.: P4₂/mnm(330) = 43.6°2Θ, Im\(\overline{3}\)m(011) = 43.13°2Θ or P4₂/mnm(411) = 47.03°2Θ, P6₃/mmc(101) = 47.36°2Θ); however, the reconstruction error provides evidence of an unknown structure.

The reconstruction loss between the input and the decoder can also elicit knowledge of phase mixtures. Phase mixtures could be considered during dataset synthesis by generating binary and ternary mixtures; however, the combinatorial explosion in multinary material systems would drive the dataset size to numbers that cannot be handled efficiently¹⁶. To test the VAE behavior with phase mixtures, we generate a dataset of binary combinations of the three structures (Fm\(\overline{3}\)m, Im\(\overline{3}\)m, P6₃/mmc) for 50 different binary compositions between 0 and 100%, and calculate the reconstruction error (Fig. 3a). An increase in average reconstruction error of approximately one order of magnitude is observed for mixtures of known phases and a pure unknown phase. Additionally, the distributions show a larger spread. When considering the reconstruction error as a function of phase fraction, the error is maximized at approximately 50% for all binary mixtures (Fig. 3b), indicating a maximum in reconstruction error for XRD patterns that are furthest apart from the training data. The large standard deviation results from cases where the mixture of certain XRD patterns with preferred orientation shows similarity to a pure phase. These properties show that unlike contemporary classification models, a VAE has an indication of when it encounters something unfamiliar. The reconstruction error can be used for decision making by alerting a scientist to review the classification results of a corollary classification model^8,15,16,18, guiding an acquisition function for curiosity or exploration driven experiments, or to select an appropriate de-mixing model for multi-phase classification^22,24,38.

**Fig. 3: VAE reconstruction error of phases and phase mixtures.**

Visualization and anomaly detection in an experimental setting

We further tested the VAE in an experimental setting where a multitude of candidate phases are possible. An exemplary experimental dataset acquired from a materials library of the thin film system Co-Ni-Cr-Re contains 225 XRD patterns. Nine quasi-ternary composition spreads are distributed on a rectangular grid over a 100 mm substrate (Supplementary Figure 1). We selected 21 possible structures from the ICSD database and generated a synthetic dataset following the procedures outlined by Maffettone et al.¹⁸. The dataset contains several phases that show similar crystal structures for different chemical compositions. The chemical composition is a natural constraint in XRD analysis that bounds phase stability. A sensible alternative condition, beyond the scope of this work, is the phase transition temperature. The chemical composition acts as a constraint using a conditional VAE (cVAE)³⁹. In this case, the cVAE encodes the concatenation of chemical composition and 1D XRD pattern (Supplementary Information). Latent variables and chemical composition are concatenated and passed to the decoder. The relation between chemical composition and crystal structure is coded in the cVAEs parameters after training on the selected 21 structures from database entries.

The VAE is now conditioned on the prior of both the XRD pattern and the chemical composition, which implicates that a mismatch between the conditional (chemical composition) and the XRD pattern will respond with an increased reconstruction error. We exemplify this behavior by recreating the above approach with synthetic data, but using a cVAE (Supplementary Figures 2 and 3). Following training on the synthetic data, the reconstruction error of the experimental data is evaluated and mapped over the physical coordinates (Fig. 4a). We inspect three XRD patterns with different reconstruction errors in Fig. 4b. We also demonstrate the veracity of the latent representation by training a Gaussian Naïve Bayes (GNB) classifier on an ensemble of cVAEs to predict space group from latent vector and chemical composition (Supplementary Figure 4). As evident from the spacegroup classification results (Supplementary Fig. 4) the majority of the XRD patterns have a high probability of Fm\(\overline{3}\)m spacegroup while with increasing Co and Re concentration P6₃/mmc and P4₂/mnm probabilities are enhanced.

**Fig. 4: XRD pattern reconstruction by VAE and reconstruction error as a measure of novelty.**

The sample with the composition Co₅₄Cr₅Ni₃₁Re₁₀ (Fig. 4, red) shows a highly textured Fm\(\overline{3}\)m – Ni structure that was part of the synthetic dataset. Hence, the reconstruction error is minimal and the VAE-reconstructed XRD pattern closely matches the experimental pattern. The classification probability of the GNB is 73% (Supplementary Figure 4). The sample Co₆₃Cr₁₈Ni₁₄Re₅ (Fig. 4, blue) shows a phase mixture of Fm\(\overline{3}\)m – Ni (GNB probability: 45%) and P6₃/mmc – Co (GNB probability: 40%). As phase mixtures are unknown by the model, the reconstruction error is amplified, which informed us that closer inspection of this XRD pattern is required. The sample Co₂₉Cr₂₉Ni₆Re₃₆ (Fig. 4, yellow) shows a large peak at 39°2Θ that matches to Fm\(\overline{3}\)m – Ni (GNB probability: 43%) and a broad side-peak that could be associated with P4₂/mnm (GNB probability: 39%) according to the GNB prediction.

In summary, we have demonstrated that variational autoencoders provide a powerful toolset to assist in the analysis of XRD datasets that is complementary to other analysis methods. The crystal structure representation provided by the VAE latent vectors highlights intrinsic features of the dataset. This representation can be used for on-the-fly analysis of a dataset distribution across different structures, in which latent space proximity is a structural estimator. The reconstruction error informs the experimenter or adaptive AI driving an autonomous experiment of suspicious data, and assists in detecting unknown phases or phase mixtures. These models can be improved in the future by incorporating domain specific physics into the latent space as well as the dataset^32,33. We envision the VAE working cooperatively together in a federation of specialized AI agents. Combining the power of modern classification agents^16,18,38, data reduction agents⁴⁰, and the wisdom of ignorance offered by the VAE with adaptive experimental agents^21,41 will significantly increase the veracity and efficacy of high-throughput diffraction measurements.

Methods

XRD pattern simulation

The computed X-ray diffraction datasets used in this study were synthesized using a custom python package based on CCTBX⁴². Three crystallography information files (CIF) from three different spacegroups (ICSD-ids: 108347, 108308, 622438) were chosen. 5000 XRD patterns were simulated for Cu Kα radiation in an angular range from 20–90°2Θ. For each structure the same parameters that are optimized in a Rietveld refinement are varied. The angular range is propagated on 2048 datapoints. The 21-structure dataset for the Co–Ni–Cr–Re system was generated using the same parameters. Binary phase mixtures are calculated according to the formula \(x\star{\mathrm{XRD}}_{\mathrm{A}} + \left( {1 - x} \right)\star{\mathrm{XRD}}_{\mathrm{B}}\) and subsequent normalization of the intensity to 0 and 1.

Variational autoencoder models

The variational autoencoders were written with Python 3.7, using Tensorflow 2.1 and the Keras module. The encoder network consists of an input layer (2048) followed by two dense layers for the encoder (sizes 256 and 128) and a latent space with 2 dimensions (µ and σ). A sampling layer z randomly samples datapoints from a normal distribution (µ = 0 and σ = 1). The output of z is connected to two dense layers of the decoder (sizes 128 and 256) that is connected to an output layer (2048). The loss function is the weighted sum of the binary cross entropy between input XRD and decoded output and the KL-divergence calculated from the latent space distribution and a 2D Gaussian function. The reconstruction loss is weighted by a factor of 2048 to obtain a clear separation of structural features in the latent space. The conditional VAE used on the experimental dataset comprises the concatenation of the 1D XRD and the normalized chemical compositions as inputs. The latent variable output and the chemical composition are concatenated and provide the input to the decoder.

Synthesis of composition spread materials library

The Co–Ni–Cr–Re thin film composition spread materials library was deposited in a high vacuum magnetron sputter system (DCA Finland). Deposition was conducted from four 100 mm diameter magnetron sputter cathodes with pure elemental targets in Ar at a pressure of 0.66 Pa and no intentional heating of the substrate. The Ar flow was 60 sccm. Composition gradients were created by using a computer-controlled substrate shutter. The quaternary composition spread type thin film materials library was fabricated by a wedge-type multilayer technique by successive deposition of 200 nanoscale wedge-type elemental layers. A similar approach is described in detail by Salomon et al.⁴³. The wedge-type layers were made using a moving shutter, which was set to shield the substrate and was then slowly retracted during deposition. In case of Co and Ni single wedges with length of 67.5 mm were deposited with a rotation of 90° in between. In case of the Cr and Re wedge triplets consisting of three wedges with a length of 22.5 mm were deposited again with a rotation of 90° in between. This the nominal thickness of the wedges varies from 0 to ~13 nm in case of Co, Cr and Ni and from 0 to ~3 nm in case of Re. After deposition, the materials library was annealed at 600 °C for 24 hours in vacuum at 2.6 × 10⁻⁴ Pa.

XRD measurements

XRD measurements were done in a Bruker D8 Discover with a 2D detector (Vantec 500) with Cu Kα radiation. Four 2D-frames were measured to cover a 2θ range from 20° to 100°. The frames were merged and integrated into 1D-intensity diffractograms. The background was subtracted by fitting a 4-degree polynomial function.

Data availability

The exact datasets used in this study are available from the authors on reasonable request, with the available code used to generate the training data below.

Code availability

The XCA package was used to generate training datasets and is available at github.com/maffettone/xca. This package is actively being extended to support VAE models.

Additional code used in this study is available from the corresponding author on reasonable request.

References

Ludwig, A. Discovery of new materials using combinatorial synthesis and high-throughput characterization of thin-film materials libraries combined with computational methods. npj Comput. Mater. 5, 1–7 (2019).
Article Google Scholar
Batra, R., Le, S. & Ramprasad, R. Emerging materials intelligence ecosystems propelled by machine learning. Nat. Rev. Mater., 1–24 (2020).
Talapatra, A. et al. Autonomous efficient experiment design for materials discovery with Bayesian model averaging. Phys. Rev. Mater. 2, 113803 (2018).
Article CAS Google Scholar
Stein, H. S. & Gregoire, J. M. Progress and prospects for accelerating materials science with automated and autonomous workflows. Chem. Sci. 10, 9640–9649 (2019).
Article CAS Google Scholar
Olds, D. et al. Combinatorial appraisal of transition states for in situ pair distribution function analysis. J. Appl. Crystallogr. 50, 1744–1753 (2017).
Stanev, V. et al. Unsupervised phase mapping of X-ray diffraction data by nonnegative matrix factorization integrated with custom clustering. npj Comput. Mater. 4, 1–10 (2018).
Article Google Scholar
Long, C. J., Bunker, D., Li, X., Karen, V. L. & Takeuchi, I. Rapid identification of structural phases in combinatorial thin-film libraries using x-ray diffraction and non-negative matrix factorization. Rev. Sci. Instrum. 80, 103902 (2009).
Article CAS Google Scholar
Park, W. B. et al. Classification of crystal structure using a convolutional neural network. IUCrJ 4, 486–494 (2017).
Article CAS Google Scholar
Vecsei, P. M., Choo, K., Chang, J. & Neupert, T. Neural network based classification of crystal symmetries from x-ray diffraction patterns. Phys. Rev. B 99, 245120 (2019).
Article CAS Google Scholar
Aguiar, J. A., Gong, M. L., Unocic, R. R., Tasdizen, T. & Miller, B. D. Decoding crystallography from high-resolution electron imaging and diffraction datasets with deep learning. Sci. Adv. 5, eaaw1949 (2019).
Article CAS Google Scholar
Suzuki, Y. et al. Symmetry prediction and knowledge discovery from X-ray diffraction patterns using an interpretable machine learning approach. Sci. Rep. 10, 1–11 (2020).
Article CAS Google Scholar
Tiong, L. C. O., Kim, J., Han, S. S. & Kim, D. Identification of crystal symmetry from noisy diffraction patterns by a shape analysis and deep learning. npj Comput. Mater. 6, 1–11 (2020).
Article Google Scholar
Wang, H. et al. Rapid identification of X-ray diffraction patterns based on very limited data by interpretable convolutional neural networks. J. Chem. Inf. Model. 60, 2004–2011 (2020).
Article CAS Google Scholar
Zaloga, A. N., Stanovov, V. V., Bezrukova, O. E., Dubinin, P. S. & Yakimov, I. S. Crystal symmetry classification from powder X-ray diffraction patterns using a convolutional neural network. Mater. Today Commun. 25, 101662 (2020).
Article CAS Google Scholar
Aguiar, J. A., Gong, M. L. & Tasdizen, T. Crystallographic prediction from diffraction and chemistry data for higher throughput classification using machine learning. Comput. Mater. Sci. 173, 109409 (2020).
Article CAS Google Scholar
Lee, J.-W., Park, W. B., Lee, J. H., Singh, S. P. & Sohn, K.-S. A deep-learning technique for phase identification in multiphase inorganic compounds using synthetic XRD powder patterns. Nat. Commun. 11, 86 (2020).
Article CAS Google Scholar
Oviedo, F. et al. Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks. npj Comput. Mater. 5, 1–9 (2019).
Article CAS Google Scholar
Maffettone, P. M. et al. Crystallography companion agent for high-throughput materials discovery. Nat. Comput. Sci. 1, 290–297 (2021).
Article Google Scholar
Suram, S. K. et al. Automated phase mapping with AgileFD and its application to light absorber discovery in the V–Mn–Nb oxide system. ACS Comb. Sci. 19, 37–46 (2017).
Article CAS Google Scholar
Dipendra Jha et al. Peak Area Detection Network for Directly Learning Phase Regions from Raw X-ray Diffraction Patterns. in 2019 International Joint Conference on Neural Networks (IJCNN), 1–8, https://doi.org/10.1109/IJCNN.2019.8852096 (2019)
Kusne, A. G. et al. On-the-fly closed-loop materials discovery via Bayesian active learning. Nat. Commun. 11, 5966 (2020).
Article CAS Google Scholar
Lee, J.-W. et al. A data-driven XRD analysis protocol for phase identification and phase-fraction prediction of multiphase inorganic compounds. Inorg. Chem. Front. 8, 2492–2504 (2021).
Article CAS Google Scholar
GitHub. maffettone/xca. Available at https://github.com/maffettone/xca (2021).
Szymanski, N. J., Bartel, C. J., Zeng, Y., Tu, Q. & Ceder, G. Probabilistic deep learning approach to automate the interpretation of multi-phase diffraction spectra. Chem. Mater. 33, 4204–4215 (2021).
Article CAS Google Scholar
Bengio, Y., Courville, A. & Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. pattern Anal. Mach. Intell. 35, 1798–1828 (2013).
Article Google Scholar
Beggel, L., Pfeiffer, M. & Bischl, B. Robust Anomaly Detection in Images using Adversarial Autoencoders. in Machine Learning and Knowledge Discovery in Databases. 11906, (eds. Brefeld U., Fromont E., Hotho A., Knobbe A., Maathuis M. & Robardet C.), 206–222 (Springer, 2019).
Amarbayasgalan, T., Jargalsaikhan, B. & Ryu, K. Unsupervised novelty detection using deep autoencoders with density based clustering. Appl. Sci. 8, 1468 (2018).
Article Google Scholar
Wetzel, S. J. Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders. Phys. Rev. E 96, 22140 (2017).
Article Google Scholar
Stein, H. S., Guevarra, D., Newhouse, P. F., Soedarmadji, E. & Gregoire, J. M. Machine learning of optical properties of materials‐predicting spectra from images and images from spectra. Chem. Sci. 10, 47–55 (2019).
Article CAS Google Scholar
Rezende, D. J., Mohamed, S. & Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. in Proceedings of the 31st International Conference on Machine Learning. 32, (eds. Eric P. Xing & Tony Jebara), 1278-1286 (PMLR, 2014).
Kingma, D. P. & Welling, M. Auto-encoding variational bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).
Kalinin, S. V., Dyck, O., Jesse, S. & Ziatdinov, M. Exploring order parameters and dynamic processes in disordered systems via variational autoencoders. Sci. Adv. 7, eabd5084 (2021).
Article CAS Google Scholar
Kalinin, S. V., Kelley, K., Vasudevan, R. K. & Ziatdinov, M. Toward decoding the relationship between domain structure and functionality in ferroelectrics via hidden latent variables. ACS Appl. Mater. Interfaces 13, 1693–1703 (2021).
Article CAS Google Scholar
Kingma, D. P. & Welling, M. An Introduction to Variational Autoencoders. Preprint at https://arxiv.org/abs/1906.02691 (2019).
Noh, J. et al. Inverse design of solid-state materials via a continuous representation. Matter 1, 1370–1384 (2019).
Article Google Scholar
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).
Article CAS Google Scholar
Bluesky Project. Available at https://blueskyproject.io/ (2021).
Di, C. et al. Deep reasoning networks for unsupervised pattern de-mixing with constraint reasoning. in Proceedings of the 37th International Conference on Machine Learning, 119, (eds. Hal Daumé III & Aarti Singh), 1500-1509 (PMLR, 2020).
Sohn, K., Lee, H. & Yan, X. Learning structured output representation using deep conditional generative models. Adv. neural Inf. Process. Syst. 28, 3483–3491 (2015).
Google Scholar
Long, C. J., Bunker, D., Li, X., Karen, V. L. & Takeuchi, I. Rapid identification of structural phases in combinatorial thin-film libraries using x-ray diffraction and non-negative matrix factorization. Rev. Sci. Instrum. 80, 103902 (2009).
Article CAS Google Scholar
Li, Z. et al. Robot-accelerated perovskite investigation and discovery. Chem. Mater. 32, 5650–5663 (2020).
Article CAS Google Scholar
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. The Computational Crystallography Toolbox: crystallographic algorithms in a reusable software framework. J. Appl. Crystallogr. 35, 126–136 (2002).
Article CAS Google Scholar
Salomon, S., Wöhrle, F., Hübner, P., Decker, P. & Ludwig, A. Influences of Si substitution on existence, structural and magnetic properties of the CoMnGe phase investigated in a Co–Mn–Ge–Si thin-film materials library. ACS Comb. Sci. 21, 675–684 (2019).
Article CAS Google Scholar

Download references

Acknowledgements

This study was funded by the German Research Foundation (DFG) as part of Collaborative Research Centers SFB-TR 87 and SFB-TR 103. This research used resources of the National Synchrotron Light Source II, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. DE-SC0012704 and the BNL Laboratory Directed Research and Development (LDRD) project 20-032 ‘Accelerating materials discovery with total scattering via machine learning’. The center for interface dominated high-performance materials (ZGH, Ruhr-Universität Bochum, Bochum, Germany) is acknowledged for X-ray diffraction experiments.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Materials Discovery and Interfaces, Institute for Materials, Ruhr University Bochum, Bochum, Germany
Lars Banko & Alfred Ludwig
National Synchrotron Light Source II, Brookhaven National Laboratory, Upton, NY, USA
Phillip M. Maffettone & Daniel Olds
ZGH, Center for Interface-Dominated High Performance Materials, Ruhr University Bochum, Bochum, Germany
Dennis Naujoks & Alfred Ludwig

Authors

Lars Banko
View author publications
You can also search for this author in PubMed Google Scholar
Phillip M. Maffettone
View author publications
You can also search for this author in PubMed Google Scholar
Dennis Naujoks
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Olds
View author publications
You can also search for this author in PubMed Google Scholar
Alfred Ludwig
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

L.B. and P.M. conceptualized the study and wrote the main parts of the manuscript. L.B. performed the data experiments. D.N. was responsible for the synthesis and characterization of the experimental Co–Ni–Cr–Re materials library. All authors contributed to the discussion of the results and participated in writing the manuscript.

Corresponding author

Correspondence to Alfred Ludwig.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Banko, L., Maffettone, P.M., Naujoks, D. et al. Deep learning for visualization and novelty detection in large X-ray diffraction datasets. npj Comput Mater 7, 104 (2021). https://doi.org/10.1038/s41524-021-00575-9

Download citation

Received: 29 January 2021
Accepted: 16 June 2021
Published: 09 July 2021
DOI: https://doi.org/10.1038/s41524-021-00575-9

This article is cited by

Enhanced accuracy through machine learning-based simultaneous evaluation: a case study of RBS analysis of multinary materials
- Goele Magchiels
- Niels Claessens
- André Vantomme
Scientific Reports (2024)
Review on novelty detection in the non-stationary environment
- Supriya Agrahari
- Sakshi Srivastava
- Anil Kumar Singh
Knowledge and Information Systems (2024)
Enhancing Reproducibility in Precipitate Analysis: A FAIR Approach with Automated Dark-Field Transmission Electron Microscope Image Processing
- Ghezal Ahmad Jan Zia
- Thomas Hanke
- Bernd Bayerlein
Integrating Materials and Manufacturing Innovation (2024)
Finding the semantic similarity in single-particle diffraction images using self-supervised contrastive projection learning
- Julian Zimmermann
- Fabien Beguet
- Daniela Rupp
npj Computational Materials (2023)