A convolutional neural network for defect classification in Bragg coherent X-ray diffraction

Coherent diffraction imaging enables the imaging of individual defects, such as dislocations or stacking faults, in materials. These defects and their surrounding elastic strain fields have a critical influence on the macroscopic properties and functionality of materials. However, their identification in Bragg coherent diffraction imaging remains a challenge and requires significant data mining. The ability to identify defects from the diffraction pattern alone would be a significant advantage when targeting specific defect types and would accelerate experiment design and execution. Here, we exploit a computational tool based on a three-dimensional (3D) parametric atomistic model and a convolutional neural network to predict dislocations in a crystal from its 3D coherent diffraction pattern. Simulated diffraction patterns from several thousand relaxed atomistic configurations of nanocrystals are used to train the neural network and to predict the presence or absence of dislocations as well as their type (screw or edge). Our study paves the way for defect recognition in 3D coherent diffraction patterns for material science.


Introduction
Defect detection and classification are important issues in material science, as defects strongly influence the properties of materials [1][2][3][4]. Although metallurgy has long recognized the importance of defects for macroscopic mechanical properties (e.g. the enhanced yield strength of steel), a detailed understanding of their wider influence in other fields of material science is still lacking. Nevertheless, the concept of strain engineering in a vast variety of functional materials is attracting a lot of attention, opening great opportunities for the design and optimisation of the mechanical, optical, electrical or catalytic properties of materials via deliberate defect manipulation [5][6][7]. Crystal defects of various natures and length scales are not always adverse but can instead activate specific functionalities, such as improving adsorption affinity or catalytic activity. For instance, twins and stacking faults can improve the catalytic efficiency of nanoparticles 8 and, more generally, the strain generated by defects can affect the catalytic activity 9. Similarly, the role of dislocations in battery performance has drawn the attention of scientists and could be a key point for further optimisation 6. This defect sensitivity might open new avenues for engineering the properties of nanostructures by introducing specific defects. In order to achieve this goal, it is important to detect and classify defects in nanomaterials to better understand their behavior (nucleation, propagation, annihilation, defect-defect interaction).
Unlike perfect crystals, which can be described as equilibrium structures, the physics and thermodynamics of defects are much harder to describe with the available theoretical tools. It is thus of the greatest relevance to provide imaging techniques capable of delivering tomographic reconstructions of the crystal structure in the close environment of defects. Few experimental techniques can achieve this goal. Among them, transmission electron microscopy (TEM) is routinely used to image dislocations in real space by selecting relevant diffraction vectors, according to established invisibility criteria 10. It has atomic resolution and can directly image individual crystal defects. However, the technique is hindered by several constraints related to sample preparation. These constraints are relaxed for X-rays, which have great potential for studying defects in crystals. With the advent of new generation synchrotron sources with higher coherent flux, a very attractive technique to probe the microstructure of defects has emerged: coherent X-ray diffraction (CXD) 11,12. In Bragg geometry, it probes the local deviation from the perfect crystal lattice and is therefore highly sensitive to elastic strain 13 and crystal defects such as stacking faults 14 or dislocation loops 15. In the past two decades, the technique has been turned into an imaging technique (Bragg Coherent Diffraction Imaging, BCDI), combining measurements of three-dimensional (3D) coherent X-ray diffraction patterns (CXDPs) with phase retrieval algorithms 16,17 to obtain a spatial reconstruction of isolated nanoscale objects 18. The technique has been used successfully to image the strain field in defective nanocrystals 6,19, including for relatively complex defect configurations 20,21, but tends to fail for highly strained systems. In addition, phase retrieval algorithms are relatively slow, while a live evaluation of the data is often required during in situ and operando experiments.
This is particularly true in the case of Bragg ptychography, which requires a considerable amount of data. There is therefore an interest in understanding CXDPs qualitatively and interpreting them directly in reciprocal space. Depending on their type and on the measured Bragg reflection, single crystal defects have indeed a unique signature on CXDPs which enables their identification directly from the reciprocal space data 22 . For instance, a screw dislocation will lead to a ring-shaped Bragg diffraction signal, if the Burgers vector b of the dislocation is parallel to the scattering vector at the measured Bragg position, g.
For identifying defects, pattern classification 23,24 and neural networks (NN) for fault detection 25 have been previously used, for example in diffraction phase microscopy. Deep learning has also been used successfully for optical surface-defect detection [26][27][28] and for defect segmentation in scanning transmission electron microscopy 29. These methods are therefore relevant to detect and classify defects in CXDPs, which are very sensitive to the defect type. The need for extensive training sets and prior data with different types of defects is one of the main difficulties to overcome with these computational methods. These requirements could potentially limit their performance and practical feasibility. However, with the exponential advancements in computational resources 30 and the possibility of ultra-fast atomistic relaxation and computation of diffraction patterns with massive parallelism or graphical processing units (GPUs), it is now straightforward to calculate the 3D CXDPs of single nanocrystals from their atomistic configurations. These configurations can be generated by varying the type and location of the crystal defects and then relaxed by energy minimization. The relaxation of the faulted crystal structure allows the crystal defect to be modelled accurately and has been shown to have a large impact on CXDPs 22, leading to a better agreement between the simulated 3D CXDPs and experimental measurements.
While models have been widely applied to generate 2D images, the generation of 3D structures is a nascent field. For example, a deep learning NN model has recently been developed successfully for the classification of crystal structures from 2D diffraction maps of more than 100,000 simulated crystal structures 31, but it has the drawback that the 2D diffraction fingerprint is not unique across space groups. Recently, several papers proposed to use deep learning models trained on simulated CXDPs to perform phase retrieval 32-36, which is commonly carried out using iterative algorithms. This demonstrates the emergence of deep learning in the field of CXD and BCDI.
In this work, we develop and train a 3D convolutional neural network (CNN), which aims to obtain a fast and precise defect classification in nanocrystals of common face-centered cubic (fcc) transition metals. The training data are generated from atomistic simulations that are representative of the physics of the material. Once trained, the network can predict dislocations on simulated and measured 3D CXDPs. The predictions are categorized in two (defect free and single dislocation) or three (defect free, single screw and edge dislocations) classes. This work paves the way for automated defect detection and its reliable recognition from 3D CXDPs.

Results and discussion
Building the datasets
In order to build the dataset required for training the neural network (NN), several material simulation tools were used. The data pipeline allows one to generate simulated CXDPs very close to the ones obtained from Bragg CXD experiments. Fig. 1 illustrates our approach for the creation of 3D CXDPs. The geometry considered in this study is derived from the Wulff construction, i.e., the equilibrium crystal shape of a free-standing crystallite obtained from the Gibbs thermodynamic principle, which minimizes the total surface free energy associated with the crystal-medium interface 37. In order to take into account the presence of a solid-solid interface, i.e. the presence of an underlying substrate as in the experimental nanoparticles, the so-called Winterbottom shape, which can be described as a truncated Wulff construction, is employed 38. An example of a simulated crystal is shown in Fig. 1b-d. Only fcc transition metals are considered in this study (Al, Au, Ag, Pt), for which the Wulff/Winterbottom geometries mostly consist of {1 1 1} and {1 0 0} facets. The Winterbottom constructions are generated using the atomistic simulation code MERLIN 39, by creating a cube of atoms and cutting it along the <1 1 1> and <1 1 0> crystallographic directions, the position of the cut planes being defined by the ratio of the surface energies γ111/γ100 and γ110/γ100 of the material/potential of interest. The lattice orientations corresponding to the axes of the simulation cell are x[1 0 0], y[0 1 0] and z[0 0 1] and are kept constant for all configurations. The interface plane is selected randomly among the eight possible {1 1 1} planes, and is cut at a random position corresponding to 65%-75% of the height of a free-standing Wulff particle.
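The construction described above can be illustrated with a minimal sketch. It is not the MERLIN implementation: it assumes a Wulff shape bounded only by {1 0 0} and {1 1 1} facets (the {1 1 0} cuts are omitted), it uses the Wulff theorem to place each facet at a distance from the centre proportional to its surface energy, and it interprets the 65%-75% truncation as the fraction of the particle height kept along the interface normal. The function names, the surface-energy ratio of 0.9 and the block size are hypothetical values chosen for illustration.

```python
import itertools
import numpy as np

def fcc_positions(n_cells):
    """fcc atom positions (in lattice-parameter units), centred on the origin."""
    basis = np.array([[0, 0, 0], [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5]])
    cells = np.array(list(itertools.product(range(n_cells), repeat=3)))
    pos = (cells[:, None, :] + basis[None, :, :]).reshape(-1, 3)
    return pos - pos.mean(axis=0)

def family_normals(hkl):
    """Unit normals of every plane in the {hkl} family (signed permutations)."""
    vectors = {tuple(s * p for s, p in zip(signs, perm))
               for perm in itertools.permutations(hkl)
               for signs in itertools.product((1, -1), repeat=3)}
    normals = np.array(sorted(vectors), dtype=float)
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def winterbottom(pos, h100, gamma_111_over_100, interface_normal, kept_fraction):
    """Cut a Wulff polyhedron out of a block, then truncate it at the interface."""
    # Wulff theorem: facet distance from the centre is proportional to gamma
    h111 = gamma_111_over_100 * h100
    keep = np.ones(len(pos), dtype=bool)
    for hkl, h in (((1, 0, 0), h100), ((1, 1, 1), h111)):
        for n in family_normals(hkl):
            keep &= (pos @ n) <= h
    # Assumed truncation rule: keep `kept_fraction` of the height 2*h111
    n_int = np.asarray(interface_normal, dtype=float)
    n_int /= np.linalg.norm(n_int)
    keep &= (pos @ n_int) >= h111 * (1.0 - 2.0 * kept_fraction)
    return pos[keep]

atoms = fcc_positions(8)  # small block for illustration only
crystal = winterbottom(atoms, h100=3.0, gamma_111_over_100=0.9,
                       interface_normal=(1, 1, 1), kept_fraction=0.7)
```

In a dataset-generation loop, `interface_normal` would be drawn at random among the eight {1 1 1} normals and `kept_fraction` drawn between 0.65 and 0.75.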
Two crystal sizes are considered in this study: the small crystals consist of 40×40×40 unit cells (Supplementary Figure 1), while the large crystals are made up of 80×80×80 unit cells (Supplementary Figure 2). This corresponds to a size of 15×15×(9-12) nm³ and 100,000-140,000 atoms for the small configurations, and 30×30×(19-25) nm³ and 800,000-950,000 atoms for the large configurations. The height and number of atoms in the crystal depend on the distance of the interface plane with respect to the centre of the particle, and on the lattice parameter of the element considered. For the purpose of this study, we focus on line defects, namely, edge and screw dislocations. A single dislocation and its corresponding displacement field (hypothesis of an isotropic and semi-infinite volume, see Ref. 22) is introduced following two strategies. In the first type of configurations, hereafter referred to as CD, the dislocation is systematically introduced close to the centre, within a range not exceeding 10% of the lateral size of the particle. In the second type of configurations, hereafter referred to as RPD, the position of the dislocation is completely random. The simulated dislocations have a Burgers vector of b = 1/2 [11 0], which is kept constant for all the configurations. This implies that the initial line directions are t = [11 0] and t = [1 12] for the screw and edge dislocations, respectively. Although the Burgers vector and line direction of the dislocations are not varied, the random selection of the interface plane ensures that a large variety of orientations of the dislocation line with respect to the normal of the interface plane is available in the dataset, as shown in Supplementary Figures 1 & 2.
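As an illustration of the kind of displacement field that is applied before relaxation, the sketch below uses the textbook solution for a straight screw dislocation in an infinite isotropic medium, u_z = b θ / 2π. This is a simplifying assumption: the semi-infinite-volume expressions of Ref. 22 are not reproduced here, and the function name is hypothetical.

```python
import math

def screw_displacement(x, y, b):
    """Displacement along the line of a screw dislocation located at (0, 0).

    Infinite isotropic medium: u_z = b * theta / (2*pi), theta = atan2(y, x).
    x and y are coordinates in the plane normal to the dislocation line.
    """
    return b * math.atan2(y, x) / (2.0 * math.pi)

b = 1.0  # Burgers vector magnitude, arbitrary units
# Crossing the cut plane (negative x axis) changes u_z by exactly b
jump = screw_displacement(-1.0, 1e-12, b) - screw_displacement(-1.0, -1e-12, b)
```

For a Burgers vector of type 1/2<110> in an fcc crystal of lattice parameter a, the magnitude to use is b = a/√2.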
Once the atomistic configurations are generated, the next step is to obtain accurate and realistic relaxed configurations that reproduce as faithfully as possible the displacement fields measured in the experimental particle. The crystals are relaxed at 0 K using a conjugate gradient algorithm. While the dislocations introduced close to the centre of the nanocrystals are stabilized by the image forces during relaxation, one notable challenge is the tendency of the dislocations introduced close to a free surface to escape the crystal during the energy minimization. In order to prevent this phenomenon, the energy tolerance is used as the main stopping criterion for the energy minimization. The latter is defined as the energy change between two successive iterations divided by the total energy of the system, and is set to a value of 10⁻⁶ for the RPD configurations. This value is sufficiently high to ensure that the dislocations dissociate into Shockley partials and remain in the crystal at the end of the relaxation, as shown in Fig. 1c and Supplementary Figures 1, 2 & 3. The small number of minimization steps also prevents large rotations of the dislocations during the relaxation. It was indeed observed that edge dislocations are prone to rotate (thus becoming mixed dislocations) during the energy minimization, especially when they are introduced in the vicinity of the free surfaces. Limiting the number of relaxation steps allows the dislocation to retain its edge or screw character during the relaxation, even if dislocations very close to the free surfaces tend to have a mixed character, as illustrated in Supplementary Figures 1, 2 & 3. Each dataset typically contains 1000 relaxed configurations, with one third of defect free nanocrystals, one third containing a relaxed screw dislocation and the last third containing a relaxed edge dislocation.
The time required for the energy minimization of a full dataset ranges between 10 and 25 minutes for the small crystal dataset and between 1.5 and 4 hours for the large crystal dataset.
The last step in the dataset creation is the calculation of the three-dimensional CXDPs that are used as input data for our CNN. This is done by summing the amplitudes scattered by each atom with its phase factor, following a kinematic approximation:

$$A(\mathbf{q}) = \sum_j f_j(\mathbf{q})\, e^{2\pi i\, \mathbf{q}\cdot\mathbf{r}_j}$$

where q is the scattering vector, and f_j(q) and r_j are respectively the atomic scattering factor and position of atom j. Note that the crystallographic convention is used in this manuscript, i.e. the 2π factor is not included in q, which implies that a given q value corresponds to a real space distance d of q = 1/d. The computation is performed with a GPU using the PyNX 45 scattering package, which considerably speeds up the calculation of the CXDPs. Given the large number of atoms (10⁵-10⁶ atoms) and the large number of CXDPs that are generated for each dataset (2000-15000), the calculations are performed on 64×64×64 reciprocal space points. The size of the 3D array is a trade-off between achieving a high-enough resolution in reciprocal space, which is required for an accurate comparison with the experimental CXDPs, and keeping the time required to generate the dataset reasonable. Using a POWER9 machine, each CXDP is calculated in 0.25 s for the small configurations and 2 s for the large configurations. A dataset containing 10000 CXDPs is therefore typically generated in about 40 minutes for the small nanocrystals and about 6 hours for the large ones.
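The kinematic sum can be sketched on the CPU with NumPy as follows. This is an illustration only and not the PyNX GPU implementation: it assumes a constant scattering factor and the phase sign convention exp(+2πi q·r), with q expressed in units of 1/d. The function name and the toy simple-cubic crystal are hypothetical.

```python
import numpy as np

def kinematic_amplitude(positions, q_points, f=1.0):
    """A(q) = sum_j f * exp(2*pi*i * q . r_j), with no 2*pi factor inside q."""
    phase = 2.0 * np.pi * (q_points @ positions.T)  # shape (n_q, n_atoms)
    return f * np.exp(1j * phase).sum(axis=1)

# Toy crystal: 4x4x4 simple-cubic lattice with a lattice parameter of 1
grid = np.arange(4)
positions = np.array(np.meshgrid(grid, grid, grid, indexing="ij")).reshape(3, -1).T
q_points = np.array([[1.00, 0.0, 0.0],   # Bragg position
                     [1.25, 0.0, 0.0]])  # first zero of the finite-size fringes
intensity = np.abs(kinematic_amplitude(positions, q_points)) ** 2
```

At the Bragg position all 64 atoms scatter in phase, so the intensity equals 64² = 4096, while the first minimum lies 1/N = 0.25 reciprocal lattice units away for N = 4 unit cells along x. The cost of the direct sum scales with the number of atoms times the number of reciprocal space points, which is why a GPU is used for crystals of 10⁵-10⁶ atoms.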
In order to introduce enough variation in the dataset and prevent overfitting of the model to the training set (Supplementary Figure 4), each CXDP is rotated randomly around the chosen Q vector; typically, we consider 10 random orientations for each relaxed configuration. The reciprocal space sampling (δq) is also varied, which is equivalent to zooming around the Bragg reflection of interest (Supplementary Figure 14). A low reciprocal space resolution (coarse sampling / large δq) can have detrimental effects on the accuracy of the network predictions (Supplementary Figure 14c). To prevent this loss in accuracy, we typically selected δq values for which the oversampling ratio is consistent with the one used for experimental data. Note that even for the largest δq values, the oversampling criteria as defined by Sayre 46 are still fulfilled, as is always the case for the experimental CXDPs. Since the simulated particles are significantly smaller than the experimental ones (typically by one order of magnitude), this also implies that a larger portion of the Brillouin zone is selected for the simulated particles. We will see in the following that this has little consequence for the accuracy of the network predictions. Before training the NN, the distribution of dislocation positions is typically estimated by comparing the maximum of the intensity scattered by the atomistic configurations in the dataset with the maximum of the intensity scattered by a defect free crystal with a similar number of atoms (Supplementary Figure 15).
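The link between crystal size, sampling step and oversampling can be made concrete with a short calculation. The sketch assumes that the oversampling ratio along one direction is defined as the number of sampling points per finite-size fringe, i.e. the fringe period 1/D divided by δq (with q = 1/d, no 2π factor). The numerical values are hypothetical and are not the δq values used in this work.

```python
def oversampling_ratio(crystal_size_nm, delta_q_inv_nm):
    """Sampling points per finite-size fringe along one direction.

    A crystal of size D produces fringes of period 1/D in reciprocal space
    (crystallographic convention), so the ratio is (1/D) / delta_q.
    """
    return 1.0 / (crystal_size_nm * delta_q_inv_nm)

# Hypothetical example: a 30 nm crystal sampled with delta_q = 0.01 nm^-1
ratio = oversampling_ratio(30.0, 0.01)
window = 64 * 0.01  # reciprocal-space extent covered by 64 points, in nm^-1
```

A ratio of at least 2 along each direction is the usual requirement for phase retrieval. For a fixed number of points, a smaller crystal sampled with the same oversampling ratio requires a larger δq and therefore covers a wider reciprocal-space window, consistent with the remark above on the portion of the Brillouin zone.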

Convolutional neural network
The NN model architecture is displayed in Fig. 2. It takes as input the 64×64×64 image of the CXDP intensity and encodes it through a series of convolution and fully connected layers. Dropout 47 is used in all layers with a dropout rate of 0.2, to avoid overfitting. Although this is a standard architecture, it already gives very accurate predictions on the simulated dataset. Increasing the size of the model, by adding extra layers or increasing the number of filters in the convolution layers, does not improve the model performance and even leads to overfitting of the training dataset in some cases.
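A network of this type can be sketched with Keras as shown below. The input size, the dropout rate of 0.2 and the three output classes follow the text; the number of layers, the number of filters, the kernel sizes and the pooling steps are placeholder values chosen for illustration and are not the architecture of Fig. 2.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(n_classes=3, dropout_rate=0.2):
    """Small 3D CNN classifier; layer counts and widths are illustrative only."""
    return tf.keras.Sequential([
        layers.Input(shape=(64, 64, 64, 1)),          # one-channel 3D intensity
        layers.Conv3D(8, kernel_size=3, activation="relu"),
        layers.MaxPooling3D(pool_size=2),
        layers.Dropout(dropout_rate),
        layers.Conv3D(16, kernel_size=3, activation="relu"),
        layers.MaxPooling3D(pool_size=2),
        layers.Dropout(dropout_rate),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(n_classes, activation="softmax"),  # class probabilities
    ])

model = build_model()
```

Calling `build_model(n_classes=2)` gives the corresponding sketch for a two-class (defect free / defective) classifier.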
Training is performed using Adam optimization 48 with a learning rate of 10⁻³ and a batch size of 64. A large number of 3D CXDPs are simulated, each labelled with the correct output (defect class) for the given input (3D CXDP intensity). The training minimises a categorical cross-entropy loss that quantifies the difference between the predicted and the correct class labels (defect free, screw and edge). Through this minimisation, the weights (i.e., parameters) of the neural network are optimised to reduce the classification error. The weights of each convolutional and fully connected layer are initialized randomly. Moreover, the instances of the training dataset are processed in a random order. Nonetheless, two independent trainings of the CNN on a given dataset always give very similar probability distributions, as illustrated in Supplementary Figure 16. The simulated data are split into training, validation and test sets. The model fit is performed with the training set and stopped when the validation set accuracy reaches a maximum. The final model prediction on the test set, containing 11556 CXDPs calculated from 1284 atomistic configurations, reaches a very high total accuracy score of 97.2%. In addition, the confusion matrix displayed in Supplementary Figure 7 shows that almost all defect free crystals are correctly predicted. Most of the errors (4.7%) come from edge dislocations predicted as screw. Furthermore, as illustrated in Supplementary Figures 11 & 12, a simpler two-class model (Supplementary Figure 10) predicting either a defect free or a defective crystal can reach an even higher accuracy. From an occlusion sensitivity test 49 on a simulated CXDP shown in Supplementary Figure 13, we demonstrate that the NN mainly uses the vicinity of the Bragg peak to make its prediction.
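For readers unfamiliar with the quantities quoted above, the sketch below shows how a confusion matrix and the total accuracy are obtained from true and predicted class indices. The label arrays are made-up toy data, not results from this study, and the class ordering is an arbitrary choice.

```python
import numpy as np

CLASSES = ("defect free", "screw", "edge")  # class indices 0, 1, 2

def confusion_matrix(true_labels, predicted_labels, n_classes=len(CLASSES)):
    """Rows index the true class, columns index the predicted class."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix

# Toy labels: one edge dislocation is misclassified as screw
true = np.array([0, 0, 1, 1, 2, 2, 2])
pred = np.array([0, 0, 1, 1, 2, 1, 2])
cm = confusion_matrix(true, pred)
accuracy = np.trace(cm) / cm.sum()  # fraction of correct predictions
```

The off-diagonal entry `cm[2, 1]` counts edge dislocations predicted as screw, which is the dominant error type reported for the three-class model.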

Validation on experimental data
The experimental datasets correspond to 3D reciprocal space maps obtained by measuring the Bragg CXDPs of Pt nanoparticles. Single particles were measured either at the SixS beamline of synchrotron SOLEIL (Orsay, France) or at the P10 beamline of synchrotron PETRA (Hamburg, Germany). The 3D Bragg CXDPs were collected at the asymmetrical 111 Pt Bragg reflection at the SixS beamline or at the symmetrical (specular) 111 Pt Bragg reflection at the P10 beamline. The experimental reciprocal space datasets have been orthonormalised using the xrayutilities package 50. Fig. 3 displays the CXDPs of the experimental datasets, as well as their reconstructed Bragg electron density using phase retrieval algorithms. Defect-free (Figs. 3a,c) as well as defective crystals (Figs. 3b,d) were measured. A closer look at Figs. 3b,d reveals the variety of dislocation configurations that is found in experimental nanocrystals. These dislocations were most likely nucleated during the growth of the nanoparticles, and did not escape during the annealing at 1100 °C, suggesting that they are strongly pinned in the nanocrystal. For the SixS data, the screw dislocation is close to the center of the nanocrystal (Burgers vector of b = 1/2 [110]). On the other hand, the dislocation in the P10 defective nanocrystal is closer to the free surfaces. In addition, the dislocation line is not perfectly straight and parallel to the Burgers vector (b = 1/2 [101]). It can thus be described as a mixed dislocation with a dominant screw character.
In order to reinforce the agreement between the simulated and experimental datasets, each diffraction measurement is preprocessed before computing the model prediction. The CXDP center of mass is placed at the center of the array, as is also the case for the simulated data. The CXDP is then normalized so that its maximum is equal to 1.
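These two preprocessing steps can be sketched with NumPy as follows. The sketch makes two simplifying assumptions that are not stated in the text: the centre of mass is moved to the nearest voxel rather than interpolated to sub-voxel precision, and the shift is circular (voxels leaving one side of the array re-enter on the other side). The function name is hypothetical.

```python
import numpy as np

def preprocess(cxdp):
    """Centre the intensity centre of mass and scale the maximum to 1."""
    cxdp = np.asarray(cxdp, dtype=float)
    total = cxdp.sum()
    # Centre of mass along each axis, in voxel units
    com = [(g * cxdp).sum() / total for g in np.indices(cxdp.shape)]
    # Integer shift that brings the centre of mass to the array centre
    shift = [int(round(float(s // 2 - c))) for s, c in zip(cxdp.shape, com)]
    centred = np.roll(cxdp, shift, axis=(0, 1, 2))  # circular shift (assumption)
    return centred / centred.max()

# Toy example: a single bright voxel away from the centre of an 8x8x8 array
raw = np.zeros((8, 8, 8))
raw[1, 2, 3] = 5.0
clean = preprocess(raw)
peak = tuple(int(i) for i in np.unravel_index(clean.argmax(), clean.shape))
```

For a measured 64×64×64 array the same function applies unchanged, provided the Bragg peak is far enough from the array edges for the circular shift to be harmless.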
The results of our best NN model on the preprocessed CXDPs are displayed in Fig. 4 along with slices along Q x , Q y and Q z for each experimental CXDP. Some crystals were measured several times under different experimental conditions (temperature, gas environment; for example P10-no defect 1, 2 and 3 in Fig. 4), allowing us to compare the model predictions for the same crystal but with slightly different CXDPs.
The performance of this model on experimental data is excellent: all the experimental examples are predicted in the correct class, most of them with a very high probability (> 95%). Although still very good, the predictions for the P10 data (mixed dislocation) are generally slightly worse, with a predicted probability ranging between 82 and 94%. This is not surprising given the mixed type of the dislocation (with a dominant screw character), which necessarily increases the probability of identifying the defect as an edge dislocation. Nonetheless, even if the dislocation is located close to a free surface and therefore induces weak distortions in the CXDP (Fig. 3d), our model still manages to identify this crystal as defective with a probability of almost 100%. This demonstrates the robustness of the model trained on this dataset, which can predict both centered and off-centered dislocations with a very high accuracy.
The simulated training dataset used to fit the NN model has a large influence on the accuracy of the predictions on experimental data. This dataset must contain enough diversity, while sharing enough similarities with the experimental CXDPs. The predicted probabilities on experimental data for the same model architecture but different simulated training datasets are shown in Table 1. Models were trained on six different simulated datasets: (1) single element (Pt) unrelaxed small crystals, 100% centered dislocations (CD), (2) relaxed Pt small crystals (100% CD), (3) relaxed Pt large crystals (100% CD), (4) relaxed large crystals with multiple elements (Au and Pt) (100% CD), (5) relaxed multi-element large crystals with dislocations at random positions (100% RPD) and (6) relaxed multi-element large crystals with a mix of CD and RPD configurations (75% CD and 25% RPD). The first two rows of Table 1 emphasize the importance of accurately modelling the displacement field of the dislocations. Indeed, while the two models trained on the unrelaxed and relaxed datasets accurately predict the defect free configurations, they fail to identify the mixed dislocation (P10 data). However, the model trained on the relaxed dataset performs much better on the SixS-"screw" data, which is correctly identified as a screw dislocation (see also Supplementary Figure 9). On the other hand, the size of the relaxed crystals does not have a major impact on the accuracy of the model (Table 1, second and third row), although the predictions of the models trained on the large configurations are slightly more accurate, in particular for defect free configurations (Supplementary Tables 2 and 6).
The addition of several elements in the dataset improves the accuracy of the predictions for the SixS data, but has no effect on the P10 data (Table 1, fourth row). Nonetheless, mixing several elements in the dataset generally results in better model predictions compared to the models based on single elements, in particular for the large crystal size (Supplementary Tables 5-8).
The position of the dislocation also has a major impact on the model predictions. As seen from Table 1 (sixth row), introducing the dislocation at random positions, including positions close to the crystal free surfaces, results in more accurate predictions for the P10 data. However, this improvement is at the expense of the predictions for the SixS data, which is correctly identified as a dislocation, but with an edge character instead of a screw. The predictions for the defect free data are not affected and still excellent (see also Supplementary Figure 8).
In order to obtain accurate predictions simultaneously for both P10-"mixed" and SixS-"screw" dislocations, one must increase the diversity in the training dataset. This has been achieved by building a dataset consisting of a mix of CD and RPD configurations (Table 1, seventh row). Training the CNN on this mixed dataset significantly enhances the performance of the model, which then predicts all the experimental examples correctly and with a very high probability.
We must emphasize that, despite the differences in the ability of the models to generalize to experimental data, the accuracy on the simulated test data for each training dataset is always higher than 86% (Supplementary Tables 1 & 5). Our work illustrates the necessity of using simulated training datasets close to real structures: atomistically relaxed nanoparticles with an accurate modelling of the dislocation displacement field, multiple atomic elements and random locations of the dislocations. It also demonstrates that a convolutional neural network can predict dislocations in a crystal from its 3D coherent diffraction pattern. Combined with the fast scanning capabilities of some synchrotron beamlines 51, this approach could be used to perform a fast screening of the nanocrystals on a sample of interest. This would make it possible to determine the proportion of defect free nanocrystals as well as of nanocrystals containing a specific type of crystal defect, and to select the best candidate for a coherent diffraction imaging experiment. In addition, although the CNN was only tested on metallic fcc particles, we foresee that it could be extended to more complex systems, for instance multi-element particles.
Starting from 3D CXDPs, we used a convolutional neural network to predict defect classes. As a result, we obtain an automatic procedure for defect classification in fcc metals, which does not require any user manipulation or intensive live data mining, and achieves high-accuracy classification even in the presence of defects close to the free surfaces of the nanocrystals. This tool can be exploited during experiment execution to provide rapid feedback to the investigator, enables one to identify on the fly target defect types present in individual nanocrystals, and furthers the possibility of unsupervised data collection, which is extremely relevant given the increased data rates expected at ever improving facilities. Our study paves the way for defect recognition of three-dimensional structural data in big-data materials science.

Methods
Training the network
We used the Python deep-learning API Keras 52 running the TensorFlow backend 53 to build, develop and train our NN. The training was performed in parallel on two NVIDIA Tesla V100 GPUs and a POWER9 computer. We use a categorical cross-entropy loss function

$$L(y,\hat{y}) = -\frac{1}{B}\sum_{n=1}^{B}\sum_{c=1}^{N_c} y_{n,c}\,\log(\hat{y}_{n,c})$$

where B is the batch size, $N_c$ the number of classes, $y_{n,c} = 1$ for data element n if the true class is c and $y_{n,c} = 0$ otherwise, and $\hat{y}_{n,c}$ is the predicted probability for class c. The simulated dataset is divided into training, validation and test sets, corresponding respectively to 85%, 10% and 5% of the total dataset. The model is trained with a learning rate of 10⁻³ and a batch size of 64 on the training set until the model accuracy calculated on the validation set reaches a plateau (Supplementary Figures 5, 6 & 11). A typical training requires between 15 and 30 minutes depending on the dataset (8-10 seconds per epoch and 100-200 epochs). Decreasing the learning rate or increasing the batch size does not further improve the model accuracy. Once trained, the model performance is evaluated on the test set and reaches a total accuracy >86% on the simulated data for all models presented in Table 1.
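The loss function can be written out explicitly in NumPy, which makes its limiting values easy to check. This is an illustrative re-implementation and not the Keras code used for training; the clipping constant `eps` is an addition that guards against log(0) and is not part of the definition.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -(1/B) * sum_n sum_c y[n, c] * log(y_hat[n, c]).

    y_true: one-hot labels, shape (B, N_c).
    y_pred: predicted class probabilities, shape (B, N_c).
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0); not part of the definition
    return -np.sum(y_true * np.log(y_pred)) / y_true.shape[0]

y_true = np.eye(3)  # one example per class: defect free, screw, edge
loss_perfect = categorical_cross_entropy(y_true, np.eye(3))
loss_uniform = categorical_cross_entropy(y_true, np.full((3, 3), 1.0 / 3.0))
```

A perfect prediction gives a loss of zero, while a network that assigns equal probability to the three classes gives ln 3 ≈ 1.10, which is a useful reference value when inspecting training curves.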

Sample growth
Pt nanocrystals were prepared by the solid-state dewetting of a 30-nm thin Pt film for 24 hours at 1100 °C in air 54. The Pt film was deposited on α-Al2O3 (sapphire) with an electron beam evaporator. The Pt nanocrystals have their c-axis oriented along the [111] direction normal to the (0001) sapphire substrate. A standard photolithography method was employed to prepare a patterned layer of photoresist on sapphire prior to the electron beam evaporation of Pt. The lithographic processing route ensured that a number of dewetted Pt particles are well-separated from their neighbors and that only one crystallite is irradiated by the incoming x-ray beam. The particle size ranges from 100 nm to 700 nm.

Data availability
The data supporting the findings of this work are available from the corresponding author on reasonable request.

Table 1. Predicted probabilities on the experimental data from several models trained with different simulated training datasets. The predictions are shown in %. In each cell, the prediction probability for the 3 classes (perfect: p, screw: s, edge: e) is shown in green if the prediction is correct and in red if it is wrong. CD and RPD stand for centered dislocations and random position dislocations, respectively.

Simulated crystal dataset
Since the number of atoms in the simulated crystal configurations is limited by the computing capabilities of our computers, the simulated crystals are smaller (diameter < 30 nm) than the ones measured experimentally (diameter between 300 nm and 600 nm). To ensure that the size of the simulated crystals does not influence the results of the deep learning model, two crystal sizes were considered in this study, hereafter referred to as the large and small crystal datasets.

[Figure panel labels: Screw, Edge, Defect free]
We added dropout to all layers with a dropout rate of 0.2. This dropout was required to train the model with the unrelaxed dataset as "Pt unrelaxed small crystals CD" in Supplementary [...] perfect crystals. This again shows that, while our three-class model has a high prediction accuracy, it is even better at making the distinction between a defect free and a defective crystal.

Model results for different simulated training datasets
The performances of the CNN on experimental data strongly depend on the simulated training dataset used to fit the model. A good training dataset should contain configurations close to the experimental data but also enough diversity to be able to predict correctly a wide range of experimental cases that can differ significantly from each other. RPD simulated dataset with a single atomic element (Au) and a large crystal size. The predicted probability for each class is shown above each example where blue, red and green correspond respectively to defect free, screw and edge class. A green check mark/red cross in the title means that the prediction is correct/wrong. If models trained on 100% RPD datasets typically fail to predict correctly the edge or screw character, a training dataset with dislocations introduced only in the vicinity of the center of the nanocrystal (100% CD datasets) is not without flaws. This is illustrated in Supplementary Figure   9. which shows the experimental predictions of a 100% CD dataset with multiple atomic elements (Au, Pt) and a large crystal size. In contrast with the model trained on a 100% RPD dataset, the model trained on the 100% CD dataset gives excellent predictions on the defect free and SixSscrew data, but incorrect predictions on the P10-mixed data, which is systematically identified as a defect free crystal. The reason behind these incorrect predictions is illustrated in Fig. 3 of the manuscript. As shown in Fig. 3d, the P10-mixed dislocation is close to the free surfaces, and yields much weaker distortions in the CXDPs than the centered dislocations of the simulated training dataset. We can infer that the model never learnt to identify the rather weak signature of an off-centered dislocation since it was only exposed to CD configurations, which create large distortions in the CXDPs. 
The lack of diversity of dislocation configurations/positions in the simulated training dataset results in a model that, when applied to experimental data, can accurately predict only specific dislocation configurations, i.e. those close to the center of the nanocrystal. As a consequence, the P10-mixed dislocation is incorrectly identified as a defect free crystal.

Overview of the trainings performed on relaxed configurations
Supplementary Tables 1-8 give an overview of the training results for the two crystal sizes and reveal several interesting trends. First, the accuracy of the predictions on the simulated test datasets is systematically very high (>86%) (see Supplementary Tables 1 & 5 for the training on the small and large crystal size, respectively). Interestingly, it slightly decreases with an increasing fraction of RPD configurations in the training dataset. For instance, the accuracy drops from 96.4% for the 100% CD datasets to 86.8% for the 100% RPD datasets (small crystal size and Pt element in Supplementary Table 1). Training on the large crystal size also generally results in a slightly higher accuracy, but the difference is not significant (96.1% overall accuracy in Supplementary Table 5 vs 94.7% accuracy for the small crystal size in Supplementary Table 1).
Second, training on datasets with 100% of CD systematically results in 100% correct predictions for the experimental defect free and SixS-screw datasets (Supplementary Tables 2, 3, 6 & 7) but 0% correct predictions for the P10-mixed dataset (Supplementary Tables 4 & 8 Table 7, 0% of correct predictions). The success rate on the defect free configuration is also impacted, in particular for the small crystal size (Supplementary Table 2, 45% of correct predictions vs 100% for 100% CD datasets). The only exception is the Au small crystal size dataset, which retains a very high success rate for all the experimental datasets. In order to improve the accuracy on the P10-mixed predictions without impacting the success rate on the SixS-screw data, a better strategy consists in mixing CD and RPD datasets. For the large crystal configurations, a small fraction of RPD configurations is sufficient to dramatically improve the predictions on the P10-mixed data while retaining a very high success rate for the defect free and SixS-screw datasets (Supplementary Tables 6-8). This is especially true for the multi-element Au-Pt dataset, which performs very well when the ratio of RPD configurations in the dataset is between 20 and 35%. This strategy is slightly less efficient for the small crystal datasets, where the success rate is more element dependent (Supplementary Tables 2-4). Nonetheless, introducing a small fraction of RPD configurations in the training dataset generally improves the success rate on the P10-mixed dataset, while keeping a high success rate on the defect free and SixS-screw experimental datasets (in particular for Ag, multi-element and, to a lesser extent, Pt training datasets). It is also worth mentioning that this approach is not efficient on the Al dataset, which performs poorly for the P10-mixed dataset, independently of the fraction of RPD configurations in the training dataset.
Finally, while the overall success rate on the defective experimental datasets appears mostly unaffected by the crystal size, the introduction of RPD configurations in the training dataset has a much larger impact for the small crystal size than for the large crystal size. The overall success rate for the defect free configurations is therefore significantly higher for the large crystal size than for the small crystal size, and overall the distinction between defect free and defective crystals is more robust for the former (96.9% accuracy) than for the latter (75.6% accuracy; Supplementary Tables 1 & 5). Overall, adding diversity in the model (larger range of dislocation positions and/or atomic elements) is an efficient strategy to improve the robustness and accuracy of the CNN against a wide range of experimental configurations.

Supplementary Figure 12 shows the prediction of the 2 classes model on the experimental data. All predictions are correct, with slightly higher probabilities than in the 3 classes model, the lowest one being for the "P10-mixed dislocation" with an 86% probability of a defective crystal.
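The CD/RPD mixing strategy discussed in this section can be sketched as a small sampling routine. This is a minimal illustration, not the procedure used to build the actual datasets: the function name `mix_datasets`, its parameters, and the representation of a sample as a `(label, index)` tuple are all assumptions.

```python
import random


def mix_datasets(cd_samples, rpd_samples, rpd_fraction, total, seed=0):
    """Draw `total` samples, a fraction `rpd_fraction` of them from the RPD pool.

    Hypothetical helper: samples are drawn without replacement from each pool
    and shuffled so that CD and RPD configurations are interleaved.
    """
    if not 0.0 <= rpd_fraction <= 1.0:
        raise ValueError("rpd_fraction must lie in [0, 1]")
    n_rpd = round(total * rpd_fraction)
    n_cd = total - n_rpd
    if n_cd > len(cd_samples) or n_rpd > len(rpd_samples):
        raise ValueError("not enough samples in one of the pools")
    rng = random.Random(seed)
    mixed = rng.sample(cd_samples, n_cd) + rng.sample(rpd_samples, n_rpd)
    rng.shuffle(mixed)
    return mixed


# Placeholder pools standing in for simulated configurations.
cd_pool = [("CD", i) for i in range(1000)]
rpd_pool = [("RPD", i) for i in range(1000)]

# Example: a 75% CD / 25% RPD training set of 400 configurations.
training_set = mix_datasets(cd_pool, rpd_pool, rpd_fraction=0.25, total=400)
n_rpd_in_set = sum(1 for label, _ in training_set if label == "RPD")
```

Fixing the seed keeps the mixed dataset reproducible, so that differences between trainings can be attributed to the RPD fraction rather than to the particular draw.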

Occlusion sensitivity test
The occlusion sensitivity test is a good method to check which features/regions of the 3D CXDP are used by the neural network model to calculate the class probabilities 3 .

As a second test, we created several examples with different pixel sizes, i.e. resolution or reciprocal space sampling, using the same 3D interpolated experimental example with a screw dislocation. Our 3 classes model prediction is calculated for each resolution, as shown in Supplementary Figure 14c, where a 2D slice of the diffraction pattern is shown for the highest and lowest resolution. The model prediction is worse for very small and very large pixels but always correct between these 2 extremes. Again, we observe in the small pixel regime that our model fails to predict the edge or screw character of the dislocation (even though it still makes the correct screw prediction for most pixel sizes). The 2 classes model (Supplementary Figure 14d) is even less sensitive to the pixel size and always correctly predicts the defective character of the nanocrystal. In conclusion, the 3 classes model can generalize to very diverse experimental measurements, and the simple 2 classes model is even more robust to these variations.
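The principle of an occlusion sensitivity test can be sketched as follows: a patch of the 3D pattern is masked, the class probability is recomputed, and the drop in probability is recorded for each patch position. The code below is a generic sketch on nested lists, assuming non-overlapping cubic patches and a fill value of zero; `toy_predict` is a hypothetical stand-in for the CNN, and none of the names come from the original implementation.

```python
import copy


def occlusion_sensitivity(volume, predict, patch=2, fill=0.0):
    """Return {(z0, y0, x0): score drop} for each occluded cubic patch.

    `volume` is a 3D nested list and `predict` maps a volume to a score
    (for instance, the probability of one class).
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    baseline = predict(volume)
    drops = {}
    for z0 in range(0, nz, patch):
        for y0 in range(0, ny, patch):
            for x0 in range(0, nx, patch):
                occluded = copy.deepcopy(volume)
                for z in range(z0, min(z0 + patch, nz)):
                    for y in range(y0, min(y0 + patch, ny)):
                        for x in range(x0, min(x0 + patch, nx)):
                            occluded[z][y][x] = fill
                drops[(z0, y0, x0)] = baseline - predict(occluded)
    return drops


def toy_predict(volume):
    """Toy score: mean intensity of the 2x2x2 corner block at indices 2..3."""
    values = [volume[z][y][x] for z in (2, 3) for y in (2, 3) for x in (2, 3)]
    return sum(values) / len(values)


# Uniform 4x4x4 toy volume; only the corner block matters to `toy_predict`.
toy_volume = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
drops = occlusion_sensitivity(toy_volume, toy_predict, patch=2)
most_sensitive = max(drops, key=drops.get)
```

Patches whose occlusion causes the largest drop mark the regions the model relies on most; with a real CNN, `predict` would wrap a forward pass on the occluded pattern.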

Maximum intensity plots
We present in this section a quick method to estimate the distribution of dislocation positions in the simulated nanocrystals. It is based on the calculation of the maximum of the intensity scattered by a given atomistic configuration with respect to the maximum of the intensity scattered by a de-  Figure 15a), all dislocations are introduced close to the center, within a range not exceeding 10% of the lateral size of the nanocrystal. As a consequence, they yield a significant decrease of the maximum intensity for both screw and edge configurations: $\max(I_{\mathrm{screw}}) = 38.1 \pm 7.8$, $\max(I_{\mathrm{edge}}) = 43.2 \pm 9.2$. Note that the maximum of intensity also exhibits some variations for the defect free configurations. This is mostly due to the fact that the number of atoms is not constant (100000-140000 atoms and 800000-950000 atoms for the small and big nanocrystals, respectively). Indeed, the (1 1 1) interface plane is cut at a random height in a range corresponding to 65% to 75% of the height of a free-standing Wulff particle. In addition, changes in $\delta q$, which is also randomized, can result in slight variations of the maximum of intensity.
The larger range of dislocation positions allowed for the RPD datasets is reflected in the distribution of the maxima of intensity of the 100% RPD dataset. As shown in Supplementary Figure 15b, the average value of the maxima increases significantly for both screw and edge defects: and $\max(I_{\mathrm{edge}})$. Finally, the distribution of the maxima of intensity in the 25% CD / 75% RPD dataset is similar to the one in the 100% RPD dataset. The larger fraction of CD configurations in the dataset is reflected in the lower values of $\max(I_{\mathrm{screw}})$, $\max(I_{\mathrm{edge}})$ and of their standard deviations $\sigma(\max(I_{\mathrm{screw}}))$, $\sigma(\max(I_{\mathrm{edge}}))$. Overall, there is therefore a clear correlation between the fraction of RPD configurations in the dataset and the resulting distribution of the maxima of intensity. A higher fraction of RPD results in an increase of $\max(I_{\mathrm{dislo}})$ and $\sigma(\max(I_{\mathrm{dislo}}))$, while a lower fraction results in a decrease of these two quantities.
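The statistics discussed in this section can be computed with a few lines of code. The sketch below assumes that each maximum is expressed as a percentage of a defect free reference maximum and that the quoted spread is a sample standard deviation; both choices, as well as the function names and the numerical values, are assumptions made for illustration.

```python
from statistics import mean, stdev


def relative_max_intensity(max_intensities, reference_max):
    """Express each maximum as a percentage of a reference maximum.

    Assumption: the reference is the maximum intensity scattered by a
    defect free configuration.
    """
    return [100.0 * value / reference_max for value in max_intensities]


def summarize(values):
    """Return the mean and the sample standard deviation of the values."""
    return mean(values), stdev(values)


# Invented maxima (arbitrary units) for a handful of dislocated configurations.
screw_maxima = [30.0, 40.0, 50.0]
reference = 100.0

screw_relative = relative_max_intensity(screw_maxima, reference)
screw_mean, screw_std = summarize(screw_relative)
```

Under these assumptions, a dataset dominated by centered dislocations would show a low mean and a narrow spread, while a wider range of dislocation positions would raise both.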

CNN training reproducibility
When creating a CNN model with TensorFlow, the weights of each convolutional and fully connected layer are initialized randomly. Moreover, the instances in the training dataset are processed in a random order for each epoch, and the CNN training is stopped manually once the validation set accuracy reaches a maximum value. Therefore, the CNN output will be slightly different for each training, and one could ask whether a CNN always gives a similar probability distribution for different trainings. In order to evaluate the similarity of the probability distributions, we trained 2 models with the same architecture. Identical training and validation datasets were used for both models. The confusion matrices calculated with the same test dataset are shown in Supplementary