Introduction

Phase formation and evolution are key processes in solid-state chemistry, physics, electrochemistry, and materials science. Correspondingly, studies of the kinetics and thermodynamics of phase evolution play a special role in virtually all areas of modern science, ranging from battery and fuel cell operation to the formation and operation of electroresistive devices to the solid-state reactions involved in materials formation.1,2,3,4,5 Despite the cornerstone importance of this field, information on process parameters is typically derived from macroscopic measurements or from scattering studies of large volumes of material, while atomistic mechanisms remain largely unexplored.

In the last several years, advances in (scanning) transmission electron microscopy ((S)TEM) have enabled visualization of atomic dynamics during multiple solid-state processes,6,7 including vacancy ordering in cobaltites,8 phase transformation at crystalline oxide interfaces,9 restructuring in 2D silica glass,10 single-point-defect motion,11,12,13 single-atom catalytic activity,14,15,16 elastic–plastic transitions,17 and dislocation migration.18 These atomic dynamics have been observed not only in response to classical macroscopic stimuli such as temperature and pressure, but also in response to e-beam irradiation.19,20 Understanding these transformations requires establishing the nature of the phases and structures contained in a single image, ideally during the image acquisition process. Such studies are particularly important in the context of e-beam manipulation of matter atom by atom, as recently demonstrated.21,22,23,24,25,26

For any atomically resolved image, the first task is to identify the constituent lattice types present and determine their spatial distribution in the image. The review of existing approaches by Moeck and deStefano is a good reference on this topic.27,28 The available codes and software tools analyze individual images, (some) requiring the user to select the constituent unit cell, and then use motif-matching-style algorithms to determine the space group. As outlined in the review,27,28 several issues limit the application of these methods: most obviously, the difficulty of quantifying the classification uncertainty associated with noise in the atomic positions, but also the fact that many images contain a mixture of phases, necessitating an image segmentation that respects the available symmetries.

Previously, we have used a sliding-window-based method to segment images based on linear unmixing of the (local) 2D fast Fourier transform (FFT) spectra, via the N-FINDR unmixing algorithm29 with both positivity and sum-to-one constraints.30,31 Ideally, for atomically resolved images, one would like to impose the additional constraint that the endmembers conform to particular symmetry types. In 2D, each lattice must belong to one of the five Bravais lattice types: square, rectangular, centered rectangular, hexagonal, or oblique. Thus, the task effectively reduces to determining which of the five Bravais lattice types each portion of the image belongs to. This can be trivially defined mathematically in the standard coordinate frame, but the task is more complex when one considers that in any real atomically resolved image the lattices may be rotated in arbitrary fashion, and multiple diffraction orders can be present. Moreover, there will always be noise and slight distortions that make hand-crafting the particular rules (i.e., the features) difficult in practice. Indeed, we stress that this is not as trivial as it would seem: although all 2D Bravais lattices can be defined by just two vectors, approaches that attempt to determine these vectors are hampered by a combination of arbitrary lattice rotations, microscope drift and distortions, multiple diffraction orders (if working in reciprocal space), and the substantial noise present in many atomically resolved images. Even if the task can be done manually by selection of the unit cell,28 this is time-intensive and cannot be done for, e.g., videos of lattice transformations, which would require many such iterations. We therefore require a classification scheme that works robustly on noisy images of lattices that can be rotated arbitrarily and whose symmetry can vary spatially (e.g., due to a mixture of phases).

In the past, similar problems were faced by the computer vision community, where approaches such as the scale-invariant feature transform32 were designed to determine feature vectors suitable for classification. However, such methods have been quickly overtaken by deep convolutional neural networks (DCNNs), which display remarkable accuracy rivaling humans in image classification tasks.33,34 The key advantage of a DCNN is the ability of its convolutional layers to “learn” abstract features that are largely independent of position and scale, making them useful for prediction.35 Interestingly, whether a DCNN can be used to determine crystal structure is unknown, because in this case the features (diffraction spots) are identical, and the classification hinges on their relative positions and angles. This would likely be a difficult task for very deep networks, because information can be lost as the input is passed through many layers; however, this downside of DCNNs can be alleviated through a number of methods,36,37 e.g., using densely connected networks.38

Here, we show that, through appropriate training and network architecture, a DCNN can learn the five 2D Bravais lattice types, independent of scale and rotation, allowing automated prediction of the lattice type. We find that the network has an 85% success rate on the simulated validation set and that, in 75% of the cases where the prediction is incorrect, the second-most likely prediction is correct. We further validate the DCNN on real images obtained from both scanning tunneling microscopy and scanning transmission electron microscopy, and utilize the Monte-Carlo dropout39 technique to estimate the confidence of the predictions. We visualize the network with a reduced t-SNE representation, which, surprisingly, maps the symmetry classes in a physically intuitive form. We then apply our model to understanding the phase transformation in Mo-doped WS2 under electron beam irradiation, which allows us to determine the phase transformation kinetics, which in this case appear exponential in beam exposure time. Combined with other recent advances in deep learning for atomic-scale image analysis,40 we believe these studies lay the foundation for a 2D AI crystallographer, which will be integral to future automated analysis workflows for STEM and STM platforms.

Results

Training set

The key task in machine-learning-based approaches for physics-related studies is establishing an appropriate training set that captures the physics of the problem. For our DCNN, we generated a set of images of six different lattice types, with 4000 members in each set, totaling 24,000 images. Note that the sixth set was a “noise” class, i.e., what the classifier is expected to return if no periodicity is present (or no atoms are present). For each Bravais lattice type, we vary all possible lattice parameters (effectively allowing the DCNN to learn that the lattice types are scale-invariant), and then add arbitrary rotations, along with some randomization of atomic positions, to simulate (uncorrelated) disorder. Finally, the FFT of each image (amplitude only) is taken and stored. For the noise class, we generated images by beginning with an oblique lattice and perturbing the atomic positions until no clear FFT diffraction spots could be observed. In contrast to most work in the deep learning community, we worked exclusively in Fourier space, because this type of preprocessing is expected to maximize information on the periodicity and to suppress noise to the central reflection, simplifying the classification task. It also makes the approach more transferable, as the FFT is expected to be similar across different imaging modalities, negating the need for modality-specific simulations (such as multi-slice for STEM, or density functional theory for STM).
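As an illustration, a minimal sketch of how one such training image might be generated is shown below. The function and its parameters are illustrative assumptions, not the authors' generation code; an oblique cell is rendered as Gaussian "atoms", rotated, jittered, and reduced to its FFT amplitude:

```python
import numpy as np

def simulated_lattice_fft(a=1.0, b=1.6, gamma=np.deg2rad(75), rotation=0.3,
                          jitter=0.05, size=256, n_cells=12, spot_sigma=1.5):
    """Render one (here oblique) lattice as Gaussian 'atoms' and return the
    amplitude of its 2D FFT; rotation and positional jitter would be
    randomized during training-set generation."""
    rot = np.array([[np.cos(rotation), -np.sin(rotation)],
                    [np.sin(rotation),  np.cos(rotation)]])
    v1 = rot @ np.array([a, 0.0])
    v2 = rot @ np.array([b * np.cos(gamma), b * np.sin(gamma)])

    image = np.zeros((size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    scale = size / (n_cells * max(a, b))           # pixels per lattice unit
    center = size / 2
    for i in range(-n_cells, n_cells + 1):
        for j in range(-n_cells, n_cells + 1):
            pos = i * v1 + j * v2 + jitter * a * np.random.randn(2)
            px, py = center + scale * pos
            if 0 <= px < size and 0 <= py < size:
                image += np.exp(-((xx - px)**2 + (yy - py)**2)
                                / (2 * spot_sigma**2))

    # Keep only the (log-scaled, centered) FFT amplitude, as described above
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(image))))
```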

Network architecture

The architecture of the DCNN is shown in Fig. 1a and consists of three convolutional layers, followed by an average pooling layer and then a dense (fully connected) layer. The average pooling layer was added to increase the “connectedness” of the convolutional filters, matching the physics of the problem; this single layer reduces the spatial resolution of the feature maps by a factor of two before they are fed to the dense layer (note that we do not use global average pooling). Average pooling is used here, as opposed to max pooling, to reduce the loss of information on spatial positions as data propagate through the network. Rectified linear units were used as activation functions in these layers. The final layer is a six-unit dense layer with softmax activation, so that the classification outputs sum to one over the available choices; it is this layer that provides the estimate of which class the input image belongs to. To prevent overfitting, we employed the commonly used weight decay (l2 regularization), as well as dropout layers (shown in red), which randomly mask 20% of the inputs from the previous layer before propagating them to the next layer on each training batch. In addition to boosting the generalizability of the network, dropout also allows for uncertainty quantification, as shown by Gal and Ghahramani.39 Effectively, running a single image through the trained network thousands of times with the dropout layers active serves as a type of inference, allowing probability distributions over the resulting classifications to be computed. All work was conducted using keras41 with a TensorFlow backend. Optimization of the network was performed with the Adam optimizer using the cross-entropy loss (all hyperparameters kept at the default values of the keras implementation). We trained on 19,200 images and validated on 4800 for a total of 30 epochs (one epoch is one complete pass through the training dataset); the training and validation accuracy are shown in Fig. 1b. After ~15 epochs, little improvement in the validation accuracy occurs, and it plateaus at about 85%.
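A minimal keras sketch consistent with this description is given below. The text fixes the layer types, the 20% dropout rate, the single 2× average pooling, the six-unit softmax output, and the 128-length penultimate dense layer (see the t-SNE analysis below); the filter counts, kernel sizes, input size, and dropout placement are illustrative assumptions:

```python
from tensorflow.keras import layers, models, regularizers

def build_model(input_shape=(64, 64, 1), weight_decay=1e-4):
    """Sketch of the classifier: three conv layers, one 2x average pooling,
    a 128-unit dense layer, and a six-unit softmax output."""
    reg = regularizers.l2(weight_decay)            # "weight decay" (l2) on kernels
    model = models.Sequential([
        layers.Input(shape=input_shape),           # FFT amplitude image
        layers.Conv2D(16, 3, activation='relu', kernel_regularizer=reg),
        layers.Dropout(0.2),                       # masks 20% of inputs per batch
        layers.Conv2D(32, 3, activation='relu', kernel_regularizer=reg),
        layers.Dropout(0.2),
        layers.Conv2D(64, 3, activation='relu', kernel_regularizer=reg),
        layers.AveragePooling2D(pool_size=2),      # single 2x downsampling (not global)
        layers.Flatten(),
        layers.Dropout(0.2),
        layers.Dense(128, activation='relu', kernel_regularizer=reg),
        layers.Dense(6, activation='softmax'),     # 5 Bravais classes + "noise"
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# model = build_model()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=30)
```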

Fig. 1

Deep convolutional neural network for symmetry classification. a Schematic of the DCNN structure. A lattice image is transformed via a 2D fast Fourier transform (FFT) and input to the DCNN, which outputs the probability of classification into one of the six classes (five Bravais lattice types, and one for “noise”, i.e., no periodicity). b Training and validation accuracy as a function of epoch. One epoch is one complete pass through the training data. The dashed line is a guide to the eye at a value of 0.85

We first show results of the network on the validation set, i.e., data that the network did not see during training. The results in Fig. 2 show that the network misclassified 5 of the 18 images presented, highlighted in red (more examples are shown in Supplementary Figure 1). These validation images show that the DCNN is indeed learning that scale is not an important feature for the classification task, as can be seen from the FFT spectra of the centered rectangular lattices of different scale. Unlike a standard image classification task, this task is more difficult in that some classes can merge into others; for example, a rectangular lattice can easily morph into the square class for lattice parameter values that are (within noise) indistinguishable. Therefore, 100% accuracy is neither possible nor desirable. Instead, we computed the probability over the classifications via the Monte-Carlo dropout method,39 with 5000 iterations for each image; the probability is shown next to each prediction in the titles in Fig. 2. This allows us to determine a mean probability and a standard deviation over the predictions. Given the difficulty of the task, it is more important that the confidence estimates be appropriate and that the second-most likely physical solution be correct. Indeed, the confidence of the incorrect predictions is generally low (below 65% in all cases) and the misclassifications generally appear reasonable.
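A minimal sketch of this Monte-Carlo dropout inference, assuming a TF2/keras model (where calling the model with training=True keeps the dropout layers active), might look like:

```python
import numpy as np

def mc_dropout_predict(model, fft_image, n_passes=5000):
    """Run one image through the network many times with dropout active
    and summarize the per-class probabilities (mean and standard deviation)."""
    x = fft_image[None, ..., None].astype('float32')       # add batch/channel axes
    preds = np.stack([model(x, training=True).numpy()[0]   # training=True -> dropout on
                      for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)

# mean_p, std_p = mc_dropout_predict(model, fft_amplitude)
# predicted_class = mean_p.argmax()
```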

Fig. 2

Classification examples of the DCNN. Randomly selected (simulated) images from the validation set, and their classification by the DCNN with the predicted class (Pred:) shown above the true (Actual) class. The probability is shown in parentheses next to the prediction. Note the probabilities are computed via 5000 passes of the individual images through the network with dropout layers active. Misclassified images are boxed in red

DCNN performance

To investigate these errors more closely, we chose 1000 images from the training set and examined the statistics of the incorrect predictions. The prediction confidence histogram is plotted in Fig. 3a and shows a distribution centered around lower values, i.e., when the prediction is wrong, the confidence tends to be low. This is evidence that the network is appropriately uncertain (one would not wish a network to be confident in incorrect predictions). For closer inspection, we turn to an example of an incorrect prediction for a single lattice, shown in Fig. 3b. The prediction confidences for the six classes, along with their standard deviations, are plotted in Fig. 3c for this image. The probability is highest for the rectangular class, but the second-most likely prediction is square, which is the ground truth. This suggests a good degree of robustness for the classifier (more examples are shown in Supplementary Figure 2). Indeed, we find that the first or second prediction of the DCNN is correct in 94% of cases when evaluated on this set of images.
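Statistics of this kind can be computed from the mean MC-dropout probabilities along the following lines (a sketch; probs and y_true are hypothetical arrays of per-image class probabilities and ground-truth labels):

```python
import numpy as np

def prediction_statistics(probs, y_true):
    """Summarize top-1/top-2 accuracy and the confidence of wrong predictions.

    probs: (n_images, 6) mean MC-dropout class probabilities; y_true: int labels.
    """
    order = np.argsort(probs, axis=1)
    top1, top2 = order[:, -1], order[:, -2]        # most and second-most likely
    wrong = top1 != y_true
    return {
        "top1_accuracy": float((~wrong).mean()),
        "top1_or_top2": float(((top1 == y_true) | (top2 == y_true)).mean()),
        # Confidences of the incorrect predictions (cf. the histogram in Fig. 3a)
        "wrong_confidences": probs[wrong].max(axis=1),
    }
```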

Fig. 3

DCNN performance. a Prediction confidence for all incorrect predictions made from a set of 1200 (simulated) images. In total, 188 images were classified incorrectly, and the second-most probable class was correct on 118 of those occasions. b, c Single (simulated) test image b with the network output for this image in c. The bars in c correspond to one standard deviation, computed via 5000 passes of the image through the network with dropout active. d Raw STM image of graphene (scale bar, 1 nm). A 2D FFT was performed on the whole image and in smaller windows (shown in red). The prediction probabilities for the oblique and “noise” classes for the three windows are shown in e, with error bars marking one standard deviation in the prediction confidence. As the window size becomes smaller, the prediction probability drops

Of importance is the DCNN’s sensitivity to the number of unit cells present. As an example, we show an STM image of graphene in Fig. 3d. We took the 2D FFT of this image using three different window sizes: the first window was the size of the entire image, and the next two were the sizes of the two red squares shown in Fig. 3d; that is, the window size was decreased by a factor of 2 each time, and the DCNN prediction of the symmetry class was made. The results in Fig. 3e show that an oblique symmetry class is predicted for the largest FFT (shown inset in Fig. 3e), but the confidence in this prediction steadily decreases with decreasing window size, as the associated FFT spectra become much more convolved and substantial edge effects appear. It is interesting that the oblique class, rather than the hexagonal class, is predicted for this image: the lattice is ideally hexagonal, but substantial drift and disorder effects reduce the symmetry to oblique. Moreover, the decreasing confidence of the classifier with smaller window sizes is instructive of the degree to which features must be distinguishable for the DCNN to make an accurate symmetry judgment.
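A sketch of this windowing experiment is given below, under the simplifying assumption that the successively halved windows are centered in the image (the actual window positions are the red squares in Fig. 3d):

```python
import numpy as np

def centered_fft_windows(image, fractions=(1.0, 0.5, 0.25)):
    """FFT amplitude of centered crops of decreasing size (halved each time);
    each entry can then be passed through the DCNN (cf. Fig. 3e)."""
    out = []
    for frac in fractions:
        n = int(min(image.shape) * frac)
        r0 = (image.shape[0] - n) // 2
        c0 = (image.shape[1] - n) // 2
        patch = image[r0:r0 + n, c0:c0 + n]
        out.append(np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(patch)))))
    return out
```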

Visualizing the features

To gain more insight into how the DCNN works, recall that the purpose of a DCNN is to learn features of the image data that can then be separated into the available classes by a linear classifier. Thus, observing the activations of the last layer before the classification layer can provide some information about the nature of this classification space. Given that this is a 128-length vector, visualization requires projecting to a lower-dimensional space, and this dimensionality reduction should be done via a method that (approximately) maintains the distances between images in both the low- and high-dimensional representations. The t-distributed stochastic neighbor embedding (t-SNE)42 provides such a method, and we used it to calculate the 2D embedding for 1000 training images (perplexity was set to 10). These vectors then form the (x, y) coordinates for each image, as shown in Fig. 4, where the images are colored by their true classes. Essentially, images that are nearby in this representation are also nearby in the DCNN and can be thought of as “similar” as far as the network is concerned. It should be noted that the distances from one class to the next in this representation carry no meaning.42 Note that the noise class is substantially further away than drawn in the figure (this distance of the noise class from the others is apparent when the inter-class distances are computed from the full 128-length vector representation; see Supplementary Figure 3). The oblique class members can be seen scattered near the other classes, as would be expected given that this is the lowest-symmetry class.
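A sketch of this visualization, assuming the keras model above and scikit-learn's t-SNE implementation:

```python
from sklearn.manifold import TSNE
from tensorflow.keras import models

def tsne_embedding(model, images, perplexity=10):
    """Project the 128-length penultimate-layer activations of `images` to 2D."""
    # Sub-model that stops at the dense layer just before the softmax classifier
    feature_model = models.Model(inputs=model.input,
                                 outputs=model.layers[-2].output)
    features = feature_model.predict(images)       # shape: (n_images, 128)
    return TSNE(n_components=2, perplexity=perplexity).fit_transform(features)

# xy = tsne_embedding(model, train_images[:1000])  # scatter-plot, colored by class
```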

Fig. 4

t-SNE reduced representation of the classes. The embeddings for 1000 training images are visualized. Essentially, this plot shows the closeness of images as determined by the DCNN. The noise class is boxed, because it is much further away from the other classes in this representation (though inter-class distances have no meaning in t-SNE)

Application to edge cases

To determine how the classifier handles edge cases (i.e., cases where the symmetry is close to other classes), we tested the DCNN on two images of graphene taken by scanning tunneling microscopy. In the first image, shown in Fig. 5a, a slight drift of the microscope is evident; this provides an excellent test case for determining the stringency of the constraints the network applies when classifying a structure as hexagonal. The 2D FFT of the same STM image is shown in Fig. 5b, and the probabilities of classification are shown in Fig. 5c (note, 5000 passes through the network were used to determine the probabilities, which takes ~1 min per image on a standard desktop). The standard deviation is shown as error bars on the predictions. The DCNN predicts that the lattice type is oblique, with the hexagonal class as the second most likely structure, and there are relatively large uncertainties in these predictions. The oblique classification, as opposed to hexagonal, is most likely due to the drift in the image, which breaks the symmetry (see Supplementary Figure 4 for analysis). Next, we used the DCNN to classify the image in Fig. 5d, e, again of graphene, but this time containing defects as well as a larger-scale Moiré pattern. This image should be substantially easier to gauge the symmetry of, because it is not beset by microscope drift; on the other hand, the Moiré pattern may make the task more difficult, due to extra reflections in the 2D FFT pattern. The FFT spectrum, shown in Fig. 5e, displays these higher-order reflections. Interestingly, the classifier is robust enough to determine the hexagonal symmetry in this instance, despite the extra reflections. Indeed, hand-coded features would be highly likely to fail in cases such as these, due to multiple spots having similar intensity but belonging to different orders.

Fig. 5

Application to edge cases of real STM images of graphene. a Graphene STM image (scale bar, 1 nm) with some slight microscope drift during the scan. b Associated 2D FFT of the image in a. c DCNN output for this image. Because of the slight drift, the DCNN suggests that the oblique symmetry is more likely than the hexagonal symmetry, although the variance is large. d Raw STM image with both defects and a Moiré pattern (scale bar, 5 nm). e 2D FFT of the image in d. f DCNN output for this image. The hexagonal class is most likely according to the network. Even in the presence of additional reflections from the Moiré pattern, the DCNN is still able to make the correct determination of the symmetry

Extension to transformations

While interesting for single images, the real utility of these classifiers is undoubtedly for image sequences (movies) comprising the evolution of atomic structures under the effects of temperature or electron beam bombardment, where such analysis cannot be done by simple observation. We captured a movie of e-beam-mediated decomposition of Mo-doped WS2, with frames 30, 60, and 90 shown in Fig. 6a–c, respectively. Here, we used a 100 kV beam, which produces damage from both radiolysis and knock-on processes, particularly to the S atoms.

Fig. 6

Analysis of defect evolution in Mo-doped WS2. a–c STEM images of WS2 as a function of time, at a 0 s, b 157.5 s, and c 202.5 s. Scale bar, 5 nm. d, e Probability of classification of the “noise” d and “oblique” e classes for the image in c. Note that these images are generated by sliding a window (size depicted in a) across the image and inputting each windowed image to the DCNN to extract class probabilities. f Average phase fraction calculated in each frame by the DCNN method (colored lines) and by manual intensity thresholding (black circles). An exponential fit (discussed in the text) to the data is shown as a black dashed line

We calculated the maximum energy transfer via the formula

$$\Delta E = \frac{2E\left(E + 2E_0\right)}{Mc^2}$$

(1)

where E is the kinetic energy of the beam electrons, E0 = m0c2 = 511 keV is the electron rest energy (m0 is the electron rest mass), and M is the mass of the target atom; this yields 1.6 eV for W and 7.5 eV for S.43 Based on these estimates and the average binding energy of atoms in the lattice, the e-beam effect can be represented as a gradual removal of material and oversaturation with respect to S vacancies, leading to the gradual collapse of the 2D lattice and the formation of W-rich phases. Moreover, previous studies have found evidence for e-beam-induced chemical reactions,44,45 which may contribute a damage-assisting mechanism here as well. Thus, unravelling the atomic-level details of the damage evolution is a non-trivial task, yet understanding and controlling such evolution on an atom-by-atom basis may enable atomic-scale defect generation and manipulation akin to that being developed for graphene.6,7,12,21,22,25,46
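As a quick arithmetic check of Eq. (1) for sulfur: M ≈ 32.06 u gives Mc2 ≈ 32.06 × 931.5 × 103 keV ≈ 2.99 × 107 keV, so for a 100 kV beam (E = 100 keV, 2E0 = 1022 keV)

$$\Delta E_{\mathrm{S}} = \frac{2 \times 100 \times (100 + 1022)}{2.99 \times 10^{7}}\,\mathrm{keV} \approx 7.5\,\mathrm{eV},$$

consistent with the value quoted above.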

We therefore investigated an automated routine for identifying phase evolution at the atomic scale. For each frame, we computed the local FFT spectra via the sliding window method,31 with a window size of 128 px and a step size of 16 px (the outline of the window is shown in white in Fig. 6a). Running the DCNN on each window classifies the local symmetries present in each frame of the image sequence; the results for the 90th frame are plotted in Fig. 6d, e. Full results for the movie are shown in Supplementary Video 1. The mean percentage classification of each Bravais lattice symmetry class is shown in Fig. 6f for each frame. It should be noted that this is simply the mean of the probabilities of the individual windows of each frame belonging to a particular class, and does not necessarily imply the existence of voids. For instance, if the probability of the noise class is 10% in each window of the frame, but the probability of the oblique lattice is 90% in each window, then the mean probability of the noise class is still 10%, even though the DCNN has determined that all the windows belong to the oblique class. Nonetheless, we expect this measure to be valid here, as the probability of the noise class would increase if many windows began to be classified as such within the frame. Evidently, there are motions of the atoms that disrupt the hexagonal symmetry; these are likely due to the continual formation and filling of voids by the electron beam, as well as possible rippling effects, so the symmetry reverts to oblique under these displacements. Initially, there is a steady-state situation wherein the beam does not introduce defects (voids) large enough to cause a cascade and runaway growth of the voids. Eventually, voids at the bottom right of the frame become larger, leading to runaway growth (see the progression of frames in Fig. 6a–c).
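A sketch of this sliding-window classification is shown below, with the window and step sizes from the text; the model call assumes a TF2/keras network whose input size matches the windowed FFT (otherwise each amplitude map would need resizing):

```python
import numpy as np

def classify_windows(model, frame, window=128, step=16, n_classes=6):
    """Slide a window over one frame, take each window's FFT amplitude, and
    record the DCNN class probabilities on a coarse spatial grid."""
    ny = (frame.shape[0] - window) // step + 1
    nx = (frame.shape[1] - window) // step + 1
    prob_maps = np.zeros((ny, nx, n_classes))
    for i in range(ny):
        for j in range(nx):
            patch = frame[i*step:i*step + window, j*step:j*step + window]
            fft_amp = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(patch))))
            x = fft_amp[None, ..., None].astype('float32')
            prob_maps[i, j] = model(x, training=False).numpy()[0]
    return prob_maps

# noise_map = classify_windows(model, frame_90)[..., noise_index]  # cf. Fig. 6d
```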

From Fig. 6c, we observe the exponential increase of the “noise” phase fraction in Fig. 6f beyond a particular time. We can express this mathematically as \(\partial A(t)/\partial t = \rho A(t)\), where A(t) is the area of the hole, which changes with time, and ρ is the sputtering rate for the atoms at the hole edge; that is, the hole growth rate depends on the size of the hole. This ordinary differential equation has the solution A(t) = A0exp(ρt), where A0 is the initial hole size. Note that the sizes A(t) are given as phase fractions, i.e., between 0 and 1. Clearly, the initial hole size cannot be zero, so this model is only useful for describing what happens after a small hole has formed. Nevertheless, by fitting this model to the data we can obtain an estimate of the initial hole size and determine the sputtering rate during our experiment. Because this equation describes only the growth of holes and not their generation, we must add a constant representing the initial state, in which holes are generated and healed randomly under the electron beam. In this state holes are indeed present, so our hole fraction does not start at zero, i.e., A(t) = A0exp(ρt) + C. We use the first 100 s as a representation of this initial state to determine the value of the constant C, which is ~0.059. We then fit the above equation to the observed data for t > 100 s, with A0 and ρ as the fitting parameters. The curve of best fit is shown in Fig. 6f as a dashed black line. Our model closely follows the observed data, and we obtain a minimum hole size of 1.85 × 10−5 (equivalent to an area of ~0.004 nm2), which is approximately the size of a single vacancy, and a sputtering rate of ~9 nm2/s. As a sanity check, we also expect agreement in this case between the DCNN and a simple intensity threshold on the image (since the holes are clearly very dark). Determining the hole fraction through intensity thresholding yields the black dots in Fig. 6f. While the fractions are slightly different, they are generally within 20% of the DCNN values, and the agreement improves as the voids become larger. These results suggest that the DCNN method is robust and reliable. We have also checked the robustness of the classifier on simulated data for a slowly varying lattice that gradually transforms to hexagonal, which behaves as expected (see Supplementary Figure 5).
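A sketch of this fitting procedure with scipy, where t and noise_fraction stand for hypothetical arrays of frame times and per-frame mean noise-class probabilities:

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_hole_growth(t, noise_fraction, t_onset=100.0):
    """Fit A(t) = A0*exp(rho*t) + C to the DCNN noise-class fraction.

    t: frame times (s); noise_fraction: mean noise-class probability per frame.
    C is fixed from the pre-growth regime (the first `t_onset` seconds).
    """
    C = noise_fraction[t < t_onset].mean()         # ~0.059 in the text
    model = lambda tt, A0, rho: A0 * np.exp(rho * tt) + C
    mask = t > t_onset                             # fit only the growth regime
    (A0, rho), _ = curve_fit(model, t[mask], noise_fraction[mask],
                             p0=(1e-4, 0.01))      # rough initial guesses
    return A0, rho, C
```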

Discussion

Here, we introduce a deep learning model for the identification of symmetry classes in atomically resolved images from electron and scanning probe microscopies. This task is highly non-trivial: local symmetry is generally poorly defined given the noise in the images, and the classification must allow for image rotation and scale invariance. The DCNN overcomes these problems and provides a robust tool for the analysis of images and dynamic image sequence data. Future development of this field can include combining the DCNN with local atom-finding methods to determine primitive cells. We believe these approaches can be complementary: symmetry finders can improve atomic recognition by providing priors based on expected local symmetries, which in turn improves symmetry determination. This approach provides a way to fully unravel the mechanisms involved in atomic changes during temperature- and beam-induced processes.

Further, these data can be of significant interest in the context of recent advances in understanding and designing new materials via large-scale integration of predictive modeling and machine learning,47,48,49,50,51,52,53,54 collectively referred to as the Materials Genome.51,55,56 This progress hinges on the fact that crystalline systems possess long-range periodicity and symmetries, which form natural descriptors of crystalline materials. A combination of structural data, processing parameters, and functionalities further enables artificial-intelligence-driven workflows for materials design and even optimization of synthesis,57,58,59,60 extending recent advances in organic molecule synthesis61,62 towards inorganic systems. We believe that the approaches developed here and in other publications40,63,64 will provide experiment-based descriptors for the structure of solids that can be used in similar workflows.

Data availability

A trained keras model, along with a notebook for the analysis, is available with this manuscript in the supplementary material.