Mapping mesoscopic phase evolution during E-beam induced transformations via deep learning of atomically resolved images

Understanding transformations under electron beam irradiation requires mapping the structural phases and their evolution in real time. To date, this has mostly been a manual endeavor comprising difficult frame-by-frame analysis that is simultaneously tedious and prone to error. Here, we turn toward the use of deep convolutional neural networks (DCNN) to automatically determine the Bravais lattice symmetry present in atomically resolved images. A DCNN is trained to identify the Bravais lattice class given a 2D fast Fourier transform of the input image. Monte-Carlo dropout is used for determining the prediction probability, and results are shown for both simulated and real atomically resolved images from scanning tunneling microscopy and scanning transmission electron microscopy. A reduced representation of the final layer output allows to visualize the separation of classes in the DCNN and agrees with physical intuition. We then apply the trained network to electron beam-induced transformations in WS2, which allows tracking and determination of growth rate of voids. We highlight two key aspects of these results: (1) it shows that DCNNs can be trained to recognize diffraction patterns, which is markedly different from the typical “real image” cases and (2) it provides a method with in-built uncertainty quantification, allowing the real-time analysis of phases present in atomically resolved images. Finding crystal type in large electron microscope data sets is time intensive; inspired by computer vision, a neural network may help. However, application is limited due to issues like arbitrary crystal orientation. Here, lead by Rama Vasudevan at Oak Ridge National Laboratory and colleagues in Puerto Rico, the US, and India, a team trained a neural network to recognize 2D crystal lattices from electron microscope data. Additional Monte Carlo analysis gives probabilistic prediction and statistical deviation. This network correctly predicts crystal lattice 85% of the time, with 75% of second predictions correct. Incorrect predictions have large deviations, providing a heuristic for edge cases. The network was used to investigate WS2 structure under electron bombardment, finding rhombohedral structure disappears. This could be a robust tool to analyse real-time electron microscope data for materials optimisation and design.


Introduction
Phase formation and evolution is one of the key processes in the solid-state chemistry, physics, electrochemistry, and materials science.2][3][4][5] Despite the cornerstone importance of this field, information on process parameters is derived from macroscopic measurements, or via scattering studies of large volume of material, while atomistic mechanisms remain largely unexplored.
In the last several years, advances in (scanning) transmission electron microscopy ((S)TEM) have enabled visualization the atomic dynamics during multiple solid state processes, 6,7 including vacancy ordering in cobaltites, 8 phase transformation at crystalline oxide interfaces, 9 restructuring in 2D silica glass, 10 single point defect motion, [11][12][13] single atom catalytic activity, 14- 16 elastic-plastic transitions, 17 and dislocation migration. 18The atomic dynamics were observed in response to classical macroscopic stimuli such as temperature and pressure, but also as a response to e-beam irradiation. 19,20Understanding these transformations requires establishing the nature of the phases and structures contained in a single image, ideally during image acquisition process.
[23][24][25][26] For any atomically-resolved image, the first task is to identify the constituent lattice types present and determine their spatial distribution in the image.A review of existing approaches by Moeck is a good reference on this topic. 27These codes and software tools analyze individual images and (some) require the user to select the constituent unit cell, and then proceed to use motif-matching style algorithms to determine the space group.As outlined in the review 27 , there are several issues that limit the application of these methods, most obviously the difficulty in determining the uncertainties in the classification associated with the noise in atomic positions, but also that in many images a mixture of phases is present, necessitating an image segmentation that respects the available symmetries.
Previously we have used a sliding window-based method, to segment images based on linear unmixing of the (local) 2D fast Fourier Transform (FFT) spectra, via the N-FINDR unmixing algorithm 28 with both positivity and sum-to-one constraints. 29,30Ideally one would like to impose an additional constraint that the endmembers should conform to some symmetry types in the case of atomically-resolved images.For the 2D case, each lattice must reside in one of the five Bravais Lattice types, i.e. square, rectangle, centered rectangle, hexagonal and rhombohedral.
Thus, this task effectively boils down to determination of which of the five Bravais lattice types each portion of the image belongs to.This can be trivially defined mathematically for the standard coordinate frame, but the task is more complex when one considers that in any real atomically resolved image, the lattices themselves may be rotated in arbitrary fashion, and multiple diffraction orders can be present.Moreover, there will always exist noise and slight distortions that can make hand-crafting the particular rules (i.e., the features) more difficult in practice.
In the past, similar problems were faced by the computer vision community, where approaches such as the scale invariant feature transform 31 were designed as methods to attempt to determine the appropriate feature vectors that would be suitable for the classification.However, such methods have quickly become overtaken by the development of deep convolutional neural networks (DCNN), which have displayed remarkable accuracy that rivals humans in image classification tasks. 32,33The key advantage of a DCNN is the ability for the convolutional layers to 'learn' abstract features that are mostly independent of position and scale, allowing them to be useful in predictions. 34Interestingly, whether a DCNN can be used in the determination of crystal structure is unknown, because in this case, the features (diffraction spots) are identical, but the classification hinges on their relative positions and angles.In theory, this is a poorly suited task for fully convolutional neural networks, given that each layer only received input from the node immediately previous.
Here, we show that through the appropriate training and network architecture, a DCNN can learn the five different crystal structure types, independent of the scale and rotation, and allow for automated prediction of the Bravais lattice type.We find that the network has an 85% success rate on the simulated validation set, and that in 75% of cases where the prediction is incorrect, the second-most likely prediction is correct.We further validate the DCNN on real images obtained from both scanning tunneling microscopy and scanning transmission electron microscopy, and utilize the Monte-Carlo dropout 35 technique to estimate the confidence of the predictions.We visualize the network with a reduced t-SNE representation, which surprisingly maps the symmetry classes in a physically intuitive form.We then apply our model towards understanding the phase transformation in Mo doped WS2 under electron beam irradiation, which allows us to determine the phase transformation kinetics, which are in this case appear exponential with beam time exposure.Combined with other recent advances in deep learning for atomic scale image analysis 36 , we believe these studies lay the foundation for a 2D AI crystallographer, which will be integral in future automated analysis workflows for STEM and STM platforms.

Training Set
The key task of machine learning based approaches for physics-related studies is to establish appropriate training set that captures the physics of the problem.For the image set for our DCNN, we generated a set of images of six different lattice types, with 4000 members in each set totaling 24000 images.Note that the sixth set was a 'noise' class, i.e. what the classifier is expected to return if there is no periodicity present (or no atoms are present).For each Bravais lattice type we vary all possible lattice parameters (effectively allowing the DCNN to learn that the lattice types are invariant on scale), and then add arbitrary rotations, along with some randomization of atomic positions, to simulate (uncorrelated) disorder.Finally, the FFT of each image (only amplitude) is taken and stored.Note that for the noise class, we generated images by beginning with a rhombohedral lattice, and then perturbing the atomic positions until no clear FFT diffraction spots could be observed.In contrast to most work in the deep learning community, we worked exclusively in the Fourier space, because this type of preprocessing is expected to maximize information on the periodicity, and suppress noise to the central reflection, simplifying the classification task.At the same time, this allows more transferability of the approach, as the FFT is expected to be similar across different instrumentation platforms, negating the need for specific simulations (such as multi-slice for STEM, or density functional theory for STM).

Network Architecture
The architecture of the DCNN is shown in Fig. 1(a), and consists of 3 convolutional layers, followed by an average pooling layer, and then a dense (fully connected) layer.An average pooling layer was added to increase the 'connectedness' of the convolutional filters, matching the physics of the problem.Rectified linear units were used as activation functions in these layers.Finally, the last layer consists of a six-unit dense layer with softmax activation, so that the classification outputs would sum to one, respecting the choices available.It is this layer that provides the estimate of which class the input image belongs to.To prevent overfitting, we employed the commonly used weight decay (l2 regularization), as well as dropout layers (shown in red), which were set to randomly mask 20% of the input from the previous layer, before propagating them to the next layer on each training batch.In addition to boosting the generalizability of the network, dropout also allows for uncertainty quantification, as shown by Gal and Ghahramani. 35Effectively, running a single image through the trained network thousands of times with the dropout layers active can serve as a type of inference, allowing probability distributions of the resulting classifications to be computed.Optimization of the network was performed with the Adam optimizer utilizing the cross-entropy metric.All work was conducted using keras with a TensorFlow backend.We trained on 19200 images and validated on 4,800 for a total of 30 epochs (one epoch is one complete pass through the training dataset), with the results of the training accuracy and validation accuracy shown in Fig. 1(b).It is seen that after ~15 epochs, little improvement in the validation accuracy occurs, which plateaus at about 85%.

Results
We first plot some results of the network from the validation set, i.e., the data that the network has not seen during the training step.The results are shown in Figure 2, which shows that the network misclassified 5 of the 18 images presented, highlighted in red (more examples shown in Supplementary Figure 1).These validation images show that the DCNN is indeed learning that scale is not an important feature for the classification task, as can be seen by the FFT spectra of the centered rectangle lattices of different scale.Unlike in a standard image classification task, this task is more difficult given that some classes can merge into others, for example a rectangle can easily morph into the square class for values of lattice parameters that are (within noise) indistinguishable; therefore, 100% accuracy is neither possible nor desirable.Instead, we computed the probability over the classifications via the use of the Monte-Carlo dropout method 35 , with 5000 iterations for each image, with the probability shown next to each prediction in the titles in Figure 2.This allows us to determine a mean probability and a standard deviation over the predictions.Given the difficulty of the task, it is more important that the confidence estimates be appropriate, and that the second-most likely physical solution be correct.It is seen that the confidence of the incorrect predictions is generally low (below 65% in all cases) and the misclassifications generally appear reasonable.

Errors and Noise Limits
To investigate these errors more closely, we chose 1000 images from the training set, and investigated the statistics during incorrect predictions.The prediction confidence histogram is plotted in Fig. 3(a) and shows a distribution that is centered around lower values, i.e. when the prediction is wrong, the confidence tends to be low.This is of course evidence that the network is uncertain, as should be the case (one would not wish a network to be confident in incorrect predictions).For more inspection, we turned to an example of an incorrect prediction for a single lattice, shown in Fig. 3(b).The prediction confidence for the six classes along with their standard deviations are plotted in Fig. 3(c) for this image.As can be seen, the probability is highest for the rectangle class, but the second-most likely prediction is square, which is the ground truth.The suggests a good degree of robustness for the classifier (more examples shown in Supplementary Figure 2).
Of importance is the DCNN's susceptibility to noise.As an example, we plot a (noisy) STM image of graphene in Fig. 3(d).We proceeded to take the 2D FFT of this image using 3 different window sizes: the first window was the size of the entire image, the next two were the sizes of the two red squares shown in Fig. 3(c).That is, the window size was decreased by a factor of 2 each time, and the DCNN prediction on the symmetry class was made.The results are seen in Fig. 3(e) and show that a rhombohedral symmetry class is predicted for the largest FFT (shown inset in Fig. 3(e)), but that the confidence in this prediction steadily decreases with decreasing window sizes.The associated FFT spectra become much more convolved, and substantial edge effects can be seen.It is interesting that the rhombohedral class, instead of the hexagonal class, is predicted for this image.The lattice is ideally hexagonal, but substantial drift and disorder effects serve to reduce the symmetry to the rhombohedral class.Moreover, the decreasing confidence of the classifier with smaller window sizes is instructive of the degree to which features should be distinguishable for the DCNN to accurately make a symmetry judgement.

Visualizing the features
To gain more insight into how the DCNN is working, we may observe that the purpose of a DCNN is to learn features of image data that can then be separated into the available classes via a linear classifier.Thus, observing the weights of the last layer (before the classification layer) can provide some information as to the nature of this classification space.Given that this is a 128length vector, visualization requires projecting to a lower dimensional space.However, this dimensionality reduction should be done via a method that maintains (approximately) the distance between images in both the low and high-dimensional representations.The t-distributed stochastic neighbor embedding (t-SNE) 37 provides such a method, and we used it to calculate the 2D embedding for 1000 training images.These vectors then form the (x,y) coordinates for each image, as shown in Fig 4 .The specific images are colored by their true classes.Essentially, images that are nearby in this representation are also nearby in the DCNN and can be though to be 'similar' as far as the network is concerned.Note that the noise class is substantially further away than has been drawn in the figure.Interestingly, the network separated the centered rectangle class very distinctly from the hexagonal class.Moreover, the rhombohedral class members can be seen scattered near the other classes, as would be expected (given this is the lowest symmetry class).
The closeness of the square and rectangle classes is also expected, and altogether support the idea that the classification appears to be made by the DCNN based on symmetry features.

Application to Edge Cases
To better determine how the classifier separates edge cases (i.e., cases where the symmetry is close to other classes), we tested the DCNN on two images of graphene taken by scanning tunneling microscopy.In the first image, show in Fig. 5(a), a slight drift of the microscope is evident.Thus, this provides an excellent test case for the DCNN to determine the stringency of the constraints for the network to classify a structure as hexagonal.The 2D FFT of the same STM image is shown in Fig. 5(b), and the probabilities of classification are shown in Fig. 5(c) (note, 5000 passes of the network were used to determine the probabilities, which takes ~1 minute per image on a standard desktop).The standard deviation is shown as error bars on the predictions.
The DCNN predicts, that the lattice type is rhombohedral.It then predicts that the second most likely structure is the hexagonal class, and there are relatively large uncertainties in these predictions.The reason for the rhombohedral classification as opposed to hexagonal may be due to the slight disorder (reflected as dispersion of the diffraction spots).To test this assumption, we used the DCNN to classify the image in Fig. 5(d,e), again of graphene, but this time containing defects as well a larger scale Moiré pattern.The FFT spectra is shown in Fig. 3(e) and shows the higher order reflections.Interestingly, the classifier is robust enough to determine the hexagonal symmetry in this instance, despite the extra reflections.Indeed, hand-coding features is highly likely to fail for cases such as these, due to multiple spots having similar intensity but being of different orders.

Extension to Transformations
While interesting for single images, the real utility of these classifiers is undoubtedly for image sequences (movies), comprising the evolution of atomic structures under the effect of temperature or electron beam bombardment, where such analysis cannot be done by simple observation.We captured a movie of e-beam-mediated decomposition of Mo-doped WS2, with frames 30,60 and 90 shown in Fig. 6(a-c), respectively.Here, we used a 100 kV beam which will produce damage from both radiolysis and knock-on damage particularly to the S atoms.
We calculated the maximum energy transfer via the formula where E is the beam electron kinetic energy,  0 =  0  2 (511 keV), m0 is the rest mass of the electron, and M is the atom mass, which yields 1.6 eV for W and 7.5 eV for S. 38 Based on these estimates and average binding energy of atoms in the lattice, the e-beam effect can be represented as a gradual reduction of material and oversaturation with respect to S vacancies, leading to the gradual collapse of the 2D lattice and formation of W-rich phases.Moreover, previous studies have found evidence for e-beam induced chemical reactions, 39,40 which may also lead to a damageassisting mechanism here as well.Thus, unravelling the atomic level details of the damage evolution is a non-trivial task, yet understanding and controlling such evolution on an atom-byatom basis may enable atomic scale defect generation and manipulation akin to that being developed for graphene. 6,7,12,21,22,25,41 therefore investigate an automated routine for identifying phase evolution at the atomic scale.For each frame, we computed the local FFT spectra via the sliding window method, 30 with a window size of 128px and step size of 16px (the outline of the window is shown in Fig. 6 It should be noted that this is simply the mean of the probabilities of individual windows of each frame belonging to a particular class and does not necessarily imply the existence of voids.For instance, if the probability of the void is 10% in each of the windows of the frame, but the probability of rhombohedral lattice is 90% in each window, then the mean probability of the noise class is still 10%, even though the DCNN that all the windows belong to the rhombohedral class.Nonetheless, we expect this to be valid here, as the probability of the noise class would increase if many windows began to be classified as such within the frame.Due to the large scale random motions of the atoms, the symmetry defaults to rhombohedral.Initially there is a steady-state situation wherein the beam is not introducing defects (voids) large enough to cause cascade and runaway growth of the voids.
Eventually, voids on the bottom right of the frame become larger, leading to runaway growth (see progression of frames in Fig. 6(a-c).
From Fig. 6(c), we observe the exponential increase in the 'noise' phase fraction in Fig. 6(f) beyond a particular time.We can express this mathematically as is the area of the hole which changes with time, and  is the sputtering rate for the atoms at the hole edge.Here, we are simply saying that the hole growth rate is dependent on the size of the holes.This is an ordinary differential equation with the solution, () =  0 exp () where  0 is the initial hole size.Note here the sizes A(t) are given as phase fractions, i.e. between 0 and 1.
Clearly, the initial hole size cannot be zero, which suggests that this model is only useful for describing what happens after a small hole has been formed.Nevertheless, fitting this model to the data we can obtain an estimate for the initial hole size and determine the sputtering rate during our experiment.However, because this equation describes only the growth of the holes and not the generation, we must add another constant which represents the initial state where holes are being generated and healed randomly under the electron beam.In this situation, holes are indeed present, so our hole fraction does not start at zero, i.e. () =  0 exp() + .We use the first 100s as a representation of this initial state to determine the value of this constant, C, which is ~0.059.We then fit the above equation to the observed data, allowing  0 and  to be our fitting parameters for time t>100s.The curve of best fit is represented in Fig. 6(f) as a dashed black line.We see that our model closely follows the observed data and we obtain a minimum hole size of 1.85x10 -5 (equivalent to an area of ~0.004nm 2 ), which is approximately the dimensions of a single vacancy, and a sputtering rate of ~9 nm 2 /sec.As a sanity check, we also expect to observe agreement in this case between the DNNC and a simple threshold on the image (since the holes are clearly very dark).Discriminating the hole fraction through intensity thresholding results in the black dots in Fig. 6(f).While the fractions are slightly different, they are generally within 20% of the DCNN, and further, there is good agreement as the voids become larger.These results suggest that the DNNC method is robust and reliable.

Discussion
Here, we introduce the deep learning model for the identification of symmetry classes in atomically resolved images in electron and scanning probe microscopies.This task is highly nontrivial given that local symmetry is generally poorly defined given the noise in the images and must allow for image rotations and scale invariance.The DCNN overcomes these problems and provides a robust tool for analysis of images and dynamic image sequence data.The future development of this field can include combinations of the DCNN with local atomic finding that will allow determination of primitive cells.We believe these approaches can be complementary, with symmetry finders improving the atomic recognition by providing priors based on expected local symmetries, and in turn improving symmetry determination.This approach and provide way to fully unravel the mechanisms involved in atomic changes during temperature-and beam induced processes.
Further, this data can be of significant interest in the context of recent advances in understanding and designing new materials via large scale integration of predictive modelling and machine learning methods, [42][43][44][45][46][47][48][49] collectively referred to as the Materials Genome. 46,50,51This progress hinges on the fact that crystalline systems are associated with long-range periodicity and symmetries, which form the natural descriptors of crystalline materials.Combination of the structural data, processing parameters, and functionalities further enabled artificial intelligence driven workflows for materials design and even optimization of synthesis, [52][53][54][55] extending recent advances for organic molecule synthesis 56,57 towards inorganic systems.We believe that approaches developed here and in other publications 36,58,59 will provide experiment-based descriptors for structure of solids that can be used in similar workflows.
(a) in white).Running the DCNN on each window allows classification of the local symmetries present in each frame of the image sequence, and is plotted in Fig 6(d,e) for the 90 th frame.Full results for the movie are shown in Supplementary Video 1.The mean % of classification of each Bravais lattice symmetry class is shown in Fig 6(f) for each frame.

Figure 1 :
Figure 1: Deep Convolutional Neural Network for symmetry classification.(a) Schematic of the DCNN structure.A lattice image is input, and transformed via a 2D fast Fourier Transform (FFT).This image is input to the DCNN, which outputs probability of classification into one of the six classes (five Bravais lattice types, and one for 'noise', i.e., no periodicity).(b) Training and validation accuracy as a function of epoch.One epoch is one complete pass through the training data.

Figure 2 :
Figure 2: Classification examples of the DCNN.Randomly selected (simulated) images from the validation set, and their classification by the DCNN with the predicted class ('Pred:') shown above the true ('Actual') class.The probability is shown in parentheses next to the prediction.Note the probabilities are computed via 5000 passes of the individual images through the network with dropout layers active.Misclassified images are boxed in red.

Figure 3 :
Figure 3: DCNN Performance and Noise Limits.(a) Prediction confidence for all incorrect predictions made from a set of (simulated) 1200 images.In total, 188 images were classified incorrectly, and the second-most probable class was correct on 118 of those occasions.(b,c) Single (simulated) test image (b) with the network output for this image in (c).The bars in (c) correspond to one standard deviation, computed via 5000 passes of the image through the network with dropout active.(d) Raw STM image of graphene (scale bar, 1nm).A 2D FFT was performed with the whole image, and in smaller windows (shown in red).The prediction probability for the rhombohedral and 'noise' class for the three windows is shown in (e), with one the error bars marking one standard deviation in the prediction confidence.As the window size becomes smaller, the prediction probability drops.

Figure 4 :
Figure 4: t-SNE reduced representation of the classes.The embeddings for 1000 training images are visualized.Essentially, this plot shows the separation between individual images within the classifier.The noise class is in reality much further apart than drawn (thus it has been boxed).

Figure 5 :
Figure 5: Application to edge cases of real STM images of graphene.(a) Graphene STM image (scale bar, 1nm) with some slight microscope drift during the scan.(b) Associated 2D FFT from image in (a).(c) DCNN output for this image.Because of the slight drift, the DCNN suggests that the rhombohedral symmetry is more likely than the hexagonal symmetry, although the variance is large.(d) Raw STM Image with both defects and Moiré patterns (scale bar, 5nm).(e) 2D FFT of image in (d).(f) DCNN output for this image.The hexagonal class is most likely according to the network.Even in the presence of additional reflections from the Moiré pattern, the DCNN is still able to make the correct determination of the symmetry.

Figure 6 :
Figure 6: Analysis of defect evolution in Mo-doped WS2.(a-c) STEM images of WS2 as a function of time, at (a) 0s, (b) 157.5s and (c) 202.5s.Scale bar, 5nm.(d,e) Probability of classification of the 'noise' (d) and 'rhombohedral' (e) classes for the image in (c).Note that these images are generated via sliding a window (size depicted in (a)) across the image, and inputting each of the windowed images to the DCNN to extract class probabilities.(f) Average phase fraction calculated in each frame by the DCNN method (colored lines), and manual intensity thresholding (black circles).An exponential fit (discussed in text) to the data is shown as a black dashed line.