Deep learning analysis of defect and phase evolution during electron beam-induced transformations in WS2

Recent advances in scanning transmission electron microscopy (STEM) allow real-time visualization of solid-state transformations in materials, including those induced by the electron beam and by temperature, with atomic resolution. However, despite the ever-expanding capabilities for high-resolution data acquisition, the information inferred about the kinetics and thermodynamics of these processes, and about single-defect dynamics and interactions, remains minimal. This is due to the inherent limitations of manual ex situ analysis of the large volumes of data collected. To circumvent this problem, we developed a deep-learning framework for dynamic STEM imaging that is trained to find lattice defects, and applied it to mapping solid-state reactions and transformations in layered WS2. The trained deep-learning model extracts thousands of lattice defects from raw STEM data in a matter of seconds, and these defects are then classified into different categories using unsupervised clustering methods. We further expanded our framework to extract diffusion parameters for sulfur vacancies and analyzed the transition probabilities associated with switching between different configurations of defect complexes consisting of a Mo dopant and a sulfur vacancy, providing insight into point-defect dynamics and reactions. This approach is universal, and its application to beam-induced reactions allows chemical transformation pathways in solids to be mapped at the atomic level.


INTRODUCTION
Chemical reactions and phase transformations underpin phenomena ranging from cosmological processes, to the emergence of life on Earth, to modern technologies, and are therefore of tremendous interest for both basic and applied sciences. Simple gas-phase reactions of small molecules can be readily studied using well-established spectroscopic methods (infrared, 1,2 mass, 3 and nuclear magnetic resonance 4 ), which take advantage of the spatial homogeneity of the reaction volume, where the same process occurs many times. In conjunction with first-principles-based modeling, 5,6 a reliable picture of molecular reactivity can be built. For studies of more complex organic and biochemical reactions, time-resolved cryogenic microscopy 7 and femtosecond X-ray pump-probe methods 8 provide a reliable investigative framework, again relying on the statistical similarity between multiple orientations of the same molecule.
The situation is far more complicated for solid state reactions involving continuous solids. Traditionally, solid-state phase transformations and reactions were explored by bulk measurements and X-ray techniques. However, such techniques may not be able to provide sufficient spatial resolution for understanding elementary mechanisms behind the observed transformations. This problem can be partially solved by direct ex situ visualization of reaction zones, 9,10 providing information on the geometry and, in certain cases, atomic configurations at the reaction fronts. Similarly, utilization of colloid models 11 allows for the development of model systems, even though the nature of local interactions is significantly different from those found in atomic systems.
In recent years, advances in scanning transmission electron microscopy ((S)TEM) have enabled the direct visualization of dynamic phenomena at the atomic level. [12][13][14][15][16][17][18][19][20] The physical/chemical phenomena studied by in situ STEM are wide-ranging and now include e-beam-induced defect evolution, [21][22][23][24][25][26][27][28][29][30] dislocation migration, [31][32][33] e-beam-induced production of single-layer Fe and ZnO membranes in graphene nanopores, 34,35 e-beam-induced chemical etching and growth from nanoparticle and single-atom catalysts, [36][37][38][39][40][41] sub-10 nm scale lithography, 42 graphene healing, 43 conductive nanowire formation, 44 crystallization and amorphization at 2D interfaces, 45,46 formation of fullerenes, 47 and graphene edge dynamics. 48,49 This list can hardly be considered comprehensive, but it serves to illustrate the vast array of dynamic changes that are being observed and rapidly explored via in situ (S)TEM techniques. A tantalizing development, published just last year (2017), is the introduction of a single dopant atom into a graphene lattice, the controlled movement of the atom through the lattice, and the assembly of a few primitive structures atom-by-atom. [50][51][52][53] Such efforts harken back to the work of Don Eigler, who first demonstrated controlled atomic motion via scanning probe techniques. 54 However, given the colorful array of other atomic and chemical processes observed in (S)TEM, and the continuously growing portfolio of commercially available in situ equipment (heating, electrical biasing, gas and liquid cells, etc.), it seems likely that many more processes can be brought under direct control, turning the (S)TEM into an atomic-scale fabrication platform. 55 Successes in e-beam atom-by-atom fabrication and atom-by-atom mapping of solid-state reactions will require more than exploratory research and instrumental improvements.
A key piece of the puzzle will involve successfully grappling with the enormous amount of data which can be generated by these instruments to infer material-specific information describing kinetics and thermodynamics of point and extended defects, reaction paths, and mechanisms for extended defect and second phase nucleation and growth. The "by hand" analysis of years past is no longer a tractable solution considering the dimensionality and number of datasets which are now routinely obtained. This necessitates the creation of methods which allow for automated analysis of dynamic transformations to extract relevant materials descriptors and reconstruct reaction pathways from various sources of detector readouts, such as the variety of imaging and spectroscopic modes. In this article, we attempt to forge an inroad in one aspect of this challenge, namely automated image analysis for the detection and tracking of defects in STEM of 2D materials and further proceed to extract microscopic point-defect reaction mechanisms from these observations.
Here, we analyze the phase evolution of Mo-doped WS2 during electron beam irradiation. In this process, electron irradiation removes sulfur atoms, rendering the system oversaturated with respect to low-valence tungsten-sulfur moieties. We develop a deep-learning network for rapid analysis of these dynamic data, analyze transformation pathways, create a library of defects, and explore minute distortions in the local atomic environment around the defects of interest, ultimately building a complete framework for exploring point-defect dynamics and reactions. Figure 1 shows several selected frames from a STEM "movie" of lattice transformations in the Mo-doped WS2 monolayer under 100 kV electron beam irradiation. The full movie is available in the Supplementary Material. This movie was previously analyzed by some of the authors in the context of mesoscale phase transformations. 56 The system evolves with time, developing numerous point defects. As nonstoichiometry accumulates, these defects begin to segregate, forming extended defects, nucleating secondary phases, and causing segmentation and rearrangement of the 2D layer. The key task is to extract information about these atomic-scale defects. Unfortunately, most methods available to date for localizing and identifying/classifying defects are slow and inefficient and require frequent manual input.

RESULTS AND DISCUSSION
To overcome the limitations of the available approaches, we developed a physics-based machine learning method for localizing and identifying defects. We exploit the fact that each defect is associated with a violation of the ideal periodicity of the lattice. We therefore train a convolutional neural network (cNN) using a single image from the early stage of the beam-induced transformation, when macroscopic periodicity is still maintained and each defect can be readily discovered, providing the "ground truth" for network training. Because the trained network relies only on the local characteristics of the image, it can identify defects at later stages of system evolution, when the long-range periodicity of the lattice is broken by second-phase evolution and by displacement and rotation of unreacted WS2 fragments. Furthermore, we find that the network can discover, via "extrapolation", other defects that are not necessarily part of the initial training set. Such "extrapolation" is possible due to the generalization abilities of deep-learning models. Indeed, we have recently demonstrated 57 that a deep cNN trained on simulated images of an idealized lattice vacancy structure can, in principle, generalize well enough to detect larger and more complex lattice vacancy structures in the system (e.g., double and triple vacancies, as well as reconstructed vacancies). The extracted defect structures can be identified/classified using unsupervised clustering and unmixing techniques. Finally, the selected defects can be studied further using local crystallography techniques, 58 such as a combination of atom finding and principal component analysis for analyzing minute atomic distortions in their vicinity in the "movies", as well as with Markov analysis for identifying transition probabilities between different defect configurations.
As a first step of the analysis, we defined the topology of a neural network to target the specific physics of beam-induced transformations. The network must be able (i) to separate atomic-scale lattice disorder from the rest of the lattice, (ii) to return the precise locations of the detected defects, and (iii) to generalize to previously unseen defect structures. One possible candidate is class activation map-based deep-learning analysis, in which a model trained on image-level labels is, in principle, capable of discriminating the image regions used to identify a specific class 59 (defect). The disadvantage of such an approach is that one must start by manually selecting isolated single-defect structures to create a training set. In addition, we found that while this approach locates certain atomic defect structures with sufficient accuracy, its generalization ability is relatively poor. The alternative approach is to use a fully convolutional neural network model, 60 which can be trained to output a pixel-wise classification map, with the same size/resolution as the original input image, that gives the probability of each pixel belonging to a certain type of object (defect). This type of model has recently been applied successfully to finding lattice atoms in raw STEM data, 57 and we therefore chose it for the current problem.
The next task is to create a training set that will be used to "teach" a model to find lattice defects in STEM movies, allowing sufficient flexibility to discover all the defects while avoiding over-classification into classes that physically cannot be present in the data. We found that it is possible to train a network using only the first frame of such a movie, or a single image obtained before recording a movie, and then let the trained network analyze the remainder of the movie. This approach utilizes the fact that macroscopically (i.e., on the length scale of the image) the defects can be trivially discovered via the Fourier method, 61 providing the ground truth for network training. Once trained, however, the network relies solely on local edge properties for identification and is thus robust to the formation of extended defects and to rotations and fragmentation of the lattice.
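A minimal NumPy sketch of the Fourier-based labeling and training-set step follows. The text specifies a "high-pass filter" in reciprocal space without further detail; here we assume it amounts to keeping only the highest-amplitude Fourier components (the lattice Bragg peaks), selected by a hypothetical `amp_quantile` parameter, and we threshold the two difference images at a hypothetical `n_sigma` level. The helper names `fft_defect_mask` and `augment_pair` are ours, and the flip/rotation augmentation is an illustrative choice, since the transformations used by the authors are not stated.

```python
import numpy as np


def fft_defect_mask(image, amp_quantile=0.999, n_sigma=3.0):
    """Binary 'ground truth' mask of pixels deviating from the periodic lattice.

    Assumption: the periodic image is rebuilt from the strongest Fourier
    components only (the lattice Bragg peaks), chosen by an amplitude quantile.
    """
    img = (image - image.mean()) / image.std()
    spectrum = np.fft.fft2(img)
    amplitude = np.abs(spectrum)
    keep = amplitude >= np.quantile(amplitude, amp_quantile)
    periodic = np.real(np.fft.ifft2(spectrum * keep))
    mask = np.zeros(img.shape, dtype=bool)
    # periodic - img highlights missing intensity (vacancies);
    # img - periodic highlights extra intensity (e.g. adatoms).
    for diff in (periodic - img, img - periodic):
        z = (diff - diff.mean()) / diff.std()
        mask |= z > n_sigma
    return mask


def augment_pair(image, mask, rng):
    """Apply the same random 90-degree rotation and flip to image and mask."""
    k = int(rng.integers(4))
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image.copy(), mask.copy()


# Usage on a synthetic square lattice with one missing "atom" at (64, 64).
y, x = np.mgrid[0:128, 0:128]
lattice = np.cos(2 * np.pi * x / 8) + np.cos(2 * np.pi * y / 8)
vacancy = 2.0 * np.exp(-((x - 64) ** 2 + (y - 64) ** 2) / (2 * 1.5 ** 2))
frame0 = lattice - vacancy
mask0 = fft_defect_mask(frame0)
img_aug, mask_aug = augment_pair(frame0, mask0, np.random.default_rng(0))
```

On this synthetic frame the mask marks a compact blob around the missing atom. For a real hexagonal WS2 lattice, `amp_quantile` would need tuning so that all relevant Bragg peaks are retained, and the augmented (image, mask) pairs would then form the training set.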
To identify the defects, we selected a single image (frame) at the beginning of the transformation (Fig. 2a). We then performed a global fast Fourier transform (FFT) on the selected experimental image and applied a high-pass filter in reciprocal space to remove the nonperiodic components of the lattice (Fig. 2b). Next, we performed an inverse FFT to obtain the periodic image and subtracted the original image from it (and vice versa), such that only the deviations from the ideal periodic lattice remained. 61 In this difference image, vacancies show up as bright spots. The difference image is then thresholded to find the locations of the single defects (Fig. 2c). Note that the thresholded image represents the "ground truth" that is used to train the cNN. The training set is created by performing data augmentation on the selected experimental image and the corresponding ground-truth image. This augmented dataset can be used to train a neural network to return the positions of atomic lattice disorder from raw experimental data (Fig. 2d). Once trained, this cNN-based method for finding defects is not only faster and more efficient than the method based on FFT subtraction but, unlike the FFT method, can also find the positions of defects in images of fragmented atomic lattices, where multiple (joint and/or disjoint) lattice domains can be rotated by different angles with respect to each other. Because our model finds defects that break lattice periodicity irrespective of the exact defect type, we consider it a "universal" defect finder for a given material.
Fig. 2 Training a deep convolutional neural network to recognize defects that break lattice periodicity. a The first frame from the STEM movie on Mo-doped WS2. b Global FFT and global FFT with high-pass filter applied. c Binary masks for image differences between the original data in a and the inverse of the filtered FFT in b.
The image in a is the training image and the data in c serve as the 'ground truth' (pixel-wise labeling). d Schematic of the convolutional neural network with an encoder-decoder type of structure.
We now use the cNN model trained according to the method described above (accuracy on the test set ∼99%) to locate atomic defects in dynamic STEM data on Mo-doped WS2. Interestingly, although our model was trained using only the first frame (out of 100) of the movie, it was able to accurately identify the positions of atomic defects in the remaining 99 frames (see Supplementary Figure 1; the full movie of the defects found can be downloaded from the Supplementary Material). Once a sufficient number of defects (~10⁴ in this case) has been extracted via the cNN model, it becomes possible to categorize them into different classes. To perform such defect classification in an unsupervised fashion, we adopted a Gaussian mixture model (GMM). 62 The GMM was applied to a stack of defect "windows" (images of identical size, usually 32 × 32 pixels, cropped around the center of each defect) extracted using the pixel-wise classification maps in the cNN output. Here, we chose five GMM components, as this appears to be the optimal number for understanding the types of defect structures present in the data. Indeed, increasing the number of components resolved fine (sub-)structures of the detected defects, while decreasing it produced some physically meaningless structures (see Supplementary Figure 2). We also note that, beyond the purely exploratory stage, the number of components can be adjusted based on the known defect chemistry of the material (from general physical principles, density functional theory calculations, or combinatorial analysis). The defect structures associated with the unmixed components of the GMM are shown in Fig. 3a-e. Classes 1 and 3 (Fig.
3a, c) were found to correspond to a substitutional atom with a lower Z number in the W sublattice, which we interpret as a Mo dopant (Mo_W). Note that the Mo atom does not occupy the symmetric central position in these structures, as one would expect for a lone Mo dopant. This suggests that additional distortions are present in the defects that form classes 1 and 3, likely associated with disorder in the S sublattice. Interestingly, coupling between distinct defect species has recently been observed in static STEM images of the Mo-doped WS2 system and attributed to the merging of defects during growth and post-growth procedures. 63 This comparison illustrates that, as in other cases, systematic application of statistical and machine learning methods allows us both to recover earlier observations and, as we show next, to derive new information about the underlying physical and chemical processes. Classes 4 and 5 (Fig. 3d, e) are associated with a vacancy in the W sublattice (V_W) and in the S sublattice (V_S), respectively. The defect structure in class 2 (Fig. 3b) can be explained by the presence of adatoms/"contaminations" created during the e-beam surface transformations (e.g., chemical species from the initial WS2 material deposited back onto the surface in combination with carbon atoms). Figure 3f shows spatiotemporal trajectories ("brush diagram" 64 ) for the identified defects. From this diagram, we identify three characteristic statistical behaviors: weakly moving trajectories, stronger diffusion, and "uncorrelated events"/"flickering". The presence of more than one characteristic behavior of the atomic defects may be connected to the complex spatial character of the strain fields during the material transformation, which may affect diffusion properties as well as create certain "localization regions" in which the motion of defects is restrained. 65,66
In the following, we focus on the analysis of the continuous and quasi-continuous trajectories only. The most well-defined trajectories are associated with Mo dopants (classes 1 and 3). These Mo defects show different diffusion behaviors depending on their location in the lattice and are characterized by reversible switching between two configurations (class 1 and class 3) along their trajectories. The defects associated with S and W vacancies typically form shorter trajectories than the Mo defects. One possible explanation is that these vacancies become filled by W and S species from the extended clusters of deposited WS2 material (although we did not find any associated correlations with point defects of class 2).
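The unsupervised classification step described above (cropping windows of about 32 × 32 pixels around each cNN-detected defect and fitting a five-component GMM) can be sketched with scikit-learn's `GaussianMixture` as below. The function names are hypothetical, and the diagonal covariance is our assumption, chosen to keep a 1024-dimensional fit tractable; the authors' actual GMM settings are not given in the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def crop_windows(frame, centers, size=32):
    """Crop size x size windows around defect centers, skipping edge defects."""
    half = size // 2
    windows = []
    for r, c in np.round(np.asarray(centers)).astype(int):
        if half <= r <= frame.shape[0] - half and half <= c <= frame.shape[1] - half:
            windows.append(frame[r - half:r + half, c - half:c + half])
    return np.array(windows)


def classify_windows(windows, n_classes=5, seed=0):
    """Fit a GMM to flattened windows; return labels and class-mean images."""
    X = windows.reshape(len(windows), -1)
    # covariance_type="diag" is an assumption of this sketch, not the paper's setting.
    gmm = GaussianMixture(n_components=n_classes, covariance_type="diag",
                          random_state=seed)
    labels = gmm.fit_predict(X)
    means = gmm.means_.reshape(n_classes, *windows.shape[1:])
    return labels, means


# Usage on synthetic data: two well-separated "defect types" of 8 x 8 pixels.
rng = np.random.default_rng(0)
demo = np.concatenate([rng.normal(1.0, 0.1, (50, 8, 8)),
                       rng.normal(-1.0, 0.1, (50, 8, 8))])
labels, means = classify_windows(demo, n_classes=2)
```

Each entry of `means` is a class-averaged defect image, the counterpart of the structures shown in Fig. 3a-e; the per-frame labels, combined with the defect coordinates, are what a spatiotemporal brush diagram is built from.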
We now demonstrate that, based on the results produced by the combination of cNN and GMM, it becomes possible to estimate the diffusion characteristics of selected defect species. In particular, we studied the diffusion properties of S vacancies. We first collapse the 3-d spatiotemporal diagram for a chosen class of defect into a 2-d representation. For this purpose, we project the "windows" of specific classes of defect, which allows us to separate defects that are continuous in time from randomly occurring ones (see Supplementary Figure 3). This analysis is complemented by a density-based clustering algorithm, 67 which yields similar results. After extracting the defect coordinates for each selected defect "flow" (Fig. 4), we obtain the variance of each distribution and estimate a diffusion coefficient within the framework of a two-dimensional random walk model. This yields diffusion coefficients between 3 × 10⁻⁴ nm²/s and 6 × 10⁻⁴ nm²/s. We then proceed to the analysis of another type of defect, namely the defect associated with a Mo dopant (classes 1 and 3). Here, it is worth noting that while the GMM-based decomposition into five components provides a good understanding of the major types of defects present in the system, it does not allow the fine details (variations) of the detected structures to be studied. Such an analysis is especially important for classes 1 and 3, which show peculiar switching behavior in Fig. 3f. We therefore investigated the "internal" structures of classes 1 and 3 using so-called local crystallography analysis. 58 Specifically, we studied statistically significant deformations of the nearest neighborhood of each defect structure using principal component analysis (PCA).
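As a sketch of the random-walk diffusion estimate for a single S-vacancy "flow" discussed in this paragraph, the code below uses the mean-squared displacement (MSD), for which an unbiased two-dimensional random walk gives MSD(τ) = 4Dτ. This MSD estimator is a standard stand-in for, and not necessarily identical to, the variance-based estimate described in the text; `frame_time_s` and `pixel_size_nm` are hypothetical inputs, since neither the frame interval nor the pixel size is stated here.

```python
import numpy as np


def diffusion_coefficient(track_px, frame_time_s, pixel_size_nm, max_lag=5):
    """Estimate D (nm^2/s) for one defect track from its mean-squared displacement.

    Assumes an unbiased 2D random walk, MSD(tau) = 4 * D * tau, one defect per
    track, no missing frames, and more than max_lag frames. The slope of MSD
    vs. tau is fitted through the origin over lags 1..max_lag.
    """
    r = np.asarray(track_px, dtype=float) * pixel_size_nm       # (n_frames, 2) in nm
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean(np.sum((r[lag:] - r[:-lag]) ** 2, axis=1)) for lag in lags])
    tau = lags * frame_time_s
    slope = np.sum(msd * tau) / np.sum(tau ** 2)                 # least squares, zero intercept
    return slope / 4.0


# Usage: a simulated walk with D = 0.25 px^2 per frame (per-axis step variance 2D).
rng = np.random.default_rng(1)
steps = rng.normal(0.0, np.sqrt(2 * 0.25), size=(20000, 2))
track = np.cumsum(steps, axis=0)
D_est = diffusion_coefficient(track, frame_time_s=1.0, pixel_size_nm=1.0)
```

Tracks must first be linked across frames, for example by the projection or density-based clustering mentioned above; with calibrated `frame_time_s` and `pixel_size_nm` the result is in nm²/s.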
We first employ a deep-learning-based "atom finder" 68 that extracts the positions of atoms from thousands of noisy images of defects in a matter of seconds (note that S atoms cannot be reliably identified at the current experimental resolution, and hence we omit them). The first two PCA components associated with displacements of the central Mo atom and its six W neighbor atoms from the averaged structure are plotted for each defect class in Fig. 5a, b. Since the Mo dopant does not considerably distort the WS2 lattice, 63 the structural variations revealed by the PCA must be associated with disorder in the S sublattice. In general, one must exercise caution in assigning a specific physical meaning to PCA components. However, the results shown in Fig. 5 strongly suggest the presence of large variations in the position of the central Mo atom relative to the neighboring W atoms; it is therefore possible that these variations originate from S vacancies next to the Mo dopant.
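The displacement PCA described above can be illustrated with a plain NumPy SVD, assuming atom positions (central Mo plus six W neighbors) have already been found for every defect and are listed in a consistent order. The function name `displacement_pca` and the input layout are assumptions of this sketch, not the authors' code.

```python
import numpy as np


def displacement_pca(coords, n_components=2):
    """PCA of atomic displacements from the class-averaged defect structure.

    coords: (n_defects, n_atoms, 2) positions, e.g. central Mo + six W
    neighbours, each defect in its own frame with atoms listed in a
    consistent order (an assumption of this sketch).
    Returns scores (n_defects, n_components), components
    (n_components, n_atoms, 2) and explained-variance ratios.
    """
    coords = np.asarray(coords, dtype=float)
    disp = coords - coords.mean(axis=0)          # displacements from mean structure
    X = disp.reshape(len(coords), -1)            # columns already have zero mean
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    var = S ** 2 / (len(X) - 1)
    scores = U[:, :n_components] * S[:n_components]
    components = Vt[:n_components].reshape(n_components, *coords.shape[1:])
    return scores, components, var[:n_components] / var.sum()


# Usage: ideal hexagon (Mo at origin, six W at unit distance) in which
# only the central atom wobbles along x, plus small positional noise.
rng = np.random.default_rng(2)
angles = np.arange(6) * np.pi / 3
ideal = np.vstack([[0.0, 0.0], np.column_stack([np.cos(angles), np.sin(angles)])])
coords = np.repeat(ideal[None], 200, axis=0) + rng.normal(0, 0.001, (200, 7, 2))
coords[:, 0, 0] += rng.normal(0, 0.1, 200)
scores, comps, ratio = displacement_pca(coords)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives a scatter of the kind summarized in Fig. 5a, b, while `comps` shows which atoms move in each mode; here the first component is dominated by the x displacement of the central atom.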
Based on the PCA analysis of the atomic displacements in Fig. 5a, b and general lattice symmetry considerations, we use GMM to split the defect structures from classes 1 and 3 into four subclasses (Fig. 6b), associated with an undistorted Mo_W defect (no coupling to an S vacancy) and three (Mo_W + V_S) complexes (a similar result can be achieved by splitting the entire stack of defect images into >12 classes; see Supplementary Figure 2). Our next goal is to analyze the switching behavior between the different states. Using the same approach as described for the analysis of diffusion parameters, we first identified continuous-in-time defect trajectories for all four subclasses from Fig. 6b, isolated them, and then converted them into the r(t) 1-d representation (Fig. 6a). In this case, each "flow" represents a time sequence of defect structures switching between four different states. This observation suggests that the switching between states can be analyzed as a Markov process, defining the corresponding reaction constants at the single-defect level.
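Treating each continuous-in-time trajectory as a sequence of subclass labels, a maximum-likelihood estimate of a first-order Markov transition matrix is the row-normalized count of observed frame-to-frame transitions. The sketch below assumes integer labels 0 to n_states - 1 (for instance, 0 for the isolated Mo_W and 1 to 3 for the three (Mo_W + V_S) variants); the function name is hypothetical.

```python
import numpy as np


def transition_matrix(sequences, n_states):
    """Row-stochastic matrix T[i, j] = P(state j at t+1 | state i at t).

    sequences: iterable of integer state sequences, one per defect
    trajectory. Transitions are counted only within a trajectory,
    never across the boundary between two trajectories.
    """
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        seq = np.asarray(seq, dtype=int)
        np.add.at(counts, (seq[:-1], seq[1:]), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Rows for states that are never left are kept as zeros instead of dividing by 0.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


# Usage: two short trajectories over three hypothetical states.
T = transition_matrix([[0, 0, 1, 0], [1, 1, 2]], n_states=3)
```

Counting per trajectory avoids spurious transitions between unrelated defects that happen to be adjacent in a concatenated list.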
The corresponding Markov transition matrix is depicted in Fig. 6d (see also the schematic of transitions in Fig. 6c). This analysis suggests that the Mo_W defect may couple to an S vacancy in the dynamic STEM experiment. To explain transitions between Mo_W and (Mo_W + V_S), we argue that, owing to the lower diffusion barrier of an S vacancy, as well as the higher probability of S sublattice atoms being knocked out during e-beam irradiation, an S vacancy created in the vicinity of a Mo dopant is likely to be captured by it. Interestingly, we also found transitions between different (Mo_W + V_S) structures. While a detailed explanation of such behavior would require rigorous first-principles calculations and additional experiments, one can argue that the (Mo_W + V_S + V_S) structures are
In summary, we have presented a deep-learning-based approach for analyzing dynamic transformations of the lattice structure in STEM "movies" of Mo-doped WS2. We started by teaching a deep neural network how defects that break lattice periodicity appear in STEM data, using a single experimental image (frame 0), and then used the generalization abilities of the network to find various types of atomic defects in the rest of the experimental data. We then performed unsupervised classification of the detected defect structures using a Gaussian mixture model and showed that the classification results can be linked to specific physical structures. We were then able to (i) identify dominant point defects and their characteristic statistical behaviors in spatiotemporal diagrams, (ii) analyze diffusion for a selected defect species (S vacancy), and (iii) study transformation pathways for Mo-S complexes, including detailed transition probabilities. In this manner, point-defect dynamics and solid-state reactions in the material are studied at the atomic level, and the corresponding reaction constants are determined at the level of a single point defect.
As far as future studies are concerned, we believe that one particularly promising direction is incorporating specific physics-based constraints into machine-learning-based analysis of STEM videos. Indeed, current approaches treat observed lattice defects as collections of pixels, without "understanding" the physics behind the observations. One possible way of overcoming such physics-agnostic classification is to integrate a Markov model into the initial search and identification/classification scheme. The Markov model can be guided by theoretical calculations of interaction potentials at the atomic level, enforcing physical constraints on the transition probabilities of atoms and molecules and on the effects of electron beam irradiation, and operating in both the spatial (hidden Markov random field) and temporal (hidden Markov model) domains. For example, one may incorporate transition probabilities between certain types of defects (e.g., reconstructed vs. nonreconstructed defects), as well as a maximum diffusion length of a defect over a given time scale calculated from first principles, and use them within a Markov model to refine the results of the initial classification. This would be an important step towards creating a fully autonomous AI microscope that can make decisions based on the knowledge of physics it was "taught".

Sample preparation
The Mo-doped WS2 monolayers were grown on SiO2/Si substrates at 800 °C by low-pressure chemical vapor deposition. 43 To prepare STEM samples, poly(methyl methacrylate) (PMMA, A4) was first spun onto the SiO2/Si substrate with monolayer crystals at 3500 rpm for 60 s. After being cured at 100 °C for 15 min, the PMMA/W1−xMoxS2 sample was detached from the substrate with a 30% KOH solution (100 °C, 0.5−1.0 h). The sample was then transferred to deionized water to remove the KOH residue. The washed film was scooped onto a QUANTIFOIL TEM grid. The PMMA was then removed with acetone, and the samples were soaked in methanol for 12 h to achieve a clean surface with flakes. To remove residual polymer, the TEM grids were then annealed in an Ar flow (90 sccm, 10 torr) at 350 °C for 3 h.

STEM experiment
STEM imaging was performed using a Nion UltraSTEM U100 operated at 100 kV. The images were acquired in high-angle annular dark-field (HAADF) imaging mode and were fed into the deep cNN without any post-processing.

Data analysis
The deep cNN was implemented using the Keras 2.0 (https://keras.io) Python deep-learning library with the TensorFlow backend. The cNN had an encoder-decoder type of structure. The encoder part consisted of alternating convolutional layers for feature extraction (3 × 3 filters, stride 1, rectified linear unit activation) and max-pooling layers (2 × 2, stride 2) that provide translational invariance and reduce the size of the processed data. The decoder part of the network, whose role was to map the encoded low-resolution feature maps to full input-resolution feature maps, consisted of the same filters (in reverse order) and upsampling layers. The feature maps from the final convolutional layer of the network were fed into a softmax classifier for pixel-wise classification, providing the probability of each pixel being a defect. The Adam optimizer 69 was used for training. The Gaussian mixture model was implemented with the scikit-learn machine learning library (http://scikit-learn.org).
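The Keras encoder-decoder itself is not reproduced here. The NumPy/SciPy sketch below only illustrates how a pixel-wise softmax output of shape (H, W, n_classes) can be turned into defect probabilities and defect coordinates. It assumes that class index 1 is "defect", that a probability threshold of 0.5 separates defect pixels, and that each connected region is one defect located at its probability-weighted centroid; all three are our assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def pixelwise_softmax(logits):
    """Softmax over the last (class) axis of an (H, W, n_classes) map."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def defect_centers(prob_defect, threshold=0.5):
    """Centroids (row, col) of connected regions with P(defect) > threshold."""
    labels, n = ndimage.label(prob_defect > threshold)
    if n == 0:
        return np.empty((0, 2))
    # Centroids are weighted by the defect probability inside each region.
    centers = ndimage.center_of_mass(prob_defect, labels, range(1, n + 1))
    return np.array(centers)


# Usage: a toy 10 x 10 two-class output with a 2 x 2 defect and a 1-pixel defect.
logits = np.zeros((10, 10, 2))
logits[..., 0] = 2.0                 # background class score
logits[2:4, 2:4, 1] = 5.0            # defect class score, region 1
logits[7, 7, 1] = 5.0                # defect class score, region 2
probs = pixelwise_softmax(logits)
centers = defect_centers(probs[..., 1])
```

Coordinates obtained this way are the kind of input needed for cropping the defect windows used in the GMM classification.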

DATA AVAILABILITY
The complete workflow for studying defects in dynamic STEM data, which includes creation and training/testing of the DL model, unsupervised defect classification, analysis of diffusion characteristics, local crystallography analysis, and Markov transition matrix analysis, is available in the form of Jupyter notebooks in the Supplementary Material and at https://github.com/artemmaksov/ORNL-DeepLearningForAtomicScaleDefectTracking.