## Introduction

In the past decade, the widespread availability of aberration-corrected annular dark-field scanning transmission electron microscopy (ADF-STEM), which offers reliable atomic-scale imaging of materials, has enormously benefited many fields ranging from nanocatalysts and batteries to electronic and structural materials. Using advanced aberration-corrected ADF-STEM, direct acquisition of real-space images with 50-pm resolution can be achieved at high acceleration voltages (300 keV)1,2. Recently, Muller and coauthors demonstrated that by combining a ptychography technique with a highly sensitive pixelated detector, the resolution envelope can be extended to 39 pm even at low acceleration voltages (80 keV), a condition that can greatly reduce electron beam damage to low-atomic-number materials while retaining ultrahigh resolution3. However, acquiring and maintaining these high-resolution instruments incurs high costs, and to date recording high-quality atomic-scale data is still a time-consuming process. High-quality STEM images are not always available because of many environmental factors, such as scan jitter, temperature fluctuations, stray electromagnetic fields, sample charging and drift. In non-ideal ADF-STEM images that are contaminated by noise and distortions, the atomic arrangement might still be recognizable by experienced electron microscopists, but some low-contrast atomic details might not be easily detectable by inexperienced operators. Therefore, it is highly desirable to develop a robust method to detect and localize atoms/atomic columns and restore the atomic-scale information in non-ideal ADF-STEM images. Such a method, if available, can greatly reduce misinterpretation, bias, and human error. It will not only be a valuable tool for student researchers and materials scientists who use ADF-STEM but can also assist experienced electron microscopists in the automated analysis of large datasets.

Atomic column localization and segmentation in atomic-resolution scanning TEM images with high precision and high robustness is non-trivial. Although several algorithms, including graph methods4, clustering methods5,6,7,8,9, threshold methods10 and edge detection methods11, can achieve reasonable performance in pre-defined scenarios, they tend to fall short when the noise is strong and the interference is unpredictable. In particular, for atomic-scale scanning TEM images, to date there is no established algorithm that is sufficiently robust to detect all atomic features when there is a large thickness change in an image. For instance, without human supervision, it is non-trivial to localize the dimmer atomic columns on or near the edge/surface of a particle because of their lower contrast and intensities. Herein, we report the development of a training library and a deep learning method that can perform robust and precise atom segmentation, localization, denoising, and deblurring/super-resolution processing of experimental images. Taking a step further, we have deployed our models to a desktop app with a graphical user interface. The app is free and open-source, and it is available for download on GitHub12. We have also built a TEM ImageNet project website for searching, browsing, and downloading the training images and labels13.

With the availability of affordable high-bandwidth computing hardware, deep learning, or deep convolutional neural networks (CNNs) that use multilayer artificial neural networks to achieve human-competitive or superhuman accuracy, has gained great traction in both the research and commercial application domains14,15,16,25,26. Deep learning is now considered the “Holy Grail” of computer vision, and deep learning models are increasingly being deployed to application areas that utilize object detection, recognition and classification17,18. Even though most of the theoretical frameworks for deep learning were developed by the 1990s, the deep learning field did not witness a breakthrough or a surge in results until 201119,20. What has really changed the field in the past 5–7 years is the availability of massive labeled data sets, GPU computing, and investments from the IT industry to create open software frameworks for deep learning21. The strong suit of deep CNNs is that, given enough training data, they can localize and classify features and patterns in images with high accuracy, precision, and robustness. Therefore, they are well poised for the study of the ADF-STEM images of interest in this article. For example, Ziatdinov et al. developed a “weakly supervised” approach and combined it with deep learning to achieve chemical identification and tracking of local transformations in atomic-resolution images of graphene22. LeBeau et al. used AlexNet, a version of deep CNNs primarily used for classification, to preprocess SrTiO3 convergent beam electron diffraction patterns and determine crystal thicknesses23. Huang et al. used CNNs to locate defects and extract strain fields in 2d materials24. Xin et al. used generative adversarial models to inpaint and restore the missing-wedge information in electron tomography datasets25,26. Even though promising, the deployment of deep learning in the STEM imaging field has been somewhat slow compared with other fields.
The stagnation is partly due to the lack of a sufficiently labeled database for training in which all categories of materials, such as crystalline, amorphous and 2d materials, are considered. Depending on the application, images from all resolvable crystallographic orientations also need to be included in the dataset. The magnitude of the library makes it impossible to collect all images experimentally and label all atomic columns by hand. In addition, training with human-labeled data is not always desirable because the model’s precision and accuracy are ultimately limited by the human error rate.

## Methods

In this section, we describe in detail the method used for generating the training data set and then we present the neural network structure and the training strategy.

### Training datasets

Recording atomic-resolution ADF-STEM datasets for a large library of crystal structures with known ground-truth labels is an extremely time-consuming project. Even if high-resolution images of satisfactory quality could be obtained regardless of cost, it is not a trivial task to define the atomic column labels with high precision and accuracy. To mitigate this problem, we created a forward model that can simulate experiment-like ADF-STEM images of different atomic structures from different crystallographic orientations with realistic noise models. In this way, the ground truth atomic positions are pre-defined. It is also time-efficient to create an experiment-like ADF-STEM image set that comprises a large number of spatial symmetries, atomic arrangements, zone axes, noise levels and random backgrounds, which can greatly improve the robustness of our models.

#### Forward model

In this study, we used a simple linear imaging model, which simulates ADF-STEM images by convolving the projected atomic potential of a material with the point spread function (PSF) of a scanning transmission electron microscope. Here, we use only the simplified version of the linear imaging model, which disregards the three-dimensional shape of the point spread function because, other than reducing contrast, it has a very subtle effect on atomic-resolution images:

$$\begin{aligned} I\left( {x,y} \right) & = \iint {\sigma \left( {x^{\prime},y^{\prime}} \right)\left| {{\Psi }\left( {x - x^{\prime},y - y^{\prime}} \right)} \right|^{2} dx^{\prime}dy^{\prime}} \\ & = \sigma \otimes {\text{PSF}} \\ \end{aligned}$$

and

$$PSF\left(x,y;df\right)=\frac{4{\pi }^{2}}{{k}^{2}}\left|\int H\left({\varvec{k}}\right){\mathrm{exp}\left[-i\chi \left({\varvec{k}};df\right)-2\pi i{\varvec{k}}\cdot {\varvec{r}}\right]d}^{2}{\varvec{k}}\right|$$

Here, we opt out of using full quantum mechanical methods, such as Bloch-wave and multislice simulations, to simulate images because the simple linear imaging model we employ is computationally much more affordable. (For STEM simulation, calculating an $$N\times N$$-pixel image requires $$N\times N$$ multislice simulations, whereas the computational complexity of our method is equal to that of a single multislice simulation of a very thin sample.) Using the simple linear imaging model, we can render ten thousand 256-by-256 images within minutes, whereas even with GPU acceleration it would still take days for the multislice simulation to compute them. For creating a static library, multislice simulation has its merits as it captures most of the scattering physics. However, for on-the-fly local training, the simple linear imaging model is more desirable because of its speed.
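The linear imaging model described above can be illustrated with a minimal NumPy sketch. It is not the implementation used to build the library: the PSF here is a normalized Gaussian stand-in rather than the aberration-function PSF given above, the projected potential is reduced to weighted delta peaks on integer pixel positions, and the names `gaussian_psf` and `linear_image` as well as all numerical values are hypothetical.

```python
import numpy as np

def gaussian_psf(size, fwhm_px):
    """Normalized 2D Gaussian used as a stand-in PSF (an assumption of this sketch)."""
    sigma = fwhm_px / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()

def linear_image(positions, weights, size, fwhm_px):
    """Render I = sigma (x) PSF with an FFT-based circular convolution."""
    potential = np.zeros((size, size))
    for (row, col), w in zip(positions, weights):
        potential[row, col] += w  # delta-like projected potential of one column
    psf = gaussian_psf(size, fwhm_px)
    # ifftshift moves the PSF centre to index (0, 0) so the image is not shifted
    spectrum = np.fft.fft2(potential) * np.fft.fft2(np.fft.ifftshift(psf))
    return np.fft.ifft2(spectrum).real

# Three columns; the third has half the scattering weight (illustrative values)
positions = [(32, 32), (32, 48), (48, 40)]
weights = [1.0, 1.0, 0.5]
image = linear_image(positions, weights, size=64, fwhm_px=4.0)
```

Because the whole image comes from one pair of FFTs, the cost does not grow with the number of probe positions, which is the reason given above for preferring this model for on-the-fly training.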

In addition, it has been shown that the apparent atomic column positions in ADF-STEM images may not always correspond to the actual atomic positions27. This type of quantum phenomenon depends heavily on crystal structure, thickness and orientation. Quantum mechanical simulation also offers quantitatively correct column contrast in the simulated images, but this is a subtlety that can be compensated for by other adjustments of the training sets. (The column contrast can be adjusted in our training images by changing the PSF and the background level.) Because our models only aim at reporting the apparent positions of the atomic columns, a simple linear imaging model is sufficient.

From the view of generating static libraries for the community, we have created two versions of the same library with one simulated by the simple linear imaging model28 and one with the multislice simulation29.

The second part of the forward model is the simulation of realistic noise in the ADF-STEM images. The primary sources of noise in ADF-STEM are the shot/Poisson noise (also known as counting noise) and the scan noise, which we describe in detail as follows.

#### Poisson noise

For a given pixel, the expected number of incoming electrons is calculated by $$n={t}_{dwell}\times I/e$$. The number of counted electrons $$k$$ in this pixel follows the Poisson distribution, $$P\left(k\right)={e}^{-n}{n}^{k}/k!$$. (Because the photomultiplier has extremely high quantum efficiency, we ignore the propagation of additional noise.)
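The counting-noise step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the library code: it assumes that `current_map` holds the detected current per pixel in amperes, and the function name `add_poisson_noise`, the 1 pA current and the 10 µs dwell time are hypothetical values chosen only for the example.

```python
import numpy as np

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs

def add_poisson_noise(current_map, dwell_time, rng):
    """Draw per-pixel electron counts from a Poisson law with mean n = t_dwell * I / e."""
    expected = dwell_time * current_map / E_CHARGE
    return rng.poisson(expected)

rng = np.random.default_rng(0)
current_map = np.full((128, 128), 1.0e-12)  # 1 pA on every pixel (illustrative)
counts = add_poisson_noise(current_map, dwell_time=1.0e-5, rng=rng)
```

With these values the expected count is about 62 electrons per pixel, so the relative fluctuation is roughly 1/sqrt(62), or about 13%.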

#### Scan noise

Random or periodic electromagnetic fields or circuit-level interference can cause the beam to deviate from the expected scanning position; therefore, the effect of the scan noise is a geometrical transformation of the ideal image. We denote the deviation vector by $${{\varvec{\delta}}}_{i,j}=({\delta }_{x}^{i,j},{\delta }_{y}^{i,j})$$, where i is the row number and j is the column number. We define the horizontal direction as the fast scanning direction and the vertical direction as the slow scanning direction. Here, for simplicity, we assume that the beam deviation vector does not change while the beam scans through a horizontal row, i.e. the deviation vector $${{\varvec{\delta}}}_{i,j}={{\varvec{\delta}}}_{i}$$, and $${\delta }_{x}^{i}$$ and $${\delta }_{y}^{i}$$ both follow the same normal distribution modulated by the periodic line frequency, i.e.

$$\delta_{x}^{i} = f(i|\mathrm{Normal}\left( {\mu = 0,\sigma = 1} \right)) \times \sigma_{jitter} {\text{sin}}\left( {2\pi ft} \right)$$
$$\delta_{y}^{i} = f(i|\mathrm{Normal}\left( {\mu = 0,\sigma = 1} \right)) \times \sigma_{jitter} {\text{sin}}\left( {2\pi ft + \phi_{0} } \right)$$

So, the final image is a transformation of the ideal image $${I}_{0}$$ by

$$I\left(i,j\right)={I}_{0}(i-{\delta }_{x}^{i},j-{\delta }_{y}^{i})$$

Some example images of how the noise model affects the images are shown in Fig. 1.
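The row-wise scan-noise transformation can be sketched as follows. This is a hedged illustration rather than the code behind Fig. 1: it assumes that the x deviation acts along the fast (column) direction and the y deviation along the slow (row) direction, that row i starts at time i × `line_time`, and that linear interpolation is adequate for resampling. The function name `apply_scan_noise` and all parameter values are hypothetical.

```python
import numpy as np

def apply_scan_noise(ideal, sigma_jitter, line_freq, line_time, phase, rng):
    """Shift every scan row by a random offset modulated by the line frequency."""
    n_rows, n_cols = ideal.shape
    rows = np.arange(n_rows)
    t = rows * line_time  # start time of each row (assumption of this sketch)
    dx = rng.standard_normal(n_rows) * sigma_jitter * np.sin(2 * np.pi * line_freq * t)
    dy = rng.standard_normal(n_rows) * sigma_jitter * np.sin(2 * np.pi * line_freq * t + phase)

    cols = np.arange(n_cols, dtype=float)
    out = np.empty_like(ideal, dtype=float)
    for i in rows:
        # Fractional source row, clipped to the image and blended linearly
        src = np.clip(i - dy[i], 0, n_rows - 1)
        lo = int(np.floor(src))
        hi = min(lo + 1, n_rows - 1)
        frac = src - lo
        line = (1 - frac) * ideal[lo] + frac * ideal[hi]
        # Fractional source columns; np.interp clamps at the row ends
        out[i] = np.interp(cols - dx[i], cols, line)
    return out, dx, dy

rng = np.random.default_rng(1)
ideal = np.zeros((64, 64))
ideal[::8, ::8] = 1.0  # toy lattice of bright pixels
noisy, dx, dy = apply_scan_noise(ideal, sigma_jitter=1.5, line_freq=60.0,
                                 line_time=2.0e-3, phase=np.pi / 2, rng=rng)
```

Setting `sigma_jitter` to zero returns the ideal image unchanged, which is a convenient sanity check before adding the counting noise.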

#### Library composition and augmentation

To construct a training library with a variety of spatial symmetries, column contrast, and thickness effects, we have included images of the bulk structures of the following materials and orientations: Pt [001], Pt [110], NiO [001], NiO [110], SrTiO3 [001] and [110], DyScO3 [110], Si [110], graphene, amorphous graphene, single-layer MoS2, rutile TiO2 [001], [100] and [110], (Li)CoO2 [010]. We have also included images of the faceted Pt nanocrystal to increase the robustness of finding atoms at boundaries and edges of nanoparticles and interfaces.

To enable robust and scale-free training, we have included the randomized operations listed in Table 1 in the simulation of the training images; some example images are shown in Fig. 2.

#### Ground truth labels

We have trained our model to perform atom segmentation, atomic-column Gaussian mapping, intensity-preserving super-resolution (deblur) processing, denoising and background removal. Their respective ground truth labels are shown in Fig. 3 and Table 2. The width of the circular mask is defined by the full width at half maximum of the point spread function and the width of the Gaussian mask is 0.2 angstrom.

### Network structure

For atomic column segmentation and super-resolution/deblur processing, we deployed an encoder–decoder type CNN with a U-net architecture. It has been shown that U-net can work with very few training images and yields precise segmentations for cell tracking tasks30. One important feature of U-net is that it concatenates high-resolution feature channels, which come directly from the encoding layers, with the decoding layers to preserve high-resolution context information.

In our model, the contracting path (left side in Fig. 4) consists of the repeated application of two 3 × 3 convolutions, a rectified linear unit (ReLU) and a 2 × 2 max pooling operation with stride 2 for downsampling; the expansive path (right side in Fig. 4) consists of an up-convolution, a concatenation with the corresponding feature map from the contracting path, two 3 × 3 convolutions and a ReLU.

#### Loss function and training strategy

In our test trainings, we found that the mean squared error (MSE) loss function tends to increase the false positive rate because the ground truth labels cover only a small fraction of the total image area. Therefore, we use a modified chi-square function:

$${\chi }_{mod}^{2}=\sum_{i,j} \frac{{\left({I}_{i,j}-{I}_{i,j}^{\text{ground truth}}\right)}^{2}}{{I}_{i,j}^{\text{ground truth}}+\mathrm{max}\left({I}^{\text{ground truth}}\right)/10}$$

This loss function penalizes false atoms in the background area.
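A small NumPy sketch makes the weighting explicit. It is written for illustration only: in training the same expression would be coded in the deep-learning framework in use, and taking the maximum over a single label image (rather than over a batch) is an assumption of this sketch. The function name `modified_chi_square` is hypothetical.

```python
import numpy as np

def modified_chi_square(pred, truth):
    """Sum of squared errors weighted by 1 / (truth + max(truth) / 10)."""
    denom = truth + truth.max() / 10.0
    return np.sum((pred - truth) ** 2 / denom)

# Toy label with one "atom" pixel of value 1 on an empty background
truth = np.zeros((8, 8))
truth[4, 4] = 1.0

bg_error = truth.copy()
bg_error[0, 0] += 0.1      # spurious intensity in the background
atom_error = truth.copy()
atom_error[4, 4] -= 0.1    # same-sized error on the atom itself

loss_bg = modified_chi_square(bg_error, truth)      # about 0.1
loss_atom = modified_chi_square(atom_error, truth)  # about 0.009
```

The same 0.1 error costs roughly eleven times more in the background than on the atom, which is how the denominator discourages false atoms.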

The dataset is randomly split into a training set and a testing set, with 75% of the data assigned to the training set. The final average training loss after 200 epochs is 0.0174 and the final average testing error is 0.0195. The batch size is 4.

#### Atom localization

Otsu’s method is implemented to binarize the atomic features from the map generated by our models31. After binarization, each disconnected area is considered an atomic column. The column positions are localized by calculating the geometric centers of the disconnected areas. This Otsu’s localization method performs best when coupled with outputs from models that were trained on the Circular Mask and Gaussian Mask ground truth labels.
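The binarize-then-centroid procedure can be sketched in plain NumPy. This is an illustrative re-implementation, not the code shipped in the app: the histogram-based threshold, the 4-connected flood fill and the function names `otsu_threshold`, `label_centroids` and `localize_columns` are choices made for this sketch, and the two-blob test map is synthetic.

```python
from collections import deque
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the bin centre that maximizes the between-class variance."""
    hist, edges = np.histogram(image, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)            # pixel count in the lower class
    w1 = w0[-1] - w0                # pixel count in the upper class
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)     # lower-class mean
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)  # upper-class mean
    return centers[np.argmax(w0 * w1 * (m0 - m1) ** 2)]

def label_centroids(mask):
    """Geometric centre (row, col) of every 4-connected region of a boolean mask."""
    visited = np.zeros(mask.shape, dtype=bool)
    n_rows, n_cols = mask.shape
    centroids = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if visited[r0, c0]:
            continue
        queue, pixels = deque([(r0, c0)]), []
        visited[r0, c0] = True
        while queue:  # breadth-first flood fill of one region
            r, c = queue.popleft()
            pixels.append((r, c))
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= rr < n_rows and 0 <= cc < n_cols
                        and mask[rr, cc] and not visited[rr, cc]):
                    visited[rr, cc] = True
                    queue.append((rr, cc))
        centroids.append(tuple(np.mean(pixels, axis=0)))
    return centroids

def localize_columns(prob_map):
    """Binarize a network output map and return one centroid per region."""
    return label_centroids(prob_map > otsu_threshold(prob_map))

# Synthetic Gaussian map with two columns at (8, 8) and (20, 22)
yy, xx = np.mgrid[0:32, 0:32]
prob_map = sum(np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * 1.5 ** 2))
               for r, c in [(8, 8), (20, 22)])
columns = localize_columns(prob_map)
```

The centroid is taken over the binary region only, so the approach needs no per-image fitting parameters, at the price of relying on the network output being well separated into isolated blobs.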

#### Benchmark methods

We use transfer learning to customize a pre-trained Faster R-CNN network32 for direct atom detection. We have also implemented a two-dimensional (2d) Gaussian fit to determine atomic column positions. The 2d Gaussian fit is considered the gold-standard method in the transmission electron microscopy (TEM) field for atomic column localization33. These two methods are used as baselines to benchmark the precision of the Otsu’s atom localizer described above.

## Results and discussion

### Validation of the AtomSegNet models using TEMImageNet

To validate our AtomSegNet models, we have performed visual inspection of the performance of various trained models using data from the validation set. Figure 5 shows validation results of the models that were trained to perform super-resolution/deblurring, atom circular segmentation, atomic-column Gaussian mapping, denoising and denoising + background removal using the following labels, respectively: noNoiseNoBackgroundSuperresolution, circularMask, gaussianMask, noNoise, noBackgrounnoNoise.

By applying our models, the centers of the atomic columns can be clearly identified by the following three networks: super-resolution/deblur processing, atom circular segmentation, and atomic-column Gaussian mapping. The difference between super-resolution/deblur processing and atomic-column Gaussian mapping is that the deblurred/super-resolved map preserves the column intensity, whereas the atomic-column Gaussian map equalizes the intensity of all atomic columns. However, all three methods can accurately recognize the atomic columns even in conditions with strong background and noise contamination. Atom localization was performed on the Gaussian maps using Otsu’s method. Its precision, benchmarked against the baseline methods, Faster R-CNN and 2d Gaussian fit, is discussed in the next sub-section.

### Precision of Otsu’s atom localizer

To understand the precision of the different localization methods, we compare the Otsu localization method with the 2d Gaussian fit method and Faster R-CNN. The 2d Gaussian fit is considered the gold-standard method in the TEM field for atomic column localization, and Faster R-CNN is a deep learning-based method used for direct object detection in 2d images32. We have used transfer learning to customize a pre-trained Faster R-CNN to our TEM ImageNet datasets.

To benchmark the various methods, we have chosen a simulated image with a moderately low peak signal-to-noise ratio (PSNR = 20.4 dB or 10 in linear scale) for which the ground-truth atomic column positions are known. Figure 6a shows the simulated image of a Pt nanoparticle projected along the [110] direction. Figure 6b shows the atomic-column Gaussian map with the Otsu’s atom localization results overlaid in the bottom half of the image. Figure 6c shows the Faster R-CNN predicted atomic positions as green boxes. Figure 6d,e show the histograms of the deviations of the extracted atomic column positions from the ground truth labels in the x and y directions, respectively. It is easy to see that Faster R-CNN is the least precise (largest $${\sigma }_{x}$$ and $${\sigma }_{y}$$). This is because the region proposal network in Faster R-CNN is designed to be fast rather than precise. Our Otsu’s atom localization method coupled with our atomic-column Gaussian map network gives a result that can even outperform the 2d Gaussian fit. The reason that our deep learning method outperforms the traditional 2d Gaussian fit is likely the following: the 2d Gaussian fit is based on least-squares minimization, which assumes that the noise follows a normal distribution, but the dominant noise in the image follows a Poisson distribution. Our neural networks, on the other hand, have learned how to correctly handle the Poisson and scan noise as well as the priors about the [110]-projected fcc structures. Because the Faster R-CNN method is not sufficiently precise, we only included the Otsu’s localization method in the open-source AtomSegNet App12.

### Validation of the models on images simulated by the multislice method

We have tested the robustness of our models by applying them to images simulated by the multislice method with aberration and thickness effects incorporated. As shown in Fig. 7, although we have chosen an aberration and defocus condition under which the halo effect is strong, our atomic-column Gaussian mapping model faithfully resolves the Sr and Ti atomic columns that are present in the ADF-STEM images.

### Validation of the models on experimental images

Our model performs well on our validation set, but experimental ADF-STEM images may be affected by other factors that are not considered in our training sets, for example astigmatism, insufficient resolution, an incoherent electron source, and thickness-induced low contrast. Therefore, it is critically important to validate the robustness of our model on experimental images that could contain these interferences.

### The known knowns

We first apply the AtomSegNet models to ADF-STEM images of periodic crystals, such as DyScO3 [110] (DSO) and silicon (Si) [110]. Because their structures were in the training dataset, they are considered the known knowns. Figure 8a showcases the AtomSegNet processing of a large area of the DSO and Fig. 8c shows a magnified area. Figure 8b presents the processed images of Si. For both structures, the atomic columns are detected and localized accurately, showing no false positives or false negatives, and the denoising results look qualitatively sensible. This demonstrates the robustness of our model when applied to real ADF-STEM images of crystalline materials. It is worth noting that the DSO sample was prepared with a focused ion beam, which leaves many redeposited residuals on the surfaces. The background removal model performed well in removing these unwanted low-frequency features. In addition, upon close inspection, Fig. 8c shows that the image has residual aberrations that render an asymmetric tail on the Dy columns (the brighter columns). In the denoised image, the tails are still visible, as they should be, because a denoising process should only remove noise and not alter the real content of an image. The denoise + background removal model, however, corrected the aberrated tails. One can think of it as a process that convolves an unaberrated point spread function with the deblurred/super-resolved map. This is a process that was learned through training.

Another challenging issue in the TEM field is to localize the edge/surface atoms when there is a large thickness variation in the recorded images. Observing the surface atomic structure is important for understanding the reaction and degradation mechanisms of catalysts and electrode materials, since most of the reaction takes place within the 2–3 atomic layers at the surface. For 2d materials with uniform thickness, edge atom detection is straightforward. However, in nanoparticle samples, because of the large thickness variation from the surface to the bulk interior, the surface atomic columns have lower intensity and are difficult to discern. Herein, we use noble-metal Pt and Au nanoparticles and intermetallic PtFe as examples to demonstrate the capability of our AtomSegNet models for edge/facet atom detection. It is worth noting that in an ADF-STEM image, the contrast is sensitive to the projected atomic mass of the underlying atomic columns, which is commonly referred to as Z-contrast. Therefore, the intensity of the PtFe intermetallic atomic columns is affected not only by the thickness variation but also by the large atomic mass difference between Pt and Fe. The results shown in Fig. 9a,b indicate that the surface atoms are accurately detected and segmented without ambiguity in both images. Taking a further step, we also tested the capability of edge detection where there is a large thickness variation. Figure 9c shows the surface area of an Au nanoparticle with gradually varying thickness, in which the thinner atomic columns close to the surface have lower intensities. The result shows that all the surface atomic columns are precisely detected and localized without having to choose any hyperparameters. These results suggest that our model is highly robust and capable in edge/facet atom segmentation and localization.

### The unknown knowns

We have tested the performance of the AtomSegNet models on a spinel structured material, Co3O4. We call this test an ‘unknown known’ test because this type of pattern was not included in the TEM ImageNet training dataset and the image is under-resolved which is also a condition not included in the training dataset. Figure 10a shows the original under-resolved ADF-STEM image of Co3O4, in which the adjacent Co atoms are too close to be clearly resolved. Unless the real atomic structure is known in advance, the Co atoms cannot be detected and localized precisely even with the assistance of experienced electron microscopists. Unexpectedly, with our model, the Co atomic columns are accurately recognized in the under-resolved image, corresponding well with the real structure (Fig. 10). This result demonstrates our models’ ‘superhuman’ capability to resolve structures and discover the “knowable unknowns”.

The reason that our deblurring/super-resolution processing network exhibits ‘superhuman’ capability is that it essentially learned the regularization from the training images and could perform tasks similar to and beyond those formulated in Ref.34. On top of that, it is extremely fast (a few milliseconds of processing time with a GPU) and it does not rely on parameter tuning, i.e. adjusting fitting parameters for the proper point spread function etc.

### Precision analysis on experimental images

A deep-learning model’s accuracy can only be evaluated by comparing the model’s outputs with the ground truth labels. For experimental images, however, the ground-truth atomic column positions are not available, not even for crystalline materials of a known type, because the atomic column positions in the images are affected by environmental factors such as sample charging, thermal drift and stray fields that cannot be characterized with high precision. However, we can evaluate the precision of our model by measuring the column-to-column spacing, i.e. the a and b lattice parameters shown in Fig. 11e33. It is worth noting that the precision measured here is the model’s own noise-limited precision compounded by the precision loss induced by random distortion. Since the lattice points used for extracting the a and b lattice parameters are spatially close to each other, we assume they share the same random distortions and hence the random errors can be canceled out or minimized.
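The spacing-based precision estimate can be sketched as follows. This is a simplified illustration and not the analysis script behind Fig. 11: it assumes the localized columns have already been sorted into an array indexed by their lattice coordinates, and it reports the sample standard deviation of nearest-neighbor spacings. The function name `spacing_precision`, the 20-pixel spacing and the 0.5-pixel jitter are hypothetical; only the 9.4 pm pixel size is taken from the text.

```python
import numpy as np

def spacing_precision(grid_positions, pixel_size_pm):
    """Standard deviation (in pm) of neighbouring column spacings along a and b.

    grid_positions has shape (n_a, n_b, 2) and holds the pixel coordinates of
    the column with lattice indices (i, j); building that index is assumed to
    have been done beforehand.
    """
    step_a = np.linalg.norm(np.diff(grid_positions, axis=0), axis=-1)
    step_b = np.linalg.norm(np.diff(grid_positions, axis=1), axis=-1)
    return step_a.std(ddof=1) * pixel_size_pm, step_b.std(ddof=1) * pixel_size_pm

# Synthetic square lattice with random localization errors (illustrative)
rng = np.random.default_rng(0)
ii, jj = np.mgrid[0:30, 0:30]
ideal_grid = np.stack([20.0 * ii, 20.0 * jj], axis=-1)
noisy_grid = ideal_grid + rng.normal(0.0, 0.5, ideal_grid.shape)
sigma_a, sigma_b = spacing_precision(noisy_grid, pixel_size_pm=9.4)
```

Because only differences between adjacent columns enter the statistic, any distortion shared by neighboring columns drops out, which is the cancellation argument made above.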

Figure 11ci,ii show histograms and the precision measurements for the a and b lattice parameters extracted by our deep-learning method, and Fig. 11ciii,iv show the results extracted by 2d Gaussian fit on the original ADF-STEM image of Si [110] (Fig. 11a). Our deep-learning method reports precisions of 7.14 pm in the a direction and 6.78 pm in the b direction. These are uniformly better than those of the 2d Gaussian fit method (7.57 pm and 8.25 pm in a and b, respectively). Note that the pixel size of the original ADF-STEM image is 9.4 pm and the estimated peak signal-to-noise ratio is 21 dB (12 in linear scale). Our Gaussian localization model reaches sub-pixel precision, and the performance is on par with the benchmark measurement in Fig. 6.

Another question worth asking is whether our denoise model would help improve the precision of atomic column localization. Figure 11b shows the image after processing by our denoise + background removal model. We then applied the atomic-column Gaussian map model to the denoised image and the output is shown in the inset of Fig. 11b. Figure 11d shows that in both the a and b directions, there is a slight improvement in precision for both our deep-learning method and the 2d Gaussian fit method. Again, our deep-learning based method reports higher precision than the 2d Gaussian fit method. In addition, we can use the central limit theorem, i.e. $${\sigma }_{mean}=\frac{\sigma }{\sqrt{N}} \to N={\left(\frac{\sigma }{{\sigma }_{mean}}\right)}^{2}$$, to estimate how many images would be needed to achieve 1-pm precision in column localization.

Given that $${\sigma }_{a}$$ is 6.77 pm and the desired $${\sigma }_{mean}$$ is 1 pm, N is approximately 46. This means that a series of about 46 images is needed to reach 1-pm precision. This number is on par with the number reported by Voyles et al.33.
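Written out with the values quoted above, and assuming that the random errors are independent from frame to frame, the estimate is

$$N={\left(\frac{{\sigma }_{a}}{{\sigma }_{mean}}\right)}^{2}={\left(\frac{6.77\,\text{pm}}{1\,\text{pm}}\right)}^{2}\approx 45.8$$

so the required series length is in the mid-forties; it scales quadratically, which means that halving the single-frame $${\sigma }_{a}$$ would cut the number of frames by a factor of four.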

## Discussion

In this work, we show that using forward modeling based on a linear imaging model, we can rapidly generate near-realistic atomic-resolution ADF-STEM images for training many networks with high performance and general usage, such as column mapping, deblur/super-resolution processing, denoising, etc. We performed these numerical experiments to verify the effectiveness of the simple imaging model for the following reasons. First, we created this simple forward model envisioning that we will in the future generate training libraries on-the-fly while the operator is working on the microscope. Our fast and simple forward model can inject priors into a new library much faster than the multislice simulation. Second, the models we desire to train are for column recognition and denoising. Therefore, as long as we can provide a sufficiently diverse library that has different column separations, different coordination structures, and different intensity distributions, the model will be well trained without overfitting the data. Third, although multislice simulation is much more affordable now than before, it is still a computationally intensive task. It is worth noting that, even with GPU acceleration, it still took us 2 days to simulate the entire library of images, whereas it would take only 10 min for a laptop to finish the calculation using our simple linear imaging model. Fourth, our models are scale-free models. They work like our eyes and brains: they recognize column patterns regardless of their physical separations. For the ground truth labels, if we use our simple forward model, we can directly rely on the atomic positions as the ground truth labels without having to do any refitting to the generated images. Again, from an on-the-fly deployment point of view, it is much faster and safer to adopt the simple linear imaging method we proposed in this manuscript, as it is foolproof and less prone to artifacts.
We want to emphasize that our forward model, in terms of column position, is universally correct as it is the thin-sample limit of the multislice method.

## Conclusion

Detection and localization of atomic columns and the restoration of atomic-scale information in non-ideal ADF-STEM images are highly important for characterizing the atomic structure and understanding the structure–property relationship. However, atom localization through deep learning remains challenging, partly because of the lack of a sufficiently labeled database for training, whose construction is considered an extremely time- and cost-intensive project. To solve this problem, we created a forward model that can simulate experiment-like ADF-STEM images. Using this forward model, we have created a TEM ImageNet library composed of training images of different atomic structures from different crystallographic orientations with realistic noise models. By training on this TEM ImageNet library, our deep-learning method can readily self-adapt to experimental ADF-STEM images and shows outstanding robustness in some challenging tasks such as deblurring/super-resolution processing, atom segmentation/localization and edge/surface atom detection. Our models also consistently outperform the gold-standard method, 2d Gaussian fit, in the precision of locating atomic column positions. Furthermore, we have deployed our model to a desktop app with a graphical user interface; the app is free, open-source and available for download on GitHub12. Our model will not only be a valuable tool for researchers and materials scientists but can also assist experienced electron microscopists in the automatic analysis of large datasets.