Deep convolutional neural networks to restore single-shot electron microscopy images

State-of-the-art electron microscopes such as scanning electron microscopes (SEM), scanning transmission electron microscopes (STEM) and transmission electron microscopes (TEM) have become increasingly sophisticated. However, the quality of experimental images is often hampered by stochastic and deterministic distortions arising from the instrument or its environment. These distortions can arise during any stage of the imaging process, including image acquisition, transmission, or visualization. In this paper, we discuss the main sources of distortion in TEM and S(T)EM images, develop models to describe them, and propose a method to correct these distortions using a convolutional neural network. We demonstrate the effectiveness of our approach on a variety of experimental images and show that it can significantly improve the signal-to-noise ratio, resulting in an increase in the amount of quantitative structural information that can be extracted from the image. Overall, our findings provide a powerful framework for improving the quality of electron microscopy images and advancing the field of structural analysis and quantification in materials science and biology.


INTRODUCTION
The quality of modern electron microscopes, such as scanning electron microscopes (SEM), scanning transmission electron microscopes (STEM), and transmission electron microscopes (TEM), has greatly improved. However, the quality of the experimental images produced by these instruments is often compromised by stochastic and deterministic distortions arising from the instrument or its environment [1,2,3]. These distortions can occur during the acquisition, transmission, or reproduction of the image. Despite technical improvements in the design of high-performance electron microscopes [1,2,3,4], the presence of these distortions in the recorded images may hinder the extraction of quantitative information from the samples under study [5].
In TEM, images are acquired in a single shot using parallel acquisition. Here, the main source of distortion is detector noise, which is a combination of counting noise associated with the uncertainty of photon/electron detection, dark current noise resulting from statistical variation in the number of thermally generated electrons within the detector, and readout noise resulting from the electronics that amplifies and digitizes the charge signal. Other sources of distortion for TEM include X-ray noise, which is produced by X-rays that saturate one or more nearby pixels as they pass through the detector [6,7], and dead pixel noise, which is caused by permanently damaged pixels on the sensor and often appears as black spots in the recorded images.
In S(T)EM, images are formed pixel by pixel by scanning a convergent electron beam across the sample and detecting the scattered, back-scattered, or secondary electrons at each point. The main source of distortion is detector noise, which is a combination of shot noise from electrons hitting the scintillator, Gaussian noise resulting from the photomultiplier tube (PMT) [8], and readout noise from the electronics that amplifies and digitizes the electron signals. Unlike TEM imaging, the serial nature of SEM and STEM can introduce additional distortions into the resulting images due to time delays between measurements. At high doses, the main source of nonlinear distortion is the probe's fly-back time, where data collection pauses until scanning resumes on the next line. This produces a net two-dimensional random displacement of each pixel row, known as horizontal and vertical scan distortion. These nonlinear distortions can often be corrected using iterative algorithms that require a series of images [9,10] or a single image of a high-resolution periodic structure [11,12]. Moreover, S(T)EM images obtained through high-speed scans (dwell time < 1 µs [13]) may display a non-uniform scan speed along individual scan lines, resulting in a smearing effect that produces another type of nonlinear distortion. While these distortions can be partly compensated for periodic structures [13], they cannot be fully compensated for arbitrary specimens. Other types of distortion include row-line noise, which is caused by the detector's non-response over a few pixels, and X-ray noise, which is produced by X-rays hitting the detector.

The L 1 error decreases as the number of layers n lay increases. This is expected since a deeper network can improve the performance of the model by increasing the number of parameters and allowing the model to learn more complex features. We would like to highlight that our hardware constraints only allow us to use a maximum of 9 layers for n lay. Nonetheless, we observed that the L 1 error starts to reach a plateau for n lay = 9, indicating that increasing the number of layers further may not lead to substantial performance improvements.
Furthermore, we compared the performance of three different image denoising architectures: the Grouped Residual Dense Network (GRDN) [23], the Multi-resolution U-Net (MR-UNET) [31], and our proposed architecture, CGRDN. We assessed the performance of these architectures using the well-known peak signal-to-noise ratio (PSNR), which is defined as

PSNR = 10 log10(MAX^2 / MSE),

where MAX denotes the maximum possible pixel value of the images and MSE represents the mean squared error between the distorted and undistorted images. However, it is important to note that PSNR only measures the pixel-wise differences between the original and reconstructed images and does not account for other crucial factors such as visual perception and structural similarity. The GRDN architecture was previously ranked first in terms of PSNR and structural similarity index in the NTIRE2019 Image Denoising Challenge. The MR-UNET extends the functionality of the decoder in a U-Net [36] by adding additional convolutional layers to the hidden layers in order to produce coarse outputs that match low-frequency components. The results of our comparison are summarized in Table 1, which lists the number of parameters and the resulting PSNR for each architecture. They show that the GRDN and CGRDN are more efficient architectures, requiring approximately 7 times fewer parameters than the MR-UNET while still achieving a higher PSNR. It is interesting to note that our CGRDN architecture achieved a higher PSNR than the GRDN while requiring only an additional 20,000 parameters.
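The PSNR metric defined above can be sketched in a few lines of NumPy; the value of MAX and the toy images below are illustrative:

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB: PSNR = 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(restored, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# toy check: a constant offset of 16 grey levels on an 8-bit image
a = np.zeros((8, 8))
b = np.full((8, 8), 16.0)
print(round(psnr(a, b, max_val=255), 2))  # ≈ 24.05 dB
```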
Table 1. PSNR denoising performance comparison of different network architectures.

Method | # parameters | PSNR
MR-UNET [31] | 51.7M | 36.70 dB
GRDN [23] | 7.02M | 36.90 dB
CGRDN (this work) | 7.04M | 36.96 dB

We also compared the performance of our image restoration network to the Block-matching and 3D filtering (BM3D) [18] algorithm in terms of PSNR. BM3D is a widely used denoising technique: it segments the image into overlapping blocks and identifies similar patterns among them to estimate the original image and reduce noise. BM3D has demonstrated effectiveness in denoising images with high levels of noise and serves as a benchmark for image denoising algorithms in image processing. The average PSNR of BM3D and our network on the validation dataset was 30.45 dB and 36.96 dB, respectively. These results demonstrate that our network outperforms BM3D by a significant margin of 6.51 dB. Figure 2 illustrates the performance of our network and BM3D on two randomly generated, high-resolution STEM images with standard experimental noise values. These images were simulated using the procedure outlined in the "Data generation" section. The results demonstrate that our image restoration network significantly enhances image quality, as measured by PSNR. However, it is noteworthy that PSNR is not always a reliable indicator of image quality, since it merely measures pixel-wise differences between original and reconstructed images and overlooks other critical factors such as visual perception and structural similarity. Hence, it is crucial to employ various image quality metrics, along with PSNR, to obtain a more comprehensive evaluation of the performance of image restoration techniques.

Atomic structure quantification
While the CNN was trained to restore images from a wide variety of imaging modes, STEM is of particular interest since it is routinely used for the quantification of atomic structures [37,38,39] in terms of atomic column positions and their corresponding scattering cross sections (SCS), which allows us to study the impact of the proposed image restoration method quantitatively. The probe-position-integrated scattering cross section, SCS for short, in atomic-resolution STEM images is defined as the integrated intensity of an atomic column, which is typically modelled as a 2D Gaussian function. Since the SCS scales with atomic number as approximately Z^1.7 [40,41] and mostly increases monotonically with thickness for large collection angles, it is routinely used for atom counting. The effect of image restoration on the quantitative assessment of STEM images is evaluated in three complementary approaches, using MULTEM [42,43] to create multislice simulations and the StatSTEM software for all model fittings [39]. All evaluations are based on 100 distortion/noise realisations for each dose setting.
column positions and the mean absolute percentage error (MPE) for the SCSs of atomic columns, as well as the variance of these measurements. This serves in particular to show the independence of the approach from the structural periodicity of atomic-resolution STEM images.
3. For a simulated Pt nanoparticle, it is demonstrated that distortion correction yields not only a more accurate localisation of atomic columns but also enables more reliable atom counting.
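The SCS measurement underlying these studies, the integrated intensity of a 2D Gaussian model of an atomic column, can be sketched as follows. This is an illustrative sketch, not the StatSTEM fitting code, and the column parameters are made up; for a Gaussian with amplitude A and widths sx, sy, the integrated intensity is 2π·A·sx·sy:

```python
import numpy as np

def gaussian_2d(xy, amp, x0, y0, sx, sy, offset):
    """2D Gaussian model of an atomic column plus a constant background."""
    x, y = xy
    return amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2)
                          + (y - y0) ** 2 / (2 * sy ** 2))) + offset

def scs_from_gaussian(amp, sx, sy):
    """Integrated intensity (SCS) of the fitted column, background excluded."""
    return 2.0 * np.pi * amp * sx * sy

# toy column: the numerical integral of a synthetic Gaussian matches the
# analytic SCS value 2*pi*100*9 ≈ 5654.87
yy, xx = np.mgrid[0:64, 0:64].astype(float)
img = gaussian_2d((xx, yy), amp=100.0, x0=32, y0=32, sx=3.0, sy=3.0, offset=0.0)
print(img.sum(), scs_from_gaussian(100.0, 3.0, 3.0))
```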
The simulation settings for all samples are tabulated in the supplementary information. The results of the first study are shown in figure 3. Examples of the underlying STEM images are given for the extremes of SNR (i.e. smallest thickness and lowest dose, and largest thickness and highest dose) for raw and restored images in panels (e), (f), (g) and (h). Comparing figures 3(e) and (f), it can be seen visually that even at a very low dose the CNN can recover the underlying structure faithfully. This effect is measurable both in terms of the precision with which atomic columns can be located and in terms of SCS measurement precision, and is particularly pronounced in the low-dose range, as illustrated in figures 3(a) and (b). As the dose increases, the precision of the structural measurements of raw and restored data eventually converge (figure 3(c-d)). An interesting observation is that the theoretical precision limit given by the CRLB can be overcome by employing image restoration. This makes a strong point for using image restoration for quantitative studies, like atom counting or strain measurements in general. The restoration results in the first example arguably benefit from the underlying perfect crystal symmetry, which is why we also test the CNN for imperfect structures. The Pt-bulk model depicted in figure 4(a) is in [112] zone axis orientation, six unit cells thick, and contains a unit edge dislocation of Burgers vector b = 1/2[110] in the (111) glide plane; a dislocation commonly observed in fcc metals [44]. The structure was created using the Atomsk software, which determines atom positions corresponding to the displacement fields predicted by the elastic theory of dislocations [45]. The simulated HAADF STEM images were subjected to varying noise levels from 5e2 e/Å² to 5e4 e/Å², and further corrupted by scan-line distortions as outlined in the "S(T)EM noise model" section. Example reconstructions for raw images at doses of 5e2 e/Å² and 5e4 e/Å² (figure 4(b) and (c))
are shown in figure 4(d) and (e), respectively. In the low-dose raw image, individual atomic columns are hardly recognisable. Without prior knowledge of the atomic column positions, any attempt at model fitting would first have to overcome the challenge of performing reliable peak finding, a factor not considered here. The reconstruction of this image (figure 4(d)), on the other hand, shows very clear peaks. A Burgers circuit is superimposed on the image to highlight that, despite the poor separation of columns in the raw image, the dislocation with its correct Burgers vector b is maintained. This means that the structure as a whole is retrieved correctly, albeit the individual column positions may not be fully accurate, as can be seen in the mean absolute position error of the columns around the centre of the dislocation (columns within the red circle in figure 4(a)) for low doses, shown in figure 4(f). However, the error drops rapidly with increasing dose and shows a clear improvement over raw images. The position accuracy is therefore not only a result of denoising but also of the accurate correction of scan-line and fast-scan distortions. The comparatively high accuracy of the raw-image fitting at low doses can be attributed to the fact that correct initial column positions are given for the fitting procedure. Since the column can hardly be located in the noisy images, the fitting algorithm on average does not move the position much away from this initial position. The CNN, on the other hand, reconstructs a clearly visible atomic column, but the available information in the underlying image is insufficient for accurate positioning. However, the proper retrieval of the dislocated atomic column at higher doses shows that the CNN is not by default just picking up on periodicity, but faithfully recovers the atomic structure also in the presence of non-periodic features in atomic-resolution STEM images. Also the SCS measurements improve in accuracy by the
restoration, which would translate directly into improvements for atom counting studies. An example of such an atom-counting scenario is presented in figure 5. These results were obtained from a simulated spherical Pt nanoparticle, 11 unit cells in diameter, in [100] zone axis orientation, under the same distortion and noise parameters as in the previous example. Atom counts were obtained by matching retrieved SCS values against simulated library values [46]. The improvement in column position measurements over all dose settings again indicates the proper correction of scan-line and fast-scan distortions. The improvement in SCS measurement accuracy, especially at low-dose conditions, greatly decreases the chance of miscounting atoms in the structure, which in turn may be very beneficial, e.g., for the reconstruction of 3D information from atom counts [47,48].
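The library-matching step used for atom counting can be sketched as a nearest-value lookup; the library values below are hypothetical, not the simulated values of [46]:

```python
import numpy as np

def count_atoms(scs_measured, scs_library):
    """Assign each measured SCS the thickness (atom count) whose simulated
    library SCS value is nearest, as in SCS-based atom counting."""
    scs_library = np.asarray(scs_library)      # index i -> SCS for i+1 atoms
    scs_measured = np.atleast_1d(scs_measured)
    idx = np.abs(scs_measured[:, None] - scs_library[None, :]).argmin(axis=1)
    return idx + 1  # atom counts

# hypothetical library: SCS grows monotonically with column thickness
library = [0.10, 0.19, 0.27, 0.34, 0.40]
print(count_atoms([0.11, 0.33, 0.41], library))  # -> [1 4 5]
```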

Experimental image restorations
One of the main advantages of our image restoration method is that the training data is generated using realistic physical models of the noise found in various microscopy modalities, as well as an appropriate range of values for the noise model parameters, as detailed in the "Methods" section. This methodology allows for the direct application of our network to experimental data, without requiring additional training for a particular specimen or microscope setting. Figure 6 illustrates the effectiveness of our approach on diverse types of random experimental microscopy images. The top row of this figure shows raw experimental images for HR-STEM, LR-STEM, HR-TEM, LR-TEM, HR-SEM, and LR-SEM; images (e) and (f) were sourced from reference [50]. The bottom row shows the corresponding restored versions of these images. These results show that the trained networks perform excellently on experimental data and can effectively handle a wide range of microscopy images with varying resolution and noise levels. It is important to note that in this study, "high resolution" refers to images with round and symmetrical features, while "low resolution" refers to images with a variety of different features. Additional examples of restored experimental images for each microscopy modality can be found in the github repository https://github.com/Ivanlh20/r_em.
The importance of using realistic physical models of the noise to generate distorted data, along with selecting the correct range of values for the noise model parameters, is demonstrated in Figure 7. This figure illustrates how these factors can impact the accuracy of the restored image. Figures 7(a) and (b) show two experimental STEM images that were acquired using an FEI Titan³ S/TEM microscope. The images were obtained using fast scanning with dwell times of 0.2 µs and 0.05 µs, respectively. The importance of accurately modelling fast scan distortion is evident from figures 7(f) and (g). In these figures, our network architecture was trained using a model that was not sufficient to completely compensate for the spread of pixel intensities along the scanning direction (see Equation 48 in the "S(T)EM noise model" section). As the dwell time decreases, these image artifacts become more pronounced, as shown in figure 7(g). While the manufacturer recommends using dwell times larger than 0.5 µs to avoid image artifacts, correctly modelling fast scan distortion allows us to fully compensate for them, as shown in figures 7(k) and (l). The study of beam-sensitive materials and dynamic imaging will greatly benefit from the compensation of this distortion. Figure 7(c) shows a registered STEM image that contains interpolation noise. The interpolation process changes the dominant noise distribution, which can impact the restoration process, especially at low doses, as shown in Figure 7(h), where some atomic columns appear blurred. However, this issue can be addressed by including this type of noise in the training dataset, as explained in the "Methods" section. The effect of including this noise in the training dataset on the restored image can be seen in figure 7(m), where all atomic columns become clearly visible. Figure 7(d) exhibits a STEM image with strong Y-jitter distortion. The impact of an incorrect range of values for this distortion during data generation on the restored image can be seen in figure 7(i), where some atomic columns appear split. After retraining the network with newly generated data containing the proper range of Y-jitter distortion, it can correctly compensate for this image artifact, as shown in figure 7(n). In Figure 7(e), an experimental STEM image of a nanoparticle taken using a gas cell holder is shown [51]. The dominant sources of noise in this image are detector noise and fast scan noise. Figure 7(j) shows a restored STEM image produced by our network architecture trained on a dataset generated with Poisson noise as the only source of STEM detector noise (as described by Equation 45 in the "S(T)EM noise model" section). However, this restored image exhibits strong artifacts despite using an accurate model for fast scan noise (as described by Equation 47 in the "S(T)EM noise model" section). After retraining our network architecture with a new dataset that includes the correct STEM detector noise (as described by Equation 46 in the "S(T)EM noise model" section), the restored image in Figure 7(o) shows a significant reduction in artifacts. Nonetheless, it is worth mentioning that some of the remaining artifacts in the image could be attributed to other sources of distortion not accounted for in our data modelling, such as the gas holder effect, charging artifacts, and residual electronic noise.
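The difference between a Poisson-only detector model and a fuller STEM detector model can be illustrated with a small sketch. The parameter names and values here are assumptions for illustration; the exact forms of Equations 45 and 46 are given in the "S(T)EM noise model" section, not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def poisson_only(clean, dose):
    """Simplified detector model: counting (shot) noise alone (cf. Eq. 45)."""
    return rng.poisson(clean * dose) / dose

def stem_detector_noise(clean, dose, pmt_sigma=0.02, readout_sigma=0.01):
    """Sketch of a fuller model: shot noise on the scintillator plus
    Gaussian PMT and readout contributions (cf. Eq. 46). The sigma values
    are illustrative, not the paper's."""
    shot = rng.poisson(clean * dose) / dose
    pmt = rng.normal(0.0, pmt_sigma, clean.shape)      # photomultiplier noise
    readout = rng.normal(0.0, readout_sigma, clean.shape)
    return shot + pmt + readout

clean = np.full((256, 256), 0.5)
noisy = stem_detector_noise(clean, dose=100.0)
print(noisy.mean())  # close to 0.5; the variance grows as the dose drops
```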
Another example that highlights the importance of properly modelling noise and distortion sources can be seen in Figure 8. In this figure, we compare the reconstruction performance of our CNN, AtomSegNet [33], and the Noise2Void network (N2V) [53], which was retrained on the presented experimental image itself. The sample is a BaHfO3 nanoparticle (region 3 in figure 8) embedded in a superconducting REBa2Cu3O7−δ (REBCO) matrix [54,55] (region 2), which was epitaxially grown on a SrTiO3 substrate (region 1). Images were acquired on a non-probe-corrected Titan microscope at 300 keV at KIT Karlsruhe; the data is described in detail in references [54] and [55]. While all three networks successfully remove the noise from the image, there are notable differences in the reconstruction results. In region 1, the N2V reconstruction recovers all the weaker intensities of the Ti + O columns to some degree, which is not the case for the AtomSegNet reconstruction, where some of the columns blur or even disappear. Our CNN reliably recovers all atomic columns with superior contrast to the other two methods. Similar improvements are evident in region 2, but most notably in region 3. This region at the top of the image is also degraded, presumably by either FIB damage or carbon contamination. In both the N2V and AtomSegNet reconstructions, features tend to blur into diagonal streaks, while our CNN recovers clearly distinguishable atomic columns; given that the BaHfO3 nanoparticle grew epitaxially on the SrTiO3 substrate, that is indeed what would be expected [56]. Considering that N2V is a generic denoising network, its results are quite remarkable, albeit the additional training step is somewhat inconvenient from a user perspective. However, this example illustrates that the CNN presented in this work does not only benefit from the latest advances in deep learning, but also from the development of accurate, physically meaningful models of all distortions specific to HAADF-STEM. This CNN is shown to be accurate not only in perceived contrast enhancement, but also in a quantitative way that boosts the accuracy and precision of atomic structure determination in ADF-STEM studies.

METHODS
In single-shot EM image restoration, the goal is to estimate an undistorted image y from a distorted image x. To achieve this, we train a generator G using a deep neural network approach, which learns to estimate the corresponding undistorted image y for a given input x. During the training procedure, a loss function is minimised to evaluate the quality of the results. Traditionally, pixel-wise losses such as L 1 or L 2 have been used to obtain quantitative results for the image restoration problem [57]. However, these losses often lead to blurred images that do not look realistic. To address this, we propose a conditional generative adversarial network (cGAN) that trains both a generator and a discriminator. The generator G maps the distorted image x to the undistorted image y_g = G(x), and the discriminator is trained to differentiate between real and generated images [58]. We use pixel-wise losses to ensure quantitative results while restricting the GAN discriminator to model high-frequency details, resulting in sharper and more realistic restored images.
Our training is supervised, which requires input pairs of distorted and undistorted EM images. However, in practice, we only have access to distorted EM data. To overcome this, we can partially address the problem by collecting time-series EM images and using an averaging procedure based on rigid and non-rigid registration to generate an undistorted image. However, the combination of high-speed scans, jitter, and low dose leads to highly correlated distortions [13]. Furthermore, long exposure to the electron beam can result in charging, beam damage, atom hopping, and rotation of the specimen under study, which can further hamper the averaging procedure. Therefore, the only practical solution is to train the GAN using synthetic pairs of undistorted/distorted EM images.

Network architecture
A GAN [59] is a powerful framework that encourages predictions to be realistic and thus to be close to the undistorted data distribution. A GAN consists of a generator (G) and a discriminator (D) playing an adversarial game. The generator learns to produce output that looks realistic to the discriminator, while the discriminator learns to distinguish between real and generated data. The models are trained together in an adversarial manner such that improvements in the discriminator come at the cost of a reduced capability of the generator and vice versa. In a conditional GAN, conditional data is fed to the generator and/or the discriminator [35]. The generator and discriminator architectures proposed here are adapted from those described in [60] and [58], respectively. The details of these architectures are discussed in the following sections.

Generator architecture

Our generator architecture, called the Concatenated Grouped Residual Dense Network (CGRDN), is shown in Fig. 9. This network architecture is an extension of the GRDN for image denoising [23], which was ranked first for real image denoising in terms of the PSNR and the structural similarity index measure in the NTIRE2019 Image Denoising Challenge [61]. The GRDB architecture is shown in Fig. 9(a). The building module of this architecture is the residual dense block (RDB) [60], which is shown in Fig.
9(b). The original GRDN architecture can be conceptually divided into three parts. The first part consists of a convolutional layer followed by a downsampling layer based on a convolutional stride; the middle part is built by cascading GRDBs; and the last part consists of an upsampling layer based on transposed convolution, followed by a convolutional block attention module (CBAM) [62] and a convolutional layer. The GRDN also includes a global residual connection between the input and the last convolutional layer. In the original version of the GRDN [23], residual connections are applied at three different levels (a global residual connection, a semi-global residual connection in each GRDB, and a local residual connection in each RDB). However, in the version submitted for the NTIRE2019 Image Denoising Challenge [61], residual connections for every 2 GRDBs were included.
Although it has been demonstrated that an architecture developed for one image restoration task often performs well for other restoration tasks [60,63,58,64], an architecture for a given task will be data dependent. When applied to EM data, we found that two modifications of the GRDN are necessary in order to best handle the nature of our data, which involves different types and levels of distortions with high correlation between pixels:
1. The cascading of the GRDBs is replaced by feature concatenation, feature fusion, and a semi-global residual connection. This allows us to exploit hierarchical features in a global way, which is important for highly correlated pixels that extend over a large area of the image.
2. The CBAM, which is included in [60], is removed from our network. The reason for this is the use of large image sizes (256x256) for training, which reduces its gain [23].

Discriminator architecture
The purpose of the discriminator network is to judge the quality of the output data resulting from the generator network. For our discriminator, we use the 70x70 convolutional patch discriminator described in [58] with some minor modifications. The zero-padding layers were removed, and batch normalization layers [29] were replaced by instance normalization layers (IN) [65]. Figure 10 shows the structure of the discriminator network. The result of the network is the non-transformed output C(y) or C(y_g) of dimensions 32x32. Benefits of the discriminator architecture shown in Fig. 10 include that it is fully convolutional and that it only penalizes structure at the scale of image patches. Furthermore, we enhance our discriminator based on the relativistic GAN, which has been shown to improve the data quality and stability of GANs at no computational cost [66]. Different from the standard discriminator, which estimates the probability that input data is real, a relativistic discriminator predicts the probability that real data y is relatively more realistic than generated data y_g = G(x). If we denote our relativistic average patch discriminator as D_Rap, then its outputs can be written as:

D_Rap(y, y_g) = σ( C(y) − E_{y_g}{C(y_g)} ),   (2)
D_Rap(y_g, y) = σ( C(y_g) − E_{y}{C(y)} ),   (3)

where σ is the sigmoid function and E_{x_1,...,x_n}{.} is an operator representing the expectation value computed on the variables x_1, ..., x_n. In the next section, these functions will be used in the definition of the loss functions.
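The relativistic average patch discriminator can be sketched directly from its definition: the sigmoid of the patch score minus the mean score of the opposite class. The patch scores below are toy values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_rap(c_this, c_other):
    """Relativistic average discriminator on 32x32 patch scores:
    e.g. D_Rap(y, y_g) = sigmoid(C(y) - E[C(y_g)])."""
    return sigmoid(c_this - c_other.mean())

# toy patch scores: real patches score higher than generated ones
c_real = np.full((32, 32), 1.0)
c_fake = np.full((32, 32), -1.0)
print(d_rap(c_real, c_fake).mean())  # sigmoid(2) ≈ 0.88
```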

Loss function
The loss function is the effective driver of the network's learning. Its goal is to map a set of parameter values of the network onto a scalar value, which allows candidate solutions to be ranked and compared. In our case, the discriminator and adversarial losses are based on the relativistic average GAN loss defined in [66]. We design our generator loss function as a sum of different contributions in such a manner that it keeps the quantitative information of the image at the pixel level and produces perceptually correct and realistic images. The different contributions to these loss functions are described in the following sections.
L 1 loss

Pixel-wise losses are advantageous to keep quantitative information of the ground truth image. In this work, we used the L 1 loss which, compared to the L 2 loss, yields less blurred results [57]. The L 1 loss can be written as:

L_1 = E_{y,y_g}{ w_y |y − y_g| },

where w_y is a weighting factor that gives equal importance to each example regardless of its contrast, σ_min is a small value to limit the maximum scaling factor, and Std_{x_1,...,x_n}{.} is an operator that represents the standard deviation calculated on the variables x_1, ..., x_n.

L 2 loss

Due to the design of our architecture, which learns the residual difference between the distorted and undistorted image, and based on the fact that distorted images can have a few outliers in the distribution of pixel intensities (i.e. X-rays hitting the EM detector, saturation of the detector, low dose and dead pixels), the output of the generator will show a strong correlation at those pixel positions. For this reason, we also used the L 2 loss, which strongly penalizes outliers:

L_2 = E_{y,y_g}{ w_y (y − y_g)^2 }.

Multi-local whitening transform loss

Local contrast normalisation (LCN) is a method that normalises the image on local patches on a pixel basis [67]. A special case of this method is the whitening transform, which is obtained by subtracting the mean and dividing by the standard deviation of a neighbourhood around a particular pixel:

y^S_{ij} = ( y_{ij} − E_Ŝ{y} ) / Std_Ŝ{y},

where Ŝ is a local neighbourhood around the pixel i, j of window size S. The whitening transform makes the image patches less correlated with each other and can highlight image features that were hidden in the raw image due to its low local contrast. This effect can be seen in Fig. 11a. We apply the whitening transform for different window sizes and then calculate the average L 1 loss for these 4 images: E_{y^S,y^S_g}{ |y^S − y^S_g| }.
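A minimal sketch of the contrast-weighted pixel loss and the local whitening transform. The explicit inverse-standard-deviation form of w_y is an assumption based on its description above, and the window size is illustrative:

```python
import numpy as np

def weighted_l1(y, y_g, sigma_min=1e-3):
    """Contrast-weighted L1 loss: each example contributes equally regardless
    of its contrast. The weighting w_y is assumed here to be the inverse
    standard deviation of the ground truth, clipped by sigma_min."""
    w_y = 1.0 / (y.std() + sigma_min)
    return w_y * np.mean(np.abs(y - y_g))

def whitening_transform(img, window=5):
    """Local whitening: subtract the mean and divide by the standard
    deviation of a window x window neighbourhood around each pixel."""
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    win = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    mean = win.mean(axis=(-2, -1))
    std = win.std(axis=(-2, -1)) + 1e-6  # avoid division by zero
    return (img - mean) / std

y = np.array([[0.0, 1.0], [0.0, 1.0]])
print(weighted_l1(y, y + 0.1))  # 0.1 scaled by 1/(std + sigma_min)
print(whitening_transform(np.ones((16, 16))).max())  # flat image -> all zeros
```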

Fourier space loss
In electron microscopy, Fourier space contains crucial information about the sample and any distortions that may be difficult to discern in real space. To address this, we introduce the L_γ loss on the 2D Fourier transform of the difference between the generated data y_g and the ground truth image y. Nevertheless, high-frequency information typically possesses smaller values than low-frequency information. Consequently, to accentuate the high-frequency information, we apply a power transform to the aforementioned difference and define the loss function as follows:

L_{fs−γ} = E_{y,y_g}{ |F{y − y_g}|^γ },

where F symbolises the 2D Fourier transform and γ is a parameter in the range (0.0, 1.0]. In our investigation, we utilise γ = 0.125.

Constraint losses

Some important parameters for EM quantification are the total intensity and the standard deviation of the images, since they carry information about physical quantities of the sample or microscope, such as the number of atoms, defocus, and spatial and temporal incoherence [68,69]. Therefore, we encourage the restored images to preserve these quantities, resulting in the following two loss functions:

L_mean = | E_y{y} − E_{y_g}{y_g} |,
L_std = | Std_y{y} − Std_{y_g}{y_g} |.
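The Fourier-space loss can be sketched as follows; the exact placement of the power transform relative to the magnitude is an assumption based on the description above:

```python
import numpy as np

def fourier_loss(y, y_g, gamma=0.125):
    """Sketch of the L_{fs-gamma} loss: mean of the power-transformed
    magnitude of the 2D FFT of the residual y - y_g."""
    diff = np.fft.fft2(y - y_g)
    return np.mean(np.abs(diff) ** gamma)

y = np.zeros((32, 32))
y_g = y.copy()
y_g[16, 16] = 1.0  # a single-pixel outlier spreads over all frequencies
print(fourier_loss(y, y_g))  # 1.0: the delta has unit magnitude everywhere
```

Because a single-pixel error spreads uniformly over all spatial frequencies, the power transform with a small γ penalises such broadband residuals strongly relative to their tiny real-space L1 contribution.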

Adversarial loss
The job of the relativistic adversarial loss is to fool the discriminator, which can be expressed as:

L_Adv = −E_y{ log(1 − D_Rap(y, y_g)) } − E_{y_g}{ log(D_Rap(y_g, y)) },

with D_Rap(y, y_g) and D_Rap(y_g, y) defined in equations 2 and 3, respectively. This definition is based on the binary cross entropy between the ground truth and the generated images. Different from the conventional adversarial loss, in which y is not used, our generator benefits from both y and y_g in the adversarial training.

Generator loss
Our total generator loss function can be written as: where L_pixel-wise is our pixel-wise loss function, and λ_1, λ_2, λ_mlwt, λ_fs-γ, λ_mean, λ_std and λ_Adv are the weighting parameters that balance the different loss terms.

Discriminator loss
Symmetrically to the relativistic adversarial loss, the relativistic discriminator tries to predict the probability that real data is relatively more realistic than generated data; it can be expressed as:

Data generation
While it is possible to fully describe the electron-specimen interaction and image formation in an electron microscope, generating realistic EM image simulations for specimens on a support with sizes of a few nanometers is too time-consuming, even with the most powerful GPU implementations of the multislice method [42,43]. However, our goal is to train a neural network to correct EM distortions without the need to know the specific specimen or microscope settings. Therefore, we only need to generate undistorted images that closely mimic the appearance of real EM data, while the EM distortions must be accurately modelled. The generated undistorted images should also include physical parameters of the specimen and microscope settings, such as atomic sizes, atomic distances, atomic vibrations, lattice parameters, and relative intensities of atomic species, as well as acceleration voltage, aberrations, magnification, detector sensitivity, detector angles, and the transfer function of the detection system.

Specimen generation
In order to optimise the simulation process, we generate a specimen that fully covers the extended simulated box size l^e_xyz, an expanded version of the required simulation box size l_xyz. The calculation of l_xyz starts by randomly selecting a pixel size dr within the range [0.025, 0.90] Å. Using the required image size (n_x, n_y), n_z = max(n_x, n_y) and dr, the required simulation box size can be expressed as l_xyz = {n_x dr, n_y dr, n_z dr}. From these values, an extended number of pixels n^e_i = n_i + round(d_ext/dr) and an extended simulation box size l^e_xyz = {n^e_x dr, n^e_y dr, n^e_z dr} are obtained, where d_ext is the maximum correlation distance for a given value of scanning distortions. The specimen generation is divided into three steps.
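The box-size bookkeeping above can be sketched as follows; d_ext is treated here as a free parameter:

```python
import numpy as np

def simulation_boxes(n_x, n_y, d_ext=8.0, seed=0):
    """Required and extended simulation box sizes. d_ext (in angstrom)
    stands in for the maximum scan-distortion correlation distance."""
    rng = np.random.default_rng(seed)
    dr = rng.uniform(0.025, 0.90)                  # pixel size, angstrom
    n_z = max(n_x, n_y)
    l_xyz = tuple(n * dr for n in (n_x, n_y, n_z)) # required box
    n_ext = tuple(n + round(d_ext / dr) for n in (n_x, n_y, n_z))
    le_xyz = tuple(n * dr for n in n_ext)          # extended box
    return dr, l_xyz, le_xyz
```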
The first step of specimen generation involves randomly selecting a specimen type from the following options: crystalline specimen, amorphous specimen, or individual points. If the selected specimen is crystalline, the generation process starts by randomly choosing up to 16 unique atomic types with atomic number Z in the range [1, 103]. The crystallographic space group is randomly chosen from the range [1, 230]. The lattice parameters and the angles of the chosen space group are selected randomly from the ranges [3.1, 25.0] Å and [45°, 120°], respectively. Atomic positions of the asymmetric unit cells are generated randomly within the volume allowed by their space-group symmetry. This specimen generation process is subject to a physical constraint: after applying the space-group symmetry to the atomic positions of the asymmetric unit cells, the minimum distance between the atoms in the unit cell must be within the range [0.95, 7.0] Å. If this requirement is not met, the generation process is restarted. The generation of amorphous specimens is based on randomly choosing a single atomic number Z from the range [1, 103]. The atomic positions of amorphous specimens are generated by randomly placing atoms within the extended simulation box, subject to the requirement that the minimum distance between atoms is within the range [0.95, 1.6] Å. This process continues until the desired density, within the range [2.0, 7.0] g/cm³, is achieved. In contrast, the generation of individual points starts by randomly choosing a number of points within a given range of positive integers. The 3D positions of the particles are then generated randomly within the extended simulation box, subject to the requirement that the minimum distance between particles is within the range [1, 20] dr. This option is also used to generate low-resolution images.
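The rejection sampling used for amorphous positions can be sketched as follows; a fixed atom count stands in here for the density criterion described above:

```python
import numpy as np
from itertools import combinations

def place_amorphous_atoms(box, n_atoms, d_min=0.95, seed=0):
    """Rejection-sample random 3D positions inside a cubic box of edge
    `box` (angstrom) so that no two atoms are closer than d_min. The
    fixed atom count stands in for the density criterion in the text."""
    rng = np.random.default_rng(seed)
    atoms = []
    trials = 0
    while len(atoms) < n_atoms and trials < 200 * n_atoms:
        p = rng.uniform(0.0, box, size=3)
        # accept only candidates far enough from all accepted atoms
        if all(np.linalg.norm(p - q) >= d_min for q in atoms):
            atoms.append(p)
        trials += 1
    return np.array(atoms)
```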
The second step begins by randomly choosing between a specimen orientation along a zone axis or a random orientation. The probability of choosing a zone-axis orientation is 0.75. If the specimen is crystalline, the zone-axis orientation is randomly chosen from the first eight main zone axes, and a small random mistilt angle is generated for the chosen orientation using a normally distributed random number with a standard deviation of 5°. For non-crystalline specimens, a random 3D orientation is generated. To prevent alignment of crystalline specimens along the xy directions, an additional random rotation is applied along the z axis. For a given generated orientation, the specimen is oriented and cropped in the xy plane so that it fits within the extended simulation box. This is followed by the random generation of a wedge on the specimen with a probability of 0.75. The wedge can be generated on the top, bottom, or both surfaces of the specimen, each with an occurrence probability of 0.33. The wedge orientation is generated randomly in the xy plane, and its angle is chosen randomly from the range [5°, 45°]. Shapes can be applied to the specimen with a probability of 0.5. To avoid any preference among the three different types of shapes, the occurrence probability of each type is set to 0.33. The first type of shape is a polygonal rod, for which the number of cross-section vertices sliced along its length is randomly chosen from the range [3, 15]. The rod is also placed and oriented randomly. The radius of the polygon is chosen randomly from the range [0.01, 0.5] max(l_xyz). The second shape is a convex polyhedron, for which the radius and the number of vertices are chosen randomly from the ranges [0.01, 0.5] max(l_xyz) and [4, 20], respectively. The third shape is a hard shape, in which all atoms on one side of a randomly generated 3D plane parallel to the z orientation are removed. The application of a chosen shape can be used either to remove or to keep the atoms of the specimen, with a probability of keeping the atoms of 0.5. Defects are generated randomly with a probability of 0.8. The process starts by randomly selecting a number of atoms, n_sel, within the specimen. This number is chosen randomly from the range [0, n_max], where n_max is equal to the number of atoms in the specimen multiplied by 0.25 and rounded to the nearest whole number. The positions of the selected atoms are randomly changed with a probability of 0.5. This is done by adding a normally distributed random number, with a standard deviation equal to the atomic radius, to the position of each selected atom.
The final step of specimen generation adds a support layer with a probability of 0.95. The support layer can be either crystalline or amorphous, each with a probability of 0.5. The thickness of the support layer is chosen randomly from the range [1, 30] nm. The process described above for crystalline and amorphous specimen generation is used for the support layer, with the exception of shape generation. Finally, the generated atoms are added to the specimen.

Undistorted data generation
High/medium-resolution electron microscopy data can be synthesized as a linear superposition of the projected signal of each atom in the specimen at a given orientation. Moreover, each projected atomic signal can be modelled as a two-dimensional radially symmetric function, f^i_Z(r), where the index i refers to an atom with atomic number Z in the specimen. Under this assumption, y can be expressed as: where r is a two-dimensional vector. Additionally, we model f_Z(r) for each atomic number Z as a weighted sum of Gaussian, exponential, and Butterworth functions: where h_1, h_2, h_3, n and r_m are the parameters of our model, which are restricted to positive values. This parameterization has three benefits. First, it accurately models almost any simulated or experimental incoherent EM image. Second, it allows for easy inclusion of physical constraints. Third, it only requires five parameters. To allow realistic tails of f_Z(r), we constrain n to be a uniform random variable in [4.0, 16.0]. We would also like to emphasize that all numerical ranges for the data generation were fine-tuned based on the analysis of around 2000 realistic simulations of (S)TEM images for different specimens and microscope settings.
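A sketch of this radial model is given below. Sharing a single width r_m across the three terms is a simplifying assumption; the paper names five parameters but does not give the exact grouping here.

```python
import numpy as np

def f_Z(r, h1=1.0, h2=1.0, h3=1.0, n=8.0, r_m=0.5):
    """Radially symmetric projected atomic signal: a weighted sum of
    Gaussian, exponential and Butterworth terms. Sharing r_m across the
    three terms is a simplifying assumption."""
    r = np.asarray(r, dtype=float)
    gauss = h1 * np.exp(-0.5 * (r / r_m) ** 2)
    expo = h2 * np.exp(-np.abs(r) / r_m)
    butter = h3 / (1.0 + (r / r_m) ** (2 * n))
    return gauss + expo + butter
```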
In order to encode physical information into this model, r_m^Z is chosen proportionally to the transformed two-dimensional mean square radius, r̄_Z, of the projected atomic potential V_p^Z(r) [70]: where α is a uniform random variable in [0.75, 1.25]. On the other hand, the linear coefficients h_1, h_2 and h_3 are randomly chosen within the range [0.5, 1.0] subject to the following constraint: where Z_i and Z_j are the atomic numbers of two elements of the specimen. This constraint arises from the fact that the integrated intensity of quasi-incoherently scattered electrons for a given atomic number is proportional to Z^γ, where γ is a real number between 1.0 and 2.0 depending on the microscope settings [71].
The process of generating low-resolution images begins by randomly choosing a set of low-resolution image types from the following options: soft particles, sharp particles, grains, bands, boxes, and cracks. This stage uses the specimen type "individual points" to generate random positions where the different objects will be placed. Finally, the low-resolution image is obtained by linearly superimposing these individual objects.
The generation of soft particles starts by randomly choosing a number of particles in the range [15, 85]. Each soft-particle image is generated by randomly rotating the asymmetric version of Eq. 17, where r_m^Z = (r_mx^Z, r_my^Z) and r_my^Z = α r_mx^Z, with α a random variable in the range [0.8, 1.2]. In the case of sharp particles, there is a sharp transition between the border and the background of the particle, and the particle can be either polygonal or elliptical with equal probabilities of occurrence. The process starts by randomly choosing a number of particles in the range [15, 40]. For the polygonal option, the number of vertices is randomly chosen in the range [3, 5]. Each sharp-particle image is generated by masking a 3D random positive plane intensity with its randomly rotated shape. This masking creates an intensity gradient over the x-y plane so that the object does not appear flat.
Grain generation in 2D is performed using the Voronoi tessellation method [72], which is one of the available techniques for producing random polygonal grains within a domain. This process starts by randomly selecting a number of points within the range [15, 175]. Each grain image is created by masking a 3D random positive plane with its corresponding Voronoi cell. Additionally, the grain borderline is included with an occurrence probability of 0.5, with its intensity value randomly assigned within the range [0.5, 1.5] × mean(grain intensity).
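On a pixel grid, a Voronoi tessellation can be obtained by labelling every pixel with its nearest seed point; a minimal sketch:

```python
import numpy as np

def voronoi_labels(n_pix, n_seeds, seed=0):
    """Assign every pixel to its nearest randomly placed seed point,
    producing a Voronoi tessellation of the image into grains."""
    rng = np.random.default_rng(seed)
    seeds = rng.uniform(0.0, n_pix, size=(n_seeds, 2))
    yy, xx = np.mgrid[0:n_pix, 0:n_pix]
    pix = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    # squared distance from every pixel to every seed
    d2 = ((pix[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(n_pix, n_pix)
```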
EM images may exhibit contrast inversion relative to the projected specimen, which can easily be simulated by inverting the image: The probability of this mechanism occurring was set to 0.5. To introduce a non-linear dependence between the generated image intensity and the projected specimen structure, y is non-linearly transformed with an occurrence probability of 0.5: where β is a uniform random number selected from the range [0.5, 1.5].
To further break this linearity, a random background is added to y. The background is randomly chosen between a 3D plane and a Gaussian, with an occurrence probability of 0.5 for each. In the first case, a randomly oriented positive 3D plane is generated with a random height in the range [0, max(y)/2]. In the second case, the Gaussian centre and its standard deviation are randomly chosen within the range of the xy simulation box size and [0.2, 0.6] × min(n_x, n_y), respectively. From the analysis of the experimental and simulated data, we found that the ratio r_std/mean = Std{y}/E{y} lies between [0.01, 0.35]. Therefore, if the EM image does not fulfil this constraint, it is linearly transformed as: where c and d are chosen to bring r_std/mean within the range of the constraint. Finally, the EM image is normalized by dividing it by its maximum value:
y ← y/max(y) (26)
Note that the correct parameterization of the model and the randomness of its parameters, subject to physical constraints, allow information about atomic size, atomic vibration, relative intensities between atomic species, detector angle, acceleration voltage, aberrations and/or detector sensitivity to be encoded in the generated high/medium-resolution EM image.

TEM noise model
The TEM noise model is based on the fact that TEM images are recorded using parallel illumination, and that most electron signal acquisitions are set up so that the detector output is directly proportional to the time-averaged flux of electrons reaching the detector. In the case of TEM, the electrons are detected either indirectly, using a charge-coupled device (CCD) sensor [73] or a complementary metal-oxide-semiconductor (CMOS) sensor [74], or directly, using a direct electron detector [75].
For indirect detection, primary electrons are converted to photons in a scintillator, which are then directed to the CCD/CMOS sensor through a lens or fiber-optic coupling. In contrast, for direct electron detectors, the CMOS sensor is directly exposed to the electron beam.

TEM camera modulation-transfer function
Scattering of incident electrons over the detector leads to the detection of electrons in multiple pixels, which can be quantitatively described using the modulation-transfer function (MTF). Because the effect of the MTF is an isotropic smearing of features in the recorded TEM image, which in general cannot be distinguished from an undistorted TEM image recorded with other microscope settings, we embed this effect into the undistorted TEM image by convolving it with the point-spread function (PSF), which is the Fourier transform of the MTF: The MTF itself can be separated into a rotationally symmetric part, MTF_r, describing the spread of electrons in the detector, and a part describing the convolution over the quadratic area of a single pixel. This yields the following equation: where the Fourier-space coordinates (u, v) are defined in units of the Nyquist frequency [76]. Furthermore, we found that the general shape of MTF_r can be expressed parametrically as: where a, b and c are positive real numbers. These numbers were randomly generated until they fulfilled the constraint that, on a numerical grid of 1000 points with a length of 10 units of the Nyquist frequency, MTF_r is a positive and monotonically decreasing function.

TEM detector noise
TEM detectors are subject to three main sources of noise: shot noise, dark current noise, and readout noise. These noise sources can be classified into two types: temporal and spatial noise. Temporal noise can be reduced by frame averaging, whereas spatial noise cannot. However, some spatial noise can be mitigated by using techniques such as frame subtraction or gain/offset correction. Examples of temporal noise discussed in this document include shot noise, reset noise, output amplifier noise, and dark current shot noise. Spatial noise sources include photoresponse non-uniformity and dark current non-uniformity. Each of these noise sources can lower the SNR of a sensor imaging device.
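The rejection loop for MTF_r can be sketched as follows. The parametric form used here, exp(-a·u)/(1 + b·u^c), is a hypothetical stand-in; the paper's own expression is not reproduced, only its acceptance criterion.

```python
import numpy as np

def sample_mtf_r(seed=0):
    """Rejection-sample parameters (a, b, c) until MTF_r is positive and
    monotonically decreasing on 1000 points spanning 10 Nyquist units.
    The stand-in form exp(-a*u)/(1 + b*u**c) is hypothetical."""
    rng = np.random.default_rng(seed)
    u = np.linspace(0.0, 10.0, 1000)
    while True:
        a, b, c = rng.uniform(0.05, 2.0, size=3)
        mtf = np.exp(-a * u) / (1.0 + b * u ** c)
        # accept only positive, monotonically decreasing curves
        if (mtf > 0.0).all() and (np.diff(mtf) < 0.0).all():
            return mtf
```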

Photon shot noise
After the initial conversion of the incident electron to its photon counterpart, the generated photons will hit the photosensor pixel area, liberating photo-electrons proportional to the light intensity.Due to the quantum nature of light, there is an intrinsic uncertainty arising from random fluctuations when photons are collected by the photosensor.This uncertainty is described by the Poisson process P with mean αx, where α is a dose scale factor.
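A minimal sketch of this dose-scaled Poisson step, with the division by α restoring the original intensity range:

```python
import numpy as np

def photon_shot_noise(x, alpha, seed=0):
    """Poisson shot noise with dose scale alpha; dividing by alpha
    restores the original intensity range."""
    rng = np.random.default_rng(seed)
    return rng.poisson(alpha * np.clip(x, 0.0, None)) / alpha
```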
The distribution of α is exponential, with a scale parameter of 0.5 and a range [0.5, 750]/E{y}. The exponential distribution yields higher probabilities for generating images at lower doses, which is the focus of our research. The division by α in the equation below brings x back to its original range:

Fixed-pattern noise
Fixed-pattern noise (FPN) is a pixel-gain mismatch caused by spatial variations in the thickness of the scintillator, fiber-optic coupling, substrate material, CCD bias pattern, and other artifacts that produce variations in the pixel-to-pixel sensitivity and/or distortions in the optical path to the CCD or in the CCD chip itself [77]. Since FPN is a property of the sensor, it cannot be fully eliminated. However, it can be suppressed using a flat-field correction procedure. We model the remaining distortion as a normal distribution N with zero mean and standard deviation σ_fpn.
x ← x + x N(0, σ_fpn)

Dark-current noise
Dark current is the result of imperfections or impurities in the depleted bulk Si or at the SiO2/Si interface. These sites introduce electronic states in the forbidden gap, which allow valence electrons to jump into the conduction band and be collected in the sensor wells. This noise is independent of the electron/photon-induced signal, but highly dependent on device temperature due to its thermal activation process [78].
Dark-current nonuniformity
Dark-current nonuniformity (DCNU) arises from the fact that pixels in a hardware photosensor cannot be manufactured to be exactly identical; there will always be spatially uncorrelated variations in the photodetector area, surface defects at the SiO2/Si interface, and discrete randomly distributed charge-generation centres [79]. This means that different pixels produce different amounts of dark current. This manifests itself as a fixed-pattern, exposure-dependent noise and can be modelled by superimposing two distributions: the log-normal distribution (lnN) is used for the main body, and the uniform distribution (U) is used for the "hot pixels" or outliers [80].
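A sketch of such a DCNU map; all numerical values here are illustrative, not taken from the paper:

```python
import numpy as np

def dcnu_map(shape, mu=-9.0, sigma=0.5, hot_frac=0.001, hot_max=10.0, seed=0):
    """Dark-current non-uniformity: log-normal main body with a sparse
    uniform population of hot pixels (illustrative values)."""
    rng = np.random.default_rng(seed)
    dark = rng.lognormal(mean=mu, sigma=sigma, size=shape)
    hot = rng.random(shape) < hot_frac           # sparse hot-pixel mask
    dark[hot] = rng.uniform(0.0, hot_max, size=int(hot.sum()))
    return dark
```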

Dark-current shot noise
Additional noise arises from the random arrival of electrons generated as part of the dark signal, which is governed by the Poisson process.To simulate a single frame, it is necessary to apply shot noise to the DCNU array.

Readout noise
Readout noise is a temporal noise, generally defined as the combination of the remaining circuitry noise sources between the photoreceptor and the ADC circuitry. This includes thermal noise, flicker noise and reset noise [81].

Thermal noise
Thermal noise arises from equilibrium fluctuations of an electric current inside an electrical conductor due to the random thermal motion of the charge carriers.It is independent of illumination and occurs regardless of any applied voltage.The noise is commonly referred to as Johnson noise, Johnson-Nyquist noise, or simply white noise.It can be modelled by the normal distribution with zero mean and an appropriate standard deviation σ [81].
x ← x + N(0, σ)

Flicker noise
Flicker noise, also known as 1/f noise or pink noise, is often caused by imperfect contacts between different materials at a junction, including metal-to-metal, metal-to-semiconductor, and semiconductor-to-semiconductor. MOSFETs are used in the construction of CMOS image sensors, which tend to exhibit higher levels of 1/f noise than CCD sensors [79]. The amount of flicker noise in a CCD sensor depends on the pixel sampling rate. The equation below describes the effect of flicker noise on a signal x: Here, F is the two-dimensional Fourier transform, σ is the appropriate standard deviation, and f is the reciprocal distance.
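A sketch of 2D pink-noise generation by Fourier-domain filtering; using an exact 1/f weighting (rather than, say, 1/√f) is an assumption:

```python
import numpy as np

def flicker_noise(shape, sigma=0.1, seed=0):
    """Pink-noise field: white Gaussian noise shaped by 1/f in the 2D
    Fourier domain. The exponent of the 1/f filter is an assumption."""
    rng = np.random.default_rng(seed)
    white = rng.normal(0.0, sigma, size=shape)
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    f = np.hypot(fy, fx)      # radial frequency (reciprocal distance)
    f[0, 0] = 1.0             # leave the DC component unscaled
    return np.fft.ifft2(np.fft.fft2(white) / f).real
```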

Reset noise
Before a measurement of the charge packet of each pixel is taken, the sense-node capacitor of a specific row is reset to a reference voltage level. This causes all pixels in that row to be exposed to noise coming in through the reset line, transfer gate, or read transistor. As a result, images may show horizontal lines due to the fixed and temporal components of the noise. This type of noise, known as reset noise (RN), follows a normal distribution with zero mean and a standard deviation σ. It can be simulated by adding a random intensity value, generated for each row, to the intensity values of all pixels in that row [80]:

Black pixel noise
Black pixels are dots or small clusters of pixels on the sensor that have a significantly lower response than their neighbours, resulting in black spots on the image. Some black pixels may be created during the production process of the CCD camera, while others may appear during its lifetime. Black pixels are time-invariant and always appear at the same locations on the image. They can be modelled by generating a sensitivity mask (S_Black) with a spatially uniform distribution of a specified number of black points. Regions can be generated by applying a random-walk process, for a given number of random steps, to the black-point positions. The equation below describes the effect of black pixels on a signal x:
x ← x S_Black (37)

Zinger noise
Zingers are spurious white dots or regions that can appear randomly in CCD images [82]. Electron-generated X-rays, cosmic rays, and muons can produce a burst of photons in the scintillator, resulting in white spots or streaks in the image. Radioactive elements (such as thorium) present in fiber-optic tapers can also cause zingers [77]. They can be modelled by generating a sensitivity mask (S_Zinger) with a spatially uniform distribution of a specified number of zinger points. Similar to black pixel noise, regions can be generated by applying a random-walk process for a given number of steps to the zinger point positions:
x ← x S_Zinger (38)

Upper-clip noise
Upper-clip noise, also known as saturation noise, occurs when the intensity value of a pixel exceeds the maximum value that the CCD sensor can detect. This causes the pixel to be "clipped" at the maximum value, resulting in an overly bright image with lost details. This type of noise can be modelled by setting a threshold value T_u for the maximum intensity and clipping any pixel values above that threshold:
x ← min(x, T_u)
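The sensitivity-mask construction shared by the black-pixel and zinger models can be sketched as follows; the step count and point count are illustrative:

```python
import numpy as np

def defect_mask(shape, n_points, n_steps=5, seed=0):
    """Multiplicative sensitivity mask: n_points defects grown into small
    regions by an n_steps random walk. Zeros model black pixels; a mask
    with values much greater than 1 at those positions would model
    zingers instead."""
    rng = np.random.default_rng(seed)
    mask = np.ones(shape)
    pts = np.stack([rng.integers(0, shape[0], n_points),
                    rng.integers(0, shape[1], n_points)], axis=1)
    for _ in range(n_steps + 1):
        mask[pts[:, 0] % shape[0], pts[:, 1] % shape[1]] = 0.0
        pts = pts + rng.integers(-1, 2, size=pts.shape)  # one walk step
    return mask
```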

Quantisation noise
To generate a digital image, the analog voltage signal read out during the last stage is quantized into discrete values using analog-to-digital conversion (ADC). This process introduces quantisation noise, which can be modelled with respect to the ADC gain α:
x ← round(αx)
Figure 12 shows simulated TEM images with the different types of noise. These distortions have been randomly added to the images to mimic real TEM conditions and to make it easier to identify the different types of noise.

S(T)EM noise model
S(T)EM images are formed one pixel at a time by scanning a convergent electron beam along scan lines across the sample, with the beam dwelling on each position for a fixed time known as the dwell time. The dimension of each square-shaped pixel in physical space is determined by the magnification. The scanning direction is called the fast/row scan direction. For conventional scan patterns, the scanning begins at the top-left corner and, after scanning one row of n pixels, the electron probe moves to the first pixel of the next row. The time required to move the beam to the beginning of the next scan line is commonly known as the fly-back time.
Inaccuracies in beam positions during the scanning process give rise to characteristic scan-line/jitter distortions. Despite all technical improvements in the design of high-performance S(T)EM instruments [3], the presence of these distortions in the recorded images still hampers the extraction of quantitative information from the sample under study [5].

Scanning jitter distortion
Scanning jitter is caused by beam instabilities while scanning a raster pattern across the sample during image acquisition. There are two distinguishable jitter effects: X-jitter causes random pixel shifts along the fast-scan direction, while Y-jitter causes stretching or squeezing of scan lines, or line interchanges, along the slow-scan direction [11]. Because of the serial acquisition, these displacements are not completely random: each depends on the previous scan position. Realistic modelling of scanning jitter distortion can be achieved using the Yule-Walker correlation scheme for time series [83,84]. Furthermore, the fast and slow scanning directions can be modelled independently due to their different time scales. Here, we focus on displacement series in discrete pixels, in which each term of the series depends on the previous one: d_t(k) = φ_t d_t(k−1) + a_t(k), where t = x, y and k is the pixel index along the given direction t. φ_t is the correlation coefficient, within the range [0, 1], which describes the coupling between two consecutive values of the series, and a_t(k) is a normally distributed random number with zero mean and standard deviation σ_t. The distorted image is created using bicubic interpolation, evaluated on the non-regular grid built by adding the generated displacements to the positions of the regular grid.
x ← SJ(y) (42)
The effects of individual jitter distortions for σ_x = σ_y = 0.75 and φ_x = φ_y = 0.6 along the fast and slow scan directions can be seen in Fig. 13(a) and Fig. 13(b), respectively. Fig. 13(c) shows the undistorted, randomly generated ADF STEM image.
Based on our analysis of experimental data, we set the occurrence probability of jitter distortion to 0.9. In addition, we assign occurrence probabilities of 0.25, 0.25 and 0.50 to X-jitter, Y-jitter and XY-jitter, respectively. The values of σ_t and φ_t are randomly chosen within the ranges [0.0025, 0.8] Å and [0.0, 0.7], respectively.
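The AR(1) displacement series and a simplified X-jitter application can be sketched as follows; linear interpolation is used here for brevity, whereas the text specifies bicubic:

```python
import numpy as np

def jitter_series(n, phi, sigma, seed=0):
    """Yule-Walker (AR(1)) displacement series: each displacement equals
    phi times the previous one plus zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    d = np.zeros(n)
    for k in range(1, n):
        d[k] = phi * d[k - 1] + rng.normal(0.0, sigma)
    return d

def apply_x_jitter(img, phi=0.6, sigma=0.75, seed=0):
    """Displace each scan line along the fast-scan direction by a
    correlated shift (one shift per line, a simplification)."""
    shifts = jitter_series(img.shape[0], phi, sigma, seed)
    cols = np.arange(img.shape[1], dtype=float)
    out = np.empty_like(img, dtype=float)
    for row, d in enumerate(shifts):
        out[row] = np.interp(cols + d, cols, img[row])
    return out
```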

S(T)EM detector noise
Electrons are detected by a scintillator coupled to a photomultiplier tube (PMT) via a mirror or reflective tube. Impact of the incident electrons on the scintillator causes photons to be emitted, which are directed to the PMT through a light pipe. The PMT consists of a photocathode that emits photoelectrons when illuminated by these photons, followed by a series of stages amplifying the signal. The resulting current at the anode can be measured using conventional ADC electronics [8]. The statistics of the electron multiplication can be described as a series of Poisson events, with the full width at half maximum (FWHM) of the pulse at the anode given by: This equation assumes that the secondary gain δ at each stage inside the PMT is the same. Here, G represents the PMT gain, η is the detective quantum efficiency, m_c is the number of photons collected per incident electron, and δ_c² is the variance of that number [85]. A good approximation for the noise spectrum of a photomultiplier is the Poisson distribution, which can be approximated by a Gaussian distribution for large means. Since, for each electron reaching the scintillator, around 100 photons reach the cathode of the photomultiplier, a Gaussian approximation can be used with standard deviation σ = m_c η G. In addition, the number of electrons hitting the scintillator is described by a Poisson process (P) [86]. The signal can therefore be constructed in two steps:
x ← P(αx)
x ← (x + N(0, σ))/α (46)
where α is a dose scale factor. Dividing by α in the latter equation brings x back to approximately its original range.
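The two-step detector model can be sketched as follows; here σ is a free parameter standing in for m_c η G, and its default value is illustrative:

```python
import numpy as np

def stem_detector_noise(x, alpha, sigma=0.05, seed=0):
    """Two-step detector model: Poisson electron arrivals P(alpha*x),
    then additive Gaussian PMT noise N(0, sigma), rescaled by 1/alpha.
    sigma is a free parameter standing in for m_c*eta*G."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(alpha * np.clip(x, 0.0, None)).astype(float)
    return (counts + rng.normal(0.0, sigma, counts.shape)) / alpha
```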

Fast scan noise
Fast scan noise arises due to the use of short dwell times during data acquisition and appears as horizontal blur in the recorded images.This effect can also be seen in the Fourier domain as a damping effect on the high frequencies in the horizontal direction.This blurring is caused by the finite decay time of the detection system, which consists of a scintillator, a photomultiplier, and additional readout electronics [86,87].In addition to blurring in the horizontal direction, fast scans may introduce other artifacts due to the limited response time of the scan coils.In particular, strong distortions may appear on the left-hand side of the images due to the discontinuity in the scan pattern between consecutive lines.This can be avoided by using a small delay (flyback time) between scanning lines.The optimal value of this delay is hardware-specific, but results in additional dose to the sample, which will be localized on the left-hand side of each image [88].In general, the effect of fast scan distortion can be modelled by convolution in one dimension along the fast-scan direction between x and the point spread function (PSF) of the system.After careful analysis of the experimental data, we find that the PSF of the system can be decomposed into contributions from the detector and the readout system. 
Here, the normalization factor ensures that the total integral of psf_readout equals 1, k is the pixel value in real space, and α is the parameter of the Lorentzian function that describes the PSF of the detector. The parameters β, γ, and θ belong to the damped harmonic oscillator used to describe the PSF of the readout system. The model parameters were obtained by fitting to experimental images and by applying random variations to the fitted parameters.

Row-line noise
Row-line (RL) noise arises from the non-response of the detector over some pixels during the scanning process along the fast-scan direction. This noise can be modelled by generating a random number of row lines with random lengths. The pixel intensities of these lines are replaced by their average intensity multiplied by a random factor within the range [0.5, 1.5]. This can be represented as:

Black pixel noise
Black pixels are randomly occurring pixels that have significantly lower values than their neighbouring pixels, causing black spots to appear in the image. These black pixels may result from information loss during data transmission, cosmic rays, or the detector's non-response. As black pixels are time-dependent, they can be modelled by generating a sensitivity mask (S_Black) with a spatially uniform distribution of a specified number of black points. This can be represented mathematically as:
x ← x S_Black (52)
However, in the case of SEM images, black spots may be attributed to pores present in the sample, and hence this type of distortion is not generated.
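The fast-scan PSF construction and row-wise convolution described earlier in this section can be sketched as follows; the Lorentzian and damped-oscillator forms are hypothetical stand-ins for the fitted model:

```python
import numpy as np

def fast_scan_psf(n=33, alpha=2.0, beta=0.3, gamma=0.5, theta=0.0):
    """One-dimensional fast-scan PSF: a Lorentzian detector term plus a
    causal damped-oscillator readout term (hypothetical stand-ins)."""
    k = np.arange(n, dtype=float)
    det = 1.0 / (1.0 + (k / alpha) ** 2)
    readout = np.exp(-beta * k) * np.cos(gamma * k + theta)
    psf = det + np.clip(readout, 0.0, None)
    return psf / psf.sum()   # normalize to unit integral

def apply_fast_scan(img, psf):
    """Convolve each row with the causal PSF along the fast-scan axis."""
    out = np.empty_like(img, dtype=float)
    for r in range(img.shape[0]):
        out[r] = np.convolve(img[r], psf, mode="full")[:img.shape[1]]
    return out
```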

Zinger noise
Zingers are random white dots that appear in an image. They are caused by bursts of photons produced in the scintillator by electron-generated X-rays, cosmic rays, and muons [77]. Zinger noise can be simulated by creating a sensitivity mask (S_Zinger) with a spatially uniform distribution of a specified number of zinger points.
x ← x S_Zinger (53)

Upper-clip noise
Upper-clip noise, also known as saturation noise, occurs when the intensity value of a pixel exceeds the maximum value that the analog-to-digital converter can detect. This causes the pixel to be "clipped" at the maximum value, resulting in an overly bright image with lost details. This type of noise can be modelled by setting a threshold value T_u for the maximum intensity and clipping any pixel values above that threshold.
x ← min(x, T_u)

Quantisation noise
To generate an image in digital form, the analog voltage signal read out during the last stage is quantized into discrete values using an ADC with gain α. This process introduces quantisation noise.
x ← round(αx)
Figure 14 shows simulated STEM images illustrating the different types of noise that can occur in STEM imaging. These distortions were randomly added to the images to simulate real STEM conditions and to make it easier to identify the different types of noise.

Figure 1. Ablation study of the CGRDN architecture based on the L1 metric as a function of model size. The number of layers n_lay is indicated next to each data point.

Figure 2. CNN restoration results compared with BM3D in terms of PSNR for two random simulated STEM specimens using standard experimental noise values.

Figure 3. Precision of the atomic column positions and SCS measurements for a series of Pt bulk samples with thicknesses varying from 2 to 75 atoms, together with their 95% confidence intervals. (a) Precision of the atomic column positions for a dose of 5e2 e/Å². (b) Precision of the SCS measurements for a dose of 5e2 e/Å². (c) Precision of the atomic column positions for a dose of 5e4 e/Å². (d) Precision of the SCS measurements for a dose of 5e4 e/Å². (e) Example of a raw STEM image at z = 2 and a dose of 5e2 e/Å². (f) Example of a restored STEM image at z = 2 and a dose of 5e2 e/Å². (g) Example of a raw STEM image at z = 75 and a dose of 5e4 e/Å². (h) Example of a restored STEM image at z = 75 and a dose of 5e4 e/Å².

Figure 4. (a) Schematic of the Pt structure in the [112] zone axis with a unit edge dislocation of Burgers vector b = 1/2[110] in the (111) glide plane. (b) Corrupted raw HAADF STEM image with a dose of 5e2 e/Å². (c) Corrupted raw image with a dose of 5e5 e/Å². (d) Restored image with a dose of 5e2 e/Å². (e) Restored image with a dose of 5e5 e/Å². (f) Quantification results for the atomic column positions and scattering cross sections of the atomic columns around the centre of the edge dislocation (marked with red circles in panel (a)).

Figure 5. Quantification results for a spherical Pt nanoparticle with a diameter of 11 unit cells in [100] orientation. The values are based on all 333 atomic columns for 100 noise realisations. (a) The mean absolute error of the estimated atomic column positions. (b) The mean absolute percentage error of the fitted scattering cross sections, which are used to estimate the atom counts in each column. (c) The fraction of atomic columns with correctly estimated atom counts.

Figure 6. Experimental image restoration for various microscopy modalities. The top row shows the raw experimental images, while the bottom row displays the restored versions. Images (a), (b), (c), and (d) were obtained from reference [49], and images (e) and (f) were sourced from reference [50].

Figure 7. Raw STEM images alongside the results of a restoration process employing inaccurate and accurate models of the noise. The top row shows the original STEM images, while the second and third rows show the restored versions produced by networks trained on distorted data based on inaccurate and accurate noise models, respectively. Images (a)-(c) were obtained from our experimental datasets, whereas (d) and (e) were obtained from references [52] and [51], respectively.

Figure 8. Comparison of different CNN-restoration approaches on an experimental HAADF-STEM dataset of a BaHfO3 nanoparticle (3) embedded in a superconducting REBa2Cu3O7−δ (REBCO) matrix (2), which was epitaxially grown on a SrTiO3 substrate (1). Images were acquired on a non-probe-corrected Titan microscope at 300 keV at KIT Karlsruhe. The data are described in detail in references [54] and [55].

Figure 11. (a) Undistorted ADF STEM image of a nanoparticle on a carbon support. Images (b)-(e) are generated by applying the whitening transform to (a) using window sizes of (b) 2, (c) 4, (d) 8 and (e) 16.

Figure 12. Simulated TEM images with random distortions showing the various types of noise.

Figure 13. Images (a) and (b) are distorted by jitter along the fast and slow scan directions, respectively. (c) Undistorted ADF STEM image of a random sample.