Five-second STEM dislocation tomography for 300 nm thick specimen assisted by deep-learning-based noise filtering

Scanning transmission electron microscopy (STEM) is better suited to visualizing the interior of relatively thick specimens than conventional transmission electron microscopy, whose resolution is limited by the chromatic aberration of the image-forming lenses. The STEM mode has therefore been employed frequently for three-dimensional (3D) structural characterization based on computed electron tomography, and combined with analytical methods such as annular dark-field imaging or spectroscopies. However, the image quality of STEM is severely degraded by noise or artifacts, especially when rapid imaging, on the order of milliseconds per frame or faster, is pursued. Here we demonstrate deep-learning-assisted rapid STEM tomography, which visualizes the 3D dislocation arrangement with only a five-second acquisition of all the tilt-series images, even in a 300 nm thick steel specimen. The developed method offers a new platform for various in situ or operando 3D microanalyses that require dealing with relatively thick specimens or covering media such as liquid cells.


A. Image distortion correction
Rapid scanning in STEM causes image distortion in the X direction, because the X axis is the fast scan axis, as shown in Fig. S1a. To calculate and correct the distortion, we first compared a rapidly scanned image of the Au grating sample (Fig. S1b) with a relatively slowly scanned image of the same sample (Fig. S1c). Note that the former, as shown in the figure, was denoised by the DCFI technique. The rest of this section describes how we corrected the distortion.

Figure S1. Distortion correction process. (a) Schematic drawing of the STEM scan process. (b) Image nonlinearly distorted by rapid scanning. (c) Slow scan image. To bring the rapid scan image and the slow scan image into correspondence, a part of the rapid scan image, indicated as the n-th "window" at Xr,n, is compared with the n-th window at Xs,n in the slow scan image. From the window pair that shows the highest cross-correlation, we calculate the difference in position between the two, as shown in (d), which enables (e) the correction of the distortion. (f, g) Linear distortion in the slow scan image is corrected by an affine transformation estimated from its FFT image.
First, a "shadow" filter is applied to the raw rapid scan image to reduce image blur in the X direction, which probably originates from the finite time width of the electron detector. This filter is a kind of differential filter, which combines the original image and its differential, expressed as […]
Then a partial image corresponding to the slit-shaped "window" depicted in Fig. S1b is extracted from the rapid scan image. The horizontal and vertical sizes of this window are 20 pixels and 450 pixels, respectively. The central axis of the window is placed at a horizontal position Xr,n.
The corresponding partial image in the slow scan image is then searched for using the cross-correlation coefficient. The central position of the partial image found is denoted Xs,n in Fig. S1c. Figure S1d shows the measured difference in horizontal position, Xr,n − Xs,n, which decreases monotonically. This result shows quantitatively how the rapid scan image is shrunk in the X direction compared with the slow scan image. Importantly, the slope of the curve in […]
Although the image distortion was expected to be limited to the X direction, we also calculated the distortion in the Y direction to make sure that the Y direction was unaffected. The same procedure as for the X direction was applied, but the dimensions of the window were 20 pixels and 300 pixels in the vertical and horizontal directions, respectively. As a result, we confirmed that there was no detectable distortion in the Y direction anywhere in the image.
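The window matching described in this section can be illustrated with a minimal sketch. It is not the code used in this study: the function names (`ncc`, `window`, `match_window`), the use of a zero-mean normalized cross-correlation coefficient, the search range, and the choice of a window spanning all image rows are assumptions made for illustration. Images are given as lists of rows of pixel values.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross-correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A flat window has no contrast to correlate; treat it as "no match".
    return num / den if den > 0 else 0.0

def window(image, x_center, width):
    """Flatten the slit-shaped window (all rows, `width` columns) centred at x_center."""
    x0 = x_center - width // 2
    return [v for row in image for v in row[x0:x0 + width]]

def match_window(rapid, slow, x_r, width, search):
    """Find the window in `slow` that best matches the window at x_r in `rapid`.

    Returns (x_s, score), where x_s is the centre of the best-matching window
    within x_r +/- search (assumed search range) and score is its coefficient.
    """
    ref = window(rapid, x_r, width)
    n_cols = len(slow[0])
    best_x, best_score = None, -2.0  # any real coefficient is >= -1
    for x_s in range(x_r - search, x_r + search + 1):
        x0 = x_s - width // 2
        if x0 < 0 or x0 + width > n_cols:
            continue  # skip windows that fall outside the slow scan image
        score = ncc(ref, window(slow, x_s, width))
        if score > best_score:
            best_x, best_score = x_s, score
    return best_x, best_score
```

Repeating `match_window` for a series of window positions Xr,n and recording Xr,n − Xs,n gives a displacement curve of the kind plotted in Fig. S1d.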

B. Collection of training data for deep learning
Because the effective thickness that the electron beam penetrates changes continuously as the specimen tilts, the signal-to-noise ratio also differs for each angle. Hence, training data collected at a single tilt angle were expected to be insufficient for a noise filter applicable to images over a wide range of tilt angles. We therefore collected the training data at 5 tilt angles (0°, 20°, 40°, 60° and 70°), as explained in the main text.
We confirmed that the training data and the rapid tilt-series images (Tilt-series 1) were almost comparable in terms of the original intensity histogram. Figure S2 shows how we evaluated the intensity range of the training data and the rapid tilt-series images. In the figure, the histogram ( […]

D. Performance evaluation of U-Net-based and BM3D-based noise filters
The performance of the noise filters in this study was evaluated by the relative dislocation contrast and the relative width of the visualized dislocation line as shown in Tables 2 and 3 in the main text.
The displayed values were calculated as ratios to those of the reference images (images averaged over 50 frames). Figure S6 shows examples of the test data used for evaluation. Note that none of the images in the test data were included in the training data.
Figure S6. Examples of test data for 0°, 20°, 40°, 60° and 70°, compared with their images denoised by the U-Net and BM3D filters and with the reference images.
Figure S7 shows the distribution of pixel values extracted along a line crossing a dislocation in a reference image, together with the line profiles obtained from the corresponding BM3D-based denoised image and the U-Net-based one, indicated by the green and red lines, respectively. Here, the contrast of a dislocation was defined as the depth of the local minimum relative to the surrounding intensity (indicated as "1" in Fig. S7). In the case of Fig. S7, the left side of the dislocation has a higher intensity than the right side. For such an asymmetric dip, we always selected the higher side as the standard level for the contrast measurement. The width of the contrast was measured at half the depth of the red bar 1 (indicated as "2" in Fig. S7). We obtained 10 line profiles from each of the images for 0°, 20°, 40°, 60° and 70° (Fig. S6) and measured the contrast and the width (1 and 2) following the above procedure to evaluate the deterioration of spatial resolution and visibility.
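The contrast and width measurement on a line profile can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the level of each side of the dip is taken as the mean of the outermost `flank` samples (the text does not say how the side levels were read), and the half-depth crossings are located by linear interpolation between samples.

```python
def dip_contrast_and_width(profile, flank=3):
    """Return (contrast, width) of the deepest dip in a 1D line profile.

    contrast: standard level minus the minimum, where the standard level is
              the higher of the two side levels (asymmetric dips included).
    width:    full width at half depth, in samples; None if the profile never
              rises back to the half-depth level on one of the two sides.
    """
    i_min = min(range(len(profile)), key=lambda i: profile[i])
    left_level = sum(profile[:flank]) / flank      # assumed estimate of the left side level
    right_level = sum(profile[-flank:]) / flank    # assumed estimate of the right side level
    baseline = max(left_level, right_level)        # always use the higher side
    contrast = baseline - profile[i_min]
    half = baseline - contrast / 2.0

    def crossing(step):
        # Walk outward from the minimum until the profile crosses the half level.
        i = i_min
        while 0 <= i + step < len(profile):
            lo, hi = profile[i], profile[i + step]
            if lo < half <= hi:
                return i + step * (half - lo) / (hi - lo)
            i += step
        return None

    left, right = crossing(-1), crossing(+1)
    width = None if left is None or right is None else right - left
    return contrast, width

def relative_to_reference(filtered, reference, flank=3):
    """Contrast and width of a filtered profile as ratios to the reference profile.

    Assumes the width is defined (not None) for both profiles.
    """
    c_f, w_f = dip_contrast_and_width(filtered, flank)
    c_r, w_r = dip_contrast_and_width(reference, flank)
    return c_f / c_r, w_f / w_r
```

A relative contrast below 1 indicates reduced visibility of the dislocation after filtering, and a relative width above 1 indicates a broadened line, that is, a loss of spatial resolution.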

E. Optimization of BM3D-based noise filter
To use BM3D-based denoising, we needed to specify the noise type of the image and its variance in advance.
Nine types of noise are assumed in the BM3D-based noise filter used in this study [S1], and we can choose the optimum noise type and adjust the variance of the noise amplitude distribution. We determined the best choice of noise type and noise variance by evaluating the PSNR relative to the reference image (averaged image). Taking the noise type called "gw" as an example, Figure S8 shows the variation of the PSNR, demonstrating that the peak (the best performance of BM3D-based denoising) appears at an assumed noise variance of around 0.002. By calculating the PSNR over noise variances from 0.0001 to 0.0125 and over all nine types of noise, we determined the best combination for each image. The noise filter optimized in this way was compared with the U-Net-based noise filter.
Figure S8. Variation of PSNR with standard deviation of noise.
Figure S9 shows a representative yz cross-section of the noise-filtered Rapid 3D, where the seven cross-sections of dislocations form local maxima (indicated by red ellipses). To determine a reasonable area for the summation in the weighted average (Eq. 1 in the main text), we first detected the local maximum for each of the dislocations, as shown in Fig. S9. Each summation was performed within the 100 nm × 100 nm area centered at the local maximum found.
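The search for the best BM3D noise type and noise variance described in Section E amounts to an exhaustive search that maximizes the PSNR against the reference image. The sketch below illustrates this under assumptions: `denoise` is a hypothetical callable standing in for the BM3D filter (its signature is not taken from any library), images are lists of rows, and pixel values are assumed to lie in the range 0 to `peak`.

```python
import math

def psnr(image, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    sq_err = 0.0
    n = 0
    for row_i, row_r in zip(image, reference):
        for v, r in zip(row_i, row_r):
            sq_err += (v - r) ** 2
            n += 1
    mse = sq_err / n
    # Identical images have zero error, i.e. an unbounded PSNR.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def best_filter_setting(noisy, reference, denoise, noise_types, variances, peak=1.0):
    """Exhaustive search for the (noise type, variance) pair that maximizes PSNR.

    denoise(noisy, noise_type, variance) is a user-supplied stand-in for the
    BM3D filter (hypothetical signature). Returns (best PSNR, noise type, variance).
    """
    best = None
    for noise_type in noise_types:
        for variance in variances:
            score = psnr(denoise(noisy, noise_type, variance), reference, peak)
            if best is None or score > best[0]:
                best = (score, noise_type, variance)
    return best
```

Running `best_filter_setting` once per image, with the list of nine noise types and a grid of variances between 0.0001 and 0.0125, would give the per-image optimum described in Section E.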

G. Image binarization
For ideal tomographic reconstruction, tilt-series images must satisfy the projection requirement, that is, the intensity in each of the tilt-series images must be a monotonic function of the targeted physical quantity integrated along the electron beam trajectory. However, raw tilt-series images of dislocations in a slab-shaped sample hardly satisfy the projection requirement, since the dislocation contrast depends not only on the thickness of the lattice strain distribution but also on the total sample thickness (the thickness of the surrounding matrix).
To eliminate such an undesirable sample-thickness dependence, the following binarization was performed on each of the images in both Tilt-series 2 (rapid STEM tomography) and Tilt-series 3 (slow STEM tomography) before 3D reconstruction, using ImageJ and OpenCV […]
Unlike in conventional STEM tomography, in rapid STEM tomography it is hard to record the accurate tilt angle from the angle indicator of the goniometer, because the sample tilts at a non-uniformly high speed without stopping from the initial angle to the final one. Therefore, we calculated the tilt angle for each frame from the tilt-series images themselves.
First, we found a pair of dislocation tips as "feature points" in an image, which were then also marked throughout the series of images (Fig. S11a). If the line connecting the feature points is not tilted, the vertical distance between the feature points should be at its longest, as shown in Fig. S11b. Therefore, the tilt angle in the a-th frame, $\theta_a$, can be estimated by comparing the distance $d_a$ across all frames, i.e., it is expressed as
$$\theta_a = \arccos\left(\frac{d_a}{d_{\max}}\right),$$
where $d_{\max}$ is the maximum distance in a tilt series. When $d_a = d_{\max}$, $\theta_a$ becomes 0°, meaning that the line connecting the feature points lies horizontally in the sample. Since the 3D reconstruction requires only relative angles among the frames, we can define the origin of the tilt angle arbitrarily. In other words, the surface of the slab sample need not be perpendicular to the electron beam at a tilt angle of 0°. To reduce the error, we measured $\theta_{a,b}$ (b = 1, 2, ⋯, 10) from 10 pairs of feature points. However, we cannot simply average over the different pairs, since the connecting lines are tilted from each other by certain angles. The angle difference of the b-th pair from the 1st pair is calculated as
$$\Delta\theta_b = \frac{1}{n}\sum_{a=1}^{n}\left(\theta_{a,b} - \theta_{a,1}\right),$$
where n is the number of total frames. By subtracting this angle difference from the original tilt angle, i.e., $\theta_{a,b} - \Delta\theta_b$, all the angle origins are unified to the origin of $\theta_{a,1}$. Finally, the tilt angle of the a-th frame is estimated as
$$\bar{\theta}_a = \frac{1}{m}\sum_{b=1}^{m}\left(\theta_{a,b} - \Delta\theta_b\right),$$
where m is the number of calculated distances (m = 10 in this study). In this way, we obtained the relationship between frame number and tilt angle shown in Fig. S12.
Figure S11. An example of how to obtain the vertical distance between two points, namely the "feature points". (a) Representative images in the rapid tilt series with markers indicating the vertical distance between the feature points. Since these images were obtained as projections of a specimen, the distance changes as the specimen tilts, as schematically drawn in (b).
Figure S12. Variation of tilt angle with frame number.
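The tilt-angle estimation from feature-point distances can be sketched as below. This is a minimal illustration, not the code used in this study. It assumes that the angle of each pair is obtained as the arccosine of the ratio of the vertical distance to its maximum over the series, that each pair passes through its maximum distance within the series, and that frames before the maximum are given a negative sign (a sign convention added here, since the arccosine alone cannot distinguish the two tilt directions). The function name `tilt_angles` and the data layout are hypothetical.

```python
import math

def tilt_angles(distances):
    """Estimate the tilt angle (in degrees) of every frame.

    distances[b][a] is the vertical distance between the feature points of
    pair b in frame a. The returned angles share the origin of pair 0.
    """
    m = len(distances)        # number of feature-point pairs
    n = len(distances[0])     # number of frames

    # Per-pair angle of each frame from the ratio to the maximum distance.
    theta = []
    for d in distances:
        d_max = max(d)
        a_max = d.index(d_max)
        row = []
        for a, d_a in enumerate(d):
            angle = math.degrees(math.acos(min(1.0, d_a / d_max)))
            # Assumed sign convention: negative before the frame of maximum distance.
            row.append(-angle if a < a_max else angle)
        theta.append(row)

    # Mean angle difference of each pair from the first pair, over all frames.
    offsets = [sum(theta[b][a] - theta[0][a] for a in range(n)) / n
               for b in range(m)]

    # Unify the angle origins, then average over the pairs.
    return [sum(theta[b][a] - offsets[b] for b in range(m)) / m
            for a in range(n)]
```

Averaging only after the per-pair offsets are subtracted matters because each connecting line has its own inclination in the specimen, so its zero-angle frame differs from pair to pair.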