Deep learning ferroelectric polarization distributions from STEM data via with and without atom finding

Over the last decade, scanning transmission electron microscopy (STEM) has emerged as a powerful tool for probing atomic structures of complex materials with picometer precision, opening the pathway toward exploring ferroelectric, ferroelastic, and chemical phenomena on the atomic scale. Analyses to date extracting a polarization signal from lattice coupled distortions in STEM imaging rely on discovery of atomic positions from intensity maxima/minima and subsequent calculation of polarization and other order parameter fields from the atomic displacements. Here, we explore the feasibility of polarization mapping directly from the analysis of STEM images using deep convolutional neural networks (DCNNs). In this approach, the DCNN is trained on the labeled part of the image (i.e., for human labelling), and the trained network is subsequently applied to other images. We explore the effects of the choice of the descriptors (centered on atomic columns and grid-based), the effects of observational bias, and whether the network trained on one composition can be applied to a different one. This analysis demonstrates the tremendous potential of the DCNN for the analysis of high-resolution STEM imaging and spectral data and highlights the associated limitations.

These considerations have stimulated extensive efforts toward exploring ferroelectric materials on the atomic level via (scanning) transmission electron microscopy, (S)TEM. The feasibility of visualizing polarization fields by TEM was first demonstrated in the late 1990s by Pan 32 . A decade later, work by Jia demonstrated the potential of TEM for mapping polarization behavior at the level of individual structural 33 and topological 34,35 defects. At about the same time, groups at Oak Ridge National Laboratory [36][37][38] and the University of Michigan 39 demonstrated STEM imaging of polarization in ferroelectrics, igniting rapid growth in this field. In these studies, STEM data is used to directly position the centroids of atomic columns and then the unit-cell-scale dipoles are calculated from the product of the displacements with associated Born or Bader charges 40 . Multiple observations of polarization distribution on topological defects [41][42][43] , interfaces 44 , modulated structures 45 , and extended defects 33,46 have been reported.
These studies have not only offered visualization of the polarization fields but have also allowed quantitative insights into the physics of ferroelectric materials. In the mesoscopic Ginzburg-Landau models, the structure of polarization distributions in the vicinity of domain walls or interfaces is intrinsically linked to the structure of the free energy functional, its gradient or flexoelectric terms, and the boundary conditions [47][48][49] . Correspondingly, quantitative analysis of STEM data can provide insight regarding the corresponding mechanisms 43,50 . Recently, this analysis has been extended toward the Bayesian analysis of domain wall structures, allowing incorporation of past knowledge of materials physics into the model and quantifying the requirements to microscopic systems required to identify specific aspects of physical behaviors 51 .
These analyses necessitate understanding of the veracity of the polarization analysis from STEM images and further necessitate the development of image analysis tools that allow rapid transformation of the STEM images into polarization fields, both as a first step toward physics-based analyses and as a necessary step toward automated experimentation with image-based feedback. Here, we explore the applications of deep convolutional neural networks (DCNNs) for reconstruction and segmentation of STEM images of ferroelectric materials and explore some of the potential sources of observational biases in this analysis.

RESULTS AND DISCUSSION
Polarization field mapping using atomic position parametrization As a model system we explore a thin film of the Sm-doped ferroelectric BiFeO 3 (BFO) epitaxially grown on a SrTiO 3 (STO) substrate as a combinatorial library with Sm concentration varying from 0 to 20%. Several Sm x Bi 1−x FeO 3 STEM samples with different substitution concentration x are obtained from one composition spread 51,52 spanning x = 0% (pure BiFeO 3 (BFO)) to 20% (Bi 0.8 Sm 0.2 FeO 3 ). For BFO the ferroelectric polarization strongly couples with the lattice, notably the heavy cation Bi and Fe sublattices which are readily imaged by atomic-resolution STEM, and this cation non-centrosymmetry is used as a proxy for the ferroelectric polarization vector. STEM images are collected using a high-angle annular dark field (HAADF) detector, which for zone-axis projected crystalline materials produce intuitive bright-atom contrasts images such as that shown in Fig. 1a for [100] psuedocubic BFO. The growth parameters, sample preparation, and imaging details are the same as in our previous publications 51,52 . Each dataset has been corrected for scanning aberrations using a reconstruction from pairs of scans taken at orthogonal angles 53 . This improves the preservation of spatial information of raster-scan datasets, as from sample drift, notably in this case the atom nearestneighbor positions underlying the local analysis and limiting distortions in subimages presented to the DCNN.
The spatial distribution of lattice structures and symmetry breaking distortions can be derived from the real-space positions of the atoms, relying on parameterizations of the atomic columns that are typically fitted as Gaussians. This process is illustrated in Fig. 1 for mapping the distribution of polarization in pure rhombohedral BFO that manifest in a phase offset between the local Bi A-site and Fe B-site sublattices. Figure 1b depicts a local neighborhood of Gaussian fits corresponding to the inset HAADF-STEM image. This polar displacement vector, P, is defined as the difference between the central Fe (red) position and the average of the four neighbor Bi atom positions (blue) or vice-versa for a Bi-centered cell. The colorized vector distribution for the HAADF-STEM image is shown in Fig. 1c, illustrating the polydomain polarization distribution, which is dominated by a 109°-type domain wall bisecting the image. An upper bound of the positional error of this parameterization can be made by measuring the total variation from the ideal uniform lattice spacings. Heat maps of the variation from mean values are shown for the Bi A-site and Fe B-site sublattices in Fig. 1d for a distance of 1 Å with root mean square (RMS) values of 9.7 pm and 14.6 pm, respectively, illustrating the greater uncertainty of fitting the dimmer Fe positions. This is also apparent from uncertainty estimates from the Gaussian-fit optimization function, as shown in the histograms of Fig. 1e. Some measurements can be made on the higher precision A-site sublattice alone, such as lattice spacings/strain, but the polar displacement requires both sublattices and thus, the Fe site is the most significant error contribution. The spatial distribution of P error estimates from constituent atomic fitting is shown in Fig. 1f showing that the uncertainty systematically increases in some regions such as the STO substrate at the top and closer to the free surface (bottom of image, especially at right).
The process of polarization field mapping by this approach is computationally intensive, requiring identifying all the atoms in the system, a fitting refinement of their position, and mapping neighbor relationships. In practice manual input is often necessary too in order to curate, threshold, filter/smooth, set parameter fitting bounds, remove lattice defects, etc. Furthermore, as with any point estimate it is also associated with relatively high noise. Similarly, the use of the ad-hoc Gaussian fitting to position the atomic column center as opposed to deconvolution using the correct beam profile leads to systematic fitting errors. Finally, measurement artifacts associated with zone-axis mis-tilt can also manifest as sublattice phase offsets, leading to systematic errors of this measurement that are independent of polarization values 54,55 . In practice this leads to observed polarization values of opposite domains mirrored at a domain wall to exhibit unequal magnitudes, or the appearance of non-centrosymmetry in centrosymmetric materials. A (D)CNN trained with labels derived from atom- finding alone is not expected to mitigate all these factors, to the contrary, it will attempt to faithfully reproduce the labels inclusive of any systemic errors. However, the accuracy and robustness of the DCNN against established methods is a critical first step before training with more advanced labeling from wider parameter multi-datasets (e.g., incorporating thickness, mistilt, multimodal signals, etc).

Deep learning polarization via convolution networks
DCNNs do offer a significant advantage in speed. For DCNNs, the principal time and computation are front-loaded into the training of the network. Once complete, inference of new datasets can be done with modest hardware in real-time. Atom finding methods vary but generally high precision measurement is accomplished with parametric fits to gaussian functions using numerical The anomalous regions along with their respective spatial locations (both marked in red) associated with the predictions are also shown in c, d, g, h, and k, l for measured P x , measured unit-cell volume and measured unit-cell size, respectively.  regression (e.g., Gauss-Newton, Levenberg-Marquardt, Trustregion). Computation times are highly dependent on the specifics of the dataset and fitting method/environment, but a key disadvantage against DCNNs is these underlying iterative solver methods require large numbers of function evaluations which cannot be parallelized across iterations, hampering implementation on Graphics Processing Units (GPUs). As an illustrative comparison of the fitting in this work was performed using the trust-region reflective method for 5-parameter (elliptical) Gaussians using the least_squares function from the SciPy library 56 . Both the atom finding and DCNN inference were executed on the Google Colab platform. The Gaussian fit operation for the BiFeO 3 dataset in Fig. 1 took an average of 39 ms per atom over 47,583 total atoms in this dataset for a total execution time of 30 min 40 s. This is roughly on par with the training time of the DCNN discussed below. Inference for the same DCNN of all the atomcentered subimages subdivided into training (4742) and test (3197) sets took 1 s and 0.7 s, respectively, or 1.7 s total. It should be emphasized that the very large (3 orders of magnitude) disparity reflects their optimization to different execution environments, GPU vs CPU. DCNNs offer the potential to perform this processing in real-time or to significantly reduce computational requirements for batch processing of large datasets.
We explore the applications of supervised DCNNs for the extraction of polarization and other structural descriptors from STEM image data with and without atom finding. All details of this framework can also be found in the accompanying Jupyter notebooks. As a first step, we establish whether DCNN analysis can substitute for classical featurization of the STEM images if atomic positions are predetermined, e.g., using deep learning atom finding algorithms 57,58 .
Here, we implement the DCNN models in PyTorch deep learning framework 59 models with three convolution blocks; the first one contains five 2D convolution layers with 32 filters each; the second has two 2D convolution layers with 64 filters each; and the third has one convolution block with two 2D convolution layers having 128 filters each. The leaky rectified linear unit (LReLU) is considered as the activation function in all these blocks. A 2D max pooling layer for dimensionality reduction is also added at the end of the second convolution block. A dropout for preventing overfitting and a batch normalization layer for training networks in mini batches are added toward the very end of the network architecture. The feature set is the sub-images (80*80), whereas the target vector is the unit-cell descriptors such as unitcell parameters, volume, and polarization vector components. Figure 2 shows the comparisons of the of the DCNN predictions and the ground truth data. The individual points and kernel density estimates for the distribution are shown as a way to visualize both the average behaviors and outliers. The observed dynamics are rather remarkable.
Further shown in Fig. 3 are the joint distributions of the measured and DCNN-predicted polarization components. For most of the locations, the DCNN-predicted parameters tend to have narrower distributions than the original (measured) values. This behavior is expected since DCNNs tend to smooth the data. However, for extreme values of the parameters the DCNN predictions start to deviate strongly, leading to unphysical predicted values. The distribution is clearly multimodal, reflecting the ferroic variants present in the system. Remarkably, the observed maxima are asymmetric, suggesting that the polarization values extracted from the STEM images contain systematic errors, as from mistilt 54,55 . The corresponding distributions for the a and b lattice parameters are shown in Fig. 3d-f where the maxima corresponding to the film and substrate are clearly seen.
While the analyses in Figs. 2 and 3 show reduced noise levels compared to classical analyses, they only offer a partial advantage compared to the classical approach since both are based on identification of atomic position. Here, we further explore whether the DCNN approach can be used for mapping polarization fields in the raw STEM images without using atom finding. We note that this is expected to be feasible given the DCNNs are invariant to translations in the image plane.

Sliding window approach
To explore this, we configured a 'sliding window' approach to generate sub-images that are not centered around atomic centers. For a predefined window size, parts of the STEM images lying inside the window are first considered. These form the feature set for the DCNN training, i.e., local descriptor. To create the target set, i.e., the corresponding polarization or unit-cell volume value, we adopt the following approach. First, an upper bound of the cation-cation average interatomic distance along with a minimum distance to the centers of the identified unit-cells is set. The upper bound signifies the distance at which the contribution of a unitcell to polarization becomes zero. Next, a KDTree algorithm as implemented in Scipy library 56 is employed to query all the closest neighbors for a given list of coordinates to select only those falling within the specified maximum distance. The maximum distance is chosen such that neighboring 4-unit cells are situated at the same distance from each patch created by the specified window size.
The resulting (x,y) coordinates, corresponding polarization values along the x, y directions, and sub-images are utilized for training.
Each stack of sub-images and the associated physical values for different concentrations of Sm are used to build the networks. Here, the inputs (X) are the subimages generated utilizing window size of 80 pixels or 1.25 nm, and the output (y) is the polarization values. A set of 26 models trained on various training sets with all 13 stacks of sub-images (generated from individual input images) with a window size of 80 pixels, centered (C) and not centered (NC) around atoms utilizing the same architecture as above.
Polarization maps are generated by plotting the measured and predicted polarization values, as shown in Fig. 4. Specifically, the maps in (a-d), (e-h), and (i-l) represent the true polarization values, predicted ones by networks trained on the same Sm concentrations, relative differences between them as well as distribution of errors in predictions for 0%, 7%, and 10% Sm concentrations, respectively. Note that while 0% Sm corresponds to the pure rhombohedral ferroelectric BiFeO 3 , the 7-10% doping corresponds to the monoclinic phases at the morphotropic boundary and 20% corresponds to orthorhombic nonferroelectric phase. In all cases, the uncertainty is relatively low, assuring reasonable performance of these networks. We note that the relative percentage errors are significantly low in most of the regions except few boundary points. A comparison between the ground truth (d, i, n) and predicted (e, j, o) P x and P y components are shown using 2D contour plots. The relatively minor differences (outliers) can be attributed to possible mistilts, systematic errors in the observed polarization values. We have also added the 1D distributions of relative percentage errors in the Supplementary  Fig. 1. A more refined sampling approach can be explored to selectively train networks on datapoints to exclude outliers leading to higher errors, which is beyond the current scope of the paper. How extendable these predictions are (as discussed later), meaning if trained on one and applied to another can lead  to similar accuracies, is also extremely important to further show robustness of such networks.

Feature maps
To gain insight into the DCNN operations, we constructed feature maps for individual trained DCNNs illustrating how the input is transformed passing through the convolution layers. Once an input image is passed through a specific block, layer, and filter, the immediate activations are recorded, which are plotted to visualize the corresponding encoded features. For each layer, there are multiple (32 or 64 or 128) filters yielding individual feature maps. For example, for one convolution layer with 32 filters, a sum of 32 feature maps can be plotted corresponding to each filter for that specific layer. Figure 5 shows selected feature maps for four convolution layers of the first block of the networks. The DCNNs that are trained on a stack of sub-images that are both NC (Fig.  5a-d) and C (Fig. 5e-h) around atoms are utilized for constructing these representative maps. From these feature maps, it is evident that atoms in both lattices become more prominent in each filter as we progress from one layer to the next one.
In addition to feature maps, we also visualized CNN filters present in different blocks (similar to the celebrated DeepDream 60 approach). These visualizations primarily display the patterns each filter maximally respond to. Any random image (could be one from one of the sub-image stacks) is considered as input. A loss function maximizing the value of the CNN filter is used to iteratively perform gradient ascent in the input space such that the algorithms find input values where the filter is activated the most. Figure 6 has a few representative visualizations of how the first three layers Fig. 6b-d of the first convolution block are activated as a random image Fig. 6a is selected as an input to this specific network.
The activations in the last kernel for three consecutive filters are shown in Fig. 6. This analysis not only helps to understand the network architecture in greater detail but also shows how layers located deeper in the network facilitate in visualizing more training data-specific features. In the specific example in Fig. 6, the network follows the same trend where visualization of the third layer Fig. 6d displays more patterns as compared to that in Fig. 6b or Fig. 6c. Feature maps for all the filters for all the convolution layers present in each convolution block, as well as examples of CNN filter visualization of different blocks, can be found in the accompanying Jupyter Notebooks.

Noise-sensitivity performance
To test the extendibility and applicability of DCNNs trained with the best available dataset comprising NC and C sub-images, these networks are utilized to also predict polarization values for a set of noisy images. A set of noisy images for both NC and C sub-images are generated by adding five different magnitudes of gauss noise (GN) such as (10,100,200,500,2000 in arb. units) for varied window sizes of 10, 20, 40, 80, and 160. Here, each given value such as 10, 100 etc. is multiplied by 10 −4 and passed via scikit random noise function to add GN to the images. An example of how the noisy images appear is shown in Fig. 7, where (a-e) and (h-l) are the noisy NC and C sub-images (frame #0), respectively, with a window size of 80 with GNs added individually from the set of noises. Reference images of the same window size and subimages with no added noise are shown in f and m. The DCNNs for each window size as trained on a dataset without any additional noise introduced to the sub-images are then considered as pretrained models to evaluate the mean square error (MSE) values of predicted polarization corresponding to noisy sub-images and real polarization values. The error matrices (log (MSE)) for the complete set of 26 stacks are represented as heatmaps in Fig. 7g, n for NC and C sub-images, respectively. As the magnitude of noise increases, it becomes harder for the DCNNs to recover features, leading to larger errors between the predicted and measured polarization values. This behavior is somewhat expected. It is also safe to say that for sub-images created with smaller window sizes with higher noises added, the corresponding DCNNs will have much lower performance as compared to that for moderate window sizes. For example, the difference between real and predicted polarizations of sub-images for a window size of 10 and GN = 1000 is much higher as compared to those for a window size of 80 and GN = 1000, as evident from both (g) and (n) heatmaps.
To evaluate the performance of the trained networks, we computed the MSE for all 26 networks as applied to all 13 subimage stacks for both NC and C as represented by heatmaps, as shown in Fig. 8a, b, respectively. The stack of sub-images with lesser concentrations of Sm have high polarization values, meaning these systems are more ferroelectric in nature as compared to counterparts with higher dopants concentrations. Therefore, it is expected that deep NNs trained on 0% Sm should exhibit higher performance as applied to system with 20% Sm concentration. However, a network with information on lessferroelectric to non-ferroelectric systems should fail to predict higher polarization values for the systems with less Sm concentrations. This behavior is further illustrated by Fig. 8c-f, g-j as a couple of DCNNs trained on 0% and 20% Sm concentrations are applied to (0%, 7%) and (20%, 0%) dopant concentrations, respectively. The diverging colors (lighter to deeper shades) represent low to high polarization values. Figure  8c, g and d, h shows how the network trained on ferroelectric image is successful in predicting a range of high-low polarization values. To the contrary, a DCNN trained on sub-images with a 20% Sm concentration only yields reasonable performances when applied to the training set (e, j) and fails to predict high polarization values.
Simulated ADF-STEM images To gain insight into the possible origins of the observed behaviors, including asymmetric polarization distributions and cross-training, we simulated ADF-STEM images for several tilt values off from the zone axis. Calculations were carried out using the μSTEM program and the quantum excitation of phonons algorithm 61 . An accelerating voltage of 200 kV and probe forming aperture of 30 mrad was used. The specimen was assumed to be 100 nm thick and the ADF detector spanned 65-250 mrad. In order to simplify the calculation, the unit-cell angles were adjusted from 59.34 to 60 degrees so it could be converted into a cubic structure that would be more amenable to multi-slice calculations. This small change will have a minimal effect on the qualitative examination of specimen tilt. In Fig. 9a the simulated ADF-STEM image for an untilted BFO specimen is shown. Figure 9b-d shows increasing tilts with 10 mrad increments; all tilts are clockwise about the vertical y axis. While increasing tilt leads to a reduction in the image contrast, it is unclear to the naked eye if there is a relative shift in the cation positions. To examine this effect, line scans acquired across the Bi columns are shown in Fig. 9e. For the 30 mrad tilt, a shift of the apparent Bi column is evident. In Fig. 9f, line scans acquired across the Fe columns are shown and any shifts in the apparent position are much smaller. To quantify this effect, atom finding routines are used to locate the apparent position of the two atoms circled in Fig. 9a. In Fig. 9g we plot the shifts in the x and y directions for each atom relative to the untilted simulation. It is clear that the Bi column shifts approximately twice as far as the Fe column, which would result in an apparent change of polarization.
To summarize, we developed an approach for the analysis of atomically resolved STEM image data of ferroelectric materials to extract local polarization based on sub-image analysis. We demonstrate that the application of DCNN-based regression on sub-images centered on a given sub-lattice yields values similar to direct column position analysis. It should be noted that in both cases, the derived values are biased compared to the expected values. We attribute this behavior to the effect of sample mis-tilt during imaging. Correspondingly, dynamic correction of this effect becomes a key element of the quantitative STEM image and may necessitate the development of tools adjusting the specimen tilt at different parts of the image. We further show that the polarization fields can be visualized from the STEM images without atom finding using DCNN analysis of atom-centered sub-images and arbitrarily selected sub-images bypassing the atom finding stage. This approach was found to give the correct polarization values for the majority of the image and can be readily incorporated during data acquisition. However, the presence of local defects (i.e., out of distribution data) leads to significant errors in the prediction at certain locations. These can be further used to identify sites for automated experiments. Overall, the translational invariance built in into the DCNN structure can significantly facilitate the extraction of physical order parameter fields from structural and potentially highdimensional data.

Materials and characterization
All material systems utilized in this project are part of a publicly accessible combinatorial library consisting pulsed layer deposition fabricated layers of Sm-doped BiFeO 3 and the SrRuO 3 . All relevant details of sample preparations and STEM, TEM characterization measurements can also be found in this reference 52 .

CODE AVAILABILITY
All the deep learning routines were implemented using a home-built open-source software package AtomAI (https://github.com/pycroscopy/atomai). All details of the developed framework are available via two interactive Jupyter notebooks accessible at https://github.com/aghosh92/DCNN_Ferroics.