Deep learning polarization distributions in ferroelectrics from STEM data: with and without atom finding

Over the last decade, scanning transmission electron microscopy (STEM) has emerged as a powerful tool for probing atomic structures of complex materials with picometer precision, opening the pathway toward exploring ferroelectric, ferroelastic, and chemical phenomena on the atomic scale. Analyses to date extracting a polarization signal from lattice-coupled distortions in STEM imaging rely on discovery of atomic positions from intensity maxima/minima and subsequent calculation of polarization and other order parameter fields from the atomic displacements. Here, we explore the feasibility of polarization mapping directly from the analysis of STEM images using deep convolutional neural networks (DCNNs). In this approach, the DCNN is trained on a labeled part of the image (i.e., one labeled by a human), and the trained network is subsequently applied to other images. We explore the effects of the choice of descriptors (centered on atomic columns versus grid-based), the effects of observational bias, and whether a network trained on one composition can be applied to a different one. This analysis demonstrates the tremendous potential of DCNNs for the analysis of high-resolution STEM imaging and spectral data and highlights the associated limitations.

The functionality of ferroelectric materials is inseparably linked to the static distributions and dynamic behaviors of the polarization. [1][2][3][4][5] The discontinuity of polarization is associated with the emergence of bound charge, resulting in strong coupling between the polarization and electrochemical, [6][7][8][9][10][11][12][13] semiconductive, 14-17 and transport phenomena. [18][19][20][21][22][23][24] Compared to ferromagnets, ferroelectrics have extremely short correlation lengths and domain wall widths, on the order of several unit cells. This results in an extreme sensitivity of the polarization dynamics to the atomic structure. For example, since the early work of Miller and Weinreich 25 and Burtsev and Chervonobrodov, [26][27][28] it has been realized that domain wall motion proceeds via the generation of kinks in the domain walls. This further results in strong interactions between topological defects in ferroelectrics and charged impurities, giving rise to unique functionalities of ferroelectric relaxors. [29][30][31] These considerations have stimulated extensive efforts toward exploring ferroelectric materials on the atomic level via (scanning) transmission electron microscopy, (S)TEM. The feasibility of visualizing polarization fields by TEM was first demonstrated in the late 1990s by Pan. 32 A decade later, work by Jia demonstrated the potential of TEM for mapping polarization behavior at the level of individual structural 33 and topological 34,35 defects. At about the same time, groups at Oak Ridge National Laboratory [36][37][38] and the University of Michigan 39 demonstrated STEM imaging of polarization in ferroelectrics, igniting rapid growth in this field. In these studies, STEM data are used to directly locate the centroids of atomic columns, and the unit-cell-scale dipoles are then calculated from the product of the displacements with the associated Born or Bader charges.
40 Multiple observations of polarization distributions at topological defects, [41][42][43] interfaces, 44 modulated structures, 45 and extended defects 33,46 have been reported.
These studies have not only offered visualization of the polarization fields but have also allowed quantitative insights into the physics of ferroelectric materials. In mesoscopic Ginzburg-Landau models, the structure of polarization distributions in the vicinity of domain walls or interfaces is intrinsically linked to the structure of the free energy functional, its gradient or flexoelectric terms, and the boundary conditions. [47][48][49] Correspondingly, quantitative analysis of STEM data can provide insight into the corresponding mechanisms. 43,50 Recently, this analysis has been extended toward Bayesian analysis of domain wall structures, allowing incorporation of prior knowledge of materials physics into the model and quantifying the microscopy requirements needed to identify specific aspects of physical behavior. 51 These analyses necessitate an understanding of the veracity of polarization analysis from STEM images, and further necessitate the development of image analysis tools that allow rapid transformation of STEM images into polarization fields, both as a first step toward physics-based analyses and as a necessary step toward automated experimentation with image-based feedback. Here, we explore the applications of deep convolutional neural networks (DCNNs) for reconstruction and segmentation of STEM images of ferroelectric materials and explore some of the potential sources of observational bias in this analysis. As a model system, we explore a thin film of the Sm-doped ferroelectric BiFeO3 (BFO) epitaxially grown on a SrTiO3 (STO) substrate as a combinatorial library with Sm concentration varying from 0 to 20%. Several SmxBi1-xFeO3 STEM samples with different substitution concentrations x are obtained from one composition spread 51,52 spanning x = 0% (pure BiFeO3 (BFO)) to 20% (Bi0.8Sm0.2FeO3).
For BFO, the ferroelectric polarization strongly couples with the lattice, notably the heavy-cation Bi and Fe sublattices, which are readily imaged by atomic-resolution STEM; this cation non-centrosymmetry is used as a proxy for the ferroelectric polarization vector. STEM images are collected using a high-angle annular dark field (HAADF) detector, which for zone-axis-projected crystalline materials produces intuitive bright-atom contrast images such as that shown in Figure 1a for [100]pseudocubic BFO. The growth parameters, sample preparation, and imaging details are the same as in our previous publications. 51,52 The data set for the composition series is publicly available at DOI 10.5281/zenodo.4555978.
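Per unit cell, this non-centrosymmetry proxy reduces to a simple geometric calculation: the offset of the B-site (Fe) column from the centroid of its four surrounding A-site (Bi) columns. A minimal sketch with made-up image coordinates (the actual analysis further scales such displacements by Born or Bader effective charges):

```python
import numpy as np

# Four Bi (A-site) column positions forming the unit-cell cage, and the
# Fe (B-site) column inside it; coordinates are illustrative, in pixels.
bi_cage = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
fe = np.array([2.1, 1.8])

# Polar displacement proxy: Fe offset from the A-site cage centroid.
displacement = fe - bi_cage.mean(axis=0)
print(displacement)  # → [ 0.1 -0.2]
```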
The spatial distribution of lattice structures and symmetry-breaking distortions can be derived from the real-space positions of the atoms, relying on parameterizations of the atomic columns that are typically fitted as Gaussians. This process is illustrated in Figure 1 for mapping the distribution of polarization in pure rhombohedral BFO, which manifests as a phase offset between the local Bi A-site and Fe B-site sublattices (Figure 1). This is also apparent from the uncertainty estimates of the Gaussian-fit optimization function, as shown in the histograms of Figure 1(e). Some measurements, such as lattice spacings/strain, can be made on the higher-precision A-site sublattice alone, but the polar displacement requires both sublattices, and thus the Fe site is the most significant error contribution. The spatial distribution of P error estimates from the constituent atomic fitting is shown in Figure 1. The process of polarization field mapping by this approach is computationally intensive, requiring identification of all the atoms in the system, a fitting refinement of their positions, and mapping of neighbor relationships. In practice, manual input is often necessary as well, in order to curate, threshold, filter/smooth, set parameter fitting bounds, remove lattice defects, etc. Furthermore, as with any point estimate, it is also associated with relatively high noise. Similarly, the use of ad-hoc Gaussian fitting to position the atomic column center, as opposed to deconvolution using the correct beam profile, leads to systematic fitting errors. Finally, measurement artifacts associated with zone-axis mis-tilt can also manifest as sublattice phase offsets, leading to systematic errors in this measurement that are independent of the polarization values. 53,54 In practice, this causes the observed polarization values of opposite domains mirrored at a domain wall to exhibit unequal magnitudes, or leads to the appearance of non-centrosymmetry in centrosymmetric materials.
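The Gaussian-refinement step of the classical pipeline can be sketched as below using scipy; the function names, patch size, and initial-guess choices are illustrative, not the authors' code:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian2d(xy, amp, x0, y0, sigma, offset):
    """Isotropic 2D Gaussian, flattened for curve_fit."""
    x, y = xy
    return (amp * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
            + offset).ravel()

def refine_column(patch):
    """Return the sub-pixel (x0, y0) column center within `patch`."""
    y, x = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    p0 = (patch.max() - patch.min(), patch.shape[1] / 2,
          patch.shape[0] / 2, 2.0, patch.min())  # crude initial guess
    popt, _ = curve_fit(gaussian2d, (x, y), patch.ravel(), p0=p0)
    return popt[1], popt[2]

# Synthetic noise-free patch with a column centered at (7.3, 8.6)
yy, xx = np.mgrid[0:16, 0:16]
patch = np.exp(-((xx - 7.3) ** 2 + (yy - 8.6) ** 2) / (2 * 2.0 ** 2))
x0, y0 = refine_column(patch)
print(round(x0, 1), round(y0, 1))  # → 7.3 8.6
```

In practice this fit is repeated for every column found in the image, which is what makes the classical approach computationally intensive.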
We explore the applications of supervised DCNNs for the extraction of polarization and other structural descriptors from STEM image data with and without atom finding. All details of this framework can also be found in the accompanying Jupyter notebooks. As a first step, we establish whether DCNN analysis can substitute for classical featurization of the STEM images if atomic positions are predetermined, e.g., using deep learning atom-finding algorithms. 55,56 Here, we implement PyTorch DCNN models with three convolution blocks: the first contains five 2D convolution layers with 32 filters each; the second has two 2D convolution layers with 64 filters each; and the third has two 2D convolution layers with 128 filters each. The leaky rectified linear unit (LReLU) is used as the activation function in all these blocks. A 2D max pooling layer for dimensionality reduction is added at the end of the second convolution block. A dropout layer for preventing overfitting and a batch normalization layer for training the networks in mini-batches are added toward the very end of the network architecture. The feature set is the sub-images (80×80), whereas the target vector comprises the unit-cell descriptors such as unit-cell parameters, volume, and polarization vector components. The individual points and kernel density estimates of the distributions are shown as a way to visualize both the average behaviors and the outliers. The observed dynamics are rather remarkable.
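A minimal PyTorch sketch of the architecture described above; the kernel size, dropout rate, and number of output descriptors (six here) are assumptions not specified in the text:

```python
import torch
import torch.nn as nn

class PolarizationDCNN(nn.Module):
    """Three-block DCNN regressing unit-cell descriptors from 80x80
    sub-images. Hyperparameters marked below are illustrative guesses."""
    def __init__(self, n_descriptors=6):  # n_descriptors is an assumption
        super().__init__()
        def conv(cin, cout):  # 3x3 kernel, padding 1: assumed choices
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.LeakyReLU(0.1))
        # Block 1: five convolution layers, 32 filters each
        self.block1 = nn.Sequential(conv(1, 32), *[conv(32, 32) for _ in range(4)])
        # Block 2: two convolution layers, 64 filters each, then 2x2 max pooling
        self.block2 = nn.Sequential(conv(32, 64), conv(64, 64), nn.MaxPool2d(2))
        # Block 3: two convolution layers, 128 filters each
        self.block3 = nn.Sequential(conv(64, 128), conv(128, 128))
        # Dropout and batch normalization toward the end, then the regressor
        self.head = nn.Sequential(nn.Dropout(0.3), nn.BatchNorm2d(128),
                                  nn.Flatten(),
                                  nn.Linear(128 * 40 * 40, n_descriptors))

    def forward(self, x):  # x: (batch, 1, 80, 80) sub-images
        return self.head(self.block3(self.block2(self.block1(x))))

model = PolarizationDCNN()
out = model(torch.randn(2, 1, 80, 80))  # two random 80x80 sub-images
print(out.shape)  # → torch.Size([2, 6])
```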
For the majority of the locations, the DCNN-predicted parameters tend to have narrower distributions than the original (measured) values. This behavior is expected, since DCNNs tend to smooth the data. However, for extreme values of the parameters, the DCNN predictions start to deviate strongly, leading to unphysical predicted values. In the corresponding distributions, the maxima corresponding to the film and the substrate are clearly seen.
While the analyses in Figures 2 and 3 show reduced noise levels compared to classical analyses, they offer only a partial advantage over the classical approach, since both are based on the identification of atomic positions. Here, we further explore whether the DCNN approach can be used for mapping polarization fields in raw STEM images without using atom finding.
We note that this is expected to be feasible, given that DCNNs are invariant to translations in the image plane.
To explore this, we configured a 'sliding window' approach to generate sub-images that are not necessarily centered around specific atoms. For a predefined window size, the parts of the STEM images lying inside the window are first considered; these form the feature set for the network. The measured polarization values, those predicted by networks trained on the same Sm concentration, and the point-by-point difference between them, showing the uncertainties in the predictions, are compared for 0%, 7%, and 10% Sm concentrations, respectively. Note that while 0% Sm corresponds to the pure rhombohedral ferroelectric BiFeO3, 7-10% doping corresponds to the monoclinic phases at the morphotropic boundary, and 20% corresponds to the orthorhombic non-ferroelectric phase. In all cases, the uncertainty is relatively low, assuring reasonable performance of these networks. How extendable these predictions are (as discussed later), i.e., whether a network trained on one composition and applied to another can reach similar accuracy, is also extremely important for establishing the robustness of such networks. To gain insight into the DCNN operations, we constructed feature maps for individual trained DCNNs, illustrating how the input is transformed as it passes through the convolution layers.
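The sliding-window sub-image generation can be sketched as follows; the window size of 80 matches the atom-centered descriptors, while the stride here is an arbitrary illustrative choice:

```python
import numpy as np

def sliding_windows(image, window_size=80, stride=8):
    """Cut sub-images on a regular grid, not centered on specific atoms.
    Returns the stack of windows and their top-left corner coordinates."""
    h, w = image.shape
    windows, coords = [], []
    for i in range(0, h - window_size + 1, stride):
        for j in range(0, w - window_size + 1, stride):
            windows.append(image[i:i + window_size, j:j + window_size])
            coords.append((i, j))
    return np.stack(windows), np.array(coords)

img = np.random.rand(256, 256)          # stand-in for a STEM image
subs, xy = sliding_windows(img)
print(subs.shape)  # → (529, 80, 80): a 23x23 grid of windows
```

Each window is fed to the trained DCNN, and the predicted descriptors are assigned to the window position, yielding a polarization map without any atom finding.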
Once an input image is passed through a specific block, layer, and filter, the immediate activations are recorded and plotted to visualize the corresponding encoded features. For each layer, there are multiple (32, 64, or 128) filters yielding individual feature maps. For example, for one convolution layer with 32 filters, a total of 32 feature maps can be plotted, one for each filter of that specific layer. Figure 5 shows selected feature maps for four convolution layers of the first block of the networks. The DCNNs trained on stacks of sub-images that are not centered (NC, a-d) and centered (C, e-h) around atoms are used for constructing these representative maps. From these feature maps, it is evident that atoms in both sublattices become more prominent in each filter as we progress from one layer to the next. In addition to feature maps, we also visualized the CNN filters present in different blocks (similar to the celebrated DeepDream 57 approach). These visualizations primarily display the patterns that each filter maximally responds to. A random image (which could be one from one of the sub-image stacks) is taken as input. A loss function maximizing the value of the CNN filter is used to iteratively perform gradient ascent in the input space, such that the algorithm finds the input for which the filter is activated the most. Figure 6 shows a few representative visualizations of how the first three layers (b-d) of the first convolution block are activated when a random image (a) is selected as input to this specific network.
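This DeepDream-style filter visualization amounts to gradient ascent on the input image. A minimal sketch, with a stand-in two-layer network in place of the trained DCNN and illustrative step count and learning rate:

```python
import torch
import torch.nn as nn

# Stand-in for the first convolution layers of a trained DCNN.
net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.LeakyReLU(0.1),
                    nn.Conv2d(32, 32, 3, padding=1))

def visualize_filter(model, filter_idx, steps=30, lr=0.1):
    """Gradient-ascend a random image to maximize one filter's activation."""
    x = torch.rand(1, 1, 80, 80, requires_grad=True)  # random start image
    for _ in range(steps):
        loss = model(x)[0, filter_idx].mean()  # mean activation of the filter
        loss.backward()
        with torch.no_grad():
            x += lr * x.grad / (x.grad.norm() + 1e-8)  # normalized ascent step
            x.grad.zero_()
    return x.detach().squeeze()

pattern = visualize_filter(net, filter_idx=0)
print(pattern.shape)  # → torch.Size([80, 80])
```

The resulting `pattern` is the input to which the chosen filter responds most strongly; plotting it for filters in successively deeper layers reveals increasingly data-specific features.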
The activations in the last kernel for three consecutive filters are shown in Figure 6. This analysis not only helps to understand the network architecture in greater detail but also shows how layers located deeper in the network facilitate the visualization of more training-data-specific features.
The specific example in Figure 6 illustrates this behavior. To evaluate the performance of the trained networks, we computed the mean squared error (MSE) for all 26 networks as applied to all 13 sub-image stacks, for both the NC and C cases, represented as heatmaps. We further show that the polarization fields can be visualized from the STEM images without atom finding, using DCNN analysis of atom-centered sub-images and of arbitrarily selected sub-images bypassing the atom-finding stage. This approach was found to give the correct polarization values for the majority of the image and can be readily incorporated during data acquisition. However, the presence of local defects (i.e., out-of-distribution data) leads to significant errors in the prediction at certain locations. These locations can in turn be used to identify sites for automated experiments. Overall, the translational invariance built into the DCNN structure can significantly facilitate the extraction of physical order parameter fields from structural and potentially high-dimensional data.
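The cross-evaluation behind the MSE heatmaps can be sketched as below; the predictions here are synthetic stand-ins, and only the bookkeeping (26 networks, one NC and one C per composition, scored against 13 sub-image stacks) mirrors the analysis described in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(pred, target):
    """Mean squared error between predicted and measured polarization."""
    return float(np.mean((pred - target) ** 2))

n_networks, n_stacks = 26, 13  # 13 compositions x {NC, C} descriptors
targets = [rng.normal(size=64) for _ in range(n_stacks)]  # stand-in labels

heatmap = np.empty((n_networks, n_stacks))
for i in range(n_networks):
    for j in range(n_stacks):
        # Stand-in prediction: error grows as network i's training
        # composition (i % 13) drifts from the evaluated stack j.
        pred = targets[j] + rng.normal(scale=0.1 + 0.05 * abs(i % 13 - j),
                                       size=64)
        heatmap[i, j] = mse(pred, targets[j])
print(heatmap.shape)  # → (26, 13)
```

Plotting `heatmap` (e.g., with matplotlib's `imshow`) reproduces the kind of network-versus-stack comparison used to judge cross-composition transferability.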

Just as too small a window cannot capture both sublattices of these systems, too large a window size, such as 160, is also unsuitable for capturing the atomic features and the corresponding polarization behavior. Adding noise of any magnitude does not improve the performance of the networks either. In some cases, a large amount of added noise makes it difficult for the networks to learn distinct patterns present in the dataset, even ones that are obvious to the human eye. We note that for a sufficiently large window size, such as 80 or 160, the effect of added noise appears to be smaller than for smaller window sizes.

Acknowledgments: