RapidEELS: machine learning for denoising and classification in rapid acquisition electron energy loss spectroscopy

Recent advances in detectors for imaging and spectroscopy have afforded in situ, rapid acquisition of hyperspectral data. While electron energy loss spectroscopy (EELS) data acquisition speeds with electron counting are regularly reaching 400 frames per second with near-zero read noise, signal to noise ratio (SNR) remains a challenge owing to fundamental counting statistics. In order to advance understanding of transient materials phenomena during rapid acquisition EELS, trustworthy analysis of noisy spectra must be demonstrated. In this study, we applied machine learning techniques to denoise high frame rate spectra, benchmarking with slower frame rate “ground truths”. The results provide a foundation for reliable use of low SNR data acquired in rapid, in-situ spectroscopy experiments. Such a tool-set is a first step toward both automation in microscopy as well as use of these methods to interrogate otherwise poorly understood transformations.

The use of neural networks (NNs) for spectra classification in EELS has been limited, with the majority of NN-based spectra classification having focused on other techniques, such as Raman or X-ray absorption spectroscopy 12,[20][21][22], or on denoising of STEM images [23][24][25][26]. The application of NNs to EELS data has, to date, relied on feature engineering, use of a reference spectrum, and a priori knowledge during post-processing to extract useful information. Linear models have been used for the determination of oxidation states in metal oxides 27 and manganese valence states 28. Kalinin et al. explored unmixing and supervised classification on heterogeneous self-assembled monolayer films of doped-semiconductor nanoparticles 29. Blum et al. proposed a new algorithm for strong metal-support interactions and exploring encapsulation signals in heterogeneous catalysts 30. Most notably, Chatzidakis et al. 31 investigated EELS spectra oxidation state classification with a mix of fully dense and fully CNN architectures and samples augmented with varying simulated noise levels. Advances in STEM allow EELS spectral images of beam-stable materials to be obtained with a high SNR 32,33. However, the high current density and longer dwell time needed may damage sensitive samples [33][34][35].
Autoencoders (AECs) are a type of unsupervised neural network model that maps high dimensional input features to a lower dimension and then reconstructs the original input from the lower dimensional layer, i.e. the latent space. Generally, AECs have a three-part symmetric structure: encoder, latent space, decoder. The encoder maps high dimension input data to a lower dimension representation. In doing so, it learns and builds patterns in the data that might be overlooked by or invisible to human-designed models or analysis. These patterns are represented in a reduced dimension vector space, also called the latent space, which produces a more generalized characterization of the data. A decoder, inverse in structure to the encoder, is then trained to reconstruct the input from the latent space features extracted by the encoder. A variation of this, commonly used in the image reconstruction field, is the semi-supervised denoising autoencoder. The encoder maps a partially destroyed or incomplete input to the latent space, the decoder reconstructs a non-destroyed (denoised) image from it, and the network is trained to minimize the average reconstruction error between the reconstruction and the ground truth image.
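The denoising objective described above can be written compactly. The notation is introduced here only for illustration and is an assumption: $\tilde{x}_i$ is a noisy input, $y_i$ the corresponding ground truth, $f_\theta$ the encoder, $g_\phi$ the decoder, $z_i$ the latent representation of dimension $d$, and $n$ the number of training examples. A squared-error form of the reconstruction error is assumed.

```latex
\mathcal{L}(\theta, \phi)
  = \frac{1}{n} \sum_{i=1}^{n}
    \left\lVert g_\phi\!\left( f_\theta(\tilde{x}_i) \right) - y_i \right\rVert_2^2 ,
\qquad
z_i = f_\theta(\tilde{x}_i) \in \mathbb{R}^{d}, \quad d \ll \dim(\tilde{x}_i)
```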
In this study, we utilize a dual Autoencoder-Classifier algorithm to denoise low SNR spectra collected at 400 FPS, as a starting point for rapid identification of real-time EELS data. The reduction of SrFeO 3−δ ( δ represents the oxygen deficiency) is used as a model experiment, with the detection of oxidation state changes from nominally SrFeO 3 ( Fe 4+ ) to SrFeO 2.5 ( Fe 3+ ) as target metrics to benchmark our algorithms in terms of accuracy and effectiveness. Specifically, we employed a stacked convolutional denoising AEC and fully connected classifier on a model system to study noise reduction and oxidation state classification for low SNR EELS spectra collected at 400 FPS. Classification accuracy on the latent space representations indicates how successfully the AEC learns relevant, unique features, and the approach can be expanded in the future for more complex unsupervised applications or high frame rate studies of non-equilibrium phenomena. The notebooks developed during this study can be found on GitHub (https://github.com/patecm/rapidEELS).

Methodology
As a model system to investigate EELS fine structure changes, we study the reduction of a SrFeO 3−δ thin film. A SrFeO 3−δ film of roughly 20 nm was grown via molecular beam epitaxy on a ( LaAlO 3−δ ) 0.3 (Sr 2 TaAlO 6 ) 0.7 (LSAT) substrate 36 . Post-growth, the film was oxidized with an ozone anneal to produce perovskite SrFeO 3 , with δ ≈ 0 37 . For δ = 0, SrFeO 3 is metallic and paramagnetic with a cubic perovskite structure and Fe is in a formal 4+ oxidation state. With reduction to δ = 0.5, the brownmillerite SrFeO 2.5 structure is stabilized, which is an insulating antiferromagnet with Fe in an average 3+ oxidation state 38 . While the metal-to-insulator and magnetic transitions hold technological relevance, we are mainly interested in the core-loss spectral changes associated with the altered Fe oxidation state and Fe-O bond geometry.
STEM-EELS measurements were performed with a JEOL 2100F instrument and a Gatan Imaging Filter Quantum equipped with a K2 direct electron detector operated in counting mode 4. The STEM convergence semi-angle was 8 mrad, and the EELS collection semi-angle was 24 mrad. For EELS, the 5 mm aperture was used with a dispersion of 0.125 eV/channel. A focused ion beam (FEI DB235) was used to prepare a conventional lift-out for TEM analysis. Final thinning was performed with 5 keV Ga ions. The FIB lift-out was then re-oxidized via the ozone anneal, assuming some loss of O during FIB sample preparation. After initial TEM and EELS analysis, the lift-out was removed from the TEM and reduced ex situ by annealing on a hot plate at 300 °C in ambient atmosphere for 10 min. Subsequently, the sample was placed back in the TEM for further TEM and EELS analysis. We note the presence of an interfacial region between the SrFeO 3−δ film and LSAT substrate (present before and after ex situ annealing), characterized by planar defects observed with HRTEM and reduced intensity observed in STEM-ADF (indicating a lower average mass density). This defective interfacial region is excluded from subsequent imaging and spectral analysis.
HRTEM imaging and EELS show a clear transition in the SrFeO 3−δ film after annealing. As shown in Fig. 1B, before the ex situ anneal, the SrFeO 3−δ shows cubic symmetry as expected for perovskite SrFeO 3. After the anneal, half-order peaks are observed in the HRTEM Fourier transform, indicative of O vacancy ordering in the brownmillerite SrFeO 2.5 structure. Spectrally, clear changes in both the O K-edge and Fe L-edge are observed, which are consistent with the perovskite SrFeO 3 to brownmillerite SrFeO 2.5 transition studied with X-ray absorption spectroscopy at the same edges 39. Spectral analysis shows that the energy difference between the O K-edge onset and the Fe L-edge onset is reduced from 183.1 to 181.5 eV after annealing, and additionally, the Fe L 3 /L 2 white line ratio increased from 3.8 to 4.6. Both changes are consistent with a nominal decrease in the Fe oxidation state from ∼ 4+ to ∼ 3+ with annealing. Thus, while the precise value of δ in our film is not known, we conclude that a clear transition was induced via the ex situ anneal, and that the initial and annealed states roughly correspond to SrFeO 3 and SrFeO 2.5.
Experimental EELS datasets were used to develop our algorithm. For the initial-state EELS data acquisition, multi-frame spectrum images (SIs) were acquired with a pixel size of 1 nm², 0.0025 s dwell time, and lateral dimensions of 10 × 638 pixels. Three frames were acquired, and a drift correction was performed between each frame. The SI was vertically aligned within the SrFeO 3−δ layer to only probe the central region of the film, thereby excluding the defective interfacial layer as well as damage at the top of the film (the green box in Fig. 1A shows one representative SI area). Four multi-frame SIs were acquired in the initial state. Each SI was shifted laterally along the SFO/LSAT interface, which, owing to the sample geometry and FIB preparation method, resulted in sampling various thicknesses of the SFO film. Based on low-loss EELS measurements, the four SIs covered a continuous thickness range from 0.6 to 1.3 inelastic mean free paths (MFPs). Note that the 4 multi-frame SIs give a total of 12 SIs. The exact same procedure was repeated after annealing the sample ex situ. Various irregularities in the protective layers above the SrFeO 3−δ film were used to correctly position the annealed-state SIs with the corresponding initial-state SIs. While the spectral differences associated with the SrFeO 3 to SrFeO 2.5 transition appear obvious in the data shown in Fig. 1, we note three experimental difficulties which make practical classification of spectra a challenging task. First, the SNR of spectra associated with individual pixel elements is extremely low. For each individual spectrum, the average number of electron counts per channel at the O K-edge is ≈ 15 (for a local thickness of 0.8 MFP), which gives a shot noise limited SNR of √N ≈ 3.8. This low SNR is shown in Fig. 2A. Secondly, given the large range of sample thicknesses, plural scattering leads to significant spectral changes as a function of thickness.
This is shown in Fig. 2B, which compares a thin versus thick region of the SFO in the initial state. Lastly, as the STEM beam scans across the sample, there is an associated shift of the EELS data in energy space. This artifact is shown in Fig. 2C, which compares spectra from the leftmost, central, and rightmost regions of an SI acquired from the annealed state. Note that from the perspective of the classifier, the effects of thickness (plural scattering) and artificial energy shifts may be considered as additional 'noise' sources, which the classifier must deal with to accurately label spectra as 'initial' or 'annealed'. However, from the perspective of the AEC, the effects of thickness and energy shifts on a given spectrum should be preserved. The AEC is useless unless it preserves subtle features of interest within a given spectrum. Of course, artificial energy shifts are not 'of interest', but they may act as a criterion on which to assess the AEC performance.
Training data was grouped into four categories for comparison purposes, based on thickness-based spectral changes and oxidation state. Twelve SIs are in the initial ( SrFeO 3 ) state and twelve have been annealed to SrFeO 2.5 using the methodology previously described. Each oxidation state contained six "thin" datasets, with a thickness between 0.65 and 0.80 MFP, and six "thick" datasets, with a thickness of 0.83-1.33 MFP. Hence, the four categories are thin initial, thick initial, thin annealed, and thick annealed, and each category contains 6 SIs. EELS data is collected in a 3D cube, where each 2D pixel of the TEM image has an associated energy spectrum. Each of the twenty-four EELS SIs produced STEM images with 638 by 10 spatial pixels, for a total of 153,120 input spectra. Of these, four SIs were withheld for testing, one from each of the four main categories: thin initial, thick initial, thin annealed, thick annealed. Twenty percent of the remaining training input spectra were used as a validation set.
To facilitate real-time classification of spectra in the future, pre-processing for the ML framework was kept as minimal as possible. Training spectra were imported from dm4 files and cropped from 520 to 580 eV around the O K-edge using the HyperSpy package 40 , and binned from 0.125 to 0.25 eV/channel, resulting in 240-feature long spectra. Spectra were normalized from 0 to 1, without any background subtraction. Five-fold cross-validation was performed prior to model training to select the dimensions of the latent space representation, learning rate, percent dropout, training batch-size, and number of training epochs. Mean centering standardization was not performed on the training data, as it was also determined in cross-validation to be unnecessary and in some cases to degrade the reconstruction. Translational shifts between scans or varying sample thicknesses were not corrected for.
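The cropping, binning, and normalization steps described above can be sketched with plain NumPy. This is a minimal sketch under stated assumptions, not the authors' pipeline (which used HyperSpy on dm4 files): the function name `preprocess`, the synthetic energy axis, summing (rather than averaging) adjacent channels, and scaling each spectrum independently to [0, 1] are all illustrative choices.

```python
import numpy as np

def preprocess(spectra, energy_axis, e_min=520.0, e_max=580.0, bin_factor=2):
    """Crop to [e_min, e_max), bin adjacent channels, scale each spectrum to [0, 1]."""
    spectra = np.asarray(spectra, dtype=float)
    keep = (energy_axis >= e_min) & (energy_axis < e_max)
    cropped = spectra[:, keep]
    # Drop a trailing channel if the cropped length is not a multiple of bin_factor.
    usable = cropped.shape[1] - cropped.shape[1] % bin_factor
    binned = cropped[:, :usable].reshape(len(cropped), -1, bin_factor).sum(axis=2)
    # Per-spectrum min-max scaling (an assumption), with no background subtraction.
    low = binned.min(axis=1, keepdims=True)
    high = binned.max(axis=1, keepdims=True)
    span = np.where(high > low, high - low, 1.0)
    return (binned - low) / span

# Synthetic stand-in for raw counts at 0.125 eV/channel (hypothetical data).
rng = np.random.default_rng(0)
energy_axis = 500.0 + 0.125 * np.arange(800)        # 500 eV to just under 600 eV
raw = rng.poisson(15.0, size=(4, energy_axis.size))

processed = preprocess(raw, energy_axis)
print(processed.shape)
```

With a 60 eV window at 0.125 eV/channel (480 channels) and a bin factor of 2, each spectrum has 240 features, matching the feature length quoted in the text.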
To train the AEC toward denoising, 'ground truths' were created with a convolution filter across the spatial dimensions of the raw input spectra. Because the oxidation state was uniform across each EELS sample, convolution simulates a lower frame rate and therefore less noisy spectra. A simulated frame rate of 1 FPS (filter size 10 × 40) was chosen since it was shown to produce spectra where oxidation states are visually distinguishable from one another. Fig. 3B shows the difference in the SNR for different simulated frame rates. In contrast to most denoising AECs, where the ground truth is corrupted by adding Gaussian noise, the spectrum collected at 400 FPS at each pixel was used as the noisy data. The addition of noise has been shown to improve reconstructions and prevent overfitting in denoising AECs [41][42][43]. Because the noise intrinsic to electron exposure (shot noise) follows a Poisson distribution 44, Poisson noise was added to the input instead of the more commonly used random noise. Power-law background subtraction was performed only on the "ground truth" spectra before normalizing from zero to one for input to the AEC. Thus, the AEC is also trained to perform a background subtraction on the input data, which allows easier analysis of the denoised spectra.
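The ground-truth construction above can be illustrated with a small NumPy sketch. Several details are assumptions made for illustration only: the function name `simulate_ground_truth`, averaging over all rows and then over a sliding window of 40 columns (so that 400 spectra are pooled, as a 10 × 40 filter would at 400 FPS), edge padding at the boundaries, and resampling the raw counts with `rng.poisson` as the way of adding Poisson noise. The data is synthetic.

```python
import numpy as np

def simulate_ground_truth(si, width=40):
    """Average an SI of shape (rows, cols, channels) over all rows and a sliding
    window of `width` columns; for 10 rows and width=40 this pools 400 spectra."""
    row_mean = si.mean(axis=0)                               # (cols, channels)
    left = width // 2
    right = width - 1 - left
    # Boundary handling is not stated in the text; edge padding is assumed here.
    padded = np.pad(row_mean, ((left, right), (0, 0)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=0)
    smooth = windows.mean(axis=-1)                           # (cols, channels)
    return np.broadcast_to(smooth, si.shape).copy()

rng = np.random.default_rng(0)
channels = np.arange(64)
mean_counts = 15.0 * (1.0 + 0.5 * np.sin(channels / 10.0))  # synthetic spectrum
raw_si = rng.poisson(mean_counts, size=(10, 120, 64)).astype(float)

ground_truth = simulate_ground_truth(raw_si)
noisy_input = rng.poisson(raw_si).astype(float)   # assumed form of added Poisson noise

# Compare noise levels away from the padded edges.
interior = slice(20, 100)
raw_noise = np.std(raw_si[:, interior] - mean_counts)
gt_noise = np.std(ground_truth[:, interior] - mean_counts)
print(f"noise reduction factor: {raw_noise / gt_noise:.1f}")
```

For shot-noise-limited data, pooling n spectra is expected to reduce the noise by roughly √n, which is why a uniform sample makes spatial averaging a reasonable stand-in for a slower acquisition.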
The encoder architecture of the AEC consisted of five one-dimensional (1D) convolution hidden layers, which increase in filter size but decrease in kernel size. An overview of the model can be found in Fig. 4 with more details in Table 1 of the Supplemental Information. For some AEC applications, Dropout has proven more effective at preventing over-fitting than other regularization methods such as L1 norm [45][46][47][48]. During framework development, Dropout was found to produce better results than L1 regularization. Therefore, a twenty percent Dropout layer between layers is included, with the last convolution layer of the encoder also having a twenty percent dropout bottleneck. Pooling was not used between layers to avoid discarding finer details and features for the reconstructions 49. Each hidden convolution layer had ReLU activation with the exception of the output layer of the decoder, which used a linear activation. The decoder was structured as the inverse of the encoder, but without dropout between layers to preserve the reconstruction of the latent space. Connecting the encoder to the decoder is the latent space, which learns a five-dimension representation of the spectra. The model was fit using the Adam optimizer and minimizing the reconstruction mean square error (MSE) loss function. MSE was chosen over mean absolute error (MAE) because it is more sensitive to outliers 50 and after demonstrating a higher overall MSE for denoised spectra during cross-validation. After autoencoder training, the encoder layers were frozen before training the classifier.
… impact on the signal to noise ratio when summing all spectra within the ROI. At high frame rates, the low signal to noise ratio makes it impossible to visually analyze or assign structure, as the signal is mostly noise.
The latent space containing features extracted by the encoder was used as input to the classifier, which consists of a single, two-neuron dense layer with Softmax activation. All models were trained for 500 epochs each on an NVIDIA GTX1080Ti GPU, with CUDA v10.1 and cuDNN v7.6, using Keras with a TensorFlow 2.3.0 back-end. Total training time was approximately 20 min (Fig. 5).
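To make the classifier head concrete, the sketch below trains a two-neuron dense layer with softmax activation on five-dimensional latent vectors. It is an illustration under assumptions rather than the study's Keras model: the latent vectors are synthetic Gaussian clusters standing in for the output of a frozen encoder, plain full-batch gradient descent is swapped in for the Adam optimizer, and the names `softmax` and `train_softmax_head` are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def train_softmax_head(latent, labels, epochs=500, learning_rate=0.1):
    """Fit a single dense layer (latent_dim -> 2) by minimizing cross-entropy."""
    n_samples, latent_dim = latent.shape
    weights = np.zeros((latent_dim, 2))
    bias = np.zeros(2)
    one_hot = np.eye(2)[labels]
    for _ in range(epochs):
        probs = softmax(latent @ weights + bias)
        grad_logits = (probs - one_hot) / n_samples   # gradient of mean cross-entropy
        weights -= learning_rate * latent.T @ grad_logits
        bias -= learning_rate * grad_logits.sum(axis=0)
    return weights, bias

# Synthetic 5-D latent vectors for two classes (0 = initial, 1 = annealed).
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=400)
centers = np.array([[-1.0] * 5, [1.0] * 5])
latent = centers[labels] + rng.normal(scale=1.0, size=(400, 5))

weights, bias = train_softmax_head(latent, labels)
probabilities = softmax(latent @ weights + bias)
accuracy = float((probabilities.argmax(axis=1) == labels).mean())
print(f"training accuracy on synthetic latent vectors: {accuracy:.3f}")
```

Because only `weights` and `bias` are updated, the latent vectors play the role of a frozen encoder's output, so the accuracy reflects how separable the classes already are in the latent space.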

Results and discussion
Testing data consisted of four SIs, one of each oxidation state and thickness range, for a total of 25,520 spectra. The test data was prepared in the same manner as previously described for the training dataset but without the addition of noise. Additional background subtraction was not performed on the predictions, as the "ground truth" training data had background removed prior to training the autoencoder, which results in denoised spectra that are background subtracted. Two key concerns addressed by this study were whether fine details and subtle features were maintained in the denoised reconstruction spectra and how well the autoencoder performed compared to common denoising techniques. Principal component analysis (PCA) was selected to benchmark fine feature reconstruction. While commonly used as a dimensionality reduction technique, PCA is also used in the computer vision field as a noise reduction technique 51,52. To establish general performance, Gaussian curves were fit to the pre-peak and K-edge peaks for each denoised pixel result (see Figure 1 of the Supplemental Information), using a least-squares approximation to minimize the sum of squared residuals. Gaussian curves were also fit to the pre-peak and oxygen K-edge of the ground truth spectra that corresponded to each pixel. Reconstruction errors of each sample class were determined by mean square error (MSE). Visualizations of the PCA reconstructions and AEC denoised SI are shown in Fig. 6. While the AEC denoised spectra had a lower MSE than the 5 and 7 component PCA reconstructions, the 3 component PCA reconstruction had a comparable or lower MSE than the AEC denoised spectra, with the exception of the thicker annealed samples. The changes in peak position and height of the pre-peak and K-edge from the ground truth pixels, after Gaussian fitting, are shown in Table 2 of the Supplemental Information.
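The PCA benchmark can be sketched as a truncated reconstruction followed by an MSE comparison against known clean spectra. The example is illustrative only and rests on assumptions: the spectra are synthetic (two Gaussian peaks at hypothetical positions with Poisson noise), PCA is computed with NumPy's SVD on mean-centered data, and the helper names `pca_reconstruct` and `mse` are not from the original work.

```python
import numpy as np

def pca_reconstruct(spectra, n_components):
    """Project mean-centered spectra onto the leading principal components and back."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    basis = components[:n_components]                 # (n_components, channels)
    return centered @ basis.T @ basis + mean

def mse(a, b):
    """Mean square error between two arrays of equal shape."""
    return float(np.mean((a - b) ** 2))

# Synthetic spectra: a pre-peak and a main peak with varying amplitudes.
rng = np.random.default_rng(0)
energy = 520.0 + 0.25 * np.arange(240)
pre_peak = np.exp(-0.5 * ((energy - 528.0) / 1.5) ** 2)
main_peak = np.exp(-0.5 * ((energy - 540.0) / 4.0) ** 2)
pre_amp = rng.uniform(0.5, 1.5, size=(500, 1))
main_amp = rng.uniform(0.5, 1.5, size=(500, 1))
clean = 20.0 * (0.2 + pre_amp * pre_peak + main_amp * main_peak)
noisy = rng.poisson(clean).astype(float)

mse_raw = mse(noisy, clean)
mse_by_components = {k: mse(pca_reconstruct(noisy, k), clean) for k in (3, 5, 7)}
print(f"raw: {mse_raw:.3f}")
for k, value in mse_by_components.items():
    print(f"PCA with {k} components: {value:.3f}")
```

Retaining more components than the data's intrinsic dimensionality mainly adds noise back into the reconstruction, which is one plausible reading of why a small number of components can score well on MSE alone.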
Also of importance was the level of detail maintained in the denoised samples. Overall intensity trends in thickness-related variation (Fig. 2B) were maintained in the denoised 400 FPS spectra (Fig. 7C). However, while translational variation from STEM probe position (Fig. 2C) was seen in the predictions (Fig. 7D), it was not as pronounced as in the ground truth spectra. This is likely due to the pooling nature of convolution minimizing variation between similar spectra, and was expected given previous applications by Chatzidakis and Botton in mitigating spectral shifts due to calibration differences 31. Relative peak intensity, on average, was maintained in the denoised spectra; values for the denoised and PCA reconstruction spectra are displayed in Table 2 of the Supplemental Information.
Lastly, classification accuracy of the framework was evaluated for the 400 FPS raw data and three other simulated frame rates. Using the 5-dimension latent space representation of the 400 FPS input spectra, the classification algorithm achieved 85.8 percent accuracy in determining whether the input spectra came from the initial or the annealed state. This indicates the encoder is successfully learning unique features of the low signal to noise ratio input data and is able to overcome shifts in the spectra due to probe position, as well as spectral changes due to thickness effects. When the frame rate was reduced to 200, 100, and 25 frames per second, the classification accuracy increased to 88.5 percent, 92.3 percent, and 93.0 percent, respectively. Corresponding accuracy values for all SIs are shown in Table 3 of the Supplemental Information.
Previous work by Chatzidakis and Botton 31 explored convolutional and dense classifying models to overcome translational shifts, due to calibration differences between machines, of EELS spectra for three different electron states of manganese. While we did not achieve the same 100 percent classification accuracy for our perovskite samples, classification accuracy was a secondary goal used to explore the latent representation of the data. Nevertheless, we achieved a classification accuracy of 93.0 percent and 85.8 percent on high SNR (20 FPS) and very low signal to noise ratio (400 FPS) data, respectively. An algorithm trained with the explicit goal of classifying the denoised spectra directly would likely achieve even higher classification accuracy. Peak statistic comparisons between the ground truth spectra and denoised spectra show that fine-detail structure is maintained in the denoised spectra. Notably, at lower frame rates, the classification algorithm begins to have difficulties with the thick annealed spectra and misclassifies them as initial state. However, at low frame rates, it would likely not be necessary to use a deep learning framework for classification. Figure 8 shows 2-D t-SNE plots for the raw spectra data at five different frame rates, plus AEC denoised spectra at 400 FPS. Below 100 FPS, it would be possible to use clustering algorithms instead of a deep learning approach to classify oxidation state and sample thickness, because distinct clusters are formed. At 100 FPS and above, the 2-component t-SNE representations of annealed and initial state input spectra become increasingly intermixed. At 400 FPS, the clusters are completely entangled, yet the classification algorithm is still able to achieve 85.8 percent binary classification accuracy, indicating that the AEC is able to learn relevant, unique information from even the low SNR data.
The denoised 400 FPS spectra t-SNE representation begins to form distinct clusters, similar to raw data at low frame rates. This is important for unsupervised and semi-supervised learning applications, as it may be possible to employ clustering algorithms to classify non-labeled data if categorical clusters can be disentangled. Additionally, an abundance of recent work has focused on spatial components for linear unmixing, denoising and classification of hyperspectral data, such as Bayesian models based on Markov random fields, image in-painting by beta-process factor analysis 33, and Gaussian process methods 53. Future work aims to incorporate similar spatial components and relationships for in situ annealed samples.
Figure 8. Two dimensional t-SNE plots of test spectra (raw data) at varied frame rates, including denoised 400 FPS spectra. Blue and orange markers represent the initial and annealed states, respectively. An 'X' marker represents a thin sample and an 'o' marker represents a thick sample.

Conclusions
Our ML approach successfully classified spectra from samples of different thicknesses with 85.8 percent accuracy and reduced noise enough to allow visual comparison to reference spectra. Trends in thickness-dependent details were preserved in the auto-encoder denoised spectra for the initial state SIs. Denoised annealed spectra produced peaks that were 0.3-0.4 eV … In all benchmarks, the auto-encoder denoised spectra had similar performance to the PCA reconstructions, with respect to MSE. This paves the way for more regular use of ML codes in spectra acquisition to assist experiments in situ, and application of more complex ML, e.g., unsupervised learning, to take advantage of high frame rates and study non-equilibrium phenomena. The high classification accuracy on 400 FPS latent space representations indicates the autoencoder is 'learning' relevant and unique low-dimensional features representing the SrFeO 3 and SrFeO 2.5 oxidation states. Future work will build on the foundation established here to investigate low-dimensional latent representations further for potential unsupervised applications without defined, finite classes.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.