Introduction

X-ray absorption spectroscopy (XAS)1 and electron energy loss spectroscopy (EELS)2, 3 are two techniques that can probe the unoccupied electronic states providing bonding information of materials. In particular, the L2,3 edges are widely used to determine the oxidation state of transition metals1, 4, 5. The transition metal L2,3 edges probe the unoccupied d orbitals and therefore the edge onset and the edges’ fine structures and shapes are sensitive to the oxidation state of the d-block metal ions, in particular the 3d transition metals, such as V, Ti, Mn, Fe, and Ni5,6,7,8. For example, using the near-edge fine structures in the Mn L2,3 edges, the oxidation states of Mn ions in a material can be determined by decomposing the spectrum into a linear combination of Mn2+, Mn3+, and Mn4+ reference spectra9, 10. This decomposition, in principle, is simple but in reality, it is non-trivial because the energy axis is not always calibrated, and the instrument/beamline does not always have the instrumental broadening. Without proper calibration, an energy offset is present between the experimental spectrum and the references which prevents accurate oxidation state decomposition. In order to avoid the problem, standard reference samples such as MnO, Mn2O3, and MnO2 need to be measured in the same experimental session to avoid any energy offsets as well as changes in instrumental broadening9, 11. Still, with this procedure, other factors could prevent the proper energy axis calibration, for example, temperature fluctuations would result in an energy shift in the monochromator for XAS experiments. Basically, if the XAS measurements are separated multiple hours in time, the spectra taken could have a slight energy offset. In EELS, the energy offset could change more rapidly and is more unpredictable than in XAS. Typically, the energy offset is very sensitive to the DC stray field. For example, the passing of a heavy-duty truck or the movement of a nearby elevator could change the energy offset if the TEM instrument is not fully shielded. This problem is now mitigated with the dualEELS instruments but there are still many single EELS instruments under active service. Moreover, all historical data were acquired without the dualEELS correction. In addition, the nonlinearity of the parallel EELS spectrometer is present in EELS in a nontrivial way because the nonlinearity is not only present in the dispersion device, i.e., the magnetic prism. There is another complex nonlinearity present in the magnification lenses, a series of quadrupoles. Therefore, it is extremely difficult to calibrate the energy onset of EELS edges unless strict protocols are followed as described by Tan et al.11.

Another complication is that EELS’ near-edge fine structures change with sample thickness due to plural scattering. As the sample gets thicker, signals close to the edge onset would be multiply scattered to higher energy losses. This would result in a shape change of the spectrum11. For example, for the latter 3d transition metals’ L2,3 edges, as the sample gets thicker, the L2/L3 ratio increases—this problem has rendered the reference-free L2,3 ratio method inaccurate for EELS11. In addition, for XAS, the background and the near-edge structures could be different between the Total Electron Yield (TEY) and Partial Fluorescence Yield (PFY) modes. TEY mode measures the total number of emitted electrons resulting from the absorption of X-rays while PFY measures the fluorescence emitted by the sample as a result of X-ray absorption. That also renders the L2,3 ratio method unreliable. Moreover, for early 3d transition metals, there are no established reference-free methods because of the L2,3 anomaly.

For both EELS and XAS spectroscopy, one interesting observation is that human operators with sufficient training can identify spectral features and assign oxidation states to transition metal L2,3 edges with high confidence. This points to the direction that deep learning could be successful in solving the L2,3 oxidation state decomposition problem. Pate et al. in 2021 discussed using deep learning to denoise high frame rate spectra12. Chatzidakis and Botton in 2019 introduced the idea of translation-invariance for classifying EELS edges13. They built a convolutional neural network (CNN) for oxidation state classification and showed that with translation-invariant training, moving the energy axis does not change the Mn2+, 3+, and 4+ oxidation state classification. This is a very important step in demonstrating that spectral features are like spatial features in images—they can be classified by a CNN network regardless of their absolute energy positions in the spectrum. However, there are still problems remained to be resolved: (1) how to quantitatively decompose mixed oxidation states; (2) is it possible to build one model that works for both XAS and EELS spectroscopy that have drastically different energy resolutions; (3) is it possible to build a model that is not affected by plural scattering, i.e., the thickness effect in EELS.

To address the three challenges defined above, in this study, we present a reference-free, calibration-free deep learning approach to determine the accurate oxidation states decomposition of 3d transition metal based on the L2,3 near-edge fine structures. To demonstrate the validity of the method, we use Mn as an example because Mn is technologically important in catalysis, energy storage, and electronic materials. Also, Mn oxides are a good case study because their 3 different oxidation states lead to notable variations in fine structures of the Mn L2,3 edge. Determining the composition of the mixed oxidation states is extremely important for understanding the charge transfer phenomenon happening at the device interfaces. The method we present in this study is not a simple classification of Mn2+, 3+, and 4+ edges but an accurate and quantitative decomposition of the mixed Mn oxidation states. Instead of having a classification/binary type label, we created a three-element ground truth vector that quantitatively describes the composition of Mn2+, 3+, and 4+ in each Mn spectrum, i.e. [%Mn2+, %Mn3+, %Mn4+].

To achieve this goal, we synthesized a spectrum library from 38 experimental spectra (23 EELS and 15 XAS). The library contains 1.2 million spectra 50% of which are synthesized from XAS data and the other 50% are synthesized from EELS data. In building the mixed oxidation state library, we paid special attention to normalizing the Mn L2,3 edges correctly, and including experimental-like uncertainties such as both Gaussian and Lorentzian type instrumental broadening, energy offset, and detector noise. To include the plural scattering effect in the training library, we developed a forward model to correctly introduce the thickness effect to the L2,3 edges. Using this physics-informed large training library, we show that the deep convolutional regression model we trained is robust against plural scattering and noise. The overall accuracy of the model in determining the mixed valence state reaches 85% on the validation data set. We also validated the data on “unknown unknowns”, i.e., Mn3O4 spectra that have never been used for training and validation—the accurate decomposition of Mn3O4 experimental data shows the model is quantitatively correct and can be deployed for real experimental data.

Methods

In this method section, we will describe (1) how to build a ground-truth oxidation state labeled Mn edge library, (2) how to construct the neural network, and (3) how to train it.

For building the library, the technical challenges lie in (1) how to obtain a wide variety of XAS and EELS Mn2+, Mn3+, and Mn4+ reference spectra; (2) how to normalize or ratio the 2+, 3+, and 4+ spectra correctly; (3) how to include the EELS’s plural scattering effect (thickness effect) into the training sets; (4) how to include the various experimental uncertainties including instrumental broadening, energy offset, detector noise, etc. In the following subsections, we will address each aforementioned challenge.

Collection of Mn reference spectra

To have sufficient varieties of data that can capture the features of the EELS and XAS Mn 2+, 3+, and 4+ edges, in this study, we digitized 23 experimental EELS and 13 XAS Mn spectra, in total 38, that were documented in 6 literatures using WebPlotDigitizer20. In Fig. 1, we presented all spectra that were used for making the training library. (The Mn 2.67+ spectrum was not included in the training library). In Table 1, we listed the compounds for which we digitized the spectra and their original references.

Figure 1
figure 1

The presentation of the EELS and XAS Mn L2,3 edges included in making the training library. The Mn 2.67+ presented is not included in the training library.

Table 1 The compound information and references of the Mn L2,3 edges.

All data are standardized to range from 630.5 eV to 669.4 eV with 0.1 eV increments (338 data points). For missing data, the left side of the spectra is padded with zero and the right side is padded with the end value of the spectra.

Normalization of 2+, 3+, 4+ reference spectra

In order to quantitatively combine the 2+, 3+, and 4+ Mn spectra, they need to be normalized to the correct ratio. To achieve that, we normalize the Mn L3 edge according to the d-hole number. Elemental Mn has an electron configuration of [Ar] 3d5 4s2. Therefore, Mn2+, 3+, 4+ have an electron configuration of [Ar] 3d5, [Ar] 3d4, [Ar] 3d3. Because the d shell can hold 10 electrons, the number of holes for Mn 2+, 3+, and 4+ are 5, 6, and 7 respectively. Therefore, the area under the L3 peak and above the continuous background shall be proportional to the d-hole number. The continuous background under the L2,3 edge can be modeled by two-step functions with a step height that follows the 1:2 population ratio. (The filled 2p3/2, and 2p1/2 orbitals have a population ratio of 1:2). The d-hole area can be calculated after the background is subtracted from the spectrum (Fig. 2). With this procedure to find the d-hole area, we can correctly ratio the 2+, 3+, and 4+ spectra.

Figure 2
figure 2

Schematics showing how to extract the d-hole area under the L3 edge.

Ground truth labeled library

After the d-hole ratio normalization, we can correctly combine the Mn 2+, 3+, and 4+ component spectra to form a new spectrum with the known ground truth oxidation state composition and oxidation state as the following

$$\begin{aligned} & s = xMn^{2 + } + yMn^{3 + } + zMn^{4 + } \\ & where\,x + y + z = 1 \\ & {\text{ground}}\;{\text{truth}}\;{\text{oxidation}}\;{\text{state}}\;{\text{composition }} = \, \left[ {x,y,z} \right] \\ & {\text{ground}}\;{\text{truth}}\;{\text{average}}\;{\text{oxidation}}\;{\text{state }} = { 2}x + { 3}y + { 4}z. \\ \end{aligned}$$

In making the ground truth labeled spectra, we only combined Mn spectral components that were digitalized from the same publication source. The reason is the spectra to be combined shall share the same instrumental resolution.

The composition of the training library is detailed in Table 2. A total of 1,200,000 synthetic spectra are included in the library.

Table 2 The composition of the ground truth labeled library.

Instrumental broadening

The instrumental broadening of EELS spectra includes two major contributions. The first contribution primarily comes from the thermal broadening at the electron source and the first crossover due to the space-charge effect. This type of broadening is typically characterized by a Gaussian type broadening function. The second contribution happens at the detector. The light diffusion in the sinterlator and the optical coupler introduce a long-tail broadening effect which can be characterized by a Lorentzian function. For XAS, similar short-range and long-range broadening happens due to the monochromator. Therefore, we introduce a two-parameter controlled instrument broadening kernel aka the point spread function, PSF(E) as the following.

$${\text{PSF}}\left( {E,w,\sigma } \right) = {\text{L}}\left( {E,w} \right) \otimes {\text{G}}\left( {E,\sigma } \right)$$

where  stands for convolution

$$L(E,w) = \frac{1}{\pi }\frac{1/2w}{{E}^{2}+(1/2w{)}^{2}}$$

and

$$G(E,\sigma ) = \frac{1}{\sigma \sqrt{2\pi }}{e}^{-\frac{1}{2}(\frac{E}{\sigma \sqrt{2}})}$$

Basically, the instrumental point spread function is a convolution of a Gaussian function with a Lorentzian function. The full width at maximum (FWHM) of the Lorentzian function is w and the FWHM of the Gaussian function is \(2\sqrt{2ln(2)}\sigma\). The combined FWHM is equal to \(\sqrt{FWH{M}_{Lorentzian}+FWH{M}_{Gaussian}}\). It is worth noting that the inclusion of the Lorentzian tails in the point spread kernel is very important for making the synthesized spectra resemble the experimental ones. An example of such a broadening effect on a Mn2 + L2,3 edge is shown in Fig. 3.

Figure 3
figure 3

An example of the instrumental broadening effect on the Mn L2,3 edge.

Plural scattering in EELS

If the single scattering probability function is P(E), plural scattering as a function of thickness, t, in EELS can be described by the following differential equation

$$\frac{dS(E, t)}{dt}={\int }S(E{\prime})P(E-E{\prime})dE{\prime}$$

and the boundary condition is \(S(E,t=0) = \delta (E)\) in the ideally monochromatic condition. In the practical situation where the incoming electron has an energy spread, we can use the point spread function given in the last section as the initial energy profile, i.e.

$$S(E,t=0) = PSE(E,w,\sigma )$$

Once we obtain a numerical representation of P(E), the spectral function, S(E, t) at any given thickness, t can be numerically calculated.

Using this equation, it allows us to calculate the low-loss spectral function numerically. Once we obtain the low-loss spectral function, the core-loss spectrum is a convolution of the core-loss single scattering probably distribution, \({P}_{core-loss}(E)\), with the low-loss spectral function, i.e., \(S(E,t)\).

Figure 4 shows the change of the low-loss function as a function of normalized thickness (\(t/\lambda\), \(\lambda\) is the inelastic mean free path) and how the Mn L2,3 edge evolves.

Figure 4
figure 4

The modeling of the plural scattering for Mn-containing compound and its effect on the spectral shape of Mn L2,3 edges.

In this modeling, we use an average plasmon loss energy of 25 eV and approximate the P(E) by an asymmetric function where the left side is a Gaussian function, and the right size is a Lorentzian loss function. To be more exact, we also modeled the Mn M edge and superimposed it onto the plasmon loss.

Other augmentations: energy shift and noise

Both EELS and XAS are subject to the issues of inaccurate energy axis. To take this into account, we apply a random shift augmentation of the energy axis for the ground truth labeled spectra. With this augmentation, the model becomes translation invariant—it is only sensitive to the spectral shape and it is insensitive to the absolute energy onset of the L2,3 edge.

For noise, we have modeled the noise as white noise (Gaussian noise) with a salt and pepper noise (impulse noise). Both noises are additive to the spectrum. We use the linear definition of PSNR as:

$$PSNR=\frac{Max\,Signal}{\sqrt{Mean\,Sequare\,Error}}$$

Summary of augmentation

In Table 3, we summarize the augmentation operations done to the ground truth labeled library.

Table 3 A summary of the augmentation operations and occurrences.

Network structure

How our brains process or identify a spectral feature is very similar to recognizing spatial features in an image. Inspired by this, we adopted the convolution layers that are used in image classification for feature extraction. Then we connected the features with a fully connected layer (also known as dense layers) for composition regression. The input is the one-dimensional spectrum, and the output is a 3-element composition vector (Fig. 5). We call this network a convolutional regression net (CRN). Different from a classification network, a regression network’s outputs are continuous numbers rather than binary numbers. Therefore, we used the mean square error function as the loss function.

Figure 5
figure 5

The structure of the convolutional regression net for mixed oxidation state decomposition.

For feature extractions, we use three convolutional layers followed by leaky ReLU and maxpooling. The final layer outputs 41*128 = 5248 filtered features. In the regression layers, we used three fully connected layers with 2048, 512, and then 3 neurons with leaky ReLU in between. The final output is a softmax normalization of the final 3-neuron layer to ensure that the sum of the composition vector is equal to 1.

Training

In Table 4, we summarized the technical information of the training process.

Table 4 Technical information for training.

All spectra are subtracted by the mean and divided by the standard deviation before entering the network. Dropouts are added to each layer before maxpooling with a dropout rate of 0.1. Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments was used for learning. The learning rate is set at 8E-5. The batch size is 32. As shown in Fig. 6, the model converges quickly; therefore, only 4 epochs of training were used to avoid overfitting.

Figure 6
figure 6

The MSE loss as a function of epochs processed.

Validation

We performed an 80/20 split of the ground truth labeled library to split it into 80% of training and 20% of validation datasets. The accuracy of the model is evaluated on the validation set. An accurate prediction is defined as the predicted oxidation state falling in the range of ± 0.1 of the ground truth oxidation state.

We also evaluated our model on reference data and testing data. Reference data are the data we digitized from the literature, used to build the library. The testing data are new experimental and literature data that were never used for the construction of the training data.

Result and discussion

Performance of the model on the validation set

On the validation set, the trained model reports an accuracy of 85%. Figure 7 shows the scatter plot of the prediction versus the ground truth (2000 spectra were randomly selected from the validation set). The result shows that the model performs reasonably well on the validation data.

Figure 7
figure 7

The scatter plot of the model’s predicted average oxidation state versus the ground truth.

To more closely look at the performance of the predicted decomposition, we provide a table of the predicted composition as shown in Table 5. It shows that the decomposition is reasonably accurate on the validation dataset.

Table 5 CRN’s decomposition performance on validation spectra.

Plural scattering

To show how the model performs with the interference of plural scattering, we tested the thickness effect on MnO, Mn2O3, and MnO2. As shown in Table 6, the results are accurate up to 1.5 inelastic mean free path (λ) which is larger than the maximum augmentation range used in the training dataset.

Table 6 Testing of CRN’s decomposition robustness against plural scattering on validation data.

Performance with noise

To show how the model performs with the interference of noise, we tested the effect on MnO, Mn2O3, and MnO2. As shown in Table 7, the model is robust down to PSNR = 20. At PSNR = 10, 2+ and 4+ are more stable than 3+.

Table 7 Testing of CRN’s decomposition robustness against noise on validation data.

Validation of the model on testing data

Validation of testing data is critical for understanding the accuracy and robustness of a machine learning model.

Testing on Mn3O4

One of the compounds, for which we have experimental data on, but was not used for training was Mn3O4. It has a mixed oxidation state of Mn2+ and Mn3+ with a theoretical ratio of 1:2. It gives an average oxidation state of + 2.67. Figure 8 shows the predicted oxidation state as a function of thickness and the oxidation state decomposition is shown in Table 8. The model predicts the correct ratio between 2+/3+ with a small error. The prediction starts to deviate from the ground truth at t/λ = 1.5 which is larger than the maximal augmentation used for training. Therefore, reduced performance is expected.

Figure 8
figure 8

Validation on Mn3O4 as a function of thickness.

Table 8 Testing of CRN’s decomposition robustness against plural scattering on testing data(Mn3O4).

Further testing on the influence of noise

The noise-contaminated Mn3O4 spectra are shown in Fig. 9. The predicted oxidation state decomposition is shown in Table 9. The ratio between 2+ and 3+ stays close to 1:2 for PSNR down to 10 (SNR down to 2). When PSNR is below 10 (SNR is below 2), the composition ratio starts to deviate from the theoretical ground truth as expected (the noise augmentation range is PSNR = [10,30]).

Figure 9
figure 9

The Mn3O4 spectra from testing data as a function of noise.

Table 9 Testing of CRN’s decomposition robustness against noise on testing data(Mn3O4).

Sensitivity and accuracy validation on Mn3O4 with vacancies on the tetrahedral sites

We tested the accuracy and sensitivity of our model using two Mn3O4 EELS spectra documented in Ref.9. The two spectra are shown in Fig. 10. The predicted oxidation state decomposition is given in Table 10. The small difference in the L3 edge indicates that the nanosized Mn3O4 has slightly more Mn2+ and less Mn3+ than the bulk Mn3O4. The documented ratio of Mn3+/Mn2+ is 2 for the bulk sample and 1.6 for the nanosized sample in Ref.9. Our model accurately captures this change. For the nanosized sample, our model’s predicted decomposition clearly shows the reduction of Mn3+ composition and increase of Mn2+ composition. The ratio of Mn3+/Mn2+ is predicted to be 1.59 which is almost the same as the documented value. This test shows that our model is accurate and sensitive to small changes in the spectrum.

Figure 10
figure 10

The EELS Mn L2,3 edges of bulk Mn3O4 vs nanosized Mn3O4.

Table 10 Testing of CRN’s decomposition sensitivity on testing data(Mn3O4).

More on testing data

To further test the model on testing data. We collected more EELS and XAS data from literature and experiments with very different energy resolutions. All data shown in Fig. 11 were not used for training. The EELS data shown are from references5, 19 and the XAS data are from our own experimental collection at NSLSII and Taiwan Light Source. Figure 11 shows the oxidation decomposition of the EELS/XAS Mn L2,3 edge inferred by our model. All predictions are within reasonable errors of the ground truth. It is worth noting that the model is effective on both XAS and EELS spectra. The XAS and EELS have very different energy resolutions. Within the XAS, the TEY and PFY also have noticeable differences in fine structures. In addition, the energy onsets are all different. However, as shown, our model remains translation invariant and is robust enough to correctly decompose their oxidation states.

Figure 11
figure 11

Validation of CRN’s decomposition sensitivity on data not used for training.

Conclusion

In this work, we built a regression deep learning network to accurately decompose the mix valence state of Mn for both EELS and XAS spectra. By passing the Mn L2,3 edge spectra into the neural network, the ratio of Mn2+, 3+, and 4+ can be predicted. To train the network, we also created a forward model for synthesizing the mix valence state of Mn L2,3 edge spectra. Plural scattering, instrumentation broadening, noise, and energy axis offset were taken into account when creating the forward model. A 1.2 million spectral dataset was synthesized for network training and validation using the forward model. The network was also tested on the testing spectra, real spectra collected from experiments, which were not used in the dataset synthesis. The robustness of the network was examined against noise and plural scattering. The high accuracy of the network on both validation and testing spectra suggested it can accurately decompose Mn L2,3 edges. The robustness of the network against noise (PSNR down to 10) and plural scattering (t/λ up to 1) demonstrated the high sensitivity and stability of our network. Furthermore, the network performed quantitively well on common compounds such as MnO, Mn2O3, Mn3O4, and MnO2 which means our model can be deployed and trusted in real experiments. This work showed that it is possible to accurately decompose mix valence state of Mn L2,3 edge spectra for both EELS and XAS without reference and calibration using deep learning algorithms. In the future, the method described in this work can also be generalized to other transition metals such as Fe, since their similar chemistry property to Mn. This work provided a new angle to study the fine structure of L2,3 edges and the development of AI-driven autonomous TEM.