Background & Summary

A crucial step in the process of novel materials discovery is the characterization of the synthesized material. There exists a wide array of tools and spectroscopic techniques that are used in the material identification process, e.g. X-ray diffraction (XRD), X-ray emission spectroscopy (XES), and X-ray absorption spectroscopy (XAS). XAS is widely-employed in the characterization of the local structural environment surrounding select elements within a material.

Great progress has been made over the past few years in the development of laboratory-based X-ray spectrometers for high-resolution x-ray absorption near edge structure (XANES) and X-ray emission spectroscopy (XES)1. The availability of relatively inexpensive laboratory-based XAFS system (http://easyxafs.com/) and third generation synchrotron facilities2 have established the groundwork for the broad application of high-resolution XAS in characterization of materials. On the other hand, modern computational resources and methodologies have reached a level of maturity and efficiency to complement as well as to fast-track new discoveries. In the case of XAS, a variety of theoretical frameworks including time-dependent density-functional theory (TDDFT)3,4, multiple-scattering5, and Bethe-Salpeter equation (BSE) based approaches6 have been implemented, each exhibits its advantages and drawbacks. Leveraging spectroscopic simulation software with large crystal structure databases enables the computation of a large number of reliable theoretical spectra corresponding to well-defined crystal structures7, providing a broad reference dataset with clean unique structural fingerprints that can be used for identification purposes. With the help of carefully crafted software tools, these computations can be performed in a high-throughput fashion and can be used to scan the structural phase space for novel materials. In addition, through proper integration with modern database tools, these scans can be saved for future use and leveraged for training machine learning algorithms to assist the characterization process. Some examples of such publicly available spectroscopic database are the EELS Data Base8, a compilation of valence and core-loss spectra from EELS and XAS experiments containing 271 spectra that covers 39 elements of the periodic table, and XCOM (https://www.nist.gov/pml/xcom-photon-cross-sections-database), which provides photon cross sections for scattering, photoelectric absorption and pair production, as well as total attenuation coefficients, for any element, compound or mixture. Other existing XAS databases9,10, i.e. https://www.cat.hokudai.ac.jp/catdb/ and http://cars.uchicago.edu/xaslib, covering a few hundred spectra, are hosted across the world and serve as valuable references for analysis.

The FEFF framework affords relatively inexpensive calculations compared to other approaches and requires minimum adjustable parameters. It provides an efficient means of generating high quality XAS spectra for a larger amount of chemical systems and structures. Hence, in our study, we selected the FEFF9 (ref. 5) program for the ab initio calculation of K-edge X-ray absorption near edge spectra (XANES). Using the parameter settings obtained from recent benchmarking work against experimental spectra11 and the FEFF workflow infrastructure available in the open source materials science workflow package Atomate12, we generate spectra of all the materials available in the publicly accessible and widely used materials database, Materials Project (MP)7.

A comprehensive database of computed XAS spectra enables comparison between different spectroscopic signatures across chemical systems and structures such that rapid determination of oxidation states, coordination environment, and other local atomic structure information can be obtained. Furthermore, using matching algorithms11 or other machine learning methods13, the data can be leveraged for on-the-fly characterization. Though the peak positions and amplitudes of the computational K-edge XANES spectral may exhibit differences compared to experimental spectra, theoretically computed XANES spectra provide sufficient information to identify oxidation state and coordination chemistry of the probe atom, and can be highly useful when experimental data are not available or scarce. For example, a previous study by Timoshenko et al.14 showed that ab initio XANES spectra provide excellent input data for training supervised machine learning models aimed at reconstructing metal catalyst structures from their experimental XANES. The current authors have also shown in a previous study11 that an ensemble-learned algorithm to match experimental K-edge XANES spectra in the EELS Data Base to computed spectra can achieve nearly 80% accuracy in identifying the correct oxidation state and coordination environment. In addition, the data is associated with download options and programmatic analyses tools for each structure in the Materials Project database, thereby making it accessible to the broader materials science community. Furthermore, the MP web application enables users to select spectra from the database, upload experimental spectra data and predict the material composition using the matching tool. To date, this is the largest computed XAS dataset available and it is still expanding.

The paper is organized as follows; first we briefly describe the XAS computation methodology as implemented in the FEFF code, and thereafter the high-throughput framework used in the generation of the spectra. We then describe the data storage and dissemination details, followed by the technical validation of the computational methodology and the high-throughput framework.

Methods

Theory

The K-edge XANES spectra were computed using the FEFF5 code which employs the Green's formulation of the multiple scattering theory to compute the spectra5. The X-ray absorption μ is computed in a manner similar to Fermi's golden rule when written in terms of the projected photoelectron density of final states or the imaginary part of the one-particle Green's function, G(r,r′;E). In terms of the Green's function, G(r,r′;E), the absorption coefficient, μ, from a given core level c is given by ref. 15.

(1) µ = 1 π I m < c | ε r G ( r , r ' ; E ) ε r | c >

with the Green's function, G(r,r′;E) given by

(2) G ( r , r ' ; E ) = f Ψ f ( r ) Ψ f ( r ' ) * E E f + i Γ

where ψ f are the final states, with associated energies Ef and net lifetime Γ, of a one-particle Hamiltonian that includes an optical potential with appropriate core hole screening.

The FEFF code computes the full propagator G incrementally using matrix factorization and uses the spherical muffin-tin approximation for the scattering potential15. For a more detailed description, we direct the readers to the review paper by Rehr et al.15

High-throughput Workflow

For the high-throughput XAS spectra generation, we use the FEFF workflow (Fig. 1) available in the open source computational materials science workflow package Atomate12. Atomate provides a high-level interface to compose workflows using open source materials science softwares such as pymatgen16, FireWorks17 and Custodian (https://github.com/materialsproject/custodian).

Figure 1
figure 1

Schematic diagram of the high throughput framework employed in the generation of XAS spectra for the Materials Project.

Each FEFF calculation involves the following 3 steps:

  • Selection of the absorbing site and the cluster of atoms to be included in the scattering calculations.

  • Generation of the FEFF input files for each site and its surrounding atomic cluster.

  • Execution of the FEFF binary on the generated input files.

As shown in Fig. 1, the workflow is initiated by importing a structurally optimized compound from MP. Each site in the downloaded structure is a possible absorbing center and FEFF calculation sequence must be initiated for each site. However, the number of calculations can be reduced by considering only the symmetrically unique sites in the structure. The FEFF input files for each such symmetrically unique absorbing site are generated subsequently and the FEFF binary is invoked on each input set. In the final step, the computed spectra from each calculation is inserted into a MongoDB database and disseminated via the Material Project (https://materialsproject.org/) website.

Code availability

Except the for FEFF code, which is proprietary, all the other aforementioned packages used in the high-throughput XAS workflow are open source and can be found at https://github.com/materialsproject and https://github.com/hackingmaterials/atomate.

Data Records

All the data described in this section can be found in the file, xas.json.tgz (Data Citation 1). The same data is also stored in the Materials Project database and is made freely available to the public. We also provide a user friendly web application called XAS Matcher (screenshot shown in Fig. 2) that enables user interaction with the computed data. The app can be reached at https://materialsproject.org/#apps/xas. Users can employ the app to search for computed XAS spectra, upload experimental spectra and find structures in the MP database whose computed spectra match that of the uploaded one. Details on spectra matching algorithm employed on Materials Project were published separately11.

Figure 2: Screen shot of XAS Matcher web application.
figure 2

The web application is hosted at https://materialsproject.org/#apps/xas.

Data Representation

To date, spectra for more than half of the compounds(≈40000) in the Materials Project database are available, for all the symmetrically unique sites in each structure. Each structure dataset is stored in the database in the binary JavaScript Object Notation (BSON) format. The keys and respective descriptions are summarized in Table 1. Although the workflow yields separate spectra for each unique atomic site, the averaged absorption coefficient over all the sites in the structure with that element is presented on the MP website. This will facilitate comparison with experimental spectra, where the averaging over each element is unavoidable. However, the full data, e.g spectra for all unique sites, are available to the user for download and further analysis.

Table 1 Keys and their description for each spectra data JSON file.

Data Download

The spectral data as well as the input parameters used for the calculations can be downloaded either directly from the Material Project website or using the REST Application Programming Interface (API) available in pymatgen18. Data can be downloaded for each element in the selected structure. The downloaded spectrum is provided in a tab separated file format and includes the spectral data for all the symmetrically unique sites of the selected element in the structure. The standard XAS data interchange (XDI) format19 is also available for download, which can be directly imported into most existing XAFS data analysis programs20 for further detailed analysis.

Technical Validation

Verification of the default parameter settings for the workflow.

The workflow described above relies on the default FEFF input parameter settings to generate the K-edge XANES spectra in a high throughput fashion. In this section, we will briefly describe the major FEFF input parameters relevant to the calculation of the XANES spectra, the bench-marking procedure and sample validation cases against experimentally available XANES spectra.

FEFF9 is capable of achieving quantitative agreement with XAS experimental results with a minimal set of adjustable parameters. The development and implementation of parameter-free models within the FEFF9 code permit consistent calculations across different chemical systems and constitute the main advantage for high-throughput calculations. In the benchmarking process, we included 13 unique compounds and their corresponding high-quality K-edge XAS spectra available in the open EELS/XAS database8, supplemented by 6 experimental XANES spectra of V2O5, V2O3, VO2, LiNiO2, LiCoO2, and NiO from previous studies21,22. Compounds included in the earliest commentary5 of FEFF9 software were also evaluated. The benchmark compound dataset has a high structural diversity and covers a wide chemical space. Detailed benchmark information is provided in a previous publication11. For benchmark compounds that contain detailed structural information, we used structures from the Materials Project (https://materialsproject.org/) database that exhibit an optimized geometry with the same space group as the benchmark compound. For benchmark compounds without provided structural information, MP ground state structures with identical chemical compositions were used11.

The following input fields in FEFF9 were subjected to convergence and optimization tests:

  • Self-consistent field (SCF): The SCF card controls FEFF automated self-consistent potential calculations.The self-consistent potential calculation is required in the XANES calculation for the Fermi level E0 estimation. In the convergence test, we varied the number of atoms included in the self-consistent potential calculations through changing the rfms1 value from 2 Å to 8 Å at 1 Å interval.

  • Full multiple scattering (FMS): The FMS card is required in the XANES calculation as the multiple scattering (MS) expansion's convergence might not be stable in the XANES calculation5. To identify the effect of rfms field on XANES calculation results, we varied the rfms value from 3 Å to 11 Å at 1 Å interval.

  • EXCHANGE: The EXCHANGE card specifies the exchange correlation potential model used for XANES calculation. The Hedin-Lundqvist self-energy is chosen as previously recommended for most applications23.

  • COREHOLE: The COREHOLE card is used for specifying how the core is treated during XANES calculation. The default choice in FEFF treats the core-hole interaction based on the Final State Rule (FSR), which could overestimate the strength of the core-hole and excludes the core-hole mixing effect24. To overcome these deficiencies and avoid possible break down of FSR for the L-shell metals25, the random phase approximation (RPA) is used to approximate the core-hole interactions in our high-throughput K-edge XANES calculations.

Through the benchmarking study, a set of optimized FEFF parameters were determined to achieve the best balance between the computational cost and convergence performance. The Pearson correlation coefficient is used to compare spectra calculated using different parameters. The Pearson correlation coefficient between two same energy grid spectra, Xi and Yi, is calculated using the following expression:

(3) S P e a r s o n ( X , Y ) = i = 1 D ( X i X ¯ ) ( Y i X ¯ ) ( i = 1 D ( X i X ¯ ) 2 ) ( i = 1 D ( Y i Y ¯ ) 2 ) ,

where Xi and Yi are the corresponding absorption coefficients.

We noticed that the Pearson correlation coefficients between FEFF computed spectra and experimental spectra obtained from EELS Data Base are above 0.85 in general (see Fig. 3). For C and B2O3, the FEFF-computed spectra are not in good agreement with experimental spectra. Possible solutions include the adoption of other higher-level real-space full-potential multiple scattering theory or first principles approaches26, which are not ideal for high-throughput implementation due to their high computational cost. Figure 4 depicts some sample comparisons between the computed and the experimental K-edge XANES spectra. We note that the computed spectra match with that of the experimental ones only up to a constant shift in the energy. The computed K-edge XANES spectra of vanadium oxides given in Fig. 4d–f show a strong change in their first peak intensity. Reasonably good agreement between computational and experimental spectra was found.

Figure 3: Benchmarking results of rfms1 parameter in the SCF card for K-edge XANES of various materials.
figure 3

Pearson correlation coefficients were calculated between spectra calculated at different rfms1 and the experimental reference provided by electron energy-loss spectroscopy (EELS) Data Base8.

Figure 4: Sample comparisons of FEFF computed K edge XANES spectra with the corresponding experimental ones for six different compounds.
figure 4

Computational spectra are shifted upwards by 0.5. (a) LiCoO2, (b) LiNiO2, (c) NaCl, (d) V2O5, (e) VO2, (f) V2O3.

Usage Notes

We present a database of K-edge XANES spectra computed using FEFF. The data is made freely available to all researchers via the Materials Project(www.materialsproject.org). Users can also download the data using the REST API that is part of pymatgen. All the codes used to create the high throughput are made freely available at Github(https://github.com/materialsproject and https://github.com/hackingmaterials/atomate). We hope that the users will find the data to be useful and will find novel ways to employ the data to accelerate their research. One such use case would be using machine learning techniques to predict structures from the experimentally measured spectra.

For users of FEFF and the spectra resulting in this study, it should be noted that K-edge XANES spectra computed by FEFF are more accurate for the investigation of elements in the periodic table up to the fifth-row. For excitations in heavier elements, e.g., the rare earth elements and 5d elements, L-edge XANES spectra are primarily used. FEFF is also applicable for the simulation of L-edge XANES spectra, though in certain cases2729 ground-state DFT methodologies need to be used for better agreement between computed spectra and experimental results. A detailed study of high-throughput FEFF calculation and implementation of L-edge XANES is currently being conducted by our research group. Furthermore, the analysis of XANES is recommended for the identification of oxidation state and coordination chemistry of the absorbing atom30. We note that the quantitative accuracy of XANES calculations is not comparable to EXAFS in identification of the distances, coordination number, and species of the neighbors of the absorbing atom. The accurate and precise interpretation of EXAFS is routinely conducted coupled with well-established software packages31 using the FEFF calculated EXAFS. The FEFF calculated K-edge EXAFS of all the materials available in the Materials Project database is underway, and a significant portion will be released in parallel with this publication.

Additional information

How to cite this article: Mathew, K. et al. High-throughput computational X-ray absorption spectroscopy. Sci. Data 5:180151 doi: 10.1038/sdata.2018.151 (2018).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.