Abstract
Raman spectroscopy is used ubiquitously in the characterization of condensed materials, from biomaterials and minerals to polymers, as it provides a unique fingerprint of local bonding and environment. In this work, we design and demonstrate a robust, automatic computational workflow for Raman spectra that employs first-principles calculations based on density functional perturbation theory. A set of computed spectra is compared with Raman spectra from established experimental databases to estimate the accuracy of the calculated properties across chemical systems and structures. Details of the computational methodology and technical validation are presented, along with the format of our publicly available data record.
Design Type(s) | modeling and simulation objective • database creation objective • data validation objective • chemical structure classification objective |
Measurement Type(s) | Raman spectrum |
Technology Type(s) | ab initio quantum chemistry computational method |
Factor Type(s) | |
Sample Characteristic(s) | |
Machine-accessible metadata file describing the reported data (ISA-Tab format)
Background & Summary
Raman spectroscopy is an effective method for obtaining vibrational frequencies and local chemical bonding information in condensed materials. Because it is relatively quick and non-destructive, it is widely used in the analysis of crystalline materials and polymers, extending even to biomedical materials and pharmaceuticals1,2,3, and advanced instrumentation and portable devices have been developed over the years. Being indispensable for observing the complete vibrational spectrum, Raman spectroscopy has established a unique role in obtaining “fingerprints” for modern materials characterization. Growing open access to experimental Raman spectral databases, such as the RRUFF4 Project and Bio-Rad’s KnowItAll Raman Spectral Library, has further increased the popularity of the technique. These resources now contain thousands of Raman spectra of materials ranging from minerals and semiconductors to prescription drugs, which greatly facilitates referencing and the identification of novel materials.
Compared to an experimental Raman database, a computational Raman spectrum database built from ab initio electronic structure calculations has many advantages5,6,7. Cheaper and more abundant computing resources enable thousands of calculations on a large number of materials in an automated fashion, significantly reducing human effort. Systematic calculations produce a uniform data type, enabling accelerated classification of modes, discovery of correlations, and screening for useful materials. However, despite these advantages, there has been limited effort to construct a computational Raman spectrum database or a systematic workflow for its development. The WURM Project8 was an early prototype but has shown little to no activity since 2012.
In this paper, we develop a computational workflow to calculate Raman spectra using ab initio density functional perturbation theory (DFPT), an accessible and fairly accurate tool for describing lattice dynamics9. Building on Petousis et al.10, in which the dielectric tensor components are computed directly with DFPT as implemented in VASP, we derive the Raman tensors of 55 inorganic compounds. These tensors are then used to generate the reported Raman spectra, which are benchmarked against experimental Raman spectra. The spectra are integrated into the Materials Project (MP)11, a free online database of computed properties that enables accelerated materials design. While Raman spectra remain computationally expensive, this work establishes a systematic workflow, together with benchmarking of its numerical parameters, for automated Raman spectrum calculations that can be used by the community at large and help increase the availability of reference Raman spectra for a wide variety of structures and chemical systems.
Methods
Theory and definitions
In Raman spectroscopy measurements of crystalline systems, incident laser photons with frequency ωL interact with lattice vibrations, which can be described in terms of phonons. The inelastically scattered photons exhibit either a decrease or an increase in frequency, giving the symmetric Stokes and anti-Stokes shifts, respectively. The recorded spectrum shows the intensity of the scattered light as a function of its frequency change, usually converted to wavenumbers v in units of cm−1 and known as the Raman shift. The Raman spectrum therefore provides a “fingerprint” of the normal modes of the crystal through the comparison of incident and scattered photon frequencies. In our work, the wavenumber vmode of each normal mode is obtained directly from a DFPT phonon calculation. Up to constant prefactors, which cancel when each spectrum is normalized, the intensity of mode i with polarization along β and electric field along γ is given12,13,14,15 by
\[ I^{i}_{\beta\gamma} \propto \frac{(\omega_L-\omega_i)^4}{\omega_i}\left[n(\omega_i)+1\right]\left|\alpha^{i}_{\beta\gamma}\right|^2, \tag{1} \]
where ωi is the vibrational mode frequency and ωL is the laser frequency. ωL is usually set arbitrarily in computational methods, since the frequency shift of the scattered photon is independent of the incident photon frequency. \(n(\omega_i)+1 = \left(1-e^{-\hbar\omega_i/k_BT}\right)^{-1}\) is the temperature- and frequency-dependent Bose factor, with n(ωi) the Bose-Einstein occupation number. To reflect the most common experimental conditions, we set T = 300 K and ωL to the frequency of a 532 nm laser. For each mode, the Raman tensor12,16
\[ \alpha^{i}_{\beta\gamma} \propto \sqrt{\Omega}\sum_{n}\sum_{\nu=x,y,z}\frac{\partial\varepsilon_{\beta\gamma}}{\partial u_{n\nu}}\,\frac{e^{i}_{n\nu}}{\sqrt{m_n}} \tag{2} \]
is obtained from the finite-difference derivative of the dielectric tensor εβγ with respect to the displacement \(u_{n\nu}\), where mn is the mass of atom n, ν the direction of displacement, Ω the unit cell volume, and \(e^{i}_{n\nu}\) the eigenvector of the dynamical matrix for mode i. More specifically, a central difference scheme is employed: atoms are moved independently by 0.005 Å in both the positive (+) and negative (−) directions to evaluate the derivative of the dielectric tensor with respect to displacement. However, the intensity expression in Eq. (1) applies only to single crystals. Experimental data are often collected on polycrystalline mineral specimens or powdered samples, in which case the intensity must be averaged over all possible crystal orientations. This is accomplished by separating the total intensity I = I⊥ + I||17,18,19 into the depolarized light intensity
\[ I^{i}_{\perp} \propto \frac{(\omega_L-\omega_i)^4}{\omega_i}\left[n(\omega_i)+1\right]\frac{5G^{i}_{1}+3G^{i}_{2}}{30} \tag{3} \]
and the polarized light intensity
\[ I^{i}_{\parallel} \propto \frac{(\omega_L-\omega_i)^4}{\omega_i}\left[n(\omega_i)+1\right]\frac{10G^{i}_{0}+4G^{i}_{2}}{30}, \tag{4} \]
where the rotation invariants17,19 are defined as
\[ G^{i}_{0} = \frac{1}{3}\left(\alpha^{i}_{xx}+\alpha^{i}_{yy}+\alpha^{i}_{zz}\right)^2, \qquad G^{i}_{1} = \frac{1}{2}\sum_{\beta<\gamma}\left(\alpha^{i}_{\beta\gamma}-\alpha^{i}_{\gamma\beta}\right)^2, \]
\[ G^{i}_{2} = \frac{1}{2}\sum_{\beta<\gamma}\left(\alpha^{i}_{\beta\gamma}+\alpha^{i}_{\gamma\beta}\right)^2 + \frac{1}{3}\sum_{\beta<\gamma}\left(\alpha^{i}_{\beta\beta}-\alpha^{i}_{\gamma\gamma}\right)^2. \tag{5} \]
For the Raman spectrum of each compound, the mode intensities are normalized by setting the maximum intensity in the spectrum to 100 and scaling the other mode intensities accordingly.
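The orientation averaging and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the commonly used Placzek-type forms, namely a prefactor (ωL − ωi)^4/ωi [n(ωi) + 1], a depolarized part proportional to (5G1 + 3G2)/30, a polarized part proportional to (10G0 + 4G2)/30, and the standard rotation invariants G0, G1, G2 of the Raman tensor. Frequencies are handled as wavenumbers in cm−1; this changes the prefactor only by a constant factor, which cancels after normalization. All function names are hypothetical.

```python
import math

# Second radiation constant h*c/k_B in cm*K, so h*c*nu/(k_B*T) = C2 * nu / T for nu in cm^-1.
C2_CM_K = 1.438776877


def bose_factor(nu_cm: float, temperature: float = 300.0) -> float:
    """Return n + 1 = 1 / (1 - exp(-h c nu / (k_B T))) for a mode at nu_cm (cm^-1)."""
    return 1.0 / (1.0 - math.exp(-C2_CM_K * nu_cm / temperature))


def rotation_invariants(a):
    """Return (G0, G1, G2) for a 3x3 Raman tensor given as nested lists."""
    pairs = [(0, 1), (0, 2), (1, 2)]  # index pairs with beta < gamma
    g0 = (a[0][0] + a[1][1] + a[2][2]) ** 2 / 3.0
    g1 = sum((a[j][k] - a[k][j]) ** 2 for j, k in pairs) / 2.0
    g2 = (sum((a[j][k] + a[k][j]) ** 2 for j, k in pairs) / 2.0
          + sum((a[j][j] - a[k][k]) ** 2 for j, k in pairs) / 3.0)
    return g0, g1, g2


def powder_intensity(a, nu_cm, laser_nm=532.0, temperature=300.0):
    """Orientation-averaged intensity I = I_perp + I_par (arbitrary units) of one mode."""
    nu_laser = 1.0e7 / laser_nm  # 532 nm corresponds to about 18797 cm^-1
    prefactor = (nu_laser - nu_cm) ** 4 / nu_cm * bose_factor(nu_cm, temperature)
    g0, g1, g2 = rotation_invariants(a)
    i_perp = prefactor * (5.0 * g1 + 3.0 * g2) / 30.0  # depolarized part
    i_par = prefactor * (10.0 * g0 + 4.0 * g2) / 30.0  # polarized part
    return i_perp + i_par


def normalize(intensities):
    """Scale intensities so that the strongest mode is set to 100."""
    peak = max(intensities)
    return [100.0 * x / peak for x in intensities]
```

For a purely isotropic Raman tensor only G0 is non-zero, so the mode appears only in the polarized channel; an anisotropic tensor contributes to both channels through G2.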
Computational workflow
The workflow designed for automatic Raman spectrum calculations is outlined in Fig. 1. Relaxed material structures are obtained from the Materials Project database11,20,21 and serve as input structures for the subsequent DFPT simulations. For all calculations of vibrational normal modes, DFPT simulations were run using the Vienna Ab initio Simulation Package (VASP)22,23,24,25 with the GGA/PBE26 + U27,28 exchange-correlation functional and projector augmented wave (PAW) pseudopotentials29,30. The k-point density was set to 3,000 per reciprocal atom and the plane-wave energy cutoff to 600 eV, the same as in the calculation of dielectric tensors10. The structure is then displaced along the calculated normal-mode eigenvectors for the dielectric tensor calculations. Raman susceptibility tensors are calculated from the finite-difference derivative of the dielectric tensor as shown in Eq. (2), and the resulting Raman analysis data are stored together with all associated metadata from the Materials Project.
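The post-processing step that turns displaced-structure dielectric tensors into a Raman tensor can be illustrated as follows. This is a standalone sketch, not the pymatgen/Atomate code used in the workflow: it assumes the central difference ∂ε/∂u ≈ [ε(+δ) − ε(−δ)]/(2δ) with δ = 0.005 Å, and a Raman tensor of the form α ∝ √Ω Σn Σν (∂ε/∂unν) einν/√mn with constant prefactors omitted. The data layout (nested lists indexed by atom and Cartesian direction) and all names are hypothetical.

```python
import math


def central_difference(eps_plus, eps_minus, step=0.005):
    """d(eps)/du from 3x3 dielectric tensors computed at +step and -step (Angstrom)."""
    return [[(p - m) / (2.0 * step) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(eps_plus, eps_minus)]


def raman_tensor(deps_du, eigvec, masses, volume):
    """Assemble the (unnormalized) Raman tensor of one mode.

    deps_du[n][nu]: 3x3 derivative of eps for atom n displaced along Cartesian nu
    eigvec[n][nu]: mode eigenvector component on atom n along nu
    masses[n]:     mass of atom n
    volume:        unit-cell volume
    """
    alpha = [[0.0] * 3 for _ in range(3)]
    for n, mass in enumerate(masses):
        for nu in range(3):
            weight = eigvec[n][nu] / math.sqrt(mass)  # mass-weighted eigenvector
            d = deps_du[n][nu]
            for b in range(3):
                for g in range(3):
                    alpha[b][g] += d[b][g] * weight
    scale = math.sqrt(volume)
    return [[scale * x for x in row] for row in alpha]
```

In practice each deps_du entry would come from one central_difference call on a pair of displaced-structure calculations, so a cell with N atoms needs 6N dielectric tensor calculations under this independent-displacement scheme.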
Data Records
Computational data record
The computational data are stored in JSON, a lightweight data-interchange format, and the JSON document can be downloaded directly from the Figshare repository31. The nested key/value pairs can be navigated using Tables 1 and 2. The document includes the original structure data from the Materials Project, normal-mode information, tensors, and calculated intensities.
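Because the key names are documented in Tables 1 and 2 and are not reproduced here, a schema-agnostic way to explore the nested document is to list the path to every leaf value. The sketch below uses only the Python standard library; the function name and the toy document are hypothetical and do not reflect the actual schema.

```python
import json


def key_paths(node, prefix=""):
    """Yield slash-separated paths to every leaf value in a nested JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from key_paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from key_paths(value, f"{prefix}[{index}]")
    else:
        yield prefix


# Toy document for illustration only; load the Figshare file with json.load instead.
toy = json.loads('{"material_id": "mp-x", "modes": [{"wavenumber": 100.0}]}')
for path in key_paths(toy):
    print(path)
```

Running this on the downloaded file gives a quick overview that can be cross-checked against Tables 1 and 2.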
Experimental data record
The experimental data extracted from the RRUFF database are provided in a separate JSON document with the nested structure shown in Table 3. The file includes the mineral type, peak locations, intensity values, RRUFF_id, and matched Materials Project IDs. To ensure identical structures, each mineral is matched to a compound from the Materials Project by fitting Crystallographic Information Files (CIF) using pymatgen tools or by manually consulting mineralogy data32. The RRUFF spectra and all CIF files used are included in the Figshare repository31 for transparency and ease of reproducing the benchmarking results.
Technical Validation
Unlike evaluating the accuracy of scalar material properties, comparing Raman spectra requires matching the calculated vibrational modes to the experimental Raman peaks from the RRUFF database4. Hence, a linear cost metric, u = w1|vmode − vpeak| + w2|Imode − Ipeak| = w1|Δv| + w2|ΔI|, which accounts for proximity in both wavenumber and normalized intensity, is used in the matching process: within each compound, every computed vibrational mode is paired with an experimental peak such that the overall cost is minimized. To make the matching more accurate, we preprocess the calculated Raman spectra in the following two steps.
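The matching described above is an assignment problem. A minimal sketch, assuming absolute differences so that each pair cost is non-negative, and assuming every computed mode must be paired with a distinct experimental peak, is an exhaustive search over assignments. This is only practical for a handful of modes and is not necessarily the authors' algorithm; the names and default weights are hypothetical.

```python
from itertools import permutations


def pair_cost(mode, peak, w1=1.0, w2=1.0):
    """u = w1*|dv| + w2*|dI| for one (mode, peak) pair, each given as (wavenumber, intensity)."""
    return w1 * abs(mode[0] - peak[0]) + w2 * abs(mode[1] - peak[1])


def match_modes(modes, peaks, w1=1.0, w2=1.0):
    """Pair each mode with a distinct peak so that the total cost is minimal.

    Exhaustive search over all assignments; requires len(modes) <= len(peaks).
    Returns (list of (mode_index, peak_index), total_cost).
    """
    best_total, best_pairs = float("inf"), []
    for choice in permutations(range(len(peaks)), len(modes)):
        total = sum(pair_cost(m, peaks[j], w1, w2) for m, j in zip(modes, choice))
        if total < best_total:
            best_total = total
            best_pairs = list(zip(range(len(modes)), choice))
    return best_pairs, best_total
```

For realistic numbers of modes, the same cost matrix can instead be passed to an optimal assignment solver such as SciPy's linear_sum_assignment, which avoids the factorial cost of enumerating permutations.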
First, for each compound’s computed Raman spectrum, a signal-to-noise threshold of 0.356% is used to remove vibrational modes with experimentally undetectable intensities. This threshold is the average ratio of noise intensity to maximum intensity in the reference experimental data. Second, for each compound, the low-wavenumber region is removed from the computed Raman spectrum, for both experimental and computational reasons. From an experimental point of view, the RRUFF database records the Raman spectra of different compounds starting at different wavenumbers, which presents a noticeable challenge for the peak matching process. The average distance between the starting point of an RRUFF Raman spectrum and its first peak is 113 cm−1, which provides a reference location for the computed Raman spectrum of each compound. From a computational perspective, DFPT calculations tend to exhibit larger errors at low wavenumbers; these modes correspond to long-range oscillations that are difficult to compute within the limited size of periodic unit cells. For example, these low-wavenumber modes, corresponding to phonon wavelengths on the order of microns, can be influenced by long-range features such as defects, dislocations, and grain boundaries that are not captured in simulations of perfectly periodic, single-crystal structures.
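The two preprocessing steps can be sketched as a single filter. The 0.356% threshold is taken from the text and is applied relative to the strongest computed mode. The exact low-wavenumber cutoff rule is not spelled out beyond the 113 cm−1 reference, so the cutoff is left as an explicit, hypothetical parameter rather than a fixed value.

```python
def preprocess(modes, low_cutoff, threshold_pct=0.356):
    """Drop weak and low-wavenumber modes from one computed spectrum.

    modes:         list of (wavenumber_cm, intensity) pairs
    low_cutoff:    hypothetical parameter; modes below this wavenumber (cm^-1) are removed
    threshold_pct: minimum intensity as a percentage of the strongest mode
    """
    if not modes:
        return []
    i_max = max(intensity for _, intensity in modes)
    min_intensity = i_max * threshold_pct / 100.0
    return [(v, i) for v, i in modes if i >= min_intensity and v >= low_cutoff]
```

Because the maximum is taken before the low-wavenumber cut, a strong low-frequency mode still sets the intensity scale even though it is itself discarded, which follows the order of the two steps as described.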
The data set consists of 55 compounds and 205 pairs of matched peaks/modes, and the total calculation cost amounted to 9.5 million CPU hours. The accuracy of the calculated phonon wavenumbers vmode is shown in Fig. 2, where the compounds range from simple oxides and sulfides to carbonates. The average wavenumber deviation Δv over all compounds is −9.66 cm−1 with a standard deviation of 18.58 cm−1, which indicates that our calculations tend to slightly underestimate the Raman peak locations. With a mean relative error of −2%, we conclude that our test set is in reasonably good agreement with the available experimental data. In Fig. 3, violin plots depict the wavenumber deviations grouped by mineral type. The oxide family constitutes 40% of the data and shows a weak underestimation of peak locations, with an average wavenumber deviation of −7.5 cm−1. The sulfate and carbonate families exhibit the largest wavenumber deviations. While the larger deviation of the sulfate family is likely due to the limited number of sulfate compounds in this work, the carbonate family has more abundant data for distribution analysis and shows an average wavenumber deviation of −19.3 cm−1. Most notably, we observe large wavenumber deviations for carbonate modes at Raman shifts around 1000 cm−1. These modes all originate from vibrations of the oxygen anions, which could be affected by small (unknown) amounts of water in the mineralogical samples that are not accounted for in the calculations.
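The summary statistics quoted above can be computed from the matched pairs as shown below. This sketch takes deviations as computed minus experimental, so negative values mean underestimation, and it reports both the signed and the absolute mean relative error. The text does not state whether a sample or a population standard deviation was used; the sample version is assumed here. The function name is hypothetical, and any numbers passed to it in examples are illustrative rather than the paper's data.

```python
import statistics


def deviation_stats(pairs):
    """Summarize matched (computed_cm, experimental_cm) wavenumber pairs."""
    deltas = [calc - exp for calc, exp in pairs]
    rel = [(calc - exp) / exp for calc, exp in pairs]
    return {
        "mean_dev": statistics.mean(deltas),         # average deviation (cm^-1)
        "std_dev": statistics.stdev(deltas),         # sample standard deviation (cm^-1)
        "mean_rel_pct": 100.0 * statistics.mean(rel),  # signed mean relative error (%)
        "mean_abs_rel_pct": 100.0 * statistics.mean([abs(r) for r in rel]),  # MARE (%)
    }
```

Reporting both relative errors makes the distinction explicit: the signed value captures the systematic underestimation, while the absolute value measures the typical size of the error regardless of sign.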
The intensity deviations do not show a clear bias. As seen in Fig. 4, carefully tuned weights w1 and w2 can often yield well-paired peaks and modes, yet discrepancies in intensity remain. It should be noted that our structures from MP assume a perfectly ordered bulk crystal at 0 K; extrinsic factors associated with temperature, pressure, defects, and phonon anharmonicity are therefore not considered. These effects on the dielectric tensors are described in detail in Petousis et al.33, and they carry over to the Raman tensors and ultimately affect the calculated intensities. Interfaces and defects in particular are expected to be strong phonon scattering centers, which could account for much of the intensity variation, as the phonon population then deviates from a simple Bose-Einstein distribution.
Besides the extrinsic effects mentioned above, other factors could contribute to discrepancies between computed and experimental values. One is the use of the GGA/PBE exchange-correlation functional, which is known to underbind34, yielding larger bond lengths, lattice parameters, and unit cell volumes in the relaxed structures. To determine whether this known underbinding causes a systematic and significant deviation in the computed Raman data, we investigated the correlation between the unit cell volume expansion (%) and the Raman peak wavenumber discrepancies (cm−1). Despite an average volume expansion of 4.94%, the weak correlation between the two (+0.09), shown in Fig. 5, suggests that the effect of GGA/PBE is not significant in this work. Therefore, we do not attribute the observed wavenumber discrepancies to the use of the GGA/PBE exchange-correlation functional. Other salient differences between computational assumptions and experimental conditions may also contribute to the discrepancies. For example, the rotation-invariant intensity expression is an average over every possible scattering geometry and therefore assumes a powdered or isotropic polycrystalline sample. Mineral samples do not always satisfy these ideal conditions, so specific sample orientations or laser polarization directions in experimental measurements may create discrepancies in the Raman spectra.
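The text does not state which correlation coefficient was used; assuming the usual Pearson coefficient, the check between per-compound volume expansion and wavenumber discrepancy reduces to the short function below (standard library only, function name hypothetical). Values near zero, such as the +0.09 reported above, indicate that volume expansion explains little of the variation in the wavenumber discrepancies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)
```

Here xs would hold the volume expansions (%) and ys the corresponding wavenumber deviations (cm−1). On Python 3.10 and later, statistics.correlation gives the same result.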
Usage Notes
An automatic workflow for ab initio Raman spectra is expected to be of interest to the broad materials science and chemistry communities. We expect this growing dataset to aid the interpretation of Raman characterization and to provide a reference for experimentally measured Raman data. Furthermore, our work opens up opportunities for data-intensive approaches. For example, machine learning techniques applied to existing spectra may be used to enlarge the data set as well as to improve the peak matching process. Such work provides the basis for quick identification by screening compounds on the Materials Project website and comparing their computed Raman spectra with experimental reference spectra. Still, we emphasize that the computational cost of developing this database is significant. While this work illustrates the promise of a computational Raman spectra database, further work is necessary to reduce the computational cost and reach the truly large data sets needed for machine learning approaches.
Code Availability
The DFPT calculations primarily use the proprietary VASP code. The processing and modification of the simulations were implemented using Pymatgen20 and FireWorks35. Pymatgen (Python Materials Genomics) is an open-source Python library for materials analysis, released under the Massachusetts Institute of Technology (MIT) license. The workflow shown in Fig. 1 is implemented using FireWorks in Atomate36, which stores, executes, and manages calculation workflows and is freely available to the public through Atomate’s GitHub site under a modified GNU General Public License.
References
Das, R. S. & Agrawal, Y. Raman spectroscopy: recent advancements, techniques and applications. Vib. Spectrosc. 57, 163–176 (2011).
Schrader, B. Infrared and Raman spectroscopy: methods and applications (John Wiley & Sons, 2008).
Parker, F. S. Applications of infrared, Raman, and resonance Raman spectroscopy in biochemistry (Springer Science & Business Media, 1983).
Lafuente, B., Downs, R. T., Yang, H. & Stone, N. The power of databases: The RRUFF project. Highlights in Mineralogical Crystallography 1, 1–29 (2016).
Sluiter, M. H., Simonovic, D. & Tasci, E. S. Materials databases for the computational materials scientist. Int J Min Met Mater 18, 303–308 (2011).
Feller, D. The role of databases in support of computational chemistry calculations. J. Comput. Chem. 17, 1571–1586 (1996).
Curtarolo, S. et al. The high-throughput highway to computational materials design. Nat. Mater. 12, 191 (2013).
Caracas, R. & Bobocioiu, E. Theoretical modelling of Raman spectra. In Raman spectroscopy applied to Earth sciences and cultural heritage (Mineralogical Society of Great Britain and Ireland, 2012).
Baroni, S., De Gironcoli, S., Dal Corso, A. & Giannozzi, P. Phonons and related crystal properties from density-functional perturbation theory. Rev. Mod. Phys. 73, 515 (2001).
Petousis, I. et al. High-throughput screening of inorganic compounds for the discovery of novel dielectric and optical materials. Sci. Data 4, 160134 (2017).
Jain, A. et al. Commentary: The materials project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Prosandeev, S., Waghmare, U., Levin, I. & Maslar, J. First-order Raman spectra of AB′1/2B″1/2O3 double perovskites. Phys. Rev. B 71, 214307 (2005).
Umari, P. & Pasquarello, A. First-principles analysis of the Raman spectrum of vitreous silica: comparison with the vibrational density of states. J. Phys. Condens. Matter 15, S1547 (2003).
Petzelt, J. & Dvorak, V. Changes of infrared and Raman spectra induced by structural phase transitions. II. Examples. J. Phys. C: Solid State Phys. 9, 1587 (1976).
Born, M. & Huang, K. Dynamical theory of crystal lattices (Clarendon press, 1954).
Ceriotti, M., Pietrucci, F. & Bernasconi, M. Ab initio study of the vibrational properties of crystalline TeO2: the α, β, and γ phases. Phys. Rev. B 73, 104304 (2006).
Placzek, G. Handbuch der Radiologie (Akademische Verlagsgesellschaft, 1934).
Hayes, W. & Loudon, R. Scattering of light by crystals (Wiley, 1978).
Hamaguchi, H.-O. The resonance effect and depolarization in vibrational Raman scattering. Advances in infrared and Raman spectroscopy 12, 273–310 (1985).
Ong, S. P. et al. Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
Ong, S. P. et al. The materials application programming interface (API): A simple, flexible and efficient API for materials data based on representational state transfer (REST) principles. Comput. Mater. Sci. 97, 209–215 (2015).
Kresse, G. & Hafner, J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558 (1993).
Kresse, G. & Hafner, J. Ab initio molecular-dynamics simulation of the liquid-metal–amorphous-semiconductor transition in germanium. Phys. Rev. B 49, 14251 (1994).
Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169 (1996).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
Dudarev, S., Botton, G., Savrasov, S., Humphreys, C. & Sutton, A. Electron-energy-loss spectra and the structural stability of nickel oxide: An LSDA+U study. Phys. Rev. B 57, 1505 (1998).
Jain, A. et al. A high-throughput infrastructure for density functional theory calculations. Comput. Mater. Sci. 50, 2295–2310 (2011).
Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953 (1994).
Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758 (1999).
Liang, Q., Dwaraknath, S. & Persson, K. A. High-throughput Computation and Evaluation of Raman Spectra. Figshare, https://doi.org/10.6084/m9.figshare.7427393 (2018).
Anthony, J. W., Bideaux, R. A., Bladh, K. W. & Nichols, M. C. Handbook of mineralogy, volume iv, arsenates, phosphates, vanadates. 1–680, Mineralogical Society of America, Chantilly, Virginia (2000).
Petousis, I. et al. Benchmarking density functional perturbation theory to enable high-throughput screening of materials for dielectric constant and refractive index. Phys. Rev. B 93, 115151 (2016).
Van de Walle, A. & Ceder, G. Correcting overbinding in local-density-approximation calculations. Phys. Rev. B 59, 14992 (1999).
Jain, A. et al. FireWorks: A dynamic workflow system designed for high-throughput applications. Concurr Comput: Pract E. 27, 5037–5059 (2015).
Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017).
Acknowledgements
This work was funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division under Contract No. DE-AC02-05CH11231 (Materials Project program KC23MP). This work made use of resources of the National Energy Research Scientific Computing Center (NERSC), supported by the Office of Basic Energy Sciences of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Author information
Contributions
Q.L. performed the phonon and Raman spectra data calculations, obtained experimental data from online database, developed peak matching algorithm and the code for visualization, performed the data benchmarking and analysis, and wrote the paper. S.D. contributed to supervision of the calculations, the data analysis and planning of the paper. K.P. was involved in supervising and planning the work and its integration with the Materials Project effort.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
ISA-Tab metadata file
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
About this article
Cite this article
Liang, Q., Dwaraknath, S. & Persson, K.A. High-throughput computation and evaluation of Raman spectra. Sci Data 6, 135 (2019). https://doi.org/10.1038/s41597-019-0138-y
DOI: https://doi.org/10.1038/s41597-019-0138-y