Following the discovery of graphene in 20041, the field of two-dimensional (2D) materials has grown tremendously during the last decade. Today, more than 50 different monolayer compounds including metals2,3, semiconductors4,5,6, insulators7, ferromagnets8, and superconductors9,10, have been chemically grown or mechanically exfoliated from layered bulk crystals11. The enormous interest in 2D materials has mainly been driven by their unique and easily tunable properties (as compared to 3D bulk crystals), which make them attractive for both fundamental research and technological applications in areas such as energy conversion/storage, (opto)-electronics, and photonics6,12,13. Among the various experimental techniques used for characterizing 2D materials, Raman spectroscopy plays a pivotal role14 thanks to its simplicity, non-destructive nature, and high sensitivity towards key materials properties such as chemical composition, layer thickness (number of layers), inter-layer coupling, strain, crystal symmetries and sample quality15,16,17.

Raman spectroscopy is a versatile technique for probing the vibrational modes of molecules and crystals from inelastically scattered light, and is widely used for identifying materials through their unique vibrational fingerprints18. There are various types of Raman spectroscopies that differ in the number of photons or phonons involved in the scattering process18. Here we focus on the first-order Raman processes in which only a single phonon is involved. Typically, this is the dominant scattering process in defect-free samples (which are considered here). Note that Raman processes involving defect states or several phonons may also play important roles in some 2D crystals such as graphene19. As shown schematically in Fig. 1(a), the light scattered from a crystal appears in three distinct frequency bands: A strong resonance at the incident frequency ωin due to Rayleigh (elastic) scattering, and weaker resonances due to Raman (inelastic) scattering at ωin − ων and ωin + ων forming Stokes and anti-Stokes bands, respectively. Here, ων is the frequency of a (Raman active) vibrational mode of the crystal, i.e. a phonon. Depending on the symmetry of the phonon modes and polarization of the electromagnetic fields, a phonon mode may be active or inactive in the Raman spectrum.
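As a small illustration of the three bands described above, the following sketch converts an excitation wavelength and a Raman shift into the wavelengths of the Stokes and anti-Stokes lines. The function name and the 400 cm−1 shift are hypothetical choices made only for this example; the 532 nm excitation wavelength is one of those used later in the text.

```python
def raman_lines_nm(excitation_nm, shift_cm1):
    """Return (stokes_nm, anti_stokes_nm) for a Raman shift given in cm^-1."""
    # Wavenumber of the incident light in cm^-1 (1 cm = 1e7 nm).
    nu_in = 1.0e7 / excitation_nm
    # Stokes line: the photon loses the phonon energy (omega_in - omega_nu).
    stokes_nm = 1.0e7 / (nu_in - shift_cm1)
    # Anti-Stokes line: the photon gains the phonon energy (omega_in + omega_nu).
    anti_stokes_nm = 1.0e7 / (nu_in + shift_cm1)
    return stokes_nm, anti_stokes_nm


# Hypothetical example: a 400 cm^-1 phonon probed with a 532 nm laser.
stokes_nm, anti_stokes_nm = raman_lines_nm(532.0, 400.0)
```

The Stokes line lies on the long-wavelength side of the Rayleigh line and the anti-Stokes line on the short-wavelength side, consistent with the frequencies ωin − ων and ωin + ων.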

Fig. 1: Schematic view of Raman scattering process and inverse Raman problem.

a Raman scattering processes, in which incident photons of polarization u\(_{\mathrm{in}}\) and frequency \(\omega _{\mathrm{in}}\) are scattered into u\(_{\mathrm{out}}\) and ω\(_{\mathrm{out}}\) under emission (or absorption) of a phonon with frequency ων. Only zero momentum phonons contribute to first-order Raman processes but, for illustrative purposes, a finite momentum phonon is shown here. In a typical output spectrum, the Rayleigh (elastic), Stokes and anti-Stokes lines are observed. b Given an experimental spectrum, the Raman library based on the open Computational 2D Materials Database (C2DB) can be used to tackle the inverse Raman problem, i.e. identifying the underlying material based on its Raman spectrum.

While semi-classical theories of Raman spectroscopy can provide some qualitative insight18, a full quantum mechanical treatment is necessary for a quantitatively accurate description. In particular, ab initio techniques have been employed successfully to calculate Raman spectra of both molecules18,20 and solids21,22, typically showing good agreement with experimental spectra. The parameter-free nature of such computational schemes endows them with a high degree of predictive power, although their computational cost can be significant, which in practice limits them to relatively simple, i.e. crystalline, materials. In the realm of 2D materials, ab initio Raman studies have been limited to a handful of the most popular 2D crystals including graphene19, hBN23, WTe224, SnS, and SnSe25, as well as MoS2 and WS226. In view of the significant experimental efforts currently being devoted to the synthesis and application of future 2D materials and the important role of Raman spectroscopy as a main characterization tool, it is clear that the compilation of a comprehensive library of Raman spectra of 2D materials across different crystal structures and chemical compositions is a critical and timely endeavor.

Recently, we have introduced the open Computational 2D Materials Database (C2DB)11, which contains various calculated properties for several thousand 2D crystals obtained using state-of-the-art ab initio methods. The properties currently provided in the C2DB include the relaxed crystal structures, thermodynamic phase diagrams (convex hull), electronic band structures and related quantities (effective masses, deformation potentials, etc.), elastic properties (stiffness tensors, phonon frequencies), and optical conductivity/absorbance spectra. We stress that the materials in the C2DB comprise both experimentally known and hypothetical materials, i.e. materials that may or may not be possible to synthesize in reality.

In this paper, we present an ab initio high-throughput computation of the resonant first-order Raman spectra of more than 700 monolayers selected as the most stable 2D crystals from the C2DB. The calculations are based on an efficient density functional theory (DFT) implementation of the first-order Raman process employing a localized atomic orbital (LCAO) basis set27. We describe the implementation and the automated workflow for computing the Raman spectra at three different excitation frequencies and nine polarization setups. All calculated Raman spectra are provided in Supplementary Figs. 2–734, and can be found at the C2DB website. In addition, the applied computational routines are freely available online through the website. Our numerical results are benchmarked against available experimental data for selected 2D crystals (15 different monolayers) such as MoS2, MoSSe, and MoSe2. The calculated spectra show excellent agreement with experiments for the Raman peak positions and acceptable agreement for the relative peak intensities. Finally, we analyze the inverse problem of identifying a material based on an input (experimental) Raman spectrum as shown schematically in Fig. 1(b). Using MoS2 (H-phase) and WTe2 (T\({}^{\prime}\)-phase) as two examples, we find that a simple descriptor consisting of the first and second moments of the Raman spectrum combined with the Euclidean distance measure suffices to identify the correct material among the 700+ candidate materials in the database. In particular, this procedure can be used to differentiate clearly the distinct structural phases of MoS2 and WTe2. Incidentally, the library of calculated Raman spectra provides a useful dataset for training machine learning algorithms28,29. As such, our work is not only a valuable reference for experimentalists and theoreticians working in the field of 2D materials, but also represents a step in the direction of autonomous (in situ) characterization of materials.


Theory of Raman scattering

We first briefly review the theory of Raman scattering in the context of third-order perturbation theory. As discussed above, accurate modeling of Raman processes requires a quantum mechanical treatment to obtain the electronic properties. Regarding the electromagnetic field, it can be shown that a classical description of the field30,31 yields the same results as the full quantum mechanical theory that quantizes the photon field18,32. The most common approach to Raman calculations is the Kramers–Heisenberg–Dirac approach31, in which the Raman tensor is obtained as a derivative of the electric polarizability with respect to the vibrational normal modes21,26,30,31. Nonetheless, here we employ a more direct and much less-explored approach based on time-dependent third-order perturbation theory to obtain the rate for coherent electronic processes involving creation/annihilation of two photons and one phonon. While the two approaches can be shown to be equivalent18, at least when local field effects can be ignored as is the case for 2D materials, the third-order perturbative approach can be readily extended to higher order Raman processes (e.g. scattering on multiple phonons), and provides a more transparent physical picture of the Raman processes in terms of individual scattering events33. Hence, our computational framework is prepared for future extensions to multi-phonon processes. Note that in terms of computational effort, the perturbative approach is comparable to the polarizability derivative method for typical crystals, for which the matrix element calculation dominates the computation time. In this case, both approaches scale as \({N}_{\nu }{N}_{b}^{2}\), where Nν and Nb denote the number of phonon modes and electronic bands, respectively.

To derive an expression for the Raman intensity, both electron–light and electron–phonon Hamiltonians are treated as perturbations (the exact forms of these Hamiltonians are given in the method section). A general time-dependent perturbation can be written as \(\hat{H^{^\prime}} (t)\equiv {\sum }_{{\omega }_{1}}{\hat{H^{^\prime}}}({\omega }_{1})\exp (-{\mathrm{i}}{\omega }_{1}t)\) (ω1 runs over positive and negative frequencies and can also be zero). Note that, in our study, there are three distinct frequency components in \(\hat{H^{^\prime}} (t)\): input and output frequencies (ωin and ωout) due to the electron–light interaction and zero frequency (i.e. time-independent) for electron–phonon coupling. Within third-order perturbation theory, the transition rate \({P}_{i\to f}^{(3)}\) from an initial state \(\left|{\Psi }_{i}\right\rangle\) to a final state \(|{\Psi }_{f}\rangle\) due to the perturbative Hamiltonian \(\hat{H^{^\prime}} (t)\), is given by34

$${P}_{i\to f}^{(3)}=\frac{2\pi }{\hslash }{\left|\sum _{ab}\sum_{({\omega }_{1}{\omega }_{2}{\omega }_{3})}\frac{\langle {\Psi }_{f}| {\hat{H^{^\prime}}}({\omega }_{1})| {\Psi }_{a}\rangle \langle {\Psi }_{a}| {\hat{H^{^\prime}}}({\omega }_{2})| {\Psi }_{b}\rangle \langle {\Psi }_{b}| {\hat{H^{^\prime}}}({\omega }_{3})| {\Psi }_{i}\rangle }{({E}_{i}-{E}_{a}+\hslash {\omega }_{2}+\hslash {\omega }_{3})({E}_{i}-{E}_{b}+\hslash {\omega }_{3})}\right|}^{2}\delta ({E}_{f}-{E}_{i}-\hslash \omega)\ .$$

Here, the summations over a and b are performed over all eigenstates of the unperturbed system (here a set of electrons and phonons) and the sums over ωn with n = 1, 2, 3 are over all three involved frequencies in the perturbative Hamiltonian \(\hat{H}^{\prime} (t)\). The notation (ω1ω2ω3) indicates that, in performing the summation over ωn, the sum ω1 + ω2 + ω3 = ω is to be held fixed. In addition, Eα with \(\alpha \in \left\{i,f,a,b\right\}\) denote the energies associated with \(\left|{\Psi }_{\alpha }\right\rangle\) and the Dirac delta ensures energy conservation. The light field is written as \({\boldsymbol{\mathcal{F}}}(t)={\boldsymbol{\mathcal{F}}}_{\mathrm{in}}{\bf{u}}_{{\mathrm{in}}}\,{\mathrm{exp}}\, (-{\mathrm{i}}{\omega }_{\mathrm{in}}t)+{\boldsymbol{\mathcal{F}}}_{{\mathrm{out}}}{{\mathbf{u}}}_{{\mathrm{out}}}\,{\mathrm{exp}}\,(-{\mathrm{i}}{\omega }_{{\mathrm{out}}}t)+{\mathrm{complex\ conjugate}}\), where \({\boldsymbol{\mathcal{F}}}_{{\mathrm{in}}/{\mathrm{out}}}\) and ωin/out are the amplitudes and frequencies of the input/output electromagnetic fields, respectively, see Fig. 1(a). In addition, \({{\mathbf{u}}}_{{\rm{in}}/{\rm{out}}}={\sum }_{\alpha }{u}_{{\rm{in}}/{\rm{out}}}^{\alpha }{{\bf{e}}}_{\alpha }\) denote the corresponding polarization vectors, where eα is the unit vector along the α-direction with α ∈ {x, y, z}.

We now specialize to the case where the initial and final states of the system are given by \(\left|{\Psi }_{i}\right\rangle =\left|0\right\rangle \otimes \left|{n}_{\nu }\right\rangle\) and \(|{\Psi }_{f}\rangle =|0\rangle \otimes |{n}_{\nu }+1\rangle\), respectively32, so that Ef − Ei = ℏων. Here, \(\left|0\right\rangle\) denotes the ground state of the electronic system and \(\left|{n}_{\nu }\right\rangle\) is a state with nν phonons at frequency ων. In this case, the intensity of the Stokes Raman process for a phonon mode is proportional to \({P}_{i\to f}^{(3)}\), in which the transition rate involves a photon absorption, followed by the emission of a single phonon and a photon. For this type of process, (ω1ω2ω3) are any permutation of (ωin, −ωout, 0), e.g. ω1 = ωin, ω2 = −ωout, ω3 = 0 and five similar terms (all six terms contribute to the response at frequency ω = ωin − ωout). The total Raman intensity I(ω) is then obtained by summing over all possible final states, i.e. phonon modes ν. Inserting the perturbative Hamiltonians [cf. Eqs. (6)–(8) in the method section] in Eq. (1), the expression for the Stokes Raman intensity involving scattering events by only one phonon can be written as

$$I(\omega )={I}_{0}\sum_{\nu }\frac{{n}_{\nu }+1}{{\omega }_{\nu }}{\left|\sum_{\alpha \beta }{u}_{{\rm{in}}}^{\alpha }{R}_{\alpha \beta }^{\nu }{u}_{{\rm{out}}}^{\beta }\right|}^{2}\delta (\omega -{\omega }_{\nu })\ .$$

Here, I0 is an unimportant constant (since Raman spectra are always reported normalized) that is proportional to the input intensity and depends on the input frequency, and nν is given by the Bose–Einstein distribution, i.e. \({n}_{\nu }\equiv {(\exp [\hslash {\omega }_{\nu }/{k}_{B}T]-1)}^{-1}\) at temperature T. Due to momentum conservation, only phonons at the center of the Brillouin zone contribute to the one-phonon Raman processes19. Furthermore, \({R}_{\alpha \beta }^{\nu }\) denotes the Raman tensor for phonon mode ν, see method section. Eq. (2) is used for computing the Raman spectra in this work for a given excitation frequency and polarization setup. It may be noted that one can derive a similar expression for the anti-Stokes Raman intensity by replacing nν + 1 by nν in Eq. (2) and ων by −ων in Eq. (10) in method section. Note, also, that the Raman shift ω is expressed in cm−1 with 1 meV equivalent to 8.0655 cm−1.
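To make the structure of Eq. (2) concrete, the sketch below evaluates the Stokes intensity on a frequency grid from a list of zone-center phonon frequencies and Raman tensors. It is a minimal illustration under stated assumptions, not the implementation used for the library: the Dirac delta is replaced by a normalized Lorentzian of hypothetical half-width `width_cm1`, the constant I0 is set to one, and the two modes and their tensors are made-up numbers. All frequencies are in cm−1, so the Boltzmann constant is expressed in cm−1 per kelvin.

```python
import numpy as np

K_B_CM1_PER_K = 0.695035  # Boltzmann constant divided by hc, in cm^-1 per kelvin


def bose_einstein(omega_cm1, temperature_k):
    """Phonon occupation n_nu = 1 / (exp(hbar*omega / k_B T) - 1)."""
    return 1.0 / np.expm1(omega_cm1 / (K_B_CM1_PER_K * temperature_k))


def stokes_spectrum(grid_cm1, mode_freqs_cm1, raman_tensors, u_in, u_out,
                    temperature_k=300.0, width_cm1=3.0):
    """Evaluate Eq. (2) with I0 = 1 and a Lorentzian in place of the delta."""
    spectrum = np.zeros_like(grid_cm1, dtype=float)
    for omega_nu, tensor in zip(mode_freqs_cm1, raman_tensors):
        # |sum_ab u_in^a R_ab u_out^b|^2 for this phonon mode.
        amplitude = np.abs(u_in @ tensor @ u_out) ** 2
        # Prefactor (n_nu + 1) / omega_nu of the Stokes process.
        weight = (bose_einstein(omega_nu, temperature_k) + 1.0) / omega_nu
        # Unit-area Lorentzian (assumed broadening; width is the half-width).
        line = (width_cm1 / np.pi) / ((grid_cm1 - omega_nu) ** 2 + width_cm1 ** 2)
        spectrum += weight * amplitude * line
    return spectrum


# Hypothetical two-mode example (frequencies and tensors are illustrative).
grid = np.linspace(300.0, 500.0, 2001)
freqs = [385.0, 405.0]
tensors = [np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]),
           np.diag([1.0, 1.0, 0.5])]
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
spec_yy = stokes_spectrum(grid, freqs, tensors, y, y)
spec_xy = stokes_spectrum(grid, freqs, tensors, x, y)
```

With these made-up tensors both modes appear in the yy setup, while the xy setup gives a vanishing spectrum because neither tensor has an xy component. Replacing nν + 1 by nν in `weight` would give the corresponding anti-Stokes prefactor mentioned above.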

Computational workflow

An overview of the automated workflow for computing the Raman tensor of the materials in the C2DB is shown in Fig. 2. First, the relaxed structures are extracted from the database. In this work, we consider only compounds that are dynamically stable. Next, the electronic band energies and wavefunctions are obtained from a DFT calculation. In parallel, a zone-center phonon calculation is performed to obtain the optical vibrational modes. From the obtained electronic states and phonon modes, the momentum and electron–phonon matrix elements are evaluated and stored. In the final step, for a given excitation frequency and input/output polarization vectors, the Raman spectrum is calculated using Eq. (2). The key feature of the approach outlined here is that the calculation process can be automated, allowing one to perform thousands of calculations in parallel without human intervention.

Fig. 2: Computational workflow.

The diagram illustrates the steps necessary to calculate the Raman tensor of a material.

For simplicity, we have restricted the study to non-magnetic materials, but our routines can be readily extended to include magnetic materials. The Raman spectra presented in this paper are computed for in-plane polarization, where the incoming and outgoing photons are polarized along the x- or y-directions, i.e. uin/out are either [1, 0, 0] or [0, 1, 0]. The four possible combinations are referred to as xx, xy, yx, and yy polarization setups.

Raman spectra and comparison with experiments

Figure 3 compares the calculated Raman spectra of three different monolayer transition metal dichalcogenides (TMDs), namely MoSe2, MoSSe and MoS2, with the experimental data extracted from ref. 35. For all three monolayers, good agreement is observed both for the peak positions and the relative amplitudes of the main peaks. Additional peaks in the experimental spectra presumably originate from the substrate or defects in the samples. The differences between the Raman spectra of the three materials provide valuable information about the crystal structure. Symmetry and Raman activity of phonon modes are determined by the irreducible point group representations. MoS2 and MoSe2 are members of point group D3h, whereas MoSSe, lacking a horizontal mirror plane σh, belongs to the point group C3v. In Mulliken notation, the irreducible representation of MoS2 and MoSe2 is \(2{{\rm{A}}}_{2}^{{\prime\prime} }+{{\rm{A}}}_{1}^{\prime}+2{{\rm{E}}}^{\prime}+{{\rm{E}}}^{^{\prime\prime} }\), whereas for MoSSe the lowered symmetry leads to 3A1 + 3E. For MoS2 and MoSe2, one member of both \({{\rm{A}}}_{2}^{{\prime\prime} }\) and \({\rm{E}}^{\prime}\) is an acoustic mode, and the other \({{\rm{A}}}_{2}^{{\prime\prime} }\) mode is Raman inactive. For MoSSe, A1 and E each contain an acoustic mode, and all other modes are Raman active. The relevant modes are shown schematically in Fig. 3(b). In general, a Raman active mode will only appear in certain polarization configurations. The tensorial Raman selection rules follow from the irreducible point group representations36,37 as shown for point groups D3h and C3v in Supplementary Note 1.

Fig. 3: Evolution of Raman spectra from MoSe2 over MoSSe to MoS2.

a Comparison of the computed Raman spectra (solid) with the experimental results in ref. 36 (dashed) for MoSe2 (top), MoSSe (middle) and MoS2 (bottom). The excitation wavelength is 532 nm, and both input and output electromagnetic fields are polarized along the y-direction. b Optical phonon modes for MoSe2 (top), MoSSe (middle) and MoS2 (bottom) labeled by the irreducible representations of the respective point groups. Note that \({{\rm{A}}}_{2}^{{\prime\prime} }\) modes (shown in red) are Raman inactive.

Next, we focus on the case of MoS2, and investigate the dependency of the Raman spectrum on the excitation frequency and polarization, see Figs. 4(a) and 4(b), respectively. In Fig. 4(a), the Raman spectra are computed for three commonly used wavelengths of blue, green and red laser sources. In this case both in- and outgoing polarization vectors are along the y-direction (or x-direction). While the relative strength of the first Raman active peak in the spectrum is enhanced slightly for shorter wavelengths, the shape of the spectrum does not change significantly. Note that, in reality, the relative amplitudes of the \({\rm{E}}^{\prime}\) and \({{\rm{A}}}_{1}^{\prime}\) modes may change considerably if the excitation frequency coincides with an exciton resonance38,39. This is because excitons can selectively enhance specific Raman modes due to their symmetry40,41. Although this effect is not captured properly in our independent-electron model, it is in principle straightforward to include by using the many-body eigenstates obtained by diagonalizing the Bethe–Salpeter equation (BSE) when evaluating the matrix elements in Eq. (1)41,43,44,45,45. Moreover, the absolute magnitudes of the Raman peaks can vary substantially by changing the excitation wavelength due to the possible resonance with electronic states (resonance Raman spectroscopy)40. Nonetheless, the overall magnitude of the Raman spectra is usually of little practical importance compared to the spectral positions, and spectra are typically normalized as done here. Changing the polarization of the electromagnetic fields not only influences the relative amplitudes of Raman peaks, but may switch certain modes on and off as shown in Fig. 4(b). For instance, the MoS2 \({\rm{E}}^{\prime}\) mode becomes completely inactive for the perpendicular polarization setup (zz) due to symmetry26. This is easily confirmed using Supplementary Eq. (1) of Supplementary Note 1, which predicts an inactive \({\rm{E}}^{\prime}\) mode for zz-polarization. Note that, although the \({{\rm{E}}}^{^{\prime\prime} }\) mode is Raman active for xz-, yz-, zx-, and zy-polarizations, the intensity is too small to be observed in Fig. 4(b).
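The polarization selection rules discussed above can be checked with a few lines of code. The sketch below uses the standard textbook forms of the Raman tensors for point group D3h (assumed here; they are not copied from Supplementary Note 1) and hypothetical values for the independent tensor elements `a`, `b`, `c`, and `d`. Only the pattern of zeros is meaningful, not the magnitudes.

```python
import numpy as np

# Hypothetical values for the independent tensor elements.
a, b, c, d = 1.0, 0.6, 0.3, 0.8

# Textbook Raman tensor forms for D3h (degenerate modes have two partners).
D3H_TENSORS = {
    "A1'": [np.diag([a, a, b])],
    "E'": [np.array([[0.0, d, 0.0], [d, 0.0, 0.0], [0.0, 0.0, 0.0]]),
           np.array([[d, 0.0, 0.0], [0.0, -d, 0.0], [0.0, 0.0, 0.0]])],
    "E''": [np.array([[0.0, 0.0, 0.0], [0.0, 0.0, c], [0.0, c, 0.0]]),
            np.array([[0.0, 0.0, -c], [0.0, 0.0, 0.0], [-c, 0.0, 0.0]])],
}

AXES = {"x": np.array([1.0, 0.0, 0.0]),
        "y": np.array([0.0, 1.0, 0.0]),
        "z": np.array([0.0, 0.0, 1.0])}


def intensity(mode, setup):
    """Sum |u_in . R . u_out|^2 over the degenerate partners of a mode."""
    u_in, u_out = AXES[setup[0]], AXES[setup[1]]
    return sum(abs(u_in @ tensor @ u_out) ** 2 for tensor in D3H_TENSORS[mode])


setups = ("xx", "xy", "yy", "zz", "xz", "yz")
activity = {mode: {s: intensity(mode, s) for s in setups} for mode in D3H_TENSORS}
```

In this table the E′ entry vanishes for zz, the A1′ entry vanishes for xy, and E″ is non-zero only for setups that mix an in-plane with the out-of-plane direction, in line with the behaviour described for MoS2.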

Fig. 4: Polarization and frequency dependent Raman spectra.

a Raman spectra of MoS2 evaluated at three different excitation wavelengths, blue (488 nm), green (532 nm), and red (633 nm) for the xx-polarization setup. b Polarized Raman spectra of MoS2 for various input and output polarization directions at 532 nm excitation wavelength. The inset shows a top view of the crystal structure.

We have assessed the quality of the Raman library for a wide range of material compositions and crystal structures. Fig. 5 compares experimental and calculated Raman spectra for 12 monolayers including graphene, hBN, several conventional TMDs in the H- or T\({}^{\prime}\)-phase as well as anisotropic crystals such as phosphorene and Pd2Se4. In general, the number of Raman active modes increases with the number of atoms in the unit cell, as expected. For instance, there are more than eight peaks in the Raman spectrum of Pd2Se4. Furthermore, as a rule of thumb, Raman modes of materials containing heavier atoms are at lower frequencies and vice versa, e.g. the Raman peaks for graphene and hBN appear at frequencies above 1000 cm−1. The experimental data are obtained under various experimental conditions such as different excitation wavelengths and polarizations or diverse sample substrates. Note that if polarized Raman spectra were not available (or in the case of unspecified polarization), an average of all four in-plane polarization settings, i.e. xx, xy, yx, and yy, has been used for generating the theoretical spectra. In general, there is quite good agreement between our calculations and experimental results in all cases, particularly, for the peak positions. The deviations can be attributed to various factors such as substrate and excitonic effects, which are not captured in our calculations, as well as the quality of the experimental samples and other experimental uncertainties, all of which can influence the spectra considerably.

Fig. 5: Raman spectra of 12 monolayers.

Comparison of computed Raman spectra (solid lines) with available experimental results (dashed lines). The experimental data are extracted from Refs. 23,69,68,69,70,71,72,73,74,75,76,77,78,79, for (a) to (l), respectively. The temperature is set to 300 K (room temperature) and excitation wavelength is specified in each case, see the main text. The crystal structures are shown in the insets including top view and cross sectional views. For all crystal structures the x- and y-directions are along the horizontal and vertical directions, respectively, as shown for graphene.

Identifying materials from their Raman spectra

At this point, we turn to a critical test of the ab initio Raman library: given an experimental Raman spectrum, is it possible to identify the underlying material by comparing the experimental spectrum to a library of calculated spectra? The answer to this question will depend on several factors including: (1) the quality of the experimental spectrum; (2) the quality of the calculated spectra, i.e. the ability of theory to reproduce a (high quality) experimental spectrum for a given material; and (3) the size/density of the calculated Raman spectrum database. Obviously, a more densely populated database increases the chances that the experimental sample is, in fact, contained in the database. But, at the same time, this increases the risk of obtaining a false positive, i.e. matching the experimental spectrum by a calculated spectrum of a different material.

Putting the above idea into practice requires a quantitative measure for comparing Raman spectra. In the present work, we use the two lowest moments to fingerprint the Raman spectrum. In general, the Nth Raman moment of the spectrum is given by

$$\langle {\omega }^{N}\rangle \equiv \int_{0}^{\infty }I(\omega ){\omega }^{N}{\rm{d}}\omega =\sum_{\nu }{I}_{\nu }{\omega }_{\nu }^{N}\ ,$$

where Iν denotes the amplitude of mode ν, i.e. \({I}_{\nu }={I}_{0}({n}_{\nu }+1)| {\sum }_{\alpha \beta }{u}_{{\rm{in}}}^{\alpha }{R}_{\alpha \beta }^{\nu }{u}_{{\rm{out}}}^{\beta }{| }^{2}/{\omega }_{\nu }\). Note that, for these calculations, we normalize the Raman spectrum such that its zeroth moment becomes one, i.e. \(\mathop{\int}\nolimits_{0}^{\infty }I(\omega ){\rm{d}}\omega ={\sum }_{\nu }{I}_{\nu }=1\). Therefore, the first Raman moment corresponds to the mean value of the spectrum. Rather than using the second moment, we use the standard deviation of the spectrum as the selected measure, given by

$$\delta \omega =\sqrt{\langle {\omega }^{2}\rangle -{\langle \omega \rangle }^{2}}\ .$$

Figure 6 shows a scatter plot of 〈ω〉 and δω/〈ω〉 for the 733 monolayers at an excitation wavelength of 532 nm and xx-polarization setup obtained at room temperature. In this plot, crystals composed of lighter elements appear further to the right because their optical phonons generally have higher energies. Furthermore, crystals with fewer atoms in the unit cell and/or a higher degree of symmetry appear at the bottom of the plot because they have fewer (non-degenerate) phonons and thus fewer peaks in their Raman spectrum, resulting in a reduced frequency spread. In particular, δω vanishes for materials with only a single Raman peak such as graphene and hBN.
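The fingerprint defined by the moment and standard-deviation expressions above is straightforward to evaluate from a list of peak positions and amplitudes. The following sketch assumes the spectrum is given as discrete modes (ων, Iν); the function name and the example numbers are hypothetical and do not correspond to any entry in the library.

```python
import numpy as np


def raman_fingerprint(mode_freqs_cm1, mode_amplitudes):
    """Return (<omega>, delta_omega / <omega>) for a discrete Raman spectrum."""
    freqs = np.asarray(mode_freqs_cm1, dtype=float)
    weights = np.asarray(mode_amplitudes, dtype=float)
    # Normalize so that the zeroth moment (sum of amplitudes) equals one.
    weights = weights / weights.sum()
    first_moment = np.sum(weights * freqs)
    second_moment = np.sum(weights * freqs ** 2)
    # Standard deviation; clip tiny negative values caused by rounding.
    spread = np.sqrt(max(second_moment - first_moment ** 2, 0.0))
    return first_moment, spread / first_moment


# Hypothetical two-peak spectrum with equal amplitudes.
mean_two, rel_spread_two = raman_fingerprint([385.0, 405.0], [1.0, 1.0])
# Hypothetical single-peak spectrum: the spread vanishes.
mean_one, rel_spread_one = raman_fingerprint([1580.0], [2.5])
```

For the two-peak example the mean is 395 cm−1 and the standard deviation is 10 cm−1, whereas the single-peak example has zero spread, which is why single-peak materials sit at the bottom of the scatter plot.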

Fig. 6: Calculated Raman moments.

Scatter plot of the first Raman moment and normalized standard deviation for 733 calculated spectra at excitation wavelength of 532 nm and xx-polarization setup (circles). For comparison, several independent experimental spectra for monolayer MoS2 (in H-phase) and WTe2 (in T\({}^{\prime}\)-phase) are also shown (stars). We highlight the points corresponding to MoS2 and WTe2 in red (H-phase, T\({}^{\prime}\)-phase, and experiments). The insets are zooms of the vicinity of the experimental data for MoS2 and WTe2. For MoS2, 1–5 correspond to the experimental spectra obtained from refs. 36, 46, 79, 80, and 41, respectively, whereas 1–3 for WTe2 are adopted from refs. 47, 81, and 75, respectively.

To test the feasibility of inverse Raman mapping, we evaluate the lowest Raman moment fingerprint for five experimental Raman spectra of MoS2 (H-phase) and three spectra of WTe2 (T\({}^{\prime}\)-phase) obtained from independent studies, see stars in Fig. 6. Similar analyses have been performed for the eleven additional crystals found in Fig. 5, and are provided in Supplementary Note 2. Firstly, note that the fingerprint of MoS2 in the T\({}^{\prime}\)-phase (WTe2 in H-phase) is located relatively far from the H-phase (T\({}^{\prime}\)-phase) fingerprint in the plot, which suggests that the lowest Raman moments are indeed able to distinguish different structural phases of the same material. The insets highlight the regions surrounding the experimental data. The variation in the experimental fingerprints is due to small differences in the Raman spectra, originating from variations in sample quality, substrate effects, measurement techniques/conditions, etc. Consequently, the precise peak positions and, in particular, their amplitudes can vary from one experiment to another. Clearly, the fingerprints of the calculated spectra for both MoS2 and WTe2 lie close to the experimental data. In a few cases, such as Pd2Se4, the experimental fingerprints lie further from the theoretical predictions, as illustrated in Supplementary Fig. 1. This may partly be due to insufficient sample quality for these less-explored 2D crystals. In fact, the deviation between theory and experiments is comparable to the variation between the different experiments. Importantly, only a few other materials show a similar agreement with the experimental data. This suggests that fingerprints including higher order moments could single out the correct material with even higher precision. For instance, the skewness (based on the third Raman moment) can be used to distinguish MoS2 from CrS2. By manual inspection of the Raman spectra, one readily confirms that the calculated spectra of MoS2 and WTe2 are in fact the best match to the experimental spectra, e.g. other candidates have Raman peaks that are not observed in the experimental spectra or the relative amplitudes of the peaks are completely different from the experimental data. Nonetheless, the procedure of manual inspection can be replaced by a more rigorous and unbiased approach as discussed below.

To compare the experimental and calculated Raman spectra quantitatively, we focus on the experimental data of Tongay et al.46 and Cao et al.47 for MoS2 and WTe2, respectively. The experimental spectra for MoS2 are obtained without any polarizer at 77 K at an excitation wavelength of 488 nm. For WTe2 in Cao et al.47, the experiment is performed at room temperature using a 532 nm laser linearly polarized in-plane. To account for the unspecified polarization, we take the average of Raman spectra for the xx and xy polarization setups in the case of WTe2, while for MoS2 the average of all Raman spectra for transverse components (xx, xy, yx, and yy) is used as the theoretical spectrum. For quantitative comparison with the experimental data, one can use Euclidean distances between the experimental and theoretical spectra as a measure. For two Raman spectra I1(ω) and I2(ω), the Euclidean distance (or L2-norm) ∥I1 − I2∥ is defined as

$$\Vert {I}_{1}-{I}_{2}\Vert \equiv {\left(\int_{0}^{\infty }{\left|{I}_{1}(\omega )-{I}_{2}(\omega )\right|}^{2}{\rm{d}}\omega \right)}^{1/2}\ .$$

Note that the spectra are normalized such that the total area is unity. Figure 7 shows the computed Euclidean distances from the calculated Raman spectra to the experimental data for both MoS2 and WTe2. We highlight the points corresponding to the materials in the insets of Fig. 6. In both cases, identifying the smallest Euclidean distance confirms that the Raman spectra closest to the experimental data are indeed the calculated spectra of MoS2 and WTe2. This shows that the quality of the experimental spectra and the accuracy of the computed 2D materials Raman spectra are sufficient for automatic structure identification.
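The matching step described above can be sketched as follows: each spectrum is normalized to unit area, the Euclidean distance to the measured spectrum is evaluated on a common frequency grid, and the candidate with the smallest distance is returned. This is a minimal illustration under stated assumptions; the candidate names, peak positions, Lorentzian line shape, and helper functions are hypothetical and are not taken from the C2DB routines.

```python
import numpy as np


def integrate(grid, values):
    """Trapezoidal integral of sampled values over the grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def l2_distance(grid, spectrum_1, spectrum_2):
    """Euclidean distance between two spectra after unit-area normalization."""
    norm_1 = spectrum_1 / integrate(grid, spectrum_1)
    norm_2 = spectrum_2 / integrate(grid, spectrum_2)
    return np.sqrt(integrate(grid, np.abs(norm_1 - norm_2) ** 2))


def best_match(grid, measured, library):
    """Return the library key with the smallest distance, plus all distances."""
    distances = {name: l2_distance(grid, measured, spectrum)
                 for name, spectrum in library.items()}
    return min(distances, key=distances.get), distances


def lorentzian(grid, center, width=3.0):
    """Unit-area Lorentzian line used to build the toy spectra."""
    return (width / np.pi) / ((grid - center) ** 2 + width ** 2)


# Toy library with two hypothetical candidates (peak positions in cm^-1).
grid = np.linspace(100.0, 600.0, 5001)
library = {
    "candidate_A": lorentzian(grid, 385.0) + lorentzian(grid, 405.0),
    "candidate_B": lorentzian(grid, 240.0) + lorentzian(grid, 290.0),
}
# Toy "measured" spectrum: slightly shifted peaks and an arbitrary scale.
measured = 2.0 * (lorentzian(grid, 387.0) + 0.9 * lorentzian(grid, 406.0))
match, distances = best_match(grid, measured, library)
```

Because both spectra are normalized before the comparison, the arbitrary overall scale of the measured spectrum does not influence the result, which mirrors the fact that only relative intensities are compared.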

Fig. 7: Euclidean distances between Raman spectra.

Distances (see main text for details) are calculated between theoretical Raman spectra and the experimental data of Tongay et al.46 and Cao et al.47 for MoS2 (top) and WTe2 (bottom). For comparison purposes, the points corresponding to materials in the insets of Fig. 6 are highlighted in yellow.


We have introduced a comprehensive library of ab initio computed Raman spectra for more than 700 2D materials spanning a variety of chemical compositions and crystal structures. The 2D materials comprise both experimentally known and hypothetical compounds, all dynamically stable and with low formation energies. Using an efficient first-principles implementation of third-order perturbation theory, the full resonant first-order Raman tensor was calculated including all nine possible combinations for polarization vectors of the input/output photons and three commonly used excitation wavelengths. All spectra are freely available as part of the C2DB and should comprise a valuable reference for both theoreticians and experimentalists in the field. The reliability of the computational approach was demonstrated by comparison with experimental spectra for 15 monolayers such as graphene, hBN, phosphorene and several TMDs in the H-, T-, and T\({}^{\prime}\)-phases.

We carefully tested the feasibility of inverse Raman mapping, i.e. to what extent the library of computed Raman spectra can be used to identify the composition and crystal structure of an unknown material from its Raman spectrum. For the specific cases of MoS2 in H-phase and WTe2 in T\({}^{\prime}\)-phase, we showed that a simple fingerprint based on the lowest moments of the Raman spectrum is sufficient to identify the materials from their experimental Raman spectrum. This represents a significant step in the direction of autonomous identification/characterization of materials. In addition, apart from being a useful reference for 2D materials research, the Raman library can be used to train machine learning algorithms to predict Raman spectra directly from the atomic structure similarly to recent work on prediction of linear optical spectra for molecules48. This is of particular importance in the currently attractive trend of employing machine learning algorithms in materials science28,29.

In the present work, we have focused on Raman processes involving only a single phonon, i.e. first-order Raman processes, since these are typically the dominant contributions to the Raman spectrum. Nonetheless, the presented methodology can be readily extended to include two-phonon scattering processes, although the computational cost will be significantly increased. Excitonic effects in the Raman spectrum have been neglected since most experimental Raman spectra are recorded off-resonance where excitons play a minor role. The inclusion of excitonic effects can be achieved within the presented methodology by employing the many-body eigenstates obtained from the BSE42,49,50 instead of Slater determinantal electron–hole excitations. However, this will mainly affect the amplitude of the Raman peaks, which is of secondary importance in practice. We only compute the Raman spectra of monolayers in the present work, but the library can be extended to multi-layer structures. For some 2D materials such as graphene or MoS2, this can be done by employing existing exchange-correlation functionals capable of accurate modeling of van der Waals forces. For other 2D systems such as phosphorene, however, further development of exchange-correlation functionals is required to describe the complex inter-layer couplings, particularly for low-frequency Raman modes51. The symmetries of phonon modes have previously been investigated for graphene52, the TMD family37 and phosphorene53 using group theory analysis. Based on the Raman library, such analysis could be performed for a much wider range of materials in future work. Finally, the current work has been restricted to non-magnetic materials, and the ab initio Raman response of magnetic materials is an interesting future research field.



In the independent-particle approximation, the Hamiltonian of a system of electrons interacting with phonons and electromagnetic fields takes the form \(\hat{H}={\hat{H}}_{0}+{\hat{H}}_{{\rm{e}}\gamma }+{\hat{H}}_{{\rm{e}}\nu }\), where \({\hat{H}}_{0}\) is the unperturbed Hamiltonian of the electrons (e) and phonons (ν), \({\hat{H}}_{{\rm{e}}\gamma }\) describes the electron–light interaction (here written in the velocity or minimal coupling gauge54,55), and \({\hat{H}}_{{\rm{e}}\nu }\) describes the electron–phonon coupling. In second quantization, they are given by56

$${\hat{H}}_{0}\equiv \sum_{n{\bf{k}}}{\varepsilon }_{n{\bf{k}}}{\hat{c}}_{n{\bf{k}}}^{\dagger }{\hat{c}}_{n{\bf{k}}}+\sum_{\nu {\bf{q}}}\hslash {\omega }_{\nu {\bf{q}}}\left({\hat{a}}_{\nu {\bf{q}}}^{\dagger }{\hat{a}}_{\nu {\bf{q}}}+\frac{1}{2}\right)\ ,$$
$${\hat{H}}_{{\rm{e}}\gamma }(t)=\frac{e}{m}{\boldsymbol{\mathcal{A}}}(t)\cdot \sum _{nm{\bf{k}}}{{\bf{p}}}_{nm{\bf{k}}}{\hat{c}}_{n{\bf{k}}}^{\dagger }{\hat{c}}_{m{\bf{k}}}\ ,$$
$${\hat{H}}_{{\rm{e}}\nu }=\sum _{ ^{nm\nu} _{{\bf{k}}{\bf{q}}}}\sqrt{\frac{\hslash }{{\omega }_{\nu {\bf{q}}}}}{g}_{nm{\bf{k}}}^{\nu {\bf{q}}}{\hat{c}}_{n{\bf{k}}}^{\dagger }{\hat{c}}_{m{\bf{k}}}\left({\hat{a}}_{\nu {\bf{q}}}+{\hat{a}}_{\nu (-{\bf{q}})}^{\dagger }\right)\ .$$

Here, \({\hat{c}}^{\dagger }/\hat{c}\) and \({\hat{a}}^{\dagger }/\hat{a}\) are the creation/annihilation operators of electrons and phonons, respectively, \({\boldsymbol{\mathcal{A}}}\) denotes the vector potential (\({\boldsymbol{\mathcal{F}}}=-\partial {\boldsymbol{\mathcal{A}}}/\partial t\)), \({\varepsilon }_{n{\bf{k}}}\) is the energy of the single-particle electronic state \(\left|n{\bf{k}}\right\rangle\), and \({\omega }_{\nu {\bf{q}}}\) denotes the phonon frequency of normal mode ν and wavevector q. Furthermore, \({{\bf{p}}}_{nm{\bf{k}}}=\langle n{\bf{k}}| \hat{{\bf{p}}}| m{\bf{k}}\rangle\) and \({g}_{nm{\bf{k}}}^{\nu {\bf{q}}}=\langle n{\bf{k}}+{\bf{q}}| {\partial }_{\nu {\bf{q}}}{V}^{{\rm{KS}}}| m{\bf{k}}\rangle\) are the momentum and electron–phonon matrix elements (to the first order in the atomic displacements56), respectively, with the Kohn–Sham potential \({V}^{{\rm{KS}}}\). The summation over k implies an integral over the first Brillouin zone, i.e. \({(2\pi )}^{D}{\sum }_{{\bf{k}}}\to {V}_{D}{\int }_{{\rm{BZ}}}{{\rm{d}}}^{D}{\bf{k}}\), where \({V}_{D}\) is the D-dimensional volume (D = 2 for 2D systems). Note that the \({\boldsymbol{\mathcal{A}}}^{2}\) term does not contribute to the linear Raman response and, hence, is absent here. Moreover, we neglect the Coulomb interaction between electrons and holes, i.e. excitonic effects. If the Raman spectroscopy is performed with an excitation frequency that matches the exciton energy38, the electron–hole interactions should be included, ideally within the GW and BSE framework41,43,43.

We now insert the Hamiltonians given in Eqs. (6)–(8) into the third-order perturbation rate, Eq. (1). As mentioned in the main text, for the Stokes processes involving one phonon, six permutations of \(({\omega }_{{\rm{in}}},-{\omega }_{{\rm{out}}},0)\) are used for \(({\omega }_{1},{\omega }_{2},{\omega }_{3})\). Furthermore, the eigenstates of the unperturbed Hamiltonian \(\left|{\Psi }_{a}\right\rangle\) (or \(\left|{\Psi }_{b}\right\rangle\)) can be written as \(\left|{\psi }_{e}\right\rangle \otimes \left|{n}_{\nu ^{\prime} }\right\rangle\), where \(\left|{\psi }_{e}\right\rangle\) and \(\left|{n}_{\nu ^{\prime} }\right\rangle\) are the many-body electronic and phononic states, respectively (the index e runs only over many-body electronic states). Since \({\hat{H}}_{{\rm{e}}\nu }\) and \({\hat{H}}_{{\rm{e}}\gamma }\) are, respectively, linear in and independent of the phononic operator, the phonon state \(\left|{n}_{\nu ^{\prime} }\right\rangle\) contributes to the Stokes response only if \(\left|{n}_{\nu ^{\prime} }\right\rangle\) is either \(\left|{n}_{\nu }\right\rangle\) or \(\left|{n}_{\nu }\pm 1\right\rangle\). Consequently, \(\langle {\Psi }_{a}| {\hat{H}}_{{\rm{e}}\gamma }| {\Psi }_{i}\rangle =\langle {\psi }_{e}| {\hat{H}}_{{\rm{e}}\gamma }| 0\rangle {\delta }_{\nu ^{\prime} \nu }{\delta }_{{n}_{\nu ^{\prime} }{n}_{\nu }}\) and \(\langle {\Psi }_{a}| {\hat{H}}_{{\rm{e}}\nu }| {\Psi }_{i}\rangle =\langle {\psi }_{e}| {\hat{H}}_{{\rm{e}}\nu }| 0\rangle {\delta }_{\nu ^{\prime} \nu }{\delta }_{{n}_{\nu ^{\prime} }({n}_{\nu }\pm 1)}\) (\({\delta }_{ij}\) denotes the Kronecker delta). The total Raman intensity I is obtained by summing over all final states, i.e. phonon modes, and is given by \(I(\omega )={I}_{0}{\sum }_{\nu }({n}_{\nu }+1)| {P}_{\nu }{| }^{2}\delta (\omega -{\omega }_{\nu })/{\omega }_{\nu }\), where \({P}_{\nu }\) is defined as

$${P}_{\nu }\equiv \,\sum _{ed}\left[\frac{\langle 0| {\bf{u}}_{\rm{in}}\cdot \hat{{\bf{P}}}| {\psi }_{e}\rangle \langle {\psi }_{e}| {\hat{G}}_{\nu} | {\psi }_{d}\rangle \langle {\psi }_{d}| {\bf{u}}_{\rm{out}}\cdot {\hat{{\bf{P}}}}| 0\rangle }{(\hslash {\omega }_{{\rm{in}}}-{{\mathcal{E}}}_{e})(\hslash {\omega }_{{\rm{out}}}-{{\mathcal{E}}}_{d})}\, +\, \frac{\langle 0| {\bf{u}}_{\rm{in}}\cdot {\hat{{\bf{P}}}}| {\psi }_{e}\rangle \langle {\psi }_{e}| {\bf{u}}_{\rm{out}}\cdot {\hat{{\bf{P}}}}| {\psi }_{d}\rangle \langle {\psi }_{d}| {\hat{G}}_{\nu} | 0\rangle }{(\hslash {\omega }_{{\rm{in}}}-{{\mathcal{E}}}_{e})(\hslash {\omega }_{\nu }-{{\mathcal{E}}}_{d})}\right.\\ \, + \, \frac{\langle 0| {\bf{u}}_{\rm{out}}\cdot {\hat{{\bf{P}}}}| {\psi }_{e}\rangle \langle {\psi }_{e}| {\hat{G}}_{\nu} | {\psi }_{d}\rangle \langle {\psi }_{d}| {\bf{u}}_{\rm{in}}\cdot {\hat{{\bf{P}}}}| 0\rangle }{(-\hslash {\omega }_{{\rm{out}}}-{{\mathcal{E}}}_{e})(-\hslash {\omega }_{{\rm{in}}}-{{\mathcal{E}}}_{d})}\, +\, \frac{\langle 0| {\bf{u}}_{\rm{out}}\cdot {\hat{{\bf{P}}}}| {\psi }_{e}\rangle \langle {\psi }_{e}| {\bf{u}}_{\rm{in}}\cdot {\hat{{\bf{P}}}}| {\psi }_{d}\rangle \langle {\psi }_{d}| {\hat{G}}_{\nu} | 0\rangle }{(-\hslash {\omega }_{{\rm{out}}}-{{\mathcal{E}}}_{e})(\hslash {\omega }_{\nu }-{{\mathcal{E}}}_{d})}\\ \, + \, \left.\frac{\langle 0| {\hat{G}}_{\nu} | {\psi }_{e}\rangle \langle {\psi }_{e}| {\bf{u}}_{\rm{in}}\cdot {\hat{{\bf{P}}}}| {\psi }_{d}\rangle \langle {\psi }_{d}| {\bf{u}}_{\rm{out}}\cdot {\hat{{\bf{P}}}}| 0\rangle }{(-\hslash {\omega }_{\nu }-{{\mathcal{E}}}_{e})(\hslash {\omega }_{{\rm{out}}}-{{\mathcal{E}}}_{d})}\, +\, \frac{\langle 0| {\hat{G}}_{\nu} | {\psi }_{e}\rangle \langle {\psi }_{e}| {\bf{u}}_{\rm{out}}\cdot {\hat{{\bf{P}}}}| {\psi }_{d}\rangle \langle {\psi }_{d}| {\bf{u}}_{\rm{in}}\cdot {\hat{{\bf{P}}}}| 0\rangle }{(-\hslash {\omega }_{\nu }-{{\mathcal{E}}}_{e})(-\hslash {\omega }_{{\rm{in}}}-{{\mathcal{E}}}_{d})}\right].$$

Here, \({{\mathcal{E}}}_{e/d}\) denote the electronic energies (with respect to the electronic ground state), the summations over e and d include only electronic states, and \(\hat{{\bf{P}}}\) and \(\hat{G}_\nu\) are many-body electronic operators given by \(\hat{{\bf{P}}}\equiv {\sum }_{nm{\bf{k}}}{{\bf{p}}}_{nm{\bf{k}}}{\hat{c}}_{n{\bf{k}}}^{\dagger }{\hat{c}}_{m{\bf{k}}}\) and \(\hat{G}_\nu \equiv {\sum }_{nm{\bf{k}}}{g}_{nm{\bf{k}}}^{\nu {\bf{0}}}{\hat{c}}_{n{\bf{k}}}^{\dagger }{\hat{c}}_{m{\bf{k}}}\). Note that momentum conservation implies that only phonons at q = 0 contribute to the response here19, i.e. \({\omega }_{\nu }\equiv {\omega }_{\nu {\bf{0}}}\). Since both \(\hat{{\bf{P}}}\) and \(\hat{G}_\nu\) are bilinear in the electronic operators, the matrix elements are non-vanishing only if \(|{\psi }_{e/d}\rangle\) include singly-excited states, i.e. terms of the form \({\hat{c}}_{c{\bf{k}}}^{\dagger }{\hat{c}}_{v{\bf{k}}}\left|0\right\rangle\) (indices c and v denote conduction and valence bands, respectively)50. Excitonic effects can readily be introduced at this stage by incorporating the BSE solution45,50. However, we neglect the excitonic effects in the present work, and hence, each singly-excited state contributes individually to the response, i.e. \(|{\psi }_{e/d}\rangle ={\hat{c}}_{c{\bf{k}}}^{\dagger }{\hat{c}}_{v{\bf{k}}}|0\rangle\) with an energy of \({{\mathcal{E}}}_{e/d}={\varepsilon }_{c{\bf{k}}}-{\varepsilon }_{v{\bf{k}}}\). At finite temperature, the expression for \(|{\psi }_{e/d}\rangle\) should be taken as \(|{\psi }_{e/d}\rangle ={f}_{i}(1-{f}_{j}){\hat{c}}_{j{\bf{k}}}^{\dagger }{\hat{c}}_{i{\bf{k}}}|0\rangle\), where \({f}_{i}\equiv {(1+\exp [({\varepsilon }_{i{\bf{k}}}-\mu )/{k}_{B}T])}^{-1}\) is the Fermi–Dirac distribution with chemical potential μ.
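As a brief aside, the prefactor \(({n}_{\nu }+1)/{\omega }_{\nu }\) in the Stokes intensity can be traced to the phononic part of the electron–phonon matrix element. The following is only a sketch, assuming that \(\left|{n}_{\nu }\right\rangle\) are harmonic-oscillator number states and using the standard ladder-operator relations together with the prefactor \(\sqrt{\hslash /{\omega }_{\nu }}\) of \({\hat{H}}_{{\rm{e}}\nu }\) at q = 0:

$$\langle {n}_{\nu }+1| \left({\hat{a}}_{\nu }+{\hat{a}}_{\nu }^{\dagger }\right)| {n}_{\nu }\rangle =\sqrt{{n}_{\nu }+1}\ ,\qquad {\left|\sqrt{\frac{\hslash }{{\omega }_{\nu }}}\sqrt{{n}_{\nu }+1}\right|}^{2}=\frac{\hslash ({n}_{\nu }+1)}{{\omega }_{\nu }}\ .$$

Only the creation operator contributes here, since the annihilation operator lowers the phonon number and therefore has no overlap with \(\left|{n}_{\nu }+1\right\rangle\).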

Rewriting Pν in terms of the single-particle variables and polarization vectors leads to Eq. (2) for the Raman intensity, where the Raman tensor component, \({R}_{\alpha \beta }^{\nu }\), reads

$${R}_{\alpha \beta }^{\nu }\equiv \sum _{ijmn{\bf{k}}}\left[\frac{{p}_{ij}^{\alpha }({g}_{jm}^{\nu }{\delta }_{in}-{g}_{ni}^{\nu }{\delta }_{jm}){p}_{mn}^{\beta }}{(\hslash {\omega }_{{\rm{in}}}-{\varepsilon }_{ji})(\hslash {\omega }_{{\rm{out}}}-{\varepsilon }_{mn})}\, +\, \frac{{p}_{ij}^{\alpha }({p}_{jm}^{\beta }{\delta }_{in}-{p}_{ni}^{\beta }{\delta }_{jm}){g}_{mn}^{\nu }}{(\hslash {\omega }_{{\rm{in}}}-{\varepsilon }_{ji})(\hslash {\omega }_{\nu }-{\varepsilon }_{mn})}\right.\\ \quad \quad +\, \frac{{p}_{ij}^{\beta }({g}_{jm}^{\nu }{\delta }_{in}-{g}_{ni}^{\nu }{\delta }_{jm}){p}_{mn}^{\alpha }}{(-\hslash {\omega }_{{\rm{out}}}-{\varepsilon }_{ji})(-\hslash {\omega }_{{\rm{in}}}-{\varepsilon }_{mn})}\, +\, \frac{{p}_{ij}^{\beta }({p}_{jm}^{\alpha }{\delta }_{in}-{p}_{ni}^{\alpha }{\delta }_{jm}){g}_{mn}^{\nu }}{(-\hslash {\omega }_{{\rm{out}}}-{\varepsilon }_{ji})(\hslash {\omega }_{\nu }-{\varepsilon }_{mn})}\\ \quad \quad +\,\left.\frac{{g}_{ij}^{\nu }({p}_{jm}^{\alpha }{\delta }_{in}-{p}_{ni}^{\alpha }{\delta }_{jm}){p}_{mn}^{\beta }}{(-\hslash {\omega }_{\nu }-{\varepsilon }_{ji})(\hslash {\omega }_{{\rm{out}}}-{\varepsilon }_{mn})}\, +\, \frac{{g}_{ij}^{\nu }({p}_{jm}^{\beta }{\delta }_{in}-{p}_{ni}^{\beta }{\delta }_{jm}){p}_{mn}^{\alpha }}{(-\hslash {\omega }_{\nu }-{\varepsilon }_{ji})(-\hslash {\omega }_{{\rm{in}}}-{\varepsilon }_{mn})}\right]{f}_{i}(1-{f}_{j}){f}_{n}(1-{f}_{m})\ .$$

Here, \({\varepsilon }_{ij}\equiv {\varepsilon }_{i{\bf{k}}}-{\varepsilon }_{j{\bf{k}}}\), \({p}_{ij}^{\alpha }\equiv \langle i{\bf{k}}| {\hat{p}}^{\alpha }| j{\bf{k}}\rangle\), \({g}_{ij}^{\nu }\equiv \langle i{\bf{k}}| {\partial }_{\nu {\bf{0}}}{V}^{{\rm{KS}}}| j{\bf{k}}\rangle\), and (ijmn)/ν are the electron/phonon band indices. The line-shape broadening is accounted for by adding a small phenomenological imaginary part, iη, to the photon frequencies, \({\omega }_{{\rm{in/out}}}\to {\omega }_{{\rm{in/out}}}+i\eta\). We set the frequency broadening to η = 200 meV in our calculations.
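To make the index structure of the Raman tensor concrete, the first of the six terms can be transcribed directly into a few nested loops. The snippet below is a minimal sketch at a single k-point, not the production implementation: the function names, the toy inputs, and the choice of units (all energies in eV, matrix elements in arbitrary units) are assumptions made for illustration, while the broadening η = 0.2 eV and the smearing kBT = 0.05 eV follow the values quoted in the text.

```python
import numpy as np

def fermi(eps, mu, kT):
    """Fermi-Dirac occupations for band energies eps (eV)."""
    return 1.0 / (1.0 + np.exp((eps - mu) / kT))

def raman_first_term(eps, p_a, p_b, g, e_in, e_out, mu, kT=0.05, eta=0.2):
    """First term of R_ab^nu at one k-point (illustrative transcription).

    eps        : (N,) band energies in eV
    p_a, p_b   : (N, N) momentum matrix elements for directions alpha, beta
    g          : (N, N) electron-phonon matrix elements of one phonon mode
    e_in/e_out : photon energies hbar*omega_in and hbar*omega_out in eV
    eta        : phenomenological broadening added as i*eta to both photons
    """
    f = fermi(eps, mu, kT)
    n_bands = len(eps)
    total = 0.0 + 0.0j
    for i in range(n_bands):
        for j in range(n_bands):
            for m in range(n_bands):
                for n in range(n_bands):
                    # occupation factor f_i (1 - f_j) f_n (1 - f_m)
                    occ = f[i] * (1.0 - f[j]) * f[n] * (1.0 - f[m])
                    # g_jm delta_in - g_ni delta_jm
                    middle = g[j, m] * (i == n) - g[n, i] * (j == m)
                    # (hbar w_in - eps_ji)(hbar w_out - eps_mn), with eps_ij = eps_i - eps_j
                    denom = ((e_in + 1j * eta - (eps[j] - eps[i]))
                             * (e_out + 1j * eta - (eps[m] - eps[n])))
                    total += occ * p_a[i, j] * middle * p_b[m, n] / denom
    return total
```

For a two-band toy model with a filled valence band and an empty conduction band, only the combination i = n = v, j = m = c survives, so the result reduces to p_vc (g_cc − g_vv) p_cv divided by the two resonant denominators. The remaining five terms follow the same pattern with the roles of p and g permuted.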

First-principles calculations

All DFT calculations are performed with the projector-augmented wave code, GPAW57,58, in combination with the atomic simulation environment (ASE)59. The Perdew–Burke–Ernzerhof (PBE) exchange-correlation functional is used60 and the Kohn–Sham orbitals are expanded using the double zeta polarized (dzp) basis set27. Despite its fairly small size, the dzp basis set provides sufficiently accurate phonon modes. This has been tested by benchmarking the phonon frequencies obtained from this basis set against the results obtained with the commonly employed plane-wave basis for 700+ monolayers (more than 7000 phonon modes). We find that for approximately 80% of all phonons, the discrepancy between the two approaches is less than 5%. Also, the choice of exchange-correlation functional may slightly influence the Raman spectra. For instance, it is known that the PBE functional tends to overestimate the lattice parameters and underestimate the phonon frequencies in crystals61, whereas the opposite occurs for the local-density approximation (LDA) functionals. Nonetheless, this choice only slightly influences our calculated Raman spectra, and PBE usually provides sufficiently accurate phonon frequencies, within the range of theoretical and experimental uncertainties62. The monolayers are placed between two vacuum regions, each with a thickness of 15 Å. A convergence test of Raman spectra with respect to the wavevector density is performed for several materials, and a mesh with a density of 25 Å−1 is chosen for ground state calculations. The phonon modes are obtained using the standard approach based on calculating the dynamical matrices in the harmonic approximation63. The dynamical matrix is evaluated using the small-displacement method64, where the change of forces on a specific atom caused by varying the position of neighboring atoms is computed. Since only the zone-centered (Γ-point) phonons are required, the phonon modes can be computed based on the crystal unit cell.
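The small-displacement procedure can be illustrated on a toy model. The sketch below is not the GPAW/ASE workflow used for the library: it assumes a hypothetical one-dimensional unit cell with two atoms coupled by harmonic springs, and the names `toy_forces` and `gamma_phonons` are introduced purely for illustration. It shows how central finite differences of the forces give the force-constant matrix, which is then mass-weighted and diagonalized to obtain the Γ-point frequencies.

```python
import numpy as np

def toy_forces(u, k=1.0):
    """Forces on a two-atom 1D unit cell at Gamma (hypothetical model).

    Periodic images move in phase, so each atom feels its neighbour
    through two springs of stiffness k (one on each side).
    """
    stretch = u[1] - u[0]
    return np.array([2.0 * k * stretch, -2.0 * k * stretch])

def gamma_phonons(forces, masses, delta=1e-3):
    """Zone-centre frequencies from central finite differences of the forces."""
    n = len(masses)
    hessian = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = delta
        # C_ij = -dF_i/du_j from displacing atom j by +delta and -delta
        hessian[:, j] = -(forces(step) - forces(-step)) / (2.0 * delta)
    dyn = hessian / np.sqrt(np.outer(masses, masses))  # mass weighting
    dyn = 0.5 * (dyn + dyn.T)  # symmetrize against numerical noise
    omega_sq = np.linalg.eigvalsh(dyn)  # eigenvalues in ascending order
    return np.sqrt(np.clip(omega_sq, 0.0, None))
```

For this model the acoustic mode has zero frequency and the optical mode satisfies ω² = 2k(1/m₁ + 1/m₂), which provides a simple analytical check of the finite-difference result.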
A k-mesh with a density of 12 Å−1 is used for phonon calculations, and the forces are converged within 10\({}^{-6}\) eV Å−1. Since the wavefunctions and Kohn–Sham potentials in GPAW are evaluated on a real-space grid57, a convergence test with respect to the grid spacing is performed and a real-space grid spacing of 0.2 Å is chosen for calculations. The electron–phonon matrix elements are then obtained within the adiabatic approximation using a finite difference technique for evaluating the derivative of the Kohn–Sham potential65. Similarly, the momentum matrix elements are calculated using the finite difference technique and the correction terms due to projector-augmented waves66 are added. The width of the Fermi–Dirac occupations is set to kBT = 50 meV for faster convergence of the DFT results. For generating the Raman spectra, a Gaussian [\(G(\omega )={(\sigma \sqrt{2\pi })}^{-1}\exp (-{\omega }^{2}/2{\sigma }^{2})\)] with a standard deviation σ = 3 cm−1 is used to replace the Dirac delta function, which accounts for the inhomogeneous broadening of phonon modes. The temperature of the Bose–Einstein distributions is set to 300 K for all calculations except for the results in the top panel of Fig. 7, where a temperature of 77 K is used. The calculations are submitted, managed, and received using the simple MyQueue workflow tool67, which is a Python front end to job schedulers.
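The final assembly of a spectrum from mode frequencies and amplitudes can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used for the library: frequencies are taken in cm−1, the amplitudes stand in for the quantities P_ν in arbitrary units, I0 is set to one, and the function names are hypothetical. The constant hc/kB ≈ 1.438777 cm K converts a wavenumber and a temperature into the dimensionless ratio ħω/kBT needed for the Bose–Einstein occupation.

```python
import numpy as np

HC_OVER_KB = 1.438777  # cm*K; hbar*omega/(k_B*T) = HC_OVER_KB * wavenumber / T

def bose_einstein(wavenumber, temperature):
    """Phonon occupation n_nu for a mode given in cm^-1 at temperature in K."""
    return 1.0 / np.expm1(HC_OVER_KB * wavenumber / temperature)

def gaussian(x, sigma):
    """Normalized Gaussian G(x) that replaces the Dirac delta function."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def raman_spectrum(grid, mode_wavenumbers, amplitudes, temperature=300.0, sigma=3.0):
    """Stokes spectrum sum_nu (n_nu + 1) |P_nu|^2 G(w - w_nu) / w_nu with I0 = 1."""
    spectrum = np.zeros_like(grid, dtype=float)
    for w_nu, p_nu in zip(mode_wavenumbers, amplitudes):
        weight = (bose_einstein(w_nu, temperature) + 1.0) * abs(p_nu) ** 2 / w_nu
        spectrum += weight * gaussian(grid - w_nu, sigma)
    return spectrum
```

Because the Gaussian is normalized, the area under each peak equals (n_ν + 1)|P_ν|²/ω_ν regardless of the chosen σ, so the broadening changes peak heights but not integrated intensities.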

Experimental Raman spectra

The experimental Raman spectra are extracted from the figures in the corresponding references using a common plot digitizer. To remove the noise in the experimental data, the spectra are filtered using a Savitzky–Golay filter68 of order three with a filter window length of eleven. For a fair comparison with our theoretical spectra in Fig. 5, we have convolved the experimental spectra with a Gaussian function with a width (standard deviation) of 10 cm−1 to reduce the effect of possible but unimportant small frequency shifts between the experimental and theoretical spectra. Furthermore, the Raman moments have been calculated over a frequency range where the main Raman peaks appear, from 350 to 450 cm−1 for MoS2 and from 75 to 260 cm−1 for WTe2. For calculating the Euclidean distance, both the experimental and theoretical spectra are convolved with a Gaussian function with a width (standard deviation) of 6 cm−1.
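The post-processing steps described above can be sketched with standard SciPy routines. The snippet is a minimal illustration rather than the authors' script, and several choices are assumptions: a uniform frequency grid, moments defined as the mean and standard deviation of the intensity treated as a normalized distribution over the chosen window, and spectra normalized to unit L2 norm before the Euclidean distance is taken. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.ndimage import gaussian_filter1d

def preprocess(intensity, spacing, sigma):
    """Savitzky-Golay denoising (order 3, window 11), then Gaussian smoothing.

    spacing : grid spacing in cm^-1 (uniform grid assumed)
    sigma   : Gaussian width in cm^-1, converted to grid points below
    """
    smooth = savgol_filter(intensity, window_length=11, polyorder=3)
    return gaussian_filter1d(smooth, sigma / spacing)

def raman_moments(grid, intensity, lo, hi):
    """Mean and standard deviation of the spectrum restricted to [lo, hi]."""
    mask = (grid >= lo) & (grid <= hi)
    w = grid[mask]
    s = np.clip(intensity[mask], 0.0, None)  # guard against small negative values
    s = s / s.sum()
    mean = np.sum(w * s)
    return mean, np.sqrt(np.sum((w - mean) ** 2 * s))

def spectral_distance(a, b):
    """Euclidean distance between two spectra normalized to unit L2 norm."""
    return np.linalg.norm(a / np.linalg.norm(a) - b / np.linalg.norm(b))
```

For MoS2, for example, `raman_moments(grid, spectrum, 350.0, 450.0)` would restrict the moments to the window quoted in the text, and `spectral_distance` would be applied to experimental and theoretical spectra sampled on the same grid.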