Exploring and machine learning structural instabilities in 2D materials

Manti, Simone; Svendsen, Mark Kamper; Knøsgaard, Nikolaj R.; Lyngby, Peder M.; Thygesen, Kristian S.

doi:10.1038/s41524-023-00977-x

Article
Open access
Published: 04 March 2023

npj Computational Materials volume 9, Article number: 33 (2023)

Abstract

We address the problem of predicting the zero-temperature dynamical stability (DS) of a periodic crystal without computing its full phonon band structure. Here we report evidence that DS can be inferred with good reliability from the phonon frequencies at the center and boundary of the Brillouin zone (BZ). This analysis represents a validation of the DS test employed by the Computational 2D Materials Database (C2DB). For 137 dynamically unstable 2D crystals, we displace the atoms along an unstable mode and relax the structure. This procedure yields a dynamically stable crystal in 49 cases. The elementary properties of these new structures are characterized using the C2DB workflow, and it is found that their properties can differ significantly from those of the original unstable crystals, e.g., band gaps are opened by 0.3 eV on average. All the crystal structures and properties are available in the C2DB. Finally, we train a classification model on the DS data for 3295 2D materials in the C2DB using a representation encoding the electronic structure of the crystal. We obtain an excellent receiver operating characteristic (ROC) curve with an area under the curve (AUC) of 0.90, showing that the classification model can drastically reduce the computational effort in high-throughput studies.

Introduction

Computational materials discovery aims at identifying materials for specific applications, often employing first-principles methods such as density functional theory (DFT)¹. The potential of a given material for the targeted application is usually evaluated based on the elementary properties of the crystal, such as the electronic band gap, the optical absorption spectrum, or the magnetic order. Such properties can be highly sensitive to even small distortions of the lattice that reduce the symmetry of the crystal, and it is, therefore, important to develop efficient methods for identifying and accounting for such distortions.

Lattice distortions can be classified according to their periodicity relative to the primitive cell of the crystal. Local instabilities conserve the periodicity of the crystal, i.e. they do not enlarge the number of atoms in the primitive cell. Other distortions, accompanied by a modulation of the electronic density known as charge density wave (CDW)², lead to an enlargement of the period of the crystal, which can be either commensurate or incommensurate with the high-symmetry phase. A classical example is the Peierls instability in one-dimensional³ and two-dimensional (2D)⁴ systems, where a gap is opened in the CDW state. A universal microscopic theory of the CDW phase is still missing due to the many possible and intertwined driving mechanisms, e.g., electron-phonon interaction⁵, Fermi surface nesting, or phonon-phonon interactions⁶, which makes a precise clear-cut definition of the CDW phase difficult. In addition, the CDW state is sensitive to external effects such as temperature and doping⁷. As a testimony to the complexity of the problem, different models and concepts are used to describe the CDW phase depending on the dimensionality of the material^8,9,10,11,12.

The last few years have witnessed an increased interest in CDW states of 2D materials. For example, CDW physics is believed to govern the transition from the trigonal prismatic T-phase to the lower symmetry T’-phase in monolayer MoS₂¹³ as well as the plethora of temperature-dependent phases in monolayers of NbSe₂^14,15, TaS₂^16,17, TaSe₂^18,19, and TiSe₂^20,21. In addition, a number of recent studies have investigated the possibility to control CDW phase transitions. For instance, the T-phase of monolayer MoS₂ can be stabilized by argon bombardment²², exposure to electron beams¹³, or Li-ion intercalation²³. Similar results have been reported for MoTe₂²⁴.

Regardless of the fundamental origin of possible lattice distortions, it remains of great practical importance to devise efficient schemes that make it possible to verify whether or not a given structure is dynamically stable (DS), i.e., whether it represents a local minimum of the potential energy surface. Structures that are not DS are frequently generated in computational studies, e.g., when a structure is relaxed under symmetry constraints or the chosen unit cell is too small to accommodate the stable phase. Tests for DS are rarely performed in large-scale discovery studies because there is no established way of doing so apart from calculating the full phonon band structure²⁵, which is a time-consuming task. At the same time, the importance of incorporating such tests is in fact unclear; that is, it is not known how much symmetry-breaking distortions generally influence the properties of materials.

A straightforward strategy to generate potentially stable structures from dynamically unstable ones is to displace the atoms along an unstable phonon mode using a supercell that can accommodate the distortion. This approach has previously been adopted to explore structural distortions in metallic systems²⁶, in bulk perovskites^27,28, and in one-dimensional organometallic chains²⁹. However, systematic studies of structural instabilities in 2D materials have so far been lacking.

In this work, we perform a systematic study of structural distortions across a broad class of 2D crystals, and explore a machine learning-based approach to DS classification. Throughout, we focus on the most common case of small-period, commensurate distortions that can be accommodated in a 2 × 2 repetition of the primitive cell of the high-symmetry phase. We shall refer to the test for the occurrence of such distortions as the Center and Boundary Phonon (CBP) protocol. The motivation behind the present work is fourfold: (i) To assess the reliability of the CBP protocol (which is currently used for DS classification in the Computational 2D Materials Database (C2DB)^30,31). (ii) To elucidate the effect of symmetry-breaking distortions on the basic electronic properties of crystals. (iii) To obtain the DS phases of a set of dynamically unstable 2D materials that were originally generated by combinatorial lattice decoration, and make them available to the community via the C2DB. (iv) To explore the viability of a machine learning-based classification scheme for predicting DS using input from a DFT calculation of the undistorted high-symmetry phase.

The paper is structured as follows. In “Results and discussion” we describe the CBP protocol. We first benchmark the CBP protocol against full phonon band structure calculations and evaluate its statistical success rate. For 137 dynamically unstable 2D materials, we further analyze how the small-period distortions that stabilize the materials influence their electronic properties. “Methods” concludes the paper.

Results and discussion

This section presents and discusses the results of the CBP protocol and its connection with machine learning predictions. Together they will increase the success rate of finding dynamically stable materials within the C2DB workflow (see Fig. 1).

The CBP protocol: stability test

Given a material that has been relaxed in some unit cell (from here on referred to as the primitive unit cell), the CBP protocol proceeds by evaluating the stiffness tensor of the material and the Hessian matrix of a supercell obtained by repeating the primitive cell 2 × 2 times. In the current work, the stiffness tensor is calculated as a finite difference of the stress under an applied strain, while the Hessian matrix is calculated as a finite difference of the forces on all the atoms of the 2 × 2 supercell under displacement of the atoms in one primitive unit cell (this is equivalent to calculating the phonons at the center and at specific high-symmetry points at the boundary of the BZ of the primitive cell, see Fig. 4). Next, the stiffness tensor and the Hessian matrix are diagonalized, and the eigenvalues are used to infer structural stability. A negative eigenvalue of the stiffness tensor indicates an instability of the lattice (the shape of the unit cell), while a negative eigenvalue of the 2 × 2 Hessian signals an instability of the atomic structure. The obvious question here is whether it suffices to consider the Hessian of the 2 × 2 supercell, or equivalently the phonons at the BZ center and boundaries.
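The decision step of the stability test can be illustrated with a minimal Python sketch. It assumes that the eigenvalues of the stiffness tensor and of the 2 × 2 Hessian have already been computed; the function name `cbp_verdict` and the numerical tolerance `tol` are hypothetical and not part of the C2DB workflow.

```python
def cbp_verdict(stiffness_eigs, hessian_eigs, tol=1e-7):
    """Classify a structure from the two sets of eigenvalues used in the test.

    stiffness_eigs: eigenvalues of the stiffness tensor
    hessian_eigs:   eigenvalues of the Hessian of the 2 x 2 supercell
    tol:            hypothetical threshold; values below -tol count as negative
    """
    lattice_unstable = min(stiffness_eigs) < -tol   # unit-cell shape instability
    atoms_unstable = min(hessian_eigs) < -tol       # atomic-structure instability
    return {
        "lattice_stable": not lattice_unstable,
        "atoms_stable": not atoms_unstable,
        "dynamically_stable": not (lattice_unstable or atoms_unstable),
    }


# Toy eigenvalues (arbitrary units): the lattice is stiff, but one Hessian
# eigenvalue is negative, so the structure is flagged as unstable.
example = cbp_verdict(
    stiffness_eigs=[35.0, 80.0, 120.0],
    hessian_eigs=[-2.1, 0.0, 0.0, 0.0, 4.3, 9.8],
)
```

A small tolerance is used because the translational modes have eigenvalues that are only numerically zero.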

We can distinguish three possible outcomes when comparing the CBP protocol against full phonon calculations (see Fig. 2), namely a true positive result, a true negative result, and a false-positive result. We note that the case of a false negative is not possible, because a material that is unstable in a 2 × 2 cell is de facto unstable. The false-positive case occurs when a material is stable in a 2 × 2 supercell but unstable if allowed to distort in a larger cell. Our results show that such large-period distortions, which do not show up as distortions in a 2 × 2 cell, are relatively rare (see “Assessment of the CBP protocol”).
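The three outcomes, and the reason a false negative cannot occur, can be written down as a small lookup. This is only an illustrative sketch; the function name `cbp_outcome` is hypothetical, and "positive" means "stable", following the convention of the text.

```python
def cbp_outcome(cbp_stable, full_phonon_stable):
    """Compare the 2 x 2 (CBP) verdict with the full phonon band structure."""
    if cbp_stable and full_phonon_stable:
        return "true positive"
    if not cbp_stable and not full_phonon_stable:
        return "true negative"
    if cbp_stable and not full_phonon_stable:
        # Stable in the 2 x 2 cell, unstable in a larger cell.
        return "false positive"
    # An instability found in the 2 x 2 cell is a genuine instability,
    # so this combination cannot arise from consistent calculations.
    raise ValueError("unstable in the 2 x 2 cell but stable overall is not possible")


outcomes = [cbp_outcome(c, f) for c, f in [(True, True), (False, False), (True, False)]]
```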

**Fig. 2: Possible outcomes of the CBP protocol.**

The CBP protocol: Structure generation

Here we outline a simple procedure to generate distorted and potentially stable structures from an initial dynamically unstable structure. The basic idea is to displace the atoms along an unstable phonon mode followed by a relaxation. In practice, the unstable mode is obtained as the eigenvector corresponding to a negative eigenvalue of the Hessian matrix of the 2 × 2 supercell. The procedure is illustrated in Fig. 3 for the well-known T-T’ phase transition of MoS₂¹³. The left panel shows the atomic structure and phonon band structure of monolayer MoS₂ in the T-phase. Both the primitive unit cell (black) and the 2 × 2 supercell (orange) are indicated. The CBP method identifies an unstable mode at the BZ boundary (M-point). After displacing the atoms along the unstable mode, a distorted structure is obtained, which after relaxation leads to the dynamically stable T’-phase of MoS₂ shown in the right panel.

In this work, we have applied the CBP protocol systematically to 137 dynamically unstable 2D materials. The 137 monolayers were selected from the C2DB according to the following two criteria. First, to ensure that all materials are chemically “reasonable”, only materials with a low formation energy were selected. Specifically, we require that ΔH_hull < 0.2 eV atom⁻¹, where ΔH_hull is the energy above the convex hull defined by the most stable (possibly mixed) bulk phases of the relevant composition^31,32. Second, we consider only materials with exactly one unstable mode, i.e., one negative eigenvalue of the Hessian matrix at a given q-point. We stress that the latter condition is not strictly necessary but was adopted here to limit the number of materials. When two or more unstable modes exist, there is no unique way to distort the structure. One possibility is to push along the linear combination of modes yielding the distorted structure with the highest symmetry²⁶. However, it is not clear that imposing high symmetry is the best strategy for finding a DS structure. For the few cases with multiple unstable modes that we have analyzed, we have found that pushing along the most unstable eigenmode (the one with the most negative eigenvalue) often yields a DS structure, as in the case of T-MoS₂.
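The two selection criteria can be sketched as a simple filter. The dictionary keys (`name`, `ehull`, `hessian_eigs`), the tolerance, and the toy entries below are hypothetical; the real selection was made on C2DB data.

```python
def select_candidates(materials, hull_cutoff=0.2, tol=1e-7):
    """Keep materials with a low energy above the hull and exactly one unstable mode.

    materials: list of dicts with hypothetical keys
        'name'         label of the material
        'ehull'        energy above the convex hull in eV/atom
        'hessian_eigs' Hessian eigenvalues at the q-point considered
    """
    selected = []
    for m in materials:
        n_unstable = sum(1 for e in m["hessian_eigs"] if e < -tol)
        if m["ehull"] < hull_cutoff and n_unstable == 1:
            selected.append(m["name"])
    return selected


toy = [
    {"name": "A", "ehull": 0.05, "hessian_eigs": [-1.2, 0.0, 0.0, 0.0, 3.1]},   # passes both
    {"name": "B", "ehull": 0.35, "hessian_eigs": [-0.8, 0.0, 0.0, 0.0, 2.0]},   # too far above hull
    {"name": "C", "ehull": 0.10, "hessian_eigs": [-1.5, -0.4, 0.0, 0.0, 0.0]},  # two unstable modes
    {"name": "D", "ehull": 0.02, "hessian_eigs": [0.0, 0.0, 0.0, 1.1, 2.2]},    # already stable
]
selected = select_candidates(toy)
```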

**Fig. 3: Phonon frequencies of the T and T’ phases of MoS₂.**

For the 137 dynamically unstable materials we displaced the atoms along the (unique) unstable mode. The size of the displacement was chosen such that the maximum atomic displacement was exactly 0.1 Å. This displacement size was chosen based on the MoS₂ example discussed above, where it results in a minimal number of subsequent relaxation steps. A smaller value does not guarantee that the system leaves the saddle point, while a larger value creates an overly large distortion, resulting in additional relaxation steps. During relaxation the unit cell was allowed to change with no symmetry constraints, and the relaxation was stopped when the forces on all atoms were below 0.01 eV Å⁻¹.
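The displacement rule can be made concrete with a short sketch: the mode is rescaled so that the atom that moves the most is displaced by 0.1 Å. The function names and the toy mode are hypothetical, and the mode is assumed to be given as one Cartesian vector per atom.

```python
import math


def scale_mode(mode, max_displacement=0.1):
    """Rescale a mode so that the largest atomic displacement equals max_displacement (in Å).

    mode: list of (x, y, z) displacement vectors, one per atom
    """
    largest = max(math.sqrt(x * x + y * y + z * z) for x, y, z in mode)
    if largest == 0.0:
        raise ValueError("mode has zero amplitude")
    factor = max_displacement / largest
    return [(factor * x, factor * y, factor * z) for x, y, z in mode]


def displace(positions, mode, max_displacement=0.1):
    """Return new positions after pushing the atoms along the rescaled mode."""
    scaled = scale_mode(mode, max_displacement)
    return [
        (px + dx, py + dy, pz + dz)
        for (px, py, pz), (dx, dy, dz) in zip(positions, scaled)
    ]


# Toy mode for three atoms; the second atom moves the most.
mode = [(0.3, 0.0, 0.0), (-0.6, 0.0, 0.8), (0.0, 0.1, 0.0)]
scaled = scale_mode(mode)
new_positions = displace([(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 1.0)], mode)
```

The displaced positions would then be passed to a relaxation without symmetry constraints.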

Assessment of the CBP protocol

To test the validity of the CBP protocol, we have performed full phonon calculations for a set of 20 monolayers predicted as dynamically stable by the CBP protocol. The 20 materials were randomly selected from the C2DB and cover 7 different crystal structures. Out of the 20 materials, 10 are metals and 10 are insulators/semiconductors. The calculated phonon band structures are reported in the Supplementary information (SI). For all materials, the phonon frequencies obtained with the CBP protocol equal the frequencies of the full phonon band structure at the q-points $\mathbf{q}\in \{(0,0),(\frac{1}{2},0),(0,\frac{1}{2}),(\frac{1}{2},\frac{1}{2})\}$. This is expected, as the phonons at these q-points can be accommodated by the 2 × 2 supercell.

Within the set of 20 materials, we find three false-positive cases, namely CoTe₂, NbSSe, and TaTe₂. These materials exhibit unstable modes (imaginary frequencies or, equivalently, negative force constant eigenvalues) in the interior of the BZ (NbSSe and TaTe₂) or at the K-point (CoTe₂), while all phonon frequencies at the q-points covered by the CBP protocol are real. A simple interpolation of the frequencies is not enough to catch an instability at interior points of the BZ (see Supplementary Figure 1), because a supercell larger than 2 × 2 is needed to capture these distortions, as in the case of 2H-CoTe₂ (see Supplementary Figure 2). This relatively low percentage of false positives in our representative sample is consistent with the work by Mounet et al.²⁵, who computed the full phonon band structure of 258 monolayers predicted to be (easily) exfoliable from known bulk compounds. Applying the CBP protocol to their data yields 14 false-positive cases; half of these are transition metal dichalcogenides (TMDs) with Co, Nb, or Ta.
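Why a 2 × 2 supercell cannot see an instability at the K-point can be illustrated by enumerating the q-points that fit exactly into an n₁ × n₂ supercell. The sketch below uses fractional coordinates of the primitive reciprocal cell and assumes the common convention in which the K-point of a hexagonal lattice is (1/3, 1/3); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def commensurate_q_points(n1, n2):
    """Fractional q-points of the primitive cell that are commensurate with an n1 x n2 supercell."""
    return [(Fraction(i, n1), Fraction(j, n2)) for i, j in product(range(n1), range(n2))]


q_2x2 = commensurate_q_points(2, 2)   # the four q-points probed by the 2 x 2 Hessian
q_3x3 = commensurate_q_points(3, 3)

k_point = (Fraction(1, 3), Fraction(1, 3))   # assumed convention for K
k_in_2x2 = k_point in q_2x2
k_in_3x3 = k_point in q_3x3
```

In this picture the K-point first becomes accessible in a 3 × 3 supercell, while instabilities at general interior points require still larger cells.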

We note that the small imaginary frequencies in the out-of-plane modes around the Γ-point seen in some of the phonon band structures do not reflect real distortions, but are rather due to the interpolation of the dynamical matrix. In particular, these artifacts occur because of the broken crystal point-group symmetry in the force constant matrix, and they vanish if a larger supercell is used, the rotational sum rule is imposed^33,34, or higher-order multipolar interactions are included³⁵.

Stable distorted monolayers

The 137 dynamically unstable materials, which were selected from the C2DB according to the criteria described in “The CBP protocol: Structure generation”, can be divided into two groups depending on whether the eigenvalues of the Hessian at the wave vectors $q_{x}=(\frac{1}{2},0)$, $q_{y}=(0,\frac{1}{2})$, and $q_{xy}=(\frac{1}{2},\frac{1}{2})$ are equal or not. Equality of the eigenvalues implies an isotropic Hessian. For such materials, we generate distorted structures by displacing the atoms along the unstable mode at $q_{x}=(\frac{1}{2},0)$, followed by relaxation in a 2 × 1 supercell. In the case of an anisotropic Hessian, there may in general exist a particular combination of q-vectors that stabilizes the system. Here, we were interested in finding a general method to generate DS structures in a high-throughput way, and we therefore decided to displace only at $q_{xy}=(\frac{1}{2},\frac{1}{2})$ in a 2 × 2 supercell.
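The branching between the isotropic and anisotropic cases can be summarized in a few lines. The relative tolerance used to decide whether eigenvalues are "equal" is an assumption, as is the function name; the text does not state the numerical criterion that was used.

```python
import math


def choose_displacement(eig_qx, eig_qy, eig_qxy, rel_tol=1e-3):
    """Pick the q-point and supercell for the push from the three boundary eigenvalues.

    eig_qx, eig_qy, eig_qxy: Hessian eigenvalues of the unstable mode at
    q_x = (1/2, 0), q_y = (0, 1/2) and q_xy = (1/2, 1/2).
    rel_tol: hypothetical tolerance for treating the eigenvalues as equal.
    """
    isotropic = math.isclose(eig_qx, eig_qy, rel_tol=rel_tol) and math.isclose(
        eig_qx, eig_qxy, rel_tol=rel_tol
    )
    if isotropic:
        return {"isotropic": True, "q": (0.5, 0.0), "supercell": (2, 1)}
    return {"isotropic": False, "q": (0.5, 0.5), "supercell": (2, 2)}


iso_choice = choose_displacement(-1.0, -1.0, -1.0)
aniso_choice = choose_displacement(-1.0, -0.2, 0.5)
```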

After atomic displacement and subsequent relaxation, the CBP protocol was applied again to test for DS of the distorted structures. Histograms of the minimum eigenvalue, min($\tilde{\omega}_{\mathbf{q}\lambda}^{2}$), of the Hessian matrix at q for the unstable mode λ are shown in Fig. 4, with the materials before and after atomic displacement shown in the upper and lower panels, respectively. Negative eigenvalues, corresponding to unstable materials, are shown in red, while positive eigenvalues are shown in green. We removed the three translational modes with eigenvalues close to zero before extracting min($\tilde{\omega}_{\mathbf{q}\lambda}^{2}$). Out of the 137 unstable materials, 49 become dynamically stable (according to the CBP protocol). By far the highest success rate for generating stable crystals was found for the isotropic materials (left panel), where 43 out of 91 materials became stable, while only 6 out of the 46 anisotropic materials became stable. In principle, the procedure can be applied repeatedly to increase the number of DS structures; here, we applied the protocol only once for computational reasons. An example where the protocol is applied twice is the 1T-TiSe₂ monolayer discussed in the SI. In that case, the material is first displaced at $q_{xy}=(\frac{1}{2},\frac{1}{2})$ in a 2 × 2 supercell. The resulting structure, which is still unstable, is then displaced along one of the two degenerate modes at Γ, and the final structure with the experimentally known CDW state²⁰ is obtained.
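The quantity plotted in the histograms can be sketched as follows. Identifying the translational modes as the three eigenvalues of smallest magnitude is an assumption made for this illustration; the function name and the toy eigenvalues are hypothetical.

```python
def min_nontranslational_eigenvalue(eigs, n_translational=3):
    """Minimum Hessian eigenvalue after discarding the modes closest to zero.

    Assumption: the translational modes are the n_translational eigenvalues
    with the smallest absolute value.
    """
    kept = sorted(eigs, key=abs)[n_translational:]
    if not kept:
        raise ValueError("no eigenvalues left after removing translational modes")
    return min(kept)


# Toy spectra (arbitrary units) before and after the push.
before = min_nontranslational_eigenvalue([-1.8, 1e-6, -3e-6, 2e-7, 0.9, 4.2])
after = min_nontranslational_eigenvalue([0.7, -2e-6, 1e-7, 5e-7, 1.3, 4.0])
```

Without this filtering, a numerically negative translational eigenvalue such as -3e-6 could be mistaken for an instability.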

**Fig. 4: Histograms of minimum eigenvalues of the Hessian matrix before and after displacing along the unstable mode.**

A wide range of elementary properties of the 49 distorted, dynamically stable materials were computed using the C2DB workflow (see Table 1 in ref. 31 for a complete list of the properties). The atomic structures together with the calculated properties are available in the C2DB. Table 1 provides an overview of the symmetries, minimal Hessian eigenvalues, total energies, and electronic band gaps of the 49 materials before and after the distortion.

Table 1 Properties of the stable materials.

Apart from the reduction in symmetry, the distortion also lowers the total energy of the materials. An important descriptor for the thermodynamic stability of a material is the energy above the convex hull, ΔH_hull. Figure 5 shows a plot of ΔH_hull before and after the distortion of the 49 materials. The reduction in energy upon distortion ranges from 0 to 0.2 eV atom⁻¹. In fact, several of the materials come very close to the convex hull and some even fall onto the hull, indicating their global thermodynamic stability (at T = 0 K) with respect to the reference bulk phases. We note that all DFT energies, including the reference bulk phases, were calculated using the PBE xc-functional, which does not account for van der Waals interactions. Accounting for the vdW interactions will downshift the energies of layered bulk phases and thus increase ΔH_hull for the monolayers slightly. This effect will, however, not influence the relative stability of the pristine and distorted monolayers, which is the main focus of the current work. Another characteristic trend observed is the opening/increase of the electronic band gap. The increase of the single-particle band gap is expected to be related to the total energy gained by making the distortion. Figure 6 shows the relation between the two quantities. Simplified models, for low-dimensional systems and weak electron-phonon coupling, predict a proportionality between these two quantities³⁶. From our results, it is clear that there is no universal relationship between the change in band gap and total energy. In particular, several of the metals show large gains in total energy while the gap remains zero.

**Fig. 5: Energy above the convex hull before and after the stabilization.**

**Fig. 6: The energy gain as a function of the gap opening for the stable materials.**

It is interesting that 21 of the distorted and dynamically stable materials exhibit direct band gaps when a tolerance of 0.1 eV is employed for the difference between the direct and indirect gap. Atomically thin direct band gap semiconductors are highly relevant as building blocks for opto-electronic devices, but only a handful of such materials are known to date, e.g., monolayers of the Mo- and W-based transition metal dichalcogenides^37,38 and black phosphorus³⁹. As an example of a monolayer material that drastically changes from a metal to a direct band gap semiconductor upon distortion, we show the band structure of CdBr in Fig. 7. The initial unstable metallic phase of the material becomes dynamically stable upon distortion and opens a direct band gap of 1.28 eV at the C point.
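The 0.1 eV tolerance criterion can be expressed as a one-line test. The function name is hypothetical, and the requirement of a non-zero gap is an assumption made here so that metals are not counted.

```python
def is_direct_gap(gap, direct_gap, tol=0.1):
    """Label a gap as direct if the smallest direct gap exceeds the fundamental gap by at most tol.

    gap:        fundamental (possibly indirect) band gap in eV
    direct_gap: smallest direct band gap in eV
    tol:        tolerance in eV (0.1 eV in the text)
    """
    return gap > 0.0 and (direct_gap - gap) <= tol


checks = {
    "strictly direct": is_direct_gap(1.28, 1.28),
    "nearly direct": is_direct_gap(1.00, 1.05),
    "indirect": is_direct_gap(1.00, 1.30),
    "metal": is_direct_gap(0.0, 0.0),
}
```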

Machine learning DS

We next attempt to accelerate the dynamic stability prediction using the machine learning model outlined in “Machine learning method”.

As an introductory exercise, we consider the correlation between DS and five elementary materials properties, namely the energy above the convex hull, the PBE band gap, the DOS at the Fermi level, the total energy per atom, and the heat of formation. Figure 8 shows the distribution of these properties over the 3295 2D materials. The materials have been split into dynamically stable (blue) and dynamically unstable materials (orange), respectively. There is a clear correlation between DS and the first three materials' properties shown in panels a–c. In particular, dynamically stable materials are closer to the convex hull, have larger band gap, and lower DOS at E_F as compared to dynamically unstable materials. The observed correlation with ΔH_hull is consistent with previous findings based on phonon calculations³¹. In contrast, no or only weak correlation is found for the last two quantities in panels d–e. These five properties were used as a low-dimensional feature vector for training an XGBoost machine learning model that will serve as a baseline for a model trained on the higher dimensional RAD-PDOS representation described in “Machine learning method”.

**Fig. 8: Histograms of electronic features for stable and unstable materials.**

To evaluate the performance of our model we employ the receiver operating characteristic (ROC) curve. The ROC curve maps out the number of materials correctly predicted as unstable as a function of the number of materials incorrectly labeled as unstable, and it is calculated by varying the classification tolerance of the model. The area under the curve (AUC) is a measure of the performance of the classifier. Random guessing would amount to a linear ROC curve with unit slope, shown in Fig. 9 by the dashed gray line, and correspond to an AUC of 0.5, whereas a perfect classification model would have an AUC of 1. When calculating the ROC curve of our dynamic stability classifier we employ tenfold cross-validation (CV). This allows us to obtain a mean ROC curve and its standard deviation, which we then use to evaluate the performance of our model.
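For readers unfamiliar with the construction, the ROC curve and its AUC can be computed in a few lines of plain Python. This is a generic sketch with toy labels and scores, not the evaluation code used for the paper; "positive" means "unstable", as in the description above.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by lowering the classification threshold.

    labels: 1 for unstable (positive class), 0 for stable
    scores: predicted probability of being unstable
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Materials with identical scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0 for k in range(1, len(fpr))
    )


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
fpr, tpr = roc_curve(labels, scores)
area = auc(fpr, tpr)
```

For this toy set, eight of the nine unstable/stable pairs are ranked correctly, which gives an AUC of 8/9.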

The results from the machine learning model are shown in Fig. 9. The mean ROC curve is shown in blue in Fig. 9a; it achieves an excellent 10-fold CV AUC of 0.90 ± 0.01. This suggests that the XGBoost model is able to efficiently detect the dynamically unstable materials in the C2DB. We quantify the effect of the RAD-PDOS fingerprints by comparing the performance of the full model with a model trained on the low-dimensional fingerprint. We observe that the effect of including the RAD-PDOS in the fingerprint is statistically significant, raising the AUC from 0.82 ± 0.01 to 0.90 ± 0.01. The relative impact of the RAD-PDOS fingerprints is shown in the feature importance evaluation in Fig. 9c. Here feature importance refers to how many times a feature is used to perform a split in the decision trees, and the feature importance has been summed for the six different components of the RAD-PDOS fingerprints, i.e., summed over the radial distance and energy axes of the fingerprint. The vertical dashed black line shows the feature importance of random noise for reference. We observe that especially the RAD-PDOS ss fingerprint leads to many splits in the gradient-boosted trees. Part of the importance is explained by the number of non-zero features in the fingerprints. For the materials without d-orbital electrons, the sd, pd, and dd fingerprints will be all zero, while the ss, sp, and pp will have fewer zero-valued features and thus more features to use for splits in the trees.

Because of the strong performance of the model, we envision that it can be deployed directly after the initial relaxation step of a high-throughput workflow to reduce the number of phonon calculations needed to remove the dynamically unstable materials, as depicted in Fig. 1. The ML model does not replace the phonon calculation but is merely used to avoid performing expensive phonon calculations for materials that can be labeled unstable by the ML model. Depending on the number of stable candidates that one is willing to falsely label as unstable, it is possible to save a significant number of phonon calculations by pre-screening with the ML model. The willingness to sacrifice materials is controlled by the classification threshold. The trade-off between the number of unstable materials removed and the number of stable materials lost is directly mapped out by the ROC curve. In Fig. 9b we have indicated the classification thresholds where we discard 5%, 10%, and 20% of the stable materials, and we observe that we can save 56 ± 9%, 70 ± 6%, and 85 ± 3% of the computations for the three thresholds, respectively.
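The pre-screening trade-off can be sketched as a threshold search: lower the threshold as far as possible while losing no more than a chosen fraction of stable materials, and report the fraction of unstable materials that would be removed. The function name and the toy scores are hypothetical, and materials with a score at or above the threshold are assumed to be discarded.

```python
def screening_tradeoff(labels, scores, max_stable_loss):
    """Find the lowest threshold that discards at most max_stable_loss of the stable materials.

    labels: 1 for unstable, 0 for stable
    scores: predicted probability of being unstable
    Returns (threshold, fraction of stable lost, fraction of unstable removed).
    """
    stable = [s for s, y in zip(scores, labels) if y == 0]
    unstable = [s for s, y in zip(scores, labels) if y == 1]
    best = (float("inf"), 0.0, 0.0)  # an infinite threshold discards nothing
    for t in sorted(set(scores), reverse=True):
        lost = sum(s >= t for s in stable) / len(stable)
        if lost > max_stable_loss:
            break  # lowering the threshold further only loses more stable materials
        removed = sum(s >= t for s in unstable) / len(unstable)
        best = (t, lost, removed)
    return best


stable_scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.75]
unstable_scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.5, 0.42, 0.2]
labels = [0] * len(stable_scores) + [1] * len(unstable_scores)
scores = stable_scores + unstable_scores

threshold, stable_lost, unstable_removed = screening_tradeoff(labels, scores, 0.1)
```

In this toy set, accepting the loss of one stable material in ten allows eight of the ten unstable materials to be removed before any phonon calculation is run.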

As an additional test of the machine learning model, we apply it to the set of the 137 dynamically unstable materials that were investigated using the CBP protocol in the first part of the paper. The DS of the materials is evaluated by the ML model both before and after being pushed along an unstable mode (recall that before the push all the 137 materials are unstable; after the push the subset of 49 materials listed in Table 1 become stable while the other materials remain unstable). It is found that before the push 56% of the unstable materials are labeled correctly. After the push, only 29% of the unstable materials are labeled as unstable, while the precision for the stable materials is 72%. Overall, the ML model performs worse on this test set than on a randomly selected test set from the original dataset. An obvious explanation is that the 137 materials were selected for (i) low energy above the convex hull (ΔH_hull < 0.2 eV/atom) and (ii) dynamical instability. As seen from Fig. 8a, such materials are highly unusual and not well represented by the set of materials used to train the model.

In conclusion, we have performed a systematic study of structural instabilities in 2D materials. We have validated a simple protocol (here referred to as the CBP protocol) for identifying dynamical instabilities based on the frequency of phonons at the center and boundary of the BZ. The CBP protocol correctly classifies 2D materials as dynamically stable/unstable in 236 out of 250 cases²⁵ and is ideally suited for high-throughput studies where the computational cost of evaluating the full phonon band structure becomes prohibitive.

For 137 dynamically unstable monolayers with low formation energies, we displaced the atoms along an unstable phonon mode and relaxed the structure in a 2 × 1 or 2 × 2 supercell. This resulted in 49 distorted, dynamically stable monolayers. The success rate of obtaining a dynamically stable structure from this protocol was found to be significantly higher for materials with only one unstable phonon mode as compared to cases with several modes. In the latter case, the displacement vector is not unique and different choices generally lead to different (dynamically unstable) structures. The 49 stable structures were fully characterized by an extensive computational property workflow and the results are available via the C2DB. The properties of the distorted structures can deviate significantly from the original high-symmetry structures, and we found only a weak, qualitative relation between the gain in total energy and band gap opening upon distortion. Finally, we trained a machine learning classification model to predict the DS using a radially decomposed projected density of states (RAD-PDOS) representation as input and a gradient boosting decision tree ensemble method (XGBoost) as learning algorithm. The model achieves an excellent ROC-AUC score of 0.90 and lends itself to the high-throughput assessment of DS.

Methods

Density functional theory calculations

All phonon calculations were performed using the asr.phonopy recipe of the Atomic Simulation Recipes (ASR)⁴⁰, which makes use of the Atomic Simulation Environment (ASE)⁴¹ and PHONOPY⁴². The DFT calculations were performed with the GPAW⁴³ code and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional⁴⁴. The BZ was sampled on a uniform k-point mesh with a density of 6.0 Å² and the plane wave cutoff was set to 800 eV. To evaluate the Hessian matrix, the small displacement method was used with a displacement size of 0.01 Å, and forces were converged to 10⁻⁴ eV Å⁻¹. The non-analytical force constants were not included because we found that they have no effect on the minimum eigenvalues of the Hessian. To benchmark the CBP protocol, we compare it to full phonon band structures. In these calculations, the size of the supercell is chosen such that the Hessian matrix includes interactions between pairs of atoms within a radius of at least 12 Å. (This implies that the supercell must contain a sphere of radius 12 Å.) The space groups of the materials before and after applying the protocol were determined with spglib⁴⁵, using a symmetry threshold of 0.1 Å, a value also used in a similar study²⁶.
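One way to read the 12 Å criterion for a 2D cell is that a circle of radius 12 Å must fit inside the in-plane supercell, i.e., both perpendicular widths of the supercell must be at least 24 Å. The sketch below computes the required repetitions under that reading. The function name is hypothetical, the lattice constant of 3.18 Å is only an illustrative value for a hexagonal monolayer, and the ASR recipe may determine the supercell differently.

```python
import math


def supercell_repetitions(a1, a2, radius=12.0):
    """Smallest (n1, n2) such that the n1 x n2 supercell contains a circle of the given radius.

    a1, a2: in-plane lattice vectors (x, y) in Å
    """
    area = abs(a1[0] * a2[1] - a1[1] * a2[0])
    width_1 = area / math.hypot(*a2)   # width of the cell measured perpendicular to a2
    width_2 = area / math.hypot(*a1)   # width of the cell measured perpendicular to a1
    n1 = math.ceil(2.0 * radius / width_1)
    n2 = math.ceil(2.0 * radius / width_2)
    return n1, n2


a = 3.18  # illustrative hexagonal lattice constant in Å
reps = supercell_repetitions((a, 0.0), (-a / 2.0, a * math.sqrt(3.0) / 2.0))
```

For this cell a 9 × 9 supercell is needed, which illustrates why the full phonon calculation is far more expensive than the 2 × 2 test.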

Machine learning method

Below we describe the machine learning algorithm that we have developed and assessed in an attempt to accelerate the prediction of dynamic instabilities. Our machine learning algorithm of choice is XGBoost⁴⁶, owing to its robustness and flexibility and because it is a simpler model than neural network methods. XGBoost is a regularized high-performance implementation of gradient tree boosting, which makes predictions based on an ensemble of gradient-boosted decision trees. The decision trees of the ensemble are grown sequentially while learning from the mistakes of the previous trees by minimizing the loss function through gradient descent. This loss function is regularized to reduce the complexity of the individual decision trees, which reduces the risk of overfitting. In contrast, in the widely used decision tree ensemble model Random Forest, the decision trees are grown independently and without any regularization.

The dataset used is a subset of the C2DB and consists of 3212 materials (1536 stable and 1676 unstable materials); it does not include the 137 distorted materials identified in the first part of the paper (as these are used as a particularly challenging test case for the model performance). As input for the model we use the radially decomposed projected density of states (RAD-PDOS) material fingerprints⁴⁷. The RAD-PDOS starts from the wave functions projected onto the atomic orbitals (ν) of all the atoms (a) of the crystal, $\rho_{nk}^{a\nu}=|\langle \psi_{nk}|a\nu\rangle|^{2}$. For each state, these projections are then combined into a radially distributed orbital pair correlation function,

$$\rho_{nk}^{\nu\nu'}(R)=\sum_{aa'}\rho_{nk}^{a\nu}\,\rho_{nk}^{a'\nu'}\,G(R-|R_{a}-R_{a'}|;\delta_{R})\,\exp(-\alpha_{R}R) \tag{1}$$

Finally, the radial functions are distributed on an energy grid,

$$\begin{array}{ll}{\rho }^{\nu {\nu }^{{\prime} }}(R,E)\,=\,\mathop{\sum}\limits_{nk}{\rho }_{nk}^{\nu {\nu }^{{\prime} }}(R)G(E-({\varepsilon }_{nk}-{E}_{F});{\delta }_{E})\\ \quad\qquad\qquad\quad\times\, \exp (-{\alpha }_{E}R),\end{array}$$

(2)

where G(x; δ) is a Gaussian of width δ centered at x = 0. For the materials in the dataset, the s, p, and d orbitals lead to six unique components of the RAD-PDOS fingerprint ($\nu {\nu }^{{\prime} }=\{ss,sp,sd,pp,pd,dd\}$). The fingerprint involves some hyperparameters for which we use the values ${E}_{\min }=-10\,{{{\rm{eV}}}},{E}_{\max }=10\,{{{\rm{eV}}}},{N}_{E}=25,{\delta }_{E}=0.3\,{{{\rm{eV}}}},{\alpha }_{E}=0.2{{{{\rm{eV}}}}}^{-1},{R}_{\min }=0,{R}_{\max }=5\,\mathring{\rm A} ,{N}_{R}=20,{\delta }_{R}=0.25\,\mathring{\rm A} ,{\alpha }_{R}=0.33{\mathring{\rm A} }^{-1}$.
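
As an illustration of Eq. (1) only, the sketch below evaluates the radial orbital pair correlation of a single state for one orbital pair on the radial grid. Several points are assumptions rather than details given in the text: the Gaussian is taken as unnormalized, the grid contains N_R equally spaced points including both end points, periodic images of the atoms are ignored, and the two-atom input is invented. The function names are hypothetical.

```python
import math

def gaussian(x, delta):
    # Assumption: unnormalized Gaussian of width delta centered at x = 0.
    return math.exp(-x * x / (2.0 * delta * delta))

def radial_grid(r_min=0.0, r_max=5.0, n_r=20):
    # Assumption: equally spaced grid that includes both end points.
    step = (r_max - r_min) / (n_r - 1)
    return [r_min + i * step for i in range(n_r)]

def radial_pair_correlation(positions, proj_nu, proj_nu_prime,
                            delta_r=0.25, alpha_r=0.33, grid=None):
    """Eq. (1) for one state nk and one orbital pair (nu, nu').

    positions      atomic coordinates in Å (periodic images are ignored here)
    proj_nu        projections of the state onto orbital nu, one per atom
    proj_nu_prime  projections of the state onto orbital nu', one per atom
    """
    grid = radial_grid() if grid is None else grid
    values = []
    for r in grid:
        total = 0.0
        for pos_a, w_a in zip(positions, proj_nu):
            for pos_b, w_b in zip(positions, proj_nu_prime):
                distance = math.dist(pos_a, pos_b)
                total += w_a * w_b * gaussian(r - distance, delta_r)
        values.append(total * math.exp(-alpha_r * r))  # radial damping
    return values

# Invented example: s weight on atom 0, p weight on atom 1, 2.5 Å apart.
positions = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
sp_component = radial_pair_correlation(positions, [1.0, 0.0], [0.0, 1.0])
grid = radial_grid()
peak = max(range(len(grid)), key=lambda i: sp_component[i])
print(len(sp_component), grid[peak])
```

The sp component peaks at the grid point closest to the interatomic distance, shifted slightly toward smaller R by the exponential damping. Applying Eq. (2) would then spread these radial functions over the energy grid.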

In addition to the RAD-PDOS fingerprint, we consider a low-dimensional fingerprint consisting of five features, namely the PBE electronic band gap (${\varepsilon }_{{{{\rm{gap}}}}}^{{{{\rm{PBE}}}}}$), crystal formation energy (ΔH), density of states at the Fermi level (DOS@E_F), energy above the convex hull (ΔH_hull) and the total energy per atom in the unit cell. The low-dimensional fingerprint is used to train a “baseline” ML model that we use to benchmark the performance of the ML model based on the more involved RAD-PDOS fingerprint. Common to all the features considered is that they are obtained from a single DFT calculation and thus are much faster to compute than the phonon frequencies. The gradient boosting model introduces several hyperparameters such as the depth of the trees, the learning rate, the minimum loss gain required to perform a split, and the minimum weights in the tree leaves. These parameters are optimized using Bayesian optimization, where a Gaussian process is fitted to the mean test ROC-AUC of a 10-fold cross-validation. The hyperparameters used here are max depth = 8, learning rate = 0.06, min split loss = 0 and min weights = 0. Like a logistic regression model, the XGBoost classification model outputs a number between 0 and 1, which is interpreted as a probability. In our case, 0 (1) refers to a dynamically stable (unstable) material.
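
The quantity targeted by the hyperparameter optimization, the ROC-AUC, can be computed directly from predicted probabilities. The sketch below is a plain-Python illustration, not the evaluation code used in this work: it uses the pairwise definition of the AUC (the fraction of unstable/stable pairs in which the unstable material receives the higher score, with ties counted as one half) and a simple interleaved k-fold split. The labels and scores are invented.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve; label 1 = unstable, label 0 = stable."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def k_fold_indices(n_samples, k=10):
    """Split the indices 0..n_samples-1 into k disjoint test folds."""
    return [list(range(n_samples))[i::k] for i in range(k)]

# Invented predictions for four materials.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

In a cross-validation loop, the model would be trained on the materials outside each fold, `roc_auc` would be evaluated on the fold itself, and the ten values would be averaged.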

Data availability

All the crystal structures and properties will be available in the C2DB (https://doi.org/10.11583/DTU.14616660.v1).

References

1. Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138 (1965).
2. Grüner, G. The dynamics of charge-density waves. Rev. Mod. Phys. 60, 1129–1181 (1988).
3. Peierls, R. Quantum Theory of Solids. International Series of Monographs on Physics (Clarendon Press, 1996).
4. Ono, Y. & Hamano, T. Peierls distortion in two-dimensional tight-binding model. J. Phys. Soc. Jpn. 69, 1769–1776 (2000).
5. Weber, F. et al. Electron-phonon coupling and the soft phonon mode in TiSe₂. Phys. Rev. Lett. 107, 266401 (2011).
6. Bianco, R., Errea, I., Monacelli, L., Calandra, M. & Mauri, F. Quantum enhancement of charge density wave in NbS₂ in the two-dimensional limit. Nano Lett. 19, 3098–3103 (2019).
7. Zhou, J. S. et al. Anharmonicity and doping melt the charge density wave in single-layer TiSe₂. Nano Lett. 20, 4809–4815 (2020).
8. Leroux, M. et al. Anharmonic suppression of charge density waves in 2H-NbS₂. Phys. Rev. B 86, 155125 (2012).
9. Bianco, R., Errea, I., Monacelli, L., Calandra, M. & Mauri, F. Quantum enhancement of charge density wave in NbS₂ in the two-dimensional limit. Nano Lett. 19, 3098–3103 (2019).
10. Calandra, M., Mazin, I. I. & Mauri, F. Effect of dimensionality on the charge-density wave in few-layer 2H-NbSe₂. Phys. Rev. B 80, 241108 (2009).
11. Zhu, X., Cao, Y., Zhang, J., Plummer, E. W. & Guo, J. Classification of charge density waves based on their nature. Proc. Natl. Acad. Sci. 112, 2367–2371 (2015).
12. Johannes, M. & Mazin, I. Fermi surface nesting and the origin of charge density waves in metals. Phys. Rev. B 77, 165135 (2008).
13. Lin, Y.-C., Dumcenco, D. O., Huang, Y.-S. & Suenaga, K. Atomic mechanism of the semiconducting-to-metallic phase transition in single-layered MoS₂. Nat. Nanotechnol. 9, 391–396 (2014).
14. Xi, X. et al. Strongly enhanced charge-density-wave order in monolayer NbSe₂. Nat. Nanotechnol. 10, 765–769 (2015).
15. Ugeda, M. et al. Characterization of collective ground states in single-layer NbSe₂. Nat. Phys. 12, 92–97 (2016).
16. Yang, Y. et al. Enhanced superconductivity upon weakening of charge density wave transport in 2H-TaS₂ in the two-dimensional limit. Phys. Rev. B 98, 035203 (2018).
17. Yu, Y. et al. Gate-tunable phase transitions in thin flakes of 1T-TaS₂. Nat. Nanotechnol. 10, 270–276 (2015).
18. Ryu, H. et al. Persistent charge-density-wave order in single-layer TaSe₂. Nano Lett. 18, 689–694 (2018).
19. Ge, Y. & Liu, A. Y. Effect of dimensionality and spin-orbit coupling on charge-density-wave transition in 2H-TaSe₂. Phys. Rev. B 86, 104101 (2012).
20. Sugawara, K. et al. Unconventional charge-density-wave transition in monolayer 1T-TiSe₂. ACS Nano 10, 1341–1345 (2016).
21. Wang, H. et al. Large-area atomic layers of the charge-density-wave conductor TiSe₂. Adv. Mater. 30, 1704382 (2018).
22. Zhu, J. et al. Argon plasma induced phase transition in monolayer MoS₂. J. Am. Chem. Soc. 139, 10216–10219 (2017).
23. Wang, L., Xu, Z., Wang, W. & Bai, X. Atomic mechanism of dynamic electrochemical lithiation processes of MoS₂ nanosheets. J. Am. Chem. Soc. 136, 6693–6697 (2014).
24. Krishnamoorthy, A. et al. Semiconductor-metal structural phase transformation in MoTe₂ monolayers by electronic excitation. Nanoscale 10, 2742–2747 (2018).
25. Mounet, N. et al. Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds. Nat. Nanotechnol. 13, 246–252 (2018).
26. Togo, A. & Tanaka, I. Evolution of crystal structures in metallic elements. Phys. Rev. B 87, 184104 (2013).
27. Patrick, C. E., Jacobsen, K. W. & Thygesen, K. S. Anharmonic stabilization and band gap renormalization in the perovskite CsSnI₃. Phys. Rev. B 92, 201205 (2015).
28. Yang, R. X., Skelton, J. M., da Silva, E. L., Frost, J. M. & Walsh, A. Assessment of dynamic structural instabilities across 24 cubic inorganic halide perovskites. J. Chem. Phys. 152, 024703 (2020).
29. Kayastha, P. & Ramakrishnan, R. High-throughput design of Peierls and charge density wave phases in Q1D organometallic materials. J. Chem. Phys. 154, 061102 (2021).
30. Haastrup, S. et al. The computational 2D materials database: high-throughput modeling and discovery of atomically thin crystals. 2D Mater. 5, 042002 (2018).
31. Gjerding, M. et al. Recent progress of the computational 2D materials database (C2DB). 2D Mater. 8, 044002 (2021).
32. https://cmr.fysik.dtu.dk/c2db/c2db.html.
33. Eriksson, F., Fransson, E. & Erhart, P. The hiphive package for the extraction of high-order force constants by machine learning. Adv. Theory Simul. 2, 1800184 (2019).
34. Carrete, J. et al. Physically founded phonon dispersions of few-layer materials and the case of borophene. Mater. Res. Lett. 4, 204–211 (2016).
35. Royo, M., Hahn, K. R. & Stengel, M. Using high multipolar orders to reconstruct the sound velocity in piezoelectrics from lattice dynamics. Phys. Rev. Lett. 125, 217602 (2020).
36. Rossnagel, K. On the origin of charge-density waves in select layered transition-metal dichalcogenides. J. Phys. Condens. Matter 23, 213001 (2011).
37. Mak, K. F., Lee, C., Hone, J., Shan, J. & Heinz, T. F. Atomically thin MoS₂: a new direct-gap semiconductor. Phys. Rev. Lett. 105, 136805 (2010).
38. Manzeli, S., Ovchinnikov, D., Pasquier, D., Yazyev, O. V. & Kis, A. 2D transition metal dichalcogenides. Nat. Rev. Mater. 2, 1–15 (2017).
39. Liu, H. et al. Phosphorene: an unexplored 2D semiconductor with a high hole mobility. ACS Nano 8, 4033–4041 (2014).
40. Gjerding, M. et al. Atomic simulation recipes – a Python framework and library for automated workflows. Comput. Mater. Sci. 199, 110731 (2021).
41. Larsen, A. H. et al. The atomic simulation environment–a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).
42. Togo, A. & Tanaka, I. First principles phonon calculations in materials science. Scr. Mater. 108, 1–5 (2015).
43. Enkovaara, J. et al. Electronic structure calculations with GPAW: a real-space implementation of the projector augmented-wave method. J. Phys. Condens. Matter 22, 253202 (2010).
44. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
45. Togo, A. & Tanaka, I. Spglib: a software library for crystal symmetry search. Preprint at https://arxiv.org/abs/1808.01590 (2018).
46. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, 785–794 (Association for Computing Machinery, New York, NY, USA, 2016).
47. Knøsgaard, N. R. & Thygesen, K. S. Representing individual electronic states for machine learning GW band structures of 2D materials. Nat. Commun. 13, 468 (2022).


Acknowledgements

The Center for Nanostructured Graphene (CNG) is sponsored by the Danish National Research Foundation, Project DNRF103. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program grant agreement no. 773122 (LIMA). K.S.T. is a Villum Investigator supported by VILLUM FONDEN (grant no. 37789).

Author information

Authors and Affiliations

CAMD, Computational Atomic-Scale Materials Design, Department of Physics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
Simone Manti, Mark Kamper Svendsen, Nikolaj R. Knøsgaard, Peder M. Lyngby & Kristian S. Thygesen
Center for Nanostructured Graphene (CNG), Department of Physics, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
Kristian S. Thygesen

Authors

Simone Manti
Mark Kamper Svendsen
Nikolaj R. Knøsgaard
Peder M. Lyngby
Kristian S. Thygesen

Contributions

S.M. and K.S.T. developed the initial concept. S.M. performed the benchmark of the CBP protocol and developed the workflow and the analysis for the pushed materials. M.K.S., N.R.K., and P.M.L. conducted the machine learning analysis. K.S.T. supervised the work and helped in the interpretation of the results. All authors modified and discussed the paper together.

Corresponding author

Correspondence to Simone Manti.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Manti, S., Svendsen, M.K., Knøsgaard, N.R. et al. Exploring and machine learning structural instabilities in 2D materials. npj Comput Mater 9, 33 (2023). https://doi.org/10.1038/s41524-023-00977-x


Received: 21 January 2022
Accepted: 30 January 2023
Published: 04 March 2023
DOI: https://doi.org/10.1038/s41524-023-00977-x
