## Abstract

We address the problem of predicting the zero-temperature dynamical stability (DS) of a periodic crystal without computing its full phonon band structure. Here we report evidence that DS can be inferred with good reliability from the phonon frequencies at the center and boundary of the Brillouin zone (BZ). This analysis validates the DS test employed by the Computational 2D Materials Database (C2DB). For 137 dynamically unstable 2D crystals, we displace the atoms along an unstable mode and relax the structure. This procedure yields a dynamically stable crystal in 49 cases. The elementary properties of these new structures are characterized using the C2DB workflow, and it is found that their properties can differ significantly from those of the original unstable crystals, e.g., band gaps are opened by 0.3 eV on average. All the crystal structures and properties are available in the C2DB. Finally, we train a classification model on the DS data for 3295 2D materials in the C2DB using a representation encoding the electronic structure of the crystal. We obtain an excellent receiver operating characteristic (ROC) curve with an area under the curve (AUC) of 0.90, showing that the classification model can drastically reduce the computational effort in high-throughput studies.

## Introduction

Computational materials discovery aims at identifying materials for specific applications, often employing first-principles methods such as density functional theory (DFT)^{1}. The potential of a given material for the targeted application is usually evaluated based on the elementary properties of the crystal, such as the electronic band gap, the optical absorption spectrum, or the magnetic order. Such properties can be highly sensitive to even small distortions of the lattice that reduce the symmetry of the crystal, and it is, therefore, important to develop efficient methods for identifying and accounting for such distortions.

Lattice distortions can be classified according to their periodicity relative to the primitive cell of the crystal. Local instabilities conserve the periodicity of the crystal, i.e. they do not enlarge the number of atoms in the primitive cell. Other distortions, accompanied by a modulation of the electronic density known as charge density wave (CDW)^{2}, lead to an enlargement of the period of the crystal, which can be either commensurate or incommensurate with the high-symmetry phase. A classical example is the Peierls instability in one-dimensional^{3} and two-dimensional (2D)^{4} systems, where a gap is opened in the CDW state. A universal microscopic theory of the CDW phase is still missing due to the many possible and intertwined driving mechanisms, e.g., electron-phonon interaction^{5}, Fermi surface nesting, or phonon-phonon interactions^{6}, which makes a precise clear-cut definition of the CDW phase difficult. In addition, the CDW state is sensitive to external effects such as temperature and doping^{7}. As a testimony to the complexity of the problem, different models and concepts are used to describe the CDW phase depending on the dimensionality of the material^{8,9,10,11,12}.

The last few years have witnessed an increased interest in CDW states of 2D materials. For example, CDW physics is believed to govern the transition from the trigonal prismatic T-phase to the lower symmetry T’-phase in monolayer MoS_{2}^{13} as well as the plethora of temperature-dependent phases in monolayers of NbSe_{2}^{14,15}, TaS_{2}^{16,17}, TaSe_{2}^{18,19}, and TiSe_{2}^{20,21}. In addition, a number of recent studies have investigated the possibility to control CDW phase transitions. For instance, the T-phase of monolayer MoS_{2} can be stabilized by argon bombardment^{22}, exposure to electron beams^{13}, or Li-ion intercalation^{23}. Similar results have been reported for MoTe_{2}^{24}.

Regardless of the fundamental origin of possible lattice distortions, it remains of great practical importance to devise efficient schemes that make it possible to verify whether or not a given structure is dynamically stable (DS), i.e., whether it represents a local minimum of the potential energy surface. Structures that are not DS are frequently generated in computational studies, e.g., when a structure is relaxed under symmetry constraints or the chosen unit cell is too small to accommodate the stable phase. Tests for DS are rarely performed in large-scale discovery studies, because there is no established way of doing so apart from calculating the full phonon band structure^{25}, which is a time-consuming task. At the same time, the importance of incorporating such tests is in fact unclear; that is, it is not known how much symmetry-breaking distortions generally influence the properties of materials.

A straightforward strategy to generate potentially stable structures from dynamically unstable ones is to displace the atoms along an unstable phonon mode using a supercell that can accommodate the distortion. This approach has previously been adopted to explore structural distortions in metallic systems^{26}, in bulk perovskites^{27,28}, and in one-dimensional organometallic chains^{29}. However, systematic studies of structural instabilities in 2D materials have so far been lacking.

In this work, we perform a systematic study of structural distortions across a broad class of 2D crystals, and explore a machine learning-based approach to DS classification. Throughout, we focus on the most common case of small-period, commensurate distortions that can be accommodated in a 2 × 2 repetition of the primitive cell of the high-symmetry phase. We shall refer to the test for the occurrence of such distortions as the Center and Boundary Phonon (CBP) protocol. The motivation behind the present work is fourfold: (i) To assess the reliability of the CBP protocol (which is currently used for DS classification in the Computational 2D Materials Database (C2DB)^{30,31}). (ii) To elucidate the effect of symmetry-breaking distortions on the basic electronic properties of crystals. (iii) To obtain the DS phases of a set of dynamically unstable 2D materials that were originally generated by combinatorial lattice decoration, and make them available to the community via the C2DB. (iv) To explore the viability of a machine learning-based classification scheme for predicting DS using input from a DFT calculation of the undistorted high-symmetry phase.

The paper is structured as follows. In “Results and discussion” we describe the CBP protocol. We first benchmark the CBP protocol against full phonon band structure calculations and evaluate its statistical success rate. For 137 dynamically unstable 2D materials, we further analyze how the small-period distortions that stabilize the materials influence their electronic properties. “Methods” concludes the paper.

## Results and discussion

This section presents and discusses the results of the CBP protocol and its connection with machine learning predictions. Together they will increase the success rate of finding dynamically stable materials within the C2DB workflow (see Fig. 1).

### The CBP protocol: stability test

Given a material that has been relaxed in some unit cell (from hereon referred to as the primitive unit cell), the CBP protocol proceeds by evaluating the stiffness tensor of the material and the Hessian matrix of a supercell obtained by repeating the primitive cell 2 × 2 times. In the current work, the stiffness tensor is calculated as a finite difference of the stress under an applied strain, while the Hessian matrix is calculated as a finite difference of the forces on all the atoms of the 2 × 2 supercell under displacement of the atoms in one primitive unit cell (this is equivalent to calculating the phonons at the center and at specific high-symmetry points on the boundary of the BZ of the primitive cell, see Fig. 4). Next, the stiffness tensor and the Hessian matrix are diagonalised, and the eigenvalues are used to infer structural stability. A negative eigenvalue of the stiffness tensor indicates an instability of the lattice (the shape of the unit cell), while a negative eigenvalue of the 2 × 2 Hessian signals an instability of the atomic structure. The obvious question here is whether it suffices to consider the Hessian of the 2 × 2 supercell, or equivalently to consider the phonons at the BZ center and boundaries.
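
The decision step of the stability test can be sketched as follows. This is a minimal illustration under stated assumptions and not the C2DB implementation: the function name `cbp_is_stable`, the tolerance `tol`, and the choice to identify the three translational modes as the three Hessian eigenvalues closest to zero are all hypothetical.

```python
import numpy as np

def cbp_is_stable(hessian, stiffness, tol=1e-6):
    """Return True if neither matrix has a significantly negative eigenvalue.

    hessian   : (3N, 3N) force-constant matrix of the 2 x 2 supercell
    stiffness : symmetric stiffness matrix (e.g. 3 x 3 in Voigt notation for 2D)
    tol       : hypothetical numerical tolerance for "negative"
    """
    hessian = np.asarray(hessian, dtype=float)
    stiffness = np.asarray(stiffness, dtype=float)
    # Symmetrize before diagonalizing to suppress finite-difference noise.
    hess_eigs = np.linalg.eigvalsh(0.5 * (hessian + hessian.T))
    stiff_eigs = np.linalg.eigvalsh(0.5 * (stiffness + stiffness.T))
    # Assumption: the three eigenvalues closest to zero are rigid translations.
    order = np.argsort(np.abs(hess_eigs))
    hess_eigs = hess_eigs[order[3:]]
    return bool(hess_eigs.min() > -tol and stiff_eigs.min() > -tol)
```

A practical implementation would likely identify the translational modes from their eigenvectors rather than from the magnitude of their eigenvalues, since a weakly unstable mode could otherwise be discarded by mistake.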

We can distinguish three possible outcomes when comparing the CBP protocol against full phonon calculations (see Fig. 2), namely a true-positive result, a true-negative result, and a false-positive result. We note that the case of a false negative is not possible, because a material that is unstable in a 2 × 2 cell is de facto unstable. The false-positive case occurs when a material is stable in a 2 × 2 supercell, but unstable if allowed to distort in a larger cell. Our results show that such large-period distortions, which do not show up as distortions in a 2 × 2 cell, are relatively rare (see “Assessment of the CBP protocol”).
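
The three outcomes, and the impossibility of the fourth, can be written down explicitly. The sketch below is illustrative only; the function name `cbp_outcome` and its boolean arguments are hypothetical, with "positive" meaning "predicted stable" as in the text.

```python
def cbp_outcome(stable_in_2x2, stable_in_full):
    """Label a CBP prediction against a full phonon calculation.

    stable_in_2x2  : True if the CBP protocol finds no instability
    stable_in_full : True if the full phonon band structure has no instability
    """
    if stable_in_2x2 and stable_in_full:
        return "true positive"
    if not stable_in_2x2 and not stable_in_full:
        return "true negative"
    if stable_in_2x2 and not stable_in_full:
        return "false positive"
    # Unstable in the 2 x 2 cell but stable in the full calculation:
    # excluded, because the full calculation contains the 2 x 2 q-points.
    raise ValueError("false negative is not possible for the CBP protocol")
```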

### The CBP protocol: structure generation

Here we outline a simple procedure to generate distorted and potentially stable structures from an initial dynamically unstable structure. The basic idea is to displace the atoms along an unstable phonon mode followed by a relaxation. In practice, the unstable mode is obtained as the eigenvector corresponding to a negative eigenvalue of the Hessian matrix of the 2 × 2 supercell. The procedure is illustrated in Fig. 3 for the well-known T-T’ phase transition of MoS_{2}^{13}. The left panel shows the atomic structure and phonon band structure of monolayer MoS_{2} in the T-phase. Both the primitive unit cell (black) and the 2 × 2 supercell (orange) are indicated. The CBP method identifies an unstable mode at the BZ boundary (M-point). After displacing the atoms along the unstable mode, a distorted structure is obtained, which after relaxation leads to the dynamically stable T’-phase of MoS_{2} shown in the right panel.

In this work, we have applied the CBP protocol systematically to 137 dynamically unstable 2D materials. The 137 monolayers were selected from the C2DB according to the following two criteria. First, to ensure that all materials are chemically “reasonable”, only materials with a low formation energy were selected. Specifically, we require that Δ*H*_{hull} < 0.2 eV atom^{−1}, where Δ*H*_{hull} is the energy above the convex hull defined by the most stable (possibly mixed) bulk phases of the relevant composition^{31,32}. Second, we consider only materials with exactly one unstable mode, i.e. one negative eigenvalue of the Hessian matrix at a given *q*-point. We stress that the latter condition is not strictly necessary but was adopted here to limit the number of materials. When two or more unstable modes exist, there is no unique way to distort the structure. One possibility is to push along the linear combination of modes yielding the distorted structure with the highest symmetry^{26}. However, it is not clear that imposing high symmetry is the best strategy for finding a DS structure. For the few cases with multiple unstable modes that we have analyzed, we have found that pushing along the most unstable eigenmode (the one with the most negative eigenvalue) often yields a DS structure, as in the case of T-MoS_{2}.

For the 137 dynamically unstable materials, we displaced the atoms along the (unique) unstable mode. The size of the displacement was chosen such that the maximum atomic displacement was exactly 0.1 Å. This displacement size was chosen based on the MoS_{2} example discussed above, where it results in a minimal number of subsequent relaxation steps. A smaller value does not guarantee that the system leaves the saddle point, while a larger value creates an unnecessarily large distortion that requires additional relaxation steps. During relaxation, the unit cell was allowed to change with no symmetry constraints, and the relaxation was stopped when the forces on all atoms were below 0.01 eV Å^{−1}.
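
The scaling of the displacement can be illustrated with a short sketch. It assumes that the unstable mode is available as a real eigenvector with three Cartesian components per atom of the supercell; the function name `displace_along_mode` is hypothetical, and any mass weighting of the eigenvector is omitted for simplicity.

```python
import numpy as np

def displace_along_mode(positions, mode, max_disp=0.1):
    """Shift atoms along `mode` so that the largest atomic shift is `max_disp` (in Å).

    positions : (N, 3) Cartesian positions of the supercell atoms
    mode      : (3N,) or (N, 3) real eigenvector of the supercell Hessian
    """
    positions = np.asarray(positions, dtype=float)
    mode = np.asarray(mode, dtype=float).reshape(-1, 3)
    # Length of the displacement vector of each individual atom.
    norms = np.linalg.norm(mode, axis=1)
    largest = norms.max()
    if largest == 0.0:
        raise ValueError("mode has zero amplitude on all atoms")
    # Rescale so that the most displaced atom moves by exactly max_disp.
    return positions + mode * (max_disp / largest)
```

The displaced positions would then serve as the starting point of the unconstrained relaxation described above.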

### Assessment of the CBP protocol

To test the validity of the CBP protocol, we have performed full phonon calculations for a set of 20 monolayers predicted as dynamically stable by the CBP protocol. The 20 materials were randomly selected from the C2DB and cover 7 different crystal structures. Out of the 20 materials, 10 are metals and 10 are insulators/semiconductors. The calculated phonon band structures are reported in the Supplementary information (SI). For all materials, the phonon frequencies obtained with the CBP protocol equal the frequencies of the full phonon band structure at the *q*-points \({{{\bf{q}}}}\in \{(0,0),(\frac{1}{2},0),(0,\frac{1}{2}),(\frac{1}{2},\frac{1}{2})\}\). This is expected as the phonons at these *q*-points can be accommodated by the 2 × 2 supercell.
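
Whether a given *q*-point can be accommodated by a supercell follows from a simple commensurability condition: each reduced coordinate of **q**, multiplied by the number of repetitions along that direction, must be an integer. The sketch below illustrates this; the function name `fits_supercell` is hypothetical.

```python
from fractions import Fraction

def fits_supercell(q, repeat=(2, 2)):
    """True if q (in reduced coordinates) is commensurate with the supercell.

    q      : reduced coordinates of the wave vector, one per in-plane direction
    repeat : number of primitive-cell repetitions along each direction
    """
    return all(
        (Fraction(qi).limit_denominator(1000) * n).denominator == 1
        for qi, n in zip(q, repeat)
    )

half, third = Fraction(1, 2), Fraction(1, 3)
cbp_points = [(0, 0), (half, 0), (0, half), (half, half)]
print(all(fits_supercell(q) for q in cbp_points))  # the four CBP q-points
print(fits_supercell((third, third)))              # K-point of a hexagonal BZ
```

The K-point of a hexagonal BZ, (1/3, 1/3) in reduced coordinates, does not satisfy the condition for a 2 × 2 supercell but does for a 3 × 3 supercell.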

Within the set of 20 materials, we find three false-positive cases, namely CoTe_{2}, NbSSe, and TaTe_{2}. These materials exhibit unstable modes (imaginary frequencies or, equivalently, negative force constant eigenvalues) in the interior of the BZ (NbSSe and TaTe_{2}) or at the K-point (CoTe_{2}), while all phonon frequencies at the *q*-points covered by the CBP protocol are real. A simple interpolation of the frequencies is not enough to catch an instability at internal points of the BZ (see Supplementary Figure 1) because a supercell larger than 2 × 2 is needed to capture these distortions, as in the case of 2H-CoTe_{2} (see Supplementary Figure 2). This relatively low fraction of false positives in our representative sample is consistent with the work by Mounet et al.^{25}, who computed the full phonon band structure of 258 monolayers predicted to be (easily) exfoliable from known bulk compounds. Applying the CBP protocol to their data yields 14 false-positive cases; half of these are transition metal dichalcogenides (TMDs) with Co, Nb, or Ta.

We note that the small imaginary frequencies in the out-of-plane modes around the Γ-point seen in some of the phonon band structures are not distortions, but are rather due to the interpolation of the dynamical matrix. In particular, these artifacts occur because of the broken crystal point-group symmetry in the force constant matrix, and they vanish if a larger supercell is used, the rotational sum rule is imposed^{33,34}, or higher-order multipolar interactions are included^{35}.

### Stable distorted monolayers

The 137 dynamically unstable materials, which were selected from the C2DB according to the criteria described in “The CBP protocol: structure generation”, can be divided into two groups depending on whether the eigenvalues of the Hessian at the wave vectors \({q}_{x}=(\frac{1}{2},0),{q}_{y}=(0,\frac{1}{2})\) and \({q}_{xy}=(\frac{1}{2},\frac{1}{2})\) are equal or not. Equality of the eigenvalues implies an isotropic Hessian. For such materials, we generate distorted structures by displacing the atoms along the unstable mode at \({q}_{x}=(\frac{1}{2},0)\), followed by relaxation in a 2 × 1 supercell. In the case of an anisotropic Hessian, there may in general exist a particular combination of *q*-vectors that stabilizes the system. Here, we were interested in finding a general method to generate DS structures in a high-throughput way, and we therefore decided to displace only at \({q}_{xy}=(\frac{1}{2},\frac{1}{2})\) in a 2 × 2 supercell.
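
The choice of displacement vector and supercell can be summarized in a few lines. The sketch below is an illustration under assumptions: it takes the quantity compared at each wave vector to be the minimum Hessian eigenvalue there, and the relative tolerance `rel_tol` used to decide equality, as well as the function name `choose_distortion`, are hypothetical.

```python
import math

def choose_distortion(eig_qx, eig_qy, eig_qxy, rel_tol=1e-3):
    """Pick the wave vector and supercell used to distort the structure.

    eig_qx, eig_qy, eig_qxy : minimum Hessian eigenvalue at q_x, q_y and q_xy
    """
    isotropic = (
        math.isclose(eig_qx, eig_qy, rel_tol=rel_tol)
        and math.isclose(eig_qx, eig_qxy, rel_tol=rel_tol)
    )
    if isotropic:
        # Isotropic Hessian: displace at q_x and relax in a 2 x 1 supercell.
        return {"q": (0.5, 0.0), "supercell": (2, 1)}
    # Anisotropic Hessian: displace at q_xy and relax in a 2 x 2 supercell.
    return {"q": (0.5, 0.5), "supercell": (2, 2)}
```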

After atomic displacement and subsequent relaxation, the CBP protocol was applied again to test for DS of the distorted structures. Histograms of the minimum eigenvalue, min(\({\tilde{\omega }}_{{{{\bf{q}}}}\lambda }^{2}\)), of the Hessian matrix at **q** for the unstable mode *λ* are shown in Fig. 4, with the materials before and after atomic displacement shown in the upper and lower panels, respectively. Negative eigenvalues, corresponding to unstable materials, are shown in red while positive eigenvalues are shown in green. We removed the three translational modes with eigenvalues close to zero before extracting min(\({\tilde{\omega }}_{{{{\bf{q}}}}\lambda }^{2}\)). Out of the 137 unstable materials, 49 become dynamically stable (according to the CBP protocol). By far the highest success rate for generating stable crystals was found for the isotropic materials (left panel), where 43 out of 91 materials became stable, while only 6 out of the 43 anisotropic materials became stable. In principle, the procedure can be applied repeatedly to increase the number of DS structures; here, we applied the protocol only once for computational reasons. An example where the protocol is applied twice is the 1T-TiSe_{2} monolayer discussed in the SI. In that case, the material is first displaced at \({q}_{xy}=(\frac{1}{2},\frac{1}{2})\) in a 2 × 2 supercell. The resulting structure, which is still unstable, is then displaced along one of the two degenerate modes at Γ, and the final structure with the experimentally known CDW state^{20} is obtained.

A wide range of elementary properties of the 49 distorted, dynamically stable materials were computed using the C2DB workflow (see Table 1 in^{31} for a complete list of the properties). The atomic structures together with the calculated properties are available in the C2DB. Table 1 provides an overview of the symmetries, minimal Hessian eigenvalues, total energies, and electronic band gap of the 49 materials before and after the distortion.

Apart from the reduction in symmetry, the distortion also lowers the total energy of the materials. An important descriptor for the thermodynamic stability of a material is the energy above the convex hull, Δ*H*_{hull}. Figure 5 shows a plot of Δ*H*_{hull} before and after the distortion of the 49 materials. The reduction in energy upon distortion ranges from 0 to 0.2 eV atom^{−1}. In fact, several of the materials come very close to the convex hull and some even fall onto the hull, indicating their global thermodynamic stability (at *T* = 0 K) with respect to the reference bulk phases. We note that all DFT energies, including the reference bulk phases, were calculated using the PBE xc-functional, which does not account for van der Waals interactions. Accounting for the vdW interactions will downshift the energies of layered bulk phases and thus increase Δ*H*_{hull} for the monolayers slightly. This effect will, however, not influence the relative stability of the pristine and distorted monolayers, which is the main focus of the current work. Another characteristic trend observed is the opening/increase of the electronic band gap. The increase of the single-particle band gap is expected to be related to the total energy gained by making the distortion. Figure 6 shows the relation between the two quantities. Simplified models, for low-dimensional systems and weak electron-phonon coupling, predict a proportionality between these two quantities^{36}. From our results, it is clear that there is no universal relationship between the change in band gap and total energy. In particular, several of the metals show large gains in total energy while the gap remains zero.

It is interesting that 21 of the distorted and dynamically stable materials exhibit direct band gaps when a tolerance of 0.1 eV is employed for the difference between the direct and indirect gap. Atomically thin direct band gap semiconductors are highly relevant as building blocks for opto-electronic devices, but only a handful of such materials are known to date, e.g. monolayers of the Mo- and W-based transition metal dichalcogenides^{37,38} and black phosphorus^{39}. As an example of a monolayer material that changes drastically from a metal to a direct band gap semiconductor upon distortion, we show the band structure of CdBr in Fig. 7. The initially unstable metallic phase of the material becomes dynamically stable upon distortion and opens a direct band gap of 1.28 eV at the C point.
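
The direct-gap criterion with a tolerance can be stated compactly. In the sketch below, the function name `is_direct_gap` and the convention that metals (zero gap) are never counted as direct-gap materials are assumptions.

```python
def is_direct_gap(gap, direct_gap, tol=0.1):
    """True if the material has a finite gap that is direct within `tol` (in eV).

    gap        : fundamental (possibly indirect) band gap in eV
    direct_gap : smallest direct band gap in eV, with direct_gap >= gap
    """
    return gap > 0 and (direct_gap - gap) <= tol
```

For a material whose direct and fundamental gaps coincide, such as a hypothetical input `is_direct_gap(1.28, 1.28)`, the function returns `True`.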

### Machine learning DS

We next attempt to accelerate the dynamic stability prediction using the machine learning model outlined in “Machine learning method”.

As an introductory exercise, we consider the correlation between DS and five elementary materials properties, namely the energy above the convex hull, the PBE band gap, the DOS at the Fermi level, the total energy per atom, and the heat of formation. Figure 8 shows the distribution of these properties over the 3295 2D materials. The materials have been split into dynamically stable (blue) and dynamically unstable (orange) materials. There is a clear correlation between DS and the first three properties, shown in panels a–c. In particular, dynamically stable materials are closer to the convex hull, have larger band gaps, and have a lower DOS at *E*_{F} than dynamically unstable materials. The observed correlation with Δ*H*_{hull} is consistent with previous findings based on phonon calculations^{31}. In contrast, no or only weak correlation is found for the last two quantities in panels d–e. These five properties were used as a low-dimensional feature vector for training an XGBoost machine learning model that will serve as a baseline for a model trained on the higher dimensional RAD-PDOS representation described in “Machine learning method”.

To evaluate the performance of our model we employ the receiver operating characteristic (ROC) curve. The ROC curve maps out the number of materials correctly predicted as unstable as a function of the number of materials incorrectly labeled as unstable, and it is calculated by varying the classification tolerance of the model. The area under the curve (AUC) is a measure of the performance of the classifier. Random guessing would amount to a linear ROC curve with unit slope, shown in Fig. 9 by the dashed gray line, and correspond to an AUC of 0.5, whereas a perfect classification model would have an AUC of 1. When calculating the ROC curve of our dynamic stability classifier we employ tenfold cross-validation (CV). This allows us to obtain a mean ROC curve and its standard deviation, which we then use to evaluate the performance of our model.
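
For reference, the AUC can be computed without constructing the curve explicitly, since it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting one half). The sketch below uses this identity; the function name `roc_auc` is hypothetical, and the label convention (1 for unstable, 0 for stable) is the one stated in “Machine learning method”.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from the pairwise-ranking identity.

    labels : 1 for dynamically unstable (positive class), 0 for stable
    scores : predicted probability of being unstable
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This double loop scales quadratically with the number of materials, which is acceptable for a few thousand entries; in a cross-validation setting, the function would be evaluated on each held-out fold and the results averaged.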

The results from the machine learning model are shown in Fig. 9. The mean ROC curve is shown in blue in Fig. 9a; it achieves an excellent 10-fold CV AUC of 0.90 ± 0.01. This suggests that the XGBoost model is able to efficiently detect the dynamically unstable materials in the C2DB. We quantify the effect of the RAD-PDOS fingerprints by comparing the performance of the full model with a model trained on the low-dimensional fingerprint. We observe that the effect of including the RAD-PDOS in the fingerprint is statistically significant, raising the AUC from 0.82 ± 0.01 to 0.90 ± 0.01. The relative impact of the RAD-PDOS fingerprints is shown in the feature importance evaluation in Fig. 9c. Here feature importance refers to how many times a feature is used to perform a split in the decision trees, and the feature importance has been summed for the six different components of the RAD-PDOS fingerprints, i.e., summing over the radial distance and energy axes of the fingerprint. The vertical dashed black line shows the feature importance of random noise for reference. We observe that especially the RAD-PDOS *ss* fingerprint leads to many splits in the gradient-boosted trees. Part of the importance is explained by the number of non-zero features in the fingerprints. For the materials without *d*-orbital electrons, the *sd*, *pd*, and *dd* fingerprints will be all zero, while the *ss*, *sp*, and *pp* fingerprints will have fewer zero-valued features and thus more features to use for splits in the trees.

Because of the strong performance of the model, we envision that it can be deployed directly after the initial relaxation step of a high-throughput workflow to reduce the number of phonon calculations needed to remove the dynamically unstable materials as depicted in Fig. 1. The ML model does not replace the phonon calculation but is merely used to avoid performing expensive phonon calculations for materials that can be labeled unstable by the ML model. Depending on the number of stable candidates that one is willing to falsely label as unstable, it is possible to save a significant amount of phonon calculations by pre-screening with the ML model. The willingness to sacrifice materials is controlled by classification tolerance. The trade-off between the number of unstable materials removed and the number of stable materials lost is directly mapped out by the ROC curve. In Fig. 9b we have indicated the classification thresholds where we discard 5%, 10%, and 20% of the stable materials, and we observe that we can save 56 ± 9%, 70 ± 6%, and 85 ± 3% of the computations for the three thresholds, respectively.
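
The trade-off can be made concrete with a small sketch that, for a given fraction of stable materials one is willing to lose, finds the classification threshold and the fraction of unstable materials that would be screened out. The function name `screening_savings` is hypothetical, and the returned fraction is only one possible measure of the savings; the numbers quoted above may be defined differently.

```python
def screening_savings(labels, p_unstable, max_stable_loss=0.05):
    """Return (threshold, fraction of unstable materials removed).

    labels          : 1 for dynamically unstable, 0 for stable
    p_unstable      : predicted probability of being unstable
    max_stable_loss : largest fraction of stable materials we accept to discard
    """
    if not 0.0 <= max_stable_loss < 1.0:
        raise ValueError("max_stable_loss must be in [0, 1)")
    stable = sorted((p for y, p in zip(labels, p_unstable) if y == 0), reverse=True)
    unstable = [p for y, p in zip(labels, p_unstable) if y == 1]
    if not stable or not unstable:
        raise ValueError("both classes must be present")
    # Number of stable materials that may be discarded.
    n_lost = int(max_stable_loss * len(stable))
    # Materials with a score strictly above the threshold are discarded,
    # so at most n_lost stable materials are lost.
    threshold = stable[n_lost]
    removed = sum(p > threshold for p in unstable)
    return threshold, removed / len(unstable)
```

Sweeping `max_stable_loss` from 0 towards 1 traces out the same information as the ROC curve, with the roles of the two axes given by the stable materials lost and the unstable materials removed.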

As an additional test of the machine learning model, we apply it to the set of the 137 dynamically unstable materials that were investigated using the CBP protocol in the first part of the paper. The DS of the materials is evaluated by the ML model both before and after being pushed along an unstable mode (recall that before the push all the 137 materials are unstable; after the push the subset of 49 materials listed in Table 1 become stable while the other materials remain unstable). It is found that before the push 56% of the unstable materials are labeled correctly. After the push, only 29% of the unstable materials are labeled as unstable, while the precision for the stable materials is 72%. Overall, the ML model performs worse on this test set than on a randomly selected test set from the original dataset. An obvious explanation is that the 137 materials were selected for (i) a low energy above the convex hull (Δ*H*_{hull} < 0.2 eV/atom) and (ii) dynamical instability. As seen from Fig. 8a, such materials are highly unusual and not well represented by the set of materials used to train the model.

In conclusion, we have performed a systematic study of structural instabilities in 2D materials. We have validated a simple protocol (here referred to as the CBP protocol) for identifying dynamical instabilities based on the frequency of phonons at the center and boundary of the BZ. The CBP protocol correctly classifies 2D materials as dynamically stable/unstable in 236 out of 250 cases^{25} and is ideally suited for high-throughput studies where the computational cost of evaluating the full phonon band structure becomes prohibitive.

For 137 dynamically unstable monolayers with low formation energies, we displaced the atoms along an unstable phonon mode and relaxed the structure in a 2 × 1 or 2 × 2 supercell. This resulted in 49 distorted, dynamically stable monolayers. The success rate of obtaining a dynamically stable structure from this protocol was found to be significantly higher for materials with only one unstable phonon mode as compared to cases with several modes. In the latter case, the displacement vector is not unique and different choices generally lead to different (dynamically unstable) structures. The 49 stable structures were fully characterized by an extensive computational property workflow and the results are available via the C2DB. The properties of the distorted structures can deviate significantly from those of the original high-symmetry structures, and we found only a weak, qualitative relation between the gain in total energy and the band gap opening upon distortion. Finally, we trained a machine learning classification model to predict the DS using a radially decomposed projected density of states (RAD-PDOS) representation as input and a gradient boosting decision tree ensemble method (XGBoost) as the learning algorithm. The model achieves an excellent ROC-AUC score of 0.90 and lends itself to the high-throughput assessment of DS.

## Methods

### Density functional theory calculations

All phonon calculations were performed using the asr.phonopy recipe of the Atomic Simulation Recipes (ASR)^{40}, which makes use of the Atomic Simulation Environment (ASE)^{41} and PHONOPY^{42}. The DFT calculations were performed with the GPAW^{43} code and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional^{44}. The BZ was sampled on a uniform *k*-point mesh of density of 6.0 Å^{2} and the plane wave cutoff was set to 800 eV. To evaluate the Hessian matrix, the small displacement method was used with a displacement size of 0.01 Å, and forces were converged to 10^{−4} eV Å^{−1}. The non-analytical force constants were not included because we found that they have no effect on the minimum eigenvalues of the Hessian. To benchmark the CBP protocol, we compare it to full phonon band structures. In these calculations, the size of the supercell is chosen such that the Hessian matrix includes interactions between pairs of atoms within a radius of at least 12 Å (this implies that the supercell must contain a sphere of radius 12 Å). The space groups of the materials before and after applying the protocol were determined with spglib^{45}, using a symmetry threshold of 0.1 Å, a value also used in a similar study^{26}.

### Machine learning method

Below we describe the machine learning algorithm that we have developed and assessed in an attempt to accelerate the prediction of dynamic instabilities. Our machine learning algorithm of choice is XGBoost^{46}, selected for its robustness and flexibility and because it is a simpler model than neural network methods. XGBoost is a regularized high-performance implementation of gradient tree boosting, which makes predictions based on an ensemble of gradient-boosted decision trees. The decision trees of the ensemble are grown sequentially while learning from the mistakes of the previous trees by minimizing the loss function through gradient descent. This loss function is regularized to reduce the complexity of the individual decision trees, which reduces the risk of overfitting. In contrast, in the widely used decision tree ensemble model Random Forest, the decision trees are grown independently and without any regularization.

The dataset used is a subset of C2DB and consists of 3212 materials (1536 stable and 1676 unstable materials), which does not include the 137 distorted materials identified in the first part of the paper (as these will be used as a particularly challenging test case for the model performance). As input for the model we use the radially projected density of states (RAD-PDOS) material fingerprints^{47}. The RAD-PDOS starts from the wave functions projected onto the atomic orbitals (*ν*) of all the atoms (*a*) of the crystal, \({\rho }_{nk}^{a\nu }=| \langle {\psi }_{nk}| a\nu \rangle {| }^{2}\). For each state, these projections are then combined into a radially distributed orbital pair correlation function,

Finally, the radial functions are distributed on an energy grid,

where *G*(*x*; *δ*) is a Gaussian of width *δ* centered at *x* = 0. For the materials in the dataset, the *s*, *p*, and *d* orbitals lead to six unique components of the RAD-PDOS fingerprint (\(\nu {\nu }^{{\prime} }=\{ss,sp,sd,pp,pd,dd\}\)). The fingerprint involves some hyperparameters for which we use the values \({E}_{\min }=-10\,{{{\rm{eV}}}},{E}_{\max }=10\,{{{\rm{eV}}}},{N}_{E}=25,{\delta }_{E}=0.3\,{{{\rm{eV}}}},{\alpha }_{E}=0.2{{{{\rm{eV}}}}}^{-1},{R}_{\min }=0,{R}_{\max }=5\,\mathring{\rm A} ,{N}_{R}=20,{\delta }_{R}=0.25\,\mathring{\rm A} ,{\alpha }_{R}=0.33{\mathring{\rm A} }^{-1}\).
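
The grid hyperparameters listed above can be made concrete as follows. This is a sketch under assumptions: the grids are taken to be uniform and to include both end points, the Gaussian is taken to be normalized with *δ* as its standard deviation, and the total feature count assumes that the fingerprint is simply flattened over the orbital-pair, radial, and energy axes. All names in the block are hypothetical.

```python
import math
from itertools import combinations_with_replacement

# Hyperparameter values quoted in the text.
E_MIN, E_MAX, N_E, DELTA_E = -10.0, 10.0, 25, 0.3   # eV
R_MIN, R_MAX, N_R, DELTA_R = 0.0, 5.0, 20, 0.25     # Å

def uniform_grid(lo, hi, n):
    """n equally spaced points from lo to hi, end points included (assumption)."""
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]

def gaussian(x, delta):
    """Normalized Gaussian centered at x = 0 with standard deviation delta."""
    return math.exp(-x**2 / (2 * delta**2)) / (delta * math.sqrt(2 * math.pi))

# The six unique orbital pairs formed from s, p and d.
components = ["".join(pair) for pair in combinations_with_replacement("spd", 2)]
energy_grid = uniform_grid(E_MIN, E_MAX, N_E)
radial_grid = uniform_grid(R_MIN, R_MAX, N_R)

# Size of the flattened fingerprint under the assumptions above.
n_features = len(components) * len(radial_grid) * len(energy_grid)
```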

In addition to the RAD-PDOS fingerprint, we consider a low-dimensional fingerprint consisting of five features, namely the PBE electronic band gap (\({\varepsilon }_{{{{\rm{gap}}}}}^{{{{\rm{PBE}}}}}\)), the crystal formation energy (Δ*H*), the density of states at the Fermi level (DOS@*E*_{F}), the energy above the convex hull (Δ*H*_{hull}), and the total energy per atom in the unit cell. The low-dimensional fingerprint is used to train a “baseline” ML model that we use to benchmark the performance of the ML model based on the more involved RAD-PDOS fingerprint. Common to all the features considered is that they are obtained from a single DFT calculation and thus are much faster to compute than the phonon frequencies. The gradient boosting model introduces several hyperparameters such as the depth of the trees, the learning rate, the minimum loss gain required to perform a split, and the minimum weights in the tree leaves. These parameters are optimized using Bayesian optimization, where a Gaussian process is fitted to the mean test ROC-AUC of a 10-fold cross-validation. The hyperparameters used here are max depth = 8, learning rate = 0.06, min split loss = 0, and min weights = 0. Like a logistic regression model, the XGBoost classification model outputs a number between 0 and 1, which is interpreted as a probability. In our case, 0 (1) refers to a dynamically stable (unstable) material.
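
For readers who wish to set up a comparable model, the hyperparameters quoted above could be collected in a parameter dictionary such as the one below. The mapping of "min weights" onto XGBoost's `min_child_weight` parameter and the choice of the `binary:logistic` objective are assumptions on our part; the helper `to_probability` is hypothetical and only illustrates how a raw score is mapped onto the interval between 0 and 1.

```python
import math

# Hyperparameter values quoted in the text, expressed with XGBoost parameter names.
xgb_params = {
    "objective": "binary:logistic",  # assumption: logistic loss for two classes
    "max_depth": 8,
    "learning_rate": 0.06,
    "min_split_loss": 0,             # also known as gamma in XGBoost
    "min_child_weight": 0,           # assumption: corresponds to "min weights"
}

def to_probability(margin):
    """Logistic function mapping a raw score to a probability of instability."""
    return 1.0 / (1.0 + math.exp(-margin))

def predicted_label(probability, threshold=0.5):
    """1 (unstable) if the probability exceeds the threshold, otherwise 0 (stable)."""
    return int(probability > threshold)
```

The threshold of 0.5 is only a default; lowering or raising it moves the classifier along the ROC curve.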

## Data availability

All the crystal structures and properties will be available in the C2DB (https://doi.org/10.11583/DTU.14616660.v1).

## References

1. Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. *Phys. Rev.* **140**, A1133–A1138 (1965).
2. Grüner, G. The dynamics of charge-density waves. *Rev. Mod. Phys.* **60**, 1129–1181 (1988).
3. Peierls, R. Quantum Theory of Solids. International Series of Monographs on Physics. (Clarendon Press, 1996).
4. Ono, Y. & Hamano, T. Peierls distortion in two-dimensional tight-binding model. *J. Phys. Soc. Jpn.* **69**, 1769–1776 (2000).
5. Weber, F. et al. Electron-phonon coupling and the soft phonon mode in TiSe_{2}. *Phys. Rev. Lett.* **107**, 266401 (2011).
6. Bianco, R., Errea, I., Monacelli, L., Calandra, M. & Mauri, F. Quantum enhancement of charge density wave in NbS_{2} in the two-dimensional limit. *Nano Lett.* **19**, 3098–3103 (2019).
7. Zhou, J. S. et al. Anharmonicity and doping melt the charge density wave in single-layer TiSe_{2}. *Nano Lett.* **20**, 4809–4815 (2020).
8. Leroux, M. et al. Anharmonic suppression of charge density waves in 2*H*-NbS_{2}. *Phys. Rev. B* **86**, 155125 (2012).
9. Bianco, R., Errea, I., Monacelli, L., Calandra, M. & Mauri, F. Quantum enhancement of charge density wave in NbS_{2} in the two-dimensional limit. *Nano Lett.* **19**, 3098–3103 (2019).
10. Calandra, M., Mazin, I. I. & Mauri, F. Effect of dimensionality on the charge-density wave in few-layer 2*H*-NbSe_{2}. *Phys. Rev. B* **80**, 241108 (2009).
11. Zhu, X., Cao, Y., Zhang, J., Plummer, E. W. & Guo, J. Classification of charge density waves based on their nature. *Proc. Natl. Acad. Sci.* **112**, 2367–2371 (2015).
12. Johannes, M. & Mazin, I. Fermi surface nesting and the origin of charge density waves in metals. *Phys. Rev. B* **77**, 165135 (2008).
13. Lin, Y.-C., Dumcenco, D. O., Huang, Y.-S. & Suenaga, K. Atomic mechanism of the semiconducting-to-metallic phase transition in single-layered MoS_{2}. *Nat. Nanotechnol.* **9**, 391–396 (2014).
14. Xi, X. et al. Strongly enhanced charge-density-wave order in monolayer NbSe_{2}. *Nat. Nanotechnol.* **10**, 765–769 (2015).
15. Ugeda, M. et al. Characterization of collective ground states in single-layer NbSe_{2}. *Nat. Phys.* **12**, 92–97 (2016).
16. Yang, Y. et al. Enhanced superconductivity upon weakening of charge density wave transport in 2*H*-TaS_{2} in the two-dimensional limit. *Phys. Rev. B* **98**, 035203 (2018).
17. Yu, Y. et al. Gate-tunable phase transitions in thin flakes of 1*T*-TaS_{2}. *Nat. Nanotechnol.* **10**, 270–276 (2015).
18. Ryu, H. et al. Persistent charge-density-wave order in single-layer TaSe_{2}. *Nano Lett.* **18**, 689–694 (2018).
19. Ge, Y. & Liu, A. Y. Effect of dimensionality and spin-orbit coupling on charge-density-wave transition in 2*H*-TaSe_{2}. *Phys. Rev. B* **86**, 104101 (2012).
20. Sugawara, K. et al. Unconventional charge-density-wave transition in monolayer 1*T*-TiSe_{2}. *ACS Nano* **10**, 1341–1345 (2016).
21. Wang, H. et al. Large-area atomic layers of the charge-density-wave conductor TiSe_{2}. *Adv. Mater.* **30**, 1704382 (2018).
22. Zhu, J. et al. Argon plasma induced phase transition in monolayer MoS_{2}. *J. Am. Chem. Soc.* **139**, 10216–10219 (2017).
23. Wang, L., Xu, Z., Wang, W. & Bai, X. Atomic mechanism of dynamic electrochemical lithiation processes of MoS_{2} nanosheets. *J. Am. Chem. Soc.* **136**, 6693–6697 (2014).
24. Krishnamoorthy, A. et al. Semiconductor-metal structural phase transformation in MoTe_{2} monolayers by electronic excitation. *Nanoscale* **10**, 2742–2747 (2018).
25. Mounet, N. et al. Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds. *Nat. Nanotechnol.* **13**, 246–252 (2018).
26. Togo, A. & Tanaka, I. Evolution of crystal structures in metallic elements. *Phys. Rev. B* **87**, 184104 (2013).
27. Patrick, C. E., Jacobsen, K. W. & Thygesen, K. S. Anharmonic stabilization and band gap renormalization in the perovskite CsSnI_{3}. *Phys. Rev. B* **92**, 201205 (2015).
28. Yang, R. X., Skelton, J. M., da Silva, E. L., Frost, J. M. & Walsh, A. Assessment of dynamic structural instabilities across 24 cubic inorganic halide perovskites. *J. Chem. Phys.* **152**, 024703 (2020).
29. Kayastha, P. & Ramakrishnan, R. High-throughput design of Peierls and charge density wave phases in Q1D organometallic materials. *J. Chem. Phys.* **154**, 061102 (2021).
30. Haastrup, S. et al. The Computational 2D Materials Database: high-throughput modeling and discovery of atomically thin crystals. *2D Mater.* **5**, 042002 (2018).
31. Gjerding, M. et al. Recent progress of the Computational 2D Materials Database (C2DB). *2D Mater.* **8**, 044002 (2021).
32. Eriksson, F., Fransson, E. & Erhart, P. The hiphive package for the extraction of high-order force constants by machine learning. *Adv. Theory Simul.* **2**, 1800184 (2019).
33. Carrete, J. et al. Physically founded phonon dispersions of few-layer materials and the case of borophene. *Mater. Res. Lett.* **4**, 204–211 (2016).
34. Royo, M., Hahn, K. R. & Stengel, M. Using high multipolar orders to reconstruct the sound velocity in piezoelectrics from lattice dynamics. *Phys. Rev. Lett.* **125**, 217602 (2020).
35. Rossnagel, K. On the origin of charge-density waves in select layered transition-metal dichalcogenides. *J. Phys. Condens. Matter* **23**, 213001 (2011).
36. Mak, K. F., Lee, C., Hone, J., Shan, J. & Heinz, T. F. Atomically thin MoS_{2}: a new direct-gap semiconductor. *Phys. Rev. Lett.* **105**, 136805 (2010).
37. Manzeli, S., Ovchinnikov, D., Pasquier, D., Yazyev, O. V. & Kis, A. 2D transition metal dichalcogenides. *Nat. Rev. Mater.* **2**, 1–15 (2017).
38. Liu, H. et al. Phosphorene: an unexplored 2D semiconductor with a high hole mobility. *ACS Nano* **8**, 4033–4041 (2014).
39. Gjerding, M. et al. Atomic simulation recipes – a Python framework and library for automated workflows. *Comput. Mater. Sci.* **199**, 110731 (2021).
40. Larsen, A. H. et al. The atomic simulation environment – a Python library for working with atoms. *J. Phys. Condens. Matter* **29**, 273002 (2017).
41. Togo, A. & Tanaka, I. First principles phonon calculations in materials science. *Scr. Mater.* **108**, 1–5 (2015).
42. Enkovaara, J. et al. Electronic structure calculations with GPAW: a real-space implementation of the projector augmented-wave method. *J. Phys. Condens. Matter* **22**, 253202 (2010).
43. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. *Phys. Rev. Lett.* **77**, 3865–3868 (1996).
44. Togo, A. & Tanaka, I. Spglib: a software library for crystal symmetry search. Preprint at https://arxiv.org/abs/1808.01590 (2018).
45. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, KDD ’16, 785–794 (Association for Computing Machinery, New York, NY, USA, 2016).
46. Knøsgaard, N. R. & Thygesen, K. S. Representing individual electronic states for machine learning GW band structures of 2D materials. *Nat. Commun.* **13**, 468 (2022).

## Acknowledgements

The Center for Nanostructured Graphene (CNG) is sponsored by the Danish National Research Foundation, Project DNRF103. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program grant agreement no. 773122 (LIMA). K.S.T. is a Villum Investigator supported by VILLUM FONDEN (grant no. 37789).

## Author information

### Contributions

S.M. and K.S.T. developed the initial concept. S.M. performed the benchmark of the CBP protocol and developed the workflow and the analysis for the pushed materials. M.K.S., N.R.K., and P.M.L. conducted the machine learning analysis. K.S.T. supervised the work and helped in the interpretation of the results. All authors modified and discussed the paper together.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Manti, S., Svendsen, M.K., Knøsgaard, N.R. *et al.* Exploring and machine learning structural instabilities in 2D materials.
*npj Comput Mater* **9**, 33 (2023). https://doi.org/10.1038/s41524-023-00977-x
