Data-driven studies of magnetic two-dimensional materials

We use a data-driven approach to study the magnetic and thermodynamic properties of van der Waals (vdW) layered materials. We investigate monolayers of the form \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {A}_2\hbox {B}_2\hbox {X}_6$$\end{document}A2B2X6, based on the known material \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {Cr}_2\hbox {Ge}_2\hbox {Te}_6$$\end{document}Cr2Ge2Te6, using density functional theory (DFT) calculations and machine learning methods to determine their magnetic properties, such as magnetic order and magnetic moment. We also examine formation energies and use them as a proxy for chemical stability. We show that machine learning tools, combined with DFT calculations, can provide a computationally efficient means to predict properties of such two-dimensional (2D) magnetic materials. Our data analytics approach provides insights into the microscopic origins of magnetic ordering in these systems. For instance, we find that the X site strongly affects the magnetic coupling between neighboring A sites, which drives the magnetic ordering. Our approach opens new ways for rapid discovery of chemically stable vdW materials that exhibit magnetic behavior.


Data visualization of A B X 6 structures
The data generated from DFT calculations can be visualized in order to learn how the magnetic moment and formation energy vary across the set of A 2 B 2 X 6 structures. For instance, we can examine how substitutions at one of the A sites with a given transition metal (TM) impacts the formation energy of a set of structures. Fig. S1(a) shows a histogram of formation energies for various TM substitutions. The histograms indicate that TM substitution can substantially affect the formation energy. For example, structures containing Cu tend to have a formation energy close to zero (i.e. they are chemically unstable). Fig. S1(b) shows the distribution of magnetic moments for A site substitutions with TM atoms (i.e. one of the two A sites is substituted with a TM atom). Different TM atoms have varying impact on the total magnetic moment per unit cell. We argue in the main text that the magnetic ordering of the structures is affected by the distance between adjacent X sites and from an X site to an A site. Fig. S2(b) and (c) shows how the distance between an A and X site (e.g. a 1 x 1 , a 1 x 2 ) and between adjacent X sites (e.g. x 1 x 2 ) is linked to the magnetic moment. We performed principal component analysis (PCA) [4] on a set of features comprising inter-site distances. These include the distance between the following pairs of sites: a 1 a 2 , a 1 b 1 , a 1 b 2 , a 1 x 1 , a 1 x 2 , a 2 b 1 , a 2 b 2 , a 2 x 1 , a 2 x 2 , b 1 x 1 , b 1 x 2 and x 1 x 2 . Fig. S2(d) shows that the magnetic moment per unit cell varies with the first principal component. Fig. S2(e) shows how the magnetic moment (see color scale) varies with the first and third principal components -clustering of regions of large µ is clearly shown. This demonstrates a relationship between the magnetic moment and the relative positions of the sites -in particular, the X and A sites.

Principal component analysis
Principal component analysis (PCA) is a data analysis tool, often used for dimensionality reduction [4] . It can be exploited for data visualization, where a high-dimensional space of descriptors is reduced to two-dimensions which can then be plotted (see Fig. S2). In addition, PCA is used alongside regression to create more robust models which avoid overfitting. PCA works by defining an orthogonal set of basis vectors which are ranked in terms of the variance in the data. That is, the first principal component represents a linear combination of the original set of descriptors that captures the most variance in the data. The coefficients of the vector represent the contribution from the original set of descriptors. Since the target property is not used to determine the principal components, there will not necessarily be a correlation between the target property and the principal components. However, it can be useful to determine which descriptors are important for predicting a target property. This is discussed in the next section.

Materials descriptors
The choice of descriptor plays a crucial role in the success of the machine learning prediction, and the interpretability of the descriptors allows new insights. Descriptor D i for the i th structure was constructed from atomic properties as follows: where D i is a function of atomic properties p of elements A, A , B, B and X of the i th structure. The function f (p A , p A , p B , p B , p X ) describes the statistics of a set of atomic properties. This can be the arithmetic mean of all the atomic properties (e.g. mean(p A ,  p A , p B , p B , p X )), the mean of the differences of adjacent pairs, the sum, the sum of the differences of adjacent pairs, the variance, the variance of the differences of adjacent pairs (i.e. var( This approach for constructing descriptors is similar to that used by Ghiringhelli et al. [5] . The atomic properties, p considered were the following: dipole polarizability, ionization energy, atomic radius, number of valence electrons, number of electrons, electronegativity, atomic volume, van der Waals radius, covalent radius, number of spin up electrons and chemical hardness [6;7] as defined by the python package mendeleev (Version 0.4.1).
We attempted to improve the molecular representation of the ABX 3 structures by designing a descriptor to contain information on the direct exchange interaction. We modified the Bag of Bonds (BoB) descriptor [8] by incorporating the relationship between the exchange coupling and the interatomic distance [9] . That is, we replace the Coulomb Kernel [10] in the BoB with an estimate of the term parameterizing the Bethe-Slater curve, the interatomic distance normalized by the radius of the incompletely filled shell: (J i + J j )(N up i +N up j ) for pairs of i and j atoms in a structure. J is the product of the atomic dipole polarizability and the covalent radius, N up is the number of unpaired electrons.

Descriptor importance
The extra trees regression model can rank the descriptors used in a successful model in order of importance [4] . The top twelve descriptors (in decreasing order of importance out of a total of 50) for the magnetic moment prediction are shown in Table 1   where 'var.' is the variance and 'max. difference' is the maximum in the set of differences of adjacent pairs. The top twelve descriptors for the formation energy prediction are shown in Table 2. Notice that the descriptor importances for the magnetic moment prediction and the formation energy prediction are different. This is reasonable since the ML models are optimized for a specific target property. The top descriptors for the magnetic moment prediction involve the number of valence electrons and the number of spin up electrons. The importance of these descriptors agrees with physical intuition for the magnetic moment. For instance, we expect that the maximum value for the magnetic moment per unit cell to be the sum of local magnetic moments (i.e. number of spin up electrons) at each site. In the case of the formation energy, the ionization energy and the dipole polarizability are among the top descriptors. This agrees with physical intuition for the formation energy (or chemical stability). For example, those pairs of atoms that have large differences in the ionization energy should easily form strong ionic bonds and thus would lead to a structure of lower (i.e. more negative) formation energy.

Magnetic moment predictions for X = Te, Se, S
Magnetic moment predictions for X = Te, Se and S are shown in Fig. S3. ML prediction accuracy decreases as X moves up the group VI column of the Periodic Table from Te to S. The descriptors used in our model do not generalize well across X sites. This may be due to the limited size of our data set or the larger number of degenerate parallel and antiparallel spin configurations for X=Se and S (see Fig. 2(a) in the main text). This may also be due to the larger variation in the magnetic moment per unit cell across B sites for X=Se and S when compared to X=Te structures. In addition, when all composites for X=Te, Se and S are combined, despite the larger training set size, the prediction accuracy is poor. The role of the X site on the magnetic properties is highlighted in Fig. S2. We also find that a dimensionality reduction method similar to PCA, t-stochastic nearest neighbor embedding (t-SNE) [11] demonstrates the importance of the X-sites. t-SNE is used to plot high dimensional (53-dimensional) data describing distances between adjacent A, B and X sites on a two-dimensional manifold (lattice parameters are also included alongside statistics such as the average or variance of the AB, AX and BX distances). The dimensionality reduction results in clustering of points, where three distinct clusters form for X=Te, Se and S. This underscores the impact of the X site and shows the importance of subgroups -using machine learning models to interpolate between 4/11  structures within similar regions of chemical space in order to improve prediction accuracy [12] . More sophisticated descriptors will be used in future studies to create more robust machine learning models [13;14;15;16] and improve prediction accuracy for both X=Se and S structures.

Prediction accuracy and data set size
In the main text we showed that increasing the size of the training set can increase the prediction accuracy of the formation energy. We extended the investigation of the training set size and prediction accuracy to include the magnetic moment. Initial models were built with data set size, N = 66. We increased the data set size from 66 to 262, fixing X = Te and allowing additional transition metal substitutions at A sites and atomic substitutions at B sites. The prediction accuracy of the test set was calculated for variations in the data set size. The training/test data split chosen here was 80/20. Fig. S4 shows how the test set prediction accuracy varied with the size of the data set.

Classification of magnetic order
We trained a support vector classification model [4] to predict whether an ABX 3 structure was ferromagnetic or antiferromagnetic. The model training and testing was done on 378 structures using a training/test split of 80 to 20 (180 structures were added to

5/11
the original data set of 198 structures). We found that the prediction performance improved with increasing data set size. The confusion matrix displayed in Fig. S5 shows the results of the ML classification performed on the test set data. The model predicted 51 structures had FM order, only 42 of which were FM, yielding an accuracy of 82%. The model also predicted 25 structures had AFM order, 20 of which were correct, giving a prediction accuracy of 80%.

Estimating the Curie temperature
The methods for estimating the Curie temperature T c for three-dimensional crystal structures is outlined in Ref. 17. The analytical expression for the Curie temperature of three-dimensional crystals is shown below.
Although this approach has also been used for two-dimensional magnetic materials in the past [18] , it does not include the magnetic anisotropy which is a key component of magnetic ordering in two-dimensions owing to the Mermin-Wagner theorem [19] . Ref. 20 describes the importance of the magnetocrystalline anisotropy (MCA) and derives an analytical expression for the Curie temperature, T c of magnetic two-dimensional materials that incorporates the MCA (see Eqn. 3,4,5).
where T Ising

Cleavage energy
It is important to determine whether the structures predicted from our analysis are layered materials which are cleavable. In order to estimate this, we determine the cleavage energy of a random sample of ABX 3 structures and compare these values to that of CrGeTe 3 , a known van der Waals material that has been exfoliated. We find that all the structures we analyzed had cleavage energy comparable to that of CrGeTe 3 . Consequently, we anticipate that all the ABX 3 structures considered in this work are cleavable layered materials. The results are shown in Table 3. We used the van der Waals DFT-D3 method with spin-orbit coupling included.

Chemical stability
A low formation energy is a necessary but not sufficient condition for chemical stability. Our high-throughput DFT approach uses formation energy as a computationally cheap initial indicator of chemical stability. Further analysis on stability is performed for a subset of promising candidates. This includes determining dynamic stability as well as exploring competing phases.

Dynamic stability
The formation energy only probes the thermodynamic stability of a pristine structure at absolute zero in the absence of phonons. Computation of the phonon spectrum and looking for positive frequencies evidences that the phases in question would be dynamically stable at low temperatures, whereas negative frequencies in the Brillouin zone indicate dynamic instability, or in some cases, the possibility of charge-density wave formation. We checked the dynamic stability for a subset of the promising candidates detailed in Supplementary Table 4 using the phonopy Python package [22] . We showed that our baseline compound Cr 2 Ge 2 Te 6 was dynamically stable and present the phonon spectra below. After examining the list of select candidates, we found that Cr 2 Ge 2 Se 6 , CrMoSi 2 Te 6 , CrSi 2 WTe 6 , and CrMnSi 2 Te 6 were dynamically stable. We also tested the candidate CrFeSiPS 6 and and found evidence of dynamical instability due to large negative phonons spanning the Brillouin zone.
Some two-dimensional materials exhibit a z-direction acoustic phonon mode which is quadratic in character near the Γ point, which can be notoriously difficult to converge [23;24] ; This is because fitting the dynamical matrix to small perturbations of the structure cell requires high precision, as the response due to small perturbations is vanishingly small and numerical noise can result in spurious negative values that do not reflect true dynamical instability.
Consequently, some structures featured an extremely small negative frequency near the Γ point of the Brillouin zone, which we believe to be a numerical artifact as documented in the SI of Ref. 23. Cr 2 Ge 2 Te 6 exhibited a similar minute instability. For that compound, we perturbed the supercell along the direction of the negative frequency and found that the perturbed structure had a higher, not lower, energy and relaxed back to its initial configuration, providing evidence this was a numerical artifact.
Our frozen phonon calculations used displacements of 0.05 Angstrom in radius. We found that in most cases 2x2x1 supercells were sufficient. The k-point grid size used for each calculation is noted in Supplementary Table 4 and were all centered at the Γ point. The path through the Brillouin zone used for all figures corresponds to the path from (0, 0, 0) → [25] . It is possible that going to higher supercell sizes or increasing precision (i.e., via increased k-point sampling, tighter electronic convergence criteria or increased cutoff energy) could result in increased stability, possibly by identifying chargedensity wave phases, and/or the elimination of spurious negative modes. A more rigorous test of dynamic stability in vacuum could involve an ab initio molecular dynamics simulation at finite temperature, but is beyond the scope of this work. We also note that under experimental conditions, the presence of a stabilizing substrate may support dynamical stability in a given structure [26] .
We present the phonon spectra results from the dynamic stability calculations for the subset of candidate structures in Supplementary Fig. S6

Competing phases
The goal of materials prediction is to identify materials with desirable properties that are likely to be synthesizable. This includes metastable phases [27] , such as the 1T and similar distorted phases of MoS 2 , which are higher in energy by 180 to 280 meV/atom compared to the ground state 2H-MoS 2 [28;29;30] . As a test case, we considered a known competing phase of the crystal structure based on Cr 2 Ge 2 Te 6 where the A site substitutions form alternating 'chains' [31] . We performed calculations for ten different example materials (CrAgP 2 S 6 , CrAgP 2 Se 6 , CrCuP 2 S 6 , CrCuP 2 Se 6 , CrMoSi 2 Te 6 , CrWSi 2 Te 6 , CrMoSiPTe 6 , CrMnSi 2 Te 6 , CrMnGe 2 Se 6 , CrFeGePSe 6 ) assuming they took on the 'chain' structure. For 7 of the materials the energy difference between the structures was less than 15 meV/atom. The largest energy difference found was 68 meV/atom, still significantly lower than the energy difference between 2H-MoS 2 and its related 1T structures. This demonstrates that the materials identified as interesting in this study are likely to be at least metastable and thus worth further detailed theoretical and experimental investigation. Interestingly, the lower energy phase of A 2 B 2 X 6 might also be dependent on the magnetic configuration. This suggests the possibility of an interplay between crystalline phase and magnetic order. Nevertheless, further exploration of all possible competing phases for the entire database of composite structures, including kinetic barriers and synthesis conditions, is beyond the scope of this manuscript.