Data-driven discovery of 2D materials by deep generative models

Lyngby, Peder; Thygesen, Kristian Sommer

doi:10.1038/s41524-022-00923-3

Article
Open access
Published: 11 November 2022

npj Computational Materials volume 8, Article number: 232 (2022)

Abstract

Efficient algorithms to generate candidate crystal structures with good stability properties can play a key role in data-driven materials discovery. Here, we show that a crystal diffusion variational autoencoder (CDVAE) is capable of generating two-dimensional (2D) materials of high chemical and structural diversity and formation energies mirroring the training structures. Specifically, we train the CDVAE on 2615 2D materials with energy above the convex hull ΔH_hull < 0.3 eV/atom, and generate 5003 materials that we relax using density functional theory (DFT). We also generate 14192 new crystals by systematic element substitution of the training structures. We find that the generative model and lattice decoration approach are complementary and yield materials with similar stability properties but very different crystal structures and chemical compositions. In total we find 11630 predicted new 2D materials, where 8599 of these have ΔH_hull < 0.3 eV/atom as the seed structures, while 2004 are within 50 meV of the convex hull and could potentially be synthesised. The relaxed atomic structures of all the materials are available in the open Computational 2D Materials Database (C2DB). Our work establishes the CDVAE as an efficient and reliable crystal generation machine, and significantly expands the space of 2D materials.

Introduction

The discovery of new materials that meet specific requirements, e.g., in terms of stability, compatibility, or physical properties, is an exciting scientific challenge of great relevance for our society. First-principles quantum mechanical calculations, e.g., based on density functional theory (DFT)¹, can predict the structure and properties of materials with high accuracy even before they are made in the lab. However, a DFT code by itself is insufficient for realising the paradigm of inverse materials design, where instead of mapping from structure to property using ab initio methods, the goal is to perform the inverse map: from target property to atomic structure.

Considering the vast number of possible materials and the complexity of general structure-property relations, it becomes clear that successful inverse design relies on the following critical components: (i) automated execution and management of large numbers of atomistic calculations, (ii) access to large amounts of relevant high quality materials data, and (iii) efficient algorithms that can propose new candidate materials from data. In addition, synthesis and characterisation experiments must be included in the loop as well, but this aspect will not be considered here.

Components (i) and (ii) are largely in place. Indeed, the advent of workflow management engines for computational materials science^{2,3,4,5,6} has made it possible to perform high-throughput (HT) computations for thousands of materials with minimal human intervention^{7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22}. Atomic structures and basic materials properties from such HT studies have been stored in computational databases^{2,23,24,25,26,27,28,29,30,31}, which together contain the results of millions of DFT calculations. Complemented by experimental crystal structure databases, this makes for a rich and rapidly growing data source for materials science.

The main challenge concerns component (iii). In previous HT studies, the candidate materials to be explored were mostly produced by lattice decoration of known reference materials. An obvious limitation of this approach is that the resulting materials by construction will be similar to the reference materials. In particular, the 3-tuple: (space group, occupied Wyckoff positions, stoichiometry) is invariant under element substitution.

Generative machine learning algorithms could potentially broaden the diversity of candidate materials beyond the lattice decoration paradigm. However, designing a successful generative model for periodic materials has proved challenging due to the problem of creating representations of the lattice, atomic coordinates and elemental composition that are both invariant to translations and rotations and invertible³². The vast chemical space of elements that can be present in inorganic crystals further complicates the design of representations. Therefore, previous implementations of generative models for periodic materials have been limited to a fixed subset of chemical elements^{33,34,35} and/or a subset of possible crystal structures^{36,37}. Recently, a general invertible representation has been proposed³⁸, which encodes the material as a matrix of both real and reciprocal space features, but is not invariant under translations and rotations. Xie et al. developed a crystal diffusion variational autoencoder (CDVAE) model³⁹, which uses a generative diffusion model to circumvent the need for an invertible representation by working directly on the atomic coordinates of the structures and employs an equivariant graph neural network to ensure invariance (in fact, equivariance).

In this work, we train a CDVAE³⁹ on 2615 2D materials with an energy above the convex hull of up to 0.3 eV/atom, and generate 10000 two-dimensional (2D) crystals. We compare these structures to a set of 14192 2D crystals obtained by systematic lattice decoration of the training structures. While ref. ³⁹ assessed the validity and diversity of the generated crystals by means of qualitative measures, such as charge neutrality and minimum bond distance, we here conduct a systematic, unbiased quantitative analysis by performing full DFT-based relaxations and stability analysis of the generated structures. Compared to the crystals in the training set, the structures generated by the CDVAE (after DFT relaxation) show similar formation energies but significant differences in both composition and crystal structure. In general, CDVAE seems able to produce more complex materials without compromising stability.

As a direct test of the CDVAE model’s capacity to learn the stability properties of the training structures, we also train a CDVAE on materials lying at least 0.4 eV/atom above the hull. We find that the structures generated by this model have significantly higher formation energies than those produced by the CDVAE trained on the more stable materials.

In addition to providing a quantitative assessment of the CDVAE, our work identifies no fewer than 8599 new unique 2D materials with an energy above the convex hull below 0.3 eV/atom, many of which could potentially be synthesised. The generated crystal structures are freely available as part of the C2DB²⁹.

Results

Crystal diffusion variational autoencoder

The CDVAE combines a variational autoencoder⁴⁰ and a diffusion model to generate new periodic materials. The crystal is represented by a tuple consisting of the atomic numbers of the N atoms, their respective coordinates, and the unit cell basis vectors. CDVAE consists of three networks: the encoder, a property predictor, and the decoder, all of which are trained concurrently. The encoder is an SE(3) equivariant periodic graph neural network (PGNN), which encodes the material onto a lower dimensional latent space from which the property predictor predicts the number of atoms N, the lattice vectors, and the composition, i.e. the fraction of each element present. The decoder is a noise conditional score network diffusion model⁴¹ that takes a structure with noise added to the atom types and coordinates and learns to denoise it into the original stable structure. Noise added to the atom types changes the element of each atom into another element within the predicted composition with a probability given by the noise level. Coordinate noise, on the other hand, is simply Gaussian noise added to the coordinates of each atom of the structure. The score of the conditional score network diffusion model is an estimate of the gradient of the underlying probability distribution of the materials and is predicted by another SE(3) equivariant PGNN. The use of an equivariant diffusion model as the decoder makes it possible to work directly with the atomic positions without the need for any intermediate representations like descriptors or graphs. This in turn makes the CDVAE framework quite general and agnostic to the kinds of chemical elements and structures it is applied to, which allows CDVAE to generate 2D materials even though it was designed for 3D bulk materials.

New materials can be generated after training by using the property predictor to sample the latent space. A unit cell with the predicted basis vectors is then initialised with the predicted atoms placed at random positions. Using the decoder, the atom types and coordinates of the initially randomly placed atoms are then gradually denoised into a material that is similar to the data distribution of the training data. CDVAE exploits the fact that adding noise to a stable material will likely decrease its stability and, thus, by learning to denoise the noisy stable structure, the decoder learns to increase the stability of the structure. Therefore, CDVAE should be trained only on stable materials. An in-depth description of CDVAE can be found in Xie et al.³⁹.
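The gradual denoising of randomly placed atoms can be illustrated with a toy version of annealed Langevin dynamics, the sampling scheme commonly paired with noise conditional score networks. This is only a sketch under strong assumptions: the score is the analytic score of a Gaussian centred on a known target position rather than the output of a trained PGNN, a single fractional coordinate per atom is used, and the names `annealed_langevin`, `wrapped_displacement`, `rate`, and the target values are hypothetical.

```python
import math
import random


def wrapped_displacement(x, target):
    """Shortest displacement from target to x on a unit-periodic axis."""
    return (x - target + 0.5) % 1.0 - 0.5


def annealed_langevin(x, target, sigmas, steps_per_level=100, rate=0.1, rng=random):
    """Denoise one fractional coordinate with annealed Langevin dynamics.

    The score of a Gaussian of width sigma centred on `target` is
    -(x - target) / sigma**2; a trained network would predict it instead.
    """
    for sigma in sigmas:                      # coarse-to-fine noise levels
        step = rate * sigma ** 2              # step size shrinks with the noise level
        for _ in range(steps_per_level):
            score = -wrapped_displacement(x, target) / sigma ** 2
            noise = math.sqrt(step) * rng.gauss(0.0, 1.0)
            x = (x + 0.5 * step * score + noise) % 1.0
    return x


rng = random.Random(0)
targets = [0.10, 0.45, 0.80]                                  # hypothetical stable positions
sigmas = [0.5 * (0.01 / 0.5) ** (i / 9) for i in range(10)]   # from 0.5 down to 0.01
start = [rng.random() for _ in targets]                       # random initial placement
final = [annealed_langevin(x, t, sigmas, rng=rng) for x, t in zip(start, targets)]
errors = [abs(wrapped_displacement(x, t)) for x, t in zip(final, targets)]
```

At each noise level the coordinate fluctuates around the target with a spread comparable to that level, so the remaining error after the last level is of the order of the smallest noise scale.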

The set of materials used as training data for the CDVAE and as seed structures for the lattice decoration protocol (LDP) consists of 2615 unique 2D materials from the C2DB^{29,31}. As our aim is to discover new stable materials, we limited the initial set of materials to the subset of C2DB with energy above the convex hull ΔH_hull < 0.3 eV/atom. This was done because the CDVAE (LDP) is more likely to generate stable materials when trained on (seeded by) stable materials. We did not exclude dynamically unstable materials.

After training the CDVAE model, 10,000 structures were generated, of which 1106 failed CDVAE’s basic validity check (charge neutrality and bond lengths above 0.5 Å). Of the remaining 8894 structures, 3891 are duplicates, which are removed (see “Method” for more details), and the rest are relaxed using DFT.
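As a quick consistency check, the filtering steps described above can be tallied explicitly. The numbers are those quoted in the text; the variable names are only illustrative.

```python
generated = 10_000          # structures sampled from the trained CDVAE
failed_validity = 1_106     # failed the charge-neutrality / bond-length check
duplicates = 3_891          # removed as duplicates before relaxation

valid = generated - failed_validity
to_relax = valid - duplicates
print(valid, to_relax)      # 8894 5003
```

The final count matches the 5003 CDVAE-generated materials that the abstract reports as relaxed with DFT.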

Lattice decoration protocol

The lattice decoration protocol (LDP) substitutes the atoms in the seed structures by atoms of similar chemical nature. As a measure of chemical similarity we use the probability matrix P_AB introduced by ref. ⁴², which describes the likelihood that a stable material containing a chemical element A remains stable after the substitution A → B. Glawe et al. constructed this probability matrix based on an analysis of materials in the Inorganic Crystal Structure Database⁴³. We choose a substitution probability threshold of 10% (P_AB > 0.1), which generates the substitutions shown in Fig. 1. Based on these substitution relations, we perform all possible single and double substitutions for all seed structures. For example, the seed structure MoS₂ generates six MX₂ structures with M = Mo, W and X = O, S, Se (the seed structure itself included). The total set of resulting materials is analysed for structures that share the same reduced formula and space group. Such structures are considered duplicates and are filtered out. After removal of duplicates, we are left with 14,192 unique 2D crystals (the seed structures excluded), which are relaxed using DFT.
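The enumeration of single and double substitutions can be sketched as follows. The substitution map is hypothetical and only encodes the MoS₂ example from the text (the real map follows from P_AB > 0.1 and is not reproduced here); the sketch also assumes that a substitution replaces every atom of a given element, and that "double" means two elements are replaced at once.

```python
from itertools import product

# Hypothetical substitution map (element -> allowed replacements).
SUBSTITUTIONS = {"Mo": ["W"], "S": ["O", "Se"]}


def decorate(seed, substitutions):
    """Return all decorations of `seed`, a dict mapping element -> count.

    Each element is either kept or replaced everywhere by one allowed substitute,
    and at most two elements are replaced at once (single and double substitutions).
    """
    elements = sorted(seed)
    options = [[el] + substitutions.get(el, []) for el in elements]
    results = set()
    for choice in product(*options):
        n_changed = sum(new != old for new, old in zip(choice, elements))
        if n_changed > 2:
            continue
        counts = {}
        for new, old in zip(choice, elements):
            counts[new] = counts.get(new, 0) + seed[old]   # merge if elements coincide
        results.add(tuple(sorted(counts.items())))
    return results


structures = decorate({"Mo": 1, "S": 2}, SUBSTITUTIONS)
print(len(structures))  # 6, the seed MoS2 included
```

Storing each decoration as a sorted tuple of (element, count) pairs makes compositions that arise along different substitution routes collapse into one entry.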

**Fig. 1: Heat map of the relative occurrence of each element in the 2D materials used to train (seed) the CDVAE (LDP).**

Workflow

Our workflow is illustrated in Fig. 2. Starting with the initial set of 2D materials, we generate two new sets of crystal structures using CDVAE and LDP, respectively. Duplicate structures within each set are removed (see “Method” for more details). The now unique crystal structures are relaxed using DFT calculations employing the PBE xc-functional (see “Method” for more details). After the relaxation, any new duplicate structures are removed, as are materials that have relaxed into non-2D structures (we refer to ref. ³¹ for details on the dimensionality analysis). Finally, the heat of formation, ΔH, and the energy above the convex hull, ΔH_hull, are calculated.

In Table 1, we report the success rates for the DFT relaxations of the structures generated by CDVAE and LDP, respectively, together with the average number of relaxation steps and the average energy decrease from the initial to the relaxed structure. All three parameters are assumed to describe how close the initial structures are to the final DFT relaxed structures, e.g., a structure from a perfect generative method would only need one relaxation step and the energy decrease would be zero. As expected, neither LDP nor CDVAE generates stable, relaxed structures directly. However, while the LDP on average requires fewer steps to relax, the CDVAE structures are closer in energy to the relaxed structure. The fact that the number of relaxation steps and the reduction in energy upon relaxation are comparable for LDP and CDVAE suggests that the CDVAE-generated crystals are as close to relaxed structures as the LDP-generated structures.

Table 1 Summary statistics for the DFT relaxation of the two methods for generating initial structures.

We observe that the DFT relaxation fails for about 18% of the LDP-generated and about 31% of the CDVAE-generated structures. The vast majority of these failures are due to problems in converging the Kohn–Sham SCF cycle. We suspect that a large fraction of the convergence problems occur for materials with a magnetic ground state (all calculations are performed with spin polarisation). This is supported by the fact that 30% of the materials containing one or more of the magnetic 3d-metals (V, Cr, Mn, Fe, Co, Ni) fail due to convergence errors, while this only happens for 10% of the other materials. Moreover, 38% of the CDVAE-generated structures contain at least one of the magnetic 3d-metals, while this is only the case for 30% of the LDP-generated structures. This difference is consistent with the difference in the observed success rate.

Thermodynamic stability

Histograms of the heat of formation and the energy above the convex hull for the (DFT-relaxed) structures resulting from the CDVAE and LDP are shown in Fig. 3. The distributions of both ΔH and ΔH_hull obtained for the two structure generation methods are remarkably similar. For example, 73.8% of the CDVAE materials have ΔH_hull below 0.3 eV/atom (the threshold used for the training data), while this is the case for 74.0% of the LDP materials. It should, however, be noted that the smaller success rate of the DFT relaxation of the CDVAE-generated materials could influence these statistics, as it is likely that many of the structures which could not be converged would have resulted in unstable structures. The inset of Fig. 3 shows how the energy above the convex hull is distributed depending on the number of different elements in the structure. First of all, it is evident that CDVAE is able to create structures with a larger number of unique elements than is present in the training data (5 unique elements is the maximum in the seed structures), while LDP is limited to the stoichiometries present in the seed materials. However, the thermodynamic stability is generally lower for the materials with a larger number of unique elements. Examples of some of the most stable CDVAE-generated structures are shown in Fig. 4. The material Zr₂CCl₂ shown in c) is one of the 22 materials which are found by both the CDVAE and the LDP method.

**Fig. 3: Histogram of the heat of formation and energy above convex hull for the DFT-relaxed structures resulting from the CDVAE and LDP methods.**

**Fig. 4: Examples of CDVAE generated structures.**

Predicting whether a given 2D material can be synthesised is a complex problem that involves many factors. Often the size of ΔH_hull is used as a soft criterion for synthesizability, as it determines the material’s thermodynamic stability relative to other competing phases (this criterion neglects growth kinetics and substrate interactions, both of which can be important for 2D materials). A previous study of 700 polymorphs in 41 common inorganic bulk material systems showed that a threshold of ΔH_hull < 0.1 eV/atom will exclude 26% of the known synthesised polymorphs⁴⁴. We also note that the T-phase of MoS₂ was synthesised both as a monolayer⁴⁵ and a layered bulk⁴⁶, despite having ΔH_hull = 0.18 eV/atom⁴⁷. These examples demonstrate that many of the 2004 predicted 2D materials with ΔH_hull < 50 meV/atom, or even of the 3400 with ΔH_hull < 100 meV/atom, are likely to be synthesizable.

While the ΔH_hull-distributions in Fig. 3 are clearly peaked close to zero, they also have a tail of less stable materials. In particular, about 26% of the materials have ΔH_hull > 0.3 eV/atom (the threshold used to select the training structures). A natural question to ask is then to what extent the structures produced by the CDVAE are in fact biased towards high stability structures. To answer this question, we trained a CDVAE model on 988 2D materials with ΔH_hull > 0.4 eV/atom and used it to generate another 10,000 structures, from which we randomly selected 1000 non-duplicate structures, which we relaxed following the same workflow as described before. The distributions of the energy above the convex hull of the relaxed structures for both the stable and unstable CDVAE models are shown in Fig. 5 together with the distributions of their respective training sets. We clearly see that the CDVAE model trained to generate unstable materials produces structures that are significantly further from the convex hull than the stable model. This illustrates that CDVAE successfully learns the chemistry of the materials in the training data.

**Fig. 5: Kernel density estimate showing the distribution of the convex hull energies for the stable and unstable CDVAE generated dataset.**

Structural diversity

Having established the capability of the CDVAE to produce materials with good stability properties, we now turn to its ability to generate crystals of high chemical and structural diversity. While the LDP is restricted to stoichiometries and crystal structures already present in the seed structures, the CDVAE (in principle) has no such limitations. Figure 1 shows the relative occurrence of each element in the seed/training structures. The corresponding plots for the materials generated by LDP and CDVAE (after relaxation) are shown in Supplementary Fig. 1. Both LDP and CDVAE produce diverse compositions with elements covering most of the periodic table. However, CDVAE has a significantly higher occurrence of oxygen and the heavier chalcogens (S and Se) as well as halogens (Cl, Br and I). This trend is also present for the materials prior to relaxation and thus does not originate from a potentially higher DFT convergence rate for these elements. Instead, the six elements are also more prevalent, albeit slightly, in the seed structures, which could indicate an overfitting of the model.

The CDVAE generates significantly different chemical compositions and crystal structures as compared to the seed structures and those generated by the LDP. Figure 6 shows the relative frequencies of stoichiometry, space group number and occupied Wyckoff positions, respectively. Only the most common classes of the seed structures are shown. We find 239 unique stoichiometries among the CDVAE-generated materials, while there are only 87 and 103 unique stoichiometries in the seed structures and LDP-generated structures, respectively. The higher number of unique stoichiometries in the LDP-generated structures than in the seed structures is due to new stoichiometries being created when two different elements are substituted by the same element, or when an element is substituted with an element already present in the seed material. For example, the seed material Te₂Cu₄O₁₂ (stoichiometry AB₂C₆) becomes Cu₄S₁₄ (stoichiometry A₂B₇) under the double substitution O → S and Te → S. The significantly larger number of unique stoichiometries generated by CDVAE compared to the LDP shows that the former is able to produce new classes of structures that are not present in the training data. Another indication of new structural prototypes being created is the occurrence of new occupied Wyckoff positions within each space group when comparing to the training data. These new combinations of space group and occupied Wyckoff positions are shown in Supplementary Tables 1 and 2 for both the CDVAE dataset and the LDP dataset. In total there are 130 new combinations and 357 materials with these new combinations in the CDVAE-generated materials, while there are only 76 new combinations in the LDP-generated materials and only 339 materials with these combinations, even though the LDP dataset is almost three times as large as the CDVAE dataset. It might seem strange that LDP generates new combinations of space group and Wyckoff positions, as simple element substitution should preserve the space group and occupied Wyckoff positions. However, during the DFT relaxation the crystal symmetry can change, and thus so can the space group and occupied Wyckoff positions.
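How a double substitution can create a new stoichiometry class is easy to verify with a short sketch. The helper names `stoichiometry_class` and `substitute` are hypothetical, and the labelling convention (reduce by the greatest common divisor, sort counts in ascending order, assign A, B, C, ...) is an assumption chosen to reproduce the labels quoted in the text.

```python
from functools import reduce
from math import gcd


def stoichiometry_class(counts):
    """Map element counts, e.g. {"Te": 2, "Cu": 4, "O": 12}, to a label like 'AB2C6'."""
    divisor = reduce(gcd, counts.values())
    reduced = sorted(n // divisor for n in counts.values())
    labels = "ABCDEFGH"
    return "".join(f"{labels[i]}{n if n > 1 else ''}" for i, n in enumerate(reduced))


def substitute(counts, mapping):
    """Apply element substitutions, merging counts when two elements become the same."""
    out = {}
    for element, n in counts.items():
        new = mapping.get(element, element)
        out[new] = out.get(new, 0) + n
    return out


seed = {"Te": 2, "Cu": 4, "O": 12}
decorated = substitute(seed, {"O": "S", "Te": "S"})
print(stoichiometry_class(seed), stoichiometry_class(decorated))  # AB2C6 A2B7
```

Because Te and O both map onto S, their counts merge (2 + 12 = 14), which turns a three-element class into a two-element one.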

**Fig. 6: Histograms of structural parameters.**

The CDVAE tends to generate rather complex, low-symmetry structures, which is illustrated by the large fraction of materials with space group number 1 and occupied Wyckoff position a. Moreover, the average number of different elements in the unit cell is 4.0 for the CDVAE-generated materials while it is only 2.6 for the C2DB seed structures. The larger number of different elements is part of the reason for the higher fraction of materials with low symmetry. This tendency of CDVAE to generate structures with more complex composition is also noted by Xie et al., who attribute this to a non-Gaussian distribution of the underlying structure of the materials. Thus, when CDVAE generates new materials it samples from a Gaussian distribution \(\mathcal{N}(0,1)\) from which it predicts the number of atoms and composition. However, if \(\mathcal{N}(0,1)\) is not representative of the latent space, out-of-distribution materials can be generated. For materials discovery this could, however, be advantageous, as it makes CDVAE able to generate new crystal types which are not present in the training data.

To give a global overview of the structural distribution of the three data sets, a t-SNE embedding is shown in Fig. 7. The t-SNE analysis is made for 2500 materials sampled randomly from each data set. Here the structure is represented as a tuple given by the space group, occupied Wyckoff positions, and stoichiometry, where each is one-hot encoded before the t-SNE embedding. We see that most of the training data form clear clusters, which represent the most common stoichiometries, space groups and Wyckoff positions. The LDP-generated materials mostly follow the same pattern as the seed structures. However, the CDVAE-generated structures are more spread out, which is partly due to the large variation in their stoichiometries, while a few clusters appear due to the large fraction of low-symmetry materials with space group number 1. One noteworthy example is the cluster of CDVAE-generated materials with stoichiometry ABC₂D₂, space group number 25 and occupied Wyckoff positions a, b, c, d. For this specific combination, CDVAE discovered 123 new materials, of which 30 lie within 50 meV of the convex hull, while there are no examples of such materials in the training set or in the LDP-generated structures. Two of the most stable discovered materials of this type can be seen in Fig. 4h, i. The new class of structures has broken out-of-plane symmetry either due to the outermost atoms (i) or the innermost atoms (h). The fact that the CDVAE is able to generate new classes of stable materials, which are not present in the training data, is very promising and a clear advantage of deep generative models compared to lattice decoration protocols.
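The one-hot encoding of the (space group, Wyckoff positions, stoichiometry) tuple that precedes the t-SNE embedding could look roughly like the sketch below. The function name `one_hot_encode` and the first two records are hypothetical; only the last record corresponds to the combination mentioned in the text. The embedding step itself is not shown.

```python
def one_hot_encode(records):
    """One-hot encode (space group, Wyckoff positions, stoichiometry) tuples.

    Each of the three fields gets its own block of columns, one column per
    distinct value seen in `records`; the three blocks are concatenated.
    """
    vocabularies = [sorted({rec[field] for rec in records}, key=str) for field in range(3)]
    vectors = []
    for rec in records:
        row = []
        for field, vocab in enumerate(vocabularies):
            row.extend(1 if rec[field] == value else 0 for value in vocab)
        vectors.append(row)
    return vocabularies, vectors


# Hypothetical descriptors, except the last one, which is quoted in the text.
records = [
    (187, "a,h", "AB2"),
    (164, "a,d", "AB2"),
    (25, "a,b,c,d", "ABC2D2"),
]
vocabularies, vectors = one_hot_encode(records)
```

The resulting binary vectors could then be passed to any t-SNE implementation; materials sharing a space group, Wyckoff set, or stoichiometry share the corresponding non-zero column, which is what pulls them into clusters.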

**Fig. 7: t-SNE visualization of the structural diversity.**

Discussion

In conclusion, we have successfully employed a deep generative model in combination with a systematic lattice decoration protocol (LDP) to generate more than 8500 unique 2D crystals with energies above the convex hull (ΔH_hull) below 0.3 eV/atom. Out of these, more than 2000 have ΔH_hull within 50 meV/atom of the convex hull and could potentially be synthesised. This represents at least a doubling of the known stable 2D materials.

In addition to the very significant expansion of the known space of 2D materials, our work provides a quantitative assessment of the crystal diffusion variational autoencoder (CDVAE)³⁹, and establishes its excellent performance with respect to two key criteria: the ability to learn the stability properties of the training structures, and the ability to generate crystals with high chemical and structural diversity. In fact, only about a quarter of the generated materials had ΔH_hull above the 0.3 eV/atom threshold used to select the training structures, and the stoichiometries of the generated materials span 239 types versus 87 present in the training structures. Generally, the crystal structures generated by CDVAE have higher complexity and lower symmetries than the training structures. We found the method of lattice decoration to be complementary to the CDVAE generator, with the two methods yielding only 22 common crystals out of the 11630 structures generated in total. While the LDP is limited to the structural blueprint of the seed materials, CDVAE is able to generate new classes of materials, which are not present in the initial data set. This is promising for an autonomous materials discovery method, as it adds new genes to the pool of trial materials and thus goes beyond the lattice decoration paradigm.

The fact that CDVAE is comparable to lattice decoration (with substitution by chemically similar elements) in terms of stability while producing new and diverse crystal structures is a testimony to the prospect of using deep generative models in materials discovery.

All the structures are available in the C2DB database⁴⁷, and their basic properties will also be made available as the execution of the C2DB property workflow proceeds.

Method

Workflow

To set up and manage the workflow we use the Atomic Simulation Recipes⁵, which provides tools for DFT relaxation, duplicate removal, dimensionality check, and for calculating the thermodynamic properties. The DFT calculations are performed using the GPAW code⁴⁸ with the PBE xc-functional, a plane wave cut-off energy of 800 eV and a k-point density of at least 4 Å⁻¹. The relaxation is stopped when the maximum force is below 0.01 eV/Å and the maximum stress is below 0.002 eV/Å³.

The duplicate removal recipe finds duplicate structures using the root mean square distance (RMSD) between the structures, which is calculated using the Python library pymatgen⁴⁹. We consider structures to be duplicates if RMSD < 0.3 Å and only keep the structure with the lowest heat of formation. See ref. ³¹ for more information. For the initial LDP-generated materials (before the DFT relaxation) a cruder duplicate sorting of the structures is employed, where two materials with the same reduced formula and space group are considered duplicates.
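The rule "treat structures closer than 0.3 Å as duplicates and keep the one with the lowest heat of formation" can be sketched as a greedy filter. This is not the recipe used in the workflow: `remove_duplicates` and `toy_distance` are hypothetical names, and the distance function is a stand-in that compares a one-dimensional fingerprint instead of computing a real structural RMSD.

```python
def remove_duplicates(structures, distance, threshold=0.3):
    """Greedy duplicate removal.

    `structures` is a list of dicts with an "hform" key (heat of formation, eV/atom);
    `distance(a, b)` returns an RMSD-like distance in Å. Candidates are visited from
    lowest to highest heat of formation, so the most stable member of each group of
    duplicates is the one that survives.
    """
    kept = []
    for candidate in sorted(structures, key=lambda s: s["hform"]):
        if all(distance(candidate, other) >= threshold for other in kept):
            kept.append(candidate)
    return kept


def toy_distance(a, b):
    """Stand-in for a real structure comparison: distance between 1D fingerprints."""
    return abs(a["fingerprint"] - b["fingerprint"])


structures = [
    {"name": "s1", "hform": -0.50, "fingerprint": 1.00},
    {"name": "s2", "hform": -0.70, "fingerprint": 1.10},   # duplicate of s1, more stable
    {"name": "s3", "hform": -0.20, "fingerprint": 2.00},
]
unique = remove_duplicates(structures, toy_distance)
print([s["name"] for s in unique])  # ['s2', 's3']
```

Sorting by heat of formation first means no explicit bookkeeping of duplicate groups is needed; a less stable duplicate is simply rejected when it is compared against the structures already kept.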

To determine the convex hull we use as reference databases the C2DB as well as a database of reference structures comprising 9590 elementary, binary, and ternary crystals that all lie within 20 meV of the convex hull in the Open Quantum Materials Database (OQMD)²⁵. These reference structures were relaxed using the VASP⁵⁰ code at the PBE level (PBE+U for selected transition metal oxides) as part of the OQMD project. Since we use the GPAW code to relax and evaluate the energy of the 2D materials, we have re-calculated the total energy of the reference structures (without re-optimisation) using the GPAW code.
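To make the definition of the energy above the convex hull concrete, the sketch below computes it for a two-component system. It is a simplified illustration, not the procedure used in the workflow: real hulls span many elements, and the reference energies, the function names `lower_hull` and `energy_above_hull`, and the candidate values are all hypothetical.

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points, returned sorted by x."""
    best = {}
    for x, e in points:                       # keep only the lowest energy at each x
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        # pop the last vertex while it lies on or above the line hull[-2] -> p
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x, energy, hull):
    """Energy of a candidate at composition x relative to the hull (eV/atom)."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            hull_energy = e1 + (e2 - e1) * (x - x1) / (x2 - x1)
            return energy - hull_energy
    raise ValueError("composition outside the range spanned by the references")


# Hypothetical references: (fraction of element B, formation energy in eV/atom).
references = [(0.0, 0.0), (0.25, -0.2), (0.5, -1.0), (1.0, 0.0)]
hull = lower_hull(references)
print(hull)                                          # [(0.0, 0.0), (0.5, -1.0), (1.0, 0.0)]
print(round(energy_above_hull(0.25, -0.2, hull), 3))  # 0.3
```

In this toy example the phase at x = 0.25 sits 0.3 eV/atom above the line connecting the pure element and the stable compound at x = 0.5, so it would fall exactly at the threshold used to select the training structures.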

CDVAE

CDVAE is designed to generate 3D bulk crystals, where the unit cell is periodic in all three directions. This introduces a problem when generating 2D materials, which are non-periodic in one direction. We solve this issue by introducing an artificial periodicity in the non-periodic direction with a lattice vector which is an order of magnitude larger than those in the periodic directions. This ensures that the graph networks only connect atoms in the 2D layer and thus CDVAE learns to generate 2D materials. When training the model, we used 70% of the materials in the training set, while 15% were used for validation and 15% for testing. We used the same hyperparameters as employed by Xie et al. for their MP-20 data set. See ref. ³⁹ for more information.
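The artificial periodicity can be illustrated by building a 3D cell from the two in-plane lattice vectors. This is a minimal sketch: the function name `pad_to_3d_cell`, the factor of 10 (one reading of "an order of magnitude"), the choice of z as the out-of-plane axis, and the example lattice constant are assumptions, not details taken from the paper.

```python
import math


def pad_to_3d_cell(a, b, factor=10.0):
    """Build a 3x3 cell from two in-plane lattice vectors (x, y) given in Å.

    The third vector points along z and is `factor` times longer than the
    longest in-plane vector, so periodic images of the layer are far apart.
    """
    longest = max(math.hypot(*a), math.hypot(*b))
    return [
        [a[0], a[1], 0.0],
        [b[0], b[1], 0.0],
        [0.0, 0.0, factor * longest],
    ]


# Hexagonal in-plane cell with a hypothetical lattice constant of 3.2 Å.
cell = pad_to_3d_cell((3.2, 0.0), (-1.6, 1.6 * math.sqrt(3)))
```

With an out-of-plane repeat distance this large, any neighbour search with a cutoff of a few Å only finds atoms within the same layer, which is the property the text relies on.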

Data availability

All the discovered crystal structures and their properties are available as a part of C2DB (https://cmr.fysik.dtu.dk/c2db/c2db.html). The discovered materials can be selected using the key origin, which has the values Lyngby22_CDVAE or Lyngby22_LDP for the CDVAE and LDP generated materials, respectively.

References

Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133 (1965).
Article Google Scholar
Curtarolo, S. et al. Aflow: An automatic framework for high-throughput materials discovery. Computational Mater. Sci. 58, 218–226 (2012).
Article CAS Google Scholar
Jain, A. et al. Fireworks: a dynamic workflow system designed for high-throughput applications. Concurrency Comput. 27, 5037–5059 (2015).
Article Google Scholar
Pizzi, G., Cepellotti, A., Sabatini, R., Marzari, N. & Kozinsky, B. Aiida: automated interactive infrastructure and database for computational science. Computational Mater. Sci. 111, 218–230 (2016).
Article Google Scholar
Gjerding, M. et al. Atomic simulation recipes: A python framework and library for automated workflows. Computational Mater. Sci. 199, 110731 (2021).
Article CAS Google Scholar
Mortensen, J., Gjerding, M. & Thygesen, K. Myqueue: Task and workflow scheduling system. J. Open Source Softw. 5, 1844 (2020).
Article Google Scholar
Greeley, J., Jaramillo, T. F., Bonde, J., Chorkendorff, I. & Nørskov, J. K. Computational high-throughput screening of electrocatalytic materials for hydrogen evolution. Nat. Mater. 5, 909–913 (2006).
Article CAS Google Scholar
Madsen, G. K. Automated search for new thermoelectric materials: the case of liznsb. J. Am. Chem. Soc. 128, 12140–12146 (2006).
Article CAS Google Scholar
Curtarolo, S. et al. The high-throughput highway to computational materials design. Nat. Mater. 12, 191–201 (2013).
Article CAS Google Scholar
Kirklin, S., Meredig, B. & Wolverton, C. High-throughput computational screening of new li-ion battery anode materials. Adv. Energy Mater. 3, 252–262 (2013).
Article CAS Google Scholar
Ørnsø, K. B., Garcia-Lastra, J. M. & Thygesen, K. S. Computational screening of functionalized zinc porphyrins for dye sensitized solar cells. Phys. Chem. Chem. Phys. 15, 19478–19486 (2013).
Article Google Scholar
Zhang, Z. et al. Computational screening of layered materials for multivalent ion batteries. ACS Omega 4, 7822–7828 (2019).
Article CAS Google Scholar
Chen, W. et al. Understanding thermoelectric properties from high-throughput calculations: trends, insights, and comparisons with experiment. J. Mater. Chem. C. 4, 4414–4426 (2016).
Hachmann, J. et al. The Harvard Clean Energy Project: large-scale computational screening and design of organic photovoltaics on the World Community Grid. J. Phys. Chem. Lett. 2, 2241–2251 (2011).
Bhattacharya, S. & Madsen, G. K. High-throughput exploration of alloying as design strategy for thermoelectrics. Phys. Rev. B 92, 085205 (2015).
Castelli, I. E. et al. Computational screening of perovskite metal oxides for optimal solar light capture. Energy Environ. Sci. 5, 5814–5819 (2012).
Hautier, G., Miglio, A., Ceder, G., Rignanese, G.-M. & Gonze, X. Identification and design principles of low hole effective mass p-type transparent conducting oxides. Nat. Commun. 4, 1–7 (2013).
Yu, L. & Zunger, A. Identification of potential photovoltaic absorbers based on first-principles spectroscopic screening of materials. Phys. Rev. Lett. 108, 068701 (2012).
Kuhar, K., Pandey, M., Thygesen, K. S. & Jacobsen, K. W. High-throughput computational assessment of previously synthesized semiconductors for photovoltaic and photoelectrochemical devices. ACS Energy Lett. 3, 436–446 (2018).
Aykol, M. et al. High-throughput computational design of cathode coatings for Li-ion batteries. Nat. Commun. 7, 1–12 (2016).
Mounet, N. et al. Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds. Nat. Nanotechnol. 13, 246–252 (2018).
Chen, L.-Q. et al. Design and discovery of materials guided by theory and computation. npj Computational Mater. 1, 1–2 (2015).
Thygesen, K. S. & Jacobsen, K. W. Making the most of materials computations. Science 354, 180–181 (2016).
Himanen, L., Geurts, A., Foster, A. S. & Rinke, P. Data-driven materials science: status, challenges, and perspectives. Adv. Sci. 6, 1900808 (2019).
Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the Open Quantum Materials Database (OQMD). JOM 65, 1501–1509 (2013).
Jain, A. et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Borysov, S. S., Geilhufe, R. M. & Balatsky, A. V. Organic materials database: An open-access online database for data mining. PLoS ONE 12, e0171501 (2017).
Draxl, C. & Scheffler, M. The NOMAD laboratory: from data sharing to artificial intelligence. J. Phys.: Mater. 2, 036001 (2019).
Haastrup, S. et al. The Computational 2D Materials Database: high-throughput modeling and discovery of atomically thin crystals. 2D Mater. 5, 042002 (2018).
Cheon, G. et al. Data mining for new two- and one-dimensional weakly bonded solids and lattice-commensurate heterostructures. Nano Lett. 17, 1915–1923 (2017).
Gjerding, M. N. et al. Recent progress of the Computational 2D Materials Database (C2DB). 2D Mater. 8, 044002 (2021).
Noh, J., Gu, G. H., Kim, S. & Jung, Y. Machine-enabled inverse design of inorganic solid materials: promises and challenges. Chem. Sci. 11, 4871–4881 (2020).
Noh, J. et al. Inverse design of solid-state materials via a continuous representation. Matter 1, 1370–1384 (2019).
Kim, S., Noh, J., Gu, G. H., Aspuru-Guzik, A. & Jung, Y. Generative adversarial networks for crystal structure prediction. ACS Cent. Sci. 6, 1412–1420 (2020).
Long, T. et al. Constrained crystals deep convolutional generative adversarial network for the inverse design of crystal structures. npj Computational Mater. 7, 66 (2021).
Zhao, Y. et al. High-throughput discovery of novel cubic crystal materials using deep generative neural networks. Adv. Sci. 8, 2100566 (2021).
Song, Y., Siriwardane, E. M. D., Zhao, Y. & Hu, J. Computational discovery of new 2D materials using deep learning generative models. ACS Appl. Mater. Interfaces 13, 53303–53313 (2021).
Ren, Z. et al. An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties. Matter 5, 314–335 (2022).
Xie, T., Fu, X., Ganea, O.-E., Barzilay, R. & Jaakkola, T. Crystal diffusion variational autoencoder for periodic material generation. arXiv preprint arXiv:2110.06197 (2021).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013).
Song, Y. & Ermon, S. Generative modeling by estimating gradients of the data distribution. In Wallach, H. et al. (eds.) Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., 2019). https://proceedings.neurips.cc/paper/2019/file/3001ef257407d5a371a96dcd947c7d93-Paper.pdf.
Glawe, H., Sanna, A., Gross, E. K. U. & Marques, M. A. L. The optimal one dimensional periodic table: a modified Pettifor chemical scale from data mining. New J. Phys. 18, 093011 (2016).
Belsky, A., Hellenbrandt, M., Karen, V. L. & Luksch, P. New developments in the Inorganic Crystal Structure Database (ICSD): accessibility in support of materials research and design. Acta Crystallogr. Sect. B 58, 364–369 (2002).
Aykol, M., Dwaraknath, S. S., Sun, W. & Persson, K. A. Thermodynamic limit for synthesis of metastable inorganic materials. Sci. Adv. 4, eaaq0148 (2018).
Kappera, R. et al. Phase-engineered low-resistance contacts for ultrathin MoS2 transistors. Nat. Mater. 13, 1128–1134 (2014).
Bell, R. E. & Herfert, R. E. Preparation and characterization of a new crystalline form of molybdenum disulfide. J. Am. Chem. Soc. 79, 3351–3354 (1957).
https://cmr.fysik.dtu.dk/c2db/c2db.html.
Mortensen, J. J., Hansen, L. B. & Jacobsen, K. W. Real-space grid implementation of the projector augmented wave method. Phys. Rev. B 71, 035109 (2005).
Ong, S. P. et al. Python materials genomics (pymatgen): A robust, open-source Python library for materials analysis. Computational Mater. Sci. 68, 314–319 (2013).
Kresse, G. & Hafner, J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558–561 (1993).


Acknowledgements

We thank Morten N. Gjerding for assistance with setting up the lattice decoration protocol. We acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme Grant No. 773122 (LIMA) and Grant agreement No. 951786 (NOMAD CoE). K.S.T. is a Villum Investigator supported by VILLUM FONDEN (grant no. 37789).

Author information

Authors and Affiliations

Computational Atomic-scale Materials Design (CAMD), Department of Physics, Technical University of Denmark, 2800, Kgs. Lyngby, Denmark
Peder Lyngby & Kristian Sommer Thygesen

Authors

Peder Lyngby
Kristian Sommer Thygesen

Contributions

P.L. and K.S.T. developed the initial concept. P.L. ran the generative models and the DFT simulations, and performed the data analysis. K.S.T. supervised the project and aided with the interpretation of the results. P.L. and K.S.T. wrote and discussed the paper together.

Corresponding author

Correspondence to Peder Lyngby.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Lyngby, P., Thygesen, K.S. Data-driven discovery of 2D materials by deep generative models. npj Comput Mater 8, 232 (2022). https://doi.org/10.1038/s41524-022-00923-3


Received: 27 June 2022
Accepted: 23 October 2022
Published: 11 November 2022
DOI: https://doi.org/10.1038/s41524-022-00923-3

This article is cited by

Leveraging language representation for materials exploration and discovery
- Jiaxing Qu
- Yuxuan Richard Xie
- Elif Ertekin
npj Computational Materials (2024)
Prediction of BiS2-type pnictogen dichalcogenide monolayers for optoelectronics
- José D. Mella
- Muralidhar Nalabothula
- Aldo H. Romero
npj 2D Materials and Applications (2024)
Methods and applications of machine learning in computational design of optoelectronic semiconductors
- Xiaoyu Yang
- Kun Zhou
- Lijun Zhang
Science China Materials (2024)
Guided diffusion for inverse molecular design
- Tomer Weiss
- Eduardo Mayo Yanes
- Renana Gershoni-Poranne
Nature Computational Science (2023)
MatHub-2d: A database for transport in 2D materials and a demonstration of high-throughput computational screening for high-mobility 2D semiconducting materials
- Mingjia Yao
- Jialin Ji
- Wenqing Zhang
Science China Materials (2023)
