Constrained crystals deep convolutional generative adversarial network for the inverse design of crystal structures

Autonomous discovery of materials with desired properties is one of the ultimate goals of materials science, and current studies focus mostly on high-throughput screening based on density functional theory calculations and on forward modelling of physical properties using machine learning. Applying deep learning techniques, we have developed a generative model which can predict distinct stable crystal structures by optimizing the formation energy in the latent space. It is demonstrated that the optimization of physical properties can be integrated into the generative model either as an on-top screen or as a backward propagator, each with its own advantages. Applying the generative models to the binary Bi-Se system reveals that distinct crystal structures can be obtained covering the whole composition range, and the phases on the convex hull can be reproduced after the generated structures are fully relaxed to equilibrium. The method can be extended to multicomponent systems for multi-objective optimization, which paves the way to achieve the inverse design of materials with optimal properties.


Introduction
The discovery and exploitation of materials has enormous benefits for the welfare of society and for technological revolutions 1 , which motivated the launch of the Materials Genome Initiative in 2011 2,3 . To date, high-throughput (HTP) workflows based on density functional theory (DFT) enable massive calculations on existing and hypothetical compounds, accelerating materials discovery dramatically 4 . For instance, crystal structure prediction can be performed based on brute-force substitution into known prototypes or on evolutionary algorithms as implemented in CALYPSO 5 and USPEX 6 . Nevertheless, the soaring computational cost prevents exhaustive screening over an immense phase space, limiting the application of such methods and calling for more efficient solutions. Global optimization methods add a target function to guide the search and thereby reduce the computational cost 7 ; e.g., crystal structure prediction with Bayesian optimization can efficiently select the most stable structure from a large number of candidates with fewer search trials, and has been successfully applied to predict the structures of NaCl and Y2Co17 8 . The random search method has also demonstrated its potential to find the global minimum with a highly parallel implementation, as manifested by applications to elemental boron, silicon clusters and lithium hydrides 9,10 . Last but not least, the evolutionary algorithm is promising in predicting thermodynamically stable crystal structures, as demonstrated for tungsten borides, carbon, and TiO2 11,12 .
The emergent machine learning techniques have become the fourth paradigm of materials science 13 , as exemplified by the rapid developments and applications of the last decade 14 . In particular, generative machine learning methods have been utilized to discover distinct materials, e.g., applying neural networks to assist molecule design 15,16 . Two particularly powerful methods therein are the variational autoencoder (VAE) 17 and the generative adversarial network (GAN) 18 , which have led to significant progress in molecule design 19 . Combined with the successful forward modeling of physical properties 20 , generative machine learning on existing structures enables inverse design, i.e., the prediction of distinct structures with desired properties 1,21 . Unfortunately, unlike molecules, which can be represented using the simplified molecular input line entry specification (SMILES) 22 , inverse design of three-dimensional (3D) crystal structures has been rare due to the challenges in obtaining a continuous representation in the latent space 20 . Recently, Nouira et al. developed a CrystalGAN model to design ternary A-B-H phases starting from binary A-H and B-H structures using a vector-based representation 23 , and successfully applied it to the Ni-Pd-H system. Another promising representation is the two-dimensional (2D) crystal graph constructed based on the voxel and autoencoder methods 24-26,29 . These models exhibit excellent abilities in generating distinct structures, but not all predicted phases host the desired functionality. For example, only ZeoGAN integrates the optimization of the heat of adsorption, whereas the other schemes mostly apply constraints to filter the generated structures. It is noted that, by nature, inverse design entails the optimization of properties in the latent space, and thus internalizes the desired property as a joint objective of the generator 30 .
In this work, we developed a GAN-based inverse design framework for crystal structure prediction with target properties and applied it to the binary Bi-Se system. We first demonstrate that our deep convolutional generative adversarial network (DCGAN) can be applied to generate distinct crystal structures 31 . Taking the formation energy as the target property, its optimization is integrated into the DCGAN model in two schemes: DCGAN + constraint, which selects structures following the conventional screening approach, and the constrained crystals deep convolutional generative adversarial network (CCDCGAN), which adds an extra feedback loop for automatic optimization. The performances of the DCGAN + constraint and CCDCGAN models are comparatively evaluated, and it is demonstrated that CCDCGAN is more efficient in generating stable structures, while the DCGAN + constraint model has a higher generation rate of meta-stable structures. Both schemes can be generalized to perform multi-objective inverse design for multicomponent systems 32 ; hence our work paves the way to achieve autonomous inverse design of distinct crystal phases with desired properties.

Data for generative adversarial network
Typically, DCGAN requires about 10³ known crystal structures as positive examples during training. However, there are mostly only tens of known crystalline phases for a given binary system in, e.g., the Materials Project (MP) 33 . Within our Bi-Se database, 155 (1.56%) structures are stable (i.e., formation energy lower than 0 eV/atom and distance to the convex hull smaller than 100 meV/atom) and 707 (7.2%) are meta-stable (i.e., negative formation energy and distance to the convex hull smaller than 150 meV/atom).
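For concreteness, the stability criteria above can be sketched as a small classifier. This is a minimal sketch: the function name and the example entries are hypothetical, while the thresholds are the ones quoted in the text.

```python
# Sketch: labelling database entries by the stability criteria used in the text
# (formation energy and distance to the convex hull, both in eV/atom).

def classify(e_form, hull_dist):
    """Return 'stable', 'meta-stable', or 'unstable' for one structure.

    e_form:    formation energy in eV/atom
    hull_dist: energy above the convex hull in eV/atom
    """
    if e_form < 0.0 and hull_dist < 0.100:
        return "stable"
    if e_form < 0.0 and hull_dist < 0.150:
        return "meta-stable"
    return "unstable"

print(classify(-0.35, 0.02))   # a low-energy phase near the hull -> stable
print(classify(-0.10, 0.12))   # negative E_f, farther from the hull -> meta-stable
print(classify(0.05, 0.30))    # positive formation energy -> unstable
```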

Continuous representation for crystal structures
Although the crystallographic and chemical information is stored in the crystallographic information file (CIF), its discontinuous and heterogeneous format is not a suitable representation for a generative model; thus, a continuous and homogeneous representation including both the chemical and the structural information is required.
Following Ref. 27, the lattice constants and atomic positions are translated into the voxel space and then encoded into a 2D crystal graph through an autoencoder, as demonstrated in Fig. 1(a). In this way, a continuous latent space is constructed. The whole process is reversible, i.e., a random 2D crystal graph can be reconstructed into a crystal structure in real space, which is essential for a generative model. When applied to the Bi-Se database, 9420 of 9810 (i.e., 96%) crystal structures are successfully reconstructed. Such a high ratio is not caused by overfitting, as illustrated by the learning curves of both autoencoders shown in Fig. 1(b) and (c): the differences between training-set and test-set losses are negligible, suggesting that the 2D crystal graphs in the latent space contain adequate information to reconstruct crystal structures. This is confirmed by the diversity of twelve typical 2D crystal graphs in the latent space (cf. Fig. 1(d) for four cases). Please refer to Methods as well as the Supplementary Information for more details.

Construction and prediction of DCGAN
Turning now to the realization of generative models, VAE 17 and GAN 18 are the two most popular algorithms. VAE is a variant of the autoencoder discussed above, which assumes a specific (such as Gaussian) distribution of the data (in our case 2D crystal graphs) in the latent space. To be generative, such a distribution function should be defined properly, i.e., consistently with the distribution of the 2D crystal graphs of the known crystal structures in the latent space. In addition, the distribution function has to be specified prior to the training process, and its form determines the performance of the generative model, which demands domain expertise in statistics and a profound understanding of the input data. For instance, a recent work by Ren et al. used a sigmoid function in a VAE to design inorganic crystals and found structures that do not exist in the training set 36 . Court et al. implemented a VAE to generate crystal structures and successfully applied it to a broad class of materials including binary and ternary alloys, perovskite oxides, and Heusler compounds 37 . In contrast, GAN trains two mutually competitive neural networks (i.e., a generator and a discriminator) to generate data statistically the same as the training data without assuming the distribution function 18 . That is, the discriminator tries to distinguish the generated data from the training data, while the generator attempts to fool the discriminator by generating data similar to the training data. Mathematically, the objective of GAN is defined by the following equation:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_x}\left[\log D(x)\right] + \mathbb{E}_{\tilde{x} \sim p_g}\left[\log\left(1 - D(\tilde{x})\right)\right]$$

where D is the discriminator, G is the generator, $\mathbb{E}$ denotes the expectation value, x is the original data, D(x) is the output of the discriminator with x as input, p_x is the probability density function of the original data, and p_g is the probability density function of the generated data.
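The value function V(D, G) can be made concrete by estimating both expectations from samples. In the numpy sketch below, the 1D Gaussian "data", the toy discriminators, and all parameter values are illustrative assumptions, not part of the actual model; the point is only that a discriminator which separates p_x from p_g achieves a higher value than a blind one.

```python
# Sample-based estimate of the GAN value function
# V(D, G) = E_{x~p_x}[log D(x)] + E_{x~p_g}[log(1 - D(x))].
import numpy as np

rng = np.random.default_rng(0)

def gan_value(discriminator, real, fake):
    # Monte Carlo estimate of the two expectation terms.
    return (np.mean(np.log(discriminator(real)))
            + np.mean(np.log(1.0 - discriminator(fake))))

real = rng.normal(0.0, 1.0, 10_000)  # toy samples from p_x (training data)
fake = rng.normal(3.0, 1.0, 10_000)  # toy samples from p_g (generated data)

# A toy discriminator that scores points near 0 as "real" ...
d_good = lambda x: 1.0 / (1.0 + np.exp(x - 1.5))
# ... and a blind one that always outputs 0.5, whose value is exactly 2*log(0.5).
d_blind = lambda x: np.full_like(x, 0.5)

print(gan_value(d_good, real, fake) > gan_value(d_blind, real, fake))  # True
```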
Through the competition between the generator and the discriminator, the distribution of the generated data becomes hardly distinguishable from the distribution of the training data. Compared with VAE, GAN does not require specifying a distribution function in the latent space (p_g), which makes the generation process more robust, and is thus adopted in this work 19 . Such an implementation of DCGAN can generate crystal structures with a high success rate (defined as the ratio of the number of generated crystals to the number of generated 2D crystal graphs), e.g., 2832 crystal structures are transformed from 13000 generated 2D crystal graphs. The generated structures cover a large composition range, as shown in Fig. 2(b), where the red points denote the original data in the Bi-Se database and the gray circles mark the structures generated by DCGAN. One interesting observation is that the generated structures mostly have negative formation energies, on average lower than those of the original database (Supplementary Figure 2).

Training of constraint model
As mentioned above, the objective of inverse design is to design compounds with desired properties, including thermodynamical, mechanical, and functional properties. In this work, we take the formation energy (i.e., thermodynamic stability) as the target property. In order to be able to optimize the formation energy in the latent space, another convolutional neural network (CNN) model is trained taking the 2D crystal graphs in the latent space as descriptors and the formation energies as the output physical property. The training is carried out with a 90%/10% training/test-set split over the Bi-Se database, with the resulting R2 score and mean absolute error (MAE) being about 85% and 0.019 eV/atom (Supplementary Figure 3(a)), respectively. Such a MAE is even smaller than that (0.021 eV/atom) obtained using the state-of-the-art CGCNN model 38 . That is, the 2D crystal graphs in the latent space can be considered effective descriptors for the forward prediction of physical properties. We note that overfitting is a marginal issue for such a high accuracy, as seen in the learning curve of the training process in Fig. 2(d). Details on the CNN model are described in Methods and the Supplementary Information.
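The two quoted metrics can be computed as follows. This is a numpy sketch on hypothetical prediction values; the actual constraint model is the CNN described in Methods.

```python
# Sketch: the R2 score and MAE used to evaluate the constraint model,
# on a handful of hypothetical formation energies (eV/atom).
import numpy as np

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    # Mean absolute error.
    return np.mean(np.abs(y_true - y_pred))

e_true = np.array([-0.40, -0.25, -0.10, 0.05, 0.20])
e_pred = np.array([-0.38, -0.27, -0.12, 0.08, 0.17])

print(round(r2_score(e_true, e_pred), 3), round(mae(e_true, e_pred), 3))
```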

Construction and prediction of DCGAN + constraint
A straightforward way to implement the optimization of physical properties, e.g., the formation energy in this work, is to apply an add-on constraint to the structures generated by DCGAN, as sketched in Fig. 2.
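The screening step amounts to generating candidates and keeping only those whose predicted formation energy is negative. In the sketch below, both `toy_generator` and `toy_constraint` are stand-in functions invented for illustration; only the latent-space dimension of 200 comes from the text.

```python
# Sketch of DCGAN + constraint as an on-top screen: generate candidates,
# predict a "formation energy" with a surrogate constraint model, and filter.
import numpy as np

rng = np.random.default_rng(1)

def toy_generator(n, dim=200):
    # Stand-in for the DCGAN generator: random latent vectors in place of
    # generated 2D crystal graphs (the real latent space also has 200 dims).
    return rng.normal(size=(n, dim))

def toy_constraint(z):
    # Stand-in for the constraint model: a fixed linear map to a scalar.
    w = np.linspace(-0.01, 0.01, z.shape[1])
    return z @ w

candidates = toy_generator(1000)
e_form = toy_constraint(candidates)
screened = candidates[e_form < 0.0]  # the add-on constraint acts as a filter

print(len(screened), "of", len(candidates), "candidates pass the screen")
```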

Construction and prediction of CCDCGAN
The constraint can also be integrated into DCGAN as a back propagator, labelled CCDCGAN in Fig. 3(a), to realize automated optimization in the latent space so that inverse design can be achieved. In this way, the constraint gives rise to an additional optimization objective of the generator, which can be mathematically described as:

$$\min_G \left\{ \max_D V(D, G) + \omega\, \mathbb{E}_{z \sim p_g}\left[E_f(z)\right] \right\}$$

where E_f is the formation energy predicted by the constraint model, z is the generated 2D crystal graph, and ω is the weight of the formation energy loss. Note that such an additional optimization objective cannot outweigh the primary objective, which leads to a low weight for the formation energy loss. To explore the full competence of the CCDCGAN model, we generated 100,000 crystal structures and performed DFT calculations on the generated structures. The CCDCGAN model is able to regenerate most (11 out of 15) experimentally achievable phases. As seen from Fig. 3(c) and the Supplementary Information, the formation energies of the remaining generated structures are higher than those of the reported phases. We suspect that this can be attributed to the constraints (i.e., unit cell dimension and number of atoms) we applied to select the training database, which will be extended in the future.
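A sample-based sketch of the modified generator objective: the usual GAN generator loss plus ω times the mean predicted formation energy. Note that ω = 0.01, the discriminator scores, and the energy batches below are placeholders invented for illustration (the weight actually used by the authors is truncated in this text).

```python
# Sketch of the CCDCGAN generator loss: fool-the-discriminator term plus a
# weighted formation-energy term, omega * mean(E_f).
import numpy as np

def ccdcgan_generator_loss(d_scores, e_form, omega=0.01):
    # d_scores: discriminator outputs on generated graphs, in (0, 1)
    # e_form:   formation energies (eV/atom) from the constraint model
    gan_loss = -np.mean(np.log(d_scores))   # standard generator loss
    energy_loss = omega * np.mean(e_form)   # formation-energy objective
    return gan_loss + energy_loss

d_scores = np.array([0.4, 0.5, 0.6])
high_e = np.array([0.2, 0.1, 0.3])     # a less stable generated batch
low_e = np.array([-0.4, -0.3, -0.5])   # a more stable generated batch

# Lower predicted formation energies lower the generator loss,
# steering generation toward stable structures.
print(ccdcgan_generator_loss(d_scores, low_e)
      < ccdcgan_generator_loss(d_scores, high_e))  # True
```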

To further test the predictive capability of the CCDCGAN model, we deliberately removed the crystal structures of 4 specific compositions (i.e., BiSe 3 , Bi 2 Se 3 , Bi 2 Se, and Bi 3 Se) from the training database, and observed that the experimental crystal structures of such four phases can still be successfully regenerated, as shown in Fig. 3 (d) to (g).
This proves that the CCDCGAN model can generate crystal structures for unknown compositions, and thus it could accelerate the discovery of distinct crystal phases.

Discussion
We have developed an inverse design framework, CCDCGAN, consisting of a generator, a discriminator and a constraint, and successfully applied it to design unreported crystal structures with low formation energy for the binary Bi-Se system.

Data for generative adversarial network

Continuous representation for crystal structures
Specifically, three 3D voxel grids are used for a typical binary compound: two grids record the atomic positions of the two elements separately, and the third one stores the lattice constants, i.e., the lattice lengths and the angles between them.
Hereafter such grids are referred to as site voxels and lattice voxel, respectively. Using a probability density based on Gaussian functions, the value of the site voxel at each grid point is defined as

$$V(\mathbf{g}) = \sum_{i=1}^{n} \exp\!\left(-\frac{r_i^2}{2\sigma^2}\right)$$

where n is the number of atoms and r_i is the real-space distance between the grid point $\mathbf{g}$ and atom i. In this way, the inverse transformation is trivial for the lattice voxel, while that for the site voxels relies on the image filter technique 27 .
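The Gaussian construction above can be sketched directly. The grid coordinates and the atom position below are hypothetical toy values; the paper's site grids are 64×64×64 with σ = 0.15.

```python
# Sketch: Gaussian site-voxel values, sum_i exp(-r_i^2 / (2 sigma^2)),
# evaluated at a few grid points for a single atom at the origin.
import numpy as np

def site_voxel_values(atom_positions, grid_points, sigma=0.15):
    # Pairwise squared distances between every grid point and every atom.
    diff = grid_points[:, None, :] - atom_positions[None, :, :]
    r2 = np.sum(diff ** 2, axis=-1)
    # Sum the Gaussian contributions of all atoms at each grid point.
    return np.exp(-r2 / (2.0 * sigma ** 2)).sum(axis=1)

atoms = np.array([[0.0, 0.0, 0.0]])    # one atom at the origin
grid = np.array([[0.0, 0.0, 0.0],      # grid point on the atom
                 [0.1, 0.0, 0.0],      # nearby grid point
                 [0.5, 0.5, 0.5]])     # distant grid point

v = site_voxel_values(atoms, grid)
print(v)  # peaks at the atom position and decays with distance
```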
An autoencoder is a type of artificial neural network which is able to encode inputs into low-dimensional vectors, as well as to decode the vectors back 42 . In this work, the autoencoder is applied to encode the 3D voxel data into one-dimensional (1D) vectors in the latent space, which is realized with a 3D convolutional neural network (CNN) autoencoder. The loss function of the autoencoder consists of two parts: the first is the loss of information, i.e., the difference between inputs and outputs, and the second is a regularization term to prevent overfitting, which yields

$$L = \left\| x - y \right\|^2 + \lambda \left\| \omega \right\|^2$$

where x and y are the input and output of the autoencoder, λ is the regularization coefficient, and ω denotes the weights of the autoencoder 17 . The construction of the autoencoders for the site and lattice voxels is the same, and both are trained with a 90%/10% training/test-set split. Both the site and the lattice voxel autoencoders achieve a high reconstruction ratio: the reconstructed structures are considered identical to the originals by the crystal structure comparison subroutine available in pymatgen 43 , with the fractional length tolerance being 0.2, the site tolerance 0.3, and the angle tolerance 5 degrees.
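The two-term loss can be sketched as follows. This is a numpy sketch under stated assumptions: the exact norm and reduction used by the authors are not specified beyond the two terms, and the inputs and weight shapes are hypothetical.

```python
# Sketch of the autoencoder loss: reconstruction error plus
# lambda * ||w||^2 regularization over all weights.
import numpy as np

def autoencoder_loss(x, y, weights, lam=1e-6):
    # Reconstruction term: squared difference between input x and output y.
    reconstruction = np.sum((x - y) ** 2)
    # Regularization term: curbs overfitting by penalizing large weights.
    regularization = lam * sum(np.sum(w ** 2) for w in weights)
    return reconstruction + regularization

x = np.array([0.2, 0.8, 0.5])
weights = [np.ones((2, 2)), np.ones(3)]  # toy weight tensors

# A perfect reconstruction (y == x) leaves only the small regularization term.
print(autoencoder_loss(x, x, weights))
```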
For the data transformation between real space and voxel space, the voxel grid used for atomic positions is 64×64×64, while that for the lattice parameters is 32×32×32; the σ used in the Gaussian function is always 0.15, and the function "gaussian_filter" from the scipy package is used 27 . The autoencoders are constructed with tensorflow in python. The sites autoencoder consists of 10 3D convolutional layers, i.e., 5 for the encoder and 5 for the decoder; the strides, except for the first and last ones, are 2×2×2; the activation function is leakyReLU; the Adam optimizer is used with a learning rate of 0.003, λ = 0.000001, β1 = 0.9 and β2 = 0.99 27 . Similarly, the lattice autoencoder consists of only 8 3D convolutional layers (4 for the encoder and 4 for the decoder), with all other parameters the same as for the previous one 27 . The detailed designs of these two models can be found in Supplementary Tables 1 and 2.

Training of DCGAN
The left half of Fig. 2(a) illustrates our implementation of DCGAN to generate crystal structures. First, both the 2D crystal graphs produced by the generator and the original 2D crystal graphs are fed into the discriminator, which is trained to distinguish between them. Afterwards, the generator is further trained, via back propagation of the feedback from the discriminator, to generate structures more similar to the original structures. By repeating these processes, the generator eventually evolves to generate structures that are indistinguishable (by the discriminator) from the original structures. Details on the construction of DCGAN can be found in Methods and the Supplementary Information.
The DCGAN model consists of a generator and a discriminator; both use the Adam optimizer with a learning rate of 0.002, β1 = 0.5 and β2 = 0.999. 2D convolutional layers, dropout and batch normalization are used in both models, while ReLU, leakyReLU, tanh and sigmoid are used as activation functions. The model is trained for 1,000,000 steps, taking 16 hours on a Quadro P2000 GPU. The designs of the generator and the discriminator are listed in Supplementary Tables 3 and 4, respectively. The latent space of the GAN model has 200 dimensions.

Training of constraint model
The constraint model consists of 4 2D convolutional layers and 6 fully connected layers, implemented in keras 44 . It also uses the Adam optimizer with a learning rate of 0.002, β1 = 0.5 and β2 = 0.999; the optimization loss is the mean squared error and the activation function is leakyReLU. To prevent overfitting, dropout and batch normalization are applied after each convolutional layer 45 . The detailed design of this model is described in Supplementary Table 5.

Training of CCDCGAN
The generator, discriminator and constraint have exactly the same design as in the previous models, and all parameters are the same as in DCGAN. The only difference is that the optimization objective of the generator additionally includes the weighted formation-energy term described above. The detailed design is given in Supplementary Table 6, and more data are available in Supplementary Data 2.

Data Availability statement
All data needed to produce the work are available from the corresponding author.

Code Availability statement
CCDCGAN code is available upon reasonable request from the corresponding author.