Optimization of physical quantities in the autoencoder latent space

We propose a strategy for optimizing physical quantities based on exploring in the latent space of a variational autoencoder (VAE). We train a VAE model using various spin configurations formed on a two-dimensional chiral magnetic system. Three optimization algorithms are used to explore the latent space of the trained VAE. The first algorithm, the single-code modification algorithm, is designed for improving the local energetic stability of spin configurations to generate physically plausible spin states. The other two algorithms, the genetic algorithm and the stochastic algorithm, aim to optimize the global physical quantities, such as topological index, magnetization, energy, and directional correlation. The advantage of our method is that various optimization algorithms can be applied in the latent space containing the abstracted representation constructed by the trained VAE model. Our method based on latent space exploration is utilized for efficient physical quantity optimization.

www.nature.com/scientificreports/ To generate the labyrinth spin configurations, we implement the two-dimensional magnetic system using a square lattice model composed of 128 × 128 grid sites and consider a simple Hamiltonian model, where J is the exchange interaction parameter, − → D ij is the Dzyaloshinskii-Moriya (DM) interaction vector 20,21 , and − → S i is a Heisenberg spin on the i-th grid site. The exchange and DM interaction parameters are fixed to J = 1 and − → D ij = 0.3 . The system was carefully studied, so that we can investigate and interpret the results from our algorithm by relying on those of previous studies [20][21][22][23][24] . The dataset is generated by an SA process, which has been used to simulate a temperature annealing process of magnetic systems in several previous studies 5,19,25 . Since the detailed magnetic textures are determined by the spontaneous symmetry breaking process, they can have countless different metastable states in a fixed condition. Using this process, we generate a training dataset consisting of 30,000 labyrinth spin configurations, which is called X IN in this study, and some samples are shown in Fig. 1a.
Since our purpose is to find optimization solutions, it is important that the training data are sampled from entire data distribution. The SA process is performed by heat bath algorithm in Monte-Carlo simulations 26 , a carefully studied method. This method can generate a variety of samples due to the contribution of random numbers of it.
Training VAE and generating new data. To extract the appropriate features implied in the labyrinth spin configurations, we construct the VAE network as shown in Fig. 1a. Specifically, the encoder of the VAE compresses X IN into two groups composed of N numbers each ( N = 100 in this study). These are used as the mean ( µ ) and standard deviation ( σ ) values to construct N normal distributions, and one number is sampled from each of the normal distributions ( N numbers are sampled in total). In general, the set of sampled numbers, z , is called the latent code; z is considered an N-dimensional vector represented in the latent space. Through the decoder, z is decoded into an output spin configuration, X OUT .
We train the VAE network using our spin configuration dataset. In other words, the network parameters are modified to minimize a loss function, L VAE , shown in Eq. (1), where L RC is the reconstruction loss term, L KL is the Kullback-Leibler (KL) loss term, and β is the coefficient of L KL 27 . The two loss terms, L RC and L KL , are expressed in Eq. (2), where S IN,i,α and S OUT,i,α indicate the α component of spins at the i-th grid site of X IN and X OUT , respectively. µ n and σ n are the n-th components of µ and σ , respectively.
(1)  www.nature.com/scientificreports/ To minimize L RC , the VAE model produces X OUT , which is similar to X IN . L KL measures how a given probability distribution is different from a reference probability distribution. In this study, the purpose of L KL is to make the probability distribution of z equal to the standard normal distribution; as L KL decreases, the values of σ n and µ n approach 1 and 0, respectively. β is chosen to be 0.001, as described in the Methods section.
After the training process, we can generate various spin configurations by decoding the latent codes that are sampled from the standard normal distribution. The generation process is illustrated schematically in Fig. 1b, and the spin configurations generated using this process are called X VAE s in this study. The X VAE s have similar characteristics to the spin configurations in our training dataset. This indicates that the VAE network is welltrained to learn the magnetic characteristics implied in the training dataset, such as the spin profiles, length scale, and chirality.
There are several nodal points in the X VAE s, which are visually unfeasible and not found in the training dataset. We can also quantitatively evaluate the feasibility by calculating the energy of X VAE since the energy is the generating rule of the training dataset. The energy density of nodal points is higher than that of other plausible pixels, as indicated by the energy peak points in Fig. 1c. Unstable points are inevitable from the direct interpretation of the sampled z , since the latent space is continuous while the stable structures are topologically separated. This problem of directly applying VAE to a topologically discrete structure was reported in a previous study 10 .
Latent space exploration algorithms. To generate physically plausible data or to obtain the optimized structure using the trained VAE decoder, we attempt three optimization algorithms exploring the latent space. The first is a simple optimization algorithm, which is called the single-code modification algorithm in this study. The algorithm pseudocode is shown in Fig. 2a. We prepare a latent code, z * , composed of N trainable variables and initialize it with z * = z , where z is randomly sampled numbers from the standard normal distribution. z * is decoded into a spin configuration using the trained decoder and evaluated with its energy. The Adam optimizer 28 modifies z * to minimize the spin configuration energy. This process is repeated until the energy stabilizes. The main idea of the single-code modification algorithm is to search the latent space to find a better solution near the position indicated by the selected initial latent code. Thus, the states generated by the single-code modification algorithm are locally optimized states in the latent space, and they are not uniquely determined.
If we want to find a state representing a global optimum state, the whole latent space should be searched. To utilize the idea for the general optimization problem, we apply a genetic algorithm instead of the single-code modification algorithm. The genetic algorithm is one of the conventional optimization methods imitating biological evolution. In the algorithm, there is a population composed of many individuals, and the population evolves to optimize an objective goal. Every individual consists of their genes. Individuals are evaluated with the objective function and selected to pass on their genes to the next generation with a probability calculated from the evaluation. Crossover and mutation operations are performed to combine the genes of the selected individuals and to maintain the genetic diversity, respectively. The pseudocode of the genetic algorithm used in this study is shown in Fig. 2b. We consider z * as an individual and every number z * i of z * is considered its gene. Our population is composed of 100 individuals, z * s, which are randomly initialized. In every iteration of the genetic algorithm www.nature.com/scientificreports/ process, z * s are decoded and evaluated with an objective function. In this study, we use several quantities that can be calculated from a spin configuration as the objective function, which is discussed in the Results section. The third algorithm, which is the stochastic algorithm, is illustrated in Fig. 2c. The algorithm is similar to the single-code modification algorithm but is designed for global optimization. In this case, we update µ * and σ * instead of z * , which denotes the mean and standard deviation values of normal distributions. µ * and σ * are each composed of N trainable variables, and they are initialized with 0 s and 1 s, respectively. In every training iteration, hundreds of z * are sampled from N normal distributions constructed using µ * and σ * values, and the z * s are decoded into output spin configurations using the trained decoder. They are evaluated with a given objective quantity, Q , similar to the case of the genetic algorithm. The Adam optimizer updates the µ * and σ * to minimize the function, L = �Q� + γ L * KL , where L * KL is a term similar to the KL loss term shown in Eq. (2) except that it is calculated using µ * and σ * , and γ is the coefficient of L * KL . L * KL is adopted to control the diversity of the z * s, and it is expected that this diversity can prevent the distribution from being trapped in local extremum points in the latent space. As the iteration progresses, we gradually decrease γ so that the distribution slowly converges to the optimal solution in the latent space.

Results
Single-code modification algorithm. The purpose of this algorithm is to generate new structures that have similar characteristics to those in the training data, and Fig. 3a shows a schematic illustration of the algorithm. As described earlier, X VAE s decoded from z s have many nodal points that are not included in the training data, which means that direct sampling from the standard normal distribution is not enough to generate physically plausible states. Hence, we perform the single-code modification algorithm to search the better positions near z s, which can be decoded into energetically plausible states. Figure 3b shows some examples of X VAE and X ′ VAE , where X ′ VAE represents the modified spin configuration after applying this algorithm. As indicated by the black arrows, the spin structures around the nodal points are revised to stable structures; the nodal points are removed. Figure 3c shows the distributions of energy density, ε , of X IN , X VAE , and X ′ VAE . The distribution of X ′ VAE is similar to that of X IN , but there is a clear difference for the case of X VAE . This also indicates that we can generate spin configurations analogous to X IN not only structurally but also physically and energetically using the single-code modification algorithm.
Of course, it is possible to incorporate the energy of X VAE s in the loss function of VAE so that X VAE s are forced to be energetically stable. A previous study 10 shows that this approach is available at generating feasible and energetically stable samples in a magnetic system. Genetic algorithm. Genetic algorithms are known as efficient optimization methods, but their effectiveness is strongly dependent on how we define the individuals and genes in a given system. In this study, a latent code, which is the abstracted information of a spin configuration, is considered an individual, and the components of the latent code, which characterize the properties of the spin configuration, are considered the genes. By doing so, we perform genetic algorithm in a compressed space of R 100 rather than the original space R 128×128×3 , reducing computational cost.
A schematic explanation of this algorithm is shown in Fig. 4a. We start from 100 individuals initialized with random numbers sampled from the standard normal distribution. To determine the probability that an individual will be selected, the spin configuration decoded from the individual is evaluated by a certain objective quantity. The mutated gene, an instance of an offspring, is sampled from the Gaussian distribution, so that the offspring keeps in the learned distribution by the VAE, following the general features of training spin configurations.  Figure 4b,c,e,f show the best individuals of each generation evolved by the genetic algorithm to maximize W , maximize M z , maximize C , and minimize ε , respectively.
As shown in Fig. 4b, the spin configuration is changed from the labyrinth structure to the skyrmion structure 29 by maximizing W . The spin configuration of the best individual in the last generation almost forms a regular lattice of skyrmions, and it is a noticeable result because the skyrmion lattice structure is not included in the training set of X IN s; the skyrmion lattice structure is one of the possible metastable states of the system, but it is statistically impossible to generate it using an SA process due to the existence of numerous metastable states of the system. This means that the VAE model has learned the general features of the system, such as the chirality, the width of magnetic domains, and sinusoidal profiles of local spin structures, and a set of optimal features to maximize W is properly searched through the genetic algorithm in the latent space of the trained VAE.
Interestingly, a similar skyrmion structure is obtained by maximizing M z , as shown in Fig. 4c. In numerous studies 24,30-32 , it is commonly observed that the labyrinth structure becomes a magnetic skyrmion configuration when an external field is applied to the out-of-plane direction of the two-dimensional magnetic system. Since the out-of-plane external field usually increases M z to reduce the Zeeman energy, it is reasonable to show similar structures as the result of maximizing M z and W in our methods. Figure 4d shows the distributions of W values calculated from the spin configurations evolved by the genetic algorithm to maximize W and M z for each. We find that the W values from both optimizations are increased compared with those from the initial generation.
It is also noticeable that the optimization is performed within the bounds of the system, producing feasible and realizable results as commonly observed in experimental research 24,31 . Since W counts the number of skyrmion structures in a spin configuration, breaking the spin structures into smaller structures is one way to maximize W , yet this is not compatible with the Hamiltonian of the system as well as not feasible in experimental observations. M z can also be maximized by generating a single out-of-plane domain along the z direction, but this is also incompatible with the Hamiltonian. Although these are very intuitive methods for maximizing W and M z , our method takes more suitable approaches to the Hamiltonian of the system during the optimization processes, as shown in Fig. 4b-d. This shows that our method is a physically plausible optimization method working within the constraints of a system.
By maximizing C , we obtain a well-aligned stripe pattern as shown in Fig. 4e; the stripes are aligned along the direction of a . A well-aligned stripe pattern is also generated by minimizing ε , as shown in Fig. 4f. Similar to the skyrmion cases discussed above, it is also noticeable that the results of both optimization processes, maximizing C and minimizing ε , are the well-aligned stripe patterns because there are no such spin configurations in our www.nature.com/scientificreports/ training dataset. The stripe pattern is the ground state of this system 19 . Nevertheless, generating the ground state through an SA process is significantly difficult due to the existence of countless labyrinth patterns that are the metastable states of the system. The ε of both results, obtained by maximizing C and minimizing ε , decrease, as shown in Fig. 4g. Considering our method can search out the well-aligned stripe patterns that are not given in the training process, we believe that our method can be utilized as a novel approach to finding uncharted ground states of various systems. As discussed in "Training VAE and generating new data" section, energy is a proper indicator to evaluate the feasibility of spin configurations. Therefore, it can be interpreted that we ranked the offspring based on the feasibility while optimizing the energy with the genetic algorithm.
Our results show that VAEs are advantageous for applying genetic algorithms to optimization problems. Results of the algorithm are optimal solutions following the general rules of the system. Various metrics, such as discriminator networks of GANs or pre-trained classifiers, can be adapted for those purposes.
Stochastic algorithm. In this algorithm, there is a Gaussian distribution from which the latent codes,z * , are sampled, and the mean, µ * , and standard deviation, σ * , of the distribution are updated by the Adam optimizer to optimize an objective quantity. For each optimization iteration, we sampled 100 z * s from the distribution and decoded them to calculate an objective quantity similar to that of the genetic algorithm. As the algorithm progresses, the distribution gradually converges to a location in the latent space of the trained VAE, as schematically shown in Fig. 5a. We repeat this algorithm until the distribution converges sufficiently to a location and found that the latent codes sampled around the location can be decoded as the optimal solutions to given optimization problems.
Consequently, the skyrmion structures and perfectly aligned stripe pattern are obtained using this algorithm for the M z maximization and ε minimization objectives, respectively, as shown in Fig. 5b,c. Compared with the results from the genetic algorithm, we find that the spin configurations obtained using this stochastic algorithm are more realistic in the visual inspection; in Fig. 5b,c, there are no nodal points and wiggles which are shown in the images for the last generations in Fig. 4b,c,e,f. Quantitatively, the maximum energy density of Fig. 5b is approximately −1.88 , which is much lower than the energy peak ( ∼ −0.11) in Fig. 1c. Note that we do not give any bias for reality or energetic stability in the optimization process.
Since VAEs model the given data, it is known that interpolation using VAEs works successfully providing reasonable results 33 . Our optimization results can also be understood as combinations of training data. For example, the skyrmion structures (see Figs. 4b,c, 5b) might be considered combinations of many training data that occasionally contain a few skyrmions. Nevertheless, as discussed in "Genetic algorithm" section, the skyrmion structure is outside the given data, not only in a visual inspection but also in physical quantities. Figure 5f shows the M z and W of spin configurations. We found that the relation between M z and W of X IN is maintained in our optimized solution (b), even though it is an extreme situation. We confirm that it is possible to obtain desired optimal solutions using the stochastic algorithm, keeping the plausibility with the help of the VAE trained on samples.
We investigate whether this algorithm can be applied not only to a system with periodic boundary conditions but also to a geometrically confined system with a specific boundary shape. For this purpose, the objective quantity is set to be the correlation between a decoded spin configuration and the specific boundary condition map shown in Fig. 5d, C BC = �S BC (r) · S(r)� , where S BC indicates the boundary condition. The boundary condition is illustrated in Fig. 5e. We perform an optimization process using this stochastic algorithm to maximize C BC . Since the spins in the boundary condition map are located only at the round narrow strip, maximizing C BC presents geometrical confinement on the decoded spin configurations as shown in Fig. 5d. We confirmed that the spin configuration is in an energetically stable state, which means that this stochastic algorithm is applicable to find physically plausible states in various systems regardless of boundary conditions. www.nature.com/scientificreports/ The idea of exploring the latent space of a VAE for generating optimized samples of a distribution given cost functions is innovative and promising as our results demonstrate. Our methods, single-code modification, genetic, and stochastic algorithms, are generally applicable if you can train deep generative models that provide compressed representations of the real world. Given the complexity of the distribution of the problem, the number of instances may increase substantially, which needs to be considered. Also, it must be noted that the success of optimization with trained deep generative models can be applications dependent. For image generation and optimization, the success might be more expected because there has been much research and development. For other systems, it can be challenging to train reliable VAEs and obtain optimal solutions. If the trained deep generative model does not contain the desired optimal solution, to find the solution in the latent space is hardly expected. It is widely known that prescribed models such as VAEs have better data coverage than implicit models such as GANs 34 . We recommend choosing and training the deep generative model considering its performance in terms of data coverage.

Conclusion
In this study, we trained generative model, the variational autoencoder (VAE), to extract the features of a twodimensional magnetic system. We implemented three algorithms, the single-code modification algorithm, genetic algorithm, and stochastic algorithm, to solve various optimizing problems by exploring the latent space of the trained VAE. The single-code modification algorithm obtains a metastable spin configuration by locally optimizing the energy minimization problem. The genetic algorithm and stochastic algorithm are applied to various global optimization problems, such as minimizing energy and maximizing topological numbers. We confirmed that, even though the optimal solutions are not given explicitly in the VAE model training process, the optimization algorithms successfully determine the optimal solutions in the latent space regardless of the boundary conditions of the system. We suggest that our method based on latent space exploration can be applied to various other systems to optimize their properties.

Methods
The network structure of VAE. The encoder and decoder of the VAE consist of fully connected neural networks (FCNNs) and convolutional neural networks (CNNs). Figure 6 shows the details of our architectures. The encoder is composed of four CNN layers with 16, 32, 64, and 128 filters and two FCNN layers with 512 and 200 neurons. After all CNN layers of the encoder, max-pooling layers with a 2 × 2 pooling size are adopted. The decoder is composed of two FCNN layers with 512 and 8,192 neurons, the reshape layer, CNN layers with 64, 32, 16, 3, 3 kernels, and the L2-normalization layer. Upsampling layers with 2 × 2 upsampling sizes are applied before all CNN layers of the decoder, except the last CNN layer. Batch normalization layers and leaky-ReLU activation layers are applied immediately after all FCNN and CNN layers, in both the encoder and the decoder, except for the last layer of each. Since the dataset is generated under the periodic boundary condition, periodic padding is applied immediately before all CNN layers.
Detailed VAE training process. We used 30,000 spin configurations as a training dataset of VAE and used 5000 spin configurations as a valid dataset during training. The VAE is fed a batch of data at a time, where the batch consists of 100 spin configurations. The Adam optimizer with the learning rate 10 −3 modifies the network parameters to minimize L VAE every time the batch goes through the VAE. The β 1 and β 2 of the Adam optimizer are fixed at 0.9 and 0.999 , respectively. The training progresses for 100 epochs.
To select the appropriate coefficients of L KL , β , we trained four VAE models with β values of 0.1 , 0.01 , 0.001 , and 0.0001 . VAEs with β larger than 0.001 are excluded because of the posterior collapse problem 35 . We need a sufficiently compressed latent space so that the standard normal distribution represents the whole latent space. Thus, we chose the VAE of β at 0.001 , which represents a latent code closer to the standard normal distribution.
Single-code modification algorithm. The Adam optimizer with the learning rate 10 −2 is used for the single-code modification. Other hyperparameters are identical to the Adam optimizer used for VAE training. The single-code modification algorithm is repeated for 10,000 iterations. www.nature.com/scientificreports/ Genetic algorithm. The genetic algorithm starts with 100 latent codes sampled from a standard normal distribution. Roulette wheel selection, uniform crossover, and random reset mutation operations are sequentially applied to construct a single new individual. The roulette wheel selection is applied to choose two parents 36 . We give the selection pressure differently depending on the objective quantity, as the selection pressure is a hyperparameter of the genetic algorithm. The selection pressure is given by 8 to optimize W and M z , and by 5 to optimize C and ε . Uniform crossover is applied, which means that each gene z * i is selected randomly from one of the corresponding genes of the parents 37 . Random reset mutation is applied so that each gene has a probability of being replaced by a new number generated from the standard normal distribution 38 . The mutation error, the probability that a gene will be mutated, is chosen by 0.01 . The process is repeated for 10,000 iterations.
Stochastic algorithm. An Adam optimizer with the same hyperparameters as used in the single-code modification algorithm is used for the stochastic algorithm. γ starts at 1 and is gradually decreased to a tenth after every 10,000 iterations. The process continues until σ * approaches zero so that the sampled latent codes are all the same.