Estimating the effective fields of spin configurations using a deep learning technique

The properties of complicated magnetic domain structures induced by various spin–spin interactions in magnetic systems have been extensively investigated in recent years. To understand the statistical and dynamic properties of complex magnetic structures, it is crucial to obtain information on the effective field distribution over the structure, which is not directly provided by magnetization. In this study, we use a deep learning technique to estimate the effective fields of spin configurations. We construct a deep neural network and train it with spin configuration datasets generated by Monte Carlo simulation. We show that the trained network can successfully estimate the magnetic effective field even though we do not offer explicit Hamiltonian parameter values. The estimated effective field information is highly applicable; it is utilized to reduce noise, correct defects in the magnetization data, generate spin configurations, estimate external field responses, and interpret experimental images.


Results
Preliminaries of dataset and training. We select magnetization datasets that contain a variety of spin configurations but follow a definite rule that our network can learn. For this purpose, we use the magnetic labyrinth configurations of a two-dimensional magnetic system. Magnetic labyrinth configurations, as shown in Fig. 1a, vary in shape, but the structures sit at local energy minima and are therefore energetically and topologically stable. The local magnetic moment is aligned along the effective field. The strength of the magnetic moment is constant over the structure, whereas the strength of the local effective field varies spatially. The effective field strength is dominated by the exchange interaction, but small spatial variations arise from the Dzyaloshinskii-Moriya interaction (DMI) and the detailed labyrinth structure; Fig. 1a shows how the effective field strength varies in space. The strength of the effective field is thus hidden information that cannot be read directly from the spin configuration. In our study, we use a system that includes exchange interactions and DMI, and we train the network to infer the effective field produced by these two interactions. If our network is applied to a system with additional energy contributions, such as Zeeman energy or weak anisotropy, their effective field contributions can be added to correct the inferred effective field. If a dataset that explicitly contains these additional energies is used, the network can also be trained to estimate their effective fields directly.
The properties of the labyrinth magnetic structure have been extensively investigated, both numerically and experimentally 22-24,29,30. With a theoretical model, it is possible to calculate the effective field from the spin structure and to generate a magnetic structure with Monte Carlo simulation. Therefore, the magnetic labyrinth configuration provides a model system for evaluating the trained network and checking whether the network can estimate a physically plausible effective field from the structures. Details of dataset generation are explained in the "Methods" section.
In this study, we use a fully convolutional network (FCN) to estimate the effective field from the spin configuration. Figure 1b shows the schematic network training workflow. We feed the spin configurations from the simulation as the input, and the FCN is trained to estimate the effective field. An FCN can derive output from an input image of any size, even if it is trained with data of a specific size; thus, our network can estimate the effective field from a spin configuration of any size. Owing to this property, we can apply our network to magnetization images from experiments as well as to data generated by simulated annealing. Details of the network structure are discussed in the "Methods" section.
Characteristics of the trained network. During the network training process, the training loss and validation loss decrease to the order of 10^-5 (Fig. 2a). We first investigate the training results of the deep learning algorithm. This is done by estimating the effective fields from the spin configurations in four randomly chosen samples from the test dataset and analyzing the ratio between the true effective fields F_x,y,z and the estimated effective fields F*_x,y,z, with the subscript denoting the x, y, or z component of the field. In Fig. 2b, we see that F_x,y,z and F*_x,y,z have a strong linear correlation. The effective field information obtained from the trained network provides expanded information on the spin structure, and it can be manipulated to recover or evolve the spin structure. When a new spin structure is obtained from the effective field, the trained network can be used to infer its effective field again. To apply the effective field information, we use the recursive process presented in Fig. 3a. The recursive process is composed of feeding the input spin configuration to the trained network, generating a new spin configuration from the output effective field, and refeeding the spin configuration as a new input of the network. First, the trained network provides an estimate of the effective field; then, the effective field is modified according to the needs of the application. Additional fields such as external fields or fluctuations, which are not considered in the training dataset, can be included in the estimated effective field. With the effective field information, a new magnetization map is generated by an evolutionary method. In a statistical study, magnetic moments can be sampled from a thermal distribution.
In a dynamic study, they can be evolved using equations of motion such as the Landau-Lifshitz-Gilbert (LLG) equation so that they precess around the effective field inferred by our network. In our discussion, we use a spin evolution method in which the magnetic moments are immediately aligned parallel to the effective field at each step (greedy method) because it is the simplest method for evaluating our network. Details of the recursive process are explained in the "Methods" section. Figure 3b shows how the magnetic energy evolves in the simple recursive process in which no field modification is used and the greedy method is applied to the magnetic moments. The energy is calculated as the dot product of the spin vector and the effective field. When the energy calculated from the true effective field is compared with that calculated from the estimated effective field, the accuracy is approximately 99.95%. Although the network is only trained to estimate the effective field from the spin configuration, we find that the initial spin configuration evolves to a lower energy state during the recursive process. Some truncated magnetic structures become connected, and some connected structures become separated during the recursive process, lowering the total energy. Although there are topological energy barriers, indicated by the energy peaks in Fig. 3c, transitions among metastable states appear in the recursive process.
The reorganization of the magnetic structure is a notable result. In general, changing the topological structure requires a significant amount of energy, as each metastable state is located at a local energy minimum. Thus, considering that the training is performed only with thermal-fluctuation-free structures, it is interesting that escape from a local minimum state occurs naturally in the FCN's recursive process.
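The energy bookkeeping used above can be sketched in NumPy; `spins` and `field` are assumed to be (N, N, 3) arrays holding the unit moments and the (true or network-estimated) effective field, and the overall pairwise double-counting factor is omitted since only energy differences and ratios are compared:

```python
import numpy as np

def energy(spins, field):
    """Total magnetic energy as the negative dot product between each
    unit moment and its effective field, summed over the lattice."""
    return -np.sum(spins * field)

def energy_accuracy(spins, field_true, field_est):
    """Relative agreement between the energies computed from the true
    and the network-estimated effective fields."""
    e_true = energy(spins, field_true)
    e_est = energy(spins, field_est)
    return 1.0 - abs(e_est - e_true) / abs(e_true)
```

An accuracy near 1 (the paper reports approximately 99.95%) indicates that the estimated field reproduces the true energetics.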
We speculate that this occurs because the network is trained to estimate the effective field from many different metastable spin configurations, and the estimated value is not perfectly accurate; it therefore reflects the general features of the whole group of metastable states. As a result, the spin configuration can change to another plausible state during the recursive process, passing over the energy barriers between metastable states. If we apply our network to spin systems that lack a globally stable state due to frustration, we expect that various metastable states can be explored during the recursive process.
These initial attempts at the recursive process show that the network properly learns the general properties of the spin configurations during training. It tends to remove atypical features in the spin configurations and adjust them toward the general features learned in training. These characteristics enable us to apply the network to correct or modify spin configurations. In the following sections, we present several application methods that exemplify the advantages of these characteristics.
Application: noise removal and defect correction. One possible network application is denoising, a field in which artificial intelligence is utilized efficiently 31,32. To see whether our network is effective for this purpose, we intentionally inject random noise and defects into the spin configurations in our datasets. Random noise is injected into the spin configurations using S' = L2[S + αR], where S' and S are the noisy and noiseless spin configurations, respectively, R is a vector map of unit vectors randomly oriented in any direction, α is a coefficient controlling the amplitude of the random map, and L2 denotes the site-wise L2-normalization process. The representative case of α = 2.5 is shown in the leftmost column of Fig. 3d. When we feed the noisy spin configuration into the trained network, the noise is removed almost immediately, within a few iterations; the energy decrease indicates that the noise has been removed.
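A minimal NumPy sketch of this noise-injection step, assuming `spins` is an (N, N, 3) array of unit vectors:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Normalize each 3-vector on the lattice to unit length."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def inject_noise(spins, alpha, rng=None):
    """S' = L2[S + alpha * R]: add a randomly oriented unit-vector map R
    scaled by alpha, then renormalize every site to unit length."""
    rng = np.random.default_rng() if rng is None else rng
    r = l2_normalize(rng.normal(size=spins.shape))
    return l2_normalize(spins + alpha * r)
```

Larger α pushes the configuration further from the noiseless state while every moment remains a unit vector.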
We also intentionally place defect sites in the spin configuration dataset. Defects are injected by erasing the magnetization information in a specific region of the spin configuration. We use two types of defects: Defect I is made by erasing the middle rows of the data and filling the erased part with random unit vectors (center column of Fig. 3d), whereas Defect II is made by simply erasing a square-shaped center region of the data (rightmost column of Fig. 3d). When we feed the defect-containing spin configuration into the trained network, the defect regions are reconstructed so that the result shows a plausible spin configuration.
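The two defect types can be produced with simple array operations; the row indices and region size below are illustrative, not the values used in the paper:

```python
import numpy as np

def defect_rows(spins, row_lo, row_hi, rng=None):
    """Defect I: replace the middle rows with randomly oriented unit vectors."""
    rng = np.random.default_rng() if rng is None else rng
    out = spins.copy()
    r = rng.normal(size=out[row_lo:row_hi].shape)
    out[row_lo:row_hi] = r / np.linalg.norm(r, axis=-1, keepdims=True)
    return out

def defect_square(spins, size):
    """Defect II: erase (zero out) a square region at the center."""
    out = spins.copy()
    n = spins.shape[0]
    lo, hi = (n - size) // 2, (n + size) // 2
    out[lo:hi, lo:hi] = 0.0
    return out
```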
The recursive process of our trained network also lowers the spin configuration energy, as shown in Fig. 3e. The energy decrease is achieved by removing noise and reconstructing defects. From these results, we clearly observe that the trained network is capable of outputting plausible effective fields that can be used to construct a spin configuration even when the input magnetization map does not contain complete information. The output is built to have lower energy and hence becomes one of the most plausible states based on the training set information.
Application: extraction of hidden information from experimental data. Given that the trained network can estimate the effective field from a spin configuration without full information, we feed it simulated test data that contain only one component of the magnetization vector. In Fig. 4a, we see that the network successfully estimates all components (x, y, and z) of the effective field even when the input data contain only one (z) component of the spin configuration. This capability is fully exhibited when the network is applied to experimental data. Most magnetic microscopy techniques, such as scanning transmission X-ray microscopy (STXM) and magneto-optical Kerr effect (MOKE) microscopy, provide only one axial spin component; thus, it is necessary to infer the other directional components from it. To demonstrate this capability, we apply our network to actual experimental data in which only one magnetization component is measured. Figure 4b and c shows the results when the network input data are experimental magnetic domain images of a [Pt(3 nm)/GdFeCo(5 nm)/MgO(1 nm)]20 multilayer system. Detailed information about the experimental environment is given in a previous study 33. The magnetic domains shown in Fig. 4b and c are observed using STXM and MOKE microscopy, respectively.
We note that the effective field or the in-plane magnetization inferred by our network is valid only if the Hamiltonian used for the training data is applicable to the experimental system. Therefore, the method is suitable for cases where the underlying theoretical model is known but the experimental data do not provide complete information. In our case, all three components of the effective field are well estimated by the network, even though the experimental data are unnormalized, the image size differs from that of the training data, and only one axial spin component is given. The Hamiltonian used in our training includes the interfacial DMI, typical of Pt/GdFeCo/MgO multilayers 33. As a result, we see in-plane components in the effective field (Fig. 4b, c), as expected in systems where the interfacial DMI induces Néel-type domain walls.
Application: generative model. The trained network in this study also has the potential to generate new spin configurations as a generative deep learning model, as shown in Fig. 5a. Details of the generation recursive process are given in the "Methods" section. When we feed a random spin map to the network, the output becomes a plausible spin configuration within a few recursive iterations. Figure 5b shows that if we feed a different random map to the network, another spin configuration is output. From these results, the trained network can be considered a generative model that generates a different spin configuration whenever a different random map is seeded.
We compare spin configurations generated by several methods: the Monte Carlo (MC) method, the greedy method, and the recursive process of our trained network. In the MC method, we generate the spin configuration by dropping the temperature from above the Curie temperature to zero. In Fig. 5c, a spin configuration such as the ones used to train the network is generated only after thousands of iterations of the MC method. The greedy method is the MC method run at zero temperature. In the greedy method, the result of the final iteration (Fig. 5d) shows multiple skyrmions. Not only is the physical result different in this case, but the energy of this multiskyrmion state is also higher than that of the spin configurations in the training dataset.
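For reference, a single-spin-update Metropolis step for a nearest-neighbor Heisenberg + DMI model can be sketched as below. The local-field expression assumes the interfacial-DMI convention D_ij = D (ẑ × r̂_ij), with J = 1 and D = 0.3 taken from the Methods; these conventions are an assumption, not a quote from the paper. Setting the temperature to zero (accepting only energy-lowering moves) recovers the greedy method described above.

```python
import numpy as np

J, D = 1.0, 0.3                     # Methods: J / |D_ij| = 1 / 0.3
Z_HAT = np.array([0.0, 0.0, 1.0])

def local_field(spins, i, j):
    """Effective field at site (i, j) under periodic boundaries:
    F_i = J * sum_n S_n + sum_n S_n x D_in, with the assumed
    interfacial-DMI convention D_in = D * (z_hat x r_in) per bond."""
    n = spins.shape[0]
    f = np.zeros(3)
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        s = spins[(i + di) % n, (j + dj) % n]
        d_vec = D * np.cross(Z_HAT, np.array([di, dj, 0.0]))
        f += J * s + np.cross(s, d_vec)
    return f

def metropolis_step(spins, temperature, rng):
    """One Metropolis update: propose a random new direction at a random
    site and accept with probability min(1, exp(-dE / T)), where
    dE = -(S_new - S_old) . F_local. At T = 0 only downhill moves are
    accepted, which is the greedy method's rule."""
    n = spins.shape[0]
    i, j = rng.integers(n, size=2)
    s_new = rng.normal(size=3)
    s_new /= np.linalg.norm(s_new)
    d_e = -np.dot(s_new - spins[i, j], local_field(spins, i, j))
    if d_e < 0 or (temperature > 0 and rng.random() < np.exp(-d_e / temperature)):
        spins[i, j] = s_new
    return spins
```

Annealing corresponds to calling `metropolis_step` many times while lowering `temperature` from above the Curie temperature to zero.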
To quantitatively investigate whether the spin configurations generated by the recursive process are physically plausible, we compare the energies of the resultant states of this process with those of the greedy and MC methods (Fig. 5e). The energy values from the recursive process are minimized within a few iterations. In contrast, the greedy and MC methods require thousands of iterations to reach a sufficiently minimized energy state. This clearly shows that the generation method using the recursive process proposed in this study can generate a new metastable spin configuration at a much lower computational cost. Additionally, since we use an FCN, we can generate a new spin configuration of any size by feeding the network a random map of the desired size rather than the 128 × 128 size used for training. This again exemplifies the advantage of our network as a spin configuration generator.
Application: addition of external fields. Experimentally, it is well known that when an appropriate out-of-plane (z-direction) external field is applied, the labyrinth spin configuration changes to magnetic skyrmions before all magnetic moments become uniformly aligned as the out-of-plane field is further increased 34-37. Since the trained FCN estimates the effective field without any external field, we can include additional fields in the recursive process to observe how an additional external field modifies the original structure.
We add the external field in the field-modification step of the recursive process such that the total effective field used to produce the magnetization map in the next iteration becomes F' = F* + H ẑ. Other types of effective fields, such as an anisotropy field for weak anisotropy energy or a Langevin field for thermal fluctuations, can be added similarly when necessary. Details of the field addition in the recursive process are given in the "Methods" section. We use a labyrinth spin configuration as the initial state. As shown in Fig. 6a, the labyrinth structure starts to break into smaller domains when a field of H_z,ext = 0.03 is applied. At H_z,ext = 0.05, the skyrmion spin configurations appear. When the field is further increased, the skyrmion configuration gradually disappears (H_z,ext = 0.07) and the system gradually saturates out of plane (H_z,ext = 0.09). To confirm that these results are reasonable, we similarly apply the external field within the MC method (Fig. 6b), finding the spin configuration as the temperature is decreased from above the Curie temperature to zero while the external field is applied. The results of applying the field to the trained FCN and to the MC method are remarkably similar. Figure 6c shows the magnetization as a function of the external field; the two curves from our FCN and MC methods are almost identical. Although the network is trained only to estimate the effective field without any external field, we confirm that adding external fields to our method generates physically plausible states.
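The field modification in this paragraph amounts to a couple of array operations; a minimal sketch, with `field_est` standing in for the network output F*:

```python
import numpy as np

def apply_external_field(field_est, h_z):
    """Field modification step: F' = F* + H z_hat (add a constant
    out-of-plane field to the network-estimated effective field)."""
    field = field_est.copy()
    field[..., 2] += h_z
    return field

def greedy_evolve(field):
    """Spin evolution step: align every moment with the modified field."""
    return field / np.linalg.norm(field, axis=-1, keepdims=True)
```

Other contributions, such as an anisotropy or Langevin field, would be added to `field` in the same way before the evolution step.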

Conclusion
We devised a novel method based on a deep learning technique to estimate the effective field information of spin configurations. An FCN was trained using various spin configurations generated by a simulated annealing process. We confirmed that the trained network can estimate the effective fields of input spin configurations even though we did not provide the explicit Hamiltonian parameters used in the data generation process. Through the recursive process introduced in this study, we found a surprising feature of the trained network: it tends to make the output spin configurations more stable, or more plausible, than the input spin configurations. We utilized these features to devise several application methods for various purposes, such as noise reduction, defect correction, estimation of external field responses, and inference of hidden information from underinformed experimental data. Generating plausible spin configurations at a lower computational cost is also a possible application of the trained network, as presented in this study. We believe that the interesting properties and broad applicability of our method can be adopted as novel numerical tools in many other scientific research areas.

Methods
Dataset generation. The dataset is chosen to evaluate whether the network structure can properly estimate the effective fields from the spin configurations. The input data should be well characterized under certain conditions, while they should have a variety of structures. Therefore, in this study, we generate magnetic labyrinth configurations as a dataset. They have been extensively studied in two-dimensional magnetic systems due to the potential for new spin device applications, and it is well known that a phase transition to a skyrmion structure occurs with an external field. These properties provide advantages for evaluating our network.
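The labyrinth datasets are generated under a nearest-neighbor Heisenberg + DMI Hamiltonian, referred to below as Eq. (1). Since the equation itself is not reproduced in this excerpt, its presumed standard form (a reconstruction consistent with the symbols defined in the next paragraph, not a quote from the paper) is:

```latex
% Presumed form of Eq. (1): nearest-neighbor exchange plus DMI
\mathcal{H} = -J \sum_{\langle i,j \rangle} \vec{S}_i \cdot \vec{S}_j
              - \sum_{\langle i,j \rangle} \vec{D}_{ij} \cdot \bigl( \vec{S}_i \times \vec{S}_j \bigr)

% Effective field at site i, F = -grad_S H, as used in the text
\vec{F}_i = -\frac{\partial \mathcal{H}}{\partial \vec{S}_i}
          = J \sum_{j} \vec{S}_j + \sum_{j} \vec{S}_j \times \vec{D}_{ij}
```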
To implement two-dimensional magnetic systems, we use the Heisenberg spin model on a square lattice of size 128 × 128. The magnetic labyrinth configurations are generated under the Hamiltonian shown in Eq. (1), where S is a normalized spin vector, J is an exchange parameter, and D_ij is a DMI vector; i and j are spin site indices, and the summation runs over all nearest-neighbor pairs. The ratio between J and |D_ij| determines the length scale of the magnetic structure, and we choose J/|D_ij| = 1/0.3 so that sufficient structure fits within the simulation area. The effective field of the spin configuration is also obtained from Eq. (1) as F = −∇_S H. A simulated annealing process is used to generate various labyrinth spin configurations; the temperature of the system is gradually decreased from above the Curie temperature to zero. The total number of generated data points is 30,100, which we divide into three subdatasets: training, validation, and test, composed of 25,000, 5000, and 100 spin configurations, respectively.
Network structure and loss function. The goal of this study is to devise an algorithm for estimating the effective fields from the spin configurations using a deep learning technique. We construct a neural network that obtains the effective field from the input spin configuration. The structure is similar to an autoencoder, with an encoder and a decoder. The encoder, composed of four FCN layers with 8, 16, 32, and 64 filters, abstracts the spin configuration. The filter sizes are 3 × 3. Since our spin configuration dataset is generated under periodic boundary conditions, we add a periodic padding process in front of every FCN layer to train under the same conditions. After every FCN layer in the encoder, we attach a batch normalization layer, rectified linear unit (ReLU) activation, and a max-pooling layer with a pooling size of 3 × 3.
The decoder decodes the abstracted information into the effective field. It is constructed from four upsampling blocks, each composed of an upsampling layer with a 2 × 2 filter and an FCN layer with a 3 × 3 filter. The numbers of filters for the FCN layers in the upsampling blocks are 32, 16, 8, and 3, respectively. After the decoder, we add one final FCN layer with three filters. The periodic padding process used in the encoder is also added in front of every FCN layer in the decoder. A batch normalization layer and ReLU activation are attached after every FCN layer in the decoder except for the last one. The input and output data dimensions are the same, [400, 128, 128, 3]: the input data are hundreds of spin configurations generated under the Hamiltonian shown in Eq. (1), and the output data are hundreds of two-dimensional vector maps composed of three-dimensional vectors.
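The periodic padding placed in front of each FCN layer can be reproduced with NumPy's wrap mode (in PyTorch, for instance, `padding_mode='circular'` on `nn.Conv2d` plays the same role); a one-pixel pad matches the 3 × 3 filters:

```python
import numpy as np

def periodic_pad(x, pad=1):
    """Wrap a (H, W, C) feature map around the lattice edges so that the
    following 3x3 convolution sees the same periodic boundary conditions
    as the simulated spin configurations."""
    return np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="wrap")
```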
We want to train our network so that the output vector maps become the effective fields of the input spin configurations. Therefore, the mean squared error (MSE) ‖F − F*‖² is used as the loss function, where F is the true effective field and F* is the estimated effective field. The difference between the true effective fields of the input spin configurations and the output vector maps is used as the total loss of our network, which is minimized during the training process. Minimizing the total loss means that the output vector maps become identical to the true effective fields of the input data; thus, after training, our network can appropriately estimate the effective fields of the input data. The Adam optimizer is adopted to minimize the total loss with a learning rate of 0.01.
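The loss in the paragraph above reduces to one line; `f_true` and `f_est` are assumed to be batched field maps of shape (batch, 128, 128, 3):

```python
import numpy as np

def mse_loss(f_true, f_est):
    """Mean squared error ||F - F*||^2, averaged over batch, lattice
    sites, and vector components."""
    return np.mean((f_true - f_est) ** 2)
```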
Recursive process. In the recursive process shown in Fig. 3a, the spin configuration is fed as input data, and the trained FCN estimates the effective field. In the field modification step, we can change the field as desired. In most discussions, we do not modify the field (F' ← F*), but in the final discussion on the effect of the external field, we add a constant field to the field from the FCN (F' ← F* + H ẑ). The effective field is then used to change the spin configuration in the spin evolution step, and suitable methods can be applied depending on the purpose. In our study, we simply align the spin direction parallel to the effective field, S' ← F'/|F'|. This process is repeated until the output condition is satisfied.
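Putting the three steps together, the loop can be sketched as follows; `trained_fcn` is a stand-in for the trained network, i.e., any callable mapping an (N, N, 3) spin map to an (N, N, 3) effective field map (a hypothetical placeholder, not the paper's actual model):

```python
import numpy as np

def recursive_process(spins, trained_fcn, n_iter=10, h_ext=None):
    """Repeat the recursive process: estimate F* with the network,
    optionally modify the field (F' = F* + H z_hat), then align every
    spin with F' (greedy spin evolution, S' = F'/|F'|)."""
    s = spins.copy()
    for _ in range(n_iter):
        f = trained_fcn(s).copy()        # effective field estimation
        if h_ext is not None:
            f[..., 2] += h_ext           # field modification step
        s = f / np.linalg.norm(f, axis=-1, keepdims=True)  # spin evolution
    return s
```

In the paper, other evolution schemes (thermal sampling, LLG dynamics) could replace the greedy alignment line.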