Accelerated spin dynamics using deep learning corrections

Theoretical models capture the behaviour of magnetic materials at the microscopic level very precisely, so computer simulations of magnetic materials, such as spin dynamics simulations, can mimic experimental results accurately. Approaches to efficient spin dynamics simulations are, however, limited by the integration-time-step barrier in solving the equations of motion of the many-body problem. Using a short time step leads to an accurate but inefficient simulation, whereas using a large time step leads to an accumulation of numerical errors that renders the whole simulation useless. In this paper, we use a Deep Learning method to compute the numerical error of each large time step and use these computed errors as corrections to achieve higher accuracy in spin dynamics. We validate our method on the 3D ferromagnetic Heisenberg cubic lattice over a range of temperatures. We show that the Deep Learning method can accelerate the simulation by a factor of 10 while maintaining its accuracy, thereby overcoming the small-time-step limitation of spin dynamics simulations.

Deep Learning has also been applied to the simulation of quantum spin dynamics 26,27, to the identification of phase transitions 28, and to taming the exponential complexity of the many-body problem in quantum systems 29.
In this paper, we show that a speed up is achieved by combining spin dynamics simulation with Deep Learning that learns the error corrections. The first condition for speed up is that the Deep Learning model has enough capacity to learn the associations between spin configurations generated with large time steps and spin configurations generated with accurate short time steps. The second condition is enough training data, i.e., showing the model enough pairs of spin configurations evolved with large and short time steps. We propose to use Deep Learning to estimate the error correction terms of the Suzuki-Trotter decomposition method and then add the correction terms back to the spin dynamics results, making them more accurate. As a result of this correction, a larger time step can be used in the Suzuki-Trotter decomposition method, with corrections applied at each time step. To evaluate our Deep Learning method, we analyze the spin-spin correlation as a stringent measure of accuracy. We also use thermal averages to benchmark the performance of our method. We compare the Deep Learning results with those from spin dynamics simulations without Deep Learning at short time steps.
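To make the procedure concrete, the following is a minimal Python sketch of this correction loop. The names integrate_large_step and error_model are illustrative placeholders for any one-step Suzuki-Trotter integrator and any trained network that maps the coarse step to its error term; the final renormalization to |S_i| = 1 is our assumption, not a step stated in the paper.

```python
import numpy as np

def corrected_trajectory(sigma0, integrate_large_step, error_model,
                         tau=1e-1, n_steps=100):
    """Advance a spin configuration with large time steps, adding the
    learned error correction after every step.

    sigma0               -- array (L, L, L, 3) of unit spins
    integrate_large_step -- callable (sigma, tau) -> coarse next state
    error_model          -- callable (sigma, sigma_coarse) -> correction term
    """
    sigma = sigma0
    trajectory = [sigma]
    for _ in range(n_steps):
        sigma_coarse = integrate_large_step(sigma, tau)           # cheap, large step
        sigma = sigma_coarse + error_model(sigma, sigma_coarse)   # add learned error
        # renormalize so every spin stays a unit vector (our assumption)
        sigma = sigma / np.linalg.norm(sigma, axis=-1, keepdims=True)
        trajectory.append(sigma)
    return trajectory
```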

Methods
Heisenberg model. The ferromagnetic Heisenberg model on a cubic lattice is used to demonstrate the efficiency of our method. The Hamiltonian for this model is given as

$$H = -J \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j,$$

where the vector S_i has three components (S_i^x, S_i^y, S_i^z) and is a unit vector, |S_i| = 1. We formalize our spin dynamics following the notation of Tsai et al. 16. We write the equations of motion for all spins as

$$\frac{d}{dt}\,\sigma(t) = \hat{L}\,\sigma(t), \qquad (1)$$

where σ(t) = (S_1(t), S_2(t), ..., S_n(t)) is the spin configuration at time t and L̂ is the Liouville operator. The integration of the equations of motion in Eq. (1) is done using the second-order Suzuki-Trotter decomposition method as in Tsai et al. 16. Following their notation, we decompose the evolution operator R into R_A and R_B acting on the sublattices A and B respectively, and obtain

$$\sigma(t + \tau) = R_A(\tau/2)\, R_B(\tau)\, R_A(\tau/2)\, \sigma(t) + O(\tau^3). \qquad (2)$$

The ferromagnetic Heisenberg model is considered on a cubic lattice of dimensions L × L × L with periodic boundary conditions. This model undergoes a phase transition at the temperature k_B T_c/J = 1.442... 30, where k_B is Boltzmann's constant. In the spin dynamics approach, the equation of motion for the ith spin of the Heisenberg model is

$$\frac{d\mathbf{S}_i}{dt} = \mathbf{S}_i \times \mathbf{H}_i^{\mathrm{eff}},$$

where H_i^eff is the effective field acting on the ith spin. The k component of the effective field is H_{i,k}^eff = -Σ_{j=nn(i)} S_j^k, where the sum runs over the nearest neighbours of site i and k = x, y, and z.
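As an illustration of Eq. (2), the sketch below implements one second-order Suzuki-Trotter step in NumPy for the simple cubic lattice with the checkerboard split into sublattices A and B (J = 1 units, even L assumed). Each spin is rotated exactly about its instantaneous local field via the Rodrigues formula; the function names and the precession sign convention are ours.

```python
import numpy as np

def local_field(spins):
    """Sum of the six nearest-neighbour spins on a periodic cubic
    lattice; spins has shape (L, L, L, 3)."""
    field = np.zeros_like(spins)
    for axis in range(3):
        field += np.roll(spins, 1, axis=axis) + np.roll(spins, -1, axis=axis)
    return field

def rotate_sublattice(spins, mask, tau):
    """Rotate the spins selected by mask about their local fields by the
    angle |omega| * tau (exact precession, Rodrigues formula). On a
    bipartite lattice the field at an A site depends only on B spins."""
    omega = local_field(spins)                       # J = 1 units
    norm = np.linalg.norm(omega, axis=-1, keepdims=True)
    axis = omega / np.where(norm == 0.0, 1.0, norm)  # guard zero fields
    theta = norm * tau
    cos, sin = np.cos(theta), np.sin(theta)
    dot = np.sum(axis * spins, axis=-1, keepdims=True)
    rotated = spins * cos + np.cross(axis, spins) * sin + axis * dot * (1.0 - cos)
    return np.where(mask[..., None], rotated, spins)

def suzuki_trotter_step(spins, tau):
    """One second-order step: A for tau/2, then B for tau, then A for
    tau/2, as in Eq. (2); the local error per step is O(tau^3)."""
    L = spins.shape[0]
    x, y, z = np.indices((L, L, L))
    A = (x + y + z) % 2 == 0                         # checkerboard sublattice A
    spins = rotate_sublattice(spins, A, tau / 2)
    spins = rotate_sublattice(spins, ~A, tau)
    spins = rotate_sublattice(spins, A, tau / 2)
    return spins
```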
Deep Learning approach. A fully supervised Deep Learning method is developed to reduce the simulation errors of spin dynamics performed with the second-order Suzuki-Trotter decomposition method. To produce training data for supervised Deep Learning, initial spin configurations are sampled from ordered, near-critical, and disordered states in the temperature range k_B T/J ∈ [0.5, 2.4]; 9.1 × 10^5 independent spin configurations are generated using Monte-Carlo simulations with the Metropolis-Hastings algorithm 30-33.
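For reference, a minimal sketch of this sampling step is shown below: single-site Metropolis updates for the classical Heisenberg Hamiltonian with periodic boundaries. The fresh-random-unit-vector proposal and the function names are our assumptions, not necessarily the authors' exact sampler.

```python
import numpy as np

def random_unit_vector(rng):
    """Uniform direction on the unit sphere."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

def neighbour_sum(spins, x, y, z):
    """Sum of the six nearest-neighbour spins with periodic boundaries."""
    L = spins.shape[0]
    return (spins[(x + 1) % L, y, z] + spins[(x - 1) % L, y, z] +
            spins[x, (y + 1) % L, z] + spins[x, (y - 1) % L, z] +
            spins[x, y, (z + 1) % L] + spins[x, y, (z - 1) % L])

def metropolis_sweep(spins, beta, rng, J=1.0):
    """One lattice sweep of single-site Metropolis updates for
    H = -J sum_<i,j> S_i . S_j; beta = 1 / (k_B T)."""
    L = spins.shape[0]
    for _ in range(L ** 3):
        x, y, z = rng.integers(0, L, size=3)
        proposal = random_unit_vector(rng)
        # energy change from replacing the spin at (x, y, z)
        dE = -J * np.dot(proposal - spins[x, y, z], neighbour_sum(spins, x, y, z))
        if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
            spins[x, y, z] = proposal
    return spins
```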
The initial spin configurations comprise 300,000 samples in ordered states, 210,000 samples in near-critical states, and 400,000 samples in disordered states, prepared with the simulated annealing method. The temperature annealing scheme is described in more detail in the supplementary information. The annealing temperature is gradually lowered from high to low values, and Monte Carlo data are always taken from equilibrium configurations. For each sampled initial spin configuration σ_i, two sets of spin dynamics simulations are performed with the time steps τ_1 = 10^{-1} and τ_3 = 10^{-3}, as illustrated in Fig. 1a. A typical integration time step for the second-order Suzuki-Trotter method is τ = 0.04, so τ = 10^{-3} gives an accurate simulation.

The spin configurations σ_i and σ_i^{(10^{-1})} are used as the inputs to U-Net 34, a kind of convolutional neural network. The U-Net is a proven architecture for image segmentation as well as for extracting subtle features. The detailed structure of the U-Net is shown in Fig. 1b. In the architecture used for the 8 × 8 × 8 cubic lattice, convolutional layers form an encoder (upper left side), followed by a decoder (upper right side) consisting of upsamplings and concatenations with the corresponding feature maps from the encoder. We add fully connected (FC) layers at the bottom of the network, between the encoder and the decoder, to efficiently weight particular features in the encoder's feature maps, such as capturing more information about spin-spin interactions. There are C = 6 input channels, obtained by concatenating the spin coordinates S_x, S_y, and S_z of both σ_i and σ_i^{(10^{-1})}. The inputs of the U-Net are reshaped to a cubic grid vector map of dimensions [D, L, L, L, C], where D is the total number of training samples, L is the lattice size, and C is the number of input channels. The encoder consists of repeated pairs of convolutional layers with 3 × 3 × 3 filters, each pair followed by 2 × 2 × 2 max pooling. A reshaping function flattens the feature maps from dimensions [D, ...] before the FC layers. Every step in the decoder consists of an upsampling layer with a 2 × 2 × 2 filter, followed by a repeated pair of convolutional layers with 3 × 3 × 3 filters and concatenation with the correspondingly cropped feature maps from the encoding layers. Periodic boundary conditions are also applied to the convolutional layers.
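The periodic boundary conditions in the convolutional layers can be implemented with circular padding, as in the PyTorch sketch below (channels-first layout [D, C, L, L, L] rather than the paper's [D, L, L, L, C]); this is a generic illustration of the technique, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeriodicConv3d(nn.Module):
    """3D convolution whose padding wraps around, matching the periodic
    boundary conditions of the spin lattice."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.pad = kernel_size // 2
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size)

    def forward(self, x):
        # x: [D, C, L, L, L]; wrap all three spatial dimensions
        x = F.pad(x, [self.pad] * 6, mode="circular")
        return self.conv(x)

# Example: C = 6 input channels (S_x, S_y, S_z of both input configurations)
layer = PeriodicConv3d(in_channels=6, out_channels=32)
out = layer(torch.randn(2, 6, 8, 8, 8))   # spatial size preserved: (2, 32, 8, 8, 8)
```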

Results
The effectiveness of our proposed Deep Learning method is evaluated at k_B T/J = 0.4 < k_B T_c/J, k_B T/J = 1.44 ≈ k_B T_c/J, and k_B T/J = 2.4 > k_B T_c/J. Note that at k_B T/J = 2.4, the system is in a disordered state and spatial correlations between spins are very short ranged. One hundred independent spin configurations are generated by Monte-Carlo simulation for use as test data sets at each temperature k_B T/J = 0.4, 1.44, and 2.4. The second-order Suzuki-Trotter decomposition method is used for all experiments in this paper.
To evaluate the accuracy of the simulation results, the correlation is investigated by comparing the spin dynamics trajectory σ(t) with a highly accurate spin dynamics trajectory ρ(t) computed with τ = 10^{-6}. τ = 10^{-6} is used as the reference time step because we found that it gives accurate trajectories. The correlation ξ(t), as a function of the time t at which σ(t) and ρ(t) are compared, is given by

$$\xi(t) = \frac{1}{L^3} \sum_{j} \left[ (\rho_j(t))_x (\sigma_j(t))_x + (\rho_j(t))_y (\sigma_j(t))_y + (\rho_j(t))_z (\sigma_j(t))_z \right],$$

where the index j runs over the lattice sites, L is the linear dimension of the lattice, and L^3 is the total number of spins. Since the initial spin configurations are the same, ρ(0) is identical to σ(0). We compute one hundred correlations ξ(σ_i, t) for spin configurations σ_i(t), where i runs from 1 to 100. We then estimate the mean μ_ξ(t) and the standard deviation std(ξ(t)) of ξ(σ_i, t) as a function of time at each temperature.

We wish to compare the conservation of energy and magnetization across the one hundred samples, but their initial spin configurations differ. To take statistics across the samples, we shift the energy and magnetization of the initial spin configurations to zero. Eqs. (11) and (12) show how we shift the energy per site e(t) and magnetization per site m(t) at each time step t:

$$\tilde{e}_q(t) = e_q(t) - e_q(0), \qquad (11)$$

$$\tilde{m}_q(t) = m_q(t) - m_q(0), \qquad q = 1, \ldots, Q, \qquad (12)$$

where Q represents the number of samples at each temperature; we use Q = 100. With this shifting, we can compute the mean absolute energy per site μ_|ẽ(t)|, the mean absolute magnetization per site μ_|m̃(t)|, the standard deviation of the energy per site std(ẽ(t)), and the standard deviation of the magnetization per site std(m̃(t)) over the independent samples.

In Fig. 2, the spin-spin correlation is plotted against the reference trajectory generated with the reference time step τ = 10^{-6}, e.g. for k_B T/J = 0.4 (k_B T/J < k_B T_c/J) [Fig. 2a,d]. We define the threshold time t_thres as the average time required for the spin-spin correlation μ_ξ(t) to drop from 1 to 0.99. In Fig. 2g, t_thres is plotted as a function of temperature k_B T/J on a logarithmic y-axis; at each temperature, simulations with τ = 10^{-3} (red squares) have a higher threshold time than simulations with τ = 10^{-1} without Deep Learning corrections. The threshold time for τ = 10^{-1} with Deep Learning corrections (filled blue diamonds) approaches, at each temperature, almost the same threshold time as τ = 10^{-2} without Deep Learning corrections (yellow circles).

Figure 3 shows the shifted energy and magnetization per site, including simulations using Deep Learning corrections (blue line). In Fig. 3c, at k_B T/J > k_B T_c/J, the system is disordered and the mean absolute energy μ_|ẽ(t)| and the mean absolute magnetization μ_|m̃(t)| become more constant, simply due to averaging over disordered spins. In particular, Fig. 4c shows that at k_B T/J > k_B T_c/J, the effect of averaging over disordered spins for L = 8 is stronger than for L = 4. At high temperature, the number of possible states increases exponentially, and hence fitting by Deep Learning corrections becomes more difficult.
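A compact sketch of these metrics, using the definitions of ξ(t), the shifted observables of Eqs. (11) and (12), and the threshold time above, might look as follows (NumPy; array shapes and helper names are our assumptions).

```python
import numpy as np

def correlation(rho, sigma):
    """xi(t): mean dot product between the reference configuration rho(t)
    and the test configuration sigma(t), both of shape (L, L, L, 3).
    Equals 1 when the trajectories coincide (unit spins)."""
    L = rho.shape[0]
    return np.sum(rho * sigma) / L ** 3

def shift_to_zero(series):
    """Shift each sample's time series so its initial value is zero,
    as in Eqs. (11) and (12); series has shape (Q, T)."""
    return series - series[:, :1]

def threshold_time(xi_mean, times, level=0.99):
    """t_thres: first time at which the mean correlation drops below level."""
    below = np.flatnonzero(xi_mean < level)
    return times[below[0]] if below.size else times[-1]

# Example: statistics over Q samples of the shifted energy per site.
e = np.random.rand(100, 50)            # placeholder (Q, T) data
e_shifted = shift_to_zero(e)
mu_abs = np.mean(np.abs(e_shifted), axis=0)   # mean absolute energy per site
std = np.std(e_shifted, axis=0)               # standard deviation per site
```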

Discussion
Our results demonstrate that Deep Learning corrections enlarge the usable integration time step of the original Suzuki-Trotter method, achieving a ∼ 10 times computational speed up while maintaining accuracy compared to the original Suzuki-Trotter decomposition method. Given the local nearest-neighbour interactions on the lattice, a convolutional structure is a natural choice of architecture for the Deep Neural Network. Since convolution is translationally invariant, the effect of lattice size on training our U-Net is not a major concern. For example, between the L = 4 and L = 8 lattices, the time required for training the U-Net parameters increases by about 4 times, which is sub-linear with respect to the number of lattice sites. Our Deep Learning model was trained on simulation data at τ = 10^{-3}; however, its accuracy is equivalent to that of simulation data at τ = 10^{-2}. This shows that our Deep Learning training has not reached its theoretical limit of perfect prediction. That limit could be reached exactly only by training on an infinite amount of data with a network of infinite capacity. In practice, Deep Learning methods cannot be perfect because the amount of data and the capacity of the U-Net are finite. The main sources of inaccuracy in our Deep Learning method are that the U-Net's output does not fit the labeled data generated at τ = 10^{-3} exactly, and that even if the U-Net fits the data seen during training, it may not predict perfectly on data it has never seen. For future work, we will explore the effects of Deep Learning corrections on higher-order Suzuki-Trotter decompositions. We will also apply Deep Learning corrections to other settings, such as off-lattice systems and integrators such as velocity-Verlet.

Data availability
The data and code that support the findings of this study are available from the corresponding authors upon request.