A machine-learned spin-lattice potential for dynamic simulations of defective magnetic iron

A machine-learned spin-lattice interatomic potential (MSLP) for magnetic iron is developed and applied to mesoscopic scale defects. It is achieved by augmenting a spin-lattice Hamiltonian with a neural network term trained to descriptors representing a mix of local atomic configuration and magnetic environments. It reproduces the cohesive energy of BCC and FCC phases with various magnetic states. It predicts the formation energy and complex magnetic structure of point defects in quantitative agreement with density functional theory (DFT) including the reversal and quenching of magnetic moments near the core of defects. The Curie temperature is calculated through spin-lattice dynamics showing good computational stability at high temperature. The potential is applied to study magnetic fluctuations near sizable dislocation loops. The MSLP transcends current treatments using DFT and molecular dynamics, and surpasses other spin-lattice potentials that only treat near-perfect crystal cases.

. Snapshot of the spin dynamics simulation of ferromagnetic α-Fe at 100K. It demonstrates at finite temperatures magnetic excitations are realized as non-collinear orientations of magnetic moments. The system evolves according to Langevin spin dynamics using the Ma-Dudarev potential 14 .
• Class 3: Small random displacement of atoms from ideal lattice sites in BCC and FCC phases (|∆r i | ≤ 0.02 Å).
• Class 4: Snapshots of atomic configurations of perfect crystal produced by MD simulations. MD simulations using Derlet-Dudarev potential 12 were performed in an NPT ensemble at 10K, 100K, 200K, 300K and 800K.
• Class 7: Snapshots of atomic configurations of SIA and mono-vacancy configurations produced by MD simulations. MD simulations were performed at low temperature allowing exploration of the energy well.
• Class 8: Amorphous configurations. It is produced by quenching systems at liquid states through conjugate gradient method, where liquid states were produced by MD using Derlet-Dudarev potential. 12 .
• Class 9: Primitive BCC and FCC unit cells with different values of magnetization. The magnetization of the simulation box is controlled by setting the difference between the number of up and down electrons, which is controlled by the keyword "NUPDOWN" in VASP.
• Class 10: Snapshots of non-collinear magnetic configurations on perfect BCC lattice produced by spin dynamics simulations. Spin dynamics simulations were performed at 100K, 200K, 600K and 800K using Langevin spin dynamics 13 and Ma-Dudarev potential 14 . An example of magnetic configuration is shown in Figure 1. Then, the non-collinear magnetic configurations were generated using constrained method for non-collinear DFT 6 .
• Class 11: Small random non-collinear displacements of magnetic moments in BCC and FCC phases.

Density Functional Theory Calculations
All training data was generated using the VASP package [7][8][9][10] . For most calculations, we performed non-spin-polarized 15,16 or spin-polaried 17 DFT calculations. We used the projector augmented wave (PAW) pseudopotentials [18][19][20] which includes the 3p states as valence states, with a total of 14 valence electrons. The plane wave cut off energy was set to 400 eV. Exchangecorrelation energy was approximated using the GGA Perdew-Burke-Ernzerhof (PBE) functional 21,22 . Brillouin zones were sampled using the Monkhorst Pack scheme such that the smallest spacing between k-points is 0.01Å −1 . Convergence criteria of the self consistent field iteration was set to 10 −8 eV. Forces were relaxed to a convergence of 10 −4 eV/Å for bulk-like systems and 10 −3 eV/Å for simulation cells containing defects. In order to sample magnetic configurations in excited states, we performed non-collinear magnetic configurations using constrained method 6 . To constrain the orientations of magnetic moments to chosen directions {e I }, one can apply a energy penalty term E p to the total energy functional: where λ is a Lagrange multiplier which acts to change the degree of enforcement of the constraint on site I, and E 0 is the DFT energy. The E p acts to introduce an effective magnetic field to each atom. The E p is analytically proven to be inversely proportional to λ 6 . A large enough λ is chosen such that the change of E p is smaller than 1meV. Due to the use of PAW method, there is ambiguity on the definition of atomic magnetic moment. One may define the magnetic moment of atom I through the spatially varying magnetic density m(r): where the radius of the sphere Ω I has been chosen to match the VASP parameter RWIGS of the PAW psuedopotentials which for our calculations is 1.131a 0 . This satisfies that spheres are large enough but not overlapping. To prevent a discontinuity at the boundary of the constraint field, a smoothing function F I (r − r I ) = sin x/x and x = π|r − r I |/R I is introduced 6 . In our MSLP, the magnitude of magnetic moment can change representing the itinerant nature of electrons. However, in the constrained non-collinear DFT calculations, the magnitude of magnetic moment is allowed to relax to achieve lowest energy subjecting to constrains. The information about the energy landscape corresponding to the change of magnitude is not captured here, but through other part of the training data.

The Loss Function
A single atomic configuration can provide the information of total energy, atomic forces, box stresses and atomic magnetic effective fields. We can design a Loss function that can capture all these information. The aim is to develop a model that can reproduce target data with sufficient accuracy whilst maintaining generality. The act of training is essentially to minimise the Loss function: with respect to a parameter vector: The Loss function for energy: where N config is the number of configuration, w E c is weight for configuration c, ∆E c and ∆E f ,c are the differences in energy and formation energy between the model and target data, respectively.
The implementation of DFT in VASP can only calculate the total energy of a system. Therefore, we can only compare the energy difference of a system. All DFT energies were adjusted by subtracting the energy of an isolated atom multiplying the number of atoms in particular system. Then, one can calculate the energy difference: The E MSLP c is the sum of all the atomic energy of configuration c calculated by MSLP. The formation energy of a configuration c is defined as: where N c and N GS are the number of atoms in configuration c and the corresponding ground state configuration, E α GS is the total energy of the ground state configuration, superscript α represents MSLP or DFT. Then, the difference of the formation energy: The weight of configuration c is a function of ∆E c , such that: where w E c0 is a constant that assigned to a class or subclass of configurations. The design of this weight function is to automate the reduction in relative importance for configurations in the class whose energies are higher than the smallest energy in the set. This biases the loss function to find solutions which fit configurations near the minima against those in phase space which are unfavorable or unlikely.
The Loss function for atomic forces: where w F c is the weight, F MSLP i and F DFT i are atomic forces calculated by MSLP and DFT, respectively. The Loss function for the stress of simulation cell: where w σ c is the weight, σ MSLP The Loss function for magnetic effective field: where w H c is the weight, H MSLP i is the magnetic effective field calculated by MSLP. In adiabatic self-consistent field DFT calculations, the effective field for each atom must be zero, if no external field is applied. For those training data generated using constrained method, we set the weight w H c to zero. It is because we cannot obtain the effective fields from DFT without ambiguity.
In the Loss function, we introduced a Ridge (or Tikohnov) regularisation term: It is to prevent overfitting, loss of generalisation or the model becoming too dependent on any particular free parameter. A fixed regularisation value of λ = 10 −6 is used.

Validation Procedure and Limitations
Once the conventional spin-lattice potential had been trained (steps 1 and 2 in the training workflow), we began training neural networks of different sizes. One of the most computationally expensive steps is the calculation of the descriptors so we opted to minimise the input layer of the networks. To reduce cost, we reuse the G2 descriptors when building the magnetic descriptors (see Methods in article). As such, for all parameterisation tests using n G2 descriptors, our input layer would consist of 4n nodes. We found n > 5 with equidistantly spaced Gaussians in the range [2, R cut ] could train reasonable networks when used with one or more hidden layers wider than 5 nodes. An input layer of n = 9 was determined to be large enough to build a sufficient representation of the configurational and magnetic environments, whilst small enough to reduce excessive computational cost. In general, deeper neural networks were found to be slower to train whilst also overfitting the training data. This was compounded by the IPFIT codes inability to perform early stopping by simultaneously analysing a test set. As such, we would generally rely on the dynamic simulations used to calculate the Curie temperature to test for over fitting. An over fitted potential would have large variations in observable properties during a Langevin dynamics run at constant temperature and pressure (NPT ensemble). A small test set of approximately 400 unseen non-equilibrium configurations generated through ab initio molecular dynamics was also used to qualitatively assess the quality of a fit once converged (see Figure 2(ii)). The presented parameterisation relied only on a single hidden layer (36 x 9 x 1) to produce accurate results whilst remaining scalable to over 1 million atoms. Future updates of IPFIT will introduce improved testing and validation techniques such that deeper networks may be trained more reliably.

4/13 2 Atomic Force and Magnetic Effective Field
Details of spin-lattice dynamics that allows the change of magnitudes of magnetic moments 23 can be found in a review article 24 and related works 25,26 . An important part in the integration of equation of motion is the calculation of atomic force and magnetic effect field. They are derivatives of the Hamiltonian H . The Hamiltonian of our MSLP is written as: }} is a vector of descriptors for atom i. The functional form of the descriptors are discussed in the section of Method in the main text. For simplicity, we write Since nine G2 descriptors were used, there are totally 4 × 9 = 36 elements in G i . The atomic force on atom k is defined as: where The magnetic effective field is where 3 Additional Results

Quality of the MSLP
To evaluate the quality of our potential, we calculated the mean absolute error which quantifies the difference between the energy of the target DFT data and the energy predicted by the MSLP of the same configurations. Fig. 2a plotted the DFT energies against the predicted energies of all configurations used for training. The mean absolute error for all data is 46.3 meV/atom. We note this error is larger than most machine-learned potentials which are within 20meV/atom 27,28 . However, we should note the additional 3N degrees of freedoms due to magnetism in our model. If we examine the mean absolute error according to classes in the training data, the mean absolute errors for classes 1 to 11 are 18.4, 49.3, 4.26, 2.29, 1.67, 0.70, 8.26, 31.6, 76.9, 22.15 and 8.46 meV/atom, respectively. For configurations near stable and metastable states, the mean absolute error is small and comparable to other machine-learned potentials for MD. The largest error comes from class 9, where energies were calculated subjecting to constraint controlling the value of magnetization. It is understandable that the mean absolute error is large in this data set, because most data contribute little to the energy loss function according to the functional form of w E c , i.e. weight is small when energy is high, where the data set has the largest energy range. The second largest error is from Class 2. We found the largest discrepancy comes from the antiferromagnetic BCC configurations. At full relaxed conditions, its DFT energy is 0.46 eV/atom higher than the ferromagnetic BCC configuration. If we consider the melting temperature of iron T melt = 1811K, we know the corresponding energy scale is about k B T melt = 0.16eV. Such discrepancy should have little consequence in applications. A difficult point of producing a sensible potential is to guarantee the ground state ferromagnetic BCC configuration always attains the lowest energy. It is achieved through using the functional from of w E c , which amplified the importance of data for configurations with an energy close to the ground state. We can see in Fig. 2 that the lowest energy configuration in DFT data is the lowest energy configuration predicted by MSLP. We also plotted the forces and stresses calculated using DFT against MSLP in Fig. 2. They show good correlations. The mean absolute errors of force and stress are 0.1 eV/Å and 2.4 GPa, respectively.
In Fig. 3 the energies of BCC and FCC phases in non-magnetic state and the most stable magnetic states are shown as a function of lattice constant. DFT calculations show ferromagnetic and double layer antiferromagnetic state are most stable in BCC and FCC phases, respectively. Magnetism can lower the energy of both BCC and FCC phases. Our MSLP reproduced such phenomenon quantitatively well. Fig. 4 shows the change of energy of ferromagnetic BCC iron as a function of the magnitude of magnetic moments calculated using VASP and MSLP. It shows an energy well resembling Landau functional form, where the minimum is about 0.4 eV/atom lower than non-magnetic case. It causes the formation of spontaneous magnetic moment M s . The magnitude can also fluctuate due to thermal excitation. We should note in our MSLP, the Hamiltonian is rotational symmetry, so the energy is independent of the direction of magnetization in ferromagnetic state. The position of the minimum and the barrier height are well reproduced for FM BCC. However, the shape of the well warrants refinement in future parameterisations. The incorrect profile of the region between M = 0 and M = M s partially accounts for the high error for Class 2 configurations in the database.

Point defects
Magnetic interactions are responsible for the anomalous order of SIA stability for BCC iron which supports a 110 orientation opposed to 111 found in other BCC transition metals. Various DFT calculations as show in Table 1 are compatible with each other concerning the stability of SIA configurations, where the energy of 110 < tetrahedral < 111 < 100 < octhahedral.
We performed conjugate gradient method to relax various SIA configurations using our MSLP. However, except the 110 dumbbell, which is the most stable configuration, none of them can relax to decided configuration. It is understandable because they are unstable configurations. In DFT, most codes would determine and impose symmetry of a system, and so restricted the relaxation of atomic positions. However, a molecular statics calculation does not impose any symmetry when relaxing a system, so high energy configuration is difficult to hold. Therefore, we calculated the formation energy of defects using the atomic positions and magnetic moments of configurations obtained from present VASP calculations without further relaxation. The formation energy calculated using MSLP are well compatibility with our DFT calculations and other works.
We performed a nudged elastic band calculation between the 110 and 111 dumbbell self-interstitial atom configurations in FM BCC iron. It is to check if there is any energy barrier preventing the transition of a 111 dumbbell to the lower energy 110 configuration. Simulation cells contain 251 atoms. To reduce computational cost, we used standard PAW pseudopotentials instead. we generated 5 images along the transition pathway. The tolerance of SCF cycles is set to 10 −7 eV/step and force is set 10meV/Å. No energy barrier was detected between the two configurations. It essentially means small perturbation, such as thermal excitation, can drive a 111 dumbbell to relax into a 110 . dumbbell configuration.

Dislocation Loop
In addition to the SIA loop that we presented in the main text, we also constructed a shear loop. We constructed a loop in a simulation box containing 128,000 atoms via the ATOMSK code 32

9/13
The dislocation loop relaxed to be circular with mixed character as detected via the DXA routine in OVITO 33 and shown in Figure 6a. Atoms on the side of the edge type dislocation consisting of the extra half plane are under relative compression and are found to have their magnetic moments suppressed with magnitudes as small as half that of bulk (Figure 6b). On the other side which does not contain the extra half plane, the atoms are under tensile stress and are observed to have increased magnetic moments (Figure 6c). A simple schematic of the mixed character loop and the atomic distortions are shown in Figure 6d highlighting the regions under tension and compression relative to the orientation of the dislocation. The relationship between the local stress and magnitude of the magnetic moment is consistent with trends noted for strained bulk samples 29 , as well as point defects and prismatic loops as shown in the main manuscript.

Code availability
The spin-lattice dynamics SPILADY code is available under the Apache Licence version 2.0. One can download it from https://ccfe.ukaea.uk/resources/spilady/. Version 1.0.1 is suitable for single element simulation of molecular dynamics, spin dynamics, spin-lattice dynamics and spin-lattice-electron dynamics 34 . It is capable of considering the longitudinal fluctuations of magnetic moments. The modified SPILADY code developed for this work, which includes the MSLP potential, is available upon request from the corresponding author and will be included in the next version release.
The interatomic potential fitting IPFIT code was developed in-house for this work. It is available upon request from the corresponding author subjecting to mutual agreements. Fitted parameters for our MSLP is provided as an additional file labelled mslp.txt. In Table 2, the keywords are identified in relation to the Hamiltonian of MSLP. We provide the equation number (in main manuscript) which defines the specific parameter.
It is to note SPILADY receives a more general form of the embedding function: This is equivalent to the embedding function defined in the Hamiltonian (Eq. 6 in main manuscript) where φ 0 = 1, φ 1 = φ and φ 2 = 0.