Metadynamics sampling in atomic environment space for collecting training data for machine learning potentials

The universal mathematical form of machine-learning potentials (MLPs) shifts the core of interatomic-potential development to collecting proper training data. Ideally, the training set should encompass diverse local atomic environments, but conventional approaches are prone to sampling similar configurations repeatedly, mainly due to Boltzmann statistics. As such, practitioners handpick a large pool of distinct configurations manually, stretching the development period significantly. To overcome this hurdle, methods that automatically generate training data are being proposed. Herein, we suggest a sampling method optimized for gathering diverse yet relevant configurations semi-automatically. This is achieved by applying metadynamics with the descriptor of the local atomic environment as a collective variable. As a result, the simulation is automatically steered toward unvisited regions of the local-environment space such that each atom experiences diverse chemical environments without redundancy. We apply the proposed metadynamics sampling to H:Pt(111), GeTe, and Si systems. Throughout these examples, a small number of metadynamics trajectories can provide the reference structures necessary for training high-fidelity MLPs. By proposing a semi-automatic sampling method tuned for MLPs, the present work paves the way for wider use of MLPs in many challenging applications.


INTRODUCTION
By delivering the accuracy of density-functional theory (DFT) calculations at much lower cost, atomistic simulations based on machine-learning potentials (MLPs) are being established as a new pillar of computational materials science. 1 Most MLPs exploit the locality of quantum systems, so their computational cost increases linearly with system size, which is a significant advantage over the cubic scaling of DFT. 2 Various types of MLPs have been proposed to date: the neural network potential (NNP), 3 Gaussian approximation potential (GAP), 4 moment tensor potential, 5 deep tensor neural network, 6 and gradient-domain machine learning. 7 In particular, NNP and GAP are garnering wide interest, with applications to challenging simulations such as the crystallization behaviors of GeTe 8,9 and Ge2Sb2Te5, 10 the Ni-silicidation process, 11 proton transfer at the ZnO-water interface, 12 structure search of Pt13Hx clusters, 13 crystal structure prediction, 14 and identification of active sites in bimetallic catalysts for CO2 reduction. 15
The heart of a traditional classical potential is the mathematical formula that captures the underlying bonding nature. In contrast, the universal mathematical structures of MLPs shift the core of potential development to collecting a proper training set, which defines the atomic environments wherein the trained MLP is valid. Ideally, the training set should encompass diverse local configurations that may appear in target simulations. In usual practice, the training set is selected based on crystal-derived structures and their molecular dynamics (MD) trajectories. However, MD simulations are governed by Boltzmann statistics, which over-represent low-energy regions and can sample only a few distinct configurations separated by low thermal barriers. 17,18 We note that some methods, such as random structure search 19 and entropy maximization, 20 focus on sampling diverse configurations, but as far as we are aware, they have not been employed in complicated simulations.
The above discussion calls for a sampling method specifically aimed at preparing training sets for MLPs, one that is tuned to collect local atomic environments that are as diverse as possible within the time and size scales of DFT calculations. In addition, the sampled configurations should be relevant to the intended simulations. We herein propose one such approach based on metadynamics. 21 Metadynamics defies the Boltzmann distribution by accumulating bias potentials along collective variables (CVs).
Instead of the usual implementations that formulate CVs from a set of atomic positions in real space, 22 we employ as CVs the coordinates in the abstract atomic-environment space spanned by the atom-centered symmetry-function vector (G). 23 Widely used as input features of NNPs, the G vector parametrizes local atomic environments into fixed-length vectors by integrating the radial and angular distributions of neighboring atoms. By accumulating bias potentials in the G space, the present metadynamics (abbreviated as G-metaD hereafter) drives each atom toward unvisited points in the G space. In addition, G-metaD is controlled by a few hyperparameters and can start from simple initial structures, requiring less expertise than conventional MD-based sampling. Recently, J. Herr et al. suggested a metadynamics sampling method that uses the distance matrix of the whole system as a CV, which enhanced the stability of MD simulations compared with conventional MD sampling. 24 However, individual atoms in this approach can still repeatedly encounter similar local environments because the CV is based on the configuration of the entire system.
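To make the idea of a fixed-length G vector concrete, the sketch below computes only the radial (G2-type) symmetry functions of Ref. 23 for one atom, using the standard cosine cutoff. The η values, R_s = 0, the 6 Å cutoff, and the toy cluster are arbitrary assumptions. Angular (G4/G5) terms and periodic images are omitted, and the function names are hypothetical.

```python
import numpy as np

def cutoff(r, rc):
    """Cosine cutoff f_c(r) = 0.5*(cos(pi*r/rc) + 1) for r < rc, else 0."""
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def radial_g2(positions, i, etas, r_s=0.0, rc=6.0):
    """Radial symmetry functions of atom i (no periodic images).

    G2(eta) = sum_{j != i} exp(-eta * (r_ij - r_s)**2) * f_c(r_ij),
    giving one component per eta, i.e. a fixed-length vector
    regardless of how many neighbors atom i has.
    """
    rij = np.linalg.norm(positions - positions[i], axis=1)
    rij = np.delete(rij, i)                      # exclude the central atom
    fc = cutoff(rij, rc)
    etas = np.asarray(etas, dtype=float)[:, None]
    return np.sum(np.exp(-etas * (rij - r_s) ** 2) * fc, axis=1)

# Hypothetical 4-atom cluster (Å); the atom at 7 Å lies beyond the cutoff
pos = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 2.5, 0.0], [0.0, 0.0, 7.0]])
g = radial_g2(pos, 0, etas=[0.003, 0.1, 0.5])    # eta in 1/Å^2
print(g.shape)  # (3,)
```

Because only interatomic distances enter, the vector is invariant to rotations and translations of the cluster, which is what makes G usable as a CV for local environments.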
In the following section, we formulate G-metaD and demonstrate its application to three systems: H:Pt(111), GeTe, and Si. The first model, H:Pt(111), is chosen to directly compare the sampling styles of conventional MD and G-metaD. The other two examples, GeTe and Si, have previously been studied with NNPs or GAPs. 8,9,17,25,26 We choose these materials to benchmark NNPs trained on G-metaD trajectories against state-of-the-art MLPs trained on a large number of manually prepared structures.

Metadynamics simulation
The present G-metaD employs the G vector as the CV. The local bias potential (ub) is defined as a function of G, and the sum of the atomic local biases constitutes the total bias potential (Ub) applied to the system:

$$U_b\big(\{\mathbf{R}(t)\}\big) = \sum_{i=1}^{N_{\mathrm{at}}} u_b\big(\mathbf{G}_i(t)\big), \tag{1}$$

where {R(t)} is the set of atomic position vectors at time t, Gi(t) is the symmetry-function vector of the ith atom at time t, and Nat is the number of atoms in the system. The biasing force on each atom is computed as follows:

$$F_{i,\alpha} = -\frac{\partial U_b}{\partial R_{i,\alpha}} = -\sum_{j=1}^{N_{\mathrm{at}}} \sum_{s=1}^{N_G} \frac{\partial u_b}{\partial G_{j,s}} \, \frac{\partial G_{j,s}}{\partial R_{i,\alpha}}, \tag{2}$$

where Fi,α and Ri,α are the α-components (α = x, y, and z) of the force and position vectors of the ith atom, respectively, and Gj,s is the sth component of Gj, whose dimension is NG. For multi-component systems, ub is defined independently for each atomic species.
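Eqs. (1) and (2) can be illustrated with a small numerical sketch: the total bias is a sum of per-atom local biases, and the force follows from the chain rule through the Jacobian ∂G/∂R. This is a toy under stated assumptions, not the SIMPLE-NN/LAMMPS implementation: G contains only radial G2 terms without periodic images, the local bias uses isotropic Gaussians (a placeholder for the adaptive ones introduced below), and all names and parameter values are hypothetical.

```python
import numpy as np

def g2_and_jacobian(pos, etas, rc=6.0):
    """Radial G2 vectors G[i, s] and Jacobian J[i, s, k, a] = dG_{i,s}/dR_{k,a} (no PBC)."""
    n, ns = len(pos), len(etas)
    G = np.zeros((n, ns))
    J = np.zeros((n, ns, n, 3))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = pos[i] - pos[j]
            r = np.linalg.norm(d)
            if r >= rc:
                continue
            fc = 0.5 * (np.cos(np.pi * r / rc) + 1.0)
            dfc = -0.5 * np.pi / rc * np.sin(np.pi * r / rc)
            for s, eta in enumerate(etas):
                e = np.exp(-eta * r**2)
                G[i, s] += e * fc
                dt_dr = e * (dfc - 2.0 * eta * r * fc)  # d/dr [exp(-eta r^2) f_c(r)]
                J[i, s, i] += dt_dr * d / r             # dr_ij/dR_i = +d/r
                J[i, s, j] -= dt_dr * d / r             # dr_ij/dR_j = -d/r
    return G, J

def local_bias(G, centers, h=0.024, w=0.5):
    """Toy u_b: isotropic Gaussians (height h in eV, width w) at visited G points."""
    diff = G[:, None, :] - centers[None, :, :]                 # (N_at, N_c, NG)
    gauss = h * np.exp(-np.sum(diff**2, axis=2) / (2 * w**2))  # (N_at, N_c)
    ub = gauss.sum(axis=1)                                     # u_b(G_i) for each atom
    dub_dG = -np.einsum('ac,acs->as', gauss, diff) / w**2      # du_b/dG_i
    return ub, dub_dG

def bias_energy_and_forces(pos, etas, centers):
    G, J = g2_and_jacobian(pos, etas)
    ub, dub_dG = local_bias(G, centers)
    U_b = ub.sum()                                  # Eq. (1): sum of local biases
    F = -np.einsum('js,jsia->ia', dub_dG, J)        # Eq. (2): chain rule through G
    return U_b, F

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 3.0, size=(5, 3))            # toy cluster, all pairs within rc
etas = [0.05, 0.2, 0.8]
# Pretend these G points were visited earlier in the trajectory
centers, _ = g2_and_jacobian(pos + rng.normal(0.0, 0.1, pos.shape), etas)
U_b, F = bias_energy_and_forces(pos, etas, centers)
```

Note that the force on atom i collects contributions from the local biases of all its neighbors j, because Ri enters every Gj within the cutoff; this is why the double sum over j and s appears in Eq. (2).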
We construct the local bias ub in Eq. (1) using Gaussians centered at the G points visited by each atom.
Since the components of G vectors are highly correlated with each other, it is ineffective to adopt isotropic Gaussians with a fixed width. This is illustrated in Fig. 1a, which schematically shows a typical distribution of training points (grey dots) along two components (Gi and Gj). To sample this highly anisotropic distribution with isotropic Gaussian biases (black dots with circles whose radius indicates the Gaussian width), many Gaussians must be accumulated. To overcome this problem, we employ geometry-adapted Gaussians, which were developed to reconstruct an accurate potential energy surface (PES) from metaD by adjusting the shape and size of the Gaussian biases according to the distribution of visited CVs. 27 The present G-metaD reformulates this approach and accumulates adaptive Gaussians centered at the visited G points, as given in Eq. (3); the covariance matrix Σ of these Gaussians is given by Eq. (4), in which Gj and Gk are the jth and kth components of G, respectively. In Eqs. (3) and (4), the hyperparameters h, σ, and τ represent the height and width of the Gaussian potentials and the time interval between bias updates, respectively. The high correlations among the components of G render Σ−1 numerically unstable. To prevent divergence, a small regularization term ε (fixed to 10−4) is added to the diagonal components in Eq. (4). According to Eqs. (3) and (4), the width of the Gaussian bias is adjusted anisotropically such that the Gaussian shape resembles the data distribution, as shown in Fig. 1b. This enables G-metaD to search the relevant regions with far fewer bias potentials than in Fig. 1a.
There are three hyperparameters in G-metaD: h, σ, and τ. Being related to the bias strength, h and σ control the height and width of the Gaussian potentials, respectively. For h, a proper value is chosen such that the magnitude of Ub is of the same order as the thermal energy and the simulation remains stable. For σ, if it is too small, a large number of bias potentials is needed to fill a basin of the PES. In contrast, a large σ can obscure the curvature of the PES, thereby causing undersampling. We find that σ of around 1 Å is a reasonable choice. Lastly, τ should be long enough for the system to respond to the updated bias force, yet short enough that the metadynamics trajectories can explore diverse configurations within the limited simulation time. Our experience indicates that 20 fs is a sound choice for τ, which is used throughout the present work. Note that these hyperparameters can be assigned differently for each atom type.
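One way to picture the adaptive Gaussians is the sketch below. It assumes that the geometry-adapted recipe of Ref. 27 is carried over to the G space as follows: at each bias update (every τ), an atom deposits a Gaussian of height h centered at its current G vector, with covariance Σ = σ²JJᵀ + εI, where J = ∂G/∂R is the Jacobian of that atom's G vector with respect to all atomic coordinates. Only the regularization ε = 10−4 is taken from the text. The covariance form, the class name `AdaptiveBias`, and the toy Jacobian are assumptions for illustration.

```python
import numpy as np

class AdaptiveBias:
    """Hypothetical sketch of a geometry-adapted Gaussian bias in G space."""

    def __init__(self, h=0.024, sigma=1.0, eps=1e-4):
        self.h, self.sigma, self.eps = h, sigma, eps
        self.centers, self.inv_covs = [], []

    def deposit(self, G_atom, jac_atom):
        """Add one Gaussian at G_atom (shape NG); jac_atom = dG/dR with shape (NG, 3*N_at).

        Intended to be called once per atom every tau (e.g. 20 fs) by the MD driver.
        """
        # sigma (in Å) is mapped into G space through the Jacobian, so the Gaussian
        # is stretched along directions in which the G components move together.
        cov = self.sigma**2 * jac_atom @ jac_atom.T
        cov += self.eps * np.eye(len(G_atom))       # regularize the diagonal
        self.centers.append(np.array(G_atom, dtype=float))
        self.inv_covs.append(np.linalg.inv(cov))

    def value_and_grad(self, G):
        """Return u_b(G) and du_b/dG summed over all deposited Gaussians."""
        u, grad = 0.0, np.zeros_like(G, dtype=float)
        for c, inv in zip(self.centers, self.inv_covs):
            d = G - c
            g = self.h * np.exp(-0.5 * d @ inv @ d)
            u += g
            grad -= g * (inv @ d)                   # inv is symmetric
        return u, grad

# Toy example: two nearly collinear G components driven by the same coordinate
jac = np.array([[1.0, 0.0, 0.0], [0.98, 0.05, 0.0]])
bias = AdaptiveBias(h=0.024, sigma=1.0)
bias.deposit(np.zeros(2), jac)
along = bias.value_and_grad(np.array([0.5, 0.49]))[0]    # along the correlated direction
across = bias.value_and_grad(np.array([0.5, -0.49]))[0]  # perpendicular to it
print(along > across)  # True
```

The Gaussian stays large along the direction in which the two components co-vary and collapses almost immediately across it, which mirrors the contrast between Figs. 1a and 1b.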
We implement a pair style that computes the bias potential in the LAMMPS package 28 using the SIMPLE-NN library. 29 By operating in client-server mode, LAMMPS can be interfaced with ab initio codes such as VASP 30 to perform G-metaD.

Hydrogen on the Pt(111) surface
To compare G-metaD and conventional MD in terms of sampling style, we investigate the diffusion of an H atom on the Pt(111) surface. Noble metals are widely used as efficient catalysts for H-involved reactions such as the hydrogen evolution reaction 31 and CO2 reduction. 32 An accurate description of H diffusion on the metal surface is important for simulating these reactions. Furthermore, it has been reported that H atoms can diffuse into the subsurface, influencing the overall diffusion kinetics and reaction rates. 33,34 Thus, sampling various H sites on the surface as well as in the subsurface is important for training MLPs aimed at simulating catalytic reactions.
To obtain the training data, we carry out three simulations under NVT conditions on (3×3)-Pt(111) with one H atom adsorbed on the surface: standard MD at 600 and 1700 K, and G-metaD at 600 K.
In the G-metaD, the bias potential is applied only to the H atom, with h and σ of 24 meV and 1.0 Å, respectively. The total simulation time is 3 ps for every simulation. Details of the DFT calculations are presented in the Methods section, and simulation movies are provided as Supplementary Videos 1-3. (At temperatures below 1700 K, e.g., 1500 K, diffusion into the subsurface is not observed within 3-ps MD simulations.) In Fig. 2a, the positions of the H atom along the trajectories are classified into four regions: face-centered cubic (fcc), hexagonal (hex), bridge, and top sites. At 0 K, the lowest energies within each region are 0 (fcc; the reference site), 59 (hex), 47 (bridge), and 37 (top) meV. The lowest-energy site in the sublayer is the tetrahedral site directly below the top site, with an energy of 823 meV relative to the fcc site. The temporal evolution of the visited sites is displayed in Figs. 2b-d. At 600 K, the H atom stays mostly at the fcc site, which is lowest in potential energy, consistent with the Boltzmann distribution. In addition, the H atom does not penetrate into the subsurface owing to the high diffusion barrier of 0.9 eV (see below). At the elevated temperature of 1700 K in Fig. 2c, various sites are sampled more or less evenly, and subsurface diffusion is observed (shaded area). G-metaD at 600 K also samples various sites, including the sublayer (see Fig. 2d). The H atom in the G-metaD stays within the sublayer for ~1 ps of the 3-ps simulation time, in contrast to ~0.5 ps in the 1700-K MD (see Fig. 2c). Figure 2e shows the distribution of the three trajectories projected onto the major principal axes from principal-component analysis. The G-metaD covers a wider area than either the 600- or the 1700-K MD.
To compare the accuracy of the trained NNPs, we compute in Fig. 2f the minimum energy paths (MEPs) between the three symmetric sites (fcc, hex, and top) using the nudged-elastic-band method with 9 replicas between the symmetric sites. 35,36 The MEP into the subsurface tetrahedral site is also shown on the right side. For reference, DFT results are also presented, which agree well with the literature. 37,38 Overall, the NNP-G results agree best with DFT, except that NNP-G incorrectly predicts the hexagonal site to be more stable than the top site (36 vs. 55 meV). Notably, both NNP-L and NNP-H show large errors of ~0.1 eV at the top site. The undersampling of this site in the 600-K MD (see Fig. 2b) is likely responsible for the error of NNP-L. Even though top sites are well sampled in the MD at 1700 K (see Fig. 2c), the large error implies that this trajectory fails to capture the low-energy surface because the MD is dominated by wide atomic vibrations. The same reasons account for the errors along the diffusion path into the subsurface (shaded area of Fig. 2f): the dramatic failure of NNP-L originates from the absence of data in this region (see Fig. 2b), which is partly resolved by MD at 1700 K. However, a substantial error of ~0.1 eV remains at the subsurface tetrahedral site. This implies that high-temperature sampling, while useful for overcoming energy barriers, risks undersampling atomic environments around local minima owing to wide vibrations and entropic effects. In contrast, G-metaD is performed at moderate temperatures and so does not suffer from such problems.
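The projection in Fig. 2e can be sketched with a plain principal-component analysis. This minimal numpy version assumes that the G vectors of the H atom from all trajectories are stacked into one matrix, so that every trajectory is projected onto a shared set of axes. The random arrays are placeholders, not the simulation data, and `pca_project` is a hypothetical helper.

```python
import numpy as np

def pca_project(G_all, n_components=2):
    """Project rows of G_all (n_samples, NG) onto the leading principal axes."""
    X = G_all - G_all.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return X @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(1)
traj_narrow = rng.normal(0.0, 0.2, size=(300, 8))   # placeholder: narrow sampling
traj_broad = rng.normal(0.0, 1.0, size=(300, 8))    # placeholder: broad sampling
proj, ratio = pca_project(np.vstack([traj_narrow, traj_broad]))
narrow_2d, broad_2d = proj[:300], proj[300:]
```

Fitting the axes on the pooled data is the design choice that makes the spreads of different trajectories directly comparable; fitting each trajectory separately would give each its own axes.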
Figure 2g compares the vibrational frequencies at the stable fcc site. There are two independent vibrational modes: out-of-plane and in-plane (twofold degenerate). The accuracy with respect to DFT follows the order NNP-L > NNP-G > NNP-H. This is consistent with the above observations: the 600-K MD samples the fcc site most densely, resulting in the highest accuracy, whereas the 1700-K MD undersamples this site because of the large thermal energy. The reasonable accuracy of NNP-G implies that the basin of the PES was sufficiently sampled.
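Harmonic frequencies such as those in Fig. 2g can be obtained from a finite-difference Hessian. The sketch below assumes that only the H atom is displaced while the Pt atoms stay frozen. This is a common approximation for a light adsorbate, but the text does not specify the protocol used here. `energy_fn` is a hypothetical callable that returns the potential energy (eV) for a given H position (Å).

```python
import numpy as np

HBAR_EV_S = 6.582119569e-16   # reduced Planck constant (eV s)
AMU_KG = 1.66053906660e-27    # atomic mass unit (kg)
EV_J = 1.602176634e-19        # electronvolt (J)

def h_frequencies_mev(energy_fn, r0, mass_amu=1.00784, step=0.01):
    """Harmonic energies ħω (meV) of a single atom from a central-difference Hessian."""
    H = np.zeros((3, 3))
    for a in range(3):
        for b in range(3):
            def e(da, db):
                r = np.array(r0, dtype=float)
                r[a] += da * step
                r[b] += db * step
                return energy_fn(r)
            H[a, b] = (e(1, 1) - e(1, -1) - e(-1, 1) + e(-1, -1)) / (4 * step**2)
    H = 0.5 * (H + H.T)                              # symmetrize (eV/Å^2)
    k = np.linalg.eigvalsh(H) * EV_J / 1e-20         # force constants in J/m^2
    omega = np.sqrt(np.clip(k, 0.0, None) / (mass_amu * AMU_KG))  # rad/s
    return omega * HBAR_EV_S * 1e3                   # ħω in meV

def toy_well(r):
    """Hypothetical anisotropic harmonic well (eV): soft in-plane, stiff out-of-plane."""
    return 0.5 * (3.0 * r[0]**2 + 3.0 * r[1]**2 + 8.0 * r[2]**2)

freqs = h_frequencies_mev(toy_well, [0.0, 0.0, 0.0])  # two in-plane, one out-of-plane
```

The eigenvalues come out in ascending order, so for an adsorbate like H on fcc sites the degenerate in-plane pair and the single out-of-plane mode can be read off directly.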

Amorphization of GeTe
Since τ is fixed at 20 fs, the present G-metaD is effectively controlled by two hyperparameters, h and σ. By tuning these two parameters, one can steer the system to explore different regions of the G space, which enables semi-automatic sampling. We demonstrate this with GeTe, an archetypal phase-change material that has been extensively studied for non-volatile memory devices. 40 Several studies employed NNPs to simulate the amorphous structures and crystallization behaviors of GeTe. 8,25,26 To sample diverse local orders, their training sets included liquid, crystalline, and amorphous phases as well as quenching trajectories. To improve the stability of the simulations, non-stoichiometric phases were also considered. 2,8,25 Here we attempt to prepare the training set for simulating GeTe using G-metaD only.
Starting from the crystalline rock-salt structure, G-metaD for GeTe is carried out for 20 ps on a 64-atom cell under NPT conditions at 0 kbar and 600 K. To simulate the whole melt-quench process as well as crystallization behavior, it is necessary to sample both high-energy liquid structures and amorphous structures whose local order is similar to that of the crystalline phases. To this end, we generate four G-metaD trajectories with different choices of (h, σ): (8.0, 1.5), (8.0, 1.0), (0.8, 1.0), and (0.8, 0.5) in (meV, Å). Figure 3a shows the evolution of the potential energy during the 20-ps G-metaD for each (h, σ). (Movies of the G-metaD trajectories are provided as Supplementary Videos 4-7.) To sample local orders close to the crystalline structure, we restart G-metaD at 10 ps from the rock-salt structure while retaining the bias potential accumulated during the preceding 10 ps to avoid redundancy. Here we use h values of 8 or 0.8 meV, which are far smaller than the 24 meV of the previous example. This is because 32 atoms of each species contribute to the bias potential (see Eq. (3)), whereas only one atom (H) did in the previous case. Figure 3a shows that with the strongest bias (h = 8 meV and σ = 1.5 Å), the structure changes substantially, and phase separation even becomes noticeable near the end of the G-metaD (see the insets at the top). In Ref. 8, diffusional mixing of liquid Ge and Te was considered to prevent unphysical phase separations arising from the ad hoc energy mapping. Such atomic environments are automatically sampled in the strongly biased G-metaD. In contrast, under the weakest bias (h = 0.8 meV and σ = 0.5 Å), the trajectory remains relatively close to the crystalline structure and appears to sample mainly amorphous-like structures.
To analyze the characteristic structures that each G-metaD samples, we use the Mahalanobis distance, which measures the distance between a point and a distribution. 41,42 (See the Methods section for details.) Using the Mahalanobis distance, we classify atomic environments into crystal, amorphous, and liquid structures. If a sampled G point does not satisfy the given criterion for any of the three phases, it remains unclassified. Histograms in Fig. 3b display the phase fractions sampled for each (h, σ). At the strongest bias, unclassified structures dominate.
The rapidly accumulating bias potentials drive the system toward high-energy structures resembling surfaces or unmixed phases. As the bias strength is reduced, the relative fraction of bulk structures, in particular rock-salt structures, increases. This analysis indicates that the system can explore distinct regions of the G space by tuning the hyperparameters. Using the four G-metaD trajectories (sampled every 20 fs), we train an NNP with energy, force, and stress RMSEs of 6 meV/atom, 0.24 eV/Å, and 5 kbar, respectively. In Fig. 4a, the equations of state (EOS) of the rock-salt (Fm-3m) and rhombohedral (R3m) phases are compared between NNP and DFT. The equilibrium volumes and bulk moduli agree with DFT within 1%. Even though EOS data and deformed crystals were not explicitly included in the training set, good agreement is found for both phases. This indicates that G-metaD can automatically sample various lattice distortions around the equilibrium structure. However, the small energy difference between the rock-salt and rhombohedral phases (8 meV/atom) is not reproduced by the NNP, which predicts less than 1 meV/atom.
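Equilibrium volumes and bulk moduli like those compared in Fig. 4a are typically extracted by fitting E(V) data. One common choice is the third-order Birch-Murnaghan EOS. The fitting form actually used in this work is not stated, so this is an assumption, and the synthetic data below are placeholders rather than GeTe results.

```python
import numpy as np
from scipy.optimize import curve_fit

EV_A3_TO_GPA = 160.21766208   # 1 eV/Å^3 in GPa

def birch_murnaghan(V, E0, V0, B0, B0p):
    """Third-order Birch-Murnaghan energy E(V); B0 in eV/Å^3."""
    eta = (V0 / V) ** (2.0 / 3.0)
    return E0 + 9.0 * V0 * B0 / 16.0 * ((eta - 1.0) ** 3 * B0p + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta))

def fit_eos(volumes, energies):
    """Fit E(V) points; returns (E0 [eV], V0 [Å^3], B0 [GPa], B0')."""
    volumes, energies = np.asarray(volumes, float), np.asarray(energies, float)
    i_min = np.argmin(energies)
    guess = (energies[i_min], volumes[i_min], 0.3, 4.0)   # rough starting point
    popt, _ = curve_fit(birch_murnaghan, volumes, energies, p0=guess)
    E0, V0, B0, B0p = popt
    return E0, V0, B0 * EV_A3_TO_GPA, B0p

# Synthetic per-atom E(V) curve over ±10% of a hypothetical V0 = 26 Å^3
V = np.linspace(0.9, 1.1, 11) * 26.0
E = birch_murnaghan(V, -4.0, 26.0, 0.25, 4.5)
E0, V0, B0_gpa, B0p = fit_eos(V, E)
```

Running the same fit on NNP and DFT energies and comparing V0 and B0 gives the kind of relative agreement quoted above.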
Next, we perform melt-quench simulations with the NNP and characterize the structural properties of the resulting liquid and amorphous structures. To compare with DFT on an equal footing, we use a 96-atom supercell. The temperature protocol is identical to that in Ref. 8. During the melt-quench process, we do not observe artefacts such as phase separation. Figures 4b and 4c compare the total and element-resolved radial distribution functions (RDFs) of the liquid and amorphous phases. Overall, good agreement with DFT is found, comparable to that in previous studies. 8 In Fig. 4d, we analyze the ring statistics using the R.I.N.G.S. code. 43 Although the NNP overestimates the ring densities, the overall ring distributions, including the portion of ABAB-type four-membered rings, are similar between NNP and DFT. We also simulate the crystallization of a 4096-atom supercell at 500 K at the atomic density of the amorphous phase (see Fig. 4e). The crystallization speed is similar to those in Refs. 8 and 9.
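The total RDF compared in Figs. 4b and 4c amounts to a histogram of pair distances normalized by the ideal-gas shell population. This sketch assumes an orthorhombic periodic cell and the minimum-image convention, so r_max must not exceed half the shortest cell length. Element-resolved RDFs follow by restricting the pairs to the chosen species (with the matching densities). The simple-cubic test lattice is a placeholder, and `rdf` is a hypothetical helper.

```python
import numpy as np

def rdf(pos, cell_lengths, r_max, n_bins=100):
    """Total g(r) for an orthorhombic periodic cell (minimum-image convention)."""
    L = np.asarray(cell_lengths, dtype=float)
    n = len(pos)
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)                                  # minimum image
    r = np.linalg.norm(d, axis=2)[np.triu_indices(n, k=1)]    # unique pairs only
    counts, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    shell = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    rho = n / np.prod(L)
    # Each unique pair is counted once, so double it to count neighbors of every atom
    g = 2.0 * counts / (n * rho * shell)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, g

# Placeholder structure: 5x5x5 simple-cubic lattice with spacing 2 Å in a 10 Å box
a = 2.0
grid = np.array([[i, j, k] for i in range(5) for j in range(5) for k in range(5)], dtype=float) * a
centers, g = rdf(grid, [10.0, 10.0, 10.0], r_max=4.9, n_bins=70)
```

Integrating ρg(r) over the first peak recovers the coordination number (six for simple cubic), which is a convenient sanity check on the normalization.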

General-purpose potential for Si
Due to their limited transferability, most MLPs are trained for specific applications. Developing general-purpose MLPs is a formidable task that involves constructing a huge data set covering a vast range of chemical environments, which in turn requires deep understanding of the system and possibly several iterations of MLP refinement. There have been a few attempts to generate general-purpose MLPs with manually selected data sets. 17,18 For example, in Ref. 17

DISCUSSION
The GeTe and Si examples above demonstrate that G-metaD can produce training sets comparable to those elaborately collected by experts. This confirms that G-metaD can generate diverse and relevant configurations semi-automatically, which will expedite the development of MLPs by reducing the technical burden of choosing reference structures. However, a limited number of G-metaD trajectories may not provide full accuracy for every region of the PES, as observed for Si, where the EOS of some polymorphs is inaccurate. Therefore, we advise practitioners to augment the training set if high accuracy is required for specific configurations. For example, by including additional G-metaD trajectories that start from the same diamond structure but run under constant pressures of 10-20 GPa, we could significantly improve the EOS of high-pressure phases such as the hcp and bc8 phases.
G-metaD can also complement the traditional sampling style. Since MLPs are essentially interpolative, their prediction error increases rapidly for structures outside the training domain. MD-based sampling rarely explores high-energy regions, so the trained MLP is vulnerable to failures in long-term, large-scale simulations, in which some atoms may eventually visit untrained regions. This can be partly resolved by a weighting scheme, 50 but the present G-metaD can provide a more robust solution: after preparing a training set with the traditional approach, practitioners may augment it with G-metaD trajectories, which additionally sample high-energy regions relevant to the simulation. This achieves both high accuracy and stability in subsequent simulations.
In some cases, it is useful to apply a partial-bias G-metaD, in which only a few selected atoms contribute to the total bias potential. For instance, to sample various sites of an interstitial atom (self or dopant type), one can add the interstitial atom to the crystalline bulk and apply G-metaD only to that atom. This enhances the sampling of defective structures embedded in the crystalline bulk, which would not be feasible if the biasing force drove all atoms out of the crystalline structure simultaneously. Similarly, diffusion paths of Li within a solid could be sampled efficiently by partial-bias G-metaD. 51
Regarding the computational cost, G-metaD takes about three times longer than the corresponding MD for the 20-ps simulations of GeTe. The present implementation of G-metaD operates in client-server mode between LAMMPS and VASP, and the time spent reading and writing wave-function files is substantial. This could be alleviated by implementing G-metaD directly in the ab initio program.
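For the partial-bias G-metaD described above, only a subset S of atoms enters the sum of Eq. (1), Ub = Σ_{i∈S} ub(Gi). The chain rule of Eq. (2) still passes biasing forces to the unbiased neighbors of those atoms, because their positions enter each Gi. A minimal sketch, assuming that the per-atom gradients ∂ub/∂Gi and the Jacobian ∂G/∂R are already available as arrays. The function name, shapes, and random inputs are illustrative only.

```python
import numpy as np

def partial_bias_forces(dub_dG, jac, biased):
    """Biasing forces when only atoms in `biased` contribute to U_b.

    dub_dG: (N_at, NG) gradients of u_b at each atom's G vector
    jac:    (N_at, NG, N_at, 3) Jacobian dG_{j,s}/dR_{i,a}
    biased: indices of the atoms whose local bias enters U_b
    """
    mask = np.zeros(len(dub_dG))
    mask[list(biased)] = 1.0
    # Unbiased atoms add nothing to U_b, so their terms in Eq. (2) are masked out,
    # but every atom can still feel a force through jac
    return -np.einsum('j,js,jsia->ia', mask, dub_dG, jac)

rng = np.random.default_rng(2)
n_at, ng = 4, 3
dub = rng.normal(size=(n_at, ng))
jac = rng.normal(size=(n_at, ng, n_at, 3))
F_interstitial = partial_bias_forces(dub, jac, biased=[0])   # e.g. bias only atom 0
```

Setting `biased` to all atoms recovers the full G-metaD force of Eq. (2).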
Another source of higher computational load is the evaluation of the bias forces in Eq. (2). Unlike typical metaD, which uses only a few CVs, G-metaD uses CVs with 50-100 dimensions, and the bias potential in Eq. (3) receives contributions from every atom in the system. As a result, the time spent on bias calculations becomes significant as G-metaD proceeds and Gaussians accumulate. Reducing the dimension of the G vector used in G-metaD would increase the computational speed.

Density functional theory calculations
The reference DFT calculations are performed with the Vienna Ab initio Simulation Package (VASP) 30 using projector augmented-wave pseudopotentials. 52 The generalized gradient approximation is used for the exchange-correlation energy of electrons. 53 For GeTe, we include a parameterized van der Waals interaction. 54,55 The temperature is controlled by the Nosé-Hoover thermostat.

Neural network potential
The NNPs are trained using SIMPLE-NN. 29 In training the NNPs, the reference DFT data are split randomly into training and validation sets with a 9:1 ratio for H:Pt(111) and a 19:1 ratio for GeTe and Si. We use the atom-centered symmetry-function vector (G) to represent local environments. 23 The G vector consists of radial (G2) and angular (G4 and G5) components with cutoff radii of 3.5-8.0 Å. For GeTe and Si, symmetry-function parameters are selected from a large pool of 233 sets using the CUR method. 56,57 In the CUR process, each symmetry function is penalized by a rough estimate of its evaluation cost based on the cutoff radius and the function type (radial or angular), so that the most cost-effective set of parameters is selected. The CUR selection is terminated when the error ε of Eq. (5) drops below a threshold (0.001 and 0.002 for GeTe and Si, respectively); ε compares, in terms of the Frobenius norm ‖·‖F, the original feature matrix A constructed from all 233 parameters with the reduced feature matrix Ã built from the selected parameters. As a result, 61, 103, and 47 symmetry functions are selected for Ge, Te, and Si, respectively. For H:Pt(111), 70 parameters with a constant cutoff of 6.0 Å are selected without applying the CUR method.
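The cost-penalized CUR selection can be sketched as a greedy column selection. The following assumptions are made, and the scoring, the number of singular vectors, and the error definition used in this work may differ: a leverage-score variant with column orthogonalization in the spirit of Refs. 56 and 57; scores divided by a per-column cost; and the error taken as ε = ‖A − CC⁺A‖F/‖A‖F, where C holds the selected columns and C⁺ is its pseudoinverse. The cost values and the low-rank test matrix are placeholders.

```python
import numpy as np

def cur_select(A, costs, threshold=1e-3, k=1):
    """Greedy cost-weighted CUR column selection (hypothetical sketch).

    A: (n_samples, n_features) feature matrix; costs: (n_features,) positive costs.
    Returns the selected column indices and the final relative error.
    """
    A = np.asarray(A, dtype=float)
    norm_A = np.linalg.norm(A)                  # Frobenius norm
    R = A.copy()                                # residual after orthogonalization
    selected, err = [], 1.0
    while err > threshold and len(selected) < A.shape[1]:
        _, _, Vt = np.linalg.svd(R, full_matrices=False)
        score = np.sum(Vt[:k] ** 2, axis=0) / costs   # leverage per unit cost
        score[selected] = -np.inf               # never pick a column twice
        j = int(np.argmax(score))
        selected.append(j)
        c = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(c, c @ R)                 # remove the chosen direction from all columns
        C = A[:, selected]
        err = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A) / norm_A
    return selected, err

# Placeholder data: 10 features spanning only 3 independent directions;
# costs grow with the column index (e.g. mimicking pricier angular functions)
rng = np.random.default_rng(4)
A = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
sel, err = cur_select(A, costs=np.linspace(1.0, 2.0, 10), threshold=1e-6)
```

Dividing the leverage by the cost trades information content against evaluation time, which is the role the cost penalty plays in the text.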
In Eq. (6), M is the total number of structures in the training set, and Ni is the number of atoms in the ith structure. Ei DFT(NNP), Fij DFT(NNP), and Sik DFT(NNP) for the ith structure denote the DFT (NNP) values of the total energy, the atomic force on the jth atom, and the kth component (k = 1-6) of the virial stress, respectively. The scaling parameters μ1, μ2, and μ3 control the weights of the force, normal-stress (k = 1-3), and shear-stress (k = 4-6) terms relative to the energy term, respectively. Since the shear components are usually smaller than the normal ones, employing different scaling parameters μ2 and μ3 improves the accuracy of the shear modulus. The training and validation errors are similar, and their RMSE values for each system are noted in the main text.
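A loss with the structure described above can be sketched as follows. The exact normalization of Eq. (6) is not given in the text, so this sketch makes several assumptions: the energy error is taken per atom and averaged over structures, the force term is averaged over all force components, and the stress terms are averaged over the normal and shear components separately. The function name and argument layout are hypothetical, and μ1-μ3 must be supplied by the caller.

```python
import numpy as np

def nnp_loss(E_ref, E_nnp, n_atoms, F_ref, F_nnp, S_ref, S_nnp, mu1, mu2, mu3):
    """Hypothetical loss Γ with energy, force, normal-stress and shear-stress terms.

    E_*: (M,) total energies; n_atoms: (M,) atoms per structure
    F_*: lists of (N_i, 3) force arrays; S_*: (M, 6) virial stresses,
    with components 1-3 normal and 4-6 shear
    """
    n_atoms = np.asarray(n_atoms, dtype=float)
    e_term = np.mean(((np.asarray(E_ref) - np.asarray(E_nnp)) / n_atoms) ** 2)
    f_sq = sum(np.sum((fr - fn) ** 2) for fr, fn in zip(F_ref, F_nnp))
    f_term = f_sq / (3.0 * n_atoms.sum())       # mean over all force components
    dS = np.asarray(S_ref) - np.asarray(S_nnp)
    normal_term = np.mean(dS[:, :3] ** 2)       # k = 1-3
    shear_term = np.mean(dS[:, 3:] ** 2)        # k = 4-6
    return e_term + mu1 * f_term + mu2 * normal_term + mu3 * shear_term

# One hypothetical 2-atom structure: 1 eV energy error, one wrong force component
F_err = np.zeros((2, 3))
F_err[0, 0] = 1.0
gamma = nnp_loss([-10.0], [-9.0], [2], [np.zeros((2, 3))], [F_err],
                 np.zeros((1, 6)), np.zeros((1, 6)), mu1=1.0, mu2=1.0, mu3=1.0)
```

Keeping μ2 and μ3 separate lets the small shear components carry weight of their own instead of being swamped by the normal components.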

Mahalanobis distance
We use the Mahalanobis distance to classify G vectors in GeTe structures into a point group θ (crystal, amorphous, or liquid phase). 42 The Mahalanobis distance (d) measures the distance between a data point x and the center of a point group θ in a multidimensional space. 41 It is calculated as

$$d(\mathbf{x}, \theta) = \sqrt{(\mathbf{x}-\boldsymbol{\mu}_\theta)^{\mathrm{T}}\, \Sigma_\theta^{-1}\, (\mathbf{x}-\boldsymbol{\mu}_\theta)}, \tag{7}$$

where μθ and Σθ are the mean and covariance matrix of the point group θ, respectively. When the components are uncorrelated and have unit variance, Σθ is the identity matrix and the Mahalanobis distance equals the Euclidean one. To classify an unlabeled G vector, we first prepare reference data points for the GeTe phases from MD simulations: rock-salt crystal at 700 K, amorphous at 500 K (2 structures), and liquid at 1000 K. Then we randomly select 4,000 points from each MD simulation and label them as

Fig. 1 The schematic description of bias potential in the G space projected on certain components Gi

Fig. 2 Sampled configurations and performances of the corresponding NNPs. a Characteristic area on
Next, we train three NNPs by employing the trajectories from each simulation as training data. (The trajectories are sampled every 10 fs.) They are named NNP-L, NNP-H, and NNP-G according to the sampling type (600-K MD, 1700-K MD, and G-metaD, respectively). The training errors are similar among the three NNPs, and the root-mean-squared errors (RMSEs) for energy and force are less than 3 meV/atom and 0.2 eV/Å, respectively. (We refer to the Methods section for details of the NNPs and training procedures.)

Fig. 3 Energy and fraction of phases in the four G-metaD.a Time evolution of potential energy of four

In Ref. 8, it was found that the NNP tended to produce flat four-fold rings, resulting in unphysically fast crystallization; this was remedied by including relaxation paths from flat to puckered four-fold rings in the training set. The present NNP produces a ring flatness intermediate between those of the conventional and refined NNPs (not shown), implying that fine details of the medium-range order are not well captured by either MD or G-metaD.

Fig. 4 Structural properties and crystallization behavior of GeTe obtained by the NNP. a The energy-

Fig. 5 Comparison between NNP and DFT results over various properties of Si. a Ratios of NNP to DFT for static properties of Si in the diamond structure. Surface energies are calculated for (100)-(2×2),47

A time step of 2 fs is used. In the MD and G-metaD simulations of H:Pt(111), a cutoff energy of 350 eV and a k-point grid of 5×5×1 are used. The G-metaD simulations of GeTe and Si are carried out with cutoff energies of 300 and 250 eV, respectively, and the k-point grids are varied with a spacing of 0.4 Å−1 to maintain computational consistency during large volume changes. To obtain the reference energies, forces, and stress tensors of the sampled structures, we perform one-shot DFT calculations with tighter parameters such that the total energy and atomic forces converge within 1.5 meV/atom and 0.04 eV/Å, respectively, for randomly sampled G-metaD snapshots. The resulting cutoff energy and k-point spacing are 400 eV and 0.3 Å−1 for GeTe, and 350 eV and 0.157 Å−1 for Si, respectively.
For the NNP architecture, we adopt atomic neural networks with two hidden layers. The number of nodes per hidden layer is optimized with respect to the training RMSE; as a result, each hidden layer consists of 30 nodes for H:Pt(111) and Si, and 60 nodes for GeTe. Since decorrelating the input vector benefits training quality and convergence speed, we transform the input vector by principal component analysis without dimension reduction. After the transformation, the variances of the vector components are normalized by whitening. The training is performed with the momentum-based Adam optimizer 58 using a minibatch size of 20, which balances performance and computational cost. The initial learning rate is 0.0001 and is reduced exponentially. The loss function (Γ) is given in Eq. (6).
corresponding groups. Then, a G vector is classified into the one of the three phases (θ*) for which it has the shortest d:

$$\theta^{*} = \operatorname*{arg\,min}_{\theta \in \{\mathrm{C},\,\mathrm{A},\,\mathrm{L}\}} d(\mathbf{x}, \theta).$$
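Eq. (7) and the argmin rule can be sketched in a few lines. The reference clouds below are synthetic Gaussian clusters standing in for the crystal (C), amorphous (A), and liquid (L) G vectors, with 4,000 points each as in the text. The optional `max_distance` cutoff, which leaves a point unclassified when it is far from every group, is a hypothetical stand-in for the classification criteria, whose values are not given here.

```python
import numpy as np

def fit_groups(reference):
    """reference: dict label -> (n_points, NG) array. Returns label -> (mean, inverse covariance)."""
    stats = {}
    for label, X in reference.items():
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        stats[label] = (mu, np.linalg.inv(cov))
    return stats

def mahalanobis(x, mu, inv_cov):
    """Eq. (7): sqrt((x - mu)^T Σ^-1 (x - mu))."""
    d = x - mu
    return float(np.sqrt(d @ inv_cov @ d))

def classify(x, stats, max_distance=None):
    """Assign x to the group with the shortest Mahalanobis distance."""
    dists = {label: mahalanobis(x, mu, inv) for label, (mu, inv) in stats.items()}
    best = min(dists, key=dists.get)
    if max_distance is not None and dists[best] > max_distance:
        return "unclassified", dists
    return best, dists

rng = np.random.default_rng(3)
reference = {   # synthetic 2D placeholders for the three phases
    "C": rng.normal([0.0, 0.0], [0.1, 0.1], size=(4000, 2)),
    "A": rng.normal([1.0, 0.0], [0.3, 0.1], size=(4000, 2)),
    "L": rng.normal([2.0, 1.0], [0.5, 0.5], size=(4000, 2)),
}
stats = fit_groups(reference)
label, _ = classify(np.array([0.95, 0.02]), stats)
far_label, _ = classify(np.array([10.0, -10.0]), stats, max_distance=5.0)
```

Because each group carries its own covariance, a broad distribution such as the liquid can claim points that lie farther from its mean in Euclidean terms than the same points lie from a tight crystalline cluster.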