Machine learning as an improved estimator for magnetization curve and spin gap

The magnetization process is a very important probe to study magnetic materials, particularly in search of spin-liquid states in quantum spin systems. Regrettably, however, progress of the theoretical analysis has been unsatisfactory, mostly because it is hard to obtain sufficient numerical data to support the theory. Here we propose a machine-learning algorithm that produces the magnetization curve and the spin gap well out of poor numerical data. The plateau magnetization, its critical field and the critical exponent are estimated accurately. One of the hyperparameters identifies by its score whether the spin gap in the thermodynamic limit is zero or finite. After checking the validity for exactly solvable one-dimensional models we apply our algorithm to the kagome antiferromagnet. The magnetization curve that we obtain from the exact-diagonalization data with 36 spins is consistent with the DMRG results with 132 spins. We estimate the spin gap in the thermodynamic limit at a very small but finite value.

also restricted to small systems. We have not found a promising numerical method to treat frustrated systems up to the present. Under these circumstances, the development of a data-analysis method is particularly important in order to extract a meaningful physical conclusion out of poor numerical data.
The Gaussian-kernel method is a machine-learning algorithm which enables analytically differentiable data regression. Harada 11 introduced the Bayesian inference coupled with the Gaussian-kernel method for parameter estimation in the finite-size scaling analysis of critical phenomena. It automatically evaluates the critical point and critical exponents without assuming any form of the scaling function. Nakamura 51 employed this method to find continuous and analytically differentiable model functions of physical quantities directly from raw simulation data. For example, we obtained the internal energy as a continuous function of the temperature as E(T) out of energy data collected at discrete values of the temperature. We then found the specific heat C(T) as a continuous function by algebraic temperature differentiation of E(T). We also determined the critical temperature as an exclusion point of E(T) that cannot be modeled by a smooth function. Its accuracy was within five digits of the exact value for the two-dimensional Ising model 51 . We further estimated the critical exponent by the logarithmic differentiation of a model function of a physical quantity. A similar approach 8 was taken to estimate the critical temperature as a point at which the neural network confuses the classification problem.
Empowered by this Bayesian inference coupled with the Gaussian-Kernel (B-GK) method, we acquire a useful tool to obtain a continuous and accurate estimate of the magnetization curve in a style completely different from the conventional ones. The magnetization plateau M p , its critical field H p , and its critical exponent δ will be estimated very accurately in a similar manner to how we obtained the critical temperature and the critical exponent for the Ising model 51 . The method also gives an alternative estimate of the spin gap by the critical field of its zero-magnetization plateau. Since its size dependence is different from the conventional spin-gap definition, we can perform a combined size-extrapolation analysis using both series of data, which further improves the accuracy of the estimate. We also find that one hyperparameter in this analysis identifies the ground state being gapless or gapful by its score.
For a check, we applied this method to one-dimensional models, obtaining the results that agreed with the exact solutions. Then, we used it to resolve the unsettled issues in the kagome antiferromagnet. We estimated the spin gap in the thermodynamic limit at a small but finite value. The critical exponent of each magnetic plateau suggested that the excitations above the states on the finite-magnetization plateaux as well as the excitations above the ground state are different from those below the plateau states.

Results
Validity check in one-dimensional models. We first tested the method for the magnetization curve in the S = 1/2 antiferromagnetic (AF) Heisenberg modulated spin chain 52 , where N denotes the spin number, is a dimerization parameter, α is a modulation parameter, and φ is a phase factor. The periodic boundary condition is imposed. We study the existence of an incommensurate or commensurate ground state depending on the value of α . According to a criterion proposed by Oshikaw et al. 16 , we expect to observe a plateau in the magnetization curve at M/N = S − α , where M denotes the total magnetization of the system and α was originally set to a rational number. Hu et al. 52 investigated this model with an irrational number of α in order to study the incommensurate ground state. They obtained the plateau magnetization at S − α within the accuracy of 1/200 when = 0.8 by the density-matrix renormalization group (DMRG) method applied up to N = 200 . We here test our data-analysis method to estimate the plateau magnetization using poor data obtained for a system of much smaller size and with much smaller .
The magnetization curve. We calculated the ground-state energy for each value of the magnetization as E(M) by the ED method on a system of 30 spins; see Fig. 1a. Let E p the energy value at the plateau magnetization M p . We fixed the plateau point (M p , E p ) by the Bayesian inference as an exclusion point at which the Gaussian-kernel method fails to model data by a smooth function. If the magnetic plateau exists, the E(M) data exhibit a jump of its slope at (M p , E p ) since H = ∂E(M)/∂M . We cannot model such data by a smooth function because the derivative is not continuous. In this inference, we introduced additional mirror data (depicted by crosses in Fig. 1a) corresponding to the original data (depicted by circles in Fig. 1a) for both upper and lower sides of the plateau point. Locations of the mirror data are controlled by an additional hyperparameter A s , which we hereafter call the asymmetric parameter. The mirror data are located symmetrically or antisymmetrically when A s = 1 or −1 , respectively. This asymmetric parameter is also fixed by the Bayesian inference. We will below use the estimate of the asymmetric parameter as an indicator of the existence of a finite spin gap.
By using estimated values of (M p , E p ) and hyperparameters, the B-GK method gives a model function for E(M) smoothly connecting the original data and the mirror data as shown in Fig. 1a. The present algorithm evaluates the value of M p within five-digit accuracy. This lets us clearly observe the phase-factor dependence of M p due to the finite-size effects; see an inset of Fig. 1a. This dependence is regarded as an outcome of the inconsistency between the incommensurate modulation and the periodic boundary condition. The magnetic interactions are not continuous at the periodic boundary edge if αN is not an integer. When N = 30 , this inconsistency is relaxed because αN ( ≃ 10.98 · · · ) is close to an integer. Therefore, the results did not show the φ dependence.
We draw the magnetization curve by analytic differentiation: H(M) = ∂E(M)/∂M . Note again that this is possible because we obtain E(M) as an analytic function except for at M p . The results for various system sizes are shown in Fig. 1b. Each B-GK result exhibits a continuous magnetization curve, whose plateau magnetization It is a remarkable consistency considering the size of the system. A value of M p for each system size oscillates around and approaches S − α as the system size increases. Values of the critical field H p for both plateau edges also exhibit similar size dependences. In addition to M p , the point at M/N = 0.5 is another data edge.
Since there is no numerical data above this point, contributions of the Gaussian kernel function from the upper side of the data are missing in the vicinity of M/N = 0.5 . Then, the quality of the model function deteriorates and the finite-size dependence appears. We compared our results with the magnetization curve estimated by the conventional method, that is, by taking the difference of the same data as H(M) = E(M) − E(M − 1) , which exhibits a stepwise behavior. Obviously, we cannot estimate the incommensurate magnetization plateau accurately from this plot.
A model function for the critical exponent δ defined by (H − H p ) ∼ (M − M p ) δ is plotted in an inset of Fig. 1b. Here, we plotted two model functions given by two equivalent expressions for δ in Eq. (10). Both model functions approach two at the critical field, which is consistent to the critical exponent in gapful systems 53 . Numerical instability occurred in the vicinity of M p . We need to discard these data where |M − M p | is smaller than several times the standard deviation of M p , which is estimated by 40 best results out of 800 trials of the Bayesian inference.
The spin gap. We also checked the present method in two exactly solvable models particularly focusing on the existence of the spin gap: One is the XY-spin version of the Hamiltonian (2), namely a dimerized AF XY chain. We applied the magnetic field in the z direction. The other one is the uniform AF Heisenberg spin chain with = 0 . The exact expression for the magnetization curve is known [53][54][55][56][57] in both cases. The magnitude of the spin gap in the thermodynamic limit is in the XY model. Therefore, we can tune the system gapful or gapless by varying the dimerization parameter. The ground state of the uniform AF Heisenberg spin chain is known to be gapless.
Our B-GK results for the magnetization curve and the ground-state energy are plotted in Fig. 2. The magnetization curve obtained by our method coincides with the exact result accurately in the whole range of the magnetic field. We only notice a small disagreement in the vicinity of M = 0 when is very small in the XY model. The agreement is particularly excellent when the spin gap is large or zero. We did not observe finite-size effects in the AF Heisenberg chain. The average deviation of the magnetization curve from the exact one was 0.0007 (8) for N = 30 and 0.003 (2) for N = 16 in the range of 0 < H < 1.99 , where the error is estimated by the standard deviation among two thousand data points.
The model function for the critical exponent δ is plotted in insets of Fig. 2. Two model functions with equivalent expressions converge to the same value. The exact results in the gapful models (black solid lines of = 0.1, 0.3, 0.5 in Fig. 2a) approach 2 with a finite slope, whereas those of the gapless models (a black solid line of = 0.0 in Fig. 2a and that in Fig. 2b) approach 1 with a zero slope. This difference in the exponent also clearly www.nature.com/scientificreports/ identified the gapful and the gapless ground states. Our B-GK result exhibits an agreement with the exact one except for the case of the = 0.1 XY model. It deviated from the exact curve before reaching the critical field and exhibited a non-uniform behavior. We consider it as an outcome of the small lattice size compared to the magnitude of the spin gap. We here present an alternative estimate for the spin gap by the B-GK method. Namely, the critical magnetic field at the zero magnetization, H(0) = ∂E(M)/∂M| M=0 , corresponds to the spin gap above the ground state. The conventional estimate for the spin gap, E(1) − E(0) , is regarded as an approximation of the numerical differentiation by the difference. Since H(0) is the slope of E(M) at M = 0 , the symmetric mirror ( A s = 1 ) of the data are favored when the spin gap is zero, whereas the asymmetric mirror ( A s = −1 ) of the data are favored when it is finite; see Fig. 2c. Therefore, we expect to judge the ground state whether gapless or gapful by the estimate of this parameter.
The estimates of the spin gap and the asymmetric parameters are summarized in Table 1. This algorithm easily identified the large-gap state and the gapless state. In the gapless models (the XY model with = 0 and the AF Heisenberg model), the estimate of the spin gap according to the critical field H(0) almost vanishes even in a finite lattice with 30 spins. The asymmetric parameter A s took a value close to 1 in the gapless state and a value close to −1 in the gapful state (the XY model with ≥ 0.1 ). This parameter indeed identified by its score whether the ground state is gapless or gapful in the thermodynamic limit. The estimate took a value away from ±1 for = 0.1 and 0.05. These ambiguous results together with the non-uniform behavior of the critical exponent suggest that the algorithm needs more data with larger lattice sizes to clearly identify the state as gapful.
In the analyses of the spin gap estimation, we fixed the value of M p to 0 as prior information for the Bayesian inference, because we know that the ground state of these systems lie in the M = 0 sector. Here, a question arises. What happens if we feed an incorrect prior information to the Bayesian inference? Does the present method give a continuous magnetization curve if we set M p at a value where there is no magnetic plateau? We here show that the algorithm produces only a negligibly narrow plateau at a wrong value of magnetization. We checked the method by setting M p /N = 0.2 in the AF Heisenberg spin chain. We obtained the magnetization curves both below and above this M p value individually by the same procedure as in the incommensurate modulated model  www.nature.com/scientificreports/ in Fig. 1. The results are plotted in Fig. 2b by a red line. We estimated two critical field above and below the magnetization plateau at 1.37862(8) and 1.37762 (9), respectively; in other words, the plateau width was only 0.001. Asymmetric parameters were 0.883(2) and 1.143 (2), which are close to the gapless value 1. We did not identify the difference of the magnetization curve from the exact result. The average deviation of the magnetization curve from the exact one was 0.0006 (7) in the range of 0 < H < 1.99 , which is consistent with the results obtained by fixing M p = 0 . The error is estimated by the standard deviation among four thousand data points. In short, the algorithm answers with practically no plateau if we give a value of M p at which there is no magnetization plateau.
Size extrapolation of the spin gap. An estimate for the spin gap by H(0) always gives a value smaller than the conventional estimate E(1) − E(0) because E(M) is a convex function. Both estimates should converge to the same value in the thermodynamic limit accompanied by different size dependences. A combined size-extrapolation analysis using both E(1) − E(0) and H(0) gives an improved estimate for the spin gap in the thermodynamic limit.
We can perform this extrapolation analysis by the same procedure as we fixed the plateau point using the mirror data. Namely, we plot spin-gap data against 1/N and search for an extrapolated point (1/N = 0, �) by the Bayesian inference such that a model function connecting the original data and the mirror data becomes the smoothest. We now have two estimates for the spin gap, H(0) and E(1) − E(0) . For each data series, we define the mirror data with respect to the common extrapolated point (1/N = 0, �) . We estimate the log-likelihood function of the model function for each data series, and just sum them up. We search for a value of that yields the largest value of the sum. Once we obtain the spin gap and the hyperparameters for each data series, we have a model function by Eq. (7).
The results are summarized in Table 1 compared with those obtained by the quadratic least-squares method. Namely, �(N) = � + b/N 2 + c/N 4 in the gapful models, and �(N) = � + b/N + c/N 2 in the gapless models and in the kagome antiferromagnet. The B-GK method gave a result consistent with the least-squares one regardless of whether the system is gapless or gapful. The extrapolated values listed in Table 1 remained the same even if we changed the horizontal axis of the plot from 1/N to 1/N 2 except for the case of the kagome antiferromagnet. We changed the horizontal axis to 1/ √ N and the result changed from 0.028(1) to 0.080 (2). The extrapolation was sensitive because we only had three data for the size extrapolation.
The model function for the size extrapolation is plotted in Fig. 3. It converged to a finite value with a zero slope in a gapful model. It is consistent with a reasoning that there is a typical length scale ξ , beyond which the data loses the size dependence. It corresponds to a spin gap as ξ ∼ 1/� . The model function in the AF Heisenberg spin chain converged to zero with a finite slope, which is also consistent with this reasoning.
Application to the kagome antiferromagnet. Results presented above guaranteed that the present method gives the magnetization curve and the spin gap accurately. The asymmetric parameter A s and the model function for the exponent δ are expected to identify whether the spin gap is zero or finite. Now, we apply this method to the S = 1/2 AF Heisenberg model on the kagome lattice. We first focus on the magnetization curve of this model, particularly on the existence of the 1/9-plateau. It was claimed to exist by Nishimoto et al. 40 , whereas denied by Nakano and Sakai 44 .
We calculated the ground-state energy for N = 36, 30, 27, and 24, and applied the B-GK analysis to the data. The ground-state energy for N = 36 were obtained by the numerical package H 58 . The results for the magnetization curve are shown in Fig. 4a. Here, we used five values of 2M p /N at 0, 1/9, 3/9, 5/9 and 7/9 as prior information and performed each search for E p without using the data point of (M p , E(M p )) . We could not estimate the plateau point without fixing the value as was done in the modulated spin chain in Fig. 1. This is just because we only have a few data points between the neighboring magnetic plateaux for these lattice sizes. The magnetization curve for each plateau connects with each other smoothly. They are consistent with the one obtained by the DMRG calculation 40   www.nature.com/scientificreports/ to finite-size effects. We observed the 1/9-plateau clearly except for the size N = 30 . The plateau width exhibits strong finite-size effects. The situation is similar for other magnetic plateaux. The 5/9-plateau vanished only for the size N = 24 . The plateau magnetic states with the broken translational symmetry may not fit properly in such distorted lattices. Shapes of the finite-size kagome lattice are summarized in Fig. 4c. A result for a symmetric lattice mostly exhibits wide magnetic plateaux. The model function of the critical exponent δ for each system size is shown in an inset of Fig. 4a. It approaches a value around 2 in the low-field side of the plateaux of 2M p /N = 1/9 , 3/9, and 5/9. However, most of the data exhibit a crossover bending before reaching the critical point. We observed the similar behavior in an inset of Fig. 2a, where the lattice size was too small for the spin gap in the XY model with = 0.1 . The exponent in the high-field side of the 1/9-and the 1/3-plateaux as well as that of the ground state at M p = 0 converges to one with a zero slope, which is similar to the gapless models in one dimension. The result in the high-field side of the 5/9-plateau exhibited the strong finite-size effect, which suggests that we need more data with larger lattice sizes. The exponent in the low-field side of the 7/9-plateau converges to one but from above with a finite slope, which is a behavior different from the other plateau edges. These differences in the behavior of δ suggest different scenario for the excited states at each plateau edge. The estimated critical exponent in the low-field side of the 1/3-plateau is consistent with the estimate obtained by Sakai and Nakano 39 , which gave δ = 1.92(99) , but that in the high-field side deviated from their estimate, which gave δ = 0.56 (15).

H(0) E(1)-E(0) Model function Least squares
The size extrapolation of the spin gap for the kagome antiferromagnet is shown in Fig. 4b. We also plotted for comparison the spin-gap data for larger and different-shape lattices 28,59 . Here, we obtained the spin-gap data for systems with 18, 24, and 30 spins under the periodic boundary condition as well as under the twisted boundary condition with a π flux. By using four data series with a common extrapolated point, we estimated the gap at 0.028(1). The spin-gap data for N = 36 (both E(1) − E(0) and H(0)) and N = 27 (only H(0)) as well as those for larger and different-shape lattices much deviate from the trend of other system sizes, and hence we did not use them in the extrapolation analysis. The extrapolated gap shifts to 0.078(3) if we include the lowest gap data of N = 36 and 42 among the shape-dependent results. In this analysis, we needed to set the error bars of the ED data to 0.003 in order to accept the deviated data to be fitted smoothly. The extrapolation analysis fails if we include the data of N = 48 or use other data of N = 36 and 42.  www.nature.com/scientificreports/ All data (both E(1) − E(0) and H(0)) up to the present lattice sizes seem to decrease faster than the behavior of linear convergence to zero. Since the spin gap cannot be negative, there must exist a size-crossover point at which the trend changes. This is also consistent with the behavior of each model function in Fig. 4b, which almost lost the size dependence before reaching the thermodynamic limit. These results suggest that the gap is finite, although inclusion of large-size data may affect the final result much, which is always the case with the extrapolation analysis. Considering the size dependence and the shape dependence of the spin-gap data obtained up to the present, we need data for much larger lattices.
Our extrapolated value of the spin gap is consistent with three previous results: the DMRG calculation on a kagome strip 32 , which gave 0.02-0.04; the DMRG calculation using the lattice deformation 40 , which gave 0.05(2); the experimental estimate 31 , which gave 0.03-0.07. Meanwhile, it physically contradicts recent results 32,33 that suggest a gapless Dirac spin liquid, although our result of the model function of the exponent δ converging to 1 supports this picture. The existence of the spin gap thus remains under debate.

Discussion
We can obtain a continuous and differentiable model function of data by the Gaussian-kernel method. The Bayesian inference also gives an exclusion point of data very accurately. These two points are the essential ingredient for the success of the present method. The accuracy of each physical quantity was much higher than the resolution due to the finite lattice size. It is a great advantage when we only have insufficient data. Applications to incommensurate systems and to two-dimensional quantum systems are promising. A critical exponent can be obtained as a continuous curve converging to its critical value at the critical point. It is an alternative method for estimating critical exponents to the conventional scaling analysis.
Although the original Gaussian-kernel regression is basically the data interpolation, we extended it to the data extrapolation by defining mirror data. There remains a room for improvement in defining the mirror data and for another attempt to make this extension, depending on the problem. In the present data extrapolation, we do not need to assume any form of a model function as has been done in the least-squares method. The method automatically finds the most probable model function converging to the extrapolated point. We can avoid ambiguity due to the choice of the function form. We checked in this paper that the B-GK results are robust against the change of the horizontal axis. The present method may replace a conventional data extrapolation and numerical differentiation, which have been applied to various scientific analyses.
The asymmetric parameter introduced in the mirror data and the model function of the exponent δ possibly serve as an identifier for a gapless/gapful system. The former one identifies it by its values A s ≃ 1 or −1 . The latter one does it by its converging value and behavior. Systems investigated in this paper satisfied these criteria. In a case with a very small spin gap, the algorithm suggested that it is not conclusive by giving an ambiguous value and the behavior. It indicates the necessity of preparing more data with larger lattices. In other words, the algorithm did not make a mistake to identify the small-gap system as gapless. We may regard this as a partly success.
In this context, the spin-gap issue in the kagome antiferromagnet remained unsettled. The size extrapolation of spin-gap data gave a very small gap, whereas a model function of the exponent δ in an inset of Fig. 4a showed a convergence to 1, which is common to the gapless systems. Finally, the asymmetric parameter took an ambiguous value between 1 and −1 . This contradiction reflects the difficulty in the study of the kagome antiferromagnet. The lattice sizes treated in this paper were restricted to be small. It may be much smaller than a typical size for which physics of the kagome antiferromagnet clearly emerges. We need careful analyses beyond the lattice sizes that we treated, because the size effect in the kagome antiferromagnet sometimes changes its trend depending on the shape of lattice 26,28 .
Let us make one comment. The present algorithm is not universally better than other algorithms for any problems. This is because of the no-free-lunch theorem 2,3 in machine learning. There may exist another algorithm that works better for some problems. We found that the B-GK method works better than numerical differentiation by the difference and better than the least-squares extrapolation. On the other hand, we needed to give some prior information or directions for the algorithm to work well. We utilize this information depending on each problem and data 3 . It is a general aspect of the machine-learning algorithm. Therefore, we cannot use it as a black box. A disadvantage of the kernel method compared to the neural-network algorithm is the computational cost O(d 2 ) , where d is the number of data. However, the computational ability is still increasing and it may solve the disadvantage.

Method
In order to obtain the magnetization curve in the ground state numerically, we first evaluate the ground-state energy E(M) in each total magnetization subspace with M = i S z i . The relation between the magnetization and the magnetic field, H, is given by We plot M against H(M) to obtain the magnetization curve. This differentiation has been estimated by the difference as in H(M) = E(M) − E(M − 1) , which exhibits a stepwise behavior. We cannot determine the existence of a nontrivial magnetic plateau from such results. The DMRG method is a good choice to increase the system size in low-dimensional systems, but the differentiation still has been performed by the difference. The resolution of the result is limited by the lattice size. This situation is common in numerical differentiation in various fields up to the present. The B-GK method replaces it by analytic differentiation of the model function. www.nature.com/scientificreports/ Let us briefly explain the Gaussian kernel regression [1][2][3]11 . We try to obtain a model function for data (x i , y i , δy i ) , where i = 1, 2, · · · , d is the data index, and δy i denotes the error of datum y i , e.g. a Monte Carlo error. The point is to use the Gaussian kernel function K(x i , x j ) for x i = x j of the form where θ 1 , θ 2 and θ 3 are hyperparameters, which are fixed by maximizing the log-likelihood function where C is a d-dimensional covariance matrix defined by and |C| denotes the determinant of C. This maximization problem is solved by a numerical package 60 , such as the conjugate-gradient method or the downhill simplex method. With the estimates of the hyperparameters, we have a continuous and generally nonlinear model function for arbitrary x as This function consists of data {y j } with their weights given by the Gaussian kernel function. We may roughly consider that the weight is larger if the data is closer to x or the error bar is smaller. Since x appears only in K(x, x i ) , we can find the derivatives of F (x) by replacing K(x, x i ) with ∂K(x, x i )/∂x and its higher derivatives. Now, suppose that we have a set of energy data, (M i , E i ) , as shown in Fig. 1a. When the magnetic plateau exists in the magnetization curve, the function H(M) of Eq. (3) should jump at M = M p and the data (M i , E i ) should exhibit a sudden change of slope at the plateau point. Because of this, we cannot model all the energy data by one smooth function that straddles the plateau point. This is the same situation in which we determined the phase transition temperature in the two-dimensional Ising model by the B-GK method 51 . Energy data were not modeled by a smooth function beyond the phase transition temperature. The critical temperature was estimated as an exclusion point. Here, we follow the same recipe.
We first split the energy data into the high-field region ( M > M p ) and the low-field region ( M < M p ) at a plateau point (M p , E p ) , which is for the moment unknown and is to be estimated by the Bayesian inference. We start the inference from a random initial value of (M p , E p ) . We set d h pieces of points (x i , y i ) from the data in the high-field region as for i = 1, 2, . . . , d h , where d h is the number of data points in the high-field region.
A key of the present algorithm is to introduce additional data points (x i+d h , y i+d h ) as the mirror data as shown in Fig. 1a by cross symbols. Here, we consider a line y = ax that goes through the plateau point with a slope a = (y u − y ℓ )/(x u − x ℓ ) , where (x u , y u ) and (x ℓ , y ℓ ) are the upper and the lower neighboring data of the plateau point, and hence it gives an approximate slope at the plateau point. We consider mirror data with respect to this line. Namely, for i = 1, 2, . . . , d h , where A s is an additional hyperparameter to be estimated by the Bayesian inference. The mirror is antisymmetric if A s = −1 and symmetric if A s = 1 . This is an adoption of the method of mirror images in electrostatics; mirror charges are fixed so that the electric field may continue smoothly at the boundary. The parameter A s is determined so that the model function may become as smooth as possible, which is automatically realized by the Gaussian kernel regression. We searched for A s from the initial value randomly distributed between −1 and 1.
We introduce the asymmetric parameter A s because of the following reason. In general data regression, the quality of modeling becomes poorer near the edges of data series. Since the magnetic plateau exists between two edges of the E(M) data series, the numerical precision of the obtained model function can deteriorate much. Then, its derivative, the magnetization curve, would exhibit numerical instability near the plateau point. The introduction of the mirror data solves this problem by making the data edge the midst point.
Using 2d h data points in Eqs. (8) and (9), we evaluate the log-likelihood function (5) in the high-field region. The log-likelihood function in the low-field region is also evaluated by applying the same procedure to 2d ℓ data points, where d ℓ is the number of data points in the low-field region. We searched for a common estimate of (M p , E p ) and individual estimates of A s in both low-and high-field regions so that the sum of the two loglikelihood functions may take the maximum. We carried out this search for 800 times. Then, we took the average over 40 best results and put an error by the standard deviation. When the distribution of the log-likelihood function is broad, we increased the search up to 1600 times and took the average over 400 best results. This occurred when we estimated the magnetization curve of the kagome antiferromagnet. Using the estimated parameters, The critical exponent of the magnetization at the plateau edge H p is defined by (H − H p ) ∼ (M − M p ) δ . By logarithmic differentiation and the l'Hôpital's rule, we can estimate this exponent in the following expressions: We have model functions of these expressions by analytic differentiation of the model function of H(M). By plotting two model functions with equivalent expressions, we can fix the common extrapolated point as the critical exponent. The second expression trivially takes a value 1 if we set M = M p as long as the denominator ∂H/∂M is not exactly zero. We observe it as a crossover bending of the model function. Before reaching the critical point, both expressions exhibit numerical instability in the vicinity of M = M p . It is caused by the situation 0/0 for the first expression, and is caused by the poor estimation of the derivatives for the second expression; see an inset of Fig. 1b. We need to discard these data where the value of |M − M p | is smaller than several times the standard deviation of M p .
When we apply this method for the size extrapolation of the spin gap, we set x i = 1/N i and y i = �(N i ) , where �(N i ) is the spin gap estimated at the size N i . We search for the extrapolated point (1/N = 0, �) by defining the mirror data in the negative side of 1/N. We set the approximate slope a = 0 in this analysis because there is no real data in the lower side of the extrapolated point.
It is very important to avoid overfitting and underfitting in the machine-learning algorithm 1-3 . When overfitting occurs, the algorithm tries to fit data strictly, ignoring a trend of whole data. It occurred mostly in the analysis of the kagome antiferromagnet. For example, an estimated spin gap H(0) took a negative value even for a finite lattice size. In another case, the extrapolated spin gap took a value larger than the finite-size data. The cross validation is generally useful to avoid the overfitting. In the textbook example, we would randomly select (d − 1) pieces of data out of d and search for the parameters. The estimated parameters would be validated with another choice of (d − 1) pieces of data. This cross validation actually fails when d is very small. In the present analysis, we introduced another method of cross validation by using random noise. When we estimated parameters, we first added random noise to the original data, where the noise was set proportional to the error of the data. We validated the estimated parameters by calculating the log-likelihood function using data with different random noise. We found that this procedure works fine even when the number of data is very much restricted. We mostly set noise amplitude to 10 −6 since the ED data are supposed to be numerically exact. We needed to increase it up to an order of 10 −4 when strong overfitting occurred. On the other hand, underfitting may occur giving just a flat line of averaged data, if the noise was set too large. We needed to check the results and set values of the noise of the original data so that both overfitting and underfitting may vanish. This is the only procedure that we did manually.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.