Introduction

Spectroscopy (X-ray, optical, infra-red, electron, etc.) is a popular and important experimental technique for analysing materials and investigating their fundamental properties.1,2,3,4,5,6 Materials research and development requires large numbers of samples to be measured and large amounts of experimental data to be processed. Therefore, there is a strong demand for high-throughput measurement to reduce the time and cost of spectroscopy experiments. In a conventional spectroscopy experiment, a large number of data points is usually measured with sufficient measurement time to obtain a spectrum with an adequate signal-to-noise ratio. Often, the quality of a spectrum is judged according to the experimenter’s experience.

Although there are single-shot spectroscopy experiments such as wavelength-dispersive X-ray spectroscopy,7 many kinds of spectroscopy need point-by-point measurement while scanning the energy or wavelength. In such sequential point-by-point measurement, one cannot obtain the whole spectrum until the end of the experiment, and parameters can be obtained only by post-processing analysis after the measurement. Modern experiments such as scanning X-ray micro-spectroscopy take a relatively long measurement time per energy because an image is scanned at each energy point.8 Therefore, efficient measurement that reduces the number of energy data points through the intelligent design of experiments is needed. At present, regression models are used to realise precise predictions thanks to advances in machine learning techniques, and such techniques can be applied to the intelligent design of spectroscopy experiments.9,10

Machine learning techniques have recently been introduced to materials science.11 Materials informatics12 is regarded as the fourth paradigm in the field of materials science following the previous paradigms of experiment, theory, and computation.13 In materials informatics, statistics and/or machine learning techniques are necessary to derive or predict target data from big data in an efficient manner. Bayesian optimisation has been applied to the high-throughput prediction of new materials.14,15 However, this kind of sampling optimisation strategy has never been applied to spectroscopic measurement. Bayesian optimisation can be applied when an optimisation objective can be defined as a functional to be modelled. Because our aim in this study was to optimise the measurement strategy for approximating the spectrum, instead of a Bayesian optimisation in which the Gaussian process16 is used as a fundamental modelling tool, we utilised the Gaussian process and its predictive variance to design efficient measurement strategies. Gaussian process regression is also known as kriging and is used in geostatistics to predict a geographic surface from an interpolation of discrete observation data.17

Generally, a spectrum is represented as a nonlinear function of energy or wavelength. The Gaussian process is a generalised linear model that can approximate such nonlinear spectral shapes by linear regression in feature space. The Gaussian process predicts a spectrum by tuning hyper-parameters through the learning of previous data of the spectrum. Moreover, not only the expectation value of the prediction but also the variance can be evaluated. Thus, the certainty of the prediction can be evaluated, and the efficiency of the adaptive sampling of new data points can be optimised.

In order to assess its applicability to spectroscopy, we applied Gaussian process modelling to X-ray magnetic circular dichroism (XMCD) spectroscopy. XMCD spectroscopy is an experimental technique for quantitatively evaluating the orbital and spin magnetic moments of a material.18,19,20 These magnetic moments are related to the quantities evaluated from the X-ray absorption spectrum (XAS) and XMCD spectrum based on magneto–optical sum rules.21,22,23 Thus, XMCD spectroscopy is suitable for the quantitative assessment of the performance of a spectrum predictor modelled with Gaussian process.

In this paper, we propose an adaptive design for XMCD spectroscopy experiments that uses a machine learning technique. The XMCD spectrum is predicted by Gaussian process modelling with learning of an experimental spectrum. Adaptive sampling reduces the total number of data points needed for the evaluation of magnetic moments while maintaining the required accuracy.

Results

Strategies for the adaptive design of an XMCD spectroscopy experiment

Figure 1 shows a comparison between a conventional XMCD experiment and the adaptively designed XMCD experiment. In a conventional XMCD experiment, measurement and analysis are independent. Therefore, one can evaluate the magnetic moments only after the measurement. On the other hand, in an adaptively designed XMCD experiment, one can evaluate the magnetic moments each time a new data point is sampled. First, initial data points are sampled to obtain an experimental discrete spectrum as the training data set. Next, a spectrum is continuously predicted by Gaussian process modelling fitted to the training data set. Then, magnetic moments are evaluated from the predicted XMCD and XAS spectra. The experiment is stopped if the values of the magnetic moments satisfy the convergence criterion. Otherwise, new data points are sampled, and the spectrum is predicted again. Note that no prior knowledge is included in the modelling.

Fig. 1

Comparison between a conventional XMCD experiment and the adaptively designed XMCD experiment. (left) Flowchart for a conventional XMCD spectroscopy experiment. First, data points are sampled to obtain a spectrum for opposite directions of X-ray polarisations or magnetisation directions at each energy point. Next, magnetic moments are evaluated from the experimental spectra. (right) Flowchart for the adaptive design of the XMCD spectroscopy experiment. First, initial data points are sampled to obtain an experimental spectrum as the training data set for opposite directions of X-ray polarisations or magnetisation directions at each energy point. Next, the Gaussian process (GP) model predicts the spectra, and magnetic moments are evaluated from the predicted spectra. The convergence of the values of the magnetic moments is checked in order to determine whether to sample new data points and return to predicting spectra or to stop the experiment. We examined two options for sampling the initial data points and three options for selecting new sampling points

We used Sm M4,5 XMCD and XAS spectra of SmCo5 to assess the applicability of Gaussian process modelling. We examined two sampling conditions for the initial data points: (1) 30 data points sampled at equal intervals from the pre-edge to the post-edge of the Sm M4,5 absorption edges (1060–1130 eV), and (2) 30 data points sampled intensively at the M5 and M4 peaks (15 data points for each), because the peak positions of a specific element are usually known beforehand.

We examined three methods for the selection of new sampling data points: (1) sampling the data point with the maximum variance (max. var.) of the predicted spectrum, (2) random sampling, and (3) random sampling weighted with variance (i.e., a data point with large variance has a high probability of being sampled). Hereafter, the third method is called ‘weighted sampling’. Note that random sampling and weighted sampling were run 50 times with different random numbers, and the results were averaged over all runs.
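The three selection rules can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function name `select_next_point` and its arguments are hypothetical, and `variances` is assumed to hold the predictive variance at each energy point that has not yet been measured.

```python
import random


def select_next_point(variances, method, rng=random):
    """Return the index of the next candidate energy point to measure.

    variances: predictive variance at each not-yet-measured energy point
               (assumed non-negative, with at least one positive entry
               when method == "weighted").
    method:    "max_var", "random", or "weighted".
    rng:       the random module or a random.Random instance.
    """
    indices = range(len(variances))
    if method == "max_var":
        # (1) deterministic: take the point with the largest variance
        return max(indices, key=lambda i: variances[i])
    if method == "random":
        # (2) uniform random choice, ignoring the variance
        return rng.randrange(len(variances))
    if method == "weighted":
        # (3) random choice with probability proportional to the variance
        return rng.choices(indices, weights=variances, k=1)[0]
    raise ValueError(f"unknown sampling method: {method}")
```

Passing a seeded `random.Random` instance as `rng` makes the repeated runs of the random and weighted methods reproducible.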

The magnetic moments were regarded as converged when their variation remained below 0.5% for five successive samplings. If the convergence criterion is not satisfied, a new sampling point is selected and sampled with the methods described above.
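A possible form of this stopping rule is sketched below. It assumes that "variation" means the relative change between successive estimates of a magnetic moment, which the text does not state explicitly; the function name `has_converged` and its parameters are hypothetical.

```python
def has_converged(history, tol=0.005, window=5):
    """Check the stopping rule on a list of successive moment estimates.

    Returns True when each of the last `window` successive relative
    changes is below `tol` (0.005 corresponds to 0.5%).
    Assumption: the change is measured relative to the previous estimate.
    """
    if len(history) < window + 1:
        return False  # not enough estimates to form `window` changes
    recent = history[-(window + 1):]
    for prev, curr in zip(recent, recent[1:]):
        if prev == 0 or abs(curr - prev) / abs(prev) >= tol:
            return False
    return True
```

In an adaptive experiment, the estimate obtained after each new data point would be appended to `history`, and sampling would stop once the function returns True for the quantities of interest.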

XMCD and XAS spectra measured in a conventional experiment

Figure 2 shows the Sm M4,5 XMCD and XAS spectra of SmCo5 obtained in a conventional experiment. These spectra were obtained by counting the X-rays transmitted through the specimen. In total, 216 data points were measured for the energy range of 1060–1130 eV. Both the XMCD and XAS spectra were similar to those of the trivalent Sm ion reported previously.24,25 To evaluate the magnetic moments by the magneto–optical sum rules, the integral values of the XMCD spectrum, p and q, and the integral value of the XAS spectrum, r, need to be calculated. The red solid lines in Fig. 2 represent the energy-integration spectra for XMCD and XAS. The p value was taken at 1096 eV to be p = 0.39, and the q and r values were taken at 1130 eV to be q = 2.06 and r = 16.3, respectively. By applying the magneto–optical sum rules, we obtained the orbital magnetic moment mo = 2.27μB, spin magnetic moment ms = −2.34μB, and their ratio mo/ms = −0.97. We assumed the number of 4f holes to be n = 9 for the trivalent Sm ion. Note that the magnetic dipole moment is effectively included in ms. In this study, these values of magnetic moments were used as a reference for the optimisation by Gaussian process modelling.
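The energy-integration spectra and the reading of p, q, and r from them can be sketched as follows. The sketch assumes trapezoidal integration, which is not specified in the text, and it ignores background subtraction and sign conventions; the helper names `cumulative_trapezoid` and `integral_at` are hypothetical.

```python
def cumulative_trapezoid(energies, intensities):
    """Running trapezoidal integral of a spectrum along the energy axis.

    Returns a list of the same length as the input, starting at 0.
    """
    total = 0.0
    out = [0.0]
    for i in range(1, len(energies)):
        width = energies[i] - energies[i - 1]
        total += 0.5 * (intensities[i] + intensities[i - 1]) * width
        out.append(total)
    return out


def integral_at(energies, cumulative, energy):
    """Value of the running integral at the measured energy closest to `energy`."""
    idx = min(range(len(energies)), key=lambda i: abs(energies[i] - energy))
    return cumulative[idx]


# Hypothetical usage with measured lists `energies`, `xmcd`, and `xas`:
#   cum_xmcd = cumulative_trapezoid(energies, xmcd)
#   cum_xas = cumulative_trapezoid(energies, xas)
#   p = integral_at(energies, cum_xmcd, 1096.0)
#   q = integral_at(energies, cum_xmcd, 1130.0)
#   r = integral_at(energies, cum_xas, 1130.0)
```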

Fig. 2

XMCD and XAS spectra measured in a conventional experiment. Sm M4,5 a XMCD and b XAS spectra of SmCo5. The red solid lines represent the energy-integration spectra for XMCD and XAS. The p, q, and r values are defined

XMCD spectra predicted by the Gaussian process model

Figure 3a–f shows the typical XMCD spectra predicted by the Gaussian process model. The initial 30 data points were equally separated and artificially extracted from the experimental XMCD spectrum shown in Fig. 2a. The predicted spectra (blue solid curves) for different numbers of total energy points are shown for comparison. Variances in the predicted spectra (red solid curves) became large between the observed data points, as clearly shown in Fig. 3a. The data point with the maximum variance of the predicted spectrum was adaptively sampled. By increasing the observed data points, the total variance of the predicted spectra became smaller, and the spectral shape of the predicted spectrum became similar to that of the experimental spectrum.

Fig. 3

a–f XMCD spectra predicted by the Gaussian process model with the equally separated initial data points (black filled circles in a) and maximum variance point sampling. Sm M4,5 XMCD spectra for a 30, b 35, c 40, d 50, e 60, and f 70 total data points. The black dashed and blue solid curves represent the true (experimental) spectrum and predicted spectrum by the GP model, respectively. The open circles represent observed data points. The red solid curves indicate the variance with the 95% confidence interval (±2σ) of the predicted spectrum. g–k Results for the adaptive design of the XMCD spectroscopy experiment where the initial data points had equal separation. g Orbital magnetic moment mo, h spin magnetic moment ms, and i their ratio mo/ms plotted as functions of the total data points. The red, blue, and green markers represent the methods for data point sampling: maximum variance (max. var.), random, and random sampling weighted by variance, respectively. The black solid and dashed lines represent the reference value and 5% deviations, respectively. j Total number of data points required for mo, ms, and mo/ms to converge with different sampling methods. k Deviations of mo, ms, and mo/ms from their values in the conventional experiment

Figure 4a–f shows another example of the XMCD spectra predicted by the Gaussian process model. The initial 30 data points were intensively sampled around the main peaks. In Fig. 4a, the predicted spectrum approximates the experimental spectrum very well in the peak regions. However, the predicted spectrum deviates largely from the experimental spectrum in the non-peak regions, where the variance is very large. The total variance of the predicted spectra became smaller as the number of data points increased; however, the deviation from the experimental spectrum in the non-peak regions remained large even for 50 data points (Fig. 4d) compared with the case of initial data points with equal separation (Fig. 3d).

Fig. 4

a–f XMCD spectra predicted by the Gaussian process model with the initial data points intensively sampled around peaks (black filled circles in a) and maximum variance point sampling. Sm M4,5 XMCD spectra for a 30, b 35, c 40, d 50, e 60, and f 70 total data points. The black dashed and blue solid curves represent the true (experimental) spectrum and predicted spectrum by the GP model, respectively. The open circles represent observed data points. The red solid curves indicate the variance with the 95% confidence interval (±2σ) of the predicted spectrum. g–k Results for the adaptive design of the XMCD spectroscopy experiment with the initial data points intensively sampled around peaks. g Orbital magnetic moment mo, h spin magnetic moment ms, and i their ratio mo/ms plotted as functions of the total data points. The red, blue, and green markers represent the methods for data point sampling: maximum variance (max. var.), random, and random sampling weighted by variance, respectively. The black solid and dashed lines represent the reference value and 5% deviations, respectively. j Total number of data points required for mo, ms, and mo/ms to converge with different sampling methods. k Deviations of mo, ms, and mo/ms from their values in the conventional experiment

Results for the adaptive design of the XMCD spectroscopy experiment

Figures 3g–k and 4g–k show the results for the adaptive design of the XMCD spectroscopy experiment. Figure 3g–i shows the results for the initial data points with equal separation. The orbital magnetic moment, spin magnetic moment, and their ratio from the predicted spectrum are plotted as functions of the total data points. True values for the magnetic moments and the ±5% errors are indicated by black solid and dashed lines, respectively. As shown in Fig. 3g, the orbital magnetic moment converged to the true value at around 40 total data points with maximum variance sampling. Random sampling showed poor convergence to the true value even with 100 total data points. Weighted sampling behaved halfway between maximum variance sampling and random sampling and showed good convergence to the true value. As shown in Fig. 3h, the spin magnetic moment almost fell within ±5% of the true value with the initial 30 data points. It showed moderate convergence to the true value as the number of data points was increased. As shown in Fig. 3i, the ratio between the orbital and spin magnetic moments had the same tendency as the orbital magnetic moment for different sampling methods. Maximum variance sampling rapidly converged to the true value around 40 data points. Figure 3j shows the total number of data points for the various sampling methods to satisfy the convergence criterion. All sampling methods satisfied the convergence criterion at about 50 points. Random sampling seemed to converge with the minimum number of data points for mo and mo/ms. However, the deviation of the converged value of the magnetic moment from the true value was very large compared to those of maximum variance sampling and weighted sampling, as shown in Fig. 3k.

Figure 4g–i shows the results for the initial data points with intensive sampling around peaks. As shown in Fig. 4g, the orbital magnetic moment was within ±5% of the true value with the initial 30 data points. However, the value was overestimated as the number of data points was increased with all sampling methods. The deviation from the true value was largest for maximum variance sampling at around 40–50 data points. As shown in Fig. 4h, the spin magnetic moment greatly deviated from the true value with the initial 30 data points. The value converged to the true value as the number of data points was increased. As shown in Fig. 4i, the orbital to spin magnetic moment ratio also greatly deviated from the true value with the initial 30 data points and converged as the number of data points was increased. As shown in Fig. 4j, the total number of data points required for convergence was larger than that for the initial data points with equal separation.

This comparison of the results revealed that Gaussian process modelling works well for initial data points with equal separation. Sampling data points with the maximum variance of a predicted spectrum results in the convergence of magnetic moments with the minimum number of total data points and good accuracy.

Validation of the present method

Gaussian process modelling was found to work well at approximating Sm M4,5 XMCD and XAS spectra with complex spectral shapes. To validate the method, we applied it to the Fe and Co L2,3 XMCD and XAS spectra of an FeCo alloy. These spectra were measured by the total electron yield method. Both the measurement method and the electron transition are completely different from those of the Sm M4,5 spectra above. The results are shown in Fig. 5. The initial 10 data points were equally separated within the energy range of the Fe and Co L2,3 absorption edges, respectively. The overall trends for the magnetic moments versus the total number of data points were the same as those of Sm M4,5 XMCD. For Fe, maximum variance sampling converged to the magnetic moments with the minimum number of total data points and the minimum deviation compared to the other sampling methods. For Co, the deviation from the true values was smaller for weighted sampling than for maximum variance sampling. This is attributed to the averaging over 50 trials. Thus, the present method also works well for the Fe and Co L2,3 XMCD spectra.

Fig. 5

Results for the adaptive design of an XMCD spectroscopy experiment applied to a–e the Fe L2,3 and f–j the Co L2,3 XMCD spectra where the initial data points had equal separation. a,f Orbital magnetic moment mo, b,g spin magnetic moment ms, and c,h their ratio mo/ms plotted as functions of the total number of data points. The red, blue, and green markers represent the method for data point sampling: maximum variance (max. var.), random, and random sampling weighted by variance, respectively. The black solid and dashed lines represent the reference value and 5% deviations, respectively. d,i Total data points for the values of mo, ms, and mo/ms to converge with different sampling methods. e,j Deviations of mo, ms, and mo/ms from their values in the conventional experiment. Inset in b shows the Fe L2,3 XMCD spectrum with initial data points (black filled circles). The black dashed and blue solid curves represent the true (experimental) spectrum and predicted spectrum by the GP model, respectively. The red solid curves indicate the variance with the 95% confidence interval (±2σ) of the predicted spectrum

As described above, the present method is valid for XMCD spectra measured at completely different absorption edges and with different compounds, instruments, and measurement methods. Generally, we do not know the spectral shapes of the X-ray absorption and XMCD spectra of unknown materials; nevertheless, the method is useful for predicting the magnetic moments with the required accuracy. Note that the present method is not intended to predict the detailed spectral shapes of the X-ray absorption and XMCD spectra of unknown materials, but to predict magnetic moments with the required accuracy in a reduced measurement time. In other words, the evaluation of magnetic moments by the magneto–optical sum rules is robust against the fine structures of the spectra.

Selection of the correlation function

The choice of the correlation function is an essential issue in Gaussian process modelling. We examined exponential and Matérn correlation functions with various parameters as well as the Gaussian correlation function, for which the results have already been shown above. We applied Gaussian process modelling with various correlation functions to the Sm M4,5 XMCD and XAS spectra. We used the best methods for initial sampling (i.e., equally separated data points) and adaptive sampling (i.e., sampling data points with maximum variance) for the exponential and Matérn correlation functions. As shown in Fig. 6, the mo, ms, and mo/ms values converge to the true values as the total number of data points increases. The Gaussian, exponential, and Matérn (k = 0.5) correlation functions show a similar tendency, in particular for mo. However, convergence to the true value seems faster for the Gaussian correlation function than for the other correlation functions for ms and mo/ms. On the other hand, the Matérn correlation functions with k = 1, 1.5, and 2 deviate from the true values for small numbers of data points and then converge abruptly. Therefore, we concluded that the Gaussian correlation function is the best for the present method.

Fig. 6

Results for the adaptive design of an XMCD spectroscopy experiment for Sm M4,5 XMCD spectrum by the Gaussian process modelling with various correlation functions: Gaussian (same as in Fig. 3), exponential, and Matérn. a Orbital magnetic moment mo, b spin magnetic moment ms, and c their ratio mo/ms plotted as functions of the total number of data points. Initial data points are equally separated and adaptive sampling of the data point with the maximum variance was applied. Parameters are set to p = 1 for the exponential correlation function and k = 0.5,1,1.5,2 for the Matérn correlation function

Discussion

Generally, peaks are considered to be more important than non-peak regions of a spectrum. Modern XMCD experimental instruments allow intensive sampling of specific energy ranges; i.e., higher density around the peaks and lower density in non-peak regions. Considering this situation, we examined the case in which the initial data points are intensively sampled at the peaks. The convergence of the magnetic moments was worse than that for equally separated initial data points, as shown in Fig. 4. This is because of the large variance in the non-peak regions of the predicted spectrum. Therefore, for the present method, which predicts the spectra by Gaussian process modelling, equally separated sampling is better than intensive sampling around the peaks, contrary to our intuition.

Gaussian process modelling has several favourable properties. First, under mild regularity conditions, the Gaussian process model asymptotically obtains an optimal functional relationship that minimises the gap between the true response y(x) and prediction \(\hat y(x)\). Second, when there are a finite number of observations as in real situations, the average of the squared error (i.e., generalisation error) between the true response and prediction by the Gaussian process model is known to show a roughly linear decrease for small n. As the number of observations is increased, the rate of decay of the generalisation error becomes slower than 1/n.26,27 This was also observed in our experimental results (not shown).

Gaussian process modelling offers accurate functional form estimation with point-wise confidence values. However, inference based on the Gaussian process requires storing and inverting the Gram matrix, and the inversion typically scales as \(O(n^3)\). For large problems, storing and inverting such a large matrix is prohibitive, and a large number of approximation methods, such as the Nyström method,28 have been developed to deal with this computational problem. In our problem, though, the number of energy points to be evaluated is 216 at most; hence, all of the computation can be done without resorting to approximation methods. If we use our proposed method for larger problems with many energy points, conventional approximation methods for the Gaussian process can be readily combined with our method to reduce the computational and storage costs.

In conclusion, we demonstrated the adaptive design of an XMCD spectroscopy experiment with Gaussian process modelling. The Gaussian process was found to successfully predict the nonlinear spectral shapes of the XMCD spectrum. Magnetic moments can be evaluated from the predicted spectra with the required level of accuracy. The present method reduces the total number of data points for measurement as well as the time and cost of an XMCD spectroscopy experiment. This method is potentially applicable to various kinds of spectroscopy. It could drastically reduce the measurement time for point-by-point measurements, such as scanning transmission X-ray microscopy with scanning of energy points around absorption edges.

Methods

XAS and XMCD experiments

Sm M4,5 XMCD and XAS spectra of SmCo5 were obtained by using a scanning transmission X-ray microscope (STXM)29 at the BL-13A of the Photon Factory, Institute of Materials Structure Science, High Energy Accelerator Research Organization, Japan. A SmCo5 specimen for the STXM experiment was prepared from a thermally demagnetised bulk material by using a micro-fabrication technique. In the STXM experiment at the Photon Factory, XMCD spectra are obtained as a difference of two X-ray absorption spectra for right-handed and left-handed elliptically polarised X-rays that are measured over the entire spectrum for fixed polarisation. Details of the STXM experiment are described in the literature.30 Fe and Co L2,3 XMCD and XAS spectra of FeCo alloy were measured at BL-14 of Hiroshima Synchrotron Radiation Center (HSRC), Hiroshima University, Japan.31 Spectra were obtained with the total electron yield method by measuring the sample drain current. In the XMCD experiment at HSRC, polarisation of the incident X-ray was fixed and the relative direction of the external magnetic field was switched parallel and antiparallel to the X-ray polarisation at each energy point. Details of the XMCD experiment at HSRC are described in the literature.32

Magneto–optical sum rules

The orbital magnetic moment (mo) and spin magnetic moment (ms) were calculated by applying the magneto–optical sum rules to the XMCD and XAS spectra. Experimentally obtained XMCD and XAS spectra were integrated along the energy axis to evaluate p, q, and r. p and q were obtained from the XMCD spectrum, and r was obtained from the XAS spectrum. The magneto–optical sum rules relate these values to the magnetic moments. For the 3d–4f transition (M4,5 edges) of rare earth elements such as Sm, the orbital sum rule is given as follows 21,22,24:

$$m_{\mathrm{o}} = \frac{{2q}}{r}n\mu _{\mathrm{B}},$$
(1)

where mo is the orbital magnetic moment and μB and n are the Bohr magneton and number of holes in the 4f orbital, respectively. The spin sum rule for the 3d–4f transition is given as follows:

$$m_{\mathrm{s}} = \left( {\frac{{5p - 3q}}{r}n + 6\left\langle {T_{\mathrm{z}}} \right\rangle } \right)\mu _{\mathrm{B}},$$
(2)

where ms is the spin magnetic moment and \(\left\langle {T_{\mathrm{z}}} \right\rangle\) is the expectation value of the magnetic dipole moment. For the magneto–optical sum rules for the 2p–3d transition (L2,3 edges), see ref. 23.
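As a worked check of Eqs. (1) and (2), the sketch below evaluates the moments from the values reported for the conventional Sm M4,5 experiment (p = 0.39, q = 2.06, r = 16.3, n = 9). The function name `sum_rule_moments_m45` is hypothetical, and the magnetic dipole term is set to zero on the assumption that it is effectively included in ms, as stated for the reference values.

```python
def sum_rule_moments_m45(p, q, r, n_holes, tz=0.0):
    """Orbital and spin magnetic moments (in Bohr magnetons) for M4,5 edges.

    p, q:    integrals of the XMCD spectrum
    r:       integral of the XAS spectrum
    n_holes: number of holes in the 4f orbital
    tz:      expectation value of the magnetic dipole term
             (0.0 here: assumed to be absorbed into the spin moment)
    """
    m_orb = 2.0 * q / r * n_holes                           # Eq. (1)
    m_spin = (5.0 * p - 3.0 * q) / r * n_holes + 6.0 * tz   # Eq. (2)
    return m_orb, m_spin


m_o, m_s = sum_rule_moments_m45(p=0.39, q=2.06, r=16.3, n_holes=9)
print(f"m_o = {m_o:.2f}, m_s = {m_s:.2f}, m_o/m_s = {m_o / m_s:.2f}")
```

With these inputs the sketch gives mo ≈ 2.27μB, ms ≈ −2.34μB, and mo/ms ≈ −0.97, in agreement with the reference values quoted in the Results.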

Gaussian process for approximating continuous spectrum

We explain the Gaussian process modelling and how the relevant parameters are estimated below by using the observed data points. For details on Gaussian process modelling, see ref. 16. Our implementation of the proposed method is based on the R33 package GPfit.34 Let the i-th energy and the corresponding spectral output be denoted by \(x_i\) and \(y_i = y(x_i)\), respectively. The observed data points are denoted by \(X = (x_1, \ldots ,x_n)^{\rm T} \in {\Bbb R}^n\), and the corresponding responses are denoted by \(Y = y(X) = (y_1, \ldots ,y_n)^{\rm T}\). The relation between the energy and output (e.g., XAS or XMCD spectrum) is modelled as

$$y(x_i) = \mu + z(x_i),\quad i = 1, \ldots ,n,$$
(3)

where μ is the overall mean and \(z(x_i)\) is a Gaussian process with \({\Bbb E}[z(x_i)] = 0,{\mathrm{Var}}(z(x_i)) = \sigma ^2\), and \({\mathrm{Cov}}(z(x_i),z(x_j)) = \sigma ^2R_{ij}\). In the Gaussian process model, y(X) is assumed to have a multivariate normal distribution \({\it{{\cal N}}}({\mathbf{1}}_n\mu ,\sigma ^2R)\), where \({\mathbf{1}}_n\) is an n × 1 vector of all ones, and R is the correlation matrix with elements \(R_{ij}\). There are several choices for the correlation structure; we used the most popular one, the Gaussian correlation function, defined by

$$R_{ij} = \exp\left( { - \theta |x_i - x_j|^2} \right),\quad \forall i,j,$$
(4)

where \(\theta \in [0,\infty )\) is a hyper-parameter to be tuned. The maximum likelihood estimates of the mean and variance parameters are functions of the hyper-parameter θ and are obtained by using the observed data points X and responses Y as follows:

$$\hat \mu (\theta ) = ({\mathbf{1}}_n^{\rm T}R^{ - 1}{\mathbf{1}}_n)^{ - 1}({\mathbf{1}}_n^{\rm T}R^{ - 1}Y),$$
(5)
$$\hat \sigma ^2(\theta ) = \frac{{(Y - {\mathbf{1}}_n\hat \mu (\theta ))^{\rm T}R^{ - 1}(Y - {\mathbf{1}}_n\hat \mu (\theta ))}}{n}.$$
(6)

These estimates are plugged into the log-likelihood function of the Gaussian process model to estimate the hyper-parameter θ:

$$\log(L_\theta ) \propto - \log(\det R) - n\log[(Y - {\mathbf{1}}_n\hat \mu (\theta ))^{\rm T}R^{ - 1}(Y - {\mathbf{1}}_n\hat \mu (\theta ))]$$
(7)

By using \(\hat \mu ,\hat \sigma\), and R (calculated using the optimised parameter \(\theta ^ \ast = {\mathrm{argmax}}_{\theta \in [0,\infty )}\log(L_\theta )\)), the best predictor \(\hat y\) for a newly observed point \(x^ \ast\) is obtained as follows:

$$\hat y(x^ \ast ) = \hat \mu + r^{\rm T}R^{ - 1}(Y - {\mathbf{1}}_n\hat \mu )$$
(8)

with the mean squared error

$$s^2(x^ \ast ) = {\Bbb E}[(\hat y(x^ \ast ) - y(x^ \ast ))^2] = \sigma ^2\left( {1 - r^{\rm T}R^{ - 1}r + \frac{{(1 - {\mathbf{1}}_n^{\rm T}R^{ - 1}r)^2}}{{{\mathbf{1}}_n^{\rm T}R^{ - 1}{\mathbf{1}}_n}}} \right),$$
(9)

where \(r = (r_1(x^ \ast ), \ldots ,r_n(x^ \ast ))^{\rm T}\) and \(r_i(x^ \ast ) = {\mathrm{Cor}}(z(x^ \ast ),z(x_i))\). The mean squared error \(s^2\) is used as the criterion for selecting the next energy to be examined.
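The predictor and its mean squared error can be illustrated with the following dependency-free sketch of Eqs. (4), (5), (6), (8), and (9) for a fixed hyper-parameter θ. It is not the GPfit implementation used in this study: the function names are hypothetical, the linear systems are solved by plain Gaussian elimination, and the small `nugget` added to the diagonal of R is an assumption made here only for numerical stability.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gp_predict(X, Y, x_star, theta, nugget=1e-10):
    """Return (y_hat, s2) at x_star for a fixed hyper-parameter theta."""
    n = len(X)
    # Eq. (4): Gaussian correlation matrix (nugget is an added assumption)
    R = [[math.exp(-theta * (xi - xj) ** 2) + (nugget if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    ones = [1.0] * n
    Rinv_1 = solve(R, ones)
    mu = dot(ones, solve(R, Y)) / dot(ones, Rinv_1)       # Eq. (5)
    resid = [y - mu for y in Y]
    Rinv_resid = solve(R, resid)
    sigma2 = dot(resid, Rinv_resid) / n                   # Eq. (6)
    r = [math.exp(-theta * (x_star - xi) ** 2) for xi in X]
    Rinv_r = solve(R, r)
    y_hat = mu + dot(r, Rinv_resid)                       # Eq. (8)
    s2 = sigma2 * (1.0 - dot(r, Rinv_r)
                   + (1.0 - dot(ones, Rinv_r)) ** 2 / dot(ones, Rinv_1))  # Eq. (9)
    return y_hat, max(s2, 0.0)  # clip tiny negative round-off
```

Evaluating `gp_predict` on a grid of candidate energies and taking the energy with the largest `s2` corresponds to maximum variance sampling; at an already observed energy the sketch returns the observed value with a mean squared error close to zero.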

To optimise the hyper-parameter θ, we used multi-start gradient ascent to maximise the likelihood. In our implementation, we randomly selected 200 starting points for the hyper-parameter and ran the quasi-Newton method (L-BFGS-B algorithm) from each of them. Then, we chose the result with the maximum likelihood value.
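The multi-start idea can be sketched as follows. To keep the example dependency-free, the local optimiser here is a simple derivative-free hill climb with step halving, which stands in for the L-BFGS-B algorithm and is not the optimiser used in this study. The objective `f` would be the log-likelihood as a function of θ; the function names, the bounds, and the toy objective in the usage note are all hypothetical.

```python
import math
import random


def local_search(f, x0, lower, upper, step=1.0, tol=1e-8):
    """Bounded 1-D hill climb with step halving (stand-in for L-BFGS-B)."""
    x, fx = x0, f(x0)
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            cand = min(max(cand, lower), upper)  # keep within the bounds
            fc = f(cand)
            if fc > fx:
                x, fx, moved = cand, fc, True
        if not moved:
            step *= 0.5  # no improvement at this scale: refine the step
    return x, fx


def multi_start_maximise(f, lower, upper, n_starts=200, seed=0):
    """Run the local search from random starts and keep the best result."""
    rng = random.Random(seed)
    best_x, best_f = None, -math.inf
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)
        x, fx = local_search(f, x0, lower, upper)
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

For example, `multi_start_maximise(lambda t: max(-(t - 2.0) ** 2, 1.0 - (t - 8.0) ** 2), 0.0, 10.0)` finds the higher of the two peaks of this toy objective, whereas a single local search started near t = 2 would stop at the lower one. This is the reason for using many starting points when the likelihood has several local maxima.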

In order to investigate the effect of the choice of the correlation function for Gaussian process modelling, we tried other correlation functions as shown below. One is an exponential correlation function

$$R_{ij} = \exp\left( { - \theta |x_i - x_j|^p} \right),\quad \forall i,j,$$
(10)

with p = 1. Note that the exponential correlation function with p = 2 is equivalent to the Gaussian correlation function. Another is a Matérn correlation function
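The relation between Eqs. (4) and (10) can be checked numerically with the short sketch below; the function names and the values of θ and the energies are illustrative assumptions.

```python
import math


def power_exponential_corr(xi, xj, theta, p):
    """Eq. (10): exponential correlation with power p."""
    return math.exp(-theta * abs(xi - xj) ** p)


def gaussian_corr(xi, xj, theta):
    """Eq. (4): Gaussian correlation."""
    return math.exp(-theta * (xi - xj) ** 2)


theta = 1.0
xi, xj = 1080.0, 1080.5  # two energies 0.5 eV apart (illustrative values)

same = math.isclose(power_exponential_corr(xi, xj, theta, p=2),
                    gaussian_corr(xi, xj, theta))
c_exp = power_exponential_corr(xi, xj, theta, p=1)    # exp(-0.5)
c_gauss = gaussian_corr(xi, xj, theta)                # exp(-0.25)
```

For separations shorter than 1 in the units of the energy axis and the same θ, the p = 1 correlation is smaller than the Gaussian one, so the exponential function treats neighbouring energy points as less strongly correlated.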

$$R_{ij} = \frac{1}{{\Gamma (\nu )2^{\nu - 1}}}\left( {2\sqrt \nu |x_i - x_j|\theta } \right)^\nu \kappa _\nu \left( {2\sqrt \nu |x_i - x_j|\theta } \right),\quad \forall i,j,$$
(11)

where ν = (2k + 1)/2 with k = 0.5, 1, 1.5, 2 and \(\kappa _\nu\) is the modified Bessel function of the second kind of order ν.

The typical computational time for Gaussian process modelling of an XMCD spectrum in the present study was 0.03 to 0.2 s on a laptop with a 3.3 GHz Intel Core i7 CPU.

Data availability

The data and codes that support the findings of this study are available from the corresponding author upon request.