Introduction

Over the last few years the use of machine learning has opened up new vistas in many areas of physics, including plasma physics1, condensed-matter physics2,3, quantum physics4,5,6,7, thermodynamics8, quantum chemistry9, particle physics10 and many others. Recent examples include applications for magnetic confinement fusion11,12,13, inertial confinement fusion14,15,16, discovery of phase transitions17,18,19, closure of turbulence models20, representation of quantum states21,22, galaxy classification23 and orbital stability24. One of the origins of this progress is the possibility of processing large sets of data and drawing conclusions based on features that admit no straightforward description and assessment with human languages. In this way, some natural human limitations can be overcome, making machine learning a useful tool that works in a fruitful synergy with traditional approaches in theoretical and experimental physics.

One area where machine learning is being successfully employed is the problem of model calibration25,26. The problem concerns finding appropriate parameters of a model based on incomplete and, potentially, inaccurate knowledge about the behaviour of the modelled system in a set of particular cases. Statistical methods, such as maximum likelihood estimation, Bayes estimation, Kalman filtering and others26,27,28,29,30, are successfully applied in many areas, including financial market analysis31,32, hydrology33,34, urban studies35 and climatology36,37. However, the use of machine learning appears to be a promising alternative, which can provide some new opportunities38,39,40,41.

In this paper we consider the opportunities of using machine learning for solving long-standing problems in laser-plasma physics. We discuss the possibility of using autonomous recognition of difficult-to-qualify features in the data of real or numerical experiments for validating and advancing phenomenological models as well as for reconstructing experimental conditions. Using a phenomenological model for laser-plasma high-harmonic generation, we train an artificial neural network (NN) to reconstruct various parameters based on the recognition of unspecified features in the harmonic spectra. The NN then “identifies” the learned features in the spectra obtained with ab-initio simulations, which we use to mimic real experiments. In this way we can reconstruct the parameters of experiments or determine the most appropriate values for free parameters of incomplete theories. This can also be used to determine the validity regions of different models. Importantly, this approach can be applied in the case of inaccurate or intrinsically incomplete knowledge about the experimental conditions, i.e. in cases when performing a particular ab-initio simulation is not possible. In this way the approach can provide new routes for experiments and new insights for theory and model development.

For the sake of completeness, we start from the discussion of basic ideas, using the Galton Board42 as an illustrative example. We then provide a proof-of-principle demonstration of the use of this methodology for the outlined problem in the field of laser-plasma interactions.

Methods

Typically, the validation of a theory is reduced to the experimental observation of some clearly describable feature, such as an observable physical value, its certain dependency on some parameters, a peak in some distribution, etc. These kinds of features are conventionally used to claim agreement between experimental and theoretical results.

One natural limitation of this conventional approach is that in the presentation of results we appeal to consistency in terms of such clearly describable features and consequently do this mostly for features in one-, two- or three-dimensional sets of data. The essential part of many studies is finding ways of reducing and transforming raw data sets into forms that expose such describable, indicative features. Of course, numerous techniques and approaches have been developed, such as statistical and spectral analysis as well as various algebraic transformations. However, this toolbox is inevitably limited and in many cases the solution of the outlined problem requires insightful analysis and development of the theory and experiment with, sometimes, manual search for such a feature in large sets of data.

Another common consequence of developing a sophisticated comparison methodology is losing lucidity in the relation between the compared features and the origins of the theory. In some cases, it is difficult to say whether the observed feature unambiguously indicates the correctness of the theory or whether it is peculiar to a family of theories that are thus not disqualified by the experiment. For example, the selected feature can be a generic consequence of some conservation laws rather than of the main principle or assumption that has to be validated. In other words, it can be difficult to quantify in what sense and to what degree the theory is validated and what the alternative theories that are disqualified by the experiment are.

One more limitation of such conventional methodologies is the fact that theories can only be benchmarked against the data of experiments with sufficiently complete knowledge about the initial conditions and all important parameters. In some cases, this information is intrinsically incomplete, which hampers the use of theories and ab-initio simulations.

In this paper we discuss and apply a methodology that overcomes the outlined limitations with the help of machine learning applied to the recognition of hardly describable features in the outputs of ab-initio simulations or experimental data, even in the case of essentially incomplete knowledge about the experimental conditions. We provide a proof-of-principle demonstration of several essential capabilities of this approach:

  1. Comparison, validation or disqualification of theories in a lucid and quantitative way (as a function of position in parameter space);

  2. Completing theories through indirect measurement of the dependencies of free parameters, even in parameter regions where they are not well defined in terms of first principles;

  3. Indirect measurement for the determination of unknown experimental parameters;

  4. Identifying regions in parameter space where certain ranges of experimental data carry unambiguous information about experimental conditions.

Note that these approaches are not intended to replace traditional methodologies but to supplement them with methods of gaining knowledge that can either be exploited heuristically or used to conceive hypotheses and ideas for further theoretical and experimental studies.

Although many of these ideas might look trivial and well known, we start from a general discussion of the outlined approaches. To support the discussion we use the well-known Galton Board42 experiment as a simple example of an experiment and a theory. After that, we demonstrate how the developed methods can be used for resolving long-standing questions in the physics of high-intensity laser-plasma interactions.

Validation of a theory

Suppose we need to validate a theory \({\mathbb{A}}\) using an experiment \({\mathbb{E}}\). To formulate the problem in a more exact way we assume that we intend to compare theory \({\mathbb{A}}\) with some alternative theory \({\mathbb{B}}\) (or a family of alternative theories), which we intend to disqualify using experimental data. In our notation, both theories \({\mathbb{A}}\) and \({\mathbb{B}}\), as well as the experiment \({\mathbb{E}}\), are denoted as some possibly non-linear and non-deterministic operators that act on a vector of initial conditions c and yield a vector of measurable quantities r. These vectors can represent a set of data of arbitrary composition and dimensionality. Suppose we carry out a sequence of experiments; then we can write:

$${{\bf{r}}}_{i}^{A}={\mathbb{A}}{{\bf{c}}}_{i},$$
(1)
$${{\bf{r}}}_{i}^{B}={\mathbb{B}}{{\bf{c}}}_{i},$$
(2)
$${{\bf{r}}}_{i}^{E}={\mathbb{E}}{{\bf{c}}}_{i},$$
(3)

where the index i enumerates the experiments. We admit that due to experimental imperfections the values \({{\bf{r}}}_{i}^{E}\) are subject to some unknown systematic or non-systematic distortions that hamper a direct comparison of \({{\bf{r}}}_{i}^{E}\) with \({{\bf{r}}}_{i}^{A}\) and \({{\bf{r}}}_{i}^{B}\). However, we assume that the measurable data r contain some features that can appear in the results of either \({\mathbb{A}}\) or \({\mathbb{B}}\). These features can depend on the conditions c in a complex way, which should also be reproduced by the appropriate theory (at least to some extent). Note that, in general, both theories can be applicable in certain regions of the parameter space of vectors c, and this is something that we intend to determine.

To solve the problem we develop a unification theory \({\mathbb{U}}\) that depends on at least one parameter p and provides a smooth transition between the theories, for example, \({\mathbb{U}}(p=0)={\mathbb{B}}\) and \({\mathbb{U}}(p=1)={\mathbb{A}}\). In the most primitive case, this can be just a linear combination, i.e. \({\mathbb{U}}(p)=p{\mathbb{A}}+(1-p){\mathbb{B}}\). However, it is better to form the smooth transition not between the final results of the theories, but between the essential principles or assumptions that provide the origin for the development of the theories. This is because, even if both theories fulfill basic conservation laws, their linear combination might not (for \(0 < p < 1\)). To avoid this we can extend the dimensionality of the parameter p, so that in this space there exists some route between the points corresponding to \({\mathbb{A}}\) and \({\mathbb{B}}\) such that at each point of this route all the conservation laws are fulfilled. With the help of simple examples we will see below why this is important.

We can now apply the unified theory for various possible initial conditions and generate a sufficiently large set of pairs ck and \({{\bf{r}}}_{k}^{U}={\mathbb{U}}{{\bf{c}}}_{k}\) for various random values of p and conditions c (k is the index running over the set). Next, we train a feed-forward fully connected NN \({\mathbb{N}}\) to learn how to reconstruct p and c from rk, i.e.

$$({p}_{k}^{N},{{\bf{c}}}_{k}^{N})={\mathbb{N}}{{\bf{r}}}_{k}^{U}.$$
(4)

According to ref.43, any continuous real-valued function can be approximated with such a NN to any given accuracy. Thus, we choose this type of NN as an approximation of the unknown function that maps r to p and c. We can then apply the trained NN to the experimental data \({{\bf{r}}}_{i}^{E}\). If the values of cE are systematically close to the reconstructed values cN in the whole or some certain region of parameter space, we can interpret this as if the NN “recognizes” some indicative features in this region of parameter space. In this region the reconstructed value p can indicate the validity of one of the theories: a systematic tendency of p toward 1 (0) indicates the validity of theory \({\mathbb{A}}\) (\({\mathbb{B}}\)). This procedure can also show the transition between the applicability regions of the theories explicitly, by plotting p as a function of the parameters c.

It would be reasonable to ask: in what sense is the validity of a theory demonstrated by this procedure? Of course, there exists a large variety of relations between the output value r and the parameter p, and certainly not all of them are necessarily sensible in terms of physics. In other words, the NN can establish a correlation between p and some feature of little importance or of complete irrelevance to the physics of the process. To avoid this, we train the NN to reproduce not only p, but also a sufficiently large set of physically essential values c. This favours establishing correlations with features that depend significantly on the initial conditions and, in this sense, have some physical meaning. Although a more rigorous analysis would be of interest, in this paper we focus on showing proof-of-principle examples that convincingly demonstrate the rationality of this concept. However, we would like to emphasize that one should not overrate the validity of the inference obtained, by using it outside the context of (1) the range of validity in the space of c, (2) the view of the output rU, where we search for the indicative features, and (3) the set c used as the reference for the physical sense. We return to this point in the Discussion section.

Another potential difficulty is related to the possibility for the measured output to simultaneously contain two unique types of features, each described by one of the theories. In this case, the NN can potentially select the one that describes more apparent features, i.e. the more descriptive theory. One way to deal with such a difficulty is to consider a narrower or more restricted output, so that only one type of feature is included.

We can outline one important peculiarity of the method: the procedure does not require complete knowledge about the experimental parameters ci. We can use only the known components of the vector c to see whether the NN “recognizes” the features of importance or not.

Before we move to an illustrative example, we would like to outline a limitation that is crucial for the applicability of this procedure. Since we assign the NN to reconstruct the inputs p and c based on the measurable output rU, the procedure assumes that this inverse problem admits a solution. If the inverse problem is ill-posed, the NN will not converge to a reasonable reconstruction. Thus, this will not lead us to a wrong conclusion, but it might hamper the applicability of the method. In practice, this might be a matter of using a sufficiently informative output that uniquely identifies the input. However, the inverse problem can turn out to be ill-posed for two other reasons, which are worth mentioning. First, the physics of the studied process can have some symmetry that makes two or more sets of values for the input parameters indistinguishable in reality. Second, the physics can be self-similar, which means that the physics remains the same if two or more input parameters are changed simultaneously, in a specific way that keeps a certain single parameter constant. One can say that the process depends on this similarity parameter rather than individually on the parameters determining it. These parameters can then be replaced with the similarity parameter to make the inverse problem well-posed. This shows that some understanding of the properties of the problem is necessary for making use of the described procedure. The NN not converging to a reasonable solution of the inverse problem could, in fact, point to the presence of such a symmetry or self-similarity. In the further discussion we will show how both problems can be handled in practice.

As a proof-of-principle example we consider the Galton Board (GB), which is also known as a bean machine. We highlight that the Galton Board is chosen as a clear illustrative example, while one can certainly apply standard statistical methods for this problem. The reasons and some advantages of using machine learning will be discussed later in a separate section.

We use the index j to denote the final position of a bead that passes n horizontal rows of pegs. Bouncing off each peg gives equal probability of passing it on either of its two sides. The probability of reaching the j-th position is then given by

$${r}_{j}^{A}=\binom{n}{j}\,{0.5}^{j}\,{0.5}^{n-j}\approx \sqrt{\frac{2}{\pi n}}\,\exp \left(-\frac{2}{n}{\left(j-\frac{n}{2}\right)}^{2}\right),$$
(5)

where the approximation is the de Moivre-Laplace theorem applied under the assumption \(n\gg 1\) (the binomial distribution has mean n/2 and variance n/4). We will use this Gaussian distribution of limited applicability as theory \({\mathbb{A}}\), which we intend to validate using experiments. As the experiment for this problem we use a numerical implementation of the Monte-Carlo method with a limited number of beads. The theory \({\mathbb{A}}\) will be validated relative to an alternative theory \({\mathbb{B}}\), which suggests that the distribution is super-Gaussian:

$${r}_{j}^{B}\sim \exp \,(-{p}_{2}{(j-\frac{n}{2})}^{4}),$$
(6)

We use this form as a particular example because it is difficult to describe, and appeal to, the difference between the Gaussian and super-Gaussian distributions without elaborating and applying additional data processing. We here deliberately do not apply any transformation, to show the capability of the used approach. Note also that theory \({\mathbb{B}}\) is formulated incompletely: the missing factor in front of the exponent can be determined using the normalization condition. We here intentionally do not compute this factor, to demonstrate how one can deal with incomplete theories.

We now define a unified theory \({\mathbb{U}}\):

$${r}_{j}^{U}={p}_{0}\sqrt{\frac{8}{\pi n}}\,\exp \left(-{p}_{1}{\left(j-\frac{n}{2}\right)}^{2}-0.1\,{p}_{2}{\left(j-\frac{n}{2}\right)}^{4}\right).$$
(7)

We see that theory \({\mathbb{A}}\) corresponds to \({p}_{1}\ne 0\), \({p}_{2}=0\), while theory \({\mathbb{B}}\) is characterized by \({p}_{1}=0\), \({p}_{2}\ne 0\). (More precisely, theory \({\mathbb{A}}\) corresponds to \({\bf{p}}=(0.5,2/n,0)\), but this is not important for now.) Note that there exists a value of p0 that provides probability normalization. In other words, within the family of theories \({\mathbb{U}}\) there exist theories of type \({\mathbb{B}}\) that fulfill the probability normalization. As we will see, we can disqualify \({\mathbb{B}}\) even without knowing this value of p0.

We now randomly generate values of p0, p1 and p2 from 0 to 1, generate the result rj according to the unified theory \({\mathbb{U}}\) and train the NN to reconstruct the generated vector \({\bf{p}}=({p}_{0},{p}_{1},{p}_{2})\). In our implementation we used \(n=16\) and the result was sampled through 16 values rj. For the proof-of-principle demonstration we used a rather small feed-forward fully connected NN that contained four layers with the neuron numbers 16, 16, 16 and 3, respectively. The three output neurons were associated with the components of the vector p. We used a logistic sigmoid activation, a squared-error measure and stochastic gradient descent, which resulted in an accuracy of the determination of p of the order of \({10}^{-3}\).
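To make the workflow concrete, the following sketch illustrates the training stage. It generates training pairs from Eq. (7), optionally applies the multiplicative noise introduced later in this section, and trains a small fully connected network with one possible reading of the 16-16-16-3 topology. The sketch assumes PyTorch; the function names, batch size, learning rate and number of steps are illustrative choices of ours rather than the original implementation.

```python
# A minimal sketch (not the original implementation) of the training stage,
# assuming PyTorch; hyperparameters and names are illustrative.
import numpy as np
import torch
import torch.nn as nn

n = 16
j = np.arange(n)

def unified_theory(p0, p1, p2):
    """r_j^U from Eq. (7)."""
    d = j - n / 2
    return p0 * np.sqrt(8 / (np.pi * n)) * np.exp(-p1 * d**2 - 0.1 * p2 * d**4)

def make_batch(size, noise=False):
    """Random p in [0, 1]^3 and the corresponding theoretical distributions."""
    P = np.random.rand(size, 3)
    R = np.stack([unified_theory(*p) for p in P])
    if noise:  # multiplicative noise used to improve tolerance to experimental noise
        R *= 1 - 0.005 + 0.01 * np.random.rand(*R.shape)
    return torch.tensor(R, dtype=torch.float32), torch.tensor(P, dtype=torch.float32)

# Feed-forward fully connected NN; one reading of the "16, 16, 16 and 3" topology.
net = nn.Sequential(
    nn.Linear(16, 16), nn.Sigmoid(),
    nn.Linear(16, 16), nn.Sigmoid(),
    nn.Linear(16, 3), nn.Sigmoid(),
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)
loss_fn = nn.MSELoss()  # squared-error measure

for step in range(20000):
    R, P = make_batch(64, noise=True)
    optimizer.zero_grad()
    loss = loss_fn(net(R), P)
    loss.backward()
    optimizer.step()
```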

Next we perform a number of numerical experiments using a random number generator to simulate beads that pass through \(n=16\) layers of pegs. The number of beads that reach each of the positions is normalized by the total number of beads to obtain the experimental values \({r}_{j}^{E}\). These distributions are then used as input for the NN, which reconstructs p according to the learned unified theory.
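A corresponding sketch of the “experiment” is given below: each bead traverses the 16 rows and ends up in a binomially distributed bin, and the normalized occupancy is fed to the network trained above. It reuses `np`, `torch`, `n` and the trained `net` from the previous sketch; the mapping of the n + 1 possible bins onto the 16 network inputs is our illustrative choice.

```python
# Monte-Carlo Galton Board "experiment"; reuses objects from the previous sketch.
def galton_experiment(n_beads, n_rows=16):
    final_bins = np.random.binomial(n_rows, 0.5, size=n_beads)       # positions 0..n_rows
    counts = np.bincount(final_bins, minlength=n_rows + 1)[:n_rows]  # keep 16 bins (the rarely
    return counts / n_beads                                          # populated last bin is dropped)

r_exp = galton_experiment(10**5)
with torch.no_grad():
    p_rec = net(torch.tensor(r_exp, dtype=torch.float32))
# p_rec[2] (p2) close to 0 and p_rec[0] (p0) close to 0.5 favour theory A
```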

In Fig. 1 (left column) we plot the distribution of reconstructed values on the plane of p0 and p2 for the number of beads equal to \({10}^{3}\) (upper panels) and \({10}^{5}\) (lower panels). As we see, in both cases the reconstructed values are localized mostly in the vicinity of the point related to the theory \({\mathbb{A}}\), i.e. (\({p}_{0}=0.5\), \({p}_{2}=0\)). However, this is more obvious in the case of the larger number of beads. This is not surprising because the NN was trained to reproduce values based on the exact distribution without any stochastic deviations. To establish some tolerance of the NN to noise we train it using an artificial noise that we apply to the theoretical values before sending them to the input of the NN. For this purpose we multiply each value \({r}_{j}^{U}\) by a factor \((1-0.005+0.01r)\), where r is a random value from 0 to 1. As we can see from the comparison of the panels in Fig. 1, this results in a better localization of the distribution around the expected point (a more precise analysis shows that this improvement appears for both p2 and p0). This result demonstrates that the approach can be used to retrieve information more efficiently from noisy experimental results.

Figure 1

Validation of the theory for the Galton Board in the limit of a large number of layers of pegs (Eq. (5)). The plots show the distribution of the values of the parameters of the unified theory (Eq. (7)) reconstructed by the NN that receives the distributions obtained with the Monte-Carlo method using \({10}^{3}\) (upper panels) and \({10}^{5}\) (lower panels) beads. The tendency of p2 to 0 indicates the irrelevance of the super-Gaussian component and the invalidity of such an alternative theory. The right column shows the results of using a NN that is trained with additional noise, which leads to higher tolerance to the experimental noise caused by the smaller number of beads.

We can note that the distributions of reconstructed values are centered at a point that is close to, but notably different from, the point predicted by the theory. This is an indication of the fact that the theory is valid in the limit \(n\gg 1\), while we here use a large but finite value, \(n=16\). In other words, we see the closest fit to the experimental data within the family of theories described by the unified theory \({\mathbb{U}}\). This observation leads to the next opportunity, which we identify and describe in the next subsection.

Completing a theory

Suppose we have an incomplete theory \({\mathbb{A}}(\alpha )\) that contains some free parameter α (or several parameters). Alternatively we can have a hypothesis that some theoretical approach can be applied under the appropriate choice of a parameter that we were not able to determine. We can use the methodology described in the previous subsection to determine both the possibility and the appropriate values in such cases.

We generate a sequence of theoretical results for various values of parameters c and values of α in the appropriate range:

$${{\bf{r}}}_{i}^{A}={\mathbb{A}}({\alpha }_{i}){{\bf{c}}}_{i},$$
(8)

and train a NN to reconstruct the values of c and α. We can then vary the experimental conditions c and send each result to the input of the trained NN. The agreement of the reconstructed values with the known experimental parameters c would indicate that the NN “relates” some features in the input to those peculiar to the theory \({\mathbb{A}}(\alpha )\). If such agreement is not observed, this procedure does not indicate whether such a theory can be applied or not. Alternatively, if the agreement for c is observed but the values of α do not expose any systematic tendency, we can conclude that there is no appropriate choice for the free parameter α.

However, if, in some region of parameters, the reconstructed values of c are systematically close to the known values in the experiment, this procedure can show the systematic dependency of α on the parameters c.

One can approximate the obtained dependencies and use the approximations as a heuristic way to complete the theory. However, this can also provide a hint for further development of a theory in a deductive way. For example, the possibility of applying certain assumptions or phenomenological model can become clear and be validated rigorously.

Finally, we would like to highlight that this procedure can be applied to determine the experimental conditions that are not known or even not measurable. This procedure can thus be treated as indirect measurement.

As an illustrative example, we again use the Galton Board experiment. We assume that we have determined only the fact that the distribution is Gaussian, but have not determined the coefficient that defines its spread. We can use Eq. (7) with \({p}_{2}=0\). Then the unknown free parameter is p1 and our purpose is to determine its dependency on the number of rows n.

We perform numerical experiments with different numbers of rows, from \(n=2\) to \(n=32\). In all experiments we used \({10}^{5}\) beads. Using the same noise-trained NN as in the previous subsection, we reconstruct the value of p1 as a function of n. The result shown in Fig. 2 clearly demonstrates that we can systematically determine the value of p1. As expected, the result is close to the analytical dependency determined by Eq. (5) and shown with the dashed blue curve.

Figure 2

Indirect measurement applied as a way to complete a theory (Eq. (7)) for the Galton Board. The distribution of reconstructed values of p1 (which defines the width of the Gaussian distribution) is shown as a function of the number of rows n. In the limit of large n the distribution shows a systematic tendency towards the known analytical answer (Eq. (5)). The deviation in the region of small n provides an extension of the theory in the framework of Eq. (7).

Note that for small values of n the determined values show a systematic deviation from the analytical trend. This is an indication that the analytical trend is not valid for small n. This procedure provides the possibility not only to see this but also to measure and determine the closest choice of p1 in the framework of the theory (Eq. (7)). In such a way, based on experimental data, we can perform an indirect measurement of any parameter and its dependency, even beyond the range where the parameter has its intended physical meaning.

Indirect measurement

We would like to note that the procedure described in the previous subsection can also be applied for indirect measurement of a real physical variable that defines the physical conditions modeled by the theory. This means that we can use this procedure to perform indirect measurements of unknown parameters in the experiments and, in such a way, complete the information about experimental conditions in case they are not known to us. Moreover, instead of a theory we can use ab-initio simulations, which makes the method applicable to many complex situations. To do so, one needs to parametrize the conditions of the process using knowledge about the real conditions and perform a sufficiently large set of ab-initio simulations to train a NN for the described procedure.

Why use a NN?

One can ask a very reasonable question: what is the benefit of using a NN instead of collecting all possible outcomes of a model and then determining which one is the closest to the experimental measurement? To clarify the benefit, we note that this alternative procedure would inevitably require setting some metric for calculating the closeness between the known output and the measured result. This is the main difficulty. First, the appropriate metric can be different for different regions of the parameter space. Second, it can be sensitive to the experimental noise. Finally, it can be sensitive to systematic distortions in the experimental measurements. For example, a systematic shift of a certain useful pattern can hamper its identification in the case of using the mean-square deviation as the metric.

In the discussed methods (based on a NN), we only set the metric for a number of physical parameters, and the effect of the used metric is more transparent. The NN is automatically adjusted to relate the parameters to the most indicative features in the output. As we showed, one can even train the NN to be tolerant to noise and some distortions.

Application for the physics of laser-plasma interactions

In this section we show how the discussed methodology can be applied to resolving long-standing questions in the field of high-intensity laser-plasma interactions. The process of such interactions is crucial for many applied and fundamental research directions related to the use of modern high-intensity lasers44. Even compact table-top lasers can now produce pulses with relativistic intensity, which means that not only almost instantaneous ionization but also relativistic, collective dynamics in the produced plasma can be driven on the surface of a target. This opens opportunities for driving a large variety of highly non-linear interaction regimes and thus for converting laser energy into energetic particles or unique, tailored forms of radiation. This holds promise for numerous applications ranging from fundamental studies to new diagnostic tools in medicine and nuclear waste utilization45,46,47,48.

The ionization of solids leads to the formation of high-density plasma that hampers the penetration of laser radiation. However, the light pressure can be strong enough to cause relativistic, repeated shifts of electron bulks that are balanced by the attraction to the residual ions, which are less mobile due to their higher mass. Using the appropriate reference frame and some reasonable assumptions, one can reduce the problem to 1D radiation-plasma dynamics49. However, the application of first principles leads to a highly complex problem formulation, which is largely inaccessible for analytical tools. At the same time, the use of ab-initio simulations lacks generality and is also of limited use due to commonly incomplete knowledge about experimental conditions. This naturally impedes the search for useful regimes in a multi-dimensional space of parameters.

One way of overcoming this difficulty is developing phenomenological models50,51,52,53,54,55, i.e. theories that are based on introducing entities (and the rules of their behavior) that model patterns systematically observed in ab-initio simulations. The applicability of such models is typically motivated and analyzed based on theoretical estimates.

Historically, the first and probably the best-known phenomenological model for the nonlinear radiation reflection from plasmas is referred to as the relativistic oscillating mirror (ROM)50. The underlying principle of this model states that the reflection happens according to the Leontovich boundary condition (equality of incoming and outgoing energy fluxes) at some oscillating apparent reflection point, which can approach but not reach the relativistic speed limit, just as real particles. When this point moves against the incident radiation with relativistic velocity, a quick transition is formed in the reflected radiation. High harmonics generated in such a way obey a universal spectral law \({I}_{k}\sim {k}^{-8/3}\), where Ik is the intensity of the k-th harmonic56. Although some other trends in spectra are also observed in simulations51,54,55,57,58, the observation of this trend in some experiments59 established a conviction in the validity and applicability of the ROM model.

An alternative model is referred to as the relativistic electronic spring (RES)60,61. The underlying principle of this model states that under the light pressure some varying part of the foremost electrons becomes and remains bunched, while the resulting bunch moves so that its radiation precisely cancels out the incident radiation in the plasma bulk. The resulting description agrees well with ab-initio simulations in many respects over a wide range of conditions and provides several important predictions62,63, including the possibility of producing unprecedentedly intense and short bursts60,64,65 of radiation with controllable ellipticity66. However, the experimental validation of the RES model is difficult. Current experiments are largely limited to the observation of high-harmonic spectra, which are typically analyzed in terms of the exponent of the power-law fall-off, while setting aside more peculiar signatures that can indicate the validity and applicability of the RES model. More accurate comparison requires cutting-edge experimental and theoretical developments based on revealing and retrieving indicative features in the experimental data67,68,69,70. We now show how such analysis can be performed with the help of a NN on the basis of the methodology described in the previous section.

Comparing the RES and the ROM models

Here we show how we can use the measurable spectra of generated high-harmonics for comparing and determining the validity regions of the RES and ROM models. Although the procedure can be based on experimental data, here we use ab-initio simulations to obtain the spectra in the frequency range of up to harmonic order 12.8, which mimics the capabilities of typical experimental arrangements.

To perform the comparison, we need to develop a unification theory. Since the ROM model was intended and motivated for the case of a sharp density drop at the plasma surface, we consider the RES equations for this case and also assume the most indicative case of P-polarized incidence:

$${f}_{{\rm{in}}}({x}_{s}-t)=\frac{S}{2\,{\cos }^{3}\,\theta }\,(\sin \,\theta -\frac{{\beta }_{y}}{1-{\beta }_{x}}),$$
(9)
$${\beta }_{x}^{2}+{\beta }_{y}^{2}=1,$$
(10)
$$\frac{d{x}_{s}}{dt}={\beta }_{x},$$
(11)
$${f}_{{\rm{out}}}^{RES}({x}_{s}+t):\,=-\,\frac{S}{2\,{\cos }^{3}\,\theta }\,(\sin \,\theta -\frac{{\beta }_{y}}{1+{\beta }_{x}}).$$
(12)

Here fin and \({f}_{{\rm{out}}}^{RES}\) are the shapes of the incoming laser pulse and of the outgoing reflected signal that carries the generated high harmonics. The first three equations can be solved to find the temporal evolution xs(t) of the bunch that accommodates peripheral electrons according to the RES model. We can then substitute the obtained solution xs(t) into the fourth equation to obtain the outgoing signal \({f}_{{\rm{out}}}^{RES}(x+t)\). The second equation implies the ultra-relativistic limit for the bunch velocity components along the pulse propagation direction (βx) and along the electric field direction (βy). The shapes fin and \({f}_{{\rm{out}}}^{RES}\) are given in the laboratory reference frame, while the consideration is carried out in the moving reference frame49 that provides a way to account for an arbitrary incidence angle θ. The relativistic similarity parameter S is defined as \(S=n/a\), where n is the plasma density given in the laboratory reference frame in critical units and a is the pulse field amplitude given in relativistic units; both units are computed relative to the laser wavelength \(\lambda =1\,\mu \)m. For more details see refs60,61.

The ROM model does not provide a complete set of differential equations for computing \({f}_{{\rm{out}}}^{ROM}\). Instead, the model provides a way to compute directly the indicative spectral properties. The only two essential assumptions that lead to this result are the Leontovich boundary conditions (\({f}_{{\rm{out}}}^{ROM}({x}_{ARP}+t)=-\,{f}_{{\rm{in}}}({x}_{ARP}-t)\)) and the fact that they are applied to a point xARP that passes through a stage of moving with a speed close to the speed of light against the incident radiation (in the moving reference frame). From Eqs (9) and (12) we see that the boundary conditions of the RES model imply an inequality between the incident and outgoing radiation. To provide a smooth transition between these types of boundary conditions and use xs as xARP (i.e. admitting the same relativistic motion) we modify Eqs (9–12) and formulate the unification theory:

$${f}_{{\rm{in}}}({x}_{s}-t)=\frac{S}{2\,{\cos }^{3}\,\theta }\,(\sin \,\theta -\frac{{\beta }_{y}}{{(1-{\beta }_{x})}^{\frac{p+1}{2}}}),$$
(13)
$${\beta }_{x}^{2}+{\beta }_{y}^{2}=1,$$
(14)
$$\frac{d{x}_{s}}{dt}={\beta }_{x},$$
(15)
$${f}_{{\rm{out}}}^{U}({x}_{s}+t):\,=-\,\frac{S}{2\,{\cos }^{3}\,\theta }\,(\sin \,\theta -\frac{{\beta }_{y}}{{(1-{\beta }_{x})}^{\frac{1-p}{2}}{(1+{\beta }_{x})}^{p}}).$$
(16)

Here we introduce the parameter p that provides the needed transition through its variation from 0 to 1. In the limit \(p=1\) Eqs (13–16) coincide exactly with the equations of the RES model. In the limit \(p=0\) Eqs (13) and (16) imply the Leontovich boundary conditions, while the solutions for xs include instances of approaching the relativistic limit of motion against the incident radiation (see more details below). Thus the unified theory for \(p=0\) exposes the same spectral properties as the ROM model. However, the unified theory still provides one particular way of completing the ROM model to a set of equations that determine not only the spectral properties, but also the explicit form of fout. We will thus refer to this theory as the completed ROM or ROMc.

The numerical solution of Eqs (13–15) is straightforward. From Eq. (13) one can express:

$$\frac{{\beta }_{y}}{{(1-{\beta }_{x})}^{\frac{p+1}{2}}}=R=\,\sin \,\theta -2{S}^{-1}\,{\cos }^{3}\,\theta {f}_{{\rm{in}}}({x}_{s}-t).$$
(17)

One can see that when R changes sign, so does βy. According to Eq. (14), this means that βx passes close to 1 or −1, as mentioned above. Eq. (17) together with Eq. (14) has one relevant solution for βx:

$${\beta }_{x}=1-g({R}^{2},p),$$
(18)

where g(α, p) is the solution of the equation \(\alpha {x}^{p}+x-2=0\) with respect to x. We can solve this equation numerically and use the result to determine the evolution xs(t). Then we can use Eq. (16) to obtain \({f}_{{\rm{out}}}^{U}({x}_{s}+t)\) and calculate its spectrum.

Once the unification theory is developed we can use it to calculate the spectra. For our studies we considered a two-cycle laser pulse \({f}_{{\rm{in}}}(\eta =x-t)\) characterized by the vector potential in the form ~\(\sin \,(\eta +\varphi )\,{\sin }^{4}\,(\eta /4)\), which results in:

$$\begin{array}{rcl}{f}_{{\rm{in}}}(\eta =x-t) & = & {\sin }^{3}\,(\eta /4)\\ & & \times \,(\cos \,(\eta /4)\,\sin \,(\eta +\varphi )\\ & & +\,\sin \,(\eta /4)\,\cos \,(\eta +\varphi )),\end{array}$$
(19)

where the phase \(\varphi \) can have an arbitrary value. For this study we set \(\varphi =\pi /2\).
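A sketch of a possible numerical implementation is given below. It integrates Eqs (13–15) in time for the pulse of Eq. (19), evaluates the outgoing field via Eq. (16), and resamples it onto a uniform grid so that a spectrum can subsequently be taken. The code assumes NumPy/SciPy; the time step, the support chosen for the incident pulse and the crude treatment of the near-singular points (which, for p close to 1, correspond to the XUV bursts discussed below) are our own simplifications rather than the procedure used for the reported results.

```python
# Sketch of a solver for the unified theory, Eqs (13)-(16); assumes NumPy/SciPy.
import numpy as np
from scipy.optimize import brentq

def f_in(eta, phi=np.pi / 2):
    """Two-cycle pulse of Eq. (19); the support eta in [-4*pi, 0] is our convention
    so that the pulse reaches the surface (x_s ~ 0) at t >= 0."""
    s = eta + 4 * np.pi
    if s < 0 or s > 4 * np.pi:
        return 0.0
    return np.sin(s / 4) ** 3 * (np.cos(s / 4) * np.sin(s + phi)
                                 + np.sin(s / 4) * np.cos(s + phi))

def solve_unified(S, theta, p, dt=1e-3, t_max=4 * np.pi):
    """Integrate Eqs (13)-(15) for x_s(t) and record f_out^U from Eq. (16)."""
    xs = 0.0
    eta_out, f_out = [], []
    for t in np.arange(0.0, t_max, dt):
        # R from Eq. (17)
        R = np.sin(theta) - 2.0 / S * np.cos(theta) ** 3 * f_in(xs - t)
        # g solves R^2 * x^p + x - 2 = 0 (Eq. (18));
        # assumes parameters for which a physical root exists in (0, 2]
        g = brentq(lambda x: R ** 2 * x ** p + x - 2.0, 1e-12, 2.0)
        beta_x = 1.0 - g
        beta_y = R * g ** ((p + 1) / 2)                    # from Eq. (17)
        denom = g ** ((1 - p) / 2) * (1.0 + beta_x) ** p   # (1-b_x)^((1-p)/2) (1+b_x)^p
        f = -S / (2 * np.cos(theta) ** 3) * (np.sin(theta) - beta_y / denom)
        eta_out.append(xs + t)                             # retarded coordinate x_s + t
        f_out.append(f)
        xs += beta_x * dt                                  # Eq. (15)
    # Resample on a uniform grid (x_s + t is non-decreasing) for a subsequent FFT
    eta_u = np.linspace(eta_out[0], eta_out[-1], len(eta_out))
    return eta_u, np.interp(eta_u, eta_out, np.array(f_out))
```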

We numerically solve the equations of the unified theory for the parameter space spanned by:

$$\theta \in [0,3\pi /8],$$
(20)
$$S\in [1,10],$$
(21)
$$p\in [0,1].$$
(22)

Each spectrum was sampled with 16 equidistant points in the interval from 0 to harmonic order 12.8, and the values were converted to appropriate logarithmic units so that they mostly lie in the interval from 0 to 1. Next we train the NN to reconstruct the values of S, θ and p from these data. We use the same topology and training method of the NN as in the previous experiments with the Galton Board. The achieved accuracy was about \(3\times {10}^{-3}\) for the squared-error measure applied to the parameters normalized to unity.
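For reference, the following sketch shows one possible way of reducing a computed reflected signal to such a 16-component spectral input; the exact binning and logarithmic normalization used for the reported results may differ, so this is only an assumed preprocessing step.

```python
# Assumed spectral preprocessing (illustrative): 16 samples of the reflected
# power spectrum up to harmonic order 12.8, mapped to log units of order unity.
import numpy as np

def spectrum_features(eta, f_out, n_points=16, k_max=12.8):
    power = np.abs(np.fft.rfft(f_out)) ** 2
    # angular frequency in units of the laser frequency, i.e. harmonic order
    freqs = 2 * np.pi * np.fft.rfftfreq(len(f_out), d=eta[1] - eta[0])
    k_samples = np.linspace(k_max / n_points, k_max, n_points)
    sampled = np.interp(k_samples, freqs, power)
    log_s = np.log10(sampled / power.max() + 1e-12)
    return (log_s + 12.0) / 12.0       # crude rescaling into roughly [0, 1]

# eta, f = solve_unified(S=5.0, theta=np.pi / 4, p=1.0)
# features = spectrum_features(eta, f)   # input vector for the NN
```

The training then mirrors the Galton Board sketch above, with the three network outputs now standing for S, θ and p normalized to the unit interval.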

Next, we perform a series of particle-in-cell (PIC) simulations for the parameter space spanned by Eqs (20) and (21) and obtain spectra using the field distribution obtained in each simulation. For this purpose we used the 1D version of the PIC code ELMIS60,71. In all simulations we used a field amplitude \(a=200\) and the density determined in accordance with the value of S. The spectra are then sent to the input of the trained NN, which provides the reconstructed values of the parameters.

In Fig. 3 we show the distribution of the values of the parameter p as a function of S and θ. The results point to the fact that in all cases the most appropriate choice of p is close to 1, which corresponds to the RES model.

Figure 3

Comparison of the RES and the ROM models. The RES-ROM unification parameter p (see Eqs (13–15)) is shown as a function of the incidence angle θ and S (right panel). The distribution of p as a function of S is shown in the left panel.

Advancing the RES model

The original RES model has no fitting parameters and still describes fairly accurately the shape of the pulse computed with ab-initio PIC simulations for a wide range of relativistic laser-plasma interaction scenarios61,72. In particular, the model indicates the possibility of producing singularly intense XUV bursts, identifies the optimal conditions for this process60,72 and determines the polarization states of the XUV bursts66. However, the original RES theory does not predict the amplitude of these bursts, which appear as singular points of the theory.

These bursts originate from the singularity of the second term on the right-hand side of Eq. (12), when βy changes sign and βx becomes close to −1 according to Eq. (10). To see this, one can substitute βy/\((1-{\beta }_{x})\) expressed from Eq. (9) into Eq. (12):

$${f}_{{\rm{out}}}^{RES}:\,=-\,\frac{S\,\sin \,\theta }{{\cos }^{3}\,\theta }+{f}_{{\rm{in}}}+\frac{2}{1+{\beta }_{x}}\,(\frac{S\,\sin \,\theta }{2\,{\cos }^{3}\,\theta }-{f}_{{\rm{in}}}).$$
(23)

One can formally bound the resulting field through introducing a constant α that is close to, but less than, unity:

$${f}_{{\rm{out}}}^{RES}:\,=-\,\frac{S\,\sin \,\theta }{{\cos }^{3}\,\theta }+{f}_{{\rm{in}}}+\frac{2}{1+\alpha {\beta }_{x}}\,(\frac{S\,\sin \,\theta }{2\,{\cos }^{3}\,\theta }-{f}_{{\rm{in}}}),$$
(24)

and also relate α to the effective bounding gamma factor γb in terms of Eq. (10):

$$\alpha =\sqrt{1-{\gamma }_{b}^{-2}}.$$
(25)

Large values \({\gamma }_{b} > 10\) result in \(0.99 < \alpha < 1\) and thus almost do not affect the values of \({f}_{{\rm{out}}}^{RES}\) anywhere except in the vicinity of \({\beta }_{x}=-\,1\), where they formally bound the result. This bound, however, crucially affects the high-frequency end of the spectra measurable in experiments. Simulations show64 that the amplitude of the XUV bursts can be up to a factor of 20 higher than that of the incident radiation, and this factor grows with the laser intensity in a complex way. This means that determining γb is a matter of theory beyond the self-similarity \(S=n/a\) implied by the RES model. Determining γb is thus an important theoretical problem for both experimental validations and future applications of ultra-intense XUV bursts.

Serebryakov et al. (see ref.62) analyzed the possibility of relating the bounding factor to the actual gamma factor of electrons in the bunch. However, the electrons in the bunch have only similar velocities, but genuinely different gamma factors, since they all experience different acceleration over different intervals of time61.

We will now use the methodology of indirect measurements to examine heuristically the appropriate values of γb as a function of laser field amplitude. We use the RES model extended with parameter α through Eq. (24). We obtain the same spectral data for various parameters in the space spanned by:

$$\theta \in [0,3\pi /8],$$
(26)
$$S\in [1,10],$$
(27)
$$\alpha \in [0,1].$$
(28)

We train the same kind of NN and reach roughly the same level of accuracy of determining these parameters on the basis of spectral data.

Next, we perform a series of PIC simulations for \(\theta \in [0,3\pi /8]\), \(S\in [1,10]\) and field amplitudes \(a\in [5,500]\), which is relevant to current and near-future experiments. The obtained spectra are used as inputs for the trained NN, which reconstructs the values of α. In Fig. 4 we see that the heuristic value of \({\gamma }_{b}\approx 10\) appears to be universal for amplitude values \(a\gtrsim 10\). In the right panel we also see that this tendency becomes more prominent for large values of S.

Figure 4

Indirect measurement of the γb parameter for the extension of the RES model according to Eq. (24). The distribution of reconstructed values of γb is shown as a function of the laser field amplitude a for \(1 < S < 10\) (left panel) and for \(3 < S < 10\) (right panel).

This result can be used either directly, to advance heuristically the RES model, or for further theoretical analysis of the physics of the process. Note that, since the parameter γb has no meaning in terms of first principles, we can hardly determine it from these principles. Instead, we here determine its value in terms of its intended role as a parameter of the extended RES model.

Indirect measurements based on incomplete knowledge about experimental conditions

The previous examples primarily concerned questions of theory. As a final example we demonstrate how one can make clear use of the discussed methodology in complex experiments, when the information about the initial conditions of the studied process is not complete. This does not rely on the use of any model and can be based on ab-initio simulations. However, here we again use the RES model and ab-initio PIC simulations to mimic such experimental scenarios.

Suppose we perform an experiment on high-harmonic generation through the interaction of a high-intensity laser pulse with a solid target. We know and can vary the incidence angle θ. We also know the duration and the amplitude of the laser pulse. However, we do not know the carrier envelope phase (CEP) \(\varphi \) and we do not know precisely the density profile, which results from plasma spreading after heating by foregoing laser radiation. This is a rather common experimental situation.

We will mimic the unknown initial plasma state by considering a steep density profile with unknown density in the plasma bulk. This can be related to any plasma distribution through the approach of the effective S-number proposed in ref.72. Note, however, that this is just a proof-of-principle demonstration and one can use any plasma density profile in both RES and PIC calculations61.

We will assume that the pulse has the form given by Eq. (19) with amplitude \(a=200\). The limited knowledge about the plasma density n can be interpreted as limited knowledge about \(S=n/a\), which we assume to stay in the range \(S\in [1,10]\).

In Fig. 5 we show typical spectral data obtained for various values of S, θ and \(\varphi \). One can see that the data contain sophisticated features that depend on the parameters in a complex way. Although these features seem to encode unambiguously the interaction scenario and the initial conditions, it would be very difficult to describe them in human language so that one could determine the initial conditions systematically. Developing a methodology for retrieving such information from the spectral data appears to be an intrinsically complex problem that is a matter of advanced developments69,70. Here we demonstrate that this problem can be solved with a NN.

Figure 5

Some spectra obtained with the PIC simulations (black) and the numerical solution of the RES model (red) for different parameters S, θ and \(\varphi \).

Note that, in essence, the results of the RES model (red curves) agree with the results of the PIC simulations. However, since the agreement is not ideal and in some cases is rather poor, the problem of reconstructing the initial conditions based on the RES model is not trivial. It requires appealing to some essential features rather than to an exact memory of the presented states. This mimics possible experimental limitations related to natural noise or potentially systematic distortions.

We use the same kind of NN and train it to reconstruct values of \(\varphi \), θ and S on the basis of spectral data obtained via numerical solutions of the RES model in the parameter space spanned by:

$$\theta \in [\pi /8,3\pi /8],$$
(29)
$$S\in [1,10],$$
(30)
$$\varphi \in [0,\pi ].$$
(31)

To mimic real experiments we perform PIC simulations with the parameters in the same parameter space. We then use the obtained spectral data to see whether the NN can identify correctly the phase values used in simulations. The results are shown in Fig. 6.

Figure 6

The demonstration of indirect measurement of the pulse carrier envelope phase \(\varphi \) based on spectral data obtained with PIC simulations. The used NN was trained with the RES model. Panel (a) shows all results, and the inset to the right shows the notation and the standard deviation for different groups of PIC simulations. Panel (b) shows the results for \(\theta =\pi \)/4 and \(S=4.6\), which are optimal for accurate reconstruction. Panel (c) shows the result for the optimal \(\theta =\pi \)/8 in the case where we have no precise information about \(S\in [2.2,10]\).

As one can see in Fig. 6(a), all the results of the NN are mostly located around the diagonal that corresponds to accurate reconstruction of the phase \(\varphi \). However, as seen from the diagram for the standard deviation (shown in the inset to the right), the accuracy of the reconstruction varies with the parameters θ and S.

There could be two reasons for this. First, the RES model can have different accuracy for different parameters, and this affects the accuracy of the reconstruction. Second, the physics of the process can provide more indicative features for certain ranges of parameters. This observation indicates an important capability of the methodology: the accuracy of the reconstruction of the known parameters can show where (in the parameter space) the used theoretical model (or the setup for simulations) is more adequate. Alternatively, this accuracy can indicate where the physics of the process provides more indicative features in the data used for the reconstruction. Moreover, if we can vary the range of used data, we can determine where these features are located.

One trivial outcome of this observation is the possibility to choose the parameters that are most useful for the reconstruction. In our case this is shown in Fig. 6(b).

Finally, we demonstrate that the procedure can be efficiently used in the case of limited knowledge about the experimental data. As we see in Fig. 6(c), the phase can be reconstructed fairly accurately even if we do not know precisely the plasma density distribution.

Discussion

The presented example demonstrates that the described approach is sensible for selecting and calibrating a model. In the considered case the NN has been supplied with the unified RES-ROM theory and has been trained to reconstruct the unification parameter p and some set of physically important input parameters (S and θ). The NN learned all the features that are useful for the reconstruction of these parameters. The physical meaningfulness of the learned (i.e. selected) features was controlled by the used set of physically meaningful parameters. The resulting (selected and calibrated) model is thus relevant in terms of describing the experimentally observable features that are sensitive to S and θ. One can then ask if the selected and calibrated model is more generally valuable. Here we can refer to the general tendency of physical phenomena to originate from some underlying principles and laws. In other words, if we see that some theoretical foundations yield a good description of certain phenomena and their dependency on certain parameters, it would be natural to expect that the resulting theory can also describe other phenomena and their dependency on other parameters. The presented possibility of reconstructing the carrier envelope phase \(\varphi \) serves as a convincing demonstration of this expectation.

We now return to the questions raised earlier concerning the well-posed (or otherwise) nature of the inverse problem, when there is some symmetry and/or self-similarity. The considered problem of laser-plasma interaction provides an illustrative example for both issues.

As we mentioned earlier, the physics of the process in the considered ultra-relativistic regime has the property of self-similarity with the similarity parameter \(S=n/a\): if we proportionally increase or decrease the values of the amplitude a and the density n, the shape of the spectrum remains the same. This means that from the normalized spectrum we can only determine the value of S but not the values of a and n separately. In other words, the inverse problem would be ill-posed if a and n were used as the input parameters. This is why we used S as the input parameter in all arrangements. (Alternatively, one can use non-normalized spectra, but this would make little sense and would also increase the dimensionality and thus the computational demands.)

The property of symmetry appears for the specific case of normal incidence, i.e. \(\theta =0\). In this case inverting the directions of the electric and magnetic fields leads to the same plasma dynamics (with reversed velocities of particles in the y and z directions). This means that the problem settings for \(\varphi =0\) and \(\varphi =\pi \) provide exactly the same spectrum. This is why we restrict the angle to \(\theta \ge \pi \)/8 in Fig. 6. One can ask, however, what would happen if we did not pay attention to this symmetry and varied the angle in the whole range, making the inverse problem ill-posed for \(\theta =0\). To assess this, we tried to train the NN for the range \(\theta \in [0,3\pi /8]\) and checked whether this ruins the whole procedure or affects the results only locally for \(\theta =0\). In Fig. 7 we plot the average standard deviation for the reconstructed phase as a function of the angle θ and also the reconstructed phase for the case of \(\theta =0\), \(S=4.6\).

Figure 7

The demonstration of the tolerance against the presence of parameter ranges, where the inverse problem is ill-posed. The standard deviation is shown for \(\theta \in [0,3\pi /8]\), while for \(\theta =0\) the cases of \(\varphi =0\) and \(\varphi =\pi \) are indistinguishable. Other settings are the same as in Fig. 6.

As expected, the measured deviation is notably larger for \(\theta =0\). However, for other values of θ a reasonably satisfactory result is achieved. Moreover, for the previously determined optimal value of \(S=4.6\) the reconstructed values seem to still follow the right trend for all values of \(\varphi \) except in the vicinity of the ambiguous points \(\varphi =0\) and \(\varphi =\pi \). This indicates that the method is practically robust against limited knowledge about the properties that might lead to an ill-posed inverse problem. Moreover, the fact that the NN does not converge to a reasonable reconstruction accuracy in a certain range of parameters can be used as an indicator of some symmetry or self-similarity of the problem. In such a way this procedure can guide further theoretical analysis of the problem.

Conclusions

In this paper we discussed and demonstrated the possibility of using machine learning for validating and advancing theories, as well as for performing indirect measurements with incomplete knowledge about experimental conditions. The procedure is based on the possibility of using a NN for establishing the relation between various parameters (of the process and of the theory) and features that might be poorly accessible for description with human language. First, we showed how this can be used to validate, compare and advance theoretical models. Next, we showed how this can be used to perform indirect measurements of parameters of the experiment or theory based on experimental data, even if we have incomplete knowledge about the experimental conditions. Finally, we outlined how the accuracy of the reconstruction of known parameters can be used for the identification of indicative features and their locations in the experimental data.