Introduction

A field reversed configuration (FRC) is a compact torus1, 2 with no externally imposed toroidal field. The poloidal field in an FRC has one component arising from magnets arranged on a common linear axis and another component generated by a toroidal plasma current flowing in the opposite direction to the magnet currents. Under transient conditions, an additional magnetic field component arises from toroidal currents flowing in the vessel (the flux conserver (FC) current), which are induced by changes in plasma current distribution and/or transients in the external magnet currents. When the plasma current is strong enough to reverse the externally imposed magnetic field, a closed field structure topologically similar to a torus is formed (Fig. 1). Closed field lines in an FRC enclose a point of maximum kinetic pressure and null magnetic field called the o-point. The separatrix is the flux surface with null poloidal flux and separates the internal closed field region from the open field line scrape-off layer (SOL) region.

Fig. 1

Schematics of the magnetic field topology in a field reversed configuration. Plasma (orange) is contained using a set of axially symmetric magnets (blue). When plasma current is strong enough to reverse the externally imposed magnetic field, a closed field line structure is formed. Closed field lines circle around the so called o-point, where the magnetic field is null. The longest closed magnetic field line enclosing the o-point has null poloidal flux and separates the internal closed field region from the open field line scrape-off layer (SOL) region

The C-2U device, built and operated by Tri Alpha Energy (TAE), is the first device to demonstrate that FRCs can be sustained in near steady state using neutral beam injection3, 4. TAE’s C-2U device relied largely on FC effects to stabilize plasma displacements, so the discharge lifetime was of the same order as the time constant of the vessel. TAE’s C-2W device, presently in its initial operational phase, will extend the discharge duration over this limit, so plasma control will become necessary to stabilize the separatrix shape and position5. A method to determine the magnetic field structure and related control variables in real time (with sampling frequency in the range 10–100 kHz) is then required. Some first order approximations for FRC geometry parameters are available from the excluded flux radius. However, these cannot distinguish an FRC from a high beta mirror6, so they are not particularly useful if the FRC state itself is uncertain.

Determination of the magnetic field structure in an FRC is a challenging problem. While magnetic field structure inside the plasma can be measured by inserting probes inside the plasma7, this cannot be done in high temperature plasmas without severely disrupting the plasma confinement. In FRC plasmas, the magnetic field structure must be determined indirectly from external magnetic probes8, laser polarimetry systems9, etc.

Determination of the internal current sources from external measurements is termed the inverse problem. The technique used to determine the current sources from the sensor data is the inference technique. When Lorentz forces are balanced by plasma pressure, there is no net acceleration of the plasma, and the plasma is said to be in equilibrium. The determination of the magnetic structure corresponding to plasma in equilibrium is referred to in the literature as 'equilibrium reconstruction'.

This work departs from the standard equilibrium reconstruction approach10 and instead uses the current tomography (CT) method11, 12, a well-validated alternative already studied in connection with real-time control of tokamaks13. The CT method uses Bayesian inference14 of Gaussian processes (GPs)15 to solve the inverse problem. The GP modelling used by the CT method can be tailored to a multiplicity of related tomography problems16, in particular to the specifics of the FRC magnetics. The CT method has several advantages that make it well suited to plasma control. First, when the relationship between current sources and sensors is linear (as is the case with magnetic probes), and the physics assumptions can be reduced to linear relationships among current sources and measurements, the solution depends on the sensor data through non-iterative matrix operations and, for this reason, is deterministic and suitable for real-time use. A version of the algorithm for the C-2W device has already been implemented in a field-programmable gate array and verified to run in under 10 μs. Second, as no equilibrium restrictions are strictly required, the CT method can infer Alfvenic oscillations from magnetic sensor data. Fast transients can then be resolved accurately and with very low latency, both factors known to have an impact on control system performance. Third, the CT method is able to fuse information from multiple sensor data sets and boundary conditions using a unified inference approach. This allows straightforward scalability should other magnetic sensors become available at a later stage. Sensors based on polarimetry17 and the Hanle effect18, for instance, are both planned for TAE’s C-2W device. Finally, the CT method provides uncertainty measures on all inferred outputs. This is useful information in its own right, and it is also of interest for advanced control applications, since the uncertainty can be factored into a robust control scheme19.

Results

Inference of Alfvenic transients in FRC

In the C-2U device3, two individual toroidal current rings are produced inductively (θ-pinch technique) in two opposing quartz formation sections placed at both ends of a stainless steel vacuum vessel (Fig. 2). These are produced simultaneously using pulsed power, fast magnetic field transients, and then accelerated out of their respective formation sections at supersonic speeds \(v_{z} \sim 300\) km/s. Collisions of both FRCs take place inside the confinement vessel near or at the mid-plane z = 0.

Fig. 2

MHD simulation of a typical FRC process in the C-2U device. Two FRCs are created in two quartz formation sections. They are accelerated towards each other to collide and merge into a single FRC inside a stainless steel confinement chamber. The collision transforms the kinetic energy of both moving FRCs into thermal energy of a single, static FRC. Magnetic field topology in the SOL (solid lines), closed field region (broken lines) and separatrix (thick solid line) are shown along with colour-coded electron density. DC Magnets (brown blocks), flux conserver structures (blue), fast switching magnets (dotted red line) and quartz tube boundaries (green horizontal lines) are also shown for completeness

The merging process occurs during the first few tens of μs of the discharge, transforming the kinetic energy of the two initial compact tori into thermal energy of a single, static FRC20. Neutral beam heating is then applied to this initial FRC to provide the necessary heating and current drive to sustain the discharge against thermal and resistive flux losses.

When the accelerations in the two formation sections differ slightly, the FRCs do not collide exactly in the middle of the confinement section, leading to a merged FRC with a residual velocity. The resulting FRC is bounced back and forth in the axially stabilizing external mirror field until its position is stabilized around the machine mid-plane. Analysis of these oscillations provides a way to test the compliance of the inferred forces and accelerations with Newton’s second law, using estimates of the plasma mass obtained by other diagnostics, as shown below.

A plasma discharge exhibiting Alfvenic plasma oscillations around the mid-plane is chosen for the study, as illustrated in Fig. 3. Contours of poloidal flux and forward prediction of the actual magnetic measurements are shown every 10 μs. The frequency of the oscillations is about 20 kHz.

Fig. 3

Inference of  Alfvenic oscillations. Poloidal flux structure at the start of the C-2U shot #49040. Left panels: Contour map of the poloidal flux and its evolution in 10 μs intervals. External (red squares) and internal (blue squares) magnetic sensor locations are shown along with the vacuum vessel contour (black) intersecting the flux contours. Right panels: Magnetic field predictions (black) superimposed with the corresponding external (red) and internal (blue) probe measurements (after the magnetic field offset at t = 0 is subtracted). The forward prediction of the measurements is so accurate that differences with actual measurements are barely distinguishable. The frequency of the oscillations around the mid-plane is approximately 20 kHz. The flux-conserving effect of the vessel on this fast scale is evident from the absence of magnetic field change on the external probes

The axial position of the o-point is shown in Fig. 4 with high time resolution (every 10 μs), along with other geometric descriptors and plasma variables related to those. The o-point position is strongly correlated with the vessel current imbalance \(I_{\mathrm{V}}^{\mathrm{\zeta }}\), defined as:

$$I_{\mathrm{V}}^{\mathrm{\zeta }} = I_{\mathrm{V}}^{{{z}} > 0} - I_{\mathrm{V}}^{{{z}} < 0}$$
(1)

where \(I_{\mathrm{V}}^{{{z}} > 0}\) is the net toroidal current flowing in the half of the vessel with z > 0, and \(I_{\mathrm{V}}^{{{z}} < 0}\) is the net toroidal current flowing in the other half of the vessel with z < 0. For a static plasma, the vessel current imbalance is zero. As the plasma moves back and forth, mid-plane antisymmetric current components are induced in the vacuum vessel, which eventually dissipate ohmically. These currents flow in the direction that opposes and slows down the plasma motion. As a result, a strong correlation between the o-point axial position and the vessel current imbalance exists, as shown.

Fig. 4

Inference of Alfvenic oscillations. Inferred plasma variables at the start of the C-2U shot #49040. Solid lines correspond to the expected values of the posterior distribution, and shaded areas are a measure of the uncertainty of the inferred variables corresponding to one standard deviation. a Separatrix radius Rs (blue) and o-point radius R0 times \(\sqrt 2\) (green). Rs is found to be proportional to R0, in agreement with Eq. (7). b Trapped flux (blue) and the trapped flux approximation for a long FRC (green). c Separatrix length (distance between x-points). d Total plasma current. e o-point z position. f Vessel current imbalance

The separatrix radius is found to be proportional to the o-point radius, in agreement with Eq. (7). The trapped flux ψ0 also approximately matches Eq. (9) for an elongated FRC. These approximations are not used in the inference process; they serve only as a consistency check of the final results against these limiting cases.

The total number of deuterons in the plasma is estimated from line integrated density measurements integrated over the excluded flux radius. Plasma mass is estimated to be \(m_{\mathrm{p}} = 1.3 \times 10^{-7}\) kg from the deuteron mass times the deuteron inventory. The acceleration \(\ddot z\) of the o-point can be determined from its position z (see Fig. 4). The net Lorentz force \(F_{z}\) exerted over the whole plasma current distribution can be determined from the inferred current distribution and derived magnetic field. It turns out that the product of the plasma mass and acceleration \(\ddot z\) is consistent with the inferred electromagnetic force

$$F_{{z}} = m_{\mathrm{p}}\ddot z$$
(2)

within one standard deviation, as illustrated in Fig. 5. So Newton’s second law is recovered from the inference results. The algorithm, however, is not very accurate during the first 50 μs or so of the discharge (right after formation/merging) presumably because the smoothing prior used cannot adequately describe the abrupt profiles resulting from shock waves or violations of other prior assumptions.
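The check of Eq. (2) can be illustrated with a short numerical sketch. This is not the analysis code used for the results above: the function name, the oscillation amplitude and the 1 μs sampling of the synthetic trajectory are assumptions made for the example, and an analytic restoring force stands in for the inferred Lorentz force.

```python
import numpy as np

def mass_times_acceleration(z, dt, mass):
    """Return m * d2z/dt2 at the interior samples, using central differences."""
    z = np.asarray(z, dtype=float)
    accel = (z[2:] - 2.0 * z[1:-1] + z[:-2]) / dt**2
    return mass * accel

m_p = 1.3e-7                  # plasma mass quoted in the text [kg]
freq = 20e3                   # oscillation frequency quoted in the text [Hz]
amp = 0.1                     # assumed oscillation amplitude [m]
dt = 1e-6                     # assumed sampling interval of the synthetic data [s]

t = np.arange(0.0, 200e-6, dt)
omega = 2.0 * np.pi * freq
z = amp * np.cos(omega * t)   # synthetic o-point trajectory

ma = mass_times_acceleration(z, dt, m_p)
# For a harmonic trajectory the force is -m * omega^2 * z; it plays the role
# of the inferred electromagnetic force F_z in this toy comparison.
f_reference = -m_p * omega**2 * z[1:-1]
max_rel_err = np.max(np.abs(ma - f_reference)) / np.max(np.abs(f_reference))
```

With real data sampled more coarsely, or with noisy positions, some smoothing before differentiation would probably be needed, since second differences amplify noise.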

Fig. 5

Inference of Alfvenic oscillations. a Hooke’s constant as obtained from the inferred magnetic configuration. b Estimated particle inventory as stored in the C-2U database. c Electromagnetic force (solid green), plasma mass times acceleration (red) and Hooke’s constant times axial plasma displacement (blue). The shaded area corresponds to one standard deviation

Another test of relevance is to check whether the axial forces are proportional to some measure of plasma position z

$$\frac{{\partial F_{{z}}}}{{\partial z}} \cong \frac{{F_{{z}}}}{z}$$
(3)

If Eq. (3) is valid for some axial range, a Hooke’s constant can be defined. For a rigid plasma current distribution subjected to an infinitesimal displacement, the Hooke’s constant can be evaluated from the plasma current distribution and the externally applied flux ψext (from magnets and FC currents) as an integral extending over the plasma domain Ω21

$$k_z = - \frac{{\partial F_z}}{{\partial z}} = 2\pi {\int} {\mathop {\int}\limits_{\varOmega } {j_\phi \frac{{\partial ^2\psi _{{\rm ext}}}}{{\partial z^2}}{\rm d}r{\rm d}z} }$$
(4)

Note that when taking derivatives the flux created by the plasma does not change with z, as the plasma is considered a rigid object; only the external flux does change due to the relative motion.
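A discrete version of Eq. (4) can be sketched as follows, assuming that the toroidal current density and the external flux are available on a uniform (r, z) grid. The function name, the grid and the test profiles are hypothetical, and the sign convention simply follows Eq. (4) as written.

```python
import numpy as np

def hooke_constant(j_phi, psi_ext, r, z):
    """Evaluate Eq. (4) on a uniform grid; arrays have shape (len(r), len(z))."""
    dr = r[1] - r[0]
    dz = z[1] - z[0]
    # Second axial derivative of the external flux by repeated finite differences.
    dpsi_dz = np.gradient(psi_ext, dz, axis=1, edge_order=2)
    d2psi_dz2 = np.gradient(dpsi_dz, dz, axis=1, edge_order=2)
    # Trapezoidal weights in both directions.
    wr = np.ones(r.size)
    wr[[0, -1]] = 0.5
    wz = np.ones(z.size)
    wz[[0, -1]] = 0.5
    integrand = j_phi * d2psi_dz2
    return 2.0 * np.pi * np.sum(wr[:, None] * wz[None, :] * integrand) * dr * dz

# Toy configuration (assumed values): uniform current density in a flux that is
# quadratic in z, so that d2(psi_ext)/dz2 = a everywhere.
r = np.linspace(0.1, 0.5, 21)
z = np.linspace(-1.0, 1.0, 41)
a, j0 = 0.02, 1.0e5           # [Wb/m^2], [A/m^2]
psi_ext = 0.5 * a * np.tile(z**2, (r.size, 1))
j_phi = np.full((r.size, z.size), j0)
k_z = hooke_constant(j_phi, psi_ext, r, z)
```

For this toy case the integral is known in closed form (2π j0 a times the area of the domain), which gives a simple way to validate the discretization before applying it to inferred profiles.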

A positive Hooke’s constant corresponds to a magnetic configuration that is axially stable, and vice versa. The evolution of Hooke’s constant and the axial force exerted on the plasma as a result of its axial displacement are shown in Fig. 5. Axial force and displacement are linearly dependent in a range of ±1 m around the mid-plane, so plasma dynamics can be approximated by a linear partial differential equation in this range. This is of interest for future plasma control goals, since control theory and practice are well established for linear systems22.

The Hooke’s constant is positive due to the axially stabilizing external field and is therefore consistent with an axially stable magnetic configuration in which the plasma settles at the mid-plane after a few oscillations, as observed. The inferred value of about 1000 N/m is in agreement with the results obtained using the Lamy Ridge code23. The inference method can thus also provide the axial stability properties of the magnetic configuration. This is important information for plasma control of future devices, which will need to establish and sustain an axially unstable plasma in equilibrium around the mid-plane z = 0 (ref. 24). A method to determine the axial stability properties of the magnetic configuration will therefore be required.
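As a rough order-of-magnitude cross-check, the quoted Hooke’s constant and plasma mass can be combined under the assumption (ours, for illustration) that the plasma behaves as a rigid, undamped harmonic oscillator with frequency \(f = \sqrt{k_z/m_{\mathrm{p}}}/2\pi\):

```python
import math

k_z = 1000.0    # inferred Hooke's constant quoted in the text [N/m]
m_p = 1.3e-7    # plasma mass estimate quoted in the text [kg]

# Natural frequency of an undamped mass on a linear spring.
f_osc = math.sqrt(k_z / m_p) / (2.0 * math.pi)
print(f"estimated oscillation frequency: {f_osc / 1e3:.0f} kHz")
```

The estimate comes out at roughly 14 kHz, the same order of magnitude as the approximately 20 kHz oscillation seen in Fig. 3. Closer agreement is not expected from such a simple model, since both input values are approximate and damping by the vessel currents is ignored.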

Comparison with plasma imaging

High-speed imaging of visible plasma emission is an independent technique that can yield information about the plasma dimensions. In this study, qualitative agreement between visible light emission from intrinsic oxygen impurity ions and the dynamics of the inferred poloidal flux contours serves as additional validation of the proposed inference method. Photons emitted from the 3d→3p transition (at 650.0 nm) of O4+ were measured using a filtered high-speed camera with a radial view of the plasma25.

Emissivity of this spectral line was reconstructed (assuming axisymmetry) using the Simultaneous Algebraic Reconstruction Technique26. The core FRC electron temperature and density are more than sufficient to ionize the O4+ charge state and populate higher charge states via electron impact excitation; therefore, minimal emission from this spectral line is found in the core. Instead, emission is peaked in the SOL where the electron temperature and density are lower and diffusive transport from boundary sources competes with ionization to higher charge states.

An example comparing the results of the magnetic inference method with the emissivity reconstruction for this spectral line is shown in Fig. 6. A relatively high-density plasma discharge (#48269) was chosen so that the emission measurement had good signal. Good agreement in the temporal dynamics of the reconstructed poloidal flux and emissivity is observed. This agreement provides further validation of the proposed inference method and is all the more encouraging since the two reconstructed quantities are derived from independent measurements (magnetics vs. photons) and analysis procedures.

Fig. 6

FRC poloidal flux contours. Flux contours (cyan) superimposed with surfaces of equal emissivity from the 3d→3p oxygen 4+ transition (grey scale). Maximum emissivity is shown in white, while no emissivity is represented by black. The flux contour levels corresponding with 0 Wb (separatrix) and 0.01 Wb are labelled to show how flux expands in the open field region as the discharge evolves. The region of peak emissivity approximately tracks the flux expansion

The overall consistency of inferred results (Fig. 7) is also good, with the following highlights: (a) \(R_{\mathrm{s}} = \sqrt 2 R_0\) is a very good approximation, within one sigma. (b) The long FRC trapped flux (Eq. (9)) \(\psi _0 = B_{\mathrm{w}}R_{\mathrm{s}}^3{\mathrm{/}}R_{\mathrm{w}}\) is also a very good approximation, with its overall magnitude in agreement with similar results obtained with hybrid and Grad–Shafranov equilibrium codes27, 28. (c) The magnitude of field reversal on axis is very significant and consistent with the radial force balance prediction (Eq. (10)). (d) The approximation (14) does not reproduce the inferred FRC length well. (e) There is a correlation between FRC length and plasma current, as expected from Eq. (11). However, the approximation (11) does not reproduce the inferred results well, partly because it does not account for current flowing outside the separatrix. (f) Vessel current decays in about 5 ms, comparable with the characteristic time over which the FRC is passively stabilized.

Fig. 7

Inference of plasma magnetic related variables. Solid lines correspond to mean values of the posterior distribution, and shaded areas are a measure of the uncertainty of the inferred variables corresponding to one standard deviation for shot #48269. a Separatrix radius Rs (blue) and o-point radius R0 times \(\sqrt 2\) (green). b Trapped flux (blue) and the trapped flux approximation for a long FRC (green). c Machine axis Bz at the o-point z position (blue) and comparison with the long FRC approximation (green). d Separatrix length (blue) compared with the approximation from excluded flux (green). e Total plasma current (blue) and comparison with the long FRC approximation (green). f Total vessel current

Discussion

We have used the CT method to provide a direct inference of the internal FRC magnetic topology, both during steady state and during fast Alfvenic transients. The viability of the approach has been verified in a number of ways, including comparison with approximate results from the long FRC approximation, recovery of a force-balance dynamic equation, and comparison with imaging of visible plasma emission.

All current sources have been modelled as GPs and inferred from external magnetic measurements using Bayesian analysis. Smoothing priors (for plasma current and vessel current distributions) and a flux-conserving prior derived from Lenz’s law (for the magnet currents) have been used in the inference. From all the inferred current sources, FRC topology and dynamic properties have been obtained. This includes the separatrix geometry and the axial stability properties of the magnetic configuration, among others.

When GP priors are used, and linear relationships among current sources and measurements can be established, the CT solution involves non-iterative matrix operations and is then ideally suited for deterministic real-time applications. Because no equilibrium assumptions are used in this case, inference of plasma topology and dynamics up to Alfvenic frequencies then becomes possible. The FRC topology and dynamics have been determined during Alfvenic oscillations, with excellent self-consistency of results.

Methods

FRC approximations

The inference results for the experimental data presented have been compared with first-order approximations for FRC parameters, which are summarized below. These are valid for an elongated FRC inside an FC of constant radius Rw with negligible field line curvature at the mid-plane, termed the long FRC approximation.

The radial pressure balance condition relates the magnetic field component in the axial direction right beneath the inner vessel walls at the o-point plane Bw with the average kinetic pressure of the plasma:

$$P\left( r \right) \cong \frac{{B_{\mathrm{w}}^2 - B^2\left( r \right)}}{{2\mu _0}}.$$
(5)

So the average kinetic pressure of the plasma at the o-point (where the magnetic pressure is null) must necessarily be equal to the magnetic pressure at the confinement vessel walls

$$P\left( {0 - {\mathrm{point}}} \right) = \frac{{B_{\mathrm{w}}^2}}{{2\mu _0}}.$$
(6)

When the plasma pressure is a flux function, and Eq. (5) is fulfilled, then the o-point radius is proportional to the separatrix radius (ref. 1):

$$R_0 \cong \frac{{R_{\mathrm{s}}}}{{\sqrt 2 }}.$$
(7)

In addition to the former, the plasma is in axial force balance, and the maximum beta achievable by an ideal FRC surrounded by a perfect FC of constant radius Rw is given by the Barnes’ average β condition29,

$$\beta \cong 1 - 0.5\left( {\frac{{R_{\mathrm{s}}}}{{R_{\mathrm{w}}}}} \right)^2$$
(8)

which depends solely on the ratio of separatrix radius Rs to FC wall radius Rw.

When both axial and radial pressure balance are fulfilled, the flux at the o-point (trapped flux) is given by

$$\psi _0 \cong \frac{{B_{\mathrm{w}}R_{\mathrm{s}}^3}}{{R_{\mathrm{w}}}}.$$
(9)

The plane perpendicular to the machine axis that contains the o-point is termed the o-point plane. The intersection of the o-point plane with the machine axis determines the point where the axial component of the magnetic field is at its minimum. From Eqs. (5) and (9), the magnitude of the field reversal Bax at this point is

$$B_{{\mathrm{ax}}} \cong - \frac{{R_{\mathrm{s}}}}{{R_{\mathrm{w}}}}B_{\mathrm{w}}$$
(10)

From Ampere’s law and Eq. (10), the total plasma current in a very elongated FRC of length L can be approximated by

$$I \cong \frac{{\left( {B_{\mathrm{w}} - B_{{\mathrm{ax}}}} \right)L}}{{\mu _0}} \cong \frac{{B_{\mathrm{w}}L}}{{\mu _0}}\left( {1 + \frac{{R_{\mathrm{s}}}}{{R_{\mathrm{w}}}}} \right)$$
(11)
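For quick comparisons, Eqs. (7)–(11) can be collected into a small helper. The sketch below is only a convenience wrapper around the formulas above; the function name and the numerical inputs are illustrative assumptions and are not C-2U parameters.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability [H/m]

def long_frc_estimates(b_w, r_s, r_w, length):
    """Long FRC approximations from wall field b_w [T], radii [m] and length [m]."""
    r0 = r_s / math.sqrt(2.0)                  # o-point radius, Eq. (7)
    beta = 1.0 - 0.5 * (r_s / r_w) ** 2        # average beta, Eq. (8)
    psi0 = b_w * r_s**3 / r_w                  # trapped flux, Eq. (9)
    b_ax = -(r_s / r_w) * b_w                  # field reversal on axis, Eq. (10)
    current = (b_w - b_ax) * length / MU0      # total plasma current, Eq. (11)
    return {"r0": r0, "beta": beta, "psi0": psi0, "b_ax": b_ax, "current": current}

# Illustrative inputs only (assumed values).
est = long_frc_estimates(b_w=0.1, r_s=0.35, r_w=0.7, length=2.0)
```

With these inputs the separatrix-to-wall radius ratio is 0.5, so the average beta is 0.875 and the on-axis field is reversed to half the wall field.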

The plasma elongation is defined as

$$E = \frac{L}{{2R_{\mathrm{s}}}}.$$
(12)

A common approximation to separatrix radius and length comes from the excluded flux radius axial profile, which can be derived directly from magnetic sensors30, as explained below.

If \(\psi ^{\mathrm{i}},B_{{z}}^{\mathrm{i}}\) are the flux and field profiles determined at positions zi along the internal wall of the vacuum vessel with radius r = Rw, the excluded flux radius profile is defined as

$$R_{\mathrm{\psi }}^{\mathrm{i}} = R_{\mathrm{w}}\sqrt {1 - \frac{{\psi ^{\mathrm{i}}}}{{\pi R_{\mathrm{w}}^2B_{{z}}^{\mathrm{i}}}}}.$$
(13)

The position of the plasma mid-plane can then be estimated from the position where the excluded flux radius has its maximum.

A first-order approximation for the FRC separatrix radius is to consider it equal to the excluded flux radius at the mid-plane. A first-order approximation for the FRC x-point position is taken as the point along the axis Z2/3 where the excluded flux radius has fallen to 2/3 of its value at the mid-plane28. The FRC length is then approximated by

$$L = 2Z_{2{\mathrm{/}}3}.$$
(14)
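The excluded flux estimates of Eqs. (13) and (14) can be sketched as follows. The function names are hypothetical, the synthetic parabolic profile is an assumption used only to exercise the code, and two implementation choices are ours: negative arguments of the square root are clipped to zero, and \(Z_{2/3}\) is measured from the estimated mid-plane and located by linear interpolation on the z > mid-plane side.

```python
import numpy as np

def excluded_flux_radius(psi, b_z, r_w):
    """Eq. (13): excluded flux radius from wall flux psi and wall field b_z."""
    ratio = np.asarray(psi, dtype=float) / (np.pi * r_w**2 * np.asarray(b_z, dtype=float))
    return r_w * np.sqrt(np.clip(1.0 - ratio, 0.0, None))

def frc_length_from_excluded_flux(z, r_psi):
    """Return (mid-plane position, separatrix radius estimate, length from Eq. (14))."""
    z = np.asarray(z, dtype=float)
    r_psi = np.asarray(r_psi, dtype=float)
    i_mid = int(np.argmax(r_psi))              # mid-plane: maximum excluded flux radius
    target = (2.0 / 3.0) * r_psi[i_mid]
    # Walk from the mid-plane towards +z until the profile drops below the target.
    for i in range(i_mid, z.size - 1):
        if r_psi[i] >= target > r_psi[i + 1]:
            frac = (r_psi[i] - target) / (r_psi[i] - r_psi[i + 1])
            z_23 = z[i] + frac * (z[i + 1] - z[i]) - z[i_mid]
            return z[i_mid], r_psi[i_mid], 2.0 * z_23
    raise ValueError("excluded flux radius never falls to 2/3 of its maximum")

# Synthetic profile (assumed): parabolic excluded flux radius centred at z = 0.
z = np.linspace(-2.0, 2.0, 401)
r_psi = np.clip(0.3 * (1.0 - (z / 1.5) ** 2), 0.0, None)
z_mid, r_s_est, length_est = frc_length_from_excluded_flux(z, r_psi)
```

For a strongly asymmetric profile, locating the 2/3 crossing on both sides of the mid-plane and taking their separation may be a more robust choice.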

Bayesian inference of GPs

Bayesian inference is used in this paper to calculate the posterior distribution of currents given the magnetic measurements. The method, however, is generic enough to be used in a variety of related tomographic problems, which can be stated as follows.

Given a forward model D = H(X) relating a continuous variable X(r), a function of location r = (r1,r2,r3), to a discrete set of measurements in the data vector D, the objective is to obtain all the solutions for X(r) that can explain the measurements in D. These are arranged into a probability distribution p(X|D) termed the posterior. A likelihood probability distribution p(D|X) measures the misfit between the model predictions H(X) and the measurements D. The probability distribution p(X) of the spatial variable prior to taking any measurements is termed the prior probability distribution. According to Bayes’ theorem14, the posterior can be obtained from the likelihood distribution and the prior as

$$p\left( {X{\mathrm{|}}{\mathbf{D}}} \right) = \frac{{p\left( {{\mathbf{D}}{\mathrm{|}}X} \right)p(X)}}{{p({\mathbf{D}})}}.$$
(15)

The term in the denominator p(D) is called the evidence (or marginal likelihood) and normalizes the volume of the posterior distribution to 1.

$$p\left( {\mathbf{D}} \right) = \textstyle\int p\left( {{\mathbf{D}}{\mathrm{|}}X} \right)p(X)\mathrm{d}X.$$
(16)

Given prior and likelihood, the most likely solution is the maximum a posteriori (MAP) estimate, the solution in the posterior with the highest probability.

In the particular case where the forward model is linear, the spatial variable X(r) can always be discretized on a fine grid of dimension k, and a matrix \({\mathbf{K}} \in {\Bbb R}^{n \times k}\) can be used to relate the discretized variable \({\mathbf{X}} \in {\Bbb R}^k\) with a set of n measurements in \({\mathbf{D}} \in {\Bbb R}^n\)

$${\mathbf{D}} = {\mathbf{KX}} + {\mathbf{\varepsilon }}$$
(17)

Assuming additive Gaussian measurement noise ε ~ N(0, ΣD) independent of X, the likelihood function can be modelled by an n-dimensional Gaussian distribution

$$p\left( {{\mathbf{D}}{\mathrm{|}}{\mathbf{X}}} \right) = \frac{1}{{(2\pi )^{n/2}|{\mathbf{\Sigma }}_{\mathrm{D}}|^{1/2}}}{\mathrm{exp}}\left( { - \frac{1}{2}\left( {{\mathbf{D}} - {\mathbf{K}}{\mathbf{X}}} \right)^{\mathrm{T}}{\mathbf{\Sigma }}_{\mathrm{D}}^{ - 1}\left( {{\mathbf{D}} - {\mathbf{K}}{\mathbf{X}}} \right)} \right)$$
(18)

where \({\mathbf{\Sigma }}_{\mathrm{D}} \in {\Bbb R}^{n \times n}\) is the data covariance matrix.

The prior distribution can also be approximated by a multivariate Gaussian probability distribution over \({\mathbf{X}}\)

$$p\left( {\mathbf{X}} \right) = \frac{1}{{(2\pi )^{k{\mathrm{/}}2}|{\mathbf{\Sigma }}_{\mathrm{X}}|^{1{\mathrm{/}}2}}}{\mathrm{exp}}\left( { - \frac{1}{2}\left( {{\mathbf{X}} - {\mathbf{\mu }}_{\mathrm{X}}} \right)^{\mathrm{T}}{\mathbf{\Sigma }}_{\mathrm{X}}^{ - 1}\left( {{\mathbf{X}} - {\mathbf{\mu }}_{\mathrm{X}}} \right)} \right)$$
(19)

where \({\mathbf{\Sigma }}_{\mathrm{X}} \in {\Bbb R}^{k \times k}\) is the prior covariance kernel and \({\mathbf{\mu }}_{\mathrm{X}} \in {\Bbb R}^k\) is the prior mean. It is usually convenient (but by no means necessary) to consider a zero mean μX = 0 on the prior.

The posterior distribution can likewise be approximated by a k-dimensional Gaussian probability distribution.

$$p\left( {{\mathbf{X}}|{\mathbf{D}}} \right) = \frac{1}{{(2\pi )^{k/2}|{\mathbf{\Sigma }}|^{1/2}}}{\mathrm{exp}}\left( { - \frac{1}{2}\left( {{\mathbf{X}} - {\mathbf{\mu }}} \right)^{\mathrm{T}}{\mathbf{\Sigma }}^{ - 1}\left( {{\mathbf{X}} - {\mathbf{\mu }}} \right)} \right).$$
(20)

Since all the probability distributions involved are Gaussian, and Gaussian distributions remain Gaussian under linear operations, the posterior distribution can be obtained analytically. For a zero prior mean (μX = 0), the posterior mean (MAP estimate) and covariance are given by11:

$${\mathbf{\Sigma }} = \left( {{\mathbf{K}}^{\mathrm{T}}{\mathbf{\Sigma }}_{\mathrm{D}}^{ - 1}{\mathbf{K}} + {\mathbf{\Sigma }}_{\mathrm{X}}^{ - 1}} \right)^{ - 1},$$
(21)
$${\mathbf{\mu }} = {\mathbf{\Sigma K}}^{\mathrm{T}}{\mathbf{\Sigma }}_{\mathrm{D}}^{ - 1}{\mathbf{D}}.$$
(22)
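Eqs. (21) and (22) translate directly into a few matrix operations. The sketch below is a minimal illustration on a random toy problem, not the C-2U or C-2W implementation; the function name, problem sizes and noise levels are assumptions. It separates the part that depends only on the model (which can be precomputed) from the single matrix-vector product needed for each new data vector, which is the property that makes the method non-iterative.

```python
import numpy as np

def gaussian_posterior(K, sigma_d, sigma_x):
    """Return (gain, cov) for a zero-mean prior: cov is Eq. (21), gain @ D is Eq. (22)."""
    sd_inv = np.linalg.inv(sigma_d)
    cov = np.linalg.inv(K.T @ sd_inv @ K + np.linalg.inv(sigma_x))   # Eq. (21)
    gain = cov @ K.T @ sd_inv                                        # maps D to the posterior mean
    return gain, cov

# Toy problem (assumed sizes and values): n = 5 sensors, k = 8 current elements.
rng = np.random.default_rng(0)
n, k = 5, 8
K = rng.normal(size=(n, k))             # stand-in for the response matrix
sigma_d = 1e-2 * np.eye(n)              # measurement noise covariance
sigma_x = np.eye(k)                     # prior covariance
x_true = rng.normal(size=k)
D = K @ x_true + rng.normal(scale=0.1, size=n)

gain, cov = gaussian_posterior(K, sigma_d, sigma_x)   # computed once, ahead of time
mu = gain @ D                                         # per-sample step: one matrix-vector product
sigma_post = np.sqrt(np.diag(cov))                    # one-standard-deviation uncertainties
```

Explicit inverses are used here only to mirror the equations; for larger or poorly conditioned problems, a Cholesky-based solve would usually be preferable.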

As the dimension k of the multivariate normal distribution is made increasingly large, the multivariate normal distribution approaches a continuous distribution, and at this limit a GP is obtained15. In our case, the vector X becomes a continuous function X(r) of the spatial location. All possible solutions for X(r) can then be thought of as being generated by a stochastic process, described by the corresponding GP.

For a large number of situations in plasma physics, transport processes act to reduce the spatial gradients of X(r). In other words, our prior belief about X(r) is that it must be a smooth function of r. The prior covariance kernel required for the inference can then be parameterized using the squared exponential (SE) function, which is one of many options available15 to model the spatial correlations between the values of a smooth profile variable at two points r and r':

$${\mathbf{\Sigma }}_{\mathrm{X}}\left( {{\mathbf{r}},{\mathbf{r}}\prime } \right) = {\mathrm{\sigma }}^2{\mathrm{exp}}\left( { - \frac{1}{2}\left( {{\mathbf{r}} - {\mathbf{r}}\prime } \right)^{\mathrm{T}}{\mathbf{\Lambda }}^{ - 1}\left( {{\mathbf{r}} - {\mathbf{r}}\prime } \right)} \right)$$
(23)

with \({\mathbf{\Lambda }} = {\mathrm{diag}}\left( {\lambda _1^2,\lambda _2^2,\lambda _3^2} \right)\).

In the Bayesian context, σ and \(\lambda _i\) are termed the prior hyper-parameters. The standard deviation σ controls the spread of values of X. The scale length \(\lambda _i\) determines how quickly the plasma variable can change with the coordinate \(r_i\). A large length scale gives a large covariance between the values of the variable X at different \(r_i\) coordinates, so the prior probability (Eq. (19)) for large differences between the values of the plasma variable X at neighbouring positions \({\mathrm{r}}_i,{\mathrm{r}}\prime_i\) will be low. In other words, if the plasma profiles are smooth, the corresponding scale lengths will be large, and vice versa.
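The SE kernel of Eq. (23) can be evaluated for two sets of points as sketched below. The function name and the example coordinates are assumptions made for illustration; the length scales enter through the diagonal matrix Λ.

```python
import numpy as np

def se_kernel(points_a, points_b, sigma, length_scales):
    """Squared exponential covariance, Eq. (23), with Lambda = diag(length_scales**2)."""
    scales = np.asarray(length_scales, dtype=float)
    a = np.asarray(points_a, dtype=float) / scales
    b = np.asarray(points_b, dtype=float) / scales
    diff = a[:, None, :] - b[None, :, :]            # pairwise scaled separations
    return sigma**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))

# Three points in an (r, z) plane with assumed length scales of 0.1 m and 0.5 m.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.5]])
cov = se_kernel(pts, pts, sigma=2.0, length_scales=[0.1, 0.5])
```

Points separated by exactly one length scale along a single coordinate have covariance \(\sigma^2 e^{-1/2}\), so increasing a length scale raises the correlation between neighbouring points, as described above.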

The SE kernel is by no means the only choice for a prior covariance kernel. A good review of GPs and the most common covariance kernels used can be found in ref. 15. In general, the prior covariance kernel will have a set of hyper-parameters, arranged in a vector θ for simplicity.

Determination of the prior hyper-parameters can be considered as a continuous model selection problem, where the more likely hyper-parameters are obtained directly from the data31.

The posterior for the hyper-parameters is p(θ|D), which from Bayes theorem is

$$p\left( {{\mathbf{\theta }}{\mathrm{|}}{\mathbf{D}}} \right) = \frac{{p\left( {{\mathbf{D}}|{\mathbf{\theta }}} \right)p({\mathbf{\theta }})}}{{p({\mathbf{D}})}}$$
(24)

where p(D|θ) is the likelihood and p(θ) is the hyper-prior (prior for the hyper-parameters).

In Bayesian model selection, the optimum set of hyper-parameters θopt is selected to maximize this probability.

$${\mathbf{\theta }}_{{\mathrm{opt}}} = {\mathrm{arg}}_{\mathbf{\theta }}{\mathrm{max}}\left( {p\left( {{\mathbf{\theta }}{\mathrm{|}}{\mathbf{D}}} \right)} \right)$$
(25)

The prior over the hyper-parameters p(θ) in Eq. (24) is usually taken to be flat, since there is no indication of what the best hyper-parameters are before seeing the data. In this case, maximizing the posterior is equivalent to maximizing the likelihood of the data with respect to the hyper-parameters, so the optimal set is

$${\mathbf{\theta }}_{{\mathrm{opt}}} = {\mathrm{arg}}_{\mathbf{\theta }}{\mathrm{max}}\left( {p\left( {{\mathbf{\theta }}{\mathrm{|}}{\mathbf{D}}} \right)} \right) = {\mathrm{arg}}_{\mathbf{\theta }}{\mathrm{max}}\left( {p\left( {{\mathbf{D}}|{\mathbf{\theta }}} \right)} \right)$$
(26)

Given a set of hyper-parameters θ, there is an infinite class of plasma profiles X(r) that can be generated by the corresponding prior covariance p(X|θ) through the corresponding GP. The quality of the data fit must be evaluated not just for one particular solution but for all the solutions that can be obtained for a given set of hyper-parameters. The likelihood should be integrated out (marginalized) with respect to all these possible profiles generated by a single set of hyper-parameters, so it becomes a marginal likelihood.

$${\mathbf{\theta }}_{{\mathrm{opt}}} = {\mathrm{arg}}_{\mathbf{\theta }}{\mathrm{max}}\left( {p\left( {{\mathbf{D}}|{\mathbf{\theta }}} \right)} \right) = {\mathrm{arg}}_{\mathbf{\theta }}{\mathrm{max}}\left({ {\textstyle\int} p\left( {{\mathbf{D}}|X,{\mathbf{\theta }}} \right)p\left( {X|{\mathbf{\theta }}} \right)\mathrm{d}X} \right)$$
(27)

In the particular case at hand where p(X) is a GP, the likelihood is normal and the model linear, the marginal likelihood can be calculated analytically. The expression for its logarithm is16

$$L = - \frac{1}{2}{\mathrm{log}}\left| {{\mathbf{K\Sigma }}_{\mathrm{X}}{\mathbf{K}}^{\mathrm{T}} + {\mathbf{\Sigma }}_{\mathrm{D}}} \right| - \frac{1}{2}\,{\mathbf{D}}^{\mathrm{T}}\left( {{\mathbf{K\Sigma }}_{\mathrm{X}}{\mathbf{K}}^{\mathrm{T}} + {\mathbf{\Sigma }}_{\mathrm{D}}} \right)^{ - 1}{\mathbf{D}} - \frac{n}{2}{\mathrm{log}}\left( {2\pi } \right).$$
(28)
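The form of Eq. (28) can be recovered in a few lines. The sketch below assumes a zero-mean prior (as the quadratic term of Eq. (28) suggests) and a linear forward model with additive Gaussian noise; the noise symbol ε is introduced here for illustration only.

$${\mathbf{D}} = {\mathbf{K}}X + {\boldsymbol{\varepsilon }},\quad X \sim {\mathcal{N}}\left( {{\mathbf{0}},{\mathbf{\Sigma }}_{\mathrm{X}}} \right),\quad {\boldsymbol{\varepsilon }} \sim {\mathcal{N}}\left( {{\mathbf{0}},{\mathbf{\Sigma }}_{\mathrm{D}}} \right)$$

Since D is a linear combination of independent Gaussian variables, it is itself Gaussian with zero mean and covariance

$${\mathrm{cov}}\left( {\mathbf{D}} \right) = {\mathbf{K\Sigma }}_{\mathrm{X}}{\mathbf{K}}^{\mathrm{T}} + {\mathbf{\Sigma }}_{\mathrm{D}}$$

so the integral in Eq. (27) evaluates to

$$p\left( {{\mathbf{D}}|{\mathbf{\theta }}} \right) = {\mathcal{N}}\left( {{\mathbf{D}};{\mathbf{0}},{\mathbf{K\Sigma }}_{\mathrm{X}}{\mathbf{K}}^{\mathrm{T}} + {\mathbf{\Sigma }}_{\mathrm{D}}} \right)$$

whose logarithm is Eq. (28), with n the dimension of D.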

For any given prior kernel, the maximum of expression (28) with respect to the hyper-parameters gives the optimal set of hyper-parameters to explain D. Here n is the number of measurements in D.
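As an illustration of how Eq. (28) might be evaluated and maximized in practice, the following Python sketch computes the log marginal likelihood through a Cholesky factor and scans a single hyper-parameter (the SE length scale) on a grid. The sensor layout, noise level, grid and all function names are assumptions made for this example, not the C-2U set-up, and a grid search stands in for whatever optimizer is actually used.

```python
import numpy as np

def se_kernel(r, sigma_f, length):
    """Squared-exponential (SE) prior covariance evaluated on the points r."""
    d = r[:, None] - r[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / length) ** 2)

def log_marginal_likelihood(D, K, Sigma_X, Sigma_D):
    """Evaluate Eq. (28) through a Cholesky factor of K Sigma_X K^T + Sigma_D."""
    C = K @ Sigma_X @ K.T + Sigma_D
    chol = np.linalg.cholesky(C)                 # C = chol @ chol.T
    w = np.linalg.solve(chol, D)                 # w @ w equals D^T C^{-1} D
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * log_det - 0.5 * (w @ w) - 0.5 * D.size * np.log(2.0 * np.pi)

# Synthetic test problem (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
r = np.linspace(0.0, 1.0, 40)                    # grid on which the profile X(r) lives
centres = np.linspace(0.05, 0.95, 15)            # hypothetical sensor positions
K = np.exp(-0.5 * ((centres[:, None] - r[None, :]) / 0.05) ** 2)
K /= K.sum(axis=1, keepdims=True)                # each datum is a local average of X
noise = 0.05
Sigma_D = noise**2 * np.eye(len(centres))

# Draw one "true" profile from an SE prior and simulate noisy data from it.
Sigma_X_true = se_kernel(r, 1.0, 0.2)
jitter = 1e-8 * np.eye(len(r))                   # keeps the Cholesky factor well defined
X_true = np.linalg.cholesky(Sigma_X_true + jitter) @ rng.standard_normal(len(r))
D = K @ X_true + noise * rng.standard_normal(len(centres))

# Grid search over the length scale, with the amplitude held fixed.
lengths = np.linspace(0.05, 0.6, 23)
L_values = np.array([
    log_marginal_likelihood(D, K, se_kernel(r, 1.0, length), Sigma_D)
    for length in lengths
])
best_length = lengths[np.argmax(L_values)]
print(f"length scale maximizing Eq. (28): {best_length:.3f}")
```

The Cholesky route avoids forming an explicit inverse or determinant, which tends to be both faster and better conditioned than a literal transcription of Eq. (28). With a single noisy realization the maximizing length scale need not coincide with the one used to generate the data.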

Inference model for the C-2U device

The C-2U magnetic model used for the analysis comprises a total of 42 magnets, 31 vacuum vessel (passive) segments and a current distribution made of 734 discrete plasma current elements modelled as block coils (Fig. 8). Of special relevance are 8 equilibrium (EQ) magnets in the confinement vessel and 6 FC magnets, which can be used as passive FCs or be connected to power supplies.

Fig. 8

C-2U magnetostatic model. Toroidal plasma current is represented by 734 discrete plasma current elements modelled as block coils (black grid inside the vessel contour in blue). The vessel is modelled using 31 flat block coil elements (not shown). Insulating quartz tubes (formation sections) are shown in red. All the magnets in the confinement and formation sections are considered. Pulsed-power fast-switching coils for plasma formation and acceleration (not shown) are located just outside the quartz tubes. Magnetic probes inside and outside the confinement vessel are shown as red circles

The magnetic measurement system on C-2U32, 33 comprises a set of 19 magnetic pick-up probes placed inside the confining vessel and 8 external pick-ups located right underneath the 8 EQ magnets (Fig. 8). There are also Rogowski-based current measurements for all the FC magnet currents IFC and for some of the EQ magnet currents IEQ. For the rest of the magnets, only the set point used for their control is known.

The inference problem at hand requires finding the most likely solution for the elements of the total plasma current distribution, arranged in a vector IP, along with the most likely solution for the currents induced in the confining vessel IV and the currents in all the magnets IM. A diagram illustrating the magnet locations and the grid used for the current distribution is shown in Fig. 8.

All the currents to be inferred are arranged into a single current vector

$${\mathbf{I}} = \left\{ {{\mathbf{I}}_{\mathrm{P}},{\mathbf{I}}_{\mathrm{V}},{\mathbf{I}}_{\mathrm{M}}} \right\}.$$
(29)

All current sources are modelled as GPs as described earlier. The information available to perform the inference comes from (i) set points for all IM, (ii) current measurements for IFC and a few IEQ, (iii) measurements of magnetic field at several locations outside the plasma region, both inside and outside the confining vessel, (iv) null boundary conditions for the plasma current distribution, and (v) null boundary conditions for the flux change, \(\frac{{\partial \psi }}{{\partial t}} \cong 0\), underneath the equilibrium magnets, which behave as perfect FCs on the timescale of the discharge.

The boundary conditions (iv) and (v) are built directly into the prior, to obtain solutions where the plasma current distribution drops to zero at the domain boundary and the flux is conserved at the magnet locations (flux-conserving prior).
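One generic way to build a null boundary condition into a Gaussian prior is to condition the prior covariance on an exact linear constraint C X = 0, which gives Σ' = Σ − Σ Cᵀ (C Σ Cᵀ)⁻¹ C Σ. The Python sketch below applies this to a one-dimensional SE prior whose end points are forced to zero. It is offered only as an assumption about how a condition of type (iv) could be imposed, not as the construction used for C-2U; the constraint matrix `C`, the grid and the function names are hypothetical.

```python
import numpy as np

def se_kernel(r, sigma_f, length):
    """Squared-exponential prior covariance evaluated on the points r."""
    d = r[:, None] - r[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / length) ** 2)

def constrain_prior(Sigma, C):
    """Condition a zero-mean Gaussian prior N(0, Sigma) on the exact constraint C X = 0."""
    S = C @ Sigma @ C.T                          # covariance of the constrained quantities
    gain = np.linalg.solve(S, C @ Sigma).T       # Sigma C^T S^{-1} (S and Sigma symmetric)
    return Sigma - gain @ C @ Sigma

# Illustrative 1D domain: the profile must vanish at both ends of the grid.
r = np.linspace(0.0, 1.0, 50)
Sigma = se_kernel(r, 1.0, 0.15)
C = np.zeros((2, len(r)))
C[0, 0] = 1.0                                    # selects X at the first grid point
C[1, -1] = 1.0                                   # selects X at the last grid point

Sigma_bc = constrain_prior(Sigma, C)
print("prior variance at the two boundaries:", Sigma_bc[0, 0], Sigma_bc[-1, -1])
print("prior variance at mid-domain:", Sigma_bc[len(r) // 2, len(r) // 2])
```

Every sample drawn from the constrained prior vanishes at the boundary, while the interior keeps nearly its original variance. A constraint that fixes a linear function of the currents to a non-zero value would also shift the prior mean, which this sketch omits.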

From the inferred currents in I, it is then straightforward to calculate the poloidal flux and magnetic field components on the domain grid using the matrix representations M, GR and GZ of the Biot–Savart operator34:

$$\begin{array}{*{20}{c}} {\mathbf{\psi }} & = & {{\mathbf{M}}\,{\mathbf{I}}}; \\ {{\mathbf{B}}_{\mathrm{R}}} & = & {{\mathbf{G}}_{\mathrm{R}}{\mathbf{I}}}; \\ {{\mathbf{B}}_{\mathrm{Z}}} & = & {{\mathbf{G}}_{\mathrm{Z}}{\mathbf{I}}}. \end{array}$$
(30)
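To make the first line of Eq. (30) concrete, the Python sketch below builds a small flux matrix M and applies it to a current vector. It replaces the block coils of the model with thin coaxial circular filaments and evaluates their mutual inductance by numerical quadrature of the Neumann formula; this simplification, the geometry, the current values and the function names are all assumptions for illustration. GR and GZ would be assembled in the same way from the corresponding field kernels and are not shown.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability [H/m]

def mutual_inductance(r_src, z_src, r_obs, z_obs, n_phi=720):
    """Mutual inductance [H] of two coaxial circular filaments (Neumann formula).

    The periodic integrand is summed on a uniform grid in the toroidal angle.
    The two filaments must not coincide, otherwise the integrand is singular.
    """
    phi = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    dist = np.sqrt(r_src**2 + r_obs**2 + (z_obs - z_src) ** 2
                   - 2.0 * r_src * r_obs * np.cos(phi))
    integral = np.sum(np.cos(phi) / dist) * (2.0 * np.pi / n_phi)
    return 0.5 * MU0 * r_src * r_obs * integral

def flux_matrix(r_obs, z_obs, r_src, z_src):
    """Matrix M such that psi = M @ I, one row per observation loop."""
    M = np.empty((len(r_obs), len(r_src)))
    for i in range(len(r_obs)):
        for j in range(len(r_src)):
            M[i, j] = mutual_inductance(r_src[j], z_src[j], r_obs[i], z_obs[i])
    return M

# Hypothetical geometry [m] and currents [A], chosen only for the example.
r_src = np.array([0.2, 0.3, 0.4])
z_src = np.array([-0.1, 0.0, 0.1])
r_obs = np.array([0.1, 0.25, 0.5])
z_obs = np.array([0.0, 0.3, -0.2])
I = np.array([1.0e4, 2.0e4, 1.0e4])

M = flux_matrix(r_obs, z_obs, r_src, z_src)
psi = M @ I          # poloidal flux [Wb] linked by each observation loop
print(psi)
```

Here ψ is the total flux through each observation loop; some codes work with flux per radian, which differs by a factor 2π. Because M depends only on geometry, it can be computed once and reused for every time sample, so the real-time cost reduces to a matrix-vector product.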

The main plasma shape and position variables of interest for control, such as the x-point, o-point and separatrix radius, can then be obtained directly by searching for nulls of the magnetic field and flux along the axis and mid-plane. Low-order moments of the plasma current distribution of interest for control, such as the total plasma current IP or the axial position z0 of the current centroid, can likewise be obtained from linear operations on IP, with z the vector of axial positions of the plasma current elements.

$$\begin{array}{*{20}{l}} {I_{\rm P}} \hfill & = \hfill & {{\mathrm{sum}}\left( {{\mathbf{I}}_{\mathrm{P}}} \right)} \hfill; \\ {z_0I_{\rm P}} \hfill & = \hfill & {{\mathbf{z}}^{\mathrm{T}}{\mathbf{I}}_{\mathrm{P}}} \hfill. \end{array}$$
(31)
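The operations in Eq. (31), together with a simple null search along a one-dimensional cut, can be sketched in Python as follows. The sign-change search with linear interpolation is an assumed minimal approach, and the tanh field profile is a synthetic stand-in for an inferred mid-plane field, not C-2U data; the function names are hypothetical.

```python
import numpy as np

def current_moments(I_P, z):
    """Total plasma current and axial position of the current centroid, Eq. (31)."""
    I_tot = np.sum(I_P)
    z0 = (z @ I_P) / I_tot
    return I_tot, z0

def find_nulls(x, f):
    """Positions where the sampled profile f(x) changes sign (linear interpolation).

    Samples that are exactly zero are not treated specially in this sketch.
    """
    s = np.sign(f)
    idx = np.where(s[:-1] * s[1:] < 0)[0]
    return x[idx] - f[idx] * (x[idx + 1] - x[idx]) / (f[idx + 1] - f[idx])

# Moments of a toy current distribution (three elements, arbitrary units).
I_P = np.array([1.0, 1.0, 2.0])
z = np.array([0.0, 1.0, 2.0])
I_tot, z0 = current_moments(I_P, z)
print(I_tot, z0)

# Synthetic axial field along the mid-plane that reverses sign at r = R_null.
R_null = 0.3
r = np.linspace(0.0, 0.6, 200)
B_z = np.tanh(2.0 * (r**2 / R_null**2 - 1.0))
print(find_nulls(r, B_z))
```

For the toy distribution the total current is 4 and the centroid sits at z0 = 1.25. The same null search applied to the flux along the mid-plane, or to the field along the axis, would give candidates for the separatrix radius and the x-points.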

Data availability

All relevant data supporting the findings of this study are available from the authors on request.