Introduction

Developing atomic sensors with high sensitivity and compact configuration is a topic of great interest in quantum science and technologies. Prominent measurement devices, including atomic clocks1,2, atom interferometers3, magnetometers4 and microwave sensors5, are under active pursuit and play important roles in both fundamental research and real-life applications ranging from searches for new physics6 to navigation and medical diagnosis7,8. While in most scenarios the sensing process can be described as a single-parameter estimation problem, multiparameter estimation9,10 has recently attracted attention both theoretically and experimentally. Notable examples are measurements of a multi-dimensional field, identification of a spatial structure11, or detection of multi-frequency signals12. In general, multiparameter measurement requires a more involved sensor architecture, such as applying several electromagnetic fields along different directions to interact with the atoms, or performing successive interrogations under varied conditions. Furthermore, the relation between the observable readings and the parameters can be complex, and decoding may require model fitting or elaborate data analysis techniques13,14,15.

Machine learning (ML), a branch of artificial intelligence, builds models from sample data, or training data, in order to “learn” and then make predictions without being explicitly programmed. ML is widely used, for instance, in speech recognition16, computer vision17, social network filtering18 and medical diagnosis19,20. Recently, ML has been applied in many fields of physics, including ultrafast laser science21,22, ultracold atoms23, many-body physics24, classification of quantum phases25, and quantum error correction26. Some works have also demonstrated its use in atomic sensors12,27, where it was shown that ML can perform better than a physics model. However, in these proof-of-principle experiments on atomic sensors, ML is merely used to analyze the signal’s time trace and extract several frequency components. The potential of ML in atomic sensors, especially in multiparameter estimation, is yet to be unveiled. How to obtain the measurement sensitivity from the ML model, and whether incorporating ML can significantly reduce the complexity of the sensor’s hardware, remain open questions.

As an example of a multiparameter atomic sensor, the vector magnetometer is under intense investigation because it provides more complete information than its scalar counterpart and has applications in biosciences, geophysics etc. To obtain the magnetic field’s orientation, the sensor needs to incorporate certain axial references, for example field compensation coils28, radio frequency fields29,30,31, or multiple crossing laser beams32,33,34,35,36, all of which inevitably complicate the setup. Also, in many schemes the three-dimensional information is obtained successively29,37, or through sweeping the atomic resonance spectra38,39,40, which may not be suitable for relatively fast or real-time field measurement36. Simultaneous acquisition of the three-dimensional information can be achieved by modulating bias magnetic fields in three perpendicular directions at different frequencies41,42,43, thus discerning the three orthogonal magnetic field components. An all-optical version of this method has been demonstrated by replacing the bias magnetic fields with orthogonally propagating laser fields that impose AC Stark shifts on the atoms44. However, in scenarios requiring miniaturization and high-density packing of the sensors, all-optical single-beam single-shot (within the sensor’s response time) vector magnetometry is desired but, to the best of our knowledge, has not been reported.

Here, we propose a paradigm for vector magnetometry based on machine learning, which enables a single-shot, single-beam, all-optical vector magnetometer. The information is encoded in the AC components of the optical rotation signal, and the complicated, nonlinear relation between the set of four simultaneously recorded signals and the three parameters of the B field is established via machine learning. Removing the requirement of a one-to-one correspondence between signals and parameters, as needed in most existing designs, greatly simplifies the sensor structure and enables vector magnetometry with a scalar magnetometer architecture. We further develop techniques for extracting the sensitivities and frequency response of the ML-based magnetometer. The achieved sensitivities are about 100 \({\rm fT}/\sqrt{\rm Hz}\) for the field magnitude, and about \(100 \sim 200\,\mu{\rm rad}/\sqrt{\rm Hz}\) for the field direction, in a room-temperature Rb vapor cell. This magnetometer approach may provide insight into designing compact sensors with multiple measurement capabilities.

Results

Principle

Our magnetometer scheme is based on the well-known nonlinear magneto-optical rotation (NMOR) process45,46,47,48. An elliptically polarized and frequency-modulated laser beam serves as both the pump and probe field. The ellipticity of the light is optimized for balanced sensitivities to the magnetic field along different directions49 (see Supplementary Note 3). The modulation frequency ωm is set near the Larmor frequency of the atom ΩL = γB, where B is the amplitude of the total magnetic field to be measured and γ is the gyromagnetic ratio. With the direction of B set as the quantization axis, the atomic levels couple with the σ+, σ− and π polarization components, whose amplitudes and phases depend on the orientation of the magnetic field with respect to the wave vector of the laser38. These optical fields and their frequency sidebands form multiple sets of Λ-type electromagnetically-induced-transparency (EIT) interactions that interfere with each other, as shown in Fig. 1a, giving rise to optical rotation effects. Since the NMOR resonance occurs when ΩL and ωm coincide, the phases and intensities of the transmitted sidebands naturally encode both the amplitude and the orientation of B. The AC components of the polarization rotation signal, i.e., of the Stokes component Sy, are acquired by phase-sensitive detection through demodulation, where the in-phase and quadrature signals at the first and second harmonics of ωm, denoted as X1,2 and Y1,2, are recorded. Simultaneous recording of these four signals allows for single-shot vector magnetometry, lifting the requirement of sweeping the EIT spectrum as in refs. 38,39,40.
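As a rough numerical illustration of the resonance condition ΩL = γB, the sketch below converts between the modulation frequency and the field amplitude at which the NMOR resonance occurs. The value γ/2π ≈ 7 Hz/nT for the 87Rb ground state is an assumed textbook figure that is not quoted in the text, and the function names are hypothetical.

```python
# Assumption (not stated in the text): gyromagnetic ratio of the 87Rb
# ground-state hyperfine levels, gamma / (2*pi) ~ 7.0 Hz per nT.
GAMMA_HZ_PER_NT = 7.0


def larmor_frequency_hz(b_nt: float) -> float:
    """Larmor frequency Omega_L / (2*pi) in Hz for a field amplitude in nT."""
    return GAMMA_HZ_PER_NT * b_nt


def resonant_field_nt(f_mod_hz: float, ratio: float = 1.0) -> float:
    """Field amplitude (nT) at which Omega_L = ratio * omega_m."""
    return ratio * f_mod_hz / GAMMA_HZ_PER_NT


# Resonances at Omega_L = omega_m and Omega_L = omega_m / 2 for omega_m = 997 Hz.
print(round(resonant_field_nt(997.0), 1))
print(round(resonant_field_nt(997.0, ratio=0.5), 1))
```

With this assumed γ, a 997 Hz modulation frequency corresponds to a field of roughly 142 nT, and the half-frequency resonance to roughly 71 nT.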

Fig. 1: Working principle and schematics of the single-shot all optical vector magnetometer.

a Frequency-modulated elliptically polarized light interacts with the 87Rb atom, coupling the ground state 5 \({}^{2}{\rm S}_{1/2}\,(F=2)\) and the excited state 5 \({}^{2}{\rm P}_{1/2}\,(F^{\prime}=1)\). With the direction of the total magnetic field set as the quantization axis, the atomic levels exhibit Zeeman splitting. Frequency modulation of the laser gives rise to frequency sidebands spaced by the modulation frequency ωm, which is near the Larmor frequency ΩL. The σ+, σ− and π components of the laser form multiple sets of EIT. b Schematic of the experimental setup. ECDL external cavity diode laser, HWP half-wave plate, QWP quarter-wave plate, PBS polarization beam splitter, PC personal computer.

To extract the vectorial information of the magnetic field from the rotation signals, we adopt an artificial neural network (ANN, hereafter NN), a typical ML algorithm. By mimicking the way a biological neural network learns from experience, the NN establishes a map between input signals and output results using pre-collected data, and can thus give predictions of unknown parameters, here the direction and magnitude of an unknown B. The network weights (parameters) are updated using the gradient descent algorithm50 to minimize the defined loss function over the training data set. Each full pass of the NN through the training data set, with the corresponding weight updates, is called an epoch. The loss decreases as the epoch number increases and the map is eventually established. In our scheme, the demodulated optical rotation signals X1,2 and Y1,2 are first collected for a range of field amplitudes and directions, and are then used to train the NN. In the end, an accurate map is established between the signal set (X1, Y1, X2, Y2) and the parameter set (B, θ, φ), i.e., the three-dimensional field information. Here θ is defined as the angle between B and the wave vector k of the laser, and φ is the azimuthal angle in the plane perpendicular to the wave vector, with φ = 0 being the horizontal \(x\) direction associated with the polarization axis of the optics (Fig. 1).

Experimental setup

As shown in Fig. 1b, the light beam from an external cavity diode laser (ECDL) is near resonant with the 87Rb D1 line \(F=2\to {F}^{{\prime} }=1\) transition with 200 MHz red detuning to maximize the NMOR resonance amplitude51,52. The laser is frequency modulated (FM) at ωm = 997 Hz with a modulation range of 400 MHz (or modulation amplitude of 200 MHz), and its center frequency is locked via the dichroic atomic vapor laser lock53. The laser beam (about 2 mm in diameter) has its power (about 20 μW) stabilized in order to suppress the residual amplitude modulation. We adjust the laser polarization from linear to elliptical through two wave plates, before a cylindrical atomic vapor cell (2 cm in diameter and 7.1 cm in length) filled with enriched 87Rb at room temperature (~22 °C).

The alkene coating54 on the inner wall of the vapor cell ensures that atoms undergo thousands of wall collisions with little destruction of their internal quantum states. The cell resides within a four-layer μ-metal magnetic shield (residual field inhomogeneity in the cell is about 1 nT), together with three orthogonal sets of well-calibrated Helmholtz coils to generate the to-be-measured field B, with a fractional magnetic field inhomogeneity of 8/1000 within the cell. The NMOR resonance used for the magnetometer has an extracted zero-power linewidth (full width at half maximum, FWHM) of about 1 Hz, and a power broadened FWHM of about 16 Hz at the magnetometer’s operational laser power 20 μW.

The Stokes component Sy of the transmitted laser beam, after traversing a half-wave plate and a polarization beam splitter, is detected by a balanced photodetector in a homodyne configuration, whose output is sent to a lock-in amplifier for demodulation at frequencies ωm and 2ωm.
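To make the demodulation step concrete, the following is a minimal numerical sketch of what a lock-in amplifier computes: the signal is projected onto cosine and sine references at ωm and 2ωm and averaged. It is an illustration only, not the instrument's actual processing; the sampling rate, the synthetic amplitudes, and the choice of the sine reference for the quadrature channel are all assumptions.

```python
import numpy as np


def lock_in(signal, fs, f_ref, harmonic=1):
    """Project `signal` onto cos/sin references at harmonic * f_ref.

    Returns (X, Y), the in-phase and quadrature amplitudes averaged over
    the whole record, which should span an integer number of periods.
    """
    t = np.arange(len(signal)) / fs
    arg = 2.0 * np.pi * harmonic * f_ref * t
    x = 2.0 * np.mean(signal * np.cos(arg))  # in-phase component
    y = 2.0 * np.mean(signal * np.sin(arg))  # quadrature component
    return x, y


# Synthetic stand-in for the balanced-detector output (arbitrary amplitudes).
f_m = 997.0                    # modulation frequency from the text, in Hz
fs = 100 * f_m                 # assumed sampling rate: 100 samples per period
t = np.arange(int(fs)) / fs    # one second of data, i.e. 997 full periods
w = 2.0 * np.pi * f_m
s_y = (0.30 * np.cos(w * t) + 0.10 * np.sin(w * t)
       - 0.20 * np.cos(2 * w * t) + 0.05 * np.sin(2 * w * t))

x1, y1 = lock_in(s_y, fs, f_m, harmonic=1)
x2, y2 = lock_in(s_y, fs, f_m, harmonic=2)
print(np.round([x1, y1, x2, y2], 3))
```

For this synthetic input the four recovered numbers equal the amplitudes used to build the signal (0.30, 0.10, −0.20, 0.05). In the experiment, the analogous four values form the signal set (X1, Y1, X2, Y2).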

Experiment results

Before collecting data for NN training, it is necessary to calibrate the residual magnetic field within the shields and the three sets of coils, in order to generate a field B with arbitrary direction. For a single set of coils, one observes a good linear relation between the applied current and the generated magnetic field, but for the vector composition of the magnetic field, the small non-orthogonality between the coils cannot be neglected. Because the NMOR resonance appears when the Larmor frequency ΩL equals the modulation frequency ωm or \(\frac{1}{2}\omega_{\rm m}\)46, these imperfections can be well calibrated. The details of the calibration process are described in Methods and Supplementary Note 2.

First, we show the observed AC optical rotation signals in the form of NMOR resonance spectra at a tilted magnetic field direction. For instance, at θ = 60°, φ = 60°, when we scan the magnitude of B, as shown in Fig. 2a, both the first-harmonic and second-harmonic NMOR signals exhibit resonances at ΩL = 0, ωm and \(\frac{1}{2}\omega_{\rm m}\). The resonance center can be found precisely by fitting the curves with a generalized Lorentzian function, which is the key to coil calibration. For vector magnetometry, we choose the resonance at ΩL = ωm, since more EIT channels take part in the interference than for the \(\Omega_{\rm L}=\frac{1}{2}\omega_{\rm m}\) resonance, as can be seen from Fig. 1a, allowing more information to be encoded. Figure 2b shows the spectrum calculated with the 8-level theoretical model using the master equation. Despite the qualitative agreement, the experimental spectra deviate from the theoretical results because it is impractical to include in the model accurate information on the following experimental complications, which affect both the resonance lineshape and the absolute signal values: (a) the demodulation phases in the phase-sensitive detection are unknown due to phase delays in the electronics; (b) the input light polarization is slightly altered by the cell window; (c) there is a wide pedestal under the narrow NMOR resonance, characteristic of the coated cell and related to the thermal motion of the atoms55,56,57,58. We emphasize that, due to motional averaging57, the field inhomogeneities of the coils cause negligible line broadening, as evidenced in our experiment by the zero-power resonance linewidth55 of 1 Hz for both the resonances at ΩL = ωm and \(\Omega_{\rm L}=\frac{1}{2}\omega_{\rm m}\), which is likely dominated by spin exchange. For these reasons, relying on the master-equation model to establish the relation between the signals and the B field parameters is generally not suitable, while the NN can provide a better solution.

Fig. 2: AC quadratures of nonlinear magneto-optical rotation (NMOR) signals as a function of the magnetic field amplitude.

a Experimental NMOR signals versus the amplitude of a tilted magnetic field. The first-harmonic and second-harmonic signals are shown in a1 and a2, respectively. The laser is frequency modulated at 997 Hz, with a modulation range of 400 MHz. The center frequency of the laser is 200 MHz red-detuned from the 87Rb D1 line \(F=2\to F^{\prime}=1\) transition. The laser power is 20 μW. X is the in-phase signal and Y is the quadrature signal. b Theoretically calculated NMOR signals as a function of magnetic field. The first-harmonic and second-harmonic signals are shown in b1 and b2, respectively. In all panels, the red (blue) curve corresponds to the X (Y) signal and the left (right) y-axis.

We then train the NN using NMOR signals for a large range of field amplitudes and orientations. The structure of the fully connected NN is shown in Fig. 3a. There is one input layer receiving the four-dimensional NMOR signal and one output layer returning the field information. Between the input and output layers there are 8 hidden layers, each containing 128 neurons, and L2 regularization59 is used to prevent over-fitting. The activation function in the hidden layers is the ReLU (rectified linear unit) function60. The data set is divided into a training set and a validation set in the proportion of 8 to 2, and the mean squared error is defined as the loss. The training set is used for learning, i.e., to determine the weights in the NN, while the validation set is used to assess the performance of the trained NN. In practice, the NMOR data at the input layer for training is generated by a reverse-NN11 with a similar structure. After being trained with the (B, θ, φ) set as the input and the corresponding experimental data (X1, Y1, X2, Y2) as the output, this reverse-NN can be employed to produce optical rotation data that is denser and more robust against noise than the measured data. We then use these denser NMOR data to train the NN shown in Fig. 3a with an Adam optimizer61, and the training and validation errors are plotted in Fig. 3b. The trained NN can reproduce the full vectorial information of the magnetic field accurately, as shown in Fig. 3c, where the solid lines are data generated from the reverse-NN and the scattered points are the predictions of the NN. In our data set, we have chosen the range for θ and φ to be \(\left[10^{\circ},170^{\circ}\right]\), because the NMOR signals are insensitive to the variation of φ (“dead zone”) when B is nearly aligned with the propagation direction of the light k (θ ≈ 0° and 180°). Another issue is the signal degeneracy for φ and φ + π, but we propose an angled multi-pass configuration to lift this degeneracy and also to remove the “dead zone” for φ (see Supplementary Note 6).

Fig. 3: Architecture and performance of the neural network (NN).

a Illustration of the neural network. The quadratures X and Y of the demodulated optical rotation signals at the first and second harmonics of ωm form the 4-dimensional input. The NN gives the magnitude of the magnetic field B and its direction θ, φ as output. b Training process of the NN. The losses of the training set and the validation set decrease with the number of training iterations. Mean squared error is used as the loss function. There is no obvious difference between the training loss and the validation loss, which indicates no over-fitting. c Test of the validity of the NN. Scattered points are predictions from the trained NN and solid lines are the dense reproduction of the input data through the reverse-NN (see text); the two show good agreement. In c1: θ = 60°, φ = 60°, in c2: φ = 60°, ΩL = 997 Hz, and in c3: θ = 60°, ΩL = 997 Hz.

Finally, we examine the sensitivities of the three polar components B, θ, φ given by our NN scheme. The usual way to obtain the magnetometer sensitivity is to convert the fluctuations of the measured signal δS into fluctuations of the magnetic field δB through a measured slope dS/dB. Here, an analogous “slope” is provided by the trained NN, which establishes a map between the optical rotation signals and the magnetic field parameters. We continuously record the signal set of optical rotations (X1, Y1, X2, Y2) for about one minute at a sampling rate of 900 samples per second for each fixed B, and the signal set at each time point is fed to the NN, which then outputs the predicted parameters (B, θ, φ). Consequently, the four time traces of the signals X1(t), Y1(t), X2(t), Y2(t) are converted into three time traces B(t), θ(t), φ(t). We then perform a fast Fourier transform (FFT) on each of B(t), θ(t), φ(t) and obtain the sensitivities, taking into account the frequency response, which was obtained experimentally with the aid of the NN (see Supplementary Note 4) using an approach similar to that described here.
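The last step, turning a predicted time trace into a noise spectrum, can be sketched as follows. This is a generic one-sided amplitude spectral density estimate with a Hann window, not necessarily the exact procedure used here; the correction for the magnetometer's frequency response is omitted, and the white-noise trace standing in for the NN output B(t) uses a hypothetical noise level.

```python
import numpy as np


def amplitude_spectral_density(trace, fs):
    """One-sided amplitude spectral density, in units of `trace` per sqrt(Hz)."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()                       # remove the DC offset
    window = np.hanning(len(x))            # Hann window to limit leakage
    spectrum = np.fft.rfft(x * window)
    # One-sided power spectral density, normalised by the window power.
    psd = 2.0 * np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, np.sqrt(psd)


# Placeholder for B(t): 60 s at 900 samples per second, in fT, with a
# hypothetical white-noise level of 2000 fT rms around 140 nT.
rng = np.random.default_rng(0)
fs = 900.0
b_trace = 1.4e8 + 2000.0 * rng.standard_normal(int(60 * fs))

freqs, asd = amplitude_spectral_density(b_trace, fs)
band = (freqs >= 10.0) & (freqs <= 20.0)
print(np.sqrt(np.mean(asd[band] ** 2)))    # rms noise density in the 10-20 Hz band
```

For white noise of standard deviation σ sampled at rate fs, the expected one-sided density is σ·√(2/fs), which gives a simple sanity check of the normalisation. The same function would be applied to θ(t) and φ(t) to obtain angular noise densities.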

Shown in Fig. 4a are the sensitivities at low frequencies for an exemplary B field direction of θ = 63.435°, φ = 60° with an amplitude of about 140 nT; we found that for other field orientations the sensitivity is at a similar scale (see Supplementary Note 5). Due to the relatively small bandwidth of our magnetometer (associated with the narrow linewidth, ~16 Hz, of the NMOR resonance), sensitivities are better at lower frequencies. The best sensitivities are observed in the range of 10–20 Hz, where the sensitivity of the field magnitude is about 100 fT\(/\sqrt{\rm Hz}\), and the angular sensitivity is of the order of 100 \(\mu{\rm rad}/\sqrt{\rm Hz}\). The extra noise at low frequencies near DC comes mainly from the magnetic field itself, as well as from 1/f noise. To confirm the sensitivities given by the NN, we examined whether a small change in the magnetic field at these sensitivity levels can be detected. We applied a small AC magnetic field at 11 Hz to slightly vary (B, θ, φ), and the NN was trained for the AC field in the parameter space near B ≈ 140 nT (ΩL ~ 997 Hz), θ = 63.435°, φ = 60°. The test field change has an interval of (140 fT, 0.02°, 0.02°). The predicted changes in the vector components of B are consistent with the true values, as shown in Fig. 4b, where the sizes of the error bars (standard deviations) indicate the sensitivities, which agree with those given by the NN-aided noise analysis shown in Fig. 4a. These results confirm that the ML-assisted approach to vector magnetometry gives the correct sensitivity levels.

Fig. 4: Sensitivity of the machine learning assisted vector magnetometer.

a Neural network predicted sensitivity for field amplitude (a1) and orientations (a2, a3) at low frequency. The measurement is performed at θ = 63.435°, φ = 60° for a field magnitude of about 140 nT. b NN-predicted change of the magnetic field magnitude (on top of 140 nT, b1) and directions (b2, b3) versus the corresponding true values. The dashed line corresponds to the y = x function. The results are shown for magnetic field changes at a frequency of 11 Hz. The sizes of the error bars (standard deviation from 60 repeated independent measurements) are in agreement with the NN-predicted sensitivity level at 11 Hz.

Discussion

We propose a paradigm for atomic vector magnetometry based on machine learning, allowing three-dimensional single-shot information extraction using a simple, standard scalar magnetometer setup. Acquiring the amplitude and phase of the AC optical rotation signals removes the need for a spectral sweep, enabling future real-time measurement of time-varying magnetic fields. The single-beam all-optical design is suitable for dense integration of the sensor units. We also demonstrate how to obtain vector field sensitivities using the neural network, and the best sensitivities for the field amplitude and orientation are about 100 fT\(/\sqrt{\rm Hz}\) and \(100 \sim 200\,\mu{\rm rad}/\sqrt{\rm Hz}\), respectively. The current sensitivities are limited by electronic noise around the relatively low modulation frequency. After removal of such noise, the sensitivity may be further improved using a multipass design62. The signal degeneracy for φ and φ + π can be lifted with an angled multi-pass configuration, as shown in our simulation (see Supplementary Note 6), which also removes the dead zone for φ when B is nearly aligned with k of the laser. Furthermore, the dynamic range of the detectable magnetic field can be controlled through the resonance linewidth or by changing the modulation frequency of the laser. Higher bandwidth can be obtained in vapor cells working in the higher-temperature spin-exchange-relaxation-free regime28.

Our strategy of using machine learning to simplify the structure of vector NMOR magnetometers can be extended to other types of atomic magnetometers, as well as to multiparameter sensors in general, using the following procedure: (1) Identify a set of observables that are sensitive to the target parameters and can, if possible, be recorded simultaneously in the experiment. The rich degrees of freedom of the interrogating laser or, more broadly, the electromagnetic field, for example the amplitude, polarization, spatial modes, frequency spectra etc., can all be used for encoding the information indirectly and compressively. (2) Stabilize the experimental system as a prerequisite for a robust map between the observable set and the parameter set. (3) Experimentally collect data within a suitable range of target parameters and perform the neural network training to build the map between the signal set and the parameter set. The NN structure is chosen according to the complexity level of the problem, and overfitting should be avoided. (4) Conduct real measurements using the trained NN.

Methods

Theoretical model

Our numerical calculation used the eight-level atomic system shown in Fig. 1 in the main text. However, since our simulations showed that a four-level model gives results qualitatively similar to those of the eight-level model, to gain intuition on the key physics we here describe a simplified four-level system, as shown in Fig. S1, where the ground state has three Zeeman levels that couple to one excited state via σ+, π and σ− polarized light fields, respectively. The atom-light interaction Hamiltonian H can be derived within the rotating wave approximation (RWA), and the atomic coherences can be found from the density matrix ρ by solving the master equation:

$$\frac{\partial \rho}{\partial t}=-\frac{i}{\hbar}[H,\rho]+\left(\Gamma_{\rm rel}+\Gamma_{\rm rep}\right)\rho,$$
(1)

where Γrel describes the decoherences including the spontaneous decay and dephasing etc., and Γrep describes the repopulation of the ground states63. Due to the periodicity of the system under frequency modulation, the coefficients of a Fourier expansion of the density matrix can be identified using the Floquet technique where ρ(t) is expanded in harmonics of the modulation frequency ωm:

$$\rho(t)=\sum_{n=-\infty}^{\infty}\rho^{(n)}e^{in\omega_{\rm m}t}.$$
(2)

Then the polarization rotation signal of the light we measure can be derived from the atomic coherences, which is found to contain the full vectorial information of the magnetic field. More details are in the Supplementary Note 1.
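As a sketch of how the expansion in Eq. (2) connects to the recorded quadratures, suppose the detected signal \(S_y(t)\) inherits the same harmonic structure, with coefficients \(S_y^{(n)}\) (a notation introduced here for illustration only). Because \(S_y(t)\) is real, its coefficients satisfy \(S_y^{(-n)}=\left(S_y^{(n)}\right)^{*}\), so that

$$S_y(t)=\sum_{n=-\infty}^{\infty}S_y^{(n)}e^{in\omega_{\rm m}t}=S_y^{(0)}+\sum_{n\ge 1}\left[X_n\cos (n\omega_{\rm m}t)+Y_n\sin (n\omega_{\rm m}t)\right],$$

$$X_n=2\,{\rm Re}\,S_y^{(n)},\qquad Y_n=-2\,{\rm Im}\,S_y^{(n)}.$$

Under this convention, which assumes cosine and sine references with zero demodulation phase, the four recorded signals correspond to the n = 1 and n = 2 coefficients. A nonzero demodulation phase rotates each pair \((X_n, Y_n)\) without changing its magnitude.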

Calibration of magnetic field

In the experiment, the magnetic field to be measured is provided mainly by the three sets of orthogonal Helmholtz coils within the shields, and precise calibration is required in order to generate a magnetic field along any intended direction. In the calibration process, we obtain the amplitude of the total magnetic field (produced by the coils and the background magnetic field in the shields) by identifying the resonance locations in the NMOR spectrum obtained by slowly sweeping the laser modulation frequency ωm. As shown in Fig. S2, the spectra exhibit a resonance when the Larmor frequency ΩL equals ωm (or \(\frac{1}{2}\omega_{\rm m}\), not shown). The resonance center is found by fitting the experimental curve with a linear superposition of a Lorentzian absorption function and a Lorentzian dispersion function. For a single set of Helmholtz coils, the relation between the applied current and the generated magnetic field is linear. However, for the vector synthesis of a magnetic field generated by three sets of coils, imperfection in the orthogonality of the coils should be considered. Furthermore, the residual background magnetic field in the magnetic shields cannot be neglected.

The strategy we used for calibration is similar to that used in ref. 64. We consider a coil system with imperfect orthogonality among the three sets of coils, which yield magnetic fields \(B_{{\rm X}_{\rm c}}, B_{{\rm Y}_{\rm c}}, B_{{\rm Z}_{\rm c}}\) along the Xc, Yc, Zc axes, respectively, as shown in Fig. S3. First, for each set of coils we obtain the relation between the field amplitude and the current from the NMOR spectra with only this coil set in operation. Then, without loss of generality, we can set small angles ξ, η, ζ (see Fig. S3) to describe the deviation of (Xc, Yc, Zc) from an orthonormal coordinate system (X, Y, Z), and we have:

$${{{{{{{{\bf{X}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}=\left(\begin{array}{c}\cos \xi \\ 0\\ \sin \xi \end{array}\right),\; {{{{{{{{\bf{Y}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}=\left(\begin{array}{c}\sin \eta \cos \zeta \\ \cos \eta \cos \zeta \\ \sin \zeta \end{array}\right),\; {{{{{{{{\bf{Z}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}=\left(\begin{array}{c}0\\ 0\\ 1\end{array}\right).$$
(3)

The total magnetic field is \({{{{{{{\bf{B}}}}}}}}={B}_{{{{{{{{{\rm{X}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}}{{{{{{{{\bf{X}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}+{B}_{{{{{{{{{\rm{Y}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}}{{{{{{{{\bf{Y}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}+{B}_{{{{{{{{{\rm{Z}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}}{{{{{{{{\bf{Z}}}}}}}}}_{{{{{{{{\rm{c}}}}}}}}}+{{{{{{{{\bf{B}}}}}}}}}_{{{{{{{{\rm{residual}}}}}}}}}\), which can be written as:

$$\begin{aligned} B_{{\rm X}_{\rm c}}\cos \xi+B_{{\rm Y}_{\rm c}}\sin \eta \cos \zeta+B_{{\rm X}_{0}}&=B\sin \theta \cos \varphi \\ B_{{\rm Y}_{\rm c}}\cos \eta \cos \zeta+B_{{\rm Y}_{0}}&=B\sin \theta \sin \varphi \\ B_{{\rm X}_{\rm c}}\sin \xi+B_{{\rm Y}_{\rm c}}\sin \zeta+B_{{\rm Z}_{\rm c}}+B_{{\rm Z}_{0}}&=B\cos \theta \end{aligned}$$
(4)

or:

$$\begin{aligned} &\left(B_{{\rm X}_{\rm c}}\cos \xi+B_{{\rm Y}_{\rm c}}\sin \eta \cos \zeta+B_{{\rm X}_{0}}\right)^{2} \\ &\quad+\left(B_{{\rm Y}_{\rm c}}\cos \eta \cos \zeta+B_{{\rm Y}_{0}}\right)^{2} \\ &\quad+\left(B_{{\rm X}_{\rm c}}\sin \xi+B_{{\rm Y}_{\rm c}}\sin \zeta+B_{{\rm Z}_{\rm c}}+B_{{\rm Z}_{0}}\right)^{2}=B^{2}. \end{aligned}$$
(5)

Here B, θ, φ are, respectively, the amplitude, polar angle and azimuthal angle of the total magnetic field we intend to measure. \(B_{{\rm X}_{0}}, B_{{\rm Y}_{0}}, B_{{\rm Z}_{0}}\) are the components of the residual magnetic field along X, Y, Z, respectively. The total magnetic field’s amplitude B as expressed by Eq. (5) can be measured from the NMOR spectra. By scanning the currents in the three coils and measuring the total field amplitude B for each set of (\(B_{{\rm X}_{\rm c}}, B_{{\rm Y}_{\rm c}}, B_{{\rm Z}_{\rm c}}\)), we can determine the parameters \((\xi,\eta,\zeta,B_{{\rm X}_{0}},B_{{\rm Y}_{0}},B_{{\rm Z}_{0}})\) using Eq. (5) through non-linear least-squares fitting. Then, to set a total magnetic field with the intended parameters B, θ, φ, we solve Eq. (4) to find what magnetic field should be generated by each coil, i.e., (\(B_{{\rm X}_{\rm c}}, B_{{\rm Y}_{\rm c}}, B_{{\rm Z}_{\rm c}}\)).
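The final step, solving Eq. (4) for the coil fields, can be sketched as below. Since the second row of Eq. (4) contains only the Yc coil field, the system can be solved by substitution. The function names and the numerical calibration values (angles in radians, residual field in nT) are hypothetical and serve only to illustrate the round trip; the least-squares fit of Eq. (5) is not shown.

```python
import math


def total_field(b_coils, angles, residual):
    """Left-hand side of Eq. (4): Cartesian components of the total field."""
    bx_c, by_c, bz_c = b_coils
    xi, eta, zeta = angles
    bx0, by0, bz0 = residual
    bx = bx_c * math.cos(xi) + by_c * math.sin(eta) * math.cos(zeta) + bx0
    by = by_c * math.cos(eta) * math.cos(zeta) + by0
    bz = bx_c * math.sin(xi) + by_c * math.sin(zeta) + bz_c + bz0
    return bx, by, bz


def coil_fields(b, theta, phi, angles, residual):
    """Solve Eq. (4) for (B_Xc, B_Yc, B_Zc) given the target (B, theta, phi)."""
    xi, eta, zeta = angles
    bx0, by0, bz0 = residual
    tx = b * math.sin(theta) * math.cos(phi)
    ty = b * math.sin(theta) * math.sin(phi)
    tz = b * math.cos(theta)
    by_c = (ty - by0) / (math.cos(eta) * math.cos(zeta))                      # row 2
    bx_c = (tx - bx0 - by_c * math.sin(eta) * math.cos(zeta)) / math.cos(xi)  # row 1
    bz_c = tz - bz0 - bx_c * math.sin(xi) - by_c * math.sin(zeta)             # row 3
    return bx_c, by_c, bz_c


# Hypothetical calibration result: small misalignment angles and residual field.
angles = (0.010, -0.005, 0.008)      # (xi, eta, zeta) in rad
residual = (0.5, -0.3, 0.8)          # (B_X0, B_Y0, B_Z0) in nT

coils = coil_fields(142.4, math.radians(60), math.radians(60), angles, residual)
bx, by, bz = total_field(coils, angles, residual)
print(coils)
print(math.sqrt(bx**2 + by**2 + bz**2))   # recovers the target amplitude
```

Feeding the solved coil fields back into the left-hand side of Eq. (4) reproduces the target vector, which is a convenient consistency check on any set of fitted calibration parameters.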

Implementation of neural network

A neural network (NN) is an artificial intelligence (AI) method based on connectionism, which imitates the connections between neurons. Our model is a simple fully connected NN, and we proceed as follows to mimic the function of a biological neural network. First, data are collected in pairs of features (input) and labels (output). Commonly, the larger the amount of data, the better the performance of the NN. Second, we build the structure of the NN with a complexity determined by the scale of the problem to be solved. Similar to the growth of human cognitive ability, the NN receives a large amount of collected data with features and corresponding labels, which change the weights of the neurons. The NN updates its parameters via back-propagation, using a gradient descent algorithm that aims to reduce the chosen loss function. This is the training process of the NN. In our experiment, the mean squared error is chosen as the loss function. After training, the parameters of the NN are fixed; new feature data can then be sent to the input of the NN, which outputs the predictions.

Our neural network is implemented using Keras, a high-level API (application programming interface) of TensorFlow written in Python. In Keras, a model is understood as a sequence or graph composed of independent and fully configurable modules. These modules can be assembled together with as few restrictions as possible. In particular, modules such as neural network layers, loss functions, optimizers, initialization methods, activation functions, and regularization methods can be combined to build new models.

The input layer of our NN receives the four-dimensional NMOR signal (X1, Y1, X2, Y2) and the NN predicts the three-dimensional magnetic field information (B, θ, φ) as the output. Between them are 8 hidden layers, each containing 128 neurons. The transmission between layers is implemented via matrix operations, and each neuron applies a non-linear activation function, for which we use the ReLU function. The mean squared error is chosen as the loss function, and an additional term is added to it to prevent overfitting: by calling the Keras API for L2 regularization in the hidden layers, the sum of squares of all the parameters in the hidden layers is computed and added to the loss function. This procedure promotes the generalization ability of the model, i.e., it helps prevent overfitting, in which a complicated NN fits the input-output relation only for the training data set. For the training process, the adaptive moment estimation (Adam) method65 is applied.
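The architecture and loss described above can be illustrated with a short NumPy sketch of the forward pass and the regularized loss. This is not the Keras code used in this work: the weight initialization, the regularization coefficient `l2`, the random placeholder data, and the choice to penalize only the weight matrices (not the biases) of the hidden layers are all assumptions, and the Adam training loop is omitted.

```python
import numpy as np


def init_mlp(rng, n_in=4, n_hidden=128, n_layers=8, n_out=3):
    """Random weights for a 4 -> 8 x 128 -> 3 fully connected network."""
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Map rows of (X1, Y1, X2, Y2) to rows of (B, theta, phi)."""
    h = np.asarray(x, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)     # hidden layer with ReLU activation
    w, b = params[-1]
    return h @ w + b                        # linear output layer


def regularized_loss(params, x, y, l2=1e-4):
    """Mean squared error plus an L2 penalty on the hidden-layer weights."""
    mse = np.mean((forward(params, x) - y) ** 2)
    penalty = l2 * sum(np.sum(w ** 2) for w, _ in params[:-1])
    return mse + penalty


rng = np.random.default_rng(0)
params = init_mlp(rng)
signals = rng.normal(size=(5, 4))   # placeholder (X1, Y1, X2, Y2) samples
targets = rng.normal(size=(5, 3))   # placeholder (B, theta, phi) labels

print(forward(params, signals).shape)
print(sum(w.size + b.size for w, b in params))   # number of trainable parameters
print(regularized_loss(params, signals, targets))
```

A network of this size has 116,611 trainable parameters, which gives a sense of why regularization and a validation set are used to monitor overfitting.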