NV center based nano-NMR enhanced by deep learning

The growing field of nano nuclear magnetic resonance (nano-NMR) seeks to estimate spectra or discriminate between spectra of minuscule amounts of complex molecules. While this field holds great promise, nano-NMR experiments suffer from detrimental inherent noise. This strong noise masks the weak signal and results in a very low signal-to-noise ratio. Moreover, the noise model is usually complex and unknown, which renders the data processing of the measurement results very complicated. Hence, spectra discrimination is hard to achieve and, in particular, it is difficult to reach the optimal discrimination. In this work we present strong indications that this difficulty can be overcome by deep learning (DL) algorithms. The DL algorithms can mitigate the adversarial effects of the noise efficiently by effectively learning the noise model. We show that in the case of frequency discrimination DL algorithms reach the optimal discrimination without having any pre-knowledge of the physical model. Moreover, the DL discrimination scheme outperforms Bayesian methods when verified on noisy experimental data obtained by a single Nitrogen-Vacancy (NV) center. In the case of frequency resolution we show that this approach outperforms Bayesian methods even when the latter have full pre-knowledge of the noise model and the former has none. These DL algorithms also emerge as much more efficient in terms of computational resources and run times. Since in many real-world scenarios the noise is complex and difficult to model, we argue that DL is likely to become a dominant tool in the field.


Introduction
The newly developed discipline of nano-NMR [1][2][3][4][5][6][7] is aimed at reducing the minimal NMR sample size by many orders of magnitude, and thus increasing the NMR sensitivity and spatial resolution down to a few molecules 8 . This is achieved by replacing the macroscopic coil of the NMR setup, which measures the magnetic field, with a single controllable spin or an ensemble of controllable spins, e.g., NV centers in diamond, which serve as tiny magnetometers. Recent experiments have shown that it is possible to estimate the spectrum of artificial signals and of signals of polarized samples with high resolution [9][10][11][12][13] . However, the obvious advantages of obtaining spectral information about tiny quantities of molecules are offset by the extra amount of noise that goes hand in hand with most configurations of this setup. There are a few sources of this extra noise, which include the finite NV coherence time (magnetic noise), the controller noise (laser and microwave operations), and, most importantly, the diffusion-induced noise, which is negligible in the regular NMR setup but is extremely large in the nano-NMR setup, broadening the line-width above the required resolution. This noise creates a serious bottleneck, as the crucial information is encoded in the tiny chemical shifts and small energy gaps caused by J-couplings. That is, the nano-NMR setup is usually characterized by a weak measured signal, which is masked by strong noise.
Moreover, the precise noise model is usually complex and unknown. Consequently, achieving spectral discrimination between weak and similar signals of nearby frequencies is an intractable data-processing challenge. In particular, because the noise model is complex and unknown, it is difficult to tackle this noise and reach the optimal discrimination with conventional data-analysis methods, for which optimal discrimination can usually be obtained only when full knowledge of the noise model is available.
In this work we show that the challenge of spectral discrimination between weak and similar signals in the presence of strong and complex noise can be efficiently confronted by DL algorithms, which effectively learn the noise model. Moreover, we show that DL methods are capable of learning the noise model from a small amount of data, which only needs to be gathered for a few minutes. This means that a DL algorithm can analyze a test signal with the same efficiency as numerically demanding Bayesian methods that rely on precise knowledge of the model. In addition, we show that DL methods can be extremely useful in dealing with challenging frequency resolution problems, and can possibly outperform Bayesian methods even under the assumption that the latter have full knowledge of the model and infinite computing power.
DL techniques have been successfully applied to spectral data in the fields of astronomy, chemistry, geosciences, and bioinformatics 14 . Spectral data from these disciplines pose similar challenges: (1) high data dimensionality; (2) difficulty of modeling the important features from first principles; (3) dirty environments with many classes of objects that need to be differentiated along with varying signal intensities; (4) importance of subtle differences in the signal. Despite these difficulties, which apply in our context as well, impressive achievements have been made, such as the detection of narcotics in Raman spectroscopy data with a 0.5% error rate 15 .
DL methods have also been used for the analysis of NMR data, in particular in the context of automated protein structure determination via peak-picking of nuclear magnetic resonance spectra 16 , of biological macromolecules 17 , and recently also in the context of analyzing a variety of spectral images of proteins by using a support vector machine classifier combined with a histogram of oriented gradients 18 , and by using a convolutional neural network 19 . In addition, deep learning techniques such as long short-term memory networks 20 and variational auto-encoder networks 21 have been used in NMR applications for material characterization and subsurface characterization. We believe that the success of DL methods in the analysis of regular NMR data should be amplified in the nano-NMR setup due to the larger amount of noise in this setting, which originates from two main ingredients that are absent in the regular NMR setting. The first ingredient is the origin of the signal. While in regular NMR the signal is created by thermal polarization, in nano-NMR the signal is created by statistical polarization 22 , which implies that in the nano-NMR setup the noise is stronger. The second ingredient is the quantum projection noise, which is of a Poissonian or a Bernoulli nature and in many cases is the dominant source of noise. Here we provide evidence that DL methods can tackle these noises efficiently.
To evaluate the efficiency of DL methods in terms of the spectroscopy of nano-NMR data, we consider two problems: frequency discrimination and frequency resolution. We first examine the ability of DL methods to discriminate between two signals corresponding to two different frequencies. In particular, we consider data from signals that were read by an NV center, which simulates noisy nano-NMR data. Typical data for these two frequencies are shown in Fig. 1, which presents two time traces of the datasets together with their Fourier transforms. It is immediately clear that it is impossible to discriminate between the two frequencies based on the Fourier transform alone, because the signal has a strong phase noise on top of the detection noise. In this work we show that DL methods are able to classify the data with the same efficiency as Bayesian methods, which use full knowledge of the signal and noise model and are numerically much more demanding than DL methods. The advantage of DL methods is also indicated by their superior performance in frequency discrimination of the experimental data, where the signal and noise models are not fully known.
We then employ DL methods to tackle the problem of frequency resolution in a noisy environment. We show that DL methods can efficiently discriminate between the signal of a single frequency and the signal of two nearby frequencies that have strong amplitude and phase noise. Our results strongly suggest that DL methods can effectively learn the physical and noise models, and thereby constitute an efficient alternative to Bayesian methods, which require a priori knowledge of the physical and noise models.

Frequency discrimination
The physical model

We consider the problem of discrimination between two signals corresponding to two different frequencies by a single quantum probe. In the nano-NMR setup this corresponds, for example, to the scenario where a single NV center, which serves as a tiny magnetometer, is placed in the proximity of a sample that contains two known molecules between which we wish to discriminate. Specifically, in the presence of a single frequency signal (a single molecule) the Hamiltonian of the spin probe is given by Eq. (1), where g i , ω i , and φ i are the amplitude, frequency, and (random) phase of signal i respectively, which is the standard setting in nano-NMR experiments [1][2][3][4]23 . The probe, which is initially polarized along x, freely evolves according to H s i for a short duration, ∆t, and then is measured along ŷ. In the measurement scheme of a single experiment, the sequence of probe operations consists of initialization, evolution, and measurement, which is repeated many times under the constant presence of a signal (Fig. 2 (a)). In the case of a single shot measurement, the measurement result is a sequence of zeros and ones, Fig. 1 (right), and the probability for a successful measurement (one) is given by Eq. (2).

We start by considering an ideal scenario (no noise or inefficiencies) where Eq. (2) holds. We assume that in each experiment the signal corresponds to one of two known frequencies (ω 1 and ω 2 ), the amplitudes of the signals are known, but in each experiment the signal has an unknown uniformly distributed random phase. A single experiment results in a string of bits, x = {1, 0, 0, 1, ...}, where 1 and 0 correspond to a detection of the m s = 0 state or the m s = −1 state of the NV center. Given x, we want to obtain an estimation of the frequency of the signal, ω est = ω 1 or ω est = ω 2 (Fig. 2 (b)). We quantify the performance of a discrimination method M by the error probability of the frequency estimation, defined in Eq. (3), where P M (ω est = ω j |ω i ) is the probability of method M to output ω est = ω j given that the frequency of the signal is ω i .
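As a concrete illustration of this measurement model, the sketch below simulates a single experiment and produces a bit string x of Bernoulli measurement outcomes. Since Eq. (2) is not reproduced in this text, the oscillatory probability P(t) = ½[1 + sin(g ∆t cos(ω t + φ))] used here is an assumed stand-in with the same structure (amplitude g, frequency ω, random phase φ), not the paper's exact expression.

```python
import math
import random

def success_probability(t, g, omega, phi, dt):
    """Assumed stand-in for Eq. (2): probability of measuring '1' (m_s = 0)
    after the probe accumulates phase ~ g * dt * cos(omega * t + phi)."""
    return 0.5 * (1.0 + math.sin(g * dt * math.cos(omega * t + phi)))

def run_experiment(omega, n_meas=1000, dt=0.5, g=10.0 / (2.0 * math.pi),
                   rng=random):
    """One experiment: n_meas projective measurements, one fresh random phase."""
    phi = rng.uniform(0.0, 2.0 * math.pi)       # unknown random phase
    x = []
    for k in range(n_meas):
        p = success_probability(k * dt, g, omega, phi, dt)
        x.append(1 if rng.random() < p else 0)  # Bernoulli projection noise
    return x

x = run_experiment(omega=10.0 / (2.0 * math.pi))
print(len(x))
```

The parameters mirror those used later in the numerical analysis (∆t = 0.5 sec, 1000 measurements, g = ω 1 = 10/(2π) Hz).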

Full Bayesian method
In the ideal scenario considered here, we have full knowledge of the model (Eq. (2)) and the only unknowns are the random phases φ i . Hence, we can simply utilize a Full Bayesian method known as the likelihood-ratio test and denoted by M F B : for each frequency we calculate the maximal log-likelihood over the unknown random phases, and we estimate the frequency according to the larger likelihood. As M F B utilizes the maximal information on the signal, it obtains the minimal possible error for an unbiased estimator and can therefore serve as a benchmark for evaluating the efficiency of a learning method; its error probability serves as a lower bound for the DL method. Bayesian methods are known to be optimal given the maximal amount of information, provided that the optimization can be done efficiently, which is usually not the case, in particular in a noisy environment. In order to verify that we indeed have the optimal method, we compare the results to an analytical calculation of the Fisher Information (FI), which can be carried out in this case.

In general, full knowledge is not available, due either to a lack of knowledge of the noise model and of detection inefficiencies in the experiment, or to a lack of knowledge of the signal. In this case, we can utilize a correlation based method, M corr , for frequency discrimination. To this end, we first use a train set of measurement results, X train , for which the frequency of the signal is known. For each x ∈ X train we calculate the correlation vector C k = ⟨x i x i+k ⟩ i (here we replace the 0 bit by −1). Then, for each frequency we calculate the averaged correlation vector. To estimate the frequency of an unknown signal we calculate its correlation vector, C k , and then the distances to the two averaged correlation vectors by the L 2 norm. We estimate the frequency according to the smaller distance. This method, however, disregards higher order correlation functions as well as the finite precision of the correlation functions themselves,
which varies considerably between nearest-neighbor and higher-neighbor separations. While in the limit where all these effects are taken into account this method should approach the optimum, it is numerically very challenging, or even impossible, to apply to most problems of interest.
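The likelihood-ratio test described above can be sketched as follows: for each candidate frequency we maximize the Bernoulli log-likelihood of the observed bit string over a discretized grid of the unknown phase, and pick the frequency with the larger maximum. The functional form of the success probability is the same assumed stand-in for Eq. (2) as before, and the 64-point phase grid is an illustrative choice.

```python
import math
import random

def p_success(t, g, omega, phi, dt):
    # Assumed oscillatory stand-in for Eq. (2); not the paper's exact expression.
    return 0.5 * (1.0 + math.sin(g * dt * math.cos(omega * t + phi)))

def log_likelihood(x, g, omega, phi, dt):
    """Bernoulli log-likelihood of the bit string x given (omega, phi)."""
    ll = 0.0
    for k, bit in enumerate(x):
        p = min(max(p_success(k * dt, g, omega, phi, dt), 1e-12), 1.0 - 1e-12)
        ll += math.log(p if bit else 1.0 - p)
    return ll

def discriminate_fb(x, g, omega1, omega2, dt, n_phases=64):
    """M_FB sketch: maximize the log-likelihood over a phase grid for each
    candidate frequency, then return the frequency with the larger maximum."""
    phases = [2.0 * math.pi * j / n_phases for j in range(n_phases)]
    l1 = max(log_likelihood(x, g, omega1, phi, dt) for phi in phases)
    l2 = max(log_likelihood(x, g, omega2, phi, dt) for phi in phases)
    return omega1 if l1 >= l2 else omega2

# Demo: data generated at omega1 should be classified as omega1.
rng = random.Random(1)
g = 10.0 / (2.0 * math.pi)
dt = 0.5
w1, w2 = g, g + 0.2
phi_true = rng.uniform(0.0, 2.0 * math.pi)
x = [1 if rng.random() < p_success(k * dt, g, w1, phi_true, dt) else 0
     for k in range(1000)]
est = discriminate_fb(x, g, w1, w2, dt)
print(est == w1)
```

In a real application the phase grid (or a continuous optimizer) would have to be fine enough that the grid error is negligible compared to the likelihood difference between the two frequencies.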

Deep Learning method
To overcome the lack of knowledge of the model, we suggest using a supervised DL model, which we denote by M DL .
In particular, we consider a feed-forward fully connected neural network. The main reason for choosing a fully connected neural network is that the signal (frequency) information is encoded in the correlations between the values of different input neurons (measurement results), and in particular between far-apart input neurons. Similar to M corr , we use a training dataset of measurement results of known signals (known labels) to train M DL . We denote the labels of the two frequencies by 0 and 1. M DL is then applied to a test dataset and results in estimations of the frequencies of the test measurement results. Our DL model is a feed-forward fully connected neural network of four layers (two hidden layers), as depicted in Fig. 3. While two hidden layers are sufficient for the scenarios considered in this work, more complex noise models may require deeper neural networks. The first layer is called the input layer. The neurons of the input layer pass the input data, in our case the measurement results x of a single experiment, to the second layer. The output of neuron j in the second (hidden) layer is given by f j (z) = f (Σ i w ij x i + b j ), where f is the activation function, and w ij and b j are the weights and biases respectively. For the hidden layers we use the rectified linear unit (ReLU) activation function, f (z) = max(0, z).
The output of the second layer is then fed as an input to the third layer, and so on. The last layer is called the output layer. In our model the output layer has one neuron, whose low and high activation levels are associated with the two possible labels (frequencies). For the output neuron we use the Sigmoid activation function. We use the mean squared error (MSE) between the outputs of the learning model and the labels of the train set as the loss function, which is minimized during training by optimizing the weights and biases of the model. Note that there is no special reason for choosing the Sigmoid activation function with the MSE loss function; the softmax activation function together with the cross-entropy loss function may be used as well. Overfitting is avoided by restricting the total number of nodes in the network (and hence the number of free variables). In particular, for the examples considered in this work we use a second layer of 20 nodes and a third layer of 35 nodes (a small modification of the number of neurons in each of the two hidden layers would not change the model's accuracy significantly). Regarding the test dataset, after the application of the Sigmoid activation function on the output of M DL , we label the output by 1 or 0 according to whether it is > 0.5 or < 0.5 respectively. We then calculate P M DL by the loss function (the MSE) between the output labels and the true labels.
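A minimal sketch of the described 1000-20-35-1 architecture (forward pass only) is given below. The training loop, i.e., gradient descent on the MSE loss, is omitted, and the uniform weight initialization is our own illustrative choice, not something specified in the text.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), row by row."""
    return [activation(sum(w * xi for w, xi in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    # Illustrative small uniform initialization (scale 1/sqrt(n_in)).
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
layers = [init_layer(1000, 20, rng),   # input -> first hidden (ReLU)
          init_layer(20, 35, rng),     # first hidden -> second hidden (ReLU)
          init_layer(35, 1, rng)]      # second hidden -> output (Sigmoid)

def forward(x):
    h = dense(x, *layers[0], relu)
    h = dense(h, *layers[1], relu)
    return dense(h, *layers[2], sigmoid)[0]

def mse(outputs, labels):
    return sum((o - y) ** 2 for o, y in zip(outputs, labels)) / len(outputs)

x = [rng.choice([0, 1]) for _ in range(1000)]  # one experiment's bit string
out = forward(x)
label = 1 if out > 0.5 else 0                  # thresholding described above
print(label)
```

With trained weights, `out` is the Sigmoid activation whose thresholding at 0.5 yields the estimated frequency label.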

Numerical analysis
As a way of testing the performance of M DL in terms of frequency discrimination, we constructed numerical sets of measurement results, x, according to Eq. (2) for two different frequencies, where the phase, φ i , was chosen randomly (uniformly distributed between 0 and 2π) for each x. The input data were generated with g 1 = g 2 = ω 1 = 10/(2π) Hz, ω 2 = ω 1 + ∆ω, ∆t = 0.5 sec, and a total measurement time of T tot = 500 sec (1000 measurements). Part of the datasets was used for training and the remainder for testing the learning model. We compared the performance of M F B to that of M DL and M corr . In Fig. 4 we show the discrimination error probabilities, P M F B , P M DL , and P M corr , as a function of the frequency difference, ∆ω, between the two signals, as well as the corresponding M DL receiver operating characteristic (ROC) curves and areas under the curve (AUC). We considered a first layer of 1000 nodes (1000 measurements), a second layer of 20 nodes, and a third layer of 35 nodes. This choice of the number of nodes limits the free-variable space and allows us to avoid overfitting without resorting to regularization methods. In this ideal scenario, both M corr and M DL approach the optimal performance of M F B , even though both methods have no a priori information on the physical model.
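The two figures of merit used in Fig. 4 can be computed as in the stdlib sketch below: the symmetric error probability of Eq. (3) as the average of the two conditional error rates, and the AUC as the rank statistic P(score of a positive > score of a negative). The example inputs are small hypothetical vectors, not the paper's data.

```python
def error_probability(estimates, truths):
    """Symmetric discrimination error of Eq. (3): average of the conditional
    error rates P(est = w_j | true = w_i), i != j, over the two classes."""
    errs = {}
    for w in set(truths):
        wrong = sum(1 for e, t in zip(estimates, truths) if t == w and e != w)
        total = sum(1 for t in truths if t == w)
        errs[w] = wrong / total
    return sum(errs.values()) / len(errs)

def auc(scores, labels):
    """Area under the ROC curve via its rank interpretation:
    P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy inputs to exercise both metrics.
est = ['w1', 'w1', 'w2', 'w2', 'w2']
tru = ['w1', 'w2', 'w2', 'w2', 'w1']
print(error_probability(est, tru))
print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 1]))
```

The rank form of the AUC avoids having to sweep the threshold explicitly; sweeping the threshold over the network's Sigmoid outputs would trace the ROC curve itself.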
In order to provide indications of the performance of M DL in real-world noisy scenarios, we further considered a few more noise models and assumed that these noise models are "unknown"; hence, they are not taken into account in the Bayesian methods M F B and M corr , which remain unchanged as described above. This serves as an indication of how much better the performance of M DL could be in comparison to M F B and M corr in a real-world scenario in which the noise model is truly unknown to some extent. The first noise model is still a phase noise. While previously we considered a random (uniformly distributed) phase of the signal that is constant during a single experiment, here we consider a scenario in which the random phase changes once during a single experiment, where the second random phase is also uniformly distributed. Moreover, the time interval in which the phase change occurs is also uniformly distributed over the time intervals of a single experiment (1000 time intervals). The discrimination error probabilities, P M F B , P M DL , and P M corr as a function of the frequency difference, ∆ω, between the two signals are shown in Fig. 5 (a). It is clear that while the phase noise damages the discrimination capability of M F B and M corr , M DL is capable of learning the noise model. The second noise model considers a magnetic noise δb, which adds a corresponding shift to the Hamiltonian of the probe, Eq. (1). Similar to the phase noise, we assume that δb changes once during a single experiment and that the time interval in which the change of δb occurs is uniformly distributed over the time intervals of a single experiment. Each of the two values of δb is Normally distributed with a zero mean and a standard deviation of σ = g i /5 = 2/(2π) Hz. The discrimination error probabilities, P M F B , P M DL , and P M corr as a function of the frequency difference, ∆ω, between the two signals are shown in Fig. 5 (b). In this case M DL handles the magnetic noise better than M F B and much better than M corr . In the third noise model we consider noise in the amplitude of the signal. Specifically, we assume that the amplitude value is different in each time interval and that it is Normally distributed with a mean of g = 10/(2π) Hz (the previous value of the non-noisy amplitude) and a standard deviation equal to the mean value, that is, σ = g = 10/(2π) Hz. It is apparent that M DL is still capable of learning the noise model, while the performance of M F B and M corr is severely degraded when we assume no further knowledge of the noise model. Of course, in case we do have more knowledge of the noise model, we may be able to modify the Bayesian methods accordingly. However, the implication of such a modification is that the optimization is performed with respect to a larger set of free variables, and therefore implies longer run times, while the DL run time remains unchanged. Moreover, the above results suggest that Bayesian methods could be very sensitive to the noise model; a minor unknown difference between the true noise model and the assumed noise model could result in a significantly reduced performance of the Bayesian method (say, for example, if there are three phase changes in a single experiment instead of two).
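The first two noise models can be sampled as sketched below: per-interval traces of the phase and of the magnetic offset δb, each changing once at a uniformly chosen interval, with δb drawn from N(0, g/5) as described above. The function names and seed are illustrative.

```python
import math
import random

def noisy_phase_trace(n_meas, rng):
    """Phase-noise model of Fig. 5(a): the uniformly random phase changes
    once per experiment, at a uniformly chosen measurement interval."""
    phi_a = rng.uniform(0.0, 2.0 * math.pi)
    phi_b = rng.uniform(0.0, 2.0 * math.pi)
    jump = rng.randrange(n_meas)              # interval of the phase change
    return [phi_a if k < jump else phi_b for k in range(n_meas)]

def noisy_field_trace(n_meas, g, rng):
    """Magnetic-noise model of Fig. 5(b): an offset delta_b ~ N(0, g/5)
    that also changes once, at a uniformly chosen interval."""
    db_a = rng.gauss(0.0, g / 5.0)
    db_b = rng.gauss(0.0, g / 5.0)
    jump = rng.randrange(n_meas)
    return [db_a if k < jump else db_b for k in range(n_meas)]

rng = random.Random(3)
phis = noisy_phase_trace(1000, rng)
dbs = noisy_field_trace(1000, 10.0 / (2.0 * math.pi), rng)
print(len(phis), len(dbs))
```

These traces would then be fed, interval by interval, into the measurement probability when generating the training and test bit strings.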

Experimental verification
The NV center in diamond [24][25][26] is one of the leading quantum probe systems for sensing, imaging and spectroscopy. Here we considered frequency discrimination of measurement results obtained by a single NV center in ambient conditions. Two artificial signals were produced by a signal generator with frequencies ω 1 = 2π × 250 Hz and ω 2 = 2π × 251.6 Hz. Each signal was measured for a total measurement time of T tot = 220 sec, with a time interval of ∆t = 10 µs. From the raw data, we generated strings of 25000 measurement results (T tot = 0.25 sec) such that the phase corresponding to each x can be considered a random phase (no phase relation), and the frequencies cannot be resolved by a Fourier transform (see Fig. 1 (left)). The photon-detection efficiencies of a true detection (m s = 0) and a false detection (m s = −1) were low, ∼ 7.4% and ∼ 5.2% respectively, implying low SNR and contrast.
In order to achieve a theoretical bound on the discrimination error, we considered a theoretical model with a modified probability for a successful measurement, given by Eq. (10), where P (t) is given by Eq. (2), and η true and η f alse are the true and false detection efficiencies respectively. Assuming that η f alse = 0.7η true , we constructed numerical datasets according to Eq. (10), and set the amplitudes of the signals, g 1 and g 2 , and the efficiency η true for each signal to match the experimental results according to two constraints: (i) the power spectrum at the frequency of the signal of the numerical data was required to be approximately equal to the power spectrum of the experimental data; (ii) the averages of the experimental and numerical signals were required to match. For the numerical model we achieved P M F B ≈ 10.8% and P M DL ≈ 11.6% (see Fig. 6 (left), green square and red circle under the diamonds). These results are consistent with the experimental data, for which we obtained P exp M DL ≈ 12.1% (Fig. 6 (left), blue diamond), reaching P M F B without having any information on the model. Moreover, the Full Bayesian method on the experimental data obtained only P exp M F B ≈ 16.2% (Fig. 6 (left), green diamond). This difference is due to the fact that the experimental statistics differ slightly from our probability function; while for the Bayesian method this creates a problem, the DL method is able to learn this difference and take it into account. This difference is expected to be much more dramatic in real nano-NMR experiments, in which there are many more uncertainties in the model. In addition, we analyzed P M F B and P M DL on the numerical data. It is worth noting that due to the relatively large window size of 25000, a full analysis of M corr is not possible within a reasonable time scale on a common computer. A partial analysis (taking into account segments of two-point correlations only) of M corr on both the numerical model and the experimental data yielded P M corr ≈ 0.4. This indicates that DL could indeed be the better choice when there is a lack of knowledge of the model.
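Since Eq. (10) itself is not reproduced in this text, the following one-liner is our assumed reading of it, based on the description: the detected probability is a mixture of the true- and false-detection efficiencies weighted by the underlying state probability. The η values are the experimental ones quoted above.

```python
def detected_probability(p_signal, eta_true=0.074, eta_false=0.052):
    """Assumed reading of Eq. (10): a photon is registered with efficiency
    eta_true when the probe is in m_s = 0 (probability p_signal) and with
    efficiency eta_false when it is in m_s = -1 (probability 1 - p_signal)."""
    return eta_true * p_signal + eta_false * (1.0 - p_signal)

# Low efficiencies compress the contrast: even a perfect p_signal of 1.0
# yields only a ~7.4% detection probability, hence the low SNR.
print(detected_probability(1.0), detected_probability(0.0))
```

The narrow gap between the two extreme values (7.4% vs. 5.2%) is exactly the low contrast that makes long measurement strings (25000 results) necessary.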

Comparison to other Machine Learning methods
So far we have shown that DL methods are useful for the problem of frequency discrimination in the nano-NMR settings. In this section we ask whether other machine learning methods could be useful for this task and, if so, how these methods perform compared to DL.
Any method that is able to discriminate between two signals of nearby frequencies, as considered in the previous sections, must be able to learn and acquire the information on the signals from the correlations between different measurement results (different x i ). Hence, any successful discrimination method should involve some non-linearity. Indeed, a fully connected neural network with only linear layers fails in the considered discrimination problem (the achieved error probability is 1/2). We tested the performance of three other linear learning methods, namely logistic regression (with no interaction terms), K nearest neighbours, and support vector machines (SVM) with a linear kernel, in the ideal model scenario (Fig. 4). Similarly to a fully connected linear neural network, these methods completely fail to discriminate between the signals and achieve an error probability of 1/2 for all values of ∆ω.
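The failure of linear methods can be seen directly in simulation: averaged over the random phase, the per-bit mean (a linear statistic) is ≈ 1/2 for any frequency, while the lag-1 correlation C 1 (a quadratic statistic) clearly separates the frequencies. The sketch below demonstrates this with the same assumed stand-in for Eq. (2) as before and, purely for visual clarity, two widely separated frequencies rather than the nearby pair used in the paper.

```python
import math
import random

def simulate(omega, n_meas=1000, dt=0.5, g=10.0 / (2.0 * math.pi), rng=random):
    # Assumed oscillatory measurement probability (Eq. 2 is not reproduced here).
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return [1 if rng.random() < 0.5 * (1.0 + math.sin(
                g * dt * math.cos(omega * k * dt + phi))) else 0
            for k in range(n_meas)]

def mean_and_c1(omega, n_exp=50, rng=None):
    """Average bit value (linear statistic) and lag-1 correlation C_1
    (quadratic statistic), averaged over n_exp experiments."""
    rng = rng or random.Random(7)
    means, c1s = [], []
    for _ in range(n_exp):
        x = simulate(omega, rng=rng)
        s = [2 * b - 1 for b in x]            # map bits {0,1} to {-1,+1}
        means.append(sum(x) / len(x))
        c1s.append(sum(a * b for a, b in zip(s, s[1:])) / (len(s) - 1))
    return sum(means) / n_exp, sum(c1s) / n_exp

w1 = 10.0 / (2.0 * math.pi)
m1, c1_a = mean_and_c1(w1)
m2, c1_b = mean_and_c1(2.0 * w1)
# The linear statistic is ~0.5 for both frequencies; C_1 separates them.
print(round(m1, 2), round(m2, 2))
```

Any purely linear model sees only (weighted sums of) the individual bits, whose expectations coincide for the two frequencies, which is why a non-linearity over products of bits is essential.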
Regarding non-linear models, we considered two models: SVM with the non-linear radial basis function (rbf) kernel, and XGBoost, which is an implementation of gradient boosted decision trees (in our case we consider non-linear boosting, as linear boosting fails). We tested these two models in the ideal model scenario (Fig. 4) as well as in the mixed noise model scenario (see Fig. 5 (d)). The results are shown in Fig. 7. It can be seen that these two methods achieve accuracies (discrimination error probabilities) which are very similar to the accuracies obtained by DL. However, there is a big difference in terms of required computational resources, as these two methods consume more memory than DL and require much longer running times of, for example, ∼ 20 hours compared to ∼ 20 minutes for DL. Indeed, it is not feasible to use these methods for the discrimination in the case of the experimental data, where the size of the inputs is much larger (input strings of 25000 compared to input strings of 1000). While we have not made an exhaustive study and analysis of machine learning methods, which is beyond the scope of this work, these findings strengthen the possible advantage and benefit of DL methods for the data processing of nano-NMR experimental results.

Figure 9 .
Figure 9. Left: discrimination error probabilities (Eq. (3)): Full Bayesian, P M F B (green squares), Deep Learning, P M DL (red circles), and correlations, P M corr (blue hexagons), as a function of the frequency difference, ∆ω. The input data were produced according to Eq. (12) with T tot = 2T 2 . Right: ROC curve and AUC of M DL for different values of ∆ω, corresponding to the second, fourth, and sixth points from left in the left figure.

For the resolution problem, M F B compares the likelihoods of the single-frequency and two-frequency hypotheses for a given frequency difference between the two frequencies. We generated many sets of random OU processes, denoted by O k , and calculated the corresponding likelihoods. We estimated the signal as a single-frequency signal or as a two-frequency signal according to the larger likelihood. Fig. 9 (left) shows the error probability as a function of the frequency difference. The M DL results were better than the results of M corr as well as the results of M F B . Interestingly, even though M F B has full knowledge of the noise model, it achieves a larger error probability than M DL . We note that increasing the number of OU processes, O k , in the above likelihood calculation does not improve P M F B . While M DL and M corr could reach a result within ∼ 45 min, M F B did so within ∼ 7 hours (CPU times, both considered on the same common PC without utilizing a GPU). The M DL ROC curves and AUC are shown in Fig. 9 (right). These numerical results provide a strong indication that DL methods can potentially identify molecules based on their NMR signal extremely fast, which may be a useful tool for probing chemical reactions at the nanoscale.
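The random OU processes O k entering the likelihood can be sampled exactly with the standard discrete-time update O_{k+1} = O_k e^{−∆t/τ} + σ√(1 − e^{−2∆t/τ}) ξ_k, with ξ_k a standard normal. The sketch below uses illustrative values of the correlation time τ and stationary standard deviation σ, as the paper's parameters (around Eq. (12)) are not reproduced in this text.

```python
import math
import random

def ou_trace(n_steps, dt, tau, sigma, rng):
    """Exact discrete-time sampling of an Ornstein-Uhlenbeck process with
    correlation time tau and stationary standard deviation sigma."""
    decay = math.exp(-dt / tau)
    kick = sigma * math.sqrt(1.0 - decay ** 2)
    o = rng.gauss(0.0, sigma)             # start in the stationary state
    trace = [o]
    for _ in range(n_steps - 1):
        o = o * decay + kick * rng.gauss(0.0, 1.0)
        trace.append(o)
    return trace

rng = random.Random(11)
trace = ou_trace(5000, dt=0.5, tau=50.0, sigma=1.0, rng=rng)
mean = sum(trace) / len(trace)
var = sum((v - mean) ** 2 for v in trace) / len(trace)
print(round(mean, 2), round(var, 2))
```

Because this update is exact for any step size, each of the "many sets of random OU processes" used in the likelihood calculation can be generated at the measurement grid ∆t directly, without sub-stepping.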

Theoretical implications
While the numerical advantages of machine learning methods have already been shown 28,29 , their theoretical value has not been demonstrated before. Beyond the practical interest of utilizing machine learning methods in the nano-NMR frequency resolution problem, machine learning methods, and in particular DL methods, could also have considerable theoretical value.
Generally, in estimation problems, the MSE of an estimator M for a given unseen test input x can be written as (Eq. (16))

MSE [M (x)] = E[(y − M (x))²] = Bias [M (x)]² + Var [M (x)],

where y is the true label, M (x) is the estimated label, E is the expectation value with respect to the training set, the bias of M is given by Bias [M (x)] = E[M (x)] − y, and the variance is Var [M (x)] = E[(M (x) − E[M (x)])²]. An unbiased estimator is an estimator M for which Bias [M (x)] = 0. An optimal unbiased estimator has a minimal variance, and is known as the minimum variance unbiased (MVU) estimator. However, from Eq. (16) it is seen that an MVU estimator is not necessarily an optimal estimator that minimizes the MSE. Indeed, it is known that biased methods can outperform unbiased ones [30][31][32] . In this case the magnitude of the bias is increased, Bias [M (x)]² > 0, but the variance Var [M (x)] is significantly decreased, such that the MSE is smaller than the MSE of an MVU estimator. Such strategies of error reduction are used ubiquitously in image restoration 33,34 and beamforming applications 35,36 . Moreover, it is known that biased methods can be superior in various spectral analysis applications 37 . Despite their superiority, there are only a few structured methods by which such a biased estimator can be constructed 31 , and in most cases the search for such estimators is extremely challenging, especially as it is unknown whether such an estimator exists.
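The bias-variance trade-off of Eq. (16) can be demonstrated numerically: for a small true mean, an estimator shrunk toward zero incurs a small squared bias but a large variance reduction, so its MSE beats that of the unbiased sample mean (the MVU estimator in this toy Gaussian problem). All parameters below are illustrative.

```python
import random

def estimate_mse(estimator, true_value, n_trials, n_samples, noise_sd, rng):
    """Monte-Carlo MSE, bias and variance of an estimator of a Gaussian mean."""
    ests = []
    for _ in range(n_trials):
        sample = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_samples)]
        ests.append(estimator(sample))
    mean_est = sum(ests) / n_trials
    bias = mean_est - true_value
    var = sum((e - mean_est) ** 2 for e in ests) / n_trials
    mse = sum((e - true_value) ** 2 for e in ests) / n_trials
    return mse, bias, var

rng = random.Random(5)
sample_mean = lambda s: sum(s) / len(s)        # unbiased (MVU here)
shrunk_mean = lambda s: 0.5 * sum(s) / len(s)  # biased toward zero

mse_u, bias_u, var_u = estimate_mse(sample_mean, 0.1, 20000, 5, 1.0, rng)
mse_b, bias_b, var_b = estimate_mse(shrunk_mean, 0.1, 20000, 5, 1.0, rng)
# Shrinking trades a little bias^2 for a large variance reduction,
# so the biased estimator attains the lower MSE when the true mean is small.
print(round(mse_u, 3), round(mse_b, 3))
```

This mirrors the argument in the text: beating the MVU estimator requires accepting a bias, and the difficulty lies in finding a bias whose variance payoff is worthwhile.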
Our numerical analysis of M F B converged to the final result; however, the method resolved the two frequencies with a higher error rate than that of M DL . Since M F B is an MVU estimator, our results indicate that for the model at hand the unbiased full-model Bayesian analysis is not optimal, and that a superior biased method exists. This highlights an extra advantage of DL, as the search for a biased method is usually done in an ad hoc manner. Moreover, in most cases there is no way of knowing whether a method superior to the unbiased one exists. Hence, our results provide some hope that DL methods could be used as an analytical tool for identifying superior estimators, and in particular for identifying the ultimate limits of resolution problems.

Conclusion
In conclusion, we showed that DL methods are able to mitigate the effect of the inherent strong noise in the nano-NMR settings. In particular, the DL neural networks effectively learn the noise model, even when no prior knowledge of the noise model is assumed. This is a crucial property of the DL methods, as in many realistic nano-NMR scenarios the noise model is complex and not accurately known. We investigated the performance of DL methods in the problems of frequency discrimination and frequency resolution. We showed that DL methods can analyze a test signal as accurately as numerically demanding Bayesian methods even when the Bayesian methods have full knowledge of the noise model and the DL methods have no prior knowledge at all. DL methods can perform better than Bayesian methods when the noise model is not precisely known, or when the noise model is known but complex. In the first case, DL methods can achieve better results than Bayesian methods since DL methods do not assume prior knowledge of the model, while Bayesian methods rely on precise knowledge of it; this was demonstrated in the case of frequency discrimination in the noisy scenario, as well as in the analysis of the experimental data. In the second case, the results of both methods may be similar, but the computational resources consumed by Bayesian methods can be much larger than those required by DL methods, as was demonstrated in the problem of frequency resolution of noisy signals.
Our results can be seen as a strong indication that DL methods will become the method of choice for analyzing spectroscopic nano-NMR data. In addition, our results indicate that DL methods could be utilized as a tool for identifying superior biased estimators and the ultimate limits of resolution problems, which are otherwise difficult to obtain 12.

Figure 1. Typical noisy data of the two different frequencies that we aim to discriminate in this work. The oscillating magnetic signals at the two frequencies suffer from strong phase noise and are read by an NV center, which adds quantum noise to the output binary signal (see Eq. 10). (Upper right): the time-trace binary signal at the first frequency of 250 Hz, together with its Fourier transform after subtracting the zero frequency (upper left). (Lower right): the time-trace binary signal at the second frequency of 251.6 Hz, and its Fourier transform (lower left).

Figure 2. The physical model. (a) The probe, which is initially polarized along x, freely evolves according to H_{s_i} (Eq. 1) for a short duration, ∆t, and is then measured along ŷ. In the measurement scheme of a single experiment, the sequence of probe operations consists of initialization, evolution, and measurement, which is repeated N times in the constant presence of a signal. In each experiment, the frequency of the signal is equal to one of two known frequencies, ω_1 and ω_2. (b) Our aim is to discriminate between the two frequencies. A single experiment results in a string of bits, x = {1, 0, 0, 1, ...}. Given x, we want to obtain an estimate of the frequency of the signal, ω_est = ω_1 or ω_est = ω_2.
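The single-experiment scheme above can be sketched numerically. The measurement law below, p_i = (1 + a·sin(ω t_i))/2, is a hypothetical stand-in for the actual probability of the paper's model, and all parameter values are illustrative; discrimination is done by comparing the exact log-likelihoods of the bit string under the two known frequencies:

```python
import numpy as np

# Hypothetical measurement law (not the paper's Eq. (1)): each measurement
# at time t_i yields 1 with probability p_i = (1 + a*sin(w*t_i)) / 2.
# We discriminate between two known frequencies w1, w2 by comparing the
# log-likelihoods of the observed bit string x.
rng = np.random.default_rng(1)
a, dt, N = 0.5, 0.1, 1000                     # amplitude, time step (s), shots
w1, w2 = 2 * np.pi * 1.0, 2 * np.pi * 1.05    # rad/s, illustrative values
t = dt * np.arange(N)

def sample_bits(w):
    # Draw one experiment: N Bernoulli outcomes with the sinusoidal bias.
    return (rng.random(N) < (1 + a * np.sin(w * t)) / 2).astype(int)

def log_likelihood(x, w):
    p = (1 + a * np.sin(w * t)) / 2
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

x = sample_bits(w1)                           # data generated at w1
w_est = w1 if log_likelihood(x, w1) > log_likelihood(x, w2) else w2
```

With these parameters the frequency gap accumulates a large phase difference over the measurement window, so the likelihood test identifies the true frequency reliably.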

Figure 4. Performance in the ideal model scenario. Left: discrimination error probabilities (Eq. (3)) as a function of the frequency difference, ∆ω. Full Bayesian, P_{M_FB} (green squares), Deep Learning, P_{M_DL} (red circles), correlations, P_{M_corr} (blue hexagons), and the analytical bound on P_{M_FB} (dashed black). The input data were generated according to Eq. (2) with g_1 = g_2 = ω_1 = 10/(2π) Hz, ω_2 = ω_1 + ∆ω, ∆t = 0.5 sec, and a total measurement time of T_tot = 500 sec (1000 measurements). Right: receiver operating characteristic (ROC) curve and area under the curve (AUC) of M_DL for different values of ∆ω, corresponding to the first, third, fifth, and seventh points from the left in the left panel.
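The ROC curves and AUC values reported here can be computed directly from the classifier's scores; one convenient route is the rank (Mann-Whitney) formulation of the AUC, sketched below with synthetic Gaussian scores (illustrative data, not the paper's):

```python
import numpy as np

# AUC equals the probability that a randomly drawn positive score exceeds a
# randomly drawn negative score (Mann-Whitney statistic), with ties counted
# as 1/2. No specific toolkit is assumed.
def roc_auc(scores_neg, scores_pos):
    s0, s1 = np.asarray(scores_neg), np.asarray(scores_pos)
    greater = (s1[:, None] > s0[None, :]).mean()
    ties = (s1[:, None] == s0[None, :]).mean()
    return greater + 0.5 * ties

rng = np.random.default_rng(2)
neg = rng.normal(0.0, 1.0, 500)   # scores for signals at w1 (illustrative)
pos = rng.normal(1.0, 1.0, 500)   # scores for signals at w2 (illustrative)
auc = roc_auc(neg, pos)
```

For two unit-variance Gaussians separated by one standard deviation, the expected AUC is Φ(1/√2) ≈ 0.76.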

Figure 5. Performance in noisy scenarios. Discrimination error probabilities (Eq. (3)) as a function of the frequency difference, ∆ω. Full Bayesian, P_{M_FB} (green squares), Deep Learning, P_{M_DL} (red circles), and correlations, P_{M_corr} (blue hexagons). (a) Phase noise: the random phase of the signal is changed once during a single experiment, at a random time interval. (b) Magnetic noise: the probe is subject to a random magnetic field, which is changed once during a single experiment, at a random time interval. (c) Amplitude noise: the amplitude of the signal takes a different (random) value in each time interval. (d) Mixed-noise scenario, which includes all of the above noise models. See text for more details. (e) ROC curve and AUC of M_DL for different values of ∆ω, corresponding to the second, fourth, sixth, eighth, and tenth points from the left in panel (d).
The discrimination error probabilities, P_{M_FB}, P_{M_DL}, and P_{M_corr}, as a function of the frequency difference, ∆ω, between the two signals are shown in Fig. 5(c). In this case M_DL performs slightly better than M_corr and better than M_FB. Lastly, we consider the mixed-noise scenario, in which all three of the above noise models are included. The discrimination error probabilities as a function of ∆ω are shown in Fig. 5(d), and the corresponding M_DL ROC curves and AUC are shown in Fig. 5(e).
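The phase-noise scenario of panel (a) can be sketched as follows; the exact generator used in the paper is not reproduced here, so the form below (a single random phase jump at a random time bin, with both phases drawn uniformly) is an assumption for illustration:

```python
import numpy as np

# Phase-noise sketch (assumed form, not the paper's exact generator): the
# signal's phase is redrawn once, at a random time during the experiment,
# destroying correlations across the jump.
rng = np.random.default_rng(3)
w, dt, N = 2 * np.pi * 1.0, 0.1, 1000
t = dt * np.arange(N)

jump = int(rng.integers(1, N))                 # random time bin of the jump
phi_before, phi_after = rng.uniform(0.0, 2 * np.pi, size=2)
phase = np.where(np.arange(N) < jump, phi_before, phi_after)
signal = np.sin(w * t + phase)                 # signal seen by the probe
```

The magnetic- and amplitude-noise scenarios can be sketched analogously by randomizing the detuning or the amplitude per time interval.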

Figure 6. Performance in the low-efficiency model scenario. Left: discrimination error probabilities (Eq. (3)). Full Bayesian, P_{M_FB} (green squares), and Deep Learning, P_{M_DL} (red circles), on numeric data; Full Bayesian, P^exp_{M_FB} (green diamond), and Deep Learning, P^exp_{M_DL} (blue diamond), on the experimental data, as a function of the frequency difference, ∆ω. The input numeric data were produced according to Eq. (10) with g_1 = 12.5 kHz, g_2 = 11.25 kHz, ω_1 = 250 Hz, ω_2 = ω_1 + ∆ω, ∆t = 10 µsec, and a total measurement time of T_tot = 0.25 sec (25000 measurements). Right: ROC curve and AUC of M_DL on the experimental data, corresponding to the blue diamond in the left panel.

Figure 9. Performance in the noisy frequency resolution scenario. Left: discrimination error probabilities (Eq. (3)). Full Bayesian, P_{M_FB} (green squares), Deep Learning, P_{M_DL} (red circles), and correlations, P_{M_corr} (blue hexagons), as a function of the frequency difference, ∆ω. The input data were produced according to Eq. (12) with T_tot = 2T_2. Right: ROC curve and AUC of M_DL for different values of ∆ω, corresponding to the second, fourth, and sixth points from the left in the left panel.
2, and Var[ε] is the irreducible error due to the (zero-mean) noise ε. The error probability, P_M, is then obtained as the expectation value of the MSE, E[(y − M(x))²], with respect to the test set.
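For binary labels y ∈ {0, 1} and hard decisions M(x) ∈ {0, 1}, the test-set mean of (y − M(x))² reduces exactly to the fraction of misclassified examples, i.e., the empirical error probability P_M. A minimal check with made-up labels and predictions:

```python
import numpy as np

# With binary labels and binary decisions, (y - M(x))^2 is 1 on every
# misclassified example and 0 otherwise, so the MSE over the test set
# is exactly the empirical misclassification rate.
y    = np.array([0, 1, 1, 0, 1])   # true labels (illustrative)
pred = np.array([0, 1, 0, 0, 0])   # model decisions (illustrative)
p_err = np.mean((y - pred) ** 2)
print(p_err)  # 0.4: two of five examples are misclassified
```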