Abstract
The growing field of nano nuclear magnetic resonance (nanoNMR) seeks to estimate spectra, or discriminate between spectra, of minuscule amounts of complex molecules. While this field holds great promise, nanoNMR experiments suffer from detrimental inherent noise. This strong noise masks the weak signal and results in a very low signal-to-noise ratio. Moreover, the noise model is usually complex and unknown, which makes the processing of the measurement results very complicated. Hence, spectral discrimination is hard to achieve, and optimal discrimination is particularly difficult to reach. In this work we present strong indications that this difficulty can be overcome by deep learning (DL) algorithms, which can efficiently mitigate the adverse effects of the noise by learning the noise model. We show that in the case of frequency discrimination, DL algorithms reach the optimal discrimination without any prior knowledge of the physical model. Moreover, the DL discrimination scheme outperforms Bayesian methods when verified on noisy experimental data obtained by a single Nitrogen-Vacancy (NV) center. In the case of frequency resolution, we show that this approach outperforms Bayesian methods even when the latter have full prior knowledge of the noise model and the former has none. These DL algorithms also prove to be much more efficient in terms of computational resources and run times. Since in many real-world scenarios the noise is complex and difficult to model, we argue that DL is likely to become a dominant tool in the field.
Introduction
The newly developed discipline of nanoNMR^{1,2,3,4,5,6,7} aims to reduce the minimal NMR sample size by many orders of magnitude, thereby increasing the NMR sensitivity and spatial resolution down to a few molecules^{8}. This is achieved by replacing the macroscopic coil of the NMR setup, which measures the magnetic field, with a single controllable spin or an ensemble of such spins, e.g., NV centers in diamond, which serve as tiny magnetometers. Recent experiments have shown that it is possible to estimate the spectrum of artificial signals and of signals from polarized samples with high resolution^{9,10,11,12,13}. However, the obvious advantages of obtaining spectral information about tiny quantities of molecules are masked by the extra noise that accompanies most configurations of this setup. This extra noise has several sources, including the NV coherence time (magnetic noise), the controller noise (laser and microwave operations), and, most importantly, the diffusion-induced noise. The diffusion-induced noise is negligible in the regular NMR setup but extremely large in the nanoNMR setup, where it broadens the linewidth beyond the required resolution. This noise creates a serious bottleneck, as the crucial information is encoded in the tiny chemical shifts and the small energy gaps caused by J-couplings. That is, the nanoNMR setup is usually characterized by a weak measured signal masked by strong noise.
Moreover, the precise noise model is usually complex and unknown. Consequently, achieving spectral discrimination between weak and similar signals of nearby frequencies is a formidable data-processing challenge. In particular, conventional data-analysis methods can usually reach optimal discrimination only when the noise model is fully known. When the noise model is complex and unknown, it is difficult to tackle this noise and reach the optimal discrimination with such methods.
In this work we show that the challenge of spectral discrimination between weak and similar signals in the presence of strong and complex noise can be efficiently addressed by DL algorithms, which effectively learn the noise model. Moreover, we show that DL methods are capable of learning the noise model from a small amount of data, which only needs to be gathered for a few minutes. This means that a DL algorithm can analyze a test signal with the same efficiency as numerically demanding Bayesian methods that rely on precise knowledge of the model. In addition, we show that DL methods can be extremely useful for challenging frequency-resolution problems. They can possibly outperform Bayesian methods even under the assumption that the latter have full knowledge of the model and infinite computing power.
DL techniques have been successfully applied to spectral data in the fields of astronomy, chemistry, geosciences, and bioinformatics^{14}. Spectral data from these disciplines pose similar challenges:

(1) high data dimensionality;
(2) difficulty of modeling the important features from first principles;
(3) dirty environments with many classes of objects that need to be differentiated, along with varying signal intensities;
(4) the importance of subtle differences in the signal.

These difficulties apply in our context as well. Despite them, impressive achievements have been made, such as the detection of narcotics in Raman spectroscopy data with a 0.5% error rate^{15}.
DL methods have also been used for the analysis of NMR data. Examples include automated peak picking in NMR spectra, both for protein structure determination^{16} and for biological macromolecules^{17}. More recently, DL methods have been used to analyze a variety of spectral images of proteins, using a support vector machine classifier combined with histogram-of-oriented-gradients features^{18} and using convolutional neural networks^{19}. In addition, deep learning techniques such as long short-term memory networks^{20} and variational autoencoder networks^{21} have been used in NMR applications for material characterization and subsurface characterization.
We believe that the success of DL methods in the analysis of regular NMR data should be amplified in the nanoNMR setup, owing to the larger amount of noise in this setting. This noise originates from two main ingredients that are absent in the regular NMR setting. The first ingredient is the origin of the signal. While in regular NMR the signal is created by thermal polarization, in nanoNMR it is created by statistical polarization^{22}, which implies that the noise is stronger in the nanoNMR setup. The second ingredient is the quantum projection noise, which is of a Poissonian or a Bernoulli nature and in many cases is the dominant source of noise. Here we provide evidence that DL methods can tackle these noises efficiently.
To evaluate the efficiency of DL methods for nanoNMR spectroscopy, we consider two problems: frequency discrimination and frequency resolution. We first examine the ability of DL methods to discriminate between two signals corresponding to two different frequencies. In particular, we consider data from signals that were read by an NV center, which simulate noisy nanoNMR data. Typical data for these two frequencies are shown in Fig. 1, which presents two time traces of the datasets together with their Fourier transforms. It is immediately clear that it is impossible to discriminate between the two frequencies based on the Fourier transform alone, because the signal has strong phase noise on top of the detection noise. In this work we show that DL methods are able to classify the data with the same efficiency as Bayesian methods, which use full knowledge of the signal and noise model and are numerically much more demanding than DL methods. The advantage of DL methods is also indicated by their superior performance in frequency discrimination of the experimental data, where the signal and noise models are not fully known.
We then employ DL methods to tackle the problem of frequency resolution in a noisy environment. We show that DL methods can efficiently discriminate between the signal of a single frequency and the signal of two nearby frequencies subject to strong amplitude and phase noise.
Our results strongly suggest that DL methods can effectively learn the physical and noise models and thereby constitute an efficient alternative to Bayesian methods, which require a priori knowledge of the physical and noise models.
Frequency Discrimination
The physical model
We consider the problem of discrimination between two signals corresponding to two different frequencies by a single quantum probe. In the nanoNMR setup this corresponds, for example, to the scenario where a single NV center, which serves as a tiny magnetometer, is placed in the proximity of a sample that contains two known molecules between which we wish to discriminate. Specifically, in the presence of a single frequency signal (a single molecule) the Hamiltonian of the spin probe is given by
where g_{i}, ω_{i}, and ϕ_{i} are the amplitude, frequency, and (random) phase of signal i respectively, which is the standard setting in nanoNMR experiments^{1,2,3,4,23}. The probe, which is initially polarized along \(\hat{x}\), freely evolves according to \({H}_{{s}_{i}}\) for a short duration, Δt, and then is measured along \(\hat{y}\). In the measurement scheme of a single experiment, the sequence of probe operations consists of initialization, evolution, and measurement, which is repeated many times under the constant presence of a signal (Fig. 2(a)). In the case of a single shot measurement, the measurement result is a sequence of zeros and ones, Fig. 1 (right), and the probability for a successful measurement (one) is given by
We start by considering an ideal scenario (no noise or inefficiencies) in which Eq. (2) holds. We assume the following:

- in each experiment the signal corresponds to one of two known frequencies (ω_{1} and ω_{2});
- the amplitudes of the signals are known;
- in each experiment the signal has an unknown, uniformly distributed random phase.

A single experiment results in a string of bits, x = {1, 0, 0, 1, …}, where 1 and 0 correspond to detection of the m_{s} = 0 state and the m_{s} = −1 state of the NV center, respectively. Given x, we want to estimate the frequency of the signal, ω_{est} = ω_{1} or ω_{est} = ω_{2} (Fig. 2(b)). We quantify the performance of a discrimination method M by the error probability of the frequency estimation, which is defined by
where P_{M} (ω_{est} = ω_{j}|ω_{i}) is the probability that method M outputs ω_{est} = ω_{j} given that the frequency of the signal is ω_{i}.
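The sketch below, in Python/NumPy, generates labelled single-shot datasets of the kind described above and estimates the error probability from predictions. It makes two assumptions:

- **Probability model.** Eq. (2) is not reproduced in this text, so `toy_probability` is a hypothetical stand-in with an assumed contrast and frequencies in rad/s, used only to exercise the code.
- **Equal priors.** The two frequencies are assumed equally likely, so the error probability is taken as the average of the two conditional errors P_M(ω_est = ω_j | ω_i), j ≠ i.

```python
import numpy as np

def toy_probability(omega, phase, n_meas=1000, dt=0.5, contrast=0.3):
    """Hypothetical stand-in for Eq. (2): P(t) = [1 + contrast*cos(omega*t + phase)] / 2."""
    t = dt * np.arange(n_meas)  # measurement times 0, dt, 2dt, ...
    return 0.5 * (1.0 + contrast * np.cos(omega * t + phase))

def simulate_experiment(omega, rng, **model_kwargs):
    """One experiment: a uniformly random phase, then one Bernoulli bit per interval."""
    phase = rng.uniform(0.0, 2.0 * np.pi)
    p = toy_probability(omega, phase, **model_kwargs)
    return (rng.random(p.shape) < p).astype(np.int8)

def make_dataset(omega1, omega2, n_per_class, rng, **model_kwargs):
    """Balanced dataset: label 0 for omega1, label 1 for omega2."""
    X = np.array([simulate_experiment(w, rng, **model_kwargs)
                  for w in (omega1, omega2) for _ in range(n_per_class)])
    y = np.repeat([0, 1], n_per_class)
    return X, y

def error_probability(y_true, y_pred):
    """Equal-prior average of P_M(omega_est = omega_j | omega_i), j != i, over labels 0 and 1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 0.5 * sum(np.mean(y_pred[y_true == i] != i) for i in (0, 1))

rng = np.random.default_rng(0)
X, y = make_dataset(1.0, 1.1, n_per_class=50, rng=rng)
print(X.shape)                                        # (100, 1000)
print(error_probability([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.25
```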
Full Bayesian method
In the ideal scenario considered here, we have full knowledge of the model (Eq. (2)) and the only unknowns are the random phases ϕ_{i}. Hence, we can simply utilize a Full Bayesian method known as the likelihood-ratio test, denoted by M_{FB}, in which for each frequency we calculate the maximal log-likelihood over the unknown random phases. That is,
where
We estimate the frequency according to the larger likelihood; that is
As M_{FB} utilizes the maximal information on the signal, it obtains the minimal possible error for an unbiased estimator. It can therefore serve as a benchmark for evaluating the efficiency of a learning method, and its error probability serves as a lower bound for the DL method. It is known that Bayesian methods are optimal given the maximal amount of information, provided that the optimization can be done efficiently. This is usually not the case, particularly in a noisy environment. To verify that we indeed have the optimal method, we compare the results to an analytical calculation of the Fisher information (FI), which is possible in this case.
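The likelihood-ratio test can be sketched as follows, assuming:

- the same hypothetical toy probability model as above, used in place of Eq. (2);
- a grid search over 360 phases, used in place of the exact maximization over the unknown phase (the grid size is arbitrary).

Each phase hypothesis gives a Bernoulli log-likelihood of the observed bits. The frequency whose best phase yields the larger value is selected.

```python
import numpy as np

def toy_probability(omega, phase, n_meas, dt=0.5, contrast=0.3):
    """Hypothetical stand-in for Eq. (2); `phase` may be a column array of shape (m, 1)."""
    t = dt * np.arange(n_meas)
    return 0.5 * (1.0 + contrast * np.cos(omega * t + phase))

def max_log_likelihood(x, omega, n_phases=360, **model_kwargs):
    """Maximal Bernoulli log-likelihood of the bit string x over a grid of unknown phases."""
    x = np.asarray(x, dtype=float)
    phases = np.linspace(0.0, 2.0 * np.pi, n_phases, endpoint=False)[:, None]
    p = toy_probability(omega, phases, len(x), **model_kwargs)  # shape (n_phases, n_meas)
    loglik = (x * np.log(p) + (1.0 - x) * np.log1p(-p)).sum(axis=1)
    return loglik.max()

def full_bayes_estimate(x, omega1, omega2, **kwargs):
    """Likelihood-ratio test: 0 (omega1) or 1 (omega2), whichever has the larger maximal likelihood."""
    return int(max_log_likelihood(x, omega2, **kwargs) > max_log_likelihood(x, omega1, **kwargs))
```

The cost grows linearly with the number of phase hypotheses. Each additional unknown noise parameter would multiply the size of this search, which is the run-time issue raised later for noisy scenarios.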
In general, full knowledge is not available. This may be due to incomplete knowledge of the experimental noise model and detection inefficiencies, or to incomplete knowledge of the signal. In this case, we can utilize a correlation-based method, M_{corr}, for frequency discrimination. To this end, we first use a training set of measurement results, X_{train}, for which the frequency of the signal is known. For each x ∈ X_{train} we calculate the correlation vector \({C}_{k}={\langle {x}_{i}{x}_{i+k}\rangle }_{i}\) (here we replace the 0 bit by −1). Then, for each frequency we calculate the averaged correlation vector, \({C}^{{\omega }_{i}}={\langle {C}_{k}\rangle }_{x\in {X}_{train}^{{\omega }_{i}}}\), where \({X}_{train}={X}_{train}^{{\omega }_{1}}\cup {X}_{train}^{{\omega }_{2}}\). To estimate the frequency of an unknown signal we calculate its correlation vector, C_{k}, and then the distances
\[{D}_{{\omega }_{i}}=\|C-{C}^{{\omega }_{i}}\|,\quad i=1,2\qquad (7)\]
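by the L_{2} norm. We estimate the frequency according to the smaller distance; that is,
\[{\omega }_{est}=\arg \min_{{\omega }_{i}\in \{{\omega }_{1},{\omega }_{2}\}}{D}_{{\omega }_{i}}\qquad (8)\]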
This method, however, disregards higher-order correlation functions. It also disregards the finite precision of the estimated correlation functions themselves, which varies considerably between nearest-neighbour and larger separations. A method that takes all these effects into account should approach the optimum, but it is numerically very challenging, or even impossible, to apply to most problems of interest.
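The correlation method M_corr can be sketched as a nearest-template classifier. Each bit string is mapped to ±1, its lag-k correlations C_k are computed, and the string is assigned to the class whose averaged correlation vector is closer in L2 norm. The maximal lag `max_lag` is a hypothetical parameter, since the text does not state which lags are used.

```python
import numpy as np

def correlation_vector(x, max_lag):
    """C_k = <s_i s_{i+k}>_i for k = 1..max_lag, with bits mapped 0 -> -1 and 1 -> +1."""
    s = 2.0 * np.asarray(x, dtype=float) - 1.0
    n = len(s)
    return np.array([np.mean(s[: n - k] * s[k:]) for k in range(1, max_lag + 1)])

class CorrelationClassifier:
    """Nearest-template classifier on class-averaged correlation vectors."""

    def __init__(self, max_lag):
        self.max_lag = max_lag

    def fit(self, X_train, y_train):
        C = np.array([correlation_vector(x, self.max_lag) for x in X_train])
        y_train = np.asarray(y_train)
        # Averaged correlation vector C^{omega_i} for each label i in {0, 1}.
        self.templates_ = np.array([C[y_train == label].mean(axis=0) for label in (0, 1)])
        return self

    def predict(self, X):
        C = np.array([correlation_vector(x, self.max_lag) for x in X])
        # L2 distance of every correlation vector to both templates; choose the nearer one.
        dists = np.linalg.norm(C[:, None, :] - self.templates_[None, :, :], axis=2)
        return np.argmin(dists, axis=1)
```

Only two-point correlations enter here. This is the limitation noted above: the higher-order correlations and the lag-dependent precision of each C_k are ignored.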
Deep learning method
To overcome the lack of knowledge of the model, we suggest using a supervised DL model, which we denote by M_{DL}. In particular, we consider a feed-forward fully connected neural network. The main reason for this choice is that the signal (frequency) information is encoded in the correlations between the values of different input neurons (measurement results), in particular between far-apart input neurons.

Similar to M_{corr}, we use a training dataset of measurement results of known signals (known labels) to train M_{DL}. We denote the labels of the two frequencies by 0 and 1. M_{DL} is then applied to a test dataset and yields estimates of the frequencies of the test measurement results.

Our DL model is a feed-forward fully connected neural network of four layers (two hidden layers), as depicted in Fig. 3. While two hidden layers are sufficient for the scenarios considered in this work, more complex noise models may require deeper neural networks. The layers are as follows:

- The first layer is the input layer. Its neurons pass the input data, in our case the measurement results x of a single experiment, to the second layer.
- The output of neuron j in the second (hidden) layer is given by \({f}_{j}(z)=f(\sum _{i}\,{w}_{ij}{x}_{i}+{b}_{j})\), where f is the activation function, and w_{ij} and b_{j} are the weights and biases, respectively. For the hidden layers we use the rectified linear unit (ReLU) activation function, f(z) = max(0, z).
- The output of the second layer is then fed as an input to the third layer, and so on.
- The last layer is the output layer. In our model it has one neuron whose low and high activation levels are associated with the two possible labels (frequencies). For the output neuron we use the sigmoid activation function.
We use the mean squared error (MSE) between the outputs of the learning model and the labels of the training set as the loss function. This loss is minimized during training by optimizing the weights and biases of the model. Note that there is no special reason for choosing the sigmoid activation function with the MSE loss function; the softmax activation function together with the cross-entropy loss function may be used as well.

Overfitting is avoided by restricting the total number of nodes in the network (and hence the number of free variables). In particular, for the examples considered in this work we use a second layer of 20 nodes and a third layer of 35 nodes. A small modification of the number of neurons in either hidden layer would not change the model’s accuracy significantly.

For the test dataset, after applying the sigmoid activation function to the output of M_{DL}, we label the output by 1 or 0 according to whether it is >0.5 or <0.5, respectively. We then calculate \({P}_{{M}_{DL}}\) as the loss function (the MSE) between the output labels and the true labels. For binary labels this is simply the fraction of misclassified test examples.
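A minimal NumPy sketch of the network described above, with an n_in → 20 → 35 → 1 layout, ReLU hidden layers, a sigmoid output, an MSE loss and a 0.5 threshold. The following are assumptions, since the text does not specify them:

- plain full-batch gradient descent with manual backpropagation;
- He-style initialisation;
- the learning rate.

In practice a framework such as Keras or PyTorch would normally be used.

```python
import numpy as np

class TinyMLP:
    """Fully connected net n_in -> 20 -> 35 -> 1: ReLU hidden layers, sigmoid output, MSE loss."""

    def __init__(self, n_in, hidden=(20, 35), seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_in, *hidden, 1]
        # He-style random initialisation (an assumption; the text does not specify one).
        self.W = [rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, X):
        """Return the sigmoid outputs and the input of every layer (kept for backpropagation)."""
        acts = [np.asarray(X, dtype=float)]
        for W, b in zip(self.W[:-1], self.b[:-1]):
            acts.append(np.maximum(0.0, acts[-1] @ W + b))  # ReLU hidden layer
        z = (acts[-1] @ self.W[-1] + self.b[-1])[:, 0]
        out = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid
        return out, acts

    def train_step(self, X, y, lr=0.1):
        """One full-batch gradient-descent step on the MSE loss; returns the loss before the step."""
        out, acts = self.forward(X)
        y = np.asarray(y, dtype=float)
        loss = np.mean((out - y) ** 2)
        # dLoss/dz at the output neuron: 2 (out - y) / N times the sigmoid derivative out (1 - out).
        delta = (2.0 * (out - y) / len(y) * out * (1.0 - out))[:, None]
        for layer in reversed(range(len(self.W))):
            grad_W, grad_b = acts[layer].T @ delta, delta.sum(axis=0)
            if layer > 0:
                # Propagate through this layer's (not yet updated) weights and the previous ReLU.
                delta = (delta @ self.W[layer].T) * (acts[layer] > 0)
            self.W[layer] -= lr * grad_W
            self.b[layer] -= lr * grad_b
        return loss

    def predict(self, X):
        """Hard labels: 1 where the sigmoid output exceeds 0.5, otherwise 0."""
        return (self.forward(X)[0] > 0.5).astype(int)
```

For the 1000-bit inputs of the numerical analysis one would build `TinyMLP(n_in=1000)` and call `train_step` repeatedly on the training set. `np.mean((model.predict(X_test) - y_test) ** 2)` is then the MSE between hard labels, which equals the misclassification rate.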
Numerical analysis
To test the performance of M_{DL} in frequency discrimination, we constructed numerical sets of measurement results, x, according to Eq. (2) for two different frequencies. The phase, ϕ_{i}, was chosen randomly (uniformly distributed between 0 and 2π) for each x. The input data were generated with g_{1} = g_{2} = ω_{1} = 10/(2π) Hz, ω_{2} = ω_{1} + Δω, Δt = 0.5 sec, and a total measurement time of T_{tot} = 500 sec (1000 measurements). Part of each dataset was used for training and the remainder for testing the learning model.

We compared the performance of M_{FB} to that of M_{DL} and M_{corr}. Fig. 4 shows the discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of the frequency difference, Δω, between the two signals. It also shows the corresponding M_{DL} receiver operating characteristic (ROC) curves and areas under the curve (AUC). We used a first layer of 1000 nodes (1000 measurements), a second layer of 20 nodes, and a third layer of 35 nodes. This choice of the number of nodes limits the free-variable space and allows us to avoid overfitting without resorting to regularization methods. In this ideal scenario, both M_{corr} and M_{DL} approach the optimal performance of M_{FB}, even though neither method has a priori information on the physical model.
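The ROC curves and AUC values mentioned above can be computed from the continuous sigmoid outputs. The sketch below is a generic NumPy implementation, not necessarily the one used for the figures:

- **ROC curve.** The scores are thresholded at every distinct value to trace the curve.
- **AUC.** It is computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as 1/2. This equals the trapezoidal area under the ROC curve.

```python
import numpy as np

def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by thresholding the scores at every distinct value."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    thresholds = np.unique(scores)[::-1]  # from the strictest to the most lenient threshold
    fpr = np.array([0.0] + [np.mean(neg >= t) for t in thresholds])
    tpr = np.array([0.0] + [np.mean(pos >= t) for t in thresholds])
    return fpr, tpr

def auc(scores, labels):
    """Area under the ROC curve as P(score_pos > score_neg), counting ties as 1/2."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise comparison needs memory proportional to n_pos × n_neg, which is acceptable for test sets of a few thousand examples.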
To provide indications of the performance of M_{DL} in real-world noisy scenarios, we further considered several additional noise models and assumed that they are “unknown”. Hence, they are not taken into account in M_{FB} and M_{corr}, which remain unchanged as described above. This indicates how much better M_{DL} could perform compared with M_{FB} and M_{corr} in a real-world scenario in which the noise model is truly unknown to some extent.

The first noise model is again a phase noise. Previously, the random (uniformly distributed) phase of the signal was constant during a single experiment. Here we consider a scenario in which the random phase changes once during a single experiment, and the second random phase is also uniformly distributed. Moreover, the time interval at which the phase change occurs is uniformly distributed over the time intervals of a single experiment (1000 time intervals). The discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of the frequency difference, Δω, between the two signals are shown in Fig. 5(a). It is clear that the phase noise degrades the discrimination capability of M_{FB} and M_{corr}, whereas M_{DL} is capable of learning the noise model.

The second noise model considers a magnetic noise δb, which modifies the Hamiltonian of the probe, Eq. (1), to
Similar to the phase noise, we assume that δb changes once during a single experiment and that the time interval at which the change of δb occurs is uniformly distributed over the time intervals of a single experiment. Each of the two values of δb is normally distributed with zero mean and a standard deviation of σ = g_{i}/5 = 2/(2π) Hz. The discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of Δω are shown in Fig. 5(b). In this case M_{DL} handles the magnetic noise better than M_{FB} and much better than M_{corr}.

In the third noise model we consider noise in the amplitude of the signal. Specifically, we assume that the amplitude value is different in each time interval and normally distributed. Its mean is g = 10/(2π) Hz (the previous value of the noiseless amplitude), and its standard deviation equals the mean, that is, σ = g = 10/(2π) Hz. The discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of Δω are shown in Fig. 5(c). In this case M_{DL} performs slightly better than M_{corr} and better than M_{FB}.

Lastly, we consider the mixed-noise scenario in which all three of the above noise models are included. The discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of Δω are shown in Fig. 5(d), and the corresponding M_{DL} ROC curves and AUC are shown in Fig. 5(e). It is apparent that M_{DL} is still capable of learning the noise model. In contrast, the performance of M_{FB} and M_{corr} is severely degraded when we assume no further knowledge of the noise model. Of course, if more is known about the noise model, we may be able to modify M_{FB} and M_{corr} accordingly.
Such a modification, however, means that the optimization is performed over a larger set of free variables. This implies longer run times, whereas the DL run time remains unchanged. Moreover, the above results suggest that Bayesian methods could be very sensitive to the noise model. A minor unknown difference between the true and the assumed noise models (for example, three different phases in a single experiment instead of two) could significantly reduce the performance of the Bayesian method.
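The three noise models used above can be generated per experiment as follows:

- **Phase noise.** A uniformly distributed phase that jumps once to a fresh uniform value.
- **Magnetic noise.** An offset δb with σ = g/5 that jumps once.
- **Amplitude noise.** An independent normal amplitude in each interval, with mean g and standard deviation g.

In each case with a single jump, the jump occurs at a uniformly random interval. How these traces enter the measurement probability depends on Eqs. (2) and (9), which are not reproduced here. The helper names are hypothetical, and the sketch only produces the noise traces themselves; for instance, the phase trace would take the place of the constant ϕ_i.

```python
import numpy as np

def one_jump(n, draw, rng):
    """Piecewise-constant trace: one value from draw(rng), replaced by a fresh draw at a random interval."""
    trace = np.full(n, draw(rng), dtype=float)
    trace[rng.integers(1, n):] = draw(rng)  # the change happens at one of the intervals 1..n-1
    return trace

def noise_traces(n, g, rng):
    """Per-interval noise for one experiment, following the three noise models in the text."""
    phase = one_jump(n, lambda r: r.uniform(0.0, 2.0 * np.pi), rng)  # phase noise, one jump
    delta_b = one_jump(n, lambda r: r.normal(0.0, g / 5.0), rng)      # magnetic noise, sigma = g/5
    amplitude = rng.normal(g, g, size=n)                               # amplitude noise, mean g, sigma g
    return phase, delta_b, amplitude

rng = np.random.default_rng(0)
phase, delta_b, amplitude = noise_traces(1000, g=10 / (2 * np.pi), rng=rng)
```

The mixed-noise scenario corresponds to applying all three traces to the same experiment.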
Experimental verification
The NV center in diamond^{24,25,26} is one of the leading quantum probe systems for sensing, imaging, and spectroscopy. Here we considered frequency discrimination of measurement results obtained by a single NV center under ambient conditions. Two artificial signals were produced by a signal generator with frequencies ω_{1} = 2π × 250 Hz and ω_{2} = 2π × 251.6 Hz. Each signal was measured for a total measurement time of T_{tot} = 220 sec, with a time interval of Δt = 10 μs.

From the raw data, we generated strings of 25000 measurement results (each spanning 0.25 sec). These strings were chosen such that the phase corresponding to each x can be considered random (no phase relation), and such that the frequencies cannot be resolved by a Fourier transform (see Fig. 1 (left)). The photon-detection efficiencies for a true detection (m_{s} = 0) and a false detection (m_{s} = −1) were only ~7.4% and ~5.2%, respectively, indicating a low SNR and contrast.
To obtain a theoretical bound on the discrimination error, we considered a theoretical model with a modified probability for a successful measurement, which is given by
where P(t) is given by Eq. (2), and η_{true} and η_{false} are the true and false detection efficiencies, respectively. Assuming that η_{false} = 0.7 η_{true}, we constructed numerical datasets according to Eq. (10). We set the amplitudes of the signals, g_{1} and g_{2}, and the efficiency η_{true} for each signal to match the experimental results according to two constraints:

(i) the power spectrum of the numerical data at the frequency of the signal was required to be approximately equal to that of the experimental data;
(ii) the averages of the experimental and numerical signals fulfilled \(\langle x\rangle =\frac{{\eta }_{true}+{\eta }_{false}}{2}\).

For the numerical model we achieved \({P}_{{M}_{FB}}\approx 10.8 \% \) and \({P}_{{M}_{DL}}\approx 11.6 \% \) (see Fig. 6 (left), green square and red circle under the diamonds). These results are consistent with the experimental data, for which we obtained \({P}_{{M}_{DL}}^{\exp }\approx 12.1 \% \) (Fig. 6 (left), blue diamond). This is close to \({P}_{{M}_{FB}}\), obtained without any information on the model. In contrast, the Full Bayesian method applied to the experimental data achieved only \({P}_{{M}_{FB}}^{\exp }\approx 16.2 \% \) (Fig. 6 (left), green diamond).

This difference arises because the experimental statistics differ slightly from our probability function. This mismatch is a problem for the Bayesian method, whereas the DL method is able to learn the difference and take it into account. The effect is expected to be much more dramatic in real nanoNMR experiments, in which the model is far more uncertain. In addition, we analyzed \({P}_{{M}_{FB}}\) and \({P}_{{M}_{DL}}\) on the numerical data as a function of the frequency difference, Δω. The results are shown in Fig. 6 (left). The ROC curve and AUC of M_{DL} on the experimental data are shown in Fig. 6 (right).
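Eq. (10) is not reproduced in this text. One form consistent with constraint (ii), i.e. ⟨x⟩ = (η_true + η_false)/2 whenever P(t) averages to 1/2, is P̃(t) = η_true P(t) + η_false [1 − P(t)]. The sketch below assumes this form.

- **Efficiencies.** η_true = 0.074 and the ratio 0.7 are taken from the text.
- **Time grid.** The 250 Hz tone on a 10 μs grid follows the experimental parameters.
- **Hypothetical choices.** The cosine P(t) and its 0.3 contrast are hypothetical.

The simulated mean count rate should land near (η_true + η_false)/2.

```python
import numpy as np

def detected_probability(p, eta_true, eta_false):
    """Assumed form of Eq. (10): detection with prob. eta_true from m_s = 0 and eta_false from m_s = -1."""
    p = np.asarray(p, dtype=float)
    return eta_true * p + eta_false * (1.0 - p)

eta_true = 0.074
eta_false = 0.7 * eta_true
t = np.arange(25000) * 10e-6                          # 25000 intervals of 10 microseconds
p = 0.5 * (1.0 + 0.3 * np.cos(2 * np.pi * 250 * t))   # hypothetical toy P(t)
rng = np.random.default_rng(0)
x = (rng.random(t.size) < detected_probability(p, eta_true, eta_false)).astype(np.int8)
print(x.mean(), (eta_true + eta_false) / 2)
```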
It is worth noting that, owing to the relatively large window size of 25000, a full analysis of M_{corr} is not possible within a reasonable time on a common computer. A partial analysis of M_{corr} (taking into account segments of two-point correlations only) on both the numerical model and the experimental data yielded \({P}_{{M}_{corr}}\gtrsim 0.4\). This indicates that DL could indeed be the better choice when the model is not fully known.
Comparison to other machine learning methods
So far we have shown that DL methods are useful for the problem of frequency discrimination in the nanoNMR settings. In this section we ask whether other machine learning methods could be useful for this task and if so, how these methods perform compared to DL.
Any method that is able to discriminate between two signals of nearby frequencies, as considered in the previous sections, should be able to extract the information on the signals from the correlations between different measurement results (different x_{i}). Because the phase of the signal is uniformly random, the average of each individual measurement result carries essentially no information about the frequency. Hence, any successful discrimination method should involve some nonlinearity. Indeed, a fully connected neural network with only linear layers fails in the considered discrimination problem (the achieved error probability is 1/2).

We tested the performance of three other methods in the ideal model scenario (Fig. 4): logistic regression (with no interaction terms), K nearest neighbours, and support vector machines (SVM) with a linear kernel. Like the fully connected linear neural network, these methods completely fail to discriminate between the signals and achieve an error probability of 1/2 for all values of Δω.
Regarding nonlinear models, we considered two: SVM with the nonlinear radial basis function (RBF) kernel, and XGBoost, an implementation of gradient-boosted decision trees (here we use nonlinear boosting, as linear boosting fails). We tested these two models in the ideal model scenario (Fig. 4) as well as in the mixed-noise scenario (see Fig. 5(d)). The results are shown in Fig. 7. These two methods achieve accuracies (discrimination error probabilities) very similar to those obtained by DL.

However, there is a large difference in the required computational resources. These two methods consume more memory than DL and require much longer running times, for example ~20 hours compared with ~20 minutes for DL. Indeed, it is not feasible to use these methods for discrimination of the experimental data, where the inputs are much larger (input strings of 25000 compared with input strings of 1000). An exhaustive study of machine learning methods is beyond the scope of this work. Nevertheless, these findings support the possible advantage of DL methods for processing nanoNMR experimental results.
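The argument for nonlinearity can be checked numerically under a hypothetical toy model P(t) = [1 + c cos(ωt + ϕ)]/2 with ϕ uniform, using 0.8 contrast and 200 intervals (illustrative choices only).

- **First-order features.** The class means of the individual outcomes s_i = ±1 vanish for both frequencies, so a rule linear in the bits has no first-order signal to use.
- **Second-order features.** Lag-k products s_i s_{i+k} average to (c²/2) cos(ωkΔt), which does depend on ω.

```python
import numpy as np

rng = np.random.default_rng(0)
n_meas, dt, contrast = 200, 0.5, 0.8
t = dt * np.arange(n_meas)

def sample_spins(omega, n_exp):
    """±1 outcomes from a toy P(t) = [1 + contrast*cos(omega*t + phi)]/2, phi uniform per experiment."""
    phi = rng.uniform(0.0, 2.0 * np.pi, size=(n_exp, 1))
    p = 0.5 * (1.0 + contrast * np.cos(omega * t + phi))
    return 2.0 * (rng.random(p.shape) < p) - 1.0

def lag_product_mean(S, k):
    """Average of s_i * s_{i+k}: a second-order (nonlinear) feature of the bit strings."""
    return np.mean(S[:, :-k] * S[:, k:])

A, B = sample_spins(1.0, 5000), sample_spins(2.0, 5000)
# First-order features: the class means of each s_i are both ~0, so they do not separate the classes.
print(np.abs(A.mean(axis=0) - B.mean(axis=0)).max())
# Second-order features differ: E[s_i s_{i+k}] = (contrast**2 / 2) * cos(omega * k * dt).
print(lag_product_mean(A, 1), lag_product_mean(B, 1))
```

This illustrates why pairwise (or higher-order) structure, which the nonlinear models and the DL network can represent, is needed. It does not by itself prove that every linear classifier reaches exactly 1/2.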
Frequency resolution
In this section we consider the problem of discriminating between a signal with a single frequency and a signal with two proximal frequencies centred at the value of the single frequency (Fig. 8). We assume that the signals have strong amplitude and phase noise, which we model by the Ornstein-Uhlenbeck (OU) process, motivated by NV-probed, statistically polarized nanoNMR experiments^{4,5,23}. The OU process is a stationary, Gaussian, and Markovian stochastic process. It is given by \(x(t+dt)=x(t)-\frac{1}{\tau }x(t)dt+\sqrt{c}\,dW(t)\). Here τ and c are positive constants called, respectively, the relaxation time and the diffusion constant, and dW(t) is a temporally uncorrelated normal random variable with mean 0 and variance dt. The environmental noise experienced by an NV center is faithfully modelled by an OU process^{27}.
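The OU update above can be simulated directly with the Euler-Maruyama scheme, a standard discretisation chosen here for illustration. Each path starts from the stationary distribution N(0, cτ/2) and is then advanced step by step. In the parametrisation used later in the numerical analysis, the reversion speed is θ = 1/τ and the volatility is σ = √c. The function name and the test parameters are hypothetical.

```python
import numpy as np

def simulate_ou(n_steps, dt, tau, c, n_paths=1, rng=None):
    """Euler-Maruyama paths of x(t+dt) = x(t) - x(t) dt / tau + sqrt(c) dW(t), with dW ~ N(0, dt)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty((n_paths, n_steps))
    # Start each path from the stationary distribution N(0, c * tau / 2).
    x[:, 0] = rng.normal(0.0, np.sqrt(c * tau / 2.0), size=n_paths)
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps - 1))
    for k in range(n_steps - 1):
        x[:, k + 1] = x[:, k] - x[:, k] * dt / tau + np.sqrt(c) * dW[:, k]
    return x

paths = simulate_ou(n_steps=500, dt=0.01, tau=1.0, c=2.0, n_paths=2000, rng=np.random.default_rng(0))
print(paths[:, -1].var())  # close to the stationary variance c * tau / 2 = 1
```

The Euler scheme is accurate only for dt ≪ τ. Its stationary variance is cτ/(2 − dt/τ) rather than exactly cτ/2.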
Specifically, the Hamiltonian of the probe is given by
where A_{i} and B_{i} undergo an OU process due to the amplitude and phase noise, and δ_{i} are the frequencies. The probability for a successful measurement (one) is
where n = 2 and δ_{i} = δ_{c} ± Δ/2. For two frequencies Δ is nonzero, and for a single frequency Δ = 0.
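Eqs. (11) and (12) are not reproduced in this text. The sketch below therefore assumes a hypothetical quadrature form, Σ_i [A_i(t) cos(δ_i t) + B_i(t) sin(δ_i t)] with δ_i = δ_c ± Δ/2. Here A_i and B_i are independent OU paths, sampled with the exact OU transition rather than the Euler step.

- **From the text.** T_2 = 256 sec, Δt = 1 sec, T_tot = 2T_2, θ = 1/T_2 and the volatility σ.
- **Hypothetical.** The values of δ_c and Δ.

```python
import numpy as np

def ou_exact(n_steps, dt, theta, sigma, rng):
    """Zero-mean OU path dx = -theta x dt + sigma dW, sampled exactly on a grid of spacing dt."""
    decay = np.exp(-theta * dt)
    stationary_std = sigma / np.sqrt(2.0 * theta)
    step_std = stationary_std * np.sqrt(1.0 - decay**2)
    x = np.empty(n_steps)
    x[0] = rng.normal(0.0, stationary_std)  # start in the stationary distribution
    for k in range(n_steps - 1):
        x[k + 1] = decay * x[k] + step_std * rng.normal()
    return x

def two_tone_signal(delta_c, Delta, n_steps, dt, theta, sigma, rng):
    """Hypothetical sum_i A_i cos(delta_i t) + B_i sin(delta_i t), delta_i = delta_c +/- Delta/2 (Delta = 0: one tone)."""
    t = dt * np.arange(n_steps)
    signal = np.zeros(n_steps)
    for delta_i in (delta_c - Delta / 2.0, delta_c + Delta / 2.0):
        A = ou_exact(n_steps, dt, theta, sigma, rng)  # amplitude/phase noise of quadrature A_i
        B = ou_exact(n_steps, dt, theta, sigma, rng)  # amplitude/phase noise of quadrature B_i
        signal += A * np.cos(delta_i * t) + B * np.sin(delta_i * t)
    return signal

T2 = 256.0                                            # coherence time from the text, in sec
sigma = (np.pi / 10.0) * np.sqrt(4.0 / (np.pi * T2))  # volatility from the text
rng = np.random.default_rng(0)
single = two_tone_signal(0.5, 0.0, int(2 * T2), 1.0, 1.0 / T2, sigma, rng)   # delta_c = 0.5 rad/s: hypothetical
double = two_tone_signal(0.5, 0.05, int(2 * T2), 1.0, 1.0 / T2, sigma, rng)  # Delta = 0.05 rad/s: hypothetical
```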
Numerical analysis
We constructed numerical datasets according to Eq. (12), where A_{i}(t) and B_{i}(t) follow OU processes with mean μ = 0, volatility \(\sigma =\frac{\pi }{10}\sqrt{\frac{4}{\pi {T}_{2}}}\), and reversion speed θ = 1/T_{2}. Here T_{2} = 256 sec is the coherence time of the signal. In addition, we fixed T_{tot} = 2T_{2} and Δt = 1 sec. We tested the performance of M_{DL} as a function of the frequency difference, Δ, in comparison to M_{FB} and M_{corr}. In M_{FB} the maximal log-likelihood was calculated over the random OU processes. For each string of measurement results x, we considered two hypotheses: the single-frequency signal with Δ = Δ_{0} = 0, and the signal of two nearby frequencies with Δ = Δ_{n} > 0, where Δ_{n} is the numerical value of the frequency difference between the two frequencies. We generated many sets of random OU processes, denoted by O_{k}, and calculated
where
We estimated the signal as a single frequency signal or as a signal of two frequencies according to the larger likelihood; that is
Figure 9 (left) shows the error probability as a function of the frequency difference. M_{DL} achieved lower error probabilities than both M_{corr} and M_{FB}. Interestingly, even though M_{FB} has full knowledge of the noise model, it achieves a larger error probability than M_{DL}. We note that increasing the number of OU processes, O_{k}, in the above likelihood calculation does not improve \({P}_{{M}_{FB}}\). M_{DL} and M_{corr} reached a result within ~45 min, whereas M_{FB} required ~7 hours (CPU times, both measured on the same common PC without a GPU). The M_{DL} ROC curves and AUC are shown in Fig. 9 (right). These numerical results strongly suggest that DL methods could identify molecules based on their NMR signal extremely fast, which may make them a useful tool for probing chemical reactions at the nanoscale.
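The Monte Carlo likelihood used for M_FB in this section can be sketched as follows:

1. For each hypothesis, Δ_0 = 0 or Δ_n, every sampled OU realisation O_k yields a trace of success probabilities.
2. The Bernoulli log-likelihood of x is maximised over k.
3. The hypothesis with the larger maximum is chosen.

Mapping (O_k, Δ) to P(t) requires Eq. (12), which is not reproduced here. The sketch therefore takes precomputed probability traces as input; this interface and the function names are hypothetical.

```python
import numpy as np

def bernoulli_loglik(x, p, eps=1e-12):
    """Log-likelihood of the bit string x under success probabilities p (one row per candidate trace)."""
    x = np.asarray(x, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return (x * np.log(p) + (1.0 - x) * np.log1p(-p)).sum(axis=-1)

def mc_max_loglik(x, prob_traces):
    """Maximum over the sampled noise realisations O_k (row k of prob_traces is P(t) given O_k)."""
    return bernoulli_loglik(x, prob_traces).max()

def resolve(x, traces_single, traces_double):
    """0 for a single frequency (Delta_0 = 0), 1 for two frequencies (Delta_n): larger maximal log-likelihood."""
    return int(mc_max_loglik(x, traces_double) > mc_max_loglik(x, traces_single))
```

Because the maximum is taken over a finite sample of noise realisations, the cost scales with the number of O_k and with the string length, for every test string and every hypothesis. This is consistent with the much longer M_FB run times reported above.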
Theoretical implications
While the numerical advantages of machine learning methods have already been shown^{28,29}, their theoretical value has not been demonstrated before. Beyond the practical interest of utilizing machine learning methods in the nanoNMR frequency-resolution problem, machine learning methods, and DL methods in particular, could also have considerable theoretical value.
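Generally, in estimation problems, the MSE of an estimator M for a given unseen test input x can be written as
\[{\rm{E}}[{(y-M(x))}^{2}]={({\rm{Bias}}[M(x)])}^{2}+{\rm{Var}}[M(x)]+{\rm{Var}}[\epsilon ]\qquad (16)\]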
where:

- y = f(x) + \(\epsilon \) is the true label, with f(x) its noiseless part;
- M(x) is the estimated label;
- E is the expectation value with respect to the training set (and the noise);
- the bias of M is given by \({\rm{Bias}}[M(x)]={\rm{E}}[M(x)]-f(x)\);
- the variance of M is given by \({\rm{Var}}[M(x)]={\rm{E}}[M{(x)}^{2}]-{\rm{E}}{[M(x)]}^{2}\);
- \({\rm{Var}}[\epsilon ]\) is the irreducible error due to the (zero-mean) noise \(\epsilon \).

The error probability, P_{M}, is then obtained as the expectation value of the MSE, \({\rm{E}}[{(y-M(x))}^{2}]\), with respect to the test set.
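The decomposition can be checked by Monte Carlo in the simplest setting: estimating a constant f(x) = θ from five noisy samples with the shrunk estimator M = 0.8 × (sample mean). All numbers are hypothetical. In this case Bias² = (0.2θ)² = 0.04, Var[M] = 0.8² × 1/5 = 0.128 and Var[ε] = 1, so the test MSE should be close to 1.168.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, noise_std, n_train, n_trials, shrink = 1.0, 1.0, 5, 200_000, 0.8

# Each trial: draw a training set, fit M = shrink * (sample mean), then draw one fresh label y = theta + eps.
train = theta + noise_std * rng.normal(size=(n_trials, n_train))
M = shrink * train.mean(axis=1)
y = theta + noise_std * rng.normal(size=n_trials)

mse = np.mean((y - M) ** 2)
bias_sq = (M.mean() - theta) ** 2   # (E[M] - f(x))^2
var = M.var()                       # E[M^2] - E[M]^2
print(mse, bias_sq + var + noise_std**2)
```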
An unbiased estimator is an estimator M for which we have that \({\rm{Bias}}\,[M(x)]=0\). An optimal unbiased estimator has a minimal variance, which is known as the minimum variance unbiased (MVU) estimator. However, from Eq. (16) it is seen that an MVU estimator is not necessarily an optimal estimator which minimizes the MSE. Indeed, it is known that biased methods can outperform the unbiased ones^{30,31,32}. In this case the magnitude of the bias is increased and \({({\rm{Bias}}[M(x)])}^{2} > 0\), but the variance \({\rm{Var}}\,[M(x)]\) is significanlty decreased such that the MSE is smaller than the MSE of an MVU estimator. Such strategies of error reduction are used ubiquitously in image restoration^{33,34} and beamforming applications^{35,36}. Moreover, it is known that biased methods can be superior in various spectral analysis applications^{37}. Despite its superiority there are only a few structured methods in which such a biased estimator can be constructed^{31} and in most cases the search for such estimators is extremely challenging, especially as it is unknown if such an estimator exists.
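A textbook example of a biased estimator beating the unbiased one is shrinking the sample mean, M = c × (sample mean). Its MSE is (c − 1)²θ² + c² v, where v = σ²/n is the variance of the sample mean. This is minimised at c* = θ²/(θ² + v) < 1.

The sketch uses the hypothetical values θ = 1 and v = 0.2. It only shows that such an estimator exists: c* depends on the unknown θ, which is exactly why systematic constructions of superior biased estimators are hard to come by.

```python
import numpy as np

def shrinkage_mse(c, theta, var_of_mean):
    """Exact MSE of M = c * sample_mean: bias^2 = ((c - 1) theta)^2 plus variance c^2 * var_of_mean."""
    return ((c - 1.0) * theta) ** 2 + c**2 * var_of_mean

theta, v = 1.0, 0.2                      # true parameter and Var[sample mean] = sigma^2 / n
c_star = theta**2 / (theta**2 + v)       # minimiser of shrinkage_mse over c
print(shrinkage_mse(1.0, theta, v))      # unbiased estimator (c = 1): 0.2
print(shrinkage_mse(c_star, theta, v))   # biased estimator (c* = 1/1.2): about 0.1667
```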
Our numerical analysis of \(M_{\rm{FB}}\) converged to its final result; however, this method resolved the two frequencies with a higher error rate than \(M_{\rm{DL}}\). Since \(M_{\rm{FB}}\) is an MVU estimator, our results indicate that for the model at hand the unbiased full-model Bayesian analysis is not optimal, and that a superior biased method exists. This points to an additional advantage of DL, since the search for a biased method is usually done in an ad hoc manner. Moreover, in most cases there is no way of knowing whether a method superior to the unbiased one exists. Hence, our results provide some hope that DL methods could be used as an analytical tool for identifying superior estimators and, in particular, for identifying the ultimate limits of resolution problems.
Conclusion
In conclusion, we showed that DL methods are able to mitigate the effect of the strong inherent noise in nano-NMR settings. In particular, the DL neural networks effectively learn the noise model, even when no prior knowledge of the noise model is assumed. This is a crucial property of DL methods, as in many realistic nano-NMR scenarios the noise model is complex and not accurately known. We investigated the performance of DL methods in the problems of frequency discrimination and frequency resolution. We showed that DL methods can outperform Bayesian methods when full knowledge of the noise model is not available, and that DL methods can analyze a test signal as accurately as numerically demanding Bayesian methods, even though the Bayesian methods have full knowledge of the noise model and the DL methods have no prior knowledge at all.
DL methods can perform better than Bayesian methods when the noise model is not precisely known, or when it is known but complex. In the first case DL methods can achieve better results than Bayesian methods, as DL methods do not assume prior knowledge of the model while Bayesian methods rely on precise knowledge of it. This was demonstrated for frequency discrimination in the noisy scenario, as well as in the analysis of the experimental data. In the second case the results of both methods may be similar, but Bayesian methods can consume far more computational resources than DL methods, as was demonstrated in the problem of frequency resolution of noisy signals.
Our results can be seen as a strong indication that DL methods will become the method of choice for analyzing spectroscopic nano-NMR data. In addition, our results indicate that DL methods could be used as a tool for identifying superior biased estimators and the ultimate limits of resolution problems, which are otherwise difficult to obtain^{12}.
Data availability
The authors declare that all relevant data supporting the findings of this study are available within the paper (and its Supplementary Information Files).
References
 1.
Balasubramanian, G. et al. Nanoscale imaging magnetometry with diamond spins under ambient conditions. Nature 455, 648–651 (2008).
 2.
Gruber, A. et al. Scanning confocal optical microscopy and magnetic resonance on single defect centers. Science 276, 2012–2014 (1997).
 3.
Maze, J. et al. Nanoscale magnetic sensing with an individual electronic spin in diamond. Nature 455, 644–647 (2008).
 4.
Staudacher, T. et al. Nuclear magnetic resonance spectroscopy on a (5-nanometer)^3 sample volume. Science 339, 561–563 (2013).
 5.
Mamin, H. et al. Nanoscale nuclear magnetic resonance with a nitrogen-vacancy spin sensor. Science 339, 557–560 (2013).
 6.
Müller, C. et al. Nuclear magnetic resonance spectroscopy with single spin sensitivity. Nat. Commun. 5 (2014).
 7.
DeVience, S. J. et al. Nanoscale NMR spectroscopy and imaging of multiple nuclear species. Nat. Nanotechnol. 10, 129 (2015).
 8.
Lovchinsky, I. et al. Nuclear magnetic resonance detection and spectroscopy of single proteins using quantum logic. Science 351, 836–841 (2016).
 9.
Schmitt, S. et al. Submillihertz magnetic spectroscopy performed with a nanoscale quantum sensor. Science 356, 832–837 (2017).
 10.
Boss, J., Cujia, K., Zopes, J. & Degen, C. Quantum sensing with arbitrary frequency resolution. Science 356, 837–840 (2017).
 11.
Bucher, D. B. et al. High resolution magnetic resonance spectroscopy using solid-state spins. arXiv preprint arXiv:1705.08887 (2017).
 12.
Rotem, A. et al. Limits on spectral resolution measurements by quantum probes. arXiv preprint arXiv:1707.01902 (2017).
 13.
Zaiser, S. et al. Enhancing quantum sensing sensitivity by a quantum memory. Nat. Commun. 7 (2016).
 14.
Villmann, T. & Merényi, E. Machine learning approaches and pattern recognition for spectral data. In Proceedings of the 16th European Symposium on Artificial Neural Networks ESANN 2008, 433–444 (d-side publications, 2008).
 15.
Howley, T., Madden, M. G., O’Connell, M.-L. & Ryder, A. G. The effect of principal component analysis on machine learning accuracy with high-dimensional spectral data. Knowledge-Based Syst. 19, 363–370 (2006).
 16.
Carrara, E. A., Pagliari, F. & Nicolini, C. Neural networks for the peak-picking of nuclear magnetic resonance spectra. Neural Networks 6, 1023–1032 (1993).
 17.
Corne, S. A., Johnson, A. P. & Fisher, J. An artificial neural network for classifying cross peaks in two-dimensional NMR spectra. J. Magn. Reson. (1969) 100, 256–266 (1992).
 18.
Klukowski, P., Walczak, M. J., Gonczarek, A., Boudet, J. & Wider, G. Computer vision-based automated peak picking applied to protein NMR spectra. Bioinformatics 31, 2981–2988 (2015).
 19.
Klukowski, P. et al. NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34, 2590–2597 (2018).
 20.
Li, H. & Misra, S. Long short-term memory and variational autoencoder with convolutional neural networks for generating NMR T2 distributions. IEEE Geosci. Remote Sens. Lett. 16, 192–195 (2018).
 21.
Li, H. & Misra, S. Prediction of subsurface NMR T2 distributions in a shale petroleum system using variational autoencoder-based neural networks. IEEE Geosci. Remote Sens. Lett. 14, 2395–2397 (2017).
 22.
Herzog, B. E., Cadeddu, D., Xue, F., Peddibhotla, P. & Poggio, M. Boundary between the thermal and statistical polarization regimes in a nuclear spin ensemble. Appl. Phys. Lett. 105, 043112 (2014).
 23.
Staudacher, T. et al. Probing molecular dynamics at the nanoscale via an individual paramagnetic centre. Nat. Commun. 6 (2015).
 24.
Jelezko, F. & Wrachtrup, J. Single defect centres in diamond: A review. Phys. Status Solidi A 203, 3207–3225 (2006).
 25.
Schirhagl, R., Chang, K., Loretz, M. & Degen, C. L. Nitrogen-vacancy centers in diamond: nanoscale sensors for physics and biology. Annu. Rev. Phys. Chem. 65, 83–105 (2014).
 26.
Doherty, M. W. et al. The nitrogen-vacancy colour centre in diamond. Phys. Rep. 528, 1–45 (2013).
 27.
de Lange, G., Wang, Z. H., Ristè, D., Dobrovitski, V. V. & Hanson, R. Universal dynamical decoupling of a single solid-state spin from a spin bath. Science 330, 60–63 (2010). https://doi.org/10.1126/science.1192739
 28.
Santagati, R. et al. Magnetic-field learning using a single electronic spin in diamond with one-photon readout at room temperature. arXiv preprint arXiv:1807.09753 (2018).
 29.
Granade, C. E., Ferrie, C., Wiebe, N. & Cory, D. G. Robust online Hamiltonian learning. New J. Phys. 14, 103013 (2012).
 30.
Efron, B. Biased versus unbiased estimation. Adv. Math. 16, 259–277 (1975).
 31.
Eldar, Y. C. et al. Rethinking biased estimation: Improving maximum likelihood and the Cramér–Rao bound. Found. Trends Signal Process. 1, 305–449 (2008).
 32.
James, W. & Stein, C. Estimation with quadratic loss. In Breakthroughs in statistics, 443–460 (Springer, 1992).
 33.
Demoment, G. Image reconstruction and restoration: Overview of common estimation structures and problems. IEEE Trans. Acoust. Speech Signal Process. 37, 2024–2036 (1989).
 34.
Meng, L. & Clinthorne, N. H. A modified uniform Cramér–Rao bound for multiple pinhole aperture design. IEEE Trans. Med. Imaging 23, 896–902 (2004).
 35.
Cox, H., Zeskind, R. & Owen, M. Robust adaptive beamforming. IEEE Trans. Acoust. Speech Signal Process. 35, 1365–1376 (1987).
 36.
Carlson, B. D. Covariance matrix estimation errors and diagonal loading in adaptive arrays. IEEE Trans. Aerosp. Electron. Syst. 24, 397–401 (1988).
 37.
Stoica, P. & Moses, R. L. Introduction to spectral analysis, vol. 1 (Prentice Hall, Upper Saddle River, NJ, 1997).
Acknowledgements
A.R., N.A. have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 770929 ERC consolidator grant QRES and the collaborative projects ASTERIQS and Hyperdiamond. F.J. acknowledges the support of BMBF, ERC, VW Stiftung, BW Stiftung and ASTERIQS.
Author information
Contributions
A. Retzker and Z.R. supervised the project. N.A., A. Retzker and Z.R. performed the theoretical analysis. N.A. and Z.R. performed the deep learning and machine learning analysis. N.A. and A. Rotem carried out the spectral resolution numerical analysis. L.P.M. and F.J. conducted the experiment. N.A. took the lead in writing the manuscript with support from A. Retzker and Z.R.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Aharon, N., Rotem, A., McGuinness, L.P. et al. NV center based nano-NMR enhanced by deep learning. Sci Rep 9, 17802 (2019). https://doi.org/10.1038/s41598-019-54119-9