Introduction

The newly developed discipline of nano-NMR1,2,3,4,5,6,7 aims to reduce the minimal NMR sample size by many orders of magnitude, thus increasing the NMR sensitivity and spatial resolution down to a few molecules8. This is achieved by replacing the macroscopic coil of the NMR setup, which measures the magnetic field, with a single controllable spin or an ensemble of such spins, e.g., NV centers in diamond, which serve as tiny magnetometers. Recent experiments have shown that it is possible to estimate the spectrum of artificial signals and of signals from polarized samples with high resolution9,10,11,12,13. However, the obvious advantage of obtaining spectral information about tiny quantities of molecules is offset by the extra noise that accompanies most configurations of this setup. This extra noise has several sources, including the finite NV coherence time (magnetic noise), the controller noise (laser and microwave operations), and, most importantly, the diffusion-induced noise, which is negligible in the regular NMR setup but is extremely large in the nano-NMR setup, broadening the line-width beyond the required resolution. This noise creates a serious bottleneck, as the crucial information is encoded in the tiny chemical shifts and small energy gaps caused by J-couplings. That is, the nano-NMR setup is usually characterized by a weak measured signal that is masked by strong noise.

Moreover, the precise noise model is usually complex and unknown. Consequently, achieving spectral discrimination between weak, similar signals of nearby frequencies is an intractable data-processing challenge. In particular, because the noise model is complex and unknown, conventional data analysis methods struggle to tackle this noise: they usually reach optimal discrimination only when full knowledge of the noise model is available.

In this work we show that the challenge of spectral discrimination between weak and similar signals in the presence of strong and complex noise can be efficiently confronted by DL algorithms, which effectively learn the noise model. Moreover, we show that DL methods are capable of learning the noise model from a small amount of data, which only needs to be gathered for a few minutes. This means that a DL algorithm can analyze a test signal with the same efficiency as numerically demanding Bayesian methods that rely on precise knowledge of the model. In addition, we show that DL methods can be extremely useful in challenging frequency resolution problems and can possibly outperform Bayesian methods even when the latter are assumed to have full knowledge of the model and unlimited computing power.

DL techniques have been successfully applied to spectral data in the fields of Astronomy, Chemistry, Geosciences, and Bioinformatics14. Spectral data from these disciplines pose similar challenges: (1) high data dimensionality; (2) difficulty of modeling the important features from first principles; (3) dirty environments with many classes of objects that need to be differentiated along with varying signal intensities; (4) importance of subtle differences in the signal. Despite these difficulties, which apply in our context as well, impressive achievements have been made, such as the detection of narcotics in Raman spectroscopy data with a 0.5% error rate15.

DL methods have also been used for the analysis of NMR data, in particular in the context of automated protein structure determination via peak-picking of nuclear magnetic resonance spectra16 and of biological macromolecules17, and recently also in the context of analyzing a variety of spectral images of proteins by using a support vector machine classifier combined with histograms of oriented gradients18 and by using a convolutional neural network19. In addition, deep learning techniques such as long short-term memory networks20 and variational auto-encoder networks21 have been used in NMR applications for material characterization and subsurface characterization.

We believe that the success of DL methods in the analysis of regular NMR data should be amplified in the nano-NMR setup due to the larger amount of noise in this setting, which originates from two main ingredients that are absent in the regular NMR setting. The first ingredient is the origin of the signal. While in regular NMR the signal is created by thermal polarization, in nano-NMR the signal is created by statistical polarization22, which implies that in the nano-NMR setup the noise is stronger. The second ingredient is the quantum projection noise, which is of a Poissonian or a Bernoulli nature and in many cases is the dominant source of noise. Here we provide evidence that DL methods can tackle these noise sources efficiently.

To evaluate the efficiency of DL methods for the spectroscopy of nano-NMR data, we consider two problems: frequency discrimination and frequency resolution. We first examine the ability of DL methods to discriminate between two signals corresponding to two different frequencies. In particular, we consider data from signals that were read by an NV center, which simulate noisy nano-NMR data. Typical data for these two frequencies are shown in Fig. 1, which presents two time traces of the datasets together with their Fourier transforms. It is immediately clear that it is impossible to discriminate between the two frequencies based on the Fourier transform alone, because the signal has strong phase noise on top of the detection noise. In this work we show that DL methods are able to classify the data with the same efficiency as Bayesian methods, which use full knowledge of the signal and noise model and are numerically much more demanding than DL methods. The advantage of DL methods is also indicated by their superior performance in the frequency discrimination of the experimental data, where the signal and noise models are not fully known.

Figure 1

Typical noisy data of the two different frequencies that we aim to discriminate in this work. The oscillating magnetic signals at the two frequencies suffer from strong phase noise and are read by an NV center, which adds quantum noise to the output binary signal (see Eq. 10). (Upper right) The time-trace binary signal of the first frequency, 250 Hz, together with its Fourier transform after subtracting the zero frequency (upper left). (Lower right) The time-trace binary signal of the second frequency, 251.6 Hz, and its Fourier transform (lower left).

We then employ DL methods to tackle the problem of frequency resolution in a noisy environment. We show that DL methods can efficiently discriminate between the signal of a single frequency and the signal of two nearby frequencies that have a strong amplitude and phase noise.

Our results strongly suggest that DL methods can effectively learn the physical and noise models and by that constitute an efficient alternative to Bayesian methods, which require a priori knowledge on the physical and noise models.

Frequency discrimination

The physical model

We consider the problem of discrimination between two signals corresponding to two different frequencies by a single quantum probe. In the nano-NMR setup this corresponds, for example, to the scenario where a single NV center, which serves as a tiny magnetometer, is placed in the proximity of a sample that contains two known molecules between which we wish to discriminate. Specifically, in the presence of a single frequency signal (a single molecule) the Hamiltonian of the spin probe is given by

$$H_{s_i} = g_i \cos(\omega_i t + \phi_i)\, S_z,$$
(1)

where gi, ωi, and ϕi are the amplitude, frequency, and (random) phase of signal i, respectively; this is the standard setting in nano-NMR experiments1,2,3,4,23. The probe, which is initially polarized along \(\hat{x}\), evolves freely according to \({H}_{{s}_{i}}\) for a short duration, Δt, and is then measured along \(\hat{y}\). In the measurement scheme of a single experiment, the sequence of probe operations consists of initialization, evolution, and measurement, and is repeated many times under the constant presence of a signal (Fig. 2(a)). In the case of a single-shot measurement, the measurement result is a sequence of zeros and ones, Fig. 1 (right), and the probability of a successful measurement (one) is given by

$$P(t) = \sin^2\left[\frac{g_i}{2\omega_i}\left(\sin[\omega_i t + \phi_i] - \sin[\omega_i (t - \Delta t) + \phi_i]\right) + \frac{\pi}{4}\right].$$
(2)
Figure 2

The physical model. (a) The probe, which is initially polarized along \(\hat{x}\), freely evolves according to \({H}_{{s}_{i}}\) (Eq. 1) for a short duration, Δt, and then is measured along \(\hat{y}\). In the measurement scheme of a single experiment, the sequence of probe operations consists of initialization, evolution, and measurement, which is repeated N times under the constant presence of a signal. In each experiment, the frequency of the signal is equal to one of two known frequencies, ω1 and ω2. (b) Our aim is to discriminate between the two frequencies. A single experiment results in a string of bits, x = {1, 0, 0, 1, …}. Given x, we want to obtain an estimation of the frequency of the signal, ωest = ω1 or ωest = ω2.

We start by considering an ideal scenario (no noise or inefficiencies) in which Eq. (2) holds. We assume that in each experiment the signal corresponds to one of two known frequencies (ω1 and ω2) and that the amplitudes of the signals are known, but that in each experiment the signal has an unknown, uniformly distributed random phase. A single experiment results in a string of bits, x = {1, 0, 0, 1, …}, where 1 and 0 correspond to a detection of the ms = 0 state or the ms = −1 state of the NV center, respectively. Given x, we want to obtain an estimation of the frequency of the signal, ωest = ω1 or ωest = ω2 (Fig. 2(b)). We quantify the performance of a discrimination method M by the error probability of the frequency estimation, which is defined by

$$P_M^{error} \equiv \frac{1}{2}\sum_{\substack{i=1 \\ j \neq i}}^{2} P_M(\omega_{est} = \omega_j \,|\, \omega_i),$$
(3)

where PM (ωest = ωj|ωi) is the probability of method M to output ωest = ωj given that the frequency of the signal is ωi.
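To make the measurement model concrete, a minimal numpy sketch for drawing bit strings from Eq. (2) could look as follows. This is an illustration under stated assumptions rather than code from the paper: we treat ω as an angular frequency (the values quoted in Hz may therefore need a 2π factor, depending on convention), and the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def success_probability(t, g, omega, phi, dt):
    """Eq. (2): probability of reading out 1 at the end of the interval [t - dt, t]."""
    acc = (g / (2 * omega)) * (np.sin(omega * t + phi)
                               - np.sin(omega * (t - dt) + phi))
    return np.sin(acc + np.pi / 4) ** 2

def simulate_experiment(g, omega, dt=0.5, n_meas=1000):
    """One experiment: a fresh uniform random phase, then n_meas Bernoulli readouts."""
    phi = rng.uniform(0, 2 * np.pi)
    t = dt * np.arange(1, n_meas + 1)
    p = success_probability(t, g, omega, phi, dt)
    return (rng.random(n_meas) < p).astype(int)
```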

Full Bayesian method

In the ideal scenario considered here, we have full knowledge of the model (Eq. (2)) and the only unknowns are the random phases ϕi. Hence, we can simply utilize a Full Bayesian method known as the likelihood-ratio test and denoted by MFB, where for each frequency we calculate the maximal log-likelihood over the unknown random phases. That is,

$$L_1 = \max_{\phi_k} L(\phi_k | x, \omega_1), \qquad L_2 = \max_{\phi_k} L(\phi_k | x, \omega_2),$$
(4)

where

$$L(\phi_k | x, \omega_i) = \sum_j \left( x_j \log P(t_j, \omega_i, \phi_k) + (1 - x_j) \log\left(1 - P(t_j, \omega_i, \phi_k)\right) \right).$$
(5)

We estimate the frequency according to the larger likelihood; that is

$$\omega_{est} = \begin{cases} \omega_1 & L_1 > L_2 \\ \omega_2 & \text{otherwise}. \end{cases}$$
(6)

As MFB utilizes the maximal information on the signal, it obtains the minimal possible error for an unbiased estimator, and it can therefore serve as a benchmark for evaluating the efficiency of a learning method. Hence, its error probability serves as a lower bound for the DL method. Bayesian methods are known to be optimal given the maximal amount of information, provided that the optimization can be performed efficiently, which is usually not the case, particularly in a noisy environment. In order to verify that we indeed have the optimal method, we compare the results to an analytical calculation of the Fisher information (FI), which is possible in this case.
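For illustration, a minimal sketch of the likelihood-ratio test of Eqs. (4)-(6) is given below; maximizing over a finite phase grid is our implementation choice, and success_probability refers to the Eq. (2) sketch above.

```python
import numpy as np

def log_likelihood(x, omega, phi, g, dt):
    """Eq. (5): Bernoulli log-likelihood of the bit string x."""
    t = dt * np.arange(1, len(x) + 1)
    p = success_probability(t, g, omega, phi, dt)
    eps = 1e-12                                   # guard against log(0)
    return np.sum(x * np.log(p + eps) + (1 - x) * np.log(1 - p + eps))

def discriminate_fb(x, omega1, omega2, g, dt=0.5, n_phases=360):
    """Eqs. (4) and (6): maximize over a phase grid, return the more likely frequency."""
    phases = np.linspace(0, 2 * np.pi, n_phases, endpoint=False)
    L1 = max(log_likelihood(x, omega1, p, g, dt) for p in phases)
    L2 = max(log_likelihood(x, omega2, p, g, dt) for p in phases)
    return omega1 if L1 > L2 else omega2
```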

In general, full knowledge is not available, due either to a lack of knowledge of the noise model in the experiment and detection inefficiencies, or to a lack of knowledge of the signal. In this case, we can utilize a correlation-based method, Mcorr, for frequency discrimination. To this end, we first use a training set of measurement results, Xtrain, for which the frequency of the signal is known. For each x ∈ Xtrain we calculate the correlation vector \({C}_{k}={\langle {x}_{i}{x}_{i+k}\rangle }_{i}\) (here we replace the 0 bit by −1). Then, for each frequency we calculate the averaged correlation vector, \({C}^{{\omega }_{i}}={\langle {C}_{k}\rangle }_{x\in {X}_{train}^{{\omega }_{i}}}\), where \({X}_{train}={X}_{train}^{{\omega }_{1}}\cup {X}_{train}^{{\omega }_{2}}\). To estimate the frequency of an unknown signal, we calculate its correlation vector, Ck, and then the distances

$$D_1 = \| C_k - C^{\omega_1} \|_{L_2}, \qquad D_2 = \| C_k - C^{\omega_2} \|_{L_2},$$
(7)

by the L2 norm. We estimate the frequency according to the smaller distance; that is,

$$\omega_{est} = \begin{cases} \omega_1 & D_1 < D_2 \\ \omega_2 & \text{otherwise}. \end{cases}$$
(8)

This method, however, disregards higher-order correlation functions as well as the finite precision of the correlation functions themselves, which varies considerably between nearest-neighbor and more distant separations. While in the limit where all these effects are taken into account this approach should reach the optimum, it is numerically very challenging, or even impossible, to apply to most problems of interest.
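A minimal sketch of Mcorr (Eqs. (7)-(8)) is given below; X_train_w1 and X_train_w2 are hypothetical lists of training bit strings with known frequency labels.

```python
import numpy as np

def correlation_vector(x):
    """C_k = <x_i x_{i+k}>_i, with the 0 bit mapped to -1 as in the text."""
    s = 2 * np.asarray(x) - 1
    n = len(s)
    return np.array([np.mean(s[:n - k] * s[k:]) for k in range(1, n)])

# training: averaged correlation template per known frequency
C_w1 = np.mean([correlation_vector(x) for x in X_train_w1], axis=0)
C_w2 = np.mean([correlation_vector(x) for x in X_train_w2], axis=0)

def discriminate_corr(x, omega1, omega2):
    """Eqs. (7)-(8): pick the frequency whose template is closer in L2 norm."""
    C = correlation_vector(x)
    D1, D2 = np.linalg.norm(C - C_w1), np.linalg.norm(C - C_w2)
    return omega1 if D1 < D2 else omega2
```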

Deep learning method

To overcome the lack of knowledge of the model, we suggest using a supervised DL model, which we denote by MDL. In particular, we consider a feed-forward fully connected neural network. The main reason for choosing a fully connected neural network is that the signal (frequency) information is encoded in the correlations between the values of different input neurons (measurement results), and in particular far-apart input neurons. Similar to Mcorr, we use a training dataset of measurement results of known signals (known labels) to train MDL. We denote the labels of the two frequencies by 0 and 1. MDL is then applied to a test dataset and outputs estimations of the frequencies of the test measurement results.

Our DL model is a feed-forward fully connected neural network of four layers (two hidden layers), as depicted in Fig. 3. While two hidden layers are sufficient for the scenarios considered in this work, more complex noise models may require deeper neural networks. The first layer is called the input layer. The neurons of the input layer pass the input data, in our case the measurement results x of a single experiment, to the second layer. The output of neuron j in the second (hidden) layer is given by \({f}_{j}(z)=f(\sum _{i}\,{w}_{ij}{x}_{i}+{b}_{j})\), where f is the activation function, and wij and bj are the weights and biases, respectively. For the hidden layers we use the rectified linear unit (ReLU) activation function, f(z) = max(0, z). The output of the second layer is then fed as input to the third layer, and so on. The last layer is called the output layer. In our model the output layer has one neuron, whose low and high activation levels are associated with the two possible labels (frequencies). For the output neuron we use the Sigmoid activation function. We use the mean squared error (MSE) between the outputs of the learning model and the labels of the training set as the loss function, which is minimized during training by optimizing the weights and biases of the model. Note that there is no special reason for choosing the Sigmoid activation function with the MSE loss function; the softmax activation function together with the cross-entropy loss function may be used as well. Overfitting is avoided by restricting the total number of nodes in the network (and hence the number of free variables). In particular, for the examples considered in this work we use a second layer of 20 nodes and a third layer of 35 nodes (a small modification of the number of neurons in each of the two hidden layers would not change the model's accuracy significantly). For the test dataset, after applying the Sigmoid activation function to the output of MDL, we label the output 1 or 0 according to whether it is above or below 0.5. We then calculate \({P}_{{M}_{DL}}\) by the loss function (the MSE) between the output labels and the true labels.
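As an illustration, the architecture described above could be sketched in Keras as follows. The layer sizes and the Sigmoid/MSE choice follow the text; the optimizer is our assumption, as the training procedure is not fully specified.

```python
from tensorflow import keras

def build_mdl(n_measurements=1000):
    """Feed-forward fully connected network: input, two ReLU hidden layers, Sigmoid output."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_measurements,)),  # one input neuron per measurement result
        keras.layers.Dense(20, activation="relu"),    # first hidden layer
        keras.layers.Dense(35, activation="relu"),    # second hidden layer
        keras.layers.Dense(1, activation="sigmoid"),  # output neuron: label 0 or 1 (frequency)
    ])
    # optimizer is our choice; the loss (MSE) follows the text
    model.compile(optimizer="adam", loss="mse",
                  metrics=[keras.metrics.BinaryAccuracy()])
    return model
```

A test output is then labeled 1 if the Sigmoid activation exceeds 0.5 and 0 otherwise, as described above.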

Figure 3

The MDL neural network is a feed-forward fully connected neural network. The input layer inputs the measurement results x to the second layer (first hidden layer). The output of the last hidden layer is fed to the output layer, which results in the frequency discrimination, ωest = ω1 or ωest = ω2.

Numerical analysis

As a way of testing the performance of MDL in terms of frequency discrimination, we constructed numerical sets of measurement results, x, according to Eq. (2) for two different frequencies, where the phase, ϕi, was chosen randomly (uniformly distributed between 0 and 2π) for each x. The input data were generated with g1 = g2 = ω1 = 10/(2π) Hz, ω2 = ω1 + Δω, Δt = 0.5 sec, and a total measurement time of Ttot = 500 sec (1000 measurements). Part of the datasets was used for training and the remainder for testing the learning model. We compared the performance of MFB to that of MDL and Mcorr. In Fig. 4 we show the discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of the frequency difference, Δω, between the two signals, as well as the corresponding MDL receiver operating characteristic (ROC) curves and areas under the curve (AUC). We considered a first layer of 1000 nodes (1000 measurements), a second layer of 20 nodes, and a third layer of 35 nodes. This choice of the number of nodes limits the free-variable space and allows us to avoid overfitting without resorting to regularization methods. In this ideal scenario, both Mcorr and MDL approach the optimal performance of MFB, even though neither method has any a priori information on the physical model.
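A hypothetical end-to-end wiring of the sketches above, for a single value of the frequency difference, might look as follows; the dataset size, train/test split, number of epochs, and batch size are our assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

omega1 = 10 / (2 * np.pi)           # parameters from the text: g = omega_1 = 10/(2*pi) Hz
omega2 = omega1 + 0.05              # an example value of Delta-omega

# 5000 labelled experiments per frequency (dataset size is our choice)
X = np.array([simulate_experiment(g=omega1, omega=w)
              for w in (omega1, omega2) for _ in range(5000)])
y = np.repeat([0, 1], 5000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = build_mdl(n_measurements=1000)
model.fit(X_tr, y_tr, epochs=30, batch_size=64, verbose=0)
_, acc = model.evaluate(X_te, y_te, verbose=0)
print("estimated error probability:", 1 - acc)   # cf. Eq. (3)
```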

Figure 4

Performance in the ideal model scenario. Left: discrimination error probabilities (Eq. (3)) as a function of the frequency difference, Δω. Full Bayesian, \({P}_{{M}_{FB}}\) (green squares), Deep Learning, \({P}_{{M}_{DL}}\) (red circles), correlations, \({P}_{{M}_{corr}}\) (blue hexagons), and analytical bound on \({P}_{{M}_{FB}}\) (dashed black). The input data were generated according to Eq. (2) with g1 = g2 = ω1 = 10/(2π) Hz, ω2 = ω1 + Δω, Δt = 0.5 sec, and a total measurement time of Ttot = 500 sec (1000 measurements). Right: receiver operating characteristic (ROC) curve and area under the curve (AUC) of MDL for different values of Δω, corresponding to the first, third, fifth and seventh points from left in the left figure.

In order to provide indications of the performance of MDL in real-world noisy scenarios, we further considered a few more noise models and assumed that these noise models are "unknown"; hence, they are not taken into account in the Bayesian methods MFB and Mcorr, which remain unchanged as described above. This serves as an indication of how much better the performance of MDL could be in comparison to MFB and Mcorr in a real-world scenario, when the noise model is truly unknown to some extent. The first noise model is still a phase noise. While previously the random (uniformly distributed) phase of the signal was constant during a single experiment, here the random phase changes once during a single experiment, where the second random phase is also uniformly distributed. Moreover, the time interval at which the phase change occurs is uniformly distributed over the time intervals of a single experiment (1000 time intervals). The discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of the frequency difference, Δω, between the two signals are shown in Fig. 5(a). It is clear that while the phase noise damages the discrimination capability of MFB and Mcorr, MDL is capable of learning the noise model. The second noise model considers a magnetic noise δb, which modifies the Hamiltonian of the probe, Eq. (1), to

$$H_{s_i} = g_i \cos(\omega_i t + \phi_i)\, S_z + \delta b\, S_z.$$
(9)
Figure 5

Performance in noisy scenarios. Discrimination error probabilities (Eq. (3)) as a function of the frequency difference, Δω. Full Bayesian, \({P}_{{M}_{FB}}\) (green squares), Deep Learning, \({P}_{{M}_{DL}}\) (red circles), and correlations, \({P}_{{M}_{corr}}\) (blue hexagons). (a) Phase noise - the random phase of the signal is randomly changed once during a single experiment at a random time interval. (b) Magnetic noise - the probe is subject to a random magnetic field, which is randomly changed once during a single experiment at a random time interval. (c) Amplitude noise - the amplitude of the signal has a different (random) value in each time interval. (d) Mixed noise scenario, which includes all of the above noise models. See text for more details. (e) ROC curve and AUC of MDL for different values of Δω, corresponding to the second, fourth, sixth, eighth and tenth points from left in figure (d).

Similar to the phase noise, we assume that δb changes once during a single experiment and that the time interval at which the change of δb occurs is uniformly distributed over the time intervals of a single experiment. Each of the two values of δb is normally distributed with zero mean and a standard deviation of σ = gi/5 = 2/(2π) Hz. The discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of the frequency difference, Δω, between the two signals are shown in Fig. 5(b). In this case MDL handles the magnetic noise better than MFB and much better than Mcorr. In the third noise model we consider noise in the amplitude of the signal. Specifically, we assume that the amplitude value is different in each time interval and that it is normally distributed with a mean of g = 10/(2π) Hz (the previous value of the noiseless amplitude) and a standard deviation equal to the mean value, that is, σ = g = 10/(2π) Hz. The discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of the frequency difference, Δω, between the two signals are shown in Fig. 5(c). In this case MDL performs slightly better than Mcorr and better than MFB. Lastly, we consider the mixed-noise scenario in which all three of the above noise models are included. The discrimination error probabilities, \({P}_{{M}_{FB}}\), \({P}_{{M}_{DL}}\), and \({P}_{{M}_{corr}}\), as a function of the frequency difference, Δω, between the two signals are shown in Fig. 5(d), and the corresponding MDL ROC curves and AUC are shown in Fig. 5(e). It is apparent that MDL is still capable of learning the noise model, while the performance of MFB and Mcorr is severely degraded when we assume no further knowledge of the noise model. Of course, if we have more knowledge of the noise model, we may be able to modify the Bayesian methods accordingly. However, such a modification implies that the optimization is performed over a larger set of free variables, and therefore longer run times, while the DL run time remains unchanged. Moreover, the above results suggest that Bayesian methods can be very sensitive to the noise model; a minor unknown difference between the true noise model and the assumed noise model could significantly reduce the performance of the Bayesian method (say, for example, that there are three phase changes in a single experiment instead of two).
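To make the mixed-noise scenario concrete, here is a hedged sketch of a data generator combining the three noise models above. The jump times are uniformly random, as in the text; the assumption specific to this sketch is that the magnetic offset δb enters the accumulated phase as δb·Δt/2, mirroring the factor of 1/2 in Eq. (2).

```python
import numpy as np

def simulate_mixed_noise(g, omega, rng, dt=0.5, n_meas=1000):
    """Mixed-noise bit-string generator: one phase jump, one delta_b jump,
    and an independent Normal(g, g) amplitude in every time interval."""
    t = dt * np.arange(1, n_meas + 1)
    idx = np.arange(n_meas)
    # phase noise: one jump between two uniformly random phases
    phi = np.where(idx < rng.integers(n_meas),
                   rng.uniform(0, 2 * np.pi), rng.uniform(0, 2 * np.pi))
    # magnetic noise: one jump between two Normal(0, g/5) offsets
    db = np.where(idx < rng.integers(n_meas),
                  rng.normal(0, g / 5), rng.normal(0, g / 5))
    # amplitude noise: per-interval random amplitude
    gs = rng.normal(g, g, n_meas)
    acc = (gs / (2 * omega)) * (np.sin(omega * t + phi)
                                - np.sin(omega * (t - dt) + phi)) + db * dt / 2
    p = np.sin(acc + np.pi / 4) ** 2
    return (rng.random(n_meas) < p).astype(int)
```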

Experimental verification

The NV center in diamond24,25,26 is one of the leading quantum probe systems for sensing, imaging, and spectroscopy. Here we considered frequency discrimination of measurement results obtained by a single NV center under ambient conditions. Two artificial signals were produced by a signal generator with frequencies ω1 = 2π × 250 Hz and ω2 = 2π × 251.6 Hz. Each signal was measured for a total measurement time of Ttot = 220 sec, with a time interval of Δt = 10 μs. From the raw data, we generated strings of 25000 measurement results (Ttot = 0.25 sec each) such that the phase corresponding to each x can be considered a random phase (no phase relation) and the frequencies cannot be resolved by a Fourier transform (see Fig. 1 (left)). The photon-detection efficiencies of a true detection (ms = 0) and a false detection (ms = −1) were low, ~7.4% and ~5.2% respectively, indicating low SNR and contrast.

In order to obtain a theoretical bound on the discrimination error, we considered a theoretical model with a modified probability of a successful measurement, which is given by

$$Q(t)={\eta }_{true}P(t)+{\eta }_{false}[1-P(t)],$$
(10)

where P(t) is given by Eq. (2), and ηtrue and ηfalse are the true and false detection efficiencies, respectively. Assuming that ηfalse = 0.7 ηtrue, we constructed numerical datasets according to Eq. (10) and set the amplitudes of the signals, g1 and g2, and the efficiency ηtrue for each signal to match the experimental results according to two constraints: (i) the power spectrum at the frequency of the signal of the numerical data was required to be approximately equal to the power spectrum of the experimental data; (ii) the averages of the experimental and numerical signals fulfilled \(\langle x\rangle =\frac{{\eta }_{true}+{\eta }_{false}}{2}\). For the numerical model we achieved \({P}_{{M}_{FB}}\approx 10.8 \% \) and \({P}_{{M}_{DL}}\approx 11.6 \% \) (see Fig. 6 (left), green square and red circle under the diamonds). These results are consistent with the experimental data, for which we obtained \({P}_{{M}_{DL}}^{\exp }\approx 12.1 \% \) (Fig. 6 (left), blue diamond), reaching \({P}_{{M}_{FB}}\) without having any information on the model. Moreover, the Full Bayesian method on the experimental data obtained only \({P}_{{M}_{FB}}^{\exp }\approx 16.2 \% \) (Fig. 6 (left), green diamond). This difference is due to the fact that the experimental statistics differ slightly from our probability function; while this creates a problem for the Bayesian method, the DL method is able to learn this difference and take it into account. The difference is expected to be much more dramatic in real nano-NMR experiments, in which there are many more uncertainties in the model. In addition, we analyzed \({P}_{{M}_{FB}}\) and \({P}_{{M}_{DL}}\) on the numerical data as a function of the frequency difference, Δω. The results are shown in Fig. 6 (left). The ROC curve and AUC of MDL on the experimental data are shown in Fig. 6 (right).
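As a small illustration, the finite-efficiency readout of Eq. (10) amounts to the following transformation of the ideal success probability; the matching of ηtrue described above is a one-dimensional search over this map.

```python
def detection_probability(p, eta_true, eta_false):
    """Eq. (10): Q(t) = eta_true * P(t) + eta_false * (1 - P(t))."""
    return eta_true * p + eta_false * (1 - p)

# constraint (ii) from the text: data generated with Q(t) should satisfy
# <x> = (eta_true + eta_false) / 2, with eta_false = 0.7 * eta_true
```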

Figure 6

Performance in the low-efficiency model scenario. Left: discrimination error probabilities (Eq. (3)). Full Bayesian, \({P}_{{M}_{FB}}\) (green squares), and Deep Learning, \({P}_{{M}_{DL}}\) (red circles), on numeric data; Full Bayesian, \({P}_{{M}_{FB}}^{exp}\) (green diamond), and Deep Learning, \({P}_{{M}_{DL}}^{exp}\) (blue diamond), on the experimental data, as a function of the frequency difference, Δω. The input numeric data were produced according to Eq. (10) with g1 = 12.5 kHz, g2 = 11.25 kHz, ω1 = 250 Hz, ω2 = ω1 + Δω, Δt = 10 μsec, and a total measurement time of Ttot = 0.25 sec (25000 measurements). Right: ROC curve and AUC of MDL on the experimental data, corresponding to the blue diamond in the left figure.

It is worth noting that due to the relatively large window size of 25000, a full analysis of Mcorr is not possible within a reasonable time on a common computer. A partial analysis of Mcorr (taking into account segments of two-point correlations only) on both the numerical model and the experimental data yielded \({P}_{{M}_{corr}}\gtrsim 0.4\). This indicates that DL could indeed be the better choice when there is a lack of knowledge of the model.

Comparison to other machine learning methods

So far we have shown that DL methods are useful for the problem of frequency discrimination in nano-NMR settings. In this section we ask whether other machine learning methods could be useful for this task, and if so, how they perform compared to DL.

Any method that is able to discriminate between two signals of nearby frequencies, as considered in the previous sections, must learn and acquire the information on the signals from the correlations between different measurement results (different xi). Hence, any successful discrimination method should involve some non-linearity. Indeed, a fully connected neural network with only linear layers fails at the considered discrimination problem (the achieved error probability is 1/2). We tested the performance of three other linear learning methods, namely logistic regression (with no interaction terms), K-nearest neighbours, and support vector machines (SVM) with a linear kernel, in the ideal model scenario (Fig. 4). Similarly to a fully connected linear neural network, these methods completely fail to discriminate between the signals and achieve an error probability of 1/2 for all values of Δω.

Regarding non-linear models, we considered two: SVM with the non-linear radial basis function (rbf) kernel, and XGBoost, an implementation of gradient-boosted decision trees (in our case we consider non-linear boosting, as linear boosting fails). We tested these two models in the ideal model scenario (Fig. 4) as well as in the mixed noise model scenario (see Fig. 5(d)). The results are shown in Fig. 7. These two methods achieve accuracies (discrimination error probabilities) very similar to those obtained by DL. However, there is a big difference in terms of the required computational resources, as these two methods consume more memory than DL and require much longer running times, for example, ~20 hours compared to ~20 minutes for DL. Indeed, it is not feasible to use these methods for discrimination on the experimental data, where the inputs are much larger (strings of 25000 measurements compared to strings of 1000). While we have not made an exhaustive study and analysis of machine learning methods, which is beyond the scope of this work, these findings strengthen the possible advantage and benefit of DL methods for the data processing of nano-NMR experimental results.
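For reference, a hypothetical baseline comparison along these lines could be set up as follows, reusing the labelled strings X_tr, y_tr, X_te, y_te from the wiring sketch above; the hyperparameters are our assumptions, as the paper does not list them.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

baselines = [
    ("logistic regression (linear)", LogisticRegression(max_iter=1000)),  # fails: ~50% error
    ("SVM (rbf kernel)", SVC(kernel="rbf")),                              # comparable to DL
    ("XGBoost", XGBClassifier()),                                         # comparable to DL
]
for name, clf in baselines:
    clf.fit(X_tr, y_tr)
    print(name, "error probability:", 1 - clf.score(X_te, y_te))
```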

Figure 7

Comparison to other Machine Learning methods. Discrimination error probabilities (Eq. (3)) of Deep Learning, \({P}_{{M}_{DL}}\) (red circles), Support Vector Machines, \({P}_{{M}_{SVM}}\) (blue hexagons), and XGBoost, \({P}_{{M}_{XGB}}\) (magenta pentagons). (a) Ideal model (see Fig. 4). (b) Mixed noise model (see Fig. 5(d)).

Frequency resolution

In this section we consider the problem of discrimination between a signal with a single frequency and a signal with two proximal frequencies centred at the value of the single frequency (Fig. 8). We assume that the signals have strong amplitude and phase noise, which we model by the Ornstein-Uhlenbeck (OU) process, motivated by NV-probed statistically polarized nano-NMR experiments4,5,23. The OU process is a stochastic process that is stationary, Gaussian, and Markovian. It is given by \(x(t+dt)=x(t)-\frac{1}{\tau }x(t)dt+\sqrt{c}\,dW(t)\), where τ and c are positive constants called, respectively, the relaxation time and the diffusion constant, and dW(t) is a temporally uncorrelated normal random variable with mean 0 and variance dt. The environmental noise experienced by an NV center is faithfully modelled by an OU process27.
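The OU update quoted above translates directly into an Euler-Maruyama simulation; the following is a standard discretization sketch, not code from the paper.

```python
import numpy as np

def ou_process(n_steps, dt, tau, c, rng, x0=0.0):
    """x(t + dt) = x(t) - x(t) * dt / tau + sqrt(c) * dW(t), with dW ~ N(0, dt)."""
    x = np.empty(n_steps)
    x[0] = x0
    for i in range(1, n_steps):
        x[i] = x[i - 1] * (1 - dt / tau) + np.sqrt(c) * rng.normal(0.0, np.sqrt(dt))
    return x
```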

Figure 8

The problem of frequency resolution. The observed signal could be one of two possible signals that should be resolved. One signal (red) has two nearby frequencies (blue and orange) and corresponds to their sum. The second signal (green) has one frequency, centered between the two nearby frequencies of the first signal. The closer the two frequencies are, the harder it is to resolve the two signals.

Specifically, the Hamiltonian of the probe is given by

$$H = \left( \sum_{i=1}^{n} A_i(t) \cos[\delta_i t] - B_i(t) \sin[\delta_i t] \right) S_z,$$
(11)

where Ai and Bi undergo OU processes due to the amplitude and phase noise, and δi are the frequencies. The probability of a successful measurement (one) is

$$P(t) = \sin^2\left[ \sum_{i=1}^{n} \frac{A_i(t)}{\delta_i}\left(\sin[\delta_i t] - \sin[\delta_i (t - \Delta t)]\right) + \frac{B_i(t)}{\delta_i}\left(\cos[\delta_i t] - \cos[\delta_i (t - \Delta t)]\right) + \frac{\pi}{4} \right],$$
(12)

where n = 2 and δi = δc ± Δ/2. For two frequencies Δ is finite, and for a single frequency Δ = 0.

Numerical analysis

We constructed numerical datasets according to Eq. (12), where Ai(t) and Bi(t) follow OU processes with mean μ = 0, volatility \(\sigma =\frac{\pi }{10}\sqrt{\frac{4}{\pi {T}_{2}}}\), and reversion speed θ = 1/T2, where T2 = 256 sec is the coherence time of the signal. In addition, we fixed Ttot = 2T2 and Δt = 1 sec. We tested the performance of MDL as a function of the frequency difference, Δ, in comparison to MFB and Mcorr. In MFB the maximal log-likelihood was calculated over the random OU processes. For each string of measurement results x, we considered the single-frequency signal with Δ = Δ0 = 0 and the signal of two nearby frequencies with Δ = Δn > 0, where Δn corresponds to the numerical value of the frequency difference between the two frequencies. We generated many sets of random OU processes, denoted by Ok, and calculated

$$L_1 = \max_{O_k} L(O_k | x, \Delta_0), \qquad L_2 = \max_{O_k} L(O_k | x, \Delta_n),$$
(13)

where

$$L(O_k | x, \Delta_i) = \sum_j \left( x_j \log P(t_j, \Delta_i, O_k) + (1 - x_j) \log\left(1 - P(t_j, \Delta_i, O_k)\right) \right).$$
(14)

We estimated the signal as a single frequency signal or as a signal of two frequencies according to the larger likelihood; that is

$$\Delta_{est} = \begin{cases} \Delta_0 & L_1 > L_2 \\ \Delta_n & \text{otherwise}. \end{cases}$$
(15)
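For concreteness, a hedged sketch of the Eq. (12) data generator used in this analysis is given below, reusing ou_process from the previous sketch. The centre frequency δc is not specified in the text and is a free parameter here; we also identify the diffusion constant c with σ² and the relaxation time τ with 1/θ = T2.

```python
import numpy as np

T2, dt = 256.0, 1.0                                 # coherence time and time interval from the text
n_meas = int(2 * T2 / dt)                           # T_tot = 2 * T2
sigma = (np.pi / 10) * np.sqrt(4 / (np.pi * T2))    # OU volatility quoted in the text

def simulate_resolution(delta_c, Delta, rng):
    """Eq. (12) with n = 2 and delta_i = delta_c +/- Delta/2;
    Delta = 0 reproduces the single-frequency signal."""
    t = dt * np.arange(1, n_meas + 1)
    acc = np.zeros(n_meas)
    for d in (delta_c - Delta / 2, delta_c + Delta / 2):
        A = ou_process(n_meas, dt, T2, sigma**2, rng)   # amplitude/phase noise
        B = ou_process(n_meas, dt, T2, sigma**2, rng)
        acc += (A / d) * (np.sin(d * t) - np.sin(d * (t - dt)))
        acc += (B / d) * (np.cos(d * t) - np.cos(d * (t - dt)))
    p = np.sin(acc + np.pi / 4) ** 2
    return (rng.random(n_meas) < p).astype(int)
```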

Figure 9 (left) shows the error probability as a function of the frequency difference. The MDL results were better than those of Mcorr as well as those of MFB. Interestingly, even though MFB has full knowledge of the noise model, it achieves a larger error probability than MDL. We note that increasing the number of OU processes, Ok, in the above likelihood calculation does not improve \({P}_{{M}_{FB}}\). While MDL and Mcorr reached a result within ~45 min, MFB did so within ~7 hours (CPU times, both measured on the same common PC without utilizing a GPU). The MDL ROC curves and AUC are shown in Fig. 9 (right). These numerical results provide a strong indication that DL methods can potentially identify molecules based on their NMR signal extremely fast, which may be a useful tool in probing chemical reactions at the nano scale.

Figure 9

Performance in the noisy frequency resolution scenario. Left: discrimination error probabilities (Eq. (3)). Full Bayesian, \({P}_{{M}_{FB}}\) (green squares), Deep Learning, \({P}_{{M}_{DL}}\) (red circles), and correlations, \({P}_{{M}_{corr}}\) (blue hexagons), as a function of the frequency difference, Δ. The input data were produced according to Eq. (12) with Ttot = 2T2. Right: ROC curve and AUC of MDL for different values of Δ, corresponding to the second, fourth, and sixth points from left in the left figure.

Theoretical implications

While the numerical advantages of machine learning methods have already been demonstrated28,29, their theoretical value has not been demonstrated before. Beyond the practical interest of utilizing machine learning methods in the nano-NMR frequency resolution problem, machine learning methods, and in particular DL methods, could also have considerable theoretical value.

Generally in estimation problems, the MSE of an estimator M for a given unseen test input x can be written as

$${\rm{E}}[{(y-M(x))}^{2}]={({\rm{Bias}}[M(x)])}^{2}+{\rm{Var}}[M(x)]+{\rm{Var}}[\epsilon ],$$
(16)

where y is the true label, M(x) is the estimated label, E is the expectation value with respect to the training set, the bias of M is given by \({\rm{Bias}}[M(x)]={\rm{E}}[M(x)]-y\), the variance of M is given by \({\rm{Var}}[M(x)]={\rm{E}}[M{(x)}^{2}]-{\rm{E}}{[M(x)]}^{2}\), and \({\rm{Var}}[\epsilon ]\) is the irreducible error due to the (zero-mean) noise \(\epsilon \). The error probability, PM, is then obtained as the expectation value of the MSE, \({\rm{E}}[{(y-M(x))}^{2}]\), with respect to the test set.

An unbiased estimator is an estimator M for which \({\rm{Bias}}\,[M(x)]=0\). An optimal unbiased estimator has minimal variance and is known as the minimum variance unbiased (MVU) estimator. However, from Eq. (16) it is seen that an MVU estimator is not necessarily an optimal estimator that minimizes the MSE. Indeed, it is known that biased methods can outperform unbiased ones30,31,32. In such a case the magnitude of the bias is increased, \({({\rm{Bias}}[M(x)])}^{2} > 0\), but the variance \({\rm{Var}}\,[M(x)]\) is significantly decreased, such that the MSE is smaller than the MSE of an MVU estimator. Such error-reduction strategies are used ubiquitously in image restoration33,34 and beamforming applications35,36. Moreover, it is known that biased methods can be superior in various spectral analysis applications37. Despite this superiority, there are only a few structured methods by which such a biased estimator can be constructed31, and in most cases the search for such estimators is extremely challenging, especially since it is unknown whether such an estimator exists.
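A toy numerical illustration of this point (our example, not from the paper): shrinking the unbiased sample mean introduces bias but reduces the variance enough to lower the total MSE.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, trials = 1.0, 10, 100_000
samples = rng.normal(mu, 3.0, size=(trials, n))

unbiased = samples.mean(axis=1)    # MVU estimator of mu: MSE = Var = 9/10
shrunk = 0.5 * unbiased            # biased (Bias^2 = 0.25) but Var = 0.25 * 9/10
print(np.mean((unbiased - mu) ** 2))   # ~0.90
print(np.mean((shrunk - mu) ** 2))     # ~0.475 < 0.90
```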

Our numerical analysis of MFB converged to its final result; however, the method resolved the two frequencies with a higher error rate than that of MDL. Since MFB is an MVU estimator, our results indicate that for the model at hand the unbiased full-model Bayesian analysis is not optimal, and that a superior biased method exists. This brings up an extra advantage of DL, as the search for a biased method is usually done in an ad-hoc manner. Moreover, in most cases there is no way of knowing whether a method superior to the unbiased one exists. Hence, our results provide some hope that DL methods could be used as an analytical tool for identifying superior estimators, and in particular, for identifying the ultimate limits of resolution problems.

Conclusion

In conclusion, we showed that DL methods are able to mitigate the effect of the inherent strong noise in nano-NMR settings. In particular, the DL neural networks effectively learn the noise model, even when no prior knowledge of the noise model is assumed. This is a crucial property of DL methods, as in many realistic nano-NMR scenarios the noise model is complex and not accurately known. We investigated the performance of DL methods on the problems of frequency discrimination and frequency resolution. We showed that DL methods can outperform Bayesian methods when full knowledge of the noise model is not available, and that DL methods can analyze a test signal as accurately as numerically demanding Bayesian methods, even though the Bayesian methods have full knowledge of the noise model while the DL methods have no prior knowledge at all.

DL methods can perform better than Bayesian methods when the noise model is not precisely known, or when the noise model is known but complex. In the first case, DL methods can achieve better results than Bayesian methods because DL methods do not assume prior knowledge of the model, whereas Bayesian methods rely on precise knowledge of it. This was demonstrated in the case of frequency discrimination in the noisy scenario, as well as in the analysis of the experimental data. In the second case, the results of both methods may be similar, but the computational resources consumed by Bayesian methods can be much larger than those required by DL methods, as was demonstrated in the problem of frequency resolution of noisy signals.

Our results can be seen as a strong indication that DL methods will turn out to be the method of choice for analyzing spectroscopic nano-NMR data. In addition, our results indicate that DL methods could be utilized as a tool for identifying superior biased estimators and the ultimate limits of resolution problems, which are otherwise difficult to obtain12.