Quantitative nanoscale MRI with a wide field of view

Novel magnetic sensing modalities using quantum sensors or nanoscale probes have drastically improved the sensitivity and hence spatial resolution of nuclear magnetic resonance imaging (MRI) down to the nanoscale. Recent demonstrations of nuclear magnetic resonance (NMR) with paramagnetic colour centres include single-molecule sensitivity and sub-part-per-million spectral resolution. Mostly, these results have been obtained using well-characterised single sensors, which only permit extended imaging by scanning-probe microscopy. Here, we enhance multiplexed MRI with a thin layer of ensemble spin sensors in an inhomogeneous control field by optimal-control spin manipulation to improve ensemble sensitivity and field of view (FOV). We demonstrate MRI of fluorine in patterned thin films only 1.2 nm in thickness, corresponding to a net moment of 120 nuclear spins per sensor spin. With the aid of the NMR signal, we reconstruct the nanoscale depth distribution of the sensor spins within the substrate. In addition, we exploit inhomogeneous ensemble control to squeeze the point spread function of the imager to about 100 nm and show that localisation of a point-like NMR signal within 40 nm is feasible. These results pave the way to quantitative NMR ensemble sensing and magnetic resonance microscopy with a resolution of a few tens of nanometers.


Rabi profile
The Rabi frequencies of an NV are given by the component of the driving MW field perpendicular to the defect axis ⟨111⟩. If the amplitude of this component is B_1, the effective amplitude reduces to B_1/2 in the rotating-wave approximation. In the following, we derive the Rabi frequencies within the field of the wire, using the following definitions: I, the peak current in the wire; Î, the direction of the wire; r, the position of the NV; r_w, a reference point on the wire.
A point on the wire may be given by r′ = r_w + sÎ, and the minimal distance vector pointing from the closest point on the wire to the NV is

r_min = r − r_w − s_min Î = r − r_w + ((r_w − r) · Î) Î.

The direction of the field at the NV then follows from Unit[Î × r_min], where Unit[...] denotes taking the unit vector.
Within the geometry of the experiment (lab frame), we can assume the following vectors and directions: r_w = (0, 0, r_w)ᵀ, r = (0, y, 0)ᵀ, and a fixed wire direction Î, where ẑ_NV is the unit vector in the direction of the NV axis. Using these definitions, we arrive at expressions for the parallel and perpendicular components of B at the position of a given NV; the perpendicular component reads

B_⊥(y) = (µ_0 I / 2√2 π) · √(2r_w² + y²) / (r_w² + y²).
We can interpret B_⊥ as the result of the peak flux of an AC field created by the microwave currents. At the present frequencies and distances, retarded components of the field are of order O(10⁻³) and are neglected. The B_1 term in the microwave Hamiltonian can be replaced by B_⊥, which results in the following expression for the Rabi frequency at distance y from the wire:

Ω(y) = √(Ω_0²(y) + Δ²) = √((γB_⊥(y)/2)² + Δ²),   (9, 10)

where Δ is a detuning of the microwave from the NV spin transition. The Rabi profiles in our experiment were well described by this expression when leaving Δ as a free parameter for an effective detuning. At low field strengths, corresponding to Rabi frequencies ≲ 10 MHz, further corrections due to the distribution of hyperfine couplings to ¹³C nuclei may become necessary.
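As a quick numerical cross-check of the wire-field expression above, the following sketch evaluates B_⊥(y) and the resulting Rabi frequency. The peak current and wire distance are illustrative values, not those of the experiment; only the NV gyromagnetic ratio is a physical constant.

```python
import numpy as np

MU0 = 4e-7 * np.pi      # vacuum permeability (T m / A)
GAMMA_NV = 28.03e9      # NV gyromagnetic ratio (Hz / T)

def b_perp(y, i_peak, r_w):
    """Perpendicular MW field component at lateral distance y from the wire,
    for peak current i_peak and wire distance r_w (geometry as in the text)."""
    return (MU0 * i_peak / (2 * np.sqrt(2) * np.pi)
            * np.sqrt(2 * r_w**2 + y**2) / (r_w**2 + y**2))

def rabi_frequency(y, i_peak, r_w, detuning=0.0):
    """Effective Rabi frequency sqrt((gamma * B_perp / 2)^2 + Delta^2) in Hz."""
    omega0 = GAMMA_NV * b_perp(y, i_peak, r_w) / 2
    return np.hypot(omega0, detuning)
```

At y = 0 the expression reduces to the familiar µ_0 I / 2πr_w of a straight wire, and the Rabi frequency falls off with lateral distance, which produces the gradient exploited later for the Rabi readout.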

Sensitivity with finite pulse durations
The phase accumulated due to a time-dependent magnetic field during an interferometric sequence is given by

Φ = γ ∫₀ᵀ B_z(t) g(t) dt,

where γ is the gyromagnetic ratio of the sensor, B_z is the component of the field along the NV axis, and g is the time-domain sensitivity function, generated by the decoupling pulses in the present case. Ideally, g switches between −1 and +1 with every switch of sign of B_z to lead to a maximum accumulation of phase. In the frequency domain, g can be expressed via its Fourier transform G(f). This transformation is detailed in the following. We can decompose g(t) into a few building blocks. The periodicity of g(t) is described using two Dirac combs III^O_τ and III^E_τ, which we use to denote combs that only include terms at the odd and even multiples of τ, respectively. The Dirac combs define the center of each sensing interval of duration t′, represented by the convolution (∗) with a unit step Π_{t′}. In this model, there is no phase accumulation during the pulse when t′ < τ. The overall duration of the sequence is obtained by multiplying (×) with a unit step of duration T. Applying the Fourier transform to g(t), using the time-shifting property and replacing the comb III^E_τ, which contained only even terms, with the comb III_{2τ} of twice the period but all terms, the convolution can then be evaluated: due to the properties of the δ-distribution, it amounts to evaluating the terms at the frequencies k/2τ given by the Dirac comb III_{(2τ)⁻¹}. Notably, all terms with even k vanish, and relabelling k ↦ 2k + 1 results in a sum over the different harmonic orders of the filter, with k ∈ {−1, 0} giving the main lobes in the frequency domain at ±(2τ)⁻¹. The acquired phase Φ can then be expressed through G(f) and the spectrum of B_z by means of the inverse Fourier transform F⁻¹.
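The two main features of the filter function derived above, main lobes at ±(2τ)⁻¹ and vanishing even harmonics, can be checked numerically for the idealised sign-flipping sensitivity function (instantaneous π pulses, t′ = τ). The values of τ and the sampling below are illustrative choices:

```python
import numpy as np

# Build the ideal sensitivity function g(t): sign flip every tau.
tau = 1e-6                  # interpulse spacing (s)
n_pulses = 32               # number of sensing intervals
samples = 200               # samples per interval
dt = tau / samples
g = (-1.0) ** (np.arange(n_pulses * samples) // samples)

spectrum = np.abs(np.fft.rfft(g))
freqs = np.fft.rfftfreq(g.size, dt)

f_peak = freqs[np.argmax(spectrum)]               # main lobe position
even_idx = np.argmin(np.abs(freqs - 1 / tau))     # first even harmonic of (2*tau)^-1
suppression = spectrum[even_idx] / spectrum.max()
```

The dominant spectral weight indeed appears at 1/(2τ), here 500 kHz, while the component at 1/τ is suppressed to numerical noise, consistent with the vanishing even-k terms of the comb expansion.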
The factors including t′ do not depend on f and, by the linearity of the Fourier transform, act as scaling factors on the phase. Hence, the relative change in acquired phase compared to the instantaneous π-pulse (for which t′ = τ) is given by the expression from the main text,

κ = (t′/τ) · sinc(t′/2τ) / sinc(1/2).

The relative contrast in phase noise spectroscopy is given by exp(−⟨Φ²⟩/2), where ⟨Φ²⟩ ∝ B²_rms. The relative change in accumulated phase reflects on B_rms, and in order to obtain the same relative contrast, B_rms needs to increase by a factor of 1/κ. On the other hand, since exp(−κ²x) ≈ 1 − κ²x, a reduction of the acquired phase by κ reduces the change in the signal for a given B_rms by κ². In the shot-noise limit, this requires an increase of the measurement resources by 1/κ⁴.
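The scaling factor κ and the resulting resource penalty can be written as a small helper; this is a direct transcription of the expression above (normalised sinc convention, sinc(x) = sin(πx)/πx):

```python
import math

def kappa(t_prime, tau):
    """Relative phase pick-up for a finite sensing interval t_prime out of
    the interpulse spacing tau; equals 1 for instantaneous pulses."""
    sinc = lambda x: math.sin(math.pi * x) / (math.pi * x)
    return (t_prime / tau) * sinc(t_prime / (2 * tau)) / sinc(0.5)

def resource_penalty(t_prime, tau):
    """Shot-noise-limited increase in measurement resources, 1/kappa^4."""
    return kappa(t_prime, tau) ** -4
```

For example, a sensing interval of half the interpulse spacing (t′ = τ/2) gives κ = 1/√2 and hence a four-fold increase in measurement resources.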

Scaling of pulse error with duration of the optimal control pulses
For a given combination of detunings and Rabi inhomogeneities, we optimized π rotations across a range of durations and with random starting guesses. In Fig. S1 we show the error (defined as 1 − fidelity) of the best optimal-control pulses out of several different ensembles obtained via our GRAPE implementation. The ensembles used for optimization covered detunings due to the hyperfine coupling strengths of both ¹⁴N and ¹⁵N in the NV, and a range of driving-strength inhomogeneities β = Ω/(20 MHz) as indicated by the legend. The step size of β was 0.05. The starting guesses for the controls were random, but the bandwidth was limited to 170 MHz throughout the optimization. The solid lines indicate the lowest error (pulses with best fidelity) obtained up to a given duration and represent the lower boundary for the pulse error. For the blue lines, the lower value of the range of β decreases until a value of 0.3 is reached. The upper end of the β range increases steadily across the sets of pulses. Notably, the lower boundary of the error increases strongly until β = 0.3 is the smallest driving strength in the ensemble. The increase of the upper boundary of β can be absorbed without major increases of the error. In other words, synthesis of the π rotation is most expensive at low driving strength, presumably due to competition with the drift part of the Hamiltonian. In the pulses shown here, the error of the best results decays approximately exponentially with the duration of the pulse.

8% of all pixels. The remaining pixels were outliers, especially in the corners of the field of view, where the laser intensity was lower due to the illumination beam profile. Typically, fits with normal distributions to these histograms resulted in standard deviations of less than 2 MHz, corresponding to an inhomogeneity of the static magnetic field of about 70 µT across the field of view.
The inhomogeneity in resonance frequencies was dominated by a remaining gradient of the static magnetic field across the field of view, such that the histogram of resonance frequencies need not be normal, but may change depending on the orientation of the static gradient. For the Rabi readout method, the gradients were extracted locally over a width of a few micrometers only, minimizing the effect of static detuning on the Rabi frequencies.
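The ensemble-averaged pulse error discussed in this section can be evaluated with a minimal spin-1/2 model. The sketch below is not the GRAPE implementation used for the paper; it only shows how 1 − fidelity of a piecewise-constant pulse is averaged over a grid of relative driving strengths β and detunings, with all numbers illustrative:

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex) / 2
SZ = np.array([[1, 0], [0, -1]], dtype=complex) / 2

def propagator(amps_x, amps_y, dt, beta, delta):
    """Time-ordered product of step propagators for piecewise-constant
    controls scaled by beta, with constant detuning delta (angular units)."""
    SY = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
    u = np.eye(2, dtype=complex)
    for ax, ay in zip(amps_x, amps_y):
        h = beta * (ax * SX + ay * SY) + delta * SZ
        w, v = np.linalg.eigh(h)          # exact 2x2 matrix exponential
        u = (v @ np.diag(np.exp(-1j * w * dt)) @ v.conj().T) @ u
    return u

def ensemble_infidelity(amps_x, amps_y, dt, betas, deltas, target):
    """Mean 1 - F with F = |Tr(U_target^dag U)|^2 / 4 over the ensemble
    (global phase is ignored, as appropriate for a pulse fidelity)."""
    errs = []
    for b in betas:
        for d in deltas:
            u = propagator(amps_x, amps_y, dt, b, d)
            f = abs(np.trace(target.conj().T @ u)) ** 2 / 4
            errs.append(1 - f)
    return float(np.mean(errs))
```

A resonant rectangular pulse of area π reaches unit fidelity at β = 1 and δ = 0 but degrades quickly once detuning or driving-strength spread is included, which is the regime where numerically optimised pulses become necessary.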

Integrals over decay data
In the following we motivate the use of the integrated T₂ decay data (S in the main text) as a benchmark for the efficacy of dynamical decoupling. In the regime where dynamical decoupling fails, little signal-to-noise is left in the acquired data and the results extracted by least-squares fitting routines become volatile. While robust estimators like the median data point along a decay enable qualitative comparison between well-controlled and badly controlled subensembles, the integral can also be used to estimate T₂ with relatively few additional requirements. We start by noticing that, by normalizing our data using the Michelson contrast, there is practically no offset baseline in the data, i.e. the decay is always towards zero. If that were not the case, a robust estimate of the baseline would be necessary in the following discussion. Assume now that we are able to continuously record an exponential decay with amplitude A and lifetime T (e.g. T₂ or T₁), and with an additional noise term δ. For simplicity, we integrate our record from a starting time t₀ ≥ 0 to infinity:

I = ∫_{t₀}^∞ (A e^{−t/T} + δ(t)) dt = A T e^{−t₀/T},   (16)

where the integral over the noise in the second term vanishes for identically distributed noise with short autocorrelation times. For a finite record up to a maximal value t₁, the noise needs to be reduced experimentally. If t₀ > 0, our record does not include the value of the amplitude A, and we replace it in eq. (16) with an appropriately scaled amplitude Ã = A_{t′₀} exp(t′₀/T), with a supporting point A_{t′₀} recorded at time t′₀ ≥ t₀, to obtain

I = A_{t′₀} T e^{(t′₀ − t₀)/T}.   (17)

In the case t′₀ = t₀, we have T = I/A(t₀) and can extract the lifetime of the decay from the integral over our record and the first recorded value at t₀.
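The relation T = I/A(t₀) is easy to verify numerically by integrating a noiseless exponential decay; the amplitude, lifetime and starting time below are illustrative:

```python
import numpy as np

# Integrate A*exp(-t/T) from t0 onwards and divide by the first value.
A, T, t0 = 1.0, 50e-6, 2e-6
t = np.linspace(t0, t0 + 40 * T, 200001)   # 40 lifetimes ~ "to infinity"
y = A * np.exp(-t / T)

integral = np.sum((y[:-1] + y[1:]) / 2 * np.diff(t))   # trapezoidal rule
T_est = integral / y[0]
```

The estimate recovers T regardless of the value of t₀, since both the integral and the first recorded value carry the same factor e^{−t₀/T}.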
In the main text, we used this expression and approximated the value of I by a Riemann-type midpoint summation over the discrete data points. In the following, we add remarks on the applicability and on alternative implementations of this scheme, and on the lower boundaries set on T when it is extracted in this way.
For a choice of the supporting point t′₀ > t₀, i.e. at a point in the record other than the initial data point, we can estimate the decay time using the general solution to eq. (17), which involves the Lambert-W function. This expression is also valid when the integral (eq. 16) does not extend to infinity but to some finite value t₁, such that t₀ < t′₀ ≤ t₁, which is naturally the case for experimental data. The Lambert-W function evaluates to real values only for real arguments ≥ −1/e, such that generally certain combinations of recorded values may lead to unphysical results. Examples include errors of the recorded values and cases where the actual physical mechanism results in deviations from a monoexponential decay.
In the discrete case, we replace the integral in eq. (16) with a sum over the set of N data points (t_k, y_k), k = 1, …, N, to find

I ≈ Σ_{k=1}^{N} Δ_k y_k,

where Δ_k is the 'bin size' occupied by data point k, i.e. the spacing between data points in the case of equally spaced data. In this example, a Riemann-type midpoint summation is used as a robust approximation to the integral based on the measured data. To extract T from this equation, an arbitrary data point (t_m, y_m) can be chosen as the reference data point for the overall amplitude, but in practice one should choose t_m ≪ T, such that y_m is sufficiently larger than zero. The term t_m/T in the exponent can then be neglected. The relaxation time T can, in both the continuous and the discrete case, be evaluated from the integral over the recorded data and one reference data point, or a small subset of reference data points. The choice of these 'special' data points (at t′₀ and t_m, respectively), or generally the estimate of the amplitude A, can have great influence on the overall result, especially due to noise. Since short evolution times also result in short measurements, a high number of repetitions can be chosen to improve the signal-to-noise ratio of the reference data. Due to the averaging effect of the integral or sum, higher noise levels can then be tolerated for the remaining data points.
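A minimal sketch of the discrete estimator, assuming a baseline of zero and a reference point with t_m ≪ T; the helper name estimate_lifetime and the midpoint-style binning are our own choices for illustration, not code from the paper:

```python
import numpy as np

def estimate_lifetime(t, y, m=0):
    """Estimate the decay time T from the Riemann sum over data (t, y),
    using data point index m as the amplitude reference (t[m] << T)."""
    t = np.asarray(t, float)
    y = np.asarray(y, float)
    # bin edges at the midpoints between sample times; edge bins are half-open
    edges = np.concatenate(([t[0]], (t[:-1] + t[1:]) / 2, [t[-1]]))
    widths = np.diff(edges)
    integral = np.sum(widths * y)
    return integral / y[m]
```

For quadratically spaced evolution times, as used in the measurements, the estimator returns a value slightly below the true T, consistent with the discussion below that truncation and discretisation both underestimate the integral.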
The implicit and explicit simplifications and assumptions made above almost exclusively lead to an underestimation of T, such that the values extracted in this way represent a lower boundary. It should be noted that any underestimation of the integral over the data results in an underestimation of the T value, and that all of the following deviations from the ideal monoexponential model result in such an underestimation of the integral.
Mostly, a monoexponential decay does not describe the shape of the experimentally observed decay curves well, and usually a stretched exponential decay, exp(−(t/T)^β), is applied. If we evaluate the integral over the stretched exponential form, we obtain

I_β = ∫_0^∞ A e^{−(t/T)^β} dt = (A T/β) Γ(1/β) = A T Γ(1 + 1/β).

The Gamma function in this expression introduces an additional factor Γ ≈ 1 for the relevant range of 0.5 < β < 2. In the case of dense ensembles of shallow NVs, we have so far only observed β < 1, for which I_β ≳ I. Hence, neglecting β leads us to underestimate the value of T₂. The situation changes if we start our integration at a finite time t₀, such that the altered shape of the decay compared to a monoexponential decay affects the integral over the data:

I_β(t₀) = (A T/β) Γ(1/β, (t₀/T)^β),

where Γ(a, z) = ∫_z^∞ t^{a−1} e^{−t} dt is the incomplete Gamma function. For t₀ ≪ T, the additional factor due to Γ/β for β ≠ 1 may drop to 0.9 around β ≈ 0.7 and reduces further as t₀/T increases. By including an appropriate correction factor in the analysis, overestimation of T can be prevented. Alternatively, β can also be estimated by using several reference points.
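The size of the stretched-exponential correction factor Γ(1 + 1/β) for t₀ = 0 can be tabulated directly; a quick sketch over the range quoted above:

```python
from math import gamma

def stretched_integral_factor(beta):
    """Factor Gamma(1 + 1/beta) relating the integral of exp(-(t/T)^beta)
    over [0, inf) to A*T; equals 1 for a plain exponential (beta = 1)."""
    return gamma(1 + 1 / beta)

# scan over the range of stretch exponents quoted in the text
for b in (0.5, 0.7, 1.0, 1.5, 2.0):
    print(b, stretched_integral_factor(b))
```

For β < 1 the factor exceeds one (I_β ≳ I, underestimating T₂ when neglected), while for β > 1 it falls slightly below one, consistent with the discussion above.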
Additional modulations of the signal, like the dips observed due to the ¹³C spin bath, result in a reduction of the integral or sum over the signal. This also applies to the maximal t used in measurements, which is finite and truncates the integral. The error in estimating T in the case of truncation scales like the integral in eq. (16), i.e. it decays exponentially. In the case of a stretched exponential decay with β ≤ 1, the Riemann summation underestimates the integral. Finally, the number and choice of measurement times t_k influence the accuracy of the estimate. We typically use quadratically increasing free evolution times, such that there is a higher density of data points where the slope of the decay is highest. This improves the accuracy of the result compared to using the same number of equally spaced data points, while also reducing the measurement time. The error of the sum due to noise decreases as √N.

Drift correction and upsampling
The relative translation u = (u_x, u_y) between the frames recorded during the measurements was determined by pairwise cross-correlation. Taking a set of N frames, there are (N choose 2) possible pairs of frames, where (N choose k) = N!/(k!(N − k)!). In total, there are 2·(N choose 2) equations for the 2N coordinates x_i, y_i: u_ij = r_j − r_i, where r_i = (x_i, y_i) are the absolute positions of the frames within a common coordinate system and u_ij is the shift vector determined from the cross-correlation. Admitting for noise in the counts at each pixel, the vectors u_ij will carry an error. (N choose k) is weakly lower-bounded by (N/k)^k, such that for k = 2 the number of equations scales more strongly than N. The number of coordinates only increases as 2N, such that the system of equations becomes overdetermined, and a least-squares solution to the equation A·x = b, made up from the relative positions determined by pairwise cross-correlation, improves the estimated coordinates (unless there are significant systematic errors in the u_ij).
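The least-squares step can be illustrated with synthetic data; the sketch below recovers frame positions from noisy pairwise shifts, fixing the first frame at the origin as gauge choice. Frame count, positions and noise level are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
true_pos = rng.uniform(-5, 5, size=(n, 2))
true_pos -= true_pos[0]            # gauge: frame 0 sits at the origin

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
shifts = np.array([true_pos[j] - true_pos[i] for i, j in pairs])
shifts += rng.normal(0, 0.05, size=shifts.shape)   # cross-correlation noise

# design matrix: one row per pair, r_j - r_i = u_ij; drop column 0 (r_0 = 0)
a = np.zeros((len(pairs), n))
for row, (i, j) in enumerate(pairs):
    a[row, i], a[row, j] = -1.0, 1.0
sol_x, *_ = np.linalg.lstsq(a[:, 1:], shifts[:, 0], rcond=None)
sol_y, *_ = np.linalg.lstsq(a[:, 1:], shifts[:, 1], rcond=None)
est = np.column_stack([np.concatenate([[0.0], sol_x]),
                       np.concatenate([[0.0], sol_y])])
rms = np.sqrt(np.mean((est - true_pos) ** 2))
```

Because each position enters many pairwise equations, the recovered coordinates are noticeably more accurate than any individual pairwise shift, which is the averaging benefit exploited in the analysis.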
To keep the computational problem within reasonable bounds, we did not obtain the cross-correlation for all frames of a measurement, but only for a subset of 16 frames which served as 'ground truth' reference to which all other frames were aligned. The reference frames were evenly spaced along the whole duration of the measurement and were taken from different combinations of the experimental parameters (interpulse distance τ used for locking onto the NMR signal, duration t_R of the final Rabi drive, and direction of the axis +s_x or −s_x along which the Rabi drive was applied). This diverse set of conditions was chosen to mitigate systematic effects in the cross-correlation which might have been caused by common patterns in NV brightness for similar parameters. We determined the pairwise cross-correlations between this set of frames at a resolution of 1/20th of a pixel, or about 8 nm, using the skimage.feature.register_translation function of the scikit-image Python library [1], and obtained the least-squares solution for the relative positions of the reference frames. Afterwards, we determined the relative shift of each frame of the measurement against each of the reference frames and took the average of the 16 positions obtained for a given frame to determine its position within the coordinate system established by the reference frames. Finally, we resampled the frames using cubic splines, upsampling by a factor of two in the process, i.e. doubling the number of pixels along each dimension x and y of the frames. Fig. S3 shows the drift of the optical microscope as determined by our analysis.

Figure S3. Drift. The drift, measured in pixels, of the optical microscope, determined from cross-correlation of the measurement frames with a set of reference frames. Only the frames obtained after a stabilization phase were taken for the analysis of the Rabi readout methods explained in the main text.
The data used in the analysis of the Rabi readout were obtained during a day of total measurement time (marked in the plot), over which we determined only sub-pixel drift; this drift was corrected for during resampling.

Periodograms of the Rabi oscillations
Periodograms are estimators for the power spectrum of a signal. Fig. 3c in the main text shows classical periodograms of the Rabi oscillations obtained from Fast Fourier Transforms (FFT), which were interpolated between the raw data points by zero-padding the time-domain signal. The classical periodogram is

P(f) = (1/N) |Σ_{n=0}^{N−1} g_n e^{−2πi f t_n}|²,   (20)

where N is the number of data points g_n sampled over a total duration T at evenly spaced times t_n = nT/N. The FFT captures the complete frequency information when evaluated at the frequencies f_k = k/T, with k = −N/2 to N/2. Zero-padding the data does not improve the spectral resolution; it merely adds intermediate frequencies at which eq. (20) is evaluated. These interpolated spectra were also the basis for the Rabi readout, i.e. integration over small bands within these spectra corresponds to the 'filtered' data points in Fig. 3d. To show that the interpolated spectra behave well around their maxima, we fit the peaks of the interpolated spectra with squared normal distributions to extract the dominant frequency contribution in a line of pixels perpendicular to the gradient within the Rabi frequencies. We compared the frequencies obtained in this way to the Rabi frequencies obtained from least-squares fits to the time-domain data. As shown in Fig. S4, the shift of the frequencies was nearly identical for both procedures, corresponding to a consistent extraction of the gradient in both domains. We observed a small constant offset between the absolute values, which was smaller than the width of the band used for band-filtering the spectra. In addition, the remaining relative fluctuations between the frequency values were about 1/10th of the width of the band used for filtering the spectra. We took the absolute values from the frequency-domain fits as the centers of the band-filtering to obtain the contribution of NV centers at each position along the gradient in the Rabi readout method.
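The effect of zero-padding, interpolation of the periodogram without any gain in true resolution, can be demonstrated on a toy signal; the frequency and sampling values below are illustrative:

```python
import numpy as np

# A short cosine record whose frequency falls between FFT bins.
n, dt, f0 = 64, 1e-2, 12.3
t = np.arange(n) * dt
g = np.cos(2 * np.pi * f0 * t)

def periodogram(signal, pad_factor, dt):
    """Classical periodogram |FFT|^2 / N, zero-padded by pad_factor."""
    m = len(signal) * pad_factor
    spec = np.abs(np.fft.rfft(signal, n=m)) ** 2 / len(signal)
    freqs = np.fft.rfftfreq(m, dt)
    return freqs, spec

f_raw, p_raw = periodogram(g, 1, dt)
f_pad, p_pad = periodogram(g, 8, dt)
peak_raw = f_raw[np.argmax(p_raw)]   # snapped to the coarse grid
peak_pad = f_pad[np.argmax(p_pad)]   # interpolated, closer to f0
```

The padded spectrum places the maximum much closer to the true frequency even though the underlying spectral resolution, set by the record length, is unchanged; the peak width stays the same.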

Simulation of Rabi Readout
We performed numerical simulations of the Rabi readout for a range of gradients and intrinsic line widths under experimental conditions, including a signal-to-noise ratio of 5. Here we show an example for |∇Ω| = 5 MHz/µm. In this example, we simulated pixels of 80 nm in width, as may be obtained experimentally from supersampling during the correction of frame drift. We generated a random distribution of NV coordinates and calculated the photon distribution due to the NVs with a Gaussian point spread function with standard deviation σ_opt = 165 nm. We modelled the NMR signal as a Gaussian dip in the NV response around 1.5 MHz. The NMR signal was only applied at the lower half of the simulated frames. We applied the analysis of the Rabi spectra as described in the main text and obtained a significant reduction of the observed step width to 40 nm, compared to the expected result of σ_opt. Even without filtering, a slight improvement of the step width to 144 nm was observed, which we attribute to a filtering effect of the least-squares fitting to the data. The results of the simulation are shown in Fig. S5.

Figure S5. Simulation of Rabi readout. In the simulation, an NMR signal was only applied at the lower half of the frame, with a sharp edge at y = 0. a, Diffraction-limited image of randomly distributed NVs (red crosses). b, Rabi oscillations modulated by the NMR signal from a single pixel (orange: off resonance, blue: on resonance). c, Signal strength of the traces in b, where t_R corresponds to a π/2-pulse. Resonance was at f_lock = 1.5 MHz. d, MRI using Rabi readout. The NMR signal at the lower half of the frame results in reduced contrast. A sharp edge is observed. e, MRI using the unfiltered signal. A smoother step is observed. f, Comparison of step profiles with (orange) and without (blue) Rabi readout. The step width reduces to 40 nm with Rabi readout, compared to 144 nm without.
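A stripped-down version of the contrast mechanism behind the Rabi readout can be sketched in a few lines. This is not the full simulation described above (no point spread function or fitting); it only shows how band-filtering the Rabi periodogram at the position-dependent Rabi frequency recovers the contrast step at the signal edge. All parameters are illustrative, chosen to match the quoted gradient and SNR:

```python
import numpy as np

rng = np.random.default_rng(1)
pix = 80e-9                     # pixel size (m)
grad = 5e12                     # |grad Omega| = 5 MHz/um, in Hz/m
y = (np.arange(-32, 32) + 0.5) * pix
f_rabi = 15e6 + grad * y        # local Rabi frequency per pixel row

t_r = np.arange(256) / 64e6     # Rabi drive durations (s)
# an "NMR" dip halves the oscillation contrast in the lower half (y < 0)
contrast = np.where(y < 0, 0.5, 1.0)
signal = contrast[:, None] * np.cos(2 * np.pi * f_rabi[:, None] * t_r)
signal += rng.normal(0, 0.2, signal.shape)   # amplitude SNR of 5

# band-filter: integrate the periodogram near the local Rabi frequency
spec = np.abs(np.fft.rfft(signal, axis=1)) ** 2
freqs = np.fft.rfftfreq(len(t_r), t_r[1] - t_r[0])
band = 0.3e6
amp = np.array([spec[i, np.abs(freqs - f_rabi[i]) < band].sum()
                for i in range(len(y))])
step = amp[y > 0].mean() / amp[y < 0].mean()   # contrast ratio at the edge
```

Despite the noise, the band-filtered amplitude cleanly separates the two halves of the frame (the power ratio approaches the square of the contrast ratio), which is the effect exploited to sharpen the step in Fig. S5d.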