Introduction

Due to the wave nature of light, the spatial resolution of conventional far-field microscopes is fundamentally limited by diffraction to approximately half the wavelength of the incident light, a bound known as the Rayleigh criterion1 or the Abbe limit2. Far-field super-resolution microscopy (SRM) techniques that aim at overcoming the diffraction limit could greatly impact the fields of biology, physics, and chemistry, as well as device engineering and the semiconductor industry, and could lead to novel applications3,4,5,6,7,8. The developed SRM techniques typically break one or more of the underlying fundamental assumptions on the nature of light-matter interaction within the optical system under which the diffraction limit is derived. Specifically, it is assumed that the illumination intensity is homogeneous, the optical response of the stationary object is linear, and all the optical fields in the system are classical. Recently, a plethora of novel super-resolution techniques has been developed, including stimulated emission depletion9, structured illumination microscopy10, photoactivated localization microscopy11, and stochastic optical reconstruction microscopy12. All the aforementioned techniques are realized within classical optical systems by breaking the homogeneity, linearity, or stationarity assumptions.

Another promising route toward the realization of SRM techniques is to take into account the quantum nature of light13,14,15,16,17. Recently, several quantum schemes utilizing multimode squeezed light18 and generalized quantum states19 have been proposed. These approaches use complex quantum states of light as the illumination source, which demands highly efficient, deterministic sources of such states or of entangled photon pairs. In contrast, several SRM techniques have been developed that rely on the quantum nature of the light emitted by the object itself. This approach is based on the fact that some quantum sources of light produce emission with sub-Poissonian temporal photon statistics, which can be analyzed by measuring the autocorrelation function of the emission20. By analyzing the n-th order autocorrelation function at zero time delay \({g}^{\left(n\right)}(\tau=0)\) of nonclassical light emitted from a point source, it is possible to reduce the size of the effective point spread function by a factor of \(\sqrt{n}\)21,22,23.

The antibunching-based SRM can be coupled with a classical approach to further improve the resolution of the imaging system. By combining image scanning microscopy with the measurement of the second-order quantum photon correlation, a spatial resolution four times beyond the diffraction limit was achieved24 with only a modest hardware overhead compared to regular confocal scanning microscopy. This combination makes the antibunching-based SRM technique a very attractive platform for imaging quantum light sources, as these are typically analyzed using confocal scanning microscopy. The main bottleneck of this framework is the time required for the acquisition of the time-resolved photon statistics needed to accurately determine the values of the autocorrelation function at zero delay. This accuracy depends on the number of registered correlated photon detection events, and the required acquisition time scales exponentially with the order of the autocorrelation function. This exponential time cost makes antibunching-based SRM techniques prohibitively difficult for in situ and live tissue samples, where, despite the great potential of quantum metrology for imaging such samples25,26,27, biomarkers still exhibit limited “photon budgets”28. Hence, in order to realize scalable and practical antibunching-based SRM, one needs to develop a fast and precise approach to determine \({g}^{(2)}(\tau=0)\).

Recently, convolutional neural networks (CNNs) have enabled the rapid classification of quantum emitters, based on sparse autocorrelation function measurements, according to whether \({g}^{(2)}(0)\) is above or below a given threshold value29,30,31. Building on these results, we present a CNN-based regression model that allows accurate estimation of the \({g}^{(2)}(0)\) value from sparse data. Using the developed CNN model, we reduced the acquisition time of the antibunching-based scanning SRM technique by a factor of 12, thus marking an important step towards the practical realization of scalable quantum super-resolution imaging devices.

Results

Machine learning assisted antibunching super-resolution microscopy

The antibunching SRM technique relies on the detection of quantum correlations in the signal radiated by quantum emitters, which allows a \(\sqrt{n}\) gain in spatial resolution by measuring the n-th order autocorrelation function22. This fact can be understood by conducting a Gedanken experiment first proposed by Hell et al.32. In the case of a hypothetical emitter that emits photons in pairs, an improvement in resolution can be theoretically obtained by sending each of the two photons to a separate camera. Since the two cameras will record two independent point-spread function (PSF) estimates, the spatial resolution can be improved by a factor of \(\sqrt{2}\) via simple multiplication. However, instead of requiring the emitter to emit pairs of photons, one can acquire the same amount of information by verifying the absence of two-photon correlations in single-photon emission, i.e., by measuring the second-order autocorrelation function. Furthermore, one can achieve an arbitrarily high improvement in resolution by measuring higher-order correlations in the emission of a single-photon emitter. In the most general form, the intensity distribution of the super-resolved image based on antibunching SRM \({G}^{\left(n\right)}(x,y)\) can be obtained by retrieving the spatial distributions of the n-th order autocorrelation function at zero time delay \({g}^{\left(n\right)}(x,y,\tau=0)\) and of the number of detected photons \(\widetilde{N}(x,y)\)22:

$${G}^{\left(n\right)}(x,y) \sim {\left\langle \widetilde{N}(x,y)\right\rangle }^{n}\mathop{\sum }\limits_{i=1}^{i={i}_{\max }}{c}_{i}{\chi }_{i},$$
(1)

Here, \(\langle \widetilde{N}(x,y)\rangle\) is the average number of detected photons from a given point \(\left(x,y\right)\) of the sample; \({\chi }_{i}\) is a function of the product \({g}^{\left({j}_{1}\right)}\left(x,y,0\right){g}^{\left({j}_{2}\right)}\left(x,y,0\right)\ldots {g}^{\left({j}_{l}\right)}\left(x,y,0\right)\) whose orders fulfill the condition \(\mathop{\sum }\nolimits_{k=1}^{l}{j}_{k}=n\), and \({i}_{\max }\) is the number of such ordered combinations. For example, in the \(n=2\) case, Eq. (1) reduces to the following simple form22:

$${G}^{\left(2\right)}(x,y) \sim {\left\langle \widetilde{N}(x,y)\right\rangle }^{2}\left(1-{g}^{\left(2\right)}(x,y,0)\right)$$
(2)
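For a single emitter, Eq. (2) makes the resolution gain explicit. As a brief consistency check (assuming a Gaussian PSF and a \({g}^{\left(2\right)}(0)\) value that is approximately constant across the emitter's image), squaring the detected-photon distribution halves the Gaussian variance:

$$\left\langle \widetilde{N}(x,y)\right\rangle \propto {e}^{-\frac{{x}^{2}+{y}^{2}}{2{\sigma }^{2}}}\quad \Rightarrow \quad {G}^{\left(2\right)}(x,y)\propto \left(1-{g}^{\left(2\right)}(0)\right){e}^{-\frac{{x}^{2}+{y}^{2}}{2{\left(\sigma /\sqrt{2}\right)}^{2}}},$$

so the reconstructed spot is a Gaussian of width \(\sigma /\sqrt{2}\), i.e., its FWHM is reduced by a factor of \(\sqrt{2}\), consistent with the super-resolved images discussed below.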

The most commonly used approach for retrieving the \({g}^{\left(2\right)}(0)\) value is a Hanbury-Brown-Twiss (HBT) interferometry measurement, composed of a beam splitter directing the emitted light to two single-photon detectors connected to a correlation board (Fig. 1a). The correlation board registers events consisting of pairs of detector clicks. It then arranges these events into a histogram as a function of the time delay \(\tau\) between the clicks, which is then post-processed via Levenberg-Marquardt fitting to the model:

$${g}^{\left(2\right)}\left(\tau \right)=1-{a}_{1}{e}^{-\frac{\tau }{{t}_{1}}}+{a}_{2}{e}^{-\frac{\tau }{{t}_{2}}},$$
(3)
Fig. 1: General framework of the machine learning (ML) assisted antibunching SRM.

Antibunching-based SRM image acquisition starts with an area of n by m pixels (a) and measures complete antibunching histograms via Hanbury-Brown-Twiss (HBT) interferometry at each pixel (b). A Levenberg-Marquardt (L-M) fit is performed on each pixel’s HBT histogram to retrieve the \({g}^{\left(2\right)}\left(x,y,0\right)\) distribution. Finally, the super-resolved image is constructed using Eq. (2) (d). Alternatively, the ML-assisted approach relies on a pre-trained CNN regression model, which accurately predicts \({g}^{\left(2\right)}\left(x,y,0\right)\) maps from sparse HBT measurement data (c). The developed approach ensures at least a 12-fold speed-up compared with the conventional L-M-fitting-based antibunching SRM.

Here, \({a}_{j},{t}_{j},j=1,2\) are the fitting parameters related to the internal dynamics of the emitters. Figure 1b shows the main steps of the fitting-based approach for the realization of the antibunching SRM technique. The area of interest is divided into \(n\times m\) pixels, and autocorrelation histograms are acquired at each pixel, with the autocorrelation measurement at each pixel performed for several minutes. L-M fitting is applied to all of the HBT histograms, and the corresponding \({g}^{\left(2\right)}\left(x,y,0\right)\) map is retrieved. Finally, the super-resolved image is calculated via Eq. (2) (Fig. 1d).
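As an illustration, this fitting-based pipeline can be expressed in a few lines of code. The following is a minimal sketch, not the authors' implementation; fit_g2_zero is a hypothetical per-pixel helper implementing the L-M fit of Eq. (3) (one possible form is sketched in the Methods section), and all variable names are assumptions.

```python
# Minimal sketch (not the authors' code) of the fitting-based reconstruction of Fig. 1b, d.
import numpy as np

def reconstruct_fitting_based(histograms, tau_ns, counts_map, fit_g2_zero):
    """histograms: (n, m, n_bins) per-pixel HBT data; counts_map: (n, m) mean photon counts <N>."""
    n, m = counts_map.shape
    g2_map = np.ones((n, m))
    for i in range(n):
        for j in range(m):
            # hypothetical helper: L-M fit of Eq. (3) returning g2(0) for one pixel
            g2_map[i, j] = fit_g2_zero(tau_ns, histograms[i, j])
    # Eq. (2): G2(x, y) ~ <N(x, y)>^2 * (1 - g2(x, y, 0))
    g2_image = counts_map**2 * (1.0 - g2_map)
    return g2_image, g2_map
```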

In our demonstration, we use single nitrogen-vacancy (NV) centers in nanodiamonds dispersed on a coverslip glass substrate as single-photon emitters. These emitters typically yield between \(10^{4}\) and \(10^{5}\) counts per second on each of the single-photon detectors in the HBT setup (when in focus) and exhibit fluorescence lifetimes between 10 and 100 ns. During the scan, when the emitters are partially out of focus, the fluorescence counts drop significantly. Consequently, in order to assess \({g}^{\left(2\right)}(0)\) via Levenberg-Marquardt (L-M) fitting with an uncertainty between ±0.01 and ±0.05, autocorrelation histogram acquisition times of 1 min per pixel are required. In the pulsed excitation regime, fitting is not required to retrieve \({g}^{\left(2\right)}(0)\) as long as the pump repetition period is much longer than the emitter’s fluorescence lifetime. However, this requirement becomes impractical when the emitter lifetime is long, as in the case of NV centers. The developed ML approach addresses the aforementioned problem by rapidly estimating the \({g}^{\left(2\right)}\left(x,y,0\right)\) values based on sparse HBT measurements. The main framework of the developed approach is shown in Fig. 1c. A CNN regression network is trained on a set of “sparse” autocorrelation data with short acquisition times (see the Methods section). Once trained, the CNN estimates the \({g}^{\left(2\right)}(0)\) values from histograms acquired in less than 10 s.

Machine learning assisted autocorrelation function measurement

The main building block of our ML-assisted antibunching SRM technique is the CNN-based regression model used for retrieving \({g}^{(2)}(0)\) values. In this section, we describe the structure of the CNN, its training and testing, and compare its performance against conventional L-M fitting. The training dataset of sparse second-order autocorrelation histograms consists of measurements performed on a set of 40 randomly dispersed nanodiamonds with NV centers on a coverslip glass substrate. Figure 2a shows the schematics of the HBT setup used for these measurements. Two avalanche detectors (D1, D2) with 30 ps jitter are connected to a pulse correlator with a 4 ps internal jitter. The co-detection events are recorded over a range of 500 ns and collected into 215 equally sized time bins. For each of the 40 emitters, hundreds of sparse autocorrelation histograms with 1 s acquisition time are collected, until the total number of co-detection events in their sum allows a precise ground truth (\({g}^{(2)}(0)\)) estimation via L-M fitting with a fitting uncertainty between ±0.01 and ±0.05. The estimated ground-truth value is then assigned as a label to the entire set of 1 s histograms. We then formed all the possible combinations of 1 to 10 of these 1 s histograms to obtain training data that emulated histograms with acquisition times from 1 s to 10 s. Such a data augmentation process assumed that the emission is a process with no memory over times exceeding 1 s and allowed us to significantly extend the training dataset. More information on the training dataset collection and augmentation is provided in the Methods section. Additional details on the CNN structure and training can be found in Supplementary Section 1, Table S1, and Figure S1.

Fig. 2: Machine learning assisted measurement of g(2)(0).

a Schematics of the HBT interferometer. Labels: DM dichroic mirror, LPF long-pass filter, BS beam splitter, D1/D2 detectors. b Schematics of the CNN regression network. The input layer takes in sparse HBT histograms. The total number of events, Nevents, of the histogram is concatenated to the output of the feature-learning part and used as a regularization term. c, d Regression plots (predicted vs expected \({g}^{\left(2\right)}\left(0\right)\) values) for L-M fitting (c) and CNN regression of \({g}^{\left(2\right)}\left(0\right)\) (d) on 5 s datasets. Dots show the average predicted \({g}^{\left(2\right)}\left(0\right)\) value, while error bars show the standard deviation of the predicted value over all the 5 s datasets acquired for a given emitter.

Figure 2b shows the structure of the CNN used for \({g}^{(2)}(0)\) regression. The CNN consists of one input layer, three hidden convolutional layers, one max-pooling layer followed by dropout, three fully connected layers, and one output node containing the regression result. The input layer has 215 nodes, corresponding to the number of bins in the input histogram. The feature-learning part of the CNN is optimized to capture the salient features of the autocorrelation datasets, while the regression part is trained to predict \({g}^{(2)}(0)\) values based on these extracted features. Each hidden convolutional layer comprises 260 filters, and the output of the third hidden layer is connected to the max-pooling layer, followed by the dropout layer. The filter kernel size (4) is the same for each layer. Importantly, the CNN takes the total number of two-photon detection events Nevents in the histogram as an additional input. Nevents is concatenated to the output of the feature-learning part and used as a regularization term during the training process. The 5-10 s histograms acquired on pixels where the contribution of the quantum emission to the total counts is negligible feature Nevents < 4, while histograms from areas close to the quantum emitter locations feature Nevents = 65 on average. To populate the “dark” pixels, the CNN regression network is implicitly biased to produce \({g}^{(2)}(0)=1\) on datasets with Nevents < 4. Supervised training of the CNN regression model was performed using the augmented dataset of 5-10 s sparse HBT histograms and the corresponding ground-truth labels. The training is realized by performing Adamax gradient-descent optimization using the Keras library33 for 100 epochs with a mean absolute percentage error loss function; 80% of the dataset is used for training, while the remaining 20% is used for validation and testing.
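A minimal sketch of such a network, written with the Keras functional API, is given below. The layer activations, pooling size, dropout rate, and widths of the fully connected layers are not specified above and are assumptions chosen purely for illustration; this is not the authors' implementation.

```python
# Minimal sketch of the g2(0) regression CNN described in the text (Keras functional API).
# Activations, pool size, dropout rate, and dense-layer widths are assumed values.
from tensorflow import keras
from tensorflow.keras import layers

N_BINS = 215  # number of time bins in each sparse HBT histogram

hist_in = keras.Input(shape=(N_BINS, 1), name="sparse_histogram")
n_events_in = keras.Input(shape=(1,), name="n_events")  # total co-detection events

# Feature-learning part: three convolutional layers, 260 filters of kernel size 4 each
x = layers.Conv1D(260, 4, activation="relu", padding="same")(hist_in)
x = layers.Conv1D(260, 4, activation="relu", padding="same")(x)
x = layers.Conv1D(260, 4, activation="relu", padding="same")(x)
x = layers.MaxPooling1D(pool_size=2)(x)
x = layers.Dropout(0.25)(x)   # assumed dropout rate
x = layers.Flatten()(x)

# Concatenate the event count to the learned features (regularizing input)
x = layers.Concatenate()([x, n_events_in])

# Regression part: three fully connected layers and a single output node
x = layers.Dense(128, activation="relu")(x)   # assumed widths
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(16, activation="relu")(x)
g2_out = layers.Dense(1, name="g2_zero")(x)

model = keras.Model(inputs=[hist_in, n_events_in], outputs=g2_out)
model.compile(optimizer="adamax", loss="mean_absolute_percentage_error")
# Training as described in the text (80/20 split, 100 epochs), e.g.:
# model.fit([hists, n_events], g2_labels, epochs=100, validation_split=0.2)
```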

The performance of the trained CNN regression model is assessed by calculating the mean absolute percentage error (MAPE) and the coefficient of determination (r²) on the 5 s histogram datasets. Figure 2c shows the regression plot of the L-M fitting performed on 5 s HBT histograms. Markers show the average value of the prediction, while error bars show the standard deviation over the set of 5 s histograms belonging to the same emitter. Due to the sparsity of the HBT measurement, the L-M fitting expectedly cannot ensure a precise fit of the data, which results in MAPE = 32%, r² = 70%, and a root mean square error (RMSE) of 0.215.

In contrast, the CNN regression model ensures very precise predictions of the \({g}^{(2)}(0)\) values based on 5 s HBT histograms (Fig. 2d). Owing to its ability to learn hidden correlations between the signature features of the sparse datasets and the ground-truth labels, the CNN regression model shows excellent performance on the sparse dataset, ensuring a low MAPE (5%), a high coefficient of determination of 93%, and an RMSE of 0.0018. The CNN performance is also robust against a reduction of the acquisition time. We analyze the performance of both approaches on 5 s, 6 s, and 7 s HBT datasets. Direct fitting yields 30% and 27% MAPE when applied to 6 s and 7 s HBT measurements, respectively. The CNN regression model is considerably more robust than L-M fitting: it achieves 3.92% MAPE on 6 s HBT datasets and 3.58% MAPE when applied to 7 s datasets.
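For reference, the reported metrics can be computed from paired predicted and ground-truth \({g}^{(2)}(0)\) values as in the following sketch (an illustrative helper assuming scikit-learn is available, not the authors' evaluation code):

```python
# Illustrative computation of the quoted evaluation metrics: mean absolute percentage
# error (MAPE), coefficient of determination (r^2), and root mean square error (RMSE).
import numpy as np
from sklearn.metrics import r2_score

def report_metrics(g2_true, g2_pred):
    g2_true = np.asarray(g2_true, dtype=float)
    g2_pred = np.asarray(g2_pred, dtype=float)
    mape = 100.0 * np.mean(np.abs((g2_pred - g2_true) / g2_true))
    r2 = r2_score(g2_true, g2_pred)
    rmse = np.sqrt(np.mean((g2_pred - g2_true) ** 2))
    return mape, r2, rmse
```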

Experimental realization of machine learning assisted antibunching super-resolution microscopy

The benchmarking of the ML-assisted regression of autocorrelation data enables the experimental demonstration of ML-assisted antibunching SRM. The experiment is realized on a sample of randomly dispersed nanodiamonds with NV centers on a glass substrate. In this demonstration, the objective is scanned using a piezo stage with sub-10 nm resolution over a 775 × 775 nm² region of interest, which is divided into 1024 (32 × 32) pixels and contains one nanodiamond with a single NV center. Autocorrelation measurements are performed on each pixel in 1 s increments with a 7 s total acquisition time per pixel. Along with the autocorrelation data, the corresponding photoluminescence (PL) map is retrieved (Fig. 3a).

Fig. 3: Machine learning assisted antibunching SRM of a single NV center.

a Photoluminescence (PL) distribution within the area of 32 by 32 pixels containing one NV center. b Cross-section of the PL image (blue) along the dashed line in a and Gaussian fitting (dashed, black) with 310 nm FWHM. c-e Results of L-M based antibunching SRM based on a 7 s HBT measurement: distribution of retrieved \({1-g}^{\left(2\right)}\left(x,y,0\right)\) (c); reconstructed image (d); and cross-section of the \({G}^{\left(2\right)}\left(x,y\right)\) distribution of the L-M fitting-based image (blue) and Gaussian fitting (dashed, black) with 310 nm FWHM (e). f-h Results of ML-assisted antibunching SRM: \({1-g}^{\left(2\right)}\left(x,y,0\right)\) map (f); reconstructed image (g); and corresponding cross-section of the intensity distribution of the reconstructed image (blue) and Gaussian fitting (dashed, black) with 219 nm FWHM (h).

The cross-section of the diffraction-limited image, taken along the blue dashed line (Fig. 3a), is shown in Fig. 3b. Gaussian fitting of the intensity distribution yields a full width at half maximum (FWHM \(=2\sqrt{2\ln 2}\,\sigma\)) of 310 nm. We therefore treat the PSF of the optical microscope as a Gaussian distribution with a FWHM of ~300 nm (see Supplementary Figure S2 for setup PSF characterization). By L-M fitting the 5 s sparse histograms of each pixel, the \({g}^{(2)}(x,y,0)\) map is retrieved. Due to the sparsity of the HBT histograms, the L-M fitting expectedly leads to a noisy reconstruction of the \({g}^{(2)}(x,y,0)\) distribution (Fig. 3c). Figure 3d shows the corresponding reconstructed image of \(G^{(2)}(x,y)\) (Eq. (2)). The cross-section of the obtained image and the corresponding fit with the same \(\sigma\) value as for the original PL image are shown in Fig. 3e. Here we can see that the \({g}^{(2)}(x,y,0)\) map obtained via L-M fitting leads to a noisy, blurred image without any gain in spatial resolution, which is a direct consequence of the inaccurate retrieval of \({g}^{(2)}(x,y,0)\). In contrast, the CNN-based antibunching SRM ensures the expected \(\sqrt{2}\) gain in resolution on the sparse 7 s HBT scan. Figure 3f, g shows the \({g}^{(2)}(x,y,0)\) distribution retrieved using the pre-trained CNN (f) and the corresponding super-resolved image (g). Here, we can see that the ML-based framework ensures precise reconstruction of the \({g}^{(2)}(x,y,0)\) map and, as a result, achieves a \(\sqrt{2}\) gain in the spatial resolution of the reconstructed image. Gaussian fitting of the cross-section of the resolved image shows that the ML-assisted approach yields a FWHM of 219 nm, which corresponds to \(\sigma_{\mathrm{CNN}}=\sigma/\sqrt{2}\) (Fig. 3h).
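The FWHM values quoted here and below can be extracted by fitting a Gaussian to an image cross-section, as in the following sketch (a hypothetical helper with assumed initial guesses, not the authors' code):

```python
# Illustrative Gaussian fit of an image cross-section, converting the fitted width to
# a FWHM via FWHM = 2*sqrt(2*ln 2)*sigma. Initial guesses are assumptions.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, x0, sigma, offset):
    return amplitude * np.exp(-((x - x0) ** 2) / (2.0 * sigma**2)) + offset

def fit_fwhm(position_nm, intensity):
    p0 = [intensity.max() - intensity.min(),   # amplitude guess
          position_nm[np.argmax(intensity)],   # center guess
          100.0,                               # sigma guess (nm)
          intensity.min()]                     # offset guess
    popt, _ = curve_fit(gaussian, position_nm, intensity, p0=p0)
    return 2.0 * np.sqrt(2.0 * np.log(2.0)) * abs(popt[2])
```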

Up until now, we have considered an acquisition time of 7 s per pixel. However, the robustness of the regression model indicates that the developed approach can be efficiently applied to sparser datasets. Figure 4a–c shows the reconstructed images based on 5 s, 6 s, and 7 s HBT scans, respectively, and Fig. 4d compares their cross-sections, which appear stable against the reduction of the acquisition time. It is worth noting that the fitting-based approach requires at least 1 min of HBT measurement per pixel for precise retrieval of the \({g}^{(2)}(0)\) values, as observed during the dataset collection process described above. This time requirement depends strongly on the properties of the single-photon emitter, e.g., quantum purity, lifetime, and emission rate, and can be considerably longer for emitters with low emission rates. Hence, the developed ML-assisted antibunching approach ensures a speed-up of up to 12 times (1 min versus 5 s per pixel) compared with the fitting-based approach.

Fig. 4: Robustness of the machine learning assisted SRM against the reduction of acquisition time.

a-c Resolved images obtained by applying ML-assisted antibunching SRM to 5 s (a), 6 s (b), and 7 s (c) sparse HBT scans. The blue dashed line shows the cross-section line. d Comparison of the intensity cross-sections for the three cases.

The developed ML-assisted SRM is also capable of resolving closely spaced quantum emitters (Fig. 5). Figure 5a–c shows the PL distribution, the CNN-retrieved \({g}^{(2)}(x,y,0)\) map, and the resolved image of two NV centers separated by ~600 nm. By comparing the original PL distribution and the resolved image, the expected \(\sqrt{2}\) improvement in the spatial resolution is observed. By performing Gaussian fitting of the cross-section (taken along the dashed line in Fig. 5a), one can retrieve the FWHM values of each of the lobes, which are equal to ~465 nm (Fig. 5d). By performing the same fitting on the resolved image, the \(\sqrt{2}\) narrowing of the emission features (FWHM = 330 nm) by the CNN-based approach is confirmed. We also use a Monte-Carlo simulation (parameters are tabulated in Table S2) to confirm the enhanced resolution of two closely spaced emitters (Figures S3 and S4), as well as of three closely spaced emitters (Figure S5). See Supplementary Section 3 for further details on the simulation and its results.

Fig. 5: Machine learning assisted antibunching SRM resolves two closely spaced emitters.

a-c Results of ML-assisted antibunching SRM performed on a 7 s HBT measurement: PL image (a); distribution of retrieved \({1-g}^{\left(2\right)}\left(x,y,0\right)\) (b); and the reconstructed image (c). d Cross-section of the intensity distribution of the PL image (blue) and Gaussian fitting (dashed, black). The blue dashed line in a shows the cross-section line. e Cross-section of the intensity distribution of the resolved image (blue) and Gaussian fitting (dashed, black).

Discussion

The proposed ML-assisted regression technique allows for a significant speed-up of quantum SRM imaging. Specifically, the performance of the CNN-assisted SRM is demonstrated on nanodiamonds that contain single NV centers as quantum emitters. In the microscopy of quantum light sources, the developed ML-assisted super-resolution framework ensures a 12-fold speed-up compared to the conventional L-M fitting-based approach for retrieving the second-order autocorrelation value at zero delay, \({g}^{\left(2\right)}\left(0\right)\). The proposed approach can be extended to rapid measurements of higher-order autocorrelation functions, which opens the way to the practical realization of scalable quantum super-resolution imaging systems. It is worth noting that the signal-to-noise ratio of quantum image scanning microscopy decreases with increasing marker density in the case of higher-order autocorrelation measurements, which makes it difficult to extend quantum image scanning microscopy to higher-order correlations. While the approach developed in this work opens the way to speeding up antibunching-based microscopy in general, it is an open question whether ML-assisted techniques can be used to overcome the aforementioned issue. This interesting avenue could be the subject of future studies.

Methods

Experimental setup

The sample with nanodiamonds containing NV centers was prepared by cleaning a coverslip glass substrate with solvents, treating it with ultraviolet radiation for an hour, and drying a 5 μL droplet of a sonicated nanodiamond solution (20 nm average size, Adamas Nano) on the coverslip surface. Optical characterization was performed using a custom-made scanning confocal microscope with a 100 μm pinhole based on a commercial inverted microscope body (Nikon Ti-U). To locate the emitters, objective scanning was performed using a P-561 piezo stage driven by an E-712 controller (Physik Instrumente). Immersion microscopy was performed using an oil objective with a numerical aperture (NA) of 1.49. Optical pumping in the CW experiments was provided by a continuous-wave 532 nm laser (RGB Photonics). A power on the order of 1 mW (measured before entering the objective) was used to pump the NV centers. This excitation power is above the NV center saturation power and was used in order to obtain sufficient photon counts in the 1 s autocorrelation histograms for the machine learning algorithm to extract g(2)(0) values. The excitation beam was reflected off a 550 nm long-pass dichroic mirror (DMLP550L, Thorlabs), and a 550 nm long-pass filter (FEL0550, Thorlabs) was used to filter out the remaining pump light. Two avalanche detectors with a 30 ps time resolution and 35% quantum efficiency at 650 nm (PDM, Micro-Photon Devices) were used for single-photon autocorrelation measurements. Time-correlated photon counting was performed by a “start-stop” acquisition card with a 4 ps internal jitter (SPC-150, Becker & Hickl). The total histogram span was set to 500 ns, and the co-detection events were collected into 215 time bins.

Training dataset

In order to train the regression network, autocorrelation measurements were performed on a set of 40 emitters. For each emitter, autocorrelation datasets were acquired in a series of 1-second-long intervals. These “sparse” datasets acquired for each emitter were combined into a “full” dataset, from which the \({g}^{\left(2\right)}\left(0\right)\) value was obtained using the L-M fitting algorithm. Autocorrelation measurements on each emitter were performed by repeating the 1 s acquisitions until about 300 co-detection events per bin were accumulated in total. To extract an estimate of the autocorrelation at zero delay, the complete autocorrelation histograms were fitted according to a three-level emitter model (Eq. 3) using the Levenberg-Marquardt (L-M) method.

L-M fitting uses non-linear least squares to fit the model function \({g}^{\left(2\right)}\left(\tau \right)\) of Eq. (3) to the data. The main goal of the fit is to determine the parameters (\({a}_{1},{a}_{2},{t}_{1},{t}_{2}\)) that minimize the sum of squared differences between the data and the model. It is worth noting that the position of the zero-delay time bin is assumed to be known.
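A minimal sketch of such a fit, using scipy's Levenberg-Marquardt backend, is given below. The background-normalization window, the initial parameter guesses, and the use of \(|\tau|\) to cover both sides of the known zero-delay bin are assumptions for illustration, not the authors' implementation.

```python
# Illustrative L-M fit of Eq. (3) to a normalized HBT histogram. The normalization
# window and initial parameter guesses are assumed values.
import numpy as np
from scipy.optimize import curve_fit

def g2_model(tau_ns, a1, a2, t1, t2):
    # Eq. (3), written with |tau| so one expression covers both sides of zero delay
    return 1.0 - a1 * np.exp(-np.abs(tau_ns) / t1) + a2 * np.exp(-np.abs(tau_ns) / t2)

def fit_g2_zero(tau_ns, counts):
    # Normalize so that the uncorrelated (long-delay) level is ~1
    baseline = np.mean(counts[np.abs(tau_ns) > 200.0])   # assumed window, in ns
    g2_data = counts / baseline
    p0 = [0.8, 0.2, 20.0, 100.0]                          # assumed initial guesses (ns)
    popt, _ = curve_fit(g2_model, tau_ns, g2_data, p0=p0, method="lm")
    a1, a2, _, _ = popt
    return 1.0 - a1 + a2                                  # Eq. (3) evaluated at tau = 0
```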

The training dataset at this point consisted of 9416 sparse HBT histograms. The emitters in the dataset covered a broad range of \({g}^{\left(2\right)}\left(0\right)\) values, from 0.1 to 0.884, while the total number of counts in the 1 s HBT histograms was in the range of 1.2 to 61.

The 1 s HBT histograms were used for data augmentation. Specifically, we formed all the possible combinations of 1 to 10 of these 1 s histograms to obtain training data that emulated histograms with acquisition times from 1 s to 10 s. This was done via bin-wise summation of the histograms. This data augmentation process assumed that the emission is a process with no memory over times exceeding 1 s and allowed us to significantly extend the training dataset.
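The augmentation step can be sketched as follows (an illustrative helper, not the authors' code; for emitters with hundreds of 1 s histograms the full combinatorial set becomes very large, so in practice one would likely subsample the combinations):

```python
# Illustrative data augmentation: bin-wise summation of combinations of 1 s HBT
# histograms to emulate 1-10 s acquisitions, each labeled with the ground-truth
# g2(0) obtained from the L-M fit of the emitter's full histogram.
import numpy as np
from itertools import combinations

def augment_histograms(one_second_hists, g2_label, max_combo=10):
    """one_second_hists: array of shape (n_acquisitions, n_bins)."""
    samples, labels = [], []
    for k in range(1, max_combo + 1):
        for idx in combinations(range(len(one_second_hists)), k):
            samples.append(one_second_hists[list(idx)].sum(axis=0))  # bin-wise sum
            labels.append(g2_label)
    return np.asarray(samples), np.asarray(labels)
```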