Introduction

Single-photon avalanche diode (SPAD) arrays have received wide attention due to their excellent single-photon sensitivity1,2,3,4. Such single-photon imaging sensors have been widely applied in various fields such as fluorescence lifetime imaging5, fluorescence fluctuation spectroscopy6, time-of-flight imaging7,8,9, quantum communication and computing10,11, and so on12,13. Compared with EMCCD and sCMOS cameras, which also offer high detection sensitivity, SPAD arrays acquire photon-level light signals at a low noise level and perform direct photon-to-digital conversion, which effectively eliminates readout noise and increases readout speed14.

While early SPAD arrays were limited in imaging resolution by low fill factors, because each pixel requires a large guard ring to prevent premature edge breakdown during the electron avalanche15, recent advances have achieved fill factors approaching 100%16. Nevertheless, compared with the mature CMOS manufacturing process, with which Kitamura et al. reported a 33-megapixel standard CMOS imaging sensor (7680 × 4320 pixels) in 201217, the SPAD array size in the academic literature was only 400 × 400 in 201918 and 1024 × 1000 in 202015. Moreover, the commercial SPAD products available on the market offer only thousands of pixels19 and cost more than twenty thousand euros. In this regard, the average per-pixel cost of SPAD is 8 orders of magnitude higher than that of a commercial CMOS sensor. To improve SPAD imaging resolution, Sun et al. reported an optically coded super-resolution technique20 that requires a high-SNR input with 5.2 ms integration time, several orders of magnitude longer than practical single-photon imaging applications with sub-nanosecond exposure time7,8,9. The underlying limitation originates from the employed single-source Poisson noise model21,22, which deviates from the complex real SPAD noise containing a variety of sources with different statistics, such as crosstalk, dark count rate, and so on1 (as shown in Fig. 1a). Mora-Martín et al. presented a technique that utilizes synthetic SPAD depth sequences to train a 3D convolutional neural network (CNN) for denoising and upscaling (4×)23. However, this technique also relies on single-source Poisson and Gaussian noise statistics, which leads to degraded imaging quality because multiple noise sources are not considered (as validated in the following experiments shown in Fig. 2b).

Fig. 1: Illustration of the reported large-scale single-photon imaging technique.
figure 1

a The multi-source physical noise model of single-photon avalanche diode (SPAD) arrays, which consists of shot noise from photon incidence, fixed-pattern noise from the SPAD array’s photon absorption, dark count rate, afterpulsing and crosstalk noise from blind electron avalanche, and deadtime noise from the quenching circuit. b Visualization of the two collected SPAD image datasets. One contains acquired single-photon images (64 × 32 pixels, 90 scenes) under 10 different bit depths and 3 different illumination fluxes; this dataset was used to calibrate the noise model parameters. The other dataset was digitally synthesized and contains 2.6 million images of 17,250 scenes (image pairs at 5 different resolutions up to megapixels, 10 different bit depths, and 3 different illumination fluxes). The large-scale synthetic dataset was used to train neural networks for single-photon enhancement. c Exemplar super-resolution SPAD imaging results, including a plaster statue and a United States Air Force (USAF) resolution target. The directly acquired SPAD images (64 × 32 pixels, 6 bits) were input into the enhancing neural network, which produces high-fidelity super-resolution images (256 × 128 pixels) with fine details and a smooth background.

Fig. 2: Experiment results of large-scale single-photon imaging.
figure 2

a The optical setup used to acquire single-photon images of various targets. b Comparison of enhanced single-photon imaging results using different noise models. We used different noise models to train the same neural network, and input acquired data into the different trained models for enhancement. The ground-truth reference image was accumulated from 60,000 binary single-photon images. c The enhanced single-photon images of different macro targets (including a plaster model, a cat toy, and different printed shapes) using the reported technique (with 6-bit inputs). d The single-photon microscopy setup and the corresponding enhancement results using the reported technique. The setup was built by mounting the SPAD camera on an off-the-shelf microscope to acquire 6-bit single-photon images of microscopic samples. The enhanced single-photon results of a printed circuit board and a USAF resolution target are presented on the right side.

In the era of deep learning, another obstacle preventing super-resolution SPAD imaging is the lack of datasets containing image pairs of low-resolution single-photon acquisitions and high-resolution ground truth. Although one can accumulate multiple subframes to generate noise-free images, the pixel resolution remains low and far from meeting the Nyquist sampling requirement. One way to acquire high-resolution images is to place another CMOS or CCD camera to image the same target, which however introduces an additional laborious registration workload24. Besides the extremely low resolution and complex noise model, the single-photon image enhancement task faces a further challenge from its low bit depth, which distinguishes it from conventional image enhancement tasks on CMOS- or CCD-acquired images25,26,27,28. In such a case, the local features employed by either conventional optimization techniques29,30 or convolutional neural network techniques26,28 are limited in predicting neighboring SPAD pixels, which may vary drastically (as seen below).

In this work, to tackle the great challenge of high-fidelity super-resolution SPAD imaging with low bit depth, low resolution and heavy noise in photon-limited scenarios, we first established a real-world physical noise model of SPAD arrays. As shown in Fig. 1a, the real physical noise sources consist of shot noise from photon incidence, fixed-pattern noise from SPAD array’s photon absorption, dark count rate, afterpulsing and crosstalk noise from blind electron avalanche, and deadtime noise from the quenching circuit. To calibrate the parameters of such a complex multi-source noise model, we collected a real-shot SPAD image dataset containing 2790 images in total, each with 64 × 32 pixels. Among these images, there are 90 scenes, each with 10 different bit depths and 3 different illumination fluxes for studying different application conditions, as shown in Fig. 1b. With the calibrated physical noise model under different illumination and acquisition settings, we further employed off-the-shelf public high-resolution images (collected from the PASCAL VOC200731 and VOC201232 datasets) to digitally synthesize a large-scale realistic single-photon image dataset containing 2.6 million image pairs.

Driven by the single-photon image dataset, we designed a gated fusion transformer network for single-photon super-resolution enhancement. The transformer framework33,34 has recently attracted increasing attention and produced impressive performance on multiple vision tasks34,35,36. As presented below, the reported network mainly consists of three modules: the shallow feature extraction module, the deep feature fusion module, and the image reconstruction module. The framework gradually extracts multi-level frequency features of input images and performs adaptive weighted fusion to reconstruct high-quality images. The gated fusion transformer network was trained using the above large-scale single-photon image dataset and tested on various SPAD images. We built four experimental setups to acquire various macroscopic and microscopic images: two acquired target images for dataset collection and direct enhancement validation, and the other two were applied for microfluidic inspection and Fourier ptychographic imaging. A series of experiments validates the technique’s state-of-the-art super-resolution single-photon imaging performance.

Results

Collected dataset for noise model evaluation

We first built an optical setup to acquire SPAD images of various targets, as shown in Fig. 2a. The setup was built on an optical table in a darkroom and consists of a 488 nm laser (Coherent Sapphire SF 488-100), a neutral density filter, a set of lenses, and a SPAD array camera (MPD-SPC3). The target is illuminated by the laser, and the reflected light is focused onto the SPAD array. Using this setup, we collected a real-acquired SPAD image dataset containing 90 targets in 9 classes (element, geometry, sculpture, rock, plant, animal, car, food, and lab tool), as illustrated in Fig. 1b. For each target, we set the laser power to 10 mW, 20 mW, and 40 mW, respectively. Under each illumination, the integration time of single-photon detection is \(0.02\,\mu s\), and we acquired 1024 images for each target to synthesize multiple bit depths ranging from 1 to 10. Accordingly, the frame time is \(20\,\mu s\). The captured images are at a resolution of 32 × 64 pixels. These images were used to calibrate the multi-source noise model parameters in Eq. (1). More parameter details are provided in Supplementary Note 1.

To validate the fidelity of the reported multi-source noise model, we used images synthesized with different noise models to train different neural networks under the same transformer framework (without super-resolution at this stage), and compared their enhanced results with the reference ground-truth (GT) image accumulated from 60,000 SPAD subframes. As shown in Fig. 2b, the enhanced images using the reported multi-source model maintain the highest fidelity and finest details compared to ground truth, while the other models produce various aberrations with residual noise that is hard to remove. Besides, we note that the GT image exhibits a smooth background and fine target details, which alleviates the concern about coherent speckle noise from the laser illumination. The quantitative comparison using the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics is presented on the right side of Fig. 2b, and indicates that our technique produces more than 5 dB higher PSNR and 0.3 higher SSIM than the other models. Such superiority validates that the reported multi-source physical noise model and calibration strategy enable an accurate description of the SPAD photon flow. Details of the collected imaging datasets are provided in Supplementary Note 2.

Large-scale dataset for super-resolution SPAD imaging

Despite the above collected single-photon image dataset, further super-resolution enhancement cannot be achieved due to the lack of high-fidelity, high-resolution ground-truth pairs. To tackle this challenge, we further synthesized a large-scale single-photon image dataset using the public copyright-free PASCAL VOC2007 and VOC2012 images (24 bits). Specifically, the high-resolution images were cropped to 512 × 512 pixels and downsampled by factors of 2, 4, 8, and 16 in each dimension. Then, to simulate acquired SPAD images, measurement noise for different illumination and bit depth conditions was added to the 32 × 32 images using the above calibrated noise parameters. This produces 17,250 image pairs of low-resolution SPAD images (32 × 32 pixels, 4 different bit depths and 4 different illumination fluxes) and the corresponding high-resolution ground truth. The dataset structure is illustrated in Fig. 1b.
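As a rough sketch of this synthesis pipeline (illustrative only: the block-average downsampling, the use of the normalized intensity as the expected photon count, and the plain Bernoulli shot-noise binarization are simplifications of this sketch, whereas the actual dataset injects the full multi-source noise of the Methods section at the same step; the file path, scale, and frame count are placeholders):

```python
import numpy as np
from PIL import Image

def synthesize_pair(path, scale=16, n_frames=64, seed=0):
    """Crop one VOC image to a 512 x 512 high-resolution ground truth, downsample it
    by `scale`, and emulate an acquisition of n_frames binary single-photon frames."""
    rng = np.random.default_rng(seed)
    img = Image.open(path).convert("L")
    w, h = img.size
    left, top = (w - 512) // 2, (h - 512) // 2
    hr = np.asarray(img.crop((left, top, left + 512, top + 512)), dtype=np.float32) / 255.0
    s = 512 // scale
    lr = hr.reshape(s, scale, s, scale).mean(axis=(1, 3))   # low-resolution latent image
    p_detect = 1.0 - np.exp(-lr)                            # at most one count per frame
    frames = rng.random((n_frames, s, s)) < p_detect        # binary single-photon frames
    return frames.sum(axis=0).astype(np.uint16), hr         # (noisy LR counts, HR ground truth)
```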

We trained the reported gated fusion transformer neural network (detailed in the Methods section) on the synthesized image dataset under different bit depths, illumination conditions, and super-resolution scales, and tested the network and state-of-the-art enhancement techniques on the test dataset of 505 synthetic noisy images. Each test image is composed of different numbers of subframes ranging from 16 to 1024 (corresponding to bit depths from 4 to 10), where each pixel is 1 or 0 indicating whether a photon is detected. The competing methods fall into two categories: popular optimization-based denoising techniques (BM3D and VST) followed by bicubic interpolation (“BM3D+bicubic” and “VST+bicubic”), and state-of-the-art neural networks (U-net, MemNet and SwinIR). These networks were also trained to convergence on our synthesized dataset for a fair comparison.

We list a quantitative comparison of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) in Table 1. The reported technique achieves the best performance for all input settings. In particular, compared to conventional enhancement methods such as BM3D, our method produces more than 5 dB higher PSNR and more than 0.2 higher SSIM. Compared to the competing neural networks, the reported transformer network maintains as much as a 2 dB advantage in PSNR. The results further validate the effectiveness of our technique in attenuating measurement noise and retrieving fine details.

Table 1 Quantitative enhancement comparison among different enhancing methods and networks under different bit depth (subframe), illumination flux (laser power), and super-resolution scale

Further, we fixed the illumination flux at 40 mW and the super-resolution scale at 4, and studied these methods’ enhancement performance under different bit depths, ranging from an extremely low 16 subframes to 1024 subframes. The quantitative comparison results are presented in Table 2, from which we can see that the reported technique maintains excellent robustness across different subframe numbers. Its PSNR degrades by less than 2 dB as the subframe number decreases from 1024 to 16, whereas the conventional BM3D+bicubic technique suffers a PSNR degradation of more than 7 dB.

Table 2 Network evaluation on different bit depth, with the illumination flux fixed to be 40 mW and the super-resolution scale fixed to be 4

Enhancement on real-acquired data

We applied the trained neural network to real-acquired data for experimental validation. The acquired images were input into the enhancement network with a super-resolution scale of 4 (output 128 × 256 pixels). The targets include a plaster model, a cat toy, and different printed shapes. The enhanced results are presented in Fig. 2c. The reported technique produces high-fidelity super-resolution results and generalizes well to different targets. Specifically, for the plaster model (target #1), as shown in the close-ups, the nose and mouth details are completely missing in the input image, while they are clearly retrieved in the enhanced results. A similar observation holds for the cat toy (target #2). The comparison results for the different shape targets reveal that the reported technique is able to produce a smooth background while reconstructing fine structural details. We provide more enhancement results on real-acquired images in Supplementary Note 5. The experiment settings and specifications are provided in Supplementary Note 11 for easy reference.

To further validate the enhancement ability for microscopic imaging, we built a second, microscopy setup by mounting the SPAD array camera (MPD-SPC3) on a commercial microscope (Olympus BX53). The hardware integration time was still set to \(0.02\,\mu s\), ensuring that each frame’s maximum pixel value is 1 to generate a 1-bit image; this allows us to synthesize input images at different bit depths. The numerical aperture of the microscope objective is 0.25. The micro targets include a circuit board and a United States Air Force (USAF) resolution test target (R3L3S1N, Thorlabs). The enhanced results are presented in Fig. 2d. For the circuit board, we can clearly observe the resistor components and connecting wires in the enhanced super-resolution images, while such details are buried in heavy noise in the low-resolution, low-bit-depth acquired images. For the resolution test target, it is hard to discriminate even element 1 of group 3 in the direct SPAD acquisition. In comparison, the enhanced result resolves element 6 of group 3 (Fig. 1c), with a smooth background and clear details.

Experiment on microfluidic inspection

Microfluidic techniques manipulate and process small amounts of fluids using microchannels of micrometer width37. Inspection of microfluidic chips is important for monitoring fluid flow. We built a SPAD-based microfluidic inspection setup, as shown in Fig. 3a. A custom-made microfluidic device was constructed using two syringe pumps (BT01100, Longer Pump Limited Company) and a microfluidic chip (Wenhao Microfluidic Technology Limited Company). The chip and pumps were connected by polytetrafluoroethylene (PTFE) tubing with an inner diameter of 0.6 mm and an outer diameter of 1.6 mm. An aqueous lemon yellow solution (2 mg/mL, Meryer Chemical Reagent Company) was used as the dispersed phase, and liquid paraffin (Macklin Chemical Reagent Company) was used as the continuous phase. The former was pumped into the middle channel of the chip, and the latter flowed into the chip from the side channels. After the two liquids met in the middle channel, the lemon yellow solution was sheared by the continuous phase and formed micro-droplets in the channel. The flow rates of the continuous and dispersed phases were 1 mL/h and 0.1 mL/h, respectively. The micro-droplets formed by the lemon yellow solution were driven forward by the flow of the paraffin.

Fig. 3: Microfluidic inspection experiment using single-photon detection.
figure 3

a The microfluidic inspection setup used to monitor fluid flow in the microfluidic chip. b The acquired single-photon images (8 bits) at different times and the corresponding enhancement results.

The micro-droplet formation process was recorded using the SPAD camera, as presented in the upper row of Fig. 3b. The images were accumulated from 1024 binary single-photon images, corresponding to a 10-bit depth. However, due to the extremely low resolution and heavy noise, it is hard to discern the fluid flow process or even the microchannel structure from the acquired single-photon images; the channel boundaries are barely visible in the measurements. The close-ups at the bottom left of each image show only a clutter of pixels rather than the micro-droplet structure. In comparison, applying the reported enhancement technique to the acquired single-photon images yields the images displayed in the bottom row of Fig. 3b, which reveal far more details of the micro-droplets and microchannels. The evolution of micro-droplet width can be conveniently measured from the super-resolution images, which is impossible with the low-resolution single-photon images whose pixel size is as large as \(30\,\mu m\).

Using the microfluidic inspection setup, we can clearly study the fluid flow and the micro-droplet formation process from the frame sequence. As shown in the four frames in Fig. 3b, with the gradual increase of the continuous-phase shear force in the microfluidic channel, the dispersed-phase fluid gradually becomes thinner until it fractures. The phenomenon arises because, in a microfluidic chip with a flow-convergence structure, two immiscible liquids coaxially enter different microchannels of the chip. The continuous-phase liquid flows in from the microchannels on both sides of the dispersed-phase liquid and exerts an extrusion effect on it. At the junction of the channels, the dispersed phase is sheared by the continuous phase and extends into a “finger-like” or “nozzle-like” shape. Afterward, under the action of the liquid flow field, the dispersed phase is squeezed or sheared, breaking into uniform droplets. More microfluidic inspection experiment results are provided in Supplementary Note 6.

Experiment on Fourier ptychographic microscopy

Fourier ptychographic microscopy (FPM) is a synthetic coherent imaging technique, providing high-throughput amplitude and quantitative phase images38,39. It works by acquiring multiple images under different illumination angles and feeding these images into phase retrieval algorithms for simultaneous amplitude and phase reconstruction. In our recent work40, we successfully implemented FPM with a SPAD array, realizing coherent imaging with higher resolution and larger dynamic range from single-photon inputs. The experimental setup is presented in Fig. 4a, where a 0.1-NA Olympus plan achromatic objective with a 600 mm focal-length tube lens was used for imaging, providing ×18.5 magnification for Nyquist sampling. An Adafruit P4 light-emitting diode (LED) array was placed 84 mm from the sample to provide different illumination angles, of which we used 9 × 9 LEDs at 632 nm. The illumination NA was 0.18.

Fig. 4: Fourier ptychographic microscopy (FPM) experiment using single-photon detection.
figure 4

a The proof-of-concept FPM setup using a SPAD array camera with a light-emitting diode (LED) array. b, c The enhancement results using synthetic data and real-acquired FPM data, respectively. The left column shows the images (32 × 32 pixels) acquired under vertical illumination (7 bits under 3 μs exposure time), and the middle column presents the FPM reconstructed images (each using 81 acquired images to recover 192 × 192 pixels). The right column shows the further-enhanced results (384 × 384 pixels) obtained using the reported technique, with the FPM reconstruction as input.

Exemplar acquired images of an onion cell sample and a blood cell sample (simulated data), and of a USAF resolution test target and a plant sample (real-acquired data), are presented in the left columns of Fig. 4b, c, respectively. The acquired images were all taken under vertical illumination, with a bit depth of 9 (accumulated from 512 binary single-photon images). We first applied the standard FPM reconstruction algorithm40 to stitch together the acquired SPAD images from different illumination angles; the FPM reconstruction results are shown in the middle column. Although FPM effectively enhances imaging resolution, obvious noise and aberrations introduced by the heavy SPAD noise remain in the reconstruction. We then applied the reported technique to the FPM reconstruction, and the enhanced results are presented in the right column. In the improved results, the background is smoothed while the super-resolution details are further enhanced. The experiment further validates the superior enhancement ability of the reported technique on different imaging modalities.

We also noticed that the enhancement of our technique on experimental data (Fig. 4c) is not as pronounced as that on simulated data (Fig. 4b). We attribute this to the fact that, in practical FPM experiments, besides the noise from the detection side, there also exist noise and aberrations from the illumination side and the optical path. Especially for the FPM imaging modality that requires multiple acquisitions, there may be additional aberrations such as non-uniform light flux and LED misalignment39. Investigating these factors in depth is our future work. One possible solution is to incorporate the reported enhancement technique into the FPM optimization iterations, which may help reduce error accumulation and improve robustness.

Experiment on high-speed scenarios

SPAD imaging holds unique advantages in capturing ultra-fast scenes under limited illumination conditions. Our SPAD arrays can achieve a minimum integration time of 20 ns to capture 1-bit images. We demonstrate the effectiveness of SPAD imaging for high-speed scenarios with microsecond-scale timing in Fig. 5.

Fig. 5: Single-photon imaging experiment of high-speed scenarios.
figure 5

a The single-photon imaging setup used to record the rotating fan. b The captured single-photon images at microsecond time intervals, along with their corresponding enhanced results. c The single-photon imaging setup used to capture the arc. d The obtained single-photon images of the arc at microsecond time intervals, and their corresponding enhancement results.

The first demonstration involves recording a rapidly rotating fan. In this experiment, we captured images of the spinning fan using the experimental setup shown in Fig. 5a, which consists of an objective lens, an LED light source (GCI-060401, 3 W electrical power), and the SPAD array. The integration time per frame was set to 1 μs. Noticeable noise and aberrations in the raw single-photon images, caused by heavy SPAD noise, make it difficult to distinguish the blade angle of the fast-rotating fan. We applied the reported technique to enhance the recorded SPAD images. The results show clear improvement over the raw recordings: the background appears smoother, and the super-resolution details are further enhanced. We measured the rotational speed to be 0.0107 rad/μs, approximately equivalent to 102,000 RPM (revolutions per minute). This measured speed is consistent with the manufacturer’s specification of about 100,000 revolutions per minute.
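For reference, the conversion from the measured angular velocity to revolutions per minute is a one-line unit conversion:

$$\omega =0.0107\ \mathrm{rad/\mu s}=1.07\times 10^{4}\ \mathrm{rad/s},\qquad \frac{60\,\omega }{2\pi }\approx 1.02\times 10^{5}\ \mathrm{RPM}.$$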

The second demonstration features an electrostatic ball, as presented in Fig. 5c. When activated, the circuit generates a high-frequency electric field that lights up the thin gas inside the sphere. A high-voltage alternating current is applied to the sphere via a central electrode. This energy ionizes the gases, producing positively charged ions and plasma. The high voltage from the electrode then creates an arc through the plasma to the edge of the sphere, causing it to glow. We recorded the glowing process at microsecond intervals to better understand the high-speed arc glowing process. From these observations, we noted that the duration of the arc glowing period is approximately \(4\,\mu s\), consistent with the statement in ref. 41. A high-speed single-photon imaging demonstration video is provided in Supplementary Movie 1.

Discussion

In this work, we introduced deep learning into single-photon detection and presented an enhancement technique for large-scale single-photon imaging. By tackling the challenges of single-photon enhancement with extremely low resolution, low bit depth, complex heavy noise, and a lack of image datasets, the reported technique realizes simultaneous single-photon denoising and super-resolution enhancement (up to \(4\times 4\) times). A series of experiments on both macro and micro setups validates the reported technique’s state-of-the-art large-scale single-photon imaging performance. Besides, we also built two application setups, microfluidic inspection and Fourier ptychographic microscopy, to further validate the strong adaptability of the reported technique to various imaging modalities.

The state-of-the-art large-scale single-photon imaging performance of the reported technique builds on several innovations, including the physical multi-source noise modeling of SPAD arrays, the construction of two single-photon image datasets (one for noise calibration and another for network training), and the design of a deep transformer neural network with a content-adaptive self-attention mechanism for single-photon enhancement. The reported noise model describes the entire photon flow of single-photon detection and considers the multiple noise sources that degrade single-photon imaging quality, including shot noise from photon incidence, fixed-pattern noise from the SPAD array’s photon response, dark count rate, afterpulsing and crosstalk noise from blind electron avalanche, and deadtime noise from the quenching circuit. The first single-photon image dataset of 3600 single-photon images was collected under different illumination fluxes and bit depths to calibrate the statistical parameters of the multi-source physical noise model. Employing the calibrated noise model, the second single-photon image dataset was synthesized, containing 1.1 million images over 17,250 scenes and providing image pairs of low-resolution single-photon images and corresponding high-resolution ground truth ranging from thousands of pixels to megapixels. This strategy saves great effort in collecting and registering paired data, and makes it possible to study latent signal features of single-photon data. The dataset was applied to train the transformer network for single-photon image enhancement, which provides a smooth background and fine details benefiting from its self-attention and gated fusion mechanisms.

In the workflow, the multi-source noise model parameters were calibrated using a set of images collected with a specific SPAD camera. Although the reported noise model is general and applicable to various single-photon detection schemes, the noise parameters of different SPAD arrays may deviate from each other, even for cameras of the same version. In this regard, automatic calibration of different SPAD arrays is worthy of further study. Besides, we consider that transfer learning can be adapted to other single-photon detection hardware and settings42, which would save laborious data collection and parameter calibration efforts for different experiments. Furthermore, large-model training has advanced rapidly for computer vision tasks43; generating synthetic datasets based on the reported physics model to train a large model for SPAD imaging may be an effective direction.

Besides direct enhanced imaging, single-photon detection has also been applied in multiple computational sensing modalities for broader applications, such as non-line-of-sight imaging44,45 and quantum key distribution46. In such schemes, we can integrate the reported enhancement technique with the sensing framework to achieve a global optimum47,48. The technique may work as an enhancing solver in the alternating optimization process to improve resolution and attenuate noise and aberrations47, preventing error accumulation during global optimization.

The photon detection efficiency of the SPAD array varies with wavelength. This property suggests two directions for further developing the reported technique. First, we can incorporate this wavelength-dependent noise into our multi-source physical noise model, which would compensate for the degradation introduced by the varying photon efficiency. Second, we can exploit the wavelength-dependent photon efficiency to retrieve spectral information of the incident light, opening research avenues for single-photon multispectral imaging that would benefit a range of single-photon applications49.

Besides single-photon detection sensitivity, another highlighted capability of SPAD is its picosecond-scale time gating, which records the time stamps of photon arrivals. In the present work, we did not consider time gating for applications such as 3D LiDAR imaging for autonomous vehicles23 and fluorescence lifetime imaging in microscopy. Before the reported method, commercially available 64 × 32 resolution SPAD arrays struggled to perform accurate and fast detection for LiDAR systems due to their limited resolution. Using the reported approach, we can help LiDAR systems obtain higher resolution and more precise scene information for enhanced detection. Furthermore, data collection in complex environments, such as rainy, foggy, or hazardous conditions, poses even greater challenges; tackling these scenarios is vital for safety-related tasks, and the reported technique presents an alternative solution for LiDAR systems confronted with such situations. In our future work, it is worth incorporating time gating into our enhancement framework to improve depth-selective resolution. Besides, incorporating time stamps into the multi-source physical noise model may further improve noise robustness and enhancement ability.

Methods

Noise modeling of SPAD arrays

As shown in Fig. 1a, the noise sources of SPAD arrays include signal-dependent shot noise from photon incidence, fixed-pattern noise from the SPAD array’s photon detection efficiency (PDE), dark count rate, afterpulsing and crosstalk noise from the electron avalanche, and deadtime noise from circuit quenching.

In the incidence process, where photons enter the photosensitive surface of the SPAD array, photon arrival is a stochastic process due to the quantum properties of light, known as shot noise \(N_{\mathrm{shot}}\). During a given integration time of the detector, the incident photon number \(k\) follows the Poisson distribution \(p(k)=\frac{e^{-\chi}\chi^{k}}{k!}\), where \(\chi\) is the expectation of incident photons per unit time (namely the latent light signal). Considering the unique avalanche circuit of SPAD arrays, which detects at most one photon during the integration time, the probability of registering a count is \(p(\mathrm{detect})=1-p(k=0)=1-e^{-\chi}\). In this regard, the shot noise is independent of the specific detector, and its parameters do not need to be calibrated. We included shot noise in our synthesized large-scale single-photon image dataset for network training.
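As a quick numerical sketch of this behavior (illustrative only; the value of \(\chi\) is arbitrary):

```python
import numpy as np

chi = 0.5                                  # expected incident photons per integration time
p_detect = 1.0 - np.exp(-chi)              # P(at least one photon) = 1 - p(k = 0)
rng = np.random.default_rng(0)
frames = rng.random(1024) < p_detect       # 1024 one-bit detections of a single pixel
print(p_detect, frames.mean())             # the empirical rate approaches 1 - e^(-chi)
```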

SPAD arrays absorb incident photons and generate electric charges through photoelectric conversion. In this process, each SPAD pixel responds with a certain probability, and an avalanche occurs with a certain probability to form a saturation current for a new photon count. The probability that the above process occurs is termed the photon detection efficiency (PDE), defined as the ratio between the number of output current pulses and the number of incoming photons. Since PDE is mainly related to the manufacturing process and hardware settings, it is approximated as a fixed probability distribution, meaning that the fixed-pattern noise (\(N_{\mathrm{fp}}\)) probability of each pixel is assumed to be constant once calibrated. In our workflow, we employed the PDE data provided by the manufacturer. During photoelectric conversion, SPAD arrays also excite electrons due to thermal effects, even in darkness. These electrons are amplified by the avalanche circuit to form a saturation current, resulting in dark count noise. The dark count rate noise \(N_{\mathrm{dcr}}\) follows the Poisson distribution50, with an expectation to be calibrated.

Following the photoelectric conversion, the electrons are amplified by an avalanche. During avalanche amplification, carriers passing through the P-N junction may be trapped by the conductor. In such a case, the trapped carrier may be released later and re-trigger the saturation current, so the detector counts an additional detection event with a certain probability, denoted as afterpulsing noise \(N_{\mathrm{ap}}\). Existing studies of afterpulsing noise51,52 consider its probability to be a power-law or exponential decay with time. Since we only consider its statistical effect related to the manufacture of each pixel unit, we simplify the model to a fixed probability distribution map, and assume that an event in the previous frame can only affect the current frame. Besides this electrical interference, the avalanche current in one pixel may trigger surrounding pixels through photons emitted by hot carriers. This process produces crosstalk noise \(N_{\mathrm{ct}}\), which is assumed to follow a fixed probability distribution similar to afterpulsing noise53.

To prevent a self-sustaining current from damaging the circuit, SPAD arrays implement a quenching operation that adjusts the bias of the P-N junction back to the breakdown voltage after each avalanche within the integration time. During this process, the pixel does not respond to additional incident photons. As a result, this process introduces deadtime noise, forcing the photon count to zero. In our experiments, we set the integration time to 20 ns, the dead time to 60 ns, and the readout time to 80 ns. In this way, we can eliminate the negative influence of the asynchronous deadtime noise of each pixel. Therefore, we do not include deadtime noise in the following modeling and calibration phases.

To sum up, the multi-source physical noise model of SPAD arrays is described as

$$N=N_{\mathrm{shot}}+N_{\mathrm{fp}}+N_{\mathrm{dcr}}+N_{\mathrm{ap}}+N_{\mathrm{ct}}+N_{\mathrm{dt}}.$$
(1)

As stated above, the negative influence of \(N_{\mathrm{shot}}\), \(N_{\mathrm{fp}}\) and \(N_{\mathrm{dt}}\) is handled by data-driven processing, by employing manufacturer parameters, or by adjusting hardware settings, respectively. The model parameters that require calibration are the expectation of \(N_{\mathrm{dcr}}\), the fixed probability \(p_{\mathrm{ap}}\) of \(N_{\mathrm{ap}}\), and the expectation of \(N_{\mathrm{ct}}\).
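Under the simplifying assumptions stated above, a per-frame synthesis of this multi-source model could look like the following sketch (the parameter maps `pde`, `dcr`, `p_ap` and `p_ct` stand in for the calibrated quantities, and the 4-connected crosstalk neighborhood is an assumption of this sketch; the actual dataset synthesis is detailed in Supplementary Note 7):

```python
import numpy as np

def simulate_spad_frames(chi, pde, dcr, p_ap, p_ct, n_frames, seed=0):
    """Generate binary SPAD frames from a latent image `chi` (expected photons per
    integration time) following Eq. (1); deadtime noise is omitted because the
    hardware timing is chosen to suppress it (see text)."""
    rng = np.random.default_rng(seed)
    h, w = chi.shape
    frames = np.zeros((n_frames, h, w), dtype=np.uint8)
    prev = np.zeros((h, w), dtype=bool)
    for n in range(n_frames):
        shot = rng.random((h, w)) < (1.0 - np.exp(-chi))   # shot noise (truncated Poisson)
        detected = shot & (rng.random((h, w)) < pde)       # fixed-pattern response (PDE)
        dark = rng.poisson(dcr) > 0                        # dark count rate
        after = prev & (rng.random((h, w)) < p_ap)         # afterpulsing from the previous frame
        cur = detected | dark | after
        # Crosstalk: an avalanche may trigger its 4-connected neighbours.
        neigh = np.zeros((h, w), dtype=bool)
        neigh[1:, :] |= cur[:-1, :]; neigh[:-1, :] |= cur[1:, :]
        neigh[:, 1:] |= cur[:, :-1]; neigh[:, :-1] |= cur[:, 1:]
        cross = neigh & (rng.random((h, w)) < p_ct)
        prev = cur | cross
        frames[n] = prev
    return frames
```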

Noise parameter calibration

We acquired 60,000 dark-field single-photon images (1 bit) without illumination. In such a case, we consider the shot noise \(N_{\mathrm{shot}} \approx 0\) and the fixed-pattern noise \(N_{\mathrm{fp}} \approx 0\). Besides, the deadtime noise is also assumed to be \(N_{\mathrm{dt}} \approx 0\), because it mainly describes the response to incident light in the quenching process. In this regard, the noise formation model of the dark-field images can be summarized as

$$I_{\mathrm{dark}}(x, y, n)=N_{\mathrm{dcr}}(x, y, n)+p_{\mathrm{ap}}(x, y)\,I_{\mathrm{dark}}(x, y, n-1)+p_{\mathrm{ct}}(x, y)\,U\!\left(I_{\mathrm{dark}}(x, y, n)\right),$$
(2)

where \(I_{\mathrm{dark}}(x, y, n)\) denotes the detected signal at pixel location \((x, y)\) in the \(n\)-th frame, \(N_{\mathrm{dcr}}(x, y, n)\) is the dark count rate noise map that follows the Poisson distribution, \(p_{\mathrm{ap}}\) is the fixed probability map of afterpulsing noise, \(p_{\mathrm{ct}}\) represents the probability map of crosstalk noise, and \(U(I_{\mathrm{dark}}(x, y, n))\) denotes the detected signals of the pixels neighboring \((x, y)\).

To calibrate the above noise parameters, we first extract the noise maps of the afterpulsing noise \(p_{\mathrm{ap}}(x, y)\) and the crosstalk noise \(p_{\mathrm{ct}}(x, y)\) according to their generation mechanisms. Specifically, because the number of detected photons in dark images is small, two consecutive detection events at the same pixel location are most likely afterpulsing noise. Under this assumption, we obtain the afterpulsing intensity at each pixel \((x, y)\) in the \(n\)-th frame as \(I_{\mathrm{ap}}(x, y, n)\), provided that the pixel reads 1 in both the previous and the current frame. Similarly, we extract the crosstalk noise map \(I_{\mathrm{ct}}(x, y, n)\), considering that neighboring pixels in the same frame both read 1 if a crosstalk event occurs.

As stated in the technical manual provided by the SPAD manufacturer, the typical probability of afterpulsing events is 1–2 orders of magnitude higher than that of crosstalk events. Therefore, we prioritize afterpulsing events during calibration: when both afterpulsing and crosstalk events are observed at a single pixel, we classify the event as afterpulsing rather than crosstalk. By doing so, we can effectively ignore the influence of crosstalk events while ensuring that afterpulsing events receive the necessary attention. This strategy avoids counting the same event twice and enables us to obtain accurate parameter values.

Using the sub-noise maps, the fixed probability of afterpulsing noise is calculated as

$$p_{\mathrm{ap}}(x, y)=\frac{\sum_{n} I_{\mathrm{ap}}(x, y, n)}{\sum_{n}\left(I_{\mathrm{dark}}(x, y, n)-I_{\mathrm{ap}}(x, y, n)-I_{\mathrm{ct}}(x, y, n)\right)},$$
(3)

and the crosstalk noise probability is

$$p_{\mathrm{ct}}(x, y)=\frac{\sum_{n} I_{\mathrm{ct}}(x, y, n)}{\sum_{n}\left(I_{\mathrm{dark}}(x, y, n)-I_{\mathrm{ap}}(x, y, n)-I_{\mathrm{ct}}(x, y, n)\right)}.$$
(4)

Following the calibration of the afterpulsing and crosstalk noise, the dark count rate noise is approximated as

$$N_{\mathrm{dcr}}(x, y)=\frac{\sum_{n}\left(I_{\mathrm{dark}}(x, y, n)-I_{\mathrm{ap}}(x, y, n)-I_{\mathrm{ct}}(x, y, n)\right)}{n_{\mathrm{total}}},$$
(5)

where \(n_{\mathrm{total}}\) is the number of acquired frames used for calibration. In our implementation, we set \(n_{\mathrm{total}}=60{,}000\). More details on generating the synthetic single-photon image dataset based on the calibrated noise model are presented in Supplementary Note 7. We further provide our SPAD array’s sampling/ISP scheme and simulation strategy in Supplementary Note 8.
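For illustration, a compact sketch of this calibration under the stated simplifications (afterpulsing identified from consecutive 1s at the same pixel, crosstalk from simultaneous 1s at 4-connected neighbors, afterpulsing taking priority) could be:

```python
import numpy as np

def calibrate_dark_noise(dark):
    """Estimate p_ap, p_ct and the dark count rate from a binary dark-frame stack
    `dark` of shape (n_total, H, W), following Eqs. (2)-(5)."""
    dark = dark.astype(bool)
    # Afterpulsing candidates: the same pixel reads 1 in two consecutive frames.
    i_ap = np.zeros_like(dark)
    i_ap[1:] = dark[1:] & dark[:-1]
    # Crosstalk candidates: a 4-connected neighbour reads 1 in the same frame,
    # excluding events already attributed to afterpulsing (afterpulsing has priority).
    neigh = np.zeros_like(dark)
    neigh[:, 1:, :] |= dark[:, :-1, :]; neigh[:, :-1, :] |= dark[:, 1:, :]
    neigh[:, :, 1:] |= dark[:, :, :-1]; neigh[:, :, :-1] |= dark[:, :, 1:]
    i_ct = dark & neigh & ~i_ap
    residual = (dark & ~i_ap & ~i_ct).sum(axis=0).astype(np.float64)  # dark counts only
    p_ap = i_ap.sum(axis=0) / np.maximum(residual, 1.0)               # Eq. (3)
    p_ct = i_ct.sum(axis=0) / np.maximum(residual, 1.0)               # Eq. (4)
    n_dcr = residual / dark.shape[0]                                  # Eq. (5)
    return p_ap, p_ct, n_dcr
```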

Network structure

We designed a gated fusion transformer network for single-photon enhancement. The network is inspired by the Swin Transformer structure30, whose shifted-window mechanism reduces the number of network parameters by more than one order of magnitude compared to conventional transformer networks34. On this basis, our network further introduces dense connections among different Swin Transformer layers (STL), enabling long-distance dependency modeling and full-frequency information retrieval. As shown in Fig. 6a, the network consists of three modules: the shallow feature extraction module, the deep feature fusion module, and the image reconstruction module. Compared with conventional convolutional networks, the gated fusion transformer network has the following advantages: (1) the content-based interactions between image content and attention weights can be interpreted as spatially varying convolution, and the shifted-window mechanism enables long-distance dependency modeling; (2) the gated fusion mechanism deeply mines medium-frequency and high-frequency information at different levels, preventing the loss of long-term memory as the network deepens and enhancing fine local details.

Shallow feature extraction

Given a low-quality image \(I_{\mathrm{LQ}}\in {\mathbb{R}}^{h\times w\times c_{\mathrm{in}}}\) (\(h\), \(w\) and \(c_{\mathrm{in}}\) are the image’s height, width and channel number), we use the shallow feature extractor \(H_{\mathrm{FE}}(\cdot)\) to extract its low-frequency features \(F_{0}\in {\mathbb{R}}^{h\times w\times c}\) as

$$F_{0}=H_{\mathrm{FE}}\left(I_{\mathrm{LQ}}\right).$$
(6)

This module is composed of convolution, batch normalization and activation layers (specific parameters and settings are presented in Supplementary Note 3). The convolution layers perform preliminary visual processing, providing a simple way to map the input image space to a higher-dimensional feature space. Besides feeding the following deep fusion operation, the output of this module is also linked to the final image reconstruction module, so that the target’s low-frequency information is well preserved in the final reconstruction.
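As an illustrative sketch only (the exact layer counts, kernel sizes, channel width and activation used in the network are given in Supplementary Note 3; the choices below are assumptions), the shallow extractor can be written as a small convolution–batch-normalization–activation stack:

```python
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """H_FE: map a c_in-channel low-quality image to c shallow (low-frequency) features."""
    def __init__(self, c_in=1, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c, kernel_size=3, padding=1),
            nn.BatchNorm2d(c),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, i_lq):        # i_lq: (B, c_in, h, w)
        return self.body(i_lq)      # F_0: (B, c, h, w)
```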

Deep feature fusion

Next, we use several densely connected Swin Transformer blocks (DCSTB) to extract different levels of medium-frequency and high-frequency features \(F_{i}\in {\mathbb{R}}^{h\times w\times c}\ (i=1,2,\ldots,n)\) from \(F_{0}\), denoted as

$$F_{i}=H_{\mathrm{DCSTB}}\left(F_{0}, F_{1},\ldots,F_{i-1}\right),$$
(7)

where \(H_{\mathrm{DCSTB}}\) represents the \(i\)-th DCSTB operation. Compared to conventional convolutional blocks such as those in U-net, the transformer-structured blocks realize spatially variable convolution, which helps the network pay more attention to regions of fine details and interest, as validated in Fig. 6b. Consequently, such blocks help recover more high-frequency information, which is beneficial for enhancing imaging resolution.

The last layer of the deep feature fusion module is the gated fusion layer, which fuses the outputs of the different DCSTB operations with adaptively learned weights. The process can be described as

$$F_{\mathrm{DF}}=H_{\mathrm{GATE}}\left(F_{1}, F_{2},\ldots,F_{n}\right)=w_{1}F_{1}+w_{2}F_{2}+\ldots+w_{n}F_{n},$$
(8)

where \(F_{\mathrm{DF}}\) represents the multi-level deep fusion features output by the gated fusion layer, and \(w_{1},\ldots,w_{n}\) are the fusion weights for the different feature levels, which are adaptively adjusted through backpropagation during network training. Such a module structure is conducive to the deep mining of different levels of medium-frequency and high-frequency information, which prevents the loss of long-term memory as the network deepens and enhances local details54, as validated in Fig. 6b.
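A minimal sketch of the gated fusion step of Eq. (8), with one learnable scalar weight per feature level updated by backpropagation (the actual gating parameterization of the reported network may differ), is:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse the DCSTB outputs F_1..F_n with adaptively learned weights (Eq. (8))."""
    def __init__(self, n_levels):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(n_levels) / n_levels)

    def forward(self, features):                 # list of n tensors, each (B, c, h, w)
        stacked = torch.stack(features, dim=0)   # (n, B, c, h, w)
        w = self.weights.view(-1, 1, 1, 1, 1)    # one weight per level, broadcast
        return (w * stacked).sum(dim=0)          # F_DF: (B, c, h, w)
```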

Fig. 6: Illustration of the reported deep transformer network for high-fidelity large-scale single-photon imaging.
figure 6

a The workflow and structure of the reported network. The STL refers to Swin Transformer layers, and the DCSTB refers to densely connected Swin Transformer blocks. b The enhancement comparison between the CNN-based U-net network and the reported transformer-based network. Benefiting from the transformer structure, our network realizes spatially variable convolution that helps pay more attention to regions of interest (ROI) and fine details, as the attention maps on the right side validate.

Image reconstruction

We retrieve high-quality single-photon images by aggregating shallow features and multi-level deep fusion features. The operation is described as

$$I_{\mathrm{R}}=H_{\mathrm{REC}}\left(F_{0}+F_{\mathrm{DF}}\right).$$
(9)

The shallow features \(F_{0}\) mainly carry low-frequency image information, while the multi-level deep fusion features \(F_{\mathrm{DF}}\) focus on recovering lost medium-frequency and high-frequency information. Benefiting from the long-term skip connections, the gated fusion transformer network can effectively transmit information of different frequencies to the final high-quality reconstruction. Different from the SwinIR network55, which lacks the densely connected gated fusion structure (as shown in Fig. 7a), the reported network helps preserve and compensate for the key medium-frequency and high-frequency feature signals, and enriches the image’s local details (as the intermediate feature maps in Fig. 7a validate). In addition, sub-pixel convolution is applied in the reconstruction block to further upsample the feature map for single-photon super-resolution. More network structure details are presented in Supplementary Note 10.
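A sketch of such a reconstruction head with sub-pixel (PixelShuffle) convolution, using illustrative channel counts and a ×4 scale, is:

```python
import torch.nn as nn

class Reconstruction(nn.Module):
    """H_REC: upsample the fused features by `scale` with sub-pixel convolution
    and project them back to the image space (Eq. (9))."""
    def __init__(self, c=64, c_out=1, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),              # (B, c*s*s, h, w) -> (B, c, h*s, w*s)
            nn.Conv2d(c, c_out, kernel_size=3, padding=1),
        )

    def forward(self, f0, f_df):                 # shallow and deep fused features
        return self.body(f0 + f_df)              # I_R: (B, c_out, h*scale, w*scale)
```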

Fig. 7: The enhancement comparison between the SwinIR network and the reported network.
figure 7

a The comparison of key feature modules and feature maps. The RSTB refers to residual Swin Transformer block. We introduce dense connections and gated fusion among different feature blocks, which helps preserve different levels of medium-frequency and high-frequency information and maintains long-term memory as the network deepens. The intermediate feature maps shown on the right side clearly validate this superiority. b The enhancement comparison of fine details. Benefiting from the deep feature fusion mechanism, the reported network is able to recover more details.

Loss function

We designed a hybrid loss function, consisting of an \(L_1\)-norm loss, a perceptual loss, and an SSIM loss, to train the gated fusion transformer network. The \(L_1\)-norm loss computes the absolute distance between two images as \(\mathrm{Loss}_{L_1}(I_{\mathrm{R}}, I_{\mathrm{G}})=\Vert I_{\mathrm{R}}-I_{\mathrm{G}}\Vert_{1}\), where \(I_{\mathrm{R}}\) represents the image reconstructed by the network, and \(I_{\mathrm{G}}\) denotes its ground truth. The perceptual loss is defined as the \(L_2\)-norm distance between the feature maps output by the pool-3 layer of a VGG19 network pretrained on ImageNet, \(\mathrm{Loss}_{\mathrm{PER}}(I_{\mathrm{R}}, I_{\mathrm{G}})=\Vert \varphi(I_{\mathrm{R}})-\varphi(I_{\mathrm{G}})\Vert_{2}\), where \(\varphi(\cdot)\) extracts the feature maps. The perceptual loss regulates different-frequency similarity in the feature space. The SSIM loss is calculated as \(\mathrm{Loss}_{\mathrm{SSIM}}=1-\mathrm{SSIM}(I_{\mathrm{R}}, I_{\mathrm{G}})\), which further regulates the two images’ similarity in the structural domain. To sum up, the loss function for network training is

$$\mathrm{Loss}\left(I_{\mathrm{R}}, I_{\mathrm{G}}\right)=\alpha\,\mathrm{Loss}_{L_1}\left(I_{\mathrm{R}}, I_{\mathrm{G}}\right)+\beta\,\mathrm{Loss}_{\mathrm{PER}}\left(I_{\mathrm{R}}, I_{\mathrm{G}}\right)+\gamma\,\mathrm{Loss}_{\mathrm{SSIM}}\left(I_{\mathrm{R}}, I_{\mathrm{G}}\right),$$
(10)

where \(\alpha\), \(\beta\) and \(\gamma\) are hyperparameters balancing the three loss terms. In our implementation, these hyperparameters were set to \(\alpha=0.1\), \(\beta=10\) and \(\gamma=100\) after careful network tuning. Furthermore, we present the results of our transfer learning approach for further improvement in Supplementary Note 9.
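A possible sketch of this hybrid loss is given below; the torchvision VGG19 slice up to the pool-3 layer, the third-party pytorch_msssim SSIM implementation, and the mean reductions are assumptions of this sketch rather than the exact implementation:

```python
import torch.nn as nn
import torchvision
from pytorch_msssim import ssim                  # assumed third-party SSIM implementation

class HybridLoss(nn.Module):
    """alpha * L1 + beta * perceptual (VGG19 pool-3) + gamma * SSIM loss."""
    def __init__(self, alpha=0.1, beta=10.0, gamma=100.0):
        super().__init__()
        vgg = torchvision.models.vgg19(weights="IMAGENET1K_V1").features[:19]  # up to pool-3
        self.vgg = vgg.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def forward(self, i_r, i_g):                 # single-channel images in [0, 1]
        l1 = (i_r - i_g).abs().mean()
        feat_r = self.vgg(i_r.repeat(1, 3, 1, 1))       # VGG19 expects 3 channels
        feat_g = self.vgg(i_g.repeat(1, 3, 1, 1))
        per = ((feat_r - feat_g) ** 2).mean()
        ssim_loss = 1.0 - ssim(i_r, i_g, data_range=1.0)
        return self.alpha * l1 + self.beta * per + self.gamma * ssim_loss
```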

Network training

We randomly selected 505 image pairs from the synthetic large-scale single-photon image dataset as the testing set, and the remaining 16,620 image pairs were used for network training. All images were cropped to 32 × 32 pixels before being input into the network. We implemented the network with the PyTorch framework on the Ubuntu 20.04 operating system, and trained it for 1000 epochs until convergence using the Adam optimizer on an NVIDIA RTX 3090 GPU with the batch size set to 24. We set the initial learning rate to 0.0003, which was decreased by 10% every 100 epochs. The default weight decay was set to 0.00005, and the Adam parameters \(\beta_1\) and \(\beta_2\) were set to 0.5 and 0.999, respectively. The average time to enhance a single SPAD image from 32 × 64 to 128 × 256 pixels is 0.04 s. Other details of network training are presented in Supplementary Note 4.
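The optimizer and learning-rate schedule described above map to a straightforward PyTorch configuration; in the sketch below, `model`, `train_loader` and `criterion` are placeholders for the gated fusion transformer, the synthetic-data loader and the hybrid loss described earlier:

```python
import torch

# model, train_loader and criterion are placeholders defined elsewhere.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4,
                             betas=(0.5, 0.999), weight_decay=5e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.9)  # -10% every 100 epochs

for epoch in range(1000):
    for lr_input, hr_target in train_loader:     # batch size 24, 32 x 32 crops
        optimizer.zero_grad()
        loss = criterion(model(lr_input), hr_target)
        loss.backward()
        optimizer.step()
    scheduler.step()
```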

The minimum data generated in this study have been deposited in the Figshare database under accession code https://doi.org/10.6084/m9.figshare.23966922.v1. The complete data are available under restricted access due to the requirements of the funded project; access can be obtained from the corresponding author within one month of a reasonable request.