Introduction

Optical sensing and metrology are crucial for a range of modern applications in the biomedical, environmental, manufacturing, robotics, and autonomous driving fields. While conventional systems are bulky and expensive, single-pixel compressive sensing has emerged as a cost-effective alternative featuring high mechanical flexibility and potential advantages in size, weight, and power parameters1,2. Its applications span terahertz imaging3,4,5, short-wave infrared imaging6,7,8,9, and optical microscopy10,11,12, as well as imaging through scattering media13,14,15,16,17 and three-dimensional sensing18,19,20.

Among them, active optical compressive sensing uses a laser beam to illuminate the target and a single-pixel detector to capture the back-reflected signal21,22,23,24. The illuminating beam can be encoded with various patterns, like random patterns2,25 and structured patterns, including those of Fourier transform bases and Hadamard matrices2,26,27,28. These patterns under-sample the target’s optical properties for image reconstruction, recognition, and assessment, albeit at the price of information loss. Recently, it has been shown that artificial intelligence (AI) and machine learning can be employed to process compressed data, where the quality of the reconstructed image is improved by deep learning and recurrent neural networks29,30,31,32,33.

When applied to single-photon compressive sensing, however, those techniques face challenges arising from noise, including background from the ambient environment, detector dark counts, and the inherent Poissonian fluctuations of the signal photon numbers. In particular, satellite light detection and ranging (LiDAR) systems do not perform well during daytime due to the overwhelming sunlight background. In recent years, there has been extensive research on mid- and near-infrared (IR) upconversion imaging and detection, which offer enhanced sensitivity compared to direct IR detection techniques34. Direct detection primarily uses thermal sensors, which suffer from limited sensitivity and high noise and usually require expensive cryogenic cooling. In contrast, visible detectors exhibit significantly lower noise and better sensitivity, and do not need cryogenic cooling. Thus far, frequency upconversion has been demonstrated for photon-starved imaging35,36, coincidence pumping upconversion37, and mid-IR photon counting38. Also, encoding the object with Hadamard matrices has been studied to reduce the distortion caused by upconversion39.

Here, we report an experiment combining single-pixel compressive sensing, single-photon detection, and machine learning for image classification, aiming at understanding and addressing quantum noise effects. Our setup consists of a structured illuminating beam prepared in Walsh 2D patterns created by a digital micromirror device (DMD)29,40,41, a spatial light modulator to create target images, and two switchable single-photon counting systems to capture the reflected photons. The first system is an avalanche photodiode that directly counts single photons, used as our baseline to study how photon counting and ambient noise affect the classification accuracy. The second is an upconversion photon detector based on quantum parametric mode sorting (QPMS), an exotic quantum frequency conversion capable of mode selectivity42,43,44,45. By using broadband pump pulses with a spectral width comparable to the phase-matching bandwidth of the frequency converter, QPMS is able to selectively convert single photons in a custom spatiotemporal mode46, where the returning signal photons are expected. As such, background photons in other modes are rejected efficiently, even when they have spatial and temporal overlap with the probe. In this way, QPMS not only converts the signal to a more favorable wavelength for detection (with higher efficiency, lower dark counts, and smaller device volume) but also rejects the majority of the background noise upfront and increases the detection signal-to-noise ratio by as much as 40 dB47. Applied to image classification, it rejects most sunlight or ambient background noise, so that high classification accuracy can be realized in practical operating environments despite high background noise44,47,48. Our results show that for direct detection (DD), the accuracy drops significantly when the mean photon number for one digit pattern falls below 300.
To test the effect of ambient noise, we also inject white noise from amplified spontaneous emission (ASE). With ASE noise added, the QPMS detection shows no notable change even when the signal-to-noise ratio (SNR) drops to −20 dB. In contrast, the classification accuracy for DD decreases by ~30% when the SNR is 3 dB, and the classification fails for lower SNR. These results demonstrate the sensitivity of single-photon compressive sensing to various types of noise and the ways to mitigate them, as applicable to many other single-photon sensing systems.

Results

Experimental setup

The experimental setup is sketched in Fig. 1. It uses a femtosecond mode-locked laser (MLL, CALMAR LASER, FPL-03CFF) at ~1550 nm with a FWHM of ~60 nm, operating at a repetition rate of 50 MHz. The optical pulse train from the MLL is passed through wavelength division multiplexing (WDM, 200 GHz bandwidth) filters, which separate the optical pulses into two wavelengths. After further filtering and amplification, the resulting signal and pump pulses are both 6 ps long. As the phase-matching bandwidth of the QPMS waveguide is 1 nm, such pulses lead to high mode selectivity47. In Fig. 1, inset (a) shows the spectrum of the 6 ps probe pulse centered at 1554.3 nm, while inset (b) shows that of the 6 ps pump pulse centered at 1564.3 nm. The probe and pump pulses are then amplified using erbium-doped fiber amplifiers (EDFAs) and further filtered using additional WDMs. The probe passes through an electrically controlled variable attenuator (V1550A, Thorlabs) and the 1% port of a 99:1 fiber beamsplitter to decrease its power to the nanowatt level. The probe pulses are collimated into free space via a fiber coupler and directed onto a DMD, where we select the first negative diffraction order. This selected order is directed toward a spatial light modulator (SLM) for further manipulation. To facilitate this setup, 4-f relay lenses are employed for both the DMD and SLM. The DMD is composed of an array of micromirrors with high reflectivity. We note that our current choice of a DMD followed by an SLM is merely due to the equipment availability in our lab. Other configurations, including those using two DMDs or two SLMs, are also applicable to this work.

Fig. 1: Experimental setup for single-pixel compressive sensing with single photon counting.

Block a is for the signal and pump generation; Block b is for the noise generation; Block c is for the direct detection; Block d is for QPMS detection. Insets e–i show the normalized spectra of the probe, the pump, the ASE (amplified spontaneous emission), the wavelength-selected ASE noise, and the upconverted probe and pump, respectively. They are taken from the five corresponding positions in the setup. The DMD (digital micromirror device) displays Walsh 2D patterns, while the SLM displays MNIST digit images. The components used in the setup include an MLL (mode-locked laser), SLM (spatial light modulator), DMD, EDFA (erbium-doped fiber amplifier), WDM (wavelength division multiplexing), FC (fiber coupler), PPLN module (fiber-coupled magnesium-doped periodically poled lithium niobate waveguide), InGaAs-SPD (indium gallium arsenide single-photon detector), and Si-SPD (silicon single-photon detector). L1 and L2, L3 and L4, and L5 and L6 are three pairs of 4-f relay lenses. Fiber polarization controller (FPC) 1 is used to adjust the probe polarization for the SLM. FPC 2 and FPC 3 are used to adjust the polarization of the probe and pump, respectively.

We upload Walsh 2D patterns onto the DMD, as illustrated in the inset of Fig. 1. The SLM (SLM210, Santec) displays phase patterns generated from the MNIST49 dataset. Each pixel in the phase patterns takes a binary phase value of either 0 or 3π/8. To ensure that the 28 × 28 MNIST digit images match the probe beam diameter on our SLM, we resize them to 240 × 240 images. The resulting probe is then coupled into a single-mode fiber and subsequently divided into two equal parts using a 50:50 fiber beamsplitter. One part is directed towards DD, where the photons are counted directly using an InGaAs SPD (ID210, IDQ). In this experiment, the quantum efficiency of the gated (1 ns effective gate width) InGaAs-SPD is set to 20%. Together with losses in fiber connectors and filters, this gives a total detection efficiency of 4.2% for the DD channel.
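As an illustration of the two displays described above, the following Python sketch generates sequency-ordered Walsh 2D masks for the DMD and converts a 28 × 28 digit into a 240 × 240 binary phase mask for the SLM. The Walsh pattern size (32 × 32), the ordering of the masks, the nearest-neighbour resizing, and the binarization threshold are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def walsh_matrix(n):
    """Return an n x n Walsh matrix: Hadamard rows sorted by sequency."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])  # Sylvester construction
    # Sequency = number of sign changes along each row
    sign_changes = (np.diff(h, axis=1) != 0).sum(axis=1)
    return h[np.argsort(sign_changes)]

def walsh_2d_patterns(n, count):
    """Return `count` binary (0/1) n x n masks from outer products of Walsh rows."""
    if count > n * n:
        raise ValueError("at most n*n distinct patterns exist")
    w = walsh_matrix(n)
    patterns = []
    for k in range(count):
        i, j = divmod(k, n)                  # assumed row-major ordering
        pattern = np.outer(w[i], w[j])       # entries are +1 / -1
        patterns.append((pattern + 1) // 2)  # map to 0/1 (mirror off/on)
    return np.stack(patterns)

def binary_phase_mask(digit, size=240, threshold=0.5, phase=3 * np.pi / 8):
    """Resize a square image in [0, 1] and map it to phases 0 or `phase`."""
    digit = np.asarray(digit, dtype=float)
    idx = np.arange(size) * digit.shape[0] // size  # nearest-neighbour indices
    resized = digit[np.ix_(idx, idx)]
    return np.where(resized > threshold, phase, 0.0)

# Hypothetical usage: 300 masks and a synthetic stand-in for an MNIST digit
dmd_masks = walsh_2d_patterns(32, 300)
slm_mask = binary_phase_mask(np.eye(28))
```

Binary 0/1 masks are used here because each micromirror has only two states; how the experiment handles the negative entries of the Walsh basis is not stated in the text.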

The other part is combined with the pump using a fiber WDM before being coupled into a periodically poled lithium niobate (PPLN) module (see Supplementary Fig. 1) for QPMS detection. The pump pulse train is delayed by an optical delay line (ODL), and its polarization is set by a polarization controller to generate an efficient upconverted output, as shown in inset (e). The resulting output, at a wavelength of 779.59 nm (see Supplementary Fig. 2), is then detected using a silicon SPD (Si-SPD, Excelitas). The maximum normalized internal conversion efficiency of the PPLN module is 207.7% W⁻¹ cm⁻², and the Si-SPD has a quantum efficiency of 66%. Together with losses in fiber connectors and filters, the total detection efficiency for the QPMS channel is 12.0%.

The outputs of the InGaAs SPD and the Si-SPD are recorded separately using a time tagger (Time Tagger Ultra, Swabian Instruments). To evaluate the performance of our setup, we introduce ASE noise using an EDFA, as shown by inset (c) in Fig. 1. The ASE noise is then passed through a 1554.1 nm WDM, which confines it to the same wavelength range as the probe, so that the noise has a spectrum identical to that of the signal. Subsequently, the noise is amplified by another EDFA and its level is adjusted using a mechanically controlled variable attenuator. The normalized spectrum is shown in Fig. 1, inset (d). Finally, the noise is introduced into the setup by connecting it to the last port of the 50:50 fiber beamsplitter. The data collected from the time tagger are post-processed using MATLAB.

Model

The deep neural network (DNN) consists of a total of 7 layers: the input layer, five hidden layers, and the output layer. Rectified linear unit (ReLU) activation functions are applied to the hidden layers, while a log-softmax function is used between the second-to-last layer and the last layer. In our experiment, there are 100 different images for each digit from the MNIST dataset. For 10 digits, this amounts to a total of 1000 samples, each containing 300 mean photon counts. For classification, we allocate 75% of the samples as training data and 25% as testing data. The pattern switching time on the DMD is ~100 μs. When processing the experimental data, we disregard the photon-counting events recorded during this switching time.
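The layer structure and data split described above can be sketched in plain NumPy as follows. This is only a forward pass with random, untrained weights: the hidden-layer widths, the initialization, the input normalization, and the synthetic Poisson features are placeholders, since the text does not give them, and the training procedure is omitted.

```python
import numpy as np

def init_network(layer_sizes, rng):
    """Random (He-style) weights and zero biases for a fully connected network."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def log_softmax(z):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def forward(params, x):
    """ReLU on the hidden layers, log-softmax on the output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return log_softmax(x @ w + b)

def split_indices(n_samples, train_fraction, rng):
    """Random split of sample indices into training and testing sets."""
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

rng = np.random.default_rng(0)
# 7 layers in total: 300 inputs (one per Walsh pattern), five hidden layers,
# and 10 outputs. The hidden widths are assumptions, not values from the text.
layer_sizes = [300, 256, 128, 64, 32, 16, 10]
params = init_network(layer_sizes, rng)

# Synthetic stand-in for 1000 samples of 300 mean photon counts each
features = rng.poisson(300, size=(1000, 300)).astype(float)
train_idx, test_idx = split_indices(1000, 0.75, rng)
log_probs = forward(params, features[test_idx] / 300.0)
predicted_digits = log_probs.argmax(axis=1)
```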

For each handwritten digit, 300 photon counts for 300 Walsh 2D patterns are acquired, as shown in Fig. 2a, b. As an example, Fig. 2a shows a part of the raw photon-counting events recorded by QPMS detection for the first image of digit “0” with no ASE noise added. Figure 2b shows their average for each of the 300 patterns uploaded onto the DMD, which extracts 300 features from each digit image. These features serve as the inputs to the DNN, as illustrated in Fig. 2c.

Fig. 2: Steps for Process Data.

Example experimental data and the processing neural network. a Example raw photon counts recorded by DD (direct detection) for the first image of digit “0”, where each event counts photons for 100 μs, and there are 10 events for each DMD (digital micromirror device) pattern (thus the figure shows the results for 80 patterns). b Example raw photon counts recorded by QPMS (quantum parametric mode sorting) for the first image of digit “0”, where each event counts photons for 40 μs, and there are 25 events for each DMD pattern (thus the figure shows the results for 80 patterns). c Example mean photon counts of QPMS for each DMD pattern, obtained by averaging over 20 of the 25 events, where the first three and last two events are dropped as the system settles during the pattern transition. The effective integration time of photon counting is thus 0.8 ms in this case. The mean photon counts of DD for each DMD pattern are obtained by averaging over 8 of the 10 events, where the first and last events are dropped for the same reason. d The neural network architecture used for the handwritten digit classification, consisting of input, hidden, and output layers.

In the experiment, two white patterns are added to the end of the existing 300-pattern sequence to distinguish each target uploaded onto the SLM. They appear as the first two photon-counting events in Fig. 2a.
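The per-pattern averaging described in the Fig. 2 caption (25 QPMS events per pattern with the first three and last two dropped, and 10 DD events per pattern with the first and last dropped) can be sketched as follows. The function name and the synthetic Poisson event streams are hypothetical, and the two white marker patterns are assumed to have been stripped from the stream beforehand.

```python
import numpy as np

def mean_counts_per_pattern(events, events_per_pattern, drop_first, drop_last):
    """Average the photon-counting events of each DMD pattern, skipping the
    events recorded while the pattern is switching."""
    events = np.asarray(events, dtype=float)
    if events.size % events_per_pattern:
        raise ValueError("event stream is not a whole number of patterns")
    grouped = events.reshape(-1, events_per_pattern)
    kept = grouped[:, drop_first:events_per_pattern - drop_last]
    return kept.mean(axis=1)

rng = np.random.default_rng(1)

# QPMS: 25 events of 40 us per pattern, keep 20 -> 800 us effective integration
qpms_events = rng.poisson(20, size=300 * 25)
qpms_features = mean_counts_per_pattern(qpms_events, 25, drop_first=3, drop_last=2)

# DD: 10 events of 100 us per pattern, keep 8 -> 800 us effective integration
dd_events = rng.poisson(50, size=300 * 10)
dd_features = mean_counts_per_pattern(dd_events, 10, drop_first=1, drop_last=1)
```

Each call returns one feature per pattern, so a 300-pattern sequence yields the 300 inputs of the neural network.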

Measurements and classification accuracy

In the experiment, the photon counts for each sample are varied by changing the integration time of the single-photon detectors, allowing us to study the effects of quantum noise. When taking measurements at the single-photon level, it is crucial to choose the SPD dwell time properly. If the time is too short, the impact of shot noise is significant. Conversely, if it is too long, the SPD is likely to become saturated. As an example, Fig. 3 shows the confusion matrices for both DD (Block c in Fig. 1) and QPMS (Block d in Fig. 1) under different integration times and different probe powers (with Block a on and Block b off in Fig. 1). As shown in Fig. 3a, when the integration time per mask is 600 μs for the InGaAs SPD and the probe power is ~14 nW, the average photon count per mask is 398.6, which gives an 82.8% classification accuracy. In Fig. 3b, as we increase the integration time to 800 μs while keeping the same probe power, the average photon count per mask becomes 551.2, and the classification accuracy improves to 90%. On the other hand, with the same 800 μs integration time but with the probe power reduced to ~8.5 nW, the average photon count per mask is 391.5, and the classification accuracy drops to 80.4%, as shown in Fig. 3c. These results highlight the dependence of the accuracy on the photon counts, whose fluctuations are governed by shot noise. With higher photon counts, the measurement uncertainty is reduced, so that the recognition accuracy is higher. To further illustrate this, as the probe power drops to ~4 nW (while keeping the same 800 μs integration time), the average photon count is 299.7, for which the classification accuracy is only 31.6%. Details of the data are given in Supplementary Table 1 and Supplementary Figs. 7–9.

Fig. 3: Classification Accuracy without in-Band Noise Added.

Normalized confusion matrices for DD (direct detection) (a–c) and QPMS (quantum parametric mode sorting) (d), using 300 training epochs for DD and 200 training epochs for QPMS. a Normalized confusion matrix of DD with 600 μs integration time per mask and an average photon count of 398.6 when the probe power is ~14 nW. Classification accuracy is 82.8%. b Normalized confusion matrix of DD with 800 μs integration time per mask and an average photon count of 551.2 when the probe power is ~14 nW. Classification accuracy is 90%. c Normalized confusion matrix of DD with 800 μs integration time per mask and an average photon count of 391.5 when the probe power is ~8.5 nW. Classification accuracy is 80.4%. d Normalized confusion matrix of QPMS detection with 200 μs integration time per mask and an average photon count of 485.8 when the probe power is ~14 nW. Classification accuracy is 98%.

In comparison, Fig. 3d shows the classification accuracy obtained with the QPMS detection system. With an integration time of 200 μs and a probe power of ~14 nW, the average photon count per DMD mask is 485.8. The higher photon count despite a shorter integration time results from the higher total detection efficiency of QPMS (12.0%, versus 4.2% for the DD channel). The classification accuracy in this case is 98%, which is significantly higher than in the DD case with similar or even higher photon counts.

The much better performance of QPMS over DD seen in Fig. 3 is mainly attributed to its much lower dark counts. To check this, Fig. 4 plots the measured dark counts of the two detection channels. As shown, the dark count is about 0.29 per 10 μs integration time for DD, and about 0.21 per 10 μs for QPMS. The Raman scattering noise in the QPMS is 7.5 × 10⁻⁴ photons per pulse. For the DD channel in Fig. 3b, the dark count is 234 per 800 μs integration time, which is about half of the average photon count. In contrast, the dark count is only 40 per 200 μs integration time for the QPMS channel in Fig. 3d, about 8.2% of the average count (see Supplementary Figs. 5 and 6). This leads to the significantly better performance of QPMS. In both cases, the dark count fraction in the total photon counts can be reduced by improving the quantum efficiency of the avalanche photodiodes (APDs). However, there is always a trade-off among the APDs' quantum efficiency, dark counts, and saturation counts. For example, in our experiment, the dark count of the InGaAs SPD can be reduced to only 0.01 per 10 μs by setting the quantum efficiency to 15%. However, the saturation counts will be lowered by more than half, so that the dynamic range of detected signal photons is reduced. In comparison, the Si-SPD performs much better in this respect, with orders of magnitude lower dark counts, at least 10 times higher saturation counts, and several times higher efficiency. This is a practical advantage of QPMS as an upconversion single-photon detector.
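The dark-count fractions quoted above can be checked with a few lines of Python. The second function is an added illustration rather than a quantity reported in the text: it estimates how far the signal stands above the shot noise of the total counts, assuming purely Poissonian statistics.

```python
import math

def dark_fraction(total_counts, dark_counts):
    """Fraction of the registered counts that are dark counts."""
    return dark_counts / total_counts

def signal_to_shot_noise(total_counts, dark_counts):
    """Signal counts divided by the Poissonian fluctuation of the total counts
    (an illustrative figure of merit, not one defined in the text)."""
    return (total_counts - dark_counts) / math.sqrt(total_counts)

# (average counts per mask, dark counts per integration time) from the text
cases = {
    "DD, 800 us, ~14 nW": (551.2, 234),
    "DD, 800 us, ~8.5 nW": (391.5, 234),
    "QPMS, 200 us, ~14 nW": (485.8, 40),
}

for label, (total, dark) in cases.items():
    print(
        f"{label}: dark fraction = {100 * dark_fraction(total, dark):.1f}%, "
        f"signal / shot noise = {signal_to_shot_noise(total, dark):.1f}"
    )
```

Under these assumptions the QPMS channel, with fewer total counts than the DD case of Fig. 3b, still has the larger signal-to-shot-noise figure, because far fewer of its counts are dark counts.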

Fig. 4: Dark Counts.

The background counts versus the integration time on the InGaAs SPD (single-photon detector) and Si SPD. The dark counts of the DD (direct detection) channel are measured with the probe and the pump off. The red triangles are the measured background counts of the InGaAs SPD, and the blue line is their quadratic fit. The dark counts of the QPMS (quantum parametric mode sorting) channel are measured with an input pump power of 16.5 dBm. The green diamonds are the measured shot noise of the QPMS detection, and the brown line is its quadratic fit.

Figure 5a summarizes the classification accuracy as a function of the average detected photon number per DMD mask, for both detection channels and different numbers of DMD masks used. For QPMS, when the average photon number is 119.7 per mask, the accuracy is 97.2% with 300 masks and 93.2% with 100 masks. For higher photon counts, the accuracy fluctuates slightly but quickly saturates to 99.2% for both cases when the average photon count increases to around 500. In contrast, the accuracy for DD is much lower for the same average photon counts. For example, with an average photon count of 59.5, it is 42.4% with 300 masks and only 26.4% with 100 masks. As the average count increases to 550, the accuracy increases to around 90% and 82.8%, respectively, for the two mask numbers, which is still much lower than the QPMS results. Due to the InGaAs-SPD saturation, the average photon count cannot be further increased for the DD channel. Again, the much better performance of QPMS over DD is due to a much higher total detection efficiency (12.0% vs 4.2%) and a lower dark count, so that the percentage of true signal photons in the total photon counts is much higher. Reducing the integration time on the SPD leads to dark noise becoming the primary source of noise in the data. The observed trend in classification accuracy for DD versus the average photon counts per mask is consistent with previous findings50 on the relationship between the optical energy per inference and classification accuracy. Also, as shown in Fig. 5a, as we decrease the number of masks for DD and QPMS, the classification accuracy of QPMS drops less than that of DD.

Fig. 5: Classification Accuracy in Different Cases.

a The classification accuracy is plotted for two different detection methods with 40, 100, or 300 Walsh 2D patterns used to sample the targets (corresponding to compression ratios of 0.07%, 0.17%, and 0.52%, respectively): QPMS (quantum parametric mode sorting), represented by the green, the red, and the black curves with 200 training epochs, and DD (direct detection), represented by the yellow and the blue curves with 300 training epochs. The efficiency and integration time of the Si SPD (single-photon detector) and the InGaAs SPD are different. As observed, the InGaAs SPD has more background noise than the Si SPD, which causes the classification accuracy of DD to be lower than that of QPMS detection with comparable photon counts. The QPMS encodes the probe optically, which helps us go further in the compression ratio. b The classification accuracy of DD and QPMS single-photon detection vs SNR (signal-to-noise ratio, in dB) of the ASE (amplified spontaneous emission) noise with 100 or 300 input values into the neural network.
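The compression ratios in the caption are reproduced if the ratio is taken as the number of Walsh masks divided by the number of pixels in the 240 × 240 target image. That definition is an assumption inferred from the quoted numbers, sketched below.

```python
def compression_ratio_percent(n_masks, image_side=240):
    """Number of measurements per image pixel, in percent (assumed definition)."""
    return 100.0 * n_masks / image_side**2

ratios = {n: compression_ratio_percent(n) for n in (40, 100, 300)}
for n_masks, ratio in ratios.items():
    print(f"{n_masks} masks: {ratio:.2f}%")
```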

Figures 3–5a present the results without external background noise, where the noise photon counts originate within the detection systems, including SPD dark counts and Raman scattering noise in the QPMS module. In practical applications, however, there are external background photons from the ambient environment, such as sunlight and city light. They add to the noise counts and lower the classification accuracy. To simulate their effects, we add ASE noise into the system and apply the same machine-learning technique to test the recognition accuracy. As shown in Fig. 1, the ASE noise is generated by an EDFA without input, followed by a WDM filter and another EDFA to create in-band noise with a nearly flat spectrum. Its amplitude is controlled using the mechanically controlled variable attenuator, before being split equally into the DD and QPMS channels through a 50:50 fiber beamsplitter. The strength of the signal relative to the noise, i.e., the signal-to-noise ratio (SNR), is defined as \(10\log_{10}\left[(N_{\rm off}-N_{\rm dark})/(N_{\rm tot}-N_{\rm off})\right]\). Here \(N_{\rm off}\) is the photon count without ASE, as registered by the InGaAs SPD with the first image of digit 1 from the MNIST dataset on the SLM and the 31st Walsh 2D pattern on the DMD, a combination that gives the highest photon count over all combinations used in this experiment. \(N_{\rm dark}\) is the same photon count, but with the probe shut off. \(N_{\rm tot}\) is the same photon count, but with the added ASE noise.
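A minimal Python sketch of the SNR definition above follows. The photon counts in the usage lines are hypothetical values chosen only to exercise the formula; the helper that converts an SNR in dB into a noise-to-signal factor is an added convenience.

```python
import math

def snr_db(n_off, n_dark, n_tot):
    """SNR in dB from counts without ASE (n_off), with the probe shut off
    (n_dark), and with ASE added (n_tot)."""
    signal = n_off - n_dark
    noise = n_tot - n_off
    if signal <= 0 or noise <= 0:
        raise ValueError("signal and noise counts must both be positive")
    return 10.0 * math.log10(signal / noise)

def noise_to_signal_factor(snr):
    """How many times stronger the noise is than the signal at a given SNR (dB)."""
    return 10.0 ** (-snr / 10.0)

# Hypothetical counts: 1000 signal counts on top of 100 dark counts
equal_noise = snr_db(n_off=1100, n_dark=100, n_tot=2100)  # noise equals signal
half_noise = snr_db(n_off=1100, n_dark=100, n_tot=1600)   # noise is half the signal

factor_at_minus_20 = noise_to_signal_factor(-20.0)
factor_at_minus_27 = noise_to_signal_factor(-27.0)
```

With this definition, equal signal and noise counts give 0 dB and halving the noise gives about 3 dB. An SNR of −20 dB corresponds to noise 100 times stronger than the signal, and −27 dB to roughly 500 times.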

To ensure the reliability of our data, we first examine the detection results without ASE noise. This is done to establish a baseline and confirm the validity of our results before introducing ASE noise under the same parameter settings, including the effective dwell time of each mask on the DMD, integration time on the SPDs, settings on EDFAs, and polarization conditions. In this particular case, which corresponds to both Block a and Block b in Fig. 1 being on, the effective dwell time of each Walsh 2D pattern on the DMD is set to 800 μs. The integration time on the InGaAs SPD is 100 μs, while the integration time on the Si SPD is always 40 μs. Under this condition, the classification accuracy for DD is 80.4% with 300 masks after 300 epochs of training, as shown in Fig. 3c with the probe power ~8.5 nW and the SPD dark counts ~234. If we further reduce the probe power, the dark counts will dominate and significantly lower the classification performance. For example, the classification accuracy drops to 41.2% with the probe power ~6 nW, and to 31.6% with the probe power ~4 nW. In contrast, because the Si SPD has much lower dark counts and a much higher saturation level, we are able to attenuate the probe power to ~6 nW so that the SNR can be as low as −27 dB for QPMS.

Next, we inject the ASE noise into the setup, whose power level is adjusted through a mechanically controlled attenuator. At each level, we use 75% of the data for training and the remainder for testing. The results are shown in Fig. 5b. For DD, the probe power after FC1 is set to about 8.5 nW. The ASE power is first set to be the same as the probe power where they are combined at the 50:50 beamsplitter, for which the SNR is 0 dB. In this case, the photon counts for all patterns vary between 67 and 115, well below the InGaAs SPD saturation level. The classification accuracy is only 22.0% after 300 training epochs with 300 masks on the DMD. Halving the ASE power increases the SNR to 3 dB and improves the accuracy to 51.6%. As we attempt to double the ASE to reduce the SNR to −3 dB, the SPD saturates. In contrast, QPMS performs much better. As shown in Fig. 5b, the classification accuracy remains above 98% when the signal is up to 100 times weaker than the noise. Even when the SNR is −27 dB, for which the signal is 500 times weaker, the classification accuracy is still 94%. This advantage arises for two reasons. The first is the noise rejection. In our experiment, QPMS rejects over 99.9% of the noise51, so that the noise actually detected by the Si-SPD is only at the dark count level, much weaker than the signal, leading to high accuracy. The second reason is the much lower noise equivalence of dark counts (NEDC), defined as the detector dark count divided by the total detection efficiency. For DD, per 40 μs, the dark count is 11.8 and the efficiency is 4.2%, so that the NEDC is 281. For QPMS, the dark count is 8.5 per 40 μs and the efficiency is 12.0%, which gives an NEDC of 71. As such, the baseline photon counts for DD are much higher, blurring the feature mapping onto the changes in the photon counts as the DMD masks are changed.
For these reasons, the registered photon counts change over a much larger dynamic range for QPMS than for DD, as seen in Supplementary Note 2, so that the features are more pronounced and the neural network can recognize them more easily. More data are provided in the Supplementary Information: Supplementary Figs. 9–10 present the normalized confusion matrices, and Supplementary Table 2 gives the range of average photon counts over photon-counting events.
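The NEDC values quoted above follow directly from the definition (dark count divided by total detection efficiency), as the short Python check below shows. The rescaling helper is an added convenience that assumes dark counts grow linearly with integration time; under that assumption the 40 μs figures come out close to the per-mask dark counts of ~234 (DD, 800 μs) and ~40 (QPMS, 200 μs) quoted elsewhere in the text.

```python
def nedc(dark_counts, total_efficiency):
    """Noise equivalence of dark counts: dark counts referred to the input."""
    return dark_counts / total_efficiency

def rescale_counts(counts, from_us, to_us):
    """Scale a count to another integration time, assuming linear growth."""
    return counts * to_us / from_us

# Dark counts per 40 us and total detection efficiencies from the text
nedc_dd = nedc(11.8, 0.042)
nedc_qpms = nedc(8.5, 0.120)
print(f"NEDC, DD: {nedc_dd:.0f}   NEDC, QPMS: {nedc_qpms:.0f}")

# Linear extrapolation to the per-mask integration times (assumption)
dd_dark_800us = rescale_counts(11.8, 40, 800)
qpms_dark_200us = rescale_counts(8.5, 40, 200)
```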

Also, the high timing resolution of QPMS might offer another advantage. That is, the photons reflected off different pixel locations on the SLM and DMD could experience different times of flight. In our experiment setup, this difference is on the order of picoseconds. While it is well within the timing resolution of typical SPD, it approaches that of QPMS (about 10 ps). As such, those photons could be detected with different efficiency, so that the photon counts not only contain the aggregated phase information of the digits as sampled by the DMD but also their relative distribution information. However, we have not been able to verify it yet in this study.

Finally, to test the robustness of our system, we perform training on one ASE noise level and testing on a different level. The results are shown in Fig. 6, with Fig. 6a for training at 0 dB SNR and testing without noise, and Fig. 6b for training at −10 dB SNR and testing at −17 dB. In both cases, the classification accuracy remains high, no less than 98% with 300 masks. These findings demonstrate the robustness of compressive sensing and recognition via QPMS, as it rejects most of the noise, has a low NEDC, and may capture the relative pixel distribution of the digits. More cases were tested (see Supplementary Fig. 11).

Fig. 6: Classification Accuracy of QPMS with Different Levels of Noise for Training and Testing.

QPMS (quantum parametric mode sorting): In this case, the training process involves using a specific level of ASE (amplified spontaneous emission) noise as the training dataset (200 training epochs), while other levels of ASE noise are used as the testing dataset. a Normalized confusion matrix with SNR (signal to noise ratio) = 0 dB as training and no noise as testing. b Normalized confusion matrix with SNR = −10 dB as training and SNR = −17 dB as testing.

Conclusion

We have explored single-pixel, single-photon counting for remote compressive sensing and image recognition, focusing on studying the effects of background and inherent quantum noise. Our results identify their significance and demonstrate an effective way of mitigation by quantum parametric mode sorting. By rejecting the noise while improving the detection efficiency, it allows highly accurate classification of MNIST images even if the signal is mixed with orders of magnitude stronger in-band noise. Our technique may find applications in remote sensing areas where information compression is desirable, yet the signals are either weak or contaminated, or both, such as those in astronomy observation and biomedical diagnosis. In particular, it may help compressive imaging with single photon detection, by greatly suppressing the background noise and reducing detector saturation39. It also allows efficient detection of mid-IR photons with low noise and compact single-photon detectors at room temperature, to extend compressive imaging to mid-infrared38.

Methods

Pictures of the experimental setup are shown in Fig. 7, with a side view in Fig. 7a and a top view in Fig. 7b. In Fig. 7a, the mode-locked laser (MLL, Calmar Mendocino) is labeled as (1). The probe light propagates through a fiber coupler (FC1, C220TMD-C, Thorlabs), labeled as (2), into free space. The DMD is indicated as (3), while the SLM is labeled as (4). The probe light is subsequently coupled into FC2 (C230TMD-C, Thorlabs), labeled as (5). Before taking data at the single-photon level, we tested the setup at classical power levels on the order of milliwatts (the scheme of the setup is given in Supplementary Note 1) and obtained good results (see Supplementary Figs. 3 and 4).

Fig. 7: Core of the Experiment Setup.

Photos of the experimental setup: a side view and b top view of the free-space setup. The labels refer to the following components: (1) MLL: Mode-locked laser sends out pulse trains for the probe and pump. (2) FC1: Fiber coupler couples the probe out to free space. (3) DMD: Digital micromirror device projects Walsh 2D masks. (4) SLM: Spatial light modulator displays digit images from the MNIST dataset. (5) FC2: Fiber coupler couples the probe into a fiber. (6) PPLN module: Periodically poled lithium niobate module upconverts the probe and pump to a higher frequency. (7) ODL: Optical delay line helps the probe and pump achieve temporal alignment. (8) InGaAs SPD: Indium gallium arsenide single-photon detector is used for direct detection (DD) at the single-photon level. (9) Si SPD: Silicon single-photon detector is used for quantum parametric mode sorting (QPMS) detection at the single-photon level. (10) Time Tagger: Device used for collecting data.

We have augmented the training and testing with much more data, including direct detection data without noise, QPMS data without noise, and QPMS data with different levels of noise; see below. In all cases, the classification accuracies remain the same. As a further validation, we have taken more data for 2000 images and found the confusion matrix to be nearly identical to that obtained with 1000 images (see Supplementary Fig. 12).