Introduction

Light detection and ranging (Lidar) enabled by Single-Photon Avalanche Detector (SPAD) sensors has seen significant research in recent years as a next generation imaging technology1,2,3. By coupling Silicon Semiconductor pixels to Time-to-Digital Converters (TDCs) imaging systems with both single-photon sensitivity and picosecond temporal resolution can be created, allowing for, in principle, the measurement of an objects ‘depth’ (i.e., its surface profile) at millimeter scales4,5. The ability to measure depth has led to SPADs being used in a wide range of applications, for instance: exploiting the single-photon sensitivity to image underwater6, through obscurants7,8 or at long range9,10; or, leveraging temporal gating to image ‘through’ nets11,12 or to view the flight of a laser pulse13,14. More recently, SPAD-based Lidars have been examined for automotive applications15,16,17 as well as for machine vision tasks including gesture recognition18,19, identification20, drone tracking21, and pose detection22. The growing number of SPAD based imaging applications highlights the need to develop imaging systems which can resolve fine depth features at high acquisition rates.

In conjunction with the adoption of SPAD based imaging systems several publications have focused on modeling the performance of SPADs. These works are predominantly focused on two areas, either, simulating the electron avalanche characteristics of individual pixels, or, examining methods for better data processing. In the case of the former, analyses of the peak detection efficiency, timing resolution, jitter, and dark count rates have been achieved using physical models23,24,25, software26,27,28, Monte-Carlo simulations29, and analytical models30. Further, the effect of specific factors such as tunneling31, nano-structures32, and temperature33 have also been examined.

As well as examinations of the electron avalanche characteristics several works have focused on the processing of SPAD data. To create a depth image the timing information from multiple laser pulses is accumulated into a histogram. The position of the peak of the histogram is assumed to correspond to the depth of the target. Consequently the quality of the depth image produced by a SPAD imaging system is directly dependent on the ability to identify the peak of the histogram. This process is complicated by spurious counts in the histogram resulting from dark counts, solar background counts, or, back reflections from scattering media34. Prior works have addressed this by adjusting TDC triggering based on adaptive photon thresholds35,36,37, laser pulse modulation38,39,40, attenuation41, principle component analysis for rapid processing42, and match filtering43,44,45.

Despite the growing number of SPAD-based Lidar applications, comparatively few works focus on modeling the performance of such systems in the context of entire images. Specifically, uncertainty in depth estimation has been examined using Monte-Carlo simulations and Fisher information for single pixels in the case of both simple and complex surfaces46,47. Further, semi-analytical single pixel models that characterise the SPADs using Poisson or Erlang distributions together with confidence intervals48,49 and incorporate effects such as SPAD dead time50,51 have been presented. However, the Poisson (and by extension Erlang) distributions have been found to diverge from observed SPAD behaviour in both the small and large photon number limits52,53. Additionally, accurate simulations of SPAD images have been presented in the context of Fluorescence Lifetime Imaging Microscopy (FLIM) and lifetime estimation53,54.

In this work we explore the fundamental limits of depth imaging with SPAD sensors and develop a model applicable to multiple applications including, SPAD Lidar, imaging in obscurants, and imaging around corners. We build upon the existing examinations of SPAD Lidar systems by combining a physical model with a binomial sampling procedure and a 3-Dimensional (3D) virtual environment to extend the accurate estimation of SPAD Lidar capabilities to entire images. A summary of this work is depicted conceptually in Fig. 1. Specifically, by accounting for the salient factors of real imaging systems, as bulleted in Fig. 1, we derive a physical process which directly relates SPAD measurements to the parameters of the optical system. By integrating this process with a 3D virtual environment (realised by the Unreal Engine) we develop a description of SPAD imaging in the context of photon arrival probabilities and Fisher information. The use of a 3D virtual environment allows for the application of our model to a wide variety of SPAD imaging scenarios since the complex geometries of real world objects, for instance the car depicted in Fig. 1, can be accurately accounted for. We develop a fast computational model for SPAD images by implementing our calculation of photon arrivals on a Graphics Processing Unit (GPU). We make this model publicly available at (https://github.com/HWQuantum/Simulating-single-photon-detector-array-sensors-for-depth-imaging).

Figure 1
figure 1

A conceptual summary of this work. In a direct time-of-flight flash Lidar system a laser pulse, depicted by the red beam, is used to illuminate a target, shown here by the simple vehicle model. Each pixel in the SPAD sensor then recieves light from a region of the target plane (illustrated by the square on the vehicle) subject to the properties of the target, scattering, the aperture of the collection lens, and the parameters of each pixel. The 19 bulleted parameters are used to define a computational model for photon measurements in the context of probabilities and Fisher information. This model can be used in two regimes: A computationally efficient (but inflexible) mode which directly simulates depth images by leveraging the Cramér-Rao bound (CRB); A computationally intensive mode which is highly flexible. This second mode precisely mimics SPAD operation by producing physically realistic histograms (and by extension depth images) on a per-pixel basis.

The computational model is capable of operating in two regimes; a computationally intensive but highly flexible and precise mode, or, a computationally efficient mode subject to some restrictions. In the one regime (the PC’s right hand side output in Fig.  1) we employ a binomial sampling process to precisely mimic the operation of SPAD sensors. This process allows for the generation of physically realistic SPAD histograms (and by extension depth images) on a per pixel basis over an entire sensor. In the other regime (the PC’s left hand side output in Fig.  1), we demonstrate that (under appropriate conditions) the Cramér-Rao bound associated with the Fisher information of the SPAD system can be used to model SPAD images directly. This direct modeling removes the need for histogram generation significantly reducing computational complexity. We confirm the validity of our approaches with experimental measurements of a resolution test target using a state-of-the-art SPAD array sensor and present median simulation accuracies in excess of 80%. Finally, we highlight the flexibility of our system by accurately modeling the SPAD images obtained of a vehicle at 1.4 km under real world conditions.

Theory

The number of photons returned from a target is described by a photon channel that models the loss of signal photons as a series of sequential processes. Consider a laser pulse of wavelength \(\lambda\) and initial energy \(E_0\) having a divergence \(\theta\) projected over a range R, through an atmosphere of attenuation length \(C_{atm}\) as depicted in Fig.  1. For a target with Lambertian scattering and reflectivity \(\Gamma\), the average number of photons detected per-pulse-per-pixel \(P_{pp}\) by an imaging sensor with pixels of quantum efficiency q and effective (pixel dimension \(\times\) fill factor) size \(W_p\times H_p\) at the focal plane of a collecting lens with f-number \(f_{no}\) is

$$\begin{aligned} \begin{aligned} P_{pp} = \frac{\lambda E_0}{h c}\frac{q\Gamma e^{\frac{-2R}{C_{atm}}}}{8}\frac{ W_pH_p}{f_{no}^2\pi R^2\tan ^2(\theta )}. \end{aligned} \end{aligned}$$
(1)

Here, h is Plank’s constant and c is the speed of light. The 8 in the denominator is a consequence of treating the surface reflectivity \(\Gamma\) and the atmospheric scattering \(C_{atm}\) of the return as two separate processes. If a different scattering mechanism is examined, the value in the denominator will change. Additional information on the derivation of Eq. (1) is given in supplementary materials 1. Note that even if \(P_{pp}>1\) then only a single photon is measured due to the single-photon-per-pulse limit for SPADs. For many SPAD imaging applications the detected number of photons-per-pulse-per-pixel is less than one hence, Eq. (1) represents the probability of a signal photon being measured by the detector. i.e., the SPAD can be expected to click at least every \(1/P_{pp}\) pulses on average. Further, for clarity of explanation, we assume only a single wavelength for Eq. (1). In the most general case Eq. (1) could be extended to multiple wavelengths by integrating over all relevant \(\lambda\)’s. Equation 1 is a variant of the radar equation with \(\Gamma\) functioning as an optical cross-section and \(( W_pH_p)/[f_{no}^2\pi R^2\tan ^2(\theta )]\) defining the relationship between the effective aperture and the illuminated area. This ratio indicates that larger pixels are desirable since in the limit of a single-point detector that sees the whole field-of-view all of the gathered energy is collected by a single pixel. Conversely, for a fixed sensor footprint increasing the transverse resolution of the detector will reduce the light gathering capabilities of each pixel. Note that Eq. (1) is independent of the focal length of the collecting lens. This is because as the range increases, the focal length of the collecting lens must also increase to maintain the field-of-view. If the f-number is constant, this corresponds to a lens with a larger aperture which captures more of the scattered light and offsets the scattering losses.

Fisher information

Assuming that the impulse-response-function of the laser pulse and SPAD is approximately Gaussian in time with a mean \(\mu\) and a standard deviation \(\sigma '\), Eq. (1) can be extended to a likelihood function \(L(t|\mu ,\sigma ')\),

$$\begin{aligned} \begin{aligned} L(t|\mu ,\sigma ') = C_{dc} + C_{bckg} + \frac{P_{pp}}{\sigma '\sqrt{2\pi }}\exp {\left[ -\frac{1}{2}\left( \frac{t-\mu }{\sigma '}\right) ^2\right] }. \end{aligned} \end{aligned}$$
(2)

Here, \(C_{dc}\) is the dark count rate of the detector measured in Hz, and \(C_{bckg}\) is the background counts in Hz, both of which are constant in time. In the case of solar background, a solar photon which strikes the target must travel through the same photon channel as the signal photon in order to reach the detector, hence \(C_{bckg}\) is

$$\begin{aligned} \begin{aligned} C_{bckg} = \frac{\lambda }{h c}\frac{q\Gamma e^{\frac{-R}{C_{atm}}}}{8 f_{no}^2}W_{bckg}W_pH_p, \end{aligned} \end{aligned}$$
(3)

where \(W_{bckg}\) is the solar background at \(\lambda\) in Watts-per-square-meter. The standard deviation of the impulse-response-function is defined as \(\sigma ' = (\text {FWHM}/2\sqrt{2\ln {2}})\), where FWHM is the full-width at half-maximum of the impulse-response-function. Equation  2 is defined such that

$$\begin{aligned} \begin{aligned} \int _{0}^{T} L(t|\mu ,\sigma ') dt = \alpha = T(C_{dc}+C_{bckg})+P_{pp}, \end{aligned} \end{aligned}$$
(4)

i.e., integrating the likelyhood for the Time Correlated Single Photon Counting (TCSPC) interval [0, T] returns the average number of counts \(\alpha\) measured by the detector in that interval. The Fisher information per pulse \(F(t|\mu ,\sigma ')\) with respect to the peak position \(\mu\) is defined as,

$$\begin{aligned} \begin{aligned} F(t|\mu ,\sigma ')&= \int _{0}^{T}\left\{ \frac{\partial \ln {\left[ \frac{L(t|\mu ,\sigma ')}{\alpha }\right] }}{\partial \mu }\right\} ^2\frac{L(t|\mu ,\sigma ')}{\alpha } dt\\&= \int _{0}^{T}\frac{P_{pp}^{2}(t-\mu )^2\exp {\left[ -\left( \frac{t-\mu }{\sigma '}\right) ^2\right] }}{\sigma '^62\pi \alpha \left\{ C_{dc} + C_{bckg} + \frac{P_{pp}}{\sigma '\sqrt{2\pi }}\exp {\left[ -\frac{1}{2}\left( \frac{t-\mu }{\sigma '}\right) ^2\right] }\right\} } dt\\ \end{aligned} \end{aligned}$$
(5)

for which a closed form analytical solution does not exist. By accumulating multiple frames, wherein each frame consists of many laser pulses (but at most one photon measurement) a histogram of photon arrival times can be created. A lower bound on the standard deviation \(\sigma _{\mu }^{*}\) in the estimate of the peak position of this histogram55,56, and by extension the quality of the depth image (Fig. 1) is given by the Cramér-Rao bound

$$\begin{aligned} \begin{aligned} \sigma _{\mu }^{*} = \frac{1}{\sqrt{N[1-(1-\alpha )^{\eta \nu }] F(t|\mu ,\sigma ')}}. \end{aligned} \end{aligned}$$
(6)

Here, N is the number of frames accumulated to create the histogram and \(1-(1-\alpha )^{\eta \nu }\) is the probability of measuring at least one photon per frame for a frame of exposure time \(\eta\) and a laser repetition rate \(\nu\). Together, \(N[1-(1-\alpha )^{\eta \nu }]\) represents the number of successful events in the histogram. Equation 6 characterises the minimum possible standard deviation associated with estimating the depth of a single point. We can additionally apply a stricter criteria and define the minimum distinguishability \(\sigma _{\mu }\) as \(\sigma _{\mu } = \sigma _{\mu }^{*}\times 2\sqrt{2\ln {(2)}}\) i.e., in a manner analogous to the Rayleigh resolution criteria, two depths are considered distinguishable when the peaks of the distributions associated with each point are separated by at least one FWHM.

By examining Eqs. (1)–(6) it is possible to isolate the effects of individual optical parameters on the variance in the depth estimation. Specifically, changes to the parameters that are shared between Eqs.  (1) and (3) do not effect the Signal to Background-Noise Ratio (SbNR) of the system and so have a negligible impact (if \(C_{dc}<< C_{bckg} + P_{pp}\)) on the Fisher information per pulse Eq. (5). However, changing these parameters does effect the average number of counts per pulse \(\alpha\) Eq. (4) and therefore the number of successful measurements \([1-(1-\alpha )^{\eta \nu }]\) per frame. Consequently, for fixed image acquisition time (finite N and \(\eta\)) the imaging ability of the Lidar system is affected. Alternatively, the Fisher information can be acted upon directly by adjusting the SbNR given by SbNR \(=[E_{0}\exp (-R/C_{atm})]/[W_{bckg}\pi R^2 \tan ^2(\theta )]\). Further, from the integrand of Eq. (5): when \(P_{pp}\) = 0, the Fisher information is 0, which is consistent with the notion that depth cannot be inferred from dark counts and background photons arriving at the SPAD at random; and, the Fisher information is strongly governed by the pulse width \(\sigma '\). This strong \(\sigma '\) dependence in conjunction with the \(E_0\) dependence in the SbNR implies that for the same average power, laser illuminators with higher peak powers will have superior imaging performance. Finally, the performance of the system can be affected by changes to the laser repetition rate \(\nu\) in a manner comparable to changes in the \(\alpha\) parameter. This is consistent with the intuition that the imaging performance of the Lidar system is contingent on the average power \(\nu E_0\) of the laser illuminator.

Figure 2 conceptually illustrates the imaging performance in the 4 regimes which arise as a result of the independence between the number of samples and the SbNR. Specifically, two SbNR and two sampling cases are depicted. For low SbNR and sparse sampling the variance in the estimate of the peak position (Fig. 2 double-headed red arrows) is large and the resultant distinguishability in the depth image poor i.e., fine features cannot be seen. When the variance is reduced, either by denser sampling or higher SbNR, the quality of the depth image improves. Lastly, when the histogram position can be repeatedly accurately determined the variance is small and fine features in the depth image can be seen.

Figure 2
figure 2

A conceptual illustration of the relationship between the variance in the estimate of the peak position of the histograms and the obtained depth image. Two Signal to Background-Noise Ratio (SbNR) and two sampling cases are depicted for a total of four regimes. Within each regime the underlying probability distribution of photon arrivals is shown by the grey curve. This curve is consistently centered on the true depth value shown by the black dashed line. The grey curve is sampled to produce a series of histograms, as shown by the green bars. For each histogram, the vertical red line illustrates the estimated position of the peak i.e., the depth. The variance in the estimate of the peak position, around the true depth value, is depicted by the double-headed red arrows. When this variance is large due to low SbNR and sparse sampling, as shown in the left most regime, the distinguishability in the depth image is poor i.e., fine features cannot be seen. When the variance is reduced, either by denser sampling or, higher SbNR (middle two regimes), the quality of the depth image improves. Lastly, when the histogram position can be repeatedly accurately determined (high SbNR with dense sampling) the variance is small and fine features in the depth image can be seen.

The notion of distinguishability of the histogram peak is subtly, but importantly different from the notion of ‘resolution’ (in the common definition). This is because a ‘high resolution sensor’ with both a large number of transverse pixels and fine temporal sampling can still produce a poor quality depth image if used in a scenario where the peak of the histogram cannot be distinguished. A more apt comparison is ‘focus’. In an analogous manner to distinguishing two points at the same object plane in a traditional image (being ‘in focus’), a high quality depth image is one in which two points at different object planes (either between adjacent pixels, or, for the same pixel in consecutive images) can be reliably distinguished. For this reason we adopt the term ‘distinguishability’ (rather than resolution) to describe the depth imaging abilities of a system for the remainder of this work.

Simulating SPAD data

The first regime in which the model can be employed is one where the Cramér-Rao bound is used directly to generate realistic depth images. The minimum distinguishability \(\sigma _\mu\) derived from Eq. (6) characterises the minimum possible standard deviation associated with estimating the peak of a histogram. Consequently, a depth image can be generated by taking a ground truth image i.e., an image of true depth μ values and noising each pixel mq according to \(\mu _{m,q} \rightarrow \mu _{m,q}+\Delta _\mu\) where \(\Delta _\mu\) represents a sample drawn from a normal distribution with a mean of 0 and a standard deviation of the minimum distinguishability \(\sigma _\mu\). As no histograms are produced in this method, it is highly computationally efficient and is able to generate 100000 images (with a transverse resolution of 192 × 128) in under 20 seconds, making it well suited to the generation of training datasets for machine learning applications. However, this method is subject to the assumption that the peak in the histograms can be estimated by a procedure that is capable of saturating the Cramér-Rao bound. Specifically, the Cramér-Rao bound, and by extension the minimum distinguishability \(\sigma _\mu\), can only be reached by a minimum variance estimator and is only valid (as a lower bound) for unbiased estimators of the peak position57. Importantly, the existence of a minimum variance estimator (or even an unbiased estimator) is not guaranteed for all distributions of photon arrivals (Eq. 2). Consequently, the minimum distinguishability \(\sigma _\mu\) represents the absolute best case performance of an imaging system independent of whether such performance can be achieved. The minimum distinguishability \(\sigma _\mu\) remains beneficial as it represents the absolute best case scenario for any instance of an imaging system and acquisition time.

The second regime in which the model can be employed is one where realistic histograms are generated via a process that simulates the photon-by-photon detection of a SPAD array. This approach addresses any dependence on minimum variance estimators as depths can now be measured using any appropriate peak estimation method applied directly to synthetic data. This ‘histogram’ approach provides a highly flexible means of simulating the performance of the imaging system for situations in which there is a large difference between the distinguishability which can actually be obtained and the Cramér-Rao bound. i.e., situations for which unbiased estimators of Eq. (2) do not exist. The cost of this approach is computational complexity with a single image requiring \(\sim 1\) hour of computation. It should be noted though that this computation time is dependent upon the number of pixels, the number of frames-per-image, and the number of laser-pulses-per-frame.

Consider a single pixel of a SPAD detector having b bins of width \(\omega\) for a total TCSPC interval of \(T = b\omega\). The probability vector \(P_{i}^{'}\) for the arrival of a photon in a bin for a given laser pulse is,

$$\begin{aligned} \begin{aligned} P_{i}^{'}&= \int _{[(i-1)\omega ,i\omega )}^{}L(t|\mu +j,\sigma ')dt|_{i\in \mathbb {Z}_{[1,b]}^{+};j\sim \mathcal {N}(\mu _{j},\sigma _j)},\\&=\omega (C_{dc}+C_{bckg})+\frac{P_{pp}}{2}\Biggl \{\text {erf}\left[ \frac{\mu +j-(i-1)\omega }{\sigma '\sqrt{2}}\right] -\text {erf}\left( \frac{\mu +j-i\omega }{\sigma '\sqrt{2}}\right) \Biggr \}|_{i\in \mathbb {Z}_{[1,b]}^{+};j\sim \mathcal {N}(\mu _{j},\sigma _j)}, \end{aligned} \end{aligned}$$
(7)

i.e., the probability of a photon in the ith bin is the area under the likelyhood Eq. (2) inclusive of the bins’ lower bound (\((i-1)\omega\)) and exclusive of its upper bound (\(i\omega\)). j represents the jitter associated with a given pulse as a displacement of \(\mu\) sampled from a Normal distribution \(\mathcal {N}\) having a mean \(\mu _j > 0\) with a standard deviation \(\sigma _j\).

To simulate SPAD histograms the parameters of the optical system (bulleted points in Fig. 1) are used to define a probabilistic distribution of photon arrivals on a per-pulse basis. A data structure having a dimensionality corresponding to the number of bins b, the number of laser pulses (as determined by the exposure time \(\eta\) and the laser repetition rate \(\nu\)), and the desired number of frames N is then created. This data structure is populated by a Binomial sampling process that treats each element (i.e., each histogram bin) as an independent single-trial event with a probability determined by Eq. (7). In this way we fully model (for a single pixel) all possible photon arrivals on a per-bin per-pulse basis as depicted conceptually in Fig.  3. This process is highly parallaziable and has been optimized to run on a GPU. Once populated, the data structure is processed such that only the bin index of the first non-zero value (i.e., the first photon) on a per-frame basis is recorded precisely simulating the ‘first-photon behaviour’ of the SPAD. These indices can then be accumulated over the number of frames to construct a histogram. Further, since the Binomial distribution converges with the Poisson distribution in the many trial limit58, our approach remains consistent with prior works whilst extending the simulation to the single-bin, single-trial limit.

Figure 3
figure 3

A conceptual illustration of the SPAD histogram simulation process. The salient parameters of the SPAD imaging system (Fig. 1) are used to define a probabilistic distribution of photon arrivals. Next, a data structure with a dimensionality corresponding to the number of bins b, the number of laser pulses \(\eta \nu\), and the desired number of frames N is populated by a Binomial sampling procedure based on Eq. (7). The index of the first non-zero value (green bars) on a per-frame basis is then recorded with subsequent non-zero values (orange bars) ignored precisely simulating the ‘first-photon behaviour’ of the SPAD. The first non-zero values can then be accumulated over multiple frames to reconstruct a histogram with a distribution matching that of Eq. (7).

To extend Eq. (7) to an entire image (from a single pixel) we model \(\mathcal {M}\times \mathcal {Q}\) independent (neglecting cross talk) pixels and introduce an additional noise parameter k which captures the inter-pixel variance at an inter-frame scale. k is a sensor specific parameter which characterises the random discrepancies in the triggering of timing circuitry between pixels as a result of manufacturing variations in the propagation of the timing trigger signal across the sensor59. The intra-pixel noise modifies Eq. (7) such that the probability associated with pixel mq is

$$\begin{aligned} \begin{aligned} P_{i,m,q}^{'}&= \int _{[(i-1)\omega ,i\omega )}^{}L(t|\mu +j+k,\sigma ')dt|_{i\in \mathbb {Z}_{[1,b]}^{+};j\sim \mathcal {N}(\mu _{j},\sigma _j);k\sim \mathcal {N}(0,\sigma _q)}. \end{aligned} \end{aligned}$$
(8)

Experimental results and discussion

Resolution test target

To validate the predictions of Sec. 2 the resolution test target shown in Fig. 4 was constructed. The test target consisted of a white backboard to which equidistantly spaced white plastic cylinders ranging in both diameter and height from 90 mm to 10 mm in 20 mm increments were attached. The target was placed at a range of \(R\sim 15\) m and illuminated using a laser operating at \(\lambda = 671\) nm and 2.25 MHz with a pulse energy of \(E_0 = 1\) nJ. The use of a MHz repetition rate laser ensured a sufficient number of laser pulses per frame at high frame rates. For the full list of experimental parameters the reader is referred to supplementary materials 2. The target was imaged using a state-of-the-art SPAD with 4096 50-picosecond bins per-pixel and a one-photon-per-frame sensing limit60. The SPAD has a limited transverse resolution of 192\(\times\)128 pixels in a 2:1 aspect ratio

Figure 4
figure 4

A reference image of the resolution test target. The test target consisted of a series of cylinders ranging in both diameter and height from 90 mm to 10 mm in 20 mm increments. The ground truth depth map for the simulation of the resolution test target.

A 50 mm focal length lens was used to image the target such that the field-of-view of the sensor matched the transverse size of the test target resulting in a transverse pixel resolution at the target plane of \(4.55\times 2.22\) millimeters-per-pixel. Consequently, the narrowest posts of the resolution test target (10 mm) are represented by only \(\sim 2\times 4\) pixels, highlighting the ongoing need to develop high transverse resolution sensors. The inter-pixel noise parameter k was determined to follow a Normal distribution with \(\mu _{k} = 0\) and a standard deviation which varied linearly across the sensor such that for the first column of pixels \(k\sim \mathcal {N}(0,\sigma _{q = 0})|\sigma _{q = 0} = 41\times 10^{-12}\) whilst for the last column \(k\sim \mathcal {N}(0,\sigma _{q = \mathcal {Q}})|\sigma _{q = \mathcal {Q}} = 166\times 10^{-12}\).

Figure 5 shows the simulated and measured depth images of the resolution test target accumulated over 1000 frames using an f2 collection lens. Each frame had a 1 ms duration resulting in 2250 laser pulses-per-frame. Additionally, Fig. 5 provides histograms showing the number of pixels at a given depth. The excellent qualitative agreement between the simulated and measured depth images demonstrates the ability of the model to accurately simulate SPAD data. The agreement is emphasised by the overlap between the depth histograms which have a median per bar accuracy in excess of 80%, meaning that under idealized lab conditions the simulation is able to predict the number of pixels at a given depth over an entire image with accuracies in excess of 80%.

Figure 5
figure 5

Quantitative and qualitative comparison between the simulated and measured depth images for the resolution test target. Left column, the number of pixels at a given depth as a histogram. Right column, the depth images. The first and second rows show the simulated images calculated using the Cramér-Rao bound approach and the histogram simulation approach respectively. The bottom row shows the experimental results.

To create the images in Fig. 5 the peak of the histogram associated with each pixel was determined using match filtering (a.k.a cross correlation). Specifically, a Gaussian kernel was convolved with the histograms and the position of highest response taken as the depth. We selected match filtering as an estimator due to its prevalence within Lidar data processing. Additionally, it represents an unbiased (although possibly not minimum variance) estimator for the case of a single Gaussian peak with a uniform background. For further discussion, the reader is directed to supplementary materials 3. Whilst more advanced processing techniques exist for depth calculation61, we stress that the objective of this work was not to examine the efficacy of different image processing techniques but was rather to accurately simulate the data acquired by SPAD based imaging systems.

To further demonstrate the performance of the model we examined the distinguishability of a single pixel as a function of images-per-second, and lens f-number. Figure  6 shows the simulated minimum distinguishabilities for a single pixel and directly compares these to the measured distinguishablilities of the resolution test target. The simulation is performed for a collection lens with f number \(f_{no} = 2\) as well as \(f_{no} = 4\). 1000 frames of SPAD data are simulated with an exposure time of 1 ms (1000 fps). The number of frames N accumulated to produce a histogram is varied from 1 (1000 images-per-second) to 1000 (1 image-per-second) in 100 increments and this process repeated 100 times to create 100 independent images at each frame summation increment. The distinguishability of the simulated histograms is calculated as \(2\sqrt{2\ln {(2)}}\times\) the standard deviation in the depth estimate of the 100 images at each summation increment. This process was then matched experimentally. The Fisher information is \(F = 1.525\times 10^{19}\) s\(^{-2}\) and \(F = 1.507\times 10^{19}\) s\(^{-2}\) for the f2 and f4 collection lenses respectively.

Figure 6
figure 6

The simulated minimum distinguishability based on Eqs. (6) and (7) overlayed with the experimentally obtained average distinguishability of the resolution test target (Fig.  5 bottom row). c) the lower bounds on distinguishability as determined by Eqs. (6) and (7) shown in blue and green respectively for the case of an f2 lens. The average distinguishability based on experimental histogram variance is shown as orange dots. The upper sub-panels, a) and b), show exemplar simulated histograms (green) with experimentally obtained histograms (for a single pixel) overlayed in orange at the positions indicated by the risers. d) the lower bounds on distinguishability as determined by Eqs. (6) and (7) shown in blue and green respectively with the experimentally obtained average distinguishability overlayed as orange dots for the case of an f4 lens. At less than 50 summed frames the simulated histograms contain no counts and the minimum distinguishability becomes ill-defined.

Figure 6c,d show that as more frames are accumulated to produce the histogram, the distinguishability of the depth estimate (of both the Cramér-Rao bound and the histograms) improves at the expense of images-per-second. This improvement follows a proportionality of 1/N consistent with Eq. (6). However, the distinguishability of the histograms remains larger than that of the Cramér-Rao bound as a result of the match filtering used for peak estimation. The 1/N dependence confirms that the minimum axial distance that can be resolved, i.e., the distinguishability, of a SPAD based Lidar systems is contingent on both the imaging optics and the acquisition time. This is illustrated by comparing Fig. 6c,d which share the 1/N trend, but which have different absolute values due to the proportionally poorer light gathering capability (\([1-(1-\alpha )^{\eta \nu }]\)) of the f4 collection lens. Further, Fig. 6c,d show that the distinguishability of the measured data improves with the same 1/N trend as predicted by Eq. (6). However, it remains poorer than the simulated histogram distinguishability in both the f2 and f4 lens cases despite the simulated and measured histograms (Fig. 6a,b) close agreement. Additionally, the minimum measured distinguishability associated with the f2 lens (Fig.  6c) is 15.1 mm. This result is in keeping with the measured depth image in Fig. 5 where the right-most column of 10 mm cylindrical posts cannot be reliably distinguished from the back plane.

The difference between the simulated and measured distinguishabilities in Fig. 6 can be attributed to the k parameter and its subsequent effects on Eq. (7). Specifically, Eq. (7) (from which the single pixel simulated histograms are derived) characterises a single pixel from a perfectly uniform sensor, i.e., if Eq. (7) was extended to an \(\mathcal {M}\times \mathcal {Q}\) sensor it implicitly assumes perfect triggering between adjacent pixels. Consequently, the ability to distinguish the depths associated with two adjacent points at the target plane using a sensor described by Eq. (7) would only be constrained by the completeness of the histograms. By contrast, the inclusion of the k parameter in Eq. (8) allows for adjacent pixels to trigger at different times. Hence, even if the distinguisability of each individual pixel in a sensor described by Eq. (8) is small (i.e., the histograms are complete) the sensors ability to produce a depth image will be poor. This is because although the depth of adjacent points at the target plane could be calculated precisely, the accuracy would be poor, resulting in a distorted surface profile. This result implies that the quality of the surface profiles obtained by SPAD imaging systems is dependent not only on the temporal resolution of the individual pixels, but also on the consistency of that resolution across all pixels in the sensor.

Long range landrover

To illustrate the versatility of our approach the parameters in the model were adapted to reproduce a SPAD image of a Landrover Wolf 110 parked in front of a wall captured at a range of 1.4 km. The data was captured under daylight conditions using a “FLIMera”camera by Horiba. This camera uses the same sensor as that examined in prior sections. A Cassegrain telescope was used to achieve the required focal length. For the full list of model parameters the reader is referred to supplementary materials 4. The experimental scene was recreated virtually within the Unreal Engine using a 3D vehicle model which approximated the Landrover. The Unreal environment was used to create the reflectivity and depth maps shown in Fig.  7. The reflectivity map accounts for both the reflectivity coefficients (assuming lambertian scattering) of the surfaces, which is contingent on the physical material of the surface, as well as the surface normals i.e., orientation of each surface relative to the camera. The depth map is used as the ground truth depth μ in Eq. (8) in the simulation.

Figure 7
figure 7

The reflectivity map (left) and ground truth depth map (right) created by the Unreal Engines’ virtual environment. The reflectivity map accounts for both the reflectivity coefficient of the surface as well as its orientation relative to the camera (the surface normal). Note that the colorbar of the reflectivity map has been clamped to aid in visualization.

Figure 8 shows the simulated and measured depth images of the Landrover. The simulation was performed using the model in histogram mode so as to match the SPAD data as closely as possible. Additionally, Fig. 8 provides histograms showing the number of pixels at a given depth. The ability to produce reasonable approximations of depth images together with the ability to attribute specific areas of a depth image to physical components (as annotated in the histograms of Fig. 8) is beneficial when selecting or designing components for real world SPAD imaging systems. Notable in the experimental depth image in Fig. 8 is the region of constant depth in the lower-right corner of the image. This region can be attributed to the disproportionately high reflectivity of the numberplate (as seen in Figs.  7 and 9). This high reflectivity results in pixel cross talk as well as lens flaring effects, phenomena which are currently not accounted for by the model.

Figure 8
figure 8

Quantitative and qualitative comparison between the simulated and measured depth images for the Landrover. Top row; the depth images derived from the simulated (left) and experimental (right) histograms respectively. Bottom row; the number of pixels at a given depth as a histogram for the simulated (left) and measured (right) results. The annotations represent the physical component in the image corresponding to the peak in the histogram.

Figure 9
figure 9

Quantitative and qualitative comparison between the simulated and measured intensity images for the Landrover. Top row; the simulated (left) and measured (right) intensity images. Middle row; the simulated (left) and measured (right) histograms for the pixel marked in red in the top row. Bottom row; the number of pixels (on a log scale) at a given intensity as a histogram for the simulated (left) and measured (right) results. The insets show the distribution at the high end of the counts (intensity) axis. Further, regions of notable intensity i.e., the numberplate and headlights have been annotated.

Figure 9 shows the simulated and measured intensity images of the Landrover. The middle row of Fig.  9 compares, for a single pixel, the histogram generated by the model to the measured histogram. Additionally, Fig.  9 provides histograms showing the number of pixels at a given intensity. The qualitative and quantitative agreement between the images and histograms in Fig. 9 illustrates the models ability to produce reasonable approximations of SPAD images under real world conditions. Specifically, the ability to estimate the total photon return from specific components, in this instance the numberplate and headlights, as annotated in Fig. 9 on the scale of entire images is beneficial to the design process of imaging systems. Additionally, the ability to illustrate target or scenario specific features, such as the diagonal cross bar behind the drivers seat, and the wall visible through the rear window of the vehicle are beneficial in examining the performance of potential imaging systems. Note that the directions of the diagonal bars are reversed between the simulated and experimental image due to the drivers seat being on opposites sides between the 3D model and the measured vehicle. These differences in the 3D model, the assumption of Lambertian scattering, and the estimation of the reflectivity coefficients for the surfaces are the primary sources of the difference between the distribution of intensity histograms in Fig. 9.

Finally, we stress that the approach presented here, i.e., first determining the likelyhood of photon arrivals Eq. (2) and then using a sampling routine Eq. (8), is applicable to a multitude of scenarios. For instance, to adapt the model for underwater imaging (or any environment in which back-scatter is significant) only Eq. (2) needs to be changed. Specifically, by introducing a non-uniform back-scatter term (analogous to \(P_{pp}\)) the exponentially decaying background characteristic of scattering media can be modelled47,62. Alternatively, by introducing multiple \(P_{pp}\) terms to Eq. (2) one could model multiple scattering returns per-pixel. These multiple returns are characteristic of the histograms obtained when using low transverse resolution integrating sensors, or imaging around corners19,63.

Conclusion

We have presented a tool for the accurate simulation of the images produced by SPAD based Lidar systems. We integrate a physical model of photon arrivals with a 3D virtual environment to create a physically significant description for the performance of SPAD imaging systems in the context of Fisher information. We present a computational model capable of either; leveraging the Cramér-Rao bound to rapidly produce large numbers of physically realistic SPAD images; or, operating in a regime where it produces realistic SPAD histograms on a pixel-by-pixel basis over the entire sensor. In this later regime we implement a Binomial sampling procedure which extends the simulation to the single-photon, single-bin limit whilst remaining consistent with prior Poisson based works. We comment on the effects that specific optical components have regarding the imaging abilities of SPAD systems and demonstrate the validity of our predictions with experimental measurements of a resolution test target. Finally, we illustrate the versatility of our approach by exploiting the 3D environment to correctly account for the complex surface geometries of a real vehicle. In so doing we present a complete image generation chain capable of accurately modelling the performance of SPAD systems under real world long range conditions. We expect this work to serve as a valuable tool in the ongoing development of SPAD enabled Lidar systems.