Low-latency time-of-flight non-line-of-sight imaging at 5 frames per second

Non-Line-Of-Sight (NLOS) imaging aims to recover the 3D geometry of objects that are hidden from the direct line of sight. One major challenge of this technique is the weak available multibounce signal, which limits scene size, capture speed, and reconstruction quality. To overcome this obstacle, we introduce a multipixel time-of-flight NLOS imaging method that combines specifically designed Single Photon Avalanche Diode (SPAD) array detectors with a fast reconstruction algorithm, capturing and reconstructing live low-latency videos of non-line-of-sight scenes with natural, non-retroreflective objects. We develop a model of the signal-to-noise ratio (SNR) of NLOS imaging and use it to devise a method that reconstructs the scene such that SNR, motion blur, angular resolution, and depth resolution are all independent of scene depth, suggesting that reconstruction of very large scenes may be possible.


Supplementary Note 1: Comparison of diffuse and retroreflective objects
We want to illustrate the much brighter signal levels created in confocal NLOS imaging of retroreflective objects compared to NLOS imaging of conventional diffuse objects. In order to compare the diffuse objects used in this paper with retroreflective ones, we set up an illustrative scene in a dark room, shown in Supplementary Fig. 1: it contains the two hooded sweatshirts and the stuffed toy used in the videos, as well as two retroreflective street signs and retroreflective tape. The top image is taken with room lights on and no camera flash; the bottom image is taken with room lights off and camera flash. The latter resembles confocal NLOS acquisition, because the cellphone camera used has the flash and the camera aperture very close together, about 1.5 cm apart. As can be seen, the retroreflective objects send so much more light back to the camera (which mimics the relay wall) that the diffuse objects cannot be resolved by the camera and appear completely dark, which shows that the objects we used indeed reflect very little light compared to retroreflective objects.

Supplementary Note 2: Intensity arriving at point object
This section explains that the 1/r^4 signal characteristic is only a worst-case scenario, valid for distances r that are large with respect to the relay wall dimensions. For this reason, let us investigate the intensity arriving at a point object for different relay wall sizes. As in the main paper, the relay wall has the coordinates x_p = (x_p, y_p, 0), and the object, without loss of generality, is located perpendicular to the origin of the relay wall plane at x_v = (0, 0, z_v). For a perfectly diffuse relay wall, the light intensity reflected into a direction at an angle θ with respect to the relay wall normal is given by Lambert's cosine law^1,

    I_{sr} = I_{sr0} \cos(\theta),    (1)

where the intensities I_sr, I_sr0 are given in W/m^2/sr, i.e., I_sr0 is the intensity at the relay wall divided by 2π, so I_sr0 = I_0/2π, with I_0 the usual optical intensity (in W/m^2) at the relay wall. In NLOS imaging, the relay wall is usually scanned, which means each point is illuminated individually in temporal sequence, and therefore the intensities at the object position add linearly to yield the total illumination intensity at the object. (We remark that such a straightforward addition of intensities would be an oversimplification if all relay wall points were illuminated at the same time, because the electric field contributions from different points might cancel, leading to a lower total intensity.) The total intensity at the object is therefore calculated by summing up the contributions from all relay wall positions; assuming the scanning covers the whole wall surface without gaps, this results in the integral

    I(z_v) = \frac{I_0}{2\pi} \iint_P \cos\!\Big(\arctan\frac{\sqrt{x_p^2 + y_p^2}}{z_v}\Big) \frac{1}{x_p^2 + y_p^2 + z_v^2} \, dx_p \, dy_p = \frac{I_0}{2\pi} \iint_P \frac{z_v}{(x_p^2 + y_p^2 + z_v^2)^{3/2}} \, dx_p \, dy_p,    (2)

where the identity cos(arctan(x)) = 1/\sqrt{1 + x^2} has been used and the region P denotes the relay wall area that is illuminated.

Supplementary Figure 1: Scene containing the two hooded sweatshirts and the stuffed toy used in this paper. In addition, the scene has two retroreflective street signs and retroreflective tape. The images have been taken in a dark room. Top: image with room lights on, no camera flash; bottom: room lights off, camera flash.
• When the relay wall is infinite, −∞ < x_p, y_p < ∞, we have

    I(z_v) = \frac{I_0}{2\pi} \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \frac{z_v}{(x_p^2 + y_p^2 + z_v^2)^{3/2}} \, dx_p \, dy_p = I_0,    (3)

which has remarkable consequences: the intensity arriving at the point object is independent of z_v, the distance between relay wall and object. That means that in this case, there is no intensity loss between relay wall and object, which is physically sound because the total energy propagating into the half space away from the relay wall must not change. Because the object sends back a spherical wave to the relay wall, there is a total intensity falloff of only 1/r^2.
• For an infinitesimally small relay wall, i.e., just one point at the origin, the integrand of (2) is evaluated only at x_p = y_p = 0, so

    I(z_v) \propto \frac{I_0}{2\pi} \frac{z_v}{(z_v^2)^{3/2}} = \frac{I_0}{2\pi z_v^2},    (4)

and incorporating the additional 1/r^2 loss along the return path results in the total intensity falloff of 1/r^4.

• For the most relevant practical case of a rectangular relay wall with side lengths a and b, −a/2 ≤ x_p ≤ a/2 and −b/2 ≤ y_p ≤ b/2, the falloff from the relay wall to the object lies between the extreme cases of a constant and 1/r^2:

    I(z_v) = \frac{2 I_0}{\pi} \arctan\!\left(\frac{a b}{2 z_v \sqrt{4 z_v^2 + a^2 + b^2}}\right).    (5)

For a square relay wall, b = a, this can be rewritten as

    I(z_v) = \frac{2 I_0}{\pi} \arctan\!\left(\frac{(a/z_v)^2}{2 \sqrt{4 + 2 (a/z_v)^2}}\right).    (6)

For large z_v ≫ a, the arctan argument is very small and the Taylor series approximation arctan(x) ≈ x can be used, yielding I(z_v) ≈ I_0 a^2/(2π z_v^2), again a characteristic proportional to 1/r^2.
Supplementary Figure 2 shows the intensity arriving at the point object as a function of the object distance z_v and the relay wall side length a. From (6), we see that only the ratio between relay wall size a and object distance z_v matters, not the absolute numbers; therefore, the bottom plot of Supplementary Fig. 2 gives the fraction of the total relay wall illumination intensity that arrives at the point object.
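The regime transition described above can be checked numerically. The sketch below is our own illustration (not the paper's code): it compares a midpoint-rule evaluation of the wall integral over a square wall with an arctan closed form consistent with the ratio form of Eq. (6), and verifies the two limiting behaviors. All parameter values are arbitrary test values.

```python
import numpy as np

def intensity_closed_form(a, z, I0=1.0):
    # Arctan closed form for a square wall of side a and an on-axis point
    # object at distance z, under the convention I_sr0 = I0 / (2*pi).
    return (2.0 * I0 / np.pi) * np.arctan(
        a * a / (2.0 * z * np.sqrt(4.0 * z * z + 2.0 * a * a)))

def intensity_numeric(a, z, I0=1.0, n=800):
    # Midpoint-rule evaluation of the wall integral over the square.
    xs = (np.arange(n) + 0.5) / n * a - a / 2.0
    X, Y = np.meshgrid(xs, xs)
    integrand = z / (X**2 + Y**2 + z**2) ** 1.5
    return I0 / (2.0 * np.pi) * integrand.sum() * (a / n) ** 2

I_num = intensity_numeric(2.0, 1.0)
I_cf = intensity_closed_form(2.0, 1.0)          # analytically exactly I0/3
I_large_wall = intensity_closed_form(1e6, 1.0)  # a -> inf: no intensity loss
I_far = intensity_closed_form(1.0, 20.0)        # far field: ~ I0 a^2 / (2 pi z^2)
I_far2 = intensity_closed_form(1.0, 40.0)       # doubling z quarters the intensity
```

The last two values confirm the far-field 1/z_v^2 characteristic, while the large-wall value approaches I_0, i.e., no loss between wall and object.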

Supplementary Note 3: NLOS imaging SNR analysis
The SNR analysis of NLOS imaging is crucial in determining its capabilities and limits. Existing publications on NLOS imaging 2, 3 only consider the fall-off of the signal with increasing distance between object and relay wall, which roughly follows a 1/r^4 curve for large r, where r is approximately the perpendicular distance between the relay wall and the object. However, to adequately assess the image quality of NLOS imaging systems, it is necessary to also analyze the noise contribution; we derive the SNR of NLOS imaging below. Here, we again point out the analogy between conventional photography and seeing around corners 4: a conventional camera pixel images (i.e., it integrates light from) an area that increases with r^2, which means that it has lower resolution at larger distances; see Supplementary Figure 3 and, similarly, Figure 4e in the main paper. NLOS image reconstruction is conventionally performed on a Cartesian reconstruction grid with constant lateral resolution at all distances. Like any NLOS reconstruction method, however, the RSD approach we use has a Point Spread Function (PSF) that, similar to conventional imaging, becomes wider with increasing distance. We utilize this fact by applying depth-dependent averaging: the further away a reconstruction plane, the more corresponding planes from neighboring frames are averaged. This has a beneficial effect on both moving and stationary objects: the SNR of stationary objects improves naturally, and objects moving less than the width of the PSF would be blurred anyway because of the reduced spatial resolution, so the averaging also improves the SNR in this case. We want to emphasize again that a loss in resolution at increasing distance is intrinsic to conventional imaging and not a downside specific to NLOS imaging.

Supplementary Figure 2: Top: intensity arriving at a point object located on the optical axis at distance z_v from the relay wall, for I_0 = 1. For a large relay wall, a → ∞, there is no intensity loss; when the relay wall is merely a point, the intensity falls proportionally to 1/z_v^2. Bottom: the intensity plotted against the ratio between the relay wall side length a and the distance z_v (cf. (6)). For example, when a is ten times larger than the object distance, about 91 % of the total intensity arrives at the object.

Supplementary Figure 3: Depth-dependent area covered by the pixels of a conventional camera (left); actual RSD point spread function at various positions (right). While calculated on a rectangular grid, the uncertainty of NLOS reconstruction, analogously to conventional imaging, grows with r^2. The white arrows depict the movement of three point objects with the same speed between two video frames. At the closest distance, the object position is blurred linearly along the arrow because of the object's motion. The point object at the farthest distance cannot be resolved within a single frame because the spatial blur is larger anyway, so when incorporating the time axis, more frames can be averaged at larger distances without loss of resolution. Static objects benefit from the averaging in any case. Note that depth-dependent averaging of video frames is not possible with conventional imaging, where depth information is unavailable.
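The benefit of depth-dependent averaging can be sketched in a few lines: averaging N independent Poisson-noisy frames leaves the mean unchanged while shrinking the standard deviation by √N, so the SNR grows by √N. The photon count and frame numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 25.0        # hypothetical mean photon count per voxel per frame
n_frames = 16     # frames averaged for a distant reconstruction plane
n_trials = 200_000

single = rng.poisson(lam, n_trials)
averaged = rng.poisson(lam, (n_trials, n_frames)).mean(axis=1)

snr_single = single.mean() / single.std()   # ~ sqrt(25) = 5
snr_avg = averaged.mean() / averaged.std()  # ~ 5 * sqrt(16) = 20
gain = snr_avg / snr_single                 # ~ sqrt(n_frames)
```

Since the PSF width grows with distance, the extra blur introduced by averaging more frames at larger depths stays below the spatial resolution anyway.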
The most relevant noise sources of a SPAD-based NLOS imaging system are either natural (Poisson noise and ambient light) or arise from the hardware (dark count rate and afterpulsing). The ambient light can be greatly reduced by a suitable bandpass filter; the remaining contributions are assumed to follow a Poisson distribution, including the dark count rate, which for our system is almost negligible. Afterpulsing can be reduced with a sufficiently long dead time after each avalanche detection (200 ns in our system). The dominant source of noise is the stochastic nature of photon arrival times, i.e., Poisson-distributed noise.

NLOS reconstruction using Rayleigh Sommerfeld Diffraction (RSD)
The RSD NLOS reconstruction algorithm 5 is based on interpreting the time histograms acquired by SPAD sensors as impulse response functions. Considering that light travels much faster than scene objects can move, the scene can be treated as a linear and time-invariant system, and the response of the scene to any input signal can be calculated by convolving the input function with the measured impulse response. The chosen input function is a phasor field pulse, a sinusoidal intensity modulation (with a typical wavelength between 4 and 8 centimeters) multiplied by a Gaussian, so that only a pulse comprising a few modulation periods is propagated. This input signal is considered to emerge from the stationary SPAD observation position on the relay wall (note that because of scene reciprocity, the laser and SPAD positions can be swapped). After convolving the time responses at all scanned laser relay wall positions with this input signal, we know the response of the system to the modulated pulse. The scene reconstruction is then calculated by propagating this response wavefront back into the scene; at the correct object locations, this wavefront takes on the shape of the object.
Using conventional notation 5, this process can be described mathematically as follows. The relay wall is illuminated at a set of points x_p = (x_p, y_p, 0) (projection points), and the light returning to points x_c = (x_c, y_c, 0) (camera points) is collected. The resulting impulse response is denoted H(x_p → x_c, t). For this and all other quantities, frequency-domain representations are denoted by the same variable as the respective time-domain quantity, but with the subscript F and the angular frequency Ω as argument instead of t, so the Fourier transform of H(x_p → x_c, t) with respect to time is H_F(x_p → x_c, Ω). The frequency-domain phasor field illumination pulse is denoted P_F(x_p, Ω), and the received signal at x_c is given by

    P_F(x_c, \Omega) = \int_P H_F(x_p \to x_c, \Omega) \, P_F(x_p, \Omega) \, dx_p, \qquad P_F(x_p, \Omega) = e^{-(\Omega - \Omega_C)^2 / (2\sigma^2)},    (7)

where Ω_C is the angular frequency of the intensity-modulation sine function and σ the inverse width of the Gaussian pulse in the time domain. The frequency-domain representation P_F(x_c, Ω) of the received phasor field pulse is convenient because the Rayleigh-Sommerfeld diffraction integral only processes a single frequency component; at a reconstruction plane parallel to the relay wall, the resulting wavefront is just a 2D spatial convolution of the respective convolution kernel with the wavefront in the relay wall plane. After processing each frequency component separately, an inverse Fourier transform yields the time-dependent wavefront. The overall scene reconstruction at the voxels located on the plane at depth z_v is calculated by (see 5 for details)

    I(x_v, y_v, z_v) = \left| \int_{\Omega} \big[ P_F(\cdot, \cdot, 0, \Omega) * G(\cdot, \cdot, z_v, \Omega) \big](x_v, y_v) \, e^{i \Omega t} \, d\Omega \right|^2,    (8)

where

    G(x, y, z_v, \Omega) = \alpha(x_v, y_v, z_v) \, \frac{e^{i \Omega \sqrt{x^2 + y^2 + z_v^2}/c}}{\sqrt{x^2 + y^2 + z_v^2}}

is the 2D convolution kernel for electromagnetic wave propagation; the factor α(x_v, y_v, z_v) will be ignored during NLOS reconstruction, as it only affects the reconstruction brightness, not the location of the reconstructed objects. Furthermore, only the angular frequencies in the interval Ω ∈ [Ω_C − ∆Ω, Ω_C + ∆Ω] are considered; the very small magnitudes outside this interval are neglected.
To further reduce the computation time, the time variable t is replaced by a certain formula of spatial variables 5. We remark that, in the continuous (integral) case, the convolution integral over the finite relay wall aperture dx_c dy_c obviously should not be calculated from −∞ to ∞; this is done here only to illustrate the convolution. When numerically calculating the convolution on a computer using the 2D FFT (Fast Fourier Transform), multiplication, and inverse 2D FFT, this is fine, as the FFT implicitly assumes periodic extension.
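As a sketch of the FFT-based convolution step (our illustration, not the authors' implementation), the snippet below propagates one frequency component with a free-space kernel e^{ikr}/r and checks the FFT result against a direct circular convolution. Grid size, spacing, and wavelength are arbitrary assumptions.

```python
import numpy as np

def rsd_kernel(n, dx, z, wavelength):
    # Free-space kernel exp(i k r) / r sampled on the wall grid.
    k = 2.0 * np.pi / wavelength
    coords = (np.arange(n) - n // 2) * dx
    X, Y = np.meshgrid(coords, coords)
    r = np.sqrt(X**2 + Y**2 + z**2)
    return np.exp(1j * k * r) / r

def propagate_fft(wavefront, kernel):
    # One frequency component: 2D circular convolution via FFT.
    return np.fft.ifft2(np.fft.fft2(wavefront) *
                        np.fft.fft2(np.fft.ifftshift(kernel)))

def propagate_direct(wavefront, kernel):
    # Reference O(n^4) circular convolution for validation.
    n = wavefront.shape[0]
    ks = np.fft.ifftshift(kernel)
    out = np.zeros_like(wavefront, dtype=complex)
    for i in range(n):
        for j in range(n):
            for p in range(n):
                for q in range(n):
                    out[i, j] += wavefront[p, q] * ks[(i - p) % n, (j - q) % n]
    return out

n = 8
rng = np.random.default_rng(1)
wf = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
ker = rsd_kernel(n, dx=0.01, z=0.5, wavelength=0.06)
err = np.max(np.abs(propagate_fft(wf, ker) - propagate_direct(wf, ker)))
```

The periodic extension implicit in the FFT corresponds exactly to the circular convolution computed by the slow reference loop.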

Stochastic Preliminaries
The analysis of the SNR in NLOS imaging has to start at the beginning: the light source. The number of photons emitted by the laser in each pulse (more precisely, within all disjoint time intervals during one pulse) is a random number following a Poisson distribution. This means that everything that happens afterwards, i.e., reflection off the relay wall, the objects, and the relay wall again, also has to be treated stochastically, and all calculations have to be performed with random variables. The following considerations are based on the SNR defined by

    \mathrm{SNR} = \frac{\mu}{\sigma},    (9)

where µ denotes the mean of a random variable and σ its standard deviation. In our case, we are interested in the SNR at each reconstruction voxel, which means we have to calculate the mean and the variance of the reconstruction value (brightness) of a specific voxel. This can be performed as a function of the mean and the variance of the laser pulse. It is of course possible to describe the forward model (integration along spheroids) and the RSD reconstruction from Eq. (8) as a sequence of integrals and to calculate mean and variance at a specific reconstruction position from them. However, this is very cumbersome even for just a point object, and there might not be an analytical solution, as from one integral to the next the integrands become more complicated. Instead, we choose a discrete model based on simulated scenes: the same scene is simulated repeatedly, with a different noise realization added to the ground-truth signal in each run, and from all reconstructions sample mean and variance are calculated to determine the SNR (9). The presented model allows the use of all reconstruction methods that can be represented as a matrix multiplication (for example RSD, backprojection, and others).
Let us briefly introduce some tools for the stochastic description of NLOS imaging. Let X ∈ R^K be a real random vector that is multiplied by a constant complex matrix A ∈ C^{M×K}, Y = AX. The resulting mean vector is calculated from

    E\{Y\} = A \, E\{X\}.    (10)

The covariance matrix of X in general is calculated by

    K_{XX} = E\{(X - \bar{X})(X - \bar{X})^H\},    (11)

where (·)^H denotes the conjugate transpose and we introduce X̄ = E{X}. The covariance matrix K_YY of Y can then be expressed by

    K_{YY} = A \, K_{XX} \, A^H.    (12)

For calculating higher-order expected values, moment-generating functions are a convenient tool. The moment-generating function of a random variable X is defined by

    M_X(s) = E\{e^{sX}\},    (13)

and for the linear combination S_K = \sum_{k=1}^{K} a_k X_k of independent random variables X_k, the moment-generating function is the product of the moment-generating functions with scaled arguments:

    M_{S_K}(s) = \prod_{k=1}^{K} M_{X_k}(a_k s).    (14)

The expected values are then calculated by

    E\{X^n\} = \left. \frac{d^n M_X(s)}{ds^n} \right|_{s=0}.    (15)
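The covariance identity K_YY = A K_XX A^H also holds exactly for sample covariances, since Y = AX is applied to every realization. A small numerical check (purely illustrative, with arbitrary dimensions and Poisson inputs):

```python
import numpy as np

rng = np.random.default_rng(2)
K, M_dim, N = 5, 3, 10_000

lam = rng.uniform(1.0, 10.0, K)              # per-component Poisson means
X = rng.poisson(lam, (N, K)).astype(float)   # rows: realizations of X
A = rng.standard_normal((M_dim, K)) + 1j * rng.standard_normal((M_dim, K))
Y = X @ A.T                                  # row-wise Y = A X

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
K_XX = Xc.T @ Xc / (N - 1)                   # sample covariance of X
K_YY = Yc.T @ Yc.conj() / (N - 1)            # sample covariance of Y
err = np.max(np.abs(K_YY - A @ K_XX @ A.conj().T))
```

The residual `err` is at the level of floating-point round-off, because the identity is algebraic rather than asymptotic.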

Physical Preliminaries
The light emitted by a laser follows a Poisson distribution. To avoid further complication, we assume that the laser pulse is shorter than a SPAD time-bin width. If that is not the case, the photon counts can be described by an inhomogeneous Poisson point process, where the photon rate in general varies between disjoint time intervals, but the counts still follow a Poisson distribution. In the considered scenario, we can therefore assume that the random variable Ph_laser describing the number of photons emitted by the laser in a time period ∆t (the SPAD bin width) during the pulse follows the Poisson probability mass function

    P(\mathrm{Ph}_{\mathrm{laser}} = n) = \frac{\Lambda^n e^{-\Lambda}}{n!},    (16)

where Λ = λ∆t (the expected number of photons during the time interval ∆t) and λ is the expected rate, i.e., the number of photons per unit time.
Let us analyze in more detail how this distribution arises, because it is very important for all subsequent derivations. The Poisson distribution is actually a limiting case of the binomial distribution, so let us explain the latter first. The binomial distribution describes a sequence of K independent experiments with a binary outcome. In our case, the outcome is either that a photon arrives in a very short time interval δt or that no photon arrives in this interval; δt is chosen so small that no more than one photon can arrive. We can divide the time interval ∆t into a very large number K of time intervals of length δt, which means we will on average detect Λ/K photons during an interval of length δt. The binomial distribution is then the discrete probability distribution of registering a certain number of photons during the full interval ∆t. Taking the limit as K goes to infinity yields the Poisson distribution.
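This limit can be illustrated numerically: for a fixed expected count Λ, the binomial pmf with K trials and success probability Λ/K approaches the Poisson pmf as K grows. A small sketch with an assumed Λ = 3:

```python
import math

def binom_pmf(n, K, p):
    # Probability of n successes in K independent trials.
    return math.comb(K, n) * p**n * (1.0 - p) ** (K - n)

def poisson_pmf(n, lam):
    return lam**n * math.exp(-lam) / math.factorial(n)

lam = 3.0  # assumed expected photon count in the full interval Delta t
errs = []
for K in (10, 100, 100_000):  # ever finer subdivision into K slots
    errs.append(max(abs(binom_pmf(n, K, lam / K) - poisson_pmf(n, lam))
                    for n in range(15)))
```

The maximum pointwise deviation shrinks monotonically as the subdivision is refined.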
It is very important to stress that these single-photon events are independent of each other, i.e., they have no influence on each other. What happens if the laser beam hits a surface and we are interested in the number of photons scattering into a certain direction? On average, we know that a fraction α of the incident photons is reflected into this direction. Going back to our consideration of small time intervals δt, it is now less likely that we will receive a photon during one interval. We do not know which photons exactly are eliminated from the photon stream, because they are either absorbed or scattered into another direction, but we know that the average number of photons during ∆t is reduced by the factor α, so Λ reduces to αΛ. Because of the independence of the photon events, the remaining photons are unaffected by whether a photon is removed from the stream or not. (There might be other effects during scattering, which we neglect; a large body of publications studies scattering in detail for specific cases, whereas we want a generic high-level description suitable for this application.) As another consequence, the measurements in each time bin are independent of all other bins. The only constraint to consider is that the total number of photons emitted by the laser has to equal the total number of photons reflected in all directions plus the number of absorbed photons. Because most photons are not absorbed, the latter number can be set to zero.
It is important to note that the parameter Λ of the Poisson distribution (16) is at the same time its mean value and its variance, so µ_Poisson = σ²_Poisson = Λ. According to Eq. (9), the SNR of a Poisson-distributed random variable is given by

    \mathrm{SNR}_{\mathrm{Poisson}} = \frac{\Lambda}{\sqrt{\Lambda}} = \sqrt{\Lambda},    (17)

so it increases with the square root of the expected number of photons.
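A quick Monte Carlo check of SNR = √Λ (sample sizes and rates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
lams = (1.0, 10.0, 100.0)
snrs = [0.0] * len(lams)
for i, lam in enumerate(lams):
    x = rng.poisson(lam, 500_000)
    snrs[i] = x.mean() / x.std()            # expected: sqrt(lam)
rel_err = max(abs(s / np.sqrt(l) - 1.0) for s, l in zip(snrs, lams))
```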
Let us now introduce the rest of the notation. As before, the laser illumination position on the relay wall ("projector") is denoted by the vector x_p with elements x_p = (x_p, y_p, 0), and the SPAD ("camera") observation point is given by x_c = (x_c, y_c, 0). The reconstruction voxels ("volume") have the coordinates x_v = (x_v, y_v, z_v). The distance d_1 is the distance from the physical laser to the laser position on the relay wall, d_2 the distance from the laser position on the relay wall to the point object, d_3 the distance from the object to the SPAD observation point on the relay wall, and d_4 the distance from the relay wall SPAD detection position to the position of the physical SPAD detector.
For the following, we assume that the relay wall reflects light with an attenuation factor α into all directions, and the target reflects light with an attenuation factor β, also isotropically. These values could be made spatially dependent and directional, but that does not change the principle described in the following. In our case, the laser emits multiple pulses and the SPAD counts at most one photon per pulse, but let us assume all those processes happen during one pulse (the sum of Poisson-distributed random variables also follows a Poisson distribution, with the total parameter Λ_total equaling the sum of the parameters of the summands). The pulse hits the relay wall, where only a fraction α of the photons is reflected and scatters in all directions. At the object, only the fraction α/(2π d_2^2) of the original number of photons arrives on average (we remark that the size of the reflective object surface was omitted; 1/(2π d_2^2) only provides the intensity per area, not the intensity itself, because that would require multiplication by the object area). Note that the SNR decreases, as the SNR of the photon stream arriving at the object before reflection calculates to

    \mathrm{SNR} = \sqrt{\frac{\alpha}{2\pi d_2^2} \, \Lambda_{\mathrm{laser}}}.    (18)

Repeating this process for the reflection off the object and for the second reflection off the relay wall yields the following parameter for the Poisson-distributed number of photons arriving at the detector within the respective time interval:

    \Lambda = \frac{\alpha}{2\pi d_2^2} \, \frac{\beta}{2\pi d_3^2} \, \frac{\alpha}{2\pi d_4^2} \, \Lambda_{\mathrm{laser}}.    (19)

Let us define the attenuation factor γ(x_p, x_c, x_v),

    \gamma(x_p, x_c, x_v) = \begin{cases} \dfrac{\alpha^2 \beta}{(2\pi)^3 \, d_2^2 \, d_3^2 \, d_4^2}, & \text{when there is an object at voxel } x_v, \\ 0, & \text{otherwise.} \end{cases}    (20)

Note that d_4 is constant and therefore not an argument of γ(x_p, x_c, x_v). Strictly speaking, the Poisson probability mass function is not defined for Λ = 0, but in the following we will set the value of a random variable with such a parameter to zero.
The SPAD detects the travel time of each photon and builds a histogram. The arriving photons are placed in the histogram bin corresponding to the time delay given by the total distance the photons traveled (i.e., from the laser to the relay wall, from the relay wall to the object and back to the relay wall, and finally to the detector). Considering a general scene, not just a point object, the histograms are given by H(x_p → x_c, t), where x_p denotes the laser projection position and x_c the SPAD detection position on the relay wall. Each histogram bin at time index t contains the sum of random variables

    H(x_p \to x_c, t) = \sum_{x_v \in W(t)} \mathrm{Ph}_{\mathrm{SPAD}}(x_p, x_c, x_v),    (21)

where W(t) is the set of all points x_v that lie on a spheroid with foci x_p and x_c such that the travel time from x_p to x_v to x_c equals t. The random variables Ph_SPAD(x_p, x_c, x_v) stand for the photons returning from the locations x_v in the 3D reconstruction space and follow a Poisson distribution with parameter Λ(x_p, x_c, x_v) = γ(x_p, x_c, x_v) Λ_laser. This means the histogram entry H(x_p → x_c, t) at time t is a Poisson-distributed random variable with parameter

    \Lambda(x_p \to x_c, t) = \sum_{x_v \in W(t)} \Lambda(x_p, x_c, x_v).    (22)
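The forward model for a single point object can be sketched as an expected-count histogram: the photon-count parameter follows an Eq.-(20)-style attenuation, and the counts land in the bin given by the total wall-object-wall path length. All numeric constants below (bin width, attenuation factors, laser photon budget, geometry) are hypothetical stand-ins, not the paper's system parameters.

```python
import numpy as np

C = 3e8            # speed of light (m/s)
BIN = 32e-12       # histogram bin width (s) -- hypothetical
ALPHA, BETA = 0.7, 0.5   # hypothetical wall / object attenuation factors
LAM_LASER = 1e12   # hypothetical expected photon number per exposure
D4 = 1.0           # fixed wall-to-physical-SPAD distance (m)

def expected_histogram(x_p, x_c, x_v, n_bins=512):
    # Expected photon counts for a single point object at x_v:
    # gamma-style attenuation; bin index from the wall-to-object round trip.
    d2 = np.linalg.norm(x_v - x_p)
    d3 = np.linalg.norm(x_v - x_c)
    gamma = ALPHA**2 * BETA / ((2.0 * np.pi) ** 3 * d2**2 * d3**2 * D4**2)
    h = np.zeros(n_bins)
    h[int((d2 + d3) / C / BIN)] = gamma * LAM_LASER
    return h

x_p = np.array([0.2, 0.0, 0.0])
x_c = np.array([0.0, 0.0, 0.0])
h_near = expected_histogram(x_p, x_c, np.array([0.0, 0.0, 1.0]))
h_far = expected_histogram(x_p, x_c, np.array([0.0, 0.0, 2.0]))
peak_bin = int(np.argmax(h_near))
falloff = h_near.max() / h_far.max()   # ~ (d2' d3')^2 / (d2 d3)^2, near 16x
```

Doubling the object distance roughly quadruples both d_2^2 and d_3^2, so the expected count drops by about a factor of 16, in line with the 1/r^4 characteristic.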

RSD SNR analysis
We continue with common notation 4, but now let the arguments be discrete instead of continuous. This means that the time responses are indexed by t, the total number of considered time bins is T, and Ω is the frequency index.

The phasor field of frequency index Ω at position (x_c, y_c, 0) that needs to be backpropagated is given by

    P_F(x_c, y_c, 0, \Omega) = \sum_{x_p, y_p} H_F(x_p \to x_c, \Omega) \, P_F(x_p, y_p, 0, \Omega),    (23)

where P_F(x_p, y_p, 0, Ω) is the illuminating virtual phasor field pulse. Subsequently, the phasor field of frequency index Ω in the reconstruction plane at distance z_v is calculated by the spatial 2D convolution

    P_F(x_v, y_v, z_v, \Omega) = P_F(\cdot, \cdot, 0, \Omega) * G(\cdot, \cdot, z_v, \Omega).    (24)

The reconstruction at the plane with depth z_v at the right point in time follows from

    I(x_v, y_v, z_v) = \left| \sum_{\Omega} P_F(x_v, y_v, z_v, \Omega) \, e^{i \Omega t} \right|^2,    (25)

where t denotes the time index of interest, i.e., the moment when the illumination phasor field pulse passes through the considered reconstruction voxel; its calculation is described in 5. Inserting all this into one equation yields

    I(x_v, y_v, z_v) = \left| \sum_{\Omega} \Big[ \Big( \sum_{x_p, y_p} H_F(x_p \to \cdot, \Omega) \, P_F(x_p, y_p, 0, \Omega) \Big) * G(\cdot, \cdot, z_v, \Omega) \Big](x_v, y_v) \, e^{i \Omega t} \right|^2.    (26)

We can express this as a matrix multiplication. Let the vector X be the vectorized version of all time responses: one time response is arranged as a column vector, and all of them are stacked on top of each other. The DCT can be expressed as a matrix multiplication, so the DCT with respect to time of this long vector is calculated by multiplying it from the left with the block-diagonal matrix D whose main-diagonal blocks are the square DCT matrices DCT_T of size T × T:

    D = \begin{pmatrix} \mathrm{DCT}_T & & \\ & \ddots & \\ & & \mathrm{DCT}_T \end{pmatrix}.    (27)

The diagonal matrix C contains the (repeated) values of the illumination phasor field pulse P_F(x_p, y_p, 0, Ω) and multiplies the result from the left. Convolutions can be expressed as matrix multiplications, so let B be the matrix that performs the 2D convolution of the values corresponding to frequency index Ω. Last but not least, the matrix A takes care of the inverse Fourier transform with respect to time at the corresponding time index. This results in the vector Y of random variables representing the brightness (before taking the absolute value squared) of the reconstruction voxels at depth z_v being calculated by

    Y = A \, B \, C \, D \, X =: M X.    (28)

Elementwise calculation of the mean therefore yields

    E\{Y\} = M \, E\{X\},    (29)

and the covariance matrix K_YY is calculated using (12).
Note that K_XX is a diagonal matrix, because the random variables in the time bins are considered independent and their pairwise covariances therefore equal zero. K_XX has the original variances Λ(x_p → x_c, t) on its diagonal (see (22)), and subsequently,

    K_{YY} = M \, \mathrm{diag}\big(\Lambda(x_p \to x_c, t)\big) \, M^H.    (30)

The variances of the resulting vector Y are the diagonal elements of K_YY. The final question is how to calculate the mean and the variance of the absolute values squared of the elements of Y.
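Because K_XX is diagonal with the Poisson parameters on its diagonal, the voxel variances reduce to Var{Y_i} = Σ_k |M_ik|² Λ_k. A Monte Carlo sketch with a random complex M (purely illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)
K, n_vox, N = 6, 2, 400_000

lam = rng.uniform(2.0, 20.0, K)              # per-bin Poisson means (= variances)
M = rng.standard_normal((n_vox, K)) + 1j * rng.standard_normal((n_vox, K))

X = rng.poisson(lam, (N, K)).astype(float)   # independent time-bin counts
Y = X @ M.T                                  # complex voxel values, row-wise

pred_var = (np.abs(M) ** 2) @ lam            # diagonal of M diag(lam) M^H
emp_var = Y.var(axis=0)                      # numpy: mean |y - mean|^2 for complex
rel_err = np.max(np.abs(emp_var - pred_var) / pred_var)
```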
Note that each of these elements is a complex number, so we split M into its real and imaginary parts, R = ℜ(M) and S = ℑ(M), respectively.
We consider the i-th element of Y, Y_i, corresponding to some reconstruction voxel of interest. Its absolute value squared is calculated by

    |Y_i|^2 = (R_i X)^2 + (S_i X)^2,    (31)

where R_i is the i-th row of R (analogously for S). In the following, the SNR will be calculated.
Let us calculate the mean first. We are interested in

    E\{|Y_i|^2\} = E\{(R_i X)^2\} + E\{(S_i X)^2\}.    (32)

From the definition of the variance, Var{X} = E{X^2} − E{X}^2, we see that

    E\{(R_i X)^2\} = \mathrm{Var}\{R_i X\} + \big(E\{R_i X\}\big)^2,    (33)

and we can reformulate (32) as

    E\{|Y_i|^2\} = (R_i)^{.2} \, \mathrm{Var}\{X\} + (R_i \bar{X})^2 + (S_i)^{.2} \, \mathrm{Var}\{X\} + (S_i \bar{X})^2,    (34)

where it is important to emphasize that the mean value of the squared variable depends not only on the squared mean of the original variable, but also on its variance. For the simplification of the variance we have used Var{aX + bY} = a^2 Var{X} + b^2 Var{Y} + 2ab Cov{X, Y}; note that the random variables in X are independent, so the covariances are zero. (·)^{.2} denotes elementwise squaring of a matrix or, as in this case, a vector. X̄ and Var{X} in (34) are vectors (mean and variance are taken elementwise) and can be replaced by the corresponding Λ(x_p → x_c, t) from (22) (recall that for Poisson-distributed random variables, the mean equals the variance). After the same vectorization process as for the time responses, we relabel the Λ(x_p → x_c, t) as Λ_k.
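The mean formula can be verified by simulation: for one row m = R_i + iS_i of M and independent Poisson bins, E{|Y_i|²} should match the Eq.-(34)-style prediction (R_i)^{.2}Λ + (S_i)^{.2}Λ + (R_iΛ)² + (S_iΛ)². A sketch with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(5)
K, N = 6, 400_000
lam = rng.uniform(1.0, 10.0, K)          # Poisson mean = variance per time bin
m_row = rng.standard_normal(K) + 1j * rng.standard_normal(K)  # one row of M
R, S = m_row.real, m_row.imag

X = rng.poisson(lam, (N, K)).astype(float)
Y = X @ m_row                             # complex value of one voxel, per trial

# prediction: variance term plus squared mean of real and imaginary parts
pred = (R**2 + S**2) @ lam + (R @ lam) ** 2 + (S @ lam) ** 2
emp = np.mean(np.abs(Y) ** 2)
rel_err = abs(emp - pred) / pred
```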
For the variance, we can write, using the standard formula Var{X} = E{X^2} − E{X}^2,

    \mathrm{Var}\{|Y_i|^2\} = E\{|Y_i|^4\} - \big(E\{|Y_i|^2\}\big)^2.    (35)

To evaluate E{|Y_i|^4}, we use the formula (a^2 + b^2)^2 = a^4 + 2 a^2 b^2 + b^4 and obtain

    E\{|Y_i|^4\} = E\{(R_i X)^4\} + 2 \, E\{(R_i X)^2 (S_i X)^2\} + E\{(S_i X)^4\}.    (36)

The terms E{(R_i X)^4} and E{(S_i X)^4} can be calculated easily using moment-generating functions, see below. For 2 E{(R_i X)^2 (S_i X)^2}, one has to keep in mind that the real and the imaginary part are correlated; we can calculate their covariance matrix as in (30), but this time with a reduced matrix M_red consisting of just the two rows representing the real and the imaginary part of M belonging to one particular reconstruction voxel. The resulting 2 × 2 covariance matrix in general is not diagonal, so the real and imaginary parts are correlated and thus E{(R_i X)^2 (S_i X)^2} ≠ E{(R_i X)^2} E{(S_i X)^2}. Let us now consider the individual terms of (36). The moment-generating function of a Poisson-distributed variable with parameter Λ is given by

    M_X(s) = e^{\Lambda (e^s - 1)},

and for a linear combination of independent Poisson-distributed random variables we get, using (14),

    M_X(s) = e^{\sum_k \Lambda_k (e^{a_k s} - 1)}.
Equation (36) has to be evaluated carefully. We have to calculate expected values of products of four different random variables, e.g., E{X_1 X_2 X_3 X_4}, the fourth moment of a single random variable, E{X_1^4}, and all combinations in between. A fortunate factor is that these random variables are independent (they are the original Poisson-distributed ones), so we can write E{X_1 X_2 X_3 X_4} = E{X_1} E{X_2} E{X_3} E{X_4}; but we have to be careful with terms like E{X_1 X_1 X_1 X_1}, which has to be evaluated as E{X_1^4} and not as E{X_1} E{X_1} E{X_1} E{X_1} (such terms appear, for example, in E{(R_i X)^4}). For our Poisson-distributed random variables we have

    E\{X_k\} = \Lambda_k, \quad E\{X_k^2\} = \Lambda_k + \Lambda_k^2, \quad E\{X_k^3\} = \Lambda_k + 3\Lambda_k^2 + \Lambda_k^3, \quad E\{X_k^4\} = \Lambda_k + 7\Lambda_k^2 + 6\Lambda_k^3 + \Lambda_k^4.    (43)

Applying Mathematica to (43) for calculating the term E{(ℜY_i)^4} = E{(R_i X)^4} in (36) (note that unlike a single X_k, this is a linear combination of all of these Poisson-distributed random variables) yields

    E\{(R_i X)^4\} = \Big(\sum_k R_{ik} \Lambda_k\Big)^4 + 6 \Big(\sum_k R_{ik}^2 \Lambda_k\Big)\Big(\sum_k R_{ik} \Lambda_k\Big)^2 + 3 \Big(\sum_k R_{ik}^2 \Lambda_k\Big)^2 + 4 \Big(\sum_k R_{ik}^3 \Lambda_k\Big)\Big(\sum_k R_{ik} \Lambda_k\Big) + \sum_k R_{ik}^4 \Lambda_k,    (44)

and analogously for E{(S_i X)^4} with S in place of R (45). After having calculated the general SNR according to Eq. (9) for an arbitrary reconstruction voxel, it is hard to tell what exactly the SNR curve looks like as a function of the distance z_v between the reconstruction voxel and the relay wall. Even one row vector of the matrix M has hundreds of thousands of elements, and it is hard to evaluate their effect analytically. However, what we can do is consider the derived SNR in the case of a large distance z_v, first in the squared case. The parameters Λ_k denote the means and the variances of the original Poisson variables in each time bin; for large distances they are approximated by 1/z_v^4. The mean value from Eq. (34) has terms of the form a/z_v^4 + b/z_v^8 (the variance of the original random variables goes with 1/z_v^4, plus their mean squared, which follows 1/z_v^8), where we can neglect the higher-order exponent. For the variance, we have terms of order 1/z_v^4 (the last terms of Eq. (44) and Eq. (45)), in which the Poisson means Λ_k, and therefore the attenuation factors γ(x_p, x_c, x_v) from Eq. (20), appear linearly.
The higher exponents of the attenuation factor can be ignored, because 1/z_v^8, 1/z_v^12, and 1/z_v^16 fall off much faster than 1/z_v^4. For these reasons, the SNR can be approximated by

    \mathrm{SNR} \approx \frac{c_0 / z_v^4}{\sqrt{c_0' / z_v^4}} \propto \frac{1}{z_v^2},    (46)

with constants c_0, c_0'. This is quite an interesting result: while all NLOS imaging publications correctly state that the signal roughly falls off with 1/z_v^4 at large distances, the standard deviation falls off with 1/z_v^2, which cancels part of this effect! In total, this means that the SNR only falls off with 1/z_v^2.

Incorporating background noise
When background noise (ambient light + detector noise) is present, we add a Poisson distribution with constant mean to all histogram time bins, which means that the Poisson variables in each time bin (see Eq. (22) with the attenuation factors Eq. (20)) now consist of a component scaling with 1/z_v^4 and a constant component. For calculating the SNR as the mean over the standard deviation, we have to be careful, because the mean value of the noise should not be considered. For this reason, we take the mean value of the noise-free model and calculate the variance as previously, according to Eq. (35). However, when noise is present, the variance again has terms that fall with 1/z_v^4, 1/z_v^8, 1/z_v^12, and 1/z_v^16, but now also a distance-independent constant. This means that for large distances z_v, the standard deviation becomes a constant. In contrast to Eq. (46), the standard deviation then no longer balances part of the 1/z_v^4 fall-off of the mean, so for a constant c_1 and large distances,

    \mathrm{SNR} \approx \frac{c_1}{z_v^4}.    (47)

This means that, unlike in the noise-free case, the SNR will eventually not follow a 1/z_v^2 fall-off, because the standard deviation levels off at some point. This point depends on the noise mean: the larger it is, the earlier the standard deviation levels off.
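The transition from a 1/z_v^2 to a 1/z_v^4 SNR fall-off can be illustrated with a toy mean/variance model: a mean proportional to 1/z^4 and a Poisson variance consisting of the same signal term plus a constant background. All scales below are hypothetical.

```python
import numpy as np

s, b = 1.0, 1e-4      # hypothetical signal scale and constant background level
z = np.logspace(0.0, 2.0, 200)   # distances from 1 to 100 (arbitrary units)

mean = s / z**4                   # signal mean falls with 1/z^4
std = np.sqrt(s / z**4 + b)       # Poisson: variance = signal + background
snr = mean / std

# log-log slopes at the near and far ends of the curve
near_slope = np.log(snr[1] / snr[0]) / np.log(z[1] / z[0])
far_slope = np.log(snr[-1] / snr[-2]) / np.log(z[-1] / z[-2])
```

Near the wall the background is negligible and the slope is −2; once the signal drops below the background, the standard deviation is constant and the slope approaches −4. The crossover distance moves closer as the background level b grows.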

SNR of arbitrary linear NLOS reconstruction
Let us also briefly consider what the SNR looks like for a reconstruction method (for example simple backprojection) that does not require the calculation of the absolute value squared, but only the matrix multiplication from Eq. (28). In this case, the Poisson means/variances $\Lambda_k$ also appear linearly in the reconstruction voxel formulas for both mean and variance, and we get the same SNR as in Eq. (46) for large distances when detector noise and ambient light are neglected, and the same SNR as in Eq. (47) when these degrading effects are present.

Figure 4: Mean, standard deviation and SNR of voxels reconstructed at different depths. All curves are normalized to 1 at the closest distance. Top row: without background noise (ambient light, sensor dark count). Bottom row: with background noise. When background noise is considered, the trend derived from the theory is apparent: the standard deviation levels off at a certain distance, meaning that it no longer cancels part of the decreasing mean, and the SNR falls faster after this point. Also, as expected, the standard deviation is smaller when averaging is applied, leading to a higher SNR. For each depth, the considered lateral coordinates $x_v, y_v$ are the same. The four dashed lines are reference curves proportional to $1$, $1/r$, $1/r^2$ and $1/r^4$.
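The claim that mean and variance are both linear in the $\Lambda_k$ for a purely linear reconstruction follows from standard properties of sums of independent Poisson variables: for $v = \sum_k m_k X_k$, we have $E\{v\} = \sum_k m_k \Lambda_k$ and $\mathrm{Var}\{v\} = \sum_k m_k^2 \Lambda_k$. A quick Monte Carlo check (with made-up weights $m$ and means $\Lambda$, chosen only for illustration):

```python
import numpy as np

# For a linear reconstruction v = sum_k m_k X_k of independent Poisson bins
# (the backprojection-style case), mean and variance are linear in the
# bin means:  E{v} = sum m_k L_k,   Var{v} = sum m_k^2 L_k.
rng = np.random.default_rng(1)
m = np.array([0.5, -1.0, 2.0, 0.25])   # assumed reconstruction weights
L = np.array([3.0, 1.0, 4.0, 2.0])     # assumed Poisson bin means

X = rng.poisson(L, size=(1_000_000, 4))
v = X @ m

print(v.mean(), m @ L)         # sample mean     vs  sum m_k L_k   (= 9.0)
print(v.var(), (m**2) @ L)     # sample variance vs  sum m_k^2 L_k (= 17.875)
```

Since both statistics scale linearly with the $\Lambda_k \propto 1/z_v^4$, the mean falls with $1/z_v^4$ and the standard deviation with $1/z_v^2$, reproducing the Eq. (46) behavior.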

SNR Plots
As predicted by the calculation in the previous section, the SNR curve exhibits a $1/r^2$ decrease when no background noise (ambient light or sensor noise) is present. When there is background noise (bottom plot of Fig. 4 in the main paper), the standard deviation levels out at a specific distance, and the SNR then falls with $1/r^4$. The background noise of our experimental setup is so low that this turning point is further away than the maximum physical distance we could realize. All shown mean and standard deviation curves are obtained by simulating the ground truth scene with the same acquisition parameters (laser power, SPAD sensing area, target size and average background noise level) as in the real experiments. The relay wall and target are assumed to exhibit perfectly diffuse (Lambertian) reflectance. Reconstructions from 20 000 different noise realizations were calculated, and the sample mean and variance were computed from these reconstructions. While Figure 4 in the main paper only shows the resulting SNR, Supplementary Fig. 4 also shows the mean and standard deviation for the sake of completeness. We see that at large distances, in agreement with the derived theory, the RSD mean falls parallel to $1/r^4$ and its standard deviation to $1/r^2$, meaning that the SNR also exhibits a $1/r^2$ characteristic. Once background noise is included in the simulations, the standard deviation levels off at a certain distance, and the SNR falls more steeply, in a $1/r^4$ manner.
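The simulation procedure described above can be sketched in a few lines. This is a heavily simplified stand-in (a tiny made-up histogram and weight vector instead of the full forward model), showing only the statistical machinery: draw many Poisson noise realizations, apply a squared linear reconstruction to each, and estimate the SNR from the sample mean and standard deviation:

```python
import numpy as np

# Minimal sketch of the sample-statistics procedure (illustrative L and m):
rng = np.random.default_rng(2)
n_real = 20_000                        # same realization count as in the text
L = np.array([5.0, 2.0, 8.0, 1.0])     # assumed noise-free bin means
m = np.array([1.0, -0.5, 0.3, 2.0])    # assumed reconstruction weights

X = rng.poisson(L, size=(n_real, L.size))   # one row per noise realization
recon = (X @ m) ** 2                        # |linear reconstruction|^2 per realization

snr = recon.mean() / recon.std()            # sample-based SNR estimate
print(snr)
```

In the actual simulations, each realization is a full transient measurement and the reconstruction is RSD or BP, but the SNR estimator is the same.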
In addition, we want to provide a detailed comparison of the RSD and BP SNR when different powers of the results are taken. According to (26) and (32), the RSD at the very end requires the calculation of the absolute value squared, while BP is just the linear matrix multiplication of the form (28). In the main paper, the SNR curves according to the respective power are shown. For fairness, we also show the SNR curves when the square root of the RSD result is taken and when the BP output is squared in Supplementary Fig. 5. As can be seen, the SNR curves look very similar for both RSD and BP, regardless of which power is eventually used. For BP, the mean falls roughly with $1/r^3$ and the standard deviation with $1/r^{1.5}$, meaning that the SNR also follows a $1/r^{1.5}$ curve. Squaring the pixelwise BP results leads to a mean that first falls more steeply than $1/r^4$, but then changes to a fall-off with $1/r^2$. This can be explained with Eq. (33): overall, the result can approximately be written as a sum of two power-law terms with constants $A \gg B$, where we assume that the constants only weakly depend on $r$ and neglect this dependence for simplicity. For small $r$, the first term is dominant because of the squared constant $A^2$; with increasing $r$, at a certain value, the second term becomes dominant. A similar argument can be made for the variance, and both arguments together explain the curve shapes of the mean and standard deviation and experimentally confirm the theory derived above, albeit the turning points lie at much larger distances than those at which experimental results have been demonstrated in the NLOS literature so far. We also show the effect of coherent and incoherent summation of multiple reconstructions; this example shows how the reconstruction changes when we use multiple pixels.
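The generic crossover argument can be visualized numerically. The exponents and constants below are our own illustrative choices, not the terms of Eq. (33): a steep power-law term with a large squared prefactor dominates at small $r$, while a shallow $1/r^2$ term takes over beyond a turning point, so the local log-log slope transitions between the two exponents:

```python
import numpy as np

# Illustrative crossover between two power-law terms (assumed exponents and
# constants, chosen only to demonstrate the dominance argument):
A, B = 1e3, 1.0
r = np.logspace(0, 6, 7)              # 1, 10, ..., 1e6 (arbitrary units)

curve = A**2 / r**4 + B / r**2        # steep term with squared constant + shallow term

# Local log-log slope between consecutive samples:
slope = np.diff(np.log10(curve)) / np.diff(np.log10(r))
print(slope)   # starts near -4, ends near -2
```

The larger the squared prefactor $A^2$ relative to $B$, the further out the turning point moves, mirroring the observation that the measured turning points lie beyond demonstrated NLOS distances.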
Lastly, we show the effect of accumulated exposure time on stationary scenes. 100 frames were collected with an exposure time of 0.2 s per frame. Supplementary Figure 8 shows reconstructions of 3 different stationary scenes with exposure times ranging from 0.2 seconds up to 20 seconds. One can notice that the reconstructions do not change significantly after 1.4-2 seconds of exposure time. We want to emphasize that the noise can be greatly reduced in the future by using SPAD arrays with more pixels, which allow for the acquisition of more photons in the same amount of time.
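The benefit of accumulation follows directly from Poisson statistics. In this toy model (the per-frame photon count $L$ is an assumption, not a measured value), summing $N$ frames gives mean $NL$ and standard deviation $\sqrt{NL}$, so the relative noise falls as $1/\sqrt{N}$, whether the extra counts come from longer exposure or from more SPAD pixels:

```python
import numpy as np

# Relative shot noise of accumulated Poisson counts (illustrative L):
L = 50.0                               # assumed photons per 0.2 s frame
N = np.array([1, 7, 10, 100])          # frames accumulated (0.2 s ... 20 s)

rel_noise = np.sqrt(N * L) / (N * L)   # = 1 / sqrt(N * L)
print(rel_noise)
```

Going from 1 frame to 100 frames reduces the relative noise tenfold, while the gain between, e.g., 10 and 100 frames is visually much less dramatic, consistent with the reconstructions stabilizing after roughly 1.4-2 s.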