Passive optical time-of-flight for non-line-of-sight localization

Optical imaging through diffusive, visually-opaque barriers and around corners is an important challenge in many fields, ranging from defense to medical applications. Recently, novel techniques that combine time-of-flight (TOF) measurements with computational reconstruction have allowed breakthrough imaging and tracking of objects hidden from view. These light detection and ranging (LiDAR)-based approaches require active short-pulsed illumination and ultrafast time-resolved detection. Here, bringing notions from passive radio detection and ranging (RADAR) and passive geophysical mapping approaches, we present an optical TOF technique that allows passive localization of light sources and reflective objects through diffusive barriers and around corners. Our approach retrieves TOF information from temporal cross-correlations of scattered light, via interferometry, providing temporal resolution that surpasses state-of-the-art ultrafast detectors by three orders of magnitude. While our passive approach is limited by signal-to-noise to relatively sparse scenes, we demonstrate passive localization of multiple white-light sources and reflective objects hidden from view using a simple setup.

The full experimental setup for imaging through a diffusive barrier is given in Supplementary Figure 1. The light source used for the experiments was a white-light LED (MWWHF1, Thorlabs), which was split into four by coupling into a fiber bundle (BF42HS01, Thorlabs). The sources in all experiments were 2-3 of the four fiber-bundle tips. The sources were positioned behind a highly scattering diffuser (light shaping diffuser, 80° scattering angle, Newport).
On the other side of the diffusive barrier, a 4-f telescope (f_1 = 150mm, f_2 = 180mm) imaged the barrier's surface onto a movable mask, comprised of two small apertures (for 2D localization) or two pairs of apertures (for 3D localization). The mask was placed at the front focal plane of a lens (f_3 = 100mm). A cooled sCMOS camera (Andor Neo 5.5), placed at the back focal plane of the lens, recorded the diffraction pattern of the scattered light transmitted through the double-aperture mask.
From the ratio of the focal lengths of the 4-f telescope, the dimensions and separation of the apertures in the mask are reduced by a factor of 1.2 when imaged onto the diffuser. Circular apertures were used in the mask-based experiments; rectangular apertures were used only in the SLM-based experiments, owing to the square shape of the SLM pixels.
For the experiments with reflective objects (Fig. 5), the diffuser used was a Newport light shaping diffuser with a scattering angle of 40°.

Supplementary Figure 1: Setup for passive TOF through a highly scattering medium: a white-light LED point source is hidden behind a diffusive barrier. A double-aperture mask, 4-f imaged onto the barrier, selects diffused light from two points on the barrier to be interfered on an sCMOS camera. The mask is translated vertically to measure different points on the barrier.

Supplementary Note 2:
Results using a dynamic programmable mask

In order to go beyond the limitations of a static double-aperture mask that is mechanically scanned across the barrier, we constructed the setup shown in Supplementary Figure 2a. In this system, the double-aperture mask is replaced by a programmable digital amplitude mask based on a spatial light modulator (SLM). For amplitude shaping, the phase-only SLM (Holoeye PLUTO BB) was placed between two linear polarizers in a cross-polarization configuration.
The setup for a dynamic programmable mask is conceptually identical to the setup using a mechanically scanned fixed mask (Supplementary Figure 1), with two main technical differences. The first is that in order to obtain a high-contrast amplitude mask using the specific liquid-crystal SLM model (Holoeye PLUTO BB), which had significant chromatic dispersion, a narrow bandpass filter (BPF) with a 10nm bandwidth (FB550-10, Thorlabs) was placed before the camera. This reduces the light-utilization efficiency of this setup, and could be improved by using less dispersive SLMs, amplitude-only SLMs, or potentially MEMS-based SLMs. The second difference of the programmable-mask setup of Supplementary Figure 2a is that the reflective SLM was placed at a close distance (< 5cm) to the diffusive barrier, instead of being 4-f imaged onto it. Even with such imperfect conjugation, two light sources could be simultaneously separated and localized with our approach through a diffusive barrier, by displaying a double-aperture mask at different positions on the SLM (Supplementary Figure 2b,c). In this experiment the diffusive barrier was a light shaping diffuser with a scattering angle of 5° (Newport), and the mask was generated using the SLM to produce rectangular reflective slits 0.24mm wide with a separation of 3.2mm.
Supplementary Figure 2b presents the results of fringe localization from 18 different positions of the displayed double-aperture mask, achieved without any mechanical scanning: each row in Supplementary Figure 2b is the fringe-envelope amplitude at different positions on the camera, extracted using spectrogram analysis (see Supplementary Figure 3). The reconstructed source positions from these measurements are shown in Supplementary Figure 2c.
A programmable mask is advantageous over a mechanically scanned fixed mask: it does not require mechanical scanning, and it can straightforwardly implement advanced multiplexing approaches that may be used for reconstructing the scene and the source image from a single camera exposure, as done in aperture-masking interferometry in astronomy [1].
The main disadvantages of a programmable mask are the lower contrast and transmission compared to a mechanical mask, and potential chromatic aberrations. In our implementation, the chromatic aberrations required the use of a narrow bandpass filter, which not only lowered the light-utilization efficiency, but also reduced the temporal resolution by an order of magnitude, to the order of 100fs, still considerably better than the response time of state-of-the-art detectors. The lower temporal resolution can be observed as the larger width of the curves shown in Supplementary Figure 2b.

Supplementary Note 3:
Constructing the spatio-temporal TOF (x,t) maps

Supplementary Figure 3a shows a raw camera image taken with the setup of Supplementary Figure 1. The image appears to be a random, low-contrast speckle pattern, as expected through a highly scattering barrier. However, as a result of the double-aperture mask, low-coherence (white-light) fringes are present in the image, at a position that reflects the TOF difference (Supplementary Figure 3a, red arrow). The fringes are localized around the zero-delay time, providing the optical path difference of the light from the hidden light source to the two apertures.
In order to allow detection of the fringes, we ensured that the fringe period is considerably smaller than the speckle grain size. This was guaranteed by using double-aperture masks whose aperture separation is larger than the size of each aperture. Thus, the fringes can be localized by high-pass (or bandpass) filtering the camera image around the fringes' spatial frequency. One possible approach to perform such filtering is via a 2D Hilbert transform, as shown in Fig. 2.
Another, equivalent, processing approach is to perform a spectrogram analysis of each camera row. The result of such an analysis is shown in Supplementary Figure 3b: the shown result is the average of the spectrogram analysis performed over the camera image rows, after averaging every 10 camera rows. The spectrogram analysis performs a short-time Fourier transform (STFT) around each horizontal pixel position with a chosen window length (here, 128 pixels). This provides spatial-frequency information with a spatial resolution given by the window length. The direct result of this simple spectrogram analysis shows a clear spectral peak that is localized around the position of the interference fringes. Taking the relevant spectrogram row (i.e., the row at the spatial frequency of the interference fringes) for each mask position provides the spatiotemporal TOF trace from which the scene can be reconstructed (Supplementary Figure 3c).

Supplementary Figure 3: (b) Spectrogram of one camera row, averaged over all camera rows in a single camera image. The resulting amplitude peak, at the row corresponding to the fringes' spatial frequency, marks the interference fringes' position on the camera, i.e., the measured TOF. (c) Repeating the process shown in (a-b) for different mask positions, and plotting the resulting spectrograms at the fringes' spatial frequency (row in (b) marked by a dashed box), provides the spatiotemporal TOF trace from which the scene can be reconstructed. Scale bars: 100fs.
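As an illustration, the following sketch applies the same STFT procedure to a synthetic camera row (a hypothetical fringe packet plus noise, not our measured data; the 128-pixel window follows the text):

```python
import numpy as np

def stft_row(row, win=128, hop=16):
    """Magnitude short-time Fourier transform of one camera row (Hann window).
    Returns window-center positions, spatial frequencies, and |STFT|."""
    w = np.hanning(win)
    centers, spectra = [], []
    for start in range(0, len(row) - win + 1, hop):
        seg = row[start:start + win] * w
        spectra.append(np.abs(np.fft.rfft(seg)))
        centers.append(start + win // 2)
    return np.array(centers), np.fft.rfftfreq(win), np.array(spectra).T

# --- synthetic camera row (illustrative values only) ---
rng = np.random.default_rng(0)
n = 2048
x = np.arange(n)
f_fringe = 1 / 22.0                               # fringe period ~22 pixels
envelope = np.exp(-((x - 1300) / 60.0) ** 2)      # fringe packet centered at pixel 1300
row = 0.2 * rng.standard_normal(n) + envelope * np.cos(2 * np.pi * f_fringe * x)

centers, freqs, S = stft_row(row)
k = np.argmin(np.abs(freqs - f_fringe))  # spectrogram row at the fringe spatial frequency
peak_pixel = centers[np.argmax(S[k])]    # fringe-packet position -> measured TOF delay
```

The peak of the selected spectrogram row recovers the position of the fringe packet (here, near pixel 1300) despite the speckle-like background.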

Supplementary Note 4:
Localizing sources from hyperbola intersections

Several approaches can be used to reconstruct the positions of the sources from the spatio-temporal TOF traces (e.g., Supplementary Figure 3c and Supplementary Figure 4). Such approaches include filtered back-projection [2] or similar inversion procedures, tracing the light back to ellipsoids [3] or spheres [4]. In our approach, each fringe-intensity peak in the spatio-temporal (x-t) trace (e.g., Supplementary Figure 3c) provides a TOF difference, and thus localizes the source on a hyperbola.
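Such hyperbola localization can be sketched numerically. The following is a minimal simulation, assuming a hypothetical point source and noiseless TOF differences; the aperture separation D = 1.25 mm follows the value used in the resolution estimates below, while the grid and kernel width are illustrative choices:

```python
import numpy as np

C = 3e8        # speed of light [m/s]
D = 1.25e-3    # aperture separation imaged on the barrier [m]

def tof_diff(x_src, z_src, x_mask):
    """TOF difference of light from (x_src, z_src) to the apertures at x_mask +/- D/2."""
    d1 = np.hypot(x_src - (x_mask - D / 2), z_src)
    d2 = np.hypot(x_src - (x_mask + D / 2), z_src)
    return (d1 - d2) / C

# simulate measurements for a hypothetical source at x = 5 mm, z = 57 cm
x_true, z_true = 5e-3, 0.57
masks = np.linspace(-20e-3, 20e-3, 41)          # mask (aperture-pair) positions
dts = np.array([tof_diff(x_true, z_true, xm) for xm in masks])

# back-projection: accumulate a Gaussian-blurred hyperbola per measurement on an (x,z) grid
xg = np.linspace(-20e-3, 20e-3, 81)
zg = np.linspace(0.30, 0.90, 61)
X, Z = np.meshgrid(xg, zg, indexing="ij")
sigma = 10e-15                                   # kernel width ~ temporal resolution [s]
acc = np.zeros_like(X)
for xm, dt in zip(masks, dts):
    acc += np.exp(-((tof_diff(X, Z, xm) - dt) / sigma) ** 2)

i, j = np.unravel_index(np.argmax(acc), acc.shape)
x_est, z_est = xg[i], zg[j]                      # peak of the summed back-projections
```

The hyperbolas from different mask positions intersect at the source, so the accumulator peaks at the true position; with noise, more mask positions sharpen the peak.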
To demonstrate a simple localization approach, we implemented a back-projection procedure, which sums the contributions of all the hyperboloids retrieved from each peak of the detected fringe intensity. To account for mounting inaccuracies of the experimental setup, the fringe-position to TOF-delay conversion was determined from a set of calibration measurements with sources at known positions. Example results of such calibration measurements over a field of view of 80cm x 20mm (z,x) are shown in Supplementary Figure 4. For a coarse calibration, measurements for a set of 63 source positions (7 x 9, in (x,z) respectively) were taken, where each curve contains 40 different mask positions. For noise filtering and smoothing, the reconstructed hyperbolas were convolved with a rectangular smoothing kernel. For the localization around a corner (Fig. 4), due to the roughness of the wall surface, the fringe peak positions varied considerably more as a function of the mask position (Fig. 4d) than in the experiments with the diffusive barrier. In order to reduce the resulting errors in the measured TOF, the hyperboloids were reconstructed from the fringe positions averaged over 5 adjacent mask positions.

Supplementary Note 5:
Temporal TOF resolution

The temporal TOF resolution of our approach is dictated by the source coherence time, i.e., the envelope of the source field autocorrelation, which, according to the Wiener-Khinchin theorem, is given by the Fourier transform of the source power spectrum. In our experiments we used two light sources: a white-light LED and a tungsten-halogen lamp (Thorlabs OSL2). These two sources have different spectra (Supplementary Figures 5a,d), and thus the autocorrelation envelopes of the two sources, as calculated by Fourier-transforming their spectra, have slightly different temporal widths (Supplementary Figures 5b,e).
Specifically, the full width at half maximum (FWHM) of the field autocorrelation envelope is τ_FWHM ≈ 6.6fs for the LED source, and τ_FWHM ≈ 6.1fs for the halogen lamp (Supplementary Figures 5b,e). These resolutions are three orders of magnitude better than those of the ultrafast detectors used in conventional TOF techniques, which are of the order of 15ps for a streak camera [2] and 8ps for SPAD detectors [4]. In addition to calculating the coherence time of each of the sources, we compared the TOF resolution obtained with the two sources in our experiments. This was done by comparing the envelopes of the interference fringes (as detected with spectrogram analysis) obtained in the localization experiments with the two sources (Fig. 3 and Fig. 5) for a single source/object. Taking the FWHM of the envelope gives an estimate of the experimental TOF resolution of τ_FWHM ≈ 21fs for the LED source and τ_FWHM ≈ 17fs for the tungsten-halogen lamp (Supplementary Figures 5c,f). The TOF resolution in the experiments is lower than the coherence time of the sources due to the presence of the diffusive barrier, optical aberrations of the setup, and the windowing effects of the spectrogram analysis. The corresponding experimental coherence length is l_c = τ_c · c = 6.3µm for the LED source.
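The Wiener-Khinchin computation can be sketched numerically. Here a Gaussian power spectrum stands in for the measured LED spectrum (the 600nm center and 100nm width are assumptions for illustration, not the measured values):

```python
import numpy as np

c = 3e8
lam0, dlam = 600e-9, 100e-9                 # hypothetical spectrum: center, FWHM [m]
nu0 = c / lam0                              # central optical frequency [Hz]
dnu = c * dlam / lam0**2                    # spectral FWHM in frequency units [Hz]

nu = np.linspace(nu0 - 5 * dnu, nu0 + 5 * dnu, 4096)
S = np.exp(-4 * np.log(2) * ((nu - nu0) / dnu) ** 2)   # Gaussian power spectrum S(nu)

# Wiener-Khinchin: the field autocorrelation envelope is |FT of the power spectrum|
dstep = nu[1] - nu[0]
t = np.fft.fftshift(np.fft.fftfreq(len(nu), d=dstep))
g = np.abs(np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(S))))
g /= g.max()

tau_fwhm = np.ptp(t[g >= 0.5])                  # numerical FWHM of the envelope [s]
tau_analytic = 4 * np.log(2) / (np.pi * dnu)    # exact FWHM for a Gaussian spectrum
```

For a Gaussian spectrum the numerical FWHM agrees with the analytic value 4 ln2/(π ∆ν) up to the sampling of the time grid; for measured spectra the same FFT recipe applies directly.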

Fulfilling the spatial Nyquist sampling criterion
In order to be able to measure the interference fringes, the interference pattern has to be spatially Nyquist sampled, i.e., the camera pixel pitch ∆x_pixel has to be smaller than half the spatial period of the white-light fringes, Λ/2. This period is dictated by the central wavelength of the light source, λ_0, and the geometry of the measurement system (Supplementary Figure 6). In our setup, the interference fringes are a result of interference between two apertures separated by a distance D = 3.2/1.2mm (the factor 1.2 accounts for the reduced image of the slits on the barrier). The aperture mask is placed at the front focal plane of a lens (f = 100mm), and the camera is positioned in the back focal plane of the lens. In this geometry the spherical wave that is transmitted through each of the slits becomes a plane wave propagating at an angle θ = atan(D/(2f)) after passing through the lens. The low-coherence interference fringes have a period Λ = λ_0/(2 sin(θ)) ≈ λ_0 f/D. In our setup, D = 3.2/1.2mm and f = 100mm were chosen such that Λ ≈ 22µm, approximately three times larger than the camera pixel pitch ∆x_pixel = 6.5µm, fulfilling the required Nyquist sampling criterion.

Supplementary Figure 6: System geometry and path-length difference δL_n to the different camera pixels, located at r_n.
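The fringe-period numbers can be verified with a short calculation, assuming a representative central wavelength of 600nm (the exact λ_0 is not specified here):

```python
import numpy as np

lam0 = 600e-9            # assumed central wavelength [m]
f = 100e-3               # lens focal length [m]
D = 3.2e-3 / 1.2         # aperture separation imaged on the barrier [m]
pixel = 6.5e-6           # sCMOS pixel pitch [m]

theta = np.arctan(D / (2 * f))        # tilt angle of each transmitted plane wave
Lam = lam0 / (2 * np.sin(theta))      # exact fringe period
Lam_approx = lam0 * f / D             # small-angle approximation f >> D
nyquist_ok = pixel < Lam / 2          # Nyquist criterion: pixel pitch < Lambda/2
```

With these values Λ ≈ 22.5µm, the small-angle approximation is accurate to better than 0.1%, and the criterion pixel < Λ/2 holds with margin.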

Camera pixel to TOF difference conversion
The calculation of the optical path-length difference δL (i.e., the time delay times the speed of light, δL = c∆t) for each pixel in the camera plane is depicted in Supplementary Figure 6. The position of the fringes on the camera, r_n, is directly converted to the TOF delay by noting that the relative path-length difference between the two interfering plane waves at the n-th pixel is δL_n = 2 r_n sin(θ), where r_n = ∆x_pixel · n is the position of the n-th pixel. Thus, the time delay corresponding to fringes measured at the n-th pixel position is given by:

∆t_n = δL_n/c = 2 r_n sin(θ)/c ≈ r_n D/(f c)

where in the last step we assumed f ≫ D, r_n, as is the case in our experiments. Substituting our setup's chosen parameters, f = 100mm, D = 1.5/1.2mm, and r_n = n · 6.5µm, gives the time delay for the n-th pixel:

∆t_n ≈ n · 0.3fs

resulting in a conversion of each camera pixel in our spatio-temporal maps to a TOF delay of 0.3fs, roughly a third of the central wavelength, as required for proper Nyquist sampling. The number of time delays that can be sampled in a single camera image is limited by the number of pixels in a single camera row. In our experiments this number was limited to 2,160 pixels by the camera pixel count, and to 1,400 pixels by the specific optics used, which limited the angular field of view of the speckles collected on the camera (the diameter of the speckled circle in Fig. 2b).
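This conversion can be checked with a one-line computation, using the small-angle form ∆t_n ≈ r_n D/(f c) and the parameters quoted above:

```python
c = 3e8                 # speed of light [m/s]
f = 100e-3              # lens focal length [m]
D = 1.5e-3 / 1.2        # aperture separation imaged on the barrier [m]
pixel = 6.5e-6          # camera pixel pitch [m]

dt_per_pixel = pixel * D / (f * c)   # TOF delay per camera pixel [s] -> ~0.27 fs
n_max = 1400                         # usable pixels, limited by the collection optics
t_span = n_max * dt_per_pixel        # total TOF span per camera image [s]
```

The per-pixel delay evaluates to about 0.27fs, matching the ~0.3fs quoted above, and the usable row of 1,400 pixels covers a TOF span of roughly 0.4ps.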

Supplementary Note 6:
Spatial localization resolution
In this section we provide an analytical derivation of the spatial localization resolution as derived from the TOF temporal resolution, and present experimental results quantifying the localization resolution in our experiments.

Theoretical estimate
Consider a distant object located at a distance z from the barrier, and at a transverse position x from the center of the measurement position. The light from this object arrives at the barrier at an angle θ = atan(x/z), measured with respect to the normal of the barrier (Supplementary Figure 6). The TOF difference in arrival time of the light from the object to the two measurement apertures, having a separation D, is:

∆t = D sin(θ)/c     (3)

This TOF difference is measured in our approach with a resolution that is given by the source coherence time: τ_c = l_c/c, where l_c is the illumination coherence length. The angular localization resolution dθ can be estimated by equating the differential of the measured time delay ∆t to the measurement temporal resolution τ_c:

d(∆t) = (D cos(θ)/c) dθ = τ_c

yielding:

dθ = c τ_c/(D cos(θ)) = l_c/(D cos(θ))

Substituting our experimental parameters, l_c = 6.3µm (see Supplementary Note 5) and D = 1.25mm at θ ≈ 0, we obtain dθ ≈ 5mrad for our experimental geometry. The transverse (dx) and axial (dz) resolutions can be similarly derived by plugging sin(θ) ≈ (x + x_slits)/z into Supplementary Equation (3), where x_slits is the center position of the moving slits, and taking the differential of ∆t with respect to x and z separately, yielding:

d(∆t) = (D/(c z)) dx

and:

d(∆t) = -(D (x + x_slits)/(c z^2)) dz

Thus, the theoretically expected transverse and axial localization resolutions are given by:

dx = l_c z/D,    dz = l_c z^2/(D (x + x_slits))

Substituting our experimental parameters of l_c = 6.3µm, x = 0, x_slits = 10mm, and z = 80cm yields dz ≈ 32cm and dx ≈ 4mm for this distance.
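These resolution formulas can be evaluated directly, as a short numerical check of the numbers quoted above:

```python
import numpy as np

c = 3e8
l_c = 6.3e-6          # coherence length [m]
D = 1.25e-3           # aperture separation on the barrier [m]
theta = 0.0           # on-axis object

dtheta = l_c / (D * np.cos(theta))        # angular resolution -> ~5 mrad

x, x_slits, z = 0.0, 10e-3, 0.80          # geometry of the example in the text
dx = l_c * z / D                          # transverse resolution -> ~4 mm
dz = l_c * z**2 / (D * (x + x_slits))     # axial resolution -> ~32 cm
```

Note the quadratic dependence of dz on z: depth resolution degrades much faster with distance than transverse resolution, as observed experimentally.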
The localization resolution improves for a larger aperture separation D (i.e., a longer baseline). However, the aperture separation cannot be arbitrarily large when extended sources are concerned. In order to obtain high-contrast fringes for an extended source, the aperture separation must be smaller than the coherence size of the source at the barrier, r_coh. According to the van Cittert-Zernike theorem, r_coh ≈ zλ/D_obj, where D_obj is the object's diameter. The largest aperture separation providing high-contrast fringes is thus of the order of the coherence size, D ≈ r_coh. This yields an angular resolution from a single camera shot of dθ ≈ l_c/(r_coh · cos(θ)) ≈ l_c ∆θ_obj/(λ · cos(θ)), where ∆θ_obj = D_obj/z is the angular size of the hidden source. For broadband white-light sources the coherence length l_c is of the order of the wavelength λ. Thus, the single-shot localization resolution for extended sources is comparable to the object's angular size.
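A short numerical sketch of this baseline limit, for a hypothetical extended object (the object diameter and wavelength below are illustrative assumptions, not measured values):

```python
lam = 600e-9      # assumed central wavelength [m]
z = 0.8           # object distance [m]
D_obj = 5e-3      # hypothetical object diameter [m]
l_c = 6.3e-6      # coherence length [m] (Supplementary Note 5)

r_coh = z * lam / D_obj           # van Cittert-Zernike coherence size at the barrier
dtheta_obj = D_obj / z            # angular size of the object [rad]
dtheta = l_c * dtheta_obj / lam   # single-shot resolution at the maximal baseline D ~ r_coh
```

Here r_coh ≈ 96µm, and dθ = l_c/r_coh scales with l_c/λ times the object's angular size, illustrating why the single-shot resolution for extended broadband sources is comparable to ∆θ_obj.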
The final localization accuracy is considerably better than the single-shot localization resolution, since it is obtained from multiple single-shot TOF measurements, as studied experimentally in Supplementary Note 7.

We repeated this experiment for different transverse separations ∆x = 3-12mm, for two longer distances from the barrier: z = 110cm and z = 130cm. The results of this study are presented in Supplementary Figure 7c. The experimental results for axial localization resolution measured with our optical setup are summarized in Supplementary Figure 8. In each of these four experiments, two sources were present in the scene at the same transverse position, but at different depths. In all of these experiments one source was located at a fixed distance z_1 = 110cm behind the diffuser, and the second source was located at a distance z_2 = 30cm, 45cm, 60cm, or 80cm from the barrier, respectively, with no transverse translation. The four traces show the envelope of the interference fringes' position on the camera as a function of the mask's position.

Supplementary Note 7:
Localization accuracy
As mentioned in the main text, the localization accuracy is significantly better than the localization resolution, since it is obtained from multiple, high-SNR measurements. To study the transverse and axial localization accuracy in our experiments as a function of the number of TOF measurements taken (the number of different mask positions), we acquired TOF data for a single source located at various transverse positions at a distance of z = 57cm behind the diffusive barrier. For each source position, we localized the source from a different number of TOF measurements, ranging from 2 to 42. From each TOF measurement, the corresponding hyperbola was back-projected (Supplementary Figure 9a); the resulting transverse and axial localization accuracies are shown in Supplementary Figure 9b,c, respectively. As expected, the localization accuracy improves when more TOF measurements are used. At this depth (z = 57cm) the transverse localization accuracy with our experimental parameters reaches 0.2mm, and the depth accuracy reaches ∆z = 1cm. The transverse accuracy is two orders of magnitude better than the axial accuracy in our experiments, as expected from the localization resolution analysis (see Supplementary Note 6).

In the around-the-corner localization experiments of Fig. 4, the light was collected at a reflection angle approximately equal to the angle of incidence of the light from the object. This was done in order to maximize the light-collection efficiency and the speckle contrast. The latter is maximized when the speckle spectral decorrelation width is maximized, which occurs at this angle [5]. In order to verify that there is no significant specular reflection at this angle that could reveal the position of the light sources with conventional imaging, we performed conventional imaging with the light reflected from the wall, and compared it to the case where the wall was replaced by a mirror. The results of these measurements are displayed in Supplementary Figure 11.
The setup (Supplementary Figure 11a) is a simple single-lens imaging setup, which images the two hidden objects of Fig. 4a onto a camera. The light source closer to the wall is located at a distance U_1 + U_2 = 56 + 14 = 70cm from the lens (f = 8cm), and an sCMOS camera is positioned at a distance V = 9cm behind the lens, such that the imaging condition 1/(U_1 + U_2) + 1/V ≈ 1/f is satisfied. Supplementary Figure 11b shows the image recorded using the light reflected from the white-painted wall, showing no information on the hidden objects' positions. In contrast, replacing the wall with a mirror clearly reveals the positions of the two light sources (Supplementary Figure 11c).
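The quoted distances can be checked against the thin-lens imaging condition:

```python
U = 0.70   # object-to-lens distance U1 + U2 [m]
V = 0.09   # lens-to-camera distance [m]
f = 0.08   # lens focal length [m]

# thin-lens equation 1/U + 1/V = 1/f; the residual measures the defocus
residual = 1 / U + 1 / V - 1 / f
relative_defocus = abs(residual) * f       # ~0.3%, i.e. the camera sits at the image plane
```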

Wall roughness effects on TOF measurement
Supplementary Figures 12a,b compare the TOF traces for localization through a scattering diffuser (Supplementary Figure 12a) and localization using light scattered off a white-painted wall (Supplementary Figure 12b). It can be observed that in the white-painted-wall case there are considerable deviations of the TOF trace from a linear curve. These deviations are due to the roughness of the wall. As mentioned in the main text, these TOF variations contain information on the barrier and can provide a measure of the barrier roughness. In addition to the variations in the peak positions of the different TOF traces, the temporal width of a single TOF trace taken with the white-painted wall is broader than that taken with the diffuser. The broader response of the wall is a result of the wall being an effectively thick scatterer, producing a larger spread in the optical paths of the back-scattered light, as analyzed theoretically in Supplementary Note 11 below. The width of a single TOF trace can be used to study the scattering properties of the barrier, such as its transport mean free path [6].

Supplementary Note 11:
Influence of a thick barrier on the temporal cross-correlation

In this section we theoretically analyze the potential distortions induced by multiple scattering in a thick barrier on the measured temporal cross-correlation. Consider two fields, E_1(t) and E_2(t), arriving at the diffusive barrier at two different points, as depicted in Supplementary Figure 13. Each of the fields exiting the barrier, E_i,m(t), which are the fields measured by our system, is given by the convolution of the entering field with the impulse response of the barrier, h_i(t), for the specific input-output positions r_i (here * denotes convolution and ⋆ cross-correlation):

E_i,m(t) = E_i(t) * h_i(t)

The cross-correlation of the measured fields exiting the barrier is thus given by:

C_m(τ) = [E_1,m ⋆ E_2,m](τ) = [(E_1 * h_1) ⋆ (E_2 * h_2)](τ)

Since all the fields and impulse responses are real functions, the cross-correlation of the convolutions is equal to the convolution of the cross-correlations:

C_m(τ) = [E_1 ⋆ E_2](τ) * [h_1 ⋆ h_2](τ)

For a sufficiently thin barrier the impulse responses h_i(t) can be approximated as delta functions, thus providing a good estimate of the arriving fields' cross-correlation, as required:

C_m(τ) ≈ [E_1 ⋆ E_2](τ)

In the case of thick, multiply-scattering barriers, the measured cross-correlation will be the convolution of the desired cross-correlation with the cross-correlation of the impulse responses, which is a function that is limited in time to twice the Thouless time (dwell time) of the light in the medium. Thus, a thick scattering barrier will induce distortions and smearing that are given by the path-delay spread of the light in the medium. While effectively lowering the temporal resolution, the temporal cross-correlation approach remains effective through effectively thick barriers, as we demonstrate experimentally in the results presented in Fig. 4.
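The correlation identity used above can be verified numerically, with random real sequences standing in for the fields and impulse responses:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
E1, E2 = rng.standard_normal(n), rng.standard_normal(n)  # fields arriving at the barrier
h1, h2 = rng.standard_normal(n), rng.standard_normal(n)  # barrier impulse responses
N = 4 * n  # zero-padding so all linear correlations/convolutions below fit without wrap

def conv(a, b):
    """Linear convolution via zero-padded FFTs."""
    return np.fft.irfft(np.fft.rfft(a, N) * np.fft.rfft(b, N), N)

def xcorr(a, b):
    """Linear cross-correlation (negative lags wrapped to the array end)."""
    return np.fft.irfft(np.conj(np.fft.rfft(a, N)) * np.fft.rfft(b, N), N)

Em1, Em2 = conv(E1, h1), conv(E2, h2)      # fields after the barrier: E_i,m = E_i * h_i
lhs = xcorr(Em1, Em2)                      # measured cross-correlation C_m
rhs = conv(xcorr(E1, E2), xcorr(h1, h2))   # (E1 x-corr E2) convolved with (h1 x-corr h2)
# lhs equals rhs to machine precision; for h_i -> delta, rhs reduces to xcorr(E1, E2)
```

The identity is exact in the Fourier domain, since both sides have the spectrum conj(Ê1Ĥ1)·Ê2Ĥ2; the broadening of C_m is therefore entirely carried by the impulse-response cross-correlation term, as stated above.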