Non-line-of-sight imaging with arbitrary illumination and detection pattern

Non-line-of-sight (NLOS) imaging aims at reconstructing targets obscured from the direct line of sight. Existing NLOS imaging algorithms require dense measurements at regular grid points over a large area of the relay surface, which severely limits their applicability to the varied relay scenarios encountered in practical applications such as robotic vision, autonomous driving, rescue operations and remote sensing. In this work, we propose a Bayesian framework for NLOS imaging without specific requirements on the spatial pattern of illumination and detection points. By introducing virtual confocal signals, we design a confocal complemented signal-object collaborative regularization (CC-SOCR) algorithm for high-quality reconstructions. Our approach is capable of reconstructing both the albedo and surface normal of the hidden objects with fine details under general relay settings. Moreover, with a regular relay surface, coarse rather than dense measurements suffice for our approach, so the acquisition time can be reduced significantly. As demonstrated in multiple experiments, the proposed framework substantially extends the application range of NLOS imaging.


Introduction
The technique of imaging objects out of the direct line of sight has attracted increasing attention in recent years. A typical non-line-of-sight (NLOS) imaging scenario is looking around the corner with a relay surface, where the target is obscured from the vision of the observer. NLOS imaging aims to recover the albedo and surface normal of the hidden targets with the measured photon information. Potential applications of NLOS imaging include but are not limited to robotic vision, autonomous driving, rescue operations, remote sensing and medical imaging.
To achieve NLOS reconstruction, laser pulses of high temporal resolution are used to illuminate several points on the relay surface, where the first diffuse reflection occurs.
After that, photons enter the NLOS domain and are bounced back to the visible surface again by the unknown targets. The hidden targets can be reconstructed with the time-resolved photon intensity measured at several detection points on the visible surface.
The imaging system is confocal if the illumination point coincides with the detection point for each spatial measurement, and non-confocal otherwise. We call the measurements regular if the illumination and detection points are uniformly distributed in a rectangular region.
According to how the hidden surface is represented, existing imaging algorithms are divided into three categories: point-cloud-based 28 , mesh-based 29 and voxel-based methods 1,8,9,30-35 . Among these categories, voxel-based algorithms prove to be the most efficient, with low time complexity 32 and fine reconstruction results 34 . For voxel-based methods, the reconstruction domain is discretized with three-dimensional grid points and the albedo is represented as a grid function.
The first voxel-based NLOS reconstruction method is the back-projection algorithm proposed by Velten et al. 1 . The measured photon intensity is modeled as a linear operator applied to the albedo, and the targets are reconstructed by applying the adjoint operator to the measured data. Further improvements of the back-projection method include rendering approaches for fast implementations 2,16 .
Despite these breakthroughs, two major obstacles of existing methods toward practical applications are the need for a large relay surface and dense measurement.
When there are limitations on the shape and size of the relay surface, these algorithms may fail due to the lack of data. In addition, dense measurement results in a long acquisition time, which poses a significant challenge for applications such as autonomous driving, where the observer may move at high speed.
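As a rough illustration of the back-projection idea described above, the sketch below treats the discretized forward model as a small dense matrix and reconstructs by applying its transpose to the measured data. The matrix `A`, the helper names `forward` and `back_project`, and the toy numbers are assumptions made for illustration; they are not taken from Velten et al.

```python
def forward(A, albedo):
    """Apply the linear measurement operator: signal = A @ albedo."""
    return [sum(a * x for a, x in zip(row, albedo)) for row in A]


def back_project(A, signal):
    """Apply the adjoint (transpose) operator: estimate = A^T @ signal."""
    n_voxels = len(A[0])
    return [
        sum(A[m][v] * signal[m] for m in range(len(A)))
        for v in range(n_voxels)
    ]


# Toy operator (hypothetical): 3 time bins, 2 voxels.
A = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
albedo = [2.0, 0.0]

signal = forward(A, albedo)          # simulated measurement
estimate = back_project(A, signal)   # adjoint applied to the data
```

The adjoint is not an inverse: in this toy case some intensity leaks into the empty second voxel, which is the kind of blur that later methods try to remove.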
In this work, we propose a Bayesian framework for NLOS reconstruction that is not limited by the spatial pattern of illumination and detection points. By introducing the virtual confocal signal at rectangular grid points, we design joint regularizations for the measured signal, virtual confocal signal and the hidden target. We put forward a confocal complemented signal-object collaborative regularization (CC-SOCR) framework, which reconstructs both the albedo and surface normal of the hidden target.
The proposed method applies in the most general setting, allowing regular and irregular measurement patterns in both confocal and non-confocal scenarios.
Besides, our approach provides sparse reconstructions of the targets with clear boundaries and negligible background noise, even in cases with very coarse and noisy measurements. Notably, the proposed method suggests a paradigm shift: ever since the technique was first proposed, research on NLOS imaging has relied heavily on the assumption of a large, regularly shaped and unbroken relay surface (such as a wall or the ground), and our method removes this reliance. To the best of our knowledge, this work demonstrates high-quality NLOS reconstruction for the first time in scenarios where the relay surface has discrete scattering regions, an irregular shape, or a very limited size. This enables hidden object reconstruction with far more types of realistic relay surfaces, such as window shutters, window frames, and fences, which significantly broadens the scope of NLOS imaging applications. As shown in Fig. 1, the illumination and detection patterns are irregular but arise in ubiquitous scenes of daily life. Reconstruction results of the bunny with synthetic confocal signals 38 detected at the entire relay surface and in these four scenarios are provided in Supplementary Figures 1-5. Besides, our method can significantly reduce the acquisition time and accelerate the imaging process by using sparse measurements for the conventional scenario of a large relay surface.

Results
The NLOS physical model. The goal of NLOS imaging is to take a collection of measured transient data and find the target that comes closest to fitting these signals. In this work, we adopt the physical model proposed in SOCR 34 . Let and be the illumination and detection points on the visible surface, and we call an active measurement pair, or simply a pair in the following. The photon intensity measured at time is given by (1) in which is the three-dimensional reconstruction domain, denotes the albedo value of the point , is the unit surface normal at that points toward the visible surface. The unit vector can be arbitrarily chosen for points with zero albedo value. By denoting , equation (1) is written equivalently as (2) Noting that the intensity is linear with , the physical model can be written as in the discrete form. The albedo and surface normal can be obtained directly from . and is considered to be generated with the ideal nonlinear physical model. The simulated signal is generated using equation (1) and the hidden target. The measured signal is inevitably corrupted with noise, which is considered as certain deterioration of the ideal signal. The degradation is related to detection efficiency and background noise, whose distribution is hard to estimate and may vary from one scenario to another. To tackle this problem, we introduce the approximated signal , which serves as a better approximation of the ideal signal than the measured signal.
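To make the linearity of the discretized model concrete, the following is a minimal sketch of a generic three-bounce transient simulation. It is not equation (1): it assumes isotropic scattering with inverse-square falloff on each leg and omits the surface-normal terms of the model above. The function name `transient`, the voxel list format, and `bin_length` (the path length covered by one time bin) are hypothetical.

```python
import math


def transient(voxels, p_illum, p_det, n_bins, bin_length):
    """Histogram of photon intensity versus optical path length.

    voxels: list of ((x, y, z), albedo) pairs (hypothetical discretization).
    bin_length: path length covered by one time bin (speed of light times
        the temporal bin width).
    Assumption: isotropic scattering, inverse-square falloff on each leg,
    no surface-normal terms. Voxels must not coincide with relay points.
    """
    signal = [0.0] * n_bins
    for position, albedo in voxels:
        d_in = math.dist(p_illum, position)   # relay point -> hidden voxel
        d_out = math.dist(position, p_det)    # hidden voxel -> relay point
        k = int((d_in + d_out) / bin_length)  # time bin of arrival
        if k < n_bins:
            signal[k] += albedo / (d_in ** 2 * d_out ** 2)
    return signal


# Confocal pair at the origin, one voxel one unit away along the depth axis.
origin = (0.0, 0.0, 0.0)
s1 = transient([((0.0, 0.0, 1.0), 1.0)], origin, origin, 8, 0.5)
s2 = transient([((0.0, 0.0, 1.0), 2.0)], origin, origin, 8, 0.5)
```

Doubling the albedo doubles every entry of the histogram, which is the property that allows the measurement to be written as a matrix acting on the unknown in the discrete form.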
When the number of measurement pairs is small, the solution to the reconstruction problem may not be unique due to the lack of data. To overcome the rank deficiency of the measurement matrix, we introduce the virtual confocal signal at regular focal points.
Suppose that the reconstruction domain is discretized with voxels in the depth, horizontal and vertical directions. We denote by the collection of confocal measurement pairs, which are the orthogonal projections of the voxels to a virtual planar surface perpendicular to the depth direction. The corresponding ideal, simulated and approximated signals for are denoted by , and , respectively.
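The construction of the virtual focal points can be sketched as follows, assuming a box-shaped reconstruction domain centered on the depth axis. The function name `virtual_focal_points`, its parameters, and the coordinate ordering (depth, horizontal, vertical) are assumptions for illustration.

```python
def virtual_focal_points(n_horizontal, n_vertical, width, height, plane_depth=0.0):
    """Orthogonal projections of voxel centers onto a plane of constant depth.

    All voxels in one depth column project to the same point, so the result
    has n_horizontal * n_vertical entries regardless of the depth resolution.
    Coordinates are ordered (depth, horizontal, vertical).
    """
    dy = width / n_horizontal
    dz = height / n_vertical
    return [
        (plane_depth, -width / 2 + (j + 0.5) * dy, -height / 2 + (k + 0.5) * dz)
        for j in range(n_horizontal)
        for k in range(n_vertical)
    ]


points = virtual_focal_points(4, 2, width=2.0, height=1.0)
# Confocal measurement pairs: illumination point equals detection point.
pairs = [(p, p) for p in points]
```

Because the grid is regular and confocal, it supplies the kind of structured signal that the sparse or irregular real measurements lack.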
A Bayesian framework. We treat the reconstructed target , the measured signal and the approximated signals , as random vectors and formulate the imaging task as an optimization problem using Bayesian inference. The target and signals , and are obtained simultaneously by maximizing the joint posterior probability.
Three assumptions are made to formulate this as a concrete optimization problem.
Fig. 2 The proposed CC-SOCR method. a The CC-SOCR framework. For high quality reconstructions, the measured signal, approximated signal and virtual confocal signal are treated as random variables and solved simultaneously using Bayesian inference. The term includes the sparseness of the approximated signal, as well as the sparseness and non-local self-similarity prior of the target. The term corresponds to an empirical Wiener filter, in which the simulated signal of the target serves as the pilot estimation. The term contains the sparseness of the virtual confocal signal, as well as the joint sparse representation of the local structures of the simulated signal and the virtual confocal signal. b The approximated signals and the reconstructed target of the instance of the statue (confocal, measured data). The measured data is provided in the Stanford dataset 8 . We assume the relay surface to be the region consisting of four letters 'N', 'L', 'O', and 'S'. The measured signal, approximated signal, virtual confocal signal and the reconstructed albedo are shown at the bottom.
Firstly, the conditional distribution of the measured signal given the joint probability distribution of , and is (4) in which is related to the joint prior distribution of , and . With this assumption, does not provide additional information to predict when is known.
Secondly, the joint prior distribution of and is (5) in which describes the prior distribution of and . With this regularization term, we search for the target only in the set of real-world objects. Besides, is less noisy than the measured data and is closer to the ideal signal of a certain real-world target, which helps to enhance the reconstruction quality.
Thirdly, the conditional distribution of given and is (6) in which and are the subsets of the approximated signals and that share the same measurement pairs. is related to the joint prior distribution of the target and the virtual signal .
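Taken together, the three assumptions supply the three factors of a standard maximum a posteriori factorization. The sketch below uses notation assumed here for readability, which is not necessarily the paper's: u for the target, \tilde{b} for the measured signal, b for the approximated signal and d for the virtual confocal signal. It states only Bayes' rule and the chain rule; the specific exponential forms of each factor are those given in equations (4) to (6).

```latex
% Assumed notation: u = target, \tilde{b} = measured signal,
% b = approximated signal, d = virtual confocal signal.
\begin{aligned}
(u^\ast, b^\ast, d^\ast)
  &= \arg\max_{u,\,b,\,d}\; p(u, b, d \mid \tilde{b}) \\
  &= \arg\max_{u,\,b,\,d}\; p(\tilde{b} \mid u, b, d)\, p(d \mid u, b)\, p(u, b) \\
  &= \arg\min_{u,\,b,\,d}\; \bigl[ -\log p(\tilde{b} \mid u, b, d)
       - \log p(d \mid u, b) - \log p(u, b) \bigr]
\end{aligned}
```

The second line drops the evidence p(\tilde{b}), which does not depend on the unknowns, and the third turns the product into a sum of three penalty terms, one per assumption.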
With these assumptions, we derive a concrete optimization problem using the Bayesian formula.
in which the third equality follows from equation (4) and the last equality holds with equations (4), (5) and (6). By designing appropriate regularization terms , and , we obtain high quality reconstructions of the targets even in scenarios with highly  Table 1. To bring existing methods into comparison, we interpolate the signal with the nearest neighbor method 8,35 , which generates better results than zero padding 32 in extreme cases (See Supplementary Figure 24).
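The two preprocessing options used to feed irregular measurements to grid-based methods can be sketched as below. This is a minimal illustration of nearest neighbor interpolation versus zero padding, not the code used in the experiments; `fill_regular_grid` and its dictionary-based signal format are hypothetical.

```python
import math


def fill_regular_grid(grid_points, measured, n_bins, mode="nearest"):
    """Assign a transient to every point of a regular grid.

    measured: dict mapping a measured focal point (x, y) to its transient.
    mode="nearest": copy the transient of the closest measured point.
    mode="zero": keep measured points and fill the rest with zeros.
    """
    filled = {}
    for g in grid_points:
        if g in measured:
            filled[g] = list(measured[g])
        elif mode == "zero":
            filled[g] = [0.0] * n_bins
        else:
            nearest = min(measured, key=lambda p: math.dist(p, g))
            filled[g] = list(measured[nearest])
    return filled


grid = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
measured = {(0.0, 0.0): [1.0, 0.0], (3.0, 0.0): [0.0, 3.0]}

by_nearest = fill_regular_grid(grid, measured, n_bins=2, mode="nearest")
by_zero = fill_regular_grid(grid, measured, n_bins=2, mode="zero")
```

Neither option adds information: nearest neighbor filling copies existing transients to new positions, and zero padding asserts that no photons arrive there.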
Results on synthetic data. Instead of using an entire planar visible surface, we assume the relay to be a square box which simulates the scenario of four edges of a window.
The hidden object is a regular quadrangular pyramid, whose base length and height are Albedo values that are less than 0.25 are thresholded to zero. The LOG-BP method fails to locate the target correctly, and contains misleading artifacts near the boundary of the  is 2.86%, which is one order of magnitude smaller than that of the LOG-BP reconstruction (21.75%).
Results on measured data. For confocal experiments, we use the instance of a statue in the Stanford dataset 8 to test the performance of the proposed method. The target is 1

Discussion
We have proposed a novel framework towards the most general setting of NLOS imaging. In this section, we discuss its relationship with the original SOCR method, the complexity of the algorithm and possible directions for further improvements.
Other types of virtual signals. In CC-SOCR, virtual confocal signals observed at planar rectangular grid points are used to complement the reconstruction process in the case of incomplete measurements. It is also possible to consider virtual non-confocal signals for stronger regularizations. Besides, virtual confocal signals at several planes may be introduced to use the spatial correlation better. However, the time and memory complexities will also increase.
Virtual confocal signals at coarse grids. In the CC-SOCR method, the time complexity is still due to the virtual signal introduced, even when the number of measurement pairs . To accelerate the reconstruction process, the virtual signal at coarser grids may be used. If the virtual confocal signal is considered at points, the time complexity reduces to . In Supplementary

Materials and methods
The joint regularizations. In equation (7), we formulate the CC-SOCR framework as an optimization problem. Here we show how the regularization terms , and are designed.
describes the prior distribution of the reconstructed target and the approximated signal of the measurement pairs. For the reconstructed target, we consider the sparsity and non-local self-similarity priors and directly follow the SOCR method 34,39,40 . We also use the zero norm to impose sparseness on the approximated signal . We set (8) in which , , and are fixed parameters. is the albedo of , is the block matching operator, with the index of a reference block. The summation is made over all possible blocks. and are two orthogonal matrices that capture the local structure and non-local correlations of the 3D albedo block. is the matrix consisting of transform coefficients of the block. denotes the zero norm, which represents the number of nonzero values of a tensor.
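For intuition about the zero-norm penalty, the sketch below counts nonzero entries and applies entrywise hard thresholding, which is the standard closed-form minimizer of a quadratic data term plus a zero-norm penalty. The function names and the weight `lam` are hypothetical, and the weighting in equation (8) may differ from the 0.5 factor assumed here.

```python
import math


def zero_norm(values):
    """Number of nonzero entries."""
    return sum(1 for v in values if v != 0)


def hard_threshold(values, lam):
    """Entrywise minimizer of 0.5 * (x - v)**2 + lam * [x != 0].

    Keeping x = v costs lam; setting x = 0 costs 0.5 * v**2, so an entry
    is kept only when |v| > sqrt(2 * lam).
    """
    t = math.sqrt(2.0 * lam)
    return [v if abs(v) > t else 0.0 for v in values]


coeffs = [0.1, -2.0, 0.5, 0.0, 1.5]
sparse = hard_threshold(coeffs, lam=0.125)  # threshold is sqrt(0.25) = 0.5
```

Small coefficients, which mostly carry noise, are removed outright while large ones pass through unchanged, which is why such penalties favor clean backgrounds.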
For the term , we also follow the original SOCR method and set (9) in which is a fixed parameter, is the patch extracting operator, with the index of a local patch. Noting that the signals may not be measured at regular grid points, the patch extracting operator only applies to the temporal direction of the signals.
is the measured signal. is the matrix of discrete cosine transform. Let be the matrix of transform coefficients of the patch, the regularization term is given by (10) in which and are two fixed parameters that control the weight of the simulated signal and the sparsity of the representation, respectively.
(11) in which , and represent the collections of the transform-domain coefficients , and respectively. represents the identity matrix of order . The solution to the optimization problem is provided in Supplementary Note 2.
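The role of the discrete cosine transform and the pilot estimate in the Wiener-type term can be illustrated with a textbook empirical Wiener shrinkage on a single temporal patch. This is a generic sketch rather than the update derived in Supplementary Note 2: the weights, the noise level `sigma`, and the helper names are assumptions.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]
        )
    return rows


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def wiener_patch(noisy, pilot, sigma):
    """Shrink the DCT coefficients of a temporal patch using a pilot estimate.

    Each coefficient is scaled by pilot**2 / (pilot**2 + sigma**2), so
    frequencies absent from the pilot are suppressed.
    """
    D = dct_matrix(len(noisy))
    c_noisy = matvec(D, noisy)
    c_pilot = matvec(D, pilot)
    shrunk = []
    for cn, cp in zip(c_noisy, c_pilot):
        denom = cp * cp + sigma * sigma
        weight = cp * cp / denom if denom > 0 else 0.0
        shrunk.append(weight * cn)
    # D is orthogonal, so its transpose is the inverse transform.
    return matvec(transpose(D), shrunk)


patch = [1.0, 2.0, 3.0, 4.0]
unchanged = wiener_patch(patch, pilot=patch, sigma=0.0)
suppressed = wiener_patch(patch, pilot=[0.0] * 4, sigma=1.0)
```

In the method above the pilot role is played by the simulated signal of the current target estimate, so the filter pulls the approximated signal toward something a physical target could produce.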

Data availability
The Zaragoza dataset is available from the Zaragoza NLOS synthetic dataset. Synthetic data for the pyramid instance are attached to the code.

Code availability
The code will be made freely available in the future.

Supplementary Note 1 Additional experimental results
For all experiments, we interpolate the signals with the nearest neighbor method where necessary to bring F-K 1 , LCT 2 , D-LCT 3 , PF 4 and SOCR 5 methods into comparison. The coordinates of the focal points of all experiments are provided in the code. Supplementary Figures 1 -5 compare the reconstruction results of the bunny under different relay settings with the synthetic confocal signal provided in the Zaragoza dataset 6 . These results indicate the capability of the proposed CC-SOCR method in providing clear reconstructions of the hidden targets, even in cases with highly irregular relay settings (See Supplementary Figures 4 and 5).
Supplementary Figure 23 shows the reconstruction results of the statues with different sizes of the virtual confocal signals introduced. The confocal signal is measured at 200 randomly distributed focal points in a square region of 2 × 2 m². The reconstruction quality degrades as the virtual confocal signal becomes sparser, which indicates the necessity of the dense virtual signal introduced. However, sparser virtual signals result in shorter execution time (see Supplementary Tables 4-7), which shows the trade-off between reconstruction quality and computation time.
Supplementary Figure 24 compares the F-K, LCT, D-LCT and SOCR reconstructions of the statue with confocal signal measured at a heart-shaped region consisting of 258 focal points. The signal is preprocessed with zero padding and nearest neighbor interpolation techniques. It is shown that existing methods fail in this extreme case. Supplementary Figures 25 and 26 show reconstruction results of the statue with confocal signals measured at the letters 'N', 'L', 'O' and 'S' and a heart-shaped region. It is shown that the least squares reconstruction without regularizations is of poor quality. When the sparsity and non-local self-similarity priors of the target are introduced, the quality of the reconstruction improves, but the result is still blurry or contains artifacts. The CC-SOCR method reconstructs the target faithfully.

The CC-SOCR algorithm
The proposed CC-SOCR optimization problem for NLOS reconstruction writes in which is the index of a local patch, is the patch generating operator, is the block matching operator. represents the matrix of the discrete cosine transform, with its filter denoted by . represents the element of the vector .
, and represent the collections of the transform-domain coefficients , and respectively. and are the subsets of the approximated signals and that share the same measurement pairs. , and are sizes of the local patches of the albedo. is the maximum number of neighbors kept in the block matching process. , and are the patch sizes of the virtual confocal signal in the horizontal, vertical and temporal directions. , , , in which is an orthogonal matrix. In order to solve this problem with convergence guarantee, it suffices to generalize the data-driven tight frame image denoising algorithm 7 to three dimensions and apply it to with the regularization parameter .
For the sub-problem (2.1), if the measurement pair of the measured signal does not appear in the virtual confocal signal, the solution is given by represents the operator that aggregates the patches back to the signal. is the collection of the Wiener coefficients in the frequency domain. is an abbreviation of , where is the set of indices of the spatial patches. The reconstructed target is updated by solving the sub-problem (2.3). In this subproblem, the term is omitted. Otherwise, the problem will be non-linear and difficult to solve. This problem contains a regularization term, which can be solved efficiently with the split Bregman method 8  in which and are the collections of transform-domain coefficients, and are the operators that aggregate the patch dataset or block dataset back to the signal and albedo. and are understood as and . Noting that is a three-dimensional volume that does not contain information of the surface normal, we use the technique introduced in SOCR 5 to construct a directional albedo with the surface normal provided by . See the supplement of SOCR 5 for more detail. Here, we abuse the notation and also use to represent this directional albedo. Minimizing the objective function (S.10) yields a least-squares problem without constraint, which can be solved with the conjugate gradient method. We remark that this sub-problem is solved approximately due to the omitted term and the treatment of . Nonetheless, extensive experimental results in Supplementary Note 1 indicate that high-quality reconstructions are obtained with these tricks. For the sub-problem (2.5), if the measurement pair of the virtual confocal signal does not appear in the measured signal, we have (S.12) Otherwise, the solution writes (S.13) in which and are the simulated signal of the measurement pair and the corresponding signal of , respectively. The sub-problem (2.6) is of the same type with (1.6) and can be solved using the same method discussed above.
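For the unconstrained least-squares step mentioned above, the conjugate gradient method can be applied to the normal equations. The sketch below is a minimal dense-matrix version under assumed names (`cg_least_squares`, `matvec`, `rmatvec`); a practical implementation would apply the measurement operator and its adjoint as functions instead of storing a matrix.

```python
def matvec(A, x):
    """Apply A to x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Apply the transpose of A to y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cg_least_squares(A, b, n_iter=50, tol=1e-12):
    """Minimize ||A x - b||^2 by conjugate gradient on A^T A x = A^T b."""
    x = [0.0] * len(A[0])
    r = rmatvec(A, b)   # residual of the normal equations at x = 0
    p = list(r)         # first search direction
    rs = dot(r, r)
    for _ in range(n_iter):
        if rs <= tol:
            break
        Ap = rmatvec(A, matvec(A, p))
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Overdetermined toy system: the last row cannot be fitted and is ignored
# by the least-squares solution.
A = [[2.0, 0.0],
     [0.0, 3.0],
     [0.0, 0.0]]
b = [4.0, 9.0, 1.0]
x = cg_least_squares(A, b)
```

Only products with the operator and its transpose are needed, which is what makes the approach attractive when the measurement matrix is too large to factorize.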

Execution time
Execution times of the CC-SOCR algorithm for the statue instance with 200 randomly distributed confocal measurements and virtual confocal signals of different sizes are shown in Supplementary Tables 4-7. The code was run on an AMD EPYC 7452 server with 64 cores. Sparser virtual confocal signals result in shorter execution times. However, the reconstruction quality degrades as the virtual confocal signal becomes sparser (see Supplementary Figure 23).