Lensless light-field imaging through diffuser encoding

Microlens-array-based light-field imaging is one of the most commonly used and effective technologies for recording high-dimensional optical signals, enabling a broad range of high-performance applications. However, the use of a microlens array generally suffers from an intrinsic trade-off between spatial and angular resolution. In this paper, we exploit a diffuser to explore a novel modality for light-field imaging. We demonstrate that the diffuser can efficiently couple the angular information of incident light rays into a detected image without any lens. To characterize and analyse this phenomenon, we establish a diffuser-encoding light-field transmission model, in which four-dimensional light fields are mapped into two-dimensional images via a transmission matrix describing light propagation through the diffuser. Correspondingly, a calibration strategy is designed to flexibly determine the transmission matrix, so that light rays can be computationally decoupled from a detected image with adjustable spatio-angular resolutions that are unshackled from the resolution limitation of the sensor. This proof-of-concept approach indicates the possibility of using scattering media for lensless four-dimensional light-field recording and processing, rather than only for two- or three-dimensional imaging.


System calibration
Fig. S2 shows the calibration procedure, in which an on-axis point source $p_0$ generates a pseudorandom pattern. The pattern is evenly segmented into a series of non-overlapping sub-images, which exactly correspond to the column vectors of the encoding kernel, i.e., $t(\mathbf{x};\mathbf{u},0)$.
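As an illustration of this segmentation step, the sketch below (hypothetical helper names, not the authors' code) divides a calibration pattern into non-overlapping sub-images and stacks the flattened pixels of each sub-image as one column of the encoding kernel; the sub-image size and angular sampling are placeholders.

```python
import numpy as np

def build_encoding_kernel(pattern, sub_size, n_views):
    """Segment a calibration pattern into non-overlapping sub-images and
    stack each flattened sub-image as one column of the encoding kernel.

    pattern  : 2-D array, the detected pseudorandom pattern (within its support)
    sub_size : side length of each square sub-image, in pixels
    n_views  : number of sub-images per dimension (angular samples)
    """
    cols = []
    for i in range(n_views):
        for j in range(n_views):
            sub = pattern[i * sub_size:(i + 1) * sub_size,
                          j * sub_size:(j + 1) * sub_size]
            cols.append(sub.ravel())
    # Each column corresponds to one angular sample u of the kernel t(x; u, 0)
    return np.stack(cols, axis=1)

# Example with placeholder numbers (6 x 6 angular samples of 60 x 60 pixels)
pattern = np.random.rand(360, 360)          # stands in for the measured support
T = build_encoding_kernel(pattern, 60, 6)   # shape (3600, 36)
```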

Setup
The aperture close to the diffuser limits the imaging region on the sensor. The intensity within this region forms the support of the detected pattern, as shown by the orange box in Fig. S3a. When the point source is shifted laterally, the pattern moves correspondingly on the sensor. The region outside the support therefore determines the maximum lateral spacing of point sources, as shown in Fig. S3b. Furthermore, the size of the support depends on the distance from the point source to the diffuser, i.e., the depth. As shown in Fig. S3c, when the point source is closer to the system, the support becomes larger while the lateral spacing reduces, which means that the field of view (FoV) becomes smaller.
From the similar-triangle geometry of the point source, the aperture, and the sensor, the FoV at a given depth is

$$\mathrm{FoV} = \frac{(w_s - w_p)\,d}{d_0} - w_p \qquad (\mathrm{S1})$$

where $w_s$ and $w_p$ are the sizes of the sensor and the aperture, respectively, $d_0$ is the distance from the diffuser to the sensor, and $d$ is the depth. If the system is fixed, $w_s$ and $d_0$ are constants. For a specific $w_p$, the FoV increases linearly with $d$, which is consistent with the above discussion. For a specific $d$, the FoV can be increased by decreasing $w_p$. However, a small aperture will reduce the amount of light and thus may require a longer exposure time. Besides, a small support may also limit the angular sampling of the pattern.
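For reference, a minimal numerical sketch of Eq. (S1) is given below; the sensor size and diffuser-to-sensor distance used in the example are illustrative assumptions, not values reported here, while the 6 mm aperture matches the one used in the experiments.

```python
def field_of_view(w_s, w_p, d_0, d):
    """Eq. (S1): FoV for sensor size w_s, aperture size w_p,
    diffuser-to-sensor distance d_0, and depth d (all lengths in mm)."""
    return (w_s - w_p) * d / d_0 - w_p

# Illustrative parameters: w_s and d_0 are assumed values, w_p = 6 mm as in the experiments.
for depth in (25.0, 35.0, 47.0):
    print(depth, field_of_view(w_s=12.0, w_p=6.0, d_0=8.0, d=depth))
```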
The size of the aperture can be selected according to the desired FoV and the depth range. In the experiments, we used an aperture with a size of 6 × 6 mm². Then 12 patterns were captured by axially moving the point source away from the system in steps of 2 mm. Fig. S4a shows three patterns (Patterns 1, 6, and 12), captured at distances of 25 mm, 35 mm, and 47 mm, respectively. It can be seen that the size of the patterns decreases with increasing distance, while the shape of the patterns remains unchanged. According to Eq. (S1), the FoVs related to Pattern 1 and Pattern 12 are 12.28 mm and 28.36 mm, respectively; the requirements of the experiments are thus satisfied. The enlarged view (left side of Fig. S4a) shows the high-contrast structure of Pattern 1, which is maintained over a large depth range. Figs. S4b and S4c show the spectrum of Pattern 1 and its central cross-section, respectively. In Fig. S4b, the central region is suppressed, and the green dashed circle indicates the valid spatial frequencies. The system can record high-frequency information; therefore, high-resolution light fields can be reconstructed, avoiding the resolution limitation of the sensor.
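A simple way to reproduce this kind of spectrum inspection (a sketch, not the authors' processing pipeline) is to compute the centred 2-D Fourier magnitude of a captured pattern and take its central cross-section:

```python
import numpy as np

def pattern_spectrum(pattern):
    """Centred log-magnitude spectrum of a captured pattern (2-D array)."""
    F = np.fft.fftshift(np.fft.fft2(pattern))
    return np.log1p(np.abs(F))

spec = pattern_spectrum(np.random.rand(512, 512))   # placeholder for a captured pattern
cross_section = spec[spec.shape[0] // 2, :]          # central horizontal cross-section
```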

Spatio-angular sampling
Once a pattern is chosen as a base, it can be segmented for the decoupling reconstruction according to the desired spatio-angular resolutions. First, a spatial resolution can be selected. Since the resolution of the camera used is 2048 × 2048 pixels, the spatial resolution can be conveniently adjusted in multiples of 2 × 2. Thus, spatial samplings such as 1024 × 1024, 512 × 512, and 256 × 256 are used.
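For example, a reduction of the spatial sampling by a factor of 2 per dimension can be realized by simple 2 × 2 pixel binning; the sketch below assumes that averaging neighbouring pixels is an acceptable way to do this, which is not specified here.

```python
import numpy as np

def bin2x2(image):
    """Downsample a 2-D image by a factor of 2 per dimension via 2 x 2 averaging."""
    h, w = image.shape
    return image[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

raw = np.random.rand(2048, 2048)     # placeholder for the 2048 x 2048 sensor image
low = bin2x2(bin2x2(raw))            # 512 x 512 spatial sampling
```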
Subsequently, the captured pattern can be scaled to the selected spatial resolution. Fig. S5a shows the scaled Pattern 6 with a spatial sampling of 512 × 512. Then, the support of the scaled pattern can be identified, which is easily done by projecting the pattern horizontally and vertically, as shown by the curves in Figs. S5b and S5c, respectively. Finally, the sub-images can be obtained by evenly segmenting the support. The size of each sub-image can be roughly determined from the full size of the support and the desired angular resolution. For example, with a size of 60 × 60 pixels, 6 × 6 sub-images are obtained, as shown in Fig. S5d. Therefore, the spatio-angular resolution used for light-field reconstruction is 512 × 512 × 6 × 6. Note that there is no special requirement for the selection of the spatio-angular sampling.

Simulation scene of multiple sparse object points
Fig. S6 shows an example of lateral shifting. The support of each captured pattern, except for Pattern 6, which was used as a base for calibration, was randomly shifted by an integer number of pixels. According to the shift-invariance property of the diffuser, shifting the pattern is equivalent to laterally translating the corresponding point source. The point sources used to generate these patterns were located at different distances from the diffuser. Therefore, the combination of these shifted patterns can be regarded as the intensity detection of a simulated scene consisting of multiple object points located at different positions in the measured volume.
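A minimal sketch of how such a simulated measurement could be assembled is given below; the pattern stack, shift range, and random seed are placeholders, and the wrap-around shift assumes that every shifted support stays inside the frame.

```python
import numpy as np

def simulate_sparse_scene(patterns, max_shift, seed=0):
    """Sum randomly shifted copies of depth-calibrated patterns to emulate the
    intensity detected from several point sources at different positions.

    patterns  : list of 2-D arrays, one calibrated pattern per depth
    max_shift : maximum lateral shift in pixels (integer)
    """
    rng = np.random.default_rng(seed)
    detected = np.zeros_like(patterns[0], dtype=float)
    for p in patterns:
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # Integer-pixel shift; by the shift-invariance property this mimics a
        # lateral translation of the point source that generated the pattern.
        # np.roll wraps around, so the shifted support must remain inside the frame.
        detected += np.roll(p, (dy, dx), axis=(0, 1))
    return detected

# Placeholder patterns standing in for the calibrated patterns at different depths
measurement = simulate_sparse_scene([np.random.rand(512, 512) for _ in range(11)], max_shift=40)
```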

Light-field reconstruction using a scaled pattern
Light fields related to area objects can be reconstructed correctly as long as the pattern used was generated by a point source located at a depth near the objects. When we used Pattern 1 as a base to reconstruct the light field with a spatio-angular sampling of 1024 × 1024 × 6 × 6 from the raw image shown in Fig. 5a, an acceptable result could be obtained, as illustrated by the in-focus slices of the reconstructed light field (see Fig. S7a). Alternatively, we scaled Pattern 6, as shown in Fig. S7b. The enlarged views of Pattern 1 and the scaled pattern show that their structures are essentially the same.
Interestingly, the light field could also be correctly reconstructed using the scaled pattern, as illustrated by the corresponding in-focus slices in Fig. S7b.
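One plausible way to generate such a scaled pattern (a sketch under assumed parameters, not the exact procedure used here) is to rescale the pattern about its centre and crop or pad it back to the original frame:

```python
import numpy as np
from scipy.ndimage import zoom

def rescale_pattern(pattern, scale):
    """Rescale a pattern about its centre and crop/pad it back to the original size,
    so that it mimics a calibration pattern acquired at a different depth."""
    scaled = zoom(pattern, scale, order=1)
    out = np.zeros_like(pattern)
    h, w = pattern.shape
    sh, sw = scaled.shape
    if scale >= 1:                       # crop the central region of the enlarged pattern
        y0, x0 = (sh - h) // 2, (sw - w) // 2
        out = scaled[y0:y0 + h, x0:x0 + w]
    else:                                # pad the reduced pattern into the centre
        y0, x0 = (h - sh) // 2, (w - sw) // 2
        out[y0:y0 + sh, x0:x0 + sw] = scaled
    return out

pattern6 = np.random.rand(512, 512)               # placeholder for the captured Pattern 6
pattern6_scaled = rescale_pattern(pattern6, 1.2)  # e.g. enlarge to emulate a closer depth
```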

Optimization algorithm
Based on the shift invariance of the encoding kernel, the forward imaging model described by Eq. (2) can be replaced with a convolution, similar to that used by Antipa et al.,1 such that