Single-shot lensless imaging with fresnel zone aperture and incoherent illumination

Lensless imaging eliminates the need for geometric isomorphism between a scene and an image while allowing the construction of compact, lightweight imaging systems. However, a challenging inverse problem remains due to the low reconstructed signal-to-noise ratio. Current implementations require multiple masks or multiple shots to denoise the reconstruction. We propose single-shot lensless imaging with a Fresnel zone aperture and incoherent illumination. By using the Fresnel zone aperture to encode the incoherent rays in wavefront-like form, the captured pattern has the same form as the inline hologram. Since conventional backpropagation reconstruction is troubled by the twin-image problem, we show that the compressive sensing algorithm is effective in removing this twin-image artifact due to the sparsity in natural scenes. The reconstruction with a significantly improved signal-to-noise ratio from a single-shot image promotes a camera architecture that is flat and reliable in its structure and free of the need for strict calibration.


Introduction
Traditional optical imaging architectures follow a pointto-point imaging model by using a set of lenses. This way to collect and converge light limits the freedom of imaging parameters. By introducing computing capability into the imaging system, breakthrough systems have been built, yielding improvements in the image dimensions 1-3 , image size 4,5 , and even imaging mechanisms [6][7][8] . Furthermore, the burden of imaging can be transferred from bulky and expensive hardware to computation, which enables new architectures for low-cost cameras.
Recently, lensless imaging has become attractive due to its thin and easy-to-build form. In the past few years, various lensless imaging techniques have been proposed with coherent systems, such as an on-chip microscope 9,10 , coherent diffractive imaging 11,12 , and a series of learningbased methods [13][14][15] . Due to the requirement of coherent illumination, however, applications of such systems are limited. The pinhole camera offers a simple architecture for incoherent light imaging, which has an infinite depth of field and no aberration but suffers from the severe limitation of low light throughput. To make up for this drawback, coded aperture imaging replaces the pinhole with a mask. Each light source in the scene casts a unique shadow of the mask onto the sensor, encoding the intensity and the location of the light source. Masks, which include a uniform redundant array 16 or modified URA 17 , have traditionally been used in X-ray and γ-ray imaging. Several lensless cameras at visible wavelengths have been proposed using diffractive gratings that are insensitive to wavelength 18,19 , separable masks 20,21 and diffusers 22 . These methods depend on large-scale matrix inversion, which requires strict calibration and heavy computational resources.
Unlike the pure mathematical optimization methods mentioned above, we adopt the Fresnel zone plate as the mask to improve the condition of the imaging problem at the physical level. Soon after Gabor's invention of holography 23 , Rogers noted that the pattern of the Fresnel zone plate coincides with a point source hologram 24 . Inspired by this, Mertz and Young proposed zone plate coded imaging 25 and extended the concept and application of holography to the field of incoherent light. Recent works have demonstrated the imaging capability of the Fresnel zone plate, also called the Fresnel zone aperture (FZA), at visible wavelengths [26][27][28][29] . This prototype camera needs at least four FZAs with different phases to extract the signal from captured images. Time division and spatial division are two alternatives to implement the change of masks. The former increases the complexity of the system by using a spatial light modulator. The latter reduces the resolution because the size of the reconstruction image is only a quarter of the sensor size. Moreover, the crosstalk of light and sensor noise would further degrade the quality of the reconstructed image.
In mask-based imaging systems, image reconstruction is susceptible to noise, so a robust reconstruction algorithm is critical. Compressive sensing (CS) is a powerful signal reconstruction framework and provides complete theoretical support for image reconstruction 30 . The CS algorithm has been widely applied in optical imaging systems, such as single pixel cameras [31][32][33] and compressive holography [34][35][36][37][38] . These previous works pave the way for inverse problem solving in optical imaging and inspire us in terms of image reconstruction.
In this work, we propose a single-shot noise suppression lensless imaging method with FZA and incoherent illumination. The structure of the proposed imaging system is simple: an FZA mask is placed close to a sensor ( Fig. 1a) without precise alignment. Each point source in the scene casts an FZA shadow on the sensor plane, which is similar to the point source hologram, so that the FZA acts in some sense as a Fresnel hologram encoder. Backpropagation (BP) reconstruction is feasible but disturbed by twin image. Here, the CS algorithm that has been proven to be efficient for Fresnel holography is applied (Fig. 1c). By enforcing a sparsity constraint in the gradient domain, the reconstruction eliminates the twin-image effect and suppresses the other noise. Since the single-shot method only needs a single FZA, all the pixels can be saved for a full-size reconstruction image, while in the multiple-shot method, only a quarter size can be realized. The proposed method can greatly support high-quality imaging without the need for strict calibration for the FZA lensless camera.

Results
A lensless camera using a QHY163M CMOS image sensor is designed and arranged as shown in Fig. 1b where r 1 denotes the radius of the innermost zone and r is the radial distance from the center of the aperture. In addition, "sgn" is the signum function that is +1 for a positive number or −1 for a negative number. We use three FZAs with different r 1 to test the imaging resolution. The thickness of the photomask is 2 mm, and the thickness of the protective glass is 1 mm. Thus, the distance between the mask and the sensor is 3 mm. An LCD monitor with 1920 × 1080 resolution is placed 300 mm from the FZA. The test images are displayed on the screen, and the sensor records the rays from the screen that are modulated by the mask. For each test image, we adjusted the exposure time according to the scene intensity to avoid overexposure or underexposure.
The FZA imaging follows the pinhole imaging model. The position of each reconstructed image point on the sensor plane is at the intersection of the chief ray passing through the center of the FZA and the sensor plane (Fig. 2a). The magnification of the system is derived from the geometrical relationship: where h o is the object height and h i is the image height. The field of view (FOV) is expressed as The binary, grayscale and color images are tested in our experiments. The original images, the corresponding measured images, and the reconstruction results are shown in Fig. 2b. The original image is 20 cm × 20 cm when displayed on screen. The measured images are monochrome images with 16-bit depth. For binary and grayscale images, the closeups of the raw images normalize the grayscale of the measured images. The BP and CS algorithms are used to reconstruct images from measured images. According to the magnification, the central 526 × 526 portions of each image are cropped to be shown in Fig. 2b.
The FZA lensless camera has the capacity for color imaging as well. Because the imaging model is based on geometrical optics, the reconstruction is not influenced by the wavelength of the object. By using an RGB sensor, the intensity of different wavelength bands can be recorded by the RGB channels independently and the reconstruction algorithm applied to the three channels. Then, the reconstructed images of the three channels are combined into the final color image (Fig. 2c).

Resolution
In practice, the sensor and the FZA mask both have finite sizes, which means that the reconstruction is band-limited. If the sensor can completely record the FZA shadow and the pixel pitch is small enough to satisfy the sampling theorem, the frequency range is limited by the number of recorded zones, which in turn depends on the radius of the FZA. The larger the radius, the finer the zones that are included. Thus, the resolution of the reconstructed image can be improved by increasing the aperture radius R. On the other hand, the FZA constant r 1 also determines the number of zones within a fixed aperture radius R. Improving the resolution of the reconstructed image can also be realized by shrinking the FZA constant r 1 .
Quantitative analysis is carried out by means of the coherent impulse response (CIR). As shown in section "Resolution", the CIR of the imaging system is calculated as where R is the radius of FZA and J 1 (·) is the first-order Bessel function of the first kind. Because the intensity of the object is real-valued, only the real part of the CIR should be considered. According to the Rayleigh criterion, the minimum distance between resolvable points is defined as the distance from the center to the first zero of the CIR. The first zero of the real part of the exponential term is 0.707r 1 , whereas the first zero of the order-one Bessel function is 0:61 r 1 =R ð Þr 1 . Since r 1 << R, the resolution is determined by Assuming the FZA contains n zones, namely, R ¼ ffiffiffi n p r 1 , then the width of the outermost zone is Þ. In terms of Δr, the resolution is expressed as Equation (6) reveals the simple yet useful result that the resolution of an FZA imaging system is approximately equal to the width of the outermost zone. Taking the system magnification M into account, the resolution at the object plane is r 0 Figure 3 shows the CIRs and the corresponding reconstructed images with different values of r 1 . The aperture radius R = 5.12 mm, and the values of r 1 are 0.8, 0.5, and 0.3 mm. The corresponding values of r c and Δr are labeled in Fig. 3, which agrees with Eq. (6). The smaller FZA constant r 1 yields higher-quality reconstructions, which verifies the above discussion. Note that the resolution improvement is limited because when the FZA constant decreases, diffraction becomes noticeable, which obfuscates the reconstruction.
A negative USAF 1951 test chart is displayed on the monitor to experimentally test the resolution. The displayed size has a magnification three times larger than the standard size. Three FZAs with the same aperture radius R and different FZA constants r 1 are used for comparison. The FZA constants r 1 are 0.56, 0.32, and 0.25 mm. The aperture size is approximately equal to the sum of the whole image size and object imaging size, and the aperture radius is 4.56 mm. The results are shown in Fig. 4. For r 1 = 0.56 mm, group-2/ element 5 can barely be resolved (Fig. 4b), which represents that the minimum resolved distance is 3.78 mm at the object plane (or 0.038 mm at the image plane, after multiplying by the magnification 0.01). For r 1 = 0.32 mm, group 0/element 1 can barely be resolved (Fig. 4c), and the corresponding minimum resolved distance is 1.5 mm in the object plane and 0.015 mm in the image plane. The experimental values are close to the theoretical values, which are 0.042 mm and 0.014 mm calculated by Eq. (5), respectively. When r 1 reduces to 0.25 mm, the resolution does not improve significantly (Fig. 4d); that is, the diffraction effect limits the resolution.

Noise and error analysis
Except for the twin-image noise, the measurement can be corrupted by other noises and errors originating mainly from three sources: sensor, mask, and diffraction. The sensor noise consists of quantization noise and dark current noise. The effect of this noise on the reconstruction is relatively moderate and can be reduced by improving bit depth and increasing exposure time.
The mask error is caused by binarization of transmittance. The binary FZA, as a substitute of the GZP, consists of only transparent and opaque zones. However, in comparison to the GZP, which has only one pair of conjugate foci at f ¼ ± r 2 1 =λ, the FZA has multiple foci with corresponding focal lengths f ¼ ± r 2 1 =nλ; n ¼ 1; 3; 5:::. The property can be explained by using Fourier expansion. The Fourier series representation of this binary function is The derivation can be seen in the supplementary information (Section S1). Equation (7) states that the intensity transmittance function of the FZA is a line0ar superposition of a series of GZPs with different focal lengths. The higherorder focal lengths cause a series of defocused images at the first-order focal plane, which degrades the image quality. This issue can be remedied by using a binary Gabor zone plate (GZP) that has only one pair of conjugate foci 39,40 .
Increasing the regularization coefficient appropriately can also effectively suppress these defocused noises.
With a decreasing zone width, diffraction effects become noticeable. The diffracted pattern is no longer an ideal FZA pattern. To observe the diffraction phenomena, a white collimating LED light is adopted to illuminate the FZA mask. The FZA pattern with r 1 = 0.32 mm and the corresponding diffracted pattern are shown in Fig. 5a. The cross section of the radial intensity is shown in Fig. 5b. The contrast drops to the lowest value around a radius of 500 pixels away from the center due to the diffraction propagation law of objects with periodic transmittance. Since the contrast of the diffraction fringe of grating changes periodically along with propagation distance, the contrast is minimized when the propagation distance is an odd number of half Talbot lengths 41 ; that is, where p is the grating period. The distance is fixed, but the period p ¼ r 2 1 =r varies with the radius. Thus, the contrast also varies with the radius. This decrease in contrast results in model error and limits the reconstruction resolution. The mask can be improved by adopting a well-designed diffraction optical element to obtain the desired diffraction pattern in future work.

Outlook
We have proposed an FZA lensless imaging method under incoherent illumination using computational reconstruction. CS theory provides a guarantee for accurate signal reconstruction. The proposed camera consists of an image sensor and an FZA mask and thus is thin and compact. The incoherent rays pass through the FZA and cast object-dependent shadows on the sensor plane to compose the raw image. This mechanism is somewhat analogous to that of inline holography. Unlike conventional BP, which produces twin images, the compressive sensing algorithm with total variation (TV) regularization that we adopt here eliminates the twin-image effect and reconstructs the image with reduced noise.
The imaging resolution of this mask-based camera still has much room for further improvements by modifying the mask so that the diffracted pattern coincides with the expected pattern. The prospect of FZA imaging using such a relatively thin setup can lead to all kinds of applications. One is a portable camera to image various objects from a person in a room to the landscape outdoors. This approach can fulfill a range of recognition tasks, such as object detection, character recognition, and face recognition. Furthermore, the proposed camera can be monolithically fabricated by depositing the FZP pattern on the cover glass of the sensor so that it can be readily integrated with portable devices or any flat surface. It is suggested that this

Imaging model
An ideal zone plate is called a GZP. Its amplitude transmission function is However, such a zone plate is difficult to manufacture because of its sinusoidal variation transmittance. The FZA with binary transmission is a more practical alternative mask. For the sake of derivation, we substitute the transmission function of the GZP for the FZA in subsequent content. The object is placed at a distance of z 1 from the FZA and illuminated by an incoherent light source. The FZA is placed in front of an image sensor at distance z 2 . The object surface diffuses the light and can be considered a superposition of point sources. Each point source casts an FZA shadow on the sensor plane (Fig. 6a). The shadow center is at the intersection of the chief ray and the sensor plane. The size of the shadow is expanded from the FZA by the magnification factor (1 + z 2 /z 1 ). Then, the FZA constant of the shadow would be r 0 Thus, the image formed on the sensor is a superposition of shifted and scaled versions of FZA shadows. When z 1 >> z 2 , the FZA constant r 1 ′ is almost equal to r 1 . In this case, the imaging process can be formulated as where I(r) is the intensity distribution on the sensor plane; I k is the intensity of the kth point source; the vector r is the arbitrary position vector, and r k is the translation vector of the kth shadow in the sensor plane. Each FZA shadow can be considered a point source hologram that encodes the intensity and the location of the point source. All these elementary holograms synthesize the final measurement. Then, the reconstruction can be performed by coherent propagation (Fig. 6b). Both optical and computational methods are available for the reconstruction.
In the Fresnel approximation, the reconstructed wavefront may be written as where λ and d represent the wavelength of the reconstructed wavefront and the reconstructed distance, respectively. To ensure correct reconstruction, the two parameters should satisfy r 2 1 ¼ λd. The vector r o is the position vector in the reconstructed plane. Here, dS is the area element in the measured image. For the ideal case, the integral area is infinite. Expanding the cosine term in Eq. (10) into an exponential form and substituting it into Eq. (11) while ignoring the constant coefficient, Eq. (11) becomes The first term is a constant term that is proportional to the total intensity of the object. The second term is a set of points appearing at the same locations as the The third term is the superposition of spherical waves propagating from distance 2d. It can be considered an out-of-focus image, which is the so-called twin image. The inherent twin image obscures the reconstruction. The most common methods to remove twin images by experimental means, such as off-axis holography 42 and phase-shifting 43 , are not applicable in this case. Instead, we use a computational guarantee in the form of compressive sensing toward a twin-image-free reconstruction.
For the finite size in the real system, the reconstructed image point is no longer a delta function. By introducing the aperture function A r ð Þ ¼ circ r=R ð Þ into the integral, the CIR of the imaging system after eliminating the twin image is calculated by setting r k = 0 and I k = 1 in the second term of Eq. (12), which is Reconstruction algorithm The captured image may be represented as the convolution of the ideal image and FZA shadow by rewriting Eq. (10) as where "*" denotes the convolution. O(x, y) is the image to be restored on the sensor plane. e(x, y) is a random term that includes photodetector noise, crosstalk, quantization noise and artifacts caused by diffraction. If we divide the cosine term of T(x, y) into h x; y and h* is the conjugate of h to express T(x, y), Eq. (14) becomes where C ¼ O x; y ð Þ ð1=2Þ is a constant. Here, h(x, y) has the same form as the Fresnel propagation kernel exp i π=λd ð Þ x 2 þ y 2 ð Þ ½ when r 2 1 ¼ λd. U x; y ð Þ can be regarded as the diffracted wavefront of propagating at the virtual distance of d and the virtual wavelength of λ. U Ã x; y ð Þ is the conjugate wave of U x; y ð Þ. Equation (15) indicates that the measurement has the same form as the inline hologram except for the background intensity U j j 2 .
Let us denote by N x N y ¼ N xy the number of im O(x, y) age samples. The constant term in Eq. (15) can be removed by filtering out the direct current component. Then, the measured image I 2 R N xy is expressed as a function related to O 2 R N xy , which is the forward transform model: where F and F −1 are the Fourier transform operator and inverse Fourier transform operator, respectively, and H indicates the operator that multiplies by the transfer . Since O is a real function and H is a central symmetry function, Eq. (16) can be written as where H T indicates the operator multiplied by (17) is the representation of Eq. (14) in the frequency domain. Let F 2 C N xy N xy be a 2D discrete Fourier transform matrix. Denote Σ as a diagonal matrix whose nonzero entries are the discrete value of H T . Then, the observation I is given by I = KO, where K is Solving O with a known I and forward transform K is a typical inverse problem. The solution is not unique because any value can be freely assigned to the imaginary part of U(x, y) in Eq. (15). To remedy this issue, a priori knowledge as a regularization item should be integrated into the process of image reconstruction to obtain a stable solution. For natural images, the gradient distribution tends to be zero, and object O can be regarded as sparse in the gradient domain. In contrast, the twin image generates a diffuse pattern, which is nonsparse in the gradient domain. Thus, the sparsity constraint can eliminate the twin image.
Therefore, the reconstruction can be realized by minimizing the objective function: where Φ is the regularizer imposing the sparsity constraint and ||·|| 2 denotes the ' 2 norm. The regularization parameter τ controls the relative weight of the two terms. To guarantee accurate reconstruction, the observation matrix should be subject to a restricted isometry property (RIP) condition according to compressive sensing theory 44 . Note that K is a block-circulant matrix, which is under the RIP condition with very high probability 45 . With a suitable regularizer, the objective function can converge rapidly and acquire a good result.
Since Kis related to the reconstruction distance, the method can realize numerical focusing for 3D scenes. The axial resolution is analyzed in the supplementary section (Section S2).

Total variation denoising
Exploiting the sparsity of the unknown image can significantly enhance reconstruction performance. The ' 1 norm and TV are widely used regularization methods. For natural scenes, TV regularization usually works better than ' 1 regularization. The TV of an image is given by the sum of magnitudes of the image gradients as where Δ h i and Δ v i denote the horizontal and vertical firstorder local difference operations, respectively. Since the in-focus object has sharp edges while the out-of-focus twin image is diffuse, the TV of the in-focus image is much less than that of the twin image. can be represented in the form of the following TV minimization problem: We adapt the two-step iterative shrinkage/thresholding (TwIST) algorithm 46 to solve Eq. (21). The TwIST algorithm is a nonlinear two-step iterative version of the iterative shrinkage/thresholding (IST) algorithm to improve the convergence rate. The TwIST algorithm provides ' 1 regularization by default. It can also be extended to TV regularization by passing a function handle. The corresponding code can be found in GitHub 47 . The performance of TV regularization and ' 1 regularization is shown in Fig. 7. In this case, z 1 = 200 mm, z 2 = 3 mm, and the size of the object is 200 mm × 200 mm. The mean square error (MSE) is introduced to evaluate reconstruction quality quantitatively. The MSE of the reconstruction error is defined as Figure 7a shows the BP reconstruction calculated by Eq. (12); the original image is immersed in the noise caused by the twin image. Figure 7b, c shows the reconstructions by ' 1 and TV regularization, respectively. Both methods are able to eliminate the twin image effectively, but the TV regularization result has higher contrast and smaller MSE than the ' 1 regularization result. Moreover, TV regularization not only eliminates the twin image but also suppresses noise.
To evaluate the noise immunity capability, zero-mean Gaussian noise with 0.01 variance is added to the observed image. The BP reconstruction, the reconstruction by ' 1 and TV regularization from a noised observed image are shown in Fig. 7d-f, respectively. The ' 1 regularization result is not satisfactory due to residual noise. By contrast, TV denoising is remarkably effective in smoothing away noise while preserving edges. These results are obtained with 200 iterations. Figure 7g shows that the MSE varies with the iterations (see also Movie 1 for details). Reconstruction with TV regularization is used in our experiments.

Fabrication
The FZA mask is fabricated on soda-lime glass substrates with a thickness of 2 mm by applying the laser direct writing technique. A Cr layer (140 nm) is deposited on the substrates by vacuum evaporation. The surface of the Cr layer is oxidized for antireflection. A photoresist layer (1 μm) is then spin-coated on the Cr layer. This blank photomask is exposed under a scanning laser beam to create a latent image in the photoresist layer. After exposure, unexposed parts of the photoresist can be removed in a developer (5-7‰ NaOH solution). Subsequently, by immersion in an etchant ((NH 4 ) 2 Ce(NO 3 ) 6 + HCIO 4 solution), the open area of the Cr layer is etched away to form a transparent zone, while the Cr layer protected by the photoresist is not etched to form an opaque zone. Finally, the remaining photoresist is stripped by immersion in a high concentration of developer.