Progressive compressive sensing of large images with multiscale deep learning reconstruction

Compressive sensing (CS) is a sub-Nyquist sampling framework that has been employed to improve the performance of numerous imaging applications over the last 15 years. Yet, its application to large, high-resolution imaging remains challenging in terms of the computational and acquisition effort involved. Often, low-resolution imaging is sufficient for most of the considered tasks, and only a fraction of cases demand high resolution; the problem is that the user does not know in advance when high-resolution acquisition will be required. To address this, we propose a multiscale progressive CS method for high-resolution imaging. The progressive sampling refines the resolution of the image while incorporating the already sampled low-resolution information, making the process highly efficient. Moreover, the multiscale property of the progressively sensed samples is capitalized on for fast deep learning (DL) reconstruction, otherwise infeasible due to practical limitations of training on high-resolution images. The progressive CS and the multiscale reconstruction method are analyzed numerically and demonstrated experimentally with a single pixel camera imaging system. We demonstrate 4-megapixel progressive compressive imaging with about half the overall number of samples, more than an order of magnitude faster reconstruction, and improved reconstruction quality compared to conventional CS approaches.

Fig. A1. The Paley ordered Hadamard matrix can be constructed from two lower-level Paley ordered Hadamard matrices; demonstration on the Hadamard matrix $R_{n=3}$ of size $2^3 \times 2^3$. The upper half of the matrix represents the process of decimation, while the lower half represents the high-frequency detail extraction.

One-Dimensional Multiscale Hadamard Transform
The Paley ordered Hadamard transformation of a signal $f \in \mathbb{R}^{2^n}$, defined by the multiplication of $f$ by the matrix $R_n \in \mathbb{R}^{2^n \times 2^n}$, can be expressed as the linear set of equations

$$g = R_n f, \tag{A1}$$

where $R_n$ is defined recursively by (2). To show the multiscale property of the Hadamard transform, we explore the Hadamard-Haar relations. The Haar wavelet transform can be defined recursively with the help of the unitary matrix 1

$$W_n = \frac{1}{\sqrt{2}} \begin{pmatrix} W_{n-1} \otimes (1 \;\;\; 1) \\ I_{n-1} \otimes (1 \; {-1}) \end{pmatrix}, \tag{A2}$$

where $W_0 = 1$ and $I_n \in \mathbb{R}^{2^n \times 2^n}$ is the identity matrix. Then, the Haar transform of the signal $f \in \mathbb{R}^{2^n}$ is given by

$$s_n = W_n f = \frac{1}{\sqrt{2}} \begin{pmatrix} s_{n-1} \\ d_{n-1} \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} W_{n-1} a_{n-1} \\ d_{n-1} \end{pmatrix}, \tag{A3}$$

where $a_{n-1} \in \mathbb{R}^{2^{n-1}}$ and $d_{n-1} \in \mathbb{R}^{2^{n-1}}$ are the $(n-1)$-order approximation (lower resolution) and detail parts of the Haar wavelet transform, respectively, defined by

$$a_{n-1} = \left( I_{n-1} \otimes (1 \;\;\; 1) \right) f, \qquad d_{n-1} = \left( I_{n-1} \otimes (1 \; {-1}) \right) f. \tag{A4}$$
The wavelet transform of $f_4$ is illustrated in Fig. A2. To show the multiscale property of the Hadamard transform, we prove that if we separate $g$ obtained by (A1) into an upper half $u \in \mathbb{R}^{2^{n-1}}$ and a lower half $v \in \mathbb{R}^{2^{n-1}}$, so that

$$g = \begin{pmatrix} u \\ v \end{pmatrix}, \tag{A5}$$

we get

$$u = \frac{1}{\sqrt{2}} R_{n-1} a_{n-1}. \tag{A6}$$

Because $a_{n-1}$ is the approximation (lower resolution) of the signal $f$, the upper half of the sample vector $g$ is proportional to the Hadamard transform of the lower resolution level of $f$. Furthermore, because (A6) has the same form as Eq. (A1), the same treatment can also be applied to $u$, meaning that the upper half of the vector $u$ is proportional to the Hadamard transform of the lower resolution of $a_{n-1}$, and so on. Thus, (A6) defines a recursive relation between the Hadamard measurements and the multiscale approximation levels.

To prove (A6), we use the recursive definitions of $R_n$ (Eq. (2)) and $W_n$ (Eq. (A2)). By using the Kronecker product relation 2 $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, we obtain

$$R_n W_n^{T} = \begin{pmatrix} R_{n-1} W_{n-1}^{T} & 0 \\ 0 & R_{n-1} \end{pmatrix}. \tag{A8}$$

Because $W_n$ is unitary, we can write

$$g = R_n f = R_n W_n^{T} W_n f = R_n W_n^{T} s_n = \frac{1}{\sqrt{2}} \begin{pmatrix} R_{n-1} W_{n-1}^{T} s_{n-1} \\ R_{n-1} d_{n-1} \end{pmatrix}. \tag{A9}$$

After separating $g$ into an upper and a lower half, we get

$$u = \frac{1}{\sqrt{2}} R_{n-1} W_{n-1}^{T} s_{n-1}, \qquad v = \frac{1}{\sqrt{2}} R_{n-1} d_{n-1}, \tag{A10}$$

and because $s_{n-1} = W_{n-1} a_{n-1}$ and $W_{n-1}^{T} W_{n-1} = I_{n-1}$, we get (A6) as intended.
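For concreteness, the relation (A6) can also be checked numerically. The following NumPy sketch (illustrative, not part of the original derivation) builds $R_n$ by the recursion of Eq. (2) with $1/\sqrt{2}$ normalization, an assumption consistent with the equations above, and confirms that the upper half of $g$ equals $\frac{1}{\sqrt{2}} R_{n-1} a_{n-1}$:

```python
# A minimal NumPy sketch (not the paper's code) that numerically verifies the
# multiscale relation (A6): the upper half of g = R_n f equals
# (1/sqrt(2)) * R_{n-1} a_{n-1}, with R_n built by the recursion of Eq. (2).
import numpy as np

def paley_hadamard(n):
    """Normalized Paley ordered Hadamard matrix R_n of size 2^n x 2^n."""
    R = np.array([[1.0]])
    for _ in range(n):
        top = np.kron(R, [[1.0, 1.0]])        # decimation half
        bottom = np.kron(R, [[1.0, -1.0]])    # detail-extraction half
        R = np.vstack([top, bottom]) / np.sqrt(2.0)
    return R

n = 3
f = np.random.randn(2 ** n)

g = paley_hadamard(n) @ f                     # Hadamard samples, Eq. (A1)
u = g[: 2 ** (n - 1)]                         # upper half of g, Eq. (A5)

a = np.kron(np.eye(2 ** (n - 1)), [1.0, 1.0]) @ f   # approximation, Eq. (A4)
u_ref = paley_hadamard(n - 1) @ a / np.sqrt(2.0)    # right-hand side of (A6)

print(np.allclose(u, u_ref))  # True: the upper half of g is the Hadamard
                              # transform of the lower-resolution signal
```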

Two-Dimensional Multiscale Hadamard Transform
In this subsection, we generalize the property (A6) to two-dimensional signals (images). The two-dimensional Hadamard transform of an image $F \in \mathbb{R}^{2^n \times 2^n}$ can be written as 1

$$G = R_n F R_n. \tag{A11}$$

To show the multiscale property of the Hadamard transform in two dimensions, we will use Eq. (A6). First, we write

$$G = R_n \left( f_1 \;\; f_2 \;\; \cdots \;\; f_{2^n} \right) R_n = \begin{pmatrix} u_1 & u_2 & \cdots & u_{2^n} \\ v_1 & v_2 & \cdots & v_{2^n} \end{pmatrix}, \tag{A12}$$

where $f_i \in \mathbb{R}^{2^n}$ are the column vectors of $F$, $u_i \in \mathbb{R}^{2^{n-1}}$ are the upper halves of the column vectors of $G$, and $v_i \in \mathbb{R}^{2^{n-1}}$ are the lower halves.
By applying the one-dimensional wavelet transform to the image columns and using Eq. (A6), we get

$$\left( u_1 \;\; u_2 \;\; \cdots \;\; u_{2^n} \right) = \frac{1}{\sqrt{2}} R_{n-1} \left( a_{n,1} \;\; a_{n,2} \;\; \cdots \;\; a_{n,2^n} \right) R_n. \tag{A14}$$

Equation (A14) means that the upper part of the matrix $G$ represents the Hadamard transform of the approximation (lower resolution) along the columns of the image $F$ (see Fig. A3).
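The two-dimensional relation (A14) can be checked numerically in the same way as the one-dimensional case; the sketch below is again illustrative and repeats the `paley_hadamard` helper for self-containment:

```python
# A companion sketch for the 2D case, Eq. (A14): the upper half of the rows of
# G = R_n F R_n equals (1/sqrt(2)) * R_{n-1} A R_n, where A holds the
# column-wise Haar approximations of F.
import numpy as np

def paley_hadamard(n):  # same construction as in the previous sketch
    R = np.array([[1.0]])
    for _ in range(n):
        R = np.vstack([np.kron(R, [[1.0, 1.0]]),
                       np.kron(R, [[1.0, -1.0]])]) / np.sqrt(2.0)
    return R

n = 3
F = np.random.randn(2 ** n, 2 ** n)
Rn, Rn1 = paley_hadamard(n), paley_hadamard(n - 1)

G = Rn @ F @ Rn                                  # 2D Hadamard transform (A11)
U = G[: 2 ** (n - 1), :]                         # upper half of G

D = np.kron(np.eye(2 ** (n - 1)), [1.0, 1.0])    # column decimation operator
A = D @ F                                        # approximations a_{n,i}
print(np.allclose(U, Rn1 @ A @ Rn / np.sqrt(2.0)))  # True
```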

B. DEEP LEARNING NETWORK ARCHITECTURE
The proposed deep Convolutional Neural Network (CNN) architecture is illustrated schematically in Fig. B1(a). Due to the way we reconstruct the compressed image, we can only use CNN frameworks that operate by pre-upsampling 3 the low-resolution image. Therefore, for our application, we considered networks such as VDSR 4 and DRRN 5, and network design strategies introduced in ResNet 6, SRDenseNet 7 and RDN 8. Inspired by VDSR 4, the proposed Compressive Multi-Scale network (CMSnet) receives an approximation of the image and returns the predicted residual between the estimation and the original image. However, instead of using standard convolutional layers (as done in VDSR), we use five concatenation blocks (CBs), illustrated schematically in Fig. B1. In a similar way to ResNet 6 and SRDenseNet 7, we add a skip connection to each CB, followed by a depth concatenation instead of an addition. As in the previous frameworks, we add a rectified linear unit (ReLU) after each convolutional layer. The purpose of the skip connection is to propagate a coarse approximation and to mitigate vanishing gradients 3,9. After the concatenation, we add an additional convolutional layer. The purpose of this layer is to compress the number of features by a factor of two, in effect forcing the network to use a weighted average of the features after the skip connection instead of an addition (see Tables B1 and B2 for block sizes).
We summarize CMSnet's layers in Table B1 and the CB layers in Table B2. For each layer, we present the dimensions of the output feature maps (activations) and the dimensions of the weights and biases of the convolutional layers.
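The following PyTorch sketch captures this structure. It is a minimal illustration, not the authors' released code; the channel width (64), kernel size (3), and single-channel grayscale input are placeholder assumptions, and the actual dimensions are those listed in Tables B1 and B2.

```python
# Illustrative sketch of a concatenation block (CB) and the CMSnet skeleton.
import torch
import torch.nn as nn

class ConcatBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
        )
        # Compression layer: halves the feature count after concatenation,
        # forcing a learned weighted average of the skip and the new features.
        self.compress = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.conv(x)
        y = torch.cat([x, y], dim=1)   # skip connection via depth concatenation
        return self.compress(y)

class CMSnetSketch(nn.Module):
    """Residual-predicting network in the spirit of VDSR: the input is the
    coarse image approximation; the predicted residual is added back."""
    def __init__(self, channels=64, num_blocks=5):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)
        self.body = nn.Sequential(*[ConcatBlock(channels) for _ in range(num_blocks)])
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, approx):
        return approx + self.tail(self.body(self.head(approx)))
```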
The proposed CMSnet was trained on a combination of three image datasets: DIV2K 10, which contains 800 images, Flickr2K 11, which contains 2650 images, and Outdoor Scene Training 12, which contains over 10000 images. Each image was converted to a grayscale representation and cropped to an n-by-n resolution. We trained a separate network for each range of compression ratios, in steps of 0.1 (e.g., M/N = 0.1-0.2, 0.2-0.3, ...). The test and validation sets consisted of images separate from the training data.
One of the most important aspects of the proposed multiscale sensing scheme is the variable density sampling. In contrast to random compressive sampling with Hadamard patterns, variable density sampling makes the inverse Hadamard transform of the compressed samples closely resemble the original uncompressed image. This key aspect makes it possible to apply a convolutional neural network directly to this first approximation of the image, so the network can operate on images of any size. This is in contrast to alternative compressive sampling reconstruction methods, which would have to apply a fully connected layer in order to build this first approximation. Such a fully connected layer would be prohibitively expensive computationally, due to the large number of weights that must be optimized.
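To make this concrete, the following NumPy sketch shows one possible way (our assumption, not necessarily the paper's exact sampler) to draw variable-density Hadamard samples over dyadic sequency bands and form the zero-filled inverse transform that serves as the CNN input. `paley_hadamard` is the helper from the Appendix A sketches, and the band layout and probabilities are illustrative.

```python
# Variable-density Hadamard sampling sketch: keep low-sequency coefficients
# densely, sample high-sequency bands sparsely, and invert the zero-filled
# spectrum to obtain the first image approximation fed to the CNN.
import numpy as np

def first_approximation(F, n, keep_prob):
    """F: 2^n x 2^n image; keep_prob[i]: sampling probability of dyadic band i."""
    R = paley_hadamard(n)                 # from the earlier sketch
    G = R @ F @ R                         # full Hadamard spectrum (simulation only)
    mask = np.zeros_like(G, dtype=bool)
    lo = 0
    for i, p in enumerate(keep_prob):     # band i: L-shape between dyadic squares
        hi = 2 ** (i + 1)
        band = np.zeros_like(mask)
        band[:hi, :hi] = True
        band[:lo, :lo] = False            # exclude the previous (lower) bands
        idx = np.flatnonzero(band)
        mask.flat[idx[np.random.rand(idx.size) < p]] = True
        lo = hi
    return R.T @ (G * mask) @ R.T         # zero-filled inverse transform
```

For a 128 by 128 image (n = 7), for instance, `keep_prob = [1, 1, 1, 1, 0.8, 0.4, 0.1]` keeps all coefficients up to the 16 by 16 scale and samples the finer bands progressively more sparsely.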
To demonstrate this aspect, in Fig. B2 we show what the reconstruction of a high-resolution image with the CNN method would look like after sampling the scene with random Hadamard patterns. We reconstruct the image with the same proposed CNN, but trained on the random Hadamard patterns. We compare the result with our sensing and reconstruction approach, trained on the same CNN but utilizing the multiscale property of the Hadamard sampling matrix. Notice that the scene was sampled with the same number of Hadamard samples in both cases. The only difference is that in the naïve approach we choose the samples uniformly at random, whereas in the proposed approach we choose the samples at random but with variable density, favoring low-sequency samples over high-sequency ones. From Fig. B2 it is apparent that it is very difficult to reconstruct an image with a CNN after taking random Hadamard samples, contrary to iterative, non-CNN methods, which might reconstruct the image in such a case. This extremely poor reconstruction result can be explained by the fact that many very low-resolution samples are missing from the compressive sampling process. As can be seen in Fig. B2(c), the inverse Hadamard transform of the random Hadamard samples is missing much of the low-frequency and DC information, resulting in large block artifacts. Because the network is trained only on small 42 by 42 patches, it cannot reconstruct an image that has patterns of a larger size. In contrast, the inverse Hadamard transform of the image sampled with the proposed multiscale approach (Fig. B2(f)) contains all the required DC and low-frequency information, thus making it possible to use a convolutional network that was trained on small patches.
The most straightforward way to apply the CMSnet to the compressed multiscale Hadamard samples is to apply it directly to the inverse Hadamard transform of the compressed samples, as shown in Fig. B3.

Fig. B3. Reconstruction of the compressed multiscale Hadamard samples directly by the CMSnet.
To improve the reconstruction quality, it is also possible to apply the reconstruction iteratively. We can re-impose the measured compressive samples on the newly reconstructed image and apply the CMSnet again to improve the quality (see Fig. B4 and the sketch below). This process can be repeated until there is no further improvement in reconstruction quality; in our experiments, the image quality no longer improved after three iterations. This reconstruction method works well; however, the quality can be further improved by using the multiscale information about the image that was obtained during the sampling process. We reconstruct each scale of the image from low to high resolution. A heuristic way to see how this multiscale information helps the reconstruction is to look at the sampling patterns of the Hadamard matrix (see Fig. B5). In Fig. B5 we start from the initial sampling map of the 2D Hadamard transform, ordered by sequency. The samples taken are depicted in beige and the missing samples in black. Using the multiscale property of the Hadamard transform, we estimate the 128 by 128 image from the samples with the proposed CNN. We can then fill in the missing (black) samples by taking the Hadamard transform of the estimated image. The process is repeated iteratively until the final resolution is achieved.
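The iterative refinement loop can be summarized schematically as follows. In this sketch, `cmsnet` stands for the trained network wrapped as a function on image arrays, `R` is the normalized Paley ordered Hadamard matrix from the earlier sketches, and the variable names are ours, not the paper's.

```python
# Schematic of the iterative refinement: CNN pass, then re-imposing the
# measured Hadamard samples (a data-consistency step), repeated a few times.
import numpy as np

def refine(approx, g_measured, mask, R, cmsnet, iters=3):
    """approx: current image estimate; g_measured: measured Hadamard
    coefficients (zeros elsewhere); mask: True where a sample was taken."""
    x = approx
    for _ in range(iters):              # quality saturates after ~3 passes
        x = cmsnet(x)                   # CNN refinement
        G = R @ x @ R                   # back to the Hadamard domain
        G[mask] = g_measured[mask]      # re-impose the measured samples
        x = R.T @ G @ R.T               # return to the image domain
    return x
```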
Notice that in the final step, only the high-sequency samples are missing, while all the low-sequency points are already filled in by the estimation from the previous step. This way, there are no low-frequency patterns larger than the 42 by 42 patches that the network was trained on (and which it therefore could not reconstruct), and we prevent the reconstruction artifacts depicted in Fig. B2(d). The complete reconstruction procedure is performed in two steps. In the first step, we generate the first estimation of the image, and in the second step, we feed the estimation into the proposed CNN to obtain the final image.
At the first stage of the progressive sampling, where the first resolution level is captured and, hence, no lower-resolution information is available yet, the first estimation of the image is taken as the inverse Hadamard transform of the compressed Hadamard samples. In the following stages of the progressive sampling, the already available lower-resolution reconstruction is up-sampled by bicubic interpolation. After the interpolation, the image is converted to the Paley ordered Hadamard domain and its entries are substituted with the additional compressive Hadamard samples, ΔM. We then restore the image back to the spatial domain, this time including the newly measured compressive Hadamard samples. This approximation process is illustrated in Fig. B6. In the second step, we feed the obtained image into the CMSnet designed for this purpose. Finally, we reconstruct the image by adding the predicted residual to the estimated image. To improve the reconstruction quality, the second step can be applied iteratively (see Fig. B4).
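The first-estimation step at a new resolution level (Fig. B6) can be sketched as follows; this is again a schematic under the same assumptions as the previous snippets, with `cv2.resize` and `INTER_CUBIC` standing in for the bicubic up-sampling.

```python
# Schematic of the progressive first-estimation step: bicubic up-sampling of
# the previous reconstruction, substitution of the newly measured Hadamard
# samples (delta-M), and inverse transform back to the spatial domain.
import numpy as np
import cv2

def progressive_estimate(prev_recon, g_new, new_mask, R):
    """prev_recon: reconstruction at the previous (half) resolution;
    g_new/new_mask: additional Hadamard samples at the current scale;
    R: Paley ordered Hadamard matrix matching the new resolution."""
    size = 2 * prev_recon.shape[0]
    up = cv2.resize(prev_recon.astype(np.float32), (size, size),
                    interpolation=cv2.INTER_CUBIC)
    G = R @ up @ R                       # Paley ordered Hadamard domain
    G[new_mask] = g_new[new_mask]        # substitute newly measured samples
    return R.T @ G @ R.T                 # first estimation for the CMSnet
```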
A typical multiscale sampling procedure requires the user to prepare the variable sampling strategy beforehand. For example, suppose it was decided to use M/N = 0.125 compression at the 256x256 scale. Following variable density sampling 13, this would mean M/N = 0.91 at the 64x64 scale and M/N = 0.48 at the 128x128 scale. Now, assume that a human observer is in the loop, sampling the scene at its lowest resolution, 64x64. If the observer cannot identify the target at this resolution, they may request additional samples, predetermined by the sampling strategy. Because the 64x64 samples are reused, reaching the M/N = 0.48 ratio at 128x128 requires adding only about M/N = 0.26 worth of samples. If the quality is still unsatisfactory, the observer can request a further M/N = 0.004 worth of samples to reach the target of approximately M/N = 0.125 at the 256x256 scale. Following this scenario, only a single scale needs to be reconstructed by the CMSnet each time.
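The budget arithmetic of this scenario can be worked out explicitly. The snippet below reproduces the quoted increments up to rounding of the quoted ratios; samples already taken at a coarse scale count toward the cumulative budget of the finer scales.

```python
# Working the sampling-budget arithmetic of the scenario above.
targets = {64: 0.91, 128: 0.48, 256: 0.125}   # desired cumulative M/N per scale

taken = 0.0
for size, ratio in targets.items():
    N = size * size
    extra = ratio * N - taken                 # additional samples at this scale
    print(f"{size}x{size}: additional M/N = {extra / N:.4f}")
    taken = ratio * N
# -> 0.9100, then 0.2525 (~0.26 in the text), then 0.0050 (~0.004 in the text)
```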
To show the effectiveness of the proposed multiscale sampling and reconstruction approach, we compare our reconstruction method with other DL compressive reconstruction methods in Tables B3 and B4. The methods chosen for the first comparison are ReconNet 14, ISTA-Net 15 and CSNet 16. The comparison was run on Set11 14, containing eleven 256 by 256 grayscale images. Table B3 shows that our method is superior in terms of PSNR for this kind of application.
For the second comparison, we compare our method to CSGAN 17, ReconNet 14 and SCGAN 18, trained on cropped 64 by 64 face images from the CelebA 19 dataset. For testing, we used 20,000 face images from the test set. Table B4 shows that our method is again superior in terms of PSNR for this kind of application.
For the third comparison, we compared our multiscale sensing and reconstruction method with two adaptive sampling methods 20,21. Our method achieved 31.75 dB reconstruction PSNR from M = 12,796 compressive samples of the "Lena" image, while ADS 20 achieved 28.7 dB PSNR under the same conditions. On the "house" image, after simulating shot noise and detector readout noise, our method achieved 29.31 dB SNR at M/N = 6.25% compression, outperforming the wavelet tree parsing method 21 under the same conditions.

Table B4. PSNR and SSIM comparison between the proposed multiscale sampling and reconstruction method and alternative DL reconstruction methods on the CelebA dataset. The best PSNR and SSIM are highlighted in bold.

C. SPC EXPERIMENT RESULTS
In Fig. C1 below we present experimental results, taken with the multiscale CS SPC imager and reconstructed by CMSnet from only M/N = 1% compressive samples.