Abstract
Modern lens designs are capable of resolving greater than 10 gigapixels, while advances in camera frame rate and hyperspectral imaging have made data acquisition rates of Terapixel/second a real possibility. The main bottlenecks preventing such high-data-rate systems are power consumption and data storage. In this work, we show that analog photonic encoders could address this challenge, enabling high-speed image compression using orders-of-magnitude lower power than digital electronics. Our approach relies on a silicon-photonics front end to compress raw image data, forgoing energy-intensive image conditioning and reducing data storage requirements. The compression scheme uses a passive disordered photonic structure to perform kernel-type random projections of the raw image data with minimal power consumption and low latency. A back-end neural network can then reconstruct the original images with structural similarity exceeding 90%. This scheme has the potential to process data streams exceeding Terapixel/second using less than 100 fJ/pixel, providing a path to ultra-high-resolution data and image acquisition systems.
Introduction
From the invention of photography until the early 1990s, the function of a camera was to record analog images. With the development of solid-state focal planes and digital coding, however, this function changed, and modern digital cameras act as transceivers that transform massively parallel optical data streams into serial coded electronic data that is processed one pixel at a time^{1,2,3}. While this paradigm shift introduced numerous advantages, the power consumption associated with electronic digital processing, along with limits on data transmission rate and storage capacity, are now the major bottlenecks limiting image data acquisition rates^{1,2,3,4}. In contrast, compact lens designs are capable of resolving greater than 10 gigapixels of transverse resolution^{5,6}, while advances in multimodal imaging systems capable of acquiring spectral, polarization, temporal, and range information could enable future imaging systems to acquire 10^{12} (tera)pixels of data per second. However, the power consumption and resulting coding capacity of optoelectronic transceivers are among the primary barriers to achieving such systems^{1,2,3}.
In current digital-electronics-based image processing systems, electrical power consumption is proportional to the number of mathematical operations performed on each pixel^{1}. Conventional image signal processing (ISP) systems perform 100–1000 operations per pixel to first condition (e.g., compensate for pixel non-uniformity, hot pixels, denoising, etc.) and then compress the image data stream, resulting in a per-pixel cost of 0.1–1 microjoule^{1}. Refs. ^{1,2,3} show that in high-resolution gigapixel imaging systems operating at 30 frames/s, the image sensors draw 100 milliwatts/megapixel, whereas the back-end digital image processing pipeline draws approximately 10× more power, amounting to ~1000 milliwatts/megapixel. This implies that processing 1 terapixel of data would result in power consumption of around 1 megawatt, which is prohibitive for many applications. In addition, while image compression is required for most remote sensing applications, many of these pixel conditioning operations are performed at the front end regardless of whether they are necessary for a given application. Recently, different ISP approaches, such as blind sensor-head compression, were proposed to reduce the number of operations per pixel^{1}. Blind compression, in this context, can be understood as implementing the first layer or two of a deep-neural-network-based autoencoder on the readout data stream. While this approach substantially reduces the number of operations per pixel, the per-pixel power cost still remains unacceptably high, at ~0.01–0.1 microjoule^{1}.
Future Terapixel/second imaging systems (e.g., imaging systems with 10^{9} (giga)pixel resolution at frame rates of 10^{3} (kilo) Hz, or imaging systems with higher pixel resolution but slower frame rates) will require an alternative ISP approach with dramatically higher throughput and lower power consumption. In this work, we explore the potential for analog photonics to improve throughput and power consumption in an image acquisition pipeline after image formation on the focal plane array. This is in stark contrast to the significant body of work focused on image classification, inference, or compressed sensing of the raw scene information, which operates by processing the scene data directly in the analog optical domain at the image acquisition stage^{4,7,8,9,10,11}. While both schemes are valuable, we believe there are several important advantages to our approach, which has been relatively underexplored. First, conventional imaging optics and focal plane arrays are highly optimized, and it is difficult to improve on their baseline performance, particularly in wavelength regimes where high-resolution focal plane arrays are available. As a result, our approach does not attempt to alter the original image formation process. Instead, we designed an accelerator to address the two main bottlenecks limiting persistent, high-data-rate image acquisition: power consumption and compression speed. Second, by positioning the accelerator after the image formation and initial optical-to-electrical conversion step, this scheme is immediately compatible with any image acquisition system, regardless of operating wavelength, camera resolution, front-end optics, frame rate, or application. There is no added insertion loss before the initial detection stage, enabling compatibility with low-light and high-speed imaging applications.
Perhaps most significantly, our approach works with either ambient or active illumination, whereas many of the compressive imaging schemes that operate at the image acquisition stage require active illumination with a coherent source^{7,8}, which severely limits the application space.
Instead of modifying the original image formation process, our approach builds on the neural network framework proposed in the blind compression work^{1}, which allows the key front-end image processing task to be accomplished using a single matrix-vector multiplication. Fortunately, analog optical computing is particularly well-suited for this type of operation and is being explored for a variety of matrix-multiplication-heavy computing applications^{12,13,14,15}. The key advantage is that optical computing engines can perform matrix multiplication with energy consumption that scales linearly with the dimension of the input dataset (N), as opposed to the quadratic scaling (N^{2}) inherent to electronic approaches. In addition, optical computing engines are able to process N pixels in parallel with an overall speed that can exceed 100 GHz, limited only by the speed of the optical modulators and photodetectors used to encode the input data and record the result^{16,17,18}.
In this work, we show that an optical image processing engine can take advantage of these unique features to enable high-speed image compression with the potential for orders-of-magnitude lower power consumption than current techniques. Our approach is based on a passive, CMOS-compatible silicon photonics device that performs the matrix-vector multiplication required for front-end image processing. We experimentally demonstrate image compression with a ratio of 1:4 and develop a back-end neural network capable of reconstructing the original images with an average peak signal-to-noise ratio (PSNR) ~25 dB and structural similarity index measure (SSIM) ~0.9, comparable to common electronic software-based lossy compression schemes such as JPEG^{19}. By processing modest-sized kernels in series, our approach is inherently scalable and could process high-frame-rate, limited region-of-interest image data, or gigapixel images at slower data rates. Finally, by constructing the optical image compression engine on a silicon photonics platform, this approach meets the size, weight, and power and integration requirements for a wide range of applications including surveillance, microscopy, machine vision, astronomy, and remote sensing. Analysis of the throughput and power consumption of our optical image processing engine indicates that this technique has the potential to encode Terapixel/second data streams using <100 femtojoules per pixel, representing a >1000× reduction in power consumption compared with state-of-the-art electronic approaches. This improvement in throughput and power consumption is made possible by parallelization of pixel processing and by transferring the majority of the image processing and conditioning tasks (e.g., denoising, linearization, etc.) from the front-end encoding interface to the back-end decoding interface.
This tradeoff is particularly attractive for remote sensing and imaging applications in which image reconstruction is performed only on a need-to-know basis and is usually conducted at a remote cloud site with better access to power.
Results
Operating principle of the image encoder
The optical image compression scheme relies on an autoencoder neural network framework in which the compressed image is naturally formed at the “bottleneck” layer of a neural network^{20}. This approach has gained traction in recent years due to its ability to simultaneously perform data compression, dimensionality reduction, and denoising^{20,21,22}. In our implementation, the first half of the neural network (mapping the original image data to the compressed image at the bottleneck layer) is performed optically, while the second half of the network (reconstructing the image) is performed using digital electronics. A schematic of the optical encoder and corresponding neural network structure is shown in Fig. 1. Our hybrid optoelectronic autoencoder takes advantage of the fact that most neural encoding networks are not very sensitive to the details of the feature map (i.e., the weights and connections) implemented by the first few layers. In fact, researchers have shown that the first few layers can often be assigned random weights without compromising performance^{23}. This allows us to use a pre-designed, passive photonic device to perform the transform used in the first layer of the autoencoder network. In this case, we designed the photonic layer to perform local kernel-like random transforms on small blocks of the image at a time. This random encoding scheme was selected based on compressive measurement theory, which has shown that random transforms are well-suited to a variety of dimensionality reduction and compression tasks^{24,25,26,27,28}. However, unlike conventional compressed sensing measurements, which are applied at the data acquisition stage^{11}, we address the problem of dimensionality reduction and data compression after image formation.
As shown in Fig. 1, the silicon-photonics-based image encoder consists of N single-mode input waveguides, each with a dedicated modulator, followed by a multimode waveguide region, a random encoding layer, and M photodetectors. A laser (not shown) provides light with equal amplitude to the N input waveguides, where the N modulators encode a \(\sqrt{N}\times \sqrt{N}\) pixel block of the input image onto the amplitude of light transmitted through each waveguide. Light from each waveguide is then coupled into a multimode waveguide region before scattering through the random encoding layer and finally reaching the photodetectors. The random encoding layer consists of a series of randomly positioned scattering centers fabricated by etching air holes in the silicon waveguiding layer (see the “Methods” section for more details). Since the optical device operates in the linear regime, we can describe the encoding process using a single transmission matrix (T), which relates the input (I) to the transmitted output (O) as O = TI, where I is an N × 1 vector, O is an M × 1 vector, and T is an \(M\times N\) matrix. By forcing M to be less than N, the device effectively performs a single matrix multiplication to compress an N-pixel block of the original image into M output pixels. Since the random encoding layer is entirely passive, this compression process can be extremely fast, operating on N pixels in parallel at speeds limited only by the modulators and photodetectors. In addition, the energy consumption scales linearly with N (i.e., the number of modulators), even though the device performs \(M\times N\) operations.
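As a minimal numerical sketch of this encoding step, the O = TI matrix-vector picture can be written out directly. The dimensions below (an 8 × 8 pixel block compressed 1:4) are illustrative assumptions, not the parameters of the fabricated device:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) dimensions: an 8x8 pixel block (N = 64)
# compressed to M = 16 outputs, i.e., a 1:4 compression ratio.
N, M = 64, 16
T = rng.uniform(0.0, 1.0, size=(M, N))  # random intensity transmission matrix

def compress_block(block: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Compress one sqrt(N) x sqrt(N) pixel block with a single matrix-vector product."""
    I = block.reshape(-1)  # flatten the block into an N-element input vector
    return T @ I           # O = T I: the M detector readings

block = rng.uniform(0.0, 1.0, size=(8, 8))  # stand-in pixel intensities
O = compress_block(block, T)
print(O.shape)  # (16,)
```

Because T and the inputs are non-negative intensities, the outputs are non-negative, matching the incoherent (real-valued T) detection model discussed later.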
To further clarify the role of our data compression system, we can consider the entire image acquisition/compression process in four steps:

(1) Conventional imaging optics form an image on the focal plane array of the camera.

(2) Conventional focal plane array detectors convert the analog optical image to the electrical domain.

At this point, there are two options:

(3a) Most commercially available cameras are designed to digitize the image data recorded on the focal plane array. Using this type of camera, we would then use a digital-to-analog converter (DAC) to drive the optical modulators on-chip, re-encoding the image information in the optical domain on an optical carrier.

(3b) Focal plane arrays that provide a direct analog output are also commercially available^{29}. This analog output could be used to drive the optical modulators directly, re-encoding the image information without an intermediate digitization step. A transimpedance amplifier (TIA), with appropriate amplification, can convert the analog photocurrent to a voltage that drives the re-encoding modulation.

(4) The on-chip photonic encoder then performs high-speed, low-power compression, and the output from the on-chip detectors is digitized and stored for offline image reconstruction.
Using focal plane arrays that provide an analog output (option 3b) has the potential to significantly reduce the overall power consumption and improve the throughput by avoiding the intermediate analog-to-digital conversion (ADC) and DAC steps. However, our underlying approach (the photonic chip performing compression) is compatible with both options, which is useful given the ubiquity of cameras with integrated ADCs.
The local kernel size, N, is a key parameter driving the performance of the photonic image processing engine. While using smaller kernel-like transforms reduces the data throughput, since the device can then only compress N pixels at a time, smaller kernels also have several advantages. First, local transforms maintain the spatial structure of the original image, which tends to improve the image reconstruction, as we discuss in the next section. Second, the kernel approach can be used to compress arbitrarily large images without requiring a corresponding increase in the number of modulators and detectors. Third, using these local transforms helps to isolate noise from a given pixel (e.g., a hot pixel), which could otherwise spread across the entire compressed image. Finally, since this compression scheme effectively maps the input image blocks to speckle patterns, using a large kernel could lead to low-contrast speckles, which can also degrade the image reconstruction, similar to the trend observed in speckle-based spectrometers^{30}.
Effect of kernel size and kernel type on image compressibility
To determine the effect of kernel size on image compression, we performed numerical simulations of the image compression and reconstruction process using images taken from the DIV2K and Flickr2K datasets^{31,32} and synthetically generated random T matrices. To reduce the computation time, the images were converted to grayscale and cropped to a resolution of 512 \(\times\) 512 pixels. The dataset consisted of 4152 images divided into a 3650-image training dataset and a 502-image validation dataset. In this case, the compression process was simulated by multiplying each \(\sqrt{N}\times \sqrt{N}\) block of an image by a numerically generated random matrix consisting of real, positive numbers uniformly distributed between 0 and 1. We then trained a neural network to reconstruct the original image from the compressed image (see “Methods” for a detailed description of the neural network architecture and training routine). Finally, we used the test images from the DIV2K and Flickr2K datasets to evaluate the reconstructed image fidelity after compression using kernels of varying sizes.
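The simulated compression described above can be sketched in a few lines: tile the image into \(\sqrt{N}\times\sqrt{N}\) blocks and multiply each by a shared random matrix. The random test image below is a stand-in assumption; only the block and ratio arithmetic follows the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def compress_image(img: np.ndarray, k: int, ratio: int) -> np.ndarray:
    """Compress a grayscale image block-by-block with a k x k kernel.

    Each k*k-pixel block is multiplied by one shared random M x N matrix
    (N = k*k, M = N // ratio), mimicking the simulated encoder.
    """
    N = k * k
    M = N // ratio
    T = rng.uniform(0.0, 1.0, size=(M, N))  # real, positive random matrix
    h, w = img.shape
    out = np.empty((M, h // k, w // k))
    for i in range(h // k):
        for j in range(w // k):
            block = img[i * k:(i + 1) * k, j * k:(j + 1) * k]
            out[:, i, j] = T @ block.reshape(-1)
    return out

img = rng.uniform(size=(512, 512))        # stand-in for a grayscale test image
cube = compress_image(img, k=8, ratio=8)  # 8x8 kernel, 1:8 compression
print(cube.shape)  # (8, 64, 64)
```

The output shape matches the 8\(\times\)(64\(\times\)64) data cube quoted in the text for an 8\(\times\)8 kernel at a 1:8 ratio.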
An example image from the set of test images is shown in Fig. 2a along with the compressed images obtained using 8\(\times\)8 (Fig. 2b) and 32\(\times\)32-pixel kernels (Fig. 2e). In this case, we fixed the compression ratio \((M:N)\) at 1:8, and the images were compressed from 512\(\times\)512 pixels to 8\(\times \left(64\times 64\right)\) or \(128\times \left(16\times 16\right)\) pixel data cubes. The reconstructed images using the two kernel sizes are shown in Fig. 2c, f. Using a smaller kernel size clearly retains more of the spatial structure in the compressed image (Fig. 2b), resulting in a higher-fidelity reconstruction. The average peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) of the reconstructed images in the test image dataset are shown in Fig. 2d as a function of kernel size. We found that smaller kernels generally result in higher-quality image reconstruction. Unlike spatially uncorrelated and sparse data (which can be efficiently compressed using large random matrices), image data naturally includes spatial structure and spatial correlations, which are important to maintain. In general, the optimal kernel size will depend on the type of images being compressed and will be impacted by factors such as the sparsity and spatial frequency content of the images.
For benchmarking, we compared our compression technique using an 8\(\times\)8 kernel and a 16\(\times\)16 kernel to standard JPEG compression (which uses an 8\(\times\)8 kernel) on images from the same test dataset (DIV2K and Flickr2K). Figure 3a, b compares the average PSNR (a) and SSIM (b) of the test image dataset obtained using JPEG compression to our encoding scheme as a function of compression ratio. As shown in Fig. 3a, b, our approach provides slightly lower PSNR/SSIM than JPEG at low compression ratios (e.g., <1:16) and comparable PSNR/SSIM at intermediate ratios (1:32 and 1:64). Our approach also enables higher compression ratios than is possible using JPEG (which is limited to a ratio of 1:64) and maintains an acceptable average PSNR > 20 dB up to ratios of 1:256. Furthermore, unlike the photonic compression scheme, which operates at a fixed compression ratio, not all images from the test dataset could be compressed to a ratio of 1:64 using JPEG compression (only ~400 out of 500 test images could be compressed up to 1:64). Figure 3c, d shows the compression ratio dependence of the same example image as in Fig. 2a, reconstructed after compression using the photonic approach and using JPEG. For this image, the highest compression ratio that we could achieve using the JPEG algorithm was 1:45 (see Supplementary Information: Section S6. Comparison of JPEG and photonic compression for more examples). At ratios of 1:8 and 1:16, both techniques provide excellent-quality images. At higher compression ratios (>1:32), JPEG compression introduces pixelation artifacts, whereas the photonic compression scheme loses some of the higher spatial frequency content. These differences result from fundamental distinctions between the two compression schemes.
The JPEG compression algorithm applies a discrete cosine transform (DCT) to every 8 × 8-pixel block in an image^{19}. It then applies a thresholding operation to store only the most significant basis functions. While this nonlinear, data-dependent transformation approach results in high-quality compression at low compression ratios or for sparse images, it has several drawbacks compared to the photonic compression scheme introduced here, particularly for high-data-rate imaging applications. First, JPEG compression ratios are image-dependent. This can be problematic for high-data-rate image acquisition, which would have to allocate variable-sized memory blocks as the compression ratio changes. Second, JPEG performs many more operations on each pixel than our photonic scheme, since it performs a full DCT transformation on every 8 × 8 pixel block before selecting the basis functions to retain. This results in increased power consumption and slower throughput, since it requires multiple clock cycles. Finally, JPEG is unable to perform denoising or image conditioning. As described in the Experimental section below, the photonic scheme is able to simultaneously perform image compression, image conditioning (pixel linearity, hot pixels, etc.), and denoising by using a neural-network-based nonlinear decoding scheme.
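For contrast with the random projection, the transform-and-threshold step can be sketched as follows. This is a simplified stand-in (an orthonormal 2-D DCT followed by keeping the largest-magnitude coefficients), not the full JPEG standard with its quantization tables and entropy coding:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows index frequency, columns index position)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2 / n)
    C[0] /= np.sqrt(2)  # normalize the DC row
    return C

def jpeg_like(block: np.ndarray, keep: int) -> np.ndarray:
    """DCT an n x n block, keep only the `keep` largest-magnitude coefficients,
    then invert: a crude stand-in for JPEG's transform-and-threshold step."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T                         # 2-D DCT of the block
    cutoff = np.sort(np.abs(coeffs).ravel())[-keep]  # magnitude of the keep-th largest
    sparse = np.where(np.abs(coeffs) >= cutoff, coeffs, 0.0)
    return C.T @ sparse @ C                          # inverse DCT reconstruction
```

Note the contrast with the photonic scheme: here the retained coefficients depend on the block content (data-dependent), whereas the random T matrix applies the same fixed projection to every block.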
In addition to kernel size, we also investigated two types of random kernels. In Figs. 2 and 3, we presented simulations using synthesized random T matrices that were real and positive. This simulated the compression process when light coupled to each of the input waveguides is effectively incoherent. For example, a frequency comb or other multi-wavelength source could be used to couple light at different wavelengths into each waveguide (to be more quantitative, light in each waveguide should be separated in frequency by at least ~10× the detector bandwidth to minimize interference effects)^{33}. In this case, the speckle patterns formed by light from each waveguide sum incoherently on the detectors, and the compression process can be modeled using a random T matrix that is real-valued and non-negative. The second case we considered uses a complex-valued field T matrix, in which each element of T was assigned a random amplitude and phase. In this case, the compressed image was obtained as the square-law detector response: \(O=\left(T\sqrt{I}\right){\left(T\sqrt{I}\right)}^{*}\). This case simulates the effect of coupling a single, coherent laser to all the input waveguides at once, such that the measured speckle pattern is formed by interference between light from each waveguide.
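The square-law detection model for the coherent case can be written directly from the expression above; the matrix size and element statistics below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

N, M = 64, 8  # assumed sizes: one 8x8 input block, 1:8 compression
# Complex field transmission matrix: random amplitude and phase per element
T = rng.uniform(size=(M, N)) * np.exp(1j * rng.uniform(0, 2 * np.pi, size=(M, N)))

def coherent_compress(I: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Square-law detector response O = |T sqrt(I)|^2 for coherent input light."""
    field = T @ np.sqrt(I)     # field at each detector: interference of all inputs
    return np.abs(field) ** 2  # detectors record intensity, not the complex field

I = rng.uniform(size=N)        # non-negative pixel intensities of one block
O = coherent_compress(I, T)
print(O.shape, O.min() >= 0)  # (8,) True
```

Even though T is complex, the measured output is real and non-negative, since the detectors only record intensity.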
To evaluate the tradeoff between real and complex transforms, we evaluated the reconstructed image quality at varying noise levels. In general, noise could be introduced in the original image formation process (e.g., due to low light levels or imperfections in the imaging optics), through the camera optoelectronic conversion process (e.g., due to pixel nonlinearity or the limited bit depth of the camera pixels), or through the optical compression process described in this work (e.g., due to laser intensity noise, environmental variations in the T matrix, or simply shot noise at the detection stage). To simulate the effect of noise on the reconstruction of the compressed images, we numerically added Gaussian white noise to the compressed images. Figure 4a, d shows the same test image evaluated in Fig. 2, compressed by a factor of 8 (compression ratio 1:8) using either a real- or complex-valued 8 × 8 pixel T matrix. In this case, Gaussian white noise with an amplitude equal to 2% of the average signal level in the image (corresponding to an SNR of 50) was added to each compressed image. The reconstructed images using the real and complex \(T\) matrices are shown in Fig. 4b, e. At the 2% noise level (SNR = 50), the reconstructed images are only marginally worse than the reconstructed image without noise shown in Fig. 2c (PSNR = 25.1 dB with noise vs. PSNR = 26.9 dB without noise for the case of a real transform). This result confirms that the autoencoder framework is relatively resilient to noise, which is consistent with prior applications of autoencoders for denoising tasks. This resilience could also enable the system to forego energy-intensive image conditioning by encoding raw image data and relying on the back-end neural network to compensate for noise due to effects such as pixel non-uniformity. In Fig. 4c, f, we present the average PSNR and SSIM for reconstructed test images that were compressed using either real-valued or complex-valued T matrices, as a function of the SNR of the compressed images. These simulations showed that at relatively high SNR (>50), the real- and complex-valued T matrices provided comparable performance. However, at lower SNR, the complex-valued T matrices provided more robust image compression due to the higher contrast of the compressed images obtained using a complex \(T\) matrix.
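A minimal sketch of this noise model (Gaussian white noise scaled to a target SNR, with fidelity measured via PSNR) might look like the following. The exact noise statistics used in the training pipeline are described in Methods, so this is only an illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def add_noise(compressed: np.ndarray, snr: float) -> np.ndarray:
    """Add Gaussian white noise with standard deviation mean(signal) / snr,
    so snr = 50 corresponds to a 2% noise amplitude."""
    sigma = compressed.mean() / snr
    return compressed + rng.normal(0.0, sigma, size=compressed.shape)

def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

x = rng.uniform(size=(64, 64))          # stand-in for a compressed image
print(round(psnr(x, x + 0.01), 1))  # 40.0
```

A uniform offset of 0.01 on unit-peak data gives an MSE of 10^{-4}, hence 40 dB, which provides a quick sanity check of the PSNR definition.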
Experimental image compression and denoising
Next, we performed experiments to validate the following key predictions:

(1) That our proposed approach (using an analog photonics-based fixed linear random matrix for compression and a nonlinear neural network for decompression) can provide image compression of comparable quality to JPEG (which uses an image-dependent compression scheme that is far more energy- and time-intensive).

(2) That this technique can be used for both denoising and compression.
For experimental validation of our image processing approach, we fabricated a prototype device on a silicon photonics platform. The experimental device included \(N=16\) single-mode input waveguides connected to the scattering layer through a multimode waveguide. The multimode waveguide region allowed light from each single-mode waveguide to spread out along the transverse axis before reaching the random scattering layer. This ensured that we obtained a uniformly distributed random transmission matrix without requiring an excessively long random scattering medium, which would introduce excess loss through out-of-plane scattering. To illustrate the impact of the multimode waveguide, we performed full-wave numerical simulations of single-mode waveguides either connected directly to the scattering layer or connected through an intermediate multimode waveguide region. In the first case, shown in Fig. 5a, the scattering layer was not thick enough for light to fully diffuse along the transverse axis, resulting in a high concentration of transmitted light near the position of the input waveguide. In terms of the \(T\) matrix, this resulted in stronger coefficients along the diagonal rather than the desired, uniformly distributed random matrix. We then simulated the effect of adding a 32 μm multimode waveguide between the single-mode input waveguide and the scattering layer. As shown in Fig. 5b, the multimode waveguide allowed light from the single-mode waveguide to extend across the scattering layer, resulting in a transmitted speckle pattern that was uniformly distributed.
The device was fabricated on a standard silicon-on-insulator wafer with a 250 nm thick silicon layer. The fabricated device consisted of 16 single-mode input waveguides connecting the device to the edge of the chip. The waveguides were 450 nm wide and separated by 3 \(\mu m\) (corresponding to a spacing of ~2\(\lambda\) at a wavelength of 1550 nm, to minimize evanescent coupling). All 16 waveguides were connected to a 55.2 \(\mu m\) wide, 120 \(\mu m\) long multimode waveguide region, followed by a 30 \(\mu m\) long scattering region. The scattering region consisted of randomly placed 50 nm radius cylinders with a 3% filling fraction, etched in the silicon waveguiding layer. The scattering layer parameters were empirically optimized to achieve a transmission of ~30% [see Supplementary Information: Section S5. Additional Experimental Characterization for experimental results]. To minimize leakage of light at the edges of the scattering layer, we added a full-bandgap photonic crystal layer on the sides of the scattering layer^{34,35,36}. We experimentally confirmed that transmission through the device was ~30%, as described in the Supplementary Information [Section S5. Additional Experimental Characterization]. Since this initial prototype did not include integrated photodetectors, we etched a ridge in the silicon waveguiding layer after the scattering region. This allowed us to record the light scattered out-of-plane from this ridge to measure the optical power that would be recorded if detectors were integrated in the device. Scanning electron microscope images of the fabricated device are shown in Fig. 5c.
We note here that the design goals of the random scattering medium for this application are significantly different from those of prior applications of on-chip scattering media, such as the speckle spectrometer reported in Ref. ^{36}. While a speckle spectrometer relies on obtaining distinct random projections for different input wavelengths, necessitating a relatively large, strongly scattering region, here we desire a scattering medium with a broadband response (to minimize temperature dependence) but distinct, uniformly distributed random projections for different spatial modes. The compression device should also be designed to minimize footprint and loss while still providing the fully distributed random transmission matrix required for high-quality image compression. As shown in Fig. 5a, b, we found that a multimode waveguide region followed by a short scattering region achieved this combination, since it allowed each spatial input to overlap before reaching the scattering medium. This provided a uniformly distributed transmission matrix without requiring a large scattering region, which would add significant loss.
To test the device, we first measured the \(T\) matrix by coupling an input laser operating at a wavelength of 1550 nm into each single-mode waveguide and recording the speckle pattern scattered from the detection ridge after the scattering layer using an optical microscope setup. A typical image recorded using the optical setup is shown in Fig. 5d.
To account for experimental noise in the image compression and recovery process, we recorded two sequential \(T\) matrices, as shown in Fig. 6a, b. The \(T\) matrix was highly repeatable, as revealed in Fig. 6c, which shows the difference between the two measurements. A histogram of the difference between the matrices, shown in Fig. 6d, indicates Gaussian-like random noise with an amplitude of ~1% of the average signal value (corresponding to a measurement SNR of ~100). As shown in Fig. 4, at this SNR, both real and complex transformations provide similar image reconstruction results. This implies that we can use the experimentally measured intensity transmission matrix for image compression.
We note here that the experimental noise is due to noise generated by the laser and the electronics, as expected for a real application. The photonic encoder, on the other hand, is extremely stable and provides a highly repeatable response. To confirm this, we monitored the transmission matrix for 60 h; the results are shown in the Supplementary Information [Section S5. Additional Experimental Characterization]. We found that the device is very stable, with negligible fluctuations over the span of 60 h, without requiring active temperature stabilization. This level of stability for an integrated photonic device is not surprising given the short lifetime of light passing through the scattering region, corresponding to a low effective quality factor with minimal temperature dependence. Regarding temperature dependence, based on our previous work^{36} and assuming a thermo-optic coefficient in Si of dn/dT ≈ 1.8 × 10^{−4} K^{−1} ^{37}, the generated speckle pattern at the output will stay correlated for temperature changes of up to ±4 K. This stability is one of the advantages of the compact scattering device structure used in this work. Moreover, as shown later, using our unique approach of combining image compression with denoising, some of the noise introduced by thermal fluctuations during image compression could potentially be suppressed by training the back-end image reconstruction neural network using data acquired at a range of temperatures.
To convert the raw measured transmission matrix into the \(T\) matrix used for compression, we selected 4 non-overlapping spatial regions along the output ridge shown in Fig. 5d. This corresponds to selecting 4 columns of the matrix shown in Fig. 6a. The resulting \(T\) matrix had dimensions of \(16\times 4\), providing a compression factor of 1:4. We then used this experimental matrix to train the backend neural network required to reconstruct the original image. Note that we included noise in the training process by adding Gaussian noise at the same 1% level measured experimentally. Finally, we compressed the test images in the DIV2K and Flickr2K datasets while again adding random noise at the 1% level. A typical compressed image using the experimentally measured \(T\) matrix is shown in Fig. 6e and the corresponding reconstructed image is shown in Fig. 6f. Excellent agreement between the original and reconstructed images can be seen, with PSNR = 26.02 dB and SSIM = 0.91. In comparison, 1:4 JPEG compression of the same image with similar SNR gives PSNR = 29.63 dB and SSIM = 0.83. We repeated this process for the entire set of test images and obtained an average PSNR of \(26\pm 4\) dB and SSIM of \(0.9\pm 0.07\). Additional examples of compressed and reconstructed images, along with comparisons to JPEG compression, are shown in the Supplementary Information [Supplementary Information: Section S1. Additional Experimental results: Compressed and reconstructed images and their statistics and Section S6. Comparison of Digital JPEG and Photonic Compression].
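The compression step itself is a small linear operation per pixel block. The following numpy sketch illustrates it for 4 × 4 kernels; the random positive-valued matrix, noise model, and input image are illustrative stand-ins for the measured \(T\) matrix and experimental data, not the actual measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the measured intensity transmission matrix (16 inputs x 4 outputs).
# The real T matrix is measured on the fabricated device; a positive random
# matrix is used here purely for illustration.
T = rng.uniform(0.0, 1.0, size=(16, 4))

def compress_block(block, T, noise_level=0.01, rng=rng):
    """Compress a 4x4 pixel block to 4 values: y = T^T x, plus ~1% Gaussian noise."""
    x = block.reshape(16)                  # flatten the 4x4 kernel
    y = T.T @ x                            # analog random projection -> 4 outputs
    noise = rng.normal(0.0, noise_level * y.mean(), size=y.shape)
    return y + noise                       # measurement noise ~1% of mean signal

def compress_image(img, T):
    """Apply block-wise compression to an HxW image (H, W multiples of 4)."""
    H, W = img.shape
    out = np.zeros((H // 4, W // 4, 4))
    for i in range(0, H, 4):
        for j in range(0, W, 4):
            out[i // 4, j // 4] = compress_block(img[i:i+4, j:j+4], T)
    return out

img = rng.uniform(0.0, 1.0, size=(512, 512))   # placeholder for a test image
y = compress_image(img, T)
print(y.shape, img.size / y.size)              # (128, 128, 4), compression factor 4.0
```

The backend network is then trained with `img` as ground truth and `y` as input, with the same noise injection applied during training.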
Finally, in addition to showing that this approach is robust to noise introduced in the analog photonic image compression step, this demonstration also illustrates how the technique could be used for image denoising. From the perspective of the backend image reconstruction neural network, noise added during the original image acquisition process (e.g., due to pixel noise, nonuniform responsivity, or simply low light levels) is equivalent to noise added during the image compression step (as tested explicitly here using experimental noise data). Thus, this work also highlights the potential for this technique to move the energy-intensive image conditioning and denoising steps to the backend image reconstruction stage.
While this initial demonstration was performed at low speed because of the lack of integrated detectors and modulators, this approach is compatible with high-speed operation (e.g., >10 GHz). Our compression device can be considered as a photonic communication link in which data is encoded on an optical carrier using the input modulators, transmitted through the scattering region (analogous to transmission along a bus waveguide or through a fiber in a communications link), and recorded on a high-speed photodetector. Since the loss through the scattering region of 30% is relatively low, we expect the compression device to operate with SNR comparable to photonic links operating at similar speeds. One potential complication is that noise will be added during the data encoding step (i.e., on the high-speed integrated modulators). However, our approach allows us to compress and denoise simultaneously, and the image compression process works well even for SNR as low as 10 (as shown in Fig. 4f). In addition, we performed simulations in which we calculated the compression quality as a function of noise added to the input image (simulating the effect of noise introduced by the modulators while encoding the inputs). The results are shown in the Supplementary Information (S4: Denoising Images), and they confirm that our approach is quite robust to noise introduced by the input modulators.
Predicted energy consumption and operating speed for the photonic image processor
As described above, our encoding and compression technique reduces to a matrix multiplication operation. To compare the power consumption of our photonic approach with a traditional electronic scheme, we estimated the energy per multiply-accumulate (MAC) operation using both approaches. Electronic hardware accelerators have been thoroughly optimized to reduce the power consumption per MAC.
The power consumed by the photonic image processing engine includes contributions from the laser, the optical modulators, and the photodetectors. To estimate the required laser power, we first estimated the detected power needed to provide a sufficient signal-to-noise ratio for accurate image compression. Assuming shot-noise-limited detection, we can express the required optical power reaching each photodetector as^{38}:
where ENOB is the required effective number of bits, \(q\) is the charge of a single electron (\(1.6\times 10^{-19}\) C), \({f}_{0}\) is the operating frequency of the modulator (and the detector baud rate), and \({{{{{\mathscr{R}}}}}}\) is the responsivity of the photodetector in units of \(A/W\). The ENOB can be related to the measurement SNR in dB as \({SNR}=6.02\times {ENOB}+1.76\)^{38}. In the energy consumption calculations below, we assumed a required \({ENOB}\) of 6, corresponding to a measurement SNR of 38 dB, which provides significant margin compared with the experimentally measured SNR of 17 dB.
where \(N\) is the number of pixels in an image block, \({T}_{{{{{\mathrm{mod}}}}}}\) is the transmission through the optical modulators, and \({T}_{{scatter}}\) is the transmission through the scattering medium. The electrical power required to drive the laser can then be written as \({P}_{{laser}}/\eta\), where \(\eta\) is the wall-plug efficiency of the laser. The factor of \(N\) in Eq. (2) implies that the multimode waveguide and scattering region support \(N\) spatial modes (the minimum required to efficiently couple light from \(N\) single-mode input waveguides) and that each detector collects, on average, \(1/N\) of the light transmitted through the scattering medium. In the preliminary experiment presented above, a slightly larger multimode waveguide than required was used to simplify the experiment. As a result, the optical power was distributed over more than \(N\) modes in our initial demonstration. In the future, adiabatically coupling the single-mode input waveguides into an \(N\)-mode multimode waveguide would optimize the power efficiency.
The power required by the optical modulators can be expressed as^{39}
where \({C}_{{Mod}}\) is the capacitance and \({V}_{{pp}}\) is the peak-to-peak driving voltage of the modulator. The power required by the photodetectors can be approximated as
where \({V}_{{bias}}\) is the bias voltage of the photodetector PN junction. The total electrical power consumed by the photonic image-processing engine can then be calculated as
Since the total number of MACs per second is \(N\times M\times {f}_{0}\), the energy consumption per MAC is given by \({P}_{{total}}/\left(N\times M\times {f}_{0}\right)\). After substituting Eq. (1) into the expressions for \({P}_{{laser}}\) (Eq. (2)) and \({P}_{{PD}}\) (Eq. (4)), we see that the total energy consumption per MAC is independent of the modulation frequency.
To quantitatively compare the energy per MAC required by an optimized photonic processing engine with that of a conventional electronic GPU, we assumed typical specifications for the optoelectronic components. \({C}_{{Mod}}\) is usually on the order of 1 fF, \({V}_{{pp}}\) is ~1 V, \({V}_{{bias}}\) is typically 3.3 V, and \({{{{{\mathscr{R}}}}}}\) is typically ~1 mA/mW at a wavelength of 1550 nm^{40,41}. In addition, the typical insertion loss for high-speed optical modulators is ~6.4 dB (\({T}_{{{{{\mathrm{mod}}}}}}=0.27\)) and the wall-plug efficiency for distributed feedback lasers is \(\eta=0.2\)^{42,43}. The transmission through the experimental scattering medium is assumed to be \({T}_{{scatter}}=0.2\), which also takes into account the coupling efficiency to the integrated photodetectors. Specifically, our scattering medium provides a transmission of 30%, as shown in the Supplementary Information [Supplementary Information: Section S5. Additional Experimental Characterization]. However, we assumed 20% overall transmission, which includes ~67% transmission to the photodetectors (see Supplementary Information: Section S3. Integration of photonic encoder with silicon photonics and CMOS components for more details).
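This energy model can be sketched numerically. The expressions below use standard textbook forms for each component (shot-noise-limited detector power proportional to \(2^{2\cdot ENOB}q{f}_{0}/{{\mathscr{R}}}\), modulator drive power ~\(C{V}_{pp}^{2}{f}_{0}/4\), and detector power \({V}_{{bias}}{{\mathscr{R}}}{P}_{{det}}\)); the exact prefactors in the paper's equations may differ, so the output is an order-of-magnitude estimate only.

```python
# Typical component values from the text; the power expressions below are
# assumed standard forms, not the paper's exact Eqs. (1)-(5).
q      = 1.6e-19     # electron charge (C)
R      = 1.0         # photodetector responsivity (A/W)
C_mod  = 1e-15       # modulator capacitance (F)
V_pp   = 1.0         # modulator peak-to-peak drive voltage (V)
V_bias = 3.3         # photodetector bias voltage (V)
T_mod, T_scatter, eta = 0.27, 0.2, 0.2

def energy_per_mac(N, f0, enob=6, ratio=4):
    """Estimated J/MAC for an N-pixel block compressed to M = N/ratio outputs."""
    M = N // ratio
    P_det = 2**(2 * enob) * q * f0 / R          # shot-noise-limited power per detector
    P_laser = N * P_det / (T_mod * T_scatter)   # required optical laser power
    P_mod = 0.25 * C_mod * V_pp**2 * f0         # drive power per modulator
    P_pd = V_bias * R * P_det                   # electrical power per detector
    P_total = P_laser / eta + N * P_mod + M * P_pd
    return P_total / (N * M * f0)               # N*M MACs per clock cycle

# Every power term scales linearly with f0, so the energy/MAC is clock-independent,
# and it falls as the block size N grows:
print(energy_per_mac(64, 16e9))   # a few fJ/MAC for an 8x8 kernel
```

Because each term in the numerator is proportional to \(f_0\), the clock frequency cancels, reproducing the frequency-independence noted above.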
The estimated energy consumption per MAC as a function of the image block size \(N\) is shown in Fig. 7. The energy required by the photonic image processor decreases rapidly with image block size. We also find that the majority of the power consumption is driven by the laser: \({P}_{{laser}}\) is 9.2 mW for a kernel size of \(8\times 8\) pixels and an ENOB of 6. While this power level is readily available from most commercial lasers operating in this wavelength regime (near 1550 nm), lower power consumption could be achieved if a lower \({ENOB}\) were sufficient for a given application, which would enable a lower-power laser (see Fig. 4c, f for an analysis of the trade-off between image reconstruction quality and the SNR of the compressed image). Nonetheless, we find that for an image block size of \(8\times 8\) pixels, the photonic image processor has the potential to provide 100X lower power consumption than a typical GPU. Although the photonic processor is even more efficient when using larger image blocks, this can degrade the image reconstruction, as shown in Fig. 2d. In the future, alternative inverse-designed transforms could enable large pixel blocks without sacrificing image reconstruction fidelity.
We can also use this framework to estimate the energy consumption per pixel, which is calculated as \({P}_{{total}}/\left(N\times {f}_{0}\right)\). The energy per pixel is independent of both the modulation frequency and the size of the pixel blocks and, for an \({ENOB}\) of 6 and the parameters listed above, can be as low as 72 fJ. This is dramatically lower than the ~0.1 \(\mu\)J per pixel used in existing image processing systems; however, the latter also includes the power required to operate the pixels and the analog-to-digital conversion process used to extract the signal recorded by each pixel. Nevertheless, since more than 50% of the energy consumed by standard electronic image processing systems is dedicated to image compression and conditioning, our optoelectronic approach can contribute significantly to reducing the overall energy consumption.
Finally, the device throughput, in terms of pixels/second, can be estimated as \(N\times {f}_{0}\). Assuming an image block size of \(8\times 8\) (\(N=64\)), this approach can process a Terapixel/second of image data using a clock speed of ~16 GHz, which is readily achieved using standard optical modulators and photodetectors^{16,17,18}. While such compression would require significant temporal multiplexing to provide the compression engine with 64-pixel kernels at a rate of 16 GHz, current on-board memory can store as much as 1 Gpixel of data in a buffer to feed the compression engine for real-time processing. Since the compression engine can process 1 Tpixel/s (or 1 Gpixel/ms), this is sufficient to keep up with the data rate acquired by a Gpixel camera operating at 1 kHz (well beyond the current state of the art). This speed is far beyond the compression speeds of conventional state-of-the-art digital electronic schemes such as JPEG compression, which can compress at most around a Gigapixel/second.
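The throughput bookkeeping above is simple arithmetic and can be checked directly; the clock value below is chosen so that \(N\times f_0\) is exactly 1 Tpixel/s at the paper's example operating point.

```python
# Throughput bookkeeping for the photonic compression engine.
N = 64                      # pixels per block (8 x 8 kernel)
f0 = 15.625e9               # modulator clock (Hz); N * f0 = 1 Tpixel/s
throughput = N * f0         # pixels processed per second
buffer_pixels = 1e9         # ~1 Gpixel on-board frame buffer
drain_time = buffer_pixels / throughput   # time to compress a full buffer
print(throughput, drain_time)  # 1e12 pixels/s, 0.001 s per Gpixel buffer
```

At this rate the engine empties a 1 Gpixel buffer every millisecond, matching the data rate of a Gpixel camera running at 1 kHz.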
While the analysis presented above compares the energy consumption of our photonic approach to GPUs implementing the same random-transform-based compression algorithm, another important comparison is the energy required for JPEG compression using the variety of processors that might be found in a digital camera or smartphone. To calculate the energy consumption for JPEG compression^{19}, we first analyzed the number of operations required by the JPEG algorithm. The original JPEG compression algorithm performs a discrete cosine transform (DCT) followed by thresholding (to store the most significant basis functions) for every 8 × 8, non-overlapping kernel of pixels. To encode an image with \(N\times N\) pixels, \({N}^{2}/64\) DCTs, each requiring 64 × 64 multiply-accumulate operations (MACs), are necessary, totaling \(64{N}^{2}\) MACs. The more recent JPEG 2000 compression algorithm^{44} decomposes the image into wavelet representations by successively passing it through 2D \(n\)-tap filters and applying 2 × 2 downsampling. For an image with \(N\times N\) pixels undergoing a \(K\)-level decomposition, the total number of MACs is \(4n\mathop{\sum }\nolimits_{j=0}^{K-1}{\left(N/{2}^{j}\right)}^{2}\). The computing complexity of both JPEG and JPEG 2000 scales as \(O({N}^{2})\). This is the same computing complexity that our algorithm would require if the random projections were implemented using digital electronics rather than analog photonics.
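The operation counts can be tabulated directly. The JPEG 2000 count below assumes each decomposition level \(j\) filters an \((N/2^{j})\times(N/2^{j})\) image at roughly \(4n\) MACs per pixel (an assumed per-level cost), and the digital random-projection count mirrors the 16-pixel-to-4-output scheme demonstrated here.

```python
def jpeg_macs(N):
    # Classic JPEG: (N^2 / 64) blocks, each a 64x64 matrix multiply (DCT).
    return (N * N // 64) * 64 * 64          # = 64 * N^2

def jpeg2000_macs(N, K=5, n=9):
    # JPEG 2000 (assumed per-level cost): level j filters an (N/2^j)^2 image
    # with 2D n-tap filters across 4 subbands -> ~4n MACs per pixel per level.
    return sum(4 * n * (N // 2**j)**2 for j in range(K))

def projection_macs(N, block=16, outputs=4):
    # Digital version of the photonic random projection: block*outputs MACs
    # per non-overlapping 4x4 kernel.
    return (N * N // block) * block * outputs  # = 4 * N^2

print(jpeg_macs(512), jpeg2000_macs(512), projection_macs(512))
```

All three counts grow as \(O(N^2)\); the photonic implementation removes this cost from the digital domain entirely rather than reducing its scaling.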
Modern digital cameras and smartphones use an embedded system-on-chip (SoC) as the core processing unit. These SoCs typically include ARM cores for general-purpose computing, memory interfacing, and device control, along with specifically designed hardware codecs (e.g., the JPEG/HEIF encoder in Canon DIGIC) or image-processing ASICs (e.g., Apple A16 Bionic) for JPEG compression and decoding. The power consumption of these codecs or ASICs can range from 0.5 to 20 pJ/MAC^{45}. Because digital hardware codecs are designed for streaming, pipelined, sequential inputs of image blocks, the total power consumption also grows as \(O({N}^{2})\). Therefore, the power consumption per MAC remains constant regardless of the image size, sharing a similar scaling trend with GPUs. Moreover, the energy/MAC for a GPU (~1 pJ/MAC) is comparable to that of SoCs and ASICs and is plotted in Fig. 7.
Thus, the power consumption for JPEG compression on a smartphone or digital camera is comparable to that of a GPU and exhibits the same scaling. The advantage of using a GPU for compression is that it might enable higher throughput by leveraging parallel processing. From this analysis, it is clear that the photonic compression engine outperforms both in terms of speed and energy consumption, with the potential for orders-of-magnitude lower power consumption than digital electronic solutions based on GPUs, ASICs, or SoCs. Table 1 in the Supplementary Information (Section S2. Energy consumption typical of mainstream electronic architectures) summarizes the energy consumption typical of mainstream electronic architectures, including desktop processors, with the most efficient schemes reaching ~1 pJ/MAC.
Discussion
In summary, we proposed a CMOS-compatible, silicon-photonics-based approach for large-scale image processing. Our approach performs image compression and denoising using an autoencoder framework in which the first layer of the network is implemented using analog photonics, while the backend image reconstruction is implemented using digital electronics. Our approach enables image compression with quality comparable to standard digital compression techniques such as JPEG, and image denoising with quality comparable to state-of-the-art digital neural-network-based denoising algorithms. In contrast to the prevailing image processing approach, which performs a large number of image conditioning tasks at the front end, our approach is designed to compress the raw image data and use the neural network backend both to reconstruct the original image and to perform the image denoising and conditioning operations.
In this work, we presented a combination of numerical simulations and experimental characterization of a proof-of-principle device to validate this approach. Using numerical simulations, we optimized the system design by evaluating the compression quality as a function of kernel size and transmission matrix type (positive, real-valued vs. complex-valued). These simulations confirmed that this scheme can provide compression quality comparable to the JPEG algorithm. We also evaluated the impact of noise on the image compression quality and demonstrated the ability of this scheme to perform denoising. Finally, we analyzed the potential for this scheme to perform high-throughput processing with low power consumption. For example, by processing 8 × 8 blocks of pixels (i.e., 64 inputs) in parallel at ~16 GHz, this approach could process 1 Terapixel/s. This would enable large-format image sensors such as Gigapixel cameras to operate at speeds up to 1 kHz, or standard Megapixel imaging systems to operate at frame rates up to 1 MHz. From a power-consumption perspective, we found that an optimized photonic encoder would consume 100X less energy per MAC than a GPU. In addition to the numerical simulations and analysis, we experimentally characterized a passive silicon photonic prototype with 16 inputs designed to process 4 × 4 pixel kernels and demonstrated compression using a real-valued experimental transmission (T) matrix. This confirmed that the properties of the experimental transmission matrix enable image compression with quality similar to the digital JPEG algorithm and image denoising with quality similar to digital electronic neural-network-based denoising algorithms. This experimental characterization also confirmed that the approach is robust to fabrication imperfections, provided calibration is performed after fabrication. Future work will focus on the integration of high-speed modulators and detectors and on increasing the kernel size to support 8 × 8 pixel blocks.
While this work focused on compression and denoising of grayscale images, the general approach could be used to compress red-green-blue (RGB), hyperspectral, or time-series image data. Moreover, this general scheme is amenable to a variety of image processing tasks other than compression, including inference and classification. Again, the analog photonic transform could form the first layer of a neural network, accelerating the initial time- and energy-intensive processing of high-dimensional image data, while relying on backend digital electronics to complete the network. By tailoring this backend neural network, the same photonic image processing engine could be applied to a variety of image processing and remote sensing applications.
Although we utilized a random scattering layer to generate the random encoding in this work, alternative approaches may be advantageous in some applications. For example, multimode waveguides^{46,47,48} or chaotic cavities^{49} could also be used to perform random encoding for compression and have the potential for lower loss due to reduced out-of-plane scattering.
Finally, although we focused on using a linear encoding of the raw image data for compression, nonlinear, multilayer encodings could potentially enable higher compression ratios. One tradeoff is that nonlinear encoding schemes are likely to require higher power consumption (due to optical attenuation and/or the power required to drive the nonlinear process). Nonetheless, the potential for improved compression ratios and the ability to process larger kernel sizes at once makes this an intriguing approach which we hope to investigate in future studies.
Methods
Neural processing algorithm at the backend
All the images tested were from the DIV2K and Flickr2K datasets. A dataset of 4152 grayscale images, each with a resolution of 512 × 512 pixels, was generated through cropping and grayscale conversion. The dataset was divided into a training set of 3650 images and a validation set of 502 images. To study the image compression process, numerically generated random transmission matrices as well as experimentally measured transmission matrices were utilized as the encoding matrix. Using the original images as the ground truth and the compressed measurements as input, a convolutional neural network (CNN) was trained to reconstruct the original images from the compressed measurements. The neural networks used were constructed based on Deep ResUnet^{50} and ResUNet++^{51}. To investigate the impact of different compressive kernel sizes on the networks' ability to reconstruct compressed images, four different kernel sizes were explored (Fig. 2). The network architecture for the 4 × 4 kernel size consisted of 1 initial layer, 11 residual blocks, 2 downsampling layers, 4 upsampling layers, and 1 final convolutional layer. The residual block architecture was based on the residual neural network^{52}, and each block consisted of a Conv(3 × 3)-BN-LeakyReLU-Conv(3 × 3)-BN-[Conv(1 × 1)]-LeakyReLU block, with a Conv(1 × 1) layer added in the residual connection (marked with brackets). The downsampling (upsampling) rate in each downsampling (upsampling) layer was 2. The downsampling layer comprised a Conv(3 × 3)-BN block, where the convolutional layer had padding = 1 and stride = 2. The upsampling layer used ConvTranspose2d with kernel size 2 × 2 and stride = 2. The initial layer was a Conv(3 × 3)-BN-LeakyReLU-Conv(3 × 3)-Conv(3 × 3) layer, which produced 64 feature maps, and the final convolutional layer had kernel size 1 × 1.
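As a sanity check on this layer count, the standard output-size formulas for a stride-2 Conv(3 × 3) with padding 1 and a stride-2 ConvTranspose2d(2 × 2) show that 2 downsampling and 4 upsampling layers map the 128 × 128 compressed input (a 512 × 512 image compressed with 4 × 4 kernels) back to the original 512 × 512 resolution:

```python
def conv_out(s, k=3, stride=2, pad=1):
    """Spatial size after a strided convolution."""
    return (s + 2 * pad - k) // stride + 1

def convT_out(s, k=2, stride=2):
    """Spatial size after a transposed convolution."""
    return (s - 1) * stride + k

s = 512 // 4          # compressed input is 128 x 128 (one value set per 4x4 kernel)
for _ in range(2):    # 2 downsampling layers halve the size each time
    s = conv_out(s)
for _ in range(4):    # 4 upsampling layers double the size each time
    s = convT_out(s)
print(s)              # 512: the decoder recovers the original resolution
```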
For the networks with other kernel sizes, we added additional upsampling layers and residual blocks to maintain the number of layers and the size of the final output tensor. The network was trained in PyTorch with the mean squared error loss, Xavier initialization^{53}, and the Adam optimizer^{54} with an initial learning rate of 0.001 and a learning-rate decay factor of 0.1 every 400 epochs, for 800 epochs. The training was performed on four Nvidia V100S GPUs with a batch size of 64.
Fullwave electromagnetic simulations
The simulated fields shown in Fig. 5a, b were obtained using the finite element method (COMSOL with the Electromagnetic Waves, Frequency Domain interface) in a 2D geometry. The length of the simulated devices was scaled down to reduce the computational time and consisted of a scattering region 15 µm in length followed by a 10 µm long output port. Perfect conductors were substituted for the photonic crystal reflectors on the top and bottom sides of the scattering region. The scattering region contained a randomly generated set of 6180 holes with radius = 50 nm. Perfectly matched layer (PML) boundary conditions were used on the boundaries of the observation region. Ports were used to excite the input waveguides (width 0.45 µm), which had a pitch of 3.45 µm, with 16 inputs in all, giving a total device width of 55.2 µm (similar to the fabricated structure). The pre-scattering-region length, corresponding to the multimode waveguide region between the single-mode waveguide inputs and the scattering region, was varied between 50 nm and 32 µm and had a 4 µm air buffer on the side edges before the absorbing PML boundaries. The effective material index of the silicon domains used for the 2D simulations was determined from 3D simulations and was found to be n_{e} = 2.83. The effective material index of the air domains was n_{e} = 1. A triangular mesh was used with 1,256,090 elements and an average element quality of 0.92. A boundary mode analysis step was performed for each input waveguide port, and the model was simulated at λ = 1550 nm. For analyzing the electromagnetic fields in the output region, the fields were exported on a regular grid with 100 nm steps in X and Y.
Sample fabrication
The silicon-photonics image encoder was fabricated using commercially available silicon-on-insulator (SOI) wafers. The wafers consisted of 250 nm of silicon on top of a 3 µm buried oxide. The encoder was fabricated using a positive-tone ZEP resist followed by electron beam lithography and inductively coupled plasma reactive ion etching. The encoder consisted of N = 16 single-mode input waveguides, each with a width of 450 nm. At the output, a ridge was fabricated to scatter the light out of plane, which was then measured to determine the transmission matrix of the encoder. The input waveguides were separated by 3.45 µm to minimize cross-coupling. The input waveguides were adiabatically tapered out towards the edge of the chip to increase the spacing between them to 10 µm, ~10X the size of the focused laser spot used for coupling the input light. This ensured that only one input waveguide was excited at a time during the measurement of the transmission matrix.
Experimental measurement
To measure the transmission matrix of our encoder (Fig. 5), we used an aspheric lens to couple continuous-wave (CW) laser light at λ = 1550 nm into the chip. The lens focused the laser beam to a spot of diameter ∼1.5 µm at the edge of the waveguides. The focused laser spot was scanned to couple to each input waveguide in turn while the transmitted speckle pattern was recorded from above using a long-working-distance objective (50x, NA = 0.7) and an InGaAs camera (Xenics, Cheetah). All the images acquired during the measurement were then processed to determine the transmission matrix used for encoding the images. To quantify the experimental noise, the measurements were repeated, and the difference in magnitude of the elements of the transmission matrix was taken as the experimental noise present in encoding the images.
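The post-processing of the recorded frames into a transmission matrix can be sketched as follows; the frame sizes, window geometry, and intensities are hypothetical placeholders for the actual speckle images and integration windows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the recorded camera frames: one speckle image per
# excited input waveguide (16 inputs), simulated here as random intensities.
frames = [rng.uniform(0.0, 1.0, size=(64, 256)) for _ in range(16)]
# Four non-overlapping windows along the output ridge (assumed geometry).
regions = [(slice(0, 64), slice(64 * j, 64 * (j + 1))) for j in range(4)]

def transmission_matrix(frames, regions):
    """Integrate each frame over each output window to build the 16x4 T matrix."""
    T = np.zeros((len(frames), len(regions)))
    for i, frame in enumerate(frames):
        for j, (rs, cs) in enumerate(regions):
            T[i, j] = frame[rs, cs].sum()
    return T

T1 = transmission_matrix(frames, regions)
# A repeated measurement (here: the same frames plus ~1% pixel noise) gives the
# element-wise difference used as the experimental noise estimate in the text.
frames2 = [f + rng.normal(0.0, 0.01 * f.mean(), f.shape) for f in frames]
noise = np.abs(transmission_matrix(frames2, regions) - T1)
print(T1.shape)   # (16, 4)
```

Note that integrating over many camera pixels per window averages down the per-pixel noise, consistent with the high repeatability observed for the measured matrix.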
Data availability
The data that support the findings of this study are available from the corresponding authors upon request.
Code availability
The codes and datasets in this study have been deposited in the Zenodo database under Creative Commons Attribution 4.0 International Public License at https://doi.org/10.5281/zenodo.10819458.
References
Yan, X. et al. “Compressive sampling for array cameras.” SIAM J. Imaging Sci. 14, 156–177 (2021).
Nichols, J. M. et al. “Range performance of the DARPA AWARE wide fieldofview visible imager.” Appl. Opt. 55, 4478–4484 (2016).
Brady, D. J. et al. “Parallel cameras.” Optica 5, 127–137 (2018).
Wang, T. et al. “Image sensing with multilayer nonlinear optical neural networks.” Nat. Photonics 17, 408–415 (2023).
Brady, D. J. et al. “Multiscale gigapixel photography.” Nature 486, 386–389 (2012).
Pang, W. & Brady, D. J. “Galilean monocentric multiscale optical systems.” Opt. Express 25, 20332–20339 (2017).
Chen, Y. et al. “Photonic unsupervised learning variational autoencoder for high-throughput and low-latency image transmission.” Sci. Adv. 9, eadf8437 (2023).
Li, J. et al. “Spectrally encoded singlepixel machine vision using diffractive networks.” Sci. Adv. 7, eabd7690 (2021).
Ashtiani, F., Geers, A. J. & Aflatouni, F. “An onchip photonic deep neural network for image classification.” Nature 606, 501–506 (2022).
Baek, S. H. et al. “Single-shot hyperspectral-depth imaging with learned diffractive optics.” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 2631–2640. https://doi.org/10.1109/ICCV48922.2021.00265 (2021).
Yuan, X. & HaimiCohen, R. “Image compression based on compressive sensing: endtoend comparison with JPEG.” IEEE Trans. Multimed. 22, 2889–2904 (2020).
Wetzstein, G. et al. “Inference in artificial intelligence with deep optics and photonics.” Nature 588, 39–47 (2020).
Solli, D. R. & Jalali, B. “Analog optical computing.” Nat. Photonics 9, 704–706 (2015).
Wu, J. et al. “Analog optical computing for artificial intelligence.” Engineering 10, 133–145 (2022).
Chen, Y. et al. “Allanalog photoelectronic chip for highspeed vision tasks.” Nature 623, 48–57 (2023).
Wang, X., Weigel, P. O., Zhao, J., Ruesing, M. & Mookherjea, S. “Achieving beyond100GHz largesignal modulation bandwidth in hybrid silicon photonics Mach Zehnder modulators using thin film lithium niobate.” APL Photonics 4, 096101 (2019).
Siew, S. Y. et al. “Review of silicon photonics technology and platform development.” J. Lightwave Tech. 39, 4374–4389 (2021).
Vivien, L. et al. “Zero-bias 40Gbit/s germanium waveguide photodetector on silicon.” Opt. Express 20, 1096–1101 (2012).
Wallace, G. K. “The JPEG still picture compression standard.” Commun. ACM 34, 30–44 (1991).
Bank, D., Koenigstein, N. and Giryes, R. “Autoencoders.” Preprint at https://doi.org/10.48550/arXiv.2003.05991 (2020).
Bajaj, K., Singh, D. K. & Ansari, M. A. “Autoencoders based deep learner for image denoising.” Procedia Computer Sci. 171, 1535–1541 (2020).
Theis, L., Shi, W., Cunningham, A. & Huszár, F. “Lossy image compression with compressive autoencoders.” Preprint at https://doi.org/10.48550/arXiv.1703.00395 (2017).
Havasi, M., Peharz, R., and HernándezLobato, J. M. “Minimal random code learning: Getting bits back from compressed model parameters.” Preprint at https://doi.org/10.48550/arXiv.1810.00440 (2018).
Johnson, W. B. & Lindenstrauss, J. “Extensions of Lipschitz mappings into a Hilbert space.” Contemp. Math. 26, 189–206 (1984).
Candes, E. J. & Tao, T. “Nearoptimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. Inf. Theory 52, 5406–5425 (2006).
Donoho, D. L. “Compressed sensing.” IEEE Trans. Inf. Theory 52, 1289–1306 (2006).
Liutkus, A. et al. “Imaging with nature: Compressive imaging using a multiply scattering medium.” Sci. Rep. 4, 1–7 (2014).
Wendland, D. et al. “Coherent dimension reduction with integrated photonic circuits exploiting tailored disorder.” JOSA B 40, B35 (2023).
ONSEMI NOII4SM6600A: https://www.onsemi.com/pdf/datasheet/noii4sm6600ad.pdf (2024).
Redding, B., Popoff, S. M., Bromberg, Y., Choma, M. A. & Cao, H. “Noise analysis of spectrometers based on speckle pattern reconstruction.” Appl. Opt. 53, 410–417 (2014).
Agustsson, E. & Timofte, R. “NTIRE 2017 challenge on single image super-resolution: dataset and study.” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA. https://doi.org/10.1109/CVPRW.2017.151 (2017).
Lim, B., Son, S., Kim, H., Nah, S. & Mu Lee, K. “Enhanced deep residual networks for single image super-resolution.” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA. https://doi.org/10.48550/arXiv.1707.02921 (2017).
Spencer, D. T. et al. “An opticalfrequency synthesizer using integrated photonics.” Nature 557, 81–85 (2018).
Yamilov, A. et al. “Positiondependent diffusion of light in disordered waveguides.” Phys. Rev. Lett. 112, 023904 (2014).
Sarma, R., Yamilov, A. G., Petrenko, S., Bromberg, Y. & Cao, H. “Control of energy density inside a disordered medium by coupling to open or closed channels.” Phys. Rev. Lett. 117, 086803 (2016).
Redding, B., Liew, S. F., Sarma, R. & Cao, H. “Compact spectrometer based on a disordered photonic Chip.” Nat. Photonics 7, 746 (2013).
Komma, J., Schwarz, C., Hofmann, G., Heinert, D. & Nawrodt, R. “Thermooptic coefficient of silicon at 1550 nm and cryogenic temperatures.” Appl. Phys. Lett. 101, 041905 (2012).
Valley, G. C. “Photonic analogtodigital converters.” Opt. Express 15, 1955 (2007).
Miller, D. A. “Energy consumption in optical modulators for interconnects.” Opt. express 20, A293–A308 (2012).
Nozaki, K. et al. “Femtofarad optoelectronic integration demonstrating energysaving signal conversion and nonlinear functions.” Nat. Photonics 13, 454–459 (2019).
Li, G. et al. “25 Gb/s 1Vdriving CMOS ring modulator with integrated thermal tuning.” Opt. Express 19, 20435 (2011).
Mulcahy, J., Peters, F. H. & Dai, X. “Modulators in silicon photonics - heterogenous integration and beyond.” Photonics 9, 40 (2022).
Vaskasi, J. R. et al. “High wallplug efficiency and narrow linewidth IIIVonsilicon Cband DFB laser diodes.” Opt. Express 30, 27983–27992 (2022).
Usevitch, B. E. “A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000.” IEEE Signal Process. Mag. 18, 22–35 (2001).
Turcza, P. “Entropy encoder for lowpower lowresources highquality CFA image compression.” Signal Process.: Image Commun. 106, 116716 (2022).
Vandoorne, K. et al. “Experimental demonstration of reservoir computing on a silicon photonics chip.” Nat. Commun. 5, 3541 (2014).
Redding, B. et al. “Evanescently coupled multimode spiral spectrometer.” Optica 3, 956–962 (2016).
Borlaug, D. B. et al. “Photonic integrated circuit based compressive sensing radio frequency receiver using waveguide speckle.” Opt. Express 29, 19222–19239 (2021).
Grubel, B. C. et al. “Silicon photonic physical unclonable function.” Opt. Express 25, 12710–12721 (2017).
Zhang, Z., Liu, Q. & Wang, Y. “Road extraction by deep residual U-Net.” IEEE Geosci. Remote Sens. Lett. 15, 749–753 (2018).
Jha, D. et al. “ResUNet++: an advanced architecture for medical image segmentation.” in 2019 IEEE International Symposium on Multimedia (ISM), 225–2255. https://doi.org/10.1109/ISM46123.2019.00049 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. “Deep residual learning for image recognition.” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778. https://doi.org/10.1109/CVPR.2016.90 (2016).
Glorot, X. & Bengio, Y. “Understanding the difficulty of training deep feedforward neural networks.” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 9, 249–256 (2010).
Kingma, D. P. & Ba, J. “Adam: a method for stochastic optimization.” Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2014).
Acknowledgements
This work was performed in part at the Center for Integrated Nanotechnologies, an Office of Science User Facility operated by the U.S. Department of Energy (DOE) Office of Science. This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
Author information
Contributions
R.S., D.B., and B.R. conceived the idea. R.S. supervised the project. X.W., J.S., and D.B. developed the image processing and reconstruction algorithms. B.R. performed the optical characterization. N.K. and R.S. performed the electromagnetic simulations. R.S. and C.L. fabricated the samples. Z.Z. and S.P. performed the energy analysis. All the authors discussed the results and contributed to the writing and editing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Qionghai Dai, Peter McMahon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, X., Redding, B., Karl, N. et al. Integrated photonic encoder for low power and high-speed image processing. Nat Commun 15, 4510 (2024). https://doi.org/10.1038/s41467-024-48099-2