Sparse pixel image sensor

As conventional frame-based cameras suffer from high energy consumption and latency, several new types of image sensors have been devised, with some of them exploiting the sparsity of natural images in some transform domain. Instead of sampling the full image, those devices capture only the coefficients of the most relevant spatial frequencies. The number of required samples can be reduced even further if a signal only needs to be classified rather than fully reconstructed. Based on the corresponding mathematical framework, we developed an image sensor that can be trained to classify optically projected images by reading out only the few most relevant pixels. The device is based on a two-dimensional array of metal–semiconductor–metal photodetectors with individually tunable photoresponsivity values. We demonstrate its use for the classification of handwritten digits with an accuracy comparable to that achieved by readout of the full image, but with lower delay and energy consumption.

In compressed sensing (CS), a sparse signal, sampled at a sub-Nyquist rate on the encoder side, can be reconstructed at the decoder by finding the sparsest solution of an underdetermined linear system. Both sampling and compression are performed simultaneously, reducing the number of measurements at the expense of increased computational cost for signal reconstruction.
By combining CS with statistical learning, the number of required measurements can be reduced further still, particularly if a given signal only needs to be assigned to one of a few categories, or classes, rather than being fully reconstructed. This can be achieved by using a task-specific basis learned from data, instead of a generic one such as a Fourier or wavelet basis.
In the sparse sensor placement optimization for classification (SSPOC) algorithm 7,8, the data are not sampled randomly; instead, a few representative measurement locations are identified from training data. Subsequent samples can then be classified with performance comparable to that obtained by processing the full signal.
Several new types of image sensors have been developed in recent years 9, targeting lower energy consumption and latency than their conventional frame-based counterparts. Many of those devices emulate certain neurobiological functions of the retina, either using complementary metal-oxide-semiconductor (CMOS) technology (silicon retina) 10,11,12 or emerging device concepts 13,14,15,16,17,18. CS has likewise led to new types of image acquisition systems, such as single-pixel cameras 19, coded aperture imagers 20, and CMOS CS imaging arrays 21,22. SSPOC, on the other hand, has inspired applications in dynamics and control 23,24, but has, to the best of our knowledge, not yet been employed in an imaging device. Here, we present a hardware implementation of this algorithm, based on a two-dimensional array of tunable metal-semiconductor-metal (MSM) photodetectors. Each detector can be addressed individually, and its photoresponsivity can be set by the application of a bias voltage. The device is fully reconfigurable, and we demonstrate its use for the classification of handwritten digits from the MNIST dataset with an accuracy comparable to that of conventional systems, but with substantially lower delay and energy consumption.
Let us first lay out the operation principle of the image sensor (Figure 1a), exemplified by a simple linear classification problem. We restrict ourselves to binary classification, where an optical image that is projected onto the chip is assigned to one of two possible classes. The image is represented by a vector p = (p_1, p_2, …, p_N)^T in an N-dimensional vector space ℝ^N, where p_k is the optical power at the k-th pixel. Unlike in conventional imagers, the photoresponsivity of each pixel is not fixed, but varies over the face of the chip. We aggregate the photoresponsivity values into a vector R = (R_1, R_2, …, R_N)^T ∈ ℝ^N, where R_k denotes the responsivity of the k-th detector. A linear classifier is a predictor of the form 25

y = f(R^T p),   (1)

where f is a threshold function that maps all values of the inner product R^T p below a certain threshold (bias) to the first class and all other values to the second class (Figure 1b).
Physically, the inner product is implemented by simply summing up the photocurrents produced by all N detector elements:

I_tot = ∑_{k=1}^{N} I_k = ∑_{k=1}^{N} R_k p_k = R^T p.

By thresholding I_tot, a binary output is obtained that is representative of the two classes. R is learned from a set of labeled training data. A generalization to multi-class problems can be achieved by splitting pixels into subpixels 13,26, which allows for a physical implementation of a responsivity matrix R. In Figure 1c we plot R for a linear support vector machine (SVM) that is trained to classify handwritten zeros ("0") and ones ("1") from the MNIST dataset. 90% of randomly picked images are used for training and the remaining 10% for assessment. Almost all photodetectors are active, with varying responsivity values, and a classification accuracy of 99.8% is reached.
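As a minimal numerical sketch of this operation principle, the following Python snippet emulates the analogue photocurrent summation of equation (1) with a dot product; the image, responsivity values, and threshold are random stand-ins, not trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 784                      # number of pixels (28 x 28 MNIST image)
p = rng.random(N)            # optical power at each pixel (stand-in image)
R = rng.standard_normal(N)   # pixel responsivities (stand-in for SVM weights)
b = 0.0                      # decision threshold (bias)

# On the chip, the inner product R^T p is formed physically by summing
# the photocurrents I_k = R_k * p_k of all N detector elements.
I_tot = float(np.sum(R * p))

# Thresholding the total current assigns the image to class I or class II.
label = "II" if I_tot > b else "I"
```

In the device, the summation and thresholding happen in the analogue domain; the software dot product above only mimics that readout.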
We now aim to obtain comparable performance by selecting a small, optimal subset of detectors, or pixels. Figure 1d provides a geometrical interpretation of the algorithm 7,8. An M-dimensional feature space Ψ, which spans the M ≪ N most significant variations among the training data, is calculated using principal component analysis (PCA) 25. For categorical decisions, a measurement p is projected into this low-dimensional subspace (Ψ^T: ℝ^N → ℝ^M) and a linear classifier, described by the weight vector w ∈ ℝ^M, is then applied therein: y = f(w^T Ψ^T p). In image space coordinates, this expression resembles equation (1) with a photoresponsivity vector R = Ψw. Note, however, that there exists an infinite number of solutions for R, because adding any vector from the null space (kernel) of Ψ^T projects to the very same w in feature space. We seek the sparsest solution for R, that is, the one that has at most K nonzero elements: ‖R‖_0 ≤ K. As shown by the CS community 3,4,5,6, ℓ1-minimization leads to a convex optimization problem that can be efficiently solved with modern methods to find a good approximate solution:

R = argmin ‖R′‖_1 subject to Ψ^T R′ = w.   (2)

Figure 1e presents the results for the same binary MNIST classification task as before. Here, the data are projected into a six-dimensional PCA subspace (M = 6) in which an SVM is trained for classification. The photoresponsivity vector R is calculated by ℓ1-minimization of (2) using the PySensors package 27 in Python and is plotted in Figure 1e. Although less than 0.8% of the total pixels (6 out of 784) exhibit a responsivity R_k ≠ 0, the classifier performs nearly as well as the SVM applied to the full image, and an accuracy of 99.1% is achieved. Importantly, energy consumption and delay are substantially reduced, as both scale linearly with the number of detector elements being read out. We stress that it is not possible to obtain this result by merely thresholding R in Figure 1c, as can be seen from Supplementary Figure S1.

In Figure 2a we present the actual device implementation. The sensor is fabricated on a semi-insulating gallium arsenide (SI-GaAs) wafer, with two metal layers for routing of the electrical signals, using standard technology and without high-temperature process steps. Details are provided in the Methods section. GaAs is preferred over silicon (Si) because of its shorter absorption and diffusion lengths, which both reduce cross-talk between neighboring pixels and allow for a relatively simple planar device structure. However, with some minor modifications, the sensor concept can be transferred to the Si platform, which would also provide the opportunity for low-cost monolithic integration of the electronic driver circuits that are currently implemented off-chip. Our sensor consists of a two-dimensional array of N = 14 × 14 = 196 pixels, each containing an MSM photodetector 28. As in CMOS sensor technology, detectors are addressed by row and column decoders. The readout is performed one pixel at a time, with the relevant pixel locations and corresponding bias voltages V_B,k (Figure 3a) applied in sequence.
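The pixel-selection step described above can be reproduced in a few lines. The sketch below solves the ℓ1-minimization of equation (2) as a linear program with SciPy instead of the PySensors package used in the text; the orthonormal basis Ψ and feature-space weights w are random stand-ins for the PCA basis and the trained SVM weights:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, M = 64, 4                          # N pixels, M-dimensional PCA subspace

# Stand-ins: an orthonormal basis Psi (PCA modes) and SVM weights w.
Psi = np.linalg.qr(rng.standard_normal((N, M)))[0]
w = rng.standard_normal(M)

# Solve  min ||R||_1  subject to  Psi^T R = w  as a linear program:
# split R = u - v with u, v >= 0 and minimize sum(u) + sum(v).
c = np.ones(2 * N)
A_eq = np.hstack([Psi.T, -Psi.T])     # encodes Psi^T (u - v) = w
res = linprog(c, A_eq=A_eq, b_eq=w, bounds=(0, None), method="highs-ds")
R = res.x[:N] - res.x[N:]

# A basic solution of the LP has at most M nonzero entries, so at most
# M pixels need to be read out to evaluate the classifier.
active_pixels = np.flatnonzero(np.abs(R) > 1e-8)
```

The nonzero entries of R mark the pixel locations to be read out; thresholding R^T p then reproduces the feature-space classifier w^T Ψ^T p exactly, since Ψ^T R = w by construction.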

Figure 1 | Theoretical background and operation principle. a, Schematic illustration of the setup. An optical image p is projected onto the face of the image sensor with photoresponsivity values R that vary from pixel to pixel. b, A binary linear classifier assigns an image to one of two possible classes, I or II, depending on whether or not the inner product R^T p is larger than some threshold. In our implementation, the inner product is realized by summing up the photocurrents produced by all detector elements. c, Photoresponsivities for a sensor that has been trained as a linear SVM for the classification of zeros and ones from the MNIST dataset. Almost all pixels exhibit non-zero responsivities.

Figure 2 | Image sensor architecture and characterization. a, Microscope image of the sensor, with schematic illustrations of the external row/column decoders and integrating output (left). Scale bar, 200 µm. A detailed view of one of the MSM photodetectors is presented in the inset, and a schematic illustration is shown in the picture to the right. b, Bias-voltage-dependent device currents for all 196 detectors with (red lines) and without (blue lines) optical illumination (~160 W/m²). The detectors are operated in the range ±5 V to ±10 V.
The MSM photodetector converts incident light into photocurrent. Each detector comprises interdigitated metal fingers on the SI-GaAs semiconductor. Photoexcited electrons and holes drift in the electric field applied between the fingers, giving rise to an external current. The photoresponsivity of the device can be controlled by a bias voltage, as shown in Figure 2b, where a negative sign of the responsivity indicates a reversed current flow direction. The low background carrier concentration of the SI-GaAs wafer (~8 × 10 [ cm⁻³) ensures full depletion of majority carriers. As a result, the electric field drops homogeneously in the region between the metal fingers, so that photogenerated carriers are efficiently swept out of the device. Low residual doping is also required to suppress dark current and reduce cross-talk between neighboring detectors. Finally, we verified an approximately linear illumination-intensity dependence of the photocurrent (Supplementary Figure S2), as required by equation (1).
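Setting a pixel to a target responsivity thus amounts to inverting its measured responsivity-versus-bias curve. A minimal sketch, assuming a hypothetical, monotonic toy calibration curve rather than the measured data of Figure 2b:

```python
import numpy as np

# Hypothetical calibration for one pixel: responsivity (A/W) vs. bias (V).
# A linear toy curve stands in for the measured R(V_B) characteristic.
V_cal = np.linspace(-10.0, 10.0, 21)   # bias voltages in the operating range
R_cal = 0.05 * V_cal                   # toy responsivity curve, monotonic

def bias_for_responsivity(R_target):
    """Invert the calibration curve: bias voltage that yields R_target."""
    # np.interp requires the x-coordinates (here R_cal) in increasing order.
    return float(np.interp(R_target, R_cal, V_cal))
```

For this toy curve, `bias_for_responsivity(0.25)` returns 5.0 V (up to floating-point rounding); in the device, the resulting voltage would be applied to the selected pixel during readout.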

Figure 3 | Image sensor operation and performance evaluation. a, Relevant pixel locations (bottom) and applied bias voltages V_B,k (top) for the binary classification task discussed in the main text. b, Temporal evolution of the sensor output for more than 2000 samples from the dataset. Red (blue) lines show cases in which a "0" ("1") has been projected onto the sensor. The black lines show two representative examples with corresponding MNIST digits. c, Experimental confusion matrix. A classification accuracy of 98.3% is achieved. d, Histogram of the sensor output as determined from the measurements in b. The dashed line indicates the decision threshold.