All-analog photoelectronic chip for high-speed vision tasks

Photonic computing enables faster and more energy-efficient processing of vision data1–5. However, experimental superiority of deployable systems remains a challenge because of complicated optical nonlinearities, the considerable power consumption of analog-to-digital converters (ADCs) for downstream digital processing, and vulnerability to noise and system errors1,6–8. Here we propose an all-analog chip combining electronic and light computing (ACCEL). It has a systemic energy efficiency of 74.8 peta-operations per second per watt and a computing speed of 4.6 peta-operations per second (more than 99% implemented by optics), which are more than three orders and one order of magnitude higher, respectively, than those of state-of-the-art computing processors. After applying diffractive optical computing as an optical encoder for feature extraction, the light-induced photocurrents are directly used for further calculation in an integrated analog computing chip without the need for ADCs, leading to a low computing latency of 72 ns for each frame. With joint optimization of optoelectronic computing and adaptive training, ACCEL experimentally achieves competitive classification accuracies of 85.5%, 82.0% and 92.6% for Fashion-MNIST, 3-class ImageNet classification and a time-lapse video recognition task, respectively, while showing superior system robustness in low-light conditions (0.14 fJ μm−2 each frame). ACCEL can be used across a broad range of applications such as wearable devices, autonomous driving and industrial inspections.

output, the 6-bit binary code of the corresponding output is selected with a 1-of-16 multiplexer (MUX) and decoded to a thermometer code to control 64 switches, each controlling the connection of one PDC to the computing line.
Extended Data Fig. 1d illustrates the timing diagram of the capacitance compensation process, where the numbers of positive and negative weights are 490 and 534, respectively. The compensation-enable (CE) signal controls the connection between the computing lines V+/V- and the current source, as shown in Fig. 2f. The compensation is performed in multiple steps using a binary search strategy. Each step contains three operation phases: pre-charging, discharging and comparing. First, the computing lines V+ and V- are pre-charged to the supply voltage VDD by enabling signal RST in the pixel. Signal RST is then disabled and signal CE is set high to connect the current source to the computing lines. In this way, the V+ line and the V- line are each discharged by a current of the same value. After a pre-defined discharging time, the CE signal is set low and the voltages of V+ and V- are compared. In the case shown in Extended Data Fig. 1d, the voltage drop on V+ is greater than the voltage drop on V- in the first step, indicating that the load capacitance of V+ is smaller than that of V-. Therefore, extra capacitance should be connected to V+ so that the V+ and V- lines have the same load capacitance. A binary search algorithm is then adopted to determine the number of PDCs that need to be connected to the computing line V+. According to the voltage comparison result, 32 PDCs are connected to V+ at the end of the first step. In the second step, the voltage drop on V+ is smaller than the voltage drop on V-, indicating that the number of PDCs connected to V+ should be reduced, and so the number is reduced to 16. The above procedure is repeated until the voltage drops on the V+ and V- lines are the same at the end of the discharging operation. The capacitance compensation is performed during system start-up; once it is finished, ACCEL is ready for computation.
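The binary search described above can be sketched in a few lines of Python. This is only an illustrative model, not the on-chip controller: the function name `compensate`, the assumption that line capacitance scales with the number of connected weights (490 and 534 units), and the value of 2 units per PDC are all hypothetical choices made so that the first two steps match the 32-then-16 sequence in the text.

```python
def compensate(c_plus, c_minus, c_pdc, max_pdc=64, charge=1.0):
    """Return (n_pdc, history) for a binary search that balances V+ against V-.

    c_plus, c_minus: load capacitance of the V+ and V- lines (arbitrary units).
    c_pdc: capacitance added by one PDC (hypothetical value, same units).
    charge: discharge current times discharging time, identical for both lines.
    Assumes V+ is the lighter line, as in the case described in the text.
    """
    n = 0                # PDCs currently connected to V+
    step = max_pdc // 2  # first adjustment is 32 for 64 switches
    history = []
    while step >= 1:
        # Voltage drop = charge / capacitance: a larger drop means a lighter line.
        drop_plus = charge / (c_plus + n * c_pdc)
        drop_minus = charge / c_minus
        if drop_plus > drop_minus:
            n += step    # V+ still lighter: connect more PDCs
        elif drop_plus < drop_minus:
            n -= step    # over-compensated: disconnect PDCs
        else:
            break        # lines are balanced
        history.append(n)
        step //= 2
    return n, history


# Hypothetical units: one unit per connected weight, two units per PDC.
n_pdc, history = compensate(c_plus=490.0, c_minus=534.0, c_pdc=2.0)
print(n_pdc, history)  # prints: 22 [32, 16, 24, 20, 22]
```

With these made-up capacitances the search settles on 22 PDCs; the real count depends on the actual line and PDC capacitances, which are not given here.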

Supplementary Note 3: Experimentally measured nonlinear response in ACCEL
We use the photoelectric nonlinearity between OAC and EAC as the nonlinearity in ACCEL. As the photodiode in ACCEL converts the complex optical field into electric current, the output current I depends nonlinearly on the input amplitude of the optical field. The optical field is composed of both the electric field, whose amplitude is E, and the magnetic field, whose amplitude is H. The electric field and the magnetic field convert to each other constantly. We use the energy of the electric field to represent the energy of the optical field, as is common practice1. The relationship between the photoelectric current I and the amplitude E of the electric field of the light wave can then be described1,2 in terms of the following quantities: ϵ is the dielectric constant; c is the speed of light; η is the responsivity of the photodiode, which depends on the characteristics of the photodiode; e is the elementary charge; h is Planck's constant; ν is the frequency of light; and A is the area of the photosensitive surface.
The photocurrent does not increase indefinitely with increasing light power3. When the photocurrent of the photodiode saturates, the saturated photocurrent Isat, the input light power Psat and the amplitude of the electric field Esat satisfy the same relationship. In our demonstration, the response of the photodiode in ACCEL does not reach saturation because ACCEL focuses on ultra-fast and low-exposure computation in vision tasks. For example, in incoherent situations, the power of sunlight can hardly reach the saturation power Psat.
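For orientation, one form of the photocurrent relation that is consistent with the symbols listed in this note is sketched below. It is an assumption rather than a transcription of the original equations: it takes the time-averaged intensity of a wave of amplitude E to be (1/2)ϵcE², and treats η as a dimensionless conversion factor multiplying e/(hν).

```latex
% Assumed: optical power on the photosensitive area A
P = \tfrac{1}{2}\,\epsilon\,c\,E^{2}A
% Assumed: photocurrent, quadratic in the field amplitude E
I = \frac{\eta e}{h\nu}\,P = \frac{\eta e\,\epsilon c A}{2h\nu}\,E^{2}
% Same relation evaluated at the saturation point
I_{sat} = \frac{\eta e}{h\nu}\,P_{sat} = \frac{\eta e\,\epsilon c A}{2h\nu}\,E_{sat}^{2}
```

Whatever the exact prefactor, the point used in this note is that I scales with E², which is the quadratic response compared with the measurements.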
We measured the response to both coherent light at 532 nm (situations of Fig. 4-5) and incoherent white light (situations of Extended Data Fig. 4g-h and Supplementary Video 1) across a large intensity range to cover the exposure situations in the manuscript. White light can hardly reach very high power in everyday vision tasks, so it has a relatively small amplitude range (Extended Data Fig. 2a). For coherent light, we further enlarged the amplitude range with the laser (Changchun New Industries Optoelectronics Tech. Co., Ltd., MGL-III-532-300mW) to explore the response over a larger range (Extended Data Fig. 2b). Both situations accord well with the theoretical quadratic response and experimentally demonstrate the effective nonlinear response in ACCEL during all-analog computing.
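One simple way to check a quadratic response is to fit the slope of log(I) against log(E), which should be close to 2. The sketch below does this with a plain least-squares fit on synthetic data; the helper `loglog_slope` and the sample values are hypothetical and are not the measured data from Extended Data Fig. 2.

```python
import math


def loglog_slope(amplitudes, currents):
    """Least-squares slope of log(current) versus log(amplitude)."""
    xs = [math.log(a) for a in amplitudes]
    ys = [math.log(i) for i in currents]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic example: an ideal quadratic response I = k * E^2 (k is arbitrary).
amplitudes = [1.0, 2.0, 4.0, 8.0]
currents = [0.5 * a ** 2 for a in amplitudes]
slope = loglog_slope(amplitudes, currents)
print(round(slope, 6))
```

For real measurements the fitted exponent would deviate from 2 by an amount set by noise and by any onset of saturation.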

Supplementary Note 4: Experimental computing speed and energy efficiency of ACCEL with 1-layer digital computation
The computing speed is calculated with the formula:

$$\text{Computing speed} = \frac{N_{op}}{t_{frame}} = \frac{N_o + N_e}{t_r + t_p + t_a + t_{AD} + t_{TPU}}$$

where Nop is the number of multiplication and addition operations ACCEL implements in one frame; tframe is the time for ACCEL to process one frame; No is the number of operations implemented by OAC; Ne is the number of operations implemented by EAC; tr is the reset time; tp is the response time; ta is the accumulating time; tAD is the analog-to-digital data conversion time between the ACCEL output and the TPU input; and tTPU is the time for the digital computing with the TPU.
The number of operations implemented by ACCEL for one frame, Nop, is the sum of the operations implemented by OAC and EAC (Nop = No + Ne). The systemic energy efficiency is therefore calculated as:

$$\text{Energy efficiency} = \frac{N_{op}}{E_{sys}} = \frac{N_{op}}{E_o + E_{cp} + E_{SRAM} + E_{control} + E_{AD} + E_{TPU}}$$

where Esys is the systemic energy consumption of ACCEL; Eo is the laser energy; Ecp is the energy of the photocurrent to compute; ESRAM is the energy of the SRAM to store, read and switch the weights; Econtrol is the energy of the control unit; EAD is the energy consumption of the analog-to-digital data conversion; and ETPU is the computing energy of the TPU.

We calculate the experimental performance of ACCEL for 3-class and 10-class image classification in Supplementary Table 5. The systemic computing speed of ACCEL for both 3-class and 10-class MNIST classification reaches 3.69 × 10^2 TOPS. The systemic energy efficiency of ACCEL for 3-class and 10-class MNIST classification reaches 5.90 × 10^3 TOPS/W and 5.88 × 10^3 TOPS/W, respectively. The systemic computing speed of ACCEL for time-lapse tasks reaches 1.05 × 10^3 TOPS and the systemic energy efficiency reaches 4.22 × 10^3 TOPS/W.
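The two definitions in this note can be sketched as small helper functions. The sketch assumes that the frame time is the sum of the listed time terms and that the systemic energy is the sum of the listed energy terms; the function names and all input numbers are hypothetical placeholders, not values from Supplementary Table 5.

```python
def computing_speed(n_o, n_e, t_r, t_p, t_a, t_ad=0.0, t_tpu=0.0):
    """Operations per second: (N_o + N_e) divided by the frame time in seconds."""
    t_frame = t_r + t_p + t_a + t_ad + t_tpu
    return (n_o + n_e) / t_frame


def energy_efficiency(n_op, e_o, e_cp, e_sram, e_control, e_ad=0.0, e_tpu=0.0):
    """Operations per joule, numerically equal to operations per second per watt."""
    e_sys = e_o + e_cp + e_sram + e_control + e_ad + e_tpu
    return n_op / e_sys


# Placeholder inputs (times in seconds, energies in joules), purely illustrative.
speed_ops_per_s = computing_speed(
    n_o=1e6, n_e=1e3, t_r=10e-9, t_p=5e-9, t_a=5e-9, t_ad=10e-9, t_tpu=20e-9
)
eff_ops_per_j = energy_efficiency(
    n_op=1.001e6, e_o=0.5e-9, e_cp=0.1e-9, e_sram=0.1e-9,
    e_control=0.1e-9, e_ad=0.1e-9, e_tpu=0.1e-9,
)
print(f"{speed_ops_per_s / 1e12:.2f} TOPS, {eff_ops_per_j / 1e12:.0f} TOPS/W")
```

Setting `t_ad`, `t_tpu`, `e_ad` and `e_tpu` to zero gives the corresponding all-analog case without the digital layer.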

Supplementary Note 5: Comparison between the comparator and ADC
ACCEL adopts the comparator shown in Extended Data Fig. 1e as an analog-to-digital interface instead of a conventional ADC for the time-lapse task. The comparator is composed of two back-to-back inverters that form a latch, and several switches for timing control. The timing diagram of the comparator is illustrated in Extended Data Fig. 1f. The comparator operates in three phases. First, the RESET signal is set high to clear the residual charges at sampling nodes S+ and S-. Then, the RESET signal is set low and the SMP signal is set high to sample the input voltages V+ and V- at the sampling nodes S+ and S-, respectively. Finally, the SMP signal is set low and the CMP_EN signal is set high to compare the sampled voltages at S+ and S-. The node with the lower voltage is pulled down to ground, and the node with the higher voltage is pulled up to the supply voltage, which indicates the comparison result.
Since the comparator converts the input to a single-bit output, the conversion time and energy consumption of data conversion by the comparator are much smaller than those of an ADC, which samples the analog input signal and quantizes it to multi-bit digital data. For imaging applications, the resolution of the ADC is generally around 10 bits. Supplementary Table 2 lists the performance of the comparator used in ACCEL and of a state-of-the-art high-speed 10-bit ADC fabricated with 180 nm CMOS technology, the same as EAC. The conversion time and energy consumption of the comparator are 3.81% and 1.21% of those of the ADC, respectively. In addition, ACCEL reduces the dimensionality of the input before the analog-to-digital interface from 224 × 224 (original image) to 16 (extracted analog features), so ACCEL reduces the latency and energy consumption of the analog-to-digital interface by factors of 8.2 × 10^4 and 2.6 × 10^5, respectively.
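The two reduction factors quoted above follow from combining the dimensionality reduction with the per-conversion ratios. The short check below reproduces that arithmetic, assuming the cost of the interface scales linearly with the number of values converted; the variable names are illustrative.

```python
pixels_in = 224 * 224        # values an ADC would convert for the original image
features_out = 16            # analog features converted by the comparator
dim_reduction = pixels_in / features_out   # 3136.0

time_ratio = 0.0381          # comparator conversion time relative to the ADC
energy_ratio = 0.0121        # comparator conversion energy relative to the ADC

latency_gain = dim_reduction / time_ratio
energy_gain = dim_reduction / energy_ratio
print(f"{latency_gain:.1e} {energy_gain:.1e}")  # prints: 8.2e+04 2.6e+05
```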

Supplementary Note 6: Weight-switching time of SRAM
The weight-readout operation of SRAM is performed in two phases: a pre-charge phase and a readout phase (see Supplementary Note 1 for detailed principles of SRAM operation). To read out the data from SRAM, the control signal is first set low to reset the SRAM output node Q to 1, i.e., 1.8 V. Then the control signal is set high, and the on-chip controller generates the read word-line signal RWL to enable the readout process. If the data to be read out is 1, the output of SRAM stays at 1.8 V; otherwise, the output of SRAM is discharged to 0 V.
As the weight is binary, i.e., either -1 or 1 (corresponding to either the V- or V+ line in EAC), there are altogether four situations of weight switching: from -1 to 1, from -1 to -1, from 1 to -1 and from 1 to 1. The experimentally measured weight-switching time of the off-chip output signal is shown in Extended Data Fig. 7a-d. Since the process of changing a weight by SRAM comprises two steps, 1) resetting the weight to 1 and 2) changing to the new weight, the situation from 1 to 1 requires 0 ns. The measured weight signal indeed remained at 1, i.e., the supply voltage of 1.8 V (Extended Data Fig. 7a). Since the SRAM output is a control signal that determines whether the switch in the pixel unit (Fig. 2h) is turned on or off, a signal slightly below half the supply voltage, i.e., 0.9 V, already means the switch is turned off, and the signal does not necessarily have to reach 0 V. Similarly, a signal slightly above 0.9 V already means the switch is turned on, and the signal does not necessarily have to reach 1.8 V. Here we use 1.5 V and 0.3 V as the criteria instead of just 0.9 V to give a conservative upper-bound measurement. The off-chip output signal experimentally took 12.73 ns to switch from -1 to 1 (Extended Data Fig. 7b) and 13.54 ns to switch from 1 to -1 (Extended Data Fig. 7c). For the fourth situation, from -1 to -1, the internal signal changes from 0 V to 1.8 V as a reset and then back to 0 V. Because of the significant delay in the off-chip output signal, the output signal starts to drop before it can actually reach 1.8 V (Extended Data Fig. 7d, the orange line). The measured latency for it to finish the -1 to 1 to -1 process is 12.97 ns. As this is a complicated situation, we also labeled the latency for the signal to actually reach 0 V, which is 13.27 ns (Extended Data Fig. 7d, the green dashed vertical line).
As a result, we have measured all four situations of weight switching by SRAM, and the times are 0 ns, 12.73 ns, 13.54 ns and 12.97 ns. Even the delayed off-chip output signal completes the weight switching within 14 ns, indicating that the actual completion time inside the chip is within 14 ns. We also calculated the theoretical output signal with parameters provided by the foundry, Semiconductor Manufacturing International Corporation (SMIC). The trends and calculated results correspond well with the measured results (Extended Data Fig. 7e-h) and are all below 14 ns. We used 500 MHz as the clock frequency for ACCEL and assigned 7 clock periods for the reset time. Therefore, the reset operation of the computing line and the weight switching of SRAM can be conducted simultaneously within the reset time of 14 ns.
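The timing budget in this note can be checked with a few lines of arithmetic: 7 clock periods at 500 MHz give a 14 ns reset window, which is compared with the four measured switching times. The dictionary keys and variable names are illustrative only.

```python
CLOCK_MHZ = 500
RESET_CYCLES = 7

period_ns = 1e3 / CLOCK_MHZ            # 2.0 ns per clock period
reset_ns = RESET_CYCLES * period_ns    # 14.0 ns reset window

# Measured off-chip weight-switching times from this note, in nanoseconds.
measured_ns = {
    "1 -> 1": 0.0,
    "-1 -> 1": 12.73,
    "1 -> -1": 13.54,
    "-1 -> -1": 12.97,
}

worst_ns = max(measured_ns.values())
margin_ns = reset_ns - worst_ns        # slack left in the reset window
fits = all(t <= reset_ns for t in measured_ns.values())
print(reset_ns, worst_ns, round(margin_ns, 2), fits)
```

The slowest measured transition (1 to -1) leaves about 0.46 ns of slack in the 14 ns window.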
The clock frequency in our ACCEL prototype is 500 MHz. The supply voltage of the computing module in EAC and of the SRAM is 1.8 V; the supply voltage of the control unit is 1.0 V; the measured average currents of the computing module, control unit and SRAM are 89.15 μA, 8.38 mA and 2.83 mA, respectively. Although we used ACCEL with a two-layer 400 × 400 OAC for 3-class ImageNet classification, the two OAC layers are two linearly connected matrix multiplications without nonlinearity in between. Therefore, we calculate the operation number in OAC as that of a matrix multiplication by a single 400 × 400 OAC layer, which gives the minimum operation number. We use the measured laser energy instead of the energy arriving at ACCEL as the energy of light.
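As a rough illustration of how the measured supply voltages and average currents translate into electrical power, the sketch below multiplies each voltage by its current. It gives only average power; how each block's power is integrated over a frame to obtain Ecp, ESRAM and Econtrol is not stated here, so no per-frame energies are derived. The dictionary layout and names are illustrative.

```python
# (supply voltage in volts, measured average current in amperes)
supplies = {
    "computing module": (1.8, 89.15e-6),
    "control unit": (1.0, 8.38e-3),
    "SRAM": (1.8, 2.83e-3),
}

# Average power of each block in watts: P = V * I.
power_w = {name: volts * amps for name, (volts, amps) in supplies.items()}
total_w = sum(power_w.values())

for name, p in power_w.items():
    print(f"{name}: {p * 1e3:.3f} mW")
print(f"total: {total_w * 1e3:.2f} mW")
```

Under these figures the control unit and SRAM dominate the electrical power, while the analog computing module draws well under 1 mW.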

Panel c: Image reconstruction with OAC output (MNIST). Column labels: Digital neural network; ACCEL; Network structure.
The systemic energy efficiency is calculated as:

$$\text{Energy efficiency} = \frac{N_{op}}{E_{sys}} = \frac{N_{op}}{E_o + E_{cp} + E_{SRAM} + E_{control}}$$

where Esys is the systemic energy consumption of ACCEL; Eo is the laser energy; Ecp is the energy of the photocurrent to compute; ESRAM is the energy of the SRAM to store, read and switch the weights; and Econtrol is the energy of the control unit. We calculate the experimental performance of ACCEL for 3-class and 10-class image classification in Supplementary Table 4. The systemic computing speed of ACCEL (10-class MNIST) reaches 5.95 × 10^2 TOPS and the systemic energy efficiency reaches 9.49 × 10^3 TOPS/W. The systemic computing speed of ACCEL (3-class ImageNet) reaches 4.55 × 10^3 TOPS and the systemic energy efficiency reaches 7.48 × 10^4 TOPS/W.
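Dividing the reported speed by the reported efficiency gives the implied average system power, which is a quick consistency check on the 3-class ImageNet figures. The per-frame quantities below additionally assume that the 72 ns latency quoted in the abstract is the frame time for this task; that pairing is an assumption of this sketch.

```python
speed_tops = 4.55e3          # reported systemic computing speed, TOPS
eff_tops_per_w = 7.48e4      # reported systemic energy efficiency, TOPS/W

# Implied average system power in watts.
power_w = speed_tops / eff_tops_per_w

# Assumed frame time: the 72 ns latency quoted in the abstract.
frame_s = 72e-9
ops_per_frame = speed_tops * 1e12 * frame_s
energy_per_frame_j = power_w * frame_s

print(f"{power_w * 1e3:.1f} mW, {ops_per_frame:.3e} ops/frame, "
      f"{energy_per_frame_j * 1e9:.2f} nJ/frame")
```

Under that assumption the figures correspond to roughly 61 mW of system power and a few nanojoules per frame.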

Table 2 | Comparison between the adopted comparator and the state-of-the-art 10-bit high-speed ADC fabricated with a 180-nm CMOS process.
The data for the comparator are derived by circuit post-simulation with the Cadence Virtuoso tool according to model files provided by SMIC (Semiconductor Manufacturing International Corporation), where the EAC chip is fabricated with a 180-nm standard CMOS process. The values for the ADC are the latency and energy for 10-bit data. SOTA: state-of-the-art.

Table 3 | Experimentally measured latency for the output voltage of ACCEL to reach 20-dB signal-to-noise ratio (SNR) under different input light power.
The noise level of the ACCEL output is 6.43 μVrms, so the ACCEL output has an SNR of 20 dB when the output voltage drops by 65 μV. Since the photocurrent that causes the voltage drop is proportional to the input light power, the time it takes for the ACCEL output voltage to drop by 65 μV is approximately inversely proportional to the input light power.
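The 65 μV threshold can be checked against the stated noise level using the amplitude form of the decibel definition, and the inverse proportionality can be written as a one-line scaling rule. The helper `scaled_latency` and its reference values are hypothetical and do not come from the table.

```python
import math

noise_uv_rms = 6.43   # output noise level, microvolts RMS
drop_uv = 65.0        # output voltage drop used as the threshold, microvolts

# SNR in dB for an amplitude ratio.
snr_db = 20 * math.log10(drop_uv / noise_uv_rms)
print(round(snr_db, 2))  # prints: 20.09


def scaled_latency(t_ref, p_ref, p):
    """Latency at power p, assuming latency is inversely proportional to power."""
    return t_ref * p_ref / p


# Hypothetical reference point: doubling the light power halves the latency.
print(scaled_latency(t_ref=100.0, p_ref=1.0, p=2.0))  # prints: 50.0
```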

Table 5 | Experimental computing performance of ACCEL connected with one digital layer.
* State-of-the-art high-speed 10-bit ADC6. † On-chip comparator in EAC chip.
The clock frequency in our ACCEL prototype is 500 MHz. The supply voltage of the computing module in EAC and of the SRAM is 1.8 V; the supply voltage of the control unit is 1.0 V; the measured average currents of the computing module, control unit and SRAM are 89.15 μA, 8.38 mA and 2.83 mA, respectively. We use the measured laser energy instead of the energy arriving at ACCEL as the energy of light.

Table 6 | Structures of digital neural networks with different layer numbers for comparison with ACCEL on 10-class MNIST classification.
The size of the input image is 28 × 28. Each layer of the network is connected to a nonlinear ReLU layer, and the convolutional layer is connected to a pooling layer, with a pooling size of 2 and a stride of 1, before ReLU. Each convolutional layer has padding of two pixels on each side. FC: fully-connected layer. Conv: convolutional layer.