Powering AI at the edge: A robust, memristor-based binarized neural network with near-memory computing and miniaturized solar cell

Memristor-based neural networks provide an exceptional energy-efficient platform for artificial intelligence (AI), presenting the possibility of self-powered operation when paired with energy harvesters. However, most memristor-based networks rely on analog in-memory computing, necessitating a stable and precise power supply, which is incompatible with the inherently unstable and unreliable energy harvesters. In this work, we fabricated a robust binarized neural network comprising 32,768 memristors, powered by a miniature wide-bandgap solar cell optimized for edge applications. Our circuit employs a resilient digital near-memory computing approach, featuring complementarily programmed memristors and logic-in-sense-amplifier. This design eliminates the need for compensation or calibration, operating effectively under diverse conditions. Under high illumination, the circuit achieves inference performance comparable to that of a lab bench power supply. In low illumination scenarios, it remains functional with slightly reduced accuracy, seamlessly transitioning to an approximate computing mode. Through image classification neural network simulations, we demonstrate that misclassified images under low illumination are primarily difficult-to-classify cases. Our approach lays the groundwork for self-powered AI and the creation of intelligent sensors for various applications in health, safety, and environment monitoring.


Introduction
Artificial intelligence (AI) has found widespread use in various embedded applications such as patient monitoring, building, and industrial safety [1]. To ensure security and minimize energy consumption due to communication, it is preferable to process data at the edge in such systems [2]. However, deploying AI in extreme-edge environments poses a challenge due to its high power consumption, often requiring AI to be relegated to the "cloud" or the "fog" [3,4]. A promising solution to this problem is the use of memristor-based systems, which can drastically reduce the energy consumption of AI [5,6], making it even conceivable to create self-powered edge AI systems that do not require batteries and can instead harvest energy from the environment. Additionally, memristors provide the advantage of being non-volatile memories, retaining stored information even if harvested energy is depleted.
The most energy-efficient memristor-based AI circuits rely on analog in-memory computing: they exploit Ohm's and Kirchhoff's laws to perform the fundamental operation of neural networks, multiply-and-accumulate (MAC) [7][8][9]. This concept is challenging to realize in practice due to the high variability of memristors, the imperfections of analog CMOS circuits, and voltage drop effects. To overcome these challenges, integrated memristor-based AI systems employ complex peripheral circuits, which are tuned for a particular supply voltage [10][11][12][13][14][15][16]. This requirement for a stable supply voltage is in direct contrast with the properties of miniature energy harvesters such as tiny solar cells or thermoelectric generators, which provide fluctuating voltage and energy, creating a significant obstacle to realizing self-powered memristor-based AI [17].
In this work, we demonstrate a binarized neural network, fabricated in a hybrid CMOS/memristor process, and designed with an alternative approach that is particularly resilient to unreliable power supply. We demonstrate this robustness by powering our circuit with a miniature wide-bandgap solar cell, optimized for indoor applications. Remarkably, the circuit maintains functionality even under low illumination conditions equivalent to 0.08 suns, experiencing only a modest decline in neural network accuracy. When power availability is limited, our circuit seamlessly transitions from precise to approximate computing as it begins to encounter errors while reading difficult-to-read, imperfectly-programmed memristors.
Our fully digital circuit, devoid of the need for any analog-to-digital conversion, incorporates four arrays of 8,192 memristors each. It employs a logic-in-sense-amplifier two-transistor/two-memristor strategy for optimal robustness, introducing a practical realization of the near-memory computing concept initially proposed in refs. 18,19. The design is reminiscent of the smaller-scale memristor-based Bayesian machine recently showcased in ref. 20, with the added novelty of logic-in-memory functionality. This feature is achieved by executing multiplication within a robust precharge differential sense amplifier, a circuit initially proposed in ref. 21. Accumulation is then performed using a straightforward digital circuit situated near-memory. Our system also integrates on-chip a power management unit and a digital control unit, responsible for memristor programming and the execution of fully pipelined inference operations.
We first introduce our integrated circuit and provide a comprehensive analysis of its electrical characteristics and performance across a variety of supply voltages and frequencies. We then characterize the behavior of the circuit under solar cell power, demonstrating its adaptability and resilience even when the power supply is significantly degraded due to low illumination. To further showcase the robustness of the circuit, we present results from neural network simulations using the popular MNIST and CIFAR-10 datasets. These results highlight the capability of the circuit to perform well even under extremely low illumination conditions.

Binarized neural network machine based on distributed memristor modules
In binarized neural networks, both synaptic weights and neuronal activations assume binary values (meaning +1 and −1) [22,23]. These networks are particularly appropriate for the extreme edge, as they can be trained for image and signal processing tasks with high accuracy, while requiring fewer resources than conventional real-valued neural networks [24,25]. In these simplified networks, multiplication can be implemented by a one-bit exclusive NOR (XNOR) operation and accumulation by a population count (popcount). The output neuron activations $X_{\mathrm{out},j}$ are, therefore, obtained by
$$X_{\mathrm{out},j} = \operatorname{sign}\left(\operatorname{popcount}_i\left(\operatorname{XNOR}\left(W_{ji}, X_{\mathrm{in},i}\right)\right) - T_j\right), \qquad (1)$$
using the synaptic weights $W_{ji}$, the input neuron activations $X_{\mathrm{in},i}$, and the output neuron threshold $T_j$. The quantity $\operatorname{popcount}_i(\operatorname{XNOR}(W_{ji}, X_{\mathrm{in},i})) - T_j$ is a signed integer, referred to as neuron preactivation throughout this paper. We fabricated a binarized neural network hardware system (Figs. 1a,b) employing hafnium-oxide memristors integrated into the back end of a CMOS line to compute equation 1. The memristors replace vias between metal layers four and five (Fig. 1c) and are used to program the synaptic weights and neuron thresholds in a non-volatile manner. The system comprises four memristor arrays, each containing 8,192 memristors. These arrays can be used in two distinct configurations: one with two neural network layers featuring 116 inputs and 64 outputs, or an alternative single-layer configuration that has 116 inputs and 128 outputs. Additionally, we fabricated a smaller die that includes a single 8,192-memristor module with peripheral circuits that provide more flexibility to access memristors. Our circuits use a low-power 130-nanometer process node, which is interesting for extreme-edge applications, as it is cost-effective, offers well-balanced analog and digital performance, and supports a wide range of voltages. Due to the partially academic nature of our process, only five layers of metals are available.
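The XNOR and popcount formulation of a binarized neuron described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the tie convention (a preactivation of exactly zero maps to −1) is an assumption made for illustration, not something stated in the text.

```python
def xnor(w: int, x: int) -> int:
    """One-bit XNOR on values in {+1, -1}: 1 when they agree, 0 otherwise."""
    return 1 if w == x else 0


def neuron_preactivation(weights, inputs, threshold):
    """popcount(XNOR(W, X_in)) - T, a signed integer."""
    popcount = sum(xnor(w, x) for w, x in zip(weights, inputs))
    return popcount - threshold


def neuron_output(weights, inputs, threshold):
    """Binary activation from the sign of the preactivation.

    Assumption: a preactivation of exactly zero is mapped to -1.
    """
    return 1 if neuron_preactivation(weights, inputs, threshold) > 0 else -1


weights = [+1, -1, +1, +1]
inputs = [+1, -1, -1, +1]
# Three of the four weight/input pairs agree, so popcount = 3.
print(neuron_preactivation(weights, inputs, threshold=2))  # 1
print(neuron_output(weights, inputs, threshold=2))         # 1
```

No multiplier is involved: the whole neuron reduces to bit comparisons, a counter, and a subtraction.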
Our design choices aim to ensure the most reliable operation under unreliable power supply and follow the differential strategy proposed in ref. 19. To achieve this, we use two memristors per synaptic weight, programmed in a complementary fashion, with one in a low resistance state and the other in a high resistance state (see Fig. 1d). We also employ a dedicated logic-in-memory precharge sense amplifier [21] to perform the multiplication, which simultaneously reads the state of the two memristors representing the weight and performs an XNOR with its X input (Fig. 1f). This differential approach makes our circuit highly resilient. It minimizes the effects of memristor variability by ensuring that the sense amplifier functions as long as the memristor in the low resistance state has a lower resistance than the memristor in the high resistance state, even if they deviate significantly from their nominal values. Furthermore, fluctuations in the power supply voltage affect both branches of the sense amplifier symmetrically. This robustness eliminates the need for compensation and calibration circuits, unlike in other analog in-memory computing implementations that require a finely controlled supply voltage.
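As a behavioral illustration of the complementary two-memristor encoding, the sketch below models the sense amplifier as a pure comparison of the two resistances followed by an XNOR with the input. The resistance values, the function names, and the choice of which device is in the low resistance state for a +1 weight are all assumptions made for this example; the real circuit is an analog precharge sense amplifier, not a software comparison.

```python
def program_weight(weight, r_lrs=5e3, r_hrs=50e3):
    """Return the pair (R_a, R_b) of complementary resistances for a weight.

    Assumption: +1 is encoded as (low, high) and -1 as (high, low).
    The nominal values in ohms are hypothetical.
    """
    return (r_lrs, r_hrs) if weight == +1 else (r_hrs, r_lrs)


def xnor_sense(r_a, r_b, x):
    """Differential read followed by XNOR with the input x in {+1, -1}.

    Only the ordering of the two resistances matters, not their values.
    """
    weight = +1 if r_a < r_b else -1
    return 1 if weight == x else 0


# Nominal devices: the stored +1 weight agrees with the input +1.
r_a, r_b = program_weight(+1)
print(xnor_sense(r_a, r_b, x=+1))  # 1

# Strongly drifted devices (low state 4x higher, high state 2x lower):
# the ordering is preserved, so the read result is unchanged.
print(xnor_sense(r_a * 4, r_b / 2, x=+1))  # 1
```

A read error in this model requires the two resistances to swap order, which is why weakly-programmed pairs with similar resistances are the vulnerable ones.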
Our system computes the values of all output neurons in parallel. We provide a detailed description of the pipelined operation of the neural network in Supplementary Note 3, and summarize the main principle here. The neuron thresholds, which are stored in dedicated rows of the memristor arrays, are read simultaneously and transferred to neuron registers located near the memristor arrays. Then, input neurons are presented sequentially to the memristor array. The accumulation operation of the neural network is performed by integer digital population count circuits that take as input the outputs of the XNOR-augmented sense amplifiers and decrement the neuron registers. These circuits, which are replicated for each output neuron, are located physically near the memristor arrays. This near-memory computing principle saves energy, as only the binarized activations of the output neurons, obtained by taking the sign bit of the threshold register at the end of the inference process, need to be transmitted away from the memories.
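The sequence summarized above (load thresholds into registers, present inputs one after another, decrement, read the sign bit) can be sketched in a few lines of Python. This is a simplified behavioral model under stated assumptions: one input is processed per step, each register is decremented by one whenever the XNOR output is 1, and a negative final register is read as a +1 activation. The function name and data layout are hypothetical; the actual pipeline is described in Supplementary Note 3.

```python
def run_inference(weight_rows, thresholds, inputs):
    """Behavioral model of threshold-register accumulation.

    weight_rows[i][j] is the weight (+1/-1) between input i and output j.
    """
    # Step 1: thresholds are copied into the per-neuron registers.
    registers = list(thresholds)

    # Step 2: inputs are presented one at a time; all output neurons
    # (the inner loop) are updated in parallel in hardware.
    for x, row in zip(inputs, weight_rows):
        for j, w in enumerate(row):
            if w == x:  # XNOR output is 1
                registers[j] -= 1

    # Step 3: each register now holds T - popcount. Assumption: a set
    # sign bit (negative value) is read as activation +1.
    return [1 if r < 0 else -1 for r in registers]


weight_rows = [[+1, -1],
               [+1, +1],
               [-1, +1]]
inputs = [+1, +1, -1]
# Neuron 0 matches all three inputs (2 - 3 = -1); neuron 1 matches one (2 - 1 = 1).
print(run_inference(weight_rows, thresholds=[2, 2], inputs=inputs))  # [1, -1]
```

Only the final list of binary activations leaves this function, mirroring the point that nothing but the sign bits needs to be transmitted away from the memories.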
As the synaptic weights are stored in non-volatile memory, the system can be turned off and on at any time, cutting power consumption completely, and can immediately perform a new inference or restart a failed one. The programming of the weights needs to be carried out prior to inference, and a forming operation must be performed on each memristor before its first programming operation. A challenge is that the forming operation requires voltages as high as 4.5 volts, whereas the nominal voltage of our CMOS process is only 1.2 volts. To overcome this, we included level shifters in the periphery circuitry of the memristor arrays (Fig. 1e), which can sustain high voltages. These circuits, similar to the ones used in ref. 20, use thick-oxide transistors to raise the voltage of the on-chip signals commanding the programming of memristors. The higher-than-nominal voltages are provided by two power pads. Once the memristors have been programmed, these pads can be connected to the digital low-voltage power supply VDD, as high voltages are no longer needed. The details of the memristor forming and programming operation are provided in Supplementary Note 2. Additionally, we incorporated a power management unit and a complete state machine into our fabricated circuit. These components, placed and routed all around the die, are detailed in Supplementary Note 1.
Our fabricated system is functional across a wide range of supply voltages and operating frequencies, without the need for calibration. As shown in Fig. 2a, the measured output of the system, obtained using the setup depicted in Fig. 2b, matches the register-transfer-level simulation of our design (see Methods). This first experiment was conducted using the maximum supported supply voltage of our process (1.2 volts) and a clock frequency of 66 MHz. The energy consumption of the system can be reduced by decreasing the supply voltage, as seen in Fig. 2c. This graph displays the measured energy consumption across various supply voltages and frequencies where the system remained functional. The x-axis represents the square of the supply voltage to highlight its direct proportionality to energy consumption: all circuits on-chip, including the sense amplifiers, and with the exception of the power management circuits, function solely with capacitive loads. Notably, energy consumption is largely independent of operation frequency at a given supply voltage. This result, typical for CMOS digital circuits, suggests an absence of short-circuit currents in our design. Supply voltages lower than one volt do not support 66 MHz operation and require slower clock speeds. The lowest measured energy consumption of 45 nJ was achieved at a supply voltage of 0.7 volts (close to the threshold voltage of the transistors in the low-leakage process that we are using) and a clock frequency of 10 MHz.
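The quadratic dependence of energy on supply voltage can be made concrete with a short calculation. The sketch below assumes the idealized purely capacitive model E ∝ V², anchored on the measured point of 45 nJ at 0.7 volts. The function name is hypothetical, and the result at 1.2 volts is an extrapolation under that model, not a measured value from Fig. 2c.

```python
def scaled_energy(e_ref_nj: float, v_ref: float, v_new: float) -> float:
    """Scale an inference energy from v_ref to v_new assuming E = C_eff * V^2.

    Assumption: purely capacitive switching, so the effective capacitance
    C_eff is the same at both voltages and frequency does not matter.
    """
    return e_ref_nj * (v_new / v_ref) ** 2


# Extrapolate the measured 45 nJ at 0.7 V to the 1.2 V supply.
estimate = scaled_energy(45.0, v_ref=0.7, v_new=1.2)
print(round(estimate, 1))  # 132.2 (nJ, model estimate only)
```

Under this model, lowering the supply from 1.2 to 0.7 volts reduces the energy per inference by roughly a factor of three, which is the motivation for operating at the lowest voltage at which reads remain reliable.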

Characterization of the fabricated distributed memory modules BNN machine
Fig. 2d details the various sources of energy consumption in our circuit, as determined through simulations based on the process design kit of our technology. (It is not possible to separate the consumption of the different on-chip functions experimentally.) As the figure illustrates, a significant portion of the energy is consumed by the on-chip digital control circuitry. In scaled-up systems, this proportion is expected to decrease considerably as the control circuitry would remain largely unchanged. Clock distribution represents only 5.2% of the energy, which is lower than typical digital circuits. This is due to the high proportion of circuit area taken up by memristor arrays, which do not require clock distribution. Neuron registers consume a substantial 16.0% of the energy, owing to their constant activity due to our design decision of not clock-gating them. This design choice simplified timing constraints in the circuit, ensuring its experimental functionality. However, a fully optimized design would be clock-gated, substantially reducing energy usage for the registers (see Discussion). The actual multiply-and-accumulate operations, including memristor read with XNOR logic-in-memory and population count, consume a modest 6.5% of the energy.
We now present a comprehensive characterization of the accuracy of our fabricated system. Initially, we programmed a memristor array with synaptic weights and neuron thresholds and tested it with neuron inputs, carefully selected to span the entire spectrum of potential output preactivation values (see Methods). Fig. 3a presents the measured accuracy (percentage of correct output neurons) across varying supply voltages and operational frequencies in a schmoo plot. With this setup, we observe no errors when the supply voltage is at least one volt. At 0.9 volts, occasional errors occur at 66 MHz operation, and below this voltage, error rates up to 2% can manifest at any frequency. We attribute these residual errors to the sense amplifiers, likely due to memristor variability and instability, which cause their resistance to deviate from the target nominal value. Conventional digital circuits incorporating memristors employ strong multiple-error correction codes to compensate for these issues [26]. By contrast, our sense amplifier, owing to its differential nature, can still determine the correct weight even if one memristor exhibits an improper resistance, as long as the memristor programmed in low resistance maintains a lower resistance than the memristor programmed in high resistance. At lower supply voltages, this task becomes more challenging, resulting in the observed residual bit errors.
As neuron errors arise from weight errors, they are only observed when the population count and threshold values of a neuron are comparable. We found that errors were absent experimentally when the magnitude of the difference between the population count and threshold (the neuron preactivation ∆) exceeded five. Figs. 3c,d, based on extensive experiments (see Methods), depict the error rates for different supply voltages as a function of the neuron preactivation, when the system operates at 33 MHz and 66 MHz. At a supply voltage of 1.2 volts, errors only occur when the preactivation is -1, 0, or 1. At a supply voltage of 0.9 volts, errors are observed for preactivation magnitudes up to five. To illustrate how errors occur, Fig. 3b shows measurements of 64 output neurons with varying preactivation values, ranging from -5 to +5, taken at 33 and 66 MHz, with a supply voltage of 0.9 volts. At this voltage, more errors are observed at 66 MHz than at 33 MHz. Almost all errors detected at 33 MHz continue to exist at 66 MHz. This observation implies that residual errors are likely due to specific weakly-programmed memristors (i.e., complementary memristors programmed with similar resistance), rather than random thermal noise.
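The link between weight read errors and neuron errors can be explored with a small Monte Carlo sketch. It flips each XNOR output independently with a fixed probability and checks whether the sign of the preactivation changes. The bit error probability of 0.02, the even split between XNOR outputs of 1 and 0, and the independence of errors are all assumptions chosen for illustration; in particular, the text above suggests that real errors come from specific weakly-programmed devices rather than independent random events.

```python
import random


def neuron_error_rate(delta, n_inputs=116, bit_error_prob=0.02,
                      trials=2000, seed=0):
    """Fraction of trials in which random XNOR bit flips change the output
    of a neuron whose error-free preactivation is `delta`.

    Assumptions: independent bit flips with a hypothetical probability,
    and a preactivation of zero mapped to -1.
    """
    rng = random.Random(seed)
    ones = n_inputs // 2          # error-free XNOR outputs equal to 1
    zeros = n_inputs - ones       # error-free XNOR outputs equal to 0
    threshold = ones - delta      # so that popcount - threshold == delta
    correct = 1 if delta > 0 else -1

    errors = 0
    for _ in range(trials):
        lost = sum(rng.random() < bit_error_prob for _ in range(ones))
        gained = sum(rng.random() < bit_error_prob for _ in range(zeros))
        noisy_preactivation = (ones - lost + gained) - threshold
        output = 1 if noisy_preactivation > 0 else -1
        errors += output != correct
    return errors / trials


for delta in (1, 3, 5, 20):
    print(delta, neuron_error_rate(delta))
```

Each flipped bit shifts the preactivation by one, so a neuron with preactivation ∆ can only be corrupted when the net number of unfavorable flips reaches |∆|, which reproduces the qualitative trend that errors concentrate at low-magnitude preactivations.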

Powering the system with harvested energy
To validate the suitability of our circuit for energy harvesting applications, we connected it to a miniature AlGaAs/GaInP heterostructure solar cell (see Fig. 4a and Methods). Fig. 4b displays a photograph of this cell, along with its current-voltage characteristics measured under standardized one-sun AM1.5 illumination (see Methods). This type of solar cell, fabricated following the procedure of ref. 27 (see Methods), with a 1.73 eV bandgap, performs better than conventional silicon-based cells under low-illumination conditions, making it particularly suitable for extreme edge applications. Additionally, due to the wide bandgap, the open-circuit voltage provided by our solar cell (1.23 volts under high illumination) aligns with the nominal supply voltage of our CMOS technology (1.2 volts), unlike silicon solar cells, whose maximum voltage is only 0.7 volts. While energy harvesters are typically connected to electronic circuits through intricate voltage conversion and regulation circuits, we demonstrate the resilience of our binarized neural network by directly connecting the power supply pads of our circuit to the solar cell, without any interface circuitry. In those experiments, the solar cell is illuminated by a halogen lamp (Fig. 4d). Fig. 4c presents the current-voltage characteristics of the solar cell with this setup for various illuminations, expressed as "equivalent solar powers" based on the short-circuit current of the solar cell (see Methods). Fig. 4e shows the measured accuracy of our system, plotted as a function of neuron preactivation, similarly to Fig. 3d.
Under an equivalent solar power of 8 suns, the circuit performs almost equivalently to when powered by a 1.2-volt lab bench supply. When illumination decreases, even under a very low equivalent solar power of 0.08 suns where the characteristics of the solar cell are strongly degraded, the circuit remains functional. However, its error rate increases, especially for low-magnitude preactivation values. The circuit naturally transitions to an approximate computing regime: neurons with large-magnitude preactivations are correctly computed, but those with low-magnitude preactivations may exhibit errors.

Table 1. Simulated accuracy of a solar-cell-powered fully-connected (MNIST task) and convolutional (CIFAR-10 task) binarized neural network under various illuminations. The software baseline assumes no bit error (see Methods).
We now evaluate the performance of our circuit on neural networks. Our system functions with 128×64 memristor arrays; however, in practice, neural networks can have various structures. To map neural networks to our hardware, we employ a technique that subdivides neural network layers into several binarized arrays and then obtains the value of output neurons through majority votes of the binary output of each array (see Figs. 5a,b). This method, which we describe in more detail in Supplementary Note 4, is highly efficient in terms of hardware usage and causes only moderate accuracy degradation compared to software-based neural networks on the two tasks considered here: Modified National Institute of Standards and Technology (MNIST) handwritten digit recognition and CIFAR-10 image recognition.
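The idea of splitting a layer across several arrays and combining their binary outputs by majority vote can be sketched for a single output neuron as follows. This is only an illustration of the principle: the chunking scheme, the per-array thresholds, the tie conventions, and the function names are assumptions, and the authors' actual mapping is the one described in Supplementary Note 4.

```python
def sub_array_output(weights, inputs, threshold):
    """Binary output of one array for one neuron (XNOR, popcount, sign)."""
    popcount = sum(1 for w, x in zip(weights, inputs) if w == x)
    return 1 if popcount - threshold > 0 else -1


def majority_vote_neuron(weights, inputs, thresholds, chunk_size):
    """Split a long input vector into chunks, one per array, and combine
    the binary outputs of the arrays by majority vote.

    Assumptions: each chunk has its own threshold, and a tied vote maps to -1.
    """
    votes = []
    for k, start in enumerate(range(0, len(inputs), chunk_size)):
        w = weights[start:start + chunk_size]
        x = inputs[start:start + chunk_size]
        votes.append(sub_array_output(w, x, thresholds[k]))
    return 1 if sum(votes) > 0 else -1


weights = [+1, +1, -1, -1, +1, -1]
thresholds = [1, 1, 1]  # one hypothetical threshold per chunk of two inputs

# Chunk votes are +1, +1, -1, so the majority is +1.
print(majority_vote_neuron(weights, [+1, +1, -1, -1, -1, -1], thresholds, 2))  # 1
# Chunk votes are +1, -1, -1, so the majority is -1.
print(majority_vote_neuron(weights, [+1, +1, -1, +1, -1, -1], thresholds, 2))  # -1
```

Because every array only emits one bit per neuron, the combination step stays fully digital and needs no multi-bit partial sums to travel between arrays.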
To evaluate the classification accuracy of our hardware, we incorporated the error rates measured experimentally as a function of preactivation value and illumination (Fig. 4e) into neural network simulations (see Methods). Table 1 lists the obtained accuracy on a fully-connected neural network trained on MNIST and a convolutional neural network trained on CIFAR-10 (see Methods). Remarkably, the MNIST accuracy is hardly affected by the bit errors in the circuit: even under very low illumination equivalent to 0.08 suns, the MNIST accuracy drops by only 0.7 percentage points. Conversely, bit errors significantly reduce the accuracy of the more demanding CIFAR-10 task. Under 0.08 suns, the accuracy drops from the software baseline of 86.6% to 73.4%. The difference from MNIST arises because more neurons tend to have low-magnitude preactivation when solving CIFAR-10, as the differences between classes are more subtle.
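One way to inject preactivation-dependent error rates into a simulation is to flip each neuron output with a probability looked up from its preactivation. The sketch below shows this mechanism only. The error-rate table contains hypothetical placeholder values, not the measurements of Fig. 4e, and the function name and the symmetric dependence on |∆| are assumptions; the authors' actual procedure is described in the Methods.

```python
import random

# Hypothetical flip probabilities indexed by |preactivation|;
# magnitudes not listed are treated as error-free.
ERROR_RATE_BY_MAGNITUDE = {0: 0.30, 1: 0.10, 2: 0.03, 3: 0.01}


def noisy_activation(preactivation, rng, table=ERROR_RATE_BY_MAGNITUDE):
    """Return the neuron activation, flipped with a probability that
    depends on the magnitude of the preactivation.

    Assumption: a preactivation of zero ideally maps to -1.
    """
    ideal = 1 if preactivation > 0 else -1
    p_flip = table.get(abs(preactivation), 0.0)
    return -ideal if rng.random() < p_flip else ideal


rng = random.Random(0)
preactivations = [-7, -1, 0, 2, 12]
print([noisy_activation(p, rng) for p in preactivations])
```

Applying this function to every neuron of every layer during a forward pass yields an accuracy estimate for a given illumination, with one table per illumination level.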
To further understand the impact of low illumination on neural network performance, we plotted the t-distributed stochastic neighbor embedding [28] (t-SNE) representation of the MNIST test dataset in Fig. 5b. This technique represents each image as a point in a two-dimensional space, where similar images cluster together and dissimilar ones reside at a distance. In the left image, we marked in black the images that were correctly classified by a neural network under illumination equivalent to 8 suns, but incorrectly under 0.8 suns. Interestingly, these images tend to be on the edges of the clusters corresponding to the different digit classes, or even outliers that do not belong in a cluster. This suggests that the images that the network starts misclassifying under 0.8 suns tend to be subtle or atypical cases. The right image shows that this effect intensifies under illumination equivalent to 0.08 suns, with a few images inside clusters also being misclassified. Fig. 5c presents the same analysis for the CIFAR-10 dataset. The trend of incorrectly classified images under low illumination tending to be edge or atypical cases persists, albeit less pronounced than with MNIST.

Discussion
Our circuit exhibits an original behavior when solving tasks of varying difficulty levels. For simpler tasks such as MNIST, the circuit maintains accuracy even when energy is scarce. When addressing more complex tasks, the circuit becomes less accurate as energy availability decreases, but without failing completely. This self-adaptive approximate computing feature has several roots and can be understood from the circuit's memory read operations. They are highly robust due to their differential nature: fluctuations of the power supply affect both branches of the sense amplifier equally. Still, when the supply voltage fluctuates or becomes low, some memory reads fail. Nevertheless, binarized neural networks are highly robust to weight errors, which in many cases do not change neuron activation [29,30]. Even in the worst case, weight errors cause some images to be misclassified, but these are usually atypical or edge cases. Therefore, when the power supply degrades, the AI naturally becomes less capable of recognizing harder-to-classify images.
In this context of low-quality power supply, memristors offer distinct advantages over conventional static RAMs. While static RAMs lose stored information upon power loss, memristors retain data. Furthermore, when the supply voltage becomes low, static RAMs are prone to read disturb, meaning that a read operation can change the bit stored in a memory cell. In contrast, memristors exhibit near-immunity to read disturb effects, especially when read by precharge sense amplifiers [20] (we observed no read disturb in our experiments), and are non-volatile (ten-year retention has been demonstrated in hafnium-oxide memristors [31]).
After eliminating the energy used by the digital control circuitry (finite state machine), our circuit has an energy efficiency of 2.9 tera-operations per second and per watt (TOPS/W) under optimal conditions (10 MHz frequency, supply voltage of 0.7 volts). By further subtracting the energy consumption of clock distribution and neuron registers that can be eliminated through clock gating, and simultaneously optimizing the read operation (see Methods), energy efficiency increases to 22.5 TOPS/W. Due to the digital nature of our circuit, this number would scale favorably if a more current CMOS process were used. For example, employing the physical design kit of a fully-depleted silicon-on-insulator 28-nanometer CMOS process, we found that the energy efficiency of a clock-gated design would reach 397 TOPS/W (see Methods). Supplementary Note 5 compares these numbers and other properties of our digital system with fabricated emerging memory-based analog in-memory computing circuits. The most noteworthy comparison is with a recent study that presents an analog magnetoresistive memory (MRAM) based 64x64 binarized neural network fabricated in a 28-nanometer process [14], with a measured energy efficiency of 405 TOPS/W, slightly surpassing our projection. However, this energy efficiency comes with the need for complex compensation and calibration circuits, matched to a stable power supply, which is incompatible with the unreliable power supply delivered by energy harvesters.
Our circuit can function with power supplies as low as 0.7 volts, enabling us to power it with a wide-bandgap solar cell optimized for indoor applications, with an area of only a few square millimeters, even under low illumination equivalent to 0.08 suns. Such lightweight, ultrathin solar cells can also be transferred into a fully-integrated, self-powered device [32,33]. Supply voltages lower than 0.7 volts result in significant inaccuracies in memristor readings due to the high threshold voltages of the thick-oxide transistors in our process. Employing a process with a lower threshold voltage thick-oxide transistor option could enable operation at lower supply voltages, broadening compatibility with various solar and non-solar energy harvesters. Some very low-voltage harvesters (e.g., thermoelectrics) may still require the voltage to be raised, which can be accomplished on-chip using switched capacitor circuits like Dickson charge pumps [34]. Self-powered AI at the edge, therefore, offers multiple opportunities to enable the development of intelligent sensors for health, safety, and environmental monitoring.

Figure 1. Overview of the fabricated memristor-based binarized neural network. a Optical microscopy image of the fabricated die, showing four memory modules and their associated digital circuitry and power management unit. b Detail of one of the memory modules. c Cross-sectional scanning electron micrograph of a hybrid CMOS/memristor circuit, showing a memristor between metal levels four and five. d Schematic of a memory module. For each operation mode, biasing conditions for WL, BL, and SL are given with respect to the power domain (VDDC, VDDR) and VDD. e Schematic of the level shifter, used in d for shifting digital voltage input to medium voltages needed during programming operations or nominal voltage during reading operations of the memristors. f Schematic of the differential pre-charge sense amplifier (PCSA), used to read the binary memristor states, with embedded XNOR function, to compose an XPCSA: it computes an XNOR operation between input activation X and weight (memristor value) during bit-cell sensing.

Figure 2. Measurements of the memristor-based binarized neural network, employing a lab-bench power supply. a Sample measurement of the output of the integrated circuit, compared with a delay-less register-transfer level (RTL) simulation. b Photograph of the printed circuit board used for the experiments. c Measurement of the energy consumption to perform a whole-chip inference, for various operating frequencies and supply voltages. d Pie chart comparing the different sources of energy consumption in the system, obtained using simulations (see Methods).

Figure 3. Accuracy of the memristor-based binarized neural network. a Measured schmoo plot, presenting mean accuracy of the output neuron activations, for different operation frequencies and supply voltages. The measurements were obtained using patterns of weights and inputs chosen to cover all possible neuron preactivations (see Methods). NF means non-functional. b Measurements of 64 neurons with preactivations -5, -1, 0, 1, and 5, at 33 and 66 MHz with a power supply of 0.9 volts. Errors are marked in red. c,d Mean accuracy of the output neuron activations, as a function of neuron preactivation ∆ and supply voltage, measured at (c) 33 and (d) 66 MHz (see Methods).

Figure 4. Measurements of the binarized neural network powered by a miniature solar cell. a Schematic view of the AlGaAs/GaInP heterostructure solar cell. b Photograph of the solar cell, and its measured current-voltage characteristics under one-sun AM1.5 illumination provided by a standardized solar simulator (see Methods). c Current-voltage characteristics of the solar cell for various illuminations provided by the halogen lamp (see Methods). d Photograph of the experimental setup where the fabricated binarized neural network is powered by the solar cell illuminated by the halogen lamp. e Mean measured accuracy of the output neuron activations, with the binarized neural network powered by the solar cell, as a function of neuron preactivation ∆ and solar cell illumination.