Energy-efficient memcapacitor devices for neuromorphic computing

Data-intensive computing operations, such as training neural networks, are essential for applications in artificial intelligence but are energy intensive. One solution is to develop specialized hardware onto which neural networks can be directly mapped, and arrays of memristive devices can, for example, be trained to enable parallel multiply–accumulate operations. Here we show that memcapacitive devices that exploit the principle of charge shielding can offer a highly energy-efficient approach for implementing parallel multiply–accumulate operations. We fabricate a crossbar array of 156 microscale memcapacitor devices and use it to train a neural network that could distinguish the letters ‘M’, ‘P’ and ‘I’. Modelling these arrays suggests that this approach could offer an energy efficiency of 29,600 tera-operations per second per watt, while ensuring high precision (6–8 bits). Simulations also show that the devices could potentially be scaled down to a lateral size of around 45 nm.

Brain-inspired computing (often termed neuromorphic computing) based on artificial neural networks and their hardware implementations could be used to solve a broad range of computationally intensive tasks. Neuromorphic computing can be traced back to the 1980s (refs. 1,2), but the field gained considerable momentum after the development of memristive devices 3 and the proposal of convolutional layers in deep neural networks at the algorithmic level 4,5. Since then, several resistive neuromorphic systems and devices have been implemented using oxide materials 6–8, phase-change memory 9, spintronic devices 10,11 and ferroelectric devices (tunnel junctions 12,13 and ferroelectric field-effect transistors (FeFETs) 14,15). Some of these systems, namely ferroelectric tunnel junctions 13 and SONOS (that is, silicon-oxide-nitride-oxide-silicon) transistors 16, have exhibited energy efficiencies of up to 100 tera-operations per second per watt (TOPS W⁻¹). All these approaches rely on the analogue storage of synaptic weights, which can be used in multiplication operations, and use Kirchhoff's current law for the summation of currents implemented via crossbar arrays 17.
Memcapacitive devices 18 are similar to memristive devices but are based on a capacitive principle, and could potentially offer a lower static power consumption than memristive devices. There have been theoretical proposals for memcapacitor devices 18–22, but few practical implementations 23–26. Memcapacitor devices can be realized through the implementation of a variable plate distance concept, as demonstrated in micro-electromechanical systems 27, a metal-to-insulator transition material in series with a dielectric layer 22, changing the oxygen vacancy front in a classical memristor 20, and a simple metal-oxide-semiconductor capacitor with a memory effect 24,25. To obtain a high dynamic range, these devices either have a large parasitic resistive component 20 at small plate distances or limited lateral scalability due to large plate distances. Similar problems occur with memcapacitors having varying surface areas 23 or varying dielectric constants 26.
In this Article, we report memcapacitor devices based on charge shielding that can offer high dynamic range and low power operation. We fabricate devices on the scale of tens of micrometres and use them to create a crossbar array architecture that we use to run an image recognition algorithm. We also assess the potential scalability of our devices for use in large-scale energy-efficient neuromorphic systems using simulations.

Memcapacitive device based on charge shielding
Our memcapacitive device consists of a top gate electrode, a shielding layer with contacts and a back-side readout electrode (Fig. 1a). These layers are separated by dielectric layers. The top dielectric layer can have a memory effect, for example, charge trapping or ferroelectric, which may influence the shielding layer, or the shielding layer itself can exhibit a memory effect (in this paper, only the first principle is investigated). A very high on/off ratio of electric field coupling and therefore the capacitance between the gate electrode and readout electrode can be obtained with either total shielding or transmission. The lateral scalability is substantially better compared with the previously mentioned concepts, since the thickness of each layer can be readily optimized, while the dynamic ratio is mainly dependent on the shielding efficiency of the shielding layer.
Generally, charge screening depends on the Debye screening length L_D:
$$L_D = \sqrt{\frac{\varepsilon_0 \varepsilon_r U_T}{e\,n}} \tag{1}$$
where U_T is the thermal voltage, n is the charge carrier concentration, ε_0 is the vacuum permittivity (electric field constant), ε_r is the relative permittivity and e is the elementary charge. The electric field drops exponentially within the shielding layer and falls to 37% within the screening length L_D under the condition ψ ≪ U_T. In practice, in semiconductors, the relationship is highly nonlinear and depends on the potential ψ at depth x, as follows:
$$\frac{\partial^2 \psi}{\partial x^2} = -\frac{e}{\varepsilon_0 \varepsilon_r}\left[p_0\left(e^{-\psi/U_T} - 1\right) - n_0\left(e^{\psi/U_T} - 1\right)\right] \tag{2}$$
where p_0 and n_0 are the charge carrier concentrations of holes and electrons in thermal equilibrium, respectively. Therefore, the Debye screening length (equation (1)), which corresponds to an exponential spatial dependence of the field in the material, is only a linear approximation of the nonlinear differential equation (2). Especially for strong inversion and accumulation within the shielding layer, the length scales of screening become much smaller than the Debye length. This nonlinearity with respect to the applied gate voltage or the charge stored in the memory dielectric leads to either strong shielding or fairly good transmission. A more detailed device structure is shown in Fig. 1b with lateral p⁺nn⁺ junctions in the shielding layer. The p⁺- and n⁺-doped regions act as reservoirs for holes and electrons, respectively, and can inject each carrier type for the purposes of shielding. This enables additional device functionality; more importantly, it also allows a symmetric device response for positive and negative gate voltages. This is a crucial feature for neuromorphic devices, because the weight update is then undistorted and the training accuracy is thus higher 17. During readout, the shielding layer is connected to the ground (GND). During writing and training, the voltages applied to the p⁺ and n⁺ contacts can differ and can also act as a selector, as explained in Supplementary Section 1.
As shown in Fig. 1c, the single device can be arranged into a crossbar for highly parallel multiply-accumulate (MAC) operations. In this case, the gate electrode becomes the word line (WL), where input signals are applied, and the shielding layer becomes a shielding line (SL) running perpendicular to the WL. The readout electrode functions as the bit line (BL), which is parallel to the SL, and the charge accumulated on one BL is the result of the accumulated multiplications at each crossing point. The multiplication is conducted between the input signal of the WL and the state of the shielding layer, which, in turn, is adjusted by the memory material. The weights are encoded in the capacitance of each crossing point. In contrast to resistive devices, capacitive devices only respond to dynamic voltage or current signals; therefore, an alternating current (a.c.) voltage is applied to the WL during readout. Writing of the memory material is achieved by a voltage difference between the SL and WL.
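The crossbar readout described above can be illustrated numerically. The following is a minimal sketch, assuming an ideal array without parasitic elements in which the charge collected on each BL is the sum of the coupling capacitances in that column weighted by the a.c. amplitude on each WL. The function name and all capacitance values are hypothetical and chosen only for illustration.

```python
def bitline_charges(capacitances, wl_amplitudes):
    """Accumulated charge per bit line: Q_j = sum_i C[i][j] * V[i].

    capacitances: list of rows (one row per word line), in farads.
    wl_amplitudes: a.c. voltage amplitude on each word line, in volts.
    Idealized (assumption): no parasitic capacitance, no line resistance.
    """
    n_bl = len(capacitances[0])
    charges = [0.0] * n_bl
    for row, v in zip(capacitances, wl_amplitudes):
        for j, c in enumerate(row):
            charges[j] += c * v
    return charges


# Hypothetical 3 x 2 array: "on" cells transmit, "off" cells are shielded.
C_ON, C_OFF = 1.0e-15, 1.0e-18  # illustrative values, not measured data
C = [[C_ON, C_OFF],
     [C_OFF, C_ON],
     [C_ON, C_ON]]
V = [0.5, 0.5, 0.0]  # third word line carries no input signal
Q = bitline_charges(C, V)
```

Each entry of `Q` is one MAC result; a shielded cell contributes almost nothing to its column regardless of the input amplitude.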

CV curves and gradual programming of single devices
Single devices on the micrometre scale were fabricated on a silicon-on-insulator wafer, in which the handle wafer containing a highly n-doped epitaxial layer acts as the readout electrode and the buried oxide acts as the bottom dielectric layer. As a memory principle, ferroelectric-assisted charge trapping (polarization charge attracts carriers and thus promotes trapping) was used to combine the advantages of both principles 28,29; the tunnelling oxide was 2.5 nm thick to avoid charge detrapping. Details of the fabrication can be found in Methods.
Fig. 1 | Structure of the memcapacitor device. a, General device structure with a gate electrode, shielding layer (SL) and readout electrode (I, current; Q, charge). The electric field coupling is indicated by the blue arrow. b, Device structure with a lateral pin junction as well as electron and hole injection. c, Crossbar arrangement of the device in b, where a.c. input signals are applied to the word lines (WLs) and the accumulated charge is read out at the bit lines (BLs). During readout, the SL is mostly connected to GND.
The fabricated devices had a gate length ranging from 10 to 60 µm, and the gate width was enlarged by winding it around several highly p⁺- and n⁺-doped finger-shaped regions, thus forming several parallel pin junctions. The larger area leads to a readily detectable capacitance, and the minimum capacitance of turned-off devices could also be precisely measured (capacitive dynamic range). Figure 2a shows a microscopic image of the fabricated device. Capacitance-voltage (CV) measurements were carried out by applying an a.c. signal with a direct current (d.c.) bias (sweep) to the gate; the resulting a.c. current of the readout electrode was measured either by lock-in amplification or by an oscilloscope and current pre-amplifier. The resulting fundamental CV curves for different d.c. voltages (V_AK) on the n⁺ and p⁺ regions are shown in Fig. 2b (note that a normal silicon dioxide dielectric layer was used here instead of a memory dielectric). The CV curves get broader or are nearly extinguished depending on whether the pin junction is operated in reverse or forward bias, respectively; this behaviour is further explained in Supplementary Section 1. Generally, a capacitive coupling window is observed, which is high for depletion (and therefore for transmission through the shielding layer) and low during inversion or accumulation. The curves are derivatives of a sigmoid curve, which plays an important role in modelling neurons in artificial neural networks. A direct measurement of the sigmoid curve and further uses are explained in Supplementary Section 1.
Replacing the normal silicon dioxide dielectric with a memory dielectric and performing a CV sweep from −5 to 5 V with the pin junction grounded, one can observe a shift of the capacitive coupling window with a memory window of 2.7 V (Fig. 2d). From the shifting direction, one can conclude that charge trapping is the memory principle (for purely ferroelectric switching, the curves would shift in the opposite direction). In contrast to resistive devices, capacitive devices can only be read out by a.c. voltage or current signals. For this reason, an alternating voltage (0.5 V) is applied to the gate for readout, together with a bias voltage (1.0 V) to adjust the readout window, as indicated by the shaded area in Fig. 2d (note that the pin junction is grounded during readout). In Supplementary Fig. 11a,b, the readout current of a written and erased cell is shown, and a capacitive dynamic range of ~1:1,478 was experimentally achieved.
Fig. 2 (caption fragment) | In d-f, the shielding layer was grounded, and readout was performed between each pulse with an a.c. signal, as shown in c. g, Pulse number modulation for different write pulse heights.
To store analogue values, one can apply short pulses with the same amplitude (Fig. 2d,g), apply pulses with increasing height (Fig. 2e) or change the pulse length (Fig. 2f) applied to the gate. The resulting curves exhibit some similarities to those obtained from pure ferroelectric switching 14, indicating the ferroelectric assistance in the memory storage process. The curve in Fig. 2d shows a typical nonlinear long-term potentiation (LTP) curve with an exponential dependence on the number of programming pulses; the same applies for long-term depression (LTD) as a function of the number of erase pulses. In the corresponding expressions, N_pgr and N_er denote the number of programming and erase pulses, respectively; β_pgr and β_er are the stretching factors; and C_min and C_max denote the minimum and maximum capacitance, respectively. Here ΔC describes the maximum change in capacitance.
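The exponential LTP and LTD behaviour can be made concrete with a small model. The sketch below assumes a common saturating-exponential form, C_LTP = C_min + ΔC·(1 − exp(−β_pgr·N_pgr)) and C_LTD = C_max − ΔC·(1 − exp(−β_er·N_er)) with ΔC = C_max − C_min. This form is an assumption chosen to be consistent with the symbols defined above and is not necessarily the exact expression used by the authors; the numerical values are hypothetical.

```python
import math


def c_ltp(n_pgr, c_min, c_max, beta_pgr):
    """Capacitance after n_pgr programming pulses (assumed exponential form)."""
    delta_c = c_max - c_min  # maximum change in capacitance
    return c_min + delta_c * (1.0 - math.exp(-beta_pgr * n_pgr))


def c_ltd(n_er, c_min, c_max, beta_er):
    """Capacitance after n_er erase pulses (assumed exponential form)."""
    delta_c = c_max - c_min
    return c_max - delta_c * (1.0 - math.exp(-beta_er * n_er))


# Hypothetical normalized capacitances and stretching factor.
ltp_curve = [c_ltp(n, 1.0, 10.0, 0.1) for n in range(0, 51, 10)]
ltd_curve = [c_ltd(n, 1.0, 10.0, 0.1) for n in range(0, 51, 10)]
```

A larger stretching factor makes the curve saturate after fewer pulses, which corresponds to the steeper curves seen for higher write pulse amplitudes.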
Changing the write pulse height in pulse number modulation leads to more flattened or steepened curves (Fig. 2g). Write/erase pulse height modulation (Fig. 2e) can lead to relatively symmetric and, in certain regions, linear behaviour with respect to the pulse height steps. This is highly beneficial for implementing neuromorphic algorithms 17. Pulse length modulation shows similar behaviour to pulse number modulation (Fig. 2f). In Supplementary Fig. 11c, the measured readout current is illustrated for LTP and LTD for different pulse numbers of pulse height modulation (Fig. 2e) and reveals the pinch-off and increase.
Other memory parameters, such as device-to-device variation, endurance and retention, can be found in Supplementary Section 9.

Crossbar array and implementation of training algorithm
Crossbar devices, used to execute an image recognition algorithm, were fabricated and wire bonded onto a chip carrier. A printed circuit board (PCB) was designed and controlled by a data acquisition system. An image of the fabricated chip with the bonding pads, a zoomed-in microscopy image of the crossbar and a scanning electron microscopy image are shown in Fig. 3a. Each memory cell had a size of 50 × 50 µm².
A schematic of the device cross section is shown in Fig. 3b. The BLs of the memory array were separated by refilled deep trenches. Details of the fabrication process can be found in Methods.
The matrix comprised 26 WLs and 6 BLs (Fig. 3c). A differential weight topology 17 was used with the positive and negative value of each weight separated in two memory cells. The values of these two BLs were subtracted from each other.
The sign of the input values is encoded by a 180° phase shift. For the desired 'four-quadrant multiplication' (input × weight), a global clock signal is used together with the switched capacitor approach (Fig. 3c). Further details are explained in Supplementary Section 11. The integration capacitance of the amplifier is charged up in each period of the input sine signal, and hence, the number of periods (N_per) encodes the value of the input signal. This also leads to an averaging of the noise level and an improvement in the signal-to-noise ratio, as explained later. This theoretical concept of 'four-quadrant multiplication' was confirmed with the following measurement (Fig. 3d): the input number of periods (N_per) and the number of programming pulses (N_pgr), which adjust the actual weight, were varied over positive and negative values, while the output voltage was read. Positive and negative N_per values were encoded by a 180° phase shift, and positive/negative programming pulses (N_pgr) only changed the positive/negative weights, while the counterpart was in an erased state. Supplementary Fig. 12a,b shows the cross sections of the 3D plot in Fig. 3d. The curves along the input period number behave in a highly linear manner, and this linearity was also confirmed for the accumulation operation (Supplementary Fig. 12c), demonstrating a highly linear MAC operation with the proposed switched capacitor approach.
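The four-quadrant multiplication can be sketched as follows. This is an idealized illustration only: it assumes that each input period transfers a charge V_ac·(C_pos − C_neg) onto the integration capacitor, that a negative input simply inverts the sign of that charge (the 180° phase shift) and that the amplifier is ideal. The function name, the integration capacitance and the cell capacitances are hypothetical; the 0.5 V amplitude follows the readout amplitude quoted for single devices.

```python
def four_quadrant_output(n_per, c_pos, c_neg, v_ac=0.5, c_int=1.0e-12):
    """Idealized integrator output voltage after |n_per| input periods.

    n_per: signed number of input periods (sign = phase of the a.c. input).
    c_pos, c_neg: capacitances of the positive and negative weight cells (F).
    v_ac: a.c. amplitude in volts; c_int: integration capacitance in farads
    (c_int is a hypothetical value).
    """
    phase_sign = 1 if n_per >= 0 else -1  # negative input: 180 degree shift
    charge_per_period = phase_sign * v_ac * (c_pos - c_neg)
    return abs(n_per) * charge_per_period / c_int


# Positive weight: positive cell programmed, negative cell erased (and vice versa).
C_HIGH, C_LOW = 2.0e-15, 1.0e-15  # illustrative values
v_pp = four_quadrant_output(+10, C_HIGH, C_LOW)  # input > 0, weight > 0
v_np = four_quadrant_output(-10, C_HIGH, C_LOW)  # input < 0, weight > 0
v_pn = four_quadrant_output(+10, C_LOW, C_HIGH)  # input > 0, weight < 0
v_nn = four_quadrant_output(-10, C_LOW, C_HIGH)  # input < 0, weight < 0
```

In this idealization the output is exactly linear in the number of periods, which mirrors the linearity reported for the measured curves.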
The first 25 WLs enable a vectorized input feature map for images of 5 × 5 pixels; thus, one single fully connected layer is carried out. Dark pixels are represented by positive values and bright pixels, by negative values. The bias input is mapped to the 26th WL.
Regarding the implemented training algorithm, the Manhattan update rule 8,30 was chosen because of its simplified training procedure. In conventional backpropagation training, the weight update (equation (6)) is determined by the learning rate α, the backpropagated error δ_i(n) and the current input X_j(n) for the nth input image, which is randomly chosen from the training set. The weights are updated after each sample (stochastic training). The backpropagated error for a one-layer perceptron is calculated from the desired output value f_i^d(n) and the current output f_i(n). The function f_i is related to the voltage output v_i(n) of the ith sense amplifier through the activation function of the neuron (in this case, tanh) with steepness factor κ. With the Manhattan update rule, the weight update from equation (6) is coarse-grained by retaining only its sign.
Therefore, all the weights are updated by the same amount based on their sign. Figure 4a illustrates the pulse scheme for implementing the algorithm. The term δ_i(n) X_j(n) in equation (6) is positive if the error δ_i(n) and the input X_j(n) have the same sign (both positive or both negative) and negative if their signs differ. Hence, one can describe this by an XNOR combination. To update the weights, the error signal is applied to the SL, as shown in Fig. 4a. The corresponding input signals are applied to the WL. The differential signal at the crossing points follows the XNOR operation, while the specific signals (shown in Fig. 4a) ensure that the maximum disturbance level is not higher than 1/3 and thus effectively prevent the overwriting of cells in the same column or row (the memory cell acts as the selector itself; see Supplementary Sections 7 and 8). For the 5 × 5 image recognition task, the letters M, P and I were chosen, and one pixel in each of the samples was flipped, which results in a total set of 78 samples. These pseudo-images were separated into a test and training set; the test images are indicated by a blue frame (Fig. 4b).
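The sign-based update can be sketched in software. The example below is a minimal illustration, assuming the error is defined as the desired output minus the tanh output and that every weight moves by one fixed step in the direction of sign(δ_i·X_j). The function name, the step size and the error definition are assumptions made for this sketch and are not taken from the authors' implementation.

```python
import math


def manhattan_step(weights, x, target, kappa=1.0, step=1):
    """One stochastic Manhattan update of a one-layer perceptron (in place).

    weights: list of rows, one row per output neuron.
    x: input vector with signed entries (bias input appended by the caller).
    target: desired output per neuron.
    Each weight changes by +step or -step according to sign(delta_i * x_j);
    a zero product leaves the weight unchanged.
    """
    for i, row in enumerate(weights):
        v = sum(w * xj for w, xj in zip(row, x))
        f = math.tanh(kappa * v)   # neuron activation
        delta = target[i] - f      # assumed error definition
        for j, xj in enumerate(x):
            prod = delta * xj
            if prod > 0:           # same sign: XNOR is true
                row[j] += step
            elif prod < 0:         # opposite signs: XNOR is false
                row[j] -= step
    return weights


w = [[0, 0, 0]]
manhattan_step(w, [1, -1, 1], [1.0])
```

Starting from zero weights with a positive error, the weights move to +1 where the input is positive and to −1 where it is negative, which is the XNOR behaviour described above.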
The resulting misclassified images versus training epochs for the training and test images are shown in Fig. 4c. Evidently, the number rapidly decreases after one training epoch and stays almost zero throughout the training epochs. Figure 4d shows the obtained mean neuron activations for the three classifications over the training epochs. The slightly higher simulated average misclassification rate (Fig. 4c) is the consequence of single steep climbs of the misclassification rate after an arbitrary number of epochs with 100% accuracy in some runs. Misclassifications after epoch 1 are caused by the very similar expected value for individual presynaptic neurons for letters M and P. Measurements also confirm the more stable results for the classification of letter I, as shown in Fig. 4d. The results are in accordance with other studies 7,8 .
Thus, experimental results on micrometre-sized devices demonstrate the working principle. For demonstrating scalability to the nanometre regime and superior energy efficiency, detailed and extensive simulations were performed, which are explained in the upcoming sections.

TCAD simulations on single devices
A device with 90 nm gate length (Fig. 5a) was simulated using Synopsys TCAD. Figure 5b (where no memory dielectric was integrated for the first simulations) shows the CV curves of the coupling capacitance between the gate and readout electrode with respect to the applied gate voltage (V_G), which are consistent with the observed experimental behaviour (Fig. 2b).
The ratio between the maximum capacitance and the lower-state capacitance obtained by shifting the gate voltage by 3 V is 1:90 in this device, and this ratio can be further enlarged by using thinner gate oxides or larger gate lengths, as shown in Fig. 5c. In general, the capacitive ratio decreases with a smaller gate length because the influence of the space charge region becomes more pronounced for smaller gate lengths (short channel effect) and sufficient shielding is hard to achieve in this region (Fig. 5c, inset). By using high-κ dielectrics for the top and bottom oxides, a ratio of 1:60 was obtained for a 45 nm device with the same capacitance as the 90 nm device, as shown in Supplementary Section 2.
Including a memory window (~3 V for charge-trapping memories and ~1-2 V for ferroelectric memories depending on the thickness and coercive field) leads to shifted CV curves (Fig. 5d). The a.c. readout voltage is indicated in Fig. 5d; for the positive shifted curve, the resulting readout current and therefore the accumulated charge will be very large. The total readout charge over one-half period of the applied sinusoidal signal versus memory shift is shown in Fig. 5e. Most of the negative memory window is used for turning off the device.
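How a shift of the CV curve changes the accumulated readout charge can be illustrated with a toy model. The sketch below assumes that the coupling capacitance follows the derivative of a sigmoid (as noted for the measured CV curves) and that the charge collected over one-half period equals the integral of C(V) across the readout swing from bias − amplitude to bias + amplitude. All parameter values (curve width, centre, capacitance levels in arbitrary units) are hypothetical; the 1:90 ratio, 0.5 V amplitude and 1.0 V bias are borrowed from the text purely as illustrative settings.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def accumulated_charge(v_shift, v_bias=1.0, v_ac=0.5,
                       c_min=1.0, c_max=90.0, v0=0.0, width=0.3):
    """Charge (arbitrary units) from integrating C(V) over the readout swing.

    Assumed curve: C(V) = c_min + (c_max - c_min) * 4 * s(u) * (1 - s(u)),
    with u = (V - v0 - v_shift) / width, so that C peaks at c_max.
    The integral has a closed form because d s(u)/du = s(u) * (1 - s(u)).
    """
    lo, hi = v_bias - v_ac, v_bias + v_ac
    u_lo = (lo - v0 - v_shift) / width
    u_hi = (hi - v0 - v_shift) / width
    window = 4.0 * width * (sigmoid(u_hi) - sigmoid(u_lo))
    return c_min * (hi - lo) + (c_max - c_min) * window


q_on = accumulated_charge(1.0)    # coupling window centred on the readout swing
q_off = accumulated_charge(-2.0)  # coupling window shifted away from the swing
```

In this toy model a positive shift that centres the coupling window on the readout swing yields a large charge, whereas a negative shift leaves only the residual contribution of the minimum capacitance.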

Scalability to 45 nm
With regard to lateral scalability, it is necessary to distinguish three aspects: (1) the scalability of the memory technology in the top dielectric itself with regard to how many levels can be stored; (2) the sensitivity of the sense amplifier at the end of each BL for detecting the accumulated charge; (3) the noise level of one single device during readout. Fairly common resolutions for input, weight and output signals for neural networks are in the range of 4–8 bits (16–256 levels) 31. This analogue-like resolution has a significant influence on scalability. Typically, lower precision is needed for inference tasks.
Fig. 3 (caption fragment) | ... Fig. 2, and the number of periods encodes the amount of input. The clock signal is high for a rising edge in the positive signal and the switches are in the left position during a high clock signal. The SL is connected to GND during readout. d, Measured 'four-quadrant multiplication' for different input period numbers N_per and programming pulse numbers (pulse number modulation) N_pgr. For negative N_per, the input signal is 180° phase shifted, and for positive N_pgr, a positive BL is programmed; a negative BL is kept in an erased state (vice versa for negative N_pgr).
With respect to the memory material, one can generally conclude that charge-trapping memories (for example, SONOS) have shown up to 31 levels down to 40 nm (ref. 16 ). The disadvantage of this memory technology is the relatively high write energy and slowness during writing (millisecond regime). However, SONOS might be an alternative for inference-only applications. On the other hand, hafnium oxide (a ferroelectric) has very low write energies and is fast (nanosecond to microsecond regime). Ongoing research is still underway on the scalability of ferroelectric memories with regard to analogue storage. From FeFETs, it is known that they tend to show abrupt switching events below 500 nm, which is attributed to the limited grain size 15 .
Regarding capacitive measurement resolutions, some work has been done in the context of DNA sensing and chip interconnect measurements with resolutions below 10 aF (charge-based capacitive measurements, capacitance-to-frequency conversion and lock-in detection) 32–36. These circuits are similar to a conventional sense amplifier 37,38 and contain an integration capacitor that is charged either by an operational amplifier circuit or a current mirror. Details on the sensitivity calculation can be found in Supplementary Section 3; generally, however, one has to consider that in neuromorphic devices, the accumulated charge from many memory cells (several hundreds to thousands) is read out at once and used for further information processing, which gives rise to much larger charges compared with only one cell. Furthermore, several pulses or periods are used for encoding the input value, which leads to step-wise charge integration over many periods. For the device shown in Fig. 5, N_per = 142 periods are necessary, which fits well into a range of 7–8 bits of the input signal (Supplementary Section 3). Note that 128 periods are sufficient for an 8-bit signed integer due to the use of the 180° phase shift for negative values in the switched capacitor approach.
Regarding the noise level of capacitive devices, one has to consider kTC noise:
$$v_n = \sqrt{\frac{k_B T}{C}} \tag{10}$$
where k_B is the Boltzmann constant, T the temperature and C the capacitance. For a 6.65 aF device (Fig. 5d), one obtains a noise voltage of about 25 mV (at room temperature), which is 14 times lower than the effective readout value of 0.35 V. However, one has to consider that the noise level decreases with the number of repetitive measurements as 1/√N_per, which for N_per = 142 results in a noise level of about 2.1 mV (at room temperature), or roughly 167 times lower than the effective readout value; this defines a precision of ~7 bits. Based on this minimum amplitude necessary to distinguish between different levels, it also becomes possible to assess the theoretical energy efficiency of resistive and capacitive devices in general (Supplementary Section 4): capacitive devices are at least eight times more energy efficient than resistive devices.
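The noise estimate can be checked with a few lines of arithmetic. The sketch below assumes the standard kTC expression, a room temperature of 300 K (the exact temperature is an assumption) and simple 1/√N averaging over the 142 readout periods; the variable and function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def ktc_noise_voltage(capacitance_f, temperature_k=300.0):
    """RMS kTC noise voltage sqrt(k_B * T / C) in volts."""
    return math.sqrt(K_B * temperature_k / capacitance_f)


v_noise = ktc_noise_voltage(6.65e-18)      # single readout of a 6.65 aF device
v_averaged = v_noise / math.sqrt(142)      # averaged over N_per = 142 periods
snr = 0.35 / v_averaged                    # relative to the 0.35 V readout value
precision_bits = math.log2(snr)            # number of distinguishable levels, in bits
```

With these inputs the single-shot noise is close to 25 mV, and the averaged noise is on the order of 2 mV, which corresponds to a precision between 7 and 8 bits.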

Simulation of ultrahigh energy efficiency
Much of the energy sourced to memcapacitors can be recovered since it is stored in the capacitor; this is an important difference from resistors, in which the readout operation is inherently dissipative due to Joule heating. The energy fed in during charging can, in principle, be recovered during discharging. This concept of energy recovery is also present in adiabatic circuit designs 39,40, which are at the core of the reversible computing paradigm 41,42. The limiting factors of energy recovery in adiabatic circuits are resistive losses in the circuit, as well as in the inductances used for the power clock generators. The inductances have limited quality factors (q factors) of the order of dozens to hundreds. In common adiabatic realizations, the energy recovery of the supply clock generators is of the order of 95% for harmonic signals 43–45, which means that the supplied active power is q = 20 times lower than the reactive power. To estimate the time delay, areal efficiency and energy efficiency (Table 1) of a realistic crossbar arrangement (including parasitic elements), a SPICE model (Supplementary Fig. 4a) for the 90 nm device was developed (Supplementary Section 5). One can conclude that extremely fast readout transitions can suppress shielding in the SL, since charge can no longer be supplied quickly enough (silicide lines are a critical resistive path). In the table, the energetically worst-case scenario was assumed: all the WLs are activated at once and all the weights are zero with a resulting shielding effect, which, in turn, leads to charging of the top gate oxide. Table 1 summarizes the minimum period of time for different matrix sizes, which is proportional to the RC delay, with R being the resistance and C the capacitance. The areal efficiency A_η in TOPS mm⁻² can be derived from the memory footprint (2 × 8 F²), assuming differential weights, and the earlier mentioned time delay.
The active (W_p) and reactive (W_r) energies per cell for 142 periods are also summarized in Table 1. With this estimate in mind, we can conclude a minimum energy efficiency η_rec of 3,452.6 TOPS W⁻¹ in the worst-case scenario for 0% input signal sparsity and 100% weight sparsity and an energy recovery of 95% (Supplementary Section 5). Without any charge recovery, the energy efficiency η would amount to 198.5 TOPS W⁻¹. In a realistic neural network scenario, for example, a one-layer perceptron trained on the Modified National Institute of Standards and Technology (MNIST) database, the energy efficiency is 29,600 TOPS W⁻¹ including charge recovery (Supplementary Section 6). Without recovery, the efficiency amounts to 1,702 TOPS W⁻¹ for MNIST.
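The relation between energy per operation, energy recovery and TOPS W⁻¹ can be sketched as follows. The example assumes that one cell readout counts as one operation and that the dissipated energy is the active energy plus the unrecovered fraction of the reactive energy. The reactive energy of 5 fJ per cell is the Table 1 value quoted in the text; the active energy of 0.04 fJ is a hypothetical value chosen only so that the example lands near the reported efficiencies, and the function names are illustrative.

```python
def tops_per_watt_to_fj_per_op(tops_per_watt):
    """1 TOPS/W equals 1e12 operations per joule, i.e. 1,000 fJ per operation."""
    return 1.0e3 / tops_per_watt


def efficiency_tops_per_watt(w_r_fj, w_p_fj, recovery):
    """Efficiency for reactive energy w_r_fj, active energy w_p_fj (fJ per op)
    and a recovered fraction `recovery` of the reactive energy (assumed model)."""
    dissipated_fj = w_p_fj + (1.0 - recovery) * w_r_fj
    return 1.0e3 / dissipated_fj


W_R_FJ = 5.0    # reactive energy per cell quoted from Table 1 in the text
W_P_FJ = 0.04   # hypothetical active energy per cell (illustrative only)

eta_no_recovery = efficiency_tops_per_watt(W_R_FJ, W_P_FJ, recovery=0.0)
eta_recovery = efficiency_tops_per_watt(W_R_FJ, W_P_FJ, recovery=0.95)
brain_fj = tops_per_watt_to_fj_per_op(100.0)  # 100 TOPS/W expressed in fJ per op
```

Under these assumptions the efficiency without recovery is close to 200 TOPS W⁻¹ and the efficiency with 95% recovery is roughly 17 times higher, similar to the ratio between the two reported worst-case values.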

Comparison of simulation and experimental results
To verify the functionality of the simulator, we performed simulations of the device with 60 µm gate length (Fig. 2). As shown in Fig. 5f, experimental data from Fig. 2d match well with the simulated data.
As shown in Supplementary Fig. 14, we measured the gate charging current together with the applied readout a.c. voltage for the single device (Fig. 2), and a perfect 90° phase shift is visible. From the curves, we can calculate the reactive energy consumption per period, W_r (using equations 31-33, Supplementary Section 5), and obtain W_r = 3.22 nJ per period. Furthermore, for 142 periods, as in the simulation, we obtain the total reactive energy for one MAC operation, namely, W_r,tot = 457 nJ per cell. If we scale this value down by seven orders of magnitude, we obtain W_r,scaled = 45.7 fJ per cell (the capacitance shown in Fig. 2d is seven orders of magnitude higher than the capacitance of the simulated 90 nm device shown in Fig. 5b).
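The scaling arithmetic above can be reproduced directly. The following sketch only restates the numbers given in the text (3.22 nJ per period, 142 periods, a factor of 10⁷ between the capacitances); the function name is illustrative.

```python
def scaled_reactive_energy_fj(w_r_per_period_nj, n_per, capacitance_ratio):
    """Total reactive energy per MAC, scaled down by the capacitance ratio.

    Returns (total energy in nJ for the measured device,
             scaled energy in fJ for the smaller device).
    """
    total_nj = w_r_per_period_nj * n_per
    scaled_fj = total_nj * 1.0e6 / capacitance_ratio  # 1 nJ = 1e6 fJ
    return total_nj, scaled_fj


w_r_tot_nj, w_r_scaled_fj = scaled_reactive_energy_fj(3.22, 142, 1.0e7)
```

The result is about 457 nJ per cell for the measured device and about 45.7 fJ per cell after scaling.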
This value is approximately ten times higher than the value shown in Table 1 (5 fJ per cell). One has to consider that the buried oxide of the experimental devices is much thicker (190 nm) than that in the 90 nm device simulation (15 nm), leading to a 12.7 times lower readout capacitance per area at approximately the same gate oxide capacitance per area. Also considering the different device silicon thicknesses, one obtains a corrected reactive energy of W_r,scaled,corr = 5.84 fJ per cell, which is very close to the value shown in Table 1. Other phenomena that become relevant during scaling, such as short channel effects (Fig. 5c), quantum confinement and band-to-band tunnelling, are explained in Supplementary Section 10.
Fig. 5 (caption fragment) | ... voltages V_AK along the p⁺nn⁺ diode (quasi-static simulation). The voltage V_AK was applied antisymmetrically, as in Fig. 2. c, Capacitive dynamic ratio (maximum capacitance/minimum capacitance of the CV curves with p⁺nn⁺ connected to GND) for different gate lengths and gate oxide thicknesses. The inset shows the electron density, and the short channel effect becomes obvious. EOT, equivalent oxide thickness. d, Shifting of the CV curves for V_AK = 0 V for different memory charges in the gate oxide. Note the applied readout a.c. signal with bias. e, Accumulated charge (Q_acc) for different voltage shifts (V_shift; caused by memory charges) over one-half period of the a.c. signal in d. f, Comparison of the simulated and experimental capacitive coupling curves for the micrometre-scaled device shown in Fig. 2.

Conclusions
We have reported a memcapacitive device with the potential to deliver high tera-operations per second per watt when scaled. By using a shielding layer between two electrodes, we can achieve high dynamic ratios of ~1,480 for microscale devices and ~90 for simulated 90-nm-sized devices. Furthermore, a 5 × 5 image recognition task was implemented using an experimental crossbar array with 156 memory cells.
Circuit-level simulations and noise-level calculations show that our memcapacitive devices can potentially offer superior energy efficiency compared with conventional resistive devices. Using adiabatic charging, most of the charging energy of the capacitors can be recovered. This allows a combination of reversible computing and neuromorphic computing. The energy efficiency of the human brain is estimated to be in the range of ~10 fJ per operation (ref. 46) (or 100 TOPS W⁻¹), which is similar to current memristive-device-based approaches 13,16. Our approach could potentially offer an energy efficiency of 1,000–10,000 TOPS W⁻¹. The technology is also compatible with complementary metal-oxide-semiconductor technology and could be fabricated using state-of-the-art processes.

Methods
The technology computer-aided design (TCAD) simulations were performed with Synopsys and SPICE-level simulations were performed with LTspice. In the TCAD simulations, the drift-diffusion equations (electron and hole continuity equations and the Poisson equation) were included. Furthermore, Shockley-Read-Hall recombination and electric-field-, temperature- and dopant-dependent mobility models were included. The influence of quantum confinement and band-to-band tunnelling was investigated in Supplementary Section 10.
The devices were fabricated using a silicon-on-insulator wafer with an n⁺-handle, 3.5-µm-thick epitaxial layer; a 190-nm-thick buried oxide layer; and an 88-nm-thick device layer. First, alignment marks were etched into the device layer, followed by boron- and phosphorus-ion implantation and subsequent activation annealing. The interface oxide was chemically grown by Standard Clean 1 solution and O2 oxidation at 750 °C. The Hf0.5Zr0.5O2 deposition with a TiN capping layer was carried out by atomic layer deposition, followed by annealing at 600 °C. The Hf0.5Zr0.5O2 was patterned for contact holes and the first aluminium metallization was deposited by sputtering. The SLs were etched by ion beam sputtering and the BLs were separated by the reactive-ion etching of 7-µm-deep trenches. The trenches were refilled with SU-8 resist and the second metallization layer (WLs) was insulated from the first metallization layer by another patterned SU-8 layer.
Measurements were carried out with a function generator (Agilent 33500B), a lock-in amplifier (Stanford Research Systems SR830) and a current pre-amplifier (Stanford Research Systems SR570). A DSO5052A oscilloscope was used for visualizing the measured currents.
The PCB for the neuromorphic chip was designed using EAGLE and manufactured by Eurocircuits GmbH. A data acquisition system (USB-6363, National Instruments) was used for controlling the PCB. The measurement routines were written in LabVIEW. Python was used for simulating the Manhattan algorithm and Keras for MNIST simulation.

Data availability
The data that support the findings of this study are available from the corresponding authors upon reasonable request.

Code availability
The code that supports the findings of this study is available from the corresponding authors upon reasonable request.
Table 1 | Necessary time period T_per and resulting areal efficiency A_η for different matrix sizes. The reactive energy during the readout of arrays, W_r, and active energy, W_p, are obtained from simulations (Supplementary Section 5). The energy is presented per cell and for 142 periods. From this number and assuming a 95% energy recovery of the power source, energy efficiency η_rec (in TOPS W⁻¹) can be calculated for the energetically worst-case scenario (erased state). The same applies for energy efficiency η without recovery. *All cells are erased (worst-case scenario), 95% energy efficiency of power clock source.