Abstract
Biological neural networks include not only long-term memory and weight multiplication capabilities, as commonly assumed in artificial neural networks, but also more complex functions such as short-term memory, short-term plasticity, and metaplasticity, all collocated within each synapse. Here, we demonstrate memristive nanodevices based on SrTiO_{3} that inherently emulate all these synaptic functions. These memristors operate in a non-filamentary, low-conductance regime, which enables stable and energy-efficient operation. They can act as multifunctional hardware synapses in a class of bio-inspired deep neural networks (DNN) that make use of both long- and short-term synaptic dynamics and are capable of meta-learning or learning-to-learn. The resulting bio-inspired DNN is then trained to play the video game Atari Pong, a complex reinforcement learning task in a dynamic environment. Our analysis shows that the energy consumption of the DNN with multifunctional memristive synapses decreases by about two orders of magnitude compared to a pure GPU implementation. Based on this finding, we infer that memristive devices with a better emulation of the synaptic functionalities not only broaden the applicability of neuromorphic computing, but could also improve the performance and energy costs of certain artificial intelligence applications.
Introduction
Biological neural networks (BNNs) have inspired today’s most successful artificial neural networks (ANNs), which consist of neurons linked through connections known as synapses. Traditionally, each synapse in such a network serves three functions: (1) storage of long-term memories in its weight (W), (2) synaptic transmission, modeled as input-weight multiplication, and (3) long-term plasticity, the update of W during training.
However, these ANN synapses only capture a subset of the functionalities of biological ones. The latter follow complex biophysical dynamics and learning rules such as Hebbian plasticity^{1} and short-term plasticity^{2,3} (Fig. 1a). Additionally, higher-order plasticity rules exist that do not directly determine the synaptic weight, but rather the properties of the plasticity rule itself. One example is the control over the decay timescale of the short-term plasticity rule, which can range from milliseconds to minutes, depending on the neuronal activation^{2,4}. These rules, known as metaplasticity^{5,6}, play a crucial role in demanding tasks that require not only learning but also learning-to-learn, i.e., meta-learning^{7,8,9,10,11,12}.
The complexity of biophysical mechanisms in synapses and the corresponding plasticity rules are essential for nervous system function (e.g., refs. ^{13,14,15}), but are missing in conventional ANNs. This limited biological realism might partly explain why artificial intelligence (AI) systems often underperform humans and animals in various respects, such as motor skills and adaptability to dynamic environments^{16}. Moreover, today’s ANNs consume vast amounts of energy due to the large network sizes required for complex tasks^{17}. For instance, training the large language model GPT-3 consumed 1.287 GWh of electrical energy^{18}, enough to power over 100 households for a year.
To address these issues, a more bio-inspired model for synapses was developed, incorporating short-term and Hebbian plasticity, as well as metaplasticity^{19}. Specifically, this model, known as the ST-Hebb synapse, not only performs the three functions of traditional ANN synapses mentioned above but also takes on additional roles (Fig. 1b): (4) storage of short-term memories (F) that decay over time, (5) short-term plasticity, the update of F (ΔF) during training and inference, and (6) metaplasticity, the control over the decay time. To incorporate ST-Hebb synapses into a deep neural network (DNN), the short-term plasticity neuron (STPN) model has been proposed (Fig. 1c), combining a conventional neuron model with ST-Hebb synapses^{20}. This model utilizes all six synaptic functions (1) to (6), incorporates meta-learning, can be integrated into multilayer networks, and outperforms more conventional ANNs with less biologically realistic synapses in various challenging tasks.
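The interplay of the six synaptic functions can be made concrete with a minimal sketch. The toy class below is illustrative only (not the authors’ implementation): it assumes a simple multiplicative decay of the short-term weight F by a factor lam, and all names and values are hypothetical.

```python
class STHebbSynapse:
    """Toy ST-Hebb synapse: long-term weight W, decaying short-term
    weight F, and a meta-plastic decay factor lam in [0, 1]."""

    def __init__(self, W=0.5, lam=0.5):
        self.W = W        # (1) long-term memory
        self.F = 0.0      # (4) short-term memory
        self.lam = lam    # (6) metaplasticity: decay factor

    def transmit(self, x):
        # (2) synaptic transmission: input times effective weight W + F
        return (self.W + self.F) * x

    def short_term_update(self, dF):
        # (5) short-term plasticity: F first decays by lam, then updates
        self.F = self.lam * self.F + dF

    def long_term_update(self, dW):
        # (3) long-term plasticity (applied during training)
        self.W += dW

syn = STHebbSynapse(W=0.5, lam=0.5)
syn.short_term_update(1.0)   # F becomes 1.0
out1 = syn.transmit(2.0)     # (0.5 + 1.0) * 2.0 = 3.0
syn.short_term_update(0.0)   # F decays to 0.5
out2 = syn.transmit(2.0)     # (0.5 + 0.5) * 2.0 = 2.0
```

The second transmission is weaker than the first even though the input is identical, illustrating how the decaying short-term component makes the effective weight time-dependent.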
The hardware of choice for running such neural networks is parallel computing architectures like graphics processing units (GPUs). However, GPU-based implementations of multifunctional synapses suffer from the computational overhead caused by the aforementioned additional synaptic operations. This trend is exacerbated by the large number of synapses in state-of-the-art neural networks, ranging from 10^{6} to 10^{14}^{21}. On top of that, the operations governing ST-Hebb’s synaptic dynamics are memory-bound and are thus negatively affected by the well-known von Neumann bottleneck imposed by physically separated memory and processing units^{22}. These factors render the implementation of ST-Hebb synapses on GPUs inefficient, thus motivating the development of new hardware paradigms that are better suited to neural networks with multifunctional synapses.
Several promising neuromorphic architectures use memristors as hardware synapses because of their ability to collocate memory and computation in a single device, which circumvents the von Neumann bottleneck^{23}. Memristors are two-terminal devices that can change their conductance state upon electrical^{24,25} or optical^{26,27} stimuli, similar to the change of the synaptic coupling (weight) upon a neuronal spike in biological systems. A growing body of research suggests that the rich internal dynamics of memristors can be leveraged to mimic biophysical processes taking place in synapses and neurons^{28,29}.
There have been multiple demonstrations of bio-inspired hardware synapses realized using memristors with both long- and short-term dynamics^{30,31} that exhibit biological learning rules such as triplet spike-timing-dependent plasticity (triplet-STDP)^{32} or Bienenstock-Cooper-Munro (BCM)^{33}. However, these demonstrations rely on spike-timing plasticity rules and can therefore not be integrated into DNNs^{34}, which limits their applicability. Meanwhile, a single-layer neural network that makes use of bio-inspired, multifunctional synapses was recently demonstrated on memristive hardware^{35}. The authors showed the benefit of adding short-term synaptic plasticity during inference for a classification task in dynamically changing environments. Memtransistive devices were used as synapses. In addition to the two electrical contacts common to all memristors, they possess a gate analogous to transistors. To realize decaying traces, a voltage signal with the shape of the short-term decay was applied to this third (gate) contact. Short-term plasticity is therefore not an intrinsic property of these devices, i.e., the devices do not inherently exhibit short-term memory, but require an additional stimulus to do so. The need for three-terminal devices and precisely engineered voltage signals applied to each memtransistive synapse poses challenges for a large-scale implementation of such systems, because the required control circuitry and wiring would rapidly grow in complexity. Therefore, the introduction of a two-terminal memristive device that intrinsically encompasses all six synaptic roles (1–6) is key to enabling scalable neuromorphic hardware that is not only energy-efficient, but also reaches or even surpasses the performance of conventional AI approaches.
In this work, we propose such a two-terminal memristive device that relies on the valence-change switching mechanism in SrTiO_{3} (STO)^{36} and intrinsically possesses the six operations needed to function as an ST-Hebb synapse. A symbolic representation on top of an SEM image of the fabricated nanoscale device is shown in Fig. 1d. The measured memristor conductance acts as the plastic synaptic weight and mirrors the behavior displayed in Fig. 1b. Specifically, our device can store two different states in its memory, (I) a state with slow dynamics (long-term weight W) and (II) a state with fast dynamics (short-term weight F), both encoded in the conductance of the memristor. In terms of computation, the four synaptic operations labeled 2, 3, 5, and 6 in Fig. 1b can all be performed by our STO devices: (III) Long-term plasticity (i.e., a change in the long-term weight W) and (IV) short-term plasticity (short-term weight update ΔF) can both be triggered by voltage pulses of different magnitudes. Notably, the short-term decay happens spontaneously, without the application of a complex signal. (V) Metaplasticity (i.e., control over the decay time) can be achieved by applying a DC bias voltage to one of the two terminals, which limits the complexity of the control circuitry and wiring. (VI) Additionally, our devices provide the standard in-memory multiplication of the input (voltage U) by the synaptic weight (conductance G), realized by Ohm’s law I = G ⋅ U. They also exhibit low cycle-to-cycle variability due to their non-filamentary switching operation. As a consequence, the random displacement of a few atoms does not induce as much noise as in filamentary valence-change-type memristors^{37}. Moreover, we can operate our devices at very low conductance values (tens of nS), which lowers the power consumption during operation. Their achievable short-term timescales range from 10 milliseconds to hundreds of seconds.
Importantly, timescales on the order of 100 seconds are typically difficult to realize with nanoscale footprints using other neuromorphic approaches such as analog circuits, because the required capacitors demand much larger dimensions^{38,39,40}.
To estimate the energy consumption of our multifunctional hardware synapses in the context of a large DNN, we introduce a modified STPN (mSTPN) unit that emulates parts of the device characteristics and fully incorporates the measured energy consumption of our devices. We then integrate this unit into the original STPN network simulator of ref. ^{20} to perform a complex reinforcement learning task in software with multifunctional synapses, namely learning to play Atari’s video game Pong. The Atari suite is a common benchmark for reinforcement learning and is chosen here as an exemplary task in a dynamic environment. We show that the mSTPN unit enables faster and more stable training compared to the original version for the task of Atari Pong. A major reason for this is the constraint on the short-term decay time constant imposed by our devices. Furthermore, we demonstrate that short-term weights with long timescales, such as the ones exhibited by our memristors, are required for robust and fast training of the network. Finally, we compare the network’s energy consumption for a pure GPU implementation of the synapses with the estimated energy consumed by our memristive synapses. We demonstrate an estimated gain in energy efficiency between 96× and 966×, depending on the GPU implementation.
Results and discussion
Multifunctional synaptic behavior in a single memristor
We fabricated a multifunctional memristive synapse on an STO single-crystal substrate (Fig. 1d). We chose STO as the active material because it is a versatile and well-understood platform with rich internal dynamics due to the generation and movement of oxygen vacancy defects^{41,42} that can be tuned by, e.g., doping^{43,44,45}, different electrode materials^{33,46}, or interface engineering^{47}. First, a high-work-function contact (Pt with a Cr adhesion layer beneath) was deposited. This step was followed by the fabrication of a Ti electrode with a Pt capping layer that prevents the Ti from oxidizing in air. Both contacts were deposited using electron-beam evaporation and patterned by electron-beam lithography with a subsequent lift-off process, resulting in a typical gap between the electrodes of roughly 40 nm. The devices were annealed at 300 °C for 20 min in flowing Ar, which causes a thermal oxide to form at the Ti-STO interface. The whole stack was finally covered with a uniform 15-nm layer of SiN. The fabrication process is discussed in detail in Methods section “Device fabrication”.
Figure 2a shows 30 cycles of the I-V characteristics of our Cr/Pt-STO-Ti memristor (Fig. 2b). The voltage (−2 V to 2 V) is applied to the Pt electrode, while the Ti one is grounded. A high cycle-to-cycle repeatability as well as low conductance values (tens of nS) are obtained, which allows for energy-efficient device operation. The low conductance values and the counterclockwise switching direction, as indicated by the black arrows, are attributed to a non-filamentary switching mechanism, which has already been reported for similar material stacks^{48}. In this switching regime the conductance change is not caused by the formation of a filament made of oxygen vacancies (\({{{\rm{V}}}}_{{{\rm{O}}}}^{\bullet \bullet }\)) that bridges the two electrodes, but by the modulation of the Schottky barrier at the Pt-STO interface^{49}. This modulation is attributed to the generation and recombination of \({{{\rm{V}}}}_{{{\rm{O}}}}^{\bullet \bullet }\)’s upon the application of an external voltage (bottom of Fig. 2b). The vacancies in turn locally dope the STO, which changes the height and width of the Schottky barrier, affecting the conductance. When a positive voltage is applied to the Pt contact, oxygen from the crystal (\({{{\rm{O}}}}_{{{\rm{O}}}}^{\times }\)) moves to the Pt-STO interface or into the porous Pt electrode, leaving behind a positively charged crystal defect^{42}. This kind of n-type doping increases the conductance due to a decrease in the Schottky barrier height and width. Since \({{{\rm{V}}}}_{{{\rm{O}}}}^{\bullet \bullet }\)’s are mobile and positively charged, they migrate away from the Pt electrode along the applied electric field towards the Ti electrode, where they accumulate and potentially form a filament in a process called electroforming^{50}. We observed that at high positive voltages (>4 V) we are able to electroform our device and put it into a filamentary switching regime (Supplementary Section S2).
This confirms the generation of \({{{\rm{V}}}}_{{{\rm{O}}}}^{\bullet \bullet }\)’s at positive voltages in our devices and allows us to distinguish the filamentary and non-filamentary regimes based on an analysis of the I-V characteristics.
The number of vacancies generated, as well as the distance over which the \({{{\rm{V}}}}_{{{\rm{O}}}}^{\bullet \bullet }\)’s migrate from the Pt contact, depends on the voltage and duration of the applied electrical signal^{51}. Long electrical pulses at high voltages are expected to lead to a high vacancy concentration extending far away from the Pt electrode, whereas short, low-voltage pulses result in a relatively small vacancy concentration close to the Pt. After these pulses, the generated vacancies migrate back towards the Pt contact without an external voltage, driven by a gradient in electrochemical potential^{41}, and get filled there by the interfacial oxygen^{52}. In addition to the incorporation of molecular oxygen from the porous Pt electrode, atmospheric water vapor can also lead to the filling of oxygen vacancies by incorporating oxygen from water molecules into STO^{53}. Through such processes, vacancies start disappearing from the vicinity of the Pt contact, forming a growing, vacancy-free region. Since the Schottky barrier is mainly sensitive to the vacancy concentration immediately adjacent to the Pt electrode, even small vacancy movements in this region significantly change the contact resistance and thus the overall device conductance^{42}, explaining the observed conductance decay in our memristors. Furthermore, the vacancies close to the Pt get annihilated first, on timescales of minutes (short-term), while the vacancies further away require an increasingly long time to migrate back, resulting in timescales of multiple hours (long-term), as described in ref. ^{41}. Therein, this slowdown is attributed to the built-in electric field at the Pt-STO interface, which decreases monotonically with the distance from the Pt electrode. It is also likely that the oxygen incorporation kinetics at the Pt-STO interface play a role in determining the short-term decay timescale^{52}.
Additionally, the back-migration flux of \({{{\rm{V}}}}_{{{\rm{O}}}}^{\bullet \bullet }\) and the subsequent vacancy filling at the Pt-STO interface can be increased by the application of a negative bias, leading to a faster conductance decay. Hence, the decay time can be voltage-controlled. A summary of the postulated physical mechanisms and how they underlie the synaptic functions in Fig. 1b is given in Supplementary Section S6. Even though this physical picture supports our experimental observations, it cannot be excluded that other effects play an important role in the switching process, such as interface trap states^{54} or protonic conduction, which is well studied in oxide-based memristors^{53,55,56,57}. Further investigations will be needed to unequivocally determine the physical mechanism(s) at the origin of our devices’ behavior.
In our approach, the memristor’s conductance implements the synaptic weight, whose dynamics (long- and short-term) are crucial in ST-Hebb synapses (Fig. 1b). To investigate the conductance dynamics of our STO memristors, we apply pulses of different voltages and widths to them (Fig. 2c, d). We first induce long-term plasticity (function 3 in Fig. 1b) by applying 100 SET pulses with an amplitude of 4 V and a duration of 500 μs that cause the device to switch from a low to a high conductance state (Fig. 2c). This high conductance state slowly decays over thousands of seconds without applied bias (Supplementary Section S3). After the SET procedure, we leave the device at 0 V for 240 s (not shown) to let it settle to a stable state. We then proceed with measuring the conductance of the device at 0.6 V for 375 s (Fig. 2d), during which 100-μs-long pulses with voltages of 2, 2.5, and 3 V are applied. The long-term conductance induced by the SET pulses remains largely constant over the time period of the measurement. The 100-μs-long pulses lead to a short-term conductance increase, i.e., short-term plasticity (function 5 in Fig. 1b), whose magnitude depends on the pulse voltage (3, 5, and 10 nS for 2, 2.5, and 3 V, respectively) and is followed by a decay. This can be observed in Fig. 2e, where the conductance during the last three pulses of the protocol (dotted rectangle in Fig. 2d) is plotted. The conductance during the read voltage is shown, omitting the values during the 100-μs-long pulse. In Fig. 2f, the long- and short-term components of the conductance (functions 1 and 4 in Fig. 1b) are visualized for six measurements with different values of the long-term weight W. The measurement data was obtained by repeating the protocol of Fig. 2c, d multiple times, i.e., first setting the long-term weight (W_{1}, W_{2}, ...) with 100 SET pulses, waiting for 240 s, and then applying the short-term pulse protocol of Fig. 2d.
Values of the long-term conductance in the range of 12 to 23 nS can be set in this way (long-term plasticity). These conductance values can further undergo short-term increases induced by voltage pulses (short-term plasticity). The obtained collocation of both long- and short-term plasticity motivates the use of these devices as ST-Hebb synapses.
The short-term plasticity is investigated in more detail in Fig. 3a, which displays the mean (solid line) and standard deviation (shaded area) of five measurements. Pulse-induced short-term conductance updates (ΔF) and subsequent decays are obtained using four different voltage amplitudes (2, 2.5, 3, and 3.5 V). The pulse width was fixed to 100 μs and the read voltage to 0.6 V. The conductance values were normalized by subtracting the initial conductance at t = 0 from the data. We observe low cycle-to-cycle variability, in agreement with the I-V characteristics in Fig. 2a. The same measurement was repeated for two additional pulse widths (20 and 500 μs). The resulting ΔF values are reported in Fig. 3b as a function of the pulse amplitude and width. It can be seen that ΔF values in the range of 0.7 to 38.6 nS can be achieved by adjusting these parameters. The corresponding energy per pulse is given in Fig. 3c for the same pulse voltage and width combinations. The details of the energy calculations are given in Supplementary Section S7, and a measurement with 200 pulse cycles is given and discussed in Supplementary Section S4.
Besides the magnitude of the conductance increase, it is also possible to control the subsequent decay using a DC bias voltage (V_{bias}) that is constantly applied during the experiment (Fig. 3d), effectively implementing metaplasticity (function 6 in Fig. 1b). The mean and standard deviation of the conductance for five measurements are shown as a function of time. The voltage pulse that triggers the conductance increase is the same in all cases (3.5 V / 500 μs), thus resulting in similar ΔF, whereas the bias voltage is varied (see Supplementary Section S9 for details). The timescale of the decay increases with increasing V_{bias} from hundreds of ms (V_{bias} = −0.6 V) to tens of seconds (V_{bias} = 0.6 V). To quantify the resulting decay time constant (Λ) as a function of the bias voltage, we fitted an exponential to the measured curves (Supplementary Section S9). In our fit, the maximum value Λ = 1 indicates no decay and the minimum value Λ = 0 corresponds to immediate decay. Similar measurements were performed on other devices to qualitatively assess device-to-device variability (Supplementary Section S5). Figure 3e demonstrates that we can experimentally control Λ over a range from 0.08 to 0.92 as a function of the applied V_{bias}. The relationship between V_{bias} and Λ is modeled by a sigmoid function \(\Lambda ({V}_{bias})=\frac{L}{1+\exp (k\cdot ({V}_{bias}-{V}_{0}))}+{\Lambda }_{0}\), where L, k, V_{0}, and Λ_{0} are fitting parameters.
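The sigmoid model above can be sketched numerically. The parameter values below are illustrative, not the fitted ones from the paper; they are merely chosen so that Λ spans roughly the reported [0.08, 0.92] range, and k is taken negative so that Λ grows with V_{bias}, as observed experimentally.

```python
import math

def decay_constant(v_bias, L=0.85, k=-6.0, v0=0.0, lam0=0.06):
    """Sigmoid model Lambda(V_bias) = L / (1 + exp(k*(V_bias - v0))) + lam0.
    Parameter values here are illustrative, not the authors' fit."""
    return L / (1.0 + math.exp(k * (v_bias - v0))) + lam0

# Lambda increases monotonically with the bias voltage (k < 0):
lam_neg = decay_constant(-0.6)   # fast decay at negative bias (small Lambda)
lam_pos = decay_constant(0.6)    # slow decay at positive bias (large Lambda)
```

With these hypothetical parameters, a bias sweep from −0.6 V to 0.6 V moves Λ from below 0.1 to near 0.9, mirroring the experimentally accessible range.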
In summary, the following functions are performed intrinsically by our memristors: storing both (1) long-term (W) and (2) short-term (F^{(t)}) weights (Fig. 2f), (3) long-term plasticity (Fig. 2c), (4) short-term plasticity (Fig. 3a, b), (5) metaplasticity via control over the decay time parameter Λ (Fig. 3d, e), and (6) multiplication of the input voltage with the synaptic weight according to Ohm’s law.
DNN with multifunctional memristive synapses
The six intrinsic functionalities of our memristors can be utilized by ST-Hebb synapses in a deep STPN network. Such networks have been shown to outperform traditional DNN implementations without multifunctional synapses on a variety of complex tasks in dynamic environments^{20}. One such dynamic task is learning to play Atari Pong, a video game and common machine learning benchmark. In Pong, a player (the STPN network) confronts an opponent, each manipulating a vertically movable bar to strike a ball, aiming to get the ball past the opponent’s bar (i.e., scoring a point) or preventing the opponent from doing so. The game concludes when either player has scored 21 points. The STPN network’s reward is the difference between the player’s and the opponent’s points at the end of the game. Given only this scalar reward as input, the network finds a strategy that results in the maximum score of 21 by repeatedly playing the game and employing reinforcement learning, a bio-inspired learning paradigm^{58}. Below we describe the development of a modified STPN unit (mSTPN), an STPN^{20} with a modified weight normalization scheme (see Methods section “Modified STPN model” for details and benefits of this approach). These units make use of our multifunctional synapses to play Atari Pong. Through simulation, we could estimate the energy consumption of the whole network if it were running on our memristive hardware and compare it to a pure GPU implementation.
Modified shortterm plasticity neuron
The deep STPN network simulator investigated here (Fig. 4a) employs a network layer consisting of our modified STPN units (mSTPN layer). The network itself relies on an actor-critic architecture that takes frames of the Atari Pong environment as inputs and computes both the next action to take in the environment (actor) and an estimate of the value of the current state (critic). The frames are first processed by two convolutional layers into a dense feature set that forms the input for the mSTPN layer. The latter consists of 64 mSTPN units, each of which is connected through ST-Hebb synapses to the 2592 inputs as well as recurrently to the 64 outputs. In total, this amounts to (2592 + 64) ⋅ 64 = 169984 synapses. The output of the mSTPN layer is then fed into two fully connected linear layers that compute the next action (the actor’s next step to take in the game) and the current value (how advantageous the current game state is). To compare the influence of the STPN implementation on the training performance, three networks with different STPN layers (mSTPN, STPN, and no plasticity) were investigated (Fig. 4b). Here, the reward during training is plotted as a function of the steps taken by the actor (see Methods section “Network training” for details). Each curve represents the average reward of 16 agents that learn to play the game with different randomly initialized parameters. We observe that in terms of training speed both mSTPN and STPN outperform the no-plasticity implementation (i.e., a traditional recurrent layer without time-dependent synaptic weights). Furthermore, the mSTPN version learns slightly faster than the original STPN network, while also exhibiting a much smaller standard deviation among different training runs (shaded areas in Fig. 4b).
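The layer dimensions quoted above can be checked with a few lines of arithmetic; the variable names are ours and purely illustrative.

```python
# Dimensions of the mSTPN layer described above (sanity check).
n_inputs = 2592   # dense features produced by the convolutional layers
n_units = 64      # mSTPN units, recurrently connected to each other

# Each unit receives the 2592 feed-forward inputs plus the 64
# recurrent outputs through ST-Hebb synapses:
n_synapses = (n_inputs + n_units) * n_units
print(n_synapses)  # 169984
```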
While the robust training performance of the mSTPN layer is encouraging, the main aim of our mSTPN units is to show that our multifunctional memristors can act as hardware ST-Hebb synapses in the STPN network of Fig. 4a. To achieve this, the following device characteristics were implemented in the mSTPN units: (1) mapping of the memristor conductance (G_{meas}) to the simulated, unitless synaptic weight (G) by the linear relationship \({G}_{meas}=m\cdot G+{G}_{min}\) (Eq. (1)), with m = 2 nS and G_{min} = 12 nS. (2) Adding a discretization operation to the simulated short-term weight update (ΔF) that limits the number of ΔF values (states) to an amount that can be resolved by our memristors. To satisfy this requirement, the conductance values corresponding to two adjacent states should be separated by at least one standard deviation, which is below 1 nS for all short-term weight updates ΔF_{meas} (max. ±0.9 nS in Fig. 3b). We therefore chose a discretization step of 1 nS for ΔF_{meas}, which translates to a step of 0.5 for the simulated ΔF according to Eq. (1). (3) Fixing the maximum of ∣ΔF∣ to 20, which ensures that the weight update remains in a range that is achievable by the STO memristors. A histogram of ΔF for all synapses during an entire Pong game, with and without non-idealities, is given in Supplementary Section S11. (4) Limiting the range of the decay time constant Λ to values that can be reached by our devices ([0.08, 0.92]). Furthermore, it was observed that constraining Λ also has an impact on the training performance of the network, as shown in Fig. 4c. The five lines denote different constraints imposed on the learned decay time parameter Λ. Notably, it is beneficial to incorporate synapses with large decay time constants during training: the larger the upper limit of Λ, the faster the reward increases. Unexpectedly, the case with Λ = 0 (i.e., immediate decay of the short-term weight changes for all synapses) also learns, albeit more slowly and less robustly, as can be seen from the larger standard deviation compared to Λ = [0.08, 0.92] (inset of Fig. 4c). The longer, constrained decay times were made possible by the modified weight normalization scheme in mSTPN units (Methods section “Modified STPN model”). Because the decay constant Λ is naturally limited in our devices, destabilizing phenomena such as an exponential gain (Λ > 1) instead of a decay (Λ < 1) are automatically prevented.
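The numerical constraints above can be summarized in a short sketch. The function names are ours, and the code assumes the linear mapping between simulated weight and device conductance implied by the stated parameters (m = 2 nS, G_{min} = 12 nS, a simulated ΔF step of 0.5, and a ±20 clipping range); it is not the authors' simulator code.

```python
M_NS = 2.0        # slope of the weight-to-conductance mapping, in nS
G_MIN_NS = 12.0   # minimum device conductance, in nS
STEP = 0.5        # resolution of the simulated ΔF (1 nS in ΔF_meas)
DF_MAX = 20.0     # clipping range for |ΔF|

def to_conductance(g):
    """Map the unitless simulated weight G to a device conductance
    G_meas in nS via the linear relationship G_meas = m*G + G_min."""
    return M_NS * g + G_MIN_NS

def quantize_dF(dF):
    """Apply the device constraints to a simulated update ΔF:
    clip to the ±DF_MAX range, then round to the 0.5 resolution grid
    (the 1 nS discretization step of ΔF_meas)."""
    dF = max(-DF_MAX, min(DF_MAX, dF))
    return round(dF / STEP) * STEP

print(to_conductance(0.0))   # 12.0 nS for a simulated weight of G = 0
print(quantize_dF(0.74))     # 0.5
print(quantize_dF(25.0))     # 20.0
```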
Also note that non-volatile memristive devices, which correspond to Λ = 1, are insufficient for the implementation of synapses in STPN networks (Supplementary Fig. S15).
After training, some of the 16 trained agents achieve the maximum reward of 21 (Supplementary Section S12). The total synaptic weight G = W + F of a single synapse of such a trained agent is reported in Fig. 4d over the course of an entire game that lasts roughly 50 seconds. This specific synapse was chosen because it exhibits the largest synaptic changes (ΔF) in the whole network. It is therefore referred to as S_{max{ΔF}} in the remainder of the text and serves as a representative example of the behavior of a synapse in an STPN network. The value of the synapse’s weight G changes over time due to the short-term plasticity of ST-Hebb synapses. Importantly, the short-term updates are sparse, which makes the implementation of this reinforcement learning task energy efficient on our memristive hardware, as only a small number of energy-consuming short-term weight updates (ΔF) are needed. The zoom-in additionally shows both the long-term weight component W (in red) and the short-term weight updates ΔF (in black). Each simulation timestep is marked by a dot.
Energy consumption of deep STPN network
Next, we estimate the energy consumption of synapse S_{max{ΔF}} for the duration of the entire game if it were implemented on our memristor. Two sources of energy loss are considered: first, each voltage pulse that causes a short-term weight update consumes energy (E_{pulse}) (Fig. 3c). Second, due to the application of a constant bias to control the decay time, a small current continuously flows through the devices, inducing a power loss (P_{bias}). We address these two components separately. Figure 4e reports the first one (E_{pulse}) as a function of the short-term weight updates ΔF. This quantity is extracted from the measurement data in Figs. 3b and 3c, for different pulse widths (w_{p}). The measured energy data points closely follow a power-law relation: E_{pulse}(ΔF) = c ⋅ (ΔF)^{α} with c = 30 pJ and α = 1.52. This power-law relation was incorporated into our neural network simulator to estimate the energy consumption of the short-term weight updates in our memristors. Because the value of ∣ΔF∣ is limited to 20 and because the weight updates are sparse, this first contribution to the energy consumption remains low. In Fig. 4f, the second contribution to the energy consumption (P_{bias}) is given as a function of the total synaptic weight G. It is calculated according to \({P}_{bias}={G}_{meas}\cdot {V}_{bias}^{2}\). Note that even for a simulated weight of G = 0 there is a remnant power draw (except if V_{bias} = 0) because of the finite minimum conductance value G_{min} = 12 nS of the physical devices. For the maximum bias voltage V_{bias} = 0.6 V, the power consumed by a synapse with a constant weight of G = 0 is therefore 4.3 nW. This low power consumption is a direct consequence of our memristors’ low conductance values, enabled by their non-filamentary switching behavior.
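The two energy contributions can be written down directly from the fitted relations above; the helper names and unit conventions (pJ for pulse energy, nS and V giving nW for the bias power) are our own illustrative choices.

```python
C_PJ = 30.0    # prefactor of the pulse-energy power law, in pJ
ALPHA = 1.52   # fitted power-law exponent

def pulse_energy_pJ(dF):
    """Energy of a short-term weight update: E_pulse = c * |ΔF|^alpha (pJ)."""
    return C_PJ * abs(dF) ** ALPHA

def bias_power_nW(g_meas_nS, v_bias=0.6):
    """Static power drawn under bias: P_bias = G_meas * V_bias^2.
    With G_meas in nS and V_bias in V, the result is in nW."""
    return g_meas_nS * v_bias ** 2

# A synapse at G = 0 still has G_meas = 12 nS, so at V_bias = 0.6 V:
print(round(bias_power_nW(12.0), 1))   # 4.3 nW, matching the text
```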
Figure 4g shows the estimated energy consumed during inference over the course of a Pong game by either a memristor (blue) or a pure GPU implementation (orange) of synapse S_{max{ΔF}}. In the memristor case, the energy consumption can be decomposed into two contributions: the short-term weight updates (ΔF) and the applied bias voltage needed to control the decay time constant (Decay). These two components cover the short-term synaptic plasticity and metaplasticity required by an ST-Hebb synapse during inference. The standard input-weight multiplication is obtained through Ohm's law, I = G ⋅ V_{read}, where V_{read} encodes the input. The power consumed by this operation is, however, already accounted for by P_{bias}: the current resulting from the application of the maximum bias voltage \(\max \{{V}_{bias}\}=0.6\,V\) can be read out to compute the input-weight multiplication. To implement the same plasticity, metaplasticity, and input-weight multiplication on a GPU, the following four operations need to be executed at every time step during the game (6826 in total): (1) element-wise addition of short- and long-term weight components, (2) element-wise multiplication of F with Λ for the short-term decay, (3) element-wise addition of F and ΔF for the short-term weight update, and (4) vector-matrix multiplication of inputs and weights (weight mult.). For each of these operations the GPU's energy consumption was measured for a single synapse (see Methods section "GPU energy measurement"). We find that the energy consumption of the memristor increases more slowly with the number of time steps than the GPU baseline. It should, however, be noted that even though our multifunctional memristive synapse can fully mimic the behavior of an ST-Hebb synapse, the operations of the neuron still need to be performed on a GPU: this concerns the calculation of the magnitude of ΔF via the first term in Methods Eq. (4), the calculation of the nonlinear activation function in Methods Eq. (3), and the normalization of the presynaptic input (Supplementary Fig. S11b).
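The four per-time-step synaptic operations can be sketched with NumPy. This is an illustrative toy (layer sizes, variable names, and the magnitude of the updates are made up here), not the actual network simulator:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 3                          # toy layer sizes (illustrative)

W = rng.standard_normal((n_in, n_out))      # long-term weights
F = np.zeros((n_in, n_out))                 # short-term (plastic) component
Lam = np.full((n_in, n_out), 0.9)           # decay parameters Lambda (< 1)

def step(x, F, dF):
    """One inference time step: the four synaptic operations listed above."""
    G = W + F            # (1) element-wise addition of long/short-term parts
    F = Lam * F          # (2) element-wise short-term decay
    F = F + dF           # (3) element-wise short-term weight update
    y = x @ G            # (4) vector-matrix multiplication (input * weights)
    return y, F

x = rng.standard_normal(n_in)
dF = 0.01 * rng.standard_normal((n_in, n_out))
y, F = step(x, F, dF)
```

On the memristor, all four operations happen in place in the device physics; on the GPU each one is a separate, memory-bound kernel launch.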
To estimate the total synaptic energy consumption of the whole network, the contribution of each synapse over an entire game of Pong has to be considered (Fig. 4h). Both the energy consumed by the ΔF updates (dark blue) and by the control of the decay time constant (light blue) are shown in the form of a histogram. Most synapses do not undergo any short-term weight update during the entire game and therefore consume no energy for this operation, as indicated by the large ΔF spike centered around 0. For the decay control, we assume the worst-case scenario where a bias voltage of 0.6 V is applied to all synapses. The current due to this bias can be read out, which accounts for the energy consumption of the vector-matrix multiplication between the input and the weights. A crossbar array architecture is assumed for this purpose.
The total energy (i.e., ΔF plus Decay) consumed by each memristive synapse is shown in the histogram of Fig. 4i. By summing up the contributions from all synapses, we obtain a total energy consumption of 36 mJ (Memristor row in Table 1). This value takes into account the four synaptic operations (ΔF, Decay, W+F, and weight multiplication) of all memristive synapses of the entire STPN network for a whole Pong game. To give a nuanced comparison with a pure GPU implementation, we provide two separate measurements using an NVIDIA A100 40 GB device (see Methods section "GPU energy measurement" for details). We report the median of 100 individual runs per synaptic operation for half- and single-precision floating-point arithmetic (fp16 and fp32, respectively).
First, we measure the GPU's energy consumption for executing each synaptic operation for all of the network's 169,984 multifunctional synapses. The results are shown in the GPU (standard) row. Roughly one third of the total energy consumption stems from the three ST-Hebb-specific operations (ΔF, Decay, and W + F) and two thirds from the standard input-weight multiplication. We note that since the GPU is a massively parallel machine, this number of synapses may not fully utilize the device, potentially leading to lower energy efficiency. Indeed, the A100 GPU achieves its highest energy efficiency for a hypothetical network with around 2^{21} synapses. The row labeled GPU (optimal) is the corresponding energy consumption scaled to the original network's number of synapses. Comparing the fp16 case of the GPU (optimal) energy consumption with the total in the Memristor row yields an improvement by a factor of 96. The saved energy is due to both the multifunctional nature of our memristors and their in-memory compute capabilities, which in combination allow the simultaneous computation of four operations without any memory traffic. The absence of memory traffic is especially beneficial because all operations considered (i.e., element-wise and vector-matrix multiplication) have little to no data reuse and are memory-bound. As a consequence, most energy is consumed in data movement rather than computation (the von Neumann bottleneck). This is demonstrated in the Methods section "GPU energy measurement", where we quantify the energy consumption of the GPU's memory traffic: it accounts for more than 98% of the total. We also provide a discussion of the latency and energy-delay product (EDP) of our implementation in Supplementary Section S17. Note that the energy consumption in the memristor case was estimated from the behavior of individual devices and not based on a comprehensive circuit simulation encompassing the whole STPN network.
Although such a full circuit-level investigation would certainly reveal additional energy consumption^{59}, we believe that the memristor advantage (two orders of magnitude) is large enough to persist even under more realistic conditions.
In conclusion, we presented a two-terminal memristor based on STO that is able to store and compute both long- and short-term synaptic weight updates, effectively collocating memory and computation as well as long- and short-term dynamics. In particular, we demonstrated control over the short-term decay time constant without the need for an additional electrical contact or complex control signals, which implements a form of intrinsic metaplasticity. All these features are essential for neuromorphic circuit implementations, e.g., STPN networks, which outperform traditional artificial neural networks in large-scale, complex machine learning tasks such as Atari Pong. We contributed to the development of these networks with the introduction of mSTPN units, increasing reliability during training and highlighting the importance of long decay time constants. Finally, in simulation, we compared our memristor implementation of an STPN network to a GPU one and obtained a significant increase in inference energy efficiency, by a factor of at least 96.
To fully realize our simulation concept in hardware, further work is needed. First, our STO memristors should be converted to vertical structures, which is expected to reduce device-to-device variability and also allows for the creation of crossbar arrays. In such a vertical, thin-film-based structure, the spacing between the electrodes could most likely be decreased significantly compared to our planar devices, which in all likelihood will lead to lower operating voltages. Second, the long-term retention of our memristors should be improved while still preserving their short-term plasticity. It has been suggested that an oxide layer between the Pt electrode and STO could increase the retention of low-conductance states^{51}. Moreover, since we observed a significant impact of the decay time constant on the training performance, different decay models should be investigated for both long- and short-term components in STPN networks. The advancement of such biologically inspired neural networks holds the potential to significantly increase the performance of AI applications across diverse dynamic environments. Furthermore, multifunctional memristive synapses with intrinsic dynamics could serve as a key enabling technology for the energy-efficient hardware implementation of next-generation neural networks.
Methods
Device fabrication
The STO single-crystal substrate was first submersed in a 90 °C DI water bath under UV light illumination for 100 min^{60}. The substrate was then baked at 250 °C for 5 min and subjected to an O_{2} plasma treatment (200 W) for 3 min. This water-leaching surface treatment is expected to produce an atomically flat, predominantly TiO_{2}-terminated surface, characterized by terraces of 1 unit cell (u.c.) height. This was indeed observed at several locations of the substrate, as shown in Supplementary Fig. S1. Both electrode stacks (Cr-Pt and Ti-Pt) were then patterned using e-beam lithography and deposited by e-beam evaporation (Supplementary Figs. S2a and S2b). After deposition, the whole device was annealed at 300 °C for 20 min in an Ar atmosphere (Supplementary Fig. S2c). This step causes a thermal oxide to form at the Ti-STO interface, leaving behind oxygen vacancies^{61}. Annealing also likely leads to diffusion of chromium into the STO, doping it in the process^{62}. The device stack was finally encapsulated within 30 nm of SiN using plasma-enhanced chemical vapor deposition (PECVD) to protect against oxidation (Supplementary Fig. S2d). The STO single-crystal substrate was characterized by a four-point probe measurement, which yielded a surface resistance of >10 GΩ, exceeding the measurement limit of the setup. We can therefore safely ignore surface contributions to our device conductance.
Experimental setup
The quasi-static I-V characteristics were measured with a Keysight M9601A source measure unit. Voltage pulses were generated with a Keysight 33500 arbitrary waveform generator. The current was fed through a Femto DHPCA-100 transimpedance amplifier and read out with a Rohde & Schwarz RTE 1104 oscilloscope.
Modified STPN model
The equations describing the forward pass through an STPN layer follow^{20}:
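Based on the STPN formulation of ref. ^{20} and the way the text below refers to Eqs. (2) to (4), the forward pass can be written as follows (a hedged reconstruction; the adapted forms used in this work may differ in detail):

```latex
% Reconstruction of Eqs. (2)-(4) from the STPN formulation of ref. 20:
% (2) total synaptic weight, (3) nonlinear activation, (4) plastic update
% whose first (Hebbian) term sets the magnitude of dF and whose second term
% implements the short-term decay via Lambda.
\begin{align}
  \mathbf{G}^{(t)}   &= \mathbf{W} + \mathbf{F}^{(t)} \tag{2} \\
  \mathbf{h}^{(t)}   &= f\!\left(\mathbf{G}^{(t)}\,\mathbf{x}^{(t)}\right) \tag{3} \\
  \mathbf{F}^{(t+1)} &= \boldsymbol{\Gamma} \odot \left(\mathbf{x}^{(t)} \otimes \mathbf{h}^{(t)}\right)
                        + \boldsymbol{\Lambda} \odot \mathbf{F}^{(t)} \tag{4}
\end{align}
```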
where bold letters denote matrices, ⊙ element-wise multiplication, and ⊗ the outer product. The STPN layer model is parameterized by the long-term weight W, the Hebbian association strength Γ, and the short-term decay parameter Λ. During training these three parameters are learned using backpropagation through time (BPTT). While W directly controls the synaptic strength, the Λ and Γ parameters define how the synaptic weight responds to stimuli, effectively implementing a form of metaplasticity, or learning-to-learn. The plastic update of the synapse is modeled by Eq. (4). Equations (3) and (4) are adapted slightly from the original work in ref. ^{20} to reflect the specific implementation here. In addition to Eqs. (2) to (4), the original STPN model also includes a form of normalization on both the synaptic input \(x\to {x}_{eff}=\frac{x}{\parallel W+F\parallel }\) and the plastic weight \(F\to {F}_{eff}=\frac{F}{\parallel W+F\parallel }\) (Supplementary Fig. S11a). This speeds up stochastic gradient descent during training. The normalization of F leads to a modification of Eq. (4), where the decay parameter Λ becomes \({\Lambda }_{eff}=\frac{\Lambda }{\parallel W+{F}^{(t)}\parallel }\). As a consequence, the decay time constant changes at every time step, because F^{(t)} varies over time. Such variations can lead to instabilities during training, and they cannot be straightforwardly implemented on our memristors. Another consequence is that the decay time constant Λ cannot be a priori constrained to a certain range, because Λ_{eff} depends on the values of W and F, which are unknown at the start of training. However, clamping Λ is important, as training becomes highly unstable if synapses reach values Λ_{eff} > 1 (see Supplementary Fig. S15). The solution adopted to circumvent this issue in the original formulation of ref. ^{20} consisted of starting with small values of Λ at the beginning of training to ensure that Λ_{eff} does not exceed 1. This has the disadvantage that the network only slowly learns longer decay time constants.
By removing the normalization of the plastic weight F and only normalizing the input (Supplementary Fig. S11b) in our modified STPN unit we achieve a better performance during training and also make the implementation on memristors feasible.
Network training
We closely follow the training protocol established in ref. ^{20}. Concretely, we use RLLib^{63} to train and evaluate agents in PongNoFrameskip-v4. During training, the network repeatedly plays against the computer opponent of the Gymnasium software library (a common Python implementation of Atari game environments) on the standard difficulty setting (0 out of 3). Preprocessing of the game frames (dimensionality and color scale) is done as in ref. ^{64}, with the exception of frame stacking, which was omitted. The training parameters were also adopted from ref. ^{20}: rollout length (50), gradient clipping (40), discount factor (0.99), and a learning rate starting at 0.0001 with a linear decay schedule finishing at 10^{−11} after 200 million iterations. Models are trained from the experience collected by 4 parallel agents.
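The linear learning-rate schedule above can be sketched as follows (a hypothetical helper, not code from the training repository):

```python
# Linear decay from 1e-4 to 1e-11 over 200 million iterations, as stated in
# the training protocol above; beyond that the final rate is held.
LR_START, LR_END, TOTAL_ITERS = 1e-4, 1e-11, 200_000_000

def learning_rate(step: int) -> float:
    """Linearly interpolate between LR_START and LR_END, then hold LR_END."""
    frac = min(step / TOTAL_ITERS, 1.0)
    return LR_START + (LR_END - LR_START) * frac
```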
GPU energy measurement
To fairly compare the efficiency of a memristor and a GPU implementation of the network in Fig. 4a, it is essential that the GPU's energy consumption is only measured for the specific arithmetic operations that can be performed on the memristor: (1) W+F, (2) Decay, (3) ΔF, and (4) weight multiplication (for more details see Supplementary Section S15). On the GPU, these kernels take the form of (matrix) additions and multiplications that can be performed optimally on such hardware. A dedicated Python program was implemented that runs each kernel separately. To measure the GPU's energy consumption, we use the pyJoules library^{65}, a Python wrapper for NVIDIA's own energy reporting framework, the NVIDIA Management Library (NVML). Since all operations have very short runtimes, we improve accuracy by measuring the energy spent for 10,000 to 200,000 executions of the corresponding kernels. We report the median of 100 multi-executions and estimate the 99% confidence interval (CI) using bootstrapping with 1000 samples. We report the GPU energy consumption of operations (1) to (4) in three ways: (I) per full Atari Pong game of the whole neural network, which employs 64 ∗ 2656 = 169,984 synapses and runs for 6826 time steps (Table 1 in the main text), (II) per operation, i.e., a single synapse and one time step (Table 2 in this section), and (III) for a single synapse over the course of a full Atari Pong game, i.e., 6826 time steps (Fig. 4g).
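The median/CI estimation described above can be sketched with the standard library alone (function name, seed handling, and the index convention for the interval bounds are illustrative choices, not taken from the measurement code):

```python
import random
import statistics

def bootstrap_median_ci(samples, n_boot=1000, ci=0.99, seed=0):
    """Bootstrap a confidence interval for the median.

    Mirrors the procedure described above: resample with replacement
    n_boot times (here 1000), take the median of each resample, and read
    the CI bounds off the sorted bootstrap medians.
    """
    rng = random.Random(seed)
    n = len(samples)
    medians = sorted(
        statistics.median(rng.choices(samples, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1.0 - ci) / 2 * n_boot)      # e.g. 5 for a 99% CI
    hi_idx = n_boot - 1 - lo_idx
    return medians[lo_idx], medians[hi_idx]
```

Applied to the 100 per-operation energy readings, this yields the reported median together with its 99% CI.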
(I) For the GPU (standard) results in Table 1, the matrices W, F^{(t)}, x^{(t)}, and Λ required by operations (1) to (4) have the same size as in the neural network simulation. For the GPU (optimal) results, we increase the size of the matrices using the formula (2592 + 64) ⋅ k, where k is a power of two and ranges from 64 (original network size) to 4096. We report the energy spent for k = 1024, which exhibits the highest energy efficiency, scaled down to the original network’s size (see Supplementary Fig. S17).
(II) The GPU (standard) and GPU (optimal) rows in Table 2 were obtained by dividing the values of Table 1 by the number of operations executed during the whole game (64 ∗ 2656 ∗ 6826). The energy is given per floating-point operation (flop) in pJ. Note that the weight multiplication is computed by a fused multiply-add (FMA) operation, which counts as two flops (one for the addition and one for the multiplication).
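The per-flop normalization above can be sketched as follows. The operation count comes directly from the text; the example energy value is hypothetical and is not a measured Table 1 entry:

```python
# Total synaptic operations in one game: synapses x time steps.
N_OPS_PER_GAME = 64 * 2656 * 6826

def energy_per_flop_pj(game_energy_j: float, flops_per_op: int = 1) -> float:
    """Convert a whole-game energy (joules) into picojoules per flop.

    The weight multiplication is a fused multiply-add (FMA), so it counts
    as flops_per_op = 2; the element-wise operations count as 1.
    """
    return game_energy_j / (N_OPS_PER_GAME * flops_per_op) * 1e12

# Hypothetical example: a 20 mJ whole-game cost for the weight
# multiplication maps to ~8.6 pJ/flop with the FMA convention.
e_fma = energy_per_flop_pj(20e-3, flops_per_op=2)
```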
In the GPU (compute) row of Table 2 we implement a CUDA kernel that operates only on data stored in registers, without reading from or writing to the GPU's global memory. These results therefore measure the energy spent on computation alone, without the contribution of memory traffic. Concretely, each kernel execution performs as many arithmetic operations (addition, multiplication, or FMA) as needed for one complete Atari Pong game. To increase accuracy, each measurement combines 10,000 kernel executions. As before, we report the median of 100 multi-executions and estimate the 99% CI. The energy consumption per flop for the weight multiplication corresponds to approximately 5.9 and 9.5 pJ/flop for half and single precision, respectively. This is in the same ballpark as measurements provided by NVIDIA and independent testing of the GPU's floating-point unit (FPU)^{66,67}, which validates our measurements. By comparing the GPU (compute) results with the GPU (optimal) ones, we observe that memory traffic accounts for more than 98% of the GPU's total energy consumption. This result shows the remarkable energy efficiency of the GPU's FPU and the benefit of reducing memory traffic. Note, however, that this particular GPU implementation would not be useful in practice, because the results of the kernel's computations are not accessible via memory and can therefore not be used by a program running on the GPU. For this reason, the highest-efficiency GPU benchmark that corresponds to a working implementation is the fp16 energy measurement in the GPU (optimal) row.
(III) For the GPU energy consumption of a single synapse shown in Fig. 4g, we made use of the energy measurements per operation in the fp32 case of the GPU (standard) row in Table 2. It should be noted that the energy values in Table 2 were computed by first measuring the energy consumed by all synapses of the network in parallel and then dividing by the number of synapses. This ensures that the GPU's massive parallelism is utilized, even though we are only interested in the energy consumption of a single synapse. The energy contributions per operation were then cumulatively summed over all time steps to obtain the time-series GPU data in Fig. 4g.
We note that the kernels utilize the GPU's regular FP cores rather than the tensor cores, because the operations (W+F, Decay, ΔF, and weight multiplication) do not compute matrix-matrix products.
The specifications of our test system are:

Hardware:
- GPU: NVIDIA A100 with 40 GB memory
- CPU: 2x AMD EPYC 7742 @ 2.25 GHz (2 × 64/128 physical/logical cores)
- RAM: 512 GB

Software:
- Rocky Linux release 8.4
- Python 3.11.5
- PyTorch 2.2.0.dev20230913
- CUDA 12.1.1
Data availability
Source data is available from the corresponding author on request.
Code availability
The repository with the mSTPN network source code can be found here^{68}: https://bitbucket.org/weilenmc/stpn/src/publication/. The GPU energy calculations are available here^{69}: https://github.com/NanoTCAD/SpikeDecay.
References
Markram, H., Gerstner, W. & Sjöström, P. J. Spike-timing-dependent plasticity: a comprehensive overview. Front. Synaptic Neurosci. 4, 2 (2012).
Erickson, M. A., Maramara, L. A. & Lisman, J. A single brief burst induces GluR1-dependent associative short-term potentiation: a potential mechanism for short-term memory. J. Cogn. Neurosci. 22, 2530–2540 (2010).
Zucker, R. S. & Regehr, W. G. Short-term synaptic plasticity. Annu. Rev. Physiol. 64, 355–405 (2002).
Wang, Y. et al. Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat. Neurosci. 9, 534–542 (2006).
Abraham, W. C. & Bear, M. F. Metaplasticity: the plasticity of synaptic plasticity. Trends Neurosci. 19, 126–130 (1996).
Barrett, A. B., Billings, G. O., Morris, R. G. & Van Rossum, M. C. State based model of long-term potentiation and synaptic tagging and capture. PLoS Comput. Biol. 5, 1–12 (2009).
Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. International Conference on Machine Learning 1126–1135 (2017).
Miconi, T., Stanley, K. & Clune, J. Differentiable plasticity: training plastic neural networks with backpropagation. International Conference on Machine Learning 3559–3568 (2018).
Miconi, T., Rawal, A., Clune, J. & Stanley, K. O. Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity. https://arxiv.org/abs/2002.10585 (2020).
Tyulmankov, D., Yang, G. R. & Abbott, L. F. Meta-learning synaptic plasticity and memory addressing for continual familiarity detection. Neuron 110, 544–557 (2022).
Najarro, E. & Risi, S. Meta-learning through Hebbian plasticity in random networks. Adv. Neural Inf. Process. Syst. 33, 20719–20731 (2020).
Hospedales, T., Antoniou, A., Micaelli, P. & Storkey, A. Meta-learning in neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5149–5169 (2022).
Nadim, F. & Manor, Y. The role of short-term synaptic dynamics in motor control. Curr. Opin. Neurobiol. 10, 683–690 (2000).
Citri, A. & Malenka, R. C. Synaptic plasticity: multiple forms, functions, and mechanisms. Neuropsychopharmacology 33, 18–41 (2008).
Shimizu, G., Yoshida, K., Kasai, H. & Toyoizumi, T. Computational roles of intrinsic synaptic dynamics. Curr. Opin. Neurobiol. 70, 34–42 (2021).
Zador, A. et al. Catalyzing next-generation artificial intelligence through NeuroAI. Nat. Commun. 14, 1597 (2023).
Canziani, A., Paszke, A. & Culurciello, E. An analysis of deep neural network models for practical applications. http://arxiv.org/abs/1605.07678 (2016).
Patterson, D. et al. Carbon emissions and large neural network training. http://arxiv.org/abs/2104.10350 (2021).
Moraitis, T., Sebastian, A. & Eleftheriou, E. Short-term synaptic plasticity optimally models continuous environments. http://arxiv.org/abs/2009.06808 (2020).
Rodriguez, H. G., Guo, Q. & Moraitis, T. Short-term plasticity neurons learning to learn and forget. Proc. 39th Int. Conf. Mach. Learn. 162, 18704–18722 (2022).
Xu, X. et al. Scaling for edge inference of deep neural networks. Nat. Electron. 1, 216–222 (2018).
Yu, S. Neuro-inspired computing with emerging nonvolatile memorys. Proc. IEEE 106, 260–285 (2018).
Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544 (2020).
Jo, S. H. et al. Nanoscale memristor device as synapse in neuromorphic systems. Nano Lett. 10, 1297–1301 (2010).
Waser, R. Nanoelectronics and Information Technology (John Wiley and Sons, 2012).
Emboras, A. et al. Optoelectronic memristors: prospects and challenges in neuromorphic computing. Appl. Phys. Lett. 117, 230502 (2020).
Portner, K. et al. Analog nanoscale electro-optical synapses for neuromorphic computing applications. ACS Nano 15, 14776–14785 (2021).
Kumar, S., Wang, X., Strachan, J. P., Yang, Y. & Lu, W. D. Dynamical memristors for higher-complexity neuromorphic computing. Nat. Rev. Mater. 7, 575–591 (2022).
Demirağ, Y. et al. PCM-trace: scalable synaptic eligibility traces with resistivity drift of phase-change materials. 2021 IEEE International Symposium on Circuits and Systems (ISCAS) 1–5 (2021).
Yang, R., Huang, H. M. & Guo, X. Memristive synapses and neurons for bioinspired computing. Adv. Electron. Mater. 5, 1–32 (2019).
Choi, S., Yang, J. & Wang, G. Emerging memristive artificial synapses and neurons for energy-efficient neuromorphic computing. Adv. Mater. 32, 1–26 (2020).
Yang, R. et al. Synaptic suppression triplet-STDP learning rule realized in second-order memristors. Adv. Funct. Mater. 28, 1–10 (2018).
Xiong, J. et al. Bienenstock, Cooper, and Munro learning rules realized in second-order memristors with tunable forgetting rate. Adv. Funct. Mater. 29, 1–8 (2019).
Pfeiffer, M. & Pfeil, T. Deep learning with spiking neurons: opportunities and challenges. Front. Neurosci. 12, 409662 (2018).
Sarwat, S. G., Kersting, B., Moraitis, T., Jonnalagadda, V. P. & Sebastian, A. Phase-change memtransistive synapses for mixed-plasticity neural computations. Nat. Nanotechnol. 17, 507–513 (2022).
Dittmann, R., Menzel, S. & Waser, R. Nanoionic memristive phenomena in metal oxides: the valence change mechanism. Adv. Phys. 70, 155–349 (2021).
Li, Y. et al. Filament-free bulk resistive memory enables deterministic analogue switching. Adv. Mater. 32, 2003984 (2020).
Cruz-Albrecht, J. M., Yung, M. W. & Srinivasa, N. Energy-efficient neuron, synapse and STDP integrated circuits. IEEE Trans. Biomed. Circuits Syst. 6, 246–256 (2012).
Joubert, A., Belhadj, B., Temam, O. & Héliot, R. Hardware spiking neurons design: analog or digital? The 2012 International Joint Conference on Neural Networks (IJCNN) 1–5 (2012).
Gopalakrishnan, R. & Basu, A. Triplet spike time-dependent plasticity in a floating-gate synapse. IEEE Trans. Neural Netw. Learn. Syst. 28, 778–790 (2015).
Jiang, W. et al. Mobility of oxygen vacancy in SrTiO_{3} and its implications for oxygen-migration-based resistance switching. J. Appl. Phys. 110, 034509 (2011).
Cooper, D. et al. Anomalous resistance hysteresis in oxide ReRAM: oxygen evolution and reincorporation revealed by in situ TEM. Adv. Mater. 29, 1–8 (2017).
Gwon, M., Lee, E., Sohn, A., Bourim, E. M. & Kim, D. W. Doping-level dependences of switching speeds and the retention characteristics of resistive switching Pt/SrTiO_{3} junctions. J. Korean Phys. Soc. 57, 1432–1436 (2010).
Goossens, A. S. & Banerjee, T. Tunability of voltage pulse mediated memristive functionality by varying doping concentration in SrTiO_{3}. Appl. Phys. Lett. 122, 034101 (2023).
Rana, K. G., Khikhlovskyi, V. & Banerjee, T. Electrical transport across Au/Nb:SrTiO_{3} Schottky interface with different Nb doping. Appl. Phys. Lett. 100, 1–4 (2012).
Park, C., Seo, Y., Jung, J. & Kim, D. W. Electrode-dependent electrical properties of metal/Nb-doped SrTiO_{3} junctions. J. Appl. Phys. 103, 054106 (2008).
Hensling, F. V., Heisig, T., Raab, N., Baeumer, C. & Dittmann, R. Tailoring the switching performance of resistive switching SrTiO_{3} devices by SrO interface engineering. Solid State Ion. 325, 247–250 (2018).
Muenstermann, R., Menke, T., Dittmann, R. & Waser, R. Coexistence of filamentary and homogeneous resistive switching in Fe-doped SrTiO_{3} thin-film memristive devices. Adv. Mater. 22, 4819–4822 (2010).
Baeumer, C. et al. Quantifying redox-induced Schottky barrier variations in memristive devices via in operando spectromicroscopy with graphene electrodes. Nat. Commun. 7, 12398 (2016).
Menzel, S. & Waser, R. Mechanism of memristive switching in OxRAM. Advances in Non-Volatile Memory and Storage Technology (2nd Edition) 137–170 (2019).
Siegel, S. et al. Trade-off between data retention and switching speed in resistive switching ReRAM devices. Adv. Electron. Mater. 7, 2000815 (2021).
Zurhelle, A. F. Modeling the oxygen transport at heterointerfaces for oxide-based electronics. Ph.D. thesis, Rheinisch-Westfälische Technische Hochschule Aachen (2023).
Heisig, T. et al. Oxygen exchange processes between oxide memristive devices and water molecules. Adv. Mater. 30, 1–7 (2018).
Mikheev, E., Hoskins, B. D., Strukov, D. B. & Stemmer, S. Resistive switching and its suppression in Pt/Nb:SrTiO_{3} junctions. Nat. Commun. 5, 3990 (2014).
Valov, I. & Tsuruoka, T. Effects of moisture and redox reactions in VCM and ECM resistive switching memories. J. Phys. D Appl. Phys. 51, 413001 (2018).
Kreuer, K. D. Aspects of the formation and mobility of protonic charge carriers and the stability of perovskite-type oxides. Solid State Ion. 125, 285–302 (1999).
Sata, N., Hiramoto, K., Ishigame, M. & Hosoya, S. Site identification of protons in SrTiO_{3}: mechanism for large protonic conduction. Phys. Rev. B 54, 15795–15799 (1996).
Neftci, E. O. & Averbeck, B. B. Reinforcement learning in artificial and biological systems. Nat. Mach. Intell. 1, 133–143 (2019).
Aguirre, F. et al. Hardware implementation of memristor-based artificial neural networks. Nat. Commun. 15, 1974 (2024).
Speier, W., Szot, K. & Karthaeuser, S. Verfahren zur Herstellung einer B-terminierten Oberfläche auf Perowskit-Einkristallen. German Patent No. DE200410019690 (2005).
Li, Y. et al. Nanoscale chemical and valence evolution at the metal/oxide interface: a case study of Ti/SrTiO_{3}. Adv. Mater. Interfaces 3, 1–8 (2016).
La Mattina, F., Bednorz, J. G., Alvarado, S. F., Shengelaya, A. & Keller, H. Detection of charge transfer processes in Cr-doped SrTiO_{3} single crystals. Appl. Phys. Lett. 93, 022102 (2008).
Liang, E. et al. RLlib: abstractions for distributed reinforcement learning. Proc. 35th Int. Conf. Mach. Learn. 80, 3053–3062 (2018).
Mnih, V. et al. Asynchronous methods for deep reinforcement learning. Proc. 33rd Int. Conf. Mach. Learn. 48, 1928–1937 (2016).
Belgaid, M. C., Rouvoy, R. & Seinturier, L. pyJoules: Python library that measures Python code snippets. https://github.com/powerapi-ng/pyJoules (2019).
Dally, B. The path to exascale computing. https://images.nvidia.com/events/sc15/pdfs/SC5102pathexascalecomputing.pdf (2015).
Bhalachandra, S., Austin, B., Williams, S. & Wright, N. J. Understanding the impact of input entropy on FPU, CPU, and GPU power. https://arxiv.org/abs/2212.08805 (2022).
Weilenmann, C. Single neuromorphic memristor closely emulates multiple synaptic mechanisms for energy efficient neural networks. https://doi.org/10.5281/zenodo.12685701 (2024).
Ziogas, A. & Weilenmann, C. Single neuromorphic memristor closely emulates multiple synaptic mechanisms for energy efficient neural networks. https://doi.org/10.5281/zenodo.12685560 (2024).
Acknowledgements
We would like to thank the Operations Team of the Binnig and Rohrer Nanotechnology Center, especially Antonis Olziersky, Roland Germann, Ute Drechsler, and Diana Davila for the generous sharing of their immense fabrication knowledge. We would also like to thank Johannes Hellwig for sharing his insights on the STO switching mechanism, Dhananjeya Kumaar for inspiring the title of this work, and Hector Rodriguez for helping us understand the STPN code better. T.M. was with Huawei Technologies, Zurich Research Center when the authors initially agreed to collaborate. Funding from the Werner Siemens Foundation (A.E., M.L., and M.M.), the SNSF Strategic JapaneseSwiss Science and Technology Program under project metacross (grant number 214068, C.W. and A.E.), the SNSF Sinergia project ALMOND (grant number 198612, M.K., M.L., and T.Z.), and the SNSF Advanced Grant project QuaTrEx (grant number 209358, A.N.Z. and M.L.) is acknowledged. Finally, this work used computational resources from the Swiss National Supercomputing Center (CSCS) under project s1119 (M.K., M.L., and M.M.).
Author information
Authors and Affiliations
Contributions
C.W. developed the concept of the paper, fabricated and measured devices, implemented, tested and trained the neural network. C.W. wrote the paper with input from all the authors. A.Z. performed the measurement of the GPU energy consumption. T.Z. developed the characterization setup and helped with device measurements. K.P. assisted with fabrication. M.M. and M.K. provided feedback on the theoretical device operation principle. T.M. wrote part of the abstract and introduction and gave guidance on the original version of the STPN network. M.L. supervised the project and helped on the structuring of the paper. A.E. supervised the project and led the study with inputs on numerous topics including the fabrication/characterization of the devices and the writing of the paper. C.W. and A.E. conceived the project.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Yiyu Shi, Gaokuo Zhong and Ilia Valov for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Weilenmann, C., Ziogas, A.N., Zellweger, T. et al. Single neuromorphic memristor closely emulates multiple synaptic mechanisms for energy efficient neural networks. Nat. Commun. 15, 6898 (2024). https://doi.org/10.1038/s41467-024-51093-3