Stateful Three-Input Logic with Memristive Switches

Memristive switches can act as both storage and computing elements, which makes them an excellent candidate for beyond-CMOS computing. In this paper, multi-input memristive switch logic is proposed, which enables the function X OR (Y NOR Z) to be performed in a single step with three memristive switches. This ORNOR logic gate increases the capabilities of memristive switches, improving the overall system efficiency of a memristive switch-based computing architecture. Additionally, a computing system architecture and clocking scheme are proposed to further utilize memristive switching for computation. The system architecture is based on a design where multiple computational function blocks are interconnected and controlled by a master clock that synchronizes system data processing and transfer. The clocking steps to perform a full adder with the ORNOR gate are presented along with simulation results using a physics-based model. The full adder function block is integrated into the system architecture to realize a 64-bit full adder, which is also demonstrated through simulation.

and the resistance and as output the state representation. In contrast, the MAGIC and IMPLY logic families use only the device resistance as inputs and output and are thus stateful logic families.
To advance beyond functional completeness toward a broader logical computation structure, recent work demonstrated the capability of memristive switches to implement adder circuits based on the non-stateful CRS approach 26,27, and the stateful MAGIC 32 and IMPLY logic approaches 23,33,34. Experimentally, an adder based on the CRS logic has been demonstrated using bipolar memristive devices 35. The functionality of a MAGIC adder has been shown with organic unipolar switching devices 36. In addition to the demonstration of the IMPLY logic in the original publication 17, further publications presented experimental studies of this approach 37,38. All of these adders require a certain number of devices and steps to perform a given operation, e.g. a 1-bit addition. As memristive switches do not offer unlimited endurance, reducing the number of steps required for the targeted operation will aid in the advancement of memristive switches as a promising computing technology.
In this paper, we extend the functionality of the IMPLY logic by implementing stateful logic with three devices simultaneously. Three devices are shown to execute the function $X + \overline{Y + Z}$, which will be called the ORNOR gate. We show that this function can aid in reducing the number of steps and improving efficiency in memristive computing. This concept is validated using a physics-based simulation model, which has been fitted to experimental data. This modeling approach enables us to identify possible limitations of the logic approach. The analysis of these limitations paves the way for deducing device and circuit requirements.
Results
ORNOR gate. The proposed ORNOR gate can be regarded as an extension of the IMPLY gate proposed by Borghetti et al. 17. In an IMPLY gate, two memristive switches, P and Q, are connected via a common node over a resistance to ground. Two different voltages V Set and V Cond are applied to Q and P, where V Cond is not high enough to set P, but V Set can set Q in the specific cycle time. When these voltages are applied at the same time, the potential at the shared connection rises depending on the states of the devices. Thus, the voltage drop over Q can be lowered depending on the state of P, so that Q does not switch (see also Supplementary Information). The ORNOR gate, in contrast, uses three memristive switches X, Y, and Z as depicted in Fig. 1a. The common line connecting the memristive devices with the resistance R G is referred to as wordline in the following. The memristive devices X, Y and Z are contacted via the bitlines 2, 1 and 0, respectively. The conditional voltage V Cond is now applied to two devices (Y and Z) and V Set is applied to the third device (X). Only the device with V Set applied (X) can change states from the HRS to the LRS, but now, two memristive switches (Y and Z) determine the final state of X. The voltage at the V RG node is close to V Cond relative to ground when either device Y or Z is in the LRS. In this case, the voltage applied across device X is V Set - V Cond and therefore does not switch the binary state of X to the LRS. In the case where both Y and Z are in the HRS, the effective voltage at V RG is nearly GND, as there is just a tiny current flow through resistor R G. In this scenario, the voltage across device X is equal to V Set, which is sufficiently large to change the state of X from the HRS to the LRS. The truth table of this circuit for different inputs (X, Y, and Z) is shown in Fig. 1b. This table can be simplified to the function $X' = X + \overline{Y + Z}$, which will be referred to as the ORNOR gate, as the function is stated as X OR (Y NOR Z). The output of the function is the state of X after the voltages are applied, written as X'.
Figure 1d (caption). The simulation of the critical cases of the ORNOR gate is depicted. Here, the subscripts X, Y, Z indicate the correlation of the applied voltages, currents or state variables to the devices X, Y and Z, respectively. In steps 1-3, the initialization process of the ORNOR gate is shown by writing the three inputs to the devices X (blue), Y (light green) and Z (yellow). In step 4 (red) the ORNOR operation is executed, which is followed by three verifying read-out steps (dark green). If a high current is detected the read-out value is a 1, whereas a low current is a 0. Row 1/2/3: voltage applied to Bitline 2/Bitline 1/Bitline 0. Row 4: potential at the wordline. Row 5/6/7: current at Bitline 2/Bitline 1/Bitline 0. Row 8/9/10: state variable of device X/Y/Z. The scale is changed for small state variable values.
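As an illustration of the logic function only (not of the device physics), the ORNOR truth table can be enumerated with a few lines of Python. This is a minimal sketch that assumes the encoding 1 = LRS and 0 = HRS; the function names are hypothetical and chosen for this example.

```python
from itertools import product


def ornor(x: int, y: int, z: int) -> int:
    """Return X' = X OR (Y NOR Z) for binary inputs (assumed: 1 = LRS, 0 = HRS)."""
    y_nor_z = 1 - (y | z)
    return x | y_nor_z


def ornor_truth_table():
    """List all (X, Y, Z, X') combinations of the three-input gate."""
    return [(x, y, z, ornor(x, y, z)) for x, y, z in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print("X Y Z | X'")
    for x, y, z, x_new in ornor_truth_table():
        print(f"{x} {y} {z} | {x_new}")
```

For X = 0, the target device is set only when both Y and Z are 0, which corresponds to the single switching case described above.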
To validate the function of the ORNOR gate, circuit simulations are performed. For the memristive elements, we used a physics-based simulation model, which is described in detail in the Methods section. The model was fitted to experimental data of a Pt/Ta2O5/Ta VCM device (see Supplementary Information) and it fulfills the six fundamental criteria required to model VCM devices 39,40, among which the nonlinear switching kinetics is the most important one. Nonlinear switching kinetics means that the device will switch in a finite time upon application of any non-zero voltage, but how fast the switching occurs depends on the voltage magnitude. Due to the involved physics, the switching time is a highly nonlinear function of the applied voltage 41. Since the fastest switching occurs at higher voltages and the target voltages of this application are in the higher voltage range, the fit was chosen to be more accurate in this range.
The simulations of the two critical cases as described below are shown in Fig. 1d. To perform the ORNOR operation, the memristive devices are first initialized to the designated inputs (X blue, Y green and Z yellow), starting in an HRS (steps 1-3). To this end, zero volts are applied to the wordline and the desired inputs to the bitlines. Then, the ORNOR operation (red) is performed in step 4 and afterwards verified by the read-out steps 5-7. The last three rows in Fig. 1d present the state variables of the devices X, Y and Z. Tracking the state variable allows us to observe small state changes, which are hard to detect in the read-out current. Note that the scale is changed for small state variable values.
The critical case 1 (X = 0, Y = 0 and Z = 0) is the only one in which device X switches. Thus, it determines the minimum cycle time t C . During operation an unwanted state drift occurs. The devices Y and Z, to which V Cond is applied, show a small state drift. This state drift is independent of the cycle time since it stops as soon as device X becomes sufficiently low resistive and in turn the potential at V RG is high enough. Consequently, the voltage drop over devices Y and Z decreases.
The critical case 2 is X = 0, Y = 0 and Z = 1 and has the same behavior as the case X = 0, Y = 1 and Z = 0. Here, a cycle time dependent state drift of device X is present, since V RG is not increased sufficiently by the low ohmic connection of device Z to prevent this drift.
The case X = 0, Y = 1 and Z = 1 is not critical, since the potential V RG is even closer to V Cond as both devices Y and Z are low ohmic. Thus, the voltage drop over device X decreases and the switching process slows down.
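The dependence of the wordline potential on the states of Y and Z can be illustrated with a static nodal estimate. The following sketch treats every device as a fixed linear resistor, which is a strong simplification of the physics-based model used in this work. All numerical values (voltages and resistances) are hypothetical placeholders and are not the parameters of Fig. 1c.

```python
# Hypothetical values for illustration only; not taken from Fig. 1c.
V_SET = 1.5      # volts, applied to device X
V_COND = 0.8     # volts, applied to devices Y and Z
R_LRS = 10e3     # ohms, assumed low resistive state
R_HRS = 1e6      # ohms, assumed high resistive state
R_G = 50e3       # ohms, assumed series resistance to ground


def wordline_potential(r_x: float, r_y: float, r_z: float,
                       v_set: float = V_SET, v_cond: float = V_COND,
                       r_g: float = R_G) -> float:
    """Solve Kirchhoff's current law at the common node for linear resistors."""
    g_x, g_y, g_z, g_g = 1.0 / r_x, 1.0 / r_y, 1.0 / r_z, 1.0 / r_g
    return (v_set * g_x + v_cond * (g_y + g_z)) / (g_x + g_y + g_z + g_g)


def voltage_across_x(r_x: float, r_y: float, r_z: float) -> float:
    """Voltage drop over device X, i.e. V_Set minus the wordline potential."""
    return V_SET - wordline_potential(r_x, r_y, r_z)


if __name__ == "__main__":
    cases = {
        "X=0, Y=0, Z=0": (R_HRS, R_HRS, R_HRS),
        "X=0, Y=0, Z=1": (R_HRS, R_HRS, R_LRS),
        "X=0, Y=1, Z=1": (R_HRS, R_LRS, R_LRS),
    }
    for name, (r_x, r_y, r_z) in cases.items():
        print(f"{name}: V_RG = {wordline_potential(r_x, r_y, r_z):.3f} V, "
              f"V_X = {voltage_across_x(r_x, r_y, r_z):.3f} V")
```

With these assumed values the wordline potential stays near ground when all devices are in the HRS and approaches V Cond as soon as Y or Z is in the LRS, moving even closer when both are low ohmic.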
The observed state drift is a direct consequence of the nonlinear switching kinetics of VCM devices, which are included in our device model. Due to a smaller voltage drop over the devices (here in a range of 0.8 V) during the operation, the switching process starts in unselected cells 42. As the switching process is slower in this regime according to the switching kinetics (Fig. 2b), only a small state drift is observed. To alleviate this problem, a device with steeper kinetics could be chosen 43. This effect imposes an additional design constraint on the circuit parameters, i.e. the applied voltages, the timing and the series resistance. The parameters listed in Fig. 1c were optimized to minimize the drifts and the cycle time for performing the ORNOR function.
Computing system. In the proposed system architecture, arrays of memristive switches are dedicated to performing a specific function. The memristive switches have clock signals applied to their bitlines, as in Fig. 2a. These clocks encode the data for a specific function and drive the memristive switches to compute this function in a serial manner. These arrays are defined as function blocks. An example of one of these function blocks is shown on the left in Fig. 2a, where six memristive switches are set up to compose a full adder. This function block is one of many function blocks that ultimately comprise a computing system based on memristive switches for complex computations. Each of the function blocks repeatedly performs the defined function enabling pipelining 44. One advantage to this system is that multiple function blocks can be driven by the same set of clocks, thereby performing parallel processing without much overhead in area. Data can be transferred from one function block to another function block for additional computation, or each function block could be reconfigured to perform a different function by changing the clock set. A master clock controls how the various function blocks are synchronized. As shown in Fig. 2a, there is a transistor that gates connections between the common node of one function block and the common node of another function block. Additionally, there are transistors that control the connection of the C0 and C1 memristive switches to this common node, which are specific to the full adder function. The additional transistors enable the transfer of data between function blocks, which use a common clock set for all full adder function blocks. In this computing system architecture, a full adder circuit is realized. The N-bit full adder circuit proposed in this paper is optimized to exploit the ORNOR function, and requires 6·(N bit + 1) memristive switches.
Due to the doubling of the most significant bit to ensure a correct result for a two's complement addition, an extra full adder functional block (6 devices) is needed. The full adder circuit is realized here with a common transistor at the wordline (see Fig. 2b) instead of a resistor R G as in the ORNOR gate (see Fig. 1a). This provides flexibility for using different functions on this block, as the conductance of the transistor can be tuned according to the performed functions (see also Supplementary Information). Figure 2c depicts a 2-bit adder circuit, which is composed of three full adder circuits with common transistor, as an example of a component of the 64-bit adder circuit. The 2-bit adder circuit includes parasitic line resistances R P and parasitic capacitances C P . Here, the parasitic capacitances C p between the wordlines are not shown due to readability, but are considered later in the simulations.
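The device count stated above can be checked with a short calculation. This is a minimal sketch of the formula 6·(N bit + 1) only; the function and constant names are hypothetical.

```python
DEVICES_PER_FUNCTION_BLOCK = 6  # memristive switches per full adder function block


def adder_device_count(n_bits: int) -> int:
    """Memristive switches for an N-bit adder, including the extra block for the doubled MSB."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    return DEVICES_PER_FUNCTION_BLOCK * (n_bits + 1)


if __name__ == "__main__":
    for n in (1, 2, 64):
        print(f"{n}-bit adder: {adder_device_count(n)} memristive switches")
```

For the 2-bit example of Fig. 2c this gives 18 devices, i.e. three function blocks of six devices each, and 390 devices for the 64-bit adder.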
To transfer the data from one functional block to another, a COPY operation needs to be implemented. During this operation, the data transfer gate drive is set to a conductive state and thus connects two functional blocks to transfer the data. By performing an IMP operation with two devices, one of each block, the data is transferred to the other block. To copy data of C1 WL1 to C0 WL2 in Fig. 2c, the voltage scheme highlighted in color needs to be applied. Since the voltages at the bitlines are always applied to all functional blocks that share one clock set, selecting transistors must be added to the bitlines that are involved in the COPY operation. Otherwise, V Set would be applied to two devices, which share the same bitline and the connected wordlines. In the same way, V Cond would be applied to two devices sharing another bitline. Thus, no COPY operation would be achieved. By adding the transistors to the circuit, only two devices on the active bitlines can be chosen to be connected to the wordlines by setting the selecting transistors (V GWL1BL6 and V GWL2BL5) to a conductive state, here by applying a high voltage V Tr to the gates.
(2019) 9:14618 | https://doi.org/10.1038/s41598-019-51039-6 | www.nature.com/scientificreports
For implementing a functionally complete stateful logic system in this architecture, a FALSE operation is required to switch the memristive devices to 0. When performing the function ORNOR(X, Y, Z) or IMP(P, Q), X and Q are the only memristive switches whose state can be changed. It can be observed from the truth tables that there is no set of inputs that causes the memristive switches X or Q (to which V Set has been applied) to change from 1 to 0. Without the FALSE operation, all the memristive switches will eventually be changed to the 1 state, preventing further computation. Since the memristive switches used here are bipolar switching devices, a voltage with the opposite polarity of the SET operation needs to be applied to reset the device to the HRS. To this end, a RESET voltage V Reset is applied to the wordline while the bitlines are set either to GND or to V Pro in order to reset the device or keep its information. To reset device A in Fig. 2b, the voltage scheme illustrated in purple is applied to the respective terminals.
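The need for a separate FALSE operation can be verified exhaustively on the logic level. The sketch below assumes the usual definition of material implication, Q' = (NOT P) OR Q, for the IMP gate and the ORNOR function X' = X OR (Y NOR Z); the function names are hypothetical.

```python
from itertools import product


def imp(p: int, q: int) -> int:
    """IMPLY gate: new state of Q, Q' = (NOT P) OR Q."""
    return (1 - p) | q


def ornor(x: int, y: int, z: int) -> int:
    """ORNOR gate: new state of X, X' = X OR (Y NOR Z)."""
    return x | (1 - (y | z))


def false_op() -> int:
    """FALSE operation: unconditionally reset the target device to 0 (HRS)."""
    return 0


def target_never_resets() -> bool:
    """Check that neither IMP nor ORNOR can drive its target device from 1 to 0."""
    imp_ok = all(imp(p, 1) == 1 for p in (0, 1))
    ornor_ok = all(ornor(1, y, z) == 1 for y, z in product((0, 1), repeat=2))
    return imp_ok and ornor_ok


if __name__ == "__main__":
    print("IMP and ORNOR can only set the target:", target_never_resets())
    print("State after FALSE:", false_op())
```

Because both gates can only keep or set the target state, a device that has reached 1 stays at 1 until FALSE is applied.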
The full adder implementation proposed here takes 17 steps to perform a non-K2 one-bit addition, as described in Fig. 3. The step count assumes that setting the initial input values and the final readout are not part of the actual implementation of the function, which is consistent with previous research for a standardized analysis and comparison 23 . These steps are labelled as "-" in Fig. 3 and were required for the simulation. The functions are applied serially to perform the complete full adder function, where data for each step is encoded into the clocks that are applied to the memristive switches. The third column shows the operation of each step, where the memristive switches used in the operations are in parentheses. The outcome of the operation is shown in the column associated with each memristive switch. For example, in step 1, a FALSE operation is applied to the devices M1, S1, C0, and C1. Therefore, the output of each of these memristive switches is shown in their respective columns, where each device state is set to 0.
In general, A, B, and C0 are the memristive switches into which data is loaded, representing the standard A, B, and carry-in for a full adder. Before the execution of the function, data is initially loaded into A and B from another function block using a COPY function and a data transfer transistor. The carry-in is loaded into the C0 memristive switch in step 8. C1 contains the carry-out data of the function block array, and once the computation is complete, S contains the calculated sum. The M1 memristive switch is an additional supporting device. The additional transistors connected to the wordlines of C0 and C1 allow the carry-out of one stage to be shifted to the carry-in of the next stage in a multi-bit chain similar to how a shift register propagates data along a serial chain.
One-Bit Full Adder Function Block
Although the memristive switches are defined as inputs or outputs to aid in the explanation of the full adder, every memristive switch can act both as an input or an output based on how they are used.
As shown in Fig. 3, with only two steps (9 and 10) of delay between each successive bit of a full adder, there is increased parallel processing and therefore increased overall efficiency. Two different processing schemes can be realized. In the first scheme, the addition is done bit by bit as illustrated in Fig. 4a, and the individual bits are processed with an offset of two steps. This scheme, however, requires a unique clock for each memristive switch. In the second scheme, shown in Fig. 4b, the use of the same clock set is coordinated such that the first eight steps and last seven steps are driven between all function blocks simultaneously. This reduces the overall number of drivers required for multi-bit addition.
Figure 4. Approach (a): In the independent clocking scheme, each functional block has an independent set of clocks, which can be applied at any time to perform the function. In this way, each functional block is aligned so that there is no idle time; however, an increased number of clocks and drivers are required. Approach (b) uses the same set of clock signals to drive all memristive switches for the multi-bit addition. The first eight steps and last seven steps are driven across all function blocks simultaneously. This approach is able to compute the multi-bit full addition in the same number of steps while significantly reducing the number of clock signals required due to parallel control of multiple function blocks.
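The step count implied by this description can be sketched as follows. The sketch assumes that the eight leading and seven trailing steps are shared by all function blocks and that each function block contributes exactly two carry-propagation steps (steps 9 and 10); this reading reproduces the 17 steps of the one-bit adder, but the extension to many blocks is an assumption of this example. All names are hypothetical.

```python
PARALLEL_HEAD_STEPS = 8     # steps 1-8, driven across all function blocks at once
CARRY_STEPS_PER_BLOCK = 2   # steps 9 and 10, repeated for each function block
PARALLEL_TAIL_STEPS = 7     # final steps, driven across all function blocks at once


def adder_step_count(n_blocks: int) -> int:
    """Estimated number of steps for a chain of n_blocks full adder function blocks."""
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    return PARALLEL_HEAD_STEPS + CARRY_STEPS_PER_BLOCK * n_blocks + PARALLEL_TAIL_STEPS


if __name__ == "__main__":
    print("one function block:", adder_step_count(1), "steps")
    # A 64-bit adder uses 65 function blocks because the most significant bit is doubled.
    print("65 function blocks:", adder_step_count(65), "steps")
```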
Simulations. The proposed one-bit adder circuit with parallel clocking scheme is simulated using the model and the model parameters described in the Methods section, the circuit parameters given in Fig. 5a, and the clock scheme introduced in Fig. 3. The addition of A = −1 and B = −1 is conducted in two's complement. To secure a valid result, the most significant bit is doubled. The applied voltages are depicted in Fig. 5b. Figure 5c shows the resulting terminal voltages, the change of the state variable of the individual bits during the computation and the currents on the wordlines WL1 and WL2. In the last step of the simulation, the result is read out by applying a small voltage to BL4 (connected to the S devices). For WL1 the detected wordline current is below 1 μA, resulting in a read 0, whereas the detected wordline current for WL2 is above 5 μA and is thus interpreted as a 1. This means that the result is $(1)_2 + (1)_2 = (10)_2$, which verifies the functionality.
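The arithmetic behind this test case can be reproduced in a few lines. The sketch below models only the number representation (doubling the most significant bit before a two's complement addition), not the memristive circuit; the helper names are hypothetical.

```python
def double_msb(bits: str) -> str:
    """Sign-extend a two's complement bit string by repeating its most significant bit."""
    return bits[0] + bits


def add_twos_complement(a_bits: str, b_bits: str) -> str:
    """Add two equally long two's complement numbers after doubling the MSB."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    a_ext, b_ext = double_msb(a_bits), double_msb(b_bits)
    width = len(a_ext)
    total = (int(a_ext, 2) + int(b_ext, 2)) % (1 << width)  # drop the carry out
    return format(total, f"0{width}b")


def to_signed(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


if __name__ == "__main__":
    result = add_twos_complement("1", "1")  # A = -1 and B = -1 as one-bit numbers
    print(result, "=", to_signed(result))
```

Here the lower result bit corresponds to the 0 read on WL1 and the upper bit to the 1 read on WL2.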
Next, the implementation and verification of a 64-Bit-Adder using the parallel clocking scheme is demonstrated. To ensure proper operation, the worst case in the matter of drift needs to be found first. Since the first eight steps and the last seven steps are executed in parallel, these steps do not differ from the one-bit adder. For multibit operations, however, the steps 9-10 are executed many times. In both steps, the lines BL5 and BL6 are active, but the select transistors only address the required two devices. As the active devices change each repetition in the carry propagation, no drift is expected in these devices. In step 9, BL4 is active, too. The devices connected to BL4 do not have a selector. Thus, a state drift is possible and this effect determines the maximum length of the addition. By means of simulation the worst case is found to be the operation B -A, with A = 0 and B = 0. Figure 6a shows the state variable transition of S WL65 for the worst-case operation. Here, a small drift is visible, but the calculated result is still valid. While the result is correct, the state variable of M1 WL65 is misbehaving as shown in Fig. 6b. Before the carry propagation phase begins, the device M1 WL65 should have switched to the LRS (N max ) but the state variable does not reach N max . The reason can be found directly in the applied voltage as the potential on the bitline does not reach V Set anymore and thus slows down the switching process. By using a shorter cycle time, the final value of the state will be even lower and eventually the carry would be interpreted as a 0, leading to a malfunction. To ensure proper operation, a cycle time of 250 ns is used here. Extending the cycle time even more to enable a complete switching of M1 WL65 to N max would lead to state drift in other memristive devices.

Discussion
Previous approaches to calculating a 'stateful' full adder focused solely on IMP and FALSE operations, the latter of which resets the devices to the HRS 33, while others 23 extended this idea by utilizing the XOR operation in both serial and parallel optimized approaches. Table 1 lists the cycle steps and number of memristive switches as stated in the original papers (if stated). As the adders in the referenced papers use varying methodologies, or count with or without input and output memristive switches, the given quantities have some ambiguity. This ambiguity becomes less important for a large number N. In this case, the proposed adder can reduce the needed steps by about 60%. As in CMOS, there is a tradeoff between area (here the number of memristive switches) and time (here the number of steps). The number of memristive switches can be reduced by reducing the parallelism of the algorithm. The numbers of steps and devices also give an indication of the power consumption. If the array needs to be bigger (higher number of memristive switches), higher parasitic charging costs and a higher number of sneak paths need to be assumed. Moreover, if more steps are needed to achieve the functionality, more operations with higher voltages are executed on the array and thus the power consumption increases. More details on the energy consumption are given in the Supplementary Information.
Whereas previous work on stateful memristor logic has proposed flexible functionality with enormous overhead control circuit costs 45, the proposed system architecture employs a parallel clocking scheme that trades functional flexibility for a drastic reduction in the overhead circuit footprint. There is a fundamental relationship between functional flexibility and overhead circuit cost, as functional flexibility requires additional control signals that must be generated by the control circuit. The total transistor count N TR of the stateful memristor logic control circuit is given in ref. 45 in terms of S, X and T, where S is the number of steps required to perform a particular function, X is the number of memristive switches in the circuit, and T is the number of select transistors included for functional flexibility. As this overhead circuit cost is quite significant, the proposed system architecture minimizes the overhead circuit footprint by using each control signal to drive a large number of memristive switches and transistors in parallel (Fig. 4b). The decrease in the required number of steps resulting from the use of multi-input memristor logic, in concert with this parallel clocking of function blocks, therefore provides significant improvements to the efficiency of the control circuit and of the system as a whole. The proposed computing system makes use of the ORNOR gate. The performance improvement relative to using only IMPLY gates is related to the fact that the ORNOR gate is a three-input logic gate. The potential of using multi-input gates with three or more memristive devices has been described before 23-25,46, but the limitations of such circuits could not be addressed, partly due to the lack of physics-based simulation models. To allow for multi-input gates, the resistor R G needs to be chosen properly. It can be scaled with the number of inputs as proposed in literature 23,46.
In this case, the connection to ground becomes less resistive with each additional input, thus increasingly influencing the switching dynamics of the circuits. If the system is to enable functionality with a wide range in the number of inputs, additional complex periphery circuits must be added due to the scaling of R G. A second option is to optimize R G to enable proper functionality for the two- and three-input gates. Using the simulation model described in the Methods section, we investigated the functionality of multi-input gates for the two different choices of R G. Figure 7 depicts the simulation results of the slowest desired (red) and fastest erroneous (blue) switching times of gates with various numbers of inputs. For this study, gates of n inputs were simulated, where the two-input gate resembles an IMPLY gate and the three-input gate without scaling of R G is the proposed ORNOR gate. Thus, an n-input gate includes n memristive switches. Here, n-1 of these switches are connected to V Cond whereas always exactly one is connected to V Set. If R G is scaled, n-1 parallel R G s are assumed in the circuit. If R G is not scaled, no additional resistances are added to the circuit. The two worst cases for desired and erroneous switching are simulated for these circuits by applying V Set and V Cond as constant voltages. Then the switching time of the device connected to V Set was analyzed, since it shows the fastest desired and the fastest erroneous switching. The slowest desired switching appears in the case that all devices are in the HRS. The fastest erroneous switching appears if only one device connected to V Cond is in the LRS. The limit is set to the slowest desired switching process.
All erroneous switching processes must be slower than this limit; hence, all desired switching processes need to be completed before an erroneous switching process occurs. Therefore, in both cases the two-input gates cannot be used in the same circuit with the same voltages and clock period as six-input circuits. This analysis also enables a rough estimation of how many operations can be conducted without additional refreshes. Depending on the minimum time interval between the slowest desired and fastest erroneous switching process, more or fewer sequential steps can operate on the same data without intermediate refresh cycles. This study also represents the stability of this logic approach against variability of the resistance states, since the connection to ground as well as the resistances to V Cond are varied multiple times. As depicted in Fig. 7, the scheme is still functioning for reasonable variations. Figure 7 also depicts the strong influence of R G, as the desired switching process without scaling of R G becomes slower with a higher number of inputs, whereas this process gets faster with an increasing number of inputs if R G is scaled. The results presented here are highly dependent on the device characteristic and the circuit parameters.
Figure 7. Simulation of the two multi-input approaches with (left) and without (right) R G scaling. In both plots, two switching times of the device connected to V Set are depicted. In the red case, all devices are in the HRS, and the proper desired functionality is that the device should switch (desired switching). The blue points depict the switching time if one of the devices connected to V Cond is in the LRS. In this case, the device should not switch (erroneous switching). The limit is chosen as the slowest of the desired switching processes; all erroneous switching processes must be slower than this limit.
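The two choices of R G compared in this study differ only in the effective resistance between the wordline and ground. As a minimal sketch, assuming ideal identical resistors and the convention that a scaled n-input gate uses n-1 parallel copies of R G, the effective value can be written as below. The function name and the example value of 50 kΩ are hypothetical.

```python
def effective_ground_resistance(r_g: float, n_inputs: int, scaled: bool) -> float:
    """Effective wordline-to-ground resistance of an n-input gate.

    With scaling, n_inputs - 1 identical resistors r_g are assumed in parallel;
    without scaling, a single resistor r_g is used for every gate size.
    """
    if n_inputs < 2:
        raise ValueError("a gate needs at least two inputs")
    return r_g / (n_inputs - 1) if scaled else r_g


if __name__ == "__main__":
    r_g = 50e3  # ohms, hypothetical value
    for n in range(2, 7):
        scaled = effective_ground_resistance(r_g, n, scaled=True)
        fixed = effective_ground_resistance(r_g, n, scaled=False)
        print(f"n = {n}: scaled {scaled:.0f} ohm, unscaled {fixed:.0f} ohm")
```

For the two-input (IMPLY-like) gate both choices coincide, and they diverge as inputs are added.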
The erroneous switching event is a consequence of the nonlinear switching kinetics of the memristive device. As the device will switch under non-zero voltage input in a finite amount of time, erroneous switching events cannot be avoided completely. Instead, the circuit parameters have to be chosen accordingly. These design constraints can be only deduced when using proper physics-based memristive device models as in this study. In this regard, the circuit parameters must be chosen in concert to ensure that the desired switching process is faster than the erroneous switching process. A desired switching speed can be chosen first, which enables determination of the minimum switching voltage with contemplation of the kinetic characteristic. The set voltage must be higher than this voltage, as there is also a voltage drop over R G ; but this V Set must not be too high in order to prevent the device from switching faster in the erroneous switching cases. V Cond must also be chosen carefully, as a too high value will cause drift in the devices connected to V Cond in the desired switching case, while a too low value causes faster drift of the target device in the erroneous switching case. As can be seen in Fig. 7, a high R G value causes a larger voltage drop and slower switching in all cases; a small R G value speeds up all switching processes. The optimal circuit parameter can be found by maximizing the time window between the slowest desired and fastest erroneous switching processes.
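The selection rule described above, maximizing the time window between the slowest desired and the fastest erroneous switching process, can be sketched as a simple search over candidate parameter sets. The switching times below are invented placeholders for illustration; in practice they would come from simulations with the physics-based model. All names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical switching times in seconds: (desired cases, erroneous cases).
CANDIDATES: Dict[str, Tuple[List[float], List[float]]] = {
    "set_A": ([100e-9, 120e-9], [400e-9, 900e-9]),
    "set_B": ([60e-9, 80e-9], [90e-9, 300e-9]),
    "set_C": ([200e-9], [150e-9]),  # invalid: erroneous switching is faster
}


def timing_window(desired_s: List[float], erroneous_s: List[float]) -> float:
    """Time between the slowest desired and the fastest erroneous switching event.

    A negative value means an erroneous switching event occurs before all
    desired switching events have completed, so the parameter set is unusable.
    """
    limit = max(desired_s)
    return min(erroneous_s) - limit


def pick_best(candidates: Dict[str, Tuple[List[float], List[float]]]) -> str:
    """Return the name of the candidate with the largest timing window."""
    return max(candidates, key=lambda name: timing_window(*candidates[name]))


if __name__ == "__main__":
    for name, (desired, erroneous) in CANDIDATES.items():
        print(f"{name}: window = {timing_window(desired, erroneous) * 1e9:.0f} ns")
    print("best candidate:", pick_best(CANDIDATES))
```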
Moreover, the nonlinear switching dynamics also have to be considered for the cells that do not take part actively in the functional operation. In arrays, a protection voltage V Pro that is applied to the unselected devices is required 47 . Depending on the input cases and the states of the unselected devices, V Pro has a huge impact on V RG and so influences the speed of the operation and the unwanted state drift of the active device. Hence, V Pro is also a parameter that needs to be optimized to achieve the best performance.
The 64-Bit-Adder simulation shows that the desired state of N max is not reached (Fig. 6b), but the resulting resistance of the device is nearly unchanged. Hence, the results are still valid. Here, the voltage levels V Set and V Cond as well as V Pro are reduced compared to the optimal values, which are applied by drivers at the end of the bitline close to WL0. Since parasitic elements of the wordlines and bitlines are included in the simulations, a voltage drop over the lines occurs, resulting in a reduction of the applied voltages. Nevertheless, the logic scheme is still functioning, so the design is robust against moderate voltage deviations, although it may be reasonable to find a better compromise of the circuit parameters for such a setup. Besides changing the circuit parameters, the lines could be widened to reduce the line resistance and thus the voltage drop. Moreover, the resistance levels of the memristive devices could be increased, so that less current flows over the bitline and the line voltage drop is reduced. Due to the included parasitics, sneak paths and programming disturbances are also included in the simulation, but they show no negative effects on the circuit beyond the voltage drop.

Conclusions
Memristive switches enable a stateful beyond-CMOS computing architecture. A novel extension to current computations with memristive switches is the three-input memristive switch logic gate, named the ORNOR function. A system architecture and clocking scheme have been proposed utilizing the ORNOR function, which enables the memristive switches to perform logic with fewer steps. In particular, a full adder was designed as an element of a multi-bit full adder function; the carry-in-to-carry-out delay was therefore minimized to optimize the overall number of steps required to perform the function. The solution shown here reduces the number of steps by up to 60%, providing a significant improvement in system efficiency. By using a physics-based simulation model, several design constraints could be revealed. The major challenge is to choose the circuit parameters (voltages and cycle times) in a way that enables correct functionality. As memristive devices change their state under non-zero input in a finite time, devices that are not supposed to switch should see only small voltage drops, and only for a limited amount of time. One consequence is that multi-input gates with a large difference in the number of inputs cannot be used with the same clocking scheme. The nonlinearity of the switching kinetics is not identical for all types of memristive devices. Thus, the circuit design parameters will differ when another type of memristive device is used.

Methods
Simulation model. Since the physics of VCM devices is still not completely understood, finding an accurate model showing all aspects of memristive switches is an impossible task. There have been initial attempts to characterize the plethora of published ReRAM models and define needed features 39,40 , the most important one being the nonlinear switching kinetics. One model for VCM devices fulfilling these criteria is published by Fleck et al. 42 . Here, this model is adapted to model a Pt/TaO$_x$/Ta device. It is based on the movement of oxygen vacancies within a filamentary region and a concurrent resistance change. The corresponding equivalent circuit model is shown in the Supplementary Information (Fig. 1). In this model, the conductive oxygen-deficient filament is divided into two regions, the disc (light green) and the plug. The plug region is defined as the part of the filament located at the Ta electrode and has a constant high concentration of oxygen vacancies. The disc is located at the Pt electrode and has an oxygen vacancy concentration $N_{\mathrm{disc}}$ that varies between a minimum concentration $N_{\mathrm{disc,min}}$ and a maximum concentration $N_{\mathrm{disc,max}}$. As the resistance is altered by the change of $N_{\mathrm{disc}}$, this quantity is considered as the state variable. The change of $N_{\mathrm{disc}}$ is defined as follows:

$$\frac{\mathrm{d}N_{\mathrm{disc}}}{\mathrm{d}t} = -\frac{I_{\mathrm{ion}}}{z_{\mathrm{Vo}}\, e\, A\, l_{\mathrm{disc}}}$$

where $z_{\mathrm{Vo}}$ is the charge of the oxygen vacancies, $e$ is the elementary charge, $A$ is the cross-section of the conducting filament, $l_{\mathrm{disc}}$ is the length of the disc region, and $I_{\mathrm{ion}}$ is the ionic current of oxygen vacancies defined at the interface between plug and disc.
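The ionic conduction can be modeled by a hopping conduction described by the Mott-Gurney law 48 :

$$I_{\mathrm{ion}} = 2\, z_{\mathrm{Vo}}\, e\, c_{\mathrm{Vo}}\, a\, \nu_0\, A\, \exp\!\left(-\frac{\Delta W_{\mathrm{A}}}{k_{\mathrm{B}} T}\right) \sinh\!\left(\frac{a\, z_{\mathrm{Vo}}\, e\, E}{2 k_{\mathrm{B}} T}\right)$$

Here, $a$ is the hopping distance, $\nu_0$ is the attempt frequency, $\Delta W_{\mathrm{A}}$ is the barrier height for the ion hopping process, $k_{\mathrm{B}}$ is the Boltzmann constant, $T$ is the local temperature, $E$ is the electric field, which is considered to be the driving force for the hopping process, and $c_{\mathrm{Vo}}$ is the mean vacancy concentration of plug and disc. This means $c_{\mathrm{Vo}}$ is modeled by

$$c_{\mathrm{Vo}} = \frac{N_{\mathrm{plug}} + N_{\mathrm{disc}}}{2}$$

with $N_{\mathrm{plug}}$ being the oxygen vacancy concentration of the plug region. The electric field $E$ is given by

$$E = \frac{V_{\mathrm{Schottky}} + V_{\mathrm{disc}} + V_{\mathrm{plug}}}{l_{\mathrm{cell}}}$$

where $V_{\mathrm{Schottky}}$ is the voltage drop over the Schottky contact, $V_{\mathrm{disc}}$ is the voltage drop over the disc region, $V_{\mathrm{plug}}$ is the voltage drop over the plug, and $l_{\mathrm{cell}}$ is the oxide layer thickness.

For positive voltages, only thermionic emission is considered as the conduction mechanism of the Schottky contact, and it is modeled as 49 :

$$I_{\mathrm{Schottky}} = A\, A^{*}\, T^{2} \exp\!\left(-\frac{e\varphi_{\mathrm{Bn}}}{k_{\mathrm{B}} T}\right) \left[\exp\!\left(\frac{e V_{\mathrm{Schottky}}}{k_{\mathrm{B}} T}\right) - 1\right]$$

Here, $A^{*}$ is the Richardson constant and $e\varphi_{\mathrm{Bn}}$ is the effective Schottky barrier height, which is lowered by the image-force lowering effect. The effective Schottky barrier height can be described as follows 49 :

$$e\varphi_{\mathrm{Bn}} = e\varphi_{\mathrm{Bn0}} - e \left[\frac{e^{3}\, z_{\mathrm{Vo}}\, N_{\mathrm{disc}} \left(\varphi_{\mathrm{Bn0}} - \varphi_{\mathrm{n}} - V_{\mathrm{Schottky}}\right)}{8 \pi^{2} \varepsilon_{\varphi \mathrm{n}}^{3}}\right]^{1/4}$$

with $e\varphi_{\mathrm{Bn0}}$ being the Schottky barrier height under zero bias, $e\varphi_{\mathrm{n}}$ being the difference between the conduction band and the Fermi level, and $\varepsilon_{\varphi \mathrm{n}}$ being the effective permittivity in the area of influence of the image-force lowering effect.

For negative voltages, thermionic-field emission is considered as the transport mechanism across the Schottky barrier. Thus, the current is calculated by 49

$$I_{\mathrm{Schottky}} = -\frac{A\, A^{*}\, T}{k_{\mathrm{B}}} \sqrt{\pi E_{00}\, e \left(-V_{\mathrm{Schottky}} + \frac{\varphi_{\mathrm{Bn}}}{\cosh^{2}\!\left(E_{00}/k_{\mathrm{B}} T\right)}\right)}\; \exp\!\left(-\frac{e\varphi_{\mathrm{Bn}}}{E_{0}}\right) \left[\exp\!\left(-\frac{e V_{\mathrm{Schottky}}}{\varepsilon'}\right) - 1\right]$$

with the parameters $E_{00}$, $E_{0}$ and $\varepsilon'$:

$$E_{00} = \frac{e h}{4\pi} \sqrt{\frac{z_{\mathrm{Vo}} N_{\mathrm{disc}}}{m^{*} \varepsilon}}, \qquad E_{0} = E_{00} \coth\!\left(\frac{E_{00}}{k_{\mathrm{B}} T}\right), \qquad \varepsilon' = \frac{E_{00}}{\dfrac{E_{00}}{k_{\mathrm{B}} T} - \tanh\!\left(\dfrac{E_{00}}{k_{\mathrm{B}} T}\right)}$$

where $h$ is the Planck constant, $m^{*}$ is the effective electron mass and $\varepsilon$ is the permittivity of the oxide.

The contact resistance $R_{\mathrm{contact}}$ and the plug resistance $R_{\mathrm{plug}}$ are considered as constant resistances in the model. The contact resistance is supposed to result from the electrodes and the TaO$_x$/Ta interface, whereas the plug resistance depends on the geometry of the filament and the assumed oxygen vacancy concentration in the plug region $N_{\mathrm{plug}}$ and is set to:

$$R_{\mathrm{plug}} = \frac{l_{\mathrm{plug}}}{z_{\mathrm{Vo}}\, e\, N_{\mathrm{plug}}\, \mu_{\mathrm{n}}\, A}$$

where $\mu_{\mathrm{n}}$ is the mobility of the electrons and $l_{\mathrm{plug}}$ is the length of the plug region.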
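In contrast to the plug and contact resistances, the disc resistance changes with the state variable $N_{\mathrm{disc}}$ and is calculated as follows:

$$R_{\mathrm{disc}} = \frac{l_{\mathrm{disc}}}{z_{\mathrm{Vo}}\, e\, N_{\mathrm{disc}}\, \mu_{\mathrm{n}}\, A}$$

Filamentary VCM devices show strongly nonlinear switching kinetics (cf. Fig. 2b in the Supplementary Information). This feature can only be reproduced if temperature acceleration is considered 41,50 . Thus, it is important to model the internal temperature:

$$T = I_{\mathrm{disc}}\, V_{\mathrm{disc}}\, R_{\mathrm{th,eff}} + T_{0}$$

where $I_{\mathrm{disc}} V_{\mathrm{disc}}$ is the electrical power dissipated in the disc region, $R_{\mathrm{th,eff}}$ is the effective thermal resistance of the disc region and $T_{0}$ is the ambient temperature.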
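The state-variable part of the model can be sketched in a few lines of Python. The sketch assumes the textbook Mott-Gurney form with a sinh field dependence, Joule heating of the disc as the only heat source, and a disc resistance proportional to $l_{\mathrm{disc}}/N_{\mathrm{disc}}$; the Schottky contact is left out. All names (`DeviceParams`, `ion_current`, and so on) are hypothetical, and the numerical values are placeholders chosen for illustration, not the fitted values of Table 2.

```python
import math
from dataclasses import dataclass

E_CHARGE = 1.602176634e-19  # elementary charge (C)
K_B = 1.380649e-23          # Boltzmann constant (J/K)


@dataclass
class DeviceParams:
    # Placeholder values for illustration only, NOT the fitted values of Table 2.
    z_vo: float = 2.0              # charge number of an oxygen vacancy
    area: float = 6.4e-15          # filament cross-section A (m^2)
    l_disc: float = 0.4e-9         # disc length (m)
    mu_n: float = 4.0e-6           # electron mobility (m^2/(V s))
    hop_a: float = 0.25e-9         # hopping distance a (m)
    nu_0: float = 2.0e13           # attempt frequency (Hz)
    dw_a: float = 1.35 * E_CHARGE  # hopping barrier height (J)
    n_plug: float = 2.0e27         # plug vacancy concentration (m^-3)
    r_th_eff: float = 1.0e6        # effective thermal resistance (K/W)
    t_0: float = 293.0             # ambient temperature (K)


def disc_resistance(n_disc, p):
    """R_disc = l_disc / (z_Vo e N_disc mu_n A)."""
    return p.l_disc / (p.z_vo * E_CHARGE * n_disc * p.mu_n * p.area)


def disc_temperature(v_disc, n_disc, p):
    """T = P_disc * R_th,eff + T_0, with P_disc = V_disc^2 / R_disc."""
    power = v_disc ** 2 / disc_resistance(n_disc, p)
    return power * p.r_th_eff + p.t_0


def ion_current(n_disc, e_field, temp, p):
    """Mott-Gurney hopping current (assumed sinh form)."""
    c_vo = 0.5 * (p.n_plug + n_disc)  # mean concentration of plug and disc
    prefactor = 2.0 * p.z_vo * E_CHARGE * c_vo * p.hop_a * p.nu_0 * p.area
    boltzmann = math.exp(-p.dw_a / (K_B * temp))
    drive = math.sinh(p.hop_a * p.z_vo * E_CHARGE * e_field / (2.0 * K_B * temp))
    return prefactor * boltzmann * drive


def dn_disc_dt(n_disc, e_field, temp, p):
    """Rate of change of the state variable N_disc."""
    return -ion_current(n_disc, e_field, temp, p) / (
        p.z_vo * E_CHARGE * p.area * p.l_disc
    )


params = DeviceParams()
for v_disc in (0.1, 0.3):
    temp = disc_temperature(v_disc, 1.0e26, params)
    rate = dn_disc_dt(1.0e26, 1.0e8, temp, params)
    print(f"V_disc = {v_disc:.1f} V: T = {temp:.0f} K, dN_disc/dt = {rate:.3e} m^-3/s")
```

Because the Boltzmann factor depends exponentially on the local temperature, the computed rate grows much faster than linearly with the disc voltage, which is the temperature acceleration referred to above.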
Simulation parameters. For this work, the model is fitted to measured kinetic data of a TaO$_x$ device for applied voltages in the range 0.5 V to 1.3 V (cf. Fig. 2b) 51 . For small applied voltages, the switching speed of the simulated device differs from that of the real device by some orders of magnitude. The voltages applied in this paper, however, lie inside the fitted region. To estimate the values of the parasitic elements, Cu wires (bitlines/wordlines) with a feature size of 40 nm and a height of 40 nm were considered, which are embedded in SiO$_2$ as the insulating material. Thus, for a line segment of 80 nm with a spacing of 40 nm and a height of 40 nm, the coupling capacitance to the neighboring lines is calculated as $C_{\mathrm{P}} = 2.76 \times 10^{-18}$ F and the segment resistance is $R_{\mathrm{P}} = 0.86\ \Omega$. The transistors are modeled by a BSIM4 model with the parameters of 52,53 . The remaining simulation parameters are listed in Table 2.

Table 2. Simulation parameters at $T = T_0$.
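The two parasitic values quoted above can be cross-checked with textbook formulas. The text does not state how they were obtained, so the following is only a plausibility check under stated assumptions: a uniform wire resistance $R = \rho L/(W H)$ with a bulk Cu resistivity of $1.72 \times 10^{-8}\ \Omega\,$m, and a parallel-plate capacitance $C = \varepsilon_0 \varepsilon_r L H / s$ to a single neighboring line with $\varepsilon_r = 3.9$ for SiO$_2$. The function names are hypothetical.

```python
EPS_0 = 8.8541878128e-12  # vacuum permittivity (F/m)
RHO_CU = 1.72e-8          # assumed bulk Cu resistivity (ohm m)
EPS_R_SIO2 = 3.9          # assumed relative permittivity of SiO2


def segment_resistance(rho, length, width, height):
    """Resistance of a uniform rectangular wire segment."""
    return rho * length / (width * height)


def coupling_capacitance(eps_r, length, height, spacing):
    """Parallel-plate estimate between the facing sidewalls of two lines."""
    return EPS_0 * eps_r * length * height / spacing


# Geometry from the text: 80 nm segment, 40 nm width, height and spacing.
r_p = segment_resistance(RHO_CU, 80e-9, 40e-9, 40e-9)
c_p = coupling_capacitance(EPS_R_SIO2, 80e-9, 40e-9, 40e-9)

print(f"R_P = {r_p:.2f} ohm")
print(f"C_P = {c_p:.3g} F")
```

Under these assumptions the estimates agree with the quoted $R_{\mathrm{P}}$ and $C_{\mathrm{P}}$ to the stated precision, which suggests, but does not confirm, that simple geometric formulas of this kind were used.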