Multistate Memristive Tantalum Oxide Devices for Ternary Arithmetic

Redox-based resistive switching random access memory (ReRAM) offers excellent properties for implementing future non-volatile memory arrays. Recently, the capability of two-state ReRAMs to implement Boolean logic functionality has gained wide interest. Here, we report on seven-state tantalum oxide devices, which enable the realization of intrinsic modular arithmetic using a ternary number system. Modular arithmetic, a fundamental system for operating on numbers within the limit of a modulus, has been known to mathematicians since the days of Euclid and finds applications in diverse areas ranging from e-commerce to musical notation. We demonstrate that multistate devices not only reduce the storage area consumption drastically, but also enable novel in-memory operations, such as computing with high-radix number systems, which could not be implemented using two-state devices. The use of a high-radix number system reduces the computational complexity by reducing the number of digits needed. Thus, both the number of calculation operations in an addition and the number of logic devices can be reduced.

The properties of integers for a specific modulus, spanning addition, subtraction and multiplication, can be written as follows.
Given a_1 ≡ b_1 (mod n) and a_2 ≡ b_2 (mod n), we have

\[
\begin{aligned}
a_1 + a_2 &\equiv b_1 + b_2 \pmod{n},\\
a_1 - a_2 &\equiv b_1 - b_2 \pmod{n},\\
a_1 a_2 &\equiv b_1 b_2 \pmod{n}.
\end{aligned}
\]

Scientific RepoRts | 6:36652 | DOI: 10.1038/srep36652

Apart from applications in mathematics, modular arithmetic plays a fundamental role in modern computer arithmetic. Here, a ring of integers modulo 2 is termed a Boolean ring, and every Boolean ring gives rise to a Boolean algebra, where the ring multiplication is the conjunction operator (∧) and the ring addition is the exclusive disjunction operator (⊕). Furthermore, secure and fault-tolerant data communication relies on the principles of public-key cryptography and error-correcting codes, respectively. Both of these fields require efficient implementations of modular arithmetic.
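The congruence properties and the Boolean-ring correspondence stated above can be checked numerically. The following is a minimal Python sketch and not part of the original work; the function names and the tested integer range are illustrative assumptions.

```python
from itertools import product

def congruent(a, b, n):
    """Return True if a ≡ b (mod n)."""
    return (a - b) % n == 0

def check_congruence_properties(n, span=range(-6, 7)):
    """Brute-force check that +, - and * preserve congruence modulo n."""
    for a1, b1, a2, b2 in product(span, repeat=4):
        if congruent(a1, b1, n) and congruent(a2, b2, n):
            assert congruent(a1 + a2, b1 + b2, n)
            assert congruent(a1 - a2, b1 - b2, n)
            assert congruent(a1 * a2, b1 * b2, n)
    return True

def boolean_ring_matches_logic():
    """In the integers modulo 2, multiplication is AND and addition is XOR."""
    return all(
        (x * y) % 2 == (x & y) and (x + y) % 2 == (x ^ y)
        for x, y in product((0, 1), repeat=2)
    )

print(check_congruence_properties(3))  # modulus 3, as used for Trits
print(boolean_ring_matches_logic())
```

The exhaustive check over a small range is only a sanity test, not a proof; it is meant to make the three properties concrete for the modulus 3 used later.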
Modular arithmetic is also useful for reducing the complexity of standard arithmetic circuits [15,16] and is essential for building residue numeral systems (RNS). The RNS representation allows overflow-free addition, subtraction and multiplication, thereby enabling a high degree of parallelism.
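To illustrate why an RNS lends itself to parallel arithmetic, the sketch below adds and multiplies two numbers residue by residue, with no information passing between the residue channels. It is a minimal Python example; the moduli (3, 5, 7) and all function names are hypothetical choices for illustration and are not taken from the cited references. Results are only valid while they stay within the representable range of 0 to 104.

```python
from math import prod

MODULI = (3, 5, 7)      # hypothetical pairwise-coprime moduli
RANGE = prod(MODULI)    # 105 distinct values can be represented

def to_rns(x):
    """Represent x by its residues with respect to each modulus."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Add channel by channel; each channel is independent of the others."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def rns_mul(a, b):
    """Multiply channel by channel, again without cross-channel carries."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(residues):
    """Recover the integer via the Chinese remainder theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = RANGE // m
        inverse = pow(partial, -1, m)  # modular inverse (Python 3.8+)
        total += r * partial * inverse
    return total % RANGE

print(from_rns(rns_add(to_rns(7), to_rns(8))))
print(from_rns(rns_mul(to_rns(7), to_rns(8))))
```

Each residue channel needs only arithmetic modulo a small number, which is the kind of operation a multi-state device could in principle hold in a single cell.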
State-of-the-art modular arithmetic circuits in CMOS technology are implemented using two-state Boolean arithmetic operations, which follow directly from the two-level switching algebra introduced by Shannon [17]. Memristive devices have been suggested to replace register files in conventional signed-digit adders [18] or to be used in conjunction with complex quantization circuits [19]. The current paper reports the first implementation of modular arithmetic using multi-state ReRAM devices, which is fully crossbar-array compatible in conjunction with a selector device. Whereas most previous memristive circuit studies are based on oversimplified memristor models [20], we use real memristive devices fabricated in word structures to verify the proposed functionality. It should also be noted that we perceive no theoretical limit in scaling the number of states for memristive devices, which opens a new research direction on multi-state storage and computing devices.

Results
Device properties. In this work, 5 μm × 5 μm Pt/W/TaOx/Pt cross-point devices arranged in word structures (Fig. 1a) are used to realize the three-Trit (trinary digit) modular addition. The device stack of 25 nm Pt/13 nm W/7 nm TaOx/30 nm Pt is depicted in Fig. 1b. The typical I-V characteristic of this device is shown in Fig. 1c. In Supplementary S1, the I-V characteristic of an 80 nm × 80 nm cross-point device is also shown, highlighting the scaling potential of these devices. During the RESET process, the maximum applied voltage |V_stop| defines the final resistive state (1.8 V in Fig. 1c). This feature is also present in pulse mode and can therefore be used in memory and logic operations to control the multi-level states. Figure 2a shows the cumulative probability of the low resistance state (LRS) and six multi-level resistive states, which are obtained for 200 ns pulses in the range of V_stop = −1.50 V to −2.25 V. Each state is based on 5 devices with 10 cycles. The inset gives the statistical information of the distributions. The tight distributions highlight the excellent switching properties of this device. In Fig. 2b, the mean value for each of the resistive states R0 to R5 is given.
For the proposed arithmetic operation, the input operands are applied to the top (TE) and bottom (BE) electrode, respectively. To enable equidistant voltage stepping, we use a predefined OFFSET voltage (V_OFFSET) for each pulse. The operand voltages are V_op = 0.00 V to 0.75 V with an increment of 0.15 V. The actual pulse applied at the bottom electrode is therefore V_BE = V_OFFSET + V_op1, and the pulse at the top electrode is V_TE = −(V_OFFSET + V_op2). Thus, the overall potential difference is V_stop = V_TE − V_BE = −(2 V_OFFSET + V_op1 + V_op2). Since the overall device voltage is always negative, a logic operation corresponds to a RESET pulse whose amplitude depends on the actual operands. To show the multi-level pulse operation mode, we set V_BE = 0.75 V and vary V_TE from −0.75 V to −1.50 V (Fig. 3a). The resulting resistances are depicted in Fig. 3b. Depending on the overall device voltage (V_stop = −1.50 V to −2.25 V), six different resistance states (R0, R1, R2, R3, R4 and R5) are easily accessible (Fig. 3b). Note that three resistive states would be sufficient to represent a digit of a ternary numeral system (Trit). This multi-level device property is used for the modular arithmetic operation. To enable a highly reproducible RESET operation, we always apply a DC SET operation before each pulsed RESET operation. Note that nanosecond pulsed SET operations are also feasible, but are not applied in this work. Details on the pulsed SET operation can be found in Supplementary S2.

Developed Modular Arithmetic Working Principle. The newly developed algorithm calculates the carries and sums directly in the ReRAM devices, which store the results until they are read out. Initially, all the devices in a wordline are initialized, i.e. written to the LRS. Starting from this state, the sum digit of significance 0 (s_0) can be calculated directly in the device of significance 0, while the other devices calculate the first output carry c_1. For each higher significance, the devices that calculate the sum and the carry are shifted one device to the left.
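The operand-to-voltage encoding described above can be sketched in a few lines of Python. This is an idealized illustration under stated assumptions: the device is treated as a lookup from V_stop to a state index (R0 at −1.50 V, one state per 0.15 V step), device variability is ignored, and all function and constant names are hypothetical.

```python
V_OFFSET = 0.75  # V, offset used when there is no input carry
V_STEP = 0.15    # V, operand voltage increment per digit value
V_R0 = -1.50     # V, stop voltage that yields state R0

def operand_voltage(digit):
    """Operand voltage V_op for a digit value (0 -> 0.00 V, 1 -> 0.15 V, ...)."""
    return digit * V_STEP

def electrode_pulses(op1, op2, v_offset=V_OFFSET):
    """Pulses on the bottom and top electrode for the two operands."""
    v_be = v_offset + operand_voltage(op1)
    v_te = -(v_offset + operand_voltage(op2))
    return v_te, v_be

def stop_voltage(v_te, v_be):
    """Overall device voltage V_stop = V_TE - V_BE (always negative here)."""
    return v_te - v_be

def state_index(v_stop):
    """Idealized mapping from V_stop to the index k of state Rk."""
    return round((V_R0 - v_stop) / V_STEP)

for op1 in range(3):
    for op2 in range(3):
        v_te, v_be = electrode_pulses(op1, op2)
        v_stop = stop_voltage(v_te, v_be)
        print(op1, op2, f"{v_stop:.2f} V", f"R{state_index(v_stop)}")
```

Because V_stop depends only on the sum of the two operand voltages, swapping which operand drives which electrode leaves the resulting state unchanged.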
In general, for the carry algorithm (Fig. 4a), the state of the actual device is read first to check whether the input carry c_in is 0 or 1. In case of 1, V_OFFSET is set to 0.825 V, whereas in case of 0, the OFFSET remains V_OFFSET = 0.75 V. Next, the logic operation is conducted after a SET operation using the evaluated OFFSET. We apply V_TE = −(V_OFFSET + V_op1) to the top electrode and V_BE = V_OFFSET + V_op2 to the bottom electrode. Finally, the resistive state of the device is read and evaluated. To enable a proper modulus operation, the ReRAM device has to provide 2n states for an n-ary number system. The background is that in an n-ary number system the operands at each specific significance are in the range of 0 … n−1, i.e., the sum of two operands is at most 2n−2. Since an input carry of 1 may also occur, the largest possible result is 2n−1, and the total number of states required per device is 2n. Thus, for a ternary number system six states (R0 … R5) are required. If the state is R ≤ R2, the output carry c_out is 0 and R0 is written back. For R > R2, the device is written to R1, i.e. c_out = 1. Note that prior to the write-back operation, the SET operation is conducted to enable a highly controlled R1 state. In Supplementary S4, the required peripheral circuitry is depicted. Note that a certain minimum crossbar array size is required to justify the peripheral circuitry overhead. In this respect, a suitable selector device is a key component enabling large-scale ultra-dense multi-level ReRAM arrays.

Based on the input carry, the final sum can be calculated (Fig. 4b). As for the carry calculation, the required level of V_OFFSET (either 0.75 V or 0.825 V) is evaluated first. Next, the operand voltages V_TE = −(V_OFFSET + V_op1) and V_BE = V_OFFSET + V_op2 are applied after the SET operation. Based on a final readout, the mapping R3 → R0, R4 → R1 and R5 → R2 has to be conducted to complete the modulo sum operation. The corresponding write-back operation is performed after the SET operation.
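The carry and sum algorithms described above can be summarized at the level of state indices. The Python sketch below is a behavioural model only: it assumes that a RESET pulse from the LRS lands exactly in state R(op1 + op2 + c_in), i.e. that an input carry shifts the result by exactly one state, and it ignores read-out, SET steps and device variability. The function names are hypothetical.

```python
RADIX = 3  # ternary; the device must offer 2 * RADIX states (R0 .. R5)

def reset_state(op1, op2, carry_in):
    """Idealized state index after the RESET pulse, starting from the LRS."""
    if not (0 <= op1 < RADIX and 0 <= op2 < RADIX and carry_in in (0, 1)):
        raise ValueError("operands must be Trits and the carry must be 0 or 1")
    return op1 + op2 + carry_in  # between 0 and 2 * RADIX - 1

def carry_device(op1, op2, carry_in):
    """Carry algorithm: R <= R2 gives c_out = 0 (R0 written back),
    otherwise c_out = 1 (R1 written back)."""
    state = reset_state(op1, op2, carry_in)
    return 0 if state <= RADIX - 1 else 1

def sum_device(op1, op2, carry_in):
    """Sum algorithm: map R3 -> R0, R4 -> R1, R5 -> R2; lower states are kept."""
    state = reset_state(op1, op2, carry_in)
    return state - RADIX if state >= RADIX else state

print(carry_device(1, 2, 0), sum_device(1, 2, 0))
```

In this model the pair (carry, sum) always satisfies RADIX * carry + sum = op1 + op2 + c_in, which is why 2 * RADIX distinguishable states are enough for one digit position.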
Since the signals applied to the TE are the same in both algorithms, they can be conducted in parallel on devices of different significance. Thus, the cycle count can be kept low.

Proof-of-concept. For the proof-of-concept measurement, a two-Trit modular addition is selected, adding the ternary numbers p = p_1 p_0 and q = q_1 q_0. Since the sum output z = z_2 z_1 z_0 needs three Trit digits, three ReRAM devices are required for this operation and are first initialized to the LRS. The addition is performed in a word-line structure (cf. Fig. 1a). For the exemplary addition, operand 1 is p = 21 (= 7 in decimal) and operand 2 is q = 22 (= 8 in decimal). Note that input 0 corresponds to V_op = 0.00 V, input 1 corresponds to V_op = 0.15 V and input 2 corresponds to V_op = 0.30 V, using the incremental stepping of 0.15 V described earlier. In Fig. 5a-c, the sequentially obtained resistive states are shown. The arrows mark the order of the steps without showing the intermediate SET steps. The algorithm described in Fig. 4 realizes the modulo sum operation digit by digit as follows. In device z_0, the sum operation is conducted directly:

\[
z_0 = \mathrm{rem}(p_0 + q_0,\, 3) = \mathrm{rem}(1 + 2,\, 3) = 0.
\]

Note that the function 'rem' returns the remainder. Starting from the LRS, the device is reset (p_0 = 1 ⇒ V_TE = −0.9 V and q_0 = 2 ⇒ V_BE = 1.05 V, i.e. V_stop = −1.95 V). According to Fig. 2b, this voltage leads to state R3, as can also be seen directly in Fig. 5c. According to the sum algorithm (Fig. 4b), R3 is finally mapped to R0, see Fig. 5c. In device z_1, the carry operation is conducted first:

\[
c_1 = \mathrm{div}(p_0 + q_0,\, 3) = \mathrm{div}(1 + 2,\, 3) = 1.
\]

The function 'div' returns the floor quotient. Starting from the LRS, the device is toggled to the R3 state by applying p_0 and q_0 to calculate the carry c_1. According to the carry algorithm (Fig. 4a), R3 is then mapped to R1, see Fig. 5b.
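The exemplary addition can be reproduced arithmetically with a short digit-wise routine. The sketch below is a plain Python model of the modulo sum and carry operations, using `%` and `//` in place of 'rem' and 'div'; it does not model the devices themselves, and the helper names are hypothetical. Digits are stored least significant first.

```python
def to_trits(value, width):
    """Ternary digits of a non-negative integer, least significant first."""
    trits = []
    for _ in range(width):
        trits.append(value % 3)
        value //= 3
    return trits

def from_trits(trits):
    """Inverse of to_trits: rebuild the integer from its ternary digits."""
    return sum(t * 3**i for i, t in enumerate(trits))

def add_ternary(p_trits, q_trits):
    """Ripple-carry addition: z_i = rem(p_i + q_i + c_i, 3), c_(i+1) = div(...)."""
    carry = 0
    z = []
    for p_i, q_i in zip(p_trits, q_trits):
        total = p_i + q_i + carry
        z.append(total % 3)  # sum digit stored at this significance
        carry = total // 3   # carry handed to the next significance
    z.append(carry)          # the final carry becomes the top digit
    return z

p = to_trits(7, 2)  # p = 21 in ternary -> [1, 2]
q = to_trits(8, 2)  # q = 22 in ternary -> [2, 2]
z = add_ternary(p, q)
print(z, from_trits(z))  # digits z_0, z_1, z_2 and the decimal value
```

For p = 21 and q = 22 the routine returns z_2 z_1 z_0 = 120, i.e. 15 in decimal, with z_0 = 0 and c_1 = 1 matching the device-level steps described above.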
In Fig. 6, the schematics of the applied operation voltages and the corresponding states are depicted. The first line shows the voltages at the common bottom electrode (BE) acting as a wordline (WL). The second, fourth and sixth lines show the voltages applied to the three separate top electrodes (V_TE2, V_TE1, V_TE0) acting as bitlines (BL), while the third, fifth and seventh lines represent the resistance states (R_TE2, R_TE1, R_TE0) at each BL. Three background colors are used: gray shows the LRS after SET, yellow depicts the logic implementations, and blue shows the corresponding states after the logic implementation. Overall, twelve steps are presented. Steps 1-2 show the initialized LRS and the logic implementation. Steps 3-4 depict the corresponding resistance states with the

Figure 6. Schematics of operation. Schematics of operation for the applied voltages (V_TE, V_BE) are shown. Gray shows the LRS after SET, yellow depicts the logic implementations, and blue shows the corresponding states after the logic implementation. Step 1 is the LRS after initialization and Step 2 is the logic implementation. Step 3 shows the resistance states resulting from Step 2. Step 4 is the LRS after SET and Step 5 implements the RESET for the modulo operation. Step 6 shows the corresponding resistance states and Step 7 is the LRS after SET. Step 8 is the logic implementation with adjusted OFFSET and Step 9 shows the corresponding resistance states. Step 10 is the LRS after SET. Step 11 is the logic implementation and Step 12 shows the corresponding resistance states.

Discussion
We have demonstrated a ternary number system implementation using multistate tantalum oxide devices in word structures. Depending on the available number of resistive states, higher-order number systems can also be implemented in the same way. For n-ary systems, 2n resistive states are needed; hence, further progress in ReRAM memory technology will directly enable arithmetic operations using higher-radix number systems.
On the other hand, the choice of radix for a number representation can be motivated from the perspective of the underlying implementation as well as from an analysis of radix economy. A quantifiable measure of radix economy proposed in ref. 21 is as follows:

\[
E(b, N) = b \left\lfloor \log_b N + 1 \right\rfloor,
\]

where b is the radix and N is the number to be represented. This metric yields e as the most economical real-valued radix. It also turns out that the radix value of 3 (ternary) is more economical than binary. We argue that the above measure does not take the growth of the implementation media into account. For several device technologies, the area requirement grows linearly with the radix size, which makes a high-radix implementation very difficult. However, this is not true for multistate memristive devices such as the Pt/W/TaOx/Pt ReRAM, since the implementable radix size depends on the number of resistance states. Considering that a k-state device can be realized at the same cost as a two-state device, a more appropriate metric would be

\[
E'(b, N) = \left\lfloor \log_b N + 1 \right\rfloor.
\]

Compared to binary arithmetic, an n-ary number representation reduces the space complexity by a logarithmic ratio. Given comparable performance of the base devices, the gain in arithmetic circuits, such as integer addition, is also expected to scale logarithmically. However, the actual gain will be somewhat smaller due to the need for better sense amplifiers and more control circuitry.
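The two cost measures discussed above can be compared numerically. The Python sketch below assumes the usual textbook definition of radix economy, b times the number of base-b digits of N, and takes the plain digit count as the cost when a multistate device is assumed to cost the same as a two-state device; both choices, and all function names, are assumptions made for illustration. The digit count is computed with integer arithmetic to avoid floating-point rounding in the logarithm.

```python
import math

def digit_count(n, base):
    """Number of base-`base` digits of n >= 1, i.e. floor(log_b(n) + 1)."""
    count = 1
    while n >= base:
        n //= base
        count += 1
    return count

def radix_economy(n, base):
    """Cost when each digit position costs `base` units."""
    return base * digit_count(n, base)

def asymptotic_cost(base):
    """b / ln(b): radix economy per unit of ln(N) for large N."""
    return base / math.log(base)

N = 10**6  # arbitrary example value
for base in (2, 3, 4, 6):
    print(base, digit_count(N, base), radix_economy(N, base))
print(asymptotic_cost(2), asymptotic_cost(3), asymptotic_cost(math.e))
```

Under the first measure the advantage of ternary over binary is small and shrinks again for larger radices, whereas under the digit-count measure every increase of the radix reduces the cost by the factor ln(2)/ln(b) relative to binary.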
The presented approach is not limited to a specific multistate ReRAM device, but would work for any memristive device offering multiple resistance levels induced by different stop voltages V_stop. The proposed algorithm could be simplified further by avoiding the intermediate SET operations; however, this requires ultra-low-variance ReRAM devices. For the considered ReRAM device, only RESET pulses were allowed as logic inputs. Appropriate SET pulses enabling a step-by-step decrease of the resistance could also be used to implement subtraction within the same device, similar to the addition operation shown here.
The presented approach is compatible with the passive crossbar array configuration when a selector device is integrated with each TaOx junction. The implementation of the arithmetic functionality within the resistive memory device using the available multi-resistance levels is a highly attractive option for future functionality-enhanced hybrid CMOS/ReRAM chips. This approach enables a reduction of the cycle count compared to Boolean-logic-based ReRAM approaches [22][23][24][25]. For example, a recently proposed cipher application could be decisively improved using multi-level ReRAMs [26], enabling efficient in-hardware encryption and decryption for future smart devices. The energy per operation depends on the device properties, namely switching voltage, multi-level resistive states and inherent switching speed, and on control circuit properties such as the applied pulse width. Since the pulse width (t) that will be used in real applications is much shorter than the 200 ns used in this study, the energy consumption per operation is expected to be reduced further. In summary, low-variance multi-level ReRAM could play a key role in the implementation of public-key cryptography and error-correcting codes in smart devices.

Conclusion
Pt/W/TaOx/Pt devices enable highly reliable multi-level states, which can be accessed reproducibly by pulses of specific height, starting from a defined LRS. By using word and bit lines as inputs for pulses, the resistive multi-levels can be used to compute and store in-memory logic operations. To avoid an overflow in individual devices, modular arithmetic is implemented, ensuring that the device is always in a valid 0, 1 or 2 (Trit) state. By using a ternary number system, the number of devices and cycles can be reduced significantly. In contrast to two-state devices, multistate devices provide better radix economy with the option for further scaling. Therefore, establishing multi-state ReRAM for non-volatile memory opens the door to novel storage and in-memory computer arithmetic options.

Methods
Device Fabrication. Devices are fabricated as a 1 × 3 array with a crossbar structure based on a single cell size of 5 × 5 μm². Three top electrodes share a single bottom electrode. For the bottom electrode (BE), 5 nm titanium (Ti) and 30 nm platinum (Pt) layers are deposited by sputtering on top of a thermally grown SiO2 layer (430 nm). Photolithography and dry etch processes are applied to pattern the bottom electrode. Then, 7.0 nm-thick TaOx is deposited by reactive sputtering under a process gas mixture of Ar (23%) and oxygen (7%) with an RF power of 116 W at a chamber pressure of 2.3 × 10^−2 mbar. Without breaking the vacuum, a 13 nm tungsten (W) ohmic electrode and 25 nm platinum (Pt) are deposited consecutively. The top electrode (TE) is then patterned by photolithography and reactive ion etching. A scanning electron microscopy image of the 1 × 3 crossbar array with 5 × 5 μm² cells is shown in Fig. 1a, and the corresponding cross-sectional structure obtained by transmission electron microscopy is shown in Fig. 1b.

Measurement Set-up. Initially, the resistive switching devices are in the high resistance state (HRS), and an irreversible forming process has to be applied in order to activate the devices before repetitive switching cycles are possible. Detailed information is given in the supplementary section. During the forming process, the resistance state of the device changes from HRS to LRS. In order to change the resistance state from LRS to HRS, a 'RESET' process is required. Conversely, a 'SET' process converts the HRS to the LRS. The DC operation of a single device, also known as a current-voltage (I-V) sweep, is shown in Fig. 1. The voltage sweep is applied only to the top electrode (TE) while the bottom electrode is grounded. In order to achieve better control of the resistance state of the device with single pulse operation, the 'SET' process is based on DC operation. However, the other operations such as 'RESET' and 'read' use AC pulse operation. For the RESET operation, a pulse width of 200 ns, defined as the full width at half maximum (FWHM), is applied with rise and fall times of 40 ns. The read operation at 0.1 V uses a 120 μs long pulse in order to verify each resistance value more accurately, especially the HRS values. By applying a 0.15 V stepping, seven resistive levels can be distinguished in the considered devices. Depending on the actual devices, even more multi-level states are feasible in ReRAM devices. However, due to variability, e.g., induced by random telegraph noise [27], the maximum number of properly accessible multi-levels is limited.