Single-Readout High-Density Memristor Crossbar

High-density memristor-crossbar architecture is a very promising technology for future computing systems. The simplicity of the gateless-crossbar structure is both its principal advantage and the source of undesired sneak-paths of current. This parasitic current can consume an enormous amount of energy and corrupt the readout process. We introduce new adaptive-threshold readout techniques that utilize the locality and hierarchy properties of the computer-memory system to address the sneak-paths problem. The proposed methods require a single memory access per pixel for an array readout. In addition, with the proposed techniques the memristive crossbar consumes an order of magnitude less power than with state-of-the-art readout techniques.

Scientific Reports | 6:18863 | DOI: 10.1038/srep18863

to a random sneak-paths resistance. This translates into distributions that represent the "One" and "Zero" values rather than a single value for each. In addition, the magnitude of the sneak-current is typically higher than the current of the desired memory cell 3; hence the distributions for the two binary values overlap heavily, as shown in Fig. 1c. Direct memory readout is therefore not possible, and a power-efficient, sneak-paths-immune readout is a necessity for a functional system.
One of the generally utilized properties of the sneak-paths current is its spatial correlation. Knowing the sneak-path noise value at one location of the crossbar helps to estimate the values at other correlated locations. Exploiting such properties enables us to propose faster and more power-efficient readout techniques for resistive crossbar memories. In general, a crossbar can be accessed using two modes: "floating terminals" and "connected terminals", as shown in Fig. 2. In the first approach, the unselected array terminals are kept floating. On the other hand, in the "connected terminals" approach the unselected rows and columns are connected to two common nodes. The two extra nodes can be used as access terminals to the array 3, or to enforce a bias voltage 29. This allows for better control of the sneak-paths behavior and yields a more usable equivalent circuit. In such a case, the sneak-paths are represented by three lumped resistances ('R_r', 'R_a' and 'R_c'), as shown in Fig. 2d. Understanding the correlation of these elements over the crossbar facilitates better handling of the sneak-paths noise. For instance, 'R_r' is a parallel combination of all the cells in the desired row apart from the desired one; it is given by,

$$R_r = R_{r,1} \parallel R_{r,2} \parallel \cdots \parallel R_{r,L-1},$$

where 'R_{r,k}' is the resistance of a one-row cell, and 'L' is the array length. The row cell resistance can be either 'R′_on' or 'R′_off', which are the ON and OFF resistances of the device under a 'V_n1 − V_n4' voltage drop, respectively. The row resistance can be rewritten as,

$$R_r = \frac{R'_{on}}{N_{on}} \parallel \frac{R'_{off}}{L - N_{on} - 1},$$

where 'N_on' is the number of ON cells within the accessed row, not counting the accessed cell itself. The remaining two sneak-path components ('R_c' and 'R_a') have similar expressions. However, when the unused array terminals are biased, the sneak-paths component 'R_a' is shorted out.
It should be noted that although the metal line resistances are not included in the equivalent circuit for the sake of simplicity, they have been fully considered in the simulations carried out in this work. For practical array sizes, the values of 'R_r' and 'R_c' are almost constant over the same row or column, respectively. For instance, the sneak-paths row resistances found at two different locations in the same row have all cells in common except the two cells that are swapped because of the accessed locations. For devices with a large OFF/ON ratio, the relative change in the sneak-paths row resistance is given by, where 'ρ' is the OFF/ON ratio of the used device. The maximum relative change in the row resistance versus the array size for a balanced number of zeros and ones is plotted in Fig. 3a. The figure shows that as the array size increases, the effect of a single bit swap diminishes. The other parameter that affects ΔR/R is the number of ones (per row or column), as given by (2). Figure 3b shows that the maximum relative change of the sneak-paths resistance remains small as the percentage of ones per row/column is swept. Hence, 'R_r' is almost constant over a given row and 'R_c' is almost constant over a given column. It should be noted that, given the randomness of the data, 'R_r' and 'R_c' are considered two independent random variables.
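The near-constancy of the row sneak resistance can be checked numerically. The sketch below is only an illustration under stated assumptions: cells are treated as ideal linear resistors, metal line resistance is ignored, and the device values `R_ON` and `R_OFF` are hypothetical rather than taken from the fabricated devices. It evaluates the parallel combination of the non-accessed row cells and the relative change caused by swapping one ON neighbour for an OFF one.

```python
def row_sneak_resistance(n_on, length, r_on, r_off):
    """R_r for a row of `length` cells, excluding the accessed cell.

    n_on counts the ON cells among the remaining length - 1 cells.
    """
    n_off = length - 1 - n_on
    # Parallel resistors add as conductances.
    conductance = n_on / r_on + n_off / r_off
    return 1.0 / conductance


def relative_change_on_swap(n_on, length, r_on, r_off):
    """Relative change in R_r when one ON neighbour is swapped for an OFF one."""
    before = row_sneak_resistance(n_on, length, r_on, r_off)
    after = row_sneak_resistance(n_on - 1, length, r_on, r_off)
    return abs(after - before) / before


if __name__ == "__main__":
    R_ON, R_OFF = 1e4, 1e7  # hypothetical device values (OFF/ON ratio of 1000)
    for length in (64, 256, 512):
        n_on = (length - 1) // 2  # roughly balanced zeros and ones
        change = relative_change_on_swap(n_on, length, R_ON, R_OFF)
        print(f"L={length:4d}  relative change = {change:.4%}")
```

Under these assumptions the relative change shrinks as the row gets longer, which matches the trend described for Fig. 3a.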

Adaptive-Threshold Readout Techniques
The sneak-paths correlation property can be effectively utilized when the data stored on a memory array is read sequentially. Fortunately, this is the typical memory access scheme in computer systems. Because of the memory-locality property, data is transferred and shared between different memory layers as a block of contiguous bits, rather than as random bits or words 30. This locality property helps only if the knowledge gained from reading a single bit can be reused in reading its neighbors. This is true for the "connected terminals" crossbar, where the values of 'R_r' and 'R_c' can be safely shared over the same row or column, respectively, as discussed in the previous sections. This is equivalent to defining an adaptive threshold that changes at each new row readout, which can be achieved with the aid of the "connected terminals" crossbar. The generic "connected terminals" circuit model shown in Fig. 2d can be simplified for the case where the unused terminals are biased to 'V_B'. Terminals 'n_3' and 'n_4' are connected to 'V_B', and terminals 'n_1' and 'n_2' are connected to 'V_DD' and virtual ground. This can be done with two different implementations, as shown in Fig. 4. Using a virtual-ground sensing circuit forces all of the array elements to have a defined voltage drop, independent of the data stored in the array. The desired cell experiences a full 'V_DD' voltage drop, while the sneak-paths components 'R_r' and 'R_c' have a voltage drop of 'V_DD − V_B'. Because of the device saturation nonlinearity, the full voltage drop on the desired cell makes the magnitude difference between its ON and OFF states much larger than any error introduced by sharing 'R_r' or 'R_c' over a segment. While both 'R_r' and 'R_c' drain parasitic sneak-current, the current leaking through only one of them affects the correctness of the readout operation. When the read circuit is connected to node 'n_1', as shown in Fig. 4b, the sense current ('I_sense') is defined as,

$$I_{sense} = I_m + I_r,$$

where 'I_m' is the desired current and 'I_r' is the row sneak-current component. Sensing from node 'n_2' swaps the locations and the roles of 'R_r' and 'R_c' in the circuit, as shown in Fig. 5a. The sense current is shifted from its desired value by the sneak-current of the row or the column. However, this shift is constant within a given row or column, depending on the connection orientation.
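Because the shift is constant within a row, the decision level can simply move with it. The following sketch is a hedged illustration of that idea, not the authors' circuit: it assumes the sensed current is the sum of the cell current and the row sneak current, places the threshold at the midpoint between the ON and OFF cell currents (an assumption), and uses hypothetical current values.

```python
def adaptive_threshold(i_row_sneak, i_on, i_off):
    """Threshold for one row: the row sneak current plus the ON/OFF midpoint."""
    return i_row_sneak + 0.5 * (i_on + i_off)


def read_bit(i_sense, i_row_sneak, i_on, i_off):
    """Return 1 if the sensed current lies above the row's adaptive threshold."""
    return 1 if i_sense > adaptive_threshold(i_row_sneak, i_on, i_off) else 0


if __name__ == "__main__":
    I_ON, I_OFF = 10e-6, 0.1e-6        # hypothetical cell currents (A)
    row_sneak = {0: 40e-6, 1: 75e-6}   # hypothetical I_r per row (A)
    stored = {0: [1, 0, 1], 1: [0, 0, 1]}
    for row, bits in stored.items():
        # Sensed current = cell current + that row's sneak current.
        sensed = [(I_ON if b else I_OFF) + row_sneak[row] for b in bits]
        decoded = [read_bit(s, row_sneak[row], I_ON, I_OFF) for s in sensed]
        print(row, decoded)
```

With these example numbers, a "Zero" in row 1 produces a larger sensed current than a "One" in row 0, so no single global threshold would separate them, while the per-row threshold does.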

Initial Bits Readout
Each bit generally has two unknowns: 'R_m' and 'R_r' (or 'R_c'). Without exploiting sneak-paths correlation and locality, multiple access stages are needed to estimate the bit value. However, a faster readout can be achieved by categorizing the bits into two types: the "initial bits," which are the first bits accessed in a given column, and "regular bits," which are any other bits in the array. To estimate the value of the "initial bit," two unknowns need to be solved, namely the desired resistance ('R_m') and the row sneak resistance ('R_r'). However, the remaining bits in the row share the same 'R_r' value, and 'I_r' is treated as the significant sneak-path component for a given row. Any of the readout techniques presented in the literature can be used to estimate the "initial bit" 21. The "initial bit" readout dictates the threshold used for the remaining bits in that row. Figure 5a shows the readout sequence for the array when the "initial bits" strategy is adopted. The first bit (the "initial bit") can be any bit in the array, and it requires 'n' stages of reading. The rest of the bits in the same row are then accessed in sequence, only one time each. Reading from the next row requires a new "initial bit", which in this case is the first bit in the row, as shown in Fig. 5a. The same sequence is followed until the fetched data block for the cache is completed, i.e., each row contains one "initial bit", and the rest of the bits are accessed in a single-stage fashion. For a contiguous block of data readout using the "initial bits" technique, the proposed readout procedure is as follows:

Case 1: The first accessed bit in row 'i' (the "initial bit"),
• Use a multi-stage readout technique to estimate the desired cell current 'I_m(i, j)' and the row sneak-current component 'I_r(i)'.

Case 2: Accessing the rest of the bits in the same row,
• Access the desired cell a single time to estimate its value, where

$$I_m(i, j) = I_{sense}(i, j) - I_r(i),$$

and 'i' and 'j' are the desired row and column, respectively.

It should be noted that in the case of sensing from 'n_1', data is accessed in a column-wise rather than row-wise scheme. The readout circuitry for the "initial bit" is made of two parts: a virtual-ground ADC for the current sensing, and digital processing circuitry for calculating the "initial bit" parameters and performing the threshold comparisons. Typically, a single readout circuit is needed per memory array. This does not impact the overall memory density, as presented in previous work 3.
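The two cases above can be sketched as a short routine. This is a minimal illustration under stated assumptions, not the authors' implementation: `sense` and `multistage_read` are hypothetical callables standing in for the single-access current measurement and for whichever multi-stage technique from the literature is chosen, the initial bit is taken to be the first column of each row, and `i_threshold` is an assumed cell-current decision level.

```python
def read_block_initial_bits(sense, multistage_read, n_rows, n_cols, i_threshold):
    """Read an n_rows x n_cols block with one multi-stage access per row.

    sense(i, j)           -> single-access sensed current I_sense(i, j)
    multistage_read(i, j) -> (I_m(i, j), I_r(i)) from a multi-stage technique
    i_threshold           -> cell-current level separating "One" from "Zero"
    """
    bits = []
    for i in range(n_rows):
        # Case 1: the initial bit yields both its cell current and I_r(i).
        i_m, i_r = multistage_read(i, 0)
        row_bits = [1 if i_m > i_threshold else 0]
        # Case 2: each remaining bit needs a single access.
        for j in range(1, n_cols):
            i_m = sense(i, j) - i_r
            row_bits.append(1 if i_m > i_threshold else 0)
        bits.append(row_bits)
    return bits


if __name__ == "__main__":
    I_ON, I_OFF = 10e-6, 0.1e-6          # hypothetical cell currents (A)
    data = [[1, 0, 1, 1], [0, 0, 1, 0]]  # toy stored pattern
    sneak = [40e-6, 75e-6]               # hypothetical I_r per row (A)

    def cell(i, j):
        return I_ON if data[i][j] else I_OFF

    def sense(i, j):
        return cell(i, j) + sneak[i]

    def multistage_read(i, j):
        # Idealized stand-in: returns the true values directly.
        return cell(i, j), sneak[i]

    out = read_block_initial_bits(sense, multistage_read, 2, 4, 0.5 * (I_ON + I_OFF))
    print(out == data)
```

Passing the measurement functions in as arguments keeps the sketch independent of any particular multi-stage technique.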

Predefined Dummy Bits Readout
A more time-efficient way to estimate the adaptive threshold is to add "dummy bits" with a predefined value to the array. The general concept of adding predefined bits to an array for sneak-paths estimation is presented in 31. In our case, for a "dummy bit" the value of 'R_m' is known in advance, and a single readout is needed to estimate the value of 'R_r'. This estimated 'R_r' value is reused for the other bits in the same row, each of which then requires a single readout to estimate its remaining unknown ('R_m'). The "dummy bits" can be organized in several ways, provided that each row contains a single dummy bit. Figure 5b shows a possible organization of dummy bits that is suitable for a row-wise readout scheme. For a contiguous block of data readout using the "dummy bits" technique, the proposed readout procedure is as follows:

Case 1: Accessing the "dummy bit" of row 'i',

$$I_r(i) = I_{sense}(i, j) - I_{dummy}.$$

Case 2: Accessing the rest of the bits in the same row,

$$I_m(i, j) = I_{sense}(i, j) - I_r(i),$$

where 'i' and 'j' are the desired row and column, respectively. The dummy current 'I_dummy' is known at design time; it can be 'I_on' or 'I_off', depending on which value is stored in the dummy cells. Moreover, a dummy cell could be just a static reference resistor rather than a memristor, since there is no need to write it after the array fabrication. The "dummy bits" technique adds a small overhead to the readout process, as a "dummy bit" needs to be accessed a single time (in comparison to 'n' times for an "initial bit"). However, for practical arrays of 256 kb or more, the average number of array accesses per bit when fetching a block of data from memory is almost one for both methods. Figure 6a shows the average number of readouts per memory bit, where the overhead is shared over the "regular bits", versus the fetched data size. It also illustrates how quickly the average number of readouts converges to one.
The ripples in the curve occur because starting to read from a new row adds the extra overhead of an "initial bit" or a "dummy bit". It should be noted that the typical cache line is 0.5 kb (64 bytes), and multiple lines are fetched from memory in sequence based on the cache policy. This value is much larger in the case of RAM fetching from HDD. While the "dummy bits" technique requires fewer accesses, it comes at a small cost to the effective area of the array, as "dummy bits" are not used to store real data. This negligible overhead is shown in Fig. 6b.
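The access overhead can be estimated with simple counting. The sketch below is a rough model, not a reproduction of Fig. 6a: it assumes every fetch starts at the beginning of a row, that one cell per row is reserved for the dummy bit, and that the multi-stage readout takes a hypothetical `N_STAGES` accesses.

```python
import math


def avg_accesses_initial_bits(block_bits, row_len, n_stages):
    """Average accesses per fetched bit when each row starts with an initial bit."""
    rows_touched = math.ceil(block_bits / row_len)
    # Each row costs (n_stages - 1) extra accesses for its initial bit.
    return (block_bits + rows_touched * (n_stages - 1)) / block_bits


def avg_accesses_dummy_bits(block_bits, row_len):
    """Average accesses per fetched data bit when each row holds one dummy bit."""
    data_per_row = row_len - 1  # one cell per row is reserved for the dummy
    rows_touched = math.ceil(block_bits / data_per_row)
    # Each row costs one extra access for its dummy bit.
    return (block_bits + rows_touched) / block_bits


if __name__ == "__main__":
    ROW_LEN = 512   # a 512 x 512 array holds 256 kb
    N_STAGES = 3    # hypothetical stage count for the multi-stage readout
    for block in (64, 512, 4096, 32768):
        a = avg_accesses_initial_bits(block, ROW_LEN, N_STAGES)
        b = avg_accesses_dummy_bits(block, ROW_LEN)
        print(f"{block:6d} bits: initial={a:.4f}  dummy={b:.4f}")
```

In this model the average jumps each time the block spills into a new row and then decays again, which is one way to picture the ripples described above.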
The readout circuitry for the "dummy bits" technique can be implemented in two ways. The first approach is to use an analog circuit for current sensing and a simple digital circuit for comparisons and estimation, as discussed in the "initial bits" readout. Typically, most of the readout circuit area in this methodology is consumed by the conversion of the data from one domain to the other using ADCs 3. A more area-efficient implementation is to adopt a fully analog compensated readout circuit, as presented in previous work 23. In this approach, the current of a "dummy cell" is sampled on a first capacitor, and the sensed current from each desired cell is sampled on a second one in sequence. Comparing the two capacitor voltages yields an estimate of the data stored in the desired memory cells 23.
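The digital variant of the "dummy bits" readout can be sketched in a few lines. This is an illustrative model only, assuming that each sensed current is the sum of the cell current and the row sneak current; the function name, the argument layout, and the decision level `i_threshold` are hypothetical choices made for the example.

```python
def read_row_dummy_bit(sensed_row, dummy_col, i_dummy, i_threshold):
    """Decode one row from single-access currents, using a dummy cell as reference.

    sensed_row  -> list of I_sense(i, j) for every column j of row i
    dummy_col   -> column index of the predefined dummy cell
    i_dummy     -> known dummy-cell current (I_on or I_off, fixed at design time)
    i_threshold -> cell-current level separating "One" from "Zero"
    """
    # The dummy access yields the row sneak current.
    i_r = sensed_row[dummy_col] - i_dummy
    # Every other cell is decoded from its own single access.
    return [1 if (i_s - i_r) > i_threshold else 0
            for j, i_s in enumerate(sensed_row) if j != dummy_col]


if __name__ == "__main__":
    I_ON, I_OFF = 10e-6, 0.1e-6   # hypothetical cell currents (A)
    I_R = 55e-6                   # hypothetical row sneak current (A)
    stored = [1, 0, 1]            # data bits; the dummy at column 0 stores "Zero"
    sensed = [I_OFF + I_R] + [(I_ON if b else I_OFF) + I_R for b in stored]
    print(read_row_dummy_bit(sensed, 0, I_OFF, 0.5 * (I_ON + I_OFF)))
```

The subtraction of the dummy reading plays the same role here as the comparison between the two capacitor voltages in the analog approach.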

Variability
Variability is a challenge that faces resistive-memory readout techniques. In general, two types of variability affect memristor-based memory. The first is fabrication variability, in which fabricated cells have a distribution of ON and OFF values, rather than two unique states. The other is operation variability, in which the device parameters change stochastically or with aging. In the proposed readout, memristor device variability can affect three types of cells. The first type is the "normal bits", where a change in the cell properties can ruin the readout of that cell alone, without corrupting the sneak-paths estimation. This is because the presented sneak-paths estimation techniques do not assume any properties of the "normal" cells; instead, these cells are treated as a group of parallel resistances whose effect as sneak-paths is probed for each new row readout. However, variability in the "normal bits" has a secondary impact on the read margins of the system, since it widens the distributions representing the possible ON and OFF values. The change in the read margins is defined as, where 'δ_r' is the maximum absolute shift in a cell value due to variability. The second type, the "initial bits", does not suffer from any variability issues, because the multi-stage readout techniques used to access such bits typically do not assume any properties of the probed cell. Such methods read and write the cell under probe multiple times during each readout to define a local threshold for the cell. Therefore, adopting multi-stage readout for accessing the "initial bits" makes the system less vulnerable to variability while still yielding a very high total throughput. The last type is the "dummy bits". Here, variability can impact the sneak-paths estimation, since the dummy bit is used as a reference for the other cells in the row.
Typically, variability in a "dummy bit" results in a threshold shift, which results in a decrease in read margins defined as where 'δ_rd' is the maximum absolute shift in a "dummy cell" value due to variability. The variability effect caused by a "dummy cell" can be reduced by storing the less variable memory state (ON or OFF) in the dummy bit, or by using a static resistance as the dummy cell rather than a memristor. Moreover, in the case of devices with high OFF resistance, storing "Zeros" in the dummy cell makes the probed current from its location much closer to the sneak-current than to the cell current.

Results and Discussions
In order to evaluate the validity and efficiency of crossbar readout techniques, an accurate simulation platform that includes different crossbar non-idealities is needed. To achieve this goal, we employ a Python script that creates SPICE netlists for realistic-size arrays and sweeps different parameters and data patterns by calling HSPICE or Cadence APS iteratively. The test array can be filled with any predefined, random, or realistic workload, such as NIST RAM images 28. Finally, the Python script parses the SPICE output files to collect the data of interest and tabulate it. We used a crossbar parasitic resistance value of 5 Ω per cell 3 and included the effect of the switching row and column circuitry in all of the simulations in this work. For the memristor device, we adopted a bipolar device model proposed for memory operations 5. Finally, it should be noted that resistive RAMs are built with the same hierarchy and structure as DRAMs, where subarrays of size up to 256 kb are used to reduce the capacitive loading of the metal lines 32. Hence, we use an array size of up to 256 kb for the simulations and comparisons in this work.
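A netlist generator of the kind described can be sketched as follows. This is not the script used in this work; it is a simplified illustration in which each cell is a linear resistor instead of the bipolar memristor model, the switching circuitry is omitted, and all node and element names are hypothetical. The selected row is driven with 'V_DD', the selected column is held at 0 V to mimic the virtual-ground sense node, and the unused rows and columns are biased to 'V_B'.

```python
def crossbar_netlist(data, r_on, r_off, r_wire, v_dd, v_b, sel_row, sel_col):
    """Return SPICE netlist text for a resistive crossbar read of one cell.

    data is a list of rows of 0/1 values. Cells are modeled as linear
    resistors (an assumption; the source uses a memristor device model).
    """
    rows, cols = len(data), len(data[0])
    lines = ["* crossbar read sketch (linear-resistor cells)"]
    for i in range(rows):
        for j in range(cols):
            # Cell between its row-line node and its column-line node.
            r_cell = r_on if data[i][j] else r_off
            lines.append(f"Rcell_{i}_{j} r{i}_{j} c{i}_{j} {r_cell:g}")
            # Wire segments along the row and column metal lines.
            if j + 1 < cols:
                lines.append(f"Rrow_{i}_{j} r{i}_{j} r{i}_{j + 1} {r_wire:g}")
            if i + 1 < rows:
                lines.append(f"Rcol_{i}_{j} c{i}_{j} c{i + 1}_{j} {r_wire:g}")
    # Drive each row from its first node: V_DD if selected, V_B otherwise.
    for i in range(rows):
        level = v_dd if i == sel_row else v_b
        lines.append(f"Vrow_{i} r{i}_0 0 DC {level:g}")
    # Terminate each column at its first node: 0 V if selected, V_B otherwise.
    for j in range(cols):
        level = 0.0 if j == sel_col else v_b
        lines.append(f"Vcol_{j} c0_{j} 0 DC {level:g}")
    lines += [".op", ".end"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pattern = [[1, 0], [0, 1]]  # toy data pattern
    print(crossbar_netlist(pattern, 1e4, 1e7, 5.0, 1.0, 0.5, 0, 0))
```

The returned text can be written to a file and handed to a SPICE simulator; in this sketch the sense current would be the current through the source on the selected column.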
Error Free Readout. To verify the proposed concept, we simulated the readout operation at different locations of a 256 kb array filled with various NIST RAM images. In the first case, the readout locations are distributed over the array, while in the second case all the readouts are made for cells in the same column. Figure 7 shows the histogram of the sensed read current for the two cases. The results indicate that the distributions of reading "One" and reading "Zero" are highly overlapped, and that it is not possible to define a threshold to distinguish between the two binary cases, as shown in Fig. 7 (inset). However, for a given row or column, reading from different locations reveals a clear separation between the distributions of ones and zeros, as shown in Fig. 7. This verifies our proposed readout scheme, in which an adaptive threshold is defined for each column (or row), as discussed earlier. The simulation results show that only a simple comparator is required to differentiate between the "One" and "Zero" states.
Crossbar Power Consumption. Undesirable sneak-paths power consumption is unavoidable in high-density gateless arrays. However, it can be reduced by utilizing devices with nonlinear saturation behavior. Figure 8 shows the 'i-v' hysteresis of two of our fabricated devices. The second device shows higher saturation nonlinearity than the first one. Reducing the voltage applied to such devices by fifty percent can increase their saturation resistance by up to two orders of magnitude 27. This is a very attractive property, since a sneak path is made of a series of memristor devices, where a fraction of the voltage is dropped on each of them. In the "connected terminals" structure, the device nonlinearity can be enforced by biasing the unused terminals to a sub-read voltage. In such a case, the very small 'R_a' is shorted out, and the nonlinearity of the other terminals is efficiently utilized. Figure 9a shows that the optimal choice is to bias the unused terminals at 'V_B = V_DD/2'. The power consumption of this method is almost the same as that of the baseline "floating terminals", as shown in Fig. 9b. The figure also shows the large power saving of the "connected terminals" approach compared with the power-hungry "grounded terminals" technique. It should be noted that power consumption saturates for larger array sizes because of the crossbar metal lines 3.

Figure-of-Merit. In general, the presented technique offers a sneak-paths immune readout that is more power efficient and faster than the state-of-the-art crossbar accessing techniques that are presented in the literature. Table 1 shows a detailed comparison of the various gateless techniques that can provide an error-free readout. The different methods are compared based on a figure-of-merit (FoM), which is defined as,

$$\mathrm{FoM} = \frac{\text{Array Density}}{\text{Array Read Power}}, \tag{6}$$

where the proposed technique shows the best FoM.

Conclusion
Taking advantage of memory locality and the sneak-paths correlation yields a fast and power efficient readout technique. Unlike other techniques, the proposed method achieves the theoretical limit of a single memory access per pixel for an array readout at a fraction of the power of state-of-the-art readout techniques. The presented adaptive-threshold readout is 7 to 24 times better than the other gateless techniques presented in the literature, based on the density-power figure-of-merit. In addition, the new sneak-paths immune technique requires minimal hardware to distinguish between the memory data values.