Multi-state MRAM cells for hardware neuromorphic computing

Magnetic tunnel junctions (MTJs) have been successfully applied in various sensing applications and digital information storage technologies. Currently, a number of new potential applications of MTJs are being actively studied, including high-frequency electronics, energy harvesting and random number generators. Recently, MTJs have also been proposed in designs of new platforms for unconventional or bio-inspired computing. In the current work, we present a complete hardware implementation design of a neural computing device that incorporates serially connected MTJs forming a multi-state memory cell. The main purpose of the multi-cell is the formation of quantized weights in the network, which can be programmed using the proposed electronic circuit. Multi-cells are connected to a CMOS-based summing amplifier and a sigmoid function generator, forming an artificial neuron. The operation of the designed network is tested on the recognition of hand-written digits in a 20 × 20 pixel matrix and shows a detection ratio comparable to the software algorithm when the weights are stored in multi-cells consisting of four or more MTJs. Moreover, the presented solution has better energy efficiency, in terms of energy consumed per single image processed, than a similar design.

www.nature.com/scientificreports/

[…] of four MTJs is sufficient for the network to achieve a recognition error rate below 3%, while providing better energy efficiency per operation than the circuit presented by Zhang et al. 29.

Experimental
Multibit-cell based artificial synapse. A key element of the design of the ANN is a spintronic memristor, which consists of serially connected MTJs. Each of the MTJs may be characterized by an R(V) curve (Fig. 1a), in which two stable resistance states can be observed, as well as the critical voltages (cN and cP) at which switching occurs. By serially connecting N such MTJs 20, a multi-state resistive element is obtained (Fig. 1b), for which N + 1 resistance states are observed. The concept of the multi-cell was experimentally confirmed using up to seven MTJs connected in series. For the simulation of the network, we introduce a model of the multi-cell based on the following protocol. A typical R(V) loop of an MTJ may be approximated using four linear functions (the resistance vs. bias voltage dependence in each MTJ state) and two threshold points (the switching voltages), as presented in Fig. 1a. In addition, in the case of a real MTJ the following parameters are related to each other: a1n = −a1p = a1, b1n = b1p = b1, a0n = −a0p = a0 and b0n = b0p = b0. Moreover, the current resistance state (high or low resistance) has to be included. Such a model of the R(V) curve also allows other transport curves, including V(I), to be calculated.
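The piecewise-linear model described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `MTJ`, all numerical values, the linear form R = a·V + b for each branch, and the choice that a positive bias writes the high-resistance state are assumptions made here and are not taken from Tab. 1. Under the stated symmetry (an = −ap = a, bn = bp = b), the two branches of each state reduce to R = b − a·|V|.

```python
from dataclasses import dataclass


@dataclass
class MTJ:
    """Piecewise-linear, hysteretic R(V) model of one MTJ (placeholder values)."""
    a1: float = 800.0    # bias slope of the high-resistance state (ohm/V), assumed
    b1: float = 2000.0   # zero-bias resistance of the high state (ohm), assumed
    a0: float = 50.0     # bias slope of the low-resistance state (ohm/V), assumed
    b0: float = 1000.0   # zero-bias resistance of the low state (ohm), assumed
    c_n: float = -0.4    # negative critical voltage cN (V), assumed
    c_p: float = 0.4     # positive critical voltage cP (V), assumed
    high: bool = False   # current state: True means high resistance

    def resistance(self, v: float) -> float:
        # With a_n = -a_p = a and b_n = b_p = b, both polarities give b - a*|v|.
        a, b = (self.a1, self.b1) if self.high else (self.a0, self.b0)
        return b - a * abs(v)

    def apply(self, v: float) -> float:
        """Apply bias v, switch state if a threshold is crossed, return R(v)."""
        if v >= self.c_p:
            self.high = True     # assumed polarity: positive bias writes high state
        elif v <= self.c_n:
            self.high = False
        return self.resistance(v)

    def current(self, v: float) -> float:
        """One point of the I(V) curve derived from the same model."""
        return v / self.resistance(v)


mtj = MTJ()
sweep = [0.0, 0.2, 0.5, 0.2, 0.0, -0.2, -0.5, 0.0]
loop = [(v, mtj.apply(v)) for v in sweep]   # traces one hysteresis loop
```

Keeping the state as an explicit flag is what makes the curve hysteretic: the same bias gives a different resistance depending on the last threshold that was crossed.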
The proposed model corresponds to all MTJs that were investigated during the study. Parameters obtained from the experimental part, and further used in the simulation, are presented in Tab. 1. MTJs with perpendicular magnetic anisotropy were patterned as pillars 100 nm in diameter and interconnected using metalization layers and vias (Fig. 2).
The model was used to simulate serially connected MTJs, and a representative comparison between simulation and experiment is presented in Fig. 1b. Moreover, simulations of up to seven MTJs were carried out in which a spread of parameters was additionally taken into account. This made it possible to determine the distribution of the stable resistance states as well as of the voltages used for writing. The results of these simulations, together with representative experimental data, are presented in Fig. 3.
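A minimal Monte Carlo sketch of such a simulation is given below. It is not the authors' simulation code: the function name, the Gaussian spread, the nominal values and the simplification of bias-independent resistances are all assumptions made for illustration. Because the same current flows through every junction in a series chain, the junction with the lowest critical current is taken to switch first.

```python
import random


def sample_multicell(n, rng, r_p=1000.0, r_ap=2000.0, v_c=0.4, spread=0.05):
    """Draw n series MTJs with Gaussian spread (all values hypothetical).

    Returns (states, write_voltages): the n + 1 total resistances and the n
    chain voltages at which the next junction switches from low to high.
    """
    mtjs = [
        (rng.gauss(r_p, spread * r_p),    # low-state resistance (ohm)
         rng.gauss(r_ap, spread * r_ap),  # high-state resistance (ohm)
         rng.gauss(v_c, spread * v_c))    # critical voltage across the MTJ (V)
        for _ in range(n)
    ]
    # Order junctions by critical current i_c = v_c / r_low (series connection).
    order = sorted(mtjs, key=lambda m: m[2] / m[0])
    states, write_voltages = [], []
    for k in range(n + 1):
        # k junctions already switched to the high state, the rest still low.
        total = sum(m[1] for m in order[:k]) + sum(m[0] for m in order[k:])
        states.append(total)
        if k < n:
            i_c = order[k][2] / order[k][0]
            write_voltages.append(i_c * total)  # chain voltage that switches it
    return states, write_voltages


rng = random.Random(0)
states, write_voltages = sample_multicell(7, rng)
print(len(states), "states;", len(write_voltages), "write voltages")
```

Repeating the call many times and collecting `states` and `write_voltages` gives histograms of the kind of distributions discussed in the text.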

Electronic neuron.
After the analysis of the multi-cell, which may be used as a programmable resistor for performing a weighted-sum operation over many input voltages, we turn to the artificial neuron design. A sche- […] zero weight is obtained, which is equivalent to the situation when an input is disconnected from the synapse. The resistive summing circuit architecture is used to implement the addition operation while reducing the footprint of the synapse. A differential amplifier converts the differential voltage to a single bipolar signal, which is transformed using a non-linear sigmoid function. This voltage may be used as the input of the next synapse, or as the output of the network. Additionally, to provide a constant bias, a standard input with constant voltage may be used, where the level of this constant bias is determined in the same way as the weights for other functional inputs.
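The signal path of the neuron can be illustrated with a short numerical sketch. It assumes, beyond what the text states, that each resistive network behaves as an ideal unloaded summing node (Millman's theorem), that the activation is a plain tanh, and that `gain` and `v_sat` are free parameters; the function names are hypothetical. Signal polarity inside the real circuit is ignored here.

```python
import math


def node_voltage(inputs, conductances):
    """Voltage of an unloaded passive summing node (Millman's theorem)."""
    return sum(v * g for v, g in zip(inputs, conductances)) / sum(conductances)


def neuron(inputs, g_pos, g_neg, gain=20.0, v_sat=1.0):
    """Differential resistive sum followed by a tanh-shaped activation.

    g_pos / g_neg are the multi-cell conductances (siemens) of the two
    resistive networks; gain and v_sat are illustrative assumptions.
    """
    v_d = node_voltage(inputs, g_pos) - node_voltage(inputs, g_neg)
    return v_sat * math.tanh(gain * v_d)


# Two signal inputs plus a constant-voltage bias input (last entry).
inputs = [0.2, -0.1, 0.2]
balanced = neuron(inputs, [1e-3, 1e-3, 1e-3], [1e-3, 1e-3, 1e-3])
weighted = neuron(inputs, [2e-3, 1e-3, 1e-3], [1e-3, 2e-3, 1e-3])
print(balanced, weighted)
```

In this sketch, identical conductances in the two networks cancel exactly, so the differential voltage and the output are both zero.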
Neural network circuit. The electrical circuit implementing the proposed neural network was designed in a standard CMOS technology (UMC 180 nm). To program the demanded resistance of seven serially connected MTJs, a voltage of about 3.25 V is needed, so 3.3 V input/output (I/O) transistors were used to design the circuit for MTJ programming, while standard 1.8 V transistors were used for the other circuits. An individual neuron circuit is composed of three parts. At the input, two resistive networks consisting of memristors implement the multiplication of input voltages by coefficients and the summing of these products (Fig. 4a). Next, the obtained voltages are subtracted and amplified to the demanded value in a differential amplifier (Fig. 4b). Voltage followers are used to separate the stages of the circuit and eliminate unwanted loading (Fig. 4d). Finally, the third part is a sigmoid function block, which implements the activation function (Fig. 4c). It is based on an inverter and has a negative transfer characteristic, thus appropriate signal polarities are required. The differential voltage V d generated by the divider network (Fig. 4a) connected to a pair of voltage followers (Fig. 4d) can be expressed as:

Results
To evaluate the performance of the multi-bit MTJ cell-based ANN, a set of classification tasks using the MNIST dataset of handwritten digits (Fig. 5a) was prepared. The conceptual architecture used for the network is shown in Fig. 5b and consists of the input layer, two hidden layers containing N neurons each and the output layer. A benchmark software network was trained using the standard scaled conjugate gradient method and the cross-entropy error metric, with the tanh activation function for every layer except the last one, where the softmax function was used. Then, its performance was evaluated on a testing subset that had been drawn randomly from the input data and had not participated in training. This procedure was repeated 50 times in total, with the training and testing subsets being redrawn each time, leading to an average error estimate for each network size.
Having established the performance of the benchmark software network, we evaluated our MTJ-based design. The original floating-point weights between different neurons were replaced by their discrete versions corresponding to our multi-state MTJ synapses. The new weights were calculated using simulated conductance data (as described in Sec. "Multibit-cell based artificial synapse") and rescaled by tuning amplifier gains to match the desired value range for the neurons. Then, the performance of the network was re-evaluated on the testing data subset. The results are presented in Fig. 5c. It can be seen that, as long as the number of MTJs used per multi-state cell exceeds three, the performance of the MTJ-based solution is comparable to the original software version, with only incremental differences. Due to the relatively shallow structure of our network, the total number of individual MTJ elements necessary to perform the calculation is thus remarkably low and ranges from around 200 to around 700, depending on the assumed tolerance for error. This is one order of magnitude lower than the number previously reported for quantized neural networks based on MTJs with comparable performance 19.
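The weight replacement step can be sketched as a nearest-level quantization. The snippet below is a simplified illustration, not the procedure used in the paper. It derives the N + 1 levels from the zero-bias conductances of a single multi-cell with hypothetical resistances and rescales them linearly to [−1, 1]. The function names are assumptions, and the differential arrangement of two resistive networks is not modeled.

```python
def conductance_levels(n, r_p=1000.0, r_ap=2000.0):
    """n + 1 weight levels in [-1, 1] from the conductances of n series MTJs.

    r_p and r_ap are hypothetical low/high resistances of one MTJ (ohm).
    """
    g = [1.0 / (k * r_ap + (n - k) * r_p) for k in range(n + 1)]
    g_min, g_max = min(g), max(g)
    # Linear rescaling stands in for the tuning of amplifier gains.
    return sorted(2.0 * (x - g_min) / (g_max - g_min) - 1.0 for x in g)


def quantize_weights(weights, levels):
    """Replace every weight by the nearest available discrete level."""
    return [min(levels, key=lambda q: abs(q - w)) for w in weights]


levels = conductance_levels(4)                 # five levels for four MTJs
trained = [0.9, -0.05, -0.9, 0.3]              # example floating-point weights
quantized = quantize_weights(trained, levels)
worst = max(abs(q - w) for q, w in zip(quantized, trained))
print(levels, quantized, worst)
```

Note that the levels are not equally spaced, because the conductance of a series chain is not linear in the number of switched junctions.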
The neural network shown in Fig. 5b, using 7 MTJs per memristor, was also described and simulated electrically in Hspice for the same data as the computer simulations mentioned above. Input voltages (with a maximum amplitude of 0.2 V) corresponding to hand-written MNIST digits were changed to the next image every 4 µs. The circuit gave the same results as the theoretical calculations: for a given subset of cases, the same error rate was achieved. The circuit had a latency of approximately 1 µs, and only 37.4 pJ of energy was needed to process one picture. This is a significant improvement compared to the work by Zhang et al. 29, where processing of a 10 by 10 pixel area (4 times smaller than our 20 by 20 pixel images) consumed 194 pJ. The power consumption of our network could be further decreased, and the speed increased, at the expense of the output voltage. Also, the total resistance of each synapse might be increased by connecting additional resistances, by careful optimization of the MTJ structure, such as using devices with a higher RA product 31, and by further miniaturization of the MTJ pillar size below 100 nm in diameter 32. However, this could also lead to deterioration of the reliability of the ANN.
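The energy figures quoted above can be put on a common footing with a short back-of-the-envelope calculation. Normalizing by pixel count and spreading the energy evenly over the 4 µs image period are simplifying assumptions made here, not a comparison performed in the paper; the variable names are illustrative.

```python
E_OURS_PJ = 37.4    # energy per 20 x 20 image (from the text)
E_REF_PJ = 194.0    # energy per 10 x 10 area, Zhang et al. (from the text)
PERIOD_US = 4.0     # a new image is applied every 4 us (from the text)

per_pixel_ours = E_OURS_PJ / (20 * 20)   # pJ per pixel, this work
per_pixel_ref = E_REF_PJ / (10 * 10)     # pJ per pixel, reference design
ratio = per_pixel_ref / per_pixel_ours   # per-pixel energy advantage

# pJ divided by us gives uW; assumes the energy is spent evenly over the period.
avg_power_uw = E_OURS_PJ / PERIOD_US

print(f"{per_pixel_ours:.4f} pJ/pixel vs {per_pixel_ref:.2f} pJ/pixel")
print(f"ratio ~ {ratio:.1f}x, average power ~ {avg_power_uw:.2f} uW")
```

Under these assumptions, the per-image saving of roughly a factor of five corresponds to a per-pixel saving of roughly a factor of twenty.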

Discussion
The presented architecture of a fully hardware artificial neural network proves to be an effective way of performing neuromorphic computing. Compared to other solutions, it utilizes standard MTJs that are compatible with STT-MRAM technology, which has recently been developed for mass production. Additionally, MTJs in such an application are very stable over time and exhibit high endurance in terms of reprogramming, compared to the low-energy-barrier MTJs used in probabilistic computing. To validate the circuit, an artificial CMOS-based neuron was designed, consisting of multi-cell based synapses, differential amplifiers and a sigmoid function generator. It was shown that the quantized-weight approach enables the development of a functional artificial neural network, capable of solving recognition problems with an accuracy level similar to the benchmark software model. Moreover, the electronic simulations showed a low operation latency, of the order of µs, as well as low energy consumption per recognized picture.

Methods
Circuit details. The operational amplifier, presented in Fig. 6, was designed as a two-stage circuit consisting of a differential pair M1, M2 with a current mirror load M3, M4, biased by M5 with a current of 1 µA. The output stage M6, M7 provides appropriate amplification and output current. The total current consumed by the operational amplifier is about 12 µA, and its open-loop gain is around 74 dB. Transistor dimensions were chosen to obtain the smallest possible area while meeting the required electrical parameters (the width of M1 and M2 is 0.7 µm, of M3 and M4 0.45 µm, of M5 and M8 0.96 µm, of M6 7.48 µm, and of M7 7 µm; the capacitance of C0 is 100 fF). The final stage of the neuron is a circuit that performs the activation function and has a negative hyperbolic tangent transfer characteristic, presented in Fig. 7b. It is designed as a modified inverter, which has a voltage-to-voltage transfer, in contrast to other solutions such as the resistive-type sigmoid 33. Transistors M2 and M3 work as resistors, moving the operating point of transistors M0 and M1 to the linear region. Finally, the circuit implements the transfer characteristic shown in Fig. 7a. Minimum channel lengths were used (180 nm, except for M3, […]
Figure 6. Operational amplifier circuit used in the design.
Programming of the synapse. An important part of the design is the circuit for programming the memristors. An overview of the programming circuit is presented in Fig. 8. The switches are controlled by the digital circuit in such a way that the memristor to be programmed is connected with one terminal to the programming voltage input and with the other terminal to ground. After the selected elements are connected, the required voltage is applied to the programming input in order to program the chosen memristors. Elements that are not to be programmed with a given voltage are disconnected from the programming input.
In the next cycle, another set of memristors is connected for programming and another voltage is applied. With this approach, all memristors may be programmed in a number of cycles corresponding to the number of stable quantized states of the memristors used (e.g., for 7 MTJs per memristor the programming may be completed in only 8 cycles; if the programming voltage spread is too high, additional cycles might be introduced or an adaptive programming scheme can be used; however, state-of-the-art industrial MTJ fabrication technology can meet the requirements with an acceptable write voltage distribution 34). The purpose of the digital control circuits is to connect the desired components to the programming voltage and ground lines, or to switch to normal operation. The state of the switches is stored in serially connected flip-flops. Therefore, additional AND gates controlled by the "enable" signal are used to disconnect all memristors while information about the elements to be programmed is being shifted in. Then, after the appropriate programming voltage has been set, the enable signal goes high for the duration of programming. The flip-flop and the AND gate are placed as close to the switches as possible, to save connection length. Digital components placed close to the sensitive analog circuit do not influence it, because during the operation of the ANN the digital circuitry is inactive, remaining in a static state (no clock signal) while providing the connection of the memristors to the analog circuit.
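The cycle-based programming scheme can be sketched as a simple grouping of memristors by target state. The function name, the memristor labels and the representation of a cycle as a (state, list of memristors) pair are hypothetical; the actual voltages and the shift-register bit order are not specified here.

```python
from collections import defaultdict


def programming_schedule(targets, n_mtj):
    """Group memristors by target state: one programming cycle per state.

    targets maps a memristor label to its target state (0..n_mtj).
    Returns a list of (state, connected_memristors), one entry per cycle.
    """
    groups = defaultdict(list)
    for name, state in targets.items():
        if not 0 <= state <= n_mtj:
            raise ValueError(f"{name}: state {state} outside 0..{n_mtj}")
        groups[state].append(name)
    # Per cycle: shift the switch pattern into the flip-flops with enable low,
    # set the programming voltage for this state, then pulse enable high.
    return [(state, sorted(groups.get(state, []))) for state in range(n_mtj + 1)]


# Hypothetical target states for four memristors built from 7 MTJs each.
targets = {"w00": 3, "w01": 0, "w10": 3, "w11": 7}
schedule = programming_schedule(targets, n_mtj=7)
for state, connected in schedule:
    print(f"cycle for state {state}: connect {connected}")
```

The number of cycles depends only on the number of states (8 for 7 MTJs per memristor), not on the number of memristors in the network.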