Abstract
Recently, the exponential increase in compute requirements demanded by emerging applications such as artificial intelligence and the Internet of Things has rendered state-of-the-art von-Neumann machines inefficient in terms of energy and throughput, owing to the well-known von-Neumann bottleneck. A promising approach to mitigating the bottleneck is to perform computations as close to the memory units as possible. One extreme possibility is to perform in-situ Boolean logic computations using stateful devices. Stateful devices are those that can act as both a compute engine and a storage device simultaneously. We propose such stateful, vector, in-memory operations using the voltage controlled magnetic anisotropy (VCMA) effect in magnetic tunnel junctions (MTJs). Our proposal is based on the well-known, manufacturable 1-transistor – 1-MTJ bit-cell and does not require any modifications to the bit-cell circuit or the magnetic device. Instead, we leverage the physics of the VCMA effect to enable stateful computations. Specifically, we exploit the voltage asymmetry of the VCMA effect to construct a stateful IMP (implication) gate and use the precessional switching dynamics of VCMA devices to propose a massively parallel NOT operation. Further, we show that other gates such as AND, OR, NAND, NOR and NIMP (complement of implication) can be implemented using multi-cycle operations.
Introduction
The pioneering works of Charles S. Peirce^{1} and Claude Shannon^{2}, among others, laid the foundation of digital logic design. The fact that a few basic digital gates that form a complete logic basis can be easily implemented using electronic switches had far-reaching implications for the future of digital computing. Indeed, with the invention of transistor switches^{3}, digital logic quickly gained ground and has become the workhorse of today’s information processing^{4}.
In general, state-of-the-art digital processors rely heavily on Boolean gates that constitute the computational unit, which is separate from the storage unit consisting of numerous memory cells. This decoupled architecture, wherein memory and compute units are physically separated, is named the von-Neumann architecture after its inventor^{5}. The von-Neumann architecture forms the backbone of almost all commercially available processors. Despite the tremendous strides in computing efficiency made by von-Neumann machines, the architecture fails to deliver the speed and efficiency demanded by recent developments in big data, artificial intelligence, the Internet of Things (IoT), etc.^{6}. The major limitation associated with the von-Neumann architecture is the so-called von-Neumann bottleneck^{7}. This bottleneck mainly arises from the limited data transfer rate between the physically decoupled compute and memory units. The frequent to-and-fro data transfer between the compute and memory units not only limits the overall throughput but also results in a large energy overhead associated with each data transfer. To mitigate the limitations associated with the von-Neumann bottleneck, one promising approach is to enable in-memory vector computations^{8,9,10}.
These novel computing paradigms, termed in-memory computations, aim to implement some (or all) aspects of Boolean logic computations as close to the memory units as possible, thereby avoiding expensive data transfer between the compute and memory units and resulting in higher throughput and better energy-efficiency. Such in-memory computations using conventional silicon-based complementary-metal-oxide-semiconductor (CMOS) technology have been demonstrated in ref.^{11}. The basic idea behind the in-memory compute mechanism proposed in ref.^{11} is to activate multiple rows of memory cells and read out a voltage that is proportional to the desired logic computation. However, silicon technology is itself facing tremendous challenges due to aggressive scaling of CMOS transistors^{12,13,14}. As such, novel memory technologies such as spin-based magnetic random access memories (MRAMs)^{15,16}, resistive RAMs^{17} and phase-change-material-based memories^{18} are being actively investigated as possible replacements for silicon-based technologies. A key benefit of these novel technologies is their non-volatility. The non-volatile characteristics of these memory units make them well suited for ultra-low-leakage applications, ultimately increasing energy-efficiency^{19}.
Exploration of in-memory compute designs using such non-volatile technologies is crucial to meeting the energy and throughput requirements of emerging data-intensive applications. Spin-transfer-torque MRAM (STT-MRAM) based in-memory Boolean computations have been proposed in refs^{20,21}. These in-memory architectures rely on the peripheral read circuits to implement the actual computations. Nevertheless, because the peripheral circuits are close to the memory array, they do provide energy and throughput benefits. However, a major drawback of such read-circuit-based memory computations is that the typical sense margin for STT-MRAM is quite small^{22}, so techniques that rely on multi-level sensing of the STT-MRAM bit-cells for logic computations would inevitably suffer from robustness concerns. Further, the logic computation results are only available when the data is being read from the memory array. This implies that if one were to perform multiple logic operations that depend on intermediate results, a read operation would be required for every logic computation. Thus, each in-memory logic operation is inevitably associated with a memory read operation even for intermediate results, leading to decreased memory throughput and energy-efficiency. As opposed to the aforementioned works, which use the memory peripheral circuits to perform the actual logic computations, there are other classes of in-memory compute designs that perform computations ‘in-situ’ using ‘stateful’ memory devices. ‘Stateful’ devices are those wherein the same device acts as both a memory element and a compute unit. The well-known memristive implication (IMP) logic demonstrated in ref.^{23} is a good example of such stateful computations. However, the limited endurance of memristors in general makes these devices unsuitable for on-chip cache or IoT applications that have extreme longevity requirements.
Out of all the non-volatile technologies, spin-based devices are the only devices that have high switching speed as well as unlimited endurance. A few works on stateful computations using spin devices can be found in refs^{24,25,26}. Specifically, the work presented in ref.^{24} uses a three-terminal device exploiting the spin Hall effect and the voltage controlled magnetic anisotropy (VCMA) in spin devices to perform stateful computations. However, one of the inputs to these devices is an electrical quantity, i.e. an input charge current. This in turn implies that if we were to compute, say, the vector AND operation on the logic states stored in two separate memory rows, one of the memory rows would have to be read first and then converted into an electrical signal (a current in this case) before the actual logic computation could be completed. This requirement of ‘read before compute’ would lead to degraded benefits in throughput and energy. Stateful computations as described in refs^{25,26} do not require the ‘read before compute’ scheme. Besides, the works in refs^{27,28,29} have also shown in-memory compute primitives using spin-based magnetic tunnel junctions (MTJs). The compute operations in refs^{25,26,27,28,29} are fundamentally based on the difference in resistance of the MTJ in the parallel and anti-parallel states. Given the fact that the MTJ tunnel magneto-resistance ratio (TMR) is usually small^{30}, this leads to a highly constrained design space. In the present work, we rely not only on the MTJ TMR but also on the voltage-polarity-based conditional lowering of the MTJ energy barrier (EB) through the VCMA effect, thereby widening the design space for stateful computations in spin devices.
In addition, whereas all the previous works can carry out computations on only a single memory row at a time, we show the possibility of enabling massively parallel multi-row NOT and XOR operations (XOR operation described in the Appendix) by exploiting the magnetization dynamics of the MTJs based on the VCMA phenomenon.
In this manuscript, we employ the physics of voltage controlled magnetic anisotropy to construct in-situ, in-memory, stateful computations using a two-terminal spin device. Specifically, we use the voltage asymmetry of the VCMA effect to construct IMP (implication) logic and the precessional dynamics of the VCMA switching process to propose a massively parallel NOT operation. The key highlights of the present work and its advantages over previous works are as follows:

1. We propose in-situ, in-memory stateful IMP vector computations using the voltage asymmetry of the VCMA effect on two-terminal magnetic tunnel junctions (MTJs). In addition, we propose a massively parallel NOT operation by exploiting the precessional switching dynamics of VCMA-based MTJs.

2. Further, the massively parallel behavior of the proposed NOT gate allows multi-cycle computation of other Boolean functions including AND, OR, NAND, NOR and NIMP (complement of IMP), thereby constructing rich logic functionality embedded within the memory array in a stateful manner.

3. One of the major advantages of the proposed in-situ, in-memory stateful vector computations is that we rely on the well-known 1-transistor – 1-MTJ bit-cell without making any changes to the magnetic device or the bit-cell circuit. This in turn makes our proposal attractive from a manufacturability point of view. Further, as opposed to refs^{20,21}, our logic computations do not rely on complex read operations, given that reading MTJ devices is in general a complex circuit problem. In addition, as opposed to the work in ref.^{24}, we do not need to represent the logic operands by an electrical input; rather, both logic operands can be stored in the memory array, leading to higher throughput.

4. We have developed a detailed device-circuit model comprising self-consistent magnetization dynamics and an electron transport model, integrated seamlessly in a SPICE environment, to study the feasibility of the proposed logic computations.
VCMA mechanism: Voltage Asymmetry and Precessional Switching
VCMA mechanism: Voltage asymmetry
The basic device structure under consideration in this work is the two-terminal magnetic tunnel junction (MTJ). An MTJ consists of two nanomagnets separated by an insulating oxide, as shown in Fig. 1(a). The MTJ is called a perpendicular MTJ if the magnetization directions of the two nanomagnets are perpendicular to the plane of the nanomagnets. One of the nanomagnets, called the pinned layer (PL), is fixed, while the other, called the free layer (FL), can be switched by applying a voltage across the MTJ. The MTJ has two stable states, called the parallel (P) state and the anti-parallel (AP) state. When the magnetizations of the two nanomagnets point in the same direction, the MTJ is in the low-resistance P state; when they point in opposite directions, it is in the high-resistance AP state.
Conventionally, the state of the MTJ has been switched using the current-induced spin transfer torque (STT) phenomenon^{31}. The basic physics of the STT phenomenon relies on the fact that a spin-polarized current passing through the FL exerts a torque on the FL, thereby flipping the state of the MTJ from the P to the AP state and vice versa. The torque exerted by the STT mechanism has to be sufficient to overcome the energy barrier (EB) associated with the FL. In a perpendicular MTJ, it is the interface anisotropy that creates the required energy barrier between the two stable states of the MTJ. In general, the higher the EB, the higher the current required to switch the MTJ. One of the key challenges associated with the STT phenomenon is the high switching current requirement^{32}. In order to reduce the current required for switching the nanomagnet, various voltage-driven switching phenomena are under intense research investigation^{33,34}. One of the most promising techniques, and one that is easy to incorporate in the two-terminal MTJ stack, is the voltage controlled magnetic anisotropy (VCMA) effect^{33}.
The VCMA effect is the modulation of the interface anisotropy of the MTJ stack by a voltage applied across the MTJ^{35}. Application of an electric field modulates the relative occupancy of the valence d-orbitals, as shown schematically in Fig. 1(a), thereby effectively changing the interface anisotropy^{36,37}. Recall that in perpendicular MTJs it is the interface anisotropy that is primarily responsible for creating the required EB. A large EB is required for maintaining the non-volatility of the MTJ devices. However, a large EB also makes it harder to switch the nanomagnets during the write process. The VCMA effect allows one to temporarily reduce the EB by reducing the interface anisotropy in response to an electric field. The reduced EB makes it easier to switch the nanomagnets, thereby reducing the switching current requirement. On the other hand, if the direction of the electric field is reversed, the EB increases due to the VCMA effect, making it much more difficult to switch the nanomagnet. This increase or decrease in the EB due to application of a voltage across the MTJ is shown schematically in Fig. 1(b). The figure shows that the VCMA effect makes the MTJ stack asymmetric with respect to the voltage polarity. With favorable voltage polarity (pinned layer at a higher potential than the free layer) the MTJ can be easily switched, while if the voltage polarity is reversed the MTJ is difficult to switch. In fact, it has been experimentally shown that when the EB is increased by applying a voltage, the MTJ breaks down at sufficiently high voltages but does not switch^{38}. In a later section, we describe how this voltage asymmetry of VCMA-based MTJs can be used to construct stateful IMP logic for vector operations.
VCMA mechanism: Precessional switching
In addition, the VCMA effect allows for a new switching dynamics, called precessional switching, in contrast to the typical STT-based switching phenomenon^{39}. The precessional switching dynamics can be understood with respect to Fig. 1(c). Let us assume the magnetization of the FL is initially pointing in the +z-direction due to the interface anisotropy, which tends to align the magnetization perpendicular to the plane of the nanomagnet. As a consequence of the VCMA effect, when a voltage is applied across the MTJ, the interface anisotropy decreases. If the decrease in the interface anisotropy is sufficient, the magnetization is no longer bound by the interface anisotropy and is free to deviate from its initial position (+z direction in this case). Now, assume there is a small in-plane field in the +x-direction (denoted as H_{ in-plane } in Fig. 1(c)), arising either from the shape anisotropy or from an in-plane field engineered in the MTJ stack, as experimentally demonstrated in ref.^{40}. Since the interface anisotropy has been reduced by voltage application (V > 0 in Fig. 1(c)) and there is an effective field in the +x direction, the magnetization tends to align itself with the effective field. It does so by precessing around, and slowly damping towards, the +x direction. This behavior is graphically depicted in Fig. 1(c), where the magnetization initially starts from position ‘I’ and then follows the trajectory marked by points A-B-C on application of an electric field across the MTJ (V > 0).
If we turn OFF the applied voltage when the magnetization is at point A in Fig. 1(c), the magnetization slowly damps towards the −z direction due to the restored interface anisotropy. Thus, by timing the voltage pulse such that the magnetization makes a half-cycle around the hard-axis (+x in this case), it can be switched by 180°. This switching due to the precession of the magnetization around the hard-axis is called precessional switching. VCMA-based precessional switching has several advantages, including low energy requirement and high switching speed^{41}. We describe later how this precessional switching of VCMA MTJs can be used to construct a massively parallel NOT operation.
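The half-cycle timing argument above can be illustrated with a minimal macrospin sketch. It assumes, for illustration only, that the voltage pulse cancels the interface anisotropy completely so that the only field acting during the pulse is the in-plane field along +x; it ignores STT, thermal noise and demagnetization. Time is measured in the dimensionless unit tau = gamma * H_in-plane * t, and the damping value 0.02 and all function names are hypothetical choices rather than parameters taken from this work.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def llg_rhs(m, h, alpha):
    """Landau-Lifshitz form of the LLG equation in dimensionless time."""
    mxh = cross(m, h)
    mxmxh = cross(m, mxh)
    pref = -1.0 / (1.0 + alpha ** 2)
    return tuple(pref * (mxh[i] + alpha * mxmxh[i]) for i in range(3))

def step(m, k, s):
    """Return m + s * k component-wise."""
    return tuple(m[i] + s * k[i] for i in range(3))

def precess(m0, tau_pulse, alpha=0.02, steps=2000):
    """Integrate the macrospin for a pulse of length tau_pulse (RK4)."""
    h = (1.0, 0.0, 0.0)  # assumed: only the in-plane field acts during the pulse
    dt = tau_pulse / steps
    m = m0
    for _ in range(steps):
        k1 = llg_rhs(m, h, alpha)
        k2 = llg_rhs(step(m, k1, dt / 2), h, alpha)
        k3 = llg_rhs(step(m, k2, dt / 2), h, alpha)
        k4 = llg_rhs(step(m, k3, dt), h, alpha)
        m = tuple(m[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(3))
        norm = math.sqrt(sum(c * c for c in m))
        m = tuple(c / norm for c in m)  # keep |m| = 1
    return m

ALPHA = 0.02                               # hypothetical damping constant
HALF_CYCLE = math.pi * (1.0 + ALPHA ** 2)  # half precession period in tau units

if __name__ == "__main__":
    for label, m0 in (("+z", (0.0, 0.0, 1.0)), ("-z", (0.0, 0.0, -1.0))):
        m = precess(m0, HALF_CYCLE, ALPHA)
        print(f"start {label}: m_z after half-cycle pulse = {m[2]:+.3f}")
    m_full = precess((0.0, 0.0, 1.0), 2 * HALF_CYCLE, ALPHA)
    print(f"start +z: m_z after full-cycle pulse = {m_full[2]:+.3f}")
```

In this idealized picture a half-cycle pulse leaves the magnetization close to the opposite pole regardless of its starting direction, whereas a full-cycle pulse returns it close to where it started, which is why the pulse duration has to be controlled.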
Device Modeling
In this section, we describe the coupled device-circuit simulation model developed for analyzing the proposed stateful logic computations. The model integrates and self-consistently solves the magnetization dynamics and electron transport model in a SPICE platform, enabling a rigorous circuit simulation for evaluating the proposed vector operations.
Magnetization Dynamics with VCMA
The magnetization vector in a mono-domain nanomagnet follows the dynamics governed by the well-known Landau-Lifshitz-Gilbert-Slonczewski (LLGS) equation^{42,43}. The LLGS equation can be written as follows:
where \(\hat{m}\) is the unit magnetization vector, α is the Gilbert damping constant, γ is the gyromagnetic ratio, H_{ EFF } is the effective magnetic field experienced by the nanomagnet and \(\overrightarrow{STT}\) is the STT torque acting on the nanomagnet. The first term on the right hand side of Eq. 1 describes the magnetization precession around H_{ EFF }, while the second and last terms describe the damping torque and STT, respectively. H_{ EFF } includes an external field (H_{ ext }), the demagnetization field due to shape anisotropy^{44} (H_{ demag }), the interface perpendicular anisotropy field^{45} (H_{ ani }) and a stochastic field due to thermal noise (H_{ thermal }), as described in Eq. 2. The \(\overrightarrow{STT}\) torque is expressed in Eq. 3, where β is the rate of spin transfer into the MTJ FL, ε is the spin injection efficiency, \(\hat{P}\) is the polarization of the incoming spin current and ε′ describes the STT field-like torque.
Further, as described earlier, VCMA modulates the interface anisotropy of the MTJ stack in response to an applied voltage. VCMA is thus modeled using a voltage dependent anisotropy constant (K_{ i }), which is incorporated in the LLGS equation through H_{ ani }, as follows:
where ξ is the VCMA coefficient, V_{ MTJ } is the voltage applied across the MTJ stack, t_{ MgO } is the spacer oxide thickness, K_{i0} is the nominal value of the anisotropy constant at zero voltage (no VCMA), M_{ s } is the saturation magnetization and t_{ FL } is the thickness of the FL nanomagnet. The thermal noise was included in the LLGS equation using a thermal field given by Brown’s model^{46} as:
where \(\overrightarrow{\zeta }\) is a vector whose components are Gaussian random variables with zero mean and standard deviation of 1, ρ_{ mtj } is the volume of the nanomagnet, T is the ambient temperature, k_{ B } is the Boltzmann constant and dt is the simulation time step. The device dimensions and other parameters used in our simulations are tabulated in Table 1.
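For orientation, the relations described in this subsection take the following form in one commonly used convention. This is a sketch under stated assumptions rather than a transcription of the model equations: the sign of the VCMA term is chosen so that a positive V_{ MTJ } lowers the anisotropy, the anisotropy field is written for a uniaxial interface anisotropy along z in SI units, the placement of factors of \(\mu_0\) (and of \(1+\alpha^2\) in the thermal field) depends on the unit system and on how γ is defined, and the STT term is left symbolic.

```latex
\begin{align}
\frac{\partial \hat{m}}{\partial t}
  &= -|\gamma|\,\hat{m}\times\vec{H}_{EFF}
     + \alpha\,\hat{m}\times\frac{\partial \hat{m}}{\partial t}
     + \overrightarrow{STT} \\
\vec{H}_{EFF}
  &= \vec{H}_{ext} + \vec{H}_{demag} + \vec{H}_{ani} + \vec{H}_{thermal} \\
K_{i}(V_{MTJ})
  &= K_{i0} - \frac{\xi\,V_{MTJ}}{t_{MgO}},
  \qquad
  \vec{H}_{ani} = \frac{2\,K_{i}(V_{MTJ})}{\mu_0 M_s\,t_{FL}}\,m_z\,\hat{z} \\
\vec{H}_{thermal}
  &= \vec{\zeta}\,\sqrt{\frac{2\,\alpha\,k_B T}{\mu_0\,|\gamma|\,M_s\,\rho_{mtj}\,dt}}
\end{align}
```

Written this way, the voltage asymmetry discussed earlier is visible directly: one polarity of V_{ MTJ } reduces K_{ i } and hence H_{ ani } and the energy barrier, while the opposite polarity increases them.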
MTJ Resistance model
The resistance of the MTJ was modeled using the non-equilibrium Green’s function (NEGF) approach, benchmarked against experimental data from ref.^{16}, as illustrated in Fig. 2(a). The details of the various equations used in our NEGF model can be found in ref.^{30}. Our NEGF model is based on a potential profile wherein a non-magnetic barrier separates two nanomagnets. The non-magnetic barrier is characterized by its energy barrier, while the nanomagnets are characterized by their band-splitting energy. The results obtained from the NEGF calculations were encapsulated in an analytical fitting model such that the resulting MTJ resistance was modeled as a SPICE-compatible voltage-dependent resistance.
SelfConsistent SPICE Compatible Magnetization Dynamics and Resistance Model
A SPICE-compatible device-circuit model was developed in Verilog-A for the VCMA-MTJ. The Verilog-A model concurrently solves the LLGS equation, the MTJ resistance model and the associated circuit equations. Predictive transistor models^{47} were used for the access transistors, thus completing the 1T – 1 VCMA-MTJ bit-cell model. Figure 2(b) shows graphically the various building blocks associated with our self-consistent device-circuit simulation framework.
Proposed in-situ, in-memory Stateful Vector Logic Operations
Stateful vector IMP gates
Let us assume we have two VCMA based MTJs – ‘MTJ1’ and ‘MTJ2’ storing two input data bits ‘Bit1’ and ‘Bit2’, respectively. We wish to compute the implication (IMP) of bits ‘Bit1’ and ‘Bit2’ such that the new value of the MTJ2 would correspond to the IMP of the original values of bits ‘Bit1’ and ‘Bit2’. Further, let us assume that this logic computation has to be done in a ‘stateful’ manner such that the same VCMA MTJs (that function as memory elements storing bits ‘Bit1’ and ‘Bit2’) also act as logic computation units.
In order to understand the proposed stateful computations, let us consider the truth table of a two-input IMP gate shown in Fig. 3(a). Note that the first column (A) physically represents the possible states of MTJ1 and the second column (B) represents the states of MTJ2. The third column (B’) represents the new state of MTJ2 after the logic operation has been completed. Interestingly, in Fig. 3(a), column B is the same as B’ except for row 1 (highlighted in red). Further, we assume the low digital level (L) is mapped to the P state of the MTJ and the high digital level (H) is mapped to the AP state. This implies that, in order to perform the stateful computation, when the operand ‘A’ (MTJ1) is in the P state and operand ‘B’ (MTJ2) is also in the P state, the state of MTJ2 should change from P to AP, thereby mimicking the logic operation corresponding to row 1 of Fig. 3(a). For all other cases, since B = B’, the state of MTJ2 should not change. Thus, if we can retain the state of MTJ2 for rows 2, 3 and 4 and change the state from P to AP for row 1, we effectively accomplish the IMP operation.
Figures 3(b) and (c) illustrate the device-circuit technique used to perform the aforementioned IMP computation. Let us assume we have two vector input operands ‘A’ and ‘B’. The bits ‘A_{0}’ to ‘A_{ N }’ corresponding to the input ‘A’ are stored in the upper row of the memory array, as shown in Fig. 3(b). Similarly, bits ‘B_{0}’ to ‘B_{ N }’ corresponding to the input ‘B’ are stored in the lower row of the memory array. In order to perform the bit-wise IMP computation on operands ‘A’ and ‘B’, we activate the corresponding wordlines WL1 and WLN. Simultaneously, a voltage V_{ DD } is applied to SL1 while SLN is grounded, resulting in a current flow as marked by the red arrow in Fig. 3(b). A simplified version of the resulting circuit configuration, considering one column consisting of one bit from the vector operand ‘A’ and the corresponding bit from the vector operand ‘B’, is shown in Fig. 3(c).
Figure 3(c) is basically a voltage divider: the voltage at node ‘mid’ depends on the resistance states of MTJ1 and MTJ2. Note that in this circuit configuration the pinned layer of MTJ1 is at a lower voltage than its free layer, while for MTJ2 the pinned layer is at a higher voltage than the free layer. This in turn implies, with reference to Fig. 1(b), that MTJ1 has a raised energy barrier (EB) while MTJ2 has a lowered energy barrier owing to the VCMA effect. As such, it is much easier to switch MTJ2, while the state of MTJ1 remains intact due to the increase in its EB. This holds irrespective of the data stored in the two MTJs: MTJ1 always has the higher EB and MTJ2 the lower EB.
By appropriate choice of V_{ DD } and the MTJ resistances, the circuit in Fig. 3(c) can be designed such that MTJ2 switches from the P to the AP state only when MTJ1 is in the P state. A higher voltage at node ‘mid’ (corresponding to the P state of MTJ1) implies a stronger lowering of the EB for MTJ2, allowing the small current flowing through MTJ2 to deterministically switch it from the P to the AP state as desired.
Note that it is due to the lowered EB of MTJ2 that the small current flowing through the MTJs can switch MTJ2 but not MTJ1 (since the EB of MTJ1 has increased due to its voltage polarity). The current flowing through MTJ2 switches its state via the STT effect, given that the switching current requirement for MTJ2 has been conditionally (only when MTJ1 is in the P state) reduced by the voltage at node ‘mid’. This STT-like switching behavior, shown in Fig. 3(d), is evident from the magnetization dynamics of MTJ2, simulated using the model described in the previous section. Note that the P to AP switching of MTJ2 only when MTJ1 is in the P state implements both rows 1 and 3 of Fig. 3(a). Specifically, when MTJ1 is in the AP state, the voltage at node ‘mid’ is not high enough to sufficiently lower the EB of MTJ2, so MTJ2 retains its original state, corresponding to row 3. However, when MTJ1 is in the P state, the voltage across MTJ2 is sufficient to lower its EB such that it switches to the AP state, corresponding to row 1.
For the remaining rows 2 and 4, the state of MTJ2 in column B’ of Fig. 3(a) is the same as in column B, namely the AP state. Further, the current flow direction is such that it always tries to switch MTJ2 to the AP state. Thus, for rows 2 and 4, MTJ2 is initially in the AP state and the current flowing through it also favors the AP state, so the state of MTJ2 is retained.
In summary, to implement the IMP operation we perform selective switching of the MTJs. This selective switching is a result of the combined effect of the VCMA-based lowering of the EB for MTJ2 and the STT-induced torque due to the current flowing through the series connection of the two MTJs. Only when MTJ1 is in the P state (digital ‘L’) is the voltage across MTJ2 high enough to sufficiently lower the EB such that MTJ2 switches by the STT mechanism from the P to the AP state (or from digital ‘L’ to ‘H’). The switching mechanism is still the STT effect, but whether a given MTJ switches is determined by the VCMA-induced selective lowering of the EB, which allows the STT current to switch that MTJ. This selective switching, corresponding to all four cases shown in the truth table of Fig. 3(a), is presented in Fig. 3(e). As can be observed, the magnetization switches only for the first row of the truth table; in all other cases it retains its original state. As a result, by merely activating WL1 and WLN and applying appropriate voltages on the SLs, an in-situ stateful vector IMP operation can be achieved.
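The selective-switching argument can be condensed into a behavioral sketch of the voltage divider of Fig. 3(c). This is an illustration only, not the self-consistent SPICE model: the normalized resistances (R_P = 1, R_AP = 2), the supply V_DD = 1 and the switching threshold V_SWITCH = 0.4 are hypothetical values chosen so that the threshold falls between the two possible voltages across MTJ2, and the access transistors and the voltage dependence of the MTJ resistance are ignored.

```python
P, AP = 0, 1             # digital 'L' maps to the P state, 'H' to the AP state

R_P, R_AP = 1.0, 2.0     # hypothetical normalized MTJ resistances
V_DD = 1.0               # hypothetical supply applied to SL1 (SLN grounded)
V_SWITCH = 0.4           # hypothetical minimum voltage across MTJ2 for P -> AP

def resistance(state):
    """Resistance of an MTJ in the given state."""
    return R_AP if state == AP else R_P

def imp_cycle(mtj1, mtj2):
    """Return the new state of MTJ2 after one IMP cycle (MTJ1 is unchanged)."""
    # Voltage divider: MTJ1 sits between V_DD and node 'mid',
    # MTJ2 between node 'mid' and ground.
    v_mtj2 = V_DD * resistance(mtj2) / (resistance(mtj1) + resistance(mtj2))
    # The current direction only ever favors the AP state of MTJ2, so an
    # MTJ2 already in AP is retained; a P-state MTJ2 switches only if the
    # VCMA lowering of its barrier (set by v_mtj2) is large enough.
    if mtj2 == P and v_mtj2 >= V_SWITCH:
        return AP
    return mtj2

def vector_imp(row_a, row_b):
    """Bit-wise IMP of two memory rows; the result replaces row B."""
    return [imp_cycle(a, b) for a, b in zip(row_a, row_b)]

if __name__ == "__main__":
    a = [P, P, AP, AP]
    b = [P, AP, P, AP]
    print(vector_imp(a, b))
```

With these assumed values the voltage across MTJ2 is 0.5 when both MTJs are in the P state and about 0.33 when MTJ1 is AP and MTJ2 is P, so only the first case crosses the threshold, reproducing B’ = (NOT A) OR B.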
Stateful parallel NOT gates
NOT is a one-variable operation; therefore, let us consider a single bit-cell consisting of 1 transistor – 1 VCMA MTJ. In order to reverse the current state of the MTJ, we can use the precessional switching dynamics of the VCMA effect. As explained in earlier sections, when a sufficient voltage is applied across the VCMA MTJ, the interface anisotropy decreases and, in the presence of an effective in-plane field, the magnetization starts precessing around the hard axis as shown in Fig. 1(c). If the input voltage pulse is timed such that the magnetization has made a half-cycle around the hard axis, the direction of magnetization is effectively reversed by 180°.
Interestingly, irrespective of whether the magnetization vector was initially pointing in the +z or the −z direction, when a sufficient positive voltage is applied to lower the interface anisotropy, the magnetization starts precessing around the hard-axis. This implies that when the magnetization vector has completed a half-cycle around the hard-axis, if it initially started from the +z direction (−z direction), it now points closer to the −z direction (+z direction). Therefore, irrespective of the initial state of the MTJ, the magnetization direction is always reversed if the input voltage pulse is timed such that the magnetization completes only a half-cycle around the hard axis.
This unipolar switching characteristic of the VCMA MTJ, wherein the magnetization always switches by 180° on application of an appropriate voltage pulse, can be used to construct a massively parallel vector NOT operation as shown in Fig. 4(b,c). Let us assume we have to perform a NOT operation on all the bits corresponding to rows WL1 and WLN. Both WL1 and WLN are pulled high to activate the access transistors, and a proper voltage V_{ DD } is applied to BL1 through BLN. This V_{ DD } is dictated by the VCMA MTJ characteristics and must be such that the magnetization starts precessing around the hard-axis. Usually, the voltage required for VCMA-based precessional switching is higher than the voltage required for STT-dominated switching^{48}. After a predetermined time duration, corresponding to the half-cycle precession of the magnetization, the WL and V_{ DD } voltages are pulled low, thereby reversing the state of all the MTJs connected to both WL1 and WLN.
It is instructive to note that the switching mechanism during the IMP operation described in the previous subsection was STT dominated; the VCMA effect during the IMP operation merely reduced the EB such that the STT current could switch the device. In contrast, for the NOT operation the switching dynamics is VCMA dominated, which results in precessional switching of the MTJs. The VCMA-dominated switching dynamics is also evident from our simulation result shown in Fig. 4(c), which shows a typical magnetization trajectory during the precessional-switching-based NOT operation. Note that in the upper (lower) part of Fig. 4(c), the magnetization vector starts from the +z axis (−z axis) and makes approximately a half-cycle around the x-axis before it damps and settles in the −z direction (+z direction). Therefore, irrespective of its initial direction, the magnetization vector is always reversed when it completes a half-cycle around the hard axis. The presence of both the STT-dominated and VCMA-dominated regimes in the same MTJ device has been demonstrated experimentally in many works, including ref.^{48}.
In principle, we can activate all the WLs in the memory array simultaneously, such that the entire memory array is flipped in a massively parallel manner. However, in practice the number of WLs that can be simultaneously activated is limited by the peripheral circuits and the current drivability of the drivers connected to the BLs and WLs. Nevertheless, multiple rows can be easily flipped in one cycle, resulting in a massively parallel stateful NOT operation. Further, one could argue that precise timing control of the voltage pulses is required for the proper functioning of the NOT operation and that, given circuit-level variations, the write-error-rate (WER) for the proposed NOT operation would be exceptionally high. It is worth mentioning that such errors can be mitigated by proper circuit techniques. In fact, the authors of ref.^{49} were able to obtain a WER as low as 1e-14 for precessional switching in VCMA MTJs. A detailed description of the peripheral circuits and write scheme used for mitigating the WER in precessional switching of VCMA MTJs can be found in ref.^{49}. Further, the WER for AP to P precessional switching is slightly different from that for P to AP precessional switching. The difference arises from the small current flowing through the VCMA MTJ, which favors one switching direction over the other. However, the difference is usually small and has been extensively studied in ref.^{41}. In summary, the precessional switching of VCMA MTJs can be used as a massively parallel NOT operation.
Other Logic Gates
We have already demonstrated that vector IMP and NOT operations can each be accomplished in one cycle. In principle, since IMP together with NOT forms a universal (functionally complete) set, the proposed scheme can be used to map any arbitrary Boolean computation. Moreover, since NOT is a massively parallel operation, the IMP operation can be combined with the NOT operation to achieve various other basic Boolean gates. For example, as shown in Fig. 5, stateful NAND/OR/NIMP logic operations can be accomplished in two cycles. Further, with three cycles, stateful AND/NOR operations can be computed using the proposed techniques. Note that, as opposed to the stateful IMP logic in memristive crossbars^{23}, the present proposal has a significant advantage: the NOT operation can be achieved in a massively parallel manner, and in the usual 1-transistor – 1-MTJ bit-cell, thereby enabling other stateful logic operations such as NAND and NOR.
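The cycle counts quoted above can be checked with a purely logical sketch that composes the two primitives, IMP (B’ = (NOT A) OR B, written into row B) and the row-parallel NOT. The sequences below are one possible set of orderings consistent with the stated two-cycle and three-cycle counts; they are an assumption and are not claimed to be the exact sequences of Fig. 5. Function names are hypothetical.

```python
def row_not(row):
    """One cycle: precessional NOT applied to a whole row."""
    return [1 - bit for bit in row]

def row_imp(row_a, row_b):
    """One cycle: stateful IMP, result overwrites row B."""
    return [(1 - a) | b for a, b in zip(row_a, row_b)]

def stateful_or(a, b):
    """Two cycles: NOT A, then IMP. Note that row A is left inverted."""
    a = row_not(a)
    return row_imp(a, b)

def stateful_nand(a, b):
    """Two cycles: NOT B, then IMP."""
    b = row_not(b)
    return row_imp(a, b)

def stateful_nimp(a, b):
    """Two cycles: IMP, then NOT on the result row."""
    return row_not(row_imp(a, b))

def stateful_and(a, b):
    """Three cycles: NOT B, IMP, NOT on the result row."""
    return row_not(stateful_nand(a, b))

def stateful_nor(a, b):
    """Three cycles: NOT A, IMP, NOT on the result row (row A left inverted)."""
    return row_not(stateful_or(a, b))

if __name__ == "__main__":
    A = [0, 0, 1, 1]
    B = [0, 1, 0, 1]
    for name, gate in (("OR", stateful_or), ("NAND", stateful_nand),
                       ("NIMP", stateful_nimp), ("AND", stateful_and),
                       ("NOR", stateful_nor)):
        print(name, gate(A, B))
```

In a real array the sequences that invert row A would leave operand A complemented in memory; an extra NOT cycle on that row would restore it if the original operand is still needed.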
Normal Memory Read and Write Operations
For the sake of completeness, note that the 1-transistor – 1-VCMA-MTJ array can still be used as a conventional memory block. In a conventional MRAM array, the source line (SL) and the bit line (BL) run parallel to each other, while the word line (WL) runs in the orthogonal direction^{16}. In the present proposal, however, the SL and the WL are parallel to each other, while the BL runs in the orthogonal direction, as shown in Figs 3 and 4. For a normal write operation (wherein particular data has to be stored in an MTJ), we would follow the ‘read before write’ scheme. The ‘read before write’ scheme for precessionally switched VCMA-based arrays has been discussed in detail in various previous works, including refs^{39,49}. The requirement for such a scheme can be understood as follows. Precessional switching can only reverse the direction of magnetization, provided the pulse duration is chosen so that the magnetization makes approximately a half cycle around the hard axis; it cannot set an MTJ to a chosen state irrespective of its initial state. This necessitates reading the data stored in the MTJ before a write pulse can be applied. After the MTJs are read, only those MTJs that are not in the desired state are reversed through the VCMA voltage.
Such a write operation can be easily accomplished in the array structure shown in Figs 3 and 4. To perform the write operation, the selected WL is activated by driving it to a high voltage. The data is first read by driving the SL to a voltage V_{ read } and sensing the resultant current on the respective BLs. Once the data is read, only those MTJs whose original state differs from the data to be written are switched. This is accomplished by pulling the SL to ground and applying a write voltage of proper duration on the BLs of the MTJs that have to be switched. For all other MTJs, the BL is grounded, thereby preventing any inadvertent switching.
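The ‘read before write’ sequence described above can be summarized in a short sketch. It is a purely logical illustration, not a circuit model: the function name `read_before_write` and the list-of-bits representation of a row are assumptions made for this example, and a toggle pulse is modeled as an ideal bit flip.

```python
def read_before_write(stored_row, desired_row):
    """Model a 'read before write' on one selected row.

    stored_row  -- bits currently held by the MTJs (result of the read step)
    desired_row -- bits that should be stored after the write
    Returns (new_row, pulse_mask), where pulse_mask[i] == 1 means the BL of
    column i receives a write pulse and 0 means that BL stays grounded.
    """
    if len(stored_row) != len(desired_row):
        raise ValueError("rows must have the same length")

    # Step 1 (read): compare the sensed data with the data to be written.
    pulse_mask = [s ^ d for s, d in zip(stored_row, desired_row)]

    # Step 2 (write): a precessional pulse can only toggle a bit, so only
    # the mismatching columns are pulsed; all others are left untouched.
    new_row = [s ^ p for s, p in zip(stored_row, pulse_mask)]
    return new_row, pulse_mask


if __name__ == "__main__":
    row, mask = read_before_write([0, 1, 1, 0], [1, 1, 0, 0])
    print("pulsed columns:", mask)
    print("row after write:", row)
```

The pulse mask is simply the bitwise XOR of the stored and desired data, which is why the read step cannot be skipped when the write mechanism is a toggle.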
Results
In this section, using the comprehensive simulation model described earlier, we evaluate the functionality and performance of the proposed in-memory vector computations. Note that, during magnetization switching, the resistance of the MTJ keeps changing, which in turn changes the voltage across the MTJ. Therefore, both the STT and the VCMA strengths are functions of the instantaneous direction of magnetization. To capture these effects properly, a self-consistent SPICE model like the one described in the earlier section on device modeling is required, as opposed to mixed-mode models that solve decoupled LLGS and resistance equations separately.
The vector IMP operations are performed using the STT-dominated switching of MTJs. In performing an IMP operation on vectors A and B, the current flows from the bitcells storing the bits of operand A to the bitcells storing the bits of operand B, eventually replacing vector B with the result of the bitwise IMP operation (refer to Fig. 3). Also, the negative-VCMA effect on the bitcells storing operand A prevents them from switching their state. Figure 6(a) shows the probability that B’s final state, which represents the result, is ‘1’ (or ‘H’ or AP) for the four possible A and B inputs ‘00’, ‘01’, ‘10’ and ‘11’, as a function of the applied voltage pulse width. The simulation is repeated over many runs in the presence of stochastic thermal variations. It can be observed that when the initial state of B is ‘H’ or AP (for inputs ‘01’ and ‘11’), the final state is also AP, irrespective of A’s state. This is because the direction of the current flow prevents B from switching from the AP to the P state. On the other hand, for the input ‘10’, B never switches its state, since the current flowing through the bitcells in this case is designed to be lower than the critical current required for STT switching, given that the voltage across MTJ2 is not high enough to sufficiently lower its EB. However, for the input ‘00’, B switches with a probability of ~1 for a voltage pulse width of ~25 ns, thus verifying the functionality and robustness of the bitwise IMP operation. The average per-bit energy consumption and the latency of the IMP operation are tabulated in Table 2.
While IMP uses STT-dominated switching, the NOT operation is primarily VCMA-dominated. As described earlier, the magnetization starts precessing around the hard axis when a sufficient voltage is applied across the MTJ (see Fig. 4(c)). Note that the V_{ DD } for the NOT operation is specifically chosen so as to ensure VCMA-dominated precessional dynamics. Figure 6(b) shows the switching probability as a function of voltage pulse width, in the presence of thermal variations. The switching probability shows an oscillatory behavior, since the final state of the MTJ depends on the direction of the magnetization vector at the instant when the voltage is turned off. Such an oscillating switching probability is typical of precessionally switched magnets. When the magnetization makes a half-cycle of precession (~2 ns) around the hard axis, a switching probability close to 1 is achieved, thus confirming the expected functionality of the NOT operation. The presented figure is for P-to-AP switching; a similar oscillating probability was also obtained for AP-to-P switching. Note that the NOT operation is massively parallel. Even multiple vectors can be inverted simultaneously, by activating the corresponding WLs and SLs of the bitcells. Table 2 lists the per-bit energy consumption and the latency of the NOT operation.
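The oscillatory dependence on pulse width can be illustrated with a deliberately simplified toy model. This is not the self-consistent LLGS and resistance model used for Fig. 6(b): it assumes an ideal, undamped precession in which the easy-axis component of the magnetization follows a cosine, takes the ~2 ns half-cycle from the text, and lumps all thermal and circuit variations into a Gaussian timing jitter whose width (`jitter_ns`) is an arbitrary illustrative value. All names are hypothetical.

```python
import math
import random

HALF_CYCLE_NS = 2.0  # from the text: a half-cycle of precession takes ~2 ns


def ends_switched(pulse_ns, half_cycle_ns=HALF_CYCLE_NS):
    """Toy picture: the easy-axis component follows cos(pi * t / half_cycle).

    The bit ends up flipped if that component is negative at the instant
    the voltage is turned off.
    """
    return math.cos(math.pi * pulse_ns / half_cycle_ns) < 0.0


def switching_probability(pulse_ns, jitter_ns=0.2, runs=2000, seed=0):
    """Monte Carlo estimate with Gaussian jitter on the effective pulse width.

    jitter_ns is an assumed stand-in for thermal and circuit variations.
    """
    rng = random.Random(seed)
    flipped = sum(
        ends_switched(pulse_ns + rng.gauss(0.0, jitter_ns)) for _ in range(runs)
    )
    return flipped / runs


if __name__ == "__main__":
    for width in (1.0, 2.0, 3.0, 4.0, 6.0):
        print(f"{width:.1f} ns -> P(switch) ~ {switching_probability(width):.2f}")
```

In this toy picture the probability peaks at odd multiples of the half-cycle and drops at even multiples, which is the qualitative behavior described above; the actual peak heights and widths depend on the device physics captured only by the full model.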
Before concluding, we note that the present proposal relies on both the VCMA and the STT effects to implement the ‘stateful’ computations. The key material parameters that are crucial for proper functioning of the proposed schemes are the VCMA coefficient and the TMR ratio. The higher the VCMA coefficient, the larger the change in the energy barrier in response to an applied voltage. Similarly, the higher the TMR, the larger the resistance difference between the parallel and the antiparallel states of the MTJ, and the stronger the control that the MTJ resistance exerts on the STT current flowing through the series MTJs of Fig. 3(c). As such, material stacks that exhibit higher TMR and a higher VCMA coefficient would be better suited for the proposed ‘stateful’ gates. The typical material parameters used in our simulations are listed in Table 1.
Conclusion
The conventional von-Neumann computing architecture fails to deliver the energy and throughput efficiency required by emerging data-intensive applications like artificial intelligence, IoT etc. Enabling in-memory computations is being hailed by the research community as a promising technique with the potential to go beyond the von-Neumann computing model. The basic idea driving such in-memory computations is to enable logic computations as close to the memory unit as possible. An extreme possibility is to do logic computations ‘in-situ’ in a stateful manner, wherein the same device acts as a storage element as well as a logic computation engine. In this manuscript, we have proposed in-situ, in-memory Boolean stateful computations by leveraging the very physics of voltage controlled magnetic anisotropy in MTJs. The voltage asymmetry of VCMA-based MTJs has been used to propose a stateful IMP operation, while the precessional switching dynamics has been exploited for constructing a massively parallel NOT operation. Further, various other gates including AND, OR, NAND, NOR and NIMP can be easily computed using multi-cycle operations. Our results have been verified by a detailed self-consistent magnetization dynamics and resistance model. In addition, the present proposal does not require any changes in the basic magnetic device or the bitcell circuit, thereby making it feasible from a manufacturability point of view.
References
 1.
Peirce, C. S. Letter, Peirce to A. Marquand. Writings of Charles S. Peirce 5, 541–543 (1993).
 2.
Shannon, C. E. A symbolic analysis of relay and switching circuits. Electrical Engineering 57, 713–723 (1938).
 3.
Bardeen, J. & Brattain, W. H. The transistor, a semiconductor triode. Physical Review 74, 230 (1948).
 4.
Lempel, O. 2nd generation Intel core processor family: Intel core i7, i5 and i3. In Hot Chips 23 Symposium (HCS), 2011 IEEE 1–48 (IEEE, 2011).
 5.
Von Neumann, J. The computer and the brain (Yale University Press, 2012).
 6.
Chen, C. P. & Zhang, C.-Y. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences 275, 314–347 (2014).
 7.
Emma, P. G. Understanding some simple processor-performance limits. IBM Journal of Research and Development 41, 215–232 (1997).
 8.
Kozyrakis, C. E. et al. Scalable processors in the billion-transistor era: IRAM. Computer 30, 75–78 (1997).
 9.
Zaharia, M. et al. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation 2–2 (USENIX Association, 2012).
 10.
Linn, E., Rosezin, R., Tappertzhofen, S., Böttger, U. & Waser, R. Beyond von Neumann—logic operations in passive crossbar arrays alongside memory operations. Nanotechnology 23, 305205 (2012).
 11.
Kang, M., Keel, M.-S., Shanbhag, N. R., Eilert, S. & Curewitz, K. An energy-efficient VLSI architecture for pattern recognition via deep embedding of computation in SRAM. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on 8326–8330 (IEEE, 2014).
 12.
Roy, K., Mukhopadhyay, S. & Mahmoodi-Meimand, H. Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits. Proceedings of the IEEE 91, 305–327 (2003).
 13.
Ghani, T. et al. Scaling challenges and device design requirements for high performance sub-50 nm gate length planar CMOS transistors. In VLSI Technology, 2000. Digest of Technical Papers. 2000 Symposium on 174–175 (IEEE, 2000).
 14.
Skotnicki, T., Hutchby, J. A., King, T.-J., Wong, H.-S. & Boeuf, F. The end of CMOS scaling: Toward the introduction of new materials and structural changes to improve MOSFET performance. IEEE Circuits and Devices Magazine 21, 16–26 (2005).
 15.
Huai, Y. Spin-transfer torque MRAM (STT-MRAM): Challenges and prospects. AAPPS Bulletin 18, 33–40 (2008).
 16.
Lin, C. et al. 45 nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell. In Electron Devices Meeting (IEDM), 2009 IEEE International 1–4 (IEEE, 2009).
 17.
Govoreanu, B. et al. 10 × 10 nm² Hf/HfOx crossbar resistive RAM with excellent performance, reliability and low-energy operation. In Electron Devices Meeting (IEDM), 2011 IEEE International 31–6 (IEEE, 2011).
 18.
Wong, H.-S. P. et al. Phase change memory. Proceedings of the IEEE 98, 2201–2227 (2010).
 19.
Nomura, K., Abe, K., Yoda, H. & Fujita, S. Ultra low power processor using perpendicular-STT-MRAM/SRAM based hybrid cache toward next generation normally-off computers. Journal of Applied Physics 111, 07E330 (2012).
 20.
Jain, S., Ranjan, A., Roy, K. & Raghunathan, A. Computing in memory with spin-transfer torque magnetic RAM. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26, 470–483, https://doi.org/10.1109/TVLSI.2017.2776954 (2018).
 21.
Kang, W., Wang, H., Wang, Z., Zhang, Y. & Zhao, W. In-memory processing paradigm for bitwise logic operations in STT-MRAM. IEEE Transactions on Magnetics (2017).
 22.
Noguchi, H. et al. A 250-MHz 256b-I/O 1-Mb STT-MRAM with advanced perpendicular MTJ based dual cell for nonvolatile magnetic caches to reduce active power of processors. In VLSI Technology (VLSIT), 2013 Symposium on, C108–C109 (IEEE, 2013).
 23.
Borghetti, J. et al. Memristive switches enable stateful logic operations via material implication. Nature 464, 873 (2010).
 24.
Zhang, H., Kang, W., Wang, L., Wang, K. L. & Zhao, W. Stateful reconfigurable logic via a single-voltage-gated spin Hall-effect driven magnetic tunnel junction in a spintronic memory. IEEE Transactions on Electron Devices 64, 4295–4301 (2017).
 25.
Mahmoudi, H., Windbacher, T., Sverdlov, V. & Selberherr, S. High performance MRAM-based stateful logic. In Ultimate Integration on Silicon (ULIS), 2014 15th International Conference on 117–120 (IEEE, 2014).
 26.
Mahmoudi, H., Windbacher, T., Sverdlov, V. & Selberherr, S. Implication logic gates using spin-transfer-torque-operated magnetic tunnel junctions for intrinsic logic-in-memory. Solid-State Electronics 84, 191–197 (2013).
 27.
Chowdhury, Z. et al. Efficient in-memory processing using spintronics. IEEE Computer Architecture Letters (2017).
 28.
Lyle, A. et al. Direct communication between magnetic tunnel junctions for nonvolatile logic fan-out architecture. Applied Physics Letters 97, 152504 (2010).
 29.
Lyle, A. et al. Magnetic tunnel junction logic architecture for realization of simultaneous computation and communication. IEEE Transactions on Magnetics 47, 2970–2973 (2011).
 30.
Fong, X. et al. KNACK: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells. In 2011 International Conference on Simulation of Semiconductor Processes and Devices 51–54 (IEEE, 2011).
 31.
Stiles, M. D. & Zangwill, A. Anatomy of spin-transfer torque. Physical Review B 66, 014407 (2002).
 32.
Jaiswal, A., Fong, X. & Roy, K. Comprehensive scaling analysis of current induced switching in magnetic memories based on in-plane and perpendicular anisotropies. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6, 120–133 (2016).
 33.
Alzate, J. G. et al. Voltage-induced switching of nanoscale magnetic tunnel junctions. In Electron Devices Meeting (IEDM), 2012 IEEE International 29–5 (IEEE, 2012).
 34.
Borisov, P., Hochstrat, A., Chen, X., Kleemann, W. & Binek, C. Magnetoelectric switching of exchange bias. Physical Review Letters 94, 117203 (2005).
 35.
Amiri, P. K. & Wang, K. L. Voltage-controlled magnetic anisotropy in spintronic devices. In Spin vol. 2, 1240002 (World Scientific, 2012).
 36.
Zhang, J., Lukashev, P. V., Jaswal, S. S. & Tsymbal, E. Y. Model of orbital populations for voltage-controlled magnetic anisotropy in transition-metal thin films. Physical Review B 96, 014435 (2017).
 37.
Kyuno, K., Ha, J.-G., Yamamoto, R. & Asano, S. First-principles calculation of the magnetic anisotropy energies of Ag/Fe (001) and Au/Fe (001) multilayers. Journal of the Physical Society of Japan 65, 1334–1339 (1996).
 38.
Wang, W.-G., Li, M., Hageman, S. & Chien, C. Electric-field-assisted switching in magnetic tunnel junctions. Nature Materials 11 (2012).
 39.
Sharmin, S., Jaiswal, A. & Roy, K. Modeling and design space exploration for bit-cells based on voltage-assisted switching of magnetic tunnel junctions. IEEE Transactions on Electron Devices 63, 3493–3500 (2016).
 40.
Zhao, Z., Smith, A. K., Jamali, M. & Wang, J.-P. External-field-free spin Hall switching of perpendicular magnetic nanopillar with a dipole-coupled composite structure. arXiv preprint arXiv:1603.09624 (2016).
 41.
Wang, S. et al. Comparative evaluation of spin-transfer-torque and magnetoelectric random access memory. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6, 134–145 (2016).
 42.
Gilbert, T. L. A phenomenological theory of damping in ferromagnetic materials. IEEE Transactions on Magnetics 40, 3443–3449 (2004).
 43.
d’Aquino, M. Nonlinear magnetization dynamics in thin-films and nanoparticles. PhD thesis, Università degli Studi di Napoli Federico II (2005).
 44.
Wang, Z. et al. Magnetization characteristic of ferromagnetic thin strip by measuring anisotropic magnetoresistance and ferromagnetic resonance. Solid State Communications 182, 10–13 (2014).
 45.
Jaiswal, A., Fong, X. & Roy, K. Comprehensive scaling analysis of current induced switching in magnetic memories based on in-plane and perpendicular anisotropies. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6, 120–133, https://doi.org/10.1109/JETCAS.2016.2547698 (2016).
 46.
Brown, W. F. Jr. Thermal fluctuations of a single-domain particle. Journal of Applied Physics 34, 1319–1320 (1963).
 47.
Predictive Technology Models. http://ptm.asu.edu/ (2016).
 48.
Kanai, S. et al. Magnetization switching in a CoFeB/MgO magnetic tunnel junction by combining spin-transfer torque and electric field-effect. Applied Physics Letters 104, 212406 (2014).
 49.
Noguchi, H. et al. Novel voltage controlled MRAM (VCM) with fast read/write circuits for ultra large last level cache. In Electron Devices Meeting (IEDM), 2016 IEEE International 27–5 (IEEE, 2016).
 50.
Ikeda, S. et al. A perpendicular-anisotropy CoFeB–MgO magnetic tunnel junction. Nature Materials 9, 721–724 (2010).
Acknowledgements
The research was funded in part by CBRIC, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA, the National Science Foundation, Intel Corporation and Vannevar Bush Faculty Fellowship.
Author information
Contributions
A.J. and K.R. conceived the idea, while A.A. designed the simulation framework and conducted the simulation experiments with help from A.J. All the authors helped in writing of the manuscript and discussing the results.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jaiswal, A., Agrawal, A. & Roy, K. In-situ, In-Memory Stateful Vector Logic Operations based on Voltage Controlled Magnetic Anisotropy. Sci Rep 8, 5738 (2018). https://doi.org/10.1038/s41598-018-23886-2