Experimental demonstration of an on-chip p-bit core based on stochastic magnetic tunnel junctions and 2D MoS2 transistors

Probabilistic computing is a computing scheme that offers a more efficient approach than conventional complementary metal-oxide–semiconductor (CMOS)-based logic in a variety of applications ranging from optimization to Bayesian inference, and invertible Boolean logic. The probabilistic bit (or p-bit, the base unit of probabilistic computing) is a naturally fluctuating entity that requires tunable stochasticity; by coupling low-barrier stochastic magnetic tunnel junctions (MTJs) with a transistor circuit, a compact implementation is achieved. In this work, by combining stochastic MTJs with 2D-MoS2 field-effect transistors (FETs), we demonstrate an on-chip realization of a p-bit building block displaying voltage-controllable stochasticity. Supported by circuit simulations, we analyze the three transistor-one magnetic tunnel junction (3T-1MTJ) p-bit design, evaluating how the characteristics of each component influence the overall p-bit output. While the current approach has not reached the level of maturity required to compete with CMOS-compatible MTJ technology, the design rules presented in this work are valuable for future experimental implementations of scaled on-chip p-bit networks with reduced footprint.


-Introduction
Computing is at a crossroads: just as the transistor scaling driven by Moore's Law has afforded improvements in conventional CMOS-based computing performance, there is an inevitable slowing down due to fundamental device limits 1. Furthermore, the inherently deterministic nature of conventional computing makes the current CMOS model unsuitable for contending with the continued future growth of applications such as neuromorphic computing and Artificial Intelligence (AI) 2.
A superior approach is that of probabilistic computing. In probabilistic computing, the key component is the probabilistic bit (or p-bit), a unit that fluctuates randomly, but controllably, between 0 and 1 3. Indeed, a network of such p-bits can leverage their stochastic nature to function as efficient hardware accelerators for solving complex problems that are themselves inherently probabilistic. These problems, which lie at the core of many real-world machine learning applications and AI algorithms, range in nature from combinatorial optimization problems (such as integer factorization) to recognition and classification [4][5][6][7][8][9][10][11][12][13][14][15][16].
At its core, a p-bit requires a tunable stochastic element. While this can be implemented with standard CMOS technology [17][18][19] at the cost of a significant device overhead, the resulting p-bit suffers from a large area and energy footprint and does not offer true randomness 20.
An ultra-compact, scalable and energy-efficient approach to tunable randomness, which yields the desired sigmoidal input/output characteristics, is achieved by exploiting the physics of low-barrier fluctuating nanomagnets coupled with existing Magnetic Tunnel Junction (MTJ) technology. Such p-bit implementations using stochastic MTJs have been shown [21][22][23][24][25], but as yet, the proof-of-concept implementations have required field-programmable gate arrays (FPGAs) or external circuitry, with orders of magnitude more transistors involved than needed in the 3T-1MTJ p-bit design explored in this work.
In this work, the first experimental on-chip demonstration of the core of a p-bit, exhibiting tunable stochasticity, is reported. Using a variation of the 3T-1MTJ design proposed by Camsari et al. 26, a stochastic MTJ is integrated with a high-performance MoS2 transistor next to each other on the same chip, experimentally showing for the first time the desired gate-controlled fluctuations at room temperature. Moreover, this article elucidates the impact and interaction of the various critical device characteristics shown in Figure 1(a), including those of i) the MTJ, ii) the transistor that is part of the p-bit core and iii) the inverter. It is found that, against common wisdom, a large Tunnel Magnetoresistance (TMR) is not the best choice for p-bits; telegraphic fluctuations are highly undesirable and are a sign of a slow device; matching of the MTJ resistance and the transistor characteristics is crucial; and an ideal inverter with a large gain is incompatible with the desired p-bit operation.
In detail, this article is organized as follows: Section 2 briefly introduces the stochastic MTJ, the key element of the p-bit. Next, Section 3 shows the actual hardware demonstration of the p-bit core, focusing on the matching conditions between the MTJ and the transistor. This is followed by a detailed discussion of the impact of Tunnel Magnetoresistance (TMR) and the distribution of states on the p-bit performance in Section 4. Last, in Section 5, the impact of the inverter characteristics is discussed.

-Implementing probabilistic bits (p-bits) with stochastic MTJs
At its core, a Magnetic Tunnel Junction (MTJ) consists of two ferromagnetic layers separated by an ultrathin insulating layer (Figure 1(b)). The "fixed" layer, which has the stronger magnetic moment, is used as the reference for the "free" layer, whose magnetic moment is more susceptible to being switched. Important MTJ parameters are the Tunnel Magnetoresistance (TMR), which describes the difference in resistance between the parallel (P) and antiparallel (AP) arrangement of the two magnetic layers, and the energy barrier of the free layer, EB, which needs to be overcome to toggle between the two resistance states [27][28][29].
For stable MTJs, such as those used in spin-transfer torque magnetic random access memory (STT-MRAM) applications 30, energy barriers are large, and when the resistance is measured as an external magnetic field is swept, the resulting minor loop exhibits deterministic switching of the free layer. Figure 1(c) shows an example minor loop of a fabricated MTJ that was observed to be stable.
If this energy barrier is made smaller, through material changes or shape scaling 31, the ambient thermal energy may be sufficient for the free layer to switch stochastically between the two resistance states (Figure 1(d)). When biased at the center of this window, the signal is shown to be a naturally fluctuating output whereby the time spent in each resistance state (known as the dwell time, $\tau$) may be described by the equation:
$$\tau = \tau_0 \exp\left(\frac{E_B}{k_B T}\right)$$
where $k_B$ is the Boltzmann constant, $T$ is the temperature and $\tau_0$ is the "attempt time", a material-dependent constant that is ~1ns 32. For in-plane stochastic MTJs, dwell times down to ~5ns have been demonstrated 33,34.
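As a rough numerical illustration of the dwell-time relation above, the following Python sketch evaluates it in both directions. The function names are hypothetical, the barrier is expressed in units of the thermal energy (E_B divided by k_B T), and the default attempt time of 1ns is the approximate value quoted in the text rather than a measured parameter.

```python
import math

def dwell_time(barrier_in_kT: float, tau0: float = 1e-9) -> float:
    """Mean dwell time tau = tau0 * exp(E_B / (k_B T)).

    barrier_in_kT is the energy barrier divided by k_B*T (dimensionless);
    tau0 is the attempt time in seconds (assumed ~1 ns here).
    """
    return tau0 * math.exp(barrier_in_kT)

def barrier_for_dwell_time(tau: float, tau0: float = 1e-9) -> float:
    """Invert the relation: E_B / (k_B T) = ln(tau / tau0)."""
    return math.log(tau / tau0)

if __name__ == "__main__":
    # Illustrative values only: a 40 k_B*T barrier versus a low-barrier magnet.
    for barrier in (40.0, 10.0, 2.0):
        print(f"E_B = {barrier:4.1f} kT -> tau = {dwell_time(barrier):.3e} s")
```

Because the dependence is exponential, modest changes in the barrier move the dwell time across many orders of magnitude, which is why shape scaling or material changes can turn a stable MTJ into a stochastic one.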
For p-bit applications, this source of natural stochasticity is ideal; by coupling a stochastic MTJ with an access transistor, and including an inverter for amplification, a compact voltage-controlled p-bit design is achieved (Figure 1(a)) 26.
The theoretical output from such a p-bit implementation, generated using modified experimental data from stochastic MTJs (Figure 1(e)) and circuit simulations of transistor behavior (Figure 1(f)), is shown before (Figure 1(g)) and after (Figure 1(h)) the inverter's amplification. (For more details regarding the use of experimental data in the circuit simulations, please see Supplementary Information 1.) The core of the p-bit, which includes the stochastic MTJ and NMOS transistor, provides the tunable stochasticity, while the inverter provides the thresholding and amplification of the stochastic signal. The resulting sigmoidal output allows for pinning at low and high input voltages, while exhibiting the desired output fluctuations in the transition region.
The tunability in the output is controlled by varying the transistor gate voltage (VIN), where changes in the resistance of the transistor relative to that of the MTJ change the voltage at the inverter's input. This voltage is then amplified by the inverter, allowing the output to be pinned to output-low for low VIN, and to output-high for high VIN. In the middle region, the stochastic resistance fluctuations from the MTJ manifest as tunable random voltage fluctuations in the p-bit output.
This design is discussed further in the following section, which shows the experimental realization of the p-bit core using a stochastic MTJ and a 2D-MoS2 transistor.

-Experimental demonstration of a stochastic on-chip p-bit core with integrated stochastic MTJs & 2D FETs
For this demonstration, MTJ devices were first fabricated, and those possessing sufficient TMR for a large read signal were then interconnected with appropriately resistance-matched field-effect transistor (FET) devices in a 1T-1MTJ configuration. It is desirable to choose the transistor such that the on-state FET resistance is at least two orders of magnitude smaller than the MTJ's low-resistance state, RP, and the off-state FET resistance is at least two orders of magnitude larger than the MTJ's high-resistance state, RAP, to attain the maximum swing in the output voltage.
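The two-orders-of-magnitude matching rule can be checked with a simple resistive-divider estimate. The Python sketch below assumes the MTJ sits between VD and the inverter-input node and the FET between that node and ground, treats both as plain resistors, and uses hypothetical resistance values (RP = 10 kΩ, RAP = 15 kΩ) that are not taken from the measured devices.

```python
def inverter_input_voltage(v_d: float, r_fet: float, r_mtj: float) -> float:
    """Voltage at the MTJ/FET node for an MTJ tied to V_D and a FET tied to ground."""
    return v_d * r_fet / (r_fet + r_mtj)

def output_swing(v_d: float, r_p: float, r_ap: float, r_on: float, r_off: float):
    """Return (lowest, highest) node voltage reachable by switching the FET.

    Worst cases are used: MTJ in the P state when the FET is on,
    MTJ in the AP state when the FET is off.
    """
    v_low = inverter_input_voltage(v_d, r_on, r_p)
    v_high = inverter_input_voltage(v_d, r_off, r_ap)
    return v_low, v_high

if __name__ == "__main__":
    V_D = 0.2                  # drain bias (V), the value used in the experiment
    R_P, R_AP = 10e3, 15e3     # hypothetical MTJ resistances (ohm)
    v_low, v_high = output_swing(V_D, R_P, R_AP, r_on=R_P / 100, r_off=100 * R_AP)
    print(f"node voltage spans {v_low * 1e3:.2f} mV to {v_high * 1e3:.2f} mV")
```

Under these assumptions the node voltage covers roughly 98% of VD; a smaller resistance ratio on either side shrinks the swing correspondingly.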
Figure 2(a) shows a schematic of the 1T-1MTJ configuration for the on-chip p-bit core. The detailed stack structure for the MTJs used in this demonstration is shown in Figure 2(b). The magnetic layer (CoFeB) thicknesses were chosen to best yield MTJs with in-plane anisotropy for two reasons: MTJs with in-plane anisotropy have been shown to be more resistant to Spin Transfer Torque (STT)-pinning 35, and have also been shown to fluctuate on time scales that are orders of magnitude faster than those of perpendicular-anisotropy MTJs 32,34,36. The interdigitated (IDT) monolayer (ML) MoS2 FETs are then fabricated alongside the completed MTJ devices. The cross-section of the FET is shown in Figure 2(e), while an SEM image of a fabricated IDT ML MoS2 FET is shown in Figure 2(f), where a single IDT FET includes 20 sets of source/drain contacts, with Lch~150nm and Wch~6.5μm, for a total effective channel width of 130μm. ML MoS2 is chosen as the channel material of the drive transistor due to the low thermal budget of its fabrication process (which helps preserve the performance of the fabricated stochastic MTJs, which suffer shorting in the SiO2 isolation layer at temperatures above ~400˚C), its low contact resistance 37, its large bandgap (1.8eV), the high on-state performance of scaled 2D-MoS2 FETs 38 and the good electrostatic control achievable with ML MoS2.
The dwell time of the stochastic MTJ used in the integrated p-bit, a quantity that determines the speed at which a p-bit may operate, is calculated as the harmonic mean of τAP and τP and is 695ms (details on the dwell time extraction and the quality of randomness can be found in Supplementary Information 2).
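A minimal Python sketch of how a single device dwell time could be obtained from a binarized resistance trace is given below. It is not the extraction procedure of Supplementary Information 2: instead of fitting an exponential envelope to the dwell-time histograms, it uses the sample mean of the dwell durations (the maximum-likelihood estimate for an exponential distribution), and the function names and state encoding (0 for P, 1 for AP) are assumptions made for illustration.

```python
from statistics import harmonic_mean, mean

def dwell_durations(states, dt):
    """Split a binarized time series (0 = P, 1 = AP) into dwell durations per state.

    states: sequence of 0/1 samples; dt: sampling interval in seconds.
    The final run is cut off by the end of the record and is discarded;
    the first run is kept for simplicity.
    """
    durations = {0: [], 1: []}
    run_state, run_length = states[0], 1
    for s in states[1:]:
        if s == run_state:
            run_length += 1
        else:
            durations[run_state].append(run_length * dt)
            run_state, run_length = s, 1
    return durations

def device_dwell_time(states, dt):
    """Return (tau_P, tau_AP, harmonic mean of the two)."""
    d = dwell_durations(states, dt)
    tau_p, tau_ap = mean(d[0]), mean(d[1])
    return tau_p, tau_ap, harmonic_mean([tau_p, tau_ap])

if __name__ == "__main__":
    trace = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]  # toy trace, not measured data
    print(device_dwell_time(trace, dt=1.0))
```

The harmonic mean weights the shorter of the two dwell times more heavily, so a device that lingers in one state is not credited with the speed of the other.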
The transfer characteristics of 24 as-fabricated IDT ML MoS2 FETs are seen in Figure 3(d), showing a narrow variation in the threshold voltage, while the benefits of the IDT structure are seen in the high current levels and on/off ratios. The on-current level is around 0.6mA at VDS = 0.1V and the on/off ratio is ~10^10, with a minimum subthreshold slope (SS) of around 94 mV/dec. Note that the scaled devices operate at gate voltages on the order of 1V, which is critical for the ultimate p-bit implementation to ensure that VIN and VOUT are of the same scale.
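For reference, the subthreshold slope quoted above is the gate-voltage change needed for a tenfold change in drain current. The Python sketch below extracts the minimum SS from a sampled transfer curve using finite differences between adjacent points; the function name is hypothetical and the data in the usage example are synthetic, not the measured characteristics of Figure 3(d).

```python
import math

def min_subthreshold_slope(v_gate, i_drain):
    """Minimum subthreshold slope in mV/dec from a sampled transfer curve.

    SS = dV_G / d(log10 I_D), evaluated between adjacent points where the
    current increases with gate voltage. Currents must be positive.
    """
    slopes = []
    points = list(zip(v_gate, i_drain))
    for (v0, i0), (v1, i1) in zip(points, points[1:]):
        decades = math.log10(i1) - math.log10(i0)
        if decades > 0:
            slopes.append((v1 - v0) / decades * 1e3)
    return min(slopes)

if __name__ == "__main__":
    # Synthetic ideal-exponential curve with a 94 mV/dec slope.
    v = [i * 0.01 for i in range(50)]
    i_d = [1e-12 * 10 ** (x / 0.094) for x in v]
    print(f"SS = {min_subthreshold_slope(v, i_d):.1f} mV/dec")
```

On measured data the adjacent-point differences are noisy, so smoothing or a linear fit over the steepest decade would usually be preferred to the raw minimum.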
Following the characterization of devices, a Ti/Au interconnect is fabricated between the MTJ and MoS2-FET pair observed to have the best resistance match and stochastic signal. It is observed, however, that after the integration of the MTJ and FET, there is a degradation in the transistor performance, as shown in Figure 3(e), including a degraded on/off ratio and SS. This is not a result of connecting the FET with the MTJ, but is likely due to process-induced trap charges in the HfO2 gate oxide that produced an aging effect, whereby the FET characteristics were observed to degrade over time for this device 40.
A circuit schematic of this 1T-1MTJ p-bit core is shown in Figure 3(f), while an optical microscope image of the finished device is shown in Figure 3(g).(To better understand the choice of VD and the impact of large current densities through the MTJ, see Supplementary Information 3).
For this measurement, the MTJ is biased at its 50-50 point (as seen in Figure 3(a)) and VINVERTER INPUT is measured 200 times at each input voltage value, VIN, to demonstrate the impact of the stochastic fluctuations on the p-bit core's output.
To compensate for the transistor degradation in the integrated p-bit core, VIN had to be significantly increased, which will not be required in a further optimized p-bit implementation. At large negative VIN, when the transistor is in its highly resistive OFF-state, the potential at VINVERTER INPUT is close to VD. Increasing VIN yields a decrease in the transistor's resistance, resulting in a reduction in VINVERTER INPUT as the transistor approaches its threshold voltage, VTH.
For this device, the leftward shift of the degraded transistor's threshold voltage, VTH, results in a leftward shift of the overall sigmoid while the degradation in the transistor's off-state resistance (shown in Figure 3(e)) results in the output not being fully pinned to VD (see Supplementary Information 5 for off-chip p-bit core implementations with better resistance-matching and better VIN-VOUT matching between the constituent MTJ-FET pair, illustrating that the non-idealities in the on-chip demonstration discussed here are a result of process modules and not a fundamental issue).
The impact of the MTJ's fluctuations also becomes increasingly clear in the p-bit core output as VIN is increased, with the magnitude of fluctuations observed at a maximum when the resistances of the transistor and the MTJ are approximately equal, and an equal voltage is dropped across both components. The red inset in Figure 3(h) reveals a significant voltage drift in the output due to charge traps from the degradation of the transistor gate oxide and its impact on the subthreshold slope.
A further increase in VIN to the transistor's ON-state, where the resistance of the transistor is less than that of the MTJ, sees the output approach 0V. The output here still shows the fluctuations from the MTJ, but at a much smaller scale (green inset, Figure 3(h)). This is beneficial, as any STT-pinning effects from the large currents at this input voltage, which could bias the 50-50 fluctuations of the MTJ, do not significantly impact the output of the p-bit core (Supplementary Information 3 shows how large current densities through the MTJ can result in STT-pinning).
In this way, this demonstration of a scaled on-chip p-bit core is shown to produce the desired sigmoidal output with the tunable stochasticity that is required for probabilistic computing. A desirable feature of the sigmoid is that it is centered around VIN = VD/2, such that VIN and VOUT may be of similar scales, and the output of one p-bit may be fed into the input of another p-bit to create correlated p-bit networks. This may be achieved by implementing a dual-gated transistor design, whereby the threshold voltage may be shifted to the desired region through the application of an additional top-gate voltage (demonstrated in Supplementary Information 4).
This demonstration also illustrates the impact the transistor has on the p-bit's output. For example, the subthreshold slope (SS) determines the steepness of the sigmoid (a steeper SS would yield a steeper sigmoid), and how well the transistor is resistance-matched with the MTJ determines how much of the VD range the output sigmoid spans and whether the output can be pinned. Moreover, the location of the threshold voltage is critical in determining the centroid of the overall sigmoid (as shown in Supplementary Information 5).
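The dependence of the fluctuation magnitude on the transistor resistance can be reproduced with a resistive-divider estimate. The Python sketch below again assumes the MTJ between VD and the inverter-input node and the FET between that node and ground, with hypothetical resistances RP = 10 kΩ and RAP = 15 kΩ; it scans the FET resistance and reports where the voltage difference between the P and AP states is largest.

```python
import math

def node_voltage(v_d, r_fet, r_mtj):
    """Inverter-input voltage of the MTJ/FET divider (MTJ to V_D, FET to ground)."""
    return v_d * r_fet / (r_fet + r_mtj)

def fluctuation_amplitude(v_d, r_fet, r_p, r_ap):
    """Voltage difference at the node between the MTJ's P and AP states."""
    return node_voltage(v_d, r_fet, r_p) - node_voltage(v_d, r_fet, r_ap)

def best_fet_resistance(v_d, r_p, r_ap):
    """Scan a logarithmic grid (100 ohm to 10 Mohm) for the largest fluctuation."""
    grid = [10 ** (2 + 5 * i / 500) for i in range(501)]
    return max(grid, key=lambda r: fluctuation_amplitude(v_d, r, r_p, r_ap))

if __name__ == "__main__":
    V_D, R_P, R_AP = 0.2, 10e3, 15e3  # hypothetical values
    r_best = best_fet_resistance(V_D, R_P, R_AP)
    print(f"largest fluctuation at R_FET ~ {r_best:.0f} ohm")
    print(f"geometric mean of R_P and R_AP: {math.sqrt(R_P * R_AP):.0f} ohm")
    print(f"amplitude with the FET fully on (100 ohm): "
          f"{fluctuation_amplitude(V_D, 100.0, R_P, R_AP) * 1e3:.2f} mV")
```

In this simplified model the maximum falls at the geometric mean of RP and RAP, consistent with the observation that fluctuations peak when the two resistances are approximately equal and shrink once the transistor is fully on.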

-Influence of MTJ characteristics on the p-bit output
To study the impact of an MTJ's characteristics on a p-bit's output, experimental data from stochastic MTJs are used as input for circuit simulations, conducted using the Spectre Simulation Platform. A 3T-1MTJ model of the p-bit is used (Figure 4(a)), with additional bias points available at the body bias for the NMOS and PMOS transistors of the inverter for tuning of inverter characteristics (further information about data handling, and the transistors that are part of the p-bit circuitry, is provided in Supplementary Information 1).
Two key properties of an MTJ are investigated: the MTJ's TMR and the MTJ's distribution of resistance states. An ideal p-bit output in a 3T-1MTJ configuration would be a smooth sigmoidal function with a wide region of fluctuations, at the center of which are rail-to-rail fluctuations that could be used to drive other p-bits in a network of such devices.
Figure 4(b) shows the p-bit output for three MTJs fluctuating at the same frequency, but with TMR ratios scaled to different values (Supplementary Information 6 describes how this TMR-scaling was performed using actual measured MTJ fluctuations). The dotted line shows the time-averaged VOUT at each VIN, while the shaded background shows the instantaneous output as VIN is swept linearly from 0 to 1.8V.
The largest TMR device (300%, blue) has the widest stochastic region and rail-to-rail fluctuations, but also shows a plateau in the time-averaged curve. These plateaus, or the pinning of the output over a range of input voltages, are non-ideal for concatenation purposes as they reduce the tunability of an individual p-bit's fluctuations with changes in its input from other p-bits in the network.
In contrast, the smallest TMR device (15%, black) has no plateaus but has a narrow range over which the fluctuations are visible. This is undesirable as it limits the VIN range in which usable fluctuations are observed, with the p-bit output primarily in the output-low or output-high state. To understand this behavior, consider Figure 4(c).
Figure 4(c) shows, for increasing VIN applied to the transistor's gate, the distributions of values at the inverter's input for each of the p-bits made with MTJs of differing TMRs, along with the voltage transfer curve (VTC) of the inverter (overlaid in green). The largest TMR device (300%, blue), with the largest resistance fluctuation, has the widest spread of values for Inverter Input, while the smallest TMR device (15%, black) has the narrowest distribution (Supplementary Information 7 provides further explanation of these voltage distributions).
For VIN = 0.8V (left graph), the value at the inverter's input is centered around VINVERTER INPUT ≈ 1.2V, such that VOUT is within output-low, i.e. close to zero, on the VTC for both the 15% and 80% TMR devices. However, the 300%-TMR device has a sufficient number of states in the bottom arm of its VINVERTER INPUT distribution (blue) lying between the noise margin regions of the inverter's VTC, such that the average VOUT for the 300%-TMR device is shifted to a larger value of ~320mV (Figure 4(b)).
As VIN is increased, the transistor connected to the stochastic MTJ becomes more conducting, and the center of each distribution shifts to smaller VINVERTER INPUT values. For VIN = 0.98V (right graph in Figure 4(c)), the 15%-TMR device (black) has inverter input values such that it interacts primarily with the output-high section of the VTC, giving an average VOUT that is pinned close to 1.7V (Figure 4(b)).
In contrast, the 300%-TMR device has a larger range of inverter input values that spans between the noise margin regions of the inverter. This results in the plateau effect, where changing the input voltage does not yield a meaningful change in average VOUT, as the TMR is large enough for the distribution to span both the output-high and output-low regions of the VTC for a range of VIN, and inverter input, values.
To summarize, the smaller the TMR, the smaller the section of the VTC that is 'sampled' by the inverter input distribution, and the smaller the range of VOUT over which the values are averaged. This results in a smoother averaged output that is more sigmoidal and less prone to plateauing. However, the VIN range over which the stochastic fluctuations are observed is small, limited to the range between the output-high and output-low regions of the VTC, where the gain is non-zero. This means that for a small-TMR device, rail-to-rail fluctuations are not observed at all. Although it has been shown that rail-to-rail fluctuations are not necessary over the entire fluctuating range 26, the diminished output fluctuation range would make it difficult to form networks with small-TMR p-bits due to the insufficient voltage drive they would provide to the next p-bit. A large-TMR device is good for attaining rail-to-rail output voltages, such as at VIN = 0.9V (middle graph) where the 300%-TMR device shows an output spanning 0 to 1.8V, but is prone to the plateauing effect if the device's inverter input distribution spans the output-high and output-low regions of the VTC for an extended range of VIN values. This is a key finding: for a given inverter, the TMR should not be so high that the inverter input distribution spans the output-low and output-high regions of the inverter for a large VIN range. A 'perfect' inverter with infinite gain would be undesirable for p-bit applications, as even an MTJ with a small TMR would produce a step-like plateau in the output.
Another characteristic that affects the p-bit's output is the MTJ's distribution of states. An MTJ with a very bimodal distribution is more prone to plateaus in the output, especially if the TMR is large enough for the fluctuations in the inverter's input to sample both output-high and output-low regions of the VTC. In contrast, an MTJ with a very continuous distribution, with the ideal being a uniform distribution between RP and RAP, would sample each value of the VTC equally and would give a much smoother sigmoidal output 41.
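The plateau mechanism summarized above can be illustrated with a toy behavioral model in Python. This is not the Spectre simulation used in this work: the FET resistance law, the tanh-shaped inverter VTC, the value of RP and the perfectly bimodal 50/50 MTJ are all assumptions chosen only to make the effect visible.

```python
import math

VDD = 1.8    # supply voltage (V), matching the simulated voltage range
R_P = 10e3   # hypothetical parallel-state MTJ resistance (ohm)

def fet_resistance(v_in):
    """Hypothetical FET model: one decade of resistance per 100 mV, equal to R_P at 0.9 V."""
    return R_P * 10 ** (-(v_in - 0.9) / 0.1)

def inverter(v_node, gain_per_volt=20.0, v_mid=VDD / 2):
    """Hypothetical smooth VTC: output-low for a high input and output-high for a low input."""
    return 0.5 * VDD * (1.0 - math.tanh(gain_per_volt * (v_node - v_mid)))

def average_output(v_in, tmr):
    """Time-averaged V_OUT for an MTJ spending half its time in P and half in AP."""
    r_fet = fet_resistance(v_in)
    outputs = []
    for r_mtj in (R_P, R_P * (1.0 + tmr)):
        v_node = VDD * r_fet / (r_fet + r_mtj)  # divider voltage at the inverter input
        outputs.append(inverter(v_node))
    return sum(outputs) / 2.0

def plateau_width(tmr, step=0.001, band=0.05):
    """V_IN span (V) over which the averaged output stays within band*VDD of VDD/2."""
    grid = [0.6 + i * step for i in range(601)]
    inside = [v for v in grid if abs(average_output(v, tmr) - VDD / 2) < band * VDD]
    return len(inside) * step

if __name__ == "__main__":
    for tmr in (0.15, 0.80, 3.00):
        print(f"TMR = {tmr * 100:3.0f}% -> plateau width ~ {plateau_width(tmr) * 1e3:.0f} mV")
```

In this sketch the averaged output of the high-TMR device lingers near VDD/2 over a much wider VIN range than that of the low-TMR device, which is the plateau behavior described in the text; the absolute widths depend entirely on the assumed model parameters.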
A further key finding of this work is that there appears to be a correlation between the distribution of resistance states and the speed at which these in-plane MTJs fluctuate. To quantify how bimodal an MTJ's resistance fluctuations are, a new figure-of-merit, the 'distribution factor', is introduced. Using the normalized resistance output of an MTJ, histograms are created where the counts in the 8 'edge' state bins are divided by the counts in the 8 'middle' state bins. For statistical significance, the total number of data points is the same in each data set. Figures 5(a) and 5(b) show this process for two MTJs of different dwell times (τ = 117μs and τ = 27ms, respectively). Figure 5(c) shows this distribution factor calculated for 23 stochastic MTJs, made with the same stack material, with dwell times spanning orders of magnitude (Supplementary Information 8 explains in greater detail why it is meaningful and justified to use the distribution factor as a key metric).
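One possible way to compute the distribution factor from a resistance trace is sketched below in Python. The text specifies 8 edge bins and 8 middle bins; the sketch assumes these come from a 16-bin histogram of the normalized resistance, with the 4 lowest and 4 highest bins counted as edge states and the central 8 as middle states. That binning layout, the min/max normalization and the function name are assumptions, not details taken from this work.

```python
import math

def distribution_factor(resistance, n_bins=16, n_edge_per_side=4):
    """Edge-state counts divided by middle-state counts of a normalized histogram.

    resistance: sequence of resistance samples (any units).
    Samples are normalized to [0, 1] using the trace minimum and maximum.
    Returns math.inf if no sample falls in the middle bins.
    """
    r_min, r_max = min(resistance), max(resistance)
    counts = [0] * n_bins
    for r in resistance:
        x = (r - r_min) / (r_max - r_min)
        index = min(int(x * n_bins), n_bins - 1)  # put x == 1.0 in the last bin
        counts[index] += 1
    k = n_edge_per_side
    edge = sum(counts[:k]) + sum(counts[-k:])
    middle = sum(counts[k:-k])
    return edge / middle if middle else math.inf

if __name__ == "__main__":
    uniform = [i / 1599 for i in range(1600)]          # evenly spread states
    bimodal = [0.0] * 450 + [1.0] * 450 + [0.5] * 100  # mostly edge states
    print(distribution_factor(uniform), distribution_factor(bimodal))
```

With this definition a uniform distribution gives a factor of 1, and increasingly telegraphic (bimodal) signals give progressively larger values.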
It is observed that the faster the MTJ fluctuates, the more middle states there are in the resistance distribution, and the less bimodal the distribution is. The dotted line is a guide to the eye which suggests that, for this material stack, a uniform distribution with equal edge- and middle-state counts would be achieved for MTJs with fluctuations in the tens of ns regime. This correlation suggests that a faster device, with a more continuous distribution, would yield a smoother sigmoidal output. This is tested with two devices of different dwell times, τ = 117μs and τ = 27ms, that are scaled to the same TMR and use the same inverter. Figure 5(d) shows that the faster device (117μs, red), which has a smaller distribution factor and is less bimodal, yields a p-bit output that is more ideal than the output from the slower device (27ms, orange), which suffers from the plateauing effect described previously. This is another key finding in that a faster MTJ has a two-fold advantage: firstly, the faster the fluctuation and speed of random number generation, the faster the p-bit may operate asynchronously, and secondly, the faster the MTJ, the more uniform the distribution of states is observed to be, and the more ideal the p-bit's output is. Thus, it is this interplay of the MTJ's TMR and the distribution of states, along with the inverter's properties, that determines how ideal a p-bit's output is.

-Influence of inverter characteristics on the p-bit output
The inverter also offers a degree of control over the p-bit's output. Figure 6(a) shows the voltage transfer curve (VTC) for two inverters: one without applied body bias, called "pristine" (black curve), and the other which has been tuned, through application of a positive body bias to the NMOS-FET, to have a smaller gain (red curve).
Using the same MTJ and transistor, Figure 6(b) shows the impact of this inverter tuning on a p-bit's output: the tuned inverter (red), with the smaller gain, shows a smoother sigmoid, while the pristine inverter, with the larger gain, shows a more pronounced undesirable plateau in the output. This is because, for a given MTJ with a bimodal distribution, the distribution of voltages at the inverter's input is less likely to span the output-low and output-high regions for an extended range of VIN (the cause of the undesirable plateaus) if the VTC is shallower and the gain is small.
However, the tuned inverter also suffers from a degradation in the noise margin, seen in Figure 6(a), which decreases the size of the p-bit's output fluctuation range. This is because the body bias at the NMOS transistor shifts its threshold voltage, lowering the channel resistance and making it harder to pin to output-high, VD, for large VIN.
This issue could be mitigated by using a more aggressively scaled technology node for the inverter than the 180nm node used here. A 14nm ultrascaled Fin-FET inverter (as used in previous p-bit simulation work 26,41,42), which provides a more piecewise-linear VTC that offers a lower gain (for a smoother sigmoidal output) and a wide noise margin to pin the output to VD at high input voltages, would be desirable.
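The link between inverter gain and plateauing can be made concrete with a small Python sketch. It assumes a tanh-shaped VTC and a perfectly bimodal inverter-input voltage that sits half the time at each of two levels separated by a fixed spread; both are simplifications introduced here, and the gain values are illustrative rather than those of the pristine and tuned inverters of Figure 6(a).

```python
import math

VDD = 1.8  # supply voltage (V)

def inverter(v_node, gain_per_volt, v_mid=VDD / 2):
    """Hypothetical smooth inverter VTC with an adjustable transition steepness."""
    return 0.5 * VDD * (1.0 - math.tanh(gain_per_volt * (v_node - v_mid)))

def average_output(v_center, spread, gain_per_volt):
    """Mean V_OUT when the inverter input sits half the time at each of two levels."""
    low, high = v_center - spread / 2, v_center + spread / 2
    return 0.5 * (inverter(low, gain_per_volt) + inverter(high, gain_per_volt))

def midpoint_slope(spread, gain_per_volt, dv=1e-4):
    """Magnitude of the averaged-output slope at mid-transition (central difference)."""
    up = average_output(VDD / 2 + dv, spread, gain_per_volt)
    down = average_output(VDD / 2 - dv, spread, gain_per_volt)
    return abs(up - down) / (2 * dv)

if __name__ == "__main__":
    SPREAD = 0.3  # hypothetical separation of the two input levels (V)
    for gain in (20.0, 4.0):
        print(f"gain parameter {gain:4.1f} /V -> "
              f"midpoint slope {midpoint_slope(SPREAD, gain):.3f} V/V")
```

A nearly flat midpoint slope corresponds to a plateau: with the steep VTC both input levels land on saturated parts of the curve, whereas the shallower VTC keeps the averaged output responsive to the input throughout the transition.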

-Conclusion
In this work, the first experimental realization of an on-chip p-bit core is demonstrated, using a stochastic in-plane MTJ integrated with a 2D-MoS2 transistor in a 1T-1MTJ structure. Through experimental demonstration and circuit simulations, it is shown how each component of the p-bit influences the overall output.
For the transistor, a good resistance match with the MTJ and a threshold voltage close to VD/2 are required to achieve a well-centered sigmoid that spans the full range of VD and is suitable for inverter amplification.
For the stochastic MTJ, too large a TMR can cause plateaus in the inverter's average output, while too small a TMR gives an insufficient VIN range over which usable fluctuations in VOUT are observed. Additionally, it is found that the speed at which the MTJ fluctuates is crucial to the p-bit's output: a faster MTJ is observed to have a more uniform distribution (with more middle states between the RP and RAP edge states), and for a given inverter, this results in a smoother VOUT sigmoid with less plateauing. A faster MTJ is also beneficial when concatenating p-bits to form a p-bit network, whereby the speed of the MTJs used can determine the speed of asynchronous operation.
For the inverter, the large gain and the steep VTC associated with the conventional 180nm-node technology used in the simulations were found to be more likely to yield undesirable plateaus in the p-bit output. A smaller-gain inverter, with a piecewise-linear VTC that maintains a wide noise margin in the input-low and input-high regions, achievable with a more scaled process, is desirable for p-bit applications.
These observations highlight how each component is crucial in determining the quality of the p-bit's output and seek to provide design insights that can contribute towards the future goal of fully scaled on-chip p-bit networks.
These stacks are patterned into elliptical nanopillars using e-beam lithography and Ar-ion beam etching. Amorphous SiO2 is then deposited, to electrically insulate the bottom contact channel, with the etch hard mask in place as part of a self-aligned process. The hard masks are then removed using NMP-based solvent, after which the MTJs are annealed at 300˚C for 10 minutes to improve the TMR of the finished devices 43. After the annealing procedure, the top contacts are defined using e-beam lithography, with e-beam evaporation used to deposit Ti/Au (20/140 nm) electrodes to enable electrical measurements across the MTJ.

2D FET Fabrication
The bottom gate electrode structure is made of a Cr (2nm)/Au (13nm) metal stack followed by a 5.5nm HfO2 gate oxide. The HfO2 is deposited by an atomic layer deposition (ALD) system at 90ºC. Then the ML MoS2 flakes are wet transferred from the original Si/SiO2 growth substrate onto the bottom gate electrodes and vacuum annealed at a pressure of ~5×10^−8 torr at 200ºC for 2 hours. After vacuum annealing, the flakes are etched into a stripe before the interdigitated source/drain contacts are defined by electron beam lithography (EBL) and Ni (70nm) is deposited as the contact metal by electron beam evaporation.
[...] state (the edge-state counts) compared to the amount of time the free layer spends in transitioning between them (the middle-state counts). This transition time is dependent on material properties of the MTJ stack layers, such as the perpendicular effective anisotropy field and the damping factor, and for in-plane MTJs is theorized to be in the range of approximately 1-10ns 5. Therefore, the smaller the energy barrier, the smaller the dwell time in the P- or AP-state, while the transition time is relatively unaffected (with change in the energy barrier size). Thus, the smaller the dwell time, the fewer the edge states relative to middle states, which correlates to a smaller distribution factor.

Figure 1: Implementing p-bits with stochastic MTJs. a) Schematic of the proposed p-bit design, comprised of a stochastic MTJ connected to the drain of an NMOS transistor, forming the stochastic core of the p-bit, and an inverter, for thresholding and signal amplification. b) Cross-section schematic of a typical MTJ stack, with layer thicknesses denoted in nm, and an explanation of the TMR effect. c) Minor loop of a stable MTJ, showing how the resistance changes deterministically as a function of magnetic field for a large energy-barrier free layer. d) Minor loop of a stochastic MTJ, showing how the resistance fluctuates between the parallel and anti-parallel states for a small energy-barrier free layer. e) Example time-series resistance data for the fluctuating MTJ and f) transfer characteristics of the transistor used to obtain the example p-bit's output. g) Graph showing the typical p-bit signal before the inverter's operation, as a function of the transistor gate voltage (defined as VIN). The average at each point is shown by the dotted line. h) Graph showing the typical output of the full p-bit, VOUT, as a function of the input voltage, VIN. The time-averaged signal at each input voltage is represented by the dotted line.

Figure 2(c) shows an SEM image of an exemplary elliptical nanopillar with the same dimensions as the MTJs used in this demonstration, while Figure 2(d) shows an optical microscope image of a finished MTJ device, along with a tilted-angle false-color SEM image of the MTJ region.

Figure 3(a) shows the minor loop of the stochastic MTJ used in the integrated p-bit. The dashed line at -16mT indicates the 50-50 point at which the device "spends" an equal amount of time in the AP- and P-state. All further measurements for this device are performed at this 50-50 point to ensure the MTJ's resistance output (Figure 3(b)) is truly random. As this is an intrinsically Poisson process, fitting the histograms of the AP- and P-state dwell times (Figure 3(c)) with an exponential envelope yields the average dwell time in each state (τAP and τP, respectively) 20,39.

Figure 3(h) shows the output, VINVERTER INPUT, as a function of the input (FET gate) voltage, VIN. VD = 200mV was used to avoid excessive stress to the MgO barrier and to prevent the damage to the MTJ observed at larger current densities.

Figure 2: Fabricating an on-chip p-bit. a) 3D schematic of the proposed design for an on-chip p-bit core, using a stochastic MTJ and 2D MoS2 FET. b) Side cross-section view of the MTJ stack, with the fixed and free layers denoted. c) Top SEM view of an example MTJ pillar of the same nominal dimensions as the stochastic MTJ in the integrated on-chip device. d) Optical microscope and (false-color) tilted-SEM images of an example finished MTJ device. e) Cross-section schematic of the 2D MoS2 FET. f) Optical microscope and SEM images of the 2D FET, showing the interdigitated contacts that are used to attain high current drive.
Figures 5(a) and 5(b) show this process for two MTJs of different dwell times (τ = 117 μs and τ = 27 ms, respectively).

Figure 4: Influence of the MTJ's TMR on the p-bit output. a) Diagram of the p-bit circuit implemented in Cadence software for circuit simulations. b) Graph showing the outputs of p-bits made with MTJs of differing TMRs; an MTJ with too large a TMR is likely to cause undesirable plateaus in the p-bit output. c) Graph showing the distribution of voltages at the inverter's input (histogram data, right axis) for the different TMR devices, and how this interacts with the voltage transfer curve of the inverter (green curve, left axis) to produce the averaged VOUT signal for VIN values of 0.8 V (left graph), 0.9 V (middle graph) and 0.98 V (right graph).
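
The interplay in panel (c) between the inverter-input voltage distribution and the inverter's transfer curve can be illustrated with a simplified Python sketch. It is a toy model, not the Cadence simulation: it assumes an ideal two-level MTJ at its 50-50 point, a purely resistive divider (supply, MTJ, node, transistor, ground), and a logistic inverter curve. The component values, the gain, and the function names are hypothetical; only the 1.8 V supply follows the output swing quoted for the simulations.

```python
import numpy as np

VDD = 1.8  # volts; matches the 0 V to 1.8 V output swing quoted for the simulations

def inverter_vtc(v_in, v_mid=0.9, gain=20.0):
    """Hypothetical logistic inverter curve with peak absolute gain `gain` at v_mid."""
    k = 4.0 * gain / VDD  # the logistic slope at the midpoint is VDD * k / 4
    return VDD / (1.0 + np.exp(k * (np.asarray(v_in, dtype=float) - v_mid)))

def node_voltages(r_transistor, r_p, tmr):
    """Inverter-input levels for the P and AP states of an assumed resistive divider."""
    r_ap = r_p * (1.0 + tmr)  # TMR defined as (R_AP - R_P) / R_P
    return np.array([VDD * r_transistor / (r_transistor + r_p),
                     VDD * r_transistor / (r_transistor + r_ap)])

def mean_vout(levels, weights=(0.5, 0.5)):
    """Time-averaged inverter output for a two-level input with the given occupancies."""
    return float(np.dot(weights, inverter_vtc(levels)))

# Hypothetical values: R_P = 10 kOhm, transistor resistance 15 kOhm or 20 kOhm.
high_tmr_a = mean_vout(node_voltages(15e3, 10e3, tmr=2.0))  # levels straddle v_mid
high_tmr_b = mean_vout(node_voltages(20e3, 10e3, tmr=2.0))  # levels still straddle v_mid
low_tmr = mean_vout(node_voltages(15e3, 10e3, tmr=0.1))     # both levels above v_mid
```

In this toy model the high-TMR device gives nearly the same averaged output (close to VDD/2) at both transistor resistances, because its two voltage levels sit on opposite rails of the inverter. That is one way to picture the plateau described in panel (b). The low-TMR device has both levels on the same side of the threshold and shows no such plateau at that operating point.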

Figure 5: Resistance distributions of stochastic MTJ devices. a) Data for a stochastic device with a dwell time of 117 μs, showing the raw signal as measured at the oscilloscope (left), the normalized resistance data, zoomed in to show that the fluctuations are well sampled (middle), and the device's normalized resistance histogram created from these data (right). b) Stochastic device data, as described previously, for a device with a slower dwell time of 27 ms, showing correspondingly fewer 'middle states' in the normalized resistance histogram. c) Graph showing the "distribution factor" of 23 stochastic MTJ devices, a quantity defined as the number of 'edge states' divided by the number of 'middle states' in an MTJ's normalized resistance histogram, as a function of their measured dwell times. The dotted line is a guide to the eye, showing the predicted behavior for even faster devices made with this material stack. d) Graph showing the averaged output for two p-bits that use the same inverter and transistor, but with MTJs of different dwell times and distributions.
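
The "distribution factor" defined in panel (c) can be computed along the following lines. This is a minimal Python sketch: the caption does not state where the boundary between 'edge' and 'middle' states lies, so the 20% bands used here (`edge_fraction`) are an assumption, as are the function name and the min-max normalization.

```python
import numpy as np

def distribution_factor(resistance, edge_fraction=0.2):
    """Ratio of 'edge' to 'middle' samples in a min-max normalized resistance trace.

    edge_fraction is an assumed boundary: normalized values within this distance
    of 0 or 1 count as edge states, and everything else as middle states.
    """
    r = np.asarray(resistance, dtype=float)
    span = r.max() - r.min()
    if span == 0:
        raise ValueError("trace shows no fluctuation")
    r_norm = (r - r.min()) / span
    is_edge = (r_norm <= edge_fraction) | (r_norm >= 1.0 - edge_fraction)
    n_edge = int(np.count_nonzero(is_edge))
    n_middle = r_norm.size - n_edge
    return n_edge / n_middle if n_middle else float("inf")

# Hypothetical five-sample trace in ohms: four edge samples and one middle sample.
example = distribution_factor([1000, 1000, 2000, 2000, 1500])
```

A trace that spends nearly all of its time at the two extreme resistance values gives a large factor, while a trace with many intermediate samples gives a small one.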

Figure 6: Influence of the inverter's characteristics on the p-bit output. a) Graph showing the voltage transfer curve (dotted line, left axis) and the absolute gain curve (solid line, right axis) for an untuned inverter (black) and an inverter that is tuned (red) to artificially lower the gain (by application of a body bias to the NMOS transistor in the inverter). b) The p-bit output, VOUT, as a function of the input voltage, VIN, for the untuned (black) and lower-gain (red) inverters.

Supplementary Figure 1: Simulation Details. a) Plot showing the difference in the MTJ distribution when fed using the correct timestep ("Original Data") and when using too small a timestep ("Interpolated Data"). b) The output sigmoid of the stochastic p-bit core, VINVERTER INPUT, as a function of the input voltage, VIN. The red line shows the transient instantaneous value of the voltage, while the dotted line shows the time-averaged value. c) A plot of the full p-bit's output, after the inverter's thresholding and amplification. The value of VOUT is pinned to 0 V for VIN below 0.6 V, and to 1.8 V for VIN above 1.2 V, and so is not shown here.

Figure 9: Supporting information for the 'Distribution Factor'. a) A series of plots displaying the time-series data taken at the 50-50 point for the same MTJ, but with an increasing sampling time, Δ, between data points. b) Plot showing that the increasing sampling time does lead to a decreasing number of 'middle count' states (left axis), but the 'Distribution Factor' is relatively unaffected (right axis). c-e) Graphs plotting statistics from 23 stochastic devices, demonstrating that the 'Distribution Factor' of these devices shows no correlation with c) the average resistance, d) the magnitude of the resistance fluctuations or e) the externally applied magnetic field at which the stochastic switching is observed to occur. The only correlation observed for this quantity is with the device's dwell time, as presented in the main text.