Abstract
Dense crossbar arrays of nonvolatile memory (NVM) can potentially enable massively parallel and highly energy-efficient neuromorphic computing systems. The key requirements for the NVM elements are continuous (analog-like) conductance tuning capability and switching symmetry with acceptable noise levels. However, most NVM devices show nonlinear and asymmetric switching behaviors. Such nonlinear behaviors make it extremely difficult to separate signal from noise with conventional characterization techniques. In this study, we establish a practical methodology based on Gaussian process regression to address this issue. The methodology is agnostic to switching mechanisms and applicable to various NVM devices. We show a tradeoff between switching symmetry and signal-to-noise ratio for HfO_{2}-based resistive random access memory. Then, we characterize 1000 phase-change memory devices based on Ge_{2}Sb_{2}Te_{5} and separate the total variability into device-to-device variability and inherent randomness from individual devices. These results highlight the usefulness of our methodology for realizing ideal NVM devices for neuromorphic computing.
Introduction
Over several decades, the von Neumann architecture has enabled exponential improvements in system performance. However, as device scaling has slowed and the demand to handle big data has soared, the time and energy spent transporting data between the physically separated memory and processing units have started to limit performance and power efficiency. As potential alternatives, neuro-inspired non-von Neumann computing paradigms have become promising candidates for performing real-world tasks^{1, 2}. One avenue of research is referred to as in-memory computing or computational memory, which exploits the physical properties of nonvolatile memory (NVM) devices for both storing and processing information^{3,4,5,6}. Recently, a large-scale experimental demonstration of this concept using an array of one million phase-change memory (PCM) devices has been reported^{7}. Another paradigm is hardware acceleration of deep neural network (DNN)^{8,9,10,11,12} training, using dense crossbar arrays of NVM to perform analog computation locally, at the location of the data. As shown in Fig. 1, NVM devices with variable conductance states, such as resistive random access memory (ReRAM)^{13} and PCM^{14}, can represent the synaptic weights and perform vector-matrix multiplication using basic electrical principles, i.e., Ohm’s and Kirchhoff’s laws, thus enabling local and parallel computation on a large scale. By making the conductance change of the NVM element bidirectional, the backpropagation algorithm can be implemented. Such a crossbar array of NVMs is expected to achieve significant acceleration of DNN training and remarkable reductions in power and area^{15, 16}. Another active area of research is spiking neural networks (SNNs), motivated by the need to build more biologically realistic neural network models. Several neuromorphic computing platforms optimized for emulating spike-based computation are being developed. These SNNs are typically trained using local update rules, such as spike-timing-dependent plasticity. NVM devices have recently found applications as both synaptic and neuronal elements of such SNNs^{17,18,19,20}.
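The crossbar vector-matrix multiplication described above can be illustrated with an idealized model. This is a minimal sketch, assuming ideal devices and wires (no wire resistance, sneak paths, or read nonlinearity) and columns held at 0 V; the function name `crossbar_column_currents` is hypothetical.

```python
def crossbar_column_currents(G, V):
    """Column currents of an idealized crossbar array.

    G[i][j] is the conductance (S) of the device joining row i to column j,
    and V[i] is the read voltage (V) applied to row i, with every column
    held at 0 V. Ohm's law gives each device current G[i][j] * V[i];
    Kirchhoff's current law sums these currents along each column, so the
    whole array computes I = G^T V in one parallel read.
    """
    n_rows, n_cols = len(G), len(G[0])
    if len(V) != n_rows:
        raise ValueError("need one voltage per row")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            currents[j] += G[i][j] * V[i]  # per-device Ohm's law, summed per column
    return currents


# Example: a 2 x 2 array storing conductances in the microsiemens range.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]
I = crossbar_column_currents(G, V)  # approximately [0.7e-6, 1.0e-6] A
```

Each column current is a dot product of the input voltages with one column of conductances, which is why the multiply-accumulate happens where the weights are stored.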
The key technical challenge for these applications is to realize ideal NVM elements with continuous (analog-like) conductance tuning in response to electrical pulses, with acceptable noise levels. For acceleration of DNN training, symmetric conductance change under positive and negative pulse amplitudes is another key requirement^{15, 16}: the device conductance should go up with a voltage pulse of one polarity and go down by the same magnitude with a voltage pulse of the opposite polarity. In general, NVM elements do not show this symmetric switching behavior. Therefore, a differential approach is often used, in which two conductance values are compared in a unit cell^{14}. In this configuration, linear switching is required to ensure a symmetric differential signal. In reality, most NVM elements exhibit highly nonlinear evolution of conductance as a function of the number of consecutively applied pulses, which results in significant errors in weight updates^{13}. In addition, such nonlinear conductance change makes it extremely difficult to separate signal from noise. Most NVM elements show stochasticity related to the physical origins of switching. When incremental weight updates are performed on analog NVM devices, the magnitude of the conductance change approaches the level of inherent randomness^{21}, which manifests as significant noise components. Therefore, establishing a universally applicable methodology to evaluate the signal-to-noise ratio (SNR) of nonlinear analog NVM devices is of paramount importance for neuromorphic computing applications.
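Why linearity matters in the differential configuration can be seen with a hypothetical device whose step per set pulse shrinks as it approaches a saturation conductance. This is an assumed soft-bound model chosen for illustration, not a fitted model of any device in this study, and all constants are made up. With the weight stored as W = G_+ − G_−, one pulse on either device should ideally move W by equal amounts in opposite directions.

```python
G_MAX = 100e-6   # hypothetical saturation conductance (S)
ALPHA = 0.05     # hypothetical fraction of the remaining headroom gained per pulse
STEP = 2e-6      # hypothetical constant step (S) of an ideal linear device


def set_pulse_saturating(g):
    """One set pulse on the assumed soft-bound device: the step is
    proportional to the headroom G_MAX - g, so it shrinks as g grows."""
    return g + ALPHA * (G_MAX - g)


def set_pulse_linear(g):
    """One set pulse on an ideal linear device: the step is state-independent."""
    return g + STEP


def differential_updates(g_plus, g_minus, set_pulse):
    """Weight change of W = G+ - G- for one pulse on G+ (to increase W)
    and for one pulse on G- (to decrease W)."""
    w0 = g_plus - g_minus
    up = (set_pulse(g_plus) - g_minus) - w0
    down = (g_plus - set_pulse(g_minus)) - w0
    return up, down


# The two devices sit in different states, as after some training history.
up_nl, down_nl = differential_updates(60e-6, 20e-6, set_pulse_saturating)
up_lin, down_lin = differential_updates(60e-6, 20e-6, set_pulse_linear)
# Saturating device: up is about +2 uS, down about -4 uS (asymmetric).
# Linear device: up and down are both about 2 uS in magnitude (symmetric).
```

For the saturating device the size of an update depends on each device's current state, so equal and opposite update requests do not cancel; only the state-independent (linear) device gives a symmetric differential signal.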
In this study, we establish a practical methodology based on a machine learning algorithm to precisely separate signal and noise components for an analog NVM device with nonlinear conductance changes. The methodology is agnostic to the device physics, enabling us to apply it to different types of NVM elements. We first apply the methodology to HfO_{2}-based ReRAM to understand the relationship between switching symmetry and SNR. Next, we apply it to PCM devices based on doped Ge_{2}Sb_{2}Te_{5} (GST). We characterize 1000 devices and separate device-to-device variability from the inherent randomness of individual devices.
Results
Analog switching behaviors of ReRAM and PCM
As shown in Fig. 2a, our ReRAM device exhibited analog-like (incremental) changes in device conductance (G) in response to voltage pulses. Consecutive positive voltage (set) pulses (pulse numbers 1–1000) on the top electrode caused an overall ascending trend of G with some pulse-to-pulse fluctuations. Conversely, consecutive negative voltage (reset) pulses (pulse numbers 1001–2000) caused a descending trend of G with similar fluctuations. The change of G in an oxide ReRAM device is attributed to changes in the configuration of a current-conducting filament consisting of oxygen vacancies in the metal oxide film^{22, 23}, as schematically illustrated in Fig. 2b. The movement of oxygen vacancies in response to electrical signals is probabilistic, and this emerges as inherent randomness in weight updates that is superimposed on the expected signal^{13}.
For PCM, we investigated device G changes in response to 20 consecutive set pulses. Figure 2c plots G as a function of pulse number, showing incremental changes along a nonlinear trace convoluted with pulse-to-pulse fluctuations. The PCM device contains a small volume of phase-change material sandwiched between top and bottom electrodes. Transition from the low-conductance state (amorphous phase) to the high-conductance state (crystalline phase) is caused by set pulses that generate sufficient Joule heating to crystallize the GST material while keeping the temperature below the melting point, as schematically illustrated in Fig. 2d. Owing to the stochastic nature of crystallization in phase-change materials^{2, 20, 21, 24, 25}, there is significant randomness associated with the weight updates. In contrast, reset to the low-conductance state requires melting of the GST material, and this process is known to be abrupt. Therefore, to characterize analog switching behaviors, we focused on incremental set operations for PCM in this study.
Characterization of NVM elements
To evaluate the performance of analog NVM elements for neuromorphic computing applications, one has to extract noise-free signals from experimental data. A conventional approach is to assume a parametric model for the expected conductance changes, derived from relatively simple assumptions about the underlying physics. For ReRAM devices, an exponential formula has been proposed to capture the nonlinear trend^{13}. However, a presumed exponential relationship often causes significant errors when fitting the weight update as a function of the number of applied pulses. In addition, different NVM elements generally need different fitting formulas, making it difficult to compare key performance parameters, such as switching symmetry and SNR, on a common basis. To address this issue, we leverage a machine learning algorithm called Gaussian process regression (GPR)^{26}. GPR is a nonparametric Bayesian regression method that does not assume any specific functional form, such as linear or exponential. The main motivation for applying GPR to the analysis of analog NVM elements is to let the experimental data themselves determine the noise-free signal. The major assumption we used is the smoothness of the curve. Analog NVM devices exploit continuous changes in the switching medium (e.g., filament configuration for ReRAM, volume of the crystalline region for PCM), rather than discontinuous phenomena, to achieve incremental conductance changes. This makes analog switching data highly compatible with the smoothness assumption. The key ingredient of GPR is the kernel matrix (Eq. (6) in Methods), which controls the smoothness of the estimated functional curve. We established a practical approach to optimize the kernel matrix by combining Bayesian marginalized likelihood maximization with frequentist cross-validation. This enabled us to precisely separate signal and noise in our large dataset while avoiding numerical instability. The proposed inference procedure also assumes that the prior probability distribution over the underlying functions is a multivariate Gaussian distribution, which consists of a linear combination of a finite number of random variables. This assumption is consistent with the switching mechanism of analog memory devices, in which the device conductance is governed by parallel configurations of randomly distributed conducting filaments comprising oxygen vacancies or crystalline phase-change material. The measured device conductance values indeed follow a Gaussian distribution around the noise-free signals, as verified by the distribution of noise in our experimental data for ReRAM (Supplementary Note 1). The details of our GPR-based methodology are described in the Methods section.
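The core idea, using the GPR posterior mean as the noise-free signal and the residuals as noise, can be sketched as follows. This is a minimal illustration, assuming a squared-exponential kernel with hand-picked hyperparameters (the paper tunes its kernel as described in Methods) and a synthetic saturating trace standing in for measured data; `gpr_denoise` and all parameter values are hypothetical.

```python
import numpy as np


def rbf_kernel(xa, xb, length_scale):
    """Unit-amplitude squared-exponential kernel; a larger length_scale
    correlates more distant pulses, i.e. favours a smoother curve."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gpr_denoise(x, y, length_scale, signal_var, noise_var):
    """Posterior mean of a zero-mean GP fitted to (x, y), evaluated at x.

    The data are centred so that a zero-mean prior is reasonable, and the
    mean is added back afterwards.
    """
    y_mean = y.mean()
    K = signal_var * rbf_kernel(x, x, length_scale)
    L = np.linalg.cholesky(K + noise_var * np.eye(x.size))
    # alpha = (K + noise_var * I)^{-1} (y - y_mean), via two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    return K @ alpha + y_mean


# Synthetic stand-in for a nonlinear conductance trace (units: uS).
rng = np.random.default_rng(0)
pulses = np.arange(1, 201, dtype=float)
true_g = 5.0 + 20.0 * (1.0 - np.exp(-pulses / 60.0))
observed_g = true_g + rng.normal(0.0, 0.5, size=pulses.size)

fitted_g = gpr_denoise(pulses, observed_g, length_scale=30.0,
                       signal_var=25.0, noise_var=0.25)
r = np.abs(observed_g - fitted_g)   # residuals: the noise estimate
rms_fit_error = np.sqrt(np.mean((fitted_g - true_g) ** 2))
rms_raw_error = np.sqrt(np.mean((observed_g - true_g) ** 2))
```

No functional form is assumed for the trace; only smoothness, encoded by the kernel, separates the slowly varying signal from the pulse-to-pulse scatter. In this synthetic case the fitted curve lies much closer to the underlying trace than the raw readings do, and the RMS of r is close to the injected noise level.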
We performed cross-validation^{27} using our ReRAM data and confirmed that the GPR-based methodology extracted the inherent features irrespective of the sample size (Supplementary Note 2). We confirmed the robustness of our methodology against variations in input pulse duration from 5 to 100 ns, covering the range of interest for neuromorphic computing (Supplementary Note 3), and against variations in test temperature (Supplementary Note 4). For the rest of the analysis, we used a pulse duration of 100 ns and tested the devices at room temperature. Next, we extracted key performance metrics using the GPR fitting. To characterize switching symmetry, we applied the methodology to ReRAM data obtained with 1000 consecutive set pulses followed by 1000 consecutive reset pulses. As shown in Fig. 3a, the GPR fitting gave predicted noise-free curves (red lines) for both the set (black) and reset (blue) pulse sequences. Once the noise-free curves are estimated, the G change per pulse, denoted by ΔG, is easily computed, and based on it we define SNR as

\[
\mathrm{SNR} = \frac{\Delta G}{r},
\]
where r represents the absolute difference between the predicted and observed G values (i.e., the residual). The impact of SNR on neural network accuracy has been discussed previously^{21}. Since relatively long read sequences were used for both ReRAM and PCM devices to minimize fluctuations in read signals, we attribute r to inherent randomness associated with the physical origin of weight updates. In artificial neural network implementations, fast reading is preferred to decrease the overall cycle time and thereby accelerate computation, which would increase the contribution of read noise. In that case, the read operation must be optimized to balance overall performance against noise level, which is beyond the scope of this work. The extracted r is shown as a function of pulse number in Fig. 3b. The absolute ΔG values for set and reset pulses are denoted by ΔG_{+} and ΔG_{−}, respectively, and are plotted as a function of pulse number in Fig. 3c (ΔG_{+}: black; ΔG_{−}: blue). Figure 3d shows the absolute SNR, calculated locally at each pulse from ΔG and r. To characterize switching symmetry, we introduce the symmetry factor (SF), defined as

\[
\mathrm{SF} = \frac{\Delta G_{+} - \Delta G_{-}}{\Delta G_{+} + \Delta G_{-}}.
\]
With this definition, the degree of symmetry is quantified as a value between −1 and 1, with 0 corresponding to perfect symmetry. Asymmetry in either direction (ΔG_{+} larger or smaller than ΔG_{−}) is weighted equally around 0, so different cases can be compared using absolute values. To compute SF and SNR at a given G level, ΔG_{+}, ΔG_{−}, and r must be expressed as functions of G. We therefore divided the total G range into 100 subranges and computed the mean ΔG and the root mean square of r within each G subrange, which yields SF and SNR for each subrange. This local extraction (i.e., at a given pulse number or G level) of SF and SNR is a powerful feature of our methodology. The symmetry requirement for accelerating DNN training specified in ref. ^{15} (<5% difference between ΔG_{+} and ΔG_{−}) corresponds to |SF| < 0.025.
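The local extraction of SF and SNR per G subrange can be sketched as below. This is an illustrative implementation under stated assumptions: fitted (noise-free) and observed set and reset traces are given as NumPy arrays; SNR is taken as ΔG/r and SF as (ΔG_+ − ΔG_−)/(ΔG_+ + ΔG_−), the form consistent with a range of −1 to 1 and with a 5% difference mapping to about 0.025; each ΔG is assigned to the fitted G before its pulse. The function names are hypothetical.

```python
import numpy as np


def symmetry_factor(dg_plus, dg_minus):
    """SF = (dG+ - dG-) / (dG+ + dG-): 0 for perfect symmetry, bounded by +/-1."""
    return (dg_plus - dg_minus) / (dg_plus + dg_minus)


def binned(g, values, edges, reducer):
    """Apply reducer to the values whose conductance g falls in each bin
    (NaN for empty bins). Values at the top edge go to the last bin."""
    n_bins = len(edges) - 1
    idx = np.clip(np.digitize(g, edges) - 1, 0, n_bins - 1)
    out = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = values[idx == b]
        if sel.size:
            out[b] = reducer(sel)
    return out


def rms(v):
    return np.sqrt(np.mean(v ** 2))


def local_sf_snr(set_fit, set_obs, reset_fit, reset_obs, n_bins=100):
    """Return bin edges, SF, and the mean of set/reset SNR for each G subrange."""
    g_all = np.concatenate([set_fit, reset_fit])
    edges = np.linspace(g_all.min(), g_all.max(), n_bins + 1)
    # Signal: noise-free |change| per pulse, binned by the fitted G before the pulse.
    dgp = binned(set_fit[:-1], np.abs(np.diff(set_fit)), edges, np.mean)
    dgm = binned(reset_fit[:-1], np.abs(np.diff(reset_fit)), edges, np.mean)
    # Noise: residuals between observed and fitted G, binned by fitted G.
    r_set = binned(set_fit, np.abs(set_obs - set_fit), edges, rms)
    r_reset = binned(reset_fit, np.abs(reset_obs - reset_fit), edges, rms)
    sf = symmetry_factor(dgp, dgm)
    snr = 0.5 * (dgp / r_set + dgm / r_reset)
    return edges, sf, snr


# Worked check of the symmetry criterion: a 5% difference gives SF just below 0.025.
sf_5pct = symmetry_factor(1.05, 1.00)   # 0.05 / 2.05, about 0.0244
```

Binning by G rather than by pulse number is what allows set and reset traces, which visit the same conductance at different pulse numbers, to be compared at a common G level.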
Switching symmetry and SNR of ReRAM devices
We applied the GPR-based methodology to ReRAM devices with different metal oxide thicknesses (device A: 5 nm, device B: 4 nm). The devices were tested under different set and reset voltages, and the SNR and SF values were extracted locally at each G level, as shown in Fig. 4a. For SNR, we took the mean of the set and reset values. Representative G versus pulse number traces are shown in the insets. Figure 4b shows a two-dimensional cross-sectional plot of SNR versus SF taken at G ≈ 20 μS from Fig. 4a. At this G level, low SF values were achieved at relatively low SNR values, and vice versa. Data points are absent from the upper-left corner of Fig. 4b, indicating a fundamental tradeoff between SNR and SF. To investigate the relationship between SNR and SF across multiple device/pulse conditions spanning different G levels, the conditions were grouped according to SNR, and the cumulative distribution functions of SF were compared, as shown in Fig. 4c. The reproducibility of the trend was confirmed on up to 10 different devices of type B (Supplementary Note 5). Device/pulse conditions that lead to higher SNR clearly tend to result in poorer switching symmetry. The tradeoff can also be observed directly in the G versus pulse number plots (insets of Fig. 4a). We speculate that higher switching symmetry is achieved by making the movement of oxygen vacancies more incremental, thereby changing the width of the current-conducting filament rather than completely rupturing and re-forming it. ΔG is smaller in the former case and should eventually approach the level of inherent randomness, resulting in lower SNR. This tradeoff makes it difficult to improve switching symmetry and SNR simultaneously, and it remains a key challenge for ReRAM devices in neuromorphic computing applications.
However, if these key metrics are accurately quantified, as demonstrated with our GPR-based methodology, the device and pulse conditions can be optimized to find the optimum point within the tradeoff. As discussed in the Introduction, switching symmetry is a critical requirement for implementing the backpropagation algorithm in DNNs; in reality, learning accuracy is compromised by the nonideal (asymmetric) switching characteristics of synaptic elements. Therefore, we used the GPR-based methodology to optimize the device condition (device A) and the pulse condition (set: 1.6 V, reset: −1.8 V) so as to minimize SF. A key advantage of our methodology is its ability to extract SF agnostic to switching mechanisms and irrespective of data size. This enabled us to compare our ReRAM data with various resistive switching devices in the literature^{28,29,30,31,32,33,34,35}. Improved switching symmetry has been reported using pulses with varying amplitudes^{28, 30, 31}; these cases were benchmarked together and marked separately in Fig. 4d. There is a general trend of improved symmetry with varying pulse amplitudes. This approach, however, requires sensing the current state of individual devices and adjusting the voltage amplitudes, which is not compatible with local and parallel computation. Notably, our optimized ReRAM device showed better switching symmetry than all benchmark data obtained with identical voltage pulses. This is a significant step toward realizing online training capability in a parallel manner. Future work needs to focus on simultaneously achieving sufficiently high SNR through materials optimization.
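The comparison in Fig. 4c, where device/pulse conditions are grouped by SNR and the cumulative distributions of SF are compared, can be sketched as follows. This assumes per-condition SNR and SF values have already been extracted; the interval boundaries and example numbers are made up for illustration and are not data from this study.

```python
import numpy as np


def ecdf(values):
    """Empirical CDF: sorted values and the fraction of samples <= each value."""
    v = np.sort(np.asarray(values, dtype=float))
    return v, np.arange(1, v.size + 1) / v.size


def sf_cdf_by_snr_group(snr, sf, snr_edges):
    """Group |SF| samples by SNR interval [lo, hi) and return each group's ECDF."""
    snr = np.asarray(snr, dtype=float)
    abs_sf = np.abs(np.asarray(sf, dtype=float))
    groups = {}
    for lo, hi in zip(snr_edges[:-1], snr_edges[1:]):
        in_group = (snr >= lo) & (snr < hi)
        groups[(lo, hi)] = ecdf(abs_sf[in_group])
    return groups


# Hypothetical values: two low-SNR and two high-SNR conditions.
groups = sf_cdf_by_snr_group(snr=[0.2, 0.3, 0.8, 0.9],
                             sf=[0.01, -0.02, 0.20, -0.30],
                             snr_edges=[0.0, 0.5, 1.0])
```

Using |SF| follows the convention that asymmetry in either direction is weighted equally; a group whose ECDF lies further to the right has poorer switching symmetry.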
Breakdown of variability components in 90 nm PCM devices
A conventional approach to extracting the inherent randomness associated with weight updates is to test multiple devices and obtain statistical distributions^{21}. The variability obtained in this manner, however, includes device-to-device variability in addition to the inherent randomness of individual devices. These variability components need to be quantified separately to accurately assess the potential of a given NVM element for neuromorphic computing applications. We tested 1000 PCM devices and extracted signal and noise from each device using our GPR-based methodology. This enabled us to further break down the total variability into the inherent randomness of individual devices and device-to-device variability. These two variability components are illustrated in Fig. 5a with two representative PCM devices (devices 1 and 2) fabricated with an identical process. The GPR fitting was performed to predict the noise-free signals, shown as red and blue solid lines, respectively, in Fig. 5a. The predicted signals for devices 1 and 2 deviate from each other because of device-to-device variability. In addition, the experimental data points (circles) fluctuate around the individual fitted lines; this is attributable to the inherent randomness of weight updates, since read noise was minimized by the test sequence described in the Methods section. We compared histograms of ΔG values extracted from the experimental data and from the fitted curves after pulse 2 (Fig. 5b) and pulse 6 (Fig. 5c). The statistical distribution of the fitted curves (red) reflects device-to-device variability, whereas the distribution of the experimental data (blue) includes inherent randomness superimposed on it. The latter distribution was much wider, clearly showing a significant contribution from inherent randomness. From the second to the sixth pulse, the peak ΔG value decreased and the device-to-device variability (red) tightened.
In contrast, the inherent randomness remained relatively constant. As a result, the tail of the total distribution (blue) extended into the negative ΔG regime, which is undesirable (Fig. 5c). The mean and standard deviation of ΔG obtained from the experimental data (black circles and error bars) are compared with the root mean square of the inherent randomness (r) obtained from the GPR-based methodology (red error bars) as a function of pulse number in Fig. 5d. The total standard deviation became comparable to ΔG for incremental weight updates. Since learning accuracy is known to degrade when the ratio of standard deviation to ΔG exceeds 1 (ref. ^{21}), reducing variability is indispensable. Our analysis revealed that a large portion of the total variability (~67%) is attributable to the inherent randomness of individual devices for a mature technology based on the 90 nm CMOS baseline. The median SNR calculated from inherent randomness is ~35% for the PCM devices, which is comparable to that of our ReRAM devices switching at a similar G level (cf. Fig. 4b). This indicates that variability due to inherent randomness is a common challenge for ReRAM and PCM in neuromorphic computing applications. Device and material innovations are needed to suppress this component. Our GPR-based methodology enables precise extraction of the inherent randomness of individual devices and provides useful guidelines for further improvement.
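The breakdown into device-to-device variability and inherent randomness can be sketched for a single pulse number. This is a minimal sketch under stated assumptions: for each device, both the measured ΔG and the ΔG of its GPR-fitted curve are available; the two components are independent, so their variances add; and shares are expressed in terms of variance (the text does not state whether ~67% refers to variance or standard deviation). The data are synthetic and `variability_breakdown` is a hypothetical name.

```python
import numpy as np


def variability_breakdown(dg_obs, dg_fit):
    """Split the spread of dG across devices at one pulse number.

    dg_obs: measured dG of each device; dg_fit: dG of each device's
    noise-free (fitted) curve. Device-to-device variability is the spread
    of the fitted values; inherent randomness is the scatter of the
    measurements around each device's own fitted value.
    """
    dg_obs = np.asarray(dg_obs, dtype=float)
    dg_fit = np.asarray(dg_fit, dtype=float)
    var_d2d = np.var(dg_fit)
    var_inherent = np.mean((dg_obs - dg_fit) ** 2)
    var_total = np.var(dg_obs)
    share = var_inherent / (var_d2d + var_inherent)
    return var_total, var_d2d, var_inherent, share


# Synthetic population of 1000 devices (arbitrary units).
rng = np.random.default_rng(1)
dg_fit = rng.normal(1.0, 0.5, size=1000)            # device-to-device spread
dg_obs = dg_fit + rng.normal(0.0, 1.0, size=1000)   # plus inherent randomness
var_total, var_d2d, var_inherent, share = variability_breakdown(dg_obs, dg_fit)
negative_fraction = np.mean(dg_obs < 0)             # tail below zero, as in Fig. 5c
```

A histogram of `dg_obs` alone would mix both components; the fitted curves are what make the inherent part separable.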
Discussion
We established a practical methodology based on GPR to precisely separate signal and noise components of analog NVM elements with nonlinear conductance changes. This addresses key technical challenges in characterizing artificial synapses for neuromorphic computing systems, namely the extraction of switching symmetry and SNR. The methodology is agnostic to switching mechanisms and therefore applicable to various types of NVM. We applied the methodology to HfO_{2}-based ReRAM devices and found a tradeoff between switching symmetry and SNR. Using SF as a guideline, we achieved a substantial improvement in switching symmetry compared with ReRAM devices reported in the literature. Through systematic analysis of 1000 GST-based PCM devices, we demonstrated that a large portion of the variability in weight updates is attributable to the inherent randomness of individual devices, and that this is the key component to suppress in order to achieve high classification accuracy.
Finally, the proposed methodology helps neuromorphic system engineers in two ways, depending on the phase of technology development. In an exploratory phase, our methodology enables extraction of switching symmetry and SNR from individual devices and expedites the search for ideal materials. The conventional methodology requires fabricating many devices with tight device-to-device variability to extract SNR, which is difficult to attain at an early stage, when exotic material options need to be screened. In a relatively mature technology phase, our methodology helps find the optimum input signals (e.g., pulse duration and amplitude) that provide the best switching symmetry (linearity) and SNR within the tradeoff for the entire neuromorphic system.
Methods
ReRAM device fabrication and test
We fabricated two-terminal oxide ReRAM devices with dimensions of 50 × 50 μm^{2}. First, a SiO_{2} underlayer was grown on a 200 mm Si wafer. Then, a 100-nm-thick TiN film was deposited by reactive sputtering as the bottom electrode, followed by deposition of a HfO_{2} layer by atomic layer deposition as the switching layer in which a current-conducting filament is formed. We varied the thickness of the switching layer (device A: 5 nm, device B: 4 nm) to investigate its impact on switching symmetry. Next, a 20-nm-thick TiN film was deposited by reactive sputtering as the top electrode. The device area was defined by photolithography and reactive ion etching of the TiN electrode. To test the switching symmetry and SNR of our ReRAM devices, we applied a sequence of weight update (write) pulses with the same voltage amplitude for each polarity. We used a high-resolution source measure unit (SMU) to read the device conductance state between write pulses, applying a small read voltage of 0.1 V to avoid disturbing the resistance state. While keeping the read voltage applied across the device, we took multiple read steps with a 16.67 ms integration time until the values measured by the instrument stabilized (typically within 3–10 repeated read measurements in the device resistance range of interest), and we used the last measurement as the representative value. We did not detect random telegraph noise with this read sequence. The write pulses had a duration of 100 ns (unless otherwise mentioned), and various voltage amplitudes (set pulse: 1.6–1.7 V; reset pulse: −1.8 to −1.9 V) were compared to investigate their impact on switching symmetry and SNR. To separate noise arising from weight updates from noise arising from weight reads, we also carried out a read-only test, in which only read steps were repeated up to 1000 times without weight updates in between.
Our linear regression analysis showed that the residual standard error of the read-only trace is 2.51 × 10^{−7} S, almost an order of magnitude lower than that of the read-after-write traces (1.38–1.57 × 10^{−6} S). Therefore, we attribute the majority of the noise in our ReRAM devices to inherent randomness in weight updates.
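The read-only versus read-after-write comparison rests on the residual standard error (RSE) of a straight-line fit, RSE = sqrt(RSS/(n − 2)). A minimal sketch, assuming an ordinary least-squares line of G against step number (the exact regression variables are not stated here); the last line simply divides the reported values.

```python
import numpy as np


def residual_standard_error(x, y):
    """RSE of an ordinary least-squares line: sqrt(RSS / (n - 2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return np.sqrt(np.sum(residuals ** 2) / (x.size - 2))


# Toy check: residuals [1, -1, -1, 1] are orthogonal to both 1 and x,
# so the fitted line is exactly y = 2x + 1, RSS = 4 and RSE = sqrt(2).
rse_toy = residual_standard_error([0, 1, 2, 3], [2, 2, 4, 8])

# Ratio of the reported read-after-write RSE range to the read-only RSE.
ratios = (1.38e-6 / 2.51e-7, 1.57e-6 / 2.51e-7)   # about 5.5 and 6.3
```

Dividing by n − 2 rather than n accounts for the two fitted parameters (slope and intercept); for traces of up to 1000 points the difference is negligible.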
PCM device fabrication and test
The PCM devices were integrated into a chip fabricated in 90 nm CMOS technology^{36}. The phase-change material is doped Ge_{2}Sb_{2}Te_{5}. The bottom electrode has a radius of ~20 nm and was defined using a sublithographic keyhole transfer process^{37}. The phase-change material is ~100 nm thick and extends to the top electrode. All experiments in this work were done on an array of 1 million devices organized as a matrix of 512 word lines (WLs) and 2048 bit lines (BLs). A single PCM device is selected by serially addressing a WL and a BL. The selected device can be programmed by forcing a current through the BL with a voltage-controlled current source. To read a PCM cell, the selected BL is biased to a constant voltage of 0.3 V. The resulting read current is integrated by a capacitor, and the resulting voltage is then digitized by an on-chip 8-bit cyclic ADC. The ADCs are calibrated by means of on-chip reference polysilicon resistors. To characterize incremental G changes, each device was first initialized to a state with almost zero conductance. After initialization, a set pulse of 70 μA was applied, followed by conductance read steps. The read step was repeated 50 times to obtain a mean G value, minimizing read noise so as to focus on the characterization of write noise. This sequence was repeated 20 times to obtain G as a function of pulse number.
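Why averaging 50 reads suppresses read noise can be checked with a short simulation. This assumes read noise is independent, zero-mean, and Gaussian from read to read (an illustrative assumption, not a measured property of this chip), in which case the mean of n reads carries read noise with standard deviation σ_read/√n, about 0.14 σ_read for n = 50. The function name `averaged_reads` is hypothetical.

```python
import numpy as np


def averaged_reads(true_g, read_sigma, n_reads=50, rng=None):
    """Mean of n_reads noisy reads of each conductance value.

    Read noise is modelled as independent zero-mean Gaussian noise with
    standard deviation read_sigma, so each averaged value retains read
    noise with standard deviation read_sigma / sqrt(n_reads).
    """
    rng = np.random.default_rng() if rng is None else rng
    true_g = np.asarray(true_g, dtype=float)
    reads = true_g[:, None] + rng.normal(0.0, read_sigma,
                                         size=(true_g.size, n_reads))
    return reads.mean(axis=1)


rng = np.random.default_rng(2)
# Many repetitions of a device held at a fixed conductance (arbitrary units).
avg = averaged_reads(np.zeros(4000), read_sigma=1.0, n_reads=50, rng=rng)
residual_read_noise = avg.std()   # expected close to 1 / sqrt(50), about 0.141
```

Averaging does not reduce write noise, because each write happens once per pulse; this is what lets the residual scatter around the fitted curve be attributed mainly to weight updates.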
GPR-based methodology
The goal of GPR is to learn a probability distribution of the output signal, \(y\), conditioned on the input signal, \(x\), from data \(\{(x^{(n)}, y^{(n)}) \mid n = 1, \ldots, N\}\), where \(N\) is the number of samples and the superscript \((n)\) denotes the \(n\)th sample in the data. The distribution is given by
where \(\mathcal{N}(y \mid m(x), s^2(x))\) denotes the Gaussian distribution of \(y\) with mean \(m(x)\) and variance \(s^2(x)\). Also, \(\sigma^2\) denotes the variance corresponding to measurement noise, \(\mathbf{I}\) denotes the identity matrix, and \(\mathbf{y}_N = (y^{(1)}, \ldots, y^{(N)})^{\mathrm{T}}\), where the superscript T denotes the matrix transpose.
The key ingredient of GPR is the kernel matrix \({\mathbf{K}}\), which controls the smoothness of the estimated functional curve. We use a nondimensional kernel \({\mathbf{K}}\) whose \((i,j)\) element is given by
The \(n\)th entry of the \(N\)-dimensional vector \(\mathbf{k}(x)\) is also given by \(K(x, x^{(n)})\). The parameters \(\sigma_K^2\) and \(\sigma^2\) are learned from the data, as explained below. The idea is to use the predictive mean, \(m(x)\), at the input value (pulse number) \(x\), as a noise-free version of the output signal (G).
Determining GPR parameters
The parameter \(\sigma ^2\) is determined by maximizing the log marginalized likelihood^{26}, which is given by
in our parameterization, where \(c\) denotes an unimportant constant and det is the matrix determinant. Assuming for now that \(\sigma_K\) is given and taking the derivative with respect to \(\sigma^{-2}\), we have
To compute this, we need a value of \(\sigma_K\). In theory, we could find it by maximizing \(E\) jointly with \(\sigma\). This approach, however, involves a complex nonlinear optimization procedure and often results in numerical instability in our application.
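How σ² can be obtained in closed form once the kernel is fixed can be illustrated under one possible parameterization, assumed here because the authors' exact form is not reproduced in this text: \(\mathbf{y}_N \sim \mathcal{N}(\mathbf{0}, \sigma^2(\mathbf{I} + \mathbf{K}))\) with a nondimensional \(\mathbf{K}\). Then \(E = -\tfrac{N}{2}\ln\sigma^2 - \mathbf{y}_N^{\mathrm{T}}(\mathbf{I}+\mathbf{K})^{-1}\mathbf{y}_N/(2\sigma^2) + c\), and setting \(\partial E/\partial\sigma^{-2} = 0\) gives \(\sigma^2 = \mathbf{y}_N^{\mathrm{T}}(\mathbf{I}+\mathbf{K})^{-1}\mathbf{y}_N/N\). The sketch checks numerically that this value maximizes the log marginal likelihood; the kernel form, its scale factor, and all names are hypothetical.

```python
import numpy as np


def rbf(x, length_scale):
    """Unit-amplitude squared-exponential correlation matrix."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def log_marginal_likelihood(y, K_nd, noise_var):
    """log N(y | 0, noise_var * (I + K_nd)) for the assumed parameterization."""
    n = y.size
    C = noise_var * (np.eye(n) + K_nd)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet + y @ np.linalg.solve(C, y) + n * np.log(2.0 * np.pi))


def optimal_noise_var(y, K_nd):
    """Closed-form maximizer in sigma^2 for fixed K_nd: y^T (I + K_nd)^{-1} y / N."""
    return y @ np.linalg.solve(np.eye(y.size) + K_nd, y) / y.size


rng = np.random.default_rng(3)
x = np.arange(50, dtype=float)
y = np.sin(x / 8.0) + rng.normal(0.0, 0.2, size=x.size)
# Hypothetical nondimensional kernel: signal-to-noise variance ratio 25, length scale 10.
K_nd = 25.0 * rbf(x, 10.0)
s2 = optimal_noise_var(y, K_nd)
```

Because E is strictly concave in \(\sigma^{-2}\) for fixed \(\mathbf{K}\), the stationary point is the unique maximum; the hard part, as the text notes, is choosing the kernel parameter.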
Here we propose a practical approach that combines Bayesian marginalized likelihood maximization with frequentist cross-validation. Specifically, to determine \(\sigma_K\), we maximize the predictive leave-one-out (LOO) likelihood, defined by
where \(m_{-i}\) and \(s_{-i}^2\) are the predictive mean and variance of GPR (Eqs. (4) and (5)) obtained from the dataset excluding the \(i\)th sample. To find the maximizer of \(L(\sigma_K)\), we can leverage the fact that the observed variance does not depend heavily on the input across the entire domain. By replacing \(s_{-i}^2\) with a constant, the LOO likelihood criterion reduces to finding the minimizer of the mean square of the residual (i.e., r), which is easily done independently of \(\sigma^2\). In this study, we use the following procedure and criterion to find an appropriate \(\sigma_K\) value from the experimental data. We vary \(\sigma_K\) over a wide range and identify an optimum range in which changes in \(\sigma_K\) negligibly affect the extracted r values; this is practically equivalent to maximizing the predictive LOO likelihood. Our criterion is a change in r of less than 1% for a 10% change in \(\sigma_K\), which is met with a \(\sigma_K\) value of around \(3 \times N\) for our dataset (Supplementary Note 6).
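The kernel-parameter search can be sketched with the standard closed-form LOO identity for Gaussian models (see ref. 26): for \(\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})\), \(y_i - m_{-i} = [\mathbf{C}^{-1}\mathbf{y}]_i / [\mathbf{C}^{-1}]_{ii}\), so no refitting is needed. Multiplying \(\mathbf{C}\) by \(\sigma^2\) leaves these residuals unchanged, consistent with the statement that the residual criterion is independent of \(\sigma^2\). Assumptions: \(\mathbf{C} = \sigma^2(\mathbf{I} + \text{ratio}\cdot\mathbf{R})\) with a squared-exponential \(\mathbf{R}\); a generic `length_scale` stands in for \(\sigma_K\), whose exact role in the kernel is not reproduced here; `ratio` is held fixed; the data are synthetic and all names are hypothetical.

```python
import numpy as np


def rbf(x, length_scale):
    """Unit-amplitude squared-exponential correlation matrix."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def loo_residuals(y, C):
    """Closed-form leave-one-out residuals y_i - m_{-i} for y ~ N(0, C).

    Uses y_i - m_{-i} = [C^{-1} y]_i / [C^{-1}]_{ii}; scaling C by a
    constant such as sigma^2 leaves these residuals unchanged.
    """
    C_inv = np.linalg.inv(C)
    return (C_inv @ y) / np.diag(C_inv)


def loo_rms(x, y, length_scale, ratio):
    """RMS of the LOO residuals for covariance proportional to I + ratio * R."""
    C = np.eye(x.size) + ratio * rbf(x, length_scale)
    return np.sqrt(np.mean(loo_residuals(y, C) ** 2))


def select_length_scale(x, y, scales, ratio):
    """Pick the scale minimizing the LOO RMS residual and report whether r
    changes by less than 1% when that scale is increased by 10%."""
    r = np.array([loo_rms(x, y, s, ratio) for s in scales])
    best = int(np.argmin(r))
    r_up = loo_rms(x, y, 1.1 * scales[best], ratio)
    flat = abs(r_up - r[best]) / r[best] < 0.01
    return scales[best], r[best], flat


rng = np.random.default_rng(4)
x = np.arange(100, dtype=float)
y = 10.0 * (1.0 - np.exp(-x / 30.0)) - 5.0 + rng.normal(0.0, 0.5, size=x.size)
scales = np.geomspace(1.0, 200.0, 40)
best_scale, best_r, is_flat = select_length_scale(x, y, scales, ratio=100.0)
```

Picking the minimizer and then checking flatness around it avoids a pitfall of scanning for the first flat region: at very small length scales the kernel barely smooths, so r also stays nearly constant there, but at a poor value.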
Data availability
The data that support the findings of this study are available from the corresponding author upon request.
References
Merolla, P. A. et al. A million spikingneuron integrated circuit with a scalable communication network and interface. Science 345, 668–673 (2014).
Burr, G. W. et al. Neuromorphic computing using nonvolatile memory. Adv. Phys. X 2, 89–124 (2016).
Gallo, M. L. et al. Mixedprecision inmemory computing. Nat. Electron. 1, 246–253 (2018).
Sheridan, P. M. et al. Sparse coding with memristor networks. Nat. Nanotechnol. 12, 784–789 (2017).
Wright, C. D., Hosseini, P. & Diosdado, J. A. V. Beyond vonNeumann computing with nanoscale phasechange memory devices. Adv. Funct. Mater. 23, 2248–2254 (2012).
Hosseini, P., Sebastian, A., Papandreou, N., Wright, C. D. & Bhaskaran, H. Accumulationbased computing using phasechange memories with FET access devices. IEEE Electron Device Lett. 36, 975–977 (2015).
Sebastian, A. et al. Temporal correlation detection using computational phasechange memory. Nat. Commun. 8, 1115 (2017).
Lecun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Collobert, R. & Weston, J. A unified architecture for natural language processing. In Proc. 25th International Conference on Machine Learning  ICML 08 (ACM, Helsinki, Finland, 2008).
Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115, 211–252 (2015).
Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29, 82–97 (2012).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2017).
Chen, P.-Y. et al. Mitigating effects of non-ideal synaptic device characteristics for on-chip learning. In 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) https://doi.org/10.1109/iccad.2015.7372570 (IEEE, Austin, USA, 2015).
Burr, G. W. et al. Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses) using phase-change memory as the synaptic weight element. IEEE Trans. Electron Devices 62, 3498–3507 (2015).
Gokmen, T. & Vlasov, Y. Acceleration of deep neural network training with resistive cross-point devices: design considerations. Front. Neurosci. 10, 333 (2016).
Agarwal, S. et al. Resistive memory device requirements for a neural algorithm accelerator. In 2016 International Joint Conference on Neural Networks (IJCNN) https://doi.org/10.1109/ijcnn.2016.7727298 (IEEE, Vancouver, Canada, 2016).
Kuzum, D., Jeyasingh, R. G. D., Lee, B. & Wong, H.-S. P. Nanoelectronic programmable synapses based on phase change materials for brain-inspired computing. Nano Lett. 12, 2179–2186 (2011).
Kim, S. et al. NVM neuromorphic core with 64k-cell (256-by-256) phase change memory synaptic array with on-chip neuron circuits for continuous in-situ learning. In 2015 IEEE International Electron Devices Meeting (IEDM) https://doi.org/10.1109/iedm.2015.7409716 (IEEE, Washington DC, USA, 2015).
Saïghi, S. et al. Plasticity in memristive devices for spiking neural networks. Front. Neurosci. 9, 51 (2015).
Tuma, T., Pantazi, A., Gallo, M. L., Sebastian, A. & Eleftheriou, E. Stochastic phase-change neurons. Nat. Nanotechnol. 11, 693–699 (2016).
Boybat, I. et al. Stochastic weight updates in phase-change memory-based synapses and their influence on artificial neural networks. In 2017 13th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME) https://doi.org/10.1109/prime.2017.7974095 (IEEE, Giardini Naxos, Italy, 2017).
Miranda, E., Jimenez, D. & Sune, J. The quantum point-contact memristor. IEEE Electron Device Lett. 33, 1474–1476 (2012).
Ielmini, D. Modeling the universal set/reset characteristics of bipolar RRAM by field- and temperature-driven filament growth. IEEE Trans. Electron Devices 58, 4309–4317 (2011).
Wong, H.-S. P. et al. Phase change memory. Proc. IEEE 98, 2201–2227 (2010).
Gallo, M. L., Tuma, T., Zipoli, F., Sebastian, A. & Eleftheriou, E. Inherent stochasticity in phase-change memory devices. In 2016 46th European Solid-State Device Research Conference (ESSDERC) https://doi.org/10.1109/essderc.2016.7599664 (IEEE, Lausanne, Switzerland, 2016).
Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning (MIT Press, Cambridge, United States, 2008).
James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning: with Applications in R (Springer, New York, United States, 2017).
Jang, J.-W., Park, S., Burr, G. W., Hwang, H. & Jeong, Y.-H. Optimization of conductance change in Pr_{1–x}Ca_{x}MnO_{3}-based synaptic devices for neuromorphic systems. IEEE Electron Device Lett. 36, 457–459 (2015).
Jo, S. H. et al. Nanoscale memristor device as synapse in neuromorphic systems. Nano Lett. 10, 1297–1301 (2010).
Wang, I.-T., Chang, C.-C., Chiu, L.-W., Chou, T. & Hou, T.-H. 3D Ta/TaO_{x}/TiO_{2}/Ti synaptic array and linearity tuning of weight update for hardware neural network applications. Nanotechnology 27, 365204 (2016).
Chen, W. et al. A CMOS-compatible electronic synapse device based on Cu/SiO_{2}/W programmable metallization cells. Nanotechnology 27, 255202 (2016).
Yao, P. et al. Face classification using electronic synapses. Nat. Commun. 8, 15199 (2017).
Marinella, M. J. et al. Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator. Preprint at http://arxiv.org/abs/1707.09952 (2017).
Wu, W. et al. Improving analog switching in HfO_{x}-based resistive memory with a thermal enhanced layer. IEEE Electron Device Lett. 38, 1019–1022 (2017).
Woo, J. et al. Improved synaptic behavior under identical pulses using AlO_{x}/HfO_{2} bilayer RRAM array for neuromorphic systems. IEEE Electron Device Lett. 37, 994–997 (2016).
Close, G. F. et al. Device, circuit and system-level analysis of noise in multi-bit phase-change memory. In 2010 International Electron Devices Meeting https://doi.org/10.1109/iedm.2010.5703445 (IEEE, San Francisco, USA, 2010).
Breitwisch, M. et al. Novel lithography-independent pore phase change memory. In 2007 IEEE Symposium on VLSI Technology https://doi.org/10.1109/vlsit.2007.4339743 (IEEE, Kyoto, Japan, 2007).
Acknowledgements
We would like to thank Marwan Khater, Hiroyuki Miyazoe, Adam Pyzyna, and the staff of Microelectronics Research Laboratory at IBM T.J. Watson Research Center for their contributions in device fabrication. We would also like to thank Wilfried Haensch for management support and valuable discussions.
Author information
Contributions
T.A. conceived the idea. N.G. and T.A. performed the experiments and analyzed all data. T.I., N.G., and T.A. developed the GPR-based methodology. S.K. performed the experiments on ReRAM and analyzed the data. I.B. and A.S. performed the experiments on PCM and analyzed the data. V.N. provided managerial support and critical comments. N.G. and T.A. wrote the manuscript with input from all the authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gong, N., Idé, T., Kim, S. et al. Signal and noise extraction from analog memory elements for neuromorphic computing. Nat. Commun. 9, 2102 (2018). https://doi.org/10.1038/s41467-018-04485-1
DOI: https://doi.org/10.1038/s41467-018-04485-1