
Uncertainty quantification via a memristor Bayesian deep neural network for risk-sensitive reinforcement learning

Abstract

Many advanced artificial intelligence tasks, such as policy optimization, decision making and autonomous navigation, demand high-bandwidth data transfer and probabilistic computing, posing great challenges for conventional computing hardware. Although digital computers based on the von Neumann architecture excel at precise, deterministic computing, their efficiency is limited by the high cost of both transferring data between memory and computing units and generating massive numbers of random numbers. Here we develop a stochastic computation-in-memory system that can efficiently perform both in situ random number generation and computation by exploiting the nanoscale physical behaviour of memristors. The system is built on a hardware-implemented multiple-memristor-array platform. To demonstrate its functionality and efficiency, we implement a typical risk-sensitive reinforcement learning task, namely the storm coast task, with a four-layer Bayesian deep neural network. The system efficiently decomposes aleatoric and epistemic uncertainties by exploiting the inherent stochasticity of memristors. Compared with a conventional digital computer, our memristor-based system achieves 10 times higher speed and 150 times higher energy efficiency in uncertainty decomposition. This stochastic computation-in-memory system paves the way for high-speed, energy-efficient implementations of various probabilistic artificial intelligence algorithms.
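The uncertainty decomposition described above can be sketched with the standard law-of-total-variance approach used in Bayesian deep learning: epistemic uncertainty is the variance of the per-weight-sample predictive means, and aleatoric uncertainty is the mean of the per-weight-sample predictive variances. This is a minimal toy illustration with assumed Gaussian statistics, not the paper's memristor implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ensemble: W weight samples (epistemic spread), N stochastic forward
# passes per weight sample (aleatoric noise). All parameters are illustrative.
W, N = 50, 200
weight_means = rng.normal(loc=1.0, scale=0.3, size=W)  # spread across weight samples
noise_scale = 0.5                                      # inherent output noise

# Predictive samples y[w, n]
samples = weight_means[:, None] + rng.normal(0.0, noise_scale, size=(W, N))

# Law of total variance: Var[y] = Var_w(E[y|w]) + E_w(Var[y|w])
epistemic = samples.mean(axis=1).var()   # variance of per-weight means
aleatoric = samples.var(axis=1).mean()   # mean of per-weight variances
total = samples.reshape(-1).var()

print(epistemic, aleatoric, total)
```

With equal sample counts per weight draw and population variances, the two components sum exactly to the total predictive variance.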


Fig. 1: BDNN for uncertainty quantification under real-world dynamic scenarios.
Fig. 2: Stochastic behaviour of the memristor array and ESCIM system.
Fig. 3: Risk-sensitive RL task and implementation of the BDNN.
Fig. 4: Uncertainty quantification and results for the risk-sensitive RL task.

Data availability

The data supporting the plots within this paper and other data supporting the findings in this study are available in a Zenodo repository (ref. 41).

Code availability

The code used for the simulations described in Methods is available in a Zenodo repository (ref. 41) and a GitHub repository (https://github.com/YudengLin/memristorBDNN). The code that supports the communication between the custom-built ESCIM system and the integrated chip is available from the corresponding author on reasonable request.

References

  1. Chouard, T. & Venema, L. Machine intelligence. Nature 521, 435 (2015).

  2. Duan, Y., Edwards, J. S. & Dwivedi, Y. K. Artificial intelligence for decision making in the era of Big Data—evolution, challenges and research agenda. Int. J. Inf. Manage. 48, 63–71 (2019).

  3. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521, 452–459 (2015).

  4. Abdar, M. et al. A review of uncertainty quantification in deep learning: techniques, applications and challenges. Inf. Fusion 76, 243–297 (2021).

  5. Wang, H. & Yeung, D.-Y. Towards Bayesian deep learning: a framework and some existing methods. IEEE Trans. Knowl. Data Eng. 28, 3395–3408 (2016).

  6. Michelmore, R. et al. Uncertainty quantification with statistical guarantees in end-to-end autonomous driving control. In 2020 IEEE International Conference on Robotics and Automation (ICRA) 7344–7350 (IEEE, 2020).

  7. McAllister, R. et al. Concrete problems for autonomous vehicle safety: advantages of Bayesian deep learning. In Proc. 26th International Joint Conference on Artificial Intelligence (IJCAI) 4745–4753 (Elsevier, 2017).

  8. Ticknor, J. L. A Bayesian regularized artificial neural network for stock market forecasting. Expert Syst. Appl. 40, 5501–5506 (2013).

  9. Jang, H. & Lee, J. Generative Bayesian neural network model for risk-neutral pricing of American index options. Quant. Finance 19, 587–603 (2019).

  10. Begoli, E., Bhattacharya, T. & Kusnezov, D. The need for uncertainty quantification in machine-assisted medical decision making. Nat. Mach. Intell. 1, 20–23 (2019).

  11. Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).

  12. Hüllermeier, E. & Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach. Learn. 110, 457–506 (2021).

  13. Depeweg, S., Hernandez-Lobato, J.-M., Doshi-Velez, F. & Udluft, S. Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 1184–1193 (PMLR, 2018).

  14. Coates, A. et al. Deep learning with COTS HPC systems. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) 1337–1345 (PMLR, 2013).

  15. Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. In Proc. 44th Annual International Symposium on Computer Architecture 1–12 (ACM, 2017).

  16. Horowitz, M. 1.1 Computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) 10–14 (IEEE, 2014).

  17. Thomas, D. B., Howes, L. & Luk, W. A comparison of CPUs, GPUs, FPGAs, and massively parallel processor arrays for random number generation. In Proc. ACM/SIGDA International Symposium on Field Programmable Gate Arrays 63–72 (ACM, 2009).

  18. Askar, T., Shukirgaliyev, B., Lukac, M. & Abdikamalov, E. Evaluation of pseudo-random number generation on GPU cards. Computation 9, 142 (2021).

  19. Thomas, D. B., Luk, W., Leong, P. H. W. & Villasenor, J. D. Gaussian random number generators. ACM Comput. Surv. 39, 11 (2007).

  20. Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).

  21. Ambrogio, S. et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558, 60–67 (2018).

  22. Prezioso, M. et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521, 61–64 (2015).

  23. Lin, Y. et al. Demonstration of generative adversarial network by intrinsic random noises of analog RRAM devices. In 2018 IEEE International Electron Devices Meeting (IEDM) 3.4.1–3.4.4 (IEEE, 2018).

  24. Gao, L., Chen, P.-Y. & Yu, S. Demonstration of convolution kernel operation on resistive cross-point array. IEEE Electron Device Lett. 37, 870–873 (2016).

  25. Yu, S. Neuro-inspired computing with emerging nonvolatile memorys. Proc. IEEE 106, 260–285 (2018).

  26. Burr, G. W. et al. Neuromorphic computing using non-volatile memory. Adv. Phys. X 3, 89–124 (2017).

  27. Dalgaty, T. et al. In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling. Nat. Electron. 4, 151–161 (2021).

  28. Dalgaty, T., Esmanhotto, E., Castellani, N., Querlioz, D. & Vianello, E. Ex situ transfer of Bayesian neural networks to resistive memory-based inference hardware. Adv. Intell. Syst. 3, 2000103 (2021).

  29. Balatti, S., Ambrogio, S., Wang, Z. & Ielmini, D. True random number generation by variability of resistive switching in oxide-based devices. IEEE J. Emerg. Select. Top. Circuits Syst. 5, 214–221 (2015).

  30. Vodenicarevic, D. et al. Low-energy truly random number generation with superparamagnetic tunnel junctions for unconventional computing. Phys. Rev. Appl. 8, 054045 (2017).

  31. Kim, G. et al. Self-clocking fast and variation tolerant true random number generator based on a stochastic mott memristor. Nat. Commun. 12, 2906 (2021).

  32. Jiang, H. et al. A novel true random number generator based on a stochastic diffusive memristor. Nat. Commun. 8, 882 (2017).

  33. Lin, B. et al. A high-performance and calibration-free true random number generator based on the resistance perturbation in RRAM array. In 2020 IEEE International Electron Devices Meeting (IEDM) 38.6.1–38.6.4 (IEEE, 2020).

  34. Wu, W. et al. Improving analog switching in HfOx-based resistive memory with a thermal enhanced layer. IEEE Electron Device Lett. 38, 1019–1022 (2017).

  35. Chen, J. et al. A parallel multibit programing scheme with high precision for RRAM-based neuromorphic systems. IEEE Trans. Electron Devices 67, 2213–2217 (2020).

  36. Puglisi, F. M., Pavan, P. & Larcher, L. Random telegraph noise in HfOx Resistive Random Access Memory: from physics to compact modeling. In 2016 IEEE International Reliability Physics Symposium (IRPS) MY-8-1–MY-8-5 (IEEE, 2016).

  37. Ambrogio, S. et al. Statistical fluctuations in HfOx resistive-switching memory: part II—random telegraph noise. IEEE Trans. Electron Devices 61, 2920–2927 (2014).

  38. Blundell, C., Cornebise, J., Kavukcuoglu, K. & Wierstra, D. Weight uncertainty in neural network. In Proc. 32nd International Conference on Machine Learning (eds Bach, F. & Blei, D.) 1613–1622 (PMLR, 2015).

  39. Depeweg, S., Hernández-Lobato, J. M., Doshi-Velez, F. & Udluft, S. Learning and policy search in stochastic dynamical systems with Bayesian neural networks. In 5th International Conference on Learning Representations 1–14 (ICLR, 2017).

  40. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).

  41. Lin, Y. YudengLin/memristorBDNN: uncertainty quantification via a memristor Bayesian deep neural network for risk-sensitive reinforcement learning. Zenodo https://doi.org/10.5281/zenodo.7947059 (2023).


Acknowledgements

This work was supported in part by the STI 2030-Major Projects (2021ZD0201200), the National Natural Science Foundation of China (92064001, 62025111 and 61974081), the XPLORER Prize, the Shanghai Municipal Science and Technology Major Project and the Beijing Advanced Innovation Center for Integrated Circuits.

Author information

Authors and Affiliations

Authors

Contributions

Y. Lin and Q.Z. contributed to the overall experimental design. B.G., J.Z. and H.W. supervised this project and proposed the overall architecture. Y. Lin, P.Y., Y.Z. and Y. Liu benchmarked the system performance. Z.L., C.L., W.Z. and S.H. helped with the simulations and data analysis. Y. Lin and B.G. contributed to writing and editing the manuscript. All authors examined the results and reviewed the manuscript.

Corresponding authors

Correspondence to Bin Gao or Huaqiang Wu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Read noise distribution in the different current states.

a–f, Target currents for six current states, as indicated in the panels. The read noise of memristors in the same current state is the difference between each cell's read current and its average current. We use a block mapping method to program each current state to the target value within an error margin of \(\Delta I = \pm 0.3\) μA. We program 2800 memristor cells for each selected target state, and each cell is measured over 1000 read cycles. The read noise distributions of all target current states can be fitted with a double exponential distribution (solid lines).
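A double exponential (Laplace) noise model of this kind can be characterized from samples with a simple maximum-likelihood fit: the scale parameter is the mean absolute deviation from the median. The sketch below uses an assumed scale value for illustration only, not a measured device statistic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical read-noise model: zero-mean double exponential (Laplace)
# disturbance; the scale b_true is an illustrative assumption, not device data.
b_true = 0.05  # μA
noise = rng.laplace(loc=0.0, scale=b_true, size=100_000)

# MLE for the Laplace scale: mean absolute deviation from the median.
b_hat = np.mean(np.abs(noise - np.median(noise)))
print(b_hat)
```

In practice one would fit measured read-disturbance samples per current state in the same way and compare the fitted density against the empirical histogram.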

Extended Data Fig. 2 Functional block diagram of the ESCIM system.

The core board with a field-programmable gate array (FPGA) enables the lower computer to communicate with the upper computer and generates the control signals for operations. The TIA&ADC board and each DAC board provide 64 current-quantization channels and 64 voltage-supply channels, respectively. The eight socket boards carrying 4K memristor chips can be connected in parallel to the DUT board. The mother board, with voltage- and digital-signal conversion circuits, connects the other boards.

Extended Data Fig. 3 Structure of the memristor BDNN and deployment on eight 4K memristor chips in the ESCIM system.

a, Structure of the memristor BDNN. Each layer input is quantized to 8 bits. The activation functions in the hidden layers are rectifier functions, that is, ReLU(x) = max(x, 0), and those in the output layer are identity functions, that is, Linear(x) = x. The bias input of each layer is not shown. The dimensions of the three memristor matrices are 1800 (6 × 100 × 3), 30300 (101 × 100 × 3) and 606 (101 × 2 × 3). b, The memristor matrices are mapped onto eight 4K memristor chips. The chips are filled sequentially in columns, and each matrix can start in a new column.
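The layer sizes above imply a 6 → 100 → 100 → 2 network, with a bias input appended before each hidden-to-hidden matrix (hence the 101-row matrices). A minimal sketch of such a stochastic forward pass, with illustrative Gaussian weight distributions standing in for the memristor conductance statistics (all distribution parameters here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed layer sizes from the caption: 6 -> 100 -> 100 -> 2, bias appended
# between layers. Weight means/stddevs are illustrative, not measured values.
def sample_layer(n_in, n_out):
    mu = rng.normal(0.0, 0.1, size=(n_in, n_out))
    sigma = np.full((n_in, n_out), 0.05)
    return mu, sigma

layers = [sample_layer(6, 100), sample_layer(101, 100), sample_layer(101, 2)]

def forward(x):
    h = x                                  # shape (6,): 5 features + bias term
    for i, (mu, sigma) in enumerate(layers):
        w = rng.normal(mu, sigma)          # one stochastic weight draw per read
        h = h @ w
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)         # ReLU hidden activation
            h = np.append(h, 1.0)          # bias input for the next layer
    return h                               # identity (linear) output, shape (2,)

y = forward(np.array([0.3, -0.1, 0.0, 0.0, 0.5, 1.0]))
print(y.shape)
```

Because a fresh weight sample is drawn on every call, repeated forward passes with the same input yield a predictive distribution rather than a point estimate.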

Extended Data Fig. 4 Average drift current under various current states.

We conduct a statistical analysis of the average drift current \(\delta I\) with respect to the initial current under various current states. For each state, 1890 cells are programmed into a specific current state and their drift currents are averaged. The drift current is the difference between the present read current and the initial current.
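The averaging step can be sketched as follows; the drift magnitude and read-noise spread are illustrative assumptions, not measured device values.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative drift model: each of 1890 cells starts at an initial current I0
# and drifts by a small offset plus read noise (all values are assumptions).
n_cells = 1890
I0 = 2.0            # μA, one example current state
drift_true = -0.03  # μA, assumed mean drift for this state
present = I0 + drift_true + rng.normal(0.0, 0.02, size=n_cells)

# Average drift current: mean difference between present and initial currents.
delta_I = np.mean(present - I0)
print(delta_I)
```

Averaging over ~1890 cells suppresses the per-cell read noise by roughly a factor of √1890, leaving a clean estimate of the state-dependent drift.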

Extended Data Fig. 5

Flow chart of ex situ training using memristor variational inference and the drift compensation technique.

Extended Data Fig. 6 The read current distribution of the BDNN after memristor programming.

The purple histogram shows the programmed results, and the grey histogram depicts the target current state.

Extended Data Fig. 7 Prediction distribution of the memristor BDNN in the ESCIM system.

a–i, Typical y′ samples of the next state for several typical and noteworthy locations \((x, y)\), where \(x\) = −10, 0, 10 and \(y\) = 1, 5, 9. For simplicity, we set the action \((a_x, a_y) = (0, 0)\) and analyse the \(y\)-axis value of the next location. The ground-truth next state is sampled 360 times in the true dynamic sea environment (with the same random seeds), and the next state is predicted 360 times to obtain the probability density. There is a slight difference between the prediction distribution (purple histogram) and the ground-truth distribution (yellow histogram). The histograms are truncated at y′ < 0. Notably, the smaller the \(y\) value, the larger the random disturbance in the predictive samples.

Extended Data Fig. 8 Predicted performance of the memristor BDNN over time with and without compensation.

We use the Jensen–Shannon (JS) divergence as a performance metric to measure the similarity between two probability distributions (JS divergence \(\in [0, 1]\)). Time is counted from the moment the programming process finishes. At ~3 s, the average JS divergence over the nine typical states is 0.021. The figure shows that the average JS divergence with drift compensation remains nearly constant over ~7000 s, indicating that the prediction performance of the memristor BDNN stays as good as at the beginning and that the BDNN suitably accomplishes the regression task in a complex dynamic environment.
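The JS divergence between two normalized histograms can be computed directly from its definition (with base-2 logarithms, so the value is bounded in [0, 1]); a minimal sketch:

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence of two histograms (base-2 logs, range [0, 1])."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # KL divergence; terms with a == 0 contribute 0, and b > 0 wherever a > 0.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence([1, 0], [1, 0]))  # identical distributions -> 0.0
print(js_divergence([1, 0], [0, 1]))  # disjoint distributions  -> 1.0
```

Applied to the 360-sample prediction and ground-truth histograms per state, this yields the per-state similarity values that are averaged in the figure.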

Extended Data Fig. 9

Circuit modules of the simulated memristor core to evaluate the speed and energy cost of the BDNN in the RL storm coast task.

Extended Data Fig. 10 Energy cost and latency of the GPU and ESCIM system in performing uncertainty decomposition in the risk-sensitive RL storm coast task.

a, Compared with the NVIDIA Tesla A100 GPU, the energy cost of the memristor-based ESCIM system is approximately 27 times lower at the 130 nm node and 150 times lower at the 28 nm node. b, Similarly, the latency of the ESCIM system is 5 times lower at 130 nm and 10 times lower at 28 nm than that of the GPU.

Supplementary information

Supplementary Information

Supplementary Figs. 1–16, Tables 1–8, Notes 1–5 and Video 1.

Supplementary Video 1

The video shows results for the risk-sensitive RL storm coast task. The boat's trajectory passes through a sea area with low epistemic uncertainty and low environmental stochasticity. Aleatoric and epistemic uncertainty warnings are raised when the boat is in sea areas with high environmental stochasticity and high epistemic uncertainty, respectively. The warnings guide the boat to paddle upwards to leave the high-uncertainty areas. The stable point occurs at a suitable distance from the coast because the uncertainties are taken into account.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lin, Y., Zhang, Q., Gao, B. et al. Uncertainty quantification via a memristor Bayesian deep neural network for risk-sensitive reinforcement learning. Nat Mach Intell 5, 714–723 (2023). https://doi.org/10.1038/s42256-023-00680-y

