A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices

Abstract

The development of small, energy-efficient artificial intelligence edge devices is limited in conventional computing architectures by the need to transfer data between the processor and memory. Non-volatile compute-in-memory (nvCIM) architectures have the potential to overcome such issues, but the development of high-bit-precision configurations required for dot-product operations remains challenging. In particular, input–output parallelism and cell-area limitations, as well as signal margin degradation, computing latency in multibit analogue readout operations and manufacturing challenges, still need to be addressed. Here we report a 2 Mb nvCIM macro (which combines memory cells and related peripheral circuitry) that is based on single-level cell resistive random-access memory devices and is fabricated in a 22 nm complementary metal–oxide–semiconductor foundry process. Compared with previous nvCIM schemes, our macro can perform multibit dot-product operations with increased input–output parallelism, reduced cell-array area, improved accuracy, and reduced computing latency and energy consumption. The macro can, in particular, achieve latencies between 9.2 and 18.3 ns, and energy efficiencies between 146.21 and 36.61 tera-operations per second per watt, for binary and multibit input–weight–output configurations, respectively.
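As a rough illustration of how a multibit dot product can be built from single-bit storage, the sketch below splits unsigned inputs and weights into binary planes, accumulates 1-bit partial products, and recombines them with shifts. It is a generic bit-slicing example written under assumed names (`bit_planes`, `bitsliced_dot`) and an assumed unsigned encoding. It is not the circuit scheme reported in the paper, which performs the accumulation in the analogue domain.

```python
def bit_planes(values, bits):
    """Split unsigned integers into `bits` binary planes, least significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(bits)]


def bitsliced_dot(inputs, weights, in_bits, w_bits):
    """Dot product assembled from 1-bit partial products (hypothetical helper).

    Each weight plane stands in for data held in single-level cells, and each
    input plane stands in for one binary input pattern applied to the array.
    """
    total = 0
    for i, in_plane in enumerate(bit_planes(inputs, in_bits)):
        for j, w_plane in enumerate(bit_planes(weights, w_bits)):
            # 1-bit multiply is a logical AND; the sum plays the role of accumulation.
            partial = sum(x & w for x, w in zip(in_plane, w_plane))
            # Weight the partial sum by the significance of both bit positions.
            total += partial << (i + j)
    return total


inputs = [3, 1, 2, 0]    # 2-bit unsigned inputs (illustrative values)
weights = [5, 7, 1, 6]   # 3-bit unsigned weights (illustrative values)
result = bitsliced_dot(inputs, weights, in_bits=2, w_bits=3)
direct = sum(x * w for x, w in zip(inputs, weights))
print(result, direct)
```

The two printed values agree, which shows that the shift-and-add recombination reproduces the ordinary integer dot product. Signed operands would need an additional encoding step that this sketch leaves out.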

Fig. 1: Overview of proposed nvCIM structure.
Fig. 2: Operation of BLIOMC and S2CWMB for multibit dot-product operations.
Fig. 3: In situ HRS-C scheme and HRS-FQ flow.
Fig. 4: Structure and operations of DbSO-CSA and GRLM-RCG.
Fig. 5: Measurement results of proposed nvCIM macro.

Data availability

The data supporting the plots in this paper and other findings of this study are available from the corresponding author upon reasonable request.

Code availability

The code supporting the experimental platforms and proposed nvCIM testchip is available from the corresponding author upon reasonable request.

Acknowledgements

We thank NVM-DTP of TSMC, TSMC-NTHU JDP, NTHU and MOST-Taiwan for technical and financial support.

Author information

Contributions

C.-X.X. and Y.-C.C. designed the circuits for the nvCIM macro and testchip. C.-X.X., Y.-C.C., C.-Y.S., C.-C.H., C.-C.L., K.-T.T., R.-S.L., M.-S.H. and M.-F.C. contributed ideas. J.-S.L., T.-W.C., H.-Y.K., T.-Y.H., S.-P.H., C.-Y.L., J.-M.H., S.-H.T., T.-H.H., Y.-K.C. and S.-Y.W. built the test measurement system and testing flow for the ReRAM nvCIM macro. T.-W.L., W.-C.W., Y.-R.C., Y.-C.L., T.-H.W. and J.-H.W. built the CIFAR-10 demonstration system. C.-X.X. and Y.-C.C. performed the analysis and measurements of the nvCIM macro. C.-C.C., Y.-D.C. and M.-F.C. managed the project. C.-X.X., Y.-C.C., M.-S.H. and M.-F.C. wrote the paper.

Corresponding author

Correspondence to Meng-Fan Chang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Sections 1–10.

Cite this article

Xue, CX., Chiu, YC., Liu, TW. et al. A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices. Nat Electron 4, 81–90 (2021). https://doi.org/10.1038/s41928-020-00505-5
