A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices

Abstract

The development of small, energy-efficient artificial intelligence edge devices is limited in conventional computing architectures by the need to transfer data between the processor and memory. Non-volatile compute-in-memory (nvCIM) architectures have the potential to overcome such issues, but the development of high-bit-precision configurations required for dot-product operations remains challenging. In particular, input–output parallelism and cell-area limitations, as well as signal margin degradation, computing latency in multibit analogue readout operations and manufacturing challenges, still need to be addressed. Here we report a 2 Mb nvCIM macro (which combines memory cells and related peripheral circuitry) that is based on single-level cell resistive random-access memory devices and is fabricated in a 22 nm complementary metal–oxide–semiconductor foundry process. Compared with previous nvCIM schemes, our macro can perform multibit dot-product operations with increased input–output parallelism, reduced cell-array area, improved accuracy, and reduced computing latency and energy consumption. The macro can, in particular, achieve latencies between 9.2 and 18.3 ns, and energy efficiencies between 146.21 and 36.61 tera-operations per second per watt, for binary and multibit input–weight–output configurations, respectively.
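
As a rough illustration of the kind of computation such a macro accelerates, the sketch below (a minimal functional model in Python, not the BLIOMC/S2CWMB circuit scheme used in this work) shows how a multibit dot product can be decomposed into binary partial products and recombined by shift-and-add, which is the general idea behind single-level-cell compute-in-memory; the function name and the assumed 4-bit input and weight precisions are illustrative only.

```python
# Minimal functional sketch (assumptions: unsigned 4-bit inputs and weights,
# bit-serial inputs, 1-bit weight slices stored in single-level cells).
import numpy as np

def cim_dot_product(inputs, weights, in_bits=4, w_bits=4):
    """Compute sum_k inputs[k] * weights[k] by binary bit-slicing."""
    inputs = np.asarray(inputs, dtype=np.int64)
    weights = np.asarray(weights, dtype=np.int64)
    total = 0
    for i in range(in_bits):                        # bit-serial input cycles
        in_slice = (inputs >> i) & 1                # 1-bit input vector
        for j in range(w_bits):                     # 1-bit weight columns (SLC cells)
            w_slice = (weights >> j) & 1            # 1-bit weight vector
            partial = int(np.dot(in_slice, w_slice))  # analogue bit-line sum in hardware
            total += partial << (i + j)             # digital shift-and-add recombination
    return total

# The decomposition reproduces the exact integer dot product:
rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=8)   # 4-bit inputs
w = rng.integers(0, 16, size=8)   # 4-bit weights
assert cim_dot_product(x, w) == int(np.dot(x, w))
```

In a single-level-cell nvCIM macro, each inner partial sum would be produced as an analogue bit-line current in one array access, with only the shift-and-add recombination handled digitally; the nested software loops here are purely a functional stand-in.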


Figures

Fig. 1: Overview of proposed nvCIM structure.
Fig. 2: Operation of BLIOMC and S2CWMB for multibit dot-product operations.
Fig. 3: In situ HRS-C scheme and HRS-FQ flow.
Fig. 4: Structure and operations of DbSO-CSA and GRLM-RCG.
Fig. 5: Measurement results of proposed nvCIM macro.


Data availability

The data supporting the plots in this paper and other findings of this study are available from the corresponding author upon reasonable request.

Code availability

The code supporting the experimental platforms and proposed nvCIM testchip is available from the corresponding author upon reasonable request.


Acknowledgements

We thank the NVM-DTP of TSMC, the TSMC-NTHU JDP, NTHU and MOST-Taiwan for technical and financial support.

Author information


Contributions

C.-X.X. and Y.-C.C. designed the circuits for the nvCIM macro and testchip. C.-X.X., Y.-C.C., C.-Y.S., C.-C.H., C.-C.L., K.-T.T., R.-S.L., M.-S.H. and M.-F.C. contributed ideas. J.-S.L., T.-W.C., H.-Y.K., T.-Y.H., S.-P.H., C.-Y.L., J.-M.H., S.-H.T., T.-H.H., Y.-K.C. and S.-Y.W. built the test measurement system and testing flow for the ReRAM nvCIM macro. T.-W.L., W.-C.W., Y.-R.C., Y.-C.L., T.-H.W. and J.-H.W. built the CIFAR-10 demonstration system. C.-X.X. and Y.-C.C. performed the analysis and measurements of the nvCIM macro. C.-C.C., Y.-D.C. and M.-F.C. managed the project. C.-X.X., Y.-C.C., M.-S.H. and M.-F.C. wrote the paper.

Corresponding author

Correspondence to Meng-Fan Chang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 1–10.


About this article


Cite this article

Xue, CX., Chiu, YC., Liu, TW. et al. A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices. Nat Electron 4, 81–90 (2021). https://doi.org/10.1038/s41928-020-00505-5


