
  • Article

A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

Abstract

Analogue in-memory computing (AIMC) with resistive memory devices could reduce the latency and energy consumption of deep neural network inference tasks by directly performing computations within memory. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and on-chip communication. Here we report a multicore AIMC chip designed and fabricated in 14 nm complementary metal–oxide–semiconductor technology with backend-integrated phase-change memory. The fully integrated chip features 64 AIMC cores interconnected via an on-chip communication network. It also implements the digital activation functions and additional processing involved in individual convolutional layers and long short-term memory units. With this approach, we demonstrate near-software-equivalent inference accuracy with ResNet and long short-term memory networks, while implementing all the computations associated with the weight layers and the activation functions on the chip. For 8-bit input/output matrix–vector multiplications, in the four-phase (high-precision) or one-phase (low-precision) operational read mode, the chip can achieve a maximum throughput of 16.1 or 63.1 tera-operations per second at an energy efficiency of 2.48 or 9.76 tera-operations per second per watt, respectively.
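The central operation accelerated by AIMC is the matrix–vector multiplication (MVM). The toy model below (an illustrative sketch only, not the chip's actual signal chain; the noise level and quantization scheme are assumptions) shows how 8-bit input/output quantization and analogue weight noise jointly bound the precision of such an MVM:

```python
import numpy as np

rng = np.random.default_rng(0)

def aimc_mvm(weights, x, noise_std=0.02, bits=8):
    """Toy model of an analogue in-memory MVM (illustrative only).

    Weights are perturbed with Gaussian noise to mimic conductance
    variations; inputs and outputs are quantized to `bits` precision.
    """
    qmax = 2 ** (bits - 1) - 1
    # Quantize the inputs to signed 8-bit levels
    x_q = np.round(np.clip(x, -1, 1) * qmax) / qmax
    # Additive weight noise stands in for device non-idealities
    w_noisy = weights + rng.normal(0.0, noise_std, size=weights.shape)
    y = w_noisy @ x_q
    # Quantize the (ADC-digitized) output
    scale = np.max(np.abs(y)) or 1.0
    return np.round(y / scale * qmax) / qmax * scale

W = rng.normal(0, 0.5, size=(256, 256))
x = rng.uniform(-1, 1, size=256)
exact = W @ x
approx = aimc_mvm(W, x)
err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

With these hypothetical parameters the relative MVM error stays at the few-per-cent level, which is the regime in which hardware-aware-trained networks retain near-software-equivalent accuracy.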


Fig. 1: IBM HERMES project chip overview.
Fig. 2: MVM characterization.
Fig. 3: ResNet-9 on CIFAR-10 measurement results.
Fig. 4: LSTM for character prediction measurement results.
Fig. 5: LSTM for image caption generation measurement results.

Data availability

The data that support the plots within this paper and other findings of this study are available from the corresponding authors upon reasonable request.

References

  1. Murmann, B. Mixed-signal computing for deep neural network inference. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 29, 3–13 (2021).

  2. Shafiee, A. et al. ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars. in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) 14–26 (IEEE Press, 2016).

  3. Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544 (2020).

  4. Yu, S., Jiang, H., Huang, S., Peng, X. & Lu, A. Compute-in-memory chips for deep learning: recent trends and prospects. IEEE Circuits Syst. Magazine 21, 31–56 (2021).

  5. Lanza, M. et al. Memristive technologies for data storage, computation, encryption, and radio-frequency communication. Science 376, eabj9979 (2022).

  6. Wang, Z. et al. Resistive switching materials for information processing. Nat. Rev. Mater. 5, 173–195 (2020).

  7. Xiao, T. P., Bennett, C. H., Feinberg, B., Agarwal, S. & Marinella, M. J. Analog architectures for neural network acceleration based on non-volatile memory. Appl. Phys. Rev. 7, 031301 (2020).

  8. Yu, S. et al. Binary neural network with 16 Mb RRAM macro chip for classification and online training. in 2016 IEEE International Electron Devices Meeting (IEDM) 16.2.1–16.2.4 (IEEE, 2016).

  9. Hu, M. et al. Memristor-based analog computation and neural network classification with a dot product engine. Adv. Mater. 30, 1705914 (2018).

  10. Tsai, H. et al. Inference of long short-term memory networks at software-equivalent accuracy using 2.5M analog phase-change memory devices. in 2019 Symposium on VLSI Technology T82–T83 (IEEE, 2019).

  11. Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).

  12. Joshi, V. et al. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11, 2473 (2020).

  13. Biswas, A. & Chandrakasan, A. P. CONV-SRAM: an energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks. IEEE J. Solid-State Circuits 54, 217–230 (2019).

  14. Merrikh-Bayat, F. et al. High-performance mixed-signal neurocomputing with nanoscale floating-gate memory cell arrays. IEEE Trans. Neural Netw. Learn. Syst. 29, 4782–4790 (2018).

  15. Cai, F. et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nat. Electron. 2, 290–299 (2019).

  16. Chen, W.-H. et al. CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors. Nat. Electron. 2, 420–428 (2019).

  17. Yin, S., Sun, X., Yu, S. & Seo, J.-S. High-throughput in-memory computing for binary deep neural networks with monolithically integrated RRAM and 90-nm CMOS. IEEE Trans. Electron Devices 67, 4185–4192 (2020).

  18. Khaddam-Aljameh, R. et al. HERMES-Core—a 1.59-TOPS/mm2 PCM on 14-nm CMOS in-memory compute core using 300-ps/LSB linearized CCO-based ADCs. IEEE J. Solid-State Circuits 57, 1027–1038 (2022).

  19. Deaville, P., Zhang, B. & Verma, N. A 22nm 128-kb MRAM row/column-parallel in-memory computing macro with memory-resistance boosting and multi-column ADC readout. in 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits) 268–269 (IEEE, 2022).

  20. Khwa, W.-S. et al. A 40-nm, 2M-cell, 8b-precision, hybrid SLC-MLC PCM computing-in-memory macro with 20.5–65.0 TOPS/W for tiny-AI edge devices. in 2022 IEEE International Solid-State Circuits Conference (ISSCC) 65, 1–3 (IEEE, 2022).

  21. Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).

  22. Hung, J.-M. et al. A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat. Electron. 4, 921–930 (2021).

  23. Fick, L., Skrzyniarz, S., Parikh, M., Henry, M. B. & Fick, D. Analog matrix processor for edge AI real-time video analytics. in 2022 IEEE International Solid-State Circuits Conference (ISSCC) 65, 260–262 (IEEE, 2022).

  24. Jia, H. et al. Scalable and programmable neural network inference accelerator based on in-memory computing. IEEE J. Solid-State Circuits 57, 198–211 (2022).

  25. Narayanan, P. et al. Fully on-chip MAC at 14 nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration-format. IEEE Trans. Electron Devices 68, 6629–6636 (2021).

  26. Dazzi, M. et al. Efficient pipelined execution of CNNs based on in-memory computing and graph homomorphism verification. IEEE Trans. Comput. 70, 922–935 (2021).

  27. Boybat, I. et al. Neuromorphic computing with multi-memristive synapses. Nat. Commun. 9, 2514 (2018).

  28. Khaddam-Aljameh, R. et al. A multi-memristive unit-cell array with diagonal interconnects for in-memory computing. IEEE Trans. Circuits Syst. II, Exp. Briefs 68, 3522–3526 (2021).

  29. Sarwat, S. G. et al. Mechanism and impact of bipolar current voltage asymmetry in computational phase-change memory. Adv. Mater. 2201238 (2022).

  30. Papandreou, N. et al. Programming algorithms for multilevel phase-change memory. in IEEE International Symposium on Circuits and Systems (ISCAS) 329–332 (IEEE, 2011).

  31. Le Gallo, M. et al. Precision of bit slicing with in-memory computing based on analog phase-change memory crossbars. Neuromorp. Comput. Eng. 2, 014009 (2022).

  32. Ielmini, D., Sharma, D., Lavizzari, S. & Lacaita, A. Reliability impact of chalcogenide-structure relaxation in phase-change memory (PCM) cells, part I: experimental study. IEEE Trans. Electron Devices 56, 1070–1077 (2009).

  33. Le Gallo, M., Krebs, D., Zipoli, F., Salinga, M. & Sebastian, A. Collective structural relaxation in phase-change memory devices. Adv. Electron. Mater. 4, 1700627 (2018).

  34. Le Gallo, M., Sebastian, A., Cherubini, G., Giefers, H. & Eleftheriou, E. Compressed sensing with approximate message passing using in-memory computing. IEEE Trans. Electron Devices 65, 4304–4312 (2018).

  35. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Tech. Rep. (Univ. Toronto, 2009).

  36. Marcus, M. P., Santorini, B. & Marcinkiewicz, M. A. Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19, 313–330 (1993).

  37. Rashtchian, C., Young, P., Hodosh, M. & Hockenmaier, J. Collecting image annotations using Amazon’s mechanical turk. in Proc. NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk CSLDAMT ’10 139–147 (Association for Computational Linguistics, 2010).

  38. Rasch, M. J. et al. Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators. Preprint at arXiv https://doi.org/10.48550/arXiv.2302.08469 (2023).

  39. Rasch, M. J. et al. A flexible and fast PyTorch toolkit for simulating training and inference on analog crossbar arrays. in 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS) 1–4 (IEEE, 2021).

  40. Mujika, A., Meier, F. & Steger, A. Fast-slow recurrent neural networks. in Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, 2017).

  41. Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: a method for automatic evaluation of machine translation. in Proc. 40th Annual Meeting on Association for Computational Linguistics, ACL ’02 311–318 (Association for Computational Linguistics, 2002).

  42. Jain, S. et al. A heterogeneous and programmable compute-in-memory accelerator architecture for analog-AI using dense 2-D mesh. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 31, 114–127 (2022).

  43. Dazzi, M., Sebastian, A., Benini, L. & Eleftheriou, E. Accelerating inference of convolutional neural networks using in-memory computing. Front. Comput. Neurosci. 15, 674154 (2021).

  44. Lin, P. et al. Three-dimensional memristor circuits as complex neural networks. Nat. Electron. 3, 225–232 (2020).

  45. Huo, Q. et al. A computing-in-memory macro based on three-dimensional resistive random-access memory. Nat. Electron. 5, 469–477 (2022).

  46. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE Computer Society, 2016).

  47. Devries, T. & Taylor, G. W. Improved regularization of convolutional neural networks with cutout. Preprint at arXiv https://doi.org/10.48550/arXiv.1708.04552 (2017).

  48. Nandakumar, S. R. et al. Phase-change memory models for deep learning training and inference. in 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS) 727–730 (IEEE, 2019).

  49. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  50. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  51. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. in 3rd International Conference on Learning Representations, ICLR 2015 (2015).

  52. Vinyals, O., Toshev, A., Bengio, S. & Erhan, D. Show and tell: a neural image caption generator. in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3156–3164 (IEEE, 2015).

  53. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2818–2826 (IEEE, 2016).

  54. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115, 211–252 (2015).

  55. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. in Proc. 32nd International Conference on International Conference on Machine Learning—Volume 37, ICML’15 448–456 (PMLR, 2015).

Acknowledgements

We thank G. W. Burr, M. Bühler, T. Maurer, A. Müller, Y. Kohda, K. Hosakawa, S. Ambrogio, F. L. Lie, F. Liu, T. Levin and T. Gordon for assistance with the chip design; A. Okazaki, H. Mori and M. Bergendahl for assistance with the chip packaging; J. F. Mas, G. Cristiano and J. Paret for chip testing and simulation; F. Odermatt, I. Boybat, S. R. Nandakumar, C. Piveteau, C. Lammie and H. Benmeziane for help with the network deployment on the chip; and A. Pantazi, R. Haas, A. Curioni, S. Tsai, W. Haensch, J. Burns, R. Divakaruni and M. Khare for managerial support. We would also like to thank L. Benini and B. Rajendran for their support with supervising the students. This work was supported by the IBM Research AI Hardware Center. A. Sebastian acknowledges partial funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement nos. 682675 and 966764).

Author information

Contributions

M.L.G. and A. Sebastian defined the neural network inference and compute precision characterization research. R.K.-A. led and performed the analogue design of the chip. M.S. led the digital design of the chip. M.S., M.D., G.K., M.B., A. Singh, S.M.M. and P.A.F. performed the digital design. M.L.G., A.V., B.K., G.K. and A.G. performed the chip testing and wrote the code to operate it. A.V. and B.K. performed the neural network inference and MVM characterization hardware experiments. J.B., X.T., V.J. and M.J.R. performed the hardware-aware training of the neural networks. M.L.G. and U.E. performed the chip performance measurements. U.E. built the chip testing platform. A.P. and T.A. wrote the field-programmable gate array code to interface with the chip. K.B., S.C., I.O., T.P., V.C., C.S., I.A. and N.S. performed the backend integration of the PCM devices and wafer-level testing. M.L.G. wrote the manuscript with input from all authors. V.N., P.A.F., E.E. and A. Sebastian supervised the project.

Corresponding authors

Correspondence to Manuel Le Gallo or Abu Sebastian.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Electronics thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Digital communication fabric.

a, Schematic of the link controller. The dotted line marks the core boundary. The transmitter-side link controller prepends a preamble to the payload being sent on the links. The preamble contains information that uniquely identifies a routing path for a particular type of payload. If the receiving core is enabled in the routing table to receive data from the transmitting core, its link controller samples the incoming data. Furthermore, the link controllers in the transmitting and receiving cores can select a portion of the payload according to another set of routing registers. b, Possible link connections for Core(3,5) and Core(4,5), where the notation Core(r,c) refers to the core located at row r and column c in Fig. 1b. c, Link connections for the entire chip (available connections are denoted in green). The RX and TX connections for Core(3,5) and Core(4,5) shown in b are indicated. d, Link characterization results on one chip for communicating data from the LDPU of one core to the LDPU of another core. A payload of 255 bytes is sent by the transmitter core, and an error is triggered if at least one byte of the payload received in the LDPU of the receiver core does not match the original payload. All links with a Manhattan distance of 1 or 2 cores show no errors when run at 100 MHz, and 98% of them remain error-free at 400 MHz. Links with longer Manhattan distances show more errors, probably because of signal attenuation caused by parasitics in the longer routing metal wires. This issue could be mitigated in a future design by placing buffers at regular intervals along the wires, or by employing a core connectivity matrix that does not rely on long-distance links. All the links used in the experimental demonstrations of this work have a Manhattan distance of 1 core and operate without errors at 400 MHz.
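The routing-table gating described above can be sketched as follows. This is a hypothetical software model, not the RTL: the class and field names (`LinkController`, `routing_table`, the frame layout) are invented for illustration, and only the behaviour of sampling a payload when the transmitting core is enabled is modelled.

```python
def manhattan(a, b):
    """Manhattan distance between two (row, col) core coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

class LinkController:
    """Toy receiver-side link controller with a routing-table gate."""

    def __init__(self, core, routing_table):
        self.core = core                    # (row, col) of this core
        self.routing_table = routing_table  # cores we are enabled to receive from

    def receive(self, frame):
        preamble, payload = frame
        # Sample the incoming data only if the transmitting core is
        # enabled in the routing table; otherwise ignore the frame.
        if preamble["src"] in self.routing_table:
            return payload
        return None

# Core(4,5) is enabled to receive from its neighbour Core(3,5)
tx = (3, 5)
rx_ctrl = LinkController((4, 5), routing_table={tx})
payload = bytes(range(255))  # 255-byte test payload, as in panel d
received = rx_ctrl.receive(({"src": tx}, payload))
```

A disabled core simply never samples the frame, which is how a single physical link fabric supports many logical routing paths.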

Extended Data Fig. 2 PCM crossbar array.

a, Schematic of 8T4R unit-cell. The top electrodes of the conductance pairs of each polarity connect to separate bit lines \(B{L}_{m}^{+}\), \(B{L}_{m}^{-}\) and the sources of their lower access-transistors connect to separate source lines \(S{L}_{n}^{+}\), \(S{L}_{n}^{-}\). Thus, the devices in a conductance pair are weighted with equal significance and the total conductance per unit-cell becomes: \(\left({g}_{1}^{+}+{g}_{2}^{+}\right)-\left({g}_{1}^{-}+{g}_{2}^{-}\right)\). b, Schematic of PCM crossbar array. To program the PCM devices, the dedicated per-core programming FSM instructs the diagonal selection decoder to enable one diagonal of cells that contains the devices that are to be programmed. The diagonal selection decoder controls the \({SEL}_{m,n}^{1}\) and \({SEL}_{m,n}^{2}\) signals in the unit-cell, which are routed diagonally throughout the array. The selected devices are programmed by the current-steering DAC-based programming units located on top of the PCM array. To perform an MVM, the 256 inputs to the crossbar array (IN0 − IN255) are applied via the red source lines (SLs) to the 8T4R cells. The resulting bit line (BL) currents are summed up on the blue wires and read by the ADCs that flank the crossbar array on the left and right. c, Layout of one ADC. The block diagram that is shown below the layout illustrates the various components of the ADC, namely, the read voltage regulator, the current-to-frequency converter, and the 2 × 12-bit ripple counter.
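The differential readout of the 8T4R unit-cell can be sketched numerically. This is an idealized model under assumed units (the array size and conductance ranges below are arbitrary, and real bit-line currents include device and circuit non-idealities); it only verifies that subtracting the two bit-line currents recovers the signed MVM defined by \(\left({g}_{1}^{+}+{g}_{2}^{+}\right)-\left({g}_{1}^{-}+{g}_{2}^{-}\right)\).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8  # toy array size (the chip uses 256 x 256 unit-cells)

# Four non-negative conductances per unit-cell (arbitrary units)
g1p, g2p = rng.uniform(0, 1, (N, N)), rng.uniform(0, 1, (N, N))
g1n, g2n = rng.uniform(0, 1, (N, N)), rng.uniform(0, 1, (N, N))

# Effective signed weight per unit-cell, as in the caption
G = (g1p + g2p) - (g1n + g2n)

v = rng.uniform(0, 1, N)  # input voltages applied on the source lines

# Currents summed on the positive and negative bit lines (Kirchhoff's law)
i_pos = (g1p + g2p) @ v
i_neg = (g1n + g2n) @ v

# The differential read recovers the signed MVM result
assert np.allclose(i_pos - i_neg, G @ v)
```

Because each polarity is carried by two devices of equal significance, the usable conductance range per weight is doubled and device-to-device variations partially average out.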

Extended Data Fig. 3 PCM device.

a, A typical programming curve, showing the programmed device conductance as a function of the programming current. The device conductance is determined by the phase configuration within the PCM device, in particular the size of the amorphous region. Data are presented as mean values ± one standard deviation over 10 repeated measurements on a single device. b, Low-angle annular dark-field (LAADF) scanning transmission electron microscope (STEM) image of a fully RESET PCM device, showing a large amorphous region that fully blocks the bottom electrode. LAADF enables high-resolution imaging of the amorphous region. c, LAADF image of a partially RESET PCM device, showing a much smaller amorphous region. The synaptic weights are stored in an analogue manner in these phase configurations and the resulting conductance values.

Extended Data Fig. 4 Input modulation modes for MVM.

a, Full array read procedure for MVMs, showing the connections between the ADC, the unit-cells and the input modulator switches. Signals PP and PN connect the positive source lines \(S{L}_{1:N}^{+}\) to the positive potential V+ and negative potential V−, respectively. For NP and NN, it is vice versa. b, 1-phase modulation mode. Inputs of positive and negative polarity are applied to weights of positive and negative polarity within a single modulation cycle TPWM. c, 4-phase modulation mode. Inputs of positive and negative polarity are applied individually to weights of positive and negative polarity over four modulation cycles.
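The polarity bookkeeping behind the two modes can be sketched as follows. This is an idealized model (in the absence of circuit non-idealities the two modes give identical results; on hardware the 1-phase mode trades precision for throughput, as the abstract's 16.1 vs 63.1 TOPS figures reflect), and the matrix sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.uniform(-1, 1, (4, 4))   # signed effective unit-cell conductances
x = rng.uniform(-1, 1, 4)        # signed PWM-encoded inputs

pos = lambda a: np.maximum(a, 0.0)   # positive-polarity part
neg = lambda a: np.maximum(-a, 0.0)  # negative-polarity part

# 4-phase mode: each input polarity is applied to each weight polarity
# in a separate modulation cycle; the four partial sums are then
# combined with the appropriate signs.
phases = [
    pos(G) @ pos(x), pos(G) @ neg(x),
    neg(G) @ pos(x), neg(G) @ neg(x),
]
y_4phase = phases[0] - phases[1] - phases[2] + phases[3]

# 1-phase mode: all polarity combinations within one modulation cycle
y_1phase = G @ x

assert np.allclose(y_4phase, y_1phase)
```

Expanding \(G x = (G^{+}-G^{-})(x^{+}-x^{-})\) gives exactly the four signed partial products combined above, which is why the 4-phase mode can resolve each polarity pair at higher precision.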

Extended Data Fig. 5 Weight programming procedure.

a, Crossbar array during programming. b, Proposed TDP algorithm to program a target conductance value G on a unit-cell. c, Weight error comparison between the TDP algorithm of this work and previous approaches. TDP Max-fill refers to iteratively programming the two devices up to the maximum conductance \({G}_{\max }\), as proposed in ref. 31. Because of the wide SET distribution shown in Fig. 2a, some devices in the core cannot reach \({G}_{\max }\), whereas others could be programmed to conductance values well above \({G}_{\max }\). The latter approach therefore incurs programming inaccuracies, resulting either from under-utilizing the conductance range of individual devices or from devices that cannot reach \({G}_{\max }\). The proposed TDP algorithm avoids this issue by using the readout SET conductances of the devices in the unit-cell to map the weight.
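The idea of splitting a target conductance across the two devices of a unit-cell can be sketched as below. This is a toy model only: the function names, the pulse gain of 0.5, the noise level and the iteration count are all invented for illustration, and the real TDP algorithm additionally uses the measured SET conductances to choose the split.

```python
import numpy as np

rng = np.random.default_rng(3)

def program_device(target, g_max, n_iter=10, noise=0.02):
    """Closed-loop iterative program-and-verify of one PCM device (toy model)."""
    g = 0.0
    for _ in range(n_iter):
        # Each pulse moves the conductance toward the target,
        # with stochastic programming noise
        g += 0.5 * (min(target, g_max) - g) + rng.normal(0, noise)
        g = float(np.clip(g, 0.0, g_max))  # device cannot exceed its range
    return g

def program_unit_cell(G_target, g_max=1.0):
    """Split a target conductance across the two devices of one polarity."""
    g1 = program_device(min(G_target, g_max), g_max)
    # The second device absorbs the residual left by the first, so the
    # pair can represent targets beyond a single device's range
    g2 = program_device(G_target - g1, g_max)
    return g1, g2

G_target = 1.4  # target above a single device's g_max
g1, g2 = program_unit_cell(G_target)
err = abs((g1 + g2) - G_target)
```

Letting the second device compensate the residual of the first is what makes the combined conductance robust to the wide SET distribution of individual devices.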

Extended Data Fig. 6 Area and power splits.

a, Area breakdown of the main chip components. b, Static power consumed by the different components of the chip measured for the operation of the LSTM unit of the image caption generation task (4 core rows and GDPU active).

Extended Data Table 1 Summary of IBM HERMES Project Chip specifications
Extended Data Table 2 Comparison of IBM HERMES Project Chip with other multi-core AIMC chips demonstrating neural network inference

Supplementary information

Supplementary Video 1

Live demonstration of image captioning using the IBM HERMES project chip.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Le Gallo, M., Khaddam-Aljameh, R., Stanisavljevic, M. et al. A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference. Nat Electron 6, 680–693 (2023). https://doi.org/10.1038/s41928-023-01010-1
