First demonstration of in-memory computing crossbar using multi-level Cell FeFET

Advancements in AI have led to the emergence of in-memory-computing architectures as a promising solution for the associated computing and memory challenges. This study introduces a novel in-memory-computing (IMC) crossbar macro utilizing a multi-level ferroelectric field-effect transistor (FeFET) cell for multi-bit multiply and accumulate (MAC) operations. The proposed 1FeFET-1R cell design stores multi-bit information while minimizing device variability effects on accuracy. Experimental validation was performed using 28 nm HKMG technology-based FeFET devices. Unlike traditional resistive memory-based analog computing, our approach leverages the electrical characteristics of stored data within the memory cell to derive MAC operation results encoded in activation time and accumulated current. Remarkably, our design achieves 96.6% accuracy for handwriting recognition and 91.5% accuracy for image classification without extra training. Furthermore, it demonstrates exceptional performance, achieving 885.4 TOPS/W, nearly double that of existing designs. This study represents the first successful implementation of an in-memory macro using a multi-state FeFET cell for complete MAC operations, preserving crossbar density without additional structural overhead.

1) Compared to a typical memory application (i.e., just 1 and 0 by set and reset operations), it has been observed that 2-bit MLC poses more challenges for NVM technology in meeting stricter Vt shift requirements [Ref1]. Although FeFET shows good potential for 10-year extrapolated retention in a typical memory application [Ref2], it is not clear whether the MLC FeFET in this work would show sufficiently stable Vt over time for all of the 0, 1, 2, 3 levels within the array, and whether such a Vt shift would still be at a tolerable level for the inference application (e.g., acceptable accuracy loss on MNIST, CIFAR-10 and beyond). In particular, it is especially important to evaluate the Vt shift in the partially programmed states (i.e., 1 and 2) compared to the fully programmed and erased states (i.e., 0 and 3). It is important to include some thoughts from that perspective and to discuss whether the Vt shift over time of the 2-bit MLC in FeFETs would be a concern in a large array for inference applications.
2) The other important aspect for inference applications is read latency. In Figure 2 of the manuscript, the simulation work shows good potential for a < 20 ns time scale; however, the experiments on the FeFET array need a 600 µs time scale. The big difference between the experimental and simulation work is not discussed in detail in the manuscript, and it casts doubt on the cause of such a slow read in the experiments and its potential impact on inference tasks. Is it due to the experimental setup (e.g., probe card or test structure), to the proposed unit cell that includes a 1 MΩ resistor, or to the capacitor from the decoder? Most importantly, is this slow read related to the 32x32 FeFET-based array, and is it a big concern in an even larger array? If so, would that be an issue when a larger FeFET array is used for inference with a larger NN? It would be important to discuss the 600 µs read latency in the experiments, with improvement strategies listed, in the revised manuscript. If this slow read is due to severe trapping after programming, which requires enough detrapping time to obtain a sufficient memory window, then it is vitally important to include the Vt dynamics within the 100 µs time scale, in order to clarify the origin of the read latency and provide future direction for technology improvement.
3) One additional comment: the LeNet on MNIST shows a test accuracy of 96.64% when using FeFETs that have 40 mV device-to-device variation. Would this test accuracy improve significantly if the device-to-device variation were improved a little further? If this could be evaluated by some simulation work, it would be a very helpful guideline for ferroelectric technology improvement.

In the manuscript, the authors employ the 1FeFET-1R cell structure for MAC driving and present a crossbar macro with minimal cell-to-cell variability effects. The authors utilize the MLC FeFET device and achieve 96.6% accuracy in handwriting recognition and 91.5% accuracy in image classification through simulations. However, additional experimental results are necessary to validate the FeFET device discussed in this manuscript. For this manuscript to be published in Nature Communications, a major revision is required, and the specific details are as follows:

1. Figure 1b illustrates the variation in threshold voltage of the transfer curve caused by the multistate FeFET. However, the distinction between the threshold voltages for states 0 and 1 is not as clearly discernible as that between states 2 and 3. This is a crucial concern that diminishes the reliability of information storage when using FeFETs in an array format. In order to clearly distinguish between states 0 and 1, please add the distribution of the threshold voltage for each information state.
2. The authors primarily focus on demonstrating device-to-device reliability due to the array's characteristics. However, to utilize the FeFET as a memory device, measuring the device's endurance and retention is essential. This is very important as an indicator of whether a device can be applied in an actual circuit. Considering that the FeFET device uses multiple states, it is necessary to provide evidence of endurance and retention for each state.
3. Additional clarification is needed to explain the reduction in power consumption resulting from including 1R, as mentioned in lines 118 to 120 on page 8.

Letter and List of Updates
First Demonstration of In-Memory Computing Crossbar using Multi-level Cell FeFET
Nature Communications, <NCOMMS-23-21473A> - Revision

In the following, we provide a point-by-point response to the remaining two questions from reviewer 3. Changes in the manuscript are explicitly mentioned here and highlighted in the revised manuscript in blue. Reference, figure, and other numbers associated with this response letter are preceded by the letter "R".

Summary of changes:
1. We have added further clarifications on the reduction of the cell power and performed a new analysis to demonstrate the effects that the included resistor has on limiting the cell current.
2. We have provided a detailed explanation clarifying the specific method that we employed for measuring the Vth states in the retention measurements as well as a detailed explanation for the involved procedures.

Reviewer 3
In the manuscript, the authors conducted simulations of a highly accurate IMC crossbar macro by minimizing variability effects between cells through the utilization of the 1FeFET-1R cell structure. In order for this manuscript to be considered for publication in Nature Communications, minor revisions are required. The specific details for these revisions are outlined below:

We sincerely thank the reviewer for the effort. In the following, we provide a point-by-point response to the two remaining questions.
R3Q1 - Additional discussion concerning the reduction of power consumption on lines 121 to 124 of page 8 appears necessary. When integrating a high resistance (1R) into a circuit, there is a general expectation that power consumption would increase to achieve comparable operation. If there is a comparison reference where power consumption is reduced more than in any other structure, highlighting this would further clarify the advantages asserted by the authors regarding the 1FeFET-1R concept.
R3A1 - We agree with the reviewer that including a high resistance will increase the power (P = I^2 × R), but only if the current flowing through the circuit remains constant. However, when the voltage is constant, integrating a high resistance reduces the current. In our design, the 1FeFET-1R cell is connected to a constant supply voltage Vdd, which leads to a smaller current. Therefore, the power consumed by the cell is reduced when a high resistance is incorporated (P = V^2/R). In practice, when we connect a resistance to the FeFET device to form our 1FeFET-1R cell, this limits the ON current to only 0.1 µA, instead of 2 µA, which is the ON current in the baseline case where no resistance is connected to the FeFET. This is illustrated in Fig. R3.1, which presents the ID-VG transfer characteristic of a particular stored state of the FeFET when the FeFET is connected to a resistance (dashed line) and not connected to a resistance (solid line). This observation is also aligned with existing works (e.g., [R3.1, R3.2]), which previously demonstrated that including a resistance leads to a lower current. Finally, it is noteworthy that the total power consumption of the entire crossbar array circuit is dominated by the power consumption of the analog-to-digital converters (ADCs). Therefore, reductions in the power of individual cells often do not lead to a noticeable impact on the total power consumption [R3.

Changes in the revised manuscript: We have explained the reduction in power on page 8 at lines 122 to 124. In addition, we included the detailed discussion on the impact of resistance on power in the supplementary Section S1.
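As a rough illustration of the constant-voltage argument in R3A1, the short Python sketch below compares the power drawn by a single cell with and without the series resistor, using the two ON currents quoted in the answer. The supply voltage value and the function name are assumptions made only for this example; the ratio of the two powers does not depend on the chosen Vdd.

```python
# Per-cell ON power at a constant supply voltage, with and without the series resistor.
VDD = 1.0  # volts; hypothetical value, the ratio computed below is independent of it

I_ON_WITHOUT_R = 2e-6   # A, baseline FeFET ON current quoted in R3A1
I_ON_WITH_R = 0.1e-6    # A, 1FeFET-1R ON current quoted in R3A1


def cell_power(vdd, current):
    """Power drawn from a constant-voltage supply: P = V * I."""
    return vdd * current


p_without = cell_power(VDD, I_ON_WITHOUT_R)
p_with = cell_power(VDD, I_ON_WITH_R)
reduction = p_without / p_with

print(f"cell power reduction factor: {reduction:.1f}x")
```

Under these assumptions the cell power scales directly with the ON current, so the reduction in cell power equals the ratio of the two quoted currents.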
R3Q2 - A detailed explanation of the specific method employed for measuring the random Vth state and the target Vth state during retention measurements in Figure R3.2 is necessary. Provide sufficient explanation regarding how these values were derived, elucidating the procedures undertaken.
R3A2 - We apologize for the missing details and explanation regarding the employed method for measuring the different Vth states. In the following, we elucidate the employed method step by step as well as the different procedures employed for measuring the Vth states, programming and verifying the different Vth states as well as measuring the retention of FeFET devices over time.
(1) Structure of the fabricated FeFET-based crossbar array: The measurements are conducted on a fabricated FeFET-based crossbar array with a size of (9 × 7). Hence, the array consists of 9 word-lines (WL) and 7 bit-/source-lines (BL/SL). To form a crossbar, the FeFET devices are "AND" connected, as Fig. R3.2 illustrates. All measurements are performed at the wafer level, and the temperature desired for the experiments is set using a temperature-controlled chuck.
(2) Procedure of Vth measurement: After a write voltage pulse is applied to an FeFET, the device exhibits a certain Vth. Reading out the programmed/stored Vth state is necessary to 1) verify after writing what exact Vth state is stored in the FeFET and 2) later perform the required retention measurements, which quantify how the stored Vth states may drift over time. To extract the Vth of an FeFET, its ID-VG transfer characteristic first needs to be measured. To do that for the FeFETs in a certain row, the corresponding WL voltage for that row is ramped from 0 V to 1.4 V in increments of 100 mV, while applying 100 mV at the BL. The drain current ID is then sampled at every read voltage step VG, with a sampling time of 80 µs, until the full ID-VG curve is obtained. To ensure a reliable current measurement, a settling time of 1 µs is waited at every read voltage step. Finally, the Vth state is extracted from the measured ID-VG curve using the standard constant-current method [R3.5], in which the gate voltage is extracted at a fixed ID current of 100 nA.
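The constant-current extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the toy ID-VG curve, and the choice to interpolate between sweep points in log10(ID) are not taken from the response, which specifies only the sweep range (0 V to 1.4 V in 100 mV steps) and the 100 nA criterion.

```python
import math


def extract_vth(vg, i_d, i_target=100e-9):
    """Constant-current Vth: gate voltage at which I_D first crosses i_target.

    Interpolates between the two bracketing sweep points in log10(I_D).
    The interpolation scheme is an assumption made for this sketch.
    """
    for k in range(1, len(vg)):
        lo, hi = i_d[k - 1], i_d[k]
        if lo < i_target <= hi:
            frac = (math.log10(i_target) - math.log10(lo)) / (
                math.log10(hi) - math.log10(lo)
            )
            return vg[k - 1] + frac * (vg[k] - vg[k - 1])
    return None  # target current is never crossed within the sweep


# Read sweep as described in the response: 0 V to 1.4 V in 100 mV steps (15 points).
vg_sweep = [round(0.1 * k, 1) for k in range(15)]

# Hypothetical subthreshold-like curve: 100 nA at 0.65 V, one decade per 100 mV,
# clipped at 2 µA. Real measured data would replace this list.
id_curve = [min(2e-6, 100e-9 * 10 ** ((v - 0.65) / 0.1)) for v in vg_sweep]

vth = extract_vth(vg_sweep, id_curve)
print(f"extracted Vth = {vth:.3f} V")
```

Because the sweep step is 100 mV, the extracted value depends on how the crossing is interpolated between adjacent points, which is why the sketch states its interpolation choice explicitly.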
(3) FeFET programming procedure: Writing FeFET devices occurs at the row level, in which a specific write voltage is applied to a certain WL while all SL/BL are set to 0 V. To ensure that FeFETs are programmed to a certain targeted Vth state, a "write-verify" scheme is applied as follows. First, a write voltage of 2.1 V is applied for 400 ns (i.e., a write pulse of 2.1 V amplitude and 400 ns pulse width is applied). Then, a period of 2 s is waited to provide sufficient time for any de-trapping within the FeFET. Afterwards, the Vth is measured to verify whether it matches the targeted level. The Vth measurement is performed using the procedure explained in (2). If the measured Vth does not match the target level, then a new write voltage pulse, with 40 mV higher amplitude, is applied. The "write-verify" scheme is repeated until the target Vth is reached. It is important to note that once a specific FeFET device reaches the target Vth, it is changed to the "inhibit condition" to ensure that no disturbance occurs when other FeFET devices are being programmed. The inhibit condition is defined as VBL = VSL = 3.2 V. To avoid disturbing FeFETs that share the inhibited BL/SL along the passive WLs, the passive WLs are raised to VWLp = 1.6 V. The "write-verify" scheme is continued until all 7 FeFETs of the activated/selected WL reach the target Vth state. The same programming procedure is then applied to another row (i.e., WL) to program the FeFET devices there to another target Vth state.
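The "write-verify" loop for a single device can be sketched as follows. The starting amplitude (2.1 V) and the 40 mV increment come from the procedure above; everything else is a labeled assumption for illustration: the matching tolerance, the upper amplitude limit, the direction in which Vth moves with write amplitude, and the toy device model. The pulse, the 2 s wait, and the Vth read are collapsed into one hypothetical callback.

```python
def write_verify(measure_vth_after_write, target_vth, v_start=2.1, v_step=0.04,
                 v_max=3.1, tolerance=0.02):
    """Raise the write amplitude in 40 mV steps until Vth reaches the target.

    measure_vth_after_write(v) stands in for: apply a 400 ns pulse of
    amplitude v, wait 2 s for de-trapping, then read Vth.
    v_max and tolerance are assumptions; the response states neither.
    """
    v_write = v_start
    pulses = 0
    while v_write <= v_max:
        vth = measure_vth_after_write(v_write)
        pulses += 1
        # Assumed match criterion: Vth has dropped to the target within tolerance.
        if vth <= target_vth + tolerance:
            return v_write, vth, pulses
        v_write = round(v_write + v_step, 3)
    raise RuntimeError("target Vth not reached below v_max")


def toy_fefet(v_write):
    """Hypothetical device: Vth falls linearly from 1.5 V as amplitude exceeds 2.1 V."""
    return 1.5 - 1.2 * (v_write - 2.1)


v_final, vth_final, n_pulses = write_verify(toy_fefet, target_vth=0.9)
print(f"reached Vth = {vth_final:.3f} V at {v_final:.2f} V after {n_pulses} pulses")
```

In the array, this loop would run for all 7 devices on the selected WL, with each device switched to the inhibit condition once it reaches its target; that bookkeeping is omitted here to keep the sketch short.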
(4) Retention measurement procedure: The focus of this work is to demonstrate how a 2-bit FeFET can be employed to build an in-memory computing crossbar. To realize a 2-bit FeFET, four different Vth states should be reliably stored. Therefore, we perform retention measurements to quantify how the stored Vth states may drift over time. To this end, four different Vth states (1.2 V, 0.9 V, 0.6 V and 0.3 V) are targeted to be programmed in the FeFET devices. First, the temperature desired for the experiment (85 °C) is set using the temperature-controlled chuck, and then a sufficient time is waited to ensure thermal stability. Then, four rows in the crossbar array are selected and the 7 FeFET devices in each row are programmed to a certain Vth state (1.2 V, 0.9 V, 0.6 V, 0.3 V). The FeFET programming procedure along with the "write-verify" scheme is explained in (3). Afterwards, the Vth states stored in the 7 FeFETs across one row are measured in parallel. The Vth measurement procedure is explained in (2). The Vth measurements are then repeated with logarithmic time steps for 10^5 seconds, which is approximately one day. The obtained measurements provide the necessary information about the retention behaviour of the FeFET devices (i.e., the drift of Vth over time). Fig. R3.4 presents the retention measurements of the four targeted Vth states.
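For illustration, a logarithmic read schedule of the kind described above could be generated as in the following Python sketch. The first read time (1 s) and the number of reads per decade are assumptions; the response states only that the steps are logarithmic and that the measurement runs for 10^5 seconds.

```python
import math


def log_time_points(t_start=1.0, t_end=1e5, points_per_decade=2):
    """Log-spaced read times in seconds from t_start to t_end inclusive.

    t_start and points_per_decade are assumed values for this sketch.
    """
    decades = math.log10(t_end / t_start)
    n = round(decades * points_per_decade)
    return [t_start * 10 ** (k / points_per_decade) for k in range(n + 1)]


read_times = log_time_points()
total_days = read_times[-1] / 86400  # 86400 seconds per day

print(f"{len(read_times)} reads, last at {read_times[-1]:.0f} s ({total_days:.2f} days)")
```

The conversion shows that 10^5 seconds corresponds to roughly 1.16 days, consistent with the "approximately one day" stated in the procedure.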
It is noteworthy that without applying a "write-verify" scheme, the programmed Vth states will tend to be random due to the effects of variability. Fig. R3.5 shows the retention measurements for 9 different Vth states that were programmed without applying the "write-verify" scheme. In this experiment, 9 different write voltages are applied to the 9 rows (i.e., WLs) in the crossbar array. The write voltages are selected to cover the entire switching range of the FeFETs; they start from 2.3 V and go up to 3.1 V with an incremental step of 100 mV. In this experiment, we employ the same Vth measurement procedure explained in (2) and the same programming procedure explained in (3), but without the "write-verify" scheme.
Changes in the revised manuscript: We have included the above detailed explanation and procedures in the supplementary Section S3.

Figure R3.1: The effect of including a resistor on reducing the cell current from over 2 µA to only

Figure R3.2: Structure of the AND-connected FeFET-based crossbar array (9 × 7) with 9 word-line
Figure R3.3: Overview flowchart of the employed procedure to program FeFETs into a certain
Figure R3.4: Retention measurements for the four stored Vth states. The "write-verify" scheme is
Figure R3.5: Retention measurements for 9 different Vth states, which are random because the no