Embedding security into ferroelectric FET array via in situ memory operation

Non-volatile memories (NVMs) have the potential to reshape next-generation memory systems because of their promising properties of near-zero leakage power consumption, high density, and non-volatility. However, NVMs also face critical security threats that exploit their non-volatile nature: compared to volatile memory, the capability of retaining data even after power-down makes NVM more vulnerable. Existing solutions to the security issues of NVMs are mainly based on the Advanced Encryption Standard (AES), which incurs significant performance and power overhead. In this paper, we propose a lightweight memory encryption/decryption scheme that exploits in-situ memory operations with negligible overhead. To validate the feasibility of the scheme, device-level and array-level experiments are performed using the ferroelectric field effect transistor (FeFET) as an example NVM, without loss of generality. In addition, a comprehensive evaluation is performed on a 128 × 128 FeFET AND-type memory array in terms of area, latency, power, and throughput. Compared with the AES-based scheme, our scheme shows a ~22.6×/~14.1× increase in encryption/decryption throughput with negligible power penalty. Furthermore, we evaluate our scheme against the AES-based scheme when deploying different neural network workloads; it yields a significant latency reduction of 90% on average for the encryption and decryption processes.


Introduction
The proliferation of smart edge devices has led to a massive influx of data, necessitating high-capacity and energy-efficient memory solutions for storage and processing. Traditional volatile memories, such as static random access memory (SRAM) and dynamic RAM (DRAM), struggle to meet these demands due to their significant leakage power and low density 1 . To address this issue, high-density NVMs, such as mainstream vertical NAND flash, have become the cornerstone of modern massive information storage. NVM offers non-volatility, zero leakage power consumption, and high density when integrated in dense 3D form 2 . Various emerging NVM technologies are being pursued for different levels of the memory hierarchy, e.g., as storage-class memory or even as on-chip last-level cache, including 3D XPoint based on phase change memory (PCM) 3 , sequential or vertical 3D resistive memory, and back-end-of-line ferroelectric memory. Beyond simple data storage, NVM is playing an increasingly important role in data-centric computing, particularly in the compute-in-memory (CiM) paradigm. Within this paradigm, computation takes place in the analog domain within the memory array, eliminating the energy and latency associated with data transfer in conventional computing hardware. This has the potential to pave the way for sustainable data-intensive applications, particularly in the field of artificial intelligence, which is rapidly advancing with exponentially growing models. Hence, NVM will be a crucial electronic component for ensuring sustainable computing in the future. However, the non-volatility of NVM also brings new security challenges and concerns 4,5 that were absent in conventional volatile memories. One major threat arises when an NVM device is stolen or lost: malicious attackers may exploit the unique properties of NVM to gain unauthorized access through low-cost tampering and then easily extract all the sensitive information stored in the device, such as users' passwords and credit card numbers. This is known as the "stolen memory attack".
Compared to volatile memory such as SRAM, which is considered safe because its data is lost after power-down, NVM retains data indefinitely, leaving it vulnerable after the system is powered down, as shown in Fig. 1(d). Moreover, with the increasing demand for intensive computation and ever-larger data capacity, replacing parts of storage systems with NVMs increases the incentive to attack the system and exposes more data. Hence, the security vulnerability of NVM has become a critical issue for information-sensitive systems.
To address the above issue and ensure data security in modern NVM systems, data encryption is the most common approach, and AES is the most widely used cryptographic algorithm 6 . AES is a symmetric block cipher comprising two processes, encryption and decryption, which convert the plaintext (PT) to the ciphertext (CT) and back using 128-, 192-, or 256-bit keys. Because of the high security and computational efficiency it provides, the AES algorithm has prompted many researchers to actively explore hardware implementations and applications in a wide range of fields, such as wireless communication 7 and financial transactions 8 . In addition, a variety of AES-based encryption techniques have been proposed to address the aforementioned NVM security issues. However, AES encryption and decryption incur significant performance and energy costs due to the extra complexity added to read and write operations, as shown in Fig. 1(e). An incremental encryption scheme called i-NVMM was proposed to reduce the latency overhead 9 , in which different data in the NVM are encrypted at different times depending on which data are predicted to be useful to the processor. By encrypting incrementally, i-NVMM keeps the majority of memory encrypted while incurring affordable encryption overheads. However, i-NVMM relies on a dedicated AES engine with limited bandwidth. Other prior works have proposed near-memory and in-memory encryption techniques to address the performance issues. For instance, AIM, an AES in-memory implementation, supports an in-memory AES engine that provides bulk encryption of data blocks in NVMs for mobile devices 10 .
In AIM, encryption is executed only when necessary; by leveraging the in-memory computing architecture, AIM achieves high encryption efficiency, but its bulk encryption cannot support fine-grained protection. In summary, prior AES-based encryption schemes fail to address the aforementioned NVM security issues at negligible cost. Our effort aims to break the dilemma between encryption/decryption performance and cost by finding a satisfactory solution to the security vulnerability issue.
As illustrated in Fig. 1(f), we propose a memory encryption/decryption scheme that exploits intrinsic memory array operations without incurring complex encryption/decryption circuitry overhead. The idea is to use intrinsic memory array operations to implement a lightweight encryption/decryption primitive, i.e., a bit-wise XOR between the secret key and the plaintext/ciphertext, respectively. In this way, the ciphertext is written into memory through normal memory write operations, and the data remain secure unless the correct key, which attackers do not possess, is provided during the memory sensing operation. This work demonstrates the proposed encryption/decryption operation in FeFET memories; the approach can potentially be extended to other NVM technologies.
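The bit-wise XOR primitive described above can be sketched in a few lines of Python (a behavioral illustration only; the key width and data values are hypothetical, and the real scheme performs the XOR implicitly through memory write/sense operations):

```python
def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """CT = PT XOR key (key repeated to cover the plaintext length)."""
    return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse: PT = CT XOR key."""
    return encrypt(ciphertext, key)

pt = b"secret weights"
key = b"\x5a\xc3"  # hypothetical short key block
ct = encrypt(pt, key)
assert ct != pt                        # stored data is not the plaintext
assert decrypt(ct, key) == pt          # the correct key recovers the PT
assert decrypt(ct, b"\x00\x00") == ct  # an all-0 key just returns the CT
```

Because XOR is involutive, the same operation serves both directions; only the key, never the ciphertext, has to be kept secret.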
Ferroelectric HfO 2 has revived interest in ferroelectric memory for its scalability, CMOS compatibility, and energy efficiency. By inserting a ferroelectric layer into the gate stack of a MOSFET, a FeFET is realized whose threshold voltage (V_TH) can be programmed to the low-V_TH (LVT) state or high-V_TH (HVT) state by applying positive or negative write pulses on the gate, respectively. In this work, through co-design at the technology, circuit, and architecture levels, the proposed encryption/decryption scheme removes the vulnerability window and achieves secure encryption in FeFET-based NVM. Moreover, since no complicated encryption/decryption engine (e.g., an AES engine) is needed in the peripheral circuitry of our architecture, our design avoids the latency/power/area costs of AES-based encryption designs by adding only lightweight logic gates, which dramatically improves memory performance and expands the range of potential applications.
With the proposed memory encryption/decryption scheme integrated into the FeFET memory array, many NVM-targeted attacks can be prevented. For example, if the memory device is stolen or lost, our design effectively protects it against the stolen memory attack: without the correct secret keys, the attacker has no knowledge of what the data represent, even if they can physically access and read out the stored ciphertext (Fig. 1(a)). Besides, with negligible overhead compared to a normal memory, the proposed design can benefit a wide range of applications that exploit the added security feature without compromising performance. For instance, as shown in Fig. 1(b), NVM arrays can be used to accelerate the prevalent operation in deep neural networks, i.e., matrix-vector multiplication (MVM), in memory. By storing the trained neural network weights as, for example, NVM conductances, the intended MVM operation is naturally conducted in the analog domain by applying the inputs as voltage pulses and summing the resulting array column currents. As artificial intelligence makes significant strides in various application domains, especially information-sensitive sectors, protecting these trained weights from malicious entities becomes an essential problem 11,12 . Many relevant works have demonstrated that data encryption embedded in CiM enables in-situ authentication and computation with high area and energy efficiency 13,14 . Compared to existing AES-based encryption designs, which introduce significant delay, our design can efficiently encrypt and decrypt all the weights in-situ and perform CiM computation directly on the encrypted weights, ensuring high security and privacy. Another application example is secure encrypted virtualization (SEV) 15 . SEV systems require keys to isolate guests and the host OS/hypervisor from one another in order to ensure data security in the system hardware.
However, present SEV systems use AES engines for encryption. Replacing the AES engines with our design improves system performance in terms of latency.

Overview of the proposed memory encryption/decryption scheme
In the proposed scheme, two FeFETs are coupled as one cell representing one bit of information, bit '1' or bit '0'. During the encryption process, the CT is first determined by XORing the PT with the corresponding key. Depending on the CT, complementary states are programmed into the 2FeFET cell: if the CT is '0', the upper FeFET is set to the LVT state and the bottom FeFET to the HVT state; if the CT is '1', the upper FeFET is set to the HVT state and the bottom FeFET to the LVT state. In the decryption process, read voltages (V_R/0 V) are applied to the gate terminals of the FeFETs, but the voltage pattern differs from that of encryption: the pattern (V_R/0 or 0/V_R) depends only on the key of that cell. More specifically, if the key = 1, V_R is applied to the gate of the upper FeFET in the memory cell and 0 V to the other FeFET; if the key = 0, V_R is asserted on the bottom FeFET instead. In this way, the original data (PT) can be read out through current sensing only when the user supplies the correct key. Unauthorized users/attackers, even with physical access to read out the current of each memory cell, cannot tell whether the information they read is correct, since they do not know the correct key for each block. Therefore, the FeFET memory is protected from information leakage and achieves intrinsic security without extra circuit cost.
Moreover, the proposed in-situ memory encryption/decryption scheme is not limited to AND arrays. We also demonstrate its feasibility in other array structures, such as the FeFET NAND array, which provides potentially higher integration density (Supplementary Fig. S2), and the FeFET NOR array (Supplementary Fig. S3). Both show that the proposed scheme is general and fits into different memory designs.
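The cell-level encoding and key-dependent read can be captured in a small behavioral model (a Python sketch, not device physics; 'LVT'/'HVT' stand in for the two V_TH states, and "reading '1'" abstracts a high sensed current when V_R is applied to an LVT device):

```python
LVT, HVT = "LVT", "HVT"

def program_cell(pt_bit: int, key_bit: int):
    """Encryption: store CT = PT XOR key as complementary V_TH states.
    CT = 0 -> (top, bottom) = (LVT, HVT); CT = 1 -> (HVT, LVT)."""
    ct = pt_bit ^ key_bit
    return (LVT, HVT) if ct == 0 else (HVT, LVT)

def read_cell(cell, key_bit: int) -> int:
    """Decryption: the key selects which gate receives V_R
    (key = 1 -> top FeFET, key = 0 -> bottom FeFET).
    A high sensed current (V_R on an LVT device) reads as bit '1'."""
    top, bottom = cell
    probed = top if key_bit == 1 else bottom
    return 1 if probed == LVT else 0

# Exhaustive check of the truth table:
for pt in (0, 1):
    for key in (0, 1):
        assert read_cell(program_cell(pt, key), key) == pt          # correct key -> PT
        assert read_cell(program_cell(pt, key), 1 - key) == 1 - pt  # wrong key flips the bit
```

The final loop confirms the involutive property at cell level: sensing with the correct key always recovers the PT, while sensing with the complementary key always yields the inverted bit.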

Experimental Verification
In this section, functional verification of the encryption/decryption operations on a single cell and on a memory array is demonstrated. For the experimental measurements, FeFET devices integrated on the 28 nm high-κ metal gate (HKMG) technology platform are tested 16 . With a CT of '0', the top/bottom FeFET is programmed to the LVT/HVT state using a +4 V/-4 V, 1 µs write gate pulse, respectively. The decryption process then simply corresponds to a conventional array sensing operation, but with key-dependent read voltages on the two FeFETs (i.e., dashed lines in Fig. 3(c) and (e)). For example, with a key of '1', the top/bottom FeFETs are biased at V_R (i.e., 0.6 V)/0 V, respectively. In this case, the top FeFET contributes a high read current, corresponding to a PT of bit '1'. If the key is bit '0', the read biases for the two FeFETs are swapped such that the top/bottom FeFETs receive 0 V/V_R, respectively; both FeFETs are then cut off, corresponding to a PT of bit '0'. Successful decryption is also demonstrated for a CT of bit '1', as shown in Fig. 3(d) and (f), where the top/bottom FeFETs are programmed to the HVT/LVT state, respectively, and the same key-dependent read biases are applied. These results demonstrate successful single-cell encryption/decryption using only in-situ memory operations.
Array-level experiments and functional verification are also performed. Without loss of generality, a FeFET AND array is adopted. Fig. 3(g) illustrates the 8×7 FeFET AND memory array used for measurements. As illustrated in Fig. 3(h), a checkerboard data pattern of PT (orange boxes represent data '1'; blue boxes represent data '0') and the random keys shown in Fig. 3(i) are used. To show the most general case, bit-wise encryption/decryption is validated, as encryption at a coarser granularity, i.e., row-wise or block-wise, is simply a derivative of the bit-wise case. With the PT and keys determined, the CT is simply the XOR of the PT and the corresponding keys, as shown in Fig. 3(j). Each CT bit is then stored as the complementary V_TH states of the two FeFETs in each cell. Different write schemes along with disturb-inhibition strategies can be applied 17 . In this work, a block-wise erase is performed first by raising the body potential to reset the whole array to the HVT state, and then the corresponding FeFETs are selectively programmed into the LVT state. Fig. 3(k) shows the V_TH map of the 8×7 FeFETs in the array after the encryption process, corresponding to the 4×7 encrypted CT.
For the decryption process, three scenarios are considered: using the correct keys, all-0 keys, and random keys. For bit-wise encryption/decryption in the AND array, since all FeFETs in the same row share the same word line, two read cycles are required to sense a whole row, because the key-dependent read biases differ for key bits '1' and '0'. Cycles 1 and 2 read out the cells with key bits '1' and '0', respectively; the cycle-1 results are temporarily buffered and merged with the cycle-2 results. Note that this additional latency can be avoided with row-wise or block-wise encryption granularity, where the same word line bias applies across a row. As shown in Fig. 3(l), with the correct keys the user successfully reads out all the PT. For attackers without knowledge of the keys, two representative scenarios are considered: applying all-0 keys or random keys. With all-0 keys, the accuracy is only 50%, as shown in Fig. 3(m). With random keys, the decryption accuracy drops further, to 32.1%. Altogether, both the functional correctness of the proposed encryption design and its resistance against attacks are verified at the cell and array levels.
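The array-level experiment can be mimicked with a toy simulation (a sketch assuming ideal cells; the 4×7 cell grid mirrors the 8×7 FeFET array with two FeFETs per cell, and the exact wrong-key accuracies depend on the particular keys drawn, so the measured 50%/32.1% figures are not reproduced here):

```python
import random

random.seed(0)
ROWS, COLS = 4, 7  # 4x7 cells, i.e. an 8x7 FeFET array (two FeFETs per cell)

pt = [[(r + c) % 2 for c in range(COLS)] for r in range(ROWS)]   # checkerboard PT
keys = [[random.randint(0, 1) for _ in range(COLS)] for _ in range(ROWS)]
ct = [[pt[r][c] ^ keys[r][c] for c in range(COLS)] for r in range(ROWS)]  # stored CT

def decrypt_with(guess):
    # Sensing with a guessed key is equivalent to CT XOR guess, per cell.
    return [[ct[r][c] ^ guess[r][c] for c in range(COLS)] for r in range(ROWS)]

def accuracy(readout):
    hits = sum(readout[r][c] == pt[r][c] for r in range(ROWS) for c in range(COLS))
    return hits / (ROWS * COLS)

all0 = [[0] * COLS for _ in range(ROWS)]
rand_guess = [[random.randint(0, 1) for _ in range(COLS)] for _ in range(ROWS)]

assert accuracy(decrypt_with(keys)) == 1.0  # correct keys recover all PT
print(accuracy(decrypt_with(all0)))         # only cells whose key bit is 0 read correctly (~0.5)
print(accuracy(decrypt_with(rand_guess)))   # ~0.5 in expectation for a random guess
```

With all-0 keys the read-out is simply the stored CT, so the accuracy equals the fraction of key bits that happen to be 0, which is why a roughly 50% accuracy is expected for random keys.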

Evaluation and case study
To evaluate the feasibility and performance of the proposed in-situ memory encryption/decryption scheme using FeFET memory arrays, a comprehensive comparison is performed between this work and an AES-based encryption scheme 18 in terms of area, latency, power, and throughput. For a fair comparison, a 128×128 FeFET AND-type array is designed on the 28 nm HKMG platform and operates at 25 MHz, consistent with the reference AES work 18 . This speed is a pessimistic estimate for the FeFET array encryption/decryption operation, which can run faster. In addition, 16 sense amplifiers (SAs) are used for memory sensing as an illustration; more SAs can be deployed if higher sensing throughput is needed. As summarized in Fig. 4(a), the area cost of the AES unit in the prior AES-based work is 0.00309 mm². In the proposed scheme, the only functional gates required are XOR gates, whose area is negligible compared to the whole memory array. Latency is one of the most important criteria for evaluating encryption methods. In the proposed design, the encryption and decryption latencies for 128-bit data are 5 cycles and 16 cycles, respectively, far below those of the AES accelerator (115.5 and 117 cycles). Note that the decryption latency can be reduced further if more SAs are used for sensing. Moreover, at 25 MHz, throughputs of 640/400 Mbps are obtained for the encryption/decryption processes, much higher than that of the AES accelerator (28.32 Mbps). Since the power consumption of our encryption circuit equals that of a handful of XOR gates, it is negligible compared to the AES accelerator (0.031 mW).
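The reported throughput figures can be cross-checked with simple arithmetic (using only the clock rate, block size, and encryption cycle count quoted above; the decryption and AES throughputs are taken as reported rather than re-derived):

```python
F_CLK = 25e6  # Hz, array and AES accelerator clock

def throughput_mbps(bits: int, cycles: float) -> float:
    """Throughput in Mbps for a block of `bits` processed in `cycles` clock cycles."""
    return bits * F_CLK / cycles / 1e6

enc_mbps = throughput_mbps(128, 5)  # 128 bits encrypted in 5 cycles
dec_mbps = 400.0                    # reported decryption throughput
aes_mbps = 28.32                    # reported AES accelerator throughput

print(round(enc_mbps), round(enc_mbps / aes_mbps, 1), round(dec_mbps / aes_mbps, 1))
# -> 640 22.6 14.1
```

The computed encryption throughput (640 Mbps) and the speedups (~22.6× and ~14.1× over the AES accelerator) agree with the figures quoted in the abstract.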
In addition, to investigate the latency benefit of the proposed scheme over the conventional AES scheme when encrypting and decrypting data for different neural network (NN) workloads, a case study is performed on six NN workloads, AlexNet, MobileNet, Faster R-CNN, GoogLeNet, ResNet18, and YOLO-tiny, via SCALE-Sim 19 , a simulator for evaluating convolutional neural network (CNN) accelerators. In this case study, we specifically consider the scenario in which all the workloads are mapped onto a systolic array for processing (a Google TPU in this case).
The encrypted weights of each neural network are pre-loaded into the FeFET-based memory arrays and fed to the systolic array after decryption. After the computation, the outputs are read out and securely stored into the FeFET memory with encryption. As shown in Fig. 4(b), the latency introduced by the encryption and decryption processes of the proposed scheme is much lower than that of the AES-based scheme; the average latency reduction over the six workloads is ∼90%. The simulation results show that the proposed in-situ memory encryption/decryption scheme offers significant time savings over the conventional AES scheme, especially for data-intensive applications such as neural networks.

Conclusion
In summary, we propose an in-situ memory encryption/decryption scheme that guarantees a high level of security by exploiting intrinsic memory array operations while incurring negligible overhead. The functionality of the proposed scheme is verified through experiments at both the device and array levels. The evaluation results show that our scheme greatly improves encryption/decryption speed and throughput with negligible power cost at the system level. Furthermore, an application-level case study shows that our scheme achieves a 90% latency reduction on average compared to the prior AES-based accelerator.
The test structures, present on 300 mm wafers, are connected to the measurement setup through a semi-automatic probe station, facilitated by a probe card.

Figure 1: Motivation and potential applications of memory encryption.

Figure 2: Details of the proposed scheme in the FeFET array, shown at different granularities and levels.

Figure 3: Experimental verification. (a-b) TEM and schematic cross section. (c-f) I_D-V_G characteristics.

Figure 4: Evaluation results. (a) Comparison with the AES-based encryption scheme 18 . AES-based scheme: 28 nm CMOS process; our scheme: FeFET embedded in the 28 nm HKMG platform. *: area for the AES engine / XOR gates only. a: 16 cycles (8 cycles for one FeFET pair and 8 cycles for the other) are needed to read 128 bits with 16 SAs; latency can be reduced with more SAs. (b) Latency comparison.