Distributed power analysis attack on SM4 encryption chip

Encryption chips are specialized integrated circuits that implement encryption algorithms for data encryption and decryption, ensuring data confidentiality and security. In China, the domestic SM4 algorithm is commonly used instead of the international AES algorithm. These widely deployed encryption standards have proven difficult to break through cryptanalysis. Currently, power consumption side-channel attacks are the most prevalent method. They capture power consumption data during the encryption process and then recover the encryption key from this data. The two leading methods are Differential Power Analysis (DPA) and machine learning techniques. DPA does not require prior knowledge but relies heavily on the number of power traces: with only 50 power traces, the accuracy is a mere 80%. Machine learning methods require prior knowledge and achieve an accuracy above 95% with only 30 power traces, albeit with training times typically exceeding 15 min. In this paper, a distributed power analysis attack based on Correlation Power Analysis (CPA) is presented. The power consumption data is divided into 16 subsets, each corresponding to 8 bits of the key. By training on each subset separately, the power consumption data for each 8-bit key segment is reduced to only 100 dimensions, resulting in a 76% decrease in cracking time and a 3% improvement in cracking accuracy. This article also trains a more complex 256-class classification model to crack the final key directly, achieving a success rate of 28% in cracking 128-bit keys with only 1 power trace.

Encryption chips play a significant role in today's society, in areas such as financial payments, smart cards, and mobile devices. For example, many electronic devices such as bank cards, smartphones, and electronic passports use secure chips to store and protect sensitive information. However, despite employing various encryption algorithms to safeguard data, they are not always secure in practice. The compromise of an encryption chip can cause financial losses, personal privacy breaches, and security vulnerabilities.
Side-channel attacks and power analysis attacks have become hot topics in the field of encryption chip attacks. A side-channel attack recovers keys by monitoring the electromagnetic radiation, power consumption, or other physical characteristics generated by an encryption chip. However, side-channel attacks are not always feasible, because they require significant computation and analysis as well as physical access to the encryption chip hardware. With technological advances, power analysis attacks have also become a potent weapon for breaking encryption chips. This attack method has developed rapidly over the past few years, and many related research findings have been published in international journals and conferences.
In 1999, Kocher proposed the power side-channel attack method [1], which provided an alternative to mathematical analysis for attacking encryption. This method exploits the relationship between encryption hardware and the data being encrypted, using physical information generated during hardware encryption, such as power consumption, electromagnetic radiation, and timing, to break encryption algorithms [2]. Power analysis attacks have since been widely applied and developed. For example, Chari et al. and Mangard et al. proposed attack methods based on power and electromagnetic radiation analysis, referred to as 'Differential Power Analysis' and 'Differential Electromagnetic Analysis', respectively. Additionally, with the rapid development of machine learning, especially deep learning [3], a growing number of researchers are applying machine learning to side-channel attacks, where its effectiveness far surpasses traditional analytical methods. Backs et al. [4] applied machine learning to acoustic side-channel attacks on printers, Hospodar et al. [5] classified intermediate values in template attacks using least squares support vector machines, Lerman et al. [6] used algorithms such as random forests, support vector machines, and self-organizing maps for side-channel analysis, and Heuser et al. [7] employed multi-class support vector machines to attack multi-valued models (Hamming weight models).

SM4 encryption algorithm and hardware implementation

SM4 encryption algorithm
The SM4 algorithm uses a 128-bit key and a block length of 128 bits. Its processes include round key addition, S-box substitution, linear transformation, key expansion, and inverse operations.
The first step is key expansion (Fig. 1), which extends the key MK into 32 round keys $rk_i$. The second step is plaintext encryption (Fig. 2), which processes the plaintext with these round keys. Over the entire encryption process, the key MK, plaintext PT, and ciphertext CT are all 128 bits, and $MK_i$, $X_i$, and $rk_i$ are all 32 bits.
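For reference, the key expansion and the round function can be written as follows. This is a sketch based on the published SM4 standard, not a recovery of the equations originally typeset in this section. The symbols $T$, $T'$, $\tau$, $L$, $L'$ and the ciphertext word order are taken from the standard and are assumptions with respect to this text.

```latex
\begin{aligned}
MK &= (MK_0, MK_1, MK_2, MK_3), \qquad PT = (X_0, X_1, X_2, X_3) \\
(K_0, K_1, K_2, K_3) &= (MK_0 \oplus FK_0,\; MK_1 \oplus FK_1,\; MK_2 \oplus FK_2,\; MK_3 \oplus FK_3) \\
rk_i = K_{i+4} &= K_i \oplus T'(K_{i+1} \oplus K_{i+2} \oplus K_{i+3} \oplus CK_i), \qquad i = 0, \dots, 31 \\
X_{i+4} &= X_i \oplus T(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus rk_i), \qquad i = 0, \dots, 31 \\
CT &= (X_{35}, X_{34}, X_{33}, X_{32}) \\
T &= L \circ \tau, \qquad L(B) = B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24) \\
T' &= L' \circ \tau, \qquad L'(B) = B \oplus (B \lll 13) \oplus (B \lll 23)
\end{aligned}
```

Here $\tau$ applies the 8-bit S-box to each of the four bytes of a 32-bit word, and $\lll$ denotes a left rotation of a 32-bit word.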

Hardware implementation of SM4 encryption
The chip chosen for this project is the Atmel Xmega-128D4, and the target board used is the ChipWhisperer CW308 UFO board.
ChipWhisperer is a side-channel analysis toolchain from NewAE Technology, a company specializing in side-channel attack tools and training. The CW308 is one of its target boards for side-channel attack experiments.
The CW308 target board features a 50 MHz XMEGA microcontroller and provides various peripheral interfaces, such as a high-speed ADC (12-bit, 105 MSPS), a programmable clock (100-500 MHz), and a USB interface for connecting to the host computer. This board can be used together with the ChipWhisperer Lite or Pro USB analyzer for side-channel analysis of and attacks on embedded systems.
The hardware implementation steps for the SM4 encryption algorithm are as follows:
(1) Microchip Studio is selected as the development environment, and the C++ code for SM4 encryption is written.
(2) The project's main program, serial communication program, and driver programs are completed and compiled.
(3) The generated HEX file is burned into the Xmega128D4 chip.

Principle of power traces collection
The CPU exhibits different power consumption characteristics when executing different instructions. This is because different instructions switch different numbers of transistors, and some instructions also access memory, cache, and so on. Complex instructions may require more clock cycles than others. Together, these factors produce distinct power consumption patterns during instruction execution. Through its power measurement interface, ChipWhisperer detects voltage variations on the VCC power line of the target chip. The greater the voltage drop, the higher the CPU power consumption at that moment. By accurately sampling these variations, we can plot the changes in CPU power consumption over time. This allows us to identify signal features that leak information about CPU operations and subsequently exploit them.

The settings for power consumption data acquisition
In this study, the number of sampling points is set to 24,400, with an analog-to-digital converter offset of 1250 and triggering on the rising edge. Two sets of power traces are collected:
(1) 500 power traces with a fixed key and random plaintexts, used as the traces to be cracked.
(2) 10,000 power traces with random keys and random plaintexts, used for training in the machine learning methods.
Figure 4 shows an example of a power trace.
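The acquisition settings above could be applied roughly as in the following sketch. It assumes the ChipWhisperer Python API, where the scope object exposes `adc.samples`, `adc.offset`, and `adc.basic_mode`. The helper `configure_adc` and the stand-in `fake_scope` object are hypothetical and only let the sketch run without hardware attached.

```python
from types import SimpleNamespace

# Acquisition settings stated in the text.
NUM_SAMPLES = 24_400          # sampling points per trace
ADC_OFFSET = 1_250            # samples to wait after the trigger before recording
TRIGGER_MODE = "rising_edge"  # trigger on the rising edge


def configure_adc(scope):
    """Apply the capture settings to a ChipWhisperer-style scope object."""
    scope.adc.samples = NUM_SAMPLES
    scope.adc.offset = ADC_OFFSET
    scope.adc.basic_mode = TRIGGER_MODE
    return scope


# Hypothetical stand-in for a connected scope, so the sketch runs offline.
fake_scope = SimpleNamespace(adc=SimpleNamespace())
configure_adc(fake_scope)
print(fake_scope.adc.samples, fake_scope.adc.offset, fake_scope.adc.basic_mode)
```

With real hardware, the object passed to `configure_adc` would be the connected scope rather than the stand-in.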

Power analysis attack
Correlation power analysis attack (CPA)

Attack principle
The effectiveness of a side-channel attack depends on the choice of attack point within the cryptographic algorithm and on the corresponding power model.
In the SM4 encryption algorithm, the input of each round is XORed with the round key for that round and then passed through an S-box. The S-box is a non-linear transformation that consumes significantly more power than the linear transformations. Choosing the S-box output as the intermediate value for the power analysis attack therefore makes it easier to break the key. This value, denoted $V^i_{atk}$, can be represented as follows:
$V^i_{atk} = HW(Sbox(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus rk_i))$
where HW() converts a number into its Hamming weight.

Cracking of the round key
Starting with $rk_0[0]$, the round key is cracked 8 bits at a time:
(1) With n power traces captured under the same key and different plaintexts, compute the correlation coefficient between the power trace samples and the Hamming weight of the intermediate value. $P_{jk}$ represents the value of the k-th sample point of the j-th power trace, and $P_k$ represents the vector of all power traces at sample point k:
$P_k = (P_{1k}, P_{2k}, \dots, P_{nk})$
(2) For each power trace, calculate the hypothetical intermediate value V for every key guess m ∈ [0, 256):
$V_{rk_0[0]=m} = (V_1, V_2, \dots, V_n), \quad V_j = HW(Sbox((X_1 \oplus X_2 \oplus X_3)[0] \oplus m))$
where $X_1$, $X_2$, $X_3$ are 32-bit words of $PT_j$, the plaintext of the j-th power trace.
$P_k$ and $V_{rk_0[0]=m}$ are both n-dimensional vectors, and their correlation coefficient is calculated as:
$\rho(P_k, V_{rk_0[0]=m}) = \frac{\mathrm{cov}(P_k, V_{rk_0[0]=m})}{\sigma_{P_k} \, \sigma_{V_{rk_0[0]=m}}}$
Iterate over all sample points k ∈ [0, 24400) and all guesses m ∈ [0, 256), searching for the maximum point:
$(key, pos) = \arg\max_{m,k} \rho(P_k, V_{rk_0[0]=m})$
Take key as the cracked value of $rk_0[0]$; it appears at the pos-th sample point of the power trace. Use this method to crack $rk_0[1]$, $rk_0[2]$, and $rk_0[3]$ in turn and obtain the complete $rk_0$.
Equation (17) is used to crack $rk_1$. In total, 4 rounds of cracking are performed to obtain $rk_0$, $rk_1$, $rk_2$, and $rk_3$, from which the original key is reconstructed.
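The search over key guesses and sample points can be illustrated on simulated data. The following is a minimal sketch under stated assumptions, not the attack code used in this work: the traces are synthetic (Gaussian noise plus Hamming-weight leakage at one position), `SBOX` is a random stand-in permutation rather than the real SM4 S-box, and the input byte `x` plays the role of the known byte that is XORed with the round key byte. The absolute value of the correlation is used so that the result does not depend on signal polarity, which is also an assumption.

```python
import random

rng = random.Random(1)

# Stand-in 8-bit S-box (a random permutation), NOT the real SM4 S-box.
SBOX = list(range(256))
rng.shuffle(SBOX)


def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def simulate_traces(key_byte, n_traces, n_samples, leak_pos, noise=1.0):
    """Return (inputs, traces): Gaussian noise plus HW leakage at leak_pos."""
    inputs, traces = [], []
    for _ in range(n_traces):
        x = rng.randrange(256)
        trace = [rng.gauss(0.0, noise) for _ in range(n_samples)]
        trace[leak_pos] += hw(SBOX[x ^ key_byte])
        inputs.append(x)
        traces.append(trace)
    return inputs, traces


def cpa_byte(inputs, traces):
    """Search all guesses m and sample points k for the largest |rho|."""
    columns = list(zip(*traces))      # columns[k] is the vector P_k
    best = (0.0, None, None)          # (|rho|, key guess, sample position)
    for m in range(256):
        v = [hw(SBOX[x ^ m]) for x in inputs]  # hypothetical V for guess m
        for k, p_k in enumerate(columns):
            rho = abs(pearson(p_k, v))
            if rho > best[0]:
                best = (rho, m, k)
    return best


inputs, traces = simulate_traces(key_byte=0x3C, n_traces=150, n_samples=12, leak_pos=7)
rho, key, pos = cpa_byte(inputs, traces)
print(f"key=0x{key:02X} pos={pos} rho={rho:.2f}")
```

On real traces the same loop runs over all 24,400 sample points, so a vectorized implementation would be preferable in practice.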

Recovery of key
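Given $rk_0$, $rk_1$, $rk_2$, $rk_3$, with $CK_i$ as fixed parameters and $L'$ representing a fixed linear transformation, the key expansion is run backwards. Since $rk_i = K_{i+4}$, the words $K_4, \dots, K_7$ are known, and for i = 3, 2, 1, 0 the formula yields:
$K_i = rk_i \oplus L'(Sbox(K_{i+1} \oplus K_{i+2} \oplus K_{i+3} \oplus CK_i))$
Subsequently, the SM4 key MK is reconstructed from $K_0$, $K_1$, $K_2$, $K_3$ and the fixed parameters $FK_i$:
$MK_i = K_i \oplus FK_i, \quad i = 0, 1, 2, 3$
At this point, the initial key is fully recovered (Fig. 5).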
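The recovery of MK from the first four round keys can be sketched as follows. This is a minimal illustration under stated assumptions: the constants `FK` and the rule for `CK_i` follow the published SM4 standard, while the S-box is passed in as a parameter and the demo uses a random stand-in permutation instead of the real SM4 table. The function names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF
# System parameters FK_0..FK_3 as given in the SM4 standard.
FK = (0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC)


def ck(i):
    """Fixed parameter CK_i: byte j equals (4*i + j) * 7 mod 256."""
    return int.from_bytes(bytes(((4 * i + j) * 7) % 256 for j in range(4)), "big")


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def t_prime(word, sbox):
    """Byte-wise S-box substitution, then L'(B) = B ^ (B <<< 13) ^ (B <<< 23)."""
    b = int.from_bytes(bytes(sbox[v] for v in word.to_bytes(4, "big")), "big")
    return b ^ rotl(b, 13) ^ rotl(b, 23)


def expand_key(mk, sbox, rounds=32):
    """Forward key expansion: returns the round keys rk_0 .. rk_{rounds-1}."""
    k = [m ^ f for m, f in zip(mk, FK)]
    for i in range(rounds):
        k.append(k[i] ^ t_prime(k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ ck(i), sbox))
    return k[4:]


def recover_master_key(rk0_3, sbox):
    """Run the expansion backwards from rk_0..rk_3 (= K_4..K_7) to MK."""
    k = [0, 0, 0, 0] + list(rk0_3)
    for i in (3, 2, 1, 0):
        k[i] = k[i + 4] ^ t_prime(k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ ck(i), sbox)
    return tuple(k[i] ^ FK[i] for i in range(4))


# Demo with a stand-in S-box; substitute the real SM4 S-box table in practice.
rng = random.Random(0)
toy_sbox = list(range(256))
rng.shuffle(toy_sbox)
mk = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
round_keys = expand_key(mk, toy_sbox)
recovered = recover_master_key(round_keys[:4], toy_sbox)
print(recovered == mk)
```

Because every step of the key expansion is an XOR with a value that depends only on later words, the inversion needs no search: four known round keys determine MK uniquely.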

The weaknesses of CPA attack
(1) CPA attacks depend on the number of power traces available for cracking. In the experiments, the number of power traces was gradually reduced, and the attack was repeated with multiple inputs. The resulting success rate of CPA attacks on the SM4 encryption chip is shown in Fig. 6.
(2) The CPA method relies on clock alignment, and it cannot crack the key when clock asynchrony is introduced.
(3) The CPA method cannot crack the key when random masking is applied.

Attack principle
For each power trace, there is a corresponding attack intermediate value. If multiple power traces are obtained with random plaintexts and random keys and their intermediate values are calculated, then, with the power traces represented as X and the Hamming weights of the corresponding intermediate values as Y, machine learning methods can be used to train a model that maps X to Y. The trained model is then applied to the target power traces for key recovery [10].

Attack steps
(1) PCA (Principal Component Analysis) dimensionality reduction
The original power traces consist of 24,400 sampling points. Regardless of the training method, a dataset with 24,400 dimensions would require an impractically long training time. The core idea of PCA is as follows: the principal components of a data matrix are the eigenvectors of its covariance matrix, sorted by their corresponding eigenvalues. PCA reduces a set of potentially correlated high-dimensional variables to a set of lower-dimensional, linearly uncorrelated variables known as principal components. These lower-dimensional components retain as much of the variance of the original data as possible. In practice, dimensionality reduction can be carried out through existing PCA API calls.
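The idea can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the API used in this work: the traces are synthetic, 500 sample points stand in for the 24,400 of the real traces, and the helper names `pca_fit` and `pca_transform` are hypothetical. A library class such as scikit-learn's `PCA` would serve the same purpose.

```python
import numpy as np


def pca_fit(traces, n_components):
    """Return (mean, components) computed from the centered trace matrix."""
    mean = traces.mean(axis=0)
    # SVD of the centered data: the rows of vt are the eigenvectors of the
    # covariance matrix, already sorted by decreasing eigenvalue.
    _, _, vt = np.linalg.svd(traces - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(traces, mean, components):
    """Project traces onto the retained principal components."""
    return (traces - mean) @ components.T


rng = np.random.default_rng(0)
traces = rng.normal(size=(200, 500))   # 200 synthetic traces, 500 samples each
mean, components = pca_fit(traces, n_components=20)
reduced = pca_transform(traces, mean, components)
print(reduced.shape)
```

The same `mean` and `components` fitted on the training traces must be reused when transforming the traces to be cracked, so that both live in the same reduced space.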
(2) Machine learning-based key recovery
Using the trained model, a predicted value Y is obtained for each power trace.
Taking the attack on $rk_0[0]$ as an example, iterate over the guesses $rk_0[0] = m$ with m ∈ [0, 256). For the i-th power trace, the hypothetical intermediate value under guess m is:
$Y_m^{(i)} = HW(Sbox((X_1 \oplus X_2 \oplus X_3)[0] \oplus m))$
where $X_1$, $X_2$, $X_3$ are 32-bit words of the plaintext of the i-th power trace. Compare the elements of every $Y_m$ with those of Y one by one, and take the value of m that yields the most matching elements as the cracked key.
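The matching step can be sketched as follows. This is a minimal illustration, assuming that the model output is one predicted Hamming weight per trace: `SBOX` is a random stand-in permutation rather than the real SM4 S-box, the "predictions" are simulated (correct values with three deliberate errors), and `recover_byte` is a hypothetical helper name.

```python
import random

rng = random.Random(2)

# Stand-in 8-bit S-box (a random permutation), NOT the real SM4 S-box.
SBOX = list(range(256))
rng.shuffle(SBOX)


def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


def recover_byte(inputs, predicted_hw):
    """Return the guess m whose hypothetical HW sequence agrees most often
    with the predicted Hamming weights."""
    def matches(m):
        return sum(hw(SBOX[x ^ m]) == y for x, y in zip(inputs, predicted_hw))
    return max(range(256), key=matches)


true_key = 0xA7
inputs = [rng.randrange(256) for _ in range(30)]   # known input byte per trace

# Simulated model output: correct Hamming weights with three wrong predictions.
predicted = [hw(SBOX[x ^ true_key]) for x in inputs]
for j in (3, 11, 20):
    predicted[j] = (predicted[j] + 1) % 9

guess = recover_byte(inputs, predicted)
print(hex(guess))
```

Counting matches over many traces makes the decision tolerant of individual misclassifications, which is why a model with imperfect accuracy can still recover the key byte.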

Attack performance
We trained with 10,000 power traces and used 30 power traces for key cracking. We employed three different methods: SVM, LSTM, and CNN. Multiple experiments were conducted, and the results are recorded in Table 1. From Table 1, it can be observed that as the dimensionality increases, the training time becomes longer and the success rate improves. Conversely, with fewer dimensions, the training time is reduced, but the success rate decreases.
The underlying reason is that excessive dimensionality reduction loses information: it improves speed but lowers the success rate. Moreover, PCA dimensionality reduction applies the same projection when cracking every round key, so it cannot capture the specific sampling points that correspond to each round key, which prevents precise matching.
The appendix includes SM4 power traces and a self-made SVM-based SM4 encryption chip decryptor, with customizable parameters.

Comparison between the CPA method and the machine learning method
The CPA method and the machine learning method each have advantages and disadvantages. The CPA method does not require prior information and cracks quickly, but it needs a large number of power traces to be cracked. In contrast, the machine learning method requires prior information (a significant amount of historical power traces) and cracks more slowly, but it needs fewer power traces to be cracked. Their characteristics are shown in Table 2.

Distributed power analysis attack

Attack principle
The distributed power attack is based on the correlation coefficient between each 8-bit round key segment and the power trace. Guided by these coefficients, the power trace is sampled to generate 16 sub-power traces. This reduces the dimensionality of the data and thereby improves the cracking efficiency. At the same time, each sub-power trace is more targeted and less susceptible to interference, which can increase the success rate.
From Fig. 7, it can be observed that different round keys manifest at different positions on the power traces.By considering the magnitude of the correlation coefficients, it is possible to extract the sampling points from the power traces for each round key.
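The extraction of sampling points can be sketched as follows. This is a minimal illustration under stated assumptions: the correlation profiles are synthetic (each key byte peaks at its own position), the trace length of 1,600 samples is arbitrary, and the helper names `top_points` and `split_trace` are hypothetical. The default of 100 points per sub-trace follows the setting reported in the attack performance section.

```python
def top_points(corr, n_points=100):
    """Indices of the n_points samples with the largest |correlation|,
    returned in time order."""
    ranked = sorted(range(len(corr)), key=lambda k: abs(corr[k]), reverse=True)
    return sorted(ranked[:n_points])


def split_trace(trace, point_sets):
    """Build one sub-trace per key byte from its selected sample points."""
    return [[trace[k] for k in points] for points in point_sets]


N_SAMPLES = 1600
N_BYTES = 16

# Synthetic correlation profiles: byte b leaks most strongly around 100*b + 50.
profiles = []
for b in range(N_BYTES):
    center = 100 * b + 50
    profiles.append([1.0 / (1 + abs(k - center)) for k in range(N_SAMPLES)])

point_sets = [top_points(p) for p in profiles]
trace = [float(k) for k in range(N_SAMPLES)]   # placeholder power trace
sub_traces = split_trace(trace, point_sets)
print(len(sub_traces), len(sub_traces[0]))
```

The point sets are computed once from the correlation profiles and then applied unchanged to every training and attack trace, so each of the 16 models always sees the same 100 sample positions.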

Attack performance
In this study, we selected the 100 points with the highest correlation coefficients for each 8-bit round key segment, forming 16 sub-traces. We then trained on them separately using machine learning-based methods and conducted multiple experiments; the results are shown in Tables 3 and 4. From Tables 3 and 4, it can be observed that the three machine learning methods have similar accuracy. When using 10 power traces, the success rate is approximately 70%. In terms of training time, SVM is the fastest, while LSTM takes the longest.

The generality of distributed power analysis attack methods
This paper also applies the method to the cracking of AES encryption chips, taking SVM as an example. As before, we selected 10,000 power traces as the training set and used the output values of the S-box as the intermediate values for the attack. We compared the training time and attack success rate for the two algorithms, and the results are shown in Table 5. From Table 5, it can be observed that AES encryption chips are cracked faster and with a higher success rate. This is because in the AES encryption process the key is XORed directly with the plaintext, making the key more vulnerable to exposure. SM4, on the other hand, generates round keys from the key before they interact with the plaintext, making the key more concealed and more challenging to crack.

Attacking of masked SM4 encryption chip
In engineering practice, masking techniques are commonly incorporated into the encryption process to counteract side-channel attacks such as power analysis attacks. Compared with the standard encryption process, masking adds operations such as Galois field multiplication or XOR on intermediate values within the encryption process.
The attack strategy depends on where the mask is applied relative to the point at which the sub-plaintext and the round key first interact during SM4 encryption. If the mask is added before this point, it is treated as part of the key: the method described earlier is used to find the intermediate value, and the mask is attacked in the same way. If the mask is added after this point, the round keys are cracked first, and the full mask and key are then cracked through multiple iterations of this method.
In this paper, a fixed mask was XORed with each S-box output, and with this method all the keys and masks were successfully cracked.

Table 1. Cracking speed and success rate of various machine learning methods. (s) denotes the cracking time of the method; (%) denotes the success rate of the cracking.

Table 2. Comparison of CPA and machine learning.

Table 3. Success rates of various machine learning methods.

Table 4. Processing times of various machine learning methods.

Table 5. Comparison of AES and SM4 encryption cracking.