Min-entropy estimation for semiconductor superlattice true random number generators

The semiconductor superlattice true random number generator (SSL-TRNG) is well suited to high-throughput, high-security cryptographic applications. The security of a random number generator is closely tied to the min-entropy of its raw output, because feeding cryptographic applications with insufficient entropy leads to poor security and vulnerability to malicious attacks. However, no research has focused on min-entropy estimation based on a stochastic model for the SSL-TRNG, which is the highly recommended method for evaluating the security of a specific TRNG structure. In this paper, a min-entropy estimation method is proposed for the SSL-TRNG by extending the Markov stochastic model derived from its memory effects. By calculating the boundary of the transition matrix, the min-entropy is found to be 0.2487 per sample (1 bit) on average. Moreover, the experimental results show that the estimator is accurate enough to adjust the compression rate dynamically in post-processing to reach the required security level, estimating entropy on the fly rather than off-line.

www.nature.com/scientificreports/
At present, theoretical entropy estimation and statistical entropy estimation are the mainstream methods for estimating entropy. References [22][23][24][25] introduced theoretical proofs of TRNG security obtained from a reasonable stochastic model. However, making appropriate assumptions is already complicated, not to mention that some TRNG structures do not even have apposite stochastic models [26,27]. In contrast, statistical entropy estimation treats the various types of TRNGs as black boxes for statistical testing while still being based on the idea of entropy estimation, which can handle some TRNGs whose entropy cannot be quantified through modeling. According to the ISO/IEC 18031 [28] and AIS 31 [29] standards, theoretical entropy estimation is recommended for evaluating the quality of a TRNG.
A semiconductor superlattice (SSL) is an all-solid-state electronic device grown periodically from two semiconductor materials with matching lattices [30]. In 1996, Zhang et al. [31] first observed chaotic current oscillation in a lightly doped, weakly coupled GaAs/AlAs superlattice under a DC bias voltage. However, the chaotic oscillation could only be observed in a limited temperature range. In 2012, Huang et al. [32] proposed using GaAs/Al0.45Ga0.55As instead of GaAs/AlAs to grow the semiconductor superlattice and successfully observed chaotic oscillation at room temperature. Many scholars have since confirmed that the SSL is an ideal entropy source by exploring the structure of the GaAs/Al0.45Ga0.55As SSL and using its large-amplitude chaotic current oscillation to generate truly random numbers [33][34][35]. Moreover, a high-throughput embedded SSL-TRNG system was reported recently [36]. The SSL-TRNG is very practical, and the random numbers it generates can be used as keys in high-end cryptographic applications.
However, no research has focused on the security analysis based on the stochastic model for SSL-TRNG.
In this paper, for the first time, we introduce the Markov stochastic model derived from the memory effects of the SSL-TRNG and use it for min-entropy estimation under realistic conditions. First, the lower bound of the min-entropy is obtained by computing the boundary of the transition matrix at a high confidence level. Then we design simulations and experiments to verify the theoretical conclusions. The resulting min-entropy is 0.2487 on average per sample (1 bit). Therefore, with the method proposed in this paper ensuring sufficient entropy, using more SSL-SKD output bits can significantly increase the speed of random number generation and the efficiency of entropy utilization. Moreover, we demonstrate that the estimator is efficient enough to support online estimation.

Results
Entropy source. The chaotic oscillation of the SSL can be used to generate random bits at high speed and with sufficient entropy, which has attracted considerable interest recently [33][34][35]. Under a specific offset voltage, the SSL is an ideal one-dimensional non-linear dynamic system with many degrees of freedom. Its non-linearity comes from negative differential conductance, which is caused by electrons undergoing cascaded resonant tunneling through the quantum wells [31,32]. Since quantum-mechanical behavior is extremely sensitive to the specific nanostructure of the SSL, random fluctuations at the atomic level during the growth process result in unique and unpredictable nanostructures in SSL devices. When the static field domain is subject to external interference, the SSL exhibits transient chaos [37] that is sensitive to slight differences in the input signal. At the same time, it has a memory effect [38] due to charge storage in the quantum wells. Under continuous input excitation, experimental observations show that the output of a superlattice device at a given moment depends not only on the current excitation but also on the dynamic system state produced by the accumulation of past inputs. In addition, the output bandwidth of the SSL can reach 500 MHz owing to the high-frequency chaotic oscillation.
As an entropy source that combines high throughput with high security, the SSL offers the following advantages for random number generation: (1) The random numbers are generated internally by the physical structure, which cannot be cloned mathematically or physically. (2) SSL devices can be mass-produced in parallel using standard semiconductor manufacturing processes. (3) The SSL can operate above room temperature and resists environmental fluctuations and human interference. (4) SSL devices are low in cost and simple in application mechanism, and can easily be implemented electronically. Figure 1 shows the architecture of the SSL-TRNG. The SSL device exhibits excellent performance as an entropy source for generating random sequences. A TRNG system is generally composed of three fundamental components: the entropy source, the entropy harvester, and entropy extraction [39]. Entropy estimation is added to these components of the SSL-TRNG to provide a security guarantee and anomaly detection for applications.

SSL-TRNG principle.
The entropy harvester is a generalized mechanism that samples the original waveform output by the entropy source and converts it into a binary sequence. Its efficiency depends on the efficiency of the selected entropy source. Because the output of the SSL device is an analog waveform, it must first be digitized. Through the analog-to-digital converter (ADC), the chaotic current signal is sampled and quantized into the original random sequence.
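The quantization step can be sketched as follows. This is a minimal illustration only: the paper states that 1-bit quantization is used but does not give the threshold rule, so the median threshold, the function name `quantize_1bit`, and the sample values are all assumptions.

```python
from statistics import median

def quantize_1bit(samples, threshold=None):
    """Map each ADC sample to one bit: 1 if above the threshold, else 0.

    ASSUMPTION: when no threshold is given, the median of the block is used.
    The actual threshold rule of the SSL-TRNG harvester is not specified here.
    """
    if threshold is None:
        threshold = median(samples)
    return [1 if s > threshold else 0 for s in samples]

# Hypothetical ADC codes standing in for the sampled chaotic current.
adc_samples = [512, 498, 530, 501, 476, 545, 489, 520]
raw_bits = quantize_1bit(adc_samples)
print(raw_bits)
```

A median threshold balances the number of zeros and ones within a block, but it does nothing about correlation between successive bits, which is why the entropy estimation described next is still needed.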
Entropy estimation determines how much entropy the original sequence contains and provides parameters for entropy extraction. Moreover, the online entropy estimation mechanism detects runtime defects of the system in time and ensures robustness.
Entropy extraction, also known as randomness extraction, aims to convert the original random sequence from the harvester into a shorter and almost uniformly distributed random sequence. Numerous extraction methods, such as the XOR method, the Von Neumann extractor, and the least significant bit (LSB) function, are widely used [40,41]. Although these schemes are simple to implement, they may fail to correct the bias and can cause high entropy loss [42]. In what follows, universal hash functions are our scheme of choice because they provide information-theoretic security [43]. The entropy extraction stage is compressive, and the full-entropy random sequence is obtained through this process.
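To illustrate extraction with a universal hash family, the sketch below uses Toeplitz hashing over GF(2), a common universal family; the paper does not state which universal hash it uses, so this choice is an assumption. The output length follows the leftover hash lemma, m = floor(n·H − 2·log2(1/ε)). The function names, the block size of 1024 bits, and ε = 2^-32 are hypothetical.

```python
import math
import secrets

def output_length(n_raw, h_min, eps):
    """Leftover hash lemma: extractable bits from n_raw bits at h_min bits/bit."""
    return max(0, math.floor(n_raw * h_min - 2 * math.log2(1 / eps)))

def toeplitz_extract(raw_bits, seed_bits, out_len):
    """Multiply raw_bits by an out_len x n Toeplitz matrix over GF(2).

    Entry T[i][j] is seed_bits[i - j + n - 1], so the matrix is constant along
    each diagonal and is fully described by n + out_len - 1 seed bits.
    """
    n = len(raw_bits)
    if len(seed_bits) != n + out_len - 1:
        raise ValueError("seed must contain n + out_len - 1 bits")
    out = []
    for i in range(out_len):
        acc = 0
        for j in range(n):
            acc ^= seed_bits[i - j + n - 1] & raw_bits[j]
        out.append(acc)
    return out

# Stand-in input: real use would take the harvested SSL bits, not secrets.
raw = [secrets.randbits(1) for _ in range(1024)]
m = output_length(len(raw), 0.2487, 2 ** -32)
seed = [secrets.randbits(1) for _ in range(len(raw) + m - 1)]
extracted = toeplitz_extract(raw, seed, m)
```

The double loop is written for clarity; a practical implementation would pack bits into machine words or use an FFT-based multiplication.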

Time complexity and space complexity.
In the entropy estimation algorithm proposed in this paper, the most time-consuming step is obtaining the transition matrix P from the original sequence. Calculating P requires a double "for" loop: the length of the original sequence determines the number of iterations of the outer loop, and the size of the quantized state space determines the number of iterations of the inner loop. In this experiment, the quantized state space size is 2, so the time complexity is T(n) = O(n).
During program operation, the temporarily stored data consist of the original data sequence and the transition matrix. The size of the matrix is determined by the number of quantization bits. In this experiment, with 1-bit quantization, the matrix has 4 elements, so the space complexity is S(n) = O(n).
From the above discussion, the algorithm has linear time and space complexity, which is a clear advantage for realizing online entropy estimation.
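The double loop described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `transition_matrix` and the toy sequence are assumptions, and rows for states that never occur are left as zeros by choice.

```python
def transition_matrix(seq, num_states=2):
    """Count state transitions in seq and normalise each row to probabilities."""
    counts = [[0] * num_states for _ in range(num_states)]
    for t in range(len(seq) - 1):        # outer loop: length of the sequence
        for j in range(num_states):      # inner loop: size of the state space
            if seq[t + 1] == j:
                counts[seq[t]][j] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # A state that was never left keeps an all-zero row (illustrative choice).
        probs.append([c / total if total else 0.0 for c in row])
    return counts, probs

counts, P = transition_matrix([0, 0, 1, 1, 0, 1])
print(counts)  # transition counts per (current state, next state)
print(P)       # row-normalised estimate of the transition matrix
```

With a fixed state space the inner loop is constant work, so the running time grows linearly with the sequence length. The inner loop could be replaced by direct indexing, `counts[seq[t]][seq[t + 1]] += 1`, without changing the result.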
Estimation results. Using the min-entropy estimation method proposed in this paper, we conducted multiple sampling tests on the original data sequence generated by the SSL-TRNG, with each test containing 1,000,000 samples. The resulting min-entropy values are shown in Table 1. Table 1 also lists the transition matrix calculated while estimating the min-entropy and the matrix boundary at a confidence level of 95%. For comparison, the min-entropy of the original output sequence was also estimated with the Markov model of NIST, and the results are listed in Table 1 in the column to the right of this work's results.
According to the entropy estimation results in Table 1, the min-entropy (per 1 bit) of the original output sequence of the superlattice device is 0.2487 bit (this work) and 0.3641 bit (Markov model). This work finds a lower bound of the min-entropy based on the Markov method, so its estimate is more conservative, as Table 1 also reflects. The results also indicate the upper limit of the compression rate of the entropy extractor. In addition, the min-entropy results fluctuate little, which reflects to a certain extent the stability of the superlattice-based random number system.
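As a rough worked example of how the estimate constrains the extractor, assume an ideal extractor that outputs at most as many bits as its input contains min-entropy, and ignore the additional loss tied to the extractor's security parameter (both are simplifying assumptions). For a raw block of $n = 10^6$ bits:

```latex
\begin{align*}
m_{\max} &= n \cdot H_\infty = 10^{6} \times 0.2487 = 248{,}700 \ \text{bits},\\
\frac{n}{m_{\max}} &= \frac{1}{0.2487} \approx 4.02 \quad \text{(this work)},\\
\frac{1}{0.3641} &\approx 2.75 \quad \text{(Markov model of NIST)}.
\end{align*}
```

Under these assumptions, the estimate of this work requires compressing roughly four raw bits into each output bit, whereas the less conservative Markov estimate would allow a ratio below three.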

Statistical test.
There is a recognized and accepted standard for statistically testing randomness: the statistical test suite SP 800-22 from the National Institute of Standards and Technology (NIST) [44], which contains 15 subtests. The NIST standard requires that each sequence to be tested be at least 1 Mbit long, and uniformity is judged by checking the distribution of the P-values. The result of each subtest is given by the P-value P_T and the proportion σ. In this experiment, 1000 bitstreams of length 1 Mbit were used for the NIST statistical test at a significance level of 0.01. To pass, the P-value P_T should be greater than 0.0001 and the proportion σ should be greater than 0.98. Table 2 shows the NIST SP 800-22 statistical test results for the SSL-TRNG. We conclude that the random numbers generated by the SSL-TRNG pass the NIST SP 800-22 evaluation when the parameters of the extractor are determined by the entropy estimation results of this paper.
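To make the pass criteria concrete, the sketch below implements the simplest of the 15 subtests, the frequency (monobit) test, together with the usual SP 800-22 lower limit on the pass proportion, p − 3·sqrt(p(1 − p)/m) with p = 1 − α. The function names are illustrative, and the sketch is not a substitute for the official test suite.

```python
import math

def monobit_p_value(bits):
    """Frequency (monobit) test: P-value for the balance of zeros and ones."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # +1 for each one, -1 for each zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def min_pass_proportion(num_streams, alpha=0.01):
    """Lower limit of the acceptable proportion of bitstreams passing a subtest."""
    p_hat = 1 - alpha
    return p_hat - 3 * math.sqrt(p_hat * alpha / num_streams)

# Ten-bit example sequence 1011010101 (far too short for a real assessment).
example = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(monobit_p_value(example))
print(min_pass_proportion(1000))
```

For 1000 bitstreams at a significance level of 0.01, the limit evaluates to roughly 0.9806, which agrees with the 0.98 threshold quoted above.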

Discussion and conclusions
Entropy is an important metric in secure systems. There are many measures of entropy: in addition to min-entropy, there are Shannon entropy, Rényi entropy, collision entropy, etc. In this paper, a conservative method is used to estimate the min-entropy of sequences generated by the semiconductor superlattice, to ensure that the SSL-TRNG outputs full-entropy random numbers. According to the entropy estimation results and the properties of the SSL entropy source, the SSL-TRNG can generate full-entropy sequences at high speed, which satisfies the requirements of the one-time pad cipher. It can also provide random bits for cryptographic primitives such as symmetric ciphers, public-key cryptography, certificates, and signatures, which play a significant role in the blockchain and the Internet of Things in protecting core applications and defending against intrusion [45][46][47]. We collect TRNGs whose entropy sources are ADC-sampled oscillation, optical vacuum fluctuation, Stokes field phase fluctuations, and quantum effects, and compare the min-entropy (per sample) and full-entropy throughput (Mb/s) of the SSL-TRNG with those of the other TRNGs in Table 3. In terms of security and performance, our work achieves significantly higher entropy bit rates for a given confidence level than the TRNG based on ADC-sampled oscillation (33 Mb/s in Ref. 13) and the TRNG based on Stokes field phase fluctuations (145 Mb/s in Ref. 48). The only directly comparable work that reports a min-entropy (per 1 bit) is Ref. 13, whose full-entropy throughput is 46 times lower than ours. Our total entropy throughput is slightly lower than that of the quantum TRNG (1770 Mb/s in Ref. 49), and the throughput of the TRNG whose entropy source is optical vacuum fluctuation (6000 Mb/s in Ref. 50) is four times ours. The SSL-TRNG performs well in cryptographic applications with high-security and high-speed requirements.
Compared with true random number generators that use other physical entropy sources, the SSL-TRNG is competitive in terms of throughput, frequency, area, etc. Moreover, the SSL-TRNG is easy to implement as lightweight, miniaturized hardware. In addition, SSL devices can be mass-produced and are resistant to environmental fluctuations and human interference. They are implemented electronically, without high cost or complex application mechanisms. The SSL-TRNG thus achieves the best balance between speed and ease of use.
The SSL-TRNG uses semiconductor superlattices as a physical entropy source to generate truly random numbers, and entropy estimation provides a crucial evaluation of its security. In this work, we propose a min-entropy estimation method for the SSL-TRNG and verify its feasibility for the first time. In particular, the stochastic model is established heuristically, using the Markov model as a template. By finding the boundary of the Markov transition matrix, we obtain the lower bound of the min-entropy at a high confidence level. In our experiments, the average min-entropy is 0.2487 per sample (1 bit). In addition, the results show that the estimator is accurate enough to dynamically adjust the compression ratio in post-processing to achieve the required security level, estimating entropy on the fly instead of offline.
This work not only provides a security guarantee for the SSL-TRNG but also offers a new direction for research on quantifying the SSL physical entropy source. Future work will extend this research by adding experimental samples, expanding the entropy estimation model, and analyzing the entropy source in depth, further improving model selection and parameter optimization for similar entropy estimation problems.

Preliminaries.
Min-entropy is the most conservative measure of the unpredictability of a set of sequences.
Definition 1 Suppose that the independent discrete random variable $X$ takes values from the finite set $A = \{x_1, x_2, \ldots, x_n\}$ with probabilities $\Pr(X = x_i) = p_i$ for $i = 1, \ldots, n$. The min-entropy of $X$ is
$$H_\infty(X) = -\log_2 \max_{1 \le i \le n} p_i. \tag{1}$$
From the previous discussion, the output sequence of the SSL-TRNG entropy source has memory effects: the current output is related not only to the current excitation but also to the historical input. The dependency among outputs is the most difficult complication to address [54]. One might ask whether this difficulty can be resolved by adopting a simple model of output dependence and analyzing that model; in practice, however, it is impractical or impossible to obtain an accurate stochastic model of the output sequences.
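As a minimal illustration of Definition 1, the following sketch computes the min-entropy of a discrete distribution as the negative base-2 logarithm of its largest probability. The function name `min_entropy` and the example distributions are illustrative assumptions, not values from this work.

```python
import math

def min_entropy(probs):
    """Min-entropy in bits: -log2 of the most likely outcome's probability."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a probability distribution")
    return -math.log2(max(probs))

print(min_entropy([0.5, 0.5]))    # unbiased bit
print(min_entropy([0.75, 0.25]))  # biased bit: less than one bit per sample
```

Because only the most likely outcome matters, min-entropy never exceeds Shannon entropy, which is why it is the conservative choice for security evaluation.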
The Markov model [55] is a typical example of data dependence: the next output state of an Nth-order Markov process depends on the previous N output states. Heuristically, we use the Markov model as a template to establish a stochastic model for the output sequence of the SSL-TRNG entropy source. The dependence within the output sequence is therefore assumed to be limited to that of a Markov process.

Definition 2
A Markov process is defined by three elements:
(1) State space $X$. $X$ is a set containing all states.
(2) Transition matrix $P$. The elements of $P$ are defined as
$$P_{ij} = p\left(x^{(t+1)} = j \mid x^{(t)} = i\right), \tag{2}$$
which means that the transition probability from the current state $i$ to the next state $j$ is $P_{ij}$.
(3) Initial state distribution $p(x^{(0)})$. This gives the probability that $x$ takes each possible state in the state space at $t = 0$.

Definition 3
A stochastic process $\{X_n\}_{n \in \mathbb{N}}$ that takes values from a finite set $A$ is called a first-order Markov chain [56] if
$$\Pr(X_{n+1} = x_{n+1} \mid X_n = x_n, \ldots, X_0 = x_0) = \Pr(X_{n+1} = x_{n+1} \mid X_n = x_n) \tag{3}$$
for all $n \in \mathbb{N}$ and all $x_0, x_1, \ldots, x_{n+1} \in A$. The initial probabilities $p(x^{(0)})$ of the chain are $p_i = \Pr(X_0 = i)$, whereas the transition probabilities $P_{ij}$ are $\Pr(X_{n+1} = j \mid X_n = i)$.

Definition 4
The min-entropy of a Markov chain with length $L$ is defined as
$$H_\infty = -\log_2\left(\max_{x_1, \ldots, x_L} p_{x_1} \prod_{j=1}^{L-1} P_{x_j x_{j+1}}\right). \tag{4}$$

Min-entropy estimation of TRNG.
Entropy estimation for a TRNG involves two steps: establishing a stochastic model and estimating the entropy [51]. First, assumptions are made about the entropy source of the TRNG based on a noise model, for example that the noise source follows an independent normal distribution. Then, according to these assumptions and the working principle of the TRNG, the process of converting the noise source into random bits is described in mathematical language, which establishes the stochastic model. Finally, the probability distribution of the output is calculated, and the entropy of the TRNG is estimated from the established stochastic model. Much work has been done to establish stochastic models and estimate entropy for various TRNGs. Generally, each TRNG has its own stochastic model, though some stochastic models are generic and apply to several generators [25]. Specifically, Refs. 23,24 investigate models through the evolution of phase, and Refs. 22,52,53,57 through time, for the elementary oscillator-based TRNG (EO-TRNG). Chaos-based TRNGs use an ADC to build chaotic circuits [58] or sample chaotic signals to generate random sequences [26,27]. In the absence of corresponding stochastic models, theoretical entropy sufficiency cannot be guaranteed. NIST Special Publication 800-90B [55], whose latest version was published in January 2018, is a typical representative of entropy estimation. It covers estimating the min-entropy of an entropy source and provides a standard for designing and testing entropy sources. Reference 59 proposes using neural networks to solve the min-entropy estimation problem, which provides a new idea for entropy estimation. By extending an existing model and the multi-bit ADC output, Ref. 13 obtains the lower bound of the entropy for the ADC sampling-based TRNG. Ref. 42 presents a method for maximizing the conditional min-entropy of the random sequence generated by quantum-to-classical-noise ratio. To address the limitations that entropy source outputs may be dependent and that the distribution of random variables may change over time, Ref. 56 proposes alternative methods for estimating the entropy in each output of an entropy source based on concepts from machine learning.

Stochastic model of SSL-TRNG.
Suppose that $X(t) = \{x_1, x_2, \ldots, x_L\}$ is the sampled output sequence of the SSL-TRNG entropy source, with length $L$. Further, suppose that $X(t)$ is a Markov process with initial state distribution $p(x^{(0)})$ and transition matrix $P \in [0, 1]^{n \times n}$, both of which are determined by $X(t)$. In $X(t)$, we count the frequency of each state to estimate $p(x^{(0)})$ and the frequency of each transition between states to estimate $P_{ij}$. Clearly, the size of $X(t)$ directly affects the accuracy of these estimates, because some infrequent transitions may not appear in the data set.
Therefore, the min-entropy of $X(t)$ can be defined as
$$H_\infty\left(P, p(x^{(0)}), n\right) = \min_{x_1, \ldots, x_L}\left[-\log_2\left(p_{x_1} \prod_{k=1}^{L-1} P_{x_k x_{k+1}}\right)\right]. \tag{7}$$
In a Markov process, accurately estimating the transition probability matrix is vital for estimating the entropy. If we overestimate a transition probability, the min-entropy will be underestimated; however, a large number of tests minimizes this possibility.
According to Eq. (7), $P_{ij}$ is the only variable. The lower bound of the min-entropy $H_\infty$ is therefore obtained by finding the upper bound of the transition matrix $P$. Suppose there is a matrix $M$ such that $M_{ij} \ge P_{ij}$ $(i, j = 1, \ldots, n)$; then $H_\infty(M, p(x^{(0)}), n) \le H_\infty(P, p(x^{(0)}), n)$, because $-\log_2$ is monotonically decreasing [54].
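The minimization over state sequences does not require enumerating every sequence. The sketch below assumes a Viterbi-style dynamic program over −log2 probabilities, which is a standard technique and not necessarily the authors' implementation; the function name and the example matrices are hypothetical.

```python
import math

def markov_min_entropy(p0, P, length):
    """Smallest -log2 probability over all state sequences of the given length."""
    n = len(p0)

    def nlog2(x):
        return -math.log2(x) if x > 0 else math.inf

    # cost[j]: smallest -log2 probability of any sequence that ends in state j.
    cost = [nlog2(p) for p in p0]
    for _ in range(length - 1):
        cost = [min(cost[i] + nlog2(P[i][j]) for i in range(n))
                for j in range(n)]
    return min(cost)

p0 = [0.5, 0.5]
P = [[0.9, 0.1], [0.1, 0.9]]    # strongly correlated toy source
M = [[0.95, 0.15], [0.15, 0.95]]  # entrywise upper bound on P
L = 128
print(markov_min_entropy(p0, P, L) / L)  # per-sample min-entropy under P
print(markov_min_entropy(p0, M, L) / L)  # never larger than the value under P
```

Replacing P with an entrywise larger matrix M can only lower the result, which is the monotonicity property used to obtain a lower bound on the min-entropy.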
How can such a matrix $M$ be obtained? Consider a state $i$ in $X(t)$ and the transition probability $P_{ij}$ from state $i$ to state $j$, where $i, j = 1, \ldots, n$. We choose a value $m_{ij}$ and define the confidence interval $[0, m_{ij}]$ so that our choice satisfies the confidence level $\alpha$: $P[M_{ij} \le m_{ij} \mid p_i, p_{ij}] \ge \alpha$.
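One common way to obtain such an upper bound is a one-sided Hoeffding bound on each observed transition proportion. The sketch below assumes the form m_ij = min(1, o_ij/o_i + ε_i) with ε_i = sqrt(ln(1/(1 − α))/(2·o_i)), where o_i is the number of observed transitions out of state i and o_ij the number of those that go to state j. The function name, the counts, and the exact form of the bound are assumptions for illustration and may differ from the expression used in this work.

```python
import math

def hoeffding_upper_bound(counts, alpha=0.95):
    """Entrywise upper confidence bound on a transition matrix from its counts."""
    bounds = []
    for row in counts:
        o_i = sum(row)
        if o_i == 0:
            # No observations for this state: fall back to the trivial bound 1.
            bounds.append([1.0] * len(row))
            continue
        # One-sided Hoeffding: P(p - p_hat >= eps) <= exp(-2 * o_i * eps**2).
        eps = math.sqrt(math.log(1 / (1 - alpha)) / (2 * o_i))
        bounds.append([min(1.0, o_ij / o_i + eps) for o_ij in row])
    return bounds

# Hypothetical transition counts from a 1-bit quantised sequence.
counts = [[420_000, 80_000], [80_000, 420_000]]
M = hoeffding_upper_bound(counts, alpha=0.95)
print(M)
```

The rows of the bounded matrix generally sum to more than one. That is expected, since the matrix serves only as an entrywise upper bound for the min-entropy calculation and is not itself a transition matrix.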
The interval with confidence level $\alpha$ is obtained by calculating the probability that more transitions would be expected to be observed than were actually observed. Similarly, $m_{ij}$ can be defined in terms of the observed proportion of transitions from state $i$ to state $j$, and Hoeffding's inequality then limits the error of the matrix $M$ within the prescribed confidence. In this way, the bound of the transition matrix $M \in [0, 1]^{n \times n}$ is obtained, where the value of $m_{ij}$ is calculated by Eq.